async-profiler

async-profiler is a low-overhead sampling profiler for Java. It uses AsyncGetCallTrace (a HotSpot-internal API) and perf events to collect stack traces without safepoint bias, and produces CPU and allocation profiles with flame graph output.

Why async-profiler?

Feature	async-profiler	VisualVM Sampler
Overhead	<2%	5–15%
Production safe	Yes	Caution
Flame graphs	Native	Plugin needed
Allocation profiling	Yes	Limited
Wall-clock profiling	Yes	No

Installation

# Download from GitHub releases
wget https://github.com/async-profiler/async-profiler/releases/download/v2.9/async-profiler-2.9-linux-x64.tar.gz
tar xzf async-profiler-2.9-linux-x64.tar.gz
cd async-profiler-2.9-linux-x64  # the ./profiler.sh commands below are run from this directory

# Or via package manager (macOS)
brew install async-profiler

CPU Profiling

# Profile for 60 seconds and write an HTML flame graph
./profiler.sh -d 60 -f cpu-flame.html <pid>

# Select the event explicitly (cpu is the default)
./profiler.sh -e cpu -d 30 -f cpu.html <pid>

# Profile each thread separately (-t)
./profiler.sh -d 60 -t -f cpu-threads.html <pid>

Allocation Profiling

Find where objects are allocated:

# Sample about once per 1 KB allocated (--alloc sets the sampling interval in bytes, not a minimum object size)
./profiler.sh -e alloc -d 60 --alloc 1024 -f alloc-flame.html <pid>

Wall-Clock Profiling

Profile elapsed time including I/O waits:

./profiler.sh -e wall -d 60 -f wall-flame.html <pid>

Useful when CPU profiling shows low utilization but the app is slow (I/O bound).

Attaching to Running JVM

# List Java processes
jps -l

# Start profiling
./profiler.sh start <pid>
# ... run load test ...
./profiler.sh stop -f profile.html <pid>

# One-liner (assumes pgrep matches exactly one process)
./profiler.sh -d 60 -f /tmp/profile.html $(pgrep -f myapp.jar)
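
Because `pgrep -f` can match zero or several processes, it may be worth checking the result before attaching. The following is a minimal sketch: `single_pid` is a hypothetical helper, not part of async-profiler, and `myapp.jar` is the placeholder name used above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when exactly one PID is passed in.
# Prints the PID on stdout, or an error on stderr with exit status 1.
single_pid() {
  if [ "$#" -ne 1 ]; then
    echo "expected exactly 1 matching JVM, found $#" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Example use (commented out so the snippet has no side effects).
# The inner $(pgrep ...) is left unquoted on purpose so that each
# matching PID becomes a separate argument:
#   pid=$(single_pid $(pgrep -f myapp.jar)) &&
#     ./profiler.sh -d 60 -f /tmp/profile.html "$pid"
```

With this guard the profiler is only invoked when the match is unambiguous; otherwise the command stops with a message instead of attaching to the wrong process.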

Reading Flame Graphs

Flame graph (bottom = call stack root, top = leaf)
┌──────────────────────┬───────────────────────┐
│ StringBuilder.append │ StringBuilder.append  │  ← hot leaf method
├──────────────────────┼───────────────────────┤
│   processOrder       │    buildResponse      │
├──────────────────────┴───────────────────────┤
│             OrderService.handle              │
├──────────────────────────────────────────────┤
│              Tomcat HTTP thread              │  ← root (bottom)
└──────────────────────────────────────────────┘

Width = share of samples containing the frame (in a CPU profile, wider = more CPU time). Horizontal position carries no timing information, and the same method called from two different parents appears as two separate boxes.

Look for:

Wide plateaus — methods consuming the most CPU
Unexpected calls — regex, reflection, serialization in hot paths
Tall stacks — deep call chains (may indicate recursion or excessive layering)
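
To rank leaf frames numerically rather than by eye, one option is to export the profile as collapsed stacks (`-o collapsed`: one line per unique stack, with semicolon-separated frames followed by a sample count) and total the samples per leaf. A minimal sketch, assuming that format; `top_leaves` and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# top_leaves FILE [N]
# Print the N leaf frames (default 10) with the most samples, as "count frame".
top_leaves() {
  awk '{
         count = $NF                     # last field is the sample count
         stack = $0
         sub(/ [0-9]+$/, "", stack)      # drop the trailing count
         n = split(stack, frames, ";")   # frames[n] is the leaf (top of stack)
         self[frames[n]] += count
       }
       END { for (f in self) printf "%d %s\n", self[f], f }' "$1" |
    sort -k1,1nr -k2 | head -n "${2:-10}"
}

# Example use (commented out):
#   ./profiler.sh -d 60 -o collapsed -f cpu.collapsed <pid>
#   top_leaves cpu.collapsed 5
```

Leaf totals correspond to the plateaus at the top of the graph: they count samples where the method itself was executing, not time spent in its callees.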

JFR Integration

Export as JFR for analysis in JDK Mission Control:

./profiler.sh -d 60 -o jfr -f recording.jfr <pid>
jmc  # then open recording.jfr from the JMC UI

Continuous Profiling (Production)

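# Cron job: 60s profile every 5 minutes during peak hours
# The date is part of the file name so earlier days are not overwritten.
# 'myapp[.]jar' keeps pgrep -f from also matching the cron shell that runs this line.
*/5 9-17 * * * /opt/async-profiler/profiler.sh -d 60 -f /var/log/profiles/$(date +\%Y\%m\%d-\%H\%M).html $(pgrep -f 'myapp[.]jar')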

Store flame graphs for trend analysis — compare hot methods over time.
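
HTML flame graphs are convenient to look at but awkward to compare mechanically. If the profiles are also stored as collapsed stacks (`-o collapsed`), two captures can be compared by each leaf frame's share of samples. The sketch below assumes that format; `leaf_delta` and the file names are hypothetical.

```shell
#!/bin/sh
# leaf_delta BEFORE AFTER
# For every leaf frame, print: delta (percentage points), share before (%),
# share after (%), frame name. Largest increases come first.
leaf_delta() {
  awk -v before="$1" '
    {
      count = $NF                      # last field is the sample count
      stack = $0
      sub(/ [0-9]+$/, "", stack)       # drop the trailing count
      n = split(stack, frames, ";")
      leaf = frames[n]
      seen[leaf] = 1
      if (FILENAME == before) { b[leaf] += count; btotal += count }
      else                    { a[leaf] += count; atotal += count }
    }
    END {
      for (f in seen) {
        bp = btotal ? 100 * b[f] / btotal : 0
        ap = atotal ? 100 * a[f] / atotal : 0
        printf "%.1f %.1f %.1f %s\n", ap - bp, bp, ap, f
      }
    }' "$1" "$2" | sort -k1,1nr
}

# Example use (commented out; file names are placeholders):
#   leaf_delta monday-0900.collapsed tuesday-0900.collapsed | head
```

Comparing shares rather than raw sample counts keeps the result meaningful when the two captures ran under different load or for different durations.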

Common Findings

Hot method	Likely fix
`Pattern.compile`	Cache compiled Pattern
`SimpleDateFormat`	Use `DateTimeFormatter` (thread-safe)
`ObjectInputStream.readObject`	Cache deserialized objects
`HashMap.get/put`	Poor `hashCode()` causing collisions, or an oversized map
JDBC `executeQuery`	Missing index, N+1 queries
`GC` frames	Tune heap or reduce allocation
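
To check how much of a profile one of these methods accounts for, the samples whose stack contains the frame can be totalled from collapsed output (`-o collapsed`). This is a minimal sketch under that assumption; `frame_share` and the file name are hypothetical, and the match is a plain substring, so `Pattern.compile` also matches a frame printed as `java/util/regex/Pattern.compile`.

```shell
#!/bin/sh
# frame_share FILE TEXT
# Print the percentage of samples whose stack contains TEXT anywhere
# (inclusive time: the frame itself plus everything it calls).
frame_share() {
  awk -v pat="$2" '
    {
      count = $NF                      # last field is the sample count
      stack = $0
      sub(/ [0-9]+$/, "", stack)       # drop the trailing count
      total += count
      if (index(stack, pat) > 0) hits += count   # substring match, not a regex
    }
    END {
      if (total > 0) printf "%.1f\n", 100 * hits / total
      else print "0.0"
    }' "$1"
}

# Example use (commented out):
#   frame_share cpu.collapsed Pattern.compile
```

A share that stays high after a fix suggests the hot path lies elsewhere, or that the cached object is still being rebuilt on some call path.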

Best Practices

Profile under realistic load — idle apps show misleading results
Use -e wall when the CPU profile shows low utilization but latency is high
Compare flame graphs before and after optimizations
Generally safe for production at the default sampling rate (10 ms interval, about 100 Hz)
Combine with JFR for holistic JVM analysis
