async-profiler
async-profiler is a low-overhead sampling profiler for Java. It combines AsyncGetCallTrace (a HotSpot-internal API) with Linux perf events to capture CPU stacks without safepoint bias, and it can also profile allocations and wall-clock time, with flame graph output.
Why async-profiler?
| Feature | async-profiler | VisualVM Sampler |
|---|---|---|
| Overhead | <2% | 5–15% |
| Production safe | Yes | Caution |
| Flame graphs | Native | Plugin needed |
| Allocation profiling | Yes | Limited |
| Wall-clock profiling | Yes | No |
Installation
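# Download from GitHub releases
wget https://github.com/async-profiler/async-profiler/releases/download/v2.9/async-profiler-2.9-linux-x64.tar.gz
tar xzf async-profiler-2.9-linux-x64.tar.gz
# The commands on this page assume you run them from the extracted directory
cd async-profiler-2.9-linux-x64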
# Or via package manager (macOS)
# Note: releases from 3.0 onward ship the launcher as asprof instead of profiler.sh
brew install async-profiler
CPU Profiling
# Profile for 60 seconds, output flame graph
./profiler.sh -d 60 -f cpu-flame.html <pid>
# Select the event explicitly (cpu is the default)
./profiler.sh -e cpu -d 30 -f cpu.html <pid>
# Profile each thread separately (-t splits stacks per thread; it is not a filter)
./profiler.sh -d 60 -t -f cpu-threads.html <pid>
Allocation Profiling
Find where objects are allocated:
# Sample roughly once per 1 KB allocated (--alloc sets the sampling interval in bytes, not a size filter)
./profiler.sh -e alloc -d 60 --alloc 1024 -f alloc-flame.html <pid>
Wall-Clock Profiling
Profile elapsed time including I/O waits:
./profiler.sh -e wall -d 60 -f wall-flame.html <pid>
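Useful when CPU profiling shows low utilization but the app is still slow, for example when threads are blocked on I/O, locks, or sleeps. Per-thread mode (-t) usually makes wall-clock profiles easier to read.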
Attaching to Running JVM
# List Java processes
jps -l
# Start profiling
./profiler.sh start <pid>
# ... run load test ...
./profiler.sh stop -f profile.html <pid>
# One-liner
./profiler.sh -d 60 -f /tmp/profile.html $(pgrep -f myapp.jar)
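The one-liner above assumes `pgrep -f` returns exactly one PID; if it matches zero or several processes, the profiler is handed the wrong arguments. A minimal sketch of a guard, assuming bash and a hypothetical helper name `pick_single_pid` (not part of async-profiler):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of async-profiler): read candidate PIDs on
# stdin and succeed only when exactly one numeric PID was supplied.
pick_single_pid() {
  local pids count
  # Keep only lines that look like a PID; tolerate "no match" from grep.
  pids=$(grep -E '^[0-9]+$' || true)
  # Count non-empty lines; grep -c prints 0 (and exits 1) when there are none.
  count=$(printf '%s\n' "$pids" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly 1 matching JVM, found $count" >&2
    return 1
  fi
  printf '%s\n' "$pids"
}

# Usage (assumes the application was started from myapp.jar):
#   pid=$(pgrep -f myapp.jar | pick_single_pid) &&
#     ./profiler.sh -d 60 -f /tmp/profile.html "$pid"
```

Failing loudly here is a design choice: a profile of the wrong process is harder to notice than a script that refuses to run.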
Reading Flame Graphs
Flame graph (bottom = call stack root, top = leaf)
┌──────────────────────────────────────────────┐
│             StringBuilder.append             │ ← hot leaf method
├──────────────────────┬───────────────────────┤
│     processOrder     │     buildResponse     │
├──────────────────────┴───────────────────────┤
│             OrderService.handle              │
├──────────────────────────────────────────────┤
│              Tomcat HTTP thread              │ ← root (bottom)
└──────────────────────────────────────────────┘
Width = share of samples that contain the frame (wider = more CPU time)
Look for:
- Wide plateaus — methods consuming the most CPU
- Unexpected calls — regex, reflection, serialization in hot paths
- Tall stacks — deep call chains (may indicate recursion or excessive layering)
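The "wide plateau" check can also be done in text form. With `-o collapsed`, async-profiler writes one line per unique stack: frames separated by `;`, followed by a space and the sample count. The sketch below sums samples by leaf frame, which roughly corresponds to the widest boxes along the top edge of the graph. The function name `top_leaf_frames` is an assumption for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rank leaf frames by sample count.
# Reads collapsed stacks ("frame1;frame2;leaf COUNT") on stdin.
# Optional argument: how many frames to print (default 10).
top_leaf_frames() {
  awk '{
    count = $NF                      # last field is the sample count
    stack = $0
    sub(/ [0-9]+$/, "", stack)       # strip the trailing count, keep frame names intact
    n = split(stack, frames, ";")
    self[frames[n]] += count         # the last frame in the stack is the leaf
  }
  END { for (f in self) printf "%d %s\n", self[f], f }' |
    sort -k1,1nr -k2 |
    head -n "${1:-10}"
}

# Usage:
#   ./profiler.sh -d 60 -o collapsed -f stacks.txt <pid>
#   top_leaf_frames 20 < stacks.txt
```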
JFR Integration
Export as JFR for analysis in JDK Mission Control:
./profiler.sh -d 60 -o jfr -f recording.jfr <pid>
jmc # open recording.jfr
Continuous Profiling (Production)
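# Cron job: 60s profile every 5 minutes during peak hours
# Date-stamped names avoid overwriting the previous day's files; the [m] bracket
# keeps pgrep -f from matching the cron shell that runs this line.
*/5 9-17 * * * /opt/async-profiler/profiler.sh -d 60 -f /var/log/profiles/$(date +\%Y\%m\%d-\%H\%M).html $(pgrep -f '[m]yapp.jar')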
Store flame graphs for trend analysis — compare hot methods over time.
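Keeping profiles for trend analysis needs unique file names and some retention policy, otherwise the directory grows without bound. A minimal sketch, assuming bash, a `find` that supports `-delete`, and hypothetical helper names (`profile_path`, `prune_profiles`) with an arbitrary 7-day default:

```shell
#!/usr/bin/env bash
# Sketch only: function names and the 7-day default are assumptions.

# Build a date-stamped output path so profiles from different days never collide.
profile_path() {
  local dir=$1
  printf '%s/%s.html\n' "$dir" "$(date +%Y%m%d-%H%M)"
}

# Delete stored flame graphs last modified more than N days ago (default 7).
prune_profiles() {
  local dir=$1 days=${2:-7}
  find "$dir" -type f -name '*.html' -mtime +"$days" -delete
}

# Usage (paths taken from the cron example; adjust to your layout):
#   /opt/async-profiler/profiler.sh -d 60 -f "$(profile_path /var/log/profiles)" "$pid"
#   prune_profiles /var/log/profiles 14
```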
Common Findings
| Hot method | Likely fix |
|---|---|
| Pattern.compile | Cache compiled Pattern |
| SimpleDateFormat | Use DateTimeFormatter (thread-safe) |
| ObjectInputStream.readObject | Cache deserialized objects |
| HashMap.get/put | Wrong hash function or oversized map |
| JDBC executeQuery | Missing index, N+1 queries |
| GC frames | Tune heap or reduce allocation |
Best Practices
- Profile under realistic load — idle apps show misleading results
- Use `-e wall` when the CPU profile shows low utilization but latency is high
- Compare flame graphs before and after optimizations
- Safe for production at the default sampling rate (10 ms interval, about 100 Hz)
- Combine with JFR for holistic JVM analysis
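Before/after comparisons are easier to state as numbers than by eyeballing two graphs. One possible approach, assuming both runs were saved with `-o collapsed`, is to compute the share of samples whose stack contains a given frame. The helper name `frame_share` and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the percentage of samples whose stack contains
# the given frame (inclusive time), from a collapsed-stacks file.
#   $1 = exact frame name, $2 = file written with -o collapsed
frame_share() {
  local frame=$1 file=$2
  awk -v frame="$frame" '{
    count = $NF                      # last field is the sample count
    stack = $0
    sub(/ [0-9]+$/, "", stack)       # strip the trailing count
    total += count
    n = split(stack, frames, ";")
    for (i = 1; i <= n; i++) {
      if (frames[i] == frame) { hit += count; break }   # count each stack once
    }
  }
  END {
    if (total > 0) printf "%.1f\n", 100 * hit / total
    else print "0.0"
  }' "$file"
}

# Usage:
#   ./profiler.sh -d 60 -o collapsed -f before.txt <pid>   # before the change
#   ./profiler.sh -d 60 -o collapsed -f after.txt <pid>    # after the change
#   frame_share 'java/util/regex/Pattern.compile' before.txt
#   frame_share 'java/util/regex/Pattern.compile' after.txt
```

Frame names must match the profiler's output exactly, so copy them from the collapsed file rather than typing them by hand. Both runs should use the same load and duration for the percentages to be comparable.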