async-profiler is a low-overhead sampling profiler for Java. It uses AsyncGetCallTrace (an internal HotSpot API) together with perf_events, which avoids the safepoint bias common to many JVM profilers, and produces CPU and allocation profiles with flame graph output.

Why async-profiler?

Feature               async-profiler  VisualVM Sampler
Overhead              <2%             5–15%
Production safe       Yes             Caution
Flame graphs          Native          Plugin needed
Allocation profiling  Yes             Limited
Wall-clock profiling  Yes             No

Installation

  # Download from GitHub releases and unpack
  wget https://github.com/async-profiler/async-profiler/releases/download/v2.9/async-profiler-2.9-linux-x64.tar.gz
  tar xzf async-profiler-2.9-linux-x64.tar.gz
  cd async-profiler-2.9-linux-x64   # profiler.sh lives here

  # Or via package manager (macOS)
  brew install async-profiler
  

CPU Profiling

  # Profile for 60 seconds and write an HTML flame graph
  ./profiler.sh -d 60 -f cpu-flame.html <pid>

  # Select the event explicitly (cpu is the default)
  ./profiler.sh -e cpu -d 30 -f cpu.html <pid>

  # Profile each thread separately (-t splits stacks by thread; it is not a filter)
  ./profiler.sh -d 60 -t -f cpu-threads.html <pid>
  

Allocation Profiling

Find where objects are allocated:

  # Sample roughly every 1 KB of allocated memory
  # (--alloc sets the sampling interval in bytes, not a minimum object size)
  ./profiler.sh -e alloc -d 60 --alloc 1024 -f alloc-flame.html <pid>
  

Wall-Clock Profiling

Profile elapsed time including I/O waits:

  ./profiler.sh -e wall -d 60 -f wall-flame.html <pid>
  

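Useful when the CPU profile shows low utilization but the app is still slow, for example when threads are blocked on I/O, locks, or sleeps. Wall-clock mode samples threads regardless of their state, so it is usually easiest to read in per-thread mode (-t).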
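To see why a CPU profile can look nearly empty while a request is slow, the sketch below measures both clocks for a thread that mostly waits. It is an illustration only: Thread.sleep stands in for a blocking I/O call, and the class and method names (WallVsCpuDemo, measure, simulatedIo) are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical demo class: compares wall-clock time with CPU time for a
// task that mostly waits, which is the case wall-clock profiling targets.
class WallVsCpuDemo {

    /** Returns {wallNanos, cpuNanos} used by the current thread while running the task. */
    static long[] measure(Runnable task) {
        // Assumes the JVM supports per-thread CPU time (HotSpot does by default).
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long cpuStart = threads.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        return new long[] {wallNanos, cpuNanos};
    }

    /** Stand-in for a blocking I/O call: the thread waits without using the CPU. */
    static void simulatedIo() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] t = measure(WallVsCpuDemo::simulatedIo);
        System.out.printf("wall = %d ms, cpu = %d ms%n", t[0] / 1_000_000, t[1] / 1_000_000);
    }
}
```

A CPU profile of a program like this would be expected to attribute almost nothing to simulatedIo, while a wall-clock profile should show it as the dominant frame.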

Attaching to Running JVM

  # List Java processes
  jps -l

  # Start profiling
  ./profiler.sh start <pid>
  # ... run load test ...
  ./profiler.sh stop -f profile.html <pid>

  # One-liner (assumes pgrep matches exactly one process)
  ./profiler.sh -d 60 -f /tmp/profile.html $(pgrep -f myapp.jar)
  

Reading Flame Graphs

  Flame graph (bottom = call stack root, top = leaf)
  ┌──────────────────────┐
  │ StringBuilder.append │  ← hot leaf method
  ├──────────────────────┼───────────────────────┐
  │ processOrder         │ buildResponse         │
  ├──────────────────────┴───────────────────────┤
  │ OrderService.handle                          │
  ├──────────────────────────────────────────────┤
  │ Tomcat HTTP thread                           │  ← root (bottom)
  └──────────────────────────────────────────────┘

  Width = share of samples (wider = more CPU time); the horizontal order carries no time information
  

Look for:

  • Wide plateaus — methods consuming the most CPU
  • Unexpected calls — regex, reflection, serialization in hot paths
  • Tall stacks — deep call chains (may indicate recursion or excessive layering)
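The "wide plateau" check can also be done numerically. The sketch below assumes the profile was written in collapsed format (profiler.sh -o collapsed), where each line is a semicolon-separated stack followed by a sample count, and sums the samples per leaf frame. The class name LeafFrameTotals and the sample stacks are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: sums samples per leaf frame from collapsed stacks
// ("root;child;leaf count"), i.e. the frames that form the tops of the graph.
class LeafFrameTotals {

    static Map<String, Long> selfSamples(List<String> collapsedLines) {
        Map<String, Long> totals = new HashMap<>();
        for (String raw : collapsedLines) {
            String line = raw.trim();
            int lastSpace = line.lastIndexOf(' ');
            if (lastSpace < 0) {
                continue; // skip blank lines and lines without a count
            }
            // The count is the last whitespace-separated token on the line.
            long count = Long.parseLong(line.substring(lastSpace + 1));
            String stack = line.substring(0, lastSpace);
            // The leaf is whatever follows the last ';' (or the whole stack).
            String leaf = stack.substring(stack.lastIndexOf(';') + 1);
            totals.merge(leaf, count, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "Thread.run;OrderService.handle;processOrder;StringBuilder.append 70",
            "Thread.run;OrderService.handle;buildResponse;StringBuilder.append 20",
            "Thread.run;OrderService.handle;buildResponse 10");
        selfSamples(sample).entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

With the sample data, StringBuilder.append accounts for 90 of the 100 samples, which is the kind of frame that would show up as a wide plateau.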

JFR Integration

Export as JFR for analysis in JDK Mission Control:

  ./profiler.sh -d 60 -o jfr -f recording.jfr <pid>
  jmc  # open recording.jfr
  

Continuous Profiling (Production)

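  # Cron job: 60s profile every 5 minutes during peak hours.
  # The date is part of the file name so each run keeps its own profile;
  # [m]yapp.jar stops pgrep -f from matching the cron shell's own command line.
  */5 9-17 * * * /opt/async-profiler/profiler.sh -d 60 -f /var/log/profiles/$(date +\%Y\%m\%d-\%H\%M).html $(pgrep -f '[m]yapp.jar')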
  

Store flame graphs for trend analysis — compare hot methods over time.
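One possible way to make the comparison concrete is to diff each frame's share of samples between two profiles. The sketch below assumes per-frame sample counts are already available (for example, extracted from collapsed-format output); the class name ProfileDiff and the numbers are invented for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: compares each frame's share of samples in two profiles.
class ProfileDiff {

    /** Per frame: share in 'after' minus share in 'before', in percentage points. */
    static Map<String, Double> shareDelta(Map<String, Long> before, Map<String, Long> after) {
        double beforeTotal = total(before);
        double afterTotal = total(after);
        Map<String, Double> delta = new TreeMap<>();
        for (Map.Entry<String, Long> e : before.entrySet()) {
            delta.put(e.getKey(), -100.0 * e.getValue() / beforeTotal);
        }
        for (Map.Entry<String, Long> e : after.entrySet()) {
            delta.merge(e.getKey(), 100.0 * e.getValue() / afterTotal, Double::sum);
        }
        return delta;
    }

    private static double total(Map<String, Long> samples) {
        long sum = 0;
        for (long v : samples.values()) {
            sum += v;
        }
        return Math.max(sum, 1); // avoid division by zero for an empty profile
    }

    public static void main(String[] args) {
        Map<String, Long> monday = Map.of("Pattern.compile", 40L, "executeQuery", 60L);
        Map<String, Long> tuesday = Map.of("Pattern.compile", 5L, "executeQuery", 95L);
        shareDelta(monday, tuesday).forEach(
            (frame, d) -> System.out.printf("%-20s %+.1f pp%n", frame, d));
    }
}
```

Shares are compared rather than raw counts because two profiles rarely contain the same total number of samples.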

Common Findings

Hot method                    Likely fix
Pattern.compile               Cache the compiled Pattern
SimpleDateFormat              Use DateTimeFormatter (thread-safe)
ObjectInputStream.readObject  Cache deserialized objects
HashMap.get/put               Fix a poor hashCode(); shrink an oversized map
JDBC executeQuery             Add the missing index; remove N+1 queries
GC frames                     Tune the heap or reduce allocation
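As an illustration of the first two rows, the sketch below hoists a regex and a date formatter out of the hot path. The class, the pattern, and the method names are invented for this example; only Pattern, String.matches, and DateTimeFormatter are real JDK APIs.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Pattern;

// Hypothetical order-validation helper showing two fixes from the table.
class HotPathFixes {

    // Compiled once and shared: Pattern instances are immutable and thread-safe.
    private static final Pattern ORDER_ID = Pattern.compile("ORD-\\d{6}");

    // DateTimeFormatter is immutable and thread-safe, unlike SimpleDateFormat,
    // so a single shared instance is enough.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Slow variant: String.matches compiles the regex on every call.
    static boolean isOrderIdSlow(String s) {
        return s.matches("ORD-\\d{6}");
    }

    // Faster variant: reuses the precompiled pattern.
    static boolean isOrderId(String s) {
        return ORDER_ID.matcher(s).matches();
    }

    static String formatDay(LocalDate date) {
        return DAY.format(date);
    }

    public static void main(String[] args) {
        System.out.println(isOrderId("ORD-123456"));               // prints true
        System.out.println(formatDay(LocalDate.of(2024, 1, 15)));  // prints 2024-01-15
    }
}
```

Both variants return the same results; the difference only shows up in a profile, where the slow variant would be expected to surface Pattern.compile under the caller.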

Best Practices

  • Profile under realistic load — idle apps show misleading results
  • Use -e wall when CPU profile shows low utilization but latency is high
  • Compare flame graphs before and after optimizations
  • Safe for production at the default sampling interval (10 ms, roughly 100 Hz)
  • Combine with JFR for holistic JVM analysis