Profile Go Code with Flamegraphs: A Practical pprof Guide

Flamegraphs are visual representations of your application's call stack. They help you see exactly where your code spends time during execution.

When your application runs slowly, flamegraphs show you which functions are taking the most CPU cycles, making it easier to find and fix performance problems.

The terms "flame graph" (two words, the original spelling from Brendan Gregg) and "flamegraph" (one word) refer to the same chart. For a Go flamegraph, the canonical workflow is to feed a pprof CPU profile into go tool pprof and render it interactively, no external scripts needed.

This is a flamegraph showing the call stack of a running program.

How to read a flamegraph

Width represents time - wider boxes mean more CPU time spent in that function and everything it calls
Height shows the call stack depth - each level is a function call
Colors help distinguish different functions (varies by tool)
The widest sections are usually your performance bottlenecks

Most profiling tools let you hover over sections to see function names and timing details.
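To make the "width represents time" rule concrete, here is a minimal sketch of how box widths fall out of sampled call stacks. The sample type, the semicolon-joined stack format, and the sample counts are all invented for illustration and are not pprof's internal format. The sketch also adds up widths per function name, whereas a real flamegraph draws a separate box for each distinct call path.

```go
package main

import (
	"fmt"
	"strings"
)

// sample is one captured call stack (root first) plus how many times the
// profiler observed it. Hypothetical format, used only for this sketch.
type sample struct {
	stack string // frames joined with ";", e.g. "main;validateEmails"
	count int
}

// frameWidths returns, for each function, the number of samples in which it
// appears anywhere on the stack. That total is what a box's width encodes.
func frameWidths(samples []sample) map[string]int {
	widths := make(map[string]int)
	for _, s := range samples {
		seen := make(map[string]bool) // count a recursive frame once per sample
		for _, frame := range strings.Split(s.stack, ";") {
			if !seen[frame] {
				seen[frame] = true
				widths[frame] += s.count
			}
		}
	}
	return widths
}

func main() {
	// Made-up sample counts, chosen only to show the arithmetic.
	samples := []sample{
		{"main;validateEmails;regexp.MatchString;regexp.compile", 70},
		{"main;validateEmails;regexp.MatchString", 20},
		{"main;loadEmails", 10},
	}
	widths := frameWidths(samples)
	total := widths["main"]
	for _, name := range []string{"main", "validateEmails", "regexp.MatchString", "regexp.compile", "loadEmails"} {
		fmt.Printf("%-20s %3d%% of samples\n", name, 100*widths[name]/total)
	}
}
```

A function's width includes the time spent in its callees, which is why the root frame always spans the full chart and the interesting boxes are the wide ones near the leaves.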

Generating a Go flamegraph with pprof

The shortest path to a working Go flamegraph is to expose the pprof endpoints over HTTP and let the Go toolchain render the chart for you.

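package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
)

func main() {
	// Serve the pprof endpoints in the background and log the error if the listener fails.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	// your program continues here
}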

While the program is doing real work, capture a 30-second CPU profile and open it in pprof's web UI. Quote the URL so the shell does not try to interpret the ? character:

go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"

The web UI ships with a Flame Graph view. That is the same kind of chart you saw above, generated from your own program's CPU profile, with no flamegraph.pl or perl scripts in the loop.

For a one-shot benchmark or short-lived program, write the profile to a file with runtime/pprof and pass that file to the same go tool pprof -http command.

Real example: fixing a performance bug

Here's how I used flamegraphs to find and fix a performance issue in an email validation service that was using way more CPU than expected.

Flamegraph showing performance bottleneck in email validation

Looking at the flamegraph, you can see wide blocks under regexp.MatchString and specifically regexp.compile calls taking up significant CPU time.

The root cause

The flamegraph shows the issue clearly. Here's the problematic code:

func validateEmails(emails []string) []bool {
	results := make([]bool, len(emails))
	for i, email := range emails {
		// regexp.MatchString compiles the pattern again on every iteration.
		matched, _ := regexp.MatchString(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`, email)
		results[i] = matched
	}
	return results
}

The problem: every call to regexp.MatchString compiles the regex pattern from scratch. For 1000 emails, that's 1000 unnecessary regex compilations.

The flamegraph shows regexp.compile taking way too much CPU time.

Golang Regex Compile Documentation

The Go documentation clearly states that for performance-critical code, you should pre-compile your regex patterns.

The fix

Here's the optimized version:

// Compiled once at package initialization and reused by every call.
var emailRegex = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)

func validateEmails(emails []string) []bool {
	results := make([]bool, len(emails))
	for i, email := range emails {
		results[i] = emailRegex.MatchString(email)
	}
	return results
}

Before vs after

Use pprof's -diff_base flag to compare a profile captured before the fix with one captured after it:

Flamegraph diff

The differential flamegraph shows the regexp.compile calls have completely disappeared, and CPU usage dropped dramatically.

This simple change resulted in a 10x performance improvement.
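If you want a rough sanity check of a speedup like this on your own machine, a small timing harness along the following lines is one option. It is only a sketch: the function names validateRecompiling and validatePrecompiled, the 1000 generated addresses, and the single timed run are choices made for this example, and the ratio you see will depend on your hardware and Go version. For numbers you can rely on, a benchmark in the testing package is the better tool.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

const emailPattern = `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`

// Compiled once and shared by validatePrecompiled.
var emailRegex = regexp.MustCompile(emailPattern)

// validateRecompiling mirrors the original code: the pattern is compiled on every call.
func validateRecompiling(emails []string) []bool {
	results := make([]bool, len(emails))
	for i, email := range emails {
		matched, _ := regexp.MatchString(emailPattern, email)
		results[i] = matched
	}
	return results
}

// validatePrecompiled mirrors the fix: the compiled pattern is reused.
func validatePrecompiled(emails []string) []bool {
	results := make([]bool, len(emails))
	for i, email := range emails {
		results[i] = emailRegex.MatchString(email)
	}
	return results
}

func main() {
	// Generated test input; any list of addresses would do.
	emails := make([]string, 1000)
	for i := range emails {
		emails[i] = fmt.Sprintf("user%d@example.com", i)
	}

	start := time.Now()
	validateRecompiling(emails)
	slow := time.Since(start)

	start = time.Now()
	validatePrecompiled(emails)
	fast := time.Since(start)

	fmt.Printf("recompiling on each call: %v\n", slow)
	fmt.Printf("precompiled:              %v\n", fast)
}
```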

Takeaways

Don't guess where performance issues are - use flamegraphs to show you
Look for patterns - repeated function calls in hot paths are often the biggest problems
Read the documentation - many performance issues come from overlooking documented best practices
Always measure before and after to validate your optimizations

Common issues flamegraphs expose

Regex compilation in loops (like our example)
N+1 database queries
Database connections not being reused
Inefficient string concatenation
Memory allocations in hot paths
Blocking I/O in CPU-intensive code
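As an illustration of one item from this list, here is a sketch of inefficient string concatenation next to a version built on strings.Builder. The function names joinSlow and joinFast are made up for this example. In a CPU profile, the slow version typically shows up as wide string-concatenation and memory-allocation frames from the Go runtime underneath the loop.

```go
package main

import (
	"fmt"
	"strings"
)

// joinSlow builds the result with +=, which allocates a new string and
// copies everything accumulated so far on each iteration.
func joinSlow(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// joinFast appends into a strings.Builder, whose buffer grows as needed,
// so the cost of copying is amortized across the appends.
func joinFast(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := []string{"flame", "graph", "s"}
	// Both functions produce the same string; only the cost differs.
	fmt.Println(joinSlow(parts) == joinFast(parts))
}
```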

Flamegraphs save time by showing you exactly where to look instead of guessing.