Spring Boot 3 WebFlux OOM Troubleshooting
Issue Description
Recently, I encountered an OutOfMemoryError (OOM) in a Spring Boot 3 WebFlux application. The error log is as follows:
"logger": "reactor.core.scheduler.Schedulers",
"@version": 1,
"appVersion": "0.0.0",
"cloud": "aws-ec2",
"stacktrace": "java.lang.OutOfMemoryError: Java heap space\n",
"thread": "boundedElastic-8",
"level": "ERROR"
Root Cause Analysis
The application was running on Java 21 with a 6 GB heap, but the actual memory usage was not high. Once the OOM occurred, Prometheus could no longer scrape the metrics endpoint.
Troubleshooting Steps
Before restarting the application, we should capture a heap dump first.
Since we are using Spring Boot Actuator, we can download it from the /actuator/heapdump endpoint and then analyze it with Eclipse MAT.
Get Heap Dump File
- Reference: Spring Boot Actuator - Heap Dump
- Add the following configuration to application.yml (Spring Boot 3.5.4):
management:
  endpoints:
    web:
      exposure:
        include: health,info,heapdump
  endpoint:
    heapdump:
      enabled: true
- Access the endpoint http://xxx/actuator/heapdump to download the heap dump file.
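The download step can also be scripted. The following is a minimal sketch, not the method used in this post: the class name `HeapDumpDownloader` and its helper methods are hypothetical, and the base URL is whatever host serves the application. It streams the response to disk and checks that the file starts with the HPROF header (`JAVA PROFILE ...`), which helps to catch the case where an error page was saved instead of a dump.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class HeapDumpDownloader {

    // HPROF binary dumps start with this ASCII text, followed by a version such as "1.0.2".
    private static final byte[] HPROF_MAGIC =
            "JAVA PROFILE ".getBytes(StandardCharsets.US_ASCII);

    /** Builds the heapdump URL from a base URL (the base URL is an assumption of the caller). */
    static URI heapDumpUri(String baseUrl) {
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(trimmed + "/actuator/heapdump");
    }

    /** Returns true if the given leading bytes look like an HPROF header. */
    static boolean looksLikeHprof(byte[] head) {
        return head.length >= HPROF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, HPROF_MAGIC.length), HPROF_MAGIC);
    }

    /** Streams the dump straight to the target file instead of buffering it in memory. */
    static Path download(String baseUrl, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(heapDumpUri(baseUrl)).GET().build();
        HttpResponse<Path> response = client.send(request,
                HttpResponse.BodyHandlers.ofFile(target,
                        StandardOpenOption.CREATE,
                        StandardOpenOption.TRUNCATE_EXISTING,
                        StandardOpenOption.WRITE));
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        // Read only the first few bytes to verify the header.
        byte[] head;
        try (var in = Files.newInputStream(target)) {
            head = in.readNBytes(HPROF_MAGIC.length);
        }
        if (!looksLikeHprof(head)) {
            throw new IOException("Downloaded file does not look like an HPROF heap dump");
        }
        return target;
    }
}
```

A call such as `HeapDumpDownloader.download("http://xxx", Path.of("heapdump.hprof"))` would save the file with the .hprof extension directly, so no rename is needed before opening it in Eclipse MAT.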
Analyze Heap Dump File
- Download and install Eclipse MAT.
- On an M2 Mac, you need to edit /Applications/MemoryAnalyzer.app/Contents/Eclipse/MemoryAnalyzer.ini:
### Add your vm arguments here
-vm
/Users/matthew/.sdkman/candidates/java/current/bin
### need to add -vm before -vmargs
-vmargs
--add-exports=java.base/jdk.internal.org.objectweb.asm=ALL-UNNAMED
- Open the heap dump file with Eclipse MAT. (Rename the file to heapdump.hprof first so that Eclipse MAT can recognize it.)
It looks like too many PrometheusDistributionSummary instances were created.
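One common way a registry ends up with this many summary instances is tag cardinality: Micrometer identifies a meter by its name plus its tags, so every distinct tag combination gets its own instance, and with histogram buckets enabled each instance also carries its own bucket counters. The heap dump here does not show which tags were involved, so the sketch below is only back-of-the-envelope arithmetic. The class `SummaryCardinality` and all numbers are hypothetical.

```java
import java.util.List;

// Hypothetical helper class, for illustration only.
final class SummaryCardinality {

    /** Upper bound on instances: the product of the distinct values of each tag. */
    static long meterInstances(List<Integer> distinctValuesPerTag) {
        long instances = 1;
        for (int distinct : distinctValuesPerTag) {
            instances = Math.multiplyExact(instances, (long) distinct);
        }
        return instances;
    }

    /** With histogram buckets enabled, every instance holds its own bucket counters. */
    static long bucketCounters(long instances, int bucketsPerInstance) {
        return Math.multiplyExact(instances, (long) bucketsPerInstance);
    }
}
```

For example, a summary tagged with 200 URIs, 5 status codes, and 50 client IDs could produce up to `meterInstances(List.of(200, 5, 50))` = 50,000 instances. At an assumed 70 buckets each, that is 3,500,000 bucket counters for a single meter name.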
Solution
I removed the distribution configuration for Prometheus metrics from application.yml, and the OOM issue was resolved.
If we still want to use the distribution configuration, we can try to narrow down the scope of the metrics it applies to, for example by limiting it to specific meter names instead of all meters.
- Reference: Actuator metrics