Issue Description

Recently, I encountered an OutOfMemoryError (OOM) in a Spring Boot 3 WebFlux application. The error log is as follows:

    "logger": "reactor.core.scheduler.Schedulers",
    "@version": 1,
    "appVersion": "0.0.0",
    "cloud": "aws-ec2",
    "stacktrace": "java.lang.OutOfMemoryError: Java heap space\n",
    "thread": "boundedElastic-8",
    "level": "ERROR"

Root Cause Analysis

The application was running on Java 21 with a 6 GB heap, but the actual memory usage was not high. Once the OOM occurred, Prometheus could no longer scrape the metrics endpoint.

[Image: webflux-oom-jvm]

Troubleshooting Steps

Before restarting the application, we should try to capture a heap dump, because a restart discards the heap state.

Because we are using Spring Boot Actuator, we can download the heap dump from the /actuator/heapdump endpoint and then analyze it with Eclipse MAT.

Get Heap Dump File

management:
    endpoints:
        web:
            exposure:
                include: health,info,heapdump
    endpoint:
        heapdump:
            enabled: true

  • Access http://xxx/actuator/heapdump to download the heap dump file.

Analyze Heap Dump File

  • Download and install Eclipse MAT.
  • On an M2 Mac, you need to edit /Applications/MemoryAnalyzer.app/Contents/Eclipse/MemoryAnalyzer.ini so that it points to your JDK:
### Add your vm arguments here
-vm
/Users/matthew/.sdkman/candidates/java/current/bin
### need to add -vm before -vmargs
-vmargs
--add-exports=java.base/jdk.internal.org.objectweb.asm=ALL-UNNAMED
  • Open the heap dump file with Eclipse MAT. Rename the downloaded file to heapdump.hprof first so that Eclipse MAT can recognize it.

[Image: webflux-oom-mat]

The heap dump suggests that too many PrometheusDistributionSummary instances were created.

Solution

I removed the distribution configuration for the Prometheus metrics from application.yml.

[Image: webflux-oom-fix]

After that, the OOM issue was resolved.

If we still want to use the distribution configuration, we can try to narrow it down to only the metrics that really need it.

[Image: spring-boot-distribution]
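
As a rough sketch of what narrowing the scope might look like, the fragment below enables percentile histograms for a single meter name prefix instead of for every meter. The meter name http.server.requests and the bounds are assumptions chosen for illustration; this is not the configuration that was removed above. The management.metrics.distribution.* properties are the per-meter properties documented by Spring Boot.

```yaml
management:
    metrics:
        distribution:
            # Publish histogram buckets only for meters whose name starts with
            # http.server.requests (assumed meter; use the ones you actually query).
            percentiles-histogram:
                http.server.requests: true
            # Bounding the expected range reduces the number of buckets per meter.
            # The values below are illustrative assumptions.
            minimum-expected-value:
                http.server.requests: 10ms
            maximum-expected-value:
                http.server.requests: 5s
```

Each unique combination of meter name and tags is registered as its own meter, so it is also worth checking that the tags on the selected meters have a small, bounded set of values.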