Intermittent 413s Behind Spring Cloud Gateway: Anatomy of an h2c Upgrade Trap
The symptom
A production mystery: file uploads through a Spring Cloud Gateway (Server MVC, servlet stack), POST /api/file/upload with multipart/form-data, behaved like this:
- With 1 gateway pod: 413 never happens;
- Scaled to 3 gateway pods: intermittent 413 PAYLOAD_TOO_LARGE, with the gateway logging "logger": "o.s.web.servlet.DispatcherServlet", "message": "Completed 413 PAYLOAD_TOO_LARGE";
- Scaled back to 1 pod: the problem disappears.
The gateway was freshly deployed and its pods homogeneous; the upload backend ran 3 pods, also homogeneous.
The uploaded file was only 3MB — far below the backend’s 10MB multipart limit. So who is rejecting this request? And why does it correlate with the gateway pod count?
TL;DR
No service’s body-size limit was ever hit, and pod homogeneity is irrelevant. It’s three individually reasonable defaults composing into a Tomcat connector-level 413:
| # | Configuration fact | Where it comes from |
|---|---|---|
| 1 | The upload backend has cleartext HTTP/2 (h2c) enabled: SERVER_HTTP2_ENABLED: "true" | An environment ConfigMap injected uniformly into most backend services |
| 2 | The gateway’s downstream HTTP client is the JDK HttpClient (no Apache HC5 / Jetty / Reactor Netty on the classpath), and the JDK HttpClient prefers HTTP/2 by default | Gateway pom + Spring Boot’s classpath-based client detection |
| 3 | For an h2c Upgrade request that carries a body, Tomcat must read and buffer the whole body first (RFC 7230); the buffer is capped by maxSavePostSize, default 4096 bytes, and anything larger gets a 413 and a closed connection | Tomcat Http11Processor / AbstractHttp11Protocol |
Composed behavior:
- The JDK HttpClient attaches an Upgrade: h2c header to the request that opens a new connection, which happens only when the pool has no usable HTTP/2 connection for that origin, regardless of whether that request is a 3MB file upload;
- If that first request’s body is > 4KB → connector-level 413 from Tomcat (before auth/Spring; the backend logs nothing);
- If the first request’s body is ≤ 4KB (a typical GET/small POST) → the upgrade succeeds, the h2 connection enters the pool, and every subsequent request (large uploads included) multiplexes over it just fine;
- Tomcat closes an h2 connection after 20 seconds of idle time (Http2Protocol.DEFAULT_KEEP_ALIVE_TIMEOUT = 20000) → the pool goes cold again → the trap is re-armed.
The animation below walks through the full cold/warm connection cycle (① cold + big upload → 413 → ② cold + small request → upgrade, pooled → ③ warm + big upload → 200 → ④ idle 20s → cold again, on loop):
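The same four steps can be replayed with a toy Java model of one gateway JVM’s pool for a single backend origin. This is a sketch under stated simplifications (instantaneous requests, an idle clock that restarts on every request carried by the pooled connection, a rejected upgrade leaving nothing in the pool), not JDK or Tomcat code; the class name ColdWarmModel is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of ONE gateway JVM talking to ONE backend origin (assumptions, not JDK/Tomcat code):
// requests are instantaneous, the idle clock restarts at every request on the pooled
// connection, and a rejected upgrade leaves nothing in the pool.
final class ColdWarmModel {
    static final int MAX_SAVE_POST_SIZE = 4 * 1024;   // Tomcat default, bytes
    static final long H2_IDLE_TIMEOUT_MS = 20_000;     // Tomcat h2 keep-alive timeout

    private boolean warm;    // is an upgraded h2 connection pooled for this origin?
    private long lastUseMs;  // when the pooled connection last carried a request

    /** Returns "<status> <protocol>" for a request with bodyBytes sent at nowMs. */
    String send(long nowMs, int bodyBytes) {
        if (warm && nowMs - lastUseMs > H2_IDLE_TIMEOUT_MS) {
            warm = false;                  // Tomcat closed the idle h2 connection: pool is cold
        }
        if (warm) {
            lastUseMs = nowMs;             // new h2 stream on the pooled connection, no size cap
            return "200 HTTP/2.0";
        }
        // Cold pool: HTTP/1.1 request + Upgrade: h2c, so Tomcat must buffer the whole body first.
        if (bodyBytes > MAX_SAVE_POST_SIZE) {
            return "413 HTTP/1.1";         // connection closed; the failure is not remembered
        }
        warm = true;                       // upgrade succeeded: h2 connection enters the pool
        lastUseMs = nowMs;
        return "200 HTTP/2.0";
    }

    public static void main(String[] args) {
        ColdWarmModel pool = new ColdWarmModel();
        List<String> results = new ArrayList<>();
        results.add(pool.send(0, 3_000_000));      // (1) cold + big upload
        results.add(pool.send(1_000, 2_048));      // (2) cold + small request -> upgrade
        results.add(pool.send(2_000, 3_000_000));  // (3) warm + big upload
        results.add(pool.send(47_000, 3_000_000)); // (4) 45 s idle -> cold again
        System.out.println(results); // [413 HTTP/1.1, 200 HTTP/2.0, 200 HTTP/2.0, 413 HTTP/1.1]
    }
}
```

The model deliberately has no memory of failed upgrades, mirroring the point that a rejected h2c upgrade is never cached, so every cold start re-arms the trap.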
Why the gateway pod count matters: the pool is per JVM (per pod) and keeps only one h2 connection per backend origin, so the system-wide connection count equals the gateway pod count. One pod concentrates all traffic on one always-warm connection; three pods split it three ways, so each connection cools past the 20s window far more often → intermittent 413. The mechanism (and a Poisson estimate) is covered in “Why pod count acts as an amplifier” below.
Root cause: how a 413 grows out of a connection pool
Version mapping: the production gateway runs Spring Boot 3.5.6 (which manages spring-web 6.2.11 and tomcat-embed-core 10.1.46) with spring-cloud-gateway-server-mvc 4.3.x; the demo was verified on Boot 3.5.16 (spring-web 6.2.15, Tomcat 10.1.55) and JDK 25 (java.net.http). The relevant code paths are identical across both version sets (key lines shift by at most one), and every source link below is pinned to the production version tags, with line anchors you can open side by side.
Why the gateway uses the JDK HttpClient, and why it prefers HTTP/2
The client selection itself is Spring Boot’s job: ClientHttpRequestFactoryBuilder.detect() (Boot 3.5.6, L216-L231) picks the first implementation present on the classpath:
1. Apache HttpClient 5 (httpComponents)
2. Jetty (jetty)
3. Reactor Netty (reactor)
4. JDK HttpClient (jdk)
5. simple (last resort)
SCG-MVC’s own GatewayHttpClientEnvironmentPostProcessor (L217-L260) probes the same classes, but only to apply JDK-specific gateway setup when the JDK client is the one selected — it is not the selector.
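The “first one present wins” rule can be sketched in plain Java. This is an illustration, not Spring Boot’s code: the class ClientDetectionSketch is hypothetical, and the probe class names are assumptions standing in for whatever Boot actually checks.

```java
// Illustration of classpath-order selection; NOT Spring Boot's implementation.
// The probe class names are assumptions chosen to represent each client library.
final class ClientDetectionSketch {
    private static final String[][] PROBES = {
        {"httpComponents", "org.apache.hc.client5.http.impl.classic.HttpClients"},
        {"jetty",          "org.eclipse.jetty.client.HttpClient"},
        {"reactor",        "reactor.netty.http.client.HttpClient"},
        {"jdk",            "java.net.http.HttpClient"},
    };

    /** Returns the name of the first client whose probe class is loadable. */
    static String detect(ClassLoader loader) {
        for (String[] probe : PROBES) {
            if (isPresent(probe[1], loader)) {
                return probe[0];
            }
        }
        return "simple"; // last resort
    }

    private static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader); // load without initializing
            return true;
        } catch (ClassNotFoundException | LinkageError ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        // With no HC5 / Jetty / Reactor Netty jars on the classpath this prints "jdk".
        System.out.println(detect(ClientDetectionSketch.class.getClassLoader()));
    }
}
```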
The gateway’s pom has none of the first three → JDK it is. On the Spring side (JdkClientHttpRequestFactory L51-L53):
```java
public JdkClientHttpRequestFactory() {
    this(HttpClient.newHttpClient());
}
```
And the Javadoc contract of HttpClient.newHttpClient() is explicit (HttpClient.java L178-L182):
The default settings include: the “GET” request method, a preference of HTTP/2, a redirection policy of NEVER, …
In other words: nobody ever chose HTTP/2 explicitly. It’s three layers of defaults inherited all the way down.
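The inherited default can be observed directly: a client from HttpClient.newHttpClient(), the same call the no-arg JdkClientHttpRequestFactory constructor makes, reports HTTP_2 as its preferred version although no code ever asked for it. A minimal check (the class name DefaultVersionCheck is hypothetical):

```java
import java.net.http.HttpClient;

final class DefaultVersionCheck {
    // Same call the no-arg JdkClientHttpRequestFactory constructor makes.
    static HttpClient.Version defaultVersion() {
        return HttpClient.newHttpClient().version();
    }

    public static void main(String[] args) {
        System.out.println(defaultVersion()); // HTTP_2: the documented default preference
    }
}
```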
The JDK HttpClient pool: the upgrade decision happens at the moment of a pool miss
The HTTP/2 connection pool itself (Http2ClientImpl L69) — one per HttpClient instance (= per gateway JVM), one connection per origin:
```java
private final Map<String,Http2Connection> connections = new ConcurrentHashMap<>();
```
The entry point getConnectionFor(...) (its header comment spells out the four possible outcomes, see L85-L100); on a pool miss over cleartext http (L130-L135):
```java
if (!req.secure() || failures.contains(key)) {
    // secure: negotiate failed before. Use http/1.1
    // !secure: no connection available in cache. Attempt upgrade
    if (debug.on()) debug.log("not found in connection pool");
    return MinimalFuture.completedFuture(null); // ← null = let the caller try an upgrade
}
```
Two crucial details:
- The failures negative cache only records ALPN (https) failures (L154). A server rejecting an h2c upgrade (say, with a 413) is never remembered: the next cold connection will try exactly the same thing again. That’s why the problem keeps recurring instead of happening once.
- Only a successful upgrade puts the h2 connection into the pool (offerConnection(conn)); a server GOAWAY/close removes it (removeFromPool) → back to cold.
What the caller does with that null (ExchangeImpl.createExchangeImpl(...) L154-L169):
```java
if (c == null) {
    // no existing connection. Send request with HTTP 1 and then
    // upgrade if successful
    return createHttp1Exchange(exchange, connection)
            .thenApply((e) -> {
                exchange.h2Upgrade(); // ← mark THIS request for the upgrade attempt
                return e;
            });
} else {
    Stream<U> s = c.createStream(exchange); // ← warm connection: open an h2 stream directly
    ...
}
```
Exchange.h2Upgrade() (L334-L338) then adds to the request:
```http
Connection: Upgrade, HTTP2-Settings
Upgrade: h2c
HTTP2-Settings: <base64>
```
Note that there is no check anywhere for “does this request carry a large body” — the upgrade request is the business request itself. If the first request to hit a cold connection is a 3MB multipart upload, those 3MB get sent as the body of the upgrade request.
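The <base64> placeholder in the HTTP2-Settings header is defined by the h2c upgrade spec (RFC 7540 §3.2.1): the payload of an HTTP/2 SETTINGS frame, six bytes per setting (16-bit identifier, 32-bit value), base64url-encoded without padding. The sketch below builds such a value; the setting used is illustrative and is not necessarily what the JDK client sends, and Http2SettingsHeader is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Encodes a SETTINGS payload the way RFC 7540 §3.2.1 describes for the HTTP2-Settings header.
final class Http2SettingsHeader {
    static String encode(Map<Integer, Long> settings) {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        for (Map.Entry<Integer, Long> s : settings.entrySet()) {
            int id = s.getKey();
            long value = s.getValue();
            payload.write((id >>> 8) & 0xFF);             // 16-bit identifier, big-endian
            payload.write(id & 0xFF);
            for (int shift = 24; shift >= 0; shift -= 8) { // 32-bit value, big-endian
                payload.write((int) ((value >>> shift) & 0xFF));
            }
        }
        // base64url alphabet, trailing '=' padding omitted
        return Base64.getUrlEncoder().withoutPadding().encodeToString(payload.toByteArray());
    }

    public static void main(String[] args) {
        Map<Integer, Long> settings = new LinkedHashMap<>();
        settings.put(0x3, 100L); // SETTINGS_MAX_CONCURRENT_STREAMS = 100 (illustrative)
        System.out.println("HTTP2-Settings: " + encode(settings)); // HTTP2-Settings: AAMAAABk
    }
}
```

The header only advertises connection settings; it says nothing about the request body, which is why a 3MB upload can ride along on the very same upgrade request.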
The Tomcat side: an upgrade request’s body must be fully buffered, and over 4KB means 413
Http11Processor.service() (Tomcat 10.1.46, L332-L348):
```java
// Has an upgrade been requested?
if (isConnectionToken(request.getMimeHeaders(), "upgrade")) {
    String requestedProtocol = request.getHeader("Upgrade");
    UpgradeProtocol upgradeProtocol = protocol.getUpgradeProtocol(requestedProtocol);
    if (upgradeProtocol != null) {
        if (upgradeProtocol.accept(request)) {
            Request upgradeRequest = null;
            try {
                upgradeRequest = cloneRequest(request);
            } catch (ByteChunk.BufferOverflowException ioe) {
                response.setStatus(HttpServletResponse.SC_REQUEST_ENTITY_TOO_LARGE); // ← 413
                setErrorState(ErrorState.CLOSE_CLEAN, null); // ← and close
            } ...
```
Why the body must be cloned/buffered (cloneRequest() L517-L530):
```java
// Need to read and buffer the request body, if any. RFC 7230 requires
// that the request is fully read before the upgrade takes place.
ByteChunk body = new ByteChunk();
int maxSavePostSize = protocol.getMaxSavePostSize();
if (maxSavePostSize != 0) {
    body.setLimit(maxSavePostSize); // ← buffer cap
    ...
    while (source.getInputBuffer().doRead(buffer) >= 0) {
        body.append(buffer.getByteBuffer()); // ← overflow throws BufferOverflowException
    }
}
```
The cap’s default (AbstractHttp11Protocol L248):
```java
private int maxSavePostSize = 4 * 1024; // 4096 bytes
```
The semantics are sound: after a successful upgrade the connection switches to HTTP/2 and the original request has to be “replayed” on h2 stream 1, so the body must be buffered in full first. Tomcat refuses to buffer arbitrarily large bodies for that (it would be an OOM attack surface), so past the cap → BufferOverflowException → 413 + connection closed. This happens before servlet dispatch — auth filters, Spring MVC, business code: none of it ever runs — which is why the backend side is completely silent.
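The buffering rule condenses into a few lines. A sketch only, assuming the default cap: Tomcat documents special values (0 disables saving, -1 removes the limit) that are not modeled here, UpgradeBodyBuffer is a hypothetical name, and the returned 101/413 stand in for Tomcat switching protocols or throwing BufferOverflowException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Sketch of "buffer the whole body before upgrading, but never more than maxSavePostSize".
final class UpgradeBodyBuffer {
    static final int DEFAULT_MAX_SAVE_POST_SIZE = 4 * 1024;

    /** Returns 101 if the body fits the buffer (upgrade proceeds), 413 if it overflows. */
    static int bufferForUpgrade(InputStream body, int maxSavePostSize) {
        ByteArrayOutputStream saved = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = body.read(chunk)) != -1) {
                if (saved.size() + n > maxSavePostSize) {
                    return 413; // overflow: Tomcat answers 413 and closes the connection
                }
                saved.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return 101; // fully buffered: switch protocols, replay the request on h2 stream 1
    }

    public static void main(String[] args) {
        System.out.println(bufferForUpgrade(new ByteArrayInputStream(new byte[2_048]), DEFAULT_MAX_SAVE_POST_SIZE));     // 101
        System.out.println(bufferForUpgrade(new ByteArrayInputStream(new byte[3_000_000]), DEFAULT_MAX_SAVE_POST_SIZE)); // 413
    }
}
```

Raising the cap only moves the boundary: everything below it is held in memory for the replay, which is exactly the trade-off the Tomcat default guards against.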
The clock behind “intermittent”: Tomcat closes idle h2 connections after 20 seconds
```java
// Http2Protocol.java L49
static final long DEFAULT_KEEP_ALIVE_TIMEOUT = 20000;
```
(Http2Protocol.java L49 / L77)
This 20s timer is what turns a static misconfiguration into an intermittent failure: the cold/warm cycle shown in the TL;DR animation replays every time a connection sits idle past it.
Why pod count acts as an amplifier
First, let’s correct an intuitive-but-wrong connection-count model: the number of connections is not “gateway pods × backend pods”. The JDK client’s h2 pool stores connections keyed by origin (host:port — i.e. the backend Service’s VIP), and keeps exactly one per origin (the Map<String,Http2Connection> above; the key comes from Http2Connection.keyFor, called at L102). How many pods the backend has is completely invisible to the client — kube-proxy pins the connection to one backend pod at TCP connect time (which is why servedBy stays constant in the demo’s output). So gateway 1 pod vs 3 pods means 1 connection vs 3 connections, independent of backend pod count.
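To see why backend pod count never enters the picture, consider an illustrative key function that keeps only the scheme, host and port of the request URI. The JDK’s real key comes from Http2Connection.keyFor and its exact format is not reproduced here; OriginKey and the Service hostname are hypothetical.

```java
import java.net.URI;

// Illustrative pool key: scheme + host + port. Paths never enter it, and neither does
// the backend pod that kube-proxy picks behind the Service VIP at TCP connect time.
final class OriginKey {
    static String keyFor(URI uri) {
        int port = uri.getPort() != -1 ? uri.getPort()
                : ("https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80); // default ports
        return uri.getScheme().toLowerCase() + "://" + uri.getHost() + ":" + port;
    }

    public static void main(String[] args) {
        URI upload = URI.create("http://upload-svc.prod.svc.cluster.local/api/file/upload");
        URI health = URI.create("http://upload-svc.prod.svc.cluster.local:80/actuator/health");
        // true: both requests share one pooled h2 connection per gateway JVM
        System.out.println(keyFor(upload).equals(keyFor(health)));
    }
}
```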
Two factors actually drive the failure rate:
- Traffic dilution: a fixed total load λ split across N pools stretches the gaps between requests on each connection. Under a Poisson model, the probability that a gap exceeds Tomcat’s 20s idle timeout is $P(\text{cold}) \approx e^{-(\lambda/N)\cdot 20\,\mathrm{s}}$, which climbs steeply as N grows. Example: at 0.2 req/s total for the origin, 1 pod → $e^{-4} \approx 1.8\%$; 3 pods → $e^{-1.33} \approx 26\%$, over an order of magnitude apart.
- Warmth is not shared across pods (the more fundamental factor): with 1 pod, anyone’s small request keeps the single connection alive, so big uploads always ride along; with 3 pods, warming up pod A does nothing for an upload routed to pod B.
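The traffic-dilution estimate is easy to reproduce. A minimal sketch, assuming Poisson arrivals (so idle gaps are exponentially distributed) and that every request refreshes the idle clock; it does not model the cross-pod warmth effect, and ColdProbability is a hypothetical name.

```java
import java.util.Locale;

// Poisson-arrival estimate: P(cold) ≈ exp(-(λ/N)·T), T = Tomcat's 20 s h2 idle timeout.
final class ColdProbability {
    static final double IDLE_TIMEOUT_S = 20.0;

    static double pCold(double totalRatePerSec, int gatewayPods) {
        double perPodRate = totalRatePerSec / gatewayPods; // traffic dilution: λ / N
        return Math.exp(-perPodRate * IDLE_TIMEOUT_S);     // P(exponential gap > T)
    }

    public static void main(String[] args) {
        for (int pods : new int[] {1, 3}) {
            System.out.println(String.format(Locale.ROOT, "%d pod(s): P(cold) = %.1f%%",
                    pods, 100 * pCold(0.2, pods)));
        }
        // 1 pod(s): P(cold) = 1.8%
        // 3 pod(s): P(cold) = 26.4%
    }
}
```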
Measured demonstration from the demo (3 backend pods fixed; gateway fully cold, then 1 warm-up 2KB request followed by 12 consecutive 3MB uploads):
| Gateway | Warm-up | 12 uploads |
|---|---|---|
| 3 pods | 200 (warmed one pod) | 1×200 / 11×413 (the only 200 shares servedBy with the warm-up) |
| 1 pod | 200 | 12×200 (servedBy identical throughout = one connection) |
Here is a timeline replay comparing how a single connection heats and cools in the two topologies (the playhead sweeps 75 seconds of traffic):
Also note the blast radius: it’s not just file uploads — any request through the gateway with a body > 4KB (large JSON POSTs included) that lands on a cold connection gets a 413. Symptoms concentrate on uploads simply because uploads are reliably large.
How to troubleshoot this class of connection-layer problems
In hindsight, this incident had several classic misdirections — each maps to a reusable troubleshooting lesson.
1. Establish who logged the line, and at which layer
DispatcherServlet: Completed 413 PAYLOAD_TOO_LARGE shows up in the gateway’s log, so the first instinct is “the gateway’s own size limit fired”. But when a servlet gateway proxies back a downstream 413, its own DispatcherServlet logs the exact same line. The same log line can mean “originated here” or “forwarded from downstream” — the line alone cannot distinguish them.
The way to tell is to chase evidence downstream: does the backend have a matching request log? In this case the backend had zero logs — itself the strongest clue: the request was rejected before it ever reached the servlet layer, pointing the suspicion at the connector layer or below.
2. Look at what the 413 response body actually is
413s minted at different layers have different bodies:
- Tomcat connector layer → Tomcat’s HTML error page (HTTP Status 413 – …);
- Spring layer (MaxUploadSizeExceededException etc.) → usually a JSON error structure or the app’s uniform error format.
Get the response body and you basically know which layer produced the 413.
3. Run a minimal controlled experiment with curl
Once the protocol-upgrade path is a suspect, curl can shrink the variables down to exactly one. curl --http2 against a cleartext URL behaves precisely like the JDK HttpClient on a cold connection: it sends an HTTP/1.1 request with Upgrade: h2c. Hitting the backend directly, bypassing the gateway:
```bash
head -c 3000000 /dev/urandom > /tmp/big.bin

# Reproduce what the gateway's JDK client does on a cold connection (h2c upgrade attempt):
curl -s -o /tmp/r -w '%{http_code} %{http_version}\n' --http2 \
  -F module=DEPOSIT -F 'file=@/tmp/big.bin;type=image/png' \
  http://<upload-backend-svc>.<namespace>.svc.cluster.local/api/file/upload
# expected: 413 1.1, /tmp/r is Tomcat's HTML error page → root cause confirmed

# Control group (no upgrade):
curl -s -o /dev/null -w '%{http_code}\n' --http1.1 \
  -F module=DEPOSIT -F 'file=@/tmp/big.bin;type=image/png' \
  http://<upload-backend-svc>.<namespace>.svc.cluster.local/api/file/upload
# expected: 401/400 (it reached auth/business logic), NOT 413
```
Same file, same backend, the only difference being one h2c upgrade attempt: 401/400 (or 200 on a backend without auth, as in the demo) vs 413 nails the root cause to the upgrade path. Bonus: the connector rejects before authentication, so this confirmation needs no token.
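A rough JDK-client counterpart of the curl --http2 experiment follows, for probing from Java. It is a sketch: ColdUpgradeProbe is a hypothetical name, the body is raw bytes rather than multipart (the connector rejects before any controller parses it, so the content type should not matter for the 413), and the expected result in the comment is what the analysis predicts, not a captured run.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// A fresh HttpClient has an empty pool, so its first cleartext request should go out as
// HTTP/1.1 + Upgrade: h2c, just like curl --http2 against an http:// URL.
final class ColdUpgradeProbe {
    static HttpRequest bigPost(URI target, int bodyBytes) {
        return HttpRequest.newBuilder(target)
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(new byte[bodyBytes]))
                .build(); // no per-request version: the client's HTTP/2 preference applies
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: ColdUpgradeProbe http://<upload-backend-svc>.<namespace>.svc.cluster.local/api/file/upload");
            return;
        }
        HttpClient client = HttpClient.newHttpClient(); // cold pool, prefers HTTP/2
        HttpResponse<String> r = client.send(bigPost(URI.create(args[0]), 3_000_000),
                HttpResponse.BodyHandlers.ofString());
        // Expected against an h2c-enabled Tomcat: 413 over HTTP_1_1, body = Tomcat HTML error page
        System.out.println(r.statusCode() + " " + r.version());
    }
}
```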
4. Find out which HTTP client the gateway actually uses downstream
Spring Boot detects the client factory by classpath, and the “fallback” JDK HttpClient behaves nothing like Apache HC5 (HTTP/2 preference, one h2 connection per origin, upgrade attempts on cold connections). Ways to confirm:
- Check the gateway’s dependency tree: mvn dependency:tree | grep -E 'httpclient5|jetty-client|reactor-netty';
- Or via actuator: look up the concrete ClientHttpRequestFactory type in /actuator/beans (JdkClientHttpRequestFactory vs HttpComponentsClientHttpRequestFactory);
- For finer-grained visibility, enable the JDK HttpClient’s debug log with -Djdk.internal.httpclient.debug=true; you can watch the decision points themselves, such as “not found in connection pool” and “new Http1Exchange, try to upgrade”.
5. For “intermittent” problems, look for the clock first
Intermittent ≠ random. Behind an intermittent connection-layer failure there is almost always a timeout/clock flipping some state — here, Tomcat’s 20s h2 idle timeout. Make “time since the previous request” an experiment variable (fire back-to-back vs after 45s idle); if the outcome flips with the gap, you’re on the right trail.
How the demo reproduces and verifies it
Theory alone isn’t enough — especially a “three defaults composing” conclusion, which can easily sound like hand-waving. The demo repo’s k8s-repro/ builds a deterministic reproduction in a local kind cluster and turns every claim of the theory into an automated assertion.
Reproduction topology
Two minimal stand-in apps mirror the production setup:
- backend413: Spring Boot Web with server.http2.enabled=true (mirroring the SERVER_HTTP2_ENABLED injected by the production ConfigMap), a 10MB multipart limit (mirroring the upload service’s intentional business limit), echoing servedBy (pod name) and protocol (actual protocol) in each response; deployed as 3 identical pods;
- gateway413: spring-cloud-starter-gateway-server-webmvc with no HTTP client dependency in the pom (mirroring the production gateway → JDK HttpClient fallback).
Worth highlighting is the negative evidence from earlier attempts: with a stand-in backend that did not enable h2c, no amount of pod-count tweaking or artificial pod heterogeneity could reproduce the 413. Adding that single flag — server.http2.enabled=true, verbatim from the production ConfigMap — flipped cold-connection 3MB uploads from 200 to 413 across the board. Single-variable isolation beats any amount of reasoning.
Automated verification: 9 assertions (verify-h2c-413.sh)
The script runs the cold/warm sequence, the direct-to-backend controls, and the fix control group in kind, with PASS/FAIL asserts:
```text
T1 [PASS] cold gateway pod, first request = 3MB upload -> 413
T2 [PASS] 2KB upload (≤ maxSavePostSize=4096) -> 200 over HTTP/2.0
T3 [PASS] 3MB right after (warm h2 conn, multiplexed) -> 200 over HTTP/2.0
T4 [PASS] 3MB after 45s idle (> Tomcat h2 timeout 20s) -> 413
T5a [PASS] direct to backend, plain HTTP/1.1, 3MB -> 200
T5b [PASS] direct + h2c upgrade + 3MB -> 413
T5b2 [PASS] ...and the 413 body is Tomcat's HTML error page (connector level)
T5c [PASS] direct + h2c upgrade + 2KB -> 200 over HTTP/2
T6 [PASS] fix control group: same gateway with Apache HC5
   (mvn -Phc5), cold conn + 3MB -> 200 over HTTP/1.1
```
Each assertion pins down one link in the causal chain:
- T1 vs T2/T3: same cold pod, only the body size differs. The 4KB threshold is real, and big uploads on a warm connection are perfectly fine (proving it was never “the backend can’t take large files”);
- T4: after 45s idle the very same upload turns into a 413 again, so the 20s idle timeout is the clock behind the intermittency;
- T5a vs T5b: direct to the backend, same file, the only difference is --http1.1 vs --http2, and the result is 200 vs 413. The gateway is innocent; the root cause lives in the upgrade path;
- T5b2: the 413 body is Tomcat’s HTML error page, confirming the connector layer, not Spring;
- T6: the fix control group. Identical gateway code, one extra profile in the pom:

```xml
<profile>
  <id>hc5</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.httpcomponents.client5</groupId>
      <artifactId>httpclient5</artifactId>
    </dependency>
  </dependencies>
</profile>
```

Spring Boot’s detection now prefers HC5 (whose classic mode speaks plain HTTP/1.1 and never attempts an h2c upgrade); a cold-connection 3MB upload → 200. The fix works, and it proves client selection really is a link in the chain.
Two engineering details in the script worth stealing:
- How to manufacture a “cold connection”: kubectl rollout restart deployment/gateway413. A fresh JVM’s pool is empty by construction, far more deterministic than waiting out the 20s timeout;
- How to observe the protocol: the backend echoes request.getProtocol() (HTTP/1.1 vs HTTP/2.0) plus its hostname in the response JSON; the loadtester parses out http_code|servedBy|protocol so every assertion checks both the status code and the actual protocol. This incidentally also demonstrates that kube-proxy balances per TCP connection (requests on one h2 connection have a constant servedBy).
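An assertion that checks status and protocol together might parse those lines as sketched below. The parser is hypothetical (ProbeLine, Result and the sample pod name are not from the demo); it only assumes the http_code|servedBy|protocol shape, with empty fields when a Tomcat HTML error page carries no JSON.

```java
// Hypothetical parser for "http_code|servedBy|protocol" lines plus a status+protocol check.
final class ProbeLine {
    record Result(int httpCode, String servedBy, String protocol) {}

    static Result parse(String line) {
        String[] parts = line.trim().split("\\|", -1); // -1 keeps empty trailing fields
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected http_code|servedBy|protocol: " + line);
        }
        return new Result(Integer.parseInt(parts[0]), parts[1], parts[2]);
    }

    /** expectedProtocol == null means "don't care" (e.g. a 413 HTML page has no protocol field). */
    static boolean matches(Result r, int expectedCode, String expectedProtocol) {
        return r.httpCode() == expectedCode
                && (expectedProtocol == null || expectedProtocol.equals(r.protocol()));
    }

    public static void main(String[] args) {
        Result warm = parse("200|backend413-pod-a|HTTP/2.0");
        System.out.println(matches(warm, 200, "HTTP/2.0")); // true
    }
}
```

Checking the protocol as well as the status is what separates “200 because the upgrade happened” from “200 over plain HTTP/1.1”, the distinction between T3 and T6.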
Trigger conditions and the fix
| Condition | Value | Removing it defuses the trap? |
|---|---|---|
| Downstream Tomcat has h2c on | SERVER_HTTP2_ENABLED=true | ✔ (recommended) |
| Gateway client prefers HTTP/2 and attempts h2c upgrades | JDK HttpClient default | ✔ (recommended: add HC5) |
| Upgrade request body > maxSavePostSize | default 4096B; every upload exceeds it | ✘ (raising it only moves the cliff and adds in-memory buffering; not advised) |
| Request lands on a cold connection | pool idle >20s / fresh pod / connection closed | can’t be eliminated, only made less likely |
Fix recommendations (either one alone is a complete fix; doing both is belt-and-braces):
- Add the Apache HttpClient 5 dependency to the gateway (verified by T6). It’s also the more conventional production-grade pool implementation (tunable timeouts/pool parameters). One caveat: classpath detection only kicks in when spring.http.client.factory is not set. If your config pins it (e.g. spring.http.client.factory: jdk), adding the dependency alone changes nothing; remove the property or set it to http-components. Alternative: register a JdkClientHttpRequestFactory built with HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build().
- Remove SERVER_HTTP2_ENABLED: "true" from the backends the gateway proxies to. The benefit of h2c on an in-cluster hop (single-connection multiplexing) is not worth this trap; the ingress layer (nginx/ALB) talks HTTP/1.1 to the gateway anyway.
- Not recommended: raising server.tomcat.max-save-post-size. Tomcat would buffer the entire upload body in memory for the replay, which is a self-built OOM risk, and it merely relocates the threshold.
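For the JDK-only alternative in the first recommendation, the essential piece is a client pinned to HTTP/1.1. A sketch: Http11GatewayClient is a hypothetical name, the Spring wiring is shown only as a comment because it needs spring-web on the classpath, and whether SCG-MVC picks up a user-defined factory bean in your version is an assumption to verify.

```java
import java.net.http.HttpClient;

// A JDK client that never attempts an h2c upgrade, because it never prefers HTTP/2.
final class Http11GatewayClient {
    static HttpClient http11Client() {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1) // opt out of the inherited HTTP/2 preference
                .build();
    }

    // Hypothetical Spring wiring (requires spring-web; verify against your SCG-MVC auto-config):
    // @Bean
    // ClientHttpRequestFactory gatewayRequestFactory() {
    //     return new JdkClientHttpRequestFactory(Http11GatewayClient.http11Client());
    // }
}
```

Pinning the version keeps the JDK client but gives up in-cluster multiplexing; adding HC5 reaches the same protocol outcome while also exposing pool and timeout tuning.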
Takeaways
- Not a single component had a bug: ops enabled h2c uniformly, the JDK HttpClient’s HTTP/2 preference is documented Javadoc behavior, and maxSavePostSize=4KB is Tomcat’s reasonable OOM protection. Three reasonable defaults composed into one unreasonable production incident;
- The key moves for connection-layer troubleshooting: identify which layer minted a log line (original vs proxied), read the error response body’s “birthplace”, run single-variable curl control experiments, and hunt for the clock behind anything “intermittent”;
- Reproduction should aim for determinism and single variables: nine automated assertions in kind, each nailing one link of the causal chain (threshold, clock, protocol path, fix), are far more convincing than “try it a few more times in the test environment”.
References
- Demo & verification script: spring-cloud-gateway-fileupload-demo (k8s-repro/verify-h2c-413.sh, docs/gateway-413-h2c-upgrade.md)
- RFC 7230 §6.7 Upgrade: the spec Tomcat’s source cites; it has since been obsoleted by RFC 9110 §7.8 / RFC 9112
- Tomcat HTTP Connector: maxSavePostSize
- JDK HttpClient Javadoc (HTTP/2 preferred by default)
Source links (pinned to the versions referenced)
Tomcat (tag 10.1.46, matching Boot 3.5.6’s tomcat-embed-core; the demo’s 10.1.55 differs by at most one line):
- Http11Processor.service() upgrade branch + 413 (L332-L348)
- Http11Processor.cloneRequest() body buffering (L517-L530)
- AbstractHttp11Protocol.maxSavePostSize = 4 * 1024 (L248)
- Http2Protocol.DEFAULT_KEEP_ALIVE_TIMEOUT = 20000 (L49)
JDK (openjdk/jdk, tag jdk-25-ga):
- HttpClient Javadoc “a preference of HTTP/2” (L178-L182)
- Http2ClientImpl pool map (L69), failures negative cache (L72)
- Http2Connection.keyFor makes the pool key = origin (call site, L102)
- Http2ClientImpl.getConnectionFor() four-outcomes comment (L85-L100), pool-miss branch (L130-L135)
- ExchangeImpl.createExchangeImpl() cold/warm branch (L154-L169)
- Exchange.h2Upgrade() (L334-L338)
Spring (spring-boot v3.5.6, spring-framework v6.2.11, spring-cloud-gateway v4.3.0):
- ClientHttpRequestFactoryBuilder.detect(): classpath-order client factory selection (L216-L231)
- JdkClientHttpRequestFactory no-arg constructor (L51-L53)
- GatewayServerMvcAutoConfiguration$GatewayHttpClientEnvironmentPostProcessor: gateway-specific JDK-client setup (restricted Host header, redirect default) (L217-L260)