TL;DR
A brief error during nginx restart caused Cloudflare to mark the origin as unhealthy and stop forwarding requests, returning 502 itself. The key clues: hitting the origin directly from localhost returns 200, and nginx access logs are completely empty. Just wait for Cloudflare to automatically re-check the origin — it recovers on its own.
Context
After adjusting nginx upstream configuration and restarting nginx, several subdomains started returning 502 consistently. Not the “first request 502, then 200” pattern — all requests returned 502.
Affected subdomains: app.daodao.so, app-dev.daodao.so, app-feat.daodao.so
Working fine: server.daodao.so, ai-dev.daodao.so
The Problem
for i in {1..5}; do
curl -s -o /dev/null -w "%{http_code}\n" https://app.daodao.so
done
502
502
502
502
502
Investigation
Containers are healthy:
docker ps | grep prod_product
# Up 2 weeks (healthy)
Hitting the upstream directly from inside the nginx container returns 200:
docker exec nginx curl -s -o /dev/null -w "%{http_code}" http://prod_product:3001
# 200
Hitting via VPS localhost also returns 200:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost -H "Host: app.daodao.so"
# 200
nginx error log is empty. nginx access log is also empty.
This is the critical clue — requests never reached nginx at all.
Inspecting response headers:
curl -si https://app.daodao.so | head -20
HTTP/2 502
content-type: text/plain; charset=UTF-8
content-length: 15
server: cloudflare
cf-ray: 9dcb32d71dc4b486-SIN
error code: 502
A few things stand out:
- server: cloudflare, not nginx
- content-type: text/plain (nginx’s 502 page is HTML)
- No cf-cache-status header
- Body is error code: 502 (exactly 15 characters)
This is not a response from nginx, and it’s not a Cloudflare cache hit — it’s an error page generated by Cloudflare itself.
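The signals above can be combined into a small check. This is only a sketch: looks_like_cloudflare_error is a hypothetical helper name, and the heuristic (server header plus the short plain-text body) is taken from the response observed in this incident, not from any documented Cloudflare contract.

```shell
# Hypothetical helper: reads a full `curl -si` response (headers + body) on stdin
# and succeeds only if it looks like the Cloudflare-generated error seen above.
looks_like_cloudflare_error() {
  # Capture the whole response and drop carriage returns from the header lines.
  response=$(tr -d '\r')

  # Signal 1: the Server header says cloudflare (matched case-insensitively).
  printf '%s\n' "$response" | grep -qi '^server: *cloudflare' || return 1

  # Signal 2: the body is the short plain-text error, e.g. "error code: 502".
  printf '%s\n' "$response" | grep -Eq '^error code: 5[0-9][0-9]$' || return 1

  return 0
}

# Usage (makes a real request, so it is left as a comment):
#   curl -si https://app.daodao.so | looks_like_cloudflare_error && echo "generated by Cloudflare"
```

Checking the body as well as the header is deliberate: the server header alone may not distinguish a Cloudflare-generated error from an origin error relayed through Cloudflare.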
Root Cause
nginx actually restarted twice in this incident:
- First restart: The new config had an issue (missing zone directive in the upstream block), causing nginx to fail to start.
- Second restart: After fixing the config, nginx started successfully, but during early startup, upstream DNS hadn’t resolved yet, so it briefly returned 502.
Cloudflare detected consecutive errors from the origin and triggered its origin health check mechanism, temporarily stopping all request forwarding to the origin and returning 502 directly from Cloudflare itself.
Why were only some subdomains affected? Because Cloudflare tracks origin health per hostname. The subdomains that happened to receive requests during the brief window when nginx was misbehaving got flagged as unhealthy.
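To see at a glance which hostnames are affected, a loop over the subdomains can print one status code per host. A minimal sketch, assuming each hostname is served over HTTPS at its root path; check_hosts is a hypothetical helper name.

```shell
# Hypothetical helper: print "<host> <status code>" for every hostname given.
check_hosts() {
  for host in "$@"; do
    # -s silences progress, -o discards the body, -w prints only the status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' "https://$host")
    printf '%s %s\n' "$host" "$code"
  done
}

# Usage (makes real requests, so it is left as a comment):
#   check_hosts app.daodao.so app-dev.daodao.so app-feat.daodao.so server.daodao.so ai-dev.daodao.so
```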
Resolution
Wait for Cloudflare to automatically re-check the origin. Once it confirms the origin is healthy, it resumes forwarding. In practice, recovery happens within a few minutes with no action required.
No manual steps needed, but here’s a quick diagnostic checklist for next time:
# Confirm it's a Cloudflare issue, not nginx
curl -si https://your-domain.so | grep "server:"
# server: cloudflare → Cloudflare is generating the response
# Confirm the origin itself is healthy
curl -s -o /dev/null -w "%{http_code}\n" http://localhost -H "Host: your-domain.so"
# 200 → Origin is fine, just wait for CF to re-check
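While waiting for recovery, polling the public URL saves re-running curl by hand. The following is a sketch under assumptions: wait_for_200 is a hypothetical helper, and the default interval and retry count are arbitrary values, not anything Cloudflare documents.

```shell
# Hypothetical helper: wait_for_200 URL [INTERVAL_SECONDS] [MAX_TRIES]
# Polls URL until it returns 200, or gives up after MAX_TRIES attempts.
wait_for_200() {
  url=$1
  interval=${2:-10}    # seconds between attempts (arbitrary default)
  max_tries=${3:-30}   # attempts before giving up (arbitrary default)
  try=1
  while [ "$try" -le "$max_tries" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    echo "attempt $try: $code"
    if [ "$code" = "200" ]; then
      return 0
    fi
    try=$((try + 1))
    sleep "$interval"
  done
  return 1
}

# Usage (makes real requests, so it is left as a comment):
#   wait_for_200 https://your-domain.so 10 30 && echo "recovered"
```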
Prevention
Use reload instead of restart for nginx whenever possible. Reload is zero-downtime and doesn’t reset shared memory state, which avoids triggering Cloudflare’s origin health check:
docker exec nginx nginx -s reload
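It also helps to validate the config before reloading. nginx -t parses the configuration and reports errors without touching the running workers, so it would likely have caught the config error behind the first failed restart. A minimal sketch, assuming the container is named nginx as in the commands above; safe_reload is a hypothetical helper name.

```shell
# Hypothetical helper: safe_reload [CONTAINER]
# Reloads nginx only if the new configuration passes `nginx -t`.
safe_reload() {
  container=${1:-nginx}

  # Test the configuration first; a failure here leaves the running nginx untouched.
  if ! docker exec "$container" nginx -t; then
    echo "config test failed; not reloading" >&2
    return 1
  fi

  # Config is valid: signal the master process to reload gracefully.
  docker exec "$container" nginx -s reload
}

# Usage:
#   safe_reload nginx
```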
Key Takeaway
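Empty nginx access log = requests never reached nginx. When you see a 502, check the server: header and the response body first. Cloudflare sets server: cloudflare on every proxied response, so pair the header with the body: a plain-text error code: 502 means Cloudflare generated the error, so investigate Cloudflare. If you get nginx’s HTML 502 page, dig into nginx.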