nginx Restarted Fine, but Cloudflare Keeps Returning 502 — Even Though the Origin Is Healthy


#nginx #cloudflare #docker #reverse-proxy #debugging

Table of Contents

TL;DR
Context
The Problem
Investigation
Root Cause
Resolution
Prevention
Key Takeaway
References


TL;DR

Brief errors during an nginx restart caused Cloudflare to mark the origin as unhealthy, stop forwarding requests, and return 502 itself. The key clues: hitting the origin directly from localhost returns 200, and the nginx access log is completely empty. Wait for Cloudflare to re-check the origin automatically; it recovers on its own.

Context

After adjusting nginx upstream configuration and restarting nginx, several subdomains started returning 502 consistently. Not the “first request 502, then 200” pattern — all requests returned 502.

Affected subdomains: app.daodao.so, app-dev.daodao.so, app-feat.daodao.so

Working fine: server.daodao.so, ai-dev.daodao.so

The Problem

for i in {1..5}; do
  curl -s -o /dev/null -w "%{http_code}\n" https://app.daodao.so
done
502
502
502
502
502

Investigation

Containers are healthy:

docker ps | grep prod_product
# Up 2 weeks (healthy)

Hitting the upstream directly from inside the nginx container returns 200:

docker exec nginx curl -s -o /dev/null -w "%{http_code}" http://prod_product:3001
# 200

Hitting via VPS localhost also returns 200:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost -H "Host: app.daodao.so"
# 200

nginx error log is empty. nginx access log is also empty.

This is the critical clue — requests never reached nginx at all.
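One way to make the "never reached nginx" check less ambiguous is to tag a request with a unique marker and then search the access log for it. The sketch below is only an illustration: the helper name `reached_nginx` and the marker format are hypothetical, and the usage comment assumes the nginx container writes its access log to stdout so that `docker logs` can read it.

```shell
# Reads access-log lines on stdin and succeeds only if one of them
# contains the given marker (matched literally, not as a regex).
reached_nginx() {
  grep -qF -- "$1"
}

# Usage sketch (assumes the container is named "nginx" and logs to stdout):
#   marker="probe-$(date +%s)"
#   curl -s -o /dev/null "https://app.daodao.so/?$marker"
#   docker logs --since 1m nginx 2>&1 | reached_nginx "$marker" \
#     && echo "request reached nginx" \
#     || echo "request never reached nginx"
```

If the marker never shows up while the public URL keeps returning 502, the response is being produced somewhere in front of nginx.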

Inspecting response headers:

curl -si https://app.daodao.so | head -20

HTTP/2 502
content-type: text/plain; charset=UTF-8
content-length: 15
server: cloudflare
cf-ray: 9dcb32d71dc4b486-SIN

error code: 502

A few things stand out:

server: cloudflare, not nginx
content-type: text/plain — nginx’s 502 page is HTML
No cf-cache-status header
Body is error code: 502 (exactly 15 characters)

This is not a response from nginx, and it’s not a Cloudflare cache hit — it’s an error page generated by Cloudflare itself.
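The signals listed above can be checked together in one step. This is a rough heuristic sketch, not an official Cloudflare detection method: the function name `looks_cloudflare_generated` is made up for illustration, and it only tests for the three markers seen in this incident (the `server: cloudflare` header, a `text/plain` content type, and a bare `error code: NNN` body).

```shell
# Reads a full response (headers, blank line, body) on stdin, as printed by
# `curl -si`, and succeeds if it matches the signature seen in this incident.
looks_cloudflare_generated() {
  resp=$(tr -d '\r')   # header lines end in CRLF; drop the CR before matching
  printf '%s\n' "$resp" | grep -qi '^server: *cloudflare' || return 1
  printf '%s\n' "$resp" | grep -qi '^content-type: *text/plain' || return 1
  printf '%s\n' "$resp" | grep -qE '^error code: [0-9]+$'
}

# Usage sketch:
#   curl -si https://app.daodao.so | looks_cloudflare_generated \
#     && echo "looks like a Cloudflare-generated error page"
```

Requiring all three markers keeps an HTML error page from nginx from being misclassified as a Cloudflare-generated one.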

Root Cause

nginx was actually restarted twice in this incident:

First restart: The new config had an issue (missing zone directive in the upstream block), causing nginx to fail to start.
Second restart: After fixing the config, nginx started successfully — but during early startup, upstream DNS hadn’t resolved yet, so it briefly returned 502.

Cloudflare detected consecutive errors from the origin and triggered its origin health check mechanism, temporarily stopping all request forwarding to the origin and returning 502 directly from Cloudflare itself.

Why were only some subdomains affected? Because Cloudflare tracks origin health per hostname. The subdomains that happened to receive requests during the brief window when nginx was misbehaving got flagged as unhealthy.

Resolution

Wait for Cloudflare to automatically re-check the origin. Once it confirms the origin is healthy, it resumes forwarding. In practice, recovery happens within a few minutes with no action required.

No manual steps needed, but here’s a quick diagnostic checklist for next time:

# Confirm it's a Cloudflare issue, not nginx
curl -si https://your-domain.so | grep -i "^server:"
# server: cloudflare → Cloudflare is generating the response

# Confirm the origin itself is healthy
curl -s -o /dev/null -w "%{http_code}\n" http://localhost -H "Host: your-domain.so"
# 200 → Origin is fine, just wait for CF to re-check
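If you would rather not re-run curl by hand while waiting, a small polling loop can report when the public URL recovers. This is a minimal sketch: `probe` and `wait_for_recovery` are hypothetical helper names, and the default attempt count and delay are arbitrary assumptions rather than values tied to Cloudflare's re-check timing.

```shell
# Prints the HTTP status code for a URL ("000" means curl got no response).
probe() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Polls a URL until it returns a 2xx/3xx status or the attempts run out.
# Arguments: URL, max attempts (default 30), seconds between attempts (default 10).
wait_for_recovery() {
  url=$1
  attempts=${2:-30}
  delay=${3:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    code=$(probe "$url")
    echo "attempt $i: $code"
    case "$code" in
      2??|3??) return 0 ;;   # origin is being served again
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  return 1                   # still failing after all attempts
}

# Usage sketch:
#   wait_for_recovery https://your-domain.so 30 10 && echo "recovered"
```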

Prevention

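Use reload instead of restart for nginx whenever possible. On reload, nginx checks the new configuration first and keeps running with the old one if it is invalid; otherwise it starts new workers and lets the old ones finish their in-flight requests. There is no downtime and shared memory state is not reset, which makes it much less likely to trigger Cloudflare’s origin health check: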

docker exec nginx nginx -s reload
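Since the first failed restart in this incident came from a config error, it may also help to run `nginx -t` explicitly before reloading, so the error is printed where you can see it. The wrapper below is a small sketch under that assumption; `safe_reload` is a hypothetical name, and the container name defaults to `nginx` as used in this post.

```shell
# Validates the nginx config inside a container and reloads only if it passes.
# Argument: container name (defaults to "nginx").
safe_reload() {
  container=${1:-nginx}
  if ! docker exec "$container" nginx -t; then
    echo "config test failed; nginx is still running with the old config" >&2
    return 1
  fi
  docker exec "$container" nginx -s reload
}

# Usage sketch:
#   safe_reload nginx
```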

Key Takeaway

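Empty nginx access log = requests never reached nginx. When you see a 502, check the server: header and the response body first. If it says cloudflare, the body is the plain-text error code: 502, and nothing shows up in the nginx access log, investigate Cloudflare. If the response is nginx’s own HTML error page or the request appears in the nginx logs, then dig into nginx.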