Build a Production Health Check with OpenSwoole Event Loop Lag Metrics

OpenSwoole 26.2.0 introduced event loop lag metrics that expose how long the event loop is stalled per iteration. In this post, we'll use these metrics to build a production-ready health check endpoint that load balancers can query to detect and route traffic away from unhealthy workers.

The Problem

In a production OpenSwoole server, a single blocking operation — a slow database query, a synchronous file read, or a CPU-heavy computation — can stall the event loop for an entire worker. When this happens, that worker stops processing new requests until the blocking operation completes.

Traditional health checks only verify that the server is listening on its port. They don't tell you whether individual workers are actually responsive. A server can pass a TCP health check while half its workers are blocked and response times are through the roof.
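To make the failure mode concrete, here is a minimal sketch (the /slow route is purely illustrative) in which a single native sleep() call freezes an entire worker. Without coroutine hooks enabled, any other request routed to that worker queues behind the sleeping one:

```php
<?php

use OpenSwoole\Http\Server;
use OpenSwoole\Http\Request;
use OpenSwoole\Http\Response;

$server = new Server("0.0.0.0", 9501);

$server->on("request", function (Request $request, Response $response) {
    if ($request->server['request_uri'] === '/slow') {
        // sleep() is a native blocking call: it stalls this worker's
        // event loop for the full two seconds.
        sleep(2);
    }
    $response->end("done\n");
});

$server->start();
```

While /slow is sleeping, a concurrent request to any other path on the same worker sits in the backlog, yet a TCP-level health check against the port would still pass.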

Event Loop Lag Metrics

OpenSwoole 26.2.0 exposes three metrics via $server->stats():

Metric                   Description
event_loop_lag_ms        Current event loop lag in milliseconds
event_loop_lag_max_ms    Maximum lag observed since server start
event_loop_lag_avg_ms    Average event loop lag since server start

A healthy event loop typically shows lag under 10ms. When lag exceeds 50–100ms, something is blocking the loop and requests are queuing up.

Building the Health Check

Here's a complete example that exposes a /healthz endpoint. The endpoint returns HTTP 200 when the event loop is healthy and HTTP 503 when lag exceeds a threshold — exactly what load balancers like Nginx, HAProxy, or AWS ALB need to make routing decisions.

<?php

use OpenSwoole\Http\Server;
use OpenSwoole\Http\Request;
use OpenSwoole\Http\Response;
use OpenSwoole\Timer;

$server = new Server("0.0.0.0", 9501);

$server->set([
    'worker_num' => 4,
    'hook_flags' => OPENSWOOLE_HOOK_ALL,
]);

// Threshold in milliseconds — tune this for your workload
const LAG_THRESHOLD_MS = 50;

$server->on("request", function (Request $request, Response $response) use ($server) {
    if ($request->server['request_uri'] === '/healthz') {
        $stats = $server->stats();
        $lagMs = $stats['event_loop_lag_ms'] ?? 0;
        $maxMs = $stats['event_loop_lag_max_ms'] ?? 0;
        $avgMs = $stats['event_loop_lag_avg_ms'] ?? 0;

        $healthy = $lagMs < LAG_THRESHOLD_MS;

        $body = json_encode([
            'status'   => $healthy ? 'healthy' : 'degraded',
            'lag_ms'   => round($lagMs, 2),
            'max_ms'   => round($maxMs, 2),
            'avg_ms'   => round($avgMs, 2),
            'worker'   => $server->getWorkerId(),
        ]);

        $response->status($healthy ? 200 : 503);
        $response->header('Content-Type', 'application/json');
        $response->end($body);
        return;
    }

    // Your application logic here
    $response->end("Hello World\n");
});

$server->on("workerStart", function (Server $server, int $workerId) {
    // Periodic self-check: log warnings when lag spikes
    Timer::tick(10000, function () use ($server, $workerId) {
        $stats = $server->stats();
        $lagMs = $stats['event_loop_lag_ms'] ?? 0;

        if ($lagMs > LAG_THRESHOLD_MS) {
            error_log(sprintf(
                "[HEALTH] Worker %d degraded: lag=%.2fms max=%.2fms avg=%.2fms",
                $workerId,
                $lagMs,
                $stats['event_loop_lag_max_ms'] ?? 0,
                $stats['event_loop_lag_avg_ms'] ?? 0
            ));
        }
    });
});

$server->start();
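With the server running, you can exercise the endpoint by hand (assuming the port from the example above; the numbers in the sample body are illustrative):

```shell
# Query the health endpoint; -i shows the status line the load balancer keys on
curl -i http://127.0.0.1:9501/healthz

# A healthy worker responds with HTTP 200 and a JSON body shaped like:
# {"status":"healthy","lag_ms":1.42,"max_ms":8.30,"avg_ms":2.10,"worker":2}
```

Hitting the endpoint repeatedly will land on different workers, so the reported worker id and lag values vary between calls.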

Nginx Integration

Point your Nginx upstream health check at the /healthz endpoint:

upstream openswoole_backend {
    server 127.0.0.1:9501;
    server 127.0.0.1:9502;
}

server {
    listen 80;

    location / {
        proxy_pass http://openswoole_backend;
    }

    location /healthz {
        proxy_pass http://openswoole_backend;
        access_log off;
    }
}

For Nginx Plus or OpenResty with active health checks, the 503 response from a degraded worker causes the load balancer to temporarily remove it from the rotation until the event loop recovers.
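Since HAProxy was mentioned as an alternative, here is a sketch of the equivalent active check there (backend name, server names, and intervals are placeholders to tune for your deployment):

```
backend openswoole_backend
    balance roundrobin
    # Actively probe /healthz; a 503 from a degraded worker fails the check
    option httpchk GET /healthz
    http-check expect status 200
    # fall 2: eject after 2 failed checks; rise 3: re-admit after 3 passes
    server worker1 127.0.0.1:9501 check inter 5s fall 2 rise 3
    server worker2 127.0.0.1:9502 check inter 5s fall 2 rise 3
```

Because the health check is driven by event loop lag rather than port liveness, a worker rejoins the pool automatically once its loop recovers.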

What Causes High Event Loop Lag?

If your health check starts returning 503, look for these common causes:

  • Unhookable blocking calls: some C extensions perform synchronous I/O that OpenSwoole's runtime hooks can't intercept. Move these to task workers.
  • CPU-intensive work: image processing, encryption, or heavy computation should be offloaded to task workers.
  • Synchronous file I/O: use OpenSwoole's coroutine file operations or the new io_uring async file I/O engine in v26.2.0.
  • Large JSON encoding/decoding: consider streaming or chunking large payloads.
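As a sketch of the first two remedies, blocking or CPU-heavy work can be dispatched to task workers so the event loop thread stays responsive. The handler payload and route here are illustrative, not from the post:

```php
<?php

use OpenSwoole\Http\Server;

$server = new Server("0.0.0.0", 9501);

$server->set([
    'worker_num'      => 4,
    'task_worker_num' => 2,   // separate processes for blocking work
]);

$server->on("request", function ($request, $response) use ($server) {
    // Dispatch the heavy job and respond immediately; the event loop
    // in this worker never blocks on the computation itself.
    $server->task(['op' => 'resize', 'path' => '/tmp/upload.jpg']);
    $response->end("queued\n");
});

// Runs in a task worker process, where blocking is harmless
$server->on("task", function (Server $server, $taskId, $srcWorkerId, $data) {
    // ... do the CPU-heavy or blocking work here ...
});

$server->start();
```

The trade-off is that the request completes before the work does, so results must be delivered out of band (stored, pushed, or polled for) rather than in the HTTP response.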

Summary

Event loop lag metrics give you direct visibility into worker health — something TCP-level health checks can't provide. Combined with a simple /healthz endpoint and your load balancer, you get automatic traffic routing away from degraded workers with zero downtime.

Check out the Event Loop Lag Metrics documentation and the OpenSwoole 26.2.0 release notes for more details.

Install OpenSwoole 26.2.0

The easiest way to install OpenSwoole is via PIE (PHP Installer for Extensions):

pie install openswoole/ext-openswoole

Or via PECL:

pecl install openswoole-26.2.0

Install the core library:

composer require openswoole/core:26.2.0

Docker images are also available:

docker pull openswoole/openswoole:26.2-php8.5-alpine

For full installation options and compilation flags, see the installation documentation.