
Monitoring

Health, readiness, and Prometheus metrics endpoints, plus log-based monitoring for paraloom validators.

Status: the operational HTTP server (/health, /ready, /metrics) lives in the codebase under src/health/, but it is not yet wired into paraloom-node, and there is no config option to enable it pre-mainnet. Until it lands, monitor via logs (see Log-based monitoring below). The endpoint reference below describes what will ship; track progress in #67.

The server is designed to expose three operational HTTP endpoints, deliberately separate from the gossip port (libp2p listens on TCP 9300), so you can firewall them or expose them only to your monitoring stack.

| Endpoint | Purpose | Format |
| --- | --- | --- |
| /health | Liveness: is the process alive and not deadlocked? | JSON, 200 / 503 |
| /ready | Readiness: has the validator finished startup and joined the network? | JSON, 200 / 503 |
| /metrics | Prometheus metrics: counters, gauges, histograms | Prometheus text format |

Implementation: src/health/. Closing issue: #67.

/health

$ curl -s http://localhost:9100/health | jq
{
  "status": "ok",
  "uptime_secs": 84221,
  "version": "0.5.0-rc2"
}

Returns 200 if the process is alive and the main event loop is responsive. Returns 503 if the event loop has stalled (the check is heartbeat-based and driven by a watchdog). Use this for a Kubernetes livenessProbe.

/ready

$ curl -s http://localhost:9100/ready | jq
{
  "ready": true,
  "peer_count": 9
}

Returns 200 once the validator has joined the network and reached its minimum peer count (mirrored by the paraloom_ready metric). Returns 503 otherwise. Use this for a Kubernetes readinessProbe.
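
For monitoring stacks that do not run under Kubernetes, the 200 / 503 contract of /health and /ready can be checked with a small script. The following is a minimal sketch, not part of paraloom: the function names classify_probe and probe are hypothetical, port 9100 is taken from the curl examples above, and the shape of a 503 body is an assumption, so only the status code is trusted in that case.

```python
import json
import urllib.error
import urllib.request


def classify_probe(status_code: int, body: str) -> str:
    """Map an HTTP status and JSON body from /health or /ready to a verdict."""
    if status_code == 503:
        # Documented failure status; the body shape is not assumed here.
        return "failing"
    if status_code != 200:
        return "unknown"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unknown"
    if not isinstance(payload, dict):
        return "unknown"
    # /health reports {"status": "ok"}; /ready reports {"ready": true}.
    if payload.get("status") == "ok" or payload.get("ready") is True:
        return "passing"
    return "failing"


def probe(url: str, timeout: float = 2.0) -> str:
    """Fetch one endpoint and classify it.

    A refused or timed-out connection counts as failing, which is also
    what you will see until the server is wired into the node.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_probe(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses, so a 503 arrives here.
        return classify_probe(err.code, err.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError):
        return "failing"
```

Calling probe("http://localhost:9100/ready") from a cron job or a supervisor gives the same pass/fail signal a readinessProbe would, assuming the server ends up bound to that port.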

/metrics

Prometheus-formatted metrics. The server emits exactly four series today:

# HELP paraloom_peer_count Currently connected libp2p peers
# TYPE paraloom_peer_count gauge
paraloom_peer_count 9

# HELP paraloom_ready Whether the node has finished startup and joined the network
# TYPE paraloom_ready gauge
paraloom_ready 1

# HELP paraloom_uptime_seconds Process uptime in seconds
# TYPE paraloom_uptime_seconds gauge
paraloom_uptime_seconds 84221

# HELP paraloom_build_info Build metadata (version label)
# TYPE paraloom_build_info gauge
paraloom_build_info{version="0.5.0-rc2"} 1
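
If you want to consume this output without a Prometheus server, for example in a smoke test, the text format is simple enough to parse by hand. The sketch below is an illustration only: parse_metrics is a hypothetical helper, and it handles just the shapes shown above (no timestamps, no escaped characters in label values).

```python
SAMPLE = """\
# HELP paraloom_peer_count Currently connected libp2p peers
# TYPE paraloom_peer_count gauge
paraloom_peer_count 9
# HELP paraloom_ready Whether the node has finished startup and joined the network
# TYPE paraloom_ready gauge
paraloom_ready 1
# HELP paraloom_uptime_seconds Process uptime in seconds
# TYPE paraloom_uptime_seconds gauge
paraloom_uptime_seconds 84221
# HELP paraloom_build_info Build metadata (version label)
# TYPE paraloom_build_info gauge
paraloom_build_info{version="0.5.0-rc2"} 1
"""


def parse_metrics(text: str) -> dict:
    """Parse simple Prometheus text exposition into {(name, labels): value}.

    labels is a tuple of (key, value) pairs, empty for unlabeled series.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        name, brace, rest = series.partition("{")
        labels = ()
        if brace:
            pairs = rest.rstrip("}").split(",")
            labels = tuple(
                (key.strip(), val.strip().strip('"'))
                for key, val in (pair.split("=", 1) for pair in pairs if pair)
            )
        samples[(name, labels)] = float(value)
    return samples


metrics = parse_metrics(SAMPLE)
print(metrics[("paraloom_peer_count", ())])
```

For anything beyond a quick check, a maintained Prometheus client parser is the safer choice, since the full exposition format has escaping and timestamp rules this sketch ignores.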

Key series to alert on:

| Metric | What it tells you |
| --- | --- |
| paraloom_peer_count | Drop indicates DHT or NAT issues |
| paraloom_ready | 0 means the node has not (re)joined the network |
| paraloom_uptime_seconds | Resets to a low value reveal restarts/crash-loops |
| paraloom_build_info | Tracks the deployed version label across the fleet |
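
The alert conditions in the table can be sketched as a comparison between two consecutive scrapes. This is an illustrative outline, not paraloom's alerting logic: evaluate_alerts and the min_peers threshold of 3 are assumptions chosen for the example, and in practice you would express the same conditions as Prometheus alerting rules.

```python
def evaluate_alerts(previous: dict, current: dict, min_peers: int = 3) -> list:
    """Compare two scrapes (metric name -> value) and return alert names.

    min_peers is a hypothetical threshold; tune it to your network.
    """
    alerts = []
    # paraloom_ready is 1 once the node has joined the network.
    if current.get("paraloom_ready", 0) != 1:
        alerts.append("not_ready")
    # A low peer count points at DHT or NAT problems.
    if current.get("paraloom_peer_count", 0) < min_peers:
        alerts.append("low_peer_count")
    # Uptime only grows while the process lives, so a decrease between
    # scrapes means the process restarted in the meantime.
    if current.get("paraloom_uptime_seconds", 0) < previous.get(
        "paraloom_uptime_seconds", 0
    ):
        alerts.append("restarted")
    return alerts


healthy = {"paraloom_ready": 1, "paraloom_peer_count": 9, "paraloom_uptime_seconds": 84221}
after_crash = {"paraloom_ready": 0, "paraloom_peer_count": 1, "paraloom_uptime_seconds": 12}
print(evaluate_alerts(healthy, after_crash))
```

A single restart is often benign; repeated "restarted" results across several scrape intervals are what suggest a crash loop.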

Log-based monitoring

The /metrics server above is not yet wired into the node, so today the source of truth is the process logs. Run the node directly or under systemd and watch them:

# foreground, verbose
RUST_LOG=info ./target/release/paraloom validator start --config ~/.paraloom/validator.toml

# or, under systemd
sudo journalctl -u paraloom-validator -f

Watch for the bootstrap handshake against the anchor and a rising peer count after start. For on-chain validator state (reputation_score, total_earnings, is_active), read the ValidatorAccount PDA via Solana Explorer (devnet) or:

paraloom validator list

Once the metrics server is wired into paraloom-node and bound to a port, a scrape config will look like:

scrape_configs:
  - job_name: paraloom
    scrape_interval: 15s
    static_configs:
      - targets: ['validator-1.local:9100', 'validator-2.local:9100']
