Health, readiness, and Prometheus metrics endpoints exposed by paraloom validators.

Monitoring

Every paraloom validator exposes three operational HTTP endpoints on a configurable metrics listen address (default `0.0.0.0:9300`). The metrics port is deliberately separate from the gossip port, so you can firewall it or expose it only to your monitoring stack.

Endpoint	Purpose	Format
`/health`	Liveness — is the process alive and not deadlocked?	JSON, 200 / 503
`/ready`	Readiness — has the validator finished startup and joined the network?	JSON, 200 / 503
`/metrics`	Prometheus metrics — counters, gauges, histograms	Prometheus text format

Implementation: `src/health/` (closes issue #67).
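For scripted checks outside Kubernetes or Prometheus, the two JSON endpoints can be probed with nothing but the Python standard library. This is a minimal sketch that assumes only what the table states (JSON bodies, 200 when healthy, 503 otherwise); `probe` and `check_validator` are hypothetical helper names, not part of paraloom. A 503 is treated as an answer rather than an error, because the validator did respond; connection failures and timeouts are left to propagate.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 2.0) -> tuple[int, dict]:
    """GET a JSON health endpoint and return (HTTP status, parsed body).

    Hypothetical helper: assumes the endpoint returns a JSON body on both
    200 and 503. Connection refused, DNS failures and timeouts (URLError,
    OSError) propagate to the caller: the validator did not answer at all.
    """
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; for /health and /ready a 503 is a
        # valid answer, so read its body instead of failing.
        raw = err.read()
        return err.code, (json.loads(raw) if raw else {})


def check_validator(base_url: str) -> dict[str, bool]:
    """Summarise liveness and readiness for one validator."""
    health_status, _ = probe(base_url, "/health")
    ready_status, _ = probe(base_url, "/ready")
    return {"alive": health_status == 200, "ready": ready_status == 200}
```

For a healthy validator, `check_validator("http://validator-1.local:9300")` returns `{"alive": True, "ready": True}`.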

/health

$ curl -s http://localhost:9300/health | jq

{
  "status": "ok",
  "uptime_secs": 84221,
  "version": "0.5.0-rc2"
}

Returns 200 if the process is alive and its main event loop is responsive. Returns 503 if the event loop has stalled: a watchdog expects a regular heartbeat from the loop and flags the process when that heartbeat stops. Use this endpoint for a Kubernetes `livenessProbe`.
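The stall detection can be pictured as a heartbeat watchdog: the event loop records a timestamp on every iteration, and the `/health` handler reports 503 once that timestamp is older than some threshold. The sketch below illustrates the idea only. paraloom's actual watchdog lives in `src/health/`; the class name, the 5-second threshold, and the `"stalled"` status string are assumptions, not documented behavior.

```python
from __future__ import annotations

import time
from typing import Callable


class HeartbeatWatchdog:
    """Hypothetical liveness watchdog backing a /health handler.

    Not paraloom's implementation; the threshold and the "stalled"
    status string are assumptions for illustration.
    """

    def __init__(self, max_silence_secs: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock                  # injectable for testing
        self._max_silence = max_silence_secs
        self._started = clock()
        self._last_beat = self._started

    def beat(self) -> None:
        """Called from the main event loop on every iteration."""
        self._last_beat = self._clock()

    def health(self, version: str) -> tuple[int, dict]:
        """Return (HTTP status, JSON body) for the /health endpoint."""
        now = self._clock()
        stalled = now - self._last_beat > self._max_silence
        body = {
            "status": "stalled" if stalled else "ok",
            "uptime_secs": int(now - self._started),
            "version": version,
        }
        return (503 if stalled else 200), body
```

For the 503 to be observable, the HTTP handler calling `health()` must run on a different thread or runtime than the loop it watches; otherwise a stalled loop could not answer the probe at all.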

/ready

$ curl -s http://localhost:9300/ready | jq

{
  "ready": true,
  "kademlia_bootstrapped": true,
  "consensus_synced": true,
  "merkle_root_in_sync_with_chain": true,
  "peer_count": 9
}

Returns 200 only when the validator has:

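Joined the Kademlia DHT and bootstrapped its routing table (`kademlia_bootstrapped`)
Caught up on consensus, i.e. its height matches the network's (`consensus_synced`)
Reconciled its local Merkle root with the on-chain root (`merkle_root_in_sync_with_chain`)
Reached the minimum peer count (`peer_count`)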

Returns 503 if any check fails. Use this endpoint for a Kubernetes `readinessProbe`.
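The readiness decision is a conjunction of the four checks. A minimal sketch of that logic, producing a status code and a body shaped like the sample above; `evaluate_readiness` and the `MIN_PEERS` value of 4 are hypothetical, since this page does not state the actual minimum peer count.

```python
from __future__ import annotations

# Hypothetical value: the docs mention a minimum peer count but not its size.
MIN_PEERS = 4


def evaluate_readiness(kademlia_bootstrapped: bool,
                       consensus_synced: bool,
                       merkle_root_in_sync_with_chain: bool,
                       peer_count: int,
                       min_peers: int = MIN_PEERS) -> tuple[int, dict]:
    """Combine the /ready checks into (HTTP status, JSON body)."""
    ready = (kademlia_bootstrapped
             and consensus_synced
             and merkle_root_in_sync_with_chain
             and peer_count >= min_peers)
    # Report every individual check, even on 503, so operators can see
    # which condition is holding the validator back.
    body = {
        "ready": ready,
        "kademlia_bootstrapped": kademlia_bootstrapped,
        "consensus_synced": consensus_synced,
        "merkle_root_in_sync_with_chain": merkle_root_in_sync_with_chain,
        "peer_count": peer_count,
    }
    return (200 if ready else 503), body
```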

/metrics

Metrics in the Prometheus text exposition format. A sample of the output:

# HELP paraloom_proof_verify_seconds Time to verify a Groth16 proof
# TYPE paraloom_proof_verify_seconds histogram
paraloom_proof_verify_seconds_bucket{le="0.005"} 0
paraloom_proof_verify_seconds_bucket{le="0.010"} 1842
paraloom_proof_verify_seconds_bucket{le="0.020"} 1844
paraloom_proof_verify_seconds_bucket{le="+Inf"} 1844
paraloom_proof_verify_seconds_count 1844
paraloom_proof_verify_seconds_sum 18.7

# HELP paraloom_consensus_round_total Consensus rounds by outcome
# TYPE paraloom_consensus_round_total counter
paraloom_consensus_round_total{outcome="agreed"} 1839
paraloom_consensus_round_total{outcome="timeout"} 5

# HELP paraloom_peers_connected Currently connected libp2p peers
# TYPE paraloom_peers_connected gauge
paraloom_peers_connected 9

# HELP paraloom_nullifier_set_size Total spent nullifiers
# TYPE paraloom_nullifier_set_size gauge
paraloom_nullifier_set_size 28412
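To read a few of these values from a script without running Prometheus, simple lines like the ones in this sample can be parsed with regular expressions. This is a deliberately simplified sketch (`parse_samples` is a hypothetical helper): it skips `# HELP`/`# TYPE` comments and does not handle escaped quotes in label values or sample timestamps, both of which the full exposition format allows. For anything beyond quick checks, the `prometheus_client` package ships a complete parser, `prometheus_client.parser.text_string_to_metric_families`.

```python
from __future__ import annotations

import re

# `name{label="value",...} value` or `name value`. Simplified: assumes no
# escaped quotes or braces inside label values and no timestamps (lines
# with a trailing timestamp fail to match and are skipped).
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)$'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text: str) -> dict[tuple[str, frozenset], float]:
    """Map (metric name, frozenset of (label, value) pairs) -> sample value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and # HELP / # TYPE comments
        m = _SAMPLE.match(line)
        if m is None:
            continue  # uses a feature this sketch does not handle
        labels = frozenset(_LABEL.findall(m.group("labels") or ""))
        # float() accepts "+Inf", "-Inf" and "NaN" as the format requires.
        samples[(m.group("name"), labels)] = float(m.group("value"))
    return samples
```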

Key series to alert on:

Metric	What it tells you
`paraloom_proof_verify_seconds`	Verification latency drift; p99 should stay under ~15 ms
`paraloom_consensus_round_total{outcome="timeout"}`	Sustained timeouts point to a network partition or peer problems
`paraloom_peers_connected`	A drop suggests DHT or NAT problems
`paraloom_coordinator_role`	`1` if primary, `0` if passive; a change marks a failover event
`paraloom_heartbeat_missed_total`	Primary heartbeats missed, as observed by passive nodes
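In Prometheus the p99 target would normally be checked with PromQL, for example `histogram_quantile(0.99, rate(paraloom_proof_verify_seconds_bucket[5m]))` (the 5-minute window is an arbitrary choice). To show what that function does with the buckets, here is a Python sketch of the same linear-interpolation estimate. `estimate_quantile` is a local illustrative helper, not a Prometheus API, and applying it to raw cumulative counts gives a lifetime estimate rather than a rate-based one.

```python
from __future__ import annotations

import math


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Follows the linear interpolation PromQL's histogram_quantile uses: the
    lowest bucket is assumed to start at 0, the quantile lies in the first
    bucket whose cumulative count reaches q * total, and the value is
    interpolated linearly between that bucket's bounds.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError('buckets must include the le="+Inf" bucket')
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Open-ended bucket: like PromQL, fall back to the
                # highest finite bucket bound.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound  # unreachable: the +Inf bucket always holds `total`
```

With the sample's buckets (`0.005: 0`, `0.010: 1842`, `0.020: 1844`, `+Inf: 1844`), `estimate_quantile(0.99, ...)` returns about 0.00996 s, comfortably under the ~15 ms target. The estimate is only as precise as the bucket bounds: with no bound between 10 ms and 20 ms, a ~15 ms threshold falls inside a single bucket, where the result is pure interpolation.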

Configuration

Set the metrics listen address at startup with `--metrics`:

$ paraloom validator start \
    --config ./validator.toml \
    --metrics 0.0.0.0:9300

Or in the TOML config:

[metrics]
listen = "0.0.0.0:9300"

Recommended scrape config

scrape_configs:
  - job_name: paraloom
    scrape_interval: 15s
    static_configs:
      - targets: ['validator-1.local:9300', 'validator-2.local:9300']
