Monitoring

Health, readiness, and Prometheus metrics endpoints exposed by paraloom validators.

Every paraloom validator exposes three operational HTTP endpoints on a configurable metrics listen address (default 0.0.0.0:9300). They are deliberately kept separate from the gossip port, so you can firewall them or expose them only to your monitoring stack.

| Endpoint | Purpose | Format |
|----------|---------|--------|
| /health | Liveness: is the process alive and not deadlocked? | JSON, 200 / 503 |
| /ready | Readiness: has the validator finished startup and joined the network? | JSON, 200 / 503 |
| /metrics | Prometheus metrics: counters, gauges, histograms | Prometheus text format |

Implementation: src/health/. Closing issue: #67.

/health

$ curl -s http://localhost:9300/health | jq
{
  "status": "ok",
  "uptime_secs": 84221,
  "version": "0.5.0-rc2"
}

Returns 200 if the process is alive and the main event loop is responsive. Returns 503 if the watchdog behind this check detects that the event loop has stalled. Use this endpoint for a Kubernetes livenessProbe.
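
Outside Kubernetes, the same check can be scripted. The sketch below uses only the Python standard library to fetch /health and reduce the result to a boolean. The function names `interpret_health` and `check_health`, the 2-second timeout, and the rule that anything other than a 200 with `"status": "ok"` counts as unhealthy are assumptions made for illustration; they are not part of paraloom.

```python
import json
import urllib.error
import urllib.request


def interpret_health(status_code: int, body: str) -> bool:
    """Return True only for HTTP 200 with a JSON body whose status is "ok"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(base_url: str = "http://localhost:9300", timeout: float = 2.0) -> bool:
    """Fetch /health; treat timeouts and connection errors as not alive."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return interpret_health(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urlopen raises on a 503; the status code and body are still available.
        return interpret_health(err.code, err.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout (assumed to mean "not alive").
        return False
```

Keeping the parsing in `interpret_health` separate from the network call makes the decision rule easy to test without a running validator.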

/ready

$ curl -s http://localhost:9300/ready | jq
{
  "ready": true,
  "kademlia_bootstrapped": true,
  "consensus_synced": true,
  "merkle_root_in_sync_with_chain": true,
  "peer_count": 9
}

Returns 200 only when the validator has:

  • Joined the Kademlia DHT and bootstrapped its routing table
  • Caught up on consensus (height matches network)
  • Reconciled its local merkle root with the on-chain root
  • Reached the minimum peer count

Returns 503 if any check fails. Use this endpoint for a Kubernetes readinessProbe.
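
When /ready returns 503, it helps to know which of the four checks failed. The following Python sketch reads a /ready body and lists the unmet conditions, using the field names from the sample response above. The helper `failing_checks` and its `min_peers` parameter are hypothetical: the response does not expose the validator's actual minimum peer count, so the caller has to supply a value.

```python
import json

# Boolean fields taken from the sample /ready response.
READINESS_FLAGS = (
    "kademlia_bootstrapped",
    "consensus_synced",
    "merkle_root_in_sync_with_chain",
)


def failing_checks(body: str, min_peers: int) -> list[str]:
    """List the readiness checks that are not satisfied in a /ready body."""
    payload = json.loads(body)
    failed = [name for name in READINESS_FLAGS if payload.get(name) is not True]
    # min_peers is a caller-supplied assumption, not a value reported by the validator.
    if payload.get("peer_count", 0) < min_peers:
        failed.append("peer_count")
    return failed


sample = (
    '{"ready": true, "kademlia_bootstrapped": true, "consensus_synced": true, '
    '"merkle_root_in_sync_with_chain": true, "peer_count": 9}'
)
print(failing_checks(sample, min_peers=8))
```

An empty list corresponds to the healthy sample; any entries name the checks to investigate first.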

/metrics

Prometheus-formatted metrics. Sample of what's emitted:

# HELP paraloom_proof_verify_seconds Time to verify a Groth16 proof
# TYPE paraloom_proof_verify_seconds histogram
paraloom_proof_verify_seconds_bucket{le="0.005"} 0
paraloom_proof_verify_seconds_bucket{le="0.010"} 1842
paraloom_proof_verify_seconds_bucket{le="0.020"} 1844
paraloom_proof_verify_seconds_bucket{le="+Inf"} 1844
paraloom_proof_verify_seconds_count 1844
paraloom_proof_verify_seconds_sum 18.7

# HELP paraloom_consensus_round_total Consensus rounds by outcome
# TYPE paraloom_consensus_round_total counter
paraloom_consensus_round_total{outcome="agreed"} 1839
paraloom_consensus_round_total{outcome="timeout"} 5

# HELP paraloom_peers_connected Currently connected libp2p peers
# TYPE paraloom_peers_connected gauge
paraloom_peers_connected 9

# HELP paraloom_nullifier_set_size Total spent nullifiers
# TYPE paraloom_nullifier_set_size gauge
paraloom_nullifier_set_size 28412
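
For a quick check without a Prometheus server, the text format can be parsed directly. The Python sketch below extracts samples from output like the one above and computes the share of consensus rounds that timed out. The parser is deliberately simplified (no timestamps, no escaped label values), and the function names `parse_samples` and `timeout_ratio` are illustrative, not part of paraloom.

```python
import re

# Matches `name{labels} value` or `name value`. Timestamps and escaped quotes
# are not handled, so this only suits simple output like the sample.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(LABEL_RE.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


def timeout_ratio(text: str) -> float:
    """Share of consensus rounds that ended in a timeout (0.0 if none recorded)."""
    rounds = {
        labels.get("outcome"): value
        for name, labels, value in parse_samples(text)
        if name == "paraloom_consensus_round_total"
    }
    total = sum(rounds.values())
    return rounds.get("timeout", 0.0) / total if total else 0.0
```

With the sample counters (1839 agreed, 5 timeout) the ratio is 5 / 1844. Because counters are cumulative since process start, a real alert would normally compare rates over a time window rather than lifetime totals.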

Key series to alert on:

| Metric | What it tells you |
|--------|-------------------|
| paraloom_proof_verify_seconds | Verification latency drift; should stay under ~15 ms p99 |
| paraloom_consensus_round_total{outcome="timeout"} | Sustained timeouts → network partition or peer issues |
| paraloom_peers_connected | Drop indicates DHT or NAT issues |
| paraloom_coordinator_role | 1 if primary, 0 if passive; track failover events |
| paraloom_heartbeat_missed_total | Counts missed primary heartbeats from the passives' view |
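
To see how the ~15 ms p99 target relates to the histogram buckets, the Python sketch below estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank, which is the same idea PromQL's `histogram_quantile()` uses. The local function `histogram_quantile` is an illustrative approximation written for this page, and the bucket values are copied from the sample /metrics output above.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # rank is in the +Inf bucket: report last finite bound
            if count == lower_count:
                return upper_bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Cumulative buckets from the sample output: le=0.005, 0.010, 0.020.
sample = [(0.005, 0), (0.010, 1842), (0.020, 1844)]
p99 = histogram_quantile(0.99, sample)
print(f"p99 ~ {p99 * 1000:.2f} ms, within 15 ms budget: {p99 < 0.015}")
```

For the sample data the estimate lands just under 10 ms, inside the 15 ms budget. The result is only as precise as the bucket boundaries allow, so treat it as an approximation.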

Configuration

Set the metrics listen address at startup:

$ paraloom validator start \
    --config ./validator.toml \
    --metrics 0.0.0.0:9300

Or in the TOML config:

[metrics]
listen = "0.0.0.0:9300"

To collect the metrics, add a scrape job to your Prometheus configuration, for example:

scrape_configs:
  - job_name: paraloom
    scrape_interval: 15s
    static_configs:
      - targets: ['validator-1.local:9300', 'validator-2.local:9300']
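
Because the default listen address 0.0.0.0:9300 binds every interface, it can be useful to flag wildcard binds when reviewing configs before deployment. The Python sketch below checks a `host:port` string of the kind used for `listen`. The helper name `binds_all_interfaces` is hypothetical, and the check only inspects the string; it does not read validator.toml or inspect firewall rules.

```python
import ipaddress


def binds_all_interfaces(listen: str) -> bool:
    """Return True if a host:port listen string uses a wildcard address."""
    host, _, _port = listen.rpartition(":")
    host = host.strip("[]")  # allow bracketed IPv6 such as [::]:9300
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        return False  # hostnames are not resolved in this sketch


for candidate in ("0.0.0.0:9300", "127.0.0.1:9300"):
    print(candidate, "-> all interfaces:", binds_all_interfaces(candidate))
```

A wildcard bind is not wrong in itself; it just means that access control has to come from the firewall or network policy rather than from the bind address.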
