Monitoring
Health, readiness, and Prometheus metrics endpoints exposed by paraloom validators.
Every paraloom validator exposes three operational HTTP endpoints on a configurable metrics port (default 0.0.0.0:9300). They are deliberately separate from the gossip port, so you can firewall them or expose them only to your monitoring stack.
| Endpoint | Purpose | Format |
|---|---|---|
| `/health` | Liveness: is the process alive and not deadlocked? | JSON, 200 / 503 |
| `/ready` | Readiness: has the validator finished startup and joined the network? | JSON, 200 / 503 |
| `/metrics` | Prometheus metrics: counters, gauges, histograms | Prometheus text format |
Implementation: src/health/. Closing issue: #67.
/health
```bash
$ curl -s http://localhost:9300/health | jq
{
  "status": "ok",
  "uptime_secs": 84221,
  "version": "0.5.0-rc2"
}
```

Returns 200 if the process is alive and the main event loop is responsive. Returns 503 if the event loop has stalled, as detected by a heartbeat-style watchdog. Use this endpoint for the Kubernetes `livenessProbe`.
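The watchdog idea can be illustrated with a minimal Python sketch. It assumes the event loop calls `beat()` on every iteration and that `/health` reports 503 once the last heartbeat is older than a stall threshold. The `Watchdog` class, the 5-second threshold, and the `"stalled"` status value are hypothetical, not taken from `src/health/`.

```python
import json
import threading
import time

# Hypothetical threshold: how long the event loop may go without a heartbeat
# before /health reports it as stalled.
STALL_THRESHOLD_SECS = 5.0


class Watchdog:
    """Tracks the last time the main event loop proved it was responsive."""

    def __init__(self, stall_threshold=STALL_THRESHOLD_SECS, clock=time.monotonic):
        self._clock = clock
        self._threshold = stall_threshold
        self._lock = threading.Lock()
        self._started = clock()
        self._last_beat = self._started

    def beat(self):
        """Called by the event loop on every iteration (assumed hook)."""
        with self._lock:
            self._last_beat = self._clock()

    def health(self, version):
        """Return (HTTP status, JSON body) for a /health request."""
        now = self._clock()
        with self._lock:
            stalled_for = now - self._last_beat
        # "stalled" is a hypothetical status string; the docs only show "ok".
        status = "ok" if stalled_for <= self._threshold else "stalled"
        body = {
            "status": status,
            "uptime_secs": int(now - self._started),
            "version": version,
        }
        return (200 if status == "ok" else 503), json.dumps(body)
```

Reading the heartbeat from a separate HTTP thread means a deadlocked event loop cannot also block the health response, which is what lets a liveness probe detect the deadlock at all.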
/ready
```bash
$ curl -s http://localhost:9300/ready | jq
{
  "ready": true,
  "kademlia_bootstrapped": true,
  "consensus_synced": true,
  "merkle_root_in_sync_with_chain": true,
  "peer_count": 9
}
```

Returns 200 only when the validator has:
- Joined the Kademlia DHT and bootstrapped its routing table
- Caught up on consensus (its height matches the network's)
- Reconciled its local merkle root with the on-chain root
- Reached the minimum peer count
Returns 503 if any check fails. Use this endpoint for the Kubernetes `readinessProbe`.
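The readiness rule is a conjunction: every check must pass. A minimal Python sketch, assuming the four conditions are available as booleans plus a peer count. `ReadinessState`, `readiness`, and the `MIN_PEERS` value of 3 are hypothetical; this page does not state the actual minimum peer count.

```python
from dataclasses import dataclass

# Hypothetical value: a minimum peer count exists, but its value is not documented here.
MIN_PEERS = 3


@dataclass
class ReadinessState:
    kademlia_bootstrapped: bool
    consensus_synced: bool
    merkle_root_in_sync_with_chain: bool
    peer_count: int


def readiness(state: ReadinessState, min_peers: int = MIN_PEERS):
    """Return (HTTP status, body dict) for a /ready request.

    Ready only if every individual check passes; any single failure yields 503.
    """
    ready = (
        state.kademlia_bootstrapped
        and state.consensus_synced
        and state.merkle_root_in_sync_with_chain
        and state.peer_count >= min_peers
    )
    body = {
        "ready": ready,
        "kademlia_bootstrapped": state.kademlia_bootstrapped,
        "consensus_synced": state.consensus_synced,
        "merkle_root_in_sync_with_chain": state.merkle_root_in_sync_with_chain,
        "peer_count": state.peer_count,
    }
    return (200 if ready else 503), body
```

Reporting each check as its own field, as the sample response does, tells an operator which condition is holding a node back instead of returning a bare `false`.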
/metrics
Prometheus-formatted metrics. Sample of what's emitted:
```text
# HELP paraloom_proof_verify_seconds Time to verify a Groth16 proof
# TYPE paraloom_proof_verify_seconds histogram
paraloom_proof_verify_seconds_bucket{le="0.005"} 0
paraloom_proof_verify_seconds_bucket{le="0.010"} 1842
paraloom_proof_verify_seconds_bucket{le="0.020"} 1844
paraloom_proof_verify_seconds_count 1844
paraloom_proof_verify_seconds_sum 18.7
# HELP paraloom_consensus_round_total Consensus rounds by outcome
# TYPE paraloom_consensus_round_total counter
paraloom_consensus_round_total{outcome="agreed"} 1839
paraloom_consensus_round_total{outcome="timeout"} 5
# HELP paraloom_peers_connected Currently connected libp2p peers
# TYPE paraloom_peers_connected gauge
paraloom_peers_connected 9
# HELP paraloom_nullifier_set_size Total spent nullifiers
# TYPE paraloom_nullifier_set_size gauge
paraloom_nullifier_set_size 28412
```

Key series to alert on:
| Metric | What it tells you |
|---|---|
| `paraloom_proof_verify_seconds` | Verification latency drift; p99 should stay under ~15 ms |
| `paraloom_consensus_round_total{outcome="timeout"}` | Sustained timeouts point to a network partition or peer issues |
| `paraloom_peers_connected` | A drop indicates DHT or NAT issues |
| `paraloom_coordinator_role` | 1 if primary, 0 if passive; use it to track failover events |
| `paraloom_heartbeat_missed_total` | Missed primary heartbeats, counted from the passive validators' view |
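Prometheus histograms are cumulative, so a p99 can be estimated from the `_bucket` series by finding the bucket that contains the 99th-percentile rank and interpolating linearly inside it, the same approach PromQL's `histogram_quantile` takes. The Python sketch below applies this to the sample output above. `parse_histogram` and `estimate_quantile` are illustrative helpers, and they only handle series without extra labels.

```python
import math
import re

SAMPLE = """\
paraloom_proof_verify_seconds_bucket{le="0.005"} 0
paraloom_proof_verify_seconds_bucket{le="0.010"} 1842
paraloom_proof_verify_seconds_bucket{le="0.020"} 1844
paraloom_proof_verify_seconds_count 1844
"""

BUCKET_RE = re.compile(r'^(\w+)_bucket\{le="([^"]+)"\}\s+(\S+)$')
COUNT_RE = re.compile(r"^(\w+)_count\s+(\S+)$")


def parse_histogram(text, metric):
    """Collect sorted (upper_bound, cumulative_count) pairs and the _count total."""
    buckets, total = [], None
    for line in text.splitlines():
        line = line.strip()
        m = BUCKET_RE.match(line)
        if m and m.group(1) == metric:
            # float() also understands le="+Inf".
            buckets.append((float(m.group(2)), float(m.group(3))))
            continue
        m = COUNT_RE.match(line)
        if m and m.group(1) == metric:
            total = float(m.group(2))
    buckets.sort()
    return buckets, total


def estimate_quantile(q, buckets, total):
    """Linearly interpolate the q-quantile inside the bucket holding rank q * total."""
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    # Rank lies beyond every parsed bucket (implicit +Inf): highest finite bound.
    return prev_bound


buckets, total = parse_histogram(SAMPLE, "paraloom_proof_verify_seconds")
p99 = estimate_quantile(0.99, buckets, total)
```

On the sample data this yields roughly 9.96 ms, under the ~15 ms guideline. Because the sample's buckets jump from 10 ms to 20 ms, any estimate between those bounds is interpolated rather than measured. In Prometheus itself, the equivalent query is `histogram_quantile(0.99, sum by (le) (rate(paraloom_proof_verify_seconds_bucket[5m])))`.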
Configuration
Set the metrics port at startup:
```bash
$ paraloom validator start \
    --config ./validator.toml \
    --metrics 0.0.0.0:9300
```

Or in the TOML config:
```toml
[metrics]
listen = "0.0.0.0:9300"
```

Recommended scrape config
```yaml
scrape_configs:
  - job_name: paraloom
    scrape_interval: 15s
    static_configs:
      - targets: ['validator-1.local:9300', 'validator-2.local:9300']
```
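Before adding a validator to the scrape config, it can help to confirm that its metrics port is reachable from the Prometheus host. A minimal sketch using only the Python standard library; `check_target` is a hypothetical helper, and checking for `paraloom_peers_connected` simply uses one series from the sample above as a canary.

```python
import urllib.error
import urllib.request


def check_target(base_url, timeout=5.0):
    """Probe one validator's metrics port.

    Returns the HTTP status of /health and /ready, plus whether /metrics
    exposes the paraloom_peers_connected gauge.
    """
    result = {}
    for path in ("/health", "/ready"):
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
                result[path] = resp.status
        except urllib.error.HTTPError as err:
            # urllib raises on 4xx/5xx; a 503 here is a meaningful answer, not a failure.
            result[path] = err.code
    with urllib.request.urlopen(base_url + "/metrics", timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    # Comment lines start with "#", so only real samples match.
    result["has_peers_gauge"] = any(
        line.startswith("paraloom_peers_connected") for line in text.splitlines()
    )
    return result
```

For example, `check_target("http://validator-1.local:9300")`. A `/ready` of 503 during startup is expected, while a connection error usually means the port is firewalled or the validator is listening on an interface the Prometheus host cannot reach.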