# Performance
Measured costs of paraloom-core operations, where they bottleneck, and how to scale.
Paraloom is designed so that proof verification fits on a laptop. Each validator in the BFT cohort spends ~10 ms of CPU per withdrawal proof; that's the budget that lets the network run on commodity hardware. This page is the operating envelope today, not aspirational targets.
All numbers below come from `cargo bench` and the integration test suite at v0.5.0-rc2 unless otherwise noted. Reproduce with `cargo bench --all` on `paraloom-core` main.
## Privacy layer
| Operation | Time | Where | Notes |
|---|---|---|---|
| Groth16 verification | ~10 ms | validator (per proof) | single CPU core, BLS12-381 |
| Groth16 generation | 2–3 s | client (per proof) | parallelizable across cores |
| Poseidon hash | < 1 ms | both | ~500 constraints in-circuit |
| Pedersen commit | < 1 ms | both | additive homomorphism |
| Sparse merkle path verify | < 1 ms | validator | fixed-depth |
| Batch verification (16 proofs) | ~50 ms | validator | amortizes pairings |
Proof generation is single-threaded by default but each proof is independent — submitters can parallelize across cores trivially. Verification is the load-bearing number: if it weren't ~10 ms, the validator hardware bar would have to go up.
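Illustratively, a submitter holding several pending withdrawals can fan the proving out with rayon. This is a minimal sketch, not the shipped submitter: `WithdrawalRequest`, `Proof`, and `generate_withdrawal_proof` are hypothetical stand-ins for the real client API.

```rust
use rayon::prelude::*;

// Hypothetical stand-ins for the real client-side proving API.
struct WithdrawalRequest;
struct Proof(Vec<u8>);

fn generate_withdrawal_proof(_req: &WithdrawalRequest) -> Proof {
    // Placeholder for the ~2-3 s of single-core Groth16 proving.
    Proof(Vec::new())
}

// Proofs share no state, so wall-clock time for a batch approaches
// (2-3 s) * ceil(n / num_cores) with no coordination needed.
fn prove_all(requests: &[WithdrawalRequest]) -> Vec<Proof> {
    requests.par_iter().map(generate_withdrawal_proof).collect()
}
```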
## Consensus
| Metric | Value |
|---|---|
| Round latency (p50) | < 1 s |
| Round latency (p99) | < 3 s |
| Heartbeat interval | 5 s |
| Heartbeat threshold | 15 s (3 missed) |
| Coordinator failover | < 30 s (RTO) |
Round latency is dominated by network round-trips, not by verification. With the 7-of-10 threshold and reputation gating, the cohort waits for the 7th valid vote — typically the slowest of the gated subset.
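In sketch form, the round closes the moment the quorum is met, which is why latency tracks the 7th-fastest gated validator rather than the slowest. The types below are hypothetical; the real consensus code also handles timeouts, equivocation, and reputation gating.

```rust
// Minimal shape of 7-of-10 threshold collection (illustrative only).
struct Vote {
    valid: bool,
}

fn round_completes(votes: impl Iterator<Item = Vote>, threshold: usize) -> bool {
    let mut valid_votes = 0;
    for vote in votes {
        if vote.valid {
            valid_votes += 1;
            if valid_votes >= threshold {
                return true; // quorum reached; stop waiting for stragglers
            }
        }
    }
    false // no quorum (the timeout path is not shown)
}
```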
## Compute layer (alpha)
| Job class | Typical wall-clock | Memory |
|---|---|---|
| Simple WASM (sub-µs ops) | < 1 s end-to-end | < 4 MiB |
| Aggregation over 10 KB inputs | 1–3 s | 16–32 MiB |
| ML-style inference (small model) | 5–30 s | up to 64 MiB |
Per-validator throughput is about 10 jobs/sec for trivial WASM (single-validator); replicated jobs scale with cohort size minus the consensus overhead.
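The memory column above is the kind of cap a WASM runtime enforces per job. A sketch assuming a wasmtime-based executor and a job module that exports `run`; the actual runner may be wired differently.

```rust
use wasmtime::{Engine, Instance, Module, Store, StoreLimits, StoreLimitsBuilder};

fn run_job(wasm_bytes: &[u8]) -> anyhow::Result<i64> {
    let engine = Engine::default();
    let module = Module::new(&engine, wasm_bytes)?;

    // Cap linear memory at 64 MiB, matching the largest job class above.
    let limits = StoreLimitsBuilder::new()
        .memory_size(64 * 1024 * 1024)
        .build();
    let mut store: Store<StoreLimits> = Store::new(&engine, limits);
    store.limiter(|state| state);

    // Instantiate and call the job's entry point on a single thread.
    let instance = Instance::new(&mut store, &module, &[])?;
    let run = instance.get_typed_func::<(), i64>(&mut store, "run")?;
    Ok(run.call(&mut store, ())?)
}
```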
## Throughput today (devnet)
| Operation | Throughput | Bottleneck |
|---|---|---|
| Deposits | ~100/sec | Solana L1 confirmation |
| Withdrawals | 1–5/sec | BFT round time + Solana confirmation |
| Compute jobs | ~10/sec/validator | WASM execution |
| P2P messages | ~1000/sec | gossipsub fanout |
Where these come from:
- Deposits are pure Solana txs; the L2 just observes events. Capped by Solana, not paraloom.
- Withdrawals carry a full BFT round + on-chain submission. Round time + slot time = ~1 s in good conditions.
- Compute is currently bounded by single-thread WASM execution per job; replication multiplies cost across validators.
## Network
| Metric | Value |
|---|---|
| P2P latency (LAN) | < 100 ms |
| P2P latency (WAN, geo-distributed) | 100–300 ms |
| Bridge listener poll | 10 s default |
| Gossipsub heartbeat | 1 s |
| Codec read cap | bounded (DoS-hardened) |
Codec bounds were tightened in v0.4.0 — every gossip message has a size cap that's enforced before deserialization. See Networking.
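The shape of that guard is a length check before any decoding work. A hedged sketch, with bincode standing in for the actual wire codec and an illustrative cap:

```rust
use serde::de::DeserializeOwned;

// Illustrative cap; the real per-message limits live in the codec config.
const MAX_MSG_BYTES: usize = 1024 * 1024;

fn decode_bounded<T: DeserializeOwned>(bytes: &[u8]) -> Result<T, String> {
    // Reject oversized frames before deserialization so a malicious peer
    // can't force unbounded allocation or parsing work.
    if bytes.len() > MAX_MSG_BYTES {
        return Err(format!("frame of {} bytes exceeds cap", bytes.len()));
    }
    bincode::deserialize(bytes).map_err(|e| e.to_string())
}
```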
## Storage
| Data | Size | Backing |
|---|---|---|
| Merkle leaf (commitment) | 32 B | RocksDB sparse merkle |
| Nullifier entry | 32 B + index | RocksDB, fsync on hot writes |
| Block / round metadata | ~1 KiB / round | RocksDB |
| Proving key (per circuit) | ~1.5 MiB | filesystem |
| Verifying key (per circuit) | ~600 B | embedded in Anchor program for on-chain verify |
fsync on hot writes is on by default (#68) — closes a corruption window where in-flight writes could be lost on power failure. Cost is minor on SSDs; HDDs will lag visibly.
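With the rust-rocksdb bindings, a synced hot write looks roughly like this; the key layout here is illustrative, not paraloom's actual schema:

```rust
use rocksdb::{WriteOptions, DB};

fn insert_nullifier(db: &DB, nullifier: [u8; 32], index: u64) -> Result<(), rocksdb::Error> {
    // Force the write through to disk before acknowledging, so a power
    // failure can't silently drop a spent nullifier.
    let mut opts = WriteOptions::default();
    opts.set_sync(true);
    db.put_opt(nullifier, index.to_le_bytes(), &opts)
}
```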
## Resource envelope
### Validator (recommended hardware)
| Resource | Idle | Steady-state | Peak |
|---|---|---|---|
| CPU | 1–5% | 10–20% one core | 50–80% one core during proof verification bursts |
| RAM | ~500 MiB | 500 MiB – 1 GiB | up to 2 GiB under heavy compute load |
| Disk I/O | < 1 KB/s | 10–100 KB/s | 1–5 MB/s during state replication |
| Network | < 1 KB/s | ~10 KB/s | ~1 MB/s during catch-up or compute |
Hardware bar is intentionally low — Apple Silicon, Raspberry Pi 5, or a small VPS all comfortably hold the steady-state numbers.
### Submitter (proof generation)
| Resource | Usage |
|---|---|
| CPU | 100% one core during proving (parallelizable) |
| RAM | 100–200 MiB per concurrent proof |
| Disk | proving key in cache (~1.5 MiB resident) |
## Scaling
### What scales horizontally
- Validator count. Adds capacity for compute and parallelism for verification. Threshold is configurable per network.
- Submitters. Proof generation is fully parallel; many users can prove simultaneously without coordination.
- Compute replication factor. Higher replication = stronger BFT guarantees, but linear cost across validators.
### What's vertical / fixed
- Per-proof verification time. ~10 ms is a curve+circuit cost; scaling validator count reduces latency variance, not per-validator cost.
- Solana L1. Deposit/withdraw confirmation is bounded by L1, not L2.
- Proving key size. ~1.5 MiB per circuit. Doesn't grow with traffic.
## Optimizations on the table
| Optimization | Status |
|---|---|
| Batch verification (amortize pairings) | implemented (src/privacy/batch.rs) |
| Parallel single-proof generation (multi-core) | implemented via rayon in submitter |
| Proof aggregation (one proof for many) | research / future track |
| Recursive Groth16 (constant-size aggregation) | future track |
| GPU proving | not in scope for v0.5.0 |
The biggest near-term win is batch verification on the validator side when withdrawal volume grows — already wired, currently dormant on devnet because volume is low.
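In sketch form, the dispatch is a size check. `verify_one` and `verify_batch` are hypothetical stand-ins for the entry points in src/privacy/batch.rs, with the timings taken from the privacy-layer table above:

```rust
struct Proof(Vec<u8>);

// Stubs standing in for the real verifiers: ~10 ms per single proof,
// ~50 ms for a 16-proof batch (~3 ms/proof once pairings are shared).
fn verify_one(_p: &Proof) -> bool { true }
fn verify_batch(_ps: &[Proof]) -> bool { true }

fn verify_all(proofs: &[Proof]) -> bool {
    match proofs.len() {
        0 => true,
        1 => verify_one(&proofs[0]),
        // Past a handful of proofs, one random-linear-combination check
        // shares the pairing cost across the whole batch. A failed batch
        // still needs per-proof fallback to identify the offender.
        _ => verify_batch(proofs),
    }
}
```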
## Metrics to track
Each validator exposes Prometheus-format metrics on `metrics.listen` (default `:9300`). Most relevant for performance:
| Metric | What to watch |
|---|---|
| `paraloom_proof_verify_seconds` | per-proof histogram; p99 should stay < 50 ms |
| `paraloom_consensus_round_total{outcome=...}` | rate and outcome distribution of rounds |
| `paraloom_consensus_round_seconds` | end-to-end round latency histogram |
| `paraloom_peer_count` | drop = network reconfiguration or partition |
| `paraloom_nullifier_set_size` | grows monotonically; rate = real withdrawal traffic |
| `paraloom_storage_fsync_seconds` | high values = slow disk; consider NVMe |
| `paraloom_coordinator_role` | 0 passive, 1 primary; flips during failover |
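On the emitting side these are ordinary Prometheus client histograms. A sketch with the prometheus crate; the `verify` stub is hypothetical, and the real registration lives in the validator:

```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram, Histogram};

static PROOF_VERIFY_SECONDS: Lazy<Histogram> = Lazy::new(|| {
    register_histogram!(
        "paraloom_proof_verify_seconds",
        "Wall-clock seconds per Groth16 verification"
    )
    .unwrap()
});

fn verify_timed(proof: &[u8]) -> bool {
    // The timer records into the histogram when it goes out of scope.
    let _timer = PROOF_VERIFY_SECONDS.start_timer();
    verify(proof)
}

fn verify(_proof: &[u8]) -> bool {
    true // stand-in for the ~10 ms Groth16 check
}
```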
Alert thresholds (suggested):
| Metric | Warn | Critical |
|---|---|---|
| `paraloom_proof_verify_seconds` p99 | > 50 ms | > 200 ms |
| `paraloom_consensus_round_seconds` p99 | > 3 s | > 10 s |
| `paraloom_peer_count` | < expected − 2 | < expected − 5 |
| `paraloom_storage_fsync_seconds` p99 | > 50 ms | > 500 ms |
Full reference: Monitoring.
## How to reproduce
```bash
cargo bench --all                         # microbenches
cargo test --release --test consensus_*   # round latency under load
cargo test --release --test bridge_*      # end-to-end with localnet
./scripts/localnet/test-privacy-e2e.sh    # full deposit→transfer→withdraw timing
```

If your numbers diverge meaningfully from the tables above, check: NTP drift, RocksDB on HDD, or fsync disabled — those are the three usual suspects.