Performance

Measured costs of paraloom-core operations, where they bottleneck, and how to scale.

Paraloom is designed so that proof verification fits on a laptop. Each validator in the BFT cohort spends ~10 ms of CPU per withdrawal proof; that budget is what lets the network run on commodity hardware. This page describes the operating envelope today, not aspirational targets.

All numbers below come from `cargo bench` and the integration test suite at v0.5.0-rc2 unless otherwise noted. Reproduce with `cargo bench --all` on paraloom-core `main`.

Privacy layer

| Operation | Time | Where | Notes |
|---|---|---|---|
| Groth16 verification | ~10 ms | validator (per proof) | single CPU core, BLS12-381 |
| Groth16 generation | 2–3 s | client (per proof) | parallelizable across cores |
| Poseidon hash | < 1 ms | both | ~500 constraints in-circuit |
| Pedersen commit | < 1 ms | both | additive homomorphism |
| Sparse Merkle path verify | < 1 ms | validator | fixed depth |
| Batch verification (16 proofs) | ~50 ms | validator | amortizes pairings |

Proof generation is single-threaded by default but each proof is independent — submitters can parallelize across cores trivially. Verification is the load-bearing number: if it weren't ~10 ms, the validator hardware bar would have to go up.
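
For illustration, here is a minimal sketch of that fan-out with rayon; `WithdrawalInput` and `prove_withdrawal` are hypothetical stand-ins, not paraloom's real prover API:

```rust
use rayon::prelude::*;

// Hypothetical stand-ins for the real prover API.
struct WithdrawalInput; // note, nullifier, merkle path, ...
struct Proof(Vec<u8>);

fn prove_withdrawal(_input: &WithdrawalInput) -> Proof {
    // Placeholder for 2-3 s of single-core Groth16 proving.
    // A compressed BLS12-381 Groth16 proof is 192 bytes (2 G1 + 1 G2).
    Proof(vec![0u8; 192])
}

// Proofs share no state, so a work-stealing pool scales close to
// linearly with physical cores: N proofs on C cores take ~N/C * 2-3 s.
fn prove_all(inputs: &[WithdrawalInput]) -> Vec<Proof> {
    inputs.par_iter().map(prove_withdrawal).collect()
}
```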

Consensus

| Metric | Value |
|---|---|
| Round latency (p50) | < 1 s |
| Round latency (p99) | < 3 s |
| Heartbeat interval | 5 s |
| Heartbeat threshold | 15 s (3 missed) |
| Coordinator failover | < 30 s (RTO) |

Round latency is dominated by network round-trips, not by verification. With the 7-of-10 threshold and reputation gating, the cohort waits for the 7th valid vote — typically the slowest of the gated subset.
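
The latency profile follows from order statistics: a round resolves at the arrival time of the 7th valid vote. A minimal sketch of that wait, assuming a hypothetical `Vote` type and a tokio channel delivering votes as they arrive:

```rust
use tokio::sync::mpsc;

struct Vote; // hypothetical: validator id, round id, signature, ...

// Stand-in for signature verification plus reputation gating.
fn is_valid(_vote: &Vote) -> bool {
    true
}

// Resolve the round as soon as `threshold` valid votes arrive (7 of 10
// here): round latency is set by the 7th-fastest gated validator, and
// stragglers past the threshold are simply ignored.
async fn await_quorum(mut rx: mpsc::Receiver<Vote>, threshold: usize) -> Vec<Vote> {
    let mut accepted = Vec::with_capacity(threshold);
    while let Some(vote) = rx.recv().await {
        if is_valid(&vote) {
            accepted.push(vote);
            if accepted.len() == threshold {
                break;
            }
        }
    }
    accepted
}
```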

Compute layer (alpha)

| Job class | Typical wall-clock | Memory |
|---|---|---|
| Simple WASM (sub-µs ops) | < 1 s end-to-end | < 4 MiB |
| Aggregation over 10 KB inputs | 1–3 s | 16–32 MiB |
| ML-style inference (small model) | 5–30 s | up to 64 MiB |

Per-validator throughput is about 10 jobs/sec for trivial WASM; replicated jobs scale with cohort size, minus the consensus overhead.
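
As a sketch of what memory-capped execution looks like, here is the shape with wasmtime; the `run` export and the 4 MiB cap are illustrative assumptions, not paraloom's actual host API:

```rust
use wasmtime::{Engine, Instance, Module, Store, StoreLimits, StoreLimitsBuilder};

struct HostState {
    limits: StoreLimits,
}

// Run one untrusted job with a hard memory ceiling; growth past the
// cap fails inside the guest instead of exhausting the validator.
fn run_job(wasm_bytes: &[u8]) -> anyhow::Result<i64> {
    let engine = Engine::default();
    let module = Module::new(&engine, wasm_bytes)?;

    let limits = StoreLimitsBuilder::new()
        .memory_size(4 << 20) // illustrative 4 MiB cap per job
        .build();
    let mut store = Store::new(&engine, HostState { limits });
    store.limiter(|state| &mut state.limits);

    let instance = Instance::new(&mut store, &module, &[])?;
    // Assumes the job module exports a `run` entry point returning i64.
    let run = instance.get_typed_func::<(), i64>(&mut store, "run")?;
    Ok(run.call(&mut store, ())?)
}
```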

Throughput today (devnet)

| Operation | Throughput | Bottleneck |
|---|---|---|
| Deposits | ~100/sec | Solana L1 confirmation |
| Withdrawals | 1–5/sec | BFT round time + Solana confirmation |
| Compute jobs | ~10/sec/validator | WASM execution |
| P2P messages | ~1000/sec | gossipsub fanout |

Where these come from:

  • Deposits are pure Solana txs; the L2 just observes events. Capped by Solana, not paraloom.
  • Withdrawals carry a full BFT round + on-chain submission. Round time + slot time = ~1 s in good conditions.
  • Compute is currently bounded by single-thread WASM execution per job; replication multiplies cost across validators.

Network

| Metric | Value |
|---|---|
| P2P latency (LAN) | < 100 ms |
| P2P latency (WAN, geo-distributed) | 100–300 ms |
| Bridge listener poll | 10 s default |
| Gossipsub heartbeat | 1 s |
| Codec read cap | bounded (DoS-hardened) |

Codec bounds were tightened in v0.4.0 — every gossip message has a size cap that's enforced before deserialization. See Networking.
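
The pattern itself is small: validate the length prefix against the cap before allocating or deserializing anything the peer controls. A minimal sketch; the frame layout and the 1 MiB cap are illustrative, not the real codec:

```rust
use std::io::{self, Read};

const MAX_FRAME: u32 = 1 << 20; // illustrative 1 MiB cap, not the real value

// Read a length-prefixed frame, rejecting oversized lengths *before*
// allocating or deserializing anything the peer controls.
fn read_frame(r: &mut impl Read) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf);
    if len > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds size cap"));
    }
    let mut body = vec![0u8; len as usize];
    r.read_exact(&mut body)?;
    Ok(body)
}
```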

Storage

| Data | Size | Backing |
|---|---|---|
| Merkle leaf (commitment) | 32 B | RocksDB sparse Merkle |
| Nullifier entry | 32 B + index | RocksDB, fsync on hot writes |
| Block / round metadata | ~1 KiB / round | RocksDB |
| Proving key (per circuit) | ~1.5 MiB | filesystem |
| Verifying key (per circuit) | ~600 B | embedded in Anchor program for on-chain verify |

fsync on hot writes is on by default (#68) — closes a corruption window where in-flight writes could be lost on power failure. Cost is minor on SSDs; HDDs will lag visibly.
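
With the Rust rocksdb crate, this behavior corresponds to setting sync write options on the hot path. A sketch of the idea, not paraloom's actual storage code:

```rust
use rocksdb::{WriteOptions, DB};

// Nullifier inserts are the writes that must survive power loss: losing
// one would let a spent note be withdrawn again. Force an fsync per write.
fn insert_nullifier(db: &DB, nullifier: &[u8; 32], index: u64) -> Result<(), rocksdb::Error> {
    let mut opts = WriteOptions::default();
    opts.set_sync(true); // fsync before acknowledging (#68 behavior)
    db.put_opt(nullifier, index.to_be_bytes(), &opts)
}
```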

Resource envelope

| Resource | Idle | Steady-state | Peak |
|---|---|---|---|
| CPU | 1–5% | 10–20% of one core | 50–80% of one core during proof verification bursts |
| RAM | ~500 MiB | 500 MiB – 1 GiB | up to 2 GiB under heavy compute load |
| Disk I/O | < 1 KB/s | 10–100 KB/s | 1–5 MB/s during state replication |
| Network | < 1 KB/s | ~10 KB/s | ~1 MB/s during catch-up or compute |

The hardware bar is intentionally low — Apple Silicon, a Raspberry Pi 5, or a small VPS all comfortably sustain the steady-state numbers.

Submitter (proof generation)

| Resource | Usage |
|---|---|
| CPU | 100% of one core during proving (parallelizable) |
| RAM | 100–200 MiB per concurrent proof |
| Disk | proving key in cache (~1.5 MiB resident) |

Scaling

What scales horizontally

  • Validator count. Adds capacity for compute and parallelism for verification. Threshold is configurable per network.
  • Submitters. Proof generation is fully parallel; many users can prove simultaneously without coordination.
  • Compute replication factor. Higher replication = stronger BFT guarantees, but linear cost across validators.

What's vertical / fixed

  • Per-proof verification time. ~10 ms is a curve+circuit cost; scaling validator count reduces latency variance, not per-validator cost.
  • Solana L1. Deposit/withdraw confirmation is bounded by L1, not L2.
  • Proving key size. ~1.5 MiB per circuit. Doesn't grow with traffic.

Optimizations on the table

| Optimization | Status |
|---|---|
| Batch verification (amortize pairings) | implemented (`src/privacy/batch.rs`) |
| Parallel single-proof generation (multi-core) | implemented via rayon in submitter |
| Proof aggregation (one proof for many) | research / future track |
| Recursive Groth16 (constant-size aggregation) | future track |
| GPU proving | not in scope for v0.5.0 |

The biggest near-term win is batch verification on the validator side when withdrawal volume grows — already wired, currently dormant on devnet because volume is low.
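
The shape of the win, with hypothetical types (the real logic lives in `src/privacy/batch.rs`): a batch check shares the dominant pairing work across proofs, typically via a random linear combination, and falls back to per-proof verification only when the combined check fails:

```rust
// Hypothetical types; see src/privacy/batch.rs for the real thing.
struct Proof;
struct PublicInputs;

// ~10 ms: the full per-proof pairing check.
fn verify_one(_proof: &Proof, _inputs: &PublicInputs) -> bool {
    true // pairing check elided
}

// ~50 ms for 16 proofs instead of ~160 ms: combine proofs with random
// scalars so the expensive pairings are computed once per batch.
fn verify_batch(_batch: &[(Proof, PublicInputs)]) -> bool {
    true // random-linear-combination check elided
}

// A failed batch says "at least one proof is bad" but not which one;
// fall back to individual checks to isolate and reject the culprit.
fn verify_all(batch: &[(Proof, PublicInputs)]) -> Vec<bool> {
    if verify_batch(batch) {
        vec![true; batch.len()]
    } else {
        batch.iter().map(|(p, i)| verify_one(p, i)).collect()
    }
}
```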

Metrics to track

Each validator exposes Prometheus-format metrics on `metrics.listen` (default `:9300`). Most relevant for performance:

| Metric | What to watch |
|---|---|
| `paraloom_proof_verify_seconds` | per-proof histogram; p99 should stay < 50 ms |
| `paraloom_consensus_round_total{outcome=...}` | rate and outcome distribution of rounds |
| `paraloom_consensus_round_seconds` | end-to-end round latency histogram |
| `paraloom_peer_count` | a drop means network reconfiguration or partition |
| `paraloom_nullifier_set_size` | grows monotonically; growth rate tracks real withdrawal traffic |
| `paraloom_storage_fsync_seconds` | high values mean a slow disk; consider NVMe |
| `paraloom_coordinator_role` | 0 passive, 1 primary; flips during failover |
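
For context on how the headline histogram is shaped, here is a minimal sketch with the prometheus crate; the bucket choices and surrounding code are illustrative, and only the metric name comes from the table above:

```rust
use prometheus::{register_histogram, Histogram};

// Buckets bracket the ~10 ms nominal verify cost so the p99 alert
// thresholds below fall on bucket boundaries.
fn proof_verify_histogram() -> Histogram {
    register_histogram!(
        "paraloom_proof_verify_seconds",
        "Groth16 verification latency per proof",
        vec![0.005, 0.01, 0.025, 0.05, 0.1, 0.2, 0.5]
    )
    .expect("metric registration failed")
}

fn timed_verify(hist: &Histogram, verify: impl FnOnce() -> bool) -> bool {
    let _timer = hist.start_timer(); // records elapsed seconds on drop
    verify()
}
```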

Alert thresholds (suggested):

| Metric | Warn | Critical |
|---|---|---|
| `paraloom_proof_verify_seconds` p99 | > 50 ms | > 200 ms |
| `paraloom_consensus_round_seconds` p99 | > 3 s | > 10 s |
| `paraloom_peer_count` | < expected − 2 | < expected − 5 |
| `paraloom_storage_fsync_seconds` p99 | > 50 ms | > 500 ms |

Full reference: Monitoring.

How to reproduce

```bash
cargo bench --all                              # microbenches
cargo test --release --test consensus_*        # round latency under load
cargo test --release --test bridge_*           # end-to-end with localnet
./scripts/localnet/test-privacy-e2e.sh         # full deposit→transfer→withdraw timing
```

If your numbers diverge meaningfully from the table above, check: NTP drift, RocksDB on HDD, or fsync disabled — those are the three usual suspects.
