# Performance
Measured costs of paraloom-core operations, where they bottleneck, and how to scale.
Paraloom is designed so that proof verification fits on a laptop. Each validator in the BFT cohort spends ~10 ms of CPU per withdrawal proof; that's the budget that lets the network run on commodity hardware. This page is the operating envelope today, not aspirational targets.
All numbers below come from `cargo bench` and the integration test suite at v0.5.0-rc2 unless otherwise noted. Reproduce with `cargo bench --all` on `paraloom-core` main.
## Privacy layer
| Operation | Time | Where | Notes |
|---|---|---|---|
| Groth16 verification | ~10 ms | validator (per proof) | single CPU core, BLS12-381 |
| Groth16 generation | 2–3 s | client (per proof) | parallelizable across cores |
| Poseidon hash | < 1 ms | both | ~500 constraints in-circuit |
| Pedersen commit | < 1 ms | both | additive homomorphism |
| Sparse merkle path verify | < 1 ms | validator | fixed-depth |
| Batch verification (16 proofs) | ~50 ms | validator | amortizes pairings |
Proof generation is single-threaded by default but each proof is independent — submitters can parallelize across cores trivially. Verification is the load-bearing number: if it weren't ~10 ms, the validator hardware bar would have to go up.
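Illustratively, a submitter holding several pending withdrawals can fan the proving out with rayon. This is a minimal sketch, not the shipped submitter: `WithdrawalRequest`, `Proof`, and `generate_withdrawal_proof` are hypothetical stand-ins for the real client API.

```rust
use rayon::prelude::*;

// Hypothetical stand-ins for the real client-side proving API.
struct WithdrawalRequest;
struct Proof(Vec<u8>);

fn generate_withdrawal_proof(_req: &WithdrawalRequest) -> Proof {
    // Placeholder for the ~2-3 s of single-core Groth16 proving.
    Proof(Vec::new())
}

// Proofs share no state, so wall-clock time for a batch approaches
// (2-3 s) * ceil(n / num_cores) with no coordination needed.
fn prove_all(requests: &[WithdrawalRequest]) -> Vec<Proof> {
    requests.par_iter().map(generate_withdrawal_proof).collect()
}
```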
## Consensus
| Metric | Value |
|---|---|
| Round latency (p50) | < 1 s |
| Round latency (p99) | < 3 s |
| Heartbeat interval | 5 s |
| Heartbeat threshold | 15 s (3 missed) |
| Coordinator failover | < 30 s (RTO) |
Round latency is dominated by network round-trips, not by verification. With the 7-of-10 threshold and reputation gating, the cohort waits for the 7th valid vote — typically the slowest of the gated subset.
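In sketch form, the round closes the moment the quorum is met, which is why latency tracks the 7th-fastest gated validator rather than the slowest. The types below are hypothetical; the real consensus code also handles timeouts, equivocation, and reputation gating.

```rust
// Minimal shape of 7-of-10 threshold collection (illustrative only).
struct Vote {
    valid: bool,
}

fn round_completes(votes: impl Iterator<Item = Vote>, threshold: usize) -> bool {
    let mut valid_votes = 0;
    for vote in votes {
        if vote.valid {
            valid_votes += 1;
            if valid_votes >= threshold {
                return true; // quorum reached; stop waiting for stragglers
            }
        }
    }
    false // no quorum (the timeout path is not shown)
}
```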
## Compute layer (alpha)
| Job class | Typical wall-clock | Memory |
|---|---|---|
| Simple WASM (sub-µs ops) | < 1 s end-to-end | < 4 MiB |
| Aggregation over 10 KB inputs | 1–3 s | 16–32 MiB |
| ML-style inference (small model) | 5–30 s | up to 64 MiB |
Per-validator throughput is about 10 jobs/sec for trivial WASM (single-validator); replicated jobs scale with cohort size minus the consensus overhead.
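The memory column above is the kind of cap a WASM runtime enforces per job. A sketch assuming a wasmtime-based executor and a job module that exports `run`; the actual runner may be wired differently.

```rust
use wasmtime::{Engine, Instance, Module, Store, StoreLimits, StoreLimitsBuilder};

fn run_job(wasm_bytes: &[u8]) -> anyhow::Result<i64> {
    let engine = Engine::default();
    let module = Module::new(&engine, wasm_bytes)?;

    // Cap linear memory at 64 MiB, matching the largest job class above.
    let limits = StoreLimitsBuilder::new()
        .memory_size(64 * 1024 * 1024)
        .build();
    let mut store: Store<StoreLimits> = Store::new(&engine, limits);
    store.limiter(|state| state);

    // Instantiate and call the job's entry point on a single thread.
    let instance = Instance::new(&mut store, &module, &[])?;
    let run = instance.get_typed_func::<(), i64>(&mut store, "run")?;
    Ok(run.call(&mut store, ())?)
}
```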
## Throughput today (devnet)
| Operation | Throughput | Bottleneck |
|---|---|---|
| Deposits | ~100/sec | Solana L1 confirmation |
| Withdrawals | 1–5/sec | BFT round time + Solana confirmation |
| Compute jobs | ~10/sec/validator | WASM execution |
| P2P messages | ~1000/sec | gossipsub fanout |
Where these come from:
- Deposits are pure Solana txs; the L2 just observes events. Capped by Solana, not paraloom.
- Withdrawals carry a full BFT round + on-chain submission. Round time + slot time = ~1 s in good conditions.
- Compute is currently bounded by single-thread WASM execution per job; replication multiplies cost across validators.
## Network
| Metric | Value |
|---|---|
| P2P latency (LAN) | < 100 ms |
| P2P latency (WAN, geo-distributed) | 100–300 ms |
| Bridge listener poll | 10 s default |
| Gossipsub heartbeat | 1 s |
| Codec read cap | bounded (DoS-hardened) |
Codec bounds were tightened in v0.4.0 — every gossip message has a size cap that's enforced before deserialization. See Networking.
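The shape of that guard is a length check before any decoding work. A hedged sketch, with bincode standing in for the actual wire codec and an illustrative cap:

```rust
use serde::de::DeserializeOwned;

// Illustrative cap; the real per-message limits live in the codec config.
const MAX_MSG_BYTES: usize = 1024 * 1024;

fn decode_bounded<T: DeserializeOwned>(bytes: &[u8]) -> Result<T, String> {
    // Reject oversized frames before deserialization so a malicious peer
    // can't force unbounded allocation or parsing work.
    if bytes.len() > MAX_MSG_BYTES {
        return Err(format!("frame of {} bytes exceeds cap", bytes.len()));
    }
    bincode::deserialize(bytes).map_err(|e| e.to_string())
}
```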
## Storage
| Data | Size | Backing |
|---|---|---|
| Merkle leaf (commitment) | 32 B | RocksDB sparse merkle |
| Nullifier entry | 32 B + index | RocksDB, fsync on hot writes |
| Block / round metadata | ~1 KiB / round | RocksDB |
| Proving key (per circuit) | ~1.5 MiB | filesystem |
| Verifying key (per circuit) | ~600 B | embedded in Anchor program for on-chain verify |
fsync on hot writes is on by default (#68) — closes a corruption window where in-flight writes could be lost on power failure. Cost is minor on SSDs; HDDs will lag visibly.
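With the rust-rocksdb bindings, a synced hot write looks roughly like this; the key layout here is illustrative, not paraloom's actual schema:

```rust
use rocksdb::{WriteOptions, DB};

fn insert_nullifier(db: &DB, nullifier: [u8; 32], index: u64) -> Result<(), rocksdb::Error> {
    // Force the write through to disk before acknowledging, so a power
    // failure can't silently drop a spent nullifier.
    let mut opts = WriteOptions::default();
    opts.set_sync(true);
    db.put_opt(nullifier, index.to_le_bytes(), &opts)
}
```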
## Resource envelope
### Validator (recommended hardware)
| Resource | Idle | Steady-state | Peak |
|---|---|---|---|
| CPU | 1–5% | 10–20% one core | 50–80% one core during proof verification bursts |
| RAM | ~500 MiB | 500 MiB – 1 GiB | up to 2 GiB under heavy compute load |
| Disk I/O | < 1 KB/s | 10–100 KB/s | 1–5 MB/s during state replication |
| Network | < 1 KB/s | ~10 KB/s | ~1 MB/s during catch-up or compute |
Hardware bar is intentionally low — Apple Silicon, Raspberry Pi 5, or a small VPS all comfortably hold the steady-state numbers.
### Submitter (proof generation)
| Resource | Usage |
|---|---|
| CPU | 100% one core during proving (parallelizable) |
| RAM | 100–200 MiB per concurrent proof |
| Disk | proving key in cache (~1.5 MiB resident) |
## Scaling
### What scales horizontally
- Validator count. Adds capacity for compute and parallelism for verification. Threshold is configurable per network.
- Submitters. Proof generation is fully parallel; many users can prove simultaneously without coordination.
- Compute replication factor. Higher replication = stronger BFT guarantees, but linear cost across validators.
### What's vertical / fixed
- Per-proof verification time. ~10 ms is a curve+circuit cost; scaling validator count reduces latency variance, not per-validator cost.
- Solana L1. Deposit/withdraw confirmation is bounded by L1, not L2.
- Proving key size. ~1.5 MiB per circuit. Doesn't grow with traffic.
## Optimizations on the table
| Optimization | Status |
|---|---|
| Batch verification (amortize pairings) | implemented (src/privacy/batch.rs) |
| Parallel single-proof generation (multi-core) | implemented via rayon in submitter |
| Proof aggregation (one proof for many) | research / future track |
| Recursive Groth16 (constant-size aggregation) | future track |
| GPU proving | not in scope for v0.5.0 |
The biggest near-term win is batch verification on the validator side when withdrawal volume grows — already wired, currently dormant on devnet because volume is low.
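In sketch form, the dispatch is a size check. `verify_one` and `verify_batch` are hypothetical stand-ins for the entry points in src/privacy/batch.rs, with the timings taken from the privacy-layer table above:

```rust
struct Proof(Vec<u8>);

// Stubs standing in for the real verifiers: ~10 ms per single proof,
// ~50 ms for a 16-proof batch (~3 ms/proof once pairings are shared).
fn verify_one(_p: &Proof) -> bool { true }
fn verify_batch(_ps: &[Proof]) -> bool { true }

fn verify_all(proofs: &[Proof]) -> bool {
    match proofs.len() {
        0 => true,
        1 => verify_one(&proofs[0]),
        // Past a handful of proofs, one random-linear-combination check
        // shares the pairing cost across the whole batch. A failed batch
        // still needs per-proof fallback to identify the offender.
        _ => verify_batch(proofs),
    }
}
```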
## Metrics to track
Each validator exposes Prometheus-format metrics on `metrics.listen` (default `:9300`). Most relevant for performance:
| Metric | What to watch |
|---|---|
| `paraloom_proof_verify_seconds` | per-proof histogram; p99 should stay < 50 ms |
| `paraloom_consensus_round_total{outcome=...}` | rate and outcome distribution of rounds |
| `paraloom_consensus_round_seconds` | end-to-end round latency histogram |
| `paraloom_peer_count` | drop = network reconfiguration or partition |
| `paraloom_nullifier_set_size` | grows monotonically; rate = real withdrawal traffic |
| `paraloom_storage_fsync_seconds` | high values = slow disk; consider NVMe |
| `paraloom_coordinator_role` | 0 passive, 1 primary; flips during failover |
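On the emitting side these are ordinary Prometheus client histograms. A sketch with the prometheus crate; the `verify` stub is hypothetical, and the real registration lives in the validator:

```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram, Histogram};

static PROOF_VERIFY_SECONDS: Lazy<Histogram> = Lazy::new(|| {
    register_histogram!(
        "paraloom_proof_verify_seconds",
        "Wall-clock seconds per Groth16 verification"
    )
    .unwrap()
});

fn verify_timed(proof: &[u8]) -> bool {
    // The timer records into the histogram when it goes out of scope.
    let _timer = PROOF_VERIFY_SECONDS.start_timer();
    verify(proof)
}

fn verify(_proof: &[u8]) -> bool {
    true // stand-in for the ~10 ms Groth16 check
}
```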
Alert thresholds (suggested):
| Metric | Warn | Critical |
|---|---|---|
| `paraloom_proof_verify_seconds` p99 | > 50 ms | > 200 ms |
| `paraloom_consensus_round_seconds` p99 | > 3 s | > 10 s |
| `paraloom_peer_count` | < expected − 2 | < expected − 5 |
| `paraloom_storage_fsync_seconds` p99 | > 50 ms | > 500 ms |
Full reference: Monitoring.
## How to reproduce
```bash
cargo bench --all                         # microbenches
cargo test --release --test consensus_*   # round latency under load
cargo test --release --test bridge_*      # end-to-end with localnet
./scripts/localnet/test-privacy-e2e.sh    # full deposit→transfer→withdraw timing
```

If your numbers diverge meaningfully from the tables above, check: NTP drift, RocksDB on HDD, or fsync disabled — those are the three usual suspects.