FAFO by LayerZero: Parallel EVM Execution at Web2 Speed

Abstract

Imagine a blockchain handling over a million transactions per second on a single computer, matching the speed of systems like Visa while keeping its decentralized nature. Fast Ahead-of-Formation Optimization (FAFO), built by LayerZero Labs, makes this possible by reordering transactions before blocks are formed so they can run at the same time on modern multi-core computers. FAFO uses compact, cache-friendly Bloom filters to spot potential conflicts between transactions, ensuring high speed with minimal extra work. Paired with the Rust Ethereum Virtual Machine (REVM), FAFO processes over 1.1 million native ETH transfers and over 565,000 ERC20 token transfers per second on one computer, costing 91% less than state-of-the-art sharded systems. Unlike other fast blockchains, FAFO organizes state into a Merkle tree after every block, supporting lightweight apps (like mobile wallets) and secure validation for privacy-focused applications. FAFO scales with additional cores and is freely available at https://github.com/LayerZero-Labs/fafo.

Benchmark	Throughput (TPS)
Native	1,121,732
ERC20	565,956

FAFO achieves up to 1.1M transactions per second (TPS) on a 96-core computer.

Figure: FAFO speed on a 96-core computer.

Introduction

Blockchains like Ethereum process transactions one by one, slowing them down and limiting their ability to handle many users. This happens because they don’t fully use modern computers with multiple processing cores. FAFO solves this by rearranging transactions to run simultaneously, achieving over 1 million transactions per second (TPS) on a single computer without losing the decentralized trust that makes blockchains special.

In a blockchain, every computer in the network must agree on the same transaction results, like ensuring everyone sees the same bank balance after a payment. FAFO does this by carefully scheduling transactions to avoid conflicts (e.g., two payments trying to update the same account at once). Using the Ethereum Virtual Machine (EVM), FAFO maximizes parallel processing, that is, running multiple transactions at once. Unlike other systems that process transactions in order or struggle to run them together, FAFO organizes transactions before they’re locked into blocks. It also organizes state into a Merkle tree (a step called Merkleizing) using QMDB, making it easier for lightweight apps and privacy-focused tools to verify transactions.

Contributions

ParaBloom: A fast tool to check for transaction conflicts.
ParaFramer: A method to group non-conflicting transactions for parallel processing.
ParaScheduler: A system to run transactions smoothly and avoid errors.

FAFO’s open-source code at https://github.com/LayerZero-Labs/fafo lets developers build fast blockchain systems.

Together, these components enable FAFO's performance. They target the main causes of limited blockchain throughput: data contention and inefficient CPU utilization.

Design

FAFO uses a four-step process to run transactions quickly while keeping results consistent across the network. This section explains parallel processing (§2.1), outlines the process (§2.2), and describes its parts: ParaLyze (§2.3), ParaFramer & ParaBloom (§2.4), and ParaScheduler (§2.5).

Parallel Processing

Parallel processing (or Transaction-Level Parallelism, TLP) is about running multiple transactions at the same time without causing errors. Think of it like a busy kitchen where chefs (transactions) can cook different dishes (access data) as long as they don’t need the same ingredients (data conflicts). FAFO ensures transactions don’t clash by checking if they touch the same data, especially if one changes it. The best schedule runs as many transactions as possible together, but finding the perfect setup is tricky and time-consuming. FAFO uses smart shortcuts to get close to the ideal while keeping things fast.

Key Takeaway: Parallel processing lets FAFO use all parts of a computer to handle transactions quickly and correctly.

FAFO’s Process

FAFO assumes one computer (the block producer) has time to gather many transactions, as in systems such as Solana. It creates an ordered list of transactions, adding markers to group them into blocks, and works with various blockchain apps (e.g., rollups, main chains). The four steps are:

ParaLyze: Guesses which data each transaction will use.
ParaFramer: Groups transactions that can run together using ParaBloom.
ParaScheduler: Runs the grouped transactions without conflicts.
Block Creation: Finalizes groups of transactions for the blockchain.

Figure: FAFO system architecture. Transactions flow from the Mempool (input transactions) to ParaLyze (run in parallel, producing the analyzed data flow), then to ParaFramer & ParaBloom (grouped transactions), then to ParaScheduler (block stream), ending in finalized blocks.

Note: Dropped transactions return to the Mempool for rescheduling.

ParaLyze: Transaction Guessing

ParaLyze looks at transactions waiting to be processed and predicts what data they’ll read or change, like planning which ingredients a chef needs. It uses simulations or provided hints (e.g., EIP-2930 access lists) to make these guesses. To keep things fast, it doesn’t save changes during guessing, which can lead to wrong guesses for complex transactions whose data use depends on earlier changes. Running guesses in parallel ensures speed.

Key Takeaway: ParaLyze predicts data needs to plan conflict-free transaction groups, though some guesses may need rechecking.
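
The idea of guessing data use without saving changes can be sketched as follows. This is a minimal illustration under stated assumptions, not FAFO's actual implementation: `RecordingState`, `native_transfer`, and `analyze` are hypothetical names, and the state is a plain dictionary of balances rather than an EVM.

```python
class RecordingState:
    """Wraps a read-only snapshot and records which keys a transaction touches."""

    def __init__(self, snapshot):
        self._snapshot = snapshot   # never modified
        self._scratch = {}          # changes live here and are thrown away
        self.reads = set()
        self.writes = set()

    def get(self, key):
        self.reads.add(key)
        return self._scratch.get(key, self._snapshot.get(key, 0))

    def set(self, key, value):
        self.writes.add(key)
        self._scratch[key] = value


def native_transfer(state, sender, receiver, amount):
    """Hypothetical transfer logic, standing in for real EVM execution."""
    balance = state.get(sender)
    if balance < amount:
        return False
    state.set(sender, balance - amount)
    state.set(receiver, state.get(receiver) + amount)
    return True


def analyze(snapshot, tx_fn, *args):
    """Run a transaction against a throwaway view and return (reads, writes)."""
    recorder = RecordingState(snapshot)
    tx_fn(recorder, *args)
    return recorder.reads, recorder.writes
```

Because every guess runs against the same unchanged snapshot, guesses for different transactions are independent of each other and could be computed in parallel. The price is that a guess may miss data use that only appears after earlier transactions have changed the state.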

ParaFramer & ParaBloom: Grouping Transactions

ParaBloom

Purpose: Quickly checks if transactions conflict.
How It Works: Uses two compact data tools (2048-bit Bloom filters) per group—one for data read, one for data changed—to spot conflicts fast.
Trade-Off: It might flag non-conflicts as conflicts (reducing parallel processing by ~8%), but this speeds up the overall system.

ParaBloom setup, showing two data tools per group for reading and changing data.
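
A pair of filters like the one described above can be sketched in a few lines. This is an illustration only: the 2048-bit size comes from the text, while the hash function (BLAKE2b), the number of bits set per key (`NUM_HASHES = 2`), and the names `key_mask` and `FilterPair` are assumptions made for the example.

```python
import hashlib

FILTER_BITS = 2048  # filter size stated in the text
NUM_HASHES = 2      # assumption: the text does not say how many bits each key sets


def key_mask(key: bytes) -> int:
    """Return an integer with up to NUM_HASHES bits set, chosen by hashing the key."""
    digest = hashlib.blake2b(key, digest_size=2 * NUM_HASHES).digest()
    mask = 0
    for i in range(NUM_HASHES):
        position = int.from_bytes(digest[2 * i:2 * i + 2], "little") % FILTER_BITS
        mask |= 1 << position
    return mask


class FilterPair:
    """One filter for data a group reads, one for data it changes."""

    def __init__(self):
        self.read_bits = 0
        self.write_bits = 0

    def add(self, reads, writes):
        for key in reads:
            self.read_bits |= key_mask(key)
        for key in writes:
            self.write_bits |= key_mask(key)

    def may_conflict(self, reads, writes):
        """True if the transaction might clash with the group (false positives possible)."""
        def hit(bits, key):
            mask = key_mask(key)
            return (bits & mask) == mask

        return (any(hit(self.write_bits, key) for key in reads)
                or any(hit(self.read_bits, key) for key in writes))
```

A real conflict is always reported, because every bit of a previously added key is set. A reported conflict may be spurious when unrelated keys happen to share bits, which is the trade-off described above.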

ParaFramer

ParaFramer groups transactions that can run together, creating a list ready for parallel processing. A transaction fits into a group if it doesn’t conflict with others in that group, checked using this rule:

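\[ \text{reads}(t) \cap \text{changes}(g) = \emptyset \quad \text{and} \quad \text{changes}(t) \cap \text{reads}(g) = \emptyset \]

Here \( t \) is the transaction and \( g \) is the group: the transaction's reads must not overlap the group's changes, and the transaction's changes must not overlap the group's reads.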

The grouping method (Algorithm 1) adds each transaction to the first non-conflicting group or starts a new group if needed.

Algorithm 1: Greedy Frame-Packing (in simpler terms)

Start with empty groups
For each transaction:
    Found a spot ← no
    For each group:
        If transaction fits (no conflicts):
            Add transaction to group
            Update group’s read/change lists
            Found a spot ← yes
            Stop checking groups
    If no spot found:
        Finish the largest group
        Start a new group with this transaction
        Set group’s read/change lists
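
The steps above can be turned into a small runnable sketch. It uses exact Python sets in place of Bloom filters and applies the fit rule as stated in this article. The names `Tx`, `Group`, `transfer`, and `pack`, the fixed number of group slots, and the order in which leftover groups are flushed at the end are assumptions for illustration, not details confirmed by the source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tx:
    name: str
    reads: frozenset
    writes: frozenset


@dataclass
class Group:
    txs: list = field(default_factory=list)
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)

    def fits(self, tx):
        # Rule from the text: no overlap between the transaction's
        # reads/changes and the group's changes/reads.
        return not (tx.reads & self.writes) and not (tx.writes & self.reads)

    def add(self, tx):
        self.txs.append(tx)
        self.reads |= tx.reads
        self.writes |= tx.writes


def transfer(name, sender, receiver):
    """A transfer reads and changes both balances."""
    keys = frozenset({sender, receiver})
    return Tx(name, keys, keys)


def pack(txs, num_groups=64):
    """First-fit packing; when nothing fits, finish the largest group."""
    groups = [Group() for _ in range(num_groups)]
    finished = []
    for tx in txs:
        for group in groups:
            if group.fits(tx):
                group.add(tx)
                break
        else:  # no spot found
            largest = max(range(num_groups), key=lambda i: len(groups[i].txs))
            finished.append(groups[largest].txs)
            groups[largest] = Group()
            groups[largest].add(tx)
    # Assumption: remaining non-empty groups are flushed in slot order.
    finished.extend(group.txs for group in groups if group.txs)
    return finished


txs = [transfer("t1", "a", "b"), transfer("t2", "c", "d"),
       transfer("t3", "b", "c"), transfer("t4", "a", "c")]
frames = [[tx.name for tx in frame] for frame in pack(txs, num_groups=2)]
# frames == [["t1", "t2"], ["t4"], ["t3"]]
```

With only two group slots, `t4` conflicts with both open groups, so the larger group (`t1`, `t2`) is finished and `t4` starts a new one in its place.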

ParaBloom uses 64 pairs of data tools, fitting neatly into a computer’s fast memory (32 KiB of L1 cache), making checks quick.
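
The cache figure follows directly from the filter sizes given earlier (two 2048-bit filters per group), as this short calculation shows:

```latex
\[
64 \text{ pairs} \times 2 \text{ filters} \times 2048 \text{ bits}
  = 262{,}144 \text{ bits}
  = 32{,}768 \text{ bytes}
  = 32 \text{ KiB}
\]
```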

Key Takeaway: ParaFramer and ParaBloom group transactions to run many at once, using fast tools to keep checks efficient.

ParaScheduler: Running Transactions

ParaScheduler takes the grouped transactions and runs them, ensuring no conflicts. For each piece of data, it builds a dependency graph (a directed acyclic graph, or DAG) showing which transactions need to wait for others. Transactions run as soon as their dependencies are done, keeping everything in order. If a transaction’s actual data use differs from ParaLyze’s guess, it’s sent back to be regrouped. The number of transactions running together matches or beats the average group size.

Key Takeaway: ParaScheduler runs transactions without conflicts, using smart maps to maximize computer use.
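
One way to picture the dependency map is the sketch below. It is a simplified illustration, not FAFO's scheduler: `build_deps` and `waves` are hypothetical names, transactions are given as (reads, changes) pairs in block order, and "running" a transaction is reduced to placing it in the earliest wave where all of its dependencies are done.

```python
def build_deps(txs):
    """txs: list of (reads, writes) sets in block order.

    Returns, for each transaction, the set of earlier transactions it must
    wait for, tracked per piece of data (last writer and readers since then).
    """
    last_writer = {}
    readers_since_write = {}
    deps = [set() for _ in txs]
    for i, (reads, writes) in enumerate(txs):
        for key in reads | writes:
            if key in last_writer:
                deps[i].add(last_writer[key])       # wait for the last change
        for key in writes:
            deps[i].update(readers_since_write.get(key, ()))  # wait for readers
        for key in writes:
            last_writer[key] = i
            readers_since_write[key] = set()
        for key in reads - writes:
            readers_since_write.setdefault(key, set()).add(i)
    return deps


def waves(deps):
    """Group transactions into waves; each wave can run fully in parallel."""
    done = set()
    remaining = set(range(len(deps)))
    schedule = []
    while remaining:
        ready = sorted(i for i in remaining if deps[i] <= done)
        schedule.append(ready)
        done.update(ready)
        remaining.difference_update(ready)
    return schedule


block = [
    ({"a", "b"}, {"a", "b"}),  # 0: transfer a -> b
    ({"c", "d"}, {"c", "d"}),  # 1: transfer c -> d
    ({"b", "c"}, {"b", "c"}),  # 2: transfer b -> c, waits for 0 and 1
    ({"e"}, set()),            # 3: read-only, independent
]
schedule = waves(build_deps(block))
# schedule == [[0, 1, 3], [2]]
```

Dependencies only ever point to earlier transactions, so the graph has no cycles and the loop in `waves` always makes progress.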

Research Contributions

ParaBloom: New, fast data tools for spotting conflicts.
ParaFramer: Smart grouping method for parallel processing.
ParaScheduler: Map-based system for smooth, conflict-free transaction runs.

Evaluation

FAFO shows high speed and flexibility, even with challenging workloads. It was tested on a powerful AWS computer (96 ARM-based Graviton 3 cores, 768 GiB of memory, and six fast drives combined into 22.5 TB of storage delivering 2.16M I/O operations per second).

Workloads

Two test scenarios were used:

Native Transfer: Moves ETH between accounts, updating two data points (sender/receiver balances).
ERC20 Transfer: Moves tokens, updating three data points (sender ETH/token balances, receiver token balance).
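
The data points each workload updates can be written out explicitly. The sketch below is only a way to visualize the two scenarios: the function names and the tuple-based key format are hypothetical, not FAFO's state layout.

```python
def native_transfer_updates(sender, receiver):
    """Two data points: the ETH balances of sender and receiver."""
    return {("eth", sender), ("eth", receiver)}


def erc20_transfer_updates(token, sender, receiver):
    """Three data points: sender ETH balance, sender and receiver token balances."""
    return {("eth", sender), (token, sender), (token, receiver)}
```

Two transfers can only clash when these sets share a key, for example when two ERC20 transfers of the same token pay the same receiver.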

Tests used ~1 billion accounts and 256 million transactions, varied by:

Hot Spots (\( \gamma \)): Number of frequently used accounts.
Conflict Chance (\( \alpha \)): Likelihood of using hot accounts.
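
The two knobs can be read as a simple sampling rule, sketched below. The exact distribution used in the benchmark is not given in this article, so this is an assumption: with probability alpha an account is drawn uniformly from the gamma hot accounts (taken here to be account IDs 0 to gamma - 1), and otherwise uniformly from the rest. `pick_account` is a hypothetical name.

```python
import random


def pick_account(rng, num_accounts, gamma, alpha):
    """Pick an account ID; requires 0 < gamma < num_accounts."""
    if rng.random() < alpha:
        return rng.randrange(gamma)            # one of the hot accounts
    return rng.randrange(gamma, num_accounts)  # one of the remaining accounts


rng = random.Random(0)
samples = [pick_account(rng, 1_000_000, gamma=1000, alpha=0.99) for _ in range(10_000)]
hot_share = sum(account < 1000 for account in samples) / len(samples)
```

With these settings, roughly 99% of the sampled accounts fall among the 1,000 hot ones, which is what makes the workload contended.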

Results

FAFO delivers:

High Speed: 1,121,732 TPS (Native) and 565,956 TPS (ERC20) on a 96-core computer.
Smooth Growth: Handles ~130 parallel transactions for Native transfers, growing with up to 96 cores.
Low Cost: Over 1M TPS at $6,013/month, 91% cheaper than sharded systems ($65,361/month).
Handling Conflicts: Keeps 1.1M TPS and 130 parallel transactions even when 99% of requests hit 0.0001% of accounts (\( \alpha = 0.99, \gamma = 1000 \)). Real-world scenarios are less extreme (\( \gamma > 1\text{M} \)).
Low Extra Work: Organizes over 2M TPS without running transactions.
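
Two of the figures above can be checked with simple arithmetic, using the monthly costs and the account count of about 1 billion stated in this section:

```latex
\[
1 - \frac{6{,}013}{65{,}361} \approx 1 - 0.092 = 0.908 \approx 91\%
\qquad
\frac{\gamma}{\text{accounts}} \approx \frac{1000}{10^{9}} = 10^{-6} = 0.0001\%
\]
```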

Blockchains often have “hot spots,” where 0.1% of data gets 62% of access, slowing parallel processing. FAFO outperforms other approaches, as the comparison below shows.

More Context

Related work on blockchain scalability provides essential context for FAFO. This section describes the challenges that existing approaches face and how FAFO's design addresses them.

The "Hotspot Problem" and Its Impact

A fundamental challenge in many blockchains is the "hotspot problem". This occurs when a very small percentage of data (e.g., 0.1% of storage slots) accounts for a disproportionately large percentage of accesses (e.g., 62% of accesses). This high contention significantly limits parallel processing, as numerous transactions attempt to access the same popular data, leading to conflicts and reduced efficiency. This problem underscores the critical need for solutions that can handle such bottlenecks effectively.

Limitations of Existing Approaches

Several approaches have been developed to enhance blockchain throughput, but each comes with its own set of limitations:

Optimistic Concurrency Control (OCC) / Speculative Execution:
- Method: Systems like Block-STM and ParallelEVM apply multi-version OCC to speculatively run transactions in parallel.
- Problems: Block-STM can slow down considerably under high contention due to high abort overhead. ParallelEVM, while offering fine-grained operation-level concurrency, has only shown modest speedups of 4.28x.
Sharding:
- Method: Sharding aims to scale horizontally by splitting the blockchain workload across multiple computers or "shards".
- Problems: This approach introduces complex coordination challenges and new synchronization and storage bottlenecks. For instance, Shardines encountered storage bottlenecks at 30 nodes and exhibited sublinear scaling with diminishing returns, experiencing a 33% drop in efficiency for contentious workloads when the number of shards was doubled.
Lock-based Scheduling on Static Read/Write Sets:
- Method: Solana, for example, avoids transaction aborts by using lock-based scheduling based on static read/write sets.
- Problems: This method offloads the burden of accurately specifying resource access to contract developers, which can be a complex and error-prone task.

FAFO's Differentiating Edge

FAFO significantly distinguishes itself by addressing the shortcomings of these existing solutions:

Versus Block-STM: FAFO organizes transactions ahead of block formation to proactively avoid slowdowns caused by conflicts, rather than correcting them speculatively.
Versus Solana: Unlike Solana, which requires developers to define data needs, FAFO automatically checks conflicts using its innovative ParaBloom mechanism.
Versus Sharding: FAFO achieves comparable, or even superior, throughput on a single node. This single-node efficiency makes FAFO 91% cheaper than state-of-the-art sharding-based approaches, eliminating the complexities and costs associated with sharding.
Versus ParallelEVM: FAFO demonstrates over 100x faster execution through its efficient transaction grouping and scheduling, compared to the limited speedups observed in operation-level concurrent execution.

Broader Context of Transaction Reordering

FAFO's "ahead-of-formation" approach is a key differentiator from "replica-side schedulers" like OptME and DMVCC, which reorder transactions after a block has been disseminated. These systems often incur high overhead from speculative execution or impose a poor user experience by requiring users to provide access lists. Moreover, they cannot easily integrate block producer-side policies. In contrast, FAFO's block producers reorder transactions before block formation, which reduces per-validator CPU overhead and allows for arbitrary scheduler upgrades.

FAFO also operates differently from Hyperledger-style systems, which adhere to an Execute-Order-Validate (EOV) model, as FAFO aligns with the Order-Execute architecture typical of account-based blockchains.

FAFO's Hybrid Approach

FAFO adopts a hybrid approach that strikes a better balance between the extremes seen in other systems. It avoids the burden placed on developers in lock-based systems like Solana and the increased CPU overhead on leaders in purely speculative scheduling approaches. By combining efficient approximate conflict detection (ParaBloom) with lightweight static scheduling (ParaScheduler), FAFO optimizes for both performance and practical usability, maximizing concurrency and minimizing synchronization overhead.

Approach	Method	Problems	FAFO’s Edge
Block-STM [9]	Guesses and corrects conflicts	Slows down with many conflicts	Organizes transactions early to avoid slowdowns
Solana [21]	Locks data for scheduling	Developers must define data needs	Automatically checks conflicts with ParaBloom
Sharding [11]	Splits work across computers	Complex coordination	Single computer, 91% cheaper
ParallelEVM [12]	Fine-tuned parallel tasks	Only 4.28× faster	Over 100× faster with transaction grouping

Unlike systems that organize transactions after blocks are set (e.g., OptME, DMVCC), FAFO plans ahead, saving computer power. Solana burdens developers, and guessing-based systems use extra resources. FAFO blends planning and efficiency.

Conclusion

FAFO processes over 1.1 million EVM transactions per second on one computer, matching sharded systems at 91% lower cost. By organizing transactions early, FAFO maximizes parallel processing, using ParaBloom to spot conflicts and ParaScheduler to run transactions smoothly. This powers apps like decentralized finance (DeFi) for thousands of trades per second or gaming platforms for millions of players. Developers can use FAFO’s open-source code at https://github.com/LayerZero-Labs/fafo to build fast EVM-based blockchains with just a transaction pool and agreement system, cutting costs and boosting decentralization.

Reference

https://layerzero.network/publications/FAFO_Whitepaper.pdf
