Sharding and Horizontal Scalability in Blockchain

Sharding and Horizontal Scalability in Blockchain

As public blockchains evolve from niche payments networks into global platforms for finance, data, and applications, they face a critical trade-off between security, decentralization, and scalability. Early designs replicated every transaction and every smart contract state across all nodes. While this “full replication” model maximizes trustlessness and censorship resistance, it also limits throughput, drives up costs for node operators, and slows transaction confirmations when usage spikes.

Sharding offers a path beyond these limits by slicing the blockchain’s state into parallel segments called shards. Instead of every node processing every transaction, nodes can specialize in a single shard’s data and consensus. When designed correctly, this horizontal partitioning multiplies overall capacity, lowers per-node resource demands, and preserves decentralization.

Sharding Basics

Why Scalability Matters

Blockchains like Bitcoin and Ethereum initially prioritized security and censorship resistance by making every node validate every transaction. In Ethereum’s case, that yields roughly 15 transactions per second (TPS) under ideal conditions. As decentralized finance (DeFi), non-fungible tokens (NFTs), gaming, and other high-throughput use cases emerged, these limits resulted in network congestion, surging gas fees, and slow confirmations, undermining user experience and pricing stability.

Vertical Partitioning vs. Horizontal Partitioning

In database systems, two main partitioning strategies exist: vertical and horizontal. Vertical partitioning divides data by columns or attributes. For example, one table might hold account balances while another stores smart contract code, but every node still manages all rows for its subset of columns. While this can optimize certain queries, it does not reduce the total data volume each node must store.

Horizontal partitioning sharding splits data by rows or records. In a blockchain context, those “rows” can represent account balances, contract storage slots, transaction histories, and state roots. Each shard maintains a mini-blockchain for its assigned accounts and smart contracts. Nodes only store and process the transactions relevant to their shard. By running these shards in parallel, a network of NN shards can theoretically approach NN times the throughput of a single chain, subject to cross-shard communication overhead.

Types of Sharding

  • Network Sharding: The peer-to-peer network forms committees, each responsible for gossiping and relaying messages for a specific shard. This reduces network bandwidth per node.
  • Transaction Sharding: Transactions are routed directly to the committee that holds the relevant state (e.g., the account or contract address). This minimizes unnecessary verification work.
  • State Sharding: The blockchain’s global state, the mapping of addresses to balances, code, and storage, is divided across shards. This is the most complex form because it requires secure cross-shard messaging and robust data availability guarantees.

State sharding carries the promise of order-of-magnitude increases in capacity but introduces challenges around cross-shard atomicity, data availability, and balanced validator assignment.

Ethereum’s Sharding Roadmap

After Ethereum transitions to Proof-of-Stake (the “Merge”), its roadmap pivots from execution-sharding, where each shard would run its EVM, to data-sharding, which focuses on scaling rollups rather than splitting execution logic.

Proto-Danksharding (EIP-4844)

Proto-Danksharding introduces “blobs,” large binary data payloads attached to blocks at a significantly lower gas cost than calldata. These blobs serve as a temporary, high-capacity data layer for Layer 2 rollups to post batched transaction data. Blobs are prunable after a short window (approximately 18 days), minimizing long-term storage growth on Ethereum nodes. By lowering the cost of posting rollup data, Proto-Danksharding drives down Layer 2 fees and lays the groundwork for full data-sharding.

Full Danksharding

Full Danksharding extends Proto-Danksharding by increasing the number of blobs per block (from 1 to up to 64) and integrating a data availability sampling protocol. Key elements include:

  • Data Availability Sampling (DAS), which lets light clients verify that all blob data is available with high probability by sampling random fragments.
  • Proposer-Builder Separation (PBS), which helps mitigate centralization risks in block production while ensuring efficient data inclusion.
  • Shard Committees, small validator subsets assigned each epoch to sign off on blob data and attest to its availability.

Full Danksharding is designed to support rollup throughput over 100,000 TPS by offloading execution entirely off-chain.

Why Data-Sharding over Execution-Sharding

Execution-sharding would require each shard to maintain its own EVM, complicating cross-shard calls, state finality, and validator assignments. Data-sharding instead creates a universal, cost-effective data layer for rollups, leveraging the existing L1 for security and focusing on data availability rather than logic execution. This rollup-centric approach simplifies protocol design, accelerates deployment, and ensures interoperability across diverse Layer 2 solutions.

Polkadot’s Parachains as Parallel Shards

Polkadot embraces horizontal scalability through parachains, independent blockchains that run in parallel and share security via a central Relay Chain.

Architecture and Shared Security

Parachains connect to Polkadot’s Relay Chain, which is responsible for overall consensus, finality, and validator staking. Parachain validators submit block candidates to the Relay Chain for validation and finalization. Once accepted, parachain blocks become part of the shared Polkadot security umbrella. This model marries the throughput benefits of horizontal partitioning with pooled validator security.

Parachains communicate via Cross-Chain Message Passing (XCMP), enabling token transfers, data sharing, and composability across parallel shards. Developers can customize each parachain’s runtime, token economics, and governance, tailoring performance and features to specific use cases without forking the base protocol. For a comprehensive introduction to parachains.

Advantages and Trade-Offs

  • Scalability: With up to 100 parachain slots, Polkadot can process transactions in parallel across its shard-like chains.
  • Customization: Each parachain can implement specialized logic, consensus tweaks, or domain-specific languages, fostering innovation.
  • Interoperability: Native XCMP support removes the need for bridges, reducing trust assumptions and friction.
  • Slot Auctions: Projects bid DOT tokens to secure a parachain slot, aligning economic incentives but potentially raising barriers for smaller teams.

Data Availability Solutions

State sharding and even data-only sharding hinge on robust Data Availability (DA) guarantees. Without ensuring that all shard data is accessible to all honest nodes, an adversary could withhold portions of a shard’s state, preventing verification or enabling invalid state transitions.

On-Chain Data Availability

On-chain DA embeds data directly into the main L1 blockchain. Ethereum’s blobs (Proto-Danksharding) and full Danksharding blobs are prime examples. Data is verifiably published on-chain, and light clients can use DAS to check availability without downloading every byte. The trade-off is increased block size and bandwidth usage for validators.

Off-Chain Data Availability

Off-chain DA decouples data storage from L1 consensus. Key models include:

  • Data Availability Committees (DACs): Permissioned nodes agree to store and serve data for a fee. Used by Validium rollups (e.g., StarkEx), DACs provide high throughput but introduce trust assumptions around committee honesty.
  • Data Availability Layers (DALs): Permissionless networks, such as Celestia, specialize in storing and distributing block data. Anyone can run a node and stake tokens to secure the DA layer. DALs remove trust assumptions, scale independently, and can serve as a universal data layer for multiple blockchains.

Data Availability, Sampling, and Erasure Coding

Data Availability Sampling lets nodes randomly request small shares of an erasure-coded blob. If sufficient distinct shares are retrievable, the entire dataset is reconstructible, guaranteeing availability with minimal bandwidth. Erasure coding further enhances resilience by spreading redundancy across shards—losing some shares does not compromise recoverability.

Conclusion

Horizontal scalability through sharding marks a pivotal evolution in blockchain design. Ethereum’s data-sharding roadmap, centered on rollup-focused Proto and Full Danksharding, promises to unlock unprecedented throughput while maintaining the L1’s security and settlement guarantees. Polkadot’s parachains showcase a modular, customizable approach to shard-like parallelism with native cross-chain interoperability. Meanwhile, specialized data availability solutions from on-chain blobs to off-chain layers like Celestia ensure that shards cannot be withheld or sabotaged.

Yet challenges remain. Cross-shard communication must balance latency and atomicity, validator assignments must guard against collusion, and user-friendly tools are needed to abstract shard complexity for developers. As research into Data Availability Sampling, light-client proofs, and adaptive sharding continues, we can anticipate blockchains scaling to hundreds of thousands of transactions per second without compromising trustlessness.


References


MITOSIS official links:

GLOSSARY
Mitosis University
WEBSITE 
X (Formerly Twitter)  
DISCORD
DOCS