zkTrie
This document describes zkTrie, a sparse binary Merkle Patricia Trie used to store key-value pairs efficiently. It explains the tree structure, construction, node hashing, and tree operations, including insertion and deletion.
You can also explore our zktrie repo.
Tree Structure
zkTrie is a sparse binary Merkle Patricia Trie, depicted in the below figure. Before diving into the Sparse Binary Merkle Patricia Trie, let’s briefly touch on Merkle Trees and Patricia Tries.
- Merkle Tree: A Merkle Tree is a tree where each leaf node represents a hash of a data block, and each non-leaf node represents the hash of its child nodes.
- Patricia Trie: A Patricia Trie is a type of radix tree or compressed trie used to store key-value pairs efficiently. It encodes the nodes with same prefix of the key to share the common path, where the path is determined by the value of the node key.
As illustrated in Figure 1, there are three types of nodes in the zkTrie.
- Branch Node: Given the zkTrie is a binary tree, a branch node has two children.
- Leaf Node: A leaf node holds the data of a key-value pair.
- Empty Node: An empty node is a special type of node, indicating the sub-trie that shares the same prefix is empty.
In zkTrie, we use Poseidon hash to compute the node hash because it’s more friendly and efficient to prove it in the zk circuit.
Tree Construction
Given a key-value pair, we first compute a secure key for the corresponding leaf node by hashing the original key (i.e., account address and storage key) using the Poseidon hash function. This can make the key uniformly distributed over the key space. The node key hashing method is described in the Node Hashing section below.
We encode the path of a new leaf node by traversing the secure key from Least Significant Bit (LSB) to the Most Significant Bit (MSB). At each step, if the bit is 0, we will traverse to the left child; otherwise, traverse to the right child.
We limit the maximum depth of zkTrie to 248, meaning that the tree will only traverse the lower 248 bits of the key. Because the secure key space is a finite field used by Poseidon hash that doesn’t occupy the full range of , the bit representation of the key can be ambiguous in the finite field and thus results in a soundness issue in the zk circuit. After we truncate the key to lower 248 bits, the key space can fully occupy the range of and won’t have the ambiguity in the bit representation.
We apply an optimization to reduce the tree depth by contracting a subtree that has only one leaf node to a single leaf node. For example, in Figure 1, the tree has three nodes in total, with keys 0100
, 0010
, and 1010
. Because there is only one node that has a key with suffix 00
, the leaf node for key 0100
only traverses the suffix 00
and doesn’t fully expand its key which would have resulted in a depth of 4.
Tree Operations
Insertion
When we insert a new leaf node into a zkTrie, there are two cases illustrated in Figure 2.
- When traversing the path of the node key, it reaches an empty node (Figure 2a). In this case, we just need to replace this empty node by this leaf node and backtrace the path to update the merkle hash of branch nodes till the root.
- When traversing the path of the node key, it reaches another leaf node
b
(Figure 2b). In this case, we need to push down the existing leaf nodeb
until the next bit in the node keys of two leaf nodes differs. At each push-down step, we need to insert an empty sibling node into the branch node. When we reach the level where the bits differ, we then place two leaf nodesb
andd
as the left child and the right child depending on their bits. At last, we backtrace the path and update the Merkle hash of all branch nodes.
Deletion
The deletion of a leaf node is similar to the insertion. There are two cases illustrated in Figure 3.
- The sibling node of the to-be-deleted leaf node is a
branch node (Figure 3a). In this case, we can simply replace the node
a
with an empty node and update the node hash of its ancestors till the root node. - The sibling node of the to-be-deleted leaf node is a leaf node (Figure 3b). Similarly to case 1, we first replace the leaf node with an empty node and start to contract its sibling node upwards until its sibling node is not an empty node. For example, in Figure 3b, we replace the leaf node
b
with an empty node. Because the sibling of nodec
now becomes an empty node, we need to move nodec
one level upward and replace its parent. The new sibling of nodec
, nodee
, is still an empty node. So again we move nodec
upward. Now that the sibling of nodec
is nodea
, a leaf node, the deletion process is finished.
Note that the sibling of a leaf node in a valid zkTrie cannot be an empty node. Otherwise, we should always prune the subtree and move the leaf node upwards.
Node Hashing
In this section, we will describe how the leaf secure key and node Merkle hash are computed. We use Poseidon hash with arity 2 for both hashing computations. In Scroll, the Poseidon hash function is configured to take two field element inputs each time and a domain_value
as the initial context for domain separation, denoted as follows.
Empty Node
The node hash of an empty node is 0.
Branch Node
The branch node hash is computed as follows
Leaf Node
The node hash of a leaf node is computed as follows.
The computation involves two fields nodeKey
and valueHash
.
nodeKey
is hashed from the original key. The domain value used in the Poseidon hash is 256.valueHash
is calculated by hashing the leaf node value. The domain value used in the Poseidon hash is256 * n
, wheren
is the number of elements in the leaf node value.
There are two types of leaf nodes: Ethereum accounts and storage key-value pairs. Next, we will describe the calculation method of the node key and value hash for each leaf node type respectively.
Ethereum Account Leaf Node
An Ethereum Account Leaf Node consists of an Ethereum address and a state account data structure. The secure key is derived from the Ethereum address.
A state account struct in the Scroll consists of the following fields (Fr
indicates the finite field and is a 254-bit value)
Nonce
: u64Balance
: u256, but treated as FrStorageRoot
: FrKeccakCodeHash
: u256PoseidonCodeHash
: FrCodeSize
: u64
Before computing the value hash, the state account is first marshaled into a list of u256
values. The marshaling scheme is
The marshal function also returns a flag
value along with a vector of u256
values. The flag
is a bitmap that indicates whether a u256
value cannot be treated as a field element (Fr). The flag
value for state account is 8, shown below.
The value hash is computed in two steps:
- Convert the value that cannot be represented as a field element of the Poseidon hash to the field element.
- Combine field elements in a binary tree structure till the tree root is treated as the value hash.
In the first step, when the bit in the flag
is 1 indicating the u256
value that cannot be treated as a field element, we split the value into a high-128bit value and a low-128bit value, and then pass them to a Poseidon hash to derive a field element value, h(valueHi, valueLo)
.
Second, the value hash is computed as follows.
Storage Leaf Node
A Storage Leaf Node encodes a key-value pair where both key and value are u256
values. The secure key of this leaf node is derived from the storage key.
The storage value is a u256
value. The flag
for the storage value is 1, shown below.
The value hash is computed as follows