This repository contains an implementation of a STARK, and serves as an accompanying codebase to the excellent Diving DEEP FRI in the STARK world blog post by LambdaClass. It is called STARK 102 because it is meant as a follow-up to StarkWare's STARK 101. Ultimately, the goal of this repository is to serve as a stepping stone to understanding and contributing to real-world STARK libraries such as Winterfell.
Please open an issue or start a new Discussion if anything is not clear and requires further explanation. If something is confusing to you, it probably is for many others! We'll gladly use the feedback to improve the documentation.
Specifically, we implement a STARK that proves the following statement:
I computed the following sequence:
over the prime field with prime 17.
Unlike STARK libraries such as Winterfell, this STARK implementation is completely hardcoded to this problem.
This repository builds on STARK 101. Specifically, STARK 101 only shows how the prover works; in STARK 102, we also implement the verifier. STARK 101 also left some nitty-gritty details unexplained (rightly so, to keep the focus on the most important aspects). We try to fill that gap here and explain every little detail, either in this document or in comments in the source code. Also, this implementation is in Rust, the language typically used for production implementations of STARKs.
Similar to STARK 101, this is meant as a resource to learn about STARKs. The goal is not to be efficient; rather, it is to get the whole STARK idea across from start to finish for a simple problem. We agree with LambdaClass that doing a "pen and paper" example of a complex topic is the best way to learn it. Tailoring the implementation to the above-mentioned problem allows the reader to easily play around with the code. For example, we hardcode the domain values for the trace (and low-degree extended) polynomials (see src/domain.rs). If the reader prints out domain values to inspect the program at runtime, they can refer back to the definition of the domain and find their printed value in the source file. We believe this frees the brain to focus on actually learning STARKs; it certainly did for us.
Where appropriate, we choose the simpler of two valid options. For example, we use Lagrange interpolation instead of Fast Fourier Transforms, and FRI instead of DEEP FRI. There are no dependencies other than blake3 for a hash function, and anyhow for convenient errors. We wanted every last detail about what makes STARKs tick to be contained in this repository, whether it's how to compute the logarithm of a field element, how Lagrange interpolation works, or how Merkle tree proof verification actually works. We strongly believe that having everything in one place, where the focus is ease of understanding as opposed to efficiency, is very helpful. This is similar in philosophy to STARK 101. Finally, some loops are unrolled, such as when computing FRI layers. This allows us to give a name to each FRI layer, and makes the number of layers explicit. We believe this can help readers identify shortcomings in their understanding. Maybe they expected there to be 4 layers, when in reality there are 3; they probably wouldn't have realized that if we stored the layers as Vec<FriLayer>.
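To give a flavor of how little machinery Lagrange interpolation needs over such a small field, here is a minimal, self-contained sketch over F_17. It is illustrative only: the repository's `field::BaseField` and `poly::Polynomial` types differ, and the trace values below are made up (only the domain `{1, 13, 16, 4}` matches `DOMAIN_TRACE`).

```rust
// Minimal sketch: Lagrange interpolation over F_17 (not the repo's actual code).
const P: i64 = 17;

fn modp(x: i64) -> i64 {
    ((x % P) + P) % P
}

// Modular inverse via Fermat's little theorem: x^(p-2) mod p.
fn inv(x: i64) -> i64 {
    let mut result = 1;
    let mut base = modp(x);
    let mut exp = P - 2;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    result
}

// Evaluate the Lagrange interpolant of the points (xs[i], ys[i]) at x.
pub fn lagrange_eval(xs: &[i64], ys: &[i64], x: i64) -> i64 {
    let mut acc = 0;
    for i in 0..xs.len() {
        let mut term = ys[i];
        for j in 0..xs.len() {
            if i != j {
                // Multiply by (x - xs[j]) / (xs[i] - xs[j]).
                term = modp(term * modp(x - xs[j]) % P * inv(xs[i] - xs[j]));
            }
        }
        acc = modp(acc + term);
    }
    acc
}

fn main() {
    // Domain {1, 13, 16, 4} as in DOMAIN_TRACE; the y-values are hypothetical.
    let xs = [1, 13, 16, 4];
    let ys = [3, 9, 13, 16];
    // The interpolant passes through every (x, y) pair by construction.
    for (x, y) in xs.iter().zip(ys.iter()) {
        assert_eq!(lagrange_eval(&xs, &ys, *x), *y);
    }
    println!("interpolation matches at all domain points");
}
```

This is exactly the O(n^2) "pen and paper" method; for a 4-point trace over F_17 there is no need for FFTs.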
lib.rs contains the definition of StarkProof, the type that defines what a proof looks like. You should first head over to prover::generate_proof() to see how a proof is constructed. This will introduce you to all our core types, such as field::BaseField, poly::Polynomial, merkle::MerkleTree, etc.
Then, you can head over to verifier::verify() to see how the verifier uses the StarkProof struct to accept or reject a proof.
The test at the bottom of lib.rs demonstrates the usage of the library. You can also run cargo doc --open to generate the docs.
In this section we will discuss important topics in detail. We will focus on the ones that weren't fully explored in STARK 101.
If it is not clear to you why the "commit and query" strategy used in STARKs is a valid way to verify that the prover does indeed have the claimed polynomial, then I recommend you read this article by Vitalik. I found it did a great job of conveying the intuition.
The Channel, defined in src/channel.rs, is the type that implements the Fiat-Shamir transform. You will find an equivalent type in both STARK 101 and Winterfell; it is a core piece of any STARK implementation.
The Fiat-Shamir transform is a widely used technique to convert an interactive protocol into a non-interactive one. STARKs are defined as an interactive protocol turned non-interactive using the Fiat-Shamir transform. I recommend watching the first 7 minutes of this video to get a concise description of the Fiat-Shamir transform.
The Channel works in the following way. Creating a Channel with Channel::new() initializes it with a fixed value (currently the hash of 42). The prover can send messages to the verifier using Channel::commit(). Internally, this updates the Channel's state by hashing the prover's message with its current state. Then, a verifier can send messages back (which as mentioned in the video are defined to be uniformly random values in an Interactive Argument) with either Channel::random_element() or Channel::random_integer(). This works simply by interpreting the Channel's current hash as a field element or an integer, respectively. We then make sure to update the internal hash to a new value so that the random_*() methods can be called multiple times and return different values for each call.
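To make the description above concrete, here is a minimal sketch of such a Channel. It is not the repository's implementation: the real one hashes with blake3, while this sketch substitutes the standard library's `DefaultHasher` to stay dependency-free, and it only models a single `random_element()` method over F_17.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Minimal Fiat-Shamir channel sketch. The repository uses blake3;
// here we substitute std's DefaultHasher to stay dependency-free.
pub struct Channel {
    state: u64,
}

impl Channel {
    pub fn new() -> Self {
        // Initialize with a fixed value (the text uses the hash of 42).
        Channel { state: Self::hash_pair(42, 0) }
    }

    fn hash_pair(a: u64, b: u64) -> u64 {
        let mut h = DefaultHasher::new();
        a.hash(&mut h);
        b.hash(&mut h);
        h.finish()
    }

    // Prover -> verifier message: absorb it into the state.
    pub fn commit(&mut self, message: u64) {
        self.state = Self::hash_pair(self.state, message);
    }

    // Verifier -> prover "random" value: reinterpret the state as a
    // field element in F_17, then advance the state so that repeated
    // calls return fresh values.
    pub fn random_element(&mut self) -> u64 {
        let element = self.state % 17;
        self.state = Self::hash_pair(self.state, 0);
        element
    }
}

fn main() {
    // Two channels fed the same messages derive the same challenge.
    let mut prover = Channel::new();
    let mut verifier = Channel::new();
    prover.commit(7);
    verifier.commit(7);
    assert_eq!(prover.random_element(), verifier.random_element());
    println!("prover and verifier agree on the challenge");
}
```

Note how two channels fed the same commits derive identical "randomness"; this is exactly what lets the verifier re-derive the prover's challenges later on.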
The Channel is a very clean abstraction to turn an interactive protocol into a non-interactive one. You should now go re-read the implementation of prover::generate_proof(), and pay attention to all the calls to channel.commit() and channel.random_element()/channel.random_integer(). In your head, you should now see those as messages being sent back and forth between the prover and the verifier!
Finally, let's turn our attention to how the verifier uses the Channel in verifier::verify(). First, it must interact with the Channel in exactly the same way the prover did. That way, it is guaranteed to draw the same values from Channel::random_element() and Channel::random_integer(). This is critical. Notice that the random values (from random_element()/random_integer()) are not included in the StarkProof; rather, they are re-derived by the verifier. Pause and ponder why this is the only way the verifier can ensure that the values are indeed random, and that the prover didn't pick convenient values. Think about how the prover could trick the verifier if all "random" values were included in the StarkProof instead of being re-derived by the verifier.
You might have noticed that we don't send a MerkleRoot of the last FRI layer of degree 0, and hence we don't send a MerklePath along with the last queried element either.
The last layer (degree 0) has 2 elements that have the same value (remember: a degree 0 polynomial is a constant function f(x) = c). We don't build a Merkle tree for that layer because it is unnecessary: assuming that the prover sends the right value for the last layer to the verifier (if it doesn't, the proof fails anyway), the Merkle root is deterministic and doesn't provide any information.
Try it yourself. Build a Merkle tree of 2 or 4 of the same elements, and build a MerklePath for each of them. Notice that all the MerklePaths are equal. Hence, if we did send a MerklePath along with the element of the last FRI layer, the verifier could take the value it expects for the last layer (here), build the MerklePath itself, and check that it equals the one included in the proof. But it might as well do the equality check on the value directly; why bother with the MerklePath? This is the intuition behind what it means to "not provide any information".
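Here is a self-contained sketch of that experiment (using std's `DefaultHasher` in place of blake3, and a simplified tree; `merkle_path` is an illustrative helper, not the repository's API):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash2(a: u64, b: u64) -> u64 {
    let mut h = DefaultHasher::new();
    a.hash(&mut h);
    b.hash(&mut h);
    h.finish()
}

// Authentication path for leaf `index` in a power-of-two-sized tree:
// the sibling hash at every level, bottom to top.
pub fn merkle_path(leaves: &[u64], mut index: usize) -> Vec<u64> {
    let mut level: Vec<u64> = leaves.to_vec();
    let mut path = Vec::new();
    while level.len() > 1 {
        path.push(level[index ^ 1]); // sibling at this level
        level = level.chunks(2).map(|p| hash2(p[0], p[1])).collect();
        index /= 2;
    }
    path
}

fn main() {
    // Four identical leaves, as in the last FRI layer experiment.
    let leaves = [5u64, 5, 5, 5];
    let paths: Vec<_> = (0..4).map(|i| merkle_path(&leaves, i)).collect();
    // Every authentication path is identical, so the path carries no
    // information beyond the leaf value itself.
    assert!(paths.iter().all(|p| *p == paths[0]));
    println!("all MerklePaths are equal: {:?}", paths[0]);
}
```

Since every leaf is the same, every sibling at every level is the same, so all four paths coincide: the path adds nothing to the leaf value.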
When constructing the query phase of the proof, the prover needs to send, for each FRI layer, the evaluation at the queried point $x$ as well as the evaluation at $-x$. This raises two questions: given the index of $x$ in a layer's domain, what is the index of $-x$ in that same domain? And what is the index of $x^2$ in the next layer's domain?
Both of these problems are solved in STARK 101, but not explained. The goal of this section is to explain why the way we compute these indices is correct.
If the FRI layer has $n$ elements and $x$ sits at index $i$, we claim that $-x$ sits at index $i + n/2 \pmod n$.

The first thing to realize is that computing the index of $-x$ therefore only requires knowing the index of $x$ and the size of the layer. Let's first convince ourselves numerically that the claim holds.
Let's start with the first FRI layer (degree 3), whose domain is the LDE domain:

$LDE = \{3, 10, 5, 11, 14, 7, 12, 6\}$

Here, $-3 = 14$, $-10 = 7$, $-5 = 12$ and $-11 = 6 \pmod{17}$, so the element at index $i + 4$ is the negation of the element at index $i$. We can see that the relationship holds.

Let's look at the domain of the next FRI layer (degree 1), obtained by squaring the elements of the previous domain:

$\{9, 15, 8, 2\}$

Here, $-9 = 8$ and $-15 = 2 \pmod{17}$: the element at index $i + 2$ is the negation of the element at index $i$. The relationship holds once more. The same is true of the last layer's domain $\{13, 4\}$. And indeed, $-13 = 4 \pmod{17}$.
So we've confirmed numerically that our relationship holds. Next, we'll give a proof that the relationship holds. This will give us insight into why it works.
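The numerical checks above are easy to script. This sketch rebuilds each FRI-layer domain from the construction given in the text (the subgroup generated by $g = 9$, shifted by 3, then repeatedly squared) and asserts the relationship at every index:

```rust
const P: u64 = 17;

// Build the LDE domain {3 * 9^i mod 17 : i in 0..8} from the text.
pub fn lde_domain() -> Vec<u64> {
    let (g, shift) = (9u64, 3u64);
    let mut x = 1;
    (0..8)
        .map(|_| {
            let v = shift * x % P;
            x = x * g % P;
            v
        })
        .collect()
}

// Check that in every FRI layer, the element at index i + n/2 is the
// negation of the element at index i.
pub fn relation_holds_in_every_layer() -> bool {
    let mut domain = lde_domain();
    while domain.len() > 1 {
        let n = domain.len();
        for i in 0..n / 2 {
            if domain[i + n / 2] != (P - domain[i]) % P {
                return false;
            }
        }
        // The next FRI layer's domain: squares of the first half.
        domain = domain[..n / 2].iter().map(|x| x * x % P).collect();
    }
    true
}

fn main() {
    assert_eq!(lde_domain(), vec![3, 10, 5, 11, 14, 7, 12, 6]);
    assert!(relation_holds_in_every_layer());
    println!("x[i + n/2] == -x[i] holds in every FRI layer");
}
```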
Recall that the LDE domain is constructed in 2 steps:

- Construct the subgroup $D_0 = \{g^0, g^1, g^2, g^3, g^4, g^5, g^6, g^7\}$, where $g = 9$
- Shift every element in $D_0$ by 3: $LDE = \{3g^0, 3g^1, 3g^2, 3g^3, 3g^4, 3g^5, 3g^6, 3g^7\}$
To make the proof easier to follow, we will prove the relationship over $D_0$ rather than over the shifted domain; multiplying every element by 3 doesn't change the argument. Therefore, we have $x_i = g^i$ for $i = 0, \dots, 7$, and we want to show that $x_{i+4} = -x_i$.

Notice that $g^4 = 9^4 = 16 = -1 \pmod{17}$.

Hence, $x_{i+4} = g^{i+4} = g^i \cdot g^4 = -g^i = -x_i$.
Can you show that $g^{n/2} = -1$ holds for any generator $g$ of a subgroup of even order $n$, without computing the power explicitly?
Lastly, we need to show that this holds for every FRI layer, not just the first one. We'll only give a proof sketch for why this is true. Once again, we will ignore the fact that every element is shifted by 3, because it doesn't change the result and makes the proof easier to read. Remember that we construct the subsequent FRI layer's domain by squaring the elements of the current one, so if the current domain is generated by $g$, the next domain consists of powers of $g^2$.

Convince yourself that the generator of the next domain is $g^2$, and that it generates a subgroup of order $n/2$.

Notice once again that this generator raised to half its subgroup's order is $-1$: $(g^2)^{n/4} = g^{n/2} = -1$. So the same argument applies to the next layer.
This completes the proof sketch. As an exercise, use this proof sketch to write a complete proof by induction.
Let the current layer's domain be $\{g^0, g^1, \dots, g^{n-1}\}$, where $g$ generates a subgroup of order $n$. Squaring every element gives $\{g^0, g^2, g^4, \dots, g^{2(n-1)}\}$.

Remember that by definition of being a generator of a subgroup of order $n$, we have $g^n = g^0 = 1$. Hence, the squared domain is $\{g^0, g^2, \dots, g^{n-2}, g^0, g^2, \dots, g^{n-2}\}$.

Notice that indeed, the first half is the same as the second half. This makes it clear why the index of $x^2$ in the next layer is $i \bmod n/2$, where $i$ is the index of $x$ in the current layer.
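A quick sketch of this observation using the $g = 9$ subgroup from the text: squaring every element of $D_0$ yields a list whose first half equals its second half.

```rust
const P: u64 = 17;

// Squares of the subgroup D_0 = {9^0, ..., 9^7} in F_17 (from the text).
pub fn squared_domain() -> Vec<u64> {
    let g = 9u64;
    let mut x = 1u64;
    let d0: Vec<u64> = (0..8)
        .map(|_| {
            let v = x;
            x = x * g % P;
            v
        })
        .collect();
    d0.iter().map(|x| x * x % P).collect()
}

fn main() {
    let squared = squared_domain();
    // The first half equals the second half: the squared domain has only
    // n/2 = 4 distinct points, so index i in the next layer is i mod n/2.
    assert_eq!(squared[..4], squared[4..]);
    // Note it is exactly DOMAIN_TRACE = {1, 13, 16, 4}, repeated twice.
    assert_eq!(&squared[..4], &[1, 13, 16, 4]);
    println!("squared domain: {:?}", squared);
}
```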
As an exercise (very similar to the previous one), you can show why this is true for every FRI layer.
Q: How does the verifier ensure that the prover interpolated the trace polynomial over the predetermined DOMAIN_TRACE?
The STARK protocol states that the prover should:

1. Run the program and record the 4-element trace $T$
2. Interpolate a polynomial $P_T$ that maps the elements of `DOMAIN_TRACE` to $T$
   - That is, $P_T: BaseField \rightarrow BaseField$ is a polynomial such that $P_T(1) = T[0]$, $P_T(13) = T[1]$, $P_T(16) = T[2]$, and $P_T(4) = T[3]$
3. Evaluate $P_T$ over `DOMAIN_LDE` and use that "extended trace" in the rest of the protocol
This part of the protocol is crucial for the zero-knowledge property. That is, by evaluating over `DOMAIN_LDE` and committing to those evaluations instead of the evaluations over `DOMAIN_TRACE`, we avoid leaking any of the elements of the original trace. However, how does the verifier know that the prover indeed executed step 2 properly? Specifically, how does it know that the prover used `DOMAIN_TRACE` as the interpolation domain, as opposed to any other domain? In other words, what part of the verifier's algorithm will fail if the prover uses a different `DOMAIN_TRACE`? Try to answer on your own!
Show answer
The key is to realize that the original domain is encoded in the boundary and transition constraints. As a reminder,

- boundary constraint: $C_1(x) = \frac{P_T(x) - 3}{x - \texttt{DOMAIN\_TRACE}[0]}$ is a polynomial
- transition constraint: $C_2(x) = \frac{P_T(gx) - P_T(x)}{(x - \texttt{DOMAIN\_TRACE}[0])(x - \texttt{DOMAIN\_TRACE}[1])(x - \texttt{DOMAIN\_TRACE}[2])}$ is a polynomial
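To see the divisibility argument concretely, here is a small sketch over F_17 using synthetic division and a made-up polynomial (not the actual trace polynomial): when $p(1) = 3$ (and $\texttt{DOMAIN\_TRACE}[0] = 1$), $p(x) - 3$ divides cleanly by $x - 1$, while a polynomial with $p(1) \neq 3$ leaves a nonzero remainder.

```rust
const P: u64 = 17;

// Divide a polynomial (coefficients low-to-high) by (x - r) over F_17
// using synthetic division; returns (quotient, remainder).
pub fn div_by_linear(coeffs: &[u64], r: u64) -> (Vec<u64>, u64) {
    let mut quotient = vec![0u64; coeffs.len() - 1];
    let mut carry = 0u64;
    for (i, &c) in coeffs.iter().enumerate().rev() {
        if i == 0 {
            // The remainder equals the polynomial evaluated at r.
            return (quotient, (c + carry) % P);
        }
        quotient[i - 1] = (c + carry) % P;
        carry = quotient[i - 1] * r % P;
    }
    unreachable!()
}

fn main() {
    // Illustrative polynomial with p(1) = 3: p(x) = x^2 + 2.
    // Then p(x) - 3 = x^2 - 1 = x^2 + 16 over F_17.
    let numerator = [16u64, 0, 1]; // coefficients of x^2 + 16, low-to-high
    let (q, rem) = div_by_linear(&numerator, 1);
    assert_eq!(rem, 0); // divides cleanly: C_1 is a polynomial
    assert_eq!(q, vec![1, 1]); // quotient is x + 1

    // A polynomial with p(1) != 3 leaves a nonzero remainder.
    let bad = [0u64, 0, 1]; // p(x) - 3 where p(x) = x^2 + 3, i.e. x^2
    let (_, rem) = div_by_linear(&bad, 1);
    assert_ne!(rem, 0); // not a polynomial: the check fails
    println!("honest remainder = 0, dishonest remainder = {}", rem);
}
```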
So what will go wrong if the prover interpolated over a different domain? Well, this other polynomial (call it $P'_T$) will in general not satisfy $P'_T(\texttt{DOMAIN\_TRACE}[0]) = 3$, and hence $P'_T(x) - 3$ will not be divisible by $x - \texttt{DOMAIN\_TRACE}[0]$: the boundary constraint $C_1$ will not be a polynomial, and the proof will fail.

Follow-up question: Can you identify how FRI will fail when you feed in a rational function that is not a polynomial?
Q: At query time, how does the verifier ensure that the prover sent the trace element at the requested query index?
At query time, the verifier sends to the prover the "query index" (i.e. the index in the extended trace that the prover needs to send to the verifier). The (honest) prover sends that value, along with a proof that the value is indeed in the extended trace. However, the Merkle proof only proves that the value sent is somewhere in the trace, but it doesn't prove that the value is indeed the value at the queried index. How does the verifier then ensure that the prover did indeed send the value at the queried index?
Show answer
The answer lies in the fact that the verifier will use the queried index when verifying the query. Similar to the previous question, we'll only consider the boundary constraint in this answer, but the same reasoning applies to the transition constraint.
- boundary constraint: $C_1(x) = \frac{P_T(x) - 3}{x - \texttt{DOMAIN\_TRACE}[0]}$ is a polynomial
The key lies in remembering that the verifier queries a random $x$ (derived directly from the queried index), which it uses to evaluate the boundary and transition constraints (and all the way down to the last FRI layer). If the prover sent $P_T(x')$ (where $x'$ is the domain element at some other index), the constraint evaluations the verifier computes at $x$ would be inconsistent with the committed FRI layers, and the proof would be rejected.
If this is not totally clear to you, then try it out! Write a malicious prover which when the verifier queries query_idx, the malicious prover supplies the value at index query_idx + 1. This should be a very small modification from the existing prover. Then add print statements to the verifier, run the proof_verification test, and try to get an intuitive sense of exactly where and why things go wrong. Notably, confirm that the Merkle proof verification checks out just fine.
There's nothing like getting your hands dirty to truly understand something, and hence this exercise. We modify the original statement to:
I computed the following sequence:
over the prime field F_p with prime 17, for some public x ∈ F_p.
Essentially, modify the codebase so that the first value in the sequence can be any value in the set {0, ..., 16}; in other words, any element of `BaseField`. Notably, this will require you to:
- Change `constraints::boundary_constraint()` to take a parameter `x: BaseField`
  - Hint: you will need to implement polynomial division
- Make `Channel::new()` take a parameter `x: BaseField`, and initialize the hash using `x` as opposed to `CHANNEL_SALT`
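If you get stuck on the polynomial-division hint, here is one possible sketch of long division over F_17, with coefficients stored low-to-high. It is a generic illustration, not the repository's eventual API:

```rust
const P: u64 = 17;

fn modp(x: i64) -> u64 {
    (((x % P as i64) + P as i64) % P as i64) as u64
}

// Modular inverse via Fermat's little theorem (P is prime).
fn inv(x: u64) -> u64 {
    let (mut result, mut base, mut exp) = (1u64, x % P, P - 2);
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    result
}

// Long division of polynomials over F_17, coefficients low-to-high.
// Returns (quotient, remainder).
pub fn poly_div(num: &[u64], den: &[u64]) -> (Vec<u64>, Vec<u64>) {
    let mut rem: Vec<u64> = num.to_vec();
    let mut quot = vec![0u64; num.len().saturating_sub(den.len()) + 1];
    let lead_inv = inv(*den.last().unwrap());
    while rem.len() >= den.len() && rem.iter().any(|&c| c != 0) {
        let shift = rem.len() - den.len();
        // Cancel the leading term of the remainder.
        let factor = rem.last().unwrap() * lead_inv % P;
        quot[shift] = factor;
        for (i, &d) in den.iter().enumerate() {
            rem[shift + i] = modp(rem[shift + i] as i64 - (factor * d % P) as i64);
        }
        rem.pop(); // leading coefficient is now zero
    }
    (quot, rem)
}

fn main() {
    // (x^2 + 16) / (x + 16), i.e. (x^2 - 1) / (x - 1), over F_17.
    let (q, r) = poly_div(&[16, 0, 1], &[16, 1]);
    assert_eq!(q, vec![1, 1]);          // quotient: x + 1
    assert!(r.iter().all(|&c| c == 0)); // zero remainder
    println!("quotient {:?}, remainder {:?}", q, r);
}
```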