From ZK to Bulletproofs via Python¶
I Motivation¶
Prove that some condition holds true, without revealing anything else, and, even better, do it using only a tiny amount of data!
II Commitments¶
Cryptographic commitments are usually used in situations where you want to promise that something is true before later proving it to be true.
Commitments are only possible because of the existence of one-way functions.
E.g. if Alice wants to commit to the value “2”, she can send Bob its hash (53c234e5e8472b6ac51c1ae1cab3fe06fad053beb8ebfd8977b010655bfdd3c3); the hash is the commitment to the value “2”.
If Alice always sends the same commitment for the value “2”, the scheme lacks semantic security, so instead of committing to “2” directly, Alice may commit to “2” plus some random data (a salt). The commitment then looks like \(\mathcal{H}(secret||salt)\).
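For instance, a minimal sketch in Python (the helper names commit and open_commit are ours):
[ ]:
from hashlib import sha256
from os import urandom

def commit(secret: bytes) -> (bytes, bytes):
    salt = urandom(16)                     # fresh randomness per commitment
    return sha256(secret + salt).digest(), salt

def open_commit(commitment: bytes, secret: bytes, salt: bytes) -> bool:
    return sha256(secret + salt).digest() == commitment

c, salt = commit(b"2")
assert open_commit(c, b"2", salt)          # opens correctly
assert not open_commit(c, b"3", salt)      # can't be opened as another value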
Key properties of any commitment scheme:
Hiding - a commitment \(C\) does not reveal the value it commits to.
Binding - having made the commitment \(C(m)\) to \(m\), you can’t change your mind and open it as a commitment to a different message \(m'\).
III Homomorphic commitments and Pedersen¶
A Pedersen commitment in elliptic curve form¶
\(H\) is a curve point, for which nobody knows the discrete logarithm \(q\) s.t. \(H = qG\).
[3]:
from klefki.curves.secp256k1 import (
    EllipticCurveGroupSecp256k1 as ECG,
    FiniteFieldSecp256k1 as F,
    FiniteFieldCyclicSecp256k1 as CF
)
G = ECG.G
[4]:
import random
# order of the secp256k1 group; named to avoid being shadowed by the
# vector dimension N used later in the notebook
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
def random_cf() -> CF: return CF(random.randint(1, SECP256K1_N) % F.P)
[5]:
q = random_cf()
H = G @ q  # H = qG; in a real setup nobody would know q
[6]:
def C(r: CF, a: CF): return (H@r + G@a)  # Pedersen commitment: C = rH + aG
Homomorphism of Pedersen commitment¶
[7]:
r_1, a_1, r_2, a_2 = random_cf(), random_cf(), random_cf(), random_cf()
[8]:
assert C(r_1, a_1) + C(r_2, a_2) == H@r_1 + G@a_1 + H@r_2 + G@a_2
[9]:
assert C(r_1, a_1) + C(r_2, a_2) == C(r_1 + r_2, a_1 + a_2)
The Vector Pedersen Commitment¶
NUMS-ness and binding¶
NUMS stands for “Nothing Up My Sleeve”. ref: https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number
An easy way to generate such points is to use a hash function like SHA256.
For \(C = rH + aG\), finding two scalar values \(s, b\) with \(s \neq r\), \(b \neq a\) such that \(C = sH + bG\) is impossible, unless the ECDLP (Elliptic Curve Discrete Logarithm Problem) has been cracked.
To extend to a more powerful form of the Pedersen commitment already defined, we go from:
\[C = rH + aG\]
to:
\[C = rH + a_1G_1 + a_2G_2 + \cdots + a_NG_N\]
The individual \(G_i\)s can be constructed using a simple algorithm of the form already mentioned (e.g. take \(\mathcal{H}(encode(G)||i)\), where \(\mathcal{H}\) represents some hash function).
Note: we can extend this yet further to create commitments to 2 or even more vectors;
we just need a set of \(N\) NUMS base curve points for each vector we’re committing to, for example (2 vectors):
\[C = rH + a_1G_1 + \cdots + a_NG_N + b_1H_1 + \cdots + b_NH_N\]
Note here that the single curve point \(H\) is not part of the vector \((H_1, \ldots, H_N)\). (A sketch of this two-vector case follows the next code cell.)
[8]:
from klefki.types.algebra.utils import encode, decode
from hashlib import sha256
[9]:
from functools import reduce

def map2curve(x: CF) -> ECG:
    return G @ x

secret = "sec"
# Derive the NUMS base points G_i: hash each hex digit of H(secret) to a
# scalar, then multiply the generator by it (cf. H(encode(G)||i) above).
Gs = [map2curve(CF(int(sha256(c.encode()).hexdigest(), 16) % F.P))
      for c in sha256(secret.encode()).hexdigest()]

def vgm(a: [CF]) -> ECG:
    # multi-scalar multiplication: sum_i a_i * G_i
    return reduce(lambda p, q: p + q,
                  [g @ s for g, s in zip(Gs, a)])

def Cv(r: CF, a: [CF]) -> ECG:
    # vector Pedersen commitment: C = rH + <a, G>
    return H @ r + vgm(a)
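Though it isn’t used below, the two-vector form mentioned above can be sketched the same way (the names secret2, Hs, vhm and Cv2 are ours; the second basis comes from a different seed, so no discrete-log relations between the two bases are known):
[ ]:
secret2 = "sec2"
# a second, independent NUMS basis for the second vector
Hs = [map2curve(CF(int(sha256(c.encode()).hexdigest(), 16) % F.P))
      for c in sha256(secret2.encode()).hexdigest()]

def vhm(b: [CF]) -> ECG:
    return reduce(lambda p, q: p + q, [h @ s for h, s in zip(Hs, b)])

def Cv2(r: CF, a: [CF], b: [CF]) -> ECG:
    # C = rH + <a, Gs> + <b, Hs>
    return H @ r + vgm(a) + vhm(b)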
Homomorphism¶
[10]:
r_1, r_2 = random_cf(), random_cf()
v_1, v_2 = [random_cf() for i in range(0, 5)], [random_cf() for i in range(0, 5)]
[11]:
assert Cv(r_1 + r_2, list(map(lambda x: x[0] + x[1], zip(v_1, v_2)))) == Cv(r_1, v_1) + Cv(r_2, v_2)
Perfect Hiding and Perfect Binding are incompatible¶
The Pedersen commitment’s hiding property is “perfect”, or “statistical”: even an adversary with unbounded computing power could not tell which value a commitment hides.
But its binding is only computational: it fails for anyone who knows the discrete log \(q\) of \(H\) with respect to \(G\).
Consider: if I have a commitment \(C=rH+aG\), there is another \(r'\) s.t. \(C=r'H+a'G\) for any chosen \(a'\);
it’s just that we can’t find that \(r'\) without the ability to crack the ECDLP. But the fact that it even exists means that the commitment is a valid commitment to any value, for the right \(r\).
So the Pedersen commitment’s binding is not perfect – it is “computational”. What this means is that, much as discussed just above, in a case where the ECDLP was cracked, the binding property could fail and people could open the commitment to other values.
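Since this notebook (unrealistically) generated \(H = qG\) with a known \(q\), we can demonstrate the equivocation directly; a minimal sketch, assuming klefki’s field elements support subtraction and division:
[ ]:
a, r = random_cf(), random_cf()
c = C(r, a)
# knowing the trapdoor q, open c to a different value a_new:
# rq + a = r'q + a'  =>  r' = r + (a - a') / q
a_new = random_cf()
r_new = r + (a - a_new) / q
assert C(r_new, a_new) == c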
IV Zero-knowledge argument of knowledge for a set of vectors¶
There are three key properties of any zero knowledge proof:
- Does an honest Prover succeed in convincing the Verifier? (Completeness)
Classical proofs def: if \(x \in L\), then there exists a proof s.t. \(V(x, \mathit{proof}) = \text{accept}\).
Interactive proofs def (for the graph non-isomorphism protocol, where \(V\) picks a random bit \(b\) and sends \(H\), a random permutation of \(G_b\)): if \(G_0, G_1\) are non-isomorphic, then there is only one \(b'\) such that \(H \approx G_{b'}\). In this case, \(P\) always answers \(b'\) correctly. Hence \(b=b'\) always, and:
\[Pr([P,V](1^n,x;1^n,x)=(1,1))=1\]
- Does the Prover actually **prove** the truth of the statement? (Soundness)
Classical proofs def:
if \(x \notin L\), then for all \(proof^*\), \(V(x, proof^*) = \text{reject}\).
Interactive proofs def:
if \(G_0 \approx G_1\), then \(H\) is equally likely to have come from either graph, so the probability that \(P\) guesses \(b\) correctly in all \(|x|\) rounds is \(2^{-|x|}\), which is negligible.
- Does the Prover reveal nothing other than that the statement is true? (Zero-knowledgeness)
Def: a proof system for a language \(L\) is a zero-knowledge interactive proof \((V_L, P_L)\) for \(L\) if there is a probabilistic algorithm \(S\) (a “Simulator”) that runs in expected polynomial time, such that for every string \(x \in L\) the distribution of outputs of \(S(x)\) is identical to the distribution of views of \(V_L\) of the interaction between \(P_L\) and \(V_L\) on input \(x\).
ref: http://www.cs.tau.ac.il/~canetti/f09-materials/f09-scribe3.pdf
An “argument of knowledge” is a technical term, distinguished from “proof of knowledge”: the proof is only computational – an adversary with enough computing power may be able to convince you that he knows the secret value(s), even if he doesn’t.
Here we just assume that the Verifier of the proof will interact with the Prover in real time.
Our argument of knowledge will come after we have generated a set of commitments, one for each of \(m\) vectors \(x_1, x_2, \cdots, x_m\), each of the same dimension \(N (\neq m)\). Explicitly: \(C_i = r_i H + \mathbf{x}_i \cdot \mathbf{G}\).
[12]:
m = 5
N = 7
# m secret vectors of dimension N, plus one blinding scalar each
x: [[CF]] = [[random_cf() for j in range(0, N)] for i in range(0, m)]
r: [CF] = [random_cf() for i in range(0, m)]
q: CF = random_cf()
H: ECG = G@q
# C_i = r_i * H + <x_i, G>
C: [ECG] = list(map(lambda x: H @ x[0] + vgm(x[1]), zip(r, x)))
\(P\rightarrow V: C_0\) (a new commitment to a newly chosen random vector of dimension \(N\))
\(V\rightarrow P: e\) (a random scalar)
\(P\rightarrow V: (\mathbf{z}, s)\) (a single vector of dimension \(N\), and another scalar)
Note that the summations start at 0; this means that the sums include the extra, random commitment, indexed 0, that was created as the first step of the interaction.
For \(\mathbf{z}=\sum_{i=0}^m e^i\mathbf{x}_i\), i.e. \(z_n = x_{0,n} + e\,x_{1,n} + \cdots + e^m x_{m,n}\), the addition hides the individual values; similarly \(s = \sum_{i=0}^m e^i r_i\).
[13]:
m = 5
N = 7
x: [[CF]] = [[random_cf() for j in range(0, N)] for i in range(0, m)]
r: [CF] = [random_cf() for i in range(0, m)]
q: CF = random_cf()
H: ECG = G@q
C: [ECG] = list(map(lambda x: H @ x[0] + vgm(x[1]), zip(r, x)))
from functools import reduce
P, V = {}, {}
# Step 1
V['C_0'] = C[0]
# Step 2
V['e'] = random_cf()
P['e'] = V['e']
# Step 3
# z: a single vector of dimension N
# z_n = x_0n + e*x_1n + ... + e^(m-1)*x_(m-1)n
P['z'] = [
    reduce(lambda x, y: x + y, [x[i][j] @ (P['e'] ** (i)) for i in range(0, m)])
    for j in range(0, N)
]
P['s'] = reduce(lambda x, y: x + y, [r[i] @ (P['e'] ** (i)) for i in range(0, m)])
V['z'], V['s'] = P['z'], P['s']
Having received this \((\mathbf{z}, s)\), the verifier of course needs to check whether the proof is valid. He does the following:
[14]:
assert reduce(lambda x, y: x+y, [C[i] @ (P['e'] ** (i)) for i in range(0, m)]) == H @ V['s'] + vgm(V['z'])
Completeness¶
For the RHS:
\[Hs + \mathbf{z}\cdot\mathbf{G} = \left(\sum_{i=0}^{m} e^i r_i\right)H + \left(\sum_{i=0}^{m} e^i \mathbf{x}_i\right)\cdot\mathbf{G} = \sum_{i=0}^{m} e^i \left(r_i H + \mathbf{x}_i\cdot\mathbf{G}\right) = \sum_{i=0}^{m} e^i C_i\]
which is exactly the LHS, so the check passes for an honest Prover.
Zero-knowledgeness¶
We deal with zero knowledgeness before soundness, because the latter is the harder proof.
Suppose the Verifier’s execution environment is controlled, being run by a notional entity called a “Simulator” which can simulate a proof without actually having the knowledge. If the distribution of transcripts of such simulated conversations is the same as the distribution obtained for genuine conversations with Prover(s) who do know the opening of the vector commitments, it follows that the Verifier learns zero from the interaction other than the aforementioned single bit.
def:
A “witness” is a piece of (usually secret) data corresponding to a “statement” which the Prover possesses but does not want to reveal.
E.g.: Schnorr’s identity protocol¶
ref: https://crypto.stanford.edu/cs355/19sp/lec5.pdf
The Prover starts with a public key \(P\) and a corresponding private key \(x\) s.t. \(P = xG\). The Prover wishes to prove, in zero knowledge, that he knows \(x\).
\(P\rightarrow V:P=xG\)
\(P\rightarrow V:R\) (a new random curve point, but \(P\) knows \(k\) s.t. \(R=kG\))
\(V\rightarrow P:e\) (a random scalar)
\(P\rightarrow V:s\) (which \(P\) calculated from the equation \(s=k+ex\))
Note: the transcript referred to above would here be \((R, e, s)\).
[15]:
P, V = {}, {}
x = random_cf()
# step 1
P['x'] = x
V['P'] = G @ P['x']
# step 2
P['k'] = random_cf()
V['R'] = G @ P['k']
# step 3
V['e'] = random_cf()
P['e'] = V['e']
# step 4
P['s'] = P['k'] + P['e'] * P['x']
V['s'] = P['s']
transcript = V['R'], V['e'], V['s']
Verification works fairly trivially: the verifier checks \(sG \stackrel{?}{=} R+eP\), which holds for honest behaviour since \(sG = (k + ex)G = kG + e(xG) = R + eP\).
[16]:
assert G @ V['s'] == V['R'] + V['P'] @ V['e']
Zero-knowledgeness proof:¶
The “Simulator”, which controls the execution of the Verifier and is given the public key \(P\) just as the Verifier would be, can fake a valid transcript as follows:
Choose \(s\) randomly. Then, choose \(e\), also randomly. Finally, we only need to choose \(R\) to create a complete conversation transcript; it must be \(R = sG − eP\).
Then we have successfully simulated a conversation which is entirely valid: \((R, e, s)\), without ever knowing the secret key \(x\), and which it’s easy to see is randomly distributed in the same way as a real one would be (\(R\) is a free variable).
[17]:
s, e = random_cf(), random_cf()
R = G @ s - V['P'] @ e
[18]:
assert G @ s == R + V['P'] @ e
Another way of looking at it is that this kind of proof is deniable –
it may be entirely valid to the interacting Verifier, but entirely meaningless (as in this case) to a third party who is shown the transcript later.
For the vector proof of knowledge case:¶
The conversation transcript looks like \((C_0, e, (\mathbf{z}, s))\), which is almost the same, except that the final portion is a vector plus a scalar instead of a single scalar.
And so the same reasoning applies: a Simulator can fake the transcript by choosing the elements out of order.
The only slightly more involved issue here: you choose \((\mathbf{z}, s)\) both at random, as well as \(e\), and you can then deduce the right value of the point:
\[C_0 = sH + \mathbf{z}\cdot\mathbf{G} - \sum_{i=1}^{m} e^i C_i\]
The \(C_1, C_2, \ldots, C_m\) are all set in advance.
Now we fake the proof via random \(\mathbf{z}, s, e\):
[19]:
z = [random_cf() for i in range(0, N)]
s = random_cf()
e = random_cf()
[20]:
q: CF = random_cf()
H: ECG = G@q
[21]:
C_0 = H @ s + vgm(z) - reduce(lambda x, y: x + y, [C[i] @ e ** (i) for i in range(1, m)])
We provide \((C_0, e, (\mathbf{z}, s))\)
[22]:
assert reduce(lambda x, y: x+y, [([C_0] + C[1:])[i] @ (e ** (i)) for i in range(0, m)]) == H @ s + vgm(z)
The entire transcript will look valid to a third party.
Knowledge soundness – does a verifying interaction actually prove knowledge of the vectors?¶
Proving “soundness” is somehow complementary to (“opposite” of) proving zero-knowledgeness, in the following sense: the idea here is to isolate/control the operation of the Prover, as a machine, rather than to isolate the Verifier.
If we can control the Prover’s environment and by doing so get him to spit out the secret information (the “witness”), it follows that he must have it in the first place!
Picture God (the “Extractor”) stealing the secret from the Prover (a machine):
Imagine the Prover is literally a function. You can start running it and stop at any point (imagine debugger breakpoints); crucially, you can make copies of its current state.
E.g. In the Schnorr identity protocol case:¶
We get the Prover to run twice, but only after the same initial random point \(R\). So imagine I, as “Extractor” (what we call the “God” controlling the Prover), create two runs of the protocol with two different challenges \(e_1\), \(e_2\) against the same initial \(R\); then:
\[s_1 = k + e_1x, \quad s_2 = k + e_2x \implies x = \frac{s_1 - s_2}{e_1 - e_2}\]
The secret \(x\) is LEAKED!
Thus the Extractor gets the secret key from two runs of the protocol that happened to share the same “nonce” point \(R\) (remember, \(R = kG\), and it’s the same in both runs here). This is a widely known “exploit” of both Schnorr and ECDSA signatures (when wrongly implemented).
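A minimal sketch of this extraction in code (our own, reusing the Schnorr secret x from above and assuming klefki’s field elements support subtraction and division):
[ ]:
# two runs with the same nonce k but different challenges e1, e2
k = random_cf()
R = G @ k
e1, e2 = random_cf(), random_cf()
s1, s2 = k + e1 * x, k + e2 * x
# subtracting the responses cancels k and exposes the secret
assert (s1 - s2) / (e1 - e2) == x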
Back to the knowledge of a set of vectors case¶
First point: we need to get the Prover to output not just two transcripts, but \(m+1\). This will be enough to prevent the system of equations from being underdetermined, i.e. it will give us a unique solution.
We have the Extractor start the Prover, who generates here a \(C_0\), then provide it with a random challenge \(e\), then retrieve from it a pair \((\mathbf{z}, s)\). Assuming that this response is valid, we can repeat the process a total of \(m+1\) times, resulting in this set of transcripts:
[23]:
def gen_ts():
    # one simulated (C_0, e, (z, s)) transcript, as in the previous section
    z = [random_cf() for i in range(0, N)]
    s = random_cf()
    e = random_cf()
    C_0 = H @ s + vgm(z) - reduce(lambda x, y: x + y, [C[i] @ e ** (i) for i in range(1, m)])
    return (C_0, e, (z, s))
[24]:
ts_set = [gen_ts() for i in range(0, m+1)]
Then the Extractor starts constructing the Vandermonde matrix of the challenges:
[25]:
from numpy import matrix
[26]:
A_rev = [[ts[1] ** i for i in range(0, m + 1)] for ts in ts_set]
The Vandermonde matrix, acting on the column vector of the coefficients of a polynomial, outputs a new column vector which represents the evaluation of that polynomial at each of the points.
This means the inverse of that matrix, if it exists, maps a set of \(m+1\) polynomial evaluations (the polynomial here has degree \(m\)) back to its set of coefficients, and, most crucially, that mapping is one-to-one, so the solution is unique:
a set of \(m + 1\) evaluations fixes a polynomial of degree \(m\).
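To make this concrete, here is a small self-contained sketch (solve_vandermonde is our own helper; it assumes klefki’s field elements support division and equality tests) recovering the coefficients of a degree-2 polynomial over the field from 3 evaluations at random points:
[ ]:
def solve_vandermonde(es: [CF], evals: [CF]) -> [CF]:
    # Gauss-Jordan elimination on the augmented system [V | evals],
    # where V[i][j] = es[i] ** j is the Vandermonde matrix
    n = len(es)
    M = [[es[i] ** j for j in range(n)] + [evals[i]] for i in range(n)]
    for col in range(n):
        piv = next(i for i in range(col, n) if M[i][col] != CF(0))
        M[col], M[piv] = M[piv], M[col]
        inv = CF(1) / M[col][col]
        M[col] = [v * inv for v in M[col]]
        for i in range(n):
            if i != col and M[i][col] != CF(0):
                f = M[i][col]
                M[i] = [u - f * v for u, v in zip(M[i], M[col])]
    return [M[i][n] for i in range(n)]

# p(e) = 3 + 2e + e^2, evaluated at 3 random challenges, then recovered
coeffs = [CF(3), CF(2), CF(1)]
es = [random_cf() for _ in range(3)]
evals = [coeffs[0] + coeffs[1] * e_ + coeffs[2] * e_ ** 2 for e_ in es]
assert solve_vandermonde(es, evals) == coeffs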
To be continued…¶
Aside: the Sigma protocol¶
\(P\rightarrow V:R\) (a new random curve point, but \(P\) knows \(k\) s.t. \(R=kG\))
\(V \rightarrow P: e\) (a random scalar)
\(P \rightarrow V: s\) (which \(P\) calculated from the equation \(s=k+ex\))
The transcript is: \((R, e, s)\)
===== And =====
\(P \rightarrow V: C_0\) (a new commitment to a newly chosen random vector of dim N)
\(V \rightarrow P: e\) (a random scalar)
\(P \rightarrow V:(\mathbf{z}, s)\) (a single vector of dim \(N\), and another scalar)
The transcript is: \((C_0, e, (\mathbf{z}, s))\)
(the first was Schnorr’s identity protocol; the second was the proof of knowledge of a set of vectors).
These are both examples of Sigma protocols, so called because of a vague resemblance to the Greek letter \(\Sigma\), in that the process goes forwards once, then backwards, then forwards finally. The common pattern is this three-step interactive process, which we generalise as something like (a minimal sketch in code follows the list):
\(P\rightarrow V:\) commitment
\(V \rightarrow P:\) challenge
\(P \rightarrow V:\) response (proof)
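As a minimal sketch of that shape in code (the harness sigma_run is our own, instantiated here with Schnorr’s protocol and the secret x from the example above):
[ ]:
def sigma_run(commit, challenge, respond, verify) -> bool:
    comm, state = commit()        # P -> V: commitment
    e = challenge()               # V -> P: challenge
    resp = respond(state, e)      # P -> V: response
    return verify(comm, e, resp)

P_pub = G @ x
assert sigma_run(
    commit=lambda: (lambda k: (G @ k, k))(random_cf()),   # R = kG, keep k as state
    challenge=random_cf,
    respond=lambda k, e: k + e * x,
    verify=lambda R, e, s: G @ s == R + P_pub @ e,
)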
V An inner product proof¶
In Groth’s paper he presents the core algorithm, which probably-not-coincidentally is also the core of Bulletproofs. The inner product proof here uses all the same elements as we’ve discussed above, although in a slightly more complicated structure.
It starts by assuming that the Prover has two vectors \(\mathbf{x}\) and \(\mathbf{y}\), and obviously knows the inner product of those, which we’ll now call \(z\).
The Prover’s job will now be to convince the Verifier that Pedersen commitments to these three quantities obey \(z = \mathbf{x}\cdot\mathbf{y}\); so we assume upfront that the three commitments are known; we’ll call them from now on \(C_z, C_x, C_y\):
1. The commitment step¶
Analogous to the \(R\) value in the two protocols above, the Prover \(P\) will need to send commitments to two nonce vectors (one blinding each of \(\mathbf{x}, \mathbf{y}\)).
These nonce vectors will be called \(\mathbf{d}_x, \mathbf{d}_y\).
Instead of sending \(\mathbf{d}_x, \mathbf{d}_y\) in the clear, the Prover will send Pedersen commitments to them, \(A_d = r_d H + \mathbf{d}_x \cdot \mathbf{G}\) and \(B_d = s_d H + \mathbf{d}_y \cdot \mathbf{G}\):
[27]:
m = 10
x = [random_cf() for i in range(0, m)]
y = [random_cf() for i in range(0, m)]

def inner_product(x: [CF], y: [CF]) -> CF:
    return reduce(lambda x, y: x + y, map(lambda x: x[0] * x[1], zip(x, y)))

z = inner_product(x, y)
H = G @ random_cf()
t, r, s = random_cf(), random_cf(), random_cf()
# the three statement commitments
C_z = H@t + G@z
C_x = H@r + vgm(x)
C_y = H@s + vgm(y)
\(r_d , s_d\) will be random values as usual for Pedersen commitments.
Just as the final Schnorr response is \(ex + k\), so here our final responses are of the form \(e\mathbf{x} + \mathbf{d}\); more specifically, one for each vector:
[28]:
dx = [random_cf() for i in range(0, m)]
dy = [random_cf() for i in range(0, m)]
r_d, s_d = random_cf(), random_cf()
# Pedersen commitments to the nonce vectors
A_d = H @ r_d + vgm(dx)
B_d = H @ s_d + vgm(dy)
However, that’s not enough; we’re trying to prove an inner product, too.
What we’ll also have to do in the commitment step is send a commitment to the expected inner product of the blinded forms of our vectors. The blinded forms have already been mentioned: \(e\mathbf{x} + \mathbf{d}_x\), \(e\mathbf{y} + \mathbf{d}_y\); but we don’t yet know the challenge \(e\), so we have to factor that out somehow.
Now, \(e\) is a linear factor in each of these terms, so dot-producting them, \((e\mathbf{x} + \mathbf{d}_x)\cdot(e\mathbf{y} + \mathbf{d}_y)\), will result in a quadratic in \(e\); there will be three coefficients, and we’ll therefore need to provide commitments in advance for each of these three coefficients.
We know that:
\[(e\mathbf{x} + \mathbf{d}_x)\cdot(e\mathbf{y} + \mathbf{d}_y) = e^2(\mathbf{x}\cdot\mathbf{y}) + e(\mathbf{x}\cdot\mathbf{d}_y + \mathbf{y}\cdot\mathbf{d}_x) + \mathbf{d}_x\cdot\mathbf{d}_y\]
The \(e^2\) coefficient is \(z = \mathbf{x}\cdot\mathbf{y}\), already committed to in \(C_z\), so we’ll therefore need to provide two additional commitments.
Thus we get (Comm stands for Pedersen commitment):
\[C_1 = \text{Comm}(\mathbf{x}\cdot\mathbf{d}_y + \mathbf{y}\cdot\mathbf{d}_x), \quad C_0 = \text{Comm}(\mathbf{d}_x\cdot\mathbf{d}_y)\]
[29]:
t1, t0 = random_cf(), random_cf()
C_1 = H@t1 + G@(inner_product(x, dy) + inner_product(y, dx))
C_0 = H@t0 + G@inner_product(dx, dy)
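As a quick sanity check of that expansion (our own, evaluated at a throwaway challenge e_):
[ ]:
e_ = random_cf()
lhs = inner_product([e_ * x[i] + dx[i] for i in range(0, m)],
                    [e_ * y[i] + dy[i] for i in range(0, m)])
rhs = (e_ ** 2) * z + e_ * (inner_product(x, dy) + inner_product(y, dx)) + inner_product(dx, dy)
assert lhs == rhs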
So to do all this in the commitment step, the Prover had to come up with 4 random scalars \(r_d\) , \(s_d\) , \(t_1\) , \(t_0\) and two random vectors \(\mathbf{d}_x\) , \(\mathbf{d}_y\) and then send 4 Pedersen commitments using this data: \(A_d , B_d , C_1 , C_0\).
The commitments are:
[30]:
(A_d, B_d, C_1, C_0)
[30]:
(EllipticCurveGroupSecp256k1::(FiniteFieldSecp256k1::107296709053369096896019338157260914018838120260700826036278518553079215056466, FiniteFieldSecp256k1::77568923792984566617605484883939490785699525518622088879361106068326887339150),
EllipticCurveGroupSecp256k1::(FiniteFieldSecp256k1::110615271327284575327623265355917185658846175240444091042980971154499916523599, FiniteFieldSecp256k1::87739299285397358992301153180942992237469949801877638244691286647809203570976),
EllipticCurveGroupSecp256k1::(FiniteFieldSecp256k1::63332741828529211778439861392811059766246385579451319379906017464478470924295, FiniteFieldSecp256k1::68926615334820379836895905128027899357626545050561183660245822174315433904240),
EllipticCurveGroupSecp256k1::(FiniteFieldSecp256k1::40310417929055891074976476877034803677213564495612382129320555979658680066630, FiniteFieldSecp256k1::80021093934540213408010330224258582906083447608942586519179154315665352282518))
2. The challenge step¶
Nothing to discuss here – the Verifier simply sends a single scalar value \(e\).
[10]:
e = random_cf()
3. The response step¶
The above detailed discussion will hopefully make the following set of data, sent by the Prover, less bewildering:
[32]:
# blinded forms of the two vectors
fx = [e*x[i] + dx[i] for i in range(0, m)]
fy = [e*y[i] + dy[i] for i in range(0, m)]
# matching blinding scalars
rx = e*r + r_d
sy = e*s + s_d
# opening scalar for the inner-product check
tz = (e**2) * t + e * t1 + t0
Note that here we are sending the blinded forms \(\mathbf{f}_x, \mathbf{f}_y\) of the two vectors, not Pedersen commitments to them – the idea is that the Verifier will verify precisely by reconstructing the commitments and checking that they match \(C_x, C_y\). Those two checks are:
[33]:
assert C_x @ e + A_d == H @ rx + vgm(fx)
assert C_y @ e + B_d == H @ sy + vgm(fy)
[34]:
assert H @ tz + G @ inner_product(fx, fy) == C_z @ (e**2) + C_1 @ e + C_0
Overview:¶
\(P \rightarrow V: (A_d, B_d, C_1, C_0)\)
\(V \rightarrow P: e\)
\(P \rightarrow V: (\mathbf{f}_x, \mathbf{f}_y, r_x, s_y, t_z)\)
The transcript is \(((A_d, B_d, C_1, C_0), e, ( \mathbf{f}_x,\mathbf{f}_y, r_x, s_y, t_z))\)
Completeness¶
The \(r_x, s_y\) were chosen to make the random values equate in the relevant Pedersen commitments. It’s not hard to see that these equations will verify for honest behaviour:
\[eC_x + A_d = e(rH + \mathbf{x}\cdot\mathbf{G}) + (r_dH + \mathbf{d}_x\cdot\mathbf{G}) = (er + r_d)H + (e\mathbf{x} + \mathbf{d}_x)\cdot\mathbf{G} = r_xH + \mathbf{f}_x\cdot\mathbf{G}\]
and similarly for \(C_y, B_d\). To ensure that the inner product is correct, \(t_z\) is needed for the third check:
\[t_zH + (\mathbf{f}_x\cdot\mathbf{f}_y)G = e^2C_z + eC_1 + C_0\]
VI Ref:¶
[1] Adam Gibson, “From Zero (Knowledge) to Bulletproofs”, https://github.com/AdamISZ
[2] Ran Canetti, Alon Rosen, “Interactive Proofs and Zero Knowledge”, lecture scribe notes, Fall 2009, http://www.cs.tau.ac.il/~canetti/f09-materials/f09-scribe3.pdf