We've been trying to understand how Telegram's VoIP end-to-end encryption works.
Key Exchange
A detailed description of the key exchange algorithm is available on the Telegram website.
Here is the crux of the algorithm, taken from the same page:
- A->B: generates a and sends g_a_hash := $hash(g^a)$
- B->A: stores g_a_hash, generates b and sends g_b := $g^b$
- A->B: computes key $(g^b)^a$, then sends g_a := $g^a$ and the fingerprint $f\left((g^b)^a\right)$.
- B: checks $hash(g^a)$ = g_a_hash, computes key $(g^a)^b$ and checks $f\left((g^a)^b\right) = f\left((g^b)^a\right)$.
For the $hash(x)$ function, they use SHA256. The fingerprint $f(x)$ is the lower 64 bits of the SHA1 hash of $x$. All messages between $A$ and $B$ go through the Telegram server, which basically relays them to the other party.
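To make the message flow concrete, here is a minimal sketch of the four steps in Python. It is only an illustration, not Telegram's implementation: the modulus `P` and generator `G` are toy values chosen to keep the example short, the fixed-length big-endian encoding in `to_bytes` is an assumption, and "lower 64 bits" is interpreted here as the last 8 bytes of the SHA1 digest. The helper names (`commit`, `fingerprint`, `run_exchange`) are hypothetical.

```python
import hashlib
import secrets

# Toy parameters (assumption): NOT the group Telegram uses, and not secure.
P = 2**127 - 1  # a Mersenne prime, used only to keep the example short
G = 3


def to_bytes(x: int) -> bytes:
    # Assumed encoding: fixed-length big-endian representation of x.
    return x.to_bytes((P.bit_length() + 7) // 8, "big")


def commit(x: int) -> bytes:
    # hash(x) from the description: SHA256 over the encoded value.
    return hashlib.sha256(to_bytes(x)).digest()


def fingerprint(key: int) -> bytes:
    # f(x) from the description: 64 bits of SHA1 (here: the last 8 bytes).
    return hashlib.sha1(to_bytes(key)).digest()[-8:]


def run_exchange():
    # Step 1, A -> B: A picks a and sends only the commitment g_a_hash.
    a = secrets.randbelow(P - 3) + 2
    g_a = pow(G, a, P)
    g_a_hash = commit(g_a)

    # Step 2, B -> A: B stores g_a_hash, picks b and sends g_b.
    b = secrets.randbelow(P - 3) + 2
    g_b = pow(G, b, P)

    # Step 3, A -> B: A computes the key, then reveals g_a and the fingerprint.
    key_a = pow(g_b, a, P)
    fp_a = fingerprint(key_a)

    # Step 4, B: check the revealed g_a against the stored commitment,
    # compute the key, and compare fingerprints.
    if commit(g_a) != g_a_hash:
        raise ValueError("g_a does not match the commitment")
    key_b = pow(g_a, b, P)
    if fingerprint(key_b) != fp_a:
        raise ValueError("key fingerprints differ")

    return key_a, key_b
```

Note that in this sketch $B$ has to choose `b` and send `g_b` before learning `g_a`; only the SHA256 commitment is available at that point.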
Questions
Why are they using $hash(g^a)$ in the first step? They say that sending $hash(g^a)$ first prevents party $B$ from selecting $b$ such that $hash((g^a)^b)$ has some specific properties. Do such attacks exist?
Why is $f\left((g^b)^a\right)$ sent in the last step? They explain it "as an implementation sanity check". Does that make sense?