Limited Entropy Dot Com Not so random thoughts on security featured by Eloi Sanfèlix

18Apr/101

Crypto Series: Introduction to the RSA algorithm

Posted by Eloi Sanfèlix

After seeing how the ElGamal system works, today we are going to take a look at the RSA public key cryptosystem. The RSA algorithm was first published by Rivest, Shamir and Adleman in 1978 and is probably the most used crypto algorithm today.

Despite this fact, the algorithm seems to have been invented by Clifford Cocks, a british mathematician who worked for a UK intelligence agency. Since this work was never published due to the top-secret classification, the algorithm received its name from Rivest, Shamir and Adleman who were the first to discuss it publicly. A document declassified in 1997 revealed the fact that Clifford Cocks had actually described an equivalent system in 1973.

Let me remind you once again that these posts are not intended to be 100% accurate in a mathematical sense, but an introduction for people who doesn't know much about cryptography. If you want more accurate and complete descriptions, take a crypto book such as the Handbook of Applied Cryptography I've linked in most of my posts :).

Setting up the RSA algorithm

The RSA algorithm is based on the assumption that integer factorization is a difficult problem. This means that given a large value n, it is difficult to find the prime factors that make up n.

Based on this assumption, when Alice and Bob want to use RSA for their communications, each of them generates a big number n which is the product of two primes p,q with approximately the same length.

n = p\cdot q

Next, they choose their public exponent e, modulo n. Typical values for e include 3 (which is not recommended!) and 2^{16}+1 (65537). From e, they compute their private exponent d so that:

e \cdot d \equiv 1 \pmod{\varphi(n)}

Where \varphi(n) is the Euler's totient of n. This is a mathematical function which is equal to the number of numbers smaller than n which are comprimes with n, i.e. numbers that do not have any common factor with n. If n is a prime p, then its totient is p-1 since all numbers below p are comprimes with p.

In the case of the RSA setup, n is the product of two primes. In that case, the resulting value is lcm((p-1)(q-1)) because only the multiples of p and q are not comprimes with n.

Once our two parties have their respective public and private exponents, they can share the public exponents and the modulus they computed.

Encryption with RSA

Once the public key (i.e. e and n) of the receiving end of the communication is known, the sending party can encrypt messages like this:

c = m^e \pmod{n}

When this message is received, it can be decrypted using the private key and a modular exponentiation as well:

m^{\prime} = c^d \pmod{n} = m

Example

sage: p=random_prime(10000)
sage: q=random_prime(10000)
sage: n=p*q
sage: p,q,n
(883, 2749, 2427367)
sage: e=17
sage: G=IntegerModRing(lcm(p-1,q-1))
sage: d = G(e)^-1
sage: G(d)*G(e)
1
sage: m=1337
sage: G2=IntegerModRing(n)
sage: c=G2(m)^e
sage: c
1035365
sage: m_prime=G2(c)^d
sage: m_prime
1337

In the commands above, I first create two random primes below 10000 and compute n. Then I create a IntegerModRing object to compute things modulo lcm(p-1,q-1) and perform the computation of the private exponent as the inverse of the public exponent on that ring.

Next, I create a new ring modulo N. Then I can use the public exponent to encrypt a message m and use the private exponent to decipher the cryptotext c... and it works!

Correctness of RSA encryption/decryption

We have seen it works with our previous example, but that doesn't prove that it really works always. I could have chosen the numbers carefully for my example and make them work.

Euler's theorem tells us that given a number n and another number a which does not divide n the following is true:

a^{\varphi(n)} \equiv 1 \pmod{n}

Therefore, and since e\cdot d \equiv 1 \pmod{\varphi(n)} , for any message m that does not divide n the encryption and decryption process will work fine. However, for values of m that divide n we need to use more advanced maths to prove the correctness.

Another way to prove it is to use Fermat's little theorem and the Chinese Remainder Theorem. I will explain these theorems in my next post and then I will provide a complete proof based on them.

RSA for signing

In the case of RSA, digital signatures can be easily computed by just using d instead of e. So, for an RSA signature one would take message m and compute its hash H(m). Then, one would compute the signature s as:

s = (H(m))^d \pmod{n}

For verifying the signature, the receiving end would have to compute the message hash H(m) and compare it to the hash contained in the signature:

H'(m) = s^e \pmod{n} \equiv (H(m)^d)^e \pmod{n} \equiv H(m) \pmod{n}

Therefore, if the hash computed over the received message matches the one computed from the signature, the message has not been altered and comes from the claimed sender.

Security of RSA

In order to completely break RSA, one would have to factor n into it's two prime factors, p and q. Otherwise, computing d from e would be hard because (p-1) and (q-1) are not known and n is a large number (which means that computing its totient is also difficult).

In a few posts I will show an algorithm to solve the factorization problem. However, another way to break RSA encrypted messages would be to solve a discrete logarithm. Indeed, since c=m^e \pmod{n} , if one solves the discrete logarithm of c modulo n, the message would be recovered.

Luckily, we already know that discrete logs are not easy to compute. And in this case, solving one does not break the whole system but just one message.