## Crypto Series: Discrete Logarithm

From last post, it becomes clear that at this stage we won't be able to make it without some maths. That's because we are dealing now with public key crypto, which is based on difficult mathematical problems (as in difficult to solve, not as in difficult to understand).

With symmetric crypto, we could understand the concepts of diffusion and confusion without needing to dive into maths. On the other hand, here we will need to understand the problems on which the algorithms rely in order to understand how they work.

In this post, we'll see what's the Discrete Logarithm problem, why it is difficult to solve based on a simple intuition, and finally a method to solve this kind of problems. Of course it's not the only (nor the best) existing method, but in my opinion it is the simplest one to understand.

**The Discrete Logarithm problem**

As I said before, the Discrete Logarithm problem is formulated as follows. Given a cyclic group G with a generator g, x is called a discrete logarithm of h over this group if it satisfies the following condition:

in G

So this is the equivalent to a logarithm, but instead of computing it over the real numbers it is computed over a finite cyclic group. And now, if you don't have any background on discrete maths, coding theory and the like, you are probably asking something on these lines: *what the hell does that mean?*

To keep it simple, a finite cyclic group G with a generator g means that the successive powers of g (i.e ) will generate the different elements of the group. At some point, after a finite number of elements ( ) the result will cycle over to the first element (i.e. ), and this is what gives the name to the groups ;-). Now, this value, *n*, is called the *order* of the group and is obviously also the number of elements of the group, or cardinality.

I won't go any further with the explanation of the properties of cyclic groups and all the group theory behind this. I'll just say that a simple example of finite cyclic groups is that of the integers modulo some prime number *p*, excluding the zero element. This groups are usually noted as , where *p *is our prime number and the group order is *p-1*.

For instance, say we look at . Then, for this group we get that the group elements are these:

1,2,3,4,5,6

Since those are all the integers modulo 7. Now, a generator of this group would be for instance *g=3*. You can see that in this case the successive powers of 3 modulo 7 are:

1,3,2,6,4,5,1

And there you have that . Therefore, this is a cyclic group of order *p-1=6*.

**Difficulty of the DL problem**

Now, where is the difficulty of the DL problem? We'll just take an intuitive approach to it. When you think of a *classical* logarithm over the real numbers, it turns out that this is a continuous and monotonous function where if . This means that if you know the logarithm of x, and y is pretty close to it, most likely the logarithm of y will be pretty close to it as well.

But when you look at the discrete logarithm, you can see that the behavior of this problem is not as predictable as that of the logarithm function. For instance, in the example above we have that , but . Extrapolating this to big numbers, you can see that it is probably not very easy to go back from a certain power of a prime number to the exponent itself (i.e., computing the DL).

**Solving Discrete Logarithms: Baby Step Giant Step**

All right, now we get to look at an actual method to compute discrete logarithms. The method is called *Baby Step Giant Step* due to the way we approach the problem: we create a table of *a few* powers of our generator: this are our baby steps. Then we take our target problem, and take big steps (of the size of the generated table) downwards until we hit the table. At that point, we know how many steps we took and we can compute the actual logarithm.

But of course, all this may sound like bullshit until you see an actual example. Let's take the following problem, which uses intentionally small numbers:

Given , compute its discrete logarithm in .

Ok, let's start then. We compute a table of a given size, let's say 100 elements. I used to do it with Mathematica, but I do not have it right now so I'm using for the first time ever the Sage open source program. I advise you to install it, because it will also come handy to verify other examples (such as RSA examples) in the future.

So we start by instantiating our cyclic group, and getting a generator and our value *y*:

`sage: G = IntegerModRing(17627)`

sage: g = G(6)

sage: g.multiplicative_order()

17626

sage: y=G(8938)

Now, we build our list of 100 powers and plot it:

sage: powers = [ g^i for i in range(100) ]

sage: list_plot(powers)

Note that *sage* directly applies modular exponentiation since *g* was created as an element of the finite field we are using for this problem. Also, note that the behavior is not really predictable and after a *big *number it often comes a *small* number, but of course not always. You can observe this behavior in the plot obtained:

Ok, let's continue with our search. First, we know that our number, *y*, is part of the finite field, and therefore we can write it as follows:

Where of course *i *is a number below 100. We can further develop this equation and write it the following way:

And here it comes the magic! If you look at this equation, is actually a member of our table of powers. Further, we can compute , which is easy. Then, we can take *y* and multiply it by *a* and check if the result is on the table. In that case, it means that and we can easily compute x!

If it was not the case (which is likely), then we will have to multiply again by *g* as many times as we need until we hit the table. Let's call that number k. At that point, we've found that . Since we know k (it's the number of times we applied our multiplication!) and *i* (we take it from the table), we can compute *x*.

All this can be translated into the following fragment of sage commands:

`sage: j=1`

sage: while not y*a^j in powers: j += 1

....:

sage: j

79

sage: i = powers.index(y*a^j)

sage: i

70

sage: x=100*j+i

sage: x

7970

sage: g^x == y

True

So what you see above means that after 79 steps we have found the value at position 70 in the list. Therefore, the discrete logarithm of *y* in G is *x=7970*. After that, I compare the *x-th* power of *g* with *y* to be sure that the result is correct, and *sage* returns a True. If you happen to know a bit of Python, you can notice that it has a pretty similar syntax (but not identical).

Of course, *sage* also provides easier ways to do it. You can just type *y.log(g)* to solve the problem here:

`sage: y.log(g)`

7970

**Closing thoughts**

The method explained above is not the only one nor the most efficient. Also, as usual the explanations here do not attempt to be a 100% accurate description of the problem from a mathematical point of view (I'm not a mathematician after all) but rather to explain crypto topics in a simple way so that most people can understand it.

If you want to go further with DL problems, get accurate descriptions of it and understand other methods of solving the problem, you can resort (once again) to the Handbook of Applied Cryptography. Specially chapters 2 and 3 treat this and related subjects, covering maths background and reference problems.

I hope you are enjoying these posts and see you soon!

vierito5February 4th, 2010 - 21:37

Awesome article and great explained 😀

schillyFebruary 16th, 2010 - 18:17

Hi, very cool article – and welcome to sage 😉

Just some picky comments about the python code:

1. you can create the powers list more pythonic:

sage: powers = [ g^i for i in range(100) ]

2. you can visualize it’s “randomness” via list_plot(powers)

3. the while loop is a bit odd, this looks better:

sage: a = g^-100; y=G(8938); j=1

sage: while not y*a^j in powers: j += 1

….:

sage: j

79

Eloi SanfèlixFebruary 17th, 2010 - 19:31

@schilly Cool, thanks for the tips!

I’m not much of a Python guy, I’m more of a C coder. I’ll update the post with your suggestions 🙂

john HowserMay 28th, 2014 - 00:41

can you send me the output of the 100 data points. I am not savvy enough to figure out Sage.

thanks

ChrisMarch 6th, 2015 - 17:16

Can you explain how did you select the number 100 for the number of powers? As far as I know it should be sufficient to take the square root from the highest divisor of p-1, but it would be 36 in this case so it doesn’t work. Thanks!

Eloi SanfèlixApril 2nd, 2015 - 22:02

It’s just an arbitrary number. The bigger the precomputed table, the less steps you need to perform (but the more memory and pre-computation work!).

You can easily see it in the example above. Say you use 1000 powers instead of 100, then you’d need 7 steps instead of 70, and you’d find your result at index 970.

Thus, your final result will still be 7970. Only now you’ll do 7 steps (instead of 70) and at each step you need to perform a search within a list of 1000 points (rather than 100).