[Home]
Random Number Generators
I set out to find the state of the art in random number generation.
It is more difficult than I expected. George Marsaglia is an expert
in the field. One of his
posts explains the problem and gives code for several good
RNGs (latest versions of KISS, MWC256, and CMWC4096). He compares
choosing the "best" random number generator to choosing the best in a
"Miss America Contest". To be politically correct, I'll change his
analogy to choosing the best automobile. If someone asked you which
was the best car to get, how would you answer? The answer depends on
the needs of the user. If he needs to move a lot of stuff around, you
might recommend a pickup truck. If he needs to drive his family
around, you might recommend an SUV, minivan, or 4-door sedan. Price
and performance are also factors to consider. Then, there are the
aesthetic issues. Which car, in terms of looks and design, is the
most attractive to the user? So it goes on and on. And so it goes
with random number generators.
The problem is that random number generators are not truly random.
If you had access to a truly random source on your computer, you could
just use that. Instead, you're trying to find an algorithm to
generate random-looking numbers. As John von Neumann famously
remarked, "Anyone who considers arithmetical methods of producing
random numbers is, of course, in a state of sin." The quest is flawed
from the start.
Generating random numbers for cryptography is very different than
generating numbers for simulation. Most random number generators
are designed to have good statistical properties; they are not
designed to be unpredictable. I'm mainly concerned with RNGs
with good statistical properties. For cryptographically secure
random number generators, look at Yarrow, Blum
Blum Shub, ISAAC, or
one based on a secure digest or cipher. See, for example,
Peter Gutmann's paper on
Software Generation of Practically Strong Random Numbers. The Mersenne Twister FAQ also has suggestions for making it
cryptographically secure.
The experts use the following criteria to measure a random number
generator. See Pierre L'Ecuyer's Random
Number Generation (2004, pdf) for a more thorough discussion of
quality criteria.
- Good Theoretical Basis. The RNG should be
based on an algorithm with provably good statistical properties.
It's fair to ask, "Why should I care about this if my random
number generator passes all the standard statistical tests?" The
problem is that statistical tests cover the tip of the iceberg.
That is, statistical tests are very limited in scope due to
computational limitations. You can't search all numbers produced
by an RNG with a period of 2^131 for statistical anomalies. It's
similar to cryptography. Just because no one has cracked your
cipher, doesn't mean you have a good cipher. Statistical tests
can prove that an RNG is bad; they can't prove that an
RNG is good.
- Long Period. The RNG should have a
"long" period. Every RNG based on a finite state eventually
repeats the numbers it generates. The period should be long enough
to ensure that the RNG does not cycle in practice.
A period of 2^32 or less is too short. It seems like
a period above 2^60 is sufficient.
- "Pass" Empirical Statistical Tests. There
are many tests suites available:
TestU01,
DIEHARD,
SPRNG,
and NIST's Tests. Most
statistical tests compute a p-value which should be uniform over
[0,1). Poor RNGs will fail many tests with p-values at or
very close to 0 or 1. However, I have yet to see an RNG
which passes every test perfectly. As Marsaglia writes, "p happens".
Pierre L'Ecuyer writes, "Building a RNG that passes all
statistical tests is an impossible dream." He
continues: "So we may say that bad RNGs are those
that fail simple tests, whereas good RNGs
fail only complicated tests that are hard to find and run."
- Efficient. Many researchers place a strong
emphasis on efficiency. It's not clear to me why. Obviously, an
RNG which is very cpu-intensive will be impractical. The
emphasis you place on this criterion depends a great deal on your
application for the RNG. If you require generating many
numbers very quickly, then you'll place a high value on
performance. Likewise if you need to implement the RNG in
hardware and have very limited resources. But, for many
applications, efficiency will be low on the list of
priorities.
- Repeatable. For simulation, you would like
an RNG that is repeatable. In other words, given the same
starting state, it should generate the same numbers.
For cryptographers this is less important; they need
RNGs that are unpredictable.
- Portable. In order for an RNG to be
generally useful, it should be portable. That means
that it should be relatively easy to implement on a wide range of
hardware, operating systems, and programming environments.
RNGs which use only 32-bit integers are more portable than
RNGs that use 64-bit integers.
Where To Find More Information
- Pierre L'Ecuyer
is one of the experts in the field. He has an excellent web site
with lots of information on random number generators. Best of all, many of
his published
papers are available there. Especially good are, Random
Number Generation (2004, pdf) and Uniform
Random Number Generation (1994, ps). If you want to experiment with
random number generators, check out TestU01. It has
many random number generator implementations and many statistical
tests.
- The latest version of George Marsaglia's DIEHARD test
suite.
- George Marsaglia's USENET posts. Marsaglia's
1999 post gives code for
several good RNGs. Another
post in 2003 explains the difficulty in finding the "best"
RNG and gives code for several RNGs with very long periods. For
those obsessed with RNGs with very long periods, Marsaglia
provides code for several RNGs with periods much longer than the
famous Mersenne Twister. CMWC4096 has a period of 2^131086.
- pLab is a great
source for information about random number generators.
- PRNG is
portable C implementation of several random number
generators.
- GNU Scientific
Library (GSL) includes several very good random number
generators including Mersenne Twister.
Recommendations
Unfortunately, it depends on what you need it for. For simple
uses, a well-designed linear congruential generator may be
fine. If you need a cryptographically secure RNG, choose one
that was designed to be secure (most RNGs weren't).
Ideally, you should test the RNG yourself. I list several
test suites above. Even better is if you have statistical tests
specific to how you will use the RNG. It's good
to verify that the implementation you're using is good and that
the RNG doesn't exhibit any biases when applied to your type of
application.
There are dozens of very good RNGs. None is perfect, but none has
obvious flaws. None is clearly superior to the others. How do you
choose? It's as if you had chosen the best automobile for your needs
and the salesman asks whether you want it to be black, silver, white,
blue, or red. Which is best?
In Random Number Generation, L'Ecuyer recommends combined MRG's,
combined LFSR's, and Mersenne twisters. GSL's
Manual recommends several RNGs including Mersenne Twister. George
Marsaglia has reservations about the complexity of Mersenne Twister,
but considers it to be a good RNG nonetheless. The criticisms of
Mersenne Twister tend to be aesthetic -- there are many simpler and
faster RNGs with good foundations in theory which perform as well in
statistical tests.
Although some experts may not like the aesthetic design of Mersenne
Twister, I haven't seen any substantial criticisms.
Matsumoto's Mersenne
Twister seems to be a safe choice. It has strong support
from people knowledgeable in the field, a good theoretical basis, a
long period, good performance characteristics, and empirical
statistical support.
[Last Modified: 9 June 2004]