OU blog

Personal Blogs

Richard Walker

Ten Random Digits

Visible to anyone in the world
Edited by Richard Walker, Friday 19 December 2025 at 23:58

Suppose we choose 10 digits 0-9 at random, one after another. For example, we might get

3, 0, 5, 2, 1, 5, 3, 1, 9, 2

How many distinct digits appear?  In the example there were 6 –

0, 1, 2, 3, 5, 

– but the answer could be anything from 1 (all the digits are the same), up to 10 (they are different). 

Can we work out the expected number of distinct digits that appear; how many different digits show up on average?

Consider the probability that 0 does not appear. The probability of the first digit not being 0 is nine divided by 10 , because 9 out of 10 digits aren't 0.  The probability that the second digit is not 0 is the same, and so on, up to the tenth digit.

The probability the none of the digits is 0, i.e. 0 does not appear, is therefore nine divided by 10 dot operator nine divided by 10 dot operator ellipsis dot operator nine divided by 10 equals left parenthesis nine divided by 10 right parenthesis super 10 .

What then is the probability that 0 does appear. Well, this is easy. It either appears or it doesn't, and the two probabilities must therefore add up to 1. Hence the probability of 0 being present in our sequence is 

one minus left parenthesis nine divided by 10 multiplication nine divided by 10 multiplication ellipsis multiplication nine divided by 10 right parenthesis equals one minus left parenthesis nine divided by 10 right parenthesis super 10 , or about 0.651.
 

This is equivalent to saying that the expected number of times 0 will come up is 0.651. But there was nothing special about 0, the calculation would be the same for the other digits.

Now imagine we keep a score of what distinct digits appear, counting 1 if a digit appears and 0 if it doesn't. The probability that a digit appears is 0.651, in which case it will contribute 1 to the total. and 1 - 0.651 = 0.349 that it does not turn up and contributes 0. So it will make a net contribute of 0.651 x 1 + 0.349 x 0 = 0.651.

Now here is a piece of utter magic. There is a theorem, "Linearity of Expectation" which says that to get the overall score across all the digits all we need to do is add up the individual scores! This is actually quite surprising but it's true.

So we can just work out 10 x 0.651 = 6.51 and that is the average number of distinct digits we expect to see.

I always like to simulate these answers, in case I have something horribly wrong, so here is a short Python program


digits = [0,1,2,3,4,5,6,7,8,9]
count = 0
trials = 1000000

for i in range(trials):
    count = count + len(set(random.choices(digits, k=10)))

print('mean number of different digits =', round(count/trials, 2))

And here is the output

mean number of different digits = 6.51

But there is more. I started thinking what if we did the same thing in base 16, with 16 digits 0 - 9, A, B, C, D, E, F? Or used the whole alphabet of 26 letters, or went beyond that?  Suppose we used a n-character alphabet?

This smelt like a situation where Euler's famous number e ≈ 2.71828 might crop and when I investigated, there it was, right on cue. The beautiful result is that with an n-character alphabet, for large n the expected number of distinct characters in a run of n random choices approaches 

n dot operator left parenthesis one minus one divided by e right parenthesis
 
For our original case n = 10 ths would give 6.32, so the value of 6.51 that we found is already quite close.

Permalink
Share post
Richard Walker

Sock Problem Solution

Visible to anyone in the world
Edited by Richard Walker, Thursday 8 May 2025 at 20:15
The question didn't say how many pairs of sock are involved, so suppose we have 2 socks of each colour and n colours. so there are 2n socks altogether.

Then as the socks are drawn one by one from the draw it is possible that socks 1 and 2 match, socks 2 and 3 match, and so on, up to socks 2n - 1 and 2n. In all there are 2n - 1 possible pairs of consecutive socks whose colour could match.

In each consecutive pair once the colour of the first sock is known there is then a 1/(2- 1) probability that the next sock with be the same colour, or in other words the mean expectation of the colours matching is 1/(2- 1).

But this applies to all 2n - 1 pairs and by the principle known as linearity of expectation we can find the overall expectation simply by adding the 2n - 1 individual expectations. But (2n - 1) x 1/(2- 1) = 1 and so we conclude the expected number of matches is 1, whatever the number of pairs of socks in the draw!

I wrote a Python program and ran it, and sure enough as predicted the result is always close to 1.

n = 20
trials = 10000
sox = list(range(n)) * 2
matches = 0
import random
for i in range(trials):
    random.shuffle(sox)
        for j in range(2*n - 1):
        if sox[j] == sox[j + 1]: # do socks match?
            matches = matches + 
print('Mean =', matches/trials)
Permalink
Share post
Richard Walker

Linearity of Expectation Again

Visible to anyone in the world

Linearity of Expectation can pop in surprising places.

Buffon's needle is a famous experiment first published by the Comte de Buffon in 1777. His idea was to repeatedly throw a short needle on to an array of equally spaced parallel lines and count how often the needle crossed a line. This can then be used to estimate the value of the famous number π, a rather surprising fact.

Suppose the lines are spaced 1 unit apart and the needle is of length L less than 1, so the needle will cross at most one line. Then it can be shown the average number of crossings is 2 x L/π.

The standard way to work this out uses calculus, but reading around Linearity of Expectation I came a cross a far simpler way.

In the sketch below the circles are of diameter 1 and you can see that a circle however positioned will have exactly 2 points in common with the array of paralle lines.


So we can say immediately that the expected number of crossings when a circle is placed on the array of parallel lines is 2.

Now imagine replacing the circle with a polygon made up of many tiny needles each of identical length L. The polygon will only be an approximate circle, but if the needles are very short the approximation will be good.

Given the circle has diameter 1 its circumference will be π x 1 = π and the approximating polygon will require π / L needles. Suppose we define an indicator which is 1 if a given needle crosses a line and 0 otherwise. Let the expected value of this indicator variable be E. This is the same for all the needles; it represents the average number of crossings, not the actual number of crossing by any particular needle.

Using linearity of expectation and adding this expectation E across all the π / L needles we get E x π / L and this must be the expected number of crossing for the circle, because the needles taken together make up the circle! So E x π / L = 2.

Rearrange this slightly and voilà! - we have Buffon's formula.

E = 2 x L / π

This extremely neat and simple approach was found by Barbier as long ago as 1860 but I only stumbled across it a couple of days back.

We need to do a little more work before we have a complete proof, because we've assumed L is very small. What if it's not? I'll add something in the comments about this.



Permalink 1 comment (latest comment by Richard Walker, Friday 6 September 2019 at 18:48)
Share post
Richard Walker

Solution To 'Dermatoglyphics' Question

Visible to anyone in the world

The question at

https://learn1.open.ac.uk/mod/oublog/viewpost.php?post=221531

asked for the average, i.e. the mean, number of letters that would be left unchanged in position if the letters of 'Dermatoglyphics' are scrambled at random. The answer is 1 and surprisingly, to me anyway, the answer would be exactly the same for any other word with no repeated letters, whatever its length. Even though I know why this is it still amazes me, it seems to go against intution.

We can try this out with short word.

1 letter - 'a'
There is 1 arrangement, with 1 letter in the right place.
So the average is 1.

2 letters - 'at'
Now there are 2 arrangements.
'at' with 2 letters in the right place.
'ta' with 0 letters in the right place.
So the average is 2 + 0 = 2, divided by 2, which is 1.
 
3 letters - 'cat'
Now there are 6 arrangements.
'cat' with 3 letters in the right place.
'cta' with 1 letter in the right place.
'act' with 1 letter in the right place.
'atc' with 0 letters in the right place.
'tca' with 0 letters in the right place.
'tac' with 1 letter in the right place.
So the average is 3 + 1 + 1 + 0 + 0 + 1 = 6, divided by 6, which is 1.

Doing it this way will rapidly become hard work, but luckily there is something called 'linearity of expectation' which provides a general argument why this average will always be 1. For anyone interested, I'll put something in the comments tomorrow.

Permalink 5 comments (latest comment by Richard Walker, Sunday 1 September 2019 at 22:04)
Share post

This blog might contain posts that are only visible to logged-in users, or where only logged-in users can comment. If you have an account on the system, please log in for full access.

Total visits to this blog: 3474450