In this worksheet we will write some functions for simulating throws of dice and investigate random number generation.

R has plenty of random number generators.

- Uniform(0,1) with
`runif(10)`

:`0.6515, 0.2056, 0.7945, 0.7793, 0.7888, 0.152, 0.4109, 0.7085, 0.1717, 0.5241`

- Normal(0,1) with
`rnorm(10)`

:`0.4415, 0.6087, 0.1816, -0.0075, 0.08, -0.5867, 2.1508, -0.0653, -1.0329, 0.1495`

- Poisson(lambda) with
`rpois(10,lambda=3)`

:`3, 2, 3, 4, 1, 3, 5, 2, 1, 3`

- ...and many more.

Each of these has parameters to control the mean, standard deviation, and whatever other parameters the distribution has.

How do we simulate single die rolls?

d1 = as.integer(runif(1, 1, 6)) d1

[1] 5

Well possibly. But let's roll a few more and tabulate them to check:

table(as.integer(runif(10000, 1, 6)))

1 2 3 4 5 2013 2010 1917 2046 2014

Oops, we seem to have rolled a d5. We didn't read the help for `runif`

well enough. The upper limit is excluded
and we round down. Next try:

table(as.integer(runif(10000, 1, 7)))

1 2 3 4 5 6 1701 1624 1672 1676 1645 1682

Much better.

Another method is to use the `sample`

function:

```
sample(1:6, 1)
```

[1] 6

This takes a random sample of 1 value from its first argument. With a slight tweak we can take multiple values:

table(sample(1:6, 10000))

Error: cannot take a sample larger than the population when 'replace = FALSE'

table(sample(1:6, 10000, replace = TRUE))

1 2 3 4 5 6 1627 1678 1593 1661 1679 1762

We should probably wrap this into a function. The input is just the number of single die rolls to do, and the return value is a vector of that length.

die = function(n) { return(sample(1:6, n, replace = TRUE)) }

check all the following work:

die(1) die(3) table(die(5)) table(die(1000))

Good functions should probably cope with bad input. What does yours do with these:

die(0) die(1, 2, 3)

Error: unused argument(s) (2, 3)

die(pi) die(-1)

Error: invalid 'size' argument

How much of this behaviour is expected?

Let's fix some things with a simple check:

die = function(n) { if (n < 1) { stop("n is less than one") } return(sample(1:6, n, replace = TRUE)) } die(0)

Error: n is less than one

What other conditions should you check for?

Now we can roll one die a thousand times. It's distribution looks flat:

plot(table(die(10000)))

What if we sum two dice? We can just call our function twice and add the resulting vectors - remember R adds each element, and as long as our vectors are the same length this will work as planned.

plot(table(die(10000) + die(10000)))

This gives us a neat triangular distribution peaking at seven. Let's extend this to more dice:

par(mfrow = c(2, 2)) plot(table(die(10000) + die(10000))) plot(table(die(10000) + die(10000) + die(10000) + die(10000))) plot(table(die(10000) + die(10000) + die(10000) + die(10000) + die(10000))) plot(table(die(10000) + die(10000) + die(10000) + die(10000) + die(10000) + die(10000) + die(10000)))

par(mfrow = c(1, 1))

Repetition like that is a pain to write, and a pain to debug. Let the computer do the work. We'll
now write a `dice`

function to throw multiple dice and return the sum.

- Input:
- T, number of throws
- N, number of dice in each set

- Output: vector with length equal to number of throws containing sums of sets.

There's various ways to do this. The neatest may be to create a 2-d matrix of die scores and then add up the rows. R has some functions for this:

m = matrix(die(33), 3, 11) m

[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [1,] 6 3 3 3 6 4 6 3 6 5 4 [2,] 1 4 1 4 4 6 3 2 6 2 1 [3,] 4 5 1 6 3 6 5 5 1 2 1

```
apply(m, 2, sum)
```

[1] 11 12 5 13 13 16 14 10 13 9 6

This creates a 3x11 matrix of single die rolls and then uses `apply`

to sum along dimension 2, which is columns.
The `apply`

function is a general purpose function for doing anything along rows or columns
of a matrix, in this case the `sum`

function.

We can actually use `colSums`

here. Let's build our function. We set
a default of ten throws of two dice.

dice = function(T = 10, N = 2) { m = matrix(die(N * T), N, T) return(colSums(m)) } dice()

[1] 10 7 7 10 4 7 8 9 3 7

```
dice(5, 10)
```

[1] 36 40 32 35 37

What happens if you give nonsensical inputs to this function?

Ever wanted to throw 100 dice a thousand times?

lots = dice(1000, 100) hist(lots)

There's some statistical theory that says the more dice you roll, the more it looks like a Normal distribution. Let's do a qqplot.

```
summary(lots)
```

Min. 1st Qu. Median Mean 3rd Qu. Max. 298 339 351 350 362 402

qqnorm(scale(lots)) # rescale our dice to mean 0, sd 1

While we're rolling dice, let's look at Yahtzee, a dice game that is a trademark of Hasbro. In it the player rolls five dice and tries to get certain patterns. For example getting all five dice the same number is a "Yahtzee" and scores fifty points. A triple and a pair is a "Full House" and scores 25 points. Let's write some code to check for some of the Yahtzee patterns in a vector of dice rolls.

What are our functions going to look like? They should take a vector of length 5 of values from 1 to 6 and return TRUE or FALSE.

is.yahtzee = function(d6) { } is.fullhouse = function(d6) { } is.longstraight = function(d6) { }

and so on. Let's take these in order.

To score a yahtzee, all the dice are the same. Which means all the dice are the same as the first one.
We use the `all`

function, which returns a single TRUE if fed a vector of all TRUE values.

all(c(TRUE, TRUE, TRUE))

[1] TRUE

all(c(TRUE, FALSE, FALSE))

[1] FALSE

all(c(FALSE, FALSE, FALSE))

[1] FALSE

is.yahtzee = function(d6) { return(all(d6 == d6[1])) } is.yahtzee(c(1, 2, 3, 4, 5)) # no

[1] FALSE

is.yahtzee(c(3, 3, 3, 3, 3)) # yes

[1] TRUE

is.yahtzee(c(3, 3, 3, 3, 6)) # no

[1] FALSE

To test for this we can tabulate the dice rolls, sort the table, and see if we get (0,0,0,0,2,3). We have to convert the dice roll into a factor with six levels to get zeroes in the table.

is.fullhouse = function(d6) { return(all(sort(table(factor(d6, levels = 1:6))) == c(0, 0, 0, 0, 2, 3))) } is.fullhouse(c(1, 2, 1, 2, 2)) # yes

[1] TRUE

is.fullhouse(c(1, 2, 1, 2, 3)) # no

[1] FALSE

is.fullhouse(c(6, 6, 5, 5, 6)) # yes

[1] TRUE

is.fullhouse(c(3, 3, 3, 3, 3)) # no....

[1] FALSE

Is that that last result correct? Is a yahtzee also a full house? It is a triple and a pair, just both of the same number. This is often a source of arguments among children, and without source code how can they be sure? Let's change our function to allow it.

is.fullhouse = function(d6) { return(is.yahtzee(d6) | all(sort(table(factor(d6, levels = 1:6))) == c(0, 0, 0, 0, 2, 3))) } is.fullhouse(c(1, 1, 1, 1, 1)) # yes

[1] TRUE

is.fullhouse(c(1, 2, 1, 2, 1)) # yes

[1] TRUE

is.fullhouse(c(1, 1, 1, 1, 6)) # no

[1] FALSE

This is a good point to mention the `testthat`

package. It lets you write a
set of tests which you can run to make sure that changing your function doesn't break any of
the functionality you expect. The tests would be a set of expressions and expected values (or expected errors)
which you can run with a single function. If anything returns a different value to that specified in the
test, or if it returns an error when you didn't expect one, or if it **doesn't** return an
error when you specified that it darn well **should**, this will get flagged up.

Here's a few tests that should run without errors. Note I'm testing for an expected error in the last line.

```
require(testthat)
```

expect_true(is.yahtzee(c(1, 1, 1, 1, 1))) expect_false(is.yahtzee(c(1, 1, 1, 1, 2))) expect_error(die(0))

A long straight is either 1,2,3,4,5 or 2,3,4,5,6. We could sort the dice and check for these two conditions, or we could check the sorted differences are all equal to one.

is.longstraight = function(d6) { return(all(diff(sort(d6)) == 1)) } is.longstraight(c(4, 3, 2, 3, 4)) # no

[1] FALSE

is.longstraight(c(5, 4, 3, 2, 1)) # yes

[1] TRUE

is.longstraight(c(1, 2, 3, 4, 5)) # yes

[1] TRUE

is.longstraight(c(6, 2, 3, 4, 5)) # yes

[1] TRUE

Finally let's write some code to see how common various patterns are.

First let's write a function to return a matrix of yahtzee dice throws, default 10

throwfive = function(N = 10) { return(matrix(die(N * 5), N, 5)) } throws = throwfive(3) throws

[,1] [,2] [,3] [,4] [,5] [1,] 2 6 6 5 2 [2,] 5 5 6 2 6 [3,] 3 4 4 3 3

Each row here is a set of five dice. We can now apply our test functions to rows using the `apply`

function. We're going across rows now, so we use 1 as the dimension index.

```
apply(throws, 1, is.yahtzee)
```

[1] FALSE FALSE FALSE

```
apply(throws, 1, is.fullhouse)
```

[1] FALSE FALSE TRUE

```
apply(throws, 1, is.longstraight)
```

[1] FALSE FALSE FALSE

Let's scale this up. A thousand throws... Then tabulate how many full houses and long straights we got:

throws = throwfive(1000) table(apply(throws, 1, is.fullhouse))

FALSE TRUE 964 36

table(apply(throws, 1, is.longstraight))

FALSE TRUE 962 38

By simple division we can work out the probability.

Did we get any Yahtzees??

if (any(apply(throws, 1, is.yahtzee))) { print("Yahtzee!!") }

How many? Work out the probability of throwing a Yahtzee numerically. Note this is just from one throw of five dice.

Write some more functions for checking the other yahtzee score throws (see Wikipedia for details). Note that four-of-a-kind is also three-of-a-kind, and a short straight is also a long straight.

The `throwfive`

function looks a bit like the `dice`

function. How could you rewrite
the code to avoid duplication?

Here's another interesting thing. Passing a function to a function:

applyDice = function(T, N, f) { d = matrix(die(N * T), N, T) return(apply(d, 2, f)) } applyDice(20, 5, sum)

[1] 18 14 13 14 14 19 22 17 20 15 16 17 20 19 17 17 18 20 22 19

```
applyDice(20, 5, is.fullhouse)
```

[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE [20] FALSE

When playing Yahtzee you throw five dice, then you can keep some and rethrow the others in a bid to improve your score, and then you can do that again. Simulate that in R, and work out the probability of getting yahtzee in real play! What's the best strategy for rethrows? My code for this has a 'verbose' argument that says what it is doing, and returns -1 if it fails or the number of throws it took to get a Yahtzee.

getYahtzee(verbose = TRUE)

First throw... Got 3 of a kind Second throw... Got 3 of a kind Last chance... No luck this time :(

[1] -1

getYahtzee(verbose = TRUE)

First throw... Got 2 of a kind Second throw... Got 2 of a kind Last chance... No luck this time :(

[1] -1

getYahtzee(verbose = TRUE)

First throw... Got 2 of a kind Second throw... Got 3 of a kind Last chance... Yahtzee! Last roll!

[1] 3

Now repeat a thousand times:

y = rep(0, 1000) for (i in 1:1000) { y[i] = getYahtzee() } table(y > 0)

FALSE TRUE 957 43