Neat method for calculating how long it takes for a sequence of coin flips to appear

Taken from Probability in the Real World by David Aldous.

There are a bunch of math puzzles of the form “What is the expected number of times to flip a coin until XXXXX appears” where XXXXX is some sequence of Heads and Tails. (See here for example, and I know a few financial trading firms like to use them as interview questions). Here is a neat argument for calculating them.

(Using David’s example)

What is the expected number of flips of a coin until the sequence HTHT appears?

Consider the following betting strategy on a sequence of coin flips, where you either double your money or lose it all:

1/ Bet 1 on the first flip being Heads, if win, continue
2/ Bet 2 (your winnings) on the second flip being Tails, if win, continue
3/ Bet 4 (your winnings) on the third flip being Heads, if win, continue
4/ Bet 8 (your winnings) on the fourth flip being Tails.
5/ Stop, you’ve either won 16, or lost all your money.

If a new person starts playing this on every flip, and the sequence HTHT appears after K flips, how much does the house gain / lose?

Well, the house takes in K coins from the initial bets (one new player per flip). It pays out 16 to the player who got all 4 correct, and 4 to the player who started two flips before the end and has the first two (HT) correct so far; everyone else has lost their stake. ie the house’s net gain is K-20.

The expected winnings should be 0 (since the game is fair), therefore E[K-20] = 0, ie the expected value for K is 20.
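The same payout argument works for any pattern: a player who started k flips before the end is still holding 2^k exactly when the last k flips of the pattern match its first k flips. Here is a minimal sketch of that rule, with a Monte Carlo check alongside. The function names, trial count and seed are my own choices for illustration, not from the book.

```python
import random


def expected_flips(pattern: str) -> int:
    """Expected number of fair-coin flips until `pattern` first appears.

    Sums the house's payouts: 2**k for every k where the first k
    characters of the pattern equal its last k characters.
    """
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[:k] == pattern[-k:])


def simulate_flips(pattern: str, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity (illustrative check only)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        window, flips = "", 0
        while window != pattern:
            # Keep only the most recent len(pattern) flips.
            window = (window + rng.choice("HT"))[-len(pattern):]
            flips += 1
        total += flips
    return total / trials


if __name__ == "__main__":
    print(expected_flips("HTHT"))   # payouts of 4 and 16
    print(simulate_flips("HTHT"))   # should land close to the value above
```

For HTHT only k = 2 and k = 4 match, which gives the payouts of 4 and 16 from the argument above.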

Lawyers and Central Bankers

central-bankers

Powell has been confirmed by the US Senate as next Federal Reserve Chair, starting 3 February. Powell is not an economist, but a lawyer. The head of the German Reichsbank in the Weimar hyperinflation was not an economist, but a lawyer. It is a good thing correlation does not equal causation.

Paul Donovan at UBS

The Fed’s last lawyer chairperson (William Miller) didn’t have a great record either. The Bank of England’s lawyer-leaders didn’t fare so badly.

Overthinking Superbowl Squares

Superbowl Squares is pretty simple. (And I’m going to analyse an especially simple version). Take a 10×10 grid and assign the numbers 0-9 to each row and each column (at random), then assign one team to the rows and the other team to the columns.

grid

The winner is the person who picks the last digit of each team’s score at the end of the match (alternatively each quarter). e.g. if the score was 18-12 to the Patriots, then the person who picks the orange “x” cell wins the pot.

Typically you pick the squares before knowing which number is assigned to each column. Why? Because some squares are probably more likely than others – it’s impossible for the score to be 1-0, so the cell (1,0) is probably less likely to win. How much more or less likely? Well, that’s what we’re going to find out.

Spoilers: tl;dr If you get a choice, pick one of 7-0, 7-4, 4-7 to the favoured team.

Calculation 1 – Simple calculation

We can look at a long history of NFL games, and check out what the final scores were in each of those games. Then calculate the frequency of each square. Using the data from pro-football-reference.com (PFR), we can calculate this fairly simply.

simplest-grid

The best squares are those with scores of the form x7-x0. (For example, the most popular score is 20-17, which occurs in 260 games).

However, we should be able to do quite a bit better than this. For one thing, this grid is symmetric, and given that the Patriots are 2-1 on (~66% chance of winning), we might expect the grid to be less symmetric than this.

Calculation 2 – using simple winner model
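For reference, the frequency grid in Calculation 1 can be sketched as below. This assumes the historical results are available as (winner points, loser points) pairs, and that the symmetric grid comes from counting each game in both orientations; both are my assumptions. The sample scores are made up purely to show the shape of the input and are not PFR data.

```python
def last_digit_grid(scores):
    """Build a symmetric 10x10 grid of last-digit frequencies.

    `scores` is an iterable of (winner_points, loser_points) pairs.
    Each game adds half its weight to (a, b) and half to (b, a), so
    the grid ignores which team is on the rows (an assumption).
    """
    grid = [[0.0] * 10 for _ in range(10)]
    games = 0
    for winner_points, loser_points in scores:
        a, b = winner_points % 10, loser_points % 10
        grid[a][b] += 0.5
        grid[b][a] += 0.5
        games += 1
    # Convert counts to frequencies.
    return [[cell / games for cell in row] for row in grid]


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    sample = [(20, 17), (27, 24), (13, 10), (18, 12)]
    grid = last_digit_grid(sample)
    print(grid[0][7], grid[7][0])
```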

winner

Here is the same grid as before, but conditioned on “Patriots win”. Compared with the symmetric grid, conditioning on a win makes 4-3 relatively more likely and 3-4 relatively less likely. (7-1 becomes relatively less likely: these are only last digits, so we shouldn’t necessarily expect “larger digit – smaller digit” squares to become more likely.)

A simple model would be to take the odds for the match, and do a weighted average of the winner grid and the loser grid. This appears as follows:

weighted-winner

This still looks fairly similar to the original grid, so let’s look at the differences to see which squares are most improved.

weighted-winner-diff

So the biggest winners (red) are 1-0, 4-0, 3-0, 8-7, 4-3, 1-7. (NB, the difference matrix is skew-symmetric by construction*).

It appears as though the loser scoring a multiple of 10 (including 0) is more likely. I might look into why at a later date.
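The weighted grid and its difference from the symmetric grid can be sketched as follows. Here `winner_grid[i][j]` is assumed to hold the frequency of “winner’s last digit i, loser’s last digit j”, with the favoured team on the rows. The function names are mine, and the difference is taken as new estimate minus symmetric grid, which is my choice of sign. The skew-symmetry holds with either sign.

```python
def transpose(grid):
    """Swap rows and columns of a square grid."""
    return [list(row) for row in zip(*grid)]


def weighted_grid(winner_grid, p_win):
    """Mix the winner grid and its transpose (the loser grid) by p_win."""
    loser_grid = transpose(winner_grid)
    return [
        [p_win * w + (1 - p_win) * l for w, l in zip(w_row, l_row)]
        for w_row, l_row in zip(winner_grid, loser_grid)
    ]


def diff_from_symmetric(winner_grid, p_win):
    """Weighted estimate minus the symmetric (p_win = 0.5) grid."""
    estimate = weighted_grid(winner_grid, p_win)
    symmetric = weighted_grid(winner_grid, 0.5)
    return [
        [e - s for e, s in zip(e_row, s_row)]
        for e_row, s_row in zip(estimate, symmetric)
    ]


if __name__ == "__main__":
    # Tiny made-up 3x3 "winner" grid, just to exercise the functions.
    toy = [[0.10, 0.05, 0.20], [0.15, 0.05, 0.10], [0.05, 0.25, 0.05]]
    d = diff_from_symmetric(toy, 2 / 3)
    print(d[0][1], d[1][0])  # equal magnitude, opposite sign
```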

Calculation 3 – adjusting using market odds

So far we’ve not done anything especially complicated. From here on we go down a rabbit hole.

Total score

scoreline-cdf

Looking at the Betfair market for Total Score, we can see that the market expects more points to be scored than our data source suggests. (A simpler way to see this: the median total score in our data set is 40, while Betfair has 48). (Note to self – is this particular to these teams, or have score lines increased more recently?)

We can then approximate the full Betfair points distribution by taking the PFR shape and fitting it to the Betfair values.

Points difference

We can learn even more than just “Patriots are favourites” from Betfair. Betfair has odds on score differences with handicaps from Eagles -10 to Eagles +15. (These odds give market probabilities that if you add “x” to the Eagles’ scoreline the Eagles win).

Firstly, let’s take a look at the empirical distribution of points differences. This appears to be very similar to a logistic distribution, but with quite a bit of noise close to zero. (This is not too surprising, given what we’ve seen before with the last digits across all matches).

points_diff

Now let’s look at the Betfair market:

shifted_logic

The blue dots are Betfair probabilities, which match fairly closely the logistic calculated empirically. ie a shifted logistic function is a reasonable approximation for both the empirical distribution and the distribution for this match. (And we can compute odds-ratios between the two using these functions).

Putting it all together

We have the frequencies of scores from our empirical distribution and we are interested in the frequencies of scores given our new information. Using Bayes, (and crossing our fingers and hoping the correlation between total score and points difference doesn’t matter too much), the probability of each score should now be:

\propto f \cdot \frac{\text{score difference from shifted logistic}}{\text{score difference from unshifted logistic}} \cdot \frac{\text{total score from modified cdf}}{\text{total score from empirical cdf}}

grid_final_ev
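As a rough sketch of the reweighting above: take each historical scoreline’s frequency f, multiply by the two ratios, and renormalise. Everything concrete here is an assumption for illustration: the logistic density standing in for “score difference from the logistic”, the parameter names, and the `total_ratio` callable that stands in for the market/empirical total-score ratio. None of the numbers come from Betfair or PFR.

```python
import math


def logistic_pdf(x, loc, scale):
    """Density of a logistic distribution with the given location and scale."""
    z = math.exp(-(x - loc) / scale)
    return z / (scale * (1 + z) ** 2)


def reweight(freq, match_loc, hist_loc, scale, total_ratio):
    """Reweight empirical scoreline frequencies using the two ratios.

    freq:        {(favourite_points, other_points): empirical frequency}
    match_loc:   location of the shifted logistic for this match (assumed)
    hist_loc:    location of the unshifted (historical) logistic (assumed)
    total_ratio: function total_points -> market / empirical ratio (assumed)
    """
    weights = {}
    for (a, b), f in freq.items():
        diff_ratio = (
            logistic_pdf(a - b, match_loc, scale)
            / logistic_pdf(a - b, hist_loc, scale)
        )
        weights[(a, b)] = f * diff_ratio * total_ratio(a + b)
    norm = sum(weights.values())
    return {score: w / norm for score, w in weights.items()}


if __name__ == "__main__":
    # Two made-up scorelines with equal historical frequency.
    freq = {(20, 17): 0.5, (17, 20): 0.5}
    print(reweight(freq, 4.0, 0.0, 10.0, lambda total: 1.0))
```

The reweighted scorelines would then be collapsed to last digits to fill the 10×10 grid.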

The differences – fairly similar to the simpler model although slightly more extreme in places.

grid_final

The more informative way to plot this is not the absolute difference (which is what you might care about if you were marking your P&L on this), but the ratio of the change (to see where the new value is appearing). This looks as follows (using the log ratio).

weighted-winner-ratio

Looking at this, it appears that losing and scoring a 5 is hard. This is presumably because both 5 and 15 are hard to achieve, and to get to 25 means you are much more likely to win.

This whole exercise has got me wondering: could a toy model match these results? (Each play selected at random from the scoring plays at the frequencies achieved in the league. What would happen as you increase the expected number of plays? What happens in the limit as the number of plays goes to infinity?)

* A_{\text{winner}} = A_{\text{loser}}^{T}, A_{\text{estimate}} = p_{win} A_{\text{winner}} + (1-p_{win}) A_{\text{winner}}^T
A_{\text{diff}} = \frac{1}{2} (A_{\text{winner}}+A_{\text{winner}}^T) - \left( p_{win} A_{\text{winner}} + (1-p_{win}) A_{\text{winner}}^T \right)
= (\frac{1}{2} - p_{win})A_{\text{winner}} - (\frac{1}{2} - p_{win})A_{\text{winner}}^T, which is clearly skew-symmetric

Ireland Facts of the Day

On the Irish Potato Famine (emphasis mine):

In 1841 the Irish census revealed that just over 8 million lived on the island; and, by 1845, when the potato blight struck, that figure was closer to 8.5 million. By 1851, when the Famine had run its course, the census of that year showed that the Irish population had fallen by over 20 per cent, with one million dead from starvation and disease and another million or so having fled to Britain or north America.

In just thirty years the population of Ireland had fallen by approximately one-third, with nearly 3 million people missing from the records.

and

However, a reluctance to return to Ireland did not mean that the emigrants ceased to think about the country. On the contrary, they took an abiding interest in Irish politics and developments. More dramatically – and probably with greater impact – by the remittance of substantial sums of money: between 1860 and 1880, some $30 million – two-fifths of which was in the form of prepaid tickets – was sent to relatives in Ireland by Irish emigrants in the United States, and as late as the 1950s emigrant remittances constituted some 2 per cent of southern Ireland’s gross national product.

This is from Ireland: A History by Thomas Bartlett. (Originally sourced from Tyler Cowen’s list of best books about each country)

This whole article’s format is unashamedly stolen from Marginal Revolution.

Media Influences (Part 1 of n) – Podcasts Jan 2018 Edition

As discussed earlier, these are the podcasts which I am currently subscribed to. I manage my podcasts using Pocket Casts. For a few of these, I have selected a few recent episodes I would recommend to start with. I have attempted to rate each series out of 5. I suspect my ratings will lean heavily towards the 5 end of the scale. Roughly speaking, the ratings mean:

5 – Highly recommended. If you’re not listening, I would strongly encourage you to give it a try
4 – Recommended. I would encourage you to try it.
3 – Neutral. I am not ashamed of listening to this, but I would find it hard to convince someone to try it in good conscience.
2 – Not recommended.
1 – Pretty bad, but not so bad that I haven’t found time for it.


Media Influences (Part 0 of n)

I am planning on writing a series regarding all of the media I consume. I’m currently thinking things along the lines of RSS Feeds / Podcasts / TV series / Subreddits / Websites / YouTube. I have several reasons for this:

1/ We are heavily influenced by the media we consume. This should give readers some idea where I’m coming from.
2/ To get suggestions from others for what I’m missing. Feel free to comment on the posts with suggestions of related content you think I might enjoy!
3/ To have something to point at when people ask for recommendations for media. Hopefully I will be able to update this somewhat frequently.

At the same time, publishing this information feels very personal. Each piece of media I consume says something about the type of person I am. You can get a very good idea of who I am from the balance of media I consume. Hopefully it points towards “balanced”, and if you think there is a (high quality) alternative point of view I’m missing, I would love to read / listen to / watch it.