Lies, damned lies and… We all know the quotation ends with ‘statistics’, but as both Spiegelhalter and Criado Perez demonstrate, it should really be ‘misunderstood statistics’ or possibly ‘deliberately misleading statistics’, which wouldn’t be as snappy but would be more accurate. We are bombarded with convincing phrases ‘backed up’ with persuasive figures by politicians and journalists, many of whom are alarmingly innumerate. For example, in 2012, 60 out of 97 MPs could not give the correct answer to the question ‘If you spin a coin twice what is the probability of getting two heads?’ (The answer is one in four.) Sadly, the same might apply to many educators and that’s why we should all read these books.
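For anyone who wants to check the coin question rather than take it on trust, here is a minimal sketch that simply lists every equally likely outcome and counts the ones that are all heads. The function name `probability_all_heads` is my own invention for illustration and comes from neither book.

```python
from fractions import Fraction
from itertools import product


def probability_all_heads(flips: int) -> Fraction:
    """Probability that every one of `flips` fair coin spins lands heads."""
    # Every equally likely sequence, e.g. HH, HT, TH, TT for two spins.
    outcomes = list(product("HT", repeat=flips))
    # Keep only the sequences made up entirely of heads.
    favourable = [o for o in outcomes if all(side == "H" for side in o)]
    return Fraction(len(favourable), len(outcomes))


print(probability_all_heads(2))  # prints 1/4
```

Only one of the four possible sequences is two heads, hence one in four.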

Spiegelhalter’s book is the more technical of the two and should be on every economics and business teacher’s bookshelf. His big ‘point of difference’ from most statistics experts is his understanding that the presentation and explanation of statistical data are as important as the technical calculations; it’s no good being correct if no-one can understand why you’re correct. Without careful, clear presentation, the blaggers with their coloured charts and simple slogans will carry the day. Pointless exam results, reviews and silly pie charts, anyone?

He reframes statistics as problem-solving cycles, which are not complete until the solution has been clearly communicated. The chapters cover the usual suspects of regression, correlation, sampling and probability, but he practises what he preaches by using interesting stories and examples to keep the reader engaged and able to follow without drowning in formulae. Harold Shipman, infant heart operation mortality and sexual behaviour data are some of the subjects that keep one reading because, like a detective novel, we want to see whodunit. It’s also quite fun to find out that eating bacon sandwiches isn’t as bad for you as the health police would have you believe!

For the data geeks there is also a fascinating explanation of Bayesian methods (as used by Turing and co. for codebreaking), which come into their own under conditions of ignorance and high uncertainty. This has obvious application to Covid-19, where the world was starting from near-zero data and had to build workable models from the ground up, adapting as each new piece of information appeared. If the media had been given access to Bletchley methods during WW2 they’d have screamed ‘U-turn, codebreakers admit they are stumbling in the dark whilst hundreds die!’
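The idea of starting from ignorance and adapting as each new piece of information appears can be sketched with a toy example. This is not Spiegelhalter’s own worked example, nor a reconstruction of what Bletchley did; it is the textbook Beta-Binomial update, and the batches of observations are made-up numbers used purely for illustration.

```python
def update_beta(alpha: float, beta: float, successes: int, failures: int):
    """Bayesian update of a Beta(alpha, beta) belief about an unknown rate."""
    return alpha + successes, beta + failures


def beta_mean(alpha: float, beta: float) -> float:
    """Current best estimate of the rate under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


# Beta(1, 1) is a flat prior: every rate from 0 to 1 is equally plausible.
alpha, beta = 1.0, 1.0

# Hypothetical batches of (successes, failures) arriving over time.
batches = [(1, 4), (3, 7), (12, 28)]

for successes, failures in batches:
    alpha, beta = update_beta(alpha, beta, successes, failures)
    print(f"after {successes + failures} more observations: "
          f"estimate = {beta_mean(alpha, beta):.3f}")
```

The estimate shifts with every batch, which is the method working as intended rather than a ‘U-turn’.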

Criado Perez’s book is more campaigning, as the title suggests, but she has done her homework and produced a considered argument, backed by numerical analysis, on the hidden biases in data collection and application. The main issue (also covered by Spiegelhalter) is that statistics works by sampling. It is rarely practical to get accurate data on the whole population, so it is collected from small groups and the conclusions scaled up. Standard practice, but of course the key is choosing a sample that is representative of the population and that’s where the problems start.
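A small simulation shows why representativeness matters so much. It is only a sketch under invented assumptions: two equally sized groups, `group_a` and `group_b`, whose values on some measurement differ on average. None of the numbers come from either book.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: two equally sized groups with different averages.
group_a = [rng.gauss(70, 5) for _ in range(5000)]
group_b = [rng.gauss(60, 5) for _ in range(5000)]
population = group_a + group_b

true_mean = mean(population)

# Representative sample: drawn at random from the whole population.
representative_mean = mean(rng.sample(population, 500))

# Biased sample: same size, but drawn from group A only.
biased_mean = mean(rng.sample(group_a, 500))

print(f"population mean:       {true_mean:.1f}")
print(f"representative sample: {representative_mean:.1f}")
print(f"group-A-only sample:   {biased_mean:.1f}")
```

Both samples are the same size; only the way they were chosen differs, and that alone is enough to push the second estimate well away from the population figure.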

Criado Perez demonstrates that far too many samples use men (and often white men; there’s a BAME issue too) as the default for ‘human’. For example, we all ‘know’ that men are more likely to die of a heart attack, don’t we? Er, maybe not; women frequently go undiagnosed until it’s too late because the diagnostic model was based on male data. It is eye-opening to find out that much of what we ‘know’ is based on male-only samples (even animal trials often only use male mice). Or what about car safety, when crash test dummies were the size and shape of the average male? Turns out male and female bodies are shaped and function differently; who’d have guessed?

What about those algorithms that were supposed to make job recruitment objective by cutting out human bias? Unfortunately they often make things worse, because the algorithms are built on past data about successful senior managers. Historically most of these were white, university-educated men, so guess what the algorithm tends to select? Once we know this we can re-write the algorithms but, like many things that seem ‘obvious’, someone like Criado Perez has to point it out before anything changes.

There’s more, much of it highly persuasive and illuminating, especially the section on effective targeting of aid money to the poorest countries. This book is a must-read for educators and Oxbridge candidates and highly recommended for all students, but I do have one caveat. Criado Perez has done valuable work in pointing out the problems that arise when samples exclude women and how easily this can be corrected, but the argument could also be extended. In areas like medicine and education the Holy Grail is to find a cheap way of providing solutions tailored to the complex individual. The current explosion in data collection is raising beguiling possibilities, but it brings its own challenges, and the book could have covered these.

Data analysis is more important than ever now that we can collect and process information in previously undreamed-of quantity and detail. Everyone should be data literate and these two books are a good place to start.

Ruth Corderoy