Tuesday, 8 July 2008

Lies, Damn Lies and Planning Statistics

I've been looking at the document that Planning have produced as part of their case for dismembering the Island Plan - "Review of the Island Plan to Rezone Land for Lifelong Dwellings for the over 55s and First Time Buyers - Summary of Responses" (May 2008). There are various statistics relating to questions asked such as:

Do you think land should be rezoned to help meet the needs of first time buyers (82% yes)
Do you think land should be rezoned to help to meet the needs of social rent housing for the over 55s? (64% yes)
Do you think land should be rezoned to help meet the needs of housing for the over 55s enabling home owners to downsize (69% yes)

Some very pretty pie charts are given. However...

The statistics given are from a very small number of responses - 86 written responses, and several public meetings. The comments derived from public meetings were from small numbers (100 people at any one meeting at most) and do not indicate how many of those attending agreed with the statements made.

Basically, what we have with this report is a self-selected study sample. This is very weak statistically, especially given the small number of responses. Self-selected samples are notorious for bias, with interest groups and activists dominating. For example, G.R Langlois Ltd, a local builder, is one of those submitting a written response. There is not (as far as I am aware) one builder for every other 85 members of the population, so any such submission is disproportionate in its weighting. In this kind of submission, to report the views of such a narrow group of people as representative of the Island as a whole seems unbalanced in the extreme.

Statistically, a self-selected study sample can only be used to establish a hypothesis. A hypothesis established on the basis of the characteristics of a self-selected sample can be valid for the sample itself, but it cannot be used to draw valid conclusions about the general population from which the self-selected or skewed sample was drawn. It needs proper testing by random sampling and larger numbers.

No valid projections can be made from the results of a statistical analysis of a non-randomly selected study sample. This is because the sample is not representative of the full population, so projecting data beyond the sample is not justified: the sampling error is unknown and cannot be measured.

This is what vitiates any opinion poll run by Channel Television or the Jersey Evening Post, and makes them merely "fun". These polls do not work as a barometer of views precisely because they rely on respondents returning surveys themselves, which more often than not produces a self-selecting sample that is biased in one direction or another. This is why the main polling companies will seek out a representative sample (a necessary condition for predicting the behaviour of a wider group). But Planning are seeking to base a decision on precisely the same kind of response as these opinion polls!
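To show how badly self-selection can mislead, here is a toy simulation in Python. Every number in it is invented for illustration (the population size, the 40% true level of support, and the two response rates) and none comes from the Planning document. It simply assumes that people who favour a proposal are five times as likely to write in as those who do not, and compares the result with a random sample of the same size.

```python
import random

rng = random.Random(2008)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 people, 40% of whom support the proposal.
POPULATION_SIZE = 10_000
TRUE_SUPPORT = 0.40
population = [rng.random() < TRUE_SUPPORT for _ in range(POPULATION_SIZE)]

# Assumed response rates (invented): supporters are keener to write in.
RESPONSE_RATE_SUPPORTER = 0.05
RESPONSE_RATE_OPPONENT = 0.01

# Self-selected sample: each person decides for themselves whether to respond.
self_selected = [
    supports
    for supports in population
    if rng.random() < (RESPONSE_RATE_SUPPORTER if supports else RESPONSE_RATE_OPPONENT)
]

# Random sample of the same size, chosen by the surveyor instead.
random_sample = rng.sample(population, len(self_selected))

actual_share = sum(population) / len(population)
self_selected_share = sum(self_selected) / len(self_selected)
random_share = sum(random_sample) / len(random_sample)

print(f"Actual support in population:  {actual_share:.0%}")
print(f"Support among self-selected:   {self_selected_share:.0%}")
print(f"Support in random sample:      {random_share:.0%}")
```

Under these assumptions the self-selected figure lands far above the true level of support, while the random sample stays close to it. The point is not the particular numbers, but that nothing in the self-selected responses themselves tells you how large the distortion is.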

The advantage of probability sampling is that sampling error can be calculated. Sampling error is the degree to which a sample might differ from the population. When inferring to the population, results are reported plus or minus the sampling error. In nonprobability sampling, the degree to which the sample differs from the population remains unknown.
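As a rough worked example of what "plus or minus the sampling error" means, the sketch below uses the standard normal-approximation formula for the 95% margin of error of a proportion. It pretends, purely for the sake of argument, that the 86 written responses were a simple random sample and that the published percentages apply to those 86; neither assumption holds for this report, so the figures are a best case and not a real error estimate.

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a proportion p from a simple random sample of size n.

    Uses the normal approximation z * sqrt(p * (1 - p) / n).
    """
    return z * math.sqrt(p * (1 - p) / n)

SAMPLE_SIZE = 86  # written responses, treated here AS IF randomly sampled

reported = [
    ("Rezone for first time buyers", 0.82),
    ("Rezone for social rent housing for over 55s", 0.64),
    ("Rezone for over 55s downsizing", 0.69),
]

for question, share in reported:
    moe = margin_of_error(share, SAMPLE_SIZE)
    print(f"{question}: {share:.0%} plus or minus {moe:.1%}")
```

Even on this generous footing the margins come out at roughly 8 to 10 percentage points either way. With a self-selected sample the true uncertainty is not merely larger but unknowable, which is the point made above.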


A "self-selecting" sample - getting people to actively respond to proposals - is simply not comparable to a sample whose members were selected at random. For this reason, the study itself should come with a warning emphasizing that those who chose to participate may not be representative of the general population, and that the unfeasibility of obtaining a representative sample constitutes a major limitation of this study.

I would recommend that Planning contact the States Statistical Department, and ask how to set up proper stratified samples, so that they don't come out with results that, quite honestly, would be good examples in an A-Level mathematics course of how not to present statistics.
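For anyone wondering what a stratified sample involves, here is a minimal sketch in Python. The age bands and head counts are made up for illustration and are not Jersey census figures, and the function names are my own. The idea is to split the population into groups (strata), give each group a share of the sample in proportion to its size, and then pick people at random within each group.

```python
import random

def proportional_allocation(stratum_sizes, total_sample):
    """Share out total_sample places across strata in proportion to their sizes."""
    population = sum(stratum_sizes.values())
    exact = {name: total_sample * size / population
             for name, size in stratum_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    # Hand any places lost to rounding down to the strata with the
    # largest fractional remainders, so the total is exactly total_sample.
    shortfall = total_sample - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:shortfall]:
        allocation[name] += 1
    return allocation

def stratified_sample(frame, total_sample, rng):
    """Draw a random sample within each stratum of the sampling frame."""
    sizes = {name: len(members) for name, members in frame.items()}
    allocation = proportional_allocation(sizes, total_sample)
    return {name: rng.sample(frame[name], allocation[name]) for name in frame}

# Hypothetical sampling frame: anonymous ID numbers grouped by age band.
frame = {
    "under 35": list(range(0, 3000)),
    "35 to 54": list(range(3000, 6500)),
    "55 and over": list(range(6500, 10000)),
}

sample = stratified_sample(frame, 500, random.Random(1))
for band, chosen in sample.items():
    print(f"{band}: {len(chosen)} people selected")
```

A real survey would stratify on whatever matters to the question (age, parish, housing tenure and so on) and would need a proper list of residents to sample from, which is exactly the sort of thing a statistics department is there to advise on.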

For a detailed guide to sampling bias, I would recommend the following for the layman:

Darrell Huff's "How to Lie with Statistics", still in print, and still the easiest introduction for the non-mathematician. A snip at £5.99

http://www.amazon.co.uk/How-Lie-Statistics-Penguin-Business/dp/0140136290/ref=sr_1_1?ie=UTF8&s=books&qid=1215501549&sr=8-1


The Tiger That Isn't: Seeing Through a World of Numbers by Michael Blastland and Andrew Dilnot (£9.09)

http://www.amazon.co.uk/Tiger-That-Isnt-Through-Numbers/dp/1861978391/ref=sr_1_1?ie=UTF8&s=books&qid=1215501670&sr=1-1

Darrell Huff and Fifty Years of How to Lie with Statistics, online at:
http://www-stat.wharton.upenn.edu/~steele/Publications/PDF/SteeleSS2005.pdf

How to Lie with Statistics by Kjell Konis, online at
http://www.stats.ox.ac.uk/~konis/talks/HtLwS.pdf

And from Joel Best, Professor and Chair of Sociology and Criminal Justice at the University of Delaware, some papers online at
http://www.statlit.org/Best.htm


and also some books:

Damned Lies and Statistics: Untangling Numbers from the Media, Politicians and Activists - Joel Best
http://www.amazon.co.uk/Damned-Lies-Statistics-Untangling-Politicians/dp/0520219783/ref=pd_rhf_f_t_cs_1


More Damned Lies and Statistics: How Numbers Confuse Public Issues - Joel Best
http://www.amazon.co.uk/More-Damned-Lies-Statistics-Numbers/dp/0520238303/ref=pd_sim_b_2


And see also, on kinds of sample:
http://www.statpac.com/surveys/sampling.htm

2 comments:

voiceforchildren said...

Tony.

Aren't statistics a very peculiar subject?

Our "powers that be", through their mouthpiece (the entire local media), are very quick to give us statistics that support them and theirs.

I have a statistic I would like to throw into the equation. It involves a vote by one of my elected "representatives", Deputy Ian (GST 28) Gorst. It was on a proposal by Len Norman to postpone the introduction of GST until May 2009.

I asked Deputy Ian (GST 28) Gorst to "represent" me, and indeed the majority of the island, by voting in support of Len Norman's amendment.

Deputy Ian (GST 28) voted against the proposition and stated that I was the only parishioner that had contacted him on this subject.

So that being the case, wouldn't it be fair to say (as a statistic) that 100% of the people who contacted him were in favour of the proposition? Nobody he had heard from opposed it, and he voted against 100% of the people who had contacted him.

TonyTheProf said...

If you look at my section Chesterton on Democracy, you will see that very, very few elected politicians ever bother to take much in the way of soundings from their constituents. That's why voting becomes a "token" and makes little change, despite the rhetoric on democracy.

http://tonymusings.blogspot.com/2008/07/chesterton-on-democracy.html