by Richard Dawkins. Published as "Three herring gull chicks . . the reason juries don't work" in The Observer (London), Sunday November 16, 1997.
Trial by jury must be one of the most conspicuously bad good ideas anyone ever had. Its devisers can hardly be blamed. They lived before the principles of statistical sampling and experimental design had been worked out. They weren't scientists. Let me explain using an analogy. And if, at the end, somebody objects to my argument on the grounds that humans aren't herring gulls, I'll have failed to get my point across.
Adult herring gulls have a bright yellow bill with a conspicuous red spot near the tip. Their babies peck at the red spot, which induces the parents to regurgitate food for them. Niko Tinbergen, Nobel-Prizewinning zoologist and my old maestro at Oxford, offered naive young chicks a range of cardboard dummy gull heads varying in bill and spot colour, and shape. For each colour, shape or combination, Tinbergen measured the preferences of the baby chicks by counting their pecks in a standard time. The idea was to discover whether naive gull chicks are born with a built-in preference for long yellow things with red spots. If so, this would suggest that genes equip the young birds with detailed prior knowledge of the world in which they are about to hatch a world in which food comes out of adult herring gull beaks.
Never mind the reason for the research, and never mind the conclusions. Consider, instead, the methods you must use, and the pitfalls you must avoid, if you want to get a correct result in any such experiment. These turn out to be general principles which apply to human juries as strongly as to gull chicks.
First, you obviously must test more than one chick. It could be that some chicks are red-biased, others blue-biased, with no tendency for herring gull chicks in general to share the same favourite colour. So, by picking out a single chick, you are measuring nothing more than individual bias. It is no answer to this objection that our chick may have given hundreds more pecks to one colour than to the other. A chick might begin by choosing any old colour at random, but once he has chosen he gets 'locked on' to that colour and hammers away at it, giving the other colours no chance. The essential problem here is that successive pecks, however numerous, are not 'independent data'.
So, we must test more than one chick. How many? Is two enough? No, nor is three, and now we must start to think statistically. To make it simple, suppose that in a particular experiment we are comparing only red spots versus blue spots, both on a yellow background, and always presented simultaneously. If we test just two chicks separately, suppose the first chick chooses red. It had a 50% chance of doing so, at random. Now the second chick also happens to choose red. Again, the odds were 50% that it would do so at random, even if it were colourblind. There's a 50% chance that two randomly choosing chicks will agree (half of the four possibilities: red red, red blue, blue red, blue blue). Three chicks aren't enough either. If you write down all the possibilities, you'll find that there's a 25% chance of a unanimous verdict, by luck alone. Twenty five percent, as the odds of reaching a conclusion for the wrong reason, is unacceptably large.
How about twelve good chicks and true? Now you're talking. If twelve chicks are independently offered a choice between two alternatives, the odds that they will all reach the same verdict by chance alone are satisfyingly low, only one in 1024.
But now suppose that, instead of testing our twelve chicks independently, we test them as a group. We take a maelstrom of twelve cheeping chicks and lower into their midst a red spotted dummy and a blue spotted dummy, each fitted with an electrical device for automatically tallying pecks. And suppose that the collective of chicks registers 532 pecks at red and zero at blue. Does this massive disparity show that herring gull chicks, in general, prefer red? Absolutely not. The pecks are not independent data. Chicks could have a strong tendency to imitate one another (as well as imitate themselves in lock-on effects). If one chick just happened to peck at red first, others might copy him and the whole company of chicks join in a frenzy of imitative pecking. As a matter of fact this is precisely what domestic chicken chicks do, and gull chicks are very likely the same. Even if not, the principle remains that the data are not independent and the experiment is therefore invalid. The twelve chicks are strictly equivalent to a single chick, and their summed pecks amount to only a single independent result.
Turning to courts of law, why are twelve jurors preferred to a single judge? Not because they are wiser, more knowledgeable or more practised in the arts of reasoning. Certainly not, and with a vengeance. Think of the astronomical damages awarded by juries in footling libel cases. Think how juries bring out the worst in histrionic, gallery-playing lawyers. Twelve jurors are preferred to one judge only because they are more numerous. Letting a single judge decide a verdict would be like letting a single chick speak for the whole herring gull species. Twelve heads are better than one, because they represent twelve assessments of the evidence.
But for this argument to be valid, the twelve assessments really have to be independent. And of course they are not. Twelve men and women locked in a jury room are like our clutch of twelve gull chicks. Whether they actually imitate each other like chicks, they might. That is enough to invalidate the principle by which a jury might be preferred over a single judge.
In practice, as is well documented and as I remember from the three juries that it has been my misfortune to serve on, juries are massively swayed by one or two vocal individuals. There is also strong pressure to conform to a unanimous verdict, which further undermines the principle of independent data. Increasing the number of jurors doesn't help, or not much (and not at all in strict principle). What you have to increase is the number of independent verdict-reaching units.
Oddly enough, the bizarre American system of televising trials opens up a real possibility of improving the jury system. By the end of trials such as those of Louise Woodward or O.J.Simpson, literally thousands of people around the country have attended to the evidence as assiduously as the official jury. A mass phone-in might produce a fairer verdict than a jury. But unfortunately journalistic discussion, radio talk-shows, and ordinary gossip would violate the Principle of Independent Data and we'd be back where we started. The broadcasting of trials, in any case, has horrible consequences. In the wake of Louise Woodward's trial, the Internet seethes with ill-spelled and ungrammatical viciousness, the cheque-book journalists are queuing up, and the unfortunate Judge Zobel has had to change his telephone number and employ a bodyguard.
So, how can we improve the system? Should twelve jurors be locked in twelve isolation chambers and their opinions separately polled so that they constitute genuinely independent data? If it is objected that some would be too stupid or inarticulate to reach a verdict on their own, we are left wondering why such individuals are allowed on a jury at all. Perhaps there is something to be said for the collective wisdom that emerges when a group of twelve people thrash out a topic together, round a table. But this still leaves the principle of independent data unsatisfied.
Should all cases be tried by two separate juries? Or three? Or twelve? Too expensive, at least if each jury has twelve members. Two juries of six members, or three juries of four members, would probably be an improvement over the present system. But isn't there some way of testing the relative merits of such alternative options, or of comparing the merits of trial by jury versus trial by judge?
Yes, there is. I'll call it the Two Verdicts Concordance Test. It is based on the principle that, if a decision is valid, two independent shots at making it should yield the same result. Just for purposes of the test, we run to the expense of having two juries, listening to the same case and forbidden to talk to members of the other jury. At the end, we lock the two juries in two separate jury rooms and see if they reach the same verdict. If they don't, nothing can be proved beyond reasonable doubt, and this would cast reasonable doubt on the jury system itself.
To make the experimental comparison with Trial by Judge, we need two experienced judges to listen to the same case, and require them too to reach their separate verdicts without talking to each other. Whichever system, Trial by Jury or Trial by Judge, yields the higher score of agreements over a number of trials is the better system and might even be accredited for future use with some confidence.
Would you bet on two independent juries reaching the same verdict in the Louise Woodward case? Could you imagine even one other jury reaching the same verdict in the O.J.Simpson case? Two judges, on the other hand, seem to me rather likely to score well on the concordance test. And should I be charged with a serious crime here's how I want to be tried. If I know myself to be guilty, I'll go with the loose cannon of a jury, the more ignorant, prejudiced and capricious the better. But if I am innocent, and the ideal of multiple independent decision-takers is unavailable, please give me a judge. Preferably Judge Hiller Zobel.
Chesteron on Juries (an extract from "The Twelve Men"):
The trend of our epoch up to this time has been consistently towards specialism and professionalism. We tend to have trained soldiers because they fight better, trained singers because they sing better, trained dancers because they dance better, specially instructed laughers because they laugh better, and so on and so on. The principle has been applied to law and politics by innumerable modern writers. Many Fabians have insisted that a greater part of our political work should be performed by experts. Many legalists have declared that the untrained jury should be altogether supplanted by the trained judge.
Now, if this world of ours were really what is called reasonable, I do not know that there would be any fault to find with this. But the true result of all experience and the true foundation of all religion is this. That the four or five things that it is most practically essential that a man should know, are all of them what people call paradoxes. That is to say, that though we all find them in life to be mere plain truths, yet we cannot easily state them in words without being guilty of seeming verbal contradictions. One of them, for instance, is the unimpeachable platitude that the man who finds most pleasure for himself is often the man who least hunts for it. Another is the paradox of courage; the fact that the way to avoid death is not to have too much aversion to it. Whoever is careless enough of his bones to climb some hopeful cliff above the tide may save his bones by that carelessness. Whoever will lose his life, the same shall save it; an entirely practical and prosaic statement.
Now, one of these four or five paradoxes which should be taught to every infant prattling at his mother's knee is the following: That the more a man looks at a thing, the less he can see it, and the more a man learns a thing the less he knows it. The Fabian argument of the expert, that the man who is trained should be the man who is trusted would be absolutely unanswerable if it were really true that a man who studied a thing and practised it every day went on seeing more and more of its significance. But he does not. He goes on seeing less and less of its significance. In the same way, alas! we all go on every day, unless we are continually goading ourselves into gratitude and humility, seeing less and less of the significance of the sky or the stones.
Now it is a terrible business to mark a man out for the vengeance of men. But it is a thing to which a man can grow accustomed, as he can to other terrible things; he can even grow accustomed to the sun. And the horrible thing about all legal officials, even the best, about all judges, magistrates, barristers, detectives, and policemen, is not that they are wicked (some of them are good), not that they are stupid (several of them are quite intelligent), it is simply that they have got used to it.
Strictly they do not see the prisoner in the dock; all they see is the usual man in the usual place. They do not see the awful court of judgment; they only see their own workshop. Therefore, the instinct of Christian civilization has most wisely declared that into their judgments there shall upon every occasion be infused fresh blood and fresh thoughts from the streets. Men shall come in who can see the court and the crowd, and coarse faces of the policeman and the professional criminals, the wasted faces of the wastrels, the unreal faces of the gesticulating counsel, and see it all as one sees a new picture or a play hitherto unvisited.
Our civilization has decided, and very justly decided, that determining the guilt or innocence of men is a thing too important to be trusted to trained men. It wishes for light upon that awful matter, it asks men who know no more law than I know, but who can feel the things that I felt in the jury box. When it wants a library catalogued, or the solar system discovered, or any trifle of that kind, it uses up specialists. But when it wishes anything done which is really serious, it collects twelve of the ordinary men standing round. The same thing was done, if I remember right, by the Founder of Christianity.
No comments:
Post a Comment