{ "cells": [ { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "from IPython.display import display, HTML, Markdown\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import seaborn as sns\n", "from myst_nb import glue\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Why do we have to learn statistics?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{note}\n", "This chapter is adapted from Danielle Navarro's excellent [Learning Statistics with R](https://learningstatisticswithr.com) book {cite}`Navarro2011`. The main text has mainly be left intact with a few modifications, also the code adapted to use python and jupyter.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> \"Thou shalt not answer questionnaires \n", "Or quizzes upon World Affairs, \n", "Nor with compliance \n", "Take any test. \n", "Thou shalt not sit \n", "With statisticians nor commit\" \n", "-W.H. Auden [^quote]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[^quote]: The quote comes from Auden's 1946 poem Under Which Lyre: A Reactionary Tract for the Times, delivered as part of a commencement address at Harvard University. The history of the poem is kind of interesting: http://harvardmagazine.com/2007/11/a-poets-warning.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## On the psychology of statistics\n", "\n", "To the surprise of many students, statistics is a fairly significant part of a psychological education. To the surprise of no-one, statistics is very rarely the *favourite* part of one's psychological education. After all, if you really loved the idea of doing statistics, you'd probably be enrolled in a statistics class right now, not a psychology class. So, not surprisingly, there's a pretty large proportion of the student base that isn't happy about the fact that psychology has so much statistics in it. In view of this, I thought that the right place to start might be to answer some of the more common questions that people have about stats...\n", "\n", "A big part of this issue at hand relates to the very idea of statistics. What is it? What's it there for? And why are scientists so bloody obsessed with it? These are all good questions, when you think about it. So let's start with the last one. As a group, scientists seem to be bizarrely fixated on running statistical tests on everything. In fact, we use statistics so often that we sometimes forget to explain to people why we do. It's a kind of article of faith among scientists -- and especially social scientists -- that your findings can't be trusted until you've done some stats. Undergraduate students might be forgiven for thinking that we're all completely mad, because no-one takes the time to answer one very simple question:\n", "\n", ">*Why do you do statistics? Why don't scientists just use **common sense?***\n", "\n", "It's a naive question in some ways, but most good questions are. There's a lot of good answers to it (including the suggestion that common sense is in short supply among scientists), but for my money, the best answer is a really simple one: we don't trust ourselves enough. We worry that we're human, and susceptible to all of the biases, temptations and frailties that humans suffer from. Much of statistics is basically a safeguard. 
Using \"common sense\" to evaluate evidence means trusting gut instincts, relying on verbal arguments and on using the raw power of human reason to come up with the right answer. Most scientists don't think this approach is likely to work.\n", "\n", "In fact, come to think of it, this sounds a lot like a psychological question to me, and since I do work in a psychology department, it seems like a good idea to dig a little deeper here. Is it really plausible to think that this \"common sense\" approach is very trustworthy? Verbal arguments have to be constructed in language, and all languages have biases -- some things are harder to say than others, and not necessarily because they're false (e.g., quantum electrodynamics is a good theory, but hard to explain in words). The instincts of our \"gut\" aren't designed to solve scientific problems, they're designed to handle day to day inferences -- and given that biological evolution is slower than cultural change, we should say that they're designed to solve the day to day problems for a *different world* than the one we live in. Most fundamentally, reasoning sensibly requires people to engage in \"induction\", making wise guesses and going beyond the immediate evidence of the senses to make generalisations about the world. If you think that you can do that without being influenced by various distractors, well, I have a bridge in Brooklyn I'd like to sell you. Heck, as the next section shows, we can't even solve \"deductive\" problems (ones where no guessing is required) without being influenced by our pre-existing biases." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The curse of belief bias\n", "\n", "People are mostly pretty smart. We're certainly smarter than the other species that we share the planet with (though many people might disagree). Our minds are quite amazing things, and we seem to be capable of the most incredible feats of thought and reason. That doesn't make us perfect though. And among the many things that psychologists have shown over the years is that we really do find it hard to be neutral, to evaluate evidence impartially and without being swayed by pre-existing biases. A good example of this is the ***belief bias effect*** in logical reasoning: if you ask people to decide whether a particular argument is logically valid (i.e., conclusion would be true if the premises were true), we tend to be influenced by the believability of the conclusion, even when we shouldn't. For instance, here's a valid argument where the conclusion is believable:\n", "\n", ">No cigarettes are inexpensive (Premise 1) \n", ">Some addictive things are inexpensive (Premise 2) \n", ">Therefore, some addictive things are not cigarettes (Conclusion). \n", "\n", "And here's a valid argument where the conclusion is not believable:\n", "\n", ">No addictive things are inexpensive (Premise 1) \n", ">Some cigarettes are inexpensive (Premise 2) \n", ">Therefore, some cigarettes are not addictive (Conclusion)\n", "\n", "The logical *structure* of argument #2 is identical to the structure of argument #1, and they're both valid. However, in the second argument, there are good reasons to think that premise 1 is incorrect, and as a result it's probably the case that the conclusion is also incorrect. But that's entirely irrelevant to the topic at hand: an argument is deductively valid if the conclusion is a logical consequence of the premises. 
That is, a valid argument doesn't have to involve true statements.\n", "\n", "On the other hand, here's an invalid argument that has a believable conclusion:\n", "\n", ">No addictive things are inexpensive (Premise 1) \n", ">Some cigarettes are inexpensive (Premise 2) \n", ">Therefore, some addictive things are not cigarettes (Conclusion)\n", "\n", "And finally, an invalid argument with an unbelievable conclusion:\n", "\n", ">No cigarettes are inexpensive (Premise 1) \n", ">Some addictive things are inexpensive (Premise 2) \n", ">Therefore, some cigarettes are not addictive (Conclusion)\n", "\n", "Now, suppose that people really are perfectly able to set aside their pre-existing biases about what is true and what isn't, and purely evaluate an argument on its logical merits. We'd expect 100% of people to say that the valid arguments are valid, and 0% of people to say that the invalid arguments are valid. So if you ran an experiment looking at this, you'd expect to see data like this:\n", "\n", "\n", "| | conclusion feels true| conclusion feels false |\n", "|------------------ |:--------------------:|:----------------------:|\n", "|argument is valid |100% say \"valid\" |100% say \"valid\" |\n", "|argument is invalid|0% say \"valid\" |0% say \"valid\" |\n", "\n", "If the psychological data looked like this (or even a good approximation to this), we might feel safe in just trusting our gut instincts. That is, it'd be perfectly okay just to let scientists evaluate data based on their common sense, and not bother with all this murky statistics stuff. However, you guys have taken psych classes, and by now you probably know where this is going...\n", "\n", "In a classic study, Evans, Barston, & Pollard (1983) {cite}`Evans1983` ran an experiment looking at exactly this. What they found is that when pre-existing biases (i.e., beliefs) were in agreement with the logical structure of the arguments, everything went the way you'd hope: \n", "\n", "\n", "| | conclusion feels true| conclusion feels false |\n", "|------------------ |:--------------------:|:----------------------:|\n", "|argument is valid |92% say \"valid\" | |\n", "|argument is invalid| |8% say \"valid\" |\n", "\n", "Not perfect, but that's pretty good. But look what happens when our intuitive feelings about the truth of the conclusion run against the logical structure of the argument:\n", "\n", "| | conclusion feels true| conclusion feels false |\n", "|------------------ |:--------------------:|:----------------------:|\n", "|argument is valid |92% say \"valid\" |**46% say \"valid\"** |\n", "|argument is invalid|**92% say \"valid\"** |8% say \"valid\" |\n", "\n", "Oh dear, that's not as good. Apparently, when we're presented with a strong argument that contradicts our pre-existing beliefs, we find it pretty hard even to perceive it to be a strong argument (people only did so 46% of the time). Even worse, when we're presented with a weak argument that agrees with our pre-existing biases, almost no-one can see that the argument is weak (people got that one wrong 92% of the time!).\n", "\n", "If you think about it, it's not as if these data are horribly damning. Overall, people did do better than chance at compensating for their prior biases: the accuracy rates in the four cells were 92%, 46%, 8% and 92%, so about 60% of people's judgements were correct on average (you'd expect 50% by chance). 
Even so, if you were a professional \"evaluator of evidence\", and someone came along and offered you a magic tool that improves your chances of making the right decision from 60% to (say) 95%, you'd probably jump at it, right? Of course you would. Thankfully, we actually do have a tool that can do this. But it's not magic, it's statistics. So that's reason #1 why scientists love statistics. It's just *too easy* for us to \"believe what we want to believe\"; so if we want to \"believe in the data\" instead, we're going to need a bit of help to keep our personal biases under control. That's what statistics does: it helps keep us honest." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The cautionary tale of Simpson's paradox\n", "\n", "The following is a true story (I think...). In 1973, the University of California, Berkeley had some worries about the admissions of students into their postgraduate courses. Specifically, the thing that caused the problem was the gender breakdown of their admissions, which looked like this...\n", "\n", "| | Number of applicants | Percent admitted |\n", "|-------|:--------------------:|:----------------:|\n", "|Males |8442 |44% |\n", "|Females|4321 |35% |\n", "\n", "...and they were worried about being sued. Given that there were nearly 13,000 applicants, a difference of 9% in admission rates between males and females is just way too big to be a coincidence. Pretty compelling data, right? And if I were to say to you that these data *actually* reflect a weak bias in favour of women (sort of!), you'd probably think that I was either crazy or sexist. \n", "\n", "Oddly, it's actually sort of true... When people started looking more carefully at the admissions data (Bickel, Hammel, & O’Connell, 1975) {cite}`Bickel1975`, the numbers told a rather different story. Specifically, when they looked at it on a department-by-department basis, it turned out that most of the departments actually had a slightly *higher* success rate for female applicants than for male applicants. {numref}`tbl:simpsontable` shows the admission figures for the six largest departments (with the names of the departments removed for privacy reasons):" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "application/papermill.record/text/html": "
<table border="1" class="dataframe">
  <thead>
    <tr>
      <th>Department</th>
      <th>Male Applicants</th>
      <th>Male Percent Admitted</th>
      <th>Female Applicants</th>
      <th>Female Percent Admitted</th>
    </tr>
  </thead>
  <tbody>
    <tr><th>A</th><td>825</td><td>62%</td><td>108</td><td>82%</td></tr>
    <tr><th>B</th><td>560</td><td>63%</td><td>25</td><td>68%</td></tr>
    <tr><th>C</th><td>325</td><td>37%</td><td>593</td><td>34%</td></tr>
    <tr><th>D</th><td>417</td><td>33%</td><td>375</td><td>35%</td></tr>
    <tr><th>E</th><td>191</td><td>28%</td><td>393</td><td>24%</td></tr>
    <tr><th>F</th><td>373</td><td>6%</td><td>341</td><td>7%</td></tr>
  </tbody>
</table>
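The reversal is easy to reproduce for yourself with a few lines of pandas (which this chapter already imports). The sketch below is illustrative rather than the notebook's hidden cell: the figures are the published ones from Bickel, Hammel, & O’Connell (1975), but the variable and column names are my own. It confirms that the female admission rate matches or beats the male rate in four of the six departments, and yet the pooled rate still comes out much higher for males.

```python
import pandas as pd

# Admission figures for the six largest departments
# (Bickel, Hammel, & O'Connell, 1975); the column names here are illustrative.
df = pd.DataFrame({
    "dept":              ["A", "B", "C", "D", "E", "F"],
    "male_applicants":   [825, 560, 325, 417, 191, 373],
    "male_admit_rate":   [0.62, 0.63, 0.37, 0.33, 0.28, 0.06],
    "female_applicants": [108, 25, 593, 375, 393, 341],
    "female_admit_rate": [0.82, 0.68, 0.34, 0.35, 0.24, 0.07],
})

# Department by department, women are admitted at a similar or higher rate...
df["female_advantage"] = df["female_admit_rate"] - df["male_admit_rate"]
print(df[["dept", "female_advantage"]])  # positive for A, B, D and F

# ...but pooling across departments reverses the picture: weight each
# department's rate by its number of applicants of each gender.
male_rate = (df["male_applicants"] * df["male_admit_rate"]).sum() / df["male_applicants"].sum()
female_rate = (df["female_applicants"] * df["female_admit_rate"]).sum() / df["female_applicants"].sum()
print(f"pooled male rate:   {male_rate:.0%}")    # about 45% for these six departments
print(f"pooled female rate: {female_rate:.0%}")  # about 30%
```

The pooled rates come apart because they weight each department by how many people applied to it: over 90% of the female applicants in this table applied to the competitive departments C-F, whereas about half of the male applicants applied to the easy-to-enter departments A and B. (The campus-wide figures of 44% and 35% quoted earlier differ slightly from these, because they include every department rather than just the six largest.)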