{ "cells": [ { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [ "remove_cell" ] }, "outputs": [], "source": [ "from IPython.display import display, HTML, Markdown\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A brief introduction to research design" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{note}\n", "This chapter is adapted from Danielle Navarro's excellent [Learning Statistics with R](https://learningstatisticswithr.com) book {cite}`Navarro2011`. The main text has mostly been left intact, with a few modifications: the personal elements have been adapted to make sense for the instructor of this course, and the code has been adapted to use Python and Jupyter.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">\"To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.\"\n", ">\n", ">-- Sir Ronald Fisher [^quote1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[^quote1]: Presidential Address to the First Indian Statistical Congress, 1938. Source: [Wikiquote](http://en.wikiquote.org/wiki/Ronald_Fisher)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this chapter, we're going to start thinking about the basic ideas that go into designing a study, collecting data, checking whether your data collection works, and so on. It won't give you enough information to allow you to design studies of your own, but it will give you a lot of the basic tools that you need to assess the studies done by other people. However, since the focus of this book is much more on data analysis than on data collection, I'm only giving a very brief overview. Note that this chapter is \"special\" in two ways. Firstly, it's much more psychology-specific than the later chapters. 
Secondly, it focuses much more heavily on the scientific problem of research methodology, and much less on the statistical problem of data analysis. Nevertheless, the two problems are related to one another, so it's traditional for stats textbooks to discuss the problem in a little detail." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{admonition} Sourcing of ideas\n", ":class: tip\n", "This chapter relies heavily on {cite}`Campbell1963` for the discussion of study design, and {cite}`Stevens1946` for the discussion of scales of measurement. Later versions will attempt to be more precise in the citations. Although this chapter was built from the classic little book by {cite}`Campbell1963`, there are of course a large number of textbooks out there on research design. Spend a few minutes with your favourite search engine and you'll find dozens.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction to psychological measurement\n", "\n", "The first thing to understand is that data collection can be thought of as a kind of ***measurement***. That is, what we're trying to do here is measure something about human behaviour or the human mind. What do I mean by \"measurement\"? " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Some thoughts about psychological measurement\n", "\n", "Measurement itself is a subtle concept, but basically it comes down to finding some way of assigning numbers, or labels, or some other kind of well-defined descriptions to \"stuff\". So, any of the following would count as a psychological measurement:\n", "\n", "\n", "- My **age** is *41 years*.\n", "- I *do not* **like anchovies**.\n", "- My **chromosomal gender** is *male*. \n", "- My **self-identified gender** is *male*. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the short list above, the **bolded part** is \"the thing to be measured\", and the *italicised part* is \"the measurement itself\". In fact, we can expand on this a little bit, by thinking about the set of possible measurements that could have arisen in each case:\n", "\n", "\n", "- My **age** (in years) could have been *0, 1, 2, 3 ...*, etc. The upper bound on what my age could possibly be is a bit fuzzy, but in practice you'd be safe in saying that the largest possible age is *150*, since no human has ever lived that long.\n", "- When asked if I **like anchovies**, I might have said that *I do*, or *I do not*, or *I have no opinion*, or *I sometimes do*. \n", "- My **chromosomal gender** is almost certainly going to be *male (XY)* or *female (XX)*, but there are a few other possibilities. I could also have *Klinefelter's syndrome (XXY)*, which is more similar to male than to female. And I imagine there are other possibilities too.\n", "- My **self-identified gender** is also very likely to be *male* or *female*, but it doesn't have to agree with my chromosomal gender. I may also choose to identify with *neither*, or to explicitly call myself *transgender*.\n", "\n", "As you can see, for some things (like age) it seems fairly obvious what the set of possible measurements should be, whereas for other things it gets a bit tricky. But I want to point out that even in the case of someone's age, it's much more subtle than this. For instance, in the example above, I assumed that it was okay to measure age in years. But if you're a developmental psychologist, that's way too crude, and so you often measure age in *years and months* (if a child is 2 years and 11 months, this is usually written as \"2;11\"). If you're interested in newborns, you might want to measure age in *days since birth*, maybe even *hours since birth*. 
In other words, the way in which you specify the allowable measurement values is important. \n", "\n", "Looking at this a bit more closely, you might also realise that the concept of \"age\" isn't actually all that precise. In general, when we say \"age\" we implicitly mean \"the length of time since birth\". But that's not always the right way to do it. Suppose you're interested in how newborn babies control their eye movements. If you're interested in kids that young, you might also start to worry that \"birth\" is not the only meaningful point in time to care about. If Baby Alice is born 3 weeks premature and Baby Bianca is born 1 week late, would it really make sense to say that they are the \"same age\" if we encountered them \"2 hours after birth\"? In one sense, yes: by social convention, we use birth as our reference point for talking about age in everyday life, since it defines the amount of time the person has been operating as an independent entity in the world, but from a scientific perspective that's not the only thing we care about. When we think about the biology of human beings, it's often useful to think of ourselves as organisms that have been growing and maturing since conception, and from that perspective Alice and Bianca aren't the same age at all. So you might want to define the concept of \"age\" in two different ways: the length of time since conception, and the length of time since birth. When dealing with adults, it won't make much difference, but when dealing with newborns it might.\n", " \n", "Moving beyond these issues, there's the question of methodology. What specific \"measurement method\" are you going to use to find out someone's age? 
As before, there are lots of different possibilities:\n", "\n", "\n", "- You could just ask people \"how old are you?\" The method of self-report is fast, cheap and easy, but it only works with people old enough to understand the question, and some people lie about their age.\n", "- You could ask an authority (e.g., a parent) \"how old is your child?\" This method is fast, and when dealing with kids it's not all that hard since the parent is almost always around. It doesn't work as well if you want to know \"age since conception\", since a lot of parents can't say for sure when conception took place. For that, you might need a different authority (e.g., an obstetrician). \n", "- You could look up official records, like birth certificates. This is time-consuming and annoying, but it has its uses (e.g., if the person is now dead). \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Operationalisation: defining your measurement\n", "\n", "All of the ideas discussed in the previous section relate to the concept of ***operationalisation***. To be a bit more precise about the idea, operationalisation is the process by which we take a meaningful but somewhat vague concept, and turn it into a precise measurement. The process of operationalisation can involve several different things:\n", "\n", "\n", "- Being precise about what you are trying to measure. For instance, does \"age\" mean \"time since birth\" or \"time since conception\" in the context of your research?\n", "- Determining what method you will use to measure it. Will you use self-report to measure age, ask a parent, or look up an official record? If you're using self-report, how will you phrase the question? \n", "- Defining the set of allowable values that the measurement can take. Note that these values don't always have to be numerical, though they often are. When measuring age, the values are numerical, but we still need to think carefully about what numbers are allowed. 
Do we want age in years, years and months, days, or hours? For other types of measurements (e.g., gender), the values aren't numerical. But, just as before, we need to think about what values are allowed. If we're asking people to self-report their gender, what options do we allow them to choose between? Is it enough to allow only \"male\" or \"female\"? Do we need an \"other\" option? Or should we not give people any specific options, and let them answer in their own words? And if we open up the set of possible values to include all verbal responses, how will we interpret their answers?\n", "\n", " \n", "Operationalisation is a tricky business, and there's no \"one, true way\" to do it. The way in which you choose to operationalise the informal concept of \"age\" or \"gender\" into a formal measurement depends on what you need to use the measurement for. Often you'll find that the community of scientists who work in your area have some fairly well-established ideas for how to go about it. In other words, operationalisation needs to be thought through on a case-by-case basis. Nevertheless, while there are a lot of issues that are specific to each individual research project, there are some aspects to it that are pretty general. \n", "\n", "Before moving on, I want to take a moment to clear up our terminology, and in the process introduce one more term. Here are four different things that are closely related to each other:\n", "\n", "\n", "- ***A theoretical construct***. This is the thing that you're trying to take a measurement of, like \"age\", \"gender\" or an \"opinion\". A theoretical construct can't be directly observed, and it is often a bit vague. \n", "- ***A measure***. The measure refers to the method or the tool that you use to make your observations. A question in a survey, a behavioural observation or a brain scan could all count as a measure. \n", "- ***An operationalisation***. 
The term \"operationalisation\" refers to the logical connection between the measure and the theoretical construct, or to the process by which we try to derive a measure from a theoretical construct.\n", "- ***A variable***. Finally, a new term. A variable is what we end up with when we apply our measure to something in the world. That is, variables are the actual \"data\" that we end up with in our data sets.\n", "\n", "\n", "\n", "In practice, even scientists tend to blur the distinction between these things, but it's very helpful to try to understand the differences." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scales of measurement\n", "\n", "As the previous section indicates, the outcome of a psychological measurement is called a variable. But not all variables are of the same qualitative type, and it's very useful to understand what types there are. A helpful concept for distinguishing between different types of variables is what's known as ***scales of measurement***. \n", "\n", "\n", "### Nominal scale\n", "\n", "A **_nominal scale_** variable (also referred to as a ***categorical*** variable) is one in which there is no particular relationship between the different possibilities: for these kinds of variables it doesn't make any sense to say that one of them is \"bigger\" or \"better\" than any other one, and it absolutely doesn't make any sense to average them. The classic example for this is \"eye colour\". Eyes can be blue, green or brown, among other possibilities, but none of them is any \"better\" than any other one. As a result, it would feel really weird to talk about an \"average eye colour\". Similarly, gender is nominal too: male isn't better or worse than female, nor does it make sense to try to talk about an \"average gender\". In short, nominal scale variables are those for which the only thing you can say about the different possibilities is that they are different. 
That's it.\n", "\n", "Let's take a slightly closer look at this. Suppose I was doing research on how people commute to and from work. One variable I would have to measure would be what kind of transportation people use to get to work. This \"transport type\" variable could have quite a few possible values, including: \"train\", \"bus\", \"car\", \"bicycle\", etc. For now, let's suppose that these four are the only possibilities, and suppose that when I ask 100 people how they got to work today, I get this:\n", " \n", "|Transportation|Number of people|\n", "|:-:|:-:|\n", "| (1) Train | 12|\n", "| (2) Bus | 30|\n", "| (3) Car | 48|\n", "| (4) Bicycle | 10|\n", " \n", "So, what's the average transportation type? Obviously, the answer here is that there isn't one. It's a silly question to ask. You can say that travel by car is the most popular method, and travel by bicycle is the least popular method, but that's about all. Similarly, notice that the order in which I list the options isn't very interesting. I could have chosen to display the data like this\n", " \n", "\n", "|Transportation|Number of people|\n", "|:-:|:-:|\n", "| (3) Car | 48|\n", "| (1) Train | 12|\n", "| (4) Bicycle | 10|\n", "| (2) Bus | 30|\n", "\n", "and nothing really changes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ordinal scale\n", "\n", "**_Ordinal scale_** variables have a bit more structure than nominal scale variables, but not by a lot. An ordinal scale variable is one in which there is a natural, meaningful way to order the different possibilities, but you can't do anything else. The usual example given of an ordinal variable is \"finishing position in a race\". You *can* say that the person who finished first was faster than the person who finished second, but you *don't* know how much faster. 
As a consequence we know that 1st > 2nd, and we know that 2nd > 3rd, but the difference between 1st and 2nd might be much larger than the difference between 2nd and 3rd.\n", "\n", "Here's a more psychologically interesting example. Suppose I'm interested in people's attitudes to climate change, and I ask them to pick one of these four statements that most closely matches their beliefs:\n", "\n", ">(1) Temperatures are rising, because of human activity\n", ">(2) Temperatures are rising, but we don't know why\n", ">(3) Temperatures are rising, but not because of humans\n", ">(4) Temperatures are not rising\n", "\n", "Notice that these four statements actually do have a natural ordering, in terms of \"the extent to which they agree with the current science\". Statement 1 is a close match, statement 2 is a reasonable match, statement 3 isn't a very good match, and statement 4 is in strong opposition to the science. So, in terms of the thing I'm interested in (the extent to which people endorse the science), I can order the items as 1 > 2 > 3 > 4. Since this ordering exists, it would be very weird to list the options like this...\n", "\n", ">(3) Temperatures are rising, but not because of humans\n", ">(1) Temperatures are rising, because of human activity\n", ">(4) Temperatures are not rising\n", ">(2) Temperatures are rising, but we don't know why \n", "\n", "... because it seems to violate the natural \"structure\" to the question. 
\n", "\n", "So, let's suppose I asked 100 people this question, and got the following answers:\n", "\n", "|Response | Number|\n", "|-------- |:-----:|\n", "|(1) Temperatures are rising, because of human activity | 51 |\n", "|(2) Temperatures are rising, but we don't know why | 20 |\n", "|(3) Temperatures are rising, but not because of humans | 10 |\n", "|(4) Temperatures are not rising | 19 |\n", "\n", "When analysing these data, it seems quite reasonable to try to group (1), (2) and (3) together, and say that 81 of 100 people were willing to *at least partially* endorse the science. And it's *also* quite reasonable to group (2), (3) and (4) together and say that 49 of 100 people registered *at least some disagreement* with the dominant scientific view. However, it would be entirely bizarre to try to group (1), (2) and (4) together and say that 90 of 100 people said... what? There's nothing sensible that allows you to group those responses together at all.\n", "\n", "That said, notice that while we *can* use the natural ordering of these items to construct sensible groupings, what we *can't* do is average them. For instance, in my simple example here, the \"average\" response to the question is 1.97. If you can tell me what that means, I'd love to know. Because that sounds like gibberish to me!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Interval scale\n", "\n", "In contrast to nominal and ordinal scale variables, **_interval scale_** and ratio scale variables are variables for which the numerical value is genuinely meaningful. In the case of interval scale variables, the *differences* between the numbers are interpretable, but the variable doesn't have a \"natural\" zero value. A good example of an interval scale variable is temperature measured in degrees Celsius. For instance, if it was 15$^\\circ$ yesterday and 18$^\\circ$ today, then the 3$^\\circ$ difference between the two is genuinely meaningful. 
Moreover, that 3$^\\circ$ difference is *exactly the same* as the 3$^\\circ$ difference between 7$^\\circ$ and 10$^\\circ$. In short, addition and subtraction are meaningful for interval scale variables.\n", "\n", "However, notice that 0$^\\circ$ does not mean \"no temperature at all\": it actually means \"the temperature at which water freezes\", which is pretty arbitrary. As a consequence, it becomes pointless to try to multiply and divide temperatures. It is wrong to say that 20$^\\circ$ is *twice as hot* as 10$^\\circ$, just as it is weird and meaningless to try to claim that 20$^\\circ$ is negative two times as hot as -10$^\\circ$. \n", "\n", "Again, let's look at a more psychological example. Suppose I'm interested in looking at how the attitudes of first-year university students have changed over time. Obviously, I'm going to want to record the year in which each student started. This is an interval scale variable. A student who started in 2003 did arrive 5 years before a student who started in 2008. However, it would be completely insane for me to divide 2008 by 2003 and say that the second student started \"1.0025 times later\" than the first one. That doesn't make any sense at all.\n", "\n", "### Ratio scale\n", "\n", "The fourth and final type of variable to consider is a ***ratio scale*** variable, in which zero really means zero, and it's okay to multiply and divide. A good psychological example of a ratio scale variable is response time (RT). In a lot of tasks it's very common to record the amount of time somebody takes to solve a problem or answer a question, because it's an indicator of how difficult the task is. Suppose that Alan takes 2.3 seconds to respond to a question, whereas Ben takes 3.1 seconds. As with an interval scale variable, addition and subtraction are both meaningful here. Ben really did take 3.1 - 2.3 = 0.8 seconds longer than Alan did. 
However, notice that multiplication and division make sense here too: Ben took 3.1 / 2.3 = 1.35 times as long as Alan did to answer the question. And the reason why you can do this is that, for a ratio scale variable such as RT, \"zero seconds\" really does mean \"no time at all\".\n", "\n", "### Continuous versus discrete variables\n", "\n", "There's a second kind of distinction that you need to be aware of, regarding what types of variables you can run into. This is the distinction between continuous variables and discrete variables. The difference between these is as follows:\n", "\n", "\n", "- A ***continuous variable*** is one in which, for any two values that you can think of, it's always logically possible to have another value in between. \n", "- A ***discrete variable*** is, in effect, a variable that isn't continuous. For a discrete variable, it's sometimes the case that there's nothing in the middle.\n", "\n", "\n", "These definitions probably seem a bit abstract, but they're pretty simple once you see some examples. For instance, response time is continuous. If Alan takes 2.3 seconds and Ben takes 3.1 seconds to respond to a question, then it's possible for Cameron's response time to lie in between, by taking 3.0 seconds. And of course it would also be possible for David to take 3.031 seconds to respond, meaning that his RT would lie in between Cameron's and Ben's. And while in practice it might be impossible to measure RT that precisely, it's certainly possible in principle. Because we can always find a new value for RT in between any two other ones, we say that RT is continuous. \n", "\n", "Discrete variables occur when this rule is violated. For example, nominal scale variables are always discrete: there isn't a type of transportation that falls \"in between\" trains and bicycles, not in the strict mathematical way that 2.3 falls in between 2 and 3. So transportation type is discrete. 
Similarly, ordinal scale variables are always discrete: although \"2nd place\" does fall between \"1st place\" and \"3rd place\", there's nothing that can logically fall in between \"1st place\" and \"2nd place\". Interval scale and ratio scale variables can go either way. As we saw above, response time (a ratio scale variable) is continuous. Temperature in degrees Celsius (an interval scale variable) is also continuous. However, the year you went to school (an interval scale variable) is discrete. There's no year in between 2002 and 2003. The number of questions you get right on a true-or-false test (a ratio scale variable) is also discrete: since a true-or-false question doesn't allow you to be \"partially correct\", there's nothing in between 5/10 and 6/10. Table 1 summarises the relationship between the scales of measurement and the discrete/continuous distinction. Cells with a tick mark correspond to things that are possible. I'm trying to hammer this point home, because (a) some textbooks get this wrong, and (b) people very often say things like \"discrete variable\" when they mean \"nominal scale variable\". It's very unfortunate.\n", "\n", "\n", "---\n", "\n", "