name: title class: middle, center, dark # Samples, populations, and sampling --- class: light # The problem 1. Psychology interpets patterns in data to draw conclusions about psychological processes -- 2. Chance can produce "patterns" in data -- 3. **Problem**: How can we know if the pattern is real, or simply a random accident produced by chance? --- class: light # Steps to a solution 1. Need to understand what chance is -- 2. Need to find out what chance can actually do in a particular situation -- 3. Create tools to help us determine whether chance was likely or unlikely to produce patterns in the data --- class: light # Issues for this class 1. **Probability Basics** 2. **Aributions** 3. **Sampling from distributions** --- class: light, center, middle, clear # Probability Basics --- class: light # What is a probability? - A number bounded between 0 and 1 - Describes the "chances" or "likelihood" of an event occuring --- class: light # Two probability statements - A coin has a 50% chance of landing heads - p(heads) = .5 -- - There is a 10% chance of rain tomorrow - p(rain tomorrow) = .1 --- class: light # Flipping a coin 100 times ![](4a_Distribution_files/figure-html/unnamed-chunk-2-1.png)<!-- --> --- class: light # Four simulations ![](4a_Distribution_files/figure-html/unnamed-chunk-3-1.png)<!-- --> --- class: light # Flipping a coin 10000 times ![](4a_Distribution_files/figure-html/unnamed-chunk-4-1.png)<!-- --> --- class: light # coin flipping summary 1. 50% heads/tails means that **over the long run**, you should get half heads and half tails 2. When sample size (number of flips) is small, you can "randomly" get more or less than 50% heads --- class: light # Discrete probability distributions 1. Defines the probability of each possible outcome/event in a set. 2. All probabilities must add up to 1 --- class: light # Coin flipping distribution ![](4a_Distribution_files/figure-html/unnamed-chunk-5-1.png)<!-- --> --- class: light, center, middle, clear # What can the coin flipping distribution do? --- class: light # 10 sets of 10 flips ![](4a_Distribution_files/figure-html/unnamed-chunk-6-1.png)<!-- --> --- class: light # simulating 10 flips many times Steps: 1. Flip a coin 10 times 2. count the number of heads, save the number 3. Repeat above experiment 1,000 times or more 4. plot the histogram of number of heads --- class: light # Distribution of heads (10 flips; "sample distribution") ![](4a_Distribution_files/figure-html/unnamed-chunk-7-1.png)<!-- --> --- class: light # Summary of simulation 1. Chance produces a range of outcomes (number of heads out of 10) 2. Chance most frequently produces 5 heads and 5 tails 3. Chance produces more extreme outcomes with increasingly less frequency (lower probability) 4. E.g., chance is very unlikely to produce 9 out of 10 heads --- class: light, center, middle, clear # Distributions --- class: light # Probability density functions A tool to define the chances of getting particular values <img src="figs-crump/distribution/d1.png" width="80%" /> --- class: light # Area under the curve <img src="figs-crump/distribution/d2.png" width="80%" /> --- class: light # Interpreting densities <img src="figs-crump/distribution/d3.png" width="80%" /> --- class: light # Single values <img src="figs-crump/distribution/d4.png" width="80%" /> --- class: light # Probability ranges <img src="figs-crump/distribution/d5.png" width="80%" /> --- class: light # Uniform Distribution Definition: 1. All numbers in a particular range have an equal (uniform) chance of occuring --- class: light # Uniform Distribution ![](4a_Distribution_files/figure-html/unnamed-chunk-14-1.png)<!-- --> --- class: light # Sampling from a uniform Python let's you sample numbers from a uniform distribution ``` np.random.uniform(0,1,3 ) ``` ``` array([0.06567211, 0.83317117, 0.77310506]) ``` ``` np.random.uniform(0,10,3) ``` ``` array([3.23770171, 3.47811007, 7.11685081]) ``` --- class: light # looking at samples ![](figs-crump/distribution/sampleUnifExpected-1.gif)<!-- --> --- class: light # Random samples are not all the same <img src="figs-crump/distribution/5manysamples-1.png" width="90%" /> --- class: light # Samples estimate the distribution 1. Samples are sets of numbers taken from a distribution -- 2. **Samples become more like the distribution they came from, as sample size (N) increases** --- class: light # Uniform: N=100 <img src="figs-crump/distribution/5unifsamp100-1.png" width="80%" /> --- class: light # Uniform: N=1,000 <img src="figs-crump/distribution/5unifsamp1000-1.png" width="80%" /> --- class: light # Uniform: N=100,000 <img src="figs-crump/distribution/5sampunifALOT-1.png" width="80%" /> --- class: light, center, middle, clear # Some questions --- class: light # Samples and distributions How do samples relate to distributions? -- - Samples come from distributions -- - Samples approximate the distribution they came from as sample-size increases --- class: light # Is my sample likely? Let's say you take a sample of numbers from a distribution. 1. Is your sample representative of the population? 2. Was your sample likely (you would usually get a sample like this), or unlikely (you got a weird sample, usually you would not get a sample like this) --- class: light # Simulation and sampling How can we know if a sample we obtained is "normal", or "weird"? -- - We can find out by simulating the process of sampling. - We sample some numbers, measure the sample, then repeat - We can now look at how our measurement of the sample behaves --- class: light # Animation of sample mean ![](figs-crump/distribution/sampleHistUnif-1.gif)<!-- --> --- class: light # What to notice - The histogram shows that each sample is different - But, the mean of each sample is always around 5.5 - We have measured a property of the sample (the mean), for each sample. --- class: light # Something Curious If we repeat this 10K times, producing 10K sample means... <img src="figs-crump/distribution/4unifmany-1.png" width="592" /> --- class: light, center, middle, clear # Key graph <img src="figs-crump/distribution/pop_samp.png" width="545" /> --- class: light, center, middle, clear # Samples and populations --- class: light # Samples and populations - Population: A defined set of things - Sample: a subset of the population --- class: light # Random Sampling - A process for generating a sample (taking things from a population) -- - Good sampling ensure that each value in a sample is drawn **independently** from other values --- class: light # Example: Sampling heights of people Let's say we wanted to know something about how tall people are. We can't measure the entire population (it's too big). So we take a sample. -- What would happen if: 1. We only measured really tall people (biased sample) -- 2. We randomly measured a bunch of people? --- class: light # Population statistics Populations have statistics. For example, The population of all people has: 1. A distributions of heights 2. The distribution has a mean (mean height of all people) 3. The distribution has a standard deviation --- class: light # The population problem In the real world, we usually do not have all of the data for the entire population. So, we never actually know: 1. The population distribution 2. The population mean 3. The population standard deviation, etc. --- class: light # The sampling solution Unknown: The population Solution: Take a sample of the population 1. Samples will tend to look the population they came from, especially when sample-size (N) is large. 2. We can use the sample to **estimate** the population. --- class: light # The sampling problem We take samples, and use them to estimate things. This works well when we have large, representative samples. But, how do we know if the sample we obtained is "normal", or happens to be "weird"? Solution: We need to learn how the process of sampling works. We can use python to simulate the process of sampling. Then we can see how samples behave. --- class: light # Samples become more like population - As sample-size increases, the sample becomes more like the population. - As sample N approaches the population N, the sample becomes the population. --- class: light # Law of large numbers - As sample-size increases, properties of the sample become more like properties of the population Example: - As sample-size increases, the mean of the sample becomes more like the mean of the population --- class: light # Simulation: Population mean=100 ![](4b_Sampling_files/figure-html/unnamed-chunk-2-1.png)<!-- --> --- class: light # The sampling problem We take samples, and use them to estimate things. This works well when we have large, representative samples. **But, how do we know if the sample we obtained is "normal", or happens to be "weird"?** Solution: **Sampling Distributions** --- class: light, center, middle, clear # Sampling distributions --- class: light # What are sampling distributions? - Definition: The distribution of a sample statistic - Example: - Many samples are drawn from the same distribution - A statistic (e.g., mean, standard deviation) is computed for each sample, and saved - The sampling distribution is the distribution of the measured statistic for each sample - Sampling distributions can be simulated in python --- class: light # Begin with a distribution ![](4b_Sampling_files/figure-html/unnamed-chunk-3-1.png)<!-- --> --- class: light # Take many samples Save a sample statistic (e.g., mean) for each sample ![](figs-crump/distribution/sampleHistUnif-1.gif)<!-- --> --- class: light # Plot distribution of sample statistic <img src="figs-crump/distribution/4unifmany-1.png" width="592" /> --- class: light # Sampling distribution is bell-shaped Notice that the sampling distribution of the mean is bell-shaped, also called a **Normal Distribution**. It doesn't matter if the original distribution is bell-shaped, the sample mean will always be bell-shaped (**central limit theorem**) This allows us to see if a pattern in our sample is "typical" or not, given distributions expected by chance <img src="figs-crump/distribution/4unifmany-1.png" width="50%" /> --- class: light Thanks to Matt Crump and Todd Gureckis for slides.