{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# In Class Activity - Hypothesis testing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{note}\n", "This exercise authored by Brenden Lake and is released under the [license](/LICENSE.html).\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The chapter discussed some basic issues in hypothesis testing. In this notebook, you will explore these ideas in Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing packages" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import numpy.random as npr\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 0: Critical regions and critical values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's recreate the example from Section 10.4.1, which talks about computing the critical region for the ESP study. Here are some of the details, but I suggest you review Section 10.4.1 first.\n", "\n", "Our population parameter $\\theta$ is just the overall probability that people respond correctly when asked to guess which side of the card is displayed in the ESP study. Our test statistic $X$ is the *count* of the number of people who did so, out of a sample size of $N$. This is exactly what the [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution) describes! So, we would say that the null hypothesis predicts that $X$ is binomially distributed, which is written\n", "$\n", "X \\sim \\mbox{Binomial}(\\theta,N).\n", "$\n", "Since the null hypothesis states that $\\theta = 0.5$, and our experiment has $N=100$ people, we can use this as a sampling distribution.\n", "\n", "Your goal here is to compute the critical region for $\\alpha = .05$ (the type-I error rate) using sampling. Thus, you should randomly generate 10,000 experiments, each with $N=100$ coin flips, and record the test statistic $X$ for each simulated experiment. Your goal is to find the lower and upper \"critical values\" given your sampling distribution (See figure below for a visualization. Your simulations should roughly match this plot).\n", "\n", "You may find the following functions helpful: `np.random.binomial`, `np.sort`, and `sns.displot`\n", "" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Your answer goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 1: Critical values and sample size\n", "\n", "Copy the code above and see how the location of the critical values changes as a function of the number of simulated participants in an experiment. Thus, rather than $N=100$, what are the critical values when $N=200$ or $N=500$? You will want to use $X/N$ as your sample statistic instead (percent of participants that get it right) so you can compare more easily across different numbers of participants." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Your answer goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 2: Binomial test\n", "\n", "Check your simulated critical values in Problem 0 to see how they correspond with an exact binomial test. 
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Your answer here" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }