{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Answers - In Class Activity - Sampling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{note}\n",
    "This exercise authored by Todd Gureckis and Brenden Lake, and is released under the [license](/LICENSE.html).\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the chapter, we discussed some basic issues in sampling. In this notebook, you will explore some handy python methods for sampling and consider the implications of sampling on what you understand about some target group (i.e., what you can generalize)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Importing and using existing functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import numpy.random as npr\n",
    "import pandas as pd\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem 0: Seeding a random number generator"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we use the computer to play with random numbers (or random samples), we aren't actually using random numbers.  Generally speaking your computer is a deterministic machine so it is unable to make truely random numbers.  Intead the numbers your computer gives you are known as pseudo-random because they have many of the properties we want from random numbers but are not exactly and entirely random.\n",
    "\n",
    "Anytime we use random numbers in a script, simulation, or analysis it is important to \"seed\" the random number generator.  This initialized the random number generator function to a particularly \"state\" and this makes the number in the script random but repeatable.  \n",
    "\n",
    "Let's experiment with this.  First try running the following cell and seeing what the output is.  Try running it multiple times and seeing how the numbers change."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5, 7, 6, 8, 4, 3, 6, 9, 9, 8])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "npr.randint(0,10,10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now run this cell:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[9 4 0 1 9 0 1 8 9 0]\n",
      "[8 6 4 3 0 4 6 8 1 8]\n"
     ]
    }
   ],
   "source": [
    "npr.seed(10)\n",
    "print(npr.randint(0,10,10))\n",
    "print(npr.randint(0,10,10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, try repeating the cell execution over and over.  What do you observe?\n",
    "\n",
    "Try restarting the kernel and run the cell again.  What do you notice?  Compare to other people in your group.  Also change the argument to `npr.seed()` and see what happens."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Answer 0 here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[5 6 8 6 1 6 4 8 1 8]\n",
      "[5 1 0 8 8 8 2 6 8 1]\n"
     ]
    }
   ],
   "source": [
    "## Enter solution here\n",
    "npr.seed(9)\n",
    "print(npr.randint(0,10,10))\n",
    "print(npr.randint(0,10,10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Bottom line:  Always seed the random number generator at the start of any script that uses random numbers so your code is more repeatable."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem 1: Sampling from a finite population\n",
    "\n",
    "Imagine I create a list with 100 randomly determined values as below.  Using the web, research the the numpy random `choice()` function.  Use it generate a random sample of size 10 from this data.  Do it twice, once with replacement and once without replacement."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_data = np.array([75, 25, 59, 63, 48, 29,  3, 17, 68, 39,  9, 62, 61, 52, 64, 45, 90,\n",
    "       87,  0, 42, 26, 52, 22, 25, 20, 22, 81, 25, 48, 79, 37,  6, 33, 30,\n",
    "       81,  5, 37, 85, 65,  0, 27, 40, 96, 67, 77, 29, 32, 25,  4, 53, 46,\n",
    "        7, 51, 65, 46, 91, 60, 52, 93, 26,  2, 42, 18, 19, 97, 45, 78, 33,\n",
    "       25, 30, 97, 96, 99, 32, 86, 43, 81, 83, 51, 81, 36, 29,  2, 33, 95,\n",
    "       39, 79,  1, 80, 17, 50, 38,  1, 98, 30, 89, 93, 27, 43, 30])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Answer 1 here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[60 30 62 85 42 39 36 60 53 19]\n",
      "[52 25 43 45 30  1 68  7 81 81]\n"
     ]
    }
   ],
   "source": [
    "## Enter solution here\n",
    "print(npr.choice(my_data,size=10,replace=True))\n",
    "print(npr.choice(my_data,size=10,replace=False))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem 2: Sampling from a data frame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sometimes what we are interested in is sampling from a pandas dataframe rather than a list or numpy array.  Why might we want to sample from a dataset? One is to randomly select a subset of the data, for a training vs. test split, if we are doing machine learning projects on the data (we'll talk about this later).  Another is if there are too many records to analyze so it makes sense to randomly select a subset and analyze those.  Another is to repeatedly sample over and over again from a dataset to do a statistical method called \"boostrapping\" (https://en.wikipedia.org/wiki/Bootstrapping_(statistics))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This code loads an example pandas dataset of different penguins.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "penguins_df = sns.load_dataset('penguins')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0  Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1  Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2  Adelie  Torgersen            40.3           18.0              195.0   \n",
       "3  Adelie  Torgersen             NaN            NaN                NaN   \n",
       "4  Adelie  Torgersen            36.7           19.3              193.0   \n",
       "\n",
       "   body_mass_g     sex  \n",
       "0       3750.0    Male  \n",
       "1       3800.0  Female  \n",
       "2       3250.0  Female  \n",
       "3          NaN     NaN  \n",
       "4       3450.0  Female  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 344 entries, 0 to 343\n",
      "Data columns (total 7 columns):\n",
      " #   Column             Non-Null Count  Dtype  \n",
      "---  ------             --------------  -----  \n",
      " 0   species            344 non-null    object \n",
      " 1   island             344 non-null    object \n",
      " 2   bill_length_mm     342 non-null    float64\n",
      " 3   bill_depth_mm      342 non-null    float64\n",
      " 4   flipper_length_mm  342 non-null    float64\n",
      " 5   body_mass_g        342 non-null    float64\n",
      " 6   sex                333 non-null    object \n",
      "dtypes: float64(4), object(3)\n",
      "memory usage: 18.9+ KB\n"
     ]
    }
   ],
   "source": [
    "penguins_df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Research the pandas `sample()` method and randomly sample 20 penguins from the dataframe."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Answer 2a here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>329</th>\n",
       "      <td>Gentoo</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>48.1</td>\n",
       "      <td>15.1</td>\n",
       "      <td>209.0</td>\n",
       "      <td>5500.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>190</th>\n",
       "      <td>Chinstrap</td>\n",
       "      <td>Dream</td>\n",
       "      <td>46.9</td>\n",
       "      <td>16.6</td>\n",
       "      <td>192.0</td>\n",
       "      <td>2700.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>207</th>\n",
       "      <td>Chinstrap</td>\n",
       "      <td>Dream</td>\n",
       "      <td>52.2</td>\n",
       "      <td>18.8</td>\n",
       "      <td>197.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>255</th>\n",
       "      <td>Gentoo</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>48.4</td>\n",
       "      <td>16.3</td>\n",
       "      <td>220.0</td>\n",
       "      <td>5400.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268</th>\n",
       "      <td>Gentoo</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>44.9</td>\n",
       "      <td>13.3</td>\n",
       "      <td>213.0</td>\n",
       "      <td>5100.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>224</th>\n",
       "      <td>Gentoo</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>47.6</td>\n",
       "      <td>14.5</td>\n",
       "      <td>215.0</td>\n",
       "      <td>5400.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>138</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Dream</td>\n",
       "      <td>37.0</td>\n",
       "      <td>16.5</td>\n",
       "      <td>185.0</td>\n",
       "      <td>3400.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>271</th>\n",
       "      <td>Gentoo</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>48.5</td>\n",
       "      <td>14.1</td>\n",
       "      <td>220.0</td>\n",
       "      <td>5300.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>342</th>\n",
       "      <td>Gentoo</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>45.2</td>\n",
       "      <td>14.8</td>\n",
       "      <td>212.0</td>\n",
       "      <td>5200.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>60</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>35.7</td>\n",
       "      <td>16.9</td>\n",
       "      <td>185.0</td>\n",
       "      <td>3150.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>322</th>\n",
       "      <td>Gentoo</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>47.2</td>\n",
       "      <td>15.5</td>\n",
       "      <td>215.0</td>\n",
       "      <td>4975.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>185</th>\n",
       "      <td>Chinstrap</td>\n",
       "      <td>Dream</td>\n",
       "      <td>51.0</td>\n",
       "      <td>18.8</td>\n",
       "      <td>203.0</td>\n",
       "      <td>4100.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>122</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.2</td>\n",
       "      <td>17.0</td>\n",
       "      <td>176.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>279</th>\n",
       "      <td>Gentoo</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>50.4</td>\n",
       "      <td>15.3</td>\n",
       "      <td>224.0</td>\n",
       "      <td>5550.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>267</th>\n",
       "      <td>Gentoo</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>50.5</td>\n",
       "      <td>15.9</td>\n",
       "      <td>225.0</td>\n",
       "      <td>5400.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>334</th>\n",
       "      <td>Gentoo</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>46.2</td>\n",
       "      <td>14.1</td>\n",
       "      <td>217.0</td>\n",
       "      <td>4375.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>57</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>40.6</td>\n",
       "      <td>18.8</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>106</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>38.6</td>\n",
       "      <td>17.2</td>\n",
       "      <td>199.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>311</th>\n",
       "      <td>Gentoo</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>52.2</td>\n",
       "      <td>17.1</td>\n",
       "      <td>228.0</td>\n",
       "      <td>5400.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "329     Gentoo     Biscoe            48.1           15.1              209.0   \n",
       "190  Chinstrap      Dream            46.9           16.6              192.0   \n",
       "207  Chinstrap      Dream            52.2           18.8              197.0   \n",
       "255     Gentoo     Biscoe            48.4           16.3              220.0   \n",
       "268     Gentoo     Biscoe            44.9           13.3              213.0   \n",
       "224     Gentoo     Biscoe            47.6           14.5              215.0   \n",
       "138     Adelie      Dream            37.0           16.5              185.0   \n",
       "2       Adelie  Torgersen            40.3           18.0              195.0   \n",
       "271     Gentoo     Biscoe            48.5           14.1              220.0   \n",
       "342     Gentoo     Biscoe            45.2           14.8              212.0   \n",
       "60      Adelie     Biscoe            35.7           16.9              185.0   \n",
       "322     Gentoo     Biscoe            47.2           15.5              215.0   \n",
       "185  Chinstrap      Dream            51.0           18.8              203.0   \n",
       "122     Adelie  Torgersen            40.2           17.0              176.0   \n",
       "279     Gentoo     Biscoe            50.4           15.3              224.0   \n",
       "267     Gentoo     Biscoe            50.5           15.9              225.0   \n",
       "334     Gentoo     Biscoe            46.2           14.1              217.0   \n",
       "57      Adelie     Biscoe            40.6           18.8              193.0   \n",
       "106     Adelie     Biscoe            38.6           17.2              199.0   \n",
       "311     Gentoo     Biscoe            52.2           17.1              228.0   \n",
       "\n",
       "     body_mass_g     sex  \n",
       "329       5500.0    Male  \n",
       "190       2700.0  Female  \n",
       "207       3450.0    Male  \n",
       "255       5400.0    Male  \n",
       "268       5100.0  Female  \n",
       "224       5400.0    Male  \n",
       "138       3400.0  Female  \n",
       "2         3250.0  Female  \n",
       "271       5300.0    Male  \n",
       "342       5200.0  Female  \n",
       "60        3150.0  Female  \n",
       "322       4975.0  Female  \n",
       "185       4100.0    Male  \n",
       "122       3450.0  Female  \n",
       "279       5550.0    Male  \n",
       "267       5400.0    Male  \n",
       "334       4375.0  Female  \n",
       "57        3800.0    Male  \n",
       "106       3750.0  Female  \n",
       "311       5400.0    Male  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Enter solution here\n",
    "penguins_df.sample(n=20)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, for part b of this question, in a for loop, 100 times create a random sample of the dataframe and compute the mean body mass of the penguins in your sample.  Append all these values to a list and then plot a histogram of these values (using `sns.displot`).  Compare it to the mean of the dataset containing all the penguins."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Answer 2b here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overall mean is  4201.754385964912\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWAAAAFgCAYAAACFYaNMAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAQ4klEQVR4nO3dfaxkdX3H8fcHFsUWHyAsZLvsBpuikZoU05UqtAmWaLfUijYqJa0lkcpqxYhaG9SktTVN8KFqn2JZhYAWFawQH6ooUsQaFFwoIBQN1qIsEHapMdI01Sz77R/30L1sd+FC95zvzL3vVzK5M7+Zuef3g73vPffMzNlUFZKk6e3XPQFJWqkMsCQ1McCS1MQAS1ITAyxJTVZ1T2ApNm7cWJdffnn3NCTpscqeBudiD/i+++7rnoIk7XNzEWBJWo4MsCQ1McCS1MQAS1ITAyxJTQywJDUxwJLUxABLUhMDLElNDLAkNTHAktRktAAnWZfkqiS3Jbk1yeuH8bcnuSvJjcPlpLHmIEmzbMyzoe0A3lRVNyR5InB9kiuG+95XVe8ZcduSNPNGC3BV3QPcM1y/P8ltwNqxtidJ82aSY8BJjgSeBVw7DJ2Z5OYk5yc5eC/POSPJliRbtm/fPsU0NUPWrltPkkkua9et716uVqiM/c/SJzkIuBr486q6NMnhwH1AAe8A1lTVKx/ue2zYsKG2bNky6jw1W5JwyrnXTLKtizcdx9g/B1rxpj8he5IDgE8CF1XVpQBVdW9VPVBVO4EPAseOOQdJmlVjvgsiwHnAbVX13kXjaxY97CXALWPNQZJm2ZjvgjgeeAXwzSQ3DmNvBU5NcgwLhyDuADaNOAdJmlljvgviq+z5uMfnxtqmJM0TPwknSU0MsCQ1McCS1MQAS1ITAyxJTQywJDUxwJLUxABLUhMDLElNDLAkNTHAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1ITAyxJTQywJDUxwJLUxABLUhMDLElNDLAkNTHAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1ITAyxJTQywJDUxwJLUxABLUhMDLElNDLAkNTHAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1ITAyxJTQywJDUxwJLUxABLUhMDLElNDLAkNTHAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1KT0QKcZF2Sq5LcluTWJK8fxg9JckWS24evB481B0maZWPuAe8A3lRVzwCeA7w2ydHA2cCVVXUUcOVwW5JWnNECXFX3VNUNw/X7gduAtcDJwIXDwy4EXjzWHCRplk1yDDjJkcCzgGuBw6vqHliINHDYXp5zRpItSbZs3759imlK0qRGD3CSg4BPAmdV1Y+W+ryq2lxVG6pqw+rVq8eboCQ1GTXASQ5gIb4XVdWlw/C9SdYM968Bto05B0maVWO+CyLAecBtVfXeRXd9GjhtuH4a8Kmx5iBJs2zViN/7eOAVwDeT3DiMvRU4B7gkyenA94GXjTgHSZpZowW4qr4KZC93nzjWdiVpXvhJOElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQHWkqxdt54kk12klWDMD2JoGbl7652ccu41k23v4k3HTbYtqYt7wJLUxABLUhMDLElNDLAkNTHAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1ITAyxJTQywJDUxwJLUxABLUhMDLElNDLAkNTHAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1ITAyxJTQywtN8qkkx2WbtuffeKNSNWdU9AardzB6ece81km7t403GTbUuzzT1gSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJgZYkpoYYElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJqMFOMn5SbYluWXR2NuT3JXkxuFy0ljbl6RZN+Ye8AXAxj2Mv6+qjhkunxtx+5I000YLcFV9BfjBWN9fkuZdxzHgM5PcPByiOHhvD0pyRpItSbZs3759yvnNhbXr1pNksoukfW/VxNv7APAOoIavfwG8ck8PrKrNwGaADRs21FQTnBd3b72TU869ZrLtXbzpuMm2Ja0Uk+4BV9W9VfVAVe0EPggcO+X2JWmWTBrgJGsW3XwJcMveHitJy91ohyCSfAw4ATg0yVbgT4ATkhzDwiGIO4BNY21fkmbdaAGuqlP3MHzeWNuTpHnjJ+EkqYkBlqQmBliSmhhgSWpigCWpyZICnOT4pYxJkpZuqXvAf73EMUnSEj3s+4CTPBc4Dlid5I2L7noSsP+YE5Ok5e6RPojxOOCg4XFPXDT+I+ClY01KklaChw1wVV0NXJ3kgqr63kRzkqQVYakfRX58ks3AkYufU1W/OsakJGklWGqAPwH8HfAh4IHxpiNJK8dSA7yjqj4w6kwkaYVZ6tvQPpPkD5KsSXLIg5dRZyZJy9xS94BPG76+edFYAT+7b6cjSSvHkgJcVU8deyKStNIsKcBJfm9P41X14X07HUlaOZZ6COLZi64fCJwI3AAYYEl6jJZ6COJ1i28neTLwkVFmJEkrxGM9HeV/AUfty4lI0kqz1GPAn2HhXQ+wcBKeZwCXjDUpSVoJlnoM+D2Lru8AvldVW0eYjyStGEs6BDGclOdbLJwR7WDgJ2NOSpJWgqX+ixgvB64DXga8HLg2iaejlKT/h6Uegngb8Oyq2gaQZDXwJeAfxpqYJC13S30XxH4PxnfwH4/iuZKkPVjqHvDlSb4AfGy4fQrwuXGmJEkrwyP9m3A/BxxeVW9O8lvALwMBvgZcNMH8JGnZeqTDCO8H7geoqkur6o1V9QYW9n7fP+7UJGl5e6QAH1lVN+8+WFVbWPjniSRJj9EjBfjAh7nvCftyIpK00jxSgL+R5FW7DyY5Hbh+nClJy9x+q0gy2WXtuvXdK9ZePNK7IM4CLkvyO+wK7gbgccBLRpyXtHzt3MEp514z2eYu3nTcZNvSo/OwAa6qe4HjkjwPeOYw/I9V9U+jz0ySlrmlng/4KuCqkeciSSuKn2aTpCYGWJKaGGBJamKAJamJAZakJgZYkpoYYElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJqMFOMn5SbYluWXR2CFJrkhy+/D14LG2L0mzbsw94AuAjbuNnQ1cWVVHAVcOtyVpRRotwFX1FeAHuw2fDFw4XL8QePFY25ekWTf1MeDDq+oegOHrYXt7YJIzkmxJsmX79u2TTVCSpjKzL8JV1eaq2lBVG1avXt09HUna56YO8L1J1gAMX7dNvH1JmhlTB/jTwGnD9dOAT028fUmaGWO+De1jwNeApyfZmuR04Bzg+UluB54/3JakFWnVWN+4qk7dy10njrVNSZonM/sinCQtdwZYkpoYYElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJgZYkpoYYElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJgZYkpoYYElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJgZYkpoYYElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJgZYkpoYYElqYoAlqcmqjo0muQO4H3gA2FFVGzrmIUmdWgI8eF5V3de4fUlq5SEISWrSFeACvpjk+iRnNM1Bklp1HYI4vqruTnIYcEWSb1XVVxY/YAjzGQDr16/vmKMkjaplD7iq7h6+bgMuA47dw2M2V9WGqtqwevXqqacoSaObPMBJfjrJEx+8DrwAuGXqeUhSt45DEIcDlyV5cPsfrarLG+YhSa0mD3BVfRf4ham3K0mzxrehSVITAyxJTQywJDUxwJLUxABLUhMDLElNDLAkNTHA0nK33yqSTHJZu87ztjwanecDljSFnTs45dxrJtnUxZuOm2Q7y4V7wJLUxABLUhMDLElNDLAkNTHAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1KTZR3gtevW+xl4STNrWZ8L4u6td/oZeEkza1nvAUvSLDPAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1ITAyxJTQywJDUxwJLUZFl/FHlS+60iSfcspF4T/xz8zBHruOvO70+2vX3NAO8rO3dMdt4J8NwTmlH+HDwqHoKQpCYGWJKaGGBJamKAJamJAZakJgZYkpoYYElqYoAlqYkBlqQmBliSmhhgSWpigCXNr+HkP1Nd1q5bv0+n78l4JM2vOT/5j3vAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1ITAyxJTVoCnGRjkm8n+U6SszvmIEndJg9wkv2BvwV+HTgaODXJ0VPPQ5K6dewBHwt8p6q+W1U/AT4OnNwwD0lqlaqadoPJS4GNVfX7w+1XAL9UVWfu9rgzgDOGm08Hvj3pRHc5FLivadtjcD2zzfXMtse6nvuqauPugx3ngsgexv7P3wJVtRnYPP50Hl6SLVW1oXse+4rrmW2uZ7bt6/V0HILYCqxbdPsI4O6GeUhSq44AfwM4KslTkzwO+G3g0w3zkKRWkx+CqKodSc4EvgDsD5xfVbdOPY9Hof0wyD7memab65lt+3Q9k78IJ0la4CfhJKmJAZakJisuwEkOTHJdkpuS3JrkT4fxY5J8PcmNSbYkOXbRc94yfGz620l+bdH4Lyb55nDfXyXZ01vsJpFk/yT/kuSzw+1DklyR5Pbh68GLHjuP63l3km8luTnJZUmesuixc7eeReN/mKSSHLpobC7Xk+R1w5xvTfKuReNzt57JelBVK+rCwvuQDxquHwBcCzwH+CLw68P4ScCXh+tHAzcBjweeCvwbsP9w33XAc4fv+fkHn9+0rjcCHwU+O9x+F3D2cP1s4J1zvp4XAKuG6++c9/UMY+tYeDH6e8Ch87we4HnAl4DHD7cPm/P1TNKDFbcHXAv+c7h5wHCp4fKkYfzJ7Hpv8snAx6vqx1X178B3gGOTrAGeVFVfq4X/+h8GXjzRMh4iyRHAbwAfWjR8MnDhcP1Cds1tLtdTVV+sqh3Dza+z8P5xmNP1DN4H/BEP/SDSvK7nNcA5VfVjgKraNozP63om6cGKCzD8768bNwLbgCuq6lrgLODdSe4E3gO8ZXj4WuDORU/fOoytHa7vPt7h/Sz8IO9cNHZ4Vd0DMHw9bBif1/Us9koW9jBgTteT5EXAXVV1026Pncv1AE8DfiXJtUmuTvLsYXxe13MWE/RgRQa4qh6oqmNY2Is6NskzWfgb/A1VtQ54A3De8PC9fXR6SR+pHluSFwLbqur6pT5lD2Nzs54kbwN2ABc9OLSHh830epL8FPA24I/39JQ9jM30egargINZOJz3ZuCS4RjovK5nkh50nAtiZlTVD5N8GdgInAa8frjrE+z6dWRvH53eyq5fgxePT+144EVJTgIOBJ6U5O+Be5Osqap7hl+PHvyVcC7XU1W/m+Q04IXAicOveTCH6wE+wsLxw5uG12mOAG4YXuiZu/UMf962ApcO/1+uS7KThRPXzOt6fpMpetB10LvrAqwGnjJcfwLwzyz8UN8GnDCMnwhcP1z/eR560P277Dro/g0W/sZ/8KD7Sc1rO4FdLyK8m4e+CPeuOV/PRuBfgdW7PWYu17Pb+B3sehFuLtcDvBr4s+H601j4NT1zvJ5JerAS94DXABdm4cTw+wGXVNVnk/wQ+Mskq4D/ZjgVZlXdmuQSFn74dwCvraoHhu/1GuACFkL+eXYdl5wF57Dwa+DpwPeBl8Fcr+dvWPhDf8Ww1/j1qnr1HK9nj+Z4PecD5ye5BfgJcFotVGle1/MqJuiBH0WWpCYr8kU4SZoFBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJavI/yNpxXsq8y4cAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 360x360 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "## Enter solution here\n",
    "overall_mean = penguins_df['body_mass_g'].mean()\n",
    "\n",
    "list_sample_mean = []\n",
    "for i in range(100):\n",
    "    sample_mean = penguins_df.sample(n=20)['body_mass_g'].mean()\n",
    "    list_sample_mean.append(sample_mean)\n",
    "\n",
    "sns.displot(list_sample_mean)\n",
    "print(\"Overall mean is \", overall_mean)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem 3: Stratified sampling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One problem with the simple random samples we made of the penguins is that in each sample we might exclude some important groups of the data.  For example, if we only sampled 10 penguins perhaps all of them are male.  If we wanted to be more even handed name make sure our samples were _representative_ of the sex differences then we might want to sample from the subpopulations.  This is called \"stratified sampling\".\n",
    "\n",
    "Please read this example webpage: https://www.statology.org/stratified-sampling-pandas/\n",
    "on stratified sampling and adapt the code to generate a random sample of 10 penguins that is stratified so that there are 5 male and 5 female examples in the sample"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Problem 3: Answer here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>184</th>\n",
       "      <td>Chinstrap</td>\n",
       "      <td>Dream</td>\n",
       "      <td>42.5</td>\n",
       "      <td>16.7</td>\n",
       "      <td>187.0</td>\n",
       "      <td>3350.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>38.7</td>\n",
       "      <td>19.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>104</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>37.9</td>\n",
       "      <td>18.6</td>\n",
       "      <td>193.0</td>\n",
       "      <td>2925.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>64</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>36.4</td>\n",
       "      <td>17.1</td>\n",
       "      <td>184.0</td>\n",
       "      <td>2850.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>40.5</td>\n",
       "      <td>17.9</td>\n",
       "      <td>187.0</td>\n",
       "      <td>3200.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>168</th>\n",
       "      <td>Chinstrap</td>\n",
       "      <td>Dream</td>\n",
       "      <td>50.3</td>\n",
       "      <td>20.0</td>\n",
       "      <td>197.0</td>\n",
       "      <td>3300.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>179</th>\n",
       "      <td>Chinstrap</td>\n",
       "      <td>Dream</td>\n",
       "      <td>49.5</td>\n",
       "      <td>19.0</td>\n",
       "      <td>200.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>123</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>41.4</td>\n",
       "      <td>18.5</td>\n",
       "      <td>202.0</td>\n",
       "      <td>3875.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>101</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>41.0</td>\n",
       "      <td>20.0</td>\n",
       "      <td>203.0</td>\n",
       "      <td>4725.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>194</th>\n",
       "      <td>Chinstrap</td>\n",
       "      <td>Dream</td>\n",
       "      <td>50.9</td>\n",
       "      <td>19.1</td>\n",
       "      <td>196.0</td>\n",
       "      <td>3550.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "184  Chinstrap      Dream            42.5           16.7              187.0   \n",
       "16      Adelie  Torgersen            38.7           19.0              195.0   \n",
       "104     Adelie     Biscoe            37.9           18.6              193.0   \n",
       "64      Adelie     Biscoe            36.4           17.1              184.0   \n",
       "27      Adelie     Biscoe            40.5           17.9              187.0   \n",
       "168  Chinstrap      Dream            50.3           20.0              197.0   \n",
       "179  Chinstrap      Dream            49.5           19.0              200.0   \n",
       "123     Adelie  Torgersen            41.4           18.5              202.0   \n",
       "101     Adelie     Biscoe            41.0           20.0              203.0   \n",
       "194  Chinstrap      Dream            50.9           19.1              196.0   \n",
       "\n",
       "     body_mass_g     sex  \n",
       "184       3350.0  Female  \n",
       "16        3450.0  Female  \n",
       "104       2925.0  Female  \n",
       "64        2850.0  Female  \n",
       "27        3200.0  Female  \n",
       "168       3300.0    Male  \n",
       "179       3800.0    Male  \n",
       "123       3875.0    Male  \n",
       "101       4725.0    Male  \n",
       "194       3550.0    Male  "
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Enter solution here\n",
    "penguins_df.groupby('sex').sample(5)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}