{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6073eb74",
   "metadata": {},
   "source": [
    "# Answers -In Class Activity - More Pandas and data processing"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f1a34fa",
   "metadata": {},
   "source": [
    "First, let's import our packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "27fec022",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import numpy.random as npr\n",
    "import math"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d5c7ba4",
   "metadata": {},
   "source": [
    "Second, we want to read in a csv file in the dataframe `df`. The file has a list of various faculty members and their phone numbers (don't worry, the phone numbers are randomly generated. So best not try calling them). We also want to tell `df` that the phone numbers are actually strings rather than traditional numbers, which we do with the `astype` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "193b2fdb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Last name</th>\n",
       "      <th>First name</th>\n",
       "      <th>Phone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adolph</td>\n",
       "      <td>Karen</td>\n",
       "      <td>9285162643</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Amodio</td>\n",
       "      <td>David</td>\n",
       "      <td>3803269359</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Balcetis</td>\n",
       "      <td>Emily</td>\n",
       "      <td>2204342277</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Van Bavel</td>\n",
       "      <td>Jay</td>\n",
       "      <td>3252450008</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Carrasco</td>\n",
       "      <td>Marisa</td>\n",
       "      <td>2092652228</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Cimpian</td>\n",
       "      <td>Andrei</td>\n",
       "      <td>6966417505</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Craig</td>\n",
       "      <td>Maureen</td>\n",
       "      <td>7022020569</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Curtis</td>\n",
       "      <td>Clay</td>\n",
       "      <td>7768322051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Dillon</td>\n",
       "      <td>Moira</td>\n",
       "      <td>6209318809</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Freeman</td>\n",
       "      <td>Jon</td>\n",
       "      <td>4746939756</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Gollwitzer</td>\n",
       "      <td>Peter</td>\n",
       "      <td>3195372989</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Last name First name       Phone\n",
       "0       Adolph      Karen  9285162643\n",
       "1       Amodio      David  3803269359\n",
       "2     Balcetis      Emily  2204342277\n",
       "3    Van Bavel        Jay  3252450008\n",
       "4     Carrasco     Marisa  2092652228\n",
       "5      Cimpian     Andrei  6966417505\n",
       "6        Craig    Maureen  7022020569\n",
       "7       Curtis       Clay  7768322051\n",
       "8       Dillon      Moira  6209318809\n",
       "9      Freeman        Jon  4746939756\n",
       "10  Gollwitzer      Peter  3195372989"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('faculty.csv')\n",
    "df['Phone'] = df['Phone'].astype(str)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd755d44",
   "metadata": {},
   "source": [
    "### Problem 0: Changing the format of a phone number"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67e11322",
   "metadata": {},
   "source": [
    "For this problem, we want to convert the phone numbers to a more readable format [ 9285162643 $\\rightarrow$ (928)516-2643 ]. You may have experience doing this sort of thing by hand with Excel, which can be very cumbersome and error-prone. Let's see how to do this with pandas instead.\n",
    "\n",
    "Please write code to make the transformation to (ABC)DEF-HIJK format for each phone number in `df`.\n",
    "\n",
    "_Hint_: Write a function `convert_phone` that converts the format of a single phone number (also, remind yourself about Python list slicing). Then, you can apply that function using the `transform` operation described in book section 6.12.1.2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a0496aa",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your answer goes here\n",
    "\n",
    "def convert_phone(x):\n",
    "    assert(len(x)==10)\n",
    "    return '(' + x[:3] + ')' + x[3:6] + '-' + x[6:]\n",
    "\n",
    "df['Phone'] = df['Phone'].transform(convert_phone)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cec6b441",
   "metadata": {},
   "source": [
    "### Problem 1: Making  a new column"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2dcb7d61",
   "metadata": {},
   "source": [
    "Using the same dataframe `df`, make a new column that lists the complete name of each professor. For instance, the new column should be called 'Complete name' and the first entry should be the string 'Karen Adolph'.\n",
    "\n",
    "_Hint_: You could make a new function and use the same logic as above. Alternatively, you can also try directly summing the relevant columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "d9b3d082",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Last name</th>\n",
       "      <th>First name</th>\n",
       "      <th>Phone</th>\n",
       "      <th>Complete name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adolph</td>\n",
       "      <td>Karen</td>\n",
       "      <td>9285162643</td>\n",
       "      <td>Karen Adolph</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Amodio</td>\n",
       "      <td>David</td>\n",
       "      <td>3803269359</td>\n",
       "      <td>David Amodio</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Balcetis</td>\n",
       "      <td>Emily</td>\n",
       "      <td>2204342277</td>\n",
       "      <td>Emily Balcetis</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Van Bavel</td>\n",
       "      <td>Jay</td>\n",
       "      <td>3252450008</td>\n",
       "      <td>Jay Van Bavel</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Carrasco</td>\n",
       "      <td>Marisa</td>\n",
       "      <td>2092652228</td>\n",
       "      <td>Marisa Carrasco</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Cimpian</td>\n",
       "      <td>Andrei</td>\n",
       "      <td>6966417505</td>\n",
       "      <td>Andrei Cimpian</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Craig</td>\n",
       "      <td>Maureen</td>\n",
       "      <td>7022020569</td>\n",
       "      <td>Maureen Craig</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Curtis</td>\n",
       "      <td>Clay</td>\n",
       "      <td>7768322051</td>\n",
       "      <td>Clay Curtis</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Dillon</td>\n",
       "      <td>Moira</td>\n",
       "      <td>6209318809</td>\n",
       "      <td>Moira Dillon</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Freeman</td>\n",
       "      <td>Jon</td>\n",
       "      <td>4746939756</td>\n",
       "      <td>Jon Freeman</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Gollwitzer</td>\n",
       "      <td>Peter</td>\n",
       "      <td>3195372989</td>\n",
       "      <td>Peter Gollwitzer</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Last name First name       Phone     Complete name\n",
       "0       Adolph      Karen  9285162643      Karen Adolph\n",
       "1       Amodio      David  3803269359      David Amodio\n",
       "2     Balcetis      Emily  2204342277    Emily Balcetis\n",
       "3    Van Bavel        Jay  3252450008     Jay Van Bavel\n",
       "4     Carrasco     Marisa  2092652228   Marisa Carrasco\n",
       "5      Cimpian     Andrei  6966417505    Andrei Cimpian\n",
       "6        Craig    Maureen  7022020569     Maureen Craig\n",
       "7       Curtis       Clay  7768322051       Clay Curtis\n",
       "8       Dillon      Moira  6209318809      Moira Dillon\n",
       "9      Freeman        Jon  4746939756       Jon Freeman\n",
       "10  Gollwitzer      Peter  3195372989  Peter Gollwitzer"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Your answer here\n",
    "df['Complete name'] = df['First name'] + ' ' + df['Last name']\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0c2970a",
   "metadata": {},
   "source": [
    "### Problem 2: Computing the area and creating a new column"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5638e223",
   "metadata": {},
   "source": [
    "Let's create the rectangle and circle dataframe from last week and call it `df_shapes`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c7e592c2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>width</th>\n",
       "      <th>height</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>rectangle</td>\n",
       "      <td>0.144597</td>\n",
       "      <td>5.902562</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>circle</td>\n",
       "      <td>0.944694</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>rectangle</td>\n",
       "      <td>9.350486</td>\n",
       "      <td>3.828103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>rectangle</td>\n",
       "      <td>7.251338</td>\n",
       "      <td>0.301459</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>circle</td>\n",
       "      <td>1.120220</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>rectangle</td>\n",
       "      <td>8.768920</td>\n",
       "      <td>8.915431</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>circle</td>\n",
       "      <td>0.582969</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>rectangle</td>\n",
       "      <td>2.239293</td>\n",
       "      <td>0.090153</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>circle</td>\n",
       "      <td>7.015915</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>circle</td>\n",
       "      <td>6.086628</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        type     width    height\n",
       "0  rectangle  0.144597  5.902562\n",
       "1     circle  0.944694       NaN\n",
       "2  rectangle  9.350486  3.828103\n",
       "3  rectangle  7.251338  0.301459\n",
       "4     circle  1.120220       NaN\n",
       "5  rectangle  8.768920  8.915431\n",
       "6     circle  0.582969       NaN\n",
       "7  rectangle  2.239293  0.090153\n",
       "8     circle  7.015915       NaN\n",
       "9     circle  6.086628       NaN"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mytype = np.array(['rectangle','circle','rectangle','rectangle','circle','rectangle','circle','rectangle','circle','circle'])\n",
    "width = npr.rand(len(mytype))*10.\n",
    "height = npr.rand(len(mytype))*10.\n",
    "height[mytype=='circle']=np.nan  \n",
    "df_shapes = pd.DataFrame({\"type\":mytype, \"width\":width, \"height\":height})\n",
    "df_shapes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b90d30d",
   "metadata": {},
   "source": [
    "Next, you should compute the mean 'width' separately for the rectangles and circles.\n",
    "\n",
    "_Hint_: you can use `groupby` and `.mean()` from chapter 6.12."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7bd9e91",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your answer here\n",
    "df_shapes.groupby('type')['width'].mean()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}