{ "cells": [ { "cell_type": "markdown", "id": "6073eb74", "metadata": {}, "source": [ "# Answers - In Class Activity - Pandas and data processing" ] }, { "cell_type": "markdown", "id": "1f1a34fa", "metadata": {}, "source": [ "First, let's import our packages." ] }, { "cell_type": "code", "execution_count": null, "id": "27fec022", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import numpy.random as npr\n", "import math" ] }, { "cell_type": "markdown", "id": "6d5c7ba4", "metadata": {}, "source": [ "The code below creates a pandas dataframe named `df` that stores information about several items, each of which is a shape. Each shape has a `type` (either rectangle or circle) and a `width`; the rectangles also have a `height`. For circles, `height` is not-a-number (NaN), since a circle's height always equals its width. Note that pandas and numpy work together pretty well." ] }, { "cell_type": "code", "execution_count": null, "id": "193b2fdb", "metadata": {}, "outputs": [], "source": [ "mytype = np.array(['rectangle','circle','rectangle','rectangle','circle','rectangle','circle','rectangle','circle','circle'])\n", "width = npr.rand(len(mytype))*10.\n", "height = npr.rand(len(mytype))*10.\n", "height[mytype=='circle'] = np.nan\n", "df = pd.DataFrame({\"type\":mytype, \"width\":width, \"height\":height})\n", "df" ] }, { "cell_type": "markdown", "id": "bd755d44", "metadata": {}, "source": [ "### Problem 0: Changing entries in the table" ] }, { "cell_type": "markdown", "id": "67e11322", "metadata": {}, "source": [ "Your first task is to manually change some of the entries in the dataframe. For the last 'rectangle' in the table, please change its width to `5.0` and its height to `2.0`."
] }, { "cell_type": "code", "execution_count": null, "id": "9a0496aa", "metadata": {}, "outputs": [], "source": [ "# Your answer goes here\n", "# The last rectangle is the row with index label 7\n", "df.at[7,'width'] = 5.0\n", "df.at[7,'height'] = 2.0\n", "df" ] }, { "cell_type": "markdown", "id": "cec6b441", "metadata": {}, "source": [ "### Problem 1: Dropping rows with missing data, and computing a mean" ] }, { "cell_type": "markdown", "id": "2dcb7d61", "metadata": {}, "source": [ "Using the pandas function `dropna`, drop any rows that have missing data / a `NaN` value (in essence, all the circles in the table will be dropped). Save the resulting dataframe to a new variable `df2`. \n", "\n", "Next, compute the average height of the items in `df2`." ] }, { "cell_type": "code", "execution_count": null, "id": "d9b3d082", "metadata": {}, "outputs": [], "source": [ "# Your answer here\n", "df2 = df.dropna()\n", "# Select the height column first; the string column `type` has no mean\n", "df2['height'].mean()" ] }, { "cell_type": "markdown", "id": "a0c2970a", "metadata": {}, "source": [ "### Problem 2: Computing the area and creating a new column" ] }, { "cell_type": "markdown", "id": "7b90d30d", "metadata": {}, "source": [ "Forget about `df2`; let's go back to the original table `df`. We want to create a new column of `df` that lists the area of each shape. Please write code that does both: create the column and compute the area of each shape. Remember, the formula for the area of a circle is $\\pi r^2$ for radius $r$. In our case, it is $\\pi (w/2)^2$ for width $w$.\n", "\n", "Note: if you haven't read through Chapter 6.10 yet, now is a good time to do so. See especially 6.10.4 on \"Selecting\"."
] }, { "cell_type": "code", "execution_count": null, "id": "a7bd9e91", "metadata": {}, "outputs": [], "source": [ "# Your answer here (this answer uses a for-loop)\n", "# Start with width*height; this is NaN for circles, which have no height\n", "df['area'] = df['width']*df['height']\n", "n = df.shape[0]\n", "for i in range(n):\n", "    if df.at[i,'type'] == 'circle':\n", "        # Overwrite the NaN with the circle area, using radius = width/2\n", "        df.at[i,'area'] = math.pi * (df.at[i,'width']/2) ** 2\n", "df" ] }, { "cell_type": "code", "execution_count": null, "id": "52c5ba1a", "metadata": {}, "outputs": [], "source": [ "# Your answer here (this answer doesn't use a for-loop)\n", "# Start with width*height; this is NaN for circles, which have no height\n", "df['area'] = df['width']*df['height']\n", "# Boolean mask that is True for the circle rows\n", "sel = df['type'] == 'circle'\n", "df.loc[sel,'area'] = math.pi * (df.loc[sel,'width']/2) ** 2\n", "df" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 5 }