Answers - In Class Activity - Pandas and data processing
First, let’s import our packages.
import pandas as pd
import numpy as np
import numpy.random as npr
import math
The code below creates a pandas dataframe named df that stores information about several items, each of which is a shape. Each shape has a type (either rectangle or circle) and a width, and the rectangles also have a height. For circles, the height is NaN (not-a-number), since a circle's width and height must be the same. Note that pandas and numpy work together pretty well.
mytype = np.array(['rectangle','circle','rectangle','rectangle','circle','rectangle','circle','rectangle','circle','circle'])
width = npr.rand(len(mytype))*10.
height = npr.rand(len(mytype))*10.
height[mytype=='circle']=np.nan
df = pd.DataFrame({"type":mytype, "width":width, "height":height})
df
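Because the widths and heights are random, the table above changes on every run. As a minimal sketch, assuming you want repeatable values, the same table can be built from a seeded generator. The seed 0, the shorter list of shapes, and the name df_seeded are illustrative choices, and np.random.default_rng will not reproduce the numbers that npr.rand gives.

```python
import numpy as np
import pandas as pd

# Assumption: seed 0 is arbitrary; any fixed integer gives repeatable values
rng = np.random.default_rng(0)

mytype = np.array(['rectangle', 'circle', 'rectangle', 'circle'])
width = rng.random(len(mytype)) * 10.
height = rng.random(len(mytype)) * 10.
height[mytype == 'circle'] = np.nan   # circles have no separate height

df_seeded = pd.DataFrame({"type": mytype, "width": width, "height": height})
```

The boolean mask mytype == 'circle' selects the circle positions in the height array, which is the same trick the original cell uses.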
Problem 0: Changing entries in the table
Your first task is to manually change some of the entries in the dataframe. For the last 'rectangle' in the table, please change its width to 5.0 and its height to 2.0.
# Your answer goes here
df.at[7,'width'] = 5.0
df.at[7,'height'] = 2.0
df
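The answer above hard-codes row label 7, which was found by reading the table. As a sketch of an alternative, the label of the last rectangle can be looked up instead of counted by hand. The small table df_demo and the name last_rect are illustrative and not part of the activity.

```python
import numpy as np
import pandas as pd

# A small stand-in table (illustrative values, not the activity's random data)
df_demo = pd.DataFrame({
    "type": ["rectangle", "circle", "rectangle", "circle"],
    "width": [1.0, 2.0, 3.0, 4.0],
    "height": [1.5, np.nan, 2.5, np.nan],
})

# Index labels of all rectangles; the last one is the row we want to edit
last_rect = df_demo.index[df_demo["type"] == "rectangle"][-1]

# .at sets a single cell, addressed by row label and column name
df_demo.at[last_rect, "width"] = 5.0
df_demo.at[last_rect, "height"] = 2.0
```

Looking the label up this way keeps working if rows are added or reordered, whereas a hard-coded 7 would silently point at a different row.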
Problem 1: Dropping rows with missing data, and computing a mean
Using the pandas function dropna, drop any rows that have missing data, that is, a NaN value (in essence, all the circles in the table will be dropped). Save the resulting dataframe to a new variable df2.
Next, compute the average height of the items in df2.
# Your answer here
df2 = df.dropna()
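# Average the height column only; df2.mean() would try to average
# every column, including the non-numeric 'type' column
df2['height'].mean()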
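Two related details may be worth a quick sketch: dropna accepts a subset argument to look for missing values in chosen columns only, and the mean of a single column skips NaN by default. The table df_demo and the variable names below are illustrative, not part of the activity.

```python
import numpy as np
import pandas as pd

# Illustrative table: two rectangles and one circle
df_demo = pd.DataFrame({
    "type": ["rectangle", "circle", "rectangle"],
    "width": [2.0, 4.0, 6.0],
    "height": [1.0, np.nan, 3.0],
})

# Drop only the rows whose height is missing
df2_demo = df_demo.dropna(subset=["height"])

# Mean of the height column after dropping
mean_after_drop = df2_demo["height"].mean()

# Series.mean skips NaN by default, so the undropped column gives the same value
mean_skipna = df_demo["height"].mean()
```

In this table both means come out equal, because the only rows removed by dropna are the ones the mean would have skipped anyway.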
Problem 2: Computing the area and creating a new column
Forget about df2; let's go back to the original table df. We want to create a new column of df that lists the area of each shape. Please write code that does this, both creating the column and computing the area of each shape. Remember, the formula for the area of a circle is \(\pi r^2\) for radius \(r\). In our case, it is \(\pi (w/2)^2\) for width \(w\).
Note: if you haven't read through Chapter 6.10 yet, now is a good time to do so. See especially 6.10.4 on "Selecting".
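# Your answer here (this answer uses a for-loop)
# Start with width * height for every row (this is NaN for circles)
df['area'] = df['width']*df['height']
n = df.shape[0]
# Overwrite the circle rows; range(n) matches the default 0..n-1 row labels
for i in range(n):
    if df.at[i,'type'] == 'circle':
        df.at[i,'area'] = math.pi * (df.at[i,'width']/2) ** 2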
df
# Your answer here (this answer doesn't use a for-loop)
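# Rectangle formula for every row (this is NaN for circles, whose height is NaN)
df['area'] = df['width']*df['height']
# Boolean mask that is True for the circle rows
sel = df['type'] == 'circle'
# Overwrite only the circle rows with the circle formula
df.loc[sel,'area'] = math.pi * (df.loc[sel,'width']/2) ** 2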
df
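Another loop-free option, sketched here as an alternative rather than as the intended answer, is np.where, which picks between two formulas row by row. The table df_demo and the name is_circle are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative table: one rectangle and one circle
df_demo = pd.DataFrame({
    "type": ["rectangle", "circle"],
    "width": [2.0, 4.0],
    "height": [3.0, np.nan],
})

is_circle = df_demo["type"] == "circle"

# Where is_circle is True use the circle formula, otherwise width * height
df_demo["area"] = np.where(
    is_circle,
    np.pi * (df_demo["width"] / 2) ** 2,
    df_demo["width"] * df_demo["height"],
)
```

np.where evaluates both formulas for every row and then selects one per row, so the NaN that width * height produces for circles is computed but never ends up in the area column.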