In-Class Activity - Pandas and data processing

First, let’s import our packages.

import pandas as pd
import numpy as np
import numpy.random as npr
import math

The code below creates a pandas dataframe named df that stores information about several items, each of which is a shape. Each shape has a type (either rectangle or circle) and a width; rectangles also have a height. For circles, the height is stored as NaN (not a number), which pandas treats as missing data, since a circle's height always equals its width. Note that pandas and numpy work well together.

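# Shape type for each of the 10 items
mytype = np.array(['rectangle','circle','rectangle','rectangle','circle','rectangle','circle','rectangle','circle','circle'])
# Random widths and heights, uniformly drawn from [0, 10)
width = npr.rand(len(mytype))*10.
height = npr.rand(len(mytype))*10.
# Circles have no separate height, so mark it as missing (NaN)
height[mytype=='circle']=np.nan
df = pd.DataFrame({"type":mytype, "width":width, "height":height})
df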

Problem 0: Changing entries in the table

Your first task is to manually change some of the entries in the dataframe. For the last ‘rectangle’ in the table, please change its width to 5.0 and its height to 2.0.
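As a hint, here is a minimal sketch of one way to overwrite a single cell, using label-based indexing with .loc. The table toy and its columns kind and value are made up for illustration and are not part of the activity's data.

```python
import pandas as pd

# Hypothetical toy table, unrelated to df
toy = pd.DataFrame({"kind": ["x", "y", "x", "y"],
                    "value": [1.0, 2.0, 3.0, 4.0]})

# Index label of the last row whose kind is "x"
last_x = toy[toy["kind"] == "x"].index[-1]

# .loc takes the row label first, then the column name
toy.loc[last_x, "value"] = 9.5
print(toy)
```

Looking up the row label from the data, rather than typing it in by hand, keeps the code correct if the rows are ever reordered.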

# Your answer here

Problem 1: Dropping rows with missing data, and computing a mean

Using the pandas function dropna, drop any rows that have missing data, that is, a NaN value (in essence, all the circles in the table will be dropped). Save the resulting dataframe to a new variable df2.

Next, compute the average height of the items in df2.
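The two steps can be sketched on a small made-up table. The names toy, toy_clean, kind, and length are hypothetical and chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with one missing value
toy = pd.DataFrame({"kind": ["x", "y", "x"],
                    "length": [2.0, np.nan, 4.0]})

# dropna returns a new dataframe without the rows containing NaN;
# the original toy is left unchanged
toy_clean = toy.dropna()

# Mean of a single column
mean_length = toy_clean["length"].mean()
print(mean_length)  # 3.0
```

Because dropna does not modify the dataframe it is called on by default, its result has to be assigned to a variable to be kept.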

# Your answer here

Problem 2: Computing the area and creating a new column

Forget about df2; let’s go back to the original table df. We want to create a new column of df that lists the area of each shape. Please write code that both creates the column and computes the area of each shape. Remember, the area of a circle with radius \(r\) is \(\pi r^2\). In our case, the width \(w\) of a circle is its diameter, so \(r = w/2\) and the area is \(\pi (w/2)^2\).

Note: if you haven’t read through Chapter 6.10 yet, now is a good time to do so. See especially 6.10.4 on “Selecting”.
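As an illustration of selecting rows with a boolean mask and filling a new column differently for each group, here is a rough sketch on a made-up table. It computes a perimeter rather than an area, and the names toy, kind, side, and perimeter are assumptions for this example only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table of regular shapes with equal side lengths
toy = pd.DataFrame({"kind": ["square", "triangle", "square"],
                    "side": [1.0, 2.0, 3.0]})

# Boolean mask: True for the rows that are squares
is_square = toy["kind"] == "square"

# Create the new column, then fill it one group at a time
toy["perimeter"] = np.nan
toy.loc[is_square, "perimeter"] = 4 * toy.loc[is_square, "side"]
toy.loc[~is_square, "perimeter"] = 3 * toy.loc[~is_square, "side"]
print(toy)
```

The ~ operator inverts the mask, so the second assignment covers every row that the first one skipped.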

# Your answer here