Answers -In Class Activity - More Pandas and data processing¶

First, let’s import our packages.

import pandas as pd
import numpy as np
import numpy.random as npr
import math

Second, we want to read in a csv file in the dataframe df. The file has a list of various faculty members and their phone numbers (don’t worry, the phone numbers are randomly generated. So best not try calling them). We also want to tell df that the phone numbers are actually strings rather than traditional numbers, which we do with the astype method.

df = pd.read_csv('faculty.csv')
df['Phone'] = df['Phone'].astype(str)
df

	Last name	First name	Phone
0	Adolph	Karen	9285162643
1	Amodio	David	3803269359
2	Balcetis	Emily	2204342277
3	Van Bavel	Jay	3252450008
4	Carrasco	Marisa	2092652228
5	Cimpian	Andrei	6966417505
6	Craig	Maureen	7022020569
7	Curtis	Clay	7768322051
8	Dillon	Moira	6209318809
9	Freeman	Jon	4746939756
10	Gollwitzer	Peter	3195372989

Problem 0: Changing the format of a phone number¶

For this problem, we want to convert the phone numbers to a more readable format [ 9285162643 \(\rightarrow\) (928)516-2643 ]. You may have experience doing this sort of thing by hand with Excel, which can be very cumbersome and error-prone. Let’s see how to do this with pandas instead.

Please write code to make the transformation to (ABC)DEF-HIJK format for each phone number in df.

Hint: Write a function convert_phone that converts the format of a single phone number (also, remind yourself about Python list slicing). Then, you can apply that function using the transform operation described in book section 6.12.1.2.

# Your answer goes here

def convert_phone(x):
    assert(len(x)==10)
    return '(' + x[:3] + ')' + x[3:6] + '-' + x[6:]

df['Phone'] = df['Phone'].transform(convert_phone)

Problem 1: Making a new column¶

Using the same dataframe df, make a new column that lists the complete name of each professor. For instance, the new column should be called ‘Complete name’ and the first entry should be the string ‘Karen Adolph’.

Hint: You could make a new function and use the same logic as above. Alternatively, you can also try directly summing the relevant columns.

# Your answer here
df['Complete name'] = df['First name'] + ' ' + df['Last name']
df

	Last name	First name	Phone	Complete name
0	Adolph	Karen	9285162643	Karen Adolph
1	Amodio	David	3803269359	David Amodio
2	Balcetis	Emily	2204342277	Emily Balcetis
3	Van Bavel	Jay	3252450008	Jay Van Bavel
4	Carrasco	Marisa	2092652228	Marisa Carrasco
5	Cimpian	Andrei	6966417505	Andrei Cimpian
6	Craig	Maureen	7022020569	Maureen Craig
7	Curtis	Clay	7768322051	Clay Curtis
8	Dillon	Moira	6209318809	Moira Dillon
9	Freeman	Jon	4746939756	Jon Freeman
10	Gollwitzer	Peter	3195372989	Peter Gollwitzer

Problem 2: Computing the area and creating a new column¶

Let’s create the rectangle and circle dataframe from last week and call it df_shapes.

mytype = np.array(['rectangle','circle','rectangle','rectangle','circle','rectangle','circle','rectangle','circle','circle'])
width = npr.rand(len(mytype))*10.
height = npr.rand(len(mytype))*10.
height[mytype=='circle']=np.nan  
df_shapes = pd.DataFrame({"type":mytype, "width":width, "height":height})
df_shapes

	type	width	height
0	rectangle	0.144597	5.902562
1	circle	0.944694	NaN
2	rectangle	9.350486	3.828103
3	rectangle	7.251338	0.301459
4	circle	1.120220	NaN
5	rectangle	8.768920	8.915431
6	circle	0.582969	NaN
7	rectangle	2.239293	0.090153
8	circle	7.015915	NaN
9	circle	6.086628	NaN

Next, you should compute the mean ‘width’ separately for the rectangles and circles.

Hint: you can use groupby and .mean() from chapter 6.12.

# Your answer here
df_shapes.groupby('type')['width'].mean()

Lab in C&P (Fall 2023)

Answers -In Class Activity - More Pandas and data processing¶

Problem 0: Changing the format of a phone number¶

Problem 1: Making a new column¶

Problem 2: Computing the area and creating a new column¶