# In Class Activity - More Pandas and data processing

First, let’s import our packages.

```
import pandas as pd
import numpy as np
import numpy.random as npr
import math
```

Second, we want to read a CSV file into the dataframe `df`. The file has a list of various faculty members and their phone numbers (don’t worry, the phone numbers are randomly generated, so it’s best not to try calling them). We also want to tell `df` that the phone numbers are actually strings rather than traditional numbers, which we do with the `astype` method.

```
df = pd.read_csv('faculty.csv')
df['Phone'] = df['Phone'].astype(str)
df
```
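As a side note, here is a minimal sketch of an alternative way to get string phone numbers, assuming you would rather set the type while reading the file. The CSV text and the `Name` column below are made up for illustration (they are not the contents of `faculty.csv`); `dtype` is a standard `read_csv` parameter.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = "Name,Phone\nA,0123456789\nB,9285162643\n"

# Default parsing treats the Phone column as integers
as_int = pd.read_csv(io.StringIO(csv_text))

# Asking for strings up front keeps every character exactly as written
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"Phone": str})

print(as_int["Phone"].astype(str).tolist())
print(as_str["Phone"].tolist())
```

The difference only shows up when a number starts with a zero: converting with `astype(str)` after the fact cannot bring back a leading zero that was dropped during integer parsing.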

## Problem 0: Changing the format of a phone number

For this problem, we want to convert the phone numbers to a more readable format [ 9285162643 \(\rightarrow\) (928)516-2643 ]. You may have experience doing this sort of thing by hand with Excel, which can be very cumbersome and error-prone. Let’s see how to do this with pandas instead.

Please write code to make the transformation to (ABC)DEF-HIJK format for each phone number in `df`.

*Hint*: Write a function `convert_phone` that converts the format of a single phone number (also, remind yourself about Python list slicing, which works the same way on strings). Then, you can apply that function to the whole column using the `transform` operation described in book section 6.12.1.2.
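To illustrate the pattern in the hint on a different task, here is a small sketch that uses slicing inside a function and then applies it with `transform`. The `toy` dataframe, the `code` column, and the `add_dash` function are all made up for this example; the phone number conversion itself is still left for you.

```python
import pandas as pd

def add_dash(code):
    """Insert a dash after the first two characters of a string."""
    # code[:2] is the first two characters, code[2:] is everything after them
    return code[:2] + "-" + code[2:]

# Hypothetical data, unrelated to the faculty file
toy = pd.DataFrame({"code": ["AB1234", "CD5678"]})

# transform calls add_dash on each entry and returns a column of the same length
toy["code"] = toy["code"].transform(add_dash)
print(toy)
```

Writing and testing the function on a single string first, before applying it to the whole column, tends to make mistakes easier to spot.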

```
# Your answer goes here
```

## Problem 1: Making a new column

Using the same dataframe `df`, make a new column that lists the complete name of each professor. For instance, the new column should be called ‘Complete name’ and the first entry should be the string ‘Karen Adolph’.

*Hint*: You could make a new function and use the same logic as above. Alternatively, you can also try directly summing the relevant columns.
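As a rough sketch of what "summing" string columns looks like, the example below joins two text columns with the `+` operator. The `toy` dataframe and its `city`, `state`, and `location` columns are made up for illustration and are not part of the faculty data.

```python
import pandas as pd

# Hypothetical data, unrelated to the faculty file
toy = pd.DataFrame({"city": ["Austin", "Boise"], "state": ["TX", "ID"]})

# Adding string columns concatenates them row by row;
# a plain string such as ", " is repeated for every row
toy["location"] = toy["city"] + ", " + toy["state"]
print(toy)
```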

```
# Your answer here
```

## Problem 2: Computing mean by group

Let’s create the rectangle and circle dataframe from last week and call it `df_shapes`.

```
mytype = np.array(['rectangle','circle','rectangle','rectangle','circle','rectangle','circle','rectangle','circle','circle'])
width = npr.rand(len(mytype))*10.
height = npr.rand(len(mytype))*10.
height[mytype=='circle']=np.nan
df_shapes = pd.DataFrame({"type":mytype, "width":width, "height":height})
df_shapes
```

Next, you should compute the mean ‘width’ separately for the rectangles and circles.

*Hint*: you can use `groupby` and `.mean()` from chapter 6.12.
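For reference, here is a minimal sketch of the `groupby` and `.mean()` pattern on a made-up dataframe. The names `toy`, `kind`, and `value` are hypothetical; the shapes problem is still left for you.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in group "b"
toy = pd.DataFrame({
    "kind": ["a", "b", "a", "b"],
    "value": [1.0, 2.0, 3.0, np.nan],
})

# Split the rows by 'kind', pick the 'value' column, and average within each group
means = toy.groupby("kind")["value"].mean()
print(means)
```

Note that `.mean()` skips missing values by default, so group "b" is averaged over its single non-missing entry.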

```
# Your answer here
```