Intro to For loops¶
Note
For the first section of this page, the original content was developed by Lisa Tagliaferri for digitalocean.com released under the Creative Commons Attribution-NonCommercial-ShakeAlike 4.0 International Licence. The text of Lisa’s tutorial had been modified in a minor way by Todd Gureckis in a few sections. Todd added 40 different examples of for loops ranging from simple to more complex. This page is released under the license for this book
Using loops in computer programming allows us to automate and repeat similar tasks multiple times. This is very common in data analysis. In this tutorial, we’ll be covering Python’s for loop.
A for
loop implements the repeated execution of code based on a loop counter or loop variable. This means that for
loops are used most often when the number of repetitions is known before entering the loop, unlike while loops which can run until some condition is met.
For Loops¶
In Python, for
loops are constructed like so:
for [iterating variable] in [sequence]:
[do something]
The something that is being done (known as a code block) will be executed until the sequence is over. The code block itself can consist of any number of lines of code, as long as they are tabbed over once from the left hand side of the code.
Let’s look at a for
loop that iterates through a range of values:
for i in range(0,5):
print(i)
When we run this program, the output looks like this:
for i in range(0,5):
print(i)
0
1
2
3
4
This for
loop sets up i
as its iterating variable, and the sequence exists in the range of 0 to 5.
Then within the loop we print out one integer per loop iteration. Keep in mind that in programming we tend to begin at index 0, so that is why although 5 numbers are printed out, they range from 0-4.
You’ll commonly see and use for
loops when a program needs to repeat a block of code a number of times.
For Loops using range()¶
One of Python’s built-in immutable sequence types is range()
. In loops, range()
is used to control how many times the loop will be repeated.
When working with range()
, you can pass between 1 and 3 integer arguments to it:
start
states the integer value at which the sequence begins, if this is not included then start begins at 0stop
is always required and is the integer that is counted up to but not includedstep
sets how much to increase (or decrease in the case of negative numbers) the next iteration, if this is omitted then step defaults to 1
We’ll look at some examples of passing different arguments to range()
.
First, let’s only pass the stop
argument, so that our sequence set up is range(stop)
:
for i in range(6):
print(i)
In the program above, the stop argument is 6, so the code will iterate from 0-6 (exclusive of 6):
for i in range(6):
print(i)
0
1
2
3
4
5
Next, we’ll look at range(start, stop)
, with values passed for when the iteration should start and for when it should stop:
for i in range(20,25):
print(i)
Here, the range goes from 20 (inclusive) to 25 (exclusive), so the output looks like this:
for i in range(20,25):
print(i)
20
21
22
23
24
The step argument of range()
can be used to skip values within the sequence.
With all three arguments, step
comes in the final position: range(start, stop, step)
. First, let’s use a step
with a positive value:
for i in range(0,15,3):
print(i)
In this case, the for
loop is set up so that the numbers from 0 to 15 print out, but at a step of 3, so that only every third number is printed, like so:
for i in range(0,15,3):
print(i)
0
3
6
9
12
We can also use a negative value for our step
argument to iterate backwards, but we’ll have to adjust our start and stop arguments accordingly:
for i in range(100,0,-10):
print(i)
Here, 100 is the start
value, 0 is the stop
value, and -10 is the range, so the loop begins at 100 and ends at 0, decreasing by 10 with each iteration. We can see this occur in the output:
for i in range(100,0,-10):
print(i)
100
90
80
70
60
50
40
30
20
10
When programming in Python, for loops often make use of the range()
sequence type as its parameters for iteration.
## For Loops using Sequential Data Types
Lists and other data sequence types can also be leveraged as iteration parameters in for loops. Rather than iterating through a range()
, you can define a list and iterate through that list.
We’ll assign a list to a variable, and then iterate through the list:
sharks = ['hammerhead', 'great white', 'dogfish', 'frilled', 'bullhead', 'requiem']
for shark in sharks:
print(shark)
In this case, we are printing out each item in the list. Though we used the variable shark, we could have called the variable any other valid variable name and we would get the same output:
sharks = ['hammerhead', 'great white', 'dogfish', 'frilled', 'bullhead', 'requiem']
for shark in sharks:
print(shark)
hammerhead
great white
dogfish
frilled
bullhead
requiem
The output above shows that the for
loop iterated through the list, and printed each item from the list per line.
Lists and other sequence-based data types like strings and tuples are common to use with loops because they are iterable. You can combine these data types with range() to add items to a list, for example:
sharks = ['hammerhead', 'great white', 'dogfish', 'frilled', 'bullhead', 'requiem']
for item in range(len(sharks)):
sharks.append('shark')
print(sharks)
['hammerhead', 'great white', 'dogfish', 'frilled', 'bullhead', 'requiem', 'shark', 'shark', 'shark', 'shark', 'shark', 'shark']
Here, we have added a placeholder string of ‘shark’ for each item of the length of the sharks list.
You can also use a for
loop to construct a list from scratch:
integers = []
for i in range(10):
integers.append(i)
print(integers)
In this example, the list integers
is initialized as an empty list, but the for loop populates the list like so:
integers = []
for i in range(10):
integers.append(i)
print(integers)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Similarly, we can iterate through strings:
sammy = 'Sammy'
for letter in sammy:
print(letter)
S
a
m
m
y
Iterating through tuples is done in the same format as iterating through lists or strings above.
When iterating through a dictionary, it’s important to keep the key:value
structure in mind to ensure that you are calling the correct element of the dictionary. Here is an example that calls both the key and the value:
sammy_shark = {'name': 'Sammy', 'animal': 'shark', 'color': 'blue', 'location': 'ocean'}
for key in sammy_shark:
print(key + ': ' + sammy_shark[key])
name: Sammy
animal: shark
color: blue
location: ocean
When using dictionaries with for
loops, the iterating variable corresponds to the keys of the dictionary, and dictionary_variable[iterating_variable]
corresponds to the values. In the case above, the iterating variable key was used to stand for key
, and sammy_shark[key]
was used to stand for the values.
Loops are often used to iterate and manipulate sequential data types.
Nested For Loops¶
Loops can be nested in Python, as they can with other programming languages.
A nested loop is a loop that occurs within another loop, structurally similar to nested if statements. These are constructed like so:
for [first iterating variable] in [outer loop]: # Outer loop
[do something] # Optional
for [second iterating variable] in [nested loop]: # Nested loop
[do something]
The program first encounters the outer loop, executing its first iteration. This first iteration triggers the inner, nested loop, which then runs to completion. Then the program returns back to the top of the outer loop, completing the second iteration and again triggering the nested loop. Again, the nested loop runs to completion, and the program returns back to the top of the outer loop until the sequence is complete or a break or other statement disrupts the process.
Let’s implement a nested for
loop so we can take a closer look. In this example, the outer loop will iterate through a list of integers called num_list
, and the inner loop will iterate through a list of strings called alpha_list
.
num_list = [1, 2, 3]
alpha_list = ['a', 'b', 'c']
for number in num_list:
print(number)
for letter in alpha_list:
print(letter)
When we run this program, we’ll receive the following output:
num_list = [1, 2, 3]
alpha_list = ['a', 'b', 'c']
for number in num_list:
print(number)
for letter in alpha_list:
print(letter)
1
a
b
c
2
a
b
c
3
a
b
c
The output illustrates that the program completes the first iteration of the outer loop by printing 1
, which then triggers completion of the inner loop, printing a
,b
, c
consecutively. Once the inner loop has completed, the program returns to the top of the outer loop, prints 2
, then again prints the inner loop in its entirety (a
, b
, c
), etc.
Nested for
loops can be useful for iterating through items within lists composed of lists. In a list composed of lists, if we employ just one for loop, the program will output each internal list as an item:
list_of_lists = [['hammerhead', 'great white', 'dogfish'],[0, 1, 2],[9.9, 8.8, 7.7]]
for list in list_of_lists:
print(list)
['hammerhead', 'great white', 'dogfish']
[0, 1, 2]
[9.9, 8.8, 7.7]
In order to access each individual item of the internal lists, we’ll implement a nested for
loop:
list_of_lists = [['hammerhead', 'great white', 'dogfish'],[0, 1, 2],[9.9, 8.8, 7.7]]
for list in list_of_lists:
for item in list:
print(item)
hammerhead
great white
dogfish
0
1
2
9.9
8.8
7.7
When we utilize a nested for loop we are able to iterate over the individual items contained in the lists.
Conclusion¶
This tutorial went over how for loops work in Python and how to construct them. For loops continue to loop through a block of code provided a certain number of times.
40 Example For Loops¶
In the next section I will provide 40 for loops. Each for loop is a different example of using for loops in cases the often come up in python data analysis.
Simple For Loops¶
Print the same thing 10 times¶
for i in range(10):
print("hi")
hi
hi
hi
hi
hi
hi
hi
hi
hi
hi
Print the numbers 0-9¶
for i in range(10):
print(i)
0
1
2
3
4
5
6
7
8
9
Print the numbers 1-10¶
for i in range(1,11):
print(i)
1
2
3
4
5
6
7
8
9
10
Print only the even numbers 1-10 combining a for loop with a if statement¶
for i in range(1,11):
if i%2==0:
print(i)
2
4
6
8
10
Print the elements of a list¶
students = ['anna','alex','anselm','david','pam','zhiwei','ili','shannon','neil']
for student in students:
print(student)
anna
alex
anselm
david
pam
zhiwei
ili
shannon
neil
Print the elements of a list backwards¶
students = ['anna','alex','anselm','david','pam','zhiwei','ili','shannon','neil']
for student in students[::-1]:
print(student)
neil
shannon
ili
zhiwei
pam
david
anselm
alex
anna
Print the first four elements of a list¶
students = ['anna','alex','anselm','david','pam','zhiwei','ili','shannon','neil']
for student in students[:4]:
print(student)
anna
alex
anselm
david
Print the entire list of students four times¶
Careful with this one! Notice the small difference between the variables students
(plural) and student
(singular).
students = ['anna','alex','anselm','david','pam','zhiwei','ili','shannon','neil']
for student in students[:4]:
print(students)
['anna', 'alex', 'anselm', 'david', 'pam', 'zhiwei', 'ili', 'shannon', 'neil']
['anna', 'alex', 'anselm', 'david', 'pam', 'zhiwei', 'ili', 'shannon', 'neil']
['anna', 'alex', 'anselm', 'david', 'pam', 'zhiwei', 'ili', 'shannon', 'neil']
['anna', 'alex', 'anselm', 'david', 'pam', 'zhiwei', 'ili', 'shannon', 'neil']
Keep a counter (i) of how many times the loop repeated¶
students = ['anna','alex','anselm','david','pam','zhiwei','ili','shannon','neil']
for i, student in enumerate(students):
print(i, student)
0 anna
1 alex
2 anselm
3 david
4 pam
5 zhiwei
6 ili
7 shannon
8 neil
Do something 10 times (make a random number) and don’t use a named variable for iterator¶
This is kind of a python style thing. You can create a variable with a name just _
(underscore). Since you would probably never name a variable something like that anywhere else in the code it can be good in for
loops were you mostly care about repeating the same code a bunch of times and not iterating down the values of a list.
import numpy as np
for _ in range(10):
print(np.random.randn())
1.7457328618712995
-0.4957320427978833
1.7142799341183015
1.4397353198097897
-0.07644382036398782
0.5061570560413113
0.04191190760958118
-1.2484947560962847
-0.8951057795559447
1.8120049924670218
Iterate down two lists at the same time¶
The zip
command takes to lists and combines them element by element. The lists need to be the same length!
firstname = ['anna','alex','anselm','david','pam','zhiwei','ili','shannon','neil']
lastname = ['smith','johnson','alexander','baker','palmeri','zoubok','weng','foster','shields']
for person in zip(firstname, lastname):
print(person)
('anna', 'smith')
('alex', 'johnson')
('anselm', 'alexander')
('david', 'baker')
('pam', 'palmeri')
('zhiwei', 'zoubok')
('ili', 'weng')
('shannon', 'foster')
('neil', 'shields')
Iterating the entries in a dictionary by the key¶
id_cards = {"123": "Anna", "d131": "Alex", "3f32": "Anselm"}
for key in id_cards.keys():
print(key,id_cards[key])
123 Anna
d131 Alex
3f32 Anselm
Iterating the values in a dictionary directly¶
for name in id_cards.values():
print(name)
Anna
Alex
Anselm
Keeping track of a result within a loop¶
All the examples so far have just repeated some simple block of code like printing something out. However, sometimes you want each iteration of the loop to compute something for you and store the result for further analysis. For example here we will square each number in a list and store it in a new list called results. Then we we plot the results.
import matplotlib.pyplot as plt
a = range(10)
results = []
for i in a:
results.append(i**2)
print(results)
plt.plot(a,results)
plt.show()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Print out the contents of a text file¶
This will open a text file on your computer or jupyter hub instance and will print out each line of the file.
myfile = 'something.txt'
with open(myfile, 'r') as f:
for line in f:
print(line)
For Loops Inside of For Loops (Nested)¶
Print out a square using a nested loop¶
Notice here that I used the underscore character for the _i
and _j
iterator variables.
for _i in range(10): # rows
for _j in range(10): # columns
print("* ", end='') # this prevents a new line being printed each time
print() # this only prints the new line at the end of each row
* * * * * * * * * *
* * * * * * * * * *
* * * * * * * * * *
* * * * * * * * * *
* * * * * * * * * *
* * * * * * * * * *
* * * * * * * * * *
* * * * * * * * * *
* * * * * * * * * *
* * * * * * * * * *
Use the counter from the outer loop to change the inner loops¶
for _i in range(10):
for _j in range(_i): # add as many columns as we have experienced rows
print("* ", end='')
print()
*
* *
* * *
* * * *
* * * * *
* * * * * *
* * * * * * *
* * * * * * * *
* * * * * * * * *
Three nested loops? Why not!?¶
The more outer loops make squares of growing sides
for _i in range(10):
for _j in range(_i): # add as many columns as we have experienced rows
for _k in range(_i):
print("* ", end='')
print()
print('----')
----
*
----
* *
* *
----
* * *
* * *
* * *
----
* * * *
* * * *
* * * *
* * * *
----
* * * * *
* * * * *
* * * * *
* * * * *
* * * * *
----
* * * * * *
* * * * * *
* * * * * *
* * * * * *
* * * * * *
* * * * * *
----
* * * * * * *
* * * * * * *
* * * * * * *
* * * * * * *
* * * * * * *
* * * * * * *
* * * * * * *
----
* * * * * * * *
* * * * * * * *
* * * * * * * *
* * * * * * * *
* * * * * * * *
* * * * * * * *
* * * * * * * *
* * * * * * * *
----
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
----
For Loops and Numpy¶
This section looks at the use of for loops in the context of a couple of common numpy
functions.
Print out a range from a numpy array¶
This acts pretty much like the range()
function described above. A np.arange()
just returns a numpy array instead of a list.
import numpy as np
for i in np.arange(0,10):
print(i)
0
1
2
3
4
5
6
7
8
9
Print out 20 steps between 0 and 10¶
This does a normal for loop over a np.linspace()
array. This function returns an numpy array between a start values (0) and end value (10) taking (20) steps. What is nice about this is that it figure how big the steps have to be so that you take two between the start and end value.
import numpy as np
for i in np.linspace(0,10,20):
print(i)
0.0
0.5263157894736842
1.0526315789473684
1.5789473684210527
2.1052631578947367
2.631578947368421
3.1578947368421053
3.6842105263157894
4.2105263157894735
4.7368421052631575
5.263157894736842
5.789473684210526
6.315789473684211
6.842105263157895
7.368421052631579
7.894736842105263
8.421052631578947
8.947368421052632
9.473684210526315
10.0
Using a nested loop to iterated over an array¶
If you have an array of arrays (sometimes called a matrix, although there is a specific matrix type in numpy), you might need to iterate over the elemnts:
x = np.array([[1,2,3],[4,5,6]])
for _row in x:
for _col in _row:
print(_col,end=' ')
print()
1 2 3
4 5 6
Interating over an array using indicies¶
Python is nice because you can iterate in a for loop directly over things in a collection like a list, dictionary or numpy array. However, sometimes you want to iterate by an index.
x = np.array([[1,2,3],[4,5,6]])
rows, cols = x.shape
for i in range(rows):
for j in range(cols):
print(x[i][j],end=' ') # here we are looking up the value in the array using our indicies
print()
1 2 3
4 5 6
For Loops and Pandas¶
This section looks at the use of for loops in the context of a couple of common pandas
data munging operations. For these examples, I am loading a .csv file hosted on the class webpage on salary of professors that we encountered in a previous homework.
Iterating over columns¶
First let’s print out each of the columns in this data frame
import pandas as pd
salary_data = pd.read_csv('http://gureckislab.org/courses/fall19/labincp/data/salary.csv')
for col in salary_data.columns:
print(col)
salary
gender
departm
years
age
publications
Iterating over rows¶
Dataframes have a couple of ways you can iterate over the rows. The best is the .iterrows()
method available on any data frame which is a proper iterator similar to what you get using the enumerate()
function we explored above. If you wanted to print out the entire data frame you can just delete the .head(n=3)
part of the command (i.e., salary_data.iterrows()
.
import pandas as pd
salary_data = pd.read_csv('http://gureckislab.org/courses/fall19/labincp/data/salary.csv')
for index, row in salary_data.head(n=3).iterrows():
print(index, row)
print('---')
0 salary 86285
gender 0
departm bio
years 26
age 64
publications 72
Name: 0, dtype: object
---
1 salary 77125
gender 0
departm bio
years 28
age 58
publications 43
Name: 1, dtype: object
---
2 salary 71922
gender 0
departm bio
years 10
age 38
publications 23
Name: 2, dtype: object
---
Iterating over groups¶
One of the most useful functions of pandas dataframes is the groupby
operation which divides up a larger dataframe into smaller groups based on the value of one or more columns. This is ideal for psychological data analysis because you might want to divide up your data based on trial type, participant number, etc… After you form the groups it is often useful to iterate over the groups to do additional analyses.
import pandas as pd
salary_data = pd.read_csv('http://gureckislab.org/courses/fall19/labincp/data/salary.csv')
grouped = salary_data.groupby("departm")
for name, group in grouped:
print(name)
print('-----')
print(group)
print()
bio
-----
salary gender departm years age publications
0 86285 0 bio 26.0 64.0 72
1 77125 0 bio 28.0 58.0 43
2 71922 0 bio 10.0 38.0 23
3 70499 0 bio 16.0 46.0 64
4 66624 0 bio 11.0 41.0 23
5 64451 0 bio 23.0 60.0 44
6 64366 0 bio 23.0 53.0 22
7 59344 0 bio 5.0 40.0 11
8 58560 0 bio 8.0 38.0 8
9 58294 0 bio 20.0 50.0 12
10 56092 0 bio 2.0 40.0 4
11 54452 0 bio 13.0 43.0 7
12 54269 0 bio 26.0 56.0 12
13 55125 0 bio 8.0 38.0 9
68 59139 1 bio 8.0 38.0 23
69 52968 1 bio 18.0 48.0 32
chem
-----
salary gender departm years age publications
14 97630 0 chem 34.0 64.0 43
15 82444 0 chem 31.0 61.0 42
16 76291 0 chem 29.0 65.0 33
17 75382 0 chem 26.0 56.0 39
18 64762 0 chem 25.0 NaN 29
19 62607 0 chem 20.0 45.0 34
20 60373 0 chem 26.0 56.0 43
21 58892 0 chem 18.0 48.0 21
22 47021 0 chem 4.0 34.0 12
23 44687 0 chem 4.0 34.0 19
70 55949 1 chem 4.0 34.0 12
geol
-----
salary gender departm years age publications
24 104828 0 geol NaN 50.0 44
25 71456 0 geol 11.0 41.0 32
26 65144 0 geol 7.0 37.0 12
27 52766 0 geol 4.0 38.0 32
math
-----
salary gender departm years age publications
62 82142 0 math 9.0 39.0 9
63 70509 0 math 23.0 53.0 7
64 60320 0 math 14.0 44.0 7
65 55814 0 math 8.0 38.0 6
66 53638 0 math 4.0 42.0 8
67 53517 2 math 5.0 35.0 5
75 61885 1 math 23.0 60.0 9
76 49542 1 math 3.0 33.0 5
neuro
-----
salary gender departm years age publications
28 112800 0 neuro 14.0 44.0 33
29 105761 0 neuro 9.0 39.0 30
30 92951 0 neuro 11.0 41.0 20
31 86621 0 neuro 19.0 49.0 10
32 85569 0 neuro 20.0 46.0 35
33 83896 0 neuro 10.0 40.0 22
34 79735 0 neuro 11.0 41.0 32
35 71518 0 neuro 7.0 37.0 34
36 68029 0 neuro 15.0 45.0 33
37 66482 0 neuro 14.0 44.0 42
38 61680 0 neuro 18.0 48.0 20
39 60455 0 neuro 8.0 38.0 49
40 58932 0 neuro 11.0 41.0 49
71 58893 1 neuro 10.0 35.0 4
72 53662 1 neuro 1.0 31.0 3
physics
-----
salary gender departm years age publications
54 96936 0 physics 15.0 50.0 17
55 83216 0 physics 11.0 37.0 19
56 72044 0 physics 2.0 32.0 16
57 64048 0 physics 23.0 53.0 4
58 58888 0 physics 26.0 56.0 7
59 58744 0 physics 20.0 50.0 9
60 55944 0 physics 21.0 51.0 8
61 54076 0 physics 19.0 49.0 12
stat
-----
salary gender departm years age publications
41 106412 0 stat 23.0 53.0 29
42 86980 0 stat 23.0 53.0 42
43 78114 0 stat 8.0 38.0 24
44 74085 0 stat 11.0 41.0 33
45 72250 0 stat 26.0 56.0 9
46 69596 0 stat 20.0 50.0 18
47 65285 0 stat 20.0 50.0 15
48 62557 0 stat 28.0 58.0 14
49 61947 0 stat 22.0 58.0 17
50 58565 0 stat 29.0 59.0 11
51 58365 0 stat 18.0 48.0 21
52 53656 0 stat 2.0 32.0 4
53 51391 0 stat 5.0 35.0 8
73 57185 1 stat 9.0 39.0 7
74 52254 1 stat 2.0 32.0 9
Iterating over muliple groups¶
You can group not just on a single column but combinations of multiple combinations.
import pandas as pd
salary_data = pd.read_csv('http://gureckislab.org/courses/fall19/labincp/data/salary.csv')
grouped = salary_data.groupby(["departm","gender"])
for name, group in grouped:
print(name)
print('-----')
print(group)
print()
('bio', 0)
-----
salary gender departm years age publications
0 86285 0 bio 26.0 64.0 72
1 77125 0 bio 28.0 58.0 43
2 71922 0 bio 10.0 38.0 23
3 70499 0 bio 16.0 46.0 64
4 66624 0 bio 11.0 41.0 23
5 64451 0 bio 23.0 60.0 44
6 64366 0 bio 23.0 53.0 22
7 59344 0 bio 5.0 40.0 11
8 58560 0 bio 8.0 38.0 8
9 58294 0 bio 20.0 50.0 12
10 56092 0 bio 2.0 40.0 4
11 54452 0 bio 13.0 43.0 7
12 54269 0 bio 26.0 56.0 12
13 55125 0 bio 8.0 38.0 9
('bio', 1)
-----
salary gender departm years age publications
68 59139 1 bio 8.0 38.0 23
69 52968 1 bio 18.0 48.0 32
('chem', 0)
-----
salary gender departm years age publications
14 97630 0 chem 34.0 64.0 43
15 82444 0 chem 31.0 61.0 42
16 76291 0 chem 29.0 65.0 33
17 75382 0 chem 26.0 56.0 39
18 64762 0 chem 25.0 NaN 29
19 62607 0 chem 20.0 45.0 34
20 60373 0 chem 26.0 56.0 43
21 58892 0 chem 18.0 48.0 21
22 47021 0 chem 4.0 34.0 12
23 44687 0 chem 4.0 34.0 19
('chem', 1)
-----
salary gender departm years age publications
70 55949 1 chem 4.0 34.0 12
('geol', 0)
-----
salary gender departm years age publications
24 104828 0 geol NaN 50.0 44
25 71456 0 geol 11.0 41.0 32
26 65144 0 geol 7.0 37.0 12
27 52766 0 geol 4.0 38.0 32
('math', 0)
-----
salary gender departm years age publications
62 82142 0 math 9.0 39.0 9
63 70509 0 math 23.0 53.0 7
64 60320 0 math 14.0 44.0 7
65 55814 0 math 8.0 38.0 6
66 53638 0 math 4.0 42.0 8
('math', 1)
-----
salary gender departm years age publications
75 61885 1 math 23.0 60.0 9
76 49542 1 math 3.0 33.0 5
('math', 2)
-----
salary gender departm years age publications
67 53517 2 math 5.0 35.0 5
('neuro', 0)
-----
salary gender departm years age publications
28 112800 0 neuro 14.0 44.0 33
29 105761 0 neuro 9.0 39.0 30
30 92951 0 neuro 11.0 41.0 20
31 86621 0 neuro 19.0 49.0 10
32 85569 0 neuro 20.0 46.0 35
33 83896 0 neuro 10.0 40.0 22
34 79735 0 neuro 11.0 41.0 32
35 71518 0 neuro 7.0 37.0 34
36 68029 0 neuro 15.0 45.0 33
37 66482 0 neuro 14.0 44.0 42
38 61680 0 neuro 18.0 48.0 20
39 60455 0 neuro 8.0 38.0 49
40 58932 0 neuro 11.0 41.0 49
('neuro', 1)
-----
salary gender departm years age publications
71 58893 1 neuro 10.0 35.0 4
72 53662 1 neuro 1.0 31.0 3
('physics', 0)
-----
salary gender departm years age publications
54 96936 0 physics 15.0 50.0 17
55 83216 0 physics 11.0 37.0 19
56 72044 0 physics 2.0 32.0 16
57 64048 0 physics 23.0 53.0 4
58 58888 0 physics 26.0 56.0 7
59 58744 0 physics 20.0 50.0 9
60 55944 0 physics 21.0 51.0 8
61 54076 0 physics 19.0 49.0 12
('stat', 0)
-----
salary gender departm years age publications
41 106412 0 stat 23.0 53.0 29
42 86980 0 stat 23.0 53.0 42
43 78114 0 stat 8.0 38.0 24
44 74085 0 stat 11.0 41.0 33
45 72250 0 stat 26.0 56.0 9
46 69596 0 stat 20.0 50.0 18
47 65285 0 stat 20.0 50.0 15
48 62557 0 stat 28.0 58.0 14
49 61947 0 stat 22.0 58.0 17
50 58565 0 stat 29.0 59.0 11
51 58365 0 stat 18.0 48.0 21
52 53656 0 stat 2.0 32.0 4
53 51391 0 stat 5.0 35.0 8
('stat', 1)
-----
salary gender departm years age publications
73 57185 1 stat 9.0 39.0 7
74 52254 1 stat 2.0 32.0 9
Reading in an entire directory of files¶
Sometimes you need to read in an process individual files in a folder. This code snippet for instance reads all the .csv files in a particular folder into a pandas dataframe and concatenates them.
import pandas as pd
data_path = './myfile/'
files = os.listdir(data_path)
frames = []
for data_file in files:
if data_file[-3:] == 'csv':
df = pd.read_csv(data_path+data_file)
frames.append(df)
alldata_df = pd.concat(frames)
For Loops and Seaborn¶
This section looks at the use of for loops in the context of plotting with seaborn
.
Plotting to subpanels of a matplotlib figure with a for loop¶
fig,ax = plt.subplots(7,3,figsize=(12,24))
ax = ax.ravel()
for i,s in enumerate(subs):
part_df=all_df[all_df.participant==s]
p1=sns.regplot(x='angle',y='trialResp.rt',data=part_df,ax=ax[i])
plt.show()