{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intro to Python for Psychology Undergrads" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{note}\n", "This chapter, authored by [Todd M. Gureckis](http://gureckislab.org/~gureckis), is released under the [license](/LICENSE.html) for the book. The section on for loops was developed by [Lisa Tagliaferri](https://twitter.com/lisaironcutter) for digitalocean.com and released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This document is targeted toward psych undergrads in our Lab in Cognition and Perception course at NYU, but it could be useful for anyone learning Python for the purpose of data analysis.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video Lecture\n", "\n", "This video provides a complementary overview of this chapter. Some things are covered in the chapter but not in the video, and vice versa. Together they give an overview of this unit, so please both read and watch.\n", "\n", "
\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "In this class, we will use Python as our analysis and programming language. Teaching the full scope of programming in Python is beyond what this class can cover, especially if you have no prior programming experience. However, this chapter aims to give you enough to handle most of the types of data analysis we will be doing in this lab course. There are many additional tutorials and learning resources on the [class homepage](../../tips/pythonresources.html). \n", "\n", "In addition, both the university in general and the department offer courses on introductory programming using Python. So don't let this class be your only or last exposure to these tools. Consider it the beginning of learning!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{warning}\n", "This document by itself is by no means a complete introduction to programming or Python. Instead, I'm trying to give you the best coverage of the things you will encounter in this class!\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is Python?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python is a popular programming language that is often considered easy to learn, flexible, and (importantly) free to use on most computers. Python is used extensively in science and in industry. One reason is that Python has a very strong set of add-on libraries that let you use it for all kinds of tasks, including data analysis, statistical modeling, developing web applications, running experiments, programming video games, making desktop apps, and programming robots. \n", "\n", "
\n", "You can learn more about the language on [python.org](https://www.python.org/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This chapter gives you an overview of most of the Python language features you can expect to encounter in this course. Bookmark it as a quick reminder of the most important use cases; when you are just starting out, though, it makes sense to step through these elements one by one. \n", "\n", "> This chapter will be distributed to your JupyterHub compute instance. You are encouraged to run each Jupyter cell one by one, trying to read and understand the output. I also encourage you to try changing cells and playing with slightly different inputs. There's no way to permanently break anything, and you can learn a lot by trying variations of things as well as by making mistakes!\n", "\n", "The chapter is divided into subsections, each reviewing a basic feature of Python:\n", "\n", "\n", "- [Comments](#comments)\n", "- [Calling Functions](#calling-functions)\n", "- [Using Python as a calculator](#using-python-as-a-calculator)\n", "- [Variables](#variables)\n", "- [Messing with text (i.e., strings)](#messing-with-text-i-e-strings)\n", "- [Collections](#collections)\n", "    - [Lists](#lists)\n", "    - [Dictionaries](#dictionaries)\n", "    - [Sets](#sets)\n", "- [Flow Control](#flow-control)\n", "    - [Testing if things are true](#testing-if-things-are-true)\n", "    - [Conditionals (i.e., if-then-else)](#conditionals-if-then-else)\n", "    - [For loops](#for-loops)\n", "- [Writing New Functions](#writing-new-functions)\n", "- [Importing additional functionality via libraries](#importing-additional-functionality)\n", "- [Dealing with error messages](#dealing-with-error-messages)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comments in Python start with the hash character, `#`, and extend to the end of the physical line. 
A comment may appear at the start of a line or following whitespace or code, but not within a string literal. A hash character within a string literal is just a hash character. Since comments are meant to clarify code and are ignored by Python, you may omit them when typing in examples.\n", "\n", "Some examples:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# This is not a comment because it's inside quotes.\n" ] } ], "source": [ "# this is the first comment\n", "spam = 1 # and this is the second comment\n", " # ... and now a third!\n", "text = \"# This is not a comment because it's inside quotes.\"\n", "print(text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calling functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A **function** in Python is a piece of code, often made up of several instructions, which runs when it is referenced or \"called\". You may also hear functions called **procedures**; a **method** is a function that belongs to a particular kind of object (like the string method `upper()` you will see later). Python provides many built-in functions (like `print()`, used in the last section), but it also lets you create your own custom functions, something we'll discuss more later.\n", "\n", "Functions have a few key elements: \n", "- A **name**, which is how you specify the function you want to use\n", "- A pair of opening and closing parentheses `()` following the name\n", "- Optionally, one or more **arguments** (also called parameters), which are placed inside the parentheses and separated by commas\n", "\n", "Here is a little schematic of how it looks:\n", "\n", "
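To make the schematic concrete, here is a small sketch of calls to a few built-in functions with zero, one, and several arguments. The particular functions used (`print()`, `abs()`, `max()`, and `round()`) are just convenient examples chosen for illustration; the point is the shape of each call: a name, then parentheses, with any arguments separated by commas inside them.

```python
# No arguments: the parentheses are still required
print()

# One argument inside the parentheses
# (here the result of abs() is itself passed as the argument to print())
print(abs(-3))

# Several arguments, separated by commas
print(max(4, 9, 2))       # the largest of the three numbers
print(round(3.14159, 2))  # round 3.14159 to 2 decimal places
```

Running this prints a blank line, then 3, 9, and 3.14.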
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We already saw one function that you will use a lot: `print()`. The `print()` function prints out the value of whatever you provide as arguments. For example:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "a = 1\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This code defines a variable (see below) named `a` and then prints the value of `a`. Notice how Jupyter displays the word `print` in a special color (usually green). The `print()` function is itself a set of lower-level commands that determine how to print things out. \n", "\n", "Here's another built-in function, `abs()`, which computes the absolute value of a number:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "abs(2)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "abs(-1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, notice the syntax here: the name of the function, followed by its arguments inside a matched pair of parentheses `()`. It is important that the parentheses are matched. 
If you forget the closing parenthesis (or type the wrong character in its place), you'll get an error:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "ename": "SyntaxError", "evalue": "unexpected EOF while parsing (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m abs(1\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m unexpected EOF while parsing\n" ] } ], "source": [ "abs(1" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "ename": "SyntaxError", "evalue": "unexpected EOF while parsing (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m abs(-1(\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m unexpected EOF while parsing\n" ] } ], "source": [ "abs(-1(" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You'll learn more about writing your own functions, as well as importing other special, powerful functions, later. For now, I just want you to be able to pick out when a function is being used and what that looks like in the code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Python as a Calculator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "The Python interpreter can act as a simple calculator: type an expression and it outputs the value.\n", "\n", "Expression syntax is straightforward: the operators `+`, `-`, `*` and `/` work just like in most other programming languages (such as Pascal or C), and parentheses (`()`) can be used for grouping to control the order of operations. For example:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 + 2" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "20" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "50 - 5*6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As in ordinary math, multiplication and division are done before addition and subtraction, so you often have to use parentheses to enforce a different order of operations:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5.0" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(50 - 5*6) / 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In most programming languages, there are multiple \"types\" of numbers. This is because the internals of the computer deal with numbers differently depending on whether or not they have decimals. Specifically, the two main types of numbers in Python are integers (`int`; numbers without a decimal point) and floating-point numbers (`float`; numbers written with a decimal point, like `2.0` or `1.234`). You can check the type of a number using the `type()` function." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "int" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(2)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(2.0)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(2.)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(1.234)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Things can get tricky when a calculation changes the type of a number. For instance, dividing two integers with `/` always produces a `float`, even when the result is a whole number (e.g., `4 / 2` gives `2.0`):" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.6" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "8 / 5 # Division always returns a floating point number." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(int, int, float)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(8), type(5), type(8/5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To do what is known as floor division and get an integer result (rounding down, which for positive numbers simply discards the fractional part), you can use the `//` operator; to calculate the remainder, you can use `%`:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5.666666666666667" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "17 / 3 # Classic division returns a float." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "17 // 3 # Floor division discards the fractional part." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "17 % 3 # The % operator returns the remainder of the division." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This last operation (`%`, often called \"mod\" or modulo) is common and useful. 
For example, you can use it to create a repeating cycle as you step through a list of numbers:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 0\n", "1 1\n", "2 2\n", "3 0\n", "4 1\n", "5 2\n", "6 0\n", "7 1\n", "8 2\n", "9 0\n", "10 1\n" ] } ], "source": [ "numbers = [0,1,2,3,4,5,6,7,8,9,10]\n", "for i in numbers:\n", "    print(i, i % 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although we haven't talked about for loops yet (they're coming below), the important thing to notice here is that as we step through the numbers from 0 to 10, the second column only ever takes values between 0 and 2. Thus, taking an integer `%` (mod) some number `n` produces a repeating cycle of integers from 0 to `n - 1`. Try changing the divisor to 5 instead of 3 and see what changes!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another calculator-type operation is the `**` operator, which calculates powers:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "25" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "5 ** 2 # 5 squared" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "128" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 ** 7 # 2 to the power of 7" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to `int` and `float`, Python supports other types of numbers, such as [`Decimal`](https://docs.python.org/3.5/library/decimal.html#decimal.Decimal) and [`Fraction`](https://docs.python.org/3.5/library/fractions.html#fractions.Fraction). Python also has built-in support for [complex numbers](https://docs.python.org/3.5/library/stdtypes.html#typesnumeric), and uses the `j` or `J` suffix to indicate the imaginary part (e.g., `3+5j`)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the most important concepts in programming, and one feature that makes it really useful, is the ability to create **variables** that refer to numbers and other data. Variables are named entities that refer to data stored inside the program. We can assign a value to a variable in order to save a result or use it later. \n", "\n", "I like to think of variables as buckets.\n", "\n", "
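Here is a minimal sketch of the bucket idea, using a made-up variable name (`score`) purely for illustration. Assignment puts a value into the bucket, and assigning to the same name again replaces the old contents rather than adding to them.

```python
# Put the value 10 into a bucket labeled score
score = 10
print(score)

# Assigning to the same name again replaces what was in the bucket
score = 15
print(score)

# The contents of one bucket can be used to fill another
bonus = score + 5
print(bonus)
```

This prints 10, then 15, then 20.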
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We write the name of the variable on the outside of the bucket and put something in the bucket using assignment.\n", "\n", "Let's look at an example. We can create variables named `width` and `height`. The equal sign (`=`) assigns a value to a variable:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "area = 9000\n" ] } ], "source": [ "width = 20\n", "height = 5 * 90\n", "area = width * height\n", "print(\"area =\", area)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Conveniently, in Python a variable is created the first time we assign something to it. So executing `width = 20` created a variable named `width` in the computer's memory, containing the `int` value 20. Similarly, we created a variable called `height` containing the value of `5 * 90` (that is, 450). Then we created a variable `area` holding the product of `width` and `height`. One nice feature of this is that the code is fairly \"readable\". By giving names to the numbers, we give them *meaning*: we can tell that this calculation computes the area of a rectangle, and which number refers to which dimension." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can make up almost any name for a variable, but there are a few simple rules. The rules for the names of variables in Python are:\n", "- A variable name must start with a letter or the underscore character (e.g., `_width`)\n", "- A variable name cannot start with a number\n", "- A variable name can only contain alpha-numeric characters and underscores (A-Z, a-z, 0-9, and _)\n", "- Variable names are case-sensitive (`age`, `Age`, and `AGE` are three different variables)\n", "- A variable name cannot be one of Python's reserved keywords (such as `for`, `if`, or `def`)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Accessing the value of an undefined variable will cause an error. 
For instance, we have not yet defined `n`, so asking Jupyter to output its value here will not work:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'n' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mn\u001b[0m \u001b[0;31m# Try to access an undefined variable.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mNameError\u001b[0m: name 'n' is not defined" ] } ], "source": [ "n # Try to access an undefined variable." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can get a list of all the currently defined variables in our Jupyter kernel using the `%whos` command, which is a special feature of Jupyter and not part of the core Python language:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Variable Type Data/Info\n", "------------------------------\n", "area int 9000\n", "autopep8 module te-packages/autopep8.py'>\n", "height int 450\n", "i int 10\n", "json module hon3.7/json/__init__.py'>\n", "numbers list n=11\n", "spam int 1\n", "text str # This is not a comment b<...>cause it's inside quotes.\n", "width int 20\n" ] } ], "source": [ "%whos" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might see a number of other variables in this list, but it should include the variables `area`, `width`, and `height` that we defined at the start of this subsection.\n", "\n", "Variables can hold all types of information, not just a single number, as we will see. 
In fact, when we get into data analysis, we will read entire datasets into variables, and our graphs, statistical tests, and results will all be stored in variables. So it is good to understand the concept of variables and the rules for naming them.\n", "\n", "Variables can also be used temporarily to move things around. For instance, let's define variables `x` and `y`." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "x = 2\n", "y = 7" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We would like to swap the values, so that `x` has the value of `y` and `y` has the value of `x`. To do this, we need to stay organized: if we just assign the value of `y` directly to `x`, the original value of `x` is overwritten and lost. So instead we will create a third \"temporary\" variable to hold it while we swap:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x = 7\n", "y = 2\n" ] } ], "source": [ "tmp = x\n", "x = y\n", "y = tmp\n", "print(\"x =\", x)\n", "print(\"y =\", y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To save space in Python, you can define multiple variables at once on the same line:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "width = 10\n", "height = 20\n" ] } ], "source": [ "width, height = 10, 20\n", "print(\"width =\", width)\n", "print(\"height =\", height)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This compact notation can even be used to swap variables without needing a temporary variable:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x = 2\n", "y = 1\n" ] } ], "source": [ "x, y = 1, 2\n", "x, y = y, x\n", "print(\"x =\", x)\n", "print(\"y =\", y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Messing with text (i.e., strings)" ] 
}, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Besides numbers, Python can also manipulate strings, which are pieces of text. Strings can be enclosed in single quotes (`'...'`) or double quotes (`\"...\"`) with the same result. Use `\\` to escape quotes, that is, to use a quote within the string itself:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'spam eggs'" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'spam eggs' # Single quotes." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"doesn't\"" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'doesn\\'t' # Use \\' to escape the single quote..." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"doesn't\"" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"doesn't\" # ...or use double quotes instead." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the interactive interpreter and Jupyter notebooks, the output string is enclosed in quotes and special characters are escaped with backslashes. Although this output sometimes looks different from the input (the enclosing quotes could change), the two strings are equivalent. The string is enclosed in double quotes if the string contains a single quote and no double quotes; otherwise, it's enclosed in single quotes. 
The [`print()`](https://docs.python.org/3.6/library/functions.html#print) function produces a more readable output by omitting the enclosing quotes and by printing escaped and special characters:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'\"Isn\\'t,\" she said.'" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'\"Isn\\'t,\" she said.'" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\"Isn't,\" she said.\n" ] } ], "source": [ "print('\"Isn\\'t,\" she said.')" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'First line.\\nSecond line.'" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = 'First line.\\nSecond line.' # \\n means newline.\n", "s # Without print(), \\n is included in the output." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First line.\n", "Second line.\n" ] } ], "source": [ "print(s) # With print(), \\n produces a new line." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "String literals can span multiple lines and are delineated by triple-quotes: `\"\"\"...\"\"\"` or `'''...'''`. End of lines are automatically included in the string, but it's possible to prevent this by adding a `\\` at the end of the line. 
For example, without a `\\`, the following example includes an extra line at the beginning of the output:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Usage: thingy [OPTIONS]\n", " -h Display this usage message\n", " -H hostname Hostname to connect to\n", "\n" ] } ], "source": [ "print(\"\"\"\n", "Usage: thingy [OPTIONS]\n", " -h Display this usage message\n", " -H hostname Hostname to connect to\n", "\"\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Strings can be *concatenated* (glued together) with the `+` operator, and repeated with `*`:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'unununium'" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 3 times 'un', followed by 'ium'\n", "3 * 'un' + 'ium'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To concatenate variables, or a variable and a literal, use `+`:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Python'" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prefix = 'Py'\n", "prefix + 'thon'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Strings can be *indexed* (subscripted), with the first character having index 0. There is no separate character type; a character is simply a string of size one:" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'P'" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word = 'Python'\n", "word[0] # Character in position 0." 
] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'n'" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word[5] # Character in position 5." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indices may also be negative numbers, which count from the end of the string. Note that because -0 is the same as 0, negative indices start from -1:" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'n'" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word[-1] # Last character." ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'o'" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word[-2] # Second-last character." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to indexing, which extracts individual characters, Python also supports *slicing*, which extracts a substring. To slice, you indicate a *range* in the format `start:end`, where the start position is included but the end position is excluded:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Py'" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word[0:2] # Characters from position 0 (included) to 2 (excluded)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you omit either position, the default start position is 0 and the default end is the length of the string:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Py'" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word[:2] # Characters from the beginning to position 2 (excluded)." 
] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'on'" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word[4:] # Characters from position 4 (included) to the end." ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'on'" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word[-2:] # Characters from the second-last (included) to the end." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the start is always included and the end always excluded, `s[:i] + s[i:]` is always equal to `s`:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Python'" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word[:2] + word[2:]" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Python'" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word[:4] + word[4:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of *n* characters has index *n*. For example:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n",
    " +---+---+---+---+---+---+\n",
    " | P | y | t | h | o | n |\n",
    " +---+---+---+---+---+---+\n",
    " 0   1   2   3   4   5   6\n",
    "-6  -5  -4  -3  -2  -1\n",
    "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first row of numbers gives the position of the indices 0...6 in the string; the second row gives the corresponding negative indices. The slice from *i* to *j* consists of all characters between the edges labeled *i* and *j*, respectively.\n", "\n", "For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of `word[1:3]` is 2.\n", "\n", "Attempting to use an index that is too large results in an error:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "ename": "IndexError", "evalue": "string index out of range", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mword\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m42\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;31m# The word only has 6 characters.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mIndexError\u001b[0m: string index out of range" ] } ], "source": [ "word[42] # The word only has 6 characters." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python strings are [immutable](https://docs.python.org/3.5/glossary.html#term-immutable), which means they cannot be changed. 
Therefore, assigning a value to an indexed position in a string results in an error:" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "'str' object does not support item assignment", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mword\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'J'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 'str' object does not support item assignment" ] } ], "source": [ "word[0] = 'J'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The built-in function [`len()`](https://docs.python.org/3.5/library/functions.html#len) returns the length of a string:" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "34" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = 'supercalifragilisticexpialidocious'\n", "len(s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A couple of other useful things for strings are changing to upper or lower case:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'PYTHON'" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = 'Python'\n", "s.upper()" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'python'" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.lower()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also test 
whether one string appears inside a larger string (more on conditionals later):" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'thon' in 'Python'" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'todd' in 'Python'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can reverse a string by slicing it with a step of `-1`:" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'nohtyP'" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[::-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can split a string into parts wherever a particular character appears:" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['milk', ' eggs', ' chocolate', ' ice cream', ' bananas', ' cereal', ' coffee']" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = 'milk, eggs, chocolate, ice cream, bananas, cereal, coffee'\n", "s.split(',')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `split()` call above cut the string into a smaller string each time it encountered a comma. Notice that the spaces after the commas are kept (e.g., `' eggs'`); splitting on `', '` instead would drop them. The result is a list... speaking of which!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Case examples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, so maybe you are thinking: why do I need to program with strings? Here are a couple of use cases where strings are useful." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, if you want to load a file from the internet, you usually use a string to represent its address. 
This code snippet loads the welcome screen for the [psiturk](http://psiturk.org) package. Notice how a string is used to represent the url (i.e., the thing starting 'https')." ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "https://psiturk.org\n", " ______ ______ __ ______ __ __ ______ __ __ \n", "/\\ == \\ /\\ ___\\ /\\ \\ /\\__ _\\ /\\ \\/\\ \\ /\\ == \\ /\\ \\/ / \n", "\\ \\ _-/ \\ \\___ \\ \\ \\ \\ \\/_/\\ \\/ \\ \\ \\_\\ \\ \\ \\ __< \\ \\ _\"-. \n", " \\ \\_\\ \\/\\_____\\ \\ \\_\\ \\ \\_\\ \\ \\_____\\ \\ \\_\\ \\_\\ \\ \\_\\ \\_\\ \n", " \\/_/ \\/_____/ \\/_/ \\/_/ \\/_____/ \\/_/ /_/ \\/_/\\/_/ \n", " \n", " an open platform for science on Amazon Mechanical Turk\n", " \t\t\t\t\t\t\t\t\t\n", "--------------------------------------------------------------------\n", "System status:\n", "\n", "Hi all, You need to be running psiTurk version >= 2.0.0 to use the \n", "Ad Server feature! \n", "\n", "The latest stable version is 2.3.0.\n", "\n", "**ALERT**: Due to a recent Amazon API deprecation, you need to install psiturk version 2.3.0 or later.\n", "\n", "Check https://github.com/NYUCCL/psiTurk or https://psiturk.org for \n", "latest info.\n", "\n" ] } ], "source": [ "import urllib.request\n", "import json\n", "\n", "for line in urllib.request.urlopen(\"https://api.psiturk.org/status_msg?version=2.3\"):\n", " message = json.loads(line.decode('utf-8'))\n", " print(message['status'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Collections" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Everything we have considered so far is mostly single elements (numbers, strings). However, we often also need to deal with collections of numbers and things (actually a string is pretty much a collection of individual characters, but...). 
There are three built-in types of collections that are useful to know about for this class: lists, dictionaries, and sets." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Python knows a number of _compound_ data types, which are used to group together other values. The most versatile is the [*list*](https://docs.python.org/3.5/library/stdtypes.html#typesseq-list), which can be written as a sequence of comma-separated values (items) between square brackets. Lists might contain items of different types, but usually the items all have the same type." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is an empty list that contains nothing:" ] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squares = []\n", "squares" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a list of numbers:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 4, 9, 16, 25]" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squares = [1, 4, 9, 16, 25]\n", "squares" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and here are two more lists, one containing only strings and one mixing numbers and strings:" ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['one', 'four', 'nine', 'sixteen', 'twentyfive']" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squares_string = [\"one\", \"four\", \"nine\", \"sixteen\", \"twentyfive\"]\n", "squares_string" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['one', 4, 9, 'sixteen', 25]" ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squares_mixed = [\"one\", 4, 9, \"sixteen\", 25]\n", "squares_mixed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Like strings (and all other built-in 
[sequence](https://docs.python.org/3.5/glossary.html#term-sequence) types), lists can be indexed and sliced:" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squares[0] # Indexing returns the item." ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "25" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squares[-1]" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[9, 16, 25]" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squares[-3:] # Slicing returns a new list." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All slice operations return a new list containing the requested elements. This means that the following slice returns a new (shallow) copy of the list:" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 4, 9, 16, 25]" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squares[:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists also support concatenation with the `+` operator:" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squares + [36, 49, 64, 81, 100]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unlike strings, which are [immutable](https://docs.python.org/3.5/glossary.html#term-immutable), lists are a [mutable](https://docs.python.org/3.5/glossary.html#term-mutable) type, which means you can change any value in the list:" ] }, { "cell_type": "code", 
"execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "64" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cubes = [1, 8, 27, 65, 125] # Something's wrong here ...\n", "4 ** 3 # the cube of 4 is 64, not 65!" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 8, 27, 64, 125]" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cubes[3] = 64 # Replace the wrong value.\n", "cubes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the list's `append()` method to add new items to the end of the list:" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 8, 27, 64, 125, 216, 343]" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cubes.append(216) # Add the cube of 6 ...\n", "cubes.append(7 ** 3) # and the cube of 7.\n", "cubes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can even assign to slices, which can change the size of the list or clear it entirely:" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['a', 'b', 'c', 'd', 'e', 'f', 'g']" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']\n", "letters" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['a', 'b', 'C', 'D', 'E', 'f', 'g']" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Replace some values.\n", "letters[2:5] = ['C', 'D', 'E']\n", "letters" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['a', 'b', 'f', 'g']" ] }, "execution_count": 97, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "# Now remove them.\n", "letters[2:5] = []\n", "letters" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Clear the list by replacing all the elements with an empty list.\n", "letters[:] = []\n", "letters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The built-in [`len()`](https://docs.python.org/3.5/library/functions.html#len) function also applies to lists:" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "letters = ['a', 'b', 'c', 'd']\n", "len(letters)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can nest lists, which means to create lists that contain other lists. For example:" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['a', 'b', 'c'], [1, 2, 3]]" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = ['a', 'b', 'c']\n", "n = [1, 2, 3]\n", "x = [a, n]\n", "x" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['a', 'b', 'c']" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0]" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'b'" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0][1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can create lists from scratch using a number of methods. 
For example, to create a list of the numbers from 0 up to (but not including) 10, you can use the `range()` function. `range()` does not build a list right away; instead, it produces the values one at a time as you step through it (see [iterators](https://wiki.python.org/moin/Iterator)), so here we wrap it in `list()` to collect them into a list:" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(range(10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also create lists by repeating a list many times with `*`:" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]" ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[1]*10" ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[0]*10" ] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]" ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[1,2]*10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dictionaries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dictionaries are collections of `key`-`value` pairs. One easy way to understand the difference between lists and dictionaries is that in a list you \"look up\" an entry using its integer position (e.g., `squares[1]`). In a dictionary, you look up values using a key, which can be a string, a number, or another Python object (technically, any object that is hashable)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we can create an empty dictionary that contains nothing:" ] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [], "source": [ "person = {}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we can initialize a dictionary with some key-value pairs. The pairs are separated by commas (as in a list), and within each pair the key and the value are separated by a colon (`:`):" ] }, { "cell_type": "code", "execution_count": 135, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'firstname': 'todd', 'hair': 'greying', 'lastname': 'gureckis', 'office': 859}" ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "person = { 'firstname': 'todd', 'lastname': 'gureckis', 'office': 859, 'hair': 'greying'}\n", "person" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way to create a dictionary is using the `dict()` function:" ] }, { "cell_type": "code", "execution_count": 137, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'firstname': 'todd', 'hair': 'greying', 'lastname': 'gureckis', 'office': 859}" ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "person = dict(firstname='todd', lastname='gureckis', office=859, hair='greying')\n", "person" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The main thing about dictionaries is that you can \"look up\" any value you want by its \"key\":" ] }, { "cell_type": "code", "execution_count": 138, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'todd'" ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "person['firstname']" ] }, { "cell_type": "code", "execution_count": 139, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'greying'" ] }, "execution_count": 139, "metadata": {}, "output_type": "execute_result" } ], 
"source": [ "person['hair']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes you want to look inside a dictionary to see all the elements:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These are all the keys:" ] }, { "cell_type": "code", "execution_count": 141, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['firstname', 'lastname', 'office', 'hair'])" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "person.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Values:" ] }, { "cell_type": "code", "execution_count": 142, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_values(['todd', 'gureckis', 859, 'greying'])" ] }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "person.values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or items (the key-value pairs):" ] }, { "cell_type": "code", "execution_count": 143, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_items([('firstname', 'todd'), ('lastname', 'gureckis'), ('office', 859), ('hair', 'greying')])" ] }, "execution_count": 143, "metadata": {}, "output_type": "execute_result" } ], "source": [ "person.items()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking up a key that doesn't exist in the dictionary raises a `KeyError`:" ] }, { "cell_type": "code", "execution_count": 146, "metadata": {}, "outputs": [ { "ename": "KeyError", "evalue": "'nope'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m 
\u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mperson\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'nope'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mKeyError\u001b[0m: 'nope'" ] } ], "source": [ "print(person['nope'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to initializing a dictionary, you can add new key-value pairs to it later:" ] }, { "cell_type": "code", "execution_count": 149, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'age': '>30',\n", " 'building': 'meyer',\n", " 'email': 'no thanks',\n", " 'firstname': 'todd',\n", " 'hair': 'greying',\n", " 'lastname': 'gureckis',\n", " 'office': 859,\n", " 'position': 'associate professor'}" ] }, "execution_count": 149, "metadata": {}, "output_type": "execute_result" } ], "source": [ "person = { 'firstname': 'todd', 'lastname': 'gureckis', 'office': 859, 'hair': 'greying'}\n", "person['age']='>30'\n", "person['position']='associate professor'\n", "person['building']='meyer'\n", "person['email']='no thanks'\n", "person" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also overwrite an existing value:" ] }, { "cell_type": "code", "execution_count": 151, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'age': '>30',\n", " 'building': 'meyer',\n", " 'email': 'no thanks',\n", " 'firstname': 'TODD',\n", " 'hair': 'greying',\n", " 'lastname': 'gureckis',\n", " 'office': 859,\n", " 'position': 'associate professor'}" ] }, "execution_count": 151, "metadata": {}, "output_type": "execute_result" } ], "source": [ "person['firstname']='TODD'\n", "person" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to indexing with square brackets `[]`, you can look up a key with the `.get()` method. 
This is useful because you can provide an optional default value that is returned if the key is missing:" ] }, { "cell_type": "code", "execution_count": 152, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'TODD'" ] }, "execution_count": 152, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'firstname' is a key, so this returns its value\n", "person.get('firstname')" ] }, { "cell_type": "code", "execution_count": 157, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'oops'" ] }, "execution_count": 157, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'phonenumber' is not a key, so .get() returns the default 'oops' instead of raising a KeyError\n", "person.get('phonenumber', 'oops')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can merge two dictionaries with `update()`, which adds the key-value pairs of the second dictionary to the first one (changing the first dictionary in place):" ] }, { "cell_type": "code", "execution_count": 159, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'a': 1, 'b': 2, 'c': 3}\n" ] } ], "source": [ "dict1 = {'a': 1, 'b': 2}\n", "dict2 = {'c': 3}\n", "dict1.update(dict2)\n", "print(dict1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If both dictionaries contain the same key, the value from the dictionary passed to `update()` overwrites the existing one:" ] }, { "cell_type": "code", "execution_count": 160, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'a': 1, 'b': 2, 'c': 4}\n" ] } ], "source": [ "# 'c' is already a key in dict1, so its value is replaced:\n", "dict1.update({'c': 4})\n", "print(dict1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why learn about dictionaries? One reason is that they are a very useful way of organizing data. For example, one might naturally think of the columns of an Excel spreadsheet or data file as being labeled with 'keys' that have a list of values underneath them. 
This is exactly a data format that `pandas` (a library that we will use in this class; more on this and other libraries later) likes:" ] }, { "cell_type": "code", "execution_count": 164, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "   student  grades\n", "0        1    0.95\n", "1        2    0.27\n", "2        3    0.45\n", "3        4    0.80" ] }, "execution_count": 164, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df=pd.DataFrame({'student': [1,2,3,4], 'grades':[0.95, 0.27, 0.45, 0.8] })\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another reason is that [JSON](https://stackoverflow.com/questions/383692/what-is-json-and-why-would-i-use-it), a very common data file format online, is composed of key-value pairs. Here is an example of a string that contains JSON:" ] }, { "cell_type": "code", "execution_count": 170, "metadata": {}, "outputs": [], "source": [ "jsonstring='''\n", "{\n", " \"firstName\": \"John\",\n", " \"lastName\": \"Smith\",\n", " \"address\": {\n", " \"streetAddress\": \"21 2nd Street\",\n", " \"city\": \"New York\",\n", " \"state\": \"NY\",\n", " \"postalCode\": 10021\n", " },\n", " \"phoneNumbers\": [\n", " \"212 555-1234\",\n", " \"646 555-4567\"\n", " ]\n", " }\n", "'''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This string can be quickly loaded into a dictionary using the `json` library:" ] }, { "cell_type": "code", "execution_count": 172, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'address': {'city': 'New York',\n", " 'postalCode': 10021,\n", " 'state': 'NY',\n", " 'streetAddress': '21 2nd Street'},\n", " 'firstName': 'John',\n", " 'lastName': 'Smith',\n", " 'phoneNumbers': ['212 555-1234', '646 555-4567']}" ] }, "execution_count": 172, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import json\n", "json_dictionary=json.loads(jsonstring)\n", "json_dictionary" ] }, { "cell_type": "code", "execution_count": 173, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'John'" ] }, "execution_count": 173, "metadata": {}, "output_type": "execute_result" } ], "source": [ "json_dictionary['firstName']" ] }, { "cell_type": "code", "execution_count": 
174, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['212 555-1234', '646 555-4567']" ] }, "execution_count": 174, "metadata": {}, "output_type": "execute_result" } ], "source": [ "json_dictionary['phoneNumbers']" ] }, { "cell_type": "code", "execution_count": 175, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'city': 'New York',\n", " 'postalCode': 10021,\n", " 'state': 'NY',\n", " 'streetAddress': '21 2nd Street'}" ] }, "execution_count": 175, "metadata": {}, "output_type": "execute_result" } ], "source": [ "json_dictionary['address']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This last example shows how a dictionary can be inside a dictionary!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "The final collection we will discuss is a set. You might have learned about sets and set theory in high school, and Python has some tools for reasoning about them. Remember that a set is an unordered collection of unique objects. So unlike a list, there is no \"first element\" or \"second element\" in a set, and each element can appear only once. Instead, a set just contains objects and lets you do various set operations, such as testing whether an element is in a set or whether one set is a subset of another.\n", "\n", "You can create a set like this:" ] }, { "cell_type": "code", "execution_count": 177, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Bear', 'Cat', 'Coyote', 'Dog', 'Elephant'}" ] }, "execution_count": 177, "metadata": {}, "output_type": "execute_result" } ], "source": [ "animal_set = set(['Coyote', 'Dog', 'Bear', 'Cat', 'Elephant'])\n", "animal_set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how even though 'Coyote' was the first element in the list that we used to initialize the set, it is no longer the first item in the output. This is because sets are **unordered**.\n", "\n", "Sets are related to dictionaries: a set is a bit like a dictionary that has only keys and no values. Because of this family resemblance, you can also create a set by writing its elements inside curly braces `{}`." 
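, "

Two things to watch out for when using curly braces. First, an empty pair `{}` creates an empty *dictionary*, not an empty set, so an empty set has to be made with `set()`. Second, just like dictionary keys, set elements must be hashable, so a tuple can go inside a set but a list cannot. Here is a small sketch that checks both points (the variable names are only for illustration):

```python
empty_braces = {}
empty_set = set()

# type(...).__name__ gives the name of an object's type as a string
print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

# Writing elements inside the braces does make a set; duplicates are dropped
numbers = {1, 2, 2, 3}
print(len(numbers))                 # 3

# Tuples are hashable, so they can be set elements
pairs = {(1, 2), (3, 4)}

# Lists are not hashable, so putting one in a set raises a TypeError
try:
    bad_set = {[1, 2]}
except TypeError as err:
    print('TypeError:', err)
```
"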
] }, { "cell_type": "code", "execution_count": 179, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'set'>\n" ] } ], "source": [ "set_1 = {2, 4, 6, 8, 10}\n", "set_2 = {42, 'foo', (1, 2, 3), 3.14159}\n", "print(type(set_1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The size of a set can be found with the `len()` function we saw before:" ] }, { "cell_type": "code", "execution_count": 180, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 180, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(animal_set)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can test if an element is contained within a set:" ] }, { "cell_type": "code", "execution_count": 181, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 181, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'owl' in animal_set" ] }, { "cell_type": "code", "execution_count": 182, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 182, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'Coyote' in animal_set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can take the union of two sets, which contains every element that appears in either set. 
An element that is in both sets (here, `'baz'`) appears only once:" ] }, { "cell_type": "code", "execution_count": 183, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'bar', 'baz', 'foo', 'quux', 'qux'}" ] }, "execution_count": 183, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1 = {'foo', 'bar', 'baz'}\n", "x2 = {'baz', 'qux', 'quux'}\n", "\n", "x1 | x2" ] }, { "cell_type": "code", "execution_count": 184, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'bar', 'baz', 'foo', 'quux', 'qux'}" ] }, "execution_count": 184, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# or\n", "x1.union(x2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also find the intersection, which contains only the elements that are in both sets:" ] }, { "cell_type": "code", "execution_count": 185, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'baz'}" ] }, "execution_count": 185, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1 = {'foo', 'bar', 'baz'}\n", "x2 = {'baz', 'qux', 'quux'}\n", "\n", "x1.intersection(x2)" ] }, { "cell_type": "code", "execution_count": 186, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'baz'}" ] }, "execution_count": 186, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1 & x2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The difference finds the things in the first set that are not in the second:" ] }, { "cell_type": "code", "execution_count": 187, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'bar', 'foo'}" ] }, "execution_count": 187, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1 = {'foo', 'bar', 'baz'}\n", "x2 = {'baz', 'qux', 'quux'}\n", "\n", "x1.difference(x2)" ] }, { "cell_type": "code", "execution_count": 188, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'bar', 'foo'}" ] }, "execution_count": 188, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1-x2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 
`symmetric_difference()` method finds the things that are in one set or the other, but not in both:" ] }, { "cell_type": "code", "execution_count": 189, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'bar', 'foo', 'quux', 'qux'}" ] }, "execution_count": 189, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1 = {'foo', 'bar', 'baz'}\n", "x2 = {'baz', 'qux', 'quux'}\n", "\n", "x1.symmetric_difference(x2)" ] }, { "cell_type": "code", "execution_count": 190, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'bar', 'foo', 'quux', 'qux'}" ] }, "execution_count": 190, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1 ^ x2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can ask whether two sets have nothing in common using `isdisjoint()`, which returns `True` when they share no elements:" ] }, { "cell_type": "code", "execution_count": 191, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 191, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1 = {1, 3, 5}\n", "x2 = {2, 4, 6}\n", "\n", "x1.isdisjoint(x2)" ] }, { "cell_type": "code", "execution_count": 192, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 192, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1 = {1, 3, 5, 7} # adding one element in common\n", "x2 = {2, 4, 6, 7}\n", "\n", "x1.isdisjoint(x2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can ask if one set is a subset of another one:" ] }, { "cell_type": "code", "execution_count": 194, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 194, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1 = {1, 3, 5, 7} \n", "x2 = {2, 4, 6, 7}\n", "\n", "x1.issubset(x2)" ] }, { "cell_type": "code", "execution_count": 196, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 196, "metadata": {}, "output_type": "execute_result" } ], "source": 
[ "x1 = {2,4}\n", "x2 = {2, 4, 6, 7}\n", "\n", "x1.issubset(x2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To add an element to a set or remove one from it, use `add()` or `remove()`:" ] }, { "cell_type": "code", "execution_count": 199, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{2, 4, 'foo'}" ] }, "execution_count": 199, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1.add('foo')\n", "x1" ] }, { "cell_type": "code", "execution_count": 200, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{2, 4}" ] }, "execution_count": 200, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1.remove('foo')\n", "x1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sets are sometimes useful in data analysis because they can be used to get all the unique elements of a list. For example, if you have a list of participants' ages, a set is a nice way to find the different values it contains:" ] }, { "cell_type": "code", "execution_count": 201, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{14, 15, 17, 18, 22, 24, 35}" ] }, "execution_count": 201, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ages = [14,15,35,15,24,14,17,18,22,22,24]\n", "set(ages)" ] }, { "cell_type": "code", "execution_count": 202, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7" ] }, "execution_count": 202, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(set(ages))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because a set drops duplicates, the above shows that there are 7 unique ages in the data set." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Flow Control" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Within a Jupyter code cell, we have been seeing each line of code execute in sequence. However, we often need to control which bits of code run depending on other variables or settings. 
Tools for doing this are generally known as [flow control](https://docs.python.org/3/tutorial/controlflow.html) elements. These include conditionals like `if`, `else`, and `elif`, as well as loops like `for`. Again, this is not a full programming course, so we can cover this only partially (e.g., we will not cover `while` loops in detail, although they can be really useful). However, it should be enough to get by in this class." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Testing if things are true" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "\n", "The first thing we need is a way to test whether some condition has been met. One of the most common ways to do this in Python is with the double equals sign `==`. The double equals sign is different from the single equals sign: the single equals sign **assigns** a value to a variable, while the double equals sign **tests** whether two things are equal. Thus the single equals sign is a bit like a command to \"make these things equal,\" whereas the double equals sign is more like a question asking \"are these two things equal?\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1==2" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1==1" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'hello'=='hello'" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'hello'=='HELLO'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These examples compare fixed values (notice that string comparisons are case-sensitive, so `'hello'` and `'HELLO'` are not equal). A more common use case is testing whether a variable is equal to a particular value. We already know 1 is equal to 1, but we might not know the contents of a variable, so testing gives us a way to check whether it meets a particular condition." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myvar = 10\n", "\n", "myvar == 10" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myvar == 11" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are related comparisons. For example, we might want to test whether a variable is greater than a particular value:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myvar > 10 # is myvar greater than 10?" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myvar >= 5 # is myvar greater than or equal to 5?" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myvar <= 15 # is myvar less than or equal to 15?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another one is \"not equal to\" (`!=`), which is just the opposite of `==`:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myvar != 5" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myvar != 10" ] }, { "cell_type": 
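"markdown", "metadata": {}, "source": [ "Each of these comparisons produces a Boolean value (`True` or `False`), and you can store that value in a variable just like a number or a string. Python also lets you chain comparisons, which is a compact way to check whether a value falls inside a range. Here is a short sketch (the names `is_big` and `in_range` are just illustrative):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "myvar = 10\n", "\n", "# a comparison produces a Boolean value that can be stored in a variable\n", "is_big = myvar > 5\n", "print(is_big)  # True, because 10 is greater than 5\n", "\n", "# a chained comparison: is myvar between 5 and 15 (inclusive)?\n", "in_range = 5 <= myvar <= 15\n", "print(in_range)  # True" ] }, { "cell_type": 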
"markdown", "metadata": {}, "source": [ "You will also run into the keywords `is` and `is not`. They look like English versions of `==` and `!=`, but they test something different: whether two names refer to the very same object in the computer's memory, rather than whether two values are equal. A test like `myvar is 10` may appear to work for small numbers, but it is unreliable, and recent versions of Python print a warning about it. Use `==` and `!=` to compare values, and save `is` for checks such as whether a variable is `None` (Python's special \"no value\" object):" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myvar is None" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myvar is not None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also test whether an item is part of a collection (e.g., a list or set):" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mylist = ['one', 'two', 'three', 'four']\n", "\n", "'one' in mylist # is 'one' in mylist?" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'six' not in mylist # is 'six' not in mylist?" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mystring = 'lkjasldfkj'\n", "\n", "'jas' in mystring # tests if 'jas' is a substring of mystring" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'jas' not in mystring # tests if 'jas' is not a substring of mystring" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conditionals (if-then-else)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Now that we have seen a few ways to test whether things are true or false, or meet some particular condition, the next step is to execute different code depending on the result of the test. The simplest code that accomplishes this has the following general form:\n", "\n", "```\n", "if expression:\n", "    statement(s)\n", "```\n", "\n", "where `expression` is a true/false (Boolean) test like those described in the previous section, and `statement(s)` is some collection of lines that will be executed only if the expression is `True`.\n", "\n", "For example, let's create a variable called `weather` and set it equal to `'nice'`. Then we will write a short program that prints a list of chores, but only if the weather is nice:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Walk the dog\n", "Mow the lawn\n", "Weed the flower bed\n" ] } ], "source": [ "weather = 'nice'\n", "\n", "if weather == 'nice':\n", "    print('Walk the dog')\n", "    print('Mow the lawn')\n", "    print('Weed the flower bed')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The important thing about this code is that you could run it even if you weren't sure what the value of `weather` was (for example, because it was set earlier in the program or by some other piece of complex code). Thus, it lets you run a special bit of code depending on the value of a variable.\n", "\n", "The situations where you would use this type of code are very intuitive... sometimes you only do a certain step when something is true (like taking an umbrella when it is raining).\n", "\n", "Now, there are a couple of **very important** but sometimes **subtle or confusing** parts about this.\n", "\n", "First, notice that the lines that read `print()` are not aligned with the rest of the code in that cell. This is because they are indented, as if you had pressed the Tab key at the start of each line (the key you sometimes use to indent the first line of a paragraph when writing). In Jupyter, pressing Tab at the start of a line typically inserts four spaces, which is the standard way to indent Python code.\n", "\n", "In Python this spacing is **very important**. 
Lines that are tabbed over relative to the line above them form what is known as a \"code block\":" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "this is not\n" ] } ], "source": [ "myvar = 10\n", "myvar2 = 20\n", "\n", "if myvar == myvar2:\n", "    # this is inside the code block\n", "    print(\"this is\")\n", "    print(\"a code block\")\n", "\n", "# this is not in the code block because it is not tabbed over\n", "print(\"this is not\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most programming languages have some type of code block syntax, but in Python, you simply indent and unindent lines to create one. This feature might actually be one of the main reasons Python is so popular (I kid you not). The reason is that indentation is a very elegant way to mark code blocks, and it makes the code very readable compared to other languages. The simplicity can sometimes be confusing for new users, though, because you really have to keep track of the level of indentation of your code. It is not a big deal once you get used to it, but at first you will need to be on guard about which lines are or are not tabbed over." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "this is\n", "a code block\n", "this is not\n" ] } ], "source": [ "myvar = 10\n", "myvar2 = 10\n", "\n", "if myvar == myvar2:\n", "    print(\"this is\")\n", "    print(\"a code block\")\n", "\n", "print(\"this is not\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consider the two contrasting examples above (the one just now and the one before it). In the earlier one, `myvar` and `myvar2` have different values, so the `==` test fails (returns `False`) and the tabbed code block is skipped. In the second example, the test is `True`, so the code block is run. 
In both cases, the final print (which is *not* tabbed over) runs no matter what (so it is printed out in both examples). The general structure is this:\n", "\n", "
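" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To connect this to data analysis, here is a minimal sketch that flags a trial whose reaction time is suspiciously fast. The variable `rt` and the 200 ms cutoff are made-up values for illustration, not a recommended exclusion rule:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rt = 150  # hypothetical reaction time in milliseconds\n", "\n", "if rt < 200:\n", "    # this indented block runs only when the test is True\n", "    print('Trial flagged: response faster than 200 ms')\n", "\n", "# this line is not indented, so it runs no matter what\n", "print('Finished checking the trial')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you change `rt` to a slower value such as 450 and rerun the cell, the flag message disappears but the final line still prints.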
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the `if` statement we just considered, you take an optional path through the code and then continue. However, other times you want to take one path if something is `True` and another path if it is `False`. For example:\n", "\n", "```\n", "if raining:\n", "    - take umbrella\n", "otherwise:\n", "    - take sunglasses\n", "- take wallet\n", "```\n", "\n", "This is not valid Python, but it makes intuitive sense: sometimes if the condition is false we want to do something else. In Python this is accomplished with the `else` keyword:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Take umbrella\n", "Take wallet\n" ] } ], "source": [ "raining = True\n", "\n", "if raining:\n", "    print(\"Take umbrella\")\n", "else:\n", "    print(\"Take sunglasses\")\n", "\n", "print(\"Take wallet\")" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Take sunglasses\n", "Take wallet\n" ] } ], "source": [ "raining = False\n", "\n", "if raining:\n", "    print(\"Take umbrella\")\n", "else:\n", "    print(\"Take sunglasses\")\n", "\n", "print(\"Take wallet\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you compare the two code cells above, you can see that depending on the value of `raining` (either `True` or `False`, which are special keywords in Python), the program takes a different path through the code.\n", "\n", "Great, but what if there are many conditions? For example, you might want to take your sunglasses only if it is not cloudy. 
But some days it might not be raining and still be cloudy, so you can use `elif` (a combination of `else` and `if`) to consider multiple conditions:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Take sunglasses\n", "Take wallet\n" ] } ], "source": [ "weather = 'sunny'\n", "\n", "if weather == 'raining':\n", "    print(\"Take umbrella\")\n", "elif weather == 'sunny':\n", "    print(\"Take sunglasses\")\n", "elif weather == 'cloudy':\n", "    print(\"Take sweater\")\n", "\n", "print(\"Take wallet\")" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Take umbrella\n", "Take wallet\n" ] } ], "source": [ "weather = 'raining'\n", "\n", "if weather == 'raining':\n", "    print(\"Take umbrella\")\n", "elif weather == 'sunny':\n", "    print(\"Take sunglasses\")\n", "elif weather == 'cloudy':\n", "    print(\"Take sweater\")\n", "\n", "print(\"Take wallet\")" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Take sweater\n", "Take wallet\n" ] } ], "source": [ "weather = 'cloudy'\n", "\n", "if weather == 'raining':\n", "    print(\"Take umbrella\")\n", "elif weather == 'sunny':\n", "    print(\"Take sunglasses\")\n", "elif weather == 'cloudy':\n", "    print(\"Take sweater\")\n", "\n", "print(\"Take wallet\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparing the three cells above, you can see that the program does different things depending on which of several values the variable `weather` takes. Python checks each condition in order and runs only the block under the first one that is `True`. 
Pretty simple!\n", "\n", "If needed, you can always end an `if/elif` sequence with a final `else`:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I don't know what to do!\n", "Take wallet\n" ] } ], "source": [ "weather = 'heat wave'\n", "\n", "if weather == 'raining':\n", "    print(\"Take umbrella\")\n", "elif weather == 'sunny':\n", "    print(\"Take sunglasses\")\n", "elif weather == 'cloudy':\n", "    print(\"Take sweater\")\n", "else:\n", "    print(\"I don't know what to do!\")\n", "\n", "print(\"Take wallet\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, the final `else` catches every value that none of the tests above matched, so if we come up against a heat wave (or anything else!), our program admits it doesn't know what to recommend!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### For Loops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using loops in computer programming allows us to automate and repeat similar tasks multiple times. This is very common in data analysis. In this section, we’ll be covering Python’s **for loop**.\n", "\n", "\n", "
\n", "\n", "A `for` loop implements the repeated execution of code based on a loop counter or loop variable. This means that `for` loops are used most often when the number of repetitions is known before entering the loop, unlike **while loops**, which can run until some condition is met." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Python, `for` loops are constructed like so:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "for [iterating variable] in [sequence]:\n", "    [do something]\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `[do something]` part (known as a code block) is executed once for each item in the sequence, with the iterating variable taking on each item in turn. The code block itself can consist of any number of lines of code, as long as they are tabbed over once relative to the `for` line." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s look at a `for` loop that iterates through a range of values:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "1\n", "2\n", "3\n", "4\n" ] } ], "source": [ "for i in range(0,5):\n", "    print(i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This `for` loop sets up `i` as its iterating variable, and the sequence is the range of integers starting at 0 and stopping before 5.\n", "\n", "Then within the loop, we print out one integer per loop iteration. Keep in mind that `range()` includes its start value but excludes its stop value, which is why, although 5 numbers are printed out, they range from 0 to 4.\n", "\n", "You’ll commonly see and use `for` loops when a program needs to repeat a block of code a number of times." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For Loops using `range()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of Python’s built-in immutable sequence types is `range()`. 
In loops, `range()` is used to control how many times the loop will be repeated.\n", "\n", "When working with `range()`, you can pass between 1 and 3 integer arguments to it:\n", "\n", "- `start` states the integer value at which the sequence begins; if this is not included, the sequence begins at 0\n", "- `stop` is always required and is the integer that is counted up to but not included\n", "- `step` sets how much to increase (or, for negative numbers, decrease) the value on each iteration; if this is omitted, `step` defaults to 1\n", "\n", "We’ll look at some examples of passing different arguments to `range()`.\n", "\n", "First, let’s only pass the `stop` argument, so that our sequence is set up as `range(stop)`:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "1\n", "2\n", "3\n", "4\n", "5\n" ] } ], "source": [ "for i in range(6):\n", "    print(i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the program above, the `stop` argument is 6, so the code iterates from 0 up to 5 (6 itself is excluded)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we’ll look at `range(start, stop)`, with values passed for when the iteration should start and for when it should stop. Here, the range goes from 20 (inclusive) to 25 (exclusive), so the output looks like this:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "20\n", "21\n", "22\n", "23\n", "24\n" ] } ], "source": [ "for i in range(20,25):\n", "    print(i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `step` argument of `range()` can be used to skip values within the sequence.\n", "\n", "With all three arguments, `step` comes in the final position: `range(start, stop, step)`. First, let’s use a `step` with a positive value. 
In this case, the `for` loop is set up so that it counts from 0 up to (but not including) 15 at a step of 3, so only every third number is printed, like so:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "3\n", "6\n", "9\n", "12\n" ] } ], "source": [ "for i in range(0,15,3):\n", "    print(i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use a negative value for our `step` argument to iterate backwards, but we’ll have to adjust our start and stop arguments accordingly. Here, 100 is the `start` value, 0 is the `stop` value, and -10 is the `step`, so the loop begins at 100 and decreases by 10 with each iteration, ending at 10 because the `stop` value of 0 is excluded. We can see this occur in the output:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100\n", "90\n", "80\n", "70\n", "60\n", "50\n", "40\n", "30\n", "20\n", "10\n" ] } ], "source": [ "for i in range(100,0,-10):\n", "    print(i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When programming in Python, `for` loops often use the `range()` sequence type to control their iteration." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For Loops using Sequential Data Types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists and other sequence data types can also be used as the sequence in `for` loops. 
Rather than iterating through a `range()`, you can define a list and iterate through that list.\n", "\n", "We’ll assign a list to a variable, and then iterate through the list:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hammerhead\n", "great white\n", "dogfish\n", "frilled\n", "bullhead\n", "requiem\n" ] } ], "source": [ "sharks = ['hammerhead', 'great white', 'dogfish', 'frilled', 'bullhead', 'requiem']\n", "\n", "for shark in sharks:\n", "    print(shark)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output above shows that the `for` loop iterated through the list and printed one item from the list per line." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists and other sequence-based data types like strings and tuples are commonly used with loops because they are iterable. You can combine these data types with `range()` to add items to a list, for example:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['hammerhead', 'great white', 'dogfish', 'frilled', 'bullhead', 'requiem', 'shark', 'shark', 'shark', 'shark', 'shark', 'shark']\n" ] } ], "source": [ "sharks = ['hammerhead', 'great white', 'dogfish', 'frilled', 'bullhead', 'requiem']\n", "\n", "for item in range(len(sharks)):\n", "    sharks.append('shark')\n", "\n", "print(sharks)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we have appended the placeholder string `'shark'` once for each item in the original `sharks` list. Because `range(len(sharks))` is computed once, before the loop starts, the loop runs exactly six times even though the list keeps growing inside the loop.\n", "\n", "You can also use a `for` loop to construct a list from scratch. 
In this example, the list `integers` starts out empty, and the `for` loop populates it like so:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n" ] } ], "source": [ "integers = []\n", "\n", "for i in range(10):\n", "    integers.append(i)\n", "\n", "print(integers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, we can iterate through strings:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "S\n", "a\n", "m\n", "m\n", "y\n" ] } ], "source": [ "sammy = 'Sammy'\n", "\n", "for letter in sammy:\n", "    print(letter)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When iterating through a dictionary, it’s important to keep the `key:value` structure in mind to ensure that you are accessing the correct element of the dictionary. Here is an example that uses both the key and the value:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "name: Sammy\n", "animal: shark\n", "color: blue\n", "location: ocean\n" ] } ], "source": [ "sammy_shark = {'name': 'Sammy', 'animal': 'shark', 'color': 'blue', 'location': 'ocean'}\n", "\n", "for key in sammy_shark:\n", "    print(key + ': ' + sammy_shark[key])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When using dictionaries with `for` loops, the iterating variable corresponds to the keys of the dictionary, and `dictionary_variable[iterating_variable]` corresponds to the values. In the case above, the iterating variable is named `key`, and `sammy_shark[key]` gives the value that goes with each key." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Loops are often used to iterate over and manipulate sequential data types." 
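, "\n", "\n", "For dictionaries in particular, the built-in `items()` method hands you each key together with its value, which can be a little tidier than looking up `sammy_shark[key]` inside the loop. Here is a minimal sketch using the same `sammy_shark` dictionary (the loop variable names `key` and `value` are just illustrative choices):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sammy_shark = {'name': 'Sammy', 'animal': 'shark', 'color': 'blue', 'location': 'ocean'}\n", "\n", "# items() produces (key, value) pairs; each pair is unpacked into two loop variables\n", "for key, value in sammy_shark.items():\n", "    print(key + ': ' + value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This prints the same four lines as the earlier example (`name: Sammy`, `animal: shark`, and so on), because dictionaries keep their keys in the order they were added." 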
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Nested For Loops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Loops can be nested in Python, as they can in other programming languages.\n", "\n", "A nested loop is a loop that occurs within another loop, structurally similar to nested `if` statements. These are constructed like so:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "for [first iterating variable] in [outer loop]: # Outer loop\n", "    [do something] # Optional\n", "    for [second iterating variable] in [nested loop]: # Nested loop\n", "        [do something]\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The program first encounters the outer loop, executing its first iteration. This first iteration triggers the inner, nested loop, which then runs to completion. Then the program returns to the top of the outer loop, completing the second iteration and again triggering the nested loop. Again, the nested loop runs to completion, and the program returns to the top of the outer loop, repeating until the sequence is complete or a `break` or other statement interrupts the process." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s implement a nested `for` loop so we can take a closer look. In this example, the outer loop will iterate through a list of integers called `num_list`, and the inner loop will iterate through a list of strings called `alpha_list`." 
] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n", "a\n", "b\n", "c\n", "2\n", "a\n", "b\n", "c\n", "3\n", "a\n", "b\n", "c\n" ] } ], "source": [ "num_list = [1, 2, 3]\n", "alpha_list = ['a', 'b', 'c']\n", "\n", "for number in num_list:\n", "    print(number)\n", "    for letter in alpha_list:\n", "        print(letter)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output illustrates that the program completes the first iteration of the outer loop by printing `1`, which then triggers completion of the inner loop, printing `a`, `b`, `c` consecutively. Once the inner loop has completed, the program returns to the top of the outer loop, prints `2`, then again prints the inner loop in its entirety (`a`, `b`, `c`), etc." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nested `for` loops can be useful for iterating through items within lists composed of lists. In a list composed of lists, if we employ just one `for` loop, the program will output each internal list as an item (we name the loop variable `sublist` rather than `list` so that we don't overwrite Python's built-in `list` function):" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['hammerhead', 'great white', 'dogfish']\n", "[0, 1, 2]\n", "[9.9, 8.8, 7.7]\n" ] } ], "source": [ "list_of_lists = [['hammerhead', 'great white', 'dogfish'],[0, 1, 2],[9.9, 8.8, 7.7]]\n", "\n", "for sublist in list_of_lists:\n", "    print(sublist)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to access each individual item of the internal lists, we’ll implement a nested `for` loop:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hammerhead\n", "great white\n", "dogfish\n", "0\n", "1\n", "2\n", "9.9\n", "8.8\n", "7.7\n" ] } ], "source": [ "list_of_lists = [['hammerhead', 'great white', 'dogfish'],[0, 1, 2],[9.9, 8.8, 7.7]]\n", "\n", "for sublist in 
list_of_lists:\n", "    for item in sublist:\n", "        print(item)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we use a nested `for` loop, we are able to iterate over the individual items contained in the lists." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another useful looping structure is known as a `while` loop. We don't have time to go through `while` loops in detail, and they are used a bit less often in our data analysis, but you can watch this great tutorial on [RealPython](https://realpython.com/lessons/intro-while-loops-python/).\n", "\n", "On the class webpage there is also a page of [40 For loops](../../tips/fortyforloops.html) showing many concrete examples used in data analysis that you might find helpful." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Writing New Functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "In the previous sections, we learned how we can organize code into **code blocks** by tabbing it over and wrapping it in control structures (which then execute the code a certain number of times or depending on some particular condition). As you begin to write longer and longer programs, though, it sometimes helps to break up the functionality of your programs into reusable chunks called **functions**. Functions make your code more readable, save time and typing by packaging bits of code into reusable chunks, and make it easier to share your code with other people." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The general format for a function is composed of a few elements. First there is a line that defines the **name** of the function and the **parameters** it can take (more on that later). Next comes a sequence of instructions that are tabbed over from the definition. The first line that is no longer tabbed over ends the definition of the function:\n", " \n", "```\n", "def my_function(parameter1, parameter2):\n", "    instruction 1\n", "    instruction 2\n", "    ...\n", "```\n", "\n", "This just gives you a general sense of how functions are organized and is not valid Python code!\n", "\n", "Let's look at an actual version." ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "def my_function():\n", "    print(\"Hello from my function\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice two things about this function. First, it has no parameters (which is fine... we'll talk about how to add them later). Second, when you run this code cell, nothing happens. This is because this code simply **defines** the function but does not run it. 
To run this function someplace else in our code, we would write:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello from my function\n" ] } ], "source": [ "my_function()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we \"executed\" (i.e., ran) the function by writing its name followed by parentheses.\n", "\n", "Functions can be combined with other elements of Python to create more complex program flows. For example, we can combine a custom function with a `for` loop:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "*\n", "***\n", "*****\n", "*\n", "***\n", "*****\n", "*\n", "***\n", "*****\n", "*\n", "***\n", "*****\n" ] } ], "source": [ "def my_ramp():\n", "    print('*')\n", "    print('***')\n", "    print('*****')\n", "\n", "for i in range(4):\n", "    my_ramp()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above code, which prints out a ramping set of stars four times, is much more compact than writing out the print statements 12 times." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Parameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Functions can take different types of \"parameters\", which are essentially variables that are created at the start of the execution of the function. This is helpful for making more abstract functions that perform some computation with their inputs. For example:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "def add(x, y):\n", "    print(\"The sum is \", x+y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `add` function we just defined takes two parameters, `x` and `y`. These parameters are then used as the operands of an addition operation. 
As a result, we can run this function many times with different inputs and get different results:" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The sum is 3\n" ] } ], "source": [ "add(1,2)" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The sum is 62\n" ] } ], "source": [ "add(34,28)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Return values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The functions we have considered so far simply print something out and then finish. However, sometimes you might like your function to give back one or more results for further processing by other parts of your program. For instance, instead of printing out the sum of `x` and `y`, we can define a function called `sum()` that gives back the sum using the special keyword `return` (note that this replaces Python's built-in `sum()` function for the rest of the session, so in your own code it is safer to choose a name that is not already taken):" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "def sum(x, y):\n", "    return x + y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now when we run this function, instead of printing out a message, the value of the sum is calculated and passed back (\"returned\")." ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "9" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum(4,5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This allows you to continue to do additional processing. 
For example, we can add two numbers and then compute the square root of the result:" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.0" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import math\n", "\n", "math.sqrt(sum(4,5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also return multiple values from a function. For example, we could create a function called `arithmetic` that performs several operations on `x` and `y` and returns all of the results:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [], "source": [ "def arithmetic(x, y):\n", "    _sum = x + y\n", "    _diff = x - y\n", "    _prod = x * y\n", "    _div = x / y\n", "    return _sum, _diff, _prod, _div" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This new function gives back a `tuple` (similar to a list) that contains all four of the results in one step." ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10, 4, 21, 2.3333333333333335)" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arithmetic(7,3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing additional functionality" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the best features of Python, and one reason for its popularity, is the large number of add-on packages. For example, there are packages for data analysis (`pandas`), plotting (`matplotlib` or `seaborn`), and much more. These packages are not loaded automatically. Instead, at the start of your notebook or program, you need to **import** this functionality. There are a couple of ways to import packages. Generally you need to know the name of the package and details about what functions or methods it provides. Most packages have tons of great documentation. 
For example, `pandas`, a library we will use a lot in class, has extensive documentation on the [pydata website](https://pandas.pydata.org/pandas-docs/stable/). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Basic importing is accomplished with the `import` command. For example, to import the basic math functions in Python, just type:" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [], "source": [ "import math" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can access the functions provided by the `math` library using the `.` (dot) operator: any function that the `math` library provides is now accessible to us by writing `math.` followed by the function name. For example, to compute the cosine of a number (given in radians):" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5403023058681398" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "math.cos(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes we want to import a library but rename it so that it is easier to type. For instance, we will use the pandas library a lot in class. If we type `import pandas`, that will load the library, but it means that every time we use a pandas function it will need to begin with `pandas.`. This can be a lot of typing. As a result, many popular packages are imported using the `import library_name as new_name` syntax. This imports the library but immediately gives it a new name, usually something simpler and shorter for easy typing. For example, it is traditional to import pandas and rename it `pd`. Thus, to import pandas this way, we would type this at the top of our notebook or script:" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have access to the pandas functions using `pd.`. 
For example, we can create a new pandas *dataframe* like this:" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
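<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>student</th>\n", "      <th>grades</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>1</td>\n", "      <td>0.95</td>\n", "    </tr>\n", "    <tr>\n", "      <th>1</th>\n", "      <td>2</td>\n", "      <td>0.27</td>\n", "    </tr>\n", "    <tr>\n", "      <th>2</th>\n", "      <td>3</td>\n", "      <td>0.45</td>\n", "    </tr>\n", "    <tr>\n", "      <th>3</th>\n", "      <td>4</td>\n", "      <td>0.80</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>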
" ], "text/plain": [ " student grades\n", "0 1 0.95\n", "1 2 0.27\n", "2 3 0.45\n", "3 4 0.80" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.DataFrame({'student': [1, 2, 3, 4], 'grades': [0.95, 0.27, 0.45, 0.8]})\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Importing pandas as `pd` and then calling `pd.DataFrame()` is a lot shorter than typing out `pandas.DataFrame()` every time." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you know there is a particular function you want from a library, you can import only that function. For example, we could import just the `cos()` function from the `math` library like this:" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [], "source": [ "from math import cos" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After this, you can call `cos(1)` directly, without the `math.` prefix. This is generally useful when a library is really large and complex and you only need part of it for your work." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python libraries can greatly extend your computing and data options. For example, one cool library is the `wikipedia` library, which provides an interface to the Wikipedia site (it does not come with Python, so you may first need to install it, for example with `pip install wikipedia`)."
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Carbon dioxide',\n", " 'Carbon dioxide scrubber',\n", " 'List of countries by carbon dioxide emissions',\n", " 'Carbon capture and storage',\n", " \"Carbon dioxide in Earth's atmosphere\",\n", " 'Carbon dioxide equivalent',\n", " 'Carbon dioxide removal',\n", " 'Hypercapnia',\n", " 'Carbon sequestration',\n", " 'Carbon dioxide laser']" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import wikipedia as wk\n", "wk.search(\"Carbon Dioxide\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `search()` function in the `wikipedia` library searches for Wikipedia pages that match a particular search string and returns a list of their titles (ten results by default). Here it brings up pages relevant to the phrase \"Carbon Dioxide\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dealing with error messages" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will often run into error messages when using Python. This is totally fine! It is not a big deal and comes up all the time. In fact, if you never make errors, you are probably not learning enough." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most errors in Python generate what is known as an \"exception\". The code stops running at the point of the error, but Python (and your notebook) does not \"crash\"; instead, an error message is printed that describes what went wrong. 
For example, if we have too many parentheses (every opening parenthesis must be matched by exactly one closing parenthesis), you will get a `SyntaxError`:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m print(0/0))\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "source": [ "print(0/0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a couple of things to notice about this error. First, it gives you some indication of what happened (for example, it says \"invalid syntax\", which means the code doesn't look right to Python, so Python can't understand it). It also shows you where the first error occurred (here on line 1 of the cell), and it even indicates which character in that line might be the problem. From this message you could easily fix your code simply by removing the extra parenthesis at the end." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "ename": "ZeroDivisionError", "evalue": "division by zero", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mZeroDivisionError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m/\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mZeroDivisionError\u001b[0m: division by zero" ] } ], "source": [ "print(0/0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Oops, but this gives a different error! 
Now we are getting an error (a `ZeroDivisionError`) because we are attempting to divide by zero. Again, if you read the messages, they usually give you some helpful information." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In most cases you can simply fix the code in your cell, re-run it, and keep going until you get it right. However, one problem is that errors can compound on one another. For example, a mistake might have accidentally overwritten a variable or introduced a variable you don't want. In that case it can be helpful to choose Kernel->Restart Kernel from the menu bar. This effectively \"restarts\" the computing engine that you are working with and resets it to a fresh session. You will then need to re-run all the cells prior to your error, but this is easy. There is even a shortcut on the \"Cell\" menu called \"Run All Above\" that will let you re-run everything prior to the currently selected cell. This can be a great way to restart and get back to where you were working originally." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Wrapping up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This chapter has tried to give you a general overview of Python programming and many of the language features you will encounter again and again in this course. It is not all there is to learn about Python, and once you get started, you can keep learning for years if you like. There are many excellent online resources for adding to your Python knowledge, including YouTube videos, free online tutorials, and even online classes. Use this chapter as a starting place for learning more about the language. If you are able to master the concepts above, you shouldn't have too much difficulty with this course."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further Reading and Resources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a collection of useful Python resources, including videos and online tutorials. These can help students who have less familiarity with programming in general or with Python specifically.\n", "\n", "- A nice, free textbook, \"How to Code in Python\", by Lisa Tagliaferri\n", "- Codecademy has a variety of courses on data analysis with Python. There is a free tutorial on Python 2. Although this class uses Python 3 and there are some differences between the two, a beginning programmer who doesn't want to pay for the Codecademy content might benefit from these tutorials on basic Python syntax: [Python 2 tutorial](https://www.codecademy.com/learn/learn-python)\n", "- Microsoft has an [Introduction to Python](https://docs.microsoft.com/en-us/learn/modules/intro-to-python/?WT.mc_id=python-c9-niner) video series. Each video is about 10 minutes long and introduces very basic Python features.\n", "- A nice multi-part tutorial on [Data Visualization with Python and Seaborn](https://medium.com/@neuralnets/statistical-data-visualization-series-with-python-and-seaborn-for-data-science-5a73b128851d) that gets into many more details about Seaborn than we have time to cover in class.\n", "- A six-hour (free) video course on basic Python programming on YouTube" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Tags", "kernel_info": { "name": "python3" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "nteract": { "version": "0.15.0" }, "toc": { "base_numbering": 1, "nav_menu": {}, 
"number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "261.59375px" }, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }