
Describing and plotting data (Part 1)

1 / 49
Lecture.5

Here are a lot of numbers

52 -23 57 91 -42 34 -59 -50 -80 -71 35 6 2 57 49 -14
100 -48 -49 -25 -75 81 -69 -5 79 -85 -5 -69 98 -11 89 -24
-55 -14 -51 49 74 -71 91 77 68 29 13 -81 21 86 -32 15
65 -22 85 -57 1 54 100 76 -11 83 -60 74 -61 30 93 -53
90 90 -68 51 85 -58 -56 38 -34 10 66 -52 14 -10 -34 -42
99 24 -30 -1 6 46 -11 15 6 69 67 -17 -48 36 -62 -86
-24 -28 -9 -13 19 -3 5 90 -63 -28 -18 29 92 28 -94 -25
26 93 21 39 -90 62 -19 36 14 -27 -67 3 -19 -46 69 48
-45 98 -56 -48 69 98 31 -32 69 68 -2 -99 31 66 65 -80
6 2 57 -49 92 65 -54 -95 -73 -61 -71 -61 70 52 -1 8

What can we say about them?

We can see they aren't all the same. Not much else really. Looking at a bunch of numbers is hard work.

Summary numbers

It would be nice to reduce the big set of numbers down to a few numbers that we can look at and make sense of.

Sameness (Central Tendency)

  • What are all the numbers close to?

Differentness (Variance)

  • How different are the numbers?

Descriptive Statistics

  • Give us summaries of big sets of numbers

  • Useful single numbers to look at

  • They tell us about patterns of sameness and differentness


Graph the numbers to get a better look


Dot plot (unordered)

Graphing the numbers gives a quick and dirty sense of what they are like. Here are 200 numbers presented as dots.


Dot plot (ordered)

Sorting the numbers from smallest to largest


Histograms

A histogram counts how many numbers fall inside each of a set of ranges (bins)

The height of each bar shows how many numbers fall in that bin's range, so taller bars mark ranges that contain more numbers
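
As a rough sketch of the counting a histogram does, the bins can be tallied by hand. The function name `histogram_counts`, the choice of four bins of width 50, and the use of only the first row of the numbers from the opening slide are assumptions made for illustration; values are assumed to lie between `low` and `high`.

```python
def histogram_counts(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # find the bin for v; a value equal to `high` goes in the last bin
        index = min(int((v - low) // width), n_bins - 1)
        counts[index] += 1
    return counts

# first row of the numbers shown at the start of the lecture
numbers = [52, -23, 57, 91, -42, 34, -59, -50, -80, -71, 35, 6, 2, 57, 49, -14]

# bins: [-100, -50), [-50, 0), [0, 50), [50, 100]
bin_counts = histogram_counts(numbers, -100, 100, 4)
print(bin_counts)  # [3, 4, 5, 4]
```

Each count is the height of one bar; changing `n_bins` changes how coarse or fine the picture is.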


So what are these numbers like?

What single number would you say best describes most of these numbers?


Question

Is the red or blue value a better summary of all the numbers?


Measures of Central Tendency

Central Tendency

  1. Central tendency should describe what most of the data is like

  2. We want our summary number to be most like the other numbers. We want it to be a representative value

  3. There are multiple measures of central tendency with different properties

  4. Some work better than others depending on the data


Mode

The mode is the single most frequently occurring number

1 1 2 2 3 4 5 6 7 7 7 7 7

  • The mode is 7 because 7 occurs most often (five times)

  • Find the mode by counting the occurrences of each number; the mode is the number that occurs most often

  • If there is a tie, then you have two or three or more modes (depends on how many different numbers tie)
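
To make the tie rule concrete, here is a minimal sketch using only the standard library. The helper name `all_modes` is an assumption for illustration; it returns every value tied for the highest count instead of just one.

```python
from collections import Counter

def all_modes(values):
    """Return every value that is tied for the highest count, sorted."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

print(all_modes([1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7]))  # [7]
print(all_modes([1, 1, 2, 2, 3]))  # [1, 2]
```

In the second list, 1 and 2 each occur twice, so there are two modes.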


Finding the Mode in Python

We generate 25 random integers from 1 to 10. How do we get Python to find the mode?

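import numpy as np

# 25 random integers from 1 to 10 (the upper bound 10+1 is exclusive)
a = np.random.randint(1, 10 + 1, 25)
# counts[k] is how many times the value k appears in a
counts = np.bincount(a)
# index of the largest count, i.e. the most frequent value
mode_value = np.argmax(counts)
mode_value, counts[mode_value]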

Custom function for the mode in Python

You can always write your own function for the mode. This one is called my_mode

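def my_mode(array):
    # count how often each non-negative integer occurs in the input
    counts = np.bincount(array)
    # the most frequent value (the smallest one if there is a tie)
    most_common = np.argmax(counts)
    return most_common, counts[most_common]

a = np.random.randint(1, 10 + 1, 25)
my_mode(a)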

Thinking about the mode

When should we use the mode? It is appropriate for many datasets; for nominal (or ordinal) data, it may be one of the few reasonable descriptors
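
For nominal data there is nothing to add up or sort, but counting still works. The sketch below uses a hypothetical list of eye colors (made-up data, purely for illustration) and the standard library's `Counter`.

```python
from collections import Counter

# hypothetical nominal data: eye colors of eight people
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "hazel", "brown"]

counts = Counter(eye_colors)
# most_common(1) returns a list holding the single (value, count) pair with the highest count
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)  # brown 4
```

A mean or median of these labels would be meaningless, which is why the mode is often the descriptor of choice here.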


Median

The median is the middle number

1 1 2 2 3 4 5 6 7 7 7 7 7

  • The median is 5 because it is the middle number

  • Find the median by ordering the numbers from smallest to largest, then take the number in the middle


Median (even number of numbers)

If there is an even number of numbers, find the two in the middle and take the value halfway between them (their mean)

1 2 3 4 5 6 7 8

  • The median is 4.5 because 4.5 is halfway between the two middle numbers, 4 and 5
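
The two cases (odd and even count) can be sketched as a small function. The name `my_median` is an assumption for illustration; it sorts the numbers and then picks or averages the middle.

```python
def my_median(values):
    """Median by sorting: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # odd count: a single middle number exists
        return ordered[middle]
    # even count: average the two numbers around the middle
    return (ordered[middle - 1] + ordered[middle]) / 2

print(my_median([1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7]))  # 5
print(my_median([1, 2, 3, 4, 5, 6, 7, 8]))  # 4.5
```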

Finding the Median in Python

Put some numbers in a variable, then pass it to np.median().

a=np.random.randint(1,10+1, 12)
np.median(a)

Thinking about the median

When would the median be a good thing to know?

It is suitable for many datasets and makes sense for ordinal data. It is more robust to outliers than the mean.


Mean

The Mean (also called average) is the sum of the numbers, divided by the number of numbers

$$\text{Mean} = \frac{\text{sum of numbers}}{\text{number of numbers}}$$

1 1 2 2 3 4 5 6 7 7 7 7 7

  • Sum = 1+1+2+2+3+4+5+6+7+7+7+7+7 = 59
  • Number of numbers = 13
  • Mean = 59/13 = 4.538462
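
The same arithmetic can be checked with plain Python, as a quick sketch; the variable names are chosen only for illustration.

```python
numbers = [1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7]

total = sum(numbers)   # add up the numbers
count = len(numbers)   # count how many numbers there are
mean = total / count   # divide the sum by the count

print(total, count, round(mean, 6))  # 59 13 4.538462
```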

$$\text{Mean} = \bar{X} = \frac{\sum_{i=1}^{N} x_i}{N}$$

  • $\bar{X}$ ("X bar") symbolizes the mean

  • $\sum_{i=1}^{N} x_i$ is summation notation

    • $x$ = all the numbers (1, 2, 3, 4, ...)
    • $i$ = an index that runs from the first to the last of the numbers in $x$, including every number in between
    • $N$ = the number of numbers
    • $\sum$ = the instruction to add up the numbers

Summation example

x=[4,7,9]

$$\sum_{i=1}^{N} x_i = x_{i=1} + x_{i=2} + x_{i=3} = 4 + 7 + 9 = 20$$
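
Read as instructions, the summation is just a loop over the index. A minimal sketch follows, with the caveat that Python lists start counting at 0, so the i-th number in the math is `x[i - 1]` in the code.

```python
x = [4, 7, 9]
N = len(x)

total = 0
for i in range(1, N + 1):   # i runs from 1 to N, as in the formula
    total += x[i - 1]       # x_i in the math is x[i - 1] in Python

print(total)  # 20
```

The built-in `sum(x)` does the same job in one call.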


Mean in a table

index   x
1       4
2       7
3       2
4       9
5       8
Sum     30
N       5
Mean    6

The mean equally divides the sum

index   x    equal_parts
1       4    6
2       7    6
3       2    6
4       9    6
5       8    6
Sum     30   30
N       5    5
Mean    6    6

The mean is the balancing point

  • deviation = score minus mean
  • the sum of the deviations from the mean always equals zero

index   x    deviations
1       4    -2
2       7    1
3       2    -4
4       9    3
5       8    2
Sum     30   0
N       5    5
Mean    6    0
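
The deviations column can be reproduced in a few lines, as a sketch using the same five numbers. With other data, floating-point rounding may leave a sum that is tiny but not exactly zero.

```python
x = [4, 7, 2, 9, 8]

mean = sum(x) / len(x)
# deviation = score minus mean, computed for every score
deviations = [score - mean for score in x]

print(mean)             # 6.0
print(deviations)       # [-2.0, 1.0, -4.0, 3.0, 2.0]
print(sum(deviations))  # 0.0
```

The negative deviations (-2 and -4) exactly cancel the positive ones (1, 3, and 2), which is what "balancing point" means.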

Finding the Mean in Python

Use NumPy's np.mean() function

# make some numbers
a = np.random.randint(1, 10 + 1, 12)
np.mean(a)

np.sum() and .size

  • np.sum() sums up the numbers
  • .size gives the number of numbers in the variable
a=np.random.randint(1,10+1, 12)
np.sum(a)
a.size

Mean = np.sum() / .size

a=np.random.randint(1,10+1, 12)
np.sum(a)/a.size

Thinking about the Mean

When would the mean be a good thing to know?

It is most appropriate for interval and ratio data, but it is sensitive to outliers.


Do descriptive statistics for central tendency actually describe the data?

It depends on the data


Histogram shape: Bell-Shaped

Mean (Red), Median (Green), Mode (Blue)


Right-skewed

Mean (Red), Median (Green), Mode (Blue)


Outliers

Outliers are really big or really small values that are unusual compared to the rest of the data


Mean, Median, and outliers

The mean is strongly influenced by outliers; the median is affected far less.

Mean (Red), Median (Green)


Zooming in

The big number (2000) makes the mean really big, because it is included in the sum.
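
A small sketch shows the effect of one value of 2000. The five starting scores are hypothetical (they are not the data behind the plot), and the standard library's `statistics` module is used here in place of NumPy.

```python
from statistics import mean, median

# hypothetical small data set with no outliers
scores = [1, 2, 3, 4, 5]
# the same data with one very large value added
with_outlier = scores + [2000]

print(mean(scores), median(scores))  # 3 3
print(round(mean(with_outlier), 1), median(with_outlier))  # 335.8 3.5
```

Adding the outlier moves the mean from 3 to about 336, while the median only moves from 3 to 3.5.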


Always plot your data

Big ideas

  1. Descriptive statistics help us reduce a large pile of numbers to a few numbers that "describe the data"

  2. The mode, median, and mean are descriptive statistics for central tendency (they are meant to represent what most of the numbers are like)

  3. Measures of central tendency can be "off" by quite a bit depending on the shape of the data, so look at the data to check whether they are appropriate


Thanks to Todd Gureckis and Matt Crump for the slides.
