DS-GA 1002: Probability and Statistics for Data Science

Instructor: Carlos Fernandez-Granda (cfgranda@cims.nyu.edu)

Teaching assistants:

This course introduces fundamental concepts in probability and statistics from a data-science perspective. The aim is to become familiar with probabilistic models and statistical methods that are widely used in data analysis.

Announcements

Midterm

The midterm will be open notes and open book. You may use a computer or a tablet, but only to access your notes; any other use will be considered cheating. The time and location are:

  • Section 1: Tuesday October 10, 8:30 - 10:10 pm, Silver 208

  • Section 3: Tuesday October 10, 6:45 - 8:25 pm, CDS (60 5th Ave), Rm 110

Syllabus

  • Probability: basic probability theory, random variables, multivariate random variables, expectation, random processes, convergence of random processes, Markov chains

  • Statistics: descriptive statistics, frequentist statistics, Bayesian statistics, hypothesis testing, linear regression

See the schedule for more details.

Prerequisites

Calculus and linear algebra at the undergraduate level.

Notes

The course will follow these notes:

Probability and Statistics for Data Science

Please read the corresponding chapter before every lecture. The notes will be updated during the course, so we recommend that you don't print them out. Let us know if you find any typos or have any comments about them.

General Information

Lecture

  • Section 1: Monday 3:30PM - 5:10PM, 60 5th Ave, Rm 110

  • Section 3: Tuesday 6:45PM - 8:25PM, 60 5th Ave, Rm 110

Recitation

  • Section 2 (for students attending Section 1): Tuesday 8:35 PM - 9:25 PM, 60 5th Ave, Rm 110

  • Section 4 (for students attending Section 3): Thursday 4:55 PM - 5:45 PM, 60 5th Ave, Rm 110

Office hours

  • Sections 1 and 3 (Carlos): Friday 4:00-5:30 pm, 60 5th Ave, Rm 606

  • Section 2 (Sheng): Tuesday 4:30-6:00 pm, 60 5th Ave, Rm 606

  • Section 4 (Lisa): Wednesday 4:30-6:00 pm, 60 5th Ave, Rm 606

Grading policy

Homework (40%) + Midterm (20%) + Final (40%)

Homework

Homework will be posted each Wednesday and is due on Thursday of the following week at 11 pm. The assignment with the lowest grade will not count toward the homework grade.
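
As a rough illustration of how the grading policy and the dropped assignment combine, here is a minimal Python sketch. It assumes that all scores are on the same scale (for example 0 to 100) and that the homework grade is the plain average of the remaining assignments after the lowest one is dropped; the course does not state the exact averaging rule, and the function name course_grade is hypothetical.

```python
def course_grade(homework, midterm, final):
    """Weighted course grade, on the same scale as the inputs (e.g. 0-100).

    Assumption: the homework grade is the unweighted mean of all
    assignments except the single lowest one.
    """
    if len(homework) < 2:
        raise ValueError("need at least two homework scores to drop one")
    # Drop the single lowest homework score, then average the rest.
    kept = sorted(homework)[1:]
    homework_avg = sum(kept) / len(kept)
    # Weights from the grading policy: 40% homework, 20% midterm, 40% final.
    return 0.4 * homework_avg + 0.2 * midterm + 0.4 * final


# Example: the homework score of 50 is dropped before averaging.
print(course_grade([50, 90, 100], midterm=80, final=70))
```

In this example the homework average is 95, so the result is 0.4 · 95 + 0.2 · 80 + 0.4 · 70 = 82, up to floating-point rounding.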

The homework assignments should be submitted as a PDF through NYU Classes. The solutions and the grades will also be available on NYU Classes.

Feel free to collaborate and discuss in person or on Piazza, but do not share specific answers, and make sure that you write up your assignment yourself. Always explain your thought process. If you use results from the notes or a book, cite them properly.

Piazza

We will be using Piazza to answer questions and post announcements about the course. Please sign up here.

Books

Additional references that could be useful include:

  • Probability:

    • A First Course in Probability by Ross

    • Introduction to Probability by Bertsekas and Tsitsiklis

  • Statistics:

    • Introduction to Mathematical Statistics by Hogg, McKean and Craig

    • Statistical Inference by Casella and Berger

    • All of Statistics by Wasserman

    • Probability and Statistics by DeGroot and Schervish

    • Statistics by Freedman, Pisani and Purves