Homework

Course Info
Grading and

Important Dates >


Lecture Notes
and
Extende Syllabi

Open Problems











DS-GA 3001 Special Topics in Data Science: MATHEMATICS OF DATA SCIENCE: Graphs and Networks
(Spring 2018, NYU Center for Data Science)

Afonso S. Bandeira
bandeira [at] cims [dot] nyu [dot] edu

Lectures: Thu 4.55pm-6.35pm at CDS110 (60 5th Av).

Section Leader: Shuyang Ling
Lab section: Wed 7.45pm-8.35pm at CDS110

Office Hours Afonso: Thu 6.35pm-7.35pm at CDS603 or CDS110 (or by appointment at CIWW1123)
Office Hours Shuyang: Wed 4.30pm-5:30pm at CIWW 1103

Piazza page:

(sign up for announcements! Piazza is also ideal to ask questions!)

Graders: Chen Li and Luca Venturi (you can contact them on Piazza, or by email)
Events of potential interest:
The Math and Data seminar meets on Thursday before the class, you can see the schedule here and sign up for the mailing list here. Also, I organize weekly (on Wednesday) group meetings and reading groups on several topics related to Math and Data. If you are interested in research in this area you are more than welcome to join (at whatever frequency you want). You can see the schedule and more information here, and sign up for the mailing list here.

Announcements:
  • Project reports are all due on Apr 26 before class, by email to the instructors and graders. Project reports should be 2-3 pages (more is ok if main message is in first 3 pages).
  • The slides for the presentations are due on the Wednesday before your presentation day (ideally in pdf format)
  • Recall the project is due on April 26
  • Homework 3 (due on Apr 5) available here
  • Midterm Course Evaluation available here
  • Detailed Syllabi now available
  • Homework 2 (due on Feb 27) available here
  • Homework 1 (due on Feb 1) available here
  • Course information, including grading, available here
  • Feedback form available here
  • Piazza page available at:
  • This is part of a two course series on Mathematics of Data Science. Each part can be taken independently, and they can be taken in any order, the other part can be found here.
  • As the semester progresses I will post Lecture Notes and update the Open Problem list. For an earlier version of the notes and open problems, see: Ten Lectures and Forty-Two Open Problems in the Mathematics of Data Science. The course will roughly cover the second half of the notes.
  • This course will have some overlap to a course I gave at MIT, which is available at MIT OCW here.

Prerequisites:
Working knowledge of linear algebra and basic probability is required. Some familiarity with the basics of optimization and algorithms is also recommended.

Syllabus: This is part of a two course series on Mathematics of Data Science. Each part can be taken independently, and they can be taken in any order, the other part can be found here. This part focus on algorithms on graphs and networks while the other in high dimensional data. This is a mostly self-contained research-oriented and fast-paced course designed for graduate students with an interest in doing research in theoretical aspects of algorithms that aim to extract information from data. These often lie in overlaps of (Applied) Mathematics with: Computer Science, Electrical Engineering, Statistics, Operations Research and/or Statistical Physics. Each lecture will feature a couple of Mathematical Open Problem(s) with relevance in Data Science. The main mathematical tools used will be Probability and Linear Algebra, and a working knowledge of these subjects is required. There will also be some (although knowledge of these tools is not assumed) Graph Theory, Representation Theory, Applied Harmonic Analysis, among others. The topics treated will include Random Matrices, Approximation Algorithms, Convex Relaxations, Community detection in graphs, and several others.

The Syllabus includes: Matrix Concentration, Approximation Algorithms and Max-Cut, Community Detection and the Stochastic Block Model, Synchronization Problems and Alignment, Cheeger's Inequality, Semidefinite Programming relaxations, Approximate Message Passing algorithms, and (if time permits) statistical physics heuristics for computational limits of problem on networks.

Please email the instructor at bandeira@cims.nyu.edu if you have any question.


Open problems will be presented at the end of most lectures.
Some of the open problems will also be posted in my blog.
 
I am here to help: please let me know of your goals for your project and keep me up to date of your progress on it. If you have any question, want to discuss a problem, or brainstorm about any research idea, just email me and we'll schedule a time to meet.
Feedback: Also, if you have any comment or feedback on the class (it's going too fast, too slow, you want me to cover more of something, or less of something else, etc) please let me know (in person or through email) or submit a comment to this google form. Having direct feedback from you is the best way for me to try give lectures that you like! (keep in mind that I don't know who sent me the comment or feedback and there is no way for me to answer, for questions use email, Piazza, or my blog).


Lecture Notes and Extended Syllabi:

  • Detailed Syllabi for lectures:
  1. Jan 25: Introduction to graph theory, approximation algorithm, Max-Cut approximation. Chapter 8 on Lecture Notes.
  2. Feb 01: Max-Cut approximation. Lifting / SDP relaxations technique in mathematical signal processing, phase retrieval and k-means SDP.
  3. Feb 08: Unique Games Conjecture, Sum-of-Squares interpretation of SDP relaxation. Chapter 8 of Lecture Notes.
  4. Feb 15: Shannon Capacity, Lovasz Theta Function. Section 7.3.1. on Lecture Notes and ``On the Shannon Capacity of a Graph'' by Laszlo Lovasz. See also Section 6.5.3.
  5. Feb 22: Stochastic Block Model and Phase Transitions on graphs. Chapter 9 of Lecture Notes
  6. Mar 01: Recovery in the Stochastic Block Model with Semidefinite relaxations. Chapter 9 of Lecture Notes
  7. Mar 08: Matrix Concentration (Chapter 4)
  8. Mar 22: Matrix Concentration continued (Chapter 4)
  9. Mar 29: Belief Propagation in graphical models (focusing on SBM) and Cavity Method (Section 3 of this reference)
  10. Apr 5: Synchronization problems and applications to Cryo-Electron Microscopy (Chapter 10).
  11. Apr 12: Spectral Graph Theory: Graph Metrics and Diffusion Maps (Chapter 3)
  12. Apr 19: Spectral Graph Theory: Expanders and Sparsifiers. Reference: notes by Dan Spielman here and here.
  13. Apr 26: Project Presentations
  14. May 3: Project Presentations
  • Detailed Syllabi for labs:
  1. Jan 24: review of linear algebra and probability
  2. Jan 31: discussion of homework 1
  3. Feb 07:  graph Laplacian and Cheeger's inequality
  4. Feb 14:  pseudo distribution for maxcut,  derivation of primal and dual program for Maxcut, SOS4
  5. Feb 21:  introduction to Grothendieck inequality and a proof of an upper bound of Grothendieck constant (Krivine’s bound)
  6. Feb 28:  calculate the Lovasz theta function for n-cycle and discuss connection with Grothendieck constant on graph
  7. Mar 28: Feige's conjecture (Open Problem 4.6.) and Steinitz problem (reference)


Homework:

   

Open Problems: