Time-Series Analysis via Low-Rank Matrix Factorization Applied to Infant-Sleep Data

In this project, we propose a nonparametric model for time series with missing data based on regularized nonnegative low-rank matrix factorization. The model expresses each instance in a set of time series as a linear combination of a small number of shared basis functions. We then apply our methodology to a large real-world dataset of infant-sleep data gathered by caregivers with a mobile-phone app and automatically extract daily-sleep patterns consistent with the existing literature.


The diagram above shows the proposed low-rank model applied to a set of time series. Each time series, represented by a matrix \(Y^{[n]}\), \(1\leq n \leq N\), is approximated by a sum of \(r\) components corresponding to basis functions \(F_{1}, \dots, F_{r}\) that are shared across the population. The coefficients \(C^{[n]}_{1}, \dots, C^{[n]}_{r}\) provide a reduced-dimensionality representation of each instance \(Y^{[n]}\). \(\mathcal{T}_n\) denotes the set of observed rows in \(Y^{[n]}\).
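As a rough illustration of the model described here, the following NumPy sketch fits a nonnegative rank-\(r\) factorization using only the observed rows. It is not the method of [1]: it uses plain multiplicative updates for a masked squared error and omits the time-smoothing regularization. The function name `masked_nmf`, the stacking of all instances into a single matrix, and the treatment of each \(F_j\) as a row of `F` are assumptions made for illustration.

```python
import numpy as np

def masked_nmf(Y, observed, r, n_iter=200, eps=1e-9, seed=0):
    """Approximate Y by C @ F with C, F >= 0, fitting observed rows only.

    Y        : array (m, d), nonnegative; rows of all instances stacked
               (an assumption of this sketch). Unobserved rows may be NaN.
    observed : boolean array (m,), True where the row was recorded.
    r        : number of shared basis functions.
    Returns C of shape (m, r) and F of shape (r, d).
    """
    rng = np.random.default_rng(seed)
    m, d = Y.shape
    W = observed.astype(float)[:, None]   # row mask, broadcast over columns
    Yw = np.where(W > 0, Y, 0.0)          # zero out unobserved rows (also drops NaN)
    C = rng.random((m, r)) + 0.1          # strictly positive initialization
    F = rng.random((r, d)) + 0.1
    for _ in range(n_iter):
        # Multiplicative updates for || W * (Y - C F) ||_F^2.
        C *= (Yw @ F.T) / ((W * (C @ F)) @ F.T + eps)
        F *= (C.T @ Yw) / (C.T @ (W * (C @ F)) + eps)
    # Coefficients of unobserved rows are driven to zero and carry no information.
    return C, F

# Usage on synthetic rank-2 data with every fifth row missing.
rng = np.random.default_rng(1)
Y = rng.random((40, 2)) @ rng.random((2, 12))
observed = np.ones(40, dtype=bool)
observed[::5] = False
Y[~observed] = np.nan
C, F = masked_nmf(Y, observed, r=2)
```

Each row of `F` plays the role of one shared basis function, and each row of `C` holds the \(r\) coefficients for the corresponding row of the data.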


The left column shows the daily-sleep patterns learned by the proposed method. The right column shows the order statistics of the corresponding coefficients in the low-rank model. Large values of \(F_{j}\) are associated with sleep, whereas values close to zero indicate wakefulness.


[1] Time-Series Analysis via Low-Rank Matrix Factorization Applied to Infant-Sleep Data, Sheng Liu, Mark Cheng, Hayley Brooks, Wayne Mackey, David J. Heeger, Esteban G. Tabak, Carlos Fernandez-Granda.


The preprocessed infant-sleep dataset (18.3 MB) is available for download. It contains data from 700 infants. Each infant's sleep time series is partitioned into a matrix \(Y^{[n]} \in \mathbb{R}^{730\times 144}\). Please see [1] for more details.
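To make the matrix layout concrete, here is a minimal sketch of how a single long time series could be partitioned into a \(730\times 144\) matrix. It assumes that rows index days and that the 144 columns are 10-minute bins within a day; the page does not state this explicitly, so check [1] and the data file before relying on it. The helper name `to_day_matrix` and the use of NaN to mark missing samples are also hypothetical.

```python
import numpy as np

N_DAYS = 730         # assumption: one row per day
BINS_PER_DAY = 144   # assumption: 10-minute bins (24 hours * 6 bins per hour)

def to_day_matrix(series):
    """Reshape a flat series into a (days x bins) matrix and flag observed rows.

    A row counts as observed only if it contains no NaN (an assumption of
    this sketch, standing in for the set of observed rows).
    """
    series = np.asarray(series, dtype=float)
    if series.size != N_DAYS * BINS_PER_DAY:
        raise ValueError("series length must equal N_DAYS * BINS_PER_DAY")
    Y = series.reshape(N_DAYS, BINS_PER_DAY)   # consecutive samples fill each row
    observed = ~np.isnan(Y).any(axis=1)
    return Y, observed

# Usage: a synthetic series with one unrecorded day.
series = np.zeros(N_DAYS * BINS_PER_DAY)
series[3 * BINS_PER_DAY:4 * BINS_PER_DAY] = np.nan   # day with index 3 is missing
Y, observed = to_day_matrix(series)
```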


We provide MATLAB scripts for performing nonnegative matrix factorization with time smoothing.

After unzipping the source code, put the downloaded data file in the dat folder. Then run run_low_rank_decomp_script.m to reproduce the basis functions and coefficients reported in [1].


S.L. and M.C. were supported by a seed grant from the Moore-Sloan Data Science Environment at NYU. W.M. was supported by the Simons Foundation as a Junior Fellow in the Simons Society of Fellows. E.G.T. was partially supported by NSF grant DMS-1715753 and ONR grant N00014-15-1-2355. C.F. was supported by NSF award DMS-1616340. The authors thank Xavier Launay at Baby Connect for enabling parents to share data with them, and all the parents and caregivers for providing the data.