# Time-Series Analysis via Low-Rank Matrix Factorization Applied to Infant-Sleep Data

In this project, we propose a nonparametric model for time series with missing data based on regularized nonnegative low-rank matrix factorization. The model expresses each instance in a set of time series as a linear combination of a small number of shared basis functions. We then apply our methodology to a large real-world dataset of infant-sleep data gathered by caregivers with a mobile-phone app and automatically extract daily-sleep patterns consistent with the existing literature.

Above is the diagram of the proposed low-rank model applied to a set of time series. Each time series, represented by a matrix $$Y^{[n]}$$, $$1\leq n \leq N$$, is approximated by a sum of $$r$$ components corresponding to basis functions $$F_{1}, \dots, F_{r}$$ shared across the population. The coefficients $$C^{[n]}_{1}, \dots, C^{[n]}_{r}$$ provide a reduced-dimensional representation of each instance $$Y^{[n]}$$, and $$\mathcal{T}_n$$ denotes the set of observed rows in $$Y^{[n]}$$.
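Written out, the approximation in the diagram can be expressed as follows. This is a sketch under stated assumptions: the rows of $$Y^{[n]}$$ index days, each $$F_j$$ is a column vector over the times of day, each $$C^{[n]}_j$$ is a column vector over days, and the subscript $$\mathcal{T}_n$$ keeps only the observed rows:

$$
Y^{[n]}_{\mathcal{T}_n} \approx \sum_{j=1}^{r} \left(C^{[n]}_{j}\right)_{\mathcal{T}_n} F_{j}^{T}, \qquad 1 \leq n \leq N.
$$

One plausible regularized fitting problem uses a squared first-difference penalty for the time smoothing. The exact regularizers and weights used in [1] may differ:

$$
\min_{C^{[1]},\dots,C^{[N]} \geq 0,\; F \geq 0} \; \sum_{n=1}^{N} \left\lVert Y^{[n]}_{\mathcal{T}_n} - \left(C^{[n]} F^{T}\right)_{\mathcal{T}_n} \right\rVert_F^2 + \lambda \sum_{n=1}^{N} \left\lVert D\, C^{[n]} \right\rVert_F^2
$$

Here $$C^{[n]} = [C^{[n]}_1, \dots, C^{[n]}_r]$$ and $$F = [F_1, \dots, F_r]$$. $$D$$ takes differences between consecutive days, and $$\lambda > 0$$ is a hypothetical smoothing weight.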

The left column shows the daily-sleep patterns learned by the proposed method, and the right column shows the order statistics of the corresponding coefficients in the low-rank model. Large values of $$F_{j}$$ are associated with sleep, whereas values close to zero indicate wakefulness.
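The following minimal Python sketch shows one way such order statistics could be summarized. It computes percentiles across infants of a single coefficient sequence, separately for each day. Several details are assumptions: each $$C^{[n]}_j$$ is indexed by day, the order statistics are summarized as population percentiles, and excluded days are marked with NaN. `coefficient_percentiles` is a hypothetical helper, not part of the released code.

```python
import numpy as np

def coefficient_percentiles(coeffs, percentiles=(10, 25, 50, 75, 90)):
    """Percentiles across instances of one coefficient sequence, computed day by day.

    coeffs: array (N, m) whose row n holds C_j^{[n]} for a fixed component j;
            NaN marks days to leave out of the summary (assumption).
    Returns an array of shape (len(percentiles), m).
    """
    coeffs = np.asarray(coeffs, dtype=float)
    # nanpercentile skips NaN entries; a day with no values at all yields NaN.
    return np.nanpercentile(coeffs, percentiles, axis=0)

# Tiny example: three instances, two days, one missing value.
bands = coefficient_percentiles([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]], percentiles=(50,))
print(bands)  # [[3. 4.]]
```

Plotting each row of the result against the day index would give percentile bands of the kind a population-level summary might show.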

## Paper

[1] Time-Series Analysis via Low-Rank Matrix Factorization Applied to Infant-Sleep Data, Sheng Liu, Mark Cheng, Hayley Brooks, Wayne Mackey, David J. Heeger, Esteban G. Tabak, Carlos Fernandez-Granda.

## Dataset

The preprocessed infant-sleep dataset (18.3 MB) is available for download. It contains data from 700 infants, and each infant's sleep time series is partitioned into a matrix $$Y^{[n]} \in \mathbb{R}^{730\times 144}$$. See [1] for more details.
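The partitioning into a matrix can be illustrated with a short Python sketch. It rests on several assumptions not stated above:

- The raw series is a 1-D array with one entry per 10-minute slot, starting at midnight. Under this reading, 730 rows correspond to two years of days and 144 columns to 24 hours × 6 slots.
- A value of 1 means asleep, 0 means awake, and NaN marks an unlogged slot.
- A day counts as observed, i.e. belongs to $$\mathcal{T}_n$$, only when all of its slots are logged.

The function name `partition_series` is hypothetical and does not come from the released scripts.

```python
import numpy as np

SLOTS_PER_DAY = 144  # assumed: 24 hours sampled every 10 minutes
NUM_DAYS = 730       # assumed: two years of data per infant

def partition_series(series, num_days=NUM_DAYS, slots_per_day=SLOTS_PER_DAY):
    """Reshape a 1-D sleep series into a (num_days, slots_per_day) matrix.

    series: 1-D array, one entry per 10-minute slot starting at midnight of day 0,
            with NaN marking slots that were not logged (assumption).
    Returns (Y, observed_rows): Y is padded with NaN up to num_days rows, and
    observed_rows lists the fully logged days, playing the role of T_n.
    """
    series = np.asarray(series, dtype=float)
    total = num_days * slots_per_day
    padded = np.full(total, np.nan)           # unlogged slots default to NaN
    n = min(series.size, total)
    padded[:n] = series[:n]
    Y = padded.reshape(num_days, slots_per_day)
    observed_rows = np.flatnonzero(~np.isnan(Y).any(axis=1))
    return Y, observed_rows

# Example: asleep 00:00-07:00 and 17:00-24:00, logged on days 0 and 2 only.
day = np.r_[np.ones(42), np.zeros(60), np.ones(42)]
series = np.tile(day, 3)
series[144:288] = np.nan                      # day 1 was not logged
Y, T = partition_series(series)
print(Y.shape, T)  # (730, 144) [0 2]
```

Keeping unobserved days as NaN rows, rather than dropping them, preserves the alignment between row index and infant age, which a time-smoothing penalty relies on.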

## Code

We provide MATLAB scripts that perform nonnegative matrix factorization with time smoothing.

After unzipping the source code, place the downloaded data file in the `dat` folder, then run `run_low_rank_decomp_script.m` to reproduce the basis functions and coefficients reported in [1].
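For readers without MATLAB, the following Python/NumPy sketch illustrates the kind of computation involved. It is not a port of `run_low_rank_decomp_script.m`. Its objective is an assumption: masked squared error plus a squared first-difference penalty on the coefficients. It also swaps in multiplicative updates in the style of graph-regularized NMF, which may differ from the solver used in [1]. `smooth_masked_nmf`, `lam`, and `n_iter` are hypothetical names.

```python
import numpy as np

def smooth_masked_nmf(Y, mask, r, lam=1.0, n_iter=200, seed=0, eps=1e-12):
    """Fit Y[n] ~ C[n] @ F.T with C, F >= 0 and a temporal-smoothing penalty on C.

    Y:    array (N, m, d) of nonnegative values; unobserved rows may hold NaN.
    mask: boolean array (N, m), True where day i of instance n is observed (T_n).
    r:    number of shared basis functions (columns of F).
    Objective (assumed form):
        sum_n ||mask_n * (Y[n] - C[n] F^T)||_F^2 + lam * sum_n ||D C[n]||_F^2,
    where D takes first differences between consecutive days.
    """
    N, m, d = Y.shape
    rng = np.random.default_rng(seed)
    W = mask[:, :, None].astype(float)        # (N, m, 1): 1 on observed rows, 0 elsewhere
    Yz = np.where(mask[:, :, None], Y, 0.0)   # unobserved rows (including NaN) set to 0
    C = rng.random((N, m, r)) + eps           # strictly positive initialization
    F = rng.random((d, r)) + eps

    # Chain graph over days: D^T D = diag(deg) - A, where A links day i to days i-1, i+1.
    deg = np.zeros(m)
    deg[:-1] += 1.0
    deg[1:] += 1.0
    deg = deg[None, :, None]                  # broadcast over instances and components

    def neighbor_sum(X):
        # Computes A @ X[n] for every n: sum of the previous-day and next-day rows.
        S = np.zeros_like(X)
        S[:, 1:] += X[:, :-1]
        S[:, :-1] += X[:, 1:]
        return S

    def objective():
        resid = W * (Yz - C @ F.T)
        return np.sum(resid ** 2) + lam * np.sum(np.diff(C, axis=1) ** 2)

    history = [objective()]
    for _ in range(n_iter):
        # Update C: negative gradient terms in the numerator, positive ones in the
        # denominator, so every entry stays nonnegative.
        R = W * (C @ F.T)
        C *= (Yz @ F + lam * neighbor_sum(C)) / (R @ F + lam * deg * C + eps)
        # Update the shared basis F, pooling all instances and observed days.
        R = W * (C @ F.T)
        F *= np.einsum("nmd,nmr->dr", Yz, C) / (np.einsum("nmd,nmr->dr", R, C) + eps)
        history.append(objective())
    return C, F, history

# Usage on synthetic data: 5 instances, 60 days, 24 slots per day, 2 components.
rng = np.random.default_rng(1)
days = np.arange(60)[:, None]
C_true = np.stack([np.hstack([1 + np.sin(days / 10 + k), 1 + np.cos(days / 15 + k)])
                   for k in range(5)])        # (5, 60, 2), smooth and nonnegative
F_true = rng.random((24, 2))                  # (24, 2) nonnegative basis
Y = C_true @ F_true.T                         # (5, 60, 24)
mask = rng.random((5, 60)) > 0.2              # roughly 20% of days unobserved
Y[~mask] = np.nan
C, F, history = smooth_masked_nmf(Y, mask, r=2, lam=0.1)
print(f"objective: {history[0]:.2f} -> {history[-1]:.4f}")
```

The penalty acts only on the coefficients, so a day outside $$\mathcal{T}_n$$ is filled by the smoothing term. Its update sets each coefficient to roughly the average of its neighboring days. The penalty can also be lowered by shrinking a column of $$C$$ and enlarging the matching column of $$F$$ without changing the fit. A practical version would therefore normalize or constrain the columns of $$F$$; the sketch omits this.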

## Acknowledgements

S.L. and M.C. were supported by a seed grant from the Moore-Sloan Data Science Environment at NYU. W.M. was supported by the Simons Foundation as a Junior Fellow in the Simons Society of Fellows. E.G.T. was partially supported by NSF grant DMS-1715753 and ONR grant N00014-15-1-2355. C.F. was supported by NSF award DMS-1616340. The authors thank Xavier Launay at Baby Connect for enabling parents to share data with them, and all the parents and caregivers for providing the data.