The Division of Biostatistics at the Department of Preventive Medicine invites you to attend the following seminar.
Time: Monday, January 22, 2:00 PM-3:00 PM CST
ZOOM Virtual Room Connection: Register in advance for this meeting
Seminar Website: https://www.eventcreate.com/e/biostatisticsseminar
Speaker Bio: https://web.stanford.edu/~udell/bio.html
Big Data is Low Rank
Madeleine Udell, Ph.D.
Management Science and Engineering, Stanford University
Affiliated with the Institute for Computational and Mathematical Engineering,
with a courtesy appointment in Electrical Engineering
Data scientists often face the challenge of understanding a high-dimensional data set organized as a table. These tables may have columns of different (sometimes non-numeric) types and often have many missing entries. This talk surveys methods based on low-rank models for analyzing these big, messy data sets. We show that low-rank models perform well (indeed, suspiciously well) across a wide range of data science applications, including social science, medicine, and machine learning. This good performance demands (and this talk provides) a simple mathematical explanation for their effectiveness, one that identifies when low-rank models perform well and when to look beyond low rank.