Seminar: Robust Learning from Big and Messy Data

Daniel L. Pimentel-Alarcón, Ph.D.

Georgia State University, Department of Computer Science

March 15, 2019

3 pm

Patrick F. Taylor Hall, Room 3107


Big data is only getting bigger. For example, the upcoming Square Kilometre Array alone will generate twice as much data each day as is sent around the Internet in a day, and 100 times more than the CERN Large Hadron Collider, which already generates so much data that scientists must discard the overwhelming majority of it, hoping they didn’t throw away anything useful. Big data is also getting messier: incomplete, sparse, noisy, biased, and contaminated by outliers. Exploiting these big and messy data increasingly depends on our ability to identify patterns that summarize them.

In this talk I will present our recent theoretical findings on learning linear and non-linear patterns from big and messy data. I will also discuss the main ideas behind our practical algorithms, which are guaranteed to succeed even in cases where traditional methods are guaranteed to fail. Finally, I will show applications of our findings in areas as diverse as astronomy, computer vision, metagenomics, and more.


Originally from Mexico City, Daniel earned his Ph.D. in Electrical and Computer Engineering at the University of Wisconsin-Madison under the supervision of Robert Nowak and Nigel Boston. He then spent one year as a postdoctoral researcher at the Wisconsin Institute for Discovery, supervised by Stephen Wright and Rebecca Willett. In 2017 he joined Georgia State University as an Assistant Professor in Computer Science. His research focuses on learning from messy data (incomplete, severely corrupted, etc.) using Machine Learning, Statistics, Optimization, Signal Processing, Algebraic Geometry, and related fields, with applications in Data Science.