About Sam Anzaroot

I am a Principal Data Scientist at Dataminr, working on natural language processing. Before Dataminr, I was an intern at Oracle Labs East, where I contributed to large-scale machine learning inference using GPU programming. (github)(resume)

I recently completed a master's degree at the University of Massachusetts Amherst, where I worked in the IESL lab with Andrew McCallum.

I completed my undergraduate education in the Computer Science department at Queens College – City University of New York. There I worked in a number of labs, mostly in the Blender Lab (now at RPI) with Heng Ji.

My interests lie in machine learning methods, most recently with applications in natural language processing. I am most fascinated by methods that prevent automatically learned models from making mistakes that humans would not make, and by methods that help those models recover from such mistakes.

My most recent work at UMass involves automatic field extraction from citation strings in research papers. Machine learning methods for this task often use only local information about the labeling of the citation string. In contrast, humans can use the fact that a field such as the volume number is unlikely to appear twice in a citation when performing inference on such tasks. I am investigating constrained inference of this kind in order to bring the performance of models on such tasks up to human levels.
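A toy sketch of this idea follows. It is not the actual UMass system: the label set, the per-token scores, and the function names are all made up for illustration, and exhaustive search stands in for whatever inference procedure a real model would use. It compares purely local decoding with decoding under the constraint that the volume field appears in at most one contiguous run.

```python
from itertools import product

# Hypothetical label set for citation fields (an assumption for this sketch).
LABELS = ["author", "title", "volume", "pages"]

# Made-up local scores for the tokens: "Smith", "Parsing", "12", "pp.", "45".
SCORES = [
    {"author": 0.90, "title": 0.05, "volume": 0.03, "pages": 0.02},
    {"author": 0.10, "title": 0.80, "volume": 0.05, "pages": 0.05},
    {"author": 0.02, "title": 0.03, "volume": 0.80, "pages": 0.15},
    {"author": 0.01, "title": 0.04, "volume": 0.05, "pages": 0.90},
    {"author": 0.02, "title": 0.03, "volume": 0.50, "pages": 0.45},
]


def count_segments(labels, field):
    """Count maximal contiguous runs of `field` in a label sequence."""
    runs, prev = 0, None
    for lab in labels:
        if lab == field and prev != field:
            runs += 1
        prev = lab
    return runs


def local_decode(scores):
    """Pick the best label for each token independently (local information only)."""
    return [max(tok, key=tok.get) for tok in scores]


def constrained_decode(scores, unique_fields=("volume",)):
    """Brute-force the highest-scoring labeling in which every field in
    `unique_fields` appears in at most one contiguous run."""
    best, best_score = None, float("-inf")
    for labeling in product(LABELS, repeat=len(scores)):
        if any(count_segments(labeling, f) > 1 for f in unique_fields):
            continue  # violates the "field appears at most once" constraint
        total = sum(tok[lab] for tok, lab in zip(scores, labeling))
        if total > best_score:
            best, best_score = list(labeling), total
    return best


if __name__ == "__main__":
    print("local:      ", local_decode(SCORES))
    print("constrained:", constrained_decode(SCORES))
```

With these made-up scores, local decoding labels both "12" and "45" as volume, while the constrained search relabels "45" as pages, since that is the cheapest way to satisfy the constraint. Exhaustive search is only feasible for a handful of tokens; it is used here purely to keep the example short.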

As an undergrad, I worked on improving relation extraction by performing joint inference over a large number of automatically collected relations. We built a system that can detect unlikely collections of relations in a network. For example, the system can learn that a company is unlikely to have been founded by somebody who does not live in the country where the company is located. The system then finds the best configuration of relations that satisfies these learned constraints (joint work with Qi Li).

I’ve also done undergrad research at the SimStudent Lab at CMU HCII (NSF REU), on the MetroBotics project at the Agents Lab at Brooklyn College (NSF REU), and at the Brain Networks Lab at Texas A&M (NSF REU).

I also enjoy creating useful and functional user-facing systems that interact with complex machinery. In the past I’ve created a web-based system for browsing networks of entities and relations automatically extracted by machine learning systems. I’ve also produced a web-based browser for 3D brain scans that researchers can annotate collaboratively.