I am an Applied AI researcher at Verneek, working on natural language processing, with the mission to enable “anyone to make data-informed decisions”. (github) (resume)
Before that, I was a Principal Data Scientist at Dataminr, working on natural language processing. At Dataminr I built models to extract public safety events from social media posts. Before Dataminr, I was an intern at Oracle Labs East, where I contributed to large-scale machine learning inference using GPU programming.
I completed a master's degree at the University of Massachusetts Amherst, in the IESL lab, working with Andrew McCallum.
I completed my undergraduate education in the Computer Science department at Queens College – City University of New York. There I worked in several labs, mostly the Blender Lab (now at UIUC) with Heng Ji.
My interests lie in machine learning methods with applications in natural language processing.
My work at UMass involved automatic field extraction from citation strings in research papers. Machine learning methods for this task often include only local information about the labeling of the citation string. In contrast, humans can use the fact that a field such as the volume number is unlikely to appear twice in a citation when performing inference on such tasks. I investigated constrained inference of this kind in order to bring model performance on such tasks up to human levels.
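As a toy illustration of the idea (not the model from this work), a global constraint such as "volume appears at most once" can be applied as a soft penalty when choosing among candidate labelings. The field set, the penalty weight, and the candidate scores below are all made up for the example, and simple rescoring stands in for a real constrained inference procedure.

```python
from itertools import groupby

# Hypothetical set of fields expected to appear at most once per citation.
SINGLETON_FIELDS = {"volume", "year", "title"}

def count_violations(labels):
    """Count extra contiguous segments of fields that should appear only once."""
    # Collapse runs of identical labels into one segment per run.
    segments = [field for field, _ in groupby(labels)]
    return sum(max(segments.count(f) - 1, 0) for f in SINGLETON_FIELDS)

def rescore(candidates, penalty=2.0):
    """Return the candidate maximizing local score minus a soft constraint penalty."""
    return max(candidates, key=lambda c: c[1] - penalty * count_violations(c[0]))

# Two made-up candidate labelings of the same citation, with made-up local scores.
candidates = [
    (["author", "title", "title", "volume", "year", "volume"], 5.0),
    (["author", "title", "title", "volume", "year", "pages"], 4.5),
]
best_labels, best_local_score = rescore(candidates)
```

Here the first candidate has the higher local score, but it labels two separate segments as volume, so the penalized score prefers the second candidate.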
As an undergrad, I worked on improving relation extraction by performing joint inference over a large number of automatically collected relations. We built a system that can detect unlikely collections of relations in a network. For example, the system can detect constraints such as that a company is unlikely to have been founded by somebody who does not live in the country where the company is located. The system then finds the best configuration of relations that satisfies these learned constraints (joint work with Qi Li).
I’ve also done undergrad research at the SimStudent Lab at CMU HCII (NSF REU), on the MetroBotics project at the Agents Lab at Brooklyn College (NSF REU), and at the Brain Networks Lab at Texas A&M (NSF REU).
I also enjoy creating useful and functional user-facing systems that interact with complex machinery.
Publications
- Crisis Sub-Events on Social Media: A Case Study of Wildfires. Shan Jiang, William Groves, Sam Anzaroot, and Alejandro (Alex) Jaimes. In Proceedings of the AI for Social Good Workshop at the 36th International Conference on Machine Learning (AISG@ICML 2019), 2019.
- Using machine learning to advance synthesis and use of conservation and environmental evidence. Cheng SJ, Augustin C, Bethel A, Gill D, Anzaroot S, Brun J, DeWilde B, Minnich RC, Garside R, Masuda YJ, Miller DC, Wilkie D, Wongbusarakum S, McKinnon MC. Conservation Biology, 2018.
- Learning Soft Linear Constraints with Application to Citation Field Extraction. Sam Anzaroot, Alexandre Passos, David Belanger, Andrew McCallum. Proc. 52nd Annual Meeting of the Association for Computational Linguistics (ACL2014), 2014.
- A New Dataset for Fine-Grained Citation Field Extraction. Sam Anzaroot, Andrew McCallum. ICML Workshop on Peer Reviewing and Publishing Models (PEER), 2013.
- Joint Inference for Cross-document Information Extraction. Qi Li, Sam Anzaroot, Wen-Pin Lin, Xiang Li and Heng Ji. Proc. 20th ACM Conference on Information and Knowledge Management (CIKM2011). 2011
- Cross-lingual Slot Filling from Comparable Corpora. Matthew Snover, Xiang Li, Wen-Pin Lin, Zheng Chen, Suzanne Tamang, Mingmin Ge, Adam Lee, Qi Li, Hao Li, Sam Anzaroot, Heng Ji. Proc. ACL2011 Workshop on Building and Using Comparable Corpora. 2011
- Developing a Framework for Team-based Robotics Research. Elizabeth Sklar, Simon Parsons, Susan Epstein, Arif T. Ozgelen, George Rabanca, Sam Anzaroot, Joel Gonzalez, Jesse Lopez, Mitch Lustig, Linda Ma, Mark Manashirov, J. Pablo Munoz, S. Bruno Salazar, Miriam Schwartz. AAAI 2010 Robotics Exhibition and Workshop. 2011
- Search, Mining and Browsing Self-Boosting Multi-Dimensional Text-Rich Information Networks. Sam Anzaroot, Javier Artiles, Hao Li, Qi Li, Zheng Chen, Suzanne Tamang, Heng Ji, Hongbo Deng, Jiawei Han. The Network Science Collaborative Technology Alliance Annual Meeting. 2011 (presentation)