Previous Work
A look at publicly available code I've written, talks I've given, and projects I've led
Every company deals with fraud, whether it’s signup fraud, transaction fraud, employee fraud, or a million other variants. However, there’s very little published literature or prior work outside of closed-door working groups.
During this standing-room-only talk, I walk through common approaches to building machine learning infrastructure for capturing and blocking fraud.
Spoilers are a complicated concept (see this guideline), but avoiding movie and TV show spoilers is a common goal.
With this in mind, I built a model that can determine whether message board posts contain spoilers, using data I pulled from Reddit. This model robustly handles edge cases and new concepts (such as speculation and previously unseen characters), while generalizing well.
There are many areas of applied Machine Learning which require models optimized for rare occurrences (i.e. class imbalance), as well as users actively attempting to subvert the system (i.e. adversaries).
The approaches discussed include ensemble models, deep learning, genetic algorithms, outlier detection via dimensionality reduction (PCA and neural network auto-encoders), time-decay weighting, and the Synthetic Minority Over-sampling Technique (SMOTE).
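To make one of these techniques concrete, here is a minimal sketch of outlier detection via dimensionality reduction: fit PCA (via SVD) on mostly clean data, then score points by their reconstruction error, i.e. their distance from the learned subspace. The data and threshold here are illustrative assumptions, not from any production system.

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 "normal" points lying on a random 2-D plane embedded in 5-D
X_train = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
outlier = np.full(5, 10.0)  # a point far off that plane
X_all = np.vstack([X_train, outlier])

# Fit PCA via SVD on the clean training data; keep the top-2 components
mu = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
V = Vt[:2].T  # principal directions, shape (5, 2)

# Outlier score = reconstruction error (distance from the learned subspace)
Xc = X_all - mu
errors = np.linalg.norm(Xc - (Xc @ V) @ V.T, axis=1)

# The appended outlier gets by far the largest reconstruction error
print(int(np.argmax(errors)))  # → 200
```

Auto-encoder-based detection follows the same recipe, with the linear projection replaced by a learned nonlinear encoder/decoder.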
There aren't many great batteries-included examples for modeling text with deep learning, so I've built out this repo to contain starter code for:
Text processing: Processing text to be utilized with keras (text pre-processing, converting to indices, padding)
Pre-trained embedding: Using a pre-trained text embedding (GoogleNews 300) with keras (translating words to a point in \mathbb{R}^{300})
Convolutional architecture: Modeling text with a convolutional architecture (functionally similar to Ngrams)
RNN architecture: Modeling text with a Recurrent Neural Net (RNN) architecture (functionally similar to a rolling window)
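As a flavor of the first step, here is a minimal pure-Python stand-in for what keras's `Tokenizer` and `pad_sequences` utilities do: build a word index, convert texts to integer sequences, and pre-pad them to a fixed length. The toy texts and `maxlen` are illustrative assumptions, not the repo's actual code.

```python
# Toy corpus to fit the vocabulary on
texts = ["the movie was great", "the ending was a twist"]

# Build a word index; index 0 is reserved for padding, as in keras
vocab = {}
for text in texts:
    for word in text.split():
        vocab.setdefault(word, len(vocab) + 1)

def to_padded_indices(text, maxlen=6):
    """Map words to integer indices (0 for unknown words), pre-pad/truncate to maxlen."""
    seq = [vocab.get(w, 0) for w in text.split()]
    return ([0] * (maxlen - len(seq)) + seq)[-maxlen:]

print(to_padded_indices("the movie was great"))  # → [0, 0, 1, 2, 3, 4]
```

The resulting integer sequences are what get fed into an embedding layer, which then maps each index to a dense vector before the convolutional or recurrent layers.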
Capital One receives thousands of legal requests every year, often as physical mail. During this talk, we'll dive into how the Center for Machine Learning at Capital One has built a self-contained platform for summarizing, filtering, and triaging these legal documents, utilizing open source projects.
A utility that makes handling many resumes easier by automatically pulling contact information, required skills, and custom text fields. These results are then surfaced in a convenient summary CSV.
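The contact-extraction step can be sketched with a couple of regular expressions. This is a hypothetical simplification, not the project's actual code: `extract_contact` and both patterns are illustrative, and real resumes need more robust handling.

```python
import re

# Hypothetical patterns for pulling contact info out of raw resume text
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

def extract_contact(text):
    """Return the first email address and phone number found, or None."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {"email": email.group() if email else None,
            "phone": phone.group() if phone else None}

resume = "Jane Doe\njane.doe@example.com\n(555) 123-4567\nSkills: Python, SQL"
print(extract_contact(resume))
```

Per-resume dicts like this one can then be collected into rows and written out with `csv.DictWriter` to produce the summary CSV.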
This started as a side project in grad school, but has become a community project used at companies across the globe.