Profile Picture

Matthew Epland

Data Scientist

About

I am a freelance Data Scientist working with top-25 pharmaceutical companies to improve patient journeys. In my spare time I contribute to the imodels and dtreeviz Python libraries. I enjoy investigating anomalous data and building new models!

Projects

Chance of Showers

Chance of Showers Screenshot

Repository


Description: This project provides live water pressure measurements via a web dashboard running on a Raspberry Pi, logs the data, and creates time series forecasts of future water pressure, all to lower the odds of being stranded halfway through a shower with unreliable water pressure! Work in progress.

Methods: Time Series Analysis, Predictive Modeling, Hyperparameter Optimization, Bayesian Optimization, Data Acquisition (DAQ), Cron Job “Heartbeat” Monitoring, Web Dashboard, Python Linting

Software: python 3.11, darts, torch, prophet, plotly, polars, flask
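
As a rough illustration of the forecasting step, here is a seasonal-naive baseline in plain Python. It is only a sketch of the kind of reference forecast a time series model is usually compared against, not the project's actual model; the function name, the `season_length` parameter, and the toy readings are assumptions made for this example.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast by repeating the most recent full season of observations.

    history: past pressure readings, oldest first (hypothetical values).
    season_length: number of samples in one repeating cycle (assumed known).
    horizon: number of future samples to predict.
    """
    if season_length <= 0 or len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Cycle through the last observed season for as many steps as requested
    return [last_season[i % season_length] for i in range(horizon)]


if __name__ == "__main__":
    readings = [1, 2, 3, 4, 5, 6]  # toy data, not real measurements
    print(seasonal_naive_forecast(readings, season_length=3, horizon=4))
```

A learned model is only worth deploying if it beats a baseline like this on held-out data.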


ACIC Causality Challenge 2022

Inverse Probability Weighting Difference-in-Differences (IPWDID)

Repository

Paper
Slides
Challenge Site

Description: In this American Causal Inference Conference (ACIC) 2022 challenge submission, the canonical difference-in-differences (DID) estimator was combined with inverse probability weighting (IPW) and strong simplifying assumptions to produce a benchmark model of the sample average treatment effect on the treated (SATT). Despite the restrictive assumptions and simple model, the submission performed satisfactorily in both point estimates and confidence intervals, ranking in the top half of the competition.

Methods: Causal Inference, Inverse Probability Weighting (IPW), Difference-in-Differences (DID), Generalized Linear Models (GLM), Bootstrapping, Monte Carlo, One Hot Encoding

Software: R, speedglm, boot, python 3.9, pyspark, Snowflake
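
The idea of weighting a difference-in-differences estimate can be sketched in a few lines of Python. This is a minimal illustration, not the submission's code: it assumes one pre-period and one post-period outcome per unit, takes the propensity scores as given rather than fitting a GLM, and reweights controls by the odds p / (1 - p), a common choice when targeting the treated population. The field names and toy numbers are hypothetical.

```python
def ipw_did_satt(units):
    """Estimate the SATT as the mean outcome change among treated units minus
    the odds-weighted mean outcome change among control units.

    units: list of dicts with keys 'treated' (0 or 1), 'propensity'
    (estimated P(treated | covariates), assumed already fitted),
    'y_pre', and 'y_post'.
    """
    treated_changes = [u["y_post"] - u["y_pre"] for u in units if u["treated"] == 1]
    # Controls are weighted by p / (1 - p) so they resemble the treated group
    control = [
        (u["propensity"] / (1.0 - u["propensity"]), u["y_post"] - u["y_pre"])
        for u in units
        if u["treated"] == 0
    ]
    treated_mean = sum(treated_changes) / len(treated_changes)
    weight_sum = sum(w for w, _ in control)
    control_mean = sum(w * change for w, change in control) / weight_sum
    return treated_mean - control_mean


if __name__ == "__main__":
    toy_units = [  # hypothetical data for illustration only
        {"treated": 1, "propensity": 0.75, "y_pre": 10.0, "y_post": 13.0},
        {"treated": 1, "propensity": 0.50, "y_pre": 8.0, "y_post": 13.0},
        {"treated": 0, "propensity": 0.50, "y_pre": 9.0, "y_post": 10.0},
        {"treated": 0, "propensity": 0.75, "y_pre": 7.0, "y_post": 9.0},
    ]
    print(ipw_did_satt(toy_units))
```

In practice a confidence interval would come from resampling units and repeating the whole calculation, including the propensity fit, on each bootstrap replicate.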


Mount Sinai Health Hackathon EKG Imaging Project

EKG for MI

Repository

Paper
Paper Data

Description: Electrocardiograms (EKG) play a pivotal role in modern medicine, but are commonly interpreted with proprietary and machine-specific algorithms. We developed an image recognition model which can be used to read standard EKG strip printouts across machines. An austere variation of the MobileNetV3 convolutional neural network (CNN) model was trained on publicly available labeled waveform data to classify 12-lead EKGs into seven clinically important diagnostic classes.

Methods: Convolutional Neural Network (CNN), Image Classification, Multi-class Classification

Software: python 2.7, torch, matplotlib

A Search for Supersymmetry in Multi-b Jet Events with the ATLAS Detector

Expected Exclusion Plot

Defense Slides

Dissertation (PDF)
Dissertation (Online)

Description: Ph.D. dissertation searching for supersymmetry (SUSY) in pair-produced gluinos at the Large Hadron Collider’s (LHC) ATLAS experiment. The search employed a parameterized boosted decision tree (BDT) to separate SUSY signal events from Standard Model backgrounds. New methods for optimal BDT parameter point selection and signal region creation were used to increase the search’s sensitive area by ∼30%.

Methods: Boosted Decision Tree (BDT), Hypothesis Testing, Statistical Significance, Hyperparameter Optimization, Bayesian Optimization, Feature Importance, Feature Selection, Network Analysis, Monte Carlo

Software: python 2.7, xgboost, scikit-optimize, networkx, ROOT

Exploring Interdisciplinary Research at Duke via Ph.D. Committees

1st Place - 2018 Scholars@Duke Visualization Challenge

Interdisciplinary Graph

Repository

Poster
Paper
Interactive Network
Duke Scholars Bridge Disciplines to Tackle Big Questions

Description: By combining Duke Ph.D. committee membership data with the faculty appointments directory, connections between academic organizations were found and used to construct an undirected, weighted network, i.e., a graph. From this network, communities of closely linked organizations were identified via the Louvain method. Additionally, the level of interdisciplinary activity in each organization was measured by comparing the relative weights of its external and self connections. The analysis won 1st place in the competition.

Methods: Network Analysis, Louvain Method

Software: python 2.7, networkx, pandas
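
The network construction and the external-versus-self comparison can be sketched in plain Python. This is an assumed, simplified version: each committee is reduced to the list of its members' organizations, every pair of members adds one unit of weight to the edge between their organizations, and interdisciplinarity is scored as the external share of an organization's total edge weight. The project's actual weighting and metric may differ, and the function names and toy committees are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_org_network(committees):
    """Return edge weights of an undirected, weighted organization graph.

    committees: list of committees, each a list with one organization name
    per committee member. Keys are sorted (org_a, org_b) tuples; a key with
    two identical names is a self connection.
    """
    weights = Counter()
    for orgs in committees:
        for a, b in combinations(orgs, 2):
            weights[tuple(sorted((a, b)))] += 1
    return weights


def external_fraction(weights, org):
    """Share of an organization's edge weight that connects to other organizations."""
    self_weight = weights.get((org, org), 0)
    external_weight = sum(
        w for (a, b), w in weights.items() if a != b and org in (a, b)
    )
    total = self_weight + external_weight
    return external_weight / total if total else 0.0


if __name__ == "__main__":
    toy_committees = [  # hypothetical data for illustration only
        ["Physics", "Physics", "Math"],
        ["Physics", "Physics", "Physics"],
        ["Math", "Biology", "Biology"],
    ]
    network = build_org_network(toy_committees)
    for name in ("Physics", "Math", "Biology"):
        print(name, round(external_fraction(network, name), 3))
```

The same edge weights could then be loaded into a graph library to run community detection such as the Louvain method.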

ATLAS TRT Particle ID Machine Learning R&D Studies

ROC Curve

Repository

Slides

Description: R&D studies of particle identification in the ATLAS transition radiation tracker (TRT) were conducted using machine learning techniques, with the goal of separating electron tracks from muons. Developed with fellow Duke graduate students Doug Davis and Sourav Sen, and continued by Davis and others within the TRT group. Support vector machines (SVM) and boosted decision trees (BDT) from the scikit-learn library were tested, as well as neural networks (NN) constructed with Keras and TensorFlow.

Methods: Support Vector Machines (SVM), Boosted Decision Trees (BDT), Neural Networks (NN), k-fold Cross-Validation

Software: python 2.7, scikit-learn, keras, tensorflow
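
Classifiers for a two-class separation task like this are often compared by the area under the ROC curve (AUC). The sketch below computes it in plain Python from its pairwise-ranking definition; it is an illustration only, not the studies' evaluation code, and the labels (1 for electron, 0 for muon) and scores are invented for the example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example receives
    a higher score than a randomly chosen negative one, with ties counted
    as one half. Runs in O(n_pos * n_neg), fine for small illustrations.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    toy_labels = [1, 1, 0, 0]  # hypothetical: 1 = electron, 0 = muon
    toy_scores = [0.9, 0.4, 0.5, 0.1]  # hypothetical classifier outputs
    print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which makes it a convenient single number for ranking candidate models.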

Numerical Methods and the Dampened, Driven Pendulum

Resonance Sweep

Repository

Paper

Description: A computational study of the dampened, driven pendulum using the Euler-Cromer and Runge-Kutta numerical methods to investigate resonance, nonlinear behavior, and chaos. Numerical simulations are compared to theory where possible.

Methods: Euler-Cromer, Runge-Kutta, Ordinary Differential Equations (ODE), Chaos and Lyapunov Exponents

Software: python 2.7, numpy, matplotlib
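
A minimal sketch of the Euler-Cromer update for this kind of pendulum is shown below. It assumes a dimensionless equation of motion, d²θ/dt² = -sin θ - q dθ/dt + F sin(Ω t), with g/l = 1; the parameter names and default values are illustrative assumptions, not the ones used in the paper.

```python
import math


def euler_cromer_pendulum(theta0, omega0, dt, n_steps,
                          q=0.5, f_drive=1.2, omega_drive=2.0 / 3.0):
    """Integrate the pendulum and return a list of (t, theta, omega) tuples.

    q: damping coefficient, f_drive: drive amplitude,
    omega_drive: drive angular frequency (all assumed, dimensionless).
    """
    t, theta, omega = 0.0, theta0, omega0
    history = [(t, theta, omega)]
    for _ in range(n_steps):
        alpha = -math.sin(theta) - q * omega + f_drive * math.sin(omega_drive * t)
        # Euler-Cromer: update the velocity first, then advance the angle
        # with the NEW velocity (plain Euler would use the old one)
        omega += alpha * dt
        theta += omega * dt
        t += dt
        # Keep the angle in [-pi, pi) so full rotations do not accumulate
        theta = (theta + math.pi) % (2.0 * math.pi) - math.pi
        history.append((t, theta, omega))
    return history


if __name__ == "__main__":
    # Undamped, undriven, small angle: should behave like a harmonic oscillator
    trajectory = euler_cromer_pendulum(0.1, 0.0, dt=0.001, n_steps=6283,
                                       q=0.0, f_drive=0.0)
    print(trajectory[-1])
```

Using the updated velocity in the position step is what keeps the energy of the undamped oscillator bounded, whereas the plain Euler method drifts steadily upward.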

Notes

Repository


TODO

Publications

TODO

Posts

January 2024