Lessons Learned From Fatal Car Crash Data

Motor vehicle travel is a major means of transportation in the United States, yet for all its advantages, fatal motor vehicle crashes in the U.S. impose an estimated societal burden of more than $230 billion each year in medical and other costs [1]. Motor vehicle crashes are also the leading cause of death for every age from 5 to 32 [2]. In this project, we dove deep into the 2016 U.S. fatal car crash records collected by the National Highway Traffic Safety Administration (NHTSA) [3] and encoded using the government’s Fatality Analysis Reporting System (FARS). Drawing on this dataset and related research, we developed an interactive essay that thoroughly explores the top risk factors most highly correlated with fatal motor vehicle crashes.
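
To give a flavor of the kind of tabulation behind the essay, here is a minimal pandas sketch of a single risk-factor summary. This is not our actual pipeline: the file path is hypothetical, and although DRUNK_DR and FATALS are documented fields of the FARS accident-level file, treat the exact schema as an assumption to verify against your copy of the 2016 data.

```python
import pandas as pd

# Hypothetical path; FARS publishes one accident-level CSV per year.
accidents = pd.read_csv("FARS2016/accident.csv")

# DRUNK_DR counts drinking drivers per crash; FATALS counts deaths per crash.
drinking = accidents["DRUNK_DR"] > 0
print(f"Share of fatal crashes involving a drinking driver: {drinking.mean():.1%}")

# Total fatalities with vs. without a drinking driver involved.
print(accidents.groupby(drinking)["FATALS"].sum())
```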

To read the interactive essay in your web browser, visit here.

The report detailing our methodologies and the poster summarizing this project are also available.

[Slideshow: snippets from the report]

The Fall of The Simpsons

PDF version.

This visualization shows how the reputation and popularity of the once most beloved TV show declined over time, and why.

Tools used: the Python matplotlib library for the time series charts, Google Sheets for the heatmap, and Sketch to assemble the visual design.
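
As a small illustration of the matplotlib portion, here is a hedged sketch of a per-season ratings line; the numbers are placeholders, not the data behind the published chart.

```python
import matplotlib.pyplot as plt

# Placeholder data: average episode rating per season (illustrative only).
seasons = list(range(1, 29))
avg_rating = [8.2 - 0.08 * (s - 1) for s in seasons]

plt.plot(seasons, avg_rating, marker="o", color="#FED90F")  # Simpsons yellow
plt.xlabel("Season")
plt.ylabel("Average episode rating")
plt.title("The Fall of The Simpsons")
plt.show()
```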

Software Environment Incident Analytics

Incident Analytics was an intelligent incident management tool we developed for AppDynamics DevOps customers during a hackathon. AppDynamics customers could configure health rules based on a few key metrics of interest and get alerted when those metrics showed unexpected patterns. However, without visibility into historical data, DevOps engineers might spend hours figuring out a resolution to an issue that someone had already solved. In this project, we built a tool based on machine learning algorithms to automatically identify root cause analyses (RCAs) for incidents, a task that previously took hours if not days of manual work. The solution helped customers understand the context around incoming incidents and reach resolution much faster. We applied machine learning to group related incidents together, correlate incidents with RCAs, and analyze whether incidents were triggered by a global issue (one grouping technique is sketched below). This is a big improvement over the current AppDynamics solution, which provides no out-of-the-box analytics.
Incident Analytics – User Interface
Incident Analytics – UX Prototype
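
As a hedged illustration of the incident-grouping idea (not the system we shipped, which used AppDynamics-internal data and models), incident descriptions can be vectorized with TF-IDF and linked when their cosine similarity clears a tunable threshold:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative incident descriptions, not real customer data.
incidents = [
    "JVM heap usage exceeded health rule threshold on node payments-3",
    "Heap exhaustion alert on payments cluster after deploy",
    "Login service latency spike above dynamic baseline",
]

vectors = TfidfVectorizer().fit_transform(incidents)
similarity = cosine_similarity(vectors)

# Report incident pairs whose similarity clears a (tunable) threshold.
THRESHOLD = 0.2
for i in range(len(incidents)):
    for j in range(i + 1, len(incidents)):
        if similarity[i, j] > THRESHOLD:
            print(f"Incident {i} ~ incident {j}: {similarity[i, j]:.2f}")
```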

Interactive big data visualization for app performance monitoring

In the years since college, I’ve been working as a software engineer (focusing on UI) on the core APM team at AppDynamics (now part of Cisco), based in downtown San Francisco, California. Application Performance Management (APM) is a technology that provides end-to-end, business-transaction-centric management of complex and distributed software applications. Auto-discovered transactions, dynamic baselining, code-level diagnostics, and Virtual War Room collaboration ensure rapid issue identification and resolution to maintain an ideal user experience. At AppDynamics, I developed complex yet performant AngularJS-based web application UIs providing rich user interaction with large-scale APM data. I’ve become seasoned in all phases of the software product lifecycle: designing, prototyping, developing, maintaining, automating tests, and shipping useful features to our customers.

Automatic detection of epileptiform events in EEG recordings

An electroencephalogram (EEG) is the most important tool in the diagnosis of seizure disorders. Between seizures, epileptiform neural activity in EEG recordings appears in the form of spikes or spike-and-slow-wave complexes. The search for an automated EEG interpretation algorithm that is well accepted by clinicians has been a research goal for decades. As a participant in an NSF-funded Research Experience for Undergraduates (REU) program hosted at the Clemson University School of Computing, I continued this endeavor by developing an automated system that detects epilepsy-related events, in real time, from scalp EEG recordings.

To find the optimal algorithm for this purpose, I constructed a multi-stage processing pipeline. In the first stage, I cleaned up the clinical data gathered from 100 epileptic patients and partitioned it for cross-validation. Next, I used wavelet transformations in a “sliding window” approach to generate study features from the EEG signal. I then applied machine learning algorithms and analyzed their performance in classifying data patterns into epileptiform activity versus other activity; at this stage I also explored using a hidden Markov model to fit the time sequence in which epileptiform events occurred. In the final step, I further separated target epileptiform events from noise by applying a statistical model locally and stitching the outputs from different signal windows together. – source code
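
The feature stage can be sketched roughly as follows, with PyWavelets and scikit-learn standing in for the actual implementation; the sampling rate, window length, wavelet choice, and random stand-in data are all assumptions, not the clinical setup.

```python
import numpy as np
import pywt
from sklearn.ensemble import RandomForestClassifier

def window_features(signal, fs=256, win_sec=1.0):
    """Slide a fixed-length window over one EEG channel and return
    wavelet-band energies as the feature vector for each window."""
    win = int(fs * win_sec)
    feats = []
    for start in range(0, len(signal) - win + 1, win):
        coeffs = pywt.wavedec(signal[start:start + win], "db4", level=4)
        feats.append([np.sum(c ** 2) for c in coeffs])  # energy per band
    return np.array(feats)

# Toy stand-in for a labeled recording: random noise with random labels.
eeg = np.random.randn(256 * 60)          # one minute at 256 Hz
X = window_features(eeg)
y = np.random.randint(0, 2, len(X))      # 1 = epileptiform window

clf = RandomForestClassifier().fit(X, y)
print(clf.predict(X[:5]))
```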

The detections were then highlighted in real time on the web interface of eegNet, a standardized EEG database developed at Clemson.

Automatic detection of epileptiform events in EEG recordings – poster

The Open Science Investigation

Barriers to scientists practicing open science persist for a range of cultural and technological reasons. This undergraduate thesis, developed under the guidance of the Center for Open Science, seeks to understand the incentive structure for open science from a sociotechnical perspective and attempts a software solution to facilitate its implementation. The research paper, Incentive Structure for Open Science in Web 2.0, elucidates how the current reward system needs to change to encourage wider practice of open science: to create incentives for researchers to open up their research materials to the broader community, organizations need to provide researchers with intrinsic rewards, proper credit allocation, and tangible career benefits. In the technical portion of the project, Designing Data Visualizations for Open Science, I prototyped an interactive research exploration and organization tool for the Open Science Framework. The thesis contributes to this collective effort toward open science by making the creation of incentives an explicit design goal for open science web applications. – thesis cover  |  STS paper

Twitter Sentiment Analysis

Social media platforms such as Facebook and Twitter have been a significant focus of big data research. Using Hadoop and AWS, I performed a sentiment analysis of Twitter data concerning the events around the dismissal and subsequent re-hiring of U.Va. President Sullivan in the summer of 2012. My data source was approximately 52,000 tweets collected by the U.Va. Library. The results were presented in an infographic and an interactive data explorer showing the interplay of Twitter users centered around Larry Sabato, the most prominent influencer identified during that summer.
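
A minimal sketch of the scoring step as a Hadoop Streaming mapper appears below; the word lists are illustrative stand-ins rather than the lexicon actually used, and the mapper simply emits one scored record per tweet for a downstream reducer to aggregate.

```python
#!/usr/bin/env python
import sys

# Illustrative lexicon only; the real analysis used a proper sentiment lexicon.
POSITIVE = {"support", "welcome", "great", "reinstate"}
NEGATIVE = {"resign", "ouster", "protest", "outrage"}

def score(tweet):
    """Net sentiment: positive word count minus negative word count."""
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Hadoop Streaming mapper: one tweet per stdin line, emit "sentiment<TAB>score".
for line in sys.stdin:
    print(f"sentiment\t{score(line)}")
```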
