Why Data Science?
I always look for projects that let me find “efficiency,” whether through process improvements that save labor hours or spending forecasts that save money. I enjoy seeking the “optimal” solution to a problem.
We are exposed to a greater volume of data than ever before, and with it the opportunities to find efficiency are nearly limitless. Data Science is an excellent tool for combing through layers of data to identify the “golden eggs” within. Thanks to user-friendly programming languages like Python, I can draw on my business experience and be self-sufficient in tackling problems with Big Data.
Technical Skills
- Python, SQL, Tableau, Databricks, Hadoop, Excel (including Power Query and Power BI Desktop), ODBC, PowerShell
Functional Experience
- Data Science (3 years)
- Sales Planning (5 years)
- Supply Chain/Inventory Planning (6 years)
- Finance/Accounting (5 years)
Projects
Public Transit vs Rideshare in Austin, Texas

- Objective: Determine whether a relationship exists between rideshare usage and public transportation services. Recommend possible next steps for the public transit agency in Austin to drive public transit adoption.
- Models applied: Multiple Linear Regression
Allstate Claims Severity

- Objective: Develop an automated method of predicting the cost, and hence severity, of claims. Model performance is evaluated on the mean absolute error (MAE) between the predicted loss and the actual loss.
- Models applied: LightGBM, XGBoost, and CatBoost.
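As a minimal sketch of the evaluation metric described above, MAE can be computed in a few lines of Python. The function name and the loss values are hypothetical and chosen only for illustration; they are not taken from the project.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all claims."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical claim losses, for illustration only.
actual_loss = [2000.0, 1500.0, 3000.0]
predicted_loss = [1800.0, 1700.0, 3300.0]

# Absolute errors are 200, 200, and 300, so the MAE is 700 / 3.
mae = mean_absolute_error(actual_loss, predicted_loss)
```

A lower MAE means the predicted losses sit closer, on average, to the actual losses.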
Crowdflower Search Results Relevance

- Objective: Create an open-source model that can be used to measure the relevance of search results. Model performance is evaluated on quadratic weighted kappa, which measures the agreement between two sets of ratings: the scores assigned by the human rater and the predicted scores.
- Models applied: Logistic Regression, Support Vector Classification, Random Forest, Extra Trees, XGBoost.
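To make the metric concrete, here is a minimal pure-Python sketch of quadratic weighted kappa. It assumes integer ratings on a known scale (a 1 to 4 scale is used here as an assumption), and the function name and sample ratings are hypothetical rather than taken from the project.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Agreement between two lists of integer ratings (1.0 = perfect agreement).

    Assumes the scale has at least two levels and that the ratings are not
    all identical, otherwise the denominator below would be zero.
    """
    n = max_rating - min_rating + 1
    total = len(rater_a)

    # Observed matrix: how often rater A gave rating i while rater B gave j.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal histograms of each rater's scores.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic weight: larger disagreements are penalized more heavily.
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Count expected by chance if the two raters were independent.
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator


# Hypothetical ratings, for illustration only.
human = [1, 2, 3, 4]
kappa_perfect = quadratic_weighted_kappa(human, [1, 2, 3, 4], 1, 4)
kappa_reversed = quadratic_weighted_kappa(human, [4, 3, 2, 1], 1, 4)
```

Identical ratings give a kappa of 1, agreement no better than chance gives roughly 0, and systematically reversed ratings give a negative value.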
Airbnb New User Bookings

- Objective: Predict in which country a new user will make his or her first booking. Model performance is evaluated on NDCG (normalized discounted cumulative gain) at k = 5. In other words, the model makes a maximum of 5 ranked predictions of the first-booking country for each user id.
- Models applied: Random Forest, Extra Trees, LightGBM, and Keras Deep Learning.
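The following is a minimal sketch of NDCG at k = 5 for a single user. It assumes each user has exactly one correct country and that relevance is binary, so the ideal DCG is 1 and the score depends only on where the correct country appears in the ranked list. The function name and country codes are hypothetical.

```python
import math


def ndcg_at_k(predicted_countries, true_country, k=5):
    """NDCG@k for one user with a single correct answer.

    With binary relevance and one relevant item, the ideal DCG is 1, so the
    score reduces to 1 / log2(position + 1) for the correct country's
    1-based position, or 0 if it is not in the top k.
    """
    for position, country in enumerate(predicted_countries[:k], start=1):
        if country == true_country:
            return 1.0 / math.log2(position + 1)
    return 0.0


# Hypothetical ranked predictions for one user, most likely country first.
ranked = ["US", "FR", "IT", "GB", "ES"]

score_first = ndcg_at_k(ranked, "US")    # correct country ranked first
score_second = ndcg_at_k(ranked, "FR")   # correct country ranked second
score_missing = ndcg_at_k(ranked, "DE")  # correct country not in the top 5
```

Averaging this score across all users gives the overall figure, so placing the correct country higher in the list is rewarded more than merely including it.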
Lending Club Repayment

- Objective: Predict the likelihood that a loan will be paid off, based on information provided by borrowers at the point of application. The model should screen out high-risk loan requests without rejecting good loan requests by mistake.
- Models applied: Logistic Regression, Random Forest.
Sales Forecast

- Objective: Given 34 months of sales history (Jan 2013 – Oct 2015) of a gaming retailer by items and by shops, predict sales for Nov 2015 (period 35)
- Models applied: ARMA, SARIMA, Holt-Winters, Prophet, VARMAX, XGBoost, Random Forest, Ridge
Instacart Market Basket Analysis

- Objective: Predict the mix of products that users will include in their next order.
- Models applied: Gradient Boosting, LightGBM, XGBoost, Random Forest
Mini Projects

About Me: