TECHNICAL SKILLS AND METHODS

Python Data Science Stack

Core Libraries: pandas, NumPy, matplotlib, SciPy, scikit-learn, statsmodels

Data Preparation & Transformation

Data Cleaning: Handling missing values and duplicates
Feature Engineering: Creation and transformation of meaningful variables
Handling Missing Data: Imputation techniques and analysis
Outlier Analysis: Detection and treatment methods
Box–Cox Transformation: Power transformation of skewed, strictly positive data toward normality
Feature Creation (Classification): Example — “collaboration flag” variable
Class Imbalance Handling: Resampling (SMOTE, undersampling) or weighting strategies
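Two of the preparation steps above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the project's actual code: the lognormal sample stands in for a skewed variable, and `balanced_class_weights` is a hypothetical helper that reproduces the common "balanced" weighting rule (n_samples / (n_classes * class_count)).

```python
import numpy as np
from scipy import stats

# Synthetic, strictly positive, right-skewed data (an assumption for illustration).
rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=3.0, sigma=0.8, size=500)

# Box-Cox requires strictly positive input; with no lambda given,
# SciPy returns the transformed data and the fitted lambda.
transformed, fitted_lambda = stats.boxcox(skewed)


def balanced_class_weights(labels):
    """Hypothetical helper: weight each class by n / (k * count)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


# 90 solo tracks (0) versus 10 collaborations (1): the minority class gets the larger weight.
weights = balanced_class_weights([0] * 90 + [1] * 10)
```

Weighting leaves the data untouched, which makes it a lighter-weight alternative to resampling when the imbalance is moderate.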

Exploratory Data Analysis & Visualization

Exploratory Techniques: Summary statistics, correlation analysis
Data Visualization: Plotting with matplotlib and seaborn
Diagnostic Plots: Q–Q plots, residual plots, LOWESS smoothing
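The numbers behind a Q–Q plot and a LOWESS-smoothed residual plot can be computed without drawing anything. The sketch below uses synthetic fitted values and residuals (assumptions, not project data); the resulting arrays could then be passed to matplotlib.

```python
import numpy as np
from scipy import stats
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic stand-ins for a model's fitted values and residuals.
rng = np.random.default_rng(3)
fitted = rng.uniform(0, 100, size=300)
residuals = rng.normal(scale=5.0, size=300)

# Q-Q data: theoretical normal quantiles versus ordered residuals,
# plus a least-squares line through them (r near 1 suggests normality).
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(
    residuals, dist="norm"
)

# LOWESS of residuals against fitted values; a roughly flat curve
# suggests no leftover structure. Returns (x, smoothed y) pairs sorted by x.
smoothed = lowess(residuals, fitted, frac=0.5)
```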

Supervised Learning Frameworks

Regression Models:
- Linear Regression (OLS, WLS)
- Logistic Regression (for classification tasks)
Training Process: Train–test split and cross-validation
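The training process above might look like the following with scikit-learn. The data are synthetic, and the 80/20 split and 5 folds are illustrative assumptions rather than the settings used in the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic regression problem: three features, two of which matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=400)

# Hold out a test set that is never touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 5-fold cross-validation on the training portion only.
model = LinearRegression()
cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Final fit on all training data, then a single evaluation on the held-out set.
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)
```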

Model Evaluation & Diagnostics

Error Metrics: MSE, RMSE, MAE, R²
Residual Diagnostics: Pattern assessment, normality, and variance checks
Heteroskedasticity Tests: Breusch–Pagan test and corrective approaches

KEY QUESTIONS EXPLORED

1. Hit songs and albums
This question tested whether an album containing a “hit” song (a track in the top 25% of popularity) sees higher average popularity among its other tracks. Regression models (OLS, log-transformed, Box–Cox, and WLS) and their diagnostics showed only a weak, slightly negative relationship and no meaningful uplift for non-hit songs.
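One way the album-level variables for this question could be built is sketched below. The column names `album` and `popularity`, the toy rows, and the choice to compute the 75th-percentile cutoff over all tracks are assumptions for illustration only.

```python
import pandas as pd

# Toy data; the real dataset and its column names may differ.
tracks = pd.DataFrame({
    "album": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "popularity": [90, 40, 50, 30, 35, 85, 20, 45],
})

# A "hit" is a track at or above the 75th percentile of popularity.
threshold = tracks["popularity"].quantile(0.75)
tracks["is_hit"] = tracks["popularity"] >= threshold

# Per album: does it contain a hit, and how popular are its other tracks?
album_summary = tracks.groupby("album").agg(has_hit=("is_hit", "any"))
album_summary["non_hit_mean_popularity"] = (
    tracks[~tracks["is_hit"]].groupby("album")["popularity"].mean()
)
```

A table like `album_summary` can then feed the regression of non-hit popularity on the hit indicator.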

2. Collaborations vs solo tracks
This question asked how musical attributes differ between collaborative and solo tracks. A collaboration flag was engineered, feature distributions were explored, and a logistic-regression-style classifier was set up to test whether collaboration status is predictable from audio features.
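A minimal sketch of this setup follows. How the collaboration flag was actually derived is not stated here, so the `artists` column with a ";" separator is an assumption, and the data are synthetic, with collaborations made more likely for danceable tracks purely so the classifier has something to find.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic audio features and artist credits (assumptions for illustration).
rng = np.random.default_rng(7)
n = 600
dance = rng.uniform(0, 1, n)
valence = rng.uniform(0, 1, n)
p_collab = 1 / (1 + np.exp(-(4.0 * dance - 2.5)))
two_artists = rng.binomial(1, p_collab).astype(bool)

tracks = pd.DataFrame({
    "artists": np.where(two_artists, "Artist X;Artist Y", "Artist X"),
    "danceability": dance,
    "valence": valence,
})

# Engineered flag: 1 if more than one artist is credited.
tracks["is_collab"] = tracks["artists"].str.contains(";", regex=False).astype(int)

X = tracks[["danceability", "valence"]]
y = tracks["is_collab"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7
)

# Scaled logistic regression; class_weight="balanced" offsets class imbalance.
clf = make_pipeline(StandardScaler(), LogisticRegression(class_weight="balanced"))
clf.fit(X_train, y_train)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
```

An AUC clearly above 0.5 on held-out data would indicate that collaboration status is at least partly predictable from the features.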

3. Toward recommendations
The third question, a stretch goal, explored the feasibility of building a simple recommendation system from user inputs and track attributes. This framed the dataset as a foundation for future recommender or similarity-based models rather than a full production system.
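To make the similarity-based idea concrete, here is a small sketch of what such a model might look like, assuming a user supplies a seed track and candidates are ranked by cosine similarity over audio features. The track names, feature values, and the `recommend` function are all hypothetical; no recommender was built in this project.

```python
import numpy as np

# Hypothetical tracks with three made-up audio features each.
track_names = ["a", "b", "c", "d"]
features = np.array([
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
])


def recommend(seed, k=2):
    """Return the k tracks most cosine-similar to the seed track."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    seed_idx = track_names.index(seed)
    similarity = unit @ unit[seed_idx]
    similarity[seed_idx] = -np.inf  # never recommend the seed itself
    top = np.argsort(similarity)[::-1][:k]
    return [track_names[i] for i in top]


closest_to_a = recommend("a", k=1)
closest_to_c = recommend("c", k=1)
```

In a real system the features would normally be standardized first so that no single attribute dominates the similarity score.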

Key Findings

1. Hit songs and albums

Having a hit song on an album does not meaningfully increase the popularity of the album’s other tracks; the statistical relationship between hit and non-hit popularity is weak and slightly negative.

2. Collaborations vs solo tracks
Collaborative tracks differ systematically from solo tracks in their audio characteristics, and collaboration status is at least partially predictable from features like danceability, valence, and other Spotify audio metrics.

3. Toward recommendations

The dataset and feature engineering choices make it feasible to extend this work into simple recommendation or similarity-based systems, although building a full recommender was left as future work.

more projects

Machine Learning Prediction of Age and Alzheimer's Disease from DNA Methylation Profiles

DATA SCIENCE / MACHINE LEARNING

TRIAGE AI

SOFTWARE ENGINEERING / MACHINE LEARNING

Spotify Case Study
