Hospital Readmissions Analysis

Project Overview

This project analyzes a ten-year hospital inpatient dataset to evaluate whether diabetes-related factors are associated with hospital readmission. The dataset covers clinical utilization patterns, diabetes management indicators, and patient-level characteristics, which are assessed as potential predictors of readmission.
The objective is to transform raw hospital records into a structured, analysis-ready dataset that supports evaluating diabetes as a contributing factor to readmission risk and feeds business intelligence reporting for healthcare decision-making.

Key Features

  • End-to-end SQL data pipeline using staging-to-core architecture
  • Standardization of missing and inconsistent clinical values
  • Surrogate key implementation for full data traceability
  • Advanced data quality diagnostics (duplicates, consistency checks)
  • IQR-based outlier detection with flagging (no data loss)
  • Clinical logic validation for contradiction detection
  • Feature engineering for healthcare KPIs and utilization metrics
  • Export-ready, BI-optimized dataset for Power BI integration
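
The staging-to-core flow, missing-value standardization, and surrogate key listed above can be illustrated with a minimal sketch. It is not the project's actual code: the table names (stg_encounters, core_encounters), the column names, and the list of missing-value tokens are all hypothetical, and SQLite (via Python's sqlite3 module) stands in for MySQL so the example is self-contained. In MySQL the surrogate key would typically be declared with AUTO_INCREMENT.

```python
import sqlite3


def build_core(rows):
    """Load raw rows into a staging table, then populate a cleaned core table.

    All table and column names here are hypothetical illustrations.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Staging: raw values kept as text, exactly as delivered.
        CREATE TABLE stg_encounters (
            encounter_id     TEXT,
            race             TEXT,
            time_in_hospital TEXT
        );
        -- Core: typed, standardized, with a surrogate key for traceability.
        CREATE TABLE core_encounters (
            encounter_sk     INTEGER PRIMARY KEY AUTOINCREMENT,
            encounter_id     TEXT NOT NULL,  -- source key retained
            race             TEXT,
            time_in_hospital INTEGER
        );
    """)
    conn.executemany("INSERT INTO stg_encounters VALUES (?, ?, ?)", rows)

    # Standardize: trim whitespace and map assumed missing-value tokens to NULL.
    conn.execute("""
        INSERT INTO core_encounters (encounter_id, race, time_in_hospital)
        SELECT
            TRIM(encounter_id),
            CASE WHEN TRIM(race) IN ('', '?', 'NULL', 'Unknown')
                 THEN NULL ELSE TRIM(race) END,
            CAST(NULLIF(TRIM(time_in_hospital), '') AS INTEGER)
        FROM stg_encounters
        ORDER BY rowid
    """)
    return conn.execute(
        "SELECT encounter_sk, encounter_id, race, time_in_hospital "
        "FROM core_encounters ORDER BY encounter_sk"
    ).fetchall()


if __name__ == "__main__":
    sample = [("A1 ", "Caucasian", "3"), ("A2", "?", " 5"), ("A3", "", "")]
    for row in build_core(sample):
        print(row)
```

Keeping the staging table untouched and writing cleaned values only to the core table means any core row can be traced back to its raw source through the retained encounter_id.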

Tools and Technologies Used

This project was developed using MySQL for data ingestion, cleaning, transformation, and validation. SQL techniques such as window functions (e.g., PERCENT_RANK) were applied for statistical analysis and outlier detection. Data modeling followed a structured staging-to-core architecture to ensure reproducibility and auditability. The final dataset was prepared for visualization and reporting in Power BI, enabling downstream analysis and dashboard development aligned with healthcare analytics standards.
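
One possible way to combine PERCENT_RANK with IQR-based flagging is sketched below. The page does not specify how the quartiles were derived, so the approach here (taking Q1 as the largest value with percent rank at or below 0.25 and Q3 as the smallest value with percent rank at or above 0.75) is an assumption, as are the core_encounters table and the num_medications column. SQLite stands in for MySQL; the query uses only CTEs and PERCENT_RANK, which MySQL 8.0 also supports.

```python
import sqlite3

# Flag, but never delete, rows outside the 1.5 * IQR fences.
# Requires SQLite 3.25+ for window function support.
OUTLIER_SQL = """
WITH ranked AS (
    SELECT encounter_sk,
           num_medications,
           PERCENT_RANK() OVER (ORDER BY num_medications) AS pr
    FROM core_encounters
),
quartiles AS (
    -- Approximate quartiles from percent ranks (an assumed convention).
    SELECT MAX(CASE WHEN pr <= 0.25 THEN num_medications END) AS q1,
           MIN(CASE WHEN pr >= 0.75 THEN num_medications END) AS q3
    FROM ranked
)
SELECT r.encounter_sk,
       r.num_medications,
       CASE WHEN r.num_medications < q.q1 - 1.5 * (q.q3 - q.q1)
              OR r.num_medications > q.q3 + 1.5 * (q.q3 - q.q1)
            THEN 1 ELSE 0 END AS is_outlier
FROM ranked r
CROSS JOIN quartiles q
ORDER BY r.encounter_sk
"""


def flag_outliers(values):
    """Return (encounter_sk, num_medications, is_outlier) for every input row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE core_encounters ("
        "encounter_sk INTEGER PRIMARY KEY, num_medications INTEGER)"
    )
    conn.executemany(
        "INSERT INTO core_encounters (num_medications) VALUES (?)",
        [(v,) for v in values],
    )
    return conn.execute(OUTLIER_SQL).fetchall()


if __name__ == "__main__":
    for row in flag_outliers([10, 12, 11, 13, 12, 14, 11, 13, 60]):
        print(row)
```

Because the result carries an is_outlier column instead of filtering rows out, downstream reports can include or exclude flagged records without any loss of data.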

Results and Outcomes

The project transformed raw hospital readmission records into a clean, standardized, analysis-ready dataset. Key data quality issues, including inconsistently coded missing values and potential duplicates, were identified and systematically addressed. Outliers were flagged rather than removed, which improved analytical reliability without discarding records. The resulting dataset supports evaluation of relationships between diabetes indicators and readmission outcomes, including utilization patterns and potential risk factors, and provides a foundation for KPI development, segmentation analysis, and healthcare decision support.
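
A duplicate diagnostic of the kind mentioned above could look like the following sketch. The business key used to define a duplicate (patient_id plus admit_date) and the table layout are hypothetical, since the page does not state which columns were compared, and SQLite again stands in for MySQL.

```python
import sqlite3

# List every business-key combination that appears more than once.
DUPLICATE_SQL = """
SELECT patient_id, admit_date, COUNT(*) AS n_rows
FROM core_encounters
GROUP BY patient_id, admit_date
HAVING COUNT(*) > 1
ORDER BY patient_id, admit_date
"""


def find_duplicates(rows):
    """Return (patient_id, admit_date, n_rows) for each repeated key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE core_encounters ("
        "encounter_sk INTEGER PRIMARY KEY, patient_id TEXT, admit_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO core_encounters (patient_id, admit_date) VALUES (?, ?)",
        rows,
    )
    return conn.execute(DUPLICATE_SQL).fetchall()


if __name__ == "__main__":
    sample = [
        ("P1", "2020-01-01"),
        ("P1", "2020-01-01"),
        ("P2", "2020-02-01"),
        ("P1", "2020-03-01"),
    ]
    print(find_duplicates(sample))
```

The query only reports candidate duplicates; deciding whether they are true duplicates or legitimate repeat encounters remains a separate review step.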

Role and Responsibilities

I designed and implemented the complete data cleaning and preparation pipeline as a structured, reproducible workflow from raw data to final output. My responsibilities included building the database environment, developing the staging and core data models, and applying data validation, standardization, and transformation logic. I conducted data quality assessments, implemented outlier detection, and performed clinical consistency checks. I also prepared the dataset for downstream analysis and reporting, keeping it aligned with healthcare analytics objectives and maintaining data integrity, transparency, and scalability.
