Diabetes & Lifestyle: Risk Analysis

Exploratory data analysis regarding demographic, lifestyle, and clinical correlations with diabetes diagnosis | October 2025

Author: Andrew Castro


This project performs an exploratory analysis of a diabetes risk dataset sourced from Kaggle. It focuses on cleaning demographic and clinical data, transforming binary fields for clarity, and using Power BI to test hypotheses regarding physical activity and diet. The final output is an interactive dashboard that uses DAX for dynamic measures to identify risk factors such as BMI, LDL cholesterol, and socioeconomic influence. Note that the analysis is based on a simulated dataset and may not reflect real-world data.

More info below.

Project Summary

Methodology & Data Transformation
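The summary mentions transforming binary fields for clarity. As a rough illustration of that kind of step, the sketch below relabels 0/1 flags as "No"/"Yes" in plain Python. The field names (`family_history_diabetes`, `hypertension_history`) and the helper `label_binary_fields` are hypothetical and chosen only for this example; the project's actual transformation and column names may differ.

```python
def label_binary_fields(row, binary_fields, labels=("No", "Yes")):
    """Return a copy of `row` with 0/1 values in `binary_fields` replaced by labels.

    Values other than 0 or 1 (for example missing values) are left untouched.
    """
    labeled = dict(row)  # copy so the raw record is not modified
    for field in binary_fields:
        value = labeled.get(field)
        if value in (0, 1):
            labeled[field] = labels[value]
    return labeled


# Illustrative record only; the field names are assumptions, not the dataset's columns.
raw_row = {"age": 52, "family_history_diabetes": 1, "hypertension_history": 0}
clean_row = label_binary_fields(
    raw_row, ["family_history_diabetes", "hypertension_history"]
)
```

Keeping the raw record unchanged and returning a labeled copy makes it easy to keep numeric flags for calculations while showing readable labels in visuals.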


Key DAX Measures Created:

Key Insights & Hypotheses

Demographic Insights & Education

Positive diagnoses followed a roughly bell-shaped age distribution, peaking between the ages of 45 and 55, with an average age at diagnosis of approximately 50. Positivity rates were highest among the White population (45.10%), followed by a markedly lower rate in the Hispanic population (19.90%).

Education level showed a clear inverse relationship with the positive diagnosis rate. Positivity rates were highest (44.90%) among those with a high school education and lowest (14.80%) among those with a postgraduate education. Income levels followed a similar pattern, with the middle class affected the most.
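The rates above can be read as group-level proportions: positive diagnoses divided by the number of people in each group. A minimal sketch of that calculation follows, assuming the rates are within-group proportions. The rows are made up and the field names (`education`, `diagnosed`) are hypothetical; this is not the project's DAX measure.

```python
from collections import defaultdict


def positivity_rate_by_group(rows, group_key, flag_key="diagnosed"):
    """Return {group: percentage of rows in that group with a truthy flag}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row[flag_key]:
            positives[group] += 1
    return {g: round(100.0 * positives[g] / totals[g], 2) for g in totals}


# Illustrative rows only; they are not taken from the dataset.
sample_rows = [
    {"education": "High school", "diagnosed": 1},
    {"education": "High school", "diagnosed": 0},
    {"education": "Postgraduate", "diagnosed": 0},
    {"education": "Postgraduate", "diagnosed": 0},
    {"education": "Postgraduate", "diagnosed": 1},
    {"education": "Postgraduate", "diagnosed": 0},
]

rates = positivity_rate_by_group(sample_rows, "education")
```

The same helper works for any categorical field, such as ethnicity or income bracket, by changing `group_key`.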

Age Influence

Bell curve chart showing positive diabetes diagnosis peaking between ages 45 and 55

Socioeconomic Influence

Chart showing inverse correlation between education level and diabetes positivity rates

Family History

Among individuals with a positive diabetes diagnosis, 28.63% reported having a family history of diabetes. This was true for 28.68% of females, 28.50% of males, and 30.43% of individuals with a non-specified gender. Some individuals can be expected to have an unknown family history, which lowers this overall percentage. Additionally, 29.51% of individuals with a positive diagnosis had a history of hypertension.

Diabetes History

Visual analysis of correlation between risk scores and lifestyle factors

Hypertension History

Additional visual analysis of risk score correlations

Clinical Correlations & Risk Factors

A clear correlation exists between an individual’s weekly physical activity and a lower diabetes risk score. Conversely, individuals with higher BMI and LDL cholesterol levels faced a higher risk of a positive diagnosis. Interestingly, while these physical metrics were indicators, diet scores and sleep patterns showed negligible correlation with the diabetes risk score in this dataset.
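One common way to quantify relationships like these is the Pearson correlation coefficient, where values near -1 or +1 indicate a strong linear relationship and values near 0 indicate a negligible one. The sketch below is an illustration only: the source does not state which correlation measure was used, and the sample values are invented to show a perfectly negative relationship.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented values: risk score falls by 5 for every extra 60 minutes of activity.
activity_minutes = [0, 60, 120, 180, 240]
risk_scores = [40, 35, 30, 25, 20]

r = pearson_r(activity_minutes, risk_scores)
```

Real data would rarely be this clean; a coefficient close to zero for diet score or sleep against risk score would match the "negligible correlation" described above.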

Physical Activity & Diet

Visual analysis of correlation between risk scores and lifestyle factors

lower = less risk

LDL Cholesterol & BMI

Additional visual analysis of risk score correlations

higher = more risk

Glucose & Diet Analysis

Hypothesis 2 was not supported in this case study. Average post-prandial glucose was recorded at 160.04 mg/dL. Individuals with a poor diet score averaged ~171 mg/dL post-prandial glucose, compared to ~166 mg/dL for those with a perfect score; this small difference suggests minimal correlation in this dataset. Both averages are considered elevated blood sugar levels and differ only slightly.

Additionally, those with a positive diagnosis had slightly higher triglycerides (123.21 mg/dL) than those with a negative diabetes diagnosis, but both averages fell within the healthy range (below 150.00 mg/dL).
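To put the glucose comparison in perspective, the gap between the two diet groups can be expressed in absolute and relative terms. The sketch below uses the approximate averages quoted in the text (~171 and ~166 mg/dL); the helper name `relative_difference_pct` is introduced here for illustration only.

```python
def relative_difference_pct(value, baseline):
    """Percentage by which `value` differs from `baseline`."""
    return 100.0 * (value - baseline) / baseline


# Approximate group averages quoted in the text, in mg/dL.
poor_diet_glucose = 171.0
perfect_diet_glucose = 166.0

abs_gap = poor_diet_glucose - perfect_diet_glucose  # absolute gap in mg/dL
rel_gap = relative_difference_pct(poor_diet_glucose, perfect_diet_glucose)
```

With these figures the gap is 5 mg/dL, or roughly 3% of the perfect-score average, which fits the reading that diet score had little bearing on glucose in this dataset.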

Influence on Glucose

Chart showing negligible correlation between diet score and glucose levels

Influence on Triglycerides

Comparison of average triglyceride levels by diagnosis positivity

Economic Relationships

Employment status and income levels were explored to find a correlation between an individual’s ability to afford a healthy lifestyle and their diabetic risk. However, the positive diagnosis rates of the unemployed (39.78%) and employed (39.85%) groups were nearly identical.

Employment vs. Diagnosis

Chart showing nearly identical diagnostic rates between employed and unemployed groups

Conclusion

Physical activity was associated with clearly lower diabetes risk, while diet score showed negligible correlation with glucose levels in this dataset. LDL cholesterol and BMI emerged as strong clinical risk factors, and socioeconomic factors such as education were also correlated with diabetes prevalence, with positivity rates falling as education level rose. However, these analyses and correlations are based solely on a simulated dataset and its sampling bias, and they cannot be taken to reflect real-world outcomes.

This exploratory data analysis reveals the importance of combining lifestyle, demographic, and clinical data to understand complex health risks.

Project Repo