Balancing patient privacy and predictive accuracy through data anonymization in healthcare

Prem Kumar M., Archana Bhat, Macherla Bhagyalakshmi, Nikila G.S.

Abstract

Data anonymization in healthcare is essential for protecting sensitive patient information while still allowing the data to be used securely for research, analytics, and AI-driven clinical decision-making. This study used the MIMIC-III - Deep Reinforcement Learning dataset, which contains comprehensive electronic health records (EHRs) of intensive care unit (ICU) patients. During preprocessing, Min-Max normalization was applied to rescale numerical features to a common range. Patient privacy was then protected with anonymization techniques (pseudonymization, generalization, suppression, and data masking) together with formal privacy models (k-anonymity, l-diversity, and t-closeness). The anonymized dataset was subsequently used for predictive modelling with Random Forest and Long Short-Term Memory (LSTM) models. The results show that privacy was preserved with 0% leakage of personally identifiable information (PII), while predictive performance remained high, with an accuracy of 94.6%, precision of 93.8%, recall of 92.5%, and F1-score of 93.1%. The study indicates that effective data anonymization supports compliance with HIPAA and GDPR while retaining the utility of healthcare data for advanced analytics and AI applications.
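
The anonymization steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline: the field names (patient_id, name, age, zip, heart_rate, diagnosis), the secret key, the 10-year age bands, the 3-character ZIP prefix, k = 2, and the sample rows are all hypothetical. It shows keyed pseudonymization (HMAC-SHA256), attribute suppression (the name is dropped), generalization, masking, Min-Max scaling, record suppression to reach k-anonymity, and a distinct l-diversity check. t-closeness, which compares each equivalence class's sensitive-value distribution with the overall distribution under a distance such as the Earth Mover's Distance, is left out for brevity.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical key for keyed pseudonymization; it must be stored apart from the released data.
SECRET_KEY = b"replace-with-a-managed-secret"
# Hypothetical quasi-identifiers that could be linked to external data sources.
QUASI_IDS = ("age_band", "zip")


def min_max_normalize(values):
    """Scale numbers to [0, 1] via (v - min) / (max - min); a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pseudonymize(identifier, key=SECRET_KEY):
    """Replace a direct identifier with a truncated HMAC-SHA256 token (stable for a given key)."""
    return hmac.new(key, str(identifier).encode(), hashlib.sha256).hexdigest()[:16]


def generalize_age(age, width=10):
    """Generalize an exact age into a band such as '60-69'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def mask_zip(zip_code, keep=3):
    """Mask all but the first `keep` characters of a postal code."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def equivalence_classes(records, quasi_ids):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_ids)].append(r)
    return groups


def suppress_small_classes(records, quasi_ids, k):
    """Drop every record whose equivalence class has fewer than k members."""
    groups = equivalence_classes(records, quasi_ids)
    return [r for g in groups.values() if len(g) >= k for r in g]


def is_k_anonymous(records, quasi_ids, k):
    """True if every equivalence class contains at least k records."""
    return all(len(g) >= k for g in equivalence_classes(records, quasi_ids).values())


def is_l_diverse(records, quasi_ids, sensitive, l_value):
    """Distinct l-diversity: each class holds at least l_value different sensitive values."""
    groups = equivalence_classes(records, quasi_ids)
    return all(len({r[sensitive] for r in g}) >= l_value for g in groups.values())


def anonymize(raw, k=2):
    """Pseudonymize, generalize, mask and scale, then suppress classes smaller than k."""
    scaled_hr = min_max_normalize([r["heart_rate"] for r in raw])
    out = []
    for r, hr in zip(raw, scaled_hr):
        out.append({
            "pid": pseudonymize(r["patient_id"]),  # direct identifier -> keyed token
            "age_band": generalize_age(r["age"]),  # generalization
            "zip": mask_zip(r["zip"]),             # masking
            "heart_rate": hr,                      # Min-Max scaled feature
            "diagnosis": r["diagnosis"],           # sensitive attribute kept for modelling
        })                                         # "name" is never copied (attribute suppression)
    return suppress_small_classes(out, QUASI_IDS, k)


# Synthetic example rows (not MIMIC-III data).
RECORDS = [
    {"patient_id": 1001, "name": "A", "age": 63, "zip": "56001", "heart_rate": 88, "diagnosis": "sepsis"},
    {"patient_id": 1002, "name": "B", "age": 67, "zip": "56002", "heart_rate": 102, "diagnosis": "pneumonia"},
    {"patient_id": 1003, "name": "C", "age": 45, "zip": "56010", "heart_rate": 75, "diagnosis": "sepsis"},
    {"patient_id": 1004, "name": "D", "age": 41, "zip": "56011", "heart_rate": 120, "diagnosis": "AKI"},
    {"patient_id": 1005, "name": "E", "age": 29, "zip": "57100", "heart_rate": 95, "diagnosis": "trauma"},
]

if __name__ == "__main__":
    for row in anonymize(RECORDS, k=2):
        print(row)
```

On these sample rows the sketch keeps four of the five records: the single patient in the 20-29 band with ZIP prefix 571 forms an equivalence class of size one and is suppressed, while the two remaining classes each contain two distinct diagnoses, so the output is both 2-anonymous and 2-diverse. Suppression trades utility for privacy, which is the balance the study examines.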

Key Words

AI analytics; data anonymization; GDPR; healthcare; HIPAA; k-anonymity; MIMIC-III; patient privacy; pseudonymization

Address

Prem Kumar M.: Operations Head, Willron Electronics, Bangalore, India
Archana Bhat: Department of Artificial Intelligence and Machine Learning, BMS Institute of Technology and Management, Bengaluru, India
Macherla Bhagyalakshmi: School of Commerce, Finance and Accountancy, Christ University, Bangalore, India
Nikila G.S.: Software Engineer, OnGen, Bangalore, India

Volume 11, No. 1