WP5: Advancing Data Privacy in Research
In the realm of data-driven research, safeguarding privacy is a foundational imperative. WP5 of our project is dedicated to establishing robust frameworks and methodologies to ensure the protection of sensitive data, particularly within federated research networks and synthetic data environments.
Assessing Privacy Risks in Synthetic Data and Federated Learning
A significant focus of WP5 is evaluating and quantifying the privacy risks associated with synthetic data and federated learning. Unlike established privacy measures such as k-anonymity or differential privacy, direct privacy metrics for synthetic data (SD) are still under development, so WP5 has to rely on new approaches.
Attribute Disclosure and Proxy Measures
Attribute disclosure is a critical challenge: an attacker learns sensitive attributes that individuals would prefer to keep confidential, such as medical treatments. Clinicians play a pivotal role in identifying which attributes in a dataset are sensitive.
To address these challenges, WP5 employs proxy measures to assess the similarity between synthetic and real data. This includes evaluating how closely synthetic records mirror real-world data, thereby gauging the risk of attribute disclosure.
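As an illustration of such proxy measures, the sketch below computes an exact-match rate, the Distance to Closest Record (DCR), and the Nearest Neighbor Distance Ratio (NNDR) for a handful of toy records. It is a minimal sketch under stated assumptions, not WP5's implementation: the function names, the Euclidean distance, the numeric-only records, and the toy data are all hypothetical choices made for this example.

```python
import math

def exact_match_rate(synthetic, real):
    """Fraction of synthetic records that are identical to some real record."""
    real_set = {tuple(r) for r in real}
    return sum(tuple(s) in real_set for s in synthetic) / len(synthetic)

def dcr_and_nndr(synthetic, real):
    """Return one (DCR, NNDR) pair per synthetic record.

    DCR  = distance to the closest real record.
    NNDR = DCR divided by the distance to the second-closest real record.
    Assumes numeric records and at least two real records.
    """
    results = []
    for s in synthetic:
        dists = sorted(math.dist(s, r) for r in real)
        d1, d2 = dists[0], dists[1]
        # Convention chosen for this sketch: if the two nearest real
        # records are both at distance 0, report an NNDR of 0.0.
        nndr = d1 / d2 if d2 > 0 else 0.0
        results.append((d1, nndr))
    return results

# Hypothetical toy data: two numeric attributes per record.
real = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
synthetic = [(0.0, 0.0), (3.0, 0.0)]

match_rate = exact_match_rate(synthetic, real)
scores = dcr_and_nndr(synthetic, real)

for record, (dcr, nndr) in zip(synthetic, scores):
    print(f"{record}: DCR={dcr:.2f}, NNDR={nndr:.2f}")
print(f"exact match rate: {match_rate:.2f}")
```

A DCR of zero, or an NNDR close to zero, flags a synthetic record that sits much closer to one real record than to any other, which is the situation in which attribute disclosure becomes more plausible. Real datasets would also need attribute scaling and a distance suited to categorical values.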
Membership Inference
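Membership inference discloses meta-information: whether a particular individual's record was part of the data used to train a model or generate the synthetic data. The risk depends on the attacker’s knowledge and on their access to the synthetic data or models.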
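To make the idea concrete, the sketch below shows one simple, distance-based form of membership inference: the attacker guesses that a target record was in the training data if some synthetic record lies within a chosen distance of it. This is only one possible attack, shown under stated assumptions; the function names, the threshold, and the toy records are hypothetical and are not taken from WP5.

```python
import math

def min_distance(record, dataset):
    """Euclidean distance from a record to its closest record in dataset."""
    return min(math.dist(record, other) for other in dataset)

def infer_membership(targets, synthetic, threshold):
    """Guess 'member' when a synthetic record lies within threshold of the target."""
    return [min_distance(t, synthetic) <= threshold for t in targets]

# Hypothetical toy data. The attacker holds the target records and the
# synthetic data; the ground truth is used only to score the attack.
synthetic = [(1.0, 1.0), (5.0, 5.0)]
members = [(1.0, 1.1), (5.0, 4.9)]      # assumed to be in the training data
non_members = [(9.0, 9.0), (3.0, 0.0)]  # assumed not to be in the training data

targets = members + non_members
truth = [True, True, False, False]

guesses = infer_membership(targets, synthetic, threshold=0.5)
attack_accuracy = sum(g == t for g, t in zip(guesses, truth)) / len(truth)
print(f"attack accuracy: {attack_accuracy:.2f}")
```

An attack accuracy well above what guessing would achieve suggests that the synthetic data leaks membership. The toy data here is deliberately easy to separate, so the result says nothing about real datasets.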
Threat Models
Any understanding of the likelihood of the various attacks must be combined with the context of the dataset being assessed. WP5 therefore defines threat models using Spyderisk, placing each dataset in the context of the broader information system, including any other datasets that could be used in an attack. In this way, the risk levels of the various threats can be anticipated and measures put in place to control them effectively.
Progress in Privacy Metrics Implementation
WP5 is actively advancing privacy metrics to enhance assessment capabilities. These metrics encompass both similarity-based and attack-based approaches, allowing for a comprehensive evaluation of privacy risks associated with synthetic data and federated learning.
- Similarity and Distance-based Metrics: These include Exact Matches, Distance to Closest Record (DCR), Nearest Neighbor Distance Ratio (NNDR), Cosine Similarity, and Hausdorff Distance. They quantify how similar synthetic data is to real data, which is crucial for assessing privacy risks.
- Attack-based Metrics: WP5 treats attribute disclosure as a machine learning task and evaluates it with metrics such as F1-Score, Accuracy, R², MAE (Mean Absolute Error), and MAPE (Mean Absolute Percentage Error). These metrics provide insights into the effectiveness of privacy-enhancing technologies.
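The attack-based view can be sketched as follows: an attacker trains a predictor on the synthetic data and uses it to infer a sensitive attribute of real individuals from their other attributes, and the attack is then scored with Accuracy and F1-Score. The 1-nearest-neighbour predictor, the attribute names (age, BMI, a binary treatment flag), and the toy data are assumptions made for this illustration, not the models or data used in WP5.

```python
import math

def predict_1nn(train_X, train_y, x):
    """Predict the label of x from its nearest neighbour in the training set."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[nearest]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class, computed as 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

# Hypothetical synthetic data: (age, BMI) -> sensitive treatment flag.
syn_X = [(30, 22), (35, 24), (60, 30), (65, 31)]
syn_y = [0, 0, 1, 1]

# Hypothetical real records whose sensitive attribute the attacker wants.
real_X = [(32, 23), (62, 29), (58, 27), (40, 26)]
real_y = [0, 1, 1, 1]

predicted = [predict_1nn(syn_X, syn_y, x) for x in real_X]
attack_accuracy = accuracy(real_y, predicted)
attack_f1 = f1_score(real_y, predicted)
print(f"accuracy={attack_accuracy:.2f}, F1={attack_f1:.2f}")
```

For a numeric sensitive attribute, the same setup would use a regressor and report R², MAE, or MAPE instead. In either case the scores are most informative when compared against a baseline, such as an attacker who always predicts the most common value.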
Legal and Regulatory Considerations
In line with the Article 29 Working Party's guidance on anonymization techniques (Opinion 05/2014), WP5 addresses the risks of singling out, linkability, and inference. Considering these risks helps ensure that data processing activities comply with stringent legal requirements such as the GDPR, minimizing risks to data subjects.
Conclusion
WP5 exemplifies our commitment to advancing data privacy standards in research. By integrating cutting-edge frameworks, innovative metrics, and rigorous compliance measures, we ensure that our project not only meets but exceeds expectations for safeguarding sensitive data. As we continue to evolve, WP5 remains at the forefront of shaping ethical and secure data practices in research.
For more information on our ongoing efforts and achievements, please visit [our website] and explore our latest publications and resources.