Synthema

Publications

Privacy Mechanisms and Evaluation Metrics for Synthetic Data Generation: A Systematic Review

This systematic review examines the role of synthetic data generation in enhancing privacy. Covering 105 studies, it highlights the prominence of differential privacy and GAN-based models, especially in healthcare, and outlines key trends and directions for future research.

Automated Knowledge-Based Cybersecurity Risk Assessment of Cyber-Physical Systems

Stephen Phillips from the University of Southampton presents a novel approach to automated cybersecurity risk assessment of cyber-physical systems. The method uses a comprehensive knowledge base to model and simulate threats, streamlining ISO 27005 implementation. Validated through real-world case studies, it offers enhanced transparency, reproducibility, and performance in risk management.

MOSAIC: An Artificial Intelligence–Based Framework for Multimodal Analysis, Classification, and Personalized Prognostic Assessment in Rare Cancers

The study introduces MOSAIC, an AI-based framework for analyzing and predicting outcomes in rare cancers, tested on 4,427 myelodysplastic syndrome (MDS) patients. Advanced clustering and AI methods improved patient stratification and survival prediction over traditional techniques. UMAP + HDBSCAN achieved better accuracy, and AI models outperformed conventional ones. SHAP analysis provided insights into key features, and federated implementation enhanced model accuracy and data protection, demonstrating MOSAIC’s potential for clinical use.

Clinical and Genomic-Based Decision Support System to Define the Optimal Timing of Allogeneic Hematopoietic Stem-Cell Transplantation in Patients With Myelodysplastic Syndromes

This study aims to optimize the timing of allogeneic hematopoietic stem-cell transplantation (HSCT) for patients with myelodysplastic syndromes (MDS) using the Molecular International Prognostic Scoring System (IPSS-M), which includes clinical and genomic information. Analyzing a retrospective cohort of 7,118 patients, the study finds that low to moderate-low risk patients benefit from delayed HSCT, while high-risk patients benefit from immediate HSCT. The IPSS-M based strategy significantly changes transplantation timing decisions compared to conventional methods, improving life expectancy. This supports the clinical relevance of incorporating genomic data into HSCT timing decisions for personalized treatment.

Personalized Timing for Allogeneic Stem-Cell Transplantation in Hematologic Neoplasms: A Target Trial Emulation Approach Using Multistate Modeling and Microsimulation

This study develops a framework to optimize the timing of allogeneic hematopoietic stem-cell transplantation (HSCT) for patients with hematologic neoplasms using real-world data. By leveraging multistate modeling and microsimulation on a cohort of 7,118 patients with myelodysplastic syndromes, the analysis identifies optimal timing for HSCT based on individual patient profiles. The methodology provides insights and evidence for clinical decision-making, addressing complex scenarios where randomized trials are not feasible.

Protecting Multiple Sensitive Attributes in Synthetic Micro-data

This paper, presented at the 2023 IEEE International Conference on Big Data, explores the use of synthetic data as a privacy-preserving measure in data analysis, emphasizing the need to protect sensitive attributes while maintaining data utility. It investigates enhancements to the DataSynthesizer model, which uses Bayesian networks to generate synthetic data, so that multiple sensitive attributes are safeguarded against inference attacks. The study also analyzes the impact of these techniques on data utility.

Federated learning for causal inference using deep generative disentangled models

In the context of decentralized and privacy-constrained healthcare data settings, we introduce an innovative approach to estimate individual treatment effects (ITE) via federated learning. Emphasizing the critical importance of data privacy in healthcare, especially when drawing on data from various global hospitals, we address challenges arising from data scarcity and specific treatment assignment criteria influenced by the availability of the medication of interest. Our methodology uses federated learning applied to neural network-based generative causal inference models to bridge the gap between decentralized and centralized ITE estimation on a benchmark dataset.
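As a rough illustration of the aggregation step that federated learning typically relies on, the sketch below averages model parameters from several sites, weighting each site by its number of samples (the generic federated averaging scheme). It is not the paper's method: the function name `federated_average`, the two-hospital setup, and all numbers are hypothetical and chosen only for illustration. In practice the parameter vectors would come from the locally trained generative causal inference models.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Return the sample-size-weighted mean of per-client parameter vectors.

    Hypothetical helper: each client shares only its parameters and its
    sample count, never its raw patient records.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its data volume.
            averaged[i] += w * n / total
    return averaged


# Illustrative values only: two hospitals with different cohort sizes.
hospital_updates = [[0.2, 1.0], [0.4, 3.0]]
hospital_sizes = [100, 300]
global_weights = federated_average(hospital_updates, hospital_sizes)
print(global_weights)
```

The larger site pulls the shared parameters toward its own estimate, which is the usual design choice when sites differ in size. Other weighting schemes are possible.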

Sickle cell disease landscape and challenges in the EU: the ERN-EuroBloodNet perspective

Sickle cell disease is a hereditary multiorgan disease that is considered rare in the EU. In 2017, the Rare Diseases Plan was implemented within the EU and 24 European Reference Networks (ERNs) were created, including the ERN on Rare Haematological Diseases (ERN-EuroBloodNet). The role of ERN-EuroBloodNet is to improve the overall approach to, and the management of, individuals with sickle cell disease in the EU through the pooling of expertise, knowledge, and best practices; the development of training and education programmes; a strategy for the systematic gathering and standardisation of clinical data; and the reuse of those data in clinical research.

Synthetic Data Generation by Artificial Intelligence to Accelerate Research and Precision Medicine in Hematology

Synthetic data are artificial data that contain no real patient information; they are generated by an algorithm trained to learn the characteristics of a real source data set, and they have become widely used to accelerate research in the life sciences. In this work, the researchers apply generative artificial intelligence to build synthetic data for different hematologic neoplasms; develop a synthetic validation framework to assess data fidelity and privacy preservability; and test the capability of synthetic data to accelerate clinical and translational research in hematology.