Synthema
Publications

Synthetic Tabular Data Generation Under Horizontal Federated Learning Environments in Acute Myeloid Leukemia: Case-Based Simulation Study
This study evaluates the combination of synthetic data generation and federated learning in the context of acute myeloid leukemia, a rare hematological disease. Using two state-of-the-art generative models across various data distribution scenarios, the research shows that horizontal federation leads to a loss in data fidelity while maintaining privacy. Despite this trade-off, increasing the number of nodes does not significantly worsen performance, making the approach promising for privacy-preserving data generation in biomedical research.

An improved tabular data generator with VAE-GMM integration

Synthetic tabular data validation: A divergence-based approach

Propensity Weighted federated learning for treatment effect estimation in distributed imbalanced environments

Membership Inference Attacks and Differential Privacy: a study within the context of Generative Models

Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios

Advancing Cancer Research with Synthetic Data Generation in Low-Data Scenarios

Improving synthetic Data Generation through Federated Learning in scarce and heterogeneous data scenarios
Published in Big Data and Cognitive Computing.

Privacy Mechanisms and Evaluation Metrics for Synthetic Data Generation: A Systematic Review

Automated Knowledge-Based Cybersecurity Risk Assessment of Cyber-Physical Systems

MOSAIC: An Artificial Intelligence–Based Framework for Multimodal Analysis, Classification, and Personalized Prognostic Assessment in Rare Cancers

Clinical and Genomic-Based Decision Support System to Define the Optimal Timing of Allogeneic Hematopoietic Stem-Cell Transplantation in Patients With Myelodysplastic Syndromes
This study aims to optimize the timing of allogeneic hematopoietic stem-cell transplantation (HSCT) for patients with myelodysplastic syndromes (MDS) using the Molecular International Prognostic Scoring System (IPSS-M), which combines clinical and genomic information. Analyzing a retrospective cohort of 7,118 patients, the study finds that patients at low to moderate-low risk benefit from delayed HSCT, while high-risk patients benefit from immediate HSCT. Compared with conventional methods, the IPSS-M-based strategy significantly changes transplantation timing decisions and improves life expectancy. These findings support the clinical relevance of incorporating genomic data into HSCT timing decisions for personalized treatment.

Personalized Timing for Allogeneic Stem-Cell Transplantation in Hematologic Neoplasms: A Target Trial Emulation Approach Using Multistate Modeling and Microsimulation
This study develops a framework to optimize the timing of allogeneic hematopoietic stem-cell transplantation (HSCT) for patients with hematologic neoplasms using real-world data. By applying multistate modeling and microsimulation to a cohort of 7,118 patients with myelodysplastic syndromes, the analysis identifies the optimal timing for HSCT based on individual patient profiles. The methodology provides evidence for clinical decision-making in complex scenarios where randomized trials are not feasible.

Protecting Multiple Sensitive Attributes in Synthetic Micro-data
This paper explores the use of synthetic data as a privacy-preserving measure in data analysis, emphasizing the need to protect sensitive attributes while maintaining data utility. It investigates enhancements to the DataSynthesizer model, using Bayesian Networks to generate synthetic data that safeguards multiple sensitive attributes against inference attacks. The study also analyzes the impact of these techniques on data utility. It was presented at the 2023 IEEE International Conference on Big Data.

Federated learning for causal inference using deep generative disentangled models

Sickle cell disease landscape and challenges in the EU: the ERN-EuroBloodNet perspective
