Synthema

Cross-European collaboration and knowledge sharing are core principles of all EU-funded projects, even more so when more than one initiative is dedicated to the same area of work, as is the case for SYNTHEMA and GenoMed4All: two projects focusing on the use of Artificial Intelligence in healthcare and personalised medicine applied to rare hematological diseases.

While GenoMed4All is building a secure and trustworthy Federated Learning platform to pool genomic health data via AI algorithms, SYNTHEMA aims to generate reliable and high-quality synthetic data that can shape new virtual patients — and further enhance diagnostic capacity, assess treatment options and predict outcomes in rare hematological diseases.

On 28 September 2023, the two consortia hosted a joint meeting to explore synergies and gain a better understanding of the connections between their work and the potential transfer of knowledge between the projects. Through five rapid-fire talks, representative partners involved in both projects presented their latest achievements for SYNTHEMA and GenoMed4All, and how they feed into each other. In this article we unpick the lessons learned and the knowledge shared during the session.

Introducing SYNTHEMA and GenoMed4All | Federico Álvarez

As the Project Coordinator for SYNTHEMA and GenoMed4All, Professor Federico Álvarez kicked off the session with a quick introduction to the two projects, highlighting the key elements shared across both.

  • A common goal: led by Federico and his team at Universidad Politécnica de Madrid (UPM), both projects were created with a clear mission, namely to develop an innovative approach to using AI models effectively to advance research and improve the standard of care in rare hematological diseases, and so help advance the diagnosis and treatment of patients who suffer from them.
  • A common network of partners and supporters: some of the 16 partners in SYNTHEMA and the 23 in GenoMed4All take part in both initiatives, and both count on the support, resources and active participation of ERN-EuroBloodNet, the European Reference Network in Rare Hematological Diseases. This wide network of stakeholders allows the consortia to take a multidisciplinary approach to achieving the goals of both initiatives.
  • A common case study: each project focuses on real-world use cases to test how these AI models can help address unmet needs in specific hematological diseases: Acute Myeloid Leukaemia (AML) in the case of SYNTHEMA, and Myelodysplastic Syndromes (MDS) and Multiple Myeloma (MM) for GenoMed4All. Moreover, both projects are also targeting Sickle Cell Disease (SCD), a rare, chronic and life-threatening disease that affects more than 20 million people worldwide.

Federico closed his presentation by stating that the ultimate shared goal is to understand rare hematological diseases and tackle their unmet needs by harnessing the power of AI: “Both projects are making impressive progress in this groundbreaking area of work, and SYNTHEMA is even going one step further trying to generate synthetic data through AI algorithms. We’re seeing very important outcomes towards advancing the usage and management of private patient data.”

Legal and ethical data compliance | Nathan Lea

As Information Governance Lead at The European Institute For Innovation Through Health Data (i~HD), Nathan Lea focuses his involvement in SYNTHEMA and GenoMed4All on detecting the potential issues and challenges around the ethical use of data obtained through AI methods, and on how these findings can be extrapolated to wider usage in Europe and around the world.

“Both initiatives have strongly emphasized the need to understand the risks related to data compliance. We have developed data protection management protocols and risk assessment plans, which have been positively received by the European Commission and all stakeholders involved in the projects,” he explained.

Nathan also highlighted the importance of sharing the learnings and outcomes from both projects with other similar initiatives, especially those operating within the healthcare and patient data management landscape. “We have developed strong knowledge in research governance and data reuse, and we have to make sure the consistencies are capitalised across both projects. With the templates and the frameworks we’re generating, we can present a unified approach to inspire others and expand on the successes achieved by SYNTHEMA and GenoMed4All.”

Nathan finished his presentation by reminding all partners that these projects operate in a highly regulated space, so the consortia need to be fully transparent about their goals and the ways in which they intend to use the data gathered: “Both projects are tackling challenges related to rare disorders, where data needs for AI are not always well served, and are particularly prone to issues around bias, inclusivity and representation. This means caution, transparency and a strict interpretation of regulatory matters are essential.”

Precision medicine, AI, validation and analytics | Gastone Castellani and Marilena Bicchieri

Gastone Castellani, Professor in Medical and Surgical Sciences at Università di Bologna, is one of the partners involved in GenoMed4All and SYNTHEMA. He began his presentation stating the clear synergies between the two projects and also emphasized how they are crucially contributing towards the legacy of HARMONY, the Healthcare Alliance for Resourceful Medicine Offensive against Neoplasms in Hematology.

From a practical standpoint, Gastone also celebrated the data gathering approach adopted by the use cases, especially in relation to building a ‘generic’ pipeline: “From a mathematical point of view, this means sufficiently abstract data can be used for a lot of different use cases. The most important point is the clustering and survival analysis, and the search for causal relationships between genes. I’m glad to say we have published several papers using this pipeline,” he explained.

Marilena Bicchieri, Healthcare Project Manager at Humanitas Research Hospital, shared similar reflections alongside Gastone. She also pointed out the complexities of personalised medicine with regard to the availability of big data: “Personalised medicine allows us to study humans from multiple layers, but it’s very difficult to have all this data for a single patient; this is why there is a huge need to develop synthetic data via machine learning, as this will allow us to clearly identify clusters,” she added.

Marilena also stressed the need to scale up all these algorithms to other rare diseases, beyond the use cases already selected by GenoMed4All and SYNTHEMA. “A big problem is the validation of these methods. We are working to use several approaches, but we still need to tackle the main issues around validity, fidelity, utility and privacy. We have to put our patients’ needs and rights above anything else, to make sure we’re conducting our work ethically and with integrity, and to make sure we’re providing healthcare professionals with the right tools and strategies to build trustworthy AI mechanisms,” she said.

Sickle Cell Disease as a use case | Mar Mañú

Following up on the key points presented by Gastone and Marilena, Mar Mañú, Principal Investigator at Vall d’Hebron Institute of Research (VHIR), expanded on the challenges faced by practitioners and researchers gathering health data to study rare diseases. “Diseases like SCD are chronic, which means we need lifelong data collection from patients and, therefore, we have to build very complex pathways; it’s very difficult to have this whole dataset from the same patient,” she mentioned.

Another key issue, as Mar explained, is the way this data is heterogeneously distributed in Europe, impeding researchers from accessing large enough amounts of data to fully understand SCD and the potential treatments. “We’ve advanced a lot in the last 15 years, but what we’re building through SYNTHEMA and GenoMed4All at a European level gives us a real chance to continue studying the disease and eventually come up with innovative approaches,” she added.

Federated learning and architecture | Silvia Uribe

In her presentation on Federated Learning and Architecture, Professor Silvia Uribe, from Universidad Politécnica de Madrid (UPM), delved into the complexities of designing and deploying asynchronous distributed systems. She emphasised the importance of having the right infrastructure, including a CI/CD tool and a dev environment, to streamline the development, integration, testing and deployment of these systems. Silvia highlighted that while High-Performance Computing (HPC) isn’t essential for parameter aggregation, it aids in data transformation, especially when converting multimodal data into tabular forms.
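To illustrate why parameter aggregation is computationally light, here is a minimal sketch of one common aggregation scheme, a sample-weighted average of client parameters in the style of federated averaging. The talk as reported here does not say which aggregation algorithm either platform uses, so the function name `federated_average`, the parameter layout and the example numbers are all assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each model is a mapping from parameter name to a flat list of floats.
Params = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Params, int]]) -> Params:
    """Combine client parameters into one global model.

    Each update is (parameters, number_of_local_samples). Clients with more
    local samples contribute proportionally more to the average.
    Only parameters leave each site in this sketch; raw patient data does not.
    """
    if not updates:
        raise ValueError("no client updates to aggregate")
    total_samples = sum(n for _, n in updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    # Start from zeros with the same shape as the first client's parameters.
    aggregated: Params = {
        name: [0.0] * len(values) for name, values in updates[0][0].items()
    }
    for params, n_samples in updates:
        weight = n_samples / total_samples
        for name, values in params.items():
            for i, value in enumerate(values):
                aggregated[name][i] += weight * value
    return aggregated


if __name__ == "__main__":
    # Two hypothetical hospitals with 1 and 3 local samples respectively.
    client_a = ({"w": [1.0, 2.0]}, 1)
    client_b = ({"w": [3.0, 4.0]}, 3)
    print(federated_average([client_a, client_b]))  # {'w': [2.5, 3.5]}
```

The aggregation itself is only a few multiplications and additions per parameter, which is consistent with the point that HPC matters more for the data transformation step than for combining model updates.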

The presentation also drew parallels between SYNTHEMA and GenoMed4All, detailing overlaps in data collection for the MDS/AML use cases, including demographic, clinical, and genomic variables. Furthermore, Silvia underscored the significance of data transformation, as well as the challenges in data collection and standardization. She concluded with an overview of the legal and ethical aspects, noting the similarities between the two projects but highlighting that SYNTHEMA, unlike GenoMed4All, aims to produce synthetic data in addition to disease discovery.