Impact Review

Health informatics
From digital information to patient impact: Connecting health data for research
Mark Avery, Director of Health Informatics at Eastern AHSN, explains how we are convening partners to facilitate the storage, processing and use of health and care information for research.
When the first fully assembled human genome sequence was published in 2001[1], it was clear that a new age in healthcare research had begun. Mapping the book of life was the first step in enabling us to understand how genetic variation shapes human health and disease. In less than 10 years, the time and cost of sequencing a genome was reduced by a factor of 1 million[2], and by 2021 about 30 million people had had their genomes sequenced[3]. However, all this valuable insight into what makes each of us unique takes up a lot of data-storage space: the storage demand for all of these sequenced genomes is estimated to be in the exabytes globally[4].
One exabyte (EB) is equal to one billion gigabytes (GB), or 10^18 bytes. With the average pop song around 3:30 in length, it would take you about 167,000 years to listen to an exabyte of music.
For researchers dealing with enormous volumes of data, secure access is only one part of the challenge. That data needs to be arranged, manipulated and analysed, often in combination with other datasets, to be useful for research. Many research institutions and data providers have their own trusted research environments (TREs), which are secure spaces for researchers to access and analyse sensitive data. However, these TREs are often not compatible with one another, which limits the research that can be done. By building a bridge between research environments, we can enable researchers to leave very large datasets in situ and analyse them remotely while ensuring people's privacy is protected, pulling only the results into a secure and trustworthy research environment. This process of getting two or more distinct databases to act as one is known as federation.
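To make the idea of federation more concrete, here is a minimal Python sketch of the general pattern, under simplifying assumptions: each site (standing in for a TRE) runs the same query against its own records and releases only an aggregate count, and a coordinator sums whatever is released. The `Site` and `federated_count` names, the record fields and the small-count suppression threshold of 10 are hypothetical illustrations, not the interface of any platform mentioned in this article; real TRE federation also involves authentication, query approval and output checking.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical, heavily simplified patient record held at one site."""
    age: int
    has_condition: bool


Query = Callable[[Record], bool]


class Site:
    """One data holder (e.g. a TRE). Its raw records never leave this object."""

    def __init__(self, name: str, records: Iterable[Record]) -> None:
        self.name = name
        self._records = list(records)

    def count(self, query: Query, min_cell_size: int = 10) -> Optional[int]:
        """Run the query locally and release only an aggregate count.

        Counts below min_cell_size are suppressed (returned as None), an
        assumed disclosure-control rule to avoid exposing small groups.
        """
        n = sum(1 for r in self._records if query(r))
        return n if n >= min_cell_size else None


def federated_count(
    sites: Iterable[Site], query: Query, min_cell_size: int = 10
) -> Tuple[int, Dict[str, Optional[int]]]:
    """Send the same query to every site and combine the released counts.

    Only per-site aggregates reach the coordinator; the pooled total
    covers just the sites whose counts were not suppressed.
    """
    per_site = {s.name: s.count(query, min_cell_size) for s in sites}
    total = sum(n for n in per_site.values() if n is not None)
    return total, per_site


# Illustrative synthetic data: two sites that never share raw records.
site_a = Site("site_a", [Record(age=20 + i % 50, has_condition=i % 3 == 0) for i in range(90)])
site_b = Site("site_b", [Record(age=30, has_condition=i < 4) for i in range(40)])

total, per_site = federated_count([site_a, site_b], lambda r: r.has_condition)
print(total, per_site)  # 30 {'site_a': 30, 'site_b': None}
```

In this example site_b's count of 4 falls below the assumed threshold, so it is withheld and the pooled total reflects site_a only. A production system would need an explicit policy for reporting such partial results, and would typically vet queries before they run rather than accept arbitrary functions.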
This technology has the potential to enable the medical research community to conduct more research, faster, by removing the geographical, logistical and governance barriers associated with moving exceptionally large datasets. It could also substantially increase both the volume and diversity of data available for health research, which is why Eastern AHSN has been a key partner in multiple data informatics projects this year to support this enabling infrastructure.
Gut Reaction
Eastern AHSN is a key partner in Gut Reaction, the Health Data Research Hub for inflammatory bowel disease (IBD). The project builds on the high-quality phenotypic and genomic data in the NIHR IBD BioResource by combining it with real-world data from participating NHS hospitals and the UK IBD Registry. Researchers can use extracts from these linked datasets to support important research into IBD. By March 2022, we had received 45 requests from researchers to use the Gut Reaction data, many of which have met the access criteria and been approved, enabling the researchers to begin their studies. Our role has focused on operational and programme management, coordinating activities across workstreams, and project governance, including reporting and assurance to our funders.
"Eastern AHSN are the linchpin that brings all the partners of the project together."
Involving patients in decision-making about how their data are used for patient or societal benefit has been central to the work of Gut Reaction. This year, we have worked closely with our Patient Advisory Committee to design and implement a new process for IBD data access applications. This approach will be reviewed after 12 months and, if successful, further refined and extended to all data access applications across the entire NIHR BioResource.
Over the past 12 months, by drawing on the experience of the partners comprising Gut Reaction and building on our successes, we have developed a sustainable model for data sharing for innovation in IBD. This means the hub will be able to continue beyond the funded period to help drive innovation. To find out more about Gut Reaction and our partners on this programme, visit our website.
Rosanna, Patient Partner for the Gut Reaction Health Data Research Hub for inflammatory bowel disease (IBD), shares her experience working with Eastern AHSN.
Enabling biomedical research through data
Professor Serena Nik-Zainal, NIHR Research Professor and Honorary Consultant in Clinical Genetics, University of Cambridge
The more we learn about the human genome, the more we see the potential for clinical research in this area. Yet sharing the data we generate across the University of Cambridge and our partner organisations is a challenge. The genomic medicine theme of the NIHR Cambridge Biomedical Research Centre (BRC) sought to improve clinical research data infrastructure, and we commissioned Eastern AHSN to project manage the building of a common data architecture so that genomic and other biological data could be effectively and safely shared across the Cambridge Biomedical Campus. For the resulting project, called CYNAPSE, Eastern AHSN scoped what would be needed, what it might cost and how it could be delivered. Having supported the procurement of a suitable software platform, the team is now working with the provider, Lifebit, to develop the features and capabilities required to make it all work for a small initial number of research groups; this will be scaled up over the coming months. Eastern AHSN's expertise in helping to find solutions for complex health data challenges has been instrumental to the project's success to date. This work has the potential to revolutionise research in Cambridge and beyond.
A proof of concept for federated genomics
Building on the CYNAPSE infrastructure and working with the University of Cambridge and Lifebit, we successfully secured funding from UK Research and Innovation (UKRI) as part of Phase 1 of the DARE UK (Data and Analytics Research Environments UK) programme, which is delivered in partnership with Health Data Research UK (HDR UK) and Administrative Data Research UK (ADR UK). Our federated genomics project will demonstrate that different clinical-genomic datasets, from CYNAPSE (the new data infrastructure for the Cambridge Biomedical Campus) and from Genomics England, can be analysed remotely by approved research partners. By enabling the datasets to be accessed remotely and simultaneously, this project could ultimately help the research community securely access and analyse larger, combined cohorts, realising the immense potential for collaboration in genomic research nationally and internationally. The sprint project is due to last only eight months, and we are on course to have a proof of concept in place and working by the end of this summer. The consortium for this DARE UK multi-party trusted research environment federation sprint project includes the University of Cambridge, NIHR Cambridge BRC, Genomics England, Eastern AHSN, Cambridge University Health Partners and Lifebit.
Professor Serena Nik-Zainal talks about the potential for research using different datasets without having to move the data.
Data federation and mental health
Dr Anna Moore, Principal Investigator for FAIR TREATMENT
Successfully combining large datasets for research has ramifications beyond genomic projects. We know that negative aspects of a young person's life can lead to poor mental health, and that providing support as early as possible can make problems easier to treat and prevent more severe problems later on.
Research indicates that it is possible to spot patterns in data from health, education and social care records to identify who needs this help early, but this is difficult when the information is held in different places. Furthermore, analysing datasets large enough to help us better identify young people with rare mental health conditions poses significant challenges. To meet these challenges, we partnered with Eastern AHSN to secure funding from UKRI for another sprint project as part of the DARE UK programme. The project is called Federated analytics and artificial intelligence research across trusted research environments for child and adolescent mental health (FAIR TREATMENT). We aim to break these silos and develop ways to analyse data between diverse geographical regions without creating large pools of data known as data lakes. By doing this, there is potential to identify relationships in socioeconomic data and support the development of AI tools for the early identification of possible mental health problems in young people. This isn't just about overcoming technical barriers; the information governance requirements are complex. Perhaps the most important aspect is understanding the views of the public about how we use their data. We have recruited a diverse panel of almost 100 young people, parents and guardians who are helping us to understand the opportunities and issues. They are working alongside the organisations contributing data, and legal and ethics experts, to develop best practice in how we use and share data for health research in a safe and ethical way.
We hope to have built the infrastructure and successfully shared a proof of concept by this summer. The DARE UK FAIR TREATMENT consortium includes the University of Cambridge (Departments of Psychiatry and Genetics), Eastern AHSN, InterMine, AIMES, Kaleidoscope, University of Birmingham, University of Essex, Anna Freud National Centre for Children and Families, Cambridgeshire County Council and Bitfount.
A digital first approach
We are working with our local integrated care systems (ICSs) to take a digital first approach as they develop their digital and data strategies and governance. We are also working across the region to build on the foundations laid during the pandemic to support remote monitoring of patients in care homes and virtual wards.
If you want to learn more about our work in health data research and informatics, contact us at mark.avery@eahsn.org or visit our website.
References
1. National Human Genome Research Institute. (2001). International Human Genome Sequencing Consortium Publishes Sequence and Analysis of the Human Genome. Available: https://www.genome.gov/sites/default/files/media/files/2021-02/2001_Press_Release_FC_notes.pdf. Last accessed 22/04/22.
2. Costa, F. (2012). Big Data in Genomics: Challenges and Solutions. G.I.T. Laboratory Journal. 11, 2-3.
3. Crespi, S. (2021). Looking back at 20 years of human genome sequencing. Available: https://www.science.org/content/podcast/looking-back-20-years-human-genomesequencing. Last accessed 22/04/22.
4. Stephens, Z.D., Lee, S.Y., Faghri, F., Campbell, R.H., Zhai, C., Efron, M.J., et al. (2015). Big Data: Astronomical or Genomical? PLoS Biol. 13 (7) 1