Welcome to the HDR UK National Phenomics Resource Project, funded by Health Data Research UK.

When patients interact with physicians, or are admitted into hospital, information is collected electronically on their symptoms, diagnoses, laboratory test results, and prescriptions. This information is stored securely in Electronic Health Records (EHR) and is a valuable resource for researchers and clinicians for improving health and healthcare. EHR are, however, of variable detail and quality and contain many inconsistencies. As a result, researchers and data providers spend considerable time creating complex computer programs to fix and statistically analyse the information in EHR and identify which patients have which disease. Currently, there is no means to share these tools across institutions in the UK, resulting in duplication of effort. Reproducibility of research is also hampered, as others do not have access to the precise methods and definitions used in a particular study. This project addresses these issues by creating an open resource for EHR users (researchers, clinicians, the NHS and data providers) to share their methods.

What is ‘phenomics’?

Phenomics refers to the science of deriving new knowledge for health by studying multiple conditions in new ways. This involves studying all currently recognized diseases, in so-called ‘phenome-wide’ approaches. To do this efficiently, phenomics approaches require the creation of computable definitions of diseases, health states and traits, including their temporal components (i.e. change and rate of change over time). Phenomics covers the full spectrum of health and disease across the entire life course and is relevant to a wide range of potential stakeholders and beneficiaries.

The challenge

A primary reason for using data from EHR is the creation of phenotype algorithms to identify disease status, onset and progression. Phenotyping (describing the characteristics of disease) however is challenging as the data are collected for different purposes, have variable data quality and often require significant harmonisation. While considerable effort goes into these algorithms, there is no consistent methodology for creating and evaluating them and no centralised repository for depositing and sharing them.

The solution

We will create a national platform for dissemination of citable algorithms (incl. validations) and tools which will reduce duplication of effort and improve research reproducibility. We will explore methods for creating computable representations of algorithms for integration into actionable analytics for healthcare. Finally, we will fundamentally shift the EHR cultural landscape by a robust incentivisation programme, providing guidelines on best practices, cross-disciplinary training, and ensuring alignment with other international initiatives.

Impact and outcomes

Through this project, we will deliver a fundamental step-change in the current EHR community in the UK by bringing together health data scientists, clinicians, computer scientists, public health experts and data curators under the FAIR principles (www.force11.org). The National Phenomics Resource will facilitate the dissemination and re-use of algorithms, tools and methods by the community. By establishing a national standard for creating, evaluating and representing phenotypes, we will accelerate the impact of discovery through increased transparency and replicability and maximise the usability and value of existing data repositories to new users. Finally, we will take the first steps towards establishing computational biomedical knowledge objects (e.g. guidelines with embedded phenotypes endorsed by NICE) which will enable the creation of actionable health analytics in the NHS.

About the resource

Objectives

  1. Scoping and Prototyping: To landscape existing national/international approaches for creating, validating and curating multimodal disease phenotypes; gather requirements through stakeholder engagement; define a phenotype presentation metadata standard, and deliver a prototype showcasing exemplars.

  2. Phenome Portal: To build and curate an online, open-access, standards-driven library of complex phenotypes enabling their dissemination, re-use, evaluation, and citation.

  3. Computable Phenotype Model and Tooling: To evaluate computable phenotype representation approaches and build data management tools for common UK EHR datasets.

  4. Training and Capacity Building: To develop and deliver cross-disciplinary training on phenotyping, reproducible science and scientific software development at undergraduate, postgraduate and continuing professional development (CPD) levels.

  5. Community and Engagement: To ignite and evolve the user community by incentivizing usage and on-going meaningful engagement across stakeholders.

Electronic health records

When patients interact with physicians, or are admitted into hospital, information is collected electronically on their symptoms, diagnoses, laboratory test results, and prescriptions and stored in Electronic Health Records (EHR). EHR are a valuable resource for researchers and clinicians as they provide comprehensive information about a patient's health and healthcare over long periods of time.

Phenotyping algorithms

A primary use-case for EHR is the creation of phenotyping algorithms used to identify disease status, onset and progression or extraction of information on risk factors or biomarkers. These complex algorithms can enable researchers to extract information from EHR, statistically analyze it and use the findings to improve human health. While considerable effort goes into creating these algorithms, there is no consistent methodology for creating and evaluating them and no centralised repository for depositing and sharing them.
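
To make the idea of a phenotyping algorithm concrete, the following is a minimal sketch of one very simple kind: a code list matched against coded events to derive disease status and an onset date per patient. It is an illustration only, not a definition from this resource. The Event record, the onset_dates function and the single-prefix code list are hypothetical, and real phenotypes typically combine several data sources and are clinically validated.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Event:
    """One coded entry in a patient's record (hypothetical, simplified schema)."""
    patient_id: str
    code: str
    event_date: date


# Illustrative code list only: ICD-10 codes beginning with E11 denote
# type 2 diabetes mellitus. A real phenotype would use a curated,
# validated list, often spanning several coding systems.
DIABETES_CODE_PREFIXES: Tuple[str, ...] = ("E11",)


def onset_dates(
    events: Iterable[Event],
    prefixes: Tuple[str, ...] = DIABETES_CODE_PREFIXES,
) -> Dict[str, date]:
    """Return the earliest matching event date for each patient.

    A patient appears in the result only if at least one of their
    events matches the code list, so the keys give disease status
    and the values give an estimate of onset.
    """
    onset: Dict[str, date] = {}
    for ev in events:
        if ev.code.startswith(prefixes):
            current = onset.get(ev.patient_id)
            if current is None or ev.event_date < current:
                onset[ev.patient_id] = ev.event_date
    return onset


# Small made-up example: p1 has two matching events, p2 has none.
example_events = [
    Event("p1", "E11.9", date(2016, 7, 4)),
    Event("p1", "E11.2", date(2015, 3, 1)),
    Event("p2", "I10", date(2014, 1, 20)),
]
example_onset = onset_dates(example_events)
```

Even in this toy form, the choices that make results hard to reproduce are visible: which codes are on the list, and whether onset means the first matching event or something stricter. Sharing such definitions in computable form is what allows another group to apply exactly the same rule.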