Welcome to the HDR UK National Phenomics Resource Project, a project funded by Health Data Research UK.

When patients interact with physicians or are admitted to hospital, information on their symptoms, diagnoses, laboratory test results and prescriptions is collected electronically. This information is stored securely in Electronic Health Records (EHR) and is a valuable resource for researchers and clinicians seeking to improve health and healthcare. EHRs, however, vary in detail and quality and contain many inconsistencies. As a result, researchers and data providers spend considerable time writing complex computer programs to fix and statistically analyse EHR data and to identify which patients have which disease. There is currently no means of sharing these tools across institutions in the UK, which results in duplicated effort. Reproducibility of research is also hampered because others do not have access to the precise methods and definitions used in a particular study. This project addresses these issues by creating an open resource for EHR users (researchers, clinicians, the NHS and data providers) to share their methods.

What is ‘phenomics’?

Phenomics refers to the science of deriving new knowledge for health by studying multiple conditions in new ways. This involves studying all currently recognised diseases, using so-called ‘phenome-wide’ approaches. To do this efficiently, phenomics approaches require computable definitions of diseases, health states and traits, including their temporal components (i.e. change and rate of change over time). Phenomics covers the full spectrum of health and disease across the entire life course and is relevant to a wide range of potential stakeholders and beneficiaries.
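As an illustration of what a computable temporal component can look like, the sketch below estimates the rate of change of a repeated laboratory measurement as an ordinary least-squares slope, in units per year. It is a minimal sketch under stated assumptions: the function name `rate_of_change_per_year`, its input format and the example values are hypothetical and are not taken from any Library phenotype.

```python
from datetime import date


def rate_of_change_per_year(measurements: list[tuple[date, float]]) -> float:
    """Estimate change per year as the ordinary least-squares slope of value on time.

    Hypothetical helper for illustration only; `measurements` is a list of
    (measurement date, value) pairs for a single patient.
    """
    if len(measurements) < 2:
        raise ValueError("need at least two measurements")
    measurements = sorted(measurements)  # order by date
    start = measurements[0][0]
    # Time since the first measurement, in years (365.25 days per year).
    xs = [(d - start).days / 365.25 for d, _ in measurements]
    ys = [value for _, value in measurements]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("measurements must span more than one date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical eGFR values (mL/min/1.73 m2) recorded roughly one year apart.
egfr = [(date(2020, 1, 1), 90.0), (date(2021, 1, 1), 87.0), (date(2022, 1, 1), 84.0)]
print(round(rate_of_change_per_year(egfr), 2))
```

A negative slope indicates decline; a phenotype definition could then apply a threshold to this value, although any such threshold would be specific to the condition and is not implied here.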

The challenge

A primary reason for using data from EHR is the creation of phenotype algorithms to identify disease status, onset and progression. Phenotyping (describing the characteristics of a disease) is, however, challenging: the data are collected for different purposes, vary in quality and often require significant harmonisation. Although considerable effort goes into these algorithms, there is no consistent methodology for creating and evaluating them and no centralised repository for depositing and sharing them.

The solution

We will create a national library for disseminating citable algorithms (including validations) and tools, which will reduce duplication of effort and improve research reproducibility. We will explore methods for creating computable representations of algorithms for integration into actionable analytics for healthcare. Finally, we will fundamentally shift the EHR cultural landscape through a robust incentivisation programme, guidelines on best practice, cross-disciplinary training and alignment with other international initiatives.

Impact and outcomes

Through this project, we will deliver a fundamental step-change in the UK EHR community by bringing together health data scientists, clinicians, computer scientists, public health experts and data curators under the FAIR principles (www.force11.org). The National Phenomics Resource will facilitate the dissemination and re-use of algorithms, tools and methods by the community. By establishing a national standard for creating, evaluating and representing phenotypes, we will accelerate the impact of discovery through increased transparency and replicability, and maximise the usability and value of existing data repositories to new users. Finally, we will take the first steps towards establishing computational biomedical knowledge objects (e.g. guidelines with embedded phenotypes endorsed by NICE), which will enable the creation of actionable health analytics in the NHS.

About the resource

The Library contains definitions for hundreds of diseases derived from structured and unstructured data sources, including mobile devices and wearables. For each phenotype, the Library curates its metadata, implementation details, code and validation information, enabling reproducible and transparent research by the wider research and clinical community. National research initiatives can curate and showcase their algorithms through bespoke “collection” pages, such as BREATHE (respiratory phenotypes collated by the HDR UK Hub for Respiratory Health for use in research) and the British Heart Foundation Data Science Centre phenotypes. Finally, the Phenotype Library is cross-linked to the HDR UK Innovation Gateway to enable discovery of tools and maximise the usability and value of existing data repositories to the research community.

Objectives

  1. Scoping and Prototyping: To map existing national and international approaches for creating, validating and curating multimodal disease phenotypes; gather requirements through stakeholder engagement; define a phenotype presentation metadata standard; and deliver a prototype showcasing exemplars.

  2. Phenome Portal: To build and curate an online, open-access, standards-driven library of complex phenotypes enabling their dissemination, re-use, evaluation, and citation.

  3. Computable Phenotype Model and Tooling: To evaluate computable phenotype representation approaches and build data management tools for common UK EHR datasets.

  4. Training and Capacity Building: To develop and deliver cross-disciplinary training on phenotyping, reproducible science and scientific software development at undergraduate, postgraduate and continuing professional development (CPD) levels.

  5. Community and Engagement: To ignite and evolve the user community by incentivising usage and ongoing, meaningful engagement across stakeholders.

Electronic health records

When patients interact with physicians or are admitted to hospital, information on their symptoms, diagnoses, laboratory test results and prescriptions is collected electronically and stored in Electronic Health Records (EHR). EHR are a valuable resource for researchers and clinicians because they provide comprehensive information about a patient's health and healthcare over long periods of time.

Phenotyping algorithms

A primary use case for EHR is the creation of phenotyping algorithms, which identify disease status, onset and progression, or extract information on risk factors or biomarkers. These complex algorithms enable researchers to extract information from EHR, analyse it statistically and use the findings to improve human health. While considerable effort goes into creating these algorithms, there is no consistent methodology for creating and evaluating them and no centralised repository for depositing and sharing them.
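To make this concrete, here is a minimal sketch of the simplest kind of rule-based phenotyping algorithm: a patient is classed as a case if any diagnosis code in their record matches a codelist, and onset is taken as the date of the earliest match. The `Record` type, the `phenotype_onset` function and the example records are hypothetical. The single-prefix codelist (ICD-10 codes beginning "E11", type 2 diabetes mellitus) is for illustration only; phenotypes deposited in the Library are generally more complex than this.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    """One coded diagnosis entry from a hypothetical EHR extract."""
    patient_id: str
    code: str
    event_date: date


# Illustrative codelist: ICD-10 codes beginning "E11" denote type 2 diabetes mellitus.
T2DM_PREFIXES = ("E11",)


def phenotype_onset(records: list[Record], patient_id: str,
                    prefixes: tuple[str, ...] = T2DM_PREFIXES) -> Optional[date]:
    """Return the earliest date a matching code was recorded for the patient, or None.

    A patient is a case if any of their records matches the codelist;
    the earliest matching date is taken as onset.
    """
    matches = [r.event_date for r in records
               if r.patient_id == patient_id and r.code.startswith(prefixes)]
    return min(matches) if matches else None


records = [
    Record("p1", "I10", date(2012, 1, 5)),
    Record("p1", "E11.9", date(2015, 3, 2)),
    Record("p1", "E11.65", date(2018, 7, 9)),
    Record("p2", "I10", date(2016, 4, 1)),
]
for pid in ("p1", "p2"):
    onset = phenotype_onset(records, pid)
    print(pid, "case" if onset is not None else "non-case", onset)
```

Case status follows directly from onset: a patient with no matching code returns `None`. Keeping the codelist as a separate, named value mirrors the idea of sharing the definition independently of any one dataset.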

Key principles

  • The Library stores phenotyping algorithms, metadata and tools only. No data are stored in the Library.
  • Ideally, phenotypes deposited in the Library will have undergone some form of peer review to assess validity and quality, either through peer-reviewed publication or some other means of sharing the definition(s).
  • Phenotype definitions will be assigned a unique Digital Object Identifier (DOI) to facilitate identification of the phenotype.
  • All material deposited in the Library remains the intellectual property of the research group that created the phenotype(s). By default, it is made available under the Creative Commons Attribution 4.0 licence (CC BY 4.0).
  • Users should cite the Phenotype Library in all publications, presentations and reports as follows: “HDR UK CALIBER Phenotype Library https://portal.caliberresearch.org/”.
  • The aim of the Library is not to standardise or harmonise disease definitions. Several phenotypes may therefore be stored for the same condition, and the onus is on individual researchers to decide which phenotypes they wish to use.

Future plans

Short term: we are currently expanding the Phenotype Library to include a number of new collections (the HDR UK Hubs for critical care (PIONEER) and cancer (DATA-CAN), ClinicalCodes.org and UK Biobank). Longer term: we are taking the first steps towards establishing computational biomedical knowledge objects (e.g. guidelines with embedded phenotypes endorsed by NICE), which will enable the creation of actionable health analytics in the NHS.