CD3 (Cancer Data Driven Detection) is a new, multidisciplinary and multi-institutional strategic national research programme dedicated to using data to transform our understanding of cancer risk and enable early interception of cancers.
It represents a major, multi-million-pound flagship investment funded through a strategic programme award by Cancer Research UK, the National Institute for Health and Care Research (NIHR) and the Engineering and Physical Sciences Research Council (EPSRC), in partnership with Health Data Research UK (HDR UK) and the Economic and Social Research Council’s Administrative Data Research UK programme (ADR UK).
Image representing AI and Big Data. Credit: BrianPenny (Pixabay)
About CD3
Background
Patients diagnosed early have the best chance of curative treatment and long-term survival. However, only 55% of cancers in England, for example, are currently detected early (defined as stage 1 or 2). There is an urgent need for a paradigm shift in our ability to accurately detect and diagnose cancer at an early stage to transform health outcomes.
The Cancer Research UK Early Detection and Diagnosis of Cancer Roadmap highlights that we currently lack a deep understanding of who is most at risk of developing cancer. There is significant scope to stratify the population more effectively based on risk. Increasing volumes of data are being collected and collated on individuals and populations. This presents a timely opportunity to develop methodologies that link these datasets and enable access to them and, through advanced analytical approaches including artificial intelligence and biostatistics, to identify and validate novel risk factors that improve multifactorial cancer risk assessment. By determining cancer risk more accurately, we can more precisely target efforts in early detection, diagnosis and prevention.
Cancer Data Driven Detection
Firmly guided by clinical and domain expertise and by partnership with patients and the public, the CD3 programme will greatly improve our understanding of who is most at risk of developing cancer and, hence, advance our ability to prevent, detect and diagnose cancers early in ways that are highly attentive to equity. This will be achieved by building an open and inclusive network of multidisciplinary researchers devoted to capitalising on the UK’s population-scale multimodal electronic health record, administrative and cohort infrastructure, and its distinctive strengths in cancer genomics, epidemiology and advanced analytics.
The CD3 initiative will:
- Enable access to multimodal data resources following FAIR (Findable, Accessible, Interoperable, Reusable) principles, including electronic health record (EHR), administrative and genomic/lifestyle data from research cohorts, enriched through novel cancer-related data linkages, while identifying gaps where these data may not be representative of, or equally serve, the entire population
- Build UK-wide multidisciplinary expertise and capacity in cancer data science and analytics
- Develop and apply state-of-the-art advanced analytics, including Artificial Intelligence (AI)/Machine Learning (ML)-based approaches, for identifying novel cancer risk factors and developing and evaluating cancer risk prediction models
- Build multifactorial models with clinical validity and utility to improve our ability to predict cancer risk for asymptomatic and symptomatic populations
- Foster partnerships with patients, the public, practitioners, clinical teams, policy makers, and key initiatives and infrastructures across the health data science ecosystem to support acceptable and implementable data-driven cancer risk prediction and cancer detection in routine front-line healthcare
Approach
Two workstreams, focusing on (a) data and (b) analytics, will be guided by three driver programmes aimed at:
- Developing multimodal, multicancer risk prediction models for asymptomatic individuals, estimating both overall and cancer-site-specific risk to inform prevention and early detection strategies.
- Using linked cancer-screening data to develop dynamic risk assessment models for breast and colorectal cancer, including cancer subtypes and risk of progression from precursor lesions.
- Developing static and dynamic risk models to guide multicancer detection tests and primary care-led investigations and referrals for symptomatic patients.
Cross-cutting themes will ensure all activities are patient- and public-centred, that ethical/legal considerations are anticipated and addressed, and that the tools and risk models developed are optimised for clinical use, promote equity, monitor inequalities, and have a lasting impact.
Expected Impact
CD3 will enhance cancer risk prediction and stratification for both asymptomatic and symptomatic populations, enabling enhanced screening, diagnostics, and prevention approaches to be allocated to the individuals most likely to benefit. By identifying high-risk population subgroups, CD3 will inform public health policy and help address health inequalities. It will benefit researchers through a legacy of data accessibility, advanced analytics, and cancer risk prediction tools, and will build cancer data science capacity. It will foster partnerships with patients, the public, practitioners, and key health data initiatives, supporting the implementation of data-driven cancer risk prediction and detection in routine healthcare.