The Challenge
Circulating tumor DNA (ctDNA) has shown increasing promise for detecting minimal residual disease (MRD) and predicting the survival outcome of cancer patients. Next-generation sequencing (NGS) of liquid biopsies has allowed for increased precision in the detection of ctDNA.
Previous methods highlight the utility of single nucleotide variants (SNVs) as patient-specific biomarkers in detecting MRD. Phased variants (PVs), clusters of SNVs present on the same read pair, have shown promise as biomarkers with lower background noise and higher sensitivity and specificity than individual SNVs. ProCogia was asked to develop a Python pipeline that identifies patient-specific PVs from pre-treatment biopsies and evaluates their presence in post-treatment samples as evidence of ctDNA.
ProCogia’s Approach
- The client had built a proof-of-concept (POC) pipeline in Python in early 2020 but development was paused until late 2021.
- ProCogia was first tasked with evaluating the POC pipeline using serial dilution samples and real patient samples:
Our analysis showed the POC pipeline had increased sensitivity and specificity in detecting ctDNA compared to an existing pipeline that relied solely on SNVs.
- ProCogia was then tasked with developing a new PV pipeline in Python that would incorporate many new features and build on a recent publication that showcased the utility of PVs for ctDNA detection.
- The POC pipeline relied on output VCF files from separate software for identifying patient-specific PVs:
To remove the reliance on this software, ProCogia developed a new module that identifies PVs directly from the read alignment (SAM/BAM) file using Python/Pysam.
- Additional features that ProCogia has implemented:
Filter SNVs by positional base quality to reduce false positive calls.
Discard potential germline SNVs by comparing allelic frequencies to matched-normal samples.
Discard PVs that do not overlap target regions (defined in BED format) using a binary search tree.
Discard artifactual PVs that are present in matched-normal samples.
Identify and track the number of unique DNA molecules supporting PVs to estimate tumor fraction.
- Ongoing implementation:
A module to fine-tune parameters of the pipeline on a set of training samples.
By implementing Python class objects to store SNV and PV data, the time for identifying PVs was reduced by an order of magnitude.
A module to perform Monte Carlo sampling of the data to evaluate background noise and estimate p-values for each PV.
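The target-region filter listed above can be illustrated with a minimal sketch. It is not the client pipeline's code: instead of an explicit binary search tree, it uses binary search over sorted, merged intervals via Python's standard `bisect` module, which gives the same logarithmic lookup. The names `build_index`, `overlaps_target`, and `filter_pvs`, the `(chrom, start, end)` tuple layout, and the example coordinates are all hypothetical; intervals are assumed to be 0-based and half-open, as in BED.

```python
from bisect import bisect_right


def build_index(regions):
    """Sort and merge half-open (start, end) target intervals for one chromosome."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching interval: extend the previous one
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts = [s for s, _ in merged]
    ends = [e for _, e in merged]
    return starts, ends


def overlaps_target(index, pv_start, pv_end):
    """Return True if half-open [pv_start, pv_end) overlaps any target interval."""
    starts, ends = index
    # Binary search for the last merged interval that starts before pv_end
    i = bisect_right(starts, pv_end - 1) - 1
    return i >= 0 and ends[i] > pv_start


def filter_pvs(pvs, targets):
    """Keep only PVs (chrom, start, end) that overlap a target region."""
    return [
        pv for pv in pvs
        if pv[0] in targets and overlaps_target(targets[pv[0]], pv[1], pv[2])
    ]


# Hypothetical target regions and PV spans, for illustration only
targets = {"chr1": build_index([(100, 200), (150, 300), (1000, 1100)])}
pvs = [
    ("chr1", 290, 310),
    ("chr1", 300, 400),
    ("chr1", 50, 100),
    ("chr1", 1050, 1060),
    ("chr2", 10, 20),
]
kept = filter_pvs(pvs, targets)
print(kept)
```

Merging the intervals first is what makes a single binary search sufficient: once the targets are disjoint and sorted, only the last interval starting before the PV's end can overlap it.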
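The Monte Carlo module mentioned above is described only at a high level, so the following is a sketch under stated assumptions rather than the pipeline's method. It assumes a simple null model in which each read pair covering a PV shows that PV by chance with a fixed probability (`background_rate`), and estimates an empirical p-value as the share of simulations with at least as many supporting read pairs as observed. The function name, parameters, and example values are hypothetical.

```python
import random


def empirical_p_value(observed_support, depth, background_rate, n_sim=10_000, seed=0):
    """Estimate P(support >= observed) under an assumed background-noise model.

    observed_support: read pairs supporting the PV in the sample
    depth:            read pairs covering the PV loci
    background_rate:  assumed per-read-pair chance of seeing the PV by error
    """
    rng = random.Random(seed)  # seeded for reproducibility
    at_least_as_extreme = 0
    for _ in range(n_sim):
        # Simulate how many of `depth` read pairs show the PV by chance alone
        simulated = sum(rng.random() < background_rate for _ in range(depth))
        if simulated >= observed_support:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero
    return (at_least_as_extreme + 1) / (n_sim + 1)


# Hypothetical example values, for illustration only
p = empirical_p_value(observed_support=4, depth=200, background_rate=0.001, n_sim=2000)
print(p)
```

In practice the background rate would come from data such as matched-normal or control samples, not a fixed constant; the add-one correction is a common choice so that a finite number of simulations never reports a p-value of exactly zero.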
The Results
- We delivered a stable, fully unit-tested and documented PV pipeline with new features that can improve sensitivity and specificity in ctDNA detection.
- Lab samples were used to confirm that real-world data produced results similar to the PV pipeline’s Monte Carlo simulation results.
- The PV pipeline was then optimized for processing ctDNA samples, achieving an 85% reduction in run time.
- ProCogia continues to upskill the client’s internal team and apply best practices for developing and unit testing Python code.
Services Used
Data Consultancy
We provide Data Consultancy to help organizations optimize their investment in people, processes, and technology.
Data Science
Using a blend of mathematics, software tools, business intelligence, and algorithms, we can draw insights and patterns from your raw data, allowing you to make intelligent data-driven decisions.
Bioinformatics
We deliver scientific results that drive clinical and translational research decisions. Our Bioinformatics team has extensive experience designing, optimizing, executing and analyzing pre-clinical and clinical research projects using next-generation sequencing technologies.