Develop pipeline that utilizes phased variants to detect circulating tumor DNA in cancer patients
Company Information
Detecting circulating tumor DNA (ctDNA) with high sensitivity and specificity is crucial for early cancer diagnosis and monitoring. A client had started a proof-of-concept (POC) pipeline in Python that uses phased variants (PVs) for ctDNA detection but needed expertise to evaluate and enhance it for practical application. ProCogia was brought on board to refine and expand the POC pipeline, drawing on its expertise in data science, bioinformatics, and data consultancy to build a robust, feature-rich PV pipeline.
The Challenge
The existing POC pipeline demonstrated potential but required significant enhancements to meet clinical needs. Key challenges included improving the pipeline’s sensitivity and specificity, removing reliance on external software for PV identification, and incorporating new features to reduce false positives and accurately estimate tumor fractions. Additionally, the pipeline needed to be optimized for processing efficiency and equipped with tools for rigorous evaluation using real patient samples.
ProCogia’s Approach
Evaluation of POC
The client had built a proof-of-concept (POC) pipeline in Python in early 2020, but development was paused until late 2021. ProCogia was first tasked with evaluating the POC pipeline using serial dilution samples and real patient samples. Our analysis showed that the POC pipeline detected ctDNA with higher sensitivity and specificity than an existing pipeline that relied solely on single-nucleotide variants (SNVs).
Develop PV Pipeline
ProCogia was then tasked with developing a new PV pipeline in Python that would incorporate many new features and build on a recent publication showcasing the utility of PVs for ctDNA detection.
Develop New Module
The POC pipeline relied on VCF files output by a separate software tool to identify patient-specific PVs. To remove this dependency, ProCogia developed a new module that identifies PVs directly from the read alignment (SAM/BAM) file using Python and Pysam.
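As a rough illustration of the idea, and not the client's actual module, the sketch below pairs SNVs whose alternate alleles are observed together on the same read. Reads are modeled as plain dictionaries mapping reference position to observed base; in a real pipeline those would presumably be built from Pysam alignment records. The names `Snv` and `find_phased_variants`, and the `max_distance` and `min_reads` thresholds, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Snv:
    """A single-nucleotide variant (hypothetical container for this sketch)."""
    chrom: str
    pos: int  # 0-based reference position
    ref: str
    alt: str


def find_phased_variants(reads, snvs, max_distance=150, min_reads=2):
    """Return SNV pairs whose alt alleles co-occur on at least `min_reads` reads.

    `reads` is an iterable of (chrom, {ref_pos: base}) tuples.
    The result maps (snv_a, snv_b) to the number of supporting reads.
    """
    by_chrom = {}
    for snv in snvs:
        by_chrom.setdefault(snv.chrom, []).append(snv)

    support = Counter()
    for chrom, bases in reads:
        # SNVs for which this read carries the alternate allele.
        hits = sorted(
            (s for s in by_chrom.get(chrom, []) if bases.get(s.pos) == s.alt),
            key=lambda s: s.pos,
        )
        # Every pair of nearby alt alleles on one read is a candidate PV.
        for a, b in combinations(hits, 2):
            if b.pos - a.pos <= max_distance:
                support[(a, b)] += 1

    return {pair: n for pair, n in support.items() if n >= min_reads}
```

Storing each SNV as a small immutable object makes it usable as a dictionary key, which keeps the per-read bookkeeping simple.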
Additional Features
Filter SNVs by positional base quality to reduce false positive calls.
Discard potential germline SNVs by comparing allelic frequencies to matched-normal samples.
Discard PVs that do not overlap target regions (defined in BED format) by using a binary search tree algorithm.
Discard artifactual PVs that are present in matched-normal samples.
Identify and track the number of unique DNA molecules supporting PVs to estimate tumor fraction.
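The target-region filter can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the delivered code: where the text mentions a binary search tree, this sketch swaps in binary search over sorted interval lists using Python's `bisect` module, and it assumes the target intervals on each chromosome do not overlap one another and are 0-based and half-open, as in BED. The function names are hypothetical.

```python
from bisect import bisect_left


def build_index(bed_records):
    """Turn (chrom, start, end) records into sorted per-chromosome lists."""
    grouped = {}
    for chrom, start, end in bed_records:
        grouped.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in grouped.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        ends = [e for _, e in intervals]
        index[chrom] = (starts, ends)
    return index


def overlaps_target(index, chrom, pv_start, pv_end):
    """True if the half-open span [pv_start, pv_end) overlaps a target.

    Assumes the intervals on each chromosome do not overlap one another,
    so only the last interval starting before pv_end needs checking.
    """
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_left(starts, pv_end)  # count of intervals starting before pv_end
    return i > 0 and ends[i - 1] > pv_start
```

Each lookup takes logarithmic time in the number of target intervals on the chromosome, which matters when many candidate PVs are checked against a large panel.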
Ongoing implementation:
A module to fine-tune the pipeline’s parameters on a set of training samples.
A module to perform Monte Carlo sampling of the data to evaluate background noise and estimate p-values for each PV.
Implementing Python class objects to store SNV and PV data reduced the time for identifying PVs by an order of magnitude.
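To illustrate the Monte Carlo idea in general terms, the sketch below estimates an empirical p-value for one PV by repeatedly simulating how many reads would support it by chance. Everything specific here is an assumption made for illustration: the binomial background model, the `error_rate` parameter, and the function name are hypothetical and are not taken from the client's pipeline.

```python
import random


def monte_carlo_p_value(observed, depth, error_rate, n_sim=10_000, seed=0):
    """Empirical p-value: share of simulated background counts >= observed.

    Each simulation draws `depth` reads and counts how many support the PV
    purely by chance, each with probability `error_rate` (assumed model).
    """
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_sim):
        simulated = sum(rng.random() < error_rate for _ in range(depth))
        if simulated >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_sim + 1)
```

The smallest p-value this approach can report is 1 / (`n_sim` + 1), so the number of simulations sets the resolution of the estimate.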
The Results
We delivered a stable, fully unit-tested and documented PV pipeline with new features that improve sensitivity and specificity in ctDNA detection.
Lab samples were used to confirm that real-world data produced results similar to those from the PV pipeline’s Monte Carlo simulations.
The PV pipeline was then optimized, reducing the run time for processing ctDNA samples by 85%.
ProCogia continues to upskill the client’s internal team and apply best practices for developing and unit testing Python code.
Services Used
Data Consultancy
We provide Data Consultancy to help organizations optimize their investment in people, processes, and technology.
Data Science
Using a blend of mathematics, software tools, business intelligence, and algorithms, we can draw insights and patterns from your raw data, allowing you to make intelligent data-driven decisions.
Bioinformatics
We deliver scientific results that drive clinical and translational research decisions. Our Bioinformatics team has extensive experience designing, optimizing, executing and analyzing pre-clinical and clinical research projects using next-generation sequencing technologies.