Develop pipeline that utilizes phased variants to detect circulating tumor DNA in cancer patients


Life Sciences


The Challenge

The existing proof-of-concept (POC) pipeline demonstrated potential but required significant enhancements to meet clinical needs. Key challenges included improving the pipeline's sensitivity and specificity, removing its reliance on external software for identifying phased variants (PVs), and adding new features to reduce false positives and accurately estimate tumor fraction. Here, a PV is a set of two or more somatic mutations found together on the same DNA molecule. Additionally, the pipeline needed to be optimized for processing efficiency and equipped with tools for rigorous evaluation using real patient samples.

ProCogia's Approach

Evaluation of POC

The client had built a proof-of-concept (POC) pipeline in Python in early 2020, but development was paused until late 2021. ProCogia was first tasked with evaluating the POC pipeline using serial dilution samples and real patient samples. Our analysis showed that the POC pipeline had higher sensitivity and specificity in detecting circulating tumor DNA (ctDNA) than an existing pipeline that relied solely on single-nucleotide variants (SNVs).

Develop PV Pipeline

ProCogia was then tasked with developing a new PV pipeline in Python that would incorporate many new features and build on a recent publication that showcased the utility of PVs for ctDNA detection.

Develop New Module

The POC pipeline relied on VCF files produced by separate software to identify patient-specific PVs. To remove this dependency, ProCogia developed a new module that identifies PVs directly from the read alignment (SAM/BAM) file using Python and pysam.
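The module itself is not shown in this case study. As a minimal sketch, assuming the patient's candidate SNVs are already known and stored in a hypothetical `{(chrom, position): alt_base}` dictionary, PV candidates can be found by walking each read's aligned bases and recording every pair of alt alleles seen on the same read. The function name `find_phased_variants` and its `min_reads` threshold are illustrative, not the client's code. It relies only on the `reference_name`, `query_sequence` and `get_aligned_pairs(matches_only=True)` members of pysam's `AlignedSegment`, so any object exposing those will work.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple

# Hypothetical input format: (chrom, 0-based reference position) -> alt base
SnvTable = Dict[Tuple[str, int], str]


def find_phased_variants(reads: Iterable, snvs: SnvTable,
                         min_reads: int = 2) -> Dict[Tuple[str, int, int], int]:
    """Count reads carrying the alt allele at two candidate SNVs at once.

    `reads` may be any iterable of objects exposing the pysam.AlignedSegment
    members used here: reference_name, query_sequence and
    get_aligned_pairs(matches_only=True).
    """
    support: Counter = Counter()
    for read in reads:
        seq = read.query_sequence
        if seq is None:  # e.g. secondary alignments stored without sequence
            continue
        alt_hits = []
        # (query_pos, ref_pos) pairs for aligned bases only (no indels/clips)
        for qpos, rpos in read.get_aligned_pairs(matches_only=True):
            alt = snvs.get((read.reference_name, rpos))
            if alt is not None and seq[qpos] == alt:
                alt_hits.append(rpos)
        # Every pair of alt alleles on the same read is a PV candidate
        for left, right in combinations(sorted(alt_hits), 2):
            support[(read.reference_name, left, right)] += 1
    # Keep only PV candidates seen on at least `min_reads` reads
    return {pv: n for pv, n in support.items() if n >= min_reads}
```

With pysam, `reads` would typically be `bam.fetch(contig, start, end)` on an open `pysam.AlignmentFile`. This sketch considers only pairs of SNVs and counts reads rather than unique molecules; a production module would also need to handle mate pairs, PCR duplicates and PVs spanning three or more SNVs.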

Additional Features

Filter SNVs by positional base quality to reduce false-positive calls.

Discard potential germline SNVs by comparing their allele frequencies with those in matched-normal samples.

Discard PVs that do not overlap target regions (defined in BED format), using a binary search tree algorithm.

Discard artifactual PVs that are present in matched-normal samples.

Identify and track the number of unique DNA molecules supporting each PV to estimate tumor fraction.
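The base-quality, germline and matched-normal artifact filters could look roughly like the sketch below. The thresholds (`min_base_qual=30`, `min_alt_reads=2`, `max_normal_af=0.05`), the dictionary-based inputs and the function names are assumptions for illustration; the case study does not state the values or data structures the pipeline uses. "Positional base quality" is interpreted here as the Phred quality of each alt-supporting base at the SNV position.

```python
from typing import Dict, List, Set, Tuple

Site = Tuple[str, int]      # hypothetical key: (chrom, 0-based position)
Pv = Tuple[str, int, int]   # hypothetical key: (chrom, left_pos, right_pos)


def filter_snvs(alt_quals: Dict[Site, List[int]], depths: Dict[Site, int],
                normal_af: Dict[Site, float], min_base_qual: int = 30,
                min_alt_reads: int = 2,
                max_normal_af: float = 0.05) -> Dict[Site, float]:
    """Return {site: tumor allele frequency} for SNVs passing both filters."""
    kept: Dict[Site, float] = {}
    for site, quals in alt_quals.items():
        # Base-quality filter: count only alt bases with Phred >= min_base_qual
        good = sum(1 for q in quals if q >= min_base_qual)
        if good < min_alt_reads:
            continue
        # Germline filter: a variant seen at appreciable frequency in the
        # matched normal is likely inherited rather than tumor-derived
        if normal_af.get(site, 0.0) > max_normal_af:
            continue
        # Allele frequency based on high-quality alt bases only
        kept[site] = good / depths[site]
    return kept


def drop_artifactual_pvs(tumor_pvs: Dict[Pv, int],
                         normal_pvs: Set[Pv]) -> Dict[Pv, int]:
    """Discard PVs that also appear in the matched-normal sample."""
    return {pv: n for pv, n in tumor_pvs.items() if pv not in normal_pvs}
```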

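For the target-region filter, the case study mentions a binary search tree. The sketch below swaps in a closely related technique: BED intervals are sorted and merged per chromosome, and Python's standard `bisect` module performs a binary search over the start coordinates. Treating a PV as on-target when the span between its two SNV positions intersects any interval is an assumption, as are the function names.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Targets = Dict[str, Tuple[List[int], List[int]]]  # chrom -> (starts, ends)


def load_bed(lines: Iterable[str]) -> Targets:
    """Parse BED lines into per-chromosome, sorted, merged intervals."""
    raw: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        raw[chrom].append((int(start), int(end)))
    merged: Targets = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= out[-1][1]:  # overlapping or touching half-open intervals
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def pv_overlaps_targets(targets: Targets, pv: Tuple[str, int, int]) -> bool:
    """True if the span [left, right] of a PV hits any half-open BED interval."""
    chrom, left, right = pv  # 0-based SNV positions, left <= right
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    # Last interval starting at or before `right`; with disjoint sorted
    # intervals it also has the largest end among all such candidates
    i = bisect.bisect_right(starts, right) - 1
    return i >= 0 and ends[i] > left
```

Because merged intervals on a chromosome are disjoint and sorted, their end coordinates also increase, so only one candidate interval needs checking and each lookup is O(log n).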
Ongoing implementation:

A module to fine-tune pipeline parameters on a set of training samples.

Python classes to store SNV and PV data, which reduced the time needed to identify PVs by an order of magnitude.

A module that performs Monte Carlo sampling of the data to evaluate background noise and estimate a p-value for each PV.
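The case study does not describe the classes it introduced. A minimal sketch, assuming hypothetical `Snv` and `PhasedVariant` dataclasses, shows one way to store SNV and PV data while tracking unique supporting molecules: each PV keeps a set of molecule keys, so PCR duplicates sharing a key are counted once. The molecule key format used in the example (chromosome, fragment start, fragment end, UMI) and the helper `record_support` are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Set, Tuple


@dataclass(frozen=True)
class Snv:
    """Hypothetical SNV record; frozen so it is hashable and usable as a key."""
    chrom: str
    pos: int  # 0-based reference position
    ref: str
    alt: str


@dataclass
class PhasedVariant:
    """Hypothetical PV record tracking the distinct molecules that support it."""
    snvs: Tuple[Snv, ...]
    molecules: Set[Hashable] = field(default_factory=set)

    @property
    def unique_molecules(self) -> int:
        return len(self.molecules)


def record_support(index: Dict[Tuple[Snv, ...], PhasedVariant],
                   snvs: Tuple[Snv, ...],
                   molecule_key: Hashable) -> PhasedVariant:
    """Add one supporting observation; repeated molecule keys count once."""
    # Canonical order so (a, b) and (b, a) map to the same PV
    key = tuple(sorted(snvs, key=lambda s: (s.chrom, s.pos)))
    pv = index.get(key)
    if pv is None:
        pv = index[key] = PhasedVariant(key)
    pv.molecules.add(molecule_key)
    return pv
```

Keying a dictionary by the sorted tuple of SNVs gives constant-time lookup of an existing PV. This is one plausible way such classes speed up PV identification, not necessarily the design the client used.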

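The Monte Carlo module is described only at a high level. The sketch below assumes a deliberately simple null model that the case study does not state: each informative molecule independently shows every alt allele of a PV through background error, with probability equal to the product of hypothetical per-position error rates. Repeatedly simulating the number of error-supported molecules gives an empirical p-value for the observed count. The function name and parameters are illustrative.

```python
import math
import random
from typing import Sequence


def monte_carlo_pvalue(observed: int, informative: int,
                       error_rates: Sequence[float],
                       n_sims: int = 10_000, seed: int = 0) -> float:
    """Empirical p-value that background noise alone yields >= `observed`
    PV-supporting molecules out of `informative` molecules.

    Null model (assumption): each molecule independently carries all alt
    alleles of the PV by error, with probability prod(error_rates).
    """
    p_null = math.prod(error_rates)
    rng = random.Random(seed)  # seeded for reproducible results
    exceed = 0
    for _ in range(n_sims):
        # One simulated sample: count molecules hit by error at every SNV
        simulated = sum(rng.random() < p_null for _ in range(informative))
        if simulated >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from an impossible p = 0
    return (exceed + 1) / (n_sims + 1)
```

The add-one correction is standard for empirical p-values. The inner loop is easy to read but slow for large molecule counts; a vectorized binomial draw (for example with NumPy) would be the natural optimization.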
The Results

We delivered a stable, fully unit-tested and documented PV pipeline with new features that improve sensitivity and specificity in ctDNA detection.

Lab samples were used to confirm that real-world data produced results similar to those of the PV pipeline's Monte Carlo simulations.

The PV pipeline was then optimized for processing efficiency, reducing the run time for ctDNA sample identification by 85%.

ProCogia continues to upskill the client's internal team and to apply best practices for developing and unit testing Python code.

Services Used

Data Consultancy

We provide Data Consultancy to help organizations optimize their investment in people, processes, and technology.

Data Science

Using a blend of mathematics, software tools, business intelligence, and algorithms, we can draw insights and patterns from your raw data, allowing you to make intelligent data-driven decisions.

Bioinformatics

We deliver scientific results that drive clinical and translational research decisions. Our Bioinformatics team has extensive experience designing, optimizing, executing and analyzing pre-clinical and clinical research projects using next-generation sequencing technologies.

Author

Bill Carney