The Challenge


ProCogia’s Approach


  • The client had built a proof-of-concept (POC) pipeline in Python in early 2020, but development was paused until late 2021.
  • ProCogia was first tasked with evaluating the POC pipeline using serial dilution samples and real patient samples:
    Our analysis showed that the POC pipeline had higher sensitivity and specificity in detecting circulating tumor DNA (ctDNA) than an existing pipeline that relied solely on single-nucleotide variants (SNVs).
  • ProCogia was then tasked with developing a new phased variant (PV) pipeline in Python that would incorporate many new features and build on a recent publication showcasing the utility of PVs for ctDNA detection.
  • The POC pipeline relied on output VCF files from a separate software tool to identify patient-specific PVs:
    To remove this dependency, ProCogia developed a new module that identifies PVs directly from the read alignment (SAM/BAM) files using Python/Pysam.
  • Additional features that ProCogia has implemented: 

    Filter SNVs by positional base quality to reduce false positive calls.

    Discard potential germline SNVs by comparing allelic frequencies to matched-normal samples.

    Discard PVs that do not overlap target regions (defined in BED format), using a binary search tree for fast interval lookup.

    Discard artifactual PVs that are present in matched-normal samples.

    Identify and track the number of unique DNA molecules supporting PVs to estimate tumor fraction.

  • Ongoing implementation: 

    A module to fine-tune pipeline parameters on a set of training samples.

    Storing SNV and PV data in Python class objects has already reduced the time for identifying PVs by an order of magnitude.

    A module to perform Monte Carlo sampling of the data to evaluate background noise and estimate p-values for each PV.
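To illustrate the idea behind identifying PVs directly from aligned reads with a base-quality filter, here is a minimal sketch. It assumes a hypothetical, simplified read representation: one (reference position, read base, base quality) tuple per aligned base, plus a small reference dictionary. In a real pysam-based module these values could be derived from `read.get_aligned_pairs(matches_only=True)`, `read.query_sequence` and `read.query_qualities`. The names `Read`, `passing_snvs`, `count_phased_variants` and the default quality cut-off of 30 are illustrative assumptions, not the client's code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical simplified read: one (reference position, read base, base quality)
# tuple per aligned base.
AlignedBase = Tuple[int, str, int]
Snv = Tuple[int, str]  # (reference position, alternate base)


@dataclass
class Read:
    name: str
    aligned: List[AlignedBase]


def passing_snvs(read: Read, reference: Dict[int, str], min_base_quality: int = 30) -> List[Snv]:
    """Return mismatches to the reference whose base quality passes the threshold."""
    snvs = []
    for pos, base, qual in read.aligned:
        ref_base = reference.get(pos)
        if ref_base is None or base == ref_base:
            continue  # outside the known reference window, or a reference match
        if qual < min_base_quality:
            continue  # low-quality base: more likely a sequencing error than a true SNV
        snvs.append((pos, base))
    return sorted(snvs)


def count_phased_variants(
    reads: Iterable[Read], reference: Dict[int, str], min_base_quality: int = 30
) -> Counter:
    """Count reads carrying the same combination of two or more SNVs (a phased variant)."""
    counts: Counter = Counter()
    for read in reads:
        snvs = passing_snvs(read, reference, min_base_quality)
        if len(snvs) >= 2:
            counts[tuple(snvs)] += 1
    return counts


reference = {100: "A", 101: "C", 102: "G", 103: "T", 104: "A"}
reads = [
    Read("r1", [(100, "A", 35), (101, "T", 35), (102, "G", 35), (103, "C", 35)]),
    Read("r2", [(101, "T", 36), (102, "G", 36), (103, "C", 36), (104, "A", 36)]),
    # The mismatch at 101 has low quality, so only one SNV passes: no PV from this read.
    Read("r3", [(100, "A", 35), (101, "T", 12), (102, "G", 35), (103, "C", 35)]),
]
counts = count_phased_variants(reads, reference)
print(counts)  # Counter({((101, 'T'), (103, 'C')): 2})
```

This sketch treats each read independently and keys a PV by its full SNV combination; a production module would also need to merge read pairs into fragments.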
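The two matched-normal filters (germline SNVs and artifactual PVs) can be sketched together. This is a minimal illustration under stated assumptions: PVs are keyed as tuples of (position, alt base) SNVs, the normal sample is summarized as hypothetical per-SNV alt counts and per-position depths, and the `max_normal_af` cut-off of 0.1 is an arbitrary illustrative value. Dropping any PV that contains a likely germline SNV is one possible policy, not necessarily the pipeline's.

```python
from typing import Dict, Mapping, Set, Tuple

Snv = Tuple[int, str]            # (reference position, alternate base)
PhasedVariant = Tuple[Snv, ...]  # two or more SNVs seen on the same molecule


def normal_allele_frequency(snv: Snv, alt_counts: Mapping[Snv, int], depths: Mapping[int, int]) -> float:
    """Allele frequency of an SNV in the matched-normal sample (0.0 if the site is uncovered)."""
    depth = depths.get(snv[0], 0)
    return alt_counts.get(snv, 0) / depth if depth else 0.0


def filter_with_matched_normal(
    tumor_pvs: Mapping[PhasedVariant, int],
    normal_pvs: Set[PhasedVariant],
    normal_alt_counts: Mapping[Snv, int],
    normal_depths: Mapping[int, int],
    max_normal_af: float = 0.1,  # hypothetical cut-off; heterozygous germline SNVs sit near 0.5
) -> Dict[PhasedVariant, int]:
    """Drop PVs that look germline or artifactual based on the matched-normal sample."""
    kept = {}
    for pv, support in tumor_pvs.items():
        if pv in normal_pvs:
            continue  # same phased combination also observed in the normal: likely an artifact
        if any(normal_allele_frequency(snv, normal_alt_counts, normal_depths) >= max_normal_af for snv in pv):
            continue  # contains an SNV that is common in the normal: likely germline
        kept[pv] = support
    return kept


tumor_pvs = {
    ((101, "T"), (103, "C")): 5,
    ((200, "G"), (210, "A")): 3,  # 200G has AF 0.48 in the normal -> germline
    ((300, "C"), (305, "T")): 4,  # also phased in the normal -> artifact
}
normal_pvs = {((300, "C"), (305, "T"))}
normal_alt_counts = {(200, "G"): 48}
normal_depths = {101: 100, 103: 100, 200: 100, 210: 100}
kept = filter_with_matched_normal(tumor_pvs, normal_pvs, normal_alt_counts, normal_depths)
print(kept)  # {((101, 'T'), (103, 'C')): 5}
```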
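The target-region filter needs a fast "does this PV overlap any BED interval?" query. The approach above names a binary search tree; the sketch below swaps in a closely related technique, binary search over sorted, merged interval arrays using Python's `bisect` module, which gives the same O(log n) lookup. It assumes BED's 0-based, half-open coordinates and PV positions in the same coordinate system; `TargetRegions` and `pv_overlaps_targets` are hypothetical names.

```python
import bisect
from collections import defaultdict
from typing import Iterable, Tuple


class TargetRegions:
    """Sorted, merged BED intervals per chromosome, queried with binary search."""

    def __init__(self, bed_lines: Iterable[str]):
        raw = defaultdict(list)
        for line in bed_lines:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue  # skip blank, comment and header lines
            chrom, start, end = line.split()[:3]
            raw[chrom].append((int(start), int(end)))
        self._starts = {}
        self._ends = {}
        for chrom, intervals in raw.items():
            merged = []
            for start, end in sorted(intervals):
                if merged and start <= merged[-1][1]:
                    merged[-1][1] = max(merged[-1][1], end)  # merge overlapping/adjacent intervals
                else:
                    merged.append([start, end])
            self._starts[chrom] = [s for s, _ in merged]
            self._ends[chrom] = [e for _, e in merged]

    def overlaps(self, chrom: str, start: int, end: int) -> bool:
        """True if the half-open query [start, end) overlaps any target on chrom."""
        starts = self._starts.get(chrom)
        if not starts:
            return False
        # Last interval starting before the query end; after merging it has the largest end.
        i = bisect.bisect_left(starts, end) - 1
        return i >= 0 and self._ends[chrom][i] > start


def pv_overlaps_targets(targets: TargetRegions, chrom: str, pv: Tuple[Tuple[int, str], ...]) -> bool:
    """Query the span from the first to the last SNV of a PV (0-based positions)."""
    positions = [pos for pos, _ in pv]
    return targets.overlaps(chrom, min(positions), max(positions) + 1)


targets = TargetRegions(["chr1\t100\t200\n", "chr1\t150\t250\n", "chr2\t1000\t1100\n"])
print(pv_overlaps_targets(targets, "chr1", ((240, "T"), (245, "C"))))  # True
print(pv_overlaps_targets(targets, "chr1", ((260, "A"), (270, "G"))))  # False
```

Merging overlapping intervals first is what makes a single binary search sufficient: once intervals are disjoint, their end coordinates are sorted too.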
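Counting unique DNA molecules, rather than raw reads, keeps PCR duplicates from inflating PV support. A minimal sketch follows, assuming each supporting read carries a hypothetical unique molecular identifier (UMI) and fragment coordinates, so reads sharing all three collapse to one molecule. The `naive_pv_fraction` ratio is only an illustrative assumption of how molecule counts might feed a tumor-fraction estimate; it is not the pipeline's estimator.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping


@dataclass(frozen=True)
class SupportingRead:
    pv_id: str       # hypothetical identifier for the phased variant the read supports
    umi: str         # hypothetical unique molecular identifier (UMI) sequence
    frag_start: int  # fragment start coordinate
    frag_end: int    # fragment end coordinate


def unique_molecules_per_pv(reads: Iterable[SupportingRead]) -> Dict[str, int]:
    """Collapse PCR duplicates: reads sharing UMI and fragment coordinates count as one molecule."""
    molecules = defaultdict(set)
    for read in reads:
        molecules[read.pv_id].add((read.umi, read.frag_start, read.frag_end))
    return {pv_id: len(keys) for pv_id, keys in molecules.items()}


def naive_pv_fraction(supporting: Mapping[str, int], covering: Mapping[str, int]) -> float:
    """Unique PV-supporting molecules divided by unique molecules covering all PV loci."""
    total = sum(covering.values())
    hits = sum(supporting.get(pv_id, 0) for pv_id in covering)
    return hits / total if total else 0.0


reads = [
    SupportingRead("pv1", "AACG", 1000, 1160),
    SupportingRead("pv1", "AACG", 1000, 1160),  # PCR duplicate of the read above
    SupportingRead("pv1", "TTGC", 995, 1150),
    SupportingRead("pv2", "GGAT", 2000, 2170),
]
supporting = unique_molecules_per_pv(reads)
print(supporting)  # {'pv1': 2, 'pv2': 1}
covering = {"pv1": 400, "pv2": 350, "pv3": 250}  # hypothetical unique depths at each PV locus
print(naive_pv_fraction(supporting, covering))
```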

The Results


  • We delivered a stable, fully unit-tested and documented PV pipeline with new features that improve sensitivity and specificity in ctDNA detection.
  • Lab samples were used to confirm that real-world data produced PV pipeline results similar to those obtained from Monte Carlo simulations.
  • The PV pipeline was then optimized for processing ctDNA samples, reducing run time by 85%.
  • ProCogia continues to upskill the client’s internal team and apply best practices for developing and unit testing Python code.
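The Monte Carlo approach to background noise and per-PV p-values can be sketched as follows. This is a minimal illustration under a loudly stated assumption: a noise-only model in which each SNV of a PV arises independently on a molecule through a sequencing error to the specific alt base with probability `error_rate / 3`. The function simulates how many of `n_molecules` would show the PV by chance and reports the add-one-corrected fraction of simulations at least as extreme as the observed count. The function name, parameters and noise model are hypothetical, not the client's implementation.

```python
import random


def monte_carlo_p_value(
    observed: int,
    n_molecules: int,
    error_rate: float,
    n_snvs: int,
    n_simulations: int = 10_000,
    seed: int = 0,
) -> float:
    """Estimate P(noise alone yields >= observed PV-supporting molecules)."""
    # Hypothetical noise model: every SNV in the PV must appear on the same molecule
    # via an independent error to the specific alt base (probability error_rate / 3 each).
    p_pv = (error_rate / 3) ** n_snvs
    rng = random.Random(seed)  # seeded for reproducible estimates
    hits = 0
    for _ in range(n_simulations):
        count = sum(1 for _ in range(n_molecules) if rng.random() < p_pv)
        if count >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_simulations + 1)


# Three supporting molecules out of 500 is very unlikely under 1% per-base noise.
print(monte_carlo_p_value(observed=3, n_molecules=500, error_rate=0.01, n_snvs=2, n_simulations=1000))
```

Requiring two or more SNVs on the same molecule is what drives the noise probability down geometrically, which is the intuition for why PVs can outperform single SNVs at low tumor fractions.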

