The Challenge


Procogia’s Approach


  • Our team applied data cleaning/wrangling to handle missing value, and low-quality data using pandas, PySpark and Dask.
  • We applied data management of a large dataset of human genetics variants using PySpark and Hadoop cluster.
  • Machine Learning (ML) techniques were used for different classification tasks such as SVM, Random Forest, and logistic regression.
  • We used data visualization tools such as Tableau to visualize the data in a compelling and easy-to-digest way.

The Results


  • Clinicians and scientists received more insights to assist them in making better decisions about medical treatments and underlying conditions.
  • Improved predictions about patient’s disease status with encouraging accuracies (80%), precision and recall (F1 score).
  • Improved operation efficiency of the hospital by predicting a patient’s length of stay. Hospitals can identify patients with a high length of stay risk at the time of admission. As a result, these patients can have their treatment plan optimized to minimize the length of stay, and it can aid logistics such as room and bed allocation planning.



python 4
tableau logo
r 2

Related Blogs


Let’s Connect