The Challenge
Our client wanted to track and monitor more data points from individual tugboat operations. ProCogia were engaged to surface parameters including distance travelled, engine RPM (revolutions per minute), fuel consumption rates, journey duration, and GPS tracking. The challenge was to reprocess the telemetry data and move that processing out of Power BI so that reports could render responsively. The project was further complicated by inconsistent data formatting (much of it was XML), which required considerable data wrangling.
ProCogia’s Approach
- The client had asked ProCogia to help replace their existing vehicle telemetry solution, which persisted data in an SQL database. This solution was to be deprecated in favour of one that aligned with their existing Data Platform, where data was stored using the Delta format.
- ProCogia built a new Azure Data Factory pipeline to ingest the raw XML-formatted telemetry data and convert it to a tabular form (Parquet files).
- The telemetry comprised vessel sensor readings, including engine RPM, fuel burn rates and GPS locations, amongst others.
- The pipeline employed an Azure Function App written in Python/Pandas. Because each row of XML delivered a different timestamped sensor reading, data processing was non-trivial and called for both time-based grouping and pivoting.
- Pandas was chosen because it provided a natural and easy way to wrangle the XML data. Furthermore, when deployed as a Function App, Azure offers a low-cost, pay-per-use serverless solution that suits batch-based ingestion pipelines.
- A PySpark script, deployed to a Databricks cluster, completed the curation process by persisting the data as Delta-formatted Parquet files. ProCogia demonstrated industry best practices by developing and unit testing the PySpark code in a local development environment. Until then, all Spark code had been built interactively in notebooks on a running cluster.
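The time-based grouping and pivoting step can be illustrated with a minimal sketch. It is not the code from the project: the XML layout (`<reading>` elements with `vessel`, `sensor`, `ts` and `value` attributes), the one-minute bucket size and the function name `pivot_readings` are all assumptions made for illustration. It uses only the Python standard library so that it runs without dependencies; in Pandas the same reshape would typically be a timestamp floor followed by `pivot_table`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

# Hypothetical XML layout: one <reading> element per timestamped sensor value.
SAMPLE_XML = """
<telemetry>
  <reading vessel="TUG-01" sensor="rpm" ts="2023-01-01T08:00:05" value="1200"/>
  <reading vessel="TUG-01" sensor="fuel_rate" ts="2023-01-01T08:00:20" value="85.5"/>
  <reading vessel="TUG-01" sensor="rpm" ts="2023-01-01T08:00:40" value="1300"/>
  <reading vessel="TUG-01" sensor="rpm" ts="2023-01-01T08:01:10" value="1250"/>
  <reading vessel="TUG-01" sensor="fuel_rate" ts="2023-01-01T08:01:30" value="90.0"/>
</telemetry>
"""


def pivot_readings(xml_text):
    """Group readings into one-minute buckets and pivot sensors into columns."""
    root = ET.fromstring(xml_text.strip())

    # (vessel, minute) -> sensor name -> values observed within that minute
    buckets = defaultdict(lambda: defaultdict(list))
    for reading in root.iter("reading"):
        ts = datetime.fromisoformat(reading.get("ts"))
        minute = ts.replace(second=0, microsecond=0)  # time-based grouping key
        key = (reading.get("vessel"), minute)
        buckets[key][reading.get("sensor")].append(float(reading.get("value")))

    # Pivot: one output row per (vessel, minute), one column per sensor,
    # averaging when a sensor reports more than once in the same minute.
    rows = []
    for vessel, minute in sorted(buckets):
        row = {"vessel": vessel, "minute": minute.isoformat()}
        for sensor, values in buckets[(vessel, minute)].items():
            row[sensor] = sum(values) / len(values)
        rows.append(row)
    return rows


rows = pivot_readings(SAMPLE_XML)
for row in rows:
    print(row)
```

Each output row is then a flat record with one column per sensor, which is the tabular shape that can be written out as Parquet.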
The Results
- ProCogia were able to build a new telemetry ingestion mechanism from the ground up that delivered data straight into the client’s Data Lake hosted in Azure, making it available alongside other operational data.
- Using Databricks, we were able to materialize results from complex joins, preparing metrics for rapid and easy consumption at the presentation layer.
- Our solution helped the client gain a deeper understanding of their KPIs, including distance travelled, engine RPM (revolutions per minute), fuel consumption rates, journey duration and GPS tracking.
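As an illustration of how a KPI such as distance travelled might be derived from GPS fixes, here is a minimal sketch. It assumes distance is computed by summing great-circle (haversine) distances between consecutive positions; the case study does not state which method was used, and the function names `haversine_km` and `track_distance_km` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def track_distance_km(points):
    """Sum the leg distances along an ordered list of (lat, lon) GPS fixes."""
    return sum(haversine_km(*a, *b) for a, b in zip(points, points[1:]))


# Assumed sample track: three fixes one degree of latitude apart on the same meridian.
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(round(track_distance_km(track), 1))
```

In practice, noisy GPS fixes usually need filtering before summing, otherwise jitter while a vessel is stationary inflates the total.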
Services Used
Data Engineering
We partner with all major cloud providers, allowing us to adopt a data-agnostic approach focused on delivering tailored game-changing solutions.
Data Consultancy
We provide data consultancy to help organizations optimize their investment in people, processes, and technology. This is typically delivered through data strategy engagements, roadmaps, transformations, and independent technology advice.