- The client had asked ProCogia to help replace their existing vehicle telemetry solution, which persisted data in an SQL database. This solution was to be deprecated in favour of one that aligned with their existing Data Platform, where data was stored using the Delta format.
- ProCogia built a new Azure Data Factory pipeline to ingest the raw XML-formatted telemetry data and convert it to a tabular (Parquet) form.
- The telemetry comprised vessel sensor readings, including engine RPM, fuel burn rates and GPS locations, among others.
- The pipeline employed an Azure Function app, again written in Python/Pandas. Because each row of XML delivered a different timestamped sensor reading, data processing was non-trivial and called for both time-based grouping and pivoting.
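The grouping-and-pivoting step can be sketched with Pandas. The XML shape, tag names and one-minute bucketing below are hypothetical simplifications, not the client's actual schema:

```python
import io
import pandas as pd

# Hypothetical, simplified telemetry: one XML row per timestamped sensor reading.
xml_data = """<telemetry>
  <reading><timestamp>2023-01-01T00:00:05</timestamp><sensor>engine_rpm</sensor><value>1200</value></reading>
  <reading><timestamp>2023-01-01T00:00:07</timestamp><sensor>fuel_rate</sensor><value>35.5</value></reading>
  <reading><timestamp>2023-01-01T00:01:02</timestamp><sensor>engine_rpm</sensor><value>1250</value></reading>
  <reading><timestamp>2023-01-01T00:01:09</timestamp><sensor>fuel_rate</sensor><value>36.0</value></reading>
</telemetry>"""

# Parse the XML rows into a long-format DataFrame.
df = pd.read_xml(io.StringIO(xml_data), xpath=".//reading", parser="etree")
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Group readings into one-minute buckets, then pivot so each sensor
# becomes its own column: one tabular row per time bucket.
wide = (
    df.assign(bucket=df["timestamp"].dt.floor("1min"))
      .pivot_table(index="bucket", columns="sensor", values="value", aggfunc="mean")
      .reset_index()
)

# The real pipeline would then persist the result, e.g.
# wide.to_parquet("telemetry.parquet")
```

Pivoting to a wide layout is what makes the irregular per-reading XML rows usable as a conventional table downstream.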
- Pandas was chosen as it provided a natural, straightforward way to wrangle the XML data. Furthermore, when deployed as a Function app, the code runs on Azure's low-cost, pay-per-use serverless platform, well suited to batch-based ingestion pipelines.
- A PySpark script, deployed to a Databricks cluster, completed the curation process by persisting the data as Delta-formatted Parquet files. Until then, all of the client's Spark code had been built interactively in notebooks on a running cluster; ProCogia instead demonstrated industry best practice by developing and unit testing the PySpark code in a local development environment.
- ProCogia rebuilt the telemetry ingestion mechanism from the ground up, delivering data straight into the client’s Azure-hosted Data Lake and making it available alongside other operational data.
- Using Databricks, we materialized the results of complex joins, preparing metrics for rapid and easy consumption at the presentation layer.
- Our solution helped the client gain a deeper understanding of their KPIs, including distance travelled, engine RPM (revolutions per minute), fuel consumption rates, journey duration and GPS tracking.