Portal Data Quality Checking

Summary

Tidied and wrangled a year's worth of freeway detector data from Portal, then detected data quality issues in it. A key issue was missing data. Here, "missing data" means that no record exists in the system for a time when one should. This can occur for many reasons, but it typically happens when a detector is malfunctioning and records nothing.

To find missing data, I generated the sequence of all timestamps that should appear in the dataset, joined it to the original data, and flagged expected timestamps that had no matching record. I also applied flags based on six other criteria, e.g., records with negative speeds. Once the flags were applied, I aggregated the data to daily counts of each of the seven flag types for each detector. I then built a quick report in Power BI showing when and where missing data occurred on the freeways.
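
The steps above can be sketched in base R on made-up data. This is an illustration under stated assumptions, not the project's actual code: the column names (detector_id, timestamp, speed), the 15-minute interval, and the toy values are all hypothetical, and only two of the seven flags are shown.

```r
# Toy observations (hypothetical). Detector 1 has no 00:30 record;
# detector 2 has one record with a negative speed.
observed <- data.frame(
  detector_id = c(1, 1, 1, 2, 2, 2, 2),
  timestamp = as.POSIXct(
    c("2021-01-01 00:00:00", "2021-01-01 00:15:00", "2021-01-01 00:45:00",
      "2021-01-01 00:00:00", "2021-01-01 00:15:00", "2021-01-01 00:30:00",
      "2021-01-01 00:45:00"),
    tz = "UTC"
  ),
  speed = c(55, 60, 58, 62, -5, 61, 59)
)
# Marker column so a real record with an NA speed is not mistaken for a gap.
observed$present <- TRUE

# Every timestamp that should exist (15-minute interval is an assumption).
expected_times <- seq(
  from = as.POSIXct("2021-01-01 00:00:00", tz = "UTC"),
  to   = as.POSIXct("2021-01-01 00:45:00", tz = "UTC"),
  by   = "15 min"
)

# One expected row per detector per timestamp.
expected <- expand.grid(
  detector_id = unique(observed$detector_id),
  timestamp   = expected_times
)

# Left join: expected rows with no observed match get NA in `present`.
joined <- merge(expected, observed,
                by = c("detector_id", "timestamp"), all.x = TRUE)

# Apply flags (two of the seven, for illustration).
joined$flag_missing <- is.na(joined$present)
joined$flag_negative_speed <- !joined$flag_missing &
  !is.na(joined$speed) & joined$speed < 0

# Aggregate to daily counts of each flag per detector.
joined$date <- as.Date(joined$timestamp, tz = "UTC")
daily <- aggregate(
  cbind(flag_missing, flag_negative_speed) ~ detector_id + date,
  data = joined,
  FUN  = sum
)
print(daily)
```

The join starts from the expected rows rather than the observed ones, so a gap shows up as an explicit row that can be counted instead of as an absence. A table shaped like daily (one row per detector per day, one column per flag) is the kind of input a reporting tool such as Power BI can chart directly.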

Languages/tools

  • R
  • Power BI