Why are Logistics Carriers so Bad with Data?

Just to be clear when I say Logistic Carriers I mean UPS, DHL, DSV, GLS and many other three letter companies that deliver your Christmas presents, but in a business context they are also responsible for B2B deliveries of goods to shops and warehouses.

I’ve previously worked in Logistics for many years and so have some insight into integrations with Logistic Carriers and the picture is not pretty.

Now I’m working in consultancy and it’s my task to gather invoices from different carrier, process the data and present it in the form of an interactive dashboard.

Sounds easy, just use APIs to get the data, clean, reshape and present. If only πŸ™„

First up no carriers offers APIs for invoicing data. Unbelievable I know. They have APIs for tracking, label printing, updating shipping info, rate calculations, everything else but not invoicing. You have to log in to a web portal, search, select your files and download. Welcome to 1995.

It gets worse from there.


  • Provides PDF or CSV files πŸ‘
  • The CSV has 250 columns!
  • The services are provided as row items so to process data you need to pivot as one package can have many services applied
  • There are 70 different service charges!!!!
  • The service charges appear in the local language as plain text with no service codes making joining multiple files a nightmare
  • Dimensions appear as text fields like 24.0X 12.0X 8.0. Thank god for regexes.
  • Weights and quantities only appear on lines associated with actual freight costs
  • Sometimes data is just missing, orders, tracking numbers, sender, you name it.

Royal Mail

  • Provides CSV files that looks to be sourced from the original PDF
  • CSV files contain a totals row and subtotals along with other extra lines 😬
  • Charges are split across multiple lines (since the PDF does the same)
  • PDF has extra line charges not included in the CSV
  • If you download a single CSV file it’s delimited with a pipe ( | ) but if you download multiple files you get a consolidated file that’s comma delimited 😲
  • Developer site appears to be permanently down (at least it has been for the last two weeks)


  • Excel files available πŸ‘ but they look exactly like you’d expect a PDF invoice to look like 🀯
  • Excel files are still in xls format (the newer xlsx format was introduced in 2007, 13 years ago, this gives you an insight into their systems)
  • Only contains monetary values, no tracking info, no quantities of packages, individual weights etc.
  • Every line item is a summary line that’s immediately duplicated as a sub-total underneath
  • Random cells are merged and formatting makes reading data very challenging
  • Basically they’ve designed an invoice in Excel that generates the PDF they wish to send to their customers

On a side note I recently saw a job advertisement for DSV that specifically asks for strong Excel and VBA skills. Job descriptions are a great way to get an insight into a company and their systems.

What should have been a straightforward integration and data presentation exercise has tuned into a complex workflow requiring web automation, file conversion and extensive data cleaning before we can even start to look at the data.

Environmental Considerations

Moving forward companies will also have an increased demand from their investors to measure and set targets for CO2 emissions.

As things stand today most companies don’t know how their products get from A to B since the carriers are left to optimize their distributions for cost/deadlines with little consideration for the environment. Expect this to change. They will need to offer ‘Least impact’ options to customers and also provide the data to enable companies to report on their distribution environmental impact.

On the parcel tracking side there’s been lots of startup activity creating aggregators that act as a single source of data. Maybe we need that for invoicing as well πŸ€”

Leave a comment

Your email address will not be published. Required fields are marked *