Common Data Mistakes

Wrong way, if you see these mistakes turn back!
Turn Back!

This isn’t meant an an exhaustive list but just the most common errors I’ve seen over the years.

  • Taking the average of values that are already an average
  • Trying to join database tables on fields with different data types
  • Processing data you will later discard, do filtering early on
  • Using averages when mean or mode is better (the average person has 1,9999 legs)
  • Forgetting seasonality in data
  • Accidentally filtering out data by incorrect combining of AND/OR statements
  • Errors caused by fixed length string in databases (often space padded like ‘DataPoint ‘)
  • Accidentally missing data by being provided data from a source that already has hidden filters applied
  • Losing data when joining tables in SQL due to missing data in one table (use outer joins)
  • Poor data quality (countries stored as UK, GB, England etc. in the same field, phone number with 00 and + prefixes)
  • Null data breaking report filters and SQL where statements
  • Misusing null in Boolean fields (1 is true, 0 is false and null is unknown)
  • Storing data as generic data types (dates as text, numbers as text, Boolean as integers and so on)
  • Storing the same data point in different tables with different data types
  • Being inconsistent in naming conventions (my_table, MyTable), pick one and stick to it
  • Using cryptic names for fields instead of names that make sense (max_temp is OK m_tmp is ambiguous)

Leave a comment

Your email address will not be published. Required fields are marked *