Data Audits: Know Your Data

Managing and maintaining data maturity requires deep knowledge of your data. That seems obvious, but merely looking at it and nodding your head doth not knowledge bring. Periodic data audits are an excellent way to understand whether or not your data is being used. You may discover workarounds that avoid compliance or that embed secret codes only a single market uses. These clutter up the database, and part of your job as a data steward is to make sure that the data fulfills the needs of its users. The first task of any data audit is to listen to your key stakeholders.

Understanding the data stakeholders

Getting to know your data stakeholders isn’t about policing or reviewing their data habits. The goal is to better understand and interpret what they need from a machine. In the next chapter you will learn about machine learning and artificial intelligence, both of which rely on humans to program the machine. By understanding the needs of your stakeholders, you can bridge the gap between your users and the developers and data scientists who will configure the systems to help people do their jobs well.

Identifying where the data is stored

Discover where your data is stored. It might be in the cloud; it might also sit in an on-premises data repository. Map out the data so that you are aware of where it is, how it is represented, and where it travels. If data is stored in multiple places, do you have a communications plan in place for change requests or updates that might occur? In a mature data environment, a centralized shared taxonomy is often used for tagging assets and content to help keep all dependent systems in sync automatically.

Conducting a data health check within the DAM

Conduct regular health checks in your DAM to ensure data accuracy. Check the system's analysis of which search terms are used, which metadata elements are populated, and which are usually left blank. Here is a quick checklist of what else to explore:

  • Missing data or null fields. Is important metadata missing in a way that hinders search and retrieval? Metadata fields such as Asset or Content Type and Format are essential descriptive data and critical for future automation. Discover why the fields are empty and try to find a way to remedy the situation.
  • Incorrectly formatted data, such as wrong date notation. In a global company, you’ll find different data standards that may prove confusing. Date notations, for example, differ between the US and the UK: 03/04/2023 means March 4 in the US and 3 April in the UK. These differences must be acknowledged and converted to a common standard such as the ISO 8601 date format, YYYY-MM-DD.
  • Data generated by bots. Many bots are helpful, such as those that collect data through a contact form on your website, or chatbots that help users find what they are looking for and store automated responses. However, other bots are malicious and can hack into your data and exploit it. Be aware of the bots that are in play in your organization and be on the lookout for any anomalies. Contact your data scientist or IT representative immediately if you have any suspicions.
  • Wrong data through human error, such as keying errors, misspellings, or omitted words. To err is human. When you discover assets that are consistently tagged wrongly, consider using the taxonomy to map common misspellings or acronyms as synonyms. Alternatively, you can program the DAM to suggest corrections for these errors. Such errors will surface in regular data audits.
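
Several items on this checklist can be partly automated against an export of your metadata. The following is a minimal sketch in Python, not a feature of any particular DAM. The field names (asset_type, format, created), the synonym map, and the day_first flag are all assumptions made for illustration; a real system will have its own schema and export format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical required fields; substitute the ones your DAM schema defines.
REQUIRED_FIELDS = ("asset_type", "format", "created")

# Hypothetical synonym map: common misspellings or acronyms -> taxonomy term.
SYNONYMS = {"phtograph": "photograph", "img": "image"}


def missing_fields(record):
    """Return the required fields that are absent, None, or blank."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


def blank_field_counts(records):
    """Count how often each required field is left blank across all records."""
    counts = Counter()
    for record in records:
        counts.update(missing_fields(record))
    return counts


def to_iso_date(value, day_first):
    """Convert a date string to YYYY-MM-DD, or return None if unrecognized.

    day_first states the convention of the source market (True for UK-style
    DD/MM/YYYY, False for US-style MM/DD/YYYY). It has to be supplied by the
    caller because a value such as 03/04/2023 is ambiguous on its own.
    """
    slash_format = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    for fmt in ("%Y-%m-%d", slash_format):
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_term(term):
    """Map a known misspelling or acronym to its taxonomy term."""
    cleaned = term.strip().lower()
    return SYNONYMS.get(cleaned, cleaned)


if __name__ == "__main__":
    sample = [
        {"asset_type": "image", "format": "", "created": "2023-04-03"},
        {"asset_type": None, "format": "", "created": "03/04/2023"},
    ]
    print(blank_field_counts(sample))
    print(to_iso_date("03/04/2023", day_first=True))
    print(normalize_term("Phtograph"))
```

Dates that match neither notation come back as None rather than being guessed at, so they can be listed for manual review. The same applies to terms missing from the synonym map, which pass through unchanged apart from lowercasing.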