In the last article, “Data Governance Tooling — Data Catalogue Value Case”, we explored the value case for a data catalogue, including some examples of the benefits. One such benefit is a glossary.
Simply put, a glossary is a collection of terms used within an organisation. Defining them helps ensure that everyone is speaking the same language (yes, I’m looking at you, Finance). Data fluency is an article for another time, but we couldn’t talk about a glossary without at least acknowledging it. Briefly, data fluency (previously known as data literacy, until it was pointed out that calling someone illiterate may not be the best move) refers to the ability of the business to understand its data estate, be it the measures used in reports, data terms such as ‘ingestion pipelines’, business terminology like ‘Constant Currency’, or even the ability to interpret its data through self-serve analytics.
The data glossary is a critical component for improving data fluency, but it’s also a core part of any data catalogue activity. Linking glossary terms to metric and schema definitions is a valuable exercise. Here is an example:
Returned Units vs LY — Calculates the quantity of returned units by product vs the same period last year.
Related glossary terms:
Returned Units — The number of units returned to store, represented as whole units.
LY (Last Year) - The current year minus one, displayed as “YYYY”.
These are just examples of how your glossary can define terms, like the two above, which are then associated with your data attributes.
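To make the link between a metric and its glossary terms a little more concrete, here is a minimal sketch in Python. It is purely illustrative and not taken from any particular catalogue tool: the class names (GlossaryTerm, Metric) and the helper functions (last_year, describe) are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class GlossaryTerm:
    """A single glossary entry (hypothetical structure for illustration)."""
    name: str
    definition: str


@dataclass
class Metric:
    """A data attribute or measure, linked to the glossary terms it relies on."""
    name: str
    description: str
    terms: list[GlossaryTerm] = field(default_factory=list)


def last_year(today: date) -> str:
    """LY as defined in the glossary: the current year minus one, as YYYY."""
    return f"{today.year - 1:04d}"


def describe(metric: Metric) -> str:
    """Render a metric followed by its related glossary terms, one per line."""
    lines = [f"{metric.name}: {metric.description}"]
    for term in metric.terms:
        lines.append(f"  {term.name}: {term.definition}")
    return "\n".join(lines)


# The two glossary terms from the example above.
returned_units = GlossaryTerm(
    name="Returned Units",
    definition="The number of units returned to store, represented as whole units.",
)
ly = GlossaryTerm(
    name="LY (Last Year)",
    definition="The current year minus one, displayed as YYYY.",
)

# The metric, associated with the terms that explain it.
returned_units_vs_ly = Metric(
    name="Returned Units vs LY",
    description=(
        "Calculates the quantity of returned units by product "
        "vs the same period last year."
    ),
    terms=[returned_units, ly],
)

if __name__ == "__main__":
    print(describe(returned_units_vs_ly))
    print("LY today would be:", last_year(date.today()))
```

The point of the sketch is simply that each term is defined once and then referenced by any metric that uses it, so a change to the definition of “Returned Units” only has to be made in one place.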
Now, let’s talk about information architecture and taxonomy. I have always found those names really grand, but they are pretty simple: they describe how you structure your information glossary.
Conceptually, you can drill down through as many layers as you need. Here is an example for HR: