Even structured data can be messy
Data can take many forms. One type that Knowledge Workers typically rely on is unstructured data: the free text of contracts, emails, and letters, for example. AI is often, and rightly, touted as the workhorse that can turn these heaps of uncategorised, unstructured, word-based data into something more analysable. That reduces the mundane parts of a Knowledge Worker’s day-to-day and lets them focus on adding value rather than reading stacks of contracts.
However, even structured data can be messy. This messiness doesn’t just prevent more sophisticated uses of the data, such as triggering automated workflows or automated reasoning. It can stop the data being used at all. As a result, enterprises that don’t think about IA before AI end up spending time and money cleaning the data produced by AI, and trying to understand it, before they can ultimately use it.
This is because words are messy, and there are many ways to describe the same thing. As a simple example, one team might use AI to classify a document as a “Share Purchase Agreement” while another labels it as an “SPA”, and others might use “Stock Purchase Agreement”, “Securities Purchase Agreement”, or “Acquisition Purchase Agreement”.
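One common IA remedy is a controlled vocabulary: every label variant is mapped to a single canonical concept. The sketch below is a minimal Python illustration, assuming a hypothetical concept ID (`doctype:share-purchase-agreement`) and a hypothetical synonym table. Which variants genuinely share a meaning is a decision the enterprise has to make, so here “Securities Purchase Agreement” is deliberately left unmapped and comes back as `None` for a person to review.

```python
from typing import Optional

# Hypothetical canonical concept ID; a real IA team would define its own scheme.
SPA = "doctype:share-purchase-agreement"

# Illustrative synonym table: label variants mapped to one canonical concept.
# Whether two labels really mean the same thing is an assumption to be confirmed,
# not something this code can infer.
SYNONYMS: dict[str, str] = {
    "share purchase agreement": SPA,
    "share purchase agreements": SPA,
    "spa": SPA,
    "stock purchase agreement": SPA,
}

def normalise_label(raw: str) -> Optional[str]:
    """Return the canonical concept ID for a raw label, or None if it is unmapped."""
    key = " ".join(raw.lower().split())  # ignore case and extra whitespace
    return SYNONYMS.get(key)

if __name__ == "__main__":
    for label in ["Share Purchase Agreements", "SPA",
                  "Stock Purchase Agreement", "Securities Purchase Agreement"]:
        print(f"{label!r} -> {normalise_label(label)}")
```

Returning `None` rather than guessing keeps unresolved labels visible, which matters because differently worded labels may not be aligned in meaning.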
Not only do these labels take different forms; there is also no guarantee that everyone working on these projects across the enterprise agrees on what the concepts actually mean.
This presents a problem down the road if an enterprise is trying to harness all this data to make enterprise-wide data-driven decisions.
How can they be sure that these different labels all refer to the same “thing” or concept? And even if they are referring to the same thing, the enterprise still has to pay someone to verify this, then take the time to clean up the data so that all the labels are the same. That work must then be repeated for all the other metadata across the enterprise. It’s no wonder it’s commonly said that data scientists spend most of their time wrangling data.
The simple fact is that a multitude of different labels, which may differ in both form and meaning, act as walls that fragment an enterprise’s data and create information silos. This fragmentation is one of the reasons why IA must come before AI. Without a cohesive IA strategy that binds AI projects under a single common language, the potential offered by AI and the data it produces is limited to single-use, context-by-context affairs, never to be reused.
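One way to picture a “single common language” is a shared vocabulary that every AI project must emit, checked at the point where data is created rather than cleaned up afterwards. This minimal Python sketch assumes a hypothetical enterprise-wide `DocumentType` vocabulary; the member names, concept IDs, and the `validate_classification` function are all illustrative, not part of any real system.

```python
from enum import Enum

class DocumentType(Enum):
    """Hypothetical enterprise-wide vocabulary shared by every AI project."""
    SHARE_PURCHASE_AGREEMENT = "doctype:share-purchase-agreement"
    EMPLOYMENT_CONTRACT = "doctype:employment-contract"
    NON_DISCLOSURE_AGREEMENT = "doctype:non-disclosure-agreement"

def validate_classification(concept_id: str) -> DocumentType:
    """Accept a classifier's output only if it is a term from the shared vocabulary.

    Raises ValueError for anything else, so off-vocabulary labels such as "SPA"
    are caught when the data is produced instead of during later clean-up.
    """
    try:
        return DocumentType(concept_id)
    except ValueError:
        raise ValueError(
            f"{concept_id!r} is not in the shared DocumentType vocabulary"
        ) from None

if __name__ == "__main__":
    doc = validate_classification("doctype:share-purchase-agreement")
    print(doc.name)  # SHARE_PURCHASE_AGREEMENT
    try:
        validate_classification("SPA")
    except ValueError as err:
        print(err)
```

Rejecting unknown labels at source is a design choice: it trades some friction for each project against the cost of reconciling divergent labels later.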
IA before AI
To harness the full potential of AI, enterprises need to see the data AI uses and produces as an architectural system, one that must be designed and managed just as carefully as the technology that creates it and moves it around.
They need to recognise that structure and categorisation are not ends in and of themselves. They need to go beyond asking “is it structured?” and start asking “how should it be structured? What is that structure for? What systems does this information need to talk to? How is that information represented there? And how does this system as a whole help my colleagues and my customers do what they need to do?”
An example helps to show why these questions matter. Imagine you do business with a company called International Widget Corporation, and so hold a variety of data within your enterprise relating to your dealings with it.
If that company name is structured well, you will be able to link to other databases both inside and outside the organisation, and use that structure to infer further information, such as the company’s industrial sector. But if it is not structured well and takes more of a free-text approach, with the company alternately labelled “International Widget Corporation”, “IWC”, or “Intl Widget Corp.”, not only is there no easy way to link to other databases, but you might not be able to find all the relevant documents in the first place.
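The difference can be sketched in a few lines of Python. This is a minimal illustration of resolving free-text names to one master record, assuming a hypothetical internal ID (`org:0001`), a made-up sector value, and an illustrative alias table; in practice the master record would come from an authoritative source such as a company register. It compares exact free-text matching with lookup through the resolved entity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Organisation:
    org_id: str  # hypothetical internal identifier
    name: str
    sector: str  # illustrative attribute a linked database might supply

# Hypothetical master record for the fictional company in the example.
IWC = Organisation(org_id="org:0001", name="International Widget Corporation",
                   sector="Manufacturing")

# Name variants seen in free text, each resolved to the one master record.
ALIASES: dict[str, Organisation] = {
    "international widget corporation": IWC,
    "iwc": IWC,
    "intl widget corp.": IWC,
}

def resolve(name: str) -> Optional[Organisation]:
    """Map a free-text company name to its master record, or None if unknown."""
    return ALIASES.get(" ".join(name.lower().split()))

def documents_for(org: Organisation, docs: list[tuple[str, str]]) -> list[str]:
    """Return the IDs of documents whose counterparty label resolves to org."""
    return [doc_id for doc_id, label in docs if resolve(label) is org]

# Hypothetical documents, each tagged with a free-text counterparty label.
documents = [
    ("doc-1", "International Widget Corporation"),
    ("doc-2", "IWC"),
    ("doc-3", "Intl Widget Corp."),
    ("doc-4", "Another Company Ltd"),
]

# Exact free-text matching finds only one document...
exact = [d for d, label in documents if label == "International Widget Corporation"]
# ...whereas resolving every label to the master record finds all three.
linked = documents_for(IWC, documents)

print(exact)                      # ['doc-1']
print(linked)                     # ['doc-1', 'doc-2', 'doc-3']
print(resolve("IWC").sector)      # Manufacturing
```

Once names resolve to a stable identifier, attributes such as sector can be joined in from any system that shares that identifier, rather than re-derived from each spelling.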
It is only by asking these questions that enterprises can free AI to reach its full potential and produce data that can be consolidated, used, and reused to create insights that can be trusted and that extend over time and across the whole organisation. Having an IA strategy for how to collect, organise, maintain, and consistently structure your content, unsexy as it is, is key to making this happen.
If organisations who have been banging their heads against the wall for several years trying to get AI to “work properly” adopt the mantra “IA before AI”, they will find that AI actually can deliver the magic they are after.