Effectively harnessing unstructured data for business benefit

By Tal Nathan, Managing Director, Products Digital for Britehouse

The effective use of unstructured data offers organisations a genuine competitive advantage. However, it is extremely difficult to take full advantage of the enormous volumes of unstructured data out there, not least because in a typical business only around 20% of content is structured data, while the other 80% is of the unstructured variety.

Such unstructured information encompasses everything from files stored in shared drives through to e-mails and messages that are part of an online conversation. All of these contain data and information, but by its nature this content is not easily discoverable.

A major challenge with unstructured information is that this form of organisational data is growing exponentially every year. In fact, each year’s haul of unstructured data is typically larger than all the previous years’ combined. Worse still, storing this data is not only expensive, but also subject to the law of diminishing returns: the more unstructured data that is generated, the more difficult it becomes to find the information that can provide that elusive competitive advantage. Basically, it becomes more difficult to search at a granular level.

In effect, companies are paying an increasing amount to store data that is becoming harder to find; one could say that organisations are becoming less intelligent the more information they store.

Look at the WWW as an example
The exact opposite is true of consumer websites, which are capable of providing users with increasingly granular levels of detail when they search for information, despite the fact that the amount of information available on the World Wide Web is huge in comparison to a single company’s data.

The level of granularity available is such that it is now possible to search Google for a specific recipe that uses only certain ingredients, falls within a certain calorie range and can be cooked within a certain time frame, and still receive multiple hits.

This is done through the simple expedient of getting webmasters to add mark-up for ‘rich snippets’, based on a shared schema or catalogue devised by the major search engines, which effectively enables machines to understand content.

This is something that is referred to as Web 3.0. By comparison, Web 1.0 was the one-to-many model, where users accessed websites, while Web 2.0 was the many-to-many model, utilising social media.

With Web 3.0, content is tagged with these rich snippets, which are recognised by major search engines such as Bing, Google, Yahoo! and Yandex. These entities rely on this mark-up to improve the display of search results, thereby making it easier for people to find the right Web pages.
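
To make the idea concrete, here is a minimal sketch in Python of what such mark-up might look like and why it makes granular search possible. It assumes the schema.org Recipe vocabulary, which this article does not name but which these search engines support; the two recipes and the helper functions (cook_minutes, calories, find_recipes) are hypothetical and purely illustrative, not how any search engine is actually implemented.

```python
import re

# Two hypothetical recipe pages, each described with schema.org "Recipe" mark-up.
# On a real page this structure would be embedded in the HTML for crawlers to read.
RECIPES = [
    {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": "Tomato and lentil soup",
        "recipeIngredient": ["tomatoes", "lentils", "onion"],
        "cookTime": "PT25M",  # ISO 8601 duration: 25 minutes
        "nutrition": {"@type": "NutritionInformation", "calories": "320 calories"},
    },
    {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": "Slow-roast lamb",
        "recipeIngredient": ["lamb", "onion", "rosemary"],
        "cookTime": "PT180M",
        "nutrition": {"@type": "NutritionInformation", "calories": "780 calories"},
    },
]


def cook_minutes(recipe):
    """Read the minutes from a duration such as 'PT25M' (minutes-only form assumed)."""
    match = re.fullmatch(r"PT(\d+)M", recipe["cookTime"])
    return int(match.group(1)) if match else None


def calories(recipe):
    """Read the leading number from a value such as '320 calories'."""
    match = re.match(r"(\d+)", recipe["nutrition"]["calories"])
    return int(match.group(1)) if match else None


def find_recipes(recipes, ingredient, max_calories, max_minutes):
    """Return the names of recipes that satisfy all three criteria."""
    hits = []
    for recipe in recipes:
        minutes = cook_minutes(recipe)
        kcal = calories(recipe)
        if minutes is None or kcal is None:
            continue  # skip anything this simplified parser cannot read
        if (ingredient in recipe["recipeIngredient"]
                and kcal <= max_calories
                and minutes <= max_minutes):
            hits.append(recipe["name"])
    return hits


print(find_recipes(RECIPES, ingredient="onion", max_calories=500, max_minutes=30))
# prints ['Tomato and lentil soup']
```

The point is that once the ingredients, cooking time and calories are labelled fields, and not just words somewhere on a page, a machine can filter on each of them directly.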

This is a far cry from search engine optimisation (SEO), which was previously used by organisations to ensure that their websites came out top during specific keyword searches. Rich snippets could thus be described as the next evolution of SEO. It is, however, no longer about searching across multiple sites for keywords, but rather for specific information.

Learnings from the Web
So the real question, then, is why businesses are not applying a similar technique to find critical business information. After all, if they attached the right metadata in the same way that webmasters use such tagging in the consumer space, they could theoretically achieve the same result.

For example, a business could easily search for a specific document, written by a specific individual, for a specific client, on a specific date and about a particular project.
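
A search of that kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Document class, its field names, the sample documents and the search function are all hypothetical, and a real enterprise search platform would work against an index and not an in-memory list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    """A document plus the metadata fields mentioned above (names are hypothetical)."""
    title: str
    author: str
    client: str
    project: str
    written_on: date


def search(documents, **criteria):
    """Return the documents whose metadata matches every supplied field exactly."""
    return [
        doc for doc in documents
        if all(getattr(doc, field) == value for field, value in criteria.items())
    ]


# Hypothetical sample data.
DOCUMENTS = [
    Document("Kick-off notes", "J. Smith", "Acme Mining", "Portal", date(2015, 3, 2)),
    Document("Pricing proposal", "J. Smith", "Acme Mining", "Portal", date(2015, 4, 1)),
    Document("Kick-off notes", "A. Jones", "Northwind Retail", "Intranet", date(2015, 3, 2)),
]

results = search(
    DOCUMENTS,
    author="J. Smith",
    client="Acme Mining",
    project="Portal",
    written_on=date(2015, 3, 2),
)
print([doc.title for doc in results])
# prints ['Kick-off notes']
```

Each additional metadata field narrows the result set, which is exactly the granularity that keyword search over untagged files cannot offer.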

The significant challenge for organisations is that while the search engines in the consumer space are able to use these rich snippets to improve search granularity, the fact remains that such a search still relies on tags that are input by humans.

And while a wrongly tagged recipe merely means that searching consumers will not be able to find that particular item of information, badly tagged corporate information could negatively impact the business or its customers.

This is the crux of the challenge facing organisations: business users are often reluctant to add metadata to documents, and the behavioural change required to achieve this is often difficult to attain.

This is despite the obvious benefits that the ability to search effectively, and at a granular level, for information can have on a company’s competitive advantage.

While user adoption is clearly a challenge, organisations should be aware that technology already exists that is capable of doing the necessary tagging automatically.

Using the right automated tools, therefore, has enormous implications for businesses as we move forward. After all, once an enterprise is able to automatically generate metadata for its unstructured information, it will be able to provide both context to and discoverability of the data.
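
As a very rough illustration of what automatic tagging means, the Python sketch below derives metadata from a document's text without any user input. It is a deliberately simple, rule-based stand-in and should not be read as how commercial auto-classification tools work; the client list, the date format and the auto_tag function are all assumptions made for the example.

```python
import re

# Hypothetical list of client names the business already knows about.
KNOWN_CLIENTS = ["Acme Mining", "Northwind Retail"]

# Matches dates written as YYYY-MM-DD (an assumed house format).
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def auto_tag(text):
    """Derive simple metadata tags from raw text, with no input from the author."""
    tags = {}
    lowered = text.lower()

    # Tag any known client that is mentioned, ignoring case.
    clients = [name for name in KNOWN_CLIENTS if name.lower() in lowered]
    if clients:
        tags["clients"] = clients

    # Tag every date found in the text.
    dates = DATE_PATTERN.findall(text)
    if dates:
        tags["dates"] = dates

    return tags


memo = "Status update for Acme Mining, prepared 2015-03-02. Next review 2015-04-01."
print(auto_tag(memo))
# prints {'clients': ['Acme Mining'], 'dates': ['2015-03-02', '2015-04-01']}
```

Real tools go far beyond matching names and dates, but the principle is the same: the metadata is generated from the content itself, so nobody has to remember to add it.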

In effect, it will be similar to having the efficiency and accuracy of a librarian, but one that never gets anything wrong.

Moreover, this will enable the business to recognise and contextualise its information according to its own requirements. This, in turn, means organisations will be able to use their information effectively to improve revenue generation and customer service.

Ultimately, everyone is talking about big data and knowledge management and their impact on the business, yet this talk just goes in circles, with no one seemingly able to actually make something happen.

Instead of trying to change employee behaviour or develop relevant taxonomies in order to obtain the necessary value from business information, enterprises should look to the example of the major search engines.

These entities are already successfully marking content with relevant metadata, within a defined taxonomy, and are using rich snippets to improve consumers’ access to relevant information. Using the right tools to automate their own tagging processes can just as easily enable businesses to truly harness the value that exists in their data.
