Big data is a massive opportunity. It also presents some significant challenges that traditional systems find hard to handle. We’ve all seen the big data statistics: every minute, 1,820 TB of data is created, 204 million emails and 11 million instant messages are sent, 700,000 Google searches are made and businesses receive 35,000 Facebook likes.

How do you turn this explosion of data into value for your organisation?

According to a recent Ovum report, Enterprise Search and Retrieval* is a key technology for organisations of all sizes. Ovum also rightly points out that information valuable to an organisation doesn’t reside only within corporate repositories, and that no single solution can meet every corporate need.

We believe that three of the key elements organisations should focus on when planning their ‘search’ strategies are:

  • Data harvesting from disparate data sources and different formats
  • Semantic search, using meaning-based computing and natural language analytics
  • Discovery, to help make sense of large, complex data sets – even when you don’t know exactly what you’re looking for.

Why? Because the challenges with big data are that there’s so much of it, it comes at you so quickly and most of it doesn’t fit neatly into the rows and columns of a conventional database.

You can segment today’s data into three categories, each with its own challenges:

  • Unstructured

    • Social data, sentiment and trend analysis
    • Documentation such as Word documents, PDFs, etc.

Challenges include: trawling, aggregating and mining the data; entity extraction to understand what the data actually is; visualisation; and natural language analytics, especially across multiple languages.

  • Structured

    • Databases and schemas
    • Siloed and inflexible

Challenges include: siloed data that may be held in different structures and in different business areas; migration and cleansing issues in getting to a single searchable data set; and data mining when the data is not in a single repository.

  • Structured-Unstructured

    • Semantic Web – “a web of data that can be processed directly and indirectly by machines.”
    • URIs and other web resources

Challenges include: tagging the information, its source and its provenance; data mining; and visualisation.

So why does this matter to you?

Data harvesting is a more economical way of creating a single, clean view of multiple data sets than always embarking on a Master Data Management (MDM) programme, especially if your data spans various structures, formats and file types across many internal and external sources. There will be instances where MDM makes sense, but if you are doing it for search, discovery and analytics purposes, then I would suggest exploring alternative, more cost-effective approaches.
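To make the harvesting idea a little more concrete, here is a minimal sketch in Python. It assumes two hypothetical sources, a CSV export and a JSON feed, that describe the same kind of record using different field names. The sample data, field names and function names are all invented for illustration and do not describe any particular product.

```python
import csv
import io
import json

# Hypothetical sample sources: the same kind of record, held in two formats
# with two different sets of field names.
CSV_SOURCE = "customer_id,full_name,city\n101,Ann Lee,Leeds\n102,Raj Patel,Cardiff\n"
JSON_SOURCE = (
    '[{"id": 102, "name": "Raj Patel", "town": "Cardiff"},'
    ' {"id": 103, "name": "Mia Chen", "town": "Bristol"}]'
)


def harvest_csv(text):
    """Yield records from CSV text, mapped onto common field names."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["customer_id"]), "name": row["full_name"],
               "city": row["city"], "source": "csv"}


def harvest_json(text):
    """Yield records from JSON text, mapped onto the same common field names."""
    for item in json.loads(text):
        yield {"id": int(item["id"]), "name": item["name"],
               "city": item["town"], "source": "json"}


def harvest(*streams):
    """Merge all streams into one view keyed by id, noting where each record was seen."""
    view = {}
    for stream in streams:
        for record in stream:
            entry = view.setdefault(
                record["id"],
                {"id": record["id"], "name": record["name"],
                 "city": record["city"], "sources": []},
            )
            entry["sources"].append(record["source"])
    return view


single_view = harvest(harvest_csv(CSV_SOURCE), harvest_json(JSON_SOURCE))
for record in single_view.values():
    print(record)
```

The source systems are left untouched: each one is read through a small adapter, and only the combined view is new. A real harvesting layer would also have to deal with conflicting values and fuzzy matching, which this sketch ignores.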

With the large amount of unstructured data that now exists, finding documents and data that share a common meaning or theme can be far more effective than searching for keyword matches and getting a ‘page 1 of 13,000,000 results in 0.18 seconds’ list.
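The difference between matching words and matching meaning can be shown with a toy example. The sketch below assumes a tiny, hand-built concept map, which is purely hypothetical. A real semantic search engine would derive meaning from language analytics, not from a hard-coded table. The documents and function names are also invented for illustration.

```python
import re

# Hypothetical, hand-built concept map standing in for real language analytics.
CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "van": "vehicle",
    "crash": "accident", "collision": "accident",
}

# Hypothetical documents.
DOCUMENTS = {
    "doc1": "Automobile collision reported on the ring road",
    "doc2": "Car park opening hours extended",
    "doc3": "Quarterly financial results published",
}


def tokens(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_search(query, documents):
    """Return ids of documents that contain every literal query word."""
    wanted = set(tokens(query))
    return [doc_id for doc_id, text in documents.items()
            if wanted <= set(tokens(text))]


def concepts(text):
    """Map the words in a text onto the concepts they stand for."""
    return {CONCEPTS[word] for word in tokens(text) if word in CONCEPTS}


def concept_search(query, documents):
    """Return ids of documents that cover every concept in the query."""
    wanted = concepts(query)
    return [doc_id for doc_id, text in documents.items()
            if wanted and wanted <= concepts(text)]


keyword_hits = keyword_search("car crash", DOCUMENTS)
concept_hits = concept_search("car crash", DOCUMENTS)
print(keyword_hits)   # []
print(concept_hits)   # ['doc1']
```

The keyword search finds nothing, because no document contains both literal words. The concept search finds the one document that is actually about a vehicle accident, even though it shares no words with the query.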

And finally, Discovery. Without realising it, the Ovum report highlights why we feel Discovery should be a key component of enterprise IT strategies. Ovum’s heading is ‘finding needles with haystacks’. This is all well and good if you know what the needle looks like, but often we don’t. We are often trying to find themes and connections in our data: a connection between social and customer data; financial information all linked to the same theme or trend; documentation and geo-spatial information related to the same insurance or police case. None of this may have been tagged. When was the last time you tagged a piece of information, or went back and re-tagged it?
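As a rough illustration of discovery without tags, the sketch below links items from different sources simply because they share enough terms. It is a deliberately naive stand-in for real discovery tooling. The items, the stop-word list and the `min_shared` threshold are all hypothetical choices made for this example.

```python
import re
from itertools import combinations

# Hypothetical untagged items from different sources (social, claims, mapping, internal).
ITEMS = {
    "tweet-1": "Flooding on Mill Lane again, water everywhere",
    "claim-7": "Water damage claim for property on Mill Lane",
    "map-3": "Survey of Mill Lane drainage",
    "memo-9": "Staff canteen menu for next week",
}

# Small, hand-picked list of words that carry no meaning on their own.
STOP_WORDS = {"on", "for", "of", "the", "again", "next"}


def terms(text):
    """Return the set of meaningful lower-case words in a text."""
    return {word for word in re.findall(r"[a-z]+", text.lower())
            if word not in STOP_WORDS}


def discover_links(items, min_shared=2):
    """Return pairs of item ids sharing at least min_shared terms, with those terms."""
    links = {}
    for (id_a, text_a), (id_b, text_b) in combinations(items.items(), 2):
        shared = terms(text_a) & terms(text_b)
        if len(shared) >= min_shared:
            links[(id_a, id_b)] = shared
    return links


links = discover_links(ITEMS)
for pair, shared in links.items():
    print(pair, sorted(shared))
```

Nobody tagged the tweet, the claim or the survey with a common case reference, yet all three end up connected through the street they mention, while the unrelated memo stays on its own. No one had to know in advance what the needle looked like.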

So the digital world is changing rapidly. The difference between winners and losers will become more dramatic. If you want to be among the winners, I recommend you talk to us about a Discovery and Search strategy that can adapt as your world changes around you.

*Ovum Enterprise Search and Retrieval 2011/2012 – Technology Evaluation and Comparison report

About this author

Danny Wootton

Danny is Innovation Director for CGI UK, responsible for innovation across all divisions and delivery organisations and across the UK markets - a role Danny describes as the best job at CGI. Parts of the role include building strategic relationships with CGI’s clients around ...
