Dark Data – what is it… and should we be scared?

What is Dark Data?

Dark Data is data held by organisations and individuals that is not used to derive insights or guide decision-making.

Gartner defines it in a similar way, as the “information assets organisations collect, process and store during regular business activities, but generally fail to use for other purposes”.

An everyday example of this, on an individual level, is the accumulation of photos on a phone’s camera roll. How many photos are on there? Will you ever look through them, or delete the ones that are stored but never viewed? Habits like this contribute directly to rising Dark Data levels.

The Internet of Things (IoT), a network of interconnected sensors, actuators, and computing systems, produces data at an alarming rate. These systems can be a major source of Dark Data, as much of the data they generate is never used for any specific purpose.

Accumulating Dark Data may therefore seem inevitable and, for the majority of organisations, it is a daily occurrence.

In fact, up to 52% of all information an organisation produces and stores is Dark Data. 

The amount of Dark Data is increasing fast. Between 2025 and 2030, data levels are forecast to triple.

Indeed, 80% of all data produced is Dark, siloed and disconnected. This leaves us increasingly vulnerable and inundated as data levels continue to rise.

The IoT will generate >1 zettabytes of data, of which >1% is estimated to be dark data, causing >1 MN tonnes of CO2 emissions.

Figure: growth in data volume.

This graph shows the rise in the volume of data created, copied and consumed, and forecasts that this growth will continue.

Why should you care if your data is Dark?

Lack of clarity and control

Dark Data is by nature unmanaged and unorganised. Because of this, it is susceptible to degradation, for example through unintended alteration or corruption. This leads to a worsening loss of control over data management and a growing lack of clarity. As a result, the user loses out on insight and decision-making support that could otherwise have transformative value.

A data centre with data processing units. The physical storage required for data. Image from TechRepublic.

Untapped economic value

There is also an opportunity cost to leaving Dark Data unused: according to research by McKinsey, utilising it to its full potential could be worth up to $11.1 trillion a year. The IoT produces data at an extremely fast rate and can add real value and interoperability. Linking and understanding data at all levels, and enabling its interconnectivity, will tap into this potential economic impact. The more data available to analyse, the more insight there is to inform better economic decisions.

Alongside this lost potential for economic value, Dark Data carries a carbon cost of its own. To keep the data accessible, hard drives spin constantly, and in some instances needlessly, in order to store it.

Lost analysts’ time

A split of an analyst’s time. Data from Anaconda.

The value of data is unrealised if analysis cannot be conducted to bring it to light. 

However, when data is Dark, 44% of data workers’ time is spent on unsuccessful activities. Much of it goes on basic tasks such as data pre-processing, trawling through data, data cleaning, and menial work such as changing file types for interoperability.

The chart above shows a typical breakdown of an analyst’s time, as reported by Anaconda.

What is the environmental impact of Dark Data?

Not only is Dark Data inefficient, expensive and time-consuming, it is also detrimental to the environment. 

Dark Data is responsible for up to 6.4 million tonnes of carbon dioxide emissions a year, as data centres keep hard disk drives spinning unnecessarily to store this vast and fast-growing data load. This is equivalent to the average energy consumption of 3,900,00 households.

Conversely, digitalisation has the potential to benefit the environment. Through environmental solutions and the creation of Sustainable Digital Twins, data can be managed and used to achieve sustainable outcomes.

https://youtu.be/OzzoGf3tCp8

The solution:

Integrate, visualise and analyse your data.  

Integrating your siloed Dark Data into one unified platform allows you to eliminate pre-processing and create a standardised format.  
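As a rough illustration of what integration into a standardised format can look like, here is a minimal Python sketch. The two exports, their field names and the shared schema (`sensor_id`, `temperature_c`, `timestamp`) are all hypothetical, invented for this example; a real platform would handle far more sources and edge cases.

```python
import csv
import io
import json

# Hypothetical siloed sources: similar sensor readings exported by two
# systems in different file formats and with different field names.
CSV_EXPORT = "sensor,temp_c,time\nA1,21.5,2024-01-01T00:00:00\n"
JSON_EXPORT = '[{"id": "B7", "temperature": 19.0, "timestamp": "2024-01-01T00:05:00"}]'


def from_csv(text):
    """Map each row of the CSV export onto the shared schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "sensor_id": row["sensor"],
            "temperature_c": float(row["temp_c"]),
            "timestamp": row["time"],
        }


def from_json(text):
    """Map each item of the JSON export onto the shared schema."""
    for item in json.loads(text):
        yield {
            "sensor_id": item["id"],
            "temperature_c": float(item["temperature"]),
            "timestamp": item["timestamp"],
        }


def integrate(csv_text, json_text):
    """Combine both sources into one list of records, ordered by timestamp."""
    records = list(from_csv(csv_text)) + list(from_json(json_text))
    return sorted(records, key=lambda record: record["timestamp"])


unified = integrate(CSV_EXPORT, JSON_EXPORT)
for record in unified:
    print(record)
```

Once every source is mapped onto one schema, downstream analysis no longer needs a separate pre-processing step for each file type.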

Visualising Dark Data also brings it to light. This gives clarity and uncovers interlinked datasets, giving context to previously mismatched, ambiguous data. Knowledge Graph Technology can be used to reveal these connections and uncover links.

The interactive model below presents interlinked semantic metadata. Each dot represents a dataset and the lines track links between them. Isolated dots may represent unused, siloed Dark Data, which can then be addressed.

https://youtu.be/gIOdJLP0_Mo
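The idea of spotting isolated dots can be sketched in a few lines of Python. This is a simplified illustration, not how any particular knowledge graph product works: the dataset names and links are made up, and a dataset is treated as a Dark Data candidate simply if no link touches it.

```python
def find_isolated(datasets, links):
    """Return the datasets that are not connected to any other dataset."""
    linked = set()
    for a, b in links:
        if a != b:  # a link from a dataset to itself does not connect it to anything
            linked.add(a)
            linked.add(b)
    return sorted(name for name in datasets if name not in linked)


# Hypothetical datasets (dots) and links (lines) for illustration only.
datasets = ["sales", "customers", "sensor_logs", "old_survey"]
links = [("sales", "customers")]

isolated = find_isolated(datasets, links)
print(isolated)  # ['old_survey', 'sensor_logs']
```

Datasets flagged in this way are only candidates: each still needs reviewing to decide whether it should be linked, archived or deleted.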

Analysing Dark Data allows it to provide reliable insight through calculations and unlocks the full potential of the data. The data ultimately progresses from raw data, through metadata and semantic metadata, towards information and knowledge.

Find out more about managing your data with our Compass: Engine here.

Or you can book a meeting with us.
