Date: 2020-04-30

The modern world is increasingly mediated by digital devices and networks, where every click or online transaction generates a data trail that can reveal trends and insights. In industry, it is well known that there is a strong correlation between good organisational performance and the exploitation of corporate data. Data exploitation is often seen as delivering most benefit when data is combined from different sources. However, for some defence organisations, the potential benefits are often frustrated by poor data quality. Data may be missing, and there are often inconsistencies between systems. For example, the same asset could be identified in one system by a nine-character alphanumeric part number but in another by a 13-digit NATO stock number. Alternatively, a component could be described differently across systems – for instance, as ‘lamp’ in one and ‘light’ in another.
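The kind of inconsistency described above can be illustrated with a minimal Python sketch. It assumes a cross-reference table from part numbers to NATO stock numbers and a synonym table for descriptions are available; both tables, the sample identifiers and the function names are hypothetical and exist only for illustration.

```python
import re

# Hypothetical cross-reference from part numbers to NATO stock numbers (NSNs).
PART_TO_NSN = {"AB12345CD": "6240991234567"}

# Hypothetical synonym table mapping local descriptions onto one preferred term.
SYNONYMS = {"light": "lamp"}


def normalise_nsn(raw):
    """Strip separators so '6240-99-123-4567' and '6240991234567' compare equal."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 13:
        raise ValueError(f"expected 13 digits, got {raw!r}")
    return digits


def canonical_record(record):
    """Return (nsn, description) in a common form.

    Raises KeyError if a part number is missing from the cross-reference.
    """
    if "nsn" in record:
        nsn = normalise_nsn(record["nsn"])
    else:
        nsn = PART_TO_NSN[record["part_number"].strip().upper()]
    description = record["description"].strip().lower()
    return nsn, SYNONYMS.get(description, description)


# The same asset as two different systems might record it.
system_a = {"part_number": "ab12345cd", "description": "Lamp"}
system_b = {"nsn": "6240-99-123-4567", "description": "light"}

same_asset = canonical_record(system_a) == canonical_record(system_b)
print(same_asset)
```

In practice the hard part is building and maintaining the cross-reference and synonym tables, not the matching logic itself.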

The Joint Concept Note (2/18) on Information Advantage opines that information (data) is no longer simply an enabler but a weapon of its own, and should be treated as a ‘fully-fledged lever of power’ with the potential to increase battle tempo and momentum. The Note implores planners to develop campaign information strategies that enable a ‘front-footed information advantage posture at the heart of 21st century deterrence’. It adds that ‘data is at the heart of information advantage’ and that Defence requires data that ‘can be understood, manipulated, shared and exploited. We must cohere and align the constituent elements’.

In support of this objective, data engineers, data scientists and consultants spend much of their time trying to find the right data and performing so-called ‘ROT analysis’ – removing data that is redundant, obsolete or trivial. This process can typically remove up to 80% of the big data set that is initially presented. Indeed, big data preparation often involves finding the ‘small data’ germane to the objective. Data preparation also includes a step known as pre-processing: identifying problems, such as missing values or categorical variables, that must be corrected or encoded before the data can be analysed by software. Analysis tools can also help by identifying so-called ‘hot data’ – that which is most frequently used and thus most valuable. Usually, big data tools are necessary only when the data to be analysed is at the terabyte scale. However, with good data pre-processing, excellent results can be obtained from just 250 megabytes of well-curated ‘small data’.
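As a rough illustration of these preparation steps, the Python sketch below removes redundant and obsolete records, fills a missing numeric value with the median, and one-hot encodes a categorical field. The records, field names and cutoff date are invented for the example, and median imputation is only one of several possible choices; deciding what counts as ‘trivial’ is left out because it depends on the objective.

```python
from datetime import date
from statistics import median


def remove_rot(rows, cutoff):
    """Drop exact duplicates (redundant) and rows last updated before cutoff (obsolete)."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen or row["updated"] < cutoff:
            continue
        seen.add(key)
        kept.append(row)
    return kept


def impute_median(rows, field):
    """Replace missing (None) values in a numeric field with the median of known values."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def one_hot(rows, field):
    """Replace a categorical field with one 0/1 column per category."""
    categories = sorted({r[field] for r in rows})
    encoded_rows = []
    for r in rows:
        encoded = {k: v for k, v in r.items() if k != field}
        for category in categories:
            encoded[f"{field}={category}"] = 1 if r[field] == category else 0
        encoded_rows.append(encoded)
    return encoded_rows


# Hypothetical maintenance records.
raw = [
    {"asset": "A1", "status": "serviceable", "hours": 120.0, "updated": date(2020, 1, 10)},
    {"asset": "A1", "status": "serviceable", "hours": 120.0, "updated": date(2020, 1, 10)},
    {"asset": "A2", "status": "unserviceable", "hours": None, "updated": date(2020, 2, 1)},
    {"asset": "A3", "status": "serviceable", "hours": 80.0, "updated": date(2012, 5, 5)},
    {"asset": "A4", "status": "serviceable", "hours": 200.0, "updated": date(2020, 3, 3)},
]

clean = one_hot(impute_median(remove_rot(raw, date(2015, 1, 1)), "hours"), "status")
for row in clean:
    print(row)
```

Here the duplicate A1 row and the stale A3 row are dropped before imputation, so the fill value is computed only from data that survives the ROT step.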

A further issue is how to handle data that does not fit neatly into a numerical format, or into the rows and columns of structured databases – so-called ‘unstructured data’. Such data comprises webpages, e-mails, text documents, graphics, and so on. It is now beginning to dominate the data landscape and is also used to enable sentiment analysis or opinion-mining. Analysing unstructured data usually means first identifying structures that software can work with – for example, by indexing or grouping text using a text analytics tool, or by creating a graph view linking text or content to user groups of interest. Data types also include so-called ‘dark data’, which can be valuable but is often inaccessible and is therefore rarely exploited. Many defence organisations have unexploited dark data because of inaccessibility, security or technical incompatibility. Some estimates put dark data as high as 60% of an organisation’s data holdings.
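To make the idea of indexing text concrete, here is a minimal Python sketch of an inverted index, which maps each word to the documents that contain it. It is a simplified stand-in for what a text analytics tool does, not a description of any particular product; the document identifiers and contents are made up.

```python
import re
from collections import defaultdict


def tokenise(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenise(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    terms = tokenise(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical unstructured holdings: an e-mail, a report and a note.
documents = {
    "email-01": "Landing lamp replaced after fault report",
    "report-07": "Fault found in hydraulic pump; pump replaced",
    "note-12": "Lamp stock is low at the depot",
}

index = build_index(documents)
print(sorted(search(index, "fault replaced")))
print(sorted(search(index, "lamp")))
```

Once text has been given this kind of structure, it can be queried and grouped in much the same way as rows in a database.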

It is not unusual for those involved in big data analytics to spend much of their analytics time simply preparing data for analysis. Indeed, the 70:20:10 heuristic is often cited: 70% of time and effort is spent sourcing and preparing data for analysis, with just 20% on the analysis itself, and a further 10% to visualise and present the results.

One further observation concerns the ownership of defence-related data. The past 25 years have seen a steady outsourcing of capability provision from MoD to industry. As a result, industry is frequently the custodian of MoD’s capability and performance data. Furthermore, this data is often stored in a proprietary format. If MoD wants to use this data, it often has to find additional funds to pay its provider to prepare and supply it. One cause of the problem is that most commercial contracts let by MoD contain inadequate provisions for the supply of data; consequently, data exploitation for such capabilities becomes problematic. Although there is an Information Defence Line of Development (DLOD) for most capabilities, it is not uncommon to find its provisions traded out by ambitious project managers to achieve increased value for money. If Defence is serious about exploiting its data and creating an information advantage, then it needs to consider including provisions for obtaining data held by industry as a matter of course within future contracts.

IT proponents, like Gartner or Forrester, claim that self-service analytics and citizen data scientists are the future for data exploitation. But the reality of using even the simplest of analytics tools is that some data literacy and skill is required, and that several non-technical factors matter. For example, there is often a lack of persuasive evidence to convince senior defence leadership to invest in data exploitation, a lack of strong, analytics-literate senior proponency, and a paucity of data-literate suitably qualified and experienced personnel (SQEP).

Just doing the simple analytics right is a good start – sophisticated AI, advanced analytics or machine learning tools are not necessary to deliver valuable insights. Breakthrough results often come from simple data analysis and by starting small.

In summary, there are five actions which would significantly improve data analytics success in Defence. First, expect data pre-processing to take most of the initial exploitation effort. Second, bake data provision into commercial defence service provision contracts as a matter of standard practice. Third, develop a clearer idea of how to exploit unstructured data and dark data holdings. Fourth, recognise that most data exploitation challenges for UK Defence are about SQEP and leadership. Finally, start small on a focal domain, achieve successes, then scale up rather than attempting to boil the ocean with big data projects up front.

Roland McTeague

Roland is a former RAF Engineer Officer and recently led data analytics teams at two large IT companies working with clients in the public and private sectors. He now works for the Ministry of Defence in the Defence Digital organisation.

Banner Image: Ali Air Base Server Farm, US Air Force Photo/Airman 1st Class Jonathan Snyder, Public Release