Data analytics is increasingly used as a first line of defence in cyber security, and yet many organisations have still to embrace it, partly due to a lack of resourcing and partly due to a lack of knowledge of what kind of security analysis they may even require. Fundamentally, we all live in a highly digital world, where we interact online, or interact with physical systems that are connected online, and where all of these interactions result in more data being collected. From network traffic to system and user logs, our computers generate massive volumes of data every minute. Now consider the different forms of digital equipment that are also connected and generating data: laptops and smartphones, servers and light bulbs, payment card systems and CCTV, and anything else that may be Internet-enabled. We now have a wealth of data logs capturing a whole host of activities from the devices that enable our technology-driven world.
So, what happens when these devices do not perform as expected? What if our laptop suffers a malware attack, our CCTV is taken offline by a Denial of Service attack, or an employee downloads confidential records and copies these to a removable drive? Security data analytics is concerned with how we protect our systems and information from potential threats, through enhanced understanding of our systems and the data that underpins these.
Analysing data is one thing; communicating this analysis to make well-informed decisions is another altogether. Visualisation techniques enable us to gain a greater understanding of complex data, and ultimately allow us to make better decisions about our data.
We can think about exploratory visualisation as supporting our analysis of data, and explanatory visualisation as supporting our presentation of data.
“The greatest value of a picture is when it forces us to notice what we never expected to see” John W. Tukey, 1977
What kind of data are we talking about? Just as the security domain is broad, so too are the forms of data that may inform security. At the core is context: what are we trying to find out, and what data may help us to answer this question clearly? Furthermore, how do we then respond or act as a result of this information?
We can think of data as the lowest level of a broader pyramid, where data informs information, information informs knowledge, and knowledge informs wisdom. Data in isolation may not be meaningful; however, when combined with other data, it may help to build up towards these higher levels. This is often referred to as the DIKW pyramid, or the Hierarchy of Visual Understanding.
For an organisation to be secure, a clear understanding of the operational environment is required. This is often described as situational awareness, which is the perception of environmental elements and events, the comprehension of their meaning, and the projection of their future states. Increasingly, organisations are deploying security operations centres (SOCs), where analysts seek to identify suspicious behaviour and understand its context and relevance to the organisational mission, often using Security Information and Event Management (SIEM) systems.
Technology underpins modern organisations, and having insight into the business operational environment is crucial to protecting it. As a first stage, ensuring the safe and correct operation of our computer systems and our networking infrastructure is a good place to start. Network traffic data (e.g., packet captures) can help to indicate what data has been communicated over a network, and what actions have been carried out as a result (e.g., access to a particular URL, or downloading of large files). Intrusion Detection Systems (IDS) are commonly used to inspect inbound and outbound network traffic to identify suspicious activities. IDSs generate log files, and these logs constitute another informative data source. Similarly, firewall rules can help us to understand how the network is configured, and Intrusion Prevention Systems (IPS) will make decisions and act on IDS activity to prevent potential harm.
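As a minimal sketch of the kind of traffic analysis described above, the Python below totals the bytes sent by each host across a handful of flow records and flags any host whose outbound volume exceeds a threshold. The record layout, the addresses, the function names, and the 100 MB threshold are all illustrative assumptions, not the output format of any particular packet capture tool or IDS.

```python
from collections import defaultdict

# Hypothetical flow records: (source host, destination host, bytes sent).
# Real packet captures or IDS logs would first need parsing into this shape.
FLOWS = [
    ("10.0.0.5", "203.0.113.10", 1_200),
    ("10.0.0.5", "203.0.113.10", 3_400),
    ("10.0.0.7", "198.51.100.20", 150_000_000),
    ("10.0.0.8", "203.0.113.10", 900),
    ("10.0.0.7", "198.51.100.20", 20_000_000),
]


def bytes_per_source(flows):
    """Return the total number of bytes sent by each source host."""
    totals = defaultdict(int)
    for src, _dst, size in flows:
        totals[src] += size
    return dict(totals)


def flag_large_senders(flows, threshold_bytes=100_000_000):
    """Return source hosts whose total outbound volume exceeds the threshold.

    The default threshold is an arbitrary value chosen for illustration.
    """
    totals = bytes_per_source(flows)
    return sorted(src for src, total in totals.items() if total > threshold_bytes)


if __name__ == "__main__":
    for host in flag_large_senders(FLOWS):
        print(f"Review outbound traffic from {host}")
```

A fixed threshold like this is deliberately simple; in practice a sensible value would depend on what is normal for each host.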
The remit of cyber security is wide and goes beyond traditional computer and network security. A holistic view is required of what we want to protect, and what attack vectors may be used to gain access. Therefore, aspects such as physical security, people security, and process security also need to be understood. Physical security may require CCTV, IoT sensor monitoring or GPS tracking. People security may require text analytics of social media and email usage. Process security may require analysis of business process models, supply chain security, organisational hierarchy information, and operational practice. Technology continues to influence how we conduct business across the globe, and therefore we need to ensure that we understand our threat landscape and have clear monitoring in place to understand potential harms. In many cases, we are interested in spatio-temporal data, i.e., in what location did the activity occur and at what time? Given our highly connected society, location is becoming increasingly challenging to establish (are we looking at the location of the attacker, the location of the data, or the location of the breached system?), and as for time, devices are logging activities faster than we can humanly inspect them. There is therefore a need for big data cyber security analytics: to make this flow of data manageable and insightful, to highlight key attributes in the data, and to enable informed decisions to be made to respond and react to potential threats.
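One simple way to make spatio-temporal data more manageable is to count events per time window and location, so that an analyst reviews a short summary instead of every raw record. The sketch below assumes each event has already been reduced to an ISO 8601 timestamp and a location label; the sample events, place names, and function name are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical events: (ISO 8601 timestamp, location label).
EVENTS = [
    ("2024-03-01T09:05:00", "London"),
    ("2024-03-01T09:40:00", "London"),
    ("2024-03-01T09:55:00", "Cardiff"),
    ("2024-03-01T10:10:00", "London"),
    ("2024-03-01T10:15:00", "Cardiff"),
    ("2024-03-01T10:20:00", "Cardiff"),
]


def count_by_hour_and_location(events):
    """Count events in each (hour, location) bucket."""
    counts = Counter()
    for stamp, location in events:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(stamp).replace(minute=0, second=0, microsecond=0)
        counts[(hour.isoformat(), location)] += 1
    return counts


if __name__ == "__main__":
    for (hour, location), n in sorted(count_by_hour_and_location(EVENTS).items()):
        print(f"{hour}  {location:<8} {n}")
```

The hourly window is an arbitrary choice for this sketch; which location to record (attacker, data, or breached system) remains the harder question raised above.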
Security is about understanding systems, the people, and the processes that act upon these systems, such that they remain secure.
Can we ever be fully secure? Probably not, but with greater insight into observed activities, we can manage our security more effectively. Data analytics and visualisation techniques are one step towards achieving this.
We discussed earlier the idea that data visualisation can be used for exploratory and explanatory uses. In the latter, we may want to tell a story about our data, often described as Data-Driven Storytelling. So what makes for a good data story? There are five aspects that may be pertinent to the story.
Typically, in security, we are looking for novelty and outliers, based on the historical trend, to then provide a forecast of what may happen if we do not intervene. Where we can match observations against “known-bad” activities (e.g., malware attacks), or show that observations clearly deviate from “known-good” activities (e.g., insider threat), we can provide some insight into the underlying activity to determine whether action is required.
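To illustrate what deviating from “known-good” activity might look like in practice, the sketch below compares recent observations against a historical baseline using a z-score. This is only one simple outlier test, chosen here for illustration; the counts, the function name, and the threshold of 3 standard deviations are assumptions rather than a recommended configuration.

```python
from statistics import mean, stdev


def find_outliers(history, recent, z_threshold=3.0):
    """Return (index, value) pairs from `recent` that deviate from `history`.

    `history` is treated as the known-good baseline and needs at least two
    values so that a sample standard deviation can be computed.
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat baseline: any different value counts as a deviation.
        return [(i, v) for i, v in enumerate(recent) if v != baseline]
    return [
        (i, v)
        for i, v in enumerate(recent)
        if abs(v - baseline) / spread > z_threshold
    ]


if __name__ == "__main__":
    # Hypothetical daily counts of files copied to removable media by one user.
    history = [4, 5, 6, 5, 4, 6, 5, 5]
    recent = [5, 6, 30, 4]
    for day, count in find_outliers(history, recent):
        print(f"Day {day}: {count} files copied, well above the usual level")
```

A flagged value is a prompt for an analyst to investigate, not proof of malicious activity.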
DarkTrace, CheckPoint, Symantec, Sophos, FireEye, Cynet, Fortinet, Vectra, and Cylance. These are just a handful of the vendors that now use Artificial Intelligence and Machine Learning as part of their products and services for cyber security defence. There are plenty of others too, but this gives an impression of the direction in which the industry is moving.
Cyber security requires a holistic view to identify what should be protected, and how it may be vulnerable to attack. The volume of data generated by today’s systems means that humans cannot analyse this raw data effectively. With AI and machine learning techniques, we can filter and manage data observations, whilst visualisation can help human analysts to understand and communicate observations and appropriate responsive actions.