Big data is here to stay. In fact, in a sense, all data is (and always was) big data. Yet in the last couple of years, the challenges of collecting, storing and accessing such tremendous amounts of data have been thoroughly tackled by Cloud computing. Now all that remains is to find a way to turn that data into usable understanding, especially in the field of network security.
There’s an ambitious paper floating around on the web regarding this game-changing step. Maybe you have already seen it; if not, you can read about it or get the full paper. I suggest reading it before going on, as I will be discussing passages of the paper itself instead of summarising or expanding on it. But enough of that already, let’s dive in!
Key Points
“Intelligence-driven security relies on big data analytics. Big data encompasses both the breadth of sources and the information depth needed for programs to assess risks accurately and to defend against illicit activity and advanced cyber threats.”
The paper explains the point of breadth well, but care must be taken when estimating its value. Packetloop’s CTO Michael Baker voiced a valid concern: “Generally more data is only more accurate when you have trained models. Many inputs without context is tough.”
Additionally, the importance of depth cannot be stressed enough, and this is where the paper completely fails. Especially in the field of network packet analysis, there is so much more potential to uncover, potential that has nothing to do with the advent of Cloud computing and its promises. The key to understanding threats of any kind still lies in the way they are delivered: by the network itself.
What I mean is this: reaching too far out, acquiring more related data from external sources and using big data tools to extract meaning from it in the context of the network traffic, is a double-edged sword. If one does not value the information that can be gathered from the network stream itself, any attempt to make up for it in another area is bound to be in vain. Instead, network packet analytics needs to be constantly expanded and refined.
“Integrating big data analytics into business risk management and security operations will require organisations to rethink how information security programs are developed and executed.”
I’ve touched on this subject before. It’s simply about educating network operators, broadening their understanding of what happens on their networks, and getting the technology out of the way to bring networks and their operators closer together. We need to be done with this mystical image of the networks we use (and rely on) every day; it is the main reason for the ever-increasing number of security concerns. Hackers are smart about their line of work, and network operators should be, too.
Security experts may roam freely in huge companies, but small and medium businesses don’t have the incentive (read: money) to employ a full-time expert. This will, of course, push the administrator’s role toward a more security-aware one. But it will also require better tools and technology to cope with the growing need to understand and analyse the network.
Big Data Holds Promise For Security
“Cyber attackers have become more adept at waging highly targeted, complex attacks that evade traditional defences, static threat detection measures and signature-based tools. Oftentimes, cyber attacks or fraud schemes perpetrated by advanced adversaries aren’t detected until well after damage has been done.”
There’s a long history of attacks being noticed only after the actual breach or theft has occurred. What is new is the ability to uncover them through technology itself rather than by manually sifting through the aftermath (or apparent lack thereof). Full packet captures are in high demand and Cloud storage is cheap, but does that really make up for the damage? I like to call this I-told-you-so technology: very powerful in itself, but suffering from the same flaw the network had while the attack was being executed, namely insufficient knowledge at that particular moment in time. And, let’s be honest, we all hate the guy pointing the finger afterwards.
Of course, this can never be done to a completely satisfying degree. Still, there is much to unearth in the network traffic itself: the relationships between network elements and the seemingly endless stream of repeating patterns and surprisingly predictable behaviour. The paper then makes a good point:
“As part of modernising information security programs, organisations will have to reduce their reliance on signature-based scanning tools, which only detect limited-scope threats that have been encountered in the past. Instead, organisations need to cultivate security capabilities that will ultimately help them detect the unknown and predict threats in the future.”
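To make “detecting the unknown” a little more concrete: one common alternative to signature matching is to learn each host’s own baseline and flag large deviations from it, exploiting the repeating patterns mentioned above. The following is a minimal Python sketch under stated assumptions, not what the paper prescribes. It assumes per-host byte counts per time interval are available as plain dictionaries; the `flag_anomalies` function is hypothetical, and the z-score threshold of 3.0 is an arbitrary placeholder.

```python
from statistics import mean, stdev


def flag_anomalies(history, current, threshold=3.0):
    """Flag hosts whose latest traffic volume deviates strongly from their own baseline.

    history:   dict mapping host -> list of past per-interval byte counts (hypothetical input)
    current:   dict mapping host -> byte count for the latest interval
    threshold: how many standard deviations count as anomalous (placeholder value)
    """
    flagged = {}
    for host, value in current.items():
        past = history.get(host, [])
        if len(past) < 2:
            continue  # stdev needs at least two samples to form a baseline
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            continue  # perfectly flat history: skip rather than divide by zero
        z = (value - mu) / sigma
        if abs(z) >= threshold:
            flagged[host] = round(z, 2)
    return flagged


history = {
    "10.0.0.5": [1000, 1100, 950, 1050],
    "10.0.0.9": [500, 520, 480, 510],
}
current = {"10.0.0.5": 1020, "10.0.0.9": 5000}
print(flag_anomalies(history, current))  # only 10.0.0.9 is flagged
```

A host whose latest interval jumps far above its own history is flagged, while one behaving as usual is not. A plain z-score is deliberately crude: real traffic is bursty and seasonal, which is why tuning and false positives quickly become the real work.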
More Data Means More Security
Please. The section title is horrible and naive. Anyway, this section gives insight into the possibilities and also discusses the risks encountered when implementing such a security model. Consider this excerpt:
“In an intelligence-driven security model, the definition of ‘security data’ expands considerably. In this new model, security data encompasses any type of information that could contribute to a 360-degree view of the organisation and its possible business risks. […] When big data drives security, the result is a unified, self-evolved approach and a holistic awareness that discrete, stitched-together solutions can’t begin to achieve.”
What is beyond the scope of the text is the amount of customisation needed to adapt big data to your business. Collecting big data is easy (the text says just that), but the amount of interesting information may be far smaller than what is being collected and correlated. There needs to be a conscious choice about which information is presented and which is truly irrelevant.
The risk is that irrelevant data clogs up your big data solution and makes correlation and analysis sub-par, expensive, or both. It may also yield far more false positives than desired. The typical needle-in-a-haystack scenario can never be avoided entirely, but it is worth reducing the storage capacity and computation time needed to sift out the relevant information. The promise of Cloud computing is that one need not worry about such things anymore, but that doesn’t excuse wasting electricity (as one of my professors used to say). Efficiency cannot be bought; it needs to be carefully designed.
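One way to act on this is to discard obviously irrelevant records at ingestion time, before they are stored and correlated. The Python sketch below assumes a hypothetical `Flow` record and a placeholder policy (an internal address range, a set of “noise” ports and a minimum flow size). Which traffic is actually irrelevant is exactly the business-specific customisation discussed above and would have to be decided per network.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; the field names are illustrative only."""
    src: str
    dst: str
    dst_port: int
    size: int  # bytes transferred


# Placeholder policy values; an operator would tune these for their own network.
INTERNAL = ip_network("10.0.0.0/8")
NOISE_PORTS = {123, 5353}  # e.g. NTP and mDNS chatter between internal hosts
MIN_SIZE = 64              # ignore near-empty flows


def is_relevant(flow: Flow) -> bool:
    """Return True if the flow should be kept for storage and correlation."""
    internal_only = ip_address(flow.src) in INTERNAL and ip_address(flow.dst) in INTERNAL
    if internal_only and flow.dst_port in NOISE_PORTS:
        return False
    return flow.size >= MIN_SIZE


def prefilter(flows):
    """Return the flows worth keeping and the number of discarded ones."""
    kept = [f for f in flows if is_relevant(f)]
    return kept, len(flows) - len(kept)


flows = [
    Flow("10.0.0.5", "10.0.0.9", 123, 90),       # internal NTP: dropped
    Flow("10.0.0.5", "203.0.113.7", 443, 5200),  # outbound HTTPS: kept
    Flow("10.0.0.8", "198.51.100.2", 80, 40),    # near-empty flow: dropped
]
kept, dropped = prefilter(flows)
print(len(kept), dropped)  # 1 2
```

The trade-off is that anything filtered here is gone for good, so such rules should err on the side of keeping data that might matter later.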
Big Data Transforms Security Approaches
“Data processing happens on a much grander scale: today in the SIEM space, tools are capable of correlating thousands of events per second; going forward, security management platforms will correlate hundreds of thousands, even millions, of events per second without the need to expand the hardware footprint.”
No, just no. While the projection itself is sound, hardware scaling must be considered too. Moore’s Law may be broken, but advances in hardware still drive the capabilities of all future systems and technologies. It’s naive to say hardware has nothing to do with this growth. Think of all the Cloud computing infrastructure and parallelisation being engineered to realise its full potential. Growing storage capacity and computation power are the pillars that big data rests on.
Now, at evolutionary turning points one has to take a step back and ask: What do I do with the ever-growing excess of hardware resources? Was there something I always wanted to do but couldn’t because of those very limitations? In this regard, we will likely see big data not only in the Cloud, but also in on-premise network appliances. And that doesn’t mean a one-to-one adaptation; it means new use cases that can shift the focus to real-time applications in the coming years.
Also, at times I feel the paper talks about big data as if it were just a supercharged SIEM with more input to process, instead of reshaping it to fit the intelligence-driven security model. A lot of work and thought needs to go into the field before we get a definite answer to how different these new systems will be from today’s conventional SIEM solutions.
Building A Big Data Security Program
“Leverage external threat intelligence – Augment internal security analytics programs with external threat intelligence services. Often threat indicators, attack forensics or intelligence feeds from outside sources are not machine-readable and require extensive manual processing by SOC analysts. SOCs should evaluate service providers aggregating threat data from many trustworthy, relevant sources. Data from these sources should be in formats that can be automatically ingested by security analytics platforms for correlation with internal data.”
That’s going to be the gem for network security as a whole. The arrival of Cloud-based analytics will help build open source tools, which in turn will be integrated into more than one solution over time, so that eventually all solutions can work on the same intelligence inputs. This will also help intelligence providers deliver their own sets of intelligence in those commonly used formats, continually feeding the growing demand.
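The value of machine-readable intelligence is that correlation with internal data becomes a mechanical lookup. The Python sketch below illustrates this under assumptions: the JSON feed format, its field names and the `load_indicators`/`correlate` helpers are all hypothetical and deliberately simplified (standardised formats such as STIX exist, but are far richer than this). Indicators are merged into one index keyed by type and value, and internal events are matched against it.

```python
import json

# Hypothetical, simplified feed; real-world formats carry far more context.
FEED = """
[
  {"type": "ipv4",   "value": "203.0.113.7", "source": "provider-a"},
  {"type": "domain", "value": "bad.example", "source": "provider-b"}
]
"""


def load_indicators(feed_text, index=None):
    """Merge a machine-readable feed into an index of (type, value) -> providers."""
    index = {} if index is None else index
    for entry in json.loads(feed_text):
        key = (entry["type"], entry["value"].lower())
        index.setdefault(key, set()).add(entry["source"])
    return index


def correlate(events, index):
    """Yield (event, providers) for internal events matching an external indicator."""
    for event in events:
        for kind in ("ipv4", "domain"):
            value = event.get(kind)
            if value and (kind, value.lower()) in index:
                yield event, sorted(index[(kind, value.lower())])


events = [
    {"host": "10.0.0.5", "ipv4": "203.0.113.7"},
    {"host": "10.0.0.8", "domain": "Bad.Example"},
    {"host": "10.0.0.9", "ipv4": "198.51.100.2"},
]
for event, providers in correlate(events, load_indicators(FEED)):
    print(event["host"], providers)
# 10.0.0.5 ['provider-a']
# 10.0.0.8 ['provider-b']
```

Because every provider in this sketch uses the same format, adding another feed is just another `load_indicators` call on the same index, which is the kind of interoperability the excerpt above hopes for.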
Looking Ahead: Big Data In Five Years?
All of the things presented in this section are beautifully laid out. I’m quoting them here for emphasis:
- “security analysts will be able to use tools with intuitive interfaces”
- “SOCs will gain the requisite expertise, processes and tools to make the most of security data available to them”
- “data analytics systems will empower users with decision-support capabilities”
- “security management tools will automatically share relevant threat data with partners and creatively reuse big data in different security scenarios”
All nitpicking aside, the paper is well conceived and gives an interesting outlook on the topic. Yet there is one aspect it does not cover: the importance of visualisation. Victor Pereira, security analyst and former colleague of mine, said the following: “I think visualisation is an important piece in this puzzle. Even if you have a set of well-trained algorithms, an up and running infrastructure, and are crunching the right data, if you don’t know how to present it in a way that makes it clear to the network operator/SOC analyst/C(I)SO/whatever what’s going on in his network, maybe the whole solution is doomed to fail.”
And once more, one must not forget that big data as a whole is built on depth (richness of information, aggregation and understanding of superimposed conditions) more than it is built on breadth. I’d like to go one step further than the paper and say: not only is collection easy, correlation is easy, too. What’s hard is interpreting the data and evaluating its relevance. Sometimes each even depends on the other. This means providing intelligence is a solid foundation, but still not the whole picture.
So, all in all, big data is a big step in the right direction, but it can easily stray from its mission of bringing holistic network security one step closer. It will be a couple of years before the lasting benefits manifest themselves, but with a sceptical eye and prudence the industry (as well as individuals) can slowly set a path to make the most of it, one step at a time.