I haven’t blogged for quite a while. Recently I started spending my time again on security research, and the blogging itch is back. My current research focus is security data normalization, and in the next few posts, I will expand on this topic.
The first question that comes to mind is, why normalize? Should we normalize at all? That is obviously once we agree on what normalization is. So let’s start there.
At its core, normalization means that data collected from different sources should be converted to a uniform presentation or schema. Such a uniform schema enables analytics to be source agnostic. It also reduces the learning curve for analysts and enables them to be more productive. The article “SIEM Event Normalization Makes Raw Data Relevant to Both Humans and Machines” provides a good starting point for the rationale.
Where they successful?
In his seminal blog post, “Security Correlation Then and Now: A Sad Truth About SIEM“, Anthon Chuvakin claims that they were not. And I tend to agree. Want proof? If you are a serious security analyst, the number 4624 means something to you. Obviously, it is the Windows Login event. More precisely, successful login (4625 logs failures). You might also know that Login Type 2 is “interactive”, or at least you know that you need to consult Randy Franklin Smith’s excellent
Ultimate Windows Security. I have certainly used it a lot, as you can see on the right. Or just Google for 4624.
In a perfect world, an analyst would not need to know about event 4624. ArcSight categorization whitepaper mapped it already in the first decade of the millennium to this:
Now, how many people converse in 6424, and how many know the ArcSight categorization. How many systems analyze 4624 events, how many support ArcSight, or an alternative, categorization scheme?
So Anton has a point.
Now back to 2021. My current research at Microsoft, leading the Azure Sentinel Information Model (ASIM) initiative, enables me to get back to the challenge ArcSight started tackling more than 20 years ago. And I hope this time to move the needle. Let there be a generation of security analysts who don’t know what 4624 is (and not because Windows will die).
In the upcoming blog posts, I will discuss how we try to make normalization work this time. I will address areas such as:
- Value normalization
- Entities, entity IDs, and entity descriptors
So let’s start the journey.