Security data normalization, like any other standardization effort, has a very human aspect to it. If you have ever discussed a schema with someone, you know the discussion can get emotional. You would think that deep technical issues are at stake, but it is usually the very basics of normalization that start the commotion: field naming.
Why argue about field names?
I actually find it quite natural. While machine analytics can cope with any name, human users benefit from just the right field name. That is, if there were such a thing as the right field name… it is a subjective topic, after all. This is why good normalization has to develop good social skills.
In ASIM, the Azure Sentinel Information Model, we introduce two techniques that aim to make field names serve analysts better, whatever their taste: Descriptive Scenarios and Aliases. Neither is groundbreaking technology; they simply try to add some social skills to normalization.
Descriptive scenarios address normalizing the role each entity plays in an event. Most of the information conveyed in an event is about its entities: users, devices, files, processes, and more. But events often include more than one entity of the same type, and those are usually distinguished by a prefix such as Src or Dst. The approach is ubiquitous and probably a good one, but there are so many prefixes in use that it quickly becomes confusing. Destination or Target? Source, Actor, or Initiator?
In ASIM, we try to normalize the prefixes while keeping them intuitive. A user would be an Actor, but a host is a Source. As always with social skills, talking about it is important. Therefore, the documentation provides descriptive scenarios that make it easier for analysts to internalize the prefixes we selected. These are the scenarios for the User entity:
- Create user – An Actor created or modified a Target User.
- Modify user – An Actor renamed a Target User to an Updated User.
- Sign-in – An Actor signed in to a system as a Target User.
- Email – An Actor sent an email to a Target User.
- Network connection – A process running as the Actor on the source host communicated with a process running as the Target User on the destination host.
- DNS request – An Actor initiated a DNS query.
- Process creation – An Actor (the user associated with the initiating process) initiated process creation. The created process runs under the credentials of a Target User (the user related to the target process).
We hope that such scenarios will help analysts better understand who is who, especially in more complex cases such as modifying a user or creating a process.
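To make the process creation roles concrete, here is a minimal Python sketch that maps a raw process-creation record into Actor and Target fields. The raw field names (`parent_user`, `parent_cmdline`, `new_process_user`, `new_process_cmdline`) are hypothetical, and the output names follow the ASIM Actor/Acting and Target prefix pattern; the ASIM process event schema remains the authoritative field list.

```python
def normalize_process_creation(raw: dict) -> dict:
    """Map a raw process-creation record to Actor/Target roles.

    The raw field names are hypothetical; the output names follow the
    ASIM-style Actor/Acting and Target prefixes.
    """
    return {
        # Actor: the user associated with the initiating (acting) process.
        "ActorUsername": raw["parent_user"],
        "ActingProcessCommandLine": raw["parent_cmdline"],
        # Target: the user whose credentials the created process runs under.
        "TargetUsername": raw["new_process_user"],
        "TargetProcessCommandLine": raw["new_process_cmdline"],
    }


raw_event = {
    "parent_user": "alice",
    "parent_cmdline": "runas /user:svc-backup backup.exe",
    "new_process_user": "svc-backup",
    "new_process_cmdline": "backup.exe",
}
event = normalize_process_creation(raw_event)
print(event["ActorUsername"], "->", event["TargetUsername"])  # alice -> svc-backup
```

Here `alice` starts the process and is therefore the Actor, while the new process runs as `svc-backup`, the Target User. This is exactly the distinction the process creation scenario draws.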
Another intuitive concept we added, a trivial one for that matter, is Aliases. If we cannot agree on the best name, why not have two or even more?
We find that aliases are handy in several situations:
- Getting rid of prefixes (and suffixes while at it) – It is much easier to use “User”, “IpAddr”, “Dvc”, or “CommandLine” than the convoluted versions, say “DvcHostname” or “ActingProcessCommandLine”. Obviously, as discussed above, prefixes are important. However, a short name for the most commonly used entity or entity attribute is very handy.
- Not making a choice – sometimes a value means one thing to one group and another thing to others. For example, the DNS protocol field Query most often holds a domain name. It would be a Query for a DNS expert, while for a typical analyst it would be a Domain. So we allow both.
- Backward compatibility – version management is not glamorous, but it is important. Sometimes you want to rename a field, and an alias can keep the old name working.
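The three alias use cases can be modeled as a read-only view that resolves names on lookup. The following is a minimal Python sketch, not ASIM's implementation. The alias table is illustrative: `Domain` and `Dvc` echo the examples above, `DnsQuery` and `DvcHostname` stand in for the canonical fields, and `DnsQueryName` is a hypothetical old field name kept for backward compatibility.

```python
from collections.abc import Mapping
from typing import Any, Iterator

# Illustrative alias table: alias -> canonical field.
# "DnsQueryName" is a hypothetical legacy name kept for backward compatibility.
ALIASES = {
    "Domain": "DnsQuery",
    "Dvc": "DvcHostname",
    "DnsQueryName": "DnsQuery",
}


class AliasedEvent(Mapping):
    """Read-only view of an event that resolves aliases on lookup.

    Only canonical fields are stored; aliases are resolved at read time.
    """

    def __init__(self, fields: dict[str, Any]) -> None:
        self._fields = fields

    def __getitem__(self, name: str) -> Any:
        # Translate an alias to its canonical name; other names pass through.
        return self._fields[ALIASES.get(name, name)]

    def __iter__(self) -> Iterator[str]:
        # Iterate canonical fields only, so each value is listed once.
        return iter(self._fields)

    def __len__(self) -> int:
        return len(self._fields)


event = AliasedEvent({"DnsQuery": "contoso.com", "DvcHostname": "web01"})
print(event["Domain"], event["DnsQuery"], event["Dvc"])  # contoso.com contoso.com web01
```

The DNS expert can ask for `DnsQuery`, the analyst for `Domain`, and an old query using the legacy name still works, while the event holds each value only once.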
Obviously, the underlying technology has to support aliases efficiently, without duplicating data. Query-time normalization usually has an easier time supporting aliases than ingest-time normalization: at query time, an alias is just another name for an existing column, while at ingest time it typically means storing the same value twice. This is a good reason to support query-time normalization, even if alongside ingest-time capabilities.
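The storage trade-off can be illustrated with a small Python sketch that contrasts the two approaches. Ingest-time normalization materializes aliases into every stored record, while query-time normalization resolves them while projecting columns. The alias table and records are made up for illustration. ASIM's own parsers are KQL functions, so this only models the trade-off and does not show how ASIM works internally.

```python
from typing import Iterable, Iterator

# Illustrative alias table: alias -> canonical field.
ALIASES = {"Domain": "DnsQuery", "Dvc": "DvcHostname"}


def ingest_time_normalize(record: dict) -> dict:
    """Materialize aliases at ingest: each alias value is stored a second time."""
    stored = dict(record)
    for alias, canonical in ALIASES.items():
        if canonical in stored:
            stored[alias] = stored[canonical]
    return stored


def query_time_project(records: Iterable[dict], columns: list[str]) -> Iterator[dict]:
    """Resolve aliases on read: only canonical fields are ever stored."""
    for record in records:
        yield {col: record[ALIASES.get(col, col)] for col in columns}


raw = [
    {"DnsQuery": "contoso.com", "DvcHostname": "web01"},
    {"DnsQuery": "fabrikam.com", "DvcHostname": "web02"},
]

ingested = [ingest_time_normalize(r) for r in raw]
stored_ingest = sum(len(r) for r in ingested)  # 4 fields per record
stored_query = sum(len(r) for r in raw)        # 2 fields per record
print(stored_ingest, stored_query)  # 8 4

print(list(query_time_project(raw, ["Domain", "Dvc"])))
# [{'Domain': 'contoso.com', 'Dvc': 'web01'}, {'Domain': 'fabrikam.com', 'Dvc': 'web02'}]
```

Both approaches return the same values for `Domain` and `Dvc`. The difference is that the ingest-time variant pays for every alias in storage on every record, which is the duplication the paragraph above warns against.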
I would love to hear your thoughts about those and other areas in which normalization can be more social!