Author Archives: Shezaf

About Shezaf

Ofer Shezaf is a senior director for cyber security at Varonis. Ofer works closely with the Varonis product team to ensure that the Varonis products continue to address new and upcoming security threats. Before joining Varonis, Ofer led security research at HP and before that in Israel's cyber warfare department. As an internationally recognized security expert, Ofer is an OWASP (Open Web Application Security Project) leader and the founder of the OWASP Israeli chapter. Some community projects Ofer has led are the OWASP ModSecurity Core Rule Set Project and the WASC Web Application Firewall Evaluation Criteria project. Ofer has spoken at numerous conferences, including BlackHat, RSA, Hack In The Box and OWASP AppSec.

Digital self improvement: your recommendations

I started a digital self-improvement journey. The beginning was actually focused on productivity improvement. I think that Edge and Outlook are better than Chrome and Gmail. That said, moving from free services, paid by advertising, to commercial services did push me into a privacy campaign.

– Day #1: uninstalling #Chrome
– Day #2: ditching #Gmail 
– Day #3: delete my #Facebook 
– Day #4: #DuckDuckGo.
– Day #5: VPN
– Day #6: Startpage

Along the way, I received a lot of advice from you all, and in this post, I am summarizing it, adding my journey experience when I already have some.

Moving away from Google was a byproduct of switching from Chrome to Edge, inheriting Bing along the way. You suggested that I use DuckDuckGo and Startpage to improve privacy and to get better search results.

I summarized my initial experience, but the verdict was not final. Since then, I have found that Google sometimes works better, and from time to time I use DuckDuckGo bangs to search Google from within DuckDuckGo. I have not pinned down precisely what this advantage is or when to expect it.


I received a lot of recommendations for alternative browsers, including Firefox Focus, Brave, Chromium-based builds (Iridium), KHTML-based browsers such as Konqueror, WebKit-based browsers such as Midori and Epiphany (if you are on Linux), and Opera. Someone nostalgic also mentioned Netscape.

Maybe it is the number of options, or maybe my take on the importance of the browser itself, but I did not go that route and kept to Edge. On mobile, I do use the DuckDuckGo browser, as an easy way to use DuckDuckGo as my search engine.


I deleted my Facebook account, and obviously, many recommended ditching every possible social service: WhatsApp, YouTube, Instagram, LinkedIn, to name a few. But I wasn’t after privacy to start with. To an extent, deleting Facebook was just showing off. I never logged in anyway. While I appreciate the privacy price I am paying, I do need those services, and I want to keep socializing. Quoting one of the comments on my posts from someone who ditched WhatsApp: “It is a lonely world out there”.


I had never used a VPN, and once on my digital transformation journey, I got really tempted to experiment. I went with NordVPN. Of all the changes I made, this one got the fewest comments and, on the other hand, probably affected my digital life the most. Using a VPN obviously provides more privacy, but websites and apps fight back. Some inadvertently: I really don’t read German, even if I am connected through Switzerland. And some purposely: I need to use multi-factor authentication a lot more.

The two comments I got on my VPN usage were whether I can trust any VPN vendor to really keep my privacy (probably not) and a suggestion to tunnel to Blida. I will leave figuring out the last one as a challenge to you.


I received recommendations for different alternative e-mail systems, namely Protonmail and GMX. That said, I think moving from a free service, in which I paid with my privacy, to a paid one, namely Outlook, is sufficient for my privacy. And Outlook is a great e-mail client, not least because of excellent integration with the desktop and mobile apps.

What’s next?

  • Someone suggested switching my DNS service or using a DNS sinkhole like Pi-hole. Using Google’s public DNS is easy, but it is Google. I want to explore that route, but I am not sure who actually provides more privacy.
  • I am still using Google Photos and Google Maps. I would love to find a decent, paid-for alternative, but nothing is on par.  I got OsmAnd as a navigation suggestion, but it is far off the mark.
  • Someone suggested switching to another OS, LineageOS, instead of Android, for example. Well, I do need to get value from my digital life, not just experiment with technology.

Day #7 of my digital transformation: which search engine should I use?

My digital life transformation journey made me leave the Google Search comfort zone and try others. Since I switched from Chrome to Edge, I inherited Bing, and following my posts on the transformation, I followed readers’ advice to try out DuckDuckGo and Startpage. DuckDuckGo and Startpage use Bing and Google, respectively, but add a layer of privacy on top. As of this morning, Startpage was my default search engine on Edge on Windows, while I used DuckDuckGo as the browser on my mobile.

I was aware of the limitations but was really taken aback by the empty search result from Startpage. I don’t remember the last time I got no results at all.

This led me to spend a few minutes comparing them, at least on this one search, which is representative of my work-life technical search requirements. The catch is that I did not get the function name right.

So, let’s start.


Google, my default until recently, and probably what you are using, provided the expected result. It was able to identify my mistake and provided the correct answer. It also clearly showed me both my error and the corrected value.


My real surprise was Bing. It found the correct answer, enabling me to get the information I needed, and provided added value: explaining what an ipv4 prefix is, discussing IPv4 in general, and listing related user asks.

I find this added value on top of the search results to be one of the main innovations in the search experience in the last few years, and I was pleasantly surprised that Bing outdid Google in this case. In other cases, mostly searches related to my local environment, I found Google’s added-value content better. Therefore, I sometimes revert to Google for inquiries related to my personal life.

While Bing did get me the correct answers, it did not present my mistake and correct search term, which can be helpful at times.


DuckDuckGo did present to me the correct answer. The search results were identical to Bing, which is not surprising given it uses Bing in the backend. That said, the added value information was missing.


Lastly, Startpage did not return any results, which was the starting point for the whole endeavor.

The comparison here, alongside the day-to-day experience with Bing and DuckDuckGo, showed that Bing is a viable alternative to Google. While it is also an advertising-based service, it is good to know that there is an alternative.

When it comes to more privacy, I found DuckDuckGo acceptable, while at this time, I need to drop Startpage from my toolbox.

Normalization social skills

Security data normalization, like any other standardization effort, has a very human aspect to it. If you ever discussed a schema with someone, you know the discussion can get emotional. You would think that deep technical issues are at stake, but it is usually the very basics of normalization that tend to start the commotion: field naming.

Why argue about field names?

I actually find it quite natural. While machine analytics can cope with any name, human users can benefit from just the right field name. That is, if there were just one right field name… it is a subjective topic after all. And this is why good normalization has to develop good social skills.

In ASIM, the Azure Sentinel Information Model, we introduce two techniques to try to make field names serve analysts better, whatever their taste is: Descriptive Scenarios and Aliases. Neither is groundbreaking technology. After all, they try to add some social skills to normalization.

Descriptive scenarios address normalizing the role each entity plays in an event. Most of the information conveyed in an event is about the entities: users, devices, files, processes, and more. But events often include more than one entity of the same type, and those are usually designated by a prefix: Src, Dst, and the like. Being ubiquitous, this is probably a good solution, but there are so many of those prefixes that it gets quite confusing. Destination or Target? Source, Actor, or Initiator?

In ASIM, we try to normalize but still keep the prefix intuitive. A user would be an Actor, but a host is a Source. As always with social skills, talking about it is important. Therefore, we provide descriptive scenarios in the documentation that make it easier for analysts to internalize the prefixes we selected. Those are the scenarios for the User entity:

  • Create User – An Actor created or modified a Target User
  • Modify user – An Actor renamed Target User to Updated User.
  • Sign-in – An Actor signed in to a system as a Target User.
  • E-Mail – An Actor sends an email to a Target User
  • Network connection –  A process running as Actor on the source host, communicating with a process running as Target User on the destination host
  • DNS request – An Actor initiated a DNS query
  • Process creation – An Actor (the user associated with the initiating process) has initiated process creation. The process created runs under the credentials of a Target User (the user related to the target process).

We hope that such scenarios will help analysts better understand who is who, especially in the more complex scenarios such as modifying a user or process creation.

Another intuitive concept we added – well, trivial for that matter – is Aliases. If we cannot agree on the best name, why not have two or even more?

We find that aliases are handy in several situations:

  • Getting rid of prefixes (and suffixes while at that) – It is much easier to use the “User”, “IpAddr”, “Dvc” or “CommandLine” than the convoluted version, say “DvcHostname” or “ActingProcessCommandLine”. Obviously, as discussed above, prefixes are important. However, a short name to designate the most useful entity or entity attribute is very useful.
  • Not making a choice – sometimes a value is something to a group and another thing to others. For example, the DNS protocol field Query holds, most often, a domain name. It would be a Query for a DNS expert, while for a typical analyst, it would be a Domain. So we allow both.
  • Backward compatibility – version management is not glamorous but important. Sometimes you want to update. Maintaining the old name can be done using an alias.

Obviously, the underlying technology has to support aliases efficiently and not require data duplication. Query time normalization usually has an easier time than ingest time normalization in supporting aliases. This is a good reason to support query time normalization, even if alongside ingest time capabilities.
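To make the "no data duplication" point concrete, here is a minimal sketch of resolving aliases at lookup time, which is roughly what query time normalization allows. It is an illustration only, not how ASIM implements aliases: the `AliasedEvent` class is hypothetical, and the alias-to-field mapping is an assumption built from the names mentioned above.

```python
from collections.abc import Mapping

# Assumed alias table for illustration: alias name -> canonical field name.
ALIASES = {
    "Dvc": "DvcHostname",
    "CommandLine": "ActingProcessCommandLine",
    "Domain": "Query",
}


class AliasedEvent(Mapping):
    """Read-only view of an event that resolves aliases when a field is read."""

    def __init__(self, record: dict):
        self._record = record  # values are stored once; aliases never copy them

    def __getitem__(self, name: str):
        # Translate an alias to its canonical name, then read the stored value.
        return self._record[ALIASES.get(name, name)]

    def __iter__(self):
        return iter(self._record)

    def __len__(self):
        return len(self._record)


event = AliasedEvent({"DvcHostname": "srv01", "Query": "example.com"})
print(event["Dvc"], event["Domain"])
```

Both the DNS expert asking for `Query` and the analyst asking for `Domain` read the same stored value, and the record still holds only two fields.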

I would love to hear your thoughts about those and other areas in which normalization can be more social!

SIEM Normalization Dirty Secret: Values

When posting my first post on normalization on LinkedIn, I was pleasantly surprised that the ensuing discussion got to my favorite normalization topic: value normalization. Mehmet Ergene even linked to his interesting article on the topic.


It is my favorite because, to a large extent, I think that the missing piece in SIEM data normalization so far is value normalization. I was going to start a 40-page long post covering everything about value normalization, but hey… you will not read it. So I will start with an example: our recent Azure Sentinel Registry schema.

Normalizing Registry events is one of the simpler normalization exercises. The registry is a Windows concept, and the events reported are always the same. Only the reporting system changes. Compare that to, say, authentication events, which might inherently behave differently in different systems. Moreover, to start with, we created parsers only for Microsoft solutions that report on Registry activity.

As you will see, even in this simple exercise, value normalization is important and far from trivial.

Does the key fit the lock?

The most important field in a registry event is the key name. Keys in the registry are like folders in file systems. To understand what the event is about, you need the key.

However, the exact same key has different values when reported by different systems. For example:

Windows \REGISTRY\MACHINE\SOFTWARE\Microsoft\Windows Defender\Signature Updates
Sysmon HKLM\SOFTWARE\Microsoft\Windows Defender\Signature Updates\LastEmergencySigCheck
Defender HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Defender\Signature Updates

Each one is different! This certainly affects queries such as the one from which the following snippet is taken:

| where RegistryKey has_all ("HKEY_LOCAL_MACHINE", "Image File Execution Options")

The value “HKEY_LOCAL_MACHINE” will have to be different for different event sources, as each system logs the key prefix differently. If we just normalize the key field name and not the value, queries would still have to account for the difference, and analysts would still have to understand the peculiarities of each source.

In the Azure Sentinel Information Model (ASIM), we require normalizing the key value, enabling the query above to work. However, the list of options is not comprehensive, and here is exactly where the community can work together and help to extend.
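As a rough sketch of what such key normalization could look like, the function below rewrites the three prefixes from the example above to a single form. It is not the ASIM parser code: the function name is hypothetical, and the prefix list covers only the spellings shown in this post.

```python
# Root-hive spellings from the example above, mapped to the normalized form.
# Deliberately incomplete: other hives and other sources would need entries.
HIVE_PREFIXES = {
    "\\REGISTRY\\MACHINE": "HKEY_LOCAL_MACHINE",
    "HKLM": "HKEY_LOCAL_MACHINE",
    "HKEY_LOCAL_MACHINE": "HKEY_LOCAL_MACHINE",
}


def normalize_registry_key(key: str) -> str:
    """Rewrite a known root-hive prefix to its normalized spelling."""
    upper = key.upper()
    for prefix, normalized in HIVE_PREFIXES.items():
        # Match only on a path-segment boundary, so "HKLMX\..." is not rewritten.
        if upper == prefix or upper.startswith(prefix + "\\"):
            return normalized + key[len(prefix):]
    return key  # unknown prefix: leave untouched rather than guess


print(normalize_registry_key(r"\REGISTRY\MACHINE\SOFTWARE\Microsoft\Windows Defender"))
print(normalize_registry_key(r"HKLM\SOFTWARE\Microsoft\Windows Defender"))
```

Returning unknown prefixes unchanged is a design choice: it keeps the gap visible instead of hiding it behind a wrong guess, which is where community extensions would come in.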

You probably noticed that the Sysmon value has another difference: there is an additional part at the end. The reason is that Sysmon reports the key and the value (which is similar to a filename in file systems) together. This is not a value normalization challenge but rather an example of how field name mapping, usually considered the easy part of normalization, has its complexities. In this case, the Sysmon field has to be split and mapped to two different fields in the target schema.
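The split itself can be sketched in a few lines. The sketch assumes the event is known to be about a registry value, so the last path segment is the value name. The function name is hypothetical, and a value name that itself contains a backslash would be split incorrectly.

```python
def split_key_and_value(target: str) -> tuple[str, str]:
    """Split a combined 'key\\value' string at the last backslash."""
    key, separator, value_name = target.rpartition("\\")
    if not separator:
        return target, ""  # no backslash: treat the whole string as the key
    return key, value_name


key, value_name = split_key_and_value(
    r"HKLM\SOFTWARE\Microsoft\Windows Defender\Signature Updates\LastEmergencySigCheck"
)
print(key)
print(value_name)
```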

It’s a bird, it’s a plane, it’s a DWORD…

While not as cardinal as the registry key, another value reported differently by different sources is the type of registry values. I found different solutions to report the same type as “Reg_DWord”, “Dword”, or “%%1876”.

The first two are easy to address if (and only if) one is aware of the issue: an analyst will surely get it, and analytics can search for “dword” as a substring.

The last option, “%%1876”, demonstrates a common value normalization challenge: the use of codes in events. “%%1876” is the Windows code for “Dword”. However, this is not something an analyst should have to know. In ASIM, we require normalizing the value to the first option (“Reg_DWord”), and as a byproduct, also ensure that the value is clear to analysts.
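A lookup table is usually enough for this kind of code-versus-label normalization. The sketch below covers only the three spellings mentioned above. The function name is hypothetical, and a real parser would need the full list of types and codes.

```python
# Source spellings (lower-cased) mapped to the normalized label.
REGISTRY_VALUE_TYPES = {
    "reg_dword": "Reg_DWord",
    "dword": "Reg_DWord",
    "%%1876": "Reg_DWord",
}


def normalize_value_type(raw: str) -> str:
    """Return the normalized type label, or the input unchanged if unknown."""
    return REGISTRY_VALUE_TYPES.get(raw.strip().lower(), raw)


for reported in ("Reg_DWord", "Dword", "%%1876"):
    print(reported, "->", normalize_value_type(reported))
```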

Another example of codes vs. labels is DNS logs. Most analytics based on DNS events use the error code reported. Here is the start of a typical Azure Sentinel DNS detection:

| where isnotempty(ResponseCodeName)
| where ResponseCodeName =~ "NXDOMAIN"

However, some DNS sources report the error code using a numerical format, while others are using a label. As IANA’s mapping suggests, the code for NXDOMAIN is 3.
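For the query above to work across sources, a parser has to turn numeric codes into labels. The following sketch does that for the first six codes in IANA's registry. The function name is hypothetical, and the choice of upper-case labels is an assumption that follows the "NXDOMAIN" spelling used in the query.

```python
# The first six DNS response codes from IANA's registry.
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def normalize_response_code(raw) -> str:
    """Return the response code label whether the source sent a number or a label."""
    text = str(raw).strip()
    if text.isdigit():
        # Numeric source: translate known codes, keep unknown ones as text.
        return RCODE_NAMES.get(int(text), text)
    return text.upper()  # label source: only unify the letter case


print(normalize_response_code(3))
print(normalize_response_code("NXDomain"))
```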

But what is it all about anyway?

The previous examples demonstrate well what value normalization is. However, the most fundamental value normalization challenge, which is relevant to every schema, is the core fields that tell us what the event actually was:

  • Type: what activity was actually reported?
  • Result: was the activity successful or not?
  • Result Details: what was the reason for failure?
  • Action: the action performed by the reporting device. While not universal and only typical to security systems, it is common enough and important enough to include it here.

Since most sources report only success for registry events, only the first is relevant in our example. But, even there, value normalization is needed. The activity of deleting a value is represented using either “DeleteValue”, “RegistryValueDeleted”, or “%%1906”.

Of the list, I find that Result Details and Action are important and mostly overlooked. We tried to tackle the former in the ASIM authentication schema by specifying values for EventResultDetails. However, this is an area in which source devices differ widely, making it a real challenge. 

Why should you care?

The topics presented above help you understand better what value normalization is. There are other value normalization challenges, for example, ensuring the time fields format is consistent. There is also an adjacent problem of normalizing identifiers. All will be discussed in upcoming posts.

But is this important to you?

Yes. It is. If values are not normalized, you cannot create source-agnostic analytics, and each query will have to handle each source’s peculiarities. Consequently, you will have to understand each source intimately. Obviously, this defeats one of the central goals of normalization.

You may think that the challenge is the schema definition. While no schema provides comprehensive support for value normalization, most pay at least lip service to it. What you should be checking is whether your parser, technical adapter, or app, even if marked as schema compliant, actually normalizes values.
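One way to run such a check is to feed a parser's output through a small validator that knows the allowed values for each field. This is a sketch only: the function name is hypothetical, and the field names and allowed values are assumptions for illustration rather than an actual schema definition.

```python
# Assumed allowed values per normalized field (illustrative, not a real schema).
ALLOWED_VALUES = {
    "RegistryValueType": {"Reg_DWord"},
    "ResponseCodeName": {"NOERROR", "FORMERR", "SERVFAIL", "NXDOMAIN", "NOTIMP", "REFUSED"},
}


def find_unnormalized_values(events):
    """Return (event index, field, value) for every value outside the allowed set."""
    problems = []
    for index, event in enumerate(events):
        for field, allowed in ALLOWED_VALUES.items():
            if field in event and event[field] not in allowed:
                problems.append((index, field, event[field]))
    return problems


parsed = [
    {"RegistryValueType": "Reg_DWord"},
    {"RegistryValueType": "%%1876"},  # field name mapped, value left as a code
    {"ResponseCodeName": "3"},        # numeric code instead of a label
]
print(find_unnormalized_values(parsed))
```

A parser that only renames fields would pass a field-name check but fail this one, which is exactly the gap described above.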

Should we normalize security data?

I haven’t blogged for quite a while. Recently I started spending my time again on security research, and the blogging itch is back. My current research focus is security data normalization, and in the next few posts, I will expand on this topic.

The first question that comes to mind is, why normalize? Should we normalize at all? That is obviously once we agree on what normalization is. So let’s start there.

At its core, normalization means that data collected from different sources should be converted to a uniform presentation or schema. Such a uniform schema enables analytics to be source agnostic. It also reduces the learning curve for analysts and enables them to be more productive. The article “SIEM Event Normalization Makes Raw Data Relevant to Both Humans and Machines” provides a good starting point for the rationale.

To deliver on the promise, SIEMs have tried to implement normalization since day one. ArcSight CEF and categorization, Splunk CIM, and QRadar LEEF are all normalization schemes.

Were they successful?

In his seminal blog post, “Security Correlation Then and Now: A Sad Truth About SIEM”, Anton Chuvakin claims that they were not. And I tend to agree. Want proof? If you are a serious security analyst, the number 4624 means something to you. Obviously, it is the Windows Login event. More precisely, successful login (4625 logs failures). You might also know that Login Type 2 is “interactive”, or at least you know that you need to consult Randy Franklin Smith’s excellent Ultimate Windows Security. I have certainly used it a lot. Or just Google for 4624.

In a perfect world, an analyst would not need to know about event 4624. ArcSight categorization whitepaper mapped it already in the first decade of the millennium to this:

Now, how many people converse in 4624, and how many know the ArcSight categorization? How many systems analyze 4624 events, and how many support the ArcSight categorization scheme, or an alternative one?

So Anton has a point.

Now back to 2021. My current research at Microsoft, leading the Azure Sentinel Information Model (ASIM) initiative, enables me to get back to the challenge ArcSight started tackling more than 20 years ago. And I hope this time to move the needle. Let there be a generation of security analysts who don’t know what 4624 is (and not because Windows will die).

As a starting point, we recently released the ASIM Authentication model, which includes a normalizing 4624 parser. I am sure it is not perfect, and we are already getting ideas for improvement.
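To illustrate what "normalizing 4624" means in practice, here is a minimal sketch that turns the numeric event ID and logon type into readable values. It is not the ASIM parser: the function name and the output field names (`EventType`, `EventResult`, `LogonMethod`) are assumptions for illustration. The event IDs and the logon type numbers are the standard Windows ones.

```python
# Windows event IDs for logon success and failure, as discussed above.
EVENT_RESULTS = {4624: "Success", 4625: "Failure"}

# Standard Windows logon type numbers and their names.
LOGON_TYPES = {
    2: "Interactive",
    3: "Network",
    4: "Batch",
    5: "Service",
    7: "Unlock",
    8: "NetworkCleartext",
    9: "NewCredentials",
    10: "RemoteInteractive",
    11: "CachedInteractive",
}


def normalize_logon_event(raw: dict) -> dict:
    """Map a raw Windows logon event to readable, source-agnostic values."""
    event_id = int(raw["EventID"])
    if event_id not in EVENT_RESULTS:
        raise ValueError(f"not a logon event: {event_id}")
    logon_type = raw.get("LogonType")
    return {
        "EventType": "Logon",
        "EventResult": EVENT_RESULTS[event_id],
        # Missing or unknown logon types are labeled "Other" rather than guessed.
        "LogonMethod": LOGON_TYPES.get(int(logon_type), "Other") if logon_type is not None else "Other",
    }


print(normalize_logon_event({"EventID": 4624, "LogonType": 2}))
```

An analyst querying the output can ask for successful interactive logons without knowing either number.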

In the upcoming blog posts, I will discuss how we try to make normalization work this time. I will address areas such as:

  • Value normalization
  • Entities, entity IDs, and entity descriptors
  • Aliasing

So let’s start the journey.

Keeping Ahead of the Hackers

While my posts are typically more focused, Andy Green, who manages digital content at Varonis, thought it would be a good idea to share thoughts on the evolution of the threat landscape over the years: how attack techniques evolved, the changes brought by the dark web and the economics of hacking, what we, the defenders, are doing – wrong or right – and what we should do better.

So if you are into some techno-philosophical thoughts about cybersecurity, here it is:

Brute Force: Anatomy of an Attack

I am back to blogging, but my blog posts now appear on the Varonis blog. I will keep publishing links to those posts here for my loyal followers.

This time:

The media coverage of NotPetya has hidden what might have been a more significant attack: a brute force attack on the UK Parliament.  While for many it was simply fertile ground for Twitter Brexit jokes, an attack like this that targets a significant government body is a reminder that brute force remains a common threat to be addressed.

It also raises important questions as to how such an attack could have happened in the first place.  These issues do suggest that we need to look deeper into this important, but often misunderstood type of attack.


Bobby Tables’ real-life counterpart

If you are a member of the application security community, you are bound to know this hilarious xkcd cartoon. It is so good that it found its way to non-expert circles. I once got it physically framed as a birthday present from friends.


Like most of you, I thought that this is a great way to explain SQL injection. For most of us, this is what it is. For a few, it is a real-life problem.

My dear friend Or Katz published an even more hilarious blog post outlining the challenges of someone who happens to have a first name which is an SQL keyword. His post is also a very good discussion of the use, or rather abuse, of signatures for web application security. A great and worthwhile read.
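Both halves of the story fit in a short sketch using Python's built-in sqlite3 module. A bound parameter stores even the cartoon's famous name as plain data, while a deliberately naive keyword signature flags a perfectly innocent name. The signature and the sample names are made up for illustration and are not taken from the post or from any real rule set.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")


def add_student(name: str) -> None:
    # The "?" placeholder binds the name as data, so it is never parsed as SQL.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))


# A deliberately naive signature of the kind that produces false positives.
NAIVE_SIGNATURE = re.compile(r"\b(select|drop|union|or)\b", re.IGNORECASE)


def looks_malicious(value: str) -> bool:
    return NAIVE_SIGNATURE.search(value) is not None


add_student("Robert'); DROP TABLE students;--")
add_student("Or")

names = [row[0] for row in conn.execute("SELECT name FROM students")]
print(names)                  # both names stored; the table is intact
print(looks_malicious("Or"))  # the innocent name is flagged by the signature
```

The parameterized insert handles both names correctly, while the signature cannot tell a person from an attack.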

Anniversary to the ModSecurity Core Rule Set celebrated with a new major release

I have a very warm place reserved for the ModSecurity Core Rule Set (CRS). I created it a decade ago. Actually the first release in the readme file, labeled 1.1, is dated to October 2006, so this is an anniversary. And what a great present I got for the Anniversary from Chaim Sanders, Walter Hop and my dear friend Christian Folini: a brand new major release!

If you don’t know what the CRS is, a short introduction is due.

The WAF Guidebook: What is a Web Application Firewall?

Simply put, Web Application Firewalls are security controls designed to provide the best automated operational protection for HTTP applications, whether web-based or mobile. What is “the best” protection, or even “sufficient protection”, is not a simple question. As a result, there is a spectrum of solutions for protecting web applications, with varying quality and functionality. Which one can call itself a web application firewall is not an easy question to answer.

Probably the only way to define a web application firewall is to list the key features common to web application firewalls that make them uniquely suited for protecting web and mobile applications and that differentiate them from other operational security controls such as intrusion prevention systems and network firewalls. The following sections touch on those key features of WAFs. A fuller discussion of the features will follow in later posts.