[Snort-devel] draft rfc - reengineering the snort packet matching system

Todd Lewis tlewis at ...255...
Wed Apr 4 00:30:20 EDT 2001


I had told myself that I would quit when I hit 10k words, but I am simply
exhausted.  This document is not complete, but I think that it is enough
to communicate the gist of my proposal.  (At 8866 words, it certainly
should be!)  The parts that I have left out are mostly peripheral.
Oh, and the actual API is kind of soft, but it does a decent job of
suggesting what I am trying to do.

Of the three people who volunteered to review this document, two of them
were Fyodor and Marty, who were the two people for whom I was trying to
get this document into good shape.  The third person, Abe, will be out
of town for a while, so I decided to go ahead and get this draft out
for review.

This proposal represents a big change for snort, and if it is adopted,
then I want it to be for the right reasons.  Therefore, I urge any
reviewers to apply as much criticism as they can muster to these ideas.

Enjoy.

************************************************************************
TITLE: A Proposal for Reengineering Snort's Packet Matching System
AUTHOR: Todd Lewis
DATE: Wed Apr  4 00:27:51 EDT 2001
STATUS: draft
VERSION: 0.00

I. Introduction
	1. purpose

There are several well-known problems with snort's rule system.
This document describes the present system, documents several of its
problems, and proposes a replacement system, complete with design
criteria, a precise description of the new system and a
proposed API, and detailed consideration of many interesting issues
involved with the suggested approach, including its amenability to possible
optimization strategies.  This proposal is targeted at the snort 2.0
development effort.  It is the author's hope that this proposal can
play a positive role in the development of the next version of snort.

	2. snort's two parts

In snort, there should be two components to a rule: a set of criteria
that traffic must meet in order to match the rule (matching), and
a set of operations that should be performed upon matching a rule
(post-processing).  In snort v1, there is some confusion between these
two components.  This document proposes a system for snort v2 that
reestablishes the clear division between these two components of a rule
and that builds a firm basis for the proper and efficient execution
of the first component, i.e., the packet matching component of a rule.
Also included in this document is a discussion of what post-processing
would look like if this proposal is accepted.

	3. the inverse database model

(There is a clear desire among the snort developer community to make
the process of matching rules to packets as efficient as possible.
In casting about for some sort of model to help conceptualize this desire
and facilitate capturing prior art on the subject, I stumbled across a
model that I think is fairly accurate.  Unfortunately, it is also fairly
useless in terms of directly-applicable previous work, but it was helpful
for me, so I describe it here.)

One can conceptualize NID systems as the inverse of a database model.
Rather than having a single query that is applied against a set of data
(or  a relation, in the case of RDBMSs), an NIDS consists of a set of
queries (the rules) and a single piece of data (the datagram).  Since a
single datagram does not constitute a relation in any meaningful way,
and since trying to invert the model by viewing a set of rules as a
relation and the data as a query is apparently a worthless pursuit, the
last 25 years or so worth of database research, concentrated as it is
on the relational model, turns out to be almost completely worthless to
the problem at hand.  However, the work done on extensible data types,
especially in object relational databases like PostgreSQL, and a good
bit of the research on query optimization does turn out to be useful,
and this proposal therefore owes something to the database community.

	4. Packet acquisition is assumed

The feedback on the earlier development of the packet acquisition
engine (paengine) system has been very positive, and there seems to be
consensus that it should be incorporated into snort v2.  Therefore,
the paengine system is considered a given in the discussions below.
This means that paengines are relied on for packet acquisition and that
support for passing verdicts on traffic back to them is a requirement
for the system.

II. The Status Quo

	1. description

Packets arrive into snort via a packet acquisition engine.  This engine
tells snort what media type describes the traffic that it generates.
Snort then takes the datagrams from the paengine and passes them to the
appropriate packet processor, the job of which is to extract from the
datagram the IP packet.  No matching is possible on protocols lower than
or different from IP.

Snort has preprocessors which take a stab at the traffic before it
gets inserted into the rule system, optionally performing appropriate
transforms before the matching begins.  This is where tasks such
as defragmentation and unicode translation are performed in order to
simplify the task of rule matching.

After preprocessing, packets are submitted for rule matching.  The present
snort rule system supports describing internet protocol traffic in terms
of source and destination network addresses and TCP or UDP port numbers.
The entire system, from the syntax of the rule file to the data structure
representing the rule to the code for evaluating rules, is hard-coded
into snort's core and cannot be extended easily.

Snort has packet matching capabilities beyond those provided by this
system by virtue of rule options which actually serve as packet matching
criteria.  These options are invoked on traffic that meets the formal
packet matching criteria and serve further to winnow traffic before
it is subjected to non-matching option processing.  Since, unlike
the formal matching system, the option system is extensible, this is
where the energy has gone in snort v1 to extend snort's capabilities to
distinguish different types of traffic.

	2. indictment

There are many problems with this system, and they have been discussed
at length on the snort-devel mailing list.  This list is by no means
complete, and these items do not necessarily reflect any consensus view,
merely the author's own.

- rule system non-extensible

All of the problems listed below are compounded by the original sin
of the rule system, which is that it is non-extensible.  As a result,
incremental additions to the existing system to address its defects are
impossible, and a complete rewrite is therefore necessary.

- rule system primitive

The rule system itself supports a very small set of criteria which can
be applied to traffic.  While address and port are certainly at the
top of the list in terms of useful information about traffic, they are
not exhaustive.

- confusion between matching and post-processing in options

The rule options appear originally to have been intended to facilitate
reporting and other post-match operations on traffic which met the formal
matching criteria.  However, since the formal rule matching system
is not extensible, options have been overloaded to include informal
matching criteria.  This herniation shows the pressure from the user base
to be able to extend the rule matching criteria to meet new demands.
Unfortunately, the pressure was vented in the wrong place, making the
option system a hodgepodge of muddy purpose.

- no support for non-ip protocols

Yes, grasshopper, although I've never actually seen them, there are
still IPX networks out there, and they do suffer attacks.  IPv6 will be a
complete bitch under the present system, too.  Heck, even supporting IP
protocols other than the big three (TCP, UDP and ICMP) is a bear under
the present system.  How does one describe a rule for an OSPF packet
advertising a certain OSPF area?  This sort of thing is impossible under
the present rule system.

- no visibility into lower level protocols

Requests have come up for rules to detect ethernet attributes like VLAN
tags.  Since the present system is IP-only and offers no visibility into
lower-level protocols, this is impossible.

- no support provided by snort for examining protocol features

Snort obviously understands TCP, because it can detect TCP ports to
do rule matching.  However, when a plugin wants to examine other TCP
features, it has to re-decode the packet itself.  This is redundant and
typical of how snort offers no assistance to other code that wants to
test certain protocol features.

- too complicated

The physical media decoders, the preprocessors, the rule system, the
matching options, all of these things look at and fiddle with packets in
one form or another.  Very often, a new feature is proposed among the
developers and a debate ensues over whether it should be a preprocessor
or an option.  There is not very much reason behind the existence of
these various components as separate systems.

- not easy to optimize matching

Because the vast majority of the matching operations exist as options,
there is no easy way to optimize their application.  The very cool
regexp collapser submitted several months ago was typical of the sort of
optimizations that you want to make with matching criteria across multiple
rules; the degree to which the developer had to bend over backwards to
do so under the present system demonstrates just how hard that is to do.

- not easy to parallelize

How would you divide up the workload of a system this complicated?
By touching so many different types of systems in the processing of a
single packet, the present system is very hard to understand in terms
of what concurrency control is needed for parallel operation.

	3. good stuff

The present system is not all bad.  In fact, all things considered,
it is pretty darned good.  After all, it was good enough to get snort
to the top of the free software NIDS heap and to win it a wide user base.

Personally, my favorite part about the present system is the extensibility
offered by option plugins.  Since they are the only escape valve in the
present system, they offer a good indication of how much work people will
be willing to do to add new matching criteria to the system, and that
indication provides a lot of hope.  My complaint with the option system
is that it is in the wrong place (at the end of a convoluted process)
and that, because it is misplaced, it is unoptimizable.

The present system is easy to understand and is used by a wide variety
of people with varying skill sets.  As a result, reverse compatibility
should be a primary design requirement of any proposed replacement and
is for this one.

That the present system is a hodgepodge, while unfortunate from a
programming perspective, does bring one's attention to the fact that
the good majority of all pending NIDS itches have been scratched by the
present system.  The task at hand is not to destroy what has been built,
but to clean it up and make a better, coherent system out of what has
been cobbled together to date.

III. design principles

These are the design principles which have guided the development of
this proposed new system.

	0. be backwards-compatible

The present system is a good one.  Lots of work has gone into developing
rules for the existing system.	Any new system should be capable of
supporting the existing rule system.

	1. be agnostic on rule syntax

Any new system should export an API to be used for declaring rules.
There is desire and need to experiment with rule syntaxes other than the
existing one and with rule storage formats other than flat file storage.
Experimentation in this area should be possible, which means that the
core should be agnostic towards actual rule declaration.

	2. become flexible

The present system is not very flexible.  There are four specific
things not presently possible that should be possible.

		+ make all protocols first-level entities

Why can I define rules based on TCP flags but not DNS flags?  Why
is IP supported but ethernet is not?  Although I certainly understand
the circumstances that led to snort's present form, this rigidity is
obviously a problem.  What we need to do is allow support for _all_
protocols in rule definitions, not just IP, TCP, UDP and ICMP.

		+ allow variety in protocol stacking

The present system is very biased towards (one of the big three IP
protocols) riding over IP riding over a LAN protocol.  Once we start
getting into tunnelling or application protocols that can ride over
non-IP protocols (LDAP comes to mind), everything starts to get mushy.
For that matter, HTTP over UDP requires a separate rule definition.
Once we start supporting Appletalk over ethernet, then how to support
Appletalk tunneled over IP becomes a very, very big question.

I suggest that what we need to do is become agnostic on what protocols
lie where in the stack.  Let's just start decoding and let the chips fall
where they may.  If we can build a system that makes no assumptions
concerning what protocols lie where, then the user can handle any
situations that come his way.  That is the intent of this proposal.

		+ allow easy support for new protocols

Along with making all protocols into first-level entities comes the
obvious implication that people are going to want to add support for
new protocols.  As with everything else, anything worth doing is worth
doing without a restart, and so this proposal tries to ensure that
adding support for new protocols is a trivially easy operation.

		+ allow new operations on existing protocols 

Once, e.g., you can define rules based on DNS flags as well as TCP
ports, what happens when a new RR type comes out, or Vixie and his
cronies decide completely to overload the meaning of various DNS fields
for some boneheaded new DNS application?  Well, you're going to want to
have new operations on DNS traffic to detect related attacks, especially
considering that the world still relies on BIND.  It is important that,
just as you can add support for new protocols, you can add new
operations on already-supported protocols.

	3. collapse Packet Processors, preprocessors, rule evaluation and
		packet-matching options into a single system

Whenever you see related but different systems, the presumption should
always be, "There's got to be a good way to collapse this functionality."
Your project is always better off if you can.  This proposal tries to
do just that, collapsing these four different systems into a single
interface that can accomplish the work of all four and hopefully do it
better.

	4. have protocol support separated from core

In the present system, knowledge of IP and the big three is hard-wired
into the core.  This is the source of the inflexibility of the present
system.  In any new system, protocol support should be separated from
the core and abstracted behind a well-defined API.  This will allow the
sorts of improvements described elsewhere in this list.

	5. provide primitives to allow for naive and optimized total
		evaluation 

Determining which rules match a packet is always going to be a very
expensive operation.  The way to speed it up, however, is to optimize
what operations get called.  Although the temptation is to take all of
this code and throw it into one monolithic bundle and try to hand-optimize
the cost of invoking various operations, this is exactly the _wrong_ way
to try to optimize packet matching.  Rather, the operations related to
particular protocols should be _well abstracted_ from the core, and the
optimization work should be done in the core, not trying to extract 5%
extra efficiency out of a given operation, but rather in invoking the
operations intelligently and minimizing how many of them you perform.
The history from the database world is very clear that optimizing which
operations are performed and the order in which they are performed is
a vastly superior optimization strategy to trying to minimize the cost
of an individual operation.

Therefore, the role of the protocol engines should be to provide a set of
very simple primitive commands through the protocol engine API, and the
task of invoking these primitives, and of deciding which ones need to be
invoked, in what order, and on which rules, should be handled by the core.
See the extensible type system under PostgreSQL for a good example of
this sort of organization.

	6. have matcher well-abstracted, so that we can experiment with
		alternate implementations; this will help a lot with
		optimization work

A system like the one described here will be very amenable to
optimization.  Indeed, later in this document, I describe several
potential optimization strategies in order to demonstrate that this
system really is amenable to optimization work.  This work will be very
challenging and will not be brought to fruition immediately; indeed,
it will probably be a continuous and ongoing project in the snort
developer community.  Progress will be slow and incremental, resulting
from many individual efforts.

Therefore, it is important that the core matching engine (the "matcher")
have well-defined interfaces to the other subsystems within snort.
Ideally, it could be a module, replaceable at run-time.  In this way,
multiple matcher development efforts could exist simultaneously and not
have to fight over control of the core code in CVS while they pursue
their individual optimization strategies.

	7. support multiple rule methods via a well-defined interface
		and config plugins

Some people like the existing snort rule syntax and some would prefer
something different, like, e.g., xml.  Some people get by just fine
with rule files and some people wish to use databases to store rules.
All of these people can be supported if the snort core gets out of the
rule importing business and instead offers an API to plugins so that
they may control the ruleset in any way they wish.

	8. allow run-time rule examination and update 

Users, especially those using snort as a firewall component and those with
very large rule sets, have a real need not to restart snort, which can
be a very expensive operation.  Therefore, the need exists to be able to
examine the rules in place and to modify the ruleset without restarting.
This support should be extended via an API to the rule importing module,
which would have the job of figuring out how to communicate with the
outside world.

	9. allow for multiple simultaneous packet matching operations and
		for simultaneous matching and post-processing operations

Gigabit ethernet is here, and people running GigE networks need NIDS
just like the rest of us.  The only way to support anything close to
GigE speeds, even with gigahertz processors, is by having snort run
simultaneously on multiple processors.  Snort needs to be thread-safe,
and this interface needs to be thread-safe.  Further, this interface
should not restrict strategies for parallelization unless absolutely
necessary.

	10. do _not_ undertake tasks that should be relegated to
		post-processing, like portscan detection, pattern
		analysis; just do packet matching as well and efficiently
		as possible

We need to be very clear about what this system accomplishes: it
examines packets and determines if they match the criteria set forth
in a rule.  That is all.  Any function that needs to be performed on
a packet that matches a rule should happen on the other side of the
fence, in post-processing.  There should be a clear interface to the
post-processing stage, but no duplication of function.

IV. an aside on memory management

While the number of _types_ of systems will go down under this proposal,
the number of _instances_ of systems will go up.  Memory management can
grow to be a real bitch when you have a gazillion people touching a
given request.

An idea thrown around among the PostgreSQL developers (I think) was
that you have a single context for each request that gets passed around
with the request.  Anyone who needs memory associated with the request
gets it from a special memory allocator, which ties that memory to the
context.  At the end of processing, you just perform a single release
on the context.

This would be a great way to deal with memory management under the system
described herein, and so it is included.

V. proposal

	1. protocol numbering

The new system has no knowledge of any protocols in the core or in
the matching code.  Instead, it understands protocols by their number,
and it uses the protocol number to find a handler for the protocol.

		+ families and protocols

This scheme is premised on the observations that all network protocols,
except for those at the top of the stack (i.e., application protocols
like dns and higher), serve to carry other protocols, and that modern
ones distinguish their higher-level protocols by use of a protocol number.

Protocols are numbered using a two-part tuple: (family, protocol).
Each of these is proposed as a 32-bit integer.  (16-bit values may work,
but there's been talk of how it's insufficient for TCP and UDP, and I
do not have enough knowledge of other protocol families to assure that
16 bits would be sufficient.  An extra word won't kill us.)  For many
cases, a protocol tuple will describe a protocol that is itself a family
and has its own subsidiary protocols.
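
To make the tuple notion concrete, here is a minimal sketch of how a
protocol description might look in C.  The family constants are purely
illustrative assumptions; actual numbers would come from the registrar
discussed below.

	typedef unsigned int uint32;

	#define FAMILY_PCAP     1   /* hypothetical assignments */
	#define FAMILY_ETHERNET 2
	#define FAMILY_IP       3

	typedef uint32 proto[2];    /* (family, protocol) */

	proto ip_over_ethernet = { FAMILY_ETHERNET, 0x0800 };
	proto tcp_over_ip      = { FAMILY_IP, 6 };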

		+ examples

At the bottom of the stack is the pcap family.  An ethernet frame would
be described using the tuple (pcap, ethernet).  See pcap.h for a list
of the protocols in this family.  (Look for "DLT_".)  Alternatively,
we could adopt our own numbering scheme for data link types if we are
concerned about pcap falling behind or whatever.

Ethernet, of course, is itself a protocol family.  Ethernet supports
carrying multiple network protocols, such as IPX, Decnet, Banyan Vines
and X.121.  The most popular Ethernet protocol, however, is the Internet
Protocol.  This would be described as (ethernet, ip).  (In some of
these descriptions of tuples, I use a textual form, but of course, these
would be translated into integers in the actual code, through macros or
a lookup facility.)  (The Linux kernel file "include/linux/if_ether.h"
defines the IP protocol number ("ETH_P_IP") as 0x0800, i.e. 2048 decimal;
on the wire, the ethernet type field appears in network byte order, which
may account for any apparent discrepancy.  You're free to look there for
the ethernet protocol numbers it lists for the protocols I've listed;
RFC-1700 also carries a list of these values.)

IP, of course, itself supports multiple protocols.  For an incomplete
list, look at /etc/protocols on any unix box.  For a complete list,
look in RFC-1700 (search for "Assigned Internet Protocol Numbers"),
which also includes descriptions of most of the protocol families
mentioned here.  Among these protocols are the Network Voice Protocol
(ip, 11), the Reliable Data Protocol (ip, 27), Secure VMTP (ip, 82),
and TCF (ip, 87).  Most popular, however, are TCP (ip, 6), UDP (ip, 17)
and ICMP (ip, 1).  Let's go with UDP.

UDP, of course, itself is a protocol family.  For the most part, its
protocol numbers align with those of TCP, and both may be found on any
unix system in /etc/services.  Among its protocols are the Appletalk
Routing Maintenance Protocol, or "at-rtmp" (udp, 201), ANSI Z39.50 (udp,
210), GENRAD-MUX (udp, 176) and the UAAC Protocol, whatever the hell that
is (udp, 145).  A very useful UDP protocol, however, is DNS, which is
(udp, 53).

Among the UDP protocols listed, at-rtmp is very interesting, because
the content of ((pcap, ethernet), (ethernet, ip), (ip, udp), (udp,
201)) is identical to the content of a packet made up of ((ethernet,
appletalk-ddp), (ddp, rtmp)), because Appletalk supports both native
operation over ethernet and tunneled operation over IP.   Ditto for the
various DOS-universe file sharing protocols, netbios and its SMB progeny;
they can, in various forms, run natively over ethernet or over IP.
If windows boxes to this day are vulnerable to attacks launched over the
ethernet variety of the protocol, and they are, then shouldn't snort be
able to detect them?

		+ need for a registrar

Protocol families need to be uniquely numbered.  Initial ones that
obviously need assigning are pcap, IP, TCP, UDP, ethernet, and any other
media types that see common use.  (Many media types reuse ethernet's
numbering scheme, so that should be easy.)  Someone would need to
volunteer to assign these numbers.  Maybe in addition to the RFC Editor
we need a Snort Assigned Number Authority (SANA)?

The maintainers of the code for the various protocols (described below)
will need to publicise their protocol assignments so that the people
who implement support for _those_ protocols will know how to advertise
their support for them.  A SANA would also be useful here.

	2. protocol engines (pe)

At the heart of this proposal is a new type of entity: the protocol
engine.  These engines will implement the pe API, and there will be many
of them.  Protocol engines will advertise their protocol support, support
rule examination and manipulation, and perform two types of operations:
decomposing packets and performing match operations.  When we wish to
add support for a new protocol, we simply write a new protocol engine.
When we wish to support a new type of matching for an existing protocol,
we just add a new operation to its protocol engine.

		+ advertise supported protocols

The pe advertises what protocols it supports to snort.  The interface
could look like this:

	typedef uint32 proto[2];
	proto* supported_protocols;

This would return an array of protocol descriptions.  You need to support
multiple so that, e.g., the pe for IP can advertise support for, among
others, both (ether, ip) and (ip, ip), which is IP tunneled over itself.

I envision protocol engines being shared modules living in their own
directory.  Snort could scan (or rescan) all of the modules in the
directory to get a catalogue of what protocols it can support.  Then,
as rules are added, the protocol engines are demand-loaded into snort
as needed to implement the rules.  This, however, is an implementation
detail.
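
As an illustration of such an advertisement, the IP protocol engine's
array might look roughly like the following sketch; the family constants
and the zero-terminated convention are assumptions for the example, not
part of the proposal.

	/* hypothetical advertisement from the IP protocol engine */
	static proto ip_pe_protocols[] = {
		{ FAMILY_ETHERNET, 0x0800 },  /* (ether, ip) */
		{ FAMILY_IP, 4 },             /* (ip, ip), i.e. IP tunneled over IP */
		{ 0, 0 }                      /* terminator */
	};
	/* the pe's supported_protocols field would point at this array */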

		+ rule examination and manipulation

The matching part of a rule would consist of two parts: the logical
structure and organization, which is owned by the matcher, and the
individual matching operations, which are owned by the protocol engines.
The pe's need to support examination and manipulation of their portion
of the rule set.  This process is explained in a separate section, below.

		+ decompose dg's

Decomposition means analyzing the header for its own protocol,
identifying any encapsulated protocols in its payload, and submitting that
encapsulated data, with the appropriate tag indicating what protocol it
is, back into the core for further processing.  This process should be
clearer when the dg and pl data structures are introduced, below.

		+ support multiple (numbered) match operations

Finally, each protocol engine will support a set of (zero or more)
matching operations.  These will also be numbered as integers.

The criteria for these operations would be passed into the engines in textual form.
Take the DNS protocol engine for example.  It could support the following
operations:

# | op name   | description
------------------------------------------------------------
0 | null      | null operation
1 | id        | Range of matching query IDs
2 | qr        | Testing the query bit
3 | opcode    | Testing the DNS opcode
------------------------------------------------------------

Of course, this is just the beginning of the sorts of operations that
could be supported for DNS traffic.  Just take a look at section 4.1 of
RFC 1035 to see all of the fields in a DNS header that could be tested,
not to mention the actual content with the various RR types.  You could
start out with a simple set of operations and extend them as time goes
on by simply appending them to the list of operators.

I suggest that the arguments to these operations be simple text fields.
Thus, for operation #3, you could submit several matching opcodes in this
way:

"3-15"

This would indicate the opcodes that are not assigned in rfc-1035.
More complicated options are easily possible, like complicated port
ranges for tcp:

"0-1023,6000-6100"

This form requires that protocol engines describe their syntax for
operation arguments in a clear way.  There is a possibility here for
sharing code for decoding arguments among engines, like a decoder that
takes a comma-delimited sequence of ranges and returns an int[2][].

(I have debated whether this should be a text field or some sort of more
complicated data structure.  To be honest, I am not sure, and I certainly
welcome feedback on this issue.)
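
As a sketch of the sort of shared argument decoder mentioned above, here
is a minimal parser for a comma-delimited sequence of ranges.  The
function name and the [low, high] output convention are assumptions for
illustration only.

	#include <stdio.h>
	#include <string.h>

	/* parse "0-1023,6000-6100,53" into an array of [low, high] pairs;
	   returns the number of ranges parsed, or -1 on a malformed field */
	int
	parse_ranges(const char *arg, unsigned int (*out)[2], int max)
	{
		int n = 0;
		unsigned int lo, hi;

		while (*arg && n < max) {
			if (sscanf(arg, "%u-%u", &lo, &hi) == 2)
				;                   /* "low-high" form */
			else if (sscanf(arg, "%u", &lo) == 1)
				hi = lo;            /* single value */
			else
				return -1;
			out[n][0] = lo;
			out[n][1] = hi;
			n++;
			arg = strchr(arg, ',');
			if (arg == NULL)
				break;
			arg++;                  /* skip the comma */
		}
		return n;
	}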

	3. the dg and pl structures

(All data structures and functions are described in the appendix.)

		+ definition

datagram is the parent structure; with it are associated one or more
payloads.  The paengine returns the datagram structure with the initial
payload filled out and the mempool allocated; the core then submits it
to protocol engines for further stages of decomposition and matching.

		+ naming, or, why dg?

According to RFC-1122, the terms segment, message, datagram, packet
and frame all have meaning only at specific places in the OSI stack.
The model proposed herein transcends the stack, so terminology is
therefore a problem.	 As there is apparently no appropriate term
for a piece of transmitted network information at an arbitrary location
in the stack, this structure is called a dg with no further explanation.

	4. rule management

(Ok, so rules live basically in three different places: in the matcher,
in the protocol engines, and over in post-processing-land, which,
depending on how that's reengineered for v2, may itself be multiple
places.  The systems that take in rules from wherever (flat files,
databases) in whatever syntax (conventional, xml, lisp-like) will
need to do the work of coordinating all of these interfaces as they
manipulate the ruleset.  Helper routines to assist in this task are
not out of the question.)

(This part of the interface is certainly doable; I've just
outlined the description here and not filled it out.  It's not crucial
for communicating the gist of this proposal, and I am triaging content
at this point.)

		+ interface
		+ core controls logical operations & optimization
		+ pe's perform actual operations
		+ pe's demand loaded at rule-add time, not at packet-acquisition time
		+ examining rules
		+ potential need for concurrency control

	5. packet handling by the core

How does snort's core work under this proposal?  Here's a description.
(See also the pseudocode in the appendix.)

		+ getting packets from paengines

First, the core gets a datagram from the paengine.  This dg has the initial
payload filled out and an associated mempool allocated.

		+ submitting packets to the matcher

Second, the core passes the dg into the matcher.  The matcher returns the
set of rules which match the datagram; if this set is non-null, then the
datagram is expected to be fully decomposed.

(I've been reading all sorts of great things of late about functional
programming, the bane of which is side effects, so Xavier Leroy, if you
ever read this document, please forgive me.)

		+ handing a match over for post-processing

Third, if there is a match, then the core passes the dg into the
option handler for post-processing.

			* potential asynchronicity

Ideally, the post-processors would copy any data they need and then
do any lengthy work in a separate thread.  This possibility should be
kept in mind through the discussion and implementation phases.

			* options return verdict

Post-processing is where the verdict on a given packet is rendered.
This placement is based on the dogma that of the two halves of the
processing universe, the matching half is supposed _only_ to do matching.
Thus, the entire loop needs to be blocked until the verdict is rendered.
(See "asynchronicity", above.)

			* need for a similar plugin system on post-processing side

(It would be great if someone could put together a clean interface
for post-processing modules in this new universe.  I don't think
that I am going to be up to it at this rate!)

(There are a bunch of cool optimization angles on the post-processing side
that fall out of this sort of organization of rules.  If I were involved
in such things, I would suggest having "rules" simply be a coordination
point between the matching and post-processing sides; you could mesh
matches, post-processing directives and rule numbers in arbitrary ways.
(Many-to-one, one-to-many, etc.)  If anyone is interested, then ask me
about this in a week, and I will be happy to share my ideas.)

	6. packet matching by the matcher

What do matchers do?  Well, as Philip Greenspun is reputed to have said:

     Any sufficiently complicated C or Fortran program contains an ad
     hoc informally-specified bug-ridden slow implementation of half of
     Common Lisp.

This is the function of the matcher.  8^)  Matchers first decompose the
datagram and then evaluate the matching operators in order to determine
what rules match the datagram.

		+ the decomposition stage

The matcher decomposes a datagram into its various protocol payloads.

			* interface

The protocol engine includes a decomposition interface:

	int    (*decompose)(dg *d);

The matcher passes a datagram to the protocol engine for decomposition
as appropriate.  During the process of descending through a datagram,
the previous protocol engine will have placed a pl on the dg labeled
with the appropriate protocol; the matcher simply needs to look up
the protocol engine for that protocol and then submit the dg to that
pe for further processing.  This process continues until a pe does not
add a new pl to the dg.  Here's some pseudo code that might help:

	/* decomposition: keep submitting the dg to the pe for its newest
	   payload until a pass adds no new pl */
	do {
		pl=dg->pl;
		pe=pe_lookup(dg->pl);
		pe->decompose(dg); /* the side-effect of this is important */
	} while(pl!=dg->pl);

			* need to support rules for mapping traffic to protocols

There are occasions where a user may want to map a specific port
(or whatever) to a specific protocol.  E.g., http is usually (tcp, 80)
but is sometimes (tcp, 8080) on certain servers.  This problem should
be pretty easily solvable under this regime but is left for later
consideration.

			* need to have matcher in control of decomp; stages may be mixed

It may be optimal to mix decomposition and match evaluation; it may be
possible to shoot down datagrams before full decomposition is required.
Therefore, rather than making decomposition recursive, it is invoked
repeatedly by the matcher, which is free to optimize the order of
invocation as it sees best.

		+ the evaluation stage

The evaluation stage consists of the matcher invoking protocol operations
against whatever candidates and in whatever order it considers best.
Varying this candidate set and the order is the path to optimization.

			* interface

The matcher passes a set of candidate criteria into the pe via the
find_matches interface:

	int*   (*find_matches)(dg* d, int operator, int* candidates);

			* protocol prerequisites 

Somehow, there needs to be a way to specify such prerequisites as (atalk
over enet but not tunneled over ip).  This matter is left for a future
revision of this document.

VI. How do these things work?

In this section, various implementation implications of this proposal
are discussed.  (Wow, was that alliterative!)

	1. optimizing in the matcher

This section discusses several optimization strategies in the matcher.
This discussion is not intended to be normative or even necessarily
suggestive, but merely to demonstrate that the system described above
provides the necessary facilities for developers to construct optimizing
rule matchers.

		+ naive case

The initial version of the matcher should probably just consider all
of the matching criteria and then combine them to reach a decision.
This version of the matcher will be slow, but it will be easy to confirm
as correct and fast to program.
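
A rough sketch of that naive loop under this API might look like the
following, run after decomposition is complete.  pe_op_count(),
record_hits() and rules_satisfied() are hypothetical helpers standing in
for the matcher's own bookkeeping.

	/* naive matcher: ask every relevant pe for every match, then see
	   which rules are satisfied by the resulting set of matches */
	int*
	naive_get_matches(dg *d)
	{
		pl *p;
		int desc[2], op, *hits;

		for (p = d->payload; p != NULL; p = p->next) {
			pe *engine;
			desc[0] = p->protocol_family;
			desc[1] = p->protocol_number;
			engine = find_pe(desc);
			if (engine == NULL)
				continue;
			for (op = 1; op <= pe_op_count(engine); op++) {
				hits = engine->find_matches(d, op, NULL); /* all criteria */
				record_hits(d, hits);
			}
		}
		/* combine the recorded matches against each rule's boolean
		   structure; result is allocated from d's mempool */
		return rules_satisfied(d);
	}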

		+ minimize intra-rule cost by early shootdown

An easy way to optimize the evaluation of rules is to build them into
a decision tree and to evaluate it recursively, but at each stage to
take the choice with the fewest children.  Take, e.g., cases that have
a top-level branch that consists of a single element; this is almost
always the optimal decision to make at the first branch.  This is an
easy thing to implement in the matcher.

		+ minimize inter-rule cost by collapsing duplicate criteria

Say that several rules share the matching criterion,

	protocol: (ethernet, ip)
	operation: from
	arg: "192.168.16.0/24"

In this case, if the pe for (ethernet, ip) can detect this duplication,
then when it receives a request for all matches, it can evaluate this
criterion once instead of multiple times.  This optimization can be
effected either in the configuration module or in the protocol engine or,
preferably, in both, in order to catch slips by the other.
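
One minimal way for a pe to notice this duplication at add_match() time
is sketched below; the criterion table and the 1-based match ids are
assumptions for illustration.

	#include <string.h>

	/* hypothetical criterion table kept inside a pe */
	struct criterion {
		int   op;          /* operator number */
		char *arg;         /* textual argument, e.g. "192.168.16.0/24" */
		int   shared_with; /* 0, or the id of an identical earlier criterion */
	};

	/* at add_match() time, point duplicates at the first instance so the
	   test is evaluated once and the result reused for every duplicate */
	static int
	find_duplicate(struct criterion *table, int count, int op, const char *arg)
	{
		int i;
		for (i = 0; i < count; i++)
			if (table[i].op == op && strcmp(table[i].arg, arg) == 0)
				return i + 1;  /* match ids are 1-based; 0 terminates lists */
		return 0;
	}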

		+ minimize evaluation cost by allowing single-pass, multiple-criteria runs

Take, e.g., the task of applying multiple regular expressions to an
http payload.  It is well documented for various regexp engines how to
take multiple patterns and evaluate them in a single pass, rather than
performing one pass over the payload for each regexp, as is the common
(and current) case.  Even when the matcher requests some subset of all
matches for the regexp operator in a given pe, the pe could go ahead
and run the combined regexp, returning those that match the request and
caching all others in that datagram's mempool against future requests.
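
The caching half of that idea could look roughly like this; the cache
record, and the convention of hanging its head off the pl's
protocol_specific_attributes pointer, are assumptions, not part of the API.

	/* hypothetical per-dg cache of already-evaluated regexp criteria,
	   allocated from the dg's mempool so it vanishes with the dg */
	struct regexp_cache {
		int match_id;
		int matched;               /* 0 or 1 */
		struct regexp_cache *next;
	};

	static struct regexp_cache*
	cache_result(dg *d, struct regexp_cache *head, int match_id, int matched)
	{
		struct regexp_cache *e = d->m->alloc(d->m, sizeof(*e));
		e->match_id = match_id;
		e->matched  = matched;
		e->next     = head;
		return e;
	}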

		+ minimize decomposition cost by early abandonment if no candidate rules

Say that all rules have an IP address requirement, and the ethernet
datagram decomposes into an IPX payload.  If there were some way to ensure
that there was no path that could lead from an IPX payload into an IP
payload, then processing could stop at this point.  Although there is
no provision for it in this proposal, protocol engines could advertise
not only the protocols that they accept, but also the protocols that
they return; if they did so, and it would be easy, then this sort of
optimization would be possible.

		+ minimize analysis cost by culling candidates based on protocol type

For protocol operators whose cost is O(n), where n is the number of
candidate criteria submitted for matching, the matcher could reduce n by
culling from the candidate set any criteria belonging to rules that depend
on protocols that are not present in this datagram.  While this would not
help with operators whose cost is O(1), like a combined regexp operator,
those are exceptional cases, and this optimization would actually help
in most cases.

		+ choose wisely among possible tests based on db-style performance metrics

This is probably the best strategy of all when combined with an
intelligent operation combination strategy, because the real cost
comes not in choosing between (one) versus (five or six) operations in
matching a rule.  Rather, the real cost comes in the (one) versus (one
thousand) difference in the cost of different operations.  If you know
which operations are very expensive, then you can go to great lengths
to avoid them.  This is a prime database optimization strategy.  See,
e.g., the book "Database System Implementation", by Garcia-Molina, et
al., section 7.4, "Estimating the Cost of Operations", and section 7.5,
"Introduction to Cost-Based Plan Selection".

Since protocol engines do not export cost information under this proposal,
this strategy would require either metering the cost of matching
operations or the importation of cost profiles; both are doable.
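
Metering could be as simple as wrapping each find_matches() call with a
timer and keeping a running average per (pe, operator) pair.  A rough
sketch follows; the op_cost table is a hypothetical structure kept by
the matcher.

	#include <time.h>

	struct op_cost {
		double avg_seconds;
		long   samples;
	};

	static int*
	metered_find_matches(pe *engine, struct op_cost *cost,
	                     dg *d, int op, int *candidates)
	{
		clock_t start = clock();
		int *result = engine->find_matches(d, op, candidates);
		double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;

		cost->avg_seconds =
		    (cost->avg_seconds * cost->samples + elapsed)
		    / (cost->samples + 1);
		cost->samples++;
		return result;
	}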

	2. session tracking in stateful protocols

		+ reporting in options
		+ memory management
		+ include timeouts on UDP sessions
		+ have callbacks on session expiration?

Right now, there's not much support for stateful protocols outside
of the tcp stream preprocessor.  Support for stateful protocols can
be much better under this scheme.

If you want to keep stats on sessions, then the protocol
engine is the best place to do it.  A protocol engine for such a
protocol, like (ip, tcp), could include session information in the
protocol_specific_attributes field of its pl, and the post-processors
could access that information to report on it.  In order to trigger
end-of-session reporting, just have a rule triggered by the equivalent
of the FIN flag or by the timeout.
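
For example, a TCP protocol engine might hang something like the
following off that pointer; the struct layout and state names are
assumptions for illustration, not part of the API.

	#include <time.h>

	/* hypothetical session record kept by an (ip, tcp) protocol engine */
	struct tcp_session {
		unsigned int   src_addr, dst_addr;
		unsigned short src_port, dst_port;
		unsigned long  packets, bytes;
		time_t         last_seen;  /* for timing out idle sessions */
		int            state;      /* e.g. SYN_SEEN, ESTABLISHED, FIN_SEEN */
	};

	/* during decompose(), after looking the session up in the pe's own
	   table, expose it to post-processing via the new pl:
	       new_pl->protocol_specific_attributes = (void *)session;      */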

(The above suggestion of a rule triggered by the session timeout is
not actually supported in this proposal because of the complexity it
would add.  I am interested to hear people's opinions on whether such
support would be worth the effort.)

Sessions are doable even for protocols that are formally stateless,
like UDP.  Just look at any NAT system's UDP code for an example.

A big issue with stateful protocols is memory management.  This is
a good avenue for hackers to foul up your system by creating a lot of
meaningless sessions.  If this is a serious issue, then an interface
could be created for the memory management system to communicate memory
pressure onto the protocol engines for stateful protocols, allowing them
to discard sessions in their order of plausibility, or whatever.

	3. defragmentation
		+ need to evaluate multiple times for verdict reasons
		+ need to suppress redundant matches

Another existing preprocessor is the defragmentation spp.  This function
can be accomplished in a protocol engine, provided that one issue is
tackled: multiple rule evaluation.

Say you get an initial fragment in.  It either constitutes an effective
attack in and of itself, or it does not.  Because of the possibility that
it is an effective attack, you need to evaluate it against at least some
subset of the applicable rules and potentially reject its passage.

Then, the second fragment comes in.  (This discussion is applicable to
a plurality of fragments, but two are enough to illustrate the issue.)
This new fragment, combined with the original packet, may or may not be
enough to constitute an effective attack.  Therefore, you need to test
it against the applicable rules that might have been triggered by the
newer, more complete version of this packet.

These two sets of rules overlap, I think, only in the rules whose
highest-level criteria depend on the decoded protocols of the first
fragment and that were not actually matched.  Anyway, I think that it's
a solvable problem to determine what these two sets are, using the tools
made available in this framework, but I haven't actually solved the
problem myself.

The only real problem with a naive approach here is that rules may
be triggered twice, once by the original fragment and then again when
the newer, more complete packet is resubmitted.

Oh, and of course, the memory management issues apply the same as for
stateful protocols, since it's basically the same problem.

	4. VPN, other crypto, interposition

This one is easy.  You have a protocol engine for the VPN protocol that
returns decoded payloads for further processing.  This is an area where
the new system is clearly better than the old.

	5. grabbing misc, non-matching traffic

How do you say, "Give me all traffic that doesn't match any
already-described rules"?  This is an important operation for anomalous
traffic detection.  The answer is that this is a function of the protocol
engines; add an appropriate operation or operand syntax to make this
happen.

	6. rpc registration

How does RPC protocol support happen?  Well, you intercept and decode
portmap requests in the portmap protocol engine, and the portmap pe
then registers protocol mappings with the core according to what is
passed.

	7. unicode normalization & other packet rewriting (icky)

Any packet rewriting is an open issue.  This is not related to the issue
of packet matching, but rather is a left-over issue from the paengine
project.  It should be answered in that context; this framework will
support whatever answer comes out of that discussion.

	8. portscan detection

Add some random TCP/UDP port matching rules and then do detection in
post-processing.

	9. verdicts

Do it in post-processing.  Perhaps inline this post-processing operation.

VII. future development

Where do we go from here?

	1. migration
		+ transform PktProcessors to pe's

Each grinder needs to be transformed into a pe.  On the upside, this will
allow us to add in support for, e.g., ethernet rules.

		+ break snort core into ip, tcp, etc.

Each of the existing matching operators needs to be translated into an
operator in the appropriate protocol engine.

		+ migrate matching options, e.g., itype/sp_icmp_type_check.c to icmp pe

Each of the existing matching options will need a well-defined mapping
to a new pe and one of its operators.

		+ update paengine interface

The paengine interface will need updating to return a dg.

		+ build adaptor for existing rules

This should be a straightforward piece of code once the mapping of matching
options to new pe's is performed.

	2. adding facilities shared among multiple pe's

It is possible for some facilities to be shared across multiple protocol
engines.  Examples include regexp, encryption protocols and decoding of
matching operation parameters, like sequences of ranges.

	3. enhancing pe's

Once the initial implementation is done, there are many possible
enhancements that can be made to the protocol engines.

		+ speeding up operations like regexps

Anytime you can take multiple passes and serve the same purpose with a
single pass, you win.  Linked lists to hash tables, all of the normal
optimization strategies will apply.

		+ evaluate all & cache results, even on subset queries

When the matcher asks for matches on a subset of criteria, for some
operations, like regexp matching, you can match all criteria and
cache the surplus against future requests.

		+ adding new operations

This is the best part.  There is a whole world of new matching operations
that we can support in snort v2!  Just look at RFC-1035; there must be at
least 20 in there.  We can multiply these easily and at little cost under
this system, so let's run with this!

	4. adding new pe's
		+ exciting new area!
		+ ethernet w/ vlan tests
		+ atalk, ipx
		+ smb, nfs, etc.

Let's add a crapload of new protocol engines.  DNS, SMB, NTP, DHCP, ARP,
there are a million protocols that need support.  Snort will truly rock
as we add support for every protocol under the sun.

	5. optimizing core matching engine
		+ I've mentioned some strategies, but those were for
			proof-of-concept; someone should give it real thought
		+ should be deferred until after 2.0 (perhaps 2.1)

	6. add triggers on session state

This is a maybe feature, described above.

	7. make dynamically-discovered protocol mappings persistent across restarts

This is a sticky issue that I defer discussing until later.

VIII. where this leaves post-processing

Someone needs to work on a similarly abstracted post-processing system.

	- in need of a good design
	- will no longer be abused for matching purposes
	- with much better protocol support
	- with a clear mission: to take packets that match rules and do
		whatever needs doing with them
	- since most existing options will be rolled into pe's, what
		post-processing modules will there be?
		+ verdict setting
		+ portscan detection
		+ event reporting (lots here)
		+ adaptive modules that add new rules!
		+ cooperating with pe's on traffic profiling (but be
			wary of optimizing away decomposition!)

APPENDIX - API

I. data structures
	1. mempool

This is an opaque data structure used by the pool allocator.  The paengine
creates the mempool when it creates the dg and frees it when it is passed
the verdict on the dg.

typedef struct _mempool mempool;

	2. ruleset

This is simply a list of rule numbers.  Personally, I would do this
as an integer array terminated by rule #0.  Can be a linked list if
you feel like wasting 3x as much memory.  (!)

	3. dg

dg is the structure used for storing a datagram, packet, whatever,
and is the parent structure for all of the decomposed views of the
payloads inside.  The memory for this structure belongs to the paengine
that creates it and is void after a verdict on the dg is returned to
the paengine.

	typedef struct _dg {
		pl* payload;
		mempool * m;
		verdict v;
	} dg;

	4. payload

pl is the data structure for a payload at an arbitrary location in
the protocol stack.  A dg starts off from the paengine with a single
payload, that of the data link medium.  In the course of decomposition,
the protocol engines create new pl structures and prepend them to the
linked list kept in the dg; this memory is allocated out of the memory
pool associated with the dg.

	typedef struct _pl {
		int protocol_family;
		int protocol_number;
		void *protocol_specific_attributes; /* defined by the pe */
		size_t length;
		void *payload;
		dg* parent;
		pl* prev;
		pl* next;
	} pl;

II. memory management

See the section, "IV. an aside on memory management", above, for the
idea behind this interface.  The mempool interface is described as
an abstracted interface, again to allow pluggable implementations.

Here is the prototype for a mempool object:

typedef struct _mempool {
	void* (*alloc)(mempool *m, size_t size);
	void (*delete)(mempool *m);
} mempool;

	1. pool creation

Pools are created by invoking the global "mempool_create" function.
This function returns a mempool object.

	2. allocation

The "alloc" method off of a mempool object returns either a pointer
to the requested amount of memory or NULL, in which case errno is set
appropriately.

	3. pool deletion

The delete method off of a mempool object deletes the object and all
associated memory.
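
Putting the pieces together, a paengine might use this interface roughly
as follows; the build_dg() function and the elided payload setup are
purely illustrative.

	/* sketch: how a paengine might use the mempool and dg interfaces */
	dg*
	build_dg(void)
	{
		mempool *m = mempool_create();
		dg *d      = m->alloc(m, sizeof(dg));
		pl *p      = m->alloc(m, sizeof(pl));

		d->m       = m;
		d->payload = p;
		/* ... fill out p with the initial (pcap, <dlt>) payload ... */
		return d;
	}

	/* later, once a verdict has been returned on the dg:
	       d->m->delete(d->m);
	   releases d, p and everything else allocated from m */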

III. pe interface

As is usual for proposals from this author, the pe interface is an
object-oriented interface in C, amenable to modular implementation
and run-time loading.  Here is the prototype:

typedef struct _pe {
	uint16 ver_maj;
	uint16 ver_min;
	proto* supported_protocols;
	int    (*add_match)(int operator, char *rule_def);
	int    (*del_match)(int match);
	int*   (*list_matches)(void);
	char*  (*list_match)(int rule);
	int    (*decompose)(dg *d);
	int*   (*find_matches)(dg* d, int operator, int* candidates);
	void*  protocol_specific_operations;
} pe;

	1. versioning

pe's export two 16-bit integers, a major and a minor, to denote their
version.  The purpose of this advertisement is to assist post-processing
elements in determining the exact form of the protocol_specific_operations
pointer.

	2. protocol support

This is an array of protocol declarations.  Used by snort to discover
which plugin supports which protocols.

	3. match examination

list_matches() and list_match() allow the examination of the ruleset in
place with this pe.  list_matches() would return an array of int[2],
the two elements of which would indicate the match instance number
and the operator.  The match instance number can be used to retrieve
the textual description of the match via the list_match() function.

	4. rule modification

add_match() takes the operator and a textual description of the match
criterion.  The syntax of the criterion is determined by the pe.

del_match() is used to zap a match criterion and takes the match number
as its argument.

	5. decomposition

decompose() takes in a dg, the first payload of which is a protocol
supported by this pe.  This pe is supposed to perform any analysis it
needs on this packet (which can optionally be deferred until an actual
request for matching), and then to determine if this data contains any
subsidiary payloads, in which case a new pl should be constructed and
prepended to the pl list for the dg.  If a subsidiary payload is decoded,
then the core will realize this and submit the dg to the appropriate pe
for the next stage of processing.

(I debated whether to explain this further, say with an example, but
by this point in the document, it should be obvious what I mean here.)
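
For illustration anyway, here is a minimal sketch of what an IP pe's
decompose() might do when the payload carries TCP; the FAMILY_IP constant
and the error handling are assumptions.

	/* sketch of decompose() in a hypothetical IP protocol engine */
	static int
	ip_decompose(dg *d)
	{
		pl *ip_pl = d->payload;             /* newest payload is ours */
		unsigned char *hdr = ip_pl->payload;
		size_t hlen = (hdr[0] & 0x0f) * 4;  /* IHL is in 32-bit words */
		pl *child;

		if (ip_pl->length < hlen)
			return -1;                      /* truncated header */

		child = d->m->alloc(d->m, sizeof(pl));
		child->protocol_family = FAMILY_IP; /* assumed family number */
		child->protocol_number = hdr[9];    /* IP protocol field, e.g. 6 = TCP */
		child->payload = hdr + hlen;
		child->length  = ip_pl->length - hlen;
		child->parent  = d;

		/* prepend the new payload to the dg's pl list */
		child->prev = NULL;
		child->next = ip_pl;
		ip_pl->prev = child;
		d->payload  = child;
		return 0;
	}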

	6. matching

find_matches() is fed a dg, which needs already to have been decomposed
by this pe, an operator, and an array of match id's.  This array is
terminated by the special match id "0" and can consist of many match
criteria, one criterion, or the null set, which is taken to mean "all
criteria of this operator".

This function returns an array of match ids (allocated from the context
of the dg, of course) that match the packet and which are a subset of
the submitted set of candidates.
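
As a usage sketch, the matcher asking a hypothetical TCP pe whether
criteria 3 and 7 of an invented "destination port" operator match a dg
might look like this:

	int candidates[] = { 3, 7, 0 };   /* 0 terminates the list */
	int *matched = tcp_pe->find_matches(d, OP_DST_PORT, candidates);

	/* matched is 0-terminated and allocated from d's mempool; it might
	   come back as { 7, 0 } if only criterion 7 matched the packet */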

	7. support functions

protocol_specific_operations is an escape valve through which pe's can
do whatever the hell they want.  Anticipated uses for this feature are
for exporting protocol decoding assistance for post-processing.  The
precise meaning of this void* should be well-documented, and users should
be certain to check the version numbers on the pe to make sure that it
has the exact meaning that they anticipate.

IV. matcher interface

As described in the design criteria, above, the matcher is intended
to be a replaceable component in order to make optimization work as
easy and painless as possible.

typedef struct _matcher {
	int*  (*list_rules)(void);
	char* (*list_rule)(int rule);
	int   (*add_rule)(char *description);
	int   (*del_rule)(int rule);
	int*  (*get_matches)(dg* dg);
} matcher;

	1. rule examination

list_rules() returns an array of rule ids.  list_rule takes a rule_id
and returns the description of that rule.  This memory isn't managed,
which is almost certainly a bug.

	2. rule modification

add_rule takes a textual description of a rule.  This description
consists of a combination of "(", ")", "&&", "||", "!", and a 32-bit
integer indicating the match id.  The rest of this API doesn't really
support match ids as global entities, in that there's no support for
associating them with a pe.  If you've read the document this far, then
congratulations; you've reached the point where my exhaustion overcame
my desire for completeness.  This item needs fixing before deployment.
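
For illustration, a description under that grammar might look like the
following, where the match ids are of course invented:

	"( 12 && 47 ) || ! 96"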

	3. matching

This is the main interface.  The core calls this with a dg* fresh from
the paengine and an associated mempool context.  The matcher returns an
array of rule ids that match the datagram.  If this set is non-null, then
the datagram needs to be fully decomposed by this point.  All associated
memory needs to be allocated out of the mempool associated with the dg.

V. core support

This interface requires several items in the core.

	1. pe discovery

pe* find_pe(int proto_desc[2]);

This returns the pe for the indicated protocol.  (I am still surprised
that this is valid C because this form is used so rarely.)

	2. mempool

mempool* mempool_create();

This interface returns a mempool object.

VI. core pseudocode

Marty has long said that eventually snort will be 10 lines long with
1000 plugins.  Well, this is close:

extern config *c;
int
main(int ac, char **av)
{
	dg *d;
	ruleset *r;

	c=configure(ac, av); /* load config */
	c->do_rules();       /* load rules */
	pa->setup();         /* setup paengine */

	while(c->keep_going){    /* cleared via signal */
		if(need_rule_action) /* set via signal */
			c->do_rules();   /* allow any rules to be modified */
		d=pa->get_packet();
		r=matcher->get_matches(d);
		if(ruleset_cnt(r)){
			options->dispatch(d, r);
		}
		pa->dispose_packet(d, d->v);
	}
	exit(0);
}

The only big difference with a multi-threaded case is that the
need_rule_action would need an accompanying semaphore to block all
threads, allowing rule updating to happen.

************************************************************************

--
Todd Lewis
tlewis at ...255...






