[Snort-devel] more fun questions
tlewis at ...120...
Tue Dec 19 18:47:39 EST 2000
Thanks for the feedback, Martin. Comments below.
On Tue, 19 Dec 2000, Martin Roesch wrote:
> > This entire loop is reminiscent of
> > snort.c:ProcessPacket(), and this fact has got me thinking, is the
> > paengine really the place to do this? After all, all of this
> > code is going to be duplicated in each paengine, and that's a very
> > good warning sign that your layering is wrong.
> Yes, it's wrong. :) The PaEngine code should merely provide an interface to
> the packet acquisition mechanism, not the actual main packet processing loop.
I am fine with that.
> It initializes the interface, then hands packets up to the program when
> requested. The get_packet() call should actually be something like
> (*acquire_packet)(&raw_pkt) which calls your packet acquisition function and
> returns a filled in raw packet struct. I think that this raw packet struct
> should look something like this:
> typedef struct _RawPkt {
>     snort_timebuf ts;   /* timestamp */
>     u_int32_t caplen;   /* captured buffer length */
>     u_int8_t *raw;      /* pointer to the raw packet data */
> } RawPkt;
> Sharp eyed individuals will note that this is pretty much the same as what
> libpcap hands you, but I'm pretty sure that this is the bare minimum you need
> to have enough data to do the rest of the job (well, ok, you probably don't
> need the timestamp).
This, too, is fine with me. Do you have any objection to my struct-based
approach to doing this, i.e., my use of:
? Doing it this way is much nicer from the point of view of having external
modules; this way, they just have to export a single symbol rather than
five or six, and I think that:
is nicer than:
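The snippets referenced above did not survive in the archive, but the struct-based idea — exporting one symbol that bundles the paengine's entry points, instead of five or six separate symbols — might look something like the following sketch. Every name here is illustrative, not from the actual Snort tree:

```c
#include <assert.h>

/* Hypothetical paengine interface: the module exports a single struct
 * of function pointers instead of five or six separate symbols.
 * All names are illustrative. */
typedef struct _PaEngine
{
    const char *name;
    int  (*init)(const char *device);          /* open the interface   */
    int  (*acquire_packet)(void *raw_pkt);     /* fill in a raw packet */
    int  (*release_packet)(void *raw_pkt);     /* hand the buffer back */
    void (*shutdown)(void);                    /* tear everything down */
} PaEngine;

/* A toy "pcap" engine implementing that interface. */
static int  toy_init(const char *device) { (void)device; return 0; }
static int  toy_acquire(void *p)         { (void)p; return 0; }
static int  toy_release(void *p)         { (void)p; return 0; }
static void toy_shutdown(void)           { }

/* The one symbol this module would export: */
PaEngine pcap_engine = {
    "pcap", toy_init, toy_acquire, toy_release, toy_shutdown
};
```

Loading an external module then reduces to resolving one symbol (e.g. with dlsym()) and calling through the struct.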
> Decoders are then called by passing them the RawPkt struct and the Packet
> struct: (*grinder)(&RawPkt, &p);
> "grinder" should probably be renamed to "decoder" or some such, since that's
> the actual function it performs now. So, the decoder stage gets handed the
> raw data and a Packet struct to populate, does its job and when it returns the
> Packet struct gets handed to the traffic analysis stage. Traffic analysis
> (detection) makes its decisions about the code and initiates any
> responses/output based on detection events that are made.
And all of this is done where? I suggest ProcessPacket().
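To make the division of labor concrete, here is a hedged sketch of what a ProcessPacket()-style core loop owning the whole pipeline might look like, with the paengine reduced to acquisition. The types and function bodies are invented for illustration; the real Snort structs are far richer:

```c
#include <assert.h>

/* Illustrative stand-ins for the real structs. */
typedef struct { unsigned caplen; const unsigned char *raw; } RawPkt;
typedef struct { int decoded; int disposition; } Packet;

/* Stand-in paengine: hands back one canned byte. */
static int acquire_packet(RawPkt *r)
{
    static const unsigned char buf[1] = { 0x45 };
    r->caplen = 1;
    r->raw = buf;
    return 0;
}

/* The decoder stage ("grinder"): raw bytes in, Packet out. */
static void decoder(RawPkt *r, Packet *p)
{
    p->decoded = (r->caplen > 0);
    p->disposition = 0;
}

/* Detection: decides what happens to the packet. */
static int detect(Packet *p)
{
    return p->decoded ? 1 : 0;   /* 1 = match in this toy */
}

/* One iteration of the proposed core loop. */
int process_one(void)
{
    RawPkt raw;
    Packet pkt;

    if (acquire_packet(&raw) != 0)   /* paengine: acquisition only */
        return -1;
    decoder(&raw, &pkt);             /* i.e., (*grinder)(&raw, &pkt) */
    return detect(&pkt);             /* analysis decides disposition */
}
```

The point of the sketch is that acquisition, decoding, and detection are separate calls made by the core, so no paengine ever duplicates the loop itself.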
> > By hoisting this logic into the snort core, out of the paengine, I think that
> > we would get several wins:
> > - it makes the paengines generic and therefore portable among
> > projects other than snort;
> > - it reduces code duplicated among paengines;
> > - it is easier to expand the range of supported actions on packets;
> > - it makes the whole thing multi-threadable, which can be
> > important for performance reasons, especially once snort is
> > in the critical path on firewalls.
> I'm not sure if I necessarily agree with embedding the decision logic into the
> main processing loop. Not all applications of Snort are going to be
> interested in dispositioning the packets after the decision code runs, so it's
> probably best to treat it in a more modular fashion.
My reasoning for wanting to do it there is that there are memory
management issues. Even if you're not making decisions about packets, it
is plausible that when you're done with your copy, you need to inform the
engine so that it can do whatever it needs to do. Granted, the present
interface suggestion where the caller provides the buffer does not really
support this, but consider if you will the following case, which I have
discussed with another developer: your paengine establishes a buffer
and registers it with the kernel, which sets it up as the DMA target in
your network interface driver. (On intel, I think this has to be in
the bottom 16MB, so the kernel would probably give it to you instead,
but whatever.) Your driver could copy your packets over PCI directly
from the NIC to your user-space buffer, which would be super-mongo fast.
If you were to do this, and both of us think that this is the logical
next step in optimizing throughput on packet acquisition, then snort
would need to tell the paengine that it's done so that the paengine in
turn can flag that buffer as free for use by the kernel and/or by the NIC.
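The ownership handshake described here can be modeled in a few lines: the engine owns a fixed pool of buffers (standing in for the kernel/NIC DMA area), and the core must hand each one back before it can be reused. Purely illustrative:

```c
#include <assert.h>

#define NBUF 4                /* toy stand-in for a DMA buffer ring */
static int in_use[NBUF];

/* Engine hands out a free buffer index, or -1 if the ring is full. */
int engine_acquire(void)
{
    int i;
    for (i = 0; i < NBUF; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Snort calls this when done, so the kernel/NIC may reuse the buffer. */
void engine_release(int idx)
{
    in_use[idx] = 0;
}
```

With a caller-supplied buffer there is nothing to release, which is exactly why the zero-copy case needs the explicit "I'm done" call back into the paengine.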
> I'd actually recommend
> an output plugin as the place to perform this disposition function, it's
> logically executed in the same place as what you've done here and allows for
> modular configuration at run-time (i.e. we don't execute the code if we're not
> in "Gateway mode").
The problem is that the output plugin would not know what the paengine
was or what the paengine's ID for that packet was, unless it were all
global, which I think is a bad idea.
This general thinking was exactly why I had previously asked to extend
the Packet structure to include the disposition flag, so that info could
be passed back up to the paengine, rather than being smuggled back in,
like it was in my USF prototype.
There are cases where output filters would be appropriate. E.g., I was
talking with some of the guys here the other day about the prospect of
turning snort with pcap on Linux-2.2 (or even 1.2) into a user-space
firewall. It'd be easy: you grab packets with pcap, use ipchains
(or ipfwadm) to add a firewall input rule to discard all traffic, and
then use a raw socket to reinsert the traffic, which is not subject to
input rules. In this case, an output filter would be acceptable.
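The recipe above, sketched as commands (illustrative only; the interface name and exact flags would depend on the system):

```shell
# Illustrative sketch of the user-space firewall recipe above.
# 1. Deny everything at the kernel's input chain; packet sockets
#    (what pcap uses) still see the traffic at the device layer.
ipchains -A input -i eth0 -j DENY
# 2. snort reads the packets via pcap and applies its policy, then
# 3. re-injects approved packets through a raw socket, which is
#    not subject to the input chain's rules.
```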
However, that previous example would also work with my suggested
mechanism, while, e.g., neither netfilter nor FreeBSD's divert sockets
will work with output filters, at least not without breaking layering
and having some global data around so that you can cheat. Therefore,
I still think that the better approach is to layer it this way:
acquire packet -+                                   +-> dispose of packet
                +-> munge packet -> process packet -+
                                                    +-> output packet
Disposing of the packet, at least in some packet acquisition interfaces,
is different from just re-outputting it, and so it needs to be at the
same layer as the acquisition.
> > The last one is important to me. With Linux 2.4 having a completely
> > multi-threaded networking stack in the kernel, it is conceivable that on
> > an oct-way intel machine, you could user-space firewall gigabit ethernet
> > at wire speed, which would be such a coup that I get a woody just thinking
> > about it. Ditto for the mainstream packet examination code. (We would
> > have to clean up some of the globals, etc., but it shouldn't be too bad.)
> How's the SMP code in the 2.4 kernels?
In response to the Microsoft-sponsored Mindcraft study, the guys went in and completely
parallelized the networking stack, putting fine-grained locks around
everything instead of the previous global lock. You can have the kernel
running in networking code simultaneously on multiple processors now,
which results in huge performance increases with multiple interfaces,
like you have on firewalls or in-path packet inspection boxes. If you
can multi-thread your packet acquisition/examination/disposition engine,
then you can continue those wins. It's now possible for a 4-way SMP
machine to be nearly four times faster than a UP machine in these roles,
and that's a huge win.
> On uniprocessor machines (i.e. the
> vast majority of machines Snort runs on) multi-threading the engine has yet to
> be proven as a Good Thing since all the overhead to perform the context
> switching may have an impact that overrides the benefits of multi-threading in
> the first place!
Absolutely; multi-threading is potentially a loss on UP machines, which
is why, if I were to thread snort, I would do it in a way that could
easily run unthreaded, or in single-threaded mode, on UP machines and on
platforms without SMP or threading support. The work that we
are doing today to cleave these pieces off into clean interfaces will make
this process much easier, and indeed the threading of the code base can
serve as an additional catalyst for cleaning up the internal interfaces.
> Additionally, SMP is poorly supported on the *BSD kernels; while they
> ostensibly have the best packet acquisition interfaces, they may not be the
> best platforms for multi-threading, not to mention the additional support
> costs of going multi-threaded (people have a hard enough time installing
> libpcap and libnet right now, requiring a threading library is going to add
> another variable to the mix). Additionally, we need to agree on a threading
> library that works on all the platforms that Snort works on. Libpth?
> Pthreads? If we're going to multi-thread, we need to make sure that we can
> answer these questions, or at the very least provide more than one compilation
> path (#ifdefs for multithreaded code with a build time switch to activate it).
That is a perfectly fair burden for people who want to introduce
threading, and I am comfortable with it. Personally, I would use
pthreads, but code up an abstraction layer so that it could run on, e.g.,
NT machines as well with a little work. The glib threads abstraction
would be a nice place to look for inspiration on how to do that.
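As a sketch of such an abstraction layer (names invented; the real primitives would follow whatever threading library is agreed on), with a single-threaded fallback so UP builds pay no cost:

```c
#include <assert.h>

/* Hypothetical thread-abstraction layer: one set of macros per
 * platform, with a no-op fallback for single-threaded builds.
 * All names are illustrative. */
#ifdef SNORT_USE_PTHREADS
#include <pthread.h>
typedef pthread_mutex_t snort_mutex_t;
#define SNORT_MUTEX_INIT(m)   pthread_mutex_init((m), NULL)
#define SNORT_MUTEX_LOCK(m)   pthread_mutex_lock(m)
#define SNORT_MUTEX_UNLOCK(m) pthread_mutex_unlock(m)
#else
/* Single-threaded build: locks compile away to nothing. */
typedef int snort_mutex_t;
#define SNORT_MUTEX_INIT(m)   (*(m) = 0)
#define SNORT_MUTEX_LOCK(m)   ((void)(m))
#define SNORT_MUTEX_UNLOCK(m) ((void)(m))
#endif

static snort_mutex_t pkt_lock;
static int pkt_count;

/* Shared-state update guarded by the abstract mutex. */
void count_packet(void)
{
    SNORT_MUTEX_LOCK(&pkt_lock);
    pkt_count++;
    SNORT_MUTEX_UNLOCK(&pkt_lock);
}
```

A build-time switch (-DSNORT_USE_PTHREADS) selects the threaded path, which is exactly the #ifdef'd compilation path Marty describes.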
However, if these challenges can be met, and I believe that they can,
then I think that threading can be a win.
> > So, this is my thinking. Agree or disagree? I would love to hear
> > people's thoughts on this stuff, but if I don't then I'll just go ahead
> > and code it along these lines.
> I like the general direction, but we need to think about how it's going to fit
> with the non-Linux, non-SMP crowd. :) Remember, we're supporting over 21
> platforms on a variety of architectures, so the base engine has to remain
> compatible across all of these architectures!
I am completely on-board with that. I want my work to benefit
everybody and not leave anybody behind, and I hope that I serve as a
positive influence on the directions in which the code evolves, which
is why I am trying to talk these things over and establish consensus
that the way I am approaching this work is the right way.
Todd Lewis tlewis at ...120...
God grant me the courage not to give up what I think is right, even
though I think it is hopeless. - Admiral Chester W. Nimitz