[Snort-devel] more fun questions

Martin Roesch roesch at ...48...
Thu Dec 21 01:14:09 EST 2000

Todd Lewis wrote:
> > It initializes the interface, then hands packets up to the program when
> > requested.  The get_packet() call should actually be something like
> > (*acquire_packet)(&raw_pkt) which calls your packet acquisition function and
> > returns a filled in raw packet struct.  I think that this raw packet struct
> > should look something like this:
> >
> > typedef struct _RawPkt
> > {
> >     snort_timebuf ts;  /* timestamp */
> >     u_int32_t caplen;  /* captured buffer length */
> >     u_int8_t *raw;     /* pointer to the raw packet data */
> > } RawPkt;
> >
> > Sharp eyed individuals will note that this is pretty much the same as what
> > libpcap hands you, but I'm pretty sure that this is the bare minimum you need
> > to have enough data to do the rest of the job (well, ok, you probably don't
> > need the timestamp).
> This, too, is fine with me.  Do you have any objection to my struct-based
> approach to doing this, i.e., my use of:
>         pa->acquire_packet(&pkt_buf, sizeof(pkt_buf));
> ?  Doing it this way is much nicer from the point of view of having external
> modules; this way, they just have to export a single symbol rather than
> five or six, and I think that:
>         pa->acquire_packet()
> is nicer than:
>         (*acquire_packet)();

This is fine by me; the struct-based approach makes for cleaner code anyway.
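
To make the struct-based interface concrete, here is a minimal sketch of what a paengine might export: one vtable struct carrying a name and an acquire_packet function pointer. Everything here is illustrative, not a final API; I've stood in `struct timeval` for the hypothetical snort_timebuf, and the acquisition function just returns a canned buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical raw-packet struct, per the discussion above
 * (struct timeval stands in for snort_timebuf). */
typedef struct _RawPkt
{
    struct timeval ts;   /* timestamp */
    uint32_t caplen;     /* captured buffer length */
    uint8_t *raw;        /* pointer to the raw packet data */
} RawPkt;

/* Hypothetical paengine vtable: the module exports this one
 * struct instead of five or six separate symbols. */
typedef struct _PAEngine
{
    const char *name;
    int (*acquire_packet)(RawPkt *pkt_buf, size_t buflen);
} PAEngine;

/* Toy acquisition function for illustration: fills the struct
 * with a canned 4-byte "packet". */
static uint8_t fake_wire[] = { 0xde, 0xad, 0xbe, 0xef };

static int fake_acquire(RawPkt *pkt_buf, size_t buflen)
{
    if (buflen < sizeof(RawPkt))
        return -1;                        /* caller's buffer too small */
    gettimeofday(&pkt_buf->ts, NULL);
    pkt_buf->caplen = sizeof(fake_wire);
    pkt_buf->raw = fake_wire;
    return 0;
}
```

The core would then call `pa->acquire_packet(&pkt_buf, sizeof(pkt_buf));` without knowing which engine is behind the pointer.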

> > Decoders are then called by passing the Packet struct and the RawPacket
> > struct:
> >
> > (*grinder)(&RawPkt, &p);
> >
> > "grinder" should probably be renamed to "decoder" or some such, since that's
> > the actual function it performs now.  So, the decoder stage gets handed the
> > raw data and a Packet struct to populate, does its job and when it returns the
> > Packet struct gets handed to the traffic analysis stage.  Traffic analysis
> > (detection) makes its decisions about the packet and initiates any
> > responses/output based on detection events.
> And all of this is done where?  I suggest ProcessPacket().

Yes, ProcessPacket() would be the right place for this.
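
For what that dispatch might look like inside ProcessPacket(): a decoder function pointer (today's "grinder") selected at startup for the interface's datalink type, handed the raw data and a Packet struct to populate. The struct layouts and field names below are stand-ins for illustration, not Snort's actual ones.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-ins for the structs discussed above. */
typedef struct { uint32_t caplen; const uint8_t *raw; } RawPkt;
typedef struct { const uint8_t *eth; const uint8_t *payload; } Packet;

/* "grinder"/decoder function pointer, set once at startup based on
 * the datalink type of the capture interface. */
typedef void (*Decoder)(RawPkt *, Packet *);

static void DecodeEthPkt(RawPkt *rp, Packet *p)
{
    p->eth = rp->raw;            /* link-layer header */
    p->payload = rp->raw + 14;   /* skip the 14-byte Ethernet header */
}

static Decoder grinder = DecodeEthPkt;

/* Sketch of ProcessPacket(): decode, then hand off to detection. */
static void ProcessPacket(RawPkt *rp, Packet *p)
{
    (*grinder)(rp, p);
    /* ...the populated Packet would go to the traffic analysis
     * (detection) stage here... */
}
```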

> > > By hoisting this logic into the snort core, out of the paengine, I think that
> > > we would get several wins:
> > >
> > >         - it makes the paengines generic and therefore portable among
> > >           projects other than snort;
> > >         - it reduces code duplicated among paengines;
> > >         - it is easier to expand the range of supported actions on packets;
> > >         - it makes the whole thing multi-threadable, which can be
> > >           important for performance reasons, especially once snort is
> > >           in the critical path on firewalls.
> >
> > I'm not sure if I necessarily agree with embedding the decision logic into the
> > main processing loop.  Not all applications of Snort are going to be
> > interested in dispositioning the packets after the decision code runs, so it's
> > probably best to treat it in a more modular fashion.
> My reasoning for wanting to do it there is that there are memory
> management issues.  Even if you're not making decisions about packets, it
> is plausible that when you're done with your copy, you need to inform the
> engine so that it can do whatever it needs to do.  Granted, the present
> interface suggestion where the caller provides the buffer does not really
> support this, but consider if you will the following case, which I have
> discussed with another developer: your paengine establishes a buffer
> and registers it with the kernel, which sets it up as the DMA target in
> your network interface driver.  (On intel, I think this has to be in
> the bottom 16MB, so the kernel would probably give it to you instead,
> but whatever.)  Your driver could copy your packets over PCI directly
> from the NIC to your user-space buffer, which would be super-mongo fast.
> If you were to do this, and both of us think that this is the logical
> next step in optimizing throughput on packet acquisition, then snort
> would need to tell the paengine that it's done so that the paengine in
> turn can flag that buffer as free for use by the kernel and/or by the NIC.

So are you recommending bypassing the packet filtering/firewall interface in
the kernel?  I agree that direct DMA transfers to userland would be faster
(which is how most NIC drivers *should* be sending data to the kernel bufs
anyway).  If we're concerned about that level of performance, does it make
more sense to think about moving Snort into the kernel (which could severely
break its cross-platform ability)?

Memory management is going to be a serious issue if we want to be able to
shuffle packets around at will (such as into/out of the "packet switchyard"
that we've mentioned a few times).  
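
The ownership problem Todd describes can be sketched as a small ring of engine-owned buffers that Snort must hand back when it's done, so a slot can go back to the kernel/NIC as a DMA target. Everything here (the slot layout, the function names) is hypothetical, just to show the shape of the handback:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SLOTS 4

/* Engine-owned buffer slots; in the DMA scenario these would be the
 * kernel-registered buffers, not plain heap memory. */
typedef struct { uint8_t data[1514]; int in_use; } PktSlot;

static PktSlot ring[NUM_SLOTS];

/* Paengine hands out the first free slot (a real engine would
 * block or poll until the NIC has filled one). */
static int acquire_slot(void)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!ring[i].in_use) { ring[i].in_use = 1; return i; }
    }
    return -1;  /* ring exhausted: Snort is holding every buffer */
}

/* Snort calls this when analysis is finished with a packet, letting
 * the kernel/NIC reuse the slot. */
static void release_slot(int id)
{
    ring[id].in_use = 0;
}
```

The point is that the release call has to flow back to the paengine layer, which is why disposition can't live purely in an output stage.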

> > I'd actually recommend
> > an output plugin as the place to perform this disposition function, it's
> > logically executed in the same place as what you've done here and allows for
> > modular configuration at run-time (i.e. we don't execute the code if we're not
> > in "Gateway mode").
> The problem is that the output plugin would not know what the paengine
> was or what the paengine's ID for that packet was, unless it were all
> global, which I think is a bad idea.

Well, you'd activate a specific output plugin to put Snort in "gateway mode"
that would automatically assume that the proper paengine was being used. 
There's no law that says you can't pair plugin elements together.

> This general thinking was exactly why I had previously asked to extend
> the Packet structure to include the disposition flag, so that info could
> be passed back up to the paengine, rather than being smuggled back in,
> like it was in my USF prototype.

Adding a flag to the Packet struct is certainly doable, I've got no objection
to doing it.
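
A sketch of what that flag might look like; the enum values and field name are hypothetical, to be settled when the Packet struct actually grows the field. The detection stage writes the verdict into the struct, and the paengine reads it back, instead of the verdict being smuggled through a side channel:

```c
#include <assert.h>

/* Hypothetical disposition values. */
typedef enum { DISP_PASS, DISP_DROP } Disposition;

typedef struct
{
    /* ...decoded protocol fields would live here... */
    Disposition disposition;   /* set by detection, read by the paengine */
} Packet;

/* Detection stage records its verdict in the Packet itself. */
static void Detect(Packet *p, int matched_drop_rule)
{
    p->disposition = matched_drop_rule ? DISP_DROP : DISP_PASS;
}
```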

> There are cases where output filters would be appropriate.  E.g., I was
> talking with some of the guys here the other day about the prospect of
> turning snort with pcap on Linux-2.2 (or even 1.2) into a user-space
> firewall.  It'd be easy: you grab packets with pcap, use ipchains
> (or ipfwadm) to add a firewall input rule to discard all traffic, and
> then use a raw socket to reinsert the traffic, which is not subject to
> input rules.  In this case, an output filter would be acceptable.
> However, that previous example would also work with my suggested
> mechanism, while, e.g., neither netfilter nor FreeBSD's divert sockets
> will work with output filters, at least not without breaking layering
> and having some global data around so that you can cheat.  Therefore,
> I still think that the better approach is to layer it this way:
> acquire packet -+                                 +-> dispose of packet
>                 |                                 |
>                 +-> munge packet -> process packet+
>                                                   |
>                                                   +-> output packet
> Disposing of the packet, at least in some packet acquisition interfaces,
> is different from just re-outputting it, and so it needs to be at the
> same layer as the acquisition.
> > > The last one is important to me.  With Linux 2.4 having a completely
> > > multi-threaded networking stack in the kernel, it is conceivable that on
> > > an oct-way intel machine, you could user-space firewall gigabit ethernet
> > > at wire speed, which would be such a coup that I get a woody just thinking
> > > about it.  Ditto for the mainstream packet examination code.  (We would
> > > have to clean up some of the globals, etc., but it shouldn't be too bad.)
> >
> > How's the SMP code in the 2.4 kernels?
> In response to Microsoft's Mindcraft study, the guys went in and completely
> parallelized the networking stack, putting fine-grained locks around
> everything instead of the previous global lock.  You can have the kernel
> running in networking code simultaneously on multiple processors now,
> which results in huge performance increases with multiple interfaces,
> like you have on firewalls or in-path packet inspection boxes.  If you
> can multi-thread your packet acquisition/examination/disposition engine,
> then you can continue those wins.  It's now possible for a 4-way SMP
> machine to be 4-times faster than a UP machine in these roles, and that's
> very exciting.

I agree, although I have a very limited supply of 4-way Xeon boxen to do dev
work on around here. :)

> > On uniprocessor machines (i.e. the
> > vast majority of machines Snort runs on) multi-threading the engine has yet to
> > be proven as a Good Thing since all the overhead to perform the context
> > switching may have an impact that overrides the benefits of multi-threading in
> > the first place!
> Absolutely, multi-threading is potentially a loss on UP machines, which
> is why if I were to try to thread snort, I would do it in a way that
> was easily runnable without threading or in single-threading mode on UP
> machines or platforms without SMP or threading support.  The work that we
> are doing today to cleave these pieces off into clean interfaces will make
> this process much easier, and indeed the threading of the code base can
> serve as an additional catalyst for cleaning up the internal interfaces.

Yep, this needs to happen and it is one of the reasons we're thinking about
going from 1.7->2.0.

> > Additionally, on *BSD kernels SMP is poorly supported, while the BSD kernels
> > ostensibly have the best packet acquisition interfaces they may not be the
> > best platforms for multi-threading, not to mention the additional support
> > costs of going multi-threaded (people have a hard enough time installing
> > libpcap and libnet right now, requiring a threading library is going to add
> > another variable to the mix).  Additionally, we need to agree on a threading
> > library that works on all the platforms that Snort works on.  Libpth?
> > Pthreads?  If we're going to multi-thread, we need to make sure that we can
> > answer these questions, or at the very least provide more than one compilation
> > path (#ifdefs for multithreaded code with a build time switch to activate it).
> That is a perfectly fair burden for people who want to introduce
> threading, and I am comfortable with it.  Personally, I would use
> pthreads, but code up an abstraction layer so that it could run on, e.g.,
> NT machines as well with a little work.  The glib threads abstraction
> would be a nice place to look for inspiration on how to do that.
> However, if these challenges can be met, and I believe that they can,
> then I think that threading can be a win.

Definitely, I just want to make sure we don't forget about the
non-linux/non-x86 crowd out there.
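
The build-time-switch idea might look something like this: thin wrappers that map to pthreads when a (hypothetical) ENABLE_THREADS define is set, and compile away to no-ops otherwise, so UP builds and platforms without a threading library pay nothing. The macro names are illustrative:

```c
#include <assert.h>

#ifdef ENABLE_THREADS
#include <pthread.h>
typedef pthread_mutex_t snort_mutex_t;
#define SNORT_MUTEX_INIT(m)   pthread_mutex_init((m), NULL)
#define SNORT_MUTEX_LOCK(m)   pthread_mutex_lock(m)
#define SNORT_MUTEX_UNLOCK(m) pthread_mutex_unlock(m)
#else
/* Single-threaded build: the wrappers cost nothing. */
typedef int snort_mutex_t;
#define SNORT_MUTEX_INIT(m)   (*(m) = 0)
#define SNORT_MUTEX_LOCK(m)   ((void)(m))
#define SNORT_MUTEX_UNLOCK(m) ((void)(m))
#endif

static snort_mutex_t pkt_count_lock;
static unsigned long pkt_count;

/* Shared-counter update, safe in either build mode. */
static void count_packet(void)
{
    SNORT_MUTEX_LOCK(&pkt_count_lock);
    pkt_count++;
    SNORT_MUTEX_UNLOCK(&pkt_count_lock);
}
```

An abstraction like this is also where an NT or glib-threads backend could slot in later without touching the call sites.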

> > > So, this is my thinking.  Agree or disagree?  I would love to hear
> > > people's thoughts on this stuff, but if I don't then I'll just go ahead
> > > and code it along these lines.
> >
> > I like the general direction, but we need to think about how it's going to fit
> > with the non-Linux, non-SMP crowd. :) Remember, we're supporting over 21
> > platforms on a variety of architectures, so the base engine has to remain
> > compatible across all of these architectures!
> I am completely on-board with that.  I want for my work to benefit
> everybody and not leave anybody behind, and I hope that I serve as a
> positive influence on the directions in which the code evolves, which
> is why I am trying to talk these things over and establish consensus
> that the way I am approaching this work is the right way.

Cool, I think we can work all of this stuff out, it's just a matter of making
sure we're all on the same page with the base goals of Snort development. 
Sounds like we are (or pretty close anyway).

> --
> Todd Lewis                                       tlewis at ...120...
>   God grant me the courage not to give up what I think is right, even
>   though I think it is hopeless.          - Admiral Chester W. Nimitz

Martin Roesch
roesch at ...48...
