[Snort-devel] initial results on benchmarking paengine code

Martin Roesch roesch at ...48...
Thu Jan 18 23:22:31 EST 2001


Here's a question about SMP performance and Snort multithreading:

If the OS doesn't handle SMP very well (*BSD on x86, Linux < 2.4, etc.),
are there still going to be significant performance gains?  I'm not a
big MP/multi-threading guru, but if the SMP implementations on these
platforms are as limited as I've heard, how much gain are we really
going to see?

   -Marty

Todd Lewis wrote:
> 
> As I mentioned earlier, I've hoisted the main loop for snort into
> snort.c:ProcessPacket().  It consists of four parts: acquiring the
> packet, grinding it, rule processing it, and disposing of it.
> 
> I am interested in how much of a payoff parallelizing packet processing
> in snort might yield.  If we know in advance that the yield is low,
> then it doesn't pay to do the work.  I decided to benchmark these four
> steps to see how expensive each one was.
> 
> I've used the pentium time stamp counter feature to clock each of
> these four functions, and the results are as follows:
> 
> acquisition: 19237
> grinder:      1510
> rules:        2045
> disposition: 32138
> 
> These are the average number of clock cycles that each step took, as
> counted by the time stamp counter.  The sample size was 10,000 packets.
> My machine is a 466 MHz Celeron which actually runs at around 471 million
> cycles per second.  For reference's sake, the getcwd syscall takes as
> little as 9256 cycles.
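
To put those counts in more familiar units, here is a minimal sketch of the
conversion.  It assumes the figures are time stamp counter ticks at roughly
471 MHz, and the function and array names are made up for illustration;
none of this comes from the Snort source.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: the counter ticks about 471 million times per second. */
#define TICKS_PER_SECOND 471e6

/* Per-stage averages quoted in the message above. */
static const char *stage_names[4] = {
    "acquisition", "grinder", "rules", "disposition"
};
static const double stage_ticks[4] = { 19237.0, 1510.0, 2045.0, 32138.0 };

/* Convert a tick count to microseconds at the given tick rate. */
double ticks_to_usec(double ticks, double ticks_per_second)
{
    assert(ticks_per_second > 0.0);
    return ticks / ticks_per_second * 1e6;
}

/* Ticks spent on one packet, summed over all four stages. */
double total_ticks(void)
{
    double sum = 0.0;
    for (int i = 0; i < 4; i++)
        sum += stage_ticks[i];
    return sum;
}

/* Packets per second one thread could sustain at this per-packet cost. */
double packets_per_second(double ticks_per_second)
{
    return ticks_per_second / total_ticks();
}

/* Print each stage in microseconds, then the single-thread packet rate. */
void print_stage_report(void)
{
    for (int i = 0; i < 4; i++)
        printf("%-12s %8.2f us\n", stage_names[i],
               ticks_to_usec(stage_ticks[i], TICKS_PER_SECOND));
    printf("%-12s %8.0f packets/sec\n", "single CPU",
           packets_per_second(TICKS_PER_SECOND));
}
```

Calling print_stage_report() from main() prints the table.  The packet rate
is only as good as the ping-flood test it is based on.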
> 
> This was a bad test for several reasons:
> 
> 1) They were all ping packets, so the rule processing wasn't especially
> costly;
> 
> 2) They were all default-sized ping packets, so the cost of copying
> lots of data around wasn't captured;
> 
> 3) The acquisition stage may have been exaggerated, since I didn't
> perform a sleep or anything to ensure that a packet was ready, relying
> instead on the hope that 'ping -f' would always outstrip snort;
> 
> 4) I don't have any pcap numbers against which to compare them.
> 
> I do not know if acquisition is the only stage that needs serialization.
> It will certainly vary from paengine to paengine.  However, if my
> conjecture on that point is correct, then packet processing can be sped
> up by a maximum of a factor of three by parallelizing the rest of the
> processing, as a rough estimate.
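
The factor of three can be checked with Amdahl's law.  The sketch below
assumes acquisition (19237) is the only serial stage and that the other
three stages (1510 + 2045 + 32138 = 35693) split evenly across workers with
no locking or scheduling overhead.  Both are optimistic assumptions, and
the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Upper bound on speedup: the serial part still costs serial_cost, and
 * the parallel part is assumed to shrink to nothing.
 */
double max_speedup(double serial_cost, double parallel_cost)
{
    assert(serial_cost > 0.0);
    return (serial_cost + parallel_cost) / serial_cost;
}

/*
 * Amdahl's law for a fixed number of workers: the parallel part is
 * divided evenly among them, the serial part is not.
 */
double speedup_with_workers(double serial_cost, double parallel_cost,
                            int workers)
{
    assert(serial_cost > 0.0);
    assert(workers >= 1);
    return (serial_cost + parallel_cost) /
           (serial_cost + parallel_cost / workers);
}

/* Print the bound and the 2-way and 4-way estimates for the numbers above. */
void print_speedup_estimates(void)
{
    const double serial = 19237.0;                     /* acquisition */
    const double parallel = 1510.0 + 2045.0 + 32138.0; /* other stages */

    printf("upper bound: %.2fx\n", max_speedup(serial, parallel));
    printf("2 workers:   %.2fx\n",
           speedup_with_workers(serial, parallel, 2));
    printf("4 workers:   %.2fx\n",
           speedup_with_workers(serial, parallel, 4));
}
```

With these inputs the bound comes out a little under three, and the
two-processor and four-processor figures land well below it, because the
serial acquisition stage dominates once the rest is spread out.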
> 
> It's also possible that the whole thing is parallelizable, in which
> case it will level off at some point, but that point is probably well
> above a factor of three.
> 
> Finally, if the middle two stages prove to be more expensive under
> real-world scenarios, as I suspect that they will be, then the payoff
> can be greater than three.
> 
> Certainly this merits pursuing for the case of dual-processor systems and
> probably is worthwhile for quad-way machines as well.  After my paengine
> work for 2.0 is done, I plan to begin this performance work in earnest.
> 
> --
> Todd Lewis                                       tlewis at ...120...
> 
>   God grant me the courage not to give up what I think is right, even
>   though I think it is hopeless.          - Admiral Chester W. Nimitz
> 
> _______________________________________________
> Snort-devel mailing list
> Snort-devel at lists.sourceforge.net
> http://lists.sourceforge.net/mailman/listinfo/snort-devel

--
Martin Roesch
roesch at ...48...
http://www.snort.org



