[Snort-devel] initial results on benchmarking paengine code

Todd Lewis tlewis at ...120...
Mon Jan 8 19:07:59 EST 2001

As I mentioned earlier, I've hoisted the main loop for snort into
snort.c:ProcessPacket().  It consists of four parts: acquiring the
packet, grinding it, rule processing it, and disposing of it.
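
The loop shape described above can be sketched in C as follows. This is a minimal illustration, not the code in snort.c. `packet_t`, `acquire_packet()`, `grind_packet()`, `process_rules()`, `dispose_packet()` and `run_loop()` are hypothetical names, and the stages are stubs so the skeleton compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet record; Snort's real structures differ. */
typedef struct {
    unsigned char data[1514];
    size_t len;
} packet_t;

/* Stage 1 (stub): acquisition. Returns 0 when no packets remain. */
static int acquire_packet(packet_t *p, int *remaining)
{
    if (*remaining <= 0)
        return 0;
    (*remaining)--;
    memset(p->data, 0, sizeof p->data);
    p->len = 98;   /* e.g. a default 56-byte-payload ping on Ethernet */
    return 1;
}

static void grind_packet(packet_t *p)   { (void)p; } /* stage 2 (stub) */
static void process_rules(packet_t *p)  { (void)p; } /* stage 3 (stub) */
static void dispose_packet(packet_t *p) { (void)p; } /* stage 4 (stub) */

/* Runs all four stages per packet; returns how many were processed. */
static int run_loop(int npackets)
{
    packet_t pkt;
    int processed = 0;

    assert(npackets >= 0);
    while (acquire_packet(&pkt, &npackets)) {
        grind_packet(&pkt);
        process_rules(&pkt);
        dispose_packet(&pkt);
        processed++;
    }
    return processed;
}
```

Only the first stage talks to the capture source; the other three work on a packet already in memory, which is why they are the natural candidates for running in parallel.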

I want to know how much parallelizing packet processing in snort might
pay off.  If we know in advance that the payoff is low, then it doesn't
pay to do the work.  I decided to benchmark these four steps to see how
expensive each one was.

I've used the Pentium time stamp counter (TSC) to clock each of these
four functions, and the results are as follows:

acquisition: 19237
grinder:      1510
rules:        2045
disposition: 32138

These are the average number of CPU clock cycles that each step took.
The sample size was 10,000 packets.  My machine is a 466 MHz Celeron
whose clock actually runs at around 471 MHz.  For reference, the getcwd
syscall takes as little as 9256 cycles.
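
A sketch of how per-stage timing with the time stamp counter might look, assuming GCC or Clang on x86, where `__rdtsc()` from `<x86intrin.h>` reads the counter. On P6-era chips such as this Celeron the counter advances once per core clock cycle, so the averages above are cycle counts. `read_tsc()`, `average_ticks()`, `ticks_to_usec()` and `busy_stage()` are hypothetical helpers, and the non-x86 fallback (via C11 `timespec_get()`) returns nanoseconds rather than cycles.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>              /* __rdtsc() on GCC and Clang */
static uint64_t read_tsc(void) { return __rdtsc(); }
#else
#include <time.h>
/* Assumed fallback for non-x86 builds: nanoseconds, not cycles. */
static uint64_t read_tsc(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

typedef void (*stage_fn)(void *arg);

/* Average cost of one call to `stage` over n calls, in counter ticks. */
static double average_ticks(stage_fn stage, void *arg, int n)
{
    uint64_t total = 0;

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        uint64_t start = read_tsc();
        stage(arg);
        total += read_tsc() - start;
    }
    return (double)total / n;
}

/* Convert ticks to microseconds, given the counter rate in Hz. */
static double ticks_to_usec(double ticks, double hz)
{
    return ticks / hz * 1e6;
}

/* Hypothetical stand-in for a stage: a short busy loop. */
static void busy_stage(void *arg)
{
    volatile unsigned *counter = arg;

    for (int i = 0; i < 1000; i++)
        (*counter)++;
}
```

With the figures above, `ticks_to_usec(19237, 471e6)` gives roughly 41 microseconds for acquisition.  Note that `rdtsc` is not a serializing instruction, so out-of-order execution can blur very short intervals; careful measurements often fence it with `cpuid` or use `rdtscp`.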

This was a bad test for several reasons:

1) They were all ping packets, so the rule processing wasn't especially
demanding.

2) They were all default-sized ping packets, so the cost of copying
lots of data around wasn't captured.

3) The acquisition cost may have been exaggerated, since I didn't
perform a sleep or anything to ensure that a packet was ready, relying
instead on the hope that 'ping -f' would always outstrip snort.

4) I don't have any pcap numbers against which to compare them.

I do not know whether acquisition is the only stage that needs
serialization; that will certainly vary from paengine to paengine.
However, if my conjecture on that point is correct, then acquisition
accounts for about 35% of the per-packet total (19237 of 54930), so
parallelizing the rest of the processing can speed packet processing up
by at most a factor of about three (54930 / 19237 is roughly 2.9).
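
The factor-of-three estimate is Amdahl's law applied to the averages above. A minimal sketch, assuming (as the conjecture does) that acquisition is the only serialized stage and that the other three stages split perfectly across processors; the function names are hypothetical.

```c
#include <assert.h>

/* Average per-stage costs measured above, in TSC cycles. */
static const double acquisition = 19237.0;
static const double grinder     =  1510.0;
static const double rules       =  2045.0;
static const double disposition = 32138.0;

/* Amdahl's law: speedup on p processors when a fraction `serial`
 * of the per-packet work cannot be parallelized. */
static double amdahl_speedup(double serial, double p)
{
    assert(serial > 0.0 && serial <= 1.0 && p >= 1.0);
    return 1.0 / (serial + (1.0 - serial) / p);
}

/* Share of per-packet cost spent in acquisition, assumed to be the
 * only stage that must stay serialized. */
static double acquisition_fraction(void)
{
    double total = acquisition + grinder + rules + disposition;
    return acquisition / total;
}

/* Limit of amdahl_speedup() as p grows without bound: 1 / serial. */
static double max_speedup(void)
{
    return 1.0 / acquisition_fraction();
}
```

With these numbers the serial fraction is about 0.35, giving roughly 1.48x on two processors, 1.95x on four, and a ceiling of about 2.86x no matter how many processors are added.  Heavier grinding or rule processing shrinks the serial fraction and raises every one of these figures.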

It's also possible that the whole thing is parallelizable, in which
case the speedup will still level off at some point, but that point is
probably well above a factor of three.

Finally, if the middle two stages (grinding and rule processing) prove
to be more expensive under real-world traffic, as I suspect they will,
then the payoff could exceed a factor of three.

Certainly this merits pursuing for dual-processor systems, and it is
probably worthwhile for four-way machines as well.  After my paengine
work for 2.0 is done, I plan to begin this performance work in earnest.

Todd Lewis                                       tlewis at ...120...

  God grant me the courage not to give up what I think is right, even
  though I think it is hopeless.          - Admiral Chester W. Nimitz
