[Snort-devel] More work on fast string matching algorithms

Mike Fisk mfisk at ...86...
Thu May 31 23:29:27 EDT 2001


In January, the folks at Silicon Defense posted a paper they had done on
fast string matching algorithms for Snort.  We have concurrently worked on
the fast string matching problem (especially for large string sets) and
this mail describes some of the differences between our work and the
Silicon Defense approach, and provides a URL for our work.

In January, the folks at Silicon Defense posted a paper they had done on
fast string matching algorithms for Snort:
	
	http://www.silicondefense.com/software/acbm/

During the fall, myself and George Varghese at UCSD had independently
started pursuing the same line of reasoning.  We submitted an early
version of that paper to a conference in late January before learning of
the Silicon Defense work.  We have since posted a slightly revised paper
at the following URL:

	http://home.lanl.gov/mfisk/papers/ucsd-tr-cs2001-0670.pdf

It's interesting to see where we did and did not take the same approach.  
In particular, we chose not to mess with the Snort rule data structures
(or order of processing).  This led to some interesting work regarding the
grouping of pattern strings and caching the results of set-wise string
searches.  The benefit is that the order of rules (and consequently
alerts) does not have to change.  We also handled some cases that they
punted on, such as case-sensitive searches and multiple content strings
per rule.

Previous reviewers of our paper pointed out a some previous work done in
the theory community that we had not previously found.  The Silicon
Defense paper covers some of these algorithms, but we have since found
additional algorithms that deserve further study.  We intend to pursue
these algorithms further and post those results at a later date.

In the meantime, is anybody actively incorporating optimized string
matching in a current version of Snort?  If so, I would suggest reading
our paper as well.  I haven't looked at the code that Jason Coit released,
but from reading the write-up, I would speculate that our caching of
search results might be easier to implement and have similar performance.  
However I expect that neither our algorithm presented in our paper, nor
the suffix-tree based algorithm in the Silicon Defense implementation will
be the final word on best performance.

Thanks,
--
Mike Fisk, RADIANT Team, Network Engineering Group, Los Alamos National Lab
See http://home.lanl.gov/mfisk/ for contact information






More information about the Snort-devel mailing list