[Snort-users] Questions/Suggestion: Which data to put in the DB?

Jed Pickel jed at ...153...
Mon Jul 31 18:47:41 EDT 2000


Hey Mike,

Thanks for your excellent comments. Responses are included below.

> Well, enough of that.  The real issue of this mail is the database
> interface/module.  I've been going through the RFC's, and have some
> questions/comments on the way snort is saving the data in the database.
> 
> Let's take a look at each protocol:
> 
> IP:
> --------------------------------------------
> RFC791                  Snort           bits
> --------------------------------------------
> Version                                 4
> IHL                                     4
> Type of Service         ip_tos          8
> Total Length                            16
> Identification          ip_id           16
> Flags                                   3
> Fragment Offset         ip_off          13
> TTL                     ip_ttl          8
> Protocol                ip_proto        8
> Header Checksum                         16
> Source IP               ip_src (x4)     32
> Destination IP          ip_dst (x4)     32
> Option                                  24
> Padding                                 8
> --------------------------------------------
> 
> Why split up the ip_src and ip_dst into four different elements?

Excellent question. I actually used a single field to represent an IP
address when I first wrote the database plugin. The reason I changed
to using four one byte ints instead of one four byte int was that I
felt it gave greater flexibility for queries and sorting. For example,
I could easily pull out every broadcast address by just looking at the
last octet, or if I wanted to just see alerts affecting a particular
class C range I could just look at the first three octets. I wanted to
avoid having to rely on a lot of processing of netmasks and such for
applications, and I wanted to make SELECTs and sorting as quick as
possible. The other reason I decided to split the IP into four fields
is that this is generally how humans think about IP addresses, as four
separate numbers. In my opinion, it makes it easier for a human trying
to do analysis when working directly with a database. The negative
side of having four fields is that query strings are longer and harder
to type.
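
As a rough illustration of the two queries described above, here is a
minimal sketch using Python's sqlite3 module. The cut-down table and
the column names ip_dst0 through ip_dst3 are assumptions made for the
example; the real schema may name the four octet columns differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down iphdr table: one column per octet of the address.
conn.execute(
    "CREATE TABLE iphdr (sid INT, cid INT,"
    " ip_dst0 INT, ip_dst1 INT, ip_dst2 INT, ip_dst3 INT)"
)
rows = [
    (1, 1, 192, 168, 1, 255),  # last octet 255
    (1, 2, 192, 168, 1, 10),
    (1, 3, 10, 0, 0, 255),     # last octet 255
    (1, 4, 192, 168, 2, 7),
]
conn.executemany("INSERT INTO iphdr VALUES (?, ?, ?, ?, ?, ?)", rows)

# Alerts aimed at a .255 address: only the last octet is examined.
broadcasts = conn.execute(
    "SELECT cid FROM iphdr WHERE ip_dst3 = 255 ORDER BY cid"
).fetchall()

# Alerts inside 192.168.1.x: only the first three octets are examined.
class_c = conn.execute(
    "SELECT cid FROM iphdr"
    " WHERE ip_dst0 = 192 AND ip_dst1 = 168 AND ip_dst2 = 1 ORDER BY cid"
).fetchall()

print(broadcasts)
print(class_c)
```

Neither query needs any netmask arithmetic, which is the flexibility
argued for above, at the cost of the longer WHERE clause.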

I am curious to hear how others feel about this. There are certainly
good reasons to do it either way.
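
For comparison, a sketch of the other way: the whole address stored in
a single integer column, with a netmask turned into a lowest and
highest address for the SELECT. The single ip_dst column and the
sample addresses are assumptions for illustration only.

```python
import ipaddress
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical alternative layout: the whole address in one integer column.
conn.execute("CREATE TABLE iphdr (sid INT, cid INT, ip_dst INT)")
addresses = ["192.168.1.255", "192.168.1.10", "10.0.0.255", "192.168.2.7"]
for cid, addr in enumerate(addresses, start=1):
    conn.execute(
        "INSERT INTO iphdr VALUES (1, ?, ?)",
        (cid, int(ipaddress.IPv4Address(addr))),
    )

# The network plus netmask gives the lowest and highest address in range.
net = ipaddress.ip_network("192.168.1.0/255.255.255.0")
low, high = int(net.network_address), int(net.broadcast_address)

# One range comparison replaces the three per-octet comparisons.
in_range = conn.execute(
    "SELECT cid FROM iphdr WHERE ip_dst BETWEEN ? AND ? ORDER BY cid",
    (low, high),
).fetchall()
print(in_range)
```

The query is shorter and handles any prefix length, but the
application has to convert between dotted notation and integers.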

> Personally I would like to have them represented as two 32bits integers
> (and use a netmask to get the highest and lowest address for an select
> statement).  It would also be nice to have a timestamp in this table.

You can get the timestamp by joining the event table with the iphdr
table where both the "cid" (count id) and "sid" (sensor id) fields are
equal.
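
A minimal sketch of that join, again with sqlite3. Only the table
names and the sid and cid fields come from the description above; the
timestamp column name, the ip_ttl payload column, and the sample rows
are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down versions of the two tables.
conn.execute("CREATE TABLE event (sid INT, cid INT, timestamp TEXT)")
conn.execute("CREATE TABLE iphdr (sid INT, cid INT, ip_ttl INT)")
conn.execute("INSERT INTO event VALUES (1, 1, '2000-01-01 10:00:00')")
conn.execute("INSERT INTO event VALUES (2, 1, '2000-01-01 10:05:00')")
conn.execute("INSERT INTO iphdr VALUES (1, 1, 64)")
conn.execute("INSERT INTO iphdr VALUES (2, 1, 128)")

# Both sid and cid must match; matching on cid alone would pair rows
# from different sensors in this sample data.
joined = conn.execute(
    "SELECT event.timestamp, iphdr.ip_ttl"
    " FROM event JOIN iphdr"
    " ON event.sid = iphdr.sid AND event.cid = iphdr.cid"
    " ORDER BY event.sid"
).fetchall()
print(joined)
```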

> TCP:
> --------------------------------------------
> RFC793                  Snort           bits
> --------------------------------------------
> Source Port             th_sport        16
> Destination Port        th_dport        16
> Seq Number                              32
> Ack Number                              32
> Data Offset                             4
> Reserved                                6
> URG                     th_flags        1
> ACK                        "            1
> PSH                        "            1
> SYN                        "            1
> FIN                        "            1
> Window                  th_win          16
> Checksum                                16
> Urgent Pointer          th_urp          16
> Options                                 24
> Padding                                 8
> --------------------------------------------
> 
> Why not log the SEQ and ACK numbers too?  That would make it much easier
> to see the packets together as a session.  I'm not sure about the
> reserved and options fields, but I like to be able to take a look at
> everything when I'm investigating an incident. :-)

I left SEQ and ACK out on the first generation because there was no
stream reassembly and at the time I felt the amount of disk space they
would end up consuming outweighed the benefit of including them. Of
course I have changed my mind on this now as there are attacks that
always have the same SEQ and ACK numbers. :) This one is actually on
my todo list. 
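
Once SEQ and ACK are logged, spotting fixed values could be as simple
as counting repeated pairs. A small sketch, with made-up (seq, ack)
values standing in for rows read back from the database:

```python
from collections import Counter

# Hypothetical (seq, ack) pairs taken from logged TCP headers.
packets = [
    (12345, 12345),
    (1958810375, 0),
    (12345, 12345),
    (12345, 12345),
    (3868, 0),
]

# A pair that shows up more than once across packets is worth a look.
repeated = [pair for pair, count in Counter(packets).items() if count > 1]
print(repeated)
```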

Note that I am also in the process of writing code and database
structure to support TCP and IP options. Checksum, reserved, and
offset are the only other TCP fields I left out. I suppose I should
include those too so we can end up with a standard database format for
storing network data.
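
To make the field list concrete, here is a sketch that splits the
fixed 20-byte TCP header into every field of the RFC 793 layout,
including the ones not logged yet. The function name and the
dictionary keys for the unlogged fields are assumptions; this is not
snort's own decoder.

```python
import struct

def parse_tcp_header(header: bytes) -> dict:
    """Split the fixed 20-byte TCP header (RFC 793 layout) into fields."""
    sport, dport, seq, ack, off_flags, win, checksum, urp = struct.unpack(
        "!HHIIHHHH", header[:20]
    )
    return {
        "th_sport": sport,
        "th_dport": dport,
        "seq": seq,
        "ack": ack,
        "offset": off_flags >> 12,            # top 4 bits, in 32-bit words
        "reserved": (off_flags >> 6) & 0x3F,  # next 6 bits
        "th_flags": off_flags & 0x3F,         # URG ACK PSH RST SYN FIN
        "th_win": win,
        "checksum": checksum,
        "th_urp": urp,
    }

# A hand-built SYN segment: ports 1025 -> 80, data offset 5, SYN bit set.
sample = struct.pack(
    "!HHIIHHHH", 1025, 80, 1000, 0, (5 << 12) | 0x02, 8192, 0, 0
)
fields = parse_tcp_header(sample)
print(fields)
```

Options, when present, follow these 20 bytes and run up to the length
given by the data offset.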

> ICMP:
> --------------------------------------------
> RFC792                  Snort           bits
> --------------------------------------------
> Type                    type            8
> Code                    code            8
> Checksum                                16
> div 32 bit                              32
> IP hdr + 64b data                       256
> --------------------------------------------
> 
> There is a 32bits field here that is used for some of the values in the
> code field.  Why not log it?  The IP header and the 64 bits of data, are
> they of any interest?  Again, I like to see everything... :-)

I have certainly used those fields before in some incident analysis
and forensics work. Like you, I like to be able to see everything.
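
For reference, a sketch of how those ICMP fields could be pulled out
of a raw message. The function name and the "rest" and "data" keys
are assumptions for the example; how the 32-bit field is interpreted
depends on the ICMP message type.

```python
import struct

def parse_icmp(message: bytes) -> dict:
    """Split an ICMP message into the fields of the RFC 792 layout."""
    icmp_type, code, checksum, rest = struct.unpack("!BBHI", message[:8])
    return {
        "type": icmp_type,
        "code": code,
        "checksum": checksum,
        "rest": rest,         # the 32-bit field after the checksum
        "data": message[8:],  # error messages: IP header + 64 bits of data
    }

# Type 3 (destination unreachable), code 3 (port unreachable), followed
# by a dummy 28-byte payload standing in for a 20-byte IP header plus
# 8 bytes of the original datagram.
sample = struct.pack("!BBHI", 3, 3, 0, 0) + bytes(28)
fields = parse_icmp(sample)
print(fields["type"], fields["code"], len(fields["data"]))
```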

Note that I am also working on storing the data portion of the
packet. I am planning to do this using either base64 or just ASCII
with the binary filtered out (depending on which command line options
are used to invoke snort, notably whether -C is used with -d). I can
use the same base64 routine to store the 256 bits of original packet
data carried in the ICMP message.
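
A short sketch of the two payload encodings mentioned above. The
function names are hypothetical, and replacing each non-printable
byte with a dot is an assumption about what "filtered out" could
mean; dropping those bytes entirely would be another reading.

```python
import base64

def encode_base64(payload: bytes) -> str:
    """Lossless: every byte survives and can be decoded again."""
    return base64.b64encode(payload).decode("ascii")

def encode_ascii(payload: bytes) -> str:
    """Lossy: printable ASCII is kept, every other byte becomes a dot."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in payload)

payload = b"GET / HTTP/1.0\r\n\x00\xff"
as_b64 = encode_base64(payload)
as_text = encode_ascii(payload)

print(as_b64)
print(as_text)
```

Base64 grows the stored data by about a third but keeps it
recoverable; the ASCII form is readable directly in a query result
but cannot be turned back into the original bytes.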

> UDP:
> --------------------------------------------
> RFC768                  Snort           bits
> --------------------------------------------
> Source Port             uh_sport        16
> Destination Port        uh_dport        16
> Length                  uh_len          16
> Checksum                                16
> --------------------------------------------
> 
> Seems that everything is logged... :-)

Unless you want the checksum.. ;)

> Suggestion:
> 
> What about letting the user choose which data to put in the database,
> and what names to put on the fields?  The best is probably to have this
> as a run-time configuration, but even being able to change this in an
> easy way before compilation would help. :-)

I would like to avoid this because people have built and are building
analysis applications based on this database format. There is room to
allow for different levels of logging (some people do not care about
the details, while you and I do). :) But leaving column names as a
configuration option could lead to chaos.

> And a question: 
> 
> What about develop a standard database layout for anomaly based IDS?  Or
> does this already exist?

There is an IETF working group working on this. You can check out the
details at:

    http://www.ietf.org/html.charters/idwg-charter.html

I am hoping to evolve this database work into a standard way to store
network data in a database, sorta like tcpdump format is the standard
way to store network data in a file. That can only happen if it
evolves over time and people (like you) provide good comments. ;)

Regards,

* Jed



