[Snort-devel] [hacker at ...104...: Unicode and IDS evasion]

Martin Roesch roesch at ...48...
Mon Nov 6 11:01:28 EST 2000

Fyodor wrote:
> Gee.. we definitely need to have some feedback from the hosts/systems which
> we monitor if we want to stay effective with IDS. If we take
> the same approach here as we took with http encoding (e.g. just
> deploy a plugin to decode it), we will miss numerous attacks related to UNICODE
> as well. Integration with nessus database is one option, other thoughts?

I think we're going to need to eventually have an HTTP/UTF8 parser/decoder that
can hand the decoded application layer data off to a content pattern matcher
that just looks at the web content.  Fortunately, the guys at SiliconDefense
are working on something that'll be very helpful in performing this task.
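
A rough sketch of that decode-then-match pipeline, in Python for brevity
(Snort itself is C). Every name here is hypothetical and this is not Snort
or SiliconDefense code. It assumes the monitored server accepts
non-shortest-form UTF8, so the decoder deliberately does no shortest-form
check.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def percent_decode(raw: bytes) -> bytes:
    """Turn %XX escapes into raw bytes; malformed escapes are left as-is."""
    out = bytearray()
    i = 0
    while i < len(raw):
        if (raw[i] == 0x25 and i + 2 < len(raw)
                and raw[i + 1] in HEX_DIGITS and raw[i + 2] in HEX_DIGITS):
            out.append(int(raw[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(raw[i])
            i += 1
    return bytes(out)


def lenient_utf8_decode(data: bytes) -> str:
    """Decode UTF8 bit patterns without rejecting overlong forms.

    Assumption: this imitates a permissive server, not a conforming decoder.
    Bytes that do not start a complete sequence are passed through one by one.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0xC0 <= b <= 0xDF:
            need, cp = 1, b & 0x1F
        elif 0xE0 <= b <= 0xEF:
            need, cp = 2, b & 0x0F
        elif 0xF0 <= b <= 0xF7:
            need, cp = 3, b & 0x07
        else:
            need, cp = 0, b  # ASCII, stray continuation byte, or invalid lead
        tail = data[i + 1:i + 1 + need]
        if len(tail) == need and all(0x80 <= t <= 0xBF for t in tail):
            for t in tail:
                cp = (cp << 6) | (t & 0x3F)
            i += 1 + need
        else:
            cp = b  # truncated sequence: keep the lead byte literally
            i += 1
        out.append(chr(cp) if cp <= 0x10FFFF else "\ufffd")
    return "".join(out)


def normalize_uri(raw: bytes) -> str:
    """Percent-decode first, then apply the lenient UTF8 decoding."""
    return lenient_utf8_decode(percent_decode(raw))


def match_content(raw_uri: bytes, patterns: list) -> list:
    """Return the patterns found in the decoded form of the URI."""
    text = normalize_uri(raw_uri)
    return [p for p in patterns if p in text]
```

With this, match_content(b"/scripts/..%c0%af../winnt/", ["../"]) reports the
traversal pattern even though the raw bytes never contain "../".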

> What we could have is some sort of internal database (need to think of a
> fast search method for it by IP address as a key) to look up host-specific
> information.
> Maybe some keyword(s) for rules (e.g. generic) could be introduced as well,
> since we probably will need to point out some rules which do not require
> host specific info...
> any thoughts?

Target-based IDS.  I've described this concept in detail on the IDS mailing
list, and I know how to implement a system like this that'll do what we need. 
We'll definitely look into implementing something like this in The Future
(2.0?), but right now we'll just have to be clever in how we handle this
(multi-path the application layer normalization so that it has both the
normalized and un-normalized data available?).
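
To make the multi-path idea concrete, here is a small sketch under my own
assumptions: each rule says which view of the payload it wants, and the
matcher keeps both views around. The Rule class, the view names and the use
of plain percent-decoding as the "normalizer" are all hypothetical stand-ins,
not a design for Snort.

```python
from dataclasses import dataclass
from urllib.parse import unquote_to_bytes


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: bytes
    view: str  # "raw", "normalized", or "either" (hypothetical keyword)


def build_views(payload: bytes) -> dict:
    """Keep the original bytes next to a normalized copy of them."""
    return {
        "raw": payload,
        # Stand-in normalizer: percent-decoding only.
        "normalized": unquote_to_bytes(payload),
    }


def evaluate(payload: bytes, rules: list) -> list:
    """Return the names of the rules whose pattern occurs in a wanted view."""
    views = build_views(payload)
    hits = []
    for rule in rules:
        wanted = ("raw", "normalized") if rule.view == "either" else (rule.view,)
        if any(rule.pattern in views[v] for v in wanted):
            hits.append(rule.name)
    return hits
```

A rule that looks for the encoding trick itself (say b"%2e" in the raw view)
and a rule that looks for its effect (b"../" in the normalized view) can then
both fire on the same packet.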

We'll think of something, but I'm focused more on getting 1.7 out right now.


> ----- Forwarded message from Eric Hacker <hacker at ...104...> -----
> From: Eric Hacker <hacker at ...104...>
> Date:         Sun, 29 Oct 2000 23:06:39 -0500
> To: FOCUS-IDS at ...84...
> Subject:      Unicode and IDS evasion
> X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
> Reply-To: Eric Hacker <hacker at ...104...>
> Unicode and IDS evasion.
> Everyone by now is aware of the IIS Unicode vulnerability [1]. Robert Graham
> had mentioned in a previous post [2] that his company had a UTF8 parser
> coded and released within hours of the announcement to catch this attack and
> its variations.
> I got to thinking about that. Wow, a UTF8 parser in a couple of hours? That
> can't be right. With all the different language options and whatnot, I doubt
> I'd even be able to make a list of things to check in a few hours. Bruce
> Schneier has warned about the complexities in Unicode processing and what it
> might mean to security [3], now we've seen a vulnerability. To get it right
> in a few hours seems very difficult.
> So I took a look at the write up by Network Ice on what they were looking
> for. [4] They say:
>    UTF8 is a multibyte character set, which means it can
>    use one, two, or more bytes to represent a single
>    character. It is used to represent non-English
>    characters beyond the traditional 7-bit ASCII. In
>    particular, it is used for far-east characters such
>    as Chinese, Japanese, and Korean. In all, UTF8 can
>    represent over 30,000 different characters.
> Wow. 30,000 different characters! What does IIS do with all those? I started
> looking for documentation from Microsoft to answer that, but still haven't
> come up with anything clear. I read on:
>    By using multiple bytes to represent a traditional
>    7-bit ASCII character, an intruder can evade an
>    intrusion detection system (IDS) or compromise a web
>    server by evading canonicalization/normalization.
> Ah yes, evade an IDS. That was what I was thinking. Why, everyone well
> schooled in Ptacek and Newsham [5] or more recently Hoglund and Gary [6]
> will realize the difficulty in keeping IDS synchronized with the server.
> Unicode doesn't make that any easier. I read on:
>    Canonicalization/normalization is a process whereby
>    a web server strips off the "backtracking"
>    subdirectory of "../". By URL encoding the backtracking
>    subdirectories, an intruder could bypass this process,
>    and thereby access any file on the system.  ...
>    With UTF8 encoded backtracking, it might look like:
>    http://networkice.com/something/%C0%AF%C0%AF/default.htm
>    This alert triggers when such an attempt has been made.
> First, I thought %C0%AF was a '/' and %C0%AE was the '.', but I could have
> crossed my test results. What really concerns me, though, is that my
> understanding of English would say that the above means that they only alert
> on UTF8 encoded backtracking attempts. That doesn't help much on the evasion
> front. So then I thought maybe it wasn't relevant for other characters,
> because there was no other way to encode them.
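
The bit arithmetic behind those two escapes can be checked with a short
Python sketch. The function names are made up for illustration. It only shows
what a decoder that skips the shortest-form check would compute; it says
nothing about what IIS actually does internally.

```python
def two_byte_value(lead: int, trail: int) -> int:
    """Value carried by the bit pattern 110xxxxx 10yyyyyy, with no
    shortest-form check (the check a strict decoder would add)."""
    if lead & 0xE0 != 0xC0 or trail & 0xC0 != 0x80:
        raise ValueError("not a 2-byte UTF8 bit pattern")
    return ((lead & 0x1F) << 6) | (trail & 0x3F)


def overlong_two_byte(ch: str) -> bytes:
    """Build the non-shortest 2-byte form of a 7-bit ASCII character."""
    cp = ord(ch)
    if cp > 0x7F:
        raise ValueError("only 7-bit ASCII has an overlong 2-byte form")
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


def is_strict_utf8(data: bytes) -> bool:
    """True if Python's conforming decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Here chr(two_byte_value(0xC0, 0xAF)) is '/' and chr(two_byte_value(0xC0, 0xAE))
is '.', while is_strict_utf8(b"\xc0\xaf") is False, which is exactly the gap
between a strict and a permissive decoder.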
> Lacking a clear table indicating how IIS interprets UTF8, I did some
> testing. I ran through some potential UTF8 codes on my unpatched W2K IIS
> test server. I examined the logs to determine what IIS thought the URL was.
> I found thirteen representations for the letter 'a'. I tested all of these
> and successfully retrieved a URL (http://myserver/a.txt) encoded with them.
> That doesn't bode well for IDS trying to monitor for hostile activity. If
> a.txt was a vulnerability, would any IDS vendor have caught it?
> A search of various (but not all) IDS vendors' web sites does not bring up
> any declaration of full Unicode support. Other than Network ICE, I really
> didn't see mention of it. It is unclear to me exactly what UTF8 characters
> Network ICE detects. I feel fairly confident that UTF8 encoding of an attack
> would bypass most if not all network IDS today.
> Do you want a code page with that?
> I'm no IIS or W2K wizard, but from what I can tell, it seems that when W2K
> is set up for different languages, then the interpretation of UTF8
> characters will be different. I found this in the IIS documentation [7]
> regarding code pages:
>    A code page can be represented in a table as a mapping
>    of characters to single-byte values or multibyte
>    values. Many code pages share the ASCII character set
>    for characters in the range 0x00 - 0x7F.
> If one thinks synchronizing the IDS to the way overlapping fragments are
> dealt with by the TCP/IP stack of various OSs is difficult, try matching the
> code pages the web server is using. This is sure to be a major hassle for
> IDS monitoring of non-English versions of Microsoft IIS.
> A light at the end of the tunnel?
> All is not lost, however. Redirecting the focus of the IDS and standard web
> server coding practices should alleviate much of this problem. I call it
> client side normalization. I introduced it in my paper [8]; here I will
> apply it to the Unicode parsing problem.
> A URL may consist of a resource location and perhaps data being submitted.
> The data is typically preceded by a '?'. The resource location portion of a
> URL is almost always generated by the web server owner. Web browsers do not
> change the form of the resource location when submitting data. In this way I
> claim that web browsers represent client side normalization.
> Certainly this cannot be trusted. There are many ways for one to send any
> URL one wants to a web server. However, the normal URLs submitted by normal
> web browsers will be those previously generated by the server itself. If a
> URL is not in this set, it is probably an event of interest.
> Web server owners should not create URLs that require reduction or
> canonicalization. Why use a multi-byte character to represent 'a'?
> Therefore, when reduction or canonicalization takes place in the resource
> location, it is an event worth noting. I repeat: the presence of reduction
> or canonicalization within the resource location part of a URL is itself
> anomalous, likely malicious, and worthy of a NIDS alert.
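
One way to picture that check is the Python sketch below. It flags a
resource location that would need reduction, without caring what it reduces
to. The function name and the three tests are illustrative assumptions, and
the "unreserved" set is the modern RFC 3986 one, which is my choice rather
than anything stated in the post.

```python
import re
import string
from urllib.parse import unquote_to_bytes

# Assumption: characters a well-behaved client never needs to escape.
UNRESERVED = set((string.ascii_letters + string.digits + "-._~").encode("ascii"))
ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def reduction_reasons(request_target: str) -> list:
    """List the reasons the resource location (the part before '?')
    would need reduction or canonicalization; empty means none found."""
    path = request_target.split("?", 1)[0].encode("ascii", "replace")
    reasons = []

    # 1. An escape that hides a character needing no escape, e.g. %61 for 'a'.
    if any(int(m.group(1), 16) in UNRESERVED for m in ESCAPE.finditer(path)):
        reasons.append("escaped unreserved character")

    # 2. Decoded bytes that a strict UTF8 decoder rejects, which covers
    #    overlong forms such as C0 AF.
    decoded = unquote_to_bytes(path)
    try:
        text = decoded.decode("utf-8")
    except UnicodeDecodeError:
        reasons.append("non-shortest or malformed UTF8")
        text = decoded.decode("latin-1")

    # 3. Dot segments left for the server to collapse.
    if any(segment in (".", "..") for segment in text.split("/")):
        reasons.append("dot segment")

    return reasons
```

A sensor built this way would alert on any non-empty result, so "/a.txt"
passes while "/%61.txt" and "/something/%C0%AF%C0%AF/default.htm" do not.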
> It is highly likely that the same can be said for the data portion of the
> URL. Client side processing can be used to validate data within an accepted
> character set. Again, this does not provide assurance. It does identify data
> that contains non-standard encoding as anomalous.
> Thus, by looking for reduction or canonicalization itself, one can generate
> alerts for likely malicious traffic and solve much of the Unicode problem.
> Perhaps by using an overly inclusive character set that includes all
> reduction possibilities from all languages one can avoid the language code
> page problem. Otherwise the code page problem will require extensive
> configurability within the IDS to protect international Unicode capable
> services.
> Is it pattern matching or is it protocol analysis?
> I have an interesting tidbit to add to all this. The other day I was reading
> a draft white paper that was given to an acquaintance. The paper was
> supposedly authored by someone at NetworkIce and was arguing that third
> generation IDS that performed protocol analysis were much better than
> earlier IDS that just did pattern matching. The acquaintance said, "It all
> boils down to pattern matching in the end."
> I read the paper. I had to agree. There was nothing in that paper that
> identified any particular advantage of protocol analysis other than that it
> was a better way to reduce data before pattern matching. Yes, it is a better
> way to reduce data, but much of what the paper called protocol analysis is
> done by Snort already.
> Here, however, is a clear advantage to protocol analysis. There is no way a
> standard pattern matching IDS can perform Unicode reduction. Sure, someone
> could throw a protocol pre-processor into Snort to handle Unicode, but
> detecting this type of attack can't be done with a reasonably sized pattern
> set. I can see no way of writing a pattern matching signature that will
> detect Unicode within a port 80 stream.
> I welcome any discussion on the ideas presented here. I recognize that my
> limited experience does not encompass all of reality and that my judgments
> may therefore be wrong. If you have evidence or ideas that suggest such,
> please discuss.
> Eric Hacker, GCIA, MCSE, CCSE
> Lucent NPS, Security Practice
> [1] http://www.securityfocus.com/bid/1806
> [2] http://www.securityfocus.com/archive/96/140752
> [3] http://www.counterpane.com/crypto-gram-0007.html#9
> [4] http://www.networkice.com/advice/intrusions/2000639/default.htm
> [5] http://www.robertgraham.com/mirror/Ptacek-Newsham-Evasion-98.html
> [6] http://www.securityfocus.com/focus/ids/articles/desynch.html
> [7] http://windows.microsoft.com/windows2000/en/server/iis/
> [8] http://www.securityfocus.com/focus/ids/articles/resynch.html
> ----- End forwarded message -----

Martin Roesch
roesch at ...48...
