Happy DPI release day, everyone! Three months ago we were talking about how DPI technology could be improved further. Essentially, we were not satisfied with the current state of things: there is no viable BSD-licensed code that’s fit for use in high-load, mission-critical networking infrastructure. So we read a lot of the papers floating around on the topic and started hacking. Now we proudly present our Lightweight Inspection to the public.
Our key requirements were as follows:
- The engine MUST be open, free and simple.
- The engine MUST use as few bytes as possible for application matching.
- The engine MUST provide a decision after the first payload packet.
- The engine MUST work asymmetrically.
- The engine MUST NOT extract higher layer information.
- The engine MUST NOT match undocumented or encrypted protocols.
- The engine SHOULD classify larger protocol suites as a single application.
- The engine SHOULD only see the actual application data stream.
The concept of lightweight packet inspection has been expanded to be pseudo-stream-based instead of packet-based. This approach is best described as Lightweight Inspection. It means that as few bytes as possible at the beginning of a flow are inspected for known protocol patterns. Text protocols are mainly matched by their unique keywords, while binary protocols are matched by checking magic values, version numbers, padding, and general header consistency. We found regular expressions unsuitable for almost all binary protocols. Of course, only well-known application layer protocols can be matched in this way. Also, the inspection is designed asymmetrically: both directions of the flow are inspected independently. The accuracy is then improved by combining the results of both directions. That’s especially useful with applications that may look alike in one direction, but can still be distinguished in the bigger picture.
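To make the idea a bit more concrete, here is a minimal sketch of what such matching could look like. It is not taken from libpeak: the names `app_id`, `match_http`, `match_tls`, `classify_direction` and `combine_directions` are hypothetical, and the rule for combining both directions is only one plausible policy we assume for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical application identifiers, not libpeak's own constants. */
enum app_id { APP_UNKNOWN = 0, APP_HTTP, APP_TLS };

/* Text protocol: look for a unique keyword at the very start of the payload. */
static int
match_http(const uint8_t *buf, size_t len)
{
	static const char *const keywords[] = {
		"GET ", "POST ", "HEAD ", "HTTP/1."
	};
	size_t i;

	for (i = 0; i < sizeof(keywords) / sizeof(keywords[0]); ++i) {
		size_t klen = strlen(keywords[i]);

		if (len >= klen && !memcmp(buf, keywords[i], klen)) {
			return (1);
		}
	}

	return (0);
}

/*
 * Binary protocol: check the 5-byte TLS record header for consistency,
 * i.e. content type, version number and a sane record length.
 */
static int
match_tls(const uint8_t *buf, size_t len)
{
	uint16_t reclen;

	if (len < 5) {
		return (0);
	}
	if (buf[0] != 22) {
		/* only a handshake record is expected first */
		return (0);
	}
	if (buf[1] != 3 || buf[2] > 4) {
		/* record version must be 3.0 through 3.4 */
		return (0);
	}

	reclen = (uint16_t)((buf[3] << 8) | buf[4]);

	return (reclen != 0 && reclen <= 16384 + 2048);
}

/* Inspect the first payload bytes of a single direction. */
static enum app_id
classify_direction(const uint8_t *buf, size_t len)
{
	if (match_tls(buf, len)) {
		return (APP_TLS);
	}
	if (match_http(buf, len)) {
		return (APP_HTTP);
	}

	return (APP_UNKNOWN);
}

/*
 * Assumed combination policy: agreeing results win, a single known
 * result is accepted, and conflicting results yield no decision.
 */
static enum app_id
combine_directions(enum app_id fwd, enum app_id rev)
{
	if (fwd == rev) {
		return (fwd);
	}
	if (fwd == APP_UNKNOWN) {
		return (rev);
	}
	if (rev == APP_UNKNOWN) {
		return (fwd);
	}

	return (APP_UNKNOWN);
}
```

Each direction is classified from its own first payload bytes, and the two per-direction results are merged afterwards. A real matcher would check more header fields and know many more applications, but the shape stays the same: a handful of cheap comparisons on a few leading bytes.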
The source code is part of the libpeak library. It is written in C, can be found on GitHub and is released under the ISC license, so feel free to contribute back, fork or use it in your own commercial products as you see fit. A set of man pages has been provided to guide you through all individual modules and the peek(1) utility. The peak_li(3) manual page is quite extensive and covers a lot of internals as well as how to write new application match functions. The list of currently known applications is as follows:
Border Gateway Protocol, BitTorrent, Concurrent Versions System, Dynamic Host Configuration Protocol, Domain Name System, Datagram Transport Layer Security, File Transfer Protocol, Gnutella, Hypertext Transfer Protocol, Internet Key Exchange, Internet Message Access Protocol, Instant Messaging and Presence Protocol, Internet Relay Chat, Lightweight Directory Access Protocol, Network Basic Input Output System, OpenVPN, Post Office Protocol (Version 3), Point-to-Point Tunneling Protocol, Routing Information Protocol, Real-time Transport Control Protocol, Real-time Transport Protocol, Session Initiation Protocol, Simple Mail Transfer Protocol, Simple Network Management Protocol, Secure Shell, Session Traversal Utilities for NAT, Syslog Protocol, Telecommunication Network, Trivial File Transfer Protocol, Transport Layer Security, Extensible Messaging and Presence Protocol, Internet Control Message Protocol, Internet Group Management Protocol and Open Shortest Path First.
We have also added seamless IPv6 support to the mix, support for different network capture formats as well as a tree-based flow tracking algorithm to make the code easier to work with. However, we refrain from attaching a specific version number to the library at this point, because the code is still subject to change and not considered fully stable. We work on it on a daily basis and will continue to push fixes and new application matching functions as we encounter them.
We also kindly ask you to share with us your thoughts and experiences with the library.
We have found that, given proper documentation, a matching function can be written that makes stream-based detection on the first few bytes of a flow viable without sacrificing much accuracy. However, the presented approach does not work very well with undocumented or encrypted protocols. It is but one small step towards a vast pool of deep packet inspection technologies. Other types of DPI should be implemented alongside or on top of our Lightweight Inspection. In this regard, we will continue to work on increasingly difficult requirements and try to share our findings with you from time to time.
By itself, Lightweight Inspection is mainly useful for traffic shaping and policing, flow prioritisation and diagnostic monitoring. Of course, most of these scenarios require packet input and output bindings (for libpcap, netfilter, netmap, injection into the kernel networking code and others) to make use of the presented code. It is left to the reader to implement these bindings.
Care should be taken when implementing security-related policies: the provided output does not verify the application, so evasive applications can disguise themselves as others. Keep in mind that, due to the inherent nature of the given requirements, a complex application cannot be described and verified in its full scope. Instead, this work is meant as a gatekeeper for selecting the appropriate encompassing application inspection.
We would like to acknowledge that a lot of wonderful people have worked very hard in the field of DPI research, and without them none of this would have been possible. Special thanks go to OpenDPI, libprotoident, Wireshark, Cloudshark and the authors of these insightful papers: "PortLoad: taking the best of two worlds in traffic classification", "Lightweight, Payload-Based Traffic Classification: An Experimental Evaluation" and "Libprotoident: Traffic Classification Using Lightweight Packet Inspection". Also, thanks go to the courageous lot of people who have offered valuable feedback, support and interest in the project. Cheers, guys!