Research Interests

I work in computer networking, with a particular interest in the modelling and measurement of tele-traffic, especially the TCP/IP packet data flowing over the Internet. In general terms, the aim is to understand in greater detail how traffic sources, network structure, and protocols interact, with a view to making the network and end applications more efficient. This has led to work in a number of seemingly different areas, including statistical estimation and clock synchronisation.

Clock Synchronisation

Software clocks in computers are based on local hardware, which is synchronised to more accurate remote clocks. Currently the NTP system is used to synchronise hosts to remote servers across the Internet. The stability of modern PC hardware, however, actually supports higher accuracy and robustness than NTP currently delivers. We are developing a replacement for the NTP clients and servers based on new principles, in particular the need to distinguish between difference clocks and absolute clocks, and the associated primacy of rate stability over absolute clock error. The RAD difference clock, for example, can measure RTTs to under a microsecond, even if connectivity to the time server is lost for periods of over a week!
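
The distinction between a difference clock and an absolute clock can be illustrated with a minimal sketch. It is not taken from the RADclock code: the function names, the 1 GHz counter, and the sizes of the rate and origin errors are all hypothetical values chosen for illustration. A difference clock only scales a difference of raw counter readings by a period estimate, so it is affected by the rate error alone, whereas an absolute clock also carries whatever error is present in its estimate of the time origin.

```python
def difference_clock(counter_start, counter_end, period_estimate):
    """Length in seconds of the interval between two raw counter readings."""
    return (counter_end - counter_start) * period_estimate


def absolute_clock(counter, period_estimate, origin_estimate):
    """Absolute time in seconds: the scaled counter plus an estimated origin."""
    return counter * period_estimate + origin_estimate


# Hypothetical 1 GHz counter whose period estimate is wrong by 0.1 parts per million.
TRUE_PERIOD = 1e-9
PERIOD_ESTIMATE = TRUE_PERIOD * (1 + 1e-7)

# Hypothetical origin (offset) estimate that is wrong by 50 microseconds.
TRUE_ORIGIN = 0.0
ORIGIN_ESTIMATE = 50e-6

# A round-trip time lasting 1 ms, that is 1,000,000 counter ticks.
send_count = 5_000_000_000
recv_count = send_count + 1_000_000

# Error of each clock relative to a clock with perfect period and origin.
rtt_error = abs(
    difference_clock(send_count, recv_count, PERIOD_ESTIMATE)
    - difference_clock(send_count, recv_count, TRUE_PERIOD)
)
absolute_error = abs(
    absolute_clock(recv_count, PERIOD_ESTIMATE, ORIGIN_ESTIMATE)
    - absolute_clock(recv_count, TRUE_PERIOD, TRUE_ORIGIN)
)

print(f"RTT error:      {rtt_error:.3e} s")
print(f"absolute error: {absolute_error:.3e} s")
```

Under these assumed values the RTT error is the product of the interval length and the relative rate error, far below a microsecond, while the absolute error is dominated by the origin error. This is only meant to show why rate stability matters most for interval measurement.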

The aim of the SyncLab Project is to provide a complete new system for network timing. Currently, client software is available for Linux and BSD Unix which can connect to existing NTP servers. Download details, documentation and a number of publications can be found on the project page.

This project has been made possible in part by grants from the Australian Research Council, the Cisco University Research Program Fund at Silicon Valley Community Foundation, two Google Research Grant Awards, and a partnership with Symmetricom Inc. (now Microsemi).

Network Inference

By Network Inference we mean the application of sophisticated statistical techniques to translate imperfect network measurement data into an understanding of the operation, mechanisms, state, use, performance, and fairness of the network. For example, the inference techniques of Network Tomography use data probes like X-rays to look `inside' the network body to locate overloaded links. Such a capability is valuable across the spectrum of network users: for the Internet public to determine who is responsible for slow downloads, for network operators to troubleshoot their networks, and for regulators to police compliance with Service Level Agreements. I am active in the following three directions within network inference.

Active Probing     Here test packets or `probes' are injected into the network and collected at a set of receivers around the network edge, and inferences are made about the end-to-end path based on measured end-to-end delays and/or losses. My interests in this area range from the underlying measurement infrastructure, through the `heuristic' design of effective probe streams and their analysis, to the rigorous application of queueing theory to active probing problems. My colleagues in this area include Attila Pásztor, François Baccelli, Sridhar Machiraju, and Jean Bolot. The current focus is on the theoretical side, trying to build up a science of convex networks, a property which will allow optimal probing strategies to be well defined and devised.

Network Tomography     Whereas in active probing the probes typically follow a single end-to-end path, which is modelled as a sequence of queues, by Network Tomography we mean a class of inversion problems (which may or may not involve probing) which is much more ambitious in the spatial dimension (multiple sources and receivers over the network) but treats nodes using simple black box models for loss or delay. For example, a link may be characterised simply by a single number, a loss probability. My work in this area primarily concerns multicast probes which flow from a single source to multiple receivers, with copies being made at each branch point, tracing out a measurement tree in the process. I have worked on loss, delay, and topology inference in this context, with a major focus on generalising beyond the classical simplifying assumptions of perfect spatial and temporal independence. My colleagues here include Vijay Arya, Nick Duffield, François Baccelli, and Rhys Bowden.
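
To make the idea of multicast loss inference concrete, here is a minimal sketch for the simplest possible measurement tree: one shared link from the source to a branch point, followed by one link to each of two receivers. It assumes the classical setting of independent losses across links and across probes, which is exactly the simplification the work above aims to move beyond. The function names and the pass probabilities are hypothetical. Under that assumption the probability that receiver 1 sees a probe is a0·a1, for receiver 2 it is a0·a2, and for both it is a0·a1·a2, so each link's pass probability can be recovered from the three observed frequencies.

```python
import random


def simulate_probes(n, a0, a1, a2, seed=0):
    """Simulate n multicast probes on a two-receiver tree.

    a0 is the pass probability of the shared link, a1 and a2 those of the
    links to receivers 1 and 2. Losses are independent (an assumption).
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n):
        shared = rng.random() < a0          # probe survives the shared link
        r1 = shared and rng.random() < a1   # copy reaches receiver 1
        r2 = shared and rng.random() < a2   # copy reaches receiver 2
        outcomes.append((r1, r2))
    return outcomes


def estimate_pass_rates(outcomes):
    """Estimate (a0, a1, a2) using only what the receivers observe."""
    n = len(outcomes)
    p1 = sum(1 for r1, _ in outcomes if r1) / n           # P(receiver 1 sees probe)
    p2 = sum(1 for _, r2 in outcomes if r2) / n           # P(receiver 2 sees probe)
    p12 = sum(1 for r1, r2 in outcomes if r1 and r2) / n  # P(both see probe)
    return p1 * p2 / p12, p12 / p2, p12 / p1


outcomes = simulate_probes(200_000, a0=0.90, a1=0.95, a2=0.80)
a0_hat, a1_hat, a2_hat = estimate_pass_rates(outcomes)
print(a0_hat, a1_hat, a2_hat)
```

The estimator never observes the shared link directly: its loss rate is inferred purely from the correlation between the two receivers' observations.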

Route Tracking (advanced Traceroute)     One of the oldest probe based inference tools is traceroute, which makes use of features of the TCP/IP/ICMP protocol suite to trace out the IP-level path between a source and destination.   However, because of load balancing, a high proportion of routes in the Internet today have multiple branches, and failing to take this into account can produce meaningless topology inferences.  Paris Traceroute is a generalised Traceroute tool which attempts to trace routes as they really are, whether branched or not.  I work with Paris Traceroute researchers Renata Teixeira, Timur Friedman, Christophe Diot, and Ítalo Cunha in applying statistical ideas to the problem of controlling the error in what is effectively topology estimation, and in efficiently tracking (branched) route changes over time.

Traffic Sampling and Sketching

In resource constrained environments such as within core Internet routers, accurate measurement of traffic features and statistics can be difficult. Two canonical approaches to fast approximate measurement are:  sampling of the data, and sketching, which means the use of compact data structures which are fast to update, but which store information imperfectly.  My work in this area has focussed on the measurement of the flow size distribution (number of packets in a flow such as a TCP connection).  This is an important metric for numerous applications including traffic modelling, management, and attack detection. 

We evaluate data collection mechanisms in a Fisher Information framework, comparing various sampling and sketching approaches in order to determine which inherently captures the most information about the distribution.  We developed the Dual Sampling (DS) and the Optimised Flow Sampling Sketch (OFSS) (see below for OFSS code) methods which are both capable of being implemented at high speed.  This work is with Paul Tune.
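
As a toy illustration of why the choice of sampling scheme matters for the flow size distribution, the sketch below compares two simple schemes on a synthetic population: sampling each packet independently, and sampling whole flows. It does not implement DS or OFSS, and the population (half the flows with 1 packet, half with 10), the sampling rates, and the function names are all assumptions chosen for illustration.

```python
import random


def packet_sample(flow_sizes, p, rng):
    """Keep each packet independently with probability p; return observed sizes."""
    observed = []
    for size in flow_sizes:
        kept = sum(1 for _ in range(size) if rng.random() < p)
        if kept > 0:  # flows with no sampled packets are never seen at all
            observed.append(kept)
    return observed


def flow_sample(flow_sizes, q, rng):
    """Keep each flow, with all of its packets, independently with probability q."""
    return [size for size in flow_sizes if rng.random() < q]


def fraction_of_size(sizes, target):
    """Fraction of flows in `sizes` whose size equals `target`."""
    return sum(1 for s in sizes if s == target) / len(sizes)


rng = random.Random(1)
flows = [1] * 50_000 + [10] * 50_000  # synthetic population of flow sizes

true_frac = fraction_of_size(flows, 1)
pkt_frac = fraction_of_size(packet_sample(flows, 0.1, rng), 1)
flow_frac = fraction_of_size(flow_sample(flows, 0.1, rng), 1)

print(f"true fraction of 1-packet flows:      {true_frac:.3f}")
print(f"naive estimate under packet sampling: {pkt_frac:.3f}")
print(f"naive estimate under flow sampling:   {flow_frac:.3f}")
```

With flow sampling the observed sizes are an unbiased subsample of the true sizes, so the raw histogram can be used directly. With packet sampling, large flows are thinned into apparently small ones and many small flows vanish, so the observed histogram must be inverted statistically before it says anything about the true distribution.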

Traffic Modelling and Time Series Estimation Tools

Packet traffic has scale invariance features, in particular long range dependence (LRD), which impacts on network performance, performance analysis, accuracy of simulation, and parameter estimation. My underlying interest has been in traffic modelling, but in the analysis of real data the need for more powerful estimation tools naturally arises, and I have also worked extensively in this area. Much of the work here involves wavelets and is in collaboration with Patrice Abry from the Signal Analysis group of the Ecole Normale Supérieure de Lyon. We confirmed that fractal traffic is real - not just an artifact of poor estimation tools - and introduced wavelet analysis to the area. Other colleagues include Patrick Flandrin, Murad Taqqu, Walter Willinger, and Matthew Roughan. Associated Matlab code is available for download at the links below.
With my former student Nicolas Hohn and also Patrice Abry we developed models of packet arrivals based on cluster point processes which describe a number of features of backbone traffic parsimoniously. One of the conclusions is that TCP flows can be treated as independent in the Internet core!  Another is that multifractal models are not needed to describe some of the other features which may appear scaling if the statistical methods used to examine them are not powerful enough.
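
The basic idea behind wavelet-based estimation of long range dependence can be sketched in a few lines. This is a deliberately simplified illustration and not the Matlab code referred to on this page: it uses the Haar wavelet, an unweighted least-squares fit, and no bias correction, and the function names are hypothetical. For a stationary LRD process with Hurst parameter H, the variance of the wavelet detail coefficients at octave j grows like 2^(j(2H-1)), so the slope of the log2 energy against j gives an estimate of 2H-1.

```python
import math
import random


def haar_log_energies(x, num_octaves):
    """Return log2 of the mean squared Haar detail coefficient at each octave."""
    approx = list(x)
    log_energies = []
    for _ in range(num_octaves):
        pairs = len(approx) // 2
        # One level of the orthonormal Haar transform: details and approximations.
        details = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(pairs)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(pairs)]
        log_energies.append(math.log2(sum(d * d for d in details) / pairs))
    return log_energies


def estimate_hurst(x, num_octaves=6):
    """Estimate H from the slope of log2 energy versus octave (slope = 2H - 1)."""
    y = haar_log_energies(x, num_octaves)
    j = list(range(1, num_octaves + 1))
    j_mean = sum(j) / len(j)
    y_mean = sum(y) / len(y)
    # Ordinary (unweighted) least-squares slope of y against j.
    slope = sum((a - j_mean) * (b - y_mean) for a, b in zip(j, y)) / sum(
        (a - j_mean) ** 2 for a in j
    )
    return (slope + 1) / 2


# Sanity check on white Gaussian noise, which has no LRD (H = 0.5).
rng = random.Random(42)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
h_hat = estimate_hurst(white_noise)
print(f"estimated H for white noise: {h_hat:.3f}")
```

A practical estimator would weight the regression by the number of coefficients available at each octave and choose the range of octaves carefully, since the scaling only holds at large scales for real traffic.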



Downloads

OFSS (Optimized Flow Sampling Sketch) for high-speed estimation of the flow size distribution

The Flow Sampling Sketch, or FSS, is a skampling method, that is a hybrid between sampling and sketching, which allows the flow size distribution to be estimated with very low resource requirements in both time and memory.  The OFSS method is an optimally tuned/calibrated FSS with a statistical performance which is within a constant factor of Flow Sampling, which is known to be optimal.  The OFSS Matlab code allows the critical calibration parameter of OFSS, pf*,  to be calculated for any given input load alpha.  It also gives the associated amount of information, and minimal variance, of any estimator using the OFSS method. More details are given on the code page.

Wavelet based tools for estimation with scaling processes

The second order estimation code includes a number of related capabilities:
The multifractal estimation code goes beyond second order:

Clock Synchronisation

Details of the RADclock project (formerly known as the TSCclock) and the current release can be found on the old RADclock page and the new (work in progress) SyncLab Project site.