Skampling
Welcome to the skampling project page, which gives an overview of the work.
Measurement on Networks
Modern carrier networks, which form the backbone of the Internet, carry huge volumes of traffic each day. Measuring statistics of interest on these networks is extremely challenging for two reasons, both explained further below: the traffic arrives at high speed, which imposes strict limits on processing time, and the volume of traffic that has to be collected is large. Any measurement method must meet both challenges.
There are many reasons for measuring these networks. Typical ones include:
- Accounting Purposes: Network operators want to quantify accurately the amount of bandwidth a customer uses so that they can price their services accordingly.
- Anomaly Detection: Attacks on networks are now a common occurrence. A paralysed network causes huge losses in productivity for businesses that rely on it, so being able to detect attacks and defend against them is extremely important.
- Traffic Engineering: Network operators need measurement data to troubleshoot and optimise the network, for example by reducing congestion.
What to Measure?
A natural aggregate in a network is the flow. A flow is a series of packets that share a common key, such as the same source and destination addresses. Often, one is interested in the distribution of flow sizes, where the size of a flow is the number of packets it contains. Alternatively, flow size can be defined as the total number of bytes transmitted.
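To make the definitions above concrete, here is a minimal sketch that groups packets into flows and computes both notions of flow size. The packet representation, a `(src, dst, nbytes)` tuple, and the function names are assumptions made for illustration; they are not taken from the project.

```python
from collections import Counter

def flow_sizes(packets):
    """Return per-flow packet counts and byte counts, keyed by (src, dst)."""
    pkt_count = Counter()
    byte_count = Counter()
    for src, dst, nbytes in packets:
        key = (src, dst)          # the flow key: same source and destination
        pkt_count[key] += 1       # flow size measured in packets
        byte_count[key] += nbytes # flow size measured in bytes
    return pkt_count, byte_count

def size_distribution(pkt_count):
    """Map each flow size (in packets) to the number of flows of that size."""
    return Counter(pkt_count.values())

# Hypothetical traffic: three flows, five packets in total.
packets = [
    ("10.0.0.1", "10.0.0.2", 1500),
    ("10.0.0.1", "10.0.0.2", 40),
    ("10.0.0.3", "10.0.0.2", 576),
    ("10.0.0.1", "10.0.0.2", 1500),
    ("10.0.0.4", "10.0.0.5", 40),
]
pkts, byts = flow_sizes(packets)
dist = size_distribution(pkts)
print(dict(dist))  # one flow of size 3, two flows of size 1
```

Note that this exact approach keeps one counter per flow, which is precisely what becomes infeasible at backbone scale.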
Challenges
With regard to measuring flows, there are two main challenges:
- Line Speed: Flows arrive at a prodigious rate. On backbone networks, for example, the rate can be 10,000 flows per second and upwards. Any measurement method is therefore time-limited: each processing operation must complete within a short time budget.
- Memory Requirements: Each time a flow arrives, information such as its key and size needs to be recorded. Given the large number of flow arrivals, the amount of memory required is very high. Collecting all flows may amount to gigabytes of data after just one hour of measurement.
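A rough back-of-the-envelope calculation shows how the memory figure arises. The flow rate is the one quoted above; the record size of 32 bytes per flow is purely an assumption for illustration, since the actual size depends on the key and counters being stored.

```python
FLOWS_PER_SECOND = 10_000   # rate quoted in the text for backbone networks
BYTES_PER_RECORD = 32       # ASSUMED size of one flow record (key plus counters)
SECONDS_PER_HOUR = 3600

def bytes_needed(flows_per_second, bytes_per_record, seconds):
    """Memory needed to store one record for every flow arrival."""
    return flows_per_second * bytes_per_record * seconds

total = bytes_needed(FLOWS_PER_SECOND, BYTES_PER_RECORD, SECONDS_PER_HOUR)
print(f"{total / 1e9:.2f} GB per hour")  # 1.15 GB per hour
```

Under these assumptions, 36 million flow records accumulate in an hour, which already exceeds a gigabyte.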
What is Skampling?
There are two approaches to addressing these challenges. The first is sampling, where only a small subset of the traffic is collected. One example is Cisco's Sampled NetFlow, which implements packet sampling: for instance, 1 in every 256 packets is collected for measurement.
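The idea of 1-in-N packet sampling can be sketched as follows. This is an illustration of the general technique only, not a description of how Sampled NetFlow is implemented; the function names and the two variants shown (periodic and independent random selection) are assumptions.

```python
import random

def periodic_sample(packets, n=256):
    """Keep every n-th packet (deterministic 1-in-n sampling)."""
    return [p for i, p in enumerate(packets) if i % n == 0]

def random_sample(packets, n=256, rng=None):
    """Keep each packet independently with probability 1/n."""
    if rng is None:
        rng = random.Random()
    return [p for p in packets if rng.random() < 1.0 / n]

def estimate_packet_count(sampled, n=256):
    """Scale the sampled count back up to estimate the original total."""
    return len(sampled) * n

# Hypothetical stream of 1024 packets, identified by their index.
packets = list(range(1024))
kept = periodic_sample(packets, n=256)
print(len(kept), estimate_packet_count(kept, n=256))  # 4 1024
```

Only the sampled packets need to be stored and processed, which reduces both memory and per-packet work. The price is estimation error: a flow with far fewer than N packets is likely to be missed entirely.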
The second is sketching, that is, using algorithms with small memory usage and fast updates to build an "approximate" picture of the original traffic.
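The text does not name a particular sketch, so as one well-known example the Count-Min sketch is shown below. It uses a fixed-size table of counters regardless of how many flows are seen, and each update touches only one counter per row. The class name, the default `width` and `depth`, and the use of salted BLAKE2b as the hash family are choices made for this illustration.

```python
import hashlib

class CountMinSketch:
    """Fixed-memory approximate counter; estimates never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        # depth rows of width counters: memory is independent of flow count.
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One hash function per row, obtained by salting with the row number.
        digest = hashlib.blake2b(
            repr(key).encode(),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        """Record `count` packets for the flow identified by `key`."""
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key):
        """Return an upper bound on the flow's size (minimum over rows)."""
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))

# Hypothetical flows keyed by (source, destination).
sketch = CountMinSketch()
flow_a = ("10.0.0.1", "10.0.0.2")
flow_b = ("10.0.0.3", "10.0.0.2")
for _ in range(5):
    sketch.add(flow_a)
sketch.add(flow_b, 2)
print(sketch.estimate(flow_a), sketch.estimate(flow_b))
```

Hash collisions can only inflate a counter, so the estimate is at least the true count; widening the table reduces the overestimate at the cost of more memory.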
Both methods, however, introduce estimation error, so the goal is to develop techniques that have low error while being computationally cheap. Hence, the project examines the merits of both sampling and sketching in order to develop a new class of skampling methods that combines the best of both worlds.