the power of broadcast
[a revolution in mass-scale content distribution]
this page shows condensed results gathered from various konspire2b experiments. The results are divided into two groups: artificial and real-world. The artificial results come from konspire2b networks that were set up just for the purpose of running an experiment. These artificial networks run actual konspire2b node software and are not simulations. The real-world results come from experiments that were run on the live konspire2b network.
purpose of experiments
in theory, if we ignore protocol overhead, distributing a file using konspire2b should be log bounded in time. In other words, reaching N receivers should take on the order of log(N) time steps. This is an excellent theoretical bound: the log function grows very slowly as N becomes large. If you are not familiar with the log function, here is a table that shows how slowly it grows (for log with a base of 2):
however, protocol overhead diminishes this theoretical performance.
for a konspire2b network, protocol overhead includes the transfer time of protocol messages and the time it takes for each satisfied node to discover an additional unsatisfied receiver. This overhead is not proportional to file size, so we expect it to affect performance more for small files. However, as the number of satisfied nodes increases, there is more interference among these nodes as they try to discover additional receivers---they essentially fight over the remaining receivers. For example, in a 1,000,000-receiver distribution, we might have a final round in which 500,000 satisfied nodes try to transmit to the remaining 500,000 unsatisfied nodes. This is an extraordinarily difficult matching problem, since each of 500,000 nodes needs to find a partner from a set of an additional 500,000 nodes.
node interference may seem like an insurmountable problem, especially when we are trying to reach 1,000,000 receivers. However, certain factors might diminish the effects of node interference. For example, each satisfied node only searches its local neighborhood for remaining unsatisfied nodes, so interference is not network-wide. We would expect small pockets of nodes to interfere with each other, but never 500,000 nodes all interfering with each other together.
with these experiments, we hope to determine how closely konspire2b's performance matches the theoretical bounds.
artificial network results
64 konspire2b nodes (sender and 63 receivers) ran grouped on four separate machines (Intel GNU/Linux) connected together by a fast network. 15 receivers ran on the sender's machine, and 16 receivers ran on each of the other machines. All nodes were bandwidth-limited to match best-case ADSL performance (16 KiB upstream, 188 KiB downstream). Nodes were started one by one, and nodes discovered each other by connecting to a host catcher node (also running on the sender's machine). The sender node initiated file broadcasts of various sizes. The following graphs show how long it took the nodes to receive the each file.
as expected, for large files, konspire2b protocol overhead was dwarfed by file transfer times, so we have log-bounded distribution times. For example, the times are completely log-bounded when we send an 8 MiByte file:
for a 4 MiByte file, our distribution time is log bounded until the end of the distribution when protocol overhead has started to add up:
as the file size gets smaller, the protocol overhead becomes more significant. For a 1 MiByte file, our distribution time starts to creep above the log bound, though it is still quite good:
when sending files that are smaller than 1 MiByte, protocol overhead starts to dominate. However, log-like performance is still present in or 250 KiByte distribution:
for extremely small files, the data transfer time is insignificant, so distribution time becomes almost linear:
since these nodes were sharing the resources of only four machines and sharing the same local network, we should expect quite a bit of interference between them, especially when 31 data transfers were happening simultaneously during the last round of the broadcast. In a natural network, node interference would probably be reduced.
real-world network results
on May 23, 2003, during a network traffic surge (the result of Slashdot coverage), we made two measurements in the live network. In both experiments, our sender node was bandwidth-limited to 2 KiBytes/second upstream. Other nodes in the network were out of our control, so their bandwidth limits are unknown.
in these result graphs, we show representative lines for a best-case ADSL web server (with 16 KiBytes/second upstream) and a bandwidth-limited web server (with the same 2 KiBytes/second upstream limit as our k2b sender node). These lines show approximately how long it would take these web servers to reach all of the downloaders if they were saturated with demand (and we assume that each downloader could saturate a server's upstream connection).
First, we sent a 1 MiByte file on k2b_measure that was receive by 54 nodes:
notice that the k2b curve is quite logarithmic. The large latencies at the end are due to "straggler" receivers that did not fit neatly into an optimal distribution tree: this kind of behavior is likely in any real-world network where connectivity and reachability can vary dramatically.
As a demonstration of protocol overhead, we also sent a 1 KiByte file on k2b_measure that was received by 62 nodes:
though the k2b curve is still log-like, protocol overhead causes k2b performance to lag far behind ordinary web servers. Of course, k2b was designed for the distribution of large files, and 1 KiByte files are simply too small for k2b to be practical. However, even for files that are only 1 MiByte in size (which are by no means "huge"), k2b holds an advantage over traditional distribution methods. As file size increases, we expect the k2b advantage to become more significant.