the power of broadcast
[a revolution in mass-scale content distribution]
a comparison with BitTorrent

BitTorrent is a system that spreads the load of file distribution among downloaders, reducing (or completely eliminating) bandwidth usage on the content server. The server splits the content file into chunks and creates a map file that describes each chunk. Downloaders retrieve this map file and then connect to a "tracker" server. The tracker maintains a list of downloaders that already have each chunk. Thus, a downloader can obtain the chunks that it needs from other downloaders and in turn serve back the chunks that it already has.

since BitTorrent's focus is on load balancing in the distribution of high-demand content (in other words, reducing the "Slashdot effect"), it might be seen as the closest "competitor" to konspire2b. Obtaining different pieces of a file from different sources to increase download rates is known as "swarming". Upon seeing the non-swarmed distribution tree used by konspire2b, many people ask, "why not swarm?" or even, "why not just use BitTorrent in the first place?"---some explanation is certainly needed.

in systems (such as eDonkey) where many nodes have a file and only a few nodes are downloading it, swarming makes sense---downloading from more than one source will certainly increase download rates. However, in a BitTorrent peer network, everyone that has pieces of the file is also trying to obtain the file. Does swarming really make sense in this kind of network?

the following quote is from the BitTorrent website (their boldface):

    The key to cheap file distribution is to tap the unutilized upload capacity of your customers. It's free. Their contribution grows at the same rate as their demand, creating limitless scalability for a fixed cost.

the reality of the situation is that most downloaders are using asymmetric connections with far greater download capacity than upload capacity---download rates can be more than ten times faster than upload rates.
Thus, the contribution of downloaders does not grow at the same rate as their demand: as an ADSL downloader, I want 192 KiB/second downloads, while I can only supply 16 KiB/second uploads (total) to other downloaders. As more ADSL downloaders request the file, their demand grows 12 times faster than their contribution.

in a recent paper (pdf) on economic models applied to BitTorrent, the author makes the astute observation that the total download rate throughout the BitTorrent peer network cannot exceed the total upload rate. Thus, in a fair system, if every downloader has ADSL and everyone is currently in the middle of downloading, then every downloader has a download rate equal to its upload rate. If this is the case, what is the point of "swarming" separate file chunks in an attempt to increase download rates? If every downloader simply picked one peer node to upload the file to, download rates would already be maximized---no swarming necessary.

of course, BitTorrent counts on downloaders staying online for a while after their downloads finish. With this added bandwidth, most ADSL downloaders are likely to see download rates that are slightly higher than their upload rates. However, this is not a dependable source of extra bandwidth.

according to conservation of bandwidth in BitTorrent, we can only count on each downloader uploading the file to at most one other downloader before it disconnects. This pessimistic assumption makes BitTorrent's worst-case content distribution properties relatively easy to analyze.

analysis of BitTorrent

as a simple example, consider a content file that is split into three chunks by BitTorrent. Each downloader, according to our pessimistic assumption, transfers the file to one other downloader before it disconnects. A downloader can start uploading to another downloader after it has received at least one chunk of the file.
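the asymmetry and conservation arguments above amount to a little arithmetic, sketched here with the ADSL rates quoted in the text (the peer count is an arbitrary illustration):

```python
# ADSL rates quoted above, in KiB/second.
download_capacity = 192
upload_capacity = 16

# Demand grows this many times faster than contribution.
print(download_capacity // upload_capacity)  # 12

# Conservation of bandwidth: if every peer is mid-download, the network's
# total download rate cannot exceed its total upload rate, so in a fair
# system each peer's download rate is capped at its own upload rate.
n_peers = 1000                          # arbitrary illustrative network size
total_upload = n_peers * upload_capacity
fair_download_rate = total_upload / n_peers
print(fair_download_rate)               # 16.0 KiB/second, swarming or not
```

no amount of chunk-shuffling changes this cap; only extra upload capacity (peers lingering after their downloads finish) can raise it.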
Since its download rate equals its upload rate, it will have another chunk ready to upload by the time it has finished uploading the first chunk. If we assume that all downloaders have ADSL, there is no gain from a downloader uploading to more than one other downloader at a time: a single downloader can saturate its upload capacity completely.

suppose that alice is the first downloader: she downloads directly from the original server. bob can be thought of as the "second" downloader, and he downloads from alice. Since alice is saturating the server's upload capacity with her download, bob must wait for alice to obtain at least one chunk before he can download from anyone. alice has unused upload capacity at first, but she has no chunks to upload. If we assume it takes 1 timestep to receive a chunk (and therefore 3 timesteps to receive the entire file), the interaction between alice, bob, and the server can be clarified by this table:

    timestep   server            alice                           bob
    1          uploads chunk 1   receives chunk 1                waits
    2          uploads chunk 2   receives chunk 2,               receives chunk 1
                                 uploads chunk 1
    3          uploads chunk 3   receives chunk 3,               receives chunk 2
                                 uploads chunk 2
    4          (free)            uploads chunk 3                 receives chunk 3
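the pipeline between alice, bob, and the server can be sketched as a tiny simulation (an illustration of the model above, not BitTorrent's actual protocol): each timestep, bob pulls the lowest-numbered chunk alice already has, and alice pulls the next chunk from the server.

```python
# One chunk moves per link per timestep; a peer can only forward chunks
# it already holds, so bob runs exactly one chunk-timestep behind alice.
chunks = 3
server = set(range(1, chunks + 1))   # the server starts with chunks 1..3
alice, bob = set(), set()

for t in range(1, chunks + 2):
    # bob receives a chunk that alice obtained in an earlier timestep
    if alice - bob:
        bob.add(min(alice - bob))
    # alice receives the next chunk from the server
    if server - alice:
        alice.add(min(server - alice))
    print(f"t={t}: alice={sorted(alice)} bob={sorted(bob)}")
```

alice finishes after 3 timesteps, and bob finishes one timestep later, just as in the table.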
if we expand our example to a larger group of receivers, we get a distribution tree made up of cascades. In the following diagram, the server is represented by a circle, and each rectangle represents a downloader. Time increases as you move down in the figure. The top of a rectangle represents when a downloader starts receiving a file, and the bottom of the rectangle represents when a downloader finishes receiving a file. Thus, each downloader rectangle is three "units" tall.

for comparison, the following diagram shows a similar distribution tree for konspire2b. Again, time increases as you go down in the figure, and rectangles represent start and end times for receiver downloads.

at this point, it looks like BitTorrent wins the race: it reaches 22 receivers in the time it takes konspire2b to reach 15. However, the following table shows why bounds can be tricky to analyze. Nb(t) represents the number of downloaders reached by BitTorrent in t content timesteps, and Nk(t) represents the number reached by konspire2b:
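one way to sanity-check these counts is to simulate both trees under the stated assumptions (the function names and structure here are my own sketch, not konspire2b code): each BitTorrent downloader starts exactly one child one chunk-timestep after it starts, the server seeds a new cascade every c chunk-timesteps, and in konspire2b the finished audience doubles every content timestep.

```python
C = 3  # chunks per file; one content timestep = C chunk-timesteps

def bittorrent_completed(t_content):
    """Downloads finished within t_content content timesteps."""
    steps = t_content * C
    starts = []
    for t in range(steps + 1):
        s = 1 if t % C == 0 else 0       # server begins a fresh cascade
        if t >= 1:
            s += starts[t - 1]           # each starter spawns one child
        starts.append(s)
    # a download begun at chunk-timestep t completes at t + C
    return sum(starts[: steps - C + 1])

def konspire_completed(t_content):
    """Receivers holding the whole file: the audience doubles each step."""
    return 2 ** t_content - 1

for t in range(1, 5):
    print(t, bittorrent_completed(t), konspire_completed(t))
```

at t = 4 this reproduces the race above (22 finished BitTorrent downloads to konspire2b's 15), but BitTorrent's cascades grow only quadratically while konspire2b's tree doubles, so the exponential curve overtakes by t = 6.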
keep in mind that the BitTorrent tree used in this analysis is the optimal tree if we assume that each downloader stays online long enough to transmit the complete file to one other downloader. Even if this assumption holds, the random nature of a BitTorrent network is likely to produce a tree that is less than optimal (with worse performance than what we derived in this analysis).

we called our assumption about downloaders a "pessimistic" assumption earlier, but it is in some ways an optimistic assumption: downloaders that watch the progress bar are likely to disconnect from the network as soon as their download completes, before their node has a chance to finish the equivalent of a complete file upload to other downloaders. Other downloaders (for example, those that are away from their machines) are likely to remain connected to the network long after their download completes, so they might upload the file to other downloaders more than once. The longer downloaders stay online, the more BitTorrent's performance approaches konspire2b's performance.

konspire2b uses a completely different model for user interaction---with this model, most receivers are likely to remain online until the broadcast is complete network-wide. Each channel in konspire2b contains a series of valuable files, not just a single download. Users that disconnect shortly after a download completes are likely to miss future downloads that are just as valuable as their current download.

realistic numbers (constants matter)

one complaint about the above analysis is that we set c=3 in our equation for Nb(t). The constant c represents the number of chunks in a file, and very few files have only 3 chunks. If we instead set c=1,000,000, the results are quite different: BitTorrent transfers the complete file to millions of people in only a few timesteps, and konspire2b is left in the dust.
However, each chunk is represented by a 20-byte hash string in the BitTorrent map file, so breaking a file into 1,000,000 chunks would result in a 20 MiB map file. Since every downloader gets this map file directly from the server, files with this many chunks are not practical.

as a real-world example, we can use the Mandrake GNU/Linux distribution. The 650 MiB file is broken into 256 KiB chunks, so there are just over 2500 of them. Thus, we can set c=2500. The following table shows the number of receivers reached by each system in a given number of content timesteps. Keep in mind that here, a content timestep is the time it takes to upload Mandrake over an ADSL connection, or roughly 11 hours.
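the constants here can be reproduced from the figures already given, and the same cascade model yields the reach times discussed below. In this sketch, the quadratic c*t*t/2 is my closed-form approximation of the per-timestep cascade count from the earlier analysis:

```python
# Map-file cost of 1,000,000 chunks at 20 bytes of hash per chunk.
print(1_000_000 * 20 / 2**20)       # ~19.1 MiB, the "20 MiB" quoted above

# Mandrake: 650 MiB split into 256 KiB chunks.
file_kib = 650 * 1024
chunks = file_kib // 256
print(chunks)                       # 2600, which the text rounds to c = 2500

# One content timestep: uploading the whole file at 16 KiB/s ADSL.
print(file_kib / 16 / 3600)         # ~11.6 hours

# Content timesteps needed to reach a given audience under each model.
def konspire_steps(audience):
    t = 0
    while 2**t - 1 < audience:      # finished audience doubles each step
        t += 1
    return t

def bittorrent_steps(audience, c=2500):
    t = 0
    while c * t * t // 2 < audience:  # cascades finish ~c*t^2/2 downloads
        t += 1
    return t

print(konspire_steps(1_000_000), bittorrent_steps(1_000_000))     # 20 29
print(konspire_steps(10_000_000), bittorrent_steps(10_000_000))   # 24 90
```

these reach times match the percentages below: 20 versus 29 timesteps for a million receivers, and 24 versus 90 for ten million.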
in today's world, a billion receivers sounds unreasonable, but a million receivers does not. However, konspire2b is only 30% faster for a million receivers. What about 10 million receivers? konspire2b can reach this many in 24 timesteps, while BitTorrent would take 90 timesteps (also not shown in the table). For that large an audience, konspire2b is 73% faster.

who really cares about the numbers when they get this big?

we do. We designed konspire2b so that it could scale to reach enormous audiences.

why not use BitTorrent?

given the above analysis, the answer is simple: using BitTorrent would prohibit the kind of mass-scale content distribution that is possible with konspire2b distribution trees. If you want to get your content out to a large audience as quickly as possible, konspire2b is the system of choice.

should BitTorrent and konspire2b be seen as competing systems?

BitTorrent is designed to reduce the strain on a web server in the distribution of a file. konspire2b is designed to allow a sender to deliver a series of fresh files to a large group of receivers over time. Whether these systems are even attacking the same problem is a matter of much debate.