HTTP/FTP Seeding for BitTorrent

Many websites that list a BitTorrent™ download also list a plain old HTTP or FTP link for the same file for those people who don't use BitTorrent. The files are identical. Why shouldn't a BitTorrent client be able to download from either source, putting all the parts together into one complete file (and even sharing out the pieces it gets from other sources to its Peers as they're validated!)

There are several advantages to this to this addition:

There's always an "unchoked seed" where anybody can get started. It sometimes can take awhile to get a full piece so the tit-for-tat sharing can start working.
Speed. I frequently get a download rate from a single HTTP/FTP server that is faster than a BitTorrent download with many peers and seeds (and far faster than any single peer). And that's not even using the "acceleration" in GetRight to download from multiple HTTP/FTP servers at once.
It doesn't break or change anything for clients that don't recognize it. And clients that don't recognize it would still get the benefit from peers able to send them pieces originally from a HTTP/FTP server.
Compared to the size of a file, the overhead for the extra bit in the .torrent files is miniscule.
Some small changes to the piece selection methods can really reduce the number of connections and restarts needed on the HTTP/FTP side.
I've seen another HTTP Seeding proposal. A disadvantage of its method that it requires people other than BitTorrent client authors to do something :) Since just about every HTTP/FTP server already allows resuming, all existing servers can be used without any changes with the method GetRight is using.
While it would work better if the .torrent format was extended to include just a list of mirrors, it can be done without any changes to either the .torrent format, or HTTP/FTP servers. It just takes more clicking for a user...with GetRight, you have to click on the .torrent file as well as any mirrors you want.
I always think it's a good sign if something can be done without making anybody else change anything!

Proposed Addition to .torrent Files

I propose a new key/value in the .torrent metadata file but not part of the "info" section.

'url-list' = either a single item or a list of normal URLs.

For example (on multiple lines for readability):
d
8:announce27:http://tracker.com/announce
13:creation datei1128487910e
8:url-list26:http://mirror.com/file.exe
4:info...

If the "url-list" URL ends in a slash, "/" the client should add the "name" from the torrent to make the full URL. This should make it easier on Torrent generators and let them treat this field same for single file and multi-file torrents.

I have seen that .torrent files from bittorrent.com did include a http seed in this way. And I've read that they added WebSeeding, apparently using this method!

For Multi-File torrents, this gets a bit more interesting. Normally, BitTorrent clients use the "name" from the .torrent info section to make a folder, then use the "path/file" items from the info section within that folder. For the case of Multi-File torrents, the 'url-list' should be a root folder where a client could add the same "name" and "path/file" to create the URL for the request.
...
8:url-list22:http://mirror.com/pub/
4:infod5:filesld6:lengthi949e4:pathl10:Readme.txte
e4:name7:michael

So a client would use all that to build a url: http://mirror.com/pub/michael/Readme.txt

Implementation Notes for Clients

GetRight was already able to download the same file from several HTTP/FTP mirrors at once, a feature common in many download managers. When adding the BitTorrent protocols into GetRight, it made sense to just consider BitTorrent another protocol and be able to do the same.

HTTP/FTP are streaming protocols, and don't have BitTorrent's concept of blocks. For HTTP you can use byte-ranges to resume anywhere or download specific ranges you specify, but with FTP you can only say where to start the download. So I wanted to have big "gaps" in the data downloaded from BitTorrent peers so a HTTP/FTP connection would have big spaces to fill in. You could use the byte-ranges with HTTP to request individual blocks--but each request will show in the server's logs, and somebody is going to think your DoSing them if it shows 100's of connections in their log. So I made a couple changes to the usual "rarest first" piece-selection method to better allow "gaps" to develop between pieces. That way there are longer spaces in the file for HTTP and FTP threads to fill. They can start at the beginning of a gap and download until they get to the end.

This actually could be implemented differently. You could use the HTTP byte-ranges to request specific pieces and not worry about any server's logs. This method just fit in very well with all the code I'd already done for GetRight's accelerating files. Plus does minimize some connections and restarts.

How I'm defining Gaps in a file:

Gaps are spaces of multiple pieces in a row that I don't have.

Given a bitfield "YYnnnnYnnY" where Y is pieces it has and n are ones it doesn't, there are two gaps "YYnnnnYnnY", one of 4 pieces, and another of 2.

If everything else is more-or-less equivalent, it's better to pick a piece to do from the gap of 2 when requesting a piece from a BitTorrent peer.

In any gap, it will fill in from the end (ie, the highest piece number). That way a HTTP/FTP connection could be cruising along filling in pieces from the beginning and I don't have to do any reconnects for it until they meet in the middle.

Change the "rarest first" piece selection to a "pretty rare with biggest distance from another completed piece".

When scanning for the rarest piece, if the distance from another completed piece is less than that for the current rarest piece, it must be "rare-X". Or if the distance is greater than the current piece's, it can be rare+X to be picked as the next piece. (For no better reason than it seemed to make sense and scale, X is the square root of the number of peers minus 1.)

So if 3 peers had the current rarest piece, the normal algorithm would pick a piece where 2 peers had it...my changed algorithm would require that only 1 peer has the piece if that piece's distance from a complete piece was less than the gap for the current rarest piece.

If the gap is bigger and the piece is the same "rareness" or the usual "rare-1" that piece is selected. (So if the gap is bigger, a piece with either 2 or 3 peers would be chosen.)

So given "YYnnn1Yn2Y", unless 1 is a lot more rare than 2, it's better to pick piece 2.

I always do better explaining or understanding with some code logic to look at...

X = sqrt(Peers) - 1;
for (i=0; i<maxpieces; i++) {
   if (*IDon'tHavePiece*) {
      Gap++;
      if (*PeerHasPiece*) {
         PieceRareness = *Number of peers with the piece*;
         if (PieceRareness<(CurRareness-X) || (PieceRareness<=(CurRareness+X) && Gap>CurGap)) {
            CurRareness = PieceRareness;
            CurGap = Gap;
            NextPiece = i;
         }
      }
   } else {
      Gap = 0;
   }
}

Added an extra "fill-in-the-gaps" piece selection method.

This check is only done If a file is more than 50% complete--so you should have a large number of pieces that other peers will want to download.

Every few pieces (randomly, 1 in 10), it will pick the piece with the smallest gap from a complete piece. For the bitfield "YYnnnnYnnY" it would select the # one "YYnnnnYn#Y"

For the HTTP/FTP download part...

If the client knows it is part of a BitTorrent swarm, when first beginning it is better to start the HTTP/FTP download somewhere randomly in the file. That way it's more likely the first HTTP pieces it gets will be useful for sharing to the BitTorrent peers.

If a download is already going, a HTTP/FTP download should start at the beginning of the biggest gap. Given a bitfield "YYnnnnYnnY" it should start at 1: "YY1nnnYnnY"

If it downloads a piece, but the SHA checksum doesn't match, the connection is closed and that mirror is tossed from the list.

Multi-File torrents:

(Note, I haven't implemented this part myself, and will likely add more notes!)

These would likely have some additional smarts so it tries to start from 0 for the bigger files in the torrent, again so it can start the HTTP/FTP file and let it go until it finishes.

For torrents containing small files, several HTTP/FTP transfers may be needed for one Piece. It may make more sense to do those using BitTorrent! For example, if there were 100 1KB files, assuming 32KB Pieces (the worst case), it would take 100 HTTP/FTP transfers to do the files, but only 4 BitTorrent piece requests.

Changes

March 13, 2006: Added notes about multi-file torrents, and allowing url-list to end in / for either single or multi file torrents.

Thanks!

Some thanks for help with this...

Bram Cohen for creating this whole BitTorrent protocol!

Arvid Norberg (libtorrent.sf.net author) for helping clarify the Multi-File torrent parts.

PS

If you're writing a BitTorrent client, I wrote some notes that you may find helpful (especially if you're doing the DHT part of the protocol!)

Legal

BitTorrent and Torrent are trademarks of BitTorrent, Inc.