Program Documentation for NoTorrent

Howie Vegter
Columbia University
New York, NY 10027
USA
hrv2101 AT columbia DOT edu

CS 6901, Projects in Computer Science
Fall 2005
Advisors: Dr. Markus Hofmann, Dr. Henning Schulzrinne, Salman Abdul Baset

Program Abstract

NoTorrent is a system that uses peer-to-peer web serving in an attempt to counter the Slashdot Effect (also known as flash crowds or web hotspots). There is a NoTorrent tracker that keeps track of which peers have copies of which resources. There is also a NoTorrent client which is responsible for retrieving a resource that a user requested. The client also serves cached resources to other NoTorrent peers.

Outline of this Report

System Requirements
Installation Instructions
How to Test NoTorrent and Demonstrate Its Operation
Program Internal Operation
Things that Do Not Work
Potential Enhancements
Acknowledgements for Code and Ideas Borrowed
References

System Requirements

NoTorrent is written in Java 1.5, so it should run on any platform that has a JVM of version 1.5 or later.

Installation Instructions

Download the NoTorrent program files: notorrent.tgz

Untar that file:

tar xzvf notorrent.tgz

Read the instructions in the README for installing, running, and configuring (via command line options) the tracker and the client.

How to Test NoTorrent and Demonstrate Its Operation

One way to demo the system is to simulate an origin server becoming inaccessible. Peers that retrieved a resource from the origin server before it went down can serve that content to another peer after the server goes down. To simulate this, do the following:

Start the client on a machine ("Peer 1") and configure the web browser to use the client proxy (see README)
Start the client on a different machine ("Peer 2") and configure the web browser to use the client proxy
Start the tracker on a different machine
Post a file (e.g. "hereitis.htm") on a web server ("Origin Server") to which you have FTP access
Have Peer 1 request hereitis.htm through its web browser (just put the URL in the address bar like a normal web request). Peer 1 will now have hereitis.htm in its cache, and the tracker will know that Peer 1 has it.
On the Origin Server, rename hereitis.htm to nowitsgone.htm. This simulates the Origin Server becoming inaccessible.
Have Peer 1 request hereitis.htm again. This time, Peer 1 will not be able to retrieve the file from the Origin Server (because the Origin Server no longer has a file named "hereitis.htm"). Instead, it will retrieve the file from its cache (and it will say this in a NoTorrent message at the top of the page).
Have Peer 2 request hereitis.htm from the Origin Server. That will fail. Trying to retrieve hereitis.htm from its own cache will fail as well. Next, Peer 2 will ask the tracker who has hereitis.htm. After receiving the list of peers with hereitis.htm, Peer 2 will request and receive hereitis.htm from Peer 1 (and it will say this in a NoTorrent message at the top of the page).
This demonstrates that Peer 2 was able to retrieve a file (from another peer) even though the Origin Server was down.

Program Internal Operation

The program consists of a client and a tracker, which we describe separately below.

Client

The client starts by initializing its cache, spawning several threads to handle proxy connections from the browser, and spawning several threads to handle resource requests from other peers.

Client Cache

The main part of the client cache is a hash table mapping URIs to resources (encoded as byte[]s).
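As a rough illustration of such a table, here is a minimal sketch. The class name, the method names, and the choice of ConcurrentHashMap (picked because several proxy and server threads would share the cache) are assumptions for illustration, not taken from the NoTorrent source.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of a URI -> resource table; all names are illustrative.
class ResourceCacheSketch {
    // Assumption: a concurrent map, since multiple threads read and write the cache.
    private final ConcurrentMap<String, byte[]> resources =
            new ConcurrentHashMap<String, byte[]>();

    /** Stores a copy of the resource bytes under the given URI. */
    void put(String uri, byte[] body) {
        resources.put(uri, body.clone());
    }

    /** Returns a copy of the cached bytes, or null if the URI is not cached. */
    byte[] get(String uri) {
        byte[] body = resources.get(uri);
        return body == null ? null : body.clone();
    }

    /** Reports whether a resource is cached for the given URI. */
    boolean contains(String uri) {
        return resources.containsKey(uri);
    }
}
```

Copying the array on the way in and out keeps one thread from modifying bytes that another thread is serving; a real implementation might skip the copies for speed.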

Client Proxy

The client proxy listens for connections from the client's browser. The connections carry HTTP requests for resources. Each resource is then retrieved from the origin server, the cache, or a peer, as described in the Architecture and Strategy section of the NoTorrent main documentation page.
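The fallback between the three sources can be sketched as a simple ordered lookup. The order used in the note after the code (origin server, then cache, then peers) is inferred from the demonstration steps earlier in this document; the interface, class, and method names are hypothetical and do not come from the NoTorrent source.

```java
import java.util.List;

// Hypothetical sketch of "try each source in turn"; all names are illustrative.
class RetrievalSketch {
    /** A place a resource might come from; fetch returns null when it cannot supply the URI. */
    interface Source {
        String name();
        byte[] fetch(String uri);
    }

    /** The bytes that were found plus the name of the source that supplied them. */
    static final class Result {
        final String sourceName;
        final byte[] body;

        Result(String sourceName, byte[] body) {
            this.sourceName = sourceName;
            this.body = body;
        }
    }

    /** Tries each source in the given order and returns the first hit, or null if all fail. */
    static Result retrieve(String uri, List<Source> sourcesInOrder) {
        for (Source source : sourcesInOrder) {
            byte[] body = source.fetch(uri);
            if (body != null) {
                return new Result(source.name(), body);
            }
        }
        return null;
    }

    /** Test helper: a source that always answers with the same bytes (or always fails if null). */
    static Source fixed(final String name, final byte[] bodyOrNull) {
        return new Source() {
            public String name() { return name; }
            public byte[] fetch(String uri) { return bodyOrNull; }
        };
    }
}
```

Passing the sources as the list (origin, cache, peers) reproduces the behavior seen in the demo; recording the source name would let the proxy report where a page came from.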

Client Server

The client server listens for connections from other peers. The requests are XML-encoded messages containing the URL of the requested resource. The client server looks in its cache for the resource. It either returns the resource as a byte[], or it terminates the connection to indicate that it does not have the resource.
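A minimal sketch of that request handling follows. The message layout (a ResourceRequest element containing a url element) is invented for illustration, since the actual NoTorrent message schema is not shown here, and the sketch uses the JDK's built-in DOM parser rather than JDOM so that it needs no external library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical sketch; the XML element names are assumptions, not the real NoTorrent format.
class PeerRequestSketch {
    /** Extracts the text of the first <url> element, or returns null if there is none. */
    static String parseRequestedUrl(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList urls = doc.getElementsByTagName("url");
            return urls.getLength() == 0 ? null : urls.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed request message", e);
        }
    }

    /**
     * Looks up the requested resource in the cache. A null result means
     * "not cached", in which case the caller would close the connection.
     */
    static byte[] handle(String xml, Map<String, byte[]> cache) {
        String url = parseRequestedUrl(xml);
        return url == null ? null : cache.get(url);
    }
}
```

Because these messages arrive from other peers, a hardened version would also restrict what the XML parser is allowed to resolve before parsing untrusted input.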

Tracker

The tracker starts by initializing its StateInfo table and then spawning several tracker threads to handle messages sent by clients.

Tracker's StateInfo

The tracker maintains a StateInfo object, which is basically a hash table that maps URIs to sets of peers that claim to have the resource associated with that URI.
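The shape of such a table might look like the sketch below. The class and method names are hypothetical, peers are represented as plain "host:port" strings for simplicity, and the synchronized methods are an assumption made because several tracker threads would share the table.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a URI -> set-of-peers table; not the actual StateInfo class.
class StateInfoSketch {
    private final Map<String, Set<String>> peersByUri = new HashMap<String, Set<String>>();

    /** Records that the peer claims to have the resource at the given URI. */
    synchronized void addPeer(String uri, String peer) {
        Set<String> peers = peersByUri.get(uri);
        if (peers == null) {
            peers = new LinkedHashSet<String>();
            peersByUri.put(uri, peers);
        }
        peers.add(peer);
    }

    /** Forgets the peer's claim; drops the URI entirely once no peers remain. */
    synchronized void removePeer(String uri, String peer) {
        Set<String> peers = peersByUri.get(uri);
        if (peers != null) {
            peers.remove(peer);
            if (peers.isEmpty()) {
                peersByUri.remove(uri);
            }
        }
    }

    /** Returns a snapshot of the peers claiming the URI (empty if none are known). */
    synchronized Set<String> peersFor(String uri) {
        Set<String> peers = peersByUri.get(uri);
        return peers == null ? new LinkedHashSet<String>() : new LinkedHashSet<String>(peers);
    }
}
```

Returning a copy from peersFor lets a handler thread build its response without holding the lock.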

Tracker Handler Thread

The tracker handler thread receives XML-encoded messages from clients. The main messages the tracker receives are:

PeerListRequest: This is a request by a client for a list of peers that claim to have a given resource. The tracker sends a PeerListResponse in response.
IHaveResource: This is how a client informs the tracker that it has a given resource.
PeerFailedToServe: This is how a client informs the tracker that a peer failed to serve a given resource.
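The three messages above can be sketched as a small dispatch over an already-decoded message. The enum, the method signature, and the use of plain strings for peers are all hypothetical. In particular, removing a peer from the table on PeerFailedToServe is an assumed reaction; this document does not say what the tracker does with that message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of tracker message handling; names and reactions are assumptions.
class TrackerHandlerSketch {
    enum MessageType { PEER_LIST_REQUEST, I_HAVE_RESOURCE, PEER_FAILED_TO_SERVE }

    /**
     * Applies one decoded message to the URI -> peers table.
     * Returns the peer list for a PEER_LIST_REQUEST and an empty list otherwise.
     * The peer argument is the sender for I_HAVE_RESOURCE, the failed peer for
     * PEER_FAILED_TO_SERVE, and is ignored for PEER_LIST_REQUEST.
     */
    static List<String> handle(MessageType type, String uri, String peer,
                               Map<String, Set<String>> table) {
        Set<String> peers = table.get(uri);
        switch (type) {
            case I_HAVE_RESOURCE:
                if (peers == null) {
                    peers = new LinkedHashSet<String>();
                    table.put(uri, peers);
                }
                peers.add(peer);
                return Collections.<String>emptyList();
            case PEER_FAILED_TO_SERVE:
                // Assumption: a reported failure removes the peer's claim.
                if (peers != null) {
                    peers.remove(peer);
                    if (peers.isEmpty()) {
                        table.remove(uri);
                    }
                }
                return Collections.<String>emptyList();
            default: // PEER_LIST_REQUEST
                if (peers == null) {
                    return Collections.<String>emptyList();
                }
                return new ArrayList<String>(peers);
        }
    }
}
```

A real handler thread would first decode the XML message and would guard the shared table against concurrent access.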

Things that Do Not Work

Some web requests are not forwarded correctly to the origin server. Currently I parse the user's HTTP request for the URL, and then I make a new HTTP request for that resource without passing along any of the original message headers. Most pages can be retrieved normally in this manner. However, some servers respond to requests that lack those headers with HTTP 403 Forbidden. This problem could probably be fixed by forwarding the client's HTTP request to the server instead of stripping it of all headers and creating a new HTTP request.
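One possible shape for that fix is to collect the browser's headers and reuse them on the outgoing request, as in the sketch below. The class and method names are hypothetical. The list of headers to drop follows the hop-by-hop headers of HTTP/1.1, plus the non-standard Proxy-Connection header that browsers commonly send to proxies. Keeping only the last value of a repeated header is a simplification.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of header forwarding; not part of the current NoTorrent code.
class HeaderForwardingSketch {
    // Headers that describe only the browser-to-proxy connection and should not be forwarded.
    private static final Set<String> HOP_BY_HOP = new HashSet<String>(Arrays.asList(
            "connection", "proxy-connection", "keep-alive", "proxy-authenticate",
            "proxy-authorization", "te", "trailers", "transfer-encoding", "upgrade"));

    /** Parses the header section of a raw HTTP request and returns the forwardable headers. */
    static Map<String, String> forwardableHeaders(String rawRequest) {
        Map<String, String> headers = new LinkedHashMap<String, String>();
        String[] lines = rawRequest.split("\r?\n");
        for (int i = 1; i < lines.length; i++) { // index 0 is the request line
            String line = lines[i];
            if (line.isEmpty()) {
                break; // a blank line ends the header section
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip anything that is not "Name: value"
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (!HOP_BY_HOP.contains(name.toLowerCase(Locale.ROOT))) {
                headers.put(name, value);
            }
        }
        return headers;
    }
}
```

If the outgoing request is made with java.net.HttpURLConnection, each entry of the returned map could be applied with setRequestProperty(name, value) before connecting.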

Additionally, Yahoo encodes the URLs in their search results with tracking information. For example, a search for 'columbia' returned a result with the following URL:

http://rds.yahoo.com/_ylt=AvJ7YJnlZa45cNDRLKUeG9pXNyoA;_ylu=X3oDMTE2aXNyZXZjBGNvbG8DdwRsA1dTMQRwb3MDMgRzZWMDc3IEdnRpZANGNjcxXzky/SIG=11c9ojmuj/EXP=1135155687/**http%3a//www.columbia.edu/

Without the NoTorrent client proxy, the browser's address bar ends up showing http://www.columbia.edu/ after the result is clicked. With the NoTorrent client proxy, the address bar keeps the original tracking URL. The pages still load, but this is an abnormality the user should not encounter. As stated on the main NoTorrent documentation page, we want to "do no harm": NoTorrent should never get in the user's way or provide worse content than if it had not been there. It is unclear at this point whether the problem is that I am not handling HTTP 302 redirects correctly or whether this, too, is caused by the missing HTTP headers.

Potential Enhancements

The system currently works essentially as intended, in that the client proxy can retrieve resources from

the origin server,
the cache, and
other peers.

However, besides the normal issues of hardening and cleaning the code, there are several other ways the system could be enhanced:

First of all, the two issues discussed in the "Things that Do Not Work" section above should be fixed.
Currently, the tracker's resources table and the client's cache are never trimmed or cleaned. Thus, they grow indefinitely (until the machine they are running on runs out of memory). There should be some item eviction scheme put in place.
Any cache should address the problem of stale content. As mentioned, we currently do not evict any content. Some form of purging of stale content should be implemented.
Peers currently need to be able to connect to each other directly with a TCP/IP socket. This works fine in a closed lab like the CLIC lab. But for the system to work across NATs and across other network boundaries, I should consider using the STUN protocol [5] [6].
One other change I could consider would be to cache only resources whose HTTP Referer is set to a specific host (e.g. Slashdot.org). This would enable users to choose to only use NoTorrent in the context of Slashdot.
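The eviction and stale-content items above could be addressed together by a bounded cache such as the sketch below. It combines least-recently-used eviction, via LinkedHashMap in access order, with a fixed maximum age per entry. Both policies, the class and method names, and the choice to pass the current time in as a parameter (which keeps the sketch easy to test) are assumptions for illustration; NoTorrent currently implements neither.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a size-bounded cache with age-based expiry.
class BoundedCacheSketch {
    /** A cached body together with the time it was stored. */
    private static final class CachedItem {
        final byte[] body;
        final long storedAtMillis;

        CachedItem(byte[] body, long storedAtMillis) {
            this.body = body;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final long maxAgeMillis;
    private final LinkedHashMap<String, CachedItem> entries;

    BoundedCacheSketch(final int maxEntries, long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
        // accessOrder = true keeps the least recently used entry at the front.
        this.entries = new LinkedHashMap<String, CachedItem>(16, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedItem> eldest) {
                return size() > maxEntries; // evict once the bound is exceeded
            }
        };
    }

    /** Stores a resource, evicting the least recently used entry if the cache is full. */
    synchronized void put(String uri, byte[] body, long nowMillis) {
        entries.put(uri, new CachedItem(body, nowMillis));
    }

    /** Returns the cached bytes, or null if the URI is absent or its entry is too old. */
    synchronized byte[] get(String uri, long nowMillis) {
        CachedItem item = entries.get(uri);
        if (item == null) {
            return null;
        }
        if (nowMillis - item.storedAtMillis > maxAgeMillis) {
            entries.remove(uri); // purge stale content on access
            return null;
        }
        return item.body;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

In the client, nowMillis would come from System.currentTimeMillis(). A fixed maximum age is the simplest freshness rule; honoring the origin server's HTTP caching headers would be more faithful but is beyond this sketch.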

Acknowledgements for Code and Ideas Borrowed

As mentioned in the main NoTorrent document, this project was originally intended to be a modification of the BitTorrent codebase. That changed, but the overall structure of NoTorrent is still more or less based on BitTorrent [1]. In particular, although NoTorrent messages are encoded in XML and BitTorrent messages are encoded in Bencoding ("BEE encoding"), I was studying the BitTorrent protocol [2] [3] closely when I designed the NoTorrent messages. Thus, the NoTorrent message design was significantly influenced by the BitTorrent message design.

As mentioned earlier, NoTorrent messages are encoded in XML. Encoding and decoding XML messages could potentially be a headache. Thanks to JDOM [4], however, encoding and decoding XML messages was very easy.

References

1: Cohen, Bram. BitTorrent. 19 Dec. 2005 <http://www.bittorrent.com>.
2: Bittorrent Protocol Specification v1.0. 13 Dec. 2005. 19 Dec. 2005 <http://wiki.theory.org/BitTorrentSpecification>.
3: BitTorrent - Protocol. 19 Dec. 2005 <http://www.bittorrent.com/protocol.html>.
4: JDOM. 19 Dec. 2005 <http://www.jdom.org>.
5: RFC 3489. 19 Dec. 2005 <http://www.faqs.org/rfcs/rfc3489.html>.
6: STUN. 19 Dec. 2005 <http://en.wikipedia.org/wiki/STUN>.

Last updated: 2005-12-20 by Howie Vegter, hrv2101 AT columbia DOT edu