CS 6901, Projects in Computer Science
Fall 2005
Advisors: Dr. Markus Hofmann, Dr. Henning Schulzrinne, Salman Abdul Baset
NoTorrent is a system that uses peer-to-peer web serving in an attempt to counter the Slashdot Effect (also known as flash crowds or web hotspots).
The Slashdot Effect [1] gets its name from the following phenomenon. Slashdot.org [2], which is a website that aggregates links to articles about technology, has a readership of roughly half a million people (as of 10/11/2004) [3]. Many of the articles are on NYTimes.com and other websites that can handle sudden, large volumes of traffic. However, some of the articles linked to are on relatively obscure and diminutive websites, which do not expect and often cannot handle a sudden burst of web requests. The sudden large volume of traffic can often bring down or at least significantly slow down these small websites. This means that the Slashdot readers cannot access the content they requested. (See figure 1.)
Another example of the Slashdot Effect occurred on September 11, 2001, when many news websites were slowed to a crawl by the large volume of web requests. It is this phenomenon that we seek to address.
NoTorrent's setup is as follows:
Users run a NoTorrent client on their local machine. A designated tracker machine runs a NoTorrent tracker. The NoTorrent client consists mainly of a web proxy, a local cache, and a server that answers requests from peers.
The user configures his web browser to use the NoTorrent web proxy (see the README). When the user requests a webpage through his browser, the browser asks the NoTorrent proxy to retrieve the file; it is then the proxy's responsibility to do so.
NoTorrent, as should any caching system, has a duty to "Do no harm." That is, NoTorrent is designed to be as transparent as possible and to only be used if it is necessary.
Thus, the proxy first tries to request the file from the origin server, just as the browser would if NoTorrent had not been installed. Only if the origin server fails to serve the resource does NoTorrent come into play.
If the origin server fails to serve the resource, the proxy then looks in its local cache for the resource. If the client had earlier retrieved the file either from the origin server when it was still up or from a peer (discussed below), then the file will be in the client's cache, and it will be sent to the browser.
If both the origin server and the client's cache failed to serve the resource, then the proxy will try to retrieve the resource from a NoTorrent peer. To do this, the proxy must first contact the NoTorrent tracker. The NoTorrent tracker is a program that tracks which peers claim to have which resources and provides this information to clients upon request. (Note: A "client" and a "peer" both refer to a computer that is running the NoTorrent client. I will say "client" when I mean my computer and "peer" when I am referring to the other NoTorrent clients I am interacting with.)
The proxy asks the tracker which peers claim to have the requested resource, and the tracker responds with a list of such peers. (If no peers claim to have the resource, an empty list is returned.)
The proxy then proceeds to sequentially ask the peers on the list for the resource. If a peer sends the file to the proxy, the proxy sends the file to the browser.
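The retrieval order described above (origin server first, then the local cache, then peers listed by the tracker) can be sketched as follows. This is an illustrative sketch only; the method and parameter names are hypothetical and do not reflect NoTorrent's actual API.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Sketch of the proxy's fallback order; names are illustrative, not NoTorrent's real API. */
class RetrievalOrder {
    /**
     * Try the origin server first ("do no harm"), then the local cache,
     * then each peer the tracker lists, sequentially.
     */
    static Optional<byte[]> retrieve(
            String url,
            Function<String, Optional<byte[]>> originServer,
            Function<String, Optional<byte[]>> localCache,
            Function<String, List<Function<String, Optional<byte[]>>>> trackerPeerList) {
        Optional<byte[]> result = originServer.apply(url);      // normal case: origin is up
        if (result.isPresent()) return result;
        result = localCache.apply(url);                         // origin down: check our cache
        if (result.isPresent()) return result;
        for (Function<String, Optional<byte[]>> peer : trackerPeerList.apply(url)) {
            result = peer.apply(url);                           // ask listed peers in order
            if (result.isPresent()) return result;
        }
        return Optional.empty();                                // nobody could serve it
    }
}
```

Note that the origin server is always tried first, which preserves the "do no harm" property: if the origin is up, NoTorrent behaves like an ordinary proxy.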
To keep the tracker's resource-to-peer mapping current, the peers send it update messages.
When a proxy retrieves a copy of a resource either from the origin server or from a peer, it sends a message to the tracker informing it that it has a copy of the resource.
When a client requests a resource from a peer and that peer fails to serve the resource, the client sends the tracker a message informing it that the peer failed to serve the content.
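The two update messages above imply a simple tracker data structure: a map from each resource to the set of peers claiming to hold it. A minimal in-memory sketch is given below; the method names are hypothetical, and NoTorrent's actual message format is not shown.

```java
import java.util.*;

/** Hypothetical in-memory tracker state; method names are illustrative. */
class TrackerState {
    private final Map<String, Set<String>> peersByResource = new HashMap<>();

    /** A peer announces that it holds a copy of the resource. */
    synchronized void announceHave(String resourceUrl, String peerAddress) {
        peersByResource.computeIfAbsent(resourceUrl, k -> new LinkedHashSet<>())
                       .add(peerAddress);
    }

    /** A client reports that a listed peer failed to serve the resource. */
    synchronized void reportFailure(String resourceUrl, String peerAddress) {
        Set<String> peers = peersByResource.get(resourceUrl);
        if (peers != null) peers.remove(peerAddress);
    }

    /** Peers currently believed to hold the resource (possibly an empty list). */
    synchronized List<String> peersFor(String resourceUrl) {
        return new ArrayList<>(peersByResource.getOrDefault(resourceUrl,
                                                            Collections.emptySet()));
    }
}
```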
When a proxy retrieves a copy of a resource either from the origin server or from a peer, it also caches a copy of the resource. The client's cache is used both to handle requests by that client and to handle requests made by peers. The peers' requests are handled by the client's server, which listens for incoming connections.
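A minimal sketch of such a cache-backed server loop follows. It assumes a simple line-oriented wire format (the peer sends a URL on one line and receives the cached body, or "MISS"); this format and the class are hypothetical and are not NoTorrent's actual protocol.

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch of a peer-serving loop backed by the local cache. */
class PeerServer {
    final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final ServerSocket listener;

    PeerServer(int port) throws IOException {
        listener = new ServerSocket(port);   // port 0 picks a free ephemeral port
    }

    int port() { return listener.getLocalPort(); }

    /** Handle one peer request: read a URL line, reply with the cached body or "MISS". */
    void handleOne() throws IOException {
        try (Socket peer = listener.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(peer.getInputStream(), StandardCharsets.UTF_8));
             OutputStream out = peer.getOutputStream()) {
            String url = in.readLine();
            byte[] body = (url == null) ? null : cache.get(url);
            out.write(body != null ? body : "MISS".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

A production server would of course handle each connection on its own thread; the single-request loop here is only to show the cache-to-peer data path.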
Going back further in the history of this project, I originally built a precursor to it (called PeDiCache) for CS 6998.9, Content Networking, at Columbia in the spring of 2005. At the time I designed that project, I was not familiar with the architecture of BitTorrent. PeDiCache, too, used a tracker (then called an index server) and client proxies in a manner similar to NoTorrent. My rationale in choosing that architecture at the time was that, while any peer-to-peer system with a centralized server (and thus a single point of failure) is not a true peer-to-peer system, my experience (from several years prior) was that the original Napster (which was based on a centralized index server) operated much more smoothly than Gnutella (which is truly peer-to-peer). Thus, I figured that a peer-to-peer web caching and serving system was worth implementing as an experiment. There are other approaches to countering the Slashdot Effect (see Related Work below), but PeDiCache and NoTorrent are my attempts at a different approach.
Another benefit of this architecture is the benefit that is inherent in most peer-to-peer applications: While availability worsens as more users request content in the client-server model, in peer-to-peer applications, availability tends to increase as users request (and thus serve) content. That is, since peers serve content to other peers in a peer-to-peer application, more users on the system means there will be more servers available.
The system comprises about 4,000 lines of Java code. All code was written from scratch. JDOM [4] was used to greatly simplify XML parsing in Java.
NoTorrent was originally intended to be a version of BitTorrent slightly modified so that it did not use .torrent files. As I worked with and researched BitTorrent further, I found that .torrent files are integral to almost every aspect of BitTorrent and that it could not just be "slightly modified" to be used without .torrent files. Thus, I decided it made more sense to write the application from scratch rather than start with the BitTorrent codebase and modify it. Still, NoTorrent was designed with BitTorrent in mind, and the overall structure is roughly the same.
The main purpose of BitTorrent is to enable peers to help each other download a large file. There are a few steps [5] to share a file via BitTorrent: the content provider creates a .torrent metadata file describing the file, publishes that .torrent file (typically on a web server), runs or designates a tracker, and seeds the complete file from a BitTorrent client; downloaders then open the .torrent file, contact the tracker to find peers, and download pieces of the file from one another.
As I developed NoTorrent, it became clearer that NoTorrent would have less in common with BitTorrent than I originally thought. However, NoTorrent retains several analogs to BitTorrent's setup: both use a central tracker that tells clients which peers have the content they want, and in both systems peers that have downloaded content turn around and serve it to other peers.
In hindsight, NoTorrent is now only similar to BitTorrent in that it uses this tracker-based architecture, in which peers locate one another through a central tracker and serve content directly to one another.
Other approaches have been attempted to try to handle flash crowds on the web. Most of these approaches (including BitTorrent) require the content provider to somehow prepare for a flash crowd. These approaches are useful in situations where a flash crowd might reasonably be expected but the cost of building up a more robust server architecture cannot be justified. That is, the content provider only expects rare traffic spikes, not sustained heavy traffic. One such solution is DotSlash [6], which is elegant in that it requires no effort by clients; effort is only required on the part of the content provider. However, there are many cases in the problem at hand (namely maintaining availability of content linked to by Slashdot) where a content provider would have no expectation of being flooded by a flash crowd.
An approach somewhat similar to the one discussed here is the PROOFS [7] project. It is similar in that it uses peer-to-peer to address the problem of flash crowds. It is also elegant in that it requires no effort on the part of the content provider or any other centralized server; effort is only required on the part of the clients. PROOFS seems to be a promising approach to address the problem we have set out to solve.
Most of the other approaches I have encountered required the content provider (the origin server) to prepare for a high volume of traffic. In NoTorrent, the origin server need take no action to prepare for a flash crowd. This is important, because flash crowds are typically unexpected (if they had been expected, the content provider likely would have set up a more robust server architecture).
For the problem domain I am addressing, it is the people who want the content (the Slashdot readers) who provide the capacity for others to obtain the content. Thus, incentives are more properly aligned.