The following text is mirrored from http://schnarff.com/gnutelladev/docs/our-protocol.html. This document contains a non-technical introduction to the Gnutella protocol. This description was probably accurate circa 2001, but is no longer an accurate description of Gnutella.
The Gnutella network is a form of distributed file sharing system. That is, each host connected to the network is, in theory, considered equal. In pseudo-distributed file sharing systems such as Napster or Scour Exchange, each client connects to one or more central servers. With GnutellaNet, there is no centralized server. Each client also functions as a server. This way, the network becomes much more immune to shutdown or regulation.
Technical Theory of Operation:
The Gnutella network is a collection of Gnutella servants that cooperate in maintaining the network. A broadcast packet on the Gnutella network begins its life at a single servant, and is broadcast to all connected servants. These servants then rebroadcast to all connected servants. This continues until the time to live of the packet expires.
If all servants have eight other servants connected, one broadcast packet with a time to live of 7 can make it to 8^7 servants, which is 2097152. This is far more than enough to reach all the servants on the Gnutella network.
A reply packet on the Gnutella network begins its life as a response to a broadcast packet. It is forwarded back along the path its initiating broadcast took, one servant at a time, until it reaches the servant that sent the broadcast.
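The broadcast reach figure quoted above can be reproduced with a short calculation. This is only a sketch of the text's own estimate: the function name `broadcast_reach` is made up for illustration, and the model assumes every servant passes the packet to eight servants that have not seen it yet. In practice one of the eight connections is the one the packet arrived on, and cycles produce duplicates, so the real reach would be lower.

```python
def broadcast_reach(fanout: int, ttl: int) -> int:
    """Idealized estimate from the text: fanout ** ttl servants.

    Assumes no servant is ever reached twice, which a cyclic
    network does not guarantee.
    """
    return fanout ** ttl


print(broadcast_reach(8, 7))  # 2097152
```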
To keep track of where a packet came from, each packet is prefixed with a 16-byte Message ID. The Message ID is simply random data. A servant uses the same Message ID for all of its own broadcast packets. Each servant keeps a hash table of the most recent few thousand packets it has received. The hash table matches the Message ID with the IP address the message came from. To route a reply packet back where it came from, a servant checks its hash table for the Message ID, and sends the reply to the IP address that the Message ID is matched to. This continues until the packet gets back home.
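The reply routing described above might be sketched as a bounded hash table keyed by Message ID. Everything named here (`RoutingTable`, `remember`, `route_reply`, `new_message_id`, the default capacity of 4096) is hypothetical; the text only says the table holds "the most recent few thousand" entries. Returning `False` for an ID that is already known is an added convenience for spotting repeats, not something the text describes.

```python
import os
from collections import OrderedDict
from typing import Optional


def new_message_id() -> bytes:
    """16 bytes of random data, as the text describes a Message ID."""
    return os.urandom(16)


class RoutingTable:
    """Maps a Message ID to the address the broadcast arrived from."""

    def __init__(self, capacity: int = 4096) -> None:
        self.capacity = capacity  # assumed size; the text says "a few thousand"
        self._routes: "OrderedDict[bytes, str]" = OrderedDict()

    def remember(self, message_id: bytes, source_ip: str) -> bool:
        """Record the sender of a broadcast. Returns False if the ID is already known."""
        if message_id in self._routes:
            return False
        self._routes[message_id] = source_ip
        if len(self._routes) > self.capacity:
            self._routes.popitem(last=False)  # forget the oldest entry
        return True

    def route_reply(self, message_id: bytes) -> Optional[str]:
        """Return the address a reply should be sent to, or None if unknown."""
        return self._routes.get(message_id)
```

A servant would call `remember` when a broadcast arrives and `route_reply` when the matching reply comes back; a `None` result means the entry has already been pushed out of the table and the reply cannot be routed.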
This results in a network with no hierarchy: every servant is equal. In order to be part of the network, one must contribute to the network. However, some servants are more equal than others. Servants running on faster Internet connections are better suited to act as hubs (that is, to maintain more GnutellaNet connections) than others, and therefore get responses from the network much faster.
Each Gnutella server only knows about the servers that it is directly connected to. All other servers are invisible unless they announce themselves by answering a Ping or by replying to a Query. This provides amazing anonymity.
Unfortunately, the combination of having no hierarchy and the lack of a definitive source for a server list means that the network is not easily described. It is not a tree (since there is no hierarchy) and it is cyclic. Being cyclic means there is a lot of needless network traffic. Clients today do not do much to reduce the traffic, but for the GnutellaNet to scale, developers will need to start thinking about that.
Connecting:
The initiator opens a TCP connection. The initiator sends 'GNUTELLA CONNECT/0.4\n\n'. The receiver sends 'GNUTELLA OK\n\n'. After this, it's all packets.
Header:
Bytes 0 - 15: Message ID: A Message ID is generated on the client for each new message it creates. The Message ID is 16 bytes of random data.
Byte 16: Function ID: What message type the packet is. See the list of function types below for descriptions of the types.
Byte 17: TTL Remaining: How many hops the packet has left before it should be dropped.
Byte 18: Hops taken: How many hops this packet has already taken. Set the TTL on response messages to this value!
Bytes 19 - 22: Data Length: The length of the function-dependent data which follows. There has been some discussion as to whether this value is actually only 2 bytes and the last 2 bytes are something else. It seems to work with 4 for me. There is also a question as to whether the integer is signed or unsigned. I don't know that either; I can't get gnutella to try to send a 2^31 + 1 byte packet :).
List of Functions:
- 0: Ping
- 1: Ping Reply (Pong)
- 64: Push Request
- 128: Query
- 129: Query Reply (Hits)
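The 23-byte header and the function IDs listed above could be packed and unpacked as follows. This is a minimal sketch, not a reference implementation: the names `Function`, `HEADER`, `pack_header`, and `unpack_header` are invented here, and two things are assumptions the text leaves open, namely that Data Length is an unsigned 4-byte integer and that it is stored little-endian.

```python
import struct
from enum import IntEnum


class Function(IntEnum):
    """Function IDs from the list above."""
    PING = 0
    PONG = 1
    PUSH = 64
    QUERY = 128
    QUERY_HIT = 129


# 16-byte Message ID, function, TTL, hops, 4-byte data length.
# "<" = no padding, little-endian; the byte order is an assumption.
HEADER = struct.Struct("<16sBBBI")


def pack_header(message_id: bytes, function: Function, ttl: int,
                hops: int, data_length: int) -> bytes:
    """Build the 23-byte header that precedes every packet body."""
    return HEADER.pack(message_id, function, ttl, hops, data_length)


def unpack_header(data: bytes):
    """Split the first 23 bytes into (message_id, function, ttl, hops, data_length)."""
    message_id, function, ttl, hops, data_length = HEADER.unpack_from(data)
    # Function() raises ValueError for an ID that is not in the list above.
    return message_id, Function(function), ttl, hops, data_length
```

After reading a header, a receiver would read `data_length` more bytes to get the function-dependent body.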
Ping:
A Ping has no body.
Routing: Rebroadcast packet through every available connection, except the one it was received from.
Ping Reply (Pong):
Bytes 0 - 1: Host port: The TCP port number of the listening host
Bytes 2 - 5: Host IP: The IP address of the listening host, in network byte order.
Bytes 6 - 9: File Count: An integer value indicating the number of files shared by the host. No idea if this is a signed or unsigned value.
Bytes 10 - 13: Files Total Size: An integer value indicating the total size of files shared by the host, in kilobytes (KB). No idea if this is a signed or unsigned value.
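As an illustration of the Pong body layout, the sketch below packs and unpacks the four fields into 14 bytes, taking each of the two counters to be 4 bytes wide. The helper names `pack_pong` and `unpack_pong` are hypothetical. The IP address is written in network byte order as stated above; treating the port and both counters as unsigned little-endian integers is an assumption, since the text does not settle either point.

```python
import socket
import struct

# port (2 bytes), IPv4 address (4 raw bytes), file count (4), total KB (4).
# Little-endian, unsigned integers are assumed, not confirmed by the text.
PONG = struct.Struct("<H4sII")


def pack_pong(port: int, ip: str, file_count: int, total_kb: int) -> bytes:
    """Build a Pong body; inet_aton yields the address in network byte order."""
    return PONG.pack(port, socket.inet_aton(ip), file_count, total_kb)


def unpack_pong(body: bytes):
    """Return (port, dotted-quad IP, file_count, total_kb)."""
    port, raw_ip, file_count, total_kb = PONG.unpack_from(body)
    return port, socket.inet_ntoa(raw_ip), file_count, total_kb
```

For example, `unpack_pong(pack_pong(6346, "192.168.0.10", 12, 3456))` round-trips the same four values; the numbers themselves are made-up sample data.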
Routing: Forward packet only through the connection that the Ping came from.
Query:
Bytes 0 - 1: Minimum Speed: The minimum speed of servants which should perform the search. This is entered by the user in the "Minimum connection speed" edit box.
Bytes 2 +: Search String: A NULL-terminated character string which contains the search request.
Routing: Rebroadcast packet through every available connection, except the one it was received from.
Query Reply (Hits):
Byte 0: Number of Items: Number of Search Reply Items (see below) which follow this header.
Bytes 1 - 2: Host Port: The listening port number of the host which found the results.
Bytes 3 - 6: Host IP: The IP address of the host which found the results. In network byte order.
Bytes 7 - 8: Host Speed: The speed of the host which found the results.
Bytes 9 - 10: Unknown: It could be part of the host speed field, or something unknown.
Bytes 11 +: List of Items: A Search Response Item (see below) for each result found.
Last 16 Bytes: Footer: The clientID128 of the host which found the results. This value is stored in the gnutella.ini and is a GUID created with CoCreateGUID() the first time gnutella is started.
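A Query Reply body could be walked roughly as shown below, using the field offsets above together with the Search Reply Item layout described later in this document (4-byte file index, 4-byte file size, double-NULL terminated file name). This is a sketch under several assumptions that the text does not confirm: integers are unsigned and little-endian, file names contain no NULL bytes and decode as Latin-1, and bytes 9 - 10 are simply skipped. The names `parse_query_hits`, `HITS_HEAD`, and `ITEM_HEAD` are invented for the example.

```python
import socket
import struct

# count (1), port (2), IPv4 (4), speed (2), unknown (2) = 11 bytes.
HITS_HEAD = struct.Struct("<BH4sH2s")
# file index (4), file size (4); the file name follows.
ITEM_HEAD = struct.Struct("<II")


def parse_query_hits(body: bytes) -> dict:
    """Decode a Query Reply body into host details, items, and the client ID."""
    count, port, raw_ip, speed, _unknown = HITS_HEAD.unpack_from(body)
    end_of_items = len(body) - 16        # the footer is the last 16 bytes
    offset = HITS_HEAD.size
    items = []
    for _ in range(count):
        index, size = ITEM_HEAD.unpack_from(body, offset)
        offset += ITEM_HEAD.size
        # The name runs up to the first double NULL (raises ValueError if absent).
        terminator = body.index(b"\x00\x00", offset, end_of_items)
        name = body[offset:terminator].decode("latin-1")
        offset = terminator + 2
        items.append((index, size, name))
    return {
        "port": port,
        "ip": socket.inet_ntoa(raw_ip),
        "speed": speed,
        "items": items,
        "client_id": body[end_of_items:],
    }
```

The `client_id` returned here is the value a requester would later place in a Push Request, and each item's index and name are what a download request is built from.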
Routing: Forward packet only through the connection that the Query came from.
Push Request (Query Reply Reply):
Bytes 0 - 15: ClientID128: The ClientID128 GUID of the server the client wishes the push from.
Bytes 16 - 19: File Index: Index of file requested. See Search Reply Items for more info.
Bytes 20 - 23: Requester IP: IP Address of the host requesting the push. Network byte order.
Bytes 24 - 25: Requester Port: Port number of the host requesting the push.
Search Reply Items:
Bytes 0 - 3: File Index: Each file indexed on the server has an integer value associated with it. When Gnutella scans the hard drive on the server a sequential number is given to each file as it is found. This is the file index.
Bytes 4 - 7: File Size: The size of the file (in bytes).
Bytes 8 +: File Name: The name of the file found. No path information is sent, just the file's name. The filename field is double-NULL terminated.
Downloading:
Downloading is done by HTTP. A GET request is sent, with a URI that is constructed from the information in a Search Reply. The URI starts with /get/, then the File Index number (see Search Reply Items), then the filename. Example download request:
GET /get/1234/strawberry-rhubarb-pies.rcp HTTP/1.0\r\n
Connection: Keep-Alive\r\n
Range: bytes=0-\r\n
\r\n
The server should respond with normal HTTP headers, then the file.
HTTP 200 OK\r\n
Server: Gnutella\r\n
Content-type:application/binary\r\n
Content-length: 948\r\n
\r\n
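The example download request above can be assembled from a Search Reply Item along these lines. The function name `build_download_request` and its `start` parameter are hypothetical, and the sketch assumes the file name is plain ASCII that needs no escaping, which the text does not address.

```python
def build_download_request(file_index: int, file_name: str, start: int = 0) -> bytes:
    """Build the GET request for one search result, byte for byte.

    The URI is /get/, then the File Index, then the file name,
    as described in the Downloading section.
    """
    lines = [
        f"GET /get/{file_index}/{file_name} HTTP/1.0",
        "Connection: Keep-Alive",
        f"Range: bytes={start}-",
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

Calling `build_download_request(1234, "strawberry-rhubarb-pies.rcp")` yields the same bytes as the example request; a non-zero `start` would ask the server to resume from that offset.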
Uploading is done in response to a Push Request. The uploader establishes a TCP connection, and sends GIV, then the File Index number, then the ClientID128 of the uploader, and then the filename. Example: