Distributed file storage
What is distributed file storage?
There are two broad categories of peer-to-peer applications:
- Distributed file sharing - These are applications that allow real-time sharing of content with other users who are online at the same time; however, the content does not 'live' on the peer network once the user sharing it leaves the network. It does not 'persist' over time. Examples of this kind of network are Napster and Gnutella.
- Distributed file storage - These are a newer class of peer-to-peer applications where the peer network looks and feels like 'a giant hard drive in the sky'. Files are saved into this 'space' and persist there even after the original peer that performed the insert has left the network. Examples of this kind of network are:
- The Circle
- OceanStore, which aims to store 1 mole of bytes (6 * 10^23) for 1000 years. It is based on Tapestry and provides multi-user read/write access.
- PAST is a secure distributed file store based on Pastry that allows only data insertion, data extraction, and data reclamation.
- Pasta allows a file's owner to rewrite it and others to read it. Pasta is also based on Pastry.
- CFS is an almost complete distributed file system built on top of Chord.
- Ivy is a multi-user distributed file system that uses a distributed hash table.
- Advogato: The Open Group releases DCE 1.2.2 as LGPL'd Free Software - Quote: "[...] Yet even this is somewhat irrelevant, although related to, what has been released: DCE 1.2.2 contains DFS - the Distributed File System. And I'm going to be bold enough to claim that DFS is the ONLY Distributed File System in the world that actually does what it says on the tin: distribute files. Getting locking right is a bitch: it requires an absolute dead-accurate distributed and correct view of what the present time is on every machine: hence, in DCE 1.2.2, you have a time server. [...]"
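The difference between the two categories described at the top of this section (sharing versus storage) can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how any of the systems listed above actually work: the class names, the `replicas` parameter, and the rule of placing copies on the peers closest to a file's hash on a ring are all hypothetical choices made for illustration.

```python
import hashlib

RING = 2**32

def ring_id(name: str) -> int:
    """Map a peer or file name to a point on a 2**32 ring (assumed scheme)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")

class ToySharingNetwork:
    """Sharing: a file is only reachable while the peer sharing it is online."""
    def __init__(self):
        self.shared = {}  # peer name -> {file name: data}

    def join(self, peer):
        self.shared.setdefault(peer, {})

    def share(self, peer, name, data):
        self.shared[peer][name] = data

    def leave(self, peer):
        self.shared.pop(peer, None)  # the peer's files vanish with it

    def fetch(self, name):
        for files in self.shared.values():
            if name in files:
                return files[name]
        return None

class ToyStorageNetwork:
    """Storage: the network keeps `replicas` copies on the responsible peers."""
    def __init__(self, replicas=2):
        self.replicas = replicas
        self.peers = {}  # peer name -> {file name: data}

    def join(self, peer):
        # A real system would also rebalance existing files when a peer joins.
        self.peers.setdefault(peer, {})

    def responsible_peers(self, name):
        # The peers whose ring positions follow the file's position most closely.
        k = ring_id(name)
        ordered = sorted(self.peers, key=lambda p: (ring_id(p) - k) % RING)
        return ordered[: self.replicas]

    def insert(self, name, data):
        # The inserting peer does not need to hold a copy itself.
        for p in self.responsible_peers(name):
            self.peers[p][name] = data

    def leave(self, peer):
        # Graceful departure: hand files to whichever peers are now responsible.
        for name, data in self.peers.pop(peer, {}).items():
            for p in self.responsible_peers(name):
                self.peers[p][name] = data

    def fetch(self, name):
        for p in self.responsible_peers(name):
            if name in self.peers[p]:
                return self.peers[p][name]
        return None

if __name__ == "__main__":
    sharing = ToySharingNetwork()
    sharing.join("alice")
    sharing.share("alice", "notes.txt", b"hello")
    sharing.leave("alice")
    print(sharing.fetch("notes.txt"))  # None: the content left with alice

    storage = ToyStorageNetwork(replicas=2)
    for peer in ["alice", "bob", "carol", "dave"]:
        storage.join(peer)
    storage.insert("notes.txt", b"hello")
    storage.leave("alice")
    print(storage.fetch("notes.txt"))  # b'hello': the content persists
```

The sketch only handles peers that leave gracefully. A real storage network also has to cope with peers that disappear without warning, which is one reason replication and error-correcting codes come up in this field.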
What kinds of algorithms and strategies have been used to achieve distributed file storage?
The field of distributed file storage is newer than distributed file sharing, and is therefore still developing and finding its legs. No ubiquitous 'killer app' has come out of it yet. Some important tools and concepts that can help you get started:
- Distributed hash table
- Emergent file storage networks
- Forward Error Correction
- Bloom Filters
- Overlay networks
- Hash Cash
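As one small illustration of the concepts listed above, here is a minimal Bloom filter sketch in Python. A Bloom filter is a compact set summary that can report false positives but never false negatives. How a storage network would use it is an assumption here: for example, a peer might publish a filter of the file keys it holds so that other peers can skip asking it for files it definitely lacks. The class name, the default sizes, and the use of salted SHA-256 digests as hash functions are illustrative choices, not taken from any of the systems mentioned on this page.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: `num_hashes` bit positions per item."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several hash functions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

if __name__ == "__main__":
    held = BloomFilter()
    held.add("notes.txt")
    print(held.might_contain("notes.txt"))  # True: added items are always found
```

The trade-off is between filter size and false-positive rate: a larger bit array or a better-chosen number of hashes lowers the chance that an absent key appears present.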
What are some good discussion forums for distributed file storage?
The following are some IRC channels, mailing lists, and other places where folks are building this new kind of P2P network:
- bluesky (The bluesky mailing list seems to be dead; there hasn't been traffic there since May 2002. What's up? --BradNeuberg)
- BitTorrent could be considered a form of distributed file storage, or perhaps more accurately "distributed distribution".
- This thread on Slashdot about distributed file systems for Linux
- Brilliant Digital is also using this approach covertly in programs like KaZaA and other services. Its software runs in the background and acts as distributed storage for Brilliant's clients. It is considered spyware.