D-RAID

A Modest Proposal (and I mean a real modest proposal, not the Jonathan Swift kind that suggests eating children.)

(Note: This is completely theoretical and will never actually be implemented by me, but if you want to implement it, drop me a line and give me credit as the source.)
(2nd Note: This proposal hinges on being able to find an m,n secret sharing algorithm that separates the data into chunks about 1/n the original size.)

D-RAID is a plan for storing and protecting important and/or unique information by cutting it up into tiny chunks and storing them worldwide on volunteer systems. D-RAID stands for Distributed Redundant Array of Independent Disks. The plan is, essentially, that people donate space on their own hard drives and, in return, may send information they want stored to a central server. This information is encrypted with the server's public key. Once on the server, the information is decrypted, then encrypted using the sender's public key. It is then run through an m,n secret-sharing algorithm, with m being roughly double n. After the information has been encrypted and split, the segments will be distributed among m machines, one segment per machine, or, if m machines aren't available, the excess will be stored locally on the hard disk of the server. When information has to be retrieved, the server will tell the client program on each relevant system to upload the required segment from its disk to the server, where it is recombined with the rest of the n segments and the result is sent back to the owner of the information.
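The m,n split described above can be sketched in a few lines. This is only an illustration under stated assumptions: it swaps in a Rabin-style information dispersal over the prime field GF(257) rather than a true secret sharing scheme, because dispersal is what gives chunks of about 1/n the original size (in Shamir's scheme, for comparison, every share is as large as the secret). Dispersal offers no secrecy of its own, so the sketch relies on the data already being encrypted, as the plan says. The names split, join and P are hypothetical.

```python
# Illustrative m-of-n dispersal over GF(257); NOT a vetted secret sharing scheme.
# Assumes the input was already encrypted, as the proposal describes.
P = 257  # smallest prime above 255, so every byte value is a field element


def split(data: bytes, n: int, m: int) -> dict[int, list[int]]:
    """Return m chunks keyed by x = 1..m; any n of them rebuild the data."""
    if not 1 <= n <= m < P:
        raise ValueError("need 1 <= n <= m < 257")
    padded = data + bytes(-len(data) % n)  # zero-pad to a multiple of n
    chunks: dict[int, list[int]] = {x: [] for x in range(1, m + 1)}
    for i in range(0, len(padded), n):
        coeffs = padded[i:i + n]  # n bytes become polynomial coefficients
        for x in chunks:
            y = 0
            for c in reversed(coeffs):  # Horner evaluation mod P
                y = (y * x + c) % P
            chunks[x].append(y)  # values lie in 0..256, slightly over a byte
    return chunks


def _invert_vandermonde(xs: list[int]) -> list[list[int]]:
    """Invert V[r][c] = xs[r]**c mod P by Gauss-Jordan elimination."""
    n = len(xs)
    rows = [[pow(x, c, P) for c in range(n)] + [int(r == c) for c in range(n)]
            for r, x in enumerate(xs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)  # modular inverse (Python 3.8+)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[n:] for row in rows]


def join(chunks: dict[int, list[int]], n: int, length: int) -> bytes:
    """Rebuild the original bytes from any n chunks."""
    xs = sorted(chunks)[:n]
    if len(xs) < n:
        raise ValueError("not enough chunks")
    inv = _invert_vandermonde(xs)
    out = bytearray()
    for pos in range(len(chunks[xs[0]])):
        ys = [chunks[x][pos] for x in xs]
        for row in inv:  # each row recovers one coefficient, i.e. one byte
            out.append(sum(a * y for a, y in zip(row, ys)) % P)
    return bytes(out[:length])
```

With n = 3 and m = 6, split(data, 3, 6) yields six chunks, and join on any three of them (together with the original length) returns the data. Each chunk holds one value per group of n bytes, which is where the roughly 1/n size comes from.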

User IDs

A user's ID will be his/her e-mail address at the time of joining, and a password will be chosen by the user. If a user forgets his/her password, they can have it mailed to them. If a user changes their e-mail address, they must submit a message to the server using the new e-mail address and give their password. A change-of-ID message will be sent to the old e-mail address, and if a cancellation reply is not sent from the old e-mail account by the end of the refresh cycle after the current one (refresh cycles are described later), the user ID will be changed to the new address.
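The waiting period for an ID change can be modelled as a small piece of server-side state. The following is a minimal sketch, assuming refresh cycles are numbered with integers; the Account class and the function names are hypothetical, and password checking and the actual e-mail sending are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    user_id: str                           # current e-mail address
    pending_id: Optional[str] = None       # requested new address, if any
    effective_cycle: Optional[int] = None  # cycle whose end applies the change


def request_change(acct: Account, new_email: str, current_cycle: int) -> None:
    """Record a change request; a notice would go to the old address here."""
    acct.pending_id = new_email
    acct.effective_cycle = current_cycle + 1  # the cycle after the current one


def cancel_change(acct: Account) -> None:
    """Called when the old address replies with a cancellation."""
    acct.pending_id = None
    acct.effective_cycle = None


def end_of_cycle(acct: Account, finished_cycle: int) -> None:
    """Apply the pending change once its waiting period has run out."""
    if acct.pending_id is not None and finished_cycle >= acct.effective_cycle:
        acct.user_id = acct.pending_id
        cancel_change(acct)  # clear the pending fields
```

A request made during cycle 4 therefore survives the end of cycle 4 untouched and only takes effect at the end of cycle 5, unless cancel_change runs first.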

The Client Program

The client program is what will be doing the uploading and downloading of files. It is recommended that this program be open source to curb fears about security holes. It is also suggested that the client run as its own user, such as draidd (D-RAID Daemon), in order to further limit the program's read/write abilities. The program will run in the background and update either when the PPP link is open (for dial-up) or every 24 hours (for Ethernet connections).
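The update rule above comes down to a single decision. Here is a minimal sketch, assuming the client is told its link type as a string and can ask whether the PPP link is up; should_update and its parameters are hypothetical names. A real client would probably update once per dial-up session rather than continuously while the link stays open.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(hours=24)  # interval stated in the proposal


def should_update(link: str, ppp_up: bool, last_update: datetime,
                  now: datetime) -> bool:
    """Decide whether the client should contact the server now."""
    if link == "dialup":
        return ppp_up  # update whenever the PPP link is open
    if link == "ethernet":
        return now - last_update >= UPDATE_INTERVAL
    raise ValueError(f"unknown link type: {link}")
```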

The Refresh Cycle

In order to avoid losing data through attrition as people leave, a refresh cycle must be created. The refresh cycle "ages out" inactive users. At the beginning of every cycle (which I suggest should be about 30-40 days), a message is put on the server, which is read the next time a client program updates (see above). The message essentially tells the client to get a list of the files under its control, make a hash of the list, and upload it to the server. The server checks this against the hash that is in the client's entry on its list of clients. If they are different, the server requests the corresponding segments from its other clients, waits until it gets n of them, and re-creates the damaged files, which it queues for download to the correct client.
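The hash comparison at the heart of the refresh cycle might look like the sketch below. It assumes SHA-256 and a newline-separated, sorted list of segment names; neither choice comes from the plan itself, and manifest_hash and needs_repair are hypothetical names. Note that hashing only the list of names catches missing segments but not corrupted contents; hashing each file's bytes as well would be needed for that.

```python
import hashlib


def manifest_hash(segment_names: list[str]) -> str:
    """Hash a client's segment list; sorting makes the result order-independent."""
    h = hashlib.sha256()
    for name in sorted(segment_names):
        h.update(name.encode("utf-8") + b"\n")
    return h.hexdigest()


def needs_repair(reported: str, expected_names: list[str]) -> bool:
    """Server side: compare the uploaded hash with the one on record."""
    return reported != manifest_hash(expected_names)
```

The client would upload manifest_hash of whatever it finds on disk, and the server would call needs_repair with the list it has on record for that client.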

Segment Storage

Segment names will most likely consist of a hash of the user ID of the segment's owner, plus a 2-digit number to allow for more than one file per user, following the template [HASH]XX.DRS (DRS meaning D-RAID Segment). Any segments that cannot be placed due to a lack of clients will be stored locally on the server until more clients are added.
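The [HASH]XX.DRS template could be produced as follows. This is a sketch only: the plan does not name a hash function, so SHA-256 in hexadecimal is an assumption, as is lowercasing the e-mail address before hashing. segment_name is a hypothetical helper.

```python
import hashlib


def segment_name(user_id: str, file_number: int) -> str:
    """Build a [HASH]XX.DRS name from a user ID and a two-digit file number."""
    if not 0 <= file_number <= 99:
        raise ValueError("file number must fit in two digits")
    # Assumption: SHA-256 of the lowercased address, written as hex.
    digest = hashlib.sha256(user_id.lower().encode("utf-8")).hexdigest()
    return f"{digest}{file_number:02d}.DRS"
```

The two-digit field caps each user at 100 files (00 through 99), which follows directly from the template.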

The Server

The server should be permanently connected to the Internet, or whatever network it is meant to service. The server will store the split-up secrets locally until each client does its routine update. Unfortunately, this means that the server will need a large amount of disk space, preferably enough to hold approximately 10% of the files in segmented form. The server is also in charge of initiating the refresh cycle. Most of the computation on the server comes in bursts, and therefore the server could easily double as a low-traffic server for web, mail, ftp, etc., especially if you limit the CPU-intensive activities, such as encryption and the secret sharing algorithm, to scheduled "down times".
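The disk space figure can be turned into a rough estimate. The sketch below assumes each of the m segments is about 1/n the size of the original file, so the segmented form takes roughly m/n times the original space; server_disk_estimate is a hypothetical helper, and the 10% default is the figure suggested above.

```python
def server_disk_estimate(total_bytes: int, n: int, m: int,
                         held_fraction: float = 0.10) -> int:
    """Estimate the server disk needed to hold a fraction of all segments."""
    segmented = total_bytes * m / n  # m chunks, each about 1/n of the file
    return round(segmented * held_fraction)
```

For example, with 100 GB of stored files, n = 5 and m = 10, the segmented form is about 200 GB, so the server would want roughly 20 GB of local space.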

Terms

M,N secret sharing algorithm - an algorithm that splits up a file and distributes M chunks to different people, such that only N of the chunks are required to re-assemble the data.
