In my last post, I wrote a non-technical introduction to my software proposal for an online peer-to-peer relational database, and in lieu of my long-promised paper, I thought a brief technical treatment might be of interest.
My vision is for this software to be open-source, so that anyone can obtain it for free and modify it freely. I also want the hosting demands to be minimal, and for the software to work reasonably well in a shared-host environment. That means, I think, that it should be written in PHP; as far as I know, Rails apps are not as easy to deploy reliably on a shared server, and beyond those two I am not sure what other languages should even be considered. Furthermore, I don’t want to lock the user into MySQL, as PHP applications often do – even though MySQL is a great database. Instead, the system should auto-detect which databases the user’s environment supports, then allow the user to choose (or default to MySQL if they have no preference). To this end, I have my eye on Propel, or perhaps something a bit more lightweight such as NotORM.
So, the installation of this software should be a little like WordPress. You upload the unzipped files to the root of your vhost (even via FTP, if you’ve no console access) and fire up the web interface. You then enter your database credentials and it builds the core database environment while you wait. Once it is done, you get a nice clean control panel to configure every aspect of the system.
To start with, you create a schema, i.e. a database structure design. This consists of the usual components of database development: tables, columns, primary and foreign keys, and null and relational constraints. I expect to limit the choice of column types to a handful of the most useful kinds, to ensure they all work across all platforms. Next, one creates a node instance and specifies which schema it should use. Another set of database credentials is entered at this point, to hold node-specific tables (table prefixing could be used, but shared hosts with “unlimited” databases are ten a penny now). Several nodes can be stored within one instance of the software; collectively they are known as the nodeset. Nodes can be started and stopped in the web interface, though they are not technically standalone servers; in fact they just use Apache (or whatever web server is available) to respond to requests. A stopped node will probably return a status code of 403.
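To make the start/stop behaviour concrete, here is a quick sketch (in Python for brevity; the real thing would be PHP, and all the names here are invented) of how a request to a node might be gated on its state:

```python
# Hypothetical node registry: node name -> current state.
NODE_STATUS = {"jobs-node": "stopped", "books-node": "running"}

def handle_request(node_name):
    """Return an (HTTP status, body) pair for a request aimed at a node."""
    status = NODE_STATUS.get(node_name)
    if status is None:
        return 404, "No such node"
    if status != "running":
        # A stopped node still answers, but refuses to serve.
        return 403, "Node is stopped"
    return 200, "OK"
```

The point is that "stopped" is just a flag the web layer checks, not a separate daemon being shut down.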
Each node will have a data editor screen associated with it, into which data may be added, modified or deleted. “Deletion” in this case will not, however, result in a DELETE command; instead it will mark the affected row as deleted, so that the deletion can be propagated throughout the network. The admin will have to decide how to allocate access to editor screens, since all data entered will be sent to other nodes; it could be admins only, registered users only, anonymous users, and so on. Other nodes will trust (or distrust) a given node based on their perception of the quality of data entered into it, so care should be taken here. In some use-cases the generic interface will be insufficient, and so the node owner may develop their own software that talks to the node database directly.
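The soft-deletion idea can be sketched in a few lines (Python and SQLite here purely for illustration; the table and column names are made up):

```python
import sqlite3

# "Deleting" a row only sets a flag; the row survives so the deletion
# itself is a change that can later be pushed to peers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, "
    "is_deleted INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO jobs (title) VALUES ('PHP developer')")

def soft_delete(conn, job_id):
    # No DELETE statement is ever issued.
    conn.execute("UPDATE jobs SET is_deleted = 1 WHERE id = ?", (job_id,))

soft_delete(conn, 1)
live = conn.execute("SELECT COUNT(*) FROM jobs WHERE is_deleted = 0").fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
```

Ordinary reads would simply filter on `is_deleted = 0`, while the replication layer sees the flagged row and forwards the deletion.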
Once the first node is up and running, suppose a second admin wishes to mirror the data (perhaps to further a particular aim, or to make use of the data in his/her own system). So, they set up a software instance on their own server. They download a copy of the schema from the first node and create an empty node in their control panel. Next, they set up a trust relation to the first node, likely on the basis that it is doing a good job of its stated role. Since the first node is by default set to auto-accept any trust requests, the new node is added to the first node’s update list (note, however, that this does not imply the converse trust – node 1 does not automatically trust node 2). Together the two nodes form a mesh, albeit a pretty limited one; new nodes may then spring up and have trust arranged with existing nodes. Once newer nodes prove themselves trustworthy (i.e. they don’t enter bad data) or reliable (they have good uptime), admins of older nodes may choose to add trust relations to them.
Data updates are pushed, rather than pulled. This means that servers do not need to redundantly poll their immediate peers to ask for changes; instead, when a change is made, it is pushed out initially by the node where that change occurred. When a peer receives it, it pushes it out to its own peers, and so on and so forth, until the change has reached the whole mesh. Any messages between nodes, data transfers included, will be done in XML. For hosting environments that can support it, a background task could monitor for changes and start pushing immediately, but for shared environments, this would run as a simple cron job.
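A toy sketch of this flood-style push (Python for illustration; the peer graph, message shape and element names are all invented) – each node forwards a change to its peers exactly once, using the change id to avoid endless re-sending:

```python
import xml.etree.ElementTree as ET

# A tiny three-node mesh: node -> list of peers it pushes to.
PEERS = {"A": ["B", "C"], "B": ["C"], "C": []}
SEEN = {name: set() for name in PEERS}    # change ids already handled
INBOX = {name: [] for name in PEERS}      # messages each node received

def make_change_xml(change_id, table, column, value):
    # A hypothetical XML envelope for one column-level change.
    root = ET.Element("change", id=change_id, table=table, column=column)
    root.text = value
    return ET.tostring(root, encoding="unicode")

def push(node, xml_msg):
    change_id = ET.fromstring(xml_msg).get("id")
    if change_id in SEEN[node]:
        return  # already seen: the flood stops here
    SEEN[node].add(change_id)
    INBOX[node].append(xml_msg)
    for peer in PEERS[node]:
        push(peer, xml_msg)

push("A", make_change_xml("c1", "jobs", "title", "Senior PHP developer"))
```

Despite node C being reachable via two paths, the seen-set ensures it stores (and forwards) the change only once.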
Since each server admin will want to maintain the integrity of their data, each row will be versioned. This will allow the history of a row to be examined at any point, including which node was responsible for a particular edit. If a node makes a series of edits judged by the community as bad (or malicious), admins will stop trusting it, and it will lose its ability to change the data. If an otherwise good node persists in trusting a bad node, then the good node will also lose trust – not least because who trusts whom will be publicly visible.
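The versioning could be as simple as an append-only history per row, each entry naming the node that made the edit (again a Python sketch with invented names, not a committed design):

```python
# (table, row_id) -> list of version records, oldest first.
history = {}

def record_edit(table, row_id, column, value, source_node):
    versions = history.setdefault((table, row_id), [])
    versions.append({
        "version": len(versions) + 1,
        "column": column,
        "value": value,
        "source": source_node,  # the node responsible for this edit
    })

record_edit("jobs", 1, "title", "PHP developer", "node-1")
record_edit("jobs", 1, "title", "Senior PHP developer", "node-2")
```

An admin auditing row 1 can then see exactly which node introduced which value, which is what the trust decisions above depend on.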
Changes are made at the column level, not at the record level. This means that if two non-connected nodes edit the same record at the same time, but alter different columns, then both changes will correctly propagate regardless of the order in which they were received. If, however, they were to edit the same column at the same time, then the change with the later timestamp will win; this of course requires each server to keep accurate time, validated against an internet time source.
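In other words, a last-write-wins rule applied per column rather than per row. A minimal sketch (illustrative Python; integer timestamps stand in for real clock values):

```python
def apply_change(row, column, value, timestamp, applied_at):
    """Apply a column-level change to `row`.

    `applied_at` maps column -> timestamp of the change currently in
    effect, so a stale change arriving late is simply discarded.
    """
    if column not in applied_at or timestamp > applied_at[column]:
        row[column] = value
        applied_at[column] = timestamp

row = {"title": "PHP dev", "salary": "30k"}
applied_at = {}

# Two disconnected nodes edit *different* columns: both changes survive.
apply_change(row, "title", "Senior PHP dev", 100, applied_at)
apply_change(row, "salary", "40k", 101, applied_at)

# The *same* column edited twice: the later timestamp wins, even if the
# older change happens to arrive after the newer one.
apply_change(row, "salary", "45k", 105, applied_at)
apply_change(row, "salary", "35k", 103, applied_at)  # stale; ignored
```

Note that the arrival order of the last two changes doesn’t matter – only their timestamps do, which is why synchronised clocks are essential.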
Whilst a node is connected to a schema, the schema may not be modified. However, it is likely that nodes collaborating on a mesh will want to upgrade the schema they all use at some point. To this end, the community agrees on a new schema, which is then posted on a public website. Several server admins upload it, and the others may then either upload it in the same fashion, or use their admin screen to fetch it from a peer automatically. From here, each admin creates a new database, connects it to the new schema, and then uses a migration system to copy the data from the old database to the new one. Their node is stopped, connected to the new database, and then restarted. Rules can be set so that peers with radically differing schema versions may not communicate at all, but peers with minor differences will do their best (e.g. ignore an extra/missing column).
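One plausible shape for those rules (a sketch only – the major/minor split and the column-filtering behaviour are my assumptions here, not settled design):

```python
def compatible(my_version, peer_version):
    """Refuse peers on a different major schema version; tolerate
    minor-version differences."""
    my_major = int(my_version.split(".")[0])
    peer_major = int(peer_version.split(".")[0])
    return my_major == peer_major

def merge_columns(my_columns, incoming):
    """For a minor mismatch, keep only the columns this node knows
    about and silently drop the rest."""
    return {col: val for col, val in incoming.items() if col in my_columns}
```

So two nodes on 2.1 and 2.4 still exchange data, each ignoring the columns it doesn’t recognise, while a 2.x node won’t talk to a 3.x node at all.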
Nodes initially will just replicate all data. However, later on, to cater for admins who wish to mirror just a subset, we will permit nodes to carry out selective replication. This will permit the “IT Jobs” data subset given as an example in the last post.
There should be a limit to the number of nodes that may peer with a given node, so as to discourage “star-shaped” meshes that have a single, central point of failure (this would arise by virtue of everyone wanting to directly trust the most active editor nodes). What this figure should be will be arrived at by experimentation, but I should think it would be around 8. Our aim here is to create a balanced mesh that can survive partial failure.
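The cap itself is trivial to enforce – something like this (the figure of 8 is, as above, provisional):

```python
MAX_PEERS = 8  # provisional; the right figure is to be found by experiment

def can_accept_peer(current_peers):
    # Refuse new trust relations once the cap is reached, to discourage
    # star-shaped meshes forming around a few popular nodes.
    return len(current_peers) < MAX_PEERS
```

The interesting work is not the check but choosing the number: too low and the mesh fragments, too high and it collapses back toward a star.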
Talking of experimentation, it shall be perfectly possible to have an interesting, randomly-connected mesh within a single nodeset, i.e. all on one server. Such a configuration would not be very useful in real life, but will be excellent for testing theoretical network propagation approaches. This might use a factory pattern so that the message transport can swap HTTP for something else transparently.
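Sketched out, the factory idea looks something like this (Python for illustration; class and method names are invented):

```python
class HttpTransport:
    """Real deployments would POST the XML message to a peer's URL."""
    def send(self, node_url, xml_msg):
        # Network access deliberately not sketched here.
        raise NotImplementedError("real HTTP delivery goes here")

class LocalTransport:
    """Delivers messages directly to in-memory nodes in the same
    nodeset - ideal for simulating a whole mesh on one server."""
    def __init__(self):
        self.delivered = []
    def send(self, node_url, xml_msg):
        self.delivered.append((node_url, xml_msg))

def make_transport(kind):
    # The factory: the rest of the code asks for a transport by name
    # and never cares which concrete class it gets back.
    return {"http": HttpTransport, "local": LocalTransport}[kind]()

transport = make_transport("local")
transport.send("node-2", "<change id='c1'/>")
```

Because propagation code only ever calls `send()`, swapping HTTP for the in-process transport (or, say, a simulated lossy network) needs no changes outside the factory.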
That wraps up my technical intro, though I am sure to add small edits in when they occur to me! Feedback here, as always, is very welcome.