What is this about?
Several of my blog posts are about Meshing, a P2P database layer for sharing structured data across the Internet. At the time of writing it is in a design and early implementation phase, and to guide the new reader, I’ve added this page. I’ll update it as time goes on, to reflect the current state of the project.
One minute pitch?
Editing large datasets with large numbers of people is difficult for several reasons: (1) the infrastructure is expensive, (2) contributors worry about the long-term availability of the data, and so are discouraged from participating, and (3) competing contributors each want to be the custodian of a dataset, resulting in many incompatible and fragmented data silos that each lack critical mass. Meshing is a software solution to this problem: a number of people agree on a database format, and then send and receive updates to each other in an automated, decentralised, trust-based way. All rows are versioned, so bad changes can be rolled back. Anyone can then use the resulting dataset for any purpose, which it is hoped will encourage ownership, increase scalability and decrease costs.
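To make "trust-based" a little more concrete, here is a minimal Python sketch of one possible rule: an instance applies an incoming update only if it comes from a peer it has chosen to trust, and sets aside anything else for review. The names `Update`, `Instance`, `receive` and the simple yes/no trust set are hypothetical illustrations, not Meshing's actual design (which is in PHP), and row versioning is left out here for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    origin: str    # name of the instance that sent the change
    table: str
    row_id: int
    values: dict


@dataclass
class Instance:
    name: str
    trusted: set = field(default_factory=set)    # peers whose updates we accept (hypothetical model)
    rows: dict = field(default_factory=dict)     # (table, row_id) -> current values
    rejected: list = field(default_factory=list) # untrusted updates, kept for review

    def receive(self, update: Update) -> bool:
        """Apply an incoming update only if its origin is trusted."""
        if update.origin not in self.trusted:
            self.rejected.append(update)  # not applied, but not silently lost either
            return False
        self.rows[(update.table, update.row_id)] = dict(update.values)
        return True


alpha = Instance("alpha", trusted={"beta"})
alpha.receive(Update("beta", "books", 1, {"title": "Dune"}))      # applied
alpha.receive(Update("mallory", "books", 1, {"title": "spam"}))   # rejected
```

A real system would probably use graded trust levels rather than a plain set, but the shape of the decision is the same: the receiver, not the sender, decides what enters its copy of the data.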
Who is the intended audience?
Firstly, people who work with large datasets and people who would benefit from some variation of the crowd-sourcing model. Secondly, anyone who is interested in disruptive technologies and how they create new models for business and co-operation. Thirdly, free/open-source developers who are considering assisting the project.
Do you have some suggested use-cases?
Do you have a source code repository?
Yes, here. Anyone is welcome to create a Git fork.
Gimme some juicy techie details.
Sure! The software is written in PHP, and requires at least version 5.2. It will run on any Unix-like system, and should run on Windows too. Since it makes use of the Propel ORM, it will work fine with MySQL and PostgreSQL, and some trivial extra work will get it working on Oracle, MS SQL and SQLite. I’ve also used a number of Zend Framework modules, for utility stuff like auto-loading and command-line parsing. Unit tests are based around SimpleTest.
The background task (to push data to other instances) will normally be kicked off by cron (or whatever scheduler is available) but where the admin has full rights to the machine, a PHP process could be set up to run in the background permanently.
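The two ways of running the background task can share one code path: a single pass when invoked by a scheduler, or a repeating loop when the admin can keep a process alive. The following is a minimal Python sketch under that assumption; `run_push_task` and the injected `push_pending` callable are hypothetical stand-ins for whatever actually sends queued changes to other instances.

```python
import time
from typing import Callable, Optional


def run_push_task(push_pending: Callable[[], int],
                  loop: bool = False,
                  interval: float = 60.0,
                  max_runs: Optional[int] = None,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Push pending changes to peers and return how many were sent.

    loop=False: one pass, then exit (the cron/scheduler case).
    loop=True:  repeat every `interval` seconds (a permanent process);
                `max_runs` optionally caps the number of passes.
    """
    total = 0
    runs = 0
    while True:
        total += push_pending()
        runs += 1
        if not loop or (max_runs is not None and runs >= max_runs):
            return total
        sleep(interval)  # wait before the next pass
```

Under cron, a crontab entry such as `*/5 * * * *` would start the one-pass mode every five minutes; on a machine with full rights, the same function can run with `loop=True` instead.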
What are the hosting requirements?
The core system must run on shared PHP hosting (such as cPanel-based hosts) and be as easy to install as WordPress. This is an important prong of the strategy, intended to reduce barriers to active participation. Of course, it will run better on a VPS or a dedicated server, and may expose more features where root access is available.
What’s the license?
I haven’t decided yet. I am keen on something that makes the software Free, but permits communication on the server with separate programs released under proprietary or incompatible licenses. I think the GNU General Public License would do it, but suggestions are welcomed.
How is the implementation going?
Early days at the time of writing, but good. The prototype is the first target, and this can be split into various parts: interface (how the user interacts with the software), storage (how to set up the database system) and transport (how instances communicate with each other). The interface part has been achieved speedily by using a console command approach.
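A console-command interface boils down to a set of subcommands, each with its own arguments. Meshing does this in PHP with Zend Framework's command-line parsing; the sketch below shows the same idea with Python's standard `argparse`. The program name `mesh` and the subcommands `create-node`, `connect`, `trust` and `push` are hypothetical, chosen only to mirror the kinds of operations (instances, connections, trusts) the project describes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "mesh" and all subcommand names below are hypothetical examples
    parser = argparse.ArgumentParser(prog="mesh")
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("create-node", help="create a local database instance")
    create.add_argument("name")

    connect = sub.add_parser("connect", help="connect a local node to a remote one")
    connect.add_argument("local")
    connect.add_argument("remote_url")

    trust = sub.add_parser("trust", help="set how much a local node trusts a peer")
    trust.add_argument("local")
    trust.add_argument("peer")
    trust.add_argument("--level", type=int, default=1)

    sub.add_parser("push", help="send pending changes to connected nodes")
    return parser


args = build_parser().parse_args(["trust", "alpha", "beta", "--level", "5"])
```

Because every operation is a plain command with arguments, a later GUI in any language only has to build the same argument list and invoke it.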
The storage part was done as of late Nov 2011, and some initial transport tests are in progress as of Feb 2012. A key task here is to set up several nodes on the same machine, and get them to communicate with each other.
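Running several nodes on one machine mainly requires that each node has its own private database. The following Python sketch illustrates that with one in-memory SQLite database per node and a deliberately naive transport: each push copies every change the receiver lacks, using a composite `(origin, seq)` key so duplicates are ignored. `LocalNode`, `record` and `push_to` are hypothetical names, not Meshing's transport code.

```python
import sqlite3


class LocalNode:
    """A test node with its own private SQLite database (in memory here)."""

    def __init__(self, name: str):
        self.name = name
        self.db = sqlite3.connect(":memory:")  # a file path per node would also work
        self.db.execute(
            "CREATE TABLE changes ("
            " origin TEXT NOT NULL, seq INTEGER NOT NULL, payload TEXT NOT NULL,"
            " PRIMARY KEY (origin, seq))"
        )
        self.next_seq = 1

    def record(self, payload: str) -> None:
        """Record a change made locally on this node."""
        self.db.execute("INSERT INTO changes VALUES (?, ?, ?)",
                        (self.name, self.next_seq, payload))
        self.next_seq += 1

    def count(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM changes").fetchone()[0]

    def push_to(self, other: "LocalNode") -> int:
        """Copy all changes to `other`; rows it already has are ignored.

        Returns the number of changes that were new to `other`.
        """
        rows = self.db.execute("SELECT origin, seq, payload FROM changes").fetchall()
        before = other.count()
        other.db.executemany("INSERT OR IGNORE INTO changes VALUES (?, ?, ?)", rows)
        return other.count() - before
```

Sending the whole table on every push is fine for a local test; a real transport would remember what each peer has already received and send only the difference.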
Will there be a web/GUI interface?
Yes, but not before the prototype is finished. The first version will be web-based and likely also written in PHP, since it can then integrate tightly with the core. However, since the prototype will have a console command for everything, interfaces can be written in most languages.
Isn’t this just database replication?
No. Database replication assumes that all changes are final, and so a history of each row is not kept. In Meshing, rows are all versioned, so if an instance makes some malicious changes, they can be reverted easily. Furthermore, relational database replication is generally a one-way process, whereas Meshing can be bidirectional. Note that even if we used replication to (partially) achieve Meshing’s objectives, it would clash with the project strategy, since such systems are not available on commodity hosting.
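The difference from replication is easiest to see with a per-row history. The following minimal Python sketch keeps every version of a row along with the instance that wrote it, so a bad instance's changes can be undone by appending a restoring version rather than erasing anything. `Version`, `VersionedTable` and `revert_changes_from` are hypothetical names, and this is one possible revert policy, not necessarily Meshing's.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    row_id: int
    number: int    # position in this row's history, starting at 1
    origin: str    # instance that wrote this version
    values: tuple  # row contents, kept immutable


@dataclass
class VersionedTable:
    versions: dict = field(default_factory=dict)  # row_id -> list of Version

    def write(self, row_id: int, origin: str, values) -> None:
        history = self.versions.setdefault(row_id, [])
        history.append(Version(row_id, len(history) + 1, origin, tuple(values)))

    def current(self, row_id: int) -> tuple:
        return self.versions[row_id][-1].values

    def revert_changes_from(self, bad_origin: str, reverter: str) -> None:
        """For rows whose latest version came from `bad_origin`, append a new
        version restoring the most recent values written by anyone else.
        History is preserved; rows written only by `bad_origin` are left alone."""
        for row_id, history in self.versions.items():
            if history[-1].origin != bad_origin:
                continue
            good = [v for v in history if v.origin != bad_origin]
            if good:
                self.write(row_id, reverter, good[-1].values)
```

Appending a restoring version, rather than deleting the bad one, means the revert itself is visible in the history and can be propagated to other instances like any other change.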
Also, Meshing will provide tools to manipulate database designs, database instances, connections and trusts, which would not be nearly as simple or cross-platform using native replication systems.
You need bidirectional replication and record versioning. Why aren’t you using CouchDB?
Good question! When I was first asked this, I didn’t know about the replication feature in CouchDB – it sounds very similar to what I’m trying to achieve. For those who haven’t yet come across it, CouchDB is a document database that simply stores sets of key-value pairs, and is a very different affair to a traditional database. But I was pleased to have been asked this question, as it prompted me to do some related research; in the process I found that the Refuge Project is already using CouchDB for precisely this purpose. This is good news, since it shows that someone else thinks the idea is worth pursuing.
One of my aims for Meshing was that it runs on extremely basic hosting, to encourage widespread adoption. As it happens, I have been advised that, since CouchDB is being engineered to work on Android, it doesn’t need a high-spec server to run; I am told it will run in a VPS with 256MB of RAM. Nevertheless, it still won’t run on a shared Linux host, which I am very keen for Meshing to do. In any case, whilst NoSQL systems are undoubtedly offering wins for early tech adopters, most of the world is still firmly sticking to relational systems, and I want Meshing to reflect that.