Bandwidth project: offering direct downloads without breaking limits
Categories: Finished, Ideas

Introduction

Some years back I downloaded an amateur documentary from the ‘net, around 800MB, and wondered at the time how such a thing could be offered to more people without costing the distributor a potential fortune in bandwidth. A few solutions exist already, of course: one can create a torrent and wait for torrent users who like the download to start seeding it themselves. But downloading a torrent requires software to be installed, and so, for ordinary computer users, it is both too technical and too much hassle. (I wonder, though, whether this would change if a simple torrent client were embedded into all browsers as standard.)

Alternatively, one can upload files to RapidShare or the like, though files sometimes need to be split into chunks to respect size limitations, and downloading from them can be a real exercise in jumping through CAPTCHA and timer hoops, unless one has a paid account. Expecting users to run file chunks through joiner software is also unrealistic. Lastly, one can upload to a site such as Google Video, though this is of no use for non-video files, and depends entirely on the whim of a corporate policy as to how long the file remains hosted.

I recently came back to this idea, and started sketching out some requirements. I liked the idea of a PHP application that could be deployed to a low-powered server, thus permitting an amateur content distributor to offer files simply and cheaply, and without the drawbacks of existing solutions. In particular I wanted to offer files over ordinary HTTP, and to be able to specify how many times it may be downloaded or how many download bytes it may consume. I also wanted to be able to throttle file download speed to a maximum bandwidth. Having bounced some notes around on paper, I did a search over at SourceForge and Google, and was surprised not to find anything similar to this description already.

So, I created a new symfony 1.3 project, and started on a prototype. Over a couple of weeks of dabbling, I had the following features implemented:

  • File editing
  • File upload
  • Multiple file group membership
  • Group download limits (by bytes and/or by count)
  • Group maximum bandwidth
  • Group concurrent limits (total and by IP address)
  • Group valid date range
  • Group limits reset frequency
  • Group limits reset offset

The “group limits reset” features permit usage limitations to be reset on a regular basis. For example, the frequency might specify that a file may be downloaded up to 100GB every 30 days, and the offset might specify that the reset occurs on the 10th day of each period (very useful for billing dates, since they don’t usually land at midnight on day one!).
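To make that concrete, here is a minimal sketch (not the project’s actual code – the function name and the day-based units are invented for illustration) of how the start of the current reset window could be computed from a frequency and an offset:

    <?php
    // Illustrative only: find the start of the current reset period for a
    // group, given a frequency and an offset, both in days.
    function getCurrentPeriodStart($frequencyDays, $offsetDays, $now = null)
    {
        $now = ($now === null) ? time() : $now;
        $periodSeconds = $frequencyDays * 86400;
        $offsetSeconds = $offsetDays * 86400;

        // Number of whole periods elapsed since the (shifted) epoch
        $periodsElapsed = floor(($now - $offsetSeconds) / $periodSeconds);

        return (int) ($periodsElapsed * $periodSeconds + $offsetSeconds);
    }

    // Usage: byte/count tallies recorded before this timestamp no longer
    // count against the group's limits.
    $windowStart = getCurrentPeriodStart(30, 10);
    echo date('Y-m-d H:i', $windowStart), "\n";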

Group membership allows a file to have several different sources of limitations applied to it. For example, a file may have a server-wide limitation applied to it (say, the 100GB/30days mentioned above) but it may also be specified that it is not to be downloaded more than 1000 times in any one day, perhaps to spread server load evenly throughout the period.
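In other words, a download is only served while every group the file belongs to still has headroom. Here is a rough sketch of that check, with the group data shown as plain arrays rather than the Propel objects the application actually uses:

    <?php
    // Illustrative only: a file may be downloaded when *every* group it
    // belongs to is still within its limits.
    function canDownload(array $groups)
    {
        foreach ($groups as $group) {
            if ($group['bytes_used'] >= $group['byte_limit']) {
                return false;
            }
            if ($group['downloads_used'] >= $group['count_limit']) {
                return false;
            }
        }

        return true;
    }

    // Example: a server-wide 100GB group plus a 1000-downloads-per-day group
    $groups = array(
        array('bytes_used' => 80e9, 'byte_limit' => 100e9,
              'downloads_used' => 40000, 'count_limit' => PHP_INT_MAX),
        array('bytes_used' => 2e9, 'byte_limit' => PHP_INT_MAX,
              'downloads_used' => 950, 'count_limit' => 1000),
    );
    var_dump(canDownload($groups)); // bool(true)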

Alpha code released

So, I’ve called the project (unimaginatively) Bandwidth, and today I am releasing a tarball to see if anyone is interested. With a tiny bit of know-how to get it installed, and a smidgen of htaccess to protect the admin area, I think the code might be useful already. It’s loosely “open source”, though I’ve not picked a license yet; I’ll do so in the unlikely event someone sends me code. It requires a database – pick any db platform supported by Propel 1.5, as they should all work (certainly MySQL and PostgreSQL, and probably Oracle 10–11g and MSSQL too). Ensure you’ve got a good version of PHP 5.2 or 5.3 – 5.2.4 is probably the absolute minimum.

Download Alpha

To install, you’ll need to do something like this:

  • Unpack the tarball into a PHP host folder (non-root folders may work, but this is untested)
  • Check whether your PHP config is suitable, by running this: php data/symfony/bin/check_configuration.php (at a minimum you need PDO and a PDO module for your database, plus a PHP accelerator)
  • Create a database and a database user in your db client
  • Add the db settings to config/propel.ini (to build the database) and config/databases.yml (for run-time) – see the example after this list
  • Create an Apache virtual host and a hosts file entry as appropriate
  • php symfony project:permissions
  • php symfony propel:build-sql
  • php symfony propel:insert-sql
  • php symfony cache:clear
  • If you put this on a live server, protect “/settings/*”, perhaps using AuthUserFile – see the sketch after this list (yes, the software should/will do this itself, but it’s only alpha!)

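For the database step, the run-time connection in symfony 1.3 normally lives in config/databases.yml and looks something like the following (hostname, database name and credentials are placeholders – mirror the same details in config/propel.ini, whose comments show the exact keys for your Propel version):

    # config/databases.yml
    all:
      propel:
        class: sfPropelDatabase
        param:
          dsn:      mysql:host=localhost;dbname=bandwidth
          username: bandwidth
          password: secret

And for the last step, one way to require a login for the admin area is a basic-auth block in the Apache virtual host (paths are placeholders):

    # Require a login for anything under /settings
    <LocationMatch "^/settings">
        AuthType Basic
        AuthName "Bandwidth admin"
        AuthUserFile /path/to/.htpasswd
        Require valid-user
    </LocationMatch>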
Then fire up the application in a web browser (http://yourserver/settings) and have a play.

Users of shared hosts may have trouble with this code, as the download script turns off max_execution_time completely – which safe mode prohibits. I don’t know what proportion of shared hosts have this restriction – though I believe some of them now permit access to a custom php.ini. YMMV, as always.
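If you want to check in advance, a throwaway script (not part of the package) run on the host will tell you whether the limit can actually be lifted:

    <?php
    // Quick check: on a shared host, see whether the execution time limit
    // can be removed, which the download script relies on for long transfers.
    if (ini_get('safe_mode')) {
        echo "safe_mode is on: set_time_limit() will be ignored.\n";
    } else {
        set_time_limit(0); // what the download script does
        echo "Time limit removed; long downloads should be fine.\n";
    }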

Future development

What might come next in the development of this mini-project? Well, presently all downloads go through a PHP application, which for a busy download server substantially reduces the number of concurrent users it can serve. I reckon each PHP process requires a 16MB footprint, whereas downloading a file directly from Apache adds a much smaller additional memory requirement. For the time being, the approach I use is necessary anyway if bandwidth limiting is required. However, if just byte-count limiting is required, we may be able to feed Apache logs into this system to count downloads after they’ve happened – which would allow us to push the donkey-work back onto the web server.
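As a rough idea of what that log-driven accounting might look like – the log path is a placeholder, and the regex assumes Apache’s default common/combined log format – a periodic import could simply tally the bytes column per URL and write the totals back into the group counters:

    <?php
    // Sketch only: tally bytes served per URL from an Apache access log,
    // so completed downloads can be counted after the fact.
    $totals = array();
    $handle = fopen('/var/log/apache2/access.log', 'r');

    while (($line = fgets($handle)) !== false) {
        // host ident user [date] "METHOD /path HTTP/1.x" status bytes ...
        if (preg_match('/"(?:GET|HEAD) (\S+) [^"]*" \d{3} (\d+)/', $line, $m)) {
            $url = $m[1];
            $totals[$url] = (isset($totals[$url]) ? $totals[$url] : 0) + (int) $m[2];
        }
    }
    fclose($handle);

    arsort($totals);
    print_r($totals); // bytes served per URL, ready to import into the limits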

Feedback welcome!

Update: 22 July

Credit where it’s due: my throttling mechanism uses this class.
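The general idea behind that kind of throttling is simply to send the file in small chunks and pause between them so the average rate stays at or below a target; a bare-bones sketch (not the class itself) looks like this:

    <?php
    // Bare-bones illustration of throttling, not the class linked above:
    // stream a file in fixed chunks and pause after each one so the average
    // rate stays at or below $bytesPerSecond.
    function sendThrottled($path, $bytesPerSecond)
    {
        $chunkSize = 8192;
        $handle = fopen($path, 'rb');

        while (!feof($handle)) {
            echo fread($handle, $chunkSize);
            flush();
            // Sleep long enough that each chunk takes chunk/rate seconds
            usleep((int) ($chunkSize / $bytesPerSecond * 1000000));
        }

        fclose($handle);
    }

    // Example: cap a download at roughly 50KB/s
    // sendThrottled('/path/to/file.bin', 50 * 1024);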

As usual, my prior-art research doesn’t turn anything up until much work has been done! I found this package, though it is neither free nor open source – and hence unlikely to appeal to people on a shoestring budget. It’s quite old as well, since it works with PHP4. But it does look a great deal more fully featured than my alpha package; still, if there’s interest, mine could catch up…

6 Comments to “Bandwidth project: offering direct downloads without breaking limits”

  1. Mobin says:

    Very useful article – I have got a question:
    Let’s say we want to add some uploading features to this script so users can upload files. Suppose we have a main server for the normal requests, and multiple file servers on which the files will be stored.

    The POST URL would always be something like http://upload.mysite/, and at the same time we have the file servers, which will be http://srv1.mysite/, http://srv2.mysite/ and so on… Is it possible for the file to be uploaded to those file-servers DIRECTLY by POSTing to the main site (http://upload.mysite/)?

    This is what I have done:
    <form method="post" action=""> – and using AJAX I send a request to my main server, which creates a session on the file server and returns a JSON response containing a hash and the file-server URL to POST to; I then replace the original form action="" with the new file-server URL.

    Is there any other appropriate way to handle this “connection” between main site and file-server dynamically within PHP so we would always POST to http://upload.mysite and files would be uploaded to the file-servers DIRECTLY?

    Big hosting companies usually have one static POST URL with just a different session ID each time someone uploads, but I think they move the files to other file servers as soon as uploading is finished (which I don’t want, because we are simply wasting more bandwidth and doing more work to move the files from the main server to the others…).

    Any suggestions or anything wrong with my idea and processing methodology?

  2. Jon says:

    Hi Mobin,

    There are a few ways you can handle this. Although it is possible for a single server to handle all uploads, as your system gets bigger it may not be able to handle all the upload traffic – even though uploading probably accounts for a minuscule fraction of the total throughput.

    The way I suspect this is done is to use a reverse proxy such as Squid. The address upload.mysite refers to the proxy server, which sits in front of an array of web servers, and distributes the work on a round-robin or load-detecting basis.

    I wouldn’t worry about the inefficiency of transferring files between your servers – popular files may need to be stored on more than one server anyway to reduce load on a single point. Assuming your servers are in the same data centre, there is negligible extra cost in transferring them between servers. And if your operation means you need to operate from more than one data centre, the cost of the bandwidth between data centres (to replicate popular files) will be much less than the cost of the bandwidth consumed by users anyway!

    As an aside, I should imagine that download sites use servers connected to a SAN, rather than relying on storage on the server itself. These devices specialise in having large arrays of disks, often using various redundancy schemes, to offer large amounts of storage and to protect against disk failure.

  3. Mobin says:

    Thanks Jon for the quick response.
    The configuration you described above is the appropriate way of doing what I asked. However, this would cost someone much more (financially), since it involves more hardware and technologies.

    I will try to convert my current script, which is in PHP and Perl, to achieve something similar to what Jon suggested (right now I am so busy with my studies, hehe!). If I get anything new I will post more comments on this article.
    Thanks

  4. Jon says:

    You’re welcome.

    The proxy software itself is free, but of course having many servers and large amounts of storage is not. But then your question specifically was about a multi-server solution, and for this to scale sufficiently to generate income, I think these questions would need to be tackled :).

    If you don’t want to use a reverse proxy, then you could use a round-robin scheme to POST to one of several web servers when uploading.

  5. Mobin says:

    Oh right, so I could have http://upload.site as my static POST URL, which would use a DNS round-robin configuration to connect the client to one of several servers.
    Very nice – with DNS round-robin I can get rid of the AJAX and JSON scenario I described in my first post 🙂

  6. Jon says:

    You could, although it’s not what I meant. That would give different results for everyone who visits the site, but it might achieve the same effect. Don’t forget that DNS results are cached at various stages – certainly by the browser/local-machine – and so effectively are static for a given user once obtained. But, it could still solve the problem.

    My solution was rather simpler – just use a round-robin technique when generating your upload address (upload1.site, upload2.site, …). This means a single user can be split over several servers if they’re using a disproportionate amount of upload capacity. Remember that if an existing upload site you’ve seen does something in a certain way, you don’t have to follow suit 🙂
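    A minimal sketch of that idea (the host names are invented, and APC is just one convenient place to keep the counter – any shared store would do):

        <?php
        // Round-robin sketch: pick the next upload server each time the form
        // is rendered, so one heavy uploader is spread across several machines.
        $uploadHosts = array('upload1.mysite', 'upload2.mysite', 'upload3.mysite');

        $counter = apc_fetch('upload_rr_counter');
        $counter = ($counter === false) ? 0 : $counter + 1;
        apc_store('upload_rr_counter', $counter);

        $host = $uploadHosts[$counter % count($uploadHosts)];
        echo '<form method="post" action="http://' . $host . '/receive"'
           . ' enctype="multipart/form-data">';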
