JonBlog
Thoughts on website ideas, PHP and other tech topics, plus going car-free
Utility/broadband usage scanner
Categories: Code, Ideas

I’ve written a simple PHP script to keep an eye on mobile internet usage, and have open-sourced it under an MIT license for anyone who wants to have a play. It scrapes account data from a provider’s website and stores that data in a local SQLite database.

This mini-project gave me an opportunity to play with PhantomJS, which, once you understand what it is doing, is very nice to use. The trick with this JavaScript-powered headless browser system is to understand page.evaluate(): the function you pass it runs in the context of the scraped page, so anything you’ve set in your script’s global context is not visible inside it. I’ve found extending Phantom’s page object with my own child class helpful; this permits the script to inject its own variables so they are accessible inside the handler function.
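To make the page.evaluate() point concrete, here is a rough sketch that runs under plain Node rather than PhantomJS. The evaluateIsolated() function is my own stand-in, not part of PhantomJS or of this project: it mimics the documented behaviour that the function is shipped to the page as source text and that arguments must be JSON-serialisable. The selector is made up, and how the failure is reported under PhantomJS itself may differ from the exception shown here.

```javascript
// Stand-in for PhantomJS's page.evaluate(), runnable in plain Node.
// Assumption: like the real thing, it rebuilds the function from its source
// and copies arguments via JSON, so closures do not survive the trip.
function evaluateIsolated(fn) {
    var args = Array.prototype.slice.call(arguments, 1);
    // new Function() creates code that can only see globals, not our locals
    var rebuilt = new Function('return (' + fn.toString() + ');')();
    // Round-trip the arguments through JSON so only plain data gets across
    var copied = JSON.parse(JSON.stringify(args));
    return rebuilt.apply(null, copied);
}

function demo() {
    var selector = '#usage-remaining'; // hypothetical CSS selector

    var viaClosure;
    try {
        viaClosure = evaluateIsolated(function () {
            return selector; // refers to the outer variable, which is gone
        });
    } catch (e) {
        viaClosure = e.name;
    }

    // Passing the value as an extra argument gets it across the boundary
    var viaArgument = evaluateIsolated(function (sel) {
        return 'would query ' + sel;
    }, selector);

    return { viaClosure: viaClosure, viaArgument: viaArgument };
}
```

Calling demo() shows the closure version failing with a ReferenceError while the argument version works. In a real Phantom script the same idea applies: page.evaluate() accepts extra arguments after the function and hands them to it inside the page.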

The database is presently set up for internet data usage monitoring, but it could be extended to include other telco data, relating to things like call minutes and SMS counts. If you want to extend the code to monitor gas/electricity usage, another table would probably be appropriate, but with a few tweaks it should still work.

To set it up, do the following:

  • Make sure you have PHP available on your system, with the PDO and pdo_sqlite modules enabled
  • Install PhantomJS. If the “phantomjs” binary is in your system path, that’s great, but it doesn’t need to be
  • Clone the account-watcher repo to a folder on your machine
  • Copy configs/system.ini.sample to configs/system.ini and fill in the blanks. Only phantom_executable is important here, and you can just use “phantomjs” if the binary appears in your system path

Then, check to see if your provider is supported (see here). In most cases it won’t be, so you’ll need to roll up your sleeves and write a Phantom script. I’ve done most of the hard work already though: if you extend LoadHandlerBase.js you’ll get callbacks for a couple of interesting events (OnUrlChanged and OnLoadFinished), which should be enough to get you going. There’s not much in the way of docs at present, but let me know what you need (either in the comments or a GitHub ticket) and I’ll see what I can do.
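As a rough idea of the shape a provider script might take, here is a sketch of a handler built on those two callbacks. Treat all of it as assumption: the LoadHandlerBase below is a tiny stand-in I’ve written so the example runs on its own under Node, and the real class in the repo may be wired up differently. ExampleProviderHandler, the URL patterns and the step names are all invented. The only things taken from PhantomJS itself are the page.onUrlChanged and page.onLoadFinished hooks and the 'success' status string.

```javascript
// Stand-in for the project's LoadHandlerBase (assumption: the real one may differ).
// It forwards PhantomJS's page events to overridable methods.
function LoadHandlerBase(page) {
    var self = this;
    this.page = page;
    page.onUrlChanged = function (targetUrl) { self.OnUrlChanged(targetUrl); };
    page.onLoadFinished = function (status) { self.OnLoadFinished(status); };
}
LoadHandlerBase.prototype.OnUrlChanged = function (url) {};
LoadHandlerBase.prototype.OnLoadFinished = function (status) {};

// Hypothetical handler for an imaginary provider
function ExampleProviderHandler(page) {
    LoadHandlerBase.call(this, page);
    this.currentUrl = null;
    this.steps = []; // records what the handler decided to do, for illustration
}
ExampleProviderHandler.prototype = Object.create(LoadHandlerBase.prototype);
ExampleProviderHandler.prototype.constructor = ExampleProviderHandler;

// Remember where the browser has navigated to
ExampleProviderHandler.prototype.OnUrlChanged = function (url) {
    this.currentUrl = url;
};

// Once a page has loaded, pick the next action from the URL (paths are made up)
ExampleProviderHandler.prototype.OnLoadFinished = function (status) {
    if (status !== 'success') {
        this.steps.push('load failed');
        return;
    }
    if (/\/login/.test(this.currentUrl)) {
        this.steps.push('submit login form');
    } else if (/\/usage/.test(this.currentUrl)) {
        this.steps.push('scrape usage figures');
    }
};
```

In a real script each step would call into the page (filling in the login form, reading the usage figures via page.evaluate()) rather than pushing a string onto an array; the array just makes the control flow easy to see.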

A few other tips: log on to your account on your provider’s website and explore the HTML of the pages you want to scrape using your browser’s element explorer. This is really useful in determining appropriate CSS selectors to refer to form elements or items of capturable data. Also, switch on the test_mode flag in the system config: this will let Phantom render to the screen in real time, so you can spot any local or remote errors. Lastly, console.log() is invaluable for outputting debugging information (but for permanent output of various kinds, use the output* functions instead).

I’m filing this as “in progress” at the moment. At some point I’ll add a nice graphy thing, to be accessed via a local web vhost. Enjoy!
