Last week Paul released version 0.5Alpha of the new BLOB Streaming Engine and so I thought I would take some time to see what is new and do a bit more benchmarking. This time around, Paul concentrated on a new method of handling BLOBs. Instead of storing them directly in a table, you can now store a reference to them in the MyBS repository. Each database can have its own repository (which exists in the database directory on the filesystem) and you can also control how large the repository file can grow before another file is created.
This further abstracts BLOBs from MySQL itself, but it does make it easy to insert BLOBs using a simple HTTP PUT. To store the BLOB permanently, however, you still need to insert the reference into the database. If you don't, the MyBS engine will delete the BLOB after a certain timeout has been reached. I imagine Paul did this in the interests of security as well as to make sure BLOBs don't end up in the MyBS engine with no reference to them. The old method, now called the field reference method, still exists, leaving users with a choice of which one to use, depending on their particular needs.
This particular implementation is very new and is obviously still an Alpha product. However, I think it has a lot of promise to further simplify the handling of binary data, particularly when the data is constantly being changed or added to. Social networking sites that allow users to upload photos are a good example here. That said, I tend to prefer having more direct control over the data, and I am having a hard time grasping the concept of BLOB references. I would also like to see the MyBS engine become more uniform, so that it is possible to insert data using the PUT method (as a reference) as well as directly using an INSERT. Currently, it looks like only one or the other can be done, although it is possible to access the BLOBs directly from the repository by creating some system tables.
I have not yet tested this new method of handling BLOB data. I think I may wait until Paul has fleshed out the design a bit more. I did, however, grab some interesting benchmarks using the field-access method. These are similar to the ones I did previously, except that I have also benchmarked the field-access method behind Apache's ProxyPass directive (part of mod_proxy):
| Method | Min (s) | Max (s) | Average (s) |
| --- | --- | --- | --- |
| PHP Grabbing BLOB Directly | .452 | .615 | .518 |
| MyBS Field-Access | .016 | .023 | .018 |
| MyBS Field Access + ProxyPass | .020 | .026 | .022 |
| GETing the file directly | .017 | .018 | .018 |

(Time in Seconds)
This time around, I do not want to name a clear winner, except to say that using PHP is slow - real slow. That should be obvious, of course. Using the field-access method of MyBS was the fastest on a few runs, although GETing the file directly from the filesystem (using Apache) was fairly consistent whereas MyBS showed more variation between runs. ProxyPass slowed things down just a bit too, but not enough to make a huge impact. Certainly, the gains of using ProxyPass far outweigh the minor extra overhead.
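For anyone who wants to collect numbers like these, here is a minimal sketch of a timing harness in Python. It is not the script behind the table above, just one way to get a min/max/average for any fetch operation. The function names (`time_calls`, `summarize`, `http_fetcher`) are made up for this example, and any URL passed to `http_fetcher` is whatever your own setup exposes.

```python
import time
import urllib.request
from statistics import mean


def time_calls(fetch, runs=10):
    """Call fetch() `runs` times and return the elapsed time of each call in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to the three columns used in the table."""
    return {"min": min(timings), "max": max(timings), "average": mean(timings)}


def http_fetcher(url):
    """Build a zero-argument function that downloads `url` and discards the body."""
    def fetch():
        with urllib.request.urlopen(url) as response:
            response.read()  # read the whole body so the transfer is included in the timing
    return fetch
```

Calling `summarize(time_calls(http_fetcher(url)))` once per access method would give one row of the table each. Running all methods against the same image, from the same client, keeps the comparison fair.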
I opted to try ProxyPass as a possible answer to the fact that MyBS has no authentication mechanism. So while using the MyBS Field Access method directly produced the fastest single access time (the .016 second minimum), using it for a public-facing site would be very risky. I have not used ProxyPass very much, so I cannot say for certain whether it solves the security problem, but it does allow one to limit a user's ability to access data while also hiding the fact that MyBS is being used. For instance, I set up ProxyPass as follows:
ProxyPass /images http://localhost:8080/StuffDawg/Images/image
ProxyPassReverse /images http://localhost:8080/StuffDawg/Images/image
So the URL to access the image is actually something like http://stuffdawg.moocow.home/images/itemid=20. This cleans things up quite a bit, and prevents the users from trying to access other fields from the database. A nice perk about ProxyPass too is that you should be able to cache the data, say using mod_cache, or memcached.
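To make the mapping concrete, here is a small Python model of what the ProxyPass line does to an incoming path. It is a simplified sketch of the prefix rewriting only, not Apache's actual implementation, and the function name `map_to_backend` is made up for this example. The two constants are copied from the configuration above.

```python
# Values copied from the ProxyPass line above.
PUBLIC_PREFIX = "/images"
BACKEND_BASE = "http://localhost:8080/StuffDawg/Images/image"


def map_to_backend(path):
    """Return the backend URL for a public path, or None if the path is not proxied.

    Simplified model: the public prefix is swapped for the backend base,
    and anything outside the prefix never reaches MyBS through this rule.
    """
    if path == PUBLIC_PREFIX:
        return BACKEND_BASE
    if path.startswith(PUBLIC_PREFIX + "/"):
        return BACKEND_BASE + path[len(PUBLIC_PREFIX):]
    return None
```

Under this model, `/images/itemid=20` becomes `http://localhost:8080/StuffDawg/Images/image/itemid=20`, while a request for some other table or field returns `None` because it does not start with the proxied prefix. That only holds as long as port 8080 itself is not reachable from outside.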
Again, I can't commend Paul's efforts enough. Being able to have a system that provides all of the benefits of storing binary data in a relational database, with speeds that are at least similar to direct access methods, is going to revolutionize how people deal with data. I, for one, can't wait until I can integrate MyBS into my own site, since it will not only simplify my code base but should also allow me to more easily manage data (particularly the photo album).