Having gotten my feet wet with BLOBs while developing StuffDawg, I am starting to really see why so many people get eaten by the BLOB (and get spit out too!). The problem with BLOBs, at least as I have found, is that they do not seem to scale well. My experience makes me think that this is not just because you are storing binary data in a database, but also because of the overhead in the application code that has to deal with them.
For instance, suppose you are storing images in your database, say for a gallery. That is great and all, but how does one display the images in a web browser? Using the <img src> tag, I would assume. This is all well and good, but since the image does not exist on the file-system, you either have to put it there or write a script that pulls the data from the database. I did the latter for StuffDawg, mostly because it was quicker to develop, and I do not expect StuffDawg to be used as actively as a normal website would be.
Trouble is, if you have a script that pulls the data, you need to specify that as the link to the image. This works great, but what if you have 200 images you want to display at once? That's right. You have to run that script 200 times, which means that your bottleneck may very well not be the database at all, but your application code.
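To make the 200-images point concrete, here is a minimal sketch of what such a script does per request. It is written in Python rather than the language StuffDawg actually uses, SQLite stands in for MySQL so it runs on its own, and the `images` table, its columns, and the `fetch_image` helper are all made up for illustration.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, mime TEXT, data BLOB)")
conn.executemany(
    "INSERT INTO images (id, mime, data) VALUES (?, ?, ?)",
    [(i, "image/png", b"\x89PNG" + bytes([i % 256])) for i in range(1, 201)],
)

query_count = 0  # counts how often the "script" is run


def fetch_image(image_id):
    """What the image script does for one <img> request: one lookup by id."""
    global query_count
    query_count += 1
    row = conn.execute(
        "SELECT mime, data FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    return row  # (mime, data), or None if the id does not exist


# A page showing 200 images makes the browser request the script 200 times.
responses = [fetch_image(i) for i in range(1, 201)]
print(query_count)
```

Each lookup is cheap on its own; the cost that adds up is starting the script, connecting, and querying once per image.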
To me, while the code does seem to be much simpler to deal with (no need to muck around in the file-system), the solution does feel a bit kludgy. Yet the draw of storing binary data in the database is so appealing. Take my music, for instance. I could store all of my songs directly in the database, and not have to deal with funkily named files or create a new directory each time I release a song in a new year. Instead, I basically go to the database and say "Hi database, can you get me the mp3 of the song with id 58?" and it serves it right up for me. Hot.
Furthermore, I got to thinking about my blog, or really anyone's blog. We live in a world of rich media, so wouldn't it be great if we had a place where we could easily store and retrieve that media? For example, I like putting pictures up on my blog on occasion, but right now I have to deal with an even worse hierarchy of directories and filenames, and it is just ugly. Plus, I did not have the foresight to think that I might want to link to other content, such as MP3s, videos, PDFs, whatever. And even if I had, I would still have to do a lot of manual things to get it to work. Enter BLOBs! So instead of uploading a file via FTP, creating a directory for it, adding the post id to the file, and spending way too much time, all I have to do is upload the file into the database and call it a day. Then, if I want to pull it back out, I just have a PHP script that determines the file type, sends the matching MIME type to the browser, and then serves up the file. Awesome!
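The "determine the file type, send the MIME type" step can be sketched roughly like this. It is in Python rather than PHP, and it assumes the original filename is stored next to the BLOB; the `build_headers` helper and the sample filenames are hypothetical.

```python
import mimetypes


def build_headers(filename, data):
    """Guess the MIME type from the stored filename and build response headers."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"  # generic fallback for unknown types
    return {
        "Content-Type": mime,
        "Content-Length": str(len(data)),
    }


# Pretend these bytes came out of the database.
pdf_headers = build_headers("notes.pdf", b"%PDF-1.4 sample")
unknown_headers = build_headers("mystery", b"\x00\x01\x02")
print(pdf_headers["Content-Type"])
print(unknown_headers["Content-Type"])
```

Guessing from the filename is the simplest option; storing the MIME type in its own column at upload time would avoid guessing on every request.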
I left the details out on purpose, because it gets even better. A gentleman by the name of Paul McCullagh has created a project called the Scalable BLOB Streaming Project. I have mentioned this before, but I am getting really excited about the prospects of both BLOB Streaming and Paul's PBXT engine. The idea is to make grabbing BLOBs from the database more efficient (something both BLOB Streaming and PBXT aim to do). His main concern seems to be the fact that BLOBs must be buffered in RAM before they can be delivered in a conventional MySQL query. This is where the streaming part comes in, since the BLOB can be streamed as it is fetched from the database. But I think his approach also makes managing BLOBs on websites much more efficient, by eliminating the need to create these gacky scripts to present the data to the browser.
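To illustrate the difference between buffering and streaming, here is a rough application-side sketch that reads a BLOB in fixed-size chunks instead of pulling it into memory in one query. This is not how BLOB Streaming itself works; it is only the general idea, using SQLite in place of MySQL, a made-up `media` table, and a hypothetical `stream_blob` helper.

```python
import sqlite3


def stream_blob(conn, blob_id, chunk_size=64 * 1024):
    """Yield a BLOB in fixed-size chunks instead of loading it all at once."""
    row = conn.execute(
        "SELECT length(data) FROM media WHERE id = ?", (blob_id,)
    ).fetchone()
    if row is None:
        return  # no such BLOB: yield nothing
    total = row[0]
    offset = 1  # SQL substr() positions are 1-based
    while offset <= total:
        (chunk,) = conn.execute(
            "SELECT substr(data, ?, ?) FROM media WHERE id = ?",
            (offset, chunk_size, blob_id),
        ).fetchone()
        yield chunk
        offset += chunk_size


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, data BLOB)")
payload = bytes(range(256)) * 1000  # 256,000 bytes of sample data
conn.execute("INSERT INTO media (id, data) VALUES (?, ?)", (58, payload))

chunks = list(stream_blob(conn, 58))
print(len(chunks))
```

The application only ever holds one chunk at a time, at the price of one query per chunk. Whether the database itself avoids reading the whole value internally depends on the engine, which is the part a streaming-aware engine is meant to address.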
His solution? Well, his alpha implementation involves a simple webserver built on top of MySQL. To grab data, you simply create a URI and feed it to the webserver. It returns the data you were looking for. No more ugly PHP scripts! Just create a URI and call it a day. How sweet is that! Granted, this is only the alpha implementation, but it sure does make me excited. If this works well, it will solve not only the problems above, but many other problems as well. For example, what if you have multiple webservers that have to deal with customers uploading images on a regular basis? How do you keep the webservers in sync? rsync? Bah! Now you can just use MySQL replication. Done!
In any case, it will be very interesting to see how BLOB Streaming pans out. My personal spin is that it is going to revolutionize the way we handle and store binary data for our web applications...rock on!