Log In : Register


Baby Steps Into Stored Procedures
Fri, 21 Sep 2007 11:21:10
MySQL : Insightful
 
Recently I have been taking a serious look at stored procedures in MySQL and, in particular, how I might go about implementing them in my website and, in particular, where they could be used and why. At present, I have really only gotten my feet wet with using stored procedures, but I think that is enough to make some conclusions that I thought might be helpful to others that are starting to look at stored procedures as well.

One of my initial impressions is that, if you have never used stored procedures before, it seems to be slightly daunting at first. Stored procedures in MySQL basically use standard SQL syntax, but I found myself using some of the lesser used SQL structures (such as IF and WHEN statements as well as session variable handling). The MySQL documentation on stored procedures focuses directly on them and does not really cover the structures you can use in them as much. As a result, I found myself initially having to scour the documentation (and the 'net) for places to get me started. I think more examples, or at least links to additional frequently used structures on the stored procedures chapter would be a welcome addition.

For instance, one of my biggest confusions was how to convert my application design over to SQL. In PHP, I generated a SQL statement based upon the input of the function. Specifically, sometimes I needed a WHERE clause (in my specific case, over an enumerated field), and sometimes I did not. For instance, a logged in user might have access to view private posts, whereas anyone else could only view the public ones. Or an administrator might be able to view all posts in all states. As a result, the result meant that just changing the value in a WHERE clause would not always do the trick.

I tried to implement the same idea in the stored procedure, but was initially confused about how one should really do that. The solution turned out to be basically the same thing - it just took me a while to get there given that I never needed to build a SQL statement from a string within MySQL itself before. The following is basically what I would call my first full attempt at getting all this to work:

DROP PROCEDURE IF EXISTS GetBlogPosts;
DELIMITER //
CREATE PROCEDURE GetBlogPosts
(IN inOffset INT, 
 IN inLimit INT, 
 IN inputStatus ENUM('live', 'private', 'both', 'draft', 'all'))
BEGIN
SET @offset = inOffset;
SET @limit = inLimit;
SET @stmt = 'SELECT postid, title, body, category, mood, 
    posttime, lastmodified, commentcount, status
        FROM BlogPostsView WHERE ';
CASE inputStatus
    WHEN 'live' THEN SET @stmt = CONCAT(@stmt, "status = 'live'");
    WHEN 'private' THEN SET @stmt = CONCAT(@stmt, "status = 'private'");
    WHEN 'both' THEN SET @stmt = CONCAT(@stmt, "status != 'draft'");
    WHEN 'draft' THEN SET @stmt = CONCAT(@stmt, "status = 'draft'");
    WHEN 'all' THEN SET @stmt = CONCAT(@stmt, "1=1");
END CASE;
SET @stmt = CONCAT(@stmt, ' ORDER BY postid DESC LIMIT ?, ?');
PREPARE stmt FROM @stmt;
EXECUTE stmt USING @offset, @limit;
DEALLOCATE PREPARE stmt;
END //
DELIMITER ;

You can see that I basically built a statement and added to it using CONCAT, depending upon what the value of inputStatus was. Granted, I could have just put a full SQL statement for each WHEN, but that did not seem to be very efficient to me, both in terms of space and maintainability. While this may not be the case for others, it took me way too long to figure out that the way to concatenate strings (even those which are SQL statements) was to use CONCAT. Duh :)

In any case, another peculiarity I found was with LIMIT. You can see here that I am using a prepared statements here. The reason is that it is the only way I was able to get the LIMIT clause to work properly. For some reason, I was unable to simply use the inOffset and inLimit variables directly in the LIMIT (regardless of whether or not the statement was prepared), although using the IN variables directly seems to work in WHERE clauses. Instead, I had to assign these values to session variables in addition to a prepared statement. I found this a bit surprising and am not really sure what this is the case.

Now why do all this? Well, my initial thought was that it would be a good way to clean up code, make things more secure and, in particular maybe get some speed boosts. It certainly cleaned up my PHP code a bit (at least so far) and I do like separating things out. When I am trying to optimize my database design, for instance, I can do so without needing to touch my PHP code (assuming that the procedures return the same thing after I am done). It is the same reason why I tend to prefer using HTML templates as a good method of website design since it separates presentation from code.

There are some caveats here, however. Though stored procedures do simplify application code, you of course need to get used to working with stored procedures in the database, which can require more work than simply editing a file. But the major issue I am concerned about is performance. Many say that stored procedures should run faster, due to MySQL pre-optimizing the SQL inside the procedure. While traversing the 'net, I ran into some different ideas about how (or even if) MySQL actually does this. Though stored procedures can increase performance from a network standpoint (since more complicated operations that were normally done on the application side can be done on the database side), I noticed that they often are not cached in the query_cache.

In my tests static procedures (those with no IN variables) were cached. For instance:

mysql> CREATE PROCEDURE Test1 () SELECT title FROM BlogPosts WHERE postid=1; 
Query OK, 0 rows affected (0.00 sec) 

mysql> CALL TEST1(); 
+-------------------+ 
| title             | 
+-------------------+ 
| First Post...Ever | 
+-------------------+ 
1 row in set (0.00 sec) 

Query OK, 0 rows affected (0.01 sec) 

mysql> SHOW STATUS LIKE 'Qcache_hits%'; 
+---------------+-------+ 
| Variable_name | Value | 
+---------------+-------+ 
| Qcache_hits   | 87    | 
+---------------+-------+ 
1 row in set (0.00 sec) 

mysql> CALL TEST1(); 
+-------------------+ 
| title             | 
+-------------------+ 
| First Post...Ever | 
+-------------------+ 
1 row in set (0.00 sec) 

Query OK, 0 rows affected (0.00 sec) 

mysql> SHOW STATUS LIKE 'Qcache_hits%'; 
+---------------+-------+ 
| Variable_name | Value | 
+---------------+-------+ 
| Qcache_hits   | 88    | 
+---------------+-------+ 
1 row in set (0.00 sec) 
But how useful can this really be? There are some cases where calling a procedure with no arguments might be useful, but what if I want to create a procedure that allows me to input a variable, say, for a WHERE clause? Such as:

mysql> CREATE PROCEDURE Test2 (inID INT) SELECT title FROM BlogPosts WHERE postid=inID; 
Query OK, 0 rows affected (0.00 sec) 

mysql> CALL TEST2(1); 
+-------------------+ 
| title             | 
+-------------------+ 
| First Post...Ever | 
+-------------------+ 
1 row in set (0.00 sec) 

Query OK, 0 rows affected (0.00 sec) 

mysql> SHOW STATUS LIKE 'Qcache_hits%'; 
+---------------+-------+ 
| Variable_name | Value | 
+---------------+-------+ 
| Qcache_hits   | 88    | 
+---------------+-------+ 
1 row in set (0.00 sec) 

mysql> CALL TEST2(1); 
+-------------------+ 
| title             | 
+-------------------+ 
| First Post...Ever | 
+-------------------+ 
1 row in set (0.00 sec) 

Query OK, 0 rows affected (0.00 sec) 

mysql> SHOW STATUS LIKE 'Qcache_hits%'; 
+---------------+-------+ 
| Variable_name | Value | 
+---------------+-------+ 
| Qcache_hits   | 88    | 
+---------------+-------+ 
1 row in set (0.00 sec) 
(Note that Qcache_not_cached was incremented at each run in the above example. I didn't include it for brevity's sake)
No query cache for you! Why, I am not sure, but I can only assume it has to do with the fact that the query cache is very coarsely-grained. My thought is that having to cache a procedure call my make storing the results in the cache more complicated, since it usually just caches SELECTs and their results directly (feel free to correct me if I am wrong here) which means that procedures may be more complicated than simply grabbing the command (which would be a CALL statement, not a SELECT).

Now granted, there is more to MySQL performance than the query cache, but I found it odd that static procedures did get cached, but ones which allowed you to input data did not. The real answer here is likely beyond the scope of my understanding at the moment, but what I gained from it is a realization that blindly converting all your SQL statements in your code may not always be the best idea. A blog post, for instance, is a great thing to cache in the query cache since it is usually unlikely that it will change very often (if at all).

One of my colleagues at work did mention that stored procedures can prevent injection attacks. I am actually not sure about this in the context of PHP and need to look a bit more deeply into this. If they do help thwart injection attacks, I could see them being highly useful. I'm sorry to say that my experience with mysqli and prepared statements (which is a good way to prevent injection attacks) was not a good one. Problems were difficult to debug, and in one case I found PHP chewing up 16MB of RAM for no apparent reason (that I was able to find anyway). Stored procedures, by contrast, are very easy to execute, though you do need to use mysqli (specifically mysqli_multi_query, which I basically wrapped around my own easy to use function).

I should be a little fair here too. Stored procedures are pretty new to MySQL but are already quite powerful. And I expect that, as they mature, they will become even more useful and powerful. In a nutshell, I am excited about what stored procedures bring to the table, and plan on implementing them, at least in places, on my own website, if for nothing else than to learn more about them. But they probably fall under the "right tool for the right job" category, meaning that thinking about where one might need to use them and why is still an important consideration. For instance, a VIEW may do just as well as a stored procedure in certain cases, while other situations may call for just calling the SQL directly from the application.

Tags:


2 Comments

Comments

By lsmith - http://pooteeweet.org
On October 4th, 2007 - 03:58:57
The reason is likely that until MySQL 5.1 prepared statements were not handled inside the query cache at all. However this only works if you go through the prepared statement API (as in not using variables like you do in stored procedures). Also I do not know how much MySQL really caches the compiled stored procedures at this point. And yes prepared statements help protect against SQL injection (as long we you do not add user supplied strings to any place in their statement outside of placeholders without prior validation).

By m00dawg - http://www.moocowproductions.org/
On October 4th, 2007 - 09:52:17
Ah I see what you mean. That actually makes some sense, although I am curious as to why the underlying query in a procedure isn't cached. I assume it has to do with how procedures are being handled in the query cache. Now as for the SQL-injection prevention, I assume that they have to be used in the right way in order to prevent injection style attacks. For instance, I noticed that this works just fine: mysql> call test1(); insert into test values (1); +-------------------+ | title | +-------------------+ | First Post...Ever | +-------------------+ 1 row in set (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 1 row affected (0.00 sec) mysql> This also works in the context of mysqli->multi_query, at least when I ran the above query with it, although I did not test it using a conventional injection-style attack. So I think that procedures do not completely eliminate the need to validate code, although one could additionally wrap function calls around prepared statements themselves. While I have found mysqli's approach to using prepared statements a little ugly I have found, I think it would be pretty easy to wrap procedures around it (and is, consequently, on my to-do list for my own site :)

Comment RSS Feed

You cannot post a comment without first logging in.

Go Back


Tim Soderstrom's E-Mail: tim at moocowproductions dot org