SQL Blocking and Client Variable Purge

We've had continual issues with our ColdFusion applications locking up on us. I've spent the last week cleaning out huge tables, creating indexes, and tuning queries, but could never seem to track down what was causing the issue. Today I caught it in the act... As the applications locked up, I ran the following query in SSMS:

    SELECT st.text,
           r.blocking_session_id,
           r.session_id,
           r.status,
           r.command,
           r.cpu_time,
           r.total_elapsed_time
    FROM sys.dm_exec_requests r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st

This showed me which queries were getting blocked, and who was doing the blocking. It turns out that it was ColdFusion's client variable purging that was causing the blocking. This is the query that was running:

    DELETE FROM CDATA
    WHERE CDATA.cfid IN (
        SELECT CGLOBAL.cfid
        FROM CGLOBAL
        WHERE CGLOBAL.lvisit < {date}
    )

    DELETE FROM CGLOBAL
    WHERE CGLOBAL.lvisit < {date}

Unfortunately, there are no lock hints anywhere on the query. With 7,500 records to delete in a table with more than 30,000,000 records, the table was getting locked for more than 2 minutes, a complete disaster for a production application dependent on client variables. So I went to work creating a new query that deletes the client variables at a much more reasonable pace, with only as much locking as is necessary to keep things humming along smoothly. The new query looks like this:

    SET NOCOUNT ON

    DECLARE @cfid char(64), @rowsaffected int, @rowsdeleted int;
    SET @rowsaffected = 0;
    SET @rowsdeleted = 1;

    WHILE @rowsdeleted > 0 AND @rowsaffected < 100
    BEGIN
        BEGIN TRANSACTION

        -- Grab the single oldest client ID that hasn't visited in 30 days
        SET @cfid = (
            SELECT TOP 1 CGLOBAL.cfid
            FROM CGLOBAL (NOLOCK)
            WHERE CGLOBAL.lvisit < DATEADD(d, -30, GETDATE())
            ORDER BY CGLOBAL.lvisit
        )

        -- Delete that one client's rows, holding row locks only
        DELETE FROM CDATA WITH (ROWLOCK) WHERE CDATA.cfid = @cfid
        DELETE FROM CGLOBAL WITH (ROWLOCK) WHERE CGLOBAL.cfid = @cfid

        -- @@ROWCOUNT drops to 0 once there are no expired records left
        SET @rowsdeleted = (SELECT @@ROWCOUNT);
        SET @rowsaffected = @rowsaffected + 1;

        COMMIT
    END

    PRINT CAST(@rowsaffected AS varchar) + ' client records deleted'

The query loops through and deletes one record at a time until it hits the specified limit of rows affected. This is set up as a scheduled task in CF so that I know exactly when it will happen, and I can watch for issues if there is a failure using our existing global error catching.
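One thing worth double-checking before scheduling the loop (this index is an assumption on my part; your client variable tables may already have something equivalent): the TOP 1 lookup orders CGLOBAL by lvisit, so without an index on that column every iteration has to scan the table. A minimal sketch:

    -- Hypothetical index; check sp_helpindex 'CGLOBAL' first to see what already exists
    CREATE NONCLUSTERED INDEX IX_CGLOBAL_lvisit
        ON CGLOBAL (lvisit)
        INCLUDE (cfid);

With an index like that in place, each SELECT TOP 1 becomes a single index seek instead of a crawl through 30,000,000 rows, and the INCLUDE lets it return cfid without touching the base table.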
Posted by Daniel Short on Dec 11, 2009 at 1:50 PM | Categories: ColdFusion, SQL

10 Comments


Brad Wood wrote on 12/13/09 1:07 AM

30 Million records-- that's a sizable chunk. How much traffic does your site see? I would think you could get away with doing more than one record at a time though. If you're on MS SQL Server, it's going to try and escalate to a full table lock after it deletes 1024 rows. Did you find a way to keep CF from trying to purge the client records itself?

Daniel Short wrote on 12/13/09 9:01 AM

It's definitely a busy site. So having 30 million records in 90 days isn't unheard of. I've set it to delete any records over 30 days old now and it's slowly catching up :). As for CF purging the records, you can change that option when you edit your client store in the CF admin. Just click on the datasource you're using on the Client Variables page and you can uncheck the purge option to keep CF from trying to do the work.

Ciaran Archer wrote on 03/05/10 10:09 AM

Hi there - I'm looking into some problems we're having with our CV trim job, and our DB would be equally busy, so I'd love to hear thoughts on the following:

Brad: if you delete one row at a time as Daniel suggests, and each delete is in its own transaction block, why does MS SQL Server try to escalate to a full table lock after 1024 rows?

Daniel: can I ask why delete them one by one? Why not just specify the ROWLOCK hint in the bulk delete statement above? E.g.:

    /* do deletion */
    delete from cdata with (rowlock)
    where cfid in (
        select cfid
        from cglobal (nolock)
        where lvisit < @timeThreshold
    )

    delete from cglobal with (rowlock)
    where lvisit < @timeThreshold

What's the difference in doing it one at a time? Thanks!

Daniel Short wrote on 03/05/10 5:51 PM

The problem is that you can't guarantee that SQL Server will honor the ROWLOCK hint. If it feels it needs a table lock to do what it needs to do, it will escalate the lock on its own. Deleting one record at a time ensures that a table lock won't happen and freeze up your entire app...

Brad Wood wrote on 03/05/10 6:14 PM

@Ciaran: Sorry if my comment was confusing. If you delete one record at a time, each in its own transaction, a lock escalation should NOT occur. In my original comment I had said "I would think you could get away with doing more than one record at a time". The main problem with that, like Daniel explained, is lock escalation. Locking hints are just that-- hints. If SQL Server decides it's tired of keeping track of row locks and that it would be cheaper to just lock the entire table, it will try to request an exclusive table lock. One could probably use ROWCOUNT and delete maybe a few hundred at a time, but you would have to try it and see how well it worked.
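For anyone who wants to experiment with Brad's "few hundred at a time" idea, here is a rough sketch (the batch size, the 30-day window, and the char(64) key width are assumptions carried over from the post; whether a given batch size actually dodges escalation on your server is exactly the thing you'd have to test):

    SET NOCOUNT ON;

    -- Table variable pins down exactly which client IDs each batch touches,
    -- so the CDATA and CGLOBAL deletes always hit the same set of rows
    DECLARE @expired TABLE (cfid char(64) PRIMARY KEY);

    WHILE 1 = 1
    BEGIN
        DELETE FROM @expired;

        -- Pick a small, deterministic batch of expired client IDs
        INSERT INTO @expired (cfid)
        SELECT TOP (500) cfid
        FROM CGLOBAL WITH (NOLOCK)
        WHERE lvisit < DATEADD(d, -30, GETDATE())
        ORDER BY lvisit;

        IF @@ROWCOUNT = 0 BREAK;  -- nothing left to purge

        BEGIN TRANSACTION;
        DELETE FROM CDATA WITH (ROWLOCK) WHERE cfid IN (SELECT cfid FROM @expired);
        DELETE FROM CGLOBAL WITH (ROWLOCK) WHERE cfid IN (SELECT cfid FROM @expired);
        COMMIT;
    END;

Keeping each batch in its own short transaction is what keeps the lock count low at any one moment; the smaller the batch, the less tempting a table lock becomes.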

Daniel Short wrote on 03/06/10 6:35 AM

Yep, I tried a couple hundred at a time when I first hit the issue and continued to get unwanted locks on the tables. I really wish I could say "no really SQL, I mean ROWLOCK damnit!" Don't they know I'm in charge? :)
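For what it's worth, SQL Server 2008 added a per-table switch that says almost exactly that (this is no help on 2005 and earlier, and disabling escalation means the engine keeps paying for every individual row lock, so it's a trade-off rather than a free lunch):

    -- SQL Server 2008+ only: forbid escalating these tables to a full table lock
    ALTER TABLE CGLOBAL SET (LOCK_ESCALATION = DISABLE);
    ALTER TABLE CDATA SET (LOCK_ESCALATION = DISABLE);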

Ciaran Archer wrote on 03/06/10 11:08 AM

Thanks for that feedback guys. Is there a way you would suggest to measure the amount of lock escalation in any given minute? We currently have a job that runs every 15 mins, and the job tries to delete rows with a last hit older than 2 hours. Before I implement a row-by-row delete I was hoping to measure the amount of lock escalation going on right now. I suspect I will see an increase in lock escalation every 15 minutes, as my job runs. If I then implement the new method I should see the amount of lock escalation decrease overall - or at least see no more 'spikes' every 15 mins. I guess doing something via perfmon might be an option? I'd be interested to know your thoughts. We are getting periodic site slowness which I think is related to this client variables trimming job, but I would love to prove it :) Thanks in advance!
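Two ways to get at those numbers (a sketch, not a definitive recipe): in perfmon there is a "Table Lock Escalations/sec" counter under SQLServer:Access Methods, and on SQL Server 2005+ you can pull cumulative escalation counts per table from a DMV. The DMV counts accumulate from whenever the object's metadata was last cached, so sample before and after the trim job runs and diff the two readings:

    -- Escalation attempts vs. actual escalations for the two client tables
    SELECT OBJECT_NAME(ios.object_id) AS table_name,
           ios.index_id,
           ios.index_lock_promotion_attempt_count AS escalation_attempts,
           ios.index_lock_promotion_count AS escalations
    FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
    WHERE ios.object_id IN (OBJECT_ID('CDATA'), OBJECT_ID('CGLOBAL'));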

Lincoln Manning wrote on 04/20/10 11:43 AM

I used Microsoft's recommendation on how to track down locks in my system and it has been eye opening. If you are using SQL Server 2000 or 2005, check out http://support.microsoft.com/kb/271509

SteveO wrote on 08/03/11 11:40 PM

Hi Dan, great post. I believe this is what's slowing down our CF server every 67 minutes! We've got about 15M rows in our CGLOBAL and CDATA tables - mostly generated by search engine bots and spiders. I just wanted to clarify the best way to implement this. Would you DISABLE the client purge option in CFAdmin, create your own purge script with your SQL above, and schedule it manually every hour? Or is there a way to UPDATE the SQL that the automatic CFAdmin purge runs so it uses your SQL above? Also, does your SQL only delete 100 rows that are 30 days old? Per day, I think we generate about 450,000 new rows in each table. Does that mean in a 24 hour period (with hourly purge schedules), we would need to increase "@rowsaffected" to 18,750? Does that make sense? Any help would be greatly appreciated! Regards Steven :-)

Daniel Short wrote on 08/04/11 8:50 AM

Hey Steve, Yes, you should disable the purge in the cfadmin, as it's horribly inefficient. There is no way that I'm aware of to change the way the client variable purge works directly from ColdFusion. The script does just delete 100 rows at a time. You can of course change that value to something higher, or just run the script more often. I think I had my scripts running every 10 minutes or so (I've since stopped using client variables entirely). If you run your script every 10 minutes, then you could delete 3,000 or so at a time (450,000 rows a day works out to a little over 3,100 per 10-minute window). Once your table gets back to a manageable size, the delete script will run pretty quickly and have a negligible impact on the server.
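If it helps, here is a hedged sketch of the same loop wrapped in a stored procedure so the cap and the age window become parameters you can tune per schedule (the procedure name and defaults are mine, not anything ColdFusion ships):

    CREATE PROCEDURE dbo.PurgeClientVars
        @maxrows int = 100,   -- most client records to remove per run
        @days    int = 30     -- purge records idle longer than this
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @cfid char(64), @rowsaffected int, @rowsdeleted int;
        SET @rowsaffected = 0;
        SET @rowsdeleted = 1;

        WHILE @rowsdeleted > 0 AND @rowsaffected < @maxrows
        BEGIN
            BEGIN TRANSACTION;

            SET @cfid = (
                SELECT TOP 1 cfid
                FROM CGLOBAL WITH (NOLOCK)
                WHERE lvisit < DATEADD(d, -@days, GETDATE())
                ORDER BY lvisit
            );

            DELETE FROM CDATA WITH (ROWLOCK) WHERE cfid = @cfid;
            DELETE FROM CGLOBAL WITH (ROWLOCK) WHERE cfid = @cfid;

            SET @rowsdeleted = @@ROWCOUNT;
            SET @rowsaffected = @rowsaffected + 1;

            COMMIT;
        END
    END

A scheduled task running every 10 minutes would then just call EXEC dbo.PurgeClientVars @maxrows = 3000; and you can raise or lower the cap without touching the script itself.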