What’s the Best Way to Manage a Database Queue?

This question circulates through the office about once a year. “I have a Windows service which processes requests queued in a database table. Everything is working great, but I would like to add a second instance of the service, and I need to know the best way to ensure the services don’t step on each other. In other words, how do I mark each row as it is being processed so it won’t be picked up again?”

Here are the ingredients in my current ideal solution:

  1. The table must include the following columns: a unique identifier such as RequestID, plus Status, LockedBy, and LockedTime.  I also like to include RequestedTime and CompletionTime columns, as I’ve found that capturing these values helps when/if system stats are required.
  2. The key is to lock the current request at the time of processing.  To do so, you should use a transaction and an update lock.
  3. I think it is important to pull requests off the queue in an intelligent order.  FIFO is the best bet under most circumstances.  In this case, the RequestedTime column becomes imperative (unless your RequestID is an auto-incrementing identity column and you are comfortable ordering on it).
  4. Finally, I like to make sure no request is left behind.  Let’s say Service A locks a request for processing and then the application server crashes.  I want the lock to eventually expire so Service B can pick up the row and save the day.
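Putting these ingredients together, the Queue table might look something like the sketch below. The exact names and types here are assumptions for illustration (the post only names the columns), and the status values are arbitrary:

create table [Queue]
(
    RequestId      int identity(1,1) primary key,
    Status         int         not null,  -- e.g. 1 = Pending, 2 = Complete (values assumed)
    LockedBy       varchar(75) null,      -- which service instance holds the lock
    LockedTime     datetime    null,      -- when the lock was taken
    RequestedTime  datetime    not null default GetDate(),
    CompletionTime datetime    null
)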

Assuming new requests are added to the queue as ‘Pending’, one may use a routine similar to the one below to safely pull the next request off the queue:

create procedure [GetNextItemInQueue]
(
    @LockedBy varchar(75)
)
as 

set nocount on 

declare
    @Error                  int,
    @ExitCode               int,
    @RowCount               int,
    @PendingStatus          int,
    @MinutesBeforeBreakLock int,
    @RequestId              int

select
    @Error                  = 0,
    @ExitCode               = 0,
    @RowCount               = 0,
    @PendingStatus          = 1,
    @MinutesBeforeBreakLock = 30 

begin tran ProcessQueue 

select top 1 @RequestId = RequestId
from Queue with (updlock)
where Status = @PendingStatus
and (LockedTime IS NULL OR
DateDiff(mi, LockedTime, GetDate()) >= @MinutesBeforeBreakLock)
order by RequestedTime Asc 

select @RowCount = @@RowCount, @Error = @@Error 

if (@RowCount = 0)
    goto ExitProc 

if (@Error <> 0)
    goto ErrorHandler 

update Queue
set    LockedBy = @LockedBy,
    LockedTime = GetDate()
where RequestId = @RequestId 

select @Error = @@Error
if (@Error <> 0)
    goto ErrorHandler 

select @RequestId as RequestId 

select @Error = @@Error
if (@Error <> 0)
    goto ErrorHandler 

goto ExitProc 

ErrorHandler:
    -- raiserror cannot re-raise a system error number directly,
    -- so report it inside a message string instead
    raiserror   ('GetNextItemInQueue failed with error %d.', 16, 1, @Error)
    if @@trancount > 0
        rollback tran ProcessQueue
    select      @ExitCode = -100 

ExitProc:
    -- only commit if the transaction wasn't rolled back above
    if @@trancount > 0
        commit tran ProcessQueue
    return      (@ExitCode)

Once the request is processed, a database call will be made to update the row’s Status to ‘Complete’ and CompletionTime. 
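That completion call isn’t shown in the post; it could be as simple as the following sketch, assuming the same column names and a ‘Complete’ status value of, say, 2 (the value is an assumption):

update Queue
set    Status = 2,              -- 'Complete' (status value assumed)
       CompletionTime = GetDate()
where  RequestId = @RequestId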

Note, you may choose to change the status of each row from ‘Pending’ to ‘Processing’ when you take the initial lock.  This is perfectly fine, but I’ve found it is unnecessary, as the Processing status can be derived from the values of other columns.
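For example, under that approach a row can be treated as ‘Processing’ whenever it is still Pending but holds an unexpired lock. A sketch of such a query, using the column names above and the same 30-minute lock window:

select RequestId
from   Queue
where  Status = 1                                 -- still 'Pending'
and    LockedTime is not null                     -- ...but currently locked
and    DateDiff(mi, LockedTime, GetDate()) < 30   -- lock has not yet expired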

Again, that’s my current preferred implementation.  Perhaps there are better solutions out there?

Comments

  1. Did you face any problems with this one? I am trying to do exactly the same thing and am wondering if this works fine or needs some improvement :)

  2. Hi TG. The implementation outlined above has been running in production for well over a year now, and we haven’t encountered any issues. Thanks for following up.

  3. Does this scale in a web cluster environment, making sure one request is not processed by two clients?

  4. @Joe Campbell – It does. As a matter of fact, that’s the primary reason for a routine like this. When a row is locked, the lock time and the locking client are both captured. As long as the row is locked and the lock hasn’t expired, the row won’t be processed again – by any client. Thanks for asking.

  5. Ben, thanks for the quick response. How does this handle concurrent client requests for a row which is not locked?

  6. @Joe – That’s where the update lock (or updlock) comes in.

    Let’s say Request A and B come in at exactly the same time. Well, SQL Server will ensure that only one of the requests will get a lock on the top row – in other words, the next row to be processed in the queue.

    Once a row is locked, it is no longer available and it can’t be locked by another request. It should also be noted that the lock stays in place until the associated update is performed.

    If Request A gets the initial lock, Request B will wait until Request A’s transaction completes, then lock the next available row in the queue and follow the same process as Request A (lock the row and capture its id, update the row with the LockedBy and LockedTime values, implicitly release the lock).

    Did I answer your question?

  7. Hi,
    Very nice implementation!

    What if I need to re-queue an item because processing encountered some external problem?

    And what if I want to retry processing a limited number of times, after which I drop the queue item?

  8. @Giorgio – Your first case is covered by the @MinutesBeforeBreakLock variable. @MinutesBeforeBreakLock gives a thread a limited amount of time to complete processing of the queued item.

    If, after the number of minutes established with @MinutesBeforeBreakLock, row processing isn’t complete, the row becomes available for processing by another thread. This logic was put in place for exactly the reason you mentioned: failures due to external issues.

    Regarding your second case, that’s not built into the script right now, but it could be extended pretty easily. Off the top of my head (read: following queries should be vetted), a “NumAttempts” column could be added to the Queue table and incremented each time a given row is “pulled off” the queue as part of the update:

    update Queue
    set LockedBy = @LockedBy,
    LockedTime = GetDate(),
    NumAttempts = Coalesce(NumAttempts,0) + 1
    where RequestId = @RequestId

    If you want to limit the number of attempts, the select statement would be changed to something like this:

    select top 1 @RequestId = RequestId
    from Queue with (updlock)
    where Status = @PendingStatus
    and (LockedTime IS NULL OR
    DateDiff(mi, LockedTime, GetDate()) >= @MinutesBeforeBreakLock)
    and (NumAttempts < @AttemptsAllowed)
    order by RequestedTime Asc

    Note, the restriction based on NumAttempts being less than your established limit.

    Thanks for the comment and the question.

  9. Hi Daniel,

    If I am understanding your question correctly, I didn’t do any special processing. There was no multi-thread or throttling. The service(s) merely polled the queue on a given timer interval. If items were in the queue, each service would process queued items until all were processed to completion. Once all items were processed, each service would wait until the next interval fired and the polling and processing continued.

    I hope this helps,
    Ben

  10. Thank you,

    So you did your polling based on a timer, and added more services if need be? How would you know if you needed to add more? By the difference between LockedTime and RequestedTime?
