Home

Archived Posts from “SQL”

TSQL - Self Update

25

July

This morning I needed to compose a very simple SQL routine and it took me around five compiles until I got the syntax right.  All I needed to do was transfer an active status from one entity (in my case a computer) to another.  I decided to implement this by updating the same table in which I was selecting.  I think the syntax associated with "self update" type queries is tricky so this post is so I personally don’t loose this code snippet and future cycles.  I hope it might help you as well.

This example transfers the status of the source computer to the destination computer.  Nothing fancy…

DECLARE @SourceID INT; SET @SourceID = 2
DECLARE @DestinationID INT; SET @DestinationID = 1

DECLARE @Computer TABLE (ComputerID INT, StatusID INT)

INSERT INTO @Computer(ComputerID, StatusID)
SELECT @DestinationID, 1 UNION SELECT @SourceID, 2

SELECT * FROM @Computer

UPDATE A
SET A.StatusID = B.StatusID
FROM @Computer A, @Computer B
WHERE B.ComputerID = @SourceID
AND A.ComputerID = @DestinationID

SELECT * FROM @Computer

(2 row(s) affected)
ComputerID  StatusID
----------- -----------
1           1
2           2

(2 row(s) affected)

(1 row(s) affected)

ComputerID  StatusID
----------- -----------
1           2
2           2

(2 row(s) affected)

 

kick it on DotNetKicks.com


SQL Tip: Uniquely Name Table Columns

10

June

It is easy to see the subtle difference between the Product Tables below.  The table to the left adds the table name as a prefix to nearly all of the column names. image The table on the right represents very simple column names with no redundancy.  Early in my career, I preferred the implementation to the right as I felt this pattern best represented the object (table) itself.  After all, we all have a name.  We don’t have a PersonName. Well, it didn’t take me long to change my opinion.

The next time you are designing a database, do yourself a favor and mimic the pattern to the left.  Always prefix column names with the table name if the column name is likely to be repeated  in more than one table.  ID, Name, Description, DateAdd and DateChange are good examples of column names which tend to show up in multiple places.  When it comes time to build your queries and work with the result sets, you will thank me. 

Consider the two following routines — each returns the same result but the latter is much cleaner, readable, consists of less code and is inherently greater resistant to stupid coding errors especially within the syntax of the join and the column aliases.

– Poor implementation
declare @Product table
(
    ID int, Name varchar(50)
)

declare @ProductVersion table
(
    ID int, ProductID int, VersionNumber int
)

insert into @Product(ID, Name)
select 1, ‘product 1′
union
select 2, ‘product 2′

insert into @ProductVersion (ID, ProductID, VersionNumber)
select 1, 1, 1
union
select 1, 2, 1

– Note the column aliases and
– join on less-than-obvious columns
select P.ID as ProductID, PV.ID as ProductVersionID from
@Product P inner join
@ProductVersion PV on (P.ID = PV.ProductID)

go

– Proper implementation
declare @Product table
(
    ProductID int, Name varchar(50)
)

declare @ProductVersion table
(
    ProductVersionID int, ProductID int, VersionNumber int
)

insert into @Product(ProductID, Name)
select 1, ‘product 1′
union
select 2, ‘product 2′

insert into @ProductVersion (ProductVersionID, ProductID, VersionNumber)
select 1, 1, 1
union
select 1, 2, 1

– Note there’s no need for aliases and
– the join obvious
select P.ProductID, PV.ProductVersionID from
@Product P inner join
@ProductVersion PV on (P.ProductID = PV.ProductID)

Keep this tip in mind.  It will save you heartache throughout your next project.

kick it on DotNetKicks.com


Don’t Mix DDL and DML

05

February

I sat through an absolutely fantastic presentation on query optimization a couple of weeks back.  The presenter did a fine job of covering many of the basics like SET NOCOUNT ON, minimize cursor use, avoid explicit or implicit functions in the WHERE clause and why not to use the sp_ prefix when naming custom stored procedures.  He also touched upon practices which are critical aspects of query optimization, yet, in my opinion, few put into practice. Specifically, he talked about the need for benchmarks and how to establish them.  He also demonstrated how to conduct sufficient testing.  Then there were the more advanced topics like table fragmentation, associated DBCC commands, Statistics IO, Fill Factor and how procedure cache works.  He also shared plenty of TSQL samples and how they will (or will not) use a given index.  For example, did you know the LIKE clause can use indexes if the pattern starts with a character string such as WHERE lname LIKE ‘W%’ but LIKE cannot use an index if the pattern starts with a leading wildcard, such as WHERE lname LIKE ‘%wold’?  Again, excellent presentation and did I mention it was wrapped up in under an hour? 

I typically roll the dice with these types of presentation knowing they may turn out to be a refresher course, way over my head, or worse, an absolute bore.  This one was worth sitting through as I picked up the following nugget: Mixing DDL and DML Operations Can Cause a Recompile.

What does this mean?  DDL (Data Definition Language) and DML (Data Manipulation Language) almost always exists in our stored procedures.  Examples of DDL are our DECLARE variable and CREATE TABLE statements.  DML, on the other hand, includes our SET, SELECT and INSERT to name a few.  A key cause of stored procedure recompiles is the interleaving of DML and DDL statements — especially DDL following DML statements.  In other words, according to Microsoft best practices, place all of your DDL statements at the top of your stored procedures and then do your actual query work. Now, are recompiles always going to happen if you have interleaved DDL and DML?  Nope, not in all case, but you should do everything possible to prevent recompiles (even if it is super easy.)

kick it on DotNetKicks.com


Coping with Windows Auth

30

August

There are a few primary reasons why Windows Authentication should be considered a best practice. Since passwords aren’t visibly exposed in configuration files and credentials are not sent over the network, your systems tend to be more secure.  Additionally, password management (expiration periods, minimum lengths and account lockout after multiple invalid logon requests) becomes a heck of a lot easier. Considering the benefits, I have no issues with applications accessing SQL Server using Integrated Security, however, I think there is a big different between application access and developer access. 

Really quickly, you can setup SQL Server to run in one of two security modes: Windows Authentication or Mixed Mode.  Mixed Mode is exactly as it implies and allows users to connect using Windows NT Authentication or using SQL Server Authentication.  I’m a fan of Mixed Mode which allows for application accounts to reap the benefits of Integrated Security while allowing considerably easier SQL Auth access for developers, support personnel, etc. 

Please consider the following scenarios: 

  1. An instance of SQL Server is setup in the Development Environment and it isn’t running in Mixed Mode.  This implies that one needs to be a member of the appropriate domain and have appropriate permissions in order to access the SQL Server.  Not a big deal, right?  Well, I would agree if I were a developer working onsite and the Dev Network was readily available to me. But I happen to work remotely on occasion and there isn’t an entrance point into this particular domain through VPN.  Boy, SQL Authentication sure would come in handy in this case. 
  2. Let’s say your environments (perhaps Dev, QA and Production) are hosted in separate domains.  It would be painful to switch between domains in order to access each SQL Servers, wouldn’t it?  (I know what you are thinking, “Why would a developer need to access QA or Production?”  Special assignment.  Let’s leave it at that.)

Fortunately, there’s a workaround.  If you are me, you beg and plead and moan and request that security mode be changed (not that that’s easy.) OR you discover an even better solution in the appropriately named RunAs command which allows a user to run specific tools and programs with different permissions than the user’s current logon provides.  

The following are a few of  my favorite commands which I’ve wrapped up neatly in their own .cmd file for quick execution (you will need to update the domain and user values accordingly):

  • runas /user:domain\user “C:\Program Files\Microsoft SQL Server\90\Tools\Binn\VSShell\Common7\IDE\ssmsee.exe”
  • runas /user:domain\user “C:\WINDOWS\system32\mmc.exe /s \”C:\Program Files\Microsoft SQL Server\80\Tools\BINN\SQL Server Enterprise Manager.MSC\”"
  • runas /user:domain\user isqlw

This tip actually came from a wise, remote developer working in Canada who was confronted with Scenario 1 above.  Pretty neat, eh?


DBTalk - Stuff vs Replace

08

August

Why would you use “Stuff” instead of “Replace?”

Good question.  Both perform a similar function – replace a string with another string (or no string at all), but Replace has a simpler syntax.

The Replace function looks like this:

select replace(‘abcdef’,‘bc’,‘xx’)

result: axxdef

No locations to find, just replace the string.  However, what if we had a string with two occurrences of the string, like:

select replace(‘abcabcabc’,‘abc’,‘123′)

result: 123123123

If we only wanted to replace the first one, Replace wouldn’t work, since it always replaces ALL occurrences of the string. But Stuff would, since it only replaces the string it finds at the starting location we tell it for the number of chars we want it to replace. 

select stuff(‘abcabcabc’,4, 3,‘123′)

result: abc123abc

Since I tend to prefer explicit control of strings (to reduce the possibility of unintended substitutions), I will usually grab Stuff rather than Replace.  The overhead is probably slightly less for Stuff since the function already knows where to look for the value, but I don’t consider that a deal breaker.


Capture Results of EXEC SP

07

August

Here is a simple example of a routine to capture the results of an executed stored procedure so they can be acted upon.  To be honest, I am posting this completely for selfish reasons as I tend to forget this syntax more often than I wish to admit.

CREATE TABLE #Results
(
    Column1 varchar(64),
    Column2 varchar(64),
)

INSERT INTO #Results
EXEC SampleStoredProcedure @param1, @param2, @param3

SELECT Column1, Column2
FROM #Results

If, per chance, there is a way to include stored procedure EXEC as a derived table within a sql select statement, I would love to hear about it.  Conceptually, I am thinking something like the following:

SELECT * FROM (EXEC SampleStoredProcedure @param1, @param2, @param3) A


DBTalk - Derived Tables

06

August

For the uninitiated, a derived table is where you create a query that contains another query but that second query is wrapped in parenthesis and you will normally join to it for further results or set operations.

The pseudo-code is like:

Select a.Blah, b.BlahBlah
From  FirstTable a
Join (select BlahBlah from SecondTable ) b On b.ID = a.ID

So the second query becomes the “derived table”, all in memory, and is joinable just like a regular table, and all of its indexes are play as well.

BUT - do derived tables have performance better or worse than temp tables and in memory temp table variables?


DBTalk - Select Into and Temp Tables

05

August

We can create a new copy of a table from an existing one, using the “select into” function.

However, while “select into” is a powerful function it can also be a server killer due to the i/o overhead copying a bazillion rows in a single batch (if the source was a large table), and more importantly, during the query, if you are creating a temp table (like #tablename) rather than a permanent table, it locks tempdb  (yes, the entire db) for the duration of the query, which means that no other temp tables can be created until the lock is removed (this is not a good thing). 

So if we use “select into” to create a temp or perm table, we want to do it really quickly.  Here’s the workaround for achieving the same goal but without breaking a sweat, OR, use it to simply create an empty copy of an existing table:

select   a.* into tmpAccount
from     dbo.Account a with (nolock)
where    1 = 2

That where clause is always false, which in turn says it can’t return any rows which eliminates all table i/o so the query runs licketysplit.  But does it build the table copy we were hoping for?  You bet. 

The above statement builds an exact replica of the source table but without any rows or i/o or locks or trauma, and releases the lock on tempdb almost instantly if you were building a temp table.

Then, we can leisurely fill the table in a manner that suits the row count.  If under 5000 rows, just do the insert.

If over, break it into 5000 row chunks using a “while” loop.  That prevents the transaction log from bursting when you try to cram 1.5 billion records into the new table…


DBTalk - Group By vs Select Distinct

04

August

Here’s something to contemplate:

What’s the difference between “group by” and “select distinct” in a simple query like:

select distinct a.LastName
from   dbo.Account a with (nolock)

could be rewritten as:

select   a.LastName
from     dbo.Account a with (nolock)
group by a.LastName

Who can tell us what the difference in execution would be?  When would you pick one over the other?


DBTalk - @@error and @@rowcount

04

August

We use the @@Error and @@rowcount system variables all the time in queries and sprocs.  However, one might be lead astray thinking they behaved like “regular” variables in programming, i.e., they are persistent, but they are not, they only hold the value from the most recent statement.

Here’s an example where I declare a couple of variables to hold the error and rowcount values and purposely create an illegal query (division by zero), then I grab the rowcount in one statement and display that plus the “error” in the next.  Note that the no error value is zero and a non-zero value indicates error:

declare @Error int, @rowcount int
select @Error = 0, @rowcount = 0 

select ‘Query 1/0′ = 1/0
select @rowcount = @@rowcount
select @@Error = @@Error, ‘@rowcount’ = @rowcount

The results are this:

Query 1/0
———–
Msg 8134, Level 16, State 1, Line 5
Divide by zero error encountered.

@@Error     @rowcount
———– ———–
0           1

Huh? How come the @@Error value is still zero?  The real error value was overwritten by the “error” value from the statement “select @rowcount = @@rowcount”, which of course will always be successful.  Here’s how to properly capture the error and rowcount:

declare @Error int, @rowcount int
select @Error = 0, @rowcount = 0 

select ‘Query 1/0′ = 1/0
select @Error = @@Error, @rowcount = @@rowcount
select ‘@Error’ = @Error, ‘@rowcount’ = @rowcount

and here’s the correct results:

Query 1/0
———–
Msg 8134, Level 16, State 1, Line 4
Divide by zero error encountered.

@Error      @rowcount
———– ———–
8134        1

So the last piece is, why is rowcount still 1 when no data rows were returned?  Every select statement that doesn’t hit an actual table returns at least one row.  If we rem out all of the select and set statements then we get a zero for rowcount.:

declare @Error int, @rowcount int
–select @Error = 0, @rowcount = 0 

–select ‘Query 1/0′ = 1/0
select @Error = @@Error, @rowcount = @@rowcount
select ‘@Error’ = @Error, ‘@rowcount’ = @rowcount

results:

@Error      @rowcount
———– ———–
0           0

So when we do hit a real table, we’ll get an accurate capture of row count activity:

declare @Error int, @rowcount int
select @Error = 0, @rowcount = 0 

select id, status from master..sysobjects where name = ‘grumpy’
select @Error = @@Error, @rowcount = @@rowcount
select ‘@Error’ = @Error, ‘@rowcount’ = @rowcount

results (obviously, no table in master with a name of grumpy):

id          status
———– ———–

@Error      @rowcount
———– ———–
0           0

And one that returns data:

declare @Error int, @rowcount int
select
@Error = 0, @rowcount = 0

select id, status from master..sysobjects where name = sysindexes
select @Error = @@Error, @rowcount = @@rowcount
select ‘@Error’ = @Error, ‘@rowcount’ = @rowcount

results (one data row, no errors):

id          status
———– ———–
-134        0

@Error      @rowcount
———– ———–
0           1

These behaviors are predictable and repeatable and form the basis of structured error handling in sprocs in SQL 2000.  For SQL2005, we have the beautiful new Try/Catch model for error management that makes life a breeze.


DBTalk - Avoid Functions in the WHERE Clause

04

August

Functions like “getdate()” and convert(varchar(10), MyDateColumn,101) commonly show up in the WHERE clause of queries.  Here’s an example:

select   c.LastName
from     Customer c with (nolock)
where    c.LastUpdated < getdate()

You might think “no big deal.” It’s just a function, but for sql server, it almost always turns a deterministic query into a non-deterministic one.  The difference?  Index usage.

Most of the time, sql server will ignore any index on a column that is used in a WHERE clause expression containing a function, turning an index seek into an index or table scan, with the obvious performance penalty.

So instead of the above query, replace it with:

declare  @CurrentDate datetime
select   @CurrentDate = getdate()   

select   c.LastName
from     Customer c with (nolock)
where    c.LastUpdated < @CurrentDate

The function is gone and sql server will use an index on LastUpdated if it exists.

The same holds true for this example:

select   c.LastName
from     Customer c with (nolock)
where    c.LastUpdated < dateAdd(day, -1, getdate())
becomes:

declare  @CurrentDate datetime
select
@CurrentDate = dateAdd(day, -1, getdate())select   c.LastName
from     Customer c with (nolock)
where    c.LastUpdated < @CurrentDate

Sometimes there is simply no workaround for a function in the WHERE clause but, knowing their impact, make every effort to eliminate them. 


DBTalk

04

August

Over the past 6 months, I’ve subscribed to the “DBTalk” distribution list at work and I’ve really been enjoying it.  Here’s are my top 2 reasons:

1. Though I know there are many .NET coders who also happen to be very capable database designers and developers, I believe that the many of us still have a lot to learn.  I would say that we all know at least the basics, but there’s an awful lot of DB knowledge that has eludes us over the years.  If nothing else, the “DBTalk” distribution list has proven to be a non-intrusive way to educate an entire group of developers on SQL Server Best Practices and a few Tips & Tricks.

2. The moderator does a really nice job of getting everyone involved.  Those of us who frequent technical blogs know this isn’t always easy — especially when your audience isn’t necessarily comfortable with the topic.  Fortunately, the moderate provides information on varying topics for all levels and he throws out carrots all the time.  For example, he’ll present a Tip of the Day and ask if anyone has an alternative solution or he will simply submit a problem and ask for possible solutions.  It seems with most conversation at least someone on the distribution list is willing to bite.

Anyhow, with permission, I’ll be posting a number of the topics and tips which have been discussed over the past several months.  They may come all at once, in chunks or one every once in a while.  I haven’t yet decided. 


What’s the Best Way to Manage a Database Queue?

20

June

This question circulates through the office about once a year.  “Um, I have Windows service which processes requests which are queued in a database table.  Everything is working great, but I would like to add a second instance of the service and I need to know the best way to ensure the services don’t step on each other.  In other words, how do I mark each row as it is being processed so it won’t be picked up again?” 

Here are the ingredients in my current ideal solution:

  1. The table must include the following columns: a Unique Identifier such as RequestID, Status, LockedBy, and LockedTime.  I also like to include RequestedTime and CompletionTime columns as I’ve found that capturing these values help when/if system stats are required.
  2. The key is to lock the current request at the time of processing.  To do so, you should use a transaction and an update lock.
  3. I think it is important to pull requests off the queue in an intelligent order.  FIFO is the best bet under most circumstances.  In this case, the RequestedTime column becomes imperative (unless your RequestID is an auto-incrementing identity column and you are comfortable ordering on it.)
  4. Finally, I like to make sure no request is left behind.  Let’s say Service A locks a request for processing and then the application server crashes.  I want the lock to eventually expire so Service B can pick up the row and save the day.

Assuming new requests are added to the queue as ‘Pending’ , one may use a routine similar to the one below to safely pull the next request off the queue:

create procedure [GetNextItemInQueue]
(
    @LockedBy varchar(75)
)
as 

set nocount on 

declare
    @Error                     int,
    @ExitCode                  int,
    @RowCount                  int,
    @PendingStatus               int,
    @MinutesBeforeBreakLock       int,
    @RequestId                   int

select
    @Error                      = 0,
    @ExitCode                   = 0,
    @RowCount                   = 0,
    @PendingStatus                = 1,
    @MinutesBeforeBreakLock        = 30 

begin tran ProcessQueue 

select top 1 @RequestId = RequestId
from Queue with (updlock)
where Status = @PendingStatus
and (LockedTime IS NULL OR
DateDiff(mi, LockedTime, GetDate()) >= @MinutesBeforeBreakLock)
order by RequestedTime Asc 

select @RowCount = @@RowCount, @Error = @@Error 

if (@RowCount = 0)
    goto ExitProc 

if (@Error <> 0)
    goto ErrorHandler 

update Queue
set    LockedBy = @LockedBy,
    LockedTime = GetDate()
where RequestId = @RequestId 

select @Error = @@Error
if (@Error <> 0)
    goto ErrorHandler 

select @RequestId as RequestId 

select @Error = @@Error
if (@Error <> 0)
    goto ErrorHandler 

goto ExitProc 

ErrorHandler:
    raiserror   (@Error, 16, 1)
    if @@trancount > 0
        rollback tran ProcessQueue
    select      @ExitCode =  -100 

ExitProc:
    commit tran ProcessQueue
    return      (@ExitCode)

Once the request is processed, a database call will be made to update the row’s Status to ‘Complete’ and CompletionTime. 

Note, you may choose to change the status of each row from ‘Pending’ to ‘Processing’ when you do the initial lock.  This is perfectly fine, but I’ve found that is unnecessary as the Processing status can be determined based on the value of other columns.

Again, that’s my current preferred implementation.  Perhaps there are better solutions out there?


DataView.RowFilter Top Rows

07

February

Out of the box, you can do a lot with the DataView.RowFilter property, but you can’t make use of the top clause. However, if the view contains a column with unique values, you can obtain the same results by setting the Sort property and then generating a RowFilter which uses the in condition.

Here’s a sample:

public static DataView GetLatestComments(int numComments) { DataView Comments = new DataView(GetComments()); Comments.Sort = "DateCreated Desc"; string filter = string.Empty; string delimiter = string.Empty; numComments = Math.Min(numComments, Comments.Count); for (int i = 0; i < numComments; i++) { filter += delimiter + Comments[i]["Id"].ToString(); delimiter = ","; } if (filter.Length > 0) { Comments.RowFilter = "Id in (" + filter + ")"; } return Comments; }

Note how the delimited string is built.  Phil Haack provided an interesting tip on concatenating a delimited string last year.  Undoubtedly, we’ve all built delimited strings using our preferred method.  If you sift through the post comments, you will see what I mean.  Anyhow, I took a liking to the method Joe Cheng uses when he says he’s “feeling lazy and performance is not critical (which is the vast majority of the time).”


Find Instances of String in Database Objects

29

January

I have a folder full of SQL utility scripts which I’ve compiled over the years.  One of which would search all stored procedures within a given database for instances of a specified string.  This type of script is really helpful if you need to find all routines which reference a specific table column.  This morning I extended the routine to include views and user-defined functions as well.  I also made a change to include all databases on the SQL Server except of the four system databases.

Declare @SearchString varchar(50) Select @SearchString = SearchText TDB Create Table #Results ( [Database] varchar(255), [Object] varchar(255), [Type] char(1) ) SET NOCOUNT ON Declare @DBName sysname; Set @DBName = Declare @Qry nvarchar(4000); Set @Qry = WHILE @DBName IS NOT NULL BEGIN SET @DBName = ( SELECT MIN(name) FROM master..sysdatabases WHERE name NOT IN (master, model, tempdb, msdb) AND DATABASEPROPERTY(name, IsOffline) = 0 AND DATABASEPROPERTY(name, IsSuspect) = 0 AND name > @DBName ) IF @DBName IS NULL BREAK SET @Qry = SELECT ”’ + @DBName + ”’, O.Name, O.Type FROM + QUOTENAME(@DBName) + ..SysObjects O JOIN + QUOTENAME(@DBName) + ..SysComments C ON O.ID = C.ID WHERE O.Type in (P,U,V) AND PATINDEX(% + @SearchString + %,C.Text) > 0 ORDER BY O.Type, O.Name INSERT INTO #Results EXEC (@Qry) END Select Distinct [Database],[Object],[Type] From #Results Order By [Database], [Object] Drop Table #Results

Notice the script used PATINDEX().  What does it do?  It returns the starting position of the first occurrence of a pattern in a specific expression, or zero if the pattern is not found, on all valid text and character data types.  You may have noticed the expression sought by PATINDEX() must be enclosed in “%” signs. but within the expression we could use wildcards “%” for any number of letters, “_” for a single character, [] to specify a range of characters and [^] to skip a range, just as you do when using LIKE comparisons. 


Next Page »

CONTACT

RSS

ARCHIVES

READ BY TOPIC

LINKS

LINK ADS