Saturday, December 6, 2008

SQL SERVER - SELECT 1 vs SELECT * - An Interesting Observation

SQL SERVER - SELECT 1 vs SELECT * - An Interesting Observation

February 26, 2008 by pinaldave

Many times I have seen issue of SELECT 1 vs SELECT * discussed in terms of performance or readability while checking for existence of rows in table. I ran quick 4 tests about this observed that I am getting same result when used SELECT 1 and SELECT *. I think smart readers of this blog will come up the situation when SELECT 1 and SELECT * have different execution plan when used to find existence of rows.

Let us see 4 tests I ran and note their result is same and their Execution Plan take same amount of resources (50% and 50%)

You can click on images to see larger images.

Test 1 : Whole Table SELECT

USE AdventureWorks
GO
IF EXISTS(
SELECT 1
FROM Production.Product)
SELECT 'SELECT 1'
GO
IF EXISTS(
SELECT *
FROM Production.Product)
SELECT 'SELECT *'
GO

Test 2 : Condition WHERE on Indexed Column

USE AdventureWorks
GO
IF EXISTS(
SELECT 1
FROM Production.Product
WHERE Name = 'Adjustable Race')
SELECT 'SELECT 1'
GO
IF EXISTS(
SELECT *
FROM Production.Product
WHERE Name = 'Adjustable Race')
SELECT 'SELECT *'
GO

Test 3 : Using Aggregate function COUNT

USE AdventureWorks
GO
IF (
SELECT 1
FROM Production.Product
WHERE Name = 'Adjustable Race') = 1
SELECT 'SELECT 1'
GO
IF (
SELECT COUNT(*)
FROM Production.Product
WHERE Name = 'Adjustable Race') = 1
SELECT 'SELECT *'
GO

Test 4 : Using COUNT with search on non-indexed Column

USE AdventureWorks
GO
IF (
SELECT COUNT(1)
FROM Production.Product
WHERE SafetyStockLevel = '800') > 1
SELECT 'SELECT 1'
GO
IF (
SELECT COUNT(*)
FROM Production.Product
WHERE SafetyStockLevel = '800') > 1
SELECT 'SELECT *'
GO

Note: I have used single quotes around 800, which is numeric and I am aware of that fact. Above queries are for testing purpose only.

I have been using SELECT 1 instead of SELECT * when checking existence of rows. I would like to see what my readers have opinion about this. Please have your opinion and make your comment here.

Reference : Pinal Dave (http://www.SQLAuthority.com)

kick it on DotNetKicks.com

Posted in Author Pinal, SQL, SQL Authority, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology | 21 Comments
21 Responses to “SQL SERVER - SELECT 1 vs SELECT * - An Interesting Observation”

1.
on February 26, 2008 at 11:59 am Allan Svelmoe Hansen

I read about this as well over on
http://www.sqlskills.com/blogs/conor/2008/02/07/EXISTSSubqueriesSELECT1VsSELECT.aspx
However my own testing, both when viewing execution plans and using Statistic timer to see compile/execution time, is that I can’t find consistent difference between using SELECT * or SELECT 1 when combined with an EXISTS clause/condition

(even when using DROPCLEANBUFFERS and FREEPROCCACHE between each run, which as far as I understand should clear the cache and force a recompilation of my queries).

I’m glad somebody else have taken this subject up as well, as I am strained for time to play myself at the moment - and it is one of those “myths” within SQL currently, that I’m having trouble finding out if is true or not.

2.
on February 26, 2008 at 6:28 pm Kris

If you try this with a view that return columns from functions or subquieries

ie

select c1,c2,(select tc from tbl2 where tbl2.t1 =tbl.t1) as tc3
from tbl

the select 1 will not execute the (select tc from tbl2 where tbl2.t1 =tbl.t1 ) sub query. This is a significant performance gain my view runs 99% slower with select *

3.
on February 26, 2008 at 8:01 pm simon worth

The problem with SELECT * is it retrieves all the meta data associated to the object(s) in question BEFORE runtime. So during compilation of the query there is unneeded overhead associated to it - but at run time the query plans will be the same.

Just because the query plans are the same does not mean that the queries perform the same - it is compile time that is taking the hit in this case, and not run time.

So you’re doing yourself a compilation favor if you use SELECT 1 rather than SELECT *, even if you don’t see the gains in the execution plan - as I said they will probably be exactly the same. Just keep in mind that meta data is being queried and returned, then discarded because it is unneeded to get the results of you query. Why thrash the system tables when you don’t need to?

When you go to the store to get a bottle of coke, you don’t buy all the coke products in the store and throw out those you didn’t want - it’s the same thing here. Get what you need from the beginning - it costs less.

4.
on February 26, 2008 at 10:17 pm Manik Bod

One Great Article. Pinal tum great ho!

5.
on February 27, 2008 at 2:20 am jemm

I don’t think there is any difference, as long as the SELECT 1/* is inside EXISTS, which really doesn’t return any rows - it just returns boolean as soon as condition of the WHERE is checked.

I’m quite sure that the SQL Server Query Optimizer is smart enough not to search for the unneeded meta data in the case of EXISTS.

I agree that in all the other situations SELECT * shouldn’t be used for the reasons Simon mentioned. Also, index usage wouldn’t be optimal etc.

For me EXISTS (SELECT * ..) is the only place where I allow myself to write SELECT * in production code ;)

6.
on February 27, 2008 at 4:24 am Mladen

@simon worth:
can you provide some source or example for this claim? i’d be very interesetd to see.

7.
on February 27, 2008 at 6:03 am rsivam

good post, I have kicked your article at
http://blog.sqlauthority.com/2008/02/26/sql-server-select-1-vs-select-an-interesting-observation/

8.
on February 27, 2008 at 12:03 pm Allan Svelmoe Hansen

@simon worth:
By running statistics time on I’ve not seen any consistent indication that at compile time the query will collect all the metadata and thus have a higher compile time them then a SELECT 1. Some times it was faster, other times it was not.

Without looking into the engine I would also seriously think it would be bad implementation to retrieve any sort of meta data when the SELECT * is within a conditional clause, like EXISTS. The engine would be able to see from the syntax alone, that there is no usage for the specific data and thus no need to look up any additional metadata, other then testing whether the statement evaluates true or not.

I’d be very interested if you had some code which could show this behaviour. Or some links which have shown this to be true. Because I can’t see it myself, neither logical nor running statistic on the queries.

9.
on February 27, 2008 at 11:39 pm Tim B

It seems that this is primarily a semantic argument in terms of runtime performance. Personally, I’ve always used SELECT 1 because I wasn’t sure about the implications of the alternative.

If it comes down to little more than personal preference, I’m going to continue using SELECT 1 because it reinforces the idea that SELECT * is (usually) bad.

10.
on February 28, 2008 at 2:26 am Thomas Freudenberg

To check the existence of rows you can also try this statement:

IF EXISTS(SELECT NULL as [EMPTY] FROM Production.Product)

11.
on March 5, 2008 at 7:35 pm simon worth

If you want to read more about the meta data collection on objects when the query is executed - read the article that Allan Svelmoe Hansen posted in the first comment. Conor used to work (I don’t think he still does) for Microsoft on the Query Processor team. If anyone would know the inside of the query processor it’s him. In that blog post he states

“The * will be expanded to some potentially big column list and then it will be determined that the semantics of the EXISTS does not require any of those columns, so basically all of them can be removed.

“SELECT 1″ will avoid having to examine any unneeded metadata for that table during query compilation.”

Source : http://www.sqlskills.com/blogs/conor/2008/02/07/EXISTSSubqueriesSELECT1VsSELECT.aspx

12.
on March 5, 2008 at 7:45 pm Guy Pravin

Simon,

We can not take anybodies word for it. (With due respect to Conor - I know he is the man). Is there any proof for this?

-Guy

13.
on March 5, 2008 at 7:54 pm simon worth

Honestly I don’t see what the importance of this subject is.

People have been saying for a long time that SELECT * in an exists condition has unneeded overhead - and there are several blog posts out there to back up that claim.

If you really really want to use SELECT * instead of using SELECT 1 in production code - then go ahead.

If you haven’t seen performance issues with your code then don’t worry about it.

I understand that you want something tangible to look at and prove or disprove this “theory”, but I don’t have time to put the code together and gather the statistics just to satisfy curiosity.

I’ll leave that up to you to prove or disprove.

It may be a good idea to post a comment on Connor’s site asking if he can prove or disprove this - as he is more intimate with the query processor than I am.

14.
on March 5, 2008 at 8:00 pm pinaldave

Simon,

Good comment. I think there is no real conclusion. As I said I always used SELECT 1, however this whole exercise is for fun and what everybody think about this matter.

I like your way of thinking however neither theory is proven yet. I am also curious what other readers have to suggest.

Regards,
Pinal Dave ( http://www.SQLAuthority.com )

15.
on May 2, 2008 at 1:27 pm emily

Hi,

Just want to ask, is there any select statement whereby it will call you to select the record start from second record then additional 10 records onwards?

For example,

User have to retrieve record 2 until record 12?
If yes, can help?

Thanks.

16.
on June 25, 2008 at 12:09 pm Ravi

Hi emily here is the answer for your question.

SELECT rnum, employee_id
FROM (
SELECT rownum rnum, employee_id
FROM employees
WHERE ROWNUM < 100)
WHERE rnum BETWEEN 2 AND 12;

this query will retrieve output for
record 2 until record 12. By use of inline view.

Regards,
Ravi
Sony India
____________________
Hi,

Just want to ask, is there any select statement whereby it will call you to select the record start from second record then additional 10 records onwards?

For example,

User have to retrieve record 2 until record 12?
If yes, can help?

Thanks.

17.
on July 11, 2008 at 1:52 am G2

Nice observation

18.
on September 19, 2008 at 11:17 pm Bryan Stauffer

SQL 2005

I managed to get a glimpse using the following two queries. According the estimated query execution plans the ’select *’ query has an extra step and the estimated row size is larger than ’select 1′. Perhaps this is not proof but I think it is more evidence that ’select 1′ is the better practice.

select * from tempdb.sys.sysobjects where [name] = ”

select 1 from tempdb.sys.sysobjects where [name] = ”

19.
on September 19, 2008 at 11:19 pm Bryan Stauffer

sorry, on that last post, i tried to include a string that inadvertenly looked like html and was stripped out. just use any user-defined table for the [name] = portion of the query to duplicate my findings.

20.
on November 21, 2008 at 8:54 pm Lutz Mueller

Hi Pinal,

I’m wondering what test conditions you used…

The other day I had a performance issue with a proceudre that included a statement like

IF NOT EXISTS (SELECT * FROM view WHERE condition)

The view itself based on two tables withe several 10-thousand rows in one and several million in the other (as to my best knowledge properly joined and indexed using information from STATISTICS XML). The SELECT statement above would return approx. 1 million rows.

When I changed it to SELECT 1, performance increased significantly.

My explanation (at least to myself) goes a little more in direction of boolean logic:

The NOT EXISTS statement becomes FALSE performing the following tasks:
SELECT * -> grab all the data and if there are any drop them and return FALSE
SELECT 1 -> try to grab the first row and if you find one single row return FALSE, no matter on how many rows fulfill the WHERE clause

As far as I figured the TRUE statement takes the same time for both, * amd 1, since the resultset is NULL in both cases.

Would you agree with my comment and, if not, retest your code with large tables?

Regards

Lutz

21.
on December 5, 2008 at 11:09 pm Rao K

What is the equallent in SQL Server for

select * from table1 a
where (a.x, a.y) in (
select b.x, min(b.y)
from table1 b
group by b.x
)

No comments: