isnumeric selectivity estimation bug (en)

Some time ago on one of the forums there was a question about why adding the condition «isnumeric(column) = 0» to a query's where clause makes the query very slow.
I did some investigating, and it led me to some interesting results.

Preparing data

Let's create a test database and a table, and fill it with data.

create database IntrinsicCETest;
go
use IntrinsicCETest;
go
create table t (a int identity primary key, b varchar(8), c char(200) not null default(''));
create index ix_b on t(b);
go
create proc FillTestData( @SampleRowCount int )
as
truncate table t; --clear all data
with numbers(n) as --insert @SampleRowCount rows
(
	select top(@SampleRowCount) row_number() over(order by (select null)) from sys.columns c1,sys.columns c2,sys.columns c3
)
insert t(b)
select
	case when n%2 = 0 then 'A'+left(newid(),7) else convert(varchar(8),dateadd(dd,n,'19000101'),112) end
from
	numbers
alter table t rebuild; --rebuild all indexes
update statistics t with fullscan; --update statistics with full scan to get accurate estimates
go

Half of the values in column b satisfy isnumeric(b) = 0, and half satisfy isnumeric(b) = 1.
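
A quick sanity check of that distribution:

exec FillTestData 1000;
select isnumeric(b) as is_numeric, count(*) as cnt
from t
group by isnumeric(b); -- expect two groups of 500 rows each
go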

Strange behaviour

Now let's execute the same query three times, for different row counts: 10, 100, and 1000.

use IntrinsicCETest;
go
exec FillTestData 10;
select page_count from sys.dm_db_index_physical_stats(db_id(),object_id('t'),1,null,null);
set statistics io on
select t.b,t.c from t where isnumeric(t.b) = 0;
set statistics io off
go
exec FillTestData 100;
select page_count from sys.dm_db_index_physical_stats(db_id(),object_id('t'),1,null,null);
set statistics io on
select t.b,t.c from t where isnumeric(t.b) = 0;
set statistics io off
go
exec FillTestData 1000;
select page_count from sys.dm_db_index_physical_stats(db_id(),object_id('t'),1,null,null);
set statistics io on
select t.b,t.c from t where isnumeric(t.b) = 0;
set statistics io off
go

Results:

1) Table size: 1 page
Table 't'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
2) Table size: 3 pages
Table 't'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
3) Table size: 29 pages
Table 't'. Scan count 1, logical reads 1005, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

As you can see, there is a huge difference in logical reads between executions 2 and 3. While the table grew 10 times (from 3 pages to 29), the reads grew 200 times! Let's take a look at the query plan to find out why.
Rerun queries 2 and 3 with the «Include actual execution plan» option.

exec FillTestData 100;
select page_count from sys.dm_db_index_physical_stats(db_id(),object_id('t'),1,null,null);
set statistics io on
select t.b,t.c from t where isnumeric(t.b) = 0;
set statistics io off
go

exec FillTestData 1000;
select page_count from sys.dm_db_index_physical_stats(db_id(),object_id('t'),1,null,null);
set statistics io on
select t.b,t.c from t where isnumeric(t.b) = 0;
set statistics io off
go


As seen from the plans, for the second query the server decided to use a non-clustered index scan + key lookup instead of a clustered index scan, even though half of the table is selected. That is odd.
One might ask why the plans differ at all. The thing is that the optimizer makes cost-based decisions, and at some point it becomes cheaper (i.e. the cost is lower) to choose another index than to scan the whole table. In this case the optimizer decided that the clustered index had become too large to scan, and that it would be cheaper to scan the non-clustered index and then seek into the clustered one. Obviously, this was a wrong decision. Let's look at which estimates led to it.

As we can see, the optimizer estimated the row count as 1, whereas the actual row count is 500.

Now let's run this query and look at the plan:

select t.b,t.c from t where isnumeric(t.b) = 1;

As we can see, this time the optimizer made the right choice. Although we have no statistics on the isnumeric expression and the estimates differ from the actuals, the difference is not as dramatic as 1 vs. 500, and the correct plan is chosen.
Now let's force index ix_b in the «isnumeric = 1» query and look at the plan (I also force it in the «isnumeric = 0» query; that is not necessary, but I'd like both queries optimized under the same conditions).
Note:
here and below, when I write «execute the query», I presume it is always run in «set showplan_xml on» mode, i.e. an estimated plan is built.

set showplan_xml on
go
select t.b,t.c from t with(index(ix_b)) where isnumeric(t.b) = 0;
select t.b,t.c from t with(index(ix_b)) where isnumeric(t.b) = 1;
go
set showplan_xml off
go

Worth taking into consideration:
1. The first plan has no Filter or Compute Scalar operators; instead, the filter is put into the predicate property of the scan.
2. The plans differ in the number of operators, but have a similar shape. Even with that plan shape, the «= 1» query has a normal 177-row estimate, unlike the «= 0» query.

Where is the filter in the isnumeric = 0 query

In fact, we do have a filter operation in the plan, but it is placed inside the scan operator rather than as a separate plan operator. This happens because of a plan transformation applied after the plan is built and copied out, known as a «post-optimization rewrite».
There is TF 9130 (which I described in my earlier post RowGoal on non-uniform distribution (ru)) that disables this transformation. Let's rerun the queries with this flag enabled to look at the plan shapes.

set showplan_xml on
go
select t.b,t.c from t with(index(ix_b)) where isnumeric(t.b) = 0 
option(recompile, querytraceon 9130);
select t.b,t.c from t with(index(ix_b)) where isnumeric(t.b) = 1
option(recompile, querytraceon 9130);
go
set showplan_xml off
go

The first query plan shape will be the following:

As we can see, the Filter expects 1000 rows on its input (correct), and 1 row is expected after the Filter (incorrect).
We may wonder why the Filter is pushed down below the Nested Loops Join. That is because the optimizer has a technique known as «predicate pushdown», which is used to reduce the amount of data passed to later operators (e.g. to a join). In this case the optimizer is mistaken when doing this.

The trivial plan case

Let's simplify our query by eliminating column t.c from the select list. This also eliminates the «key lookup + nested loops» operation, because now all the necessary information is in index ix_b (i.e. ix_b becomes covering for this query).

set showplan_xml on
go
select t.b from t with(index(ix_b)) where isnumeric(t.b) = 0 
option(recompile, querytraceon 9130);
select t.b from t with(index(ix_b)) where isnumeric(t.b) = 1
option(recompile, querytraceon 9130);
go
set showplan_xml off
go


Even in this case the estimate is completely wrong. Now it is 1 vs. 1000!

Note:
from now on I will not include the exact row counts in the screenshots; just look at the thickness of the arrows: a thin arrow stands for 1 row, a thick one for more than 1. That is what matters for this note.

Now let's run the following query (we've added a (select 1) expression to the select list):

set showplan_xml on
go
select t.b,(select 1) from t with(index(ix_b)) where isnumeric(t.b) = 0 
option(recompile, querytraceon 9130);
select t.b,(select 1) from t with(index(ix_b)) where isnumeric(t.b) = 1
option(recompile, querytraceon 9130);
go
set showplan_xml off
go

The estimates became good in both cases.
At first glance this makes no sense, but there is an explanation.
Let's run the query:

set showplan_xml on
go
select t.b from t with(index(ix_b)) where isnumeric(t.b) = 0 
option(recompile, querytraceon 9130, querytraceon 8757);
select t.b from t with(index(ix_b)) where isnumeric(t.b) = 1
option(recompile, querytraceon 9130, querytraceon 8757);
go
set showplan_xml off
go


We observe that here we also have a good estimate (and one Compute Scalar is missing compared to the previous plan, which is natural, as it was responsible for computing (select 1)).
What is TF 8757? As I wrote in my earlier post Trivial Plan Optimization (ru), TF 8757 prevents the trivial plan phase. The subquery (select 1) does the same. So the last two sets of queries, with good estimates, were fully optimized, whereas the bad one got a trivial plan.
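
By the way, you can confirm the optimization level without trace flags by reading the StatementOptmLevel attribute from a cached plan's XML. A minimal sketch over the standard DMVs (note that plans compiled with option(recompile) are not cached, so run the query without that hint first):

select st.text,
       qp.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
                            (//p:StmtSimple/@StatementOptmLevel)[1]', 'varchar(25)') as optimization_level
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
cross apply sys.dm_exec_query_plan(cp.plan_handle) qp
where st.text like '%isnumeric(t.b)%'; -- filter down to our test queries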

It might seem natural to suppose that the wrong estimate somehow depends on the optimization level, but that is not true. The initial query (with nested loops + key lookup) also has FULL optimization level, yet still has a poor estimate.

What do we have so far

Before finding the explanation for all these odd things, let's sum up what we know so far:
1. The simplified isnumeric = 0 query has a wrong estimate with TRIVIAL optimization level, and a correct one with FULL.
2. The isnumeric = 1 query always has a correct estimate.
3. The initial isnumeric = 0 query (with NL + key lookup) has a poor estimate despite FULL optimization level.

Explanation

First, we should recall what stages a query passes through during optimization. In general these are: parsing/binding, simplification, exploration, optimization (trivial or full). One of the most important parameters determining the optimization process is cardinality (i.e. row count). Throughout the optimization process the optimizer uses the cardinality estimation framework to make its decisions.
In general it can be pictured like this:

For our investigation it would be very interesting to see what happens to the estimates at those stages (simplification, exploration, optimization). Another set of undocumented trace flags can help here. I have already described them in earlier posts: 8606 (prints logical operator trees during the simplification process), 8612 (adds cardinality and other info to the trees), and the well-known 3604, which directs output to the console.

Let's run the queries and switch to the Messages tab:

set showplan_xml on
go
select t.b from t with(index(ix_b)) where isnumeric(t.b) = 0 
option(recompile, querytraceon 3604, querytraceon 8606, querytraceon 8612);
select t.b from t with(index(ix_b)) where isnumeric(t.b) = 1
option(recompile, querytraceon 3604, querytraceon 8606, querytraceon 8612);
go
set showplan_xml off
go


From this we can see that at the simplification phase we already have a LogOp_Select estimate equal to 1 for isnumeric = 0, and to 1000 for isnumeric = 1.

Now, if we rerun the queries with the trivial-plan-preventing flag, we'll see the same picture during simplification:

But a different one in the plan:

So far we can conclude:
The estimate for a specific operator doesn't depend on the optimization level. But in the trivial plan case, optimization is aborted early once a trivial plan is found, and there is no serious investigation of alternative plans. The trivial plan is based on the simplified tree, and the poor cardinality estimate of 1 row goes into the plan. When we prevent the trivial plan, we force the optimizer to do a more comprehensive investigation, and, obviously, it is then able to find a plan with more appropriate estimates.

There is still the question of why the optimizer then fails to build a good plan with good estimates for the initial query with the loop join. We'll come back to that question later. For now, it would be interesting to know what «magic» happens between the output of the «simplified tree» and the «join collapse tree».
At that moment property derivation is performed. You can observe it by enabling trace flag 2373; between «simplified tree» and «join collapse tree» you will see:

Memory before deriving properties of LogOp_Get: 7
Memory after deriving properties of LogOp_Get: 7
Memory before deriving properties of LogOp_Select: 7
Memory after deriving properties of LogOp_Select: 7

Among the derived properties are selectivity and cardinality.
In general, this process may be pictured like this:

Let's see what exactly is hidden behind «Derive properties».
For a better understanding, recall how cardinality estimation is performed. Cardinality derivation starts from the bottom base operator, using statistical information, and the estimate is modified as it flows up the tree. For example, when calculating filter cardinality, first the selectivity of the scalar operators is computed, then it is multiplied by the input cardinality, which gives the output cardinality of the filter.
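
To make that concrete, here is a rough worked example using the figures from this post (the 0.177828 selectivity is simply inferred from the 177.828-row estimates seen in the plans; incidentally, 177.828 is about 1000^0.75):

-- filter output cardinality = predicate selectivity * input cardinality
select 0.177828 * 1000 as estimated_filter_rows; -- ~177.828, matching the plans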

Below you can see exactly which functions are called internally to derive selectivity for scalar operators and cardinality for relational ones.

The screenshot illustrates the hierarchy of function calls; it was captured with a symbolic debugger and parsed into an XML view for convenience.
We should pay attention to three of the functions:

  • CScaOp_Intrinsic::FCalcSelectivity (calculates selectivity for isnumeric);
  • CScaOp_Const::FCalcSelectivity (calculates selectivity for a constant);
  • CScaOp_Comp::ConstConstSelec (calculates selectivity for CScaOp_Comp when two constant values are compared).

The function invocations illustrate what was described above: to calculate the filter cardinality, first the selectivities of the child scalar operators are calculated; then, in CScaOp_Comp::ConstConstSelec, the selectivity of the comparison operator is calculated; and finally the result is applied to the input cardinality to get the output.

The thing is that for isnumeric = 0, CScaOp_Comp::ConstConstSelec returns a selectivity of 0! So the estimated cardinality would be 0 * 1000 = 0 rows. This is where the optimizer goes wrong, and everything downstream goes wrong with it. Of course, since this is statistical estimation, the minimum is never evaluated as 0 rows; instead the optimizer does some further calculations and ends up with a 1-row estimate.
Here are the function calls for isnumeric = 0 and isnumeric = 1.

Notice how many fewer function invocations are made in the proper-estimate case, which succeeds with the FValid function call.

Interestingly, we may influence the result of CScaOp_Comp::ConstConstSelec by means of one more undocumented trace flag, TF 2328.
Here it goes:

set showplan_xml on
go
select t.b from t with(index(ix_b)) where isnumeric(t.b) = 0
option(recompile)
select t.b from t with(index(ix_b)) where isnumeric(t.b) = 0
option(recompile, querytraceon 2328)
go
set showplan_xml off

If we look at the plans, we'll see that with this flag the estimated row count changes to 500 rows.

That is because with this flag the server makes a guess: the execution path leads to CScaOp_Comp::GuessSelect, which in this case gives a 0.5 selectivity estimate (i.e. 50% of the rows: 0.5 * 1000 = 500).
Caution!
This is not a solution to the problem in any way! It is used here only to demonstrate the concept.

Upd 20.11.2012: recently found a description of this TF on an MS blog: Disabling Constant-Constant Comparison Estimation

Now let us come back to the initial query with the loop join. We still have one unanswered question.
Why do we get good estimates for the simplified query (the one for which ix_b became covering) when it is fully optimized, but not the same result for the initial query?

To answer that question in general, we should recall that the optimizer chooses a plan based on cost. So there is a simple answer: the bad plan is chosen either because it is cheaper, or because the alternatives for a good plan aren't even explored (there may be other reasons, e.g. a timeout, but not in this case).

To store and cost alternative plans, the optimizer uses a structure called the memo. I described it in my earlier post Full Optimization: Search 0 (ru), so I won't cover it here in detail; I recommend reading Benjamin Nevarez's post Inside the Query Optimizer Memo Structure if you are not familiar with it. For now, what matters is that we can look at which alternative operators were considered by the optimizer when choosing the plan. To view this structure we should enable TF 8615.
First, let's see the memo for the simplified query (TF 8757 is also enabled to prevent the trivial plan, because the memo is not built during trivial optimization):

set showplan_xml on
go
select t.b from t with(index(ix_b)) where isnumeric(t.b) = 0
option(recompile,querytraceon 9130,querytraceon 3604,querytraceon 8757,querytraceon 8615)
go
set showplan_xml off
go

If we look carefully into the output, we'll notice that there are two PhyOp_Filter operators, in two groups. The cardinality of the first one is [Card=1], of the other [Card=177.828]. In this case it is cheaper to build the plan from group 9 (with proper estimates) than from group 12.

The process of taking the plan out of the memo is called copy out. Since the exact process of how the final tree is built from the memo is unknown, let's look at both plans in another way: let's build the second plan as if there were no alternative filter operator in the memo. To accomplish that, we should recall how alternatives appear in the memo: they are generated by applying transformation rules to existing operators. In this case the rule responsible for the «good» plan is SelectToFilter. We can disable this rule for this particular query using the undocumented query hint queryruleoff, which I described in an earlier post (ru).

So, let's run the query:

set showplan_xml on
go
select t.b from t with(index(ix_b)) where isnumeric(t.b) = 0
option(recompile,querytraceon 9130, querytraceon 3604, querytraceon 8757, querytraceon 8615)
select t.b from t with(index(ix_b)) where isnumeric(t.b) = 0
option(recompile,querytraceon 9130,queryruleoff SelectToFilter, querytraceon 3604, querytraceon 8757, querytraceon 8615)
go
set showplan_xml off

If you look at the second memo, you'll see that now there is only one PhyOp_Filter operator.
Let's take a look at the plans:

Pay attention to the thickness of the arrows. After disabling the rule, the second plan gives almost the same bad estimate for the filter as the initial query: one row. Also note that the cost of the «bad» plan is slightly higher than that of the «good» one; I think that is why the good plan is chosen here by default.
Also, take a look at the select row counts: they are the same for the «bad» plan and the «good» one! This is not a screenshot mistake, it is real. It is caused by a hidden Compute Scalar, which has the correct estimate of 177.828 rows. We'll come back to that later on.
So, to sum up: for the simplified query the «good» plan is selected simply because it is cheaper.

Let's come back to the initial query with the loop join and run the following:

set showplan_xml on
go
select t.b,t.c from t with(index(ix_b)) where isnumeric(t.b) = 0
option(recompile,querytraceon 9130,querytraceon 3604,querytraceon 8757,querytraceon 8615)
go
set showplan_xml off

If you look into the memo after running this query, you'll find only one PhyOp_Filter operator, in group 12, with cardinality equal to 1. From here we may conclude that for this query the alternative plans were simply not memoized and not considered when choosing the plan. The optimizer considers the plan with the wrong estimates to be good enough.
As we remember, after the optimizer misses on selectivity, it assumes that one row will be selected and pushes the isnumeric = 0 predicate down, instead of first computing it and then filtering. Enumerating from the leaves to the root: in the «good» plan the compute scalar comes first, then the filter; in the «bad» one, the filter comes first, followed by a compute scalar that is invisible in the plan.
The rule responsible for the early filtering is SelResToFilter. This exact rule, combined with the wrong selectivity, gives the «bad» plan. Let's execute almost the same query under different conditions (as is, with the rule off, and with the isnumeric = 1 predicate) and then look at the plans.

set showplan_xml on
go
--as is
select t.b,t.c from t with(index(ix_b)) where isnumeric(t.b) = 0
option(recompile,querytraceon 9130)
--disable rule SelResToFilter
select t.b,t.c from t with(index(ix_b)) where isnumeric(t.b) = 0
option(recompile,querytraceon 9130,queryruleoff SelResToFilter)
-- with value 1
select t.b,t.c from t with(index(ix_b)) where isnumeric(t.b) = 1
option(recompile)
go
set showplan_xml off


As we can see, after disabling the rule we get a proper plan for value 0, just as for value 1.
Now let's go back to the invisible scalar operator.

Tendency of the error to propagate through the plan

As we know, estimation errors tend to accumulate as estimates flow from the bottom to the top. For example, a misestimate somewhere in a scan or seek may be multiplied further up, in a join for instance.
It is interesting how this bug behaves. We have seen that the root node showed the same number of rows for the right and the wrong plans.

There might be a few reasonable questions.
First, why is there one row on the input of the select node in the plan, while the estimated row count of the select node is 177.828 (I don't refer to the select plan node as an operator, because it actually isn't one)?
Second, what if this plan were a subplan of a more complex one?

To answer the first question, we'll run the query with another undocumented flag, 8607, plus 8612, both of which I described in my earlier post Full Optimization: Search 0 (ru).

set showplan_xml on
go
select t.b,t.c from t with(index(ix_b)) where isnumeric(t.b) = 0
option(recompile,querytraceon 3604,querytraceon 8607,querytraceon 8612)
select t.b,t.c from t with(index(ix_b)) where isnumeric(t.b) = 1
option(recompile,querytraceon 3604,querytraceon 8607,querytraceon 8612)
go
set showplan_xml off

On the Messages tab we'll see (shortened for brevity):

--isnumeric = 0
*** Output Tree: ***
  PhyOp_ComputeScalar [Card=177.828]
     PhyOp_Apply lookup TBL: t (0) (x_jtInner) [Card=1]
        PhyOp_Filter [Card=1]
           PhyOp_Range TBL: t(2) ASC
           ...
--isnumeric = 1
*** Output Tree: ***
  PhyOp_Filter [Card=177.828]
     PhyOp_ComputeScalar [Card=1000]
        PhyOp_Apply lookup TBL: t (0) (x_jtInner) [Card=1000]
           PhyOp_Range TBL: t(2) ASC
           ...

As can be seen from the results, the order of the filter and compute scalar operations differs. For the query with isnumeric = 1 the output tree seems more logically correct. Reading from bottom to top: get a range of rows (PhyOp_Range), add data from the clustered index (PhyOp_Apply), compute the isnumeric expression (PhyOp_ComputeScalar), filter on the predicate (PhyOp_Filter).
The first query's tree doesn't look as logical. We can see that its uppermost operator is PhyOp_ComputeScalar, which has the correct estimate, and that is what the operators above it will see.
To demonstrate this, we can run the query:

set showplan_xml on
go
declare @t table (a int primary key);
select t.b,t.c from t with(index(ix_b)) cross join @t t2 where isnumeric(t.b) = 0 option(recompile, querytraceon 9130);
go
set showplan_xml off
go

Let's look at the plan:

A wonderful picture: on both inputs of the Nested Loops Join there is one row, but the output is 177.828 rows. Surprise! =)
But we already know where such an estimate comes from.

One more simple query:

select t.a,t.b,t.c into #t1 from t;
alter table #t1 add constraint pk_t1 primary key (a);
go
set showplan_xml on
go
select t.b,t.c from t with(index(ix_b)) join #t1 t1 on t1.a = t.a where isnumeric(t.b) = 0 option(recompile, querytraceon 9130);
go
set showplan_xml off
go
drop table #t1;

The plan:

In the Sort operator there are again 177 rows, but as we can see the estimation error doesn't propagate further through the plan. So, answering the second question (does this error tend to spread through the plan?): at least not always, and the answer is closer to «no» than to «yes». But we should remember how the plan is built: this error may still have an impact on the shape of the whole plan and on the other operators.
Consider the following imaginary query:

set statistics io on
go
select t1.c,t2.c from t t1 join t t2 on t1.b = t2.b where isnumeric(t2.b) = 0 and isnumeric(t1.c)=0 option(recompile);
select t1.c,t2.c from t t1 join t t2 on t1.b = t2.b where isnumeric(t2.b) = 1 and isnumeric(t1.c)=0 option(recompile);
go
set statistics io off
go

Results:

Table 't'. Scan count 1001, logical reads 3182, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't'. Scan count 2, logical reads 62, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


Though this example is contrived, it illustrates that this kind of thing can happen! And the forum question that led to this post contained a picture of a plan with exactly that shape, from a real-life situation.

A few possible workarounds

1. Correct the estimate.
To prevent the estimation that triggers the early predicate pushdown, we may replace the explicit constant with a local variable.

set showplan_xml on
go
declare @0 int = 0;
select t.b,t.c from t with(index(ix_b)) where isnumeric(t.b) = @0;
go
set showplan_xml off
go

Note that this works only if @0 is a local variable, not a parameter. If @0 is a parameter, the parameter sniffing issue fully applies to it. For example:

create proc proc1
	@p int
as
select t.b,t.c from t where isnumeric(t.b) = @p;
go
--1. both "bad" plans
exec sp_recompile 'proc1';
exec proc1 0; --sniff value 0, build "bad" plan
exec proc1 1; --use same wrong plan for value 1 also

--2. both "good" plans
exec sp_recompile 'proc1';
exec proc1 1; --sniff value 1, build "good" plan
exec proc1 0; --use good plan for value 0 also
go
drop proc proc1;
go

The first pair of queries gets the «bad» plan (even for value 1):

In the second pair, both get the «good» one (even for value 0):

The same story occurs with a local variable combined with option(recompile), if you have that hint in your query (on recompile, the variable's value is sniffed as well). The hint might be there for other reasons, but it will have this effect.
If you'd like to avoid the parameter sniffing or recompile issue, you may use the «optimize for» hint; it works fine:

declare @0 int = 0;
select t.b,t.c from t where isnumeric(t.b) = @0 option(recompile, optimize for(@0 unknown));

Another approach to influence the selectivity estimation is to create a computed column on the isnumeric expression (it is not necessary to create an index on that computed column).
This gives the optimizer an opportunity to create statistics on that column and use them when estimating the predicate (also note that you don't have to replace «isnumeric(mycolumn) = 0» with «mycomputedcolumn = 0»: the server performs the match by itself, just make sure the column definition and the query expression match letter for letter). This approach gives the most accurate estimate. In general it may be used in many other cases where good estimates over expressions are needed, not only in this particular one.

alter table t add isnum as isnumeric(t.b);
go
set showplan_xml on
go
select t.b,t.c from t with(index(ix_b)) where isnumeric(t.b) = 0;
go
set showplan_xml off
go
alter table t drop column isnum;
go
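
As a side check, you can confirm that a statistics object was auto-created on the computed column (run this before dropping the column; the statistics name is server-generated):

select name, auto_created
from sys.stats
where object_id = object_id('t');
go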

2. Correct the poor choice of index, join type, etc.
If we don't correct the estimate, we may instead correct the decision based on that estimate with a query hint. There is a wide set of them: you may force a seek or a scan, choose which index to use, force the join order or join type, etc. This is one of the good cases where hints may be applied without any doubts (though the reason for the hint should be well described in comments, of course).
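
For example, here is a minimal sketch that forces a plain clustered index scan instead of the ix_b seek + key lookup (index id 1 is the clustered index; the particular hint is illustrative, not a universal recommendation):

select t.b, t.c
from t with(index(1)) -- force the clustered index, avoiding the lookup-based plan
where isnumeric(t.b) = 0;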

Summary

The main reason for the slow execution with the isnumeric = 0 predicate is an inefficient plan, caused by early predicate pushdown, which in turn is caused by the wrong selectivity estimate. Does this error always lead to a bad plan? No: although the incorrect estimate is always present, the plan won't necessarily be built from the wrongly estimated operators. Does this estimation error tend to propagate through the plan? In simple cases it doesn't, but it may influence the other operators and the plan shape while the plan is being built, and we should keep that in mind.

Final thoughts

  • All of this was tested on 2008 R2 RTM and 2012 RTM Express.
  • You may repeat these manipulations on tables of 10,000 or 100,000 rows; the estimate does not depend on the table row count, you will still get a 1-row estimate (see the sketch after this list).
  • Everything written above is also true for isdate =)
  • As a piece of advice, I urge T-SQL developers to store data in properly typed columns: if you have a number, store it in a numeric column; if you have a date, store it as a date. This will also help you avoid a lot of headaches with many other things in the future.
  • I have one wish for the MS SQL Server dev team: please try to stay as close as possible to what is really going on, e.g. if there is a filter or a hidden compute scalar, display it. That would make plan analysis clearer and reduce the need for undocumented features.
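
For instance, a quick check of the row-count point above, reusing the setup from this post:

exec FillTestData 100000;
go
set showplan_xml on
go
-- the Filter is still estimated at 1 row, regardless of the table size
select t.b from t with(index(ix_b)) where isnumeric(t.b) = 0
option(recompile, querytraceon 9130);
go
set showplan_xml off
go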

You may vote on Connect Item.

If you want to read more about the optimizer:

Query Optimizer Deep Dive
More Undocumented Query Optimizer Trace Flags

 
