Skip to content

Query: further optimize translation of StartsWith #11881

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
maumar opened this issue May 3, 2018 · 24 comments · Fixed by #31482
Closed

Query: further optimize translation of StartsWith #11881

maumar opened this issue May 3, 2018 · 24 comments · Fixed by #31482

Comments

@maumar
Copy link
Contributor

maumar commented May 3, 2018

Currently we translate foobar.StartsWith(foo) into:

(foobar like foo +'%' and left(foobar, len(foo)) = foo) or foo = '' -- last term is removed if foo is non-null constant

however @rmacfadyen pointed out that for some scenarios the last term actually makes the query non-sargable.

If the term is removed completely, we return (arguably) incorrect data for case: null.StartsWith("").

We could however use the following translation:

(foobar like foo +'%' and left(foobar, len(foo)) = foo) or (foo = '' and foobar is null)

(and if we know that foobar can't be null we could drop the term as well.

We need to do some more investigation on which scenarios are getting better with this translation (i.e. if its worth complicating the sql)

@ajcvickers
Copy link
Contributor

Triage: Do the second translation.

@ajcvickers ajcvickers added this to the Backlog milestone May 4, 2018
@rmacfadyen
Copy link

rmacfadyen commented May 5, 2018

If the term is removed completely, we return (arguably) incorrect data for case: null.StartsWith("").

In SQL any comparison against NULL is NULL. So a NULL value can never StartsWith(..) anything (empty string, NULL, or an actual value).

In SQL LEFT(NULL, 0) returns NULL (not an empty string). Also SELECT 1 WHERE NULL LIKE '%' returns no rows.

All of which means that currently StartsWith('') is currently returning incorrect results when the column is NULL.

Edit: My thinking on this was from the standpoint of if a database engine where to create a StartsWith(Source, Expression) function how would it treat NULLs? It would return True, False and NULL, with NULL being returned if either the Source or Expression where NULL.

@divega
Copy link
Contributor

divega commented May 7, 2018

In SQL LEFT(NULL, 0) returns NULL (not an empty string). Also SELECT 1 WHERE NULL LIKE '%' returns no rows.

That is true but another point of view is that the behavior of these functions/operators is simplistic and not consistent with what SQL does with NULL in other places, e.g.:

NULL OR true --> true
NULL AND false --> false

In these cases NULL clearly means an UNKNOWN value. In the case we are discussing, the fact that you don't know what a string value is doesn't change the fact that such string will start with an empty string: all strings do. There are more ways to reason about it as well:

  1. In general translating .NET Boolean expressions that cannot be null to SQL that can be evaluated as NULL is problematic and breaks composability when used as part of a composite predicate, so we try to avoid it within reason. Of course for all other cases in which the translation of StartsWith when one or both of the arguments are null will result in NULL, but this one could be easily avoided, and as we have found so far, with the proposed translation, it does not get in the way of sargability.

  2. We are talking about scenarios in which evaluating StartWith in-memory in a regular program would throw an exception, so we really get to choose how it behaves.

  3. Changing the translation to remove this term could break any application that uses StartWith in a predicate without any additional short-circuiting logic to prevent rows with NULL values from being incorrectly filtered out.

@dmitry-lipetsk
Copy link
Contributor

dmitry-lipetsk commented Sep 19, 2018

class SqlServerStartsWithOptimizedTranslator
{
 //...
new SqlFunctionExpression("LEN", typeof(int), new[] { patternExpression }
 //...
}

If patternExpression contains 'char' or 'const string' (ConstantExpression), you may detect length locally and pass it to SQL.

@darcythomas
Copy link

My observation/ 2 cents:

For me:
[Foo] LIKE @foo + '%' AND (LEFT([Foo], LEN(@foo)) = @foo)) OR (@foo = '')
is 100X slower than [Foo] LIKE @foo + '%'

This seems to be because the the query needs to perform a index scan rather than an index seek on IX_TableName_Foo

@roji
Copy link
Member

roji commented Jun 24, 2019

Note that with #14657, when the pattern is constant we no longer translate the long/inefficient version, but escape the escape characters client-side (if needed) and simply send LIKE %pattern, which is the most efficient possible.

We could still look into improvements for non-constant patterns but I'm not really sure it's worth it.

@langdonx
Copy link

Currently..

where wo.WorkOrderNumber.EndsWith("D") == false

Translates to..

WHERE (([t].[work_order_number] IS NOT NULL AND ([t].[work_order_number] LIKE '%D')) OR (CAST(0 AS bit) = CAST(1 AS bit))) AND CASE
WHEN [t].[work_order_number] IS NOT NULL AND ([t].[work_order_number] LIKE '%D') THEN CAST(1 AS bit)
ELSE CAST(0 AS bit)

Did I do something wrong, or is that expected until this issue is resolve?

@langdonx
Copy link

langdonx commented Oct 30, 2019

Gets even crazier when two are used together:

where wo.WorkOrderNumber.EndsWith("D") == false
where wo.WorkOrderNumber.EndsWith("T") == false

Translates to..

WHERE ((([t].[work_order_number] IS NOT NULL AND ([t].[work_order_number] LIKE '%T')) OR (CAST(0 AS bit) = CAST(1 AS bit))) AND CASE
WHEN [t].[work_order_number] IS NOT NULL AND ([t].[work_order_number] LIKE '%T') THEN CAST(1 AS bit)
ELSE CAST(0 AS bit)
END IS NOT NULL) AND ((([t].[work_order_number] IS NOT NULL AND ([t].[work_order_number] LIKE '%D')) OR (CAST(0 AS bit) = CAST(1 AS bit))) AND CASE
WHEN [t].[work_order_number] IS NOT NULL AND ([t].[work_order_number] LIKE '%D') THEN CAST(1 AS bit)
ELSE CAST(0 AS bit)
END IS NOT NULL)

Which is conflicting enough to stop returning records.

I'd expect this to translate to..

WHERE [t].[work_order_number] NOT LIKE '%T' AND WHERE [t].[work_order_number] NOT LIKE '%D'

Seems like StartsWith() == false and EndsWith() == false could use some work as well.

I'm being punished for writing more legible code. 😂

where !wo.WorkOrderNumber.EndsWith("D")
where !wo.WorkOrderNumber.EndsWith("T")

Translates to..

WHERE ([t].[work_order_number] IS NOT NULL AND NOT ([t].[work_order_number] LIKE '%D')) AND ([t].[work_order_number] IS NOT NULL AND NOT ([t].[work_order_number] LIKE '%T'))

@rotem925
Copy link

rotem925 commented Nov 3, 2020

Is there going to be a fix for that? Not using LIKE for StartsWith is a killer for optimization. It seems to me like we really need to be able to use the index in large tables.

@nomz0101
Copy link

nomz0101 commented Nov 3, 2020

Hi,
I am encountering major performance issues as well using StartsWith().
It seems like there must be a better solution to be able to get the query translated using the "LIKE" method. I am running on big tables with a lot of data.

I am not able to use the EF.Functions.Like solution because I am working with Generic Repository Pattern. Is there a workaround for this?

@roji
Copy link
Member

roji commented Nov 3, 2020

@langdonx please try the latest 5.0.0-rc2 - for constant patterns (e.g. EndsWith("D")) we translate to a simple and efficient LIKE %D; same with StartsWith.

Non-constant patterns are trickier, though we have some plans for how to improve that as well (client-side parameter transformation).

@rotem925
Copy link

rotem925 commented Nov 3, 2020

@roji Thank you for the quick response!
We actually need this for non-constant array parameters(joined with a query)

Is there any workaround replacing the current StartsWith implementation with a custom one that uses Like? Till there is a fix? I mean overriding the current implementation.

The current implementation is a killer for us with a non usable product.
Using LIKE takes 30ms on a large table (since it uses the index).
Using current implementation take around 3000ms with arround 3000ms cpu time.

Thanks!

@roji
Copy link
Member

roji commented Nov 3, 2020

You can always explicitly use LIKE by using EF.Functions.Like instead of StartsWith/EndsWith. But be careful - if your the pattern your matching contains special characters (%, _), you'll get incorrect results (that's why we don't currently translate non-constant StartsWith with LIKE).

@glen-84
Copy link

glen-84 commented May 15, 2022

It's not even using LIKE for MySQL:

 WHERE (@__val_0 = '') OR (LEFT(`u`.`username`, CHAR_LENGTH(@__val_0)) = @__val_0))

Why is it doing this? I don't think that this can use an index. This is such a simple case (startsWith).

@roji
Copy link
Member

roji commented May 15, 2022

@glen-84 when the pattern to be matched is a parameter, it's not possible to use LIKE since there may be wildcards in there (%, _); that would cause incorrect results. When the pattern is a constant, EF can detect the wildcards and properly escape them.

Compare StartsWith with a constant pattern:

_ = await ctx.Blogs.Where(b => b.Name.StartsWith("foo")).ToListAsync();

... which yields the following SQL:

SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
WHERE [b].[Name] LIKE N'foo%'

... with StartsWith with a parameter pattern:

var pattern = "foo";
_ = await ctx.Blogs.Where(b => b.Name.StartsWith(pattern)).ToListAsync();

... which yields the following SQL:

SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
WHERE (@__pattern_0 = N'') OR (LEFT([b].[Name], LEN(@__pattern_0)) = @__pattern_0)

Note that the above is on SQL Server - MySQL may be doing something different.

@glen-84
Copy link

glen-84 commented May 15, 2022

I'm fairly new to EF, so forgive me if this is a silly question, but why can't EF escape the wildcards?

@roji
Copy link
Member

roji commented May 15, 2022

@glen-84 EF Core currently does not support manipulating parameter data before sending it. Note that you still have the option of using LIKE by calling EF.Functions.Like instead of String.StartsWith; if you do that, you take the responsibility in case there are wildcards characters.

@glen-84
Copy link

glen-84 commented May 15, 2022

Is there an issue to track support for this?

@roji
Copy link
Member

roji commented May 15, 2022

This is currently tracked as part of #17598 ("client-side parameter transformations"), but I've split it out to #28028 as it's different (and probably simpler to implement).

@4865783a5d
Copy link

4865783a5d commented Jun 1, 2023

@roji I've noticed that the current (EF Core 6 - with the Sql Server provider) translation provides a non-sargable query when using StartsWith.

and (@__request_Filter_0 = N'') OR (LEFT([g].[Name], LEN(@__request_Filter_0)) = @__request_Filter_0)

image

Leaving the empty-check, it can provide an optimized query plan. Wouldn't it be possible to perform the empty-check within the expression visitor?

and (LEFT([g].[Name], LEN(@__request_Filter_0)) = @__request_Filter_0)

image

@roji
Copy link
Member

roji commented Jun 1, 2023

@4865783a5d I am unable to reproduce this - both queries use an index scan for me.

Given the following database schema, seed data and queries:

DROP TABLE IF EXISTS data;
CREATE TABLE data (id INT IDENTITY PRIMARY KEY, name NVARCHAR(255));
CREATE INDEX IX_name ON data(name);

BEGIN TRANSACTION;
DECLARE @i INT = 0;
WHILE @i < 1000000
BEGIN
    INSERT INTO data (name) VALUES (CAST(@i AS NVARCHAR(MAX)));
    SET @i = @i + 1;
END;
COMMIT;

UPDATE STATISTICS data;
CHECKPOINT;

-- Queries
SET SHOWPLAN_ALL ON;
DECLARE @p NVARCHAR(255) = '10';
SELECT id FROM data WHERE LEFT(name, LEN(@p)) = @p;
SELECT id FROM data WHERE @p = N'' OR LEFT(name, LEN(@p)) = @p;
SET SHOWPLAN_ALL OFF;

I get the following plans:

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+--------------------+--------------------+---------------------------------------------------------------------------------------------------------------------------------------+------------------------+------------+----------+-----------+----------+----------------+------------------------+--------+--------+--------+------------------+
|StmtText                                                                                                                                                               |StmtId|NodeId|Parent|PhysicalOp          |LogicalOp           |Argument                                                                                                                               |DefinedValues           |EstimateRows|EstimateIO|EstimateCPU|AvgRowSize|TotalSubtreeCost|OutputList              |Warnings|Type    |Parallel|EstimateExecutions|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+--------------------+--------------------+---------------------------------------------------------------------------------------------------------------------------------------+------------------------+------------+----------+-----------+----------+----------------+------------------------+--------+--------+--------+------------------+
|SELECT id FROM data WHERE LEFT(name, LEN(@p)) = @p;                                                                                                                    |2     |2     |0     |null                |null                |2                                                                                                                                      |null                    |100000      |null      |null       |null      |4.2962093       |null                    |null    |SELECT  |false   |null              |
|  |--Parallelism(Gather Streams)                                                                                                                                       |2     |3     |2     |Parallelism         |Gather Streams      |null                                                                                                                                   |null                    |100000      |0         |0.059725   |11        |4.2962093       |[test].[dbo].[data].[id]|null    |PLAN_ROW|true    |1                 |
|       |--Clustered Index Scan(OBJECT:([test].[dbo].[data].[PK__data__3213E83F1DC1301F]), WHERE:(substring([test].[dbo].[data].[name],(1),len([@p]))=[@p]))            |2     |4     |3     |Clustered Index Scan|Clustered Index Scan|OBJECT:([test].[dbo].[data].[PK__data__3213E83F1DC1301F]), WHERE:(substring([test].[dbo].[data].[name],(1),len([@p]))=[@p])            |[test].[dbo].[data].[id]|100000      |3.923125  |0.1833595  |270       |4.1064844       |[test].[dbo].[data].[id]|null    |PLAN_ROW|true    |1                 |
|SELECT id FROM data WHERE @p = N'' OR LEFT(name, LEN(@p)) = @p;                                                                                                        |3     |5     |0     |null                |null                |3                                                                                                                                      |null                    |364382.22   |null      |null       |null      |4.412096        |null                    |null    |SELECT  |false   |null              |
|  |--Parallelism(Gather Streams)                                                                                                                                       |3     |6     |5     |Parallelism         |Gather Streams      |null                                                                                                                                   |null                    |364382.22   |0         |0.14227834 |11        |4.412096        |[test].[dbo].[data].[id]|null    |PLAN_ROW|true    |1                 |
|       |--Clustered Index Scan(OBJECT:([test].[dbo].[data].[PK__data__3213E83F1DC1301F]), WHERE:([@p]=N'' OR substring([test].[dbo].[data].[name],(1),len([@p]))=[@p]))|3     |7     |6     |Clustered Index Scan|Clustered Index Scan|OBJECT:([test].[dbo].[data].[PK__data__3213E83F1DC1301F]), WHERE:([@p]=N'' OR substring([test].[dbo].[data].[name],(1),len([@p]))=[@p])|[test].[dbo].[data].[id]|364382.22   |3.923125  |0.1833595  |270       |4.1064844       |[test].[dbo].[data].[id]|null    |PLAN_ROW|true    |1                 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+--------------------+--------------------+---------------------------------------------------------------------------------------------------------------------------------------+------------------------+------------+----------+-----------+----------+----------------+------------------------+--------+--------+--------+------------------+

Can you please post a similar minimal SQL script that shows the two different plans? It would be quite strange if just adding a single check for the parameter being null would significantly change the query plan (it's a one-time check that has nothing to do with the table data).

However, I do have an idea on making LIKE work for the parameter case - I'll take a look.

@4865783a5d
Copy link

4865783a5d commented Jun 1, 2023

@roji Apologizes for leaving an important detail out. There was another, preceeding predicate causing this behavior. I'll try to replicate it.

Edit: Can't seem to replicate it anymore - StartsWith translates into good T-SQL using proper indices.

@henning-krause
Copy link

I have the same problem on fairly large tables where the current implementation does not use the indexes on the table. Because I'm using Odata, which operates on IQueryable alone and is not bound directly to Entity Framework, I'm not able to use the EF.Functions.Like solution.

@roji
Copy link
Member

roji commented Aug 16, 2023

@henning-krause we can't do much with the above - you'll need to post the SQL which EF generates, as well as ideally the LINQ query.

roji added a commit to roji/efcore that referenced this issue Aug 16, 2023
roji added a commit to roji/efcore that referenced this issue Aug 16, 2023
roji added a commit to roji/efcore that referenced this issue Aug 16, 2023
roji added a commit to roji/efcore that referenced this issue Aug 16, 2023
@roji roji closed this as completed in a07a1bd Aug 16, 2023
roji added a commit to roji/efcore that referenced this issue Aug 17, 2023
roji added a commit that referenced this issue Aug 17, 2023
@ajcvickers ajcvickers modified the milestones: 8.0.0, 8.0.0-rc1 Aug 19, 2023
@ajcvickers ajcvickers modified the milestones: 8.0.0-rc1, 8.0.0 Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.