Query: further optimize translation of StartsWith #11881
Triage: Do the second translation. |
In SQL any comparison against NULL is NULL. So a NULL value can never StartsWith(..) anything (empty string, NULL, or an actual value). In SQL LEFT(NULL, 0) returns NULL (not an empty string). Also SELECT 1 WHERE NULL LIKE '%' returns no rows. All of which means that StartsWith('') is currently returning incorrect results when the column is NULL. Edit: My thinking on this was from the standpoint of: if a database engine were to create a StartsWith(Source, Expression) function, how would it treat NULLs? It would return True, False or NULL, with NULL being returned if either the Source or Expression were NULL. |
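To make the semantics described in this comment concrete, here is a minimal C# sketch of how such a three-valued StartsWith could behave. SqlStartsWith and PassesWhere are hypothetical names chosen for illustration, not EF Core or SQL Server functions, and bool? is only a stand-in for SQL's TRUE/FALSE/NULL.

```csharp
using System;

// Hypothetical model of a three-valued, SQL-style StartsWith (illustration only,
// not an actual database or EF Core function): null stands in for SQL NULL.
static bool? SqlStartsWith(string source, string pattern)
{
    if (source == null || pattern == null)
    {
        return null; // NULL in, NULL out
    }
    return source.StartsWith(pattern, StringComparison.Ordinal);
}

// A WHERE clause keeps a row only when the predicate is TRUE, so a NULL
// result filters the row out just like FALSE does.
static bool PassesWhere(bool? predicate) => predicate == true;

Console.WriteLine(PassesWhere(SqlStartsWith("foobar", "foo"))); // True
Console.WriteLine(PassesWhere(SqlStartsWith("foobar", "")));    // True
Console.WriteLine(PassesWhere(SqlStartsWith(null, "")));        // False
```

Under this model a row whose column is NULL is never returned, even for an empty pattern, which is the behavior this comment argues for.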
That is true but another point of view is that the behavior of these functions/operators is simplistic and not consistent with what SQL does with NULL in other places, e.g. NULL OR true --> true. In these cases NULL clearly means an UNKNOWN value. In the case we are discussing, the fact that you don't know what a string value is doesn't change the fact that such a string will start with an empty string: all strings do. There are more ways to reason about it as well:
|
class SqlServerStartsWithOptimizedTranslator
{
//...
new SqlFunctionExpression("LEN", typeof(int), new[] { patternExpression })
//...
}
If patternExpression is a 'char' or 'const string' (ConstantExpression), you can determine its length locally and pass it to SQL. |
My observation / 2 cents: For me: This seems to be because the query needs to perform an index scan rather than an index seek on |
Note that with #14657, when the pattern is constant we no longer translate the long/inefficient version, but escape the escape characters client-side (if needed) and simply send We could still look into improvements for non-constant patterns but I'm not really sure it's worth it. |
Currently..
Translates to..
Did I do something wrong, or is that expected until this issue is resolved? |
Gets even crazier when two are used together:
Translates to..
Which is conflicting enough to stop returning records. I'd expect this to translate to..
Seems like I'm being punished for writing more legible code. 😂
Translates to..
|
Is there going to be a fix for that? Not using LIKE for StartsWith is a killer for optimization. It seems to me like we really need to be able to use the index in large tables. |
Hi, I am not able to use the EF.Functions.Like solution because I am working with Generic Repository Pattern. Is there a workaround for this? |
@langdonx please try the latest 5.0.0-rc2 - for constant patterns (e.g. Non-constant patterns are trickier, though we have some plans for how to improve that as well (client-side parameter transformation). |
@roji Thank you for the quick response! Is there any workaround replacing the current StartsWith implementation with a custom one that uses Like? Till there is a fix? I mean overriding the current implementation. The current implementation is a killer for us with a non usable product. Thanks! |
You can always explicitly use LIKE by using EF.Functions.Like instead of StartsWith/EndsWith. But be careful - if the pattern you're matching contains special characters (%, _), you'll get incorrect results (that's why we don't currently translate non-constant StartsWith with LIKE). |
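As a rough sketch of what taking care of the special characters yourself could look like, the helper below escapes LIKE wildcards in a user-supplied prefix before a trailing % is appended. EscapeLikePattern is a hypothetical helper, not part of EF Core; the backslash escape character and the decision to also escape [ (a wildcard on SQL Server) are assumptions that may need adjusting for other databases.

```csharp
using System;
using System.Text;

// Hypothetical helper (not an EF Core API): prefixes every LIKE wildcard and
// the escape character itself with the escape character.
// Assumption: '[' is treated as a wildcard too, as it is on SQL Server.
static string EscapeLikePattern(string value, char escapeChar = '\\')
{
    var builder = new StringBuilder(value.Length);
    foreach (var c in value)
    {
        if (c == '%' || c == '_' || c == '[' || c == escapeChar)
        {
            builder.Append(escapeChar);
        }
        builder.Append(c);
    }
    return builder.ToString();
}

// Build a "starts with" pattern from an arbitrary prefix.
var prefix = "50%_off";
var likePattern = EscapeLikePattern(prefix) + "%";
Console.WriteLine(likePattern); // prints 50\%\_off%
```

In a query, the resulting pattern would then be passed to the EF.Functions.Like overload that takes an escape character, along the lines of EF.Functions.Like(b.Name, likePattern, "\\"), so that the database treats the escaped characters literally.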
It's not even using WHERE (@__val_0 = '') OR (LEFT(`u`.`username`, CHAR_LENGTH(@__val_0)) = @__val_0)) Why is it doing this? I don't think that this can use an index. This is such a simple case ( |
@glen-84 when the pattern to be matched is a parameter, it's not possible to use LIKE since there may be wildcards in there (
Compare StartsWith with a constant pattern:
_ = await ctx.Blogs.Where(b => b.Name.StartsWith("foo")).ToListAsync();
... which yields the following SQL:
SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
WHERE [b].[Name] LIKE N'foo%'
... with StartsWith with a parameter pattern:
var pattern = "foo";
_ = await ctx.Blogs.Where(b => b.Name.StartsWith(pattern)).ToListAsync();
... which yields the following SQL:
SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
WHERE (@__pattern_0 = N'') OR (LEFT([b].[Name], LEN(@__pattern_0)) = @__pattern_0)
Note that the above is on SQL Server - MySQL may be doing something different. |
I'm fairly new to EF, so forgive me if this is a silly question, but why can't EF escape the wildcards? |
@glen-84 EF Core currently does not support manipulating parameter data before sending it. Note that you still have the option of using LIKE by calling EF.Functions.Like instead of String.StartsWith; if you do that, you take responsibility for handling any wildcard characters. |
Is there an issue to track support for this? |
@roji I've noticed that the current translation (EF Core 6 with the SQL Server provider) produces a non-sargable query when using StartsWith:
and (@__request_Filter_0 = N'') OR (LEFT([g].[Name], LEN(@__request_Filter_0)) = @__request_Filter_0)
Leaving out the empty check, it can provide an optimized query plan. Wouldn't it be possible to perform the empty check within the expression visitor?
and (LEFT([g].[Name], LEN(@__request_Filter_0)) = @__request_Filter_0) |
@4865783a5d I am unable to reproduce this - both queries use an index scan for me. Given the following database schema, seed data and queries:
DROP TABLE IF EXISTS data;
CREATE TABLE data (id INT IDENTITY PRIMARY KEY, name NVARCHAR(255));
CREATE INDEX IX_name ON data(name);
BEGIN TRANSACTION;
DECLARE @i INT = 0;
WHILE @i < 1000000
BEGIN
INSERT INTO data (name) VALUES (CAST(@i AS NVARCHAR(MAX)));
SET @i = @i + 1;
END;
COMMIT;
UPDATE STATISTICS data;
CHECKPOINT;
-- Queries
SET SHOWPLAN_ALL ON;
DECLARE @p NVARCHAR(255) = '10';
SELECT id FROM data WHERE LEFT(name, LEN(@p)) = @p;
SELECT id FROM data WHERE @p = N'' OR LEFT(name, LEN(@p)) = @p;
SET SHOWPLAN_ALL OFF;
I get the following plans:
Can you please post a similar minimal SQL script that shows the two different plans? It would be quite strange if just adding a single check for the parameter being empty would significantly change the query plan (it's a one-time check that has nothing to do with the table data). However, I do have an idea on making LIKE work for the parameter case - I'll take a look. |
@roji Apologies for leaving an important detail out. There was another, preceding predicate causing this behavior. I'll try to replicate it. Edit: Can't seem to replicate it anymore - StartsWith translates into good T-SQL using proper indices. |
I have the same problem on fairly large tables where the current implementation does not use the indexes on the table. Because I'm using OData, which operates on IQueryable alone and is not bound directly to Entity Framework, I'm not able to use the EF.Functions.Like solution. |
@henning-krause we can't do much with the above - you'll need to post the SQL which EF generates, as well as ideally the LINQ query. |
Closes dotnet#30493 Closes dotnet#11881 Closes dotnet#26735
Closes dotnet#30493 Closes dotnet#11881 Closes dotnet#26735 (cherry picked from commit a07a1bd)
Currently we translate
foobar.StartsWith(foo)
into:
however @rmacfadyen pointed out that for some scenarios the last term actually makes the query non-sargable.
If the term is removed completely, we return (arguably) incorrect data for case: null.StartsWith("").
We could however use the following translation:
(and if we know that foobar can't be null we could drop the term as well).
We need to do some more investigation on which scenarios get better with this translation (i.e. whether it's worth complicating the SQL).