.Net: [MEVD] Sqlite filtering behavior is problematic/incorrect #11655


Open
roji opened this issue Apr 20, 2025 · 1 comment
Labels
Build (Features planned for next Build conference), msft.ext.vectordata (Related to Microsoft.Extensions.VectorData), .NET (Issue or Pull requests regarding .NET code)

Comments

Member

roji commented Apr 20, 2025

In all of our providers, the (LINQ) filter is a pre-filter, taking effect before the vector similarity search. In SQLite, however, the opposite seems to happen: if you do a filtered similarity search with top=1, it looks like this first gets the most similar record and only then applies the filter (returning nothing if that single record doesn't match the filter).

I'm not sure if this is intentional/documented, but it certainly doesn't seem very useful - the point of filtering is usually to restrict which records get considered for similarity search (e.g. within a given category, tenant...). We should check if this behavior is because of our own connector or just the way sqlite_vec works, and consider what to do based on that.
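To make the two orderings concrete, here is a minimal in-memory sketch using plain LINQ to Objects. It does not touch the SQLite connector or sqlite_vec at all; `Item`, `FilterOrderDemo`, `PreFilter`, and `PostFilter` are hypothetical names introduced only for illustration, and squared Euclidean distance is an assumed stand-in for whatever distance function the collection is configured with.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the repro's Record type; not part of any connector.
public sealed record Item(string Key, string Text, float[] Vector);

public static class FilterOrderDemo
{
    // Squared Euclidean distance (an assumption for this sketch); smaller means more similar.
    public static float Distance(float[] a, float[] b) =>
        a.Zip(b, (x, y) => (x - y) * (x - y)).Sum();

    // Pre-filter: restrict the candidate set first, then rank by distance and take the top N.
    public static List<Item> PreFilter(
        IEnumerable<Item> items, float[] query, int top, Func<Item, bool> filter) =>
        items.Where(filter)
             .OrderBy(i => Distance(i.Vector, query))
             .Take(top)
             .ToList();

    // Post-filter: rank by distance and take the top N first, then drop rows that fail the filter.
    public static List<Item> PostFilter(
        IEnumerable<Item> items, float[] query, int top, Func<Item, bool> filter) =>
        items.OrderBy(i => Distance(i.Vector, query))
             .Take(top)
             .Where(filter)
             .ToList();
}
```

With the two records from the repro, a query vector of [10, 20, 35], top=1, and the filter Text == "foo", `PreFilter` returns record "1" while `PostFilter` returns nothing: the nearest record is "2", which the filter then discards. The second outcome matches what the SQLite connector appears to be doing.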

@dmytrostruk assigning to you since I think you wrote the SQLite connector and are the most familiar with it. This technically isn't blocking for Build, but we shouldn't GA the connector before we understand what's going on (and we have plans to use SQLite as our demo/getting started connector...).

Full repro
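// Namespaces assumed for the preview packages in use when this issue was filed.
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel.Connectors.Sqlite;
using Xunit;

public class Foo(SqliteFixture fixture) : IClassFixture<SqliteFixture>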
{
    [Fact]
    public async Task SqliteIssue()
    {
        var collection = new SqliteVectorStoreRecordCollection<string, Record>("Data Source=/tmp/foo.sqlite", "foo");
        if (await collection.CollectionExistsAsync())
        {
            await collection.DeleteCollectionAsync();
        }

        await collection.CreateCollectionAsync();

        await collection.UpsertAsync(
        [
            new Record
            {
                Key = "1",
                Text = "foo",
                Vector = new ReadOnlyMemory<float>([1f, 1f, 1f])
            },
            new Record
            {
                Key = "2",
                Text = "bar",
                Vector = new ReadOnlyMemory<float>([10f, 20f, 35f])
            }
        ]);

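        // Search with record 2's exact vector, but restrict candidates to Text == "foo" (record 1 only).
        var results = await collection.VectorizedSearchAsync(
            new ReadOnlyMemory<float>([10f, 20f, 35f]),
            1,
            new VectorSearchOptions<Record>
            {
                Filter = r => r.Text == "foo",
            }).ToListAsync();

        // With pre-filtering, record 1 is the only candidate and should be returned.
        // The behavior reported in this issue is that no results come back instead.
        Assert.Single(results);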
    }

    public class Record
    {
        [VectorStoreRecordKey]
        public string Key { get; set; }

        [VectorStoreRecordData(IsIndexed = true)]
        public string Text { get; set; }

        [VectorStoreRecordVector(Dimensions: 3)]
        public ReadOnlyMemory<float>? Vector { get; set; }
    }
}
roji added the .NET and msft.ext.vectordata labels Apr 20, 2025
roji added the Build label Apr 20, 2025
roji moved this to Backlog: Planned in Semantic Kernel Apr 20, 2025
github-actions bot changed the title from "[MEVD] Sqlite filtering behavior is problematic/incorrect" to ".Net: [MEVD] Sqlite filtering behavior is problematic/incorrect" Apr 20, 2025
Member Author

roji commented Apr 21, 2025

BTW this may be a matter of simply using subqueries to ensure that the WHERE operation occurs first, and the ORDER BY later. In SQL, WHERE is supposed to be evaluated first in any case (so no subquery should be needed), but sqlite_vec seems to be doing some odd things so something else may be going on.

dmytrostruk moved this from Backlog: Planned to Sprint: Planned in Semantic Kernel Apr 22, 2025