@Hugoberry
Last active March 6, 2024 11:03
Bare-bones implementation of similarity search in DuckDB using the Expression API
import duckdb
from duckdb import (
    ColumnExpression,
    ConstantExpression,
    FunctionExpression,
    StarExpression,
)
from openai import OpenAI

# Open the DuckDB database file that contains the embeddings table.
conn = duckdb.connect(database='embed.ddb')

# Embed the query text. OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
text = "Wooden floor"
client = OpenAI()
embedding = client.embeddings.create(input=[text], model="text-embedding-3-large").data[0].embedding

# Expression: cosine similarity between the 'embedding' column and the query vector.
rms = conn.table("rms_embeddings_01")
list_cosine_similarity = FunctionExpression(
    'list_cosine_similarity',
    ColumnExpression('embedding'),
    ConstantExpression(embedding),
)

# Score every row, keep the 10 most similar, then drop the helper columns from the output.
top10 = (
    rms
    .select(StarExpression(), list_cosine_similarity.alias("similarity"))
    .order("similarity desc")
    .limit(10)
    .select(StarExpression(exclude=["similarity", "embedding"]))
    .fetchdf()
)
conn.close()
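For reference, here is a minimal pure-Python sketch of what the query above computes: a cosine similarity score per row, sorted descending and cut to the top k. It is not how DuckDB implements `list_cosine_similarity`. The helper names `cosine_similarity` and `top_k` and the toy three-dimensional vectors are hypothetical, made up for illustration only. The sketch does not handle zero-length vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=10):
    """Return the k (name, similarity) pairs most similar to query, best first."""
    scored = [(name, cosine_similarity(vec, query)) for name, vec in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy data standing in for the rows of an embeddings table.
rows = [
    ("oak flooring", [1.0, 0.0, 0.0]),
    ("pine decking", [0.8, 0.6, 0.0]),
    ("steel beam", [0.0, 0.0, 1.0]),
]
query = [1.0, 0.0, 0.0]

ranking = top_k(query, rows, k=2)
print(ranking)
```

Running the ranking inside DuckDB, as the script does, avoids pulling every embedding into Python; the sketch is only meant to make the scoring and ordering explicit.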