Embedding

class experimental.embedding.Embedding

Bases: object

Embeddings provide a compact and meaningful representation of objects in a numerical vector space. They capture the semantic relationships between objects.

This class enables users to search unstructured data based on semantic similarity and to leverage the power of the vector index scan.

create_index(column, model_name, embedding_dimension=None, method='hnsw')

Generate embeddings and create an index for a column of unstructured data.

This includes

texts,
images, or
videos, etc.

This enables searching unstructured data based on semantic similarity, that is, whether the items mean or contain similar things.

For better efficiency, the generated embeddings are stored in a column-oriented fashion, i.e., separately from the input DataFrame. The input DataFrame must have a unique key so that the tuples in the search results can be identified.
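The storage layout described above can be illustrated with a minimal, database-free sketch. Everything in it is an assumption made for illustration: the `rows` list stands in for the input DataFrame, `id` is a hypothetical unique key, and `toy_embed` is a placeholder rather than a real embedding model. It is not how this class is implemented.

```python
# Toy stand-in for the input DataFrame: each row carries a unique key "id".
rows = [
    {"id": 1, "content": "I like apples."},
    {"id": 2, "content": "The stock market fell today."},
]


def toy_embed(text: str) -> list[float]:
    """Hypothetical placeholder for a real embedding model."""
    return [float(len(text)), float(text.count(" "))]


# Embeddings live in their own structure, keyed by the unique key,
# instead of being added as an extra field on each input row.
embeddings = {row["id"]: toy_embed(row["content"]) for row in rows}


def lookup(key: int) -> dict:
    """Map a key found in the embedding store back to the full input row."""
    return next(row for row in rows if row["id"] == key)
```

Because the embeddings are kept apart from the input rows, the unique key is the only link between a search hit and the original tuple, which is why the key is required.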

Parameters

column (str) – name of column to create index on.
model_name (str) – name of model to generate embedding.
embedding_dimension (Optional[int]) – dimension of the embedding.
method (Optional[Literal['ivfflat', 'hnsw']]) – name of the index access method (i.e. index type) in pgvector.
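To make the `method` parameter more concrete, the sketch below builds the kind of pgvector `CREATE INDEX` statement that each access method corresponds to. The helper `build_index_sql`, the table and column names, and the choice of the `vector_cosine_ops` operator class are assumptions for illustration; the statement this class actually issues may differ.

```python
from typing import Literal


def build_index_sql(
    table: str,
    column: str,
    method: Literal["ivfflat", "hnsw"] = "hnsw",
) -> str:
    """Build a pgvector-style CREATE INDEX statement (illustrative only)."""
    # Only the two access methods named in the parameter list are accepted.
    if method not in ("ivfflat", "hnsw"):
        raise ValueError(f"unsupported index method: {method!r}")
    # The operator class is an assumption; pgvector also offers others.
    return f'CREATE INDEX ON "{table}" USING {method} ("{column}" vector_cosine_ops)'


print(build_index_sql("documents", "embedding"))
print(build_index_sql("documents", "embedding", method="ivfflat"))
```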

Returns

DataFrame with the target column indexed based on its embeddings.

Return type

DataFrame

Example

Please refer to Generating, Indexing and Searching Embeddings (Experimental) for more details.

search(column, query, top_k)

Search unstructured data based on the semantic similarity of embeddings.
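Conceptually, a top-k semantic search ranks stored embeddings by their similarity to the query embedding and returns the best `top_k` matches. The sketch below shows that idea in plain Python with cosine similarity and a brute-force scan. The vectors, the `store` dictionary, and the function names are hypothetical, and the choice of cosine similarity is an assumption; the actual method relies on the vector index in the database rather than on code like this.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k_search(store: dict[str, list[float]], query_vec: list[float], top_k: int) -> list[str]:
    """Return the keys of the top_k stored vectors most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda key: cosine_similarity(store[key], query_vec),
        reverse=True,
    )
    return ranked[:top_k]


# Hand-made 2-D "embeddings": the first axis loosely means "fruit",
# the second "vehicle". Real embeddings have many more dimensions.
store = {
    "apple": [0.9, 0.1],
    "banana": [0.8, 0.3],
    "car": [0.1, 0.95],
}

print(top_k_search(store, [1.0, 0.0], top_k=2))
```

A brute-force scan like this compares the query against every stored vector; an approximate index such as HNSW or IVFFlat exists to avoid that cost on large tables.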