Embedding

class experimental.embedding.Embedding

Bases: object

Embeddings provide a compact and meaningful representation of objects in a numerical vector space. They capture the semantic relationships between objects.

This class enables users to search unstructured data based on semantic similarity and to take advantage of vector index scans.

create_index(column, model_name, embedding_dimension=None, method='hnsw')

Generate embeddings and create an index for a column of unstructured data.

This includes

  • text,

  • images, or

  • videos, etc.

This enables searching unstructured data based on semantic similarity, that is, whether items mean or contain similar things.

For better efficiency, the generated embeddings are stored in a column-oriented fashion, i.e., separately from the input DataFrame. The input DataFrame must have a unique key to identify the tuples in the search results.
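The separation between the input data and its embeddings can be pictured with a minimal in-memory sketch. Everything in it is an illustrative assumption: rows, toy_embed, embedding_store and lookup_row are hypothetical names, toy_embed is only a stand-in for a real model, and none of this reflects how the library actually stores embeddings. It only shows why a unique key is needed to map a search hit back to its tuple.

```python
# Hypothetical input data: each row carries a unique key ("id").
rows = [
    {"id": 1, "content": "apple pie recipe"},
    {"id": 2, "content": "train timetable"},
]


def toy_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model (assumption): length and vowel count."""
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text))]


# Embeddings live in a separate structure, keyed by the unique id,
# instead of being added as another field of the input rows.
embedding_store = {row["id"]: toy_embed(row["content"]) for row in rows}


def lookup_row(key: int) -> dict:
    """Map a key found in the embedding store back to its original row."""
    return next(row for row in rows if row["id"] == key)
```

Without the unique key, a match found in embedding_store could not be traced back to the row it came from.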

Parameters
  • column (str) – name of the column to create the index on.

  • model_name (str) – name of the model used to generate the embeddings.

  • embedding_dimension (Optional[int]) – dimension of the embedding.

  • method (Optional[Literal['ivfflat', 'hnsw']]) – name of the index access method (i.e. index type) in pgvector.

Returns

DataFrame with the target column indexed based on embeddings.

Return type

DataFrame

Example

Please refer to Generating, Indexing and Searching Embeddings (Experimental) for more details.

search(column, query, top_k)

Search unstructured data based on semantic similarity of embeddings.

Parameters
  • column (str) – name of column to search

  • query (Any) – content to be searched

  • top_k (int) – number of most similar results requested

Returns

DataFrame with the top_k rows whose values in column are most similar to query.

Return type

DataFrame

Example

Please refer to Generating, Indexing and Searching Embeddings (Experimental) for more details.
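As a conceptual illustration of what a top_k similarity search returns, the sketch below ranks toy vectors by cosine similarity to a query vector. It is a plain exhaustive scan, not an HNSW or IVFFlat index scan, and the choice of cosine similarity is an assumption rather than a statement about the metric this class uses. The names cosine_similarity, top_k_search and toy_embeddings are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k_search(
    embeddings: dict[int, list[float]], query_vec: list[float], top_k: int
) -> list[int]:
    """Return the keys of the top_k embeddings most similar to query_vec."""
    ranked = sorted(
        embeddings,
        key=lambda key: cosine_similarity(embeddings[key], query_vec),
        reverse=True,
    )
    return ranked[:top_k]


# Toy embeddings keyed by a unique row id (illustrative values only).
toy_embeddings = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.9, 0.1]}
nearest = top_k_search(toy_embeddings, query_vec=[1.0, 0.0], top_k=2)
```

Here nearest holds the keys of the two closest vectors, which would then be used to fetch the matching rows. A vector index answers the same kind of question without comparing the query against every stored vector.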