Embedding
- class experimental.embedding.Embedding
Bases: object
Embeddings provide a compact and meaningful representation of objects in a numerical vector space. They capture the semantic relationships between objects, so that objects with similar meanings are mapped to nearby vectors.
This class enables users to search unstructured data based on semantic similarity and to speed up such searches with vector index scans.
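"Semantic similarity" between two embeddings is typically measured with a vector similarity function. The sketch below uses cosine similarity, one common choice. This section does not say which measure this class uses, so treat it as an assumption. For reference, pgvector exposes cosine distance (1 minus cosine similarity) through its `<=>` operator. The 3-dimensional vectors are made up for illustration; real models produce much longer vectors.

```python
import math

# Made-up 3-dimensional "embeddings" for illustration only.
embeddings = {
    "apple": [0.9, 0.1, 0.0],
    "banana": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Related items ("apple", "banana") score higher than unrelated ones ("apple", "car").
print(cosine_similarity(embeddings["apple"], embeddings["banana"]))
print(cosine_similarity(embeddings["apple"], embeddings["car"]))
```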
- create_index(column, model_name, embedding_dimension=None, method='hnsw')
Generate embeddings and create an index for a column of unstructured data, such as texts, images, or videos.
This enables searching unstructured data based on semantic similarity, that is, whether the items mean or contain similar things.
For better efficiency, the generated embeddings are stored in a column-oriented manner, i.e., separately from the input DataFrame. The input DataFrame must therefore have a unique key so that the rows in the search results can be identified.
- Parameters
column (str) – name of the column to create the index on.
model_name (str) – name of the model used to generate the embeddings.
embedding_dimension (Optional[int]) – dimension of the embedding.
method (Optional[Literal['ivfflat', 'hnsw']]) – name of the index access method (i.e. index type) in pgvector.
- Returns
DataFrame whose target column is indexed based on its embeddings.
- Return type
DataFrame
Example
Please refer to Generating, Indexing and Searching Embeddings (Experimental) for more details.
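The sketch below illustrates the column-oriented storage described for `create_index`. The embeddings are kept in a separate table keyed by the input's unique key, and the original rows are left untouched. It is a conceptual, in-memory illustration only, not how this class stores data in the database. `build_embedding_table` and `toy_embed` are hypothetical names. `toy_embed` (vowel counts) is a stand-in for a real embedding model and captures no meaning.

```python
from collections.abc import Callable, Hashable

Vector = list[float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a real model: counts of each vowel in the text."""
    return [float(text.lower().count(v)) for v in "aeiou"]


def build_embedding_table(
    rows: list[dict], key: str, column: str, embed: Callable[[str], Vector]
) -> dict[Hashable, Vector]:
    """Embed rows[column] and store the vectors apart from the rows, keyed by rows[key]."""
    table: dict[Hashable, Vector] = {}
    for row in rows:
        k = row[key]
        if k in table:
            # Without a unique key, a search hit could not be mapped back to a single row.
            raise ValueError(f"duplicate key {k!r} in column {key!r}")
        table[k] = embed(row[column])
    return table


rows = [
    {"id": 1, "content": "I like apples"},
    {"id": 2, "content": "Cars need fuel"},
]
embeddings = build_embedding_table(rows, key="id", column="content", embed=toy_embed)

# The input rows are unchanged; results are joined back to them through "id".
rows_by_id = {row["id"]: row for row in rows}
print(rows_by_id[2]["content"], embeddings[2])
```

Keeping the vectors out of the input rows means the source data is not widened by a large vector column. The price is that every result must be joined back through the key, which is why uniqueness is enforced.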
- search(column, query, top_k)
Search unstructured data based on semantic similarity of their embeddings.
- Parameters
column (str) – name of the column to search.
query (Any) – content to be searched for.
top_k (int) – number of most similar results requested.
- Returns
DataFrame containing the top_k rows whose values in column are most similar to query.
- Return type
DataFrame
Example
Please refer to Generating, Indexing and Searching Embeddings (Experimental) for more details.
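Conceptually, `search` returns the `top_k` stored items closest to the query. The sketch below performs an exact, brute-force version that scores every stored vector. It assumes cosine similarity as the measure and assumes the query has already been embedded with the same model as the column. The real method accepts raw content as `query`. Index types such as pgvector's `hnsw` and `ivfflat` exist to avoid this full scan by returning approximate nearest neighbours. The function name `search` and the vectors here are illustrative only.

```python
import heapq
import math

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(embeddings: dict[int, Vector], query: Vector, top_k: int) -> list[tuple[int, float]]:
    """Exact top-k search: score every stored vector, keep the top_k highest (key, score) pairs."""
    scored = ((key, cosine_similarity(vec, query)) for key, vec in embeddings.items())
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


# Made-up embeddings keyed by a unique row id.
embeddings = {
    1: [0.9, 0.1, 0.0],  # e.g. "apple"
    2: [0.8, 0.2, 0.1],  # e.g. "banana"
    3: [0.0, 0.1, 0.9],  # e.g. "car"
}
query = [1.0, 0.0, 0.0]  # assumed embedding of a query such as "fruit"
print(search(embeddings, query, top_k=2))
```

The returned keys would then be joined back to the input rows to produce the result DataFrame.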