Database
Manage connection to Greenplum/PostgreSQL database.
- class db.Database
Bases:
object
Representation of a database in which data is located and computation is performed.
Each
Database
object is tied to a connection to the remote database system.- close()
Close the database connection.
- Return type
None
- create_dataframe(table_name=None, schema=None, rows=None, columns=None, column_names=None, files=None, parser=None)
Create a
DataFrame
from a database table, or a set of data.- Parameters
table_name (Optional[str]) – str: name of table in Database
schema (Optional[str]) – str: name of schema in Database
rows (Optional[List[Union[Tuple[Any, ...], Dict[str, Any]]]]) – List[Union[Tuple[Any, …], Dict[str, Any]]]: a List of rows
columns (Optional[Dict[str, Iterable[Any]]]) – Dict[str, List[Any]]: a dict of columns
column_names (Optional[Iterable[str]]) – Iterable[str]: List of given column names
files (Optional[List[str]]) –
parser (Optional[NormalFunction]) –
Example
To create
DataFrame
from a database table:>>> cursor.execute("DROP TABLE IF EXISTS one_column_table") >>> cursor.execute( ... "CREATE TABLE one_column_table AS SELECT 42 as id;") >>> df_from_table = db.create_dataframe(table_name="one_column_table") >>> df_from_table ---- id ---- 42 ---- (1 row)
To create
DataFrame
from a predefined set of data:>>> rows = [(1,), (2,), (3,)] >>> df_from_rows = db.create_dataframe(rows=rows, column_names=["id"]) >>> df_from_rows ---- id ---- 1 2 3 ---- (3 rows) >>> columns = {"a": [1, 2, 3], "b": [1, 2, 3]} >>> t_from_columns = db.create_dataframe(columns=columns) >>> t_from_columns ------- a | b ---+--- 1 | 1 2 | 2 3 | 3 ------- (3 rows)
- apply(func, expand=False, column_name=None)
Apply a function in database without depending on a
DataFrame
.This is primarily for applying functions on adaptable Python objects as constants in database.
The arguments and return type is similar to
apply()
except that parameterfunc
takes no argument.Example
>>> @gp.create_function ... def add(a: int, b: int) -> int: ... return a + b >>> db.apply(lambda: add(1, 2), column_name="sum") ----- sum ----- 3 ----- (1 row)
- Parameters
func (Callable[[], FunctionExpr]) –
expand (bool) –
column_name (Optional[str]) –
- Return type
- assign(**new_columns)
Assign new columns by calling functions in database.
- Parameters
new_columns (Callable[[], Any]) – a
dict
whose keys are column names and values areCallable
returning column data when applied to constant value in database.- Returns
GreenplumPython DataFrame resulted with assigned columns
- Return type
Example
>>> abs = gp.function("abs") >>> db.assign(abs=lambda: abs(-42)) ----- abs ----- 42 ----- (1 row)
- install_packages(requirements)
Install the required Python packages on the server host.
The packages will be installed to the currently activated Python environment.
- Parameters
requirements (str) – specification of what packages are required to be installed. The format is the same as the rquirements file used by
pip
. It can be obtained by reading an existing requirements file in text mode.- Return type
None
Example
See Installing Python Packages on Server without Internet (Experimental) for more details.
Note
This function only installs packages on the server host that GreenplumPython directly connects to. If your database server spreads across multiple hosts, additional operations are required to make the packages available on all hosts.
One simple way to achieve this is to setup an NFS share on all hosts. Please refer to Installing Python Packages on Server without Internet (Experimental) for a simple working example.
Warning
This function is currently experimental and the interface is subject to change.
- db.database(uri=None, params={})
Open a connection to database with connection URI or parameters.
- Parameters
uri (Optional[str]) – connection URI to the database. Please refer to the libpq documentation on connection URI for detailed usage.
params (Dict[str, Optional[str]]) – connection parameters to the database. Please refer to the libpq documentation on parameter keywords for detailed usage. The parameter will be ignored and will not be passed to the remote database server if its value is
None
.
- Return type