Greenplum Database Configuration and Maintenance
You must configure Greenplum Database client host access and role privileges and attributes before using the Greenplum-Spark Connector to transfer data between your Greenplum Database and Spark clusters.
Once you start running Spark applications that use the Greenplum-Spark Connector, you may be required to perform certain Greenplum Database maintenance tasks.
These Greenplum Database configuration and maintenance tasks, described below, must be performed by a Greenplum user with administrative privileges.
You must explicitly configure Greenplum Database to permit access from all Spark nodes and stand-alone clients. Configure access for each Spark node, Greenplum database, and Greenplum Database role combination in the pg_hba.conf file on the master node.
Refer to Configuring Client Authentication in the Greenplum Database documentation for detailed information on configuring pg_hba.conf.
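As a rough illustration of what "each Spark node, database, and role combination" amounts to, the Python sketch below generates one pg_hba.conf host entry per combination. The addresses, database name, role name, and the md5 authentication method are hypothetical values chosen for the example, not values prescribed by this guide; substitute the ones that apply to your clusters.

```python
from itertools import product


def hba_entries(spark_addresses, databases, roles, auth_method="md5"):
    """Return one pg_hba.conf 'host' line per (database, role, address) combination.

    Line layout: host <database> <role> <address> <auth-method>
    """
    return [
        f"host  {db}  {role}  {addr}  {auth_method}"
        for db, role, addr in product(databases, roles, spark_addresses)
    ]


if __name__ == "__main__":
    # Hypothetical Spark node addresses, database, and role (assumptions).
    for line in hba_entries(
        spark_addresses=["10.0.0.11/32", "10.0.0.12/32"],
        databases=["testdb"],
        roles=["spark_user"],
    ):
        print(line)
```

The generated lines would be appended to pg_hba.conf on the master node. Changes to that file take effect only after the Greenplum Database configuration is reloaded (for example, with gpstop -u).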
The Greenplum-Spark Connector uses JDBC to communicate with the Greenplum Database master node. The Greenplum user/role name that you provide when you use the Greenplum-Spark Connector to transfer data between Greenplum Database and Spark must have certain privileges assigned by the administrator:
The user/role must have USAGE and CREATE privileges on each non-public database schema in which a table to be transferred resides:
<db-name>=# GRANT USAGE, CREATE ON SCHEMA <schema_name> TO <user_name>;
The user/role must have the SELECT privilege on every Greenplum Database table that the user will read into Spark:
<db-name>=# GRANT SELECT ON <schema_name>.<table_name> TO <user_name>;
The user/role must have permission to create writable external tables using the Greenplum Database gpfdist protocol:
<db-name>=# ALTER USER <user_name> CREATEEXTTABLE(type = 'writable', protocol = 'gpfdist');
See the Greenplum Database Managing Roles and Privileges documentation for further information on assigning privileges to Greenplum Database users.
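When a role needs access to several tables, it can help to generate the privilege statements above rather than type them by hand. The following Python sketch does only that: it builds the same GRANT and ALTER USER statements as text for an administrator to review and run in psql. The function names and the sample role, schema, and table names are hypothetical, and the helper deliberately accepts only simple unquoted identifiers.

```python
import re

# Only plain, unquoted identifiers are accepted (an assumption of this sketch).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name):
    """Reject anything that is not a simple SQL identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"not a simple SQL identifier: {name!r}")
    return name


def privilege_statements(user, schema, tables):
    """Build the statements described above for one role and one schema."""
    user, schema = _check(user), _check(schema)
    stmts = []
    # USAGE and CREATE are required only on non-public schemas.
    if schema != "public":
        stmts.append(f"GRANT USAGE, CREATE ON SCHEMA {schema} TO {user};")
    # SELECT is required on every table that will be read into Spark.
    for table in tables:
        stmts.append(f"GRANT SELECT ON {schema}.{_check(table)} TO {user};")
    # Permission to create writable external tables with the gpfdist protocol.
    stmts.append(
        f"ALTER USER {user} CREATEEXTTABLE(type = 'writable', protocol = 'gpfdist');"
    )
    return stmts


if __name__ == "__main__":
    # Hypothetical role, schema, and table names.
    for stmt in privilege_statements("spark_user", "faa", ["otp_c", "airlines"]):
        print(stmt)
```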
The Greenplum-Spark Connector uses Greenplum Database external tables to load Greenplum data into Spark. Maintenance tasks related to these external tables may include:
- Periodically checking the status of your Greenplum Database catalogs for bloat, and running VACUUM on the catalog as appropriate. Refer to the Greenplum Database System Catalog Maintenance and VACUUM documentation for further information.
- Manually removing Greenplum-Spark Connector-created external tables when your Spark cluster shuts down abnormally. Refer to Cleaning Up Orphaned Greenplum External Tables for details related to this procedure.
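For the manual cleanup task, one possible aid is a small script that turns a list of already-identified orphaned external tables into DROP EXTERNAL TABLE statements. The Python sketch below assumes that you have determined which tables are orphaned by following the referenced procedure; it does not identify them itself, and the schema and table names shown are hypothetical.

```python
def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def drop_statements(external_tables):
    """Return one DROP EXTERNAL TABLE statement per (schema, table) pair."""
    return [
        f"DROP EXTERNAL TABLE IF EXISTS {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table in external_tables
    ]


if __name__ == "__main__":
    # Hypothetical orphaned external tables, identified beforehand.
    orphaned = [("faa", "ext_table_1"), ("faa", "ext_table_2")]
    for stmt in drop_statements(orphaned):
        print(stmt)
```

Reviewing the generated statements before running them in psql guards against dropping an external table that a running Spark application still uses.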