Pivotal Greenplum-Spark Connector 1.7.0 Release Notes
The Pivotal Greenplum-Spark Connector supports high speed, parallel data transfer between Greenplum Database and an Apache Spark cluster using:
- Spark's Scala API for programmatic access
Refer to the Pivotal Greenplum Database documentation for detailed information about Pivotal Greenplum Database.
See the Apache Spark documentation for information about Apache Spark version 2.3.1.
The following table identifies the supported component versions for the Pivotal Greenplum-Spark Connector:
| Greenplum-Spark Connector Version | Greenplum Version | Spark Version | Scala Version | PostgreSQL JDBC Driver Version |
|---|---|---|---|---|
| 1.7.0 | 4.3.x, 5.x, 6.x | 2.3.1 and above | 2.11 | 9.4.1209 |
| 1.6.2, 1.6.1 | 4.3.x, 5.x, <=6.7 | 2.3.1 and above | 2.11 | 9.4.1209 |
| 1.6.0, 1.5.0 | 4.3.x, 5.x | 2.1.2 and above | 2.11 | 9.4.1209 |
| 1.4.0, 1.3.0, 1.2.0, 1.1.0, 1.0.0 | 4.3.x, 5.x | 2.1.1 | 2.11 | 9.4.1209 |
The Greenplum-Spark Connector is bundled with, and certified against, the PostgreSQL JDBC driver versions listed above.
Released: July 9, 2020
Greenplum-Spark Connector 1.7.0 includes new and changed features and bug fixes.
Pivotal Greenplum-Spark Connector 1.7.0 includes the following new and changed features:
Support for Range of Port Numbers
Developers can now specify one or more lists or ranges of port numbers in the Greenplum-Spark Connector's port configuration option.
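A minimal sketch of how such a configuration might look. The `server.port` option name, the `greenplum` data source format name, and the other option names and values here are assumptions drawn from the Connector documentation; adjust them to your environment:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gpdb-read").getOrCreate()

// Read a Greenplum Database table, restricting the Connector's
// data-transfer service to a list plus a range of ports
// (assumed 1.7.0 syntax for the server.port option).
val df = spark.read
  .format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
  .option("user", "gpadmin")
  .option("password", "changeme")
  .option("dbschema", "public")
  .option("dbtable", "orders")
  .option("server.port", "12900,12901,13000-13010")
  .load()
```

This is useful in environments where firewalls permit only a known set of ports between the Spark workers and the Greenplum segment hosts.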
Mixed-Case Column Names
The Greenplum-Spark Connector supports reading from, and writing to, Greenplum Database tables that you create with mixed-case column names.
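A hedged sketch of reading such a table. The table and column names are illustrative, and the `greenplum` format and option names are assumptions based on the Connector documentation:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gpdb-mixed-case").getOrCreate()

// Read a Greenplum table created with quoted, mixed-case identifiers,
// e.g.: CREATE TABLE sales ("orderID" int, "customerName" text);
val sales = spark.read
  .format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
  .option("user", "gpadmin")
  .option("password", "changeme")
  .option("dbschema", "public")
  .option("dbtable", "sales")
  .load()

// The mixed-case column names carry through to the Spark schema.
sales.select("orderID", "customerName").show()
```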
New distributedBy Write Option
The Greenplum-Spark Connector exposes the new `distributedBy` write option, which a developer can use to specify the distribution column(s) for a Greenplum Database table that the Connector creates or re-creates on their behalf.
New Default Distribution Policy for Connector-Created Greenplum Tables
The Greenplum-Spark Connector now specifies random distribution by default for tables that it creates or re-creates. In previous releases, the Connector did not specify a distribution column. You can provide the `distributedBy` option, mentioned above, to explicitly set the table distribution column(s).
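The write path described above might look like the following sketch. The `greenplum` format and the connection option names are assumptions based on the Connector documentation, and the column names are illustrative:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("gpdb-write").getOrCreate()
val df = spark.read.parquet("/data/orders")

// Write to Greenplum; if the Connector creates (or re-creates) the
// target table, distribute it by the listed column(s) rather than
// applying the new random-distribution default.
df.write
  .format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
  .option("user", "gpadmin")
  .option("password", "changeme")
  .option("dbschema", "public")
  .option("dbtable", "orders")
  .option("distributedBy", "order_id") // comma-separate multiple columns
  .mode(SaveMode.Overwrite)
  .save()
```

Choosing an explicit distribution column that matches common join keys can reduce data motion inside Greenplum, whereas the random default simply guarantees even data spread across segments.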
The following issues were resolved in Greenplum-Spark Connector version 1.7.0:
| Issue Number | Description |
|---|---|
| 173608876 | Resolved an issue where the Greenplum-Spark Connector failed to read data from, or write data to, Greenplum Database version 6.7.1+ due to a change in how Greenplum handles distributed transaction IDs. |
| 30732 | There was no way to specify a distribution column for a Greenplum table that was created or re-created by the Greenplum-Spark Connector on the developer's behalf. This issue is resolved; the Connector now exposes the `distributedBy` write option. |
| 30544 | Resolved an issue where the Greenplum-Spark Connector failed to correctly read from a Greenplum Database table that was created with mixed-case column names. |
| 30461 | The Greenplum-Spark Connector did not support specifying more than one port number in its port configuration option. This issue is resolved; the Connector now accepts one or more lists or ranges of port numbers. |
Known issues and limitations related to the 1.7.0 release of the Pivotal Greenplum-Spark Connector include the following:
- The Greenplum-Spark Connector supports basic data types such as Float, Integer, String, and Date/Time. The Connector does not yet support more complex types. See Greenplum Database <-> Spark Data Type Mapping for additional information.