Pivotal Greenplum-Spark Connector 1.7.0 Release Notes

The Pivotal Greenplum-Spark Connector supports high-speed, parallel data transfer between Greenplum Database and an Apache Spark cluster using:

  • Spark’s Scala API - programmatic access (including the spark-shell REPL)
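For orientation, the following minimal sketch shows the Connector in use from the spark-shell REPL. All connection values (host, database, table, credentials) and the JAR path are placeholders; the option names (url, user, password, dbschema, dbtable, partitionColumn) follow the Connector documentation, and spark refers to the SparkSession that spark-shell provides.

    // Launch the REPL with the Connector JAR on the driver classpath:
    //   spark-shell --jars /path/to/greenplum-spark_2.11-1.7.0.jar

    // Minimal read of a hypothetical Greenplum table "otp" in database "testdb":
    val gpdf = spark.read.format("greenplum")
      .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
      .option("user", "gpadmin")
      .option("password", "changeme")
      .option("dbschema", "public")        // schema containing the table
      .option("dbtable", "otp")            // table to read
      .option("partitionColumn", "id")     // integral column used to parallelize the read
      .load()

    gpdf.count()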

Refer to the Pivotal Greenplum Database documentation for detailed information about Pivotal Greenplum Database.

See the Apache Spark documentation for information about Apache Spark version 2.3.1.

Supported Platforms

The following table identifies the supported component versions for the Pivotal Greenplum-Spark Connector:

| Greenplum-Spark Connector Version | Greenplum Version | Spark Version | Scala Version | PostgreSQL JDBC Driver Version |
| --------------------------------- | ------------------ | --------------- | ------------- | ------------------------------ |
| 1.7.0 | 4.3.x, 5.x, 6.x | 2.3.1 and above | 2.11 | 9.4.1209 |
| 1.6.2, 1.6.1 | 4.3.x, 5.x, <=6.7 | 2.3.1 and above | 2.11 | 9.4.1209 |
| 1.6.0, 1.5.0 | 4.3.x, 5.x | 2.1.2 and above | 2.11 | 9.4.1209 |
| 1.4.0, 1.3.0, 1.2.0, 1.1.0, 1.0.0 | 4.3.x, 5.x | 2.1.1 | 2.11 | 9.4.1209 |

The Greenplum-Spark Connector is bundled with, and certified against, the PostgreSQL JDBC driver versions listed above.

Greenplum-Spark Connector 1.7.0

Released: July 9, 2020

Greenplum-Spark Connector 1.7.0 includes new and changed features and bug fixes.

New and Changed Features

Pivotal Greenplum-Spark Connector 1.7.0 includes the following new and changed features:

  • Support for Range of Port Numbers

    You can now specify one or more port numbers, port number ranges, or a combination of both in the Greenplum-Spark Connector server.port option (see the sketch following this list).

  • Mixed-Case Column Names

    The Greenplum-Spark Connector supports reading from and writing to Greenplum Database tables that you create with mixed-case column names.

  • distributedBy Option

    The Greenplum-Spark Connector exposes the new distributedBy write option, which a developer can use to specify one or more distribution columns for a Greenplum Database table that the Connector creates or re-creates on their behalf (see the sketch following this list).

  • New Default Distribution Policy for Connector-Created Greenplum Tables

    The Greenplum-Spark Connector now specifies random distribution by default for tables that it creates or re-creates. In previous releases, the Connector did not specify a distribution column. You can provide the distributedBy option, mentioned above, to explicitly set the table's distribution column(s).
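The following sketch pulls the new options together. The connection values and column names are placeholders, and the exact server.port list/range syntax and the comma-separated distributedBy value shown here are assumptions based on the feature descriptions above; consult the Connector documentation for the authoritative syntax.

    import org.apache.spark.sql.SaveMode

    // Read, constraining the Connector's data-transfer service to a port range
    // (a comma-separated mix such as "12900,12902,12910-12920" is assumed to work too):
    val df = spark.read.format("greenplum")
      .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
      .option("user", "gpadmin")
      .option("password", "changeme")
      .option("dbtable", "otp")
      .option("partitionColumn", "id")
      .option("server.port", "12900-12910")
      .load()

    // Write to a table the Connector (re-)creates, naming the distribution columns.
    // Omit distributedBy and 1.7.0 creates the table with random distribution.
    df.write.format("greenplum")
      .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
      .option("user", "gpadmin")
      .option("password", "changeme")
      .option("dbtable", "otp_copy")
      .option("distributedBy", "origin, flightnum")  // hypothetical column names
      .mode(SaveMode.Overwrite)
      .save()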

Resolved Issues

The following issues were resolved in Greenplum-Spark Connector version 1.7.0:

| Bug Id | Summary |
| ------ | ------- |
| 173608876 | Resolved an issue where the Greenplum-Spark Connector failed to read data from, or write data to, Greenplum Database version 6.7.1+ due to a change in how Greenplum handles distributed transaction IDs. |
| 30732 | There was no way to specify a distribution column for a Greenplum table that was created or re-created by the Greenplum-Spark Connector on the developer’s behalf. This issue is resolved; the Connector now exposes the distributedBy write option for this purpose. |
| 30544 | Resolved an issue where the Greenplum-Spark Connector failed to correctly read from a Greenplum Database table that was created with mixed-case column names. |
| 30461 | The Greenplum-Spark Connector did not support more than one port number in server.port. This issue is resolved; the Connector now allows you to set one or more port numbers or port number ranges in the server.port option. |

Known Issues and Limitations

Known issues and limitations related to the 1.7.0 release of the Pivotal Greenplum-Spark Connector include the following:

  • The Greenplum-Spark Connector supports basic data types such as Float, Integer, String, and Date/Time. The Connector does not yet support more complex types. See Greenplum Database <-> Spark Data Type Mapping for additional information.
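When in doubt about a particular table, a quick way to see which Spark type the Connector assigned to each column is to inspect the schema of a loaded DataFrame (gpdf here is the hypothetical DataFrame from the earlier read sketch):

    // Print the Spark schema that the Connector derived from the Greenplum table:
    gpdf.printSchema()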