Pivotal Greenplum-Spark Connector 1.7.0 Release Notes
The Pivotal Greenplum-Spark Connector supports high speed, parallel data transfer between Greenplum Database and an Apache Spark cluster using:
- Spark’s Scala API - programmatic access (including the spark-shell REPL)
Refer to the Pivotal Greenplum Database documentation for detailed information about Pivotal Greenplum Database.
See the Apache Spark documentation for information about Apache Spark version 2.3.1.
Supported Platforms
The following table identifies the supported component versions for the Pivotal Greenplum-Spark Connector:
Greenplum-Spark Connector Version | Greenplum Version | Spark Version | Scala Version | PostgreSQL JDBC Driver Version |
---|---|---|---|---|
1.7.0 | 4.3.x, 5.x, 6.x | 2.3.1 and above | 2.11 | 9.4.1209 |
1.6.2, 1.6.1 | 4.3.x, 5.x, <=6.7 | 2.3.1 and above | 2.11 | 9.4.1209 |
1.6.0, 1.5.0 | 4.3.x, 5.x | 2.1.2 and above | 2.11 | 9.4.1209 |
1.4.0, 1.3.0, 1.2.0, 1.1.0, 1.0.0 | 4.3.x, 5.x | 2.1.1 | 2.11 | 9.4.1209 |
The Greenplum-Spark Connector is bundled with, and certified against, the PostgreSQL JDBC driver versions listed above.
Greenplum-Spark Connector 1.7.0
Released: July 9, 2020
Greenplum-Spark Connector 1.7.0 includes new and changed features and bug fixes.
New and Changed Features
Pivotal Greenplum-Spark Connector 1.7.0 includes the following new and changed features:
Support for Range of Port Numbers
The developer can now specify one or more lists or ranges of port numbers in the Greenplum-Spark Connector server.port option, as shown in the sketch below.
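For illustration, the following is a minimal sketch of a Spark Scala read that pins the Connector's data-transfer ports with server.port. The connection values (host, database, credentials, schema, table) are placeholders, and the comma/hyphen list-and-range syntax is an assumption; confirm the exact syntax in the Connector documentation.

```scala
import org.apache.spark.sql.SparkSession

// Build or reuse a SparkSession.
val spark = SparkSession.builder().appName("gsc-port-example").getOrCreate()

// Connector options. The URL, credentials, schema, and table name are
// placeholders; replace them with your own values.
val gscOptions = Map(
  "url"      -> "jdbc:postgresql://gpmaster.example.com:5432/testdb",
  "user"     -> "gpadmin",
  "password" -> "changeme",
  "dbschema" -> "public",
  "dbtable"  -> "orders",
  // One or more lists or ranges of port numbers; the comma/hyphen syntax
  // shown here is an assumption, so verify it in the Connector docs.
  "server.port" -> "12900,12910-12920"
)

// Read the Greenplum table into a DataFrame via the Connector's
// "greenplum" data source format.
val ordersDF = spark.read.format("greenplum").options(gscOptions).load()
ordersDF.show(5)
```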
Mixed-Case Column Names
The Greenplum-Spark Connector supports reading from or writing to Greenplum Database tables that you create with mixed-case column names; see the sketch below.
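The following sketch reuses the gscOptions map from the example above and shows how a table created with quoted, mixed-case column names might now be read; the table and column names are hypothetical.

```scala
// Suppose a Greenplum table was created with quoted, mixed-case column
// names, for example:
//   CREATE TABLE sales ("saleId" integer, "saleAmount" numeric);
// The Connector now preserves those names, so the DataFrame columns can be
// referenced exactly as they appear in Greenplum. gscOptions is defined in
// the read sketch above.
val salesDF = spark.read.format("greenplum")
  .options(gscOptions + ("dbtable" -> "sales"))
  .load()

salesDF.select("saleId", "saleAmount").show()
```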
distributedBy Option
The Greenplum-Spark Connector exposes the new distributedBy write option that a developer can use to specify one or more distribution columns for a Greenplum Database table that the Connector creates or re-creates on their behalf; see the sketch below.
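The following sketch shows a write that supplies the new distributedBy option. The target table and distribution column are hypothetical, and the other option names follow the read sketch above.

```scala
// A sketch of a write that tells the Connector which column(s) to use for
// the distribution policy of the table it creates or re-creates. The
// target table and column names are hypothetical; gscOptions is defined in
// the read sketch above.
ordersDF.write.format("greenplum")
  .options(gscOptions + ("dbtable" -> "orders_by_region"))
  // A single distribution column; whether multiple columns are given as a
  // comma-separated list is an assumption, so verify in the Connector docs.
  .option("distributedBy", "region_id")
  .mode("overwrite")   // overwrite re-creates the target table
  .save()
```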
New Default Distribution Policy for Connector-Created Greenplum Tables
The Greenplum-Spark Connector now specifies random distribution by default for tables that it creates or re-creates. In previous releases, the Connector did not specify a distribution column. You can provide the distributedBy option, mentioned above, to explicitly set the table distribution columns.
Resolved Issues
The following issues were resolved in Greenplum-Spark Connector version 1.7.0:
Bug Id | Summary |
---|---|
173608876 | Resolved an issue where the Greenplum-Spark Connector failed to read data from, or write data to, Greenplum Database version 6.7.1+ due to a change in how Greenplum handles distributed transaction IDs. |
30732 | There was no way to specify a distribution column for a Greenplum table that was created or re-created by the Greenplum-Spark Connector on the developer’s behalf. This issue is resolved; the Connector now exposes the distributedBy write option for this purpose. |
30544 | Resolved an issue where the Greenplum-Spark Connector failed to correctly read from a Greenplum Database table that was created with mixed-case column names. |
30461 | The Greenplum-Spark Connector did not support more than one port number in server.port. This issue is resolved; the Connector now allows you to set one or more lists or ranges of port numbers in the server.port option. |
Known Issues and Limitations
Known issues and limitations related to the 1.7.0 release of the Pivotal Greenplum-Spark Connector include the following:
- The Greenplum-Spark Connector supports basic data types such as Float, Integer, String, and Date/Time. The Connector does not yet support more complex types. See Greenplum Database <-> Spark Data Type Mapping for additional information.