Pivotal Greenplum-Spark Connector 1.6.x Release Notes
The Pivotal Greenplum-Spark Connector supports high speed, parallel data transfer between Greenplum Database and an Apache Spark cluster using:
- Spark's Scala API - programmatic access (including the spark-shell REPL)

The Pivotal Greenplum-Spark Connector 1.6.x releases of the Greenplum Database connector for Apache Spark include new and changed features and bug fixes.
The following table identifies the supported component versions for the Pivotal Greenplum-Spark Connector:
| Greenplum-Spark Connector Version | Greenplum Version | Spark Version | Scala Version |
|---|---|---|---|
| 1.6.1, 1.6.2 | 4.3.x, 5.x, 6.x | 2.3.1 and above | 2.11 |
| 1.6.0 | 4.3.x, 5.x | 2.1.2 and above | 2.11 |
| 1.5.0 | 4.3.x, 5.x | 2.1.2 and above | 2.11 |
Refer to the Pivotal Greenplum Database documentation for detailed information about Pivotal Greenplum Database.
See the Apache Spark documentation for information about Apache Spark version 2.1.2.
Release 1.6.2

Released: January 2, 2020
The following issue was resolved in Pivotal Greenplum-Spark Connector version 1.6.2:
| Bug ID | Description |
|---|---|
| 169651274 | In some cases, when reading from Greenplum Database, the Greenplum-Spark Connector incorrectly generated one or more extra rows. This issue is resolved. |
Release 1.6.1

Released: June 24, 2019
Pivotal Greenplum-Spark Connector 1.6.1 includes the following new features:
spark.read.greenplum() Shortcut Method
The Greenplum-Spark Connector now includes a spark.read.greenplum() shortcut method to read data from Greenplum Database into Spark. Refer to Using the .greenplum() Shortcut Method for additional information.
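For context, the following is a minimal sketch of the conventional DataSource read that the shortcut abbreviates. The format name and option names follow the Connector documentation; the connection details are placeholders, and the exact .greenplum() signature is described in Using the .greenplum() Shortcut Method.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gpdb-read-example").getOrCreate()

// Standard DataSource read from Greenplum Database; the .greenplum()
// shortcut introduced in 1.6.1 abbreviates this call chain.
val gpdf = spark.read
  .format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster.example.com:5432/testdb")  // placeholder
  .option("dbschema", "public")
  .option("dbtable", "sales")
  .option("user", "gpadmin")
  .option("password", "changeme")
  .load()

gpdf.printSchema()
```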
New iteratorOptimization Option

The Greenplum-Spark Connector exposes a new iteratorOptimization option with which a developer can specify whether the Connector materializes data in memory or uses an Iterator to optimize memory use on write operations to Greenplum Database. Refer to About Connector Options.
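A minimal write sketch follows, assuming iteratorOptimization accepts a boolean string as its value; consult About Connector Options for the supported values and the default.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("gpdb-write-example").getOrCreate()
val df = spark.range(0, 1000).toDF("id")

df.write
  .format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster.example.com:5432/testdb")  // placeholder
  .option("dbschema", "public")
  .option("dbtable", "spark_ids")
  .option("user", "gpadmin")
  .option("password", "changeme")
  // Assumed boolean value; controls whether the Connector materializes
  // data in memory or uses an Iterator on the write path.
  .option("iteratorOptimization", "true")
  .mode(SaveMode.Append)
  .save()
```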
The following issues were resolved in Pivotal Greenplum-Spark Connector version 1.6.1:
| Bug ID | Description |
|---|---|
| 29857 | In some cases, a write operation using the Greenplum-Spark Connector failed when the Connector used an Iterator. This issue is resolved. |
Release 1.6.0

Released: October 16, 2018
Pivotal Greenplum-Spark Connector 1.6.0 includes the following new feature:
Finer-Grained Control Over the Connector Server Address
The Greenplum-Spark Connector exposes new options to specify the gpfdist server process address on the Spark worker node. Refer to Configuring the Connector Server Address for additional information about these options.
Pivotal Greenplum-Spark Connector 1.6.0 includes the following changes:
connector.port Option is Replaced and Deprecated

The Greenplum-Spark Connector no longer uses the connector.port option. The Connector now uses an option named server.port to identify the server port number.
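The following sketch shows the option rename in a write operation; the port number is illustrative, and the connection details are placeholders.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gpdb-port-example").getOrCreate()
val df = spark.range(0, 100).toDF("id")

df.write
  .format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster.example.com:5432/testdb")  // placeholder
  .option("dbtable", "spark_ids")
  .option("user", "gpadmin")
  .option("password", "changeme")
  // Pre-1.6.0 releases used .option("connector.port", "12900");
  // 1.6.0 and later identify the gpfdist server port with server.port.
  .option("server.port", "12900")  // illustrative port number
  .save()
```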
The following issues were resolved in Pivotal Greenplum-Spark Connector version 1.6.0:
| Bug ID | Description |
|---|---|
| 29589 | A read operation using the Greenplum-Spark Connector failed when the hosts in the Spark cluster were configured with multiple network interfaces. Greenplum Database was unable to access a gpfdist server process running on a Spark worker node in this configuration. This issue is resolved. |
| 29606 | Due to a suboptimal table metadata query, the Greenplum-Spark Connector failed to read from a Greenplum Database view that contained more than ten thousand rows. This issue is resolved. The Connector now uses a different query to obtain Greenplum table metadata. |
Known issues and limitations related to the 1.6.x release of the Pivotal Greenplum-Spark Connector include the following:
- The Greenplum-Spark Connector fails to work with Greenplum Database versions 6.7.1 and later due to a change in how Greenplum Database handles distributed transaction IDs.
- The Greenplum-Spark Connector supports basic data types such as Float, Integer, String, and Date/Time. The Connector does not yet support more complex types; a sketch of checking a DataFrame schema against the basic types follows this list. See Greenplum Database <-> Spark Data Type Mapping for additional information.
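The following is a minimal sketch of such a schema check. The set of Spark SQL types shown is an assumption drawn from the categories named above; the authoritative list is the Greenplum Database <-> Spark Data Type Mapping documentation.

```scala
import org.apache.spark.sql.types._

// Basic Spark SQL types assumed to correspond to the Float, Integer,
// String, and Date/Time categories named above.
val basicTypes: Set[DataType] = Set(
  FloatType, DoubleType, IntegerType, LongType, ShortType,
  StringType, DateType, TimestampType)

// Returns the columns whose types fall outside the basic set, so a
// caller can reject or restructure a DataFrame before writing it to
// Greenplum Database.
def unsupportedColumns(schema: StructType): Seq[StructField] =
  schema.fields.filter(f => !basicTypes.contains(f.dataType)).toSeq
```

For example, calling unsupportedColumns(df.schema) on a DataFrame with an ArrayType column would surface that column before a write is attempted.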