Pivotal Greenplum-Spark Connector 1.4.0 Release Notes

The Pivotal Greenplum-Spark Connector supports high speed, parallel data transfer between Greenplum Database and an Apache Spark cluster using:

  • Spark’s Scala API - programmatic access (including the spark-shell REPL)

Pivotal Greenplum-Spark Connector 1.4.0 is a minor release of the Greenplum Database connector for Apache Spark. This release includes a new feature.

Supported Platforms

The following table identifies the supported component versions for Pivotal Greenplum-Spark Connector 1.4.0:

Greenplum-Spark Connector Version    Greenplum Version    Spark Version    Scala Version
1.4.0                                4.3.x, 5.x           2.1.1            2.11
1.3.0                                4.3.x, 5.x           2.1.1            2.11
1.2.0                                4.3.x, 5.x           2.1.1            2.11
1.1.0                                4.3.x, 5.x           2.1.1            2.11
1.0.0                                4.3.x, 5.x           2.1.1            2.11

Refer to the Pivotal Greenplum Database documentation for detailed information on Pivotal Greenplum Database.

See the Apache Spark documentation for information on Apache Spark version 2.1.1.

New Features

Pivotal Greenplum-Spark Connector 1.4.0 includes the following new feature:

Load Spark Data into a Greenplum Database Table - This feature provides parallel data transfer from Spark to Greenplum. When writing from Spark to Greenplum Database, the Connector:

  • Automatically converts Spark data types into Greenplum data types.
  • Creates the Greenplum Database table if it does not already exist.
  • Supports the Spark SaveMode.ErrorIfExists and SaveMode.Append save modes on the destination Greenplum Database table.
  • Supports writing data to Greenplum Database where the Spark DataFrame:
    • Is defined with a different column order than the Greenplum Database table, or
    • Includes a superset of the columns defined for the Greenplum Database table.
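
The write path described above can be sketched with Spark's standard DataFrameWriter API. This is a minimal illustration under stated assumptions, not the Connector's reference usage: the data source name, the option keys (url, user, password, dbtable), the host, the database, the table name, and the GP_PASSWORD environment variable are all assumed for the example and should be checked against the Connector documentation for your version.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object GreenplumWriteSketch {
  // Assumption: the data source name the Connector registers with Spark.
  val GreenplumFormat = "io.pivotal.greenplum.spark.GreenplumRelationProvider"

  // Assumption: option keys and values are hypothetical placeholders.
  val writeOptions: Map[String, String] = Map(
    "url"      -> "jdbc:postgresql://gpmaster.example.com:5432/testdb",
    "user"     -> "gpadmin",
    "password" -> sys.env.getOrElse("GP_PASSWORD", ""),
    "dbtable"  -> "orders_copy"
  )

  // Appends the DataFrame rows to the destination table.
  // SaveMode.Append is one of the two save modes listed as supported;
  // SaveMode.ErrorIfExists is the other.
  def appendToGreenplum(df: DataFrame): Unit =
    df.write
      .format(GreenplumFormat)
      .options(writeOptions)
      .mode(SaveMode.Append)
      .save()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("gsc-write-sketch").getOrCreate()
    import spark.implicits._

    // Small in-memory DataFrame using only basic column types.
    val orders = Seq((1, "widget", 9.99), (2, "gadget", 24.50))
      .toDF("id", "item", "price")

    appendToGreenplum(orders)
    spark.stop()
  }
}
```

Running the sketch requires the Connector JAR on the Spark classpath and a reachable Greenplum Database master. Switching the mode to SaveMode.ErrorIfExists would instead make the write fail when the destination table already exists, which is standard Spark save-mode behavior.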

Known Issues and Limitations

Known issues and limitations related to the 1.4.0 release of the Pivotal Greenplum-Spark Connector include the following:

  • The Greenplum-Spark Connector does not yet support transferring data from Spark to Greenplum Database using the Spark SaveMode.Ignore and SaveMode.Overwrite save modes.
  • The Greenplum-Spark Connector supports basic data types such as Float, Integer, String, and Date/Time. The Connector does not yet support more complex types. See Greenplum Database <-> Spark Data Type Mapping for additional information.