Pivotal Greenplum®-Spark® Connector v1.5

Pivotal Greenplum-Spark Connector 1.5.0 Release Notes

The Pivotal Greenplum-Spark Connector supports high-speed, parallel data transfer between Greenplum Database and an Apache Spark cluster using:

  • Spark’s Scala API - programmatic access (including the spark-shell REPL)

Pivotal Greenplum-Spark Connector 1.5.0 is a minor release of the Greenplum Database connector for Apache Spark. This release includes new and changed features and bug fixes.

Supported Platforms

The following table identifies the supported component versions for Pivotal Greenplum-Spark Connector 1.5.0:

Greenplum-Spark Connector Version | Greenplum Version | Spark Version   | Scala Version
1.5.0                             | 4.3.x, 5.x        | 2.1.2 and above | 2.11
1.4.0                             | 4.3.x, 5.x        | 2.1.1           | 2.11
1.3.0                             | 4.3.x, 5.x        | 2.1.1           | 2.11
1.2.0                             | 4.3.x, 5.x        | 2.1.1           | 2.11
1.1.0                             | 4.3.x, 5.x        | 2.1.1           | 2.11
1.0.0                             | 4.3.x, 5.x        | 2.1.1           | 2.11

Refer to the Pivotal Greenplum Database documentation for detailed information about Pivotal Greenplum Database.

See the Apache Spark documentation for information about Apache Spark version 2.1.2.

New Features

Pivotal Greenplum-Spark Connector 1.5.0 includes the following new features:

Support for Greenplum Database Views and Randomly Distributed Tables - The Connector can now load data into Spark from Greenplum Database views and tables created with random distribution.

Support for Spark SaveMode.Ignore - When this save mode is enabled on a write operation and the target Greenplum Database table exists, the Connector ignores the write request; it neither writes data to the table nor disturbs the existing data.

Support for Spark SaveMode.Overwrite - You can now overwrite data in a Greenplum Database table. When SaveMode.Overwrite is enabled for a write operation and the target Greenplum Database table exists, you can instruct the Connector to perform one of the following actions before writing any new data:

  • Drop and re-create the target table, or
  • Truncate data that may already exist in the table.

New truncate Write Option - The Connector exposes a new write option, truncate, to support SaveMode.Overwrite. Refer to Connector Write Options for additional information.
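The overwrite behavior described above can be sketched as a write-options map. This is a minimal, illustrative example: the option names other than truncate, and all of the connection values (URL, user, password, table name), are assumptions for illustration and are not taken from these release notes.

```scala
// Sketch of write options for a SaveMode.Overwrite write with the new
// truncate option. All connection values below are hypothetical.
val gscWriteOptions = Map(
  "url"      -> "jdbc:postgresql://gpmaster.domain:5432/testdb", // assumed JDBC URL
  "user"     -> "gpadmin",                                       // assumed role name
  "password" -> "changeme",                                      // assumed password
  "dbtable"  -> "employee",                                      // assumed target table
  // truncate=true: truncate the existing table before writing, rather
  // than dropping and re-creating it
  "truncate" -> "true"
)

// With a DataFrame df and a SparkSession in scope, the write itself
// would look roughly like:
//   df.write.format("greenplum")
//     .options(gscWriteOptions)
//     .mode(org.apache.spark.sql.SaveMode.Overwrite)
//     .save()
```

Whether the Connector truncates or drops and re-creates the target table is controlled by the truncate option; omitting it selects the drop-and-re-create behavior described above.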

Changed Features

Pivotal Greenplum-Spark Connector 1.5.0 includes the following changes:

  • partitionsPerSegment Option Replaced and Deprecated

    The Greenplum-Spark Connector no longer uses the partitionsPerSegment read option; the option is deprecated. The Connector now uses a read option named partitions to determine the number of Spark partitions.

  • Support for Spark Version 2.1.2+

    Greenplum-Spark Connector 1.5.0 supports Spark version 2.1.2 and above. The Connector previously supported version 2.1.1 of Spark.
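The option rename above can be sketched as a read-options map. As with any sketch, the connection values (URL, user, password, table name) and the partition count are illustrative assumptions; only the partitions option name comes from this release.

```scala
// Sketch of read options using the new partitions option, which
// replaces partitionsPerSegment. All connection values are hypothetical.
val gscReadOptions = Map(
  "url"        -> "jdbc:postgresql://gpmaster.domain:5432/testdb", // assumed JDBC URL
  "user"       -> "gpadmin",                                       // assumed role name
  "password"   -> "changeme",                                      // assumed password
  "dbtable"    -> "employee",                                      // assumed source table
  "partitions" -> "8" // total number of Spark partitions for the read
)

// With a SparkSession in scope, the read itself would look roughly like:
//   val df = spark.read.format("greenplum").options(gscReadOptions).load()
```

Code that still sets partitionsPerSegment should be updated to set partitions instead.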

Resolved Issues

The following issues were resolved in Pivotal Greenplum-Spark Connector version 1.5.0:

Bug Id    | Summary
158466513 | The Greenplum-Spark Connector returned a java.lang.IllegalArgumentException when it read a single row of data from a table in a Greenplum Database cluster with 6 or more primary segments into a Spark DataFrame, and then attempted to write the DataFrame to a Greenplum Database table. The Connector now correctly reads and writes a DataFrame with a single row of data from/to Greenplum.
159861323 | The Greenplum-Spark Connector incorrectly wrote a question mark ('?') character to Greenplum Database when it encountered a Cyrillic character in the Spark DataFrame. The Connector now correctly preserves the DataFrame character set when writing to Greenplum.
159861326 | The Greenplum-Spark Connector incorrectly handled backslash ('\') characters when reading a Greenplum Database table into a Spark DataFrame. The Connector now correctly handles the backslash character when encountered in Greenplum data.

Known Issues and Limitations

Known issues and limitations related to the 1.5.0 release of the Pivotal Greenplum-Spark Connector include the following:

  • The Greenplum-Spark Connector supports basic data types such as Float, Integer, String, and Date/Time. The Connector does not yet support more complex types. See Greenplum Database <-> Spark Data Type Mapping for additional information.