Pivotal Greenplum-Spark Connector 1.1.0 Release Notes

The Pivotal Greenplum-Spark Connector supports high-speed, parallel data transfer from Greenplum Database to an Apache Spark cluster.

Pivotal Greenplum-Spark Connector 1.1.0 is a minor release of the Greenplum Database connector for Apache Spark. This release includes bug fixes and improvements.


The Greenplum-Spark Connector supports loading Greenplum Database table data into Spark using:

  • Spark’s Scala API - programmatic access (including the spark-shell REPL)
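As a sketch of this usage, the following spark-shell example reads a Greenplum Database table into a Spark DataFrame through the Connector's data source. The connection values (host, database, user, password, table, and partition column) are placeholders you must replace with your own, and the data source class name follows the 1.x documentation:

```scala
// Connection options for the Greenplum-Spark Connector data source.
// All values below are example placeholders.
val gscOptions = Map(
  "url" -> "jdbc:postgresql://gpmaster.example.com:5432/testdb",
  "user" -> "gpuser",
  "password" -> "changeme",
  "dbtable" -> "flights",          // Greenplum table to read
  "partitionColumn" -> "flight_id" // column used to split the read across Spark partitions
)

// Load the Greenplum table into a Spark DataFrame.
val gpdf = spark.read
  .format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
  .options(gscOptions)
  .load()
```

In the spark-shell REPL, `spark` is the pre-built SparkSession; in a standalone application you would construct one with `SparkSession.builder()`.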

Supported Platforms

The following table identifies the supported component versions for Pivotal Greenplum-Spark Connector 1.1.0:

Greenplum-Spark Connector Version   Greenplum Version   Spark Version   Scala Version
1.1.0                               4.3.x, 5.x          2.1.1           2.11
1.0.0                               4.3.x, 5.x          2.1.1           2.11

Refer to the Pivotal Greenplum Database documentation for detailed information on Pivotal Greenplum Database.

See the Apache Spark documentation for information on Apache Spark version 2.1.1.

Features and Changes

Pivotal Greenplum-Spark Connector 1.1.0 includes the following new feature:

  • Column Projection

    The Greenplum-Spark Connector now supports column projection when reading from Greenplum Database into Spark. The Connector transfers only the data in the columns named in a select or filter operation.
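To illustrate column projection, the sketch below selects two columns from a previously loaded DataFrame; only those columns' data is transferred from Greenplum Database. The DataFrame and column names are hypothetical:

```scala
// gpdf is a DataFrame previously loaded through the Greenplum-Spark Connector.
// With column projection, only the "origin" and "delay" columns are read
// from Greenplum Database; the remaining columns are never transferred.
val subset = gpdf.select("origin", "delay")
subset.show()
```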

Resolved Issues

The following issues were resolved in Pivotal Greenplum-Spark Connector 1.1.0:

Bug Id Summary
29088 The Greenplum-Spark Connector now correctly handles the case where new data added to a Greenplum table exceeds the original partition boundaries and the application specifies a partitionsPerSegment value greater than the default (one).

Known Issues and Limitations

Known issues and limitations related to the 1.1.0 release of the Pivotal Greenplum-Spark Connector include the following:

  • The Connector does not yet support writing Spark data back into Greenplum Database.
  • The Greenplum-Spark Connector supports basic data types such as Float, Integer, String, and Date/Time. The Connector does not yet support more complex types. See Greenplum Database <-> Spark Data Type Mapping for additional information.
  • The Greenplum-Spark Connector does not yet support reading data from tables in schemas that are not on the user’s search_path.
  • The Connector requires that the Greenplum Database public schema be the first schema listed in the Greenplum Database user’s search_path.
  • The Greenplum-Spark Connector does not yet support filter pushdown. Because of this limitation, a plain JDBC data transfer may outperform the Connector when the query filters the data. The Connector will support filter pushdown in a future release.
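Until filter pushdown is available, one workaround (a sketch using Spark's built-in JDBC data source rather than the Connector) is to place the filter in a subquery, so Greenplum Database evaluates it before any data is transferred. The connection values, table, and predicate are placeholders:

```scala
// Spark's generic JDBC source accepts a subquery as the table, so the
// WHERE clause runs inside Greenplum Database before any rows move to Spark.
val filtered = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://gpmaster.example.com:5432/testdb")
  .option("user", "gpuser")
  .option("password", "changeme")
  .option("dbtable", "(SELECT * FROM flights WHERE delay > 60) AS f")
  .load()
```

Note that this single-connection JDBC read does not use the Connector's parallel transfer, so it is best suited to highly selective filters.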