Pivotal Greenplum-Spark Connector 1.0.0 Release Notes

Pivotal Greenplum-Spark Connector 1.0.0 is the first release of the Greenplum Database connector for Apache Spark.

The Pivotal Greenplum-Spark Connector supports high-speed, parallel data transfer from Greenplum Database to an Apache Spark cluster.


The Greenplum-Spark Connector supports loading Greenplum Database table data into Spark using:

  • Spark’s Scala API - programmatic access (including the spark-shell REPL)

Supported Platforms

The following table identifies the supported component versions for Pivotal Greenplum-Spark Connector 1.0.0:

Greenplum-Spark Connector Version   Greenplum Version   Spark Version   Scala Version
1.0.0                               4.3.x, 5.x          2.1.1           2.11

Refer to the Pivotal Greenplum Database documentation for detailed information on Pivotal Greenplum Database.

See the Apache Spark documentation for information on Apache Spark version 2.1.1.

Known Issues and Limitations

Known issues and limitations related to the 1.0.0 release of the Pivotal Greenplum-Spark Connector include the following:

  • The Connector does not yet support writing Spark data back into Greenplum Database.
  • The Greenplum-Spark Connector supports basic data types such as Float, Integer, String, and Date/Time. The Connector does not yet support more complex types. See Greenplum Database <-> Spark Data Type Mapping for additional information.
  • The Greenplum-Spark Connector does not yet support reading data from tables located in schemas not on the user’s search_path.
  • The Connector requires that the Greenplum Database public schema be the first schema named in the Greenplum Database user’s schema search_path.
  • The Greenplum-Spark Connector does not yet support table filtering; you can load only a complete Greenplum Database table into Spark. As a result, a plain JDBC transfer that filters rows in the database may outperform the Connector when you need only a subset of a table. The Connector will support filter pushdown in a future release.
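
For the filtered case described in the last item, the following is a minimal sketch of how a filter might be passed to Spark's built-in JDBC data source rather than the Connector. It assumes the PostgreSQL JDBC driver is used to reach Greenplum Database. The object name, helper names, host, database, table, and credentials are hypothetical placeholders and are not part of the Connector.

```scala
// Sketch only: builds options for Spark's built-in JDBC data source.
// All names and connection values here are hypothetical placeholders.
object JdbcFilterSketch {

  /** Wraps a filtered SELECT as a derived table for the JDBC "dbtable" option. */
  def filteredTable(table: String, predicate: String): String =
    s"(SELECT * FROM $table WHERE $predicate) AS filtered"

  /** Options for spark.read.format("jdbc"); the WHERE clause runs in the database. */
  def jdbcOptions(url: String, user: String, password: String,
                  table: String, predicate: String): Map[String, String] =
    Map(
      "url"      -> url,
      "driver"   -> "org.postgresql.Driver", // assumes the PostgreSQL JDBC driver
      "user"     -> user,
      "password" -> password,
      "dbtable"  -> filteredTable(table, predicate)
    )
}

// With a SparkSession named `spark` (as in spark-shell) and the JDBC driver
// on the classpath, the options could be used like this:
//   val opts = JdbcFilterSketch.jdbcOptions(
//     "jdbc:postgresql://gpdb-master:5432/testdb", "gpuser", "changeme",
//     "public.orders", "order_date >= '2017-01-01'")
//   val df = spark.read.format("jdbc").options(opts).load()
```

Because the subquery is evaluated by Greenplum Database, only matching rows are sent to Spark. The transfer goes through a single JDBC connection unless Spark's JDBC partitioning options are also set, so whether this beats a full-table load through the Connector depends on how selective the filter is.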