Pivotal Greenplum-Spark Connector 1.1.0 Release Notes
The Pivotal Greenplum-Spark Connector supports high-speed, parallel data transfer from Greenplum Database to an Apache Spark cluster.
Pivotal Greenplum-Spark Connector 1.1.0 is a minor release of the Greenplum Database connector for Apache Spark. This release includes bug fixes and improvements.
Scope
The Greenplum-Spark Connector supports loading Greenplum Database table data into Spark using:
- Spark’s Scala API - programmatic access (including the `spark-shell` REPL)
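For example, loading a table from the `spark-shell` REPL might look like the following. This is a minimal sketch: the data source class name, connection details, and option keys (`url`, `user`, `password`, `dbtable`, `partitionColumn`) are assumptions to be checked against the Connector documentation, not definitions from these release notes.

```scala
// Minimal sketch of a Greenplum-to-Spark read from the spark-shell REPL,
// where the SparkSession is predefined as `spark`. All connection details
// and option keys below are illustrative assumptions.
val gpdf = spark.read
  .format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb") // Greenplum master
  .option("user", "gpadmin")
  .option("password", "changeme")
  .option("dbtable", "sales")             // source table in Greenplum
  .option("partitionColumn", "region_id") // column used to parallelize the read
  .load()

gpdf.printSchema()
```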
Supported Platforms
The following table identifies the supported component versions for Pivotal Greenplum-Spark Connector 1.1.0:
| Greenplum-Spark Connector Version | Greenplum Version | Spark Version | Scala Version |
|---|---|---|---|
| 1.1.0 | 4.3.x, 5.x | 2.1.1 | 2.11 |
| 1.0.0 | 4.3.x, 5.x | 2.1.1 | 2.11 |
Refer to the Pivotal Greenplum Database documentation for detailed information on Pivotal Greenplum Database.
See the Apache Spark documentation for information on Apache Spark version 2.1.1.
Features and Changes
Pivotal Greenplum-Spark Connector 1.1.0 includes the following new feature:
Column Projection
The Greenplum-Spark Connector now supports column projection when reading from Greenplum Database into Spark. The Connector transfers only the data in the columns that a select or filter operation specifies.
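Continuing the sketch above, a projection such as the following would transfer only the two named columns (the column names are placeholders):

```scala
// Only region_id and amount cross the wire; the other columns of the
// sales table are never read from Greenplum.
val projected = gpdf.select("region_id", "amount")
projected.show(5)
```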
Resolved Issues
The following issues were resolved in Pivotal Greenplum-Spark Connector 1.1.0:
| Bug Id | Summary |
|---|---|
| 29088 | The Greenplum-Spark Connector now correctly handles the case in which new data added to a Greenplum table exceeds the original partition boundaries and the application specified a `partitionsPerSegment` greater than the default (one). |
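The `partitionsPerSegment` setting named above is a read option of the Connector. Continuing the earlier sketch, a value above the default of one might be supplied as follows (the option key is quoted from the issue summary; how it is passed is an assumption):

```scala
// Assumption: partitionsPerSegment is set like any other read option.
// A value of 4 would request four Spark partitions per Greenplum segment.
val widerRead = spark.read
  .format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
  .option("user", "gpadmin")
  .option("password", "changeme")
  .option("dbtable", "sales")
  .option("partitionColumn", "region_id")
  .option("partitionsPerSegment", "4")
  .load()
```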
Known Issues and Limitations
Known issues and limitations related to the 1.1.0 release of the Pivotal Greenplum-Spark Connector include the following:
- The Connector does not yet support writing Spark data back into Greenplum Database.
- The Greenplum-Spark Connector supports basic data types such as float, integer, string, and date/time types. The Connector does not yet support more complex types. See Greenplum Database <-> Spark Data Type Mapping for additional information.
- The Greenplum-Spark Connector does not yet support reading data from tables located in schemas not on the user’s `search_path`.
- The Connector requires that the Greenplum Database `public` schema be the first schema named in the Greenplum Database user’s `search_path`.
- The Greenplum-Spark Connector does not yet support filter pushdown. This limitation may make a plain Spark JDBC read more performant than the Connector when the data is filtered; see the sketch after this list. The Connector will support filter pushdown in a future release.
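Until filter pushdown arrives, a plain JDBC read with the predicate embedded in a subquery keeps the filtering on the Greenplum side. A minimal sketch using Spark’s built-in JDBC data source (host, database, table, and predicate are placeholders):

```scala
// Spark's generic JDBC source runs the subquery in the database, so only
// rows matching the WHERE clause are transferred. Unlike the Connector's
// read path, this is not parallelized across Greenplum segments.
val filtered = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
  .option("driver", "org.postgresql.Driver")
  .option("user", "gpadmin")
  .option("password", "changeme")
  .option("dbtable", "(SELECT * FROM sales WHERE amount > 1000) AS s")
  .load()
```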