Pivotal Greenplum-Spark Connector 1.4.0 Release Notes
The Pivotal Greenplum-Spark Connector supports high-speed, parallel data transfer between Greenplum Database and an Apache Spark cluster using:
- Spark’s Scala API - programmatic access (including the spark-shell REPL)
Pivotal Greenplum-Spark Connector 1.4.0 is a minor release of the Greenplum Database connector for Apache Spark. This release includes a new feature.
The following table identifies the supported component versions for Pivotal Greenplum-Spark Connector 1.4.0:
|Greenplum-Spark Connector Version|Greenplum Version|Spark Version|Scala Version|
|---|---|---|---|
Refer to the Pivotal Greenplum Database documentation for detailed information on Pivotal Greenplum Database.
See the Apache Spark documentation for information on Apache Spark version 2.1.1.
Pivotal Greenplum-Spark Connector 1.4.0 includes the following new feature:
Load Spark Data into a Greenplum Database Table - This feature provides parallel data transfer from Spark to Greenplum. When writing from Spark to Greenplum Database, the Connector:
- Automatically converts Spark data types into Greenplum data types.
- Creates the Greenplum Database table if it does not already exist.
- Supports the Spark SaveMode.Append save mode on the destination Greenplum Database table.
- Supports writing data to Greenplum Database where the Spark DataFrame:
- Is defined with a different column order than the Greenplum Database table, or
- Includes a superset of the columns defined for the Greenplum Database table.
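As a sketch of this new write path, a Spark-to-Greenplum write might look like the following. The data source class name and the connection options (url, user, password, dbschema, dbtable) are assumptions based on the Connector's read-side conventions and should be verified against the Connector documentation:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object GreenplumWriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("greenplum-write-example")
      .getOrCreate()

    import spark.implicits._

    // Sample data; the Connector matches DataFrame columns to Greenplum
    // table columns by name, so the column order need not match the table's.
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    // Hypothetical connection options -- verify the exact option names
    // against the Connector documentation for your release.
    val gscOptions = Map(
      "url"      -> "jdbc:postgresql://gpmaster.example.com:5432/testdb",
      "user"     -> "gpadmin",
      "password" -> "changeme",
      "dbschema" -> "public",
      "dbtable"  -> "spark_demo"
    )

    // SaveMode.Append is the save mode this release supports; the target
    // table is created automatically if it does not already exist.
    df.write
      .format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
      .options(gscOptions)
      .mode(SaveMode.Append)
      .save()

    spark.stop()
  }
}
```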
Known issues and limitations related to the 1.4.0 release of the Pivotal Greenplum-Spark Connector include the following:
- The Greenplum-Spark Connector does not yet support transferring data from Spark to Greenplum Database using Spark save modes other than SaveMode.Append.
- The Greenplum-Spark Connector supports basic data types such as Float, Integer, String, and Date/Time. The Connector does not yet support more complex types. See Greenplum Database <-> Spark Data Type Mapping for additional information.
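For context on the basic-type support noted above, a read in the other direction (Greenplum to Spark) uses the same data source. The option names below, including partitionColumn (which drives the parallel transfer across Spark executors), are assumptions to be checked against the Connector documentation:

```scala
import org.apache.spark.sql.SparkSession

object GreenplumReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("greenplum-read-example")
      .getOrCreate()

    // Hypothetical options; partitionColumn splits the read across
    // Spark executors for parallel transfer.
    val df = spark.read
      .format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
      .option("url", "jdbc:postgresql://gpmaster.example.com:5432/testdb")
      .option("user", "gpadmin")
      .option("password", "changeme")
      .option("dbtable", "spark_demo")
      .option("partitionColumn", "id")
      .load()

    // Basic types (Integer, Float, String, Date/Time) map directly to
    // Spark SQL types; complex types are not yet supported.
    df.printSchema()

    spark.stop()
  }
}
```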