Pivotal Greenplum-Spark Connector 1.5.0 Release Notes
The Pivotal Greenplum-Spark Connector supports high speed, parallel data transfer between Greenplum Database and an Apache Spark cluster using:
- Spark's Scala API - programmatic access (including from the spark-shell REPL)
Pivotal Greenplum-Spark Connector 1.5.0 is a minor release of the Greenplum Database connector for Apache Spark. This release includes new and changed features and bug fixes.
The following table identifies the supported component versions for Pivotal Greenplum-Spark Connector 1.5.0:
| Greenplum-Spark Connector Version | Greenplum Version | Spark Version | Scala Version |
|---|---|---|---|
| 1.5.0 | 4.3.x, 5.x | 2.1.2 and above | 2.11 |
Refer to the Pivotal Greenplum Database documentation for detailed information about Pivotal Greenplum Database.
See the Apache Spark documentation for information about Apache Spark version 2.1.2.
Pivotal Greenplum-Spark Connector 1.5.0 includes the following new features:
Support for Greenplum Database Views and Randomly Distributed Tables - The Connector can now load data into Spark from Greenplum Database views and tables created with random distribution.
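As a minimal sketch, loading a Greenplum Database view into a Spark DataFrame might look like the following. The `greenplum` data source name and the option names (url, dbtable, user, password) follow the Connector's documented read options, but the connection values and the view name are illustrative; verify everything against the Connector documentation for your release:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("greenplum-view-read")
  .getOrCreate()

// Read a Greenplum Database view (or a randomly distributed table)
// into a Spark DataFrame. The URL, credentials, and view name below
// are placeholders for illustration only.
val gpdf = spark.read
  .format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
  .option("dbtable", "sales_summary_view")   // hypothetical view name
  .option("user", "gpadmin")
  .option("password", "changeme")
  .load()

gpdf.printSchema()
```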
Support for Spark SaveMode.Ignore - When this save mode is enabled on a write operation and the target Greenplum Database table exists, the Connector ignores the write request; it neither writes data to the table nor disturbs the existing data.
Support for Spark SaveMode.Overwrite - You can now overwrite data in a Greenplum Database table. When SaveMode.Overwrite is enabled for a write operation and the target Greenplum Database table exists, you can instruct the Connector to perform one of the following actions before writing any new data:
- Drop and re-create the target table, or
- Truncate data that may already exist in the table.
New truncate Write Option - The Connector exposes a new write option, truncate, to support SaveMode.Overwrite. Refer to Connector Write Options for additional information.
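The overwrite behavior described above might be exercised as in the following sketch. Setting the truncate option to true asks the Connector to truncate the existing target table before the write; leaving it at its default causes the table to be dropped and re-created. The connection values and table name are placeholders, and the option names should be confirmed against Connector Write Options:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("greenplum-overwrite-write")
  .getOrCreate()

// A small DataFrame to write; stands in for real data.
val df = spark.range(100).toDF("id")

// Overwrite an existing Greenplum table, truncating rather than
// dropping it. URL, credentials, and table name are illustrative.
df.write
  .format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
  .option("dbtable", "sales_summary")        // hypothetical target table
  .option("user", "gpadmin")
  .option("password", "changeme")
  .option("truncate", "true")                // truncate instead of drop/re-create
  .mode(SaveMode.Overwrite)
  .save()
```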
Pivotal Greenplum-Spark Connector 1.5.0 includes the following changes:
partitionsPerSegment Option is Replaced and Deprecated
The Greenplum-Spark Connector no longer uses the partitionsPerSegment read option. The Connector now uses an option named partitions to determine the number of Spark partitions.
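A read using the renamed option might look like the sketch below; the partitions name comes from this release's documented read options, while the connection values and table name are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("greenplum-partitioned-read")
  .getOrCreate()

// The deprecated partitionsPerSegment read option is replaced by
// partitions, which sets the number of Spark partitions directly.
// URL, credentials, and table name below are placeholders.
val df = spark.read
  .format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
  .option("dbtable", "orders")               // hypothetical table name
  .option("user", "gpadmin")
  .option("password", "changeme")
  .option("partitions", "8")                 // was: partitionsPerSegment
  .load()
```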
Support for Spark Version 2.1.2+
Greenplum-Spark Connector 1.5.0 supports Spark version 2.1.2 and above. The Connector previously supported version 2.1.1 of Spark.
The following issues were resolved in Pivotal Greenplum-Spark Connector version 1.5.0:
| Bug Id | Summary |
|---|---|
| 158466513 | The Greenplum-Spark Connector returned a |
| 159861323 | The Greenplum-Spark Connector incorrectly wrote a question mark ('?') character to Greenplum Database when it encountered a Cyrillic character in the Spark DataFrame. |
| 159861326 | The Greenplum-Spark Connector incorrectly handled backslash ('\') characters when reading a Greenplum Database table into a Spark DataFrame. |
Known issues and limitations related to the 1.5.0 release of the Pivotal Greenplum-Spark Connector include the following:
- The Greenplum-Spark Connector supports basic data types such as Float, Integer, String, and Date/Time. The Connector does not yet support more complex types. See Greenplum Database <-> Spark Data Type Mapping for additional information.