Pivotal Greenplum-Spark Connector 1.3.0 Release Notes
The Pivotal Greenplum-Spark Connector supports high-speed, parallel data transfer from Greenplum Database to an Apache Spark cluster.
Pivotal Greenplum-Spark Connector 1.3.0 is a minor release of the Greenplum Database connector for Apache Spark. This release includes bug fixes, new features, and improvements.
The Greenplum-Spark Connector supports loading Greenplum Database table data into Spark using:
- Spark’s Scala API - programmatic access (including the spark-shell)
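As a minimal sketch of what loading a Greenplum Database table via the Scala API looks like, assuming a running Spark cluster and a Greenplum master reachable at a hypothetical host and table (`gpmaster.example.com`, `otp_c`, and the credentials below are placeholders, not values from these notes):

```scala
// Sketch: load a Greenplum Database table into a Spark DataFrame.
// Hostname, database, table, and credentials below are illustrative.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gpdb-read").getOrCreate()

val gpdf = spark.read
  .format("io.pivotal.greenplum.spark.GreenplumRelationProvider") // Connector data source
  .option("url", "jdbc:postgresql://gpmaster.example.com/tutorial") // JDBC URL to the Greenplum master
  .option("dbtable", "otp_c")          // Greenplum table to load
  .option("user", "user1")
  .option("password", "changeme")
  .option("partitionColumn", "id")     // column used to split data among Spark workers
  .load()

gpdf.printSchema()
```

Verify the data source class name and option names against the Connector documentation for your version before relying on them.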
The following table identifies the supported component versions for Pivotal Greenplum-Spark Connector 1.3.0:
|Greenplum-Spark Connector Version|Greenplum Version|Spark Version|Scala Version|
|---|---|---|---|
Refer to the Pivotal Greenplum Database documentation for detailed information on Pivotal Greenplum Database.
See the Apache Spark documentation for information on Apache Spark version 2.1.1.
Pivotal Greenplum-Spark Connector 1.3.0 includes the following new features:
Connection Pool Configuration
The Greenplum-Spark Connector pools JDBC connections for each Spark application. The Connector now provides configuration options to tune connection pool size and idle properties. Refer to JDBC Connection Pooling for additional information about this feature.
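A hedged sketch of how such tuning might look from the Scala API, assuming a SparkSession named `spark` is in scope; the pool option names shown (`pool.maxSize`, `pool.timeoutMs`, `pool.minIdle`) are assumptions here and should be checked against the JDBC Connection Pooling documentation:

```scala
// Sketch: tuning the Connector's JDBC connection pool via DataFrame options.
// The pool.* option names are assumptions -- confirm them in the
// JDBC Connection Pooling documentation for your Connector version.
val gpdf = spark.read
  .format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
  .option("url", "jdbc:postgresql://gpmaster.example.com/tutorial")
  .option("dbtable", "otp_c")
  .option("user", "user1")
  .option("password", "changeme")
  .option("partitionColumn", "id")
  .option("pool.maxSize", "10")       // maximum JDBC connections in the pool
  .option("pool.timeoutMs", "30000")  // close connections idle longer than 30 s
  .option("pool.minIdle", "2")        // keep at least 2 idle connections open
  .load()
```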
Pivotal Greenplum-Spark Connector 1.3.0 includes the following changes:
Data Assignment to Spark Workers
The Greenplum-Spark Connector now uses Greenplum Database table statistics to partition table data among Spark worker nodes. With this scheme, the 1.3.0 Connector may assign table data to different Spark worker nodes than would be assigned by previous Connector versions.
Spark Worker Port Specification
The Greenplum-Spark Connector now supports specifying a single `gpfdist` port number via a `DataFrame` option. In previous versions of the Connector, you set an environment variable named `$GPFDIST_PORT` to specify a single port or a list of port numbers. Refer to Network Port Requirements for more information about Greenplum-Spark Connector port requirements and configuration.
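A sketch of how the port might be supplied as a `DataFrame` option instead of through `$GPFDIST_PORT`, again assuming a SparkSession named `spark`; the option name `server.port` is an assumption, so confirm the exact name in Network Port Requirements:

```scala
// Sketch: specifying a single gpfdist port via a DataFrame option rather
// than the legacy $GPFDIST_PORT environment variable. The option name
// "server.port" is an assumption for illustration only.
val gpdf = spark.read
  .format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
  .option("url", "jdbc:postgresql://gpmaster.example.com/tutorial")
  .option("dbtable", "otp_c")
  .option("user", "user1")
  .option("password", "changeme")
  .option("partitionColumn", "id")
  .option("server.port", "12900")   // gpfdist port to use on each Spark worker
  .load()
```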
The following issues were resolved in Pivotal Greenplum-Spark Connector 1.3.0:
|Bug Id|Summary|
|---|---|
|155369957|The Greenplum-Spark Connector returned a …|
|155799016|The Greenplum-Spark Connector returns a “connection limit exceeded” error when a request by a Spark application exceeds the maximum number of connections configured for the Greenplum Database server. To mitigate this error in cases where a Spark application using the Greenplum-Spark Connector is the culprit, the Connector now exposes configuration options to tune connection pool size and idle properties. Refer to JDBC Connection Pooling for information about connection pooling in the Greenplum-Spark Connector. For related troubleshooting information, see Greenplum Database Connection Errors.|
Known issues and limitations related to the 1.3.0 release of the Pivotal Greenplum-Spark Connector include the following:
- The Connector does not yet support writing Spark data back into Greenplum Database.
- The Greenplum-Spark Connector supports basic data types such as Float, Integer, String, and Date/Time. The Connector does not yet support more complex types. See Greenplum Database <-> Spark Data Type Mapping for additional information.