Pivotal Greenplum-Spark Connector 1.2.0 Release Notes
The Pivotal Greenplum-Spark Connector supports high-speed, parallel data transfer from Greenplum Database to an Apache Spark cluster.
Pivotal Greenplum-Spark Connector 1.2.0 is a minor release of the Greenplum Database connector for Apache Spark. This release includes new features and improvements.
Scope
The Greenplum-Spark Connector supports loading Greenplum Database table data into Spark using:
- Spark’s Scala API - programmatic access (including the `spark-shell` REPL)
Supported Platforms
The following table identifies the supported component versions for Pivotal Greenplum-Spark Connector 1.2.0:
Greenplum-Spark Connector Version | Greenplum Version | Spark Version | Scala Version |
---|---|---|---|
1.2.0 | 4.3.x, 5.x | 2.1.1 | 2.11 |
1.1.0 | 4.3.x, 5.x | 2.1.1 | 2.11 |
1.0.0 | 4.3.x, 5.x | 2.1.1 | 2.11 |
Refer to the Pivotal Greenplum Database documentation for detailed information on Pivotal Greenplum Database.
See the Apache Spark documentation for information on Apache Spark version 2.1.1.
New Features
Pivotal Greenplum-Spark Connector 1.2.0 includes the following new features:
Filter Predicate Pushdown
The Greenplum-Spark Connector now supports filter pushdown when reading from Greenplum Database into Spark. The filter is applied by Greenplum Database, and the Connector transfers only the filtered table data to Spark.
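For example, here is a minimal Scala sketch of a read whose filter is pushed down to Greenplum Database. The connection values, table, and column names are placeholders, and option keys such as `url`, `dbtable`, and `partitionColumn` are representative read options; verify them against Connector Read Options:

```scala
// Run from spark-shell, where `spark` is the active SparkSession.
val gpdf = spark.read.format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")  // Greenplum master
  .option("user", "gpadmin")
  .option("dbtable", "otp_c")
  .option("partitionColumn", "airlineid")
  .load()

// This predicate is pushed down to and evaluated by Greenplum Database,
// so only the matching rows are transferred to the Spark cluster.
val delayed = gpdf.filter("delayminutes > 60")
delayed.count()
```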
User-Specified Schema
The Greenplum-Spark Connector now exposes a schema option to identify the location of the Greenplum Database table. The table need no longer reside in a schema in your `search_path`.
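A sketch of supplying the schema option, assuming the key is named `dbschema` (check Connector Read Options for the authoritative name):

```scala
// Read faa.otp_c even though "faa" is not on the Greenplum user's search_path.
val df = spark.read.format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
  .option("user", "gpadmin")
  .option("dbschema", "faa")   // assumed option name for the table's schema
  .option("dbtable", "otp_c")
  .option("partitionColumn", "airlineid")
  .load()
```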
Custom JDBC Driver
You can now use a custom JDBC driver with the Greenplum-Spark Connector. Refer to Constructing the Greenplum Database JDBC URL.
JDBC Connection Pooling
The Greenplum-Spark Connector now uses JDBC connection pooling internally to optimize connection reuse.
Changes
Pivotal Greenplum-Spark Connector 1.2.0 includes the following changes:
Data Source Short Name
The Greenplum-Spark Connector now exposes the data source short name `greenplum` for reading data from Greenplum Database. Use of the Greenplum-Spark Connector fully-qualified data source class name is deprecated. Refer to Greenplum-Spark Connector Data Source for additional information.
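For example, the fully-qualified class name below is inferred from the `GreenplumRelationProvider` class referenced later in these notes; its package prefix is an assumption:

```scala
// Preferred: the data source short name.
val df = spark.read.format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
  .option("user", "gpadmin")
  .option("dbtable", "otp_c")
  .load()

// Deprecated: the fully-qualified data source class name.
val dfOld = spark.read
  .format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
  .option("user", "gpadmin")
  .option("dbtable", "otp_c")
  .load()
```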
Location of External Table Creation
When you provide a user-specified schema, the Greenplum-Spark Connector now creates external tables in that schema rather than the `public` schema.
Port Usage
In previous releases, the Greenplum-Spark Connector used multiple TCP ports in the range 49152-65535 to transfer data from Greenplum Database segment hosts to Spark worker nodes. The Greenplum-Spark Connector now uses a single port for data transfer and defers port assignment to the operating system unless you specifically configure the port number that you want the Connector to use. Refer to Network Port Requirements for more information about Greenplum-Spark Connector port requirements and configuration.
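If your firewall rules require a fixed port rather than an OS-assigned one, the port is set through the Connector's configuration. The option key below is purely illustrative; the actual setting is described under Network Port Requirements:

```scala
// Hypothetical sketch only: "server.port" is an assumed key name for the
// Connector's data-transfer port, and 12900 is an arbitrary example value.
val df = spark.read.format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
  .option("user", "gpadmin")
  .option("dbtable", "otp_c")
  .option("server.port", "12900")
  .load()
```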
Removed Greenplum Database SUPERUSER Requirement
The Greenplum-Spark Connector no longer requires `SUPERUSER` privileges for the Greenplum Database user specified in the JDBC login credentials.
Connector `password` Key is now Optional
The `GreenplumRelationProvider` `password` connection key is now optional. You can omit the `password` key if Greenplum Database is configured to not require a password for the specified user, or if you use Kerberos authentication and provide the required authentication properties in the JDBC connection string URL. See Connector Read Options.
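For example, a read that omits the `password` key because Greenplum Database host-based authentication trusts the user (connection values are placeholders):

```scala
// No "password" option: Greenplum Database is configured to not require a
// password for this user, or Kerberos properties are carried in the JDBC URL.
val df = spark.read.format("greenplum")
  .option("url", "jdbc:postgresql://gpmaster:5432/testdb")
  .option("user", "user2")
  .option("dbtable", "otp_c")
  .option("partitionColumn", "airlineid")
  .load()
```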
Resolved Issues
The following issues were resolved in Pivotal Greenplum-Spark Connector 1.2.0:
Bug Id | Summary |
---|---|
154978014 | If a Greenplum Database table contained a column of `time` or `timestamp` type and one of the column’s values specified fractional seconds (for example, 10:36:54.137), the Greenplum-Spark Connector would issue a warning similar to: java.lang.NumberFormatException: For input string: "43.553". This problem has been resolved. |
Known Issues and Limitations
Known issues and limitations related to the 1.2.0 release of the Pivotal Greenplum-Spark Connector include the following:
- The Connector does not yet support writing Spark data back into Greenplum Database.
- The Greenplum-Spark Connector supports basic data types such as Float, Integer, String, and Date/Time. The Connector does not yet support more complex types. See Greenplum Database <-> Spark Data Type Mapping for additional information.