Configuration
Information about the different configuration properties the Spark connector either expects or allows to set.
All couchbase-specific properties start with the spark.couchbase
prefix. They can all be configured on the SparkSession
like the following:
val spark = SparkSession
.builder()
.config("PROPERTY-KEY", "PROPERTY-VALUE")
.getOrCreate()
Required Properties
Three properties are required to get the connector up and running:
Property | Description |
---|---|
spark.couchbase.connectionString |
The connection string / hostnames of the cluster |
spark.couchbase.username |
The name of your RBAC user |
spark.couchbase.password |
The password of your RBAC user |
For example:
val spark = SparkSession
.builder()
.appName("Migration")
.config("spark.couchbase.connectionString", "127.0.0.1")
.config("spark.couchbase.username", "user")
.config("spark.couchbase.password", "pass")
.getOrCreate()
Optional Properties
There are other properties which can be provided that alter the workflow, for example by providing implicit bucket, scope or collection names.
Property | Description |
---|---|
spark.couchbase.implicitBucket |
Used as a bucket, if no explicit name provided on the operation. |
spark.couchbase.implicitScope |
Used as a scope, if no explicit name provided on the operation. |
spark.couchbase.implicitCollection |
Used as a collection, if no explicit name provided on the operation. |
spark.couchbase.waitUntilReadyTimeout |
The time until the SDK waits to make sure all connections are properly established (1 minute default). |
The implicit values are always used if no explicit Keyspace
or option override is provided on an operation.
Dynamic Configuration Properties
In addition to configuring the connector, it is also possible to configure the underlying SDK properties. Usually the SDK provides a rich builder to do so, but since spark only allows to provide string properties the approach is a bit less flexible.
The strategy is similar to configuring the SDK through system properties or connection string values. Properties are taken as a string value and then decoded into the target format.
For example to configure the KeyValue timeout, the following property can be used:
-
Key:
"spark.couchbase.timeout.kvTimeout"
-
Value:
"10s"
Or to set the certificate path:
-
Key:
security.trustCertificate
-
Value:
mycert.pem
Please refer to the SDK documentation for the possible keys or values.
TLS Connections
TLS connections can be confiugred through SDK properties shown above, but there is an alternative way that aligns with configuring TLS in spark itself. The following properties are recognized and if enabled used to connect to a Couchbase cluster encrypted:
Property | Description |
---|---|
spark.ssl.enabled |
if TLS/SSL should be enabled |
spark.ssl.keyStore |
the path to the jvm keystore |
spark.ssl.keyStorePassword |
the password of the jvm keystore |
Note that the prefix for these properties is not spark.couchbase
but spark.ssl
, since they are spark-generic properties.