Getting Started
Step-by-step instructions for installing the Couchbase Elasticsearch Connector.
Pre-requisites
Linux is required for production deployments. macOS is fine for experimentation and development, but is not officially supported.
You will need:
- The latest release of Couchbase Elasticsearch Connector.
- Compatible versions of Java, Elasticsearch, and Couchbase Server.
Couchbase Enterprise Edition is required if you wish to enable secure connections to Couchbase. Likewise, Elasticsearch requires an additional license in order to support secure connections. Trial versions of both are available.
Pre-flight Check
Verify the Elasticsearch cluster is up and running (the default port is 9200).
$ curl localhost:9200
Expected result is something like:
{
"name" : "K3RqW4F",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "Bw-Ta0wDTcekzQIhXZHGkg",
"version" : {
"number" : "5.6.5",
"build_hash" : "6a37571",
"build_date" : "2017-12-04T07:50:10.466Z",
"build_snapshot" : false,
"lucene_version" : "6.6.1"
},
"tagline" : "You Know, for Search"
}
Verify that Couchbase Server is running.
$ curl localhost:8092
Expected result is something like:
{"couchdb":"Welcome","version":"v4.5.1-60-g3cf258d","couchbase":"5.0.2-5506-community"}
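When scripting the pre-flight check, it can help to pull the version number out of the Elasticsearch response, since later configuration choices depend on it. The following is a minimal sketch: the helper name es_version is made up for this example, and it assumes the response has the shape shown above.

```shell
# Hypothetical helper (not part of the connector): read an Elasticsearch
# root-endpoint response on stdin and print the value of its "number" field.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage, assuming Elasticsearch listens on the default port:
#   curl -fsS localhost:9200 | es_version
```

The curl options -f, -s, and -S make the command fail quietly on HTTP errors while still printing connection errors, which suits a script better than the interactive form.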
Installation
Extract the connector distribution archive.
This should give you a directory called couchbase-elasticsearch-connector-<version>.
This directory will be referred to as $CBES_HOME.
Add $CBES_HOME/bin to your PATH.
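As a rough sketch of the PATH step in a shell session: the function name cbes_use_home and the location in the usage comment are illustrative, not something the connector provides.

```shell
# Hypothetical helper: record the connector directory as CBES_HOME and
# prepend its bin directory to PATH (skipped if it is already present).
cbes_use_home() {
  CBES_HOME=$1
  case ":$PATH:" in
    *":$CBES_HOME/bin:"*) ;;              # already on PATH, nothing to do
    *) PATH="$CBES_HOME/bin:$PATH" ;;
  esac
  export CBES_HOME PATH
}

# Usage, after extracting the archive (the location is an example only):
#   cbes_use_home "$HOME/couchbase-elasticsearch-connector-<version>"
```

To make the setting persist across sessions, the call would typically go in your shell's startup file.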
Configuration
Copy $CBES_HOME/config/example-connector.toml to $CBES_HOME/config/default-connector.toml.
The connector commands get their configuration from $CBES_HOME/config/default-connector.toml by default.
You can tell them to use a different config file with the --config <file> command line option.
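The copy step might look like the following sketch. The helper name cbes_init_config is invented for this example; it refuses to overwrite an existing default-connector.toml so that local edits are not lost.

```shell
# Hypothetical helper: create default-connector.toml from the example file,
# leaving any existing copy untouched.
# Argument: config directory (defaults to $CBES_HOME/config).
cbes_init_config() {
  config_dir="${1:-$CBES_HOME/config}"
  if [ -e "$config_dir/default-connector.toml" ]; then
    echo "default-connector.toml already exists; not overwriting" >&2
    return 1
  fi
  cp "$config_dir/example-connector.toml" "$config_dir/default-connector.toml"
}

# Usage:
#   cbes_init_config
# Pointing a command at a different file with the --config option:
#   cbes-checkpoint-backup --config <file> --output <checkpoint.json>
```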
Take a moment to browse the settings available in default-connector.toml.
Make sure the Couchbase and Elasticsearch credentials and hostnames match your environment.
Note that the passwords are stored separately in the $CBES_HOME/secrets directory.
If you’re using Elasticsearch 5.x, replace all instances of the type name _doc with something that doesn’t have a leading underscore.
Starting with Elasticsearch 6 there can be only one mapping type per index.
Consequently, type names are no longer useful, and are scheduled for removal from Elasticsearch.
In the meantime, using the default type name _doc is recommended to help ease the transition, unless you’re still using Elasticsearch 5, which doesn’t allow type names to have leading underscores.
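The type-name rule can be expressed as a small check on the Elasticsearch version. This is only a sketch: the function name suggest_type_name is hypothetical, and "doc" is an arbitrary example of a name without a leading underscore.

```shell
# Hypothetical helper: suggest a type name for a given Elasticsearch version.
# "_doc" follows the recommendation for Elasticsearch 6 and later; "doc" is
# an arbitrary stand-in for Elasticsearch 5, which rejects leading underscores.
suggest_type_name() {
  major=${1%%.*}              # keep only the text before the first dot
  if [ "$major" -ge 6 ]; then
    echo "_doc"
  else
    echo "doc"
  fi
}

# Usage: pick a name, then list the config lines that mention the type name.
#   suggest_type_name 5.6.5
#   grep -n '_doc' "$CBES_HOME/config/default-connector.toml"
```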
The sample config will replicate documents from the Couchbase travel-sample bucket.
Go ahead and install the sample buckets now if you haven’t already.
Controlling the Connector
Distributed Mode
The throughput of the connector is limited by the time it takes for Elasticsearch to index documents. If you determine a single instance of the connector is unable to saturate your Elasticsearch indexing capacity, you can run multiple instances of the connector in distributed mode for horizontal scalability.
A Couchbase bucket consists of many separate partitions known as virtual buckets (often abbreviated as "vbuckets"). When the connector runs in distributed mode, each instance of the connector is responsible for replicating a different subset of the vbuckets.
To run the connector in distributed mode, install the connector on multiple machines.
Make sure the connector configuration is identical on each machine, except for the memberNumber config key, which must be unique within the group.
Set the totalMembers config key to the total number of connector processes in the group.
Make sure to stop all of the connector instances in a group before changing the number of instances in the group.
When a connector instance runs in distributed mode, it replicates from only the vbuckets that correspond to its group membership configuration.
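To make the idea of splitting vbuckets across a group concrete, here is an illustrative sketch. It assumes a bucket with 1024 vbuckets, member numbers that start at 1, and an even contiguous split; the connector's actual assignment scheme is not described here and may differ.

```shell
# Illustration only: print the first and last vbucket (0-based) that one
# member would own if the vbuckets were split into even contiguous ranges.
# Arguments: memberNumber totalMembers [vbucketCount, default 1024]
vbucket_range() (
  member=$1
  total=$2
  vbuckets=${3:-1024}
  first=$(( (member - 1) * vbuckets / total ))
  last=$(( member * vbuckets / total - 1 ))
  echo "$first $last"
)

# Usage:
#   vbucket_range 1 4    # prints "0 255"
#   vbucket_range 4 4    # prints "768 1023"
```

Under this scheme every vbucket belongs to exactly one member, which is why memberNumber must be unique and totalMembers must match the real size of the group.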
Managing Checkpoints
The connector periodically saves its replication state by writing metadata documents to the Couchbase bucket.
These documents have IDs starting with the prefix "_connector:cbes:".
Command line tools are provided to manage the replication checkpoint.
You must stop all connector instances in a group before modifying the replication checkpoint; otherwise the changes will not take effect.
Saving the current replication state
To create a backup of the current state:
cbes-checkpoint-backup --output <checkpoint.json>
This will create a checkpoint document on the local filesystem. On Linux, to include a timestamp in the filename:
cbes-checkpoint-backup \
  --output checkpoint-$(date -u +%Y-%m-%dT%H:%M:%SZ).json
This command is safe to use while the connector is running, and can be triggered from a cron job to create periodic backups.
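For a cron-driven backup, one option is a small wrapper script, since an unescaped % in a crontab entry is treated as a newline and would break the date format. The sketch below is an example under assumptions: the function names and the backup directory are hypothetical.

```shell
# Hypothetical wrapper around cbes-checkpoint-backup for periodic backups.

# Print a filename containing the current UTC time.
checkpoint_filename() {
  echo "checkpoint-$(date -u +%Y-%m-%dT%H:%M:%SZ).json"
}

# Write a timestamped checkpoint backup into the given directory.
backup_checkpoint() {
  backup_dir=$1
  mkdir -p "$backup_dir" || return 1
  cbes-checkpoint-backup --output "$backup_dir/$(checkpoint_filename)"
}

# Usage (the directory is an example only):
#   backup_checkpoint /var/backups/cbes
```

Cron jobs usually run with a minimal PATH, so a script built on this sketch would likely need to add $CBES_HOME/bin to PATH itself.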
Reverting to a saved checkpoint
If you want to rewind the event stream and re-index documents starting from a saved checkpoint, first stop all running connector processes in the connector group. Then run:
cbes-checkpoint-restore --input <checkpoint.json>
The next time you run the connector, it will resume from the checkpoint you just restored.
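Because restoring replaces the saved replication state, a cautious wrapper can check the input file before calling the tool. This is a sketch with a hypothetical function name; it does not verify that the connector processes are stopped, which remains a manual step.

```shell
# Hypothetical wrapper: refuse to restore from a missing or empty file.
restore_checkpoint() {
  checkpoint_file=$1
  if [ ! -s "$checkpoint_file" ]; then
    echo "checkpoint file missing or empty: $checkpoint_file" >&2
    return 1
  fi
  cbes-checkpoint-restore --input "$checkpoint_file"
}

# Usage:
#   restore_checkpoint <checkpoint.json>
```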
Resetting the connector
If you want to discard all replication state and start streaming from the beginning, first stop all of the connector processes, then run:
cbes-checkpoint-clear
Or, if you want to reset the connector so it starts from the current state of the bucket:
cbes-checkpoint-clear --catch-up