Getting Started
A brief overview of the various ways to run the Couchbase Elasticsearch Connector, followed by step-by-step instructions for installing the connector in solo or distributed mode.
Operational Modes
The connector can be deployed in different modes depending on the requirements of your project.
Solo
In the simplest mode, the connector runs as a single standalone Java process. This is referred to as "solo" mode. Solo mode is appropriate for experimentation and low-traffic environments.
If this is your first time working with the connector, we recommend starting with solo mode.
Distributed
If a single connector process cannot handle all of your traffic, multiple connector processes can be deployed in "distributed" mode. In this mode, each process is manually configured to handle only a subset of the replication workload. Distributed mode can scale to handle high volumes of traffic, but it is inflexible: adding a process to a distributed connector group requires stopping and reconfiguring all of the processes in the group.
Solo mode is effectively the same as distributed mode with a group size of 1.
Autonomous Operations
For scalable environments that require high availability and centralized management, you can run the connector in "autonomous operations" (AO) mode. This mode is similar to distributed mode in that each process handles a subset of the replication workload, but improves upon it by using a HashiCorp Consul cluster to coordinate the activities of the connector processes. This enables connector processes to dynamically join or leave the group, and allows an administrator to reconfigure the group on the fly without needing to shut down all of the processes.
AO mode is discussed in more detail in the Autonomous Operations guide. The page you’re reading now is focused on solo and distributed mode; we recommend becoming familiar with these modes before progressing to AO mode.
Prerequisites
Linux is required for production deployments. macOS is fine for experimentation and development, but is not officially supported.
To deploy the connector in solo or distributed mode, you will need:
- The latest release of Couchbase Elasticsearch Connector.
- Compatible versions of Java, Elasticsearch, and Couchbase Server.
Couchbase Enterprise Edition is required if you wish to enable secure connections to Couchbase. Likewise, Elasticsearch versions earlier than 6.8 (in the 6.x series) or 7.1 (in the 7.x series) require an additional license in order to support secure connections. Trial versions of both are available.
Pre-flight Check
Verify the Elasticsearch cluster is up and running (the default port is 9200).
$ curl localhost:9200
Expected result is something like:
{
  "name" : "K3RqW4F",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "Bw-Ta0wDTcekzQIhXZHGkg",
  "version" : {
    "number" : "5.6.5",
    "build_hash" : "6a37571",
    "build_date" : "2017-12-04T07:50:10.466Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}
Verify that Couchbase Server is running.
$ curl localhost:8092
Expected result is something like:
{"couchdb":"Welcome","version":"v4.5.1-60-g3cf258d","couchbase":"5.0.2-5506-community"}
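The two checks above can be combined into one script that reports on both services. This is a minimal sketch, assuming neither endpoint requires authentication; the names ES_URL, CB_URL, check_service, and preflight are hypothetical and not part of the connector.

```shell
#!/usr/bin/env bash
# Pre-flight sketch: report whether each service answers on its HTTP port.
# ES_URL and CB_URL are hypothetical variable names; their defaults are the
# addresses used in the manual checks.
ES_URL="${ES_URL:-http://localhost:9200}"
CB_URL="${CB_URL:-http://localhost:8092}"

# check_service NAME URL
# Returns 0 if URL answers with a successful HTTP status within 5 seconds.
check_service() {
  local name="$1" url="$2"
  if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
    echo "OK: $name is responding at $url"
  else
    echo "FAILED: $name is not responding at $url" >&2
    return 1
  fi
}

# preflight runs both checks and returns non-zero if either one failed.
preflight() {
  local status=0
  check_service "Elasticsearch" "$ES_URL" || status=1
  check_service "Couchbase Server" "$CB_URL" || status=1
  return "$status"
}
```

Calling preflight after sourcing the script prints one line per service, and its exit status tells a calling script whether it is safe to continue.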
Installation
Extract the connector distribution archive.
This should give you a directory called couchbase-elasticsearch-connector-<version>.
This directory will be referred to as $CBES_HOME.
Add $CBES_HOME/bin to your PATH.
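For the current shell session, the PATH change might look like the following sketch. The extraction location under $HOME is an assumption, and the <version> placeholder is left as-is; substitute the directory you actually extracted.

```shell
# Assumed extraction location; replace it with the directory created when
# you extracted the archive (the <version> placeholder is intentional).
CBES_HOME="${CBES_HOME:-$HOME/couchbase-elasticsearch-connector-<version>}"
export CBES_HOME
export PATH="$CBES_HOME/bin:$PATH"

# Confirm that the connector's bin directory is now on the PATH.
case ":$PATH:" in
  *":$CBES_HOME/bin:"*) echo "PATH includes $CBES_HOME/bin" ;;
  *) echo "PATH is missing $CBES_HOME/bin" >&2 ;;
esac
```

To make the change persistent, the two export lines can be added to your shell's startup file.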
Configuration
Copy $CBES_HOME/config/example-connector.toml to $CBES_HOME/config/default-connector.toml.
The connector commands get their configuration from $CBES_HOME/config/default-connector.toml by default.
You can tell them to use a different config file with the --config <file> command line option.
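The copy step can be scripted so that it never overwrites a config file you have already edited. This is only a sketch; copy_default_config is a hypothetical helper name, not a connector command.

```shell
# copy_default_config HOME_DIR
# Copies the example config to the default location under HOME_DIR/config,
# unless a default config already exists there.
copy_default_config() {
  local home="$1"
  local src="$home/config/example-connector.toml"
  local dst="$home/config/default-connector.toml"
  if [ -e "$dst" ]; then
    echo "Keeping existing $dst" >&2
    return 0
  fi
  cp "$src" "$dst"
}
```

Running copy_default_config "$CBES_HOME" performs the copy for the installation described above.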
Take a moment to browse the settings available in default-connector.toml.
Make sure the Couchbase and Elasticsearch credentials and hostnames match your environment.
Note that the passwords are stored separately in the $CBES_HOME/secrets directory.
If you’re using Elasticsearch 5.x, replace all instances of the type name _doc with something that doesn’t have a leading underscore.
Starting with Elasticsearch 6 there can be only one mapping type per index.
Consequently, type names are no longer useful, and are scheduled for removal from Elasticsearch.
In the meantime, using the default type name _doc is recommended to help ease the transition, unless you’re still using Elasticsearch 5, which doesn’t allow type names to have leading underscores.
The sample config will replicate documents from the Couchbase travel-sample bucket.
Go ahead and install the sample buckets now if you haven’t already.
Controlling the Connector
Distributed Mode
The throughput of the connector is limited by the time it takes for Elasticsearch to index documents. If you determine a single instance of the connector is unable to saturate your Elasticsearch indexing capacity, you can run multiple instances of the connector in distributed mode for horizontal scalability.
A Couchbase bucket consists of many separate partitions (also known as virtual buckets, abbreviated as "vbuckets"). When the connector runs in distributed mode, each instance of the connector is responsible for replicating a different subset of the partitions.
To run the connector in distributed mode, install the connector on multiple machines.
Make sure the connector configuration is identical on each machine, except for the memberNumber config key, which must be unique within the group.
Set the totalMembers config key to the total number of connector processes in the group.
Make sure to stop all of the connector instances in a group before changing the number of instances in the group.
When a connector instance runs in distributed mode, it replicates from only the partitions that correspond to its group membership configuration.
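Because the group membership is configured by hand, a small sanity check before starting the processes can catch duplicated member numbers. The sketch below uses a hypothetical helper, validate_group, and assumes member numbers run from 1 to totalMembers; that range is an assumption, so confirm it against the comments in your config file.

```shell
# validate_group TOTAL_MEMBERS MEMBER_NUMBER...
# Succeeds only if exactly TOTAL_MEMBERS member numbers are given, each is
# unique, and (assumption) each lies between 1 and TOTAL_MEMBERS.
validate_group() {
  local total="$1"; shift
  if [ "$#" -ne "$total" ]; then
    echo "expected $total member numbers, got $#" >&2
    return 1
  fi
  local seen=" " n
  for n in "$@"; do
    if [ "$n" -lt 1 ] || [ "$n" -gt "$total" ]; then
      echo "memberNumber $n is outside 1..$total" >&2
      return 1
    fi
    case "$seen" in
      *" $n "*) echo "memberNumber $n is used more than once" >&2; return 1 ;;
    esac
    seen="$seen$n "
  done
  echo "group of $total looks consistent"
}
```

For example, validate_group 3 1 2 3 succeeds, while validate_group 3 1 2 2 fails because two processes would claim the same member number.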
Managing Checkpoints
The connector periodically saves its replication state by writing metadata documents to the Couchbase bucket.
These documents have IDs starting with "_connector:cbes:".
Command line tools are provided to manage the replication checkpoint.
You must stop all connector instances in a group before modifying the replication checkpoint, otherwise the changes will not take effect. (This restriction does not apply when running in Autonomous Operations mode.)
The following commands are specific to the solo and distributed modes. Autonomous Operations mode has its own separate commands for managing checkpoints.
Saving the current replication state
To create a backup of the current state:
cbes-checkpoint-backup --output <checkpoint.json>
This will create a checkpoint document on the local filesystem. On Linux, to include a timestamp in the filename:
cbes-checkpoint-backup \
  --output checkpoint-$(date -u +%Y-%m-%dT%H:%M:%SZ).json
This command is safe to use while the connector is running, and can be triggered from a cron job to create periodic backups.
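A cron job would typically call a small wrapper script rather than the command itself. The following is a sketch only: BACKUP_DIR, checkpoint_filename, and backup_checkpoint are hypothetical names, and the backup directory is an assumed location.

```shell
# Hypothetical backup directory; adjust it to your environment.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/cbes}"

# checkpoint_filename prints a name of the form
# checkpoint-<UTC timestamp>.json, using the same timestamp format as the
# backup command shown in this section.
checkpoint_filename() {
  echo "checkpoint-$(date -u +%Y-%m-%dT%H:%M:%SZ).json"
}

# backup_checkpoint writes a timestamped checkpoint into BACKUP_DIR.
# It requires cbes-checkpoint-backup to be on the PATH.
backup_checkpoint() {
  mkdir -p "$BACKUP_DIR" || return 1
  cbes-checkpoint-backup --output "$BACKUP_DIR/$(checkpoint_filename)"
}
```

The wrapper only defines functions; a cron entry would source it and call backup_checkpoint at whatever interval suits your recovery needs.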
Reverting to a saved checkpoint
If you want to rewind the event stream and re-index documents starting from a saved checkpoint, first stop all running connector processes in the connector group. Then run:
cbes-checkpoint-restore --input <checkpoint.json>
The next time you run the connector, it will resume from the checkpoint you just restored.
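If backups are named with the UTC timestamp format shown earlier, the most recent one can be located by name alone. This is a sketch; latest_checkpoint is a hypothetical helper and the directory it searches is whatever location you chose for backups.

```shell
# latest_checkpoint DIR
# Prints the newest checkpoint-*.json in DIR. The timestamps in the file
# names sort lexicographically, so the last name in glob order is the most
# recent. Returns non-zero if DIR holds no checkpoint files.
latest_checkpoint() {
  local dir="$1" newest="" f
  for f in "$dir"/checkpoint-*.json; do
    [ -e "$f" ] || continue   # the glob matched nothing
    newest="$f"
  done
  [ -n "$newest" ] && echo "$newest"
}
```

After stopping all connector processes in the group, the result can be passed to cbes-checkpoint-restore with the --input option.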
Resetting the connector
If you want to discard all replication state and start streaming from the beginning, first stop all of the connector processes, then run:
cbes-checkpoint-clear
Or, if you want to reset the connector so it starts from the current state of the bucket:
cbes-checkpoint-clear --catch-up
What’s Next?
After successfully deploying the connector in solo or distributed mode, you’re ready to dive into the Autonomous Operations guide.