Consistency
A mechanism to ensure that the Full Text Search (FTS) index can obtain the most up-to-date version of the document written to a collection or a bucket.
The consistency mechanism provides Consistency Vectors as objects in the search query that ensures FTS index searches all your last data written to the vBucket.
The search service does not respond to the query until the designated vBucket receives the correct sequence number.
The search query remains blocked while continuously polling the vBucket for the requested data. Once the sequence number of the data is obtained, the query is executed over the data written to the vBucket.
When using this consistency mode, the query service will ensure that the indexes are synchronized with the data service before querying.
Workflow to understand Consistency
-
Create an FTS index in Couchbase.
-
Write a document to the Couchbase cluster.
-
Couchbase returns the associate vector to the app, which needs to issue a query request with the vector.
-
The FTS index starts searching the data written to the vBucket.
In this workflow, it is possible that the document written to vBucket is not yet indexed. So, when FTS starts searching that document, the most up-to-date document versions are not retrieved, and only the indexed versions are queried.
Therefore, the Couchbase server provides a consistency mechanism to overcome this issue and ensures that the FTS index can search the most up-to-date document written to vBucket.
Consistency Level
The consistency level is a parameter that either takes an empty string indicating the unbounded (not_bounded) consistency or at_plus indicating the bounded consistency.
at_plus
Executes the query, requiring indexes first to be updated to the timestamp of the last update.
This implements bounded consistency. The request includes a scan_vector parameter and value, which is used as a lower bound. This can be used to implement read-your-own-writes (RYOW).
If index-maintenance is running behind, the query waits for it to catch up.
not_bounded
Executes the query immediately, without requiring any consistency for the query. No timestamp vector is used in the index scan.
This is the fastest mode, because it avoids the costs of obtaining the vector and waiting for the index to catch up to the vector.
If index-maintenance is running behind, out-of-date results may be returned.
Consistency Vectors
The consistency vectors supporting the consistency mechanism in Couchbase contain the mapping of the vbucket and sequence number of the data stored in the vBucket.
For more information about consistency mechanism, see Consistency
Example
{
"ctl": {
"timeout": 10000,
"consistency": {
"vectors": {
"index1": {
"607/205096593892159": 2,
"640/298739127912798": 4
}
},
"level": "at_plus"
}
},
"query": {
"match": "jack",
"field": "name"
}
}
In the example above, this is the set of consistency vectors.
"index1": { "607/205096593892159": 2, "640/298739127912798": 4 }
The query is looking within the FTS index "index1" - for:
-
vbucket 607 (with UUID 205096593892159) to contain sequence number 2
-
vbucket 640 (with UUID 298739127912798) to contain sequence number 4
Consistency Timeout
It is the amount of time (in milliseconds) the search service will allow for a query to execute at an index partition level.
If the query execution surpasses this timeout
value, the query is canceled. However, at this point if some of the index partitions have responded, you might see partial results, otherwise no results at all.
{
"ctl": {
"timeout": 10000,
"consistency": {
"vectors": {
"index1": {
"607/205096593892159": 2,
"640/298739127912798": 4
}
},
"level": "at_plus"
}
},
"query": {
"match": "jack",
"field": "name"
}
}
The "Complete" option
The complete option allows you to set the query result as "complete" which indicates that if any of the index partitions are unavailable due to the node not being reachable, the query will display an error in response instead of partial results.
Consistency Tips and Recommendations
Consistency vectors provide 'read your own writes' functionality where the read operation waits for a specific time until the write operation is finished.
When users know that their queries are complex which require more time in completing the write operations, they can set the timeout value higher than the default timeout of 10 seconds so that consistency can be obtained in the search operations.
However, if this consistency is not required, the users can optimize their search operations by using the default timeout of 10 seconds.