Change History
When Magma storage is used for a bucket, the changes made to documents within the bucket’s collections can be recorded in a change history.
Understanding Change History
When Magma storage is used for a bucket, the changes made to documents within the bucket’s collections can be recorded in a change history. The change history resides on disk, and its capacity is specified by the administrator. When the change history is full, the oldest records are automatically removed by compaction, to make space for new records.
Change history can be either on or off. This is determined at bucket-level: if change history is on, it is potentially active for all collections in the bucket.
The size of the change history can be specified both in seconds and in bytes. If either or both are configured, change history is on:
- The number of seconds establishes a time window, extending from the present into the past. Changes recorded prior to the earliest point specified by the current time window are removed.
- The number of bytes establishes a physical size for the change history. If this size is exceeded, records are removed, starting with the earliest changes.
The minimum size for the change history is 2 GiB (which would be specified as `2147483648`).
The maximum is 2^64 − 1 bytes, approximately 16 EiB (which would be specified as `18446744073709551615`).
If a positive integer outside this range is specified, an error is flagged, no file-size is established, and change history remains disabled for the bucket.
If the default value of zero is retained for both seconds and bytes, the recording of change history is disabled.
Note that the change history is created per bucket, and is replicated in accordance with the number of intra-cluster replicas defined for the bucket. Therefore, if two replicas are configured, and the specified maximum size is 2 GiB, the total size used for the change history across the cluster becomes 6 GiB.
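The replica arithmetic above can be checked with a short shell sketch. The function name `history_total_bytes` is hypothetical, and the calculation assumes that each copy of the data (one active copy plus one per replica) can grow to the configured maximum.

```shell
# Hypothetical helper: upper bound on change-history size across the cluster.
#   $1 = configured maximum in bytes (the value given for historyRetentionBytes)
#   $2 = number of replicas configured for the bucket
history_total_bytes() {
    # One active copy plus one copy per replica.
    echo $(( $1 * ($2 + 1) ))
}

# 2 GiB maximum with two replicas: prints 6442450944 (6 GiB).
history_total_bytes 2147483648 2
```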
Change-History Management
The following interfaces are provided for configuring change history.
Change-History Enablement
Change history can be enabled with either:
- The REST API, using `POST /pools/default/buckets` (to create a bucket) or `POST /pools/default/buckets/<bucketName>` (to edit a bucket); specifying one or both of the parameters `historyRetentionBytes` and `historyRetentionSeconds`. See Creating and Editing Buckets.
- The CLI, using `couchbase-cli bucket-create` (to create a bucket) or `couchbase-cli bucket-edit` (to edit a bucket); specifying one or both of the parameters `history-retention-bytes` and `history-retention-seconds`. See bucket-create and bucket-edit.
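As a minimal sketch of the REST option, the following shell functions build the form body and send it to the bucket-edit endpoint. The host, credentials, and function names are assumptions for illustration (the host and credentials match the examples used elsewhere on this page). Whether the endpoint requires further parameters on a given server version should be checked against Creating and Editing Buckets.

```shell
# Assumed connection details, for illustration only.
CB_HOST="http://localhost:8091"
CB_AUTH="Administrator:password"

# Hypothetical helper: build the form-encoded body.
#   $1 = history size in bytes, $2 = history window in seconds
history_form() {
    echo "historyRetentionBytes=$1&historyRetentionSeconds=$2"
}

# Hypothetical helper: edit an existing bucket so that change history is on.
#   $1 = bucket name, $2 = bytes, $3 = seconds
# Defined but not called here, since it needs a running cluster.
enable_history() {
    curl -sS -X POST "$CB_HOST/pools/default/buckets/$1" \
        -u "$CB_AUTH" \
        -d "$(history_form "$2" "$3")"
}

# Show the body for a 2 GiB size limit and a 24-hour window.
history_form 2147483648 86400
```

Against a live cluster, the call would be `enable_history magmaBucket 2147483648 86400`.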
Default Collection-Setting
If change history for a bucket is switched on, the recording of changes can be performed for collections within the bucket.
A default setting, controlling whether changes are recorded for all collections within the bucket, can be established when a bucket is either created or edited.
This setting itself has a default value of `true`, which means that change history is recorded for every collection unless the setting is overridden when an individual collection is created or edited.
This bucket-wide default setting can be established with either:
- The REST API, using `POST /pools/default/buckets` (to create a bucket) or `POST /pools/default/buckets/<bucketName>` (to edit a bucket); specifying the parameter `historyRetentionCollectionDefault`. See Creating and Editing Buckets.
- The CLI, using `couchbase-cli bucket-create` (to create a bucket) or `couchbase-cli bucket-edit` (to edit a bucket); specifying the parameter `enable-history-retention-by-default`. See bucket-create and bucket-edit.
Individual Collection-Setting
Change history can be switched off or on for each individual collection, overriding the established, bucket-wide default. This setting can be made either when the collection is created, or subsequently, by editing the collection’s configuration. Administrators can therefore choose which collections have their changes recorded, and which do not.
The setting can be established with either:
- The REST API, using `POST /pools/default/buckets/<bucket_name>/scopes/<scope_name>/collections` (to create a collection) or `PATCH /pools/default/buckets/<bucket_name>/scopes/<scope_name>/collections/<collection_name>` (to edit a collection); specifying the parameter `history`. See Creating and Editing a Collection.
- The CLI, using `couchbase-cli collection-manage` (to create or edit a collection); specifying the parameter `enable-history-retention`. See collection-manage.
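The per-collection override through the REST API might look like the following sketch. The scope and collection names (`scope1`, `collection1`), the host, the credentials, and the function names are hypothetical. The sketch also assumes that `history` takes the values `true` and `false`, matching the status values reported by `cbstats`.

```shell
# Assumed connection details, for illustration only.
CB_HOST="http://localhost:8091"
CB_AUTH="Administrator:password"

# Hypothetical helper: build the collection-edit URL.
#   $1 = bucket, $2 = scope, $3 = collection
collection_url() {
    echo "$CB_HOST/pools/default/buckets/$1/scopes/$2/collections/$3"
}

# Hypothetical helper: switch change history on or off for one collection.
#   $4 = true or false (assumed value format)
# Defined but not called here, since it needs a running cluster.
set_collection_history() {
    curl -sS -X PATCH "$(collection_url "$1" "$2" "$3")" \
        -u "$CB_AUTH" \
        -d "history=$4"
}

# Show the URL that would be used.
collection_url magmaBucket scope1 collection1
```

Against a live cluster, `set_collection_history magmaBucket scope1 collection1 false` would switch recording off for that one collection while leaving the bucket-wide default unchanged.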
Change-History Status and Content-Access
The change-history status of each collection can be returned by means of the `cbstats` tool, with the arguments `collections` and `collections-details`.
Output indicates a status of `true` (change history is on) or `false` (change history is off).
See the collections and collections-details reference pages for cbstats.
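A status check could be scripted along the following lines. The install path, the data-service port 11210, the credentials, and the function names are assumptions. The filter simply keeps any output line that mentions history, because the exact layout of the `cbstats` output is not reproduced here.

```shell
# Assumed default install location on Linux; adjust as needed.
CBSTATS="/opt/couchbase/bin/cbstats"

# Hypothetical helper: keep only lines on stdin that mention history.
history_lines() {
    grep -i 'history'
}

# Hypothetical helper: print the change-history lines for every collection.
#   $1 = bucket name
# Defined but not called here, since it needs a running cluster.
collection_history_status() {
    "$CBSTATS" localhost:11210 -u Administrator -p password -b "$1" collections \
        | history_lines
}
```

Against a live cluster, the call would be `collection_history_status magmaBucket`.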
To access the change history and examine its contents, use the Kafka 4.1 Connector.
Change-History Statistics
Statistics are provided to allow change history to be monitored and managed.
Statistics Provided by cbstats
The `cbstats` tool, using the `all` argument, returns statistics that include the following:
- `ep_total_enqueued` and `ep_total_deduplicated`. These respectively provide the total number of checkpointed items, and the number of items discarded due to deduplication. The numbers correspond to items for the entire bucket, and include both active and replica vBuckets.

  If deduplication is switched off, `ep_total_enqueued` represents all checkpointed items. In such a case, if the bucket has one replica, and the value of `ep_total_enqueued` is n, the number of writes enqueued to the change history is n/2.

  If deduplication is switched on, the statistic `ep_total_deduplicated` represents the number of items removed from checkpoints for the whole bucket, through deduplication, while the bucket has been up.

  By inspecting system performance (in terms of CPU, memory, disk, and measured latency) when deduplication is respectively on and off, insight can be gained into how optimizations can be achieved. Note that generally speaking, deduplication results in a greater use of CPU, memory, and disk, due to the key-handling requirement for deduplicated items.
- `ep_total_deduplicated_flusher`. When items are to be flushed (persisted), the contents of multiple checkpoints are batched, so that all can be flushed. If deduplication is switched on, the checkpoints may already have had items removed through deduplication. However, a further check is made across checkpoints, prior to writes being persisted, to find and remove duplicate items. The total number of items removed from the flusher in this way is represented by `ep_total_deduplicated_flusher`.
- `history_start_seqno`. The document seqno that represents the starting point of the history window. Hence, all documents having a higher seqno will be retained. As more and more data is removed from the window, this seqno will be incremented, to correspond to a later document.
- `history_disk_size`. On-disk compressed size (after Magma’s block compression and per document compression) of the fragmented and unfragmented data that lies within the history window, for all vBuckets on the node.
For more information, see the reference page for `cbstats all`.
Note that plotting some of these statistics over time, with a tool such as Grafana, may produce more insight than the inspection of an individual figure.
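The n/2 relationship described for `ep_total_enqueued` can be written as a small calculation. The function name is hypothetical, and the generalization from one replica to any number of replicas (dividing by replicas plus one) is an assumption that extends the single-replica case stated above. It applies only when deduplication is switched off.

```shell
# Hypothetical helper: estimate writes enqueued to the change history.
#   $1 = value of ep_total_enqueued (active plus replica vBuckets)
#   $2 = number of replicas configured for the bucket
# Assumes deduplication is off and items are spread evenly across copies.
history_writes() {
    # Integer division: each write is counted once per copy of the data.
    echo $(( $1 / ($2 + 1) ))
}

# One replica and ep_total_enqueued of 1000: prints 500.
history_writes 1000 1
```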
Statistics Provided for REST and Prometheus
Metrics are provided for retrieval by either the REST API or Prometheus. See Statistics for information on statistics-retrieval, and the Metrics Reference for a complete list of available statistics.
In the following descriptions, the history window refers to the time or space specified by the administrator for the change history (by means of, say, the parameters `historyRetentionBytes` and `historyRetentionSeconds`, used with the REST API).
- `kv_total_enqueued`, `kv_total_deduplicated`, and `kv_total_deduplicated_flusher`. These have the same significance as their equivalents provided by `cbstats`, described immediately above.
- `kv_ep_magma_history_time_evicted_bytes`. The total amount of data (in bytes) so far removed from the change history, due to the limit established with `historyRetentionSeconds` (REST API) or `history-retention-seconds` (CLI).
- `kv_ep_magma_history_size_evicted`. The total amount of data (in bytes) so far removed from the change history, due to the limit established with `historyRetentionBytes` (REST API) or `history-retention-bytes` (CLI).
- `kv_ep_db_history_file_size`. On-disk compressed size (after Magma’s block compression and per document compression) of the fragmented and unfragmented data that lies within the history window.
- `kv_vb_max_history_disk_size_bytes`. Maximum amount of history retained across all vBuckets. The size here is on-disk compressed.

  The statistic can be used to return data according to `state`, which can be `active`, `pending`, `replica`, or `dead`. By default, data on all available states is returned. For example:

      curl -X GET "http://localhost:8091/pools/default/stats/range/kv_vb_max_history_disk_size_bytes?bucket=magmaBucket" -u Administrator:password | jq '.'

  Output, in summarized form, is as follows:

      {
        "data": [
          {
            "metric": {
              "nodes": [ "127.0.0.1:8091" ],
              "bucket": "magmaBucket",
              "instance": "kv",
              "name": "kv_vb_max_history_disk_size_bytes",
              "state": "active"
            },
            "values": [
              [ 1679404201, "0" ],
              .
              .
            ]
          },
          {
            "metric": {
              "nodes": [ "127.0.0.1:8091" ],
              "bucket": "magmaBucket",
              "instance": "kv",
              "name": "kv_vb_max_history_disk_size_bytes",
              "state": "pending"
            },
            "values": [
              [ 1679404201, "0" ],
              .
              .
            ]
          },
          {
            "metric": {
              "nodes": [ "127.0.0.1:8091" ],
              "bucket": "magmaBucket",
              "instance": "kv",
              "name": "kv_vb_max_history_disk_size_bytes",
              "state": "replica"
            },
            "values": [
              [ 1679404201, "0" ],
              .
              .
            ]
          }
        ],
        "errors": [],
        "startTimestamp": 1679404201,
        "endTimestamp": 1679404261
      }

  An individual state can be specified with the `state` flag, as follows:

      curl -X GET "http://localhost:8091/pools/default/stats/range/kv_vb_max_history_disk_size_bytes?bucket=magmaBucket&state=replica" -u Administrator:password | jq '.'

  The output from this call will contain only data for the state `replica`.
- `kv_ep_magma_history_logical_disk_size`. Size of fragmented and unfragmented data (after per document compression) that lies within the history window. This, along with `kv_ep_magma_history_logical_data_size`, can be used to compute fragmentation in history.
- `kv_ep_magma_history_logical_data_size`. Size of unfragmented data (after per document compression) that lies within the history window. This, along with `kv_ep_magma_history_logical_disk_size`, can be used to compute fragmentation in history.
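The fragmentation calculation mentioned for the last two metrics is not spelled out on this page. One plausible reading, offered here as an assumption, is the share of the logical disk size that is not live data: (disk size − data size) / disk size. The function name is hypothetical.

```shell
# Hypothetical helper: fragmentation ratio within the history window.
#   $1 = kv_ep_magma_history_logical_disk_size (fragmented + unfragmented)
#   $2 = kv_ep_magma_history_logical_data_size (unfragmented only)
# Assumed formula: (disk - data) / disk, printed to two decimal places.
history_fragmentation() {
    awk -v disk="$1" -v data="$2" 'BEGIN {
        # Avoid division by zero when no history has been recorded yet.
        if (disk <= 0) { print "0.00"; exit }
        printf "%.2f\n", (disk - data) / disk
    }'
}

# 1000 bytes on disk, of which 750 are unfragmented data: prints 0.25.
history_fragmentation 1000 750
```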
See Also
For information on establishing change-history default settings at bucket-creation time, see Creating and Editing Buckets.
For information on switching on or off change history for a specific collection, see Creating and Editing a Collection.
To examine the change-history status for each collection in a bucket, see the `collections` option for `cbstats`.