Migrating from SDK 2 to SDK 3 API
The 3.0 API breaks the existing 2.0 APIs in order to provide a number of improvements. Collections and Scopes are introduced.
The Document class and structure have been completely removed from the API, and the returned value is now a Result.
Retry behaviour is more proactive, and lazy bootstrapping moves all error handling to a single place.
Individual behaviour changes across services are explained here.
Fundamentals
The Couchbase SDK team takes semantic versioning seriously, which means that the API should not be broken in incompatible ways while staying on a certain major release. This has the benefit that most of the time upgrading the SDK should not cause much trouble, even when switching between minor versions (not just bugfix releases). The downside, though, is that significant improvements to the APIs are very often not possible, except as pure additions, which eventually leads to overloaded methods.
To support new server releases and prepare the SDK for years to come, we have decided to increase the major version of each SDK and take the opportunity to break APIs where we had to. As a result, migration from the previous major version to the new major version will take some time and effort, an effort to be counterbalanced by improvements to coding time, through the simpler API, and performance. The new API builds on years of hands-on experience with the current SDK, with a focus on simplicity, correctness, and performance.
Before this guide dives into the language-specific technical component of the migration, it is important to understand the high level changes first. As a migration guide, this document assumes you are familiar with the previous generation of the SDK and does not re-introduce SDK 2.0 concepts. We recommend familiarizing yourself with the new SDK first by reading at least the getting started guide, and browsing through the other chapters a little.
Terminology
The concepts of a Cluster and a Bucket remain the same, but a fundamental new layer is introduced into the API: Collections and their Scopes.
Collections are logical data containers inside a Couchbase bucket that let you group similar data just like a Table does in a relational database — although documents inside a collection do not need to have the same structure.
Scopes allow the grouping of collections into a namespace, which is very useful when you have multiple tenants accessing the same bucket.
Couchbase Server includes support for collections as a developer preview in version 6.5, and as a first-class concept of the programming model from version 7.0.
Note that the SDKs include the feature from SDK 3.0, to allow easier migration.
In the previous SDK generation, particularly with the KeyValue API, the focus has been on the codified concept of a Document.
Documents were read and written and had a certain structure, including the id/key, content, expiry (ttl), and so forth.
While the server still operates on the logical concept of documents, we found that this model in practice didn’t work so well for client code in certain edge cases.
As a result, we have removed the Document class/structure completely from the API.
The new API follows a clear scheme: each command takes required arguments explicitly, and an option block for all optional values.
The returned value is always of type Result.
This avoids method overloading bloat in certain languages, and has the added benefit of making it easy to grasp APIs evenly across services.
As an example here is a KeyValue document fetch:
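from datetime import timedelta
from couchbase.cluster import Cluster, ClusterOptions
from couchbase.auth import PasswordAuthenticator
from couchbase.collection import GetOptions
cluster = Cluster("couchbases://10.192.1.104", ClusterOptions(PasswordAuthenticator("username", "password")))
collection = cluster.bucket("default").default_collection()
get_result = collection.get("key", GetOptions(timeout=timedelta(seconds=3)))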
Compare this to a N1QL query:
from couchbase.cluster import QueryOptions
query_result = cluster.query("select 1=1", QueryOptions(timeout=timedelta(seconds=3)))
Since documents also fundamentally handled the serialization aspects of content, two new concepts are introduced: the Serializer and the Transcoder.
Out of the box the SDKs ship with a JSON serializer which handles the encoding and decoding of JSON.
You’ll find that the serializer is exposed in the options for methods like N1QL queries and KeyValue subdocument operations.
The KV API extends the concept of the serializer to the Transcoder.
Since you can also store non-JSON data inside a document, the Transcoder allows the writing of binary data as well.
It handles the object/entity encoding and decoding, and if it happens to deal with JSON, it makes use of the configured Serializer internally.
See the Serialization and Transcoding section below for details.
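To make the layering concrete, here is a minimal sketch of a transcoder that stores bytes as-is and hands everything else to a JSON serializer. The class names `JsonSerializer` and `SketchTranscoder` and the `FMT_*` markers are hypothetical and do not correspond to the SDK's own classes; the sketch only shows how the responsibilities split.

```python
import json
from typing import Any, Tuple

# Hypothetical format markers; the real SDK records the content format in its own way.
FMT_JSON = "json"
FMT_BYTES = "bytes"


class JsonSerializer:
    """Encodes and decodes JSON content, and nothing else."""

    def serialize(self, value: Any) -> bytes:
        return json.dumps(value).encode("utf-8")

    def deserialize(self, data: bytes) -> Any:
        return json.loads(data.decode("utf-8"))


class SketchTranscoder:
    """Turns values into stored bytes and back, delegating JSON work to a serializer."""

    def __init__(self, serializer: JsonSerializer) -> None:
        self.serializer = serializer

    def encode(self, value: Any) -> Tuple[bytes, str]:
        if isinstance(value, (bytes, bytearray)):
            # Binary content is stored as-is and never touches the serializer.
            return bytes(value), FMT_BYTES
        # Everything else is treated as JSON and handed to the serializer.
        return self.serializer.serialize(value), FMT_JSON

    def decode(self, data: bytes, fmt: str) -> Any:
        if fmt == FMT_BYTES:
            return data
        return self.serializer.deserialize(data)


transcoder = SketchTranscoder(JsonSerializer())
payload, fmt = transcoder.encode({"name": "fish"})
print(payload, fmt)  # b'{"name": "fish"}' json
```

Keeping JSON handling in a separate serializer means the same serializer can be reused by services that never deal with binary content, such as query results.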
What to look out for
The SDKs are more proactive in retrying with certain errors and in certain situations, within the timeout budget given by the user (for example, temporary failures or locked documents are now retried by default), making it even easier to program against certain error cases.
This behavior is customizable in a RetryStrategy, which can be overridden on a per-operation basis for maximum flexibility if you need it.
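The idea of retrying "within the timeout budget" can be sketched generically as below. This is not the SDK's RetryStrategy; `retry_within_budget` and `TemporaryFailure` are hypothetical names, and the fixed backoff is an assumption made to keep the example short.

```python
import time
from datetime import timedelta
from typing import Callable, TypeVar

T = TypeVar("T")


class TemporaryFailure(Exception):
    """Hypothetical stand-in for a retryable error such as a temporary failure or a locked document."""


def retry_within_budget(
    operation: Callable[[], T],
    timeout: timedelta,
    backoff: timedelta = timedelta(milliseconds=10),
) -> T:
    """Retry `operation` on TemporaryFailure until it succeeds or the budget runs out.

    Any other exception is treated as non-retryable and propagates immediately.
    """
    deadline = time.monotonic() + timeout.total_seconds()
    while True:
        try:
            return operation()
        except TemporaryFailure:
            # Give up if waiting for the next attempt would overrun the budget.
            if time.monotonic() + backoff.total_seconds() >= deadline:
                raise TimeoutError("timeout budget exhausted while retrying")
            time.sleep(backoff.total_seconds())


attempts = {"count": 0}


def flaky_get() -> str:
    """Fails twice with a retryable error, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TemporaryFailure()
    return "fish"


print(retry_within_budget(flaky_get, timeout=timedelta(seconds=1)))  # fish
```

From the caller's point of view, the two transient failures are invisible: only the final result, or a timeout once the budget is spent, comes back.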
Note that most of the bootstrap sequence is now lazy (happening behind the scenes). For example, opening a bucket no longer raises an error; the error only shows up once you perform an actual operation. The reason behind this is to spare the application developer the work of having to do error handling in more places than needed. A bucket can go down 2ms after you opened it, so you have to handle request failures anyway. By delaying the error into the operation result itself, there is only one place to do the error handling. There will still be situations where you want to check if the resource you are accessing is available before continuing the bootstrap; for this, we have the diagnostics and ping commands at each level, which allow you to perform those checks eagerly.
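The effect of lazy bootstrapping on error handling can be illustrated with a small, self-contained sketch. `LazyBucket` and `unreachable` are hypothetical stand-ins, not SDK classes: opening the handle never fails, and a bootstrap failure surfaces in the same `try`/`except` that already handles request failures.

```python
from typing import Callable, Dict, Optional


class LazyBucket:
    """Hypothetical bucket handle that defers connecting until the first operation."""

    def __init__(self, name: str, connect: Callable[[str], Dict[str, str]]) -> None:
        self.name = name
        self._connect = connect  # stored, not called: opening never raises
        self._store: Optional[Dict[str, str]] = None

    def get(self, key: str) -> str:
        if self._store is None:
            # Bootstrap happens here; on failure _store stays None, so the next call retries.
            self._store = self._connect(self.name)
        return self._store[key]


def unreachable(name: str) -> Dict[str, str]:
    raise ConnectionError(f"bucket {name!r} is not reachable")


bucket = LazyBucket("default", unreachable)  # no error raised yet
try:
    bucket.get("mydoc-id")
except ConnectionError as err:
    # The single place where bootstrap and request failures are handled.
    print(f"operation failed: {err}")
```

The eager checks mentioned above (diagnostics and ping) are not modeled here.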
Language Specifics
Now that you are familiar with the general theme of the migration, the next sections dive deep into the specifics. First, installation and configuration are covered, then we talk about exception handling, and then each service (e.g. Key/Value, Query, …) is covered separately.
Installation and Configuration
The Python SDK 3.x is available for download from the same resources as the previous generation. Builds can be found on PyPI. Please see the Release Notes for up-to-date information.
Python SDK 3.x has a minimum required Python version of 3.5, although we recommend running the latest version (at the time of writing, Python 3.8) with the highest patch version available.
Note that the transitive dependency list has changed. As a refresher, Python SDK 2 depended on the following packages:
- typing
SDK 3 depends on the following ones instead:
- typing (on Python < 3.7)
- typing-extensions (on Python < 3.8)
- boltons
- pyrsistent
Additionally, the following are optionally supported in both SDK 2 and SDK 3:
- Twisted
- gevent
If you are pulling in the SDK through a package manager (recommended), all mandatory dependencies will be resolved for you automatically.
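The version conditions in the SDK 3 list behave like PEP 508 environment markers (for example `typing; python_version < "3.7"`). As an illustration only, not the SDK's actual packaging metadata, the sketch below encodes the list above and computes which packages apply to a given interpreter version; `SDK3_DEPENDENCIES` and `required_packages` are hypothetical names.

```python
from typing import List, Optional, Tuple

# (package, exclusive upper bound on the Python version, or None if always required).
# Mirrors the SDK 3 list above; not read from the SDK's real setup metadata.
SDK3_DEPENDENCIES: List[Tuple[str, Optional[Tuple[int, int]]]] = [
    ("typing", (3, 7)),
    ("typing-extensions", (3, 8)),
    ("boltons", None),
    ("pyrsistent", None),
]


def required_packages(python_version: Tuple[int, int]) -> List[str]:
    """Return the mandatory SDK 3 dependencies for a (major, minor) Python version."""
    return [
        name
        for name, below in SDK3_DEPENDENCIES
        # Tuples compare element-wise, so (3, 6) < (3, 7) is True.
        if below is None or python_version < below
    ]


print(required_packages((3, 6)))  # ['typing', 'typing-extensions', 'boltons', 'pyrsistent']
print(required_packages((3, 8)))  # ['boltons', 'pyrsistent']
```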
Configuring Collections
The fundamental semantics of the Bucket from SDK 2 are analogous to those of the Collection in SDK 3.
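# SDK 2 custom KV timeout (in seconds)
from couchbase.bucket import Bucket
bucket = Bucket("couchbases://127.0.0.1/default")
bucket.timeout = 5
# SDK 3 equivalent
from couchbase.cluster import Cluster, ClusterOptions
from couchbase.auth import PasswordAuthenticator
cluster = Cluster("couchbases://10.192.1.104", ClusterOptions(PasswordAuthenticator("username", "password")))
collection = cluster.bucket("default").default_collection()
collection.timeout = 5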
The default settings can still be customized through the connection string. The SDK parses "flat" string values from the connection string and applies them to the corresponding settings, which means that you can now configure more properties than in SDK 2. Note that the property names have changed.
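To illustrate what "flat" string values in a connection string look like once parsed, here is a minimal sketch using only the standard library. It is not the SDK's parser, and `parse_connection_options` is a hypothetical helper; it only splits the query part into key/value pairs, leaving conversion to typed settings as a separate step.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlsplit


def parse_connection_options(connection_string: str) -> Dict[str, str]:
    """Return the flat key/value settings from the query part of a connection string."""
    # urlsplit separates the query for any scheme, including couchbase:// and couchbases://.
    query = urlsplit(connection_string).query
    # parse_qsl keeps every value as a string; mapping "inout" to a typed setting is a later step.
    return dict(parse_qsl(query))


print(parse_connection_options("couchbases://127.0.0.1?compression=inout"))
# {'compression': 'inout'}
```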
# Will set the compression type to inout
Cluster.connect(
    "couchbases://127.0.0.1?compression=inout",
    ClusterOptions(PasswordAuthenticator("user", "pass")))
# This is equivalent to
collection.compression = COMPRESS_INOUT
See the configuration section for full specifics.
At the end of this guide you’ll find a reference that describes the SDK 2 environment options and their SDK 3 equivalents where applicable.
Authentication
Since SDK 2 supported Couchbase Server clusters older than 5.0, it had to support both Role-Based Access Control (RBAC) and bucket-level passwords. The minimum cluster version supported by SDK 3 is Server 5.0, which means that only RBAC is supported. This is why you can set the username and password when directly connecting:
Cluster.connect(
    "couchbases://127.0.0.1",
    ClusterOptions(PasswordAuthenticator("username", "password")))
The reason why you can pass in a specific authenticator is that you can also use the same approach to configure certificate-based authentication:
import os
cert_dir = os.path.join(os.path.curdir, "cert_dir")
Cluster.connect("couchbases://127.0.0.1", ClusterOptions(
    CertAuthenticator(cert_path=os.path.join(cert_dir, "cert.pem"),
                      key_path=os.path.join(cert_dir, "key.crt"),
                      trust_store_path=os.path.join(cert_dir, "trust_store.pem"))))
Connection Lifecycle
From a high-level perspective, bootstrapping and shutdown are very similar to SDK 2.
Collections will be generally available with an upcoming Couchbase Server release, but the SDK already encodes them in its API to be future-proof.
If you are using a Couchbase Server version which does not support Collections, always use the default_collection() method to access the KV API; it will map to the full bucket.
Also note that you will now find Query, Search, and Analytics at the Cluster level.
This is where they logically belong.
If you are using Couchbase Server 6.5 or later, you will be able to perform cluster-level queries even if no bucket is open.
If you are using an earlier version of the cluster you must open at least one bucket, otherwise cluster-level queries will fail.
Exception Handling
How to handle exceptions is unchanged from SDK 2.
You should still use try/except on the blocking APIs and the corresponding async methods on the other APIs.
There have been changes made in the following areas:
- Exception hierarchy and naming.
- Proactive retry where possible.
Key Value
The Key/Value (KV) API is now located under the Collection interface, so even if you do not use collections, you need to call default_collection() to access it.
The following table describes the SDK 2 KV APIs and where they are now located in SDK 3:
SDK 2 | SDK 3 |
---|---|
In addition, the data structure APIs have been renamed and moved:
SDK 2 | SDK 3 |
---|---|
There are two important API changes:
- On the request side, overloads have been reduced and moved under an Options block.
- On the response side, the return types have been unified.
The signatures now look very similar.
In SDK 3, the get method returns a GetResult, and the upsert method returns a MutationResult.
Each of those results only contains the fields that the specific method can actually return, making it impossible to accidentally try to access the expiry on the Result after a mutation, for example.
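The benefit of per-operation result types can be shown with two hypothetical stand-ins, `SketchGetResult` and `SketchMutationResult`. They are not the SDK's classes and model only a few fields (the mutation result here carries just a CAS value), but they show how a field that an operation cannot return simply does not exist on its result.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass(frozen=True)
class SketchGetResult:
    """Hypothetical read result: content, CAS, and an optional expiry."""
    content: Any
    cas: int
    expiry: Optional[datetime] = None


@dataclass(frozen=True)
class SketchMutationResult:
    """Hypothetical write result: this sketch models only the CAS, and there is no expiry field."""
    cas: int


result = SketchMutationResult(cas=1234)
try:
    result.expiry  # the mistake a single shared Document type would have allowed
except AttributeError:
    print("MutationResult has no expiry")
```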
Optional parameters are now accessible via couchbase.options.OptionBlock derivatives, or via named parameters (with the latter overriding the former).
All required parameters are still part of the method signature, making it clear what is required and what is not (or has default values applied if not overridden).
The timeout can be overridden on every operation and now takes a datetime.timedelta object from the Python standard library.
# SDK 3 custom timeout
from datetime import timedelta
from couchbase.collection import GetOptions
get_result = collection.get(
    "mydoc-id",
    GetOptions(timeout=timedelta(seconds=5)))
assert get_result.content_as[str] == "fish"
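The rule that named parameters override values from an option block can be sketched with a small dataclass. `SketchGetOptions` and `effective_options` are hypothetical and only model a timeout; the SDK's own OptionBlock classes work differently internally. The example also shows that an SDK 2-style timeout in float seconds converts directly to a `timedelta`.

```python
from dataclasses import dataclass, replace
from datetime import timedelta
from typing import Any, Optional


@dataclass(frozen=True)
class SketchGetOptions:
    """Hypothetical option block; only a timeout field is modeled."""
    timeout: Optional[timedelta] = None


def effective_options(block: Optional[SketchGetOptions] = None, **overrides: Any) -> SketchGetOptions:
    """Merge an option block with keyword arguments, letting the keywords win."""
    base = block if block is not None else SketchGetOptions()
    # Ignore keywords left at None so they do not erase values set in the block.
    explicit = {name: value for name, value in overrides.items() if value is not None}
    # dataclasses.replace raises TypeError for option names the block does not define.
    return replace(base, **explicit)


# An SDK 2-style timeout in float seconds converts directly to a timedelta.
legacy_timeout_seconds = 2.5
opts = effective_options(
    SketchGetOptions(timeout=timedelta(seconds=5)),
    timeout=timedelta(seconds=legacy_timeout_seconds),
)
print(opts.timeout.total_seconds())  # 2.5
```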
In SDK 2, the get_from_replica method had a ReplicaMode argument which allowed you to customize how many replicas should be reached.
We have identified this as a potential source of confusion and as a result split it into two methods that simplify usage significantly.
There is now a get_all_replicas method and a get_any_replica method.
- get_all_replicas asks the active node and all available replicas and returns the results as a stream.
- get_any_replica uses get_all_replicas and returns the first result obtained.
Unless you want to build some kind of consensus between the different replica responses, we recommend get_any_replica for a fallback to a regular get when the active node times out.
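The "first result from a stream of copies" behavior and the recommended fallback can be sketched with the standard library. `read_any` and `get_with_replica_fallback` are hypothetical helpers, not SDK methods, and the reader functions stand in for requests to the active node and its replicas.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List


def read_any(readers: List[Callable[[], str]]) -> str:
    """Query every copy in parallel and return the first successful answer."""
    pool = ThreadPoolExecutor(max_workers=len(readers))
    try:
        futures = [pool.submit(reader) for reader in readers]
        errors: List[Exception] = []
        # as_completed yields futures in the order they finish, like a result stream.
        for future in as_completed(futures):
            try:
                return future.result()
            except Exception as exc:  # this copy failed; keep waiting for the others
                errors.append(exc)
        raise LookupError(f"no copy returned a result: {errors!r}")
    finally:
        # Do not block on slower copies once an answer is available.
        pool.shutdown(wait=False)


def get_with_replica_fallback(active_get: Callable[[], str], replica_readers: List[Callable[[], str]]) -> str:
    """Try the active copy first and fall back to any replica only on a timeout."""
    try:
        return active_get()
    except TimeoutError:
        return read_any(replica_readers)


def slow_copy() -> str:
    time.sleep(0.3)
    return "slow"


def fast_copy() -> str:
    time.sleep(0.01)
    return "fast"


def timed_out_active() -> str:
    raise TimeoutError("active node did not answer")


print(get_with_replica_fallback(timed_out_active, [slow_copy, fast_copy]))  # fast
```

Building consensus instead would mean consuming every result from the stream rather than returning on the first one.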
Query
N1QL querying is now available at the Cluster level instead of the bucket level, because you can also write N1QL queries that span multiple buckets. Compare a simple N1QL query from SDK 2 with its SDK 3 equivalent:
# SDK 2 simple query
from couchbase.n1ql import N1QLQuery
query_result = bucket.n1ql_query(N1QLQuery("select * from `travel-sample` limit 10"))
for row in query_result:
    print(row)
# SDK 3 simple query
query_result = cluster.query("select * from `travel-sample` limit 10")
for row in query_result:
    print(row)
The following shows how to do named and positional parameters in SDK 2, and their SDK 3 counterparts:
# SDK 2 named parameters
from couchbase.n1ql import N1QLQuery
bucket.n1ql_query(N1QLQuery(
    "select * from bucket where type = $type",
    type="airport"))
# SDK 2 positional parameters
bucket.n1ql_query(N1QLQuery(
    "select * from bucket where type = $1",
    "airport"))
# SDK 3 named parameters
from couchbase.cluster import QueryOptions
cluster.query(
    "select * from bucket where type = $type",
    QueryOptions(named_parameters={"type": "airport"}))
# SDK 3 positional parameters
cluster.query(
    "select * from bucket where type = $1",
    QueryOptions(positional_parameters=["airport"]))
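Whichever SDK generation you use, the parameters end up in the request sent to the Query service, where positional values travel in an `args` array and named values as `$`-prefixed keys. The sketch below builds such a JSON body by hand to show the difference; `build_query_request` is a hypothetical helper, and the SDK assembles this request for you, so its exact encoding may differ.

```python
import json
from typing import Any, Dict, List, Optional


def build_query_request(
    statement: str,
    named: Optional[Dict[str, Any]] = None,
    positional: Optional[List[Any]] = None,
) -> str:
    """Build a JSON request body carrying named and/or positional query parameters."""
    body: Dict[str, Any] = {"statement": statement}
    if positional:
        # Positional values fill $1, $2, ... in order.
        body["args"] = list(positional)
    for name, value in (named or {}).items():
        # Named values are sent with a leading '$', matching $type in the statement.
        key = name if name.startswith("$") else "$" + name
        body[key] = value
    return json.dumps(body)


print(build_query_request("select * from bucket where type = $type", named={"type": "airport"}))
# {"statement": "select * from bucket where type = $type", "$type": "airport"}
print(build_query_request("select * from bucket where type = $1", positional=["airport"]))
# {"statement": "select * from bucket where type = $1", "args": ["airport"]}
```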
Analytics
Analytics querying, like N1QL, is also moved to the Cluster level: it is now accessible through the Cluster.analytics_query method.
As with the Query service, parameters for the Analytics queries have moved into the AnalyticsOptions:
# SDK 3 simple analytics query
analytics_result = cluster.analytics_query("select * from dataset")
for row in analytics_result:
    print(row)
from couchbase.cluster import AnalyticsOptions
# SDK 3 named parameters for analytics
cluster.analytics_query(
"select * from dataset where type = $type",
AnalyticsOptions(named_parameters={"type": 'airport'}))
# SDK 3 positional parameters for analytics
cluster.analytics_query(
"select * from dataset where type = $1",
AnalyticsOptions(positional_parameters=["airport"]))
Management APIs
In SDK 2, the management APIs were centralized in the Admin class at the cluster level and the BucketManager class at the bucket level.
Since SDK 3 provides more management APIs, they have been split up into their respective domains.
For example, where in SDK 2 you would call Admin.bucket_remove to remove a bucket, in SDK 3 you will find it under BucketManager.drop_bucket.
Similarly, creating a N1QL index is now done through the QueryIndexManager, which is accessible through the Cluster.
The following table provides a mapping from the SDK 2 management APIs to those of SDK 3:
SDK 2 | SDK 3 |
---|---|