Transcoders & Non-JSON Documents

    The Scala SDK supports common JSON document requirements out-of-the-box. Custom transcoders and serializers provide support for applications needing to perform advanced operations, including supporting non-JSON data.

    The Scala SDK uses the concepts of transcoders and serializers, which are used whenever data is sent to or retrieved from Couchbase Server.

    When sending data to Couchbase, the SDK passes the object being sent to a transcoder. The transcoder can either reject the object as unsupported, or convert it into a byte[] and a Common Flag. The Common Flag specifies whether the data is JSON, a non-JSON string, or raw binary data. The transcoder may, but does not have to, use a serializer to perform the byte conversion.

    Serializers are discovered at compile-time: if the application is sending type T, the compiler will look for a JsonSerializer[T]. Similarly when the application is receiving data to a desired type T, the compiler will look for a JsonDeserializer[T]. There are JsonSerializer and JsonDeserializer implementations provided for many types, including several popular third-party JSON libraries, and it is easy to add more.
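
    The compile-time lookup described above is ordinary Scala implicit resolution. The following is a minimal sketch that does not depend on the SDK. ToBytes and send are hypothetical stand-ins for JsonSerializer[T] and an operation such as upsert, not the SDK's real definitions:

    ```scala
    import java.nio.charset.StandardCharsets.UTF_8
    import scala.util.{Success, Try}

    object ImplicitLookupSketch {
      // Hypothetical stand-in for the SDK's JsonSerializer[T]; not the real trait.
      trait ToBytes[T] {
        def serialize(content: T): Try[Array[Byte]]
      }

      object ToBytes {
        // Instances in the companion object are part of the implicit scope,
        // so the compiler finds them without any import.
        implicit val intToBytes: ToBytes[Int] = new ToBytes[Int] {
          def serialize(content: Int): Try[Array[Byte]] =
            Success(content.toString.getBytes(UTF_8))
        }
      }

      // Hypothetical stand-in for an operation such as upsert: the serializer
      // is an implicit parameter that the compiler fills in.
      def send[T](content: T)(implicit serializer: ToBytes[T]): Try[Array[Byte]] =
        serializer.serialize(content)

      // Compiles, because a ToBytes[Int] can be found.
      val encoded: Try[String] = send(42).map(bytes => new String(bytes, UTF_8))

      // Would not compile, because no ToBytes[java.util.UUID] exists:
      // send(java.util.UUID.randomUUID())
    }
    ```

    Adding support for a new type in this sketch means defining one more implicit ToBytes instance, which mirrors how additional JsonSerializer[T] implementations are supplied.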

    On retrieving data from Couchbase, the fetched byte[] and Common Flag are passed to a transcoder. The transcoder converts the bytes into a concrete class (the application specifies the required type) if possible. It may use a serializer (the JsonDeserializer[T]) for this.

    So, while a JsonSerializer[T] and JsonDeserializer[T] will always be found (the code will not compile otherwise), they are not necessarily used. It is up to the transcoder to make this choice.

    Many applications will not need to be aware of transcoders and serializers, as the defaults support most standard JSON use cases. The information in this page is only needed if the application has an advanced use-case, likely involving either non-JSON data, or a requirement for a particular JSON serialization library. For examples of many common JSON use cases see JSON Libraries.

    Default Behaviour

    The ClusterEnvironment contains a global transcoder, which by default is JsonTranscoder.

    On sending data of type T to Couchbase, a JsonSerializer[T] will be found by the compiler. If it cannot be found, the program will not compile. JsonTranscoder will send objects to that serializer to convert into a byte[]. The serialized bytes are then sent to the Couchbase Server, along with a Common Flag of JSON.

    JsonTranscoder will pass any T to its serializer, apart from a byte[]. It will reject this with a Failure(IllegalArgumentException), as it is ambiguous how it should be handled.

    On retrieving data from Couchbase into a desired type T, a JsonDeserializer[T] will be found by the compiler. JsonTranscoder passes the fetched byte[] and Common Flag to that deserializer to convert into a T.

    This table summarizes that information, and this more concise form will be used to describe the other transcoders included in the SDK.

    | Item    | Result                            | Common Flag |
    |---------|-----------------------------------|-------------|
    | String  | Results of serializer             | JSON        |
    | byte[]  | Failure(IllegalArgumentException) | -           |
    | Other T | Results of serializer             | JSON        |
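
    The decision logic in the table above can be modelled in a few lines. This is a sketch under stated assumptions, not the SDK's implementation: CommonFlag, Encoded, encode and quote are hypothetical names introduced only for illustration.

    ```scala
    import java.nio.charset.StandardCharsets.UTF_8
    import scala.util.{Failure, Success, Try}

    object JsonTranscoderSketch {
      // Hypothetical model types; the SDK's own types differ.
      sealed trait CommonFlag
      case object JsonFlag extends CommonFlag
      final case class Encoded(bytes: Array[Byte], flag: CommonFlag)

      // Mirrors the table: byte arrays are rejected, every other value is
      // handed to the serializer and tagged with the JSON flag.
      def encode[T](value: T, serialize: T => Try[Array[Byte]]): Try[Encoded] =
        value match {
          case _: Array[Byte] =>
            Failure(new IllegalArgumentException("byte arrays are ambiguous here"))
          case other =>
            serialize(other).map(bytes => Encoded(bytes, JsonFlag))
        }

      // Toy serializer: wraps a string in double quotes, with no escaping.
      val quote: String => Try[Array[Byte]] =
        s => Success(("\"" + s + "\"").getBytes(UTF_8))
    }
    ```

    In this model, encode("hi", quote) succeeds with the JSON flag, while passing an Array[Byte] yields a Failure regardless of which serializer is supplied.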

    The default JsonSerializer and JsonDeserializer provided handle objects of type T as follows:

    | Type of T                                            | Serialized |
    |------------------------------------------------------|------------|
    | String                                               | Into/from JSON with the high-performance JSON library Jackson |
    | byte[]                                               | Passed through (no serialization) |
    | JsonObject, JsonArray, JsonObjectSafe, JsonArraySafe | Into/from JSON using Jackson |
    | Boolean                                              | Into/from a JSON representation directly ('true' or 'false') |
    | Other primitives (Int, Double, Long, Short)          | Into/from a JSON representation directly |
    | ujson.Value from µPickle                             | Into/from JSON using the µPickle library |
    | io.circe.Json from Circe                             | Into/from JSON using the Circe library |
    | play.api.libs.json.JsValue from Play JSON            | Into/from JSON using the Play JSON library |
    | org.json4s.JsonAST.JValue from Json4s                | Into/from JSON using the Json4s library |
    | org.typelevel.jawn.ast.JValue from Jawn              | Into/from JSON using the Jawn library |
    | Scala case classes                                   | Into/from JSON with a small amount of boilerplate to automatically generate a JsonSerializer |

    There are concrete examples of using these on the JSON Libraries page. Note that the Scala SDK only has an optional dependency on Circe, Json4s, et al., so those libraries will not be pulled into your application.

    RawJsonTranscoder

    The RawJsonTranscoder lets the application explicitly specify that the data it is storing or retrieving is JSON. This transcoder does not accept a serializer, and always performs a straight passthrough of the data to the server. This lets the application avoid unnecessary parsing costs when it is certain the data is JSON.

    It only accepts Strings and byte[].

    | Item    | Result                            | Common Flag |
    |---------|-----------------------------------|-------------|
    | String  | Passthrough                       | JSON        |
    | byte[]  | Passthrough                       | JSON        |
    | Other T | Failure(IllegalArgumentException) | -           |

    Say we want to serialize and deserialize some data with the JSON library uPickle (https://github.com/lihaoyi/upickle), and have the Scala SDK simply pass the serialized data through to and from Couchbase. We will look at better ways of doing this later, but here is one approach using RawJsonTranscoder.

    Since uPickle has already done the serialization work, we don't want to use the default JsonTranscoder, as this would run the provided bytes needlessly through DefaultJsonSerializer (Jackson). Instead, RawJsonTranscoder is used, which just passes through the serialized bytes and stores them in Couchbase with the JSON Common Flag set. Similarly, the same transcoder is used on reading the document, so the raw bytes can be retrieved as an Array[Byte] (or a String) without going through DefaultJsonSerializer (Jackson).

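    import com.couchbase.client.scala.codec.RawJsonTranscoder
    import com.couchbase.client.scala.kv.{GetOptions, UpsertOptions}
    import scala.util.{Failure, Success}

    // `collection` is an already-opened Collection; `fail` comes from the test framework
    val json = ujson.Obj("name" -> "John Smith", "age" -> 27)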
    val bytes: Array[Byte] = ujson.transform(json, ujson.BytesRenderer()).toBytes
    
    collection.upsert(
      "doc-id",
      bytes,
      UpsertOptions().transcoder(RawJsonTranscoder.Instance)
    ) match {
      case Success(_) =>
        collection
          .get(
            "doc-id",
            GetOptions().transcoder(RawJsonTranscoder.Instance)
          )
          .flatMap(result => result.contentAs[Array[Byte]]) match {
          case Success(fetched) =>
            val jsonFetched = upickle.default.read[ujson.Value](fetched)
            assert(jsonFetched("name").str == "John Smith")
            assert(jsonFetched("age").num == 27)
    
          case Failure(err) => fail(s"Failed to get or convert doc: $err")
        }
    
      case Failure(err) => fail(s"Failed to upsert doc: $err")
    }

    Non-JSON Transcoders

    It is most common to store JSON with Couchbase. However, it is possible to store non-JSON documents in the Key-Value store, such as raw binary data, perhaps using a concise binary encoding like MessagePack or CBOR.

    It is important to note that the Couchbase Data Platform includes multiple components other than the Key-Value store (including N1QL and its indexes, FTS, analytics, and eventing). These are optimized for JSON and will either ignore non-JSON documents or provide limited functionality with them.

    Also note that some simple data types can be stored directly as JSON, without recourse to non-JSON transcoding. A valid JSON document can be a simple integer (42), a string ("hello"), an array ([1,2,3]), a boolean (true or false), or the JSON null value.

    RawStringTranscoder

    The RawStringTranscoder provides the ability for the user to explicitly store and retrieve raw string data with Couchbase. It can be used to avoid the overhead of storing the string as JSON, which requires two bytes for double quotes, plus potentially more for escaping characters.
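
    The size difference mentioned above can be checked with a small standalone sketch. The toJsonString helper is a hypothetical, deliberately minimal encoder written for illustration only (it escapes just the double quote and the backslash). It is not what the SDK or Jackson uses.

    ```scala
    import java.nio.charset.StandardCharsets.UTF_8

    object JsonStringOverhead {
      // Minimal JSON string encoder for illustration: escapes only the
      // double quote and the backslash, not control characters.
      def toJsonString(s: String): String = {
        val sb = new StringBuilder("\"")
        s.foreach {
          case '"'  => sb.append("\\\"")
          case '\\' => sb.append("\\\\")
          case c    => sb.append(c)
        }
        sb.append('"').toString
      }

      // Size in bytes when the string is stored raw
      def rawSize(s: String): Int = s.getBytes(UTF_8).length

      // Size in bytes when the string is stored as a JSON string
      def jsonSize(s: String): Int = toJsonString(s).getBytes(UTF_8).length
    }
    ```

    For "hello world" this gives 11 bytes raw versus 13 bytes as a JSON string. Each embedded double quote or backslash adds one further byte to the JSON form.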

    Note that this transcoder does not accept a serializer, and always performs straight passthrough of the data to the server. It only accepts Strings.

    | Item    | Result                            | Common Flag |
    |---------|-----------------------------------|-------------|
    | String  | Passthrough                       | String      |
    | byte[]  | Failure(IllegalArgumentException) | -           |
    | Other T | Failure(IllegalArgumentException) | -           |

    Here’s an example of using the RawStringTranscoder:

    collection.upsert(
      "doc-id",
      "hello world",
      UpsertOptions().transcoder(RawStringTranscoder.Instance)
    ) match {
    
      case Success(_) =>
        collection
          .get(
            "doc-id",
            GetOptions().transcoder(RawStringTranscoder.Instance)
          )
          .flatMap(result => result.contentAs[String]) match {
    
          case Success(fetched) =>
            assert(fetched == "hello world")
    
          case Failure(err) => fail(s"Failed to get or convert doc: $err")
        }
    
      case Failure(err) => fail(s"Failed to upsert doc: $err")
    }

    RawBinaryTranscoder

    The RawBinaryTranscoder provides the ability for the user to explicitly store and retrieve raw byte data to Couchbase. The transcoder does not perform any form of real transcoding, and does not take a serializer, but rather passes the data through and assigns the appropriate binary Common Flag.

    | Item    | Result                            | Common Flag |
    |---------|-----------------------------------|-------------|
    | String  | Failure(IllegalArgumentException) | -           |
    | byte[]  | Passthrough                       | Binary      |
    | Other T | Failure(IllegalArgumentException) | -           |

    Here’s an example of using the RawBinaryTranscoder:

    import java.nio.charset.StandardCharsets

    val content: Array[Byte] = "hello world".getBytes(StandardCharsets.UTF_8)

    collection.upsert(
      "doc-id",
      content,
      UpsertOptions().transcoder(RawBinaryTranscoder.Instance)
    ) match {
      case Success(_) =>
        collection
          .get(
            "doc-id",
            GetOptions().transcoder(RawBinaryTranscoder.Instance)
          )
          .flatMap(result => result.contentAs[Array[Byte]]) match {
          case Success(fetched) =>
            // The bytes come back exactly as they were stored
            assert(fetched(0) == 'h')
            assert(fetched(1) == 'e')
            assert(fetched(2) == 'l')
            // ...

          case Failure(err) => fail(s"Failed to get or convert doc: $err")
        }

      case Failure(err) => fail(s"Failed to upsert doc: $err")
    }

    Custom Transcoders and Serializers

    More advanced transcoding needs can be met if the application implements its own transcoders and serializers.

    Creating a Custom Serializer

    Say we have a Scala case class, MyUser, that we want to easily convert to and from JSON to store in Couchbase. The Scala SDK already provides support for this (see JSON Libraries), but perhaps for some reason we want to use the JSON library uPickle for this instead. First we need a JsonSerializer[MyUser] and JsonDeserializer[MyUser], which are simple to write:

    case class MyUser(name: String, age: Int)
    
    object MyUser {
      implicit object UserSerializer extends JsonSerializer[MyUser] {
        override def serialize(content: MyUser): Try[Array[Byte]] = {
          // It's also possible for uPickle to serialize and deserialize
          // case classes directly to/from JSON, but for the purposes of
          // demonstration we will generate the JSON manually.
          val json = ujson.Obj("name" -> content.name, "age" -> content.age)
          Success(ujson.transform(json, ujson.BytesRenderer()).toBytes)
        }
      }
    
      implicit object UserDeserializer extends JsonDeserializer[MyUser] {
        override def deserialize(bytes: Array[Byte]): Try[MyUser] = {
          Try({
            val json = upickle.default.read[ujson.Value](bytes)
            MyUser(json("name").str, json("age").num.toInt)
          })
        }
      }
    }

    Both of these are marked implicit object and inside object MyUser so the compiler can find them. They will now be picked up by the compiler and used automatically:

    val user = MyUser("John Smith", 27)
    
    // The compiler will find our UserSerializer for this
    collection.upsert("john-smith", user) match {
    
      case Success(_) =>
        collection
          .get("john-smith")
    
          // ... and our UserDeserializer for this
          .flatMap(fetched => fetched.contentAs[MyUser]) match {
    
          case Success(fetchedUser) =>
            assert(fetchedUser == user)
    
          case Failure(err) => fail(s"Failed to get doc: $err")
        }
    
      case Failure(err) => fail(s"Failed to upsert doc: $err")
    }

    Note we don't need to change the transcoder for this example. The table for JsonTranscoder shows that it already does what we need: on serialization (in the upsert), it passes the MyUser object to the compiler-found serializer (UserSerializer) and stores the result in Couchbase with the JSON Common Flag. On deserialization (in the contentAs), the raw bytes are passed to UserDeserializer, and the resulting MyUser is passed back to the application.

    Selecting a Serializer

    What if there are multiple serializers that could be used for an object, and the application needs to select one?

    The serializer is an implicit argument to any operation that requires one, and the compiler's selection can be overridden by the application like this:

    case class MyUser2(name: String, age: Int)
    
    object MyUser2 {
      // First serializer uses uPickle
      implicit object UserSerializer1 extends JsonSerializer[MyUser2] {
        override def serialize(content: MyUser2): Try[Array[Byte]] = {
          val json = ujson.Obj("name" -> content.name, "age" -> content.age)
          Success(ujson.transform(json, ujson.BytesRenderer()).toBytes)
        }
      }
    
      // Second serializer writes the JSON manually
      // (the name is not escaped, which is acceptable only for demonstration)
      implicit object UserSerializer2 extends JsonSerializer[MyUser2] {
        override def serialize(content: MyUser2): Try[Array[Byte]] = {
          val sb = new StringBuilder
          sb.append("{\"name\":\"")
          sb.append(content.name)
          sb.append("\",\"age\":")
          sb.append(content.age)
          sb.append("}")
          Success(sb.toString.getBytes(StandardCharsets.UTF_8))
        }
      }
    }
    val user = MyUser2("John Smith", 27)
    
    // This import will cause the compiler to prefer UserSerializer2
    import MyUser2.UserSerializer2
    collection.upsert("john-smith", user).get
    
    // But the application can override this
    collection.upsert("john-smith", user)(MyUser2.UserSerializer1).get
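
    The same mechanism can be seen without the SDK. The following sketch uses hypothetical names (Render, Point, show) as stand-ins for JsonSerializer[T], a user type, and an operation taking an implicit serializer. It shows only the explicit override of the compiler's choice, with a single implicit instance to keep resolution unambiguous.

    ```scala
    object SerializerSelectionSketch {
      // Hypothetical stand-in for JsonSerializer[T]; not the SDK's real trait.
      trait Render[T] { def render(value: T): String }

      final case class Point(x: Int, y: Int)

      object Point {
        // The only implicit instance, so it is the compiler's default choice.
        implicit object Compact extends Render[Point] {
          def render(p: Point): String = s"${p.x},${p.y}"
        }

        // Not implicit: used only when the caller passes it explicitly.
        object Verbose extends Render[Point] {
          def render(p: Point): String = s"x=${p.x}, y=${p.y}"
        }
      }

      // Stand-in for an operation with an implicit serializer argument
      def show[T](value: T)(implicit r: Render[T]): String = r.render(value)

      // The compiler supplies Point.Compact
      val byDefault: String = show(Point(1, 2))

      // The caller overrides the choice by filling the implicit parameter list
      val overridden: String = show(Point(1, 2))(Point.Verbose)
    }
    ```

    Passing the instance in the second parameter list is the same syntax as the explicit UserSerializer1 argument in the upsert call above.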

    Creating a Custom Transcoder

    Let’s look at a more complex example: encoding the JSON alternative, MessagePack. MessagePack is a compact binary data representation, so it should be stored with the binary Common Flag. The Common Flag is chosen by the transcoder, and none of the existing transcoders matches our needs (RawBinaryTranscoder does set the binary flag, but it passes data through directly rather than using a serializer). So we need to write one.

    Start by creating a new serializer and deserializer for our case class, both of which use MessagePack:

    object MsgPack {
      implicit object MsgPackSerializer extends JsonSerializer[MyUser] {
        override def serialize(content: MyUser): Try[Array[Byte]] = {
          Try({
            // MessagePack can automatically generate equivalent code,
            // but for demonstration purposes we will do it manually
            val packer = MessagePack.newDefaultBufferPacker()
            packer.packString(content.name)
            packer.packInt(content.age)
            packer.close()
            packer.toByteArray
          })
        }
      }
    
      implicit object MsgPackDeserializer extends JsonDeserializer[MyUser] {
        override def deserialize(bytes: Array[Byte]): Try[MyUser] = {
          Try({
            val unpacker = MessagePack.newDefaultUnpacker(bytes)
            MyUser(unpacker.unpackString(), unpacker.unpackInt())
          })
        }
      }
    }

    And now create a transcoder that sets the binary Common Flag when storing the data:

    class BinaryTranscoder extends TranscoderWithSerializer {
      def encode[A](value: A, serializer: JsonSerializer[A]): Try[EncodedValue] = {
        serializer
          .serialize(value)
          .map(bytes => EncodedValue(bytes, DocumentFlags.Binary))
      }
    
      def decode[A](
        value: Array[Byte],
        flags: Int,
        serializer: JsonDeserializer[A]
      )(implicit tag: WeakTypeTag[A]): Try[A] = {
        serializer.deserialize(value)
      }
    }

    Note this transcoder is completely independent of MessagePack. All it does is pass data to and from a serializer, and set the binary Common Flag.

    Now we can use the new transcoder and serializer to seamlessly store MessagePack data in Couchbase Server:

    val user = MyUser("John Smith", 27)
    
    // Make sure the MessagePack serializers are used
    import MsgPack._
    
    val transcoder = new BinaryTranscoder
    
    // The compiler will find and use our MsgPackSerializer here
    collection.upsert(
      "john-smith",
      user,
      UpsertOptions().transcoder(transcoder)
    ) match {
      case Success(_) =>
    
        collection
          .get("john-smith", GetOptions().transcoder(transcoder))
    
          // ... and our MsgPackDeserializer here
          .flatMap(result => result.contentAs[MyUser]) match {
    
          case Success(fetched) =>
            assert(fetched == user)
    
          case Failure(err) => fail(s"Failed to get or convert doc: $err")
        }
    
      case Failure(err) => fail(s"Failed to upsert doc: $err")
    }

    Further reading