Observability Metrics Reporting

  • Developer Preview
    +

    Individual request tracing presents a very specific (though isolated) view of the system. In addition, it also makes sense to capture information that aggregates request data (i.e. requests per second), but also data which is not tied to a specific request at all (i.e. resource utilization).

    The deployment situation itself is similar to the request tracer: either applications already have a metrics infrastructure in place or they don’t. The difference is that exposing some kind of metrics is much more common than request based tracing, because most production deployments at least monitor CPU and memory usage (e.g. through JMX).

    Metrics broadly fall into the following categories:

    • Request/Response Metrics (such as requests per second).

    • SDK Metrics (such as how many open collections, various queue lengths).

    • System Metrics (such as CPU usage or garbage collection performance).

    Right now only the first category is implemented by the SDK; more are planned.

    The Default AggregatingMeter

    Developer Preview

    Before starting, here are all imports used in the following examples:

    import com.couchbase.client.metrics.opentelemetry.OpenTelemetryMeter
    import com.couchbase.client.scala.env.{ClusterEnvironment, LoggingMeterConfig}
    import com.couchbase.client.scala.{Cluster, ClusterOptions}
    import io.opentelemetry.api.metrics.GlobalMeterProvider
    
    import scala.concurrent.duration._
    import scala.util.Try

    The default implementation aggregates and logs request and response metrics. Since the feature is in developer preview, you need to manually enable it on the environment:

    val config: Try[ClusterEnvironment] = ClusterEnvironment.builder
      .loggingMeterConfig(LoggingMeterConfig()
        .enabled(true))
      .build
    
    val cluster: Try[Cluster] = config.flatMap(c =>
      Cluster.connect("localhost", ClusterOptions
        .create("username", "password")
        .environment(c)))

    By default the metrics will be emitted every 10 minutes, but you can customize the emit interval as well:

    val config: Try[ClusterEnvironment] = ClusterEnvironment.builder
      .loggingMeterConfig(LoggingMeterConfig()
        .enabled(true)
        .emitInterval(10.minutes))
      .build

    Once enabled, there is no further configuration needed. The AggregatingMeter will emit the collected request statistics every interval. A possible report looks like this (prettified for better readability):

    {
       "meta":{
          "emit_interval_s":10
       },
       "query":{
          "127.0.0.1":{
             "total_count":9411,
             "percentiles_us":{
                "50.0":544.767,
                "90.0":905.215,
                "99.0":1589.247,
                "99.9":4095.999,
                "100.0":100663.295
             }
          }
       },
       "kv":{
          "127.0.0.1":{
             "total_count":9414,
             "percentiles_us":{
                "50.0":155.647,
                "90.0":274.431,
                "99.0":544.767,
                "99.9":1867.775,
                "100.0":574619.647
             }
          }
       }
    }

    Each report contains one object for each service that got used and is further separated on a per-node basis so they can be analyzed in isolation.

    For each service / host combination, a total amount of recorded requests is reported, as well as percentiles from a histogram in microseconds. The meta section on top contains information such as the emit interval in seconds, so tooling can later calculate numbers like requests per second.

    The AggregatingMeter can be configured on the environment as shown above. The following table shows the currently available properties:

    Table 1. AggregatingMeterConfig Properties
    Property Default Description

    enabled

    false

    If the AggregatingMeter should be enabled.

    emitInterval

    600 seconds

    The interval where found orphans are emitted.

    OpenTelemetry Integration

    Developer Preview

    The SDK supports plugging in any OpenTelemetry metrics consumer instead of using the default AggregatingMeter. To do this, first you need to add an additional dependency to your application:

    libraryDependencies += "com.couchbase.client" % "metrics-opentelemetry" % "0.1.4"

    Now you can initialise the SDK with OpenTelemetry:

    val config: Try[ClusterEnvironment] = ClusterEnvironment.builder
      .meter(OpenTelemetryMeter.wrap(GlobalMeterProvider.get()))
      .build

    Micrometer Integration

    Developer Preview

    In addition to OpenTelemetry, we also provide a module so you can hook up the SDK metrics to Micrometer.

    libraryDependencies += "com.couchbase.client" % "metrics-micrometer" % "0.1.0"

    In addition to the facade you also need to include your micrometer implementation of choice. Once you’ve created the a Micrometer MeterRegistry, you need to wrap it and pass it into the environment:

       val config: Try[ClusterEnvironment] = ClusterEnvironment.builder
          .meter(MicrometerMeter.wrap(meterRegistry))
          .build

    At this point the metrics are hooked up to Micrometer and will be reported as cb.requests (as a Counter) and cb.responses (as a DistributionSummary).