XDCR Filtering Expressions

    +
    XDCR filtering expressions allow a document to be included in or excluded from a filtered replication, based on the document’s fields and values.

    Understanding Filtering Expressions

    XDCR Advanced Filtering expressions, applied to the documents within a specified source bucket, allow case-sensitive matches to be made on:

    • id and xattrs values, within the document’s metadata.

    • Field-names and values, within the document’s data, nested to any degree.

    Every document on which a match is successfully made is included in the filtered replication. Other documents are not included.

    XDCR filtering expressions constitute a subset of SQL++ expressions, with some additions.

    Sections on this page list the available expressions, provide examples of their usage, and provide links to documentation for the comparable SQL++ expressions. A list of reserved words is provided. Additionally, relationships between expressions are listed in in BNF (Backus-Naur Form) notation.

    Note that most examples assume the travel-sample bucket to be the source of the filtered replication. For information on installing this bucket, see Install Sample Buckets.

    XDCR Advanced Filtering can be managed by means of Couchbase Web Console (see Filter a Replication), the CLI (see xdcr-replicate), and the REST API (see Creating XDCR Replications).

    Pattern Matching

    Pattern matching is supported with the following syntax:

    REGEXP_CONTAINS (expression, pattern)

    For example, the following expression specifies country as the expression, and "France" as the pattern:

    REGEXP_CONTAINS(country, "France")

    If travel-sample.inventory.airline is used as the source collection, every document that contains a field country whose value is "France" is replicated to the target. For example, this would include the document airline_1191:

    {
      "callsign": "REUNION",
      "country": "France",
      "iata": "UU",
      "icao": "REU",
      "id": 1191,
      "name": "Air Austral",
      "type": "airline"
    }

    Metadata Access

    Each document contains metadata, as well as data. Within the metadata, a document’s id and extended attributes can be accessed by means of the reserved word META, used as part of the the pattern-matching expression.

    For example, the following expression seeks a match on any document the value of whose metadata id field contains the substring "airline_10":

    REGEXP_CONTAINS(META().id, "airline_10")

    This would match a number of documents from the travel-sample.inventory.airline collection, including airline_10, whose metadata is as follows:

    {
      "meta": {
        "id": "airline_10",
        "rev": "1-159642cc11a000000000000002000000",
        "expiration": 0,
        "flags": 33554432,
        "type": "json"
      },
      "xattrs": {}
    }

    Extended Attributes are optionally used to define application-specific metadata. Their field name is xattrs. In the metadata example shown above, no value exists for the field. If an extended attribute were fully defined — for example, a color field, assigned the value blue — the appearance within the metadata would be as follows:

      "xattrs": {
        "color": "blue"
      }

    The extended attribute’s value could then be filtered and matched with the following expression:

    REGEXP_CONTAINS(META().xattrs.color, "blue")

    For information on extended attributes, see Extended Attributes.

    Lookahead

    XDCR Advanced Filtering is supported by lookahead. Lookahead is used to specify a pattern, in pattern matching; and permits a match when one specified character or character-sequence:

    • Is followed by another specified character or character-sequence. This is termed positive lookahead.

      The following syntax is used:

      char1 ( ?= char2 )

      The specified char1 must therefore be located and, for a successful match to be obtained, must be followed by the specified char2.

    • Is not followed by another specified character or character-sequence. This is termed negative lookahead.

      The following syntax is used:

      char1 ( ?! char2 )

      The specified char1 must therefore be located, and, for a successful match to be obtained, must not be followed by the specified char2.

    Lookahead might therefore be used on the travel-sample.default.default collection, to filter documents whose metadata id contains airport rather than airline, or vice versa. For example, the following expression uses positive lookahead to specify that the metadata id that begins with air should be immediately followed by the character p; thereby returning documents whose metadata id value is airport, but not airline:

    REGEXP_CONTAINS(META().id, "^air(?=p)")

    For information on the expression ^, see XDCR Regular Expressions.

    Alternatively, negative lookahead might be used to specify that the metadata id that begins with air should not be followed by the character l; thereby returning documents whose metadata id value is airport, but not airline:

    REGEXP_CONTAINS(META().id, "^air(?!l)")

    XDCR Pattern Matching: Comparison with SQL++

    The function REGEXP_CONTAINS is also supported by SQL++. Note, however, that SQL++ does not support forward lookahead.

    SQL++ supports a number of pattern-matching functions in addition to REGEXP_CONTAINS: see Pattern-matching Functions.

    META is one of a group of reserved words used by XDCR Advanced Filtering. For details, Reserved Words, below. META is also one of a larger group of reserved words used by SQL++. For details, see the page for SQL++ Reserved Words. Note that in SQL++, META provides access to a wider range of extended attributes.

    Checking for Existence

    The existence of a field can be checked for, by means of the Collection Operator EXISTS. The syntax is as follows:

    EXISTS ( expression )

    For example:

    EXISTS(country)

    This returns every document that contains a country field. This would therefore include airline_10:

    {
      "callsign": "MILE-AIR",
      "country": "United States",
      "iata": "Q5",
      "icao": "MLA",
      "id": 10,
      "name": "40-Mile Air",
      "type": "airline"
    }

    XDCR Collection Operator: Comparison with SQL++

    XDCR Advanced Filtering provides the single Collection Operator, EXISTS; which determines whether or not a specified field exists in the body of a document. SQL++ uses EXISTS on subclauses; and provides a variety of additional Collection Operators. For details, see the SQL++ page for Collection Operators.

    Using Logical Operators

    XDCR Advanced Filtering provides the Logical Operators AND, OR, and NOT. See the Filtering Expression BNF, below, for the syntactic possibilities of these operators.

    AND can be used to add a required condition to an expression. For example:

    REGEXP_CONTAINS(country, "France") AND airportname = "La Teste De Buch"

    This provides a successful match on any document whose country value is "France", and whose aiportname value is "La Test De Buch". This would therefore include the following document in the replication:

    {
      "airportname": "La Teste De Buch",
      "city": "Arcachon",
      "country": "France",
      "faa": "XAC",
      "geo": {
        "alt": 49,
        "lat": 44.59639,
        "lon": -1.110833
      },
      "icao": "LFCH",
      "id": 1283,
      "type": "airport",
      "tz": "Europe/Paris"
    }

    OR can be used to add an alternative condition to an expression. For example:

    REGEXP_CONTAINS(country, "France") OR country = "United States"

    This provides a successful match on any document whose country value is either "France" or "United States".

    NOT can be prepended to a condition, to allow a successful match only when the condition is false. For example:

    REGEXP_CONTAINS(country, "France") AND NOT airportname = "La Teste De Buch"

    This provides a successful match on every document that contains the country value "France", and contains an airportname value that is not "La Teste De Buch".

    All words used as Logical Operators for XDCR Advanced Filtering are reserved. For details, see Reserved Words, below.

    XDCR Logical Operators: Comparison with SQL++

    SQL++ provides the same Logical Operators as does XDCR Advanced Filtering. For details, see the SQL++ page for Logical Operators.

    Using Comparison Operators

    XDCR Advanced Filtering provides the following Comparison Operators:

    = and ==

    != and <>

    >

    >=

    <

    <=

    IS NULL

    IS NOT NULL

    IS MISSING

    IS NOT MISSING

    As this indicates, the tests for equality and inequality are each provided in two versions, to ensure compatibility with different languages.

    Note that for purposes of comparison, each field within a document is categorized as one of the following:

    • Has a non-null value

    • Is specified as null

    • Is missing a non-null value, and is not specified as null

    Therefore:

    • IS NULL is successfully matched with null.

    • IS NOT NULL is successfully matched when the field either has a value, or is missing a value.

    • IS MISSING is successfully matched when neither null nor a value is present.

    • IS NOT MISSING is successfully matched when either null or a value is present.

    For example:

    REGEXP_CONTAINS(country, "France") AND name != "40-Mile Air"

    This provides a successful match with every document whose country value is "France", and whose name value is not "40-Mile Air". This would include airline_1191:

    {
      "callsign": "REUNION",
      "country": "France",
      "iata": "UU",
      "icao": "REU",
      "id": 1191,
      "name": "Air Austral",
      "type": "airline"
    }

    The following, additional example tests for a null icao field, on documents whose country value is United States:

    REGEXP_CONTAINS(country, "United States") AND icao IS NULL

    This returns a number of matches, one of which is airport_4079:

    {
      "airportname": "Orlando",
      "city": "Orlando",
      "country": "United States",
      "faa": "DWS",
      "geo": {
        "alt": 340,
        "lat": 28.398,
        "lon": -81.57
      },
      "icao": null,
      "id": 4079,
      "type": "airport",
      "tz": "America/New_York"
    }

    All words used in Comparison Operators for XDCR Advanced Filtering are reserved. For details, see Reserved Words, below.

    XDCR Comparison Operators: Comparison with SQL++

    The Comparison Operators provided by XDCR Advanced Filtering are a subset of those provided by SQL++. For details, see the SQL++ page for Comparison Operators.

    Selecting Fields and Elements

    XDCR Advanced Filtering provides operators for Field Selection and Element Selection.

    Field Selection

    The Field Selection Operator is the period: . This allows a child-field, within a parent-field, to be specified. Note that the Field Selection Operator was used to specify the metadata id field, in Metadata Access, above.

    The following example uses the Field Selection Operator to obtain a match on any document that contains a country field with a value of "United States", and also has an alt field, within the value of its geo field, with a value that is greater than or equal to 6813.

    REGEXP_CONTAINS(country, "United States") AND geo.alt >= 6813

    This returns a number of matches, including airport_4084:

    {
      "airportname": "Telluride",
      "city": "Telluride",
      "country": "United States",
      "faa": "TEX",
      "geo": {
        "alt": 9078,
        "lat": 37.953759,
        "lon": -107.90848
      },
      "icao": "KTEX",
      "id": 4084,
      "type": "airport",
      "tz": "America/Denver"
    }

    When non-standard alphanumeric characters have been used in field-naming (for example, space or bracket characters), the field, when referenced in an advanced filtering expression, should be escaped with backticks: eg, field.`the field name`.

    Element Selection

    The Element Selection Operator, which is provided for use on arrays, takes the form [ n ], where n is an array-position.

    For example, the following provides a successful match when a document whose airline value is "AA" also contains a schedule array, whose initial member has a field flight with a value of "AA679":

    REGEXP_CONTAINS(airline, "AA") AND schedule[0].flight = "AA679"

    This produces a match on document route_5784:

    {
      "airline": "AA",
      "airlineid": "airline_24",
      "destinationairport": "PHL",
      "distance": 153.59665185566308,
      "equipment": "E90 DH3 319",
      "id": 5784,
      "schedule": [{
        "day": 0,
        "flight": "AA679",
        "utc": "22:01:00"
      }, {
        "day": 0,
        "flight": "AA253",
        "utc": "22:29:00"
      }, {
        "day": 1,
          .
          .
          .
      }, {
        "day": 6,
        "flight": "AA661",
        "utc": "16:35:00"
      }],
      "sourceairport": "LGA",
      "stops": 0,
      "type": "route"
    }

    XDCR Selection Operators: Comparison with SQL++

    The Field Selection Operator is used in SQL++ with additional functionality: nested expressions support is provided. The Element Selection Operator is also used in SQL++ with additional functionality: negative indexing of arrays and the * operator are supported. SQL++ also provides array slicing that allows the building of sub-slices of arrays. For information, see the SQL++ page for Nested Operators and Expressions.

    Using Arithmetic Operators

    XDCR Advanced Filtering is supported by the following arithmetic operators:

    Operator Description

    +

    Add values.

    -

    Subtract right value from left value.

    *

    Multiply values.

    /

    Divide left value by right value.

    %

    Modulo. Divide left value by right value and return the remainder.

    Note that modulo is an integer operator and will use only the integer part of each value.

    -value

    Negate value.

    For additional details on syntax and return values, see the SQL++ page for Arithmetic Operators.

    For example, the following filtering expression provides a successful match on documents whose country value is United States and whose id is an integer value that returns a value of less than or equal to 3, when modulo 7 is applied:

    REGEXP_CONTAINS(country, "United States") AND id % 7 <=  3

    XDCR Arithmetic Operators: Comparison with SQL++

    SQL++ provides the same arithmetic operators, with the same semantics and return values. See the SQL++ page for Arithmetic Operators.

    Number Functions

    XDCR Advanced Filtering is supported by Number Functions. These functions are the same as those provided by SQL++. However, the following SQL++ Number Functions do not apply to XDCR Advanced Filtering:

    • RANDOM

    • SIGN

    • TRUNC

    For information on all other functions, see the SQL++ page for Number Functions.

    Handling Dates

    XDCR Advanced Filtering provides a basic DATE function that allows user to compose filtering expressions based on dates.

    Dates can be specified in RFC3339 format. Dates can also be specified in the following ISO-8601 formats:

    ISO-8601 Format Example

    Date

    2019-01-25

    Date and time in UTC

    2019-01-25T18:40:37Z

    For example:

    REGEXP_CONTAINS(event_type, "birthday") AND DATE(date) >= DATE("2019-01-25")

    This seeks a match on any document whose date value is equal to or later than "2019-01-25". If created, the following hypothetical document provides a successful match:

    {
      "event_type": "birthday",
      "venue": "white hart hotel, salisbury, uk",
      "session": "afternoon",
      "dining_preference": "buffet",
      "number_of_guests": 25,
      "date": "2019-01-25"
    }

    Note that times are supported by utilizing the following RFC-3339 format:

    DATE(transaction.time) < DATE(2018-01-01T12:00Z)

    XDCR Date Operators: Comparison with SQL++

    SQL++ date functions are not supported by XDCR Advanced Filtering. For information on SQL++ date functions, see the SQL++ page for Date Functions.

    Reserved Words

    Reserved Words are words used syntactically by XDCR Advanced Filtering. If these words have used as identifiers in JSON documents, in order to be referenced in filtering expressions, they must be escaped, by means of backticks. The complete list of words is as follows:

    AND

    EXISTS

    FALSE

    IF

    IS

    META

    MISSING

    NOT

    NULL

    OR

    TRUE

    Filtering Expression BNF

    The relationships between available expressions for XDCR Advanced Filtering are expressed in the following table, in Backus-Naur Form.

    Expression Is Equal To

    FilterExpression

    ( "(" FilterExpression ")" { "AND" FilterExpression } { "OR" FilterExpression } ) | InnerExpression { "AND" FilterExpression }

    InnerExpression

    AndCondition { "OR" AndCondition }

    AndCondition

    Condition { "AND" Condition }

    Condition

    ( [ "NOT" ] Condition ) | Operand

    Operand

    BooleanExpr | ( LHS ( CheckOp | ( CompareOp RHS) ) )

    BooleanExpr

    Boolean | BooleanFuncExpr

    LHS

    ConstFuncExpr | Boolean | Field | Value

    RHS

    ConstFuncExpr | Boolean | Value | Field

    CompareOp

    "=" | "==" | "<>" | "!=" | ">" | ">=" | "<" | "⇐"

    CheckOp

    ( "IS" [ "NOT" ] ( NULL | MISSING ) )

    Field

    { @"-" } OnePath { "." OnePath } { MathOp MathValue }

    OnePath

    ( PathFuncExpression | StringType ){ ArrayIndex }

    StringType

    @String | @Ident | @RawString | @Char

    ArrayIndex

    "[" @Int "]"

    Value

    @String

    ConstFuncExpr

    ConstFuncNoArg | ConstFuncOneArg | ConstFuncTwoArgs

    ConstFuncNoArg

    ConstFuncNoArgName "(" ")"

    ConstFuncNoArgName

    "PI" | "E"

    ConstFuncOneArg

    ConstFuncOneArgName "(" ConstFuncArgument ")"

    ConstFuncOneArgName

    "ABS" | "ACOS"…​

    ConstFuncTwoArgs

    ConstFuncTwoArgsName "(" ConstFuncArgument "," ConstFuncArgument ")"

    ConstFuncTwoArgsName

    "ATAN2" | "POW"

    ConstFuncArgument

    Field | Value | ConstFuncExpr

    ConstFuncArgumentRHS

    Value

    PathFuncExpression

    OnePathFuncNoArg

    OnePathFuncNoArg

    OnePathFuncNoArgName "(" ")"

    MathOp

    @"+" | @"-" | @"*" | @"/" | @"%"

    MathValue

    @Int | @Float

    OnePathFuncNoArgName

    "META"

    BooleanFuncExpr

    BooleanFuncTwoArgs | ExistsClause

    BooleanFuncTwoArgs

    BooleanFuncTwoArgsName "(" ConstFuncArgument "," ConstFuncArgumentRHS ")"

    BooleanFuncTwoArgsName

    "REGEXP_CONTAINS"

    ExistsClause

    ( "EXISTS" "(" Field ")" )