Querying MapReduce Views

    +
    The content of the key that is generated by the emit() function provides information on how the data is selected from your view.

    In order to query a view, the view definition must include a suitable map function that uses the emit() function to generate each row of information.

    The key can be used when querying a view as the selection mechanism, either by using an:

    • explicit key — show all the records matching the exact structure of the supplied key.

    • list of keys — show all the records matching the exact structure of each of the supplied keys (effectively showing keyA or keyB or keyC).

    • range of keys — show all the records starting with keyA and stopping on the last instance of keyB.

    When querying the view results, a number of parameters can be used to select, limit, order and otherwise control the execution of the view and the information that is returned.

    When a view is accessed without specifying any parameters, the view produces results matching the following:

    • Full view specification, i.e. all documents are potentially output according to the view definition.

    • Limited to 10 items within the web console, unlimited through the REST API.

    • Reduce function used if defined in the view.

    • Items sorted in ascending order (using UTF-8 comparison for strings, natural number order)

    View results and the parameters operate and interact in a specific order. The interaction directly affects how queries are written and data is selected.

    views query flow

    The core arguments and selection systems are the same through both the REST API interface, and the client libraries. The setting of these values differs between different client libraries, but the argument names and expected and supported values are the same across all environments.

    Querying

    Querying can be performed through the REST API endpoint. The REST API supports and operates using the core HTTP protocol, and this is the same system used by the client libraries to obtain the view data.

    To retrieve views information, access any server node in a cluster on port 8092.

    The following is the HTTP method and URI used to query views:

    GET /[bucket-name]/_design/[ddoc-name]/_view/[view-name]

    Where:

    • bucket-name is the name of the bucket.

    • ddoc-name is the name of the design document that contains the view.

    • view-name is the name of the corresponding view within the design document.

    Development view, the ddoc-name is prefixed with dev_. For example, the design document beer is accessible as a development view using dev_beer.

    Production views are accessible using their name only.

    Parameters (optional):

    Table 1. Views parameters
    Parameters Type Description

    descending

    boolean

    Return the documents in descending by key order.

    endkey

    string

    Stop returning records when the specified key is reached. Key must be specified as a JSON value.

    endkey_docid

    string

    Stop returning records when the specified document ID is reached.

    full_set

    boolean

    Use the full cluster data set (development views only).

    group

    boolean

    Group the results using the reduce function to a group or single row. Note: Do not use group with group_level because they are not compatible.

    group_level

    numeric

    Specify the group level to be used. Note: Do not use group_level with group because they are not compatible.

    inclusive_end

    boolean

    Specifies whether the specified end key is included in the result. Note: Do not use inclusive_end with key or keys.

    key

    string

    Return only documents that match the specified key. Key must be specified as a JSON value.

    keys

    array

    Return only documents that match each of keys specified within the given array. Key must be specified as a JSON value. Sorting is not applied when using this option.

    limit

    numeric

    Limit the number of the returned documents to the specified number.

    on_error

    string

    Sets the response in the event of an error.

    Supported values:

    • continue : Continue to generate view information in the event of an error, including the error information in the view response stream.

    • stop : Stop immediately when an error condition occurs. No further view information is returned.

    reduce

    boolean

    Use the reduction function.

    skip

    numeric

    Skip this number of records before starting to return the results.

    stale

    string

    Allow the results from a stale view to be used.

    Supported values:

    • false : The server waits for the indexer to finish the changes that correspond to the current key-value document set and then returns the latest entries from the view index.

    • ok : The server returns the current entries from the index file including the stale views.

    • update_after : The server returns the current entries from the index, and then initiates an index update.

    startkey

    string

    Return records with a value equal to or greater than the specified key. Key must be specified as a JSON value.

    startkey_docid

    string

    Return records starting with the specified document ID.

    For additional details, see Getting Views Information.

    Selecting information

    Couchbase Server supports a number of mechanisms for selecting information returned by the view. Key selection is made after the view results (including the reduction function) are executed, and after the items in the view output have been sorted.

    When specifying keys to the selection mechanism, the key must be expressed in the form of a JSON value. For example, when specifying a single key, a string must be quoted ("string").

    When specifying the key selection through a parameter, the keys must match the format of the keys emitted by the view. Compound keys, for example where an array or hash has been used in the emitted key structure, the supplied selection value should also be an array or a hash.

    The following selection types are supported:

    • Explicit Key

    An explicit key can be specified using the parameter key. The view query will only return results where the key in the view output, and the value supplied to the key parameter match identically.

    For example, if you supply the value "tomato" only records matching exactly "tomato" will be selected and returned. Keys with values such as "tomatoes" will not be returned.

    • Key List

    A list of keys to be output can be specified by supplying an array of values using the keys parameter. In this instance, each item in the specified array will be used as explicit match to the view result key, with each array value being combined with a logical or.

    For example, if the value specified to the keys parameter was ["tomato","avocado"], then all results with a key of ‘tomato’ or ‘avocado’ will be returned.

    When using this query option, the output results are not sorted by key. This is because key sorting of these values would require collating and sorting all the rows before returning the requested information.

    In the event of using a compound key, each compound key must be specified in the query. For example:

    ```
    keys=[["tomato",20],["avocado",20]]
    ```
    • Key Range

    A key range, consisting of a startkey and endkey. These options can be used individually, or together, as follows:

    * `startkey` only
    
      Output does not start until the first occurrence of `startkey`, or a value
      greater than the specified value, is seen. Output will then continue until the
      end of the view.
    
    * `endkey` only
    
      Output starts with the first view result, and continues until the last
      occurrence of `endkey`, or until the emitted value is greater than the computed
      lexical value of `endkey`.
    
    * `startkey` and `endkey`
    
      Output of values does not start until `startkey` is seen, and stops when the
      last occurrence of `endkey` is identified.

    When using endkey, the inclusive_end option specifies whether output stops after the last occurrence of the specified endkey (the default). If set to false, output stops on the last result before the specified endkey is seen.

    The matching algorithm works on partial values, which can be used to an advantage when searching for ranges of keys.

    Do not use the inclusive_end parameter with key or keys parameters. The inclusive_end parameter is not designed to work with key or keys because it is an attribute of range operations.

    Selecting compound information by key or keys

    If you are generating a compound key within your view, for example when outputting a date split into individually year, month, day elements, then the selection value must exactly match the format and size of your compound key. The value of key or keys must exactly match the output key structure.

    For example, with the view data:

    {"total_rows":5693,"rows":[
    {"id":"1310653019.12667","key":[2011,7,14,14,16,59],"value":null},
    {"id":"1310662045.29534","key":[2011,7,14,16,47,25],"value":null},
    {"id":"1310668923.16667","key":[2011,7,14,18,42,3],"value":null},
    {"id":"1310675373.9877","key":[2011,7,14,20,29,33],"value":null},
    {"id":"1310684917.60772","key":[2011,7,14,23,8,37],"value":null},
    {"id":"1310693478.30841","key":[2011,7,15,1,31,18],"value":null},
    {"id":"1310694625.02857","key":[2011,7,15,1,50,25],"value":null},
    {"id":"1310705375.53361","key":[2011,7,15,4,49,35],"value":null},
    {"id":"1310715999.09958","key":[2011,7,15,7,46,39],"value":null},
    {"id":"1310716023.73212","key":[2011,7,15,7,47,3],"value":null}
    ]
    }

    Using the key selection mechanism you must specify the entire key value, i.e.:

    ?key=[2011,7,15,7,47,3]

    If you specify a value, such as only the date:

    ?key=[2011,7,15]

    The view will return no records, since there is no exact key match. Instead, you must use a range that encompasses the information range you want to output:

    ?startkey=[2011,7,15,0,0,0]&endkey=[2011,7,15,99,99,99]

    This will output all records within the specified range for the specified date.

    Partial selection and key ranges

    Matching of the key value has a precedence from right to left for the key value and the supplied startkey and/or endkey. Partial strings may therefore be specified and return specific information.

    For example, given the view data:

    "a",
     "aa",
     "bb",
     "bbb",
     "c",
     "cc",
     "ccc"
     "dddd"

    Specifying a startkey parameter with the value "aa" will return the last seven records, including "aa":

    "aa",
     "bb",
     "bbb",
     "c",
     "cc",
     "ccc",
     "dddd"

    Specifying a partial string to startkey will trigger output of the selected values as soon as the first value or value greater than the specified value is identified. For strings, this partial match (from left to right) is identified. For example, specifying a startkey of "d" will return:

    "dddd"

    This is because the first match is identified as soon as the a key from a view row matches the supplied startkey value from left to right. The supplied single character matches the first character of the view output.

    When comparing larger strings and compound values the same matching algorithm is used. For example, searching a database of ingredients and specifying a startkey of "almond" will return all the ingredients, including "almond", "almonds", and "almond essence".

    To match all of the records for a given word or value across the entire range, you can use the null value in the endkey parameter. For example, to search for all records that start only with the word "almond", you specify a startkey of "almond", and an endkey of "almond\u02ad" (i.e. with the last Latin character at the end). If you are using Unicode strings, you may want to use "\uefff".

    startkey="almond"&endkey="almond\u02ad"

    The precedence in this example is that output starts when ‘almond’ is seen, and stops when the emitted data is lexically greater than the supplied endkey. Although a record with the value "almond\02ad" will never be seen, the emitted data will eventually be lexically greater than "almond\02ad" and output will stop.

    In effect, a range specified in this way acts as a prefix with all the data being output that match the specified prefix.

    Partial selection with compound keys

    Compound keys, such as arrays or hashes, can also be specified in the view output, and the matching precedence can be used to provide complex selection ranges. For example, if time data is emitted in the following format:

    [year,month,day,hour,minute]

    Then precise date (and time) ranges can be selected by specifying the date and time in the generated data. For example, to get information between 1st April 2011, 00:00 and 30th September 2011, 23:59:

    ?startkey=[2011,4,1,0,0]&endkey=[2011,9,30,23,59]

    The flexible structure and nature of the startkey and endkey values enable selection through a variety of range specifications. For example, you can obtain all of the data from the beginning of the year until the 5th March using:

    ?startkey=[2011]&endkey=[2011,3,5,23,59]

    You can also examine data from a specific date through to the end of the month:

    ?startkey=[2011,3,16]&endkey=[2011,3,99]

    In the above example, the value for the day element of the array is an impossible value, but the matching algorithm will identify when the emitted value is lexically greater than the supplied endkey value, and information selected for output will be stopped.

    A limitation of this structure is that it is not possible to ignore the earlier array values. For example, to select information from 10am to 2pm each day, you cannot use this parameter set:

    ?startkey=[null,null,null,10,0]&endkey=[null,null,null,14,0]

    In addition, because selection is made by a outputting a range of values based on the start and end key, you cannot specify range values for the date portion of the query:

    ?startkey=[0,0,0,10,0]&endkey=[9999,99,99,14,0]

    This will instead output all the values from the first day at 10am to the last day at 2pm.

    Pagination

    Pagination over results can be achieved by using the skip and limit parameters. For example, to get the first 10 records from the view:

    ?limit=10

    The next ten records can obtained by specifying:

    ?skip=10&limit=10

    On the server, the skip option works by executing the query and literally iterating over the specified number of output records specified by skip, then returning the remainder of the data up until the specified limit records are reached, if the limit parameter is specified.

    When paginating with larger values for skip, the overhead for iterating over the records can be significant. A better solution is to track the document id output by the first query (with the limit parameter). You can then use startkey_docid to specify the last document ID seen, skip over that record, and output the next ten records.

    Therefore, the paging sequence is, for the first query:

    ?startkey="carrots"&limit=10

    Record the last document ID in the generated output, then use:

    ?startkey="carrots"&startkey_docid=DOCID&skip=1&limit=10

    When using startkey_docid you must specify the startkey parameter to specify the information being searched for. By using the startkey_docid parameter, Couchbase Server skips through the B-Tree index to the specified document ID. This is much faster than the skip/limit example shown above.

    Grouping in queries

    If you have specified an array as your compound key within your view, then you can specify the group level to be applied to the query output when using a reduce().

    When grouping is enabled, the view output is grouped according to the key array, and you can specify the level within the defined array that the information is grouped by. You do this by specifying the index within the array by which you want the output grouped using the group_level parameter.

    views grouping

    The group_level parameter specifies the array index (starting at 1) at which you want the grouping occur, and generate a unique value based on this value that is used to identify all the items in the view output that include this unique value:

    • A group level of 0 groups by the entire dataset (as if no array exists).

    • A group level of 1 groups the content by the unique value of the first element in the view key array. For example, when outputting a date split by year, month, day, hour, minute, each unique year will be output.

    • A group level of 2 groups the content by the unique value of the first and second elements in the array. With a date, this outputs each unique year and month, including all records with that year and month into each group.

    • A group level of 3 groups the content by the unique value of the first three elements of the view key array. In a date this outputs each unique date (year, month, day) grouping all items according to these first three elements.

    The grouping will work for any output structure where you have output an compound key using an array as the output value for the key.

    Selection when grouping

    When using grouping and selection using the key, keys, or startkey / endkey parameters, the query value should match at least the format (and element count) of the group level that is being queried.

    For example, using the following map() function to output information by date as an array:

    function(doc, meta)
    {
      emit([doc.year, doc.mon, doc.day], doc.logtype);
    }

    If you specify a group_level of 2 then you must specify a key using at least the year and month information. For example, you can specify an explicit key, such as [2012,8] :

    ?group_level=2&key=[2012,8]

    You can query it for a range:

    ?group_level=2&startkey=[2012,2]&endkey=[2012,8]

    You can also specify a year, month and day, while still grouping at a higher level. For example, to group by year/month while selecting by specific dates:

    ?group_level=2&startkey=[2012,2,15]&endkey=[2012,8,10]

    Specifying compound keys that are shorter than the specified group level may output unexpected results due to the selection mechanism and the way startkey and endkey are used to start and stop the selection of output rows.

    Ordering

    All view results are automatically output sorted, with the sorting based on the content of the key in the output view. Views are sorted using a specific sorting format, with the basic order for all basic and compound follows as follows:

    • null

    • false

    • true

    • Numbers

    • Text (case sensitive, lowercase first, UTF-8 order)

    • Arrays (according to the values of each element, in order)

    • Objects (according to the values of keys, in key order)

    The natural sorting is therefore by default close to natural sorting order both alphabetically (A-Z) and numerically (0-9).

    There is no collation or foreign language support. Sorting is always according to the above rules based on UTF-8 values.

    You can alter the direction of the sorting (reverse, highest to lowest numerically, Z-A alphabetically) by using the descending option. When set to true, this reverses the order of the view results, ordered by their key.

    Because selection is made after sorting the view results, if you configure the results to be sorted in descending order and you are selecting information using a key range, then you must also reverse the startkey and endkey parameters. For example, if you query ingredients where the start key is ‘tomato’ and the end key is ‘zucchini’, for example:

    ?startkey="tomato"&endkey="zucchini"

    The selection will operate, returning information when the first key matches ‘tomato’ and stopping on the last key that matches ‘zucchini’.

    If the return order is reversed:

    ?descending=true&startkey="tomato"&endkey="zucchini"

    The query will return only entries matching ‘tomato’. This is because the order will be reversed, ‘zucchini’ will appear first, and it is only when the results contain ‘tomato’ that any information is returned.

    To get all the entries that match, the startkey and endkey values must also be reversed:

    ?descending=true&startkey="zucchini"&endkey="tomato"

    The above selection will start generating results when ‘zucchini’ is identified in the key, and stop returning results when ‘tomato’ is identified in the key.

    View output and selection are case sensitive. Specifying the key ‘Apple’ will not return ‘apple’ or ‘APPLE’ or other case differences. Normalizing the view output and query input to all lowercase or upper case will simplify the process by eliminating the case differences.

    Understanding letter ordering in views

    Couchbase Server uses a Unicode collation algorithm to order letters, so you should be aware of how this functions. Most developers are typically used to Byte order, such as that found in ASCII and which is used in most programming languages for ordering strings during string comparisons.

    The following shows the order of precedence used in Byte order, such as ASCII:

    123456890 < A-Z < a-z

    This means any items that start with integers will appear before any items with letters; any items that beginning with capital letters will appear before items in lower case letters. This means the item named "Apple" will appear before "apple" and the item "Zebra" will appear before "apple". Compare this with the order of precedence used in Unicode collation, which is used in Couchbase Server:

    123456790 < aAbBcCdDeEfFgGhH...

    Notice again that items that start with integers will appear before any items with letters. However, in this case, the lowercase and then uppercase of the same letter are grouped together. This means that "apple" will appear before "Apple" and will also appear before "Zebra." In addition, be aware that with accented characters will follow this ordering:

    a < á < A < Á < b

    This means that all items starting with "a" and accented variants of the letter will occur before "A" and any accented variants of "A."

    Ordering Example

    In Byte order, keys in an index would appear as follows:

    "ABC123" < "ABC223" < "abc123" < "abc223" < "abcd23" < "bbc123" < "bbcd23"

    The same items will be ordered this way by Couchbase Server under Unicode collation:

    "abc123" < "ABC123" < "abc223" < "ABC223" < "abcd23" < "bbc123" < "bbcd23"

    This is particularly important for you to understand if you query Couchbase Server with a startkey and endkey to get back a range of results. The items you would retrieve under Byte order are different compared to Unicode collation.

    Ordering and Query Example

    This following example demonstrates Unicode collation in Couchbase Server and the impact on query results returned with a startkey and endkey. It is based on the beer-sample database provided with Couchbase Server.

    Imagine you want to retrieve all breweries with names starting with uppercase Y. Your query parameters would appear as follows:

    startkey="Y"&endkey="z"

    If you want breweries starting with lowercase y or uppercase Y, you would provides a query as follows:

    startkey="y"&endkey="z"

    This will return all names with lower case Y and items up to, but not including lowercase z, thereby including uppercase Y as well. To retrieve the names of breweries starting with lowercase y only, you would terminate your range with capital Y:

    startkey="y"&endkey="Y"

    As it happens, the sample database does not contain any results because there are no beers in it which start with lowercase y.

    Error control

    There are a number of parameters that can be used to help control errors and responses during a view query.

    • on_error

    The on_error parameter specifies whether the view results will be terminated on the first error from a node, or whether individual nodes can fail and other nodes return information.

    When returning the information generated by a view request, the default response is for any raised error to be included as part of the JSON response, but for the view process to continue. This permits for individual nodes within the Couchbase cluster to timeout or fail, while still generating the requested view information.

    In this instance, the error is included as part of the JSON returned:

    {
       "errors" : [
          {
             "from" : "http://192.168.1.80:9503/_view_merge/?stale=false",
             "reason" : "req_timedout"
          },
          {
             "from" : "http://192.168.1.80:9502/_view_merge/?stale=false",
             "reason" : "req_timedout"
          },
          {
             "from" : "http://192.168.1.80:9501/_view_merge/?stale=false",
             "reason" : "req_timedout"
          }
       ],
       "rows" : [
          {
             "value" : 333280,
             "key" : null
          }
       ]
    }

    You can alter this behavior by using the on_error argument. The default value is continue. If you set this value to stop then the view response will cease the moment an error occurs. The returned JSON will contain the error information for the node that returned the first error. For example:

    ```
    {
       "errors" : [
          {
             "from" : "http://192.168.1.80:9501/_view_merge/?stale=false",
             "reason" : "req_timedout"
          }
       ],
       "rows" : [
          {
             "value" : 333280,
             "key" : null
          }
       ]
    }
    ```