Function: Convert Bucket to Collections

    +

    Goal: Demonstrate Converting "upgraded" Buckets to Collections.

    • This function ConvertBucketToCollections demonstrates a simple technique to reorganize into buckets.

    • Requires Eventing Storage (or metadata collection) and a "source" collection listing to the sample "beer-sample".

    • The sample "beer-sample" is equivalent to an upgraded bucket where all data resides in beer-sample._default._default.

    • Needs Bindings of type "Constant alias" (as documented in the Scriptlet).

    • Needs Bindings of type "bucket alias" (as documented in the Scriptlet).

    • If "Constant alias" DO_COPY is true

      • move all type` = "brewery" to COLLECTION bulk.data.brewery

      • move all type` = "beer" to COLLECTION bulk.data.beer

    • If "Constant alias" DO_DELETE is true

      • remove all type` = "brewery" from upgrade COLLECTION beer-sample._default._default

      • remove all type` = "beer" from upgrade COLLECTION beer-sample._default._default

    • The function should have higher throughput as you add more workers up to the # of vCPUs.

    • Example of performance using this technique:

      • Test cluster is a symmetric 3 x AWS r5.2xlarge (64 GiB of memory, 8 vCPUs, 64-bit platform).

      • Will process 93K ops/sec in a steady state.

      • 250M small documents: takes 44 minutes to reorganize a bucket with 80 types into a new bucket with 80 collections.

      • 1B small documents: takes 3 hours to reorganize a bucket with 80 types into a new bucket with 80 collections.

    If you alter this function and attempt to run this Eventing function in a single bucket you will have to disable recursion checks. Refer to Disabling infinite recursion checks. In this case, always test your Eventing Function on a non-production system to ensure you do not mistakenly create an infinite recursion loop.
    • ConvertBucketToCollections

    • Input Data/Mutation

    • Output Data/Logged

    // To run configure the settings for this Function, ConvertBucketToCollections, as follows:
    //
    // Version 7.1+
    //   "Function Scope"
    //     *.* (or try beer-sample.data if non-privileged)
    // Version 7.0+
    //   "Listen to Location"
    //     beer-sample.data.source
    //   "Eventing Storage"
    //     rr100.eventing.metadata
    //   Binding(s)
    //    1. "binding type", "alias name...", "bucket.scope.collection",       "Access"
    //       "bucket alias", "src_col",       "beer-sample._default._default", "read and write"
    //       "bucket alias", "brewery_col",   "bulk.data.brewery",             "read and write"
    //       "bucket alias", "beer_col",      "bulk.data.beer",                "read and write"
    //
    //    2. "binding type",   "alias name...", "value",
    //       "Constant alias", "DO_COPY",       true
    //       "Constant alias", "DO_DELETE",     true
    //
    // Version 6.6.2 (not applicable)
    
    // Upgrades `beer-sample` from a bucket paradigm to a collection/keyspace paradigm.
    
    function OnUpdate(doc, meta) {
        if (doc.type === 'beer') {
            if (DO_COPY) beer_col[meta.id] = doc;
            if (DO_DELETE) {
                if (!beer_col[meta.id]) { // safety check
                    log("skip delete copy not found type=" + doc.type + ", meta.id=" + meta.id);
                } else {
                    delete src_col[meta.id];
                }
            }
        }
        if (doc.type === 'brewery') {
            if (DO_COPY) brewery_col[meta.id] = doc;
            if (DO_DELETE) {
                if (!brewery_col[meta.id]) {  // safety check
                    log("skip delete copy not found type=" + doc.type + ", meta.id=" + meta.id);
                } else {
                    delete src_col[meta.id];
                }
            }
        }
    }
    LOAD THE SAMPLE `beer-sample` from the UI Settings page
    CREATE two empty keyspaces `bulk`.`data`.`brewery` and `bulk`.`data`.`beer`
    THE `beer-sample`.`_default`.`_default` keyspace is empty (0 documents)
    The keyspace `bulk`.`data`.`brewery` has 1,412 documents of type == "brewery"
    The keyspace `bulk`.`data`.`beer` has 5,891 documents of type == "beer"