MongoDB’s New Bulk API

May 6 • Posted 2 months ago

By Christian Kvalheim, Driver Lead and Node.js Driver Maintainer at MongoDB

The New Bulk API

One of the core new features in MongoDB 2.6 is the new bulk write operations. All the drivers include a new bulk api that allows applications to leverage these new operations using a fluid style API. Let’s explore the API and how it’s implemented in the Node.js driver.

The API

The API has two core concepts. The ordered and the unordered bulk operation. The main difference is in the way the operations are executed in bulk. In the case of an ordered bulk operation, every operation will be executed in the order they are added to the bulk operation. In the case of an unordered bulk operation however there is no guarantee what order the operations are executed. Later we will look at how each is implemented.

Operations

You can initialize an ordered or unordered bulk operation in the following way.

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    var unordered = db.collection('documents').initializeUnorderedBulkOp();

Both the ordered and unordered instances are bulk operation objects that we can add insert, update and remove operations to. The following operations are valid.

updateOne (update first matching document)

```ordered.find({ a : 1 }).updateOne({$inc : {x : 1}});```

update (update all matching documents)

```ordered.find({ a : 1 }).update({$inc : {x : 2}});```

replaceOne (replace entire document)

```ordered.find({ a : 1 }).replaceOne({ x : 2});```

updateOne or upsert (update first existing document or upsert)

```ordered.find({ a : 2 }).upsert().updateOne({ $inc : { x : 1}});```

update or upsert (update all or upsert)

```ordered.find({ a : 2 }).upsert().update({ $inc : { x : 2}});```

replace or upsert (replace first document or upsert)

```ordered.find({ a : 2 }).upsert().replaceOne({ x : 3 });```

removeOne (remove the first document matching)

```ordered.find({ a : 2 }).removeOne();```

remove (remove all documents matching)

```ordered.find({ a : 1 }).remove();```

insert

```ordered.insert({ a : 5});```

What happens under the covers when you start adding operations to a bulk operation? Let’s take a look at the new write operations to see how it works.

The New Write Operations

MongoDB 2.6 introduces a completely new set of write operations. Before 2.6 all write operations where done using wire protocol messages at the socket level. From 2.6 this changes to using commands.

### Insert Write Command

The insert write commands allow an application to insert batches of documents. Here’s an example:

    {
        insert: 'collection name'
      , documents: [{ a : 1}, ...]
      , writeConcern: {
        w: 1, j: true, wtimeout: 1000
      }
      , ordered: true/false
    }

A couple of things to note. The documents field contains an array of all the documents that are to be inserted. The writeConcern field specifies what would have previously been a getLastError command that would follow the pre 2.6 write operations. In other words there is always a response from a write operation in 2.6. This means that w:0 has different semantics than what one is used to in pre 2.6. In the context w:0 basically means only return an ack without any information about the success or failure of insert operations.

Let’s take a look at the update and remove write commands before seeing the results that are returned when executing these operations in 2.6.

Update Write Command

There are some slight differences in the update write command in comparison to the insert write command. Here’s an example:

    {
        update: 'collection name'
      , updates: [{ 
            q: { a : 1 }
          , u: { $inc : { x : 1}}
          , multi: true/false
          , upsert: true/false
        }, ...]
      , writeConcern: {
        w: 1, j: true, wtimeout: 1000
      }
      , ordered: true/false
    }

The main difference here is that the updates array is an array of update operations where each entry in the array contains the q field that specifies the selector for the update. The u contains the update operation. multi specifies if we will updateOne or updateAll documents that matches the selection. Finally upsert tells the server if it will perform an upsert if the document is not found.

Finally let’s look at the remove write command.

Remove Write Command

The remove write command is very similar to the update write command. Here’s an example:

    {
        delete: 'collection name'
      , deletes: [{ 
            q: { a : 1 }
          , limit: 0/1
        }, ...]
      , writeConcern: {
        w: 1, j: true, wtimeout: 1000
      }
      , ordered: true/false
    }

Similar to the update example, we can see that the entries in the deletes array contain documents with specific fields. The q field is the selector that will match which documents will be removed. The limit field sets the number of elements to be removed. Currently limit only supports two values, 0 and 1. The value 0 for limit removes all documents that match the selector. A value of 1 for limit removes the first matching document only.

Now let’s take a look at how results are returned for these new write commands.

Write Command Results

One of the best new aspects of the new write commands is that they can return information about each individual operation error in the batch. Results are efficient - only information about errors are returned as well as the aggregated counts of successful operations. Here’s an example of a comprehensive* result:

    {
      "ok" : 1,
      "n" : 0,
      "nModified": 1, (Applies only to update)
      "nRemoved": 1, (Applies only to removes)
      "writeErrors" : [
        {
          "index" : 0,
          "code" : 11000,
          "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: t1.t.$a_1  dup key: { : 1.0 }"
        }
      ],
      writeConcernError: {
        code : 22,
        errInfo: { wtimeout : true },
        errmsg: "Could not replicate operation within requested timeout"
      }      
    }

The two most interesting fields here are writeErrors and writeConcernError. If we take a look at writeErrors we can see how it’s an array of objects that include an index field as well as a code and errmsg. The field references the position of the failing document in the original documents, updates or deletes array allowing the application to identify the original batch document that failed.

The Effect of Ordered (true/false)

If ordered is set to true the write operation will fail on the first write error (meaning the first error that fails to apply the operation to memory). If one sets ordered to false the operation will continue until all operations have been executed (potentially in parallel), then return all the results. writeConcernError on the other hand does not stop the processing of a bulk operation if a document fails to be written to MongoDB.

It helps to think of writeErrors as hard errors and writeConcernError as a soft error.

The Special Case of w:0 As I mentioned previously, the semantics for w:0 changed for the write commands. The old style of write operations before 2.6 are a combination of a write wire message and a getLastError command. In the old style w:0 meant that the driver would not send a getLastError command after the write operation.

In 2.6 the new insert/update/delete commands will always respond. While w:0 would not return a result in versions of MongoDB before 2.6, in 2.6 and above it will. However it will truncate all the results and only return if the command ran successfully or failed.

As a result, if you execute.

    {
        insert: 'collection name'
      , documents: [{ a : 1}, ...]
      , writeConcern: {
        w: 0
      }
      , ordered: true/false
    }

All you receive from the server is the result

{ok : 1}

The Implication For The Bulk API

There are some implications to the fact that write commands are not mixed operations but either insert/update or removes. The Bulk API lets you mix operations and then merges the results back into a single result that simulates a mixed operations command in MongoDB. What does that mean in practice. Well let’s look at how node.js implements ordered and unordered bulk operations. Let’s use examples to show what happens.

Ordered Operations

Let’s take the following set of operations:

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    ordered.insert({ a : 1 });
    ordered.find({ a : 1 }).update({ $inc: { x : 1 }});
    ordered.insert({ a: 2 });
    ordered.find({ a : 2 }).remove();
    ordered.insert({ a: 3 });

When running in ordered mode the bulk API guarantees the ordering of the operations and thus will execute this as 5 operations one after the other:

    insert bulk operation
    update bulk operation
    insert bulk operation
    remove bulk operation
    insert bulk operation

We have now reduced the bulk API to performing single operations and your throughput suffers accordingly.

If we re-order our bulk operations in the following way:

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    ordered.insert({ a : 1 });
    ordered.insert({ a: 2 });
    ordered.insert({ a: 3 });
    ordered.find({ a : 1 }).update({ $inc: { x : 1 }});
    ordered.find({ a : 2 }).remove();

The execution is reduced to the following operations one after the other:

    insert bulk operation
    update bulk operation
    remove bulk operation

Thus for ordered bulk operations the order of operations will impact the number of write commands that need to be executed and thus the throughput possible.

Unordered Operations

Unordered operations do not guarantee the execution order of operations. Let’s take the example from above:

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    ordered.insert({ a : 1 });
    ordered.find({ a : 1 }).update({ $inc: { x : 1 }});
    ordered.insert({ a: 2 });
    ordered.find({ a : 2 }).remove();
    ordered.insert({ a: 3 });

The Node.js driver will collect the operations into separate type-specific operations. So we get.

    insert bulk operation
    update bulk operation
    remove bulk operation

In difference to the ordered operation these bulks all get executed in parallel in Node.js and the results then merged when they have all finished.

Takeaway

MongoDB as of 2.6 only allows batches of inserts, updates or removes and not a mixed batch containing all three of the operation types. When performing ordered bulk operation we need to keep this in mind to avoid the scenario above. However for an unordered bulk operation the missing mixed batch type in 2.6 does not impact performance.

Note: Although the Bulk API actually supports downconversion to 2.4 the performance impact is considerable as all operations are reduced to single write operations with a getLastError. It’s recommended to leverage this API primarily with 2.6 or higher.

Like what you see? Sign up to the MongoDB Newsletter and get monthly updates straight to your inbox

blog comments powered by Disqus
blog comments powered by Disqus