Posts tagged:

developers

MongoDB’s New Bulk API

May 6 • Posted 2 months ago

By Christian Kvalheim, Driver Lead and Node.js Driver Maintainer at MongoDB

The New Bulk API

One of the core new features in MongoDB 2.6 is the new bulk write operations. All the drivers include a new bulk api that allows applications to leverage these new operations using a fluid style API. Let’s explore the API and how it’s implemented in the Node.js driver.

The API

The API has two core concepts. The ordered and the unordered bulk operation. The main difference is in the way the operations are executed in bulk. In the case of an ordered bulk operation, every operation will be executed in the order they are added to the bulk operation. In the case of an unordered bulk operation however there is no guarantee what order the operations are executed. Later we will look at how each is implemented.

Operations

You can initialize an ordered or unordered bulk operation in the following way.

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    var unordered = db.collection('documents').initializeUnorderedBulkOp();

Both the ordered and unordered instances are bulk operation objects that we can add insert, update and remove operations to. The following operations are valid.

updateOne (update first matching document)

```ordered.find({ a : 1 }).updateOne({$inc : {x : 1}});```

update (update all matching documents)

```ordered.find({ a : 1 }).update({$inc : {x : 2}});```

replaceOne (replace entire document)

```ordered.find({ a : 1 }).replaceOne({ x : 2});```

updateOne or upsert (update first existing document or upsert)

```ordered.find({ a : 2 }).upsert().updateOne({ $inc : { x : 1}});```

update or upsert (update all or upsert)

```ordered.find({ a : 2 }).upsert().update({ $inc : { x : 2}});```

replace or upsert (replace first document or upsert)

```ordered.find({ a : 2 }).upsert().replaceOne({ x : 3 });```

removeOne (remove the first document matching)

```ordered.find({ a : 2 }).removeOne();```

remove (remove all documents matching)

```ordered.find({ a : 1 }).remove();```

insert

```ordered.insert({ a : 5});```

What happens under the covers when you start adding operations to a bulk operation? Let’s take a look at the new write operations to see how it works.

The New Write Operations

MongoDB 2.6 introduces a completely new set of write operations. Before 2.6 all write operations where done using wire protocol messages at the socket level. From 2.6 this changes to using commands.

### Insert Write Command

The insert write commands allow an application to insert batches of documents. Here’s an example:

    {
        insert: 'collection name'
      , documents: [{ a : 1}, ...]
      , writeConcern: {
        w: 1, j: true, wtimeout: 1000
      }
      , ordered: true/false
    }

A couple of things to note. The documents field contains an array of all the documents that are to be inserted. The writeConcern field specifies what would have previously been a getLastError command that would follow the pre 2.6 write operations. In other words there is always a response from a write operation in 2.6. This means that w:0 has different semantics than what one is used to in pre 2.6. In the context w:0 basically means only return an ack without any information about the success or failure of insert operations.

Let’s take a look at the update and remove write commands before seeing the results that are returned when executing these operations in 2.6.

Update Write Command

There are some slight differences in the update write command in comparison to the insert write command. Here’s an example:

    {
        update: 'collection name'
      , updates: [{ 
            q: { a : 1 }
          , u: { $inc : { x : 1}}
          , multi: true/false
          , upsert: true/false
        }, ...]
      , writeConcern: {
        w: 1, j: true, wtimeout: 1000
      }
      , ordered: true/false
    }

The main difference here is that the updates array is an array of update operations where each entry in the array contains the q field that specifies the selector for the update. The u contains the update operation. multi specifies if we will updateOne or updateAll documents that matches the selection. Finally upsert tells the server if it will perform an upsert if the document is not found.

Finally let’s look at the remove write command.

Remove Write Command

The remove write command is very similar to the update write command. Here’s an example:

    {
        delete: 'collection name'
      , deletes: [{ 
            q: { a : 1 }
          , limit: 0/1
        }, ...]
      , writeConcern: {
        w: 1, j: true, wtimeout: 1000
      }
      , ordered: true/false
    }

Similar to the update example, we can see that the entries in the deletes array contain documents with specific fields. The q field is the selector that will match which documents will be removed. The limit field sets the number of elements to be removed. Currently limit only supports two values, 0 and 1. The value 0 for limit removes all documents that match the selector. A value of 1 for limit removes the first matching document only.

Now let’s take a look at how results are returned for these new write commands.

Write Command Results

One of the best new aspects of the new write commands is that they can return information about each individual operation error in the batch. Results are efficient - only information about errors are returned as well as the aggregated counts of successful operations. Here’s an example of a comprehensive* result:

    {
      "ok" : 1,
      "n" : 0,
      "nModified": 1, (Applies only to update)
      "nRemoved": 1, (Applies only to removes)
      "writeErrors" : [
        {
          "index" : 0,
          "code" : 11000,
          "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: t1.t.$a_1  dup key: { : 1.0 }"
        }
      ],
      writeConcernError: {
        code : 22,
        errInfo: { wtimeout : true },
        errmsg: "Could not replicate operation within requested timeout"
      }      
    }

The two most interesting fields here are writeErrors and writeConcernError. If we take a look at writeErrors we can see how it’s an array of objects that include an index field as well as a code and errmsg. The field references the position of the failing document in the original documents, updates or deletes array allowing the application to identify the original batch document that failed.

The Effect of Ordered (true/false)

If ordered is set to true the write operation will fail on the first write error (meaning the first error that fails to apply the operation to memory). If one sets ordered to false the operation will continue until all operations have been executed (potentially in parallel), then return all the results. writeConcernError on the other hand does not stop the processing of a bulk operation if a document fails to be written to MongoDB.

It helps to think of writeErrors as hard errors and writeConcernError as a soft error.

The Special Case of w:0 As I mentioned previously, the semantics for w:0 changed for the write commands. The old style of write operations before 2.6 are a combination of a write wire message and a getLastError command. In the old style w:0 meant that the driver would not send a getLastError command after the write operation.

In 2.6 the new insert/update/delete commands will always respond. While w:0 would not return a result in versions of MongoDB before 2.6, in 2.6 and above it will. However it will truncate all the results and only return if the command ran successfully or failed.

As a result, if you execute.

    {
        insert: 'collection name'
      , documents: [{ a : 1}, ...]
      , writeConcern: {
        w: 0
      }
      , ordered: true/false
    }

All you receive from the server is the result

{ok : 1}

The Implication For The Bulk API

There are some implications to the fact that write commands are not mixed operations but either insert/update or removes. The Bulk API lets you mix operations and then merges the results back into a single result that simulates a mixed operations command in MongoDB. What does that mean in practice. Well let’s look at how node.js implements ordered and unordered bulk operations. Let’s use examples to show what happens.

Ordered Operations

Let’s take the following set of operations:

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    ordered.insert({ a : 1 });
    ordered.find({ a : 1 }).update({ $inc: { x : 1 }});
    ordered.insert({ a: 2 });
    ordered.find({ a : 2 }).remove();
    ordered.insert({ a: 3 });

When running in ordered mode the bulk API guarantees the ordering of the operations and thus will execute this as 5 operations one after the other:

    insert bulk operation
    update bulk operation
    insert bulk operation
    remove bulk operation
    insert bulk operation

We have now reduced the bulk API to performing single operations and your throughput suffers accordingly.

If we re-order our bulk operations in the following way:

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    ordered.insert({ a : 1 });
    ordered.insert({ a: 2 });
    ordered.insert({ a: 3 });
    ordered.find({ a : 1 }).update({ $inc: { x : 1 }});
    ordered.find({ a : 2 }).remove();

The execution is reduced to the following operations one after the other:

    insert bulk operation
    update bulk operation
    remove bulk operation

Thus for ordered bulk operations the order of operations will impact the number of write commands that need to be executed and thus the throughput possible.

Unordered Operations

Unordered operations do not guarantee the execution order of operations. Let’s take the example from above:

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    ordered.insert({ a : 1 });
    ordered.find({ a : 1 }).update({ $inc: { x : 1 }});
    ordered.insert({ a: 2 });
    ordered.find({ a : 2 }).remove();
    ordered.insert({ a: 3 });

The Node.js driver will collect the operations into separate type-specific operations. So we get.

    insert bulk operation
    update bulk operation
    remove bulk operation

In difference to the ordered operation these bulks all get executed in parallel in Node.js and the results then merged when they have all finished.

Takeaway

MongoDB as of 2.6 only allows batches of inserts, updates or removes and not a mixed batch containing all three of the operation types. When performing ordered bulk operation we need to keep this in mind to avoid the scenario above. However for an unordered bulk operation the missing mixed batch type in 2.6 does not impact performance.

Note: Although the Bulk API actually supports downconversion to 2.4 the performance impact is considerable as all operations are reduced to single write operations with a getLastError. It’s recommended to leverage this API primarily with 2.6 or higher.

Like what you see? Sign up to the MongoDB Newsletter and get monthly updates straight to your inbox

Introducing the MongoDB Community Kit

Oct 16 • Posted 9 months ago

Open source projects thrive as a reflection of the participation and enthusiasm of their communities. MongoDB is a great example of a global community, and we have seen a number of people, like MongoDB MUG organizers and MongoDB Masters, create lasting impact for MongoDB through their work with the community. To encourage growth in the MongoDB Community, we’ve taken what we’ve learned and turned into a new resource: The MongoDB Community Kit.

Sometimes people want to get involved but aren’t sure how how to get started. This tool can help you as a user and a community member contribute to the project in whatever way you like. Based on our experiences with thousands of developers over the past 4 years, the MongoDB community team have developed a number of techniques that will help you provide valuable impact. For instance, you can:

  • Give a talk on how you scaled your MongoDB infrastructure
  • Write a blog post with real advice on how to use MongoDB in production
  • Create an open source tool for MongoDB that enables other users to code faster and create better deployments.
  • Create a User Group in your local area that educates new users and brings a community together
  • Contribute to the community in whichever way you like, even if it isn’t listed in the Community Kit. The invitation is open.

The Kit is available on Github as an open source project. This makes it easy to access, fork and update. This package is released under the Creative Commons license CC BY-NC-SA 3.0. Just like any other project, this kit gets better when users contribute their knowledge, so we encourage you to submit a pull request and add your feedback.

Some suggestions:

  • Add some posters you created for a MUG
  • Post your “Intro to MongoDB Slides”
  • Fork the kit and translate it into your own language and share with your community

We’re looking forward to seeing all the great activity coming from the community. Keep the pull requests coming!

Enhancing the F# developer experience with MongoDB

Aug 28 • Posted 11 months ago

This is a guest post by Max Hirschhorn, who is currently an intern at MongoDB.

About the F# programming language

F# is a multi-paradigm language built on the .NET framework. It is functional-first and prefers immutability, but also supports object-oriented and imperative programming styles.

Also, F# is a statically-typed language with a type inference system. It has a syntax similar to Ocaml, and draws upon ideas from other functional programming languages such as Erlang and Haskell.

Using the existing .NET driver

The existing .NET driver is compatible with F#, but is not necessarily written in a way that is idiomatic to use from F#.

Part of the reason behind this is that everything in F# is explicit. For example, consider the following example interface and implementing class.

[<Interface>]
type I =
    abstract Foo : unit -> string

type C() =
    interface I with
        member __.Foo () = "bar"

// example usage
let c = C()
(c :> I).Foo()

So in order to use any of the interface members, the class must be upcasted using the :> operator. Note that this cast is still checked at compile-time.

In a similar vein, C# supports implicit operators, which the BSON library uses for converting between a primitive value and its BsonValue equivalent, e.g.

new BsonDocument {
    { "price", 1.99 },
    { "$or", new BsonDocument {
        { "qty", new BsonDocument { { "$lt", 20 } } },
        { "sale", true }
    } }
};

whereas F# does not. This requires the developer to explicitly construct the appropriate type of BsonValue, e.g.

BsonDocument([ BsonElement("price", BsonDouble(1.99))
               BsonElement("$or", BsonArray([ BsonDocument("qty", BsonDocument("$lt", BsonInt32(20)))
                                              BsonDocument("sale", BsonBoolean(true)) ])) ])

with the query builder, we can hide the construction of BsonDocument instances, e.g.

Query.And([ Query.EQ("price", BsonDouble(1.99))
            Query.OR([ Query.LT("qty", BsonInt32(20))
                       Query.EQ("sale", BsonBoolean(true)) ]) ])

It is worth noting that the need to construct the BsonValue instances is completely avoided when using a typed QueryBuilder.

type Item = {
    Price : float
    Quantity : int
    Sale : bool
}

let query = QueryBuilder<Item>()

query.And([ query.EQ((fun item -> item.Price), 1.99)
            query.Or([ query.LT((fun item -> item.Quantity), 20)
                       query.EQ((fun item -> item.Sale), true) ]) ])

What we are looking for is a solution that matches the brevity of F# code, offers type-safety if desired, and is easy to use from the language.

New features

The main focus of this project is to make writing queries against MongoDB as natural from the F# language as possible.

bson quotations

We strive to make writing predicates as natural as possible by reusing as many of the existing operators as possible.

A taste

Consider the following query

{ price: 1.99, $or: [ { qty: { $lt: 20 } }, { sale: true } ] }

we could express this with a code quotation

bson <@ fun (x : BsonDocument) -> x?price = 1.99 && (x?qty < 20 || x?sale = true) @>

or with type safety

bson <@ fun (x : Item) -> x.Price = 1.99 && (x.Quantity < 20 || x.Sale = true) @>
Breaking it down

The quotations are not actually executed, but instead are presented as an abstract syntax tree (AST), from which an equivalent BsonDocument instance is constructed.

The ? operator

The ? operator is defined to allow for an unchecked comparison. The F# language supports the ability to do a dynamic lookup (get) and assignment (set) via the ? and ?<- operators respectively, but does not actually provide a implementation.

So, the F# driver defines the ? operator as the value associated with a field in a document casted to a fresh generic type.

// type signature: BsonDocument -> string -> 'a
let (?) (doc : BsonDocument) (field : string) =
    unbox doc.[field]

and similarly defines the ?<- operator as the coerced assignment of a generically typed value to the associated field in the document.

// type signature: BsonDocument -> string -> 'a -> unit
let (?<-) (doc : BsonDocument) (field : string) value =
    doc.[field] = unbox value |> ignore
Queries

Unchecked expressions have the type signature Expr<BsonDocument -> bool>.

// $mod
bson <@ fun (x : BsonDocument) -> x?qty % 4 = 0 @>

Checked expressions have the type signature Expr<'DocType -> bool>.

// $mod
bson <@ fun (x : Item) -> x.Quantity % 4 = 0 @>
Updates

Unchecked expressions have the type signature Expr<BsonDocument -> unit list>. The reason for the list in the return type is to perform multiple update operations.

// $set
bson <@ fun (x : BsonDocument) -> [ x?qty <- 20 ] @>

// $inc
bson <@ fun (x : BsonDocument) -> [ x?qty <- (+) 1 ] @>
Mmm… sugar

A keen observer would notice that (+) 1 is not an int, but actually a function int -> int. We are abusing the fact that type safety is not enforced here by assigning the quantity field of the document to a lambda expression, that takes a single parameter of the current value.

Note that

// $inc
bson <@ fun (x : BsonDocument) -> [ x?qty <- x?qty + 1 ] @>

is also valid.

Checked expressions either have the type signature Expr<'DocType -> unit list> or Expr<'DocType -> 'DocType>, depending on whether the document type has mutable fields (only matters for record types).

// $set
bson <@ fun (x : Item) -> [ x.Quantity <- 20 ] @>

// $inc
bson <@ fun (x : Item) -> [ x.Quantity <- x.Quantity + 1 ] @>

mongo expressions

Uses the monadic structure (computation expression) to define a pipeline of operations that are executed on each document in the collection.

Queries
let collection : IMongoCollection<BsonDocument> = ...

mongo {
    for x in collection do
    where (x?price = 1.99 && (x?qty < 20 || x?sale = true))
}

or with a typed collection

let collection : IMongoCollection<Item> = ...

mongo {
    for x in collection do
    where (x.price = 1.99 && (x.qty < 20 || x.sale = true))
}
Updates
let collection : IMongoCollection<BsonDocument> = ...

mongo {
    for x in collection do
    update
    set x?price 0.99
    inc x?qty 1
}

or with a typed collection

let collection : IMongoCollection<Item> = ...

mongo {
    for x in collection do
    update
    set x.Price 0.99
    inc x.Quantity 1
}

Serialization of F# data types

Now supports

Conclusion

Resources

The source code is available at GitHub. We absolutely encourage you to experiment with it and provide us feedback on the API, design, and implementation. Bug reports and suggestions for improvements are welcomed, as are pull requests.

Disclaimer. The API and implementation are currently subject to change at any time. You must not use this driver in production, as it is still under development and is in no way supported by MongoDB, Inc.

Acknowledgments

Many thanks to the guidance from the F# community on Twitter, and my mentors: Sridhar Nanjundeswaran, Craig Wilson, and Robert Stam. Also, a special thanks to Stacy Ferranti and Ian Whalen for overseeing the internship program.

blog comments powered by Disqus