We’re all quite used to having log files on lots of servers, in disparate places. Wouldn’t it be nice to have centralized logs for a production system? Logs that can be queried?
I would encourage everyone to consider using MongoDB for log centralization. It’s a very good fit for this problem for several reasons:
- MongoDB inserts can be done asynchronously. A user’s experience shouldn’t grind to a halt because logging is slow, stalled or down. MongoDB lets us fire off an insert into a log collection without waiting for a response code. (If one wants a response, one calls getLastError(); we would skip that here.)
- Old log data ages out automatically. With a capped collection we preallocate a fixed amount of space for logs; once it is full, the log wraps around and new entries overwrite the oldest ones. There is no risk of filling up a disk with excessive log information, and no need to write log archival / deletion scripts.
- It’s fast enough for the problem. First, MongoDB is fast in general. Second, a capped collection preserves insertion order automatically, so we don’t need to create an index on timestamp. That makes writes even cheaper, which matters because logging has a very high ratio of writes to reads (the opposite of most database problems).
- Document-oriented / JSON is a great format for log information. It is very flexible and “schemaless” in the sense that we can throw in an extra field any time we want.
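To make the wrap-around behavior concrete, here is a toy in-memory model in Python. It is not how MongoDB implements capped collections (those are bounded by bytes on disk); the class name `ToyCappedLog` and the document-count limit are assumptions made purely for illustration.

```python
from collections import deque


class ToyCappedLog:
    """In-memory stand-in for a capped collection, bounded by document count."""

    def __init__(self, max_docs):
        # A deque with maxlen silently drops its oldest item once it is full.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)

    def find(self):
        """Return documents in insertion order, oldest first (no index needed)."""
        return list(self._docs)


log = ToyCappedLog(max_docs=3)
for n in range(1, 6):
    # The extra "seq" field shows the "throw in a field any time" flexibility.
    log.insert({"seq": n, "msg": f"event {n}"})

print([d["seq"] for d in log.find()])  # prints [3, 4, 5]
```

Five entries went in, but only the newest three survive, still in the order they were written.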
The MongoDB profiler works very much in the way outlined above, storing profile timings in a collection that is very log-like. We have been very happy with that implementation to date.
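For anyone who wants to try this from application code, the following is a minimal sketch using the Python driver (pymongo). The database name `logs`, the collection name `log`, the 100 MB size and the helper names are all assumptions chosen for the example. Passing a write concern of `w=0` is the driver's way of not waiting for a response to each insert.

```python
import datetime


def make_entry(level, message, **fields):
    """Build one log document; extra keyword fields are stored as-is."""
    entry = {
        "ts": datetime.datetime.now(datetime.timezone.utc),
        "level": level,
        "msg": message,
    }
    entry.update(fields)
    return entry


def open_log(uri="mongodb://localhost:27017", size_bytes=100 * 1024 * 1024):
    """Return a capped 'log' collection whose inserts are not acknowledged."""
    # Imported here so make_entry() stays usable without the driver installed.
    from pymongo import MongoClient
    from pymongo.errors import CollectionInvalid
    from pymongo.write_concern import WriteConcern

    db = MongoClient(uri)["logs"]  # hypothetical database name
    try:
        # Preallocate a fixed-size collection; old entries are overwritten.
        db.create_collection("log", capped=True, size=size_bytes)
    except CollectionInvalid:
        pass  # the collection already exists
    # w=0: send the insert and return without waiting for the server's reply.
    return db.get_collection("log", write_concern=WriteConcern(w=0))


def tail(log, n=10):
    """Return the n most recent entries, newest first, using natural order."""
    return list(log.find().sort("$natural", -1).limit(n))
```

With a server running locally, `log = open_log()` followed by `log.insert_one(make_entry("info", "user signed in", user_id=42))` writes an entry, and `tail(log)` reads the latest ones back without any index on the timestamp.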