
I'm trying to make use of Mongoose and its QueryStream in a scheduling application, but maybe I'm misunderstanding how it works. I've read this question here on SO, [Mongoose QueryStream new results], and it seems I'm correct, but could someone please explain:

If I'm filtering a query like so -

Model.find().stream()

when I add or change something that matches the .find(), it should emit a data event, correct? Or am I completely wrong in my understanding of this issue?

For example, I'm trying to look at some data like so:

 Events.find({'title':/^word/}).stream();

I'm changing titles in the mongodb console, and not seeing any changes.

Can anyone explain why?

afithings
1 Answer


Your understanding is indeed incorrect: a stream is just an output stream of the current query response, not something that "listens for new data" by itself. The returned result here is basically a node streaming interface, offered as an optional alternative to a "cursor", or to the direct translation to an array that mongoose methods perform by default.

So a "stream" does not just "follow" anything. It is really just another way of dealing with the normal results of a query, but one that does not "slurp" all of the results into memory at once. It instead uses event listeners to process each result as it is fetched from the server cursor.

What you are in fact talking about is a "tailable cursor", or some variant thereof. In basic MongoDB operations, a "tailable cursor" can be implemented on a capped collection. This is a special type of collection with specific rules, so it might not suit your purposes. They are intended for "insert only" operations which is typically suited to event queues.
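For reference, a capped collection can be declared through the schema's `capped` option in Mongoose. This is only a configuration sketch (the `eventSchema`/`Events` names and the size/max values here are illustrative, and it assumes an open mongoose connection):

```javascript
const mongoose = require('mongoose');

// The `capped` option tells MongoDB to back this model with a capped
// collection: a fixed size in bytes, plus an optional max document count.
const eventSchema = new mongoose.Schema(
  { title: String, time: Date },
  { capped: { size: 1024 * 1024, max: 1000 } }
);

const Events = mongoose.model('Events', eventSchema);
```

Note that capping an existing, uncapped collection is a server-side operation (`convertToCapped`); the schema option only applies when the collection is created.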

On a model that is using a capped collection ( and only where a capped collection has been set ) then you implement like this:

var query = Events.find({'title':/^word/}).sort({ "$natural": -1}).limit(1);
var stream  = query.tailable({ "awaitdata": true}).stream();

// fires on data received
stream.on("data",function(data) {
    console.log(data);
});

The "awaitdata" there is just as important an option as the "tailable" option itself, as it is the main thing that tells the query cursor to remain "active" and "tail" the additions to the collection that meet the query conditions. But your collection must be "capped" for this to work.

An alternate and more advanced approach to this is to do something like the meteor distribution does, where the "capped collection" being tailed is in fact the MongoDB oplog. This requires a replica set configuration, but just as meteor does out of the box, there is nothing wrong with having a single node as a replica set in itself. It's just not wise to do so in production.

This is more advanced than a simple answer, but the basic concept is that since the "oplog" is a capped collection, you are able to "tail" it for all write operations on the database. Each event entry is then inspected to determine whether the collection you want to watch for writes has been written to. That data can then be used to query the new information and do something like return the updated or new results to a client via a websocket or similar.
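The inspection step of that idea can be sketched without a database. An oplog entry is a plain document where `op` is the operation type (`'i'` insert, `'u'` update, `'d'` delete), `ns` is the `"db.collection"` namespace, and `o` is the written document; the dispatch logic is just a match on those fields (the helper name here is made up for illustration):

```javascript
// Decide whether a tailed oplog entry is a write to the collection we
// are watching. op: 'i' = insert, 'u' = update, 'd' = delete.
function isWatchedWrite(entry, namespace) {
  return (entry.op === 'i' || entry.op === 'u') && entry.ns === namespace;
}

const inserted = { op: 'i', ns: 'mydb.events', o: { title: 'wordy' } };
const elsewhere = { op: 'i', ns: 'mydb.users', o: { name: 'x' } };

console.log(isWatchedWrite(inserted, 'mydb.events'));  // true
console.log(isWatchedWrite(elsewhere, 'mydb.events')); // false
```

In a real setup these entries would arrive from a tailable cursor on `local.oplog.rs`, and a match would trigger a fresh query against the watched collection.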

But a stream in itself is just a stream. To "follow" the changes on a collection you either need to implement it as capped, or consider implementing a process based on watching the changes in the oplog as described.

Blakes Seven
  • Thank you. This is a really fantastic explanation. I had already set it up as a capped collection, so I'll have to investigate further. In either case, I assume this would mean that if I'm seeking against something like, say, a timestamp, it would only return results for things inserted after I've executed that .find, correct? It wouldn't poll to keep checking? – afithings Jul 31 '15 at 04:32
  • @afithings On the very first execution of the query ( polled by the event listener ) all results will be returned that match the conditions. I am just asking for `limit(1)` in the example since you usually just want to follow. After that, since the cursor on the server remains open, then all newly inserted data is returned. This will not track document "changes" other than insertion. For that you need to look at the oplog. So generally, just new insertions after it is listening. – Blakes Seven Jul 31 '15 at 04:36