← back to the blog


MongoD aggregation pipelines with $match and $project

Posted on August 1st, 2020 in NodeJS by George

Intro

 

In this tutorial, we are going to create a 4 stages pipeline using the aggregation framework provided by MongoDB. 

I have an example database named movies. Each document has many fields but I would like to focus only on 4 of them. 

The next query is a projection to demonstrate the fields used further. 

 

MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.movies.findOne({"title" : "Treasure Island"}, {_id: 0, title: 1, cast: 1, directors: 1, writers: 1})
{
	"title" : "Treasure Island",
	"cast" : [
		"Sarah Holland",
		"Pax Lohan",
		"Anthony Booth",
		"Christopher Benjamin"
	],
	"directors" : [
		"Peter Rowe"
	],
	"writers" : [
		"Peter Rowe",
		"Robert Louis Stevenson (novel)"
	]
}

 

As we can see in the example above, I am interested in the title, and the 3 array fields, cast, directors, and writers.

Not all the documents in the database have all these fields set, but we will handle this with "$match" later on when we will sanitize the documents.

In the next part of the article, I will prepare 4 stages for my pipeline where I want to get all the documents having a certain person in each of the array values.

If a person in the particular movie is cast, director and writer, these are the documents I want to get.

The tricky part is that the writers may have appended to the name an attribute like (novel), see in the example above. We will transform our values using "$map" to fix this.

 

Prepare stage 1

 

The first stage is to match all the elements that have values in all three arrays.

If you are reading this article then I assume that you are familiar with the mongo CLI and you know already that is a JavaScript interpreter.

So each step is going to be assigned to a variable as st1, st2, st3, s4.

 

var st1 = {
	"$match" : {
		"cast" : {
			"$elemMatch" : {
				"$exists" : true
			}
		},
		"directors" : {
			"$elemMatch" : {
				"$exists" : true
			}
		},
		"writers" : {
			"$elemMatch" : {
				"$exists" : true
			}
		}
	}
}

 

This stage ensures that our pipeline has prepared only documents having values in the directors, writers, and cast.

 

Prepare stage 2

 

In stage 2 we will use "$project" to make sure that we are passing further only the fields we need. Title, cast, writers, directors.

At this stage, we will sanitize the values inside writers array by mapping over the documents, splitting the string values for each of the writers, and extracting only the name leaving aside any attribute like "(novel)" if exists.

 

var st2 = {
	"$project" : {
                "title": 1,
		"_id" : 0,
		"cast" : 1,
		"directors" : 1,
		"writers" : {
			"$map" : {
				"input" : "$writers",
				"as" : "writer",
				"in" : {
					"$arrayElemAt" : [
						{
							"$split" : [
								"$$writer",
								" ("
							]
						},
						0
					]
				}
			}
		}
	}
}

 

Prepare stage 3

 

In this stage, I will use "$setIntersection" to create an array of elements of existing elements in all 3 array values.

Then, using "$size" and "$gt" I will check if the size is greater than 0 which confirms that the specific name value is present in the new array created by the intersection and therefore in the 3 array fields of the document.

The newly created field "isInCastDirectorsAndWriters" value is going to be a Boolean.

 

var st3 = {
	"$project" : {
		"title" : 1,
		"isInCastDirectorsAndWriters" : {
			"$gt" : [
				{
					"$size" : {
						"$setIntersection" : [
							"$cast",
							"$directors",
							"$writers"
						]
					}
				},
				0
			]
		}
	}
}

 

Prepare stage 4

 

Stage 4 is used just to filter out the documents where the "isInCastDirectorsAndWriters" value is false.

 

var st4 = { "$match" : { "isInCastDirectorsAndWriters" : true } }

 

Run the aggregation pipeline

 

As a final step, we will run the aggregation pipeline using the stages created above.

Then we will see the results printed to the console.

 

//The next line is the aggregation command
//The pipeline is the array containing the stages st1,st2 ...

MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.movies.aggregate([st1,st2,st3,st4])

{ "title" : "Une partie de cartes", "isInCastDirectorsAndWriters" : true }
{ "title" : "The House of the Devil", "isInCastDirectorsAndWriters" : true }
{ "title" : "The Bewitched Inn", "isInCastDirectorsAndWriters" : true }
{ "title" : "The Sign of the Cross", "isInCastDirectorsAndWriters" : true }
{ "title" : "Alice in Wonderland", "isInCastDirectorsAndWriters" : true }
{ "title" : "The Voyage Across the Impossible", "isInCastDirectorsAndWriters" : true }
{ "title" : "Fairyland: A Kingdom of Fairies", "isInCastDirectorsAndWriters" : true }
{ "title" : "The Conquest of the Pole", "isInCastDirectorsAndWriters" : true }
{ "title" : "The Life and Death of King Richard III", "isInCastDirectorsAndWriters" : true }
{ "title" : "Gertie the Dinosaur", "isInCastDirectorsAndWriters" : true }
{ "title" : "The Mysterious X", "isInCastDirectorsAndWriters" : true }
{ "title" : "Blind Justice", "isInCastDirectorsAndWriters" : true }
{ "title" : "Fatty and Mabel Adrift", "isInCastDirectorsAndWriters" : true }
{ "title" : "Police", "isInCastDirectorsAndWriters" : true }
{ "title" : "The Rough House", "isInCastDirectorsAndWriters" : true }
{ "title" : "A Man There Was", "isInCastDirectorsAndWriters" : true }
{ "title" : "The Bell Boy", "isInCastDirectorsAndWriters" : true }
{ "title" : "The Outlaw and His Wife", "isInCastDirectorsAndWriters" : true }
{ "title" : "A Dog's Life", "isInCastDirectorsAndWriters" : true }
{ "title" : "Shoulder Arms", "isInCastDirectorsAndWriters" : true }

Type "it" for more

//Get the total using itcount()

MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.movies.aggregate([st1,st2,st3,st4]).itcount()

Total: 1597


 

Thank you if you managed to get so far.

All the best.