Data Modelling in MongoDB

Welcome to a tutorial on Data Modelling in MongoDB.

As you already know, MongoDB is schema-less, meaning that there is no need to define a structure for the data before inserting it. Because MongoDB is a document-based database, documents within the same collection are not required to have the same set of fields or structure.
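As a rough illustration of what schema-less means, the sketch below uses plain JavaScript objects to stand in for documents in one collection. The collection name students and its fields are made up for this example, and no MongoDB server is involved.

```javascript
// Hypothetical "students" collection, modelled as a plain array of documents.
// The two documents deliberately have different sets of fields.
const students = [
  { _id: 1, name: "Justin" },
  { _id: 2, name: "Maria", email: "maria@example.com", courses: ["DB101"] },
];

// Collect the field names used by each document.
const fieldSets = students.map((doc) => Object.keys(doc));

console.log(fieldSets);
```

Both objects sit in the same array without any shared structure being declared up front, which is the same freedom a MongoDB collection gives its documents.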

Also, MongoDB documents map naturally onto the entity objects of an application. In practice, the documents in a collection usually share a similar data structure, even though MongoDB does not require it.

A key challenge in modeling the data in a database is balancing the needs of the application against the performance of the database. Also, while modeling the data, it is essential to consider how the data will be used (i.e. CRUD operations) along with the inherent structure of the data itself.

In MongoDB, there are 2 ways in which the relationships between the data can be established.

Referenced Documents
Embedded Documents

MongoDB: Referenced Documents

This is a way to implement a relationship between data stored in different collections. A document in one collection stores a reference (typically the _id) to a document in another collection, and that reference is used to connect the data between the collections.

Check out the example below: we have 2 book documents, each containing the full publisher data:

{
  title: "Mongo database",
  author: "author1",
  language: "English",
  publisher: {
             name: "My publications",
             founded:1990,
             location: "SF"
            }
}

{
  title: "Hibernate in action",
  author: "author2",
  language: "English",
  publisher: {
             name: "My publications",
             founded:1990,
             location: "SF"
            }
}

In the above example, the publisher data is repeated in every book. To avoid this repetition, we can store the publisher once in its own document and keep references to the books there, rather than copying the entire publisher data into every book entry. Check out the example below.

{
  name: "My Publications",
  founded:1990,
  location: "SF",
  books: [111222333,444555666, ..]
}

{
   _id:111222333,
   title: "Mongo database",
   author: "author1",
   language: "English"
}

{
  _id:444555666,
  title: "Hibernate in action",
  author: "author2",
  language: "English"
}

We can also do this the other way around, where each book stores a reference to the publisher's id. Which direction to use is your choice and depends on how the data will be queried.
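The alternative of storing the publisher's id in each book could be sketched as follows. This is a minimal in-memory illustration, not MongoDB driver code: the publisher _id value "pub1", the field name publisher_id, and the helper functions publisherOf and booksOf are all assumptions introduced for the example.

```javascript
// Hypothetical in-memory stand-ins for the "publishers" and "books" collections.
const publishers = [
  { _id: "pub1", name: "My Publications", founded: 1990, location: "SF" },
];

const books = [
  { _id: 111222333, title: "Mongo database", author: "author1",
    language: "English", publisher_id: "pub1" },
  { _id: 444555666, title: "Hibernate in action", author: "author2",
    language: "English", publisher_id: "pub1" },
];

// Resolve the reference by hand: find the publisher whose _id
// matches the book's publisher_id (null if there is none).
function publisherOf(book) {
  return publishers.find((p) => p._id === book.publisher_id) || null;
}

// The reverse direction: all books that point at a given publisher.
function booksOf(publisherId) {
  return books.filter((b) => b.publisher_id === publisherId);
}

console.log(publisherOf(books[0]).name);
console.log(booksOf("pub1").map((b) => b.title));
```

With this layout the publisher document never grows as books are added, at the cost of a second lookup whenever a book's publisher details are needed.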

MongoDB: Embedded Documents

In this approach, related data is embedded inside a single document instead of being kept in a separate collection. Let’s consider 2 collections, students and addresses.

Without embedding, a student and its addresses would be stored as separate documents, with each address pointing back to the student through _studentId, as shown below.

{
  _id:123,
  name: "Justin"
}

{
  _studentId:123,
  street: "123 Street",
  city: "USA",
  state: "CA"
}

{
  _studentId:123,
  street: "456 Street",
  city: "California",
  state: "HR"
}

With embedding, the addresses are stored inside the student document itself. Multiple addresses can be embedded as an array, as shown in the example below.

{
  _id:123,
  name: "Justin",
  addresses: [
        {
          street: "123 Street",
          city: "USA",
          state: "CA"
        },

        {
          street: "456 Street",
          city: "California",
          state: "HR"
        }
    ]
}
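To make the step from separate documents to an embedded document concrete, here is a small sketch in plain JavaScript. It works on in-memory objects only, and the helper name embedAddresses is a hypothetical one chosen for this example.

```javascript
// The separate student and address documents from the earlier example.
const student = { _id: 123, name: "Justin" };
const addresses = [
  { _studentId: 123, street: "123 Street", city: "USA", state: "CA" },
  { _studentId: 123, street: "456 Street", city: "California", state: "HR" },
];

// Build the embedded form: copy the student and attach its addresses,
// dropping the _studentId link that is no longer needed once embedded.
function embedAddresses(studentDoc, addressDocs) {
  const embedded = addressDocs
    .filter((a) => a._studentId === studentDoc._id)
    .map(({ _studentId, ...rest }) => rest);
  return { ...studentDoc, addresses: embedded };
}

const embeddedStudent = embedAddresses(student, addresses);
console.log(JSON.stringify(embeddedStudent, null, 2));
```

After embedding, a single read of the student document returns the addresses as well, so no second lookup by _studentId is needed.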

Example of Data Modelling in MongoDB

Now, consider an example of building a student database in a college. Let’s assume there are 3 models: Student, Address, and Course. In a regular RDBMS, these 3 models will be translated into 3 tables.

In that relational model, if a user's details have to be added, then entries should be made in all the 3 tables.

Now, let's try to model the same data in MongoDB. Here, the schema design needs only one collection, User, with the structure shown below.

{
   _id: 123,
   username: '123xyz',
   contact: [{
      phone: '123-456-7890',
      email: 'xyz@example.com'
   }],
   access: [{
      level: 5,
      group: "dev"
   }]
}

In MongoDB, data related to all the 3 models is stored in a single collection.

Also, MongoDB provides multiple ways of modeling your data.

Note that field names, such as username, are stored in every document of a collection. In the above example they use very little memory, maybe 10-20 bytes or so. However, when the dataset is very large, this can add up to a lot of memory. Thus, it is a good practice for large datasets to use short field names to store data in collections, like uName instead of username.
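The effect of field name length can be estimated with a short sketch. It measures the byte length of the JSON text of a document, which is only a stand-in: MongoDB stores documents as BSON, whose exact sizes differ, so the numbers are useful for comparison only. The field names emailAddress and email, and the figure of one million documents, are assumptions made for the example.

```javascript
// Rough size estimate: UTF-8 byte length of the JSON text of a document.
// BSON, MongoDB's actual storage format, encodes documents differently,
// so treat the result as a comparison, not an exact storage size.
function jsonBytes(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// The same data stored under long and short field names.
const longNames = { username: "123xyz", emailAddress: "xyz@example.com" };
const shortNames = { uName: "123xyz", email: "xyz@example.com" };

// Bytes saved per document, and across a hypothetical one million documents.
const savedPerDoc = jsonBytes(longNames) - jsonBytes(shortNames);
const savedForOneMillionDocs = savedPerDoc * 1000000;

console.log(savedPerDoc, savedForOneMillionDocs);
```

The saving per document is small, but because the names are repeated in every document it grows linearly with the size of the collection.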