Sharding in MongoDB

Welcome to a tutorial on Sharding in MongoDB.

In MongoDB, sharding is the mechanism for storing data across multiple machines. Its purpose is to support the data growth that any application can expect. As an application gains users, its data volume grows, and at some point a single machine can no longer accommodate it.

Growing data is difficult to manage on a single system. A better approach is a cluster of machines, each holding part of the data as a replica set. This is horizontal scaling, and sharding is how MongoDB provides it. In simple terms, sharding adds more machines to handle sudden or rapid data growth in an application.
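The idea of spreading data over more machines can be illustrated with a small sketch. It is a toy model only, not MongoDB's internal algorithm: the functions pick_shard and distribute, the user-N key format, and the modulo-based placement are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def pick_shard(shard_key_value: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (toy model, hypothetical)."""
    digest = hashlib.md5(shard_key_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(keys, num_shards: int) -> Counter:
    """Count how many keys land on each shard index."""
    return Counter(pick_shard(key, num_shards) for key in keys)


# Hypothetical data set: 10,000 made-up user ids.
user_ids = [f"user-{i}" for i in range(10_000)]

# Adding machines lowers the number of documents each one has to hold.
for num_shards in (2, 4):
    counts = distribute(user_ids, num_shards)
    print(num_shards, "shards:", dict(sorted(counts.items())))
```

Each run with a larger shard count shows a smaller share of the documents per machine, which is the point of horizontal scaling.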

 

Need for Sharding in MongoDB:

  1. Vertical scaling is very expensive.
  2. With replication alone, all data is written to the master (primary) node, so a single machine still has to absorb every write.
  3. Local disk space on a single machine may not be large enough to handle the data growth.

The diagram below shows conceptually how sharding works in a MongoDB environment.

Figure: Sharding in MongoDB

 

Application

This is the application that uses MongoDB and needs its data distributed across multiple servers.

 

Shards

Shards store the actual data. In a production environment, each shard is a separate replica set.

 

Configuration

Config servers are MongoDB servers that store the cluster's metadata. This metadata contains a mapping of the cluster's data set to the shards. The query router uses the metadata to target each operation at the specific shard or shards that hold the relevant data. In a production environment, there are typically three config servers.
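How a query router might use such metadata can be sketched as a simple range lookup. This is an illustrative model under stated assumptions, not MongoDB's actual implementation: the shard names, the key ranges, and the functions route and route_range are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical metadata of the kind a config server might hold:
# sorted lower bounds of shard key ranges, each mapped to a shard name.
RANGE_LOWER_BOUNDS = [0, 1000, 2000]
RANGE_SHARDS = ["shardA", "shardB", "shardC"]


def route(shard_key: int) -> str:
    """Return the shard whose key range contains shard_key."""
    if shard_key < RANGE_LOWER_BOUNDS[0]:
        raise ValueError("shard key is below the lowest known range")
    # The last lower bound that is <= shard_key identifies the range.
    index = bisect_right(RANGE_LOWER_BOUNDS, shard_key) - 1
    return RANGE_SHARDS[index]


def route_range(low: int, high: int) -> list:
    """Return every shard whose range overlaps the inclusive range [low, high]."""
    if low > high or low < RANGE_LOWER_BOUNDS[0]:
        raise ValueError("invalid key range")
    first = bisect_right(RANGE_LOWER_BOUNDS, low) - 1
    last = bisect_right(RANGE_LOWER_BOUNDS, high) - 1
    return RANGE_SHARDS[first:last + 1]


print(route(1500))             # a single-key lookup touches one shard
print(route_range(500, 1500))  # a range query may touch several shards
```

In this model, a lookup on a single key needs only one shard, while a range query is sent to every shard whose range overlaps it.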