Last time we looked at Netflix's database as an example and found out that these giant companies often prefer NoSQL databases, so they can handle large volumes of data with maximum availability and flexibility. Feel free to go back to the previous post to learn more or use elya_studywithme to find all related posts.
Now, is there anything else, besides choosing a NoSQL database, that can make our data more reliable? How can we make sure that our data stays available even if a server goes down?
Replication makes it possible.
Database replication is the process of storing data on more than one server.
Imagine all the movie data that Netflix holds. Replicating it simply means being clever and not storing everything on one server, but keeping multiple copies on different servers, which results in a distributed database. If one server fails, all the data is still accessible from another server. Quite nice.
In fact, that’s what happened in 2015. When Amazon Web Services experienced major technical difficulties, many well-established tech companies were affected, losing millions of dollars over lost operational hours.
But not Netflix. Netflix had its data distributed over multiple servers in multiple regions. When a whole region went down, Netflix still had the same data available elsewhere.
Other advantages of database replication:
Load reduction – spreading data over several servers reduces the likelihood that any one server will be overwhelmed by user queries.
Efficiency – servers that are less burdened with queries can offer better performance.
High availability – multiple servers with the same data ensure high availability, which we talked about above. If one server goes down, another server will serve as a backup.
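To make load reduction and high availability a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: `Server` is an in-memory stand-in for a database server, `ReplicatedReader` is a hypothetical helper, and the server names and movie data are invented. It only shows the idea of rotating reads across copies and skipping a copy that is down, not how Netflix or any real database does it.

```python
import itertools


class Server:
    """Hypothetical in-memory stand-in for one database server."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)  # each server holds its own full copy
        self.up = True

    def get(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class ReplicatedReader:
    """Rotates reads across the copies and skips servers that are down."""

    def __init__(self, servers):
        self.servers = servers
        self._rotation = itertools.cycle(range(len(servers)))

    def get(self, key):
        # Start at the next server in the rotation (load reduction),
        # then try each remaining server once (high availability).
        start = next(self._rotation)
        for offset in range(len(self.servers)):
            server = self.servers[(start + offset) % len(self.servers)]
            try:
                return server.name, server.get(key)
            except ConnectionError:
                continue
        raise ConnectionError("all copies are down")


movies = {"m1": "Stranger Things"}
servers = [Server("eu", movies), Server("us", movies), Server("asia", movies)]
reader = ReplicatedReader(servers)

first = reader.get("m1")    # served by "eu"
second = reader.get("m1")   # served by "us"
servers[2].up = False       # the "asia" server fails
third = reader.get("m1")    # "asia" is skipped, "eu" answers instead
```

The reads still succeed after one server fails because every server holds the same data, which is exactly the backup effect described above.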
Database replication has disadvantages too: data can get lost during replication, and copies can fall out of sync with each other. Finally, running multiple servers comes with additional costs.
How to replicate data?
My university slides mention 2 main ways to do it:
- synchronous propagation – changes you make in a database are propagated to all database copies immediately, before the update counts as done
- asynchronous propagation – you first update one copy and propagate the changes to the other database copies later
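The difference between the two is easier to see in a small sketch. The Python below is only an illustration under my own assumptions: `Replica` and `ReplicatedStore` are hypothetical names, the copies are plain dictionaries, and the asynchronous "later" is simulated by calling `propagate()` by hand instead of using a background process.

```python
from collections import deque


class Replica:
    """Hypothetical in-memory database copy."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class ReplicatedStore:
    def __init__(self, replicas, synchronous):
        self.first = replicas[0]    # the copy that gets updated first
        self.others = replicas[1:]
        self.synchronous = synchronous
        self.pending = deque()      # changes not yet propagated (async only)

    def write(self, key, value):
        self.first.apply(key, value)
        if self.synchronous:
            # Synchronous: every copy is updated before write() returns.
            for replica in self.others:
                replica.apply(key, value)
        else:
            # Asynchronous: remember the change and propagate it later.
            self.pending.append((key, value))

    def propagate(self):
        """Send queued changes to the other copies (async mode)."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.others:
                replica.apply(key, value)


sync_store = ReplicatedStore([Replica("a"), Replica("b")], synchronous=True)
sync_store.write("title", "Dark")
sync_copy = sync_store.others[0].data.get("title")    # already "Dark"

async_store = ReplicatedStore([Replica("a"), Replica("b")], synchronous=False)
async_store.write("title", "Dark")
stale = async_store.others[0].data.get("title")       # still None: out of sync
async_store.propagate()
fresh = async_store.others[0].data.get("title")       # now "Dark"
```

The `stale` read shows the trade-off: asynchronous writes return sooner, but for a while the copies can disagree.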
In addition, data replication can start from 2 different locations:
- primary copy – there is a master database copy and you’re only allowed to change this one. Afterwards changes will be propagated synchronously or asynchronously to all other copies.
- update everywhere – you can freely choose which database copy you wanna update, and the changes are then propagated from there to the other copies.
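Here is a rough Python sketch of these two starting points. All names (`Copy`, `PrimaryCopyCluster`, `UpdateEverywhereCluster`, the region names) are hypothetical, propagation is done immediately to keep the code short, and it ignores what should happen when two copies are updated with conflicting values at the same time.

```python
class Copy:
    """Hypothetical in-memory database copy."""

    def __init__(self, name):
        self.name = name
        self.data = {}


def propagate(copies, key, value):
    """Apply one change to every copy (done immediately for simplicity)."""
    for copy in copies.values():
        copy.data[key] = value


class PrimaryCopyCluster:
    """Only the primary (master) copy accepts changes."""

    def __init__(self, names):
        self.copies = {name: Copy(name) for name in names}
        self.primary = names[0]

    def write(self, target, key, value):
        if target != self.primary:
            raise PermissionError(f"changes must go to {self.primary}")
        propagate(self.copies, key, value)


class UpdateEverywhereCluster:
    """Any copy accepts changes."""

    def __init__(self, names):
        self.copies = {name: Copy(name) for name in names}

    def write(self, target, key, value):
        if target not in self.copies:
            raise KeyError(target)
        propagate(self.copies, key, value)


primary_cluster = PrimaryCopyCluster(["eu", "us"])
primary_cluster.write("eu", "title", "Dark")         # accepted by the primary
try:
    primary_cluster.write("us", "title", "Ozark")    # rejected: not the primary
    rejected = False
except PermissionError:
    rejected = True

everywhere = UpdateEverywhereCluster(["eu", "us"])
everywhere.write("us", "title", "Ozark")             # any copy is fine
```

Primary copy keeps things simple because there is one place where changes start, while update everywhere is more flexible but needs extra care to keep the copies consistent.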
Where replication can start and how changes are propagated can be summarised together in a matrix (see carousel).
That’s it for today. I hope you now realise why database replication is important and how it works. Have a great Saturday!