A distributed system is a group of computer nodes that communicate over a network and coordinate their work to accomplish a common task. Distributed systems emerged so that cheap, ordinary machines could together handle computation and storage tasks that a single computer cannot. The aim, in short, is to use more machines to process more data.
A distributed system is worth considering only when three things are true: the processing capacity of a single node can no longer meet growing computation and storage demands, upgrading the hardware (adding memory or disk, using a better CPU) has become too expensive, and the application cannot be optimized any further. The problem a distributed system solves is the same one a stand-alone system solves. But because it runs on multiple nodes that communicate over a network, it introduces many issues that a stand-alone system does not have. Solving those issues requires additional mechanisms and protocols, which in turn bring problems of their own.
In many articles, distributed systems are divided into distributed computing and distributed storage. Computing and storage are complementary: computing requires data, either live (streaming) data or stored data, and the results of a computation need to be stored. Both are discussed in great detail in operating systems; distributed systems extend these theories to multiple nodes.
So how does a distributed system distribute tasks across these nodes? A straightforward idea is divide and conquer. For computing, each node does a small part of the computation and the partial results are then combined; that is the idea behind MapReduce. For storage, each node stores a portion of the data. Partitioning is the only option when the data is too large for a single node, and it also brings some benefits:
(1) It improves performance and concurrency: operations are distributed across different shards, which are independent of each other.
(2) It improves availability: even if some shards are unavailable, the other shards are not affected.
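As a rough illustration of divide and conquer, the sketch below partitions a few records across shards by hashing their keys, runs a local word count on each shard, and then sums the partial results in the MapReduce style. It runs in a single process, and every name in it (`NUM_SHARDS`, `shard_for`, `partition`, `map_phase`, `reduce_phase`, `word_count`) is hypothetical and chosen for this example. Hash-modulo placement is only one possible partitioning scheme, assumed here for simplicity.

```python
import zlib
from collections import Counter
from functools import reduce

NUM_SHARDS = 3  # hypothetical cluster size, chosen for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard with a stable hash (built-in hash() of str is salted per process)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition(records, num_shards: int = NUM_SHARDS):
    """Split (key, value) records into one dictionary per shard."""
    shards = [dict() for _ in range(num_shards)]
    for key, value in records:
        shards[shard_for(key, num_shards)][key] = value
    return shards


def map_phase(shard):
    """Local work on one node: count words in that shard's documents only."""
    counts = Counter()
    for text in shard.values():
        counts.update(text.split())
    return counts


def reduce_phase(partials):
    """Combine the per-shard partial counts into one result."""
    return reduce(lambda a, b: a + b, partials, Counter())


def word_count(shards, unavailable=()):
    """Map over every reachable shard, then reduce.

    Shard indexes listed in `unavailable` are skipped, which models failed nodes.
    """
    partials = [map_phase(s) for i, s in enumerate(shards) if i not in unavailable]
    return reduce_phase(partials)


records = [
    ("doc1", "a b a"),
    ("doc2", "b c"),
    ("doc3", "a c c"),
]
shards = partition(records)
total = word_count(shards)                       # all shards reachable
partial = word_count(shards, unavailable={0})    # shard 0 is down
```

Each `map_phase` call touches only its own shard, so the calls could run concurrently on different nodes. Passing `unavailable={0}` shows the availability point in miniature: the remaining shards still answer, although the result then covers only the data they hold.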