What is a distributed system?
Hi, in this tutorial I’m going to discuss the three characteristics a system must have to be considered a distributed system. I’m Asiri Hewage, and I work as a Software Engineer at Pearson, “the world’s learning company,” which commercially distributes top-quality educational publications around the world. I’ve been a Software Engineer for about three years, and for the last few years I’ve also spent much of my time mentoring.
Andrew Tanenbaum, a famous computer science professor and author of several very influential books in the field, has offered this definition:
A distributed system is a collection of independent computers that appears to its users as one computer.
That seems pretty obvious: it’s a bunch of computers working together toward some goal, looking to the user like just one thing. Digging a little deeper, he identifies three characteristics a system must have to be considered distributed.
- The first characteristic is that the computers operate concurrently. There are many computers, and they’re all processing, all doing their thing at the same time.
- The second characteristic is that they fail independently. That might seem obvious, but we have to keep in mind that these computers are going to fail. Whether we run a big cluster of machines or just five or ten computers, those machines will fail, and we won’t know when. We have to design a distributed system with that fact in mind. These first two characteristics seem obvious enough, and they’re easy to wrap our minds around.
- The third characteristic is that these computers don’t share a global clock. The computation each machine performs is asynchronous with respect to the other machines in the system.
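The three characteristics can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not anything from Tanenbaum: the `Node` class, its `clock_offset` and `fail_probability` parameters, and `simulate()` are hypothetical names chosen for illustration. Each node runs in its own thread (concurrency), may crash on its own random schedule (independent failure), and stamps its events with its own skewed clock (no global clock).

```python
import random
import threading
import time
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical machine with its own clock and its own failure behavior."""
    name: str
    clock_offset: float      # seconds this node's clock is ahead (+) or behind (-)
    fail_probability: float  # chance of crashing before each step
    rng: random.Random
    log: list = field(default_factory=list)
    alive: bool = True

    def local_time(self) -> float:
        # No global clock: each node reads only its own, possibly skewed, clock.
        return time.time() + self.clock_offset

    def run(self, steps: int) -> None:
        for step in range(steps):
            # Independent failure: this node may crash regardless of the others.
            if self.rng.random() < self.fail_probability:
                self.alive = False
                return
            self.log.append((self.local_time(), self.name, step))
            time.sleep(0.001)


def simulate(seed: int = 42, steps: int = 10) -> list[Node]:
    nodes = [
        Node("a", clock_offset=0.0, fail_probability=0.0, rng=random.Random(seed)),
        Node("b", clock_offset=-5.0, fail_probability=0.0, rng=random.Random(seed + 1)),
        Node("c", clock_offset=0.2, fail_probability=0.3, rng=random.Random(seed + 2)),
    ]
    # Concurrency: every node runs at the same time in its own thread.
    threads = [threading.Thread(target=node.run, args=(steps,)) for node in nodes]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return nodes


if __name__ == "__main__":
    nodes = simulate()
    for node in nodes:
        print(node.name, "alive" if node.alive else "crashed", len(node.log), "events")
    # Ordering by local timestamps puts all of b's events first,
    # even though a, b and c were running at the same moment.
    merged = sorted(event for node in nodes for event in node.log)
    print([name for _, name, _ in merged])
```

Because node `b`’s clock runs five seconds behind, sorting the merged log by timestamp lists every one of `b`’s events first, even though all three nodes were running at the same time. Without a global clock, timestamps taken on different machines can’t simply be compared to decide what happened first.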
Let’s dig a little bit deeper into that. Let’s look at some examples of systems and try to decide whether they’re distributed or not. Amazon.com?
Easy enough! That’s a distributed system. Thousands, even tens of thousands, of servers in data centers across the world work together to get the job of being amazon.com done. I go to my browser and see a nice HTML interface that looks like one computer to me, but there are probably hundreds of machines collaborating to answer each request I make. Amazon has even talked publicly about parts of its architecture and the technology it has developed, and the papers it has published have had a seminal influence on the development of some key distributed systems technologies in the last decade.
How about a Cassandra database? Cassandra is a popular horizontally scalable, non-relational database, and an example of distributed storage. It’s a cluster of machines that work together to present the abstraction of a single database to the clients that access it. So it’s definitely a distributed system.
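The idea of a cluster presenting itself as one database can be sketched with a toy key-value store. This is a hypothetical illustration, not Cassandra’s actual design or API: `StorageNode`, `ToyCluster`, `owners()`, and the `replicas` parameter are invented for this example, and real partitioning and replication are considerably more involved. Clients call only `put` and `get`, and the cluster decides which machines hold each key.

```python
import hashlib


class StorageNode:
    """One hypothetical machine in the cluster, holding part of the data."""

    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, str] = {}
        self.up = True


class ToyCluster:
    """Hypothetical facade: clients call put/get as if talking to one database."""

    def __init__(self, node_names: list[str], replicas: int = 2):
        self.nodes = [StorageNode(n) for n in node_names]
        self.replicas = replicas

    def owners(self, key: str) -> list[StorageNode]:
        # Hash the key to pick a starting node, then take the next `replicas` nodes.
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

    def put(self, key: str, value: str) -> None:
        # Write the value to every live replica responsible for this key.
        written = 0
        for node in self.owners(key):
            if node.up:
                node.data[key] = value
                written += 1
        if written == 0:
            raise RuntimeError(f"no live replica for {key!r}")

    def get(self, key: str) -> str:
        # Read from the first live replica that has the key.
        for node in self.owners(key):
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)


if __name__ == "__main__":
    cluster = ToyCluster(["n1", "n2", "n3"], replicas=2)
    cluster.put("user:1", "Ada")
    cluster.owners("user:1")[0].up = False  # simulate one machine failing
    print(cluster.get("user:1"))  # Ada, served by the surviving replica
```

The client never names a machine, so the three nodes look like one database, and because each key is stored on two nodes, a single failed machine doesn’t make the data unavailable.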
Now sometimes people get a little cute and ask: what about my computer? It’s got a bunch of processors in it. There’s the CPU, there’s a processor for the USB controller, and there’s the GPU for graphics, which is certainly a very high-performance processor. These all operate concurrently, and they could fail independently. So what about that?
Your PC is not a distributed system, and looking at why helps us understand Tanenbaum’s third requirement for a distributed system: the absence of a shared global clock.