I think this is indicative of the fact that many problems in software development don't have perfect answers. It is often a balance of pros and cons, and sometimes the weighting applied to them shifts over time, so we flip between competing approaches.
One of these areas relates to whether it is better to centralise or distribute. Should our applications be monolithic in nature or a collection of loosely coupled distributed parts? For a long time the argument has been seen as won by distributed computing and the microservices approach. However, in recent times the monolith has started to be seen as not always the wrong choice.
This article shouldn't be seen as an argument against a distributed approach; it should be viewed more as an argument against premature optimisation. By understanding the drawbacks of distribution you can make a better judgement about whether it's the right approach for your application right now.
Interconnected Services
Distributed computing is a relatively broad term, but within the context of this article we are taking it to mean the pattern of dividing an application up into a collection of microservices.
Usually microservices are built to align to a single business domain or process, with the application being the sum of these parts communicating via a lightweight protocol such as RESTful APIs, or increasingly via an event-driven architecture.
You can see from this that the term microservice is quite loosely defined. A lot of the issues that arise when applying this approach can be traced back to the fact that deciding where to divide microservices, and how large each should be, is a hard problem.
The best explanation I've seen for this is that a microservice should be as easy to replace as it is to refactor, meaning a microservice shouldn't be so large that starting again with its design is no longer an option.
I think this idea is much easier to apply when starting with a blank sheet of paper. When splitting up an existing application it is often more pragmatic not to subdivide too quickly, as further splitting an existing service is usually easier than trying to coalesce several services back into one once they've been split.
Fallacies of Distributed Computing
In 1994, L. Peter Deutsch at Sun Microsystems devised a list of seven fallacies of distributed computing, building on earlier work by Bill Joy and Dave Lyon.
The fallacies represent seven assumptions that often cause the architecture and development of a distributed system to head in the wrong direction.
The first is that the Network is Reliable. This often leads to services being written without network-related error handling in mind, meaning that when network errors do occur, services stall and become stuck consuming resources while waiting for a response that isn't forthcoming (the sketch at the end of this list shows what a more defensive call can look like).
The second and third, that Latency is Zero and that Bandwidth is Infinite, are related: both can cause developers to give little thought to the nature of the data that is propagating through the network.
Number four is that the Network is Secure, which can lead to complacency where the possibility of intrusion by malicious actors inside the network isn't considered.
Number five is that Network Topology Doesn't Change, which, in a similar way to two and three, is indicative of us not treating the network our applications operate in as a dynamic element in the same way as our code.
Number six is that There is One Administrator; this can cause us to fail to recognise inconsistent or contradictory policies around network traffic and routing.
Number seven is that Transport Cost is Zero; here we need to factor into our thinking that an API call and the resultant transfer of data have a cost in terms of transmission time.
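To make the first couple of fallacies concrete, here is a minimal sketch of a defensive inter-service call in Python using the requests library. The service name, URL and fallback behaviour are purely illustrative assumptions on my part; the point is the explicit timeout and error handling, not any particular API.

```python
import requests

# Hypothetical downstream service; the URL and endpoint are illustrative only.
INVENTORY_URL = "http://inventory.internal/api/items/42"

def fetch_item():
    try:
        # An explicit timeout acknowledges that latency is not zero and the
        # network is not reliable: without it, a lost response can leave the
        # caller blocked and holding resources indefinitely.
        response = requests.get(INVENTORY_URL, timeout=(3.05, 10))
        response.raise_for_status()
        return response.json()
    except requests.Timeout:
        # The call took too long; degrade gracefully (cached data, a default)
        # rather than hang waiting for a response that may never arrive.
        return None
    except requests.RequestException as exc:
        # Connection failures, DNS errors, 5xx responses and so on surface
        # here instead of silently stalling the service.
        print(f"inventory call failed: {exc}")
        return None
```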
Strategies to Compensate
The fallacies described in the previous section shouldn't be seen as arguments against building distributed systems; they are things that should be considered when we do build them.
We often think that our software is deployed into a homogeneous environment with perfect conditions, but this is frequently not the case.
Errors in transport can occur, so we should have an effective strategy to detect these errors, retry calls where a retry may lead to a successful outcome, and also have a strategy for when calls continue to fail, such as a circuit breaker, to avoid filling the network with requests that are unlikely to receive the desired response.
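As a rough illustration of the retry and circuit breaker ideas, the sketch below is deliberately simplified and built on my own assumed thresholds; in practice you would more likely reach for an established resilience library (for example tenacity or pybreaker in the Python world) than hand-roll this.

```python
import time

class CircuitBreaker:
    """Tiny circuit breaker: after max_failures consecutive failures the
    circuit opens and calls fail fast until reset_timeout has elapsed."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                # Fail fast instead of sending yet another doomed request.
                raise RuntimeError("circuit open: failing fast")
            # Half-open: allow one trial call through to probe recovery.
            self.opened_at = None
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        else:
            self.failures = 0
            return result


def call_with_retries(breaker, func, attempts=3, base_delay=0.5):
    """Retry a call with exponential backoff, routed through the breaker so
    persistent failures stop hitting the network altogether."""
    for attempt in range(attempts):
        try:
            return breaker.call(func)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

The important behaviours are that retries back off rather than hammering the network, and that once the circuit opens calls fail fast instead of queuing up behind a dependency that is already struggling.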
We must realise that as load on our system increases, the size of the data we are passing between elements may start to be a factor in their performance. Even if each individual request/response is not large, in sufficient quantities their impact may be felt: a 50 KB response is trivial on its own, but at 2,000 requests per second it amounts to roughly 100 MB of traffic every second.
We have to maintain a healthy distrust of the elements of our system we are interacting with. A zero-trust approach means we do not inherently trust any element simply because it is inside the network; all elements must properly authenticate and be authorised.
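As a sketch of that principle, the hypothetical internal handler below refuses to do any work until the caller proves its identity, even though it is only reachable from inside the network. The static token check is illustrative only; a real zero-trust setup would typically rely on mTLS or signed tokens issued by an identity provider.

```python
import hmac

# Illustrative shared secret: in a real zero-trust setup each caller would
# present a verifiable identity (an mTLS certificate, a signed JWT from an
# identity provider) rather than a static token baked into the code.
EXPECTED_TOKEN = "example-service-token"

def handle_internal_request(headers, payload):
    """Even though this handler is only reachable from inside the network,
    the caller still has to prove who it is before any work is done."""
    supplied = headers.get("Authorization", "").removeprefix("Bearer ").strip()
    if not hmac.compare_digest(supplied.encode(), EXPECTED_TOKEN.encode()):
        # Being on the internal network is not enough; unauthenticated
        # callers are rejected exactly as an external caller would be.
        return {"status": 401, "body": "unauthenticated"}

    # Authorisation checks (is this caller allowed to perform this action?)
    # and the actual business logic would follow here.
    return {"status": 200, "body": "ok", "echo": payload}
```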
We must also consider that when we subdivide our system into more elements we are introducing a cost, in that those elements will need to communicate; this cost must be balanced against the benefit the change in architecture would bring.
These are only some of the things we need to think about with a distributed approach. This post is too short to cover them in great detail, but the main takeaway should be that a distributed approach isn't cost-free and sometimes it might not offer advantages over a monolithic approach. Getting a distributed approach right is hard and not an exact science; many things need to be considered and some missteps along the way should be expected.
As with any engineering decision, it's not a matter of right or wrong; it's a grey area where pros and cons must be balanced.