
In this blog post, inspired by “Software Architecture: The Hard Parts: Modern Trade-Off Analyses for Distributed Architectures”, we will delve into the critical considerations and trade-offs involved in making architectural decisions for distributed systems. By examining concepts such as architectural quanta and the nuances of static and dynamic coupling, we aim to provide insights into how to balance these trade-offs effectively.
Architectural quanta are the smallest units of deployment in a system. These units encapsulate a part of the system’s functionality and can be independently deployed and scaled. Quanta are critical in distributed architectures as they define the boundaries of deployment, change, and scalability. In simpler terms, they represent the atomic pieces of your system that can be managed and modified independently.
An architectural quantum is an independently deployable artifact characterized by high functional cohesion, high static coupling, and synchronous dynamic coupling.
Imagine a microservices-based e-commerce application. Each service in this application, such as the Order Service, Payment Service, and Inventory Service, can be considered an architectural quantum. These services can be independently deployed, updated, and scaled without affecting the others.
In a monolithic architecture, the entire application is a single quantum, because it cannot be divided into smaller parts that are deployed independently.
Static coupling describes how services are wired together, while dynamic coupling describes how services call each other at runtime.
Static coupling refers to the dependencies that exist between components at compile time. This type of coupling is often seen in tightly integrated systems where changes in one component require changes in dependent components. High static coupling can make the system rigid and challenging to evolve.
With static coupling, a service cannot operate without its required dependencies: if a statically coupled component or data source is missing, the service fails to function at all.
Consider a monolithic application where different modules directly invoke each other’s methods. If the Order Module directly calls the Payment Module and expects a specific interface, any change in the Payment Module’s interface will necessitate changes in the Order Module.
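A minimal sketch of that direct dependency (the class and method names are made up for illustration):

```python
class PaymentModule:
    def process(self, order_id: str, amount_cents: int) -> bool:
        ...  # charge the customer


class OrderModule:
    def __init__(self, payments: PaymentModule):
        self.payments = payments  # direct, static dependency on a concrete interface

    def place_order(self, order_id: str, amount_cents: int) -> bool:
        # If PaymentModule.process changes its signature, this call breaks
        # and OrderModule must change along with it.
        return self.payments.process(order_id, amount_cents)
```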
Dynamic coupling determines how services call each other at runtime, often seen in distributed systems where components interact through network calls or messaging. This type of coupling can introduce flexibility but also complexity, as it involves network latency, failures, and retries.
In a microservices architecture, services communicate over HTTP or messaging queues. For instance, the Order Service may send a message to the Payment Service to process a payment. The Payment Service can be changed or replaced without affecting the Order Service, provided the message format remains consistent.
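To make the idea of a stable message contract concrete, here is a minimal sketch in Python; the OrderPlaced event and its fields are illustrative, not taken from any specific system:

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class OrderPlaced:
    """Message contract between the Order Service and the Payment Service.

    As long as this shape stays stable, either side can be reimplemented
    or replaced without the other noticing.
    """
    order_id: str
    customer_id: str
    amount_cents: int
    currency: str = "USD"


def serialize(event: OrderPlaced) -> bytes:
    # The wire format (JSON here) is part of the contract as well.
    return json.dumps(asdict(event)).encode("utf-8")
```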
In distributed architectures, one of the biggest challenges is data ownership and consistency.
Unlike monolithic systems where a single centralized database can enforce ACID (Atomicity, Consistency, Isolation, Durability) guarantees and ensure that all data is immediately consistent and up-to-date for every transaction, distributed systems typically relax these constraints to achieve higher scalability and fault tolerance.
This shift leads to BASE (Basically Available, Soft state, Eventual consistency) properties. In other words, distributed data will not always be consistent immediately, but it will converge to a consistent state over time.
The main reason lies in the CAP theorem (Consistency, Availability, Partition tolerance), also known as Brewer’s theorem.
It was first proposed by Eric Brewer in 2000 and later formally proven by Gilbert and Lynch (2002).
“It is impossible for a distributed data store to simultaneously provide more than two of the following three guarantees: Consistency, Availability, and Partition tolerance.”
- Brewer’s CAP theorem
According to CAP:
In a distributed system, partition tolerance is non-negotiable - network failures are inevitable. That means you must trade off between consistency and availability. Choosing availability often leads to eventual consistency rather than strict ACID guarantees.
In an e-commerce system, when a customer places an order, the Order Service marks it as confirmed immediately. However, the Inventory Service and Shipping Service may only update after processing events from a queue. For a brief period, the product may appear as still in stock even though it has been ordered. Eventually, all services converge to the correct state.
As noted above, strict ACID guarantees across services are impractical in distributed systems because of the CAP theorem. Instead, systems adopt the BASE model, where data may not be immediately consistent across all services but will eventually converge to a correct state.
There are three common architectural patterns for handling eventual consistency:
- Synchronous communication: services call each other directly over HTTP or RPC.
- Event-based communication: services publish events through a message broker or similar.
- Background synchronization: scheduled jobs periodically reconcile data between services.
The following sections illustrate each pattern with examples, trade-offs, and use cases.
In this approach, the Order Service directly calls the Inventory Service and the Shipping Service via HTTP or RPC.
The goal is to achieve faster, near-immediate consistency compared to event-driven communication.
However, this introduces a trade-off: the Order Service becomes tightly coupled to the availability and latency of every service it calls, so a failure or slowdown in any downstream service directly affects order placement.
This approach mimics the benefits of ACID transactions in a monolithic system but is much harder to guarantee in a distributed architecture.
Advantages:
- The request flow is simple and easy to follow.
- Consistency is near-immediate: the caller knows right away whether every step succeeded.
Drawbacks:
- Tight runtime coupling: the Order Service is only as available as the least reliable service it calls.
- Latency accumulates, because downstream calls happen synchronously within the request.
- Partial failures (one call succeeds, the next fails) require explicit cleanup logic.
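A minimal sketch of this synchronous flow, assuming hypothetical /reserve and /schedule endpoints on the downstream services:

```python
import requests

INVENTORY_URL = "http://inventory-service/reserve"  # hypothetical endpoint
SHIPPING_URL = "http://shipping-service/schedule"   # hypothetical endpoint


def place_order(order_id: str, sku: str, qty: int) -> bool:
    """Synchronously coordinate both downstream services.

    If either call fails, the whole operation fails: the Order Service's
    availability is bounded by its dependencies.
    """
    inv = requests.post(
        INVENTORY_URL,
        json={"order_id": order_id, "sku": sku, "qty": qty},
        timeout=2,
    )
    if inv.status_code != 200:
        return False

    ship = requests.post(SHIPPING_URL, json={"order_id": order_id}, timeout=2)
    if ship.status_code != 200:
        # Without a compensating call here, the inventory stays reserved.
        # This is exactly the partial-failure problem sagas address later.
        return False
    return True
```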
In this approach, the Order Service does not call other services directly.
Instead, it publishes an event (e.g., OrderPlaced) to a Message Broker such as Kafka or RabbitMQ.
Other services, like the Inventory Service and Shipping Service, subscribe to these events.
When they receive the OrderPlaced event, they update their own databases and trigger any follow-up actions.
This design is known as event-driven architecture.
This pattern trades immediate consistency for higher scalability and fault tolerance, which is often acceptable in modern distributed e-commerce systems.
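As a rough sketch of this flow using the kafka-python client (the topic name, group id, and payload shape are all assumptions for illustration):

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Order Service side: publish the event and move on.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(
    "order-events",
    {"type": "OrderPlaced", "order_id": "o-42", "sku": "widget", "qty": 1},
)
producer.flush()  # the Order Service never waits for its subscribers

# Inventory Service side: react whenever the event eventually arrives.
consumer = KafkaConsumer(
    "order-events",
    bootstrap_servers="localhost:9092",
    group_id="inventory-service",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for msg in consumer:
    event = msg.value
    if event["type"] == "OrderPlaced":
        # Decrement local stock here, then publish e.g. StockUpdated.
        print("reserving stock for", event["order_id"])
```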
In this approach, services achieve consistency through scheduled background jobs rather than immediate calls or event-driven updates.
Each service periodically reconciles its state with a source of truth or with other services.
Typical implementations involve:
- cron jobs or scheduled tasks that periodically re-fetch data from the source of truth;
- batch ETL processes that copy and transform data on a fixed schedule;
- change-data-capture or log-based pipelines that apply changes in the background.
This pattern is suitable when absolute real-time accuracy is not critical,
for example: updating product search indexes, analytics, or syncing data to reporting databases.
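A minimal sketch of such a reconciliation job; fetch_source_rows and upsert_local are hypothetical stand-ins for reading the source of truth and writing this service's own store:

```python
import time

SYNC_INTERVAL_SECONDS = 300  # every 5 minutes; tune to how much staleness is acceptable


def fetch_source_rows() -> list[dict]:
    """Hypothetical: read the current state from the source of truth."""
    return [{"sku": "widget", "stock_level": 7}]


def upsert_local(row: dict) -> None:
    """Hypothetical: write the row into this service's local store."""
    print("synced", row)


def reconcile_forever() -> None:
    # Between runs the local data may drift; each pass converges it again.
    while True:
        for row in fetch_source_rows():
            upsert_local(row)
        time.sleep(SYNC_INTERVAL_SECONDS)
```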
In distributed systems, a common challenge arises when one service needs data from another.
If the Order Service continuously queries the Inventory Service for product availability, it creates several issues:
- constant network traffic and extra latency on every request;
- a hard runtime dependency, so an Inventory Service outage blocks order placement;
- additional load on the Inventory Service that grows with order volume.
A common solution is to introduce a cache to reduce direct dependencies between services.
Instead of querying the Inventory Service for every request, the Order Service keeps a copy of the required data.
This cache can be:
- local to each service instance, or
- shared across all instances (e.g., Redis),
and it is typically kept up to date by subscribing to events published by the owning service (e.g., StockUpdated).

Each instance of the Order Service keeps its own cache in memory or in a small local store.
This improves latency, but each instance may briefly hold slightly different data.
Benefits
- Fastest reads: no network hop at all.
- No additional infrastructure to operate.
- Each instance keeps working even if other instances or shared infrastructure fail.
Drawbacks
- The same data is duplicated in every instance, multiplying memory usage.
- Instances can temporarily disagree and serve slightly stale data.
- Invalidations must reach every instance individually.
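A minimal sketch of a per-instance cache with a simple TTL (the class and its parameters are illustrative):

```python
import time


class LocalStockCache:
    """Per-instance, in-memory cache with a simple TTL.

    Each Order Service instance has its own copy, so two instances may
    briefly disagree: the trade-off described above.
    """

    def __init__(self, ttl_seconds: float = 30.0):
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[float, int]] = {}  # sku -> (expiry, stock)

    def get(self, sku: str):
        entry = self._entries.get(sku)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        return None  # miss or expired: caller falls back to the Inventory Service

    def put(self, sku: str, stock: int) -> None:
        self._entries[sku] = (time.monotonic() + self._ttl, stock)
```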
Multiple Order Service instances rely on a common cache (e.g., Redis, Memcached).
All cache updates are visible to every instance, which improves consistency across the cluster.
Benefits
- One shared copy of the data, so all instances see the same values.
- Updates and invalidations happen in a single place.
- Lower memory use per service instance.
Drawbacks
- Every read involves a network hop, adding latency.
- The cache is another piece of infrastructure to deploy and operate.
- If it is not clustered, the shared cache can become a single point of failure.
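A minimal sketch of the shared-cache variant using the redis-py client; the key naming scheme and the 60-second TTL are assumptions:

```python
import redis

# All Order Service instances talk to the same Redis, so an update made
# by one instance is immediately visible to the others.
r = redis.Redis(host="localhost", port=6379)


def get_stock(sku: str):
    cached = r.get(f"stock:{sku}")
    if cached is not None:
        return int(cached)
    return None  # cache miss: fall back to the Inventory Service


def on_stock_updated(sku: str, stock_level: int) -> None:
    # Called when a StockUpdated event arrives; the TTL bounds staleness
    # in case an update event is ever lost.
    r.setex(f"stock:{sku}", 60, stock_level)
```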
Instead of replicating the full dataset, a service may only copy the specific attributes (columns) it needs from another service’s database.
For example, the Order Service may only replicate the stock_level column from the Inventory DB, rather than the full inventory table.
This reduces data duplication and keeps the local dataset smaller, while still avoiding constant cross-service queries.
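A minimal sketch of keeping only the replicated column in sync from StockUpdated events, using SQLite as a stand-in for the Order Service's local store:

```python
import sqlite3

# Local read model: only the column the Order Service actually needs.
db = sqlite3.connect("order_service_local.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS product_stock (sku TEXT PRIMARY KEY, stock_level INTEGER)"
)


def on_stock_updated(event: dict) -> None:
    """Apply a StockUpdated event; every other inventory column is ignored."""
    db.execute(
        "INSERT INTO product_stock (sku, stock_level) VALUES (?, ?) "
        "ON CONFLICT(sku) DO UPDATE SET stock_level = excluded.stock_level",
        (event["sku"], event["stock_level"]),
    )
    db.commit()
```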
When applying the Data Domain Pattern, each service owns its own data and acts as the single source of truth.
However, other services still need access to that data, and there are several ways to provide it.
In some cases, using a shared database is the most practical solution: for example, when two services' data is so tightly intertwined that separating it would force constant cross-service queries, or when both services are owned by the same team and evolve together.
In distributed systems, when a business process spans multiple services (e.g., placing an order, charging a payment, reserving inventory, and arranging shipping), a single ACID transaction across all services is not possible.
The Saga pattern addresses this challenge by breaking down the workflow into a sequence of local transactions, each performed by a single service.
If one step fails, the system performs compensating transactions to undo the work of the previous steps, ensuring eventual consistency.
There are two main ways to implement a saga in distributed systems: orchestration and choreography.
A central orchestrator service controls the saga. It tells each service what to do next and handles compensations if one step fails.
Advantages
- The workflow logic lives in one place, which makes the saga easy to follow, monitor, and test.
- Compensation is simpler, because the orchestrator always knows how far the saga has progressed.
Drawbacks
- Every participating service is coupled to the orchestrator.
- The orchestrator can become a bottleneck and a single point of failure.
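A minimal sketch of an orchestrated saga; each step function below is a hypothetical stand-in for a local transaction in one service, assumed to raise an exception on failure:

```python
# Hypothetical local transactions exposed by each service; each would
# raise an exception on failure in a real system.
def charge_payment(order): ...
def refund_payment(order): ...
def reserve_inventory(order): ...
def release_inventory(order): ...
def schedule_shipping(order): ...
def cancel_shipping(order): ...


def place_order_saga(order: dict) -> bool:
    """Orchestrator: run each local transaction in order; if one step
    fails, run the compensations for the completed steps in reverse."""
    steps = [
        (charge_payment, refund_payment),
        (reserve_inventory, release_inventory),
        (schedule_shipping, cancel_shipping),
    ]
    done = []
    try:
        for action, compensate in steps:
            action(order)
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate(order)  # undo, so the system converges to a consistent state
        return False
    return True
```

Running compensations in reverse order mirrors how the steps were applied, which keeps the undo logic predictable.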
There is no central orchestrator. Each service publishes events and subscribes to events from others. The saga emerges from this chain of reactions.
Advantages
- Loose coupling: there is no central coordinator for services to depend on.
- Scales well and fits naturally into event-driven systems.
Drawbacks
- The overall workflow is implicit, which makes it harder to trace and debug.
- Compensation logic is scattered across services, so failure handling is harder to keep consistent.
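A minimal sketch of the choreographed variant; the event names, handlers, and publish callback are illustrative:

```python
# Each service owns one small handler; the saga emerges from the event chain.

def handle_order_placed(event: dict, publish) -> None:
    """Inventory Service: reacts to OrderPlaced."""
    if reserve_stock(event["sku"], event["qty"]):
        publish("InventoryReserved", event)
    else:
        # Compensation is triggered by a failure event, not by a coordinator:
        # the Order Service subscribes to InventoryFailed and cancels the order.
        publish("InventoryFailed", event)


def handle_inventory_reserved(event: dict, publish) -> None:
    """Shipping Service: reacts to InventoryReserved."""
    publish("ShipmentScheduled", event)


def reserve_stock(sku: str, qty: int) -> bool:
    ...  # hypothetical local transaction in the Inventory Service
    return True
```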