Poyters

Trade-offs for distributed architectures


In this blog post, inspired by “Software Architecture: The Hard Parts: Modern Trade-Off Analyses for Distributed Architectures”, we will delve into the critical considerations and trade-offs involved in making architectural decisions for distributed systems. By examining concepts such as architectural quanta and the nuances of static and dynamic coupling, we aim to provide insights into how to balance these trade-offs effectively.


Architectural Quanta

Architectural quanta are the smallest units of deployment in a system. These units encapsulate a part of the system’s functionality and can be independently deployed and scaled. Quanta are critical in distributed architectures as they define the boundaries of deployment, change, and scalability. In simpler terms, they represent the atomic pieces of your system that can be managed and modified independently.

An architectural quantum is characterized by a high degree of functional cohesion and a high degree of synchronous dynamic coupling.

Example of Architectural Quanta

Imagine a microservices-based e-commerce application. Each service in this application, such as the Order Service, Payment Service, and Inventory Service, can be considered an architectural quantum. These services can be independently deployed, updated, and scaled without affecting the others.

In a monolithic architecture, the whole project is a single quantum, because it cannot be split into smaller parts that are deployed independently.


Static and dynamic coupling

Static coupling describes how services are wired together (the dependencies a service needs in order to operate), while dynamic coupling describes how services call each other at runtime.

Static Coupling

Static coupling refers to the dependencies that exist between components at compile time. This type of coupling is often seen in tightly integrated systems where changes in one component require changes in dependent components. High static coupling can make the system rigid and challenging to evolve.

With static coupling, a service cannot function at all without its required dependencies.

Example of Static Coupling

Consider a monolithic application where different modules directly invoke each other’s methods. If the Order Module directly calls the Payment Module and expects a specific interface, any change in the Payment Module’s interface will necessitate changes in the Order Module.
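To make this concrete, here is a minimal TypeScript sketch (the class and method names are hypothetical) of two statically coupled modules in a monolith: the Order Module compiles against the Payment Module's interface, so any change to that interface forces a change in the Order Module as well.

```typescript
// Hypothetical monolith modules; names are illustrative only.

// Payment Module: its public interface is what the Order Module compiles against.
class PaymentModule {
  charge(orderId: string, amount: number): boolean {
    // ...talk to a payment provider...
    return true;
  }
}

// Order Module: statically coupled to PaymentModule's method signature.
class OrderModule {
  constructor(private readonly payments: PaymentModule) {}

  placeOrder(orderId: string, amount: number): void {
    // Direct, compile-time dependency: if charge() is renamed or gains a
    // required parameter, this code no longer compiles until it is updated.
    const charged = this.payments.charge(orderId, amount);
    if (!charged) {
      throw new Error(`Payment failed for order ${orderId}`);
    }
  }
}
```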

Dynamic Coupling

Dynamic coupling determines how services call each other at runtime, often seen in distributed systems where components interact through network calls or messaging. This type of coupling can introduce flexibility but also complexity, as it involves network latency, failures, and retries.

Example of Dynamic Coupling

In a microservices architecture, services communicate over HTTP or messaging queues. For instance, the Order Service may send a message to the Payment Service to process a payment. The Payment Service can be changed or replaced without affecting the Order Service, provided the message format remains consistent.
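Below is a minimal sketch of the same interaction expressed as runtime (dynamic) coupling, assuming a hypothetical broker publish/subscribe interface: the only thing the two services share is the message format, so the Payment Service can be rewritten or replaced as long as it still understands OrderPlaced messages.

```typescript
// The shared contract: the only coupling between the two services.
interface OrderPlacedMessage {
  orderId: string;
  amount: number;
}

// Hypothetical broker abstraction standing in for HTTP, Kafka, RabbitMQ, etc.
interface MessageBroker {
  publish(topic: string, message: OrderPlacedMessage): Promise<void>;
  subscribe(topic: string, handler: (message: OrderPlacedMessage) => Promise<void>): void;
}

// Order Service side: publishes the event at runtime and moves on.
async function placeOrder(broker: MessageBroker, orderId: string, amount: number) {
  await broker.publish('orders.placed', { orderId, amount });
}

// Payment Service side: can be changed or replaced freely,
// as long as it keeps consuming the same message shape.
function startPaymentService(broker: MessageBroker) {
  broker.subscribe('orders.placed', async (message) => {
    console.log(`Processing payment of ${message.amount} for order ${message.orderId}`);
  });
}
```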


Data ownership and consistency

In distributed architectures, one of the biggest challenges is data ownership and consistency.

Unlike monolithic systems where a single centralized database can enforce ACID (Atomicity, Consistency, Isolation, Durability) guarantees and ensure that all data is immediately consistent and up-to-date for every transaction, distributed systems typically relax these constraints to achieve higher scalability and fault tolerance.

This shift leads to BASE (Basically Available, Soft state, Eventual consistency) properties. In other words, distributed data will not always be consistent immediately, but it will converge to a consistent state over time.

Why can’t distributed systems be fully ACID?

The main reason lies in the CAP theorem (Consistency, Availability, Partition tolerance), also known as Brewer’s theorem.

It was first proposed by Eric Brewer in 2000 and later formally proven by Gilbert and Lynch (2002).

“A distributed system can provide only two of the following three guarantees at the same time: Consistency, Availability, and Partition tolerance.”

According to CAP:

In a distributed system, partition tolerance is non-negotiable - network failures are inevitable. That means you must trade off between consistency and availability. Choosing availability often leads to eventual consistency rather than strict ACID guarantees.

Practical implications

Example

In an e-commerce system, when a customer places an order, the Order Service marks it as confirmed immediately. However, the Inventory Service and Shipping Service may only update after processing events from a queue. For a brief period, the product may appear as still in stock even though it has been ordered. Eventually, all services converge to the correct state.


Patterns of eventual consistency

So, as mentioned above: in distributed systems strict ACID is impossible to achieve because of the CAP theorem. Instead, systems adopt the BASE model, where data may not be immediately consistent across all services but will eventually converge to a correct state.

There are three common architectural patterns for handling eventual consistency:

  1. Task-based synchronization with orchestration - a service coordinates updates by directly invoking others (HTTP/RPC).
  2. Event-based synchronization - services publish and consume events asynchronously through a message broker or similar.
  3. Background synchronization - services periodically reconcile their state with a source of truth using scheduled jobs or similar.

The following sections illustrate each pattern with examples, trade-offs, and use cases.

Task-based synchronization with orchestration

In this approach, the Order Service directly calls the Inventory Service and the Shipping Service via HTTP or RPC.
The goal is to achieve faster, near-immediate consistency compared to event-driven communication.

However, this introduces a trade-off: the approach mimics the benefits of ACID transactions in a monolithic system, but those guarantees are much harder to provide in a distributed architecture.

Embedded orchestration (one service acts as the orchestrator)

Advantages:

Drawbacks:

Diagram: Place Order goes to the Order Service, which makes HTTP/RPC calls to the Inventory Service (Update Stock) and the Shipping Service (Create Shipment).
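A rough sketch of embedded orchestration in TypeScript, assuming hypothetical service URLs and using the standard fetch API: the Order Service saves the order and then synchronously calls the Inventory Service and the Shipping Service itself.

```typescript
// Hypothetical endpoints; in a real system these would come from configuration.
const INVENTORY_URL = 'http://inventory-service/stock';
const SHIPPING_URL = 'http://shipping-service/shipments';

interface Order {
  orderId: string;
  productId: string;
  quantity: number;
}

// The Order Service acts as the orchestrator: it makes the downstream calls itself.
async function placeOrder(order: Order): Promise<void> {
  await saveOrderLocally(order);

  // Synchronous HTTP calls give near-immediate consistency,
  // but the request now fails if either downstream service is down.
  const stockResponse = await fetch(INVENTORY_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ productId: order.productId, quantity: order.quantity }),
  });
  if (!stockResponse.ok) throw new Error('Inventory update failed');

  const shipmentResponse = await fetch(SHIPPING_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ orderId: order.orderId }),
  });
  if (!shipmentResponse.ok) throw new Error('Shipment creation failed');
}

async function saveOrderLocally(order: Order): Promise<void> {
  // ...persist to the Order Service's own database...
}
```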

Dedicated orchestrator service

Advantages:

Drawbacks:

Diagram: Place Order goes to a dedicated orchestrator service, which makes HTTP/RPC calls to the Order Service (Create Order), the Inventory Service (Update Stock), and the Shipping Service (Create Shipment).

Event-based synchronization

In this approach, the Order Service does not call other services directly.
Instead, it publishes an event (e.g., OrderPlaced) to a Message Broker such as Kafka or RabbitMQ.

Other services, like the Inventory Service and Shipping Service, subscribe to these events.
When they receive the OrderPlaced event, they update their own databases and trigger any follow-up actions.

This design is known as event-driven architecture.

Benefits

Drawbacks

This pattern trades immediate consistency for higher scalability and fault tolerance, which is often acceptable in modern distributed e-commerce systems.

Diagram: Place Order goes to the Order Service, which publishes an OrderPlaced event to the message broker; the Inventory Service and the Shipping Service each consume OrderPlaced.
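A minimal sketch of the event-based variant, here using kafkajs as one possible broker client (the topic name, group id, and OrderPlaced payload are assumptions):

```typescript
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'shop', brokers: ['localhost:9092'] });

// Order Service side: publish the event and return immediately.
export async function publishOrderPlaced(orderId: string, productId: string, quantity: number) {
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: 'order-placed',
    messages: [{ key: orderId, value: JSON.stringify({ orderId, productId, quantity }) }],
  });
  await producer.disconnect();
}

// Inventory Service side: consume the event and update its own database.
export async function startInventoryConsumer() {
  const consumer = kafka.consumer({ groupId: 'inventory-service' });
  await consumer.connect();
  await consumer.subscribe({ topic: 'order-placed' });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value!.toString());
      // Decrease stock in the Inventory Service's own database (eventual consistency).
      console.log(`Reserving ${event.quantity} of ${event.productId} for order ${event.orderId}`);
    },
  });
}
```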

Background synchronization

In this approach, services achieve consistency through scheduled background jobs rather than immediate calls or event-driven updates.
Each service periodically reconciles its state with a source of truth or with other services.

Typical implementations involve:

Benefits

Drawbacks

This pattern is suitable when absolute real-time accuracy is not critical,
for example: updating product search indexes, analytics, or syncing data to reporting databases.

Diagram: Place Order goes to the Order Service, which saves the order locally; scheduled sync jobs later propagate the change to the Inventory Service and the Shipping Service.
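A small sketch of background synchronization, assuming a hypothetical inventory REST endpoint and an in-memory local copy: a scheduled job periodically pulls the authoritative stock levels and reconciles the local state.

```typescript
// Hypothetical source-of-truth endpoint and local store.
const INVENTORY_URL = 'http://inventory-service/stock-levels';
const SYNC_INTERVAL_MS = 5 * 60 * 1000; // every 5 minutes

interface StockLevel {
  productId: string;
  stockLevel: number;
}

const localStockCopy = new Map<string, number>();

// Periodically reconcile the local copy with the source of truth.
async function syncStockLevels(): Promise<void> {
  const response = await fetch(INVENTORY_URL);
  if (!response.ok) return; // try again on the next run

  const levels: StockLevel[] = await response.json();
  for (const { productId, stockLevel } of levels) {
    localStockCopy.set(productId, stockLevel);
  }
}

// A cron scheduler would normally drive this; setInterval keeps the sketch self-contained.
setInterval(() => {
  syncStockLevels().catch((err) => console.error('Stock sync failed', err));
}, SYNC_INTERVAL_MS);
```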

Data availability patterns

In distributed systems, a common challenge arises when one service needs data from another.
If the Order Service continuously queries the Inventory Service for product availability, it creates several issues:


Using a Cache

A common solution is to introduce a cache to reduce direct dependencies between services.
Instead of querying the Inventory Service for every request, the Order Service keeps a copy of the required data.

This cache can be:

Variant 1: Local Cache (per service instance)

Each instance of Order Service keeps its own cache in memory or a small local store.
This improves latency but each instance may briefly hold slightly different data.

Benefits

Drawbacks

Diagram: each Order Service instance looks up data in its own local cache, which is periodically synced/refreshed from the Inventory Service.
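A sketch of a per-instance local cache with a TTL, assuming a hypothetical fetchStockFromInventoryService call; each Order Service instance keeps its own copy and accepts that it may be briefly stale.

```typescript
interface CacheEntry {
  stockLevel: number;
  expiresAt: number;
}

const TTL_MS = 30_000; // accept data up to 30 seconds old
const localCache = new Map<string, CacheEntry>();

// Hypothetical call to the Inventory Service; only made on a cache miss.
declare function fetchStockFromInventoryService(productId: string): Promise<number>;

export async function getStockLevel(productId: string): Promise<number> {
  const cached = localCache.get(productId);
  if (cached && cached.expiresAt > Date.now()) {
    return cached.stockLevel; // fast in-memory lookup, no network call
  }

  // Cache miss or stale entry: refresh from the Inventory Service.
  const stockLevel = await fetchStockFromInventoryService(productId);
  localCache.set(productId, { stockLevel, expiresAt: Date.now() + TTL_MS });
  return stockLevel;
}
```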

Variant 2: Shared Distributed Cache

Multiple Order Service instances rely on a common cache (e.g., Redis, Memcached).
All cache updates are visible to every instance, which improves consistency across the cluster.

Benefits

Drawbacks

Diagram: all Order Service instances read from one shared cache, which is synced/refreshed from the Inventory Service.
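The same lookup against a shared cache, sketched with the ioredis client (the key naming and TTL are assumptions); every Order Service instance now sees the same cached value.

```typescript
import Redis from 'ioredis';

const redis = new Redis('redis://localhost:6379');
const TTL_SECONDS = 30;

// Hypothetical call to the Inventory Service; only made on a cache miss.
declare function fetchStockFromInventoryService(productId: string): Promise<number>;

export async function getStockLevel(productId: string): Promise<number> {
  const key = `stock:${productId}`;

  // Shared lookup: all Order Service instances read the same key.
  const cached = await redis.get(key);
  if (cached !== null) return Number(cached);

  // Cache miss: refresh from the Inventory Service and store the value for everyone.
  const stockLevel = await fetchStockFromInventoryService(productId);
  await redis.set(key, String(stockLevel), 'EX', TTL_SECONDS);
  return stockLevel;
}
```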

Columnar Replication

Instead of replicating the full dataset, a service may only copy the specific attributes (columns) it needs from another service’s database.
For example, the Order Service may only replicate the stock_level column from the Inventory DB, rather than the full inventory table.

This reduces data duplication and keeps the local dataset smaller, while still avoiding constant cross-service queries.

Benefits

Drawbacks

Diagram: the Inventory DB remains the source of truth; only the stock_level column is replicated into the Order Service's local store.
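One way to sketch columnar replication: the Order Service listens for stock changes and upserts only the stock_level column into a small local table (the event shape, table name, and SQL client interface are assumptions).

```typescript
// Hypothetical SQL client with a parameterized query method.
interface SqlClient {
  query(sql: string, params: unknown[]): Promise<void>;
}

interface StockLevelChanged {
  productId: string;
  stockLevel: number;
}

// Only the replicated column is stored locally; the Inventory DB stays the source of truth.
export async function onStockLevelChanged(db: SqlClient, event: StockLevelChanged): Promise<void> {
  await db.query(
    `INSERT INTO product_stock (product_id, stock_level)
     VALUES ($1, $2)
     ON CONFLICT (product_id) DO UPDATE SET stock_level = EXCLUDED.stock_level`,
    [event.productId, event.stockLevel],
  );
}
```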

Data Domain Pattern

When applying the Data Domain Pattern, each service owns its own data and acts as the single source of truth.
However, other services still need access to that data, and there are several ways to provide it.

In some cases, using a shared database is the most practical solution:

Advantages

Drawbacks


Saga Pattern

In distributed systems, when a business process spans multiple services (e.g., placing an order, charging a payment, reserving inventory, and arranging shipping), a single ACID transaction across all services is not possible.

The Saga pattern addresses this challenge by breaking down the workflow into a sequence of local transactions, each performed by a single service.
If one step fails, the system performs compensating transactions to undo the work of the previous steps, ensuring eventual consistency.

There are two common approaches to implementing a saga: orchestration and choreography, described in the sections below.

Benefits

Drawbacks

Orchestration vs Choreography

There are two main ways to implement sagas in distributed systems:

Orchestration

A central orchestrator service controls the saga. It tells each service what to do next and handles compensations if one step fails.

Advantages

Drawbacks

Diagram: Place Order goes to the orchestrator, which calls Reserve Stock, Process Payment, and Arrange Shipment in sequence.
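A simplified sketch of an orchestrated saga with compensating transactions; the step functions are hypothetical calls to the respective services.

```typescript
// Hypothetical calls to the Inventory, Payment and Shipping services.
declare function reserveStock(orderId: string): Promise<void>;
declare function releaseStock(orderId: string): Promise<void>;   // compensation
declare function processPayment(orderId: string): Promise<void>;
declare function refundPayment(orderId: string): Promise<void>;  // compensation
declare function arrangeShipment(orderId: string): Promise<void>;

// The orchestrator runs local transactions in order and undoes completed
// steps (in reverse) if a later step fails, converging to a consistent state.
export async function placeOrderSaga(orderId: string): Promise<void> {
  const compensations: Array<() => Promise<void>> = [];

  try {
    await reserveStock(orderId);
    compensations.push(() => releaseStock(orderId));

    await processPayment(orderId);
    compensations.push(() => refundPayment(orderId));

    await arrangeShipment(orderId);
  } catch (err) {
    // Run compensating transactions in reverse order of completion.
    for (const compensate of compensations.reverse()) {
      await compensate();
    }
    throw err;
  }
}
```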

Choreography

There is no central orchestrator. Each service publishes events and subscribes to events from others. The saga emerges from this chain of reactions.

Advantages

Drawbacks

Diagram: Place Order goes to the Order Service, which publishes OrderPlaced; the Inventory Service consumes it and reserves stock, the Payment Service consumes it and processes the payment, and a ShipmentRequested event is then published for the Shipping Service.

Additional resources