Reactive Microservices — 5

Kapil Raina
9 min read · Jun 16, 2021

This is an 8 part series on reactive microservices. [1 , 2 , 3, 4, 5, 6, 7, 8]

Parts 2, 3 & 4 of this series listed reactive microservices patterns. This part continues with more patterns.

Consistency & Availability

In a distributed architecture, services must work together to meaningfully process a request, whether they sit in different data centers, across the network in the same data center, or across links within the same server. Information has to travel between them, and travel has inherent latency. So the state held in different services or nodes may not be the same, i.e. state is not always consistent. Moreover, while information is in transit, the state at the original sender may have changed. In reality, then, the receiver in a distributed system always deals with potentially stale state.

Strong Consistency: A system is considered strongly consistent if all nodes (members) of the system have agreed on the state before it becomes available to observers. Strong consistency means no observer sees stale data.

Eventual Consistency: Eventually consistent state means that, in the absence of further updates, all nodes (members) of the system will eventually see the same value for a given piece of data.

Traditional monolithic architecture relied on strong consistency, using tools like ACID transactions that block a service until all nodes have accepted the update. In a distributed system, however, strong consistency limits scalability, availability, and throughput.

Reactive Systems must inherently deal with eventual consistency in many places. Strong consistency may still be needed in some places, for example while processing commands within a microservice. Where strong consistency is needed should be determined by analyzing the domain.

The event-driven nature of reactive microservices introduces eventual consistency.

CQRS & Event Carried State Transfer use events to update state across distributed components, so that the state update is eventually consistent. In CQRS, while the write model can be strongly consistent, the read model is always eventually consistent. Ref: Asynchronous Integration - Event Driven
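The CQRS split described above could be sketched roughly as follows. This is a minimal in-memory illustration, not a framework API: `AccountWriteModel`, `BalanceReadModel`, `Deposited` and `deliver` are hypothetical names, and `deliver` stands in for whatever asynchronous transport actually carries the events.

```python
from dataclasses import dataclass, field


@dataclass
class Deposited:
    """Event: a fact recorded after a deposit command succeeded."""
    account_id: str
    amount: int


@dataclass
class AccountWriteModel:
    """Write side: strongly consistent within this one service."""
    balances: dict = field(default_factory=dict)
    outbox: list = field(default_factory=list)  # events not yet delivered

    def deposit(self, account_id: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balances[account_id] = self.balances.get(account_id, 0) + amount
        self.outbox.append(Deposited(account_id, amount))


@dataclass
class BalanceReadModel:
    """Read side: updated only when events arrive, so it may lag."""
    view: dict = field(default_factory=dict)

    def apply(self, event: Deposited) -> None:
        current = self.view.get(event.account_id, 0)
        self.view[event.account_id] = current + event.amount


def deliver(write: AccountWriteModel, read: BalanceReadModel) -> None:
    """Stand-in for asynchronous delivery over a message broker."""
    while write.outbox:
        read.apply(write.outbox.pop(0))
```

Between `deposit` and `deliver`, a query against the read model returns the old balance. That window is the eventual consistency the read side has to live with.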

Availability: System is considered available, if it remains responsive despite any failures.

The CAP theorem states that in the event of a network partition (P), a system can provide only one of the two: Consistency or Availability. The original formulation says that a system can pick two among the three: Consistency, Availability & Partition tolerance. Since a network partition is not a choice and will happen eventually, partition tolerance is taken as a given in distributed systems.

[Figure: CAP]

AP: Sacrifice consistency. Allow writes on both sides of the partition and merge the data later, when the partition is fixed. Observers see a different view of the data depending on where they query.

CP: Sacrifice availability. Stop accepting state-change requests in the affected part of the system until the partition is restored. No new data is accepted there, so there is no consistency issue.

The network will always be unreliable. If the system keeps allowing data to be modified during a partition, via the parts that are still reachable, the data becomes inconsistent between those parts. The only way to avoid that is to stop accepting modification requests, which impacts availability. Reactive microservices are based on responsiveness, i.e. availability, so impacting availability may seem anti-reactive. However, the architecture should balance availability and consistency in different parts of the application and make an informed choice between AP and CP for each.
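The AP and CP behaviours can be contrasted in a few lines. This is only a toy sketch under stated assumptions: `Replica`, its `mode` flag and the `partitioned` switch are hypothetical, and replication to the peer and the later merge are deliberately left out.

```python
class Replica:
    """One node of a replicated store; `mode` is 'AP' or 'CP' (illustrative)."""

    def __init__(self, mode: str):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.data = {}
        self.partitioned = False  # True while the peer is unreachable

    def write(self, key, value) -> bool:
        """Return True if the write was accepted by this node."""
        if self.partitioned and self.mode == "CP":
            return False        # CP: refuse the write, stay consistent
        self.data[key] = value  # AP: accept the write, diverge for now
        return True
```

During a partition the CP replica rejects the request, so callers see unavailability. The AP replica accepts it, so its data must be reconciled with its peer once the partition heals.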

Scalability is closely tied to the CAP theorem. Choosing consistency or availability at scale is a deliberate choice, and there are tools that attempt to provide solutions for each.

With respect to scalability, strong consistency needs coherence among all nodes and therefore blocks processing. It also introduces contention in the database due to locks. Both hamper the ability to scale. Tools like Akka provide the ability to shard entities and offer strong consistency within a single shard, thus providing ACID-type guarantees while still allowing shards to scale (https://doc.akka.io/docs/akka/2.5/cluster-sharding.html?language=java), (https://www.lagomframework.com/documentation/1.4.x/scala/PersistentEntity.html). Sharding doesn’t eliminate the contention arising from strong consistency; it only isolates it to each entity.
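The routing idea behind sharding can be illustrated as below. This is not Akka's API: `shard_for` and `ShardedStore` are hypothetical names for a simplified sketch in which every entity id maps to exactly one shard, so updates to that entity are only ever ordered within its own shard.

```python
import hashlib
from collections import defaultdict


def shard_for(entity_id: str, num_shards: int) -> int:
    """Map an entity id to a shard with a stable hash (same id, same shard)."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Each shard owns its entities outright; no cross-shard locking."""

    def __init__(self, num_shards: int):
        self.num_shards = num_shards
        self.shards = defaultdict(dict)  # shard index -> {entity_id: state}

    def update(self, entity_id: str, delta: int) -> int:
        """Apply a change inside the single shard that owns the entity."""
        shard = self.shards[shard_for(entity_id, self.num_shards)]
        shard[entity_id] = shard.get(entity_id, 0) + delta
        return shard[entity_id]
```

Adding shards spreads entities across more owners, but two writers to the same entity still contend with each other inside that entity's shard.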

Choosing high availability at scale sacrifices consistency. This would seem the default choice for a reactive microservices architecture, since availability directly impacts the core reactive tenet of responsiveness. Databases usually provide replication between cluster nodes or replicas. CRDTs (conflict-free replicated data types) are a newer technique for achieving eventual consistency & high availability of entity/aggregate state. As opposed to sharding, the CRDT technique stores an entity on many replicas, and an update to one replica is eventually merged into the others. CRDT data types are designed so that these merges are mathematically well defined. (https://www.lightbend.com/blog/concurrent-sharing-of-data-in-motion-across-clusters-with-crdts-in-akka-distributed-data)
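A grow-only counter (G-Counter) is one of the simplest CRDTs and shows what a mathematical merge looks like. The sketch below is a generic illustration with a hypothetical `GCounter` class, not the Akka Distributed Data API.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by taking maxima."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Element-wise max: commutative, associative and idempotent."""
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)
```

Because the merge is an element-wise maximum, replicas can exchange state in any order, any number of times, and still converge on the same value.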

Choosing between availability and consistency in a reactive microservice solution is not a technology decision; it is a business decision. Different services of the application may need different availability or consistency guarantees. Choosing one over the other has consequences that need a conversation between designers and domain experts. The choice affects the user experience and journey, so the CAP theorem is not merely a technical limitation hidden in the background. It is a fact of life in distributed systems and should be treated as a domain concern. Ref: Reactive Systems & Domain Driven Design

“NOW”

Information is always from the past….

An individual microservice can rely on strong consistency and the traditional ACID characteristics of transactions. Across microservice boundaries, however, information travels between servers separated in space, and the resulting lag is the reality of any distributed system, reactive microservices included. Messages (Events & Commands) between services cannot be treated as universal (system-wide) “Now” messages in all microservices. Within a microservice, programs can assume a consistent notion of time (i.e. define consistency/transaction boundaries within the microservice); across microservices there is no universal “NOW”. Reactive microservices recognize this Isolation of Time as a property of events and make no attempt to bind events to their own “Local” consistent time.

[Figure: NOW]

Modelling time in reactive microservices so that there is no universal expectation of NOW manifests itself in interesting ways, since “Isolation in Time” is a basic tenet. For example:

  • A long-running transaction is not initiated and executed at a single point in time. A step in a saga transaction can only assume its own NOW.
  • Event Notification: An event carries the timestamp of when it originated. The subscriber cannot assume that the time it starts processing the event is the event time.
  • CQRS: The query side of a CQRS design knows the state of the data only as of its own local present, and should make that explicit to the consumer.
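The last two points can be made concrete with a small sketch. All names here (`OrderPlaced`, `occurred_at`, `staleness`, `query_result`, `now_utc`) are hypothetical: the event carries the publisher's time, the subscriber measures how old the fact is in its own present, and the query side labels its answer with the local time it is valid for.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    occurred_at: datetime  # the publisher's "now" when the fact happened


def now_utc() -> datetime:
    """The local 'now' of whichever service calls this."""
    return datetime.now(timezone.utc)


def staleness(event: OrderPlaced, processed_at: datetime) -> timedelta:
    """How old the information is in the subscriber's own 'now'."""
    return processed_at - event.occurred_at


def query_result(value, as_of: datetime) -> dict:
    """CQRS read side: state the local 'now' the answer is valid for."""
    return {"value": value, "as_of": as_of.isoformat()}
```

A subscriber would typically call `staleness(event, now_utc())` when it starts processing, rather than treating its own clock reading as the event time.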

Integration and Communication

The introduction section highlighted the shift in thinking about communication in distributed microservices.

[Figure: Jonas Bonér]

The async communication of reactive microservices can be achieved with well-known integration patterns or with newer ones that allow stream processing. The choice of integration stack depends on the nature of the service, but all of them support async, non-blocking communication. With sync communication, unexpected spikes and peak loads in the client system cascade to the called system; async messaging eliminates that tight coupling.

  • Sync Point-to-Point: In reactive microservices, async, non-blocking messaging is the backbone of communication. Synchronous calls can still be used, but with the altered expectation that they return immediately. Synchronous integration requires sender and receiver to be available at the same time, thus coupling them in time. Examples:

A synchronous response can be a confirmation that the request has been accepted and will be processed later (asynchronously).

Return references such as acknowledgement numbers or transaction ids for a long-running transaction. An HTTP response to a browser, for example, can carry a Location header indicating where the result of the processing can be queried.

In DDD terminology, a command that is issued to a known service can be synchronous, with a simple acknowledgement as the response. Queries are usually synchronous.

Sync messages should be used sparingly, only where the domain requires them.

  • Async Point-to-Point: Async point-to-point communication between services (i.e. between a client and a stable address of the service) is apt in some cases. Async integration doesn’t require sender and receiver to be available at the same time, thus decoupling them in time.

Both Commands-with-immediate-sync-response and Queries can be built as async point-to-point.

CompletableFuture or async REST enables async point-to-point communication.

Thread usage improves in comparison to sync point-to-point. But since this is not a completely decoupled model, the gains are limited beyond a point.

  • Async Pub-Sub: Use a messaging system with a publish and subscribe model, where each service has a predefined contract to publish to certain topics and listen to others. This allows the publishing microservice to put messages on topics that the consuming microservice(s) can pick up from. Examples of messaging backbones: JMS MQ, AMQP, Apache Kafka, Amazon Kinesis. Cases that need advanced routing, data transformation & enrichment can use Apache Camel or stream processors like Apache Spark or Akka Streams. Ideal for Event communication.

Async pub-sub eliminates blocking issues completely and decouples the services, which don’t have to know about or wait for each other.

  • Reactive Streams: Fully asynchronous, back-pressured, real-time streaming (http://www.reactive-streams.org/). RxJava and Akka Streams implement the Reactive Streams specification; Apache Spark Streaming offers its own back-pressured stream processing. Ideal for Event communication.
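The first three integration styles above can be sketched side by side. This is an in-memory illustration under stated assumptions: `submit_order`, `price_service`, `quote` and `Broker` are hypothetical names, Python's `asyncio` stands in for CompletableFuture-style async calls, and a dictionary of queues stands in for a real messaging backbone.

```python
import asyncio
import uuid
from collections import defaultdict, deque

# --- Sync point-to-point: answer immediately, do the work later ---
JOBS = {}  # job_id -> status; stand-in for durable storage


def submit_order(payload: dict) -> dict:
    """Accept the request and return a reference instead of the result."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = "PENDING"
    return {"status": 202, "headers": {"Location": f"/orders/{job_id}"}}


# --- Async point-to-point: await a known service, no thread is blocked ---
async def price_service(sku: str) -> float:
    await asyncio.sleep(0.01)  # simulated network latency
    return {"apple": 1.5, "pear": 2.0}[sku]


async def quote(skus: list) -> float:
    """Issue all price lookups concurrently and sum the replies."""
    prices = await asyncio.gather(*(price_service(s) for s in skus))
    return sum(prices)


# --- Async pub-sub: the publisher knows topics, not subscribers ---
class Broker:
    def __init__(self):
        self._queues = defaultdict(list)  # topic -> one queue per subscriber

    def subscribe(self, topic: str) -> deque:
        queue = deque()
        self._queues[topic].append(queue)
        return queue

    def publish(self, topic: str, message) -> None:
        for queue in self._queues[topic]:
            queue.append(message)  # returns at once; consumers drain later
```

The coupling loosens from top to bottom: `submit_order` needs the receiver up at call time, `quote` still addresses one known service, and `Broker.publish` succeeds whether or not anyone is currently consuming.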

Transaction Co-ordination — Sagas

ACID transactions require strong consistency, which in turn needs synchronization between systems. This co-ordination is very expensive and is an anti-pattern in reactive microservices, because it caps scalability & availability.

While autonomy in service design is an excellent way to avoid co-ordination, in practice there are still cases where services must co-ordinate. These should be the exception rather than the starting point.

Communication in a reactive system is asynchronous, so transaction co-ordination must likewise forgo a synchronized co-ordination process. Transactions are managed via multiple aspects that work in a complementary fashion:

  • Since synchronized co-ordination is not an option, the service assumes that the other parts of the system will hold up during processing. If they do not, the service detects that and invokes a compensating action. This works the same way across multiple service boundaries.
  • Event-driven services that work with message passing enable this async co-ordination with two types of messages:

Command: an action that initiates changes that affect the state of the entities in the service.

Event: a fact stating that a command has successfully completed and that the “next action” may start.

The chaining of services to complete a long-running transaction is achieved by patterns like Saga. A Saga is aware of the steps involved in a long-running transaction; if a failure is detected in any service, the saga invokes the compensating actions of all previous services for that transaction. A Saga models the transaction as a finite state machine. Sagas can be implemented as “Event-Choreography”, wherein service events act as trigger and terminator for the saga, or as a “Transaction-Coordinator”. The example below depicts the coordinator pattern.

Ref: https://blog.couchbase.com/saga-pattern-implement-business-transactions-using-microservices-part-2/
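A coordinator-style saga can be reduced to a short sketch. It is a simplification under stated assumptions: `run_saga` and the `(name, action, compensate)` step shape are hypothetical, and the steps are plain function calls, whereas in a real system each step would be a command sent asynchronously and each outcome an event.

```python
from typing import Callable, List, Tuple

# A step is (name, action, compensating action).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"{name}:failed")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}:compensated")
            return False, log
        log.append(f"{name}:done")
        completed.append((name, compensate))
    return True, log
```

With a reserve-stock step followed by a charge-card step that raises, the coordinator records the failure and then invokes the reserve step's compensating action. The failed step itself is not compensated, because it never completed.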

Back Pressure

Back-pressure was mentioned earlier in the context of reactive streams, but it is an important pattern that deserves a mention on its own. Applying back-pressure lets the consumer regulate the pace of consumption at its own discretion. This applies to stream processing, reading database query results, or any producer-consumer scenario. Traditionally, a speed mismatch between producer and consumer results in blocked threads, which creates contention within the application thread pool.
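One simple way to see the effect is a bounded queue between a fast producer and a slow consumer. The sketch below uses Python's `asyncio.Queue` as a stand-in and is not an implementation of the Reactive Streams demand protocol; `producer`, `consumer` and `run` are hypothetical names.

```python
import asyncio


async def producer(queue: asyncio.Queue, n: int) -> None:
    for i in range(n):
        # Suspends this coroutine (no thread is blocked) while the queue is full.
        await queue.put(i)
    await queue.put(None)  # sentinel: no more items


async def consumer(queue: asyncio.Queue, out: list, peak: list) -> None:
    while True:
        peak[0] = max(peak[0], queue.qsize())  # track the largest backlog seen
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0.001)  # the consumer is the slow side
        out.append(item)


async def run(n: int = 20, capacity: int = 3):
    """Return the consumed items and the peak queue depth observed."""
    queue = asyncio.Queue(maxsize=capacity)
    out, peak = [], [0]
    await asyncio.gather(producer(queue, n), consumer(queue, out, peak))
    return out, peak[0]
```

The queue capacity bounds how far the producer can run ahead, so the backlog stays small however fast items are produced, and the producer waits by yielding control rather than by holding a thread.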
