Improving microservices reliability - part 1: Two Phase Commit

Hi everyone! Today I would like to talk a bit about how we can improve reliability between microservices. This is the first article of the series and we’ll be focusing on the Two-Phase-Commit technique.

It has been a while since my last article, and this is the first one I’m writing from Montreal. I moved here last November to work with a local fintech company, Fiska.

So, let’s suppose you’re working on a microservices architecture and you need your services to talk to each other in some way. An isolated service is not that useful: it has to share some data with the rest of the system. Maybe you’re working on the next Amazon or Netflix, who knows!

[Figure: just a bunch of microservices]

Let’s look at a simple example. Imagine a user placing an order on an e-commerce site. The Orders microservice receives the command, stores the data in its persistence layer, and then needs to inform all the interested parties, for example a Notifications service that sends emails to the customer and the admins.

Of course we don’t want emails to be sent if the Orders microservice fails to process the command.

One possible approach is the famous Two-Phase-Commit (aka 2PC):

[Figure: Two Phase Commit]

As you can easily guess, it’s a two-step process:

Phase 1: The Coordinator service asks all the participants whether they are ready to commit the transaction. Each one replies with a yes or a no. Note that if a single service replies with a no (or times out, or fails with any other error), the whole transaction is canceled.
Phase 2: If all the participants have answered yes, the Coordinator sends the Commit command to them and waits for the final ack.
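
The two phases can be sketched in a few lines of Python. This is only a minimal in-memory illustration, not a production implementation: the `Participant` class, its `prepare`/`commit`/`rollback` methods and the `two_phase_commit` function are hypothetical names made up for this example, and a real system would run each participant behind a network call with timeouts and durable logs.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical in-memory participant (a real one would sit behind a network call)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote on whether the local work can be committed.
        if not self.can_commit:
            return Vote.NO
        self.state = "prepared"
        return Vote.YES

    def commit(self):
        # Phase 2: make the prepared work permanent.
        self.state = "committed"

    def rollback(self):
        # Undo any prepared work; harmless if nothing was prepared.
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Return True if everyone committed, False if the transaction was canceled."""
    for participant in participants:
        try:
            vote = participant.prepare()
        except Exception:
            # A timeout or any other error counts as a "no".
            vote = Vote.NO
        if vote is not Vote.YES:
            # A single "no" cancels the whole transaction.
            for other in participants:
                other.rollback()
            return False

    # Everyone voted yes: send the Commit command to all participants.
    for participant in participants:
        participant.commit()
    return True
```

Calling `two_phase_commit([Participant("orders"), Participant("notifications")])` commits both services, while passing a participant created with `can_commit=False` cancels the transaction for everyone.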

Although it works, this approach has a few drawbacks. First of all, it comes with an intrinsic performance penalty, as we’re putting all the actors on hold multiple times while they wait for an answer.

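Secondly, participants typically have to hold on to the resources involved (locks, for example) from the moment they vote yes until the final commit, so other transactions triggered in between may be paused until the whole process completes.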
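
Here is a small sketch of why transactions in between can get stuck. It assumes a participant that locks a resource when it prepares and releases it only on commit. The `LockingParticipant` class and its `stock` field are hypothetical, and the "other transaction" uses a non-blocking lock attempt just to make the blocking visible without hanging the example.

```python
import threading


class LockingParticipant:
    """Hypothetical participant that locks a resource between prepare and commit."""

    def __init__(self, stock=10):
        self.stock = stock
        self._pending = 0
        self._lock = threading.Lock()

    def prepare(self, quantity):
        # Take the lock and keep it: the vote is only valid if nobody
        # else changes the stock before the final commit.
        self._lock.acquire()
        if self.stock < quantity:
            self._lock.release()
            return False
        self._pending = quantity
        return True

    def commit(self):
        # Apply the prepared change and finally release the lock.
        self.stock -= self._pending
        self._pending = 0
        self._lock.release()

    def try_other_transaction(self):
        # Another transaction touching the same resource. A real one
        # would wait here; we only report whether it would be blocked.
        if not self._lock.acquire(blocking=False):
            return "blocked"
        self._lock.release()
        return "ok"
```

Between `prepare(3)` and `commit()`, `try_other_transaction()` reports that it is blocked. Once the commit has gone through, it succeeds again.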

Moreover, if the Coordinator fails for some reason at the beginning of Phase 2, all the other services are left hanging in limbo: they have already voted yes, but they don’t know whether to commit or cancel.
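
To make this failure mode concrete, here is a tiny simulation. Everything in it is an assumption made for illustration: the `Participant` class, the `CoordinatorCrashed` exception and the `crash_before_commit` flag are hypothetical, and the crash is simply modeled as an exception raised right after the voting step.

```python
class CoordinatorCrashed(Exception):
    """Hypothetical error used to simulate the Coordinator dying."""


class Participant:
    """Hypothetical in-memory participant that always votes yes."""

    def __init__(self, name):
        self.name = name
        self.state = "idle"

    def prepare(self):
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"


def run_transaction(participants, crash_before_commit=False):
    # Phase 1: collect the votes.
    votes = [p.prepare() for p in participants]
    if not all(votes):
        return "canceled"

    # Simulated failure at the very beginning of Phase 2.
    if crash_before_commit:
        raise CoordinatorCrashed("Coordinator died before sending Commit")

    # Phase 2: tell everyone to commit.
    for p in participants:
        p.commit()
    return "committed"
```

If `run_transaction` is called with `crash_before_commit=True`, every participant is left in the `prepared` state, with nobody around to tell it what to do next.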

Don’t get me wrong, 2PC is a good approach, but like all the other tools in our belt, we need to know when it can be used. For example, it’s extremely useful when replicating data across a cluster of database replica nodes.

So, going back to our e-commerce microservices, in the next article we’ll see how we can leverage the Outbox Pattern to safely notify our services when an order has been placed.

Au revoir!