Implementing a distributed Workflow Engine using RabbitMQ
Why use a messaging framework
Many thriving web 2.0/3.0 companies have built their success around workflow engines and messaging frameworks, e.g. Airbnb’s Airflow, LinkedIn’s Kafka and Netflix’s Meson. The cornerstone of any enterprise-grade workflow engine is a messaging framework that allows for distributed execution and guarantees message delivery.
A workflow engine encompasses various software components serving the purpose of automatically executing tasks based on user-definable business processes. Generally speaking, workflows consist of smaller tasks called activities, which form a set of atomic and re-usable building blocks performing specific actions. Activities cannot be arbitrarily chained together, as some of them depend on specific data or a predefined state in order to complete their task.
So far, so good, no need for messaging frameworks yet… The true complexity lies in the requirement that the workflow engine must not only be fail-safe but also support distributed execution, so that the workflow orchestration is separated from any heavy-duty computation that might max out the host’s resources. You might argue that this could also be done through web services, but just think about the hassle of ensuring that activities are reliably executed after a system failure… Here are just a few of the chores involved:
- Persist the event in a backing store and save all the accompanying data when emitting an event
- Delete the data once the successful delivery has been confirmed by the receiver(s)
- Retry delivery of all unconfirmed messages after recovering from a system failure
- …
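To get a feeling for how much bookkeeping the chores above imply, here is a minimal sketch of a hand-rolled "outbox" in Python. Everything in it is an assumption made for illustration: the Outbox class, its table layout and the send callback (which is expected to return True only once the receiver has confirmed delivery) are hypothetical and not part of any messaging library.

```python
import json
import sqlite3


class Outbox:
    """Hand-rolled delivery bookkeeping (hypothetical helper, not a RabbitMQ API)."""

    def __init__(self, path=":memory:"):
        # Use a file path instead of ":memory:" to survive a process restart.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def emit(self, event, send):
        """Persist the event and its data first, then attempt delivery."""
        cur = self.db.execute(
            "INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),)
        )
        self.db.commit()
        event_id = cur.lastrowid
        self._try_send(event_id, event, send)
        return event_id

    def redeliver_pending(self, send):
        """After a failure recovery, retry everything that was never confirmed."""
        rows = self.db.execute(
            "SELECT id, payload FROM outbox ORDER BY id"
        ).fetchall()
        for event_id, payload in rows:
            self._try_send(event_id, json.loads(payload), send)

    def pending_count(self):
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def _try_send(self, event_id, event, send):
        # A raised exception is treated like a missing confirmation.
        try:
            confirmed = bool(send(event))
        except Exception:
            confirmed = False
        if confirmed:
            # Delete the stored data only once delivery has been confirmed.
            self.db.execute("DELETE FROM outbox WHERE id = ?", (event_id,))
            self.db.commit()
```

Even this toy version ignores duplicate deliveries, multiple receivers and concurrent access, which is exactly the kind of detail a message broker takes off your hands.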
By now, I am sure you are inclined to offload these tasks to a tried and tested messaging framework that gives you all these features for free. This blog post focuses on using RabbitMQ as the messaging framework and addresses two issues that arise when implementing a distributed workflow engine:
- How to connect compatible activities using topic exchanges and routing keys
- How to distribute the RabbitMQ broker to increase reliability
Connecting Activities with RabbitMQ
Let’s have a look at how activities within a workflow can trigger each other using a topic exchange and adequate routing keys. But first let’s briefly recap how topic exchanges work:

(Diagram of a topic exchange, source: https://www.rabbitmq.com/tutorials/tutorial-five-dotnet.html)
Example:
- Publisher P sends a message with routing key key_1.key_2.rabbit to the exchange X. Note: The routing key should contain keywords describing the message’s content.
- Since consumer C2 has bound its queue Q2 to the exchange with the binding key *.*.rabbit, the message sent by publisher P will be routed to Q2. Note the use of the wildcard *, which stands for exactly one word at the corresponding position.
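The matching rule used in this example can be sketched in a few lines of Python. The function below is a simplified illustration, not RabbitMQ's actual implementation: topic_matches is a hypothetical name, and the sketch assumes non-empty, dot-separated keys. Besides the * wildcard from the example, it also handles #, which in RabbitMQ topic exchanges stands for zero or more words.

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Return True if routing_key matches the topic pattern in binding_key."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p: int, w: int) -> bool:
        if p == len(pattern):
            # Pattern exhausted: it is a match only if all words were consumed.
            return w == len(words)
        if pattern[p] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(p + 1, k) for k in range(w, len(words) + 1))
        if w == len(words):
            return False
        if pattern[p] == "*" or pattern[p] == words[w]:
            # '*' stands for exactly one word; otherwise words must be equal.
            return match(p + 1, w + 1)
        return False

    return match(0, 0)


# The binding of Q2 from the example accepts the message sent by P ...
assert topic_matches("*.*.rabbit", "key_1.key_2.rabbit")
# ... but not a key with a different number of words.
assert not topic_matches("*.*.rabbit", "key_1.rabbit")
```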
Now, let’s have a look at how the same concept can be used to make Activity A1 selectively trigger A2 or A3.
Example:
- A1 sends a message of type MsgType1 with routing key A1.MsgType1. Since A2 has bound its queue to messages with the exact same key, it will be the sole receiver of the message. Note the use of the ID of the sending activity in the routing key to identify the issuer of the message.
- A1 sends a message of type MsgType2 with routing key A1.MsgType2. Since A3 has bound its queue to messages of type MsgType2 from any activity (e.g. with a binding key such as *.MsgType2), it will be the only receiver of the message.
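As a rough sketch, the bindings from this example could be set up from Python as shown below. The helpers only call channel methods that exist on a pika BlockingChannel (exchange_declare, queue_declare, queue_bind, basic_publish), but the exchange name, the queue naming scheme and the binding key *.MsgType2 for A3 are assumptions made for illustration.

```python
EXCHANGE = "workflow"  # hypothetical exchange name

# Binding keys derived from the example: A2 only listens to MsgType1 sent
# by A1, A3 listens to MsgType2 from any activity (assumed key: *.MsgType2).
BINDINGS = {
    "A2": "A1.MsgType1",
    "A3": "*.MsgType2",
}


def declare_topology(channel):
    """Declare the topic exchange and one bound queue per receiving activity."""
    channel.exchange_declare(exchange=EXCHANGE, exchange_type="topic", durable=True)
    for activity, binding_key in BINDINGS.items():
        queue = f"{activity}.inbox"  # hypothetical queue naming scheme
        channel.queue_declare(queue=queue, durable=True)
        channel.queue_bind(queue=queue, exchange=EXCHANGE, routing_key=binding_key)


def emit(channel, sender_id, message_type, body):
    """Publish a message whose routing key identifies sender and message type."""
    routing_key = f"{sender_id}.{message_type}"
    channel.basic_publish(exchange=EXCHANGE, routing_key=routing_key, body=body)
    return routing_key


def open_channel(host="localhost"):
    """Open a real channel; requires the pika package and a reachable broker."""
    import pika  # imported lazily so the helpers above work without pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    return connection.channel()
```

With a broker running, channel = open_channel() followed by declare_topology(channel) and emit(channel, "A1", "MsgType1", b"{}") would deliver the message to the queue of A2 only.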
As you have surely figured out already, the trick is to use a unique activity identifier as part of the routing key. The basic example above can be refined by adding more structure to the routing key, e.g. {Workflow ID}.{Activity ID}.{Message Type}. The additional workflow key separates all messages based on which workflow they belong to. Note that instead of extending the routing key, you could also use distinct exchanges (X) to route the messages properly. The question then is which discriminating criterion to use for the exchanges: Message Type? Workflow ID? Both are valid choices, though there is one anti-pattern that should be avoided when designing exchanges and routing keys: don’t deliver unwanted messages to a queue only to have the consumer filter them out.
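A small pair of helpers can keep the structured keys consistent across activities. This is only a sketch under assumptions: the function names routing_key and binding_key are hypothetical, and the rule that a segment must not contain ".", "*" or "#" is a precaution chosen here so that identifiers cannot be mistaken for separators or wildcards.

```python
def routing_key(workflow_id: str, activity_id: str, message_type: str) -> str:
    """Build a concrete {Workflow ID}.{Activity ID}.{Message Type} key."""
    parts = (workflow_id, activity_id, message_type)
    for part in parts:
        # Reject empty segments and characters with a special meaning.
        if not part or any(ch in part for ch in ".*#"):
            raise ValueError(f"invalid routing key segment: {part!r}")
    return ".".join(parts)


def binding_key(workflow_id=None, activity_id=None, message_type=None) -> str:
    """Build a binding pattern; every omitted segment becomes the '*' wildcard."""
    parts = (workflow_id, activity_id, message_type)
    return ".".join(part if part is not None else "*" for part in parts)


# A message of type MsgType1 sent by activity A1 within workflow wf42:
assert routing_key("wf42", "A1", "MsgType1") == "wf42.A1.MsgType1"
# A consumer interested in MsgType2 from any activity of any workflow:
assert binding_key(message_type="MsgType2") == "*.*.MsgType2"
```

Binding as narrowly as possible lets the broker do the filtering, which is what avoids the anti-pattern of consumers discarding messages they never wanted.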
As you can see in the example above, basic remoting capabilities come out of the box, since RabbitMQ clients can connect to a remote message broker instance and interact with it as if it were running on the same machine.
Distributed RabbitMQ Broker
Although very easy to implement, the approach above has a serious limitation: Machine 1, which hosts the message broker, could fail and leave the workflow engine hosted on Machine 2 completely idle until the message queues are available again. To remedy this limitation, why not run replicated message broker instances on Machine 1 and Machine 2? Sure, but how can we keep the queue states in sync? RabbitMQ supports different types of distribution (source):
- Clustering: If you are aiming at increasing performance and availability, you can connect multiple machines running the same Erlang and RabbitMQ versions and let them form a single logical broker. Queues can either be located on a single node or mirrored across multiple nodes. The connections between the machines are expected to be reliable, hence the machines should ideally be located at a single site.
- Federation: If you are planning to connect two remote message brokers across the internet, you can use federated exchanges and queues. Built-in logic takes care of analyzing whether transmitting a message is really required. When no queue has subscribed to a specific type of message, there is no need to transfer it.
- The Shovel: If you need more control over the decision logic that governs message distribution than Federation offers, you can go for the Shovel mechanism, which basically allows you to forward every message published to a specific source queue to a destination exchange hosted on another broker.
Summary
As unremarkable as they may seem, message queues are true workhorses when it comes to enabling reliable distributed computing. RabbitMQ is just one good example among many equally good alternatives that you might come across in the wild.