Messaging systems are very useful when we are developing distributed applications. We can send and receive messages (aka events, aka records) asynchronously, making our application to scale better. Kafka and RabbitMQ are some examples of messaging systems.
In this kind of systems we usually find consumers and producers. When a producer sends a message to the messaging server (broker), it receives an ACK from the broker meaning that the message has been stored successfully. That's the green path. But in real life errors happen all the time. The way producers and consumers behave when an error occurs determines what messing semantics we can get.
Given this scenario:
- A producer sends a message to the messaging broker.
- Broker stores the message (to be delivered to the consumer.)
- Broker fails before sending ACK to the producer.
Imagine the producer sends the message again and again until it gets a successful ACK. In this case, the message will be duplicated many times, since the broker will store it multiple times. Producer acting this way ensures the message will be delivered at least once.
Perhaps, because of our application nature, message duplication is not an option. Maybe it's better for us to be sure the message will be delivered at most once, but no more (that includes no times.) Many times (most of them, hopefully), producer will send the message and the broker will store it and will send back the ACK to the producer. Everything ok. But when that does not happen, message will not be delivered. Why? Because producer will not retry.
In the same scenario as before, imagine this time the producer does not retry sending the message. In this case, message will be lost.
Exacly-once delivery is the messaging semantics everyone wants, obviously. But it is not easy to implement.
Currently, exactly-once delivery is only possible within the scope of Kafka Streams. Kafka does it combining two features: message idempotency and transactions (atomic writes between multiple partitions.)
As we said before, the producer is not able to be sure whether the message has been stored or not in the broker, due communication errors. That means it can't do much more than retry sending the messages if ACKs are not received, like in the at-least one semantics. But, how can message duplication be avoided? Let's see how Apache Kafka does it.
To avoid message duplication, producer sends a unique sequence number so the broker persists the message just once. Message sending is idempotent. Also, with this feature, Kafka ensures the messages to be delivered in order. (By default, in Kafka, this feature is disabled because it requires much more performance. You can enable it with
enable.idempotence=true in your producer config.)
Messaging is a really complex topic in distributed systems and it has to be used carefully. Also, when using messaging systems, you should keep in mind side-effects and write idempotent code as much as possible.