Fix replication of SLAVEOF inside transaction.

In Redis 4.0 replication, with the introduction of PSYNC2, masters and slaves replicate commands to cascading slaves and to the replication backlog itself in a different way compared to the past. Masters actually replicate the effects of client commands. Slaves just propagate what they receive from masters. This mechanism can cause problems when the configuration of an instance is changed from master to slave inside a transaction. For instance we could send to a master instance the following sequence: MULTI SLAVEOF 127.0.0.1 0 EXEC SLAVEOF NO ONE Before the fixes in this commit, the MULTI command used to be propagated into the replication backlog, however after the SLAVEOF command the instance is a slave, so the EXEC implementation failed to also propagate the EXEC command. When the slaves of the above instance reconnected, they were incrementally synchronized just sending a "MULTI". This put the master client (in the slaves) into MULTI state, breaking the replication. Notably even Redis Sentinel uses the above approach in order to guarantee that configuration changes are always performed together with rewrites of the configuration and with clients disconnection. Sentiel does: MULTI SLAVEOF ... CONFIG REWRITE CLIENT KILL TYPE normal EXEC So this was a really problematic issue. However even with the fix in this commit, that will add the final EXEC to the replication stream in case the instance was switched from master to slave during the transaction, the result would be to increment the slave replication offset, so a successive reconnection with the new master, will not permit a successful partial resynchronization: no way the new master can provide us with the backlog needed, we incremented our offset to a value that the new master cannot have. However the EXEC implementation waits to emit the MULTI, so that if the commands inside the transaction actually do not need to be replicated, no commands propagation happens at all. From multi.c: if (!must_propagate && !(c->cmd->flags & (CMD_READONLY|CMD_ADMIN))) { execCommandPropagateMulti(c); must_propagate = 1; } The above code is already modified by this commit you are reading. Now also ADMIN commands do not trigger the emission of MULTI. It is actually not clear why we do not just check for CMD_WRITE... Probably I wrote it this way in order to make the code more reliable: better to over-emit MULTI than not emitting it in time. So this commit should indeed fix issue #3836 (verified), however it looks like some reconsideration of this code path is needed in the long term. BONUS POINT: The reverse bug. Even in a read only slave "B", in a replication setup like: A -> B -> C There are commands without the READONLY nor the ADMIN flag, that are also not flagged as WRITE commands. An example is just the PING command. So if we send B the following sequence: MULTI PING SLAVEOF NO ONE EXEC The result will be the reverse bug, where only EXEC is emitted, but not the previous MULTI. However this apparently does not create problems in practice but it is yet another acknowledge of the fact some work is needed here in order to make this code path less surprising. Note that there are many different approaches we could follow. For instance MULTI/EXEC blocks containing administrative commands may be allowed ONLY if all the commands are administrative ones, otherwise they could be denined. When allowed, the commands could simply never be replicated at all.
2024-11-22 00:52:38 +00:00 · 2017-07-12 11:07:28 +02:00 · 2017-07-12 11:07:28 +02:00 · 87aabb1afa
commit 87aabb1afa
parent 44f89d1d98
2 changed files with 19 additions and 3 deletions
--- a/src/multi.c
+++ b/src/multi.c
@ -117,6 +117,7 @@ void execCommand(client *c) {
    int orig_argc;
    struct redisCommand *orig_cmd;
    int must_propagate = 0; /* Need to propagate MULTI/EXEC to AOF / slaves? */
+    int was_master = server.masterhost == NULL;

    if (!(c->flags & CLIENT_MULTI)) {
        addReplyError(c,"EXEC without MULTI");
@ -147,11 +148,12 @@ void execCommand(client *c) {
        c->argv = c->mstate.commands[j].argv;
        c->cmd = c->mstate.commands[j].cmd;

-        /* Propagate a MULTI request once we encounter the first write op.
+        /* Propagate a MULTI request once we encounter the first command which
+         * is not readonly nor an administrative one.
         * This way we'll deliver the MULTI/..../EXEC block as a whole and
         * both the AOF and the replication link will have the same consistency
         * and atomicity guarantees. */
-        if (!must_propagate && !(c->cmd->flags & CMD_READONLY)) {
+        if (!must_propagate && !(c->cmd->flags & (CMD_READONLY|CMD_ADMIN))) {
            execCommandPropagateMulti(c);
            must_propagate = 1;
        }
@ -167,9 +169,22 @@ void execCommand(client *c) {
    c->argc = orig_argc;
    c->cmd = orig_cmd;
    discardTransaction(c);
+
    /* Make sure the EXEC command will be propagated as well if MULTI
     * was already propagated. */
-    if (must_propagate) server.dirty++;
+    if (must_propagate) {
+        int is_master = server.masterhost == NULL;
+        server.dirty++;
+        /* If inside the MULTI/EXEC block this instance was suddenly
+         * switched from master to slave (using the SLAVEOF command), the
+         * initial MULTI was propagated into the replication backlog, but the
+         * rest was not. We need to make sure to at least terminate the
+         * backlog with the final EXEC. */
+        if (server.repl_backlog && was_master && !is_master) {
+            char *execcmd = "*1\r\n$4\r\nEXEC\r\n";
+            feedReplicationBacklog(execcmd,strlen(execcmd));
+        }
+    }

 handle_monitor:
    /* Send EXEC to clients waiting data from MONITOR. We do it here
--- a/src/server.h
+++ b/src/server.h
@ -1504,6 +1504,7 @@ void changeReplicationId(void);
 void clearReplicationId2(void);
 void chopReplicationBacklog(void);
 void replicationCacheMasterUsingMyself(void);
+void feedReplicationBacklog(void *ptr, size_t len);

 /* Generic persistence functions */
 void startLoading(FILE *fp);