When feeding the master with a high rate traffic the the slave's feed is much slower.
This causes the replication buffer to grow (indefinitely) which leads to slave disconnection.
The problem is that writeToClient() decides to stop writing after NET_MAX_WRITES_PER_EVENT
writes (In order to be fair to clients).
We should ignore this when the client is a slave.
It's better if clients wait longer, the alternative is that the slave has no chance to stay in
sync in this situation.
See #3462 and related PRs.
We use a simple algorithm to calculate the level of affinity violation,
and then an optimizer that performs random swaps until things improve.
after a slave is promoted (assuming it has no slaves
and it booted over an hour ago), it will lose it's replication
backlog at the next replication cron, rather than waiting for slaves
to connect to it.
so on a simple master/slave faiover, if the new slave doesn't connect
immediately, it may be too later and PSYNC2 will fail.
This fixes a crash with Redis Cluster when OBJECT is mis-used, because
getKeysUsingCommandTable() will call serverPanic() detecting we are
accessing an invalid argument in the case "OBJECT foo" is called.
This bug was introduced when OBJECT HELP was introduced, because the key
argument is set fixed at index 2 in the command table, however now
OBJECT may be called with an insufficient number of arguments to extract
the key.
The "Right Thing" would be to have a specific function to extract keys
from the OBJECT command, however this is kinda of an overkill, so I
preferred to make getKeysUsingCommandTable() more robust and just return
no keys when it's not possible to honor the command table, because new
commands are often added and also there are a number with an HELP
subcommand violating the normal form, and crashing for this trivial
reason or having many command-specific key extraction functions is not
great.
- protocol parsing (processMultibulkBuffer) was limitted to 32big positions in the buffer
readQueryFromClient potential overflow
- rioWriteBulkCount used int, although rioWriteBulkString gave it size_t
- several places in sds.c that used int for string length or index.
- bugfix in RM_SaveAuxField (return was 1 or -1 and not length)
- RM_SaveStringBuffer was limitted to 32bit length
The commit splits the add functions into a set() and add() set of
functions, so that it's possible to set registers in an independent way
just having the index and count.
Related to #3819, otherwise a fix is not possible.
The function in its initial form, and after the fixes for the PSYNC2
bugs, required code duplication in multiple spots. This commit modifies
it in order to always compute the script name independently, and to
return the SDS of the SHA of the body: this way it can be used in all
the places, including for SCRIPT LOAD, without duplicating the code to
create the Lua function name. Note that this requires to re-compute the
body SHA1 in the case of EVAL seeing a script for the first time, but
this should not change scripting performance in any way because new
scripts definition is a rare event happening the first time a script is
seen, and the SHA1 computation is anyway not a very slow process against
the typical Redis script and compared to the actua Lua byte compiling of
the body.
Note that the function used to assert() if a duplicated script was
loaded, however actually now two times over three, we want the function
to handle duplicated scripts just fine: this happens in SCRIPT LOAD and
in RDB AUX "lua" loading. Moreover the assert was not defending against
some obvious failure mode, so now the function always tests against
already defined functions at start.
Unfortunately, as outlined by @soloestoy in #4505, "lua" AUX RDB field
loading in case of duplicated script was still broken. This commit fixes
this problem and also a memory leak introduced by the past commit.
Note that now we have a regression test able to duplicate the issue, so
this commit was actually tested against the regression. The original PR
also had a valid fix, but I prefer to hide the details of scripting.c
outside scripting.c, and later "SCRIPT LOAD" should also be able to use
the function luaCreateFunction() instead of redoing the work.
With PSYNC2 to force a full SYNC in tests is hard. With this new DEBUG
subcommand we just need to call it and then CLIENT KILL TYPE master in
the slave.
In the case of slaves loading the RDB from master, or in other similar
cases, the script is already defined, and the function registering the
script should not fail in the assert() call.