Sometimes it's useful to compute a key's cluster slot in a module.
This API function is just like the command CLUSTER KEYSLOT (but faster).
A "reverse" API is also added:
`RedisModule_ClusterCanonicalKeyNameInSlot`. Given a slot, it returns a
short string that we can call a canonical key for the slot.
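For illustration, a minimal sketch of a module command using the two calls; the exact signatures are assumptions inferred from the description above:
```c
/* Sketch of a module command using the new cluster-slot helpers.
 * Signatures are assumed from the description above:
 *   RedisModule_ClusterKeySlot(key)              -> slot of a key
 *   RedisModule_ClusterCanonicalKeyNameInSlot(s) -> canonical key name for a slot */
#include "redismodule.h"

int SlotInfo_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    if (argc != 2) return RedisModule_WrongArity(ctx);

    /* Like CLUSTER KEYSLOT, but computed in-process (no command dispatch). */
    unsigned int slot = RedisModule_ClusterKeySlot(argv[1]);

    /* "Reverse" API: a short key name that hashes to that slot. */
    const char *canonical = RedisModule_ClusterCanonicalKeyNameInSlot(slot);

    RedisModule_ReplyWithArray(ctx, 2);
    RedisModule_ReplyWithLongLong(ctx, slot);
    RedisModule_ReplyWithSimpleString(ctx, canonical);
    return REDISMODULE_OK;
}
```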
The check in fileIsManifest misjudged the manifest file. For example,
if a RESP AOF contains "file", it will be considered a manifest file and
the check will fail:
```
*3
$3
set
$4
file
$4
file
```
In #12951, if the preamble AOF also contains it, it will also fail.
Fixes #12951.
The bug was happening if the word "file" was mentioned in the first
1024 lines of the AOF. Now, as soon as the check finds a non-comment
line, it breaks out, whether or not that line contains "file".
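As a simplified, self-contained sketch of the tightened check (the real fileIsManifest parses AOF manifest lines; names and details here are illustrative):
```c
#include <stdio.h>
#include <string.h>

/* Illustrative only: decide whether a file looks like an AOF manifest by
 * inspecting the first non-comment line, instead of scanning up to 1024
 * lines for the word "file" (which a plain RESP AOF can easily contain). */
static int looksLikeManifest(FILE *fp) {
    char line[1024];
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (line[0] == '#' || line[0] == '\n') continue; /* skip comments/blanks */
        /* The first real line decides; we stop here either way, so later
         * "file" strings in the payload can no longer fool the check. */
        return strncmp(line, "file", 4) == 0;
    }
    return 0;
}
```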
Distribute operations via `redis-benchmark` across the different slots owned
by the node.
`current_slot_index` is never updated, so its value is always `0` and
the tag used is always the first slot owned by the node. Hence any
read/write operation via `redis-benchmark` in cluster mode always
happens on a single slot, which makes it inconvenient to load data
uniformly via `redis-benchmark`.
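A rough sketch of the intended round-robin selection; the struct and field names are illustrative assumptions, not redis-benchmark's actual data structures:
```c
/* Illustrative sketch: advance the slot index round-robin so keys are tagged
 * for different slots owned by the target node. */
typedef struct {
    int *owned_slots;        /* slots owned by the node */
    int num_owned_slots;
    int current_slot_index;  /* previously never updated, so it stayed 0 */
} nodeSlots;

static int nextSlot(nodeSlots *n) {
    int slot = n->owned_slots[n->current_slot_index];
    n->current_slot_index = (n->current_slot_index + 1) % n->num_owned_slots;
    return slot;
}
```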
Since lua_Number is not explicitly an integer or a double, we need to
make an effort to convert it to an integer when that's possible, since
the string could later be used in a context that doesn't support
scientific notation (e.g. 1e9 instead of 1000000000).
Since fpconv_dtoa converts numbers with the equivalent of `%f` or `%e`,
whichever is shorter, this would break if we try to pass a long integer
number to a command that takes an integer: we'll get an implicit
conversion to string in Lua, and then the parsing in
getLongLongFromObjectOrReply will fail.
```
> eval "redis.call('hincrby', 'key', 'field', '1000000000')" 0
(nil)
> eval "redis.call('hincrby', 'key', 'field', tonumber('1000000000'))" 0
(error) ERR value is not an integer or out of range script: ac99c32e4daf7e300d593085b611de261954a946, on @user_script:1.
```
Switch to using ll2string if the number can be safely represented as a
long long.
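A self-contained sketch of the idea, with snprintf standing in for Redis' ll2string/fpconv_dtoa helpers; the exact range check used to decide "safely representable" is an assumption:
```c
#include <stdio.h>

static int doubleToLongLongSketch(double num, long long *out) {
    /* Reject values outside the long long range or with a fractional part. */
    if (num < -9223372036854775808.0 || num >= 9223372036854775808.0) return 0;
    long long ll = (long long)num;
    if ((double)ll != num) return 0;
    *out = ll;
    return 1;
}

static void luaNumberToStringSketch(double num, char *buf, size_t buflen) {
    long long ll;
    if (doubleToLongLongSketch(num, &ll))
        snprintf(buf, buflen, "%lld", ll);   /* "1000000000", never "1e+09" */
    else
        snprintf(buf, buflen, "%.17g", num); /* non-integers keep full precision */
}
```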
The problem was introduced in #10587 (Redis 7.2).
Closes #13113.
---------
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Update the crc16 hash lookup table to use fixed size character arrays instead of pointers
to static string addresses. Since the actual values are so short, we can just store them
in a uniform array instead. This saves about 128kb of memory and should improve the
performance as well since we should have much better memory locality.
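To illustrate the layout change (the table names and the 4-byte entry width below are assumptions; 16384 is the number of cluster slots):
```c
/* Before: an array of 16384 pointers to separately-stored short string
 * literals. On 64-bit the pointers alone are 16384 * 8 = 128KB, and the
 * strings themselves are scattered across read-only memory. */
extern const char *slot_table_as_pointers[16384];

/* After: one contiguous array of small fixed-size entries. The stored values
 * are only a few characters long, so a fixed width (4 bytes assumed here) is
 * enough, removing the pointer overhead and keeping neighbouring entries
 * adjacent in memory for better locality. */
extern const char slot_table_as_arrays[16384][4];
```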
`CONFIG SET oom-score-adj handles configuration failures` test failed in
some CI jobs today.
Failed CI: https://github.com/redis/redis/actions/runs/8152519326
Not sure why the github action's docker image permissions have changed,
but the issue is similar to #12887,
where we can't assume the range of oom_score_adj that a user can change.
## Solution:
Modify the way we determine whether the current user has no privileges,
instead of relying on whether the user id is 0 or not.
This commit updates the processing of PONG gossip messages in the
cluster. When a node (B) becomes a replica due to a failover, its PONG
messages include its new primary node's (A) information and B's
configuration epoch is aligned with A's. This allows observer nodes to
identify changes in primary-ship, addressing issues of intermediate
states and enhancing cluster state consistency during topology changes.
Fix #13018
After #13013
### This PR makes an effort to defrag the pubsub kvstore in the following ways:
1. Until now, server.pubsub(shard)_channels only shared the channel name obj
with the first subscribed client; now the clients and
the pubsub kvstore share the channel name robj.
This would save a lot of memory when there are many subscribers to the
same channel.
It also means that we only need to defrag the channel name robj in the
pubsub kvstore, and then update
all client references for the current channel, avoiding the need to
iterate through all the clients to do the same things.
2. Refactor the code to defragment pubsub(shard) in the same way as
defragment of keys and EXPIRES, with the exception that we only
defragment pubsub(without shard) when slot is zero.
### Other
Fix an oversight in #11695: if defrag doesn't reach the end time, it
should wait for the current db's keys and expires, pubsub and pubsubshard
to finish before leaving, but it was possible to exit early once the keys
were defragmented.
---------
Co-authored-by: oranagra <oran@redislabs.com>
The new regular expression breaks the validator:
```
In file included from commands.c:10:
commands_with_reply_schema.def:14528:72: error: stray ‘\’ in program
14528 | struct jsonObjectElement MEMORY_STATS_ReplySchema_patternProperties__db\_\d+__properties_overhead_hashtable_main_elements[] = {
```
The reason is that special characters are not handled in to_c_name,
which causes special characters to appear in the structure name and
makes the generated C file fail to compile.
Broken by #12913
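For illustration, the idea behind the fix, expressed in C even though the real to_c_name lives in the commands.def generator script:
```c
#include <ctype.h>
#include <stddef.h>

/* Illustration only: every character that is not legal in a C identifier
 * must be mapped to something legal, otherwise characters such as '\' from
 * a regex-style schema key leak into the generated struct name and the
 * generated file no longer compiles. */
static void to_c_name_sketch(const char *in, char *out, size_t outlen) {
    size_t j = 0;
    for (size_t i = 0; in[i] != '\0' && j + 1 < outlen; i++)
        out[j++] = isalnum((unsigned char)in[i]) ? in[i] : '_';
    out[j] = '\0';
}
```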
- In removeUnsupportedArgs, we were trying to access the item after the
last one, causing an out-of-bounds read.
- In versionIsSupported, when 'version' is equal to 'since', the
return value was 0 (not supported).
Also, change the function to return `not supported` in case they have
a different number of digits.
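For illustration, a sketch of the intended comparison after the fix (not the actual redis-cli code):
```c
#include <stdio.h>

/* An equal version is supported, and versions with a different number of
 * dot-separated parts are treated as not supported. */
static int versionIsSupportedSketch(const char *version, const char *since) {
    int v[3] = {0, 0, 0}, s[3] = {0, 0, 0};
    int nv = sscanf(version, "%d.%d.%d", &v[0], &v[1], &v[2]);
    int ns = sscanf(since, "%d.%d.%d", &s[0], &s[1], &s[2]);
    if (nv != ns) return 0;              /* different number of parts */
    for (int i = 0; i < nv; i++) {
        if (v[i] != s[i]) return v[i] > s[i];
    }
    return 1;                            /* equal versions are supported */
}
```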
Both issues were found by `Non-interactive non-TTY CLI: Test
command-line hinting - old server` under `test-sanitizer-address` (When
changing the `src/version.h` locally to `8.0.0`)
The new `MAXAGE` argument inside `client-kill` triggered the issue (a new
argument at the end of the list).
---------
Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
Sometimes we need to make a fast judgement about why Redis is suddenly
taking more memory. One of the reasons is the main DB's dicts doing
rehashing.
We may use `MEMORY STATS` to monitor the overhead memory of each DB, but
it still lacks a total sum to show an overall trend. So this PR adds
the total overhead of all DBs to the `INFO MEMORY` section, together with
the total count of rehashing DB dicts, providing some intuitive metrics
about main dict rehashing.
This PR adds the following metrics to INFO MEMORY:
* `mem_overhead_db_hashtable_rehashing` - only the size of ht[0] in
dictionaries we're rehashing (i.e. the memory that's going to be released
soon)
and similar ones to MEMORY STATS:
* `overhead.db.hashtable.lut` (complements the existing
`overhead.hashtable.main` and `overhead.hashtable.expires`, which also
count the `dictEntry` structs)
* `overhead.db.hashtable.rehashing` - temporary rehashing overhead.
* `db.dict.rehashing.count` - number of top level dictionaries being
rehashed.
---------
Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
In XREADGROUP ACK, because streamPropagateXCLAIM does not propagate
entries-read, entries-read will be inconsistent between master and
replicas.
I.e. if no entries were claimed, it would have propagated correctly, but
if some
were claimed, then the entries-read field would be inconsistent on the
replica.
The fix, suggested by guybe7, is to call streamPropagateGroupID
unconditionally,
so that we will normalize entries_read on the replicas. In the past, we
would
only set propagate_last_id when NOACK was specified. And in #9127,
XCLAIM did
not propagate entries_read in ACK, which would cause entries_read to be
inconsistent between master and replicas.
Another approach is to add another arg to XCLAIM and let it propagate
entries_read,
but we decided against it, because we want minimal damage in case there's an
old target and new source (in the worst case scenario, the new source
doesn't
recognize XGROUP SETID ... ENTRIES READ and the lag is lost. If we
change XCLAIM,
the damage is much more severe).
In this patch, if the user uses XREADGROUP .. COUNT 1 there will be
an additional overhead of MULTI, EXEC and XGROUP SETID. We assume the
extra commands in the case of COUNT 1 (a 4x factor, changing from one
XCLAIM to MULTI+XCLAIM+XGROUP SETID+EXEC) are probably ok, since reading
just one entry is in any case very inefficient (a client round trip
per record), so we're hoping it's not a common case.
Issue was introduced in #9127.
Even if we have SCRIPT FLUSH ASYNC now, when there are a lot of
lua scripts, SCRIPT FLUSH ASYNC will still block the main thread.
This is because lua_close is executed in the main thread, and lua
heap needs to release a lot of memory.
In this PR, we take the current lua instance from lctx.lua and call
lua_close on it in a background thread, to close it in an async way.
This is MeirShpilraien's idea.
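The shape of the change, as an illustration only (the real code hands the state to Redis' background bio threads rather than spawning a raw pthread):
```c
#include <pthread.h>
#include <lua.h>
#include <lauxlib.h>

static void *closeLuaInBackground(void *arg) {
    lua_close((lua_State *)arg);  /* releasing the Lua heap happens off the main thread */
    return NULL;
}

static void flushLuaAsyncSketch(lua_State **current) {
    lua_State *old = *current;
    *current = luaL_newstate();   /* main thread continues with a fresh interpreter */
    pthread_t tid;
    pthread_create(&tid, NULL, closeLuaInBackground, old);
    pthread_detach(tid);
}
```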
The --count option for redis-cli has been released in redis 7.2.
https://github.com/redis/redis/pull/12042
But I found in the code that some logic was missing for using this
'count' option.
```
static redisReply *sendScan(unsigned long long *it) {
redisReply *reply;
if (config.pattern)
reply = redisCommand(context, "SCAN %llu MATCH %b COUNT %d",
*it, config.pattern, sdslen(config.pattern), config.count);
else
reply = redisCommand(context,"SCAN %llu",*it);
```
The intention was to be able to use the SCAN COUNT option.
But in this case, --count is only applied when 'pattern' is declared.
So I fixed it simply, so that it works properly even if the --pattern
option is not used.
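A sketch of what the fixed function could look like, reusing `config`, `context` and `sdslen` from the snippet above, and assuming `config.count` is non-zero only when --count was given:
```c
static redisReply *sendScan(unsigned long long *it) {
    redisReply *reply;
    if (config.pattern && config.count)
        reply = redisCommand(context, "SCAN %llu MATCH %b COUNT %d",
            *it, config.pattern, sdslen(config.pattern), config.count);
    else if (config.pattern)
        reply = redisCommand(context, "SCAN %llu MATCH %b",
            *it, config.pattern, sdslen(config.pattern));
    else if (config.count)
        reply = redisCommand(context, "SCAN %llu COUNT %d", *it, config.count);
    else
        reply = redisCommand(context, "SCAN %llu", *it);
    return reply;
}
```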
I tested it simply with the time command several times, and I could see
it works as intended with this commit.
The examples of test results are below:
```
# unstable build
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)
real 0m1.287s
user 0m0.011s
sys 0m0.022s
# count is not applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)
real 0m1.117s
user 0m0.011s
sys 0m0.020s
# count is applied with --pattern
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)
real 0m0.045s
user 0m0.002s
sys 0m0.002s
```
```
# fix-redis-cli-scan-count build
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)
real 0m1.084s
user 0m0.008s
sys 0m0.024s
# count is applied even if --pattern is not declared
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)
real 0m0.043s
user 0m0.000s
sys 0m0.004s
# of course this also applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)
real 0m0.031s
user 0m0.002s
sys 0m0.002s
```
Thanks a lot.
If we call `DEL` on expired keys, the keys may be deleted in
`expireIfNeeded`, and we don't need to call `dbSyncDelete` or
`dbAsyncDelete` afterwards, which would repeat the deletion process
(i.e. find the key in the main db again).
In this PR, I refine the return values of `expireIfNeeded` to indicate
whether we have deleted the expired key, to avoid the potential redundant
deletion logic in `delGenericCommand`. Besides, because both KEY_EXPIRED
and KEY_DELETED are non-zero, this PR won't affect other functions
calling `expireIfNeeded`.
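Roughly, the shape of the change (KEY_EXPIRED and KEY_DELETED come from the description above; KEY_VALID, the flags argument and the exact signatures are assumptions):
```c
/* Approximate sketch, not the literal Redis code. */
typedef enum { KEY_VALID = 0, KEY_EXPIRED, KEY_DELETED } keyStatus;

static int deleteIfNotAlreadyGone(redisDb *db, robj *key, int lazy) {
    /* If expireIfNeeded already removed the expired key, don't look it up
     * in the main db and delete it a second time. */
    if (expireIfNeeded(db, key, 0) == KEY_DELETED) return 1;
    return lazy ? dbAsyncDelete(db, key) : dbSyncDelete(db, key);
}
```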
I also ran a performance test. I first disabled active expiration with
`debug set-active-expire 0` and wrote 1 million keys with a 1ms TTL. Then
I repeatedly deleted 100 expired keys in one `DEL`. The results are as
follows, and show that this PR can improve performance by about 10% in
this situation.
**unstable**
```
Summary:
throughput summary: 10080.65 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.953 0.136 0.959 1.215 1.335 2.247
```
**This PR**
```
Summary:
throughput summary: 11074.20 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.865 0.128 0.879 1.055 1.175 2.159
```
---------
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Oran Agra <oran@redislabs.com>
In #8554, we added a MALLOC_MIN_SIZE to use a minimum allocation
size when using malloc(0). However, we did not update the size,
when malloc_size is missing.
When malloc_size exists, we record the size that was allocated
instead of the size that was requested. This would work with both
jemalloc, and libc malloc (the change in #8554, doesn't break this).
When malloc_size is missing, we allocate extra size_t bytes and
store the requested size in it. In that case, the requested size
is probably different than the allocated size anyway (the change
in #8554 doesn't conceptually change that).
So we have room for improvement, since in this case we are aware
of the extra bytes we asked for, just as we're aware of the extra
size_t bytes we asked for.
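For context, a self-contained sketch of the prefix technique used when malloc_size is missing (zmalloc's real bookkeeping differs in the details):
```c
#include <stdlib.h>

/* The requested size is stored in a size_t prefix before the returned
 * pointer, and the accounting includes the prefix bytes we knowingly
 * asked for. */
#define PREFIX_SIZE sizeof(size_t)

static void *trackedMallocSketch(size_t size, size_t *used_memory) {
    void *ptr = malloc(size + PREFIX_SIZE);
    if (ptr == NULL) return NULL;
    *(size_t *)ptr = size;               /* remember the requested size */
    *used_memory += size + PREFIX_SIZE;  /* account for the prefix too */
    return (char *)ptr + PREFIX_SIZE;
}
```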
In addition, some cleaning was done:
1. Fix some outdated comments.
2. Test cleanups.
Implement #12699
This PR exposes the Lua os.clock() API for getting the elapsed time of Lua
code execution.
Using:
```lua
local start = os.clock()
-- do something
local elapsed = os.clock() - start
```
---------
Co-authored-by: Meir Shpilraien (Spielrein) <meir@redis.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
Following #12568
In issue #9357, when inserting an element larger than 1GB, we currently
store it in a plain node instead of a listpack.
Presently, when we insert an element that exceeds the maximum size of a
packed node, it cannot be accommodated in any other nodes, thus ending
up isolated like a large element.
I.e. it's a node with only one element, but it's listpack encoded rather
than a plain buffer.
This PR lowers the threshold for considering an element as 'large' from
1GB to the maximum size of a node.
While this change doesn't completely resolve the bug mentioned in the
previous PR, it does mitigate its potential impact.
As a result of this change, we can now only use LSET to replace an
element with another element that falls below the maximum size
threshold.
In the worst-case scenario, with a fill of -5, the largest packed node
we can create is 2GB (32k * 64k):
* 32k: The smallest element in a listpack is 2 bytes, which allows us to
store up to 32k elements.
* 64k: This is the maximum size for a single quicklist node.
## Others
To fully fix #9357, we need more work; as discussed in #12568, when we
insert an element into a quicklistNode, it may be created in a new node,
put into another node, or merged, and we can't correctly delete the node
that was supposed to be deleted.
I'm not sure it's worth it, since it involves a lot of modifications.
Implement #12963
## Changes
1. Large bins don't have external fragmentation, or are at least
non-defraggable, so we should ignore the effect of large bins when
measuring fragmentation and only measure the fragmentation of small bins.
This affects both the allocator_frag* metrics and the active-defrag
trigger.
2. Add INFO metrics for `muzzy` memory, which is memory returned to
the OS but still shown as RSS until the OS reclaims it.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Recently I saw in CI that reply-schemas-validator fails here:
```
Failed validating 'minimum' in schema[1]['properties']['groups']['items']['properties']['consumers']['items']['properties']['active-time']:
{'description': 'Last time this consumer was active (successful '
'reading/claiming).',
'minimum': 0,
'type': 'integer'}
On instance['groups'][0]['consumers'][0]['active-time']:
-1729380548878722639
```
The reason is that in the fuzzer, we may restore a corrupted active-time,
which will cause the reply schema CI to fail.
The fuzzer can corrupt the state in many places, which can introduce
bugs that mess up the reply, so we decided to skip logreqres.
Also, seen-time is the same type as active-time, so the same minimum is
added to it.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
There is a timing issue in the test: close may arrive late, or in
freeClientAsync we free the client in an async way, which leads
to errors in the watching_clients statistics, since we only
unwatch all keys when we truly freeClient.
Add a wait here to avoid this problem. Also fixed some outdated
comments I saw. The test was introduced in #12966.
We can see that the elapsed time here happens to equal busy_time_limit,
causing the test to fail:
```
[err]: RM_Call from blocked client in tests/unit/moduleapi/blockedclient.tcl
Expected '50' to be more than '50' (context: type eval line 26 cmd {assert_morethan [expr [clock clicks -milliseconds]-$start] $busy_time_limit} proc ::test)
```
It is reasonable for them to be equal, so allow equality here.
It should be noted that in the previous `Busy module command` test,
we already used assert_morethan_equal, so this one was probably just
missed at the time.
Currently Redis uses the O3 optimization level, which removes the frame
pointer from the target binary.
In the very old past, when gcc optimized at O1 and above, the frame
pointer was omitted by default to improve performance. This frees up
the RBP register and reduces push/pop instructions, but it makes it
difficult for us to observe the running state of the program. For
example, the perf tool cannot be used effectively, especially the modern
eBPF tools such as bcc/memleak.
Add a readme about the commands JSON folder, what it does, and who should
(not) use it.
see discussion
https://github.com/redis/redis/issues/9359#issuecomment-1936420698
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
In the `databasesCron()`, the time consumed by
`kvstoreIncrementallyRehash()` is used to calculate the exit condition.
However, within `kvstoreIncrementallyRehash()`, the loop first checks
for timeout before performing rehashing. Therefore, the time for the
last rehash isn't accounted for, making the consumed time inaccurate. We
need to precisely calculate all the time spent on rehashing.
Additionally, the time allocated to `kvstoreIncrementallyRehash()`
should be the remaining time, which is
`INCREMENTAL_REHASHING_THRESHOLD_US` minus the already consumed
`elapsed_us`.
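A rough sketch of the budgeting described above, assuming `kvstoreIncrementallyRehash` reports back the microseconds it actually spent (that return value, and the field names, are assumptions):
```c
/* Approximate sketch, not the literal databasesCron() code. */
static void rehashDatabasesSketch(void) {
    uint64_t elapsed_us = 0;
    for (int dbid = 0; dbid < server.dbnum; dbid++) {
        if (elapsed_us >= INCREMENTAL_REHASHING_THRESHOLD_US) break;
        /* Hand each kvstore only the time that is left in the budget, and
         * add back what it really consumed, including the last rehash step. */
        uint64_t budget = INCREMENTAL_REHASHING_THRESHOLD_US - elapsed_us;
        elapsed_us += kvstoreIncrementallyRehash(server.db[dbid].keys, budget);
    }
}
```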
Currently aof_last_fsync uses a low resolution unixtime, which is really
bad: it checks whether the absolute number of (full) seconds changed by
one. Depending on which side of the second barrier it falls, we can get
very different results.
This PR changes the resolution to milliseconds instead of complete
seconds.
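For illustration, the difference boils down to something like this (field names approximate):
```c
/* Sketch: is an "everysec" fsync due? */
static int fsyncIsDueSketch(long long now_ms, long long aof_last_fsync_ms) {
    /* The old check compared whole seconds (now_s > last_fsync_s), which can
     * fire anywhere between just after 0 and almost 2 wall-clock seconds
     * after the previous fsync. Millisecond resolution keeps the interval
     * close to the intended 1000ms. */
    return now_ms - aof_last_fsync_ms >= 1000;
}
```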
In cases where event loop cycles are short and rapid (e.g. running
many fast commands with a short pipeline, or a high `hz` config), this
change will not make much difference, since either way we'll be quick to
detect that we're on a "new second", and it's likely that these fsyncs
will always be executed close to the second switch barrier.
But in cases of rare or slow event loop cycles (e.g. slow commands, or a
very low rate of traffic to redis, and a low `hz`), with the old code it
could easily be that in some cases we'd have over 1.5 seconds between
fsyncs, and in others less than 0.5.
see discussion in #8612
This PR also handles aof_flush_postponed_start; the damage there is
smaller since the threshold is 2 seconds, not 1.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Redis has some special commands that mark the client's state, such as
`subscribe` and `blpop`, which mark the client as `CLIENT_PUBSUB` or
`CLIENT_BLOCKED`, and we have metrics for the special use cases.
However, there are also other special commands, like `WATCH`, which,
although they don't have a specific flag, should also be considered
stateful client types. For stateful clients, in many scenarios the
connections cannot be shared in a "connection pool", meaning a connection
pool cannot be used. For example, whenever the `WATCH` command is
executed, a new connection is required to put the client into the "watch
state", because the watched keys are stored in the client.
If different business logic requires watching different keys, separate
connections must be used; otherwise, there will be contamination. This
also means that if a user's business heavily relies on the `WATCH`
command, a large number of connections will be required.
Recently we have encountered this situation in our platform, where some
users consume a significant number of connections when using Redis
because of `WATCH`.
I hope we can have a way to observe these special use cases and special
client connections. Here I add a few monitoring metrics:
1. `watching_clients` in `INFO` reply: The number of clients currently
in the "watching" state.
2. `total_watched_keys` in `INFO` reply: The total number of keys being
watched.
3. `watch` in `CLIENT LIST` reply: The number of keys each client is
currently watching.
Usually, the probability that a dict exists is much greater than the
probability that it does not exist. In kvstoreDictAddRaw, we will call
kvstoreGetDict multiple times. Based on this assumption, we change
createDictIfNeeded into something like a get-or-create function:
```
before:
dict exist: 2 kvstoreGetDict
dict non-exist: 2 kvstoreGetDict
after:
dict exist: 1 kvstoreGetDict
dict non-exist: 3 kvstoreGetDict
```
A possible 3% performance improvement was observed.
In addition, some typos/comments I saw have been cleaned up.
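A rough sketch of the get-or-create shape described above (the exact signatures are assumptions):
```c
/* Approximate sketch, not the literal kvstore code. */
static dict *kvstoreGetOrCreateDictSketch(kvstore *kvs, int didx) {
    dict *d = kvstoreGetDict(kvs, didx);   /* common path: one lookup */
    if (d == NULL) {
        createDictIfNeeded(kvs, didx);     /* rare path pays the extra lookups */
        d = kvstoreGetDict(kvs, didx);
    }
    return d;
}
```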
In low memory situations, sending a big number of arguments (sets)
may cause OOM panic. Use ztrycalloc, like we do on LCS and XAUTOCLAIM,
and fail gracefully.
This change affects the following commands: ZUNION, ZINTER, ZDIFF,
ZUNIONSTORE, ZINTERSTORE, ZDIFFSTORE, ZINTERCARD.
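Roughly, the pattern at the allocation site might look like the fragment below (inside the command handler); the struct name, variable names and error message are approximations, not the exact code:
```c
/* Approximate fragment from the ZUNION/ZINTER/ZDIFF family. */
zsetopsrc *src = ztrycalloc(sizeof(zsetopsrc) * setnum);
if (src == NULL) {
    /* Fail gracefully instead of panicking on OOM, as LCS and XAUTOCLAIM do. */
    addReplyError(c, "Insufficient memory");
    return;
}
```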
These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```
Increase the range to avoid failures, and improve the comment to be
clearer.
The tests were introduced in #13004.
The receiver does not update any of its cluster state based on gossip
about itself. This commit explicitly avoids sending or processing gossip
about the receiver.
Currently cluster bus gossips include 10% of nodes in the cluster with a
minimum of 3 nodes. For up to 30 node clusters, this commit makes sure
that 1/3 of the gossip (1 out of 3 gossips) is never discarded. This
should help with relatively faster convergence of cluster state in
general.
Following the changes introduced by 8cd62f82c, the dbExpandExpires used
the db_size instead of expires_size.
Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
Following the changes introduced by 8cd62f82c, the kvstoreDictExpand for
the expires kvstore used the slot_size instead of expires_slot_size.
Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
This test fails occasionally:
```
*** [err]: CLIENT KILL maxAGE will kill old clients in tests/unit/introspection.tcl
Expected 2 == 1 (context: type eval line 14 cmd {assert {$res == 1}} proc ::test)
```
This test is very likely to produce a false positive if the execution
time takes longer than the max age; for example, if the execution time
between sleep and kill exceeds 1s, rd2 will also be killed due to
the max age.
The test can adjust the order of execution statements to increase
the probability of passing, but this would still be a timing issue
on some slow machines, so we decided to give it a few more chances.
The test was introduced in #12299.
Failed CI:
https://github.com/redis/redis/actions/runs/7837608438/job/21387609715
## Why defragment tests only failed under 32-bit
First of all, under 32-bit, jemalloc will allocate more small bins and
fewer large bins, which also leads to more external fragmentation;
therefore, the fragmentation ratio is higher in 32-bit than in 64-bit,
so the defragment tests (`Active defrag eval scripts: cluster` and
`Active defrag big keys: cluster`) always fail in 32-bit.
## Why defragment tests only failed with cluster
The following is the result of the `Active defrag eval scripts: cluster`
test.
1) Before #11695, the fragmentation ratio is 3.11%.
2) After #11695, the fragmentation ratio grew to 4.58%.
Since we are using per-slot dictionaries to manage slots, we will only
defragment the contents of these dictionaries (keys, values), but not
the dictionaries' struct and ht_table, which means that frequent
shrinking and expanding of the dictionaries will create more fragments.
3) After #12850 and #12948, in cluster mode a large number of cluster
slot dicts will be shrunk, creating additional fragmentation, and the
dictionaries will not be defragged.
## Solution
* Add defragmentation of the per-slot dictionary's own structures, dict
struct and ht_table.
## Other change
* Increase floating point print precision of `frags` and `rss` in debug
logs for defrag
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
The test fails here and there:
```
*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
scan didn't handle slot skipping logic.
```
There are two cases:
1. In the case of passing the test, we use a child process to avoid the
dict resize, but it can not completely prevent it, since in dictDelete
we still have a chance to trigger the resize (hitting the force ratio).
The reason why our test passed before is that the expire dict was still
in the rehashing process, so in dictDelete, dictShrinkIfNeeded could not
trigger the resize.
2. In the case of failing the test, the expire dict finished rehashing,
so the last dictDelete, via dictShrinkIfNeeded, triggers the dict resize
since it hits the force ratio, so the skipping logic fails.
This PR adds a new DEBUG command to disable the dict resize.
We forgot to call quicklistSetOptions after createQuicklistObject;
in the sort store scenario, we would create a quicklist with default
fill or compress options.
This PR adds fill and depth parameters to createQuicklistObject so there
is no need to set these options after creating the quicklist.
This closes #12871.
release notes:
> Fix lists created by SORT STORE to respect list compression and
packing configs.
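As a sketch of the call-site change, assuming the new parameters map to the existing list config fields (parameter order and field names are assumptions):
```c
/* Approximate sketch: the SORT ... STORE path now creates the destination
 * list with the server's configured fill and compression depth instead of
 * quicklist defaults. */
robj *dstobj = createQuicklistObject(server.list_max_listpack_size,
                                     server.list_compress_depth);
```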
Fix two crashes introduced by #12955.
When a quicklist node can't be inserted and split, we eventually merge
the current node with its neighboring nodes after inserting, and
compress the current node and its siblings.
1. When the current node is merged with another node, the current node
may become invalid and can no longer be used.
   Solution: let `_quicklistMergeNodes()` return the merged nodes.
2. If the current node is an LZF quicklist node, its recompress will be
1. If the split node can be merged with a sibling node to become head or
tail, recompressing may cause the head or tail to be compressed, which is
not allowed.
   Solution: always reset recompress to 0 after merging.
After the fix for #13033, address sanitizer reports this
heap-use-after-free error: when the pubsubshard_channels dict becomes
empty, we delete the dict, and then dictReleaseIterator calls
dictResetIterator, which uses the freed dict and triggers the error.
This PR introduces a new struct kvstoreDictIterator to wrap dictIterator,
and replaces the original dict iterator with the new kvstore dict
iterator.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
The reason is the same as #13016: in #12819, in cron, in addition to
trying to shrink, we also try to expand. The dict was expanded by cron
before we trigger the bgsave, since we have enough keys (4096) to hit
the ratio. Now, before the bgsave, we only add 4095 keys to avoid this
issue.
Fix #12864
The main reason for this crash is that when replacing an element of a
quicklist packed node with the lpReplace() method,
if the final size is larger than 4GB, lpReplace() fails and returns
NULL, causing `node->entry` to be incorrectly set to NULL.
Since the inserted data is not a large element, we can't just replace it
like a large element, first quicklistInsertAfter()
and then quicklistDelIndex(), because the current node may be merged and
invalidated in quicklistInsertAfter().
The solution of this PR:
When replacing a node fails (listpack exceeds 4GB), split the current
node, create a new node to put in the middle, and try to merge them.
This is the same as inserting a large element.
In the worst case, its size will not exceed 4GB.
This commit addresses a problem in connSocketBlockingConnect where
different types of connection failures, including timeouts and other
errors, were not consistently handled. Previously, the function did not
return C_ERR immediately after detecting a connection failure, which
could lead to inconsistent states and misinterpretation of the
connection status.
With this update, connSocketBlockingConnect now correctly returns C_ERR
upon encountering any connection error, ensuring that all types of
connection failures are handled consistently and the behavior of the
function aligns with expected outcomes in case of connection issues.
Closes #12900
After #12822, when pubsubshard_channels becomes empty, kvstoreDictDelete
will delete the dict (currently the only kvstore that deletes dicts once
they become empty), and in the next loop we would make an invalid call
to dictNext.
Now, after the dict becomes empty, we break out of the loop without
calling dictNext.