We can see that the past time here happens to be busy_time_limit,
causing the test to fail:
```
[err]: RM_Call from blocked client in tests/unit/moduleapi/blockedclient.tcl
Expected '50' to be more than '50' (context: type eval line 26 cmd {assert_morethan [expr [clock clicks -milliseconds]-$start] $busy_time_limit} proc ::test)
```
It is reasonable for them to be equal, so equal is added here.
It should be noted that in the previous `Busy module command` test,
we also used assert_morethan_equal, so this should have been missed
at the time.
Redis has some special commands that mark the client's state, such as
`subscribe` and `blpop`, which mark the client as `CLIENT_PUBSUB` or
`CLIENT_BLOCKED`, and we have metrics for the special use cases.
However, there are also other special commands, like `WATCH`, which
although do not have a specific flags, and should also be considered
stateful client types. For stateful clients, in many scenarios, the
connections cannot be shared in "connection pool", meaning connection
pool cannot be used. For example, whenever the `WATCH` command is
executed, a new connection is required to put the client into the "watch
state" because the watched keys are stored in the client.
If different business logic requires watching different keys, separate
connections must be used; otherwise, there will be contamination. This
also means that if a user's business heavily relies on the `WATCH`
command, a large number of connections will be required.
Recently we have encountered this situation in our platform, where some
users consume a significant number of connections when using Redis
because of `WATCH`.
I hope we can have a way to observe these special use cases and special
client connections. Here I add a few monitoring metrics:
1. `watching_clients` in `INFO` reply: The number of clients currently
in the "watching" state.
2. `total_watched_keys` in `INFO` reply: The total number of keys being
watched.
3. `watch` in `CLIENT LIST` reply: The number of keys each client is
currently watching.
These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```
Increase the range to avoid failures, and improve the comment to be
clearer.
tests was introduced in #13004.
This test fails occasionally:
```
*** [err]: CLIENT KILL maxAGE will kill old clients in tests/unit/introspection.tcl
Expected 2 == 1 (context: type eval line 14 cmd {assert {$res == 1}} proc ::test)
```
This test is very likely to do a false positive if the execute time
takes longer than the max age, for example, if the execution time
between sleep and kill exceeds 1s, rd2 will also be killed due to
the max age.
The test can adjust the order of execution statements to increase
the probability of passing, but this is still will be a timing issue
in some slow machines, so decided give it a few more chances.
The test was introduced in #12299.
The test fails here and there:
```
*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
scan didn't handle slot skipping logic.
```
There are two case:
1. In the case of passing the test, we use child process to avoid the
dict resize, but it can not completely limit it, since in the dictDelete
we still have chance to trigger the resize (hit the force radio). The
reason why our test passed before is because the expire dict is still
in the rehashing process, so the dictDelete, the dictShrinkIfNeeded can
not trigger the resize.
2. In the case of failing the test, the expire dict finished the
rehashing,
so the last dictDelete, the dictShrinkIfNeeded trigger the dict resize
since it hit the force radio, so the skipping logic fail.
This PR add a new DEBUG command to disbale the dict resize.
We forgot to call quicklistSetOptions after createQuicklistObject,
in the sort store scenario, we will create a quicklist with default
fill or compress options.
This PR adds fill and depth parameters to createQuicklistObject to
specify that options need to be set after creating a quicklist.
This closes#12871.
release notes:
> Fix lists created by SORT STORE to respect list compression and
packing configs.
Fix two crash introducted by #12955
When a quicklist node can't be inserted and split, we eventually merge
the current node with its neighboring
nodes after inserting, and compress the current node and its siblings.
1. When the current node is merged with another node, the current node
may become invalid and can no longer be used.
Solution: let `_quicklistMergeNodes()` return the merged nodes.
3. If the current node is a LZF quicklist node, its recompress will be
1. If the split node can be merged with a sibling node to become head or
tail, recompress may cause the head and tail to be compressed, which is
not allowed.
Solution: always recompress to 0 after merging.
The reason is the same as #13016. The reason is that in #12819,
in cron, in addition to trying to shrink, we will also tyring
to expand. The dict was expanded by cron before we trigger the
bgsave since we do have the enough keys (4096) to hit the radio.
Before the bgsave, we only add 4095 keys to avoid this issue.
Fix#12864
The main reason for this crash is that when replacing a element of a
quicklist packed node with lpReplace() method,
if the final size is larger than 4GB, lpReplace() will fail and returns
NULL, causing `node->entry` to be incorrectly set to NULL.
Since the inserted data is not a large element, we can't just replace it
like a large element, first quicklistInsertAfter()
and then quicklistDelIndex(), because the current node may be merged and
invalidated in quicklistInsertAfter().
The solution of this PR:
When replacing a node fails (listpack exceeds 4GB), split the current
node, create a new node to put in the middle, and try to merge them.
This is the same as inserting a large element.
In the worst case, its size will not exceed 4GB.
Currently, once active defrag starts, we can not adjust
active_defrag_running
downwards. This is because active_defrag_running will be dynamically
compute
based on the fragmentation, we think we should not lower the effort when
the
fragmentation drops.
However, we need to note that active_defrag_running will also be
dynamically
computed based on configurations. In this case, we are not respecting
cycle-min
or cycle-max. Some people may realize halfway through that defrag
consumes a
lot and want to adjust it.
Previously we could only turn off activedefrag and then turn it on again
to
adjust active_defrag_running downwards. So in this PR, when a active
defrag
configuration change is made, we will re-compute it.
These configuration items are:
- active-defrag-cycle-min
- active-defrag-cycle-max
- active-defrag-threshold-upper
# Description
Gather most of the scattered `redisDb`-related code from the per-slot
dict PR (#11695) and turn it to a new data structure, `kvstore`. i.e.
it's a class that represents an array of dictionaries.
# Motivation
The main motivation is code cleanliness, the idea of using an array of
dictionaries is very well-suited to becoming a self-contained data
structure.
This allowed cleaning some ugly code, among others: loops that run twice
on the main dict and expires dict, and duplicate code for allocating and
releasing this data structure.
# Notes
1. This PR reverts the part of https://github.com/redis/redis/pull/12848
where the `rehashing` list is global (handling rehashing `dict`s is
under the responsibility of `kvstore`, and should not be managed by the
server)
2. This PR also replaces the type of `server.pubsubshard_channels` from
`dict**` to `kvstore` (original PR:
https://github.com/redis/redis/pull/12804). After that was done,
server.pubsub_channels was also chosen to be a `kvstore` (with only one
`dict`, which seems odd) just to make the code cleaner by making it the
same type as `server.pubsubshard_channels`, see
`pubsubtype.serverPubSubChannels`
3. the keys and expires kvstores are currenlty configured to allocate
the individual dicts only when the first key is added (unlike before, in
which they allocated them in advance), but they won't release them when
the last key is deleted.
Worth mentioning that due to the recent change the reply of DEBUG
HTSTATS changed, in case no keys were ever added to the db.
before:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
```
after:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
[Expires HT]
```
When db->expires_cursor==0, it means the DB is done the scanning,
we should exit the loop to avoid the useless scanning.
It is easy to see the active expire timeout in the modified test,
for example, let's assume that there is only 1 expired key in the
DB, and the size / buckets ratio is less than 1%, which means that
we will skip it in isExpiryDictValidForSamplingCb, and the return
value of expires_cursor is 0.
Because `data.sampled == 0` is always true, so `repeat` is also
always true, we will keep scanning the DB, but every time it is
skipped by the previous judgment (expires_cursor = 0), until the
timelimit is finally exhausted.
Ci report this failure:
```
*** [err]: Don't rehash if used memory exceeds maxmemory after rehash in tests/unit/maxmemory.tcl
Expected '4098' to equal or match '4002'
WARNING: the new maxmemory value set via CONFIG SET (1176088) is smaller than the current memory usage (1231083)
```
It can be seen from the log that used_memory changed before we set
maxmemory.
The reason is that in #12819, in cron, in addition to trying to shrink,
we will
also tyring to expand. The dict was expanded by cron before we set
maxmemory,
causing the test to fail.
Before setting maxmemory, we only add 4095 keys to avoid triggering
resize.
When we use a timer to unblock a client in module, if the timer
period and the block timeout are very close, they will unblock the
client in the same event loop, and it will trigger the assertion.
The reason is that in moduleBlockedClientTimedOut we will protect
against re-processing, so we don't actually call updateStatsOnUnblock
(see #12817), so we are not able to reset the c->duration.
The reason is unblockClientOnTimeout() didn't realize that bc had
been unblocked. We add a function to the module to determine if bc
is blocked, and then use it in unblockClientOnTimeout() to exit.
There is the stack:
```
beforeSleep
blockedBeforeSleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
unblockClient
resetClient
-- assertion, crash the server
'c->duration == 0' is not true
```
The block timeout is passed in the test case, but we do not pass
in the timeout_callback, and it will crash when unlocking. In this
case, in moduleBlockedClientTimedOut we will check timeout_callback.
There is the stack:
```
beforeSleep
blockedBeforeSleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut
-- timeout_callback is NULL, invalidFunctionWasCalled
bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```
Add a way to HSCAN a hash key, and get only the filed names.
Command syntax is now:
```
HSCAN key cursor [MATCH pattern] [COUNT count] [NOVALUES]
```
when `NOVALUES` is on, the command will only return keys in the hash.
---------
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Adds an ability to kill clients older than a specified age.
Also, fixed the age calculation in `catClientInfoString` to use
`commandTimeSnapshot`
instead of the old `server.unixtime`, and added missing documentation
for
`CLIENT KILL ID` to output of `CLIENT help`.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
tests consistently fail on timeout (sleep that's too short).
it now takes more time because in #12819 we iterate on all dicts, not
just non-empty ones.
it passed the PR's CI because it skips the `slow` tag, which might have
been misplaced, but now it is probably required.
with the fix, the tests take quite a lot of time:
```
[ok]: Redis can trigger resizing (1860 ms)
[ok]: Redis can rewind and trigger smaller slot resizing (744 ms)
```
before #12819:
```
[ok]: Redis can trigger resizing (309 ms)
[ok]: Redis can rewind and trigger smaller slot resizing (295 ms)
```
failure:
https://github.com/redis/redis/actions/runs/7704158180/job/20995931735
```
*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
scan didn't handle slot skipping logic.
*** [err]: Redis can trigger resizing in tests/unit/other.tcl
Expected '[Dictionary HT]
Hash table 0 stats (main hash table):
table size: 128
number of elements: 5
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
' to match '*table size: 8*' (context: type eval line 29 cmd {assert_match "*table size: 8*" [r debug HTSTATS 0]} proc ::test)
*** [err]: Redis can rewind and trigger smaller slot resizing in tests/unit/other.tcl
Expected '[Dictionary HT]
Hash table 0 stats (main hash table):
table size: 256
number of elements: 10
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
' to match '*table size: 16*' (context: type eval line 27 cmd {assert_match "*table size: 16*" [r debug HTSTATS 0]} proc ::test)
```
In #11012, we will reprocess command when client is unblocked on keys,
in some blocking commands, for example, in the XREADGROUP BLOCK
scenario,
because of the re-processing command, we will recalculate the block
timeout,
causing the blocking time to be reset.
This commit add a new CLIENT_REPROCESSING_COMMAND clent flag, explicitly
let the command know that it is being re-processed, later in
blockForKeys
we will not reset the timeout.
Affected BLOCK cases:
- list / zset / stream, added test cases for each.
Unaffected cases:
- module (never re-process the commands).
- WAIT / WAITAOF (never re-process the commands).
Fixes#12998.
The function `tryResizeHashTables` only attempts to shrink the dicts
that has keys (change from #11695), this was a serious problem until the
change in #12850 since it meant if all keys are deleted, we won't shrink
the dick.
But still, both dictShrink and dictExpand may be blocked by a fork child
process, therefore, the cron job needs to perform both dictShrink and
dictExpand, for not just non-empty dicts, but all dicts in DBs.
What this PR does:
1. Try to resize all dicts in DBs (not just non-empty ones, as it was
since #12850)
2. handle both shrink and expand (not just shrink, as it was since
forever)
3. Refactor some APIs about dict resizing (get rid of `htNeedsShrink`
`htNeedsShrink` `dictShrinkToFit`, and expose `dictShrinkIfNeeded`
`dictExpandIfNeeded` which already contains all the code of those
functions we get rid of, to make APIs more neat)
4. In the `Don't rehash if redis has child process` test, now that cron
would do resizing, we no longer need to write to DB after the child
process got killed, and can wait for the cron to expand the hash table.
Modules may want to handle allocation failures gracefully. Adding
RM_TryCalloc() and RM_TryRealloc() for it.
RM_TryAlloc() was added before:
https://github.com/redis/redis/pull/10541
#### Problem Statement:
For any read/update operation during rehashing, we're doing ~10+ random
DRAM lookups to do the rehashing, as we are using the `rehashidx` to
rehash 10 buckets, whose dict entries most likely aren't cached in the
CPU or near the bucket we are operating on. If these random bucket are
empty, the rehashing process during that command execution is skipped.
#### Implementation:
For reducing the performance recession while dict is rehashing, we
determine the index at which the key would be stored in the 0th HT, we
check if that index has already been rehashed, if not we will rehash the
bucket containing the key and the bucket will be moved from 0th HT to
the 1st HT.
If the key has already been rehashed, we perform the random access
bucket rehash (using `rehashidx`) and we again verify if rehashing is
still ongoing and look up the key in the respective HT.
This ensures rehashing is not skipped in any command call and that we
rehash a particular bucket or random bucket in each call.
#### Changes in this PR:
- Added a new method `dictBucketRehash` to perform rehash on a single
bucket.
- Helper function `moveKeysInBucketOldtoNew` for `dictRehash` and
`dictBucketRehash` to move all the keys in a bucket from the old to the
new hash HT.
- Helper function `verifyMoreRehashRequired` for `dictRehash` and
`dictBucketRehash` to check if we have already rehashed the whole table
and if more rehashing is required.
### Benchmark:
- This PR still shows **~13%** improvement in the latency during
rehashing.
- Rehashing is now **~2%** faster for this PR when compared to unstable.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
We have test cases for incr related commands with no key exist and
spaces in key and wrong type of key. However, we dont have test cases
covered for INCRBY INCRBYFLOAT DECRBY INCR DECR HINCRBY HINCRBYFLOAT
ZINCRBY with valid key and invalid value as argument, and float value to
incrby and decrby. So added test cases for the scenarios in incr.tcl.
Thank you!
In #11568 we removed the NOSCRIPT flag from commands, e.g. removing
NOSCRIPT flag from WAIT. Aiming to allow them in scripts and let them
implicitly behave in the non-blocking way.
This PR remove NOSCRIPT flag from WAITAOF just like WAIT (to be
symmetrical)).
And this PR also add BLOCKING flag for WAIT and WAITAOF.
This PR did some cleanups around function:
- drop the comment about Libraries Ctx, since we do have comment
in functionsLibCtx, no need to maintain multiple copies.
- remove outdated comment about the dropped Library description.
- remove unused desc and code vars in functionExtractLibMetaData.
- fix engines_nemory typo, changed it to engines_memory.
- remove outdated comment about FUNCTION CREATE and FUNCTION INFO,
FUNCTION CREATE was renamed to FUNCTION LOAD.
- Check in initServer whether the return of functionsInit is OK.
Currently slowlog gets disabled if slowlog-log-slower-than is set to less than zero. I think we should also disable it if slowlog-max-len is set to zero. We apply the same logic to acllog-max-len.
Before this change (most recently modified in
https://github.com/redis/redis/pull/12850#discussion_r1421406393), The
trigger for normal expand threshold was 100% utilization and the trigger
for normal shrink threshold was 10% (HASHTABLE_MIN_FILL).
While during fork (DICT_RESIZE_AVOID), when we want to avoid rehash, the
trigger thresholds were multiplied by 5 (`dict_force_resize_ratio`),
meaning 500% for expand and 2% (100/10/5) for shrink.
However, in `dictRehash` (the incremental rehashing), the rehashing
threshold for shrinking during fork (DICT_RESIZE_AVOID) was 20% by
mistake.
This meant that if a shrinking is triggered when `dict_can_resize` is
`DICT_RESIZE_ENABLE` which the threshold is 10%, the rehashing can
continue when `dict_can_resize` is `DICT_RESIZE_AVOID`.
This would cause unwanted CopyOnWrite damage.
It'll make sense to change the thresholds of the rehash trigger and the
thresholds of the incremental rehashing the same, however, in one we
compare the size of the hash table to the number of records, and in the
other we compare the size of ht[0] to the size of ht[1], so the formula
is not exactly the same.
to make things easier we change all the thresholds to powers of 2, so
the normal shrinking threshold is changed from 100/10 (i.e. 10%) to
100/8 (i.e. 12.5%), and we change the threshold during forks from 5 to
4, i.e. from 500% to 400% for expand, and from 2% (100/10/5) to 3.125%
(100/8/4)
Fix#12785 and other race condition issues.
See the following isolated comments.
The following report was obtained using SANITIZER thread.
```sh
make SANITIZER=thread
./runtest-moduleapi --config io-threads 4 --config io-threads-do-reads yes --accurate
```
1. Fixed thread-safe issue in RM_UnblockClient()
Related discussion:
https://github.com/redis/redis/pull/12817#issuecomment-1831181220
* When blocking a client in a module using `RM_BlockClientOnKeys()` or
`RM_BlockClientOnKeysWithFlags()`
with a timeout_callback, calling RM_UnblockClient() in module threads
can lead to race conditions
in `updateStatsOnUnblock()`.
- Introduced:
Version: 6.2
PR: #7491
- Touch:
`server.stat_numcommands`, `cmd->latency_histogram`, `server.slowlog`,
and `server.latency_events`
- Harm Level: High
Potentially corrupts the memory data of `cmd->latency_histogram`,
`server.slowlog`, and `server.latency_events`
- Solution:
Differentiate whether the call to moduleBlockedClientTimedOut() comes
from the module or the main thread.
Since we can't know if RM_UnblockClient() comes from module threads, we
always assume it does and
let `updateStatsOnUnblock()` asynchronously update the unblock status.
* When error reply is called in timeout_callback(), ctx is not
thread-safe, eventually lead to race conditions in `afterErrorReply`.
- Introduced:
Version: 6.2
PR: #8217
- Touch
`server.stat_total_error_replies`, `server.errors`,
- Harm Level: High
Potentially corrupts the memory data of `server.errors`
- Solution:
Make the ctx in `timeout_callback()` with `REDISMODULE_CTX_THREAD_SAFE`,
and asynchronously reply errors to the client.
2. Made RM_Reply*() family API thread-safe
Related discussion:
https://github.com/redis/redis/pull/12817#discussion_r1408707239
Call chain: `RM_Reply*()` -> `_addReplyToBufferOrList()` -> touch
server.current_client
- Introduced:
Version: 7.2.0
PR: #12326
- Harm Level: None
Since the module fake client won't have the `CLIENT_PUSHING` flag, even
if we touch server.current_client,
we can still exit after `c->flags & CLIENT_PUSHING`.
- Solution
Checking `c->flags & CLIENT_PUSHING` earlier.
3. Made freeClient() thread-safe
Fix#12785
- Introduced:
Version: 4.0
Commit:
3fcf959e60
- Harm Level: Moderate
* Trigger assertion
It happens when the module thread calls freeClient while the io-thread
is in progress,
which just triggers an assertion, and doesn't make any race condiaions.
* Touch `server.current_client`, `server.stat_clients_type_memory`, and
`clientMemUsageBucket->clients`.
It happens between the main thread and the module threads, may cause
data corruption.
1. Error reset `server.current_client` to NULL, but theoretically this
won't happen,
because the module has already reset `server.current_client` to old
value before entering freeClient.
2. corrupts `clientMemUsageBucket->clients` in
updateClientMemUsageAndBucket().
3. Causes server.stat_clients_type_memory memory statistics to be
inaccurate.
- Solution:
* No longer counts memory usage on fake clients, to avoid updating
`server.stat_clients_type_memory` in freeClient.
* No longer resetting `server.current_client` in unlinkClient, because
the fake client won't be evicted or disconnected in the mid of the
process.
* Judgment assertion `io_threads_op == IO_THREADS_OP_IDLE` only if c is
not a fake client.
4. Fixed free client args without GIL
Related discussion:
https://github.com/redis/redis/pull/12817#discussion_r1408706695
When freeing retained strings in the module thread (refcount decr), or
using them in some way (refcount incr), we should do so while holding
the GIL,
otherwise, they might be simultaneously freed while the main thread is
processing the unblock client state.
- Introduced:
Version: 6.2.0
PR: #8141
- Harm Level: Low
Trigger assertion or double free or memory leak.
- Solution:
Documenting that module API users need to ensure any access to these
retained strings is done with the GIL locked
5. Fix adding fake client to server.clients_pending_write
It will incorrectly log the memory usage for the fake client.
Related discussion:
https://github.com/redis/redis/pull/12817#issuecomment-1851899163
- Introduced:
Version: 4.0
Commit:
9b01b64430
- Harm Level: None
Only result in NOP
- Solution:
* Don't add fake client into server.clients_pending_write
* Add c->conn assertion for updateClientMemUsageAndBucket() and
updateClientMemoryUsage() to avoid same
issue in the future.
So now it will be the responsibility of the caller of both of them to
avoid passing in fake client.
6. Fix calling RM_BlockedClientMeasureTimeStart() and
RM_BlockedClientMeasureTimeEnd() without GIL
- Introduced:
Version: 6.2
PR: #7491
- Harm Level: Low
Causes inaccuracies in command latency histogram and slow logs, but does
not corrupt memory.
- Solution:
Module API users, if know that non-thread-safe APIs will be used in
multi-threading, need to take responsibility for protecting them with
their own locks instead of the GIL, as using the GIL is too expensive.
### Other issue
1. RM_Yield is not thread-safe, fixed via #12905.
### Summarize
1. Fix thread-safe issues for `RM_UnblockClient()`, `freeClient()` and
`RM_Yield`, potentially preventing memory corruption, data disorder, or
assertion.
2. Updated docs and module test to clarify module API users'
responsibility for locking non-thread-safe APIs in multi-threading, such
as RM_BlockedClientMeasureTimeStart/End(), RM_FreeString(),
RM_RetainString(), and RM_HoldString().
### About backpot to 7.2
1. The implement of (1) is not too satisfying, would like to get more
eyes.
2. (2), (3) can be safely for backport
3. (4), (6) just modifying the module tests and updating the
documentation, no need for a backpot.
4. (5) is harmless, no need for a backpot.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Before #12850, we will only try to shrink the dict in serverCron,
which we can control by using a child process, but now every time
we delete a key, the shrink check will be called.
In these test (added in #12802), we meant to disable the resizing,
but druing the delete, the dict will meet the force shrink, like
2 / 128 = 0.015 < 0.2, the delete will trigger a force resize and
will cause the test to fail.
In this commit, we try to keep the load factor at 3 / 128 = 0.023,
that is, do not meet the force shrink.
The test have a race:
```
*** [err]: Redis can rewind and trigger smaller slot resizing in tests/unit/other.tcl
Expected '[Dictionary HT]
Hash table 0 stats (main hash table):
table size: 12
number of elements: 2
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
' to match '*table size: 8*' (context: type eval line 12 cmd {assert_match "*table size: 8*" [r debug HTSTATS 0]} proc ::test)
```
When `r del "{alice}$j"` is executed in the loop, when the key is
deleted to [9, 12], the load factor has meet HASHTABLE_MIN_FILL,
if serverCron happens to trigger slot dict resize, then the test
will fail. Because there is not way to meet HASHTABLE_MIN_FILL in
the subsequent dels.
The solution is to avoid triggering the resize in advance. We can
use multi to delete them at once, or we can disable the resize.
Since we disabled resize in the previous test, the fix also uses
the method of disabling resize.
The test is introduced in #12802.
When we insert entries into dict, it may autonomously expand if needed.
However, when we delete entries from dict, it doesn't shrink to the
proper size. If there are few entries in a very large dict, it may cause
huge waste of memory and inefficiency when iterating.
The main keyspace dicts (keys and expires), are shrinked by cron
(`tryResizeHashTables` calls `htNeedsResize` and `dictResize`),
And some data structures such as zset and hash also do that (call
`htNeedsResize`) right after a loop of calls to `dictDelete`,
But many other dicts are completely missing that call (they can only
expand).
In this PR, we provide the ability to automatically shrink the dict when
deleting. The conditions triggering the shrinking is the same as
`htNeedsResize` used to have. i.e. we expand when we're over 100%
utilization, and shrink when we're below 10% utilization.
Additionally:
* Add `dictPauseAutoResize` so that flows that do mass deletions, will
only trigger shrinkage at the end.
* Rename `dictResize` to `dictShrinkToFit` (same logic as it used to
have, but better name describing it)
* Rename `_dictExpand` to `_dictResize` (same logic as it used to have,
but better name describing it)
related to discussion
https://github.com/redis/redis/pull/12819#discussion_r1409293878
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
Regarding how to obtain the hash slot of a key, there is an optimization
in `getKeySlot()`, it is used to avoid redundant hash calculations for
keys: when the current client is in the process of executing a command,
it can directly use the slot of the current client because the slot to
access has already been calculated in advance in `processCommand()`.
However, scripts are a special case where, in default mode or with
`allow-cross-slot-keys` enabled, they are allowed to access keys beyond
the pre-declared range. This means that the keys they operate on may not
belong to the slot of the pre-declared keys. Currently, when the
commands in a script are executed, the slot of the original client
(i.e., the current client) is not correctly updated, leading to
subsequent access to the wrong slot.
This PR fixes the above issue. When checking the cluster constraints in
a script, the slot to be accessed by the current command is set for the
original client (i.e., the current client). This ensures that
`getKeySlot()` gets the correct slot cache.
Additionally, the following modifications are made:
1. The 'sort' and 'sort_ro' commands use `getKeySlot()` instead of
`c->slot` because the client could be an engine client in a script and
can lead to potential bug.
2. `getKeySlot()` is also used in pubsub to obtain the slot for the
channel, standardizing the way slots are retrieved.
The test was introduced in #10745, but we forgot to add it to the
test_helper.tcl, so our CI did not actually run it. This PR adds it
and ensures it passes CI tests.
Fix a daily test failure because alpine doesn't support stack traces and
add in an extra assertion related to making sure the stack trace was
printed twice.
This change is trying to make two failure modes a bit easier to deep dive:
1. If a serverPanic or serverAssert occurs during the info (or module)
printing, it will recursively panic, which is a lot of fun as it will
just keep recursively printing. It will eventually stack overflow, but
will generate a lot of text in the process.
2. When a segfault happens during the segfault handler, no information
is communicated other than it happened. This can be problematic because
`info` may help diagnose the real issue, but without fixing the
recursive crash it might be hard to get at that info.
We have achieved replacing `slots_to_keys` radix tree with key->slot
linked list (#9356), and then replacing the list with slot specific
dictionaries for keys (#11695).
Shard channels behave just like keys in many ways, and we also need a
slots->channels mapping. Currently this is still done by using a radix
tree. So we should split `server.pubsubshard_channels` into 16384 dicts
and drop the radix tree, just like what we did to DBs.
Some benefits (basically the benefits of what we've done to DBs):
1. Optimize counting channels in a slot. This is currently used only in
removing channels in a slot. But this is potentially more useful:
sometimes we need to know how many channels there are in a specific slot
when doing slot migration. Counting is now implemented by traversing the
radix tree, and with this PR it will be as simple as calling `dictSize`,
from O(n) to O(1).
2. The radix tree in the cluster has been removed. The shard channel
names no longer require additional storage, which can save memory.
3. Potentially useful in slot migration, as shard channels are logically
split by slots, thus making it easier to migrate, remove or add as a
whole.
4. Avoid rehashing a big dict when there is a large number of channels.
Drawbacks:
1. Takes more memory than using radix tree when there are relatively few
shard channels.
What this PR does:
1. in cluster mode, split `server.pubsubshard_channels` into 16384
dicts, in standalone mode, still use only one dict.
2. drop the `slots_to_channels` radix tree.
3. to save memory (to solve the drawback above), all 16384 dicts are
created lazily, which means only when a channel is about to be inserted
to the dict will the dict be initialized, and when all channels are
deleted, the dict would delete itself.
5. use `server.shard_channel_count` to keep track of the number of all
shard channels.
---------
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Fix#12792
On ubuntu 23(lunar), non-root users will not be allowed to change the
oom_score_adj of a process to a value that is too low.
Since terminal's default oom_score_adj is 200, if we run the test on
terminal, we won't be able to set the oom_score_adj of the redis process
to 9 or 22, which is too low.
Reproduction on ubuntu 23(lunar) terminal:
```sh
$ cat /proc/`pgrep redis-server`/oom_score_adj
200
$ echo 100 > /proc/`pgrep redis-server`/oom_score_adj
# success without error
$ echo 99 > /proc/`pgrep redis-server`/oom_score_adj
echo: write error: Permission denied
```
As from the output above, we can only set the minimum oom score of redis
processes to 100.
By modifying the test, make oom_score_adj only increase upwards and not
decrease.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Previous implementation would disconnect _all_ clients when running `ACL
LOAD`, which wasn't very useful.
This change brings the behavior in line with that of `ACL SETUSER`, `ACL
DELUSER`, in that only clients whose user is deleted or clients
subscribed to channels which they no longer have access to will be
disconnected.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
This PR, we added -4 and -6 options to redis-cli to determine
IPV4 / IPV6 priority in DNS lookup.
This was mentioned in
https://github.com/redis/redis/pull/11151#issuecomment-1231570651
For now it's only used in CLUSTER MEET.
The options also made it possible to reliably test dns lookup in CI,
using this option, we can add some localhost tests for #11151.
The commit was cherry-picked from #11151, back then we decided to split
the PR.
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
We dont have test for hgetall against key doesnot exist so added the
test in test suite and along with this, added wrong type cases for other
missing commands.
The by/get options of sort/sort_ro command used to be forbidden in
cluster mode, since we are not sure which slot the pattern may be in.
As the optimization done in #12536, patterns now can be mapped to slots,
we should allow by/get options in cluster mode when the pattern maps to
the same slot as the key.
In #11489, we consider acl username to be sensitive information,
and consider the ACL GETUSER a sensitive command and remove it
from redis-cli historyfile.
This PR redact username information in ACL GETUSER and ACL DELUSER
from SLOWLOG, and also remove ACL DELUSER from redis-cli historyfile.
This PR also mark tls-key-file-pass and tls-client-key-file-pass
as sensitive config, will redact it from SLOWLOG and also
remove them from redis-cli historyfile.
In INFO CLIENTS section, we already have blocked_clients and
tracking_clients. We should add a new metric showing the number of
pubsub connections, which helps performance monitoring and trouble
shooting.
This is a follow-up fix to #12733. We need to apply the same changes to
delKeysInSlot. Refer to #12733 for more details.
This PR contains some other minor cleanups / improvements to the test
suite and docs.
It uses the postnotifications test module in a cluster mode test which
revealed a leak in the test module (fixed).
in #12536 we made a similar optimization for SCAN, now that hashtags in
patterns. When we can make sure all keys matching the pettern will be in
the same slot, we can limit the iteration to run only one one.
Warning:
```
postnotifications.c:216:77: warning: format specifies type 'long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
RedisModule_Log(ctx, "warning", "Got an unexpected subevent '%ld'", subevent);
~~~ ^~~~~~~~
%llu
```
CI:
https://github.com/redis/redis/actions/runs/6937308713/job/18871124342#step:6:115
## Other
Add `CFLAGS=-Werror` flag for module CI.
---------
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The following four configurations are renamed to align with Redis style:
1. server_cpulist renamed to server-cpulist
2. bio_cpulist renamed to bio-cpulist
3. aof_rewrite_cpulist renamed to aof-rewrite-cpulist
4. bgsave_cpulist renamed to bgsave-cpulist
The original names are retained as aliases to ensure compatibility with
old configuration files. We recommend users to gradually transition to
using the new configuration names to maintain consistency in style.
Introduced in #11695 .
The tryResizeHashTables function gets stuck on the last non-empty slot
while iterating through dictionaries. It does not restart from the
beginning. The reason for this issue is a problem with the usage of
dbIteratorNextDict:
/* Returns next dictionary from the iterator, or NULL if iteration is complete. */
dict *dbIteratorNextDict(dbIterator *dbit) {
if (dbit->next_slot == -1) return NULL;
dbit->slot = dbit->next_slot;
dbit->next_slot = dbGetNextNonEmptySlot(dbit->db, dbit->slot, dbit->keyType);
return dbGetDictFromIterator(dbit);
}
When iterating to the last non-empty slot, next_slot is set to -1,
causing it to loop indefinitely on that slot. We need to modify the code
to ensure that after iterating to the last non-empty slot, it returns to
the first non-empty slot.
BTW, function tryResizeHashTables is actually iterating over slots
that have keys. However, in its implementation, it leverages the
dbIterator (which is a key iterator) to obtain slot and dictionary
information. While this approach works fine, but it is not very
intuitive. This PR also improves readability by changing the iteration
to directly iterate over slots, thereby enhancing clarity.
When we register notification or server event in RedisModule_OnLoad, but
RedisModule_OnLoad eventually fails, triggering notification or server
event
will cause the server to crash.
If the loading fails on a later stage of moduleLoad, we do call
moduleUnload
which handles all un-registration, but when it fails on the
RedisModule_OnLoad
call, we only un-register several specific things and these were
missing:
- moduleUnsubscribeNotifications
- moduleUnregisterFilters
- moduleUnsubscribeAllServerEvents
Refactored the code to reuse the code from moduleUnload.
Fixes#12808.