* Tested it on local instance. This was originally part of
https://github.com/valkey-io/valkey/pull/288 but I am pushing this
separately, so that we can easily merge it into the upcoming release.
```
lua debugger> server ping
<redis> ping
<reply> "+PONG"
lua debugger> redis ping
<redis> ping
<reply> "+PONG"
```
* I also searched for lua debugger related unit tests to add coverage
for this but did not find any relevant test to modify. Leaving it at
that for now.
---------
Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
ValkeyModuleEvent_MasterLinkChange was updated to use more inclusive
language, but it was done in the compatibility layer as well
(RedisModuleEvent_).
---------
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
Actually, this script doesn't do anything except printing that it is
replaced by redis-cli.
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Updated procedure redis_deferring_client in test environent to
valkey_deferring_client.
Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
This includes comments used for module API documentation.
* Strategy for replacement: Regex search: `(//|/\*| \*|#).* ("|\()?(r|R)edis( |\.
|'|\n|,|-|\)|")(?!nor the names of its contributors)(?!Ltd.)(?!Labs)(?!Contributors.)`
* Don't edit copyright comments
* Replace "Redis version X.X" -> "Redis OSS version X.X" to distinguish
from newly licensed repository
* Replace "Redis Object" -> "Object"
* Exclude markdown for now
* Don't edit Lua scripting comments referring to redis.X API
* Replace "Redis Protocol" -> "RESP"
* Replace redis-benchmark, -cli, -server, -check-aof/rdb with "valkey-"
prefix
* Most other places, I use best judgement to either remove "Redis", or
replace with "the server" or "server"
Fixes#148
---------
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Wait for cluster to be in a fully consistent and online state in
`cluster_config_consistent`. We expect the `start_server` to create the
desired primaries and replicas before the start of the tests. With the
current setup, the replicas may not complete the sync with primaries and
can be in loading state. In some cases, the role of replicas can still
be master with the delay of propagation of replicate command. The tests
can show flaky behavior in such cases. Add a check that verifies the
nodes health status 'online' for the cluster consistency. Leverage the
deterministic order of `CLUSTER SLOTS` to consider the cluster as
consistent along with the nodes health status.
---------
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Ram Prasad Voleti <ramvolet@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Ram Prasad Voleti <ramvolet@amazon.com>
Ref: https://github.com/redis/redis/pull/12760
### Description
#### Fixes compatibilty of PlaceholderKV cluster (7.2 - extensions
enabled by default) with older Redis cluster (< 7.0 - extensions not
handled) .
With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.
This fix introduces a flag `extensions_supported` on the clusterMsg
indicating the sender node can handle extensions parsing. Once, a
receiver nodes receives a message with this flag set to 1, it would
update clusterNode new field extensions_supported and start sending out
extensions if it has any.
This PR also introduces a DEBUG sub command to enable/disable cluster
message extensions `process-clustermsg-extensions` feature.
Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `extensions_supported` and then extensions message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).
### Testing
TCL test verifying the cluster state is healthy irrespective of
enabling/disabling cluster message extensions feature.
---------
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
This commit does not remove redis.call/pcall just yet. It also does not
rename Redis in error messages such as "Please specify at least one
argument for this redis lib call". This allows users to maintain full
backwards compatibility while introducing an option to use server.call
for new code.
I verified that the unit tests pass. Also manually verified that the
redis-server responds to server.call invocations within lua scripting.
Also verified that function registration works as expected.
```
[ok]: EVAL - is Lua able to call Redis API? (0 ms)
[ok]: EVAL - is Lua able to call Server API? (1 ms)
[ok]: EVAL - No arguments to redis.call/pcall is considered an error (0 ms)
[ok]: EVAL - No arguments to server.call/pcall is considered an error (1 ms)
```
---------
Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
This commit updates the following fields:
1. server_version -> valkey_version in server info. Since we would like
to advertise specific compatibility, we are making the version specific
to valkey. servername will remain as an optional indicator, and other
valkey compatible stores might choose to advertise something else.
1. We dropped redis-ver from the API. This isn't related to API
compatibility, but we didn't want to "fake" that valkey was creating an
rdb from a Redis version.
1. Renamed server-ver -> valkey_version in rdb info. Same as point one,
we want to explicitly indicate this was created by a valkey server.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
New info information to be used to determine the valkey versioning info.
Internally, introduce new define values for "SERVER_VERSION" which is
different from the Redis compatibility version, "REDIS_VERSION".
Add two new info fields:
`server_version`: The Valkey server version
`server_name`: Indicates that the server is valkey.
Add one new RDB field: `server_ver`, which indicates the valkey version
that produced the server.
Add 3 new LUA globals: `SERVER_VERSION_NUM`, `SERVER_VERSION`, and
`SERVER_NAME`. Which reflect the valkey version instead of the Redis
compatibility version.
Also clean up various places where Redis and configuration was being
used that is no longer necessary.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
This PR covers changing the redis.crt and redis.key to valkey certs for
TLS testing.
The files are generated by the gen-test-certs.sh script under tests/tls/.
Also covers comments provided.
Signed-off-by: hwware <wen.hui.ware@gmail.com>
Co-authored-by: hwware <wen.hui.ware@gmail.com>
Remove trademarked wording on configuration layer.
Following changes for release notes:
1. Rename redis.conf to valkey.conf
2. Pre-filled config in the template config file: Changing pidfile to `/var/run/valkey_6379.pid`
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Documentation references should use `Valkey` while server and cli
references are all under `valkey`.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
In `beginResultEmission`, -1 means the result length is not known in
advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it
will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`.
Although `dictExpand` won't succeed because the size overflows, I think
we'd better to avoid this wrong conversion.
This bug can be triggered when the source of `zrangestore` doesn't exist
or we use `zrangestore` command with `byscore` or `bylex`.
The impact is that dst keys will be converted to use skiplist instead of
listpack.
Users who abuse lua error_reply will generate a new error object on each
error call, which can make server.errors get bigger and bigger. This
will
cause the server to block when calling INFO (we also return errorstats
by
default).
To prevent the damage it can cause, when a misuse is detected, we will
print a warning log and disable the errorstats to avoid adding more new
errors. It can be re-enabled via CONFIG RESETSTAT.
Because server.errors may be very large (it may be better now since we
have the limit), config resetstat may block for a while. So in
resetErrorTableStats, we will try to lazyfree server.errors.
See the related discussion at the end of #8217.
After #13072, there is an use-after-free error. In expireScanCallback, we
will delete the dict, and then in dictScan we will continue to use the dict,
like we will doing `dictResumeRehashing(d)` in the end, this casued an error.
In this PR, in freeDictIfNeeded, if the dict's pauserehash is set, don't
delete the dict yet, and then when scan returns try to delete it again.
At the same time, we noticed that there will be similar problems in iterator.
We may also delete elements during the iteration process, causing the dict
to be deleted, so the part related to iter in the PR has also been modified.
dictResetIterator was also missing from the previous kvstoreIteratorNextDict,
we currently have no scenario that elements will be deleted in kvstoreIterator
process, deal with it together to avoid future problems. Added some simple
tests to verify the changes.
In addition, the modification in #13072 omitted initTempDb and emptyDbAsync,
and they were also added. This PR also remove the slow flag from the expire
test (consumes 1.3s) so that problems can be found in CI in the future.
In some cases, users will abuse lua eval. Each EVAL call generates
a new lua script, which is added to the lua interpreter and cached
to redis-server, consuming a large amount of memory over time.
Since EVAL is mostly the one that abuses the lua cache, and these
won't have pipeline issues (i.e. the script won't disappear
unexpectedly,
and cause errors like it would with SCRIPT LOAD and EVALSHA),
we implement a plain FIFO LRU eviction only for these (not for
scripts loaded with SCRIPT LOAD).
### Implementation notes:
When not abused we'll probably have less than 100 scripts, and when
abused we'll have many thousands. So we use a hard coded value of 500
scripts. And considering that we don't have many scripts, then unlike
keys, we don't need to worry about the memory usage of keeping a true
sorted LRU linked list. We compute the SHA of each script anyway,
and put the script in a dict, we can store a listNode there, and use
it for quick removal and re-insertion into an LRU list each time the
script is used.
### New interfaces:
At the same time, a new `evicted_scripts` field is added to
INFO, which represents the number of evicted eval scripts. Users
can check it to see if they are abusing EVAL.
### benchmark:
`./src/redis-benchmark -P 10 -n 1000000 -r 10000000000 eval "return
__rand_int__" 0`
The simple abuse of eval benchmark test that will create 1 million EVAL
scripts. The performance has been improved by 50%, and the max latency
has dropped from 500ms to 13ms (this may be caused by table expansion
inside Lua when the number of scripts is large). And in the INFO memory,
it used to consume 120MB (server cache) + 310MB (lua engine), but now
it only consumes 70KB (server cache) + 210KB (lua_engine) because of
the scripts eviction.
For non-abusive case of about 100 EVAL scripts, there's no noticeable
change in performance or memory usage.
### unlikely potentially breaking change:
in theory, a user can maybe load a
script with EVAL and then use EVALSHA to call it (by calculating the
SHA1 value on the client side), it could be that if we read the docs
carefully we'll realized it's a valid scenario, but we suppose it's
extremely rare. So it may happen that EVALSHA acts on a script created
by EVAL, and the script is evicted and EVALSHA returns a NOSCRIPT error.
that is if you have more than 500 scripts being used in the same
transaction / pipeline.
This solves the second point in #13102.
Allow using `+` as a special ID for last item in stream on XREAD
command.
This would allow to iterate on a stream with XREAD starting with the
last available message instead of the next one which `$` is used for.
I.e. the caller can use `BLOCK` and `+` on the first call, and change to
`$` on the next call.
Closes#7388
---------
Co-authored-by: Felipe Machado <462154+felipou@users.noreply.github.com>
Sometimes it's useful to compute a key's cluster slot in a module.
This API function is just like the command CLUSTER KEYSLOT (but faster).
A "reverse" API is also added:
`RedisModule_ClusterCanonicalKeyNameInSlot`. Given a slot, it returns a
short string that we can call a canonical key for the slot.
The check in fileIsManifest misjudged the manifest file. For example,
if resp aof contains "file", it will be considered a manifest file and
the check will fail:
```
*3
$3
set
$4
file
$4
file
```
In #12951, if the preamble aof also contains it, it will also fail.
Fixes#12951.
the bug was happening if the the word "file" is mentioned
in the first 1024 lines of the AOF. and now as soon as it finds
a non-comment line it'll break (if it contains "file" or doesn't)
Since lua_Number is not explicitly an integer or a double, we need to
make an effort
to convert it as an integer when that's possible, since the string could
later be used
in a context that doesn't support scientific notation (e.g. 1e9 instead
of 100000000).
Since fpconv_dtoa converts numbers with the equivalent of `%f` or `%e`,
which ever is shorter,
this would break if we try to pass a long integer number to a command
that takes integer.
we'll get an implicit conversion to string in Lua, and then the parsing
in getLongLongFromObjectOrReply will fail.
```
> eval "redis.call('hincrby', 'key', 'field', '1000000000')" 0
(nil)
> eval "redis.call('hincrby', 'key', 'field', tonumber('1000000000'))" 0
(error) ERR value is not an integer or out of range script: ac99c32e4daf7e300d593085b611de261954a946, on @user_script:1.
```
Switch to using ll2string if the number can be safely represented as a
long long.
The problem was introduced in #10587 (Redis 7.2).
closes#13113.
---------
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
`CONFIG SET oom-score-adj handles configuration failures` test failed in
some CI jobs today.
Failed CI: https://github.com/redis/redis/actions/runs/8152519326
Not sure why the github action's docker image perssions have changed,
but the issue is similar to #12887,
where we can't assume the range of oom_score_adj that a user can change.
## Solution:
Modify the way of determining whether the current user has no privileges
or not,
instead of relying on whether the user id is 0 or not.
After #13013
### This PR make effort to defrag the pubsub kvstore in the following
ways:
1. Till now server.pubsub(shard)_channels only share channel name obj
with the first subscribed client, now change it so that the clients and
the pubsub kvstore share the channel name robj.
This would save a lot of memory when there are many subscribers to the
same channel.
It also means that we only need to defrag the channel name robj in the
pubsub kvstore, and then update
all client references for the current channel, avoiding the need to
iterate through all the clients to do the same things.
2. Refactor the code to defragment pubsub(shard) in the same way as
defragment of keys and EXPIRES, with the exception that we only
defragment pubsub(without shard) when slot is zero.
### Other
Fix an overlook in #11695, if defragment doesn't reach the end time, we
should wait for the current
db's keys and expires, pubsub and pubsubshard to finish before leaving,
now it's possible to exit
early when the keys are defragmented.
---------
Co-authored-by: oranagra <oran@redislabs.com>
Sometimes we need to make fast judgement about why Redis is suddenly
taking more memory. One of the reasons is main DB's dicts doing
rehashing.
We may use `MEMORY STATS` to monitor the overhead memory of each DB, but
there still lacks a total sum to show an overall trend. So this PR adds
the total overhead of all DBs to `INFO MEMORY` section, together with
the total count of rehashing DB dicts, providing some intuitive metrics
about main dicts rehashing.
This PR adds the following metrics to INFO MEMORY
* `mem_overhead_db_hashtable_rehashing` - only size of ht[0] in
dictionaries we're rehashing (i.e. the memory that's gonna get released
soon)
and a similar ones to MEMORY STATS:
* `overhead.db.hashtable.lut` (complements the existing
`overhead.hashtable.main` and `overhead.hashtable.expires` which also
counts the `dictEntry` structs too)
* `overhead.db.hashtable.rehashing` - temporary rehashing overhead.
* `db.dict.rehashing.count` - number of top level dictionaries being
rehashed.
---------
Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
In XREADGROUP ACK, because streamPropagateXCLAIM does not propagate
entries-read, entries-read will be inconsistent between master and
replicas.
I.e. if no entries were claimed, it would have propagated correctly, but
if some
were claimed, then the entries-read field would be inconsistent on the
replica.
The fix was suggested by guybe7, call streamPropagateGroupID
unconditionally,
so that we will normalize entries_read on the replicas. In the past, we
would
only set propagate_last_id when NOACK was specified. And in #9127,
XCLAIM did
not propagate entries_read in ACK, which would cause entries_read to be
inconsistent between master and replicas.
Another approach is add another arg to XCLAIM and let it propagate
entries_read,
but we decided not to use it. Because we want minimal damage in case
there's an
old target and new source (in the worst case scenario, the new source
doesn't
recognize XGROUP SETID ... ENTRIES READ and the lag is lost. If we
change XCLAIM,
the damage is much more severe).
In this patch, now if the user uses XREADGROUP .. COUNT 1 there will be
an additional
overhead of MULTI, EXEC and XGROUPSETID. We assume the extra command in
case of
COUNT 1 (4x factor, changing from one XCLAIM to
MULTI+XCLAIM+XSETID+EXEC), is probably
ok since reading just one entry is in any case very inefficient (a
client round trip
per record), so we're hoping it's not a common case.
Issue was introduced in #9127.
Implement #12699
This PR exposing Lua os.clock() api for getting the elapsed time of Lua
code execution.
Using:
```lua
local start = os.clock()
...
do something
...
local elpased = os.clock() - start
```
---------
Co-authored-by: Meir Shpilraien (Spielrein) <meir@redis.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
Following #12568
In issue #9357, when inserting an element larger than 1GB, we currently
store it in a plain node instead of a listpack.
Presently, when we insert an element that exceeds the maximum size of a
packed node, it cannot be accommodated in any other nodes, thus ending
up isolated like a large element.
I.e. it's a node with only one element, but it's listpack encoded rather
than a plain buffer.
This PR lowers the threshold for considering an element as 'large' from
1GB to the maximum size of a node.
While this change doesn't completely resolve the bug mentioned in the
previous PR, it does mitigate its potential impact.
As a result of this change, we can now only use LSET to replace an
element with another element that falls below the maximum size
threshold.
In the worst-case scenario, with a fill of -5, the largest packed node
we can create is 2GB (32k * 64k):
* 32k: The smallest element in a listpack is 2 bytes, which allows us to
store up to 32k elements.
* 64k: This is the maximum size for a single quicklist node.
## Others
To fully fix#9357, we need more work, as discussed in #12568, when we
insert an element into a quicklistNode, it may be created in a new node,
put into another node, or merged, and we can't correctly delete the node
that was supposed to be deleted.
I'm not sure it's worth it, since it involves a lot of modifications.
Recently I saw in CI that reply-schemas-validator fails here:
```
Failed validating 'minimum' in schema[1]['properties']['groups']['items']['properties']['consumers']['items']['properties']['active-time']:
{'description': 'Last time this consumer was active (successful '
'reading/claiming).',
'minimum': 0,
'type': 'integer'}
On instance['groups'][0]['consumers'][0]['active-time']:
-1729380548878722639
```
The reason is that in fuzzer, we may restore corrupted active-time,
which will cause the reply schema CI to fail.
The fuzzer can cause corrupt the state in many places, which will
bugs that mess up the reply, so we decided to skip logreqres.
Also, seen-time is the same type as active-time, adding the minimum.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
There is a timing issue in the test, close may arrive late, or in
freeClientAsync we will free the client in async way, which will
lead to errors in watching_clients statistics, since we will only
unwatch all keys when we truly freeClient.
Add a wait here to avoid this problem. Also fixed some outdated
comments i saw. The test was introduced in #12966.
We can see that the past time here happens to be busy_time_limit,
causing the test to fail:
```
[err]: RM_Call from blocked client in tests/unit/moduleapi/blockedclient.tcl
Expected '50' to be more than '50' (context: type eval line 26 cmd {assert_morethan [expr [clock clicks -milliseconds]-$start] $busy_time_limit} proc ::test)
```
It is reasonable for them to be equal, so equal is added here.
It should be noted that in the previous `Busy module command` test,
we also used assert_morethan_equal, so this should have been missed
at the time.
Redis has some special commands that mark the client's state, such as
`subscribe` and `blpop`, which mark the client as `CLIENT_PUBSUB` or
`CLIENT_BLOCKED`, and we have metrics for the special use cases.
However, there are also other special commands, like `WATCH`, which
although do not have a specific flags, and should also be considered
stateful client types. For stateful clients, in many scenarios, the
connections cannot be shared in "connection pool", meaning connection
pool cannot be used. For example, whenever the `WATCH` command is
executed, a new connection is required to put the client into the "watch
state" because the watched keys are stored in the client.
If different business logic requires watching different keys, separate
connections must be used; otherwise, there will be contamination. This
also means that if a user's business heavily relies on the `WATCH`
command, a large number of connections will be required.
Recently we have encountered this situation in our platform, where some
users consume a significant number of connections when using Redis
because of `WATCH`.
I hope we can have a way to observe these special use cases and special
client connections. Here I add a few monitoring metrics:
1. `watching_clients` in `INFO` reply: The number of clients currently
in the "watching" state.
2. `total_watched_keys` in `INFO` reply: The total number of keys being
watched.
3. `watch` in `CLIENT LIST` reply: The number of keys each client is
currently watching.
These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```
Increase the range to avoid failures, and improve the comment to be
clearer.
tests was introduced in #13004.
This test fails occasionally:
```
*** [err]: CLIENT KILL maxAGE will kill old clients in tests/unit/introspection.tcl
Expected 2 == 1 (context: type eval line 14 cmd {assert {$res == 1}} proc ::test)
```
This test is very likely to do a false positive if the execute time
takes longer than the max age, for example, if the execution time
between sleep and kill exceeds 1s, rd2 will also be killed due to
the max age.
The test can adjust the order of execution statements to increase
the probability of passing, but this is still will be a timing issue
in some slow machines, so decided give it a few more chances.
The test was introduced in #12299.
The test fails here and there:
```
*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
scan didn't handle slot skipping logic.
```
There are two case:
1. In the case of passing the test, we use child process to avoid the
dict resize, but it can not completely limit it, since in the dictDelete
we still have chance to trigger the resize (hit the force radio). The
reason why our test passed before is because the expire dict is still
in the rehashing process, so the dictDelete, the dictShrinkIfNeeded can
not trigger the resize.
2. In the case of failing the test, the expire dict finished the
rehashing,
so the last dictDelete, the dictShrinkIfNeeded trigger the dict resize
since it hit the force radio, so the skipping logic fail.
This PR add a new DEBUG command to disbale the dict resize.