Commit Graph

12724 Commits

Author SHA1 Message Date
Yanqi Lv
4986310945
Import-mode: Avoid expiration and eviction during data syncing (#1185)
New config: `import-mode (yes|no)`

New command: `CLIENT IMPORT-SOURCE (ON|OFF)`

The config, when set to `yes`, disables eviction and deletion of expired
keys, except for commands coming from a client which has marked itself
as an import-source, the data source when importing data from another
node, using the CLIENT IMPORT-SOURCE command.

When we sync data from the source Valkey to the destination Valkey using
some sync tools like
[redis-shake](https://github.com/tair-opensource/RedisShake), the
destination Valkey can perform expiration and eviction, which may cause
data corruption. This problem has been discussed in
https://github.com/redis/redis/discussions/9760#discussioncomment-1681041
and Redis already have a solution. But in Valkey we haven't fixed it by
now.

E.g. we call `set key 1 ex 1` on the source server and transfer this
command to the destination server. Then we call `incr key` on the source
server before the key expired, we will have a key on the source server
with a value of 2. But when the command arrived at the destination
server, the key may be expired and has deleted. So we will have a key on
the destination server with a value of 1, which is inconsistent with the
source server.

In standalone mode, we can use writable replica to simplify the sync
process. However, in cluster mode, we still need a sync tool to help us
transfer the source data to the destination. The sync tool usually work
as a normal client and the destination works as a primary which keep
expiration and eviction.

In this PR, we add a new mode named 'import-mode'. In this mode, server
stop expiration and eviction just like a replica. Notice that this mode
exists only in sync state to avoid data inconsistency caused by
expiration and eviction. Import mode only takes effect on the primary.
Sync tools can mark their clients as an import source by `CLIENT
IMPORT-SOURCE`, which work like a client from primary and can visit
expired keys in `lookupkey`.

**Notice: during the migration, other clients, apart from the import
source, should not access the data imported by import source.**

---------

Signed-off-by: lvyanqi.lyq <lvyanqi.lyq@alibaba-inc.com>
Signed-off-by: Yanqi Lv <lvyanqi.lyq@alibaba-inc.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-11-19 21:53:19 +01:00
Binbin
ee386c92ff
Manual failover vote is not limited by two times the node timeout (#1305)
This limit should not restrict manual failover, otherwise in some
scenarios, manual failover will time out.

For example, if some FAILOVER_AUTH_REQUESTs or some FAILOVER_AUTH_ACKs
are lost during a manual failover, it cannot vote in the second manual
failover. Or in a mixed scenario of plain failover and manual failover,
it cannot vote for the subsequent manual failover.

The problem with the manual failover retry is that the mf will pause
the client 5s in the primary side. So every retry every manual failover
timed out is a bad move.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-11-19 11:17:20 -05:00
Binbin
132798b57d
Receipt of REPLCONF VERSION reply should be triggered by event (#1320)
This add the missing return when repl_state change to RECEIVE_VERSION_REPLY,
this way we won’t be blocked if the primary doesn’t reply with REPLCONF
VERSION.

In practice i guess this is no likely to block in this context, reading
small responses are are likely to be received in one packet, so this
is just a cleanup (consistent with the previous state machine
processing).

Also update the state machine diagram to mention the VERSION reply.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-19 23:42:50 +08:00
Seungmin Lee
3d0c834203
Fix LRU crash when getting too many random lua scripts (#1310)
### Problem
Valkey stores scripts in a dictionary (lua_scripts) keyed by their SHA1
hashes, but it needs a way to know which scripts are least recently
used. It uses an LRU list (lua_scripts_lru_list) to keep track of
scripts in usage order. When the list reaches a maximum length, Valkey
evicts the oldest scripts to free memory in both the list and
dictionary. The problem here is that the sds from the LRU list can be
pointing to already freed/moved memory by active defrag that the sds in
the dictionary used to point to. It results in assertion error at [this
line](https://github.com/valkey-io/valkey/blob/unstable/src/eval.c#L519)

### Solution
If we duplicate the sds when adding it to the LRU list, we can create an
independent copy of the script identifier (sha). This duplication
ensures that the sha string in the LRU list remains stable and
unaffected by any defragmentation that could alter or free the original
sds. In addition, dictUnlink doesn't require exact pointer
match([ref](https://github.com/valkey-io/valkey/blob/unstable/src/eval.c#L71-L78))
so this change makes sense to unlink the right dictEntry with the copy
of the sds.

### Reproduce
To reproduce it with tcl test:
1. Disable je_get_defrag_hint in defrag.c to trigger defrag often
2. Execute test script
```
start_server {tags {"auth external:skip"}} {

    test {Regression for script LRU crash} {
        r config set activedefrag yes
        r config set active-defrag-ignore-bytes 1
        r config set active-defrag-threshold-lower 0
        r config set active-defrag-threshold-upper 1
        r config set active-defrag-cycle-min 99
        r config set active-defrag-cycle-max 99

        for {set i 0} {$i < 100000} {incr i} {
            r eval "return $i" 0
        }
        after 5000;
    }
}
```


### Crash info
Crash report:
```
=== REDIS BUG REPORT START: Cut & paste starting from here ===
14044:M 12 Nov 2024 14:51:27.054 # === ASSERTION FAILED ===
14044:M 12 Nov 2024 14:51:27.054 # ==> eval.c:556 'de' is not true

------ STACK TRACE ------

Backtrace:
/usr/bin/redis-server 127.0.0.1:6379 [cluster](luaDeleteFunction+0x148)[0x723708]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](luaCreateFunction+0x26c)[0x724450]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](evalCommand+0x2bc)[0x7254dc]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](call+0x574)[0x5b8d14]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](processCommand+0xc84)[0x5b9b10]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](processCommandAndResetClient+0x11c)[0x6db63c]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](processInputBuffer+0x1b0)[0x6dffd4]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x6bd968]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x659634]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](amzTLSEventHandler+0x194)[0x6588d8]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x750c88]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](aeProcessEvents+0x228)[0x757fa8]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](redisMain+0x478)[0x7786b8]
/lib64/libc.so.6(__libc_start_main+0xe4)[0xffffa7763da4]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x5ad3b0]
```
Defrag info:
```
mem_fragmentation_ratio:1.18
mem_fragmentation_bytes:47229992
active_defrag_hits:20561
active_defrag_misses:5878518
active_defrag_key_hits:77
active_defrag_key_misses:212
total_active_defrag_time:29009
```

### Test:
Run the test script to push 100,000 scripts to ensure the LRU list keeps
500 maximum length without any crash.
```
27489:M 14 Nov 2024 20:56:41.583 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.583 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
[ok]: Regression for script LRU crash (6811 ms)
[1/1 done]: unit/test (7 seconds)
```

---------

Signed-off-by: Seungmin Lee <sungming@amazon.com>
Signed-off-by: Seungmin Lee <155032684+sungming2@users.noreply.github.com>
Co-authored-by: Seungmin Lee <sungming@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-11-18 18:06:35 -08:00
Seungmin Lee
f9d0b87622
Upgrade macos-12 to macos-13 in workflows (#1318)
### Problem
GitHub Actions is starting the deprecation process for macOS 12.
Deprecation will begin on 10/7/24 and the image will be fully
unsupported by 12/3/24.
For more details, see
https://github.com/actions/runner-images/issues/10721

Signed-off-by: Seungmin Lee <sungming@amazon.com>
Co-authored-by: Seungmin Lee <sungming@amazon.com>
2024-11-18 18:00:30 -08:00
Amit Nagler
c5012cc630
Optimize RDB load performance and fix cluster mode resizing on replica side (#1199)
This PR addresses two issues:

1. Performance Degradation Fix - Resolves a significant performance
issue during RDB load on replica nodes.
- The problem was causing replicas to rehash multiple times during the
load process. Local testing demonstrated up to 50% degradation in BGSAVE
time.
- The problem occurs when the replica tries to expand pre-created slot
dictionaries. This operation fails quietly, resulting in undetected
performance issues.
- This fix aims to optimize the RDB load process and restore expected
performance levels.

2. Bug fix when reading `RDB_OPCODE_RESIZEDB` in Valkey 8.0 cluster
mode-
- Use the shard's master slots count when processing this opcode, as
`clusterNodeCoversSlot` is not initialized for the currently syncing
replica.
- Previously, this problem went unnoticed because `RDB_OPCODE_RESIZEDB`
had no practical impact (due to 1).

These improvements will enhance overall system performance and ensure
smoother upgrades to Valkey 8.0 in the future.

Testing:
- Conducted local tests to verify the performance improvement during RDB
load.
- Verified that ignoring `RDB_OPCODE_RESIZEDB` does not negatively
impact functionality in the current version.

Signed-off-by: naglera <anagler123@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-11-18 19:09:35 +08:00
Binbin
d07674fc01
Fix sds unittest tests to check for zmalloc_usable_size (#1314)
s_malloc_size == zmalloc_size, currently sdsAllocSize does not
calculate PREFIX_SIZE when no malloc_size available, this casue
test_typesAndAllocSize fail in the new unittest, what we want to
check is actually zmalloc_usable_size.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-18 14:55:26 +08:00
uriyage
94113fde7f
Improvements for TLS with I/O threads (#1271)
Main thread profiling revealed significant overhead in TLS operations,
even with read/write offloaded to I/O threads:

Perf results:

**10.82%** 8.82% `valkey-server libssl.so.3 [.] SSL_pending` # Called by
main thread after I/O completion

**10.16%** 5.06% `valkey-server libcrypto.so.3 [.] ERR_clear_error` #
Called for every event regardless of thread handling

This commit further optimizes TLS operations by moving more work from
the main thread to I/O threads:

Improve TLS offloading to I/O threads with two main changes:

1. Move `ERR_clear_error()` calls closer to SSL operations
   - Currently, error queue is cleared for every TLS event
   - Now only clear before actual SSL function calls
   - This prevents unnecessary clearing in main thread when operations
     are handled by I/O threads

2. Optimize `SSL_pending()` checks
   - Add `TLS_CONN_FLAG_HAS_PENDING` flag to track pending data
   - Move pending check to follow read operations immediately
   - I/O thread sets flag when pending data exists
   - Main thread uses flag to update pending list

Performance improvements:
Testing setup based on
https://valkey.io/blog/unlock-one-million-rps-part2/

Before:
- SET: 896,047 ops/sec
- GET: 875,794 ops/sec

After:
- SET: 985,784 ops/sec (+10% improvement)
- GET: 1,066,171 ops/sec (+22% improvement)

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-11-17 21:52:35 -08:00
Binbin
aa2dd3ecb8
Stabilize replica migration test to make sure cluster config is consistent (#1311)
CI report this failure:
```
[exception]: Executing test client: MOVED 1 127.0.0.1:22128.
MOVED 1 127.0.0.1:22128
    while executing
"wait_for_condition 1000 50 {
            [R 3 get key_991803] == 1024 && [R 3 get key_977613] == 10240 &&
            [R 4 get key_991803] == 1024 && ..."
```

This may be because, even though the cluster state becomes OK,
The cluster still has inconsistent configuration for a short period
of time. We make sure to wait for the config to be consistent.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-16 18:58:25 +08:00
Binbin
86f33ea2b0
Unprotect rdb channel when bgsave child fails in dual channel replication (#1297)
If bgsaveerr is error, there is no need to protect the rdb channel.
The impact of this may be that when bgsave fails, we will protect
the rdb channel for 60s. It may occupy the reference of the repl
buf block, making it impossible to recycle it until we free the
client due to COB or free the client after 60s.

We kept the RDB channel open as long as the replica hadn't established
a main connection, even if the snapshot process failed. There is no
value in keeping the RDB client in this case.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-15 16:48:13 +08:00
Binbin
92181b6797
Fix primary crash when processing dirty slots during shutdown wait / failover wait / client pause (#1131)
We have an assert in propagateNow. If the primary node receives a
CLUSTER UPDATE such as dirty slots during SIGTERM waitting or during
a manual failover pausing or during a client pause, the delKeysInSlot
call will trigger this assert and cause primary crash.

In this case, we added a new server_del_keys_in_slot state just like
client_pause_in_transaction to track the state to avoid the assert
in propagateNow, the dirty slots will be deleted in the end without
affecting the data consistency.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-11-15 16:47:15 +08:00
Binbin
4e2493e5c9
Kill diskless fork child asap when the last replica drop (#1227)
We originally checked the replica connection to whether to kill the
diskless child only when rdbPipeReadHandler is triggered. Actually
we can check it when the replica is disconnected, so that we don't
have to wait for rdbPipeReadHandler to be triggered and can kill
the forkless child as soon as possible.

In this way, when the child or rdbPipeReadHandler is stuck for some
reason, we can kill the child faster and release the fork resources.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-15 16:34:32 +08:00
Binbin
d3f3b9cc3a
Fix daily valgrind build with unit tests (#1309)
This was introduced in #515.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-15 14:27:28 +08:00
bentotten
b9994030e9
Log clusterbus handshake timeout failures (#1247)
This adds a log when a handshake fails for a timeout. This can help
troubleshoot cluster asymmetry issues caused by failed MEETs

---------

Signed-off-by: Ben Totten <btotten@amazon.com>
Signed-off-by: bentotten <59932872+bentotten@users.noreply.github.com>
Co-authored-by: Ben Totten <btotten@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-11-14 20:48:48 -08:00
Qu Chen
32f7541fe3
Simplify dictType callbacks and move some macros from dict.h to dict.c (#1281)
Remove the dict pointer argument to the `dictType` callbacks `keyDup`,
`keyCompare`, `keyDestructor` and `valDestructor`. This argument was
unused in all of the callback implementations.

The macros `dictFreeKey()` and `dictFreeVal()` are made internal to dict
and moved from dict.h to dict.c. They're also changed from macros to
static inline functions.

Signed-off-by: Qu Chen <quchen@amazon.com>
2024-11-14 09:45:47 +01:00
Parth
863d312803
Fix link-time optimization to work correctly for unit tests (i.e. -flto flag) (#1290) (#1296)
* We compile various c files into object and package them into library
(.a file) using ar to feed to unit tests. With new GCC versions, the
objects inside such library don't participate in LTO process without
additional flags.
* Here is a direct quote from gcc documentation explaining this issue:
"If you are not using a linker with plugin support and/or do not enable
the linker plugin, then the objects inside libfoo.a are extracted and
linked as usual, but they do not participate in the LTO optimization
process. In order to make a static library suitable for both LTO
optimization and usual linkage, compile its object files with
-flto-ffat-lto-objects."
* Read full documentation about -flto at
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
* Without this additional flag, I get following errors while executing
"make test-unit". With this change, those errors go away.

```
ARCHIVE libvalkey.a
ar: threads_mngr.o: plugin needed to handle lto object
...
..
.
/tmp/ccDYbMXL.ltrans0.ltrans.o: In function `dictClear':
/local/workplace/elasticache/valkey/src/unit/../dict.c:776: undefined
reference to `valkey_free'
/local/workplace/elasticache/valkey/src/unit/../dict.c:770: undefined
reference to `valkey_free'
/tmp/ccDYbMXL.ltrans0.ltrans.o: In function `dictGetVal':
```

Fixes #1290

---------

Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
2024-11-13 21:50:55 -08:00
skyfirelee
4a9864206f
Migrate quicklist unit test to new framework (#515)
Migrate quicklist unit test to new unit test framework, and cleanup
remaining references of SERVER_TEST, parent ticket #428.

Closes #428.

Signed-off-by: artikell <739609084@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-11-14 10:37:44 +08:00
Binbin
6fba747c39
Fix log printing always shows the role as child under daemonize (#1301)
In #1282, we init server.pid earlier to keep log message role
consistent, but we forgot to consider daemonize. In daemonize
mode, we will always print the child role.

We need to reset server.pid after daemonize(), otherwise the
log printing role will always be the child. It also causes a
incorrect server.pid value, affecting the concatenation of
some pid names.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-14 10:26:23 +08:00
Binbin
2df56d87c0
Fix empty primary may have dirty slots data due to bad migration (#1285)
If we become an empty primary for some reason, we still need to
check if we need to delete dirty slots, because we may have dirty
slots data left over from a bad migration. Like the target node forcibly
executes CLUSTER SETSLOT NODE to take over the slot without
performing key migration.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-11 22:13:47 +08:00
Binbin
a2d22c63c0
Fix replica not able to initate election in time when epoch fails (#1009)
If multiple primary nodes go down at the same time, their replica nodes will
initiate the elections at the same time. There is a certain probability that
the replicas will initate the elections in the same epoch.

And obviously, in our current election mechanism, only one replica node can
eventually get the enough votes, and the other replica node will fail to win
due the the insufficient majority, and then its election will time out and
we will wait for the retry, which result in a long failure time.

If another node has been won the election in the failover epoch, we can assume
that my election has failed and we can retry as soom as possible.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-11 22:12:49 +08:00
Binbin
167e8ab8de
Trigger the election immediately when doing a manual failover (#1081)
Currently when a manual failover is triggeded, we will set a
CLUSTER_TODO_HANDLE_FAILOVER to start the election as soon as
possible in the next beforeSleep. But in fact, we won't delay
the election in manual failover, waitting for the next beforeSleep
to kick in will delay the election a some milliseconds.

We can trigger the election immediately in this case in the
same function call, without waitting for beforeSleep, which
can save us some milliseconds.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-11 21:43:46 +08:00
Binbin
4aacffa32d
Stabilize dual replication test to avoid getting LOADING error (#1288)
When doing `$replica replicaof no one`, we may get a LOADING
error, this is because during the test execution, the replica
may reconnect very quickly, and the full sync is initiated,
and the replica has entered the LOADING state.

In this commit, we make sure the primary is pasued after the
fork, so the replica won't enter the LOADING state, and with
this fix, this test seems more natural and predictable.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-11 21:42:34 +08:00
Qu Chen
9300a7ebc8
Set fields to NULL after free in freeClient() (#1279)
Null out several references after freeing the object in `freeClient()`.

This is just to make the code more safe, to protect against
use-after-free for future changes.

Signed-off-by: Qu Chen <quchen@amazon.com>
2024-11-11 10:39:48 +01:00
zixuan zhao
0b5b2c7484
Log as primary role (M) instead of child process (C) during startup (#1282)
Init server.pid earlier to keep log message role consistent.

Closes #1206.

Before:
```text
24881:C 21 Oct 2024 21:10:57.165 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
24881:C 21 Oct 2024 21:10:57.165 * Valkey version=255.255.255, bits=64, commit=814e0f55, modified=1, pid=24881, just started
24881:C 21 Oct 2024 21:10:57.165 * Configuration loaded
24881:M 21 Oct 2024 21:10:57.167 * Increased maximum number of open files to 10032 (it was originally set to 1024).
```
After:
```text
68560:M 08 Nov 2024 16:10:12.257 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
68560:M 08 Nov 2024 16:10:12.257 * Valkey version=255.255.255, bits=64, commit=45d596e1, modified=1, pid=68560, just started
68560:M 08 Nov 2024 16:10:12.257 * Configuration loaded
68560:M 08 Nov 2024 16:10:12.258 * monotonic clock: POSIX clock_gettime
```

Signed-off-by: azuredream <zhaozixuan67@gmail.com>
2024-11-11 10:33:26 +01:00
zhenwei pi
45d596e121
RDMA: Use conn ref counter to prevent double close (#1250)
RDMA: Use connection reference counter style
    
The reference counter of connection is used to protect re-entry of closenmethod.
Use this style instead the unsafe one.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-11-08 09:33:01 +01:00
Jacob Murphy
e972d56460
Make sure to copy null terminator byte in dual channel code (#1272)
As @madolson pointed out, these do have proper null terminators. This
cleans them up to follow the rest of the code which copies the last byte
explicitly, which should help reduce cognitive load and make it more
resilient should code refactors occur (e.g. non-static allocation of
memory, changes to other functions).

---------

Signed-off-by: Jacob Murphy <jkmurphy@google.com>
2024-11-07 18:25:43 -08:00
eifrah-aws
07b3e7ae7a
Add CMake build system for valkey (#1196)
With this commit, users are able to build valkey using `CMake`.

## Example usage:

Build `valkey-server` in Release mode with TLS enabled and using
`jemalloc` as the allocator:

```bash
mkdir build-release
cd $_
cmake .. -DCMAKE_BUILD_TYPE=Release \
         -DCMAKE_INSTALL_PREFIX=/tmp/valkey-install \
         -DBUILD_MALLOC=jemalloc -DBUILD_TLS=1
make -j$(nproc) install

# start valkey
/tmp/valkey-install/bin/valkey-server
```

Build `valkey-unit-tests`:

```bash
mkdir build-release-ut
cd $_
cmake .. -DCMAKE_BUILD_TYPE=Release \
         -DBUILD_MALLOC=jemalloc -DBUILD_UNIT_TESTS=1
make -j$(nproc)

# Run the tests
./bin/valkey-unit-tests 
```

Current features supported by this PR:

- Building against different allocators: (`jemalloc`, `tcmalloc`,
`tcmalloc_minimal` and `libc`), e.g. to enable `jemalloc` pass
`-DBUILD_MALLOC=jemalloc` to `cmake`
- OpenSSL builds (to enable TLS, pass `-DBUILD_TLS=1` to `cmake`)
- Sanitizier: pass `-DBUILD_SANITIZER=<address|thread|undefined>` to
`cmake`
- Install target + redis symbolic links
- Build `valkey-unit-tests` executable
- Standard CMake variables are supported. e.g. to install `valkey` under
`/home/you/root` pass `-DCMAKE_INSTALL_PREFIX=/home/you/root`

Why using `CMake`? To list *some* of the advantages of using `CMake`:

- Superior IDE integrations: cmake generates the file
`compile_commands.json` which is required by `clangd` to get a compiler
accuracy code completion (in other words: your VScode will thank you)
- Out of the source build tree: with the current build system, object
files are created all over the place polluting the build source tree,
the best practice is to build the project on a separate folder
- Multiple build types co-existing: with the current build system, it is
often hard to have multiple build configurations. With cmake you can do
it easily:
- It is the de-facto standard for C/C++ project these days

More build examples: 

ASAN build:

```bash
mkdir build-asan
cd $_
cmake .. -DBUILD_SANITIZER=address -DBUILD_MALLOC=libc
make -j$(nproc)
```

ASAN with jemalloc:

```bash
mkdir build-asan-jemalloc
cd $_
cmake .. -DBUILD_SANITIZER=address -DBUILD_MALLOC=jemalloc 
make -j$(nproc)
```

As seen by the previous examples, any combination is allowed and
co-exist on the same source tree.

## Valkey installation

With this new `CMake`, it is possible to install the binary by running
`make install` or creating a package `make package` (currently supported
on Debian like distros)

### Example 1: build & install using `make install`:

```bash
mkdir build-release
cd $_
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/valkey-install -DCMAKE_BUILD_TYPE=Release
make -j$(nproc) install
# valkey is now installed under $HOME/valkey-install
```

### Example 2: create a `.deb` installer:

```bash
mkdir build-release
cd $_
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc) package
# ... CPack deb generation output
sudo gdebi -n ./valkey_8.1.0_amd64.deb
# valkey is now installed under /opt/valkey
```

### Example 3: create installer for non Debian systems (e.g. FreeBSD or
macOS):

```bash
mkdir build-release
cd $_
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc) package
mkdir -p /opt/valkey && ./valkey-8.1.0-Darwin.sh --prefix=/opt/valkey  --exclude-subdir
# valkey-server is now installed under /opt/valkey

```

Signed-off-by: Eran Ifrah <eifrah@amazon.com>
2024-11-07 18:01:37 -08:00
Wen Hui
3672f9b2c3
Revert "Decline unsubscribe related command in non-subscribed mode" (#1265)
This PR goal is to revert the changes on PR
https://github.com/valkey-io/valkey/pull/759

Recently, we got some reports that in Valkey 8.0 the PR
https://github.com/valkey-io/valkey/pull/759 (Decline unsubscribe
related command in non-subscribed mode) causes break change.
(https://github.com/valkey-io/valkey/issues/1228)

Although from my thought, call commands "unsubscribeCommand",
"sunsubscribeCommand", "punsubscribeCommand" in request-response mode
make no sense. This is why I created PR
https://github.com/valkey-io/valkey/pull/759

But breaking change is always no good, @valkey-io/core-team How do you
think we revert this PR code changes?

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-11-07 20:05:16 -05:00
Binbin
1c18c80844
Fix incorrect cache_memory reset in functionsLibCtxClear (#1255)
functionsLibCtxClear should clear the provided lib_ctx parameter,
not the static variable curr_functions_lib_ctx, as this contradicts
the function's intended purpose.

The impact i guess is minor, like in some unhappy paths (diskless load
fails, function restore fails?), we will mess up the functions_caches
field, which is used in used_memory_functions / used_memory_scripts
fileds in INFO.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-07 13:44:21 +08:00
Binbin
22bc49c4a6
Try to stabilize the failover call in the slot migration test (#1078)
The CI report replica will return the error when performing CLUSTER
FAILOVER:
```
-ERR Master is down or failed, please use CLUSTER FAILOVER FORCE
```

This may because the primary state is fail or the cluster connection
is disconnected during the primary pause. In this PR, we added some
waits in wait_for_role, if the role is replica, we will wait for the
replication link and the cluster link to be ok.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-07 13:42:20 +08:00
Binbin
a0b1cbad83
Change errno from EEXIST to EALREADY in serverFork if child process exists (#1258)
We set this to EEXIST in 568c2e039b,
it prints "File exists" which is not quite accurate,
change it to EALREADY, it will print "Operation already in progress".

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-07 12:13:00 +08:00
Binbin
12c5af03b8
Remove empty DB check branch in KEYS command (#1259)
We don't think we really care about optimizing for the empty DB case,
which should be uncommon. Adding branches hurts branch prediction.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-06 10:32:00 +08:00
Amit Nagler
48ebe21ad1
fix: clean up refactoring leftovers (#1264)
This commit addresses issues that were likely introduced during a rebase
related to:
b0f23df165

Change dual channel replication state in main handler only

Signed-off-by: naglera <anagler123@gmail.com>
2024-11-05 04:57:34 -08:00
Madelyn Olson
3c32ee1bda
Add a filter option to drop all cluster packets (#1252)
A minor debugging change that helped in the investigation of
https://github.com/valkey-io/valkey/issues/1251. Basically there are
some edge cases where we want to fully isolate a note from receiving
packets, but can't suspend the process because we need it to continue
sending outbound traffic. So, added a filter for that.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-11-04 12:36:20 -08:00
Binbin
a102852d5e
Fix timing issue in cluster-shards tests (#1243)
The cluster-node-timeout is 3000 in our tests, the timing test wasn't
succeeding, so extending the wait_for made them much more reliable.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-02 19:51:14 +08:00
Jim Brunner
0d7b2344b2
correct type internal to kvstore (minor) (#1246)
All of the internal variables related to number of dicts in the kvstore
are type `int`. Not sure why these 2 items were declared as `long long`.

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
2024-11-01 15:16:18 -07:00
zhenwei pi
e985ead7f9
RDMA: Prevent IO for child process (#1244)
RDMA MR (memory region) is not forkable, the VMA (virtual memory area)
of a MR gets empty in a child process. Prevent IO for child process to
avoid server crash.

In the check for whether read and write is allowed in an RDMA
connection, a check that if we're in a child process is added. If we
are, the function returns an error, which will cause the RDMA client to
be disconnected.

Suggested-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-11-01 13:28:09 +01:00
Madelyn Olson
1c222f77ce
Improve performance of sdssplitargs (#1230)
The current implementation of `sdssplitargs` does repeated `sdscatlen`
to build the parsed arguments, which isn't very efficient because it
does a lot of extra reallocations and moves through the sds code a lot.
It also typically results in memory overhead, because `sdscatlen`
over-allocates, which is usually not needed since args are usually not
modified after being created.

The new implementation of sdssplitargs does two passes, the first to
parse the argument to figure out the final length and the second to
actually copy the string. It's generally about 2x faster for larger
strings (~100 bytes), and about 20% faster for small strings (~10
bytes). This is generally faster since as long as everything is in the
CPU cache, it's going to be fast.

There are a couple of sanity tests, none existed before, as well as some
fuzzying which was used to find some bugs and also to do the
benchmarking. The original benchmarking code can be seen
6576aeb86a.

```
test_sdssplitargs_benchmark - unit/test_sds.c:530] Using random seed: 1729883235
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 56.44%, new:13039us, old:29930us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 56.58%, new:12057us, old:27771us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 59.18%, new:9048us, old:22165us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 54.61%, new:12381us, old:27278us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 51.17%, new:16012us, old:32793us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 49.18%, new:16041us, old:31563us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 58.40%, new:12450us, old:29930us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 56.49%, new:13066us, old:30031us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 58.75%, new:12744us, old:30894us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 52.44%, new:16885us, old:35504us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 62.57%, new:8107us, old:21659us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 62.12%, new:8320us, old:21966us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 45.23%, new:13960us, old:25487us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 57.95%, new:9188us, old:21849us
```

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-10-31 11:37:53 -07:00
Masahiro Ide
91cbf77442
Eliminate snprintf usage at setDeferredAggregateLen (#1234)
to align with how we encode the length at `_addReplyLongLongWithPrefix`

Signed-off-by: Masahiro Ide <masahiro.ide@lycorp.co.jp>
Co-authored-by: Masahiro Ide <masahiro.ide@lycorp.co.jp>
2024-10-31 11:30:05 -07:00
zhenwei pi
ab98f375db
RDMA: Delete keepalive timer on closing (#1237)
Typically, RDMA connection gets closed by client side, the server side
handles diconnected CM event, and delete keepalive timer correctly.
However, the server side may close connection voluntarily, for example
the maxium connections exceed. Handle this case to avoid invalid memory
access.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-10-30 11:12:42 +01:00
Binbin
789a73b0d0
Minor fix to debug logging in replicationFeedStreamFromPrimaryStream (#1235)
We should only print logs when hide-user-data-from-log is off.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-30 10:25:50 +08:00
Shivshankar
13f5f665f2
Update the argument of clusterNodeGetReplica declaration (#1239)
clusterNodeGetReplica agrumnets are missed to migrate during the slave
to replication migration so updated the argument slave to replica.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-30 00:19:56 +01:00
Madelyn Olson
5a4c0640ce
Mark main and serverAssert as weak symbols to be overridden (#1232)
At some point unit tests stopped building on MacOS because of duplicate
symbols. I had originally solved this problem by using a flag that
overrides symbols, but the much better solution is to mark the duplicate
symbols as weak and they can be overridden during linking. (Symbols by
default are strong, strong symbols override weak symbols)

I also added macos unit build to the CI, so that this doesn't silently
break in the future again.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-29 14:26:17 -07:00
zixuan zhao
8ee7a58025
Document log format configs in valkey.conf (#1233)
Add config options for log format and timestamp format introduced by
#1022
Related to #1225

This change adds two new configs into valkey.conf:
log-format
log-timestamp-format

---------

Signed-off-by: azuredream <zhaozixuan67@gmail.com>
2024-10-29 11:13:30 +01:00
Lipeng Zhu
c21f1dc084
Increase the IO_THREADS_MAX_NUM. (#1220)
### Description

This patch try to increase the max number of io-threads from 16(128) to
256 for below reasons:

1. The core number increases a lot in the modern server processors, for
example, the [Sierra
Forest](https://en.wikipedia.org/wiki/Sierra_Forest) processors are
targeted towards with up to **288** cores.
Due to limitation of **_io-threads_** number (16 and 128 ), benchmark
like https://openbenchmarking.org/test/pts/valkey even cannot run on a
high core count server.

2. For some workloads, the bottleneck could be main thread, but for the
other workloads, big key/value which caused heavy io, the bottleneck
could be the io-threads, for example benchmark `memtier_benchmark -s
127.0.0.1 -p 9001 "--data-size" "20000" --ratio 1:0 --key-pattern P:P
--key-minimum=1 --key-maximum 1000000 --test-time 180 -c 50 -t 16
--hide-histogram`. The QPS is still scalable after 16 io-threads.

![image](https://github.com/user-attachments/assets/e980f805-a162-44be-b03e-ab37a9c489cf)
**Fig 1. QPS Scale factor with io-threads number grows.**

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
2024-10-27 22:43:23 -07:00
Binbin
5d2ff853a3
Fix minor repldbfd leak in updateReplicasWaitingBgsave if fstat fails (#1226)
In the old code, if fstat fails, replica->repldbfd will hold the
fd and we are doing a free client. And in freeClient, we check and
close only if repl_state == REPLICA_STATE_SEND_BULK. So if fstat
fails, we will leak the fd.

We can also extend freeClient to handle REPLICA_STATE_WAIT_BGSAVE_END
as well, but here seems to be a more friendly (and safer) way.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-27 15:23:00 +08:00
Shivshankar
4be09e434a
Fix typo in valkey.conf file's shutdown section (#1224)
Found typo "exists" ==> "exits" in valkey.conf in shutdown section.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-25 14:03:59 +02:00
Lipeng Zhu
9c60fcdae2
Do security attack check only when command not found to reduce the critical path (#1212)
When explored the cycles distribution for main thread with io-threads
enabled. We found this security attack check takes significant time in
main thread, **~3%** cycles were used to do the commands security check
in main thread.

This patch try to completely avoid doing it in the hot path. We can do
it only after we looked up the command and it wasn't found, just before
we call commandCheckExistence.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
2024-10-25 11:13:28 +02:00
zixuan zhao
55bbbe09a3
Configurable log and timestamp formats (logfmt, ISO8601) (#1022)
Add ability to configure log output format and timestamp format in the
logs.

This change adds two new configs:

* `log-format`: Either legacy or logfmt (See https://brandur.org/logfmt)
* `log-timestamp-format`: legacy, iso8601 or milliseconds (since the
eppch).

Related to #1006.

Example:

```
$ ./valkey-server  /home/zhaoz12/git/valkey/valkey/valkey.conf
pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=warning message="WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect."
pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo"
pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="Valkey version=255.255.255, bits=64, commit=affbea5d, modified=1, pid=109463, just started"
pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="Configuration loaded"
pid=109463 role=master timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="monotonic clock: POSIX clock_gettime"
pid=109463 role=master timestamp="2024-09-10T20:37:25.739-04:00" level=warning message="Failed to write PID file: Permission denied"
```

---------

Signed-off-by: azuredream <zhaozixuan67@gmail.com>
2024-10-25 00:36:32 +02:00
Binbin
2956367731
Maintain return value of rdbSaveDb after writing slot-info aux (#1222)
All other places written in this function are maintained it,
although the caller of rdbSaveDb does not reply on it, it is
maintained to be consistent with other places, is its duty.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-24 09:53:05 -04:00