valkey/tests/integration/replication-2.tcl
Oran Agra ae89958972
Set repl-diskless-sync to yes by default, add repl-diskless-sync-max-replicas (#10092)
1. enable diskless replication by default
2. add a new config named repl-diskless-sync-max-replicas that enables
   replication to start before the full repl-diskless-sync-delay is
   reached.
3. put replica online sooner on the master (see below)
4. test suite uses repl-diskless-sync-delay of 0 to be faster
5. a few tests that use multiple replicas on a pre-populated master now
   use the new repl-diskless-sync-max-replicas
6. fix possible timing issues in a few cluster tests (see below)
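
As a rough illustration of item 2, the early-start rule can be modeled as
below. This is a hypothetical sketch, not the server's code: the proc name
and its arguments are made up for illustration, and it assumes that a
repl-diskless-sync-max-replicas value of 0 means "always wait the full
delay".

```tcl
# Hypothetical model of the decision described above; not the server's code.
# waiting      - replicas currently waiting for a full sync
# elapsed_sec  - seconds since the first of them started waiting
# delay_sec    - value of repl-diskless-sync-delay
# max_replicas - value of repl-diskless-sync-max-replicas (0 = no early start)
proc should_start_diskless_sync {waiting elapsed_sec delay_sec max_replicas} {
    # Nothing to do while no replica is waiting.
    if {$waiting == 0} { return 0 }
    # The full delay has passed: start regardless of how many are waiting.
    if {$elapsed_sec >= $delay_sec} { return 1 }
    # Early start: the expected number of replicas has already connected.
    if {$max_replicas > 0 && $waiting >= $max_replicas} { return 1 }
    return 0
}

puts [should_start_diskless_sync 1 1 5 2]
puts [should_start_diskless_sync 2 1 5 2]
puts [should_start_diskless_sync 1 0 0 0]
```

The last call mirrors item 4: with a delay of 0 the sync starts as soon as
one replica is waiting, which is why the test suite runs faster.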

put replica online sooner on the master 
----------------------------------------------------
there were two tests that failed because they needed the master to
realize that the replica is online, but the test code was actually only
waiting for the replica to realize it's online, and with diskless sync
that could happen before the master realized it.

changes include two things:
1. the tests now wait on the right thing
2. fix issues in the master caused by putting the replica online in two
   steps.

the master used to put the replica online in 2 steps. the first
step was to mark it as online, and the second step was to enable the
write event (only after getting ACK), but in fact the first step didn't
contain some of the tasks needed to put it online (like updating the good
slave count, and sending the module event). this meant that if a test was
waiting to see that the replica is online from the point of view of the
master, and then confirm that the module got an event, or that the
master has enough good replicas, it could fail due to timing issues.

so now the full effect of putting the replica online happens at once,
and only the part about enabling the writes is delayed till the ACK.
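
To make "online from the point of view of the master" concrete, here is a
minimal sketch of reading a replica's state out of the master's
INFO replication text. The proc name is hypothetical and the sample text is
hand-written; it assumes the usual `slaveN:ip=...,port=...,state=...` line
layout and is not the helper the test suite itself uses.

```tcl
# Hypothetical helper: return the state the master reports for one replica,
# or an empty string if the master does not list that replica at all.
proc replica_state_on_master {info_text replica_id} {
    foreach line [split $info_text "\n"] {
        # INFO lines end with CRLF; trim removes the trailing CR.
        set line [string trim $line]
        if {[string match "slave${replica_id}:*" $line]} {
            # Everything after the first colon is comma-separated key=value.
            set fields [string range $line [expr {[string first ":" $line] + 1}] end]
            foreach kv [split $fields ","] {
                lassign [split $kv "="] key value
                if {$key eq "state"} { return $value }
            }
        }
    }
    return ""
}

# Hand-written sample of what a master might report (an assumption).
set sample "# Replication\r\nrole:master\r\nconnected_slaves:1\r\nslave0:ip=127.0.0.1,port=6380,state=online,offset=14,lag=0\r\n"
puts [replica_state_on_master $sample 0]
```

A test that polls this value on the master, rather than
master_link_status on the replica, waits on the side that actually updates
the good replica count.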

fix cluster tests 
--------------------
I added some code to wait for the replica to sync and avoid race
conditions.
later realized the sentinel and cluster tests were using the original
5-second delay, so changed it to 0.

this means the other changes are probably not needed, but i suppose
they're still better (avoid race conditions)
2022-01-17 14:11:11 +02:00

94 lines
3.3 KiB
Tcl

start_server {tags {"repl external:skip"}} {
    start_server {} {
        test {First server should have role slave after SLAVEOF} {
            r -1 slaveof [srv 0 host] [srv 0 port]
            # Wait until the master reports the replica as online, then
            # confirm the replica itself sees the link as up.
            wait_replica_online r
            wait_for_condition 50 100 {
                [s -1 master_link_status] eq {up}
            } else {
                fail "Replication not started."
            }
        }

        test {If min-slaves-to-write is honored, write is accepted} {
            r config set min-slaves-to-write 1
            r config set min-slaves-max-lag 10
            r set foo 12345
            wait_for_condition 50 100 {
                [r -1 get foo] eq {12345}
            } else {
                fail "Write did not reach replica"
            }
        }

        test {No write if min-slaves-to-write is < attached slaves} {
            # Only one replica is attached, so requiring two good replicas
            # must make the master refuse the write.
            r config set min-slaves-to-write 2
            r config set min-slaves-max-lag 10
            catch {r set foo 12345} err
            set err
        } {NOREPLICAS*}

        test {If min-slaves-to-write is honored, write is accepted (again)} {
            r config set min-slaves-to-write 1
            r config set min-slaves-max-lag 10
            r set foo 12345
            wait_for_condition 50 100 {
                [r -1 get foo] eq {12345}
            } else {
                fail "Write did not reach replica"
            }
        }

        test {No write if min-slaves-max-lag is > of the slave lag} {
            r config set min-slaves-to-write 1
            r config set min-slaves-max-lag 2
            # Pause the replica process so it stops acknowledging and its
            # lag eventually exceeds min-slaves-max-lag.
            exec kill -SIGSTOP [srv -1 pid]
            # The first write is still accepted: the lag has not grown yet.
            assert {[r set foo 12345] eq {OK}}
            wait_for_condition 100 100 {
                [catch {r set foo 12345}] != 0
            } else {
                fail "Master didn't become readonly"
            }
            catch {r set foo 12345} err
            assert_match {NOREPLICAS*} $err
        }
        # Resume the replica paused by the previous test.
        exec kill -SIGCONT [srv -1 pid]

        test {min-slaves-to-write is ignored by slaves} {
            r config set min-slaves-to-write 1
            r config set min-slaves-max-lag 10
            r -1 config set min-slaves-to-write 1
            r -1 config set min-slaves-max-lag 10
            r set foo aaabbb
            wait_for_condition 50 100 {
                [r -1 get foo] eq {aaabbb}
            } else {
                fail "Write did not reach replica"
            }
        }

        # Fix parameters for the next test to work
        r config set min-slaves-to-write 0
        r -1 config set min-slaves-to-write 0
        r flushall

        test {MASTER and SLAVE dataset should be identical after complex ops} {
            createComplexDataset r 10000
            after 500
            # On mismatch, dump both datasets to files for manual diffing
            # before the final assertion fails the test.
            if {[r debug digest] ne [r -1 debug digest]} {
                set csv1 [csvdump r]
                set csv2 [csvdump {r -1}]
                set fd [open /tmp/repldump1.txt w]
                puts -nonewline $fd $csv1
                close $fd
                set fd [open /tmp/repldump2.txt w]
                puts -nonewline $fd $csv2
                close $fd
                puts "Master - Replica inconsistency"
                puts "Run diff -u against /tmp/repldump*.txt for more info"
            }
            assert_equal [r debug digest] [r -1 debug digest]
        }
    }
}