Redis TODO and Roadmap

VERSION 2.0 TODO
================

* BRPOPLPUSH
* List ops like L/RPUSH L/RPOP should return the new list length.
* Save dataset / fsync() on SIGTERM
* MULTI/EXEC should support the "EXEC FSYNC" form?
* BLPOP & co. tests (write a non-blocking Tcl client as a first step)
* ZCOUNT sortedset min max
* ZRANK: http://docs.google.com/viewer?a=v&q=cache:tCQaP3ZeN4YJ:courses.csail.mit.edu/6.046/spring04/handouts/ps5-sol.pdf+skip+list+rank+operation+augmented&hl=en&pid=bl&srcid=ADGEEShXuNjTcZyXw_1cq9OaWpSXy3PprjXqVzmM-LE0ETFznLyrDXJKQ_mBPNT10R8ErkoiXD9JbMw_FaoHmOA4yoGVrA7tZWiy393JwfCwuewuP93sjbkzZ_gnEp83jYhPYjThaIzw&sig=AHIEtbRF0GkYCdYRFtTJBE69senXZwFY0w
* Once ZRANK is implemented, change the implementation of ZCOUNT to use the augmented skiplist in order to be much faster.
* Write doc for ZCOUNT, and for open / closed intervals of sorted sets range operations.
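
The relation between ZRANK and ZCOUNT noted above can be sketched as follows. This is only an illustration, not the Redis implementation: a sorted Python list with `bisect` stands in for the augmented skip list, and the function name `zcount` and the `lo_open` / `hi_open` flags are assumptions made for this example. The point is that once a rank lookup is cheap, counting the scores in an interval is the difference of two ranks, and open or closed endpoints only change which side of equal scores the lookup lands on.

```python
from bisect import bisect_left, bisect_right


def zcount(scores, lo, hi, lo_open=False, hi_open=False):
    """Count the scores between lo and hi, with optionally open endpoints.

    `scores` must be sorted in ascending order; it stands in for the
    rank-augmented skip list (hypothetical helper, not a Redis API).
    """
    # Rank of the first element that falls inside the interval.
    start = bisect_right(scores, lo) if lo_open else bisect_left(scores, lo)
    # Rank of the first element that falls past the interval.
    end = bisect_left(scores, hi) if hi_open else bisect_right(scores, hi)
    # An inverted interval (lo > hi) yields a negative difference; clamp it.
    return max(0, end - start)


scores = [1.0, 2.0, 2.0, 3.0, 5.0]
print(zcount(scores, 2.0, 3.0))                # closed on both sides
print(zcount(scores, 2.0, 3.0, lo_open=True))  # excludes the two 2.0 scores
```

With a real augmented skip list both rank lookups are O(log N), so the count no longer depends on how many elements fall inside the interval.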

Virtual Memory sub-TODO:
* Check if the page selection algorithm is working well
* Divide swappability of objects by refcount
* Use multiple open FDs against the VM file, one per thread.
* It should be possible to give the vm-max-memory option in megabytes, gigabytes, ..., just using 2GB, 100MB, and so forth.
* Try to understand what can be moved into I/O threads that is currently handled by the main thread. For instance, scanning the swap file table to find contiguous pages could be a potential candidate (but I'm not convinced it's a good idea; better to improve the algorithm, for instance by doubling the fast forward at every step?).
* Possibly decrRefCount() against swapped objects can be moved into I/O threads, as it's a slow operation against million-element lists, and in general consumes CPU time that could be consumed by other threads (and cores).
* EXISTS should avoid loading the object if possible, without making the code too specialized.
* vm-min-age <seconds> option
* Make sure objects loaded from the VM are specially encoded when possible.
* Check what happens performance-wise if, instead of creating threads again and again, the same threads are reused forever. Note: this requires a way to disable these clients in the child, but waiting for an empty new jobs queue can be enough.
* Sets of integers are slow to load, for a number of reasons. Fix it (use the slow_sets.rdb file for debugging). P.S. this is now partially fixed.
* On EXEC try to block the client until relevant keys are loaded.
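
The vm-max-memory item above asks for human-friendly sizes such as 2GB or 100MB. A minimal sketch of such a parser follows. The function name `parse_memory`, the accepted unit set, and the choice of binary multipliers (1 KB = 1024 bytes) are all assumptions made for this example; the TODO item does not specify them.

```python
import re

# Assumed multipliers: binary units. The TODO item does not say whether
# units should be binary (1024-based) or decimal (1000-based).
_UNITS = {"": 1, "B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_memory(value):
    """Convert a string such as '100MB' or '2GB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", value)
    if match is None:
        raise ValueError(f"invalid memory size: {value!r}")
    number, unit = match.groups()
    try:
        return int(number) * _UNITS[unit.upper()]
    except KeyError:
        # Reject unknown suffixes instead of silently ignoring them.
        raise ValueError(f"unknown unit in memory size: {value!r}") from None


print(parse_memory("2GB"))
print(parse_memory("100MB"))
```

A plain number with no suffix is treated as a byte count, which would keep existing configuration values working unchanged.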

* Hashes (GET/SET/DEL/INCRBY/EXISTS/FIELDS/LEN/MSET/MGET). Special encoding for hashes with fewer than N elements.
* Write documentation for APPEND
* Implement LEN, SUBSTR, PEEK, POKE, SETBIT, GETBIT

VERSION 2.2 TODO (Fault tolerant sharding)
===========================================

* Redis-cluster, a fast intermediate layer (proxy) that implements consistent hashing and fault-tolerant node handling.

Interesting readings about this:

    - http://ayende.com/Blog/archive/2009/04/06/designing-rhino-dht-a-fault-tolerant-dynamically-distributed-hash.aspx
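
For readers unfamiliar with consistent hashing, the idea behind the proxy item above can be sketched like this. It is only an illustration under stated assumptions, not a design for Redis-cluster: the `HashRing` class, its method names, the use of SHA-256, and the default of 64 virtual points per node are all hypothetical choices made for the example.

```python
import hashlib
from bisect import bisect_right


class HashRing:
    """Toy consistent-hash ring mapping keys to node names (hypothetical)."""

    def __init__(self, nodes, replicas=64):
        self._replicas = replicas  # virtual points per node (assumed value)
        self._owners = {}          # ring point -> node name
        self._points = []          # sorted ring points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _point(key):
        # Map a string to a 64-bit position on the ring.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        for i in range(self._replicas):
            self._owners[self._point(f"{node}#{i}")] = node
        self._points = sorted(self._owners)

    def remove(self, node):
        # Dropping a failed node only removes its own points from the ring.
        self._owners = {p: n for p, n in self._owners.items() if n != node}
        self._points = sorted(self._owners)

    def node_for(self, key):
        if not self._points:
            raise LookupError("no nodes available")
        # First ring point clockwise from the key, wrapping around at the end.
        idx = bisect_right(self._points, self._point(key)) % len(self._points)
        return self._owners[self._points[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:1000"))
ring.remove("node-b")  # simulate a node failure
print(ring.node_for("user:1000"))
```

When a node is removed, only the keys it owned move to other nodes; every other key keeps its previous owner, which is what makes the scheme attractive for fault-tolerant sharding.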

VERSION 2.4 TODO (Optimizations and latency)
============================================

* Lower the CPU usage.
* Lower the RAM usage everywhere possible.
* Use epoll and the like to rewrite ae.c for Linux and other platforms supporting faster-than-select() multiplexing APIs.
* Implement a UDP interface for low-latency GET/SET operations.

OTHER IMPORTANT THINGS THAT WILL BE ADDED BUT I'M NOT SURE WHEN
===============================================================

BIG ONES:

* Specially encoded memory-saving integer sets.
* A command to export a JSON dump (there should be a mostly working patch needing major reworking).
* Specially encoded sets of integers (this includes a big refactoring providing a higher-level layer for Sets manipulation)

SMALL ONES:

* If sizeof(double) == sizeof(void*) we could store the double value of sorted sets directly in place of the pointer instead of allocating it in the heap.
* The delete-on-write policy for expiring keys should only be applied after argument parsing, for commands that do their own argument parsing.
* Give errors when incrementing a key that does not look like an integer, when providing as a sorted set score something that can't be parsed as a double, and so forth.
* MSADD (n keys) (n values). See this thread in the Redis Google group: http://groups.google.com/group/redis-db/browse_thread/thread/e766d84eb375cd41
* Don't save empty lists / sets / zsets on disk with snapshotting.
* Remove keys when a list / set / zset reaches length of 0.
* An option to exec a command slave-side if the master connection is lost. Even cooler: if the script returns "0" the slave elects itself as master, otherwise it continues trying to reconnect.

THE "MAYBE" TODO LIST: things that may or may not get implemented
=================================================================

Most of these can be seen just as proposals; the fact that they are in this
list is not a guarantee they'll ever get implemented ;)

* Move dict.c from hash table to skip list, in order to avoid the blocking resize operation needed for the hash table.
* FORK command (fork()s executing the commands received by the current
  client in the new process). Hint: large SORTs can use more cores,
  copy-on-write will avoid memory problems.
* DUP command? DUP srckey dstkey, creates an exact clone of srckey value in dstkey.
* SORT: Don't copy the list into a vector when BY argument is constant.
* Write the hash table size of every db in the dump, so that Redis can resize the hash table just one time when loading a big DB.
* LOCK / TRYLOCK / UNLOCK as described many times in the Google group
* Replication automated tests
* Byte Array type (BA prefixed commands): BASETBIT BAGETBIT BASETU8 U16 U32 U64 S8 S16 S32 S64, ability to atomically INCRBY all the base types. BARANGE to get a range of bytes as a bulk value, BASETRANGE to set a range of bytes.
* zmalloc() should avoid adding a private header on archs where there is some other libc-specific way to get the size of a malloced block. Already done for Mac OS X.
* Read-only mode.
* Pattern-matching replication.
* Add an option to relax the delete-expiring-keys-on-write semantic, *denying* replication and AOF when this is on? Can be handy sometimes, when using Redis for non-persistent state, but can create problems. For instance, should RENAME and MOVE also "move" the timeouts? How does this affect other commands?
* Multiple BY in SORT.