Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document the commands JSON files #403

Merged
merged 5 commits into from
May 2, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
222 changes: 214 additions & 8 deletions src/commands/README.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,219 @@
Command JSON files
==================

This directory contains JSON files, one for each command.

Each JSON contains all the information about the command itself, but these JSON files are not to be used directly!
Any third party who needs access to command information must get it from `COMMAND INFO` and `COMMAND DOCS`.
The output can be extracted in a JSON format by using `valkey-cli --json`, in the same manner as in `utils/generate-commands-json.py`.
Each JSON file contains all the information about the command itself. It is the
"single source of truth" (SSOT) for the command's metadata.

These JSON files were originally not intended to be used directly, since they
contain internals and some fields like "acl_categories" that are not the final
ACL categories. (Valkey will apply some implicit rules to compute the final ACL
zuiderkwast marked this conversation as resolved.
Show resolved Hide resolved
categories.) However, people see JSON files and use them directly anyway.

Any third party who needs access to command information were originally supposed
to get it from `COMMAND INFO` and `COMMAND DOCS`. These commands can be combined
into a JSON file by the script `utils/generate-commands-json.py`. Confusingly
enough, this JSON file as a slightly different format!

Structure
---------

Each JSON file contains an object with a single key. The key is the command name
in uppercase, e.g. "HSCAN" (hscan.json). The value is a JSON object with the
following keys. To be safe, assume all of them are optional.

* `"summary"`: a string with a short description of the command. One sentence.
* `"complexity"`: a string like `"O(1)"` or longer, like `"O(1) for every call. O(N) for a complete iteration, including enough command calls for the cursor to return back to 0. N is the number of elements inside the collection."`.
* `"group"`: a string used for categorization in documentation. One of these:
* `"bitmap"`
* `"cluster"`
* `"connection"`
* `"generic"`
* `"geo"`
* `"hash"`
* `"hyperloglog"`
* `"list"`
* `"pubsub"`
* `"scripting"`
* `"sentinel"`
* `"server"`
* `"set"`
* `"sorted_set"`
* `"stream"`
* `"string"`
* `"transactions"`
* `"since"`: a string with a version number, like "7.0.0". It's the version
(Redis OSS or Valkey) where the command was introduced.
* `"arity"`: The number of arguments, including the command name itself. A
negative number means "at least", e.g. -3 means at least 3.
* `"container"`: Only present for subcommands. See below.
* `"history"`: An array of changes, each change represented by a 2-element array
zuiderkwast marked this conversation as resolved.
Show resolved Hide resolved
on the form `[VERSION, DESCRIPTION]`. Omit if empty. Don't add an empty array.
* `"function"`: The name of the C function in Valkey's source code implementing
the command. (Don't use it for anything else.)
* `"command_flags"`: An array of flags represented as strings. Command flags:
* `"ADMIN"`
* `"ALLOW_BUSY"`
* `"ASKING"`
* `"BLOCKING"`
* `"DENYOOM"`
* `"FAST"`
* `"LOADING"`
* `"MAY_REPLICATE"`
* `"NO_ASYNC_LOADING"`
* `"NO_AUTH"`
* `"NO_MANDATORY_KEYS"`
* `"NO_MULTI"`
* `"NOSCRIPT"`
* `"ONLY_SENTINEL"`
* `"PROTECTED"`
* `"PUBSUB"`
* `"READONLY"`
* `"SENTINEL"`
* `"SKIP_MONITOR"`
* `"SKIP_SLOWLOG"`
* `"STALE"`
* `"TOUCHES_ARBITRARY_KEYS"`
* `"WRITE"`
* `"acl_categories"`: A list of ACL categories in uppercase. Note that the
effective ACL categies include "implicit ACL categories" explained below.
* `"ADMIN"`
* `"BITMAP"`
* `"CONNECTION"`
* `"DANGEROUS"`
* `"GEO"`
* `"HASH"`
* `"HYPERLOGLOG"`
* `"KEYSPACE"`
* `"LIST"`
* `"SCRIPTING"`
* `"SET"`
* `"SORTEDSET"`
* `"STREAM"`
* `"STRING"`
* `"TRANSACTION"`
* `"command_tips"`: Optional. A list of one or more of these strings:
* `"NONDETERMINISTIC_OUTPUT"`
* `"NONDETERMINISTIC_OUTPUT_ORDER"`
* `"REQUEST_POLICY:ALL_NODES"`
* `"REQUEST_POLICY:ALL_SHARDS"`
* `"REQUEST_POLICY:MULTI_SHARD"`
* `"REQUEST_POLICY:SPECIAL"`
* `"RESPONSE_POLICY:AGG_LOGICAL_AND"`
* `"RESPONSE_POLICY:AGG_MIN"`
* `"RESPONSE_POLICY:AGG_SUM"`
* `"RESPONSE_POLICY:ALL_SUCCEEDED"`
* `"RESPONSE_POLICY:ONE_SUCCEEDED"`
* `"RESPONSE_POLICY:SPECIAL"`
* `"key_specs"`: An array of key specifications. See below.
* `"reply_schema"`: A [JSON Schema](https://json-schema.org/) that describes the
reply of the command. This isn't complete. For example, JSON Schema can't
distinguish arrays from sets, commands returning a set are declared to return
an array.
* `"arguments"`: An array of arguments. Each argument is an object with the following keys:
* `"name"`: A string identifying the argument. It's unique among the arguments.
* `"type"`: The type of the argument.
* `"block"`: A group of arguments. The elements are in the key `"arguments"`.
* `"double"`: A number, not necessarily an integer.
* `"integer"`: An integer.
* `"key"`: A string representing a key in the database.
* `"oneof"`: One of a list of alternatives. The alternatives are in the key `"arguments"`.
* `"pattern"`: A string representing a glob-style pattern.
* `"pure-token"`: A fixed string. The string is in the key `"token"`.
* `"string"`: A string.
* `"unix-time"`: An integer representing a unix time in either seconds or milliseconds.
* `"arguments"`: A list with the same structure as its parent. Present if type is "block" or "oneof".
* `"display"`: ("entries-read", "key" or "pattern")
* `"key_spec_index"`: An index into the `"key_specs"` array. Only if `"type"` is `"key"`.
* `"multiple":` true if the argument can be repeated multiple times. Omitted means false.
* `"multiple_token"`: Unclear meaning. Maybe meaningless.
* `"optional":` True if the argument is optional. Omitted means false.
* `"since"`: Version (string) when the argument was introduced.
* `"token"`: A string indicating a fixed string value. This is always present
if type is "pure-token". If type is anything else, then `"token"` indicates
the argument is preceded by an extra (fixed string) argument.

Implicit ACL categories
-----------------------

The ACL categories specified as `"acl_categories"` are not the ones actually used.
The effective ACL categories are affected also by command flags.

The logic for this can be found in the function `setImplicitACLCategories()` in
`server.c`. Here are the rules (unless they have changed since this
documentation was written):

* Command flag WRITE implies ACL category WRITE.
* Command flag READONLY and not ACL category SCRIPTING implies ACL category READ.
"Exclude scripting commands from the RO category."
* Command flag ADMIN implies ACL categories ADMIN and DANGEROUS.
* Command flag PUBSUB implies ACL category PUBSUB.
* Command flag FAST implies ACL category FAST.
* Command flag BLOCKING implies ACL category BLOCKING.
* Not ACL category FAST implies ACL category SLOW. "If it's not fast, it's slow."

There's an issue about explicitly listing all categories, removing this
discrepancy: https://github.com/valkey-io/valkey/issues/417

Key specs
---------

Key specifications are specified in the array `"key_specs"` key of a command.

Each element in this array is an object with the following keys:

* `"flags"`: An array of strings indicating what kind of access is the command does on the key.
* `"ACCESS"`
* `"DELETE"`
* `"INCOMPLETE"`
* `"INSERT"`
* `"NOT_KEY"`
* `"OW"`
* `"RM"`
* `"RO"`
* `"RW"`
* `"UPDATE"`
* `"VARIABLE_FLAGS"`
* `"begin_search"`: How to find the first key used by this key spec. It's an
object with only one key. The key determines the method for finding the first
key. Here are the possible forms of the `"begin_search"` object:
* `{"index": {"pos": N}}`: The first key is at position N in the command line,
where 0 is the command name.
* `{"keyword": KEYWORD, "startfrom": N}`: The first key is found by searching
for an argument with the exact value KEYWORD starting from index N in the
command line. The first key is the argument after the keyword.
* `{"unknown": null}`: Finding the keys of this command is too complicated to
explain.
* `"find_keys"`: How to find the remainnig keys of this key spec. It's an object
on one of these forms:
* `{"range": {"lastkey": LAST, "step": STEP, "limit": LIMIT}}`: A range of keys.
* LAST: If LAST is positive, it's the index of the last key relative to the
first key. If last is negative, then -1 is the end of the whole command
line, -2 is the penultimate argument of the command, and so on.
* STEP: The number of arguments to skip to find the next one. Typically 1.
* LIMIT: If LAST is -1, we use the limit to stop the search by a factor. 0
and 1 mean no limit. 2 means half of the remaining arguments, 3 means a
third, and so on.

Commands with subcommands
-------------------------

Commands with subcommands are special. Examples of commands with subcommands are
`CLUSTER` and `ACL`. Their first argument is a subcommand which determines the
syntax of the rest the command, which is stored in its own JSON file.

For example `CLUSTER INFO` is stored in a file called `cluster-info.json`. The
toplevel key is called `"INFO"`. Within the body, there's a key called
`"container"` with the value `"CLUSTER"`. The file `cluster.json` exists by it
doesn't have an `"arguments"` key.

The JSON files are used to generate commands.def within this repo and JSON files for documentation, and
despite looking similar to the output of `COMMAND` there are some fields and flags that are implicitly populated, and that's the
reason one shouldn't rely on the raw files.
Appendix
--------

The `reply_schema` section is a standard JSON Schema (see https://json-schema.org/) that describes the reply of each command.
It is designed to someday be used to auto-generate code in client libraries, but is not yet mature and is not exposed externally.
How to list all the `group`, `command_flags` and `acl_categries`, etc. used in all these files:

cat *.json | jq '.[].group' | grep -F '"' | sed 's/^ *//;s/, *$//;s/^/ * `/;s/$/`/' | sort | uniq
cat *.json | jq '.[].command_flags' | grep -F '"' | sed 's/^ *//;s/, *$//;s/^/ * `/;s/$/`/' | sort | uniq
cat *.json | jq '.[].acl_categories' | grep -F '"' | sed 's/^ *//;s/, *$//;s/^/ * `/;s/$/`/' | sort | uniq
cat *.json | jq '.[].arguments[]?.type' | grep -F '"' | sed 's/^ *//;s/, *$//;s/^/ * `/;s/$/`/' | sort | uniq