[ruby/yarp] Move fuzz docs into docs/ directory, reference in README

https://github.com/ruby/yarp/commit/97547b6192
2023-06-30 14:28:54 -04:00 · 2023-06-30 14:28:54 -04:00 · d49a92d036
--- a/doc/encodings.rdoc
+++ b/doc/encodings.rdoc
@ -146,7 +146,7 @@ The default encoding for a string created with method String.new is:

 - For a \String object argument, the encoding of that string.
 - For a string literal, the script encoding;
-  see {Script Encoding}[https://docs.ruby-lang.org/en/master/encodings_rdoc.html#label-Script+Encoding].
+  see {Script Encoding}[rdoc-ref:encodings.rdoc@Script+Encoding].

 In either case, any encoding may be specified:

--- a/docs/fuzzing.md
+++ b/docs/fuzzing.md
@ -0,0 +1,93 @@
+# Fuzzing
+
+We use fuzzing to test the various entrypoints to the library. The fuzzer we use is [AFL++](https://aflplus.plus). All files related to fuzzing live within the `fuzz` directory, which has the following structure:
+
+```
+fuzz
+├── corpus
+│   ├── parse             fuzzing corpus for parsing (a symlink to our fixtures)
+│   ├── regexp            fuzzing corpus for regexp
+│   └── unescape          fuzzing corpus for unescaping strings
+├── dict                  a AFL++ dictionary containing various tokens
+├── docker
+│   └── Dockerfile        for building a container with the fuzzer toolchain
+├── fuzz.c                generic entrypoint for fuzzing
+├── heisenbug.c           entrypoint for reproducing a crash or hang
+├── parse.c               fuzz handler for parsing
+├── parse.sh              script to run parsing fuzzer
+├── regexp.c              fuzz handler for regular expression parsing
+├── regexp.sh             script to run regexp fuzzer
+├── tools
+│   ├── backtrace.sh      generates backtrace files for a crash directory
+│   └── minimize.sh       generates minimized crash or hang files
+├── unescape.c            fuzz handler for unescape functionality
+└── unescape.sh           script to run unescape fuzzer
+```
+
+## Usage
+
+There are currently three fuzzing targets
+
+- `yp_parse_serialize` (parse)
+- `yp_regexp_named_capture_group_names` (regexp)
+- `yp_unescape_manipulate_string` (unescape)
+
+Respectively, fuzzing can be performed with
+
+```
+make fuzz-run-parse
+make fuzz-run-regexp
+make fuzz-run-unescape
+```
+
+To end a fuzzing job, interrupt with CTRL+C. To enter a container with the fuzzing toolchain and debug utilities, run
+
+```
+make fuzz-debug
+```
+
+# Out-of-bounds reads
+
+Currently, encoding functionality implementing the `yp_encoding_t` interface can read outside of inputs. For the time being, ASAN instrumentation is disabled for functions from src/enc. See `fuzz/asan.ignore`.
+
+To disable ASAN read instrumentation globally, use the `FUZZ_FLAGS` environment variable e.g.
+
+```
+FUZZ_FLAGS="-mllvm -asan-instrument-reads=false" make fuzz-run-parse
+```
+
+Note, that this may make reproducing bugs difficult as they may depend on memory outside of the input buffer. In that case, try
+
+```
+make fuzz-debug # enter the docker container with build tools
+make build/fuzz.heisenbug.parse # or .unescape or .regexp
+./build/fuzz.heisenbug.parse path-to-problem-input
+```
+
+# Triaging Crashes and Hangs
+
+Triaging crashes and hangs is easier when the inputs are as short as possible. In the fuzz container, an entire crash or hang directory can be minimized using
+
+```
+./fuzz/tools/minimize.sh <directory>
+```
+
+e.g.
+```
+./fuzz/tools/minimize.sh fuzz/output/parse/default/crashes
+```
+
+This may take a long time. In the the crash/hang directory, for each input file there will appear a minimized version with the extension `.min` appended.
+
+Backtraces for crashes (not hangs) can be generated en masse with
+
+```
+./fuzz/tools/backtrace.sh <directory>
+```
+
+Files with basename equal to the input file name with extension `.bt` will be created e.g.
+
+```
+id:000000,sig:06,src:000006+000190,time:8480,execs:18929,op:splice,rep:4
+id:000000,sig:06,src:000006+000190,time:8480,execs:18929,op:splice,rep:4.bt
+```