* Always split the corpus into a fixed number of parts
* Fix splitting
* Rewrite corpus splitting in Python
* Replace in Taskcluster
* Add tests
* Unify compression tool with Taskcluster
* Move zstd installation to docker image
* Disable opuscleaner in CI
* Compress chunks
* Fix file names
* Remove leading zeros from file index
* Start file index with 1
* Fix corpus splitting
* Add a link to an issue
* Generate script description from doc
* Use new test dir
* Test command line args
* Clarify expected files
* Add logging
* Initial integration of OpusCleaner
* Support custom filters
* Use OpusCleaner in the pipeline
* Fix env
* Fix filter generation
* Add more rules
* Fix ELRC filter
* Fix env
* Fix frequent patterns filter
* Switch to reading from stdin
* Add a feature flag for OpusCleaner
* Fix condition
* Add an extra test for non-empty files
* Integrate with Taskcluster
* Run linter
* Fix step config
* Fix command
* Fix path
* Update OpusCleaner
* Remove warning
* Log filtered length
* Add opuscleaner logs
* Add comments
* Fix using custom filters
* Extract function
* Change the CI target back
* Fix file path
* Replace conda with poetry
* Add doc
* Add more comments
* Rename example filter
* Test corpus
* Fix filter name
* Use OPUS dataset instead of mtdata
* Make CI faster
* Add sections to makefile
* Fix custom filter search
* Redirect stderr to stdout
* Fix usage of custom config
* Fix config name
* Change back to all