1. Actually remove merged states in MergeTrees(). Not doing this led to slightly worse performance
2. TryComputePoint() contained a bug: it didn't traverse some transitions in the automaton.
* FindStronglyConnectedComponents() incorrectly used an array to store state info. When the
epsilon closure of a state was constructed, only a very small fraction of states was traversed,
but the array was allocated for all states in the automaton. This created huge GC pressure.
* SetToEpsilonClosure() is also now recursion-free
* Automaton simplification now handles the important case of eliminating epsilon transitions
TryDeterminize() tried to do something even if it was already known that the automaton is determinized
or non-determinizable.
Because automaton state is immutable, it is possible to store the determinization state alongside it.
There are 3 states:
* Unknown - TryDeterminize() was never called for this automaton
* IsDeterminized - TryDeterminize() successfully determinized automaton
* IsNonDeterminizable - TryDeterminize() was called but didn't succeed.
Because the determinization state depends on the maximum number of states,
the `TryDeterminize(int maxStatesCount)` overload was removed in favour of using defaults.
This overload was never used in practice.
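
A minimal sketch of the caching idea described above. All type and member names
(`AutomatonDataSketch`, `DeterminizationSketch`, the `attempt` delegate) are hypothetical
and only illustrate the three-state flag; this is not the library's actual code.

```csharp
using System;

// Hypothetical illustration types; names are assumptions, not the real API.
public enum DeterminizationState { Unknown, IsDeterminized, IsNonDeterminizable }

public readonly struct AutomatonDataSketch
{
    public AutomatonDataSketch(int stateCount, DeterminizationState determinizationState)
    {
        StateCount = stateCount;
        DeterminizationState = determinizationState;
    }

    public int StateCount { get; }

    // Stored next to the immutable data, so the answer is remembered.
    public DeterminizationState DeterminizationState { get; }

    public AutomatonDataSketch With(DeterminizationState state) =>
        new AutomatonDataSketch(StateCount, state);
}

public static class DeterminizationSketch
{
    // Counts how many times the expensive attempt actually ran.
    public static int ExpensiveAttempts;

    public static bool TryDeterminize(
        ref AutomatonDataSketch data, Func<AutomatonDataSketch, bool> attempt)
    {
        // Known outcomes short-circuit without doing any work.
        switch (data.DeterminizationState)
        {
            case DeterminizationState.IsDeterminized:
                return true;
            case DeterminizationState.IsNonDeterminizable:
                return false;
        }

        ExpensiveAttempts++;
        bool succeeded = attempt(data);
        data = data.With(succeeded
            ? DeterminizationState.IsDeterminized
            : DeterminizationState.IsNonDeterminizable);
        return succeeded;
    }
}
```

With a fixed default limit the cached flag stays valid; a caller-supplied limit would make
"non-determinizable" depend on the argument, which is why a per-call limit does not fit this scheme.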
Also, as an implementation detail an enum type was exposed as part of automaton quoting interface.
Compiler generated incorrect C# code for quoting enum constants. Fixed that.
* It didn't recurse past first state
* transition2 was not checked if it is valid for merging. Because of subscript error, transition1 was checked twice.
EpsilonClosure is small and immutable. Perfect fit for value type.
Also, changed usages of Pair<> to ValueTuple<> and started using
expression-bodied functions for properties.
We finally reached a point where automatons are big enough that recursive implementations
of algorithms fail with `StackOverflowException`.
There are 4 tests which test operations with big automatons (100k states):
* `TryComputePointLargeAutomaton`
* `SetToProductLargeAutomaton`
* `GetLogNormalizerLargeAutomaton`
* `ProjectSourceLargeAutomaton`
All four used to fail before code was rewritten to use explicit stack instead of recursive calls.
All changes except one kept the algorithms as they were; the code was almost mechanically changed
to use a stack.
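
A rough sketch of what such a mechanical conversion looks like, using a plain depth-first
traversal. The adjacency representation (`int[][]`) and the names `CountReachable` and
`BuildChain` are assumptions made for the example, not the library's code.

```csharp
using System.Collections.Generic;

public static class ExplicitStackSketch
{
    // Counts states reachable from `root`. `transitions[s]` holds the
    // destination state indices of state `s`. No recursion is used, so the
    // traversal depth is limited by heap memory rather than by the call stack.
    public static int CountReachable(int[][] transitions, int root)
    {
        var visited = new bool[transitions.Length];
        var stack = new Stack<int>();
        stack.Push(root);
        visited[root] = true;
        int count = 0;

        while (stack.Count > 0)
        {
            int state = stack.Pop();
            count++;
            foreach (int destination in transitions[state])
            {
                if (!visited[destination])
                {
                    visited[destination] = true;
                    stack.Push(destination);
                }
            }
        }

        return count;
    }

    // Builds a chain 0 -> 1 -> ... -> n-1, the worst case for recursion depth.
    public static int[][] BuildChain(int n)
    {
        var transitions = new int[n][];
        for (int i = 0; i < n; i++)
        {
            transitions[i] = i + 1 < n ? new[] { i + 1 } : new int[0];
        }

        return transitions;
    }
}
```

A chain of 100k states makes a recursive version go 100k frames deep, while the loop
above only grows a heap-allocated `Stack<int>`.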
The only exception is automaton simplification. The old code used recursion in a very non-trivial way.
It was rewritten from scratch using a different algorithm: instead of extracting generalized
sequences and then reinserting them, the new code merges states directly in the automaton.
(There's a comment at the beginning of the Simplify() method explaining all operations.)
To reduce memory pressure, automaton data (states and transitions) was
converted from classes to structs and "vectorized" into just two large arrays.
Now, one state costs just 16 bytes and one transition costs 24 bytes.
Previously the cost was at least twice that, due to additional indirection,
object headers and pre-reservation of array capacity (8 bytes per reference
to state + 16 bytes per state header + 24 bytes per transitions array per state).
In practice, memory consumption by automatons was reduced by approximately
a factor of 3.
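
A simplified sketch of the "two large arrays" layout. The field choices and the names
`StateSketch`, `TransitionSketch` and `FlatAutomatonSketch` are assumptions for
illustration; the real structs have different fields, and the sketch makes no claim
to reproduce the sizes quoted above.

```csharp
// Hypothetical state record: a slice descriptor into the shared transition array.
public struct StateSketch
{
    public int FirstTransitionIndex;
    public int TransitionsCount;
    public double EndWeight;
}

// Hypothetical transition record stored inline in one big array.
public struct TransitionSketch
{
    public int DestinationStateIndex;
    public double Weight;
}

public sealed class FlatAutomatonSketch
{
    private readonly StateSketch[] states;
    private readonly TransitionSketch[] transitions;

    public FlatAutomatonSketch(StateSketch[] states, TransitionSketch[] transitions)
    {
        this.states = states;
        this.transitions = transitions;
    }

    // Walks the contiguous slice of transitions that belongs to one state.
    // No per-state array object and no per-transition object is involved.
    public double SumOutgoingWeights(int stateIndex)
    {
        StateSketch state = states[stateIndex];
        double sum = 0;
        int end = state.FirstTransitionIndex + state.TransitionsCount;
        for (int i = state.FirstTransitionIndex; i < end; i++)
        {
            sum += transitions[i].Weight;
        }

        return sum;
    }
}
```

Because structs are stored inline in arrays, the per-object header and the reference to
each state and each per-state transitions array disappear.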
Important changes:
* `Automaton.StateData` is now a struct instead of class.
* All transitions of automaton are stored in single array instead of
"transition array per state"
* Automatons cannot be mutated. All immutable data was moved
to a new `DataContainer` struct.
* A new Automaton.Builder class was introduced for construction
of new automatons from scratch. All mutation is done through the
"copy to builder, mutate, get immutable data" pattern.
* The widely used `AppendInPlace()` method is not actually in-place anymore
and became way more expensive. Multiple `AppendInPlace()` calls
should be replaced with 1 `Automaton.Builder` instance with
multiple `Builder.Append()` calls. Places which were hot in profiler were rewritten.
`AppendInPlace()` will be removed completely in separate pull-request.
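
To illustrate why one builder with several appends is cheaper than repeated appends on
immutable data, here is a toy analogue over integer sequences. `ImmutableChainSketch`
and its members are hypothetical and deliberately much simpler than `Automaton.Builder`.

```csharp
using System.Collections.Generic;

// Toy immutable sequence standing in for immutable automaton data.
public sealed class ImmutableChainSketch
{
    private readonly int[] items;

    private ImmutableChainSketch(int[] items)
    {
        this.items = items;
    }

    public static ImmutableChainSketch Of(params int[] values) =>
        new ImmutableChainSketch((int[])values.Clone());

    public int Count => items.Length;

    public int this[int index] => items[index];

    // Analogue of an append that is no longer in-place: every call copies
    // the whole sequence into a builder and back out again.
    public ImmutableChainSketch Append(ImmutableChainSketch other)
    {
        Builder builder = Builder.FromData(this);
        builder.Append(other);
        return builder.GetData();
    }

    public sealed class Builder
    {
        private readonly List<int> items = new List<int>();

        // "Copy to builder" step.
        public static Builder FromData(ImmutableChainSketch data)
        {
            var builder = new Builder();
            builder.items.AddRange(data.items);
            return builder;
        }

        // "Mutate" step: cheap, touches only the builder's private list.
        public void Append(ImmutableChainSketch data) => items.AddRange(data.items);

        // "Get immutable data" step: one final copy.
        public ImmutableChainSketch GetData() => new ImmutableChainSketch(items.ToArray());
    }
}
```

Appending k pieces through `Append` on the immutable type copies the growing prefix k
times, while a single builder copies each piece once plus one final copy in `GetData()`.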
Technical changes:
* `Optional<>` struct was moved into Containers namespace. Logically it is
a container of "0-or-1" elements.
* To enforce immutable data structures, new `ReadOnlyArray`, `ReadOnlyArraySegment`
and `ReadOnlyArraySegment` enumerator structs were introduced. They wrap regular
arrays but do not allow mutating them. Unlike `ReadOnlyList<>`, these are value types
and do not introduce any memory costs.
* In quite a few places Tuples were replaced with ValueTuples.
* Most recursive helper methods were "inlined" as local functions at their call-sites.
Not strictly necessary, but made refactoring a lot easier by bringing related code together.
* Some code was reformatted/updated to a more modern style, either by tooling or manually.
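
A minimal sketch of the read-only array wrapper idea from the list above, assuming a
generic struct with an indexer and a count. The name `ReadOnlyArraySketch<T>` is
hypothetical and the real `ReadOnlyArray` may expose a different surface.

```csharp
// Value-type wrapper: holds only the array reference, so wrapping adds no
// heap allocation, and exposes no members that can modify the elements.
public readonly struct ReadOnlyArraySketch<T>
{
    private readonly T[] array;

    public ReadOnlyArraySketch(T[] array)
    {
        this.array = array;
    }

    // A default-initialized struct has a null array; report it as empty.
    public int Count => array == null ? 0 : array.Length;

    // Getter only: callers cannot assign through the wrapper.
    public T this[int index] => array[index];
}
```

The wrapper does not copy the array, so immutability holds only as long as the code that
created the array does not keep and modify its own reference.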
* Softmax factor throws when output is observed under VMP.
* VectorSoftmaxOp_KM11.XAverageLogarithm allows softmax to be a point mass as long as some x[i] is a point mass.
Fixes #101
* Added IrregularQuantiles
* IrregularQuantiles throws ArgumentOutOfRangeException
* Added Region.Equals and CompareTo
* BlogTests.Handedness is a test
* BallCountingTest comment
* More documentation and testing
- Made some properties readonly
- IsLiteralOrLoopVar used a local variable with the same name and casing as a property; changed the local to lowercase: recognizer
- Pattern matching