Fixed cases where MMath.NormalCdf, DoubleIsBetweenOp, IsPositiveOp, DoublePlusOp would throw.
JaggedSubarrayOp uses ForceProper.
PlusDoubleOp gracefully handles improper distributions on input.
Added MMath.NormalCdfIntegral, NormalCdfDiff, NormalCdfExtended.
Added ExtendedDouble class.
MeanVarianceAccumulator correctly handles weight=0.
Region.GetLogVolume has a lower bound.
Variables with PointEstimate do not use JaggedSubarrayWithMarginal.
Due to a refactoring mistake, `sampleProb / prob * intervalLength`
turned into `sampleProb / (prob * intervalLength)`, which is obviously incorrect.
Fixed that and added a rudimentary test for `DiscreteChar.Sample()`.
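
A minimal sketch of the precedence issue, with illustrative names (the real code lives in `DiscreteChar.Sample()`):

```csharp
using System;

static class PrecedenceBugSketch
{
    // Division and multiplication associate left-to-right, so the correct
    // expression scales the ratio by the interval length...
    static double Correct(double sampleProb, double prob, double intervalLength)
        => sampleProb / prob * intervalLength;  // == (sampleProb / prob) * intervalLength

    // ...while the buggy refactoring divided by the product instead.
    static double Buggy(double sampleProb, double prob, double intervalLength)
        => sampleProb / (prob * intervalLength);

    static void Main()
    {
        Console.WriteLine(Correct(0.5, 0.25, 4.0)); // 8
        Console.WriteLine(Buggy(0.5, 0.25, 4.0));   // 0.5
    }
}
```
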
Made the `ArgumentCountToNames` cache thread-safe.
After that, another exception appeared with the `ArgsToValidatingAutomaton` dictionary (a null reference exception). Making the dictionary concurrent fixed this error.
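
A minimal sketch of the fix, assuming a cache keyed by argument count (the real value types differ):

```csharp
using System.Collections.Concurrent;

static class CacheSketch
{
    // A plain Dictionary mutated from several threads can be observed in a
    // partially updated state, which surfaces as spurious null references.
    // ConcurrentDictionary gives atomic get-or-add semantics instead.
    private static readonly ConcurrentDictionary<int, string[]> ArgumentCountToNames =
        new ConcurrentDictionary<int, string[]>();

    public static string[] GetNames(int argumentCount) =>
        ArgumentCountToNames.GetOrAdd(argumentCount, count =>
        {
            var names = new string[count];
            for (int i = 0; i < count; i++)
            {
                names[i] = "arg" + i;
            }

            return names;
        });
}
```
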
* Determinization no longer changes the language of the automaton.
Two new tests were added to check that it doesn't.
Previously, all transitions with very low probability (less than e^-35) were removed.
That was done for two reasons:
- Some probabilities were represented in linear space, and e^-35 is about as low a resolution
as you can get with regular doubles. (That is, if one probability is 1 and another is e^-35,
then when you add them together, the second one is indistinguishable from zero.)
This was fixed when discrete char probabilities were moved to a log-space representation
(see the `Weight` change at the end of these notes).
- Trying to determinize some non-determinizable automata led to an explosion of low-probability
states and transitions, which resulted in very poor performance
(e.g. the `AutomatonNormalizationPerformance3` test). Now a smarter strategy for detecting
these non-determinizable automata is used: during traversal, all sets of states reachable from the root
are remembered. If the automaton comes to the same set of states but with different weights,
it stops immediately, because otherwise it would be caught in an infinite loop
(see the loop-check sketch after this list).
* `Equals()` and `GetHashCode()` for `WeightedStateSet` take into account only the high 32 bits of each weight.
This, coupled with normalization of the weights, allows reusing already-added states whose
weights are very close (see the `WeightedStateSet` sketch after this list). This speeds up
`PropertyInferencePerformanceTest` 2.5x due to smaller intermediate automata.
* Weighted sets of size one are handled specially in `TryDeterminize`: they don't need to be
determinized and can be copied into the result almost as-is, unless they have non-deterministic
transitions. A simple heuristic of "has different destination states" is used to detect that
and fall back to the slow general path (see the heuristic sketch after this list).
* Representation of `WeightedStateSet` changed from an (int -> float) dictionary to a
sorted array of (int, float) pairs, as shown in the `WeightedStateSet` sketch after this list.
As an optimization, the common case of a single-element set does not allocate any arrays.
* Determinization code for `ListAutomaton` was removed, because it never worked.
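
The loop-check sketch: a rough rendering of the non-determinizability detection described above, with simplified stand-ins (string keys, raw double weights) for the automaton internals:

```csharp
using System.Collections.Generic;
using System.Linq;

sealed class DeterminizationLoopCheckSketch
{
    // Maps a canonical key for a set of state indices to the weights that
    // were recorded for that set the first time it was seen.
    private readonly Dictionary<string, double[]> visited =
        new Dictionary<string, double[]>();

    // Returns false when the automaton is detected as non-determinizable.
    public bool TryVisit(int[] sortedStateIndices, double[] weights)
    {
        string key = string.Join(",", sortedStateIndices);
        if (!visited.TryGetValue(key, out var firstWeights))
        {
            visited.Add(key, (double[])weights.Clone());
            return true; // a genuinely new weighted state set
        }

        // Same set of states seen again: if the weights differ, continuing
        // would generate fresh weighted sets forever, so stop immediately.
        return firstWeights.SequenceEqual(weights);
    }
}
```
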
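The heuristic sketch: a literal rendering of the size-one fast-path check (the helper name is hypothetical):

```csharp
using System.Linq;

static class SingleStateHeuristicSketch
{
    // "Has different destination states" is treated as a sign of possible
    // non-determinism; such states go through full determinization, while
    // the rest are copied into the result almost as-is.
    public static bool NeedsGeneralPath(int[] transitionDestinations) =>
        transitionDestinations.Distinct().Count() > 1;
}
```
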
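The `WeightedStateSet` sketch: a simplified version of the new representation, combining the sorted-array layout, the no-allocation single-element case, and equality over the high 32 bits of each weight (names and layout are illustrative):

```csharp
using System;

readonly struct WeightedStateSetSketch : IEquatable<WeightedStateSetSketch>
{
    private readonly int singleIndex;
    private readonly float singleWeight;
    private readonly (int Index, float Weight)[] states; // null => single element

    // Common case: a single-element set allocates no array at all.
    public WeightedStateSetSketch(int index, float weight)
    {
        this.singleIndex = index;
        this.singleWeight = weight;
        this.states = null;
    }

    // General case: a sorted array of (stateIndex, weight) pairs.
    public WeightedStateSetSketch((int Index, float Weight)[] sortedStates)
    {
        this.singleIndex = 0;
        this.singleWeight = 0;
        this.states = sortedStates;
    }

    public int Count => this.states?.Length ?? 1;

    public (int Index, float Weight) this[int i] =>
        this.states == null ? (this.singleIndex, this.singleWeight) : this.states[i];

    public bool Equals(WeightedStateSetSketch other)
    {
        if (this.Count != other.Count) return false;
        for (int i = 0; i < this.Count; i++)
        {
            if (this[i].Index != other[i].Index) return false;

            // Only the high bits take part in the comparison, so sets whose
            // (normalized) weights are very close compare as equal.
            if (HighBits(this[i].Weight) != HighBits(other[i].Weight)) return false;
        }

        return true;
    }

    public override bool Equals(object obj) =>
        obj is WeightedStateSetSketch other && this.Equals(other);

    public override int GetHashCode()
    {
        int hash = 17;
        for (int i = 0; i < this.Count; i++)
        {
            hash = hash * 31 + this[i].Index;
            hash = hash * 31 + HighBits(this[i].Weight);
        }

        return hash;
    }

    // High 32 bits of the weight's floating point representation.
    private static int HighBits(double weight) =>
        (int)(BitConverter.DoubleToInt64Bits(weight) >> 32);
}
```
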
Transition weights may become infinite when going from log space into value space, and
`SetToSum()`, which is used by `MergeParallelTransitions`, can't handle two infinite weights at once.
To solve this, transition weights are normalized by the maximum weight before being passed to `SetToSum()`.
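
A sketch of the underlying trick, which is the standard log-sum-exp stabilization (the actual code normalizes the weights it hands to `SetToSum()`; names here are illustrative):

```csharp
using System;
using System.Linq;

static class WeightSumSketch
{
    // Summing weights in value space after dividing by the maximum weight
    // (i.e. subtracting the maximum log-weight) keeps every term in (0, 1],
    // so no individual term overflows to infinity.
    public static double LogSum(double[] logWeights)
    {
        double max = logWeights.Max();
        if (double.IsNegativeInfinity(max))
        {
            return max; // all weights are zero
        }

        double sum = logWeights.Sum(logW => Math.Exp(logW - max));
        return max + Math.Log(sum);
    }
}
```
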
Point masses are the most common form of char distribution. Common operations on them
have been sped up:
* `IsPointMass` and `Point` are now computed at `DiscreteChar` construction time and
stored in `char? Storage.Point`. `Storage.Ranges` still contains the range with the point mass,
because many operations are not specialized for points.
* `FindProb()`, `GetLogAverageOf()` and `SetToProduct()` now have a fast path for point-mass distributions (see the sketch after this list).
* `DiscreteChar` and `DiscreteChar.Storage` implement `IEquatable<>`
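
A simplified sketch of the point-mass fast path mentioned above (as noted, the real `DiscreteChar` keeps the point in `char? Storage.Point` and still stores a matching entry in `Storage.Ranges`):

```csharp
using System.Linq;

sealed class DiscreteCharSketch
{
    private readonly char? point;                                  // non-null for point masses
    private readonly (char Start, char End, double Prob)[] ranges; // general representation

    public DiscreteCharSketch(char pointValue)
    {
        this.point = pointValue;                               // computed at construction time
        this.ranges = new[] { (pointValue, pointValue, 1.0) }; // kept for non-specialized ops
    }

    public bool IsPointMass => this.point.HasValue;

    public double FindProb(char c)
    {
        // Fast path: a point mass needs a single comparison.
        if (this.point.HasValue)
        {
            return c == this.point.Value ? 1.0 : 0.0;
        }

        // Slow path: scan the ranges (the real code is smarter than this).
        return this.ranges.Where(r => r.Start <= c && c <= r.End).Sum(r => r.Prob);
    }
}
```
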
* The pair (FirstTransitionIndex, LastTransitionIndex) changed to the pair
(FirstTransitionIndex, TransitionsCount).
* Introduced the `Automaton.Builder.LinkedStateData` struct, which
mirrors `Automaton.StateData` but represents transitions as a linked list.
Previously `Automaton.StateData` was reused for this purpose, which was confusing.
(A sketch of the layout follows this list.)
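
A sketch of the builder-side layout described above; field names are illustrative:

```csharp
// The builder threads each state's transitions through a linked list, so
// adding a transition while building is O(1) and does not disturb the
// final packed representation.
struct LinkedStateDataSketch
{
    public int FirstTransitionIndex; // head of this state's transition list, or -1
    public int TransitionsCount;     // replaces the old LastTransitionIndex

    public struct LinkedTransition
    {
        public int DestinationStateIndex;
        public int NextTransitionIndex; // next transition of the same state, or -1
    }
}
```
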
Since `DiscreteChar` was refactored, `GetLogProb()` can directly return the saved
logarithm of the probability, instead of first exponentiating it and then taking the logarithm.
Previously a `NotSupportedException` was thrown for the `WordChar` and `Uniform` char classes.
Now versions of these distributions with uppercase chars excluded are returned.
Note: the implementation of `ToLower()` for the `Unknown` char class is still somewhat broken:
- it ignores probability mass outside the ranges;
- it can get slow if the char ranges are big.
In some edge cases `StringAutomaton` needs to represent extremely
low probabilities of character transitions. To make that possible,
instead of storing probabilities as double values, they are stored
as the `Weight` structs already used by automata. `Weight` stores the logarithm
of a value instead of the value itself.
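
A minimal sketch of the idea; the real `Weight` struct in the automaton code carries many more operations:

```csharp
using System;

readonly struct WeightSketch
{
    // Only the logarithm is stored, so values far below double.Epsilon
    // (and far below the old e^-35 cutoff) remain representable.
    public double LogValue { get; }

    private WeightSketch(double logValue) => this.LogValue = logValue;

    public static WeightSketch FromLogValue(double logValue) => new WeightSketch(logValue);

    public static WeightSketch FromValue(double value) => new WeightSketch(Math.Log(value));

    public double Value => Math.Exp(this.LogValue); // may underflow/overflow a double

    // Multiplying values is adding logarithms: accurate even for tiny weights.
    public static WeightSketch operator *(WeightSketch a, WeightSketch b) =>
        new WeightSketch(a.LogValue + b.LogValue);
}
```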