BinaryNativeClassifierMapping and MulticlassNativeClassifierMapping serialize labels as strings. Incremented CustomSerializationVersion for each.
BinaryNativeClassifierMapping does not serialize labels when TLabel is bool.
IReader.ReadObject is generic.
* Added Variable.Max(int,int).
* Compiler warns about excess memory consumption in more cases when it should, and fewer cases when it shouldn't.
* TransformBrowser shows attributes by default.
* Updated FactorDocs
* BayesPointMachineClassifier.LoadBackwardCompatibleBinaryClassifier and SaveForwardCompatible use text or binary depending on the file extension
* Added IWriter and IReader, WrappedBinaryWriter, WrappedBinaryReader, WrappedTextWriter, WrappedTextReader
* Changed uses of BinaryWriter to IWriter
* Changed uses of BinaryReader to IReader
* LearnersTests uses same xunit version as other tests
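
The extension-based choice between text and binary output can be sketched as follows. This is a minimal Python illustration, not the library's C# code: `TextValueWriter`, `BinaryValueWriter`, and `create_writer` are hypothetical names, and the rule that `.txt` selects the text format is an assumption, since the notes above do not say which extensions map to which format.

```python
import io
import os
import struct


class TextValueWriter:
    """Hypothetical text writer: one value per line, UTF-8 encoded."""

    def __init__(self, stream):
        self._stream = stream

    def write_int(self, value):
        self._stream.write(f"{value}\n".encode("utf-8"))


class BinaryValueWriter:
    """Hypothetical binary writer: 32-bit little-endian integers."""

    def __init__(self, stream):
        self._stream = stream

    def write_int(self, value):
        self._stream.write(struct.pack("<i", value))


def create_writer(path, stream):
    """Pick a writer from the file extension (assumed rule: .txt means text)."""
    extension = os.path.splitext(path)[1].lower()
    if extension == ".txt":
        return TextValueWriter(stream)
    return BinaryValueWriter(stream)


# Callers depend only on write_int, so the format can change with the file name.
text_stream = io.BytesIO()
create_writer("classifier.txt", text_stream).write_int(42)

binary_stream = io.BytesIO()
create_writer("classifier.bin", binary_stream).write_int(42)
```

Because both writers expose the same method, serialization code written against the common interface does not need to know which format is in use.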
Introduce a "generational" strategy for clearing intermediate data in the reused preallocated array that holds
`Automaton.Condensation.FindStronglyConnectedComponents` state.
To avoid allocations, `FindStronglyConnectedComponents` uses a preallocated array with data about states.
It may use this cache multiple times when processing a single automaton. It typically uses only a handful
of entries in this array, but each time it grabbed the array, it had to clear it in full.
With the previous clearing strategy, `ComputeEpsilonClosure` was a quadratic algorithm in the worst case:
it calls `FindStronglyConnectedComponents` as many times as there are states in the automaton (N),
and each call cleared the whole array, which is also O(N), making the whole thing O(N^2).
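
One common way to get a constant-time clear is to stamp each slot with the generation in which it was last written. The Python sketch below illustrates that idea under stated assumptions: `GenerationalArray` is a hypothetical name, and the actual C# change may store and check generations differently.

```python
class GenerationalArray:
    """Fixed-size scratch array whose clear() is O(1) instead of O(N)."""

    def __init__(self, size, default=0):
        self._values = [default] * size
        self._stamps = [0] * size  # generation in which each slot was last written
        self._generation = 1       # current generation; stamp 0 means "never written"
        self._default = default

    def __getitem__(self, index):
        # A slot written in an older generation reads as if it had been cleared.
        if self._stamps[index] == self._generation:
            return self._values[index]
        return self._default

    def __setitem__(self, index, value):
        self._values[index] = value
        self._stamps[index] = self._generation

    def clear(self):
        # Invalidate every slot at once by moving to a new generation.
        self._generation += 1


scratch = GenerationalArray(1_000_000)
scratch[42] = 7
before = scratch[42]
scratch.clear()      # constant time, regardless of array size
after = scratch[42]
```

With a fixed-width generation counter, as a C# implementation would likely use, the counter eventually wraps around, so a real implementation has to fall back to a full clear at that point.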
C# now has a built-in type to represent pairs, `ValueTuple<>`. `Pair<>` was eliminated in favour of it.
At the same time a new type - `IntPair` was introduced. It is faster than `ValueTuple<int, int>` when `.Equals()` is called often.
It makes some common string operations which do many lookups by `IntPair` keys up to 10% faster.
A lot of automata operations create large short-lived data structures.
Those are now cached in thread-local static fields. This saves a lot of allocations and consequently reduces GC pressure.
To make this possible, a new container is introduced: `GenerationalDictionary`, which can be cleared in constant time and reuses its memory after `Clear`.
These changes reduce the memory allocated by `StringInferencePerformanceTests` from 4 GB to 2 GB. Another 1 GB is allocated by `FindStronglyConnectedComponents`, which will be fixed in a separate PR because it is harder to introduce caching there. Another 1 GB is test setup (creation of test data).
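
The combination of a per-thread cache and a container with a constant-time clear might look like the sketch below. It is written in Python for illustration only: `GenerationalDict` and `borrow_scratch_dict` are hypothetical names, and the real `GenerationalDictionary` is a C# type whose internals are not described here.

```python
import threading


class GenerationalDict:
    """Dictionary wrapper whose clear() only bumps a counter."""

    def __init__(self):
        self._entries = {}     # key -> (generation, value); storage survives clear()
        self._generation = 0

    def __setitem__(self, key, value):
        self._entries[key] = (self._generation, value)

    def __contains__(self, key):
        entry = self._entries.get(key)
        return entry is not None and entry[0] == self._generation

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is not None and entry[0] == self._generation:
            return entry[1]
        return default

    def clear(self):
        # Entries from older generations become invisible without being touched.
        self._generation += 1


_cache = threading.local()  # one scratch dictionary per thread


def borrow_scratch_dict():
    """Return this thread's reusable dictionary, cleared for a new operation."""
    scratch = getattr(_cache, "scratch", None)
    if scratch is None:
        scratch = GenerationalDict()
        _cache.scratch = scratch
    scratch.clear()
    return scratch


first = borrow_scratch_dict()
first[(1, 2)] = "a"
second = borrow_scratch_dict()  # same object, logically empty again
```

The trade-off in this sketch is that stale entries stay in memory until they are overwritten, which is acceptable for short-lived scratch data that is refilled on every operation.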
Changed the BCC confusion matrix prior so that TrueLabels can be inferred when LabelCount==2.
Fixed serialization of BCC posteriors. BCC posteriors now save to the Results folder.
Fixed serialization example code.
There are two changes:
1. (major) `LogProbabilityOverride` was removed from `DiscreteChar` and `ImmutableDiscreteChar`
This was unsound functionality that was not used and complicated the implementation of char distributions.
2. (minor) `ImmutableDiscreteChar.Multiply` may reuse immutable discrete char in more cases out of the box.
This reduces the GC pressure a little bit.
* Subarray checks that indices are distinct when debugging
* Renamed ConcatOpTests to StringConcatOpTests
* Crowdsourcing explains why accuracy and precision are NaN.
* Added build instructions for Visual Studio Code.
- Fixed automaton deserialization ignoring LogValueOverride
- Fixed SequenceDistribution.EnumerateSupport and TryEnumerateSupport having different side effects
- Added TryDeterminize, SetLogValueOverride, and ProjectOnTransducer methods to SequenceDistribution
- Added a parameterless overload for Automaton.TryDeterminize that returns the output of TryDeterminize(out TThis) and discards information about deterministicity of said output
* Immutable distribution interfaces
* DiscreteChar made immutable
* Automata made constant
* Automaton.GetLogValue optimized for cases of deterministic and epsilon-free automata
* Fixed Automaton.[Try]EnumerateSupport so that it won't produce duplicates for non-determinizable automata
* Introduced IWeightFunction - interface for abstract weight functions used by SequenceDistribution
* Multi-representable weight function for sequence distribution that automatically switches between point mass, dictionary, and automaton representations as appropriate
* Early stops for automaton support enumeration
* Improved automata graphviz format
* Language writer correctly processes nested generics
* Incremented version to 0.4
* Subarray and GetItems factors and operators take IReadOnlyList instead of IList.
* IMatchboxRecommenderMapping uses IReadOnlyList instead of IList.
* Moved Subarray and GetItems factors from Factor class to Collection class.
* Moved variable factors from Factor class to Clone class.
* Conversion.IsAssignableFrom handles covariance.
* Util.GetElementType and IsIList include IReadOnlyList.
* Code cleanup
* Refactored MessageTransform.ConvertMethodInvoke
* Removed Collection.Sort
# problem
The Azure DevOps VSTest runner does not handle the Compiler Options test well, because the test takes a very long time to run.
# solution
Run the different options in parallel and in a separate process, so that the test finishes reliably and in a reasonable time.
InferNet.Infer is generic.
ModelBuilder uses the correct overload of InferNet.Infer.
CodeBuilder.Method checks for correct number of arguments.
FileArray implements IReadOnlyList.
* Added WordStrings and StringFormatTests.
* FactorManager does not allow point mass conversion of the return value argument of EP evidence methods (previously handled by MessageTransform).
* Code cleanup. Renamed IdentityComparer to ReferenceEqualityComparer.
* Enumerating support of distribution does not require it to be normalized unless determinization is requested.
Skipping normalization speeds up enumeration of some automata by 10x.
* States of `EpsilonClosure` can be stored in a plain array instead of a list. This minor change speeds up epsilon
closure operations by 20% due to one less level of indirection and better memory allocation.
The previous version was correct but had a pathological (exponential) runtime for some forms of automata
with multiple branches with epsilon transitions. The code was also unnecessarily complex, because it tried
to compute unreachable states in the presence of loops.
Specific changes in this PR:
- `EnumerateSupport` implementation was moved into its own file - `Automaton.EnumerateSupport`
- `ComputeEndStateReachability` method has been resurrected. It is invoked only if loops are detected.
In all other cases, it is trivial to check for end-state reachability during normal traversal.
- Traversal loop has been split in steps, some of which were moved into their own local methods.
- Fast path for non-branchy part of automaton was implemented.
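
End-state reachability in a graph with loops can be computed by walking the transitions backwards from the end states. The Python sketch below shows that general technique; it assumes a simple adjacency-list representation, and it is not a description of how the C# `ComputeEndStateReachability` method is actually implemented.

```python
from collections import deque


def compute_end_state_reachability(transitions, end_states, state_count):
    """Return a list whose entry s is True if some end state is reachable from s.

    transitions maps a state index to an iterable of successor state indices.
    """
    # Build reversed edges so the search can start from the end states.
    predecessors = [[] for _ in range(state_count)]
    for source, targets in transitions.items():
        for target in targets:
            predecessors[target].append(source)

    can_reach_end = [False] * state_count
    queue = deque()
    for state in end_states:
        if not can_reach_end[state]:
            can_reach_end[state] = True
            queue.append(state)

    # Breadth-first search over reversed edges; loops are harmless because
    # each state is marked and enqueued at most once.
    while queue:
        state = queue.popleft()
        for previous in predecessors[state]:
            if not can_reach_end[previous]:
                can_reach_end[previous] = True
                queue.append(previous)
    return can_reach_end


# 0 -> 1 -> 2 (end state), 1 -> 1 (loop), 0 -> 3 (dead end)
transitions = {0: [1, 3], 1: [1, 2]}
reachability = compute_end_state_reachability(transitions, end_states=[2], state_count=4)
# reachability == [True, True, True, False]
```

The search visits each state and each transition at most once, so its cost is linear in the size of the automaton, which is why it is reasonable to run it only when loops make the simple in-traversal check insufficient.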
Removed Cancels attribute from PowerPlateOp.EnterAverageConditional
Sum_Expanded handles an empty array.
Dirichlet.GetMean handles zero pseudo-counts.
Moved TrueSkill tests into TrueSkillTests.
* Motif Finder uses SequenceCount = 70
* DiscreteFromDirichletOp.ProbsAverageConditional handles PiecewiseVector
* GammaPower.FromLogMeanMinusMeanLog handles infinite mean and negative power
* Tests use Assert.Throws instead of try/catch
* Test output is less verbose
* Added SimplestBackwardChainTest3
* Fixed IterativeProcessTransform when variable has QueryTypes.MarginalDividedByPrior and ConstrainEqualRandom(variable, observed)
* Discrete allows Dimension=0
* Fix long overflow in OperatorTests.Longs()
* Fixed printing big float arrays, extended series for ((exp(x) - 1) / x - 1) - 0.5
* GammaPower.GetLogProb uses ulp-based threshold
* Moved a constant for maximum terms in NormalCdfMomentRatio outside of method body
* Separate methods for Previous/NextDoubleWithPositiveDifference
* More magic constants replaced with ulp-based ones
* Fixed abs value comparison in GammaPower.GetLogProb
* Ulp-based constants in IsBetween.XAverageConditional
* Moved the definitions of Ulp1-dependent constants below that of Ulp1
* Replaced <= in GammaPower with a < as it used to be
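
For context, an ulp-based threshold scales the tolerance with the magnitude of the numbers being compared, where a fixed "magic" constant does not. The Python sketch below shows the general idea using `math.ulp` (available since Python 3.9); `almost_equal` and the default of 4 ulps are hypothetical choices, not the constants used in the C# code.

```python
import math


def almost_equal(a, b, max_ulps=4):
    """True if a and b differ by at most max_ulps units in the last place."""
    if a == b:
        return True
    if math.isinf(a) or math.isinf(b) or math.isnan(a) or math.isnan(b):
        return False
    # The spacing between adjacent doubles grows with magnitude,
    # so the tolerance is measured at the larger of the two values.
    scale = max(abs(a), abs(b))
    return abs(a - b) <= max_ulps * math.ulp(scale)


close_small = almost_equal(0.1 + 0.2, 0.3)       # differ by rounding error only
close_large = almost_equal(1e16, 1e16 + 2.0)     # adjacent doubles at this magnitude
far_apart = almost_equal(1.0, 1.0 + 1e-9)        # millions of ulps apart
```

A fixed absolute tolerance such as 1e-9 would reject the pair around 1e16 even though the two values are neighbouring doubles, and it would be far too loose for values near 1e-20.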
Co-authored-by: Dmitry Kats <ratkillerx@hotmail.com>