For compliance, we do not want to apply Type.GetType to any data string we are given, in case it leads to harmful types being used. Even calling Type.GetType and checking the type afterwards is not compliant, because it may inadvertently load assemblies we did not intend.
Therefore we create an explicit list of allowed types, and parse the type string to construct the required type directly from the allowed list.
The nice thing about constructing directly from the allowed list is that even if there is a bug in the parsing code, or the string has been tampered with, it is impossible for us to end up with a type that is not a combination of types on the allowed list.
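A minimal sketch of this idea, not the actual implementation: the class name `AllowedTypeParser`, the contents of the allowed list, and the simplified `Name<Arg,Arg>` syntax are all assumptions made for illustration. The point it shows is that every type is obtained from the allowed list or built from it with `MakeGenericType`, so `Type.GetType` is never called on the input.

```csharp
using System;
using System.Collections.Generic;

// Usage: only names on the allowed list can ever be resolved.
Type parsed = AllowedTypeParser.Parse("List<Dictionary<string,int>>");
Console.WriteLine(parsed == typeof(List<Dictionary<string, int>>)); // True

static class AllowedTypeParser
{
    // Hypothetical allowed list: name -> type (open generic definitions for generic types).
    static readonly Dictionary<string, Type> Allowed = new Dictionary<string, Type>
    {
        ["int"] = typeof(int),
        ["string"] = typeof(string),
        ["List"] = typeof(List<>),
        ["Dictionary"] = typeof(Dictionary<,>),
    };

    public static Type Parse(string text)
    {
        int pos = 0;
        Type result = ParseType(text, ref pos);
        if (pos != text.Length) throw new FormatException($"Unexpected text at position {pos}.");
        return result;
    }

    // Recursive descent over the simplified syntax: Name or Name<Arg,Arg,...> (no whitespace).
    static Type ParseType(string text, ref int pos)
    {
        int start = pos;
        while (pos < text.Length && (char.IsLetterOrDigit(text[pos]) || text[pos] == '.')) pos++;
        string name = text.Substring(start, pos - start);

        // The only source of types is the allowed list; Type.GetType is never called.
        if (!Allowed.TryGetValue(name, out Type type))
            throw new NotSupportedException($"Type '{name}' is not on the allowed list.");

        var args = new List<Type>();
        if (pos < text.Length && text[pos] == '<')
        {
            do
            {
                pos++; // skip '<' or ','
                args.Add(ParseType(text, ref pos));
            } while (pos < text.Length && text[pos] == ',');
            if (pos >= text.Length || text[pos] != '>') throw new FormatException("Expected '>'.");
            pos++; // skip '>'
        }

        if (type.IsGenericTypeDefinition)
        {
            if (args.Count != type.GetGenericArguments().Length)
                throw new FormatException($"Wrong number of type arguments for '{name}'.");
            return type.MakeGenericType(args.ToArray());
        }
        if (args.Count != 0) throw new FormatException($"'{name}' does not take type arguments.");
        return type;
    }
}
```

Under these assumptions, a string such as `System.Diagnostics.Process` is rejected with `NotSupportedException` before any assembly lookup takes place, and a parsing bug can at worst produce a different combination of the allowed types.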
* Fixed a bug in Rand.Sample that could cause it to return a value that should have zero probability.
* Added MMath.AbsDiffAllowingNaNs
* Fixed SparseVector.MaxDiff
* DependencyGraphView, ModelView, TaskGraphView implement IDisposable
* Added missing evidence methods
* Added Vector.All(Vector, Func)
Add net6.0-windows as a target so that Visualizer features are available on .NET 6.0 when targeting Windows (while preserving the cross-platform net6.0 target).
I have changed our minimum .NET 4xx dependency from net462 to net472 because this is the minimum version supported by the version of the automatic graph layout package that supports net6.0-windows.
It is still possible to target net462 by using the previous version of the automatic graph layout package for that target, but the increased complexity doesn't seem worth it.
* MMath.Max, Min, Median throw InvalidOperationException on empty sequences.
* Increased precision of MaxGaussianOp.
* InnerProductArrayOp.AAverageConditional handles innerProduct near uniform.
* MaxOfOthersOp has internal flag for Monte Carlo.
In our project these 2 methods are very hot. They are near-optimal algorithmically, but the implementation
wasn't efficient because it relied on convenient abstractions and the JIT in .NET Framework 4.7.2 is not very sophisticated.
(.NET 7 optimizes the original code a lot better, but for various reasons we can't migrate to it yet.)
List of micro-optimizations in no particular order:
- ReadOnlyArray<> now returns values by reference.
  - This means it can no longer implement the IReadOnlyList<> interface, but it was never used through that interface anyway.
- GenerationalDictionary<> now accesses hash-table cells by reference.
- Added the GenerationalDictionary.GetOrAdd() method, which replaces the two calls TryGetValue() + Add().
- Automaton.Builder now manages its dynamically growing arrays on its own instead of using List<>.
  This reduces indirection and allows elements to be accessed by reference.
- Automaton.Transition.ElementDistribution is no longer an Option<TElementDistribution>. Boxing of
  the element distribution into Option<> was not optimized out by the JIT.
  - The new API is a little less safe (because this property can be used only for non-epsilon transitions) but more efficient.
  - The Transition.OptionalElementDistribution property was added for non-performance-critical parts of the code.
- StringManipulator is now a struct (value type). This allows the JIT to monomorphize the generic class and inline calls to its methods.
  With classes (reference types), calls had to be dynamically dispatched.
- Automaton.Product() and TransducerBase.ProjectSource() were rewritten without the StateCollection, State,
  and ReadOnlyArraySegment<> types. These types provided convenient APIs, but the JIT was not able to eliminate
  their overhead.
This PR speeds up the methods mentioned in the PR title by about 30%.
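To illustrate the by-reference access that several of these items rely on, here is a small sketch of a self-managed growable array with a `ref`-returning indexer. `GrowableArray<T>` and `Edge` are hypothetical names used only for this example; they are not the types in the library.

```csharp
using System;

var edges = new GrowableArray<Edge>();
edges.Add(new Edge { Destination = 1, Weight = 0.5 });
edges[0].Weight *= 2;                // updates the stored struct in place, no copy out and back
Console.WriteLine(edges[0].Weight);  // 1

// Hypothetical element type; a mutable struct makes the effect of ref access visible.
struct Edge
{
    public int Destination;
    public double Weight;
}

// Hypothetical container that owns its backing array instead of wrapping List<T>.
sealed class GrowableArray<T>
{
    private T[] items = new T[4];

    public int Count { get; private set; }

    // Returning by reference avoids copying the element on every access.
    public ref T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)Count) throw new ArgumentOutOfRangeException(nameof(index));
            return ref items[index];
        }
    }

    public void Add(T item)
    {
        // Double the backing array when it is full.
        if (Count == items.Length) Array.Resize(ref items, items.Length * 2);
        items[Count++] = item;
    }
}
```

With `List<T>`, the indexer returns a copy of a struct element, so the same update needs a read, a modification, and a write back. The trade-off in this sketch is that a reference obtained before `Add` may point into the old backing array after a resize, so references should not be held across insertions.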
* Inferred constant variables are not inlined
* Added Factor.ProbLessThan, ProbBetween, Quantile, Integral, Apply.
* ModelCompiler handles Delegate-valued variables.
* RatioGaussianOp handles ratio instead of ProductOp.
* ExpOp_Slow and LogOp_EP can compute derivatives
* Refactored ConstantFoldingTransform out of ModelAnalysisTransform
* CodeBuilder.MethodRefExpr takes arguments.
* Added VariableInformation.NeedsMarginalDividedByPrior and CodeRecognizer.NeedsMarginalDividedByPrior
* MessageTransform does not use a distribution for the forward message of a constant.
BinaryNativeClassifierMapping and MulticlassNativeClassifierMapping serialize labels as strings. Incremented CustomSerializationVersion for each.
BinaryNativeClassifierMapping does not serialize labels when TLabel is bool.
IReader.ReadObject is generic.
* Added Variable.Max(int,int).
* Compiler warns about excess memory consumption in more cases when it should, and fewer cases when it shouldn't.
* TransformBrowser shows attributes by default.
* Updated FactorDocs
* BayesPointMachineClassifier.LoadBackwardCompatibleBinaryClassifier and SaveForwardCompatible use text or binary depending on the file extension
* Added IWriter and IReader, WrappedBinaryWriter, WrappedBinaryReader, WrappedTextWriter, WrappedTextReader
* Changed uses of BinaryWriter to IWriter
* Changed uses of BinaryReader to IReader
* LearnersTests uses same xunit version as other tests
C# now has a built-in type to represent pairs: `ValueTuple<>`. `Pair<>` was eliminated in favour of it.
At the same time, a new type, `IntPair`, was introduced. It is faster than `ValueTuple<int, int>` when `.Equals()` is called often.
This makes some common string operations that do many lookups by `IntPair` keys up to 10% faster.
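For orientation, a specialized pair-of-ints key could look roughly like the sketch below. The name `IntPairSketch`, its members, and the hash function are assumptions for illustration; the real `IntPair` may be implemented differently.

```csharp
using System;
using System.Collections.Generic;

var counts = new Dictionary<IntPairSketch, int>();
counts[new IntPairSketch(1, 2)] = 10;
Console.WriteLine(counts[new IntPairSketch(1, 2)]); // 10

// Hypothetical stand-in for IntPair: a non-generic struct whose equality is two int comparisons.
readonly struct IntPairSketch : IEquatable<IntPairSketch>
{
    public readonly int First;
    public readonly int Second;

    public IntPairSketch(int first, int second)
    {
        First = first;
        Second = second;
    }

    // Strongly typed Equals: used by Dictionary<,> via IEquatable<T>, so keys are not boxed.
    public bool Equals(IntPairSketch other) => First == other.First && Second == other.Second;

    public override bool Equals(object obj) => obj is IntPairSketch other && Equals(other);

    // Simple illustrative hash combination; not necessarily what the library uses.
    public override int GetHashCode() => unchecked(First * 397 ^ Second);
}
```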
A lot of automata operations create large short-lived data structures.
Those are now cached in thread-local static fields. This saves a lot of allocations and consequently reduces GC pressure.
To support this, a new container, `GenerationalDictionary`, is introduced; it can be cleared in constant time and reuses its memory after `Clear`.
These changes reduce the memory allocated by `StringInferencePerformanceTests` from 4 GB to 2 GB. Another 1 GB is allocated by `FindStronglyConnectedComponents`, which will be fixed in a separate PR because caching is harder to introduce there. The remaining 1 GB is test setup (creation of test data).
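One common way to get a constant-time clear is to stamp each hash-table cell with a generation number and treat cells from older generations as empty. The sketch below shows that technique under that assumption; `GenerationalDictionarySketch` and its members are hypothetical and are not the library's implementation.

```csharp
using System;
using System.Collections.Generic;

var cache = new GenerationalDictionarySketch<int, string>();
cache.Set(1, "one");
cache.Clear();                                      // O(1): no cells are touched
cache.Set(2, "two");
Console.WriteLine(cache.TryGetValue(1, out _));     // False
Console.WriteLine(cache.TryGetValue(2, out var s) && s == "two"); // True

sealed class GenerationalDictionarySketch<TKey, TValue>
{
    private struct Cell
    {
        public int Generation;
        public TKey Key;
        public TValue Value;
    }

    private Cell[] cells = new Cell[8];  // length is always a power of two
    private int generation = 1;          // fresh cells have Generation == 0, i.e. empty
    private int count;

    // Bumping the generation makes every existing cell stale; the array is kept and reused.
    // (A production version would also need to handle wrap-around of the counter.)
    public void Clear()
    {
        generation++;
        count = 0;
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        ref Cell cell = ref FindCell(key);
        bool found = cell.Generation == generation;
        value = found ? cell.Value : default;
        return found;
    }

    public void Set(TKey key, TValue value)
    {
        if ((count + 1) * 2 > cells.Length) Grow();  // keep the table at most half full
        ref Cell cell = ref FindCell(key);
        if (cell.Generation != generation)
        {
            cell.Generation = generation;
            cell.Key = key;
            count++;
        }
        cell.Value = value;
    }

    // Linear probing: returns the live cell holding `key`, or the first stale cell on its probe path.
    private ref Cell FindCell(TKey key)
    {
        var comparer = EqualityComparer<TKey>.Default;
        int mask = cells.Length - 1;
        int i = comparer.GetHashCode(key) & mask;
        while (cells[i].Generation == generation && !comparer.Equals(cells[i].Key, key))
            i = (i + 1) & mask;
        return ref cells[i];
    }

    private void Grow()
    {
        Cell[] old = cells;
        cells = new Cell[old.Length * 2];
        count = 0;
        foreach (Cell c in old)
            if (c.Generation == generation) Set(c.Key, c.Value);
    }
}
```

In this sketch an instance could be kept in a `[ThreadStatic]` static field and cleared at the start of each operation, so the backing array is allocated once per thread rather than once per call.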
Changed the BCC confusion matrix prior so that TrueLabels can be inferred when LabelCount==2.
Fixed serialization of BCC posteriors. BCC posteriors now save to the Results folder.
Fixed serialization example code.
There are two changes:
1. (major) `LogProbabilityOverride` was removed from `DiscreteChar` and `ImmutableDiscreteChar`.
This was unsound functionality that was not used and complicated the implementation of char distributions.
2. (minor) `ImmutableDiscreteChar.Multiply` may reuse an existing immutable discrete char in more cases.
This reduces the GC pressure a little bit.