* Fixed a bug in Rand.Sample that could cause it to return a value that should have zero probability.
* Added MMath.AbsDiffAllowingNaNs
* Fixed SparseVector.MaxDiff
* DependencyGraphView, ModelView, TaskGraphView implement IDisposable
* Added missing evidence methods
* Added Vector.All(Vector, Func)
# problem
The way the ".pdb" is specified currently doesn't work for projects with multiple target platforms (OutDir is not populated).
# solution
Override the way we do it for other projects, and add entries for each target platform we have.
Add net6.0-windows as a target so that Visualizer features are available on .NET 6.0 when targeting Windows (while preserving the cross-platform net6.0 target).
I have raised our minimum .NET Framework dependency from net462 to net472, because net472 is the lowest version supported by the release of the automatic graph layout package that also supports net6.0-windows.
It would still be possible to keep net462 by using the previous version of the automatic graph layout package for that target, but the added complexity doesn't seem worth it.
* MMath.Max, Min, Median throw InvalidOperationException on empty sequences.
* Increased precision of MaxGaussianOp.
* InnerProductArrayOp.AAverageConditional handles innerProduct near uniform.
* MaxOfOthersOp has internal flag for Monte Carlo.
In our project these 2 methods are very hot. They are near-optimal algorithmically, but the implementation
was inefficient because it relied on convenient abstractions that the not very sophisticated JIT in .NET Framework 4.7.2 could not optimize away.
(.NET 7 optimizes the original code a lot better, but for various reasons we can't migrate to it yet.)
List of micro-optimizations, in no particular order:
- ReadOnlyArray<> now returns values by reference.
  - This means it can no longer implement the IReadOnlyList<> interface, but it was never used through that interface anyway.
- GenerationalDictionary<> now accesses hash-table cells by reference.
- Added a GenerationalDictionary.GetOrAdd() method, which replaces the pair of calls TryGetValue() + Add().
- Automaton.Builder now manages its dynamically growing arrays on its own instead of using List<>.
  This reduces indirection and allows elements to be accessed by reference.
- Automaton.Transition.ElementDistribution is no longer an Option<TElementDistribution>. Boxing of the
  element distribution into Option<> was not optimized out by the JIT.
  - The new API is a little less safe (because this property can be used only for non-epsilon transitions) but more efficient.
  - A Transition.OptionalElementDistribution property was added for non-performance-critical parts of the code.
- StringManipulator is now a struct (value type). This allows the JIT to monomorphize the generic class and inline calls to its methods.
  With classes (reference types), calls had to be dynamically dispatched.
- Automaton.Product() and TransducerBase.ProjectSource() were rewritten without the StateCollection, State
  and ReadOnlyArraySegment<> types. These types provided convenient APIs, but the JIT was not able to eliminate
  their overhead.
This PR speeds up the methods mentioned in the PR title by about 30%.
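Two of the micro-optimizations listed above (returning elements by reference, and passing behaviour as a struct so the JIT can specialize the generic code) can be illustrated with a minimal sketch. All type and member names below (`RefReadOnlyArray<T>`, `IElementComparer<T>`, `CharComparer`, `PrefixSketch`) are hypothetical and chosen for illustration only; they are not the actual types changed in this PR.

```csharp
using System;

// Hypothetical wrapper, not the library's ReadOnlyArray<>.
public readonly struct RefReadOnlyArray<T>
{
    private readonly T[] items;

    public RefReadOnlyArray(T[] items) { this.items = items; }

    public int Count => items.Length;

    // Returns a read-only reference to the element instead of a copy.
    // The by-value indexer declared by IReadOnlyList<T> does not match
    // this signature.
    public ref readonly T this[int index] => ref items[index];
}

// Hypothetical strategy interface, standing in for a "manipulator" type.
public interface IElementComparer<T>
{
    bool AreEqual(T a, T b);
}

// Implemented as a struct: the JIT compiles a separate copy of each generic
// method per value-type argument, so the call below can be bound statically
// and is a candidate for inlining.
public readonly struct CharComparer : IElementComparer<char>
{
    public bool AreEqual(char a, char b) => a == b;
}

public static class PrefixSketch
{
    // TComparer is constrained to struct, so no interface dispatch or boxing
    // is needed to call comparer.AreEqual.
    public static int CommonPrefixLength<T, TComparer>(
        RefReadOnlyArray<T> left,
        RefReadOnlyArray<T> right,
        TComparer comparer)
        where TComparer : struct, IElementComparer<T>
    {
        int limit = Math.Min(left.Count, right.Count);
        int i = 0;
        while (i < limit && comparer.AreEqual(left[i], right[i]))
        {
            i++;
        }

        return i;
    }
}
```

A caller would write something like `PrefixSketch.CommonPrefixLength(left, right, new CharComparer())`. Whether the JIT actually inlines the comparer call depends on the runtime version, so any gain should be confirmed with a benchmark.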
1. The implementation of `SequenceDistribution` (of which the `StringDistribution` specialization is the most important)
uses the curiously recurring template pattern (https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern)
by specifying the `TThis` type parameter. Wherever `SequenceDistribution` creates new instances,
it uses `new TThis()`, which goes through the very slow `Activator.CreateInstance<>` method.
To avoid this, a special helper, `Util.New<T>`, is introduced. It uses a runtime code generation trick
(https://stackoverflow.com/a/1280832) that is at least an order of magnitude faster than what the compiler emits.
2. `CollectionElementMappingInfo.ElementMapping` is switched from a list of lists to a list of read-only arrays.
This brings 2 performance benefits: a) there is less indirection when reading the element mapping, and
b) the common mappings can now be cached and reused.
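The first item can be sketched as follows. This is one common form of the runtime code generation trick, assuming a constructor delegate compiled from an expression tree and cached per type; it is not necessarily how `Util.New<T>` is written. `FastFactory<T>`, `SequenceLike<TThis>` and `StringLike` are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper, standing in for the idea behind Util.New<T>.
public static class FastFactory<T> where T : new()
{
    // Built once per closed generic type: a delegate that calls the
    // parameterless constructor of T directly.
    private static readonly Func<T> Constructor =
        Expression.Lambda<Func<T>>(Expression.New(typeof(T))).Compile();

    // Invokes the cached delegate instead of going through
    // Activator.CreateInstance<T>(), which is what `new T()` compiles to.
    public static T Create() => Constructor();
}

// Hypothetical CRTP-style base class: TThis is the concrete derived type.
public abstract class SequenceLike<TThis>
    where TThis : SequenceLike<TThis>, new()
{
    public int Length { get; set; }

    // Creates a new instance of the concrete type without `new TThis()`.
    public TThis Extend()
    {
        TThis result = FastFactory<TThis>.Create();
        result.Length = Length + 1;
        return result;
    }
}

public sealed class StringLike : SequenceLike<StringLike>
{
}
```

The delegate is compiled once in the static initializer, so the cost of code generation is paid on first use of each type and every later `Create()` call is a plain delegate invocation.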
* Inferred constant variables are not inlined
* Added Factor.ProbLessThan, ProbBetween, Quantile, Integral, Apply.
* ModelCompiler handles Delegate-valued variables.
* RatioGaussianOp handles ratio instead of ProductOp.
* ExpOp_Slow and LogOp_EP can compute derivatives
* Refactored ConstantFoldingTransform out of ModelAnalysisTransform
* CodeBuilder.MethodRefExpr takes arguments.
* Added VariableInformation.NeedsMarginalDividedByPrior and CodeRecognizer.NeedsMarginalDividedByPrior
* MessageTransform does not use a distribution for the forward message of a constant.
* DefaultFactorManager allows deterministic factors to have Full support
* DefaultFactorManager puts each factor on a separate line
* docs README mentions ShowFactorManager
BinaryNativeClassifierMapping and MulticlassNativeClassifierMapping serialize labels as strings. Incremented CustomSerializationVersion for each.
BinaryNativeClassifierMapping does not serialize labels when TLabel is bool.
IReader.ReadObject is generic.
* Added Variable.Max(int,int).
* Compiler warns about excess memory consumption in more cases when it should, and fewer cases when it shouldn't.
* TransformBrowser shows attributes by default.
* Updated FactorDocs