`System.Collections.Immutable` has `ImmutableArray`, which serves the same purpose as
`ReadOnlyArray` but has a different API. That type is available (without extra dependencies)
only on .NET Core, so it can't be used in Infer.NET, which has to support .NET Framework.
Until .NET Framework support can be dropped, a subset of `ImmutableArray` is reimplemented
in the Infer.NET codebase.
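A minimal sketch of what such a subset could look like (illustrative only; the members below are chosen for the example and are not necessarily the actual `ReadOnlyArray` surface):

```csharp
using System.Collections;
using System.Collections.Generic;

// Illustrative sketch: an immutable wrapper over an array, mimicking a small
// subset of System.Collections.Immutable.ImmutableArray<T>.
public readonly struct ReadOnlyArray<T> : IReadOnlyList<T>
{
    private readonly T[] array;

    private ReadOnlyArray(T[] array) => this.array = array;

    // Copies the source so later mutations of it cannot be observed through the wrapper.
    public static ReadOnlyArray<T> CreateCopy(IEnumerable<T> items) =>
        new ReadOnlyArray<T>(new List<T>(items).ToArray());

    public T this[int index] => this.array[index];

    public int Count => this.array.Length;

    public IEnumerator<T> GetEnumerator() => ((IEnumerable<T>)this.array).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => this.GetEnumerator();
}
```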
makeApiDocs.ps1 uses a newer docfx version and copies the build directory, because the project files now require it in order to build.
toc.yml references md files instead of html files.
Compiler escapes the < and > symbols in XML doc.
Updated and fixed broken factor documentation.
Tools.PrepareSource only processes lines that start with ///
Co-authored-by: Tom Minka <8955276+tminka@users.noreply.github.com>
* Python tool to generate expressions for truncated series.
* Using the generated series in special functions
* Test projects set Optimize=true in Release configuration
Co-authored-by: Tom Minka <8955276+tminka@users.noreply.github.com>
Fixed an issue where DependencyAnalysisTransform would give incorrect SkipIfUniform dependencies, causing statements to be incorrectly pruned from the generated code.
Removed FactorManager.AnyItem. Added FactorManager.All.
GaussianProductOp.AAverageConditional handles uniform B.
MMath.ChooseLn handles more cases.
- Introduced an abstraction for truncated power series
- Introduced an abstraction for power series
- Made [Di|Tri|Tetra]Gamma[Ln] use it
- Added an internal interface to recompute the power series used in MMath and make them longer or shorter depending on the required precision. It can be used in tests.
Currently, power series computation is rather primitive: precomputed series are cut off at the point where they used to be cut as long as the precision is <= 53 bits, and are not cut off otherwise.
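For illustration, a truncated power series abstraction can be as small as a coefficient list plus Horner evaluation; the type and member names below are hypothetical, not the actual MMath types:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a truncated power series
// c[0] + c[1]*x + ... + c[n]*x^n held as its coefficient list.
public sealed class TruncatedPowerSeries
{
    public IReadOnlyList<double> Coefficients { get; }

    public TruncatedPowerSeries(IReadOnlyList<double> coefficients) =>
        this.Coefficients = coefficients;

    // Generates coefficients from a rule and cuts the series off after `length` terms,
    // mirroring the "recompute and make longer/shorter" idea mentioned above.
    public static TruncatedPowerSeries Generate(Func<int, double> coefficient, int length) =>
        new TruncatedPowerSeries(Enumerable.Range(0, length).Select(coefficient).ToArray());

    // Evaluates the series at x using Horner's scheme.
    public double Evaluate(double x)
    {
        double result = 0.0;
        for (int i = this.Coefficients.Count - 1; i >= 0; i--)
        {
            result = result * x + this.Coefficients[i];
        }
        return result;
    }
}
```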
o Marked each non-packing assembly with "IsPackage=false", because the default is to package the assemblies. This is done via a common.props import.
o Switched to using the integrated CSProj NuGet properties. This is done via a nuget-properties.props import.
o Switched to using msbuild in release.yml rather than nuget, since nuget.exe does not support the new spec.
o Added a LearnersNuGet project to be the "csproj" host for the learners nuspec (since the learners nuget does not correspond to any particular existing csproj project).
o Updated ClickThroughModel.csproj, ClinicalTrial.csproj, Image_Classifier.cs, and MontyHall.csproj to new-style csprojs, because otherwise the new msbuild commands reject them.
o Made release.yml msbuild calls multiprocess to speed them up.
o Factored out some common properties into the common.props file.
GammaPower creates a point mass whenever Rate would be infinite.
Added GammaPower.FromMeanAndMeanLog.
Improved numerical accuracy of GammaPower.GetLogProb, GetMean, and GetMode.
Added GammaPowerEstimator.
GammaProductOp supports GammaPower distributions.
GammaProductOp handles uniform message from product.
Added GammaPowerProductOp_Laplace.
Fixed PowerOp for GammaPower distributions.
Added PlusGammaOp for GammaPower distributions.
MMath.GammaUpper has an unregularized option.
Added TruncatedGamma.GetMeanPower.
PowerOp supports TruncatedGamma.
Swapped the argument order of (internal) MMath.LargestDoubleProduct and LargestDoubleRatio.
UnlimitedStatesComputation() is used temporarily to alter the maximal size of an automaton,
which is defined by MaxStateCount. Using it from different threads could mess up the limit.
Now each thread gets its own limit.
Also, the default MaxStateCount limit is increased to 300k, because that is what the biggest String inference customer uses.
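A sketch of the per-thread idea (`MaxStateCount` and `UnlimitedStatesComputation` are the names from the note above; everything else here is an assumption, not the actual automaton code):

```csharp
using System;

public static class AutomatonLimits
{
    private const int DefaultMaxStateCount = 300_000;

    // [ThreadStatic] gives every thread its own copy of the override,
    // so raising the limit on one thread cannot affect the others.
    [ThreadStatic]
    private static int maxStateCountOverride;

    public static int MaxStateCount =>
        maxStateCountOverride > 0 ? maxStateCountOverride : DefaultMaxStateCount;

    // Raises the limit for the current thread only and restores it on Dispose.
    public struct UnlimitedStatesComputation : IDisposable
    {
        private readonly int previousOverride;

        public UnlimitedStatesComputation(int temporaryLimit)
        {
            this.previousOverride = maxStateCountOverride;
            maxStateCountOverride = temporaryLimit;
        }

        public void Dispose() => maxStateCountOverride = this.previousOverride;
    }
}
```

With such a pattern, `using (new AutomatonLimits.UnlimitedStatesComputation(int.MaxValue)) { /* build a large automaton */ }` lifts the limit only for the calling thread.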
The `CombinedRanges` method was actually 2 completely different methods, which are now split in 2
(and simplified a little):
- `IntersectRanges` - returns only segments where both probabilities are non-zero
- `CombineRanges` - returns all segments
After the recent refactoring that removed `ProbabilityOutsideRanges`, `DiscreteChar.Complement()`
started to work incorrectly when ranges immediately followed one another.
For example, `DiscreteChar.Point('\0').Complement()` was equal to the uniform distribution, i.e. it still included the '\0' char.
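A sketch of a regression check for this, assuming the xunit framework used by the test projects and that `IsUniform`/`GetLogProb` behave as their names suggest:

```csharp
using Microsoft.ML.Probabilistic.Distributions;
using Xunit;

public class DiscreteCharComplementTests
{
    [Fact]
    public void ComplementOfPointMassExcludesThePoint()
    {
        // Before the fix, Point('\0').Complement() came out as the uniform
        // distribution, i.e. it still assigned probability to '\0'.
        var complement = DiscreteChar.Point('\0').Complement();

        Assert.False(complement.IsUniform());
        Assert.Equal(double.NegativeInfinity, complement.GetLogProb('\0'));
    }
}
```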
The automaton determinization procedure used to keep a running total of weights for each state.
That sum was maintained by adding the weights for the next open segment and subtracting the weights
for closed segments. If the weights of segments differ a lot (the LogValue difference is bigger than 100),
then due to numerical issues the sum could become zero after subtraction, which led to transitions
being dropped and the automaton language being truncated.
Now the weights sum is recalculated from scratch each time. This also results in a loss of precision,
but it is important that the precision is lost only for very large weights, not very small ones. So
no accidental zeroing of weights happens and the language is not truncated.
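The numerical effect is ordinary floating-point cancellation; a tiny standalone illustration with plain doubles (unrelated to the actual automaton code):

```csharp
using System;

public static class CancellationDemo
{
    public static void Main()
    {
        double small = 1e-120;
        double huge = 1e+120;

        // Running total: adding and later subtracting a huge weight
        // silently destroys the small weight that was already in the sum.
        double runningTotal = small;
        runningTotal += huge;   // small is rounded away here
        runningTotal -= huge;   // leaves exactly 0.0, not 1e-120

        // Recomputing the sum from only the currently open segments keeps the
        // small weight, because the huge one is no longer part of the sum.
        double recomputed = small;

        Console.WriteLine(runningTotal); // 0
        Console.WriteLine(recomputed);   // 1E-120
    }
}
```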
This doesn't really impact runtime, because `WeightedStateSet` construction enumerates all weights
anyway (to normalize them), so in the worst case the slowdown is at most a constant factor.
And in the average case (where we maintain 1 or 2 destination states per transition) the runtime
is actually better due to lower constant costs.
A new test (`AutomatonTests.Determinize11`) was added; it used to fail with the previous implementation.
To make this change I had to rewrite the code substantially, which in my opinion makes it easier to follow:
- The determinization procedure now makes use of the `CharSegmentsEnumerator` helper class,
which enumerates over all char segments from multiple char distributions. These segments are
non-overlapping.
- `WeightedStateSetBuilder` now handles duplicate state indices in the `Add()` call. Previously deduplication
had to happen by accumulating sums in a `Dictionary<int, WeightSum>`.
A lot of code existed that had to treat `ProbabilityOutsideRanges` in a special way.
Now, in cases where `ProbabilityOutsideRanges` was non-zero, the missing ranges are added to
cover the whole char domain.
`ProbabilityOutsideRanges` had one useful property: it needed half as many ranges to represent distributions covering the whole domain.
That property is never used in real code (and even if it were, it would be in very few places), so reducing code complexity trumps the small performance/space gain.
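A sketch of the gap-filling idea with a hypothetical range type (the real `DiscreteChar` storage is different):

```csharp
using System.Collections.Generic;

public static class RangeFilling
{
    // Hypothetical range representation used only for this illustration.
    public readonly struct CharRange
    {
        public CharRange(int start, int endExclusive, double probability)
        {
            this.Start = start;
            this.EndExclusive = endExclusive;
            this.Probability = probability;
        }

        public int Start { get; }
        public int EndExclusive { get; }
        public double Probability { get; }
    }

    // Given sorted, non-overlapping ranges, inserts explicit ranges for the gaps,
    // carrying the probability that ProbabilityOutsideRanges used to hold,
    // so that the result covers the whole char domain.
    public static List<CharRange> CoverWholeDomain(
        IEnumerable<CharRange> ranges, double probabilityOutsideRanges)
    {
        var result = new List<CharRange>();
        int position = 0;
        foreach (var range in ranges)
        {
            if (range.Start > position)
            {
                result.Add(new CharRange(position, range.Start, probabilityOutsideRanges));
            }
            result.Add(range);
            position = range.EndExclusive;
        }
        if (position <= char.MaxValue)
        {
            result.Add(new CharRange(position, char.MaxValue + 1, probabilityOutsideRanges));
        }
        return result;
    }
}
```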
`DiscreteChar` is basically a wrapper struct around a single pointer.
This pointer can be null if a DiscreteChar is created using `default(DiscreteChar)`.
An `IsInitialized` method is added which checks whether this pointer is not null.
This is useful for implementing caching of `DiscreteChar`s using an array `DiscreteChar[] cache`.
Uninitialized cache entries can be found using `cache[index].IsInitialized`; previously
we had to use `Nullable<DiscreteChar>[] cache`, which has a measurable size overhead.
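A sketch of the caching pattern this enables (the `DiscreteChar.Point` factory and the exact shape of `IsInitialized` follow the wording above and may differ from the real API):

```csharp
using Microsoft.ML.Probabilistic.Distributions;

public static class DiscreteCharCache
{
    private const int CacheSize = 128;

    // A plain array of structs: no Nullable<T> wrapper, so no extra has-value
    // flag (plus padding) is stored next to each cached entry.
    private static readonly DiscreteChar[] cache = new DiscreteChar[CacheSize];

    public static DiscreteChar GetPointMass(char value)
    {
        if (value >= CacheSize)
        {
            return DiscreteChar.Point(value);
        }

        // default(DiscreteChar) entries have a null internal pointer, which IsInitialized detects.
        if (!cache[value].IsInitialized())
        {
            cache[value] = DiscreteChar.Point(value);
        }

        return cache[value];
    }
}
```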