Commit Graph

452 Commits

Author SHA1 Message Date
Tom Minka 791d11a8a1
MMath.WeightedAverage fix (#379) 2021-12-26 20:03:11 +00:00
Tom Minka 5f394f9b8a
Added IClassifierMapping.LabelToString and ParseLabel. (#378)
BinaryNativeClassifierMapping and MulticlassNativeClassifierMapping serialize labels as strings. Incremented CustomSerializationVersion for each.
BinaryNativeClassifierMapping does not serialize labels when TLabel is bool.
IReader.ReadObject is generic.
2021-12-14 11:33:00 +00:00
Tom Minka 3c825f2d29
BayesPointMachineClassifier.CreateGaussianPriorBinaryClassifier is public. (#376)
* Added Variable.Max(int,int).
* Compiler warns about excess memory consumption in more cases when it should, and fewer cases when it shouldn't.
* TransformBrowser shows attributes by default.
* Updated FactorDocs
2021-12-10 23:37:00 +00:00
Jonathan Tims cc132f77f8
Switch to explicit pools in YAML for PR build (#374) 2021-11-30 14:39:43 +00:00
Tom Minka 6c6fb65dd2
BayesPointMachineClassifier can be serialized as text (#373)
* BayesPointMachineClassifier.LoadBackwardCompatibleBinaryClassifier and SaveForwardCompatible use text or binary depending on the file extension
* Added IWriter and IReader, WrappedBinaryWriter, WrappedBinaryReader, WrappedTextWriter, WrappedTextReader
* Changed uses of BinaryWriter to IWriter
* Changed uses of BinaryReader to IReader
* LearnersTests uses same xunit version as other tests
2021-11-29 23:08:58 +00:00
Ivan Korostelev 85be0de052
Avoid accidental quadratic behavior in caching for Automata calculations (#370)
Introduce a "generational" strategy for clearing intermediate data in the reused preallocated array that holds
`Automaton.Condensation.FindStronglyConnectedComponents` state.

To avoid allocations, `FindStronglyConnectedComponents` employs a preallocated array with data about states.
It may use this cache multiple times when processing a single automaton. It typically uses only a handful
of entries in this array, but each time it grabs the array, it has to clear it in full.

With the previous array-clearing strategy, `ComputeEpsilonClosure` was quadratic in the worst case:
it calls `FindStronglyConnectedComponents` once per state in the automaton, i.e. O(N) times,
and each call clears the array, which is also O(N), making the whole thing O(N^2).
2021-10-06 14:21:28 +01:00
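The generational clearing idea from commit 85be0de052 can be sketched as follows. This is a minimal Python illustration, not the Infer.NET code: the class name `GenerationalArray` and its methods are hypothetical. Each slot records the generation in which it was last written, so a logical clear only increments a counter instead of touching every slot.

```python
class GenerationalArray:
    """Fixed-size scratch array whose logical clear is O(1).

    Hypothetical illustration; not the Infer.NET implementation.
    """

    def __init__(self, size, default=0):
        self._values = [default] * size
        self._stamps = [0] * size  # generation in which each slot was last written
        self._generation = 1       # current generation; all stamps start out stale
        self._default = default

    def clear(self):
        # O(1): bumping the generation invalidates every slot at once.
        self._generation += 1

    def __getitem__(self, index):
        # A slot written in an older generation reads as the default value.
        if self._stamps[index] == self._generation:
            return self._values[index]
        return self._default

    def __setitem__(self, index, value):
        self._values[index] = value
        self._stamps[index] = self._generation


scratch = GenerationalArray(1000)
scratch[7] = 42
scratch.clear()   # constant time, regardless of array size
```

With this scheme, N uses of the array cost O(N) in clearing overall rather than O(N^2). Python integers do not overflow; an implementation with a fixed-width generation counter would presumably need a full reset when the counter wraps.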
Ivan Korostelev 2376606055
Remove Pair<> and introduce IntPair (#369)
C# now has a built-in type to represent pairs - `ValueTuple<>`. `Pair<>` was eliminated in favour of it.

At the same time a new type - `IntPair` was introduced. It is faster than `ValueTuple<int, int>` when `.Equals()` is called often.
It makes some common string operations which do many lookups by `IntPair` keys up to 10% faster.
2021-09-30 17:11:04 +01:00
Ivan Korostelev 376918bfd6
Reduce GC pressure induced by string operations (#362)
Many automata operations create large short-lived data structures.
Those are now cached in thread-local static fields. This saves many allocations and consequently reduces GC pressure.

To make this possible, a new container is introduced: `GenerationalDictionary`, which can be cleared in constant time and reuses its memory after `Clear`.

These changes reduce the memory allocated by `StringInferencePerformanceTests` from 4 GB to 2 GB. Another 1 GB is allocated by `FindStronglyConnectedComponents`, which will be fixed in a separate PR because caching is harder to introduce there. The remaining 1 GB is test setup (creation of test data).
2021-09-29 18:28:15 +01:00
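The per-thread caching pattern mentioned in commit 376918bfd6 can be sketched roughly as below. This is a Python illustration under stated assumptions, not the project's C# code: `borrow_scratch_list` and `_scratch` are hypothetical names, and `threading.local` stands in for thread-local static fields. It shows only the reuse pattern; CPython may release a list's backing storage on `clear()`, so the allocation savings reported in the commit are specific to the .NET implementation.

```python
import threading

# One scratch container per thread, created lazily (hypothetical helper).
_scratch = threading.local()


def borrow_scratch_list():
    """Return this thread's reusable list, emptied and ready for use."""
    buf = getattr(_scratch, "buffer", None)
    if buf is None:
        buf = []
        _scratch.buffer = buf
    buf.clear()
    return buf


def unique_in_order(items):
    """Example operation that needs a short-lived working list."""
    work = borrow_scratch_list()
    for item in items:
        if item not in work:
            work.append(item)
    return tuple(work)  # copy out: the scratch list is reused by the next call
```

Because the container is shared by every call on the same thread, results must be copied out before the next call, and the helper must not be used re-entrantly.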
Tom Minka 591117241e
All-zero Discrete is not a point mass (#368) 2021-09-29 11:19:15 +01:00
Tom Minka 76ec66dcca
Changed the BCC confusion matrix prior (#367)
Changed the BCC confusion matrix prior so that TrueLabels can be inferred when LabelCount==2.
Fixed serialization of BCC posteriors.  BCC posteriors now save to the Results folder.
Fixed serialization example code.
2021-09-29 07:09:38 +01:00
Tom Minka c90ad0bf1e
Discrete distribution can be all zero (#364)
Added Discrete.Zero and IsZero.
IntegerPlusOp allows result to be the same object as an input.
2021-09-23 17:16:56 +01:00
Ivan Korostelev 8370c9e8e7
Simplify discrete char implementation (#361)
There are two changes:
1. (major) `LogProbabilityOverride` was removed from `DiscreteChar` and `ImmutableDiscreteChar`
    This was an unsound functionality that is not used and was complicating the implementation of char distributions
2. (minor) `ImmutableDiscreteChar.Multiply` may reuse immutable discrete char in more cases out of the box. 
   This reduces the GC pressure a little bit.
2021-09-14 21:53:05 +01:00
Tom Minka abae518039
OuterQuantiles handles extreme values (#363)
* Subarray checks that indices are distinct when debugging
* Renamed ConcatOpTests to StringConcatOpTests
* Crowdsourcing explains why accuracy and precision are NaN.
* Added build instructions for Visual Studio Code.
2021-09-13 18:20:37 +01:00
msdmkats 321be1463d
Extend SequenceDistribution API and bugfix (#352)
- Fixed automaton deserialization ignoring LogValueOverride
- Fixed SequenceDistribution.EnumerateSupport and TryEnumerateSupport having different side effects
- Added TryDeterminize, SetLogValueOverride, and ProjectOnTransducer methods to SequenceDistribution
- Added a parameterless overload for Automaton.TryDeterminize that returns the output of TryDeterminize(out TThis) and discards information about deterministicity of said output
2021-07-26 16:54:41 +03:00
Tom Minka 14d4164ebc
Improved handling of numerical overflow (#347)
Improved handling of numerical overflow in GaussianProductVmpOp.ProductAverageLogarithm, DoublePlusOp, Gaussian.FromDerivatives, Gaussian.GetDerivatives, GammaFromShapeAndRateOp_Slow.SampleAverageConditional
2021-07-13 00:27:14 +01:00
msdmkats d27df8ce03
Automata: immutability and optimized GetLogValue (#344)
* Immutable distribution interfaces

* DiscreteChar made immutable

* Automata made constant

* Automaton.GetLogValue optimized for cases of deterministic and epsilon-free automata

* Fixed Automaton.[Try]EnumerateSupport so that it won't produce duplicates for non-determinizable automata
2021-07-08 17:03:53 +03:00
msdmkats bd48c63627
Multi-representable weight function for sequence distribution (#339)
* Introduced IWeightFunction - interface for abstract weight functions used by SequenceDistribution

* Multi-representable weight function for sequence distribution, that automatically switches between point mass, dictionary, and automaton representations as appropriate

* Early stops for automaton support enumeration

* Improved automata graphviz format

* Language writer correctly processes nested generics
2021-06-08 14:03:44 +03:00
Martin 1d9536b2b2
Fix SonarQube's "IEnumerable LINQs should be simplified" / Code Smell (#341) 2021-05-13 21:47:27 +01:00
Tom Minka 95a6d39f7a
Fixed an issue where IsBetweenGaussianOp would throw "ylInvSqrtVxlPlusAlphaX is NaN" (#338)
* Renamed IsBetweenOp -> IsBetweenGaussianOp
2021-04-19 20:08:07 +01:00
Jonathan Tims bd821dac12
Migrate to VSEngSS-MicroBuild2019
2021-04-13 18:04:19 +01:00
Jonathan Tims d4966ec295
Update status badges (for default branch rename)
2021-04-07 09:41:57 +01:00
Jonathan Tims 33bc436481
Update status badges (for default branch rename)
A different link is required now that the default branch has been renamed.
2021-04-07 09:28:00 +01:00
Jonathan Tims 50afbcd442
Disabled "all compiler options" build for PRs (#331) 2021-03-18 12:33:15 +00:00
Tom Minka 9b9c7a0571
Variable.Subarray, GetItems, Observed, and Constant are overloaded for IReadOnlyList arguments instead of IList (#330)
* Incremented version to 0.4
* Subarray and GetItems factors and operators take IReadOnlyList instead of IList.
* IMatchboxRecommenderMapping uses IReadOnlyList instead of IList.
* Moved Subarray and GetItems factors from Factor class to Collection class.
* Moved variable factors from Factor class to Clone class.
* Conversion.IsAssignableFrom handles covariance.
* Util.GetElementType and IsIList include IReadOnlyList.
* Code cleanup
* Refactored MessageTransform.ConvertMethodInvoke
* Removed Collection.Sort
2021-03-18 12:15:16 +00:00
Jonathan Tims dcb5ac5d5d
Move compiler options test to separate net461 process (#310)
# problem
The Azure DevOps VSTest runner is not handling the Compiler Options test well, because it takes so long. Also the test takes a very long time to run.

# solution
Run different options in parallel, and in a separate process to finish reliably and in a reasonable time.
2021-03-18 09:21:27 +00:00
Jonathan Tims b61f522d9c
Change references to "main" branch
2021-03-17 21:28:39 +00:00
Tom Minka 9cd729fd29
CopyPropagationTransform can substitute a value of different type if the context allows it (#328)
InferNet.Infer is generic.
ModelBuilder uses the correct overload of InferNet.Infer.
CodeBuilder.Method checks for correct number of arguments.
FileArray implements IReadOnlyList.
2021-03-16 15:06:59 +00:00
Tom Minka 4394af9e21
Added GetProbBetween to CanGetProbLessThan (#325)
CanGetQuantile and CanGetProbLessThan are generic.
2021-03-12 07:16:01 +00:00
Tom Minka b16f8074d4
Added MeanAccumulator and MMath.WeightedAverage (#324) 2021-03-11 18:49:31 +00:00
Tom Minka eeae7afcb5
Added TruncatedPoisson (#323)
* Added WordStrings and StringFormatTests.
* FactorManager does not allow point mass conversion of the return value argument of EP evidence methods (previously handled by MessageTransform).
* Code cleanup.  Renamed IdentityComparer to ReferenceEqualityComparer.
2021-03-05 18:29:54 +00:00
Tom Minka 3624808df0
Removed the unnecessary new() constraint on ICollectionDistribution.TransformElements and strengthened the method signature. (#321)
* GammaPower.GetProbLessThan returns 0 for x < 0
* LDA example does not use DistributionRefArray
* Updated FactorDocs
* Updated Sequential code doc
2021-02-26 00:38:53 +00:00
Tom Minka 8f3189b8e0
StringFormatOp takes IReadOnlyList (#320) 2021-02-09 15:10:29 +00:00
Tom Minka 7ae07237fe
SchedulingTransform identifies when a model with offset edges does not need iteration. (#319)
Fixed ForwardBackwardTransform.
2021-02-09 00:40:02 +00:00
Ivan Korostelev ac0c035bb9
Micro-optimizations for string distributions (#317)
* Enumerating the support of a distribution does not require it to be normalized unless determinization is requested.
   Skipping normalization speeds up enumeration for some automata 10x.
* States of `EpsilonClosure` can be stored in a plain array instead of a list. This minor change speeds up epsilon
   closure operations by 20% due to one less level of indirection and better memory allocation.
2021-02-08 14:19:42 +00:00
Ivan Korostelev 429b4debe7
Reimplement EnumerateSupport() (#315)
The previous version was correct but had pathological (exponential) runtime for some forms of automata
with multiple branches with epsilon transitions. The code was also unnecessarily complex, because it tried
to compute unreachable states in the presence of loops.

Specific changes in this PR:
- `EnumerateSupport` implementation was moved into its own file - `Automaton.EnumerateSupport`
- `ComputeEndStateReachability` method has been resurrected. It is invoked only if loops are detected.
   In all other cases, it is trivial to check for end-state reachability during normal traversal.
- Traversal loop has been split in steps, some of which were moved into their own local methods.
- Fast path for non-branchy part of automaton was implemented.
2021-02-08 12:06:09 +00:00
Tom Minka b34f76f193
Channel2Transform creates unique array names (#316)
Removed Cancels attribute from PowerPlateOp.EnterAverageConditional
Sum_Expanded handles an empty array.
Dirichlet.GetMean handles zero pseudo-counts.
Moved TrueSkill tests into TrueSkillTests.
2021-02-04 08:10:25 +00:00
Tom Minka 9235b1732c
Added Python versions of string tutorials (#313) 2021-01-21 13:05:40 +00:00
Tom Minka 0c75b5ef41
Basic support for difference of Beta-distributed variables (#309) 2021-01-05 00:34:00 +00:00
Tom Minka 920e3facbc
DoubleIsBetweenOp with random bounds is more accurate (#306) 2020-12-23 23:07:54 +00:00
Tom Minka 9b79818034
Update motif finder example (#304)
* Motif Finder uses SequenceCount = 70
* DiscreteFromDirichletOp.ProbsAverageConditional handles PiecewiseVector
* GammaPower.FromLogMeanMinusMeanLog handles infinite mean and negative power
2020-11-27 11:12:31 +00:00
Tom Minka 3baf283b6f
Fixed ExpOp with point mass arguments (#303)
* Added Gamma.GetLogMeanMinusMeanLog and FromLogMeanMinusMeanLog
2020-11-23 22:02:32 +00:00
Tom Minka 69a7a1979f
Improved convergence rate of GammaPower.FromMeanAndMeanPower (#301)
* Improved accuracy of ExpOp.ExpAverageConditional
* Tutorials project uses Unicode OutputEncoding
* Fixed MethodBodySynthesizer
2020-11-18 23:44:47 +00:00
Tom Minka dcc3503b7d
Fixed TruncatedGaussian.SetToRatio (#300)
* Improved accuracy of GammaPower.FromMeanAndMeanLog, ExpOp, ExpOp_Slow
* Added MMath.DigammaInv
2020-11-11 08:01:47 +00:00
Tom Minka b3d4436e1f
Generated code uses .NET arrays where possible (#298)
* Improved accuracy of Gamma.FromMeanAndMeanLog, ExpOp, PlusGammaOp, PlusGammaVmpOp
* Fixed GammaPower.SetMeanAndVariance for infinite variance
* TruncatedGamma implements CanGetQuantile
* TruncatedGamma.Sample is faster when shape==1
* Extracted PlusWrappedGaussianOp and PlusTruncatedGaussianOp from PlusDoubleOp
* Gamma.FromMeanAndMeanLog takes an optional logMean argument.  Removed Gamma.FromLogMeanAndMeanLog.
* Improved accuracy of GammaPower.FromMeanAndMeanLog.
* ExpOp supports GammaPower output.
2020-10-29 19:21:12 +00:00
Jonathan Tims 0be0aefe1d
Rename ImmutableArray to ReadOnlyArray (#293) 2020-09-30 18:54:20 +01:00
Jonathan Tims 568cb96cda
QuantileEstimator stores its random number generator (#291)
This allows QuantileEstimator to produce the same result even if it is serialized/deserialized in the middle.
2020-09-25 15:42:09 +01:00
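The effect described in commit 568cb96cda can be illustrated with a small Python sketch. `ReservoirSampler` is a hypothetical stand-in for a randomized streaming estimator, not QuantileEstimator itself, and `pickle` stands in for the library's serialization. Because the random number generator is a field of the object, its state is saved and restored together with the rest of the object.

```python
import pickle
import random


class ReservoirSampler:
    """Hypothetical randomized streaming estimator that owns its RNG."""

    def __init__(self, capacity, seed):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)  # owned RNG: part of the serialized state

    def add(self, x):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(x)
        else:
            # Replace a stored item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = x


def run(stream, checkpoint_at=None):
    """Feed the stream, optionally round-tripping through pickle midway."""
    sampler = ReservoirSampler(capacity=5, seed=123)
    for i, x in enumerate(stream):
        if i == checkpoint_at:
            sampler = pickle.loads(pickle.dumps(sampler))
        sampler.add(x)
    return sampler.items
```

Running `run(range(1000))` and `run(range(1000), checkpoint_at=500)` gives identical samples, since the restored object continues from the same generator state. If the estimator drew from a shared global generator instead, the two runs could diverge after the round trip.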
Tom Minka f1c9073776
MethodInvoke.CanBeInlined returns true for more cases (#292)
* Tests use Assert.Throws instead of try/catch

* Test output is less verbose

* Added SimplestBackwardChainTest3

* Fixed IterativeProcessTransform when variable has QueryTypes.MarginalDividedByPrior and ConstrainEqualRandom(variable, observed)

* Discrete allows Dimension=0
2020-09-24 21:38:30 +01:00
msdmkats e167ce18f3
Use ulp-based constants (#289)
* Fix long overflow in OperatorTests.Longs()

* Fixed printing big float arrays, extended series for ((exp(x) - 1) / x - 1) - 0.5

* GammaPower.GetLogProb uses ulp-based threshold

* Moved a constant for maximum terms in NormalCdfMomentRatio outside of method body

* Separate methods for Previous/NextDoubleWithPositiveDifference

* More magic constants replaced with ulp-based ones

* Fixed abs value comparison in GammaPower.GetLogProb

* Ulp-based constants in IsBetween.XAverageConditional

* Moved the definitions of Ulp1-dependent constants below that of Ulp1

* Replaced <= in GammaPower with a < as it used to be

Co-authored-by: Dmitry Kats <ratkillerx@hotmail.com>
2020-09-15 10:18:21 +01:00
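As a rough illustration of what an ulp-based threshold means in commit e167ce18f3 (a Python sketch, not the Infer.NET code; `nearly_equal_ulps` and `max_ulps` are hypothetical names), the tolerance below scales with the spacing of doubles near the operands instead of being a fixed magic constant. It relies on `math.ulp` and `math.nextafter`, available from Python 3.9.

```python
import math


def nearly_equal_ulps(a, b, max_ulps=4):
    """True if a and b differ by at most max_ulps units in the last place.

    The ulp is measured at the larger of the two magnitudes.
    """
    if a == b:
        return True
    if math.isnan(a) or math.isnan(b) or math.isinf(a) or math.isinf(b):
        return False
    scale = max(abs(a), abs(b))
    return abs(a - b) <= max_ulps * math.ulp(scale)


big = 1e20
neighbour = math.nextafter(big, math.inf)  # adjacent double above 1e20

# A fixed absolute epsilon treats adjacent doubles near 1e20 as different,
# while the ulp-based test accepts them.
fixed_epsilon_says_equal = abs(big - neighbour) <= 1e-9
ulp_says_equal = nearly_equal_ulps(big, neighbour)
```

A fixed epsilon is too strict for large magnitudes and too loose for small ones; an ulp-based tolerance adapts to both.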
Tom Minka 9ca3230018
Update URL for BookCrossing dataset (#288)
* Added Matrix.FromDiagonal
2020-09-09 23:54:45 +01:00
Tom Minka 2e03d713b9
Fixed Binomial.GetLogProb and Gamma.GetLogNormalizer (#287)
* Removed references to NETFULL.
* Don't define NET45.
* Don't define NETCORE, NETSTANDARD, or NETSTANDARD2_0
2020-09-08 13:39:12 +01:00