Commit graph

166 commits

Author SHA1 Message Date
Jonathan Tims 2f58f64afb w 2022-06-22 07:54:27 +01:00
Tom Minka 55d26a7138
Added Variable.ParallelCopy (#404)
* VectorGaussian.SetToRatio correctly enforces forceProper
* BeliefPropagationGateEnterPartialOp.ValueAverageConditional double-checks for conflicting values
* Added ReplicateOp_Divide_NoInit
* PlusProductHierarchyTest is OpenBug
* MatrixVectorProduct has optional check
* Deterministic gates track the number of cases
2022-05-03 22:27:20 +01:00
Tom Minka ca25068ead
WetGlassSprinklerRainModel has flexible state counts (#402)
* Expanded CausalityExample
* Variable.Dirichlet and Discrete check the array size
* IterativeProcessTransform ensures Marginal methods have unique names
* DependencyGraph.initializedEdges overrides required edges
* DependencyAnalysisTransform requires gated definitions to fully precede their uses
2022-04-08 20:14:57 +01:00
Tom Minka 68a10ac833
Rprop stepsize is lower bounded (#398)
Reduced the number of false "excess memory" warnings from GateTransform.
2022-03-02 21:10:46 +00:00
Tom Minka 45d2e593ba
Improved accuracy of MMath.WeightedAverage and GaussianProductOp (#395)
* RpropBufferData.EnsureConvergence = false
* GetJaggedItemsOp and GetItemsFromJaggedOp use IReadOnlyList
2022-02-16 14:58:03 +00:00
Andrii Kurdiumov 1b4c479339
.NET 5 instead of .NET Core 3.1 (#273)
* Rename netcoreapp3.1 to net5.0 in all files
* Remove references to Microsoft.NET.Sdk.WindowsDesktop
* WinForms/WPF examples use net5.0-windows
2022-01-26 13:22:26 +00:00
Tom Minka 9b28c62294
Improved support for computing derivatives (#390)
* Inferred constant variables are not inlined
* Added Factor.ProbLessThan, ProbBetween, Quantile, Integral, Apply.
* ModelCompiler handles Delegate-valued variables.
* RatioGaussianOp handles ratio instead of ProductOp.
* ExpOp_Slow and LogOp_EP can compute derivatives
* Refactored ConstantFoldingTransform out of ModelAnalysisTransform
* CodeBuilder.MethodRefExpr takes arguments.
* Added VariableInformation.NeedsMarginalDividedByPrior and CodeRecognizer.NeedsMarginalDividedByPrior
* MessageTransform does not use a distribution for the forward message of a constant.
2022-01-25 21:41:52 +00:00
Tom Minka b6dd9630f5
DiscreteEnumAreEqualOp supports VMP (#387) 2022-01-08 18:47:02 +00:00
Tom Minka 079250ef67
Observed Beta variables support a Gamma-distributed pseudocount if the other pseudocount is always 1 (#386)
* ShowFactorManager shows more patterns
* Tidy up code
2022-01-07 18:07:48 +00:00
Tom Minka 791d11a8a1
MMath.WeightedAverage fix (#379) 2021-12-26 20:03:11 +00:00
Tom Minka 5f394f9b8a
Added IClassifierMapping.LabelToString and ParseLabel. (#378)
BinaryNativeClassifierMapping and MulticlassNativeClassifierMapping serialize labels as strings. Incremented CustomSerializationVersion for each.
BinaryNativeClassifierMapping does not serialize labels when TLabel is bool.
IReader.ReadObject is generic.
2021-12-14 11:33:00 +00:00
Tom Minka 3c825f2d29
BayesPointMachineClassifier.CreateGaussianPriorBinaryClassifier is public. (#376)
* Added Variable.Max(int,int).
* Compiler warns about excess memory consumption in more cases when it should, and fewer cases when it shouldn't.
* TransformBrowser shows attributes by default.
* Updated FactorDocs
2021-12-10 23:37:00 +00:00
Tom Minka 6c6fb65dd2
BayesPointMachineClassifier can be serialized as text (#373)
* BayesPointMachineClassifier.LoadBackwardCompatibleBinaryClassifier and SaveForwardCompatible use text or binary depending on the file extension
* Added IWriter and IReader, WrappedBinaryWriter, WrappedBinaryReader, WrappedTextWriter, WrappedTextReader
* Changed uses of BinaryWriter to IWriter
* Changed uses of BinaryReader to IReader
* LearnersTests uses same xunit version as other tests
2021-11-29 23:08:58 +00:00
Ivan Korostelev 2376606055
Remove Pair<> and introduce IntPair (#369)
C# now has a built-in type to represent pairs, `ValueTuple<>`. `Pair<>` was eliminated in favour of it.

At the same time a new type - `IntPair` was introduced. It is faster than `ValueTuple<int, int>` when `.Equals()` is called often.
It makes some common string operations which do many lookups by `IntPair` keys up to 10% faster.
2021-09-30 17:11:04 +01:00
Ivan Korostelev 376918bfd6
Reduce GC pressure induced by string operations (#362)
A lot of automata operations create large short-lived data structures.
Those are now cached in thread-local static fields. This saves many allocations and consequently reduces GC pressure.

To make this possible, a new container is introduced: `GenerationalDictionary`, which can be cleared in constant time and reuses its memory after `Clear`.

These changes reduce the memory allocated by `StringInferencePerformanceTests` from 4 GB to 2 GB. Another 1 GB is allocated by `FindStronlyConnectedComponents`, which will be fixed in a separate PR; it is harder to introduce caching there. Another 1 GB is the setup of the tests (creation of test data).
2021-09-29 18:28:15 +01:00
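The constant-time clear described in #362 can be illustrated with a generation counter. The following is a minimal Python sketch of that general idea, not the Infer.NET `GenerationalDictionary` itself; the class name `GenerationalDict` and all of its members are hypothetical and chosen only for illustration.

```python
class GenerationalDict:
    """Dictionary-like container whose clear() runs in constant time.

    Hypothetical illustration: every slot remembers the generation in which
    it was last written. clear() only bumps the counter, so slots from older
    generations become invisible and are overwritten in place by later writes.
    """

    def __init__(self):
        self._slots = {}       # key -> [generation, value]
        self._generation = 0

    def __setitem__(self, key, value):
        slot = self._slots.get(key)
        if slot is None:
            self._slots[key] = [self._generation, value]
        else:
            # Reuse the existing slot instead of allocating a new one.
            slot[0] = self._generation
            slot[1] = value

    def __getitem__(self, key):
        slot = self._slots.get(key)
        if slot is None or slot[0] != self._generation:
            raise KeyError(key)
        return slot[1]

    def __contains__(self, key):
        slot = self._slots.get(key)
        return slot is not None and slot[0] == self._generation

    def clear(self):
        # O(1): no slot is touched, stale entries are simply ignored.
        self._generation += 1
```

The trade-off in this sketch is that stale slots stay allocated until they are overwritten, which suits a cache that is filled with similar keys again right after each clear.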
Tom Minka 591117241e
All-zero Discrete is not a point mass (#368) 2021-09-29 11:19:15 +01:00
Tom Minka 76ec66dcca
Changed the BCC confusion matrix prior (#367)
Changed the BCC confusion matrix prior so that TrueLabels can be inferred when LabelCount==2.
Fixed serialization of BCC posteriors.  BCC posteriors now save to the Results folder.
Fixed serialization example code.
2021-09-29 07:09:38 +01:00
Tom Minka c90ad0bf1e
Discrete distribution can be all zero (#364)
Added Discrete.Zero and IsZero.
IntegerPlusOp allows result to be the same object as an input.
2021-09-23 17:16:56 +01:00
Ivan Korostelev 8370c9e8e7
Simplify discrete char implementation (#361)
There are two changes:
1. (major) `LogProbabilityOverride` was removed from `DiscreteChar` and `ImmutableDiscreteChar`
    This was an unsound functionality that is not used and was complicating the implementation of char distributions
2. (minor) `ImmutableDiscreteChar.Multiply` may reuse immutable discrete char in more cases out of the box. 
   This reduces the GC pressure a little bit.
2021-09-14 21:53:05 +01:00
Tom Minka abae518039
OuterQuantiles handles extreme values (#363)
* Subarray checks that indices are distinct when debugging
* Renamed ConcatOpTests to StringConcatOpTests
* Crowdsourcing explains why accuracy and precision are NaN.
* Added build instructions for Visual Studio Code.
2021-09-13 18:20:37 +01:00
msdmkats 321be1463d
Extend SequenceDistribution API and bugfix (#352)
- Fixed automaton deserialization ignoring LogValueOverride
- Fixed SequenceDistribution.EnumerateSupport and TryEnumerateSupport having different side effects
- Added TryDeterminize, SetLogValueOverride, and ProjectOnTransducer methods to SequenceDistribution
- Added a parameterless overload for Automaton.TryDeterminize that returns the output of TryDeterminize(out TThis) and discards information about deterministicity of said output
2021-07-26 16:54:41 +03:00
Tom Minka 14d4164ebc
Improved handling of numerical overflow (#347)
Improved handling of numerical overflow in GaussianProductVmpOp.ProductAverageLogarithm, DoublePlusOp, Gaussian.FromDerivatives, Gaussian.GetDerivatives, GammaFromShapeAndRateOp_Slow.SampleAverageConditional
2021-07-13 00:27:14 +01:00
msdmkats d27df8ce03
Automata: immutability and optimized GetLogValue (#344)
* Immutable distribution interfaces

* DiscreteChar made immutable

* Automata made constant

* Automaton.GetLogValue optimized for cases of deterministic and epsilon-free automata

* Fixed Automaton.[Try]EnumerateSupport so that it won't produce duplicates for non-determinizable automata
2021-07-08 17:03:53 +03:00
msdmkats bd48c63627
Multi-representable weight function for sequence distribution (#339)
* Introduced IWeightFunction - interface for abstract weight functions used by SequenceDistribution

* Multi-representable weight function for sequence distribution that automatically switches between point mass, dictionary, and automaton representations as appropriate

* Early stops for automaton support enumeration

* Improved automata graphviz format

* Language writer correctly processes nested generics
2021-06-08 14:03:44 +03:00
Tom Minka 95a6d39f7a
Fixed an issue where IsBetweenGaussianOp would throw "ylInvSqrtVxlPlusAlphaX is NaN" (#338)
* Renamed IsBetweenOp -> IsBetweenGaussianOp
2021-04-19 20:08:07 +01:00
Tom Minka 9b9c7a0571
Variable.Subarray, GetItems, Observed, and Constant are overloaded for IReadOnlyList arguments instead of IList (#330)
* Incremented version to 0.4
* Subarray and GetItems factors and operators take IReadOnlyList instead of IList.
* IMatchboxRecommenderMapping uses IReadOnlyList instead of IList.
* Moved Subarray and GetItems factors from Factor class to Collection class.
* Moved variable factors from Factor class to Clone class.
* Conversion.IsAssignableFrom handles covariance.
* Util.GetElementType and IsIList include IReadOnlyList.
* Code cleanup
* Refactored MessageTransform.ConvertMethodInvoke
* Removed Collection.Sort
2021-03-18 12:15:16 +00:00
Jonathan Tims dcb5ac5d5d
Move compiler options test to separate net461 process (#310)
# problem
The Azure DevOps VSTest runner does not handle the Compiler Options test well, because the test takes a very long time to run.

# solution
Run different options in parallel, and in a separate process to finish reliably and in a reasonable time.
2021-03-18 09:21:27 +00:00
Tom Minka 9cd729fd29
CopyPropagationTransform can substitute a value of different type if the context allows it (#328)
InferNet.Infer is generic.
ModelBuilder uses the correct overload of InferNet.Infer.
CodeBuilder.Method checks for correct number of arguments.
FileArray implements IReadOnlyList.
2021-03-16 15:06:59 +00:00
Tom Minka 4394af9e21
Added GetProbBetween to CanGetProbLessThan (#325)
CanGetQuantile and CanGetProbLessThan are generic.
2021-03-12 07:16:01 +00:00
Tom Minka b16f8074d4
Added MeanAccumulator and MMath.WeightedAverage (#324) 2021-03-11 18:49:31 +00:00
Tom Minka eeae7afcb5
Added TruncatedPoisson (#323)
* Added WordStrings and StringFormatTests.
* FactorManager does not allow point mass conversion of the return value argument of EP evidence methods (previously handled by MessageTransform).
* Code cleanup.  Renamed IdentityComparer to ReferenceEqualityComparer.
2021-03-05 18:29:54 +00:00
Tom Minka 3624808df0
Removed the unnecessary new() constraint on ICollectionDistribution.TransformElements and strengthened the method signature. (#321)
* GammaPower.GetProbLessThan returns 0 for x < 0
* LDA example does not use DistributionRefArray
* Updated FactorDocs
* Updated Sequential code doc
2021-02-26 00:38:53 +00:00
Tom Minka 7ae07237fe
SchedulingTransform identifies when a model with offset edges does not need iteration. (#319)
Fixed ForwardBackwardTransform.
2021-02-09 00:40:02 +00:00
Ivan Korostelev 429b4debe7
Reimplement EnumerateSupport() (#315)
The previous version was correct but had a pathological (exponential) runtime for some forms of automata
with multiple branches with epsilon transitions. The code was also unnecessarily complex, because it tried
to compute unreachable states in the presence of loops.

Specific changes in this PR:
- `EnumerateSupport` implementation was moved into its own file - `Automaton.EnumerateSupport`
- `ComputeEndStateReachability` method has been resurrected. It is invoked only if loops are detected.
   In all other cases, it is trivial to check for end-state reachability during normal traversal.
- Traversal loop has been split in steps, some of which were moved into their own local methods.
- Fast path for non-branchy part of automaton was implemented.
2021-02-08 12:06:09 +00:00
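The end-state reachability check mentioned in #315 can be pictured as a backward search from the accepting states. This is a minimal Python sketch of that general technique under assumed data structures (a plain dict of successor lists); the function name `states_reaching_end` is hypothetical and this is not the code of `ComputeEndStateReachability`.

```python
from collections import deque


def states_reaching_end(transitions, end_states):
    """Return the set of states from which some end state is reachable.

    transitions: dict mapping a state to an iterable of successor states
                 (assumed representation, for illustration only).
    end_states:  iterable of accepting states.
    """
    # Reverse every edge so the search can walk backwards from end states.
    reverse = {}
    for src, dsts in transitions.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)

    reachable = set(end_states)
    queue = deque(reachable)
    while queue:
        state = queue.popleft()
        for pred in reverse.get(state, ()):
            if pred not in reachable:
                reachable.add(pred)
                queue.append(pred)
    return reachable
```

In an automaton with loops, a state such as one that only loops on itself never appears in the result, so a support enumeration could skip it instead of exploring it indefinitely.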
Tom Minka b34f76f193
Channel2Transform creates unique array names (#316)
Removed Cancels attribute from PowerPlateOp.EnterAverageConditional
Sum_Expanded handles an empty array.
Dirichlet.GetMean handles zero pseudo-counts.
Moved TrueSkill tests into TrueSkillTests.
2021-02-04 08:10:25 +00:00
Tom Minka 9235b1732c
Added Python versions of string tutorials (#313) 2021-01-21 13:05:40 +00:00
Tom Minka 0c75b5ef41
Basic support for difference of Beta-distributed variables (#309) 2021-01-05 00:34:00 +00:00
Tom Minka 920e3facbc
DoubleIsBetweenOp with random bounds is more accurate (#306) 2020-12-23 23:07:54 +00:00
Tom Minka 9b79818034
Update motif finder example (#304)
* Motif Finder uses SequenceCount = 70
* DiscreteFromDirichletOp.ProbsAverageConditional handles PiecewiseVector
* GammaPower.FromLogMeanMinusMeanLog handles infinite mean and negative power
2020-11-27 11:12:31 +00:00
Tom Minka 3baf283b6f
Fixed ExpOp with point mass arguments (#303)
* Added Gamma.GetLogMeanMinusMeanLog and FromLogMeanMinusMeanLog
2020-11-23 22:02:32 +00:00
Tom Minka 69a7a1979f
Improved convergence rate of GammaPower.FromMeanAndMeanPower (#301)
* Improved accuracy of ExpOp.ExpAverageConditional
* Tutorials project uses Unicode OutputEncoding
* Fixed MethodBodySynthesizer
2020-11-18 23:44:47 +00:00
Tom Minka dcc3503b7d
Fixed TruncatedGaussian.SetToRatio (#300)
* Improved accuracy of GammaPower.FromMeanAndMeanLog, ExpOp, ExpOp_Slow
* Added MMath.DigammaInv
2020-11-11 08:01:47 +00:00
Tom Minka b3d4436e1f
Generated code uses .NET arrays where possible (#298)
* Improved accuracy of Gamma.FromMeanAndMeanLog, ExpOp, PlusGammaOp, PlusGammaVmpOp
* Fixed GammaPower.SetMeanAndVariance for infinite variance
* TruncatedGamma implements CanGetQuantile
* TruncatedGamma.Sample is faster when shape==1
* Extracted PlusWrappedGaussianOp and PlusTruncatedGaussianOp from PlusDoubleOp
* Gamma.FromMeanAndMeanLog takes an optional logMean argument.  Removed Gamma.FromLogMeanAndMeanLog.
* Improved accuracy of GammaPower.FromMeanAndMeanLog.
* ExpOp supports GammaPower output.
2020-10-29 19:21:12 +00:00
Jonathan Tims 0be0aefe1d
Rename ImmutableArray to ReadOnlyArray (#293) 2020-09-30 18:54:20 +01:00
Jonathan Tims 568cb96cda
QuantileEstimator stores its random number generator (#291)
This allows QuantileEstimator to produce the same result even if it is serialized/deserialized in the middle.
2020-09-25 15:42:09 +01:00
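The effect described in #291 can be reproduced with any randomized streaming summary: if the random number generator is part of the saved state, a save/restore in the middle of the stream does not change the outcome. The sketch below uses a reservoir sampler as a stand-in; it is not the `QuantileEstimator` algorithm, and the names `ReservoirSampler`, `to_state`, `from_state`, and `run` are hypothetical.

```python
import random


class ReservoirSampler:
    """Stand-in for a randomized streaming summary that owns its RNG."""

    def __init__(self, capacity, seed):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)  # stored on the object, not global

    def add(self, x):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(x)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = x

    def to_state(self):
        # The RNG state is saved together with the data.
        return {"capacity": self.capacity, "items": list(self.items),
                "seen": self.seen, "rng": self.rng.getstate()}

    @classmethod
    def from_state(cls, state):
        sampler = cls(state["capacity"], seed=0)
        sampler.items = list(state["items"])
        sampler.seen = state["seen"]
        sampler.rng.setstate(state["rng"])  # resume the same random sequence
        return sampler


def run(stream, checkpoint_at=None):
    """Feed the stream, optionally saving and restoring at one position."""
    sampler = ReservoirSampler(capacity=5, seed=0)
    for i, x in enumerate(stream):
        if i == checkpoint_at:
            sampler = ReservoirSampler.from_state(sampler.to_state())
        sampler.add(x)
    return sampler.items
```

Calling `run(range(100))` and `run(range(100), checkpoint_at=50)` gives the same sample, because the restored object continues the same random sequence.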
Tom Minka f1c9073776
MethodInvoke.CanBeInlined returns true for more cases (#292)
* Tests use Assert.Throws instead of try/catch

* Test output is less verbose

* Added SimplestBackwardChainTest3

* Fixed IterativeProcessTransform when variable has QueryTypes.MarginalDividedByPrior and ConstrainEqualRandom(variable, observed)

* Discrete allows Dimension=0
2020-09-24 21:38:30 +01:00
msdmkats e167ce18f3
Use ulp-based constants (#289)
* Fix long overflow in OperatorTests.Longs()

* Fixed printing big float arrays, extended series for ((exp(x) - 1) / x - 1) - 0.5

* GammaPower.GetLogProb uses ulp-based threshold

* Moved a constant for maximum terms in NormalCdfMomentRatio outside of method body

* Separate methods for Previous/NextDoubleWithPositiveDifference

* More magic constants replaced with ulp-based ones

* Fixed abs value comparison in GammaPower.GetLogProb

* Ulp-based constants in IsBetween.XAverageConditional

* Moved the definitions of Ulp1-dependent constants below that of Ulp1

* Replaced  <= in GammaPower with a < as it used to be

Co-authored-by: Dmitry Kats <ratkillerx@hotmail.com>
2020-09-15 10:18:21 +01:00
Tom Minka 9ca3230018
Update URL for BookCrossing dataset (#288)
* Added Matrix.FromDiagonal
2020-09-09 23:54:45 +01:00
Tom Minka 2e03d713b9
Fixed Binomial.GetLogProb and Gamma.GetLogNormalizer (#287)
* Removed references to NETFULL.
* Don't define NET45.
* Don't define NETCORE, NETSTANDARD, or NETSTANDARD2_0
2020-09-08 13:39:12 +01:00
Andrii Kurdiumov bc953687bf
TestFSharp uses .NET 4.7.2 (#285)
This is required for .NET 5 support as explained at https://github.com/dotnet/fsharp/issues/10009#issuecomment-681019355
2020-09-01 16:22:40 +01:00