Граф коммитов

197 Коммитов

Автор SHA1 Сообщение Дата
Tom Minka abae518039
OuterQuantiles handles extreme values (#363)
* Subarray checks that indices are distinct when debugging
* Renamed ConcatOpTests to StringConcatOpTests
* Crowdsourcing explains why accuracy and precision are NaN.
* Added build instructions for Visual Studio Code.
2021-09-13 18:20:37 +01:00
msdmkats 321be1463d
Extend SequenceDistribution API and bugfix (#352)
- Fixed automaton deserialization ignoring LogValueOverride
- Fixed SequenceDistribution.EnumerateSupport and TryEnumerateSupport having different side effects
- Added TryDeterminize, SetLogValueOverride, and ProjectOnTransducer methods to SequenceDistribution
- Added a parameterless overload for Automaton.TryDeterminize that returns the output of TryDeterminize(out TThis) and discards information about deterministicity of said output
2021-07-26 16:54:41 +03:00
Tom Minka 14d4164ebc
Improved handling of numerical overflow (#347)
Improved handling of numerical overflow in GaussianProductVmpOp.ProductAverageLogarithm, DoublePlusOp, Gaussian.FromDerivatives, Gaussian.GetDerivatives, GammaFromShapeAndRateOp_Slow.SampleAverageConditional
2021-07-13 00:27:14 +01:00
msdmkats d27df8ce03
Automata: immutability and optimized GetLogValue (#344)
* Immutable distribution interfaces

* DiscreteChar made immutable

* Automata made constant

* Automaton.GetLogValue optimized for cases of deterministic and epsilon-free automata

* Fixed Automaton.[Try]EnumerateSupport so that it won't produce duplicates for non-determinizable automata
2021-07-08 17:03:53 +03:00
msdmkats bd48c63627
Multi-representable weight function for sequence distribution (#339)
* Introduced IWeightFunction - interface for abstract weight functions used by SequenceDistribution

* Multi-representable weight function for sequence distribution, that automatically switches between point mass, dictionary, and autoaton representations as appropriate

* Early stops for automaton support enumeration

* Improved automata graphviz format

* Language writer correctly processes nested generics
2021-06-08 14:03:44 +03:00
Tom Minka 95a6d39f7a
Fixed an issue where IsBetweenGaussianOp would throw "ylInvSqrtVxlPlusAlphaX is NaN" (#338)
* Renamed IsBetweenOp -> IsBetweenGaussianOp
2021-04-19 20:08:07 +01:00
Tom Minka 9b9c7a0571
Variable.Subarray, GetItems, Observed, and Constant are overloaded for IReadOnlyList arguments instead of IList (#330)
* Incremented version to 0.4
* Subarray and GetItems factors and operators take IReadOnlyList instead of IList.
* IMatchboxRecommenderMapping uses IReadOnlyList instead of IList.
* Moved Subarray and GetItems factors from Factor class to Collection class.
* Moved variable factors from Factor class to Clone class.
* Conversion.IsAssignableFrom handles covariance.
* Util.GetElementType and IsIList include IReadOnlyList.
* Code cleanup
* Refactored MessageTransform.ConvertMethodInvoke
* Removed Collection.Sort
2021-03-18 12:15:16 +00:00
Jonathan Tims dcb5ac5d5d
Move compiler options test to separate net461 process (#310)
# problem
The Azure DevOps VSTest runner is not handling the Compiler Options test well, because it takes so long. Also the test takes a very long time to run.

# solution
Run different options in parallel, and in a separate process to finish reliably and in a reasonable time.
2021-03-18 09:21:27 +00:00
Tom Minka 9cd729fd29
CopyPropagationTransform can substitute a value of different type if the context allows it (#328)
InferNet.Infer is generic.
ModelBuilder uses the correct overload of InferNet.Infer.
CodeBuilder.Method checks for correct number of arguments.
FileArray implements IReadOnlyList.
2021-03-16 15:06:59 +00:00
Tom Minka 4394af9e21
Added GetProbBetween to CanGetProbLessThan (#325)
CanGetQuantile and CanGetProbLessThan are generic.
2021-03-12 07:16:01 +00:00
Tom Minka b16f8074d4
Added MeanAccumulator and MMath.WeightedAverage (#324) 2021-03-11 18:49:31 +00:00
Tom Minka eeae7afcb5
Added TruncatedPoisson (#323)
* Added WordStrings and StringFormatTests.
* FactorManager does not allow point mass conversion of the return value argument of EP evidence methods (previously handled by MessageTransform).
* Code cleanup.  Renamed IdentityComparer to ReferenceEqualityComparer.
2021-03-05 18:29:54 +00:00
Tom Minka 3624808df0
Removed the unnecessary new() constraint on ICollectionDistribution.TransformElements and strengthened the method signature. (#321)
* GammaPower.GetProbLessThan returns 0 for x < 0
* LDA example does not use DistributionRefArray
* Updated FactorDocs
* Updated Sequential code doc
2021-02-26 00:38:53 +00:00
Tom Minka 7ae07237fe
SchedulingTransform identifies when a model with offset edges does not need iteration. (#319)
Fixed ForwardBackwardTransform.
2021-02-09 00:40:02 +00:00
Ivan Korostelev 429b4debe7
Reimplement EnumerateSupport() (#315)
Previous version was correct but had a pathalogical (exponential) runtime for some forms of automata
with multiple branches with epsilon transitions. Also code was unneccessary complex, because it tried
to compute unreachable states in presence of loops.

Specific changes in this PR:
- `EnumerateSupport` implementation was moved into its own file - `Automaton.EnumerateSupport`
- `ComputeEndStateReachability` method has been resurrected. It is invoked only if loops are detected
   In all other cases, it is trivial to check for end-state reachability during normal traversal
- Traversal loop has been split in steps, some of which were moved into their own local methods.
- Fast path for non-branchy part of automaton was implemented.
2021-02-08 12:06:09 +00:00
Tom Minka b34f76f193
Channel2Transform creates unique array names (#316)
Removed Cancels attribute from PowerPlateOp.EnterAverageConditional
Sum_Expanded handles an empty array.
Dirichlet.GetMean handles zero pseudo-counts.
Moved TrueSkill tests into TrueSkillTests.
2021-02-04 08:10:25 +00:00
Tom Minka 9235b1732c
Added Python versions of string tutorials (#313) 2021-01-21 13:05:40 +00:00
Tom Minka 0c75b5ef41
Basic support for difference of Beta-distributed variables (#309) 2021-01-05 00:34:00 +00:00
Tom Minka 920e3facbc
DoubleIsBetweenOp with random bounds is more accurate (#306) 2020-12-23 23:07:54 +00:00
Tom Minka 9b79818034
Update motif finder example (#304)
* Motif Finder uses SequenceCount = 70
* DiscreteFromDirichletOp.ProbsAverageConditional handles PiecewiseVector
* GammaPower.FromLogMeanMinusMeanLog handles infinite mean and negative power
2020-11-27 11:12:31 +00:00
Tom Minka 3baf283b6f
Fixed ExpOp with point mass arguments (#303)
* Added Gamma.GetLogMeanMinusMeanLog and FromLogMeanMinusMeanLog
2020-11-23 22:02:32 +00:00
Tom Minka 69a7a1979f
Improved convergence rate of GammaPower.FromMeanAndMeanPower (#301)
* Improved accuracy of ExpOp.ExpAverageConditional
* Tutorials project uses Unicode OutputEncoding
* Fixed MethodBodySynthesizer
2020-11-18 23:44:47 +00:00
Tom Minka dcc3503b7d
Fixed TruncatedGaussian.SetToRatio (#300)
* Improved accuracy of GammaPower.FromMeanAndMeanLog, ExpOp, ExpOp_Slow
* Added MMath.DigammaInv
2020-11-11 08:01:47 +00:00
Tom Minka b3d4436e1f
Generated code uses .NET arrays where possible (#298)
* Improved accuracy of Gamma.FromMeanAndMeanLog, ExpOp, PlusGammaOp, PlusGammaVmpOp
* Fixed GammaPower.SetMeanAndVariance for infinite variance
* TruncatedGamma implements CanGetQuantile
* TruncatedGamma.Sample is faster when shape==1
* Extracted PlusWrappedGaussianOp and PlusTruncatedGaussianOp from PlusDoubleOp
* Gamma.FromMeanAndMeanLog takes an optimal logMean argument.  Removed Gamma.FromLogMeanAndMeanLog.
* Improved accuracy of GammaPower.FromMeanAndMeanLog.
* ExpOp supports GammaPower output.
2020-10-29 19:21:12 +00:00
Jonathan Tims 0be0aefe1d
Rename ImmutableArray to ReadOnlyArray (#293) 2020-09-30 18:54:20 +01:00
Jonathan Tims 568cb96cda
QuantileEstimator stores its random number generator (#291)
This allows QuantileEstimator to produce the same result even if it is serialized/deserialized in the middle.
2020-09-25 15:42:09 +01:00
Tom Minka f1c9073776
MethodInvoke.CanBeInlined returns true for more cases (#292)
* Tests use Assert.Throws instead of try/catch

* Test output is less verbose

* Added SimplestBackwardChainTest3

* Fixed IterativeProcessTransform when variable has QueryTypes.MarginalDividedByPrior and ConstrainEqualRandom(variable, observed)

* Discrete allows Dimension=0
2020-09-24 21:38:30 +01:00
msdmkats e167ce18f3
Use ulp-based constants (#289)
* Fix long overflow in OperatorTests.Longs()

* Fixed printing big float arrays, extended series for ((exp(x) - 1) / x - 1) - 0.5

* GammaPower.GetLogProb uses ulp-based threshold

* Moved a constant for maximum terms in NormalCdfMomentRatio outside of method body

* Separate methods for Previous/NextDoubleWithPositiveDifference

* More magic constants replaced with ulp-based ones

* Fixed abs value comparison in GammaPower.GetLogProb

* Ulp-based constatnts in IsBetween.XAverageConditional

* Moved the definitions of Ulp1-dependant constants below that of Ulp1

* Replaced  <= in GammaPower with a < as it used to be

Co-authored-by: Dmitry Kats <ratkillerx@hotmail.com>
2020-09-15 10:18:21 +01:00
Tom Minka 9ca3230018
Update URL for BookCrossing dataset (#288)
* Added Matrix.FromDiagonal
2020-09-09 23:54:45 +01:00
Tom Minka 2e03d713b9
Fixed Binomial.GetLogProb and Gamma.GetLogNormalizer (#287)
* Removed references to NETFULL.
* Don't define NET45.
* Don't define NETCORE, NETSTANDARD, or NETSTANDARD2_0
2020-09-08 13:39:12 +01:00
Andrii Kurdiumov bc953687bf
TestFSharp uses .NET 4.7.2 (#285)
This is required for .NET 5 support as explained at https://github.com/dotnet/fsharp/issues/10009#issuecomment-681019355
2020-09-01 16:22:40 +01:00
Tom Minka 6186fff2a6
Better handling of Sequential attribute (#279)
IncrementTransform handles GetJaggedItemsOp and GetDeepJaggedItemsOp.
IndexingTransform gives a warning for unimplemented cases instead of throwing.
Improved code doc for Damp functions.
Changed "#if HAS_BINARY_FORMATTER" to "#if NETFULL"
Removed Assert.Timeout from performance tests
2020-08-26 14:47:32 +01:00
Andrii Kurdiumov 4e2ecb17e9
Put binary formatter usage behind define (#276)
This is required to be able support code on .NET 5.0 where this code would be obsolete.
2020-08-25 16:33:21 +01:00
Andrii Kurdiumov 17fbd5f078
Disable warning using #pragma (#275) 2020-08-25 15:26:06 +01:00
Tom Minka f72fe0f846
Improved accuracy of FastSumOp (#272)
* ModelCompiler.IncludeDebugInformation defaults to false

* Fixed TestFSharp.fsproj
2020-08-22 17:02:25 +01:00
msdmkats 980f1e2513
Fixes for RoslynDeclarationProvider and Conversion.TryFindConversion (#271)
* RoslynDeclarationProvider uses all available source codes
to build the requested type declaration.

* Conversion.TryFindConversion looks for casts defined on the toType as
well.

Co-authored-by: Dmitry Kats <ratkillerx@hotmail.com>
2020-08-13 16:47:59 +01:00
Tom Minka 421507bb6e
Update to xUnit 2.4.1 (#265)
* Update build instructions
2020-07-26 22:07:20 +01:00
Tom Minka 82bfc893ed
Ported tutorial code to Python (#264)
* Added TestPython project.
* TestFSharp exposes xUnit tests.
* Added Invoker.InvokeInstance.  Invoker favors overloads with higher array indexing depth.
* Added EpTests.LinearProgrammingTest.
* Updated FSharp.Core, Msagl
* pr-netcore.yml uses latest version of .NET 3.1 instead of exact version
* UseDotNet task
* Updated versions of YAML tasks
* netcoretest.sh uses "dotnet test"
2020-07-26 07:08:10 +01:00
Tom Minka 032372ce36 StringUtil.EscapeXmlCharacters replaces StringUtil.MethodFullNameToXmlString
Removed Visualizers/Windows/FactorManagerView
Fixed MslTests.MethodOverrideTest
Added MslTests.MethodInAnotherFileTest
2020-07-22 19:43:51 +01:00
Tom Minka b94260d87e Added GammaPowerProductOp_LaplaceTest 2020-07-09 06:24:24 +01:00
Tom Minka dbda600379 MMath.DigammaLookup and MMath.NormalCdfMomentRatioTable are Lazy 2020-07-09 06:24:24 +01:00
Tom Minka 4f824e4c23
More special functions improvements (#254)
Removed c_digamma_small case from MMath.Digamma
Added tests for MMath.GammaLnSeries and XMinusLog1Plus
GenerateSeries also generates error bounds
Added CheckMathLibraries
2020-05-28 18:32:55 +01:00
msdmkats c6f896cfbf
Special functions improvements (#253)
- Ensured argument consistency in NormalCdfIntegralTest
- Fixed some test values
- Exp-Sinh quadrature in LogisticGaussian
- Arithmetic-precision-based constants in logistic gaussian
- Named const for Ulp(1)
- Generating series for gamma(x) - 1/x
- Fixed integer overflow in BGRat
- GenerateSeries can print arrays of bigfloats.
2020-05-27 18:19:16 +03:00
Tom Minka 804c8cc83d
NormalCdfRatioLn tests work correctly in arbitrary precision (#250) 2020-05-21 12:54:48 +01:00
Tom Minka 9affe7f082
Updated ComputeSpecialFunctionsTestValues.py (#248)
ComputeSpecialFunctionsTestValues.py computes more accurate test values, uses pure mpmath, more reliable quadrature. Computes values for NormalCdfRatioLn. Removed hard-coded cases.
MMath.Log1PlusExp uses a threshold based on machine epsilon.
2020-05-15 22:20:47 +01:00
Tom Minka 8aa4fbf2fc
Moved ExtendedDouble to the Microsoft.ML.Probabilistic.Math namespace (#246)
Added FloatTests
2020-05-12 18:30:49 +01:00
Tom Minka 4a6cbbaf8e
Fixed more corner cases in MMath.LogisticGaussian (#245) 2020-05-08 14:00:47 +01:00
Tom Minka cc60de338b
Fixed MMath.LogisticGaussian corner case (#244) 2020-05-07 01:14:49 +01:00
Tom Minka d00ac25426
ChainWithTransitionParameterTest2 uses more iterations when OptimiseInferenceCode=false (#239) 2020-04-24 18:49:45 +01:00
Tom Minka f8cf34f301
Discrete point mass distributions are considered equal whenever their Point properties are equal. (#212)
Added PointMassEstimator.
PointMass overrides Equals.
Observed variables can have query types.
Removed the obsolete Output attribute.
Fixed build order.

Internal compiler changes:
* Added MarginalAnalysisTransform.  VariableTransform, ChannelTransform, Channel2Transform no longer set marginal prototypes.
* VariableInformation can be attached to any declaration, not just variables.
* DeriveArrayVariable does not propagate MarginalPrototype attributes.
* VariableTransform and ChannelTransform create marginal channels for constants and parameters.
* ModelBuilder only attaches sizes to objects assignable from arrays.
* Added ObservedVariableMessages CompilerAttribute.
* Channel2Transform attaches DescriptionAttributes
2020-04-19 16:31:49 +01:00
Ivan Korostelev a1cbadd391
Simplify TryComputePoint() (#229)
Automaton is pointmass if it has only 1 support string.

This implementation is also faster than previous one.
2020-04-14 12:09:38 +01:00
msdmkats 520f5d2767
Special functions testing improvements, NormalCdfLn bugfix. (#233)
- Moved arguments and expected result values for special function tests from inlined code to csv files.
- Added a python script to compute expected result values for tested special functions in high precision.
- Improved accuracy of NormalCdfLn for `x < -8`. Truncated series was too short, test used to pass, because the expected value itself was incorrect.
2020-04-08 18:13:08 +03:00
Tom Minka 911c2d1444
CSharpWriter uses G17 to write doubles and G9 to write singles, for correct round-trip behavior (#235) 2020-04-08 10:44:34 +01:00
Andrey Kurdyumov 27200bd649
Update to .NET Core 3.1 (#232)
* Update to .NET Core 3.1

This update was faily trivial change of project properties with only one caveat
```
error CS0104: 'Range' is an ambiguous reference between 'Microsoft.ML.Probabilistic.Models.Range' and 'System.Range'
```

which require sprinkling everywhere following line of code
```
using Range = Microsoft.ML.Probabilistic.Models.Range;
```

* Update build definition to use more modern images
- Windows 2019 for .NET Core 3.1
- Mac OS 10.14 (Specifically https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops)

Co-authored-by: Tom Minka <8955276+tminka@users.noreply.github.com>
2020-04-07 20:30:04 +01:00
Tom Minka 46bdd9dc49
More accurate messages for IsBetween, IsPositive, Plus, Max (#230)
* Use g17 when printing doubles
* Re-enabled GammaUpperRegularized tests
* MMath.LargestDoubleSum fix for 32-bit
2020-04-02 19:02:51 +01:00
Jonathan Tims 57f5a6c7b3
Merge branch 'master' into jotims/pr/fix-prefix-build 2020-03-19 10:32:31 +00:00
Jonathan Tims 205d26f4a1 wip 2020-03-18 23:13:59 +00:00
Tom Minka 9b2cbe9619
Improved TruncatedGamma support (#222)
Improved accuracy of TruncatedGamma.GetMeanAndVariance, GetMeanPower, GetNormalizer, Sample
Improved accuracy of Factor.TruncatedGammaFromShapeAndRate
GammaFromShapeAndRateOp_Slow.SampleAverageConditional handles sample=0
GammaFromShapeAndRateOp_Slow.RateAverageConditional handles rate=0
Improved accuracy of MMath.ExpMinus1, Gamma, GammaUpper
Gamma.SetShapeAndScale throws if shape is infinite
Added TruncatedGamma.GetMode
Added MMath.ToStringExact, GammaUpperScale
Changed uses of :r to :g17 numeric format string
MMath.IndexOfMinimum takes IEnumerable
PowerOp supports GammaPower = TruncatedGamma ^ y
2020-03-13 00:28:01 +00:00
Ivan Korostelev 085a5a4675
Remove Owner field from State (#220)
It really is not required - at all points it is known with which automaton we operate right now.

Also, where only state index was needed we store now the int index instead of fat State object
which contains extra information.
2020-03-11 16:28:08 +00:00
Ivan Korostelev 0c2f2859f5
Improve support enumeration in StringAutomata (#218)
Contracts has been changed: `TryEnumerateSupport()` does not throw if it encounters non-enumerable automata.

- Implementation of `TryEnumerateSupport` was changed a lot:
  - It avoids recursion (and "stacked IEnumerables")
  - An optimization has been added - traversing states with single point-mass forward transition
    (90+% of real-world cases) is cheaper, because it avoids some extra allocations

Also "is enumerable" status is cached. It is set proactively at automaton construction
in 2 common cases which are cheap to detect:
  - automaton with self-loops can not be enumerated
  - automaton with only forward transitions (and thus no loops) can always be enumerated

In all other cases this flag is calculated lazily on first enumeration try.
2020-03-03 20:46:05 +00:00
Ivan Korostelev 7bd784335d
Replace ReadOnlyArray with ImmutableArray look-alike (#217)
`System.Collections.Immutable` has `ImmutableArray` that serves the same purpose as
`ReadOnlyArray` but has different API. This type is available (without extra dependencies)
only in netcore, so can't be used in Infer.NET which has to support netframework.
Until netframework support can be dropped reimplement a subset of `ImmutableArray`
in Infer.NET codebase.
2020-02-29 09:49:41 +00:00
msdmkats 5cd2718883
Special functions use auto-generated truncated series (#207)
* Python tool to generate expressions for truncated series.
* Using the generated series in special functions
* Test projects set Optimize=true in Release configuration

Co-authored-by: Tom Minka <8955276+tminka@users.noreply.github.com>
2020-02-07 12:57:20 +00:00
John Guiver 053bac1751
Log probability override in DiscreteChar (#206)
Log probability override for Discrete char
2020-01-13 14:27:31 +00:00
Tom Minka 75ae71f2b5
DependencyAnalysisTransform fix (#211)
Fixed an issue where DependencyAnalysisTransform would give incorrect SkipIfUniform dependencies, causing statements to be incorrectly pruned from the generated code.
Removed FactorManager.AnyItem. Added FactorManager.All.
GaussianProductOp.AAverageConditional handles uniform B.
MMath.ChooseLn handles more cases.
2020-01-07 20:22:52 +00:00
Ivan Korostelev 54eea9b343
Make CheckStateCount() behave as expected (#210) 2020-01-07 18:23:12 +00:00
msdmkats 482f807bab
Power series abstraction (#201)
- Introduced an abstraction for truncated power series
- Introduced an abstraction for power series
- Made [Di|Tri|Tetra]Gamma[Ln] use it
- Added internal interface to recompute power series used in MMath and make them longer/shorter depending on the necessary precision. It can be used in tests.

Currently, power series computation is rather primitive (cut-off some precomputed series at a point where it used to be cut as long as the precision in <= 53, don't cut-off otherwise).
2019-12-17 17:13:31 +03:00
Jonathan Tims daa1058627 Use license and icon rather than licenseUrl and iconUrl which are now deprecated. (#200)
o Marked each non-packing assembly as "IsPackage=false" because the default is to package the assemblies. Done this via a common.props import.
o Switched to using integrated CSProj NuGet properties. Done this via a nuget-properties.props import.
o Switched to using msbuild in release.yml rather than nuget, since nuget.exe does not support the new spec.
o Added a LearnersNuGet project to be the "csproj" host for the learners nuspec (since the learners nuget does not correspond to any particular existing csproj project).
o Updated ClickThroughModel.csproj and ClinicalTrial.csproj and Image_Classifier.cs and MontyHall.csproj to new-style csprojs because otherwise new msbuild commands reject them.
o Made release.yml msbuild calls multiprocess to speed them up.
o Factored out some common properties into the common.props file.
2019-12-04 14:39:03 +00:00
Tom Minka 57c888024c
Improved GammaPower implementation (#195)
GammaPower creates a point mass whenever Rate would be infinite.
Added GammaPower.FromMeanAndMeanLog.
Improved numerical accuracy of GammaPower.GetLogProb, GetMean, and GetMode.
Added GammaPowerEstimator
GammaProductOp supports GammaPower distributions.
GammaProductOp handles uniform message from product.
Added GammaPowerProductOp_Laplace
Fixed PowerOp for GammaPower distributions.
Added PlusGammaOp for GammaPower distributions.
MMath.GammaUpper has an unregularized option.
Added TruncatedGamma.GetMeanPower.
PowerOp supports TruncatedGamma.
Swapped the argument order of (internal) MMath.LargestDoubleProduct and LargestDoubleRatio.
2019-11-13 14:31:12 +00:00
Ivan Korostelev d6e7d5b975
Make MaxStateCount overrides thread local (#194)
UnlimitedStatesComputation() is used temporary to alter maximal size of automaton
which is defined my MaxStateCount. Using it from different threads could mess up the limit.
Now each threads gets its own limit.

Also, the default MaxStateCount limit is increased to 300k, because that is what the biggest String inference customer uses.
2019-11-07 16:44:54 +00:00
Ivan Korostelev fe41d89eba
Fix DiscreteChar.Complement() & Simplify ranges operations (#192)
After recent refactoring that removed `ProbabilityOutsideRanges`, `DiscreteChar.Complement()` 
started to work incorrectly in case ranges were going one after another.

For example DiscreteChar.Point('\0').Complement() was equal to uniform distribution, i.e. still included the \0 char.
2019-11-05 14:42:28 +00:00
Ivan Korostelev 1f54bc32f7
Precise sums in automaton determinization (#189)
Automaton determinization procedure used to keep a running total of weights for each state.
That sum was maintained by adding weights for next open segment and substracting weights
for closed segments. If the weights of segments differ a lot (LogValue difference is bigger than 100)
than due to numerical issues sum could become zero after subtraction which lead to dropping
of transition and automaton language truncation.

Now the weights sum is recalculated from scratch each time. It also results in loss of precision,
but it is important that the precision is lost only for very large weights, not very small ones. So
no accidental zeroing of weights is happening and language is not truncated.

It doesn't really impact runtime because `WeightedStateSet` construction enumerated all weights
anyway (to normalize them), so in the worst case slowdown is at most constant.
And in average case (where we maintain  1 or 2 destination states per transition) the runtime
is actually better due to lower constant costs.

New test (`AutomatonTests.Determinize11`) was added, it used to fail with previous implementation.

To make this change I had to rewrite code substantially which in my opinion makes it easier to follow:
- The determinization procedure now makes use of `CharSegmentsEnumerator` helper class
  which enumerates over all char segments from multiple char distributions. These segments are
  non-overlapping.
- `WeightedStateSetBuilder` now handles duplicate state indices in `Add()` call. Previously deduplication
  had to happen via accumulating sum in `Dictionary<int, WeightSum>`
2019-10-29 18:18:50 +00:00
Tom Minka f44d7f3a26
Gaussian.GetMean throws if Precision == 0, instead of returning 0. (#183)
ExpOp.ExpAverageConditional correctly handles improper d.
InnerQuantiles/OuterQuantiles check for an empty quantiles array.
2019-10-03 19:47:01 +01:00
Tom Minka 625228bc40 Gaussian.GetLogProb is more accurate for large precision.
DoubleIsBetweenOp fix.
MMath.LargestDoubleSum works in 32-bit.
2019-09-21 20:02:10 +01:00
Tom Minka cfc8128143 MMath.LargestDoubleProduct is more efficient 2019-09-21 20:02:10 +01:00
Tom Minka e642655221
Merge branch 'master' into InnerProduct 2019-09-21 14:25:03 +01:00
Tom Minka a11a745a84
Merge branch 'master' into ivkorost/remove-try-determinize-assert 2019-09-18 15:49:07 +01:00
Ivan Korostelev cfe60e308a Remove wrong Debug.Assert()
String automaton determinization supports infinite weights.
2019-09-18 11:17:15 +01:00
Ivan Korostelev c4c36c1f4b Replace custom enums for caches with bool? type.
Also, fix assert in Automaton.DataContainer.With()
2019-09-18 10:35:32 +01:00
Ivan Korostelev 1893477580 Fix comments 2019-09-17 14:02:06 +01:00
Ivan Korostelev 2d5ce49ea3 Add a test with large automaton 2019-09-17 13:58:23 +01:00
Ivan Korostelev 5fdb6c0779 Use explicit stack in Automaton.IsZero() & cache call result 2019-09-17 13:54:55 +01:00
Tom Minka 2c6a925066 Gaussian constructor reduces precision to avoid overflow, instead of creating a point mass.
Fixed corner cases in MMath.LargestDoubleProduct and NormalCdfIntegral.
Improved accuracy of DoubleIsBetweenOp.
Loosened the tolerance of GaussianIsBetweenCRCC_IsMonotonicInXMean and GaussianIsBetweenCRCC_IsMonotonicInXPrecision, to be fixed later.
2019-08-30 17:36:29 +01:00
Tom Minka 4ecd6d19fc GaussianOp.PrecisionAverageConditional_Point correctly handles corner cases 2019-08-30 17:36:29 +01:00
Tom Minka a280a6272f Added GatedInnerProductVectorTest and missing overloads of InnerProductOp 2019-08-23 16:03:05 +01:00
Tom Minka 98e07ae335 AutomatonTests.Determinize10 is OpenBug 2019-08-22 18:09:39 +01:00
Tom Minka 9e7dfcf7f2 InnerProduct supports fixed output when B vector has zero or one nonzero element.
ExpOp_Slow handles GammaPower messages.
GammaPower implements division by constant.
2019-08-22 14:53:06 +01:00
Tom Minka 397642ff2a Improved numerical accuracy of GaussianOp, DoubleIsBetweenOp, and MMath.NormalCdf2.
Fixed cases where MMath.NormalCdf, DoubleIsBetweenOp, IsPositiveOp, DoublePlusOp would throw.
JaggedSubarrayOp uses ForceProper.
PlusDoubleOp gracefully handles improper distributions on input.
Added MMath.NormalCdfIntegral, NormalCdfDiff, NormalCdfExtended.
Added ExtendedDouble class.
MeanVarianceAccumulator correctly handles weight=0.
Region.GetLogVolume has a lower bound.
Variables with PointEstimate do not use JaggedSubarrayWithMarginal.
2019-06-25 23:23:09 +01:00
Tom Minka 63fe682bed OuterQuantiles and InnerQuantiles can be serialized and compared.
Their second constructor is now a factory method.
2019-06-13 22:35:24 +01:00
Tom Minka b7ab86af1b InnerQuantiles and OuterQuantiles throw on infinite quantiles. InnerQuantiles.GetLogProb and GetQuantile are non-decreasing. 2019-06-07 20:36:09 +01:00
Ivan Korostelev 70ae46cd79
Fix DiscreteChar.Sample() (#153)
Due to refactoring mistake `sampleProb / prob * intervalLength`
turned into `sampleProb / (prob * intervalLength)` which is obviosly incorrect.

Fixed that + added a rudimentary test for `DiscreteChar.Sample()`
2019-05-28 15:52:38 +01:00
Elena Pochernina 97c8995e4f Added test for StringFormatOp parallel run (#147)
Made `ArgumentCountToNames` cache thread-safety.
After that another exception appeared with `ArgsToValidatingAutomaton` dictionary (reference null exception). Making the dictionary concurrent fixed this error.
2019-04-24 11:20:23 +01:00
Ivan Korostelev e486a155b9
Automaton determinization improvements (#144)
* Determinization doesn't change language of the automaton anymore.
  2 new tests were added which check that it doesn't happen.
  Previously all very low probability transitions (with probability of less than e^-35) were removed.
  That was done because of 2 reasons:
  - some probabilities were represented in linear space. And e^-35 is as low resolution
    as you can get with regular doubles. (That is, if one probability = 1 and another is e^-35,
    then when you add them together, second one is indistinguishable from zero).
    This was fixed when discrete charprobabilities
  - Trying to determinize some non-determinizable automata lead to explosion of low-probability
    states and transitions which led to a very poor performance.
    (E.g. `AutomatonNormalizationPerformance3` test). Now a smarter strategy for detecting
    these non-determinizable automata is used - during traversal all sets of states from root
    are remembered. If automaton comes to the same set of states but with different weights
    than it stops immediately, because otherwise it will be caught in infinite loop
* `Equals()` and `GetHashCode()` for `WeightedStateSet` take into account only high 32 bits of weight.
  This coupled with normalization of weights allows to reuse already added states with
  very close weights. This speeds up the "PropertyInferencePerformanceTest" 2.5x due to
  smaller intermediate automata.
* Weighted sets of size one are handled specially in `TryDeterminize` - they don't need to be
  determinized and can be copied into result almost as is. (Unless they have non-deterministic
  transitions. Simple heuristic of "has different destination states" is used to detect that
  and fallback to slow general path).
* Representation for `WeightedStateSet` is changed from (int -> float) dictionary to
  sorted array of (int, float) pairs. As an optimization, a common case of single-element
  set does not allocate any arrays.
* Determinization code for `ListAutomaton` was removed, because it has never worked
2019-04-23 15:30:16 +01:00
Ivan Korostelev faf55fffb9
Do not fail in MergeParallelTransitions when transition weights are high (#149)
Transition weights may become infinite when going from log space into value space. And
`SetToSum()` which is used by `MergerParallelTransitions` can't handle 2 inifnite weights at once.
To solve this, transitions weights are normalized by max weight prior to passing into `SetToSum()`
2019-04-20 00:10:13 +01:00
Tom Minka 8398c1d4f7
ModelCompiler.TraceAllMessages activates Tracing.Trace for all variables (#145)
Tracing.Trace uses System.Diagnostics.Trace.
Added DifficultyAbility.fs to TestFSharp.
2019-04-01 22:52:22 +01:00
Ivan Korostelev 9ab76c6b7c
Change transitions range representation in Automaton.StateData (#140)
* Pair (FirstTranstiionIndex, LastTransitionIndex) changed to pair
  (FirstTransitionIndex, TransitionsCount).
* Introduced Automaton.Builder.LinkedStateData struct which
  mirrors Automaton.StateData but represents transitions as linked list.
  Previously Automaton.StateData was reused for this purpose.
  That was confusing.
2019-03-26 19:20:27 +00:00
Ivan Korostelev 036f9bfd71
Store log probabilities inside DiscreteChar (#137)
In some edge cases StringAutomaton needs to represent extremely
low probabilities of character transitions. To be able to do that,
instead of storing probabilities as double values they are stored
as `Weight` structs already used by automatons. Weight stores logarithm
of value instead of value itself.
2019-03-21 13:49:06 +00:00
Tom Minka c0a5734c98
Gamma.GetQuantile supports Shape != 1. (#131)
GammaPower implements GetProbLessThan and GetQuantile.
GammaPower.GetMeanPower does not throw on infinity.
2019-03-15 10:52:39 +00:00
Ivan Korostelev a2a3498f6f
Cache TryDeterminize() call results & add proper support for enum constant to Compiler (#127)
TryDeterminized() tried to do something even if it is known that automaton is already determinized
or non-determinizable.

Because automaton state is immutable, it is possible to store determinization state alongside with it.
There are 3 states:
* Unknown - TryDeterminize() was never called for this automaton
* IsDeterminized - TryDeterminized() successfully determinized automaton
* IsNonDeterminizable - TryDeterminize() was called but didn't succeed.

Because determinization state depends on maximum number of states,
`TryDeterminize(int maxStatesCount)` method was removed in favour of using defaults.
Because this overload was never used in practice.

Also, as an implementation detail an enum type was exposed as part of automaton quoting interface.
Compiler generated incorrect C# code for quoting enum constants. Fixed that.
2019-03-13 01:00:16 +00:00
Ivan Korostelev dc0b30487e
Add operator overloads to Weight class (#121) 2019-03-12 13:18:33 +00:00
Ivan Korostelev 6d2ce9a993
Get rid of recursive implementations (#125)
We finally reached a point where automatons are big enough that recursive implementations
of algorithms fail with `StackOverflowException`.

There are 4 tests which test operations with big automatons (100k states):
* `TryComputePointLargeAutomaton`
* `SetToProductLargeAutomaton`
* `GetLogNormalizerLargeAutomaton`
* `ProjectSourceLargeAutomaton`

All four used to fail before code was rewritten to use explicit stack instead of recursive calls.
All changes except one didn't change algorithms used, code was almost mechanically changed
to use stack.

The only exception - automaton simplification. Old code used recursion in very non-trivial way.
It was rewritten from scratch, using different algorithm: instead of extraction of generalized
sequences and then reinserting them back new code merges states directly in automaton.
(There's a comment at the beginning of Simplify() method explaining all operations)
2019-03-11 11:33:29 +00:00