Commit graph

158 commits

Author SHA1 Message Date
Ivan Korostelev 9ab76c6b7c
Change transitions range representation in Automaton.StateData (#140)
* Pair (FirstTransitionIndex, LastTransitionIndex) changed to pair
  (FirstTransitionIndex, TransitionsCount).
* Introduced Automaton.Builder.LinkedStateData struct which
  mirrors Automaton.StateData but represents transitions as a linked list.
  Previously Automaton.StateData was reused for this purpose,
  which was confusing.
2019-03-26 19:20:27 +00:00
Ivan Korostelev 26c093e17f
GetLogProb should return stored log value directly (#139)
Since DiscreteChar was refactored, GetLogProb() can directly return the saved
logarithm of the probability instead of first exponentiating it and then taking the logarithm.
2019-03-26 18:44:08 +00:00
Ivan Korostelev 8d1857adbf
Make ToLower() work with more char classes (#135)
Previously `NotSupportedException` was thrown for `WordChar` and `Uniform` char classes.
Now versions of these distributions with upper chars excluded are returned.

Note: implementation of `ToLower()` for `Unknown` char class is still somewhat broken:
- it ignores probability outside ranges.
- it can get slow if char ranges are big.
2019-03-22 14:23:15 +00:00
Ivan Korostelev a0246ff22d
Infer determinization state in simple cases (#136)
If both inputs are determinized, then the product is also determinized
2019-03-22 10:24:16 +00:00
Ivan Korostelev 946b5dfb83
Represent ReadOnlyArraySegment range as a (begin, length) tuple
1. Previously it was represented as (begin, end) tuple.
2. Add index checks in debug builds
2019-03-21 16:15:44 +00:00
Ivan Korostelev 036f9bfd71
Store log probabilities inside DiscreteChar (#137)
In some edge cases StringAutomaton needs to represent extremely
low probabilities of character transitions. To be able to do that,
instead of storing probabilities as double values they are stored
as `Weight` structs already used by automatons. Weight stores logarithm
of value instead of value itself.
2019-03-21 13:49:06 +00:00
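
The motivation for storing logarithms can be illustrated with a minimal Python sketch. This is not the Infer.NET C# code: the `LogWeight` class and its method names are hypothetical stand-ins for the `Weight` struct mentioned in the commit message, chosen only for this example.

```python
import math


class LogWeight:
    """Hypothetical stand-in for a weight that stores log(value)."""

    def __init__(self, log_value):
        self.log_value = log_value

    @classmethod
    def from_value(cls, value):
        # A value of zero is represented by negative infinity in log space.
        return cls(math.log(value) if value > 0 else -math.inf)

    def __mul__(self, other):
        # Multiplying values corresponds to adding their logarithms.
        return LogWeight(self.log_value + other.log_value)

    def get_log_prob(self):
        # The stored logarithm is returned directly, with no exp/log round trip.
        return self.log_value


tiny = LogWeight.from_value(1e-200)
product = tiny * tiny        # the true value is 1e-400
as_double = 1e-200 * 1e-200  # underflows to 0.0 in double precision
```

In this sketch the plain double product is lost to underflow, while the log-space product stays finite and can still be compared or combined with other weights.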
Ivan Korostelev 4b03a49877
Make Weights comparable (#134) 2019-03-20 18:11:43 +00:00
John Guiver 3cc2602194
Bug in previous commit. Call Try version of SetToConstantOnSupportOfLog. (#132) 2019-03-15 14:43:13 +00:00
Tom Minka c0a5734c98
Gamma.GetQuantile supports Shape != 1. (#131)
GammaPower implements GetProbLessThan and GetQuantile.
GammaPower.GetMeanPower does not throw on infinity.
2019-03-15 10:52:39 +00:00
John Guiver b0c42758b8
Implement TrySetToConstantOnSupportOfLog (#130)
* SetToConstantOnSupportOfLog will throw a NotImplementedException if non-determinizable.

* TrySetToConstantOnSupportOfLog implementation
2019-03-14 16:45:13 +00:00
Ivan Korostelev 0af99effe7
Simplification improvements (#129)
1. Actually remove merged states in MergeTrees(). Not doing this led to slightly worse performance.
2. TryComputePoint() contained a bug where it didn't traverse some transitions in the automaton.
2019-03-13 21:23:14 +00:00
Ivan Korostelev a064ca63bd
Fix performance issues after recursion elimination (#128)
* FindStronglyConnectedComponents() incorrectly used an array to store state info. When
  the epsilon closure of a state was constructed, only a very small fraction of states was traversed,
  but the array was created for all states in the automaton. This created huge GC pressure.
* SetToEpsilonClosure() is also now recursion-free.
* Automaton simplification now considers the important case of eliminating epsilon-transitions.
2019-03-13 11:20:37 +00:00
Ivan Korostelev a2a3498f6f
Cache TryDeterminize() call results & add proper support for enum constant to Compiler (#127)
TryDeterminize() tried to do work even when it was already known that the automaton is determinized
or non-determinizable.

Because automaton state is immutable, it is possible to store the determinization state alongside it.
There are 3 states:
* Unknown - TryDeterminize() was never called for this automaton
* IsDeterminized - TryDeterminize() successfully determinized the automaton
* IsNonDeterminizable - TryDeterminize() was called but didn't succeed.

Because determinization state depends on the maximum number of states,
the `TryDeterminize(int maxStatesCount)` overload was removed in favour of using defaults.
This overload was never used in practice.

Also, as an implementation detail an enum type was exposed as part of automaton quoting interface.
Compiler generated incorrect C# code for quoting enum constants. Fixed that.
2019-03-13 01:00:16 +00:00
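
The three-state caching idea described in this commit can be sketched as follows. This is a simplified Python illustration, not the Infer.NET implementation: `CachedAutomaton`, the `determinize` callback, and the `attempts` counter are hypothetical names introduced for the example.

```python
from enum import Enum


class DeterminizationState(Enum):
    UNKNOWN = 0                # try_determinize() was never called
    IS_DETERMINIZED = 1        # a previous call succeeded
    IS_NON_DETERMINIZABLE = 2  # a previous call failed


class CachedAutomaton:
    """Hypothetical wrapper that remembers the outcome of determinization."""

    def __init__(self, data, state=DeterminizationState.UNKNOWN):
        self.data = data
        self.state = state
        self.attempts = 0  # counts how often the expensive work actually ran

    def try_determinize(self, determinize):
        # Skip the work when a previous call already settled the answer.
        if self.state is DeterminizationState.IS_DETERMINIZED:
            return True
        if self.state is DeterminizationState.IS_NON_DETERMINIZABLE:
            return False
        self.attempts += 1
        result = determinize(self.data)  # returns new data, or None on failure
        if result is None:
            self.state = DeterminizationState.IS_NON_DETERMINIZABLE
            return False
        self.data = result
        self.state = DeterminizationState.IS_DETERMINIZED
        return True


ok = CachedAutomaton("abc")
first = ok.try_determinize(lambda d: d.upper())
second = ok.try_determinize(lambda d: d.upper())

bad = CachedAutomaton("abc")
failed_once = bad.try_determinize(lambda d: None)
failed_twice = bad.try_determinize(lambda d: None)
```

In both cases the callback runs only once; the second call is answered from the cached state.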
Ivan Korostelev dc0b30487e
Add operator overloads to Weight class (#121) 2019-03-12 13:18:33 +00:00
Ivan Korostelev bb11960fc0
Fix automaton simplification code (#126)
* It didn't recurse past first state
* transition2 was not checked if it is valid for merging. Because of subscript error, transition1 was checked twice.
2019-03-12 12:28:13 +00:00
Ivan Korostelev 41bbdb0026
Turn EpsilonClosure into a struct (#122)
EpsilonClosure is small and immutable. Perfect fit for value type.

Also, changed usages of Pair<> to ValueTuple<> and started using
expression-bodied functions for properties.
2019-03-12 10:42:38 +00:00
Ivan Korostelev 6d2ce9a993
Get rid of recursive implementations (#125)
We finally reached a point where automatons are big enough that recursive implementations
of algorithms fail with `StackOverflowException`.

There are 4 tests which test operations with big automatons (100k states):
* `TryComputePointLargeAutomaton`
* `SetToProductLargeAutomaton`
* `GetLogNormalizerLargeAutomaton`
* `ProjectSourceLargeAutomaton`

All four used to fail before the code was rewritten to use an explicit stack instead of recursive calls.
All changes except one kept the algorithms unchanged; the code was almost mechanically changed
to use a stack.

The only exception is automaton simplification. The old code used recursion in a very non-trivial way.
It was rewritten from scratch using a different algorithm: instead of extracting generalized
sequences and then reinserting them, the new code merges states directly in the automaton.
(There's a comment at the beginning of the Simplify() method explaining all operations.)
2019-03-11 11:33:29 +00:00
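
The mechanical change from recursion to an explicit stack can be illustrated with a short Python sketch. It is a generic depth-first traversal over a dictionary of transitions, not any of the Infer.NET algorithms; the function name and the graph representation are assumptions made for the example.

```python
def reachable_states(transitions, start):
    """Depth-first traversal using an explicit stack instead of recursion."""
    visited = {start}
    stack = [start]
    while stack:
        state = stack.pop()
        for target in transitions.get(state, ()):
            if target not in visited:
                visited.add(target)
                stack.append(target)
    return visited


# A chain of 100,000 transitions: far deeper than Python's default recursion limit,
# so a naive recursive traversal would fail here.
chain = {i: [i + 1] for i in range(100_000)}
count = len(reachable_states(chain, 0))
```

The traversal depth is now bounded by available heap memory for the list rather than by the call stack.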
Tom Minka 702874f9aa
ComputeMovieGenre allows any number of genres (#124) 2019-03-09 09:48:27 +00:00
Tom Minka 7ec8af439b
Removed msbuild instructions for mono (#120)
RegexpFormattingSettings ctor sets UseLazyQuantifier.
ProductExpOp fix.
General code cleanup.
2019-03-08 22:13:18 +00:00
Ivan Korostelev a00d7fc462
Make GetValidatedFormatString() work with automatons with groups (#119)
SetToConstantOnSupportOfLog() fails on automatons with groups,
but validatedFormat doesn't need them anyway
2019-03-08 10:02:50 +00:00
Ivan Korostelev f55a968362 Turn Automaton.UsesGroups into property & optimize SetGroups() (#118)
* Turn UsesGroups into a property, and calculate it at automaton construction time

* Do not clear groups if they are not used anyway
2019-03-08 09:23:26 +00:00
Tom Minka b0a5d6c843 Solution config for ReviewerCalibration project 2019-03-05 21:40:54 +00:00
Tom Minka 9a0e1ae3ab Added ReviewerCalibration example 2019-03-05 16:17:51 +00:00
John Guiver 11e0b83d29
Merge pull request #111 from dotnet/joguiver/string-format-op
Solve numerical issues in StringFormatOp
2019-02-21 17:13:17 +00:00
John Guiver b24e600ca9 Solve numerical issues in StringFormatOp 2019-02-15 17:43:45 +00:00
Tom Minka b5f62d3300 TryComputePointLargeAutomaton is OpenBug 2019-01-26 10:17:49 +00:00
Tom Minka 299edba684 Added EnumerableExtensions.ValueEquals and removed Util.ValueEquals 2019-01-24 22:10:36 +00:00
Tom Minka 4a7c874c39 QuantileEstimator starts with no allocated buffers and de-allocates empty buffers. 2019-01-24 22:10:36 +00:00
Tom Minka 85107a3f88 VectorSoftmax_PointSoftmax_Throws allows any exception type. 2019-01-24 22:10:36 +00:00
Tom Minka 44bfa2129d QuantileEstimator.GetQuantile checks whether probability is NaN 2019-01-24 22:10:36 +00:00
Tom Minka f567d68bfa Added QuantileEstimator.Inflate 2019-01-24 22:10:36 +00:00
Ivan Korostelev 65e6981ee0
Convert Automaton.StateData to value type (#95)
To reduce memory pressure, automaton data (states and transitions) was
converted from classes to structs and "vectorized" into just two large arrays.

Now, one state costs just 16 bytes and one transition costs 24 bytes.
Previously the cost was at least twice that, due to additional indirection,
object headers and pre-reservation of array capacity (8 bytes per reference
to state + 16 bytes per state header + 24 bytes per transitions array per state).
In practice, memory consumption by automatons was reduced by approximately
a factor of 3.

Important changes:
  * `Automaton.StateData` is now a struct instead of class.
  * All transitions of automaton are stored in single array instead of
    "transition array per state"
  * Automatons cannot be mutated. All immutable data was moved
    to new `DataContainer` struct.
  * New Automaton.Builder class was introduced for construction
    of new automatons from scratch. All mutation is done through the
    "copy to builder, mutate, get immutable data" pattern.
  * Widely used `AppendInPlace()` method is not actually in-place anymore
    and became way more expensive. Multiple `AppendInPlace()` calls
    should be replaced with one `Automaton.Builder` instance with
    multiple `Builder.Append()` calls. Places which were hot in the profiler were rewritten.
    `AppendInPlace()` will be removed completely in a separate pull request.

Technical changes:
  * `Optional<>` struct was moved into Containers namespace. Logically it is
    a container of "0-or-1" elements.
  * To enforce immutable data structures, new `ReadOnlyArray`, `ReadOnlyArraySegment`
    and `ReadOnlyArraySegment` enumerator structs were introduced. They wrap regular
    arrays but do not allow mutating them. Unlike `ReadOnlyList<>`, these are value types
    and do not introduce any memory costs.
  * In quite a few places Tuples were replaced with ValueTuples.
  * Most recursive helper methods were "inlined" as local functions at their call sites.
    Not strictly necessary, but it made refactoring a lot easier by bringing related code together.
  * Some code was reformatted/updated to a more modern style, either by tooling or manually.
2019-01-23 09:57:54 +00:00
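
The layout described in this commit (all transitions in one array, each state pointing at its slice of it, and construction through a builder) can be sketched in Python. This is only an illustration of the idea: the `Builder` and `StateData` classes below are hypothetical simplifications, and the field names `first_transition_index` and `transitions_count` are borrowed from a later commit message in this log rather than from the code itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateData:
    # A state owns the slice [first, first + count) of the shared transitions array.
    first_transition_index: int
    transitions_count: int


class Builder:
    """Hypothetical builder: collects transitions per state, then flattens them."""

    def __init__(self):
        self._per_state = []

    def add_state(self):
        self._per_state.append([])
        return len(self._per_state) - 1

    def add_transition(self, source, target, weight):
        self._per_state[source].append((target, weight))

    def build(self):
        states, transitions = [], []
        for outgoing in self._per_state:
            states.append(StateData(len(transitions), len(outgoing)))
            transitions.extend(outgoing)
        # Tuples stand in for read-only arrays.
        return tuple(states), tuple(transitions)


def transitions_of(states, transitions, index):
    s = states[index]
    end = s.first_transition_index + s.transitions_count
    return transitions[s.first_transition_index:end]


builder = Builder()
s0 = builder.add_state()
s1 = builder.add_state()
builder.add_transition(s0, s1, 0.5)
builder.add_transition(s0, s0, 0.5)
states, transitions = builder.build()
```

After `build()` the result is two flat, immutable sequences, so further changes require going back through a builder, which mirrors the "copy to builder, mutate, get immutable data" pattern named above.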
Tom Minka 04e329f63f
Softmax factor throws when output is observed under VMP. (#102)
* Softmax factor throws when output is observed under VMP.

* VectorSoftmaxOp_KM11.XAverageLogarithm allows softmax to be a point mass as long as some x[i] is a point mass.
Fixes #101
2019-01-21 19:54:49 +00:00
Tom Minka 305f5c9583 QuantileEstimator allocates buffers when needed 2019-01-18 20:01:45 +00:00
Tom Minka 0b89726d69 Fixed an issue where InnerQuantiles could fail. 2019-01-17 22:58:19 +00:00
Tom Minka a741aed54c DenseVector.SetToLeastSquares avoids overflow. 2019-01-17 22:58:19 +00:00
Vijay 2397d94a09
Merge pull request #87 from dotnet/avishar/Automaton-Cleanup
Automaton cleanup
2018-12-12 20:28:22 +00:00
Vijay 10f80b9371
Merge branch 'master' into avishar/Automaton-Cleanup 2018-12-12 19:47:02 +00:00
Tom Minka 9f20867b92
Added IrregularQuantiles. (#93)
* Added IrregularQuantiles

* IrregularQuantiles throws ArgumentOutOfRangeException

* Added Region.Equals and CompareTo

* BlogTests.Handedness is a test

* BallCountingTest comment

* More documentation and testing
2018-12-12 18:18:20 +00:00
Ivan Korostelev eba741b13c
Merge branch 'master' into avishar/Automaton-Cleanup 2018-12-12 16:27:54 +00:00
Tom Minka 13def19155
MixtureOfGaussians tutorial uses the recommended modern style for initializing a VariableArray. (#91) 2018-12-03 16:03:59 +00:00
Vijay 17e5024b5a
Merge pull request #89 from dotnet/avishar/GateTransform-Cleanup
GateTransform cleanup
2018-12-02 22:17:28 +00:00
Vijay 030de7de82
Merge branch 'master' into avishar/GateTransform-Cleanup 2018-12-02 20:36:11 +00:00
Vijay Sharma 5f3f4d127e - Removed local variable recognizer.
- Changed IVariableReferenceExpression line to use pattern matching.
2018-12-02 20:35:57 +00:00
Vijay 4f63a3f9c5
Merge pull request #88 from dotnet/avishar/CreateVariableArrayFromItem-Simplification
Removed redundant code in CreateVariableArrayFromItem
2018-11-26 21:38:17 +00:00
Vijay Sharma 7dc27aa614 Removed accidental linebreak 2018-11-25 03:41:34 +00:00
Vijay Sharma be4e7368e5 - String interpolation
- Made some properties readonly
- IsLiteralOrLoopVar used a local variable with the same name and casing as a property; changed the local to lowercase: recognizer
- Pattern matching
2018-11-25 03:39:56 +00:00
Vijay Sharma 30f89827b0 Removed redundant code in CreateVariableArrayFromItem 2018-11-25 03:15:03 +00:00
Vijay Sharma 3917a7df62 String interpolation 2018-11-25 02:22:47 +00:00
Vijay Sharma d03f81f40d AutomatonFormats - Removed unused private sets of properties 2018-11-25 02:21:45 +00:00