Commit graph

72 commits

Author SHA1 Message Date
David Brownell d53254e21e Merged PR 5060: New technique for conversion of floats and doubles to strings
New technique for conversion of floats and doubles to strings. This fixes an issue on Linux which we could only reproduce when testing via Nimbus.
2019-09-04 21:52:35 +00:00
David Brownell b88e88a3e6 Merged PR 5055: Integration tests for the Shared Library interface 2019-09-04 19:02:11 +00:00
David Brownell 6960d88541 Merged PR 5049: Fixed bug with optional strings in code generated for the DLL interface
2019-08-30 18:39:26 +00:00
David Brownell 3054511061 Merged PR 5044: Scripts to create the C DLL interface, code generated by those scripts, and updates for Linux builds
Note that all code in folders named "GeneratedCode" has been generated by other tools in this repo.
2019-08-29 23:48:26 +00:00
Michael Sharp ff82df716a Merged PR 5026: Additional columns for DateTimeFeaturizer
Combined the functionality of both DateTimeFeaturizers by adding all the combined columns to this one.

Holiday information is still not in here as we are not sure how we are going to handle that yet.

I added 2 more libraries to the shared section. They are in the ORT repo and Dmitri said we are good to take a dependency on that.
2019-08-29 18:42:50 +00:00
Michael Sharp 0b6dea26ba Merged PR 5035: Changed DateTimeFeaturizer input to std::int64_t
Changed DT input to std::int64_t representing seconds since 1970
Removed existing ML.NET C++ wrapper code and tests as that will now be implemented with Dave's codegen.
2019-08-29 05:33:02 +00:00
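To make the input convention above concrete (an integer count of seconds since 1970), here is a minimal Python sketch that splits such a value into calendar fields. The function name `split_epoch_seconds` and the returned field names are hypothetical; the featurizer itself is C++, and its actual output columns are not listed in this log. The sketch assumes UTC.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper, not the featurizer's API: it only illustrates
# interpreting an integer as seconds since 1970-01-01 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def split_epoch_seconds(seconds):
    """Return calendar fields for a count of seconds since the epoch."""
    moment = EPOCH + timedelta(seconds=seconds)
    return {
        "year": moment.year,
        "month": moment.month,
        "day": moment.day,
        "hour": moment.hour,
        "minute": moment.minute,
        "second": moment.second,
    }
```

For example, `split_epoch_seconds(0)` yields the fields of 1970-01-01 00:00:00.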
Anuj Shrotriya a2ab1c8a4e Merged PR 5017: CatImputer
CatImputer Description:
This featurizer imputes missing values in an input column with the most frequent one.

Design:
Underlying implementation of this featurizer is composed of two estimators:
1) HistogramEstimator: This estimator computes the histogram for the input column and creates a HistogramAnnotation. Note that this 'IS A' Annotation Estimator, i.e., it doesn't have a transformer.
2) HistogramConsumerEstimator: This class retrieves a HistogramAnnotation created by HistogramEstimator and computes the most frequent value from it. This value is then used to impute missing values.
Both of these estimators are chained in PipelineExecutionEstimator which is exposed as CatImputer.
2019-08-28 22:49:18 +00:00
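The two-stage design described above can be sketched in Python as follows. This is an illustration only: the real implementation is C++, and the function names, the use of `None` for a missing value, and the tie-breaking rule (first value seen wins) are assumptions made for the sketch.

```python
from collections import Counter

def build_histogram(column):
    """Stage 1 (annotation only): count each non-missing value."""
    return Counter(value for value in column if value is not None)

def most_frequent(histogram):
    """Stage 2: pick the most frequent value from the histogram."""
    if not histogram:
        raise ValueError("cannot impute from a column with no values")
    # most_common keeps first-seen order among ties (an assumption of
    # this sketch, not a documented rule of the featurizer).
    return histogram.most_common(1)[0][0]

def cat_impute(column):
    """Chain both stages, then replace missing entries."""
    fill = most_frequent(build_histogram(column))
    return [fill if value is None else value for value in column]
```

Calling `cat_impute(["a", None, "b", "a"])` replaces the missing entry with `"a"`, the most frequent value.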
David Brownell 67d0fd984c Merged PR 5029: Fix for InferenceOnlyEstimators that were prematurely converted to Transformers within a Pipeline
InferenceOnlyEstimators don't require training, but they can't be created prematurely within a pipeline if they rely on information generated by ancestor AnnotationEstimators. This fix delays the creation of Transformers associated with InferenceOnlyEstimators until everything that comes before them in a pipeline has completed training.
2019-08-28 00:19:15 +00:00
David Brownell b6f2cefc88 Merged PR 5025: Changes deferred from previous PR 2019-08-26 19:59:05 +00:00
David Brownell 445eb2abb8 Merged PR 5006: Added TrainingOnlyEstimatorImpl and renamed files for consistency 2019-08-26 16:43:42 +00:00
David Brownell 88a6395cd5 Merged PR 4996: Pipeline Execution Estimator
The PipelineExecutionEstimator allows the caller to chain multiple Estimators end-to-end to form a pipeline (or DAG). During training, data trickles down to the currently untrained Estimator within the chain. Once all training is complete, a Transformer is created that only invokes the Transformers associated with TransformerEstimators in the original chain.

This code attempts to provide compile-time warnings when Estimators are chained together in an incompatible way.
2019-08-21 16:44:13 +00:00
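A minimal Python sketch of the chaining idea described above: each stage is trained on data that has already passed through the stages before it, and the final transformer invokes only the stages that produced a transformer. All class and function names here are hypothetical, and the compile-time compatibility checks of the C++ code have no equivalent in this sketch.

```python
class MeanAnnotationEstimator:
    """Annotation-only stage: records a statistic, yields no transformer."""
    def fit(self, rows, annotations):
        annotations["mean"] = sum(rows) / len(rows)
        return None

class CenterEstimator:
    """Uses the annotation from an earlier stage to build a transformer."""
    def fit(self, rows, annotations):
        mean = annotations["mean"]
        return lambda x: x - mean

class ScaleEstimator:
    """Trains on data already transformed by the preceding stages."""
    def fit(self, rows, annotations):
        peak = max(abs(r) for r in rows) or 1.0
        return lambda x: x / peak

def train_pipeline(estimators, rows):
    annotations, transformers = {}, []
    current = list(rows)
    for estimator in estimators:
        transformer = estimator.fit(current, annotations)
        if transformer is not None:
            transformers.append(transformer)
            # Data trickles down to the next untrained stage.
            current = [transformer(x) for x in current]

    def transform(x):
        for transformer in transformers:
            x = transformer(x)
        return x
    return transform
```

Here `train_pipeline([MeanAnnotationEstimator(), CenterEstimator(), ScaleEstimator()], rows)` returns a single callable that applies centering and then scaling; the annotation stage contributes nothing at inference time.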
Michael Sharp 14835dbe44 Merged PR 4976: DateTimeTransformer implemented in ML.NET with associated C++ wrapper
This is the ML.NET framework implementation to use the shared C++ library DateTimeTransformer. It includes the ML.NET C# code, as well as the C++ wrapper code.

It does NOT yet implement saving of the model or exporting to ONNX.

The C# code works and has its associated Unit Tests, but since we have not set up the new project in the ML.NET solution it cannot be built in the DataPipelines repo yet.

REVIEW: I am not sure how to make a CMake file for the C++ DLLs, so that still needs to be added.
2019-08-19 16:45:08 +00:00
David Brownell 706ebbc9f1 Merged PR 4982: Removed boost::optional, restored Linux tests, updated some exceptions
2019-08-15 22:24:50 +00:00
David Brownell 0d6e278ec4 Merged PR 4972: Added serialization functionality to Transformers
2019-08-14 20:01:20 +00:00
David Brownell 2e3d623850 Merged PR 4950: Updates to Featurizers to support additional scenarios
2019-08-12 20:16:20 +00:00
Ye Wang 623496d798 Merged PR 4936: add to-do list for String Transformer
add to-do list in Traits.h
2019-08-12 19:24:19 +00:00
Ye Wang 26a96778df Merged PR 4930: add tuple transformer
add tuple transformer for StringTransformer
2019-08-07 22:55:02 +00:00
Michael Sharp 855c3f6d27 Merged PR 4915: adding new optional to replace boost
2019-08-07 16:28:28 +00:00
Ye Wang 05a5474c5a Merged PR 4904: String Transformer
String transformer (wrapper for traits.h) for basic types (bool, string, integers, numbers, arrays, vectors, maps). Nested structs are included. Tests added for the string transformer and traits.
2019-08-06 21:22:42 +00:00
David Brownell e7b796b720 Merged PR 4902: Pass command line arguments to setup in the bootstrap process
2019-08-05 17:38:36 +00:00
Michael Sharp 90d7aef480 Merged PR 4846: DateTime Transformer
Created DateTimeTransformer in the Featurizers folder.
Copied Jamie's DateTime code to the Featurizers folder. //REVIEW, should I remove her code from its original location?
2019-07-31 22:28:58 +00:00
David Brownell c81b3ab3c0 Merged PR 4881: Including boost as a header only library for now (which disables cmake errors
Including boost as a header-only library for now (which disables the cmake errors raised when compiled boost libraries can't be found)
2019-07-31 22:17:05 +00:00
Michael Sharp 455bf4f76c Merged PR 4877: Traits class added
2019-07-31 21:09:38 +00:00
Teo Magnino Chaban 7827612fa5 Merged PR 4858: CheckPolicy adaptability
Modified CheckPolicy.py so it's easier to add a new type to the list of accepted types.
2019-07-31 15:52:36 +00:00
David Brownell 1357c64e35 Merged PR 4815: Added Sample Featurizer and Infrastructure
2019-07-26 21:18:30 +00:00
Teo Magnino Chaban 10294a6334 Merged PR 4814: General Improvements
Fixed some naming conventions.
Added more UnitTests.
2019-07-24 21:38:42 +00:00
Teo Magnino Chaban 212b948d9e Merged PR 4767: Support for type verification on Structs
Integrated the type verification with the Structs verification.
Changed UnitTests and IntegrationTests to act accordingly, and to cover new cases.
2019-07-24 20:53:01 +00:00
Teo Magnino Chaban 509abaa5e7 Merged PR 4743: Removed Class support
Removed support for class. Renamed variables to reflect this change.
Improved IntegrationTests by reducing the number of function calls; running time decreased by 50%.
2019-07-17 20:12:35 +00:00
Teo Magnino Chaban bec7fa73d1 Merged PR 4737: Policy Change
Modified CheckPolicy and its UnitTest to reflect the last decision on supported types. Still need to add the processing of structs.
Changed the UnitTests of CppToJson to better reflect the types we want to accept. This is not needed, since it's strictly testing CppToJson. Also added tests that were discussed with David to make sure that the warnings are working as expected.
Modified IntegrationTests so that deserialize is now called with the flag always_include_optional, and removed temporary comments.
Improved performance on the UnitTests by reducing the amount of function calls it was performing. Execution time went from 5 seconds to 1.6 seconds.
2019-07-15 23:35:36 +00:00
David Brownell ba548978c2 Merged PR 4721: Refactor to move classes into separate files
2019-07-12 21:09:51 +00:00
David Brownell 7f33613d54 Merged PR 4719: Refactor to create the ArgumentInfo class
2019-07-12 20:52:02 +00:00
David Brownell c6a1612c89 Merged PR 4717: Moved CppToJson python code to its own directory
2019-07-12 20:32:44 +00:00
Teo Magnino Chaban a9649457e8 Merged PR 4695: Integration and Hierarchy
Added support for struct hierarchy. Now each struct has a list of the structs it depends on.
Changed 'obj_type_list' to 'struct_list'.
Changed 'struct_name' and 'function_name' to 'name' on the SimpleSchema.
Created a deserialization IntegrationTest to make sure that the output from CppToJson matches the SimpleSchema.
Removed function AddVar, and instead now the variables are set on the constructor.
2019-07-12 19:55:07 +00:00
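As a rough illustration of the struct hierarchy described above, the Python sketch below derives, for each struct, the list of other structs it depends on. The input shape (a dict of struct name to `(field name, type name)` pairs) and the function name are hypothetical; they are not the actual CppToJson output or SimpleSchema layout.

```python
def struct_dependencies(structs):
    """Map each struct name to the sorted names of structs its fields use.

    `structs` maps a struct name to a list of (field_name, type_name)
    pairs. This shape is an assumption made for the sketch.
    """
    known = set(structs)
    return {
        name: sorted({
            type_name
            for _, type_name in fields
            if type_name in known and type_name != name
        })
        for name, fields in structs.items()
    }
```

A struct whose fields are all built-in types ends up with an empty dependency list.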
Teo Magnino Chaban ec092c4d35 Merged PR 4685: Declaration/Definition line adjustment
Changed the way that Declaration/Definition lines were being added into the Class Function.
Changed name from isValid to Verify in a couple functions. It makes more sense now, since they are not only checking validity, but also throwing errors where needed.
2019-07-11 16:11:14 +00:00
Teo Magnino Chaban 89bd64f8cc Merged PR 4679: Fixed consistency and spelling
Only changed Function to a dictionary at the last second (similar to obj_type).
Fixed spelling of a couple words.
Changed 'func_name' to 'name' to keep function and struct consistent.
2019-07-09 22:26:06 +00:00
Teo Magnino Chaban 88eb76fad5 Merged PR 4669: Improved error messages
Gave more detailed information on errors within functions and structs.
Updated UnitTests and IntegrationTests accordingly.
2019-07-09 18:44:17 +00:00
Teo Magnino Chaban c3a9caee65 Merged PR 4667: Structs as arguments
We will not support structs as parameters given to functions. UnitTests and IntegrationTests were modified to reflect this change. Added TODOs.
2019-07-08 22:18:34 +00:00
Michael Sharp f5015f3c6c Merged PR 4660: Added unit tests for the Debug plugin and the MlNet Plugin
Added unit testing for Debug plugin and MlNet plugin.

We had an error with file names previously due to no testing on the MlNet plugin. This PR adds testing to prevent issues like that in the future.
2019-07-08 17:14:16 +00:00
Michael Sharp 80e38d5d56 Merged PR 4653: Added simple mapping for C++ to C# types. Added the includes for the C++ wrapper.
Added a mapping from some C++ to C# types. This list will have to be updated/modified after we confirm the types we support. It will also need to be modified when we support structs.

Previously, the C++ wrapper didn't include any of the `#include` directives from the C++ files. Now, it parses the includes list and includes them as well.

NOTE: The includes list does not say whether it was `#include "file"` or `#include <file>` and after talking with @Teo Magnino Chaban, determining that is not trivial. According to the [C standard](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#page=182), section 6.10.2, the only real difference is where the compiler looks initially for the file. If it is `#include "file"` the compiler will first look somewhere (specific to each compiler, but usually in that directory) and then if it can't find the file will reprocess that line as if it were `#include <file>`. Due to this, I have decided to do all includes as `#include "file"`. The only potential problem I see with this is if there is a file in the local directory with the exact same name as in the system directory. Open to discussion on this point.
2019-07-03 21:23:18 +00:00
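The decision described in the note above (emit every include in the quoted form) could look roughly like this in Python. The function name `render_includes` and the input list are hypothetical, and the de-duplication step is an addition of the sketch rather than something stated in the commit.

```python
def render_includes(include_names):
    """Render each include name as a quoted #include directive.

    Names may arrive bare or already wrapped in quotes or angle
    brackets; the wrapper always emits the quoted form, relying on the
    compiler falling back to the angle-bracket search path.
    """
    seen = set()
    lines = []
    for raw in include_names:
        name = raw.strip().strip('<>"')
        if name and name not in seen:  # skip empties and repeats
            seen.add(name)
            lines.append('#include "{}"'.format(name))
    return lines
```

With this approach a system header such as `vector` is written as `#include "vector"`, which is where the name-collision concern raised in the note comes from.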
Teo Magnino Chaban 5a2026eba4 Merged PR 4649: Removed constructor_name and minor changes
Removed constructor_name from the .SimpleSchema, made ENUMS be unsupported, fixed the empty name that would be caused by a default constructor.
2019-07-03 16:19:12 +00:00
Michael Sharp 4b9812696a Merged PR 4641: ML.NET code to auto generate C# files and C++ wrapper.
Code that creates the ML.NET C# classes and the C++ wrapper to interface C++ DLL files with ML.NET.

Related work items: #3571
2019-07-03 16:10:36 +00:00
Teo Magnino Chaban a5a8777775 Merged PR 4642: Cleaning and Refactoring
Changed the code to be easier to understand; this includes the creation of new Classes and the removal of code that was not going to be used.
2019-07-02 21:24:18 +00:00
Jamie Gordon 9b74999a18 Merged PR 4630: fixes for build/parse issues and ide integration
Build cpp files separately instead of #including them, as that masks some errors.
Add header files as well as cpp files to the cmake, for IDE purposes.
2019-06-28 18:38:34 +00:00
Teo Magnino Chaban 6996e9721c Merged PR 4615: Support for Includes
Added support for multiple files; the tool will now recursively look for structs/classes that are required by functions in included files.
2019-06-27 19:54:54 +00:00
Jamie Gordon a5ea64f005 Merged PR 4600: Regex Vectorizer Transform
Adds a transform for "Regex Vectorization", creating a matrix of matches of a list of regular expressions on a list/column of strings.

There are a lot of unknowns on what this should *really* be doing and what the interface should be, but for now it is emulating the functionality of a python implementation.
2019-06-27 16:17:39 +00:00
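Since the commit notes that the transform emulates a Python implementation, a minimal Python sketch of the idea may help: one row per input string, one column per regular expression. The function name and the choice of 0/1 indicators (rather than match counts) are assumptions; the log does not say which the real transform produces.

```python
import re

def regex_vectorize(patterns, strings):
    """Return a matrix with one row per string and one column per pattern.

    Each cell is 1 if the pattern matches anywhere in the string,
    otherwise 0 (a binary encoding assumed for this sketch).
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        [1 if regex.search(text) else 0 for regex in compiled]
        for text in strings
    ]
```

For instance, the patterns `[r"\d+", r"^a"]` applied to `["abc", "x1"]` give the rows `[0, 1]` and `[1, 0]`.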
Teo Magnino Chaban 91d0502fba Merged PR 4603: Adaptability to DateTime
Added required functionalities to make date_time work, but it's not complete since there are a few functionalities that need multiple file access, which is not currently implemented.
2019-06-26 23:26:20 +00:00
Teo Magnino Chaban f5fd69061f Merged PR 4576: Structs/Classes and Includes
Added all object_type functionalities, exporting includes and fixed general problems
2019-06-25 20:53:50 +00:00
Teo Magnino Chaban 9375dee935 Merged PR 4590: Json SimpleSchema
Added necessary files for the new Json SimpleSchema
2019-06-24 15:36:36 +00:00
Jamie Gordon 6c335c69ed Merged PR 4570: Add DateTime conversion
Adds a DateTime structure and a conversion to DateTime from C++ std::chrono::system_clock::time_point.
Conversion available as a function and as conversion constructor and assignment operator in the DateTime struct.

Though the input is time_point, we have to convert and use the old C lib methods, because there is still nothing else.
C++2x is set to finally have useful time_point functions ... I suspect we will wind up wanting to take time_t directly from callers as well, since I would imagine that's a common format for them to have and it would be silly to convert from time_t for the function to just convert right back.

I have made some assumptions about input and expected output that may need updating - for most values I had to guess whether they want to be 0-based or 1-based. All are documented in the struct declaration, and are trivial to change where needed.

Bigger and more important, I also assumed that we will not be operating on dates earlier than 1970. Earlier dates require a completely different implementation on Windows. (msvcrt doesn't support negative time_t, earlier dates require a win32 solution).

Related work items: #3538
2019-06-19 18:51:51 +00:00
David Brownell 3cd18d47ac Merged PR 4575: Generating CMake files that can be used to compile C++ code for the ML.NET Plugin on Windows and Linux
2019-06-19 17:56:19 +00:00