Updated SimpleSchema to allow for better mapping of input->output types.
Added support for multiple template types.
Note: Tuple types aren't supported yet, so the code is generated for RobustScalar without them; we will need to add that code manually in the custom section.
RobustScalarFeaturizer chains the RobustScalarNormEstimator and RobustScalarTransformer.
RobustScalarNormEstimator: takes the training data and computes its median and range
RobustScalarTransformer: uses the median and range (transformed to a "scale" using q_range) to modify inference data row by row
Estimator signature: Estimator(ptr, _with_centering = true, _with_scaling = true, _quantile_range = (25.0, 75.0))
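The estimator/transformer split described above can be sketched roughly as follows. The names and helper functions here are hypothetical, not the library's actual API, and the percentile computation is a simple nearest-rank stand-in:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: the "estimator" phase computes the median and
// quantile range from training data; the "transformer" phase scales
// inference values one at a time.
struct RobustScalarState {
    double median = 0.0;
    double range = 1.0;
};

// Simple nearest-rank percentile on a sorted copy (illustrative only).
double Percentile(std::vector<double> v, double p) {
    std::sort(v.begin(), v.end());
    std::size_t idx = static_cast<std::size_t>(p / 100.0 * (v.size() - 1));
    return v[idx];
}

RobustScalarState Fit(const std::vector<double> &training,
                      bool withCentering = true,
                      bool withScaling = true,
                      double qLow = 25.0,
                      double qHigh = 75.0) {
    RobustScalarState state;
    if (withCentering)
        state.median = Percentile(training, 50.0);
    if (withScaling)
        state.range = Percentile(training, qHigh) - Percentile(training, qLow);
    return state;
}

// Row-by-row transform: subtract the median, divide by the quantile range.
double Transform(const RobustScalarState &state, double value) {
    return (value - state.median) / state.range;
}
```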
Add support for big-endian serialization.
The goal is to serialize everything in little endian, providing portability across little- and big-endian architectures.
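One common way to get this portability is to always write the least-significant byte first, independent of the host's byte order. A minimal sketch with hypothetical helper names (not the library's serialization API):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Endian-neutral serialization sketch: values are always written
// least-significant byte first, so the on-disk format is little endian
// regardless of the host architecture.
void WriteUInt32LE(std::vector<std::uint8_t> &buffer, std::uint32_t value) {
    for (int shift = 0; shift < 32; shift += 8)
        buffer.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
}

std::uint32_t ReadUInt32LE(const std::uint8_t *data) {
    std::uint32_t value = 0;
    for (int shift = 0; shift < 32; shift += 8)
        value |= static_cast<std::uint32_t>(*data++) << shift;
    return value;
}
```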
Updates for TimeSeriesImputerFeaturizer:
- Validates input impute strategy
- Moves median/col error validation to the transformer
- Populates empty values in median scenarios when errors are suppressed
- Introduces new Shared-object layer tests
- Ensures chronological order for inputs during transform
- Enumerates tests that need to be written
This featurizer is supposed to do the following:
1) Fill gaps (add rows) for grain cols in timeseries.
2) Impute specified cols per the impute strategy (for this iteration, the supported impute strategies are: ffill, bfill and median).
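As a rough illustration of the three strategies (the names and signature here are hypothetical, not the featurizer's API), a single-column impute might look like:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

enum class ImputeStrategy { Forward, Backward, Median };

// Illustrative sketch only: fill missing values in one column according
// to the chosen strategy. For Median, the caller supplies the median
// computed during training.
std::vector<double> Impute(std::vector<std::optional<double>> col,
                           ImputeStrategy strategy,
                           double median = 0.0) {
    std::vector<double> result(col.size());
    if (strategy == ImputeStrategy::Backward) {
        // bfill: propagate the next observed value backwards.
        std::optional<double> next;
        for (std::size_t i = col.size(); i-- > 0;) {
            if (col[i]) next = col[i];
            else col[i] = next;
        }
    }
    std::optional<double> prev;
    for (std::size_t i = 0; i < col.size(); ++i) {
        if (!col[i]) {
            if (strategy == ImputeStrategy::Forward && prev)
                col[i] = prev;       // ffill: reuse the last observed value
            else if (strategy == ImputeStrategy::Median)
                col[i] = median;     // median: use the training median
        }
        if (col[i]) prev = col[i];
        result[i] = col[i].value_or(median);
    }
    return result;
}
```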
Implementation Details
This featurizer has been implemented as a composition of three estimators:
FrequencyEstimator: An annotation estimator; it produces the frequency annotation.
MedianEstimator: Also an annotation estimator; it produces the median annotation.
ImputationEstimator: It is an inference only estimator. It reads in the frequency and median annotations and creates the transformer which does imputation.
Open Issues:
PipelineExecutor to enable passing args to ctor.
PipelineExecutor to enable invoking flush.
flush implementation
Archive implementation
Honoring suppressError flag during imputation.
More Unit tests.
Prior to this checkin, all Estimators within a Pipeline had to be initialized in the same way. After this checkin, functors can be provided to the Pipeline's constructor that allow for estimator-specific construction for estimators within the chain.
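A minimal sketch of the idea, with hypothetical type names: each estimator in the chain gets its own construction functor, and the pipeline invokes them instead of default-constructing every stage the same way:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an estimator in the chain.
struct Estimator {
    explicit Estimator(std::string name) : Name(std::move(name)) {}
    std::string Name;
};

using EstimatorFactory = std::function<std::unique_ptr<Estimator>()>;

// The pipeline receives one factory per estimator, allowing
// estimator-specific construction within the chain.
struct Pipeline {
    explicit Pipeline(std::vector<EstimatorFactory> factories) {
        for (auto &factory : factories)
            Stages.push_back(factory());
    }
    std::vector<std::unique_ptr<Estimator>> Stages;
};
```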
1: Add folder /3rdParty/holidays_by_country, which contains JSON files (holiday information) for each country, plus the code for generating those files
2: Add json.h in /3rdParty for JSON-related processing
3: Modify DateTimeFeaturizer (.h & .cpp) to accept a country name via its constructor.
4: Add tests for this new feature
_**please note that the holiday name may look different for different compilers**_
Integration with generated code completed.
EstimatorHandle, TransformerHandle, and ErrorInfoHandle pass through a pointer table after they are created and before they are used.
Rebased on new Master branch.
Combined the functionality of both DateTimeFeaturizers by adding all of their columns to this one.
Holiday information is still not included, as we are not yet sure how we will handle it.
I added 2 more libraries to the shared section. They are in the ORT repo, and Dmitri said we are good to take a dependency on them.
Changed the DateTime input to std::int64_t representing seconds since 1970.
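A sketch of what consuming the new input type could look like; the struct and function names are hypothetical, and `gmtime_r` is used for the UTC breakdown (POSIX; Windows would use `_gmtime64_s` instead):

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Calendar components expanded from an epoch-seconds input.
struct TimePoint {
    int year;
    int month;   // 1-12
    int day;     // 1-31
};

// The input contract: a std::int64_t holding seconds since the Unix
// epoch (1970-01-01 UTC), broken down into calendar parts.
TimePoint FromEpochSeconds(std::int64_t seconds) {
    std::time_t t = static_cast<std::time_t>(seconds);
    std::tm parts{};
    gmtime_r(&t, &parts);   // UTC breakdown (POSIX)
    return TimePoint{parts.tm_year + 1900, parts.tm_mon + 1, parts.tm_mday};
}
```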
Removed existing ML.NET C++ wrapper code and tests as that will now be implemented with Dave's codegen.
CatImputer Description:
This featurizer imputes missing values in an input column with the most frequent value.
Design:
The underlying implementation of this featurizer is composed of two estimators:
1) HistogramEstimator: This estimator computes the histogram for the input column and creates a HistogramAnnotation. Note that this 'IS A' Annotation Estimator, i.e., it doesn't have a transformer.
2) HistogramConsumerEstimator: This class retrieves a HistogramAnnotation created by HistogramEstimator and computes the most frequent value from it. This value is then used to impute missing values.
Both of these estimators are chained in a PipelineExecutionEstimator, which is exposed as CatImputer.
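A simplified sketch of the two stages (hypothetical names, using std::string categories; the real implementation is templated and annotation-based):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

using Histogram = std::map<std::string, std::size_t>;

// Stage 1 (HistogramEstimator idea): count occurrences of each
// observed value in the training column, skipping missing entries.
Histogram BuildHistogram(const std::vector<std::optional<std::string>> &col) {
    Histogram h;
    for (const auto &value : col)
        if (value) ++h[*value];
    return h;
}

// Stage 2 (HistogramConsumerEstimator idea): pick the most frequent
// value from the histogram ...
std::string MostFrequent(const Histogram &h) {
    std::string best;
    std::size_t bestCount = 0;
    for (const auto &entry : h)
        if (entry.second > bestCount) { best = entry.first; bestCount = entry.second; }
    return best;
}

// ... and use it to impute missing values at transform time.
std::string Impute(const std::optional<std::string> &value, const std::string &mode) {
    return value ? *value : mode;
}
```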
InferenceOnlyEstimators don't require training, but they can't be created prematurely within a pipeline if they rely on information generated by ancestor AnnotationEstimators. This fix delays the creation of Transformers associated with InferenceOnlyEstimators until everything that comes before them in the pipeline has completed training.
The PipelineExecutionEstimator allows the caller to chain multiple Estimators end-to-end to form a pipeline (or DAG). During training, data trickles down to the currently untrained Estimator within the chain. Once all training is complete, a Transformer is created that only invokes the Transformers associated with TransformerEstimators in the original chain.
This code attempts to provide compile-time warnings when Estimators are chained together in an incompatible way.
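One standard way to get such compile-time checks is a static_assert that each estimator's output type matches the next estimator's input type; the names below are illustrative, not the library's actual ones:

```cpp
#include <type_traits>

// Each estimator advertises its input and output types.
template <typename InputT, typename OutputT>
struct EstimatorBase {
    using InputType = InputT;
    using OutputType = OutputT;
};

// Instantiating Chain with a mismatched pair fails at compile time
// with a descriptive message rather than deep template errors.
template <typename First, typename Second>
struct Chain {
    static_assert(
        std::is_same<typename First::OutputType,
                     typename Second::InputType>::value,
        "Estimators are chained together in an incompatible way");
};
```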
This is the ML.NET framework implementation to use the shared C++ library DateTimeTransformer. It includes the ML.NET C# code, as well as the C++ wrapper code.
It does NOT yet implement saving of the model or exporting to ONNX.
The C# code works and has its associated unit tests, but since we have not set up the new project in the ML.NET solution, it cannot be built in the DataPipelines repo yet.
REVIEW: I am not sure how to make a CMake file for the C++ DLLs, so that still needs to be added.
Created DateTimeTransformer in the Featurizers folder.
Copied Jamie's DateTime code to the Featurizers folder. //REVIEW, should I remove her code from its original location?