Aleksei Smirnov
f7b8d560cd
Implement vectorized binary arithmetic operations ( #6854 )
...
* Remove separate enums for scalar operations
* Align implementation with .Net 8.0 Tensor API
* Fix Modulo
* Implement vectorized Arithmetic binary operations
* Implement vectorized Arithmetics binary scalar operations
2023-10-16 14:59:49 -07:00
Aleksei Smirnov
9c183fc35b
Fix Saving csv with VBufferDataFrameColumn ( #6860 )
2023-10-13 22:23:43 -07:00
Diego Colombo
e3ec250d51
update .NET Interactive ( #6857 )
2023-10-11 20:45:31 -04:00
Aleksei Smirnov
64d7ebd093
Fixes incorrect work of DataFrame with VBufferColumn when number of e… ( #6851 )
...
* Fixes incorrect work of DataFrame with VBufferColumn when number of elements is greater than Int.MaxValue
* Fix calculation of max capacity and amount of required buffers
* Fix unit test
* Run test allocating more than 2 Gb of memory on 64bit env only
* Fix StringDataFrameColumn same way as VBufferDataFrameColumn
* Fix wrong amount of buffers created in constructor of StringDataFrameColumn
* Fix code review findings
2023-10-04 11:04:47 -07:00
Aleksei Smirnov
5cf6051db7
Increase performance of arithmetic operations by enhancing calculations on nullable values ( #6846 )
...
* Optimize PrimitiveColumnContainer.Clone method
* Avoid unnecessary type conversion during binary operations
* Remove using
* Fix DataFrameBuffer constructor
* Remove incorrectly added using
* Make DataFrameBuffer Length field protected
* Add performance tests
* Split Test for AppendMany into 4 different tests
* Block init of null validity buffer instead of setting individual bits
* Add unit tests for PrimitiveDataFrameColumn.Clone
* Fixes #6821
* Fix
* Add extra tests
* Fix
* Fix typo
* Fix Divide_Int16 and Divide_Int32_Int16 benchmarks
* Fix
* Avoid using constructor, that copies memory
* First step of tt refactoring
* Step 2
* Step 3
* Move iteration over buffers outside of the PrimitiveDataFrameColumnArithmetic
* Change PrimitiveDataFrameColumnArithmetic
* Fix typo
* Use RawSpan
* Fix bug with AppendMany values to a non-empty column
* Restart unit tests
* Add more unit tests
* Add GetBitCount method
* Fix failing unit test
* Implementation
* Change unit tests
* Update unit tests
* Refactoring BinaryOperation
* Intermediate changes
* Intermediate results
* Implement Binary Scalar Reverse Operations
* Add implementation for BinaryIntOperations
* Implement Comparison Operations
* Implement actual calculations for Comparison operations
* Uncomment performance tests
* Remove unintentional code changes
* Add reference to Apache Arrow project license in THIRD-PARTY-NOTICES
* Fix license issues
2023-10-02 17:04:03 -07:00
Aleksei Smirnov
3c625bf542
6847 incorrectly sets column value ( #6849 )
...
* Fix DataFrame incorrectly sets column value for index higher than Buffer.MaxCapacity
* Revert renaming
2023-10-02 16:07:10 -07:00
Aleksei Smirnov
7fe293da31
PrimitiveDataFrameColumn.Clone method crashes when used with IEnumerable mapIndices argument ( #6822 )
...
* Split Test for AppendMany into 4 different tests
* Block init of null validity buffer instead of setting individual bits
* Add unit tests for PrimitiveDataFrameColumn.Clone
* Fixes #6821
* Fix
* Fix bug with AppendMany values to a non-empty column
* Restart unit tests
* Add more unit tests
* Fix failing unit test
* Fix code review findings
2023-09-27 15:28:28 -07:00
Eric StJohn
97926a8c53
Update dependencies ( #6837 )
...
* Update dependencies
* Add reference to NuGet.Packaging.Core
2023-09-27 11:16:01 -07:00
R. G. Esteves
66eed89f6b
Addresses #6533 ( #6838 )
...
* Initial structure and started fleshing out some sections
* Some corrections and paragraph on DL usages
* Starting fleshing out DL on ML.NET section
* Addresses #6533
2023-09-27 11:35:51 -06:00
Aleksei Smirnov
85ee6e5117
Simplify tt files for PrimitiveDataFrameColumnAritmetics ( #6830 )
...
* First step of tt refactoring
* Step 2
* Step 3
2023-09-26 18:36:11 -07:00
Aleksei Smirnov
5648c89bbd
Improve performance of column cloning inside DataFrame arithmetics ( #6814 )
...
* Optimize PrimitiveColumnContainer.Clone method
* Avoid unnecessary type conversion during binary operations
* Remove using
* Fix DataFrameBuffer constructor
* Remove incorrectly added using
* Make DataFrameBuffer Length field protected
* Fix typo
* Use RawSpan
2023-09-26 18:28:39 -07:00
Aleksei Smirnov
15e6a556ce
Add performance benchmarks for dataframe arithmetic operations ( #6827 )
...
* Add performance tests
* Add extra tests
* Fix
* Fix typo
* Fix Divide_Int16 and Divide_Int32_Int16 benchmarks
* Fix
* Change csproj file
* Update BenchmarkDotNetVersion to 0.13.5
* Fix
* Change to 0.13.1 because that is the latest version in our nuget feeds.
---------
Co-authored-by: Jake Radzikowski <JakeRad@Microsoft.com>
2023-09-26 10:23:39 -07:00
Xiaoyun Zhang
a052146947
update interactive kernel version ( #6836 )
...
* update interactive kernel version
* update
* Update Microsoft.Data.Analysis.Interactive.Tests.csproj
2023-09-26 01:26:20 -07:00
Raffaello Fraboni
49824f3915
Fix wrong type conversion on PrimitiveDataFrameColumn ( #6834 )
...
* Fix wrong type conversion on PrimitiveDataFrameColumn
* Added tests for #6829
* Fix test
* Add file generated from tt template and fix unit tests
---------
Co-authored-by: Aleksei Smirnov <tlalok@inbox.ru>
2023-09-25 10:52:21 -07:00
Michael Sharp
09b80f8a08
removed codecov token ( #6811 )
2023-09-08 13:13:01 -06:00
Aleksei Smirnov
d6927515d8
Append dataframe rows based on column names ( #6808 )
...
* Append dataframe rows based on column names
* Update DataFrame.cs
---------
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
2023-09-01 22:06:55 -06:00
Aleksei Smirnov
d9dbf99d97
Allow to define CultureInfo for parsing values on reading DataFrame from csv ( #6782 )
...
* Use CultureInfo for parsing values in csv file
* Fix merge issues
2023-08-31 21:36:46 -06:00
Lehonti Ramos
ccf34e370b
File-scoped namespaces in files under `Prediction` (`Microsoft.ML.Core`) ( #6792 )
...
Co-authored-by: Lehonti Ramos <john@doe>
2023-08-31 21:36:33 -06:00
Aleksei Smirnov
e6a88c440b
Fix inconsistent null handling in DataFrame Arithmetics ( #6770 )
...
* Fix inconsistent null handling in DataFrame Arithmetics
* Fix Null Count and division by zero issues
* Minor changes to restart build and rerun flaky tests
2023-08-31 10:27:49 -06:00
Lehonti Ramos
aaf226c7e7
File-scoped namespaces in files under `Data` (`Microsoft.ML.Core`) ( #6789 )
...
Co-authored-by: Lehonti Ramos <john@doe>
2023-08-30 22:21:45 -06:00
Lehonti Ramos
34389b63e5
File-scoped namespaces in files under `ComponentModel` (`Microsoft.ML.Core`) ( #6788 )
...
Co-authored-by: Lehonti Ramos <john@doe>
2023-08-30 22:21:19 -06:00
Aleksei Smirnov
e3f53a4497
Fix DataFrame.LoadCsv can not load CSV with duplicate column names ( #6772 )
2023-08-30 20:38:24 -06:00
Aleksei Smirnov
39235a76d9
Fix issue with addIndexColumn in DataFrame.LoadCsv ( #6769 )
...
* Fix issue with addIndexColumn in DataFrame.LoadCsv
* Fix tests
2023-08-25 15:59:41 -06:00
Lehonti Ramos
43a6a81185
File-scoped namespaces in files under `EntryPoints` (`Microsoft.ML.Core`) ( #6790 )
...
Co-authored-by: Lehonti Ramos <john@doe>
2023-08-25 15:58:35 -06:00
Lehonti Ramos
92eccadb19
File-scoped namespaces in files under `Environment` (`Microsoft.ML.Core`) ( #6791 )
...
Co-authored-by: Lehonti Ramos <john@doe>
2023-08-25 15:58:05 -06:00
zewditu Hailemariam
179f7dc781
Add TargetType to Type_convert ( #6785 )
...
* Add target Type in convert type
* Add custom type "DataKind"
* clean
* Add DataKind name space
* clean test
2023-08-25 10:57:35 -07:00
Michael Sharp
c28d5af9f4
removed deprecated yosemite brew ( #6805 )
2023-08-24 12:56:15 -07:00
Lehonti Ramos
077a6b8196
Modernized some argument checks that still used string literals for parameter names ( #6766 )
...
Co-authored-by: John Doe <john@doe>
2023-08-06 19:49:12 -07:00
zewditu Hailemariam
ea84d429a1
Add QA sweepable estimator in AutoML ( #6781 )
...
* Add QA sweepable
* clean
2023-08-03 12:24:11 -07:00
Aleksei Smirnov
a823199307
Improve DataFrame Arithmetics implementation ( #6763 )
...
* Change methods signature generation
* Change DataFrameColumn Arithmetics
* Change DataFrameColumn Operations
* Fix unit tests
* Fix spaces
* Fix code review findings
2023-07-28 07:58:03 -07:00
Michael Sharp
8952994c67
fixed mac build and minor torch sharp changes ( #6776 )
2023-07-27 21:46:14 -06:00
Xiaoyun Zhang
7b6af06545
fix issue ( #6768 )
2023-07-24 16:23:09 -06:00
Michael Sharp
65c7ca9d9a
Add NameEntityRecognition and Q&A deep learning tasks. ( #6760 )
...
* NER
* QA almost done, runtime error
* QA finished
* fixes from PR comments
* fixed build
* build fixes
* perf changes
* made disposable
* fixed not disposing model
* added some disposables to TensorFlow for memory
* build testing
* fixing build
* added missing dispose
* build fixes
* build fixes
* testing macos fix
2023-07-24 13:47:24 -06:00
Aleksei Smirnov
321158d138
Clean DataFrame meaningless code ( #6761 )
2023-07-11 13:07:57 -07:00
Aleksei Smirnov
69cc4bcb76
Fix incorrect DataFrame min max computation with NULL ( #6734 )
...
* Step 1
* Step 2
* Fixed code review findings
2023-07-07 15:22:02 -07:00
Aleksei Smirnov
578d7bcb67
Provide ability to filter dataframe column by null via ElementWise Methods ( #6723 )
...
* Provide ability to filter by null value
* Add comments
* Fix code review findings
2023-07-07 08:58:42 -07:00
Aleksei Smirnov
caee3c2e2d
Reduce coupling of Data.Analysis.Tests project ( #6759 )
2023-07-07 02:41:49 -07:00
Aleksei Smirnov
d9e1ee1e27
Run tests that require more than 2 GB of memory only on 64-bit env ( #6758 )
2023-07-07 02:39:47 -07:00
Aleksei Smirnov
69eca5689a
Fix dataframe arithmetics for columns having several value buffers (column size is more than 2 Gb) ( #6724 )
...
* Fix dataframe arithmetics
* Fix
2023-07-06 14:00:22 -07:00
Xiaoyun Zhang
36f87d111e
avoid empty dataset ( #6756 )
2023-07-06 13:16:01 -07:00
Aleksei Smirnov
26c24463a8
Fix DataFrame to allow to store columns with size more than 2 Gb ( #6710 )
...
* Fix error with allocating more than MaxCapacity of Byte Memory Buffer
* Remove Unit test as it consumes too much memory
* Fix issue with increasing buffer capacity over the limit when doubling its size
2023-07-06 12:33:39 -07:00
Aleksei Smirnov
53c0f269d6
Fix the behavior of column SetName method ( #6676 )
...
* Fix the behavior of column SetName method
* Fix stack overflow exception
* Fix merge issues
---------
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
2023-07-06 12:17:20 -07:00
Aleksei Smirnov
443ceb936e
Add missing implementation for datetime relevant arrow type into dataframe ( #6675 )
...
* Add missing implementation for datetime relevant arrow type
* Return required usage
2023-07-06 12:15:46 -07:00
Jake
4c799ab1c8
Update build templates to handle feature branches ( #6744 )
...
* Update build templates
* Update build templates to include all releases/* and feature/*
* Update releases to release
* Update triggers for PR Validation Build
* Add triggers for Code Coverage
2023-06-27 17:02:56 -07:00
Xiaoyun Zhang
8858ab6466
stop shuffle rows in ITrainValidationDatasetManager ( #6742 )
...
* stop shuffle rows
* make file read only
2023-06-23 02:25:08 -07:00
Xiaoyun Zhang
184e66108c
smac - ignore fail trial during initialize ( #6738 )
2023-06-21 10:33:27 -07:00
Michael Sharp
8f8905ee32
brew test ( #6739 )
2023-06-20 10:48:17 -07:00
Xiaoyun Zhang
22342df7fb
Continue training on OOM error && add subsampling support for trainValidationDatasetManager ( #6714 )
...
* Update AutoMLExperiment.cs
* implement subsampling for train-validation dataset manager
* fix test
* fix comments
* fix comment
* revert tests
2023-06-14 00:27:29 -07:00
Aleksei Smirnov
b28710ae78
Reset DataFrame.RowCount to zero, when DataFrame is empty ( #6698 )
...
* Reset RowCount to zero, when DataFrame is empty
* Fix typo.
---------
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
2023-06-13 17:39:40 -06:00
Aleksei Smirnov
31e4b64d00
Fix DataFrame bounds checking on indexing elements ( #6681 )
2023-06-13 17:26:40 -06:00