Commit graph

2931 commits

Author SHA1 Message Date
Aleksei Smirnov f7b8d560cd
Implement vectorized binary arithmetic operations (#6854)
* Remove separate enums for scalar operations

* Align implementation with .Net 8.0 Tensor  API

* Fix Modulo

* Implement vectorized Arithmetic binary operations

* Implement vectorized Arithmetics binary scalar operations
2023-10-16 14:59:49 -07:00
Aleksei Smirnov 9c183fc35b
Fix Saving csv with VBufferDataFrameColumn (#6860) 2023-10-13 22:23:43 -07:00
Diego Colombo e3ec250d51
update .NET Interactive (#6857) 2023-10-11 20:45:31 -04:00
Aleksei Smirnov 64d7ebd093
Fixes incorrect work of DataFrame with VBufferColumn when number of e… (#6851)
* Fixes incorrect work of DataFrame with VBufferColumn when number of elements is greater than Int.MaxValue

* Fix calculation of max capacity and amount of required buffers

* Fix unit test

* Run test allocating more than 2 Gb of memory on 64bit env only

* Fix StringDataFrameColumn same way as VBufferDataFrameColumn

* Fix wrong amount of buffers created in constructor of StringDataFrameColumn

* Fix code review findings
2023-10-04 11:04:47 -07:00
Aleksei Smirnov 5cf6051db7
Increase performance of arithmetic operations by enhancing calculations on nullable values (#6846)
* Optimize PrimitiveColumnContainer.Clone method

* Avoid unnecessary type conversion during binary operations

* Remove using

* Fix DataFrameBuffer constructor

* remove uncorrectly added using

* Make DataFrameBuffer Length field protected

* Add performance tests

* Split Test for AppendMany into 4 different tests

* Block init of null validity buffer instead of setting individual bits

* Add unit tests for PrimitiveDataFrameColumn.Clone

* Fixes #6821

* Fix

* Add extra tests

* Fix

* Fix typo

* Fix Divide_Int16 and Divide_Int32_Int16 benchmarks

* Fix

* Avoid using constructor, that copies memory

* First step of tt refactoring

* Step 2

* Step 3

* Move iteration over buffers outside of the PrimitiveDataFrameColumnArithmetic

* Change PrimitiveDataFrameColumnArithmetic

* Fix typo

* Use RawSpan

* Fix bug with AppendMany values to not empty column

* Restart unit tests

* Add more unit tests

* Add GetBitCount method

* Fix failing unit test

* Implementation

* Change unit tests

* Update unit tests

* Refactoring BinaryOperation

* Intermediate changes

* Intermediate results

* Implement Binary Scalar Reverse Operarions

* Add implementation for BinaryIntOperations

* Implement Comparison Operations

* Implement actual calculations for Comparison operations

* Uncomment performance tests

* Remove unintentional code changes

* Add reference to Apache Arrow project license in THIRD-PARTY-NOTICES

* Fix license issues
2023-10-02 17:04:03 -07:00
Aleksei Smirnov 3c625bf542
6847 incorrectly sets column value (#6849)
* Fix DataFrame incorrectly sets column value for index higher than Buffer.MaxCapacity

* Revert renaming
2023-10-02 16:07:10 -07:00
Aleksei Smirnov 7fe293da31
PrimitiveDataFrameColumn.Clone method crashes when is used with IEnumerable mapIndices argument (#6822)
* Split Test for AppendMany into 4 different tests

* Block init of null validity buffer instead of setting individual bits

* Add unit tests for PrimitiveDataFrameColumn.Clone

* Fixes #6821

* Fix

* Fix bug with AppendMany values to not empty column

* Restart unit tests

* Add more unit tests

* Fix failing unit test

* Fix code review findings
2023-09-27 15:28:28 -07:00
Eric StJohn 97926a8c53
Update dependencies (#6837)
* Update dependencies

* Add reference to NuGet.Packaging.Core
2023-09-27 11:16:01 -07:00
R. G. Esteves 66eed89f6b
Addresses #6533 (#6838)
* Initial structure and started fleshing out some sections

* Some corrections and paragraph on DL usages

* Starting fleshing out DL on ML.NET section

* Addresses #6533
2023-09-27 11:35:51 -06:00
Aleksei Smirnov 85ee6e5117
Simplify tt files for PrimitiveDataFrameColumnAritmetics (#6830)
* First step of tt refactoring

* Step 2

* Step 3
2023-09-26 18:36:11 -07:00
Aleksei Smirnov 5648c89bbd
Improve performance of column cloning inside DataFrame arithmetics (#6814)
* Optimize PrimitiveColumnContainer.Clone method

* Avoid unnecessary type conversion during binary operations

* Remove using

* Fix DataFrameBuffer constructor

* remove uncorrectly added using

* Make DataFrameBuffer Length field protected

* Fix typo

* Use RawSpan
2023-09-26 18:28:39 -07:00
Aleksei Smirnov 15e6a556ce
Add performance benchmarks for dataframe arithmetic operations (#6827)
* Add performance tests

* Add extra tests

* Fix

* Fix typo

* Fix Divide_Int16 and Divide_Int32_Int16 benchmarks

* Fix

* Change csproj file

* Update BenchmarkDotNetVersion to 0.13.5

* Fix

* Change to 0.13.1 because that is what is latest version in our nuget feeds.

---------

Co-authored-by: Jake Radzikowski <JakeRad@Microsoft.com>
2023-09-26 10:23:39 -07:00
Xiaoyun Zhang a052146947
update interactive kernel version (#6836)
* update interactive kernel version

* update

* Update Microsoft.Data.Analysis.Interactive.Tests.csproj
2023-09-26 01:26:20 -07:00
Raffaello Fraboni 49824f3915
Fix wrong type conversion on PrimitiveDataFrameColumn (#6834)
* Fix wrong type conversion on PrimitiveDataFrameColumn

* Added tests for #6829

* Fix test

* Add file generated from tt template and fix unit tests

---------

Co-authored-by: Aleksei Smirnov <tlalok@inbox.ru>
2023-09-25 10:52:21 -07:00
Michael Sharp 09b80f8a08
removed codecov token (#6811) 2023-09-08 13:13:01 -06:00
Aleksei Smirnov d6927515d8
Append dataframe rows based on column names (#6808)
* Append dataframe rows based on column names

* Update DataFrame.cs

---------

Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
2023-09-01 22:06:55 -06:00
Aleksei Smirnov d9dbf99d97
Allow to define CultureInfo for parsing values on reading DataFrame from csv (#6782)
* Use CultureInfo for parsing values in csv file

* Fix merge issues
2023-08-31 21:36:46 -06:00
Lehonti Ramos ccf34e370b
File-scoped namespaces in files under `Prediction` (`Microsoft.ML.Core`) (#6792)
Co-authored-by: Lehonti Ramos <john@doe>
2023-08-31 21:36:33 -06:00
Aleksei Smirnov e6a88c440b
Fix inconsistent null handling in DataFrame Arithmetics (#6770)
* Fix inconsistent null handling in DataFrame Arithmetics

* Fix Null Count and division by zero issues

* Minor changes to restart build and rerun flaky tests
2023-08-31 10:27:49 -06:00
Lehonti Ramos aaf226c7e7
File-scoped namespaces in files under `Data` (`Microsoft.ML.Core`) (#6789)
Co-authored-by: Lehonti Ramos <john@doe>
2023-08-30 22:21:45 -06:00
Lehonti Ramos 34389b63e5
File-scoped namespaces in files under `ComponentModel` (`Microsoft.ML.Core`) (#6788)
Co-authored-by: Lehonti Ramos <john@doe>
2023-08-30 22:21:19 -06:00
Aleksei Smirnov e3f53a4497
Fix DataFrame.LoadCsv can not load CSV with duplicate column names (#6772) 2023-08-30 20:38:24 -06:00
Aleksei Smirnov 39235a76d9
Fix issue with addIndexColumn in DataFrame.LoadCsv (#6769)
* Fix issue with addIndexColumn in DataFrame.LoadCsv

* Fix tests
2023-08-25 15:59:41 -06:00
Lehonti Ramos 43a6a81185
File-scoped namespaces in files under `EntryPoints` (`Microsoft.ML.Core`) (#6790)
Co-authored-by: Lehonti Ramos <john@doe>
2023-08-25 15:58:35 -06:00
Lehonti Ramos 92eccadb19
File-scoped namespaces in files under `Environment` (`Microsoft.ML.Core`) (#6791)
Co-authored-by: Lehonti Ramos <john@doe>
2023-08-25 15:58:05 -06:00
zewditu Hailemariam 179f7dc781
Add TargetType to Type_convert (#6785)
* Add target Type in convert  type

* Add custom type "DataKind"

* clean

* Add DataKind name space

* clean test
2023-08-25 10:57:35 -07:00
Michael Sharp c28d5af9f4
removed deprecated yosemite brew (#6805) 2023-08-24 12:56:15 -07:00
Lehonti Ramos 077a6b8196
Modernized some argument checks that still used string literals for parameter names (#6766)
Co-authored-by: John Doe <john@doe>
2023-08-06 19:49:12 -07:00
zewditu Hailemariam ea84d429a1
Add QA sweepable estimator in AutoML (#6781)
* Add QA sweepable

* clean
2023-08-03 12:24:11 -07:00
Aleksei Smirnov a823199307
Improve DataFrame Arithmetics implementation (#6763)
* Change methods signature generation

* Change DataFrameColumn Arithmetics

* Change DataFrameColumn Operations

* Fix unit tests

* Fix spaces

* Fix code review findings
2023-07-28 07:58:03 -07:00
Michael Sharp 8952994c67
fixed mac build and minor torch sharp changes (#6776) 2023-07-27 21:46:14 -06:00
Xiaoyun Zhang 7b6af06545
fix issue (#6768) 2023-07-24 16:23:09 -06:00
Michael Sharp 65c7ca9d9a
Add NameEntityRecognition and Q&A deep learning tasks. (#6760)
* NER

* QA almost done, runtime error

* QA finished

* fixes from PR comments

* fixed build

* build fixes

* perf changes

* made disposable

* fixed not disposing model

* added some disposables to TensorFlow for memory

* build testing

* fixing build

* added missing dispose

* build fixes

* build fixes

* testing macos fix
2023-07-24 13:47:24 -06:00
Aleksei Smirnov 321158d138
Clean DataFrame meaningless code (#6761) 2023-07-11 13:07:57 -07:00
Aleksei Smirnov 69cc4bcb76
Fix incorrect DataFrame min max computation with NULL (#6734)
* Step 1

* Step 2

* Fixed code review findings
2023-07-07 15:22:02 -07:00
Aleksei Smirnov 578d7bcb67
Provide ability to filter dataframe column by null via ElementWise Methods (#6723)
* Provide ability to filter by null value

* Add comments

* Fix code review findings
2023-07-07 08:58:42 -07:00
Aleksei Smirnov caee3c2e2d
Reduce coupling of Data.Analysis.Tests project (#6759) 2023-07-07 02:41:49 -07:00
Aleksei Smirnov d9e1ee1e27
Run tests that requires more than 2 Gb of Memory only on 64-bit env (#6758) 2023-07-07 02:39:47 -07:00
Aleksei Smirnov 69eca5689a
Fix dataframe arithmetics for columns having several value buffers (column size is more than 2 Gb) (#6724)
* Fix dataframe arithmetics

* Fix
2023-07-06 14:00:22 -07:00
Xiaoyun Zhang 36f87d111e
avoid empty dataset (#6756) 2023-07-06 13:16:01 -07:00
Aleksei Smirnov 26c24463a8
Fix DataFrame to allow to store columns with size more than 2 Gb (#6710)
* Fix error with allocating more than MaxCapacity of Byte Memory Buffer

* Remove Unit test as it consumes too much memory

* Fix issue with increasing buffer capacity over limit when double it size
2023-07-06 12:33:39 -07:00
Aleksei Smirnov 53c0f269d6
Fix the behavior or column SetName method (#6676)
* Fix the behavior or column SetName method

* Fix stack overflow exception

* Fix merge issues

---------

Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
2023-07-06 12:17:20 -07:00
Aleksei Smirnov 443ceb936e
Add missing implementation for datetime relevant arrow type into dataframe (#6675)
* Add missing implementation for datetime relevant arrow type

* Return required usage
2023-07-06 12:15:46 -07:00
Jake 4c799ab1c8
Update build templates to handle feature branches (#6744)
* Update build templates

* Update build templates to include all releases/* and feature/*

* Update releases to release

* Update triggers for PR Validation Build

* Add triggers for Code Coverage
2023-06-27 17:02:56 -07:00
Xiaoyun Zhang 8858ab6466
stop shuffle rows in ITrainValidationDatasetManager (#6742)
* stop shuffle rows

* make file read only
2023-06-23 02:25:08 -07:00
Xiaoyun Zhang 184e66108c
smac - ignore fail trial during initialize (#6738) 2023-06-21 10:33:27 -07:00
Michael Sharp 8f8905ee32
brew test (#6739) 2023-06-20 10:48:17 -07:00
Xiaoyun Zhang 22342df7fb
Continue training on OOM error && add subsampling support for trainValidationDatasetManager (#6714)
* Update AutoMLExperiment.cs

* implement subsampling for train-validation dataset manager

* fix test

* fix comments

* fix comment

* revert tests
2023-06-14 00:27:29 -07:00
Aleksei Smirnov b28710ae78
Reset DataFrame.RowCount to zero, when DataFrame is empty (#6698)
* Reset RowCount to zero, when DataFrame is empty

* Fix typo.

---------

Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
2023-06-13 17:39:40 -06:00
Aleksei Smirnov 31e4b64d00
Fix DataFrame bounds checking on indexing elements (#6681) 2023-06-13 17:26:40 -06:00