Historical Changelog

51.0.0 (2024-03-15)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Prototype Arrow over HTTP in Rust #5496 [arrow]
  • Add DataType::ListView and DataType::LargeListView #5492 [parquet] [arrow]
  • Improve documentation around handling of dictionary arrays in arrow flight #5487 [arrow] [arrow-flight]
  • Better memory limiting in parquet ArrowWriter #5484 [parquet]
  • Support Creating Non-Nullable Lists and Maps within a Struct #5482 [arrow]
  • DISCUSSION
  • Build Scalar with ArrayRef #5459
  • AsyncArrowWriter doesn't limit underlying ArrowWriter to respect buffer-size #5450 [parquet]
  • Refine Display implementation for FlightError #5438 [arrow] [arrow-flight]
  • Better ergonomics for FixedSizeList and LargeList #5372 [arrow]
  • Update Flight proto #5367 [arrow] [arrow-flight]
  • Support check similar datatype but with different magnitudes #5358 [arrow]
  • Buffer memory usage for custom allocations is reported as 0 #5346 [arrow]
  • Can the ArrayBuilder trait be made Sync? #5344 [arrow]
  • support cast 'UTF8' to FixedSizeList #5339 [arrow]
  • Support Creating Non-Nullable Lists with ListBuilder #5330 [arrow]
  • ParquetRecordBatchStreamBuilder::new() panics instead of erroring out when opening a corrupted file #5315 [parquet]
  • Raw JSON Writer #5314 [arrow]
  • Add support for more fused boolean operations #5297 [arrow]
  • parquet: Allow disabling embed ARROW_SCHEMA_META_KEY added by the ArrowWriter #5296 [parquet]
  • Support casting strings like '2001-01-01 01:01:01' to Date32 #5280 [arrow]
  • Temporal Extract/Date Part Kernel #5266 [arrow]
  • Support for extracting hours/minutes/seconds/etc. from Time32/Time64 type in temporal kernels #5261 [arrow]
  • parquet: add method to get both the inner writer and the file metadata when closing SerializedFileWriter #5253 [parquet]
  • Release arrow-rs version 50.0.0 #5234

Fixed bugs:

  • Empty String Parses as Zero in Unreleased Arrow #5504 [arrow]
  • Unused import in nightly rust #5476 [parquet] [arrow] [arrow-flight]
  • Error The data type type List .. has no natural order when using arrow::compute::lexsort_to_indices with list and more than one column #5454 [arrow]
  • Wrong size assertion in arrow_buffer::builder::NullBufferBuilder::new_from_buffer #5445 [arrow]
  • Inconsistency between comments and code implementation #5430 [arrow]
  • OOB access in Buffer::from_iter #5412 [arrow]
  • Cast kernel doesn't return null for string to integral cases when overflowing under safe option enabled #5397 [arrow]
  • Make ffi consume variable layout arrays with empty offsets #5391 [arrow]
  • RecordBatch conversion from pyarrow loses Schema's metadata #5354 [arrow]
  • Debug output of Time32/Time64 arrays with invalid values has confusing nulls #5336 [arrow]
  • Removing a column from a RecordBatch drops schema metadata #5327 [arrow]
  • Panic when read an empty parquet file #5304 [parquet]
  • How to enable statistics for string columns? #5270 [parquet]
  • concat::tests::test_string_dictionary_merge failure fails on Mac / has different results in different platforms #5255 [arrow]

Documentation updates:

  • Minor: Add doc comments to GenericByteViewArray #5512 [arrow] (alamb)
  • Improve docs for logical and physical nulls even more #5434 [arrow] (alamb)
  • Add example of converting RecordBatches to JSON objects #5364 [arrow] (alamb)

Performance improvements:

Closed issues:

  • Add StringViewArray implementation and layout and basic construction + tests #5469 [parquet] [arrow]
  • Add DataType::Utf8View and DataType::BinaryView #5468 [parquet] [arrow]

Merged pull requests:

50.0.0 (2024-01-08)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Support get offsets or blocks info from arrow file. #5252 [arrow]
  • Make regexp_match take scalar pattern and flag #5246 [arrow]
  • Cannot access pen state website on arrow-row #5238 [arrow]
  • RecordBatch with_schema's error message is hard to read #5227 [arrow]
  • Support cast between StructArray. #5219 [arrow]
  • Remove nightly-only simd feature and related code in ArrowNumericType #5185 [arrow]
  • Use Vec instead of Slice in ColumnReader #5177 [parquet]
  • Request to Memmap Arrow IPC files on disk #5153 [arrow]
  • GenericColumnReader::read_records Yields Truncated Records #5150 [parquet]
  • Nested Schema Projection #5148 [parquet] [arrow]
  • Support specifying quote and escape in Csv WriterBuilder #5146 [arrow]
  • Support casting of Float16 with other numeric types #5138 [arrow]
  • Parquet: read parquet metadata with page index in async and with size hints #5129 [parquet]
  • Cast from floating/timestamp to timestamp/floating #5122 [arrow]
  • Support Casting List To/From LargeList in Cast Kernel #5113 [arrow]
  • Expose a path for converting bytes::Bytes into arrow_buffer::Buffer without copy #5104 [arrow]
  • API inconsistency of ListBuilder make it hard to use as nested builder #5098 [arrow]
  • Parquet: don't truncate min/max statistics for float16 and decimal when writing file #5075 [parquet]
  • Parquet: derive boundary order when writing columns #5074 [parquet]
  • Support new Arrow PyCapsule Interface for Python FFI #5067 [arrow]
  • 48.0.1 arrow patch release #5050 [parquet] [arrow]
  • Binary columns do not receive truncated statistics #5037 [parquet]
  • Re-evaluate Explicit SIMD Aggregations #5032 [arrow]
  • Min/Max Kernels Should Use Total Ordering #5031 [arrow]
  • Allow zip compute kernel to take Scalar / Datum #5011 [arrow]
  • Add Float16/Half-float logical type to Parquet #4986 [parquet]
  • feat: cast (Large)List to FixedSizeList #5081 [arrow] (wjones127)
  • Update Parquet Encoding Documentation #5051 [parquet]
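
The entry "Min/Max Kernels Should Use Total Ordering" (#5031) concerns how floating-point NaN values are ordered. The sketch below is a standard-library illustration of the general issue, not the arrow-rs implementation; `max_partial` and `max_total` are hypothetical helper names. It shows that a maximum built on `>` depends on where NaN sits in the input, while one built on `f64::total_cmp` does not.

```rust
/// Maximum using the partial order (`>`). Any comparison with NaN is false,
/// so the result depends on where NaN appears in the input.
fn max_partial(values: &[f64]) -> Option<f64> {
    values.iter().copied().fold(None, |acc, v| match acc {
        None => Some(v),
        Some(m) => {
            if v > m {
                Some(v)
            } else {
                Some(m)
            }
        }
    })
}

/// Maximum using the IEEE 754 total order exposed by `f64::total_cmp`.
/// Every pair of values is comparable, so input order does not matter.
fn max_total(values: &[f64]) -> Option<f64> {
    values.iter().copied().max_by(|a, b| a.total_cmp(b))
}

fn main() {
    let nan_first = [f64::NAN, 1.0];
    let nan_last = [1.0, f64::NAN];

    // Partial order: the two inputs hold the same values but disagree.
    println!("partial, NaN first: {:?}", max_partial(&nan_first));
    println!("partial, NaN last:  {:?}", max_partial(&nan_last));

    // Total order: both inputs give the same answer.
    println!("total, NaN first:   {:?}", max_total(&nan_first));
    println!("total, NaN last:    {:?}", max_total(&nan_last));
}
```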

Fixed bugs:

  • json schema inference can't handle null field turned into object field in subsequent rows #5215 [arrow]
  • Invalid trailing content after Z in timezone is ignored #5182 [arrow]
  • Take panics on a fixed size list array when given null indices #5169 [arrow]
  • EnabledStatistics::Page does not take effect on ByteArrayEncoder #5162 [parquet]
  • Parquet: ColumnOrder not being written when writing parquet files #5152 [parquet]
  • Parquet: Interval columns shouldn't write min/max stats #5145 [parquet]
  • cast Utf8 to decimal failure #5127 [arrow]
  • coerce_primitive not honored when decoding from serde object #5095 [arrow]
  • Unsound MutableArrayData Constructor #5091 [arrow]
  • RowGroupReader.get_row_iter() fails with Path ColumnPath not found #5064 [parquet]
  • cast format 'yyyymmdd' to Date32 give a error #5044 [arrow]

Performance improvements:

  • ArrowArrayStreamReader imports FFI_ArrowSchema on each iteration #5103 [arrow]

Closed issues:

  • Working example of list_flights with ObjectStore #5116
  • object_store Error broken pipe on S3 multipart upload #5106

Merged pull requests:

49.0.0 (2023-11-07)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Cast from integer/timestamp to timestamp/integer #5039 [arrow]
  • Support casting from integer to binary #5014 [arrow]
  • Return row count when inferring schema from JSON #5007 [arrow]
  • FlightSQL
  • Support RecordBatch::remove_column() and Schema::remove_field() #4952 [arrow]
  • arrow_json: support binary deserialization #4945 [arrow]
  • Support StructArray in Cast Kernel #4908 [arrow]
  • There exists a ParquetRecordWriter proc macro in parquet_derive, but ParquetRecordReader is missing #4772 [parquet]

Fixed bugs:

  • Regression when serializing large json numbers #5038 [arrow]
  • RowSelection::intersection Produces Invalid RowSelection #5036 [parquet]
  • Incorrect comment on arrow::compute::kernels::sort::sort_to_indices #5029 [arrow]

Documentation updates:

  • chore: Update docs to refer to non deprecated function `partition` #5027 [arrow] (alamb)

Merged pull requests:

48.0.0 (2023-10-18)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Allow schema fields to merge with Null datatype #4901 [arrow]
  • Add option to FlightDataEncoder to always send dictionaries #4895 [arrow] [arrow-flight]
  • Rework Thrift Encoding / Decoding of Parquet Metadata #4891 [parquet]
  • Plans for supporting Extension Array to support Fixed shape tensor Array #4890
  • Implement Take for UnionArray #4882 [arrow]
  • Check precision overflow for casting floating to decimal #4865 [arrow]
  • Replace lexical #4774 [arrow]
  • Add read access to settings in csv::WriterBuilder #4735 [arrow]
  • Improve the performance of "DictionaryValue" row encoding #4712 [arrow] [arrow-flight]

Fixed bugs:

  • Should we make blank values and empty string to None in csv? #4939 [arrow]
  • FlightSQL
  • Loading page index breaks skipping of pages with nested types #4921 [parquet]
  • CSV schema inference assumes Utf8 for empty columns #4903 [arrow]
  • parquet: Field Ids are not read from a Parquet file without serialized arrow schema #4877 [parquet]
  • make_primitive_scalar function loses DataType Internal information #4851 [arrow]
  • StructBuilder doesn't handle nulls correctly for empty structs #4842 [arrow]
  • NullArray::is_null() returns false incorrectly #4835 [arrow]
  • cast_string_to_decimal should check precision overflow #4829 [arrow]
  • Null fields are omitted by infer_json_schema_from_seekable #4814 [arrow]

Closed issues:

  • Support for reading JSON Array to Arrow #4905 [arrow]

Merged pull requests:

47.0.0 (2023-09-19)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Row Format Adaptive Block Size #4812 [arrow]
  • Stateless Row Conversion #4811 [arrow] [arrow-flight]
  • Add option to specify custom null values for CSV reader #4794 [arrow]
  • parquet::record::RowIter cannot be customized with batch_size and defaults to 1024 #4782 [parquet]
  • DynScalar abstraction something that makes it easy to create scalar `Datum`s #4781 [arrow]
  • Datum is not exported as part of arrow it is only exported in `arrow_array` #4780 [arrow]
  • Scalar is not exported as part of arrow it is only exported in `arrow_array` #4779 [arrow]
  • Support IntoPyArrow for impl RecordBatchReader #4730 [arrow]
  • Datum Based String Kernels #4595 [arrow] [arrow-flight]

Fixed bugs:

  • MapArray::new_from_strings creates nullable entries field #4807 [arrow]
  • pyarrow module can't roundtrip tensor arrays #4805 [arrow]
  • concat_batches errors with "schema mismatch" error when only metadata differs #4799 [arrow]
  • panic in cmp kernels with DictionaryArrays: Option::unwrap() on a None value #4788 [arrow]
  • stream ffi panics if schema metadata values aren't valid utf8 #4750 [arrow]
  • Regression: Incorrect Sorting of *ListArray in 46.0.0 #4746 [arrow]
  • Row is no longer comparable after reuse #4741 [arrow]
  • DoPut FlightSQL handler inadvertently consumes schema at start of Request<Streaming<FlightData>> #4658
  • Return error when converting schema #4752 [arrow] (wjones127)
  • Implement PyArrowType for Box<dyn RecordBatchReader + Send> #4751 [arrow] (wjones127)

Closed issues:

  • Building arrow-rust for target wasm32-wasi failed to compile packed_simd_2 #4717

Merged pull requests:

46.0.0 (2023-08-21)

Full Changelog

Breaking changes:

Implemented enhancements:

  • parquet: support setting the field_id with an ArrowWriter #4702 [parquet]
  • Support references in i256 arithmetic ops #4694 [arrow]
  • Precision-Loss Decimal Arithmetic #4664 [arrow]
  • Faster i256 Division #4663 [arrow]
  • Support concat_batches for 0 columns #4661 [arrow]
  • filter_record_batch should support filtering record batch without columns #4647 [arrow]
  • Improve speed of lexicographical_partition_ranges #4614 [arrow]
  • object_store: multipart ranges for HTTP #4612
  • Add Rank Function #4606 [arrow]
  • Datum Based Comparison Kernels #4596 [parquet] [arrow] [arrow-flight]
  • Convenience method to create DataType::List correctly #4544 [arrow]
  • Remove Deprecated Arithmetic Kernels #4481 [arrow]
  • Equality kernel where null==null gives true #4438 [arrow]
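
The entry "Equality kernel where null==null gives true" (#4438) contrasts two notions of equality over nullable values. The following is a minimal standard-library sketch of that distinction, modelling a nullable value as `Option<i32>`; the function names `eq_sql` and `eq_not_distinct` are hypothetical and are not claimed to match the arrow-rs API.

```rust
/// SQL-style equality: if either side is null, the result is null (`None`).
fn eq_sql(a: Option<i32>, b: Option<i32>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// "Not distinct" equality: two nulls compare equal, and a null never
/// equals a non-null value. The result is always a plain boolean.
fn eq_not_distinct(a: Option<i32>, b: Option<i32>) -> bool {
    a == b
}

fn main() {
    let pairs = [(Some(1), Some(1)), (Some(1), None), (None, None)];
    for (a, b) in pairs {
        println!(
            "{:?} vs {:?}: sql = {:?}, not_distinct = {}",
            a,
            b,
            eq_sql(a, b),
            eq_not_distinct(a, b)
        );
    }
}
```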

Fixed bugs:

  • Parquet ArrowWriter Ignores Nulls in Dictionary Values #4690 [parquet] [arrow]
  • Schema Nullability Validation Fails to Account for Dictionary Nulls #4689 [parquet] [arrow]
  • Comparison Kernels Ignore Nulls in Dictionary Values #4688 [parquet] [arrow]
  • Casting List to String Ignores Format Options #4669 [arrow]
  • Double free in C Stream Interface #4659 [arrow]
  • CI Failing On Packed SIMD #4651 [arrow]
  • RowInterner::size() much too low for high cardinality dictionary columns #4645 [arrow]
  • Decimal PrimitiveArray change datatype after try_unary #4644
  • Better explanation in docs for Dictionary field encoding using RowConverter #4639 [arrow]
  • List(FixedSizeBinary) array equality check may return wrong result #4637 [arrow]
  • arrow::compute::nullif panics if NullArray is provided #4634 [arrow]
  • Empty lists in FixedSizeListArray::try_new is not handled #4623 [arrow]
  • Bounds checking in MutableBuffer::set_null_bits can be bypassed #4620 [arrow]
  • TypedDictionaryArray Misleading Null Behaviour #4616 [parquet] [arrow]
  • bug: Parquet writer missing row group metadata fields such as compressed_size, file offset. #4610 [parquet]
  • new_null_array generates an invalid union array #4600 [arrow]
  • Footer parsing fails for very large parquet file. #4592 [parquet]
  • bug(parquet): Disabling global statistics but enabling for particular column breaks reading #4587 [parquet]
  • arrow::compute::concat panics for dense union arrays with non-trivial type IDs #4578 [arrow]

Closed issues:

  • object_store

Merged pull requests:

45.0.0 (2023-07-30)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Use FormatOptions in Const Contexts #4580 [arrow]
  • Human Readable Duration Display #4554 [arrow]
  • BooleanBuilder: Add validity_slice method for accessing validity bits #4535 [arrow]
  • Support FixedSizedListArray for length kernel #4517 [arrow]
  • RowConverter::convert that targets an existing Rows #4479 [arrow]

Fixed bugs:

  • Panic assertion failed: idx < self.len when casting DictionaryArrays with nulls #4576 [arrow]
  • arrow-arith is_null is buggy with NullArray #4565 [arrow]
  • Incorrect Interval to Duration Casting #4553 [arrow]
  • Too large validity buffer pre-allocation in FixedSizeListBuilder::new #4549 [arrow]
  • Like with wildcards fail to match fields with new lines. #4547 [arrow]
  • Timestamp Interval Arithmetic Ignores Timezone #4457 [arrow]

Merged pull requests:

44.0.0 (2023-07-14)

Full Changelog

Breaking changes:

Implemented enhancements:

Fixed bugs:

  • Parquet: AsyncArrowWriter to a file corrupts the footer for large columns #4526 [parquet]
  • object_store
  • Cannot cast string '2021-01-02' to value of Date64 type #4512 [arrow]
  • Incorrect Interval Subtraction #4489 [arrow]
  • Interval Negation Incorrect #4488 [arrow]
  • Parquet: AsyncArrowWriter inner buffer is not correctly limited and causes OOM #4477 [parquet]

Merged pull requests:

43.0.0 (2023-06-30)

Full Changelog

Breaking changes:

Implemented enhancements:

Fixed bugs:

  • Regression in parquet 42.0.0: Bad parquet column indexes for All Null Columns, resulting in Parquet error: StructArrayReader out of sync on read #4459 [parquet]
  • Regression in 42.0.0: Parsing fractional intervals without leading 0 is not supported #4424 [arrow]

Documentation updates:

Merged pull requests:

42.0.0 (2023-06-16)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Add doc example of constructing a MapArray #4385 [arrow]
  • Support millisecond and microsecond functions #4374 [arrow]
  • Changed array_to_json_array to take &dyn Array #4369 [arrow]
  • compute::ord kernel for getting min and max of two scalar/array values #4347 [arrow]
  • Release 41.0.0 of arrow/arrow-flight/parquet/parquet-derive #4346
  • Refactor CAST tests to use new cast array syntax #4336 [arrow]
  • pass bytes directly to parquet's KeyValue #4317
  • PyArrow conversions could return TypeError if provided incorrect Python type #4312 [arrow]
  • Have array_to_json_array support Map #4297 [arrow]
  • FlightSQL: Add helpers to create CommandGetXdbcTypeInfo responses `XdbcInfoValue` and builders #4257 [arrow] [arrow-flight]
  • Have array_to_json_array support FixedSizeList #4248 [arrow]
  • Truncate ColumnIndex ByteArray Statistics #4126 [parquet]
  • Arrow compute kernel regards selection vector #4095 [arrow]

Fixed bugs:

  • Wrongly calculated data compressed length in IPC writer #4410 [arrow]
  • Take Kernel Handles Nullable Indices Incorrectly #4404 [arrow]
  • StructBuilder::new Doesn't Validate Builder DataTypes #4397 [arrow]
  • Parquet error: Not all children array length are the same! when using RowSelection to read a parquet file #4396
  • RecordReader::skip_records Is Incorrect for Repeated Columns #4368 [parquet]
  • List-of-String Array panics in the presence of row filters #4365 [parquet]
  • Fail to read block compressed gzip files with parquet-fromcsv #4173 [parquet]

Closed issues:

  • Have a parquet file not able to be deduped via arrow-rs, complains about Decimal precision? #4356
  • Question: Could we move dict_id, dict_is_ordered into DataType? #4325

Merged pull requests:

41.0.0 (2023-06-02)

Full Changelog

Breaking changes:

Implemented enhancements:

Fixed bugs:

  • Doc for arrow_flight::sql is missing enums that are Xdbc related #4339 [arrow] [arrow-flight]
  • concat_batches panics with total_len <= bit_len assertion for records with lists #4324 [arrow]
  • Incorrect PageMetadata Row Count returned for V1 DataPage #4321 [parquet]
  • parquet
  • ambiguous glob re-exports of contains_utf8 #4289 [parquet] [arrow]
  • flight_sql_client --header "key: value" yields a value with a leading whitespace #4270 [arrow] [arrow-flight]
  • Casting Timestamp to date is off by one day for dates before 1970-01-01 #4211 [arrow]
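
The entry "Casting Timestamp to date is off by one day for dates before 1970-01-01" (#4211) describes a symptom that commonly arises when a negative timestamp is divided with truncation toward zero instead of rounding down. The sketch below illustrates that general pitfall with the standard library only. It is an assumption that this matches the cause in arrow-rs, and `days_truncating` and `days_floor` are hypothetical names.

```rust
/// Number of seconds in one day.
const SECS_PER_DAY: i64 = 86_400;

/// Days since the Unix epoch using `/`, which truncates toward zero.
/// For negative timestamps this rounds up to the later day.
fn days_truncating(secs: i64) -> i64 {
    secs / SECS_PER_DAY
}

/// Days since the Unix epoch using Euclidean division, which rounds
/// toward negative infinity for a positive divisor.
fn days_floor(secs: i64) -> i64 {
    secs.div_euclid(SECS_PER_DAY)
}

fn main() {
    // One hour before the epoch, i.e. 1969-12-31T23:00:00Z.
    let secs = -3_600;
    println!("truncating: day {}", days_truncating(secs));
    println!("floor:      day {}", days_floor(secs));
}
```

For timestamps at or after the epoch the two functions agree, which is why this kind of error only shows up for dates before 1970-01-01.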

Merged pull requests:

40.0.0 (2023-05-19)

Full Changelog

Breaking changes:

Implemented enhancements:

  • ObjectStore with_url Should Handle Path #4199
  • Support Interval +/- Interval #4178 [arrow]
  • parquet
  • Allow cast to take in a format specification #4168 [arrow]
  • Support extended pow arithmetic #4166 [arrow]
  • Preload page index for async ParquetObjectReader #4090 [parquet]

Fixed bugs:

  • Subtracting Timestamp from Timestamp should produce a Duration not `Timestamp` #3964 [arrow]

Merged pull requests:

39.0.0 (2023-05-05)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Release 39.0.0 of arrow/arrow-flight/parquet/parquet-derive next release after 38.0.0 #4170 [arrow] [arrow-flight]
  • Fixed point decimal multiplication for DictionaryArray #4135 [arrow]
  • Remove Seek Requirement from CSV ReaderBuilder #4130 [parquet] [arrow]
  • Inconsistent CSV Inference and Parsing DateTime Handling #4129 [parquet] [arrow]
  • Support accessing ipc Reader/Writer inner by reference #4121
  • Add Type Declarations for All Primitive Tensors and Buffer Builders #4112 [arrow]
  • Support Interval + Timestamp and Interval + Date in addition to Timestamp + Interval and Interval + Date #4094 [arrow]
  • Enable setting FlightDescriptor on FlightDataEncoderBuilder #3855 [arrow] [arrow-flight]

Fixed bugs:

  • Parquet Page Index Reader Assumes Consecutive Offsets #4149 [parquet]
  • Equality of nested data types #4110 [arrow]

Documentation updates:

  • Improve Documentation of Parquet ChunkReader #4118

Closed issues:

  • add specific error log for empty JSON array #4105 [arrow]

Merged pull requests:

38.0.0 (2023-04-21)

Full Changelog

Breaking changes:

Implemented enhancements:

Fixed bugs:

  • Update readme to remove reference to Jira #4091
  • OffsetBuffer::new Rejects 0 Offsets #4066 [arrow]
  • Parquet AsyncArrowWriter not shutting down inner async writer. #4058 [parquet]
  • Flight SQL Server missing command type.googleapis.com/arrow.flight.protocol.sql.CommandGetXdbcTypeInfo #4054 [arrow] [arrow-flight]
  • RawJsonReader Errors with Empty Schema #4053 [parquet] [arrow]
  • RawJsonReader Integer Truncation #4049 [arrow]
  • Sparse UnionArray Equality Incorrect Offset Handling #4044 [arrow]

Documentation updates:

  • Write blog about improvements in JSON and CSV processing #4062 [arrow]

Closed issues:

  • Parquet reader of Int96 columns and coercion to timestamps #4075
  • Serializing timestamp from int json raw decoder #4069 [arrow]
  • Support casting to/from Interval and Duration #3998 [arrow]

Merged pull requests:

37.0.0 (2023-04-07)

Full Changelog

Breaking changes:

Implemented enhancements:

Fixed bugs:

  • Incorrect Overflow Casting String to Timestamp #4033
  • f16::ZERO and f16::ONE are mixed up #4016 [arrow]
  • Handle overflow precision when casting from integer to decimal #3995 [arrow]
  • PrimitiveDictionaryBuilder.finish should use actual value type #3971 [arrow]
  • RecordBatch From StructArray Silently Discards Nulls #3952 [parquet] [arrow]
  • I256 Checked Subtraction Overflows for i256::MINUS_ONE #3942 [arrow]
  • I256 Checked Multiply Overflows for i256::MIN #3941 [arrow]

Closed issues:

Merged pull requests:

36.0.0 (2023-03-24)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Improve speed of parsing string to Times #3919 [arrow]
  • feat: add comparison/sort support for Float16 #3914
  • Pinned version in arrow-flight's build-dependencies are causing conflicts #3876
  • Add compression options levels #3844 [parquet] [arrow]
  • Use Unsigned Integer for Fixed Size DataType #3815
  • Common trait for RecordBatch and StructArray #3764 [arrow]
  • Allow precision loss on multiplying decimal arrays #3689 [arrow]

Fixed bugs:

  • Raw JSON Reader Allows Non-Nullable Struct Children to Contain Nulls #3904
  • Nullable field with nested not nullable map in json #3900
  • parquet_derive doesn't support Vec<u8> #3864 [parquet]
  • REGRESSION
  • REGRESSION
  • REGRESSION
  • CSV Reader Doesn't set Timezone #3841
  • PyArrowConvert Leaks Memory #3683 [arrow]

Merged pull requests:

35.0.0 (2023-03-10)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Support timestamp/time and date types in json decoder #3834 [arrow]
  • Support decoding decimals in new raw json decoder #3819 [arrow]
  • Timezone Aware Timestamp Parsing #3794 [arrow]
  • Preallocate buffers for FixedSizeBinary array creation #3792 [arrow]
  • Make Parquet CLI args consistent #3785 [parquet]
  • Creates PrimitiveDictionaryBuilder from provided keys and values builders #3776 [arrow]
  • Use NullBuffer in ArrayData #3775 [parquet] [arrow]
  • Support unary_dict_mut in arth #3710 [arrow]
  • Support cast <> String to interval #3643 [arrow]
  • Support Zero-Copy Conversion from Vec to/from MutableBuffer #3516 [arrow]

Fixed bugs:

  • Timestamp Unit Casts are Unchecked #3833 [arrow]
  • regexp_match skips first match when returning match #3803 [arrow]
  • Cast to timestamp with time zone returns timestamp #3800 [arrow]
  • Schema-level metadata is not encoded in Flight responses #3779 [arrow] [arrow-flight]

Closed issues:

Merged pull requests:

34.0.0 (2023-02-24)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Support casting string to timestamp with microsecond resolution #3751
  • Add datetime/interval/duration into comparison kernels #3729 [arrow]
  • ! not operator overload for SortOptions #3726 [arrow]
  • parquet: convert Bytes to ByteArray directly #3719 [parquet]
  • Implement simple RecordBatchReader #3704
  • Is possible to implement GenericListArray::from_iter ? #3702
  • take_run improvements #3701 [arrow]
  • Support as_mut_any in Array trait #3655
  • Array --> Display formatter that supports more options and is configurable #3638 [parquet] [arrow]
  • arrow-csv: support decimal256 #3474 [arrow]

Fixed bugs:

  • CSV reader infers Date64 type for fields like "2020-03-19 00:00:00" that it can't parse to Date64 #3744 [arrow]

Merged pull requests:

33.0.0 (2023-02-10)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Support UTF8 cast to Timestamp with timezone #3664
  • Add modulus_dyn and modulus_scalar_dyn #3648 [arrow]
  • A trait for append_value and append_null on ArrayBuilders #3644
  • Improve error message "batches[0] schema is different with argument schema" #3628 [arrow]
  • Specified version of helper function to cast binary to string #3623 [arrow]
  • Casting generic binary to generic string #3606 [arrow]
  • Use array_value_to_string in arrow-csv #3483 [arrow]

Fixed bugs:

  • ArrowArray::try_from_raw Misleading Signature #3684 [arrow]
  • PyArrowConvert Leaks Memory #3683 [arrow]
  • Arrow-csv reader cannot produce RecordBatch even if the bytes are necessary #3674
  • FFI Fails to Account For Offsets #3671 [arrow]
  • Regression in CSV reader error handling #3656 [arrow]
  • UnionArray Child and Value Fail to Account for non-contiguous Type IDs #3653 [arrow]
  • Panic when accessing RecordBatch from pyarrow #3646 [arrow]
  • Multiplication for decimals is incorrect #3645
  • Inconsistent output between pretty print and CSV writer for Arrow #3513 [arrow]

Closed issues:

  • Release 33.0.0 of arrow/arrow-flight/parquet/parquet-derive next release after 32.0.0 #3682
  • Release 32.0.0 of arrow/arrow-flight/parquet/parquet-derive next release after `31.0.0` #3584 [parquet] [arrow] [arrow-flight]

Merged pull requests:

32.0.0 (2023-01-27)

Full Changelog

Breaking changes:

Implemented enhancements:

  • There should be a From<Vec<Option<String>>> impl for GenericStringArray<OffsetSize> #3599 [arrow]
  • FlightDataEncoder Optionally send Schema even when no record batches #3591 [arrow-flight]
  • Use Native Types in PageIndex #3575 [parquet]
  • Packing array into dictionary of generic byte array #3571 [arrow]
  • Implement Error::Source for ArrowError and FlightError #3566 [arrow] [arrow-flight]
  • FlightSQL
  • Arrow CSV writer should not fail when cannot cast the value #3547 [arrow]
  • Write Deprecated Min Max Statistics When ColumnOrder Signed #3526 [parquet]
  • Improve Performance of JSON Reader #3441
  • Support footer kv metadata for IPC file #3432
  • Add External variant to ParquetError #3285 [parquet]

Fixed bugs:

  • Nullif of NULL Predicate is not NULL #3589
  • BooleanBufferBuilder Fails to Clear Set Bits On Truncate #3587 [arrow]
  • nullif incorrectly calculates null_count, sometimes panics with subtraction overflow error #3579 [arrow]
  • Meet warning when use pyarrow #3543 [arrow]
  • Incorrect row group total_byte_size written to parquet file #3530 [parquet]
  • Overflow when casting timestamps prior to the epoch #3512 [arrow]

Closed issues:

  • Panic on Key Overflow in Dictionary Builders #3562 [parquet] [arrow]
  • Bumping version gives compilation error arrow-array #3525

Merged pull requests:

31.0.0 (2023-01-13)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Support casting Date32 to timestamp #3504 [arrow]
  • Support casting strings like '2001-01-01' to timestamp #3492 [arrow]
  • CLI to "rewrite" parquet files #3476 [parquet]
  • Add more dictionary value type support to build_compare #3465
  • Allow concat_batches to take non owned RecordBatch #3456 [arrow]
  • Release Arrow 30.0.1 maintenance release for `30.0.0` #3455
  • Add string comparisons starts_with, ends_with, and contains to kernel #3442 [arrow]
  • make_builder Loses Timezone and Decimal Scale Information #3435 [arrow]
  • Use RFC3339 style timestamps in arrow-json #3416 [arrow]
  • ArrayData::get_slice_memory_size or similar #3407 [arrow] [arrow-flight]

Fixed bugs:

  • Unable to read CSV with null boolean value #3521 [arrow]
  • Make consistent behavior on zeros equality on floating point types #3509
  • Sliced batch w/ bool column doesn't roundtrip through IPC #3496 [arrow] [arrow-flight]
  • take kernel on List array introduces nulls instead of empty lists #3471 [arrow]
  • Infinite Loop If Skipping More CSV Lines than Present #3469 [arrow]

Merged pull requests:

30.0.1 (2023-01-04)

Full Changelog

Implemented enhancements:

Fixed bugs:

  • nullif kernel no longer exported #3454 [arrow]
  • PrimitiveArray from ArrayData Unsound For IntervalArray #3439 [arrow]
  • LZ4-compressed PQ files unreadable by Pandas and ClickHouse #3433 [parquet]
  • Parquet Record API: Cannot convert date before Unix epoch to json #3430 [parquet]
  • parquet-fromcsv with writer version v2 does not stop #3408 [parquet]

30.0.0 (2022-12-29)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Add derived implementations of Clone and Debug for ParquetObjectReader #3381 [parquet]
  • Speed up TrackedWrite #3366 [parquet]
  • Is it possible for ArrowWriter to write key_value_metadata after write all records #3356 [parquet]
  • Add UnionArray test to arrow-pyarrow integration test #3346
  • Document / Deprecate arrow_flight::utils::flight_data_from_arrow_batch #3312 [arrow] [arrow-flight]
  • FlightSQL
  • Support UnionArray in ffi #3304 [arrow]
  • Add support for Azure Data Lake Storage Gen2 aka: ADLS Gen2 in Object Store library #3283
  • Support casting from String to Decimal #3280 [arrow]
  • Allow ArrowCSV writer to control the display of NULL values #3268 [arrow]

Fixed bugs:

  • FlightSQL example is broken #3386 [arrow-flight]
  • CSV Reader Bounds Incorrectly Handles Header #3364 [arrow]
  • Incorrect output string from try_to_type #3350
  • Decimal arithmetic computation fails to run because decimal type equality #3344 [arrow]
  • Pretty print not implemented for Map #3322 [arrow]
  • ILIKE Kernels Inconsistent Case Folding #3311 [arrow]

Documentation updates:

Merged pull requests:

29.0.0 (2022-12-09)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Support writing BloomFilter in arrow_writer #3275 [parquet]
  • Support casting from unsigned numeric to Decimal256 #3272 [arrow]
  • Support casting from Decimal256 to float types #3266 [arrow]
  • Make arithmetic kernels supports DictionaryArray of DecimalType #3254 [arrow]
  • Casting from Decimal256 to unsigned numeric #3239 [arrow]
  • precision is not considered when cast value to decimal #3223 [arrow]
  • Use RegexSet in arrow_csv::infer_field_schema #3211 [arrow]
  • Implement FlightSQL Client #3206 [arrow-flight]
  • Add binary_mut and try_binary_mut #3143 [arrow]
  • Add try_unary_mut #3133 [arrow]

Fixed bugs:

  • Skip null buffer when importing FFI ArrowArray struct if no null buffer in the spec #3290 [arrow]
  • using ahash compile-time-rng kills reproducible builds #3271 [parquet]
  • Decimal128 to Decimal256 Overflows #3265 [arrow]
  • nullif panics on empty array #3261 [arrow]
  • Some more inconsistency between can_cast_types and cast_with_options #3250 [arrow]
  • Enable casting between Dictionary of DecimalArray and DecimalArray #3237 [arrow]
  • new_null_array Panics creating StructArray with non-nullable fields #3226 [arrow]
  • bool should cast from/to Float16Type as can_cast_types returns true #3221 [arrow]
  • Utf8 and LargeUtf8 cannot cast from/to Float16 but can_cast_types returns true #3220 [arrow]
  • Re-enable some tests in arrow-cast crate #3219 [arrow]
  • Off-by-one buffer size error triggers Panic when constructing RecordBatch from IPC bytes should return an Error #3215 [arrow]
  • arrow to and from pyarrow conversion results in changes in schema #3136 [arrow]

Documentation updates:

  • better document when we need LargeUtf8 instead of Utf8 #3228 [arrow]

Merged pull requests:

28.0.0 (2022-11-25)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Add iterator to RowSelection #3172 [parquet]
  • create an integration test set for parquet crate against pyspark for working with bloom filters #3167 [parquet]
  • Row Format Size Tracking #3160 [arrow]
  • Add ArrayBuilder::finish_cloned() #3154 [arrow]
  • Optimize memory usage of json reader #3150
  • Add Field::size and DataType::size #3147 [parquet] [arrow]
  • Add like_utf8_scalar_dyn kernel #3145 [arrow]
  • support comparison for decimal128 array with scalar in kernel #3140 [arrow]
  • audit and create a document for bloom filter configurations #3138 [parquet]
  • Rounding vs. truncation when casting decimal to a smaller scale #3137 [arrow]
  • Upgrade chrono to 0.4.23 #3120
  • Implements more temporal kernels using time_fraction_dyn #3108 [arrow]
  • Upgrade to thrift 0.17 #3105 [parquet] [arrow]
  • Be able to parse time formatted strings #3100 [arrow]
  • Improve "Fail to merge schema" error messages #3095 [arrow]
  • Expose SortingColumn when reading and writing parquet metadata #3090 [parquet]
  • Change Field::metadata to HashMap #3086 [parquet] [arrow]
  • Support bloom filter reading and writing for parquet #3023 [parquet]
  • API to take back ownership of an ArrayRef #2901 [arrow]
  • Specialized Interleave Kernel #2864 [arrow]

Fixed bugs:

  • arithmetic overflow leads to segfault in concat_batches #3123 [arrow]
  • Clippy failing on master : error: use of deprecated associated function chrono::NaiveDate::from_ymd: use from_ymd_opt() instead #3097 [parquet] [arrow]
  • Pretty print for interval types has wrong formatting #3092 [arrow]
  • Field is not serializable with binary formats #3082 [arrow]
  • Decimal Casts are Unchecked #2986 [arrow]

Closed issues:

Merged pull requests:

27.0.0 (2022-11-11)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Row Format: Option to detach/own a row #3078 [arrow]
  • Row Format: API to check if datatypes are supported #3077 [arrow]
  • Deprecate Buffer::count_set_bits #3067 [arrow]
  • Add Decimal128 and Decimal256 to downcast_primitive #3055 [arrow]
  • Improved UX of creating TimestampNanosecondArray with timezones #3042 [arrow]
  • Cast decimal256 to signed integer #3039 [arrow]
  • Support casting Date64 to Timestamp #3037 [arrow]
  • Check overflow when casting floating point value to decimal256 #3032 [arrow]
  • Compare i256 in validate_decimal256_precision #3024 [arrow]
  • Check overflow when casting floating point value to decimal128 #3020 [arrow]
  • Add macro downcast_temporal_array #3008 [arrow]
  • Replace hour_generic with hour_dyn #3005 [arrow]
  • Replace temporal _generic kernels with dyn #3004 [arrow]
  • Add RowSelection::intersection #3003 [parquet]
  • I would like to round rather than truncate when casting f64 to decimal #2997 [arrow]
  • arrow::compute::kernels::temporal should support nanoseconds #2995 [arrow]
  • Release Arrow 26.0.0 (next release after `25.0.0`) #2953 [parquet] [arrow] [arrow-flight]
  • Add timezone offset for debug format of Timestamp with Timezone #2917 [arrow]
  • Support merge RowSelectors when creating RowSelection #2858 [parquet]

Fixed bugs:

  • Inconsistent Nan Handling Between Scalar and Non-Scalar Comparison Kernels #3074 [arrow]
  • Debug format for timestamp ignores timezone #3069 [arrow]
  • Row format decode loses timezone #3063 [arrow]
  • binary operator produces incorrect result on arrays with resized null buffer #3061 [arrow]
  • RLEDecoder Panics on Null Padded Pages #3035 [parquet]
  • Nullif with incorrect valid_count #3031 [arrow]
  • RLEDecoder::get_batch_with_dict may panic on bit-packed runs longer than 1024 #3029 [parquet]
  • Converted type is None according to Parquet Tools when utilizing logical types #3017
  • CompressionCodec LZ4 incompatible with C++ implementation #2988 [parquet]

Documentation updates:

Merged pull requests:

26.0.0 (2022-10-28)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Optimized way to count the numbers of true and false values in a BooleanArray #2963 [arrow]
  • Add pow to i256 #2954 [arrow]
  • Write Generic Code over [Large]BinaryArray and [Large]StringArray #2946 [arrow]
  • Add Page Row Count Limit #2941 [parquet]
  • prettyprint to show timezone offset for timestamp with timezone #2937 [arrow]
  • Cast numeric to decimal256 #2922 [arrow]
  • Add freeze_with_dictionary API to MutableArrayData #2914 [arrow]
  • Support decimal256 array in sort kernels #2911 [arrow]
  • support [+/-]hhmm and [+/-]hh as fixedoffset timezone format #2910 [arrow]
  • Cleanup decimal sort function #2907 [arrow]
  • replace from_timestamp by from_timestamp_opt #2892 [arrow]
  • Move Primitive arity kernels to arrow-array #2787 [arrow]
  • add overflow-checking for negative arithmetic kernel #2662 [arrow]

Fixed bugs:

  • Subtle compatibility issue with serve_arrow #2952
  • error[E0599]: no method named total_cmp found for struct f16 in the current scope #2926 [arrow]
  • Fail at rowSelection and_then method #2925 [parquet]
  • Ordering not implemented for FixedSizeBinary types #2904 [arrow]
  • Parquet API: Could not convert timestamp before unix epoch to string/json #2897 [parquet]
  • Overly Pessimistic RLE Size Estimation #2889 [parquet]
  • Memory alignment error in RawPtrBox::new #2882 [arrow]
  • Compilation error under chrono-tz feature #2878 [arrow]
  • AHash Statically Allocates 64 bytes #2875 [parquet]
  • parquet::arrow::arrow_writer::ArrowWriter ignores page size properties #2853 [parquet]

Documentation updates:

Closed issues:

  • SerializedFileWriter comments about multiple call on consumed self #2935 [parquet]
  • Pointer freed error when deallocating ArrayData with shared memory buffer #2874
  • Release Arrow 25.0.0 (next release after `24.0.0`) #2820 [parquet] [arrow] [arrow-flight]
  • Replace DecimalArray with PrimitiveArray #2637 [parquet] [arrow]

Merged pull requests:

25.0.0 (2022-10-14)

Full Changelog

Breaking changes:

Implemented enhancements:

Fixed bugs:

  • Don't try to infer nulls in CSV schema inference #2859 [arrow]
  • parquet::arrow::arrow_writer::ArrowWriter ignores page size properties #2853 [parquet]
  • Introducing ArrowNativeTypeOp made it impossible to call kernels from generics #2839 [arrow]
  • Unsound ArrayData to Array Conversions #2834 [parquet] [arrow]
  • Regression: the trait bound for<'de> arrow::datatypes::Schema: serde::de::Deserialize<'de> is not satisfied #2825 [arrow]
  • convert string to timestamp shouldn't apply local timezone offset if there's no explicit timezone info in the string #2813 [arrow]

Closed issues:

  • Add pub api for checking column index is sorted #2848 [parquet]

Merged pull requests:

24.0.0 (2022-09-30)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Include field name in Parquet PrimitiveTypeBuilder error messages #2804 [parquet]
  • Add PrimitiveArray::reinterpret_cast #2785
  • BinaryBuilder and StringBuilder initialization parameters in struct_builder may be wrong #2783 [arrow]
  • Add divide scalar dyn kernel which produces null for division by zero #2767 [arrow]
  • Add divide dyn kernel which produces null for division by zero #2763 [arrow]
  • Improve performance of checked kernels on non-null data #2747 [arrow]
  • Add overflow-checking variants of arithmetic dyn kernels #2739 [arrow]
  • The binary function should not panic on unequal array lengths. #2721 [arrow]

Fixed bugs:

  • min compute kernel is incorrect with sliced buffers in arrow 23 #2779 [arrow]
  • try_unary_dict should check value type of dictionary array #2754 [arrow]

Closed issues:

  • Add back JSON import/export for schema #2762
  • null casting and coercion for Decimal128 #2761
  • Json decoder behavior changed from versions 21 to 21 and returns non-sensical num_rows for RecordBatch #2722 [arrow]
  • Release Arrow 23.0.0 (next release after `22.0.0`) #2665 [parquet] [arrow] [arrow-flight]

Merged pull requests:

23.0.0 (2022-09-16)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Cleanup like and nlike utf8 kernels #2744 [arrow]
  • Speedup eq and neq kernels for utf8 arrays #2742 [arrow]
  • API for more ergonomic construction of RecordBatchOptions #2728 [arrow]
  • Automate updates to CHANGELOG-old.md #2726
  • Don't check the DivideByZero error for float modulus #2720 [arrow]
  • try_binary should not panic on unequal array lengths. #2715 [arrow]
  • Add benchmark for bitwise operation #2714 [arrow]
  • Add overflow-checking variants of arithmetic scalar dyn kernels #2712 [arrow]
  • Add divide_opt kernel which produces null values on division by zero error #2709 [arrow]
  • Add DataType function to detect nested types #2704 [arrow]
  • Add support of sorting dictionary of other primitive types #2700 [arrow]
  • Sort indices of dictionary string values #2697 [arrow]
  • Support empty projection in RecordBatch::project #2690 [arrow]
  • Support sorting dictionary encoded primitive integer arrays #2679 [arrow]
  • Use BitIndexIterator in min_max_helper #2674 [arrow]
  • Support building comparator for dictionaries of primitive integer values #2672 [arrow]
  • Change max/min string macro to generic helper function min_max_helper #2657 [arrow]
  • Add overflow-checking variant of arithmetic scalar kernels #2651 [arrow]
  • Compare dictionary with binary array #2644 [arrow]
  • Add overflow-checking variant for primitive arithmetic kernels #2642 [arrow]
  • Use downcast_primitive_array in arithmetic kernels #2639 [arrow]
  • Support DictionaryArray in temporal kernels #2622 [arrow]
  • Inline Generated Thrift Code Into Parquet Crate #2502 [parquet]

Fixed bugs:

  • Escape contains patterns for utf8 like kernels #2745 [arrow]
  • Float Array should not panic on DivideByZero in the Divide kernel #2719 [arrow]
  • DictionaryBuilders can Create Invalid DictionaryArrays #2684 [parquet] [arrow]
  • arrow crate does not build with features = ["ffi"] and default_features = false. #2670 [arrow]
  • Invalid results with RowSelector having row_count of 0 #2669 [parquet]
  • clippy error: unresolved import crate::array::layout #2659 [arrow]
  • Cast the numeric without the CastOptions #2648 [arrow]
  • Explicitly define overflow behavior for primitive arithmetic kernels #2641 [arrow]
  • update the flight.proto and fix schema to SchemaResult #2571 [arrow] [arrow-flight]
  • Panic when first data page is skipped using ColumnChunkData::Sparse #2543 [parquet]
  • SchemaResult in IPC deviates from other implementations #2445 [arrow] [arrow-flight]

Closed issues:

Merged pull requests:

22.0.0 (2022-09-02)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Add Macros to assist with static dispatch #2635 [arrow]
  • Support comparison between DictionaryArray and BooleanArray #2617 [arrow]
  • Use total_cmp for floating value ordering and remove nan_ordering feature flag #2613 [arrow]
  • Support empty projection in CSV, JSON readers #2603 [arrow]
  • Support SQL-compliant NaN ordering between DictionaryArray and non-DictionaryArray #2599 [arrow]
  • Add dyn_cmp_dict feature flag to gate dyn comparison of dictionary arrays #2596 [arrow]
  • Add max_dyn and min_dyn for max/min for dictionary array #2584 [arrow]
  • Allow FlightSQL implementers to extend do_get() #2581 [arrow-flight]
  • Support SQL-compliant behavior on eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn #2569 [arrow]
  • Add sql-compliant feature for enabling sql-compliant kernel behavior #2568
  • Calculate sum for dictionary array #2565 [arrow]
  • Add test for float nan comparison #2556 [arrow]
  • Compare dictionary with string array #2548 [arrow]
  • Compare dictionary with primitive array in lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn #2538 [arrow]
  • Compare dictionary with primitive array in eq_dyn and neq_dyn #2535 [arrow]
  • UnionBuilder Create Children With Capacity #2523 [arrow]
  • Speed up like_utf8_scalar for %pat% #2519 [arrow]
  • Replace macro with TypedDictionaryArray in comparison kernels #2513 [arrow]
  • Use same codebase for boolean kernels #2507 [arrow]
  • Use u8 for Decimal Precision and Scale #2496 [arrow]
  • Integrate skip row without pageIndex in SerializedPageReader in Fuzz Test #2475 [parquet]
  • Avoid unnecessary copies in Arrow IPC reader #2437 [arrow]
  • Add GenericColumnReader::skip_records Missing OffsetIndex Fallback #2433 [parquet]
  • Support Reading PageIndex with ParquetRecordBatchStream #2430 [parquet]
  • Specialize FixedLenByteArrayReader for Parquet #2318 [parquet]
  • Make JSON support Optional via Feature Flag #2300 [arrow]

Fixed bugs:

  • Casting timestamp array to string should not ignore timezone #2607 [arrow]
  • ilike_utf8_scalar kernels have incorrect logic #2544 [arrow]
  • Always validate the array data when creating array in IPC reader #2541 [arrow]
  • Int96Converter Truncates Timestamps #2480 [parquet]
  • Error Reading Page Index When Not Available #2434 [parquet]
  • ParquetFileArrowReader::get_record_reader[_by_column] batch_size overallocates #2321 [parquet]

Documentation updates:

  • Document All Arrow Features in docs.rs #2633 [arrow]

Closed issues:

  • Add support for CAST from Interval(DayTime) to Timestamp(Nanosecond, None) #2606 [arrow]
  • Why do we check for null in TypedDictionaryArray value function #2564 [arrow]
  • Add the length field for Buffer #2524 [arrow]
  • Avoid large over allocate buffer in async reader #2512 [parquet]
  • Rewriting Decimal Builders using const_generic. #2390 [arrow]
  • Rewrite Decimal Array using const_generic #2384 [arrow]

Merged pull requests:

21.0.0 (2022-08-18)

Full Changelog

Breaking changes:

Implemented enhancements:

  • add into_inner method to ArrowWriter #2491 [parquet]
  • Remove byteorder dependency #2472 [parquet]
  • Return Structured ColumnCloseResult from GenericColumnWriter::close #2465 [parquet]
  • Push ChunkReader into SerializedPageReader #2463 [parquet]
  • Support SerializedPageReader::skip_page without OffsetIndex #2459 [parquet]
  • Support Time64/Time32 comparison #2457 [arrow]
  • Revise FromIterator for Decimal128Array to use Into instead of Borrow #2441 [parquet]
  • Support RowFilter within ParquetRecordBatchReader #2431 [parquet]
  • Remove the field StructBuilder::len #2429 [arrow]
  • Standardize creation and configuration of parquet --> Arrow readers (`ParquetRecordBatchReaderBuilder`) #2427 [parquet]
  • Use OffsetIndex to Prune IO in ParquetRecordBatchStream #2426 [parquet]
  • Support peek_next_page and skip_next_page in InMemoryPageReader #2406 [parquet]
  • Support casting from Utf8/LargeUtf8 to Binary/LargeBinary #2402 [arrow]
  • Support casting between Decimal128 and Decimal256 arrays #2375 [arrow]
  • Combine multiple selections into the same batch size in skip_records #2358 [parquet]
  • Add API to change timezone for timestamp array #2346 [arrow]
  • Change the output of read_buffer Arrow IPC API to return Result<_> #2342 [arrow]
  • Allow skip_records in GenericColumnReader to skip across row groups #2331 [parquet]
  • Optimize the validation of Decimal256 #2320 [arrow]
  • Implement Skip for DeltaBitPackDecoder #2281 [parquet]
  • Changes to ParquetRecordBatchStream to support row filtering in DataFusion #2270 [parquet]
  • Add ArrayReader::skip_records API #2197 [parquet]

Fixed bugs:

  • Panic in SerializedPageReader without offset index #2503 [parquet]
  • MapArray columns don't handle null values correctly #2484 [arrow]
  • There is no compiler error when using an invalid Decimal type. #2440 [arrow]
  • Flight SQL Server sends incorrect response for DoPutUpdateResult #2403 [arrow-flight]
  • AsyncFileReader No Longer Object-Safe #2372 [parquet]
  • StructBuilder Does not Verify Child Lengths #2252 [arrow]

Closed issues:

Merged pull requests:

20.0.0 (2022-08-05)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Add the constant data type constructors for ListArray #2311 [arrow]
  • Update FlightSqlService trait to pass session info along #2308 [arrow-flight]
  • Optimize take_bits for non-null indices #2306 [arrow]
  • Make FFI support optional via Feature Flag ffi #2302 [arrow]
  • Mark ffi::ArrowArray::try_new as safe #2301 [arrow]
  • Remove test_utils from default arrow-rs features #2298 [arrow]
  • Remove JsonEqual trait #2296 [arrow]
  • Move with_precision_and_scale to Decimal array traits #2291 [arrow]
  • Improve readability and maybe performance of string --> numeric/time/date/timestamp cast kernels #2285 [arrow]
  • Add vectorized unpacking for 8, 16, and 64 bit integers #2276 [parquet]
  • Use initial capacity for interner hashmap #2273 [arrow]
  • Impl FromIterator for Decimal256Array #2248 [arrow]
  • Separate ArrayReader::next_batch with ArrayReader::read_records and ArrayReader::consume_batch #2236 [parquet]
  • Rename DataType::Decimal to DataType::Decimal128 #2228 [arrow]
  • Automatically Grow Parquet BitWriter Buffer #2226 [parquet]
  • Add append_option support to Decimal128Builder and Decimal256Builder #2224 [arrow]
  • Split the FixedSizeBinaryArray and FixedSizeListArray from array_binary.rs and array_list.rs #2217 [arrow]
  • Don't Box Values in PrimitiveDictionaryBuilder #2215 [arrow]
  • Use BitChunks in equal_bits #2186 [arrow]
  • Implement Hash for Schema #2182 [arrow]
  • read decimal data type from parquet file with binary physical type #2159 [parquet]
  • The GenericStringBuilder should use GenericBinaryBuilder #2156 [arrow]
  • Update Rust version to 1.62 #2143 [parquet] [arrow] [arrow-flight]
  • Check precision and scale against maximum value when constructing Decimal128 and Decimal256 #2139 [arrow]
  • Use ArrayAccessor in Decimal128Iter and Decimal256Iter #2138 [arrow]
  • Use ArrayAccessor and FromIterator in Cast Kernels #2137 [arrow]
  • Add TypedDictionaryArray for more ergonomic interaction with DictionaryArray #2136 [arrow]
  • Use ArrayAccessor in Comparison Kernels #2135 [arrow]
  • Support peek_next_page() and skip_next_page in InMemoryColumnChunkReader #2129 [parquet]
  • Lazily materialize the null buffer builder for all array builders. #2125 [arrow]
  • Do value validation for Decimal256 #2112 [arrow]
  • Support skip_def_levels for ColumnLevelDecoder #2107 [parquet]
  • Add integration test for scan rows with selection #2106 [parquet]
  • Support for casting from Utf8/String to Time32 / Time64 #2053 [arrow]
  • Update prost and tonic related crates #2268 [arrow-flight] (carols10cents)

Fixed bugs:

  • temporal conversion functions cannot work on negative input properly #2325 [arrow]
  • IPC writer should truncate string array with all empty string #2312 [arrow]
  • Error order for comparing Decimal128 or Decimal256 #2256 [arrow]
  • Fix maximum and minimum for decimal values for precision greater than 38 #2246 [arrow]
  • IntervalMonthDayNanoType::make_value() does not match C implementation #2234 [arrow]
  • FlightSqlService trait does not allow impls to do handshake #2210 [arrow-flight]
  • EnabledStatistics::None not working #2185 [parquet]
  • Boolean ArrayData Equality Incorrect Slice Handling #2184 [arrow]
  • Publicly export MapFieldNames #2118 [arrow]

Documentation updates:

  • Update instructions on How to join the slack #arrow-rust channel -- or maybe try to switch to discord?? #2192
  • Minor

Performance improvements:

Closed issues:

  • Fix wrong logic in calculate_row_count when skipping values #2328 [parquet]
  • Support filter for parquet data type #2126 [parquet]
  • Make skip value in ByteArrayDecoderDictionary avoid decoding #2088 [parquet]

Merged pull requests:

19.0.0 (2022-07-22)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Use total_cmp from std #2130 [arrow]
  • Permit parallel fetching of column chunks in ParquetRecordBatchStream #2110 [parquet]
  • The GenericBinaryBuilder should use buffer builders directly. #2104 [arrow]
  • Pass generate_decimal256_case arrow integration test #2093 [arrow]
  • Rename weekday and weekday0 kernels to num_days_from_monday and days_since_sunday #2065 [arrow]
  • Improve performance of filter_dict #2062 [arrow]
  • Improve performance of set_bits #2060 [arrow]
  • Lazily materialize the null buffer builder of BooleanBuilder #2058 [arrow]
  • BooleanArray::from_iter should omit validity buffer if all values are valid #2055 [arrow]
  • FFI_ArrowSchema should set DICTIONARY_ORDERED flag if a field's dictionary is ordered #2049 [arrow]
  • Support peek_next_page() and skip_next_page in SerializedPageReader #2043 [parquet]
  • Support FFI / C Data Interface for MapType #2037 [arrow]
  • The DecimalArrayBuilder should use FixedSizeBinaryBuilder #2026 [arrow]
  • Enable serialized_reader read specific Page by passing row ranges. #1976 [parquet]

Fixed bugs:

  • type_id and value_offset are incorrect for sliced UnionArray #2086 [arrow]
  • Boolean take kernel does not handle null indices correctly #2057 [arrow]
  • Don't double-count nulls in write_batch_with_statistics #2046 [parquet]
  • Parquet Writer Ignores Statistics specification in WriterProperties #2014 [parquet]

Documentation updates:

  • Improve docstrings + examples for as_primitive_array cast functions #2114 [arrow] (alamb)

Closed issues:

  • Why does serde_json specify the preserve_order feature in arrow package #2095 [arrow]
  • Support skip_values in DictionaryDecoder #2079 [parquet]
  • Support skip_values in ColumnValueDecoderImpl #2078 [parquet]
  • Support skip_values in ByteArrayColumnValueDecoder #2072 [parquet]
  • Several Builder::append methods returning results even though they are infallible #2071
  • Improve formatting of logical plans containing subqueries #2059
  • Return reference from UnionArray::child #2035
  • support write page index #1777 [parquet]

Merged pull requests:

18.0.0 (2022-07-08)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Add DataType::Dictionary support to subtract_scalar, multiply_scalar, divide_scalar #2019 [arrow]
  • Support DictionaryArray in add_scalar kernel #2017 [arrow]
  • Enable column page index read test for all types #2010 [parquet]
  • Simplify FixedSizeBinaryBuilder #2007 [arrow]
  • Support Decimal256Builder and Decimal256Array #1999 [arrow]
  • Support DictionaryArray in unary kernel #1989 [arrow]
  • Add kernel to quickly compute comparisons on Arrays #1987 [arrow]
  • Support DictionaryArray in divide kernel #1982 [arrow]
  • Implement Into<ArrayData> for T: Array #1979 [arrow]
  • Support DictionaryArray in multiply kernel #1972 [arrow]
  • Support DictionaryArray in subtract kernel #1970 [arrow]
  • Declare DecimalArray::length as a constant #1967 [arrow]
  • Support DictionaryArray in add kernel #1950 [arrow]
  • Add builder style methods to Field #1934 [arrow]
  • Make StringDictionaryBuilder faster #1851 [arrow]
  • concat_elements_utf8 should accept arbitrary number of input arrays #1748 [arrow]

Fixed bugs:

  • Array reader for list columns fails to decode if batches fall on row group boundaries #2025 [parquet]
  • ColumnWriterImpl::write_batch_with_statistics incorrect distinct count in statistics #2016 [parquet]
  • ColumnWriterImpl::write_batch_with_statistics can write incorrect page statistics #2015 [parquet]
  • RowFormatter is not part of the public api #2008 [parquet]
  • Infinite Loop possible in ColumnReader::read_batch For Corrupted Files #1997 [parquet]
  • PrimitiveBuilder::finish_dict does not validate dictionary offsets #1978 [arrow]
  • Incorrect n_buffers in FFI_ArrowArray #1959 [arrow]
  • DecimalArray::from_fixed_size_list_array fails when offset > 0 #1958 [arrow]
  • Incorrect but ignored metadata written after ColumnChunk #1946 [parquet]
  • Send + Sync impl for Allocation may not be sound unless Allocation is Send + Sync as well #1944 [arrow]
  • Disallow cast from other datatypes to NullType #1923 [arrow]

Documentation updates:

  • The doc of FixedSizeListArray::value_length is incorrect. #1908 [arrow]

Closed issues:

  • Column chunk statistics of min_bytes and max_bytes return wrong size #2021 [parquet]
  • Discussion
  • Move DecimalArray to a new file #1985 [arrow]
  • Support DictionaryArray in multiply kernel #1974
  • close function instead of mutable reference #1969 [parquet]
  • Incorrect null_count of DictionaryArray #1962 [arrow]
  • Support multi diskRanges for ChunkReader #1955 [parquet]
  • Persisting Arrow timestamps with Parquet produces missing TIMESTAMP in schema #1920 [parquet]
  • Separate get_next_page_header from get_next_page in PageReader #1834 [parquet]

Merged pull requests:

17.0.0 (2022-06-24)

Full Changelog

Breaking changes:

Implemented enhancements:

  • add a small doc example showing ArrowWriter being used with a cursor #1927 [parquet]
  • Support cast to/from NULL and DataType::Decimal #1921 [arrow]
  • Add Decimal256 API #1913 [arrow]
  • Add DictionaryArray::key function #1911 [arrow]
  • Support specifying capacities for ListArrays in MutableArrayData #1884 [arrow]
  • Explicitly declare the features used for each dependency #1876 [parquet] [arrow] [arrow-flight]
  • Add Decimal128 API and use it in DecimalArray and DecimalBuilder #1870 [arrow]
  • PrimitiveArray::from_iter should omit validity buffer if all values are valid #1856 [arrow]
  • Add from(v: Vec<Option<&[u8]>>) and from(v: Vec<&[u8]>) for FixedSizeBinaryArray #1852 [arrow]
  • Add Vec-inspired APIs to BufferBuilder #1850 [arrow]
  • PyArrow integration test for C Stream Interface #1847 [arrow]
  • Add nilike support in comparison #1845 [arrow]
  • Split up arrow::array::builder module #1843 [arrow]
  • Add quarter support in temporal kernels #1835 [arrow]
  • Rename ArrayData::validate_dictionary_offset to ArrayData::validate_values #1812 [arrow]
  • Clean up the testing code for substring kernel #1801 [arrow]
  • Speed up substring_by_char kernel #1800 [arrow]

Fixed bugs:

  • unable to write parquet file with UTC timestamp #1932 [parquet]
  • Incorrect max and min decimals #1916 [arrow]
  • dynamic_types example does not print the projection #1902 [arrow]
  • log2(0) panicked at 'attempt to subtract with overflow', parquet/src/util/bit_util.rs:148:5 #1901 [parquet]
  • Final slicing in combine_option_bitmap needs to use bit slices #1899 [arrow]
  • Dictionary IPC writer writes incorrect schema #1892 [arrow]
  • Creating a RecordBatch with null values in non-nullable fields does not cause an error #1888 [arrow]
  • Upgrade regex dependency #1874 [arrow]
  • Miri reports leaks in ffi tests #1872 [arrow]
  • AVX512 + simd binary and/or kernels slower than autovectorized version #1829 [arrow]

Documentation updates:

  • Blog post about arrow 10.0.0 - 16.0.0 #1808
  • Add README for the compute module. #1940 [arrow] (HaoYang670)
  • minor: clarify docstring on DictionaryArray::lookup_key #1910 [arrow] (alamb)
  • minor: add a diagram to docstring for DictionaryArray #1909 [arrow] (alamb)
  • Closes #1902: Print the original and projected RecordBatch in dynamic_types example #1903 [arrow] (martin-g)

Closed issues:

Merged pull requests:

16.0.0 (2022-06-10)

Full Changelog

Breaking changes:

Implemented enhancements:

  • List equality method should work on empty offset ListArray #1817 [arrow]
  • Command line tool to convert CSV to Parquet #1797 [parquet]
  • IPC writer should write validity buffer for UnionArray in V4 IPC message #1793 [arrow]
  • Add function for row alignment with page mask #1790 [parquet]
  • Rust IPC Read should be able to read V4 UnionType Array #1788 [arrow]
  • combine_option_bitmap should accept arbitrary number of input arrays. #1780 [arrow]
  • Add substring_by_char kernels for slicing on character boundaries #1768 [arrow]
  • Support reading PageIndex from column metadata #1761 [parquet]
  • Support casting from DataType::Utf8 to DataType::Boolean #1740 [arrow]
  • Make current position available in FileWriter. #1691 [parquet]
  • Support writing parquet to stdout #1687 [parquet]

Fixed bugs:

  • Incorrect Offset Validation for Sliced List Array Children #1814 [arrow]
  • Parquet Snappy Codec overwrites Existing Data in Decompression Buffer #1806 [parquet]
  • flight_data_to_arrow_batch does not support RecordBatches with no columns #1783 [arrow-flight]
  • parquet does not compile with features=["zstd"] #1630 [parquet]

Documentation updates:

Closed issues:

Merged pull requests:

15.0.0 (2022-05-27)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Rename the string kernel to concatenate_elements #1747 [arrow]
  • ArrayDataBuilder::null_bit_buffer should accept Option<Buffer> as input type #1737 [arrow]
  • Fix schema comparison for non_canonical_map when running flight test #1730 [arrow]
  • Add support in aggregate kernel for BinaryArray #1724 [arrow]
  • Fix incorrect null_count in generate_unions_case integration test #1712 [arrow]
  • Keep type ids in Union datatype to follow Arrow spec and integrate with other implementations #1690 [arrow]
  • Support Reading Alternative List Representations to Arrow From Parquet #1680 [parquet]
  • Speed up the offsets checking #1675 [arrow]
  • Separate Parquet -> Arrow Schema Conversion From ArrayBuilder #1655 [parquet]
  • Add leaf_columns argument to ArrowReader::get_record_reader_by_columns #1653 [parquet]
  • Implement string_concat kernel #1540 [arrow]
  • Improve Unit Test Coverage of ArrayReaderBuilder #1484 [parquet]

Fixed bugs:

  • Parquet write failure from record batches when data is nested two levels deep #1744 [parquet]
  • IPC reader may break on projection #1735 [arrow]
  • Latest nightly fails to build with feature simd #1734 [arrow]
  • Trying to write parquet file in parallel results in corrupt file #1717 [parquet]
  • Roundtrip failure when using DELTA_BINARY_PACKED #1708 [parquet]
  • ArrayData::try_new cannot always return expected error. #1707 [arrow]
  • "out of order projection is not supported" after Fix Parquet Arrow Schema Inference #1701 [parquet]
  • Rust is not interoperable with C++ for IPC schemas with dictionaries #1694 [arrow]
  • Incorrect Repeated Field Schema Inference #1681 [parquet]
  • Parquet Treats Embedded Arrow Schema as Authoritative #1663 [parquet]
  • parquet_to_arrow_schema_by_columns Incorrectly Handles Nested Types #1654 [parquet]
  • Inconsistent Arrow Schema When Projecting Nested Parquet File #1652 [parquet]
  • StructArrayReader Cannot Handle Nested Lists #1651 [parquet]
  • Bug in `substring` kernel: The null buffer is not aligned when offset != 0 #1639 [arrow]

Documentation updates:

  • Parquet command line tool does not install "globally" #1710 [parquet]
  • Improve integration test document to follow Arrow C++ repo CI #1742 [arrow] (viirya)

Merged pull requests:

14.0.0 (2022-05-13)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Add support for DataType::Duration in ffi interface #1688 [arrow]
  • Fix generate_unions_case integration test #1676 [arrow]
  • Add DictionaryArray support for bit_length kernel #1673 [arrow]
  • Add DictionaryArray support for length kernel #1672 [arrow]
  • flight_client_scenarios integration test should receive schema from flight data #1669 [arrow]
  • Unpin Flatbuffer version dependency #1667 [arrow]
  • Add dictionary array support for substring function #1656 [arrow]
  • Exclude dict_id and dict_is_ordered from equality comparison of Field #1646 [arrow]
  • Remove StringOffsetTrait and BinaryOffsetTrait #1644 [arrow]
  • Add tests and examples for UnionArray::from(data: ArrayData) #1643 [arrow]
  • Add methods pub fn offsets_buffer, pub fn types_ids_buffer and pub fn data_buffer for ArrayDataBuilder #1640 [arrow]
  • Fix generate_nested_dictionary_case integration test failure for Rust cases #1635 [arrow]
  • Expose ArrowWriter row group flush in public API #1626 [parquet]
  • Add substring support for FixedSizeBinaryArray #1618 [arrow]
  • Add PrettyPrint for UnionArrays #1594 [arrow]
  • Add SIMD support for the length kernel #1489 [arrow]
  • Support dictionary arrays in length and bit_length #1674 [arrow] (viirya)
  • Add dictionary array support for substring function #1665 [arrow] (sunchao)
  • Add DecimalType support in new_null_array #1659 [arrow] (yjshen)

Fixed bugs:

  • Docs.rs build is broken #1695
  • Interoperability with C++ for IPC schemas with dictionaries #1694
  • UnionArray::is_null incorrect #1625 [arrow]
  • Published Parquet documentation missing arrow::async_reader #1617 [parquet]
  • Files written with Julia's Arrow.jl in IPC format cannot be read by arrow-rs #1335 [arrow]

Documentation updates:

Closed issues:

  • Make OffsetSizeTrait::IS_LARGE as a const value #1658
  • Question: Why are there 3 types of OffsetSizeTraits? #1638
  • Written Parquet file way bigger than input files #1627
  • Ensure there is a single zero in the offsets buffer for an empty ListArray. #1620
  • Filtering UnionArray Changes DataType #1595

Merged pull requests:

13.0.0 (2022-04-29)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Read/write nested dictionary under fixed size list in ipc stream reader/writer #1609 [arrow]
  • Add support for BinaryArray in substring kernel #1593 [arrow]
  • Read/write nested dictionary under large list in ipc stream reader/writer #1584 [arrow]
  • Read/write nested dictionary under map in ipc stream reader/writer #1582 [arrow]
  • Implement Clone for JSON DecoderOptions #1580 [arrow]
  • Add utf-8 validation checking to substring kernel #1575 [arrow]
  • Support casting to/from DataType::Null in cast kernel #1572 [arrow] (WinkerDu)

Fixed bugs:

  • Parquet schema should allow scale == precision for decimal type #1606 [parquet]
  • ListArray::from(ArrayData) dereferences invalid pointer when offsets are empty #1601 [arrow]
  • ArrayData Equality Incorrect Null Mask Offset Handling #1599
  • Filtering UnionArray Incorrectly Handles Runs #1598
  • Safety
  • Safety
  • Union Layout Should Not Support Separate Validity Mask #1590
  • Incorrect nullable flag when reading maps (test_read_maps fails when force_validate is active) #1587 [parquet]
  • Output of ipc::reader::tests::projection_should_work fails validation #1548 [arrow]
  • Incorrect min/max statistics for decimals with byte-array notation #1532

Documentation updates:

Closed issues:

  • Dense UnionArray Offsets Are i32 not i8 #1597 [arrow]
  • Replace &Option<T> with Option<&T> in some APIs #1556 [parquet] [arrow]
  • Improve ergonomics of parquet::basic::LogicalType #1554 [parquet]
  • Mark the current substring function as unsafe and rename it. #1541 [arrow]
  • Requirements for Async Parquet API #1473 [parquet]

Merged pull requests:

12.0.0 (2022-04-15)

Full Changelog

Breaking changes:

  • Add ArrowReaderOptions to ParquetFileArrowReader, add option to skip decoding arrow metadata from parquet (#1459) #1558 [parquet] (tustvold)
  • Support RecordBatch with zero columns but non zero row count, add field to RecordBatchOptions (#1536) #1552 [arrow] (tustvold)
  • Consolidate JSON Reader options and DecoderOptions #1539 [arrow] (alamb)
  • Update prost, prost-derive and prost-types to 0.10, tonic, and tonic-build to 0.7 #1510 [arrow-flight] (alamb)
  • Add Json DecoderOptions and support custom format_string for each field #1451 [arrow] (sum12)

Implemented enhancements:

  • Read/write nested dictionary in ipc stream reader/writer #1565 [arrow]
  • Support FixedSizeBinary in the Arrow C data interface #1553 [arrow]
  • Support Empty Column Projection in ParquetRecordBatchReader #1537 [parquet]
  • Support RecordBatch with zero columns but non zero row count #1536 [arrow]
  • Add support for Date32/Date64 <--> String/LargeString in cast kernel #1535 [arrow]
  • Support creating arrays from externally owned memory like Vec or String #1516 [arrow]
  • Speed up the substring kernel #1511 [arrow]
  • Handle Parquet Files With Inconsistent Timestamp Units #1459 [parquet]

Fixed bugs:

  • Error Inferring Schema for LogicalType::UNKNOWN #1557 [parquet]
  • Read dictionary from nested struct in ipc stream reader panics #1549 [arrow]
  • filter produces invalid sparse UnionArrays #1547 [arrow]
  • Documentation for GenericListBuilder is not exposed. #1518 [arrow]
  • cannot read parquet file #1515 [parquet]
  • The substring kernel panics when chars > U+0x007F #1478 [arrow]
  • Hang due to infinite loop when reading some parquet files with RLE encoding and bit packing #1458 [parquet]

Documentation updates:

Closed issues:

  • Interesting benchmark results of min_max_helper #1400

Merged pull requests:

11.1.0 (2022-03-31)

Full Changelog

Implemented enhancements:

  • Implement size_hint and ExactSizedIterator for DecimalArray #1505 [arrow]
  • Support calculating length by chars for StringArray #1493 [arrow]
  • Add length kernel support for ListArray #1470 [arrow]
  • The length kernel should work with BinaryArrays #1464 [arrow]
  • FFI for Arrow C Stream Interface #1348 [arrow]
  • Improve performance of DictionaryArray::try_new() #1313 [arrow]

Fixed bugs:

  • MIRI error in math_checked_divide_op/try_from_trusted_len_iter #1496 [arrow]
  • Parquet Writer Incorrect Definition Levels for Nested NullArray #1480 [parquet]
  • FFI: ArrowArray::try_from_raw shouldn't clone #1425 [arrow]
  • Parquet reader fails to read null list. #1399 [parquet]

Documentation updates:

  • A small mistake in the doc of BinaryArray and LargeBinaryArray #1455 [arrow]
  • A small mistake in the doc of GenericBinaryArray::take_iter_unchecked #1454 [arrow]
  • Add links in the doc of BinaryOffsetSizeTrait #1453 [arrow]
  • The doc of FixedSizeBinaryArray is confusing. #1452 [arrow]
  • Clarify docs that SlicesIterator ignores null values #1504 [arrow] (alamb)
  • Update the doc of BinaryArray and LargeBinaryArray #1471 [arrow] (HaoYang670)

Closed issues:

  • packed_simd v.s. portable_simd, which should be used? #1492
  • Cleanup: Use Arrow take kernel Within parquet ListArrayReader #1482 [parquet]

Merged pull requests:

11.0.0 (2022-03-17)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Fix generate_interval_case integration test failure #1445
  • Make the doc examples of ListArray and LargeListArray more readable #1433
  • Redundant if and abs in shift() #1427
  • Improve substring kernel performance #1422 [arrow]
  • Add missing value_unchecked() of FixedSizeBinaryArray #1419
  • Remove duplicate bound check in function shift #1408
  • Support dictionary array in C data interface #1397
  • filter kernel should work with UnionArrays #1394 [arrow]
  • filter kernel should work with FixedSizeListArrays #1393 [arrow]
  • Add doc examples for creating FixedSizeListArray #1392 [arrow]
  • Update rust-version to 1.59 #1377
  • Arrow IPC projection support #1338
  • Implement basic FlightSQL Server #1386 [arrow-flight] (wangfenjin)

Fixed bugs:

  • DictionaryArray::try_new ignores validity bitmap of the keys #1429 [arrow]
  • The doc of GenericListArray is confusing #1424
  • DeltaBitPackDecoder Incorrectly Handles Non-Zero MiniBlock Bit Width Padding #1417 [parquet]
  • DeltaBitPackEncoder Pads Miniblock BitWidths With Arbitrary Values #1416 [parquet]
  • Possible unaligned write with MutableBuffer::push #1410 [arrow]
  • Integration Test is failing on master branch #1398 [arrow]

Documentation updates:

Merged pull requests:

10.0.0 (2022-03-04)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Add extract month and day in temporal.rs #1387
  • Add clone to IpcWriteOptions #1381 [arrow]
  • Support MapArray in filter kernel #1378 [arrow]
  • Add week temporal kernel #1375 [arrow]
  • Improve performance of compare_dict_op #1371 [arrow]
  • Add support for LargeUtf8 in json writer #1357 [parquet]
  • Make arrow::array::builder::MapBuilder public #1354 [arrow]
  • Refactor StructArray::from #1351 [arrow]
  • Refactor RecordBatch::validate_new_batch #1350 [arrow]
  • Remove redundant has_ methods for optional column metadata fields #1344 [parquet]
  • Add write method to JsonWriter #1340 [arrow]
  • Refactor the code of Bitmap::new #1337 [arrow]
  • Use DictionaryArray's iterator in compare_dict_op #1329 [arrow]
  • Add as_decimal_array(arr: &dyn Array) -> &DecimalArray #1312 [arrow]
  • More ergonomic / idiomatic primitive array creation from iterators #1298 [arrow]
  • Implement DictionaryArray support in eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn #1201 [arrow]

Fixed bugs:

  • cargo clippy fails on the master branch #1362 [arrow]
  • ArrowArray::try_from_raw should not assume the pointers are from Arc #1333 [arrow]
  • Fix CSV Writer::new to accept delimiter and make WriterBuilder::build use it #1328 [arrow]
  • Make bounds configurable via builder when reading CSV #1327 [arrow]
  • Add with_datetime_format() to CSV WriterBuilder #1272 [arrow]

Performance improvements:

  • Improve performance of min and max aggregation kernels without nulls #1373 [arrow]

Closed issues:

  • Consider removing redundant has_XXX metadata functions in ColumnChunkMetadata #1332

Merged pull requests:

9.1.0 (2022-02-19)

Full Changelog

Implemented enhancements:

Fixed bugs:

  • len is not a parameter of MutableArrayData::extend #1316
  • module data_type is private in Rust Parquet 8.0.0 #1302 [parquet]
  • Test failure: bit_chunk_iterator #1294
  • csv_writer benchmark fails with "no such file or directory" #1292

Documentation updates:

Performance improvements:

Closed issues:

  • Expose column and offset index metadata offset #1317
  • Expose bloom filter metadata offset #1308
  • Improve ergonomics to construct DictionaryArrays from Key and Value arrays #1299
  • Make it easier to iterate over DictionaryArray #1295 [arrow]
  • (WON'T FIX) Don't Intertwine Bit and Byte Aligned Operations in BitReader #1282
  • how to create arrow::array from streamReader #1278
  • Remove scientific notation when converting floats to strings. #983

Merged pull requests:

9.0.2 (2022-02-09)

Full Changelog

Breaking changes:

  • Add Send + Sync to DataType, RowGroupReader, FileReader, ChunkReader. #1264
  • Rename the function Bitmap::len to Bitmap::bit_len to clarify its meaning #1242 [parquet] [arrow] (HaoYang670)
  • Remove unused / broken memory-check feature #1222 [arrow] (jhorstmann)
  • Potentially buffer multiple RecordBatches before writing a parquet row group in ArrowWriter #1214 [parquet] [arrow] (tustvold)

Implemented enhancements:

  • Add async arrow parquet reader #1154 [parquet] [arrow] (tustvold)
  • Rename Bitmap::len to Bitmap::bit_len #1233
  • Extend CSV schema inference to allow scientific notation for floating point types #1215 [arrow]
  • Write Multiple RecordBatch to Parquet Row Group #1211
  • Add doc examples for eq_dyn etc. #1202 [arrow]
  • Add comparison kernels for BinaryArray #1108
  • impl ArrowNativeType for i128 #1098
  • Remove Copy trait bound from dyn scalar kernels #1243 [arrow] (matthewmturner)
  • Add into_inner for IPC FileWriter #1236 [arrow] (yjshen)
  • Minor

Fixed bugs:

  • Parquet v8.0.0 panics when reading all null column to NullArray #1245 [parquet]
  • Get Unknown configuration option rust-version when running the rust format command #1240
  • Bitmap Length Validation is Incorrect #1231 [arrow]
  • Writing sliced ListArray or MapArray ignore offsets #1226 [parquet]
  • Remove broken memory-tracking crate feature #1171
  • Revert making parquet::data_type and parquet::arrow::schema experimental #1244 [parquet] (tustvold)

Documentation updates:

Performance improvements:

  • Improve performance for arithmetic kernels with simd feature enabled except for division/modulo #1221 [arrow] (jhorstmann)
  • Do not concatenate identical dictionaries #1219 [arrow] (tustvold)
  • Preserve dictionary encoding when decoding parquet into Arrow arrays, 60x perf improvement (#171) #1180 [parquet] (tustvold)

Closed issues:

  • UnalignedBitChunkIterator that iterates through already aligned u64 blocks #1227
  • Remove unused ArrowArrayReader in parquet #1197 [parquet]

Merged pull requests:

8.0.0 (2022-01-20)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Parquet reader should be able to read structs within list #1186 [parquet]
  • Disable serde_json arbitrary_precision feature flag #1174 [arrow]
  • Simplify and reduce code duplication in arithmetic.rs #1160 [arrow]
  • Return Err from JSON writer rather than panic! for unsupported types #1157 [arrow]
  • Support scalar mathematics kernels for Array and scalar value #1153 [arrow]
  • Support DecimalArray in sort kernel #1137
  • Parquet Fuzz Tests #1053
  • BooleanBufferBuilder Append Packed #1038 [arrow]
  • parquet Performance Optimization: StructArrayReader Redundant Level & Bitmap Computation #1034 [parquet]
  • Reduce Public Parquet API #1032 [parquet]
  • Add from_iter_values for binary array #1188 [arrow] (Jimexist)
  • Add support for MapArray in json writer #1149 [arrow] (helgikrs)

Fixed bugs:

  • Empty string arrays with no nulls are not equal #1208 [arrow]
  • Pretty print a RecordBatch containing Float16 triggers a panic #1193 [arrow]
  • Writing structs nested in lists produces an incorrect output #1184 [parquet]
  • Undefined behavior for GenericStringArray::from_iter_values if reported iterator upper bound is incorrect #1144 [arrow]
  • Interval comparisons with simd feature asserts #1136 [arrow]
  • RecordReader Permits Illegal Types #1132 [parquet]

Security fixes:

Documentation updates:

Performance improvements:

  • Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037) #1054 [parquet] [arrow] (tustvold)
  • Improve parquet performance: Skip levels computation for required struct arrays in parquet #1035 [parquet] (tustvold)

Closed issues:

  • Generify ColumnReaderImpl and RecordReader #1040 [parquet]
  • Parquet Preserve BitMask #1037

Merged pull requests:

7.0.0 (2022-01-07)

Full Changelog

Arrow

Breaking changes:

  • pretty_format_batches now returns Result<impl Display> rather than String: #975
  • MutableBuffer::typed_data_mut is marked unsafe: #1029
  • UnionArray updated to match the latest Arrow spec, added UnionMode, UnionArray::new() marked unsafe: #885

New Features:

  • Support for Float16Array types #888
  • IPC support for UnionArray #654
  • Dynamic comparison kernels for scalars (e.g. eq_dyn_scalar), including DictionaryArray: #1113

Enhancements:

  • Added Schema::with_metadata and Field::with_metadata #1092
  • Support for custom datetime format for inference and parsing csv files #1112
  • Implement Array for ArrayRef for easier use #1129
  • Pretty printing display support for FixedSizeBinaryArray #1097
  • Dependency Upgrades: pyo3, parquet-format, prost, tonic
  • Avoid allocating vector of indices in lexicographical_partition_ranges #998

Parquet

Fixed bugs:

  • (parquet) Fix reading of dictionary encoded pages with null values: #1130

Changelog

6.5.0 (2021-12-23)

Full Changelog

6.4.0 (2021-12-10)

Full Changelog

6.3.0 (2021-11-26)

Full Changelog

Changes:

6.2.0 (2021-11-12)

Full Changelog

Features / Fixes:

6.1.0 (2021-10-29)

Full Changelog

Features / Fixes:

Other:

6.0.0 (2021-10-13)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Improve parquet binary writer speed by reducing allocations #819
  • Expose buffer operations #808
  • Add doc examples of writing parquet files using ArrowWriter #788

Fixed bugs:

  • JSON reader can create null struct children on empty lists #825
  • Incorrect null count for cast kernel for list arrays #815
  • minute and second temporal kernels do not respect timezone #500
  • Fix data corruption in json decoder f64-to-i64 cast #652 [arrow] (xianwill)

Documentation updates:

5.5.0 (2021-09-24)

Full Changelog

Implemented enhancements:

  • parquet should depend on a small set of arrow features #800
  • Support equality on RecordBatch #735

Fixed bugs:

  • Converting from string to timestamp uses microseconds instead of milliseconds #780
  • Document has no link to RowColumnIter #762
  • length on slices with null doesn't work #744

5.4.0 (2021-09-10)

Full Changelog

Implemented enhancements:

  • Upgrade lexical-core to 0.8 #747
  • append_nulls and append_trusted_len_iter for PrimitiveBuilder #725
  • Optimize MutableArrayData::extend for null buffers #397

Fixed bugs:

  • Arithmetic with scalars doesn't work on slices #742
  • Comparisons with scalar don't work on slices #740
  • unary kernel doesn't respect offset #738
  • new_null_array creates invalid struct arrays #734
  • --no-default-features is broken for parquet #733 [parquet]
  • Bitmap::len returns the number of bytes, not bits. #730
  • Decimal logical type is formatted incorrectly by print_schema #713
  • parquet_derive does not support chrono time values #711
  • Numeric overflow when formatting Decimal type #710
  • The integration tests are not running #690

Closed issues:

  • Question: Is there no way to create a DictionaryArray with a pre-arranged mapping? #729

5.3.0 (2021-08-26)

Full Changelog

Implemented enhancements:

  • Add optimized filter kernel for regular expression matching #697
  • Can't cast from timestamp array to string array #587

Fixed bugs:

  • 'Encoding DELTA_BYTE_ARRAY is not supported' with parquet arrow readers #708
  • Support reading json string into binary data type. #701

Closed issues:

  • Resolve Issues with prettytable-rs dependency #69 [arrow]

5.2.0 (2021-08-12)

Full Changelog

Implemented enhancements:

  • Make rand an optional dependency #671
  • Remove undefined behavior in value method of boolean and primitive arrays #645
  • Avoid materialization of indices in filter_record_batch for single arrays #636
  • Add a note about arrow crate security / safety #627
  • Allow the creation of String arrays from an iterator of &Option<&str> #598
  • Support arrow map datatype #395

Fixed bugs:

  • Parquet fixed length byte array columns write byte array statistics #660 [parquet]
  • Parquet boolean columns write Int32 statistics #659 [parquet]
  • Writing Parquet with a boolean column fails #657
  • JSON decoder data corruption for large i64/u64 #653
  • Incorrect min/max statistics for strings in parquet files #641 [parquet]

Closed issues:

  • Release candidate verifying script seems work on macOS #640
  • Update CONTRIBUTING #342

5.1.0 (2021-07-29)

Full Changelog

Implemented enhancements:

  • Make FFI_ArrowArray empty() public #602
  • exponential sort can be used to speed up lexico partition kernel #586
  • Implement sort() for binary array #568
  • primitive sorting can be improved and more consistent with and without limit if sorted unstably #553

Fixed bugs:

  • Confusing memory usage with CSV reader #623
  • FFI implementation deviates from specification for array release #595
  • Parquet file content is different if ~/.cargo is in a git checkout #589
  • Ensure output of MIRI is checked for success #581
  • MIRI failure in array::ffi::tests::test_struct and other ffi tests #580
  • ListArray equality check may return wrong result #570
  • cargo audit failed #561
  • ArrayData::slice() does not work for nested types such as StructArray #554

Documentation updates:

  • More examples of how to construct Arrays #301

Closed issues:

  • Implement StringBuilder::append_option #263 [arrow]

5.0.0 (2021-07-14)

Full Changelog

Breaking changes:

Implemented enhancements:

Fixed bugs:

  • Error building on master - error: cyclic package dependency: package ahash v0.7.4 depends on itself. Cycle #544
  • IPC reader panics with out of bounds error #541
  • Take kernel doesn't handle nulls and structs correctly #530 [arrow]
  • master fails to compile with default-features=false #529
  • README developer instructions out of date #523
  • Update rustc and packed_simd in CI before 5.0 release #517
  • Incorrect memory usage calculation for dictionary arrays #503 [arrow]
  • sliced null buffers lead to incorrect result in take kernel and probably on other places #502
  • Cast of utf8 types and list container types don't respect offset #334 [arrow]
  • fix take kernel null handling on structs #531 [arrow] (bjchambers)
  • Correct array memory usage calculation for dictionary arrays #505 [arrow] (jhorstmann)
  • parquet: improve BOOLEAN writing logic and report error on encoding fail #443 [parquet] (garyanaplan)
  • Fix bug with null buffer offset in boolean not kernel #418 [arrow] (jhorstmann)
  • respect offset in utf8 and list casts #335 [arrow] (ritchie46)
  • Fix comparison of dictionaries with different values arrays (#332) #333 [arrow] (tustvold)
  • ensure null-counts are written for all-null columns #307 [parquet] (crepererum)
  • fix invalid null handling in filter #296 [arrow] (ritchie46)
  • fix NaN handling in parquet statistics #256 (crepererum)

Documentation updates:

Merged pull requests:

4.4.0 (2021-06-24)

Full Changelog

Breaking changes:

  • migrate partition kernel to use Iterator trait #437 [arrow]
  • Remove DictionaryArray::keys_array #391 [arrow]

Implemented enhancements:

  • sort kernel boolean sort can be O(n) #447 [arrow]
  • C data interface for decimal128, timestamp, date32 and date64 #413
  • Add Decimal to CsvWriter #405
  • Use iterators to increase performance of creating Arrow arrays #200 [parquet]

Fixed bugs:

  • Release Audit Tool RAT is not being triggered #481
  • Security Vulnerabilities: flatbuffers: read_scalar and read_scalar_at allow transmuting values without unsafe blocks #476
  • Clippy broken after upgrade to rust 1.53 #467
  • Pull Request Labeler is not working #462
  • Arrow 4.3 release: error[E0658]: use of unstable library feature 'partition_point': new API #456
  • parquet reading hangs when row_group contains more than 2048 rows of data #349
  • Fail to build arrow #247
  • JSON reader does not implement iterator #193 [arrow]

Security fixes:

  • Ensure a successful MIRI Run on CI #227

Closed issues:

  • sort kernel has a lot of unnecessary wrapping #446
  • Parquet

4.3.0 (2021-06-10)

Full Changelog

Implemented enhancements:

  • Add partitioning kernel for sorted arrays #428 [arrow]
  • Implement sort by float lists #427 [arrow]
  • Derive Eq and PartialEq for SortOptions #426 [arrow]
  • use prettier and github action to normalize markdown document syntax #399
  • window::shift can work for more than just primitive array type #392
  • Doctest for ArrayBuilder #366

Fixed bugs:

  • Boolean not kernel does not take offset of null buffer into account #417
  • my contribution not merged in 4.2 release #394
  • window::shift shall properly handle boundary cases #387
  • Parquet WriterProperties.max_row_group_size not wired up #257
  • Out of bound reads in chunk iterator #198 [arrow]

4.2.0 (2021-05-29)

Full Changelog

Breaking changes:

  • DictionaryArray::values() clones the underlying ArrayRef #313 [arrow]

Implemented enhancements:

  • Simplify shift kernel using null array #371
  • Provide Arc-based constructor for parquet::util::cursor::SliceableCursor #368
  • Add badges to crates #361
  • Consider inlining PrimitiveArray::value #328
  • Implement automated release verification script #327
  • Add wasm32 to the list of target architectures of the simd feature #316
  • add with_escape for csv::ReaderBuilder #315 [arrow]
  • IPC feature gate #310
  • csv feature gate #309 [arrow]
  • Add shrink_to / shrink_to_fit to MutableBuffer #297

Fixed bugs:

  • Incorrect crate setup instructions #364
  • Arrow-flight only register rerun-if-changed if file exists #350
  • Dictionary Comparison Uses Wrong Values Array #332
  • Undefined behavior in FFI implementation #322
  • All-null column gets wrong parquet null-counts #306 [parquet]
  • Filter has inconsistent null handling #295

4.1.0 (2021-05-17)

Full Changelog

Implemented enhancements:

  • Add Send to ArrayBuilder #290 [arrow]
  • Improve performance of bound checking option #280 [arrow]
  • extend compute kernel arity to include nullary functions #276
  • Implement FFI / CDataInterface for Struct Arrays #251 [arrow]
  • Add support for pretty-printing Decimal numbers #230 [arrow]
  • CSV Reader String Dictionary Support #228 [arrow]
  • Add Builder interface for adding Arrays to record batches #210 [arrow]
  • Support auto-vectorization for min/max #209 [arrow]
  • Support LargeUtf8 in sort kernel #25 [arrow]

Fixed bugs:

  • no method named select_nth_unstable_by found for mutable reference &mut [T] #283
  • Rust 1.52 Clippy error #266
  • NaNs can break parquet statistics #255 [parquet]
  • u64::MAX does not roundtrip through parquet #254 [parquet]
  • Integration tests failing to compile flatbuffer #249 [arrow]
  • Fix compatibility quirks between arrow and parquet structs #245 [parquet]
  • Unable to write non-null Arrow structs to Parquet #244 [parquet]
  • schema: missing field metadata when deserialize #241 [arrow]
  • Arrow does not compile due to flatbuffers upgrade #238 [arrow]
  • Sort with limit panics when the limit includes some but not all nulls, for large arrays #235 [arrow]
  • arrow-rs contains a copy of the "format" directory #233 [arrow]
  • Fix SEGFAULT/ SIGILL in child-data ffi #206 [arrow]
  • Read list field correctly in <struct<list>> #167 [parquet]
  • FFI listarray lead to undefined behavior. #20

Security fixes:

Documentation updates:

  • Comment out the instructions in the PR template #277
  • Update links to datafusion and ballista in README.md #19
  • Update "repository" in Cargo.toml #12

Closed issues:

  • Arrow Aligned Vec #268
  • Rust
  • Umbrella issue for clippy integration #217 [arrow]
  • Support sort #215 [arrow]
  • Support stable Rust #214 [arrow]
  • Remove Rust and point integration tests to arrow-rs repo #211 [arrow]
  • ArrayData buffers are inconsistent across implementations #207
  • 3.0.1 patch release #204
  • Document patch release process #202
  • Simplify Offset #186 [arrow]
  • Typed Bytes #185 [arrow]
  • CI
  • Improve take primitive performance #174
  • CI
  • Update assignees in JIRA where missing #160
  • Rust
  • DataFusion
  • Rust
  • DataFusion
  • DataFusion
  • Rust
  • DataFusion
  • DataFusion
  • DataFusion
  • Archery
  • rust
  • Rust
  • DataFusion
  • DataFusion
  • Merge utils from Parquet and Arrow #32 [arrow] [parquet]
  • Add benchmarks for Parquet #30 [parquet]
  • Mark methods that do not perform bounds checking as unsafe #28 [arrow]
  • Test issue #24 [arrow]
  • This is a test issue #11