onnxruntime-tvm/include/tvm/expr_operator.h

654 строки
19 KiB
C
Исходник Обычный вид История

/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
/*!
* \file tvm/expr_operator.h
* \brief Common operators defined for Expr.
*
* \note Most of the operator defined here perform simple constant folding
* when the type is int32 or int64 for simplifying the index expressions.
*/
#ifndef TVM_EXPR_OPERATOR_H_
#define TVM_EXPR_OPERATOR_H_
#include <algorithm>
#include <type_traits>
#include "expr.h"
#include "ir.h"
namespace tvm {
[Datatypes] Custom datatypes (#2900) * Register and use custom datatypes in TVM This patch adds the ability to register and use a custom datatype from Python, using the `register_datatype` call. The datatype can then be passed as the `dtype` parameter using the syntax `dtype="custom[<type_name>]bitsxlanes"`. * Removes extra file * Register custom datatypes with TVM; specify Cast and Add lowering This commit adds functionality for registering custom datatypes with TVM, and furthermore adding custom lowering functions to lower those custom datatypes. This commit only adds lowering for the Cast and Add ops; more ops will be added soon. Check out some custom datatype samples in my repository of samples: https://github.com/gussmith23/tvm-custom-datatype-samples * Register and lower casts from Python * Formatting * Fix include; was including too much * Add comment * Add DatatypeRegistered * Add storage size field to custom datatypes This field indicates the bitwidth of the opaque block of data into which instances of the datatype will be stored, when TVM compiles. For example, if I create a datatype with a storage size of 16, then - Constants of that datatype will be created as unsigned 16-bit ints - Calls to external functions taking that datatype will pass the data as unsigned 16-bit ints - External functions returning that datatype will be assumed to return unsigned 16-bit ints. * Change how lowering funcs (Cast and other ops) are named in registry tvm.datatypes.lower.<target>.cast.<dst-type>.<src-type> becomes tvm.datatypes.lower.<target>.Cast.<dst-type>.<src-type> And fixes some sloppy code around how the other ops were being formatted. * Update Python register_datatype to accept storage size * Oops, left out one cast->Cast change * Look up storage size when parsing `custom[typename]` When we encounter this type string in Python, it will be parsed into a Halide type object in C++. Some of my original code supported this parsing, but we now have to attach the storage type to the type (by setting the bits field). * Change how external calls for casting/other ops are done Firstly, we now use the storage size of the custom type when determining input/output types; e.g. a cast to a custom type with storage size 16 is seen as a call to an external function returning an opaque uint of size 16. Secondly, write a macro to handle the other ops. Originally I thought I could handle these at runtime, with a single `_register_op` global. I transitioned instead to using individual `_register_Add` etc. calls generated with a macro, but I don't remember why. * When encountering a custom type immediate, generate UIntImm * Translate custom types to LLVM type * Generate correct return type in Casts Originally I was assuming that the result type from casts was always a custom datatype, and so I was making the Call return a UInt type. * Use TVM-idiomatic recursion style in DatatypesLowerer This was actually a bug, I'm pretty sure; we wouldn't have recursed deep on any complex programs. As a result of making this change, I also uncovered another potential bug, where the datatypes lowering pass would attempt to lower a Load of a custom type. By commenting out the `Mutate_` for Load, I was able to stop the error from cropping up, but frankly, I'm not satisfied with the solution; how is it that we are able to run codegen when Loads of custom datatypes are present in the IR? I have not written any code, to my knowledge, that will support this. Perhaps Load does not care about the underlying datatype? * Use CHECK * Add comment about which Mutate_s are needed * Add comments * Add GetCustomDatatypeRegistered as an extern C function * Formatting, comments, casting * Change how datatype string is formatted * Use bits() instead of GetStorageSize Use bits() instead of GetStorageSize * Change comment * Add datatype.py * Change registered function name (datatypes->datatype) * Remove GetStorageSize * Format custom datatypes like any other datatype Specifically, we now print the bits and lanes after the `custom[...]` string. * Correctly implement datatype lowering in Python * Remove unneeded include * Make function naming consistent * Use CHECK instead of internal_assert * Rename macro * Formatting * Rename functions * Implement Cast lowering `_datatype_register_op` is now able to lower both binary ops and Casts. * Formatting * Formatting * Clang format, google style * Fix std::string/extern "C" warnings * Formatting * Formatting * Lower Allocates and Loads during datatype lowering This should ensure that there are no custom datatypes remaining once datatype lowering is done. This will allow us to remove the code in the LLVM codegen which deals with custom datatypes. * Revert additions to codegen_llvm.cc which are now unneeded * Pass cpplint on lower_datatypes.cc * Add clarifying comment * Remove datatype lowering registration funcs from C++ * Add CHECKs * Remove TODO * Remove all references to storage size * Move and rename function * Rename function * Remove done TODOs and other handled comments * Remove irrelevant Load code and comments * Comment out the IR node types I'm not sure about yet * Add bfloat16 datatype unittest * Fix MakeConstScalar MakeConstScalar for a custom datatype will now call out to a function which can be registered on a per-datatype basis. The function will take a double and return the equivalent value in the custom datatype format. Note that these code paths are not actually used or tested at the moment. I have not yet written an example which uses const scalars of a custom datatype. * Formatting * Change pass name * Allow users to register whatever lowering function they want Tianqi pointed out that users should be able to register whatever lowering function they want, and should not be constrained to registering lowering functions which just call out to external libraries. I still provide a function for making lowering functions which call out to external libraries, for convenience. * Add clarifying comment * Remove unneeded comment * Remove unneeded function * Rename file * Undo unnecessary change * Undo unnecessary change * Make naming consistent Rename "datatypes" to "custom datatypes" in most contexts. * Revert an artifact of old code * Fix build warnings, add TODO * Lint * Remove unnecessary use of extern C by separating decl and impl * Error checking * Remove TODO * Missed a name change * Lint * Python lint * Correctly format datatype * Move bfloat16 to 3rdparty * "custom_datatypes" --> "datatype" in most places I left the pass as "LowerCustomDatatypes" to indicate that we're not lowering anything other than custom datatypes. Otherwise, everything else has been changed. * Upgrade datatype unittest I used a float calculator to generate some real testcases for the unittest. * Separate public includes and private implementation Specifically, create cleaner decoupling between datatypes stuff in packed_func and the datatype registry implementation. * Formatting * Limit custom datatype codes to >128 * Add TODOs * Fix comment * Formatting * Clean up datatype unittest * Remove un-exported functions in public headers; UIntImm->FloatImm More places where I accidentally was using implementation-only functions in public headers. Additionally, store custom datatype immediates as FloatImms. A later change will add new lowering logic to lower these FloatImms to UIntImms. Plus formatting change. * Lint * Use FloatImm (not UIntImm) to hold immediates of custom datatypes This change switches from using UIntImm to FloatImm for storing immediates of custom datatypes. The value of the number is stored in a double, which should be enough precision for now, for most custom types we will explore in the immediate future. In line with this change, we change the datatype lowering so that FloatImms are lowered to UInts of the appropriate size. Originally, this was going to be done by allowing the user to register a double->uint_<storage size>_t conversion which would be called at compile time to convert the value from the FloatImm to a UInt and store it in a UIntImm. After discussions with Tianqi, we decided to take the simpler route, and lower FloatImms just as we lower all other ops: by replacing them with Call nodes. In this case, presumably the user will Call out to a conversion function in their datatype library. The justification for this decision is due to the functionality added in #1486. This pull request adds the ability to load LLVM bytecode in at compile time. This applies in our case as follows: 1. The user writes their custom datatype programs and registers their lowering functions in the same way we've been doing it so far. All operations over custom datatypes are lowered to Calls to the datatype library. 2. The user compiles their datatype library to LLVM bytecode. 3. At TVM compile time, the user loads the LLVM bytecode. Depending on how the datatype library is written, Clang should be able to perform constant folding over the custom datatype immediates, even if their conversions are done with calls to the library. Additionally adds test to test the FloatImm codepath. * Re-add a change I removed accidentally during rebase * Cleanup * Remove unnecessary TVM_DLLs * Add custom datatype utilities source file to Go runtime pack * Revert "Remove unnecessary TVM_DLLs" This reverts commit 4b742b99557fd3bf0ce6617f033c8b444b74eda4. * Mark bfloat code as TVM_DLL * Moves custom datatype runtime utilities to c_runtime_api.cc * Revert "Add custom datatype utilities source file to Go runtime pack" This reverts commit aecbcde0b2cc09a2693955b77037fe20f93b5bfd. * Move datatype parsing to its own function * Change comments * Remove unneeded function * Formatting * Formatting * Documentation * Add kCustomBegin, use it for checking for custom types * Documentation * Formatting * Move static definition to implementation * Remove comment * Decide toBeLowered before lowering arguments of Expr In the past, e.g. when lowering custom datatypes for an Add, we would lower a and b first, and then decide whether the resulting new Add needed to be lowered based on the (new) types of a and b. Now, instead, we need to check the types of a and b first (to see if they're custom types), and then lower them (so they'll become non-custom types), and then lower the new Add. * Revert "Move datatype parsing to its own function" This reverts commit d554a5881afcf69af1c070d882a7651022703a09. This broke parsing. Will figure this out later. There isn't a really clean way to separate this out given how the rest of the function is written. * Replace comment * Documentation * Remove comment and TVM_DLL * Better error messages * Remove artifact of rebase * Separate datatypes parsing to its own function * Add \returns * Comment changes; add TODO * Refactor tests
2019-05-15 23:34:30 +03:00
/*!
* \brief Make a const value with certain data type.
* \param t The target type.
* \param value The input value
* \return the result expression.
* \tparam ValueType The constant value type
*/
template<typename ValueType,
typename = typename std::enable_if<std::is_pod<ValueType>::value>::type>
inline Expr make_const(Type t, ValueType value);
/*!
* \brief Make a const zero expr.
* \param t The target type.
* \return the result expression.
*/
inline Expr make_zero(Type t);
/*!
* \brief Make a constant true expression.
* \param lanes The number of lanes in the bool
* \return The result expression.
*/
inline Expr const_true(int lanes = 1) {
return make_const(UInt(1, lanes), 1);
}
/*!
* \brief Make a constant false expression.
* \param lanes The number of lanes in the bool
* \return The result expression.
*/
inline Expr const_false(int lanes = 1) {
return make_const(UInt(1, lanes), 0);
}
/*!
* \brief Get x as constant int expression.
* \param x The expression
* \return the address to the int expression,
* return nullptr, if x is not IntImm.
*/
inline const int64_t* as_const_int(const Expr& x) {
if (!x.defined()) return nullptr;
if (const ir::IntImm* op = x.as<ir::IntImm>()) {
return &(op->value);
} else {
return nullptr;
}
}
/*!
* \brief Get x as constant uint expression.
* \param x The expression
* \return the address to the int expression,
* return nullptr, if x is not UIntImm.
*/
inline const uint64_t* as_const_uint(const Expr& x) {
if (!x.defined()) return nullptr;
if (const ir::UIntImm* op = x.as<ir::UIntImm>()) {
return &(op->value);
} else {
return nullptr;
}
}
/*!
* \brief Check whether x is a constant integer expression.
* \param x The input argument
* \param value the value to be compared against.
* \return whether x is constant expression.
*/
inline bool is_const_int(const Expr& x, int64_t value);
/*!
* \brief Check whether stmt is nop.
* \param stmt The input statement
* \return whether stmt is nop
*/
inline bool is_no_op(const Stmt& stmt);
/*!
* \brief Check whether x is a constant integer 1
* \param x The input argument.
* \note This only return true for integer types.
* \return whether x is constant 1
*/
inline bool is_one(const Expr& x) {
return is_const_int(x, 1);
}
/*!
* \brief Check whether x is a constant integer 0
* \param x The input argument
* \return whether x is constant 0
* \note This only return true for integer types.
*/
inline bool is_zero(const Expr& x) {
return is_const_int(x, 0);
}
/*!
* \brief Check whether x is a constant.
* \note This only return true for integer types.
* \return whether x is constant
*/
inline bool is_const(const Expr& x);
/*!
* \brief Check whether x is a constant power of two
* If x is power of two, write the power to the shift.
*
* \param x The input expression.
* \param shift The output shift if x is power of two.
* \return whether x is constant power of two
*/
TVM_DLL bool is_const_power_of_two_integer(const Expr& x, int* shift);
/*!
* \brief cast value to type.
*
* \param t the target type.
* \param value The value
* \return The result expression.
* \note This function may return value if the type is the same.
*/
TVM_DLL Expr cast(const Type& t, Expr value);
/*!
* \brief perform reinterpret cast value to type.
*
* \param t the target type.
* \param value The value
* \return The result expression.
* \note This function may return value if the type is the same.
*/
TVM_DLL Expr reinterpret(const Type& t, Expr value);
/*!
* \brief add operator
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator+(Expr a, Expr b);
/*!
* \brief subtraction operator
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator-(Expr a, Expr b);
/*!
* \brief negation.
*
* \param a input.
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator-(Expr a);
/*!
* \brief multiplication operator
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator*(Expr a, Expr b);
/*!
* \brief division operator
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator/(Expr a, Expr b);
/*!
* \brief mod operator
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator%(Expr a, Expr b);
/*!
* \brief left shift operator
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator<<(Expr a, Expr b);
/*!
* \brief right shift operator
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator>>(Expr a, Expr b);
/*!
* \brief greater
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator>(Expr a, Expr b);
/*!
* \brief greater_equal
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator>=(Expr a, Expr b);
/*!
* \brief less
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator<(Expr a, Expr b);
/*!
* \brief less_equal
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator<=(Expr a, Expr b);
/*!
* \brief equal
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator==(Expr a, Expr b);
/*!
* \brief not_equal
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator!=(Expr a, Expr b);
/*!
* \brief and
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note This operator does eager constant folding.
*/
TVM_DLL Expr operator&&(Expr a, Expr b);
/*!
* \brief or
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note This operator does eager constant folding.
*/
TVM_DLL Expr operator||(Expr a, Expr b);
/*!
* \brief not
*
* \param a left operand
* \return The result expression.
* \note This operator does eager constant folding.
*/
TVM_DLL Expr operator!(Expr a);
/*!
* \brief take maximum of two values
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr max(Expr a, Expr b);
/*!
* \brief take minimum of two values
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr min(Expr a, Expr b);
/*!
* \brief take bitwise and of two values
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator&(Expr a, Expr b);
/*!
* \brief take bitwise or of two values
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator|(Expr a, Expr b);
/*!
* \brief take bitwise xor of two values
*
* \param a left operand
* \param b right operand
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator^(Expr a, Expr b);
/*!
* \brief take bitwise negation of two values
*
* \param a the input expression.
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr operator~(Expr a);
/*!
* \brief Conditional expression.
*
* \param cond The condition
* \param true_value The value when results are true.
* \param false_value The value when results are false.
* \return The result expression.
* \note this function does eager constant folding for
* index types(int32, int64) when possible.
*/
TVM_DLL Expr if_then_else(Expr cond, Expr true_value, Expr false_value);
/*!
* \brief Mark condition as likely.
* \param cond The condition
* \return The marked expression.
*/
TVM_DLL Expr likely(Expr cond);
/*!
* \brief Calculate power(x, y)
* \param x The left operand.
* \param y The right operand.
*/
TVM_DLL Expr pow(Expr x, Expr y);
/*!
* \brief Calculate absolute value of x.
* \param x The input data
*
* \return The aboslute value of input data x
*/
TVM_DLL Expr abs(Expr x);
/*!
* \brief sum of of source expression over axis
* \param source The source expression.
* \param axis List of iteration variables that will be used for reduction.
*/
TVM_DLL Expr sum(Expr source, Array<IterVar> axis);
/*!
* \brief logical And of of source expression over axis
* \param source The source expression.
* \param axis List of iteration variables that will be used for reduction.
*/
TVM_DLL Expr all(Expr source, Array<IterVar> axis);
/*!
* \brief max of of source expression over axis
* \param source The source expression.
* \param axis List of iteration variables that will be used for reduction.
*/
TVM_DLL Expr max(Expr source, Array<IterVar> axis);
/*!
* \brief max of of source expression over axis
* \param source The source expression.
* \param axis List of iteration variables that will be used for reduction.
*/
TVM_DLL Expr min(Expr source, Array<IterVar> axis);
/*!
* \brief product of of source expression over axis
* \param source The source expression.
* \param axis List of iteration variables that will be used for reduction.
*/
TVM_DLL Expr prod(Expr source, Array<IterVar> axis);
/*!
* \brief Calculate floor(x)
* \param x The input expression.
* \return The result expression.
*/
TVM_DLL Expr floor(Expr x);
/*!
* \brief Calculate ceil(x)
* \param x The input expression.
* \return The result expression.
*/
TVM_DLL Expr ceil(Expr x);
/*!
* \brief Calculate round(x)
* \param x The input expression.
* \return The result expression.
*/
TVM_DLL Expr round(Expr x);
/*!
* \brief Calculate trunc(x)
* \param x The input expression.
* \return The result expression.
*/
TVM_DLL Expr trunc(Expr x);
// Intrinsic operators
#define TVM_DECLARE_INTRIN_UNARY(OpName) \
inline Expr OpName(Expr x) { \
Porting schedules (except convolutions) to C++ (#763) * Ported injective schedules to C++. Added some elementwise ops. * Fix lint errors * Added reduction ops and schedules * Fix lint errors * Fix lint errors * Fix lint errors * Added transform ops * Fix lint errors * Fix lint errors * Added softmax, log_softmax, leaky_relu and flatten ops. Fixed issue where TVM_DECLARE_INTRIN_UNARY used the PureExtern flag instead of PureIntrinsic. Added softmax CUDA schedule. * Fix lint * Fix lint * Added binary_dense, batch_norm_inference, dense, dilate, scale_shift_*, global_pool and pool ops. Extended pad to allow specifying pad_value. Fixed issue where pad would throw if padding was zero in all dimensions. * Fix lint * Fix lint * Added CUDA schedules for dense, pool and global_pool * Added extern schedules for generic and CUDA * Fix lint * Added x86 binary schedules * Fix lint * Added rocm dense schedule. Added rocBLAS and cuBLAS support to dense ops * Added pow ops. Added x86 default and injective schedules * Fix lint * Fix lint * Fix lint * Fix lint * Fix lint * Fix indent * Removed schedules directory * Changed left_shift, right_shift to operators. Changed pad_value in pad() to remove pointer usage * Fixed usage of pad in nn/pooling.h. Fixed declaration of operator>> * Fixed comments for shift operators * Added comments to utility functions * Added TOPI C++ library, exporting broadcast_add op * Fix lint * Share libinfo.py with TVM * Fix lint * Add other broadcast ops * Fix lint * Fix imports in topi * Fix lib names * Fixed build issue where windows builds don't apply correct definitions * Removed TVM_EXPORTS from topi library * Attempted CI build fix * Add topi lib to tvm_multilib * Fix Jenkinsfile * Added TOPI build target to Makefile * Fix nn op namespaces. * Fix lint * Renamed TOPI lib to libtvm_topi * Removed _ffi/base.py * Remove _ffi from topi, now shared with tvm. * Make libtvm_topi loading optional * Fix compiler warnings * Fix lint * Fix lint * Fix lint * Fix build error by making new libs argument to Target optional * Added C++ Target type interop. Added registration of remaining C++ ops and schedules. Added test of broadcast ops * Fix lint * Fix lint * Fix compile error * Fix compiler warnings * Fix compiler warnings * Fixed int vector interop. Fixed argmin incorrectly invoking argmax. Fixed corner case in default schedules of attempting to fuse 0 length axes. Added tests for reduce ops. * Refactored reduce builders * Fixed typos in topi.cc. Added basic test. * Fixed padding size error. Added dense, dilate, pooling tests * Fixed issue where clip would output a different dtype to the input. Added split_sections op to cover the other mode of the python split op. Added tests. * Changed extension type numbers to avoid clash with NNVM * Fix lint * Fix compiler warnings * Removed use of std::vector from the public TOPI API * Fix lint * Add TOPI C++ tests to CI * Fixed detail namespacing. Improved comments.
2018-01-28 08:57:13 +03:00
return ir::Call::make(x.type(), #OpName, {x}, ir::Call::PureIntrinsic); \
} \
TVM_DECLARE_INTRIN_UNARY(exp);
TVM_DECLARE_INTRIN_UNARY(tanh);
TVM_DECLARE_INTRIN_UNARY(sigmoid);
TVM_DECLARE_INTRIN_UNARY(sqrt);
TVM_DECLARE_INTRIN_UNARY(rsqrt);
Porting schedules (except convolutions) to C++ (#763) * Ported injective schedules to C++. Added some elementwise ops. * Fix lint errors * Added reduction ops and schedules * Fix lint errors * Fix lint errors * Fix lint errors * Added transform ops * Fix lint errors * Fix lint errors * Added softmax, log_softmax, leaky_relu and flatten ops. Fixed issue where TVM_DECLARE_INTRIN_UNARY used the PureExtern flag instead of PureIntrinsic. Added softmax CUDA schedule. * Fix lint * Fix lint * Added binary_dense, batch_norm_inference, dense, dilate, scale_shift_*, global_pool and pool ops. Extended pad to allow specifying pad_value. Fixed issue where pad would throw if padding was zero in all dimensions. * Fix lint * Fix lint * Added CUDA schedules for dense, pool and global_pool * Added extern schedules for generic and CUDA * Fix lint * Added x86 binary schedules * Fix lint * Added rocm dense schedule. Added rocBLAS and cuBLAS support to dense ops * Added pow ops. Added x86 default and injective schedules * Fix lint * Fix lint * Fix lint * Fix lint * Fix lint * Fix indent * Removed schedules directory * Changed left_shift, right_shift to operators. Changed pad_value in pad() to remove pointer usage * Fixed usage of pad in nn/pooling.h. Fixed declaration of operator>> * Fixed comments for shift operators * Added comments to utility functions * Added TOPI C++ library, exporting broadcast_add op * Fix lint * Share libinfo.py with TVM * Fix lint * Add other broadcast ops * Fix lint * Fix imports in topi * Fix lib names * Fixed build issue where windows builds don't apply correct definitions * Removed TVM_EXPORTS from topi library * Attempted CI build fix * Add topi lib to tvm_multilib * Fix Jenkinsfile * Added TOPI build target to Makefile * Fix nn op namespaces. * Fix lint * Renamed TOPI lib to libtvm_topi * Removed _ffi/base.py * Remove _ffi from topi, now shared with tvm. * Make libtvm_topi loading optional * Fix compiler warnings * Fix lint * Fix lint * Fix lint * Fix build error by making new libs argument to Target optional * Added C++ Target type interop. Added registration of remaining C++ ops and schedules. Added test of broadcast ops * Fix lint * Fix lint * Fix compile error * Fix compiler warnings * Fix compiler warnings * Fixed int vector interop. Fixed argmin incorrectly invoking argmax. Fixed corner case in default schedules of attempting to fuse 0 length axes. Added tests for reduce ops. * Refactored reduce builders * Fixed typos in topi.cc. Added basic test. * Fixed padding size error. Added dense, dilate, pooling tests * Fixed issue where clip would output a different dtype to the input. Added split_sections op to cover the other mode of the python split op. Added tests. * Changed extension type numbers to avoid clash with NNVM * Fix lint * Fix compiler warnings * Removed use of std::vector from the public TOPI API * Fix lint * Add TOPI C++ tests to CI * Fixed detail namespacing. Improved comments.
2018-01-28 08:57:13 +03:00
TVM_DECLARE_INTRIN_UNARY(log);
TVM_DECLARE_INTRIN_UNARY(popcount);
Porting schedules (except convolutions) to C++ (#763) * Ported injective schedules to C++. Added some elementwise ops. * Fix lint errors * Added reduction ops and schedules * Fix lint errors * Fix lint errors * Fix lint errors * Added transform ops * Fix lint errors * Fix lint errors * Added softmax, log_softmax, leaky_relu and flatten ops. Fixed issue where TVM_DECLARE_INTRIN_UNARY used the PureExtern flag instead of PureIntrinsic. Added softmax CUDA schedule. * Fix lint * Fix lint * Added binary_dense, batch_norm_inference, dense, dilate, scale_shift_*, global_pool and pool ops. Extended pad to allow specifying pad_value. Fixed issue where pad would throw if padding was zero in all dimensions. * Fix lint * Fix lint * Added CUDA schedules for dense, pool and global_pool * Added extern schedules for generic and CUDA * Fix lint * Added x86 binary schedules * Fix lint * Added rocm dense schedule. Added rocBLAS and cuBLAS support to dense ops * Added pow ops. Added x86 default and injective schedules * Fix lint * Fix lint * Fix lint * Fix lint * Fix lint * Fix indent * Removed schedules directory * Changed left_shift, right_shift to operators. Changed pad_value in pad() to remove pointer usage * Fixed usage of pad in nn/pooling.h. Fixed declaration of operator>> * Fixed comments for shift operators * Added comments to utility functions * Added TOPI C++ library, exporting broadcast_add op * Fix lint * Share libinfo.py with TVM * Fix lint * Add other broadcast ops * Fix lint * Fix imports in topi * Fix lib names * Fixed build issue where windows builds don't apply correct definitions * Removed TVM_EXPORTS from topi library * Attempted CI build fix * Add topi lib to tvm_multilib * Fix Jenkinsfile * Added TOPI build target to Makefile * Fix nn op namespaces. * Fix lint * Renamed TOPI lib to libtvm_topi * Removed _ffi/base.py * Remove _ffi from topi, now shared with tvm. * Make libtvm_topi loading optional * Fix compiler warnings * Fix lint * Fix lint * Fix lint * Fix build error by making new libs argument to Target optional * Added C++ Target type interop. Added registration of remaining C++ ops and schedules. Added test of broadcast ops * Fix lint * Fix lint * Fix compile error * Fix compiler warnings * Fix compiler warnings * Fixed int vector interop. Fixed argmin incorrectly invoking argmax. Fixed corner case in default schedules of attempting to fuse 0 length axes. Added tests for reduce ops. * Refactored reduce builders * Fixed typos in topi.cc. Added basic test. * Fixed padding size error. Added dense, dilate, pooling tests * Fixed issue where clip would output a different dtype to the input. Added split_sections op to cover the other mode of the python split op. Added tests. * Changed extension type numbers to avoid clash with NNVM * Fix lint * Fix compiler warnings * Removed use of std::vector from the public TOPI API * Fix lint * Add TOPI C++ tests to CI * Fixed detail namespacing. Improved comments.
2018-01-28 08:57:13 +03:00
// Implementation details after this
inline bool is_const(const Expr& x) {
if (x.as<ir::IntImm>() || x.as<ir::UIntImm>()) {
return true;
} else if (const auto* op = x.as<ir::Broadcast>()) {
const Expr& val = op->value;
if (val.as<ir::IntImm>() || val.as<ir::UIntImm>()) {
return true;
}
}
return false;
Porting schedules (except convolutions) to C++ (#763) * Ported injective schedules to C++. Added some elementwise ops. * Fix lint errors * Added reduction ops and schedules * Fix lint errors * Fix lint errors * Fix lint errors * Added transform ops * Fix lint errors * Fix lint errors * Added softmax, log_softmax, leaky_relu and flatten ops. Fixed issue where TVM_DECLARE_INTRIN_UNARY used the PureExtern flag instead of PureIntrinsic. Added softmax CUDA schedule. * Fix lint * Fix lint * Added binary_dense, batch_norm_inference, dense, dilate, scale_shift_*, global_pool and pool ops. Extended pad to allow specifying pad_value. Fixed issue where pad would throw if padding was zero in all dimensions. * Fix lint * Fix lint * Added CUDA schedules for dense, pool and global_pool * Added extern schedules for generic and CUDA * Fix lint * Added x86 binary schedules * Fix lint * Added rocm dense schedule. Added rocBLAS and cuBLAS support to dense ops * Added pow ops. Added x86 default and injective schedules * Fix lint * Fix lint * Fix lint * Fix lint * Fix lint * Fix indent * Removed schedules directory * Changed left_shift, right_shift to operators. Changed pad_value in pad() to remove pointer usage * Fixed usage of pad in nn/pooling.h. Fixed declaration of operator>> * Fixed comments for shift operators * Added comments to utility functions * Added TOPI C++ library, exporting broadcast_add op * Fix lint * Share libinfo.py with TVM * Fix lint * Add other broadcast ops * Fix lint * Fix imports in topi * Fix lib names * Fixed build issue where windows builds don't apply correct definitions * Removed TVM_EXPORTS from topi library * Attempted CI build fix * Add topi lib to tvm_multilib * Fix Jenkinsfile * Added TOPI build target to Makefile * Fix nn op namespaces. * Fix lint * Renamed TOPI lib to libtvm_topi * Removed _ffi/base.py * Remove _ffi from topi, now shared with tvm. * Make libtvm_topi loading optional * Fix compiler warnings * Fix lint * Fix lint * Fix lint * Fix build error by making new libs argument to Target optional * Added C++ Target type interop. Added registration of remaining C++ ops and schedules. Added test of broadcast ops * Fix lint * Fix lint * Fix compile error * Fix compiler warnings * Fix compiler warnings * Fixed int vector interop. Fixed argmin incorrectly invoking argmax. Fixed corner case in default schedules of attempting to fuse 0 length axes. Added tests for reduce ops. * Refactored reduce builders * Fixed typos in topi.cc. Added basic test. * Fixed padding size error. Added dense, dilate, pooling tests * Fixed issue where clip would output a different dtype to the input. Added split_sections op to cover the other mode of the python split op. Added tests. * Changed extension type numbers to avoid clash with NNVM * Fix lint * Fix compiler warnings * Removed use of std::vector from the public TOPI API * Fix lint * Add TOPI C++ tests to CI * Fixed detail namespacing. Improved comments.
2018-01-28 08:57:13 +03:00
}
inline bool is_positive_const(const Expr& a) {
if (const ir::IntImm* op = a.as<ir::IntImm>()) {
return op->value > 0;
} else if (const ir::UIntImm* op = a.as<ir::UIntImm>()) {
return op->value > 0;
} else {
return false;
}
}
inline bool is_negative_const(const Expr& a) {
if (const ir::IntImm* op = a.as<ir::IntImm>()) {
return op->value < 0;
} else {
return false;
}
}
inline bool is_const_int(const Expr& x, int64_t value) {
if (const auto* op = x.as<ir::IntImm>()) {
return op->value == value;
} else if (const auto* op = x.as<ir::UIntImm>()) {
return op->value == static_cast<uint64_t>(value);
} else if (const auto* op = x.as<ir::Broadcast>()) {
const Expr& val = op->value;
if (const auto* opv = val.as<ir::IntImm>()) {
return opv->value == value;
} else if (const auto* opv = val.as<ir::UIntImm>()) {
return opv->value == static_cast<uint64_t>(value);
}
}
return false;
}
inline bool is_no_op(const Stmt& stmt) {
if (!stmt.defined()) return true;
if (const auto* op = stmt.as<ir::Evaluate>()) {
return is_const(op->value);
}
return false;
}
template<typename ValueType>
inline Expr MakeConstScalar(Type t, ValueType value) {
if (t.is_int()) return ir::IntImm::make(t, static_cast<int64_t>(value));
if (t.is_uint()) return ir::UIntImm::make(t, static_cast<uint64_t>(value));
if (t.is_float()) return ir::FloatImm::make(t, static_cast<double>(value));
[Datatypes] Custom datatypes (#2900) * Register and use custom datatypes in TVM This patch adds the ability to register and use a custom datatype from Python, using the `register_datatype` call. The datatype can then be passed as the `dtype` parameter using the syntax `dtype="custom[<type_name>]bitsxlanes"`. * Removes extra file * Register custom datatypes with TVM; specify Cast and Add lowering This commit adds functionality for registering custom datatypes with TVM, and furthermore adding custom lowering functions to lower those custom datatypes. This commit only adds lowering for the Cast and Add ops; more ops will be added soon. Check out some custom datatype samples in my repository of samples: https://github.com/gussmith23/tvm-custom-datatype-samples * Register and lower casts from Python * Formatting * Fix include; was including too much * Add comment * Add DatatypeRegistered * Add storage size field to custom datatypes This field indicates the bitwidth of the opaque block of data into which instances of the datatype will be stored, when TVM compiles. For example, if I create a datatype with a storage size of 16, then - Constants of that datatype will be created as unsigned 16-bit ints - Calls to external functions taking that datatype will pass the data as unsigned 16-bit ints - External functions returning that datatype will be assumed to return unsigned 16-bit ints. * Change how lowering funcs (Cast and other ops) are named in registry tvm.datatypes.lower.<target>.cast.<dst-type>.<src-type> becomes tvm.datatypes.lower.<target>.Cast.<dst-type>.<src-type> And fixes some sloppy code around how the other ops were being formatted. * Update Python register_datatype to accept storage size * Oops, left out one cast->Cast change * Look up storage size when parsing `custom[typename]` When we encounter this type string in Python, it will be parsed into a Halide type object in C++. Some of my original code supported this parsing, but we now have to attach the storage type to the type (by setting the bits field). * Change how external calls for casting/other ops are done Firstly, we now use the storage size of the custom type when determining input/output types; e.g. a cast to a custom type with storage size 16 is seen as a call to an external function returning an opaque uint of size 16. Secondly, write a macro to handle the other ops. Originally I thought I could handle these at runtime, with a single `_register_op` global. I transitioned instead to using individual `_register_Add` etc. calls generated with a macro, but I don't remember why. * When encountering a custom type immediate, generate UIntImm * Translate custom types to LLVM type * Generate correct return type in Casts Originally I was assuming that the result type from casts was always a custom datatype, and so I was making the Call return a UInt type. * Use TVM-idiomatic recursion style in DatatypesLowerer This was actually a bug, I'm pretty sure; we wouldn't have recursed deep on any complex programs. As a result of making this change, I also uncovered another potential bug, where the datatypes lowering pass would attempt to lower a Load of a custom type. By commenting out the `Mutate_` for Load, I was able to stop the error from cropping up, but frankly, I'm not satisfied with the solution; how is it that we are able to run codegen when Loads of custom datatypes are present in the IR? I have not written any code, to my knowledge, that will support this. Perhaps Load does not care about the underlying datatype? * Use CHECK * Add comment about which Mutate_s are needed * Add comments * Add GetCustomDatatypeRegistered as an extern C function * Formatting, comments, casting * Change how datatype string is formatted * Use bits() instead of GetStorageSize Use bits() instead of GetStorageSize * Change comment * Add datatype.py * Change registered function name (datatypes->datatype) * Remove GetStorageSize * Format custom datatypes like any other datatype Specifically, we now print the bits and lanes after the `custom[...]` string. * Correctly implement datatype lowering in Python * Remove unneeded include * Make function naming consistent * Use CHECK instead of internal_assert * Rename macro * Formatting * Rename functions * Implement Cast lowering `_datatype_register_op` is now able to lower both binary ops and Casts. * Formatting * Formatting * Clang format, google style * Fix std::string/extern "C" warnings * Formatting * Formatting * Lower Allocates and Loads during datatype lowering This should ensure that there are no custom datatypes remaining once datatype lowering is done. This will allow us to remove the code in the LLVM codegen which deals with custom datatypes. * Revert additions to codegen_llvm.cc which are now unneeded * Pass cpplint on lower_datatypes.cc * Add clarifying comment * Remove datatype lowering registration funcs from C++ * Add CHECKs * Remove TODO * Remove all references to storage size * Move and rename function * Rename function * Remove done TODOs and other handled comments * Remove irrelevant Load code and comments * Comment out the IR node types I'm not sure about yet * Add bfloat16 datatype unittest * Fix MakeConstScalar MakeConstScalar for a custom datatype will now call out to a function which can be registered on a per-datatype basis. The function will take a double and return the equivalent value in the custom datatype format. Note that these code paths are not actually used or tested at the moment. I have not yet written an example which uses const scalars of a custom datatype. * Formatting * Change pass name * Allow users to register whatever lowering function they want Tianqi pointed out that users should be able to register whatever lowering function they want, and should not be constrained to registering lowering functions which just call out to external libraries. I still provide a function for making lowering functions which call out to external libraries, for convenience. * Add clarifying comment * Remove unneeded comment * Remove unneeded function * Rename file * Undo unnecessary change * Undo unnecessary change * Make naming consistent Rename "datatypes" to "custom datatypes" in most contexts. * Revert an artifact of old code * Fix build warnings, add TODO * Lint * Remove unnecessary use of extern C by separating decl and impl * Error checking * Remove TODO * Missed a name change * Lint * Python lint * Correctly format datatype * Move bfloat16 to 3rdparty * "custom_datatypes" --> "datatype" in most places I left the pass as "LowerCustomDatatypes" to indicate that we're not lowering anything other than custom datatypes. Otherwise, everything else has been changed. * Upgrade datatype unittest I used a float calculator to generate some real testcases for the unittest. * Separate public includes and private implementation Specifically, create cleaner decoupling between datatypes stuff in packed_func and the datatype registry implementation. * Formatting * Limit custom datatype codes to >128 * Add TODOs * Fix comment * Formatting * Clean up datatype unittest * Remove un-exported functions in public headers; UIntImm->FloatImm More places where I accidentally was using implementation-only functions in public headers. Additionally, store custom datatype immediates as FloatImms. A later change will add new lowering logic to lower these FloatImms to UIntImms. Plus formatting change. * Lint * Use FloatImm (not UIntImm) to hold immediates of custom datatypes This change switches from using UIntImm to FloatImm for storing immediates of custom datatypes. The value of the number is stored in a double, which should be enough precision for now, for most custom types we will explore in the immediate future. In line with this change, we change the datatype lowering so that FloatImms are lowered to UInts of the appropriate size. Originally, this was going to be done by allowing the user to register a double->uint_<storage size>_t conversion which would be called at compile time to convert the value from the FloatImm to a UInt and store it in a UIntImm. After discussions with Tianqi, we decided to take the simpler route, and lower FloatImms just as we lower all other ops: by replacing them with Call nodes. In this case, presumably the user will Call out to a conversion function in their datatype library. The justification for this decision is due to the functionality added in #1486. This pull request adds the ability to load LLVM bytecode in at compile time. This applies in our case as follows: 1. The user writes their custom datatype programs and registers their lowering functions in the same way we've been doing it so far. All operations over custom datatypes are lowered to Calls to the datatype library. 2. The user compiles their datatype library to LLVM bytecode. 3. At TVM compile time, the user loads the LLVM bytecode. Depending on how the datatype library is written, Clang should be able to perform constant folding over the custom datatype immediates, even if their conversions are done with calls to the library. Additionally adds test to test the FloatImm codepath. * Re-add a change I removed accidentally during rebase * Cleanup * Remove unnecessary TVM_DLLs * Add custom datatype utilities source file to Go runtime pack * Revert "Remove unnecessary TVM_DLLs" This reverts commit 4b742b99557fd3bf0ce6617f033c8b444b74eda4. * Mark bfloat code as TVM_DLL * Moves custom datatype runtime utilities to c_runtime_api.cc * Revert "Add custom datatype utilities source file to Go runtime pack" This reverts commit aecbcde0b2cc09a2693955b77037fe20f93b5bfd. * Move datatype parsing to its own function * Change comments * Remove unneeded function * Formatting * Formatting * Documentation * Add kCustomBegin, use it for checking for custom types * Documentation * Formatting * Move static definition to implementation * Remove comment * Decide toBeLowered before lowering arguments of Expr In the past, e.g. when lowering custom datatypes for an Add, we would lower a and b first, and then decide whether the resulting new Add needed to be lowered based on the (new) types of a and b. Now, instead, we need to check the types of a and b first (to see if they're custom types), and then lower them (so they'll become non-custom types), and then lower the new Add. * Revert "Move datatype parsing to its own function" This reverts commit d554a5881afcf69af1c070d882a7651022703a09. This broke parsing. Will figure this out later. There isn't a really clean way to separate this out given how the rest of the function is written. * Replace comment * Documentation * Remove comment and TVM_DLL * Better error messages * Remove artifact of rebase * Separate datatypes parsing to its own function * Add \returns * Comment changes; add TODO * Refactor tests
2019-05-15 23:34:30 +03:00
// For now, we store const scalar values of custom datatypes within doubles; later, during the
// datatypes lowering pass, we will lower the value to its true representation in the format
// specified by the datatype.
// TODO(gus) when do we need to start worrying about doubles not being precise enough?
if (static_cast<uint8_t>(t.code()) >= static_cast<uint8_t>(kCustomBegin))
return ir::FloatImm::make(t, static_cast<double>(value));
LOG(FATAL) << "cannot make const for type " << t;
return Expr();
}
template<typename ValueType, typename>
inline Expr make_const(Type t, ValueType value) {
if (t.lanes() == 1) {
return MakeConstScalar(t, value);
} else {
return ir::Broadcast::make(
MakeConstScalar(t.element_of(), value), t.lanes());
}
}
inline Expr make_zero(Type t) {
if (t.is_handle()) {
return reinterpret(t, make_const(UInt(64), 0));
}
return make_const(t, 0);
}
// additional const expression overloading
#define TVM_DEFINE_ASSIGN_OP_OVERLOAD(Name, OpFunc) \
inline Expr Name(Expr& a, Expr b) { \
a = OpFunc(a, b); \
return a; \
}
#define TVM_DEFINE_BINOP_CONST_VAL_OVERLOAD(Name) \
inline Expr Name(const Expr& a, float b) { \
return Name(a, Expr(b)); \
} \
inline Expr Name(float a, const Expr& b) { \
return Name(Expr(a), b); \
} \
inline Expr Name(int a, const Expr& b) { \
return Name(make_const(b.type(), a), b); \
} \
inline Expr Name(const Expr& a, int b) { \
return Name(a, make_const(a.type(), b)); \
}
#define TVM_DEFINE_LOGICAL_OP_CONST_VAL_OVERLOAD(Name) \
inline Expr Name(const Expr& a, bool b) { \
return Name(a, Expr(b)); \
} \
inline Expr Name(bool a, const Expr& b) { \
return Name(Expr(a), b); \
}
#define TVM_DEFINE_INT_OP_CONST_VAL_OVERLOAD(Name) \
inline Expr Name(const Expr& a, int b) { \
return Name(a, make_const(a.type(), b)); \
} \
inline Expr Name(int a, const Expr& b) { \
return Name(make_const(b.type(), a), b); \
}
TVM_DEFINE_ASSIGN_OP_OVERLOAD(operator+=, operator+);
TVM_DEFINE_ASSIGN_OP_OVERLOAD(operator-=, operator-);
TVM_DEFINE_ASSIGN_OP_OVERLOAD(operator*=, operator*);
TVM_DEFINE_ASSIGN_OP_OVERLOAD(operator/=, operator/);
TVM_DEFINE_BINOP_CONST_VAL_OVERLOAD(operator+);
TVM_DEFINE_BINOP_CONST_VAL_OVERLOAD(operator-);
TVM_DEFINE_BINOP_CONST_VAL_OVERLOAD(operator*);
TVM_DEFINE_BINOP_CONST_VAL_OVERLOAD(operator/);
TVM_DEFINE_BINOP_CONST_VAL_OVERLOAD(max);
TVM_DEFINE_BINOP_CONST_VAL_OVERLOAD(min);
TVM_DEFINE_BINOP_CONST_VAL_OVERLOAD(operator>); // NOLINT(*)
TVM_DEFINE_BINOP_CONST_VAL_OVERLOAD(operator>=);
TVM_DEFINE_BINOP_CONST_VAL_OVERLOAD(operator<); // NOLINT(*)
TVM_DEFINE_BINOP_CONST_VAL_OVERLOAD(operator<=);
// integer related ops
TVM_DEFINE_INT_OP_CONST_VAL_OVERLOAD(operator%);
TVM_DEFINE_INT_OP_CONST_VAL_OVERLOAD(operator>>); // NOLINT(*)
TVM_DEFINE_INT_OP_CONST_VAL_OVERLOAD(operator<<); // NOLINT(*)
TVM_DEFINE_INT_OP_CONST_VAL_OVERLOAD(operator&);
TVM_DEFINE_INT_OP_CONST_VAL_OVERLOAD(operator|);
TVM_DEFINE_INT_OP_CONST_VAL_OVERLOAD(operator^);
// logical ops
TVM_DEFINE_LOGICAL_OP_CONST_VAL_OVERLOAD(operator&&);
TVM_DEFINE_LOGICAL_OP_CONST_VAL_OVERLOAD(operator||);
} // namespace tvm
#endif // TVM_EXPR_OPERATOR_H_