Commit graph

  • 36098ae227 update readme. fractaltensor_artifact lcy.seso 2024-09-19 06:22:33 +0000
  • 2264119e9d Simplify ncu scripts. lcy.seso 2024-09-19 05:03:00 +0000
  • fc061ce512 Update table6 README. lcy.seso 2024-09-18 05:30:33 +0000
  • 2074b3b8ab Fixed ncu script bugs. lcy.seso 2024-09-18 05:16:10 +0000
  • f2ac7443ce update README. lcy.seso 2024-09-17 09:18:27 +0000
  • 95a4e06118 open the tunning option for tvm. lcy.seso 2024-09-16 10:25:43 +0000
  • 3e8e8e9778 fix README. lcy.seso 2024-09-16 10:01:37 +0000
  • cb55867291 fix the error file name. lcy.seso 2024-09-14 09:00:39 +0000
  • d81ffbb403 add missing file. lcy.seso 2024-09-13 05:35:24 +0000
  • 41f0ee576c add ncu tests. lcy.seso 2024-09-13 04:04:21 +0000
  • 23b8a00a14 fix the script to run tvm test. lcy.seso 2024-09-12 05:23:07 +0000
  • 388fa4a568 add build big bird into the build script. lcy.seso 2024-09-11 08:12:29 +0000
  • f2f45e56c0 add the artifact evaluation for fractaltensor. lcy.seso 2024-08-21 15:45:40 +0000
  • 7557e3649c Merge cce49db2e0 into 00c13173a1 donglinb 2024-06-13 02:44:06 +0000
  • 0c8963d74e Merge cfae096f9c into 00c13173a1 Lingxiao Ma 2024-04-19 12:44:16 -0700
  • 00c13173a1 [Security] Fix dangling s3 bucket (#531) main Ziming Miao 2024-04-01 11:27:49 +0800
  • 5e0045d902 fix Ziming Miao 2024-04-01 12:21:50 +0900
  • fcf5f3f902 fp16 support dsl_kernel Jilong Xue 2023-11-22 16:43:07 +0900
  • 438996fa79 Merge bcbe7d0e87 into 56f3ab5c4b Lei Wang 2023-11-01 08:30:59 -0600
  • 56f3ab5c4b support folding onnx>2GB with onnxruntime (#528) xbox Ziming Miao 2023-10-30 20:53:28 +0800
  • 9d1dded971 fix Ziming Miao 2023-10-30 21:53:12 +0900
  • 9f3cef8f3a port nofuse flag, fix weight path of optimized onnx model Ziming Miao 2023-10-23 15:27:51 +0900
  • 7137aed46c avoid create large constant when converting onnx, some other fix Ziming Miao 2023-10-23 11:49:48 +0900
  • 35e1a76f25 Release loaded DLL libraries in destructor of class Executor (#510) donglinb 2023-07-19 11:06:19 +0800
  • 4aaf4c50c0 update some features LadderLLM LeiWang1999 2023-07-14 04:01:57 -0800
  • 1e60c132e5 add round zimiao 2023-07-11 17:27:19 +0800
  • f5a2de3f0c init figures LeiWang1999 2023-07-09 01:44:40 -0800
  • 34c594173d update readme LeiWang1999 2023-07-09 01:26:48 -0800
  • d9fbd114df update readme LeiWang1999 2023-07-09 01:18:27 -0800
  • 8e94fd291d scrip LeiWang1999 2023-07-09 01:10:34 -0800
  • 3f9dd4e77b update readme LeiWang1999 2023-07-09 01:10:14 -0800
  • 510a592745 update quatization scrips LeiWang1999 2023-07-09 01:09:48 -0800
  • ed049a8890 fix std bugs LeiWang1999 2023-07-09 00:06:27 -0800
  • 3a17d50555 update welder tvm LeiWang1999 2023-07-08 23:17:19 -0800
  • 3d84a86065 welder update LeiWang1999 2023-07-08 23:11:43 -0800
  • d489538548 quantization support LeiWang1999 2023-07-08 23:11:07 -0800
  • 79af71fcd7 nnfusion update LeiWang1999 2023-07-08 22:54:19 -0800
  • 437b553ae9 update cutlass LeiWang1999 2023-07-08 22:52:07 -0800
  • 8f50973939 readme Jilong Xue 2023-07-07 21:17:14 +0900
  • 23e3c7a0ab add custom op and support gelu+dropout+linear fusion Jilong Xue 2023-07-07 21:08:35 +0900
  • 5730e935ae Some initial implementations for DSL frontend (#524) donglinb/dsl donglinb 2023-07-07 19:59:00 +0800
  • 6f5e182e3b update msav2grad yuqxia 2023-06-27 08:12:24 +0000
  • a53899bc04 add msav2grad yuqxia 2023-06-20 03:08:20 +0000
  • 27c310b880 v1 yuqxia 2023-06-15 08:41:13 +0000
  • 793ce41de2 merge more yuqxia 2023-06-14 11:37:37 +0000
  • b4d214b9d0 merge add mask yuqxia 2023-06-14 11:20:36 +0000
  • e6bc8f1d47 add msav2 yuqxia 2023-06-14 10:18:51 +0000
  • ff25d02e14 modify msa0 yuqxia 2023-06-14 06:02:46 +0000
  • 174647384e added comparation with CustomOp in the AlexNet case donglinb 2023-06-14 08:35:07 +0800
  • fa514dfa8a debug for packing kernel by adding extern C keyword donglinb 2023-06-13 17:31:51 +0800
  • 72e5f38582 generate json graph from antares expression and run custom op donglinb 2023-06-13 16:36:15 +0800
  • 79b52569d9 fp16 precision fix script lingm/xbox Lingxiao Ma 2023-06-06 18:20:11 +0900
  • d99c5fb435 enable subgraphfusion in hlsl backend Lingxiao Ma 2023-06-06 18:11:00 +0900
  • beb2145e6f Merge remote-tracking branch 'origin/xbox' into lingm/xbox Lingxiao Ma 2023-06-06 18:05:16 +0900
  • 323a8bcb04 match figure id in camera ready and remove slurm related commands (#523) cocktailer_artifact Chen Zhang 2023-06-06 14:57:56 +0800
  • 4b6e760e76 match figure id in camera ready and remove slurm related commands heheda 2023-06-04 16:57:12 +0800
  • 7d1e71ea2b Update to camera-ready version. osdi2023welder Shi Yining 2023-06-01 07:10:46 +0000
  • bcbe7d0e87 lowbit update LeiWang1999 2023-05-29 22:05:52 -0800
  • fd83d07a4a opt msa xiayuqing0622 2023-05-23 06:02:17 +0000
  • 3e1992aef7 fix antares ir for gather (#522) Ziming Miao 2023-05-22 13:37:28 +0800
  • 28de0281fd fix antares ir for gather Ziming Miao 2023-05-22 14:35:09 +0900
  • 375dcf9d4f Update MemEffAttnGrad.cpp Yuqing 2023-05-18 14:46:28 +0800
  • 1f9c8670eb add flashattn grad yuqxia 2023-05-15 06:36:14 +0000
  • d9b7d59bd6 add Identity yuqxia 2023-05-11 07:08:36 +0000
  • ab76b4470e add flash attn submodule yuqxia 2023-05-10 08:39:04 +0000
  • 73e361cb87 add create onnx script yuqxia 2023-05-10 08:28:59 +0000
  • 701b2c42ed add msa as one op yuqxia 2023-05-09 03:32:23 +0000
  • 70576e2109 add MultiScaleAttn yuqxia 2023-05-08 12:14:48 +0000
  • babb729b0d optimize flash attention basic yuqxia 2023-04-26 08:23:56 +0000
  • f4c5272a99 Update description. Shi Yining 2023-04-25 10:22:20 +0000
  • 18b3cc6277 Update scripts. Shi Yining 2023-04-24 15:51:41 +0000
  • f702673b76 fix bug yuqxia 2023-04-25 03:44:40 +0000
  • 290af73e6b Cocktailer Artifact (#518) Chen Zhang 2023-04-24 21:32:18 +0800
  • 417546dfae Add tune_welder.py Shi Yining 2023-04-24 10:23:30 +0000
  • 7d0c87a8a2 update links heheda 2023-04-24 17:00:56 +0800
  • 071ded7b35 rename project and remove some script heheda 2023-04-24 14:59:33 +0800
  • 8cea2c4010 Update Ansor script Shi Yining 2023-04-24 00:15:27 +0000
  • b1a889bc6c small fix heheda 2023-04-23 20:34:12 +0800
  • eb62833302 Update Figure5 and Figure11 Shi Yining 2023-04-23 11:29:44 +0000
  • 063996b4f2 update gitignore heheda 2023-04-23 20:15:42 +0800
  • ec0594fb9a remove name 'grinder' from scripts heheda 2023-04-23 19:57:54 +0800
  • dd1796a717 finish rocm? heheda 2023-04-23 17:31:53 +0800
  • 2632ad8f81 kerneldb scripts heheda 2023-04-23 15:27:57 +0800
  • 767c6f8778 remove grinder from filename heheda 2023-04-22 23:08:43 +0800
  • 48d065141c rocm reproduced heheda 2023-04-22 22:59:17 +0800
  • 186fe5be12 first try of rocm kerneldb heheda 2023-04-22 20:44:07 +0800
  • e0a5cc0bd1 Add a readme file. Shi Yining 2023-04-22 07:47:29 +0000
  • 3596cbda59 Add scripts. Shi Yining 2023-04-21 11:16:47 +0000
  • 06a0ff2127 copy roller rocm code heheda 2023-04-21 15:43:15 +0800
  • afdae0201f add rocm kerneldb script heheda 2023-04-21 14:23:52 +0800
  • 089e5eb1e1 profile for rocm kernel_db import lingm/rocm_kernel_db Lingxiao Ma 2023-04-21 02:31:50 +0800
  • 81e1c012a1 Some fix. Shi Yining 2023-04-19 13:08:38 +0000
  • a6514e7386 update kernels in manual impls heheda 2023-04-20 16:28:47 +0800
  • e568ef60f6 change permission heheda 2023-04-20 13:28:28 +0800
  • 2ba5cf4ba2 remove training code heheda 2023-04-20 13:27:47 +0800
  • ec00f1e403 autotvm kernel heheda 2023-04-20 10:41:50 +0800
  • 9e3c80f4b0 remove cudnn in manual heheda 2023-04-19 22:38:35 +0800
  • 16713b3cdb install_grinder script heheda 2023-04-19 19:51:39 +0800
  • e4e02fe981 remove cudnn heheda 2023-04-19 16:38:08 +0800
  • 246385a179 add more guides heheda 2023-04-19 16:01:27 +0800