Commit graph

  • f2d7df38f5 add more guides heheda 2023-04-19 10:43:05 +0800
  • 8266a71f5b fix bug in scripts heheda 2023-04-18 20:59:32 +0800
  • 77fbf5b499 change path heheda 2023-04-18 15:06:53 +0800
  • ac6bd1feb6 copy file from ControlFlow repo heheda 2023-04-18 14:31:53 +0800
  • 8f09a102c2 update ir yuqxia 2023-04-17 11:33:27 +0000
  • df3126ce44 Add artifect files. Shi Yining 2023-04-17 08:42:25 +0000
  • 810a2e6cd8 add mem eff attn basic yuqxia 2023-04-17 08:45:14 +0000
  • cda34737a8 Update profile scripts for baselines. Shi Yining 2023-04-17 07:50:29 +0000
  • 845c52cf48 add __syncthreads() to cf kernels heheda 2023-04-17 15:07:46 +0800
  • b03e0a97a4 support layout of layoutdot LeiWang1999 2023-04-15 23:54:15 -0800
  • 7b605e3726 add dot permutation pass LeiWang1999 2023-04-15 07:22:11 -0800
  • 26685bf03c Add a docker file. Shi Yining 2023-04-14 11:32:47 +0000
  • f1383415bb wrap python part with ifdef heheda 2023-04-14 13:58:34 +0800
  • 6b681e3a4d bug fix .. LeiWang1999 2023-04-11 23:52:19 -0800
  • 48bc398e93 apply code style, revert typedef_int change yuqxia 2023-04-12 06:58:04 +0000
  • 90df32d7d0 resolve conflict yuqxia 2023-04-12 06:46:28 +0000
  • ca473321f5 fix bug yuqxia 2023-04-12 06:44:46 +0000
  • b62ce8400a re-type the CUDA_ARCH String LeiWang1999 2023-04-11 22:36:29 -0800
  • 7eb413f166 fix bug (#515) Yuqing 2023-04-12 12:19:55 +0800
  • e24e0c6833 fix bug yuqxia/fix_register_fusion_pass yuqing 2023-04-12 11:26:03 +0900
  • 4cb6ce3f07 fix bugs of register fusion pass LeiWang1999 2023-04-11 04:23:57 -0800
  • 556158b6c2 add support for int16_t load (bloom fp16 model) LeiWang1999 2023-04-11 04:23:35 -0800
  • f0c358d685 fix blockfusion sync problem heheda 2023-04-10 16:27:51 +0800
  • 5ccabaca7d Update Volta Tensorcore template. Shi Yining 2023-04-10 04:10:00 +0000
  • 1859e9f33e Rename python package name to welder. Shi Yining 2023-04-07 13:20:37 +0000
  • a73a8b9cfe Update reduce step policy. Shi Yining 2023-04-07 01:35:56 +0000
  • 92bd514054 Update README.md Shi Yining 2023-04-07 01:35:45 +0000
  • 9ec9afd68a Some bug fix. Shi Yining 2023-04-06 07:55:31 +0000
  • d55331f2a5 Update readme install command Shi Yining 2023-04-05 00:41:55 +0000
  • e930bf9f99 Profile kernel in a seperate process Shi Yining 2023-04-05 09:22:18 +0900
  • 5d6413f8e0 Fix local fuse when op is skipped Shi Yining 2023-04-04 23:26:22 +0900
  • df5b963d7f fixup Shi Yining 2023-04-03 08:46:28 +0000
  • 65c6f4f5d7 search unroll width heheda 2023-04-03 16:08:13 +0800
  • f5e67390e5 dump kerneldb requests heheda 2023-04-03 15:49:18 +0800
  • 60a287afab fix depunit bug in loop heheda 2023-04-03 11:48:18 +0800
  • c8cce319e4 remove cudadevicereset heheda 2023-04-03 11:48:02 +0800
  • 6c403e256d fixup Shi Yining 2023-04-02 04:48:34 +0000
  • d210a5fde4 Add cache tensor to policy Shi Yining 2023-03-31 05:00:59 +0000
  • 547136da2e reorganize parameters heheda 2023-03-30 21:02:00 +0800
  • 724cbf5693 add to nhwc pass yuqxia 2023-03-30 06:13:30 +0000
  • 8ae6e90877 disable external result memory (#513) Ziming Miao 2023-03-29 12:57:29 +0800
  • d92928d327 disable external result memory zimiao 2023-03-29 12:55:15 +0800
  • ecf47cca8f add IfSingle operator heheda 2023-03-28 23:15:51 +0800
  • f23589621f merge rocm code heheda 2023-03-27 16:58:33 +0800
  • a25b432f22 [Experimental] Implement local fusion case Shi Yining 2023-03-27 06:26:53 +0000
  • 581f0e5eb0 Add inverse shape inference. Shi Yining 2023-03-26 06:46:54 +0000
  • 676a01a83a Fix an engine cache bug Shi Yining 2023-03-23 10:43:35 +0000
  • af213acc17 fix check yuqxia 2023-03-22 04:35:08 +0000
  • f86906f59c Merge branch 'lingm/xbox' of https://github.com/microsoft/nnfusion into lingm/xbox Lingxiao Ma 2023-03-17 15:49:44 +0800
  • da721d78c6 merge origin/xbox branch Lingxiao Ma 2023-03-17 15:43:33 +0800
  • 37316c75d7 Fix mma policy infer padded shared memory size Shi Yining 2023-03-17 02:19:28 +0000
  • 6eb1445f56 Add test operators and MMA tests Shi Yining 2023-03-17 10:58:33 +0900
  • 31f604c474 Add condition remove pass after vectorizing Shi Yining 2023-03-17 10:09:56 +0900
  • cd78f1adb4 merge welder yuqxia 2023-03-16 09:27:30 +0000
  • 2aa530bac4 Update TIR scheduler Shi Yining 2023-03-15 10:48:30 +0000
  • 40bea8caa8 meet int16 request in fp16 bloom model (#512) Lei Wang 2023-03-14 16:57:55 +0800
  • f338e372b6 Update te shape inference Shi Yining 2023-03-14 03:12:25 +0000
  • b382152d25 Add Nimble. Shi Yining 2023-03-11 14:26:41 +0000
  • 6cfa2be0b9 meet int16 request in fp16 bloom model LeiWang1999 2023-03-11 11:42:48 +0000
  • fc58e7756a add transpose-matmul fusion in subgraph_fusion Lingxiao Ma 2023-03-10 16:26:38 +0800
  • 9d3b56cb30 add transpose-matmul fusion in subgraph_fusion Lingxiao Ma 2023-03-10 16:08:35 +0800
  • 9253962b2c Fix FileNotFoundError when tuning_steps is 0 (#511) Hongzhou Liu 2023-03-10 13:04:52 +0800
  • e55e3b0a7d Fix FileNotFoundError when tuning_steps is 0 Hongzhou Liu 2023-03-08 00:03:03 +0800
  • 77fd82bcdb Move handle for proxy output to schedule Shi Yining 2023-03-06 06:22:07 +0000
  • bdd695fead add arch, policyv2 roller yuqing 2023-03-07 16:30:16 +0900
  • 4d2abb829b merge xbox yuqing 2023-03-07 16:26:26 +0900
  • e703e344ce added HLSL test pipeline on windows donglinb 2023-03-06 22:41:07 +0800
  • 23fa719ea5 release loaded DLL in destructor donglinb 2023-03-06 15:56:39 +0800
  • 1682bd845c Add check for vector load Your Name 2023-03-04 10:42:39 +0900
  • 8b33cc3a52 Merge branch 'microsoft:xbox' into xbox donglinb 2023-03-01 18:23:13 -0800
  • dc8389bc5e Fix tir mma schedule Your Name 2023-03-02 00:47:33 +0900
  • 9256a30f72 Update header Shi Yining 2023-03-01 12:34:41 +0000
  • a06d5e7b2b Update policy Shi Yining 2023-02-28 11:47:19 +0000
  • 317b2b3b76 Update policy stride propogate code. Shi Yining 2023-02-28 04:43:04 +0000
  • e26cc77cb7 Flags to disable emitting cudaSetDevice call (#509) Ziming Miao 2023-02-27 19:48:13 +0800
  • 3447cf49ef flags to disable emitting cudaSetDevice Ziming Miao 2023-02-27 20:46:52 +0900
  • 1c69df6807 disable const folding in onnx exportor Ziming Miao 2023-02-27 20:32:05 +0900
  • 0e494f6fc5 Yuqxia/diffuser (#506) Yuqing 2023-02-27 14:21:16 +0800
  • 9b96aa80be convert int64_t to long long yuqxia/diffuser yuqing 2023-02-27 15:07:03 +0900
  • e45d274c84 add cutlass submodule Ziming Miao 2023-02-27 13:31:33 +0900
  • 14cfed9629 delete cutlass index Ziming Miao 2023-02-27 13:29:04 +0900
  • 5121aa1757 Update V100 policy Shi Yining 2023-02-24 15:45:56 +0000
  • b38e16f864 Refine extract subgraph logic Shi Yining 2023-02-24 14:54:39 +0000
  • 4e5603207a Fix a tir op bug in te shape inference Shi Yining 2023-02-23 08:48:15 +0000
  • 257f5337b8 Add log op fix Shi Yining 2023-02-23 08:37:46 +0000
  • e168101f6c Skip parsing ir for skipped nodes Shi Yining 2023-02-23 08:37:34 +0000
  • e40f11737e Switch shape inference from antares to te. Shi Yining 2023-02-22 08:44:56 +0000
  • 211b18dd81 Add TIR pass for mma codegen Shi Yining 2023-02-22 08:33:06 +0000
  • 272d07e86c fix gatherv2 Yuqing 2023-02-22 07:57:27 +0000
  • 8546c1010e Merge branch 'microsoft:xbox' into xbox donglinb 2023-02-20 02:49:36 -0800
  • cb2ffaa37b Remove unused files Shi Yining 2023-02-19 07:57:12 +0000
  • 46758fa33b fixup Your Name 2023-02-19 16:18:54 +0900
  • 54e5e7a8fc Add mma schedule for sm80 Shi Yining 2023-02-19 03:10:50 +0000
  • 6dc7c0799b fixup Shi Yining 2023-02-18 05:02:26 +0000
  • 1e88dcabd4 Invoke all ONNX tests (#502) wx 2023-02-17 12:41:03 +0800
  • a918264fc5 Implement TIR MMA scheduler Shi Yining 2023-02-16 05:25:18 +0000
  • a9007e5f91 update antares wrapper zimiao 2023-02-15 20:33:58 +0800
  • bbdbfb3f10 update Reduce ops for opset18 and ground_truth results xbox_test Jilong Xue 2023-02-14 20:38:46 +0800
  • dfc26dd082 Merge remote-tracking branch 'origin/xbox' into xbox_test Jilong Xue 2023-02-14 13:08:01 +0800
  • fd25765ab7 Refactor schedule and build code structure Shi Yining 2023-02-12 07:17:08 +0000