mkdir -p build
mkdir -p dependencies
ENVIRONMENT
============================
CUDA_VERSION: 118
============================
NVCC path: /usr/local/cuda/bin/nvcc
GPP path: /usr/bin/g++ VERSION: g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
CUDA_HOME: /usr/local/cuda
CONDA_PREFIX:
PATH: /root/miniconda3/envs/py3.9/bin:/root/miniconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
============================
/usr/local/cuda/bin/nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc /workspace/bitsandbytes/csrc/ops.cu /workspace/bitsandbytes/csrc/kernels.cu -I /usr/local/cuda/include -I /workspace/bitsandbytes/csrc -I /include -I /workspace/bitsandbytes/include -L /usr/local/cuda/lib64 -lcudart -lcublas -lcublasLt -lcusparse -L /lib --output-directory /workspace/bitsandbytes/build
ptxas info : 15 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_75'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : 15 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_80'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : 15 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_86'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas info : 31 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_75'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI13__nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI13__nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 74 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI13__nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI13__nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI6__halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI6__halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 59 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 29 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 36 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 36 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 36 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 29 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 36 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 36 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 36 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 38 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 45 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 45 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 45 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 51 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 51 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 57 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 51 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 50 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 50 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 59 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 50 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii' for 'sm_75'
ptxas info : Function properties for _Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 35 registers, 376 bytes cmem[0]
ptxas info : Compiling entry function '_Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii' for 'sm_75'
ptxas info : Function properties for _Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 35 registers, 376 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 484 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 484 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 115 registers, 464 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 115 registers, 464 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 57 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 57 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 55 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 49 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 49 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 50 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 49 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 50 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 50 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 53 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI13__nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI13__nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 53 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 53 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 43 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kEstimateQuantilesI6__halfEvPT_PffS1_i' for 'sm_75'
ptxas info : Function properties for _Z18kEstimateQuantilesI6__halfEvPT_PffS1_i
16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 82 registers, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kEstimateQuantilesIfEvPT_PffS0_i' for 'sm_75'
ptxas info : Function properties for _Z18kEstimateQuantilesIfEvPT_PffS0_i
32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 84 registers, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_75'
ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 38 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_75'
ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 36 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii' for 'sm_75'
ptxas info : Function properties for _Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 42 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 42 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 43 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 35 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_75'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_75'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_75'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_75'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_75'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_75'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z16kExtractOutliersILi4EEvPcPiS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z16kExtractOutliersILi4EEvPcPiS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 396 bytes cmem[0]
ptxas info : Compiling entry function '_Z16kExtractOutliersILi3EEvPcPiS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z16kExtractOutliersILi3EEvPcPiS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 396 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6__halfLi160EEviiiPT_PhPfS2_iiii' for 'sm_75'
ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6__halfLi160EEviiiPT_PhPfS2_iiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 416 bytes cmem[0], 48 bytes cmem[2]
ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6__halfLi128EEviiiPT_PhPfS2_iiii' for 'sm_75'
ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6__halfLi128EEviiiPT_PhPfS2_iiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 416 bytes cmem[0], 48 bytes cmem[2]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi96EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi96EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi64EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi64EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi32EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi32EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi128EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi128EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi160EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi160EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi192EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi192EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi256EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi256EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi96EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi96EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi64EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi64EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi32EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi32EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi128EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi128EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi160EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi160EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi192EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi192EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi256EEviiiPT_S2_S2_iii' for 'sm_75'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi256EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z5kfuncIfLi2EEvPT_S1_S0_l' for 'sm_75'
ptxas info : Function properties for _Z5kfuncIfLi2EEvPT_S1_S0_l
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_Z5kfuncIfLi1EEvPT_S1_S0_l' for 'sm_75'
ptxas info : Function properties for _Z5kfuncIfLi1EEvPT_S1_S0_l
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_Z5kfuncIhLi0EEvPT_S1_S0_l' for 'sm_75'
ptxas info : Function properties for _Z5kfuncIhLi0EEvPT_S1_S0_l
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_Z5kfuncIfLi0EEvPT_S1_S0_l' for 'sm_75'
ptxas info : Function properties for _Z5kfuncIfLi0EEvPT_S1_S0_l
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii' for 'sm_75'
ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii' for 'sm_75'
ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11kDequantizePfPhS_i' for 'sm_75'
ptxas info : Function properties for _Z11kDequantizePfPhS_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 12 registers, 1024 bytes smem, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z9kQuantizePfS_Phi' for 'sm_75'
ptxas info : Function properties for _Z9kQuantizePfS_Phi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 52 registers, 21520 bytes smem, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z22kHistogramScatterAdd2DPfPiS0_S_ii' for 'sm_75'
ptxas info : Function properties for _Z22kHistogramScatterAdd2DPfPiS0_S_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 392 bytes cmem[0]
ptxas info : Function properties for _Z9dQuantizeILi1EEhPfff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9dQuantizeILi0EEhPfff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z12dQuantizeNF4f
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z14dDequantizeNF4h
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z15dhDequantizeNF4h
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z12dQuantizeFP4f
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z18dDequantizeFP4Treehf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z15d2DequantizeFP4h
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z14dDequantizeFP4hf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMinPff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMaxPff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas info : 31 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_80'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI13__nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI13__nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 74 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI13__nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI13__nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI6__halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI6__halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi
8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 56 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 56 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii' for 'sm_80'
ptxas info : Function properties for _Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 376 bytes cmem[0]
ptxas info : Compiling entry function '_Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii' for 'sm_80'
ptxas info : Function properties for _Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 376 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 484 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 484 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 115 registers, 464 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 120 registers, 464 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 55 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 55 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 56 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI13__nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI13__nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kEstimateQuantilesI6__halfEvPT_PffS1_i' for 'sm_80'
ptxas info : Function properties for _Z18kEstimateQuantilesI6__halfEvPT_PffS1_i
16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 82 registers, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kEstimateQuantilesIfEvPT_PffS0_i' for 'sm_80'
ptxas info : Function properties for _Z18kEstimateQuantilesIfEvPT_PffS0_i
32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 83 registers, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_80'
ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_80'
ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii' for 'sm_80'
ptxas info : Function properties for _Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 36 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_80'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_80'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_80'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_80'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_80'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_80'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z16kExtractOutliersILi4EEvPcPiS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z16kExtractOutliersILi4EEvPcPiS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 396 bytes cmem[0]
ptxas info : Compiling entry function '_Z16kExtractOutliersILi3EEvPcPiS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z16kExtractOutliersILi3EEvPcPiS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 13 registers, 396 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6__halfLi160EEviiiPT_PhPfS2_iiii' for 'sm_80'
ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6__halfLi160EEviiiPT_PhPfS2_iiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6__halfLi128EEviiiPT_PhPfS2_iiii' for 'sm_80'
ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6__halfLi128EEviiiPT_PhPfS2_iiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi96EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi96EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi64EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi64EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi32EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi32EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi128EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi128EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi160EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi160EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi192EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi192EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi256EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi256EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi96EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi96EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi64EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi64EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi32EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi32EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi128EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi128EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi160EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi160EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi192EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi192EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi256EEviiiPT_S2_S2_iii' for 'sm_80'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi256EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 167 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z5kfuncIfLi2EEvPT_S1_S0_l' for 'sm_80'
ptxas info : Function properties for _Z5kfuncIfLi2EEvPT_S1_S0_l
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_Z5kfuncIfLi1EEvPT_S1_S0_l' for 'sm_80'
ptxas info : Function properties for _Z5kfuncIfLi1EEvPT_S1_S0_l
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_Z5kfuncIhLi0EEvPT_S1_S0_l' for 'sm_80'
ptxas info : Function properties for _Z5kfuncIhLi0EEvPT_S1_S0_l
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_Z5kfuncIfLi0EEvPT_S1_S0_l' for 'sm_80'
ptxas info : Function properties for _Z5kfuncIfLi0EEvPT_S1_S0_l
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii' for 'sm_80'
ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 29 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii' for 'sm_80'
ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11kDequantizePfPhS_i' for 'sm_80'
ptxas info : Function properties for _Z11kDequantizePfPhS_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 12 registers, 1024 bytes smem, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z9kQuantizePfS_Phi' for 'sm_80'
ptxas info : Function properties for _Z9kQuantizePfS_Phi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 51 registers, 21520 bytes smem, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z22kHistogramScatterAdd2DPfPiS0_S_ii' for 'sm_80'
ptxas info : Function properties for _Z22kHistogramScatterAdd2DPfPiS0_S_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 392 bytes cmem[0]
ptxas info : Function properties for _Z9dQuantizeILi1EEhPfff
8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads
ptxas info : Function properties for _Z9dQuantizeILi0EEhPfff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z12dQuantizeNF4f
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z14dDequantizeNF4h
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z15dhDequantizeNF4h
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z12dQuantizeFP4f
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z18dDequantizeFP4Treehf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z15d2DequantizeFP4h
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z14dDequantizeFP4hf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMinPff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMaxPff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas info : 31 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_86'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI13__nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI13__nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 74 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI13__nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI13__nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi
16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI6__halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI6__halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi
16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 38 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii' for 'sm_86'
ptxas info : Function properties for _Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 37 registers, 376 bytes cmem[0]
ptxas info : Compiling entry function '_Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii' for 'sm_86'
ptxas info : Function properties for _Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 376 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 484 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 484 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 115 registers, 464 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 120 registers, 464 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 62 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 62 registers, 452 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 55 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 55 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 56 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 428 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI13__nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI13__nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kEstimateQuantilesI6__halfEvPT_PffS1_i' for 'sm_86'
ptxas info : Function properties for _Z18kEstimateQuantilesI6__halfEvPT_PffS1_i
16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 82 registers, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kEstimateQuantilesIfEvPT_PffS0_i' for 'sm_86'
ptxas info : Function properties for _Z18kEstimateQuantilesIfEvPT_PffS0_i
32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 83 registers, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_86'
ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 38 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_86'
ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 36 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii' for 'sm_86'
ptxas info : Function properties for _Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 37 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_86'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_86'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_86'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_86'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_86'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_86'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z16kExtractOutliersILi4EEvPcPiS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z16kExtractOutliersILi4EEvPcPiS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 396 bytes cmem[0]
ptxas info : Compiling entry function '_Z16kExtractOutliersILi3EEvPcPiS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z16kExtractOutliersILi3EEvPcPiS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 13 registers, 396 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6__halfLi160EEviiiPT_PhPfS2_iiii' for 'sm_86'
ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6__halfLi160EEviiiPT_PhPfS2_iiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 72 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6__halfLi128EEviiiPT_PhPfS2_iiii' for 'sm_86'
ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6__halfLi128EEviiiPT_PhPfS2_iiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 72 registers, 416 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi96EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi96EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi64EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi64EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi32EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi32EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi128EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi128EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi160EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi160EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi192EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi192EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi256EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi256EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi96EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi96EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi64EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi64EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi32EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi32EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi128EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi128EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi160EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi160EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi192EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi192EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi256EEviiiPT_S2_S2_iii' for 'sm_86'
ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi256EEviiiPT_S2_S2_iii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z5kfuncIfLi2EEvPT_S1_S0_l' for 'sm_86'
ptxas info : Function properties for _Z5kfuncIfLi2EEvPT_S1_S0_l
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_Z5kfuncIfLi1EEvPT_S1_S0_l' for 'sm_86'
ptxas info : Function properties for _Z5kfuncIfLi1EEvPT_S1_S0_l
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_Z5kfuncIhLi0EEvPT_S1_S0_l' for 'sm_86'
ptxas info : Function properties for _Z5kfuncIhLi0EEvPT_S1_S0_l
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_Z5kfuncIfLi0EEvPT_S1_S0_l' for 'sm_86'
ptxas info : Function properties for _Z5kfuncIfLi0EEvPT_S1_S0_l
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii' for 'sm_86'
ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii' for 'sm_86'
ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11kDequantizePfPhS_i' for 'sm_86'
ptxas info : Function properties for _Z11kDequantizePfPhS_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 12 registers, 1024 bytes smem, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z9kQuantizePfS_Phi' for 'sm_86'
ptxas info : Function properties for _Z9kQuantizePfS_Phi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 51 registers, 21520 bytes smem, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z22kHistogramScatterAdd2DPfPiS0_S_ii' for 'sm_86'
ptxas info : Function properties for _Z22kHistogramScatterAdd2DPfPiS0_S_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 392 bytes cmem[0]
ptxas info : Function properties for _Z9dQuantizeILi1EEhPfff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9dQuantizeILi0EEhPfff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z12dQuantizeNF4f
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z14dDequantizeNF4h
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z15dhDequantizeNF4h
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z12dQuantizeFP4f
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z18dDequantizeFP4Treehf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z15d2DequantizeFP4h
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z14dDequantizeFP4hf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMinPff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMaxPff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
/usr/local/cuda/bin/nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -Xcompiler '-fPIC' -dlink /workspace/bitsandbytes/build/ops.o /workspace/bitsandbytes/build/kernels.o -o /workspace/bitsandbytes/build/link.o
/usr/bin/g++ -std=c++14 -DBUILD_CUDA -shared -fPIC -I /usr/local/cuda/include -I /workspace/bitsandbytes/csrc -I /include -I /workspace/bitsandbytes/include /workspace/bitsandbytes/build/ops.o /workspace/bitsandbytes/build/kernels.o /workspace/bitsandbytes/build/link.o /workspace/bitsandbytes/csrc/common.cpp /workspace/bitsandbytes/csrc/cpu_ops.cpp /workspace/bitsandbytes/csrc/pythonInterface.c -o ./bitsandbytes/libbitsandbytes_cuda118.so -L /usr/local/cuda/lib64 -lcudart -lcublas -lcublasLt -lcusparse -L /lib
libs: ['libbitsandbytes_cuda118.so']
running install
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
creating bitsandbytes.egg-info
writing bitsandbytes.egg-info/PKG-INFO
writing dependency_links to bitsandbytes.egg-info/dependency_links.txt
writing top-level names to bitsandbytes.egg-info/top_level.txt
writing manifest file 'bitsandbytes.egg-info/SOURCES.txt'
reading manifest file 'bitsandbytes.egg-info/SOURCES.txt'
adding license file 'LICENSE'
adding license file 'NOTICE.md'
writing manifest file 'bitsandbytes.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/lib
creating build/lib/bitsandbytes
copying bitsandbytes/__init__.py -> build/lib/bitsandbytes
copying bitsandbytes/__main__.py -> build/lib/bitsandbytes
copying bitsandbytes/cextension.py -> build/lib/bitsandbytes
copying bitsandbytes/functional.py -> build/lib/bitsandbytes
copying bitsandbytes/utils.py -> build/lib/bitsandbytes
creating build/lib/bitsandbytes/autograd
copying bitsandbytes/autograd/__init__.py -> build/lib/bitsandbytes/autograd
copying bitsandbytes/autograd/_functions.py -> build/lib/bitsandbytes/autograd
creating build/lib/bitsandbytes/cuda_setup
copying bitsandbytes/cuda_setup/__init__.py -> build/lib/bitsandbytes/cuda_setup
copying bitsandbytes/cuda_setup/env_vars.py -> build/lib/bitsandbytes/cuda_setup
copying bitsandbytes/cuda_setup/main.py -> build/lib/bitsandbytes/cuda_setup
creating build/lib/bitsandbytes/nn
copying bitsandbytes/nn/__init__.py -> build/lib/bitsandbytes/nn
copying bitsandbytes/nn/modules.py -> build/lib/bitsandbytes/nn
copying bitsandbytes/nn/triton_based_modules.py -> build/lib/bitsandbytes/nn
creating build/lib/bitsandbytes/optim
copying bitsandbytes/optim/__init__.py -> build/lib/bitsandbytes/optim
copying bitsandbytes/optim/adagrad.py -> build/lib/bitsandbytes/optim
copying bitsandbytes/optim/adam.py -> build/lib/bitsandbytes/optim
copying bitsandbytes/optim/adamw.py -> build/lib/bitsandbytes/optim
copying bitsandbytes/optim/lamb.py -> build/lib/bitsandbytes/optim
copying bitsandbytes/optim/lars.py -> build/lib/bitsandbytes/optim
copying bitsandbytes/optim/lion.py -> build/lib/bitsandbytes/optim
copying bitsandbytes/optim/optimizer.py -> build/lib/bitsandbytes/optim
copying bitsandbytes/optim/rmsprop.py -> build/lib/bitsandbytes/optim
copying bitsandbytes/optim/sgd.py -> build/lib/bitsandbytes/optim
creating build/lib/bitsandbytes/research
copying bitsandbytes/research/__init__.py -> build/lib/bitsandbytes/research
creating build/lib/bitsandbytes/triton
copying bitsandbytes/triton/__init__.py -> build/lib/bitsandbytes/triton
copying bitsandbytes/triton/dequantize_rowwise.py -> build/lib/bitsandbytes/triton
copying bitsandbytes/triton/int8_matmul_mixed_dequanitze.py -> build/lib/bitsandbytes/triton
copying bitsandbytes/triton/int8_matmul_rowwise_dequantize.py -> build/lib/bitsandbytes/triton
copying bitsandbytes/triton/quantize_columnwise_and_transpose.py -> build/lib/bitsandbytes/triton
copying bitsandbytes/triton/quantize_global.py -> build/lib/bitsandbytes/triton
copying bitsandbytes/triton/quantize_rowwise.py -> build/lib/bitsandbytes/triton
copying bitsandbytes/triton/triton_utils.py -> build/lib/bitsandbytes/triton
creating build/lib/bitsandbytes/research/autograd
copying bitsandbytes/research/autograd/__init__.py -> build/lib/bitsandbytes/research/autograd
copying bitsandbytes/research/autograd/_functions.py -> build/lib/bitsandbytes/research/autograd
creating build/lib/bitsandbytes/research/nn
copying bitsandbytes/research/nn/__init__.py -> build/lib/bitsandbytes/research/nn
copying bitsandbytes/research/nn/modules.py -> build/lib/bitsandbytes/research/nn
copying bitsandbytes/libbitsandbytes_cuda118.so -> build/lib/bitsandbytes
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/bitsandbytes
copying build/lib/bitsandbytes/__init__.py -> build/bdist.linux-x86_64/egg/bitsandbytes
copying build/lib/bitsandbytes/__main__.py -> build/bdist.linux-x86_64/egg/bitsandbytes
copying build/lib/bitsandbytes/cextension.py -> build/bdist.linux-x86_64/egg/bitsandbytes
copying build/lib/bitsandbytes/functional.py -> build/bdist.linux-x86_64/egg/bitsandbytes
copying build/lib/bitsandbytes/utils.py -> build/bdist.linux-x86_64/egg/bitsandbytes
creating build/bdist.linux-x86_64/egg/bitsandbytes/autograd
copying build/lib/bitsandbytes/autograd/__init__.py -> build/bdist.linux-x86_64/egg/bitsandbytes/autograd
copying build/lib/bitsandbytes/autograd/_functions.py -> build/bdist.linux-x86_64/egg/bitsandbytes/autograd
creating build/bdist.linux-x86_64/egg/bitsandbytes/cuda_setup
copying build/lib/bitsandbytes/cuda_setup/__init__.py -> build/bdist.linux-x86_64/egg/bitsandbytes/cuda_setup
copying build/lib/bitsandbytes/cuda_setup/env_vars.py -> build/bdist.linux-x86_64/egg/bitsandbytes/cuda_setup
copying build/lib/bitsandbytes/cuda_setup/main.py -> build/bdist.linux-x86_64/egg/bitsandbytes/cuda_setup
creating build/bdist.linux-x86_64/egg/bitsandbytes/nn
copying build/lib/bitsandbytes/nn/__init__.py -> build/bdist.linux-x86_64/egg/bitsandbytes/nn
copying build/lib/bitsandbytes/nn/modules.py -> build/bdist.linux-x86_64/egg/bitsandbytes/nn
copying build/lib/bitsandbytes/nn/triton_based_modules.py -> build/bdist.linux-x86_64/egg/bitsandbytes/nn
creating build/bdist.linux-x86_64/egg/bitsandbytes/optim
copying build/lib/bitsandbytes/optim/__init__.py -> build/bdist.linux-x86_64/egg/bitsandbytes/optim
copying build/lib/bitsandbytes/optim/adagrad.py -> build/bdist.linux-x86_64/egg/bitsandbytes/optim
copying build/lib/bitsandbytes/optim/adam.py -> build/bdist.linux-x86_64/egg/bitsandbytes/optim
copying build/lib/bitsandbytes/optim/adamw.py -> build/bdist.linux-x86_64/egg/bitsandbytes/optim
copying build/lib/bitsandbytes/optim/lamb.py -> build/bdist.linux-x86_64/egg/bitsandbytes/optim
copying build/lib/bitsandbytes/optim/lars.py -> build/bdist.linux-x86_64/egg/bitsandbytes/optim
copying build/lib/bitsandbytes/optim/lion.py -> build/bdist.linux-x86_64/egg/bitsandbytes/optim
copying build/lib/bitsandbytes/optim/optimizer.py -> build/bdist.linux-x86_64/egg/bitsandbytes/optim
copying build/lib/bitsandbytes/optim/rmsprop.py -> build/bdist.linux-x86_64/egg/bitsandbytes/optim
copying build/lib/bitsandbytes/optim/sgd.py -> build/bdist.linux-x86_64/egg/bitsandbytes/optim
creating build/bdist.linux-x86_64/egg/bitsandbytes/research
copying build/lib/bitsandbytes/research/__init__.py -> build/bdist.linux-x86_64/egg/bitsandbytes/research
creating build/bdist.linux-x86_64/egg/bitsandbytes/research/autograd
copying build/lib/bitsandbytes/research/autograd/__init__.py -> build/bdist.linux-x86_64/egg/bitsandbytes/research/autograd
copying build/lib/bitsandbytes/research/autograd/_functions.py -> build/bdist.linux-x86_64/egg/bitsandbytes/research/autograd
creating build/bdist.linux-x86_64/egg/bitsandbytes/research/nn
copying build/lib/bitsandbytes/research/nn/__init__.py -> build/bdist.linux-x86_64/egg/bitsandbytes/research/nn
copying build/lib/bitsandbytes/research/nn/modules.py -> build/bdist.linux-x86_64/egg/bitsandbytes/research/nn
creating build/bdist.linux-x86_64/egg/bitsandbytes/triton
copying build/lib/bitsandbytes/triton/__init__.py -> build/bdist.linux-x86_64/egg/bitsandbytes/triton
copying build/lib/bitsandbytes/triton/dequantize_rowwise.py -> build/bdist.linux-x86_64/egg/bitsandbytes/triton
copying build/lib/bitsandbytes/triton/int8_matmul_mixed_dequanitze.py -> build/bdist.linux-x86_64/egg/bitsandbytes/triton
copying build/lib/bitsandbytes/triton/int8_matmul_rowwise_dequantize.py -> build/bdist.linux-x86_64/egg/bitsandbytes/triton
copying build/lib/bitsandbytes/triton/quantize_columnwise_and_transpose.py -> build/bdist.linux-x86_64/egg/bitsandbytes/triton
copying build/lib/bitsandbytes/triton/quantize_global.py -> build/bdist.linux-x86_64/egg/bitsandbytes/triton
copying build/lib/bitsandbytes/triton/quantize_rowwise.py -> build/bdist.linux-x86_64/egg/bitsandbytes/triton
copying build/lib/bitsandbytes/triton/triton_utils.py -> build/bdist.linux-x86_64/egg/bitsandbytes/triton
copying build/lib/bitsandbytes/libbitsandbytes_cuda118.so -> build/bdist.linux-x86_64/egg/bitsandbytes
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/__init__.py to __init__.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/__main__.py to __main__.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/cextension.py to cextension.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/functional.py to functional.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/utils.py to utils.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/autograd/__init__.py to __init__.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/autograd/_functions.py to _functions.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/cuda_setup/__init__.py to __init__.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/cuda_setup/env_vars.py to env_vars.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/cuda_setup/main.py to main.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/nn/__init__.py to __init__.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/nn/modules.py to modules.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/nn/triton_based_modules.py to triton_based_modules.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/optim/__init__.py to __init__.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/optim/adagrad.py to adagrad.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/optim/adam.py to adam.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/optim/adamw.py to adamw.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/optim/lamb.py to lamb.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/optim/lars.py to lars.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/optim/lion.py to lion.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/optim/optimizer.py to optimizer.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/optim/rmsprop.py to rmsprop.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/optim/sgd.py to sgd.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/research/__init__.py to __init__.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/research/autograd/__init__.py to __init__.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/research/autograd/_functions.py to _functions.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/research/nn/__init__.py to __init__.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/research/nn/modules.py to modules.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/triton/__init__.py to __init__.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/triton/dequantize_rowwise.py to dequantize_rowwise.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/triton/int8_matmul_mixed_dequanitze.py to int8_matmul_mixed_dequanitze.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/triton/int8_matmul_rowwise_dequantize.py to int8_matmul_rowwise_dequantize.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/triton/quantize_columnwise_and_transpose.py to quantize_columnwise_and_transpose.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/triton/quantize_global.py to quantize_global.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/triton/quantize_rowwise.py to quantize_rowwise.cpython-39.pyc
byte-compiling build/bdist.linux-x86_64/egg/bitsandbytes/triton/triton_utils.py to triton_utils.cpython-39.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying bitsandbytes.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying bitsandbytes.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying bitsandbytes.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying bitsandbytes.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
bitsandbytes.cuda_setup.__pycache__.main.cpython-39: module references __file__
creating dist
creating 'dist/bitsandbytes-0.39.0-py3.9.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing bitsandbytes-0.39.0-py3.9.egg
removing '/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg' (and everything under it)
creating /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg
Extracting bitsandbytes-0.39.0-py3.9.egg to /root/miniconda3/envs/py3.9/lib/python3.9/site-packages
bitsandbytes 0.39.0 is already the active version in easy-install.pth
Installed /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg
Processing dependencies for bitsandbytes==0.39.0
Finished processing dependencies for bitsandbytes==0.39.0
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /workspace/bitsandbytes/bitsandbytes/libbitsandbytes_cuda118.so
/workspace/bitsandbytes/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib64'), PosixPath('/usr/local/nvidia/lib')}
warn(msg)
/workspace/bitsandbytes/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/workspace/bitsandbytes/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('module'), PosixPath('//matplotlib_inline.backend_inline')}
warn(msg)
/workspace/bitsandbytes/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIK1tFOFrWbmoa2ckCJYhzgBHKTSMeR/AeuScCCzugqlI utensilcandel@gmail.com')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/workspace/bitsandbytes/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /workspace/bitsandbytes/bitsandbytes/libbitsandbytes_cuda118.so...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ /usr/local CUDA PATHS +++++++++++++++++++
/usr/local/cuda-11.8/compat/libcuda.so
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudart.so
/usr/local/cuda-11.8/targets/x86_64-linux/lib/stubs/libcuda.so
+++++++++++++++ WORKING DIRECTORY CUDA PATHS +++++++++++++++
/workspace/bitsandbytes/bitsandbytes/libbitsandbytes_cuda118.so
/workspace/bitsandbytes/build/lib/bitsandbytes/libbitsandbytes_cuda118.so
++++++++++++++++++ LD_LIBRARY CUDA PATHS +++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
COMPILED_WITH_CUDA = True
COMPUTE_CAPABILITIES_PER_GPU = ['8.6']
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Running a quick check that:
+ library is importable
+ CUDA function is callable
WARNING: Please be sure to sanitize sensible info from any such env vars!
SUCCESS!
Installation was successful!
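
Note on the duplicate libcudart warning in the log above: the two files it reports, /usr/local/cuda/lib64/libcudart.so and /usr/local/cuda/lib64/libcudart.so.11.0, sit in the same directory, and on a typical CUDA toolkit install such names are symlinks to a single versioned library. Whether that holds on this machine is an assumption, not something the log confirms. A minimal sketch for checking it is below. It groups candidate files by the real file they resolve to. The helper name find_cudart_candidates and the glob pattern are hypothetical, and this is not the logic bitsandbytes itself uses in cuda_setup/main.py.

```python
from collections import defaultdict
from pathlib import Path


def find_cudart_candidates(directories, pattern="libcudart.so*"):
    """Group libcudart candidates by the real file each one resolves to.

    Hypothetical helper for illustration only; it is not part of bitsandbytes.
    Returns a dict mapping resolved path -> list of names found for it.
    """
    groups = defaultdict(list)
    for directory in directories:
        d = Path(directory)
        if not d.is_dir():
            # Skip entries that do not exist, such as /usr/local/nvidia/lib in the log.
            continue
        for candidate in sorted(d.glob(pattern)):
            # resolve() follows symlinks, so aliases of one library share a key.
            groups[candidate.resolve()].append(candidate)
    return dict(groups)


if __name__ == "__main__":
    # Directory taken from the "CUDA runtime path found" line in the log.
    for real, names in find_cudart_candidates(["/usr/local/cuda/lib64"]).items():
        print(real, "<-", ", ".join(str(n) for n in names))
```

If every name maps to one resolved path, the "duplicates" are aliases of a single runtime and the warning is likely harmless. More than one resolved path would mean distinct runtime libraries are visible, which is the situation the warning describes.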