Currently, the "in_slice" APIs copy data to the CPU, which is inefficient.