
migrate XLA optimization from tf2.15 to 2.20#6

Open
wendi98 wants to merge 3 commits into joeyye-work:for-serving-2.20 from wendi98:xla-migration


wendi98 commented Jan 13, 2026

Rebuild the TensorFlow model server

To skip building libblas_mlir.so and use only the OpenBLAS interface, build with the command below, or set ENABLE_BLAS_MLIR to false explicitly.

bazel --output_user_root=./output build -c opt --distdir=/XXX/serving/downloads tensorflow_serving/model_servers:tensorflow_model_server --jobs 64 2>&1 | tee build.log

When ENABLE_BLAS_MLIR is set to true, the self-developed MLIR-based interfaces, such as __xla_cpu_runtime_KernelSelectorGEMVMLIR, become available:
bazel --output_user_root=./output build -c opt --define ENABLE_BLAS_MLIR=true --distdir=/XXX/serving/downloads tensorflow_serving/model_servers:tensorflow_model_server --jobs 64 2>&1 | tee build.log
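
The two build commands differ only in the `--define ENABLE_BLAS_MLIR=true` argument, so they can be generated from a single toggle. The following is a minimal sketch, assuming bash; `bazel_build_args` is a hypothetical helper introduced here for illustration, and the `/XXX/` path is the placeholder from the commands above, left as-is.

```shell
#!/usr/bin/env bash
# bazel_build_args is a hypothetical helper (not part of this PR).
# It prints the bazel arguments one per line so they can be reviewed
# before the build is started. Pass "true" to enable libblas_mlir.so.
bazel_build_args() {
  local enable_mlir="${1:-false}"
  local args=(--output_user_root=./output build -c opt)
  if [ "${enable_mlir}" = "true" ]; then
    # Only added for the MLIR build; omitted for the OpenBLAS-only build.
    args+=(--define ENABLE_BLAS_MLIR=true)
  fi
  args+=(
    --distdir=/XXX/serving/downloads
    tensorflow_serving/model_servers:tensorflow_model_server
    --jobs 64
  )
  printf '%s\n' "${args[@]}"
}
```

With bash 4 or later, the output could be consumed as `mapfile -t args < <(bazel_build_args true)` followed by `bazel "${args[@]}" 2>&1 | tee build.log`.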

Enable the self-developed optimizations

export SET_CPU_INS_FUSION_NOT_DUPLICATE="1"
XLA_FLAGS="--xla_cpu_enable_xnnpack=true --xla_cpu_use_kernel_selector=true "
XLA_FLAGS+="--xla_cpu_use_thunk_runtime=false --xla_cpu_use_fusion_emitters=false"
export XLA_FLAGS
export OMP_NUM_THREADS=1
export KERNEL_MAP_FILE="kernels_map.txt"
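
Before starting the server, it can help to confirm that these settings are actually present in the environment. The following is a minimal sketch, assuming bash; `check_xla_env` is a hypothetical helper that only inspects environment variables and does not query the server or XLA itself.

```shell
#!/usr/bin/env bash
# check_xla_env is a hypothetical helper (not part of this PR).
# It returns non-zero if any setting listed above is missing.
check_xla_env() {
  local status=0
  local flag name
  # Each flag must appear as a whole word inside XLA_FLAGS.
  for flag in \
    "--xla_cpu_enable_xnnpack=true" \
    "--xla_cpu_use_kernel_selector=true" \
    "--xla_cpu_use_thunk_runtime=false" \
    "--xla_cpu_use_fusion_emitters=false"; do
    case " ${XLA_FLAGS:-} " in
      *" ${flag} "*) ;;
      *) echo "missing XLA flag: ${flag}" >&2; status=1 ;;
    esac
  done
  # The remaining variables only need to be set and non-empty.
  for name in SET_CPU_INS_FUSION_NOT_DUPLICATE OMP_NUM_THREADS KERNEL_MAP_FILE; do
    if [ -z "${!name:-}" ]; then
      echo "environment variable not set: ${name}" >&2
      status=1
    fi
  done
  return "${status}"
}
```

Calling `check_xla_env || exit 1` in the launch script would stop early if, for example, a malformed `export` line left one of the variables unset.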

Create a kernels_map.txt file in the same directory as the script, with the following content:

[gemv] (*, 64:256) -> __xla_cpu_runtime_KernelSelectorGEMVMLIR
[gemm] (*) -> __xla_cpu_runtime_KernelSelectorGEMMSequential
[batch3d] (*) -> __xla_cpu_runtime_KernelSelectorBatch3DMLIR
[batch4d] (*) -> __xla_cpu_runtime_KernelSelectorBatch4DMLIR
[argmax] (*) -> __xla_cpu_runtime_ArgMax3DSequential
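
One way to create this file is a heredoc. This is a sketch, assuming bash and assuming it runs from the directory that holds the script, so that the relative `KERNEL_MAP_FILE="kernels_map.txt"` resolves to the file written here.

```shell
#!/usr/bin/env bash
set -eu
# Assumption: the current directory is the one containing the launch script.
# The quoted 'EOF' keeps the shell from expanding '*' or other characters.
cat > kernels_map.txt <<'EOF'
[gemv] (*, 64:256) -> __xla_cpu_runtime_KernelSelectorGEMVMLIR
[gemm] (*) -> __xla_cpu_runtime_KernelSelectorGEMMSequential
[batch3d] (*) -> __xla_cpu_runtime_KernelSelectorBatch3DMLIR
[batch4d] (*) -> __xla_cpu_runtime_KernelSelectorBatch4DMLIR
[argmax] (*) -> __xla_cpu_runtime_ArgMax3DSequential
EOF
echo "wrote $(grep -c '' kernels_map.txt) entries to kernels_map.txt"
```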

When the intra-op thread count ("intra") is greater than 1, the parallel kernels can be used instead:

[gemv] (*, 64:256) -> __xla_cpu_runtime_KernelSelectorGEMVMLIR
[gemm] (*) -> __xla_cpu_runtime_KernelSelectorGEMMParallel
[batch3d] (*) -> __xla_cpu_runtime_KernelSelectorBatch3DParallel
[batch4d] (*) -> __xla_cpu_runtime_KernelSelectorBatch4DParallel
[argmax] (*) -> __xla_cpu_runtime_ArgMax3DParallel
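
If both variants are kept on disk, the choice between them can be scripted. The following is a minimal sketch, assuming bash; the `select_kernel_map` helper and the file names `kernels_map.sequential.txt` and `kernels_map.parallel.txt` are hypothetical and not defined by this PR.

```shell
#!/usr/bin/env bash
# select_kernel_map is a hypothetical helper (not part of this PR).
# It prints the name of the map variant to use for a given intra thread
# count: the parallel variant above 1, the sequential variant otherwise.
select_kernel_map() {
  local intra_threads="${1:-1}"
  if [ "${intra_threads}" -gt 1 ]; then
    echo "kernels_map.parallel.txt"
  else
    echo "kernels_map.sequential.txt"
  fi
}
```

A launch script could then run `cp "$(select_kernel_map 4)" kernels_map.txt`, which keeps `KERNEL_MAP_FILE="kernels_map.txt"` unchanged.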
