migrate XLA optimization from tf2.15 to 2.20 by wendi98 · Pull Request #6 · joeyye-work/tensorflow

wendi98 · 2026-01-13T03:16:22Z

Rebuild the TensorFlow Serving model server

If you do not want to build libblas_mlir.so and only want to use the OpenBLAS interface, use the following command, or set ENABLE_BLAS_MLIR to false:

bazel --output_user_root=./output build -c opt --distdir=/XXX/serving/downloads tensorflow_serving/model_servers:tensorflow_model_server --jobs 64 2>&1 | tee build.log

When ENABLE_BLAS_MLIR is set to true, libblas_mlir.so is built as well, and you can use the self-developed MLIR-based interfaces, such as __xla_cpu_runtime_KernelSelectorGEMVMLIR:

bazel --output_user_root=./output build -c opt --define ENABLE_BLAS_MLIR=true --distdir=/XXX/serving/downloads tensorflow_serving/model_servers:tensorflow_model_server --jobs 64 2>&1 | tee build.log
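If you switch between the two variants often, a small wrapper can keep both commands in one place. This is a sketch assuming bash: build_server and DRY_RUN are hypothetical names introduced here, and /XXX/serving/downloads remains a placeholder for your own download directory. The function only assembles the two bazel commands shown above.

```shell
#!/usr/bin/env bash
# Sketch: build tensorflow_model_server with or without libblas_mlir.so.
# build_server and DRY_RUN are hypothetical names, not part of the PR.

# Usage: build_server <true|false> <distdir>
build_server() {
  local with_blas_mlir="${1:-false}"
  local distdir="${2:?usage: build_server <true|false> <distdir>}"
  local -a cmd=(bazel --output_user_root=./output build -c opt)
  if [[ "$with_blas_mlir" == "true" ]]; then
    # Also build libblas_mlir.so and the MLIR-based kernels
    cmd+=(--define ENABLE_BLAS_MLIR=true)
  fi
  cmd+=("--distdir=$distdir" tensorflow_serving/model_servers:tensorflow_model_server --jobs 64)
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Print the command instead of running it
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}" 2>&1 | tee build.log
  fi
}
```

For example, DRY_RUN=1 build_server true /XXX/serving/downloads prints the MLIR build command without starting bazel. Building the command as an array keeps each argument intact, so no eval is needed.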

Enable the self-developed optimizations

export SET_CPU_INS_FUSION_NOT_DUPLICATE="1"
XLA_FLAGS="--xla_cpu_enable_xnnpack=true --xla_cpu_use_kernel_selector=true "
XLA_FLAGS+="--xla_cpu_use_thunk_runtime=false --xla_cpu_use_fusion_emitters=false"
export XLA_FLAGS
export OMP_NUM_THREADS=1
export KERNEL_MAP_FILE="kernels_map.txt"
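A launch script can apply these settings and start the server in one step. This is a sketch assuming bash: setup_xla_env and start_model_server are hypothetical helpers, port 8500 and the positional arguments are illustrative, and only the exported variables come from this PR. --port, --model_name, --model_base_path and --tensorflow_intra_op_parallelism are standard tensorflow_model_server flags. Whether a relative KERNEL_MAP_FILE resolves against the working directory is not stated here, so passing an absolute path is the safer assumption.

```shell
#!/usr/bin/env bash
# Sketch: apply this PR's runtime settings, then start TF Serving.
# setup_xla_env / start_model_server are hypothetical helpers; port 8500 and
# the positional arguments are illustrative assumptions.

setup_xla_env() {
  export SET_CPU_INS_FUSION_NOT_DUPLICATE="1"
  XLA_FLAGS="--xla_cpu_enable_xnnpack=true --xla_cpu_use_kernel_selector=true "
  XLA_FLAGS+="--xla_cpu_use_thunk_runtime=false --xla_cpu_use_fusion_emitters=false"
  export XLA_FLAGS
  export OMP_NUM_THREADS=1
  # Keep a path set by the caller; otherwise fall back to the PR's file name.
  export KERNEL_MAP_FILE="${KERNEL_MAP_FILE:-kernels_map.txt}"
}

# Usage: start_model_server <model_name> <model_base_path> [intra_threads]
start_model_server() {
  local model_name="$1" model_base_path="$2" intra="${3:-1}"
  setup_xla_env
  # exec replaces the shell, so signals reach the server directly.
  exec tensorflow_model_server \
    --port=8500 \
    --model_name="$model_name" \
    --model_base_path="$model_base_path" \
    --tensorflow_intra_op_parallelism="$intra"
}
```

For example, KERNEL_MAP_FILE=/opt/serving/kernels_map.txt start_model_server my_model /models/my_model 4 starts the server with four intra-op threads (the paths and model name are placeholders). Using exec makes the server the main process, which suits a container entrypoint.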

Create a kernels_map.txt file in the directory where the script is located, with the following content:

[gemv] (*, 64:256) -> __xla_cpu_runtime_KernelSelectorGEMVMLIR
[gemm] (*) -> __xla_cpu_runtime_KernelSelectorGEMMSequential
[batch3d] (*) -> __xla_cpu_runtime_KernelSelectorBatch3DMLIR
[batch4d] (*) -> __xla_cpu_runtime_KernelSelectorBatch4DMLIR
[argmax] (*) -> __xla_cpu_runtime_ArgMax3DSequential

When intra-op parallelism (intra) is greater than 1, the parallel kernels can be used instead:

[gemv] (*, 64:256) -> __xla_cpu_runtime_KernelSelectorGEMVMLIR
[gemm] (*) -> __xla_cpu_runtime_KernelSelectorGEMMParallel
[batch3d] (*) -> __xla_cpu_runtime_KernelSelectorBatch3DParallel
[batch4d] (*) -> __xla_cpu_runtime_KernelSelectorBatch4DParallel
[argmax] (*) -> __xla_cpu_runtime_ArgMax3DParallel
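The choice between the two maps can be scripted. This sketch assumes bash and assumes that "intra" is the intra-op thread count passed to the server (for example via --tensorflow_intra_op_parallelism); write_kernel_map is a hypothetical helper, and the map entries are copied verbatim from the two listings above.

```shell
#!/usr/bin/env bash
# Sketch: write kernels_map.txt with sequential or parallel kernels.
# write_kernel_map is a hypothetical helper; entries are copied from the PR.

# Usage: write_kernel_map <intra_threads> [output_path]
write_kernel_map() {
  local intra="${1:-1}" out="${2:-kernels_map.txt}"
  if (( intra > 1 )); then
    # intra > 1: parallel GEMM, batch and argmax kernels
    cat > "$out" <<'EOF'
[gemv] (*, 64:256) -> __xla_cpu_runtime_KernelSelectorGEMVMLIR
[gemm] (*) -> __xla_cpu_runtime_KernelSelectorGEMMParallel
[batch3d] (*) -> __xla_cpu_runtime_KernelSelectorBatch3DParallel
[batch4d] (*) -> __xla_cpu_runtime_KernelSelectorBatch4DParallel
[argmax] (*) -> __xla_cpu_runtime_ArgMax3DParallel
EOF
  else
    # intra <= 1: sequential and MLIR kernels
    cat > "$out" <<'EOF'
[gemv] (*, 64:256) -> __xla_cpu_runtime_KernelSelectorGEMVMLIR
[gemm] (*) -> __xla_cpu_runtime_KernelSelectorGEMMSequential
[batch3d] (*) -> __xla_cpu_runtime_KernelSelectorBatch3DMLIR
[batch4d] (*) -> __xla_cpu_runtime_KernelSelectorBatch4DMLIR
[argmax] (*) -> __xla_cpu_runtime_ArgMax3DSequential
EOF
  fi
}
```

For example, write_kernel_map 4 "$(dirname "$0")/kernels_map.txt" writes the parallel map next to the calling script. The here-documents are quoted ('EOF'), so the shell copies the entries without any expansion.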

Wen Di added 3 commits on January 12, 2026 17:23:

77e3e3c add xnnpack for softmax
912c9af add kernel selector
e5dedb2 add env to set cpu instructions fusion not duplicate

wendi98 force-pushed the xla-migration branch from 8305a86 to e5dedb2 on January 13, 2026 03:23

wendi98 wants to merge 3 commits into joeyye-work:for-serving-2.20 from wendi98:xla-migration
