A near real-time load metric collection component, designed for intelligent inference scheduling in large-scale inference services.
English | 中文
Status: in early and rapid development.
Load metrics are very important for an LLM inference scheduler.
Typically, the following four load metrics matter most (at the per-engine level):
- Total number of requests
- Token usage (KVCache usage)
- Number of requests in Prefill
- Prompt length in Prefill
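
To make these concrete, here is a minimal sketch in Go, assuming a hypothetical `EngineMetrics` type (not this project's actual API), of how the four per-engine counters could be kept as atomic values so they can be updated from concurrent request paths:

```go
package loadmetrics

import "sync/atomic"

// EngineMetrics holds near real-time load metrics for a single inference engine.
// This is an illustrative sketch, not the component's real data structure.
type EngineMetrics struct {
	TotalRequests    atomic.Int64 // total in-flight requests on this engine
	TokenUsage       atomic.Int64 // token (KVCache) usage
	PrefillRequests  atomic.Int64 // requests currently in the prefill phase
	PrefillPromptLen atomic.Int64 // sum of prompt lengths currently in prefill
}
```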
Timeliness is critical in large-scale services. Stale metrics lead to races: multiple requests may be scheduled to the same inference engine before its load metrics are updated.
Polling metrics from engines introduces a fixed periodic delay. Especially in large-scale scenarios, as QPS (throughput) increases, these races also increase significantly.
Cooperating with an Inference Gateway (e.g. AIGW), we can achieve near real-time load metric collection with the following steps (see the sketch after the list):
- Request proxied to Inference Engine:
  a. prefill & total request number: +1
  b. prefill prompt length: +prompt-length
- First token responded:
  a. prefill request number: -1
  b. prefill prompt length: -prompt-length
- Request done:
  a. total request number: -1
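
The sketch below shows how a gateway could apply these adjustments, assuming the `EngineMetrics` type above; the hook names (`OnRequestProxied`, `OnFirstToken`, `OnRequestDone`) are hypothetical and only mark where each step would run:

```go
// OnRequestProxied runs when the gateway proxies a request to an engine:
// the request enters prefill and counts toward the engine's total load.
func OnRequestProxied(m *EngineMetrics, promptLen int64) {
	m.PrefillRequests.Add(1)
	m.TotalRequests.Add(1)
	m.PrefillPromptLen.Add(promptLen)
}

// OnFirstToken runs when the first token is responded:
// prefill is finished, so its contribution is removed.
func OnFirstToken(m *EngineMetrics, promptLen int64) {
	m.PrefillRequests.Add(-1)
	m.PrefillPromptLen.Add(-promptLen)
}

// OnRequestDone runs when the request completes:
// the request no longer counts toward total load.
func OnRequestDone(m *EngineMetrics) {
	m.TotalRequests.Add(-1)
}
```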
Furthermore, we can introduce a CAS (compare-and-swap) API to reduce races even further, when it is required in the future.
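
As one possible illustration of the idea (not a committed design), a scheduler could reserve capacity on an engine with a compare-and-swap, so a concurrent scheduler observing the same stale value fails the swap and retries instead of double-booking the engine; `TryReserve` and `maxRequests` are hypothetical names:

```go
// TryReserve attempts to atomically claim one request slot on the engine.
// It returns false if the engine is already at capacity.
func TryReserve(m *EngineMetrics, maxRequests int64) bool {
	for {
		cur := m.TotalRequests.Load()
		if cur >= maxRequests {
			return false // engine already at capacity
		}
		if m.TotalRequests.CompareAndSwap(cur, cur+1) {
			return true // slot claimed without racing another scheduler
		}
		// another scheduler changed the counter concurrently; retry with the new value
	}
}
```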
This project is licensed under Apache 2.0.
