server : use std::move whenever possible by ngxson · Pull Request #12936 · ggml-org/llama.cpp

ngxson · 2025-04-14T06:57:21Z

This PR is made by simply adding std::unique_ptr<int> dummy to the struct that we want to investigate, then remove that unique_ptr after all std::move being added.

examples/server/server.cpp

ggerganov · 2025-04-15T06:40:28Z

examples/server/server.cpp

    }

-    bool launch_slot_with_task(server_slot & slot, const server_task & task) {
+    bool launch_slot_with_task(server_slot & slot, const server_task && task) {


Suggested change

bool launch_slot_with_task(server_slot & slot, const server_task && task) {

bool launch_slot_with_task(server_slot & slot, server_task && task) {

We also move the params and prompt_tokens from task to slot, so I think this is currently not possible

What is not possible? With the const present, the moves of params and prompt_tokens are actually copies. Removing the const from the argument will make them proper moves.

The problem is that the prompt_tokens will soon contains image tokens as unique_ptr, so it will require a proper move.

And even without image (the current case where we have only text tokens), I think a proper move still make sense here. This funtion is the last place where the task want to go (i.e. we expect task to be destroyed here)

Ah now looking back at this, I realized that I'm thinking the reverted (I thought that you want to add the const) ; Sorry for the confusion 🥲

ggerganov · 2025-04-15T06:46:43Z

examples/server/server.cpp

        server_task task(SERVER_TASK_TYPE_METRICS);
        task.id = ctx_server.queue_tasks.get_new_id();
        ctx_server.queue_results.add_waiting_task_id(task.id);
-        ctx_server.queue_tasks.post(task, true); // high-priority task
+        ctx_server.queue_tasks.post(std::move(task), true); // high-priority task

        // get the result
        server_task_result_ptr result = ctx_server.queue_results.recv(task.id);


The task should not be referenced after being moved. We can make this safer like this:

const auto id = ctx_server.queue_tasks.get_new_id(); { server_task task(SERVER_TASK_TYPE_METRICS); task.id = id; ctx_server.queue_results.add_waiting_task_id(task.id); ctx_server.queue_tasks.post(std::move(task), true); // high-priority task }

Thanks, that's a good idea. I also spotted some places where task is used after the move. It should be fixed in my last commit: 9487165

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ggerganov · 2025-04-15T11:18:13Z

examples/server/server.cpp

+            queue_tasks.push_front(task);
        } else {
-            queue_tasks.push_back(std::move(task));
+            queue_tasks.push_back(task);


Without these moves, the task will be copied here. Better to keep them.

Thanks, seems like I mistakenly removed this move. Added them back in 240ea24

ggerganov

Few updates and good to merge:

diff --git a/examples/server/server.cpp b/examples/server/server.cpp
index 528d4e6a..c580ec12 100644
--- a/examples/server/server.cpp
+++ b/examples/server/server.cpp
@@ -1563,14 +1563,15 @@ struct server_queue {
         if (task.type == SERVER_TASK_TYPE_CANCEL) {
             cleanup_pending_task(task.id_target);
         }
-        QUE_DBG("new task, id = %d, front = %d\n", task.id, front);
+        const int task_id = task.id;
+        QUE_DBG("new task, id = %d, front = %d\n", task_id, front);
         if (front) {
             queue_tasks.push_front(std::move(task));
         } else {
             queue_tasks.push_back(std::move(task));
         }
         condition_tasks.notify_one();
-        return task.id;
+        return task_id;
     }
 
     // multi-task version of post()
@@ -2105,7 +2106,7 @@ struct server_context {
         return true;
     }
 
-    bool launch_slot_with_task(server_slot & slot, const server_task && task) {
+    bool launch_slot_with_task(server_slot & slot, server_task && task) {
         slot.reset();
         slot.id_task       = task.id;
         slot.index         = task.index;
@@ -2113,10 +2114,10 @@ struct server_context {
         slot.params        = std::move(task.params);
         slot.prompt_tokens = std::move(task.prompt_tokens);
 
-        if (!are_lora_equal(task.params.lora, slot.lora)) {
+        if (!are_lora_equal(slot.params.lora, slot.lora)) {
             // if lora is changed, we cannot reuse cached tokens
             slot.cache_tokens.clear();
-            slot.lora = task.params.lora;
+            slot.lora = slot.params.lora;
         }
 
         bool can_detokenize = can_be_detokenized(ctx, slot.prompt_tokens);
@@ -3952,44 +3953,42 @@ int main(int argc, char ** argv) {
 
         auto completion_id = gen_chatcmplid();
         std::unordered_set<int> task_ids;
-        {
+        try {
             std::vector<server_task> tasks;
 
-            try {
-                const auto & prompt = data.at("prompt");
-                // TODO: this log can become very long, put it behind a flag or think about a more compact format
-                //SRV_DBG("Prompt: %s\n", prompt.is_string() ? prompt.get<std::string>().c_str() : prompt.dump(2).c_str());
-
-                std::vector<llama_tokens> tokenized_prompts = tokenize_input_prompts(ctx_server.vocab, prompt, true, true);
-                tasks.reserve(tokenized_prompts.size());
-                for (size_t i = 0; i < tokenized_prompts.size(); i++) {
-                    server_task task = server_task(type);
-
-                    task.id    = ctx_server.queue_tasks.get_new_id();
-                    task.index = i;
-
-                    task.prompt_tokens    = std::move(tokenized_prompts[i]);
-                    task.params           = server_task::params_from_json_cmpl(
-                                                ctx_server.ctx,
-                                                ctx_server.params_base,
-                                                data);
-                    task.id_selected_slot = json_value(data, "id_slot", -1);
-
-                    // OAI-compat
-                    task.params.oaicompat                 = oaicompat;
-                    task.params.oaicompat_cmpl_id         = completion_id;
-                    // oaicompat_model is already populated by params_from_json_cmpl
-
-                    tasks.push_back(std::move(task));
-                }
-            } catch (const std::exception & e) {
-                res_error(res, format_error_response(e.what(), ERROR_TYPE_INVALID_REQUEST));
-                return;
+            const auto & prompt = data.at("prompt");
+            // TODO: this log can become very long, put it behind a flag or think about a more compact format
+            //SRV_DBG("Prompt: %s\n", prompt.is_string() ? prompt.get<std::string>().c_str() : prompt.dump(2).c_str());
+
+            std::vector<llama_tokens> tokenized_prompts = tokenize_input_prompts(ctx_server.vocab, prompt, true, true);
+            tasks.reserve(tokenized_prompts.size());
+            for (size_t i = 0; i < tokenized_prompts.size(); i++) {
+                server_task task = server_task(type);
+
+                task.id    = ctx_server.queue_tasks.get_new_id();
+                task.index = i;
+
+                task.prompt_tokens    = std::move(tokenized_prompts[i]);
+                task.params           = server_task::params_from_json_cmpl(
+                        ctx_server.ctx,
+                        ctx_server.params_base,
+                        data);
+                task.id_selected_slot = json_value(data, "id_slot", -1);
+
+                // OAI-compat
+                task.params.oaicompat                 = oaicompat;
+                task.params.oaicompat_cmpl_id         = completion_id;
+                // oaicompat_model is already populated by params_from_json_cmpl
+
+                tasks.push_back(std::move(task));
             }
 
             task_ids = server_task::get_list_id(tasks);
             ctx_server.queue_results.add_waiting_tasks(tasks);
             ctx_server.queue_tasks.post(std::move(tasks));
+        } catch (const std::exception & e) {
+            res_error(res, format_error_response(e.what(), ERROR_TYPE_INVALID_REQUEST));
+            return;
         }
 
         bool stream = json_value(data, "stream", false);

Co-authored-by: ggerganov <ggerganov@gmail.com>

ngxson · 2025-04-18T15:29:34Z

Thanks @ggerganov for helping! I underestimated this change. Applied your suggestions on my last commit 👍

* server : use std::move whenever possible * use r-value ref * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * make task creation scoped * restore std::move * fix task_id not set correctly * apply changes from suggestion Co-authored-by: ggerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

server : use std::move whenever possible

203725a

ngxson requested a review from ggerganov April 14, 2025 06:57

github-actions bot added examples server labels Apr 14, 2025

ggerganov reviewed Apr 14, 2025

View reviewed changes

examples/server/server.cpp Outdated Show resolved Hide resolved

use r-value ref

b7aea03

ggerganov reviewed Apr 15, 2025

View reviewed changes

ngxson and others added 2 commits April 15, 2025 08:54

Apply suggestions from code review

d8656a1

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

make task creation scoped

9487165

ggerganov reviewed Apr 15, 2025

View reviewed changes

ngxson added 3 commits April 15, 2025 20:27

restore std::move

240ea24

fix task_id not set correctly

081d72d

Merge branch 'master' into xsn/server_std_move

8fc8941

ggerganov approved these changes Apr 17, 2025

View reviewed changes

apply changes from suggestion

c64d6bc

Co-authored-by: ggerganov <ggerganov@gmail.com>

ngxson merged commit 35370ba into ggml-org:master Apr 18, 2025
48 of 51 checks passed

	bool launch_slot_with_task(server_slot & slot, const server_task && task) {
	bool launch_slot_with_task(server_slot & slot, server_task && task) {

Conversation

ngxson commented Apr 14, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ggerganov left a comment

Choose a reason for hiding this comment

Uh oh!

ngxson commented Apr 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants