server : use std::move whenever possible#12936
Conversation
examples/server/server.cpp
Outdated
| } | ||
|
|
||
| bool launch_slot_with_task(server_slot & slot, const server_task & task) { | ||
| bool launch_slot_with_task(server_slot & slot, const server_task && task) { |
There was a problem hiding this comment.
| bool launch_slot_with_task(server_slot & slot, const server_task && task) { | |
| bool launch_slot_with_task(server_slot & slot, server_task && task) { |
There was a problem hiding this comment.
We also move the params and prompt_tokens from task to slot, so I think this is currently not possible
There was a problem hiding this comment.
What is not possible? With the const present, the moves of params and prompt_tokens are actually copies. Removing the const from the argument will make them proper moves.
There was a problem hiding this comment.
The problem is that the prompt_tokens will soon contains image tokens as unique_ptr, so it will require a proper move.
And even without image (the current case where we have only text tokens), I think a proper move still make sense here. This funtion is the last place where the task want to go (i.e. we expect task to be destroyed here)
There was a problem hiding this comment.
Ah now looking back at this, I realized that I'm thinking the reverted (I thought that you want to add the const) ; Sorry for the confusion 🥲
examples/server/server.cpp
Outdated
| server_task task(SERVER_TASK_TYPE_METRICS); | ||
| task.id = ctx_server.queue_tasks.get_new_id(); | ||
| ctx_server.queue_results.add_waiting_task_id(task.id); | ||
| ctx_server.queue_tasks.post(task, true); // high-priority task | ||
| ctx_server.queue_tasks.post(std::move(task), true); // high-priority task | ||
|
|
||
| // get the result | ||
| server_task_result_ptr result = ctx_server.queue_results.recv(task.id); |
There was a problem hiding this comment.
The task should not be referenced after being moved. We can make this safer like this:
const auto id = ctx_server.queue_tasks.get_new_id();
{
server_task task(SERVER_TASK_TYPE_METRICS);
task.id = id;
ctx_server.queue_results.add_waiting_task_id(task.id);
ctx_server.queue_tasks.post(std::move(task), true); // high-priority task
}There was a problem hiding this comment.
Thanks, that's a good idea. I also spotted some places where task is used after the move. It should be fixed in my last commit: 9487165
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
examples/server/server.cpp
Outdated
| queue_tasks.push_front(task); | ||
| } else { | ||
| queue_tasks.push_back(std::move(task)); | ||
| queue_tasks.push_back(task); |
There was a problem hiding this comment.
Without these moves, the task will be copied here. Better to keep them.
There was a problem hiding this comment.
Thanks, seems like I mistakenly removed this move. Added them back in 240ea24
ggerganov
left a comment
There was a problem hiding this comment.
Few updates and good to merge:
diff --git a/examples/server/server.cpp b/examples/server/server.cpp
index 528d4e6a..c580ec12 100644
--- a/examples/server/server.cpp
+++ b/examples/server/server.cpp
@@ -1563,14 +1563,15 @@ struct server_queue {
if (task.type == SERVER_TASK_TYPE_CANCEL) {
cleanup_pending_task(task.id_target);
}
- QUE_DBG("new task, id = %d, front = %d\n", task.id, front);
+ const int task_id = task.id;
+ QUE_DBG("new task, id = %d, front = %d\n", task_id, front);
if (front) {
queue_tasks.push_front(std::move(task));
} else {
queue_tasks.push_back(std::move(task));
}
condition_tasks.notify_one();
- return task.id;
+ return task_id;
}
// multi-task version of post()
@@ -2105,7 +2106,7 @@ struct server_context {
return true;
}
- bool launch_slot_with_task(server_slot & slot, const server_task && task) {
+ bool launch_slot_with_task(server_slot & slot, server_task && task) {
slot.reset();
slot.id_task = task.id;
slot.index = task.index;
@@ -2113,10 +2114,10 @@ struct server_context {
slot.params = std::move(task.params);
slot.prompt_tokens = std::move(task.prompt_tokens);
- if (!are_lora_equal(task.params.lora, slot.lora)) {
+ if (!are_lora_equal(slot.params.lora, slot.lora)) {
// if lora is changed, we cannot reuse cached tokens
slot.cache_tokens.clear();
- slot.lora = task.params.lora;
+ slot.lora = slot.params.lora;
}
bool can_detokenize = can_be_detokenized(ctx, slot.prompt_tokens);
@@ -3952,44 +3953,42 @@ int main(int argc, char ** argv) {
auto completion_id = gen_chatcmplid();
std::unordered_set<int> task_ids;
- {
+ try {
std::vector<server_task> tasks;
- try {
- const auto & prompt = data.at("prompt");
- // TODO: this log can become very long, put it behind a flag or think about a more compact format
- //SRV_DBG("Prompt: %s\n", prompt.is_string() ? prompt.get<std::string>().c_str() : prompt.dump(2).c_str());
-
- std::vector<llama_tokens> tokenized_prompts = tokenize_input_prompts(ctx_server.vocab, prompt, true, true);
- tasks.reserve(tokenized_prompts.size());
- for (size_t i = 0; i < tokenized_prompts.size(); i++) {
- server_task task = server_task(type);
-
- task.id = ctx_server.queue_tasks.get_new_id();
- task.index = i;
-
- task.prompt_tokens = std::move(tokenized_prompts[i]);
- task.params = server_task::params_from_json_cmpl(
- ctx_server.ctx,
- ctx_server.params_base,
- data);
- task.id_selected_slot = json_value(data, "id_slot", -1);
-
- // OAI-compat
- task.params.oaicompat = oaicompat;
- task.params.oaicompat_cmpl_id = completion_id;
- // oaicompat_model is already populated by params_from_json_cmpl
-
- tasks.push_back(std::move(task));
- }
- } catch (const std::exception & e) {
- res_error(res, format_error_response(e.what(), ERROR_TYPE_INVALID_REQUEST));
- return;
+ const auto & prompt = data.at("prompt");
+ // TODO: this log can become very long, put it behind a flag or think about a more compact format
+ //SRV_DBG("Prompt: %s\n", prompt.is_string() ? prompt.get<std::string>().c_str() : prompt.dump(2).c_str());
+
+ std::vector<llama_tokens> tokenized_prompts = tokenize_input_prompts(ctx_server.vocab, prompt, true, true);
+ tasks.reserve(tokenized_prompts.size());
+ for (size_t i = 0; i < tokenized_prompts.size(); i++) {
+ server_task task = server_task(type);
+
+ task.id = ctx_server.queue_tasks.get_new_id();
+ task.index = i;
+
+ task.prompt_tokens = std::move(tokenized_prompts[i]);
+ task.params = server_task::params_from_json_cmpl(
+ ctx_server.ctx,
+ ctx_server.params_base,
+ data);
+ task.id_selected_slot = json_value(data, "id_slot", -1);
+
+ // OAI-compat
+ task.params.oaicompat = oaicompat;
+ task.params.oaicompat_cmpl_id = completion_id;
+ // oaicompat_model is already populated by params_from_json_cmpl
+
+ tasks.push_back(std::move(task));
}
task_ids = server_task::get_list_id(tasks);
ctx_server.queue_results.add_waiting_tasks(tasks);
ctx_server.queue_tasks.post(std::move(tasks));
+ } catch (const std::exception & e) {
+ res_error(res, format_error_response(e.what(), ERROR_TYPE_INVALID_REQUEST));
+ return;
}
bool stream = json_value(data, "stream", false);
Co-authored-by: ggerganov <ggerganov@gmail.com>
|
Thanks @ggerganov for helping! I underestimated this change. Applied your suggestions on my last commit 👍 |
* server : use std::move whenever possible * use r-value ref * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * make task creation scoped * restore std::move * fix task_id not set correctly * apply changes from suggestion Co-authored-by: ggerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : use std::move whenever possible * use r-value ref * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * make task creation scoped * restore std::move * fix task_id not set correctly * apply changes from suggestion Co-authored-by: ggerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : use std::move whenever possible * use r-value ref * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * make task creation scoped * restore std::move * fix task_id not set correctly * apply changes from suggestion Co-authored-by: ggerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Ref comment : #12898 (comment)
This PR is made by simply adding
std::unique_ptr<int> dummyto the struct that we want to investigate, then remove that unique_ptr after allstd::movebeing added.