Mekhanik evgenii/fix 1346 1 #2456

Open
EvgeniiMekhanik wants to merge 20 commits into master from MekhanikEvgenii/fix-1346-1

Conversation

@EvgeniiMekhanik
Contributor

No description provided.

@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-1346-1 branch from a62787c to c9bba36 Compare July 2, 2025 09:46
fw/ss_skb.c Outdated
__u8 pfmemalloc = skb->pfmemalloc;

WARN_ON_ONCE(skb->sk);
skb_orphan(skb);
Contributor Author

Please pay attention to this place. Here we release the skb owner and decrease client->mem. The function ss_skb_init_for_xmit is called before pushing the skb to the socket write queue, so skbs in the socket write queue are not taken into account in the client memory calculation. We release the skb owner here because, if we don't, we would need a rather big kernel patch to adjust skb memory before it is passed to the socket write queue. @krizhanovsky @const-t what do you think about it?

Contributor

Why don't we keep a pointer to the client accounting in skb->cb instead of playing with skb_orphan()? I'd prefer to avoid skb_orphan(), since we can get plenty of crashes in this patch or in later kernel version migrations due to breaking the kernel's logic around orphaned skbs.
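The suggestion can be sketched as a minimal userspace model (all structure and function names here are hypothetical stand-ins; the real code works on struct sk_buff and TfwClient): keep the accounting pointer and the charged size in the 48-byte cb scratch area instead of tying the skb to a socket owner.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model: the kernel reserves a 48-byte scratch area skb->cb. */
struct sk_buff_model {
	char cb[48];
	unsigned int truesize;
};

/* Hypothetical per-client accounting record. */
struct client_model {
	long mem;
};

/* Hypothetical cb layout: a pointer to the client plus the charged bytes. */
struct tfw_skb_cb_model {
	struct client_model *client;
	unsigned int charged;
};

/* Charge the skb's memory to the client and remember it in skb->cb. */
static void
tfw_skb_charge(struct sk_buff_model *skb, struct client_model *cli)
{
	struct tfw_skb_cb_model *cb = (struct tfw_skb_cb_model *)skb->cb;

	cb->client = cli;
	cb->charged = skb->truesize;
	cli->mem += skb->truesize;
}

/* Uncharge using only the state saved in skb->cb; idempotent. */
static void
tfw_skb_uncharge(struct sk_buff_model *skb)
{
	struct tfw_skb_cb_model *cb = (struct tfw_skb_cb_model *)skb->cb;

	if (!cb->client)
		return;
	cb->client->mem -= cb->charged;
	cb->client = NULL;
}
```

The point of the sketch is that no skb->sk / skb->destructor state is involved, so the kernel's orphan logic is never touched.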

@krizhanovsky krizhanovsky mentioned this pull request Jul 7, 2025
2 tasks
@EvgeniiMekhanik EvgeniiMekhanik marked this pull request as draft July 8, 2025 09:12
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-1346-1 branch from c9bba36 to 26e3525 Compare July 8, 2025 09:12
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-1346-1 branch 16 times, most recently from 40654f8 to e2de424 Compare July 11, 2025 14:09
@EvgeniiMekhanik EvgeniiMekhanik marked this pull request as ready for review July 11, 2025 14:09
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-1346-1 branch 5 times, most recently from 7b5e367 to ac06de7 Compare July 14, 2025 21:12
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-1346-1 branch from 92911ea to 1975ddc Compare January 14, 2026 15:03
.allow_reconfig = true,
},
{
.name = "client_mem",
Contributor

@EvgeniiMekhanik @krizhanovsky I should note that currently client_mem overrides frang's http_body_len. By default Tempesta can receive a 1GB request/response, but with client_mem enabled it is limited to client_mem. It may be surprising during configuration. It seems we can temporarily disable http_body_len and set client_mem to 1GB until #498 is implemented.
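To illustrate the interaction, a hypothetical config sketch (directive spelling is inferred from this thread, not checked against the documentation):

```
# Hypothetical sketch: once client_mem is enabled, the effective body cap
# becomes the smaller client_mem value, not frang's http_body_len.
frang_limits {
    http_body_len 1073741824;    # ~1GB cap
}
client_mem 536870912 1073741824; # <soft_limit> <hard_limit>, per this PR
```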

*/
typedef struct {
TfwPoolChunk *curr;
void *owner;
Contributor

Why not forward-declare TfwClient struct?

@EvgeniiMekhanik
Contributor Author

2e94835:
finished in 40.05s, 1754757.73 req/s, 1.33GB/s - no client mem option, just accounting
finished in 40.06s, 1698862.07 req/s, 1.29GB/s - client mem set, check memory consumption
d7783ad:
finished in 40.06s, 1351866.00 req/s, 1.02GB/s - no client mem option, just accounting
finished in 40.06s, 1297204.55 req/s, 1004.79MB/s - client mem set, check memory consumption
7dd07ec:
finished in 40.05s, 1544324.25 req/s, 1.17GB/s - no client mem option, just accounting
finished in 40.06s, 1479000.85 req/s, 1.12GB/s - client mem set, check memory consumption
no 4037885:
finished in 40.05s, 1568655.85 req/s, 1.19GB/s  - no client mem option, just accounting
finished in 40.05s, 1568655.85 req/s, 1.16GB/s - client mem set, check memory consumption
master:
finished in 40.05s, 1802978.02 req/s, 1.36GB/s

@EvgeniiMekhanik
Contributor Author

Big responses
master:
finished in 40.06s, 64391.32 req/s, 6.12GB/s
2e94835:
finished in 40.10s, 63755.38 req/s, 6.11GB/s - no client mem option, just accounting

@EvgeniiMekhanik
Contributor Author

Also note that we call frang_client_mem_limit only in a few places, not on each memory allocation, to prevent performance degradation.

fw/client.c Outdated
tfw_client_free(TdbRec *rec)
{
/* Stats should be updated(dec/inc) only for complete records. */
if (!tdb_entry_is_complete(rec))
Contributor

I believe we don't need this condition for clients: we allocate per-cpu memory even for incomplete records. At the moment we are safe; however, if in the future we fail before marking a record complete, we will have a memory leak.

Contributor Author

Yes, fixed.

fw/client.c Outdated

assert_spin_locked(&client_db->ga_lock);

cli->mem = tfw_alloc_percpu(long);
Contributor

alloc_percpu() uses the GFP_KERNEL flag by default; in this place we must set GFP_ATOMIC | __GFP_ZERO and remove

for_each_online_cpu(cpu)
		*(per_cpu_ptr(cli->mem, cpu)) = 0;

Contributor Author

Yes, fixed.

parsed, skb->len);
}

r = frang_client_mem_limit((TfwCliConn *)c, true);
Contributor

I'm worried about this place. Without a benchmark I can't say how crucial it is, but it seems we introduce a lot of inter-cache traffic, which might be significant on NUMA. We call frang_client_mem_limit() even for small service frames. Maybe we can keep a local per-cpu threshold (for instance 4 * default MTU) to check during request processing, and read the remote CPUs only when the threshold has been reached, and then once per request or client processing stage, when all skbs are processed? However, the last option makes sense only if we close all connections from the client.
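The threshold idea can be sketched in userspace C (the names, the NR_CPUS_MODEL array standing in for a per-cpu variable, and the constants are hypothetical; a real kernel implementation would use this_cpu_* accessors):

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL	4
#define CHECK_THRESHOLD	(4 * 1500)	/* e.g. 4 * default MTU, as suggested */

/* Userspace model of a per-cpu counter: one slot per CPU. */
static long cli_mem[NR_CPUS_MODEL];
/* Bytes charged on this CPU since the last full (all-CPU) limit check. */
static long since_last_check[NR_CPUS_MODEL];

/* The expensive part: reading every CPU's slot causes cross-cache traffic. */
static long
sum_all_cpus(void)
{
	long sum = 0;
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += cli_mem[cpu];
	return sum;
}

/*
 * Charge @sz bytes on @cpu, but do the cross-CPU sum only when the locally
 * accumulated delta crosses the threshold; returns false when the limit
 * is exceeded.
 */
static bool
charge_and_check(int cpu, long sz, long limit)
{
	cli_mem[cpu] += sz;
	since_last_check[cpu] += sz;
	if (since_last_check[cpu] < CHECK_THRESHOLD)
		return true;	/* skip the expensive remote reads */
	since_last_check[cpu] = 0;
	return sum_all_cpus() <= limit;
}
```

Between threshold crossings only the local slot is touched, so small service frames never trigger remote reads.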

Contributor Author

Yes, this is the only place where we lose about 10 percent of performance (when the appropriate option is enabled and we check client memory consumption). @krizhanovsky is it crucial? Should we check memory consumption accurately and block the client, or not?

r = -ENOMEM;
goto out;
}
ss_skb_set_owner(skb, ss_skb_dflt_destructor,
Contributor

We have a few places like this: ss_skb_set_owner(), where we assign a new value to the per-cpu variable, and then ss_skb_adjust_data_len(), where we also assign to the per-cpu variable, and we do this in a loop. By itself, assigning a new value to a per-cpu variable in a loop is very cheap, it is a regular assignment. However, we have remote access to this per-cpu variable, which leads to inter-cache traffic even for local writes. Maybe we can accumulate the value in a local variable and assign it to the per-cpu variable only after some threshold? Not only in this place, but everywhere.
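A userspace sketch of this batching idea (names and the threshold value are hypothetical; the real code would keep the accumulator per message or per processing context):

```c
#include <assert.h>

#define FLUSH_THRESHOLD 4096

/* Userspace model: the per-cpu slot that remote CPUs may also read. */
static long percpu_mem_slot;
/* Purely local accumulator, cheap to update in a tight loop. */
static long local_batch;

/* Accumulate locally; touch the shared slot only past the threshold. */
static void
account_len(long sz)
{
	local_batch += sz;
	if (local_batch < FLUSH_THRESHOLD)
		return;
	percpu_mem_slot += local_batch;
	local_batch = 0;
}

/* Flush the remainder, e.g. at the end of message processing. */
static void
account_flush(void)
{
	percpu_mem_slot += local_batch;
	local_batch = 0;
}
```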

Contributor Author

As I see on the SUT, we have no problem with the assignments. The only real place which decreases performance (when the option is enabled) is frang_client_mem_limit, where we sum the client memory over all CPUs.

skb_fill_page_desc(it->skb, it->frag, page, off, sz);
if (!h2)
skb_frag_ref(it->skb, it->frag);
ss_skb_adjust_data_len(it->skb, sz);
Contributor

Here we add client mem for plain http1 as well as for other protocols; however, for plain http1 the memory will not be allocated and ss_skb_to_sgvec_with_new_pages() will not be called. I'm not asking you to change this behavior, it is up to you, but I ask you to write a comment describing this.

Contributor Author

Yes, fixed.

conn->peer, skb->truesize);
}

r = frang_client_mem_limit((TfwCliConn *)conn, false);
Contributor

FYI: we check the memory limit before parsing, cache response building, etc. Those may consume a lot of memory, which can lead to an interesting side effect: the current response will live through the transfer to the client, consuming a lot of memory, and only new requests will not be received from the same client, even if they are not so big. However, checking the mem limit at the end of request processing is even worse.

fw/sock.c Outdated
memset(twin_skb->cb, 0, sizeof(twin_skb->cb));
ss_skb_set_owner(twin_skb, ss_skb_dflt_destructor,
TFW_SKB_CB(skb)->opaque_data,
skb_headlen(skb));
Contributor

Suggested change:
-	skb_headlen(skb));
+	skb_headlen(twin_skb));

Contributor

Also, we don't account struct sk_buff and struct skb_shared_info in this place, but we do in other places. Why?

Contributor Author

Yes, fixed.

client_mem <soft_limit> <hard_limit> - controls how much
memory is used to store unanswered client requests and
requests with linked responses which cannot be forwarded
to a client.
To track socket memory we should pass TfwHttpMsg *, not
TfwMsgIter *, to most of the http_msg_* functions, because
TfwHttpMsg has a pointer to the connection and socket.
In task #498 we decided to use the `client_mem` option to
limit the amount of memory used by a client. This commit is
a part of this task - now Tempesta FW uses `sk->sk_rmem_alloc`
to account for the memory used by Tempesta FW for this
client connection.
In task #498 we decided to use the `client_mem` option to
limit the amount of memory used by a client. This commit is
a part of this task and the next step of the implementation.
Previously Tempesta FW used `sk->sk_rmem_alloc` to account
for the memory used for a client connection; now we account
memory for the whole TfwClient, because there can be a lot
of connections for one client, and for all cases we use the
TfwClient limit and block the client if necessary.
If the administrator specifies `client_mem` and the memory
used by all connections of the current client exceeds this
value, Tempesta FW drops the connection and blocks the
client by IP if `ip_block on;` is specified.
Previously we took a reference to the connection when we
adjusted memory for an skb, but this leads to several problems:
- we can't adjust memory for an skb before tls decryption,
  because skbs from `tls->io_in.skb_list` are freed when the
  connection is released (but the connection will never be
  released if we increment its reference counter for these
  skbs).
- We have the same problem for skbs which wait for an
  appropriate tcp window to be pushed into the socket write
  queue.
Now we increment/decrement the reference counter of the
TfwClient and adjust skb memory for requests before tls
decryption.
Previously we adjusted the tcp send window only for http2
connections and only while making HEADERS or DATA frames,
but if we want to control client memory usage we should do
it for all types of sent data. (We orphan the skb and
decrease memory usage when we pass the skb to the socket
write queue, so if we don't adjust the tcp send window we
push a lot of skbs into the socket write queue without
accounting their memory.)
- remove the `client_get_light/client_put_light` functions,
  because after removing the lock from the `client` structure
  we don't need these functions at all.
- Track the memory usage of an skb in `skb->cb`. Usually it
  is equal to `skb->truesize`, but in some cases (for example,
  an skb created by `pskb_copy_for_clone`) it is different.
- use `skb_shift` instead of `skb_try_coalesce` to correctly
  adjust the send window while entailing an skb into the
  socket write queue.
- account for FRAME_HEADER_SIZE in the send window calculation
  while making frames. (There was a mistake in the accuracy of
  the send window calculation: we didn't take into account
  that each frame also contains a frame header.)
- change BUG_ON to WARN_ON.
- rename tfw_cli_*_limit to tfw_cli_*_mem_limit
- rename `ss_skb_can_collapce` to `ss_skb_can_collapse`
- rename `tfw_h2_or_stream_wnd_is_exceeded` to
  `tfw_h2_conn_or_stream_wnd_is_exceeded`.
- move braces `{` to the next line.
- rename `ss_skb_adjust_sk_mem` to `ss_skb_adjust_client_mem`.
- Do not duplicate code for http1 and http2 in
  `tfw_connection_push`.
- Change BUG_ON to WARN_ON in some places.
Do not use `skb->sk` and `skb->destructor` to track the
memory used by an skb; use `skb->cb` for this purpose.
- Implement our own version of `skb_orphan` named
  `ss_skb_orphan`, which is called when an skb is freed in
  Tempesta FW code or just before pushing the skb to the
  socket write queue.
- Implement wrappers over `__kfree_skb` and `kfree_skb`
  which call `ss_skb_orphan` before freeing the skb.
- Check that the skb is pushed to the socket write queue,
  using the newly implemented function
  `skb_tfw_is_in_socket_write_queue` from the linux kernel,
  to skip adjusting the memory used by an skb when it
  belongs to the kernel (when `ss_skb_*` functions are
  called from `tls_encrypt`).
- We usually use the callbacks set in `skb->cb` for
  different purposes, so remove the callbacks which were
  added in previous patches and use the callbacks saved
  in `skb->cb`.
- Since we use a pool for http memory allocation, change the
  API of all `tfw_pool_*` functions to pass `TfwClient` and
  account memory in this structure.
- Remove the `TfwClient` refcounter (it is not used; this
  could have been done in previous commits).
- Fix unit tests to check memory accounting; clean up memory
  after each test to check that the client memory is equal
  to zero after the test.
A big performance degradation was found after this patch.
During investigation it was found that the problem is the
use of an atomic counter for client mem accounting. Using a
per-cpu array instead of the atomic counter fixes the
performance issue.
During investigation of the performance degradation it was
found that we lose about 5-10% of performance when we use
`skb_shift` and adjust the send window accurately while
entailing an skb into the socket write queue. Revert this
change. Also call `ss_skb_orphan` if we merge an entailed
skb into the tail skb of the socket write queue.
Previously we removed a client entry from TDB if there was
no entry in `client_lru.free_list` and a new client was
allocated, even if the removed client still had active
connections. There is a BUG in this strategy: if the removed
client has hung connections, we can't close and destroy them
during Tempesta FW unloading, because we close and destroy
connections while iterating over active clients
(`tfw_client_for_each`).
In the new strategy we change the logic in `tdb_htrie_put_rec`.
We add a pointer to the bucket in the record structure. When
we remove a record, we zero this pointer. If the record
reference counter becomes equal to zero but the bucket pointer
is still not NULL (the record was not removed), we remove the
record from the bucket using this pointer. For clients we just
use tfw_client_put, without record removal; when the client
reference counter becomes equal to zero, the client record
will be removed from the bucket and freed.
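The new put logic can be modeled in userspace C (a single-slot bucket and all names here are hypothetical simplifications of the htrie structures):

```c
#include <assert.h>
#include <stddef.h>

struct bucket_model;

/* Userspace model of an htrie record with a bucket back-pointer. */
struct rec_model {
	struct bucket_model *bucket;	/* NULL once explicitly removed */
	int refcnt;
	int freed;
};

/* Single-slot bucket, enough to model linking/unlinking. */
struct bucket_model {
	struct rec_model *rec;
};

static void
bucket_unlink(struct rec_model *r)
{
	r->bucket->rec = NULL;
	r->bucket = NULL;
}

/* Explicit removal: drop from the bucket; the record may live on. */
static void
rec_remove(struct rec_model *r)
{
	if (r->bucket)
		bucket_unlink(r);
}

/* Model of tdb_htrie_put_rec: the last put unlinks a still-linked record. */
static void
rec_put(struct rec_model *r)
{
	if (--r->refcnt)
		return;
	if (r->bucket)		/* not removed yet: unlink via the pointer */
		bucket_unlink(r);
	r->freed = 1;
}
```

This mirrors the described invariant: a record is freed only on the last reference drop, and is guaranteed to be out of its bucket by then, whichever of remove/put ran first.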
fw/cache.c Outdated
if (!h2)
skb_frag_ref(it->skb, it->frag);
ss_skb_adjust_data_len(it->skb, sz);
else
Contributor

Why `else`, if we add the fragment for both h2 and h1?

We can't call tfw_client_get/put on each allocated or
orphaned skb (or on each pool creation/destruction): under
pressure, when we have a lot of cpus, that leads to atomic
contention and bad performance degradation. To fix this
problem we implement a special TfwClientMem structure with
its own reference accounting (using struct percpu_ref!) and
save a pointer to it in the client structure. We use
percpu_ref_tryget/percpu_ref_put during skb
allocation/deallocation (it's very cheap). When we destroy
a client we schedule a work, call
`percpu_ref_kill_and_confirm` and wait until all skbs are
orphaned.
Also make some fixes according to the review:
- Call `tfw_client_free` for incomplete records as well.
- Implement `tfw_alloc_percpu_gfp`, same as `alloc_percpu_gfp`
  but with error injection.
- Fix memory accounting during copying of skbs.
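A userspace model of the lifetime scheme this commit describes (the real code uses the kernel's struct percpu_ref; the names and the plain atomic counter here are stand-ins for the per-cpu fast path):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for struct percpu_ref: a count plus a "killed" flag. */
struct ref_model {
	atomic_long count;
	atomic_bool killed;
	int released;		/* times the release callback ran */
};

static void
ref_init(struct ref_model *r)
{
	atomic_store(&r->count, 1);	/* the initial reference */
	atomic_store(&r->killed, false);
	r->released = 0;
}

/* Like percpu_ref_tryget: fails once the ref has been killed. */
static bool
ref_tryget(struct ref_model *r)
{
	if (atomic_load(&r->killed))
		return false;
	atomic_fetch_add(&r->count, 1);
	return true;
}

static void
ref_put(struct ref_model *r)
{
	if (atomic_fetch_sub(&r->count, 1) == 1)
		r->released++;	/* last ref gone: release client memory */
}

/* Like percpu_ref_kill: forbid new gets, drop the initial reference. */
static void
ref_kill(struct ref_model *r)
{
	atomic_store(&r->killed, true);
	ref_put(r);
}
```

The key property modeled here: after kill, new skbs can no longer take a reference, so the destroyer can reliably wait for the outstanding ones to be orphaned.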
{
TfwClientMem *cli_mem;

cli_mem = tfw_kmalloc(sizeof(TfwClientMem), GFP_ATOMIC);
Contributor

It would be good to re-use this memory (a good fit for a memory cache), but as I see we can't use one, because during shutdown we may destroy the cache before freeing all objects.

Also, this place is untested. We didn't run benchmarks with many clients, so we can't say what overhead it introduces.

{
TfwClientMem *cli_mem = container_of(ref, TfwClientMem, refcnt);

call_rcu(&cli_mem->rcu_head, __cli_mem_release);
Contributor

Warning on shutdown: start Tempesta, do a single request, then stop Tempesta.

[  183.855417] [tempesta fw] Tempesta FW is ready
[  200.349624] [tdb] Close table 'client0.tdb'
[  200.353864] [tdb] Close table 'sessions0.tdb'
[  200.391901] ------------[ cut here ]------------
[  200.392530] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:2631 rcu_core+0x3d9/0x7c0
[  200.393491] Modules linked in: tempesta_fw(OE) tempesta_db(OE) tempesta_tls(OE) tempesta_lib(OE) veth intel_rapl_msr intel_rapl_common xt_conntrack xt_MASQUERADE nft_masq xfrm_user xfrm_algo nft_chain_nat nf_nat snd_hda_codec_generic nf_conntrack snd_hda_intel xt_addrtype nf_defrag_ipv6 nft_compat nf_defrag_ipv4 snd_intel_dspcfg bridge snd_intel_sdw_acpi snd_hda_codec stp llc snd_hda_core nf_tables overlay qxl snd_hwdep kvm_amd snd_pcm i2c_i801 drm_ttm_helper snd_timer ccp cfg80211 ttm i2c_mux snd binfmt_misc kvm joydev input_leds i2c_smbus lpc_ich drm_kms_helper soundcore virtiofs mac_hid serio_raw sch_fq_codel dm_multipath efi_pstore drm nfnetlink dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 crct10dif_pclmul hid_generic crc32_pclmul ghash_clmulni_intel virtio_net usbhid ahci sha512_ssse3 net_failover psmouse virtio_scsi sha256_ssse3 virtio_rng libahci failover hid virtio_blk
[  200.393538]  aesni_intel
[  200.397304] [tdb] Close table 'cache0.tdb'
[  200.413234]  crypto_simd cryptd [last unloaded: tempesta_lib(OE)]
[  200.413249] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G           OE      6.12.12+ #225
[  200.413257] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  200.413257] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.17.0-2-2 04/01/2014
[  200.413259] RIP: 0010:rcu_core+0x3d9/0x7c0
[  200.413264] Code: 02 00 00 80 7c 24 2f 00 0f 85 51 03 00 00 48 85 c0 0f 84 c4 02 00 00 84 d2 74 11 48 8b 7c 24 08 e8 ec 33 00 00 48 85 c0 75 02 <0f> 0b 41 f7 c6 00 02 00 00 74 05 e8 97 99 ff ff 48 8b 7c 24 08 e8
[  200.413270] RSP: 0018:ffffc1b080005f10 EFLAGS: 00010046
[  200.413277] RAX: 0000000000000000 RBX: ffff9c2a182373b8 RCX: 00000000802a0028
[  200.413278] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9c2a182373b8
[  200.413279] RBP: ffff9c2a18237340 R08: ffff9c26c44f3de0 R09: 00000000802a0028
[  200.413279] R10: 00000000802a0028 R11: ffffffffbd74a628 R12: ffffffffbd60a940
[  200.413280] R13: 00000000000355c8 R14: 0000000000000246 R15: ffffffffffffffff
[  200.413283] FS:  0000000000000000(0000) GS:ffff9c2a18200000(0000) knlGS:0000000000000000
[  200.413285] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  200.413286] CR2: 000055aea04216a8 CR3: 0000000100a96000 CR4: 0000000000750ef0
[  200.413289] PKRU: 55555554
[  200.413291] Call Trace:
[  200.413305]  <IRQ>
[  200.413309]  ? __warn+0x89/0x140
[  200.413320]  ? rcu_core+0x3d9/0x7c0
[  200.413322]  ? report_bug+0x164/0x1a0
[  200.413344]  ? handle_bug+0x58/0xa0
[  200.413361]  ? exc_invalid_op+0x17/0x80
[  200.413363]  ? asm_exc_invalid_op+0x1a/0x20
[  200.413378]  ? rcu_core+0x3d9/0x7c0
[  200.413380]  ? rcu_core+0x269/0x7c0
[  200.413381]  handle_softirqs+0xd9/0x2e0
[  200.413400]  __irq_exit_rcu+0x63/0x80
[  200.413404]  sysvec_apic_timer_interrupt+0x71/0xa0
[  200.413421]  </IRQ>
[  200.413423]  <TASK>
[  200.413423]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  200.413436] RIP: 0010:pv_native_safe_halt+0xf/0x20
[  200.413440] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 55 f5 39 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
[  200.413440] RSP: 0018:ffffffffbd603e88 EFLAGS: 00000206
[  200.413442] RAX: ffff9c2a18200000 RBX: ffffffffbd60a940 RCX: 0000000000000000
[  200.413443] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000012dd64
[  200.413443] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000002
[  200.413443] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  200.413444] R13: 0000000000000000 R14: ffffffffbd60a040 R15: 000000000008a000
[  200.413446]  ? ct_kernel_exit.constprop.0+0x5d/0x80
[  200.413448]  default_idle+0x9/0x20
[  200.413449]  default_idle_call+0x30/0x100
[  200.413451]  do_idle+0x1fb/0x240
[  200.413473]  cpu_startup_entry+0x29/0x40
[  200.413476]  rest_init+0xcc/0xe0
[  200.413477]  start_kernel+0x61b/0x8a0
[  200.413533]  x86_64_start_reservations+0x18/0x40
[  200.413560]  x86_64_start_kernel+0x7a/0x80
[  200.413563]  common_startup_64+0x13e/0x141
[  200.413584]  </TASK>
[  200.413586] ---[ end trace 0000000000000000 ]---
[  200.492377] ------------[ cut here ]------------
[  200.493101] WARNING: CPU: 0 PID: 1419 at kernel/rcu/tree.c:2628 rcu_core+0x70e/0x7c0
[  200.494063] Modules linked in: tempesta_fw(OE) tempesta_db(OE) tempesta_tls(OE) tempesta_lib(OE) veth intel_rapl_msr intel_rapl_common xt_conntrack xt_MASQUERADE nft_masq xfrm_user xfrm_algo nft_chain_nat nf_nat snd_hda_codec_generic nf_conntrack snd_hda_intel xt_addrtype nf_defrag_ipv6 nft_compat nf_defrag_ipv4 snd_intel_dspcfg bridge snd_intel_sdw_acpi snd_hda_codec stp llc snd_hda_core nf_tables overlay qxl snd_hwdep kvm_amd snd_pcm i2c_i801 drm_ttm_helper snd_timer ccp cfg80211 ttm i2c_mux snd binfmt_misc kvm joydev input_leds i2c_smbus lpc_ich drm_kms_helper soundcore virtiofs mac_hid serio_raw sch_fq_codel dm_multipath efi_pstore drm nfnetlink dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 crct10dif_pclmul hid_generic crc32_pclmul ghash_clmulni_intel virtio_net usbhid ahci sha512_ssse3 net_failover psmouse virtio_scsi sha256_ssse3 virtio_rng libahci failover hid virtio_blk
[  200.494102]  aesni_intel crypto_simd cryptd [last unloaded: tempesta_lib(OE)]
[  200.506042] CPU: 0 UID: 0 PID: 1419 Comm: kworker/0:9 Tainted: G        W  OE      6.12.12+ #225
[  200.507170] Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  200.508032] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.17.0-2-2 04/01/2014
[  200.509228] Workqueue: dio/vda2 iomap_dio_complete_work
[  200.509976] RIP: 0010:rcu_core+0x70e/0x7c0
[  200.510571] Code: 8b 15 f6 98 b8 01 48 f7 da 48 85 d2 7e 33 48 8b 55 78 48 85 d2 0f 84 aa fc ff ff eb 31 48 8b 45 78 48 85 c0 0f 85 bc fc ff ff <0f> 0b e9 c6 fc ff ff 48 89 ce 4c 89 ff e8 a0 a4 eb 00 e9 d3 f9 ff
[  200.513008] RSP: 0018:ffffc1b080005f10 EFLAGS: 00010046
[  200.513732] RAX: 0000000000000000 RBX: ffff9c2a182373b8 RCX: ffff9c2a18237460
[  200.514667] RDX: ffffffffffffd8f0 RSI: 0000000000000001 RDI: ffff9c2a182373b8
[  200.515746] RBP: ffff9c2a18237340 R08: 0000000000000001 R09: 7fffffffffffffff
[  200.516730] R10: ffffffffbd6060c0 R11: 00000000000ecef5 R12: ffff9c26c4264000
[  200.517710] R13: 00000000000355c8 R14: 0000000000000246 R15: ffffffffffffffff
[  200.518768] FS:  0000000000000000(0000) GS:ffff9c2a18200000(0000) knlGS:0000000000000000
[  200.519894] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  200.520685] CR2: 000055aea04216a8 CR3: 0000000101ff2000 CR4: 0000000000750ef0
[  200.521624] PKRU: 55555554
[  200.522073] Call Trace:
[  200.522551]  <IRQ>
[  200.522977]  ? __warn+0x89/0x140
[  200.523538]  ? rcu_core+0x70e/0x7c0
[  200.524075]  ? report_bug+0x164/0x1a0
[  200.524634]  ? handle_bug+0x58/0xa0
[  200.525168]  ? exc_invalid_op+0x17/0x80
[  200.525742]  ? asm_exc_invalid_op+0x1a/0x20
[  200.526347]  ? rcu_core+0x70e/0x7c0
[  200.526883]  ? rcu_core+0x269/0x7c0
[  200.527509]  ? __hrtimer_run_queues+0x141/0x2a0
[  200.528166]  handle_softirqs+0xd9/0x2e0
[  200.528779]  __irq_exit_rcu+0x63/0x80
[  200.529383]  sysvec_apic_timer_interrupt+0x71/0xa0
[  200.530061]  </IRQ>
[  200.530440]  <TASK>
[  200.530827]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  200.531580] RIP: 0010:_raw_spin_unlock_irqrestore+0x1d/0x40
[  200.532338] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 e8 b2 0a 00 00 90 f7 c6 00 02 00 00 74 06 fb 0f 1f 44 00 00 <65> ff 0d a4 81 5b 43 74 05 c3 cc cc cc cc 0f 1f 44 00 00 c3 cc cc
[  200.534759] RSP: 0018:ffffc1b082567e58 EFLAGS: 00000206
[  200.535534] RAX: 0000000000000001 RBX: ffff9c26c0ce9f00 RCX: ffff9c26c867d368
[  200.536465] RDX: 0000000000000000 RSI: 0000000000000287 RDI: ffff9c26c867d360
[  200.537508] RBP: ffff9c26c3888800 R08: 0000000000000001 R09: 0000000000000002
[  200.538542] R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000004000
[  200.539603] R13: ffff9c26c4264000 R14: ffff9c270dd39160 R15: 0000000000000000
[  200.540532]  ? _raw_spin_unlock_irqrestore+0xe/0x40
[  200.541209]  aio_complete_rw+0xdb/0x1c0
[  200.541787]  process_one_work+0x16d/0x380
[  200.542380]  worker_thread+0x2cb/0x3e0
[  200.543032]  ? __pfx_worker_thread+0x20/0x20
[  200.543689]  kthread+0xcf/0x100
[  200.544178]  ? __pfx_kthread+0x20/0x20
[  200.544720]  ret_from_fork+0x31/0x60
[  200.545246]  ? __pfx_kthread+0x20/0x20
[  200.545780]  ret_from_fork_asm+0x22/0x60
[  200.546345]  </TASK>
[  200.546707] ---[ end trace 0000000000000000 ]---
[  200.565583] [tdb] Close table 'filter0.tdb'
[  200.566166] [tempesta fw] modules are stopped
[  200.693391] [tempesta fw] exiting...
[  200.805140] [tdb] Shutdown Tempesta DB

}

static void
cli_mem_release(struct percpu_ref *ref)
Contributor

It seems this function will be called twice: the first time from percpu_ref_call_confirm_rcu(), when the per-cpu ref is switched to atomic mode by percpu_ref_kill_and_confirm(), and the second time when the refcounter reaches zero in percpu_ref_put_many(). Please check it.
