xdp: Adjust xdp_frame layout to avoid using bitfields
Practical experience (and advice from Alexei) tells us that bitfields in
structs lead to un-optimized assembly code. I've verified that this change
does lead to better x86_64 assembly, both via objdump and by playing with
code snippets on godbolt.org.

Using scripts/bloat-o-meter shows the code size is reduced by 24 bytes
for xdp_convert_buff_to_frame(), which gets inlined e.g. in
i40e_xmit_xdp_tx_ring(), the function used for microbenchmarking.

Microbenchmarking results do show improvements, but they are very small,
varying between 0.5 and 2 nanoseconds per packet.

The member @metasize is changed from an 8-bit bitfield (effectively a u8)
to a full u32. Future users of this area could split this into two u16
fields. I've also benchmarked with two u16 fields, showing equal
performance gains and code size reduction.

The moved member @frame_sz doesn't change sizeof struct due to existing
padding. As in xdp_buff, the member @frame_sz is placed next to @flags,
which allows the compiler to optimize the assignment of these.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/r/166393728005.2213882.4162674859542409548.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
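To make the bitfield point concrete, here is a minimal sketch that can be pasted into godbolt.org to compare the two kinds of stores. The struct and function names (packed_meta, plain_meta, set_packed, set_plain) are hypothetical stand-ins for illustration, not the kernel definitions, and the exact instructions emitted will depend on compiler and flags.

```c
#include <stdint.h>

/* Hypothetical stand-in for the old layout: two bitfields share one u32. */
struct packed_meta {
	uint32_t metasize:8;
	uint32_t frame_sz:24;
};

/* Hypothetical stand-in for the new layout: two independent u32 members. */
struct plain_meta {
	uint32_t metasize;	/* only the lower 8 bits are meaningful */
	uint32_t frame_sz;
};

/*
 * Bitfield stores: each value has to be masked to its width and merged
 * into the shared 32-bit word.
 */
void set_packed(struct packed_meta *m, uint32_t metasize, uint32_t frame_sz)
{
	m->metasize = metasize;
	m->frame_sz = frame_sz;
}

/* Plain stores: each member can be written with an ordinary 32-bit store. */
void set_plain(struct plain_meta *m, uint32_t metasize, uint32_t frame_sz)
{
	m->metasize = metasize;
	m->frame_sz = frame_sz;
}
```

One side effect worth keeping in mind: an 8-bit bitfield silently truncates out-of-range values on assignment, whereas a plain u32 stores them unchanged, so the "uses lower 8-bits" convention becomes the caller's responsibility.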
Parent 4991931223
Commit b860a1b964
@@ -164,13 +164,13 @@ struct xdp_frame {
 	void *data;
 	u16 len;
 	u16 headroom;
-	u32 metasize:8;
-	u32 frame_sz:24;
+	u32 metasize; /* uses lower 8-bits */
 	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
 	 * while mem info is valid on remote CPU.
 	 */
 	struct xdp_mem_info mem;
 	struct net_device *dev_rx; /* used by cpumap */
+	u32 frame_sz;
 	u32 flags; /* supported values defined in xdp_buff_flags */
 };
 
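The layout claims (unchanged sizeof, @frame_sz adjacent to @flags) can be checked with a small userspace mock. This is only a sketch under stated assumptions: mock_frame_old, mock_frame_new and mock_mem_info are hypothetical stand-ins, struct xdp_mem_info is assumed to consist of two 32-bit members, and the size equality is only asserted for targets with 64-bit pointers such as x86_64.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of struct xdp_mem_info: two 32-bit members. */
struct mock_mem_info {
	uint32_t type;
	uint32_t id;
};

/* Mock of the layout before the change (bitfields). */
struct mock_frame_old {
	void *data;
	uint16_t len;
	uint16_t headroom;
	uint32_t metasize:8;
	uint32_t frame_sz:24;
	struct mock_mem_info mem;
	void *dev_rx;		/* stands in for struct net_device * */
	uint32_t flags;
};

/* Mock of the layout after the change (plain u32 members). */
struct mock_frame_new {
	void *data;
	uint16_t len;
	uint16_t headroom;
	uint32_t metasize;	/* uses lower 8 bits */
	struct mock_mem_info mem;
	void *dev_rx;
	uint32_t frame_sz;
	uint32_t flags;
};

/* frame_sz and flags sit back to back, so both can be set together. */
_Static_assert(offsetof(struct mock_frame_new, flags) ==
	       offsetof(struct mock_frame_new, frame_sz) + sizeof(uint32_t),
	       "frame_sz must be directly followed by flags");

#if UINTPTR_MAX > 0xffffffffu
/* With 64-bit pointers, frame_sz fills what used to be tail padding. */
_Static_assert(sizeof(struct mock_frame_old) == sizeof(struct mock_frame_new),
	       "moving frame_sz must not grow the struct");
#endif
```

On a 32-bit target the old mock has no tail padding to absorb the extra member, which is why the size check is guarded by the pointer-width test.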