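Performance tuning takes: skill + patience + luck
At our company, the most skilled engineer is the one in charge of performance tuning.
Here is a reposted article for you to look at: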
In contrast to tuning for low code size, many users want to tune lwIP for maximum throughput. This page gives an overview of what influences the performance of an Ethernet device using lwIP.
Architecture design
Favour big-endian systems over little-endian systems if you have the choice (network byte order is big-endian, so byte-order conversion can be omitted)
One bottleneck of the system is the Ethernet MAC driver (called "netif-driver" with lwIP):
Use interrupts and DMA if possible
Make sure it is as fast as it can be
Often, drivers can be written in a way to prefer TX or RX. If, for your application, one direction is more important than the other one, make sure this direction is preferred in high load situations!
If the hardware allows, make sure the driver supports scatter-gather. This allows the driver to DMA a packet consisting of multiple pbufs (e.g. one pbuf for the protocol headers and another pbuf for the application data, which can then be sent zero-copy).
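The scatter-gather idea can be sketched roughly as follows. This is only an illustration, not lwIP driver code: `struct pbuf_like` is a stand-in that mirrors just the `next`, `payload`, `len` and `tot_len` fields of lwIP's `struct pbuf`, and `struct dma_desc`, `build_tx_descriptors` and `MAX_TX_DESC` are hypothetical names. A real MAC defines its own descriptor layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the lwIP pbuf fields used here (assumption: a real driver
 * would include lwip/pbuf.h and use struct pbuf directly). */
struct pbuf_like {
    struct pbuf_like *next;    /* next buffer in the chain, or NULL */
    void             *payload; /* start of this buffer's data */
    uint16_t          len;     /* length of this buffer */
    uint16_t          tot_len; /* length of this buffer plus all following */
};

/* Hypothetical DMA descriptor; real hardware defines its own layout. */
struct dma_desc {
    const void *addr; /* where the DMA engine reads this segment from */
    uint16_t    len;  /* segment length in bytes */
    uint8_t     last; /* 1 on the final segment of the frame */
};

#define MAX_TX_DESC 8

/* Fill one descriptor per pbuf in the chain, without copying payload data.
 * Returns the number of descriptors used, or -1 if the chain is too long
 * (a driver could then fall back to copying into one contiguous buffer). */
static int build_tx_descriptors(const struct pbuf_like *p,
                                struct dma_desc *desc, int max_desc)
{
    int n = 0;
    for (const struct pbuf_like *q = p; q != NULL; q = q->next) {
        if (n == max_desc) {
            return -1;
        }
        desc[n].addr = q->payload;
        desc[n].len  = q->len;
        desc[n].last = (uint8_t)(q->next == NULL);
        n++;
    }
    return n;
}
```

With a two-pbuf chain (headers in the first pbuf, application data in the second), this yields two descriptors and the payload is never copied.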
The other big bottleneck is (TCP- and UDP-) checksum calculation (creating checksums when transmitting data, checking checksums when receiving data):
If the hardware allows it, leave checksum-generation and -checking to the hardware (see also configuration options CHECKSUM_CHECK_* and CHECKSUM_GEN_*)
If you do not have hardware support, make sure you have a really optimized software routine to calculate the checksums. This routine is probably the most critical path regarding throughput in the whole stack, so knowing the architecture well and writing a highly optimized assembler-routine is recommended!
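Before hand-optimizing, it helps to have a known-good baseline to compare against. The sketch below is a plain, portable version of the Internet checksum sum (RFC 1071), not an optimized routine; `chksum_ref` is a hypothetical name. lwIP's inet_chksum.c lets a port supply its own routine through the LWIP_CHKSUM define, but the expected signature and the byte order of the return value should be checked against your lwIP version.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable reference: ones'-complement sum over 16-bit big-endian words
 * (RFC 1071). Returns the folded sum in host order; the value placed in a
 * checksum field is its bitwise complement.
 * Assumes packet-sized input (below 64 KiB), so the 32-bit accumulator
 * cannot overflow. */
static uint16_t chksum_ref(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t sum = 0;

    while (len >= 2) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1) {
        sum += (uint32_t)p[0] << 8; /* odd trailing byte, padded with zero */
    }
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)sum;
}
```

An optimized assembler routine can then be verified by feeding both implementations the same buffers, including odd lengths and unaligned start addresses.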
The default memcpy results in a (slow!) byte-wise copy on many targets. Define a fast alternative that copies in units of the architecture's maximum word size (define MEMCPY)
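A word-wise copy along these lines might look like the sketch below. `word_memcpy` is a hypothetical name, and the 32-bit word size is an assumption for a 32-bit target. Many toolchains already ship an optimized memcpy, so measure before replacing it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word-wise copy: moves 32 bits at a time when both pointers
 * are 4-byte aligned, otherwise falls back to a byte loop.
 * Note: accessing the buffers through uint32_t pointers assumes the
 * toolchain tolerates this type punning (many embedded builds use
 * -fno-strict-aliasing); verify this for your compiler. */
static void *word_memcpy(void *dst, const void *src, size_t len)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        uint32_t *dw = (uint32_t *)(void *)d;
        const uint32_t *sw = (const uint32_t *)(const void *)s;
        for (; len >= 4; len -= 4) {
            *dw++ = *sw++;
        }
        d = (uint8_t *)dw;
        s = (const uint8_t *)sw;
    }
    /* Remaining tail bytes, or the whole buffer if it was unaligned. */
    while (len--) {
        *d++ = *s++;
    }
    return dst;
}

/* In lwipopts.h the stack's copy macro could then be redirected:
 *   #define MEMCPY(dst, src, len) word_memcpy(dst, src, len)
 */
```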
Configuration options influencing throughput
Options are only listed here if they must be changed from their default values in opt.h. Make sure to check your lwipopts.h for unnecessary changes from the defaults.
First of all, turn on statistics in a test-run (defines LWIP_STATS and *_STATS for each protocol) and check that none of the statistic counters reports an error (member '.err' != 0)
Generally, set the MEMP_NUM_* defines as high as your memory allows to prevent running out of pools in high-load situations
Turn off debugging options (don't define LWIP_DEBUG)
As mentioned in the previous paragraph, set the CHECKSUM_CHECK_* and CHECKSUM_GEN_* defines to 0 if checksum is generated and/or checked by your hardware
If your memory allows it, set MEM_USE_POOLS to 1 and define LWIP_MALLOC_MEMPOOL's in lwippools.h. This may waste memory, but pools are way faster than a heap!
On 32-bit platforms, set ETH_PAD_SIZE to 2 to make sure data and headers are 32-bit aligned.
You may even turn off structure-packing for better performance, but this is not thoroughly tested yet, so make sure you test it!
When using a version later than 1.3.2, make sure LWIP_CHECKSUM_ON_COPY is set to 1. This lets the stack calculate the checksum on-the-fly when copying data using memcpy. (This has no effect when the hardware generates/checks checksums.)
Set LWIP_RAW to 0 if you don't need raw pcbs (speeds up input processing).
For TCP optimizations, see Tuning TCP
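Put together, the options listed above might end up in lwipopts.h roughly like this. The numeric values are illustrative assumptions only, and MEMP_USE_CUSTOM_POOLS is not named in the list above (opt.h describes it as the switch that pulls in lwippools.h), so check each option against the opt.h of your lwIP version.

```c
/* lwipopts.h (excerpt): throughput-oriented settings.
 * All values are illustrative; adjust to your hardware and memory. */

/* Test run only: enable statistics, then check that no '.err' counter
 * is non-zero. Usually turned off again for production builds. */
#define LWIP_STATS              1

/* As high as memory allows (example values, not recommendations). */
#define MEMP_NUM_PBUF           32
#define MEMP_NUM_TCP_SEG        32

/* Only if the MAC really generates and verifies checksums in hardware. */
#define CHECKSUM_GEN_IP         0
#define CHECKSUM_GEN_UDP        0
#define CHECKSUM_GEN_TCP        0
#define CHECKSUM_CHECK_IP       0
#define CHECKSUM_CHECK_UDP      0
#define CHECKSUM_CHECK_TCP      0

/* Versions later than 1.3.2: checksum while copying. Has no effect when
 * the hardware handles checksums as configured above. */
#define LWIP_CHECKSUM_ON_COPY   1

/* Pools instead of a heap; needs LWIP_MALLOC_MEMPOOL entries in
 * lwippools.h. */
#define MEM_USE_POOLS           1
#define MEMP_USE_CUSTOM_POOLS   1

/* 32-bit platforms: pad the 14-byte Ethernet header so that the headers
 * and data behind it are 32-bit aligned. The netif driver has to account
 * for the two padding bytes. */
#define ETH_PAD_SIZE            2

/* No raw pcbs needed in this example. */
#define LWIP_RAW                0

/* LWIP_DEBUG is deliberately left undefined. */
```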
http://lwip.wikia.com/wiki/Maximizing_throughput