[Event Zone] [AT-START-F425 Review] Hardware CRC32 vs. Software CRC32 Performance Test

Original poster: zhanzr21, posted 2022-3-27 14:08
This post was last edited by zhanzr21 on 2022-3-27 18:24

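CRC32, or rather the whole CRC family (CRC8, CRC16, CRC32), is a common algorithm for computing a signature of a block of data. Its most common use is verification: after downloading a file, for example, you compute its CRC and compare it with the CRC published by the source to tell whether the data was corrupted. The algorithm's weakness is a comparatively high collision probability (a CRC32 value is only 32 bits, and CRC8/CRC16 are shorter still). It is only suitable when the CRC of the source data is already known and is used to check the received data, and the transmission channel must be basically usable, without too much noise. Otherwise, two different blocks of data producing the same CRC value becomes fairly likely.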
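To make the "compare against the source's CRC" workflow concrete, here is a minimal sketch. It uses CRC-8 with polynomial 0x07 purely because it is the shortest member of the family; the function names `crc8`, `crc8_matches` and `crc8_demo` are made up for this illustration and are not from the post's project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-by-bit CRC-8: polynomial 0x07, init 0, no reflection, no final XOR.
   (Assumption: picked only to keep the example short.) */
uint8_t crc8(const uint8_t *buf, size_t len) {
  uint8_t crc = 0;
  while (len--) {
    crc ^= *buf++;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07) : (uint8_t)(crc << 1);
  }
  return crc;
}

/* Returns 1 when the received data matches the CRC published by the source. */
int crc8_matches(const uint8_t *buf, size_t len, uint8_t expected) {
  return crc8(buf, len) == expected;
}

/* Demo: an intact copy passes, a copy with one flipped bit is rejected. */
void crc8_demo(void) {
  uint8_t data[] = {'1', '2', '3', '4', '5', '6', '7', '8', '9'};
  uint8_t source_crc = crc8(data, sizeof data);

  assert(crc8_matches(data, sizeof data, source_crc));
  data[4] ^= 0x01; /* simulate a single-bit transmission error */
  assert(!crc8_matches(data, sizeof data, source_crc));
}
```

The check only says "probably intact": different data can still map to the same CRC, which is the collision weakness described above.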

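Where CRC isn't suitable, a hash algorithm is generally used instead (broadly speaking, CRC also counts as a kind of hash, but only barely). Common hash algorithms include MD5, SHA1, SHA256, SM3, BLAKE2, and so on.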

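Despite these shortcomings, CRC is still very widely used, especially in embedded software. The algorithm is simple and needs very little compute and storage, so it is easy to implement and fast to run even in bare-bones conditions: in the bootloader stage, when a product has failed badly and field data has to be collected, or for quick integrity checks on Ethernet packets, for example.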

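Because it is so common, many chips have a built-in hardware acceleration unit, for example most Ethernet MAC chips, and some CISC CPUs even implement CRC32 as an instruction.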

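The AT32F425 also comes with a CRC32 hardware acceleration unit, which takes only 4 HCLK cycles to process one 32-bit word. So let's benchmark the hardware implementation against a software one.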

One point worth noting: this hardware unit fetches data in big-endian byte order. The CRC32 parameters it uses are:
  Width  : 32
  Poly   : 0x04c11db7
  Init   : parameter, typically 0xffffffff
  RefIn  : false
  RefOut : false
  XorOut : 0
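The parameter list above can be read as a direct, bit-by-bit implementation. The sketch below is only an illustration of what each parameter means; `crc32_bitwise` is a name chosen here, not something from the post's project or the AT32 library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-by-bit CRC with Width=32, Poly=0x04c11db7, RefIn=false,
   RefOut=false, XorOut=0. Init is supplied by the caller. */
uint32_t crc32_bitwise(const uint8_t *buf, size_t len, uint32_t init) {
  uint32_t crc = init;
  while (len--) {
    /* RefIn=false: each byte enters unreflected at the top of the register */
    crc ^= (uint32_t)(*buf++) << 24;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04c11db7u : (crc << 1);
  }
  /* RefOut=false and XorOut=0: the register is returned unchanged */
  return crc;
}
```

With Init set to 0xffffffff, this parameter set is commonly catalogued as CRC-32/MPEG-2, whose check value for the ASCII string "123456789" is 0x0376E6E7. That makes a handy sanity test for any implementation of it.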
Below is the software CRC32 code. It uses a lookup table, which is somewhat faster than computing the CRC bit by bit:
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* This table was generated by the following program.
   #include <stdint.h>
   #include <stdio.h>
   int
   main ()
   {
     uint32_t i, j;
     uint32_t c;
     uint32_t table[256];
     for (i = 0; i < 256; i++)
       {
         for (c = i << 24, j = 8; j > 0; --j)
           c = c & 0x80000000 ? (c << 1) ^ 0x04c11db7 : (c << 1);
         table[i] = c;
       }
     printf ("static const uint32_t crc32_table[] =\n{\n");
     for (i = 0; i < 256; i += 4)
       {
         printf (" 0x%08x, 0x%08x, 0x%08x, 0x%08x",
                 table[i + 0], table[i + 1], table[i + 2], table[i + 3]);
         if (i + 4 < 256)
           putchar (',');
         putchar ('\n');
       }
     printf ("};\n");
     return 0;
   }
   For more information on CRC, see, e.g.,
   http://www.ross.net/crc/download/crc_v3.txt. */

static const uint32_t crc32_table[] = {
    0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9, 0x130476dc, 0x17c56b6b,
    0x1a864db2, 0x1e475005, 0x2608edb8, 0x22c9f00f, 0x2f8ad6d6, 0x2b4bcb61,
    0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd, 0x4c11db70, 0x48d0c6c7,
    0x4593e01e, 0x4152fda9, 0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75,
    0x6a1936c8, 0x6ed82b7f, 0x639b0da6, 0x675a1011, 0x791d4014, 0x7ddc5da3,
    0x709f7b7a, 0x745e66cd, 0x9823b6e0, 0x9ce2ab57, 0x91a18d8e, 0x95609039,
    0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5, 0xbe2b5b58, 0xbaea46ef,
    0xb7a96036, 0xb3687d81, 0xad2f2d84, 0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d,
    0xd4326d90, 0xd0f37027, 0xddb056fe, 0xd9714b49, 0xc7361b4c, 0xc3f706fb,
    0xceb42022, 0xca753d95, 0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1,
    0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d, 0x34867077, 0x30476dc0,
    0x3d044b19, 0x39c556ae, 0x278206ab, 0x23431b1c, 0x2e003dc5, 0x2ac12072,
    0x128e9dcf, 0x164f8078, 0x1b0ca6a1, 0x1fcdbb16, 0x018aeb13, 0x054bf6a4,
    0x0808d07d, 0x0cc9cdca, 0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde,
    0x6b93dddb, 0x6f52c06c, 0x6211e6b5, 0x66d0fb02, 0x5e9f46bf, 0x5a5e5b08,
    0x571d7dd1, 0x53dc6066, 0x4d9b3063, 0x495a2dd4, 0x44190b0d, 0x40d816ba,
    0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e, 0xbfa1b04b, 0xbb60adfc,
    0xb6238b25, 0xb2e29692, 0x8aad2b2f, 0x8e6c3698, 0x832f1041, 0x87ee0df6,
    0x99a95df3, 0x9d684044, 0x902b669d, 0x94ea7b2a, 0xe0b41de7, 0xe4750050,
    0xe9362689, 0xedf73b3e, 0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2,
    0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686, 0xd5b88683, 0xd1799b34,
    0xdc3abded, 0xd8fba05a, 0x690ce0ee, 0x6dcdfd59, 0x608edb80, 0x644fc637,
    0x7a089632, 0x7ec98b85, 0x738aad5c, 0x774bb0eb, 0x4f040d56, 0x4bc510e1,
    0x46863638, 0x42472b8f, 0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53,
    0x251d3b9e, 0x21dc2629, 0x2c9f00f0, 0x285e1d47, 0x36194d42, 0x32d850f5,
    0x3f9b762c, 0x3b5a6b9b, 0x0315d626, 0x07d4cb91, 0x0a97ed48, 0x0e56f0ff,
    0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623, 0xf12f560e, 0xf5ee4bb9,
    0xf8ad6d60, 0xfc6c70d7, 0xe22b20d2, 0xe6ea3d65, 0xeba91bbc, 0xef68060b,
    0xd727bbb6, 0xd3e6a601, 0xdea580d8, 0xda649d6f, 0xc423cd6a, 0xc0e2d0dd,
    0xcda1f604, 0xc960ebb3, 0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7,
    0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b, 0x9b3660c6, 0x9ff77d71,
    0x92b45ba8, 0x9675461f, 0x8832161a, 0x8cf30bad, 0x81b02d74, 0x857130c3,
    0x5d8a9099, 0x594b8d2e, 0x5408abf7, 0x50c9b640, 0x4e8ee645, 0x4a4ffbf2,
    0x470cdd2b, 0x43cdc09c, 0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8,
    0x68860bfd, 0x6c47164a, 0x61043093, 0x65c52d24, 0x119b4be9, 0x155a565e,
    0x18197087, 0x1cd86d30, 0x029f3d35, 0x065e2082, 0x0b1d065b, 0x0fdc1bec,
    0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088, 0x2497d08d, 0x2056cd3a,
    0x2d15ebe3, 0x29d4f654, 0xc5a92679, 0xc1683bce, 0xcc2b1d17, 0xc8ea00a0,
    0xd6ad50a5, 0xd26c4d12, 0xdf2f6bcb, 0xdbee767c, 0xe3a1cbc1, 0xe760d676,
    0xea23f0af, 0xeee2ed18, 0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4,
    0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0, 0x9abc8bd5, 0x9e7d9662,
    0x933eb0bb, 0x97ffad0c, 0xafb010b1, 0xab710d06, 0xa6322bdf, 0xa2f33668,
    0xbcb4666d, 0xb8757bda, 0xb5365d03, 0xb1f740b4};

/*
@deftypefn Extension {uint32_t} crc32 (const uint8_t *@var{buf}, @
size_t @var{len}, uint32_t @var{init})
Compute the 32-bit CRC of @var{buf} which has length @var{len}. The
starting value is @var{init}; this may be used to compute the CRC of
data split across multiple buffers by passing the return value of each
call as the @var{init} parameter of the next.
This is used by the @command{gdb} remote protocol for the @samp{qCRC}
command. In order to get the same results as gdb for a block of data,
you must pass the first CRC parameter as @code{0xffffffff}.
This CRC can be specified as:
Width : 32
Poly : 0x04c11db7
Init : parameter, typically 0xffffffff
RefIn : false
RefOut : false
XorOut : 0
This differs from the "standard" CRC-32 algorithm in that the values
are not reflected, and there is no final XOR value. These differences
make it easy to compose the values of multiple blocks.
@end deftypefn
*/

uint32_t xcrc32(const uint8_t *buf, size_t len, uint32_t init) {
#define XOROUT 0x00000000
  uint32_t crc = init;
  while (len--) {
    /* Index the table with the top CRC byte XORed with the next data byte. */
    crc = (crc << 8) ^ crc32_table[((crc >> 24) ^ *buf) & 255];
    buf++;
  }
  return crc ^ XOROUT;
#undef XOROUT
}
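The comment in the code above notes that, with no reflection and no final XOR, the CRC of data split across several buffers can be composed by passing one call's result as the next call's init value. A small sketch of that chaining follows. It uses a compact bit-by-bit routine instead of the table so that it stays self-contained; `crc32_update` and `crc32_two_parts` are illustrative names, not part of the post's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same parameters as xcrc32 (Poly 0x04c11db7, unreflected, XorOut 0),
   computed bit by bit instead of via the lookup table. */
uint32_t crc32_update(const uint8_t *buf, size_t len, uint32_t init) {
  uint32_t crc = init;
  while (len--) {
    crc ^= (uint32_t)(*buf++) << 24;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04c11db7u : (crc << 1);
  }
  return crc;
}

/* CRC of the concatenation a||b, computed without joining the buffers:
   the result for the first part becomes the init value for the second. */
uint32_t crc32_two_parts(const uint8_t *a, size_t a_len,
                         const uint8_t *b, size_t b_len) {
  uint32_t crc = crc32_update(a, a_len, 0xffffffffu);
  return crc32_update(b, b_len, crc);
}
```

Splitting "123456789" into "1234" and "56789" and chaining the two calls gives the same value as one call over the whole string. This is useful when data arrives in chunks, for example over a serial link.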
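Below is the test code: a buffer of 120 words, processed 5000 times. Note that the hardware CRC unit uses big-endian byte order, so for the software calculation the buffer's byte order has to be swapped first, otherwise the results won't match. Keep this in mind in real-world use as well.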
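#include <stddef.h>
#include <stdio.h>

#include "at32f425.h"
#include "at32f425_clock.h"
#include "custom_at32f425_board.h"

/* Millisecond tick counter; expected to be incremented by the SysTick
   handler, which is not shown in this post. */
__IO uint32_t g_Ticks;

#define TEST_LOOP 5000
#define CRC32_REF_RESULT 0xE5DFCF6D

#define BUFFER_SIZE 120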
static const uint32_t data_buffer[BUFFER_SIZE] = {
    0xc33dd31c, 0xe37ff35e, 0x129022f3, 0x32d24235, 0x52146277, 0x7256b5ea,
    0x4a755a54, 0x6a377a16, 0x0af11ad0, 0x2ab33a92, 0xed0fdd6c, 0xcd4dbdaa,
    0xbb3bab1a, 0x6ca67c87, 0x5cc52c22, 0x3c030c60, 0x1c41edae, 0xfd8fcdec,
    0xad8b9de8, 0x8dc97c26, 0x5c644c45, 0x3ca22c83, 0x1ce00cc1, 0xef1fff3e,
    0x95a88589, 0xf56ee54f, 0xd52cc50d, 0x34e224c3, 0x04817466, 0x64475424,
    0x78066827, 0x18c008e1, 0x28a3cb7d, 0xdb5ceb3f, 0xfb1e8bf9, 0x9bd8abbb,
    0xdf7caf9b, 0xbfba8fd9, 0x9ff86e17, 0x7e364e55, 0x2e933eb2, 0x0ed11ef0,
    0xa35ad3bd, 0xc39cf3ff, 0xe3de2462, 0x34430420, 0x64e674c7, 0x44a45485,
    0xad2abd0b, 0x8d689d49, 0x7e976eb6, 0x5ed54ef4, 0x2e321e51, 0x0e70ff9f,
    0xefbedfdd, 0xcffcbf1b, 0x9f598f78, 0x918881a9, 0xb1caa1eb, 0xd10cc12d,
    0xe16f1080, 0x00a130c2, 0x20e35004, 0x40257046, 0x83b99398, 0xa3fbb3da,
    0x00001021, 0x20423063, 0x408450a5, 0x60c670e7, 0x9129a14a, 0xb16bc18c,
    0x569546b4, 0xb75ba77a, 0x97198738, 0xf7dfe7fe, 0xc7bc48c4, 0x58e56886,
    0x4405a7db, 0xb7fa8799, 0xe75ff77e, 0xc71dd73c, 0x26d336f2, 0x069116b0,
    0x76764615, 0x5634d94c, 0xc96df90e, 0xe92f99c8, 0xb98aa9ab, 0x58444865,
    0x78a70840, 0x18612802, 0xc9ccd9ed, 0xe98ef9af, 0x89489969, 0xa90ab92b,
    0xd1ade1ce, 0xf1ef1231, 0x32732252, 0x52b54294, 0x72f762d6, 0x93398318,
    0xa56ab54b, 0x85289509, 0xf5cfc5ac, 0xd58d3653, 0x26721611, 0x063076d7,
    0x8d689d49, 0xf7dfe7fe, 0xe98ef9af, 0x063076d7, 0x93398318, 0xb98aa9ab,
    0x4ad47ab7, 0x6a961a71, 0x0a503a33, 0x2a12dbfd, 0xfbbfeb9e, 0x9b798b58};

/* The same 120 words as data_buffer, with the bytes of each word reversed. */
static const uint32_t bswap32_data_buffer[BUFFER_SIZE] = {
    0x1CD33DC3, 0x5EF37FE3, 0xF3229012, 0x3542D232, 0x77621452, 0xEAB55672,
    0x545A754A, 0x167A376A, 0xD01AF10A, 0x923AB32A, 0x6CDD0FED, 0xAABD4DCD,
    0x1AAB3BBB, 0x877CA66C, 0x222CC55C, 0x600C033C, 0xAEED411C, 0xECCD8FFD,
    0xE89D8BAD, 0x267CC98D, 0x454C645C, 0x832CA23C, 0xC10CE01C, 0x3EFF1FEF,
    0x8985A895, 0x4FE56EF5, 0x0DC52CD5, 0xC324E234, 0x66748104, 0x24544764,
    0x27680678, 0xE108C018, 0x7DCBA328, 0x3FEB5CDB, 0xF98B1EFB, 0xBBABD89B,
    0x9BAF7CDF, 0xD98FBABF, 0x176EF89F, 0x554E367E, 0xB23E932E, 0xF01ED10E,
    0xBDD35AA3, 0xFFF39CC3, 0x6224DEE3, 0x20044334, 0xC774E664, 0x8554A444,
    0x0BBD2AAD, 0x499D688D, 0xB66E977E, 0xF44ED55E, 0x511E322E, 0x9FFF700E,
    0xDDDFBEEF, 0x1BBFFCCF, 0x788F599F, 0xA9818891, 0xEBA1CAB1, 0x2DC10CD1,
    0x80106FE1, 0xC230A100, 0x0450E320, 0x46702540, 0x9893B983, 0xDAB3FBA3,
    0x21100000, 0x63304220, 0xA5508440, 0xE770C660, 0x4AA12991, 0x8CC16BB1,
    0xB4469556, 0x7AA75BB7, 0x38871997, 0xFEE7DFF7, 0xC448BCC7, 0x8668E558,
    0xDBA70544, 0x9987FAB7, 0x7EF75FE7, 0x3CD71DC7, 0xF236D326, 0xB0169106,
    0x15467676, 0x4CD93456, 0x0EF96DC9, 0xC8992FE9, 0xABA98AB9, 0x65484458,
    0x4008A778, 0x02286118, 0xEDD9CCC9, 0xAFF98EE9, 0x69994889, 0x2BB90AA9,
    0xCEE1ADD1, 0x3112EFF1, 0x52227332, 0x9442B552, 0xD662F772, 0x18833993,
    0x4BB56AA5, 0x09952885, 0xACC5CFF5, 0x53368DD5, 0x11167226, 0xD7763006,
    0x499D688D, 0xFEE7DFF7, 0xAFF98EE9, 0xD7763006, 0x18833993, 0xABA98AB9,
    0xB77AD44A, 0x711A966A, 0x333A500A, 0xFDDB122A, 0x9EEBBFFB, 0x588B799B,
};

__IO uint32_t crc_value = 0;

extern uint32_t xcrc32(const uint8_t *buf, size_t len, uint32_t init);

int main(void) {
  system_clock_config();

  uint32_t test_tick_0;
  uint32_t test_tick_1;

  /* System timer configuration: one tick per millisecond */
  SysTick_Config(system_core_clock / 1000);

  uart_print_init(115200);
  at32_board_init();
  /* enable crc clock */
  crm_periph_clock_enable(CRM_CRC_PERIPH_CLOCK, TRUE);

  printf("AT START F425 Board @ %u MHz\n", system_core_clock / (1000000));
  printf("Boot Mem:%02X\n", scfg_mem_map_get());

  printf("CRC test start\n");

  /* Sanity check 1: hardware on the original buffer vs. software on the
     byte-swapped buffer. Both lines should print the same value. */
  {
    crc_data_reset();
    crc_value = crc_block_calculate((uint32_t *)data_buffer, BUFFER_SIZE);
    printf("Hardware:\t%08X\n", crc_value);
    crc_value = xcrc32((const uint8_t *)bswap32_data_buffer,
                       sizeof(bswap32_data_buffer), 0xffffffff);
    printf("Software:\t%08X\n", crc_value);
  }
  /* Sanity check 2: the same comparison with the two buffers exchanged. */
  {
    crc_data_reset();
    crc_value =
        crc_block_calculate((uint32_t *)bswap32_data_buffer, BUFFER_SIZE);
    printf("Hardware:\t%08X\n", crc_value);
    crc_value =
        xcrc32((const uint8_t *)data_buffer, sizeof(data_buffer), 0xffffffff);
    printf("Software:\t%08X\n", crc_value);
  }

  /* Benchmark 1: hardware CRC unit */
  test_tick_0 = g_Ticks;
  for (uint32_t i = 0; i < TEST_LOOP; ++i) {
    crc_data_reset();

    /* compute the crc of "data_buffer" */
    crc_value = crc_block_calculate((uint32_t *)data_buffer, BUFFER_SIZE);
    if (crc_value == CRC32_REF_RESULT) {
      continue;
    } else {
      printf("error %08X\n", crc_value);
      break;
    }
  }
  test_tick_1 = g_Ticks;

  printf("Hardware CRC test end[%08X], %u, %u, [%u]\n", crc_value, test_tick_0,
         test_tick_1, (test_tick_1 - test_tick_0));

  /* Benchmark 2: table-driven software CRC on the byte-swapped buffer */
  test_tick_0 = g_Ticks;
  for (uint32_t i = 0; i < TEST_LOOP; ++i) {
    crc_value = xcrc32((const uint8_t *)bswap32_data_buffer,
                       sizeof(bswap32_data_buffer), 0xffffffff);

    if (crc_value == CRC32_REF_RESULT) {
      continue;
    } else {
      printf("error %08X\n", crc_value);
      break;
    }
  }
  test_tick_1 = g_Ticks;

  printf("Software CRC test end[%08X], %u, %u, [%u]\n", crc_value, test_tick_0,
         test_tick_1, (test_tick_1 - test_tick_0));

  printf("F425 @ %u MHz\n", system_core_clock / (1000000));

  /* Idle loop: blink the LEDs. Comparing elapsed ticks (rather than an
     absolute deadline) stays correct when g_Ticks wraps around. */
  while (1) {
    // printf("%u MHz, Ticks:%u\n", system_core_clock/(1000000), g_Ticks);

    test_tick_0 = g_Ticks;
    while ((g_Ticks - test_tick_0) < 200) {
      __NOP();
      __WFI();
    }
    at32_led_toggle(LED2);

    test_tick_0 = g_Ticks;
    while ((g_Ticks - test_tick_0) < 400) {
      __NOP();
      __WFI();
    }
    at32_led_toggle(LED3);

    test_tick_0 = g_Ticks;
    while ((g_Ticks - test_tick_0) < 800) {
      __NOP();
      __WFI();
    }
    at32_led_toggle(LED4);
  }
}
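The byte-order point in the test code can be illustrated on its own. The sketch below models a CRC unit that consumes each 32-bit word most-significant byte first, which is how the post describes the hardware. It then shows that a byte-wise software CRC reaches the same value when it walks the little-endian memory image of the byte-swapped words. All function names are made up for this example, and the model is an assumption based on the post's description, not on the AT32 driver source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the four bytes of a word, e.g. 0xc33dd31c -> 0x1cd33dc3. */
uint32_t bswap32(uint32_t w) {
  return (w >> 24) | ((w >> 8) & 0x0000ff00u) |
         ((w << 8) & 0x00ff0000u) | (w << 24);
}

/* Model of a word-oriented unit: XOR in the whole word, then 32 shifts. */
uint32_t crc32_words_msb_first(const uint32_t *words, size_t count,
                               uint32_t init) {
  uint32_t crc = init;
  for (size_t i = 0; i < count; ++i) {
    crc ^= words[i];
    for (int bit = 0; bit < 32; ++bit)
      crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04c11db7u : (crc << 1);
  }
  return crc;
}

/* Byte-wise software CRC with the same polynomial (bit-by-bit variant). */
uint32_t crc32_bytes(const uint8_t *buf, size_t len, uint32_t init) {
  uint32_t crc = init;
  while (len--) {
    crc ^= (uint32_t)(*buf++) << 24;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04c11db7u : (crc << 1);
  }
  return crc;
}

/* Returns 1 if the word-wise CRC of the original words equals the byte-wise
   CRC over the little-endian memory image of the byte-swapped words. */
int byte_order_demo(void) {
  const uint32_t words[2] = {0xc33dd31cu, 0xe37ff35eu}; /* from data_buffer */
  uint8_t bytes[8];
  for (size_t i = 0; i < 2; ++i) {
    uint32_t s = bswap32(words[i]);
    bytes[4 * i + 0] = (uint8_t)(s);       /* lowest address on little-endian */
    bytes[4 * i + 1] = (uint8_t)(s >> 8);
    bytes[4 * i + 2] = (uint8_t)(s >> 16);
    bytes[4 * i + 3] = (uint8_t)(s >> 24);
  }
  return crc32_words_msb_first(words, 2, 0xffffffffu) ==
         crc32_bytes(bytes, sizeof bytes, 0xffffffffu);
}
```

If keeping a second, pre-swapped buffer is inconvenient, a word-wise routine like `crc32_words_msb_first` is one way to get software results comparable with the hardware without copying the data.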
Test results:
[Image: 69100623ffdc45c0a8.png]
Plotted as a chart, the results are easier to read:
[Image: crc32_benchmark.png]
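A few conclusions can be drawn:
1. Hardware CRC is much faster than software CRC, more than 7 times faster in the best case.
2. Hardware CRC speed is still affected by the compiler optimization level, mainly because moving the data in and out still takes software code.
3. Using microlib or the standard library makes no real difference to performance, because the CRC calculation calls almost no libc functions; it is mostly logic operations and data movement.
4. ARMCLANG optimizes better than ARMCC, but with optimization disabled ARMCLANG performs worse, probably because it inserts a lot of debug code.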
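For conclusion 2, it helps to know how little time the CRC unit itself needs. The back-of-the-envelope sketch below uses the 4 HCLK cycles per word quoted earlier and the test's 120 words and 5000 repetitions. The 96 MHz clock is only an assumed example value, since the post prints the actual clock at run time without stating it in the text, and the helper names are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles spent inside the CRC unit alone, ignoring all load/store and
   loop overhead in the software that feeds it. */
uint64_t crc_unit_cycles(uint32_t words, uint32_t loops,
                         uint32_t cycles_per_word) {
  return (uint64_t)words * loops * cycles_per_word;
}

/* Convert a cycle count to milliseconds for a given HCLK frequency. */
double cycles_to_ms(uint64_t cycles, uint32_t hclk_hz) {
  return (double)cycles * 1000.0 / (double)hclk_hz;
}
```

`crc_unit_cycles(120, 5000, 4)` gives 2,400,000 cycles, which `cycles_to_ms` turns into 25 ms at the assumed 96 MHz. Any measured hardware time beyond such a lower bound is spent in the surrounding code, which is why the optimization level still matters.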

That's all for this post. Code repository:
https://github.com/zhanzr/at32f425-prj.git
Branch: crc_test


   