[AT-START-F425 Review] Hardware CRC32 vs. Software CRC32 Performance Test

Last edited by zhanzr21 on 2022-3-27 18:24

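CRC32, or more generally the CRC family (CRC8, CRC16, CRC32), is a common algorithm for computing a fingerprint of a block of data. Its most common use is verification: after downloading a file, for example, you compute its CRC and compare it with the CRC published by the source to tell whether the data was corrupted. The drawback is that a CRC is short and offers no collision resistance, so two different inputs share the same CRC far more easily than with a cryptographic hash. It is therefore only suitable when the CRC of the original data is already known and is used to check the received copy, and the transmission channel must be basically usable, without too much noise. Otherwise, two different blocks of data producing the same CRC becomes a realistic possibility.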

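Where a CRC is not suitable, a hash algorithm is normally used instead (in a broad sense a CRC is also a kind of hash, but only just). Common hash algorithms include MD5, SHA1, SHA256, SM3 and BLAKE2, among many others.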

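Despite these drawbacks CRC is still widely used, especially in embedded software. The algorithm is simple and needs very little computing power or storage, so it is easy to implement and fast to run even under spartan conditions: in the bootloader stage, when collecting field data after a product has suffered a serious failure, for quick integrity checks on Ethernet frames, and so on.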

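Because it is so common, many chips include a hardware acceleration unit, for example most Ethernet MAC chips, and some CISC-class CPUs even implement CRC32 as an instruction.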

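The AT32F425 also has a CRC32 hardware acceleration unit; computing one 32-bit word takes only 4 HCLK cycles. This post benchmarks the hardware implementation against a software implementation.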

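One point worth noting: this hardware unit consumes its data in big-endian byte order, that is, the most significant byte of each word is processed first. The CRC32 parameters used are:
  Width  : 32
  Poly   : 0x04c11db7
  Init   : parameter, typically 0xffffffff
  RefIn  : false
  RefOut : false
  XorOut : 0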
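To make the parameter list concrete, here is a minimal bit-by-bit sketch of a CRC with exactly these parameters. The function names crc32_bitwise and crc32_bitwise_selftest are made up for illustration and are not part of the post's code or of the vendor library. The self-test relies on the check value 0x0376E6E7 that CRC catalogues list for this parameter set (often called CRC-32/MPEG-2) with the input "123456789".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-by-bit CRC: Width=32, Poly=0x04c11db7, RefIn=false,
   RefOut=false, XorOut=0. Hypothetical helper for illustration. */
uint32_t crc32_bitwise(const uint8_t *buf, size_t len, uint32_t init) {
  uint32_t crc = init;
  while (len--) {
    /* RefIn=false: the next byte enters at the top of the register. */
    crc ^= (uint32_t)(*buf++) << 24;
    for (int bit = 0; bit < 8; ++bit) {
      /* Shift out one bit; XOR in the polynomial if that bit was 1. */
      crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04c11db7u : (crc << 1);
    }
  }
  return crc; /* XorOut=0 and RefOut=false: return the register as is. */
}

/* Compare against the catalogue check value for "123456789". */
void crc32_bitwise_selftest(void) {
  static const uint8_t msg[] = "123456789";
  assert(crc32_bitwise(msg, 9, 0xffffffffu) == 0x0376e6e7u);
}
```

A table-driven version trades 1 KB of constant data for processing a whole byte per step instead of a single bit.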
Below is the software CRC32 code. It uses a lookup table, which is somewhat faster than computing bit by bit:
#include <stddef.h> /* size_t */
#include <stdint.h>
#include <stdio.h>

/* This table was generated by the following program.
   #include <stdint.h>
   #include <stdio.h>
   int
   main ()
   {
     uint32_t i, j;
     uint32_t c;
     uint32_t table[256];
     for (i = 0; i < 256; i++)
       {
         for (c = i << 24, j = 8; j > 0; --j)
           c = c & 0x80000000 ? (c << 1) ^ 0x04c11db7 : (c << 1);
         table[i] = c;
       }
     printf ("static const uint32_t crc32_table[] =\n{\n");
     for (i = 0; i < 256; i += 4)
       {
         printf (" 0x%08x, 0x%08x, 0x%08x, 0x%08x",
                 table[i + 0], table[i + 1], table[i + 2], table[i + 3]);
         if (i + 4 < 256)
           putchar (',');
         putchar ('\n');
       }
     printf ("};\n");
     return 0;
   }
   For more information on CRC, see, e.g.,
   http://www.ross.net/crc/download/crc_v3.txt. */

static const uint32_t crc32_table[] = {
0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9, 0x130476dc, 0x17c56b6b,
0x1a864db2, 0x1e475005, 0x2608edb8, 0x22c9f00f, 0x2f8ad6d6, 0x2b4bcb61,
0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd, 0x4c11db70, 0x48d0c6c7,
0x4593e01e, 0x4152fda9, 0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75,
0x6a1936c8, 0x6ed82b7f, 0x639b0da6, 0x675a1011, 0x791d4014, 0x7ddc5da3,
0x709f7b7a, 0x745e66cd, 0x9823b6e0, 0x9ce2ab57, 0x91a18d8e, 0x95609039,
0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5, 0xbe2b5b58, 0xbaea46ef,
0xb7a96036, 0xb3687d81, 0xad2f2d84, 0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d,
0xd4326d90, 0xd0f37027, 0xddb056fe, 0xd9714b49, 0xc7361b4c, 0xc3f706fb,
0xceb42022, 0xca753d95, 0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1,
0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d, 0x34867077, 0x30476dc0,
0x3d044b19, 0x39c556ae, 0x278206ab, 0x23431b1c, 0x2e003dc5, 0x2ac12072,
0x128e9dcf, 0x164f8078, 0x1b0ca6a1, 0x1fcdbb16, 0x018aeb13, 0x054bf6a4,
0x0808d07d, 0x0cc9cdca, 0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde,
0x6b93dddb, 0x6f52c06c, 0x6211e6b5, 0x66d0fb02, 0x5e9f46bf, 0x5a5e5b08,
0x571d7dd1, 0x53dc6066, 0x4d9b3063, 0x495a2dd4, 0x44190b0d, 0x40d816ba,
0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e, 0xbfa1b04b, 0xbb60adfc,
0xb6238b25, 0xb2e29692, 0x8aad2b2f, 0x8e6c3698, 0x832f1041, 0x87ee0df6,
0x99a95df3, 0x9d684044, 0x902b669d, 0x94ea7b2a, 0xe0b41de7, 0xe4750050,
0xe9362689, 0xedf73b3e, 0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2,
0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686, 0xd5b88683, 0xd1799b34,
0xdc3abded, 0xd8fba05a, 0x690ce0ee, 0x6dcdfd59, 0x608edb80, 0x644fc637,
0x7a089632, 0x7ec98b85, 0x738aad5c, 0x774bb0eb, 0x4f040d56, 0x4bc510e1,
0x46863638, 0x42472b8f, 0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53,
0x251d3b9e, 0x21dc2629, 0x2c9f00f0, 0x285e1d47, 0x36194d42, 0x32d850f5,
0x3f9b762c, 0x3b5a6b9b, 0x0315d626, 0x07d4cb91, 0x0a97ed48, 0x0e56f0ff,
0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623, 0xf12f560e, 0xf5ee4bb9,
0xf8ad6d60, 0xfc6c70d7, 0xe22b20d2, 0xe6ea3d65, 0xeba91bbc, 0xef68060b,
0xd727bbb6, 0xd3e6a601, 0xdea580d8, 0xda649d6f, 0xc423cd6a, 0xc0e2d0dd,
0xcda1f604, 0xc960ebb3, 0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7,
0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b, 0x9b3660c6, 0x9ff77d71,
0x92b45ba8, 0x9675461f, 0x8832161a, 0x8cf30bad, 0x81b02d74, 0x857130c3,
0x5d8a9099, 0x594b8d2e, 0x5408abf7, 0x50c9b640, 0x4e8ee645, 0x4a4ffbf2,
0x470cdd2b, 0x43cdc09c, 0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8,
0x68860bfd, 0x6c47164a, 0x61043093, 0x65c52d24, 0x119b4be9, 0x155a565e,
0x18197087, 0x1cd86d30, 0x029f3d35, 0x065e2082, 0x0b1d065b, 0x0fdc1bec,
0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088, 0x2497d08d, 0x2056cd3a,
0x2d15ebe3, 0x29d4f654, 0xc5a92679, 0xc1683bce, 0xcc2b1d17, 0xc8ea00a0,
0xd6ad50a5, 0xd26c4d12, 0xdf2f6bcb, 0xdbee767c, 0xe3a1cbc1, 0xe760d676,
0xea23f0af, 0xeee2ed18, 0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4,
0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0, 0x9abc8bd5, 0x9e7d9662,
0x933eb0bb, 0x97ffad0c, 0xafb010b1, 0xab710d06, 0xa6322bdf, 0xa2f33668,
0xbcb4666d, 0xb8757bda, 0xb5365d03, 0xb1f740b4};

/*
@deftypefn Extension {uint32_t} xcrc32 (const uint8_t *@var{buf}, @
size_t @var{len}, uint32_t @var{init})
Compute the 32-bit CRC of @var{buf} which has length @var{len}. The
starting value is @var{init}; this may be used to compute the CRC of
data split across multiple buffers by passing the return value of each
call as the @var{init} parameter of the next.
This is used by the @command{gdb} remote protocol for the @samp{qCRC}
command. In order to get the same results as gdb for a block of data,
you must pass the first CRC parameter as @code{0xffffffff}.
This CRC can be specified as:
Width : 32
Poly : 0x04c11db7
Init : parameter, typically 0xffffffff
RefIn : false
RefOut : false
XorOut : 0
This differs from the "standard" CRC-32 algorithm in that the values
are not reflected, and there is no final XOR value. These differences
make it easy to compose the values of multiple blocks.
@end deftypefn
*/

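uint32_t xcrc32(const uint8_t *buf, size_t len, uint32_t init) {
#define XOROUT 0x00000000
  uint32_t crc = init;
  while (len--) {
    /* Combine the next byte with the top 8 bits of the register and
       replace eight single-bit steps with one table lookup. */
    crc = (crc << 8) ^ crc32_table[((crc >> 24) ^ *buf) & 255];
    buf++;
  }
  return crc ^ XOROUT;
#undef XOROUT
}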
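The doc comment above mentions that the return value of one call can be passed as the init of the next, so a CRC can be computed over data that arrives in pieces. The following is a small sketch of that usage. It is self-contained, so it carries its own bit-by-bit stand-in (crc32_ref, a hypothetical name) with the same signature and parameters as xcrc32; crc32_chunked is likewise made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for xcrc32 (same parameters, computed bit by bit) so that
   this sketch compiles on its own. */
static uint32_t crc32_ref(const uint8_t *buf, size_t len, uint32_t init) {
  uint32_t crc = init;
  while (len--) {
    crc ^= (uint32_t)(*buf++) << 24;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04c11db7u : (crc << 1);
  }
  return crc;
}

/* Feed the buffer in pieces of at most `chunk` bytes, passing each
   intermediate result as the init value of the next call. */
uint32_t crc32_chunked(const uint8_t *buf, size_t len, size_t chunk) {
  uint32_t crc = 0xffffffffu; /* typical Init value */
  if (chunk == 0)
    chunk = 1; /* avoid an endless loop on a zero chunk size */
  while (len > 0) {
    size_t n = (len < chunk) ? len : chunk;
    crc = crc32_ref(buf, n, crc);
    buf += n;
    len -= n;
  }
  return crc;
}
```

Because there is no reflection and no final XOR, crc32_chunked(buf, len, k) gives the same value as a single call over the whole buffer for any chunk size k.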
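Below is the test code: a buffer of 120 words, processed 5000 times. Note that the hardware CRC unit consumes each word in big-endian byte order, so for the software calculation the byte order of the buffer has to be swapped first; otherwise the results do not match. Keep this in mind in real applications as well.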
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#include "at32f425.h"
#include "at32f425_clock.h"
#include "custom_at32f425_board.h"

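/* 1 ms tick counter; assumed to be incremented by the SysTick handler
   defined elsewhere in the project. */
__IO uint32_t g_Ticks;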

#define TEST_LOOP 5000
#define CRC32_REF_RESULT 0xE5DFCF6D

#define BUFFER_SIZE 120
static const uint32_t data_buffer[BUFFER_SIZE] = {
    0xc33dd31c, 0xe37ff35e, 0x129022f3, 0x32d24235, 0x52146277, 0x7256b5ea,
    0x4a755a54, 0x6a377a16, 0x0af11ad0, 0x2ab33a92, 0xed0fdd6c, 0xcd4dbdaa,
    0xbb3bab1a, 0x6ca67c87, 0x5cc52c22, 0x3c030c60, 0x1c41edae, 0xfd8fcdec,
    0xad8b9de8, 0x8dc97c26, 0x5c644c45, 0x3ca22c83, 0x1ce00cc1, 0xef1fff3e,
    0x95a88589, 0xf56ee54f, 0xd52cc50d, 0x34e224c3, 0x04817466, 0x64475424,
    0x78066827, 0x18c008e1, 0x28a3cb7d, 0xdb5ceb3f, 0xfb1e8bf9, 0x9bd8abbb,
    0xdf7caf9b, 0xbfba8fd9, 0x9ff86e17, 0x7e364e55, 0x2e933eb2, 0x0ed11ef0,
    0xa35ad3bd, 0xc39cf3ff, 0xe3de2462, 0x34430420, 0x64e674c7, 0x44a45485,
    0xad2abd0b, 0x8d689d49, 0x7e976eb6, 0x5ed54ef4, 0x2e321e51, 0x0e70ff9f,
    0xefbedfdd, 0xcffcbf1b, 0x9f598f78, 0x918881a9, 0xb1caa1eb, 0xd10cc12d,
    0xe16f1080, 0x00a130c2, 0x20e35004, 0x40257046, 0x83b99398, 0xa3fbb3da,
    0x00001021, 0x20423063, 0x408450a5, 0x60c670e7, 0x9129a14a, 0xb16bc18c,
    0x569546b4, 0xb75ba77a, 0x97198738, 0xf7dfe7fe, 0xc7bc48c4, 0x58e56886,
    0x4405a7db, 0xb7fa8799, 0xe75ff77e, 0xc71dd73c, 0x26d336f2, 0x069116b0,
    0x76764615, 0x5634d94c, 0xc96df90e, 0xe92f99c8, 0xb98aa9ab, 0x58444865,
    0x78a70840, 0x18612802, 0xc9ccd9ed, 0xe98ef9af, 0x89489969, 0xa90ab92b,
    0xd1ade1ce, 0xf1ef1231, 0x32732252, 0x52b54294, 0x72f762d6, 0x93398318,
    0xa56ab54b, 0x85289509, 0xf5cfc5ac, 0xd58d3653, 0x26721611, 0x063076d7,
    0x8d689d49, 0xf7dfe7fe, 0xe98ef9af, 0x063076d7, 0x93398318, 0xb98aa9ab,
    0x4ad47ab7, 0x6a961a71, 0x0a503a33, 0x2a12dbfd, 0xfbbfeb9e, 0x9b798b58};

static const uint32_t bswap32_data_buffer[BUFFER_SIZE] = {
    0x1CD33DC3, 0x5EF37FE3, 0xF3229012, 0x3542D232, 0x77621452, 0xEAB55672,
    0x545A754A, 0x167A376A, 0xD01AF10A, 0x923AB32A, 0x6CDD0FED, 0xAABD4DCD,
    0x1AAB3BBB, 0x877CA66C, 0x222CC55C, 0x600C033C, 0xAEED411C, 0xECCD8FFD,
    0xE89D8BAD, 0x267CC98D, 0x454C645C, 0x832CA23C, 0xC10CE01C, 0x3EFF1FEF,
    0x8985A895, 0x4FE56EF5, 0x0DC52CD5, 0xC324E234, 0x66748104, 0x24544764,
    0x27680678, 0xE108C018, 0x7DCBA328, 0x3FEB5CDB, 0xF98B1EFB, 0xBBABD89B,
    0x9BAF7CDF, 0xD98FBABF, 0x176EF89F, 0x554E367E, 0xB23E932E, 0xF01ED10E,
    0xBDD35AA3, 0xFFF39CC3, 0x6224DEE3, 0x20044334, 0xC774E664, 0x8554A444,
    0x0BBD2AAD, 0x499D688D, 0xB66E977E, 0xF44ED55E, 0x511E322E, 0x9FFF700E,
    0xDDDFBEEF, 0x1BBFFCCF, 0x788F599F, 0xA9818891, 0xEBA1CAB1, 0x2DC10CD1,
    0x80106FE1, 0xC230A100, 0x0450E320, 0x46702540, 0x9893B983, 0xDAB3FBA3,
    0x21100000, 0x63304220, 0xA5508440, 0xE770C660, 0x4AA12991, 0x8CC16BB1,
    0xB4469556, 0x7AA75BB7, 0x38871997, 0xFEE7DFF7, 0xC448BCC7, 0x8668E558,
    0xDBA70544, 0x9987FAB7, 0x7EF75FE7, 0x3CD71DC7, 0xF236D326, 0xB0169106,
    0x15467676, 0x4CD93456, 0x0EF96DC9, 0xC8992FE9, 0xABA98AB9, 0x65484458,
    0x4008A778, 0x02286118, 0xEDD9CCC9, 0xAFF98EE9, 0x69994889, 0x2BB90AA9,
    0xCEE1ADD1, 0x3112EFF1, 0x52227332, 0x9442B552, 0xD662F772, 0x18833993,
    0x4BB56AA5, 0x09952885, 0xACC5CFF5, 0x53368DD5, 0x11167226, 0xD7763006,
    0x499D688D, 0xFEE7DFF7, 0xAFF98EE9, 0xD7763006, 0x18833993, 0xABA98AB9,
    0xB77AD44A, 0x711A966A, 0x333A500A, 0xFDDB122A, 0x9EEBBFFB, 0x588B799B,
};

__IO uint32_t crc_value = 0;

extern uint32_t xcrc32(const uint8_t *buf, size_t len, uint32_t init);

int main(void) {
  system_clock_config();

  uint32_t test_tick_0;
  uint32_t test_tick_1;

  /* System timer configuration */
  SysTick_Config(system_core_clock / 1000);

  uart_print_init(115200);
  at32_board_init();
  /* enable crc clock */
  crm_periph_clock_enable(CRM_CRC_PERIPH_CLOCK, TRUE);

  printf("AT START F425 Board @ %u MHz\n", system_core_clock / (1000000));
  printf("Boot Mem:%02X\n", scfg_mem_map_get());

  printf("CRC test start\n");

  {
    crc_data_reset();
    crc_value = crc_block_calculate((uint32_t *)data_buffer, BUFFER_SIZE);
    printf("Hardware:\t%08X\n", crc_value);
    crc_value = xcrc32((const uint8_t *)bswap32_data_buffer,
                       sizeof(bswap32_data_buffer), 0xffffffff);
    printf("Software:\t%08X\n", crc_value);
  }
  {
    crc_data_reset();
    crc_value =
        crc_block_calculate((uint32_t *)bswap32_data_buffer, BUFFER_SIZE);
    printf("Hardware:\t%08X\n", crc_value);
    crc_value =
        xcrc32((const uint8_t *)data_buffer, sizeof(data_buffer), 0xffffffff);
    printf("Software:\t%08X\n", crc_value);
  }

  test_tick_0 = g_Ticks;
  for (uint32_t i = 0; i < TEST_LOOP; ++i) {
    crc_data_reset();

    /* compute the crc of "data_buffer" */
    crc_value = crc_block_calculate((uint32_t *)data_buffer, BUFFER_SIZE);
    if (crc_value == CRC32_REF_RESULT) {
      continue;
    } else {
      printf("error %08X\n", crc_value);
      break;
    }
  }
  test_tick_1 = g_Ticks;

  printf("Hardware CRC test end[%08X], %u, %u, [%u]\n", crc_value, test_tick_0,
         test_tick_1, (test_tick_1 - test_tick_0));

  test_tick_0 = g_Ticks;
  for (uint32_t i = 0; i < TEST_LOOP; ++i) {
    crc_value = xcrc32((const uint8_t *)bswap32_data_buffer,
                       sizeof(bswap32_data_buffer), 0xffffffff);

    if (crc_value == CRC32_REF_RESULT) {
      continue;
    } else {
      printf("error %08X\n", crc_value);
      break;
    }
  }
  test_tick_1 = g_Ticks;

  printf("Software CRC test end[%08X], %u, %u, [%u]\n", crc_value, test_tick_0,
         test_tick_1, (test_tick_1 - test_tick_0));

  printf("F425 @ %u MHz\n", system_core_clock / (1000000));

  while (1) {
    // printf("%u MHz, Ticks:%u\n", system_core_clock/(1000000), g_Ticks);

    /* Unsigned subtraction keeps each delay correct even when the tick
       counter wraps around. */
    test_tick_0 = g_Ticks;
    while ((g_Ticks - test_tick_0) < 200) {
      __NOP();
      __WFI();
    }
    at32_led_toggle(LED2);

    test_tick_0 = g_Ticks;
    while ((g_Ticks - test_tick_0) < 400) {
      __NOP();
      __WFI();
    }
    at32_led_toggle(LED3);

    test_tick_0 = g_Ticks;
    while ((g_Ticks - test_tick_0) < 800) {
      __NOP();
      __WFI();
    }
    at32_led_toggle(LED4);
  }
}
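The byte-order point is easy to get wrong, so here is a host-side sketch of why the swapped buffer is needed. It rests on an assumption: the hardware unit is modelled as XORing each 32-bit word into the register and shifting it out MSB first, which is consistent with the big-endian behaviour described in this post but is not taken from the vendor documentation. All function names here (bswap32, crc32_words_msb_first, crc32_bytes, crc32_swapped_le) are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0x04c11db7u

/* Reverse the byte order of one 32-bit word. */
uint32_t bswap32(uint32_t w) {
  return (w >> 24) | ((w >> 8) & 0x0000ff00u) |
         ((w << 8) & 0x00ff0000u) | (w << 24);
}

/* Assumed model of the hardware unit: whole words, MSB first. */
uint32_t crc32_words_msb_first(const uint32_t *words, size_t count,
                               uint32_t init) {
  uint32_t crc = init;
  for (size_t i = 0; i < count; ++i) {
    crc ^= words[i];
    for (int bit = 0; bit < 32; ++bit)
      crc = (crc & 0x80000000u) ? (crc << 1) ^ CRC32_POLY : (crc << 1);
  }
  return crc;
}

/* Byte-wise CRC, processing bytes in the order they sit in memory. */
uint32_t crc32_bytes(const uint8_t *buf, size_t len, uint32_t init) {
  uint32_t crc = init;
  while (len--) {
    crc ^= (uint32_t)(*buf++) << 24;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc & 0x80000000u) ? (crc << 1) ^ CRC32_POLY : (crc << 1);
  }
  return crc;
}

/* What a byte-wise routine sees on a little-endian core when it is
   handed the byte-swapped words: each swapped word is laid out in
   memory least significant byte first. */
uint32_t crc32_swapped_le(const uint32_t *words, size_t count,
                          uint32_t init) {
  uint32_t crc = init;
  for (size_t i = 0; i < count; ++i) {
    uint32_t s = bswap32(words[i]);
    uint8_t mem[4] = {(uint8_t)s, (uint8_t)(s >> 8), (uint8_t)(s >> 16),
                      (uint8_t)(s >> 24)};
    crc = crc32_bytes(mem, 4, crc);
  }
  return crc;
}
```

Under this model, crc32_words_msb_first over the original words and crc32_swapped_le over the same words agree, which mirrors the pairing of data_buffer (hardware) with bswap32_data_buffer (software) in the test code. For example, bswap32(0xc33dd31c) gives 0x1cd33dc3, the first entry of the swapped table.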
Test results:

Plotted as a chart, the results are easier to read:

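A few conclusions can be drawn:
1. The hardware CRC is much faster than the software CRC, more than 7 times faster in the best case.
2. The hardware CRC speed still depends on the compiler optimization level, mainly because moving the data into the unit is still done by software.
3. Using microlib or the standard C library makes practically no difference to performance, because the CRC calculation calls almost no libc functions; it consists mainly of logic operations and data movement.
4. ARMCLANG optimizes better than ARMCC, but with optimization disabled ARMCLANG performs worse, probably because it inserts a lot of debugging code.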
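As a rough sanity check on conclusion 2, the 4 HCLK per word figure quoted earlier gives a lower bound on the time the CRC unit itself needs for the whole benchmark; anything measured above that is software overhead for feeding the data. The sketch below only does this arithmetic. The helper names are made up, and the 96 MHz in the usage note is an example value, not the clock measured in this post.

```c
#include <assert.h>
#include <stdint.h>

#define TEST_LOOP 5000u     /* repetitions, as in the test code */
#define BUFFER_SIZE 120u    /* words per block, as in the test code */
#define HCLK_PER_WORD 4u    /* figure quoted in this post */

/* Minimum HCLK cycles the CRC unit needs for the whole benchmark. */
uint32_t crc_unit_min_cycles(void) {
  return TEST_LOOP * BUFFER_SIZE * HCLK_PER_WORD;
}

/* The same bound in microseconds for a given HCLK in MHz. */
uint32_t crc_unit_min_us(uint32_t hclk_mhz) {
  return hclk_mhz ? crc_unit_min_cycles() / hclk_mhz : 0u;
}

/* Elapsed ticks between two readings of a free-running 32-bit tick
   counter; unsigned subtraction handles wraparound. */
uint32_t elapsed_ticks(uint32_t t0, uint32_t t1) {
  return t1 - t0;
}
```

With these constants crc_unit_min_cycles() is 2,400,000 cycles, so at an assumed 96 MHz the unit itself would need about 25 ms for all 5000 blocks.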

That is all for this post. Code repository:
https://github.com/zhanzr/at32f425-prj.git
Branch: crc_test


   
