The new DSP library replaces every SIMD32 operation with memcpy. What is going on here?

电员师 | Original poster | 2018-8-4 17:58

I can't see what the advantage is; I'll benchmark it later.
Take the Q15 version of the add function as an example.
The old code was:
#define __SIMD32(addr)       (*(__SIMD32_TYPE **) & (addr))
#define __SIMD32_CONST(addr)  ( (__SIMD32_TYPE * ) (addr))
#define _SIMD32_OFFSET(addr)  (*(__SIMD32_TYPE * ) (addr))
#define __SIMD64(addr)       (*(    int64_t **) & (addr))

/* C = A + B */
/* Add and then store the results in the destination buffer. */
inA1 = *__SIMD32(pSrcA)++;
inA2 = *__SIMD32(pSrcA)++;
inB1 = *__SIMD32(pSrcB)++;
inB2 = *__SIMD32(pSrcB)++;

*__SIMD32(pDst)++ = __QADD16(inA1, inB1);
*__SIMD32(pDst)++ = __QADD16(inA2, inB2);
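
For anyone not familiar with `__QADD16`: it treats each 32-bit word as two packed 16-bit lanes and does a saturating add on each lane independently. A rough host-side model of that behaviour, assuming plain C99; `sat_q15` and `qadd16_ref` are names I made up for illustration, not part of the library:

```c
#include <stdint.h>

/* Saturate a 32-bit intermediate to the q15 range [-32768, 32767]. */
static int16_t sat_q15(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical software model of __QADD16: two independent saturating
   16-bit adds, one on the low halfword and one on the high halfword.
   Assumes the usual two's-complement wrap when narrowing to int16_t. */
static uint32_t qadd16_ref(uint32_t a, uint32_t b)
{
    int16_t lo = sat_q15((int32_t)(int16_t)(a & 0xFFFFu) + (int16_t)(b & 0xFFFFu));
    int16_t hi = sat_q15((int32_t)(int16_t)(a >> 16) + (int16_t)(b >> 16));
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}
```

With this model, `qadd16_ref(0x7FFF0001, 0x00010001)` gives `0x7FFF0002`: the high lane saturates at 0x7FFF while the low lane adds normally.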

Now it has become:
inA1 = read_q15x2_ia ((q15_t **) &pSrcA);
inA2 = read_q15x2_ia ((q15_t **) &pSrcA);

/* read 2 times 2 samples at a time from sourceB */
inB1 = read_q15x2_ia ((q15_t **) &pSrcB);
inB2 = read_q15x2_ia ((q15_t **) &pSrcB);

/* Add and store 2 times 2 samples at a time */
write_q15x2_ia (&pDst, __QADD16(inA1, inB1));
write_q15x2_ia (&pDst, __QADD16(inA2, inB2));

__STATIC_FORCEINLINE q31_t read_q15x2_ia (
  q15_t ** pQ15)
{
  q31_t val;

  memcpy (&val, *pQ15, 4);
  *pQ15 += 2;

  return (val);
}
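
My reading, which I have not confirmed, is that the change is about correctness rather than speed. Casting a `q15_t *` to a 32-bit pointer and dereferencing it breaks C's strict-aliasing rule and assumes 4-byte alignment. A fixed-size 4-byte `memcpy` is well defined, and optimizing compilers usually turn it into a single load or store. A host-side sketch, assuming plain C99 with local typedefs standing in for arm_math.h; `write_q15x2_ia` here is my guess at the mirror of the read helper, and `copy_pairs` is a made-up helper for the demo:

```c
#include <stdint.h>
#include <string.h>

typedef int16_t q15_t;   /* local stand-ins for the arm_math.h types */
typedef int32_t q31_t;

/* Read two q15 samples as one 32-bit word, then advance the pointer. */
static inline q31_t read_q15x2_ia(q15_t **pQ15)
{
    q31_t val;
    memcpy(&val, *pQ15, 4);   /* fixed size, so usually lowered to one load */
    *pQ15 += 2;
    return val;
}

/* Assumed mirror of the read helper: store one word, advance two samples. */
static inline void write_q15x2_ia(q15_t **pQ15, q31_t value)
{
    memcpy(*pQ15, &value, 4);
    *pQ15 += 2;
}

/* Hypothetical demo: copy `pairs` sample pairs through the helpers.
   src and dst only need q15_t (2-byte) alignment.
   Returns the number of samples written. */
static int copy_pairs(q15_t *dst, const q15_t *src, int pairs)
{
    q15_t *s = (q15_t *)src;  /* the helper takes q15_t **; we only read */
    q15_t *d = dst;
    for (int i = 0; i < pairs; i++) {
        write_q15x2_ia(&d, read_q15x2_ia(&s));
    }
    return (int)(d - dst);
}
```

Calling `copy_pairs(dst, &src[1], 2)` starts at an address that is 2-byte aligned but typically not 4-byte aligned, which the old `*__SIMD32(pSrcA)++` form does not guarantee to handle.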
