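        .global complex_asm
complex_asm:
        MVK   .S2   512, B0          ; B0 = loop counter, 512 iterations
LOOP:   LDDW  .D1   *A4++, A9:A8     ; load 64 bits from *A4 into A9:A8
||      LDDW  .D2   *B4++, B9:B8     ; load 64 bits from *B4 into B9:B8
        SUB   .S2   B0, 1, B0        ; decrement the loop counter
        NOP   2                      ; wait for the loads (LDDW has 4 delay slots)
 [B0]   B     .S1   LOOP             ; loop while B0 != 0; the next 5 cycles are branch delay slots
        MPY   .M1X  A9, B9, A10      ; A10 = A9 * B9 (MPY uses the 16 LSBs)
||      MPY   .M2X  A8, B8, B10      ; B10 = A8 * B8
        MPY   .M1X  A9, B8, A11      ; A11 = A9 * B8
||      MPY   .M2X  A8, B9, B11      ; B11 = A8 * B9
        SUB   .S1X  B10, A10, A14    ; A14 = A8*B8 - A9*B9 (real part)
        ADD   .L1X  A11, B11, A15    ; A15 = A9*B8 + A8*B9 (imaginary part)
        STDW  .D    A15:A14, *A6++   ; store the result pair to *A6
        MV    .S1   A6, A4           ; return A6 (the advanced output pointer) in A4
        BNOP  .S2   B3, 5            ; return to the caller; 5 NOPs fill the delay slots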
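This is the multiplication function I wrote in assembly. Its execution time is 5135 clock cycles.
Assembly should be the fastest-running version,
and it was seeing this cycle count that made me wonder: how exactly is the theoretical cycle count calculated?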
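For reference, here is a rough hand count of the listing above, written as a small C sketch. It is only a model, not a measurement, and it rests on assumptions: every execute packet issues in one cycle, `NOP n` costs n cycles, and there are no memory or pipeline stalls. The names `estimated_cycles`, `SETUP_CYCLES`, `LOOP_CYCLES` and `EPILOGUE_CYCLES` are made up for illustration.

```c
/* Hand-counted cycle model for complex_asm (assumes no stalls). */
enum {
    SETUP_CYCLES    = 1,  /* MVK .S2 512, B0                              */
    LOOP_CYCLES     = 10, /* LDDW pair (1) + SUB (1) + NOP 2 (2) + B (1)
                             + 5 branch delay slots: two MPY packets,
                             SUB, ADD, STDW (1 cycle each)                */
    EPILOGUE_CYCLES = 7   /* MV (1) + BNOP B3,5 (1 + 5)                   */
};

/* Estimated cycles for a given loop count under the model above. */
static unsigned estimated_cycles(unsigned iterations)
{
    return SETUP_CYCLES + iterations * LOOP_CYCLES + EPILOGUE_CYCLES;
}
```

Under this model `estimated_cycles(512)` gives 1 + 512 * 10 + 7 = 5128, which is 7 cycles below the measured 5135. The remaining gap is presumably call and measurement overhead, though that is a guess. Note that the 10 cycles per iteration come from the loop running one iteration at a time, with the load and branch delay slots not overlapped across iterations.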