使用STM32CubeG4软件包设计利用FMAC的数字滤波器之一(FIR)

只看该作者 · 2020-7-12 20:51

简介

本文介绍使用STM32CubeG4软件包设计利用STM32G4系列MCU的FMA外设的数字滤波器.

数字滤波器应用广泛,从音频处理,工业控制到IoT领域均有应用.

比如从通信信号中去除干扰,控制数字电源或者压缩音频采样数据. 传统的设计中利用专用的DSP器件, 但是使用通用的微控制器实现该功能越来越普遍, 尤其在低成本的应用中可以节省器件. 现代的微控制器内核,如ARM Cortex M4, 本身就包含加速数**算的硬件特性. 但即使使用通用的微控制器, DSP功能本身的计算认物依然很繁重, 某些情况下会占用原本控制任务所需的计算资源.

有三种思路来解决计算资源问题:

· 升级主控制器, 使用更快更强大的应用处理器;
· 在原有的电路中增加一个专用DSP器件;
· 使用专用的硬件加速单元来卸载此类软件计算任务.

显然第三种思路的成本与功耗性价比最高.但是在通用的微控制器内核中,此种加速单元必须尽可能的通用化,以适用于不同的应用场景.

FMAC (filtermath accelerator)结合了DSP任务的灵活性与专用硬件的成本和性能的优势. FMAC的核心是一个2x16bit乘法器和一个26bit累加器. 它既可用于有限冲激响应(FIR)类型也可用于无线冲激响应(IIR)类型的滤波器.它包含有256个16bit的缓存, 可以灵活地分配来存储协因子,状态存储,输入或输出缓存. FMAC可工作于与内核同样的频率, 因此FMAC在执行滤波任务的性能可以等于甚至高于软件实现的滤波器. 如果使用DMA来传输输入和输出, 那么滤波任务可以说100%地由FMAC来承担, MCU资源可用于其它软件任务.

笔者计划使用两个例子展示如何使用FMAC来解放MCU:

· 用于噪音消除的自适应FIR滤波器;
· 用于数字电源控制的3p3z IIR补偿器.

本贴先写FIR.

只看该作者 · 2020-7-12 20:56

自适应FIR滤波器概述

所谓自适应滤波器, 能够根据输入信号的各种特性来调整自己的冲激响应. 通信应用中通常使用这种滤波器来消除干扰. 使用合适的自适应用算法可以自动检测干扰分量的频率特性,调整滤波器的系数从而在频率上对干扰进行带阻.

本文不探讨自适应算法的细节. 本例中使用一种自动回归算法(Auto Regressive), 此算法的输出即是一组FIR滤波器的系数. 该算法根据信号特点,信道参数和干扰特性对输出进行自动调整. 该算法的计算最好是在MCU空闲时由软件完成. 而对信号的过滤则必须在一定的延迟下连续进行. 此应用特性非常适合使用FMAC来加速.

本文中, 输入为已经采样,量化后存于MCU的存储器中的序列(每个序列长度为2048). 每次一个完整的采样序列准备好之后, DMA被触发将其输入给FMAC,FMAC对逐个采样进行滤波,输出由DMA再传输至存储器. 每个序列之间的间隔可用于更新滤波器系数.

自适应滤波器例子(序列1)

下面展示采样序列1及其频谱, 输入信号为高斯噪音(或称”白噪音”), 有两个窄带干扰,分别位于8KHz和16KHz.信号的采样频率为48KHz.

生成上述信号的matlab代码:

复制

% Specify the parameters of a signal with a sampling frequency of 1 kHz and a signal duration of 1.5 seconds.



Fs = 48000;            % Sampling frequency                    

T = 1/Fs;             % Sampling period       

L = 2^nextpow2(2048);             % Length of signal

t = (0:L-1)*T;        % Time vector



figure('name', 'Gaussian noise with two-tone interferer');



% Form a signal containing a 50 Hz sinusoid of amplitude 0.7 and a 120 Hz sinusoid of amplitude 1.



S1 = 0.8 * sin(2*pi*8000*t);

S2 = 0.5 * sin(2*pi*16000*t);

S = S1 + S2;

subplot(4, 1, 1);

plot(t(1:L/1)*1000, S1(1:L/1), 'red');

title('8KHz');

xlabel('t (milliseconds)')

ylabel('S1(t)')



subplot(4, 1, 2);

plot(t(1:L/1)*1000, S2(1:L/1), 'magenta');

title('16KHz');

xlabel('t (milliseconds)')

ylabel('S2(t)')



subplot(4, 1, 3);

plot(t(1:L/1)*1000, S(1:L/1), 'cyan')

title('Signal')

xlabel('t (milliseconds)')

ylabel('S(t)')



% Corrupt the signal with zero-mean white noise with a variance of 4.

X = S + 2*randn(size(t));



% Plot the noisy signal in the time domain. It is difficult to identify the frequency components by looking at the signal X(t).



subplot(4, 1, 4);

plot(t(1:L/1)*1000, X(1:L/1), 'green')

title('Signal Corrupted with Zero-Mean Random Noise')

xlabel('t (milliseconds)')

ylabel('X(t)')

只看该作者 · 2020-7-12 21:03

下一步使用matlab设计FIR滤波器, 滤波器的频域响应应该为:

实际工程中, 干扰往往不会固定, 比如下图的信号在上面信号的基础上又多了一个12K的干扰分量.

相应信号的matlab代码:

复制

% Specify the parameters of a signal with a sampling frequency of 1 kHz and a signal duration of 1.5 seconds.

linespec = {'yellow-', 'magenta-', 'cyan-', 'red-', 'green-', 'blue-',  'black-', };

index_color = 0;



Fs = 48000;            % Sampling frequency                    

T = 1/Fs;             % Sampling period       

L = 2^nextpow2(2048);             % Length of signal

t = (0:L-1)*T;        % Time vector



figure('name', 'Gaussian noise with two-tone interferer');



% Form a signal containing a 50 Hz sinusoid of amplitude 0.7 and a 120 Hz sinusoid of amplitude 1.



S1 = 0.8 * sin(2*pi*8000*t);

S2 = 0.5 * sin(2*pi*16000*t);

S3 = 0.7 * sin(2*pi*12000*t);

S = S1 + S2 + S3;

subplot(5, 1, 1);

plot(t(1:L/1)*1000, S1(1:L/1), linespec{1 + mod((index_color-1), length(linespec))});

index_color = index_color + 1;

title('8 KHz');

xlabel('t (milliseconds)')

ylabel('S1(t)')



subplot(5, 1, 2);

plot(t(1:L/1)*1000, S2(1:L/1), linespec{1 + mod((index_color-1), length(linespec))});

index_color = index_color + 1;

title('16 KHz');

xlabel('t (milliseconds)')

ylabel('S2(t)')



subplot(5, 1, 3);

plot(t(1:L/1)*1000, S3(1:L/1), linespec{1 + mod((index_color-1), length(linespec))});

index_color = index_color + 1;

title('12 KHz');

xlabel('t (milliseconds)')

ylabel('S3(t)')



subplot(5, 1, 4);

plot(t(1:L/1)*1000, S(1:L/1), linespec{1 + mod((index_color-1), length(linespec))});

index_color = index_color + 1;

title('Signal')

xlabel('t (milliseconds)')

ylabel('S(t)')



% Corrupt the signal with zero-mean white noise with a variance of 4.

X = S + 2*randn(size(t));



% Plot the noisy signal in the time domain. It is difficult to identify the frequency components by looking at the signal X(t).



subplot(5, 1, 5);

plot(t(1:L/1)*1000, X(1:L/1), linespec{1 + mod((index_color-1), length(linespec))});

index_color = index_color + 1;

title('Signal Corrupted with Zero-Mean Random Noise')

xlabel('t (milliseconds)')

ylabel('X(t)')

只看该作者 · 2020-7-12 21:08

因为干扰信号变化了, 所以滤波器的目标频响图应该如此:

这个例子中, 滤波器的特性变化了, 需要根据干扰的变化来动态配置滤波器参数, 所以称之为自适应滤波器.自适应的算法本贴不涉及.

FIR滤波器

FIR滤波器的实质就是将信号与滤波器系数进行卷积运算. 系数的个数决定了信号带宽中最小过滤频带的宽度. 系数越多则对窄带干扰过滤的精度越高,也意味着信号损失越小. 但同时N个系数的滤波器一般需要N^2的运算复杂度,所以系数的个数受限于计算资源. 本例中使用了51阶的FIR滤波器.

滤波器的系数一般使用浮点格式计算, 使用时要转换为16bit的定点格式. FMAC使用的定点格式为Q1.15,即bit0-14表示小数位, bit15表示符号与整数位. Q1.15格式表示的范围为

[-1.0 (0x8000), +0.999969482421875(0x7FFF)].

设计得合理的话, 生成的系数的绝对值都会小于等于1, 所以都在Q1.15格式的表示范围内.如果不是这种情况, 则所有的系数需要除以一个大于等于最大值的值.此过程也被称之为”归一化”.

将浮点数转换为Q1.15格式:

复制

value_q15 = (int16_t)(value_f32*0x8000);

上述C代码编译后在CortexM4F内核上使用FPU需要8个时钟周期, 所以转换51个系数需要约408个周期.

只看该作者 · 2020-7-12 21:11

这里以上面两种干扰为例, 设计两个FIR滤波器:

复制

ord = 50;

format long;

Fs = 48000;            % Sampling frequency                    

NarrowBandWidth = 10;



StopF1 = 8000;

StopF2 = 12000;

StopF3 = 16000;

bnd8K = [(StopF1 - NarrowBandWidth)/(Fs/2) (StopF1 + NarrowBandWidth)/(Fs/2)];

bnd12K = [(StopF2 - NarrowBandWidth)/(Fs/2) (StopF2 + NarrowBandWidth)/(Fs/2)];

bnd16K = [(StopF3 - NarrowBandWidth)/(Fs/2) (StopF3 + NarrowBandWidth)/(Fs/2)];



band_stop_8K_16K = fir1(ord, [bnd8K bnd16K],'DC-1');

band_stop_8K_12K_16K = fir1(ord, [bnd8K bnd12K bnd16K],'DC-1');



hfvt = fvtool(band_stop_8K_16K, 1, band_stop_8K_12K_16K, 1);

legend(hfvt,'BTF2','BTF3')



band_stop_8K_16K_q1_15 = cast((band_stop_8K_16K * 32768), 'int16');

band_stop_8K_12K_16K_q1_15 = cast((band_stop_8K_12K_16K * 32768), 'int16');

第一个滤波器系数:

Coeff.	Value	Coeff.	Value	Coeff.	Value
b0	1	b1	-0.06758554	b2	0.00622496
b3	0.0282774	b4	0.09562168	b5	-0.02541931
b6	-0.11821158	b7	-0.01763455	b8	0.00794264
b9	0.01041298	b10	0.00806815	b11	-0.025829
b12	-0.09904707	b13	0.01375418	b14	0.01032627
b15	-0.00441015	b16	0.01825352	b17	-0.02035376
b18	-0.0244004	b19	-0.01464122	b20	0.05634489
b21	0.00220208	b22	0.06268198	b23	0.01072175
b24	-0.07694778	b25	-0.01317927	b26	0.00029196
b27	0.03238028	b28	0.05527402	b29	-0.02844389
b30	-0.07398416	b31	0.01854994	b32	0.0612234
b33	0.00562513	b34	-0.01274634	b35	-0.01404053
b36	-0.00688377	b37	-0.0334368	b38	0.02592314
b39	-0.0632741	b40	0.07521879	b41	-0.01797365
b42	-0.08349794	b43	-0.01245005	b44	0.05184924
b45	-0.03642993	b46	0.00493303	b47	0.01581092
b48	-0.04743807	b49	0.0068994	b50	0.08575577

转换为Q1_15格式:

Coeff.	Value	Coeff.	Value	Coeff.	Value
b0	32767	b1	-2215	b2	203
b3	926	b4	3133	b5	-833
b6	-3874	b7	-578	b8	260
b9	341	b10	264	b11	-847
b12	-3246	b13	450	b14	338
b15	-145	b16	598	b17	-667
b18	-800	b19	-480	b20	1846
b21	72	b22	2053	b23	351
b24	-2522	b25	-432	b26	9
b27	1061	b28	1811	b29	-933
b30	-2425	b31	607	b32	2006
b33	184	b34	-418	b35	-461
b36	-226	b37	-1096	b38	849
b39	-2074	b40	2464	b41	-589
b42	-2737	b43	-408	b44	1698
b45	-1194	b46	161	b47	518
b48	-1555	b49	226	b50	2810

只看该作者 · 2020-7-12 21:13

由于使用了Q1.15的定点数, 相对单精度的浮点数动态范围减小了, 所以滤波器的性能有一些损失. 这一点也是在使用FMAC时需要考虑的因素. 如果无法承受损失的精度, 则应该选择软件来实现滤波器(使用浮点数或者32bit定点数运算).本例中,定点运算带来的精度影响十分微小, 误差在0.007 dB之内. 定点与浮点系数的频响差:

FMAC配置

本文使用STM32CubeG4支持包中的HAL库来初始化FMAC. 访问FMAC之前, 使能FMAC时钟:

复制

__HAL_RCC_FMAC_CLK_ENABLE();

定义一块存放滤波器系数的内存块:

复制

/* Declare an array to hold the filter coefficients */

static int16_t aFilterCoeffB[51];

定义一个FMAC配置结构体:

复制

FMAC_HandleTypeDef hfmac;

下面是配置代码:

复制

/* declare a filter configuration structure */

FMAC_FilterConfigTypeDef sFmacConfig;

/* Set the coefficient buffer base address */

sFmacConfig.CoeffBaseAddress = 0;

/* Set the coefficient buffer size to the number of coeffs */

sFmacConfig.CoeffBufferSize = 51;

/* Set the Input buffer base address to the next free address */

sFmacConfig.InputBaseAddress = 51;

/* Set the input buffer size greater than the number of coeffs */

sFmacConfig.InputBufferSize = 100;

/* Set the input watermark to zero since we are using DMA */

sFmacConfig.InputThreshold = 0;

/* Set the Output buffer base address to the next free address */

sFmacConfig.OutputBaseAddress = 151;

/* Set the output buffer size */

sFmacConfig.OutputBufferSize = 100;

/* Set the output watermark to zero since we are using DMA */

sFmacConfig.OutputThreshold = 0;

/* No A coefficients since FIR */

sFmacConfig.pCoeffA = NULL;

sFmacConfig.CoeffASize = 0;

/* Pointer to the coefficients in memory */

sFmacConfig.pCoeffB = aFilterCoeffB;

/* Number of coefficients */

sFmacConfig.CoeffBSize = 51;

/* Select FIR filter function */

sFmacConfig.Filter = FMAC_FUNC_CONVO_FIR;

/* Enable DMA input transfer */

sFmacConfig.InputAccess = FMAC_BUFFER_ACCESS_DMA;

/* Enable DMA output transfer */

sFmacConfig.OutputAccess = FMAC_BUFFER_ACCESS_DMA;

/* Enable clipping of the output at 0x7FFF and 0x8000 */

sFmacConfig.Clip = FMAC_CLIP_ENABLED;

/* P parameter contains number of coefficients */

sFmacConfig.P = 51;

/* Q parameter is not used */

sFmacConfig.Q = FILTER_PARAM_Q_NOT_USED;

/* R parameter contains the post-shift value (none) */

sFmacConfig.R = 0;

/* Configure the FMAC */

if (HAL_FMAC_FilterConfig(&hfmac, &sFmacConfig) != HAL_OK)

/* Configuration Error */

Error_Handler();

HAL_FMAC_FilterConfig()这个函数对配置和控制寄存器编程, 并将滤波器系数载入FMAC本地内存(X2 buffer).

只看该作者 · 2020-7-12 21:13

DMA配置

配置3个DMA通道.

通道1: 预加载通道, 使用采样值初始化FMAC输入buffer. 每当滤波器系数变化, FMAC必须停止重启一次. FMAC在输入buffer被填充后才启动,所以我们使用前一个frame的最后51个采样来将滤波器初始化到停止时刻的状态,以免丢失数据. 这个步骤使用软件也可以, 本文中使用DMA.
通道2: 写通道. 只要FMAC的输入buffer有空位置, 就将采样值从内存中搬运到FMAC.
通道3: 读通道. 只要FMAC的输出buffer有未读取的值, 就将其从FMAC搬运到内存.

以下为DMA配置代码. 定义三个DMA配置结构体, 使用HAL_DMA_Init()函数进行初始化.

复制

DMA_HandleTypeDef hdma_fmac_preload; /* Preload channel */

DMA_HandleTypeDef hdma_fmac_read; /* Read channel */

DMA_HandleTypeDef hdma_fmac_write; /* Write channel */



/* Preload channel initialisation */

hdma_fmac_preload.Instance = DMA1_Channel1;

hdma_fmac_preload.Init.Request = DMA_REQUEST_MEM2MEM;

hdma_fmac_preload.Init.Direction = DMA_MEMORY_TO_MEMORY;

hdma_fmac_preload.Init.PeriphInc = DMA_PINC_ENABLE;

hdma_fmac_preload.Init.MemInc = DMA_MINC_DISABLE;

hdma_fmac_preload.Init.PeriphDataAlignment =

DMA_PDATAALIGN_HALFWORD;

hdma_fmac_preload.Init.MemDataAlignment = DMA_MDATAALIGN_WORD;

hdma_fmac_preload.Init.Mode = DMA_NORMAL;

hdma_fmac_preload.Init.Priority = DMA_PRIORITY_HIGH;

if (HAL_DMA_Init(&hdma_fmac_preload) != HAL_OK)

Error_Handler();

/* Connect the DMA channel to the FMAC handle */

__HAL_LINKDMA(hfmac,hdmaPreload,hdma_fmac_preload);

/* Write channel initialisation */

hdma_fmac_write.Instance = DMA1_Channel2;

hdma_fmac_write.Init.Request = DMA_REQUEST_FMAC_WRITE;

hdma_fmac_write.Init.Direction = DMA_MEMORY_TO_PERIPH;

hdma_fmac_write.Init.PeriphInc = DMA_PINC_DISABLE;

hdma_fmac_write.Init.MemInc = DMA_MINC_ENABLE;

hdma_fmac_write.Init.PeriphDataAlignment = DMA_PDATAALIGN_WORD;

hdma_fmac_write.Init.MemDataAlignment = DMA_MDATAALIGN_HALFWORD;

hdma_fmac_write.Init.Mode = DMA_NORMAL;

hdma_fmac_write.Init.Priority = DMA_PRIORITY_HIGH;

if (HAL_DMA_Init(&hdma_fmac_write) != HAL_OK)

Error_Handler();

/* Connect the DMA channel to the FMAC handle */

__HAL_LINKDMA(hfmac,hdmaIn,hdma_fmac_write);

/* Read channel initialisation */

hdma_fmac_read.Instance = DMA1_Channel3;

hdma_fmac_read.Init.Request = DMA_REQUEST_FMAC_READ;

hdma_fmac_read.Init.Direction = DMA_PERIPH_TO_MEMORY;

hdma_fmac_read.Init.PeriphInc = DMA_PINC_DISABLE;

hdma_fmac_read.Init.MemInc = DMA_MINC_ENABLE;

hdma_fmac_read.Init.PeriphDataAlignment = DMA_PDATAALIGN_WORD;

hdma_fmac_read.Init.MemDataAlignment = DMA_MDATAALIGN_HALFWORD;

hdma_fmac_read.Init.Mode = DMA_NORMAL;

hdma_fmac_read.Init.Priority = DMA_PRIORITY_HIGH;

if (HAL_DMA_Init(&hdma_fmac_read) != HAL_OK)

Error_Handler();

/* Connect the DMA channel to the FMAC handle */

__HAL_LINKDMA(hfmac,hdmaIn,hdma_fmac_read);

其中读通道和预加载通道的DMA中断需要打开, 这样传输完成后软件可以被通知到.

只看该作者 · 2020-7-12 21:16

使用滤波器

每当有一个frame的采样需要滤波时, 软件触发滤波器. 新的采样存在系统内存中, 使用双缓冲方式.

复制

static int16_t aInputValues[2][2048];

int CurrentInputArraySize = 2048;

双缓冲方式下, 一个frame的采样值正在被处理的同时, 另外一个frame可以接收新的采样输入. 这里使用一个变量表示正在被使用的frame:

复制

int Frame = 1;

以下数组用来存储输出数据:

复制

static int16_t aCalculatedFilteredData[2048];

int ExpectedCalculatedFilteredDataSize = 2048;

软件触发上一个frame的最后的51个采样的预加载. 这是为了将滤波器恢复到FMAC被停止时的状态.

复制

if (HAL_FMAC_FilterPreload_DMA(&hfmac,

&aInputValues[CurrentInputArray][1997], 51, NULL, 0) != HAL_OK)

Error_Handler();

/* Switch frames */

Frame ? Frame=0 : Frame=1;

当预加载结束(通过相应的DMA通道的中断来获知),软件触发DMA的写通道将当前frame的数据写到FMAC :

复制

if (HAL_FMAC_AppendFilterData(&hfmac, &aInputValues[CurrentInputArray][0],

&CurrentInputArraySize) != HAL_OK)

Error_Handler();

之后软件启动FMAC:

复制

if (HAL_FMAC_FilterStart(&hfmac, aCalculatedFilteredData,

&ExpectedCalculatedFilteredDataSize) != HAL_OK)

Error_Handler();

此时FMAC开始计算输出采样,并将其写入输出buffer. 对于每个采样值, FMAC在DMA的读通道产生一个请求, DMA会将新的采样值搬运到内存中. 这个过程不需要软件干预一值持续进行, 直到整个frame的数据处理完毕. 此时DMA的读通道产生一个中断, CPU收到这个中断信号后关闭FMAC:

复制

if (HAL_FMAC_FilterStop(&hfmac) != HAL_OK)

Error_Handler();

如果滤波器的系数需要更新, 此时可以再次调用HAL_FMAC_FilterConfig()函数. 新的系数应该在更新之前存入aFilterCoeffB[]; 要处理下一个frame, 重复上述过程即可, 从预加载开始:

复制

(HAL_FMAC_FilterPreload_DMA()).

只看该作者 · 2020-7-12 21:16

总结

任何可以转换为FIR或者直接I型IIR的滤波器都可以使用FMAC进行卸载加速,以释放珍贵的CPU计算资源.
滤波器的阶数越多, 数据buffer越多, 比如上文所述的51阶FIR, 使用FMAC的收益越明显. 因为CPU要做的仅仅是间或参与(本文中是每一帧计算后更新滤波器系数). 其他时间, CPU可以专注于其更他更加有用,也更加适合CPU运算的任务, 如运行自适应算法来计算滤波器系数.
控制环应用中, 如buck电源转换器, CPU需要在每一个采样的处理中进行干预, 因此收益相对要小一点. 但是此种应用场景经常需要有实时响应的需求, 因此意味着其他任务本身也需要被控制环逻辑抢占. 因为FMAC提高了每一个采样的处理效率, 因此抢占的时间也被缩短了, 可以降低对CPU速度/性能的要求.

只看该作者 · 2023-2-3 09:50

使用STM32CubeG4软件包设计利用FMAC的数字滤波器之一(FIR)

技术新星奖章