[Infineon CY8CKIT-062S2-AI Review] Sound Recognition

Posted by 无垠的广袤 on 2025-11-20 14:11

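<h1>[Infineon CY8CKIT-062S2-AI Review] Sound Recognition</h1>
<p>This post shows how the Infineon CY8CKIT-062S2-AI development board captures ambient sound with its onboard audio sensor (PDM microphone) and runs a machine learning model on it to recognize specific sounds.</p>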
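<h2>Project Overview</h2>
<p>The project captures audio data with the onboard audio sensor and feeds it to an ML model that detects specific sounds, such as coughing, crying or laughing, and birdsong.</p>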
<p><img src="data/attachment/forum/202511/20/140545ffbaicaffbfczcfb.jpg" alt="audio_recog_cover.jpg" title="audio_recog_cover.jpg" /></p>
<ul>
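<li>Environment setup: install the required software and the machine learning tools used to generate the model code;</li>
<li>Project creation: use ModusToolbox to quickly load, build and debug the firmware;</li>
<li>Project code: the key code of the implementation, together with a flowchart;</li>
<li>Demo: the serial terminal shows the probability of each target sound and the predicted result.</li>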
</ul>
<h2>Environment Setup</h2>
<ul>
<li>
<p>Download the development tools and IDE from the official CY8CKIT-062S2-AI product page, including:</p>
<ul>
<li>ModusToolbox;</li>
<li>DEEPCRAFT™ Studio (formerly Imagimob Studio);</li>
</ul>
</li>
<li>
<p>The related software and toolchains can be installed through ModusToolbox Setup;</p>
</li>
<li>
<p>Use ModusToolbox Programmer to flash the firmware.</p>
</li>
</ul>
<h2>Project Testing</h2>
<p>Load the demo project for the CY8CKIT-062S2-AI board. It demonstrates how to deploy a machine learning (ML) model generated by DEEPCRAFT™ Studio:</p>
<ul>
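<li>It uses an acoustic model / keyword spotter that takes pulse-density modulation (PDM) audio data as input;</li>
<li>It detects various keywords, such as digits, laughter, directions, birds, dogs and cats;</li>
<li>The microphone is tuned for a default detection distance of 1 m;</li>
<li>When the example runs, audio from the microphone is passed to the ML model and the recognition results are printed to the serial terminal.</li>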
</ul>
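<p>Before the PDM audio reaches the model, each 16-bit PCM word is scaled to a float. A minimal sketch of that step, assuming the example's settings of 16 bits per sample and a digital boost factor of 10 (the function names here are hypothetical): it mirrors what the main loop does with <code>SAMPLE_NORMALIZE</code> and <code>DIGITAL_BOOST_FACTOR</code>, normalizing to [-1, 1], boosting, then clipping back to [-1, 1]. It also returns the buffer's peak magnitude, which is similar to the figure the example prints as "Volume".</p>
<pre><code class="language-c">#include <stddef.h>
#include <stdint.h>

/* Assumed to match the example: 16-bit PCM words and a 10x digital boost. */
#define BITS_PER_SAMPLE      16
#define DIGITAL_BOOST_FACTOR 10.0f

/* Map one signed 16-bit PCM sample to [-1, 1], apply the boost,
 * and clip the result back into [-1, 1]. */
static float pcm_to_model_input(int16_t pcm)
{
    float s = (float)pcm / (float)(1 << (BITS_PER_SAMPLE - 1));
    s *= DIGITAL_BOOST_FACTOR;
    if (s > 1.0f) {
        s = 1.0f;
    } else if (s < -1.0f) {
        s = -1.0f;
    }
    return s;
}

/* Convert a whole buffer into model input and return its peak magnitude. */
static float pcm_buffer_to_model_input(const int16_t *pcm, float *out, size_t count)
{
    float peak = 0.0f;
    for (size_t i = 0; i < count; i++) {
        out[i] = pcm_to_model_input(pcm[i]);
        float mag = (out[i] < 0.0f) ? -out[i] : out[i];
        if (mag > peak) {
            peak = mag;
        }
    }
    return peak;
}
</code></pre>
<p>With a boost of 10, any sample louder than about 1/10 of full scale saturates at ±1.0. This is why the example's comments recommend a boost of 1.0 when the same board was used to record the training data.</p>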
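<h3>Project Creation</h3>
<ul>
<li>Open <code>Eclipse for ModusToolbox</code>;</li>
<li>On the <code>Quick Panel</code> tab, select <code>Start</code> - <code>New Application</code>;</li>
<li>Once the board catalog has loaded (reaching the online catalog may require a proxy in some regions), enter <code>CY8CKIT-062S2-AI</code> in the search box to select the board;</li>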
</ul>
<p><img src="data/attachment/forum/202511/20/140604nfy3fxvvteykv3kv.jpg" alt="ML_motion_create.jpg" title="ML_motion_create.jpg" /></p>
<ul>
<li>Under the Machine Learning category, tick the <code>DEEPCRAFT Deploy Model Audio</code> project and click Create;</li>
</ul>
<p><img src="data/attachment/forum/202511/20/140615p16txgi1xv618116.jpg" alt="deploy_audio_test.jpg" title="deploy_audio_test.jpg" /></p>
<ul>
<li>When the demo has been created, right-click the project and build it, confirming that there are no errors;</li>
</ul>
<p>For details, see: Infineon/mtb-example-ml-deepcraft-deploy-audio.</p>
<h3>Flowchart</h3>
<p><img src="data/attachment/forum/202511/20/140634nb2fe29922efbfbf.png" alt="flowchart_ML_audio.png" title="flowchart_ML_audio.png" /></p>
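<p>The loop shown in the flowchart can be sketched in plain C. Samples are pushed into the model one at a time, and after every sample the application polls for a result. In this sketch the model is replaced by a hypothetical mock (<code>mock_enqueue</code>/<code>mock_dequeue</code>, with an assumed 512-sample window and 3 labels). The mock only imitates the calling pattern of <code>IMAI_enqueue()</code>/<code>IMAI_dequeue()</code> and performs no real inference.</p>
<pre><code class="language-c">#include <stddef.h>

/* Hypothetical mock of the generated model's streaming interface. */
#define MOCK_RET_SUCCESS  0
#define MOCK_RET_NODATA  (-1)
#define MOCK_WINDOW       512   /* assumed: one result per 512 samples */
#define MOCK_LABELS       3     /* assumed label count */

static size_t mock_filled = 0;  /* samples collected for the current window */
static int mock_ready = 0;      /* 1 when a score vector is waiting */

static int mock_enqueue(const float *sample)
{
    (void)sample;               /* a real model would buffer and featurize it */
    if (++mock_filled == MOCK_WINDOW) {
        mock_filled = 0;
        mock_ready = 1;
    }
    return MOCK_RET_SUCCESS;
}

static int mock_dequeue(float *scores)
{
    if (!mock_ready) {
        return MOCK_RET_NODATA;
    }
    mock_ready = 0;
    for (int i = 0; i < MOCK_LABELS; i++) {
        scores[i] = (i == 0) ? 1.0f : 0.0f;   /* dummy scores: label 0 only */
    }
    return MOCK_RET_SUCCESS;
}

/* One pass over a block of samples: enqueue each sample, then poll for
 * output, as the example's main loop does. Returns the number of results. */
static int process_block(const float *samples, size_t count)
{
    float scores[MOCK_LABELS];
    int outputs = 0;
    for (size_t i = 0; i < count; i++) {
        if (mock_enqueue(&samples[i]) != MOCK_RET_SUCCESS) {
            return -1;
        }
        if (mock_dequeue(scores) == MOCK_RET_SUCCESS) {
            outputs++;
        }
    }
    return outputs;
}
</code></pre>
<p>The model keeps its own sample count across calls, so a result can appear partway through a read buffer. This is why the example polls after every sample instead of once per buffer.</p>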
<h3>Project Code</h3>
<p>Open <code>main.c</code> in the project directory. Its main part is shown below (the helper-function definitions are omitted):</p>
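<pre><code class="language-c++">#include "cyhal.h"
#include "cybsp.h"
#include "cy_retarget_io.h"
#include <float.h>
#include <math.h>    /* fabsf() */
#include <stdio.h>   /* printf() */
#include <string.h>  /* memset() */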

/* Model to use */
#include <models/model.h>

/*******************************************************************************
* Macros
********************************************************************************/
/* Desired sample rate. Typical values: 8/16/22.05/32/44.1/48 kHz */
#define SAMPLE_RATE_HZ          16000

/* Audio Subsystem Clock. Typical values depends on the desire sample rate:
- 8/16/48kHz : 24.576 MHz
- 22.05/44.1kHz : 22.579 MHz */
#define AUDIO_SYS_CLOCK_HZ       24576000

/* Decimation Rate of the PDM/PCM block. Typical value is 64 */
#define DECIMATION_RATE          64

/* Microphone sensitivity
* PGA in 0.5 dB increment, for example a value of 5 would mean +2.5 dB. */
#define MICROPHONE_GAIN          20

/* Multiplication factor of the input signal.
* This should ideally be 1. Higher values will have a negative impact on
* the sampling dynamic range. However, it can be used as a last resort
* when MICROPHONE_GAIN is already at maximum and the ML model was trained
* with data at a higher amplitude than the microphone captures.
* Note: If you use the same board for recording training data and
* deployment of your own ML model set this to 1.0. */
#define DIGITAL_BOOST_FACTOR          10.0f

/* Specifies the dynamic range in bits.
* PCM word length, see the A/D specific documentation for valid ranges. */
#define AUIDO_BITS_PER_SAMPLE    16

/* PDM/PCM Pins */
#define PDM_DATA                P10_5
#define PDM_CLK                   P10_4

/* Size of audio buffer */
#define AUDIO_BUFFER_SIZE       512

/* Converts given audio sample into range [-1,1] */
#define SAMPLE_NORMALIZE(sample)    (((float) (sample)) / (float) (1 << (AUIDO_BITS_PER_SAMPLE - 1)))

/* DEEPCRAFT compatibility defines to support all versions of code generation APIs */
#ifndef IPWIN_RET_SUCCESS
#define IPWIN_RET_SUCCESS (0)
#endif
#ifndef IPWIN_RET_NODATA
#define IPWIN_RET_NODATA (-1)
#endif
#ifndef IPWIN_RET_ERROR
#define IPWIN_RET_ERROR (-2)
#endif
#ifndef IMAI_DATA_OUT_SYMBOLS
#define IMAI_DATA_OUT_SYMBOLS IMAI_SYMBOL_MAP
#endif
/* End DEEPCRAFT compatibility defines */

/*******************************************************************************
* Function Prototypes
*******************************************************************************/
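static void init_board(void);
static void init_audio(cyhal_pdm_pcm_t* pdm_pcm);
static void halt_error(int code);
static void pdm_frequency_fix(void);
/* The definitions of these helpers follow main() in the full main.c
 * and are omitted from this excerpt. */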

/**********************************************
* Function Name: main
***********************************************/
int main(void)
{
    int16_t audio_buffer[AUDIO_BUFFER_SIZE] = {0};
    float label_scores[IMAI_DATA_OUT_COUNT];
    char *label_text[] = IMAI_DATA_OUT_SYMBOLS;

    cy_rslt_t result;
    size_t audio_count;
    cyhal_pdm_pcm_t pdm_pcm;
    int16_t prev_best_label = 0;
    int16_t best_label = 0;
    float sample = 0.0f;
    float sample_abs = 0.0f;
    float max_score = 0.0f;
    float sample_max = 0;
    float sample_max_slow = 0;

    /* Basic board setup */
    init_board();

    /* Initialize model */
    result = IMAI_init();
    halt_error(result);

    /* Initialize audio sampling */
    init_audio(&pdm_pcm);

    /* ANSI ESC sequence for clear screen */
    printf("\x1b[2J\x1b[;H\x1b[?25l;");

    for (;;)
    {
        /* Move cursor home */
        printf("\033[H");
        printf("DEEPCRAFT Studio Audio Model Example\r\n\n");

        /* Initialize the audio_buffer to zeroes and read data
         * from the PDM mic into it */
        audio_count = AUDIO_BUFFER_SIZE;
        memset(audio_buffer, 0, sizeof(audio_buffer));
        result = cyhal_pdm_pcm_read(&pdm_pcm, (void *) audio_buffer, &audio_count);
        halt_error(result);

        /* Slowly decay the long-term peak so the volume reading can fall again */
        sample_max_slow -= 0.0005f;
        sample_max = 0;
        for (size_t i = 0; i < audio_count; i++)
        {
            /* Convert integer sample to float, apply the boost,
             * clip to [-1, 1] and pass it to the model */
            sample = SAMPLE_NORMALIZE(audio_buffer[i]) * DIGITAL_BOOST_FACTOR;
            if (sample > 1.0f)
            {
                sample = 1.0f;
            }
            else if (sample < -1.0f)
            {
                sample = -1.0f;
            }
            result = IMAI_enqueue(&sample);
            halt_error(result);

            /* Used to tune gain control. sample_max should be near 1.0
             * when shouting directly into the microphone */
            sample_abs = fabsf(sample);
            if (sample_abs > sample_max)
            {
                sample_max = sample_abs;
            }

            if (sample_max > sample_max_slow)
            {
                sample_max_slow = sample_max;
            }

            /* Check if there is any model output to process */
            best_label = 0;
            max_score = -1000.0f;
            switch (IMAI_dequeue(label_scores))
            {
                case IMAI_RET_SUCCESS: /* We have data, display it */
                    for (int j = 0; j < IMAI_DATA_OUT_COUNT; j++)
                    {
                        printf("label: %-10s: score: %.4f\r\n", label_text[j], label_scores[j]);
                        if (label_scores[j] > max_score)
                        {
                            max_score = label_scores[j];
                            best_label = j;
                        }
                    }
                    printf("\r\n");

                    /* Post processing:
                     * if the previous best label still has a confidence score
                     * above 0.05, keep it as the best label. */
                    if (prev_best_label != 0 && label_scores[prev_best_label] > 0.05f)
                    {
                        best_label = prev_best_label;
                        printf("Output: %-30s\r\n", label_text[best_label]);
                    }
                    /* Otherwise, if the best label is not "unlabeled" and its
                     * score is at least 0.5, use it as the best label. */
                    else if (best_label != 0 && max_score >= 0.50f)
                    {
                        prev_best_label = best_label;
                        printf("Output: %-30s\r\n", label_text[best_label]);
                    }
                    /* Else the best label is "unlabeled" and no output is printed */
                    printf("\r\n");
                    printf("Volume: %.4f (%.2f)\r\n", sample_max, sample_max_slow);
                    printf("Audio buffer utilization: %.3f\r\n", audio_count / (float)AUDIO_BUFFER_SIZE);
                    break;
                case IMAI_RET_NODATA: /* No new output, continue with sampling */
                    break;
                case IMAI_RET_ERROR: /* Abort on error */
                    halt_error(IMAI_RET_ERROR);
                    break;
            }
        }
    }
}
</code></pre>
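<p>The post-processing at the end of the loop works as simple hysteresis. A new label is latched only when its score reaches 0.5. Once latched, it continues to be reported while its score stays above 0.05. The sketch below reproduces that rule in isolation. It uses a hypothetical <code>select_label()</code> helper with the example's thresholds and assumes, as the example's comments do, that label 0 is the "unlabeled" class.</p>
<pre><code class="language-c">#include <stddef.h>

/* Thresholds taken from the example's post-processing step. */
#define KEEP_THRESHOLD   0.05f  /* keep the previous label while its score exceeds this */
#define ACCEPT_THRESHOLD 0.50f  /* accept a new label only at or above this score */

/* Returns the label to report (0 means nothing is reported) and updates
 * *prev_label when a new label is latched. scores[] has one entry per label. */
static int select_label(const float *scores, size_t count, int *prev_label)
{
    int best = 0;
    float max_score = -1000.0f;
    for (size_t i = 0; i < count; i++) {
        if (scores[i] > max_score) {
            max_score = scores[i];
            best = (int)i;
        }
    }
    if (*prev_label != 0 && scores[*prev_label] > KEEP_THRESHOLD) {
        return *prev_label;          /* previous label still plausible: hold it */
    }
    if (best != 0 && max_score >= ACCEPT_THRESHOLD) {
        *prev_label = best;          /* confident new label: latch it */
        return best;
    }
    return 0;                        /* "unlabeled": nothing to print */
}
</code></pre>
<p>The gap between the two thresholds stops the displayed label from flickering when scores hover around 0.5. The trade-off is that a latched label can keep being shown while another class already scores higher, as long as the latched label's score stays above 0.05.</p>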
<p>Save the file.</p>
<h3>Flashing the Firmware</h3>
<ul>
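<li>Connect the board to the PC and click the Run button on the menu bar to upload the firmware;</li>
<li>Alternatively, flash the firmware with <code>ModusToolbox Programmer</code>;</li>
<li>The firmware is located in the <code>.../DEEPCRAFT_..._Audio/build/APP_CY8CKIT-062S2-AI/Debug</code> folder;</li>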
</ul>
<p><img src="data/attachment/forum/202511/20/140654pv9n9v9vzc3sphds.jpg" alt="hex_localization.jpg" title="hex_localization.jpg" /></p>
<ul>
<li>Load the firmware file, then select the programmer (probe) and the board;</li>
<li>Click Program.</li>
</ul>
<p><img src="data/attachment/forum/202511/20/140709oi73qiw30c5radla.jpg" alt="programmer_modustoolbox_audio.jpg" title="programmer_modustoolbox_audio.jpg" /></p>
<h3>Results</h3>
<ul>
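<li>Run <code>Tera Term</code>, connect to the board's serial port and set the baud rate to <code>115200</code>;</li>
<li>Briefly press the onboard RESET button. The terminal shows the Audio example and sound inference begins;</li>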
</ul>
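<p>To log results on the PC instead of reading them from Tera Term, you can parse each score line using the firmware's format string, <code>"label: %-10s: score: %.4f\r\n"</code>. Below is a sketch with a hypothetical <code>parse_score_line()</code> helper. It assumes label names contain no spaces or colons, and it does not cover opening and reading the serial port.</p>
<pre><code class="language-c">#include <stddef.h>
#include <stdio.h>

/* Parse one "label: NAME : score: X.XXXX" line printed by the firmware.
 * Returns 1 and fills label/score on success, 0 for any other line. */
static int parse_score_line(const char *line, char *label, size_t label_size, float *score)
{
    char name[32];
    float value;
    /* %31[^: ] stops at a space (padding from %-10s) or at the colon that
     * directly follows names of 10 or more characters. */
    if (sscanf(line, "label: %31[^: ] : score: %f", name, &value) != 2) {
        return 0;
    }
    snprintf(label, label_size, "%s", name);
    *score = value;
    return 1;
}
</code></pre>
<p>Lines such as <code>Output: ...</code> or <code>Volume: ...</code> do not match the pattern and are skipped, so a logger can pass every received line to the function.</p>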
<p><img src="data/attachment/forum/202511/20/140728qra6trwraltratyb.gif" alt="audio_analysis.gif" title="audio_analysis.gif" /></p>
<ul>
<li>Play sounds that match the model's labels near the board. The board recognizes them with the audio model and displays the matching label;</li>
</ul>
<p><img src="data/attachment/forum/202511/20/141056e4w4aw4vqt4au9bw.gif" alt="audio_test.gif" title="audio_test.gif" /></p>
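<h2>Summary</h2>
<p>This post showed how the Infineon CY8CKIT-062S2-AI development board captures ambient sound with its onboard audio sensor and uses a machine learning model to recognize specific sounds. It can serve as a reference for quickly developing and designing edge AI products.</p>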


21ic Electronic Technology Development Forum's Archiver
