[ModusToolbox™] [Infineon CY8CKIT-062S2-AI Review] Sound Recognition

Posted by 无垠的广袤 on 2025-11-20 14:11


This article presents a sound-recognition project built on the Infineon CY8CKIT-062S2-AI development board: the onboard audio sensor collects ambient sound data, and a machine-learning model performs prediction and inference to detect specific sound signals.

Project Overview

The project uses the onboard audio sensor to collect sound data, which is fed to an ML model to detect specific sounds such as coughing, crying or laughter, and bird song.

audio_recog_cover.jpg

  • Environment setup: install the required software and machine-learning tools used to generate the model code;
  • Project creation: use ModusToolbox to quickly load, build, and debug the firmware;
  • Project code: the key code for implementing the design, including a flowchart;
  • Demonstration: display the probability of each target sound over the serial port and output the inferred prediction.

Environment Setup

  • Download the development tools and IDE software from the CY8CKIT-062S2-AI official website, including

    • ModusToolbox;
    • DEEPCRAFT™ Studio (formerly Imagimob Studio);
  • The ModusToolbox Setup tool can install the related software and toolchains;

  • Use ModusToolbox Programmer to flash the firmware.

Project Test

Load the CY8CKIT-062S2-AI board's demo project, which demonstrates deploying a machine-learning (ML) model generated by DEEPCRAFT™ Studio.

  • Uses an acoustic model / keyword detector that takes pulse-density-modulated (PDM) audio data as input;
  • Detects various keywords such as digits, laughter, directions, birds, dogs, and cats;
  • The microphone is tuned for a preset detection distance of 1 meter;
  • Run the example project: microphone audio is passed to the ML model, and recognition results are printed to a serial terminal.
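The demo's PDM front end ties three numbers together: the audio subsystem clock, the decimation rate, and the PCM sample rate. For a PDM microphone the bit-clock is sample_rate × decimation_rate, and the subsystem clock must divide evenly into that bit-clock, which is why 24.576 MHz suits 8/16/48 kHz while 22.579 MHz suits 22.05/44.1 kHz. A minimal sketch of this arithmetic (the helper names are illustrative, not part of the example project):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper: the PDM bit-clock (Hz) required for a given
 * PCM sample rate and decimation rate. */
uint32_t pdm_clk_hz(uint32_t sample_rate_hz, uint32_t decimation)
{
    return sample_rate_hz * decimation;
}

/* Returns 1 if the audio subsystem clock is an exact integer
 * multiple of the required PDM bit-clock, 0 otherwise. */
int pdm_clk_is_achievable(uint32_t audio_sys_clk_hz,
                          uint32_t sample_rate_hz,
                          uint32_t decimation)
{
    uint32_t clk = pdm_clk_hz(sample_rate_hz, decimation);
    return clk != 0 && (audio_sys_clk_hz % clk) == 0;
}
```

With the demo's values (16 kHz, decimation 64), the bit-clock is 1.024 MHz, and 24.576 MHz divides into it exactly 24 times; a 22.05 kHz rate would not divide evenly, which is why a different subsystem clock is listed for that family of rates.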

Project Creation

  • Open Eclipse IDE for ModusToolbox;
  • In the Quick Panel, select New Application under Start;
  • Once the device catalog has loaded (internet access required; a proxy may be needed in some regions), type CY8CKIT-062S2-AI in the search box to find the device;

ML_motion_create.jpg

  • Under the Machine Learning category, check the DEEPCRAFT Deploy Model Audio project and click the Create button;

deploy_audio_test.jpg

  • After the demo is created, right-click the project and build it, confirming there are no errors;

See: Infineon/mtb-example-ml-deepcraft-deploy-audio.

Flowchart

flowchart_ML_audio.png

Project Code

Open the main.c file in the project directory; the code is as follows:

#include "cyhal.h"
#include "cybsp.h"
#include "cy_retarget_io.h"
#include <float.h>
#include <math.h>    /* fabs */
#include <stdio.h>   /* printf */
#include <string.h>  /* memset */

/* Model to use */
#include <models/model.h>

/*******************************************************************************
* Macros
********************************************************************************/
/* Desired sample rate. Typical values: 8/16/22.05/32/44.1/48 kHz */
#define SAMPLE_RATE_HZ              16000

/* Audio subsystem clock. Typical value depends on the desired sample rate:
- 8/16/48kHz    : 24.576 MHz
- 22.05/44.1kHz : 22.579 MHz */
#define AUDIO_SYS_CLOCK_HZ          24576000

/* Decimation Rate of the PDM/PCM block. Typical value is 64 */
#define DECIMATION_RATE             64

/* Microphone sensitivity
 * PGA in 0.5 dB increment, for example a value of 5 would mean +2.5 dB. */
#define MICROPHONE_GAIN             20

/* Multiplication factor of the input signal.
 * This should ideally be 1. Higher values will have a negative impact on
 * the sampling dynamic range. However, it can be used as a last resort 
 * when MICROPHONE_GAIN is already at maximum and the ML model was trained
 * with data at a higher amplitude than the microphone captures.
 * Note: If you use the same board for recording training data and 
 * deployment of your own ML model set this to 1.0. */
#define DIGITAL_BOOST_FACTOR            10.0f

/* Specifies the dynamic range in bits.
 * PCM word length, see the A/D specific documentation for valid ranges. */
#define AUDIO_BITS_PER_SAMPLE       16

/* PDM/PCM Pins */
#define PDM_DATA                    P10_5
#define PDM_CLK                     P10_4

/* Size of audio buffer */
#define AUDIO_BUFFER_SIZE           512

/* Converts given audio sample into range [-1,1] */
#define SAMPLE_NORMALIZE(sample)        (((float) (sample)) / (float) (1 << (AUDIO_BITS_PER_SAMPLE - 1)))

/* DEEPCRAFT compatibility defines to support all versions of code generation APIs */
#ifndef IPWIN_RET_SUCCESS
#define IPWIN_RET_SUCCESS (0)
#endif
#ifndef IPWIN_RET_NODATA
#define IPWIN_RET_NODATA (-1)
#endif
#ifndef IPWIN_RET_ERROR
#define IPWIN_RET_ERROR (-2)
#endif
#ifndef IMAI_DATA_OUT_SYMBOLS
#define IMAI_DATA_OUT_SYMBOLS IMAI_SYMBOL_MAP
#endif
/* End DEEPCRAFT compatibility defines */

/*******************************************************************************
* Function Prototypes
*******************************************************************************/
static void init_board(void);
static void init_audio(cyhal_pdm_pcm_t* pdm_pcm);
static void halt_error(int code);
static void pdm_frequency_fix(void);


/**********************************************
* Function Name: main
***********************************************/
int main(void)
{
    int16_t audio_buffer[AUDIO_BUFFER_SIZE] = {0};
    float label_scores[IMAI_DATA_OUT_COUNT];
    char *label_text[] = IMAI_DATA_OUT_SYMBOLS;

    cy_rslt_t result;
    size_t audio_count;
    cyhal_pdm_pcm_t pdm_pcm;
    int16_t prev_best_label = 0;
    int16_t best_label = 0;
    float sample = 0.0f;
    float sample_abs = 0.0f;
    float max_score = 0.0f;
    float sample_max = 0;
    float sample_max_slow = 0;

    /* Basic board setup */
    init_board();

    /* Initialize model */
    result = IMAI_init();
    halt_error(result);

    /* Initialize audio sampling */
    init_audio(&pdm_pcm);

    /* ANSI ESC sequence for clear screen */
    printf("\x1b[2J\x1b[;H\x1b[?25l;");

    for (;;)
    {
        /* Move cursor home */
        printf("\033[H");
        printf("DEEPCRAFT Studio Audio Model Example\r\n\n");

        /* Initialize the audio_buffer to zeroes and read data
         * from the pdm mic into it */
        audio_count = AUDIO_BUFFER_SIZE;
        memset(audio_buffer, 0, AUDIO_BUFFER_SIZE * sizeof(int16_t));
        result = cyhal_pdm_pcm_read(&pdm_pcm, (void *) audio_buffer, &audio_count);
        halt_error(result);

        sample_max_slow -= 0.0005;
        sample_max = 0;
        for(int i = 0; i < audio_count; i++)
        {
            /* Convert integer sample to float and pass it to the model */
            sample = SAMPLE_NORMALIZE(audio_buffer[i]) * DIGITAL_BOOST_FACTOR;
            if (sample > 1.0)
            {
                sample = 1.0;
            }
            else if (sample < -1.0)
            {
                sample = -1.0;
            }
            result = IMAI_enqueue(&sample);
            halt_error(result);

            /* Used to tune gain control. sample_max should be near 1.0 
             * when shouting directly into the microphone */
            sample_abs = fabs(sample);
            if(sample_abs > sample_max)
            {
                sample_max = sample_abs;
            }

            if(sample_max > sample_max_slow)
            {
                sample_max_slow = sample_max;
            }
            /* Check if there is any model output to process */
            best_label = 0;
            max_score = -1000.0f;
            switch(IMAI_dequeue(label_scores))
            {
                case IMAI_RET_SUCCESS:      /* We have data, display it */

                    for(int i = 0; i < IMAI_DATA_OUT_COUNT; i++)
                    {
                        printf("label: %-10s: score: %.4f\r\n", label_text[i], label_scores[i]);
                        if (label_scores[i] > max_score)
                        {
                            max_score = label_scores[i];
                            best_label = i;
                        }
                    }
                    printf("\r\n");

                    /* Post processing
                     * If the previous best label still has a confidence score above 0.05,
                     * keep it as the best label. */
                    if(prev_best_label != 0 && label_scores[prev_best_label] > 0.05)
                    {
                        best_label = prev_best_label;
                        printf("Output: %-30s\r\n", label_text[best_label]);
                    }
                    /* Otherwise, if the best label is not "unlabeled", and conf score is above 0.5
                     * use it as best label. */
                    else if(best_label != 0 && max_score >= 0.50)
                    {
                        prev_best_label = best_label;
                        printf("Output: %-30s\r\n", label_text[best_label]);
                    }
                    /* Else the best label is "unlabeled" */
                    printf("\r\n");
                    printf("Volume: %.4f    (%.2f)\r\n", sample_max, sample_max_slow);
                    printf("Audio buffer utilization: %.3f\r\n", audio_count / (float)AUDIO_BUFFER_SIZE);
                    break;
                case IMAI_RET_NODATA:   /* No new output, continue with sampling */
                    break;
                case IMAI_RET_ERROR:    /* Abort on error */
                    halt_error(IMAI_RET_ERROR);
                    break;
            }
        }
    }
}
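The per-sample conditioning step in the loop above (normalize to [-1, 1], apply the digital boost, clip back into range) can be isolated into a small, testable helper. A sketch mirroring the example's macros (the function name is hypothetical, not part of the example project):

```c
#include <assert.h>
#include <stdint.h>

#define AUDIO_BITS_PER_SAMPLE  16     /* PCM word length, as in the example */
#define DIGITAL_BOOST_FACTOR   10.0f  /* same boost as the example */

/* Maps a signed 16-bit PCM sample into [-1, 1], applies the digital
 * boost, then clips the result back into [-1, 1] before it is passed
 * to the model. */
float prepare_sample(int16_t raw)
{
    float sample = ((float) raw / (float) (1 << (AUDIO_BITS_PER_SAMPLE - 1)))
                   * DIGITAL_BOOST_FACTOR;
    if (sample > 1.0f)
    {
        sample = 1.0f;
    }
    else if (sample < -1.0f)
    {
        sample = -1.0f;
    }
    return sample;
}
```

With a boost factor of 10, any raw sample beyond about one tenth of full scale saturates at ±1.0, which is why the source comments recommend a factor of 1.0 when training data and deployment use the same board.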

Save the code. The definitions of the helper functions init_board(), init_audio(), halt_error(), and pdm_frequency_fix() are provided by the example project and are omitted here.
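The post-processing branch near the end of main() implements a simple hysteresis: a previously reported label stays "sticky" while its score remains above 0.05, and a new label is adopted only when it is not "unlabeled" (index 0) and scores at least 0.5. A standalone sketch of that logic (function name and signature are illustrative):

```c
#include <assert.h>

/* Sketch of the example's label hysteresis. Label index 0 is
 * "unlabeled". Returns the label to report and updates *prev_label
 * when a new confident detection is adopted. */
int postprocess_label(const float *scores, int best_label,
                      float max_score, int *prev_label)
{
    /* Keep the previous label while its score stays above 0.05. */
    if (*prev_label != 0 && scores[*prev_label] > 0.05f)
    {
        return *prev_label;
    }
    /* Adopt a new label only on a confident, non-"unlabeled" hit. */
    if (best_label != 0 && max_score >= 0.50f)
    {
        *prev_label = best_label;
        return best_label;
    }
    /* Otherwise report "unlabeled". */
    return 0;
}
```

The low release threshold (0.05) relative to the high attack threshold (0.5) keeps the displayed label from flickering while a sound is fading out.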

Firmware Upload

  • Connect the board to the PC and click the Run button in the toolbar to upload the firmware;
  • Alternatively, flash the firmware with the ModusToolbox Programmer tool;
  • The firmware is located in the .../DEEPCRAFT_..._Audio/build/APP_CY8CKIT-062S2-AI/Debug folder;

hex_localization.jpg

  • Load the firmware and configure the programmer and the board model;
  • Click Program.

programmer_modustoolbox_audio.jpg

Results

  • Run Tera Term, connect to the device's serial port, and set the baud rate to 115200;
  • Short-press the onboard RESET button; the terminal displays the Audio example output and begins sound inference;

audio_analysis.gif

  • When sounds corresponding to the trained labels occur in the environment, the board recognizes them with the audio model and displays the matching label;

audio_test.gif

Summary

This article presented a sound-recognition design on the Infineon CY8CKIT-062S2-AI development board, in which the onboard audio sensor collects ambient sound data and a machine-learning model predicts and infers specific sound signals, providing a reference for rapid development of related products in the edge-AI field.
