Research on Speech Denoising Based on Deep Convolutional Neural Network

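Cite this article:	Zhang Kunyao, Wang Huapeng, Niu Jinlin, Ni Lingge, Liu Yuanzhou. Research on speech denoising based on deep convolutional neural network[J]. Forensic Science and Technology, 2021(5): 457-463.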

Authors:	Zhang Kunyao, Wang Huapeng, Niu Jinlin, Ni Lingge, Liu Yuanzhou

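Institution:	1. School of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang 110035;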

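Funding:	National Key Research and Development Program of China (2017YFC0821000); Open Project Fund of the Shanghai Key Laboratory of Crime Scene Evidence (2018XCWZK09); Open Fund of the Chongqing Key Laboratory of Forensic Science and Technology in Higher Education Institutions (XKZDSYS2019-Z1); Guangzhou Science and Technology Program (2019030004); Graduate Innovation Capability Improvement Program of the Criminal Investigation Police University of China (2020YCYB38).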

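Abstract:	Objective To improve the quality of speech in audio materials obtained in practical casework and to reduce the impact of noise on speech quality and intelligibility, a speech denoising model based on a deep convolutional neural network is proposed. Methods The model uses a multi-layer repeated structure of convolution, bias addition, batch normalization and ReLU activation, and can effectively suppress a variety of common environmental noises, such as washing-machine noise, clapping and car-interior noise, in speech at low signal-to-noise ratios. Results After processing by the model, the noisy speech achieved an average MOS of 3.91, with a highest score of 4.05 and a lowest score of 3.81. Conclusions The model can effectively improve the quality and intelligibility of noisy speech and is of significant value for practical public security work, the construction of smart policing, speech analysis and speech-to-text recognition.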
Keywords:	Deep convolutional neural network; Speech denoising; Environmental noise
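The Methods describe each layer as convolution, bias addition, batch normalization and ReLU activation, repeated over several layers. The following is a minimal pure-Python sketch of that repeated layer, not the authors' implementation. The single feature channel, the 2D time-frequency input patch (for example a magnitude spectrogram), the 3x3 kernels, the three-layer depth and all function names (`conv2d_same`, `conv_bn_relu_layer`, `denoise_stack`) are hypothetical choices made for illustration.

```python
"""Hypothetical sketch of the repeated conv -> bias -> batch norm -> ReLU layer.

Assumptions (not from the paper): one feature channel, a 2D time-frequency
input patch, odd kernel sizes with 'same' zero padding, and training-mode
batch statistics. Real models use many channels and a deep-learning library.
"""
import math
import random
from typing import List

Matrix = List[List[float]]


def conv2d_same(x: Matrix, kernel: Matrix) -> Matrix:
    """2D cross-correlation (what CNN 'convolution' layers compute), zero-padded."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    rows, cols = len(x), len(x[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < rows and 0 <= c < cols:  # positions outside are zero
                        acc += x[r][c] * kernel[a][b]
            out[i][j] = acc
    return out


def batch_norm(batch: List[Matrix], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[Matrix]:
    """Training-mode batch norm for one channel: stats over batch and positions."""
    values = [v for m in batch for row in m for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = gamma / math.sqrt(var + eps)
    return [[[(v - mean) * scale + beta for v in row] for row in m] for m in batch]


def relu(batch: List[Matrix]) -> List[Matrix]:
    """Element-wise max(0, v)."""
    return [[[max(0.0, v) for v in row] for row in m] for m in batch]


def conv_bn_relu_layer(batch: List[Matrix], kernel: Matrix, bias: float) -> List[Matrix]:
    """One layer in the order named in the abstract: conv, bias, batch norm, ReLU."""
    convolved = [[[v + bias for v in row] for row in conv2d_same(m, kernel)]
                 for m in batch]
    return relu(batch_norm(convolved))


def denoise_stack(batch: List[Matrix], kernels: List[Matrix],
                  biases: List[float]) -> List[Matrix]:
    """Repeat the layer once per (kernel, bias) pair."""
    for kernel, bias in zip(kernels, biases):
        batch = conv_bn_relu_layer(batch, kernel, bias)
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    # Two random 4x5 patches stand in for noisy spectrogram excerpts.
    batch = [[[rng.random() for _ in range(5)] for _ in range(4)] for _ in range(2)]
    kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
               for _ in range(3)]
    out = denoise_stack(batch, kernels, [0.1, -0.2, 0.0])
    print(len(out), len(out[0]), len(out[0][0]))  # -> 2 4 5 (shape preserved)
```

One design point the sketch makes visible: a bias added immediately before training-mode batch normalization does not change the output, because subtracting the batch mean cancels any constant offset. Many implementations therefore omit that bias; the sketch keeps it only to mirror the order stated in the abstract.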
Speech denoising based on deep convolutional neural network

Institution:	1. Criminal Investigation Police University of China, Shenyang 110035;

Abstract:	Objective Speech, one of the main means of human communication, can provide reliable clues for solving cases and strong evidence at trial. However, speech materials collected for forensic purposes frequently contain environmental noise that degrades speech quality. A speech noise-reduction model based on a deep convolutional neural network is therefore proposed to reduce the influence of noise on speech quality and intelligibility. Methods A multi-layer repeated structure was built from 2D convolution, bias addition, batch normalization and ReLU activation. By stacking an optimized number of repetitions of this structure, a deep convolutional network was trained to remove a variety of common environmental noises (e.g., washing-machine noise, clapping and car-interior noise) from speech at low SNR (signal-to-noise ratio). The network parameters were optimized during training with the Adam algorithm. Speech quality was assessed with the voice/video quality evaluation methodology developed by the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T). Results Noisy speech processed by the model achieved an average MOS (mean opinion score) of 3.91, with a highest score of 4.05 and a lowest score of 3.81. Conclusions The model generalizes well across various environmental noises and effectively improves the quality and intelligibility of noisy speech, giving it potential for speech analysis and speech-to-text recognition in public security practice and the construction of intelligent policing. © 2021, Editorial Office of Forensic Science and Technology. All rights reserved.

Keywords:	Deep convolutional neural network; Environmental noise; Noise reduction
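The Results summarize listening-test quality as an average MOS of 3.91, a highest score of 4.05 and a lowest score of 3.81. The following is a minimal sketch of how such a summary can be computed. It assumes listeners rate each processed sample on the five-point absolute category rating (ACR) scale of ITU-T P.800 (the abstract cites ITU-T methodology but does not name a recommendation) and that the highest and lowest figures are per-condition MOS values. The condition names and ratings are invented for illustration and are not data from the paper; `mos` and `summarize` are hypothetical helpers.

```python
"""Hypothetical MOS summary sketch.

Assumes five-point ACR ratings (ITU-T P.800 style) and that the reported
highest/lowest values are per-condition MOS; neither is stated in the abstract.
"""
from typing import Dict, List, Tuple

# ACR opinion scale: 5 Excellent, 4 Good, 3 Fair, 2 Poor, 1 Bad.
ACR_SCALE = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mos(ratings: List[int]) -> float:
    """Mean opinion score: arithmetic mean of listener ratings for one condition."""
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if r not in ACR_SCALE:
            raise ValueError(f"rating {r} is outside the 1-5 ACR scale")
    return sum(ratings) / len(ratings)


def summarize(ratings_by_condition: Dict[str, List[int]]) -> Tuple[float, float, float]:
    """Return (average, highest, lowest) over the per-condition MOS values."""
    scores = [mos(r) for r in ratings_by_condition.values()]
    if not scores:
        raise ValueError("need at least one condition")
    return sum(scores) / len(scores), max(scores), min(scores)


if __name__ == "__main__":
    # Invented ratings for illustration only; not data from the paper.
    ratings = {
        "washing_machine": [4, 4, 3, 5, 4],
        "clapping": [4, 3, 4, 4, 4],
        "car_interior": [4, 4, 4, 4, 5],
    }
    avg, highest, lowest = summarize(ratings)
    print(f"MOS average={avg:.2f} highest={highest:.2f} lowest={lowest:.2f}")
    # -> MOS average=4.00 highest=4.20 lowest=3.80
```

Averaging per-condition MOS values weights every noise condition equally. Pooling all individual ratings instead would weight conditions by how many ratings each received. The abstract does not say which aggregation the authors used.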