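Objective: To improve the quality of speech in audio material obtained in practical work, and to reduce the effect of noise on speech quality and intelligibility, a speech denoising model based on a deep convolutional neural network is proposed. Methods: The model uses a repeated multi-layer structure of convolution, bias addition, batch normalization, and ReLU activation, and can effectively denoise low signal-to-noise-ratio speech contaminated by common environmental noises such as washing-machine noise, applause, and car-interior noise. Results: After processing by the model, the noisy speech reached a MOS score of 3.91, with a highest score of 4.05 and a lowest score of 3.81. Conclusion: The model substantively improves the quality and intelligibility of noisy speech and is of significant value for practical police work, smart policing, speech analysis, and speech-to-text recognition.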
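The layer pattern named in the abstract above (convolution, bias addition, batch normalization, ReLU, repeated over several layers) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' model: it is one-dimensional, uses plain Python lists, has no learned scale or shift, and normalizes over the positions of a single sequence rather than over a training batch. All function names and the kernel values are hypothetical.

```python
import math

def conv1d(x, kernel, bias):
    """Valid-mode 1-D convolution (cross-correlation form) plus a scalar bias."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def batch_norm(x, eps=1e-5):
    """Normalize to zero mean and unit variance.
    Assumption: statistics are taken over this one sequence, not over a batch."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def relu(x):
    """Keep positive values, clamp negatives to zero."""
    return [max(0.0, v) for v in x]

def block(x, kernel, bias):
    """One conv -> bias -> normalization -> ReLU unit."""
    return relu(batch_norm(conv1d(x, kernel, bias)))

def stack(x, layers):
    """Apply the same unit repeatedly; `layers` is a list of (kernel, bias) pairs."""
    for kernel, bias in layers:
        x = block(x, kernel, bias)
    return x

# Hypothetical input frame and hand-picked (untrained) kernels.
frame = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
layers = [([1.0, 0.0, -1.0], 0.1), ([0.5, 0.5], 0.0)]
out = stack(frame, layers)
print(len(out), out)
```

In this simplified form the constant bias is cancelled by the mean subtraction in the normalization step. Frameworks normally add a learned shift after normalization, which this sketch omits.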

Yang X  Shi S  Ling J 《法医学杂志》1998,14(4):224-225
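Drawing on the functions of the KAY5500 sound spectrograph and the role those functions play in speech research, this work studies the acoustic features of speech and discusses the principles and methods of voice identification, as well as the value in voiceprint examination of the voiceprint features, visible in spectrograms, that characterize the acoustic properties of speech.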

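A Study of the Effects of White Noise at Different Signal-to-Noise Ratios on Speech Pitch and Formants   Total citations: 1, self-citations: 0, citations by others: 1
Objective: To study how white noise of different intensities affects speech feature extraction, summarize the patterns of change, and provide a reference for voiceprint examination of noisy speech. Methods: White noise of different intensities was added to recordings from the TIMIT continuous speech corpus, and a speech workstation was used to extract the fundamental frequency and formants of the clean speech and of the noisy speech at each signal-to-noise ratio, in order to analyze the effect of white noise on these feature parameters. Results: In low-noise conditions the formants were relatively stable; as noise intensity increased, formants shifted or could not be detected. Formants of different orders differed in noise robustness: low-order formants were robust and stable, whereas high-order formants were less robust and less stable. Pitch remained highly stable at all noise intensities and showed strong robustness. Conclusion: A lower signal-to-noise ratio causes formant frequencies to shift and can even cause formants to be lost; noise affects high-order formants more than low-order formants; the fundamental frequency is highly resistant to interference in noisy conditions. Voiceprint examination should pay particular attention to the effect of noise on speech features.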
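The noise-loading step described in the abstract above follows from the definition SNR(dB) = 10·log10(P_signal / P_noise). The sketch below shows one way to scale Gaussian white noise to hit a target SNR. It is an illustration only: the abstract does not say how the noise was generated or mixed, the 200 Hz tone is a stand-in for a speech segment, and all function names are hypothetical.

```python
import math
import random

def signal_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)

def add_white_noise(clean, snr_db, seed=0):
    """Return clean + Gaussian white noise scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    # SNR_dB = 10*log10(P_signal / P_noise)  =>  P_noise = P_signal / 10**(SNR_dB/10)
    target_noise_power = signal_power(clean) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / signal_power(noise))
    return [s + scale * n for s, n in zip(clean, noise)]

def measured_snr_db(clean, noisy):
    """SNR recovered from a clean/noisy pair, as a sanity check."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(residual))

# Assumption: a 200 Hz tone sampled at 16 kHz stands in for a voiced segment.
clean = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
for snr in (20, 10, 0):
    noisy = add_white_noise(clean, snr)
    print(snr, round(measured_snr_db(clean, noisy), 2))
```

Because the scale factor is computed from the power of the noise actually drawn, the measured SNR matches the target up to floating-point error. Pitch and formant extraction would then be run on each `noisy` version.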

曹洪林  刘建伟 《证据科学》2009,17(6):754-764
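By analyzing acoustic parameters (intensity, duration, fundamental frequency, harmonic amplitude differences, and formants) of the three Mandarin monophthongs [a], [i], and [u] produced in normal and loud speaking modes, this paper compares how each parameter changes. Loud speech is found not to be a simple amplification of normal speech: the two differ not only in intensity but also, importantly, in the frequency domain. The spectral features of the same speaker differ considerably across speaking modes, whereas utterances produced in the same mode are more similar and more comparable. Voiceprint examination should therefore, as far as possible, compare speech samples produced in the same speaking mode.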

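Two forensic speaker identification cases are presented in which non-speech information ultimately helped confirm the speaker. They show that when the questioned speech does not meet the conditions for spectrogram comparison, making full use of the individual characteristics revealed by non-speech information helps resolve the speaker identification problem. From these cases, a method is derived for using non-speech information to assist speaker identification when the questioned speech is inadequate.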

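Objective: To provide a theoretical basis for reviewing the quality of mobile phone call recordings submitted for examination. Methods: A quantitative standard for evaluating the quality of questioned speech material is proposed and applied to recordings made with different phones over different communication networks. The standard is based on mainstream examination equipment and covers analysis methods including the number and values of formants in the spectrogram, fundamental frequency parameters, and regional average spectra, together with voiceprint comparison tests. Results: Call recording quality differed somewhat across conditions; these differences have some effect on voiceprint spectrogram examination but do not produce essential differences. Conclusion: In voice identification, the effect of differences in call speech quality should be taken into account for questioned recordings obtained over mobile networks, and should be assessed and overcome in the examination.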

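An Experimental Study of Mobile Phone Call Speech   Total citations: 1, self-citations: 0, citations by others: 1
Mobile phone call speech is now the most common form of speech encountered in forensic speech examination. Starting from the channel characteristics of mobile communication systems, this study analyzes the spectral characteristics and formant frequency changes of mobile phone call speech, and compares call speech across different networks, call modes, and handsets. The experiments show that mobile phone call speech differs markedly from directly recorded speech, mainly in the band-limiting filtering of high- and low-frequency information, drift of the high and low formants, speech quality, timbre, and prosodic features. The degree of change also differs across call conditions. Finally, the effect of these changes on speaker identification and the points to note in examination are discussed.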

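I. Overview of speaker profiling from language. Speech-based speaker profiling, like voiceprint examination, automatic computer speaker recognition, signal-to-noise ratio enhancement, and noise analysis, is a component of forensic acoustics (Forensic Acoustics, Forensic Audio; some scholars abroad also call it Forensic Phonetics). It uses dialect pronunciation, dialect, intonation, vocabulary, grammar, habitual expressions, and verbal mannerisms to analyze the personal characteristics of an unknown speaker, inferring the unknown offender's sex, age, region of residence, education level, occupation, and even height and build, and thereby provides leads and direction for an investigation. "Speech-based speaker profiling" differs from "language identification": the former profiles a person from language in spoken form, the latter from language in written form. Romania, Cuba, and other countries are reported to have developed software for speech-based speaker profiling.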

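Taking two different types of electronic voice-disguise devices currently popular on the market as its subject, this paper presents an in-depth experimental analysis of the acoustic features (fundamental frequency, formants, tone, energy, zero-crossing rate, and others) of multiple subjects before and after voice conversion. The results and analysis show that the acoustic features of the converted voice change in a regular way relative to the original voice. Inverting the conversion according to this regularity yields restored speech that agrees well with the original, which lays a foundation for same-speaker identification of electronically disguised speech.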

The noncontemporariness of speech is important to both of the two general approaches to speaker identification. Earwitness identification is one of them; in that instance, the time at which the identification is made is noncontemporary. A substantial amount of research has been carried out on this relationship and it now is well established that an auditor's memory for a voice decays sharply over time. It is the second approach to speaker identification which is of present interest. In this case, samples of a speaker's utterances are obtained at different points in time. For example, a threat call will be recorded and then sometime later (often very much later), a suspect's exemplar recording will be obtained. In this instance, it is the speech samples that are noncontemporary and they are the materials that are subjected to some form of speaker identification. Prevailing opinion is that noncontemporary speech itself poses just as difficult a challenge to the identification process as does the listener's memory decay in earwitness identification. Accordingly, a series of aural-perceptual speaker identification projects were carried out on noncontemporary speech: first, two with latencies of 4 and 8 weeks followed by 4 and 32 weeks plus two more with the pairs separated by 6 and 20 years. Mean correct noncontemporary identification initially dropped to 75-80% at week 4 and this general level was sustained for up to six years. It was only after 20 years had elapsed that a significant drop (to 33%) was noted. It can be concluded that a listener's competency in identifying noncontemporary speech samples will show only modest decay over rather substantial periods of time and, hence, this factor should have only a minimal negative effect on the speaker identification process.

This article examines the effects of hate speech laws in Australia. Triangulating data from primary and secondary sources, we examine five hypothesized effects: whether the laws provide a remedy to targets of hate speech, encourage more respectful speech, have an educative or symbolic effect, have a chilling effect, or create “martyrs.” We find the laws provide a limited remedy in the complaints mechanisms, provide a framework for direct community advocacy, and that knowledge of the laws exists in public discourse. However, the complaints mechanism imposes a significant enforcement burden on targeted communities, who still regularly experience hate speech. We find a reduction in the expression of prejudice in mediated outlets, but not on the street. We find no evidence of a chilling effect and we find the risk of free speech martyrs to be marginal. We draw out the implications of these findings for other countries.

In this study, the effects of fibre type, hair style, time and fibre persistence on the secondary transfer of mask fibres to pillowcases via head hair were studied. Volunteers with a range of hair styles, and masks consisting of different fibre compositions were used in the study. Fibres from the masks were found to transfer from donor subjects to the pillowcases up to 14 nights after the mask had been worn. On average, the number of secondarily transferred fibres found decreased with time; however, this decrease appeared to be more 'linear' in nature, rather than an exponential decay. The greatest degree of secondary transfer occurred with cotton, then acrylic, then wool. In a primary transfer/persistence experiment with a 50% acrylic/50% wool mask, wool was found to persist in the hair more readily than acrylic. The results also showed that the greatest degree of secondary transfer occurred via short straight and long straight hair, with no clear pattern emerging between medium length hair (both straight and curly) and with long curly hair. The implications of these findings for the assessment and interpretation of casework are considered along with data obtained from related studies.


Criteria-Based Content Analysis (CBCA) is a tool to assess the veracity of written statements, and is used as evidence in criminal courts in several countries in the world. CBCA scores are expected to be higher for truth tellers than for liars. The underlying assumptions of CBCA are that (i) lying is cognitively more difficult than truth telling, and (ii) liars are more concerned with the impression they make on others than truth tellers. However, these assumptions have not been tested to date. In the present experiment 80 participants (undergraduate students) lied or told the truth about an event. Afterwards, they completed a questionnaire measuring “cognitive load” and “tendency to control speech”. The interviews were transcribed and coded by trained CBCA raters. In agreement with CBCA assumptions, (i) truth tellers obtained higher scores than liars, (ii) liars experienced more cognitive load than truth tellers, and (iii) liars tried harder to control their speech. However, cognitive load and speech control were not correlated with CBCA scores in the predicted way.

A conventional agarose gel electrophoretic method was described for typing phosphoglucomutase-1, esterase D, or glyoxalase I as single systems. Bloodstain extracts were absorbed into 1-mm-thick agarose gels via an application mask. The electrode wick distance was 12 cm and electrophoresis was carried out at 400 V at 6 degrees C. The electrophoretic run times were 30 min for glyoxalase and 1 h for esterase D or phosphoglucomutase. This method is reliable and produces highly resolved band patterns. Additionally, the shorter separation times as a result of the increased voltage gradient permitted typing of more samples in a given time period compared with presently used methods. This technique requires little technical expertise and can be incorporated into the laboratory at a minimal cost.

Determining appropriate analytical thresholds (ATs) for forensic DNA analysis is critical to maximize allele detection. In this study, six methods to determine ATs for forensic DNA purposes were examined and compared. Four of the methods rely on analysis of the baseline noise of a number of negatives, while two utilize the relationship between relative fluorescence unit signal and DNA input in the polymerase chain reaction (PCR) derived from a dilution series ranging from 1 to 0.06 ng. Results showed that when a substantial mass of DNA (i.e., >1 ng) was amplified, the baseline noise increased, suggesting the application of an AT derived from negatives should only be applied to samples with low levels of DNA. Further, the number and intensity of these noise peaks increased with increasing injection times, indicating that to maximize the ability to detect alleles, ATs should be validated for each post‐PCR procedure employed.
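The abstract above does not name its six methods, so the following is only a hedged sketch of one widely used way to derive a threshold from negatives: the mean of the baseline noise plus k sample standard deviations. The formula choice, the value k = 3, the function names, and the RFU values are all assumptions for illustration and are not taken from the study.

```python
import statistics

def analytical_threshold(baseline_rfu, k=3.0):
    """AT = mean + k * sample standard deviation of baseline noise heights (RFU).
    Assumption: this mean-plus-k-SD form is one common choice, not necessarily
    any of the six methods compared in the abstract."""
    return statistics.mean(baseline_rfu) + k * statistics.stdev(baseline_rfu)

def peaks_above(peaks_rfu, threshold):
    """Keep only peaks that exceed the analytical threshold."""
    return [p for p in peaks_rfu if p > threshold]

# Hypothetical noise peak heights from negative controls (illustrative only).
negatives = [4, 6, 5, 7, 3, 5, 6, 4, 5, 5]
at = analytical_threshold(negatives, k=3.0)
print(round(at, 2))
print(peaks_above([8, 9, 50], at))
```

Since the abstract reports that baseline noise grows with amplified DNA mass and with injection time, a threshold computed this way from negatives would need to be recomputed for each post-PCR procedure rather than reused.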

This paper is the second of a series; the first has been published (J Forensic Sci, 1998;43:1153-62). The goal in the initial pair of experiments was to determine if speakers (actors) could effectively mimic the speech of intoxicated individuals and also volitionally reduce the degradation to their speech that resulted from severe inebriation. To this end, two highly controlled experiments involving 12 actor-speakers were carried out. It was found that, even when sober, nearly all of them were judged drunker (when pretending) than when they actually were severely intoxicated. In the second experiment, they tried to sound sober when highly intoxicated; here most were judged less inebriated than they were. The goal of this second paper is to identify some of the speech characteristics that allowed the subjects to achieve the cited illusions. The focus here is on four paralinguistic factors: fundamental frequency (F0), speaking rate, vocal intensity, and nonfluency level. For the simulation of intoxication study, it was found that F0 was raised along with increased intoxication but raised even more when this state was feigned. A slowing of speaking rate was associated with increasing intoxication, but this shift also was greater when the speaker simulated intoxication. The most striking contrast was found for the nonfluencies; they were doubled for actual intoxication, but quadrupled when intoxication was simulated. On the other hand, the shifts exhibited by the subjects when they attempted to sound sober were not as clear-cut. Indeed, no systematic relationships were found here for either F0 or vocal intensity. Both speaking rate and the number of nonfluencies shifted appropriately, but these changes were not statistically significant. In sum, discernible suprasegmental relationships occurred for both studies (but especially the first); further, it is predicted that useful cues also will be found embedded in the segmentals (the sounds of speech).

The federal courts’ approach to regulating K-12 public school teacher speech in the classroom has been split during the past twenty years. Some circuit courts use Pickering v. Board of Education, in which speech is examined to see if it touches on a matter of public concern. Others prefer Hazelwood v. Kuhlmeier, which focuses on whether speech is school-sponsored and whether the school had a legitimate reason for restricting it. In 2006, Garcetti v. Ceballos offered a new perspective on public employee speech. In that case, speech was examined to determine whether it was related to an employee's professional duties. An examination of federal court treatment of in-class teacher speech before and after Garcetti shows the case has further complicated the issue because it is being embraced by some federal courts as an appropriate precedent when dealing with classroom speech.
