CN112017676A - Audio processing method, apparatus, and computer-readable storage medium - Google Patents
Audio processing method, apparatus, and computer-readable storage medium
- Publication number
- CN112017676A (application CN201910467088.0A)
- Authority
- CN
- China
- Prior art keywords
- audio
- probability
- frame
- processed
- effective
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000003672 processing method Methods 0.000 title claims abstract description 20
- 238000012545 processing Methods 0.000 claims abstract description 24
- 238000010801 machine learning Methods 0.000 claims abstract description 17
- 230000000875 corresponding effect Effects 0.000 claims description 27
- 230000002596 correlated effect Effects 0.000 claims description 8
- 238000013527 convolutional neural network Methods 0.000 claims description 6
- 238000013528 artificial neural network Methods 0.000 claims description 4
- 238000004590 computer program Methods 0.000 claims description 4
- 125000004122 cyclic group Chemical group 0.000 claims description 3
- 238000000034 method Methods 0.000 abstract description 14
- 238000010586 diagram Methods 0.000 description 12
- 239000010410 layer Substances 0.000 description 11
- 238000012549 training Methods 0.000 description 7
- 238000005516 engineering process Methods 0.000 description 4
- 230000006870 function Effects 0.000 description 4
- 230000003993 interaction Effects 0.000 description 3
- 230000015572 biosynthetic process Effects 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 238000003058 natural language processing Methods 0.000 description 2
- 238000010606 normalization Methods 0.000 description 2
- 239000002356 single layer Substances 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
- 206010011224 Cough Diseases 0.000 description 1
- 230000002411 adverse Effects 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000006855 networking Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 230000000306 recurrent effect Effects 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000005236 sound signal Effects 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02087—Noise filtering the noise being separate speech, e.g. cocktail party
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Mathematical Optimization (AREA)
- Algebra (AREA)
- Probability & Statistics with Applications (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Mathematics (AREA)
- Molecular Biology (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (9)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910467088.0A CN112017676A (zh) | 2019-05-31 | 2019-05-31 | Audio processing method, apparatus, and computer-readable storage medium |
JP2021569116A JP2022534003A (ja) | 2019-05-31 | 2020-05-18 | Audio processing method, audio processing apparatus, and human-computer interaction system |
US17/611,741 US20220238104A1 (en) | 2019-05-31 | 2020-05-18 | Audio processing method and apparatus, and human-computer interactive system |
PCT/CN2020/090853 WO2020238681A1 (zh) | 2019-05-31 | 2020-05-18 | Audio processing method and apparatus, and human-computer interaction system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910467088.0A CN112017676A (zh) | 2019-05-31 | 2019-05-31 | Audio processing method, apparatus, and computer-readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112017676A (zh) | 2020-12-01 |
Family
ID=73501009
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910467088.0A Pending CN112017676A (zh) | Audio processing method, apparatus, and computer-readable storage medium | 2019-05-31 | 2019-05-31 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20220238104A1 (zh) |
JP (1) | JP2022534003A (zh) |
CN (1) | CN112017676A (zh) |
WO (1) | WO2020238681A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115394288A (zh) * | 2022-10-28 | 2022-11-25 | 成都爱维译科技有限公司 | Language identification method and system for multilingual civil aviation radio air-ground communications |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113593603A (zh) * | 2021-07-27 | 2021-11-02 | 浙江大华技术股份有限公司 | Method and apparatus for determining audio category, storage medium, and electronic device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012158156A1 (en) * | 2011-05-16 | 2012-11-22 | Google Inc. | Noise supression method and apparatus using multiple feature modeling for speech/noise likelihood |
KR101240588B1 (ko) * | 2012-12-14 | 2013-03-11 | 주식회사 좋은정보기술 | Audio-visual fusion speech recognition method and apparatus |
WO2013132926A1 (ja) * | 2012-03-06 | 2013-09-12 | 日本電信電話株式会社 | Noise estimation device, noise estimation method, noise estimation program, and recording medium |
CN104157290A (zh) * | 2014-08-19 | 2014-11-19 | 大连理工大学 | A speaker recognition method based on deep learning |
CN107077842A (zh) * | 2014-12-15 | 2017-08-18 | 百度(美国)有限责任公司 | System and method for speech transcription |
US20180068653A1 (en) * | 2016-09-08 | 2018-03-08 | Intel IP Corporation | Method and system of automatic speech recognition using posterior confidence scores |
CN108389575A (zh) * | 2018-01-11 | 2018-08-10 | 苏州思必驰信息科技有限公司 | Audio data recognition method and system |
CN108877775A (zh) * | 2018-06-04 | 2018-11-23 | 平安科技(深圳)有限公司 | Speech data processing method and apparatus, computer device, and storage medium |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100631608B1 (ko) * | 2004-11-25 | 2006-10-09 | 엘지전자 주식회사 | Speech discrimination method |
KR100745976B1 (ko) * | 2005-01-12 | 2007-08-06 | 삼성전자주식회사 | Method and apparatus for distinguishing speech from non-speech using an acoustic model |
JP4512848B2 (ja) * | 2005-01-18 | 2010-07-28 | 株式会社国際電気通信基礎技術研究所 | Noise suppression device and speech recognition system |
US10319374B2 (en) * | 2015-11-25 | 2019-06-11 | Baidu USA, LLC | Deployed end-to-end speech recognition |
WO2017112813A1 (en) * | 2015-12-22 | 2017-06-29 | Sri International | Multi-lingual virtual personal assistant |
CN106971741B (zh) * | 2016-01-14 | 2020-12-01 | 芋头科技(杭州)有限公司 | Speech noise reduction method and system for separating speech in real time |
IL263655B2 (en) * | 2016-06-14 | 2023-03-01 | Netzer Omry | Automatic speech recognition |
GB201617016D0 (en) * | 2016-09-09 | 2016-11-23 | Continental automotive systems inc | Robust noise estimation for speech enhancement in variable noise conditions |
US10490183B2 (en) * | 2017-11-22 | 2019-11-26 | Amazon Technologies, Inc. | Fully managed and continuously trained automatic speech recognition service |
-
2019
- 2019-05-31 CN CN201910467088.0A patent/CN112017676A/zh active Pending
-
2020
- 2020-05-18 US US17/611,741 patent/US20220238104A1/en active Pending
- 2020-05-18 WO PCT/CN2020/090853 patent/WO2020238681A1/zh active Application Filing
- 2020-05-18 JP JP2021569116A patent/JP2022534003A/ja active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
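CN115394288A (zh) * | 2022-10-28 | 2022-11-25 | 成都爱维译科技有限公司 | Language identification method and system for multilingual civil aviation radio air-ground communications |
CN115394288B (zh) * | 2022-10-28 | 2023-01-24 | 成都爱维译科技有限公司 | Language identification method and system for multilingual civil aviation radio air-ground communications |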
Also Published As
Publication number | Publication date |
---|---|
JP2022534003A (ja) | 2022-07-27 |
US20220238104A1 (en) | 2022-07-28 |
WO2020238681A1 (zh) | 2020-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
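WO2021208287A1 (zh) | Voice endpoint detection method and apparatus for emotion recognition, electronic device, and storage medium | |
CN106683680B (zh) | Speaker recognition method and apparatus, computer device, and computer-readable medium | |
US20200211550A1 (en) | Whispering voice recovery method, apparatus and device, and readable storage medium | |
CN112185352B (zh) | Speech recognition method and apparatus, and electronic device | |
CN111402891B (zh) | Speech recognition method, apparatus, device, and storage medium | |
CN110428820B (zh) | Method and apparatus for mixed Chinese-English speech recognition | |
JP5932869B2 (ja) | Unsupervised learning method, learning apparatus, and learning program for N-gram language models | |
CN109360572B (zh) | Call separation method and apparatus, computer device, and storage medium | |
CN112562691A (zh) | Voiceprint recognition method and apparatus, computer device, and storage medium | |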
WO2014029099A1 (en) | I-vector based clustering training data in speech recognition | |
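CN111833849B (zh) | Method for speech recognition and speech model training, storage medium, and electronic device | |
CN114038457B (zh) | Method for voice wake-up, electronic device, storage medium, and program | |
CN110491375B (zh) | Method and apparatus for target language detection | |
CN112102850A (zh) | Emotion recognition processing method and apparatus, medium, and electronic device | |
CN112151015A (zh) | Keyword detection method and apparatus, electronic device, and storage medium | |
WO2020238681A1 (zh) | Audio processing method and apparatus, and human-computer interaction system | |
CN113628612A (zh) | Speech recognition method and apparatus, electronic device, and computer-readable storage medium | |
CN114550703A (zh) | Training method and apparatus for a speech recognition system, and speech recognition method and apparatus | |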
Ding et al. | Personal VAD 2.0: Optimizing personal voice activity detection for on-device speech recognition | |
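CN115312033A (zh) | Artificial-intelligence-based speech emotion recognition method, apparatus, device, and medium | |
CN111091809A (zh) | Regional accent recognition method and apparatus based on deep feature fusion | |
CN113889091A (zh) | Speech recognition method and apparatus, computer-readable storage medium, and electronic device | |
WO2024093578A1 (zh) | Speech recognition method and apparatus, electronic device, storage medium, and computer program product | |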
Rose et al. | Integration of utterance verification with statistical language modeling and spoken language understanding | |
JP2016162437A (ja) | Pattern classification device, pattern classification method, and pattern classification program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
CB02 | Change of applicant information | | Address after: Room 221, 2/F, Block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176. Applicant after: Jingdong Technology Holding Co.,Ltd. Address before: Room 221, 2/F, Block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176. Applicant before: Jingdong Digital Technology Holding Co.,Ltd. |
CB02 | Change of applicant information | | Address after: Room 221, 2/F, Block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176. Applicant after: Jingdong Digital Technology Holding Co.,Ltd. Address before: Room 221, 2/F, Block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176. Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd. |
SE01 | Entry into force of request for substantive examination | ||