GB2516208B - Noise reduction in voice communications - Google Patents

Noise reduction in voice communications

Info

Publication number
GB2516208B
GB2516208B
Authority
GB
United Kingdom
Prior art keywords
voice
phonemes
acoustic signal
words
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
GB1219175.5A
Other versions
GB201219175D0 (en)
GB2516208A (en)
Inventor
Knight Phil
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AZENBY Ltd
Original Assignee
AZENBY Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AZENBY Ltd filed Critical AZENBY Ltd
Priority to GB1219175.5A priority Critical patent/GB2516208B/en
Publication of GB201219175D0 publication Critical patent/GB201219175D0/en
Publication of GB2516208A publication Critical patent/GB2516208A/en
Application granted granted Critical
Publication of GB2516208B publication Critical patent/GB2516208B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/0272 Voice signal separating

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Telephone Function (AREA)

Description

Noise Reduction in Voice Communications
Technical Field of the Invention
The present invention relates to noise reduction in voice communications, in particular to noise reduction in reproduction of voices captured as part of a voice communication.
Background to the Invention
In voice communications, an acoustic signal captured at a first device is transmitted to a second device and reproduced. Typically, the second device is also operable to capture an acoustic signal and transmit this to the first device for reproduction. For convenience, the acoustic signal is usually converted to or encoded in another form for transmission. The captured acoustic signals generally comprise a speaker’s voice and background noise. Furthermore, in transmission of the signal there may be significant channel noise introduced to the signal. If the overall noise level is low, this will not be a significant issue. If the overall noise level is high, whether resulting from background noise, channel noise or both, this can have a significant impact on the intelligibility and/or recognisability of the captured voice reproduced at the other device.
This problem can be addressed by amplifying or filtering the captured voice signals either on capture or on reproduction or by applying similar techniques to the converted form of the signal for transmission. Such simple techniques typically only provide very limited success. It is also possible to address this problem using noise cancellation technology. This requires the provision of noise cancellation microphones to capture the background noise independently of the voice allowing this background noise to subsequently be cancelled from the captured signal either by emitting an opposing acoustic signal or by deleting said noise from the captured signal including the voice. This technique relies upon the provision of additional hardware and on there being suitable places to mount said additional hardware. Furthermore, whilst this system may have impact on reducing background noise, it will not have an impact on channel noise.
It is therefore an object of the present invention to provide a method and system for at least partially overcoming or alleviating the above problems.
Summary of the Invention
According to a first aspect of the present invention there is provided a method of noise reduction in voice communications, the method comprising the steps of: comparing an initial acoustic signal including a voice to a stored model of the voice; identifying elements of the initial acoustic signal corresponding to words or phonemes uttered by the voice; parsing the identified elements into an ordered data stream of said words or phonemes; retrieving data from the stored model of the voice corresponding to the words or phonemes of the ordered data stream; and utilising the retrieved data to generate a secondary acoustic signal corresponding to the parsed words or phonemes.
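By way of illustration only, the claimed sequence of steps may be sketched in code. The patent does not prescribe any particular implementation; the frame-matching distance measure, the dictionary-based voice model and all names below are illustrative assumptions.

```python
import numpy as np

def reduce_noise(signal, voice_model, sample_rate=8000, frame_ms=20):
    """Sketch of the claimed pipeline: compare frames of the noisy
    signal against a stored voice model, parse the matches into an
    ordered stream of phonemes, then resynthesise a clean signal
    from the model's stored exemplars."""
    frame_len = sample_rate * frame_ms // 1000
    phoneme_stream = []
    # Steps 1-3: compare each frame to the model and identify the
    # closest phoneme (here by least squared error, an assumption).
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        best = min(voice_model,
                   key=lambda p: np.sum((voice_model[p] - frame) ** 2))
        phoneme_stream.append(best)
    # Steps 4-5: retrieve the clean exemplars for the ordered stream
    # and concatenate them into the secondary acoustic signal.
    secondary = np.concatenate([voice_model[p] for p in phoneme_stream])
    return phoneme_stream, secondary
```

Because the secondary signal is built entirely from stored exemplars, additive noise in the initial signal affects only the matching step, not the reconstructed waveform.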
Identifying the voiced words or phonemes in this manner allows the subsequent reconstruction of a secondary acoustic signal corresponding to the voiced words or phonemes without or with reduced noise. Since the method concentrates on identifying elements within the voice of interest, it can perform more effectively than simple filtering or amplification techniques applied to the initial acoustic signal as a whole. This method can further be applied without the provision of additional microphones to cancel background noise.
The above method may be applied in systems wherein the initial signal is captured by a first voice communication device and the secondary acoustic signal is reproduced by a second voice communication device. In such instances, the method may be applied by the first device or second device as desired or as appropriate. The transmission may take place using any suitable communication networks including but not limited to: public telephone systems, either cellular or fixed line as desired or required, internet connections, Wi-Fi (Registered Trade Mark) networks or other data networks. For transmission the initial or secondary acoustic signal may be converted or encoded in any suitable manner according to the standards of the communication network.
The method may include the step of capturing the initial acoustic signal using a suitable microphone or a device comprising a suitable microphone. The method may include the step of outputting the secondary signal using a suitable loudspeaker or a device comprising a suitable loudspeaker.
The or each voice communication device may be a fixed line or cellular telephone; desktop, laptop or tablet computer; audio or audiovisual recording device or the like.
The method may include the step of identifying the voice. The identification can be achieved by direct consideration of the initial acoustic signal. This consideration may involve comparing the captured acoustic signal to one or more stored voice models. Preferably, where possible, a specific speech model is stored for each speaker. Using individual models for each speaker in this way can significantly increase the effectiveness of the method. Additionally or alternatively, the identification may be made by identifying the voice communication device or a physical or network location of the voice communication device used to capture the acoustic signal. For example, a telephone handset may be identified by a phone number, SIM or handset IMEI.
The method may be applied on all possible occasions. Alternatively, the method may only be applied in response to a user request, or when the noise exceeds a particular threshold. In the last case, the method may include the step of measuring the background noise and/or channel noise and comparing it to a predetermined threshold.
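A minimal sketch of such a threshold decision follows. The quietest-frames noise-floor estimate and the -30 dB threshold are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def noise_exceeds_threshold(signal, threshold_db=-30.0, frame_len=160):
    """Estimate background noise from the quietest frames of the
    signal and compare it to a predetermined threshold (dB relative
    to full scale, assuming the signal is normalised to [-1, 1])."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    energies = sorted(float(np.mean(f ** 2)) for f in frames)
    # Take the quietest 10% of frames as the noise-floor estimate.
    floor = np.mean(energies[:max(1, len(energies) // 10)])
    noise_db = 10.0 * np.log10(floor + 1e-12)
    return noise_db > threshold_db
```

The method would then be applied only when this gate returns true, leaving quiet signals untouched.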
Identifying and parsing the words or phonemes in the initial acoustic signal can be achieved directly by comparing the acoustic signal to the stored model. Additionally or alternatively, the identification and parsing may include a probabilistic prediction based on the syntax of other identified words or phonemes. Using a probabilistic approach can also allow for the identification of phonemes previously missing from a particular voice model.
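One way such a probabilistic prediction could be realised is a bigram model over phoneme sequences, combining the acoustic evidence with transition probabilities conditioned on the preceding phoneme. The scoring rule and transition table below are toy assumptions for illustration only.

```python
def predict_phoneme(acoustic_scores, previous, transitions):
    """Combine per-phoneme acoustic likelihoods with bigram
    transition probabilities conditioned on the previously
    identified phoneme, so that a phoneme obscured by noise can
    still be predicted from the syntax of its neighbours."""
    best, best_score = None, float('-inf')
    for phoneme, acoustic in acoustic_scores.items():
        # Unseen transitions get a small floor probability.
        syntax = transitions.get((previous, phoneme), 1e-6)
        score = acoustic * syntax  # joint (unnormalised) probability
        if score > best_score:
            best, best_score = phoneme, score
    return best
```

With such a rule, a phoneme that noise makes acoustically ambiguous can still be resolved in favour of the syntactically likely candidate.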
The stored model may comprise samples of the voice uttering words or phonemes. Additionally or alternatively, the stored model may comprise data indicating how characteristics of the voice differ from reference samples of the same words or phonemes. The voice characteristics may include accent, cadence, tone, excitation, inflexion, spectral characteristics, sound/pause duration or the like.
The method may include the step of updating a stored model on an ongoing basis and/or the step of building up and storing a model of any unidentified voices. This may be achieved by capturing samples of the voice, analysing the samples to identify corresponding words or phonemes and storing said samples or data indicating how the voice characteristics differ from reference samples of the same words or phonemes.
According to a second aspect of the present invention there is provided a noise reduction system for use in voice communication comprising: a library of stored voice models; a speech detection engine operable to identify elements of an initial acoustic signal corresponding to words or phonemes uttered by a voice and parse the identified elements into an ordered data stream of said words or phonemes; a speech reconstruction engine operable to retrieve data from the library of stored voice models corresponding to the words or phonemes of the ordered data stream and to utilise the retrieved data to generate a secondary acoustic signal corresponding to the parsed words or phonemes.
The noise reduction system of the second aspect of the present invention may incorporate any or all features of the first aspect of the present invention, as desired or as appropriate.
According to a third aspect of the present invention there is provided a voice communications device incorporating a noise reduction system according to the second aspect of the present invention.
The voice communications device may be a fixed line or cellular telephone, desktop, laptop or tablet computer, audio or audiovisual recording device or the like.
Detailed Description of the Invention
In order that the invention may be more clearly understood an embodiment/embodiments thereof will now be described, by way of example only, with reference to the accompanying drawings, of which:
Figure 1 is a schematic illustration of a voice communication situation in which the present invention might be implemented;
Figure 2 is a flow diagram illustrating the steps involved in creating or updating a stored voice model in the present invention;
Figure 3 is a flow diagram illustrating the steps involved in processing an initial acoustic signal to reduce background noise in the present invention; and
Figure 4 is a schematic block diagram of a mobile telephone handset adapted to implement the present invention.
In a conventional voice communication system, a first voice communication device A (such as a telephone handset) captures an acoustic signal including a speaker’s voice. This captured acoustic signal is then transmitted, in suitably encoded form, via a communication network N to a second voice communication device B. The captured signal is subsequently reproduced by device B for the benefit of a listener. Should the listener at device B wish to reply, device B is also operable to capture an acoustic signal and transmit the suitable encoded signal to device A for reproduction. On occasions where the voice is captured alongside significant amounts of background noise, this background noise forms part of the acoustic signal reproduced for the listener. Additionally or alternatively, there can be significant channel noise encountered upon transmission of a signal. These noise contributions can significantly reduce the intelligibility and/or recognisability of the voice communication.
In the present invention, the acoustic signal captured by device A is subjected to noise reduction processing before reproduction by device B. This processing can take place either at device A before transmission or at device B after receipt but before reproduction. The processing involves an initial step of analysing the captured acoustic signal with respect to a stored model of the speaker’s voice. By way of this analysis, elements of the captured acoustic signal corresponding to words or phonemes uttered by the speaker can be identified and parsed into an ordered data stream of said words or phonemes. Subsequently, data from the stored model of the voice corresponding to the words or phonemes of the ordered data stream can be retrieved and used to generate a new acoustic signal corresponding to the parsed words or phonemes. This new acoustic signal can then be reproduced for the listener. By identifying the voiced words or phonemes in this manner, the subsequent reconstruction of new acoustic signal corresponding to the voiced words or phonemes can substantially exclude noise. This increases the intelligibility of the voice communications considerably on occasions where the voice is captured alongside significant amounts of background noise or is subject to significant amounts of channel noise.
In order for the method to operate, it is necessary to have a viable model of a speaker’s voice. Such a model may be created by processing samples of the speaker’s voice. These samples may be acquired by the speaker submitting a predetermined range of voice samples. More typically, these samples can be acquired by capturing and analysing voice samples on occasions where noise reduction is not required. Ideally, ongoing sampling allows each speaker’s voice model to be continuously adapted. This can significantly improve models over time, as the noise contribution in the averaged samples will tend to zero as more samples are collected.
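The averaging argument above can be sketched directly: with zero-mean noise, a running mean of repeated captures of the same phoneme converges toward the clean exemplar. The incremental-mean update below is an illustrative assumption about how the model store might work.

```python
import numpy as np

def update_model(model, phoneme, sample):
    """Fold a newly captured sample into the stored exemplar as a
    running average; zero-mean noise cancels as the count grows."""
    mean, count = model.get(phoneme, (np.zeros_like(sample), 0))
    count += 1
    mean = mean + (sample - mean) / count  # incremental mean update
    model[phoneme] = (mean, count)
    return model
```

After n samples the residual noise in the stored exemplar shrinks roughly as 1/sqrt(n), which is the sense in which the noise contribution "tends to zero".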
Turning now to figure 2, this analysis is illustrated schematically. Following detection of a voice at S1, a determination is made at S2 as to whether the speaker has an existing voice model. This may be achieved by comparing the initially detected voice against existing speech models. Alternatively, the speaker may be identified by another method (for instance by their phone number or by direct input of an identity). If the speaker does have an existing model, then at S3 the voice sample is analysed to determine its likely content and the characteristic parameters of the voice. These parameters may include accent, cadence, tone, excitation, inflexion, spectral characteristics, sound/pause duration or the like. At S4, the voice model is then updated with any additional or revised parameters.
If the speaker does not have an existing model, a choice may be made at step S5 whether to create a new model or not. If a new model is to be created, this model is assigned to the speaker identity at S6. The voice sample can then be analysed and updated as set out above in steps S3 & S4.
Turning now to figure 3, there is presented a flow chart illustrating the steps involved in a preferred implementation of noise reduction processing according to the present invention. The steps are performed on a captured or received acoustic signal. Initially, at step S11, it is determined whether a voice is detected. If a voice is detected, at S12, an attempt is made to identify the voice. This attempt may involve comparing the voice against existing voice models and/or analysing the source of the acoustic signal. For instance, an acoustic signal received from a particular phone could be directly identified as containing a voice corresponding to the user of the phone. If a voice model exists, at step S13, an assessment is made as to whether the model contains sufficient data to make use of the present method viable.
In the event that use of the voice model is viable, the model parameters are retrieved from the library of voice models at S14. The acoustic signal is then analysed probabilistically based on word/phoneme recognition, syntax considerations and the specific parameters of the voice model. At S15, this analysis is processed into an ordered data stream corresponding to the predicted words or phonemes uttered by the voice. Subsequently, at S16, the voice model can be used to generate a new acoustic signal corresponding to the successive words or phonemes of the data stream. By applying the voice model, the new acoustic signal will correspond substantially to the voice elements within the original signal, excluding noise. If desired, for a more natural sound, the new acoustic signal may be mixed with a low level of background noise.
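The comfort-noise mix mentioned above is a simple weighted sum. The -30 dB level and the use of white noise below are illustrative assumptions; any low-level noise source would serve.

```python
import numpy as np

def add_comfort_noise(clean, noise_db=-30.0, seed=None):
    """Mix a reconstructed signal with low-level noise so the output
    does not sound unnaturally sterile. noise_db sets the noise
    amplitude relative to full scale (signal assumed normalised
    to [-1, 1])."""
    rng = np.random.default_rng(seed)
    amplitude = 10.0 ** (noise_db / 20.0)
    return clean + amplitude * rng.standard_normal(len(clean))
```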
To facilitate processing, one implementation of the invention may involve delaying the signal for a processing interval. In view of the low latency of contemporary networks, a delay of a few milliseconds may prove adequate for processing whilst having minimal impact on a user.
Turning now to figure 4, an exemplary device incorporating a system for implementing the method of the present invention is shown. The device in this example is a cellular telephone handset 10, albeit that the skilled man will appreciate that this method may be applied to or implemented by any other device usable for voice communication including but not limited to fixed line telephones, desktop, laptop or tablet computers and the like.
The handset 10 incorporates a communication unit 11 adapted to enable data, in particular encoded acoustic signals to be transmitted and received via a cellular telephone network. The handset is also provided with a microphone 12 for capturing an acoustic signal including the voice of a phone user and a loudspeaker 13 for reproducing an acoustic signal received via the communication unit 11.
Within the phone 10 is provided a noise reduction system 100 according to the present invention for implementing the above discussed method. The system 100 comprises a data storage means 110, a speech detection engine 120 and a speech reconstruction engine 130.
The data storage means 110 contains a library of stored voice models. The speech detection engine 120 is operable to retrieve data from the library and use this in the analysis of an acoustic signal. The acoustic signal may be a signal captured by the microphone 12 or may be an acoustic signal received via the communication unit 11. The analysis can allow the speech detection engine to identify elements of the acoustic signal as corresponding to words or phonemes uttered by the modelled voice and to parse the identified elements into an ordered data stream of said words or phonemes. The ordered data stream can then be passed to the speech reconstruction engine 130. Subsequently, the speech reconstruction engine 130 is operable to retrieve data from the library corresponding to the words or phonemes of the ordered data stream and to utilise the retrieved data to generate a new acoustic signal corresponding to the parsed words or phonemes. This new acoustic signal may be output by the loudspeaker 13 or may be passed to the communication unit for transmission to another device via the cellular telephone network.
In further implementations of the invention, it is possible for the ordered data stream, or the acoustic signal recreated from the ordered data stream, to be fed to additional voice processing units. The data stream or reconstructed audio signal can provide a high quality input for such systems to undertake further processing before generating an output audio signal. In a particular example, the additional voice processing unit may include a translation engine. In such an example, the captured acoustic signal may be translated into a separate language for regeneration as text or as an acoustic signal in a different language.
It is of course to be understood that the invention is not to be restricted to the details of the above embodiment/embodiments, which are described by way of example only.

Claims (24)

1. A method of noise reduction in voice communications, the method comprising the steps of comparing an initial acoustic signal including a voice to a stored model of the voice; identifying elements of the initial acoustic signal corresponding to words or phonemes uttered by the voice; parsing the identified elements into an ordered data stream of said words or phonemes; retrieving data from the stored model of the voice corresponding to the words or phonemes of the ordered data stream; and utilising the retrieved data to generate a secondary acoustic signal corresponding to the parsed words or phonemes.
2. A method as claimed in claim 1 wherein the method is applied in systems wherein the initial signal is captured by a first voice communication device and the secondary acoustic signal is reproduced by a second voice communication device.
3. A method as claimed in claim 2 wherein the method of claim 1 is applied by the first device.
4. A method as claimed in claim 2 wherein the method of claim 1 is applied by the second device.
5. A method as claimed in any one of claims 2 to 4 wherein for transmission the initial or secondary acoustic signal is converted or encoded according to the standards of the communication network.
6. A method as claimed in any preceding claim wherein the method includes the step of capturing the initial acoustic signal using a suitable microphone or a device comprising a suitable microphone.
7. A method as claimed in any preceding claim wherein the method includes the step of outputting the secondary signal using a suitable loudspeaker or a device comprising a suitable loudspeaker.
8. A method as claimed in any one of claims 2 to 7 wherein the or each voice communication device is a fixed line or cellular telephone; desktop, laptop or tablet computer; audio or audiovisual recording device.
9. A method as claimed in any preceding claim wherein the method includes the step of identifying the voice.
10. A method as claimed in claim 9 wherein identification is achieved by direct consideration of the initial acoustic signal.
11. A method as claimed in claim 9 or claim 10 when dependent directly or indirectly on claim 2, wherein identification is achieved by identifying the voice communication device or a physical or network location of the voice communication device used to capture the acoustic signal.
12. A method as claimed in any preceding claim wherein the method is applied in response to a user request.
13. A method as claimed in any preceding claim wherein the method is applied when background noise exceeds a particular threshold.
14. A method as claimed in claim 13 wherein the method includes the step of measuring the background noise and comparing it to a predetermined threshold.
15. A method as claimed in any preceding claim wherein identifying and parsing the words or phonemes in the initial acoustic signal is achieved directly by comparing the acoustic signal to the stored model.
16. A method as claimed in any preceding claim wherein identification and parsing includes a probabilistic prediction based on the syntax of other identified words or phonemes.
17. A method as claimed in any preceding claim wherein the stored model comprises samples of the voice uttering words or phonemes.
18. A method as claimed in any preceding claim wherein the stored model comprises data indicating how characteristics of the voice differ from reference samples of the same words or phonemes.
19. A method as claimed in claim 18 wherein the voice characteristics include accent, cadence, tone, excitation, inflexion, spectral characteristics, or sound/pause duration.
20. A method as claimed in any preceding claim wherein the method includes the step of updating a stored model on an ongoing basis and/or the step of building up and storing a model of any unidentified voices.
21. A method as claimed in claim 20 wherein this is achieved by capturing samples of the voice, analysing the samples to identify corresponding words or phonemes and storing said samples or data indicating how the voice characteristics differ from reference samples of the same words or phonemes.
22. A noise reduction system for use in voice communication comprising: a library of stored voice models; a speech detection engine operable to identify elements of an initial acoustic signal corresponding to words or phonemes uttered by a voice and parse the identified elements into an ordered data stream of said words or phonemes; a speech reconstruction engine operable to retrieve data from the library of stored voice models corresponding to the words or phonemes of the ordered data stream and to utilise the retrieved data to generate a secondary acoustic signal corresponding to the parsed words or phonemes.
23. A noise reduction system operable to implement the method of any one of claims 1 to 21.
24. A voice communications device incorporating a noise reduction system as claimed in claim 22 or claim 23.
GB1219175.5A 2012-10-25 2012-10-25 Noise reduction in voice communications Active GB2516208B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1219175.5A GB2516208B (en) 2012-10-25 2012-10-25 Noise reduction in voice communications

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1219175.5A GB2516208B (en) 2012-10-25 2012-10-25 Noise reduction in voice communications

Publications (3)

Publication Number Publication Date
GB201219175D0 GB201219175D0 (en) 2012-12-12
GB2516208A GB2516208A (en) 2015-01-21
GB2516208B true GB2516208B (en) 2019-08-28

Family

ID=47358616

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1219175.5A Active GB2516208B (en) 2012-10-25 2012-10-25 Noise reduction in voice communications

Country Status (1)

Country Link
GB (1) GB2516208B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157959B (en) * 2015-03-31 2019-10-18 讯飞智元信息科技有限公司 Sound-groove model update method and system
CN107481732B (en) * 2017-08-31 2020-10-02 广东小天才科技有限公司 Noise reduction method and device in spoken language evaluation and terminal equipment
CN113409809B (en) * 2021-07-07 2023-04-07 上海新氦类脑智能科技有限公司 Voice noise reduction method, device and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2278708A (en) * 1992-11-04 1994-12-07 Secr Defence Children's speech training aid
US20020087307A1 (en) * 2000-12-29 2002-07-04 Lee Victor Wai Leung Computer-implemented progressive noise scanning method and system
US7133827B1 (en) * 2002-02-06 2006-11-07 Voice Signal Technologies, Inc. Training speech recognition word models from word samples synthesized by Monte Carlo techniques


Also Published As

Publication number Publication date
GB201219175D0 (en) 2012-12-12
GB2516208A (en) 2015-01-21

Similar Documents

Publication Publication Date Title
JP6790029B2 (en) A device for managing voice profiles and generating speech signals
US10209951B2 (en) Language-based muting during multiuser communications
US7995732B2 (en) Managing audio in a multi-source audio environment
JP6113302B2 (en) Audio data transmission method and apparatus
TWI527024B (en) Method of transmitting voice data and non-transitory computer readable medium
CN107995360B (en) Call processing method and related product
JP5232151B2 (en) Packet-based echo cancellation and suppression
US20200012724A1 (en) Bidirectional speech translation system, bidirectional speech translation method and program
US9936068B2 (en) Computer-based streaming voice data contact information extraction
US9728202B2 (en) Method and apparatus for voice modification during a call
US9299358B2 (en) Method and apparatus for voice modification during a call
CN111919249A (en) Continuous detection of words and related user experience
US11328721B2 (en) Wake suppression for audio playing and listening devices
US9832299B2 (en) Background noise reduction in voice communication
US20130246061A1 (en) Automatic realtime speech impairment correction
US10540983B2 (en) Detecting and reducing feedback
US10204634B2 (en) Distributed suppression or enhancement of audio features
CN110875036A (en) Voice classification method, device, equipment and computer readable storage medium
GB2516208B (en) Noise reduction in voice communications
CN104078049B (en) Signal processing apparatus and signal processing method
Shang et al. Audio recordings dataset of genuine and replayed speech at both ends of a telecommunication channel
JP2014235263A (en) Speech recognition device and program
KR20070072793A (en) Noise suppressor for audio signal recording and method apparatus
JP2016025471A (en) Echo suppression device, echo suppression program, echo suppression method and communication terminal
CN113593568A (en) Method, system, apparatus, device and storage medium for converting speech into text