CN106098078A

CN106098078A - Speech recognition method and system capable of filtering out speaker noise

Info

Publication number: CN106098078A
Application number: CN201610413367.5A
Authority: CN
Inventors: 齐东京; 方国宽
Original assignee: Huizhou TCL Mobile Communication Co Ltd
Current assignee: Huizhou TCL Mobile Communication Co Ltd
Priority date: 2016-06-14
Filing date: 2016-06-14
Publication date: 2016-11-09
Anticipated expiration: 2036-06-14
Also published as: CN106098078B

Abstract

The invention provides a speech recognition method capable of filtering out speaker noise, and a system thereof. The method includes: when user speech being recorded through the microphone is detected and the speaker is detected to be playing an audio file stored in the smart terminal, obtaining the synthesized sound of the user speech and the speaker sound; calculating a second frequency and a second amplitude of the user speech from the first frequency and first amplitude of the speaker sound sampled in the smart terminal and from the synthesized-sound frequency and synthesized-sound amplitude of the synthesized sound; filtering the timbre of the speaker sound out of the synthesized sound, and restoring the user speech using its second frequency and second amplitude; and converting the user speech into text according to a speech database. When the user is using speech recognition software while the speaker is playing sound out loud, the processor in the terminal analyzes the composition of the sound and filters out the speaker sound, so that the user speech received in the background contains less environmental noise and speech is recognized efficiently.

Description

Speech recognition method and system capable of filtering out speaker noise

Technical field

The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method capable of filtering out speaker noise and a system thereof.

Background

Speech recognition is gradually becoming a key technology for human-machine interfaces in information technology. Combined with speech synthesis, it allows people to dispense with the keyboard and operate devices by voice command. The rise of the mobile Internet has become the most important application environment for speech recognition; software such as Apple's Siri and the domestic iFlytek software can recognize user speech efficiently. Similar software can now be installed on smart terminals: it converts user speech into text by matching the speech against a background database, and either generates text for display or controls the terminal directly. To recognize speech efficiently, the user needs to avoid environmental noise as far as possible when inputting speech.

However, when the smart terminal is playing music and the user speaks into the microphone, the music from the speaker is picked up as well, causing recognition efficiency to drop sharply.

Therefore, the prior art still needs to be improved and developed.

Summary of the invention

In view of the above deficiencies of the prior art, the object of the present invention is to provide a speech recognition method capable of filtering out speaker noise and a system thereof, aiming to solve the prior-art problem that, when a smart terminal is playing music and the user speaks into the microphone, the music from the speaker is picked up as well and recognition efficiency drops sharply.

To achieve the above object, the present invention adopts the following technical solution:

A speech recognition method capable of filtering out speaker noise, wherein the method comprises the following steps:

A. when user speech being recorded through the microphone is detected and the speaker is detected to be playing an audio file stored in the smart terminal, obtaining the synthesized sound of the user speech and the speaker sound;

B. calculating a second frequency and a second amplitude of the user speech from the first frequency and first amplitude of the speaker sound sampled in the smart terminal and from the synthesized-sound frequency and synthesized-sound amplitude of the synthesized sound;

C. filtering the timbre of the speaker sound out of the synthesized sound, and restoring the user speech using the second frequency and second amplitude of the user speech;

D. converting the user speech into text according to a speech database.

The speech recognition method capable of filtering out speaker noise, wherein step B specifically includes:

B1. given that the synthesized-sound frequency is the least common multiple of the first frequency and the second frequency, calculating the second frequency from the synthesized-sound frequency and the first frequency;

B2. calculating the second amplitude as the difference between the synthesized-sound amplitude and the first amplitude.

The speech recognition method capable of filtering out speaker noise, wherein step C specifically includes:

C1. after the synthesized sound undergoes analog-to-digital conversion in the audio encoder, sending the synthesized-sound code, which carries the synthesized-sound frequency, synthesized-sound amplitude, and synthesized-sound timbre, to the processor;

C2. the processor filtering the timbre of the speaker sound out of the synthesized sound and retaining the timbre of the user speech;

C3. the audio decoder converting the second frequency and second amplitude of the user speech into a partial voice, and restoring the user speech from the partial voice and the timbre of the user speech.

The speech recognition method capable of filtering out speaker noise, wherein step D specifically includes:

D1. uploading the user speech to a speech database in the cloud;

D2. matching the user speech in the speech database to obtain text;

D3. sending the text to the smart terminal and displaying it.

The speech recognition method capable of filtering out speaker noise, wherein step A further includes the processor obtaining, from the audio encoder, the speaker-sound code of each frame of the speaker sound.

A speech recognition system capable of filtering out speaker noise, comprising:

a detection and acquisition module, configured to obtain the synthesized sound of the user speech and the speaker sound when user speech being recorded through the microphone is detected and the speaker is detected to be playing an audio file stored in the smart terminal;

a calculation module, configured to calculate a second frequency and a second amplitude of the user speech from the first frequency and first amplitude of the speaker sound sampled in the smart terminal and from the synthesized-sound frequency and synthesized-sound amplitude of the synthesized sound;

a filtering and restoration module, configured to filter the timbre of the speaker sound out of the synthesized sound and to restore the user speech using the second frequency and second amplitude of the user speech;

a conversion module, configured to convert the user speech into text according to a speech database.

The speech recognition system capable of filtering out speaker noise, wherein the calculation module specifically includes:

a frequency calculation unit, configured to calculate the second frequency from the synthesized-sound frequency and the first frequency, given that the synthesized-sound frequency is the least common multiple of the first frequency and the second frequency;

an amplitude calculation unit, configured to calculate the second amplitude as the difference between the synthesized-sound amplitude and the first amplitude.

The speech recognition system capable of filtering out speaker noise, wherein the filtering and restoration module specifically includes:

a code sending unit, configured to send the synthesized-sound code, which carries the synthesized-sound frequency, synthesized-sound amplitude, and synthesized-sound timbre, to the processor after the synthesized sound undergoes analog-to-digital conversion in the audio encoder;

a filtering unit, in which the processor filters the timbre of the speaker sound out of the synthesized sound and retains the timbre of the user speech;

a restoration unit, in which the audio decoder converts the second frequency and second amplitude of the user speech into a partial voice, and the user speech is restored from the partial voice and the timbre of the user speech.

The speech recognition system capable of filtering out speaker noise, wherein the conversion module specifically includes:

an uploading unit, configured to upload the user speech to a speech database in the cloud;

a matching unit, configured to match the user speech in the speech database to obtain text;

a sending and display unit, configured to send the text to the smart terminal and display it.

The speech recognition system capable of filtering out speaker noise, wherein the detection and acquisition module is further used for the processor to obtain, from the audio encoder, the speaker-sound code of each frame of the speaker sound.

In the speech recognition method capable of filtering out speaker noise and the system thereof according to the present invention, the method includes: when user speech being recorded through the microphone is detected and the speaker is detected to be playing an audio file stored in the smart terminal, obtaining the synthesized sound of the user speech and the speaker sound; calculating a second frequency and a second amplitude of the user speech from the first frequency and first amplitude of the speaker sound sampled in the smart terminal and from the synthesized-sound frequency and synthesized-sound amplitude of the synthesized sound; filtering the timbre of the speaker sound out of the synthesized sound, and restoring the user speech using its second frequency and second amplitude; and converting the user speech into text according to a speech database. When the user is using speech recognition software while the speaker is playing sound out loud, the processor in the terminal analyzes the composition of the sound and filters out the speaker sound, so that the user speech received in the background contains less environmental noise and speech is recognized efficiently.

Brief description of the drawings

Fig. 1 is a flowchart of a preferred embodiment of the speech recognition method capable of filtering out speaker noise according to the present invention.

Fig. 2 is a detailed flowchart of obtaining the second frequency and second amplitude of the user speech in the preferred embodiment of the method.

Fig. 3 is a detailed flowchart of restoring the user speech in the preferred embodiment of the method.

Fig. 4 is a detailed flowchart of converting the user speech into text in the preferred embodiment of the method.

Fig. 5 is a structural block diagram of a preferred embodiment of the speech recognition system capable of filtering out speaker noise according to the present invention.

Detailed description of the invention

The present invention provides a speech recognition method capable of filtering out speaker noise and a system thereof. To make the purpose, technical solution, and effects of the present invention clearer and more definite, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.

Refer to Fig. 1, which is a flowchart of a preferred embodiment of the speech recognition method capable of filtering out speaker noise according to the present invention. As shown in Fig. 1, the method comprises the following steps:

Step S100: when user speech being recorded through the microphone is detected and the speaker is detected to be playing an audio file stored in the smart terminal, obtain the synthesized sound of the user speech and the speaker sound.

In this embodiment, when the user opens the player on the smart terminal, the background speech recognition process can be started at the same time, so that the smart terminal can detect in real time whether the user is recording speech while music is playing. Once it is detected that the player on the smart terminal is playing an audio file and user speech is being recorded, the synthesized sound of the user speech and the speaker sound is obtained. At this point, without any processing, the user speech and the speaker sound cannot be distinguished; this is what the subsequent steps address.
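The trigger condition of step S100 can be sketched as a simple conjunction of two detections. The following Python fragment is only an illustration: the `TerminalState` type and its two flags are hypothetical names introduced here, not identifiers from the patent.

```python
from dataclasses import dataclass


@dataclass
class TerminalState:
    # Hypothetical flag: the microphone is currently recording user speech.
    mic_has_voice: bool
    # Hypothetical flag: the speaker is playing an audio file stored in the terminal.
    speaker_playing: bool


def should_capture_synthesized_sound(state: TerminalState) -> bool:
    """Return True only when both S100 conditions hold at the same time."""
    return state.mic_has_voice and state.speaker_playing
```

If only one of the two conditions holds, nothing needs to be separated: either there is no user speech to recognize, or there is no speaker sound to filter out.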

Step S200: calculate a second frequency and a second amplitude of the user speech from the first frequency and first amplitude of the speaker sound sampled in the smart terminal and from the synthesized-sound frequency and synthesized-sound amplitude of the synthesized sound.

In this embodiment, because the material and structure of the speaker are fixed, the speaker timbre is known to the processor in the smart terminal. Likewise, while the player plays the audio file, the processor obtains from the audio encoder the speaker-sound code of each frame of the speaker sound, so the first frequency and first amplitude of each frame of audio data in the speaker sound can be obtained through the player.

Since the first frequency and first amplitude of the speaker sound are known, as are the synthesized-sound frequency and synthesized-sound amplitude of the synthesized sound, the second frequency can be found from the fact that the synthesized-sound frequency is the least common multiple of the first frequency and the second frequency, and the second amplitude can be found from the fact that the synthesized-sound amplitude is the sum of the first amplitude and the second amplitude. Thus, with a simple calculation by the processor, the second frequency and second amplitude of the user speech are obtained.

Step S300: filter the timbre of the speaker sound out of the synthesized sound, and restore the user speech using the second frequency and second amplitude of the user speech.

Once the second frequency and second amplitude of the user speech have been obtained, the speaker timbre can be selectively filtered out (because the material and structure of the speaker are fixed, the speaker timbre is known to the processor in the smart terminal), leaving only the timbre of the user speech. The user speech can then be restored from its timbre, second frequency, and second amplitude. In this way, the speaker-sound part of the synthesized sound is filtered out and only the user-speech part remains, achieving speech recognition with speaker noise filtered out.

Step S400: convert the user speech into text according to a speech database.

After the user speech is matched in the speech database, it is converted into the corresponding text, and the smart terminal is operated according to the instruction corresponding to that text. For example, while the user has the player open and playing music, the background speech recognition process detects that the user has recorded the speech "fast forward 10 seconds"; after the processing of steps S100-S400, this is converted into the text "fast forward 10 seconds". The player then fast-forwards the currently playing audio file by 10 seconds according to the fast-forward control instruction corresponding to the text. This achieves accurate recognition of user speech in the presence of background sound.
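The "fast forward 10 seconds" example can be sketched as a small text-to-command step. This Python fragment is a minimal illustration under stated assumptions: the English command phrasing, the regular expression, and the function name `apply_text_command` are hypothetical and do not come from the patent, which does not describe how text is mapped to control instructions.

```python
import re

# Hypothetical command pattern for the recognized text, e.g. "fast forward 10 seconds".
_FAST_FORWARD = re.compile(r"fast forward (\d+) seconds?", re.IGNORECASE)


def apply_text_command(text: str, position_s: float) -> float:
    """Return the new playback position after applying a recognized command.

    Text that does not match the fast-forward pattern leaves the position unchanged.
    """
    match = _FAST_FORWARD.fullmatch(text.strip())
    if match is None:
        return position_s
    return position_s + int(match.group(1))
```

For instance, applying the text "fast forward 10 seconds" at a playback position of 30 seconds moves the position to 40 seconds, while unrecognized text leaves playback untouched.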

Further, as shown in Fig. 2, in the speech recognition method capable of filtering out speaker noise, step S200 specifically includes:

Step S201: given that the synthesized-sound frequency is the least common multiple of the first frequency and the second frequency, calculate the second frequency from the synthesized-sound frequency and the first frequency.

After the speaker sound and the user speech combine into the synthesized sound, the processor can sample the synthesized-sound frequency and synthesized-sound amplitude. It is also known that the synthesized-sound frequency is the least common multiple of the first frequency and the second frequency, that is, 1/(synthesized-sound frequency) = N × (1/first frequency) × (1/second frequency), where N is any positive integer. The second frequency can be solved from this formula.

Step S202: calculate the second amplitude as the difference between the synthesized-sound amplitude and the first amplitude.
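Steps S201 and S202 amount to two one-line calculations, sketched below in Python. The sketch takes the relation 1/(synthesized-sound frequency) = N × (1/first frequency) × (1/second frequency) literally as written in the text and rearranges it for the second frequency; the text does not say how N is chosen or which units apply, so `n` is simply a caller-supplied assumption, and the function names are hypothetical.

```python
def second_frequency(f_synth: float, f_first: float, n: int = 1) -> float:
    """Solve 1/f_synth = n * (1/f_first) * (1/f_second) for f_second (step S201).

    Rearranged: f_second = n * f_synth / f_first.
    """
    if f_synth <= 0 or f_first <= 0:
        raise ValueError("frequencies must be positive")
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n * f_synth / f_first


def second_amplitude(a_synth: float, a_first: float) -> float:
    """Second amplitude as synthesized-sound amplitude minus first amplitude (step S202)."""
    return a_synth - a_first
```

Both functions operate on one frame at a time, matching the description that the processor obtains the first frequency and first amplitude for each frame of the speaker sound.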

Further, as shown in Fig. 3, in the speech recognition method capable of filtering out speaker noise, step S300 specifically includes:

Step S301: after the synthesized sound undergoes analog-to-digital conversion in the audio encoder, send the synthesized-sound code, which carries the synthesized-sound frequency, synthesized-sound amplitude, and synthesized-sound timbre, to the processor;

Step S302: the processor filters the timbre of the speaker sound out of the synthesized sound and retains the timbre of the user speech;

Step S303: the audio decoder converts the second frequency and second amplitude of the user speech into a partial voice, and the user speech is restored from the partial voice and the timbre of the user speech.
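The text does not specify how the processor filters out the speaker timbre. As a deliberately simplified stand-in, and not the patent's timbre filter, the Python sketch below subtracts the known speaker reference from the synthesized sound sample by sample. It assumes the playback samples are available to the processor, are perfectly time-aligned with the microphone capture, and reach the microphone with a known gain; the function name and the `gain` parameter are hypothetical.

```python
from typing import List, Sequence


def remove_speaker_reference(
    mixed: Sequence[float],
    reference: Sequence[float],
    gain: float = 1.0,
) -> List[float]:
    """Subtract the scaled speaker reference from the synthesized (mixed) samples.

    Assumes both sequences are time-aligned and of equal length.
    """
    if len(mixed) != len(reference):
        raise ValueError("mixed and reference must have the same length")
    return [m - gain * r for m, r in zip(mixed, reference)]
```

In practice the alignment and gain assumptions rarely hold exactly, which is one reason real systems rely on adaptive echo cancellation rather than plain subtraction; the sketch only shows the idea that a known playback signal can be removed from the captured mixture.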

Further, as shown in Fig. 4, in the speech recognition method capable of filtering out speaker noise, step S400 specifically includes:

Step S401: upload the user speech to a speech database in the cloud;

Step S402: match the user speech in the speech database to obtain text;

Step S403: send the text to the smart terminal and display it.
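Steps S401 to S403 can be sketched as a lookup followed by a display call. The Python fragment below is illustrative only: it replaces the cloud speech database with an in-memory list of (feature template, text) pairs and uses nearest-template matching by squared distance, neither of which is described in the patent. All names (`SpeechDatabase`, `match_text`, `recognize_and_display`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for the cloud speech database: (feature template, text) pairs.
SpeechDatabase = List[Tuple[Sequence[float], str]]


def match_text(features: Sequence[float], database: SpeechDatabase) -> str:
    """Return the text whose template is closest to the given features (S402)."""
    if not database:
        raise ValueError("speech database is empty")

    def squared_distance(entry: Tuple[Sequence[float], str]) -> float:
        template, _ = entry
        return sum((a - b) ** 2 for a, b in zip(features, template))

    return min(database, key=squared_distance)[1]


def recognize_and_display(
    features: Sequence[float],
    database: SpeechDatabase,
    display: Callable[[str], None] = print,
) -> str:
    """Match the uploaded speech (S401, S402), then hand the text to the display (S403)."""
    text = match_text(features, database)
    display(text)
    return text
```

Passing the display function as a parameter keeps the matching logic independent of how the smart terminal actually shows the text.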

It can be seen that, when the user is using speech recognition software while the speaker is playing sound out loud, the processor in the terminal analyzes the composition of the sound and filters out the speaker sound, so that the user speech received in the background contains less environmental noise and speech is recognized efficiently.

Based on the above method embodiment, the present invention also provides a speech recognition system capable of filtering out speaker noise. As shown in Fig. 5, the system comprises:

a detection and acquisition module 100, configured to obtain the synthesized sound of the user speech and the speaker sound when user speech being recorded through the microphone is detected and the speaker is detected to be playing an audio file stored in the smart terminal;

a calculation module 200, configured to calculate a second frequency and a second amplitude of the user speech from the first frequency and first amplitude of the speaker sound sampled in the smart terminal and from the synthesized-sound frequency and synthesized-sound amplitude of the synthesized sound;

a filtering and restoration module 300, configured to filter the timbre of the speaker sound out of the synthesized sound and to restore the user speech using the second frequency and second amplitude of the user speech;

a conversion module 400, configured to convert the user speech into text according to a speech database.

Further, in the speech recognition system capable of filtering out speaker noise, the calculation module 200 specifically includes:

a frequency calculation unit, configured to calculate the second frequency from the synthesized-sound frequency and the first frequency, given that the synthesized-sound frequency is the least common multiple of the first frequency and the second frequency;

Further, in the speech recognition system capable of filtering out speaker noise, the filtering and restoration module 300 specifically includes:

Further, in the speech recognition system capable of filtering out speaker noise, the conversion module 400 specifically includes:

Further, in the speech recognition system capable of filtering out speaker noise, the detection and acquisition module 100 is further used for the processor to obtain, from the audio encoder, the speaker-sound code of each frame of the speaker sound.

In summary, in the speech recognition method capable of filtering out speaker noise and the system thereof according to the present invention, the method includes: when user speech being recorded through the microphone is detected and the speaker is detected to be playing an audio file stored in the smart terminal, obtaining the synthesized sound of the user speech and the speaker sound; calculating a second frequency and a second amplitude of the user speech from the first frequency and first amplitude of the speaker sound sampled in the smart terminal and from the synthesized-sound frequency and synthesized-sound amplitude of the synthesized sound; filtering the timbre of the speaker sound out of the synthesized sound, and restoring the user speech using its second frequency and second amplitude; and converting the user speech into text according to a speech database. When the user is using speech recognition software while the speaker is playing sound out loud, the processor in the terminal analyzes the composition of the sound and filters out the speaker sound, so that the user speech received in the background contains less environmental noise and speech is recognized efficiently.

It should be understood that those of ordinary skill in the art may make equivalent substitutions or changes according to the technical solution and inventive concept of the present invention, and all such changes or substitutions shall fall within the protection scope of the appended claims of the present invention.

Claims

1. A speech recognition method capable of filtering out speaker noise, characterized in that the method comprises the following steps:

D. converting the user speech into text according to a speech database.

2. The speech recognition method capable of filtering out speaker noise according to claim 1, characterized in that step B specifically includes:

3. The speech recognition method capable of filtering out speaker noise according to claim 1, characterized in that step C specifically includes:

4. The speech recognition method capable of filtering out speaker noise according to claim 1, characterized in that step D specifically includes:

D1. uploading the user speech to a speech database in the cloud;

D2. matching the user speech in the speech database to obtain text;

D3. sending the text to the smart terminal and displaying it.

5. The speech recognition method capable of filtering out speaker noise according to claim 1, characterized in that step A further includes the processor obtaining, from the audio encoder, the speaker-sound code of each frame of the speaker sound.

6. A speech recognition system capable of filtering out speaker noise, characterized by comprising:

7. The speech recognition system capable of filtering out speaker noise according to claim 6, characterized in that the calculation module specifically includes:

8. The speech recognition system capable of filtering out speaker noise according to claim 6, characterized in that the filtering and restoration module specifically includes:

9. The speech recognition system capable of filtering out speaker noise according to claim 6, characterized in that the conversion module specifically includes:

10. The speech recognition system capable of filtering out speaker noise according to claim 6, characterized in that the detection and acquisition module is further used for the processor to obtain, from the audio encoder, the speaker-sound code of each frame of the speaker sound.