CN102456345A

CN102456345A - Concatenated speech detection system and method

Info

Publication number: CN102456345A
Application number: CN2010105111445A
Authority: CN
Inventors: 张峰; 黄伟
Original assignee: Shengle Information Technology Shanghai Co Ltd
Current assignee: Shengle Information Technology Shanghai Co Ltd
Priority date: 2010-10-19
Filing date: 2010-10-19
Publication date: 2012-05-16

Abstract

The invention discloses a concatenated speech detection system comprising a user login module and a speech comparison module. On receiving a user's login request, the user login module outputs a text to the user; the text contains N identical characters, where N is greater than or equal to 2. The speech comparison module examines and compares the speech uttered by the user and outputs a concatenated speech recognition signal: if segments with identical pronunciation are detected in the user's speech, the signal is yes; otherwise it is no. The invention also discloses a concatenated speech detection method. With the system and method, concatenated speech can be detected accurately.

Description

Spliced speech detection system and method

Technical field

The present invention relates to speech recognition technology, and in particular to a system and method for detecting spliced (concatenated) speech.

Background technology

Voiceprint recognition is a biometric identification technology that automatically identifies a speaker from speech parameters in the speech waveform that reflect the speaker's physiological and behavioural characteristics. It uses the speaker information carried in the speech signal rather than the meaning of the words spoken, and so emphasises the speaker's individuality.

A typical voiceprint system generates a fixed or random text and has the user read it aloud in order to recognise the user's voiceprint. However, if hacking tools have been installed on the user's system and have recorded the speech from the user's earlier logins, an attacker can cut and splice that recorded speech to match the text generated by the voiceprint system, and then use the spliced speech to impersonate the user at login. If the user moves quickly from one word to the next, speech that has been cut and spliced can be examined for certain characteristics (for example, changes in energy) to tell whether it is spliced or naturally uttered, although the result is not necessarily reliable. If the user pronounces each word slowly, spliced speech is hard to detect with existing methods. The spliced speech may also have had distortion added, which makes detection by existing methods harder still. An attacker can therefore log in to the system successfully by impersonating the user with spliced speech, harming the user's interests and leaving the system poorly secured.

Summary of the invention

The technical problem to be solved by the present invention is to detect spliced speech accurately.

To solve this technical problem, the spliced speech detection system of the present invention comprises a user login module and a speech comparison module.

The user login module receives user login request information and, after receiving it, outputs a text to the user; the text contains N identical characters, where N ≥ 2.

The speech comparison module examines and compares the speech uttered by the user and outputs a spliced-speech recognition signal. If segments with identical pronunciation are detected in the user's speech, the signal is yes; otherwise it is no.

The system may further comprise a voiceprint recognition module.

After receiving the user login request information, the user login module also outputs the text to the voiceprint recognition module.

The voiceprint recognition module performs voiceprint recognition on the user's speech, using the spliced-speech recognition signal output by the speech comparison module and the text passed on by the user login module, and decides whether to allow the user to log in to the computer system. When the spliced-speech recognition signal is yes, the voiceprint recognition module refuses the login. When the signal is no, the voiceprint recognition module performs voiceprint recognition on the user's speech according to the text; if recognition passes, the user is allowed to log in to the computer system, and otherwise the login is refused.

The speech comparison module may judge whether the user's speech contains segments with identical pronunciation from the speech intensity of the raw speech together with its corresponding time information, or from speech features extracted from the raw speech.

The speech features may be one or more of an audio fingerprint, the spectrum, the fundamental frequency, the formants and the cepstral coefficients.

The speech comparison module may make this judgement from an audio fingerprint extracted from the raw speech.

The speech comparison module may apply a distance-difference method, a cross-correlation algorithm or a dynamic programming algorithm to the speech intensity information of the raw speech, or to the speech features extracted from it, in order to judge whether the user's speech contains segments with identical pronunciation.

To solve the above technical problem, the spliced speech detection method of the present invention comprises the following steps:

One. The user sends login request information to a user login module.

Two. After receiving the user login request information, the user login module outputs a text to the user; the text contains N identical characters, where N ≥ 2.

Three. A speech comparison module examines and compares the speech uttered by the user and outputs a spliced-speech recognition signal. If segments with identical pronunciation are detected in the user's speech, the signal is yes; otherwise it is no.

After receiving the user login request information, the user login module outputs the text to the user and may also output it to a voiceprint recognition module.

When the spliced-speech recognition signal output by the speech comparison module is yes, the voiceprint recognition module refuses the login. When the signal is no, the voiceprint recognition module performs voiceprint recognition on the user's speech according to the text; if recognition passes, the user is allowed to log in to the computer system, and otherwise the login is refused.

The speech comparison module may judge whether the pronunciations of the identical characters are the same by judging, from an audio fingerprint extracted from the raw speech, whether the user's speech contains segments with identical pronunciation.

The identical character contained in the text may be fixed or random; the number of identical characters may be fixed or random; and the positions at which they appear in the text may be fixed or random.
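The fixed-or-random choices for the repeated character can be sketched as follows. This is only an illustrative sketch: the function name `make_challenge`, the digit alphabet, the default text length of 8 and the 2 to 5 range used for a random count are assumptions made for the example, not details prescribed by the invention.

```python
import random


def make_challenge(length=8, char=None, count=None, positions=None,
                   alphabet="0123456789", rng=None):
    """Build a prompt text that contains one character at least twice."""
    rng = rng or random.Random()
    # Repeated character: fixed if supplied, otherwise drawn at random.
    if char is None:
        char = rng.choice(alphabet)
    # Positions: fixed if supplied, otherwise `count` distinct random slots.
    if positions is None:
        if count is None:
            count = rng.randint(2, min(5, length))  # assumed range
        positions = rng.sample(range(length), count)
    if len(set(positions)) < 2 or any(not 0 <= p < length for p in positions):
        raise ValueError("need at least 2 distinct in-range positions")
    # Fill the remaining slots with characters other than the repeated one,
    # so the repeated character occurs exactly at the chosen positions.
    fillers = [c for c in alphabet if c != char]
    text = [rng.choice(fillers) for _ in range(length)]
    for p in positions:
        text[p] = char
    return "".join(text)


# Fixed character "6", three occurrences, at the first, third and sixth places.
fixed = make_challenge(char="6", positions=[0, 2, 5])
# Fully random character, count and positions.
randomised = make_challenge()
```

Filler characters may still repeat among themselves, which is harmless here because the requirement is only that at least one character occurs N ≥ 2 times.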

In the spliced speech detection system and method of the present invention, the text produced by the user login module contains identical characters. Because spliced speech pronounces identical characters in exactly the same way, the speech comparison module can compare the user's speech, detect whether it contains identical segments, and so judge whether the speech used for this login was spliced together from historical speech. The accuracy is very high, and detection also works well on spliced speech that has been transformed.

Description of drawings

The present invention is described in further detail below with reference to the accompanying drawings and embodiments.

Fig. 1 is a schematic diagram of an embodiment of the spliced speech detection system of the present invention;

Fig. 2 is a flow chart of an embodiment of the spliced speech detection method of the present invention.

Embodiment

As shown in Fig. 1, an embodiment of the spliced speech detection system of the present invention comprises a user login module, a speech comparison module and a voiceprint recognition module.

The user login module receives user login request information. After receiving it, the module outputs a text for the user to read aloud and also outputs the text to the voiceprint recognition module. The text contains N identical characters, where N ≥ 2. The identical character may be fixed, for example the character "6", or generated at random. The number of identical characters may be fixed, for example always 3, or random, for example between 2 and 5. The positions at which the identical characters appear in the text may be fixed, for example with 3 identical characters placed first, third and sixth in the text, or random.

The speech comparison module examines and compares the speech uttered by the user and outputs a spliced-speech recognition signal. If segments with identical pronunciation are detected in the user's speech, the signal is yes; otherwise it is no.

The speech comparison module judges whether the user's speech contains segments with identical pronunciation from the speech intensity of the raw speech together with its corresponding time information (the raw speech is represented as a sequence of numbers, each giving the speech intensity at one moment), or from speech features extracted from the raw speech. The speech features include an audio fingerprint, the spectrum, the fundamental frequency, the formants, the cepstral coefficients and so on. The module may make the judgement from a single feature extracted from the raw speech (such as the audio fingerprint) or from a combination of several such features.
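The text treats the raw speech as a sequence of numbers, each giving the intensity at one moment. One common way to obtain a smoother intensity track from such samples, assumed here and not stated in the text, is the root-mean-square level over short non-overlapping frames. The function name `intensity_sequence` and the frame length are hypothetical.

```python
import math


def intensity_sequence(samples, frame_len=4):
    """Return one RMS intensity value per non-overlapping frame.

    A trailing partial frame is dropped. The frame index stands in for the
    "corresponding time information": frame k starts at sample k * frame_len.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


# A loud frame followed by a silent frame.
track = intensity_sequence([3, 3, 3, 3, 0, 0, 0, 0], frame_len=4)
```

The resulting sequence, or the raw samples themselves, can then be fed to any of the comparison methods the description lists.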

An audio fingerprint is a compact, content-based numeric string that represents the important acoustic features of a piece of audio. The same audio still yields the same fingerprint after repeated recording and digitisation, while different audio yields different fingerprints.
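The patent does not say how the fingerprint is computed. Purely as a toy illustration of a compact, content-derived string that survives a change in recording level, the sketch below emits one bit per frame boundary, set when the frame energy rises. The name `energy_fingerprint` and the scheme itself are assumptions; practical fingerprinting systems use far richer spectral features.

```python
def energy_fingerprint(samples, frame_len=4):
    """Toy fingerprint: '1' where frame energy rises, '0' where it does not."""
    energies = [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    # Comparing neighbouring frames makes the bits independent of overall gain.
    return "".join("1" if nxt > cur else "0"
                   for cur, nxt in zip(energies, energies[1:]))


quiet = [1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 3, 3, 3, 3]
loud = [2 * s for s in quiet]  # same content recorded at a higher level
same_print = energy_fingerprint(quiet) == energy_fingerprint(loud)
```

Two segments would then be treated as identical when their fingerprints match, or differ in only a few bits if some tolerance is wanted.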

The speech comparison module may apply a distance-difference method, a cross-correlation algorithm or a dynamic programming algorithm to the speech intensity information of the raw speech, or to the speech features extracted from it, in order to judge whether the user's speech contains segments with identical pronunciation.

The distance-difference method processes either the speech intensity of the raw speech with its corresponding time information, or the speech features extracted from the raw speech. A window function is chosen, and the distance is computed between the values inside the window at one moment of the current login speech and the values inside the window at some moment of the user's historical speech. If the distance computed for two moments is less than a certain threshold, the pronunciations at those two moments are considered identical.
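A minimal sketch of this window comparison is given below. The text does not name the distance measure or the window shape, so a Euclidean distance over a rectangular window is assumed; the function names and the threshold value in the usage lines are hypothetical.

```python
import math


def take_window(seq, start, size):
    """Return the `size` values of `seq` beginning at index `start`."""
    if start < 0 or start + size > len(seq):
        raise ValueError("window falls outside the sequence")
    return seq[start:start + size]


def window_distance(a, b):
    """Euclidean distance between two equal-length windows (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_by_distance(seq_a, start_a, seq_b, start_b, size, threshold):
    """True when the two windows are closer than `threshold`."""
    d = window_distance(take_window(seq_a, start_a, size),
                        take_window(seq_b, start_b, size))
    return d < threshold


# The same five-value pattern appears at index 0 and at index 6.
track = [0, 1, 4, 1, 0, 9, 0, 1, 4, 1, 0]
repeated = same_by_distance(track, 0, track, 6, size=5, threshold=0.5)
```

The two sequences may be different recordings or the same recording; passing the same sequence twice compares two moments within one utterance.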

The cross-correlation algorithm processes either the speech intensity of the raw speech with its corresponding time information, or the speech features extracted from the raw speech. A window function is chosen, and the product is computed of the values inside the window at one moment of the current login speech and the values inside the window at some moment of the user's historical speech. If the result is greater than a certain threshold, the pronunciations at those two moments are considered identical.
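As a hedged sketch, the product of two windows can be taken as their inner product. The normalisation by the window norms shown here is an added assumption, not something the text specifies; it keeps the score between -1 and 1 so that one threshold works regardless of loudness. The names and the default threshold are hypothetical.

```python
import math


def normalized_correlation(a, b):
    """Inner product of two equal-length windows divided by their norms."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # An all-zero window carries no shape information; report no correlation.
    return dot / norm if norm else 0.0


def same_by_correlation(a, b, threshold=0.95):
    """True when the windows correlate more strongly than `threshold`."""
    return normalized_correlation(a, b) > threshold


# The second window is the first one played twice as loud.
match = same_by_correlation([1, 2, 3], [2, 4, 6])
```

Unlike the distance test, the comparison here is "greater than": a higher score means more similar windows.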

The dynamic programming algorithm processes either the speech intensity of the raw speech with its corresponding time information, or the speech features extracted from the raw speech. A window function is chosen, and the dynamic programming distance is computed between the values inside the window at one moment of the current login speech and the values inside the window at another moment of the user's historical speech. If the result is less than a certain threshold, the pronunciations at those two moments are considered identical.
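The text does not define the "dynamic programming distance". The sketch below assumes it means dynamic time warping (DTW), a standard dynamic programming alignment for sequences spoken at slightly different speeds; the function name and the absolute-difference local cost are choices made for this example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# The second window holds the middle value for one extra step.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

Because the alignment may stretch or compress time, this measure tolerates spliced segments that have been slightly time-scaled, at the cost of quadratic work per window pair.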

The description above uses the values in the windows at two particular moments as an example; in practice, the distance must be computed between the values in the windows at every pair of moments.
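An exhaustive scan over window pairs might look like the sketch below. It assumes the scan runs within a single login utterance, uses a Euclidean distance, and skips overlapping windows so that a segment is never matched against a shifted copy of itself; these choices and all names are assumptions for illustration.

```python
import math


def find_identical_segments(seq, size, threshold, hop=1):
    """Return (i, j) start indices of non-overlapping windows that match."""
    matches = []
    starts = range(0, len(seq) - size + 1, hop)
    for i in starts:
        for j in starts:
            if j < i + size:
                continue  # skip overlapping pairs and mirrored duplicates
            dist = math.sqrt(sum((x - y) ** 2
                                 for x, y in zip(seq[i:i + size],
                                                 seq[j:j + size])))
            if dist < threshold:
                matches.append((i, j))
    return matches


# The pattern 0, 5, 9, 5, 0 occurs twice, as a pasted segment would.
spliced = [0, 5, 9, 5, 0, 1, 2, 0, 5, 9, 5, 0]
signal_is_yes = bool(find_identical_segments(spliced, size=5, threshold=0.5))
```

The scan is quadratic in the number of windows; a larger `hop` trades detection resolution for speed.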

As shown in Fig. 2, detecting spliced speech with the spliced speech detection system of the present invention comprises the following steps:

One. The user sends login request information to a user login module.

Two. After receiving the user login request information, the user login module outputs a text for the user to read aloud and also outputs the text to a voiceprint recognition module; the text contains N identical characters, where N ≥ 2.

Three. After receiving the text passed on by the user login module, the speech comparison module examines and compares the speech uttered by the user and outputs a spliced-speech recognition signal. If segments with identical pronunciation are detected in the user's speech, the signal is yes; otherwise it is no. The speech comparison module may judge whether the pronunciations of the identical characters are the same by judging, from an audio fingerprint extracted from the raw speech, whether the user's speech contains segments with identical pronunciation.

Four. When the spliced-speech recognition signal output by the speech comparison module is yes, the voiceprint recognition module refuses the login. When the signal is no, the voiceprint recognition module performs voiceprint recognition on the user's speech according to the text; if recognition passes, the user is allowed to log in to the computer system, and otherwise the login is refused.
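The decision in step Four can be summarised in a few lines. This is a sketch only: `decide_login` and the `verify_voiceprint` callable are hypothetical names, and the voiceprint check itself is left abstract because the text does not describe it.

```python
def decide_login(splice_signal, verify_voiceprint):
    """Return True to admit the user, False to refuse the login.

    splice_signal: True when the speech comparison module reports "yes",
        meaning segments with identical pronunciation were found.
    verify_voiceprint: hypothetical zero-argument callable that returns True
        when voiceprint recognition against the prompt text passes.
    """
    if splice_signal:
        # Spliced speech is refused without running voiceprint recognition.
        return False
    return bool(verify_voiceprint())


admitted = decide_login(False, lambda: True)
refused = decide_login(True, lambda: True)
```

Passing the voiceprint check as a callable keeps it from running at all when the splice signal is already yes.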

In addition, an attacker who has obtained a large amount of the user's speech may be able to produce spliced speech in which identical characters are pronounced differently, because many pronunciation samples are available. In that case, measures such as increasing the number of times the identical character appears in the text force the attacker to collect still more of the user's speech before spliced speech with differing pronunciations of the identical character can be generated, which greatly raises the cost of the attack.

Claims

1. A spliced speech detection system, characterised in that it comprises a user login module and a speech comparison module;

2. The spliced speech detection system according to claim 1, characterised in that it further comprises a voiceprint recognition module;

3. The spliced speech detection system according to claim 1, characterised in that the speech comparison module judges whether the user's speech contains segments with identical pronunciation from the speech intensity of the raw speech together with its corresponding time information, or from speech features extracted from the raw speech.

4. The spliced speech detection system according to claim 3, characterised in that the speech features are one or more of an audio fingerprint, the spectrum, the fundamental frequency, the formants and the cepstral coefficients.

5. The spliced speech detection system according to claim 1, characterised in that the speech comparison module judges whether the user's speech contains segments with identical pronunciation from an audio fingerprint extracted from the raw speech.

6. The spliced speech detection system according to claim 3, characterised in that the speech comparison module applies a distance-difference method, a cross-correlation algorithm or a dynamic programming algorithm to the speech intensity information of the raw speech, or to the speech features extracted from it, in order to judge whether the user's speech contains segments with identical pronunciation.

7. A spliced speech detection method, characterised in that it comprises the following steps:

One. The user sends login request information to a user login module;

8. The spliced speech detection method according to claim 7, characterised in that, after receiving the user login request information, the user login module outputs a text to the user and also outputs the text to a voiceprint recognition module;

9. The spliced speech detection method according to claim 7, characterised in that the speech comparison module judges whether the pronunciations of the identical characters are the same by judging, from an audio fingerprint extracted from the raw speech, whether the user's speech contains segments with identical pronunciation.

10. The spliced speech detection method according to claim 7, characterised in that the identical character contained in the text is fixed or random, the number of identical characters is fixed or random, and the positions at which the identical characters appear in the text are fixed or random.