CN102456345A - Concatenated speech detection system and method - Google Patents

Publication number
CN102456345A
CN102456345A CN2010105111445A CN201010511144A CN102456345A CN 102456345 A CN102456345 A CN 102456345A CN 2010105111445 A CN2010105111445 A CN 2010105111445A CN 201010511144 A CN201010511144 A CN 201010511144A CN 102456345 A CN102456345 A CN 102456345A
Authority
CN
China
Prior art keywords
user
voice
comparison module
splicing
splicing speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010105111445A
Other languages
Chinese (zh)
Inventor
张峰
黄伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shengle Information Technology Shanghai Co Ltd
Original Assignee
Shengle Information Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shengle Information Technology Shanghai Co Ltd
Priority to CN2010105111445A
Publication of CN102456345A

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a concatenated speech detection system comprising a user login module and a speech comparison module. On receiving user login request information, the user login module outputs a text to the user; the text contains N identical characters, where N is greater than or equal to 2. The speech comparison module detects and compares the speech uttered by the user and outputs a concatenated speech recognition signal: if segments with identical pronunciation are detected in the user's speech, the signal is yes; otherwise it is no. The invention also discloses a concatenated speech detection method. With the system and the method, concatenated speech can be detected accurately.

Description

Spliced speech detection system and method
Technical field
The present invention relates to speech recognition technology, and in particular to a spliced speech detection system and method.
Background technology
Voiceprint recognition is a biometric identification technology. It automatically identifies a speaker from speech parameters in the speech waveform that reflect the speaker's physiological and behavioral characteristics. Voiceprint recognition uses the speaker information carried in the speech signal and disregards the meaning of the words; its emphasis is on the speaker's individuality.
A common voiceprint system generates some fixed or random text and has the user read it aloud in order to recognize the user's voiceprint. However, if hacking tools have been installed on the user's system and have recorded the speech from the user's earlier logins, that recorded speech can be cut and spliced to match the text generated by the voiceprint system, and the spliced speech can then be used to impersonate the user at login. If the user moves quickly from one word to the next and the speech is segmented and then spliced, analyzing certain characteristics of the spliced speech (for example, changes in energy) can indicate whether the speech is spliced or naturally produced, but the result is not necessarily reliable. If the user pronounces each word slowly and the speech is segmented and then spliced, existing methods can hardly detect it. In addition, some distortion may be added to the spliced speech, which makes detection by existing methods even harder. A hacker can therefore impersonate the user with spliced speech and log in to the user's system successfully, harming the user's interests; system security is poor.
Summary of the invention
The technical problem to be solved by the present invention is to detect spliced speech accurately.
To solve the above technical problem, the spliced speech detection system of the present invention comprises a user login module and a speech comparison module.
The user login module receives user login request information. After receiving it, the module outputs a text to the user; the text contains N identical characters, 2≤N.
The speech comparison module detects and compares the speech uttered by the user and outputs a spliced-speech recognition signal. If segments with identical pronunciation are detected in the speech uttered by the user, the spliced-speech recognition signal output by the speech comparison module is yes; otherwise it is no.
The system may further comprise a voiceprint recognition module.
After receiving the user login request information, the user login module also outputs the text to the voiceprint recognition module.
The voiceprint recognition module performs voiceprint recognition on the speech uttered by the user, according to the spliced-speech recognition signal output by the speech comparison module and the text transmitted by the user login module, and decides whether to allow the user to log in to the computer system. When the spliced-speech recognition signal is yes, the voiceprint recognition module refuses the login. When the signal is no, the voiceprint recognition module performs voiceprint recognition on the speech uttered by the user according to the text; if recognition passes, the user is allowed to log in to the computer system, and otherwise the login is refused.
The speech comparison module may judge whether the speech uttered by the user contains identically pronounced segments according to the speech intensity of the raw speech and the corresponding time information, or according to speech features extracted from the raw speech.
The speech features may be one or more of audio fingerprint, spectrum, fundamental frequency, formants, and cepstral coefficients.
The speech comparison module may judge whether the speech uttered by the user contains identically pronounced segments according to audio fingerprints extracted from the raw speech.
The speech comparison module may apply a distance-difference method, a cross-correlation algorithm, or a dynamic programming algorithm to the speech intensity information of the raw speech, or to speech features extracted from the raw speech, in order to judge whether the speech uttered by the user contains identically pronounced segments.
To solve the above technical problem, the spliced speech detection method of the present invention comprises the following steps:
One. The user sends login request information to a user login module.
Two. After the user login module receives the user login request information, it outputs a text to the user; the text contains N identical characters, 2≤N.
Three. The speech comparison module detects and compares the speech uttered by the user and outputs a spliced-speech recognition signal. If segments with identical pronunciation are detected in the speech uttered by the user, the spliced-speech recognition signal output by the speech comparison module is yes; otherwise it is no.
After the user login module receives the user login request information, it outputs the text to the user and may also output the text to a voiceprint recognition module.
When the spliced-speech recognition signal output by the speech comparison module is yes, the voiceprint recognition module refuses the login. When the signal is no, the voiceprint recognition module performs voiceprint recognition on the speech uttered by the user according to the text; if recognition passes, the user is allowed to log in to the computer system, and otherwise the login is refused.
The speech comparison module may judge, according to audio fingerprints extracted from the raw speech, whether the speech uttered by the user contains identically pronounced segments, and thereby whether the identical characters are pronounced identically.
The identical character contained in the text is fixed or random, the number of identical characters is fixed or random, and the positions at which the identical character appears in the text are fixed or random.
In the spliced speech detection system and method of the present invention, the text produced by the user login module contains identical characters. Because spliced speech pronounces identical characters in exactly the same way, the speech comparison module compares the speech uttered by the user and detects whether it contains identical segments, which makes it possible to judge whether the speech used for this login was spliced together from historical speech. Accuracy is very high, and detection also works well on spliced speech that has been transformed.
Description of drawings
The present invention is described in further detail below with reference to the accompanying drawings and embodiments.
Fig. 1 is a schematic diagram of an embodiment of the spliced speech detection system of the present invention;
Fig. 2 is a flowchart of an embodiment of the spliced speech detection method of the present invention.
Embodiment
An embodiment of the spliced speech detection system of the present invention is shown in Fig. 1 and comprises a user login module, a speech comparison module, and a voiceprint recognition module.
The user login module receives user login request information. After receiving it, the module outputs a text to the user for the user to read aloud and also outputs the text to the voiceprint recognition module. The text contains N identical characters, 2≤N. The identical character may be fixed, for example the character "6", or generated at random. The number of identical characters may be fixed, for example always 3, or random, for example 2 to 5. The positions at which the identical character appears in the text may be fixed (for example, with 3 identical characters, the first, third, and sixth positions of the text) or random.
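The prompt text described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `make_challenge`, its parameters, and the choice of a digit alphabet are all assumptions made for the example. The repeated character and its positions may each be passed in (fixed) or left to be drawn at random.

```python
import random

def make_challenge(length=8, repeats=3, alphabet="0123456789",
                   repeated_char=None, positions=None, rng=None):
    """Return a prompt in which one character occurs exactly `repeats` times (repeats >= 2)."""
    rng = rng or random.Random()
    if not 2 <= repeats <= length:
        raise ValueError("need 2 <= repeats <= length")
    # Fixed character if given, otherwise chosen at random (hypothetical policy).
    if repeated_char is None:
        repeated_char = rng.choice(alphabet)
    # Fixed positions if given, otherwise chosen at random.
    if positions is None:
        positions = rng.sample(range(length), repeats)
    positions = set(positions)
    if len(positions) != repeats or not all(0 <= p < length for p in positions):
        raise ValueError("positions must be `repeats` distinct indices inside the text")
    # Remaining slots are filled with other characters so the count stays exact.
    fillers = [c for c in alphabet if c != repeated_char]
    if not fillers and repeats < length:
        raise ValueError("alphabet needs at least one other character")
    return "".join(
        repeated_char if i in positions else rng.choice(fillers)
        for i in range(length)
    )
```

Calling `make_challenge(repeated_char="6", positions=[0, 2, 5])` corresponds to the fixed example in the text (character "6" at the first, third, and sixth positions); calling it with no arguments randomizes both the character and its positions.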
The speech comparison module detects and compares the speech uttered by the user and outputs a spliced-speech recognition signal. If segments with identical pronunciation are detected in the speech uttered by the user, the spliced-speech recognition signal output by the speech comparison module is yes; otherwise it is no.
The voiceprint recognition module performs voiceprint recognition on the speech uttered by the user, according to the spliced-speech recognition signal output by the speech comparison module and the text transmitted by the user login module, and decides whether to allow the user to log in to the computer system. When the spliced-speech recognition signal is yes, the voiceprint recognition module refuses the login. When the signal is no, the voiceprint recognition module performs voiceprint recognition on the speech uttered by the user according to the text; if recognition passes, the user is allowed to log in to the computer system, and otherwise the login is refused.
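The decision rule of the voiceprint recognition module can be summarized as a small gate. The sketch below is only an illustration under assumptions: `decide_login` and `voiceprint_check` are hypothetical names, and the voiceprint check is passed as a callable so that it is skipped entirely when the spliced-speech signal is yes.

```python
def decide_login(spliced_signal, voiceprint_check):
    """Return True to allow the login, False to refuse it.

    spliced_signal:   output of the speech comparison module (True means "yes, spliced").
    voiceprint_check: zero-argument callable that runs voiceprint recognition
                      against the prompt text and returns True on success.
    """
    if spliced_signal:
        # Spliced speech detected: refuse without running voiceprint recognition.
        return False
    # Not spliced: the login depends on the voiceprint result alone.
    return bool(voiceprint_check())
```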
The speech comparison module judges whether the speech uttered by the user contains identically pronounced segments according to the speech intensity of the raw speech and the corresponding time information (the raw speech is represented as a sequence of numbers, each number being the speech intensity at one moment), or according to speech features extracted from the raw speech. The speech features include audio fingerprint, spectrum, fundamental frequency, formants, cepstral coefficients, and so on. The speech comparison module may make the judgment using a single speech feature extracted from the raw speech (such as the audio fingerprint), or using a combination of several speech features extracted from the raw speech.
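The passage above treats the raw speech as a sequence of intensity values over time. As a hedged illustration, one simple way to derive a coarser intensity-versus-time sequence from raw samples is the mean energy of each frame. The patent does not prescribe this computation; `frame_intensity`, `frame_len`, and `hop` are hypothetical names chosen for the example.

```python
def frame_intensity(samples, frame_len=4, hop=None):
    """Return the mean squared amplitude of each frame of `samples`.

    Frames are `frame_len` samples long and start every `hop` samples
    (non-overlapping by default). Trailing samples that do not fill a
    whole frame are ignored.
    """
    hop = hop or frame_len
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies
```

The resulting list can be fed to any of the window comparisons described in the following paragraphs in place of the raw samples.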
An audio fingerprint is a compact, content-based numeric string that represents the important acoustic features of a piece of audio. The same audio still yields the same audio fingerprint after repeated recording and digitization, while different audio yields different fingerprints.
The speech comparison module may apply a distance-difference method, a cross-correlation algorithm, or a dynamic programming algorithm to the speech intensity information of the raw speech, or to speech features extracted from the raw speech, in order to judge whether the speech uttered by the user contains identically pronounced segments.
The distance-difference method processes the speech intensity of the raw speech and the corresponding time information, or the speech features extracted from the raw speech. A window function is chosen, and the distance is computed between the values inside the window at some moment of the current login speech and the values inside the window at some moment of the user's historical speech. If the distance computed for two moments is less than a certain threshold, the pronunciations at those two moments are considered the same.
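A minimal sketch of the distance test on two windows follows. It assumes a Euclidean distance, which the patent does not specify, and the names `windows_match_by_distance` and `threshold` are hypothetical.

```python
import math

def windows_match_by_distance(window_a, window_b, threshold):
    """Return True when two equal-length windows are closer than `threshold`.

    The Euclidean distance used here is an assumption; any distance between
    the two value sequences could play the same role.
    """
    if len(window_a) != len(window_b):
        raise ValueError("windows must have the same length")
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(window_a, window_b)))
    return distance < threshold
```

A suitable threshold depends on the feature scale and on recording noise, so it would need to be tuned on real data.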
The cross-correlation algorithm processes the speech intensity of the raw speech and the corresponding time information, or the speech features extracted from the raw speech. A window function is chosen, and the product is computed between the values inside the window at some moment of the current login speech and the values inside the window at some moment of the user's historical speech. If the result is greater than a certain threshold, the pronunciations at those two moments are considered the same.
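The correlation test can be sketched as a sum of element-wise products. Normalizing by the window magnitudes is an assumption added here so that the score lies in [-1, 1] and one threshold works regardless of loudness; the text itself only mentions a product compared against a threshold. The function names are hypothetical.

```python
import math

def normalized_correlation(window_a, window_b):
    """Return the normalized inner product of two equal-length windows."""
    if len(window_a) != len(window_b):
        raise ValueError("windows must have the same length")
    dot = sum(x * y for x, y in zip(window_a, window_b))
    norm_a = math.sqrt(sum(x * x for x in window_a))
    norm_b = math.sqrt(sum(y * y for y in window_b))
    if norm_a == 0 or norm_b == 0:
        # A silent window carries no shape to correlate with.
        return 0.0
    return dot / (norm_a * norm_b)

def windows_match_by_correlation(window_a, window_b, threshold=0.99):
    """Return True when the normalized correlation exceeds `threshold`."""
    return normalized_correlation(window_a, window_b) > threshold
```

Because of the normalization, a window and a louder copy of it (for example `[1, 2, 3]` and `[2, 4, 6]`) still count as a match, which may or may not be desirable for a given deployment.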
The dynamic programming algorithm processes the speech intensity of the raw speech and the corresponding time information, or the speech features extracted from the raw speech. A window function is chosen, and the dynamic programming distance is computed between the values inside the window at some moment of the current login speech and the values inside the window at another moment of the user's historical speech. If the result is less than a certain threshold, the pronunciations at those two moments are considered the same.
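The patent does not name a particular dynamic programming distance. Dynamic time warping (DTW) is one widely used choice for comparing sequences that may be stretched in time, so the sketch below uses it as an assumption; `dtw_distance` and `windows_match_by_dtw` are hypothetical names.

```python
def dtw_distance(seq_a, seq_b):
    """Return the dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] is the cheapest alignment of seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # advance in seq_a only
                                    cost[i][j - 1],      # advance in seq_b only
                                    cost[i - 1][j - 1])  # advance in both
    return cost[n][m]

def windows_match_by_dtw(window_a, window_b, threshold):
    """Return True when the DTW distance is below `threshold`."""
    return dtw_distance(window_a, window_b) < threshold
```

Unlike the plain distance test, this comparison tolerates windows of different lengths, for example `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0 because the repeated value can be absorbed by the alignment.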
The description above uses the values inside the windows at two particular moments as an example; in practice, the distance must be computed for the values inside the windows at every pair of moments.
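Comparing every pair of moments amounts to sliding a window over both sequences and testing each pair of window positions. The sketch below does this with a Euclidean distance, which is an assumption; `find_matching_windows` and the `min_gap` parameter are hypothetical, the latter added so that a sequence can be compared with itself without every window trivially matching its own position.

```python
import math

def find_matching_windows(seq_a, seq_b, size, threshold, min_gap=0):
    """Return all (i, j) start positions whose windows are closer than `threshold`.

    Pairs with abs(i - j) < min_gap are skipped. Set min_gap=size when seq_a and
    seq_b are the same sequence, so overlapping windows are not compared.
    """
    matches = []
    for i in range(len(seq_a) - size + 1):
        for j in range(len(seq_b) - size + 1):
            if abs(i - j) < min_gap:
                continue
            distance = math.sqrt(sum(
                (x - y) ** 2
                for x, y in zip(seq_a[i:i + size], seq_b[j:j + size])
            ))
            if distance < threshold:
                matches.append((i, j))
    return matches
```

A non-empty result would correspond to a "yes" spliced-speech recognition signal. The cost grows with the product of the two sequence lengths, so a real system would likely restrict the search, for instance to the regions where the identical characters are expected.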
Spliced speech detection using the spliced speech detection system of the present invention is shown in Fig. 2 and comprises the following steps:
One. The user sends login request information to a user login module.
Two. After the user login module receives the user login request information, it outputs a text to the user for the user to read aloud and also outputs the text to a voiceprint recognition module; the text contains N identical characters, 2≤N.
Three. After the speech comparison module receives the text transmitted by the user login module, it detects and compares the speech uttered by the user and outputs a spliced-speech recognition signal. If segments with identical pronunciation are detected in the speech uttered by the user, the spliced-speech recognition signal output by the speech comparison module is yes; otherwise it is no. The speech comparison module may judge, according to audio fingerprints extracted from the raw speech, whether the speech uttered by the user contains identically pronounced segments, and thereby whether the identical characters are pronounced identically.
Four. When the spliced-speech recognition signal output by the speech comparison module is yes, the voiceprint recognition module refuses the login. When the signal is no, the voiceprint recognition module performs voiceprint recognition on the speech uttered by the user according to the text; if recognition passes, the user is allowed to log in to the computer system, and otherwise the login is refused.
In the spliced speech detection system and method of the present invention, the text produced by the user login module contains identical characters. Because spliced speech pronounces identical characters in exactly the same way, the speech comparison module compares the speech uttered by the user and detects whether it contains identical segments, which makes it possible to judge whether the speech used for this login was spliced together from historical speech. Accuracy is very high, and detection also works well on spliced speech that has been transformed.
In addition, if a hacker has obtained a large amount of the user's speech, the spliced speech may pronounce identical characters differently, because many pronunciation samples are available. In that case, measures such as increasing the number of times the identical character appears in the text force the hacker to obtain even more of the user's speech before spliced speech with a different pronunciation for each occurrence can be generated, which greatly raises the cost of the attack.

Claims (10)

1. A spliced speech detection system, characterized in that it comprises a user login module and a speech comparison module;
the user login module is used to receive user login request information and, after receiving the user login request information, to output a text to the user, the text containing N identical characters, 2≤N;
the speech comparison module is used to detect and compare the speech uttered by the user and to output a spliced-speech recognition signal; if segments with identical pronunciation are detected in the speech uttered by the user, the spliced-speech recognition signal output by the speech comparison module is yes, and otherwise it is no.
2. The spliced speech detection system according to claim 1, characterized in that it further comprises a voiceprint recognition module;
the user login module, after receiving the user login request information, also outputs the text to the voiceprint recognition module;
the voiceprint recognition module performs voiceprint recognition on the speech uttered by the user, according to the spliced-speech recognition signal output by the speech comparison module and the text transmitted by the user login module, and decides whether to allow the user to log in to the computer system; when the spliced-speech recognition signal output by the speech comparison module is yes, the voiceprint recognition module refuses the login; when the signal is no, the voiceprint recognition module performs voiceprint recognition on the speech uttered by the user according to the text, and the user is allowed to log in to the computer system if recognition passes and is refused otherwise.
3. The spliced speech detection system according to claim 1, characterized in that the speech comparison module judges whether the speech uttered by the user contains identically pronounced segments according to the speech intensity of the raw speech and the corresponding time information, or according to speech features extracted from the raw speech.
4. The spliced speech detection system according to claim 3, characterized in that the speech features are one or more of audio fingerprint, spectrum, fundamental frequency, formants, and cepstral coefficients.
5. The spliced speech detection system according to claim 1, characterized in that the speech comparison module judges whether the speech uttered by the user contains identically pronounced segments according to audio fingerprints extracted from the raw speech.
6. The spliced speech detection system according to claim 3, characterized in that the speech comparison module applies a distance-difference method, a cross-correlation algorithm, or a dynamic programming algorithm to the speech intensity information of the raw speech, or to speech features extracted from the raw speech, to judge whether the speech uttered by the user contains identically pronounced segments.
7. A spliced speech detection method, characterized in that it comprises the following steps:
One. the user sends login request information to a user login module;
Two. after the user login module receives the user login request information, it outputs a text to the user, the text containing N identical characters, 2≤N;
Three. the speech comparison module detects and compares the speech uttered by the user and outputs a spliced-speech recognition signal; if segments with identical pronunciation are detected in the speech uttered by the user, the spliced-speech recognition signal output by the speech comparison module is yes, and otherwise it is no.
8. The spliced speech detection method according to claim 7, characterized in that, after the user login module receives the user login request information, it outputs the text to the user and also outputs the text to a voiceprint recognition module;
when the spliced-speech recognition signal output by the speech comparison module is yes, the voiceprint recognition module refuses the login; when the signal is no, the voiceprint recognition module performs voiceprint recognition on the speech uttered by the user according to the text, and the user is allowed to log in to the computer system if recognition passes and is refused otherwise.
9. The spliced speech detection method according to claim 7, characterized in that the speech comparison module judges, according to audio fingerprints extracted from the raw speech, whether the speech uttered by the user contains identically pronounced segments, and thereby whether the identical characters are pronounced identically.
10. The spliced speech detection method according to claim 7, characterized in that the identical character contained in the text is fixed or random, the number of identical characters is fixed or random, and the positions at which the identical character appears in the text are fixed or random.
CN2010105111445A 2010-10-19 2010-10-19 Concatenated speech detection system and method Pending CN102456345A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105111445A CN102456345A (en) 2010-10-19 2010-10-19 Concatenated speech detection system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010105111445A CN102456345A (en) 2010-10-19 2010-10-19 Concatenated speech detection system and method

Publications (1)

Publication Number Publication Date
CN102456345A true CN102456345A (en) 2012-05-16

Family

ID=46039471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105111445A Pending CN102456345A (en) 2010-10-19 2010-10-19 Concatenated speech detection system and method

Country Status (1)

Country Link
CN (1) CN102456345A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102882868A (en) * 2012-09-21 2013-01-16 北京十分科技有限公司 Audio-based user login method and device
CN105185379A (en) * 2015-06-17 2015-12-23 百度在线网络技术(北京)有限公司 Voiceprint authentication method and voiceprint authentication device
CN105185379B (en) * 2015-06-17 2017-08-18 百度在线网络技术(北京)有限公司 voiceprint authentication method and device
CN106652986A (en) * 2016-12-08 2017-05-10 腾讯音乐娱乐(深圳)有限公司 Song audio splicing method and device
CN106652986B (en) * 2016-12-08 2020-03-20 腾讯音乐娱乐(深圳)有限公司 Song audio splicing method and equipment
WO2020211354A1 (en) * 2019-04-16 2020-10-22 平安科技(深圳)有限公司 Speaker identity recognition method and device based on speech content, and storage medium
CN111179912A (en) * 2019-12-05 2020-05-19 厦门快商通科技股份有限公司 Detection method, device and equipment for spliced voice
CN111009238A (en) * 2020-01-02 2020-04-14 厦门快商通科技股份有限公司 Spliced voice recognition method, device and equipment
CN111009238B (en) * 2020-01-02 2023-06-23 厦门快商通科技股份有限公司 Method, device and equipment for recognizing spliced voice

Similar Documents

Publication Publication Date Title
CN105938716B (en) A kind of sample copying voice automatic testing method based on the fitting of more precision
CN102456345A (en) Concatenated speech detection system and method
CN102163427B (en) Method for detecting audio exceptional event based on environmental model
WO2017162017A1 (en) Method and device for voice data processing and storage medium
CN105933323B (en) Voiceprint registration, authentication method and device
CN103177733B (en) Standard Chinese suffixation of a nonsyllabic "r" sound voice quality evaluating method and system
CN102456346A (en) Concatenated speech detection system and method
CN105913850B (en) Text correlation vocal print method of password authentication
CN102402985A (en) Voiceprint authentication system for improving voiceprint identification safety and method for realizing the same
CN105933272A (en) Voiceprint recognition method capable of preventing recording attack, server, terminal, and system
CN102223367B (en) Method, device and system for accessing website of mobile subscriber
CN106128465A (en) A kind of Voiceprint Recognition System and method
CN102543084A (en) Online voiceprint recognition system and implementation method thereof
CN104538034A (en) Voice recognition method and system
CN104064189A (en) Vocal print dynamic password modeling and verification method
CN106898355B (en) Speaker identification method based on secondary modeling
CN103985390A (en) Method for extracting phonetic feature parameters based on gammatone relevant images
KR102607373B1 (en) Apparatus and method for recognizing emotion in speech
US7650281B1 (en) Method of comparing voice signals that reduces false alarms
CN102411929A (en) Voiceprint authentication system and implementation method thereof
CN102915740A (en) Phonetic empathy Hash content authentication method capable of implementing tamper localization
CN103778917A (en) System and method for detecting identity impersonation in telephone satisfaction survey
CN105283916A (en) Digital-watermark embedding device, digital-watermark embedding method, and digital-watermark embedding program
CN105118516A (en) Identification method of engineering machinery based on sound linear prediction cepstrum coefficients (LPCC)
KR102140770B1 (en) Method for unlocking user equipment based on voice, user equipment releasing lock based on voice and computer readable medium having computer program recorded therefor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120516