CN118282704A

CN118282704A - Safe communication system and method based on voiceprint recognition

Info

Publication number: CN118282704A
Application number: CN202410150569.XA
Authority: CN
Inventors: 谢娅娅
Original assignee: Jingchu University of Technology
Current assignee: Jingchu University of Technology
Priority date: 2024-02-02
Filing date: 2024-02-02
Publication date: 2024-07-02

Abstract

The invention relates to a secure communication system and method based on voiceprint recognition, belonging to the technical field of communication security. A password server sends a verification password to a user and to a verification server, and the user is required to input voice data containing both a registration password and the verification password. The voice data is then segmented by endpoint detection, using the known number of characters in the registration password. Voiceprint extraction is performed on the segmented registration password audio data to obtain input voiceprint features, and speech recognition is performed on the segmented verification password audio data to obtain the input password content. The registration password and the verification password are then verified separately, thereby achieving secure verification of the communication. To pass verification, a user must have the correct voice, must speak the correct registration password so that voiceprint recognition succeeds, and must speak the correct verification password, which greatly improves security.

Description

Safe communication system and method based on voiceprint recognition

Technical Field

The invention relates to the technical field of communication security, and in particular to a secure communication system and method based on voiceprint recognition.

Background

Communication security is becoming increasingly important in the modern information age. With the development of technology, people increasingly rely on various communication modes to exchange information, and also rely on software clients that communicate with other devices or servers to complete tasks. However, this also makes the communication channel a target for hackers and criminals. Communication security is therefore crucial for protecting personal privacy and business secrets and for ensuring the security and confidentiality of information.

Voiceprint recognition is a biometric technology based on voice signal analysis. A voiceprint model of an individual is created and identified by analyzing individual speech characteristics such as frequency, pitch, and resonance. Voiceprint recognition is non-contact and non-invasive, and it operates in real time, so that recognition can be completed within a short time. Voiceprint recognition is also not tied to a particular environment or language, and can recognize speakers accurately across different languages and acoustic environments. These advantages make voiceprint recognition one of the important biometric recognition technologies, with broad application prospects in the security field.

Therefore, how to improve the security of communication based on voiceprint recognition technology is a problem to be solved.

Disclosure of Invention

Therefore, the invention provides a secure communication system and a secure communication method based on voiceprint recognition, which are used for solving the problem of how to improve the security of communication based on voiceprint recognition technology.

The invention provides a secure communication system based on voiceprint recognition, which comprises a password server and a verification server, the password server being in communication connection with the verification server, wherein:

The password server is used for sending a randomly generated verification password to the user and the verification server;

The verification server is used for:

acquiring voice data input by a user, wherein the content of the voice data comprises a registration password and a verification password;

performing end point detection on the voice data, and dividing the voice data according to the end point detection result and the character number of the registration password to obtain registration password audio data and verification password audio data;

voiceprint recognition is carried out on the registered password audio data to obtain input voiceprint characteristics;

Performing voice recognition on the verification password audio data to obtain input password content;

and comparing the input voiceprint characteristics with voiceprint verification information preregistered in a verification server by the user, comparing the input password content with the verification password, and establishing communication connection of the user according to the comparison result.

In a preferred implementation manner, the performing endpoint detection on the voice data, and dividing the voice data according to the endpoint detection result and the character number of the registration password to obtain registration password audio data and verification password audio data, includes:

acquiring the character number of a registered password;

performing end point detection on the voice data based on a spectral entropy method to obtain the starting time and the ending time of each character in the voice data;

Obtaining a segmentation time point of the voice data based on the character number of the registered password according to the starting time and the ending time of each character in the voice data;

Based on the starting time and the ending time of each character in the voice data, the voice data is segmented by combining the segmentation time points of the voice data, and the registration password audio data and the verification password audio data are obtained.

In a preferred implementation, the registration password precedes the verification password in the voice data; performing endpoint detection on the voice data based on the spectral entropy method to obtain the start time and the end time of each character in the voice data comprises the following steps:

Carrying out framing, windowing and Fourier transformation on voice data to obtain a plurality of voice frames, and calculating the energy probability density of each sampling point in each voice frame;

Based on the energy probability density of each sampling point in each voice frame, sequentially calculating the spectrum entropy value of the voice frame according to a first preset spectrum entropy calculation function, and judging whether the voice frame corresponding to the spectrum entropy value is a first active frame according to the magnitude relation between the spectrum entropy value and a first preset threshold value;

According to the continuous relation among a plurality of known first active frames, recognizing the starting time and the ending time of characters corresponding to the registered password in the voice data, and counting the number of the known characters;

If the number of the known characters exceeds the number of the characters of the registration password, continuously and sequentially calculating the spectral entropy value of the voice frame according to a second preset spectral entropy calculation function based on the energy probability density of each sampling point in each voice frame, and judging whether the voice frame corresponding to the spectral entropy value is a second active frame or not according to the size relation between the spectral entropy value and a second preset threshold, wherein the computer operation complexity of the second preset spectral entropy calculation function is lower than that of the first preset spectral entropy calculation function;

The start time and the end time of the character corresponding to the verification password in the voice data are identified according to the continuous relation among the plurality of known second active frames.

In a preferred implementation, the first preset spectral entropy calculation function is:

$$H_1 = -\sum_{i=1}^{N} p_i \log p_i$$

The second preset spectral entropy calculation function is:

$$H_2 = -\sum_{i=1}^{N} p_i \left(a\,p_i + b\right)$$

In the above two functions, $H_1$ is the spectral entropy value calculated by the first preset spectral entropy calculation function, $H_2$ is the spectral entropy value calculated by the second preset spectral entropy calculation function, $i$ denotes the $i$-th sampling point in a speech frame, $N$ is the total number of sampling points in a speech frame, $p_i$ is the energy probability density of the $i$-th sampling point in the speech frame, and $a$ and $b$ are different fitting parameters.

In a preferred implementation, the voice print recognition of the registered password audio data to obtain the input voice print feature includes:

according to the registration password, performing recognition on the registration password audio data using a text-dependent voiceprint recognition algorithm to obtain the input voiceprint features.

In a preferred implementation, the performing speech recognition on the verification password audio data to obtain the input password content includes:

Obtaining the character number of the input verification password according to the starting time and the ending time of the characters corresponding to the verification password in the voice data;

Feature extraction is carried out on the verification password audio data to obtain feature data, and the feature data and the character number of the input verification password are combined to obtain input data;

and inputting the input data into a preset neural network voice recognition model to obtain the content of the input password.

In a preferred implementation manner, the feature extraction of the verification password audio data to obtain feature data, and combining the feature data with the input verification password character number to obtain input data, includes:

Dividing the verification password audio data into a plurality of voice fragments in the time domain, wherein the number of the voice fragments is the same as the number of characters of the input verification password;

performing MFCC feature extraction on each voice segment to obtain a waveform feature vector of each voice segment;

And obtaining an input matrix as input data according to the waveform characteristic vector of each voice segment.

In a preferred implementation manner, the comparing the input voiceprint feature with voiceprint verification information pre-registered by the user in the verification server, comparing the input password content with the verification password, and establishing a communication connection of the user according to the comparison result, including:

comparing the input voiceprint characteristics with voiceprint verification information preregistered by a user in a verification server to obtain a first similarity;

comparing the content of the input password with the verification password to obtain a second similarity;

based on a preset similarity calculation formula, calculating overall similarity according to the first similarity and the second similarity;

comparing the overall similarity with a preset similarity threshold;

If the overall similarity exceeds a preset similarity threshold, judging that the user passes the verification and establishing communication connection of the user;

If the overall similarity does not exceed the preset similarity threshold, judging that the user verification fails;

The preset similarity calculation formula is as follows:

In this formula, the quantities are, in order: the overall similarity; the first similarity; the second similarity; two different weights, one of which is greater than the other; and two different variable control parameters, one of which is greater than the other.

The invention also provides a safe communication method based on voiceprint recognition, which comprises the following steps:

acquiring voice data input by a user through a verification server, wherein the content of the voice data comprises a registration password and a verification password;

The beneficial effects of adopting the embodiment are as follows:

The invention provides a secure communication system and a secure communication method based on voiceprint recognition. A password server sends a verification password to a user and to a verification server, and the user is required to input voice data containing both a registration password and the verification password. The voice data is segmented by endpoint detection, using the known number of characters in the registration password. Voiceprint extraction is then performed on the segmented registration password audio data to obtain input voiceprint features, and speech recognition is performed on the segmented verification password audio data to obtain the input password content. The registration password and the verification password are then verified separately, thereby achieving secure verification of the communication. To pass verification, a user must have the correct voice, must speak the correct registration password so that voiceprint recognition succeeds, and must speak the correct verification password, which greatly improves security. In addition, in the prior art, endpoint detection is usually only a preprocessing step for voice data. In the invention, after preprocessing by endpoint detection, the detection result is reused together with the number of characters in the registration password as a reference, so that the registration password part and the verification password part of the voice data can be separated, which reduces the complexity of the overall scheme.

Drawings

FIG. 1 is a system architecture diagram of a secure communication system based on voiceprint recognition provided by the present invention;

FIG. 2 is a flow chart of a method performed by the authentication server of FIG. 1;

FIG. 3 is a flowchart of the method of step S202 in FIG. 2;

FIG. 4 is a flowchart of the method of step S302 in FIG. 3;

FIG. 5 is a flowchart of the method of step S204 in FIG. 2.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to FIG. 1 and FIG. 2, a secure communication system based on voiceprint recognition is disclosed, which includes a password server 100 and an authentication server 200 in communication connection with each other, wherein:

The password server 100 is used to transmit a randomly generated authentication password to the user 300 and the authentication server 200;

the authentication server 200 is configured to:

S201, voice data input by a user is obtained, and the content of the voice data comprises a registration password and a verification password;

S202, carrying out end point detection on voice data, and dividing the voice data according to the end point detection result and the character number of the registered password to obtain registered password audio data and verification password audio data;

S203, voiceprint recognition is carried out on the registered password audio data to obtain input voiceprint features;

S204, performing voice recognition on the verification password audio data to obtain input password content;

S205, comparing the input voiceprint characteristics with voiceprint verification information pre-registered by the user in a verification server, comparing the input password content with the verification password, and establishing communication connection of the user according to the comparison result.

The registration password is a password that the user registers in the verification server in advance, and the voiceprint verification information registered by the user in the verification server is extracted based on this registration password. The verification password is a dynamic password that is randomly generated in real time. The registration password audio data is the registration password portion extracted from the voice data after endpoint detection; similarly, the verification password audio data is the verification password portion extracted from the voice data after endpoint detection. The input voiceprint feature is the voiceprint feature of the voice actually input by the user, and the input password content is the verification password content actually spoken by the user.

The present invention also provides a more detailed embodiment for clearly illustrating the above process:

For example, the voice data input by the user may be "blue sky and white cloud 2345", where "blue sky and white cloud" is a registered password that the user registers in advance in the authentication server, and voiceprint authentication information is preregistered in the authentication server based on the registered password, and "2345" is a randomly generated authentication password, which may be transmitted to the user by way of an authentication code. In addition to being able to enter the correct sound, the user needs to know "blue sky and white cloud" and "2345" to be able to pass verification.
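The example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `generate_verification_password` and `expected_utterance`, the use of a numeric code, and the default length of four digits are hypothetical and are not specified by the embodiment.

```python
import secrets


def generate_verification_password(length: int = 4) -> str:
    """Return a random numeric one-time code (hypothetical helper, length assumed)."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def expected_utterance(registration_password: str, verification_password: str) -> str:
    """Registration password first, verification password second, as in the example."""
    return f"{registration_password} {verification_password}"


# The password server would send `code` to both the user and the verification server.
code = generate_verification_password()
print(expected_utterance("blue sky and white cloud", code))
```

The `secrets` module is used rather than `random` because the code acts as a one-time credential.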

The user's local client only needs to receive and send data without any processing, which is flexible and convenient. Once the structure of the voice data has been preset, the verification server only needs to locate each character in the voice data through endpoint detection, and can then segment the voice data using nothing more than the number of characters in the registration password. For example, if the registration password "blue sky and white cloud" has four characters, then after endpoint detection the verification server can directly take the end time of the fourth character as the segmentation time point, and can accurately extract the registration password part and the verification password part of the voice data. Of course, the voice data may instead be "2345 blue sky and white cloud"; the order only needs to be agreed in advance between the user and the verification server (it may be fixed during software development).

Specifically, in connection with fig. 3, in a preferred embodiment, the step S202 performs endpoint detection on the voice data, and segments the voice data according to the result of the endpoint detection and the number of characters of the registered password to obtain the registered password audio data and the verification password audio data, which specifically includes:

S301, acquiring the character number of a registered password;

S302, carrying out end point detection on voice data based on a spectral entropy method to obtain the starting time and the ending time of each character in the voice data;

S303, obtaining a segmentation time point of the voice data based on the number of characters of the registered password according to the starting time and the ending time of each character in the voice data;

S304, dividing the voice data based on the starting time and the ending time of each character in the voice data and combining the dividing time points of the voice data to obtain the registration password audio data and the verification password audio data.
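Steps S301 to S304 can be illustrated with a minimal Python sketch. It assumes that endpoint detection has already produced one `(start, end)` pair per spoken character and that the registration password comes first; the function name `split_by_char_count` and the timing values are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_time, end_time) of one spoken character, in seconds


def split_by_char_count(spans: List[Span], reg_len: int) -> Tuple[float, List[Span], List[Span]]:
    """Split detected character spans after the reg_len-th character.

    Returns the segmentation time point (end time of the last registration
    password character) and the spans belonging to each part.
    """
    if not 0 < reg_len < len(spans):
        raise ValueError("expected more detected characters than the registration password has")
    split_time = spans[reg_len - 1][1]
    return split_time, spans[:reg_len], spans[reg_len:]


# Made-up timings: four registration password characters followed by four digits.
spans = [(0.20, 0.45), (0.50, 0.78), (0.85, 1.10), (1.15, 1.42),
         (1.90, 2.10), (2.18, 2.40), (2.47, 2.70), (2.76, 3.00)]
split_time, reg_spans, ver_spans = split_by_char_count(spans, reg_len=4)
print(split_time, len(reg_spans), len(ver_spans))
```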

Spectral entropy is a commonly used endpoint detection method that determines the boundaries of speech activity and non-speech activity by analyzing the spectrum of a speech signal. In spectral entropy methods, the complexity and information content of a spectrum are measured by calculating entropy values over the spectrum. During voice activity, the spectrum of the voice signal fluctuates greatly, and the entropy value is high; while during non-speech activity the spectral change is smaller and the entropy is lower. From the change in entropy values, the start and end points of the speech can be found.
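As a rough illustration of the quantity being thresholded, the following Python sketch computes the spectral entropy of a single frame using only the standard library. The Hamming window, the naive DFT, the natural logarithm, and the handling of an all-zero frame are assumptions made for this sketch; the function names are hypothetical. Which side of the threshold counts as speech depends on the signal and noise conditions and is left to the caller.

```python
import cmath
import math
from typing import List, Sequence


def hamming_window(n: int) -> List[float]:
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def energy_probability_density(frame: Sequence[float]) -> List[float]:
    """Window one frame, take a naive DFT, and normalise the power spectrum to sum to 1."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming_window(n))]
    power = []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(windowed))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0.0:
        # Convention chosen here for an all-zero frame: a flat density.
        return [1.0 / len(power)] * len(power)
    return [value / total for value in power]


def spectral_entropy(density: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a normalised spectrum."""
    return -sum(p * math.log(p) for p in density if p > 0.0)


# Toy frames of 64 samples: a pure tone (peaked spectrum) and an impulse (flat spectrum).
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
impulse = [1.0] + [0.0] * 63
print(round(spectral_entropy(energy_probability_density(tone)), 3))
print(round(spectral_entropy(energy_probability_density(impulse)), 3))
```

A production system would use an FFT routine instead of the naive DFT; the slower form is used here only to keep the sketch dependency-free.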

Spectral entropy methods have several important advantages. First, it has high robustness, and can accurately detect the start and end points of speech in the case of noisy environments or poor speech quality. The spectral entropy method has good adaptability and reliability in practical application. And secondly, the spectral entropy method has high calculation speed, can meet the real-time requirement, and is suitable for voice communication and voice recognition scenes needing quick response. In addition, the spectral entropy method is suitable for various voice materials and languages, and has wide applicability. It is not limited by specific voice characteristics and can adapt to different voice signal characteristics. Finally, the spectral entropy method has parameter adjustability, and parameter adjustment can be performed according to specific application scenes, so that better detection effect and accuracy are obtained.

It should be noted that, in the above process, besides dividing the voice data into the registration password part and the verification password part, the preprocessing function of endpoint detection itself is also completed when the registration password audio data and the verification password audio data are obtained from the start time and the end time of each character. For example, the silent parts before speech begins and after it ends are removed from the audio signal, and the blank parts between the end point of one character and the start point of the next character are removed, which reduces noise, reduces the amount of data, and improves the recognition rate. The registration password audio data and the verification password audio data obtained in this step are therefore data that have already undergone endpoint detection. In the subsequent voiceprint recognition and voice recognition processes, preprocessing steps such as framing and windowing may be performed again, but endpoint detection does not need to be repeated, which reduces the redundancy of the overall method and improves operating efficiency.
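The silence removal described above amounts to keeping only the samples that fall inside detected character spans. A minimal Python sketch follows; the function name `keep_character_audio` and the rounding of times to sample indices are assumptions of this illustration.

```python
from typing import List, Sequence, Tuple

Span = Tuple[float, float]  # (start_time, end_time) in seconds


def keep_character_audio(samples: Sequence[float], spans: Sequence[Span], sample_rate: int) -> List[float]:
    """Concatenate the samples inside each character span, dropping leading,
    trailing, and inter-character silence."""
    kept: List[float] = []
    for start, end in spans:
        first = int(round(start * sample_rate))
        last = int(round(end * sample_rate))
        kept.extend(samples[first:last])
    return kept


# Toy signal: 100 samples at 10 Hz, with two "characters" at 1.0-2.0 s and 5.0-6.5 s.
samples = list(range(100))
trimmed = keep_character_audio(samples, [(1.0, 2.0), (5.0, 6.5)], sample_rate=10)
print(len(trimmed))
```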

Further, as shown in connection with FIG. 4, in a preferred embodiment, the location of the registered password in the voice data precedes the verification password; step S302, performing endpoint detection on the voice data based on the spectral entropy method to obtain a start time and an end time of each character in the voice data, which specifically includes:

S401, framing, windowing and Fourier transforming are carried out on voice data to obtain a plurality of voice frames, and the energy probability density of each sampling point in each voice frame is calculated;

S402, based on the energy probability density of each sampling point in each voice frame, sequentially calculating the spectrum entropy value of the voice frame according to a first preset spectrum entropy calculation function, and judging whether the voice frame corresponding to the spectrum entropy value is a first active frame according to the magnitude relation between the spectrum entropy value and a first preset threshold value;

S403, recognizing the starting time and the ending time of characters corresponding to the registered password in the voice data according to the continuous relation among a plurality of known first active frames, and counting the number of the known characters;

S404, if the known character number exceeds the character number of the registration password, continuously and sequentially calculating the spectral entropy value of the voice frame according to a second preset spectral entropy calculation function based on the energy probability density of each sampling point in each voice frame, and judging whether the voice frame corresponding to the spectral entropy value is a second active frame or not according to the size relation between the spectral entropy value and a second preset threshold, wherein the computer operation complexity of the second preset spectral entropy calculation function is lower than that of the first preset spectral entropy calculation function;

S405, according to the continuous relation among a plurality of known second active frames, identifying the starting time and the ending time of the characters corresponding to the verification password in the voice data.
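The switching logic of steps S402 to S405 can be sketched as follows. This is an illustration under several assumptions: the per-frame energy probability densities are already available, a character is counted when a run of active frames begins, the detector switches to the second function as soon as that count exceeds the registration password length, and a frame is treated as active when its entropy is above the threshold (exposed as the `active_above` flag). All names, the toy densities, and the linear parameters are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Density = Sequence[float]
FrameSpan = Tuple[int, int]  # (first_frame, last_frame), inclusive frame indices


def two_stage_endpoints(
    densities: Sequence[Density],
    reg_len: int,
    entropy1: Callable[[Density], float],
    thr1: float,
    entropy2: Callable[[Density], float],
    thr2: float,
    active_above: bool = True,
) -> Tuple[List[FrameSpan], List[FrameSpan]]:
    """Return (registration spans, verification spans) as runs of active frames."""
    reg_spans: List[FrameSpan] = []
    ver_spans: List[FrameSpan] = []
    use_second = False
    run_start = None
    known = 0  # number of characters whose start has been seen so far
    for idx, p in enumerate(densities):
        value, thr = (entropy2(p), thr2) if use_second else (entropy1(p), thr1)
        active = value > thr if active_above else value < thr
        if active and run_start is None:
            run_start = idx
            known += 1
            if known > reg_len:
                use_second = True  # remaining frames use the cheaper function
        elif not active and run_start is not None:
            (reg_spans if known <= reg_len else ver_spans).append((run_start, idx - 1))
            run_start = None
    if run_start is not None:  # utterance ended inside a character
        (reg_spans if known <= reg_len else ver_spans).append((run_start, len(densities) - 1))
    return reg_spans, ver_spans


def exact_entropy(p: Density) -> float:
    return -sum(x * math.log(x) for x in p if x > 0.0)


def linear_entropy(p: Density, a: float = 3.0, b: float = -3.0) -> float:
    # Logarithm replaced by the line a*x + b; a and b are illustrative values.
    return -sum(x * (a * x + b) for x in p)


A = [0.25, 0.25, 0.25, 0.25]  # toy "active" frame (high entropy)
S = [0.97, 0.01, 0.01, 0.01]  # toy "silent" frame (low entropy)
frames = [S, A, A, S, A, S, S, A, A, S, A, S]
reg, ver = two_stage_endpoints(frames, 2, exact_entropy, 1.0, linear_entropy, 1.0)
print(reg, ver)
```

Frame indices convert to times by multiplying by the frame hop and dividing by the sampling rate, which is omitted here for brevity.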

The above procedure is an improvement on the existing spectral entropy method. Because the registration password comes first and the verification password comes second in the voice data of this embodiment, the spectral entropy method first calculates the spectral entropy of the voice frames corresponding to the characters of the registration password. In this embodiment, voiceprint detection is needed only for the registration password part, and the start and end points of that speech must be segmented accurately to ensure correct extraction of the input voiceprint features. For the part containing the verification password, a more robust voice recognition model can be used to extract the text content and obtain the input password content. Therefore, in this embodiment, the first preset spectral entropy calculation function is used to calculate the spectral entropy value while the number of detected characters is counted in real time. If that number exceeds the number of characters in the registration password, the voice frame whose spectral entropy is currently to be calculated can be considered to belong to the verification password part of the voice data. From that point on, the second preset spectral entropy calculation function, which has lower complexity, can be used to calculate the spectral entropy value; some accuracy may be sacrificed, in exchange for faster running speed and a lighter computational load.

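Specifically, in a preferred embodiment, the first preset spectral entropy calculation function is:

$$H_1 = -\sum_{i=1}^{N} p_i \log p_i$$

The second preset spectral entropy calculation function is:

$$H_2 = -\sum_{i=1}^{N} p_i \left(a\,p_i + b\right)$$

Here $p_i$ is the energy probability density of the $i$-th sampling point in the speech frame, $N$ is the total number of sampling points in the speech frame, and $a$ and $b$ are fitting parameters. Compared with the first preset spectral entropy function, the second preset spectral entropy function approximates the complex logarithmic function with a simple linear function so as to reduce the computational load on the computer.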
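The effect of replacing the logarithm with a linear function can be explored with a small Python sketch. It assumes the linear form $a\,p + b$ and obtains $a$ and $b$ by an ordinary least-squares fit of $\log p$ over a chosen set of sample points; the embodiment does not state how its fitting parameters are obtained, so the fitting procedure, the sample points, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_linear_log(points: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line a*x + b through (x, log x) for the given x values."""
    n = len(points)
    ys = [math.log(x) for x in points]
    mean_x = sum(points) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(points, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def entropy_log(p: Sequence[float]) -> float:
    """Entropy using the logarithm."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def entropy_linear(p: Sequence[float], a: float, b: float) -> float:
    """Entropy with log x replaced by a*x + b (no logarithm evaluated per point)."""
    return -sum(x * (a * x + b) for x in p)


# Fit over an assumed range of density values.
grid: List[float] = [k / 100 for k in range(1, 51)]
a, b = fit_linear_log(grid)

flat = [0.125] * 8
peaked = [0.65] + [0.05] * 7
for name, p in (("flat", flat), ("peaked", peaked)):
    print(name, round(entropy_log(p), 3), round(entropy_linear(p, a, b), 3))
```

Because the densities sum to 1, the linear form reduces to $-(a\sum_i p_i^2 + b)$, so only a sum of squares is needed per frame. The two functions produce different values, which is why a separate second threshold is needed.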

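Further, in a preferred embodiment, the step S203 performs voiceprint recognition on the registered password audio data to obtain an input voiceprint feature, which specifically includes:

according to the registration password, performing recognition on the registration password audio data using a text-dependent voiceprint recognition algorithm to obtain the input voiceprint features.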

Voiceprint recognition algorithms include text-dependent and text-independent types. A text-dependent voiceprint recognition algorithm performs recognition based on the characteristics of specific speech content. Such an algorithm can extract voiceprint features more accurately, but it requires the user to speak the same text as the preset text, which is inflexible in practice. A text-independent voiceprint recognition algorithm does not depend on the speech content, but in practice it is affected by factors such as the environment, the speaker's emotional state, and the manner of speaking, so the recognition model is relatively difficult to build and the feature extraction precision is lower.

In this embodiment, the registration password is a known text that serves as a key, so it is more suitable to extract the voiceprint with a text-dependent algorithm, which improves voiceprint recognition accuracy. The verification password, because it is randomly generated, is subjected to voice recognition to extract its content rather than to voiceprint recognition. This avoids the drawbacks of text-independent voiceprint recognition algorithms and improves the feasibility and reliability of the overall scheme.

Further, referring to fig. 5, in a preferred embodiment, the step S204 of performing speech recognition on the verification password audio data to obtain the input password content specifically includes:

S501, obtaining the character number of the input verification password according to the starting time and the ending time of the characters corresponding to the verification password in the voice data;

S502, extracting features of the verification password audio data to obtain feature data, and combining the feature data with the input verification password character number to obtain input data;

S503, inputting the input data into a preset neural network voice recognition model to obtain the input password content.

The above process makes further use of the result of the endpoint detection preprocessing step. The number of characters in the spoken verification password is determined from the start times and end times of the characters that have already been detected, and this number is supplied, together with the feature data, as known information in the input data of a preset neural network voice recognition model, so as to improve the speed and accuracy of voice recognition.

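The number of characters in the input verification password can be recorded in the input data as explicit data, can take part in the operation of the preset neural network voice recognition model as a known parameter, or can be combined with the input data implicitly as a constraint on the input data format. For example, in a preferred embodiment, the step S502 performs feature extraction on the verification password audio data to obtain feature data, and combines the feature data with the input verification password character number to obtain input data, which specifically includes:

dividing the verification password audio data into a plurality of voice segments in the time domain, wherein the number of voice segments is the same as the number of characters of the input verification password;

performing MFCC feature extraction on each voice segment to obtain a waveform feature vector of each voice segment;

obtaining an input matrix as the input data according to the waveform feature vector of each voice segment.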

Assuming that each waveform feature vector is a row vector in the input matrix, the number of rows in the input matrix is the number of input authentication password characters, and the input matrix implicitly contains the information of the number of input authentication password characters although the number of input authentication password characters is not explicitly described.

The present invention also provides a more detailed embodiment for clearly illustrating the above step S502:

Assume that the user's speaking rate during voice input is stable and uniform, and that endpoint detection has identified 10 characters in the verification password audio data. The verification password audio data can then be divided evenly into 10 short speech segments, and MFCC features are extracted for each segment (MFCC, Mel-frequency cepstral coefficients, is prior art understood by those skilled in the art and is not described further here).

For each speech segment, a fixed time window (e.g., 25 milliseconds) may be selected, and the acoustic waveform within this time window is used for MFCC feature extraction. Assuming a sampling rate of 16 kHz for a speech segment, a time window of 25 milliseconds corresponds to 16000 × 0.025 = 400 sampling points. Thus, for each speech segment, the input data may be a waveform feature vector of size 400, and for the entire verification password audio data, the input data is an input matrix composed of waveform feature vectors, the number of waveform feature vectors in the input matrix being 10.
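The arithmetic and the shape of the input matrix in this example can be checked with a short Python sketch. It only performs the segmentation and windowing; the MFCC computation itself is not implemented. Taking the window from the centre of each segment is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Sequence


def window_length(sample_rate_hz: int, window_ms: float) -> int:
    """Number of samples covered by a window of window_ms milliseconds."""
    return int(round(sample_rate_hz * window_ms / 1000))


def build_input_matrix(
    samples: Sequence[float],
    n_chars: int,
    sample_rate_hz: int = 16000,
    window_ms: float = 25.0,
) -> List[List[float]]:
    """Split the audio evenly into n_chars segments and take one fixed window per segment.

    Each row is the window on which feature extraction would be run; the number
    of rows implicitly carries the number of spoken characters.
    """
    win = window_length(sample_rate_hz, window_ms)
    seg_len = len(samples) // n_chars
    if seg_len < win:
        raise ValueError("segments are shorter than the analysis window")
    rows: List[List[float]] = []
    for k in range(n_chars):
        offset = k * seg_len + (seg_len - win) // 2  # centre of segment k (assumed)
        rows.append(list(samples[offset:offset + win]))
    return rows


# One second of placeholder audio at 16 kHz containing 10 characters.
audio = [0.0] * 16000
matrix = build_input_matrix(audio, n_chars=10)
print(len(matrix), len(matrix[0]))
```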

Further, in a preferred embodiment, step S205, in which the input voiceprint features are compared with the voiceprint verification information pre-registered by the user in the verification server, the input password content is compared with the verification password, and a communication connection is established for the user according to the comparison result, specifically includes:

comparing the overall similarity with a preset similarity threshold;

The preset similarity calculation formula is as follows:

In the above formula, the first similarity, which represents the similarity between the input voiceprint features and the pre-registered voiceprint verification information, is the main determining factor, and the second similarity is an auxiliary factor. By setting the weights appropriately and adjusting the control parameters, the following effects can be achieved:

When the first similarity is high, the user's voice is accurate and the speaker is very likely the user himself, so the overall similarity increases. If the second similarity is also high, the user has received the correct verification password, the overall similarity increases further, and the user passes the verification. Conversely, if the second similarity is low in this case, the user may not have received the verification password, and the overall similarity is reduced accordingly. However, a low second similarity may also be caused by signal delay, noise, or data transmission errors, so the overall similarity calculated by the above formula does not become too low, and the user can still pass the verification.

When the first similarity is low, the user's voice is inaccurate and the speaker is very likely not the user himself. The risk is high in this case and should be handled cautiously, so the calculated overall similarity is low. Even if the second similarity is high, the calculated overall similarity does not necessarily reach a high level, which prevents lawbreakers from passing the verification by stealing the verification password.
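The behaviour described above can be illustrated with a small sketch. The fusion function below is a hypothetical stand-in chosen only because it reproduces the described effects; it is not the preset similarity calculation formula of the invention, which is not reproduced in this text. The weights `w1` and `w2`, the control parameter `k`, and the threshold value are likewise assumed for illustration.

```python
def overall_similarity(s1, s2, w1=0.7, w2=0.3, k=2.0):
    """Hypothetical fusion of two similarities in [0, 1].

    s1: first similarity (voiceprint match), the main determining factor.
    s2: second similarity (password content match), the auxiliary factor.
    w1, w2, k: assumed weights and control parameter, not from the patent.
    """
    if not (0.0 <= s1 <= 1.0 and 0.0 <= s2 <= 1.0):
        raise ValueError("similarities must lie in [0, 1]")
    # s1 scales the whole result; s2 can only adjust it within [w1, w1 + w2].
    return (s1 ** k) * (w1 + w2 * s2)


def passes(s1, s2, threshold=0.6):
    """Compare the overall similarity with an assumed preset threshold."""
    return overall_similarity(s1, s2) >= threshold


# Accurate voice, correct password: passes.
print(passes(0.95, 1.0))
# Accurate voice, poorly recognized password (e.g. noise): can still pass.
print(passes(0.95, 0.3))
# Inaccurate voice, correct (possibly stolen) password: rejected.
print(passes(0.40, 1.0))
```

In this sketch a high second similarity cannot compensate for a low first similarity, because the first similarity multiplies the entire result. Other functional forms with the same property would serve equally well.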

Further, the invention also provides a safe communication method based on voiceprint recognition, which comprises the following steps:

The principle and advantageous effects of the above method are as described in the foregoing and are not repeated here.

It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be referred to one another.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A secure communication system based on voiceprint recognition, comprising a password server and a verification server, the password server and the verification server being communicatively coupled, wherein:

The verification server is used for:

2. The voiceprint recognition based secure communication system of claim 1, wherein the performing endpoint detection on the voice data and dividing the voice data according to the endpoint detection result and the number of characters of the registration password to obtain the registration password audio data and the verification password audio data comprises:

acquiring the character number of a registered password;

3. The voiceprint recognition based secure communication system of claim 2, wherein the registration password precedes the verification password in the voice data; and the performing endpoint detection on the voice data based on the spectral entropy method to obtain the starting time and the ending time of each character in the voice data comprises:

4. The voiceprint recognition based secure communication system of claim 3, wherein the first preset spectral entropy calculation function is:

The second preset spectral entropy calculation function is:

5. The voiceprint recognition based secure communication system of claim 3, wherein the performing voiceprint recognition on the registration password audio data to obtain input voiceprint features comprises:

6. The voiceprint recognition based secure communication system of claim 3, wherein the performing voice recognition on the verification password audio data to obtain the input password content comprises:

7. The voiceprint recognition based secure communication system of claim 6, wherein the performing feature extraction on the verification password audio data to obtain feature data, and combining the feature data with the number of input verification password characters to obtain input data, comprises:

8. The voiceprint recognition based secure communication system of claim 1, wherein the comparing the input voiceprint features with voiceprint verification information pre-registered by the user in the verification server, comparing the input password content with the verification password, and establishing a communication connection for the user based on the comparison result comprises:

comparing the overall similarity with a preset similarity threshold;

The preset similarity calculation formula is as follows:

9. A secure communication method based on voiceprint recognition, comprising: