CN114387968A - Voice unlocking method and device, electronic equipment and storage medium

Voice unlocking method and device, electronic equipment and storage medium

Info

Publication number
CN114387968A
Authority
CN
China
Prior art keywords
text, target, voice information, model, voiceprint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210016005.8A
Other languages
Chinese (zh)
Inventor
李忠泽
张鹏
贾巨涛
邹佳悦
吴伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai, Zhuhai Lianyun Technology Co Ltd filed Critical Gree Electric Appliances Inc of Zhuhai
Priority to CN202210016005.8A
Publication of CN114387968A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L15/08 Speech classification or search
    • G10L15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems
    • G10L2015/0631 Creating reference templates; Clustering
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application relates to a voice unlocking method and apparatus, an electronic device, and a storage medium. The voice unlocking method includes: establishing a user voiceprint model according to target voice information acquired at the current moment; if the user voiceprint model matches the voiceprint recognition model, matching the text corresponding to the target voice information with a target text to obtain a text matching result; and determining whether to unlock the device based on the text matching result. In this application, voiceprint recognition and target-text recognition are simpler than face recognition and fingerprint recognition, so unlocking the device through the voiceprint and the target text avoids the prior-art situation in which the face and the under-screen fingerprint are difficult to recognize and the device fails to unlock, ensuring unlocking security while improving unlocking convenience.

Description

Voice unlocking method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a voice unlocking method and apparatus, an electronic device, and a storage medium.
Background
Under current epidemic conditions, people are accustomed to wearing masks, which makes unlocking a device by face recognition inconvenient. Moreover, because face unlocking must maintain high security, the face may be difficult to identify and unlocking may fail when the user switches contact lenses, wears makeup, or when ambient light is insufficient. In addition, more and more mobile terminals adopt a full-screen design, so conventional fingerprint recognition is gradually being replaced by under-screen fingerprint recognition; even the more advanced under-screen fingerprint technology based on ultrasonic feedback is constrained by the finger and the screen, so fingerprint recognition may also fail. That is, both the face and the under-screen fingerprint may be difficult to recognize, causing device unlocking to fail.
Disclosure of Invention
The application provides a voice unlocking method and apparatus, an electronic device, and a storage medium, aiming to solve the problem that device unlocking fails because the face and the under-screen fingerprint are difficult to recognize.
In a first aspect, the present application provides a voice unlocking method, including:
establishing a user voiceprint model according to the target voice information acquired at the current moment;
if the user voiceprint model matches the voiceprint recognition model, matching a text corresponding to the target voice information with a target text to obtain a text matching result;
and determining whether to unlock the device based on the text matching result.
Optionally, before establishing the user voiceprint model according to the target voice information acquired at the current moment, the method further includes:
performing voice feature extraction on voice information of at least one user in at least one environment within a preset time period to obtain at least one Mel cepstrum coefficient;
and taking the at least one Mel cepstrum coefficient as an input voice feature parameter and training a Gaussian mixture model-universal background model to obtain the voiceprint recognition model, where the voiceprint recognition model is used for recognizing the voiceprint of the at least one user.
Optionally, establishing the user voiceprint model according to the target voice information acquired at the current moment includes:
extracting a Mel cepstrum coefficient of the target voice information;
and establishing the user voiceprint model by taking the Mel cepstrum coefficient of the target voice information as a voice characteristic parameter.
Optionally, if the user voiceprint model matches the voiceprint recognition model, matching the text corresponding to the target voice information with the target text to obtain a text matching result includes:
comparing the user voiceprint model with the voiceprint recognition model;
if the matching degree of the user voiceprint model and the voiceprint recognition model is greater than a preset threshold, determining that the user voiceprint model matches the voiceprint recognition model;
and matching the text corresponding to the target voice information with the target text to obtain the text matching result.
Optionally, before matching the text corresponding to the target voice information with the target text to obtain the text matching result, the method further includes:
retaining the information in a preset frequency band of the target voice information;
and converting the target voice information in the preset frequency band into text to obtain the text corresponding to the target voice information.
Optionally, matching the text corresponding to the target voice information with the target text to obtain the text matching result includes:
comparing the text corresponding to the target voice information with the target text;
if the matching degree of the text corresponding to the target voice information and the target text is greater than or equal to a preset matching degree, determining that the text matching result is that the text corresponding to the target voice information matches the target text;
and if the matching degree of the text corresponding to the target voice information and the target text is less than the preset matching degree, determining that the text matching result is that the text corresponding to the target voice information does not match the target text.
Optionally, determining whether to unlock the device based on the text matching result includes:
if the text matching result is that the text corresponding to the target voice information matches the target text, unlocking the device;
and if the text matching result is that the text corresponding to the target voice information does not match the target text, not unlocking the device.
In a second aspect, the present application provides a voice unlocking device, including:
the establishing module is used for establishing a user voiceprint model according to the target voice information acquired at the current moment;
the matching module is used for matching the text corresponding to the target voice information with the target text to obtain a text matching result if the user voiceprint model matches the voiceprint recognition model;
and the unlocking module is used for determining whether to unlock the device based on the text matching result.
In a third aspect, the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
and the processor is used for implementing the steps of the voice unlocking method in any embodiment of the first aspect when executing the program stored in the memory.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the voice unlocking method according to any one of the embodiments of the first aspect.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
according to the voice unlocking method provided by the embodiment of the application, the user voiceprint model is established based on the target voice information, whether the text corresponding to the target voice information is matched with the target text is further compared under the condition that the user voiceprint model is matched with the voiceprint recognition model, namely the voiceprint of the target voice information is matched with the voiceprint of the user using the preset equipment, a text matching result is obtained, and whether the equipment is unlocked is determined based on the text matching result. Voiceprint recognition and target text recognition are simpler than face recognition and fingerprint recognition, so that equipment is unlocked through the voiceprint and the target text, the condition that the face and the fingerprint under the screen are difficult to recognize in the prior art and accordingly equipment unlocking failure is avoided, unlocking safety is guaranteed, and unlocking convenience is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; it is obvious that those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a schematic flowchart of a voice unlocking method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a voice unlocking process provided in an embodiment of the present application;
fig. 3 is a schematic view of a voice unlocking device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to solve the problem that device unlocking fails because the face and the under-screen fingerprint are difficult to recognize, an embodiment of the application provides a voice unlocking method applied to a processor, where the processor may be located in any device. As shown in fig. 1, the voice unlocking method includes steps 101 to 103:
step 101: and establishing a user voiceprint model according to the target voice information acquired at the current moment.
The target voice information acquired at the current moment is the voice information input by the user at the current moment in order to unlock the device.
Optionally, before step 101 is executed, a voiceprint recognition model for recognizing the voiceprint of at least one user needs to be established, so that the voiceprint recognition model can be used to determine that the target voice information was input by a legitimate user.
In one possible implementation, voice information within a preset time period is obtained, and model training is performed based on that voice information to obtain the voiceprint recognition model. The preset time period precedes the current moment; that is, the voice information within the preset time period is historical voice information input before the current moment. The preset time period may be predetermined or determined according to actual conditions; for example, it may be 20 days, one month, half a year, or a year.
Specifically, voice information of at least one user in at least one environment within the preset time period is acquired for voice feature extraction, yielding at least one Mel-frequency cepstral coefficient (MFCC) feature. Then, the at least one Mel cepstrum coefficient is used as an input voice feature parameter, that is, as an input parameter of the model, and a Gaussian mixture model-universal background model (GMM-UBM) is trained to obtain the voiceprint recognition model, which can be used to recognize the voiceprint of the at least one user.
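For illustration only, the feature-extraction step above might be sketched as follows in Python, assuming the librosa library; the sampling rate, the number of coefficients, and the function name are illustrative choices rather than values fixed by this application.

```python
import librosa

def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
    """Load one utterance and return its MFCC matrix (frames x coefficients)."""
    signal, sr = librosa.load(wav_path, sr=sr)
    # librosa frames and windows the signal internally before the Mel filterbank
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # one n_mfcc-dimensional feature vector per frame
```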
Generally, the voice information used for training the voiceprint recognition model is the voice of multiple users in a variety of environments, rather than that of a single user or a single environment. The voiceprint recognition model can thus recognize the voiceprints of multiple users and unlock devices belonging to those users.
In addition, because a small amount of voice information has only a limited influence on the voiceprint recognition model, the model does not need to be re-established before every execution of step 101. Instead, after steps 101 to 103 have been executed multiple times and a preset amount of voice information has been acquired, the voiceprint recognition model can be updated with that voice information, saving resources and reducing unnecessary waste.
The preset amount may be predetermined or determined according to actual conditions.
Of course, in order to ensure the accuracy of voiceprint recognition, the preset amount may also be set to 1; that is, the voiceprint recognition model is retrained every time one piece of voice information is acquired, yielding an updated voiceprint recognition model.
Illustratively, voice feature extraction is performed on the acquired voice information of the at least one user in the at least one environment to obtain at least one Mel cepstrum coefficient. Before the feature extraction, pre-emphasis, framing, and windowing are applied to the acquired voice information to highlight the speech portion, reduce noise interference, and improve the signal-to-noise ratio. Moreover, extracting Mel cepstrum coefficients as voice features yields better features than extraction based on Linear Predictive Coding (LPC). By adopting a multi-dimensional scheme that combines static and dynamic features, Mel cepstrum coefficients with higher accuracy and sensitivity can be extracted, which better reflect the user's voiceprint characteristics.
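A minimal sketch of the pre-emphasis, framing, and windowing steps and of the static-plus-dynamic feature scheme; the pre-emphasis coefficient and the 25 ms / 10 ms frame sizes are common defaults assumed here, not values specified by this application.

```python
import numpy as np

def preprocess(signal, alpha=0.97, frame_len=400, hop=160):
    """Pre-emphasize, frame, and Hamming-window a 16 kHz signal."""
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * np.hamming(frame_len)  # windowing reduces spectral leakage

def add_dynamic_features(static_mfcc):
    """Append first-order deltas: the 'dynamic' half of the static+dynamic scheme."""
    delta = np.gradient(static_mfcc, axis=0)
    return np.hstack([static_mfcc, delta])
```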
Illustratively, in the Mel cepstrum coefficient extraction scheme, band-limiting filters are built over the frequency range of normal human speech according to the critical bands of human auditory perception; the amplitudes of all signals within each band are weighted and summed, the natural logarithm of each sum is taken, and the Mel cepstrum coefficients are obtained through a discrete cosine transform. For details, reference may be made to the prior art, which is not repeated here.
In plain terms, a critical band is found through masking: a section of audio is masked by white noise whose bandwidth is widened until the audio becomes inaudible to the human ear; the band at the transition where the audio can just be heard is taken as the critical band, and the corresponding filter is determined from it.
Illustratively, the extracted at least one Mel cepstrum coefficient is used as the input voice feature parameter, and a Gaussian mixture model-universal background model is trained. The Gaussian mixture model is a weighted combination of several N-dimensional Gaussian distributions; speech of multiple users in various environments is collected for training, yielding a trained Gaussian mixture model corresponding to each user's voice features. The universal background model is a multi-dimensional probability density function represented by a weighted sum of N Gaussian components. The Gaussian mixture model-universal background model for multiple users is then obtained from the trained Gaussian mixture models and the universal background model.
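One common realization of such a GMM-UBM system, sketched with scikit-learn; the component count, the relevance factor, and the mean-only MAP adaptation are standard textbook choices assumed here, since this application does not fix them.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(pooled_mfcc, n_components=64):
    """Fit the universal background model on MFCCs pooled over many users and environments."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          max_iter=200, random_state=0)
    ubm.fit(pooled_mfcc)  # EM training under the maximum-likelihood criterion
    return ubm

def map_adapt_means(ubm, user_mfcc, relevance=16.0):
    """Mean-only MAP adaptation of the UBM towards a single user's features."""
    resp = ubm.predict_proba(user_mfcc)        # (frames, K) component posteriors
    counts = resp.sum(axis=0)                  # soft occupancy per component
    first_order = resp.T @ user_mfcc           # (K, dim) first-order statistics
    alpha = (counts / (counts + relevance))[:, None]
    user_means = first_order / np.maximum(counts, 1e-8)[:, None]
    return alpha * user_means + (1 - alpha) * ubm.means_
```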
In particular, the voiceprint recognition model may be updated periodically; illustratively, the period is one year, one month, one week, or the like.
Optionally, in establishing the user voiceprint model according to the target voice information acquired at the current moment, the Mel cepstrum coefficients of the target voice information are extracted and then used as voice feature parameters to establish the user voiceprint model.
Step 102: and if the user voiceprint model is matched with the voiceprint recognition model, matching the text corresponding to the target voice information with the target text to obtain a text matching result.
Optionally, it is first determined whether the user voiceprint model and the voiceprint recognition model match, and whether to match the text corresponding to the target voice information with the target text is then decided according to the determination result.
In one possible implementation, the user voiceprint model is compared with the voiceprint recognition model. If the matching degree of the user voiceprint model and the voiceprint recognition model is greater than a preset threshold, it is determined that the user voiceprint model matches the voiceprint recognition model, and the text corresponding to the target voice information is matched with the target text to obtain the text matching result; if the matching degree is less than or equal to the preset threshold, it is determined that the user voiceprint model does not match the voiceprint recognition model, and the device fails to unlock. The preset threshold may be predetermined or determined according to actual conditions.
Specifically, voice feature extraction is performed on the target voice information, the resulting Mel cepstrum coefficients are input into the voiceprint recognition model as voice feature parameters, and the output is compared with the user voiceprint model to obtain the similarity between the voiceprint recognition model and the user voiceprint model, i.e., the matching degree of the user voiceprint model and the voiceprint recognition model.
For example, taking the preset threshold value as 0.5, if the matching degree between the user voiceprint model and the voiceprint recognition model is greater than 0.5, it is determined that the user voiceprint model matches the voiceprint recognition model.
Specifically, when comparing the obtained output with the user voiceprint model, the output of the voiceprint recognition model is compared with the output of the user voiceprint model to determine their similarity; the larger the difference between the two outputs, the lower the similarity.
Illustratively, an expectation-maximization (EM) algorithm based on the maximum-likelihood criterion is used to compute the probability under the user voiceprint model. Similarly, the target result is evaluated in the same or a similar way to obtain the probability under the voiceprint recognition model, where the target result is obtained by extracting the Mel cepstrum coefficients of the target voice information and inputting them into the voiceprint recognition model as voice feature parameters. If the difference between the probability of the user voiceprint model and the probability of the voiceprint recognition model is smaller than a preset difference, the matching degree between the two models is greater than the preset threshold; if the difference equals the preset difference, the matching degree equals the preset threshold; and if the difference is larger than the preset difference, the matching degree is smaller than the preset threshold.
The preset difference may be predetermined or determined according to an actual working condition.
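Put as code, the probability comparison above can be sketched like this; GaussianMixture.score() returns the mean log-likelihood per frame, the user model is assumed to be a fitted GaussianMixture, and the preset difference of 0.5 is purely illustrative.

```python
def voiceprint_matches(user_gmm, ubm, target_mfcc, preset_diff=0.5):
    """Match iff the gap between the two models' log-likelihoods is small enough."""
    user_ll = user_gmm.score(target_mfcc)  # probability under the user voiceprint model
    ubm_ll = ubm.score(target_mfcc)        # probability under the voiceprint recognition model
    return abs(user_ll - ubm_ll) < preset_diff
```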
In one possible implementation, before the text corresponding to the target voice information is matched with the target text to obtain the text matching result, the information in a preset frequency band of the target voice information is retained, and the target voice information in the preset frequency band is then converted into text to obtain the text corresponding to the target voice information.
Correspondingly, retaining the information in the preset frequency band of the target voice information means deleting the information outside the preset frequency band, which can also be regarded as emphasizing the in-band information. It should be noted that deleting the out-of-band information, i.e., pre-emphasizing the information in the preset frequency band, prevents surrounding context from affecting the text matching result, reduces the workload of subsequently processing the target voice information, and speeds up determination of the text matching result.
Illustratively, if the preset frequency band is 300 Hz to 700 Hz, the portion of the target voice information between 300 Hz and 700 Hz is retained, and the portions below 300 Hz or above 700 Hz are deleted.
It should be noted that the preset frequency band is determined according to the frequency band variation range of the target voice information.
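A sketch of the band-retention step using SciPy, keeping the 300 Hz to 700 Hz band from the example above; the Butterworth filter and its order are assumptions, since this application does not name a filter type.

```python
from scipy.signal import butter, lfilter

def keep_preset_band(signal, sr=16000, low=300.0, high=700.0, order=4):
    """Retain only the preset frequency band; content outside it is attenuated."""
    b, a = butter(order, [low, high], btype="band", fs=sr)
    return lfilter(b, a, signal)
```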
In one possible implementation, the text corresponding to the target voice information is compared with the target text, and whether the two match is determined based on their matching degree.
Specifically, if the matching degree of the text corresponding to the target voice information and the target text is greater than or equal to a preset matching degree, it is determined that the text matching result is that the text corresponding to the target voice information matches the target text; if the matching degree is less than the preset matching degree, it is determined that the text matching result is that the text corresponding to the target voice information does not match the target text.
The matching degree between the text corresponding to the target voice information and the target text is the similarity between the text corresponding to the target voice information and the target text.
Illustratively, suppose the target voice information is "xxx unlock the phone xxx", where the "unlock the phone" portion lies in the preset frequency band and the "xxx" portions are noise or other blurred audio, that is, information outside the preset frequency band. The information in the preset frequency band is retained and the information outside it is deleted, so that the in-band information is pre-emphasized. Text recognition is then performed on the retained band of the target voice information, yielding the text "unlock the phone", which is taken as the text corresponding to the target voice information. Finally, the similarity, i.e., the matching degree, between this text and the target text (e.g., "unlock the phone") is determined to be 100%. Taking the preset matching degree as 100% as an example, since the matching degree equals the preset matching degree, the text matching result is that the text corresponding to the target voice information matches the target text. Of course, if the matching degree were, say, 80%, which is less than the preset matching degree of 100%, the text matching result would be that the text corresponding to the target voice information does not match the target text.
The target text is the text corresponding to the instruction for unlocking the device, as determined through repeated experiments.
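Because this application leaves the text similarity measure open, the sketch below uses a character-level ratio from Python's standard library to stand in for the matching degree; the preset matching degree of 100% follows the example above.

```python
from difflib import SequenceMatcher

def text_matches(recognized_text, target_text, preset_degree=1.0):
    """Text matching result: True iff the matching degree reaches the preset degree."""
    degree = SequenceMatcher(None, recognized_text, target_text).ratio()
    return degree >= preset_degree

# a perfect transcription yields degree 1.0 and therefore a match
assert text_matches("unlock the phone", "unlock the phone")
```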
Step 103: and determining whether to unlock the equipment based on the text matching result.
The text matching result is either that the text corresponding to the target voice information matches the target text, or that it does not match the target text.
If the text matching result is that the text corresponding to the target voice information matches the target text, the device is unlocked; correspondingly, if the text matching result is that the text does not match the target text, the device is not unlocked, that is, the device fails to unlock.
In one possible implementation, after the device fails to unlock, a prompt message is generated and displayed.
Illustratively, the prompt message may be, for example, "Unlock failed, please retry".
Specifically, the prompt message may also be played as voice.
It should be noted that, through the above process, a user voiceprint model is established based on the target voice information; when the user voiceprint model matches the voiceprint recognition model, that is, when the voiceprint of the target voice information matches the voiceprint of a preset user of the device, the text corresponding to the target voice information is further compared with the target text to obtain a text matching result, and whether to unlock the device is determined based on that result. Voiceprint recognition and target-text recognition are simpler than face recognition and fingerprint recognition, so unlocking the device through the voiceprint and the target text avoids the prior-art situation in which the face and the under-screen fingerprint are difficult to recognize and the device fails to unlock, ensuring unlocking security while improving unlocking convenience.
As shown in fig. 2, a training sample (i.e., historical voice information) is obtained, its feature parameters (i.e., Mel cepstrum coefficients) are extracted, and Gaussian mixture model-universal background model training is performed to obtain the voiceprint recognition model. In addition, the target voice information is obtained, its feature parameters are extracted to build the user voiceprint model, and voiceprint matching is performed between the user voiceprint model and the voiceprint recognition model. The text content is then matched to obtain a score, i.e., the text matching result. Finally, whether to unlock the device is determined based on the text matching result.
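Composing the illustrative helpers sketched above (extract_mfcc, voiceprint_matches, keep_preset_band, text_matches) gives a rough end-to-end version of the fig. 2 flow; the speech-to-text step is passed in as a callable because this application does not name a speech recognition engine.

```python
import librosa

def try_unlock(wav_path, user_gmm, ubm, target_text, speech_to_text):
    """Return True to unlock the device, False otherwise (sketch of the fig. 2 flow)."""
    mfcc = extract_mfcc(wav_path)                  # features of the target voice information
    if not voiceprint_matches(user_gmm, ubm, mfcc):
        return False                               # voiceprint mismatch: do not unlock
    signal, sr = librosa.load(wav_path, sr=16000)
    filtered = keep_preset_band(signal, sr=sr)     # retain only the preset frequency band
    recognized = speech_to_text(filtered)          # caller-supplied ASR function
    return text_matches(recognized, target_text)   # text matching result decides unlocking
```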
As shown in fig. 3, an embodiment of the present application provides a voice unlocking apparatus, which includes an establishing module 301, a matching module 302, and an unlocking module 303.
The establishing module 301 is configured to establish a user voiceprint model according to the target voice information acquired at the current time.
A matching module 302, configured to match the text corresponding to the target voice information with the target text to obtain a text matching result, if the user voiceprint model matches the voiceprint recognition model.
And an unlocking module 303, configured to determine whether to unlock the device based on the text matching result.
As shown in fig. 4, the embodiment of the present application provides an electronic device, which includes a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 complete mutual communication via the communication bus 404,
a memory 403 for storing a computer program;
in an embodiment of the present application, the processor 401 is configured to implement the steps of the voice unlocking method provided in any one of the foregoing method embodiments when executing the program stored in the memory 403.
The present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the voice unlocking method provided in any one of the foregoing method embodiments.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A voice unlocking method, characterized in that the method comprises:
establishing a user voiceprint model according to the target voice information acquired at the current moment;
if the user voiceprint model matches the voiceprint recognition model, matching a text corresponding to the target voice information with a target text to obtain a text matching result;
and determining whether to unlock the device based on the text matching result.
2. The voice unlocking method according to claim 1, wherein before establishing the user voiceprint model according to the target voice information acquired at the current moment, the method further comprises:
performing voice feature extraction on voice information of at least one user in at least one environment within a preset time period to obtain at least one Mel cepstrum coefficient;
and taking the at least one Mel cepstrum coefficient as an input voice feature parameter and training a Gaussian mixture model-universal background model to obtain the voiceprint recognition model, wherein the voiceprint recognition model is used for recognizing the voiceprint of the at least one user.
3. The voice unlocking method according to claim 1, wherein establishing the user voiceprint model according to the target voice information acquired at the current moment comprises:
extracting a Mel cepstrum coefficient of the target voice information;
and establishing the user voiceprint model by taking the Mel cepstrum coefficient of the target voice information as a voice characteristic parameter.
4. The voice unlocking method according to claim 1, wherein if the user voiceprint model matches the voiceprint recognition model, matching the text corresponding to the target voice information with the target text to obtain the text matching result comprises:
comparing the user voiceprint model with the voiceprint recognition model;
if the matching degree of the user voiceprint model and the voiceprint recognition model is greater than a preset threshold, determining that the user voiceprint model matches the voiceprint recognition model;
and matching the text corresponding to the target voice information with the target text to obtain a text matching result.
5. The voice unlocking method according to claim 1, wherein before matching the text corresponding to the target voice information with the target text to obtain the text matching result, the method further comprises:
retaining the information in a preset frequency band of the target voice information;
and converting the target voice information of the preset frequency band into a text to obtain the text corresponding to the target voice information.
6. The voice unlocking method according to claim 1, wherein matching the text corresponding to the target voice information with the target text to obtain the text matching result comprises:
comparing the text corresponding to the target voice information with the target text;
if the matching degree of the text corresponding to the target voice information and the target text is greater than or equal to a preset matching degree, determining that the text matching result is that the text corresponding to the target voice information matches the target text;
and if the matching degree of the text corresponding to the target voice information and the target text is less than the preset matching degree, determining that the text matching result is that the text corresponding to the target voice information does not match the target text.
7. The voice unlocking method according to any one of claims 1 to 6, wherein determining whether to unlock the device based on the text matching result comprises:
if the text matching result is that the text corresponding to the target voice information matches the target text, unlocking the device;
and if the text matching result is that the text corresponding to the target voice information does not match the target text, not unlocking the device.
8. A voice unlocking device, characterized in that the voice unlocking device comprises:
the establishing module is used for establishing a user voiceprint model according to the target voice information acquired at the current moment;
the matching module is used for matching the text corresponding to the target voice information with the target text to obtain a text matching result if the user voiceprint model matches the voiceprint recognition model;
and the unlocking module is used for determining whether to unlock the device based on the text matching result.
9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the voice unlocking method according to any one of claims 1 to 7 when executing the program stored in the memory.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the voice unlocking method according to any one of claims 1 to 7.
CN202210016005.8A 2022-01-07 2022-01-07 Voice unlocking method and device, electronic equipment and storage medium Pending CN114387968A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210016005.8A CN114387968A (en) 2022-01-07 2022-01-07 Voice unlocking method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210016005.8A CN114387968A (en) 2022-01-07 2022-01-07 Voice unlocking method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114387968A (en) 2022-04-22

Family

ID=81199606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210016005.8A Pending CN114387968A (en) 2022-01-07 2022-01-07 Voice unlocking method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114387968A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116504246A (en) * 2023-06-26 2023-07-28 深圳市矽昊智能科技有限公司 Voice remote control method, device, storage medium and device based on Bluetooth device
CN116504246B (en) * 2023-06-26 2023-11-24 深圳市矽昊智能科技有限公司 Voice remote control method, device, storage medium and device based on Bluetooth device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination