CN113241080A

CN113241080A - Automatic registration voiceprint recognition method and device

Info

Publication number: CN113241080A
Application number: CN202110649154.3A
Authority: CN
Inventors: Huang Houjun (黄厚军); Qian Yanmin (钱彦旻)
Original assignee: Sipic Technology Co Ltd
Current assignee: Sipic Technology Co Ltd
Priority date: 2021-06-10
Filing date: 2021-06-10
Publication date: 2021-08-10

Abstract

The invention discloses an automatic registration voiceprint recognition method and device. The method comprises the following steps: in response to acquiring audio containing a wake-up word and a command word, extracting a first voiceprint feature of the audio and determining whether the first voiceprint feature matches a voiceprint template in a registered voiceprint library; if it does not, storing the first voiceprint feature and the audio in a historical audio library and determining whether the number of historical usage records corresponding to the first voiceprint feature in the historical audio library is less than a preset number; and if it is not, automatically clustering the voiceprints of the audio corresponding to the first voiceprint feature in the historical audio library, automatically registering the voiceprint of the user corresponding to the first voiceprint feature based on the clustering result, and updating the registered voiceprint library. Because the voiceprint of the user corresponding to the first voiceprint feature is registered automatically through voiceprint clustering and the registered voiceprint library is updated accordingly, both the convenience of using the voiceprint recognition function and the recognition accuracy for the user can be greatly improved.

Description

Automatic registration voiceprint recognition method and device

Technical Field

The invention belongs to the field of speech processing technology, and in particular relates to an automatic registration voiceprint recognition method and device.

Background

In the related art, voiceprint recognition products currently on the market require the user to actively complete voiceprint collection (registration), and they rely on either the wake-up word or the command word alone; no scheme that fuses the two has been provided.

In text-dependent voiceprint recognition based on the wake-up word (the wake-up word of each device has fixed content), the user first records the wake-up word 3 to 5 times on the device, as required by the registration process, to complete voiceprint collection and register the user's speaker template. In the verification stage, the user records the same wake-up word once on the device; a voiceprint feature is extracted and compared with the speaker templates in the database to determine whether the speaker is a registered user. The drawback is that the end user must actively cooperate with registration, which reduces the convenience of the device.

In text-independent voiceprint recognition based on command words (the content of the commands a user will give cannot be predicted in advance), the user first records about 10 seconds of free speech on the device, as required by the registration process, to complete voiceprint collection and register the user's speaker template. In the verification stage, the user records a command with arbitrary content on the device; a voiceprint feature is extracted and compared with the speaker templates in the database to determine whether the speaker is a registered user. The drawback is low voiceprint recognition accuracy.

Disclosure of Invention

Embodiments of the present invention provide an automatic registration voiceprint recognition method and device for solving at least one of the above technical problems.

In a first aspect, an embodiment of the present invention provides an automatic registration voiceprint recognition method, including: in response to acquiring audio containing a wake-up word and a command word, extracting a first voiceprint feature of the audio, and determining whether the first voiceprint feature matches a voiceprint template in a registered voiceprint library; if not, storing the first voiceprint feature and the audio in a historical audio library, and determining whether the number of historical usage records corresponding to the first voiceprint feature in the historical audio library is less than a preset number; and if not, automatically clustering the voiceprints of the audio corresponding to the first voiceprint feature in the historical audio library, automatically registering the voiceprint of the user corresponding to the first voiceprint feature based on the clustering result, and updating the registered voiceprint library.

In a second aspect, an embodiment of the present invention provides an automatic registration voiceprint recognition apparatus, including: an acquisition, extraction and determination program module configured to, in response to acquiring audio containing a wake-up word and a command word, extract a first voiceprint feature of the audio and determine whether the first voiceprint feature matches a voiceprint template in a registered voiceprint library; a storage determination program module configured to, if the first voiceprint feature does not match, store the first voiceprint feature and the audio in a historical audio library and determine whether the number of historical usage records corresponding to the first voiceprint feature in the historical audio library is less than a preset number; and a clustering registration update program module configured to, if the number of records is not less than the preset number, automatically cluster the voiceprints of the audio corresponding to the first voiceprint feature in the historical audio library, automatically register the voiceprint of the user corresponding to the first voiceprint feature based on the clustering result, and update the registered voiceprint library.

In a third aspect, an electronic device is provided, comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions enable the at least one processor to perform the steps of the automatic registration voiceprint recognition method of any embodiment of the present invention.

In a fourth aspect, the present invention further provides a computer program product, which includes a computer program stored on a non-volatile computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, cause the computer to perform the steps of the automatic registration voiceprint recognition method of any embodiment of the present invention.

According to the method and device, the voiceprint features and audio of unregistered users are stored in the historical audio library, and it is further determined whether the number of historical usage records corresponding to the first voiceprint feature in the historical audio library is less than the preset number. Once enough records have accumulated, the voiceprint of the user corresponding to the first voiceprint feature is registered automatically through voiceprint clustering and the registered voiceprint library is updated, which makes voiceprint registration more convenient and accurate.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of an automatic registration voiceprint recognition method according to an embodiment of the present invention;

Fig. 2 is a flowchart of another automatic registration voiceprint recognition method according to an embodiment of the present invention;

Fig. 3 is a flowchart of yet another automatic registration voiceprint recognition method according to an embodiment of the present invention;

Fig. 4 is a flowchart of automatic registration voiceprint recognition in a specific example of the automatic registration voiceprint recognition method according to an embodiment of the present invention;

Fig. 5 is a block diagram of an automatic registration voiceprint recognition apparatus according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to Fig. 1, a flowchart of an embodiment of the automatic registration voiceprint recognition method of the present application is shown.

As shown in Fig. 1, in step 101, in response to acquiring audio containing a wake-up word and a command word, a first voiceprint feature of the audio is extracted, and it is determined whether the first voiceprint feature matches a voiceprint template in a registered voiceprint library;

in step 102, if the first voiceprint feature does not match, both the first voiceprint feature and the audio are stored in a historical audio library, and it is determined whether the number of historical usage records corresponding to the first voiceprint feature in the historical audio library is less than a preset number;

in step 103, if not, the voiceprints of the audio corresponding to the first voiceprint feature in the historical audio library are automatically clustered, the voiceprint of the user corresponding to the first voiceprint feature is automatically registered based on the clustering result, and the registered voiceprint library is updated.

In this embodiment, for step 101, in response to acquiring audio containing a wake-up word and a command word, the automatic registration voiceprint recognition apparatus extracts a first voiceprint feature of the audio and determines whether it matches a voiceprint template in the registered voiceprint library. For example, the voiceprint feature of the wake-up word and the voiceprint feature of the command word are extracted separately and their mean is taken as the first voiceprint feature; the first voiceprint feature is then matched against the voiceprint templates in the registered voiceprint library to determine whether it belongs to a registered person.
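The matching in step 101 can be sketched as follows. The patent does not specify the scoring function or the decision threshold, so this sketch assumes cosine similarity between fixed-length voiceprint embeddings and a hypothetical threshold of 0.7; it returns the best-scoring speaker ID, or None when no template clears the threshold.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_voiceprint(feature: List[float],
                     library: Dict[str, List[float]],
                     threshold: float = 0.7) -> Optional[str]:
    """Return the ID of the best-matching registered template, or None.

    threshold is a hypothetical value; a real system would tune it on
    held-out data, for example to reach a target false rejection rate.
    """
    best_id, best_score = None, threshold
    for speaker_id, template in library.items():
        score = cosine(feature, template)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id
```

With `{"spk1": [1.0, 0.0], "spk2": [0.0, 1.0]}` as the library, the feature `[0.9, 0.1]` is matched to `"spk1"`; an empty library always yields None, which is consistent with the empty-library shortcut of Fig. 3.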

Then, for step 102, if the first voiceprint feature does not match any voiceprint template in the registered voiceprint library, both the first voiceprint feature and the audio are stored in the historical audio library, and it is determined whether the number of historical usage records corresponding to the first voiceprint feature is less than the preset number. For example, each time a user uses the device, that user's voiceprint feature and audio are stored in the historical audio library as a historical usage record; after the first voiceprint feature and the audio have been stored, the apparatus checks whether the historical usage records corresponding to the first voiceprint feature are fewer than the preset number.
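A minimal in-memory sketch of the historical audio library used in step 102 follows. The record layout and the class and method names are hypothetical. Following the Fig. 4 example, the check counts all stored records against the preset number (called M there), because the patent does not say how records are tied to a particular voiceprint feature before clustering has run.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HistoryRecord:
    """One use of the device by a speaker who did not match any template."""
    audio: bytes            # raw wake-up word + command word audio
    feature: List[float]    # fused voiceprint feature extracted from that audio


@dataclass
class HistoricalAudioLibrary:
    preset_count: int       # the preset number M (value chosen by the integrator)
    records: List[HistoryRecord] = field(default_factory=list)

    def store(self, audio: bytes, feature: List[float]) -> None:
        """Step 102: keep both the voiceprint feature and the audio."""
        self.records.append(HistoryRecord(audio, feature))

    def below_preset(self) -> bool:
        """True while fewer records than the preset number have been stored."""
        return len(self.records) < self.preset_count
```

Keeping the audio next to the feature leaves room to re-extract features later, for example with an improved extractor, and to play samples back when a user names a speaker ID.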

Finally, for step 103, if the number of historical usage records corresponding to the first voiceprint feature in the historical audio library is not less than the preset number, the voiceprints of the audio corresponding to the first voiceprint feature in the historical audio library are automatically clustered, the voiceprint of the user corresponding to the first voiceprint feature is automatically registered based on the clustering result, and the registered voiceprint library is updated. For example, the historical records are classified by speaker clustering, a speaker ID is then assigned to the voiceprint template registered for each class, and the registered voiceprint template library is updated.
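Once speaker clustering has partitioned the history, registration consists of building one template per class and giving it a speaker ID. The sketch below assumes that each template is the element-wise mean of its class's features and that IDs are sequential strings such as `"spk1"`. Both are illustrative choices, not requirements stated in the patent.

```python
from typing import Dict, List


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def register_clusters(clusters: List[List[List[float]]],
                      library: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Register one template per class and update the library in place.

    Each element of clusters holds the voiceprint features of one
    speaker's historical records. New IDs never overwrite existing ones.
    """
    next_index = 1
    for features in clusters:
        if not features:
            continue  # an empty class has nothing to register
        while f"spk{next_index}" in library:
            next_index += 1  # skip IDs that are already taken
        library[f"spk{next_index}"] = mean_vector(features)
    return library
```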

By storing the voiceprint features and audio of unregistered users in the historical audio library and further determining whether the number of historical usage records corresponding to the first voiceprint feature is less than the preset number, the method can automatically register the voiceprint of the user corresponding to the first voiceprint feature through voiceprint clustering and update the registered voiceprint library, making voiceprint registration more convenient and accurate.

In the method of the foregoing embodiment, extracting a first voiceprint feature of the audio in response to acquiring audio containing a wake-up word and a command word includes:

extracting a second voiceprint feature from the wake-up word and a third voiceprint feature from the command word, and taking the mean of the second and third voiceprint features as the fused first voiceprint feature.

By fusing the voiceprint features of the wake-up word and the command word, the method of this embodiment can improve the accuracy of speaker clustering and the device wake-up rate.
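The fusion described above is an element-wise mean of two embeddings. Here is a minimal sketch, assuming that both features are fixed-length vectors produced by the same speaker-embedding extractor. The extractor itself is not specified in the patent and is omitted. Returning the available feature unchanged when the other one is missing is an added illustration of the single-feature variant the patent mentions for products that capture only one of the two utterances.

```python
from typing import List, Optional


def fuse_features(wake_feature: Optional[List[float]],
                  command_feature: Optional[List[float]]) -> List[float]:
    """Average the wake-up word and command word voiceprint features.

    If only one feature is available it is returned unchanged
    (single-feature fallback).
    """
    if wake_feature is None and command_feature is None:
        raise ValueError("at least one voiceprint feature is required")
    if wake_feature is None:
        return list(command_feature)
    if command_feature is None:
        return list(wake_feature)
    if len(wake_feature) != len(command_feature):
        raise ValueError("features must have the same dimension")
    return [(w + c) / 2 for w, c in zip(wake_feature, command_feature)]
```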

With further reference to Fig. 2, a flowchart of another automatic registration voiceprint recognition method provided by an embodiment of the present application is shown. This flowchart mainly refines the step in Fig. 1 of "in response to acquiring audio containing a wake-up word and a command word, extracting a first voiceprint feature of the audio".

As shown in Fig. 2, in step 201, the user is asked whether to turn on voiceprint recognition;

in step 202, in response to the user confirming that voiceprint recognition should be turned on, the first voiceprint feature of the wake-up word and the command word is extracted;

in step 203, in response to the user declining to turn on voiceprint recognition, the first voiceprint feature of the wake-up word and the command word is not extracted.

In this embodiment, for step 201, the automatic registration voiceprint recognition apparatus asks the user whether to turn on voiceprint recognition, for example after the device is initialized or restarted.

Then, for step 202, the automatic registration voiceprint recognition apparatus extracts the first voiceprint feature of the wake-up word and the command word in response to the user confirming that voiceprint recognition should be turned on. For example, if the user agrees to turn on voiceprint recognition, voiceprint features are extracted and registration is completed automatically.

Finally, for step 203, the automatic registration voiceprint recognition apparatus does not extract the first voiceprint feature of the wake-up word and the command word in response to the user declining to turn on voiceprint recognition. For example, if the user does not agree to turn on voiceprint recognition, no voiceprint extraction or automatic registration is performed.

With the method of this embodiment, the user only needs to agree to turn on voiceprint recognition; the user's voiceprint features are then extracted and registration is completed automatically.
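Steps 201 to 203 determine whether extraction runs at all. In the sketch below, consent is requested once when the gate is created, for example after the device is initialized or restarted, and the gate then controls every later extraction. `ask_user` and `extract` are hypothetical callables that stand in for the device's prompt and its feature extractor.

```python
from typing import Callable, List, Optional


class ConsentGate:
    """Steps 201-203: extract voiceprint features only after the user opts in."""

    def __init__(self, ask_user: Callable[[], bool]) -> None:
        # Step 201: ask once whether to turn on voiceprint recognition.
        self.enabled = ask_user()

    def extract_if_enabled(self,
                           extract: Callable[[], List[float]]) -> Optional[List[float]]:
        if not self.enabled:
            return None      # step 203: no extraction, so no automatic registration
        return extract()     # step 202: extract the first voiceprint feature
```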

With further reference to Fig. 3, a flowchart of yet another automatic registration voiceprint recognition method provided by an embodiment of the present application is shown. This flowchart mainly refines the step in Fig. 1 of "determining whether the first voiceprint feature matches a voiceprint template in the registered voiceprint library".

As shown in Fig. 3, in step 301, it is determined whether the registered voiceprint library is empty;

in step 302, if the registered voiceprint library is empty, it is determined that the first voiceprint feature does not match the voiceprint template in the registered voiceprint library.

In this embodiment, for step 301, the automatic registration voiceprint recognition apparatus determines whether the registered voiceprint library is empty. For example, the registered voiceprint library is empty when voiceprint recognition is turned on for the first time, after the device is initialized, or while the user's records in the historical audio library are still fewer than the number required for automatic registration.

Thereafter, for step 302, if the registered voiceprint library is empty, it is determined that the first voiceprint feature does not match the voiceprint template in the registered voiceprint library.

By checking whether the registered voiceprint library is empty, the method of this embodiment can conclude directly that the first voiceprint feature does not match any voiceprint template in the library, so that the first voiceprint feature and its corresponding audio can be stored in the historical audio library.
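Fig. 3 adds a shortcut before any scoring: an empty library cannot contain a match. Here is a minimal sketch. `score` is a hypothetical similarity function passed in by the caller, and 0.7 is a hypothetical threshold.

```python
from typing import Callable, Dict, List


def matches_registered(feature: List[float],
                       library: Dict[str, List[float]],
                       score: Callable[[List[float], List[float]], float],
                       threshold: float = 0.7) -> bool:
    # Step 301: check whether the registered voiceprint library is empty.
    if not library:
        # Step 302: an empty library means "no match", so the caller goes on
        # to store the feature and audio in the historical audio library.
        return False
    return any(score(feature, template) >= threshold
               for template in library.values())
```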

In the method of the foregoing embodiment, after determining whether the first voiceprint feature matches a voiceprint template in the registered voiceprint library, the method further includes:

if it matches, returning the identity ID of the user corresponding to the first voiceprint feature.

In the method of this embodiment, when the first voiceprint feature matches a voiceprint template in the registered voiceprint library, the identity ID of the user corresponding to the first voiceprint feature is returned, which avoids storing unnecessary voiceprint features and audio in the historical audio library.

In the method of the foregoing embodiment, after determining whether the number of historical usage records corresponding to the first voiceprint feature in the historical audio library is less than the preset number, the method further includes:

if the number of historical usage records corresponding to the first voiceprint feature in the historical audio library is less than the preset number, returning information indicating that the user is unregistered.
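These embodiments produce three possible outcomes. A registered template matched, and its identity ID is returned. No match was found and the history is still below the preset number, so an unregistered message is returned. No match was found and the history has reached the preset number, so clustering runs, but according to step 4 of Fig. 4 the current utterance is still reported as unregistered. The result type and function names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecognitionResult:
    speaker_id: Optional[str]           # set only when a registered template matched
    clustering_triggered: bool = False  # True when this call started registration

    @property
    def registered(self) -> bool:
        return self.speaker_id is not None


def decide(matched_id: Optional[str], history_count: int,
           preset_count: int) -> RecognitionResult:
    """Map the method's checks to the value returned to the caller."""
    if matched_id is not None:
        # Match: return the user's identity ID; nothing goes into the history.
        return RecognitionResult(speaker_id=matched_id)
    # No match: the feature and audio have already been stored in the history.
    if history_count < preset_count:
        return RecognitionResult(speaker_id=None)
    # Enough history: clustering and registration run, but this utterance
    # is still reported as coming from an unregistered speaker.
    return RecognitionResult(speaker_id=None, clustering_triggered=True)
```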

In the method of the foregoing embodiment, automatically clustering the voiceprints of the audio corresponding to the first voiceprint feature in the historical audio library includes:

based on the extracted voiceprint features, classifying the historical records by speaker clustering, registering a voiceprint template for the class corresponding to the first voiceprint feature, assigning it an identity ID, and updating the registered voiceprint library, wherein each class contains the usage audio of one user and different classes correspond to different users.
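The patent does not name a clustering algorithm. One common choice, used here as an assumption, is agglomerative (hierarchical) clustering with average linkage on cosine similarity. Merging stops when no pair of clusters is more similar than a hypothetical threshold, so the number of household members does not need to be known in advance. This pure-Python version is O(n³) and is meant for illustration only.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_speakers(features: List[List[float]],
                     stop_similarity: float = 0.6) -> List[List[int]]:
    """Average-linkage agglomerative clustering of voiceprint features.

    Returns clusters as lists of record indices. stop_similarity is a
    hypothetical threshold below which clusters are no longer merged.
    """
    clusters = [[i] for i in range(len(features))]
    sim = [[cosine(a, b) for b in features] for a in features]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average pairwise similarity between the members of two clusters.
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -math.inf, (0, 1)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < stop_similarity:
            break  # remaining clusters are treated as different speakers
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters
```

For the records `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]`, this groups indices 0 and 1 as one speaker and indices 2 and 3 as another. Each resulting class can then be turned into a registered template with its own identity ID.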

By classifying historical records through speaker clustering, registering a voiceprint template for the class corresponding to the first voiceprint feature, assigning an identity ID, and updating the registered voiceprint library, the method of this embodiment can greatly improve both the convenience and the recognition accuracy of the voiceprint recognition function for the user.

It should be noted that the above method steps do not limit the order in which the steps are executed; in practice, some steps may be executed simultaneously or in reverse order, which is not limited herein.

To help those skilled in the art better understand the present disclosure, the following describes some problems the inventors encountered while implementing it, together with a specific embodiment of the solution that was finally adopted.

In the process of implementing the present application, the inventors found that the defects in the prior art are mainly caused by the following:

Voiceprint recognition requires collecting the voiceprint information of the people to be registered, and products currently on the market require users to complete registration in a quiet environment according to registration requirements. Every end user who wants to use voiceprints for identity authentication must actively register, and must do so again whenever re-registration is needed, so the convenience of voiceprint recognition is poor.

Most voice assistants on smart devices currently on the market (smart TVs, smart speakers, smart air conditioners, and the like) require the user to first say a fixed wake-up word and then speak a command before the device responds. The accuracy of the common approaches on the market, namely text-dependent voiceprint recognition based on the wake-up word and text-independent voiceprint recognition based on command words, is not ideal.

The problems caused by these drawbacks are long-standing problems in the field.

The inventors also found that there is currently no solution to the requirement that the voiceprint information of registrants must be collected. Given that most voice assistants on smart devices require a fixed wake-up word to be said first, practitioners in the industry have mainly focused on improving the recognition rate of text-dependent and text-independent voiceprint recognition models through various deep learning methods.

The biggest difficulty of the scheme of the present application is how to complete speaker registration automatically and accurately from the user's historical usage data, without the user's cooperation.

The scheme of the present application is mainly designed and optimized in the following respects:

When using the voice assistant on a smart device, the user first says a wake-up word to wake the device and then speaks an operation command to it. Initially, with text-dependent voiceprint recognition based on the wake-up word, the voiceprint recognition system determined the speaker's identity from the wake-up word spoken by the user and then selected the corresponding user profile (each user profile records that user's usage habits) to respond to the subsequent operation command. In actual products, however, the accuracy of this voiceprint recognition system could not meet the requirements of high-precision identity authentication in complex scenarios, so a scheme fusing the wake-up word and the command word was proposed, in which the information in the command word also assists voiceprint recognition.

Whether the text-dependent voiceprint recognition scheme based on the wake-up word or the scheme fusing the wake-up word and the command word is used, the end user must actively cooperate with registration. After product release, however, registration turned out to be troublesome; the elderly and children in particular find it hard to cooperate and complete registration, so many users gave up the voiceprint recognition function. As a result, a profile for each member of the household could not be generated accurately on the device, which affected personalized service. To address this problem, an automatic registration voiceprint recognition scheme is proposed: after obtaining the user's consent, the user's usage audio on the device is collected; once the historical usage audio reaches M records, speaker clustering is started to separate the audio of multiple users, and voiceprint template registration is completed automatically from each user's historical usage audio. Fusing the wake-up word and the command word greatly improves speaker clustering accuracy and ensures the quality of the registered voiceprint templates.

Referring to Fig. 4, a flowchart of automatic registration voiceprint recognition in a specific example of the automatic registration voiceprint recognition method according to an embodiment of the present invention is shown.

As shown in Fig. 4, Step 1: check whether the user agrees to turn on the voiceprint recognition function. If not, end the process; if so, continue to Step 2.

Step 2: receive the audio of the current user's wake-up word and command word, extract a voiceprint feature from each, and average the two voiceprint features to obtain the fused voiceprint feature. Go to Step 3.

Step 3: check whether the registered voiceprint library on the device is empty. If it is empty, go to Step 4. If not, compare the voiceprint feature extracted in Step 2 with the voiceprint templates in the registered voiceprint library to determine whether the current speaker is a registered speaker; if so, return the speaker's ID and end the process; if not, go to Step 4.

Step 4: store the currently received wake-up word and command word audio, together with the voiceprint feature extracted in Step 2, in the user historical usage database. Check whether the number of records in the user historical usage database has reached M. If not, return a message that the current speaker is unregistered and end the process. If so, start the automatic clustering function: classify the historical records by speaker clustering based on the voiceprint features extracted from the audio (each class contains the usage audio of one user, and different classes correspond to different users), assign a speaker ID to the voiceprint template registered for each class, update the registered voiceprint template library, and finally return a message that the current speaker is unregistered and end the process.
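The four steps of Fig. 4 can be combined into one small recognizer, shown below as a self-contained sketch. The similarity threshold, the ID format, and clearing the history after registration are assumptions. Simple greedy "leader" clustering, where each record joins the first cluster whose centroid is similar enough and otherwise starts a new one, is substituted for the speaker clustering technique, which the patent leaves unspecified.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class AutoEnrollRecognizer:
    """Sketch of the Fig. 4 flow; thresholds and names are hypothetical."""

    def __init__(self, consent: bool, m: int, threshold: float = 0.7) -> None:
        self.consent = consent          # Step 1: did the user agree?
        self.m = m                      # records needed before clustering (M)
        self.threshold = threshold      # match and clustering similarity threshold
        self.registered: Dict[str, Vector] = {}
        self.history: List[Tuple[bytes, Vector]] = []

    def handle(self, audio: bytes, wake_feat: Vector,
               cmd_feat: Vector) -> Optional[str]:
        """Return a speaker ID, "unregistered", or None if the feature is off."""
        if not self.consent:                        # Step 1
            return None
        fused = mean([wake_feat, cmd_feat])         # Step 2: average the two features
        if self.registered:                         # Step 3: library not empty
            best = max(self.registered,
                       key=lambda k: cosine(fused, self.registered[k]))
            if cosine(fused, self.registered[best]) >= self.threshold:
                return best
        self.history.append((audio, fused))         # Step 4: store audio + feature
        if len(self.history) >= self.m:
            self._cluster_and_register()
        return "unregistered"

    def _cluster_and_register(self) -> None:
        clusters: List[List[Vector]] = []
        for _, feat in self.history:
            for members in clusters:
                if cosine(feat, mean(members)) >= self.threshold:
                    members.append(feat)
                    break
            else:
                clusters.append([feat])             # no similar cluster: new speaker
        for members in clusters:
            self.registered[f"spk{len(self.registered) + 1}"] = mean(members)
        # Assumption: clustered records are dropped so they are not re-registered.
        self.history.clear()
```

With M = 4, two speakers who each use the device twice are reported as `"unregistered"` four times. The fourth call triggers registration, and from then on each speaker is recognized by their automatically assigned ID.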

In Step 2, a voiceprint feature fusing the wake-up word and the command word is extracted, and in Steps 3 and 4 this fused feature is used for voiceprint registration and testing, so the accuracy is higher than that of voiceprint systems on the market based purely on the wake-up word or the command word. With the false rejection rate (the proportion of registered users mistaken for unregistered persons during use) fixed at 5%, the false acceptance rate (the proportion of unregistered users mistaken for registered persons during use) is 0.4% for the text-dependent voiceprint recognition system based on the wake-up word, 4% for the text-independent system based on the command word, and 0.1% for the system fusing the wake-up word and the command word. The fusion scheme therefore reduces the false acceptance rate to 25% of that of the best existing scheme (0.1% versus 0.4%).
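This comparison fixes the false rejection rate and compares false acceptance rates. The sketch below shows one common way to compute that operating point from trial scores. The threshold is chosen so that about 5% of genuine trials fall below it and are rejected, and then the fraction of impostor trials at or above it is measured. The scores used in any example are illustrative and are not the patent's evaluation data. The constant at the end only reproduces the ratio quoted in the text.

```python
import math
from typing import List, Tuple


def far_at_frr(genuine: List[float], impostor: List[float],
               target_frr: float = 0.05) -> Tuple[float, float]:
    """Return (threshold, false acceptance rate) at roughly target_frr.

    A trial is accepted when its score >= threshold. The threshold is the
    genuine score below which floor(target_frr * N) genuine trials fall.
    """
    ranked = sorted(genuine)
    k = math.floor(target_frr * len(ranked))    # genuine trials to reject
    threshold = ranked[k]
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return threshold, far


# Ratio quoted in the text: fused-system FAR / wake-word-system FAR (percent).
RELATIVE_FAR = 0.1 / 0.4
```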

Throughout the whole process, the user is not required to actively cooperate with registration; the user only needs to agree to turn on the voiceprint recognition function, and in Step 4 voiceprint registration is completed automatically from the user's historical usage data over the preceding period, which greatly improves the convenience of the voiceprint recognition function. Speaker clustering based on the fused wake-up word and command word voiceprint features achieves an accuracy above 99.5% (98.5% when the wake-up word voiceprint feature is used alone, and 96% when the command word voiceprint feature is used alone), which ensures the recognition quality of the automatically registered voiceprint templates.

Beta version formed by the inventor in the process of implementing the invention:

On products that can only obtain the audio of the wake-up word or of the command word, the fused wake-up word and command word voiceprint feature in the preferred scheme of Fig. 4 can be replaced by a single voiceprint feature based on the wake-up word or the command word alone. The automatic registration voiceprint recognition scheme can still be realized, but its overall effect is worse.

In this patent, the speaker ID is assigned automatically by the system; it is only a number and does not identify who the user actually is. On products that support it, after automatic registration is completed, when Step 3 finds during a later use that the user is a registered person whose ID has not yet been named, the system can ask whether the user would like to give the speaker ID a name (such as dad, mom, and so on). If the user agrees, several registered audio clips of that ID are played for reference so the user can tell whose voice it is. Once the user has assigned a name, the system can address the user directly by that name when responding to the user's commands, which makes the interaction more personable.
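The optional naming step keeps a separate map from system-assigned speaker IDs to user-chosen names. In the minimal sketch below, `wants_to_name`, `play_samples`, and `get_name` are hypothetical stand-ins for the device's dialogue and playback functions. An ID that has not been named is returned unchanged.

```python
from typing import Callable, Dict, List


class SpeakerNames:
    """Optional display names for automatically assigned speaker IDs."""

    def __init__(self) -> None:
        self.names: Dict[str, str] = {}

    def address(self, speaker_id: str, samples: List[bytes],
                wants_to_name: Callable[[str], bool],
                play_samples: Callable[[List[bytes]], None],
                get_name: Callable[[], str]) -> str:
        """Return how to address the speaker, offering to name unnamed IDs."""
        if speaker_id not in self.names and wants_to_name(speaker_id):
            play_samples(samples)                 # let the user hear whose voice it is
            self.names[speaker_id] = get_name()   # e.g. "dad" or "mom"
        return self.names.get(speaker_id, speaker_id)
```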

Referring to Fig. 5, a block diagram of an automatic registration voiceprint recognition apparatus according to an embodiment of the present invention is shown.

As shown in Fig. 5, the automatic registration voiceprint recognition apparatus 500 includes an acquisition, extraction and determination program module 510, a storage determination program module 520, and a clustering registration update program module 530.

The acquisition, extraction and determination program module 510 is configured to, in response to acquiring audio containing a wake-up word and a command word, extract a first voiceprint feature of the audio and determine whether the first voiceprint feature matches a voiceprint template in a registered voiceprint library; the storage determination program module 520 is configured to, if the first voiceprint feature does not match, store the first voiceprint feature and the audio in a historical audio library and determine whether the number of historical usage records corresponding to the first voiceprint feature in the historical audio library is less than a preset number; and the clustering registration update program module 530 is configured to, if the number of records is not less than the preset number, automatically cluster the voiceprints of the audio corresponding to the first voiceprint feature in the historical audio library, automatically register the voiceprint of the user corresponding to the first voiceprint feature based on the clustering result, and update the registered voiceprint library.
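Apparatus 500 can be viewed as three collaborating components. The skeleton below is one hypothetical decomposition that connects the three modules through dependency injection. The callables passed in represent the extraction and matching, storage, and clustering logic, which the patent does not specify in code form.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class AutoRegistrationApparatus:
    # Module 510: extract the first voiceprint feature and try to match it.
    extract: Callable[[bytes], Vector]
    match: Callable[[Vector], Optional[str]]
    # Module 520: store feature + audio; True while history is below the preset number.
    store_and_is_below_preset: Callable[[bytes, Vector], bool]
    # Module 530: cluster the history, register templates, update the library.
    cluster_and_register: Callable[[], None]

    def process(self, audio: bytes) -> Optional[str]:
        feature = self.extract(audio)
        speaker_id = self.match(feature)
        if speaker_id is not None:
            return speaker_id                     # registered speaker recognized
        if not self.store_and_is_below_preset(audio, feature):
            self.cluster_and_register()
        return None                               # reported as unregistered
```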

It should be understood that the modules recited in fig. 5 correspond to various steps in the methods described with reference to fig. 1, 2, and 3. Thus, the operations and features described above for the method and the corresponding technical effects are also applicable to the modules in fig. 5, and are not described again here.

It should be noted that the modules in the embodiments of the present disclosure are not intended to limit the solution of the present disclosure. For example, the acquisition, extraction and determination program module may be described as a module that, in response to acquiring audio containing a wake word and a command word, extracts a first voiceprint feature of the audio and determines whether the first voiceprint feature matches a voiceprint template in a registered voiceprint library. In addition, the related functional modules may also be implemented by a hardware processor; for example, the acquisition, extraction and determination program module may also be implemented by a processor, which is not described again here.

In other embodiments, the present invention further provides a non-volatile computer storage medium storing computer-executable instructions, which can execute the automatic registration voiceprint recognition method in any of the above method embodiments.

As one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:

in response to acquiring audio containing a wake word and a command word, extract a first voiceprint feature of the audio, and determine whether the first voiceprint feature matches a voiceprint template in a registered voiceprint library;

if not, store the first voiceprint feature and the audio in a historical audio library, and determine whether the historical usage record corresponding to the first voiceprint feature in the historical audio library is less than a preset number of times;

and if not, automatically cluster the voiceprints of the audio corresponding to the first voiceprint feature in the historical audio library, automatically register the voiceprint of the user corresponding to the first voiceprint feature based on the clustering result, and update the registered voiceprint library.

The non-volatile computer-readable storage medium may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the automatic registration voiceprint recognition apparatus, and the like. Further, the non-volatile computer-readable storage medium may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, which may be connected to the automatic registration voiceprint recognition apparatus over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform any of the above-described automatic registration voiceprint recognition methods.

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device includes one or more processors 610 and a memory 620; fig. 6 takes one processor 610 as an example. The device for performing the automatic registration voiceprint recognition method may further include an input device 630 and an output device 640. The processor 610, the memory 620, the input device 630, and the output device 640 may be connected by a bus or by other means; fig. 6 takes a bus connection as an example. The memory 620 is the non-volatile computer-readable storage medium described above. The processor 610 runs the non-volatile software programs, instructions, and modules stored in the memory 620 to execute the various functional applications and data processing of the server, that is, to implement the automatic registration voiceprint recognition method of the above method embodiment. The input device 630 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the device. The output device 640 may include a display device such as a display screen.

The above product can execute the method provided by the embodiments of the present invention, and has the functional modules and beneficial effects corresponding to executing the method. For technical details not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present invention.

As an embodiment, the electronic device is applied to an automatic registration voiceprint recognition apparatus for a client, and includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to perform the steps of the automatic registration voiceprint recognition method described above.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices: such devices are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communications. Such terminals include smartphones (e.g., iPhone), multimedia phones, feature phones, and low-end phones, among others.

(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.

(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players (e.g., iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.

(4) Servers: their architecture is similar to that of a general-purpose computer, but because they must provide highly reliable services, they have higher requirements on processing capability, stability, reliability, security, scalability, manageability, and the like.

(5) Other electronic devices with data interaction functions.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. An automatic registration voiceprint recognition method, comprising:

2. The method of claim 1, wherein said extracting, in response to obtaining audio containing a wake word and a command word, a first voiceprint feature of the audio comprises:

extracting a second voiceprint feature corresponding to the wake word and a third voiceprint feature corresponding to the command word, and fusing the second voiceprint feature and the third voiceprint feature into the first voiceprint feature by taking their mean.

3. The method of claim 1, further comprising, before extracting the first voiceprint feature of the audio in response to obtaining audio containing a wake word and a command word:

asking the user whether to turn on voiceprint recognition;

in response to an instruction from the user confirming that voiceprint recognition is to be turned on, extracting the first voiceprint feature of the wake word and the command word;

and in response to an instruction from the user declining to turn on voiceprint recognition, not extracting the first voiceprint feature of the wake word and the command word.

4. The method of claim 1, wherein the determining whether the first voiceprint feature matches a voiceprint template in a registered voiceprint library comprises:

determining whether the registered voiceprint library is empty;

and if the registered voiceprint library is empty, determining that the first voiceprint feature does not match any voiceprint template in the registered voiceprint library.

5. The method of claim 1, wherein after determining whether the first voiceprint feature matches a voiceprint template in a registered voiceprint library, the method further comprises:

6. The method of claim 1, further comprising, after determining whether the historical usage record corresponding to the first voiceprint feature in the historical audio library is less than the preset number of times:

7. The method of claim 1, wherein the automatically clustering voiceprints for audio in the historical audio library corresponding to the first voiceprint feature comprises:

based on the extracted first voiceprint feature, classifying the historical records by a speaker clustering technique, registering a voiceprint template for the class corresponding to the first voiceprint feature, assigning an identity ID, and updating the registered voiceprint library.

8. An automatic registration voiceprint recognition device, comprising:

an acquisition, extraction and determination program module configured to, in response to acquiring audio containing a wake word and a command word, extract a first voiceprint feature of the audio and determine whether the first voiceprint feature matches a voiceprint template in a registered voiceprint library;

a storage determination program module configured to, if the first voiceprint feature does not match, store the first voiceprint feature and the audio in a historical audio library and determine whether the historical usage record corresponding to the first voiceprint feature in the historical audio library is less than a preset number of times;

and a cluster registration update program module configured to, if the historical usage record is not less than the preset number of times, automatically cluster the voiceprints of the audio corresponding to the first voiceprint feature in the historical audio library, automatically register the voiceprint of the user corresponding to the first voiceprint feature based on the clustering result, and update the registered voiceprint library.

9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1 to 7.

10. A storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
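Claims 2 and 7 can be illustrated with a short Python sketch. The fusion of claim 2 is taken literally as the element-wise mean of the wake-word and command-word features. The patent does not name a specific speaker clustering technique, so a simple greedy centroid clustering is swapped in as a stand-in for claim 7; the function names, the similarity threshold, and the minimum cluster size required before an ID is assigned are all hypothetical.

```python
import math


def fuse(wake_vec, command_vec):
    """Claim 2: fuse the wake-word and command-word features by element-wise mean."""
    if len(wake_vec) != len(command_vec):
        raise ValueError("features must have the same dimension")
    return [(a + b) / 2 for a, b in zip(wake_vec, command_vec)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def greedy_cluster(features, threshold=0.8):
    """Stand-in clustering: join the most similar centroid above threshold, else open a new cluster."""
    clusters, centroids = [], []
    for f in features:
        best, best_score = None, threshold
        for i, c in enumerate(centroids):
            score = cosine(f, c)
            if score >= best_score:
                best, best_score = i, score
        if best is None:
            clusters.append([f])
            centroids.append(list(f))
        else:
            members = clusters[best]
            members.append(f)
            # Recompute the centroid as the mean of the cluster's members.
            centroids[best] = [sum(m[d] for m in members) / len(members) for d in range(len(f))]
    return clusters, centroids


def register_clusters(features, min_size=3, threshold=0.8, start_id=1):
    """Claim 7 stand-in: give each large-enough cluster an ID, with its centroid as template."""
    clusters, centroids = greedy_cluster(features, threshold)
    library, next_id = {}, start_id
    for members, centroid in zip(clusters, centroids):
        if len(members) >= min_size:
            library[next_id] = centroid
            next_id += 1
    return library


history = [[1.0, 0.0], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 1.0]]
print(fuse([1.0, 2.0], [3.0, 4.0]))  # [2.0, 3.0]
print(register_clusters(history))    # only the first speaker has 3 samples, so only ID 1 is assigned
```

A production system would more likely use an established method such as agglomerative clustering on speaker embeddings; the greedy version is used here only because it fits in a few self-contained lines.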