CN111161746B - Voiceprint registration method and system - Google Patents


Publication number
CN111161746B
Authority
CN
China
Prior art keywords
voice
environmental audio
registered text
user
registration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911409832.8A
Other languages
Chinese (zh)
Other versions
CN111161746A (en
Inventor
顾向涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd
Priority to CN201911409832.8A
Publication of CN111161746A
Application granted
Publication of CN111161746B
Legal status: Active

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/22Interactive procedures; Man-machine interfaces
    • G10L17/24Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L2021/02082Noise filtering the noise being echo, reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiment of the invention provides a voiceprint registration method. The method comprises the following steps: playing the voiceprint registration flow to a user through voice interaction to give voiceprint registration guidance, collecting environmental audio while the guidance is being given, and prompting the user to speak a registration text when the duration of the collected environmental audio meets a preset threshold; performing echo cancellation on the collected environmental audio, and recognizing the registered text voice input by the user; determining the signal-to-noise ratio, the speech rate and the amplitude clipping (interception) as scoring factors, and determining a comprehensive score of the registered text voice from the scoring factors; and when the comprehensive score reaches a preset score threshold, storing the environmental audio collected during voiceprint registration guidance in an environmental audio database, and registering the voiceprint of the registered text voice. The embodiment of the invention also provides a voiceprint registration system. The embodiment of the invention improves audio recognition and the accuracy of signal-to-noise ratio calculation during voiceprint registration, hides the collection of environmental sound within continuous voice guidance, and prevents the user from interfering with the environmental audio.

Description

Voiceprint registration method and system
Technical Field
The invention relates to the field of intelligent voice, in particular to a voiceprint registration method and a voiceprint registration system.
Background
With the development of intelligent speech technology, voiceprints can be used for identity verification. Voiceprint verification requires prior voiceprint registration (enrollment). During registration, the energy of the ambient background noise is typically measured to judge whether the registration environment is quiet, the quality of the registration audio is checked with a signal-to-noise ratio computed from audio endpoint detection, and audio that meets the requirements is registered as the voiceprint.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the related art:
collecting the ambient sound takes a long time, so the registration interaction is not user-friendly; in a noisy environment, ambient sound and human voice cannot be distinguished correctly, so the signal-to-noise ratio is calculated inaccurately; and the speech rate cannot be estimated accurately, which degrades recognition accuracy in the voiceprint recognition stage.
Disclosure of Invention
The present invention aims to solve at least the following problems of the prior art: collecting the ambient sound takes a long time and harms the user experience, the signal-to-noise ratio is calculated inaccurately in the registration stage, and the speech rate affects recognition accuracy in the voiceprint recognition stage.
In a first aspect, an embodiment of the present invention provides a voiceprint registration method, including:
playing a voiceprint registration flow to a user through voice interaction to conduct voiceprint registration guidance, collecting environmental audio while conducting the voiceprint registration guidance, and prompting the user to speak out a registration text when the duration of the collected environmental audio meets a preset threshold, wherein the voice interaction duration of the voiceprint registration guidance is matched with the duration of the collected environmental audio;
carrying out echo cancellation on the collected environmental audio to remove self-noise of the registered text voice and identify the registered text voice input by the user;
determining a signal-to-noise ratio of the registered text voice based on an environmental audio database, detecting a speech speed and an amplitude interception of the registered text voice, determining the signal-to-noise ratio, the speech speed and the amplitude interception as scoring factors, and determining a comprehensive score of the registered text voice through the scoring factors;
and when the comprehensive score reaches a preset score threshold value, storing the environmental audio collected during voiceprint registration guidance to the environmental audio database, and carrying out voiceprint registration on the registered text voice.
In a second aspect, an embodiment of the present invention provides a voiceprint registration method, including:
playing a voiceprint registration flow to a user through interface interaction to conduct voiceprint registration guidance, collecting environmental audio while conducting the voiceprint registration guidance, and prompting the user to speak out a registration text when the duration of the collected environmental audio meets a preset threshold, wherein the duration of the voiceprint registration guidance interface interaction and the user interaction flow is matched with the duration of the collected environmental audio;
recognizing a registered text voice input by a user;
determining a signal-to-noise ratio of the registered text voice based on an environmental audio database, detecting a speech speed and an amplitude interception of the registered text voice, determining the signal-to-noise ratio, the speech speed and the amplitude interception as scoring factors, and determining a comprehensive score of the registered text voice through the scoring factors;
and when the comprehensive score reaches a preset score threshold value, storing the environmental audio collected during voiceprint registration guidance to the environmental audio database, and carrying out voiceprint registration on the registered text voice.
In a third aspect, an embodiment of the present invention provides a voiceprint registration system, including:
an environmental audio acquisition program module, configured to play a voiceprint registration flow to a user through voice interaction to perform voiceprint registration guidance, acquire environmental audio while the voiceprint registration guidance is performed, and prompt the user to speak a registration text when the duration of the acquired environmental audio meets a preset threshold, wherein the voice interaction duration of the voiceprint registration guidance is matched with the duration of the acquired environmental audio;
a registered text voice recognition program module, configured to perform echo cancellation on the collected environmental audio so as to remove the self-noise of the registered text voice, and recognize the registered text voice input by the user;
a comprehensive score determination program module, configured to determine a signal-to-noise ratio of the registered text voice based on an environmental audio database, detect a speech rate and an amplitude interception of the registered text voice, determine the signal-to-noise ratio, the speech rate, and the amplitude interception as scoring factors, and determine a comprehensive score of the registered text voice through the scoring factors;
and a voiceprint registration program module, configured to store the environmental audio collected during voiceprint registration guidance to the environmental audio database and perform voiceprint registration on the registered text voice when the comprehensive score reaches a preset score threshold value.
In a fourth aspect, an embodiment of the present invention provides a voiceprint registration system, including:
an environmental audio acquisition program module, configured to play a voiceprint registration flow to a user through interface interaction to perform voiceprint registration guidance, acquire environmental audio while the voiceprint registration guidance is performed, and prompt the user to speak a registration text when the duration of the acquired environmental audio meets a preset threshold, wherein the duration of the user interaction flow in the interface interaction of the voiceprint registration guidance is matched with the duration of the acquired environmental audio;
a registered text voice recognition program module, configured to recognize the registered text voice input by the user;
a comprehensive score determination program module, configured to determine a signal-to-noise ratio of the registered text voice based on an environmental audio database, detect a speech rate and an amplitude interception of the registered text voice, determine the signal-to-noise ratio, the speech rate, and the amplitude interception as scoring factors, and determine a comprehensive score of the registered text voice through the scoring factors;
and a voiceprint registration program module, configured to store the environmental audio collected during voiceprint registration guidance to the environmental audio database and perform voiceprint registration on the registered text voice when the comprehensive score reaches a preset score threshold value.
In a fifth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the voiceprint registration method of any of the embodiments of the present invention.
In a sixth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the computer program is configured to, when executed by a processor, implement the steps of the voiceprint registration method according to any one of the embodiments of the present invention.
The beneficial effects of the embodiments of the invention are as follows. In the parameter extraction stage, the speech rate and amplitude clipping are additionally considered. Strictly speaking, neither is an audio-quality parameter, but both affect recognition after voiceprint registration, so introducing these two factors improves recognition performance for the audio registered as a voiceprint. In addition, an optimized signal-to-noise ratio calculation is used: environmental audio from registrations with a high evaluation score is cached in the environmental audio database used for the signal-to-noise ratio calculation, and the database is iterated continuously, which improves the accuracy of the calculation. The collection of environmental audio could be placed at any point in the flow. However, the inventors considered that, if it were placed after the user speaks the registration text, the user would be unlikely to leave a separate quiet period for collection and might disturb the environmental audio. Placing it as an intermediate step would lengthen parameter processing, and it would again be hard to ensure that the user does not interfere with the environmental sound. Guidance is needed anyway before the user speaks the registration text, so the inventors simply extend the guidance appropriately and hide the collection of environmental sound within it: the user does not notice the collection, and continuous voice guidance keeps the user from interjecting.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a voiceprint registration method according to an embodiment of the present invention;
fig. 2 is a flowchart of a voiceprint registration method according to another embodiment of the present invention;
fig. 3 is a schematic structural diagram of a voiceprint registration system according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a voiceprint registration system according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a voiceprint registration method according to an embodiment of the present invention, which includes the following steps:
S11: playing a voiceprint registration flow to a user through voice interaction to conduct voiceprint registration guidance, collecting environmental audio while conducting the voiceprint registration guidance, and prompting the user to speak out a registration text when the duration of the collected environmental audio meets a preset threshold, wherein the voice interaction duration of the voiceprint registration guidance is matched with the duration of the collected environmental audio;
S12: carrying out echo cancellation on the collected environmental audio to remove self-noise of the registered text voice and identify the registered text voice input by the user;
S13: determining a signal-to-noise ratio of the registered text voice based on an environmental audio database, detecting a speech speed and an amplitude interception of the registered text voice, determining the signal-to-noise ratio, the speech speed and the amplitude interception as scoring factors, and determining a comprehensive score of the registered text voice through the scoring factors;
S14: when the comprehensive score reaches a preset score threshold value, storing the environmental audio collected during voiceprint registration guidance to the environmental audio database, and carrying out voiceprint registration on the registered text voice.
In this embodiment, the method is applicable to various intelligent voice devices, such as smart speakers and smartphones. For example, when a user wants to use a voiceprint for verification, the smart speaker prompts the user to enter the voiceprint registration flow.
For step S11, after the user enters the registration flow, the smart speaker announces the voiceprint registration flow through its loudspeaker to guide the user through registration. During guidance, the flow is announced continuously. For example, prompts can remind the user to keep quiet, to turn down the television or computer, and to avoid introducing other noise, such as noise from a washing machine, a range hood, or a robot vacuum cleaner. Further prompts can say that the registration flow is loading and that the user will shortly be asked to repeat the prompted registration text. The guidance is extended appropriately so that its duration matches the fixed duration of environmental audio required. The collection of environmental audio is thereby hidden within the guidance, and the continuous voice guidance both prompts the user to avoid surrounding noise and keeps the user from interjecting. In this way, environmental audio of sufficient duration is obtained.
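The timing logic of step S11 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `GuidanceSession` class, the `run_guidance` function, and the prompt durations are all hypothetical names and values, and the microphone is assumed to record for exactly as long as each prompt plays.

```python
from dataclasses import dataclass, field


@dataclass
class GuidanceSession:
    """Hypothetical session state for one registration attempt."""
    required_ambient_s: float          # preset duration threshold (assumed value)
    collected_s: float = 0.0           # ambient audio captured so far, in seconds
    played: list = field(default_factory=list)


def run_guidance(session, prompts):
    """Play guidance prompts back to back until enough ambient audio is captured.

    `prompts` is a list of (text, duration_seconds) pairs. Recording is assumed
    to run while each prompt plays, so every prompt adds its own duration to
    the collected ambient audio.
    """
    for text, duration_s in prompts:
        if session.collected_s >= session.required_ambient_s:
            break
        session.played.append(text)
        session.collected_s += duration_s
    # The user is asked for the registration text only once the threshold is met.
    return session.collected_s >= session.required_ambient_s
```

With a 5-second threshold and 2-second prompts, three prompts are played before the user would be asked to speak; a caller would extend the prompt list if the function returns `False`.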
For step S12, while the environmental audio is being collected, the smart speaker is continuously speaking to the user, so the speaker's own announcement may be present in the recorded environmental audio, where it counts as noise. An echo cancellation algorithm removes the device's own announcement from the environmental audio, yielding purer environmental audio.
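The patent does not name its echo cancellation algorithm (elsewhere it only says the algorithm is customized). As an illustration of the general idea, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a common textbook technique that is swapped in here as an assumption. The function name and the `taps`, `mu`, and `eps` parameters are hypothetical.

```python
def nlms_echo_cancel(mic, reference, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the played-back prompt from the mic signal.

    mic       -- samples recorded by the microphone (ambient sound plus echo)
    reference -- samples the device played through its loudspeaker
    Returns the residual signal, i.e. the estimate of the ambient audio.
    """
    weights = [0.0] * taps
    history = [0.0] * taps             # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, reference):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate          # what remains after removing the echo estimate
        norm = sum(h * h for h in history) + eps
        # NLMS update: step size is normalized by the reference signal energy.
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual
```

The filter needs some samples to converge, so the first part of the residual still contains echo. A real device would also have to handle delays longer than `taps` samples and nonlinear loudspeaker distortion, which this sketch ignores.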
As an embodiment, the recognizing the registered text speech input by the user includes:
and carrying out voice recognition on the registered text voice input by the user and collected in real time, and stopping collection when the registered text voice input by the user is recognized.
In this embodiment, the captured registration-text audio is sent to the speech recognition module in real time, and the microphone stops capturing once the user has finished speaking the registration text. The position of each word of the registration text can also be marked accurately in the audio, which is more accurate than audio endpoint detection.
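The stop condition described above can be sketched as a loop over incoming audio chunks. The `recognize` callable stands in for the speech recognition module and is a hypothetical interface, assumed to return the transcript of everything buffered so far.

```python
def collect_until_text_spoken(chunks, recognize, registration_text):
    """Buffer audio chunks and stop as soon as the registration text is recognized.

    chunks            -- iterable of audio chunks from the microphone
    recognize         -- hypothetical callable: list of chunks -> transcript string
    registration_text -- the text the user was prompted to speak
    """
    target = registration_text.split()
    if not target:
        raise ValueError("registration text must not be empty")
    buffered = []
    for chunk in chunks:
        buffered.append(chunk)
        words = recognize(buffered).split()
        # Stop once the transcript ends with the full registration text.
        if words[-len(target):] == target:
            break
    return buffered
```

Re-running recognition on the whole buffer for every chunk is wasteful; a streaming recognizer that returns incremental results would be the natural replacement.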
For step S13, the signal-to-noise ratio is calculated first. With the assistance of the environmental audio database, the position information from speech recognition is used to separate the audio into the user's speech segments and the environmental background noise, and the signal-to-noise ratio of the current registration-text audio is calculated with a signal-to-noise ratio algorithm.
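A minimal sketch of this calculation, assuming the recognizer supplies sample-index spans for the spoken words. It uses the standard power-ratio definition of the signal-to-noise ratio in decibels. The role of the environmental audio database is not modeled here, and the function names and the `noise_floor` guard are hypothetical.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(audio, speech_spans, noise_floor=1e-12):
    """Estimate the SNR in dB from recognizer-marked speech spans.

    audio        -- list of samples for the whole recording
    speech_spans -- list of (start, end) sample indices marked as speech
    Samples outside every span are treated as background noise.
    """
    in_speech = [False] * len(audio)
    for start, end in speech_spans:
        for i in range(start, end):
            in_speech[i] = True
    speech = [s for s, flag in zip(audio, in_speech) if flag]
    noise = [s for s, flag in zip(audio, in_speech) if not flag]
    if not speech or not noise:
        raise ValueError("need both speech and noise samples")
    return 10 * math.log10(mean_power(speech) / max(mean_power(noise), noise_floor))
```

Speech that is ten times the noise amplitude gives a power ratio of 100, which is 20 dB.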
As an embodiment, detecting the speech rate and the amplitude clipping (amplitude interception) of the registered text speech includes:
determining the speech rate of the registered text speech based on an audio endpoint checking algorithm;
and detecting whether the signal waveform of the registered text voice exceeds a preset maximum amplitude or not so as to determine whether the registered text voice is truncated or not.
In this embodiment, the speech rate check uses the speech recognition result to mark the position of each word of the registration text, and obtains the average duration per word by dividing the audio length of the registered content by the number of words. The user's speech rate is then judged against preset thresholds: when the speech rate check value falls within the preset threshold interval, the user's speech rate is considered to meet the requirement.
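The speech rate check reduces to one division and one interval test. In the sketch below, the word spans are assumed to come from the recognizer as (start, end) times in seconds, and the default threshold interval of 0.15 to 0.45 seconds per word is an illustrative assumption, not a value from the patent.

```python
def average_word_duration(word_spans):
    """Average seconds per word: spoken audio length divided by the word count.

    word_spans -- list of (start_s, end_s) pairs, one per recognized word,
                  in spoken order
    """
    if not word_spans:
        raise ValueError("no words recognized")
    total_s = word_spans[-1][1] - word_spans[0][0]
    return total_s / len(word_spans)


def speech_rate_ok(word_spans, min_s=0.15, max_s=0.45):
    """True when the average word duration lies in the preset interval (assumed bounds)."""
    return min_s <= average_word_duration(word_spans) <= max_s
```

Four words spoken over 1.2 seconds average 0.3 seconds per word, which passes with these bounds; two words in 0.2 seconds would be rejected as too fast.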
The clipping check uses the registration-text audio marked by speech recognition to determine whether the audio exceeds the maximum amplitude of the microphone.
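A simple way to detect clipping is to count samples that reach full scale. The sketch below assumes 16-bit PCM samples, so full scale is 32767, and treats the recording as clipped when more than an assumed fraction of samples hit that limit. Both the function names and the `max_fraction` default are hypothetical.

```python
def clipped_fraction(samples, max_amplitude=32767):
    """Fraction of samples whose magnitude reaches the maximum amplitude."""
    if not samples:
        raise ValueError("no samples")
    hits = sum(1 for s in samples if abs(s) >= max_amplitude)
    return hits / len(samples)


def is_clipped(samples, max_amplitude=32767, max_fraction=0.001):
    """True when more than `max_fraction` of the samples sit at full scale."""
    return clipped_fraction(samples, max_amplitude) > max_fraction
```

A stricter variant could require runs of consecutive full-scale samples, since a single peak at full scale is not necessarily audible distortion.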
The results of these three checks are used as the scoring factors of the registered text speech, and the composite score of the registered text speech is determined from them.
The determining a composite score for the registered text speech by the scoring factor comprises:
determining an audio quality score for the entirety of the registered text speech based on the signal-to-noise ratio;
determining a first word recognition effect score for a word within the registered text speech based on the speech rate;
determining a second word recognition effect score for a word within the registered text speech based on the clipping;
determining a composite score for the registered text speech by the audio quality score, the first word recognition effect score, and the second word recognition effect score.
In this embodiment, as described above, the signal-to-noise ratio of the registration-text audio is calculated with a signal-to-noise ratio algorithm and then scored. For example, if a perfect signal-to-noise ratio is set to 1 and the determined signal-to-noise ratio is 70% of that value, the signal-to-noise ratio score is 70.
Similarly, a reference speech rate interval is set, and the score depends on which part of the interval the user's speech rate in the registered text speech falls in. For example, the reference interval is divided into 7 sub-intervals which, in order of increasing speech rate, score 15, 40, 65, 100, 65, 40, and 15 points. This yields the speech rate score.
The second word recognition effect score, for clipping, is determined in the same way.
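The scoring just described can be sketched as below. The 70%-to-70-points mapping and the seven band scores come from the text; everything else is an assumption made for illustration: the equal-width bands, the clamping of out-of-range values, and in particular the weighted average in `composite_score`, since the patent does not state how the three scores are combined.

```python
# Band scores from the description, in order of increasing speech rate.
SPEECH_RATE_BAND_SCORES = [15, 40, 65, 100, 65, 40, 15]


def snr_score(snr, perfect_snr):
    """Score the SNR as a percentage of the perfect SNR, capped to 0..100."""
    ratio = max(0.0, min(snr / perfect_snr, 1.0))
    return round(100 * ratio)


def speech_rate_score(rate, low, high):
    """Map a speech rate onto 7 equal-width bands of [low, high] (assumed layout)."""
    bands = SPEECH_RATE_BAND_SCORES
    width = (high - low) / len(bands)
    index = int((rate - low) // width)
    index = max(0, min(index, len(bands) - 1))   # clamp values outside the interval
    return bands[index]


def composite_score(snr_s, rate_s, clip_s, weights=(0.5, 0.25, 0.25)):
    """Weighted average of the three factor scores (weights are an assumption)."""
    w_snr, w_rate, w_clip = weights
    return w_snr * snr_s + w_rate * rate_s + w_clip * clip_s
```

For instance, a signal-to-noise ratio at 70% of perfect, a speech rate in the middle band, and a full clipping score combine to 85 with these weights.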
The invention takes the speech rate and clipping into account. Neither is strictly an audio-quality parameter, but the inventors considered recognition after voiceprint registration and introduced these two factors to improve recognition performance for the audio registered as a voiceprint.
As an embodiment, when the composite score does not reach a preset score threshold, respectively detecting comparison results of the audio quality score, the first word recognition effect score and the second word recognition effect score with respective corresponding preset thresholds;
and feeding back the scoring factors which do not reach the corresponding preset threshold values to the user.
A composite score can fall short because one scoring factor is too low. In that case, the specific factor that falls short is identified. If the audio quality score for the signal-to-noise ratio is insufficient, the user is prompted to input the registered text speech again. If the first word recognition effect score for the speech rate is insufficient, the user is prompted to speak more slowly and to input the registered text speech again. If the second word recognition effect score for clipping is insufficient, the user is prompted to speak more quietly and to input the registered text speech again.
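The per-factor feedback can be sketched as a lookup over the factors that miss their thresholds. The factor keys, the threshold values, and the prompt wording below are all hypothetical; in particular, the clipping prompt assumes that clipping is caused by speaking too loudly.

```python
# Hypothetical prompt wording for each scoring factor.
FEEDBACK_MESSAGES = {
    "snr": "Please repeat the registration text.",
    "speech_rate": "Please speak a little more slowly and repeat the registration text.",
    "clipping": "Please speak a little more softly and repeat the registration text.",
}


def failing_factor_feedback(scores, thresholds):
    """Return (factor, message) pairs for every factor below its threshold.

    scores, thresholds -- dicts keyed by "snr", "speech_rate", and "clipping"
    """
    return [
        (name, FEEDBACK_MESSAGES[name])
        for name in ("snr", "speech_rate", "clipping")
        if scores[name] < thresholds[name]
    ]
```

An empty result means every factor met its own threshold even though the composite score fell short, in which case a generic retry prompt would be needed.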
For step S14, a composite score that reaches the preset score threshold indicates that the environmental audio collected this time is of good quality, so the collected environmental audio is stored in the environmental audio database. The database is thereby iterated continuously, which improves the accuracy of signal-to-noise ratio determination. The inventors found that a high signal-to-noise ratio score indicates, to some extent, that the voice and the environmental sound can be separated. However, because the voice is not fixed, the separation performance is limited, and environmental sound separated in this way, even with a high score, only fills the database faster without a clear gain in accuracy. In this method, the environmental audio is instead collected separately while the system is playing its guidance, and a customized echo cancellation algorithm is applied, so relatively complete environmental audio is obtained, which benefits iteration. This can help the user, to varying degrees, produce higher-quality registration audio in the next round of registration prompts. The registered text speech is then registered as the voiceprint.
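Step S14 amounts to a gate on the composite score. In the sketch below, the `AmbientAudioDatabase` class is a hypothetical in-memory stand-in for the environmental audio database, and `enroll` is a hypothetical callback that performs the actual voiceprint registration.

```python
from dataclasses import dataclass, field


@dataclass
class AmbientAudioDatabase:
    """Hypothetical in-memory stand-in for the environmental audio database."""
    clips: list = field(default_factory=list)

    def add(self, clip):
        self.clips.append(clip)


def finish_registration(composite, threshold, ambient_clip, database, enroll):
    """Store the ambient clip and enroll the voiceprint only if the score passes.

    enroll -- hypothetical zero-argument callback that registers the voiceprint
    Returns True when registration went ahead, False otherwise.
    """
    if composite < threshold:
        return False
    database.add(ambient_clip)     # the database grows with each good registration
    enroll()
    return True
```

Because only clips from passing registrations are stored, the database accumulates ambient audio recorded under conditions that produced acceptable registration audio.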
It can be seen from this embodiment that, in the parameter extraction stage, the speech rate and clipping are additionally considered. Strictly speaking, neither is an audio-quality parameter, but both affect recognition after voiceprint registration, so introducing these two factors improves recognition performance for the audio registered as a voiceprint. In addition, an optimized signal-to-noise ratio calculation is used: environmental audio from registrations with a high evaluation score is cached in the environmental audio database used for the signal-to-noise ratio calculation, and the database is iterated continuously, which improves the accuracy of the calculation. The collection of environmental audio could be placed at any point in the flow. However, the inventors considered that, if it were placed after the user speaks the registration text, the user would be unlikely to leave a separate quiet period for collection and might disturb the environmental audio. Placing it as an intermediate step would lengthen parameter processing, and it would again be hard to ensure that the user does not interfere with the environmental sound. Guidance is needed anyway before the user speaks the registration text, so the inventors simply extend the guidance appropriately and hide the collection of environmental sound within it: the user does not notice the collection, and continuous voice guidance keeps the user from interjecting.
Fig. 2 is a flowchart of a voiceprint registration method according to another embodiment of the present invention, which includes the following steps:
S21: playing a voiceprint registration flow to a user through interface interaction to conduct voiceprint registration guidance, collecting environmental audio while conducting the voiceprint registration guidance, and prompting the user to speak out a registration text when the duration of the collected environmental audio meets a preset threshold, wherein the duration of the voiceprint registration guidance interface interaction and the user interaction flow is matched with the duration of the collected environmental audio;
S22: recognizing a registered text voice input by a user;
S23: determining a signal-to-noise ratio of the registered text voice based on an environmental audio database, detecting a speech speed and an amplitude interception of the registered text voice, determining the signal-to-noise ratio, the speech speed and the amplitude interception as scoring factors, and determining a comprehensive score of the registered text voice through the scoring factors;
S24: when the comprehensive score reaches a preset score threshold value, storing the environmental audio collected during voiceprint registration guidance to the environmental audio database, and carrying out voiceprint registration on the registered text voice.
In this embodiment, the method is applicable to various intelligent voice devices, for example devices with screens such as smartphones. When the user wants to use a voiceprint for verification, the device prompts the user to enter the voiceprint registration flow.
In step S21, after the user enters the registration flow, guidance is shown on the smartphone's interface. During guidance, a sequence of screens is displayed to the user, each presenting part of the voiceprint registration flow, and displaying multiple screens ensures that the environmental audio is collected for long enough. The displayed content is designed so that its total display time matches the time needed to collect the environmental audio. This avoids the phone's self-noise while the environmental audio is obtained.
For step S22, because step S21 uses interface interaction, no self-noise is introduced, so echo cancellation is unnecessary.
Steps S23 and S24 are the same as steps S13 and S14, and are not repeated here.
As this embodiment shows, compared with listening to spoken guidance, the user can follow the on-screen content without speaking, and interface interaction avoids the device's self-noise, which makes the environmental audio purer.
Fig. 3 is a schematic structural diagram of a voiceprint registration system according to an embodiment of the present invention, which can execute the voiceprint registration method according to any of the above embodiments and is configured in a terminal.
The voiceprint registration system provided by this embodiment comprises: an environmental audio collection program module 11, a registered text voice recognition program module 12, a comprehensive score determination program module 13 and a voiceprint registration program module 14.
The environmental audio collection program module 11 is configured to play a voiceprint registration flow to a user through voice interaction to perform voiceprint registration guidance, collect environmental audio while the voiceprint registration guidance is performed, and prompt the user to speak a registration text when the duration of the collected environmental audio meets a preset threshold, where the voice interaction duration of the voiceprint registration guidance is matched with the duration of the collected environmental audio; the registered text voice recognition program module 12 is configured to perform echo cancellation on the collected environmental audio to remove self-noise of the registered text voice, and to recognize the registered text voice input by the user; the comprehensive score determination program module 13 is configured to determine a signal-to-noise ratio of the registered text voice based on the environmental audio database, detect a speech rate and an amplitude truncation of the registered text voice, take the signal-to-noise ratio, the speech rate and the amplitude truncation as scoring factors, and determine a comprehensive score of the registered text voice from the scoring factors; and the voiceprint registration program module 14 is configured to, when the comprehensive score reaches a preset score threshold, store the environmental audio collected during the voiceprint registration guidance in the environmental audio database, and perform voiceprint registration on the registered text voice.
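The cooperation of the four program modules can be sketched as a small pipeline. Everything here is hypothetical scaffolding: the class name, the callable signatures and the default threshold are assumptions, and the signal processing itself (echo cancellation, recognition, scoring) is left to whatever functions are plugged in.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Audio = Sequence[float]


@dataclass
class VoiceprintRegistrationSystem:
    collect_ambient: Callable[[], Audio]       # module 11: guidance + collection
    cancel_echo: Callable[[Audio], Audio]      # module 12: remove self-noise
    capture_text_voice: Callable[[], Audio]    # module 12: recognized user speech
    score: Callable[[Audio, Audio], float]     # module 13: comprehensive score
    register: Callable[[Audio], None]          # module 14: voiceprint registration
    score_threshold: float = 0.7               # assumed preset score threshold
    ambient_db: List[Audio] = field(default_factory=list)

    def run(self) -> bool:
        ambient = self.cancel_echo(self.collect_ambient())
        voice = self.capture_text_voice()
        if self.score(voice, ambient) < self.score_threshold:
            return False  # registration rejected; nothing is stored
        # Score reached the threshold: keep the ambient audio and register.
        self.ambient_db.append(ambient)
        self.register(voice)
        return True
```

In the interface-interaction variant, `cancel_echo` could simply return its input unchanged, since no self-noise is introduced.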
An embodiment of the present invention further provides a non-volatile computer storage medium storing computer-executable instructions that can execute the voiceprint registration method of any of the method embodiments described above.
As one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to perform:
playing a voiceprint registration flow to a user through voice interaction to conduct voiceprint registration guidance, collecting environmental audio while conducting the voiceprint registration guidance, and prompting the user to speak out a registration text when the duration of the collected environmental audio meets a preset threshold, wherein the voice interaction duration of the voiceprint registration guidance is matched with the duration of the collected environmental audio;
carrying out echo cancellation on the collected environmental audio to remove self-noise of the registered text voice and identify the registered text voice input by the user;
determining a signal-to-noise ratio of the registered text voice based on an environmental audio database, detecting a speech rate and an amplitude truncation of the registered text voice, taking the signal-to-noise ratio, the speech rate and the amplitude truncation as scoring factors, and determining a comprehensive score of the registered text voice from the scoring factors;
and when the comprehensive score reaches a preset score threshold value, storing the environmental audio collected during voiceprint registration guidance to the environmental audio database, and carrying out voiceprint registration on the registered text voice.
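One way the signal-to-noise ratio could be derived from the environmental audio database is sketched below. This is an assumption-laden illustration: the noise power is taken as the average power of the stored clips, and the speech power is estimated by subtracting it from the power of the recording. The function names are hypothetical, and the embodiment does not prescribe this formula.

```python
import math


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(voice, ambient_db):
    """SNR in dB of the registered text voice against the stored ambient clips."""
    noise_power = sum(mean_power(clip) for clip in ambient_db) / len(ambient_db)
    total_power = mean_power(voice)
    # The recording contains speech plus noise, so subtract the noise estimate.
    speech_power = max(total_power - noise_power, 1e-12)
    return 10.0 * math.log10(speech_power / max(noise_power, 1e-12))
```

The small floor of `1e-12` only guards against taking the logarithm of zero when a clip is silent.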
Fig. 4 is a schematic structural diagram of a voiceprint registration system according to an embodiment of the present invention, which can execute the voiceprint registration method according to any of the above embodiments and is configured in a terminal.
The voiceprint registration system provided by this embodiment comprises: an environmental audio collection program module 21, a registered text voice recognition program module 22, a comprehensive score determination program module 23 and a voiceprint registration program module 24.
The environmental audio collection program module 21 is configured to present a voiceprint registration flow to a user through interface interaction to perform voiceprint registration guidance, collect environmental audio while the voiceprint registration guidance is performed, and prompt the user to speak a registration text when the duration of the collected environmental audio meets a preset threshold, where the duration of the user interaction flow in the interface interaction of the voiceprint registration guidance is matched with the duration of the collected environmental audio; the registered text voice recognition program module 22 is configured to recognize the registered text voice input by the user; the comprehensive score determination program module 23 is configured to determine a signal-to-noise ratio of the registered text voice based on the environmental audio database, detect a speech rate and an amplitude truncation of the registered text voice, take the signal-to-noise ratio, the speech rate and the amplitude truncation as scoring factors, and determine a comprehensive score of the registered text voice from the scoring factors; and the voiceprint registration program module 24 is configured to, when the comprehensive score reaches a preset score threshold, store the environmental audio collected during the voiceprint registration guidance in the environmental audio database, and perform voiceprint registration on the registered text voice.
An embodiment of the present invention further provides a non-volatile computer storage medium storing computer-executable instructions that can execute the voiceprint registration method of any of the method embodiments described above.
As one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to perform:
presenting a voiceprint registration flow to a user through interface interaction to conduct voiceprint registration guidance, collecting environmental audio while conducting the voiceprint registration guidance, and prompting the user to speak a registration text when the duration of the collected environmental audio meets a preset threshold, wherein the duration of the user interaction flow in the interface interaction of the voiceprint registration guidance is matched with the duration of the collected environmental audio;
recognizing a registered text voice input by a user;
determining a signal-to-noise ratio of the registered text voice based on an environmental audio database, detecting a speech rate and an amplitude truncation of the registered text voice, taking the signal-to-noise ratio, the speech rate and the amplitude truncation as scoring factors, and determining a comprehensive score of the registered text voice from the scoring factors;
and when the comprehensive score reaches a preset score threshold value, storing the environmental audio collected during voiceprint registration guidance to the environmental audio database, and carrying out voiceprint registration on the registered text voice.
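The speech-rate and amplitude-truncation checks named in the instructions above can be sketched as follows. The energy-based endpoint detector, the frame length, the thresholds and the idea of dividing a known number of text units by the detected speech duration are all assumptions for illustration; the document only states that an audio endpoint detection algorithm and a preset maximum amplitude are used.

```python
def speech_endpoints(samples, frame_len, energy_threshold):
    """Return (start, end) sample indices of detected speech, or None."""
    active = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy >= energy_threshold:
            active.append(i)
    if not active:
        return None
    return active[0], active[-1] + frame_len


def speech_rate(samples, sample_rate, n_units, frame_len=160,
                energy_threshold=1e-3):
    """Text units per second between the detected speech endpoints.
    n_units is the number of words or syllables in the registration text."""
    ends = speech_endpoints(samples, frame_len, energy_threshold)
    if ends is None:
        return 0.0
    start, end = ends
    return n_units / ((end - start) / sample_rate)


def is_truncated(samples, max_amplitude=0.99):
    """True if any sample reaches the preset maximum amplitude (clipping)."""
    return any(abs(s) >= max_amplitude for s in samples)
```

Because the registration text is known in advance, the unit count can be fixed per prompt, which keeps the rate estimate independent of the recognizer.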
The non-volatile computer-readable storage medium may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the voiceprint registration method of any of the method embodiments described above.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the voiceprint registration method of any of the embodiments of the present invention.
The client of the embodiment of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication capabilities and are primarily intended to provide voice and data communication. Such terminals include smart phones, multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players, handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Other electronic devices with data processing capabilities.
In this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A voiceprint registration method comprising:
playing a voiceprint registration flow to a user through voice interaction to conduct voiceprint registration guidance, collecting environmental audio while conducting the voiceprint registration guidance, and prompting the user to speak out a registration text when the duration of the collected environmental audio meets a preset threshold, wherein the voice interaction duration of the voiceprint registration guidance is matched with the duration of the collected environmental audio;
carrying out echo cancellation on the collected environmental audio to remove self-noise of the registered text voice and identify the registered text voice input by the user;
determining a signal-to-noise ratio of the registered text voice based on an environmental audio database, detecting a speech rate and an amplitude truncation of the registered text voice, taking the signal-to-noise ratio, the speech rate and the amplitude truncation as scoring factors, and determining a comprehensive score of the registered text voice from the scoring factors;
and when the comprehensive score reaches a preset score threshold value, storing the environmental audio collected during voiceprint registration guidance to the environmental audio database, and carrying out voiceprint registration on the registered text voice.
2. The method of claim 1, wherein the detecting the speech rate and the amplitude truncation of the registered text voice comprises:
determining the speech rate of the registered text voice based on an audio endpoint detection algorithm;
and detecting whether the signal waveform of the registered text voice exceeds a preset maximum amplitude, so as to determine whether the registered text voice is truncated.
3. The method of claim 1, wherein said determining a composite score for the registered text speech by the scoring factor comprises:
determining an audio quality score for the entirety of the registered text speech based on the signal-to-noise ratio;
determining a first word recognition effect score for a word within the registered text speech based on the speech rate;
determining a second word recognition effect score for a word within the registered text speech based on the amplitude truncation;
determining a composite score for the registered text speech by the audio quality score, the first word recognition effect score, and the second word recognition effect score.
4. The method of claim 3, wherein the method further comprises: when the comprehensive score does not reach a preset score threshold, respectively detecting comparison results of the audio quality score, the first word recognition effect score and the second word recognition effect score with respective corresponding preset thresholds;
and feeding back the scoring factors which do not reach the corresponding preset threshold values to the user.
5. The method of claim 1, wherein the recognizing the user-entered registered text speech comprises:
and carrying out voice recognition on the registered text voice input by the user and collected in real time, and stopping collection when the registered text voice input by the user is recognized.
6. A voiceprint registration method comprising:
presenting a voiceprint registration flow to a user through interface interaction to conduct voiceprint registration guidance, collecting environmental audio while conducting the voiceprint registration guidance, and prompting the user to speak a registration text when the duration of the collected environmental audio meets a preset threshold, wherein the duration of the user interaction flow in the interface interaction of the voiceprint registration guidance is matched with the duration of the collected environmental audio;
recognizing a registered text voice input by a user;
determining a signal-to-noise ratio of the registered text voice based on an environmental audio database, detecting a speech rate and an amplitude truncation of the registered text voice, taking the signal-to-noise ratio, the speech rate and the amplitude truncation as scoring factors, and determining a comprehensive score of the registered text voice from the scoring factors;
and when the comprehensive score reaches a preset score threshold value, storing the environmental audio collected during voiceprint registration guidance to the environmental audio database, and carrying out voiceprint registration on the registered text voice.
7. A voiceprint registration system comprising:
an environmental audio collection program module, configured to play a voiceprint registration flow to a user through voice interaction to perform voiceprint registration guidance, collect environmental audio while the voiceprint registration guidance is performed, and prompt the user to speak a registration text when the duration of the collected environmental audio meets a preset threshold, wherein the voice interaction duration of the voiceprint registration guidance is matched with the duration of the collected environmental audio;
a registered text voice recognition program module, configured to perform echo cancellation on the collected environmental audio so as to remove the self-noise of the registered text voice, and to recognize the registered text voice input by the user;
a comprehensive score determination program module, configured to determine a signal-to-noise ratio of the registered text voice based on an environmental audio database, detect a speech rate and an amplitude truncation of the registered text voice, take the signal-to-noise ratio, the speech rate, and the amplitude truncation as scoring factors, and determine a comprehensive score of the registered text voice from the scoring factors;
and a voiceprint registration program module, configured to, when the comprehensive score reaches a preset score threshold, store the environmental audio collected during the voiceprint registration guidance in the environmental audio database and perform voiceprint registration on the registered text voice.
8. A voiceprint registration system comprising:
an environmental audio collection program module, configured to present a voiceprint registration flow to a user through interface interaction to perform voiceprint registration guidance, collect environmental audio while the voiceprint registration guidance is performed, and prompt the user to speak a registration text when the duration of the collected environmental audio meets a preset threshold, wherein the duration of the user interaction flow in the interface interaction of the voiceprint registration guidance is matched with the duration of the collected environmental audio;
a registered text voice recognition program module, configured to recognize the registered text voice input by the user;
a comprehensive score determination program module, configured to determine a signal-to-noise ratio of the registered text voice based on an environmental audio database, detect a speech rate and an amplitude truncation of the registered text voice, take the signal-to-noise ratio, the speech rate, and the amplitude truncation as scoring factors, and determine a comprehensive score of the registered text voice from the scoring factors;
and a voiceprint registration program module, configured to, when the comprehensive score reaches a preset score threshold, store the environmental audio collected during the voiceprint registration guidance in the environmental audio database and perform voiceprint registration on the registered text voice.
9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any of claims 1-6.
10. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN201911409832.8A 2019-12-31 2019-12-31 Voiceprint registration method and system Active CN111161746B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911409832.8A CN111161746B (en) 2019-12-31 2019-12-31 Voiceprint registration method and system

Publications (2)

Publication Number Publication Date
CN111161746A CN111161746A (en) 2020-05-15
CN111161746B true CN111161746B (en) 2022-04-15

Family

ID=70559924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911409832.8A Active CN111161746B (en) 2019-12-31 2019-12-31 Voiceprint registration method and system

Country Status (1)

Country Link
CN (1) CN111161746B (en)

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province
Applicant after: Sipic Technology Co.,Ltd.
Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province
Applicant before: AI SPEECH Ltd.
GR01 Patent grant