CN110634492A - Login verification method and device, electronic equipment and computer readable storage medium - Google Patents

Login verification method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN110634492A
Authority
CN
China
Prior art keywords
audio data
user
text
extracting
features
Prior art date
Legal status (assumed; not a legal conclusion)
Granted
Application number
CN201910512611.7A
Other languages
Chinese (zh)
Other versions
CN110634492B (en)
Inventor
赖勇铨
张靖友
李美玲
李家锐
Current Assignee (the listed assignees may be inaccurate)
China Citic Bank Corp Ltd
Original Assignee
China Citic Bank Corp Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by China Citic Bank Corp Ltd filed Critical China Citic Bank Corp Ltd
Priority to CN201910512611.7A priority Critical patent/CN110634492B/en
Publication of CN110634492A publication Critical patent/CN110634492A/en
Application granted granted Critical
Publication of CN110634492B publication Critical patent/CN110634492B/en
Current legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/04: Training, enrolment or model building
    • G10L17/06: Decision making techniques; Pattern matching strategies
    • G10L17/08: Use of distortion metrics or a particular distance between probe pattern and reference templates
    • G10L17/18: Artificial neural networks; Connectionist approaches
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application provides a login verification method and device, an electronic device, and a computer-readable storage medium, applied in the field of voice technology. The method comprises the following steps: collecting first audio data of a user; extracting the user's to-be-matched identity feature through a pre-trained neural network model, based on the first audio data and the determined text corresponding to the first audio data; calculating the similarity between the to-be-matched identity feature and a pre-stored target identity feature; and determining whether the user is allowed to log in according to the similarity result. Because the identity feature is extracted from the user's own voice and its corresponding text, no account number or password needs to be entered, which prevents other users from logging in with a stolen account number and password and improves the security of user login verification.

Description

Login verification method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of voice technologies, and in particular, to a login verification method, device, electronic device, and computer-readable storage medium.
Background
With the development of APP technology, the APP market offers a wide variety of applications (such as short-video APPs, communication APPs, and financial-service APPs) for users to choose from, and users select the corresponding APP to log in to and use based on their own needs. How to log in to an APP has therefore become a key problem.
At present, a user logs in to an APP by entering the APP account number and the corresponding password on the login interface and submitting them to a server, and the server then verifies the entered account number and password to decide whether to allow the current user to log in. However, manually entering the account number and corresponding password for each APP one by one is cumbersome, and an account number and password can be stolen and used by another user. The existing manual-entry login mode therefore suffers from both complex operation and low security.
Disclosure of Invention
The application provides a login verification method and device, an electronic device, and a computer-readable storage medium, which improve the convenience and security of user login. The technical scheme adopted by the application is as follows:
In a first aspect, a login verification method is provided, the method comprising:
collecting first audio data of a user;
extracting the to-be-matched identity feature of the user through a pre-trained neural network model, based on the first audio data and the determined text corresponding to the first audio data;
and calculating the similarity between the identity features to be matched and the pre-stored target identity features, and determining whether the user is allowed to log in according to the similarity calculation result.
In a second aspect, a login verification device is provided, the device comprising:
the first acquisition module is used for acquiring first audio data of a user;
the first extraction module is used for extracting the to-be-matched identity feature of the user through a pre-trained neural network model, based on the first audio data acquired by the first acquisition module and the determined text corresponding to the first audio data;
and the calculating module is used for calculating the similarity between the identity feature to be matched extracted by the first extracting module and a pre-stored target identity feature and determining whether the user is allowed to log in or not according to the similarity calculation result.
In a third aspect, an electronic device is provided, which includes:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors to perform the login verification method of the first aspect.
In a fourth aspect, there is provided a computer-readable storage medium for storing computer instructions which, when run on a computer, cause the computer to perform the login authentication method of the first aspect.
Compared with the prior art, in which APP login requires manually entering account numbers and corresponding passwords one by one, the application collects first audio data of a user, extracts the user's to-be-matched identity feature through a pre-trained neural network model based on the first audio data and the determined text corresponding to it, calculates the similarity between the to-be-matched identity feature and a pre-stored target identity feature, and determines whether the user is allowed to log in according to the similarity result. The cumbersome step of entering account numbers and corresponding passwords one by one is avoided, which improves the convenience of login verification. In addition, since no account number or password needs to be entered, other users cannot log in with a stolen account number and corresponding password, which improves the security of user login verification.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of a login authentication method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a login authentication device according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of another login authentication device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
An embodiment of the present application provides a login verification method, as shown in fig. 1, the method may include the following steps:
step S101, collecting first audio data of a user;
specifically, the first audio data of the user may be acquired by a voice acquisition device configured by a terminal device, where the terminal device may be a mobile phone, a PAD, a computer terminal, and the like, and the voice acquisition device may be built in the terminal device or may be an external device connected to the terminal device, which is not limited herein.
Step S102, extracting the to-be-matched identity feature of the user through a pre-trained neural network model, based on the first audio data and the determined text corresponding to the first audio data;
specifically, the identity characteristics to be matched of the user are extracted and obtained through a pre-trained Neural network model based on the first audio data and a text corresponding to the first audio data determined by a corresponding method, wherein the pre-trained Neural network model can be obtained by training audio data of a plurality of users, and the pre-trained Neural network model can be a model based on a Convolutional Neural Network (CNN).
Step S103, calculating the similarity between the identity features to be matched and the pre-stored target identity features, and determining whether the user is allowed to log in according to the similarity calculation result.
Specifically, the similarity between the to-be-matched identity feature and the pre-stored target identity feature may be calculated, and the user is allowed to log in when the similarity result satisfies a threshold condition. The similarity may be the Euclidean distance or the cosine distance between the two features: when the distance is less than a preset threshold (e.g., 1), the user is allowed to log in; when the distance is greater than or equal to the preset threshold, the user is not allowed to log in.
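As a minimal sketch of this decision rule (assuming the identity features are plain floating-point vectors; the threshold value of 1 simply follows the example above), the check could look like:

```python
import numpy as np

def allow_login(probe, target, threshold=1.0):
    """Decide login from the distance between the to-be-matched
    feature (probe) and the pre-stored target feature."""
    euclidean = np.linalg.norm(probe - target)
    # Cosine distance = 1 - cosine similarity; either metric may be used.
    cosine = 1.0 - np.dot(probe, target) / (
        np.linalg.norm(probe) * np.linalg.norm(target) + 1e-12)
    return euclidean < threshold  # or: cosine < threshold
```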
Compared with the prior art, in which APP login requires manually entering account numbers and corresponding passwords one by one, the login verification method provided by this embodiment collects first audio data of a user, extracts the user's to-be-matched identity feature through a pre-trained neural network model based on the first audio data and the determined text corresponding to it, calculates the similarity between the to-be-matched identity feature and a pre-stored target identity feature, and determines whether the user is allowed to log in according to the similarity result. The cumbersome step of entering account numbers and corresponding passwords one by one is avoided, which improves the convenience of login verification. In addition, since no account number or password needs to be entered, other users cannot log in with a stolen account number and corresponding password, which improves the security of user login verification.
The embodiment of the present application provides a possible implementation manner, and the manner of determining the text corresponding to the first audio data in step S102 includes, but is not limited to, any of the following:
step S1021 (not shown in the figure), inputting the first audio data into a pre-trained speech recognition model, and recognizing to obtain a text corresponding to the first audio data;
for the embodiment of the present application, the first audio data may be input into a pre-trained speech recognition Model, and a text corresponding to the first audio data is recognized, where the pre-trained speech recognition Model may be a Hidden Markov Model (HMM), a Model based on a Recurrent Neural Network (RNN), a Model based on a Long-Short Term Memory artificial Neural Network (LSTM), or another speech recognition Model capable of implementing the functions of the present application, and this application is not limited herein.
Step S1022 (not shown in the figure), querying a local database or sending a query request to a server, and determining the preset text corresponding to the first audio data.
Specifically, the text corresponding to the first audio data is pre-stored locally and/or on a server, and it can be determined through a corresponding query operation.
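A minimal sketch of this lookup, assuming a dict-like local store and a hypothetical server interface (neither is specified in the text):

```python
def lookup_text(user_id, local_db, server=None):
    """Step S1022 sketch: look up the preset text locally first,
    then fall back to a server query."""
    text = local_db.get(user_id)
    if text is None and server is not None:
        text = server.query(user_id)  # hypothetical server API
    return text
```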
For this embodiment, the problem of determining the text corresponding to the first audio data is solved, which provides a basis for extracting the to-be-matched identity feature based on the first audio data and the determined corresponding text.
The embodiment of the present application provides a possible implementation manner, and specifically, step S102 includes:
step S1023 (not shown in the figure), extracting text features of a text corresponding to the first audio data;
step S1024 (not shown in the figure), extracting the voiceprint feature of the first audio data;
step S1025 (not shown in the figure), extracting the to-be-matched identity feature through a fully connected network of the pre-trained neural network model, based on the text feature and the voiceprint feature.
Specifically, the text feature of the text corresponding to the first audio data may be extracted by a corresponding text feature extraction method, and the voiceprint feature of the first audio data may be extracted by a corresponding voiceprint feature extraction method.
Specifically, the extracted text feature and voiceprint feature are input into a fully connected network of the pre-trained neural network model to extract the to-be-matched identity feature; the fully connected network may have one layer or multiple layers.
The embodiment of the application solves the problem of extracting the identity features to be matched, and provides a basis for the subsequent similarity calculation of the identity features to be matched and the target identity features.
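A minimal PyTorch sketch of this fusion step (the feature dimensions and hidden width are illustrative assumptions; the text only requires one or more fully connected layers):

```python
import torch
import torch.nn as nn

class IdentityHead(nn.Module):
    """Maps the concatenated text and voiceprint features to the
    to-be-matched identity feature via fully connected layers."""
    def __init__(self, text_dim=128, voice_dim=512, ident_dim=256):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(text_dim + voice_dim, 512), nn.ReLU(),
            nn.Linear(512, ident_dim),
        )

    def forward(self, text_feat, voice_feat):
        # Concatenate the two feature vectors, then fuse them.
        return self.fc(torch.cat([text_feat, voice_feat], dim=-1))
```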
The embodiment of the present application provides a possible implementation manner, and specifically, step S1023 includes:
step S10231 (not shown in the figure), inputting the text corresponding to the first audio data into a word embedding layer of the pre-trained neural network model to obtain a text vector of the text corresponding to the first audio data;
specifically, the pre-trained neural network model includes a word nesting layer (Embedding layer), where word nesting refers to Embedding discretely encoded words into a vector space through a neural network, and a text vector corresponding to the first audio data can be obtained by inputting a text corresponding to the first audio data into the word nesting layer of the neural network model; compared with the common word nesting in text processing, the word nesting parameters are learned in an end-to-end mode, so that the texts with closer pronunciations are closer to each other in the vector space.
Illustratively, the text corresponding to the first audio data is encoded by the embedding layer into a 128-dimensional floating-point vector (or a vector of similar dimension).
Step S10232 (not shown in the figure), the text vector is input to the deep network of the pre-trained neural network model to obtain the text feature with fixed length of the text corresponding to the first audio data.
Specifically, the pre-trained neural network model includes one or more layers of deep network, and the text vector is input into them to obtain the fixed-length text feature of the text corresponding to the first audio data. A single deep layer may be any one of a convolutional neural network, a fully connected network, or an identity-mapping network, and multiple layers may be a combination of one or more of these.
For this embodiment, the word embedding layer and the deep network of the pre-trained neural network model solve the problem of extracting the text feature of the text corresponding to the first audio data.
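A minimal PyTorch sketch of this text branch (the vocabulary size, 128-dimensional embedding, and the choice of a convolutional deep layer are illustrative assumptions consistent with the description above):

```python
import torch
import torch.nn as nn

class TextBranch(nn.Module):
    """Word embedding layer plus a 1-D convolution; mean-pooling over
    the token axis yields a fixed-length text feature regardless of
    text length."""
    def __init__(self, vocab_size=6000, emb_dim=128, out_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)  # word embedding layer
        self.conv = nn.Conv1d(emb_dim, out_dim, kernel_size=3, padding=1)

    def forward(self, token_ids):         # token_ids: (batch, seq_len)
        x = self.embed(token_ids)         # (batch, seq_len, emb_dim)
        x = self.conv(x.transpose(1, 2))  # (batch, out_dim, seq_len)
        return torch.relu(x).mean(dim=2)  # fixed-length (batch, out_dim)
```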
The embodiment of the present application provides a possible implementation manner, and specifically, step S1024 includes:
step S10241 (not shown in the figure), performing standardization, windowing, and short-time Fourier transform processing on the first audio data to obtain a spectrogram corresponding to the first audio data;
specifically, the first audio data may be normalized, for example, the first audio data (sound signal) of the user is converted into electronic information through a microphone of the mobile phone, and finally sampled at a sampling rate of 16K or higher, and a string of 16-bit quantized digital signals is output; because the recording settings are different, the output voice digital signal may be a dual-channel signal, that is, the output voice digital signal includes two similar digital sequences, and at this time, one of the two digital sequences is taken as a voiceprint recognition signal; when the sampling rate is not 16K, the signal is converted into 16K by means of resampling or interpolation, and when the quantized signal is not 16-bit integer, the signal is converted into 16-bit quantization by means of linear mapping.
Specifically, the standardized first audio data is a sequence of digital samples (an array). A block of data (for example, 512 samples, corresponding to 512/16000 = 0.032 s) is taken from the array through a sliding window at fixed intervals (for example, every 0.025 s) and transformed with a Fast Fourier Transform (FFT). The absolute values of the complex signals obtained from the FFT of each time interval are arranged in time order as the columns of an image, yielding the spectrogram corresponding to the first audio data. Before performing the FFT, the signal block to be transformed may be windowed (i.e., multiplied by a window function) to emphasize the middle portion of the block and reduce the edge effects of the FFT.
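A sketch of this step with the example numbers above (the Hann window is an assumption; the text only asks for some window function):

```python
import numpy as np

def spectrogram(signal, rate=16000, win=512, hop_sec=0.025):
    """Sliding-window FFT: 512-sample blocks (0.032 s at 16 kHz)
    taken every 0.025 s; magnitudes are stacked in time order,
    so rows are frequency bins and columns are time steps."""
    hop = int(rate * hop_sec)
    window = np.hanning(win)  # soften block edges before the FFT
    frames = [signal[i:i + win] * window
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T
```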
Specifically, the spectrogram may be further truncated and normalized. According to the sampling theorem, a sound signal sampled at 16 kHz contains original frequencies up to at most 8 kHz. The spectrogram is first truncated to keep an intermediate frequency band (for example, the portion from 250 Hz to 7 kHz). The retained part is then normalized as follows: for each frequency component (i.e., each row of the spectrogram), compute the mean and variance over all time points; then, from the array corresponding to each frequency component, subtract the mean computed for that frequency and divide by its standard deviation. This removes frequency components that have little relation to the characteristics of the human voice and makes the data within each frequency component follow a distribution with mean 0 and variance 1, i.e., the spectrogram is whitened. The common practice is to whiten the signal of each time segment; this scheme instead whitens each frequency separately, which experiments show is better suited to sound signals and less sensitive to noise.
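A sketch of the truncation and per-frequency whitening (the band edges follow the example above; the bin-to-hertz mapping assumes the spectrogram rows span 0 to 8 kHz):

```python
import numpy as np

def truncate_and_whiten(spec, rate=16000, lo=250, hi=7000):
    """Keep the 250 Hz to 7 kHz band, then normalize each frequency
    row to zero mean and unit variance across time."""
    hz_per_bin = (rate / 2) / (spec.shape[0] - 1)  # rows span 0..8 kHz
    band = spec[int(lo / hz_per_bin):int(hi / hz_per_bin) + 1]
    mean = band.mean(axis=1, keepdims=True)        # per-frequency mean
    std = band.std(axis=1, keepdims=True) + 1e-12  # per-frequency spread
    return (band - mean) / std
```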
Step S10242 (not shown in the figure), extracting a plurality of feature maps of the spectrogram through a multi-layer convolution network of the pre-trained neural network model, and averaging the obtained plurality of feature maps in a time direction to obtain a feature vector with a fixed length;
specifically, the voice duration of the user speaking is not fixed, even for the same sentence, such as for "hello", the user can finish speaking at normal speed, and can drag "hello" to a long voice and then say "good"; compared with the common method that a plurality of fixed-length segments are extracted from fixed-length voice and input into a depth network, and then the average value of the features is taken for calculation, for example, the method can be used for calculating the spectrogram and normalizing the spectrogram of the whole first audio data, then inputting the spectrogram into a convolutional neural network, obtaining a group of feature maps through a series of convolution processing operations, and then calculating the average value of the obtained group of feature maps according to the time direction to obtain the feature vectors with fixed lengths; the fixed-length feature vector has no deformation on the input spectrogram length, namely, the length of the audio.
Step S10243 (not shown), performing centering and length normalization processing on the obtained fixed-length feature vector to obtain a voiceprint feature of the first audio data.
Specifically, the mean of the corresponding features over all training samples is subtracted from the fixed-length feature vector, moving the feature center of all samples to the origin of the coordinate system; length normalization is then applied to the centered feature to obtain the voiceprint feature of the first audio data. The length normalization may be L2-norm normalization or any other processing that achieves this function. Combining the centering and length-normalization steps distributes the feature points of the first audio data on the surface of the unit sphere in the high-dimensional space.
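A sketch of these two steps (the training-set mean is assumed to have been computed offline):

```python
import numpy as np

def to_voiceprint(feature, train_mean):
    """Center by the training-set mean, then L2-normalize so the
    voiceprint lies on the unit hypersphere."""
    centered = feature - train_mean
    return centered / (np.linalg.norm(centered) + 1e-12)
```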
For this embodiment, the fixed-length feature vector of the spectrogram corresponding to the first audio data is extracted through the multi-layer convolutional network of the pre-trained neural network model, and centering and length normalization of that vector yield the voiceprint feature, solving the problem of extracting the voiceprint feature of the first audio data.
The embodiment of the present application provides another possible implementation manner, and the method further includes:
step S104 (not shown in the figure), acquiring a text input by the user corresponding to the account;
step S105 (not shown in the figure), collecting second audio data of the user corresponding to the account reading the input text aloud;
step S106 (not shown in the figure), based on the input text and the second audio data, extracting the target identity feature of the user corresponding to the account through the pre-trained neural network model, and storing the target identity feature locally and/or on a server.
Illustratively, when a user performs voiceprint registration, the text entered by the user corresponding to the account is acquired, and second audio data of that user reading the entered text aloud is collected through a voice collection device (such as a microphone) of the terminal device; the entered text may include at least one of words, numbers, and other characters. Then, based on the entered text and the second audio data, the target identity feature of the user corresponding to the account is extracted through the pre-trained neural network model and stored locally and/or on a server; this target identity feature is used to verify that user's to-be-matched identity feature at login.
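A short sketch of this enrollment flow (the function and store interfaces are hypothetical; the point is that registration reuses the same pre-trained model as login):

```python
def enroll(account, text_ids, second_audio, model, store):
    """Steps S104-S106 sketch: extract the target identity feature
    and persist it under the account."""
    target = model(second_audio, text_ids)
    store[account] = target  # local database and/or remote server
    return target
```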
For the embodiment, the problem of extracting the target identity characteristics is solved, and a basis is provided for verifying the identity of the user when the user logs in subsequently.
The embodiment of the present application provides another possible implementation manner, and further, before step S101, the method includes:
step S107 (not shown in the figure), prompting the user to log in by voice, where the prompt neither displays nor speaks the text corresponding to the first audio data.
Illustratively, when the user taps the corresponding APP icon on the terminal device, a user login interface is entered, which prompts the user to log in by voice; after detecting that the user has spoken, the first audio data is collected through the corresponding voice collection device. When prompting the user to log in by voice, the text corresponding to the first audio data is neither displayed nor spoken.
For this embodiment, the user is prompted to log in by voice, and the audio data is collected only after the user actually speaks; detecting before collecting in this way reduces power consumption while still capturing the audio in real time.
Fig. 2 is a login authentication apparatus according to an embodiment of the present application, where the apparatus 20 includes: a first acquisition module 201, a first extraction module 202 and a calculation module 203,
a first collecting module 201, configured to collect first audio data of a user;
the first extraction module 202 is configured to extract the user's to-be-matched identity feature through a pre-trained neural network model, based on the first audio data acquired by the first acquisition module 201 and the determined text corresponding to the first audio data;
the calculating module 203 is configured to calculate similarity between the identity feature to be matched extracted by the first extracting module 202 and a pre-stored target identity feature, and determine whether to allow the user to log in according to a result of the similarity calculation.
This embodiment provides a login verification device. Compared with the prior art, in which APP login requires manually entering account numbers and corresponding passwords one by one, this embodiment collects first audio data of a user, extracts the user's to-be-matched identity feature through a pre-trained neural network model based on the first audio data and the determined text corresponding to it, calculates the similarity between the to-be-matched identity feature and a pre-stored target identity feature, and determines whether the user is allowed to log in according to the similarity result. The cumbersome step of entering account numbers and corresponding passwords one by one is avoided, which improves the convenience of login verification. In addition, since no account number or password needs to be entered, other users cannot log in with a stolen account number and corresponding password, which improves the security of user login verification.
The login verification apparatus of this embodiment may perform the login verification method provided in the above embodiments of this application, and the implementation principles thereof are similar, and are not described herein again.
As shown in fig. 3, the present embodiment provides another login authentication device, where the device 30 includes: a first acquisition module 301, a first extraction module 302, and a calculation module 303, wherein,
a first collecting module 301, configured to collect first audio data of a user;
wherein the first acquisition module 301 in fig. 3 has the same or similar function as the first acquisition module 201 in fig. 2.
The first extraction module 302 is configured to extract the user's to-be-matched identity feature through a pre-trained neural network model, based on the first audio data collected by the first collecting module 301 and the determined text corresponding to the first audio data;
wherein the first extraction module 302 in fig. 3 has the same or similar function as the first extraction module 202 in fig. 2.
The calculating module 303 is configured to calculate a similarity between the identity feature to be matched extracted by the first extracting module 302 and a pre-stored target identity feature, and determine whether to allow the user to log in according to a calculation result of the similarity.
Wherein the computing module 303 in fig. 3 has the same or similar function as the computing module 203 in fig. 2.
The embodiment of the application provides a possible implementation manner, wherein the manner of determining the text corresponding to the first audio data includes any one of the following:
inputting the first audio data into a pre-trained voice recognition model, and recognizing to obtain a text corresponding to the first audio data;
and querying a local database or sending a query request to a server to determine the preset text corresponding to the first audio data.
For the embodiment of the application, the problem of determining the text corresponding to the first audio data is solved, and a basis is provided for extracting the identity features to be matched based on the first audio data and the text corresponding to the determined first audio data.
The embodiment of the present application provides a possible implementation manner, and specifically, the first extraction module 302 includes:
a first extracting unit 3021 configured to extract a text feature of a text corresponding to the first audio data;
a second extraction unit 3022 configured to extract a voiceprint feature of the first audio data;
the third extracting unit 3023 is configured to extract, based on the text feature extracted by the first extracting unit 3021 and the voiceprint feature extracted by the second extracting unit 3022, the identity feature to be matched through a full-connection network of a pre-trained neural network model.
The embodiment of the application solves the problem of extracting the identity features to be matched, and provides a basis for the subsequent similarity calculation of the identity features to be matched and the target identity features.
The embodiment of the present application provides a possible implementation manner, and specifically, the first extracting unit 3021 includes:
a first input subunit 30211 (not shown in the figure), configured to input the text corresponding to the first audio data into a word embedding layer of the pre-trained neural network model to obtain a text vector of the text corresponding to the first audio data;
a second input subunit 30212 (not shown in the figure) configured to input the text vector into the deep network of the pre-trained neural network model to obtain a text feature with a fixed length of the text corresponding to the first audio data.
For this embodiment, the word embedding layer and the deep network of the pre-trained neural network model solve the problem of extracting the text feature of the text corresponding to the first audio data.
The embodiment of the present application provides a possible implementation manner, and specifically, the second extracting unit 3022 includes:
a first processing subunit 30221 (not shown in the figure), configured to perform standardization, windowing, and short-time Fourier transform processing on the first audio data to obtain a spectrogram corresponding to the first audio data;
an extracting subunit 30222 (not shown in the figure) configured to extract a plurality of feature maps of the spectrogram through a multi-layer convolutional network of the pre-trained neural network model, and average the obtained plurality of feature maps in a time direction to obtain a feature vector with a fixed length;
a second processing subunit 30223 (not shown in the figure) is configured to perform centering and length normalization processing on the obtained fixed-length feature vector to obtain a voiceprint feature of the first audio data.
For the embodiment of the application, the feature vectors with fixed lengths of the spectrogram corresponding to the first audio data are extracted and obtained through the multi-layer convolution network of the pre-trained neural network model, and the voiceprint features corresponding to the first audio data are obtained by performing centralization and length normalization processing on the feature vectors with fixed lengths, so that the problem of extracting the voiceprint features of the first audio data is solved.
The embodiment of the present application provides a possible implementation manner, and further, the apparatus 30 further includes:
an obtaining module 304, configured to obtain a text input by a user corresponding to an account;
a second collecting module 305, configured to collect second audio data that is read by the user corresponding to the account based on the input text;
the second extracting module 306 is configured to extract, based on the input text acquired by the acquiring module 304 and the second audio data acquired by the second acquiring module 305, a target identity feature of the user corresponding to the account through a pre-trained neural network model, and store the target identity feature to a local and/or server.
For the embodiment, the problem of extracting the target identity characteristics is solved, and a basis is provided for verifying the identity of the user when the user logs in subsequently.
The embodiment of the present application provides a possible implementation manner, and further, the apparatus 30 further includes:
and the prompting module 307, configured to prompt the user to log in by voice, where the prompt neither displays nor speaks the text corresponding to the first audio data.
For this embodiment, the user is prompted to log in by voice, and the audio data is collected only after the user actually speaks; detecting before collecting in this way reduces power consumption while still capturing the audio in real time.
This embodiment provides a login verification device. Compared with the prior art, in which APP login requires manually entering account numbers and corresponding passwords one by one, this embodiment collects first audio data of a user, extracts the user's to-be-matched identity feature through a pre-trained neural network model based on the first audio data and the determined text corresponding to it, calculates the similarity between the to-be-matched identity feature and a pre-stored target identity feature, and determines whether the user is allowed to log in according to the similarity result. The cumbersome step of entering account numbers and corresponding passwords one by one is avoided, which improves the convenience of login verification. In addition, since no account number or password needs to be entered, other users cannot log in with a stolen account number and corresponding password, which improves the security of user login verification.
The embodiment of the present application provides a login verification apparatus, which is suitable for the method shown in the above embodiment, and is not described herein again.
An embodiment of the present application provides an electronic device. As shown in fig. 4, the electronic device 40 includes a processor 4001 and a memory 4003, where the processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 40 may also include a transceiver 4004; in practical applications the number of transceivers is not limited to one, and the structure of the electronic device 40 does not limit the embodiments of the present application. In this embodiment, the processor 4001 implements the functions of the first acquisition module, the first extraction module, and the calculation module shown in fig. 2 or fig. 3, as well as the acquiring module 304, the second collecting module 305, the second extracting module 306, and the prompting module 307 shown in fig. 3. The transceiver 4004 includes a receiver and a transmitter.
Processor 4001 may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs a computational function, including, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 4002 may include a path that carries information between the aforementioned components. Bus 4002 may be a PCI bus, EISA bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
Memory 4003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disk storage, an optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 4003 is used for storing application codes for executing the scheme of the present application, and the execution is controlled by the processor 4001. The processor 4001 is configured to execute application code stored in the memory 4003 to implement the functions of the login authentication apparatus provided by the embodiment shown in fig. 2 or fig. 3.
Compared with the prior art, in which APP login requires manually entering account numbers and corresponding passwords one by one, the electronic device provided by this embodiment collects first audio data of a user, extracts the user's to-be-matched identity feature through a pre-trained neural network model based on the first audio data and the determined text corresponding to it, calculates the similarity between the to-be-matched identity feature and a pre-stored target identity feature, and determines whether the user is allowed to log in according to the similarity result. The cumbersome step of entering account numbers and corresponding passwords one by one is avoided, which improves the convenience of login verification. In addition, since no account number or password needs to be entered, other users cannot log in with a stolen account number and corresponding password, which improves the security of user login verification.
The embodiment of the present application provides an electronic device suitable for the foregoing method embodiment, which will not be described in detail here again.
The present application provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method shown in the above embodiments is implemented.
Compared with the prior art, in which APP login requires manually entering account numbers and corresponding passwords one by one, this embodiment collects first audio data of a user, extracts the user's to-be-matched identity feature through a pre-trained neural network model based on the first audio data and the determined text corresponding to it, calculates the similarity between the to-be-matched identity feature and a pre-stored target identity feature, and determines whether the user is allowed to log in according to the similarity result. The cumbersome step of entering account numbers and corresponding passwords one by one is avoided, which improves the convenience of login verification. In addition, since no account number or password needs to be entered, other users cannot log in with a stolen account number and corresponding password, which improves the security of user login verification.
The embodiment of the present application provides a computer-readable storage medium suitable for the foregoing method embodiment, which will not be described in detail here again.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which need not be completed at the same moment but may be executed at different times, and need not be executed sequentially but may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present application, and such modifications and improvements shall also fall within the protection scope of the present application.

Claims (10)

1. A login authentication method, comprising:
collecting first audio data of a user;
extracting to-be-matched identity features of the user through a pre-trained neural network model based on the first audio data and the determined text corresponding to the first audio data;
and calculating the similarity between the identity features to be matched and pre-stored target identity features, and determining whether the user is allowed to log in according to the similarity calculation result.
2. The method of claim 1, wherein determining the text corresponding to the first audio data comprises any one of:
inputting the first audio data into a pre-trained voice recognition model, and recognizing to obtain a text corresponding to the first audio data;
and inquiring a local database or sending an inquiry request to a server, and determining a preset text corresponding to the first audio data.
3. The method of claim 1, wherein the extracting the to-be-matched identity features of the user through a pre-trained neural network model based on the first audio data and the determined text corresponding to the first audio data comprises:
extracting text features of a text corresponding to the first audio data;
extracting voiceprint features of the first audio data;
and extracting the identity features to be matched through a full-connection network of the pre-trained neural network model based on the text features and the voiceprint features.
4. The method of claim 3, wherein the extracting text features of the text corresponding to the first audio data comprises:
inputting the text corresponding to the first audio data into a word nesting layer of the pre-trained neural network model to obtain a text vector of the text corresponding to the first audio data;
and inputting the text vector to a deep network of the pre-trained neural network model to obtain the text feature with fixed length of the text corresponding to the first audio data.
5. The method of claim 3, wherein the extracting the voiceprint features of the first audio data comprises:
carrying out standardization processing, windowing and short-time Fourier transform processing on the first audio data to obtain a spectrogram corresponding to the first audio data;
extracting a plurality of feature maps of the spectrogram through a multilayer convolution network of the pre-trained neural network model, and averaging the obtained feature maps in a time direction to obtain feature vectors with fixed lengths;
and carrying out centralization and length normalization processing on the obtained feature vectors with fixed lengths to obtain the voiceprint features of the first audio data.
6. The method of any one of claims 1 to 5, further comprising:
acquiring a text input by a user corresponding to an account;
collecting second audio data of the user corresponding to the account reading the input text aloud;
and extracting the target identity features of the user corresponding to the account through the pre-trained neural network model based on the input text and the second audio data, and storing the target identity features locally and/or on a server.
7. The method of any one of claims 1 to 5, wherein before the collecting of the first audio data of the user, the method further comprises:
and prompting the user to log in through voice, wherein the prompt does not comprise displaying prompt or voice prompt for the text corresponding to the first audio data.
8. A login authentication apparatus, comprising:
the first acquisition module is used for acquiring first audio data of a user;
the first extraction module is used for extracting the identity features to be matched of the user through a pre-trained neural network model based on the first audio data acquired by the first acquisition module and the text corresponding to the determined first audio data;
and the calculating module is used for calculating the similarity between the identity feature to be matched extracted by the first extracting module and a pre-stored target identity feature and determining whether the user is allowed to log in according to the similarity calculation result.
9. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to perform the login verification method of any one of claims 1 to 7.
10. A computer-readable storage medium for storing computer instructions which, when executed on a computer, cause the computer to perform the login authentication method of any one of claims 1 to 7.
CN201910512611.7A 2019-06-13 2019-06-13 Login verification method, login verification device, electronic equipment and computer readable storage medium Active CN110634492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910512611.7A CN110634492B (en) 2019-06-13 2019-06-13 Login verification method, login verification device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910512611.7A CN110634492B (en) 2019-06-13 2019-06-13 Login verification method, login verification device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110634492A 2019-12-31
CN110634492B CN110634492B (en) 2023-08-25

Family

ID=68968386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910512611.7A Active CN110634492B (en) 2019-06-13 2019-06-13 Login verification method, login verification device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110634492B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179941A (en) * 2020-01-06 2020-05-19 科大讯飞股份有限公司 Intelligent device awakening method, registration method and device
CN111343162A (en) * 2020-02-14 2020-06-26 深圳壹账通智能科技有限公司 System secure login method, device, medium and electronic equipment
CN113257255A (en) * 2021-07-06 2021-08-13 北京远鉴信息技术有限公司 Method and device for identifying forged voice, electronic equipment and storage medium
CN113257230A (en) * 2021-06-23 2021-08-13 北京世纪好未来教育科技有限公司 Voice processing method and device and computer storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106961418A (en) * 2017-02-08 2017-07-18 北京捷通华声科技股份有限公司 Identity identifying method and identity authorization system
CN107104803A (en) * 2017-03-31 2017-08-29 清华大学 It is a kind of to combine the user ID authentication method confirmed with vocal print based on numerical password
WO2017197953A1 (en) * 2016-05-16 2017-11-23 腾讯科技(深圳)有限公司 Voiceprint-based identity recognition method and device
CN107993071A (en) * 2017-11-21 2018-05-04 平安科技(深圳)有限公司 Electronic device, auth method and storage medium based on vocal print
CN109346088A (en) * 2018-12-06 2019-02-15 泰康保险集团股份有限公司 Personal identification method, device, medium and electronic equipment
CN109473108A (en) * 2018-12-15 2019-03-15 深圳壹账通智能科技有限公司 Auth method, device, equipment and storage medium based on Application on Voiceprint Recognition
WO2019085575A1 (en) * 2017-11-02 2019-05-09 阿里巴巴集团控股有限公司 Voiceprint authentication method and apparatus, and account registration method and apparatus

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017197953A1 (en) * 2016-05-16 2017-11-23 腾讯科技(深圳)有限公司 Voiceprint-based identity recognition method and device
CN106961418A (en) * 2017-02-08 2017-07-18 北京捷通华声科技股份有限公司 Identity identifying method and identity authorization system
CN107104803A (en) * 2017-03-31 2017-08-29 清华大学 It is a kind of to combine the user ID authentication method confirmed with vocal print based on numerical password
WO2019085575A1 (en) * 2017-11-02 2019-05-09 阿里巴巴集团控股有限公司 Voiceprint authentication method and apparatus, and account registration method and apparatus
CN107993071A (en) * 2017-11-21 2018-05-04 平安科技(深圳)有限公司 Electronic device, auth method and storage medium based on vocal print
CN109346088A (en) * 2018-12-06 2019-02-15 泰康保险集团股份有限公司 Personal identification method, device, medium and electronic equipment
CN109473108A (en) * 2018-12-15 2019-03-15 深圳壹账通智能科技有限公司 Auth method, device, equipment and storage medium based on Application on Voiceprint Recognition

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179941A (en) * 2020-01-06 2020-05-19 科大讯飞股份有限公司 Intelligent device awakening method, registration method and device
CN111343162A (en) * 2020-02-14 2020-06-26 深圳壹账通智能科技有限公司 System secure login method, device, medium and electronic equipment
CN111343162B (en) * 2020-02-14 2021-10-08 深圳壹账通智能科技有限公司 System secure login method, device, medium and electronic equipment
CN113257230A (en) * 2021-06-23 2021-08-13 北京世纪好未来教育科技有限公司 Voice processing method and device and computer storage medium
CN113257255A (en) * 2021-07-06 2021-08-13 北京远鉴信息技术有限公司 Method and device for identifying forged voice, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110634492B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
CN110265037B (en) Identity verification method and device, electronic equipment and computer readable storage medium
CN110634492B (en) Login verification method, login verification device, electronic equipment and computer readable storage medium
CN106683680B (en) Speaker recognition method and device, computer equipment and computer readable medium
JP6453917B2 (en) Voice wakeup method and apparatus
CN110718228B (en) Voice separation method and device, electronic equipment and computer readable storage medium
US10733986B2 (en) Apparatus, method for voice recognition, and non-transitory computer-readable storage medium
US5594834A (en) Method and system for recognizing a boundary between sounds in continuous speech
CN109360572B (en) Call separation method and device, computer equipment and storage medium
WO2017162053A1 (en) Identity authentication method and device
US9530417B2 (en) Methods, systems, and circuits for text independent speaker recognition with automatic learning features
AU684214B2 (en) System for recognizing spoken sounds from continuous speech and method of using same
WO1996013828A1 (en) Method and system for identifying spoken sounds in continuous speech by comparing classifier outputs
CN109410956B (en) Object identification method, device, equipment and storage medium of audio data
EP3989217B1 (en) Method for detecting an audio adversarial attack with respect to a voice input processed by an automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium
CN109448732B (en) Digital string voice processing method and device
CN112328994A (en) Voiceprint data processing method and device, electronic equipment and storage medium
CN111653283B (en) Cross-scene voiceprint comparison method, device, equipment and storage medium
CN111540342A (en) Energy threshold adjusting method, device, equipment and medium
CN109545226B (en) Voice recognition method, device and computer readable storage medium
CN113948090B (en) Voice detection method, session recording product and computer storage medium
CN111667839A (en) Registration method and apparatus, speaker recognition method and apparatus
CN113889091A (en) Voice recognition method and device, computer readable storage medium and electronic equipment
CN108847251A (en) A kind of voice De-weight method, device, server and storage medium
CN106373576B (en) Speaker confirmation method and system based on VQ and SVM algorithms
CN111933153B (en) Voice segmentation point determining method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant