CN109326294A - Text-related voiceprint key generation method - Google Patents

Text-related voiceprint key generation method

Info

Publication number
CN109326294A
CN109326294A (application CN201811139547.4A; granted as CN109326294B)
Authority
CN
China
Prior art keywords
voiceprint
key
matrix
training
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811139547.4A
Other languages
Chinese (zh)
Other versions
CN109326294B (en)
Inventor
吴震东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN201811139547.4A
Publication of CN109326294A
Application granted
Publication of CN109326294B
Status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 - Speaker identification or verification techniques
    • G10L 17/04 - Training, enrolment or model building
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 9/00 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L 9/08 - Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L 9/0861 - Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H04L 9/0866 - Generation of secret information including derivation or calculation of cryptographic keys or passwords involving user or device identifiers, e.g. serial number, physical or biometrical information, DNA, hand-signature or measurable physical characteristics

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Complex Calculations (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention relates to a text-dependent voiceprint key generation method. The invention comprises voiceprint key training and voiceprint key extraction: voiceprint key training learns a voiceprint key extraction matrix from voiceprint samples collected in advance, and voiceprint key extraction preprocesses the voiceprint sample to be extracted and multiplies it by the trained key extraction matrix to obtain the voiceprint key. By using the speaker's text-dependent spectrograms, the invention expresses the speaker's vocal characteristics more fully while keeping successive samples highly similar. On this basis, a machine learning method trains a voiceprint invariant-feature extraction matrix from multiple spectrograms; processing subsequent samples with that matrix yields a more stable voiceprint key. The method is stable, concise, and convenient to use.

Description

Text-related voiceprint key generation method
Technical field
The invention belongs to the field of cyberspace security and relates to a text-dependent voiceprint key generation method.
Background art
Voiceprint recognition is a relatively mature biometric identification technology. With the rapid development of artificial intelligence in recent years, the accuracy of voiceprint recognition has also improved to a certain degree; in low-noise environments its accuracy can reach 96% or more, and it is widely used in identity authentication scenarios.
As applications of voiceprint technology deepen, the art has begun to try to extract stable digital sequences directly from human voiceprints for use as biometric keys, that is, to generate all kinds of keys directly from the voiceprint. Such keys fuse seamlessly with existing password and public/private-key cryptographic techniques, remove the inconvenience of the voiceprint acquisition and storage process and the security problems it may cause, and further enrich the means and methods of network identity authentication.
Voiceprint biometric key technology has been studied to a certain degree. Chinese invention patent ZL201110003202.8, a file encryption and decryption method based on voiceprint, proposes a scheme for extracting a stable key sequence from voiceprint features; however, it stabilizes the voiceprint feature values only with a checkerboard method, so the stabilization effect is limited and the key length is insufficient. Chinese invention patent ZL201410074511.8, a human voiceprint biometric key generation method, proposes extracting a voiceprint Gaussian model and projecting the model's feature parameters into a high-dimensional space to obtain a stable voiceprint key. The stability of the voiceprint key obtained by that scheme is clearly better than the earlier patent, but for key authentication environments with high stability requirements, the stability of the voiceprint biometric key extracted by the above technical schemes still needs to be improved further.
Summary of the invention
The object of the present invention is to provide a text-dependent voiceprint key generation method.
The invention comprises voiceprint key training and voiceprint key extraction. Voiceprint key training learns a voiceprint key extraction matrix from voiceprint samples collected in advance. Voiceprint key extraction preprocesses the voiceprint sample to be extracted and multiplies it by the key extraction matrix obtained in training to obtain the voiceprint key. The specific steps are as follows:
Step 1: voiceprint key training, with the following specific steps:
First step: the user records his or her own voice reading one and the same piece of text, generally 1 to 3 consecutive phrases, repeated 20 times or more; the number of repetitions is adjusted by the user according to the training results.
Second step: record 10 or more different users reading the same text, each repeated 20 times or more; also record 10 or more different users reading different texts of similar duration, each repeated 20 times or more.
Third step: preprocess the voices recorded in the first and second steps and extract the voiceprint spectrograms. The specific process is:
1) Pre-emphasis:
Let S1(n), n = 0, 1, 2, ..., N-1, denote the speech time-domain signal. The pre-emphasis formula is S(n) = S1(n) - a*S1(n-1), with 0.9 < a < 1.0; a is the pre-emphasis coefficient, used to adjust the degree of emphasis.
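A minimal sketch of this pre-emphasis step in Python/NumPy; a = 0.97 is an assumed value inside the patent's stated range 0.9 < a < 1.0:

```python
import numpy as np

def pre_emphasis(s1: np.ndarray, a: float = 0.97) -> np.ndarray:
    """S(n) = S1(n) - a*S1(n-1); the first sample is passed through unchanged."""
    return np.concatenate(([s1[0]], s1[1:] - a * s1[:-1]))

# usage: emphasized = pre_emphasis(signal.astype(np.float64))
```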
2) Framing: divide the speech signal into frames.
3) Hamming windowing:
Let S(n), n = 0, 1, 2, ..., N-1, be the framed speech time-domain signal. The signal after multiplication by the Hamming window is S'(n), see formula (1):
S'(n) = S(n) * W(n)    (1);
where W(n) = (1 - a) - a*cos(2πn/(N-1)), 0 ≤ n ≤ N-1, with a = 0.46; a may take values between 0.3 and 0.7, the specific value being determined by experiment and empirical data. W(n) is the Hamming window function; it has a smooth low-pass characteristic and reflects the frequency characteristics of a short-time speech signal well.
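A sketch of framing plus Hamming windowing. The patent does not specify frame length or hop, so 400 samples with a 160-sample hop (25 ms / 10 ms at 16 kHz) are assumed here:

```python
import numpy as np

def frame_and_window(s: np.ndarray, frame_len: int = 400, hop: int = 160,
                     a: float = 0.46) -> np.ndarray:
    """Split s into overlapping frames and apply W(n) = (1-a) - a*cos(2*pi*n/(N-1))."""
    n = np.arange(frame_len)
    w = (1.0 - a) - a * np.cos(2.0 * np.pi * n / (frame_len - 1))  # Hamming, a = 0.46
    starts = range(0, len(s) - frame_len + 1, hop)
    return np.stack([s[i:i + frame_len] * w for i in starts])      # shape (frames, N)
```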
4) Fast Fourier transform (FFT):
Apply a radix-2 FFT to the windowed time-domain signal S'(n) to obtain the linear spectrum X(n, k); the radix-2 FFT is a general-purpose algorithm in the art. X(n, k) is the spectral energy density function of the n-th speech frame, with k indexing the spectral bins; each speech frame corresponds to one time slice on the time axis.
5) Generate the text-dependent voiceprint spectrogram:
Using the frame index n as the time-axis coordinate and k as the frequency-axis coordinate, express the value of |X(n,k)|² as a gray level and display it at the corresponding coordinate point; this constitutes the voiceprint spectrogram. The transformation 10*log10(|X(n,k)|²) gives the dB representation of the spectrogram.
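Steps 4) and 5) as one sketch: a per-frame FFT followed by the dB power map. nfft = 512 is an assumption; np.fft.rfft keeps the non-redundant half of the spectrum:

```python
import numpy as np

def spectrogram_db(frames: np.ndarray, nfft: int = 512) -> np.ndarray:
    """X(n,k) per frame, then 10*log10(|X(n,k)|^2): one dB row per time slice n."""
    X = np.fft.rfft(frames, n=nfft, axis=1)   # FFT length is a power of 2 (radix-2)
    power = np.abs(X) ** 2                    # |X(n,k)|^2
    return 10.0 * np.log10(power + 1e-12)     # small floor avoids log(0)
```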
Fourth step: filter and normalize the voiceprint spectrogram. Candidate filters include Gaussian, wavelet, binarization, and other filters in general use in the field of signal processing; which filter, or which combination of filters, to use is chosen by the user according to actual test results. Normalization means unifying the spectrogram to a fixed width and height and scaling each pixel value into the range 0-255; standard methods in the art can be used throughout, e.g. the image resizing can be implemented with the imresize function of the MATLAB function library.
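A sketch of the normalization just described, using SciPy's zoom in place of MATLAB's imresize; the 128 x 128 output size is an assumption, since the patent only requires a fixed width and height and pixel values in the range 0-255:

```python
import numpy as np
from scipy.ndimage import zoom

def normalize_spectrogram(spec: np.ndarray, out_shape=(128, 128)) -> np.ndarray:
    """Resize the spectrogram to a fixed size and scale pixel values into 0-255."""
    factors = (out_shape[0] / spec.shape[0], out_shape[1] / spec.shape[1])
    resized = zoom(spec, factors, order=1)               # bilinear resize
    lo, hi = resized.min(), resized.max()
    return (resized - lo) / (hi - lo + 1e-12) * 255.0    # uniform 0-255 range
```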
Fifth step: apply machine learning to the voiceprint spectrograms to obtain the voiceprint invariant-feature learning matrix, i.e. the voiceprint key extraction matrix.
The voiceprint spectrograms obtained in the fourth step are divided into two classes: one class consists of the user's text-dependent voiceprint spectrograms; the other consists of comparison voiceprint spectrograms from non-users, in which the related text is mixed with unrelated texts. The two classes are called the positive and negative sample sets.
Let M = [M1, M2] denote the positive and negative sample sets participating in training, with Mi = [xi1, xi2, ..., xiL], i ∈ {1, 2}, denoting the i-th sample class; i = 1 is the positive class and i = 2 the negative class. Here xir ∈ R^d, 1 ≤ i ≤ 2, 1 ≤ r ≤ L, and each xir is a one-dimensional column vector: the values of all pixels of one voiceprint spectrogram form a two-dimensional matrix, the rows of that matrix are spliced in order into a one-dimensional row vector, and transposition gives the column vector xir of length d. R^d denotes the d-dimensional real field, and L is the number of voiceprint spectrograms, i.e. column vectors, in each sample class.
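A sketch of how the sample matrices Mi might be assembled; reshape(-1) on a row-major NumPy array is exactly the row-by-row splicing described above:

```python
import numpy as np

def to_column(spec: np.ndarray) -> np.ndarray:
    """Splice the rows of one spectrogram in order, giving the length-d vector x_ir."""
    return spec.reshape(-1)

def build_class_matrix(specs: list) -> np.ndarray:
    """M_i = [x_i1, ..., x_iL]: one column per spectrogram, shape (d, L)."""
    return np.stack([to_column(s) for s in specs], axis=1)
```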
Now, according to the characteristics of the two sample classes, the voiceprint key extraction matrix W1, W1 ∈ R^(d×dz), is trained via the cost function J of formula (2), where x̄1 is the positive-sample mean and x̄2 the negative-sample mean of the training samples. J reflects the difference between the Euclidean distances of the training samples, after projection through the voiceprint key extraction matrix W1, to the positive and to the negative sample-set mean.
Let the matrices H1 and H2 be defined from the training samples accordingly (formula (2) and H1, H2 are reconstructed in the sketch below). Solving for the eigenvalues and eigenvectors of the matrix (H1 - H2) yields the voiceprint key extraction matrix W1, that is: (H1 - H2)w = λw, where w is an eigenvector of the matrix (H1 - H2) and λ the corresponding eigenvalue.
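The images carrying formula (2) and the definitions of H1 and H2 do not survive in this text. A plausible reconstruction, consistent with the surrounding prose (J measures, after projection by W1, how far the positive training samples sit from the negative-sample mean versus their own mean, and W1 comes from the eigenvectors of H1 - H2), is:

```latex
J(W_1) = \sum_{r=1}^{L} \left\| W_1^{\top}x_{1r} - W_1^{\top}\bar{x}_2 \right\|_2^{2}
       - \sum_{r=1}^{L} \left\| W_1^{\top}x_{1r} - W_1^{\top}\bar{x}_1 \right\|_2^{2}
       = \operatorname{tr}\!\left( W_1^{\top} \left( H_1 - H_2 \right) W_1 \right) \tag{2}
```

with

```latex
H_1 = \sum_{r=1}^{L} \left( x_{1r}-\bar{x}_2 \right)\left( x_{1r}-\bar{x}_2 \right)^{\top}, \qquad
H_2 = \sum_{r=1}^{L} \left( x_{1r}-\bar{x}_1 \right)\left( x_{1r}-\bar{x}_1 \right)^{\top}.
```

Maximizing this J over matrices W1 with orthonormal columns leads exactly to the stated eigenproblem (H1 - H2)w = λw, and keeping the eigenvectors with the largest non-negative eigenvalues matches the construction rule for W1 given next.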
Since {w1, w2, ..., wdz} are the eigenvectors, corresponding respectively to the eigenvalues {λ1, λ2, ..., λdz} with λ1 ≥ λ2 ≥ ... ≥ λdz ≥ 0, eigenvectors whose eigenvalues are less than 0 are not included in the construction of the matrix W1.
This completes the training of the voiceprint key extraction matrix W1.
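The whole training step under the reconstruction above, as a sketch; M1 and M2 are the (d, L) matrices from build_class_matrix, and np.linalg.eigh applies because H1 - H2 is symmetric:

```python
import numpy as np

def train_extraction_matrix(M1: np.ndarray, M2: np.ndarray, dz: int) -> np.ndarray:
    """Train W1 (d x dz): eigenvectors of (H1 - H2) with the largest eigenvalues >= 0."""
    m1 = M1.mean(axis=1, keepdims=True)              # positive-sample mean
    m2 = M2.mean(axis=1, keepdims=True)              # negative-sample mean
    A1 = M1 - m2                                     # positive samples vs negative mean
    A2 = M1 - m1                                     # positive samples vs their own mean
    H = A1 @ A1.T - A2 @ A2.T                        # H1 - H2, symmetric, shape (d, d)
    lam, vecs = np.linalg.eigh(H)                    # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]                    # sort descending
    keep = [i for i in order if lam[i] >= 0][:dz]    # drop eigenvectors with lambda < 0
    return vecs[:, keep]
```

Note that d is the pixel count of a normalized spectrogram, so H is large; an eigensolver on a (d, d) matrix is expensive but workable at the 128 x 128 size assumed earlier.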
Step 2: voiceprint key extraction, with the following specific steps:
Step 1: the user records about 3 seconds of his or her own text-dependent speech.
Step 2: extract the voiceprint spectrogram, following the third step of Step 1.
Step 3: filter and normalize the voiceprint spectrogram, then convert it to matrix form and splice the rows in order, obtaining the voiceprint vector xt.
Step 4: left-multiply the voiceprint vector xt obtained in step 3 by the transpose of the voiceprint invariant-feature learning matrix W1 trained in Step 1, i.e. compute W1^T·xt, obtaining the dz-dimensional voiceprint feature vector xtz; xtz is the stabilized voiceprint feature vector.
Step 5: apply a checkerboard quantization to each component of xtz; the further stabilized voiceprint feature vector is x̃tz.
The checkerboard quantization proceeds as follows:
Denote each component of xtz by xtzi.
The quantization is given by formula (3):
where D is the grid size of the checkerboard, a positive number whose specific value can be chosen by the user from experience, generally such that Λ(x) takes values between 0 and 63; xtzi is a component of xtz, and Λ(x) is an integer value.
Λ(x), i.e. the quantized value of xtzi, is the coordinate of the checkerboard grid point closest to xtzi, measured from the coordinate origin.
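The image carrying formula (3) likewise does not survive here. Reading the description (nearest grid point on a grid of pitch D, measured from the coordinate origin), one plausible reconstruction is:

```latex
\Lambda(x_{tzi}) = \left\lfloor \frac{x_{tzi}}{D} + \frac{1}{2} \right\rfloor \tag{3}
```

i.e. round-to-nearest on a grid of pitch D, with D chosen so that Λ(x) stays in the range 0-63.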
Step 6: take the first 32 or 64 components of the result vector x̃tz of step 5 and concatenate them; with each component value in the range 0-63 contributing 4 key bits, a 128-bit or 256-bit voiceprint key is formed. This completes the extraction of the voiceprint key.
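Steps 4 to 6 in one sketch. The text does not spell out how 4 key bits come from a component whose quantized range is 0-63, so keeping the 4 low-order bits is an assumption made purely for illustration, as is the rounded form of Λ:

```python
import numpy as np

def extract_key(W1: np.ndarray, xt: np.ndarray, D: float, n_comp: int = 64) -> bytes:
    """Project (step 4), quantize (step 5), pack 4 bits per component (step 6)."""
    xtz = W1.T @ xt                                  # d_z-dimensional feature vector
    lam = np.floor(xtz / D + 0.5).astype(int)        # checkerboard quantization (assumed)
    lam = np.clip(lam, 0, 63)                        # keep Lambda within 0..63
    nib = lam[:n_comp] & 0xF                         # 4 bits per component (assumed)
    bits = "".join(f"{int(v):04b}" for v in nib)     # 64 components -> 256 bits
    return int(bits, 2).to_bytes(len(bits) // 8, "big")

# usage: key = extract_key(W1, xt, D=4.0)   # D is user-chosen; 4.0 is illustrative
```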
By using the speaker's text-dependent spectrograms, the present invention expresses the speaker's vocal characteristics more fully while keeping successive samples highly similar. On this basis, a machine learning method trains a voiceprint invariant-feature extraction matrix from multiple spectrograms; processing subsequent samples with that matrix extracts a more stable voiceprint key. The method is stable, concise, and convenient to use.
Description of the drawings
Fig. 1 is the voiceprint key training flow chart of the present invention;
Fig. 2 is the voiceprint spectrogram generation flow chart of the present invention;
Fig. 3 is a voiceprint spectrogram of the present invention;
Fig. 4 is the voiceprint key extraction flow chart of the present invention;
Fig. 5 is the voiceprint feature machine learning schematic diagram of the present invention.
Specific embodiment
A text-dependent voiceprint key generation method comprises voiceprint key training and voiceprint key extraction. Voiceprint key training learns a voiceprint key extraction matrix from voiceprint samples collected in advance. Voiceprint key extraction preprocesses the voiceprint sample to be extracted and multiplies it by the key extraction matrix obtained in training to obtain the voiceprint key. The specific steps are as follows:
Step 1: voiceprint key training, as shown in Fig. 1, with the following specific steps:
First step: the user records his or her own voice reading one and the same piece of text, generally 1 to 3 consecutive phrases, repeated 20 times or more (the number of repetitions can be adjusted by the user according to the training results).
Second step: record 10 or more different users reading the same text, each repeated 20 times or more; also record 10 or more different users reading different texts of similar duration, each repeated 20 times or more.
Third step: preprocess the voices recorded in the first and second steps and, as shown in Fig. 2 and Fig. 3, extract the voiceprint spectrograms. The specific process is:
1) Pre-emphasis:
Let S1(n), n = 0, 1, 2, ..., N-1, denote the speech time-domain signal. The pre-emphasis formula is S(n) = S1(n) - a*S1(n-1), with 0.9 < a < 1.0; a is the pre-emphasis coefficient, used to adjust the degree of emphasis.
2) Framing: divide the speech signal into frames.
3) Hamming windowing:
Let S(n), n = 0, 1, 2, ..., N-1, be the framed speech time-domain signal. The signal after multiplication by the Hamming window is S'(n), see formula (1):
S'(n) = S(n) * W(n)    (1);
where W(n) = (1 - a) - a*cos(2πn/(N-1)), 0 ≤ n ≤ N-1, with a = 0.46; a may take values between 0.3 and 0.7, the specific value being determined by experiment and empirical data. W(n) is the Hamming window function; it has a smooth low-pass characteristic and reflects the frequency characteristics of a short-time speech signal well.
4) Fast Fourier transform (FFT):
Apply a radix-2 FFT to the windowed time-domain signal S'(n) to obtain the linear spectrum X(n, k); the radix-2 FFT is a general-purpose algorithm in the art. X(n, k) is the spectral energy density function of the n-th speech frame, with k indexing the spectral bins; each speech frame corresponds to one time slice on the time axis.
5) Generate the text-dependent voiceprint spectrogram:
Using the frame index n as the time-axis coordinate and k as the frequency-axis coordinate, express the value of |X(n,k)|² as a gray level and display it at the corresponding coordinate point; this constitutes the voiceprint spectrogram. The transformation 10*log10(|X(n,k)|²) gives the dB representation of the spectrogram.
Fourth step: filter and normalize the voiceprint spectrogram. Candidate filters include Gaussian, wavelet, binarization, and other filters in general use in the field of signal processing; which filter, or which combination of filters, to use is chosen by the user according to actual test results. Normalization means unifying the spectrogram to a fixed width and height and scaling each pixel value into the range 0-255; standard methods in the art can be used throughout, e.g. the image resizing can be implemented with the imresize function of the MATLAB function library.
Fifth step: apply machine learning to the voiceprint spectrograms to obtain the voiceprint invariant-feature learning matrix, i.e. the voiceprint key extraction matrix.
The voiceprint spectrograms obtained in the fourth step are divided into two classes: one class consists of the user's text-dependent voiceprint spectrograms; the other consists of comparison voiceprint spectrograms from non-users, in which the related text is mixed with unrelated texts. The two classes are called the positive and negative sample sets.
Let M = [M1, M2] denote the positive and negative sample sets participating in training, with Mi = [xi1, xi2, ..., xiL], i ∈ {1, 2}, denoting the i-th sample class; i = 1 is the positive class and i = 2 the negative class. Here xir ∈ R^d, 1 ≤ i ≤ 2, 1 ≤ r ≤ L, and each xir is a one-dimensional column vector: the values of all pixels of one voiceprint spectrogram form a two-dimensional matrix, the rows of that matrix are spliced in order into a one-dimensional row vector, and transposition gives the column vector xir of length d. R^d denotes the d-dimensional real field, and L is the number of voiceprint spectrograms, i.e. column vectors, in each sample class.
Now, according to the characteristics of the two sample classes, the voiceprint key extraction matrix W1, W1 ∈ R^(d×dz), is trained via the cost function J of formula (2), where x̄1 is the positive-sample mean and x̄2 the negative-sample mean of the training samples. J reflects the difference between the Euclidean distances of the training samples, after projection through the voiceprint key extraction matrix W1, to the positive and to the negative sample-set mean.
Let the matrices H1 and H2 be defined from the training samples as above. Solving for the eigenvalues and eigenvectors of the matrix (H1 - H2) yields the voiceprint key extraction matrix W1, that is: (H1 - H2)w = λw, where w is an eigenvector of the matrix (H1 - H2) and λ the corresponding eigenvalue.
Since {w1, w2, ..., wdz} are the eigenvectors, corresponding respectively to the eigenvalues {λ1, λ2, ..., λdz} with λ1 ≥ λ2 ≥ ... ≥ λdz ≥ 0, eigenvectors whose eigenvalues are less than 0 are not included in the construction of the matrix W1.
This completes the training of the voiceprint key extraction matrix W1.
Step 2: voiceprint key extraction, as shown in Fig. 4, with the following specific steps:
Step 1: the user records about 3 seconds of his or her own text-dependent speech.
Step 2: extract the voiceprint spectrogram, following the third step of Step 1.
Step 3: filter and normalize the voiceprint spectrogram, then convert it to matrix form and splice the rows in order, obtaining the voiceprint vector xt.
Step 4: left-multiply the voiceprint vector xt obtained in step 3 by the transpose of the voiceprint invariant-feature learning matrix W1 trained in Step 1, i.e. compute W1^T·xt, obtaining the dz-dimensional voiceprint feature vector xtz; xtz is the stabilized voiceprint feature vector.
Step 5: apply a checkerboard quantization to each component of xtz; the further stabilized voiceprint feature vector is x̃tz.
The checkerboard quantization proceeds as follows:
Denote each component of xtz by xtzi.
The quantization is given by formula (3):
where D is the grid size of the checkerboard, a positive number whose specific value can be chosen by the user from experience, generally such that Λ(x) takes values between 0 and 63; xtzi is a component of xtz, and Λ(x) is an integer value.
Λ(x), i.e. the quantized value of xtzi, is the coordinate of the checkerboard grid point closest to xtzi, measured from the coordinate origin.
Step 6: take the first 32 or 64 components of the result vector x̃tz of step 5 and concatenate them; with each component value in the range 0-63 contributing 4 key bits, a 128-bit or 256-bit voiceprint key is formed. This completes the extraction of the voiceprint key.
The present invention exploits the fact that the text-dependent voiceprint spectra of the same speaker show high similarity. Voiceprint spectrograms are extracted from text-dependent speech: the multiple spectrograms obtained by repeatedly sampling the same speaker reading the same text are highly similar, while the spectrograms extracted from the same text read by different speakers differ markedly. After the voiceprint spectrograms are extracted, their common feature information is extracted by the machine learning method shown in Fig. 5, and block quantization then yields the text-dependent voiceprint key. The voiceprint key requires no biometric template to be retained on the server side, which gives higher security, and it can be fused with general network encryption and decryption algorithms such as AES and RSA, making it convenient to use. The method obtains a more stable voiceprint key: the voiceprint key extraction accuracy exceeds 95%, and the key length can reach 256 bits.

Claims (2)

1. A text-dependent voiceprint key generation method, characterized in that it comprises voiceprint key training and voiceprint key extraction; the voiceprint key training learns a voiceprint key extraction matrix from voiceprint samples collected in advance; the voiceprint key extraction preprocesses the voiceprint sample to be extracted and multiplies it by the key extraction matrix obtained in training to obtain the voiceprint key; the specific steps are as follows:
Step 1: voiceprint key training, with the following specific steps:
first step: the user records his or her own voice reading one and the same piece of text, generally 1 to 3 consecutive phrases, repeated 20 times or more, the number of repetitions being adjusted by the user according to the training results;
second step: recording 10 or more different users reading the same text, each repeated 20 times or more, and recording 10 or more different users reading different texts of similar duration, each repeated 20 times or more;
third step: preprocessing the voices recorded in the first and second steps and extracting the voiceprint spectrograms, the specific process being:
1) pre-emphasis:
letting S1(n), n = 0, 1, 2, ..., N-1, denote the speech time-domain signal, the pre-emphasis formula is S(n) = S1(n) - a*S1(n-1), 0.9 < a < 1.0, where a is the pre-emphasis coefficient, used to adjust the degree of emphasis;
2) framing: dividing the speech signal into frames;
3) Hamming windowing:
letting S(n), n = 0, 1, 2, ..., N-1, be the framed speech time-domain signal, the signal after multiplication by the Hamming window is S'(n), see formula (1):
S'(n) = S(n) * W(n)    (1);
where W(n) = (1 - a) - a*cos(2πn/(N-1)), 0 ≤ n ≤ N-1, with a = 0.46, a taking values between 0.3 and 0.7 and the specific value determined by experiment and empirical data; W(n) is the Hamming window function, has a smooth low-pass characteristic, and reflects the frequency characteristics of a short-time speech signal well;
4) fast Fourier transform FFT:
applying a radix-2 FFT to the windowed time-domain signal S'(n) gives the linear spectrum X(n, k), the radix-2 FFT being a general-purpose algorithm in the art; X(n, k) is the spectral energy density function of the n-th speech frame, k indexes the spectral bins, and each speech frame corresponds to one time slice on the time axis;
5) generating the text-dependent voiceprint spectrogram:
using the frame index n as the time-axis coordinate and k as the frequency-axis coordinate, expressing the value of |X(n,k)|² as a gray level and displaying it at the corresponding coordinate point constitutes the voiceprint spectrogram; the transformation 10*log10(|X(n,k)|²) gives the dB representation of the spectrogram;
fourth step: filtering and normalizing the voiceprint spectrogram, candidate filters including Gaussian, wavelet, binarization, and other filters in general use in the field of signal processing, the specific filter or combination of filters being chosen by the user according to actual test results;
fifth step: applying machine learning to the voiceprint spectrograms to obtain the voiceprint invariant-feature learning matrix, i.e. the voiceprint key extraction matrix;
the voiceprint spectrograms obtained in the fourth step are divided into two classes, one class being the user's text-dependent voiceprint spectrograms and the other being comparison voiceprint spectrograms from non-users, in which the related text is mixed with unrelated texts, the two classes being called the positive and negative sample sets;
let M = [M1, M2] denote the positive and negative sample sets participating in training, Mi = [xi1, xi2, ..., xiL], i ∈ {1, 2}, denoting the i-th sample class, where i = 1 is the positive class and i = 2 the negative class; xir ∈ R^d, 1 ≤ i ≤ 2, 1 ≤ r ≤ L; each xir is a one-dimensional column vector: the values of all pixels of one voiceprint spectrogram form a two-dimensional matrix, the rows of the two-dimensional matrix are spliced in order into a one-dimensional row vector, and transposition gives the column vector xir of length d; R^d denotes the d-dimensional real field, and L is the number of voiceprint spectrograms, i.e. column vectors, in each sample class;
now, according to the characteristics of the two sample classes, the voiceprint key extraction matrix W1, W1 ∈ R^(d×dz), is trained via the cost function J of formula (2), where x̄1 is the positive-sample mean and x̄2 the negative-sample mean of the training samples; J reflects the difference between the Euclidean distances of the training samples, after projection through the voiceprint key extraction matrix W1, to the positive and to the negative sample-set mean;
defining the matrices H1 and H2 from the training samples accordingly, solving for the eigenvalues and eigenvectors of the matrix (H1 - H2) yields the voiceprint key extraction matrix W1, that is: (H1 - H2)w = λw, where w is an eigenvector of the matrix (H1 - H2) and λ the corresponding eigenvalue;
since {w1, w2, ..., wdz} are the eigenvectors, corresponding respectively to the eigenvalues {λ1, λ2, ..., λdz} with λ1 ≥ λ2 ≥ ... ≥ λdz ≥ 0, eigenvectors whose eigenvalues are less than 0 are not included in the construction of the matrix W1;
the training of the voiceprint key extraction matrix W1 is thus completed;
Step 2: voiceprint key extraction, with the following specific steps:
step 1: the user records about 3 seconds of his or her own text-dependent speech;
step 2: the voiceprint spectrogram is extracted, following the third step of Step 1;
step 3: the voiceprint spectrogram is filtered and normalized, then converted to matrix form and spliced row by row in order, giving the voiceprint vector xt;
step 4: the voiceprint vector xt obtained in step 3 is left-multiplied by the transpose of the voiceprint invariant-feature learning matrix W1 trained in Step 1, i.e. W1^T·xt, giving the dz-dimensional voiceprint feature vector xtz; xtz is the stabilized voiceprint feature vector;
step 5: a checkerboard quantization is applied to each component of xtz, the further stabilized voiceprint feature vector being x̃tz;
the checkerboard quantization proceeds as follows:
each component of xtz is denoted xtzi;
the quantization is given by formula (3):
where D is the grid size of the checkerboard, a positive number whose specific value may be chosen by the user from experience, generally such that Λ(x) takes values between 0 and 63; xtzi is a component of xtz, and Λ(x) is an integer value;
Λ(x), i.e. the quantized value of xtzi, is the coordinate of the checkerboard grid point closest to xtzi, measured from the coordinate origin;
step 6: the first 32 or 64 components of the result vector x̃tz of step 5 are taken and concatenated; with each component value in the range 0-63 contributing 4 key bits, a 128-bit or 256-bit voiceprint key is formed; the extraction of the voiceprint key is complete.
2. The text-dependent voiceprint key generation method of claim 1, characterized in that the normalization in the fourth step means unifying the spectrogram to a fixed width and height and scaling each pixel value into the range 0-255, which can be implemented with the imresize function of the MATLAB function library.
CN201811139547.4A 2018-09-28 2018-09-28 Text-related voiceprint key generation method Active CN109326294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811139547.4A CN109326294B (en) 2018-09-28 2018-09-28 Text-related voiceprint key generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811139547.4A CN109326294B (en) 2018-09-28 2018-09-28 Text-related voiceprint key generation method

Publications (2)

Publication Number Publication Date
CN109326294A 2019-02-12
CN109326294B CN109326294B (en) 2022-09-20

Family

ID=65266096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811139547.4A Active CN109326294B (en) 2018-09-28 2018-09-28 Text-related voiceprint key generation method

Country Status (1)

Country Link
CN (1) CN109326294B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001092974A (en) * 1999-08-06 2001-04-06 Internatl Business Mach Corp <Ibm> Speaker recognizing method, device for executing the same, method and device for confirming audio generation
CN103971690A (en) * 2013-01-28 2014-08-06 腾讯科技(深圳)有限公司 Voiceprint recognition method and device
CN103873254A (en) * 2014-03-03 2014-06-18 杭州电子科技大学 Method for generating human vocal print biometric key
CN106128465A (en) * 2016-06-23 2016-11-16 成都启英泰伦科技有限公司 A kind of Voiceprint Recognition System and method
CN107274890A (en) * 2017-07-04 2017-10-20 清华大学 Vocal print composes extracting method and device
CN108198561A (en) * 2017-12-13 2018-06-22 宁波大学 A kind of pirate recordings speech detection method based on convolutional neural networks
CN112786059A (en) * 2021-03-11 2021-05-11 合肥市清大创新研究院有限公司 Voiceprint feature extraction method and device based on artificial intelligence

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
丁冬兵: "TL-CNN-GAP模型下的小样本声纹识别方法研究" [Small-sample voiceprint recognition under the TL-CNN-GAP model], 《电脑知识与技术》 [Computer Knowledge and Technology] *
冯辉宗等: "语谱特征的身份认证向量识别方法" [Identity authentication vector recognition based on spectrogram features], 《重庆大学学报》 [Journal of Chongqing University] *
马义德等: "基于PCNN的语谱图特征提取在说话人识别中的应用" [PCNN-based spectrogram feature extraction for speaker recognition], 《计算机工程与应用》 [Computer Engineering and Applications] *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110322887A (en) * 2019-04-28 2019-10-11 武汉大晟极科技有限公司 A kind of polymorphic type audio signal energies feature extracting method
CN110223699A (en) * 2019-05-15 2019-09-10 桂林电子科技大学 A kind of speaker's identity confirmation method, device and storage medium
CN110223699B (en) * 2019-05-15 2021-04-13 桂林电子科技大学 Speaker identity confirmation method, device and storage medium
CN111161705A (en) * 2019-12-19 2020-05-15 上海寒武纪信息科技有限公司 Voice conversion method and device
CN111161705B (en) * 2019-12-19 2022-11-18 寒武纪(西安)集成电路有限公司 Voice conversion method and device
CN112908303A (en) * 2021-01-28 2021-06-04 广东优碧胜科技有限公司 Audio signal processing method and device and electronic equipment
CN113179157A (en) * 2021-03-31 2021-07-27 杭州电子科技大学 Text-related voiceprint biological key generation method based on deep learning
CN113179157B (en) * 2021-03-31 2022-05-17 杭州电子科技大学 Text-related voiceprint biological key generation method based on deep learning
CN113129897A (en) * 2021-04-08 2021-07-16 杭州电子科技大学 Voiceprint recognition method based on attention mechanism recurrent neural network
CN113129897B (en) * 2021-04-08 2024-02-20 杭州电子科技大学 Voiceprint recognition method based on attention mechanism cyclic neural network

Also Published As

Publication number Publication date
CN109326294B (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN109326294A (en) A kind of relevant vocal print key generation method of text
Wu et al. LVID: A multimodal biometrics authentication system on smartphones
Rui et al. A survey on biometric authentication: Toward secure and privacy-preserving identification
Gomez-Barrero et al. General framework to evaluate unlinkability in biometric template protection systems
Galbally et al. Iris image reconstruction from binary templates: An efficient probabilistic approach based on genetic algorithms
Tolosana et al. BioTouchPass2: Touchscreen password biometrics using time-aligned recurrent neural networks
US8862888B2 (en) Systems and methods for three-factor authentication
US9430628B2 (en) Access authorization based on synthetic biometric data and non-biometric data
Galbally et al. Image quality assessment for fake biometric detection: Application to iris, fingerprint, and face recognition
CN110677260B (en) Authentication method, device, electronic equipment and storage medium
CN105512535A (en) User authentication method and user authentication device
CN106302330A (en) Auth method, device and system
CN106503655A (en) A kind of electric endorsement method and sign test method based on face recognition technology
CN103873253B (en) Method for generating human fingerprint biometric key
JP7412496B2 (en) Living body (liveness) detection verification method, living body detection verification system, recording medium, and training method for living body detection verification system
CN113505652A (en) Living body detection method, living body detection device, electronic apparatus, and storage medium
CN112132996A (en) Door lock control method, mobile terminal, door control terminal and storage medium
KR20220123118A (en) Systems and methods for distinguishing user, action and device-specific characteristics recorded in motion sensor data
Zhang et al. Volere: Leakage resilient user authentication based on personal voice challenges
Akasaka et al. Model-free template reconstruction attack with feature converter
Liu et al. Biohashing for human acoustic signature based on random projection
CN220983921U (en) Recognition device based on face and voiceprint
Uzun Security and Privacy in Biometrics-Based Systems.
Mtibaa Towards robust and privacy-preserving speaker verification systems
Korshunov et al. Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant