CN109326294A - Text-dependent voiceprint key generation method - Google Patents
Text-dependent voiceprint key generation method
- Publication number
- CN109326294A (application CN201811139547.4A)
- Authority
- CN
- China
- Prior art keywords
- voiceprint
- key
- matrix
- training
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0861—Generation of secret information including derivation or calculation of cryptographic keys or passwords
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0861—Generation of secret information including derivation or calculation of cryptographic keys or passwords
- H04L9/0866—Generation of secret information including derivation or calculation of cryptographic keys or passwords involving user or device identifiers, e.g. serial number, physical or biometrical information, DNA, hand-signature or measurable physical characteristics
Abstract
The present invention relates to a text-dependent voiceprint key generation method comprising voiceprint key training and voiceprint key extraction. Voiceprint key training derives a voiceprint key extraction matrix from previously collected voiceprint samples. Voiceprint key extraction preprocesses the voiceprint sample to be extracted and multiplies it by the extraction matrix obtained in training, yielding the voiceprint key. By using speaker- and text-dependent spectrograms, the invention expresses the speaker's vocal characteristics more fully while keeping successive samples more stably similar. On this basis, a matrix that extracts invariant voiceprint features is trained from multiple spectrograms by machine learning; processing subsequent samples with this matrix extracts a more stable voiceprint key. The method is stable, concise, and convenient to use.
Description
Technical field
The invention belongs to the field of cyberspace security and relates to a text-dependent voiceprint key generation method.
Background art
Voiceprint recognition is a relatively mature biometric identification technology. With the rapid development of artificial intelligence in recent years, its accuracy has improved considerably; in low-noise environments recognition accuracy can exceed 96%, and the technology is widely used in identity authentication scenarios.
As voiceprint applications have deepened, the field has begun trying to extract stable digital sequences directly from human voiceprints for use as biometric keys, that is, to generate cryptographic keys of all kinds directly from the voiceprint. Such keys integrate seamlessly with existing password and public/private-key techniques, eliminate the inconvenience and potential security risks of collecting and storing voiceprints, and further enrich the means and methods of network authentication.
Voiceprint biometric key technology has received some study. Chinese invention patent ZL201110003202.8, a file encryption and decryption method based on voiceprints, proposed a scheme for extracting a stable key sequence from voiceprint information; however, that scheme stabilizes voiceprint feature values only with a checkerboard method, its stabilizing effect is limited, and the key length is insufficient. Chinese invention patent ZL201410074511.8, a human voiceprint biometric key generation method, extracts a voiceprint Gaussian model and projects the model's feature parameters into a higher-dimensional space to obtain a stable voiceprint key. The stability of that scheme's key improves markedly on the earlier patent, but for authentication environments with high stability requirements, the stability of the extracted voiceprint biometric key still needs to be improved further.
Summary of the invention
The object of the present invention is to provide a text-dependent voiceprint key generation method.
The invention comprises voiceprint key training and voiceprint key extraction. Voiceprint key training derives a voiceprint key extraction matrix from previously collected voiceprint samples. Voiceprint key extraction preprocesses the sample to be extracted and multiplies it by the extraction matrix obtained in training to obtain the voiceprint key. The specific steps are as follows:
Step 1: voiceprint key training, with the following specific steps:
First, the user records his or her own voice speaking the same text, typically one to three consecutive words, repeated 20 times or more; the count may be adjusted by the user according to training results.
Second, record ten or more different users reading the same text, each repeated 20 times or more; also record ten or more different users reading different texts of similar duration, each repeated 20 times or more.
Third, preprocess the voices recorded in the first and second steps and extract voiceprint spectrograms, as follows:
1) Pre-emphasis: denote the speech time-domain signal S1(n), n = 0, 1, 2, ..., N-1. The pre-emphasis formula is S(n) = S1(n) - a*S1(n-1), with 0.9 < a < 1.0; a is the pre-emphasis coefficient, adjusting the amplitude to be emphasized.
2) Framing: divide the speech signal into frames.
3) Hamming windowing: let the framed speech time-domain signal be S(n), n = 0, 1, 2, ..., N-1, denoting the signal divided into n frames. The signal after applying the Hamming window is S'(n); see formula (1):
S'(n) = S(n) * W(n)    (1);
W(n) = (1 - a) - a*cos(2πn/(N-1)), with a = 0.46; the value of a ranges between 0.3 and 0.7, the specific value determined by experiment and empirical data. W(n) is the Hamming window function; it has a smooth low-pass characteristic and better reflects the frequency characteristics of short-time speech signals.
4) Fast Fourier transform (FFT): apply a radix-2 FFT to the windowed signal S'(n) to obtain the linear spectrum X(n, k); the radix-2 FFT is a standard algorithm in the art. X(n, k) is the spectral energy density of the n-th speech frame, k indexes the spectral bins, and each speech frame corresponds to a time slice on the time axis.
5) Generate the text-dependent voiceprint spectrogram: with time index n as the time axis and k as the frequency axis, express the value |X(n, k)|² as a gray level displayed at the corresponding coordinate point, which constitutes the voiceprint spectrogram. The transformation 10·log10(|X(n, k)|²) gives the spectrogram's dB representation.
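The preprocessing chain above (pre-emphasis, framing, Hamming windowing, FFT, dB conversion) can be sketched in a few lines of NumPy. The frame length, hop size, and pre-emphasis coefficient below are illustrative choices, not values fixed by the patent:

```python
import numpy as np

def voiceprint_spectrogram(signal, frame_len=256, hop=128, a=0.97):
    # Pre-emphasis: S(n) = S1(n) - a*S1(n-1), with 0.9 < a < 1.0
    emphasized = np.append(signal[0], signal[1:] - a * signal[:-1])
    # Framing into overlapping frames
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Hamming window: W(n) = (1 - 0.46) - 0.46*cos(2*pi*n/(N-1))
    window = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(frame_len) / (frame_len - 1))
    frames = frames * window
    # FFT of each frame gives the linear spectrum X(n, k)
    X = np.fft.rfft(frames, axis=1)
    # dB representation: 10*log10(|X(n,k)|^2), floored to avoid log(0)
    return 10 * np.log10(np.maximum(np.abs(X) ** 2, 1e-12))
```

With a 256-sample frame and 128-sample hop, a 1-second signal at 16 kHz yields a 124-frame by 129-bin spectrogram.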
Fourth, filter and normalize the voiceprint spectrogram. Applicable filters include Gaussian, wavelet, binarization, and other filters common in signal processing; which filter, or combination of filters, to use is chosen by the user according to actual test results. Normalization means unifying spectrogram sizes to a fixed length and width and mapping each pixel value into the range 0-255; standard methods apply throughout, e.g., image resizing can be done with the imresize function of the MATLAB function library.
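The fourth step's normalization can be sketched as follows. The patent suggests MATLAB's imresize; the nearest-neighbour resize here is a dependency-free stand-in, and the 64×64 target size is an assumption for illustration:

```python
import numpy as np

def normalize_spectrogram(spec, out_h=64, out_w=64):
    # Resize to a fixed height x width by nearest-neighbour sampling
    # (stand-in for MATLAB's imresize).
    rows = (np.arange(out_h) * spec.shape[0] / out_h).astype(int)
    cols = (np.arange(out_w) * spec.shape[1] / out_w).astype(int)
    resized = spec[np.ix_(rows, cols)]
    # Rescale pixel values into the 0-255 range
    lo, hi = resized.min(), resized.max()
    return (resized - lo) / (hi - lo if hi > lo else 1.0) * 255.0
```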
Fifth, apply machine learning to the voiceprint spectrograms to obtain the voiceprint invariant-feature learning matrix, i.e., the voiceprint key extraction matrix.
The spectrograms from the fourth step are divided into two classes: the user's own text-dependent spectrograms, and contrast spectrograms mixing other users' readings of the same text with unrelated texts. These are called the positive and negative sample sets.
Let M = [M1, M2] denote the positive and negative sample sets participating in training, Mi = [xi1, xi2, ..., xiL], i ∈ {1, 2}, the i-th class sample set, where i = 1 indexes positive samples and i = 2 negative samples. Each xir ∈ R^d, 1 ≤ i ≤ 2, 1 ≤ r ≤ L, is a one-dimensional column vector: the pixel values of one spectrogram form a two-dimensional matrix, its rows are concatenated in order into a one-dimensional row vector, and transposition yields the column vector xir of length d. R^d denotes the d-dimensional real field, and L is the number of spectrograms (column vectors) in each sample set.
Now, according to the characteristics of the two sample classes, train the voiceprint key extraction matrix W1, W1 ∈ R^(d×dz), giving formula (2):
where m̄1 is the positive-sample mean of the training samples and m̄2 the negative-sample mean. J is the cost function; it reflects the difference of the distances, computed as Euclidean distances, between the training samples projected through the extraction matrix W1 and the means of the positive and negative sample sets.
Let:
Solve for the eigenvalues and eigenvectors of the matrix (H1 - H2) to obtain the extraction matrix W1, that is: (H1 - H2)w = λw, where w is an eigenvector of (H1 - H2) and λ the corresponding eigenvalue.
The eigenvectors {w1, w2, ..., wdz} correspond to eigenvalues {λ1, λ2, ..., λdz} with λ1 ≥ λ2 ≥ ... ≥ λdz ≥ 0; eigenvectors with negative eigenvalues are not included in the construction of W1.
This completes the training of the voiceprint key extraction matrix W1.
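The training step can be sketched as follows. Formula (2) and the definitions of H1 and H2 appear only as images in the source, so the scatter-matrix construction below is an assumed reading of the stated criterion; the eigendecomposition of (H1 - H2) and the discarding of negative eigenvalues follow the text directly:

```python
import numpy as np

def train_extraction_matrix(M1, M2):
    """M1, M2: d x L arrays whose columns are flattened positive /
    negative spectrograms. Returns W1, whose columns are eigenvectors
    of (H1 - H2) with non-negative eigenvalues."""
    m1 = M1.mean(axis=1, keepdims=True)   # positive-sample mean
    m2 = M2.mean(axis=1, keepdims=True)   # negative-sample mean
    # Assumed scatter-style construction of H1, H2 (the patent's exact
    # formula (2) is not reproduced in the text).
    H1 = (M1 - m1) @ (M1 - m1).T
    H2 = (M2 - m2) @ (M2 - m2).T
    eigvals, eigvecs = np.linalg.eigh(H1 - H2)
    # Sort descending and keep only eigenvectors with eigenvalue >= 0
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs[:, eigvals >= 0]       # W1 in R^{d x dz}
```

Because np.linalg.eigh returns orthonormal eigenvectors, the columns of W1 remain orthonormal after the negative-eigenvalue columns are dropped.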
Step 2: voiceprint key extraction, with the following specific steps:
1. The user records about 3 seconds of his or her own text-dependent speech.
2. Extract the voiceprint spectrogram, following the third step of Step 1.
3. Filter and normalize the spectrogram, then convert it to matrix form and concatenate its rows in order to obtain the voiceprint vector xt.
4. Left-multiply xt by the transpose of the invariant-feature learning matrix W1 trained in Step 1, i.e., W1ᵀ·xt, obtaining the dz-dimensional voiceprint feature vector xtz; xtz is the stabilized voiceprint feature vector.
5. Apply checkerboard quantization to each component of xtz to further stabilize the feature vector. Denote each component of xtz as xtzi; the quantization formula is given in formula (3):
where D is the grid size of the checkerboard, a positive number whose specific value may be chosen by the user from experience, generally such that Λ(x) falls between 0 and 63; xtzi is a component of xtz and Λ(x) is an integer. Λ(x), the quantized value of xtzi, is the coordinate, relative to the origin, of the grid point nearest to xtzi.
6. Take the first 32 or 64 components of the result vector of step 5 and concatenate them; with each component value between 0 and 64 forming 4 key bits, a 128-bit or 256-bit voiceprint key is formed. This completes the extraction of the voiceprint key.
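The extraction step (projection, checkerboard quantization, bit assembly) can be sketched as follows. Formula (3) is not reproduced in the source, so nearest-grid-point rounding is an assumed reading, and folding each quantized component to 4 bits by modulo 16 is likewise an illustrative assumption:

```python
import numpy as np

def extract_key(W1, xt, D=0.5, n_components=32):
    # Project the flattened spectrogram: x_tz = W1^T . x_t
    xtz = W1.T @ xt
    # Checkerboard quantization: snap each component to the nearest
    # multiple of the grid size D (assumed reading of formula (3)).
    q = np.rint(xtz / D).astype(int)
    # 4 key bits per component; modulo-16 folding is an illustrative
    # assumption.  32 components -> 128 bits, 64 -> 256 bits.
    nibbles = [int(v) % 16 for v in q[:n_components]]
    return ''.join(format(n, '04b') for n in nibbles)
```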
By using speaker- and text-dependent spectrograms, the invention expresses the speaker's vocal characteristics more fully while keeping successive samples more stably similar. On this basis, a voiceprint invariant-feature extraction matrix is trained from multiple spectrograms by machine learning; processing subsequent samples with this matrix extracts a more stable voiceprint key. The method is stable, concise, and convenient to use.
Brief description of the drawings
Fig. 1 is the voiceprint key training flow chart of the invention;
Fig. 2 is the voiceprint spectrogram generation flow chart of the invention;
Fig. 3 is a voiceprint spectrogram of the invention;
Fig. 4 is the voiceprint key extraction flow chart of the invention;
Fig. 5 is a schematic diagram of voiceprint feature machine learning of the invention.
Specific embodiment
A text-dependent voiceprint key generation method comprises voiceprint key training and voiceprint key extraction. Voiceprint key training derives a voiceprint key extraction matrix from previously collected voiceprint samples. Voiceprint key extraction preprocesses the sample to be extracted and multiplies it by the extraction matrix obtained in training, obtaining the voiceprint key. The specific steps are as follows:
Step 1, voiceprint key training, is carried out exactly as described in the Summary above; the training flow is shown in Fig. 1, and the spectrogram extraction process in Figs. 2 and 3. Step 2, voiceprint key extraction, likewise follows the Summary; the extraction flow is shown in Fig. 4.
The invention exploits the high similarity of the voiceprint spectra of the same speaker's text-dependent speech. Voiceprint spectrograms are extracted from text-dependent speech: multiple spectrograms obtained by repeated sampling of the same speaker reading the same text are highly similar, while spectrograms extracted from the same text read by different speakers differ markedly. After spectrogram extraction, common feature information is extracted from the multiple spectrograms by the machine learning method shown in Fig. 5, and after segmented quantization the text-dependent voiceprint key is obtained. The voiceprint key requires no server-side retention of biometric templates and therefore offers higher security, and it can be combined with general-purpose network encryption and decryption algorithms such as AES and RSA, which is convenient for users. The method obtains a more stable voiceprint key: extraction accuracy exceeds 95%, and the key length reaches 256 bits.
Claims (2)
1. A text-dependent voiceprint key generation method, characterized by comprising voiceprint key training and voiceprint key extraction; voiceprint key training derives a voiceprint key extraction matrix from previously collected voiceprint samples; voiceprint key extraction preprocesses the voiceprint sample to be extracted and multiplies it by the key extraction matrix obtained in training to obtain the voiceprint key; the specific steps are as follows:
Step 1: voiceprint key training, with the following specific steps:
First, the user records his or her own voice speaking the same text, typically one to three consecutive words, repeated 20 times or more, the count being adjusted by the user according to training;
Second, record ten or more different users reading the same text, each repeated 20 times or more; record ten or more different users reading different texts of similar duration, each repeated 20 times or more;
Third, preprocess the voices recorded in the first and second steps and extract voiceprint spectrograms, specifically:
1) Pre-emphasis: denote the speech time-domain signal S1(n), n = 0, 1, 2, ..., N-1; the pre-emphasis formula is S(n) = S1(n) - a*S1(n-1), 0.9 < a < 1.0; a is the pre-emphasis coefficient, adjusting the amplitude to be emphasized;
2) Framing: divide the speech signal into frames;
3) Hamming windowing: let the framed speech time-domain signal be S(n), n = 0, 1, 2, ..., N-1, denoting the signal divided into n frames; the signal after applying the Hamming window is S'(n), see formula (1):
S'(n) = S(n) * W(n)    (1);
W(n) = (1 - a) - a*cos(2πn/(N-1)) with a = 0.46; the value of a ranges between 0.3 and 0.7, the specific value determined by experiment and empirical data; W(n) is the Hamming window function, has a smooth low-pass characteristic, and better reflects the frequency characteristics of short-time speech signals;
4) Fast Fourier transform (FFT): apply a radix-2 FFT to the windowed signal S'(n) to obtain the linear spectrum X(n, k); the radix-2 FFT is a standard algorithm in the art; X(n, k) is the spectral energy density of the n-th speech frame, k indexes the spectral bins, and each speech frame corresponds to a time slice on the time axis;
5) Generate the text-dependent voiceprint spectrogram: with time index n as the time axis and k as the frequency axis, express the value |X(n, k)|² as a gray level displayed at the corresponding coordinate point, constituting the voiceprint spectrogram; the transformation 10·log10(|X(n, k)|²) gives the spectrogram's dB representation;
Fourth, filter and normalize the voiceprint spectrogram; applicable filters include Gaussian, wavelet, binarization, and other filters common in signal processing, the specific filter or combination of filters being chosen by the user according to actual test results;
Fifth, apply machine learning to the voiceprint spectrograms to obtain the voiceprint invariant-feature learning matrix, i.e., the voiceprint key extraction matrix;
the spectrograms from the fourth step are divided into two classes: the user's own text-dependent spectrograms, and contrast spectrograms mixing other users' readings of the same text with unrelated texts, called the positive and negative sample sets;
let M = [M1, M2] denote the positive and negative sample sets participating in training, Mi = [xi1, xi2, ..., xiL], i ∈ {1, 2}, the i-th class sample set, where i = 1 indexes positive samples and i = 2 negative samples; xir ∈ R^d, 1 ≤ i ≤ 2, 1 ≤ r ≤ L, is a one-dimensional column vector: the pixel values of one spectrogram form a two-dimensional matrix, its rows are concatenated in order into a one-dimensional row vector, and transposition yields the column vector xir of length d; R^d denotes the d-dimensional real field, and L is the number of spectrograms (column vectors) in each sample set;
now, according to the characteristics of the two sample classes, train the voiceprint key extraction matrix W1, W1 ∈ R^(d×dz), giving formula (2):
where m̄1 is the positive-sample mean of the training samples and m̄2 the negative-sample mean; J is the cost function, reflecting the difference of the Euclidean distances between the training samples projected through W1 and the means of the positive and negative sample sets;
let:
solve for the eigenvalues and eigenvectors of (H1 - H2) to obtain the extraction matrix W1, that is: (H1 - H2)w = λw, where w is an eigenvector of (H1 - H2) and λ the corresponding eigenvalue;
the eigenvectors {w1, w2, ..., wdz} correspond to eigenvalues {λ1, λ2, ..., λdz} with λ1 ≥ λ2 ≥ ... ≥ λdz ≥ 0; eigenvectors with negative eigenvalues are not included in the construction of W1;
this completes the training of the voiceprint key extraction matrix W1;
Step 2: vocal print cipher key-extraction, specific steps are as follows:
Step 1, user enroll itself text related voice, and 3 seconds or so;
Step 2 extracts vocal print sound spectrograph, with specific reference to step 1 third step;
Step 3 the pretreatment such as is filtered to vocal print sound spectrograph, normalizes, vocal print sound spectrograph is then switched to matrix form, and
Sequentially splice by row, obtains vocal print vector xt;
Step 4, with the vocal print invariant feature learning matrix W of step 1 training1, premultiplication step 3 obtains after transposition vocal print vector
xt, i.e. W1 T·xt, obtain dzTie up vocal print feature vector xtz, xtzFor vocal print feature vector after stabilization;
Step 5, to xtzPer one-dimensional component carry out a chessboard method operation, further stablize vocal print feature vector be
Chessboard method operation, steps are as follows:
To xtzEach of dimension component be denoted as xtzi;
The quantization formula is shown in formula (3):
where D is the grid size of the chessboard method, a positive number whose specific value can be chosen empirically by the user, generally so that Λ(x) takes values between 0 and 63; xtzi is a component of xtz, and Λ(x) is an integer value;
Λ(x), the value of xtzi after quantization, is the coordinate of the grid point in the checkerboard closest to xtzi, measured relative to the coordinate origin;
Step 6: take the first 32 or 64 components of the result vector computed in Step 5 and concatenate them in sequence; with each component taking a value in the range 0–63 and forming 4 key bits, a 128-bit or 256-bit voiceprint key is obtained. This completes the extraction of the voiceprint key.
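Steps 3–6 above can be sketched as follows. Since formula (3) is not reproduced here, two details are assumptions: the chessboard quantizer is taken as Λ(x) = floor(x/D), and each quantized component is reduced modulo 16 so that it contributes exactly 4 key bits (32 components then yield a 128-bit key, 64 components a 256-bit key).

```python
import numpy as np

def extract_key(W1, x_t, D=0.5, n_components=32):
    """Sketch of Steps 3-6: project, chessboard-quantize, pack a key.

    W1 is the trained d x dz extraction matrix, x_t the flattened
    voiceprint vector, D the chessboard grid size. The floor
    quantizer and mod-16 nibble packing are assumptions.
    """
    x_tz = W1.T @ x_t                      # stable dz-dim feature vector
    q = np.floor(x_tz / D).astype(int)     # assumed grid-coordinate quantizer
    nibbles = q[:n_components] % 16        # 4 bits per component (assumption)
    bits = "".join(format(int(n), "04b") for n in nibbles)
    return bits                            # n_components * 4 key bits
```

With `n_components=32` this yields a 128-bit key string, and with `n_components=64` (given dz ≥ 64) a 256-bit key.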
2. The text-related voiceprint key generation method according to claim 1, characterized in that: the normalization described in the fourth step means unifying the spectrogram size to a fixed length and width and unifying the value of each spectrogram pixel into the range 0–255; this can be implemented with the imresize function in the MATLAB function library.
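A Python analogue of the imresize-based normalization in claim 2 might look as follows; the 64×64 target size is an assumption, and a simple nearest-neighbour resize stands in for imresize's default interpolation.

```python
import numpy as np

def normalize_spectrogram(S, out_shape=(64, 64)):
    """Resize a spectrogram to a fixed size and rescale pixels to 0-255.

    Nearest-neighbour resampling via integer index arithmetic; the
    target size out_shape is an assumed choice, not the patent's.
    """
    h, w = S.shape
    rows = np.arange(out_shape[0]) * h // out_shape[0]   # source row indices
    cols = np.arange(out_shape[1]) * w // out_shape[1]   # source col indices
    R = S[np.ix_(rows, cols)].astype(float)
    lo, hi = R.min(), R.max()
    if hi > lo:
        R = (R - lo) / (hi - lo) * 255.0                 # rescale to 0-255
    else:
        R = np.zeros_like(R)                             # flat input edge case
    return R
```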
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811139547.4A CN109326294B (en) | 2018-09-28 | 2018-09-28 | Text-related voiceprint key generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109326294A true CN109326294A (en) | 2019-02-12 |
CN109326294B CN109326294B (en) | 2022-09-20 |
Family
ID=65266096
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811139547.4A Active CN109326294B (en) | 2018-09-28 | 2018-09-28 | Text-related voiceprint key generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109326294B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110223699A (en) * | 2019-05-15 | 2019-09-10 | 桂林电子科技大学 | Speaker identity confirmation method, device and storage medium |
CN110322887A (en) * | 2019-04-28 | 2019-10-11 | 武汉大晟极科技有限公司 | Multi-type audio signal energy feature extraction method |
CN111161705A (en) * | 2019-12-19 | 2020-05-15 | 上海寒武纪信息科技有限公司 | Voice conversion method and device |
CN112908303A (en) * | 2021-01-28 | 2021-06-04 | 广东优碧胜科技有限公司 | Audio signal processing method and device and electronic equipment |
CN113129897A (en) * | 2021-04-08 | 2021-07-16 | 杭州电子科技大学 | Voiceprint recognition method based on attention mechanism recurrent neural network |
CN113179157A (en) * | 2021-03-31 | 2021-07-27 | 杭州电子科技大学 | Text-related voiceprint biological key generation method based on deep learning |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001092974A (en) * | 1999-08-06 | 2001-04-06 | Internatl Business Mach Corp <Ibm> | Speaker recognizing method, device for executing the same, method and device for confirming audio generation |
CN103873254A (en) * | 2014-03-03 | 2014-06-18 | 杭州电子科技大学 | Method for generating human vocal print biometric key |
CN103971690A (en) * | 2013-01-28 | 2014-08-06 | 腾讯科技(深圳)有限公司 | Voiceprint recognition method and device |
CN106128465A (en) * | 2016-06-23 | 2016-11-16 | 成都启英泰伦科技有限公司 | Voiceprint recognition system and method |
CN107274890A (en) * | 2017-07-04 | 2017-10-20 | 清华大学 | Voiceprint spectrum extraction method and device |
CN108198561A (en) * | 2017-12-13 | 2018-06-22 | 宁波大学 | Pirated-recording speech detection method based on convolutional neural networks |
CN112786059A (en) * | 2021-03-11 | 2021-05-11 | 合肥市清大创新研究院有限公司 | Voiceprint feature extraction method and device based on artificial intelligence |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001092974A (en) * | 1999-08-06 | 2001-04-06 | Internatl Business Mach Corp <Ibm> | Speaker recognizing method, device for executing the same, method and device for confirming audio generation |
CN103971690A (en) * | 2013-01-28 | 2014-08-06 | 腾讯科技(深圳)有限公司 | Voiceprint recognition method and device |
CN103873254A (en) * | 2014-03-03 | 2014-06-18 | 杭州电子科技大学 | Method for generating human vocal print biometric key |
CN106128465A (en) * | 2016-06-23 | 2016-11-16 | 成都启英泰伦科技有限公司 | Voiceprint recognition system and method |
CN107274890A (en) * | 2017-07-04 | 2017-10-20 | 清华大学 | Voiceprint spectrum extraction method and device |
CN108198561A (en) * | 2017-12-13 | 2018-06-22 | 宁波大学 | Pirated-recording speech detection method based on convolutional neural networks |
CN112786059A (en) * | 2021-03-11 | 2021-05-11 | 合肥市清大创新研究院有限公司 | Voiceprint feature extraction method and device based on artificial intelligence |
Non-Patent Citations (3)
Title |
---|
丁冬兵: "Research on a small-sample voiceprint recognition method under the TL-CNN-GAP model", Computer Knowledge and Technology (《电脑知识与技术》) * |
冯辉宗 et al.: "Identity authentication vector recognition method based on spectrogram features", Journal of Chongqing University (《重庆大学学报》) * |
马义德 et al.: "Application of PCNN-based spectrogram feature extraction in speaker recognition", Computer Engineering and Applications (《计算机工程与应用》) * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110322887A (en) * | 2019-04-28 | 2019-10-11 | 武汉大晟极科技有限公司 | Multi-type audio signal energy feature extraction method |
CN110223699A (en) * | 2019-05-15 | 2019-09-10 | 桂林电子科技大学 | Speaker identity confirmation method, device and storage medium |
CN110223699B (en) * | 2019-05-15 | 2021-04-13 | 桂林电子科技大学 | Speaker identity confirmation method, device and storage medium |
CN111161705A (en) * | 2019-12-19 | 2020-05-15 | 上海寒武纪信息科技有限公司 | Voice conversion method and device |
CN111161705B (en) * | 2019-12-19 | 2022-11-18 | 寒武纪(西安)集成电路有限公司 | Voice conversion method and device |
CN112908303A (en) * | 2021-01-28 | 2021-06-04 | 广东优碧胜科技有限公司 | Audio signal processing method and device and electronic equipment |
CN113179157A (en) * | 2021-03-31 | 2021-07-27 | 杭州电子科技大学 | Text-related voiceprint biological key generation method based on deep learning |
CN113179157B (en) * | 2021-03-31 | 2022-05-17 | 杭州电子科技大学 | Text-related voiceprint biological key generation method based on deep learning |
CN113129897A (en) * | 2021-04-08 | 2021-07-16 | 杭州电子科技大学 | Voiceprint recognition method based on attention mechanism recurrent neural network |
CN113129897B (en) * | 2021-04-08 | 2024-02-20 | 杭州电子科技大学 | Voiceprint recognition method based on attention mechanism cyclic neural network |
Also Published As
Publication number | Publication date |
---|---|
CN109326294B (en) | 2022-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109326294A (en) | Text-related voiceprint key generation method | |
Wu et al. | LVID: A multimodal biometrics authentication system on smartphones | |
Rui et al. | A survey on biometric authentication: Toward secure and privacy-preserving identification | |
Gomez-Barrero et al. | General framework to evaluate unlinkability in biometric template protection systems | |
Galbally et al. | Iris image reconstruction from binary templates: An efficient probabilistic approach based on genetic algorithms | |
Tolosana et al. | BioTouchPass2: Touchscreen password biometrics using time-aligned recurrent neural networks | |
US8862888B2 (en) | Systems and methods for three-factor authentication | |
US9430628B2 (en) | Access authorization based on synthetic biometric data and non-biometric data | |
Galbally et al. | Image quality assessment for fake biometric detection: Application to iris, fingerprint, and face recognition | |
CN110677260B (en) | Authentication method, device, electronic equipment and storage medium | |
CN105512535A (en) | User authentication method and user authentication device | |
CN106302330A (en) | Auth method, device and system | |
CN106503655A (en) | A kind of electric endorsement method and sign test method based on face recognition technology | |
CN103873253B (en) | Method for generating human fingerprint biometric key | |
JP7412496B2 (en) | Living body (liveness) detection verification method, living body detection verification system, recording medium, and training method for living body detection verification system | |
CN113505652A (en) | Living body detection method, living body detection device, electronic apparatus, and storage medium | |
CN112132996A (en) | Door lock control method, mobile terminal, door control terminal and storage medium | |
KR20220123118A (en) | Systems and methods for distinguishing user, action and device-specific characteristics recorded in motion sensor data | |
Zhang et al. | Volere: Leakage resilient user authentication based on personal voice challenges | |
Akasaka et al. | Model-free template reconstruction attack with feature converter | |
Liu et al. | Biohashing for human acoustic signature based on random projection | |
CN220983921U (en) | Recognition device based on face and voiceprint | |
Uzun | Security and Privacy in Biometrics-Based Systems. | |
Mtibaa | Towards robust and privacy-preserving speaker verification systems | |
Korshunov et al. | Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |