CN108986789A - Audio recognition method, device, storage medium and electronic equipment - Google Patents
- Publication number
- CN108986789A (application CN201811060707.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Abstract
The present invention provides a speech recognition method, device, storage medium and electronic equipment. The speech recognition method includes the following steps: obtaining multiple pieces of sample voice data; performing speech feature extraction on each piece of sample voice data using Mel-frequency cepstral coefficients (MFCC) to obtain a feature matrix for each piece of sample voice data; constructing the size of the feature matrix of each piece of sample voice data according to a preset value to obtain a set of normalized feature matrices; establishing a classification model with a support vector machine algorithm based on the set of normalized feature matrices; and identifying target voice data with the classification model. The invention can accurately distinguish target voice data in multiple languages, in particular voice data from failed outbound calls that contain a color ring back tone (CRBT) or a ringing tone.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a speech recognition method, device, storage medium and electronic equipment.
Background art
In general, a call center handles a large volume of voice call data every day, much of which is voice data from failed outbound calls. The failure signaling currently provided by operators is rather generic: cases such as phone powered off, call rejected, vacant number, service suspended, ring unanswered and line busy all return the same signaling, so the true cause cannot be distinguished. As a result, invalid numbers are easily dialed again and again, which hurts efficiency. A strategy is therefore needed to determine the failure cause behind these failed-call recordings.
The existing approach uses ASR (Automatic Speech Recognition), which is semantics-based. For outbound calls that fail because the phone is powered off, the line is busy, the number is vacant or the service is suspended, the announcement played is identical every time, so ASR identifies these failure causes well. However, ASR has two fatal flaws: its support for multiple languages is limited and very costly, and it cannot recognize speech that contains a CRBT or a ringing tone. As the business expands, voice data in many foreign languages and voice data containing a CRBT or ringing tone are encountered frequently, and the prior art can no longer meet the speech recognition demand in these situations.
Summary of the invention
In view of the problems in the prior art, the purpose of the present invention is to provide a speech recognition method, device, electronic equipment and storage medium that accurately distinguish target voice data in multiple languages, in particular voice data from failed outbound calls that contain a CRBT or a ringing tone.
According to one aspect of the present invention, a speech recognition method is provided, which includes the following steps: obtaining multiple pieces of sample voice data; performing speech feature extraction on each piece of sample voice data using Mel-frequency cepstral coefficients (MFCC) to obtain a feature matrix for each piece of sample voice data; constructing the size of the feature matrix of each piece of sample voice data according to a preset value to obtain a set of normalized feature matrices; establishing a classification model with a support vector machine algorithm based on the set of normalized feature matrices; and identifying target voice data with the classification model.
In one embodiment of the present invention, the sample voice data are divided into first voice data and second voice data, and the type of the sample voice data serves as the classification output of the classification model.
In one embodiment of the present invention, whichever of the first voice data and the second voice data is more numerous is sampled, so that the first voice data and the second voice data are equal in quantity.
In one embodiment of the present invention, the first voice data and the second voice data are labeled as rejection voice data and ring-unanswered voice data respectively.
In one embodiment of the present invention, the first voice data and the second voice data contain a CRBT or a ringing tone.
In one embodiment of the present invention, each feature matrix in the set of feature matrices is constructed to represent the last n seconds of voice data of each piece of sample voice data, where n is an integer greater than or equal to 5 and less than or equal to 15.
In one embodiment of the present invention, the preset value is set to [1, M] based on the value of n, where M is an integer greater than or equal to 1. The step of constructing the size of the feature matrix of each piece of sample voice data according to a preset value includes:
constructing the size of the feature matrix of each piece of sample voice data as [1, M], where M is the number of columns of the feature matrix.
In one embodiment of the present invention, the step of constructing the size of the feature matrix of each piece of sample voice data as [1, M] includes:
If the size of the feature matrix of the sample voice data exceeds [1, M], the last M columns of the feature matrix are retained so that its size is [1, M];
If the size of the feature matrix of the sample voice data is smaller than [1, M], the front of the feature matrix is padded with zeros so that its size is [1, M].
In one embodiment of the present invention, n is 10 seconds and M is 17381.
In one embodiment of the present invention, the step of establishing a classification model with a support vector machine algorithm based on the set of normalized feature matrices includes:
dividing the set of normalized feature matrices into a training data set, a validation data set and a test data set according to a preset ratio;
establishing the classification model with the support vector machine algorithm based on the training data set, the validation data set and the test data set.
The training data set is used to train the model or determine the model parameters, the validation data set is used for model selection, and the test data set is used to test the discriminative ability of the trained model.
In one embodiment of the present invention, the preset ratio of the training data set, validation data set and test data set is 6:2:2.
According to another aspect of the present invention, a speech recognition device is provided, which includes an acquisition module, a feature extraction module, a feature construction module, a model construction module and an identification module. The acquisition module is used to obtain multiple pieces of sample voice data. The feature extraction module is used to perform speech feature extraction on each piece of sample voice data using Mel-frequency cepstral coefficients (MFCC) to obtain a feature matrix for each piece of sample voice data. The feature construction module is used to construct the size of the feature matrix of each piece of sample voice data according to a preset value to obtain a set of normalized feature matrices. The model construction module is used to establish a classification model with a support vector machine algorithm based on the set of normalized feature matrices. The identification module is used to identify target voice data with the classification model.
According to another aspect of the present invention, a storage medium is provided, on which a computer program is stored; the computer program executes the steps described above when run by a processor.
According to another aspect of the present invention, an electronic device is provided, which includes a processor and a storage medium. A computer program is stored on the storage medium, and the computer program executes the steps described above when run by the processor.
The speech recognition method proposed by the present invention first performs speech feature extraction on each piece of sample voice data using Mel-frequency cepstral coefficients (MFCC) to obtain a feature matrix for each piece of sample voice data; it then constructs the size of each feature matrix according to a preset value to obtain a set of normalized feature matrices; and it establishes a classification model with a support vector machine algorithm based on that set. With this classification model, target voice data in multiple languages can be accurately distinguished, in particular voice data from failed outbound calls that contain a CRBT or a ringing tone.
In addition, the speech recognition method proposed by the present invention offers good extensibility, higher accuracy and lower cost.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings.
Fig. 1 is a flow chart of the speech recognition method in one embodiment of the present invention.
Fig. 2 is a structural schematic diagram of the speech recognition device in one embodiment of the present invention.
Fig. 3 is a structural schematic diagram of a computer-readable storage medium in one embodiment of the present invention.
Fig. 4 is a structural schematic diagram of an electronic device in one embodiment of the present invention.
Fig. 5 is a flow chart of establishing a classification model with a support vector machine algorithm based on the set of normalized feature matrices in one embodiment of the present invention.
Fig. 6 is a structural schematic diagram of the speech recognition device in another embodiment of the present invention.
Detailed description of the embodiments
Example embodiments will now be described more fully with reference to the drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth here; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
In addition, the drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. Identical reference numerals in the figures denote identical or similar parts, so repeated descriptions of them are omitted. Some of the block diagrams shown in the drawings are functional entities that do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
To overcome the deficiencies of the prior art and accurately distinguish target voice data in multiple languages, in particular voice data from failed outbound calls that contain a CRBT or a ringing tone, the present invention provides a speech recognition method, device, electronic equipment and storage medium.
Fig. 1 is a flow chart of the speech recognition method in one embodiment of the present invention. The speech recognition method includes the following steps:
S110: obtain multiple pieces of sample voice data.
Those skilled in the art will understand that the sample voice data obtained here may already have been preprocessed and be directly usable. To ensure the validity of the subsequent classification model, the number of sample voice data items should be sufficiently large.
In a specific embodiment of the present invention, the sample voice data can be divided into first voice data and second voice data. Specifically, the sample voice data can be labeled in step S110 to distinguish the first voice data from the second voice data.
Furthermore, the quantities of first voice data and second voice data in the sample voice data may differ considerably. In that case, whichever of the two is more numerous can be sampled so that the first voice data and the second voice data are equal in quantity. That is, only a part of the more numerous class is extracted, so that the two classes are level in quantity. This resolves the imbalance among the types of voice data in the sample voice data, which improves the reliability of the subsequent classification model construction and in turn the validity of the speech recognition method.
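The balancing step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text only says the larger class is "sampled", so uniform random sampling without replacement, the function name `balance_by_downsampling` and the file names are all assumptions.

```python
import random

def balance_by_downsampling(first, second, seed=0):
    """Randomly downsample the larger class so both classes have equal size."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    target = min(len(first), len(second))
    # rng.sample draws without replacement, so no recording is duplicated.
    first_balanced = first if len(first) == target else rng.sample(first, target)
    second_balanced = second if len(second) == target else rng.sample(second, target)
    return list(first_balanced), list(second_balanced)

# Hypothetical recordings: 1000 rejection samples vs. 250 ring-unanswered samples.
rejected = [f"rejected_{i}.wav" for i in range(1000)]
unanswered = [f"unanswered_{i}.wav" for i in range(250)]
rejected_bal, unanswered_bal = balance_by_downsampling(rejected, unanswered)
```

After the call, both lists hold as many items as the smaller class, and the smaller class is returned unchanged.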
S120: perform speech feature extraction on each piece of sample voice data using Mel-frequency cepstral coefficients (MFCC) to obtain a feature matrix for each piece of sample voice data.
There are many common speech feature extraction methods. The present invention uses Mel-frequency cepstral coefficients (MFCC) to extract features from the sample voice data, which effectively reduces the complexity of the subsequent pattern recognition system, supports efficient modeling of speech acoustics, and is extremely widely applicable. Furthermore, the step of performing speech feature extraction on each piece of sample voice data using MFCC can be implemented with the open-source tool python_speech_features.
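A possible shape for this step is sketched below. Only the use of python_speech_features comes from the text; reading the file with `scipy.io.wavfile`, assuming a mono recording, keeping the library's default framing parameters, and flattening the frames-by-coefficients array into one row (to match the 1-row layout described later) are assumptions of this sketch, as are the function names.

```python
def flatten_rows(rows):
    """Concatenate per-frame coefficient rows into one flat row vector."""
    return [value for row in rows for value in row]

def extract_feature_row(wav_path):
    """Return the MFCC features of one recording as a single flat row.

    Third-party imports are kept inside the function so that the helper
    above can be used without them installed.
    """
    from scipy.io import wavfile              # returns (sample_rate, samples)
    from python_speech_features import mfcc   # returns a frames x coefficients array

    sample_rate, signal = wavfile.read(wav_path)  # assumes a mono WAV file
    features = mfcc(signal, samplerate=sample_rate)  # other parameters left at defaults
    return flatten_rows(features.tolist())
```

The row length depends on the recording duration and on the framing parameters, which is why the later construction step is needed to bring every row to the same size.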
S130: construct the size of the feature matrix of each piece of sample voice data according to a preset value to obtain a set of normalized feature matrices.
Continuing with the example in which the sample voice data comprise two types, ring unanswered (e.g. the first voice data) and rejection (e.g. the second voice data): the sample voice data here contain a CRBT or a ringing tone, which existing speech recognition methods cannot effectively distinguish. Because the CRBT or ringing tone generally occurs at the beginning of the sample voice data, the feature matrices of the sample voice data can be constructed so as to eliminate the uninformative portion they contain, yielding the set of normalized feature matrices. That is, the CRBT or ringing portion at the beginning of the first voice data and second voice data represented by the feature matrix is cut off, and the later, distinguishable portion of the first voice data and second voice data is retained for training the model.
Furthermore, each feature matrix in the set of feature matrices is constructed to represent the last n seconds of voice data of each piece of sample voice data, where n is an integer greater than or equal to 5 and less than or equal to 15. The preset value is set to [1, M] based on the value of n, where M is an integer greater than or equal to 1. The step of constructing the size of the feature matrix of each piece of sample voice data according to a preset value includes: constructing the size of the feature matrix as [1, M], where M is the number of columns of the feature matrix (in other words, the feature matrix of each piece of sample voice data has 1 row and M columns). The step of constructing the size of the feature matrix as [1, M] further includes: if the size of the feature matrix of the sample voice data exceeds [1, M], the last M columns of the feature matrix are retained so that its size is [1, M]; if the size of the feature matrix of the sample voice data is smaller than [1, M], the front of the feature matrix is padded with zeros so that its size is [1, M].
Inspection of the two kinds of voice data, ring unanswered and rejection, shows that their lengths vary widely: the longest can reach 70 seconds, while short ones last only 3 or 4 seconds, and all of them begin with a ring-back tone or CRBT. The two classes differ only in the last n seconds: in rejection-class voice data, the last n seconds contain the announcement "The number you have dialed is busy now, please redial later", whereas in ring-unanswered voice data the last n seconds are still ring-back tone or CRBT. According to parameter-tuning tests, the invention distinguishes the two types best when n = 10. Once the value of n is fixed, the number of columns of the feature matrix is fixed as well; here M = 17381, which means that the feature matrix of each piece of sample voice data must be constructed to size [1, 17381]. If the size of the feature matrix of the sample voice data exceeds [1, 17381], the last 17381 columns of the feature matrix are retained so that its size is [1, 17381]. If the size of the feature matrix of the sample voice data is smaller than [1, 17381], the front of the feature matrix is padded with zeros so that its size is [1, 17381].
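The truncate-or-pad rule described above can be sketched in a few lines. The function name `fit_to_length` and the use of plain Python lists for a 1-row feature matrix are assumptions made for illustration; M = 17381 is the value given in the text.

```python
def fit_to_length(row, m):
    """Return `row` resized to exactly `m` values.

    Longer rows keep only their last `m` values; shorter rows are padded
    with zeros at the front, so the end of the recording stays aligned.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if len(row) >= m:
        return list(row[-m:])
    return [0.0] * (m - len(row)) + list(row)

M = 17381  # number of columns given in the text for n = 10 seconds
long_row = [float(i) for i in range(M + 5)]   # 5 values too long
short_row = [1.0, 2.0, 3.0]                   # far too short
normalized_long = fit_to_length(long_row, M)
normalized_short = fit_to_length(short_row, M)
```

Padding at the front rather than the back keeps the final seconds of every recording in the same columns, which is where the two classes differ.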
S140: establish a classification model with a support vector machine algorithm based on the set of normalized feature matrices.
The type with which the sample voice data are labeled serves as the classification output of the classification model. That is, the classification model here is dedicated to identifying voice data of the two categories, first voice data and second voice data. Of course, the present invention is not limited to this; the types of target voice data that can be identified are the same as the types of the sample voice data.
S150: identify target voice data with the classification model.
In one embodiment of the present invention, the speech recognition method is especially suitable when the voice data to be identified are of the two types rejection and ring unanswered. Specifically, a sufficient number of sample voice data are first obtained, divided into the two types rejection (e.g. the first voice data) and ring unanswered (e.g. the second voice data). Speech feature extraction is performed on the sample voice data with MFCC (Mel-frequency cepstral coefficients) to obtain the corresponding feature matrices, the extracted feature matrices are normalized to form a set of feature matrices of the preset size, and a classification model is built from that set with the SVM (support vector machine) algorithm. The classification model can then effectively distinguish rejection voice data from ring-unanswered voice data. The support vector machine algorithm is built on the VC-dimension theory of statistical learning theory and on the principle of structural risk minimization. Given limited sample information, it seeks the best compromise between model complexity (i.e. the learning accuracy on the specific training samples) and learning ability (i.e. the ability to identify arbitrary samples without error) in order to obtain the best generalization ability. The SVM algorithm therefore shows many distinctive advantages in small-sample, nonlinear and high-dimensional pattern recognition, and can be extended to other machine learning problems such as function fitting. Because the classification model is built by machine learning, target voice data can be effectively distinguished as long as they belong to a type present in the sample voice data, regardless of the language of the target voice data. Likewise, the present invention can accurately distinguish target voice data in multiple languages with the classification model.
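The compromise between model complexity and training accuracy mentioned above is usually written as the soft-margin SVM optimization problem. The text does not state which formulation or kernel is used, so the linear form below is given only as the standard textbook version. The symbols are assumptions of this sketch: $x_i \in \mathbb{R}^{M}$ is the normalized feature row of sample $i$, $y_i \in \{-1, +1\}$ encodes its class (for instance rejection versus ring unanswered), $\xi_i$ are slack variables and $C > 0$ is the penalty parameter.

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\qquad \text{subject to} \qquad
y_i\,\bigl(w^{\top} x_i + b\bigr) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N

\hat{y}(x) = \operatorname{sign}\bigl(w^{\top} x + b\bigr)
```

The first term keeps the model simple by favoring a wide margin, the second term penalizes training errors, and $C$ sets the balance between the two. A new recording $x$ is assigned to a class by the sign of the decision function.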
In addition, the speech recognition method proposed by the present invention offers good extensibility, higher accuracy and lower cost.
Fig. 5 is a flow chart of establishing a classification model with a support vector machine algorithm based on the set of normalized feature matrices in one embodiment of the present invention. As shown in Fig. 5, this step includes: S1410, dividing the set of normalized feature matrices into a training data set, a validation data set and a test data set according to a preset ratio; and S1420, establishing the classification model with the support vector machine algorithm based on the training data set, the validation data set and the test data set.
Specifically, the training data set, validation data set and test data set each contain multiple groups of data, and each group includes the feature matrix of a sample voice data item and its classification label. The training data set is used to train the model or determine the model parameters, taking the feature matrices of its sample voice data as the input of the classification model and the corresponding classification labels as the output. The test data set is used as follows: the feature matrices of its sample voice data are fed to the trained classification model, and the model's outputs are compared with the classification labels of the test data set to determine the discriminative ability (for example, the accuracy) of the trained classification model. Typically, the classification model recognizes target voice data well when the preset ratio of the training data set, validation data set and test data set is 6:2:2.
Here, the training data set is used to train the model or determine the model parameters, that is, to build the model. The validation data set is used for the final optimization and selection of the model; it assists model construction and can be reused. The test data set is used only when the model is evaluated, to assess its accuracy, and must not be used during model construction, since that would lead to overfitting.
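The 6:2:2 partition and the accuracy check on the test set can be sketched as follows. The function names, the shuffling step, the fixed seed and the placeholder data are assumptions for illustration; the text specifies only the ratio and the roles of the three sets.

```python
import random

def split_dataset(samples, ratios=(6, 2, 2), seed=0):
    """Shuffle `samples` and split them into train/validation/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # any remainder goes to the test set
    return train, val, test

def accuracy(predicted, expected):
    """Fraction of predictions that match the reference labels."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("label lists must be non-empty and equally long")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)

# Each item pairs a placeholder feature row with its class label.
dataset = [([float(i)], "rejection" if i % 2 else "ring_unanswered")
           for i in range(100)]
train_set, val_set, test_set = split_dataset(dataset)
```

With 100 labeled items this yields 60 training, 20 validation and 20 test items. The `accuracy` helper would be applied once, to the trained model's predictions on `test_set`, after all tuning on `val_set` is finished.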
Fig. 2 is a structural schematic diagram of the speech recognition device in one embodiment of the present invention. As shown in Fig. 2, the speech recognition device includes an acquisition module 201, a feature extraction module 202, a feature construction module 203, a model construction module 204 and an identification module 205. The acquisition module 201 is used to obtain multiple pieces of sample voice data. The feature extraction module 202 is used to perform speech feature extraction on each piece of sample voice data using Mel-frequency cepstral coefficients (MFCC) to obtain a feature matrix for each piece of sample voice data. The feature construction module 203 is used to construct the size of the feature matrix of each piece of sample voice data according to a preset value to obtain a set of normalized feature matrices. The model construction module 204 is used to establish a classification model with a support vector machine algorithm based on the set of normalized feature matrices. The identification module 205 is used to identify target voice data with the classification model. The execution steps and principles of each module have been described in the above embodiments and are not repeated here. Provided the concept of the invention is not departed from, splitting and merging of the modules all fall within the protection scope of the present invention. The speech recognition device proposed by the present invention offers good extensibility, higher accuracy and lower cost, and can accurately distinguish target voice data in multiple languages, in particular voice data from failed outbound calls that contain a CRBT or a ringing tone.
Fig. 6 is the structural schematic diagram of speech recognition equipment in another embodiment of the present invention.As shown in fig. 6, the speech recognition
In addition to the acquisition module 201, the feature extraction module 202, the feature construction module 203, the model construction module 204 and the recognition module 205, the device further includes a sampling module 206. The feature construction module 203 further comprises a first feature construction module 2031 and a second feature construction module 2032, and the model construction module 204 further comprises a data set construction module 2041. The acquisition module 201 obtains multiple sample voice data. The feature extraction module 202 performs speech feature extraction on each sample voice data using Mel-frequency cepstral coefficients (MFCC) to obtain the feature matrix of each sample voice data. The feature construction module 203 builds the size of the feature matrix of each sample voice data according to a preset value to obtain a set of normalized feature matrices. The model construction module 204 establishes a classification model with a support vector machine algorithm based on the set of normalized feature matrices. The recognition module 205 identifies target voice data by means of the classification model. The sampling module 206 samples the more numerous type of voice data in the sample voice data, so that the quantities of the various types of voice data in the sample voice data are the same or similar. The first feature construction module 2031 builds feature matrices whose size exceeds the preset value. The second feature construction module 2032 builds feature matrices whose size is smaller than the preset value. The data set construction module 2041 divides the set of normalized feature matrices into a training data set, a validation data set and a test data set according to a preset ratio. The steps executed by each module and their principles have been described in the above embodiments and are not repeated here. Splitting and merging of the modules falls within the protection scope of the present invention, provided it does not depart from the inventive concept.
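As a rough illustration of what the sampling module 206 and the data set construction module 2041 do, the following Python sketch balances the classes by random undersampling and then splits the result 6:2:2. The patent only states that the more numerous type is sampled; random undersampling, the fixed seed, the function names and the label strings are assumptions made for this example.

```python
import random
from collections import defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop items from larger classes so that every class
    ends up with as many items as the smallest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    target = min(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


def split_dataset(items, ratios=(6, 2, 2), seed=0):
    """Shuffle items and split them into training, validation and
    test sets according to an integer ratio (6:2:2 by default)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical data: 20 samples of one type, 10 of the other.
samples = list(range(30))
labels = ["rejected"] * 20 + ["no_answer"] * 10
balanced = undersample(samples, labels)
train, val, test = split_dataset(balanced)
```

With the hypothetical data above, each class is reduced to 10 items, and the 20 balanced items are divided into 12 training, 4 validation and 4 test items.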
The speech recognition device proposed by the present invention has the advantages of good scalability, high accuracy and low cost, and can accurately distinguish target voice data in multiple languages, in particular voice data of failed outbound calls that carries a color ring-back tone (CRBT) or a ringtone.
In an exemplary embodiment of the present invention, a computer-readable storage medium is also provided, on which a computer program is stored. When executed by, for example, a processor, the program can implement the steps of the speech recognition method described in any one of the above embodiments. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product comprising program code; when the program product runs on a terminal device, the program code causes the terminal device to execute the steps according to the various exemplary embodiments of the present invention described in the speech recognition method section of this specification. The present invention allows repair-report events submitted by users to be processed automatically, which simplifies the speech recognition process, relieves the workload of the back-office service team, and can also improve the user experience.
Fig. 3 is a schematic structural diagram of a computer-readable storage medium in an embodiment of the present invention. As shown in Fig. 3, a program product 300 for implementing the above method according to an embodiment of the present invention may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus or device.
The program product 300 may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In an exemplary embodiment of the present invention, an electronic device is also provided. The electronic device may include a processor and a memory for storing instructions executable by the processor. The processor is configured to execute the steps of the speech recognition method described in any one of the above embodiments by executing the executable instructions.
Those skilled in the art will understand that aspects of the present invention may be implemented as a system, a method or a program product. Accordingly, aspects of the present invention may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit", "module" or "system".
An electronic device 400 according to this embodiment of the present invention is described below with reference to Fig. 4. The electronic device 400 shown in Fig. 4 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 4, the electronic device 400 takes the form of a general-purpose computing device. Its components may include, but are not limited to: at least one processing unit 410, at least one storage unit 420, a bus 430 connecting different system components (including the storage unit 420 and the processing unit 410), a display unit 440, and so on.
The storage unit stores program code that can be executed by the processing unit 410, so that the processing unit 410 performs the steps according to the various exemplary embodiments of the present invention described in the speech recognition method section of this specification. For example, the processing unit 410 may perform the steps shown in Fig. 1.
The storage unit 420 may include readable media in the form of volatile storage units, such as a random access memory (RAM) unit 4201 and/or a cache memory unit 4202, and may further include a read-only memory (ROM) unit 4203.
The storage unit 420 may also include a program/utility 4204 having a set of (at least one) program modules 4205. Such program modules 4205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 430 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 400 may also communicate with one or more external devices 500 (such as a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 400, and/or with any device (such as a router, a modem, etc.) that enables the electronic device 400 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 450. In addition, the electronic device 400 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 460. The network adapter 460 may communicate with the other modules of the electronic device 400 through the bus 430. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.
From the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, a server or a network device) to execute the speech recognition method according to the embodiments of the present invention.
The speech recognition method proposed by the present invention first performs speech feature extraction on each sample voice data using Mel-frequency cepstral coefficients (MFCC) to obtain the feature matrix of each sample voice data; it then builds the size of the feature matrix of each sample voice data according to a preset value to obtain a set of normalized feature matrices; finally, it establishes a classification model with a support vector machine algorithm based on that set. The classification model can accurately distinguish target voice data in multiple languages, in particular voice data of failed outbound calls that carries a color ring-back tone (CRBT) or a ringtone.
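The size-building step can be sketched as follows, using the rule stated in claim 8: a feature row longer than M keeps its last M columns, and a shorter row is padded with zeros at the front. The function name `fit_feature_size` and the use of a plain Python list for the [1, M] row are assumptions made for illustration; MFCC extraction and support vector machine training are not shown.

```python
def fit_feature_size(features, m):
    """Return a copy of a flattened feature row with exactly m columns.

    Rows longer than m keep only their last m columns (the tail of the
    recording); rows shorter than m are padded with zeros at the front.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if len(features) >= m:
        return list(features[-m:])
    return [0.0] * (m - len(features)) + list(features)


# Hypothetical feature rows of different lengths, normalized to M = 4.
rows = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [7.0, 8.0]]
normalized = [fit_feature_size(row, 4) for row in rows]
```

Keeping the tail rather than the head matches claim 6, under which each feature matrix represents the last n seconds of a recording.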
In addition, the speech recognition method proposed by the present invention has the advantages of good scalability, high accuracy and low cost.
The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the present invention should not be regarded as limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, a number of simple deductions or substitutions may be made without departing from the inventive concept, and all of these shall be regarded as falling within the protection scope of the present invention.
Claims (14)
1. A speech recognition method, characterized by comprising the following steps:
obtaining multiple sample voice data;
performing speech feature extraction on each sample voice data using Mel-frequency cepstral coefficients (MFCC) to obtain a feature matrix of each sample voice data;
building the size of the feature matrix of each sample voice data according to a preset value to obtain a set of normalized feature matrices;
establishing a classification model with a support vector machine algorithm based on the set of normalized feature matrices;
identifying target voice data by means of the classification model.
2. The speech recognition method according to claim 1, characterized in that the sample voice data are divided into first voice data and second voice data, and the type of the sample voice data serves as the classification output of the classification model.
3. The speech recognition method according to claim 2, characterized in that whichever of the first voice data and the second voice data is more numerous is sampled, so that the quantities of the first voice data and the second voice data are the same.
4. The speech recognition method according to claim 2, characterized in that the first voice data and the second voice data are labeled as call-rejected voice data and ring-no-answer voice data, respectively.
5. The speech recognition method according to claim 2, characterized in that the first voice data and the second voice data include a color ring-back tone (CRBT) or a ringtone.
6. The speech recognition method according to claim 1, characterized in that each feature matrix in the set of feature matrices is built to represent the last n seconds of voice data of each sample voice data, where n is an integer greater than or equal to 5 and less than or equal to 15.
7. The speech recognition method according to claim 6, characterized in that the preset value is set to [1, M] based on the value of n, where M is an integer greater than or equal to 1, and the step of building the size of the feature matrix of each sample voice data according to a preset value comprises:
building the size of the feature matrix of each sample voice data to [1, M], where M is the number of columns of the feature matrix.
8. The speech recognition method according to claim 7, characterized in that the step of building the size of the feature matrix of each sample voice data to [1, M] comprises:
if the size of the feature matrix of the sample voice data exceeds [1, M], keeping the last M columns of the feature matrix so that its size is [1, M];
if the size of the feature matrix of the sample voice data is less than [1, M], padding the front of the feature matrix with 0 so that its size is [1, M].
9. The speech recognition method according to claim 7 or 8, characterized in that n is 10 seconds and M is 17381.
10. The speech recognition method according to claim 1, characterized in that the step of establishing a classification model with a support vector machine algorithm based on the set of normalized feature matrices comprises:
dividing the set of normalized feature matrices into a training data set, a validation data set and a test data set according to a preset ratio;
establishing the classification model with the support vector machine algorithm based on the training data set, the validation data set and the test data set;
wherein the training data set is used to train the model or determine model parameters, the validation data set is used for model selection, and the test data set is used to test the discriminative ability of the trained model.
11. The speech recognition method according to claim 10, characterized in that the preset ratio of the training data set, the validation data set and the test data set is 6:2:2.
12. A speech recognition device, characterized by comprising:
an acquisition module for obtaining multiple sample voice data;
a feature extraction module for performing speech feature extraction on each sample voice data using Mel-frequency cepstral coefficients (MFCC) to obtain a feature matrix of each sample voice data;
a feature construction module for building the size of the feature matrix of each sample voice data according to a preset value to obtain a set of normalized feature matrices;
a model construction module for establishing a classification model with a support vector machine algorithm based on the set of normalized feature matrices;
a recognition module for identifying target voice data by means of the classification model.
13. A storage medium, characterized in that a computer program is stored on the storage medium, and the computer program, when run by a processor, executes the steps according to any one of claims 1 to 11.
14. An electronic device, characterized in that the electronic device comprises:
a processor;
a storage medium on which a computer program is stored, the computer program, when run by the processor, executing the steps according to any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811060707.6A CN108986789A (en) | 2018-09-12 | 2018-09-12 | Audio recognition method, device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108986789A true CN108986789A (en) | 2018-12-11 |
Family
ID=64545109
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811060707.6A Pending CN108986789A (en) | 2018-09-12 | 2018-09-12 | Audio recognition method, device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108986789A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9818399B1 (en) * | 2000-11-30 | 2017-11-14 | Google Inc. | Performing speech recognition over a network and using speech recognition results based on determining that a network connection exists |
CN102404462A (en) * | 2010-09-08 | 2012-04-04 | 北京商路通信息技术有限公司 | Call progress analyzing method for phone dialing system and device |
CN105323744A (en) * | 2014-06-23 | 2016-02-10 | 中兴通讯股份有限公司 | Method and apparatus for call state feedback, and terminal |
CN106971714A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of speech de-noising recognition methods and device applied to robot |
CN108369813A (en) * | 2017-07-31 | 2018-08-03 | 深圳和而泰智能家居科技有限公司 | Specific sound recognition methods, equipment and storage medium |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110290280A (en) * | 2019-05-28 | 2019-09-27 | 同盾控股有限公司 | A kind of recognition methods of the SOT state of termination, device and storage medium |
CN110503980A (en) * | 2019-08-23 | 2019-11-26 | 百可录(北京)科技有限公司 | A method of classified based on machine learning for ring |
CN110933236A (en) * | 2019-10-25 | 2020-03-27 | 杭州哲信信息技术有限公司 | Machine learning-based null number identification method |
CN110995938A (en) * | 2019-12-13 | 2020-04-10 | 上海优扬新媒信息技术有限公司 | Data processing method and device |
CN111508527A (en) * | 2020-04-17 | 2020-08-07 | 北京帝派智能科技有限公司 | Telephone answering state detection method, device and server |
CN112002306A (en) * | 2020-08-26 | 2020-11-27 | 阳光保险集团股份有限公司 | Voice category identification method and device, electronic equipment and readable storage medium |
CN112002306B (en) * | 2020-08-26 | 2024-04-05 | 阳光保险集团股份有限公司 | Speech class recognition method and device, electronic equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181211 |