CN104462537A - Method and device for classifying voice data

Info

Publication number: CN104462537A
Application number: CN201410817745.7A
Authority: CN
Inventors: 杨晓昊
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2014-12-24
Filing date: 2014-12-24
Publication date: 2015-03-25

Abstract

An embodiment of the invention discloses a method and device for classifying audio data. The method comprises: obtaining first audio data whose category is to be identified; windowing the first audio data along its time axis according to a preset windowing algorithm; extracting one MFCC feature vector from the audio data in each window; vector-quantizing each MFCC feature vector into a one-dimensional first feature value; computing a first histogram of the first audio data from all the first feature values according to a preset histogram generation algorithm; calculating the similarity between the first histogram and the preset histogram feature template of each audio category, and obtaining the first histogram feature template that is most similar to the first histogram; and identifying the audio category of that template as the audio category of the first audio data. Compared with the prior art, the technical solution improves the accuracy and speed of audio data classification.

Description

An audio data classification method and device

Technical field

The present invention relates to the field of multimedia data processing, and in particular to an audio data classification method and device.

Background art

With the rapid development of multimedia and network technology, the volume of audio data is growing exponentially. A correspondingly large amount of audio data is now available on the Internet, where it is widely used in fields such as education, entertainment, news, and advertising, and has become an important part of daily life. How to classify this audio data is therefore a problem in urgent need of a solution.

In the prior art, feature vectors are first extracted from the audio data, and the audio data is then classified using a GMM (Gaussian mixture model). Because the extracted feature vectors generally have 39 or more dimensions, training a GMM on them requires a large amount of labelled data. Labelling data consumes considerable manpower, so the amount of labelled data actually available is small. This causes a data sparsity problem, and the accuracy of audio data classification is low. In addition, because the feature vectors are high-dimensional, the training process is computationally expensive and therefore slow, and the speed of audio data classification is low.

Summary of the invention

The purpose of the embodiments of the present invention is to provide an audio data classification method and device that improve the accuracy and speed of audio data classification. The specific technical solution is as follows:

An audio data classification method, applied to an electronic device, comprises:

obtaining first audio data whose category is to be identified;

windowing the first audio data along its audio timeline according to a preset windowing algorithm;

extracting one MFCC feature vector from the audio data in each window;

quantizing each extracted MFCC feature vector into a one-dimensional first feature value according to a preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one first feature value;

computing a first histogram of the first audio data from all the first feature values according to a preset histogram generation algorithm;

calculating the similarity between the first histogram and the preset histogram feature template corresponding to the audio data of each audio category, and obtaining the first histogram feature template that has the greatest similarity to the first histogram; and

identifying the audio category corresponding to the first histogram feature template as the audio category of the first audio data.

In one embodiment of the present invention, the step of windowing the first audio data along its audio timeline according to the preset windowing algorithm comprises:

dividing the first audio data into audio frames of a preset time period; and

windowing the audio frames.

In one embodiment of the present invention, the window is a rectangular window or a Hamming window.

In one embodiment of the present invention, the step of computing the first histogram of the first audio data from all the first feature values according to the preset histogram generation algorithm comprises:

mapping each first feature value onto a value in a preset first numerical interval; and

computing the first histogram of the first audio data with the preset first numerical interval as the horizontal axis and, as the vertical axis, the percentage of the total number of first feature values accounted for by the first feature values that map to each value of the first numerical interval.

In one embodiment of the present invention, the preset histogram feature template corresponding to the audio data of each audio category is obtained in advance by audio data training, and an audio data training method comprises:

obtaining category-labelled audio data samples;

windowing the category-labelled audio data samples along their audio timelines according to the preset windowing algorithm;

extracting one MFCC feature vector from the audio data in each window;

quantizing each extracted MFCC feature vector into a one-dimensional second feature value according to the preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one second feature value;

computing the histogram of each category-labelled audio data sample from all its second feature values according to the preset histogram generation algorithm; and

averaging, according to the category labels of the audio data samples, the histograms of the audio data samples that have the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.

In one embodiment of the present invention, the step of computing the histogram of each category-labelled audio data sample from all its second feature values according to the preset histogram generation algorithm comprises:

mapping each second feature value onto a value in a preset second numerical interval; and

computing the histogram of the category-labelled audio data sample with the preset second numerical interval as the horizontal axis and, as the vertical axis, the percentage of the total number of second feature values accounted for by the second feature values that map to each value of the second numerical interval.

In one embodiment of the present invention, the step of averaging, according to the category labels of the audio data samples, the histograms of the audio data samples that have the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category, comprises:

obtaining all the computed histograms of the category-labelled audio data samples;

grouping the obtained histograms by category; and

computing the histogram feature template corresponding to the audio data of each audio category with the preset second numerical interval as the horizontal axis and, as the vertical axis, the mean of the percentages that the histograms of the audio data samples of the same category assign to each value on the horizontal axis.

An audio data training method comprises:

obtaining category-labelled audio data samples;

windowing the category-labelled audio data samples along their audio timelines according to a preset windowing algorithm;

extracting one MFCC feature vector from the audio data in each window;

quantizing each extracted MFCC feature vector into a one-dimensional second feature value according to a preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one second feature value;

computing the histogram of each category-labelled audio data sample from all its second feature values according to a preset histogram generation algorithm;

An embodiment of the present invention also provides an audio data classification device, comprising:

a first audio data obtaining unit, configured to obtain first audio data whose category is to be identified;

a first audio data windowing unit, configured to window the first audio data along its audio timeline according to a preset windowing algorithm;

a first feature vector extraction unit, configured to extract one MFCC feature vector from the audio data in each window;

a first vector quantization unit, configured to quantize each extracted MFCC feature vector into a one-dimensional first feature value according to a preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one first feature value;

a first histogram calculation unit, configured to compute a first histogram of the first audio data from all the first feature values according to a preset histogram generation algorithm;

a similarity calculation unit, configured to calculate the similarity between the first histogram and the preset histogram feature template corresponding to the audio data of each audio category, and to obtain the first histogram feature template that has the greatest similarity to the first histogram; and

an audio category identification unit, configured to identify the audio category corresponding to the first histogram feature template as the audio category of the first audio data.

In one embodiment of the present invention, the first audio data windowing unit comprises:

an audio frame obtaining unit, configured to divide the first audio data into audio frames of a preset time period; and

an audio frame windowing unit, configured to window the audio frames.

In one embodiment of the present invention, the window applied by the first audio data windowing unit is a rectangular window or a Hamming window.

In one embodiment of the present invention, the first histogram calculation unit comprises:

a first feature value mapping unit, configured to map each first feature value onto a value in a preset first numerical interval; and

a first histogram calculation subunit, configured to compute the first histogram of the first audio data with the preset first numerical interval as the horizontal axis and, as the vertical axis, the percentage of the total number of first feature values accounted for by the first feature values that map to each value of the first numerical interval.

In one embodiment of the present invention, when the similarity calculation unit calculates the similarity between the first histogram and the preset histogram feature template corresponding to the audio data of each audio category, the preset histogram feature templates have been obtained in advance by training with an audio data training device, and an audio data training device comprises:

an audio data sample obtaining unit, configured to obtain category-labelled audio data samples;

an audio data sample windowing unit, configured to window the category-labelled audio data samples along their audio timelines according to the preset windowing algorithm;

a second feature vector extraction unit, configured to extract one MFCC feature vector from the audio data in each window;

a second vector quantization unit, configured to quantize each extracted MFCC feature vector into a one-dimensional second feature value according to the preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one second feature value;

a second histogram calculation unit, configured to compute the histogram of each category-labelled audio data sample from all its second feature values according to the preset histogram generation algorithm; and

a histogram feature template obtaining unit, configured to average, according to the category labels of the audio data samples, the histograms of the audio data samples that have the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.

In one embodiment of the present invention, the second histogram calculation unit comprises:

a second feature value mapping unit, configured to map each second feature value onto a value in a preset second numerical interval; and

a second histogram calculation subunit, configured to compute the histogram of the category-labelled audio data sample with the preset second numerical interval as the horizontal axis and, as the vertical axis, the percentage of the total number of second feature values accounted for by the second feature values that map to each value of the second numerical interval.

In one embodiment of the present invention, the histogram feature template obtaining unit comprises:

a histogram obtaining unit, configured to obtain all the computed histograms of the category-labelled audio data samples;

a histogram grouping unit, configured to group the obtained histograms by category; and

a histogram feature template obtaining subunit, configured to compute the histogram feature template corresponding to the audio data of each category with the preset second numerical interval as the horizontal axis and, as the vertical axis, the mean of the percentages that the histograms of the audio data samples of the same category assign to each value on the horizontal axis.

An audio data training device comprises:

an audio data sample windowing unit, configured to window the category-labelled audio data samples along their audio timelines according to a preset windowing algorithm;

a second vector quantization unit, configured to quantize each extracted MFCC feature vector into a one-dimensional second feature value according to a preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one second feature value;

a second histogram calculation unit, configured to compute the histogram of each category-labelled audio data sample from all its second feature values according to a preset histogram generation algorithm;

In the technical solution provided by the embodiments of the present invention, the feature vectors extracted from the audio data whose category is to be identified are quantized into one-dimensional feature values, a histogram of the audio data is computed from those feature values, the similarity between this histogram and the histogram feature template of each audio category is calculated, and the audio category of the audio data is identified according to the degree of similarity. Because the solution quantizes the feature vectors extracted from the audio data into one-dimensional feature values, it achieves feature dimensionality reduction, so a large amount of labelled data is not needed when training the feature templates from the feature vectors. The solution also uses histograms to capture the principal characteristics of the audio data globally. This solves the data sparsity problem of the prior art and improves the accuracy of audio data classification. At the same time, because vector quantization reduces the feature dimension, it reduces the workload of training the feature templates, so the solution speeds up template training and, in turn, the classification of audio data.

Brief description of the drawings

In order to illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of an implementation of an audio data classification method according to an embodiment of the present invention;

Fig. 2 is a flowchart of an implementation of an audio data training method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an audio data classification device according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an audio data training device according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of audio data histograms.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Fig. 1 is a flowchart of an implementation of an audio data classification method according to an embodiment of the present invention. The method comprises the following steps:

Step S101: obtain first audio data whose category is to be identified.

To classify the first audio data, the electronic device first obtains the first audio data to be identified. The first audio data may be audio data of any of various categories, such as opening titles, closing credits, advertisements, or live reports.

Step S102: window the first audio data along its audio timeline according to a preset windowing algorithm.

After the electronic device obtains the first audio data to be identified, it windows the first audio data along its audio timeline according to the preset windowing algorithm. Windowing is usually needed in audio data classification because audio data generally lasts a long time and is not stationary over long spans, whereas it has been found in the industry that audio data is stationary over very short periods. In one embodiment of the present invention, the step of windowing the first audio data along its audio timeline according to the preset windowing algorithm comprises:

(1) the device divides the first audio data into audio frames of a preset time period;

(2) windowing is applied to all the audio frames.

Further, the audio frame length is usually chosen to be 25 milliseconds, and the window applied is generally a rectangular window or a Hamming window. The specific windowing method may be the same as in the prior art and is not described again here.
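The two sub-steps of S102 can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function name `frame_and_window` is hypothetical, the frames are assumed to be non-overlapping (the text does not state a frame shift), and the Hamming coefficients use the common 0.54 / 0.46 definition.

```python
import math

def frame_and_window(samples, sample_rate, frame_ms=25.0, window="hamming"):
    """Split samples into fixed-length frames and apply a window to each frame.

    Assumption: frames do not overlap; the patent does not specify a frame shift.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    if frame_len < 2:
        raise ValueError("frame length must be at least 2 samples")

    if window == "hamming":
        # Standard Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1))
        coeffs = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
                  for n in range(frame_len)]
    elif window == "rectangular":
        coeffs = [1.0] * frame_len
    else:
        raise ValueError("window must be 'hamming' or 'rectangular'")

    frames = []
    # Trailing samples that do not fill a whole frame are dropped.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        frames.append([s * c for s, c in zip(frame, coeffs)])
    return frames
```

At a 16 kHz sampling rate, for example, a 25 ms frame holds 400 samples, so 1000 input samples yield two windowed frames.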

Step S103: extract one MFCC feature vector from the audio data in each window.

After the first audio data has been windowed, one MFCC feature vector is extracted from the audio data in each window. MFCC stands for Mel-frequency cepstral coefficients, a classic audio feature that is often used in fields such as speech recognition and audio data classification. An MFCC feature vector comprises 12 to 16 dimensions of basic features, one dimension of energy feature, and the first-order and second-order difference features of the basic and energy features, so the MFCC feature vector may have 39, 42, 45, 48, or 51 dimensions. Usually, a 39-dimensional MFCC feature vector is preferred when extracting MFCC feature vectors from audio data.
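The listed dimensions follow from simple arithmetic: the basic and energy features are counted three times, once as static features and once each for the first-order and second-order differences. A short check (the helper name `mfcc_dimension` is illustrative only):

```python
def mfcc_dimension(num_basic, num_energy=1, num_orders=3):
    """Total MFCC vector size: (basic + energy) x (static, 1st diff, 2nd diff)."""
    return (num_basic + num_energy) * num_orders

# 12 to 16 basic dimensions, as stated in the description.
dims = [mfcc_dimension(n) for n in range(12, 17)]
print(dims)  # [39, 42, 45, 48, 51]
```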

The method of extracting one MFCC feature vector from the audio data of each window may be the same as in the prior art and is not described again here.

Step S104: quantize each extracted MFCC feature vector into a one-dimensional first feature value according to a preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one first feature value.

After one MFCC feature vector has been extracted from the audio data in each window, each of the extracted MFCC feature vectors is quantized into a one-dimensional first feature value according to the preset vector quantization algorithm, with each MFCC feature vector corresponding to one first feature value. In this step, quantization reduces every MFCC feature vector to a single one-dimensional first feature value.
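The description does not say which vector quantization algorithm is preset. One common reading, assumed here purely for illustration, is nearest-codeword quantization against a previously trained codebook, with the index of the nearest codeword serving as the one-dimensional feature value. The names `quantize` and `codebook` are hypothetical, and the toy codebook below uses 2-dimensional vectors instead of 39-dimensional MFCC vectors to keep the example readable.

```python
def quantize(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared Euclidean distance).

    Assumption: the codebook was trained beforehand (for example by k-means);
    the patent does not describe how it is obtained.
    """
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

# Toy codebook with three codewords.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
feature_values = [quantize(v, codebook) for v in ([0.9, 1.2], [4.0, 4.5])]
print(feature_values)  # [1, 2]
```

Under this assumption, a codebook with 1024 codewords would produce feature values in the range 0 to 1023.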

Step S105: compute a first histogram of the first audio data from all the first feature values according to a preset histogram generation algorithm.

After the first feature values are obtained, the first histogram of the first audio data is computed from all of them according to the preset histogram generation algorithm. Specifically, in one embodiment of the present invention, each first feature value is first mapped onto a value in a preset first numerical interval. The first histogram of the first audio data is then computed with the preset first numerical interval as the horizontal axis and, as the vertical axis, the percentage of the total number of first feature values accounted for by the first feature values that map to each value of the first numerical interval. The preset first numerical interval may be 0 to 1023. It will be understood that the range of the horizontal axis may also be larger than 0 to 1023, in which case the percentage for every value beyond 1023 is zero.
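A minimal sketch of the histogram computation in S105, assuming the first feature values are already integers in the interval 0 to 1023 (the function name `percentage_histogram` is hypothetical):

```python
def percentage_histogram(feature_values, num_bins=1024):
    """Histogram over 0..num_bins-1 whose heights are percentages of all feature values."""
    if not feature_values:
        raise ValueError("at least one feature value is required")
    counts = [0] * num_bins
    for value in feature_values:
        if not 0 <= value < num_bins:
            raise ValueError("feature value outside the preset interval")
        counts[value] += 1
    total = len(feature_values)
    return [100.0 * count / total for count in counts]

hist = percentage_histogram([0, 0, 3, 1023])
print(hist[0], hist[3], hist[1023])  # 50.0 25.0 25.0
```

Because the heights are percentages rather than raw counts, histograms of audio clips of different lengths remain directly comparable.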

Step S106: calculate the similarity between the first histogram and the preset histogram feature template corresponding to the audio data of each audio category, and obtain the first histogram feature template that has the greatest similarity to the first histogram.

After the first histogram of the first audio data is obtained, its similarity to the preset histogram feature template corresponding to the audio data of each audio category is calculated, and the first histogram feature template that has the greatest similarity to the first histogram is obtained. It will be understood that the first histogram feature template here may be any one of the preset histogram feature templates corresponding to the audio data of the audio categories.
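The description does not name a similarity measure. The sketch below assumes histogram intersection (the sum of bin-wise minima), which is only one of several plausible choices; cosine similarity or a distance-based score would fit the text equally well. The function names and the 4-bin toy templates are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Similarity of two equal-length histograms: sum of the bin-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))

def most_similar_category(histogram, templates):
    """Return the category whose template has the greatest similarity to `histogram`."""
    return max(templates,
               key=lambda category: histogram_intersection(histogram, templates[category]))

# Toy templates with 4 bins instead of 1024; heights are percentages.
templates = {
    "advertisement": [70.0, 20.0, 10.0, 0.0],
    "news": [10.0, 20.0, 30.0, 40.0],
}
category = most_similar_category([60.0, 30.0, 10.0, 0.0], templates)
print(category)  # advertisement
```

In this toy case the intersection with the advertisement template is 90 and with the news template is 40, so the advertisement template is selected.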

In one embodiment of the present invention, the preset histogram feature template corresponding to the audio data of each category in step S106 is obtained in advance by audio data training, and an audio data training method comprises:

obtaining category-labelled audio data samples;

extracting one MFCC feature vector from the audio data in each window;

In the audio data training process, after the category-labelled audio data samples are obtained, a step similar to S102 is performed: the category-labelled audio data samples are windowed along their audio timelines according to the preset windowing algorithm, that is, the windowing algorithm used in training is the same as the one used in classification. Further, one MFCC feature vector is extracted from the audio data in each window, and each extracted MFCC feature vector is quantized into a one-dimensional second feature value according to the preset vector quantization algorithm, that is, the vector quantization algorithm used in training is the same as the one used in classification. Each MFCC feature vector corresponds to one second feature value.

As in step S105, after the second feature values are obtained, the histogram of each category-labelled audio data sample is computed from all its second feature values according to the preset histogram generation algorithm, that is, the histogram generation algorithm used in training is the same as the one used in classification. Specifically, in one embodiment of the present invention, each second feature value is mapped onto a value in a preset second numerical interval, and the histogram of the category-labelled audio data sample is computed with the preset second numerical interval as the horizontal axis and, as the vertical axis, the percentage of the total number of second feature values accounted for by the second feature values that map to each value of the second numerical interval. Note that the second numerical interval may be the same as the first numerical interval described above.

Finally, after the histograms of the category-labelled audio data samples are obtained, the histograms of the audio data samples that have the same audio category are averaged according to the category labels, to obtain the histogram feature template corresponding to the audio data of each audio category. Specifically, all the computed histograms of the category-labelled audio data samples are first obtained and then grouped by category. The histogram feature template corresponding to the audio data of each audio category is then computed with the preset second numerical interval as the horizontal axis and, as the vertical axis, the mean of the percentages that the histograms of the audio data samples of the same category assign to each value on the horizontal axis.
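The template computation described above amounts to a bin-wise mean over the histograms that share a category label. A minimal sketch, with the hypothetical function name `build_templates` and 2-bin toy histograms standing in for the 1024-bin ones:

```python
from collections import defaultdict

def build_templates(labelled_histograms):
    """Average the histograms of each category bin by bin.

    `labelled_histograms` is an iterable of (category, histogram) pairs,
    where all histograms share the same horizontal axis.
    """
    grouped = defaultdict(list)
    for category, histogram in labelled_histograms:
        grouped[category].append(histogram)

    templates = {}
    for category, histograms in grouped.items():
        count = len(histograms)
        # zip(*histograms) walks the histograms one bin at a time.
        templates[category] = [sum(bin_values) / count for bin_values in zip(*histograms)]
    return templates

samples = [
    ("news", [0.0, 100.0]),
    ("news", [50.0, 50.0]),
    ("advertisement", [100.0, 0.0]),
]
templates = build_templates(samples)
print(templates["news"])  # [25.0, 75.0]
```

Since each input histogram sums to 100 percent, each averaged template does as well.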

In the above process of obtaining the histogram feature template for the audio data of each category, vector quantization is used to quantize the MFCC feature vectors into one-dimensional feature values, which reduces the feature dimension and solves the data sparsity problem of the prior art. At the same time, because this process reduces the feature dimension, it reduces the workload of training the feature templates and therefore speeds up template training.

Step S107: identify the audio category corresponding to the first histogram feature template as the audio category of the first audio data.

After the first histogram feature template that has the greatest similarity to the first histogram is obtained, the audio category corresponding to that template is identified as the audio category of the first audio data.

Fig. 2 is a flowchart of an implementation of an audio data training method according to an embodiment of the present invention. The method comprises the following steps:

Step S201: obtain category-labelled audio data samples;

Step S202: window the category-labelled audio data samples along their audio timelines according to a preset windowing algorithm;

Step S203: extract one MFCC feature vector from the audio data in each window;

Step S204: quantize each extracted MFCC feature vector into a one-dimensional second feature value according to the preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one second feature value;

Step S205: compute the histogram of each category-labelled audio data sample from all its second feature values according to the preset histogram generation algorithm;

Step S206: according to the category labels of the audio data samples, average the histograms of the audio data samples that have the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.

Corresponding to the method embodiments above, the present invention also provides an audio data classification device. Referring to Fig. 3, the device comprises:

a first audio data obtaining unit 301, configured to obtain first audio data whose category is to be identified;

a first audio data windowing unit 302, configured to window the first audio data along its audio timeline according to a preset windowing algorithm;

a first feature vector extraction unit 303, configured to extract one MFCC feature vector from the audio data in each window;

a first vector quantization unit 304, configured to quantize each extracted MFCC feature vector into a one-dimensional first feature value according to a preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one first feature value;

a first histogram calculation unit 305, configured to compute a first histogram of the first audio data from all the first feature values according to a preset histogram generation algorithm;

a similarity calculation unit 306, configured to calculate the similarity between the first histogram and the preset histogram feature template corresponding to the audio data of each audio category, and to obtain the first histogram feature template that has the greatest similarity to the first histogram;

an audio category identification unit 307, configured to identify the audio category corresponding to the first histogram feature template as the audio category of the first audio data.

In one embodiment of the present invention, the first audio data windowing unit 302 comprises:

In one embodiment of the present invention, the window applied by the first audio data windowing unit 302 is a rectangular window or a Hamming window.

In one embodiment of the present invention, the first histogram calculation unit 305 comprises:

In one embodiment of the present invention, when the similarity calculation unit 306 calculates the similarity between the first histogram and the preset histogram feature template corresponding to the audio data of each audio category, the preset histogram feature templates have been obtained in advance by training with an audio data training device, and an audio data training device comprises:

Fig. 4 is a schematic structural diagram of an audio data training device according to an embodiment of the present invention. The device comprises:

Audio data sample obtaining unit 401, configured to obtain audio data samples having category labels;

Audio data sample windowing unit 402, configured to apply windows along the audio time axis of the audio data samples having category labels, according to a preset windowing algorithm;

Second feature vector extraction unit 403, configured to extract one MFCC feature vector from the audio data in each window;

Second vector quantization unit 404, configured to quantize, according to a preset vector quantization algorithm, each extracted MFCC feature vector into a one-dimensional second feature value, wherein each MFCC feature vector corresponds to one second feature value;

Second histogram calculation unit 405, configured to perform calculation on all the second feature values according to a preset histogram drawing algorithm, to obtain the histogram of each audio data sample having a category label;

Histogram feature template obtaining unit 406, configured to average, according to the category labels of the audio data samples, the histograms of the audio data samples having the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.
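The averaging performed by unit 406 can be sketched as below. This is a minimal sketch assuming that each labelled sample has already been reduced to a histogram of equal length and that the average is a plain bin-by-bin arithmetic mean; the function name `build_templates` and the sample values are hypothetical.

```python
from collections import defaultdict


def build_templates(labelled_histograms):
    """Average the histograms sharing a category label, bin by bin.

    labelled_histograms: iterable of (label, histogram) pairs, where all
    histograms have the same number of bins (assumed).
    Returns a dict mapping each label to its averaged histogram (the template).
    """
    sums = {}
    counts = defaultdict(int)
    for label, hist in labelled_histograms:
        if label not in sums:
            sums[label] = [0.0] * len(hist)
        sums[label] = [s + h for s, h in zip(sums[label], hist)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


# Hypothetical labelled histograms, for illustration only.
samples = [
    ("news", [0.6, 0.4]),
    ("news", [0.8, 0.2]),
    ("advertisement", [0.1, 0.9]),
]
trained = build_templates(samples)
```

Here `trained["news"]` is approximately `[0.7, 0.3]`, the mean of the two news histograms, and `trained["advertisement"]` equals the single advertisement histogram.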

For convenience of description, the above devices are described as being divided by function into separate units. Of course, when implementing the present invention, the functions of the units may be realized in one or more pieces of software and/or hardware.

The feasibility of the technical solution of the present invention is demonstrated below with reference to Fig. 5. In Fig. 5, the red histograms represent advertisement audio data and the black histograms represent news audio data. It can be seen that the advertisement audio data histograms are highly similar to one another, that the news audio data histograms are highly similar to one another, and that the similarity between an advertisement audio data histogram and a news audio data histogram is low. The following conclusion can therefore be drawn: the similarity between histograms of audio data of the same category is high, and the similarity between histograms of audio data of different categories is low. That is, the technical solution of the present invention can classify audio data accurately and quickly even when the data is sparse.

From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solution of the present invention, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device.

The embodiments in this specification are described in a related manner. For parts that are identical or similar among the embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for the relevant parts, refer to the corresponding description of the method embodiment.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.

Claims

1. An audio data classification method, applied to an electronic device, characterized by comprising:

Obtaining first audio data whose category is to be identified;

Extracting one MFCC feature vector from the audio data in each window;

2. The method according to claim 1, characterized in that the step of applying windows along the audio time axis of the first audio data according to the preset windowing algorithm comprises:

Applying a window to the audio frames.

3. The method according to claim 1 or 2, characterized in that the window is specifically a rectangular window or a Hamming window.

4. The method according to claim 1 or 2, characterized in that performing calculation on all the first feature values according to the preset histogram drawing algorithm to obtain the first histogram of the first audio data comprises:

5. The method according to claim 1, characterized in that the preset histogram feature template corresponding to the audio data of each audio category is obtained in advance through audio data training, and one such audio data training method comprises:

Obtaining audio data samples having category labels;

Extracting one MFCC feature vector from the audio data in each window;

6. The method according to claim 5, characterized in that performing calculation on all the second feature values according to the preset histogram drawing algorithm to obtain the histogram of each audio data sample having a category label comprises:

7. The method according to claim 6, characterized in that averaging, according to the category labels of the audio data samples, the histograms of the audio data samples having the same audio category to obtain the histogram feature template corresponding to the audio data of each audio category comprises the following steps:

8. An audio data training method, characterized by comprising:

Obtaining audio data samples having category labels;

Extracting one MFCC feature vector from the audio data in each window;

9. An audio data classification device, characterized by comprising:

10. The device according to claim 9, characterized in that the first audio data windowing unit specifically comprises:

11. The device according to claim 9 or 10, characterized in that the window applied by the first audio data windowing unit is specifically a rectangular window or a Hamming window.

12. The device according to claim 9 or 10, characterized in that the first histogram calculation unit specifically comprises:

13. The device according to claim 9, characterized in that the preset histogram feature templates, one for the audio data of each audio category, against which the similarity calculation unit compares the first histogram are obtained in advance through training by an audio data training device, and one such audio data training device comprises:

14. The device according to claim 13, characterized in that the second histogram calculation unit specifically comprises:

15. The device according to claim 14, characterized in that the histogram feature template obtaining unit specifically comprises:

16. An audio data training device, characterized by comprising: