CN104462537A - Method and device for classifying voice data - Google Patents

Info

Publication number
CN104462537A
Authority
CN
China
Prior art keywords
histogram
voice data
audio
classification
eigenvalue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410817745.7A
Other languages
Chinese (zh)
Inventor
杨晓昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201410817745.7A priority Critical patent/CN104462537A/en
Publication of CN104462537A publication Critical patent/CN104462537A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a method and device for classifying voice data. The method for classifying the voice data comprises the steps that first voice data of the category to be recognized are obtained; windowing is carried out on the voice time shaft of the first voice data according to a preset windowing algorithm; an MFCC feature vector is extracted for the voice data in each window; vector quantization is carried out on each MFCC feature vector to obtain a one-dimensional first feature value; all the first feature values are calculated according to a preset histogram drawing algorithm to obtain a first histogram of the first voice data; similarity calculation is carried out on the first histogram and preset histogram feature templates corresponding to the voice data of various voice categories, and a first histogram feature template most similar to the first histogram is obtained; the voice category of the feature template is the voice category of the first voice data. Compared with the prior art, according to the technical scheme, the accuracy and speed of classifying the voice data are improved.

Description

An audio data classification method and device
Technical field
The present invention relates to the field of multimedia data processing, and in particular to an audio data classification method and device.
Background
With the rapid development of multimedia and network technology, the volume of audio data is growing exponentially, and a large amount of audio information is now available on the Internet. This information is widely used in fields such as education, entertainment, news, and advertising, and has become an important part of daily life. How to classify this audio data is therefore a problem in urgent need of a solution.
In the prior art, feature vectors are first extracted from the audio data, and the audio data is then classified using a Gaussian mixture model (GMM). Because the extracted feature vectors generally have 39 or more dimensions, training a GMM on them requires a large amount of labeled data. Labeling consumes considerable manual effort, so the amount of labeled data actually available is small. This leads to data sparsity, and the accuracy of audio data classification is low. In addition, because the feature vectors are high-dimensional, the training computation is heavy, so training is slow and audio data classification is slow as well.
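To make the dimensionality argument concrete, the following minimal sketch counts the free parameters of a GMM over 39-dimensional feature vectors. The component count of 32 is a hypothetical value chosen only for illustration; the source does not state how many mixture components the prior-art models use.

```python
def gmm_parameter_count(dim: int, components: int, full_covariance: bool = True) -> int:
    """Number of free parameters in a Gaussian mixture model."""
    mean_params = dim
    # A full covariance matrix is symmetric; a diagonal one has one value per dimension.
    cov_params = dim * (dim + 1) // 2 if full_covariance else dim
    # Only (components - 1) mixture weights are free because the weights sum to 1.
    return components * (mean_params + cov_params) + (components - 1)


# Hypothetical setting: 39-dimensional features, 32 mixture components.
full = gmm_parameter_count(dim=39, components=32, full_covariance=True)
diag = gmm_parameter_count(dim=39, components=32, full_covariance=False)
print(full, diag)
```

Under these assumptions every audio category needs thousands of parameters to be estimated from labeled data, which is the kind of cost the background section refers to.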
Summary of the invention
The object of the embodiments of the present invention is to provide an audio data classification method and device that improve the accuracy and speed of audio data classification. The specific technical solution is as follows:
An audio data classification method, applied to an electronic device, comprising:
obtaining first audio data whose category is to be identified;
windowing the first audio data along its time axis according to a preset windowing algorithm;
extracting one MFCC feature vector from the audio data in each window;
quantizing each extracted MFCC feature vector into a one-dimensional first feature value according to a preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one first feature value;
computing a first histogram of the first audio data from all the first feature values according to a preset histogram drawing algorithm;
calculating the similarity between the first histogram and the preset histogram feature template corresponding to the audio data of each audio category, and obtaining the first histogram feature template, namely the template with the greatest similarity to the first histogram;
identifying the audio category corresponding to the first histogram feature template as the audio category of the first audio data.
In one embodiment of the present invention, the step of windowing the first audio data along its time axis according to the preset windowing algorithm comprises:
dividing the first audio data into audio frames of a preset duration;
windowing the audio frames.
In one embodiment of the present invention, the window is a rectangular window or a Hamming window.
In one embodiment of the present invention, computing the first histogram of the first audio data from all the first feature values according to the preset histogram drawing algorithm comprises:
mapping each first feature value to a value in a preset first numerical interval;
computing the first histogram of the first audio data with the preset first numerical interval as the horizontal axis and, as the vertical axis, the number of first feature values corresponding to each value in the first numerical interval as a percentage of the total number of first feature values.
In one embodiment of the present invention, the preset histogram feature template corresponding to the audio data of each audio category is obtained in advance by audio data training, and an audio data training method comprises:
obtaining category-labeled audio data samples;
windowing the category-labeled audio data samples along their time axis according to the preset windowing algorithm;
extracting one MFCC feature vector from the audio data in each window;
quantizing each extracted MFCC feature vector into a one-dimensional second feature value according to the preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one second feature value;
computing a histogram of each category-labeled audio data sample from all of its second feature values according to the preset histogram drawing algorithm;
averaging, according to the category labels of the audio data samples, the histograms of the audio data samples that share the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.
In one embodiment of the present invention, computing the histogram of the category-labeled audio data sample from all the second feature values according to the preset histogram drawing algorithm comprises:
mapping each second feature value to a value in a preset second numerical interval;
computing the histogram of the category-labeled audio data sample with the preset second numerical interval as the horizontal axis and, as the vertical axis, the number of second feature values corresponding to each value in the second numerical interval as a percentage of the total number of second feature values.
In one embodiment of the present invention, averaging the histograms of the audio data samples that share the same audio category according to the category labels, to obtain the histogram feature template corresponding to the audio data of each audio category, comprises the following steps:
obtaining the computed histograms of all the category-labeled audio data samples;
grouping the obtained histograms by category;
computing the histogram feature template corresponding to the audio data of each audio category with the preset second numerical interval as the horizontal axis and, as the vertical axis, the mean of the percentages at each horizontal-axis value across the histograms of the audio data samples of the same category.
An audio data training method, comprising:
obtaining category-labeled audio data samples;
windowing the category-labeled audio data samples along their time axis according to a preset windowing algorithm;
extracting one MFCC feature vector from the audio data in each window;
quantizing each extracted MFCC feature vector into a one-dimensional second feature value according to a preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one second feature value;
computing a histogram of each category-labeled audio data sample from all of its second feature values according to a preset histogram drawing algorithm;
averaging, according to the category labels of the audio data samples, the histograms of the audio data samples that share the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.
An embodiment of the present invention further provides an audio data classification device, comprising:
a first audio data acquisition unit, configured to obtain first audio data whose category is to be identified;
a first audio data windowing unit, configured to window the first audio data along its time axis according to a preset windowing algorithm;
a first feature vector extraction unit, configured to extract one MFCC feature vector from the audio data in each window;
a first vector quantization unit, configured to quantize each extracted MFCC feature vector into a one-dimensional first feature value according to a preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one first feature value;
a first histogram calculation unit, configured to compute a first histogram of the first audio data from all the first feature values according to a preset histogram drawing algorithm;
a similarity calculation unit, configured to calculate the similarity between the first histogram and the preset histogram feature template corresponding to the audio data of each audio category, and to obtain the first histogram feature template, namely the template with the greatest similarity to the first histogram;
an audio category identification unit, configured to identify the audio category corresponding to the first histogram feature template as the audio category of the first audio data.
In one embodiment of the present invention, the first audio data windowing unit comprises:
an audio frame acquisition unit, configured to divide the first audio data into audio frames of a preset duration;
an audio frame windowing unit, configured to window the audio frames.
In one embodiment of the present invention, the window applied by the first audio data windowing unit is a rectangular window or a Hamming window.
In one embodiment of the present invention, the first histogram calculation unit comprises:
a first feature value mapping unit, configured to map each first feature value to a value in a preset first numerical interval;
a first histogram calculation subunit, configured to compute the first histogram of the first audio data with the preset first numerical interval as the horizontal axis and, as the vertical axis, the number of first feature values corresponding to each value in the first numerical interval as a percentage of the total number of first feature values.
In one embodiment of the present invention, the preset histogram feature templates that the similarity calculation unit compares with the first histogram are obtained in advance by training with an audio data training device, and an audio data training device comprises:
an audio data sample acquisition unit, configured to obtain category-labeled audio data samples;
an audio data sample windowing unit, configured to window the category-labeled audio data samples along their time axis according to the preset windowing algorithm;
a second feature vector extraction unit, configured to extract one MFCC feature vector from the audio data in each window;
a second vector quantization unit, configured to quantize each extracted MFCC feature vector into a one-dimensional second feature value according to the preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one second feature value;
a second histogram calculation unit, configured to compute a histogram of each category-labeled audio data sample from all of its second feature values according to the preset histogram drawing algorithm;
a histogram feature template acquisition unit, configured to average, according to the category labels of the audio data samples, the histograms of the audio data samples that share the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.
In one embodiment of the present invention, the second histogram calculation unit comprises:
a second feature value mapping unit, configured to map each second feature value to a value in a preset second numerical interval;
a second histogram calculation subunit, configured to compute the histogram of the category-labeled audio data sample with the preset second numerical interval as the horizontal axis and, as the vertical axis, the number of second feature values corresponding to each value in the second numerical interval as a percentage of the total number of second feature values.
In one embodiment of the present invention, the histogram feature template acquisition unit comprises:
a histogram acquisition unit, configured to obtain the computed histograms of all the category-labeled audio data samples;
a histogram grouping unit, configured to group the obtained histograms by category;
a histogram feature template acquisition subunit, configured to compute the histogram feature template corresponding to the audio data of each category with the preset second numerical interval as the horizontal axis and, as the vertical axis, the mean of the percentages at each horizontal-axis value across the histograms of the audio data samples of the same category.
An audio data training device, comprising:
an audio data sample acquisition unit, configured to obtain category-labeled audio data samples;
an audio data sample windowing unit, configured to window the category-labeled audio data samples along their time axis according to a preset windowing algorithm;
a second feature vector extraction unit, configured to extract one MFCC feature vector from the audio data in each window;
a second vector quantization unit, configured to quantize each extracted MFCC feature vector into a one-dimensional second feature value according to a preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one second feature value;
a second histogram calculation unit, configured to compute a histogram of each category-labeled audio data sample from all of its second feature values according to a preset histogram drawing algorithm;
a histogram feature template acquisition unit, configured to average, according to the category labels of the audio data samples, the histograms of the audio data samples that share the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.
In the technical solution provided by the embodiments of the present invention, the feature vectors extracted from the audio data whose category is to be identified are quantized into one-dimensional feature values, a histogram of the audio data is computed from those feature values, the similarity between this histogram and the histogram feature template of each audio category is calculated, and the audio category of the audio data is identified according to the similarity. Because the solution quantizes the extracted feature vectors into one-dimensional feature values and thereby reduces the feature dimension, training the feature templates does not require a large amount of labeled data. The solution also uses histograms to capture the principal characteristics of the audio data over its whole duration. This addresses the data sparsity problem of the prior art and improves the accuracy of audio data classification. Moreover, because vector quantization reduces the feature dimension, it reduces the computation needed to train the feature templates, so the solution speeds up template training and, in turn, audio data classification.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of one implementation of the audio data classification method according to an embodiment of the present invention;
Fig. 2 is a flowchart of one implementation of the audio data training method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the audio data classification device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the audio data training device according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of an audio data histogram.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a flowchart of an audio data classification method according to an embodiment of the present invention. The method comprises the following steps:
Step S101: obtain the first audio data whose category is to be identified.
To classify the first audio data, the electronic device first obtains the first audio data to be identified. The first audio data may be audio of various types, such as opening titles, closing credits, advertisements, or live reports.
Step S102: window the first audio data along its time axis according to a preset windowing algorithm.
Audio data is usually long and is not stationary over long durations, but it is generally found to be stationary over very short intervals. For this reason, after the electronic device obtains the first audio data to be identified, the classification process windows the first audio data along its time axis according to a preset windowing algorithm. In one embodiment of the present invention, this windowing step comprises:
(1) the device divides the first audio data into audio frames of a preset duration;
(2) windowing is applied to all audio frames.
The audio frame length is usually chosen to be 25 milliseconds, and the window applied is generally a rectangular window or a Hamming window. The specific windowing method may be the same as in the prior art and is not described again here.
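Framing and windowing as described for step S102 can be sketched with NumPy as follows. The 25 ms frame length and the rectangular or Hamming window come from the text; the 10 ms hop between frames, the function name `frame_and_window`, and the 16 kHz sample rate in the usage lines are assumptions made for illustration, since the source does not say whether frames overlap.

```python
import numpy as np


def frame_and_window(signal, sample_rate, frame_ms=25.0, hop_ms=10.0, window="hamming"):
    """Split a 1-D signal into fixed-length frames and apply a window to each frame."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))  # hop size is an assumption
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i of the index matrix selects samples [i * hop_len, i * hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = signal[idx]
    if window == "hamming":
        weights = np.hamming(frame_len)
    elif window == "rectangular":
        weights = np.ones(frame_len)
    else:
        raise ValueError(f"unsupported window: {window}")
    return frames * weights


# One second of a constant signal at an assumed 16 kHz sample rate.
frames = frame_and_window(np.ones(16000), sample_rate=16000)
print(frames.shape)
```

A rectangular window leaves each frame unchanged, while a Hamming window tapers the frame edges, which reduces spectral leakage in the later feature extraction.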
Step S103: extract one MFCC feature vector from the audio data in each window.
After the first audio data has been windowed, one MFCC feature vector is extracted from the audio data in each window. MFCC stands for Mel-frequency cepstral coefficients, a classic audio feature often used in fields such as speech recognition and audio data classification. An MFCC feature vector comprises 12 to 16 basic feature dimensions, one energy dimension, and the first-order and second-order differences of the basic and energy features, so the MFCC feature vector may have 39, 42, 45, 48, or 51 dimensions. When extracting MFCC feature vectors from audio data, 39-dimensional vectors are usually preferred.
The method of extracting one MFCC feature vector from the audio data of each window may be the same as in the prior art and is not described again here.
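The dimension counts quoted for step S103 follow from stacking the static features with their first-order and second-order differences. The sketch below checks that arithmetic and shows one simple way to append difference features; using `np.gradient` for the differences is a simplification chosen here, not the extraction method of the source, and the function names are hypothetical.

```python
import numpy as np


def mfcc_vector_dim(n_basic: int) -> int:
    """Total dimension for n_basic basic coefficients plus one energy term."""
    static = n_basic + 1  # basic features plus the energy feature
    return 3 * static     # static + first-order + second-order differences


def add_differences(static: np.ndarray) -> np.ndarray:
    """static has shape (n_frames, n_static); returns (n_frames, 3 * n_static)."""
    delta = np.gradient(static, axis=0)   # first-order difference along time
    delta2 = np.gradient(delta, axis=0)   # second-order difference along time
    return np.hstack([static, delta, delta2])


dims = [mfcc_vector_dim(n) for n in range(12, 17)]
print(dims)

# 100 frames of 12 basic coefficients plus energy (random stand-in values).
features = add_differences(np.random.default_rng(0).normal(size=(100, 13)))
print(features.shape)
```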
Step S104: quantize each extracted MFCC feature vector into a one-dimensional first feature value according to a preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one first feature value.
After one MFCC feature vector has been extracted from the audio data in each window, each extracted MFCC feature vector is quantized into a one-dimensional first feature value according to the preset vector quantization algorithm, so that each MFCC feature vector corresponds to one first feature value. In this step, quantization reduces each multi-dimensional MFCC feature vector to a single scalar.
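The source does not specify the vector quantization algorithm. A common choice, assumed here, is nearest-codeword quantization: each feature vector is replaced by the index of the closest entry in a codebook, so a 1024-entry codebook would give feature values in the range 0 to 1023. The function name `quantize` and the tiny codebook below are hypothetical.

```python
import numpy as np


def quantize(vectors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each row of `vectors` to the index of its nearest codeword.

    vectors: shape (n_vectors, dim); codebook: shape (n_codewords, dim).
    """
    # Squared Euclidean distance between every vector and every codeword.
    diffs = vectors[:, None, :] - codebook[None, :, :]
    dists = np.einsum("nkd,nkd->nk", diffs, diffs)
    return np.argmin(dists, axis=1)


# Toy 2-D example; real MFCC vectors would have 39 or more dimensions.
codebook = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
vectors = np.array([[1.0, 1.0], [9.0, 9.0], [1.0, 9.0]])
indices = quantize(vectors, codebook)
print(indices)
```

In practice the codebook would be learned beforehand, for example by clustering training vectors, and the same codebook would have to be used for both training and classification.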
Step S105: compute the first histogram of the first audio data from all the first feature values according to a preset histogram drawing algorithm.
After the first feature values have been obtained, the first histogram of the first audio data is computed from all of them according to the preset histogram drawing algorithm. Specifically, in one embodiment of the present invention, each first feature value is first mapped to a value in a preset first numerical interval; the first histogram is then computed with the preset first numerical interval as the horizontal axis and, as the vertical axis, the number of first feature values corresponding to each value in the interval as a percentage of the total number of first feature values. The preset first numerical interval may be 0 to 1023. The horizontal axis may also extend beyond 0 to 1023, in which case the percentage for every value above 1023 is zero.
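The histogram of step S105 can be sketched as a normalized count over the interval 0 to 1023. The function name `normalized_histogram` is hypothetical, and the sketch returns fractions that sum to 1 rather than percentages; multiplying by 100 gives the percentages described in the text.

```python
import numpy as np


def normalized_histogram(values, n_bins: int = 1024) -> np.ndarray:
    """Fraction of feature values that fall on each integer in [0, n_bins)."""
    values = np.asarray(values, dtype=np.int64)
    if values.size == 0:
        raise ValueError("at least one feature value is required")
    counts = np.bincount(values, minlength=n_bins)
    if counts.size > n_bins:
        raise ValueError("feature value outside the preset interval")
    return counts / values.size


# Four quantized feature values on a toy interval 0..3.
hist = normalized_histogram([0, 0, 1, 3], n_bins=4)
print(hist)
```

Because the vertical axis is a proportion rather than a raw count, clips of different lengths yield comparable histograms.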
Step S106: calculate the similarity between the first histogram and the preset histogram feature template corresponding to the audio data of each audio category, and obtain the first histogram feature template, namely the template with the greatest similarity to the first histogram.
After the first histogram of the first audio data has been obtained, its similarity to the preset histogram feature template of each audio category is calculated, and the template with the greatest similarity to the first histogram is taken as the first histogram feature template. The first histogram feature template may be any one of the preset histogram feature templates corresponding to the audio data of the audio categories.
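The source does not name the similarity measure used in step S106. The sketch below assumes histogram intersection, which is one common choice for normalized histograms; the function names and the toy templates are hypothetical.

```python
import numpy as np


def histogram_intersection(h1, h2) -> float:
    """Similarity in [0, 1] for two histograms that each sum to 1."""
    return float(np.minimum(np.asarray(h1), np.asarray(h2)).sum())


def most_similar_template(hist, templates: dict):
    """Return the label of the best-matching template and all similarity scores."""
    scores = {label: histogram_intersection(hist, t) for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy three-bin templates for two audio categories.
templates = {
    "advertisement": [0.7, 0.2, 0.1],
    "closing_credits": [0.1, 0.2, 0.7],
}
best, scores = most_similar_template([0.6, 0.3, 0.1], templates)
print(best, scores)
```

Other measures, such as cosine similarity or a chi-square distance, could be substituted without changing the surrounding steps.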
In one embodiment of the present invention, the preset histogram feature template corresponding to the audio data of each category in step S106 is obtained in advance by audio data training, and an audio data training method comprises:
obtaining category-labeled audio data samples;
windowing the category-labeled audio data samples along their time axis according to the preset windowing algorithm;
extracting one MFCC feature vector from the audio data in each window;
quantizing each extracted MFCC feature vector into a one-dimensional second feature value according to the preset vector quantization algorithm, wherein each MFCC feature vector corresponds to one second feature value;
computing a histogram of each category-labeled audio data sample from all of its second feature values according to the preset histogram drawing algorithm;
averaging, according to the category labels of the audio data samples, the histograms of the audio data samples that share the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.
During audio data training, after the category-labeled audio data samples have been obtained, a step similar to S102 is performed: the samples are windowed along their time axis according to the preset windowing algorithm, that is, the same windowing algorithm is used in training as in classification. One MFCC feature vector is then extracted from the audio data in each window, and each extracted MFCC feature vector is quantized into a one-dimensional second feature value according to the preset vector quantization algorithm, which is likewise the same in training as in classification. Each MFCC feature vector corresponds to one second feature value.
As in step S105, after the second feature values have been obtained, the histogram of each category-labeled audio data sample is computed from all of its second feature values according to the preset histogram drawing algorithm, which is again the same in training as in classification. Specifically, in one embodiment of the present invention, each second feature value is mapped to a value in a preset second numerical interval, and the histogram of the category-labeled audio data sample is computed with the preset second numerical interval as the horizontal axis and, as the vertical axis, the number of second feature values corresponding to each value in the second numerical interval as a percentage of the total number of second feature values. Note that the second numerical interval may be the same as the first numerical interval described above.
Finally, after the histograms of the category-labeled audio data samples have been obtained, the histograms of samples that share the same audio category are averaged according to the category labels, yielding the histogram feature template corresponding to the audio data of each audio category. Specifically, the computed histograms of all the category-labeled audio data samples are first obtained and then grouped by category. The histogram feature template for each audio category is then computed with the preset second numerical interval as the horizontal axis and, as the vertical axis, the mean of the percentages at each horizontal-axis value across the histograms of the audio data samples of the same category.
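The grouping and averaging that produce the templates can be sketched as follows. The function name `build_templates` and the two-bin toy histograms are hypothetical; real histograms would cover the preset second numerical interval, for example 1024 bins.

```python
from collections import defaultdict

import numpy as np


def build_templates(histograms, labels) -> dict:
    """Average the histograms of samples that share a category label."""
    grouped = defaultdict(list)
    for hist, label in zip(histograms, labels):
        grouped[label].append(np.asarray(hist, dtype=float))
    # The template for a category is the bin-wise mean of its sample histograms.
    return {label: np.mean(group, axis=0) for label, group in grouped.items()}


sample_histograms = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
sample_labels = ["advertisement", "advertisement", "news"]
templates = build_templates(sample_histograms, sample_labels)
print(templates)
```

Since every sample histogram sums to 1, each averaged template also sums to 1 and can be compared directly with the histogram of a clip to be classified.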
In obtaining the histogram feature template corresponding to the audio data of each category as described above, vector quantization converts each MFCC feature vector into a one-dimensional feature value, which reduces the feature dimension and addresses the data sparsity problem of the prior art. Because the feature dimension is reduced, the computation needed to train the feature templates is also reduced, so the templates are trained faster.
Step S107: identify the audio category corresponding to the first histogram feature template as the audio category of the first audio data.
After the first histogram feature template with the greatest similarity to the first histogram has been obtained, the audio category corresponding to that template is identified as the audio category of the first audio data.
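Steps S104 to S107 and the template training can be tied together in a small toy run. Everything concrete here is an assumption made for illustration: synthetic 2-D vectors stand in for MFCC feature vectors, the codebook has four entries instead of 1024, histogram intersection is used as the similarity, and the category names are hypothetical.

```python
import numpy as np


def quantize(vectors, codebook):
    """Index of the nearest codeword for each vector."""
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def histogram(indices, n_bins):
    """Fraction of feature values that fall on each codeword index."""
    return np.bincount(indices, minlength=n_bins) / len(indices)


def classify(vectors, codebook, templates):
    """Return the label whose template best matches the clip's histogram."""
    hist = histogram(quantize(vectors, codebook), len(codebook))
    return max(templates, key=lambda label: np.minimum(hist, templates[label]).sum())


rng = np.random.default_rng(0)
codebook = np.array([[0.0, 0.0], [0.0, 4.0], [4.0, 0.0], [4.0, 4.0]])


def fake_clip(center, n_frames=200):
    """Synthetic per-frame feature vectors clustered around `center`."""
    return rng.normal(loc=center, scale=0.5, size=(n_frames, 2))


# Training: three labeled clips per category, averaged into one template each.
training_clips = {
    "advertisement": [fake_clip([0.0, 0.0]) for _ in range(3)],
    "live_report": [fake_clip([4.0, 4.0]) for _ in range(3)],
}
templates = {
    label: np.mean([histogram(quantize(c, codebook), len(codebook)) for c in clips], axis=0)
    for label, clips in training_clips.items()
}

# Classification of a new, unlabeled clip.
predicted = classify(fake_clip([4.0, 4.0]), codebook, templates)
print(predicted)
```

The run illustrates why the same codebook and histogram interval must be shared by training and classification: the templates and the query histogram are only comparable bin by bin.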
Fig. 2 is a flowchart of an audio data training method according to an embodiment of the present invention. The method comprises the following steps:
Step S201: obtain class-labeled audio data samples;
Step S202: according to a preset windowing algorithm, window each labeled audio data sample along its time axis;
Step S203: extract one MFCC feature vector from the audio data in each window;
Step S204: according to the preset vector quantization algorithm, quantize every extracted MFCC feature vector into a one-dimensional second feature value, where each MFCC feature vector corresponds to one second feature value;
Step S205: according to the preset histogram construction algorithm, compute over all second feature values to obtain the histogram of the labeled audio data sample;
Step S206: according to the class labels of the audio data samples, average the histograms of the samples that share the same audio category to obtain the histogram feature template corresponding to the audio data of each audio category.
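The quantization in step S204 can be illustrated with a nearest-codeword lookup, which is one common form of vector quantization. This is a sketch under assumptions: the codebook, the squared Euclidean distance, and the function name quantize are illustrative and are not taken from the preset algorithm of the embodiment.

```python
def quantize(vector, codebook):
    """Map a feature vector to the index of its nearest codeword.

    The returned index is the one-dimensional feature value.
    codebook: list of codeword vectors (hypothetical, e.g. learned offline).
    """
    if not codebook:
        raise ValueError("codebook must not be empty")

    def squared_distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return min(
        range(len(codebook)),
        key=lambda i: squared_distance(vector, codebook[i]),
    )
```

With a codebook of K codewords, every MFCC feature vector is reduced to an integer in the range 0 to K-1, which is what allows the subsequent histogram to be computed over a fixed numerical interval.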
In the technical solution provided by the embodiments of the present invention, the feature vectors extracted from the audio data whose category is to be identified are quantized into one-dimensional feature values, a histogram of the audio data is computed from the resulting feature values, the similarity between this histogram and the histogram feature template of each audio category is computed, and the audio category of the audio data is identified according to the magnitude of the similarity. Because the technical solution quantizes the extracted feature vectors into one-dimensional feature values and thereby reduces the feature dimensionality, training the feature templates does not require a large amount of labeled data. In addition, the histogram captures the principal characteristics of the audio data over its global range. Together these address the data sparsity problem of the prior art and improve the accuracy of audio data classification. Moreover, because vector quantization reduces the feature dimensionality and thus the workload of training the feature templates, the technical solution improves the training speed of the feature templates and, in turn, the classification speed of the audio data.
Corresponding to the method embodiments above, the present invention also provides an audio data classification device. As shown in Fig. 3, the device comprises:
A first audio data obtaining unit 301, configured to obtain first audio data whose category is to be identified;
A first audio data windowing unit 302, configured to window the first audio data along its time axis according to a preset windowing algorithm;
A first feature vector extraction unit 303, configured to extract one MFCC feature vector from the audio data in each window;
A first vector quantization unit 304, configured to quantize, according to a preset vector quantization algorithm, every extracted MFCC feature vector into a one-dimensional first feature value, where each MFCC feature vector corresponds to one first feature value;
A first histogram calculation unit 305, configured to compute over all first feature values, according to a preset histogram construction algorithm, to obtain the first histogram of the first audio data;
A similarity calculation unit 306, configured to compute the similarity between the first histogram and the histogram feature template corresponding to the audio data of each preset audio category, and to obtain the first histogram feature template, which is the template with the greatest similarity to the first histogram;
An audio category recognition unit 307, configured to identify the audio category corresponding to the first histogram feature template as the audio category of the first audio data.
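The way units 301 to 307 cooperate can be sketched as a single function that chains the stages. All stage functions are passed in as parameters and are hypothetical placeholders; in particular, extract stands in for a real MFCC extractor, which is outside the scope of this sketch.

```python
def classify_audio(samples, frame, extract, quantize, histogram,
                   templates, similarity):
    """Chain the stages of the classification device (illustrative only).

    frame, extract, quantize, histogram, and similarity are caller-supplied
    callables standing in for units 302 to 306.
    """
    windows = frame(samples)                    # unit 302: windowing
    vectors = [extract(w) for w in windows]     # unit 303: one vector per window
    values = [quantize(v) for v in vectors]     # unit 304: one value per vector
    first_histogram = histogram(values)         # unit 305: first histogram
    # Units 306 and 307: pick the category of the most similar template.
    return max(
        templates,
        key=lambda category: similarity(first_histogram, templates[category]),
    )
```

Passing the stages as parameters mirrors the statement that each unit's function may be implemented in the same or in separate software or hardware modules.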
In one embodiment of the present invention, the first audio data windowing unit 302 specifically comprises:
An audio frame obtaining unit, configured to divide the first audio data into audio frames of a preset duration;
An audio frame windowing unit, configured to window the audio frames.
In one embodiment of the present invention, the window applied by the first audio data windowing unit 302 is specifically a rectangular window or a Hamming window.
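Framing followed by windowing can be sketched as below. The Hamming coefficients use the standard formula 0.54 - 0.46·cos(2πn/(N-1)). The hop parameter is an assumption of this sketch: setting hop equal to frame_len gives the non-overlapping frames that the text describes, while a smaller hop gives overlapping frames.

```python
import math


def hamming(n):
    """Standard symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_and_window(samples, frame_len, hop, window="hamming"):
    """Split samples into frames and multiply each frame by the window.

    frame_len and hop are in samples (hypothetical parameters); trailing
    samples that do not fill a whole frame are dropped.
    """
    if window == "rectangular":
        coeffs = [1.0] * frame_len
    elif window == "hamming":
        coeffs = hamming(frame_len)
    else:
        raise ValueError("window must be 'rectangular' or 'hamming'")

    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * c for s, c in zip(chunk, coeffs)])
    return frames
```

A rectangular window leaves the samples of each frame unchanged, whereas a Hamming window tapers the frame edges, which reduces spectral leakage in the subsequent MFCC computation.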
In one embodiment of the present invention, the first histogram calculation unit 305 specifically comprises:
A first feature value mapping unit, configured to map the first feature values onto the values of a preset first numerical interval;
A first histogram calculation subunit, configured to compute the first histogram of the first audio data, with the preset first numerical interval as the horizontal axis and, as the vertical axis, the percentage of the total number of first feature values that falls on each value of the first numerical interval.
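The histogram computation can be sketched as follows, assuming that the first numerical interval is the set of integers 0 to n_bins-1 and that each feature value is already such an integer (for instance a codeword index). The function name and this choice of interval are illustrative assumptions. The vertical axis is returned as a fraction in [0, 1]; multiplying by 100 gives the percentage.

```python
def percentage_histogram(feature_values, n_bins):
    """Fraction of feature values falling on each value 0..n_bins-1."""
    if not feature_values:
        raise ValueError("at least one feature value is required")
    counts = [0] * n_bins
    for value in feature_values:
        if not 0 <= value < n_bins:
            raise ValueError("feature value outside the preset interval")
        counts[value] += 1  # count occurrences per value of the interval
    total = len(feature_values)
    return [count / total for count in counts]
```

Normalizing by the total count makes histograms of recordings with different durations comparable, since a longer recording yields more windows but the same scale on the vertical axis.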
In one embodiment of the present invention, the histogram feature templates corresponding to the audio data of each preset audio category, which the similarity calculation unit 306 uses when computing the similarity with the first histogram, are obtained in advance through training by an audio data training device. One such audio data training device comprises:
An audio data sample obtaining unit, configured to obtain class-labeled audio data samples;
An audio data sample windowing unit, configured to window each labeled audio data sample along its time axis according to the preset windowing algorithm;
A second feature vector extraction unit, configured to extract one MFCC feature vector from the audio data in each window;
A second vector quantization unit, configured to quantize, according to the preset vector quantization algorithm, every extracted MFCC feature vector into a one-dimensional second feature value, where each MFCC feature vector corresponds to one second feature value;
A second histogram calculation unit, configured to compute over all second feature values, according to the preset histogram construction algorithm, to obtain the histogram of the labeled audio data sample;
A histogram feature template obtaining unit, configured to average, according to the class labels of the audio data samples, the histograms of the samples that share the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.
In one embodiment of the present invention, the second histogram calculation unit specifically comprises:
A second feature value mapping unit, configured to map the second feature values onto the values of a preset second numerical interval;
A second histogram calculation subunit, configured to compute the histogram of the labeled audio data sample, with the preset second numerical interval as the horizontal axis and, as the vertical axis, the percentage of the total number of second feature values that falls on each value of the second numerical interval.
In one embodiment of the present invention, the histogram feature template obtaining unit specifically comprises:
A histogram obtaining unit, configured to obtain all computed histograms of the labeled audio data samples;
A histogram classification unit, configured to group the obtained histograms of the labeled audio data samples by category;
A histogram feature template obtaining subunit, configured to compute the histogram feature template corresponding to the audio data of each category, with the preset second numerical interval as the horizontal axis and, as the vertical axis, the mean of the percentages at each value of the horizontal axis across the histograms of samples of the same category.
Fig. 4 is a schematic structural diagram of an audio data training device according to an embodiment of the present invention. The device comprises:
An audio data sample obtaining unit 401, configured to obtain class-labeled audio data samples;
An audio data sample windowing unit 402, configured to window each labeled audio data sample along its time axis according to a preset windowing algorithm;
A second feature vector extraction unit 403, configured to extract one MFCC feature vector from the audio data in each window;
A second vector quantization unit 404, configured to quantize, according to a preset vector quantization algorithm, every extracted MFCC feature vector into a one-dimensional second feature value, where each MFCC feature vector corresponds to one second feature value;
A second histogram calculation unit 405, configured to compute over all second feature values, according to a preset histogram construction algorithm, to obtain the histogram of the labeled audio data sample;
A histogram feature template obtaining unit 406, configured to average, according to the class labels of the audio data samples, the histograms of the samples that share the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.
In the technical solution provided by the embodiments of the present invention, the feature vectors extracted from the audio data whose category is to be identified are quantized into one-dimensional feature values, a histogram of the audio data is computed from the resulting feature values, the similarity between this histogram and the histogram feature template of each audio category is computed, and the audio category of the audio data is identified according to the magnitude of the similarity. Because the technical solution quantizes the extracted feature vectors into one-dimensional feature values and thereby reduces the feature dimensionality, training the feature templates does not require a large amount of labeled data. In addition, the histogram captures the principal characteristics of the audio data over its global range. Together these address the data sparsity problem of the prior art and improve the accuracy of audio data classification. Moreover, because vector quantization reduces the feature dimensionality and thus the workload of training the feature templates, the technical solution improves the training speed of the feature templates and, in turn, the classification speed of the audio data.
For convenience of description, the devices above are described in terms of separate units divided by function. Of course, when the present invention is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
The feasibility of the technical solution of the present invention is demonstrated below with reference to Fig. 5. The red histograms in Fig. 5 are histograms of advertisement audio data, and the black histograms are histograms of news audio data. It can be seen that the advertisement histograms are highly similar to one another and the news histograms are highly similar to one another, whereas the similarity between an advertisement histogram and a news histogram is low. The following conclusion can therefore be drawn: the similarity between histograms of audio data of the same category is high, and the similarity between histograms of audio data of different categories is low. That is, the technical solution of the present invention can classify audio data accurately and quickly even when data is sparse.
From the description of the embodiments above, those skilled in the art can clearly understand that the present invention can be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise" and "include", and any other variants thereof, are intended to be non-exclusive, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device.
The embodiments in this specification are described in a related manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the device embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant details, refer to the description of the method embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (16)

1. An audio data classification method, applied to an electronic device, characterized by comprising:
Obtaining first audio data whose category is to be identified;
Windowing the first audio data along its time axis according to a preset windowing algorithm;
Extracting one MFCC feature vector from the audio data in each window;
Quantizing, according to a preset vector quantization algorithm, every extracted MFCC feature vector into a one-dimensional first feature value, wherein each MFCC feature vector corresponds to one first feature value;
Computing over all first feature values, according to a preset histogram construction algorithm, to obtain a first histogram of the first audio data;
Computing the similarity between the first histogram and the histogram feature template corresponding to the audio data of each preset audio category, and obtaining a first histogram feature template having the greatest similarity to the first histogram;
Identifying the audio category corresponding to the first histogram feature template as the audio category of the first audio data.
2. The method according to claim 1, characterized in that the step of windowing the first audio data along its time axis according to the preset windowing algorithm comprises:
Dividing the first audio data into audio frames of a preset duration;
Windowing the audio frames.
3. The method according to claim 1 or 2, characterized in that the window is specifically a rectangular window or a Hamming window.
4. The method according to claim 1 or 2, characterized in that computing over all first feature values, according to the preset histogram construction algorithm, to obtain the first histogram of the first audio data comprises:
Mapping the first feature values onto the values of a preset first numerical interval;
Computing the first histogram of the first audio data, with the preset first numerical interval as the horizontal axis and, as the vertical axis, the percentage of the total number of first feature values that falls on each value of the first numerical interval.
5. The method according to claim 1, characterized in that the histogram feature template corresponding to the audio data of each preset audio category is obtained in advance through audio data training, and an audio data training method comprises:
Obtaining class-labeled audio data samples;
Windowing each labeled audio data sample along its time axis according to the preset windowing algorithm;
Extracting one MFCC feature vector from the audio data in each window;
Quantizing, according to the preset vector quantization algorithm, every extracted MFCC feature vector into a one-dimensional second feature value, wherein each MFCC feature vector corresponds to one second feature value;
Computing over all second feature values, according to the preset histogram construction algorithm, to obtain the histogram of the labeled audio data sample;
Averaging, according to the class labels of the audio data samples, the histograms of the samples that share the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.
6. The method according to claim 5, characterized in that computing over all second feature values, according to the preset histogram construction algorithm, to obtain the histogram of the labeled audio data sample comprises:
Mapping the second feature values onto the values of a preset second numerical interval;
Computing the histogram of the labeled audio data sample, with the preset second numerical interval as the horizontal axis and, as the vertical axis, the percentage of the total number of second feature values that falls on each value of the second numerical interval.
7. The method according to claim 6, characterized in that averaging, according to the class labels of the audio data samples, the histograms of the samples that share the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category, comprises the following steps:
Obtaining all computed histograms of the labeled audio data samples;
Grouping the obtained histograms of the labeled audio data samples by category;
Computing the histogram feature template corresponding to the audio data of each audio category, with the preset second numerical interval as the horizontal axis and, as the vertical axis, the mean of the percentages at each value of the horizontal axis across the histograms of samples of the same category.
8. An audio data training method, characterized by comprising:
Obtaining class-labeled audio data samples;
Windowing each labeled audio data sample along its time axis according to a preset windowing algorithm;
Extracting one MFCC feature vector from the audio data in each window;
Quantizing, according to a preset vector quantization algorithm, every extracted MFCC feature vector into a one-dimensional second feature value, wherein each MFCC feature vector corresponds to one second feature value;
Computing over all second feature values, according to a preset histogram construction algorithm, to obtain the histogram of the labeled audio data sample;
Averaging, according to the class labels of the audio data samples, the histograms of the samples that share the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.
9. An audio data classification device, characterized by comprising:
A first audio data obtaining unit, configured to obtain first audio data whose category is to be identified;
A first audio data windowing unit, configured to window the first audio data along its time axis according to a preset windowing algorithm;
A first feature vector extraction unit, configured to extract one MFCC feature vector from the audio data in each window;
A first vector quantization unit, configured to quantize, according to a preset vector quantization algorithm, every extracted MFCC feature vector into a one-dimensional first feature value, wherein each MFCC feature vector corresponds to one first feature value;
A first histogram calculation unit, configured to compute over all first feature values, according to a preset histogram construction algorithm, to obtain a first histogram of the first audio data;
A similarity calculation unit, configured to compute the similarity between the first histogram and the histogram feature template corresponding to the audio data of each preset audio category, and to obtain a first histogram feature template having the greatest similarity to the first histogram;
An audio category recognition unit, configured to identify the audio category corresponding to the first histogram feature template as the audio category of the first audio data.
10. The device according to claim 9, characterized in that the first audio data windowing unit specifically comprises:
An audio frame obtaining unit, configured to divide the first audio data into audio frames of a preset duration;
An audio frame windowing unit, configured to window the audio frames.
11. The device according to claim 9 or 10, characterized in that the window applied by the first audio data windowing unit is specifically a rectangular window or a Hamming window.
12. The device according to claim 9 or 10, characterized in that the first histogram calculation unit specifically comprises:
A first feature value mapping unit, configured to map the first feature values onto the values of a preset first numerical interval;
A first histogram calculation subunit, configured to compute the first histogram of the first audio data, with the preset first numerical interval as the horizontal axis and, as the vertical axis, the percentage of the total number of first feature values that falls on each value of the first numerical interval.
13. The device according to claim 9, characterized in that the histogram feature template corresponding to the audio data of each preset audio category, which the similarity calculation unit uses when computing the similarity with the first histogram, is obtained in advance through training by an audio data training device, and an audio data training device comprises:
An audio data sample obtaining unit, configured to obtain class-labeled audio data samples;
An audio data sample windowing unit, configured to window each labeled audio data sample along its time axis according to the preset windowing algorithm;
A second feature vector extraction unit, configured to extract one MFCC feature vector from the audio data in each window;
A second vector quantization unit, configured to quantize, according to the preset vector quantization algorithm, every extracted MFCC feature vector into a one-dimensional second feature value, wherein each MFCC feature vector corresponds to one second feature value;
A second histogram calculation unit, configured to compute over all second feature values, according to the preset histogram construction algorithm, to obtain the histogram of the labeled audio data sample;
A histogram feature template obtaining unit, configured to average, according to the class labels of the audio data samples, the histograms of the samples that share the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.
14. The device according to claim 13, characterized in that the second histogram calculation unit specifically comprises:
A second feature value mapping unit, configured to map the second feature values onto the values of a preset second numerical interval;
A second histogram calculation subunit, configured to compute the histogram of the labeled audio data sample, with the preset second numerical interval as the horizontal axis and, as the vertical axis, the percentage of the total number of second feature values that falls on each value of the second numerical interval.
15. The device according to claim 14, characterized in that the histogram feature template obtaining unit specifically comprises:
A histogram obtaining unit, configured to obtain all computed histograms of the labeled audio data samples;
A histogram classification unit, configured to group the obtained histograms of the labeled audio data samples by category;
A histogram feature template obtaining subunit, configured to compute the histogram feature template corresponding to the audio data of each category, with the preset second numerical interval as the horizontal axis and, as the vertical axis, the mean of the percentages at each value of the horizontal axis across the histograms of samples of the same category.
16. An audio data training device, characterized by comprising:
An audio data sample obtaining unit, configured to obtain class-labeled audio data samples;
An audio data sample windowing unit, configured to window each labeled audio data sample along its time axis according to a preset windowing algorithm;
A second feature vector extraction unit, configured to extract one MFCC feature vector from the audio data in each window;
A second vector quantization unit, configured to quantize, according to a preset vector quantization algorithm, every extracted MFCC feature vector into a one-dimensional second feature value, wherein each MFCC feature vector corresponds to one second feature value;
A second histogram calculation unit, configured to compute over all second feature values, according to a preset histogram construction algorithm, to obtain the histogram of the labeled audio data sample;
A histogram feature template obtaining unit, configured to average, according to the class labels of the audio data samples, the histograms of the samples that share the same audio category, to obtain the histogram feature template corresponding to the audio data of each audio category.
CN201410817745.7A 2014-12-24 2014-12-24 Method and device for classifying voice data Pending CN104462537A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410817745.7A CN104462537A (en) 2014-12-24 2014-12-24 Method and device for classifying voice data

Publications (1)

Publication Number Publication Date
CN104462537A true CN104462537A (en) 2015-03-25

Family

ID=52908572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410817745.7A Pending CN104462537A (en) 2014-12-24 2014-12-24 Method and device for classifying voice data

Country Status (1)

Country Link
CN (1) CN104462537A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090060211A1 (en) * 2007-08-30 2009-03-05 Atsuhiro Sakurai Method and System for Music Detection
CN103325382A (en) * 2013-06-07 2013-09-25 大连民族学院 Method for automatically identifying Chinese national minority traditional instrument audio data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KUNIO KASHINO.ETC: "A Quick Search Method for Audio and Video Signals Based on Histogram Pruning", 《IEEE TRANSACTIONS ON MULTIMEDIA》 *
张长水等: "《机器学习及其应用 2013》", 31 October 2013, 北京:清华大学出版社 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105657535A (en) * 2015-12-29 2016-06-08 北京搜狗科技发展有限公司 Audio recognition method and device
CN105657535B (en) * 2015-12-29 2018-10-30 北京搜狗科技发展有限公司 A kind of audio identification methods and device
CN105654964A (en) * 2016-01-20 2016-06-08 司法部司法鉴定科学技术研究所 Recording audio device source determination method and device
CN107293308B (en) * 2016-04-01 2019-06-07 腾讯科技(深圳)有限公司 A kind of audio-frequency processing method and device
CN107293308A (en) * 2016-04-01 2017-10-24 腾讯科技(深圳)有限公司 A kind of audio-frequency processing method and device
CN105975568A (en) * 2016-04-29 2016-09-28 腾讯科技(深圳)有限公司 Audio processing method and apparatus
CN105975568B (en) * 2016-04-29 2020-04-03 腾讯科技(深圳)有限公司 Audio processing method and device
CN106910494A (en) * 2016-06-28 2017-06-30 阿里巴巴集团控股有限公司 A kind of audio identification methods and device
WO2018001125A1 (en) * 2016-06-28 2018-01-04 阿里巴巴集团控股有限公司 Method and device for audio recognition
TWI659410B (en) * 2016-06-28 2019-05-11 香港商阿里巴巴集團服務有限公司 Audio recognition method and device
US11133022B2 (en) 2016-06-28 2021-09-28 Advanced New Technologies Co., Ltd. Method and device for audio recognition using sample audio and a voting matrix
US10910000B2 (en) 2016-06-28 2021-02-02 Advanced New Technologies Co., Ltd. Method and device for audio recognition using a voting matrix
CN106919662A (en) * 2017-02-14 2017-07-04 复旦大学 Music recognition method and system
CN106919662B (en) * 2017-02-14 2021-08-31 复旦大学 Music identification method and system
CN108231091A (en) * 2018-01-24 2018-06-29 广州酷狗计算机科技有限公司 Method and apparatus for detecting whether the left and right channels of audio are consistent
CN108320756A (en) * 2018-02-07 2018-07-24 广州酷狗计算机科技有限公司 Method and apparatus for detecting whether audio is pure music audio
CN110516109A (en) * 2019-08-30 2019-11-29 腾讯科技(深圳)有限公司 Method, device and storage medium for associating music labels
CN115910042A (en) * 2023-01-09 2023-04-04 百融至信(北京)科技有限公司 Method and apparatus for identifying information type of formatted audio file
CN115910042B (en) * 2023-01-09 2023-05-05 百融至信(北京)科技有限公司 Method and device for identifying information type of formatted audio file

Similar Documents

Publication Publication Date Title
CN104462537A (en) Method and device for classifying voice data
CN110147726B (en) Service quality inspection method and device, storage medium and electronic device
CN105912625B (en) Entity classification method and system for linked data
CN111081279A (en) Voice emotion fluctuation analysis method and device
MX2016003981A (en) Classifier training method, type recognition method, and apparatus.
CN108932950A (en) Acoustic scene recognition method based on label augmentation and multi-spectrogram fusion
CN110019779B (en) Text classification method, model training method and device
CN101894548A (en) Modeling method and modeling device for language identification
CN109005451B (en) Video strip splitting method based on deep learning
CN108615532A (en) Classification method and device applied to acoustic scenes
CN103793447A (en) Method and system for estimating semantic similarity among music and images
KR101667557B1 (en) Device and method for sound classification in real time
CN106531195B (en) Dialogue conflict detection method and device
CN111159332A (en) Text multi-intention identification method based on bert
CN105912525A (en) Sentiment classification method for semi-supervised learning based on theme characteristics
CN112331188A (en) Voice data processing method, system and terminal equipment
CN112926621A (en) Data labeling method and device, electronic equipment and storage medium
CN105609116A (en) Automatic recognition method for speech emotion dimension regions
CN105678244A (en) Approximate video retrieval method based on improved edit distance
CN116524939A (en) ECAPA-TDNN-based automatic identification method for bird song species
CN105550278A (en) Webpage region recognition algorithm based on deep learning
Abidin et al. Enhanced LBP texture features from time frequency representations for acoustic scene classification
CN115759033A (en) Method, device and equipment for processing track data
CN106710588B (en) Speech data sentence recognition method, device and system
EP4071764A3 (en) Information processing program, information processing apparatus, and information processing method for determining properties of molecules

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150325