CN104700833A

CN104700833A - Big data speech classification method

Info

Publication number: CN104700833A
Application number: CN201410844027.9A
Authority: CN
Inventors: 高辉; 尚成辉
Original assignee: Wuhu Leruisi Information Consulting Co Ltd
Current assignee: Wuhu Leruisi Information Consulting Co Ltd
Priority date: 2014-12-29
Filing date: 2014-12-29
Publication date: 2015-06-10

Abstract

The invention discloses a big data speech classification method comprising the following steps: 1) collecting speech samples as a training set; 2) finding the spectral matrix that is optimal for big data speech classification; 3) performing spectral analysis on unlabeled data; and 4) classifying the big data speech by frequency band according to the spectral data. The method can effectively find the feature information of different speech data in big data scenarios, so that the various kinds of related data are classified effectively. It also reduces storage cost during training, and its classification accuracy is higher than that of the prior art.

Description

Big data speech classification method

Technical field

The present invention relates to a big data speech classification method.

Background

With the rapid development of the mobile Internet, digital cameras, smartphones and tablet computers have become part of more and more people's lives, making it easy to produce large amounts of personal voice information. Managing speech data by time and directory is a common approach, but it provides no semantic-level organization of the speech. Supervised learning is therefore used: a speech classification model is learned from manually labeled data, and unlabeled speech is then classified automatically. Because the feature dimensionality of speech is usually very high, Fourier transform methods help improve recognition performance.
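As a rough illustration of the Fourier-based spectral features referred to above, the sketch below computes a short-time magnitude spectrum with NumPy. The frame length, hop size and Hann window are hypothetical choices for illustration only; the patent does not specify any of them.

```python
import numpy as np

def magnitude_spectrum(signal, frame_len=256, hop=128):
    """Short-time magnitude spectrum: split the signal into overlapping frames,
    apply a Hann window, and return |FFT| of each frame (one row per frame).
    Assumption: frame_len and hop are illustrative defaults, not from the patent."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies: frame_len // 2 + 1 bins
    return np.abs(np.fft.rfft(frames, axis=1))
```

For a 1 s, 1000 Hz tone sampled at 8 kHz, each bin spans 8000 / 256 = 31.25 Hz, so the frame-averaged spectrum should peak at bin 32.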

Traditional global Fourier transform methods are mainly linear, and among linear methods, linear discriminant analysis (LDA) is widely used for pattern classification. Fisher discriminant analysis separates classes mainly by maximizing the between-class distance while minimizing the within-class distance between samples. However, big data image classification faces difficulties such as a huge number of categories and a huge number of samples to be classified. LDA is costly to apply to big data: to reach a given classification performance, it needs a large number of manually labeled samples, which greatly increases the development cost of speech classification software.
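The Fisher criterion described above (maximize between-class scatter while minimizing within-class scatter) can be written as the generalized eigenvalue problem S_b w = λ S_w w, keeping the eigenvectors with the largest eigenvalues. The following is a minimal NumPy sketch of standard Fisher LDA, not the patent's method; it assumes samples are the rows of X and adds a small hypothetical ridge term so that S_w stays positive definite.

```python
import numpy as np

def fisher_lda(X, y, n_components=1, ridge=1e-6):
    """Fisher LDA: return a (d, n_components) projection maximizing
    between-class scatter relative to within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += ridge * np.eye(d)  # assumption: ridge keeps Sw invertible
    # Solve Sb w = lam Sw w by Cholesky reduction (Sw = L L^T) to a
    # standard symmetric eigenproblem in u = L^T w.
    L = np.linalg.cholesky(Sw)
    Linv = np.linalg.inv(L)
    evals, U = np.linalg.eigh(Linv @ Sb @ Linv.T)
    W = Linv.T @ U                   # map eigenvectors back to w
    order = np.argsort(evals)[::-1]  # largest Fisher ratios first
    return W[:, order[:n_components]]
```

With two classes whose means differ along the first axis while the second axis carries only noise, the returned direction should point mainly along the first axis.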

It is therefore important to find an automatic speech classification method that meets these requirements while needing only a small number of labeled samples.

Summary of the invention

The technical problem to be solved by the invention is to provide a big data speech classification method that reduces software development cost and classifies, distinguishes and processes large amounts of speech data reasonably and effectively.

The technical solution adopted by the invention to solve the above problem is as follows: a big data speech classification method, comprising the following steps:

1) collect speech samples as a training set;

2) find the spectral matrix that is optimal for big data speech classification;

3) perform spectral analysis on unlabeled data;

4) classify the big data speech by frequency band according to the spectral data.
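The four steps can be pictured with a toy pipeline. This is only a sketch under stated assumptions: the patent does not define its frequency bands or its classifier, so the code uses hypothetical equal-width bands over the FFT power spectrum and a hypothetical nearest-centroid rule, and it omits the optimal spectral matrix of step 2 (band features are used unprojected).

```python
import numpy as np

def band_energies(signal, n_bands=8):
    """Log energy in n_bands equal-width frequency bands of the power spectrum.
    Assumption: equal-width bands; the patent does not define its bands."""
    power = np.abs(np.fft.rfft(np.asarray(signal, dtype=float))) ** 2
    bands = np.array_split(power, n_bands)
    return np.log(np.array([b.sum() for b in bands]) + 1e-12)

def train_centroids(signals, labels, n_bands=8):
    """Step 1 stand-in: average band-energy vector per labeled class."""
    feats = np.array([band_energies(s, n_bands) for s in signals])
    labels = np.asarray(labels)
    return {c: feats[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(signal, centroids, n_bands=8):
    """Steps 3-4 stand-in: spectral analysis of an unlabeled signal, then
    assignment to the class whose band-energy centroid is nearest."""
    f = band_energies(signal, n_bands)
    return min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))
```

With a few low and high noisy tones as the labeled training set, an unlabeled 350 Hz tone should be assigned to the low class and a 2800 Hz tone to the high class.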

Preferably, finding the spectral matrix that is optimal for big data speech classification comprises the following steps:

Step 1, establish a local optimization objective function;

Step 2, establish a global optimization objective function;

Step 3, apply the Fourier transform algorithm: convert the new global optimization objective into a generalized eigenvalue problem; the spectral matrix that is optimal for big data speech classification is given by formula x(n, k) corresponding to the first m smallest eigenvalues.
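The patent does not state the local and global objective functions or formula x(n, k). One well-known formulation that matches the description (a local objective minimized under a global constraint, solved as a generalized eigenvalue problem keeping the eigenvectors of the m smallest eigenvalues) is Locality Preserving Projections. The sketch below uses it purely as a stand-in; the k-nearest-neighbour graph, the heat-kernel width t and the ridge term are all hypothetical parameters.

```python
import numpy as np

def knn_heat_graph(X, k=5, t=1.0):
    """Symmetric k-nearest-neighbour affinity matrix with heat-kernel weights."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # never pick a point as its own neighbour
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / t)
    return np.maximum(W, W.T)

def smallest_generalized_eigvecs(A, B, m):
    """Solve A v = lam B v (A symmetric, B symmetric positive definite) and
    return the m smallest eigenvalues with their eigenvectors."""
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    evals, U = np.linalg.eigh(Linv @ A @ Linv.T)  # eigh sorts ascending
    return evals[:m], Linv.T @ U[:, :m]

def lpp_projection(X, m=2, k=5, t=1.0, ridge=1e-6):
    """Assumption: LPP as a stand-in for the patent's unspecified objectives."""
    X = np.asarray(X, dtype=float)
    W = knn_heat_graph(X, k, t)
    D = np.diag(W.sum(axis=1))
    A = X.T @ (D - W) @ X                          # local objective (minimized)
    B = X.T @ D @ X + ridge * np.eye(X.shape[1])  # global constraint
    _, P = smallest_generalized_eigvecs(A, B, m)
    return P  # d x m projection matrix
```

The eigen-solver can be checked on diagonal matrices, where the generalized eigenvalues are simply the ratios of the diagonal entries.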

The beneficial effects of the invention are as follows: in big data scenarios, the feature information of different speech data can be found effectively, so that the various kinds of related data are classified effectively; storage cost during training is reduced, and classification accuracy is higher than that of the prior art.

Brief description of the drawings

Fig. 1 is a schematic diagram of the overall structure of the present invention.

Embodiment

The principles and features of the invention are described below with reference to the accompanying drawing. The examples serve only to explain the invention and are not intended to limit its scope.

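As shown in Fig. 1, a big data speech classification method comprises the following steps:

1) collect speech samples as a training set;

2) find the spectral matrix that is optimal for big data speech classification;

3) perform spectral analysis on unlabeled data;

4) classify the big data speech by frequency band according to the spectral data.

Finding the spectral matrix that is optimal for big data speech classification comprises the following steps:

Step 1, establish a local optimization objective function;

Step 2, establish a global optimization objective function;

Step 3, apply the Fourier transform algorithm: convert the new global optimization objective into a generalized eigenvalue problem; the spectral matrix that is optimal for big data speech classification is given by formula x(n, k) corresponding to the first m smallest eigenvalues.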

Compared with the prior art, the invention has the advantage that, in big data scenarios, it effectively finds the feature information of different speech data and thus classifies the various kinds of related data effectively, reduces storage cost during training, and achieves higher classification accuracy than the prior art.

The foregoing is only a preferred embodiment of the invention and is not intended to limit it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

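1. A big data speech classification method, characterized by comprising the following steps:

1) collecting speech samples as a training set;

2) finding the spectral matrix that is optimal for big data speech classification;

3) performing spectral analysis on unlabeled data;

4) classifying the big data speech by frequency band according to the spectral data.

2. The big data speech classification method according to claim 1, characterized in that finding the spectral matrix that is optimal for big data speech classification comprises the following steps:

Step 1, establishing a local optimization objective function;

Step 2, establishing a global optimization objective function;

Step 3, applying the Fourier transform algorithm: converting the new global optimization objective into a generalized eigenvalue problem, the spectral matrix that is optimal for big data speech classification being given by formula x(n, k) corresponding to the first m smallest eigenvalues.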