CN105895110A

CN105895110A - Method and device for classifying audio files

Info

Publication number: CN105895110A
Application number: CN201610512234.3A
Authority: CN
Inventors: 黄瑛; 兰细鹏; 胡明清; 王涛
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2016-06-30
Filing date: 2016-06-30
Publication date: 2016-08-24

Abstract

The embodiment of the invention discloses a method and a device for classifying audio files. The method comprises the following steps of pre-classifying music, and obtaining a spectrogram of each type of music; for a target audio file to be classified, obtaining a spectrogram of the target audio file; according to the similarity of the spectrogram of the target audio file and the spectrogram of each type of music, determining the type of the target audio file. The method disclosed by the embodiment is used for classifying the audio files according to the spectrograms.

Description

Method and device for classifying audio files

Technical field

The present invention relates to the field of audio technology, and in particular to a method and device for classifying audio files.

Background art

In the era of internet multimedia, people's demand for music is becoming increasingly diverse. Music classification helps people label music, for example by tagging different music genres with different moods, and makes it easier for users to find music that matches their interests.

Traditional music classification methods extract features from the audio and then classify them with a classifier. Audio features include: time-domain features, comprising short-time average energy, linear prediction coefficients, zero-crossing rate and derived features; frequency-domain features, comprising Mel coefficients, LPC cepstral coefficients and entropy features; and time-frequency features, comprising wavelet coefficients. In this process, extracting and selecting effective audio features is a relatively complex task.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a method and device for classifying audio files, so that audio files can be classified by means of spectrograms.

To achieve the above purpose, an embodiment of the present invention discloses a method for classifying audio files, in which music is classified in advance and a spectrogram of each class of music is obtained; the method includes:

for a target audio file to be classified, obtaining a spectrogram of the target audio file;

determining the class of the target audio file according to the similarity between the spectrogram of the target audio file and the spectrogram of each class of music.

Preferably, obtaining a spectrogram of the target audio file for a target audio file to be classified includes:

for a target audio file to be classified, dividing the target audio file into segments;

obtaining a spectrogram of each audio segment respectively.

Preferably, determining the class of the target audio file according to the similarity between the spectrogram of the target audio file and the spectrogram of each class of music includes:

using a neural network to determine the class of each audio segment according to the similarity between the spectrogram of each audio segment and the spectrogram of each class of music;

determining the class of the target audio file according to the classes of all the audio segments.

Preferably, obtaining a spectrogram of each audio segment respectively includes:

for each audio segment, performing a Fourier transform on each audio frame of the audio segment to obtain the spectrum values of the audio frame;

generating the spectrogram of the audio segment from the spectrum values of each of its audio frames.

Preferably, the neural network is:

a convolutional neural network.

To achieve the above purpose, an embodiment of the present invention discloses a device for classifying audio files, in which music is classified in advance and a spectrogram of each class of music is obtained; the device includes:

an obtaining module, configured to obtain, for a target audio file to be classified, a spectrogram of the target audio file;

a determining module, configured to determine the class of the target audio file according to the similarity between the spectrogram of the target audio file and the spectrogram of each class of music.

Preferably, the obtaining module includes:

a segmentation submodule, configured to divide a target audio file to be classified into segments;

an obtaining submodule, configured to obtain a spectrogram of each audio segment respectively.

Preferably, the determining module is specifically configured to:

Preferably, the obtaining submodule is specifically configured to:

Preferably, the neural network is:

a convolutional neural network.

As can be seen from the above technical solutions, the embodiments of the present invention provide a method and device for classifying audio files: music is classified in advance and a spectrogram of each class of music is obtained; for a target audio file to be classified, a spectrogram of the target audio file is obtained; and the class of the target audio file is determined according to the similarity between the spectrogram of the target audio file and the spectrogram of each class of music.

It can be seen that the class of the target audio file is determined from the similarity between its spectrogram and the spectrogram of each class of music, so that audio files are classified by means of spectrograms.

Of course, a product or method implementing the present invention need not achieve all of the above advantages at the same time.

Brief description of the drawings

In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a method for classifying audio files provided by an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a device for classifying audio files provided by an embodiment of the present invention;

Fig. 3 is a spectrogram of Jazz provided by an embodiment of the present invention;

Fig. 4 is a spectrogram of Blues provided by an embodiment of the present invention;

Fig. 5 is a spectrogram of Metal provided by an embodiment of the present invention;

Fig. 6 is a spectrogram of Pop provided by an embodiment of the present invention;

Fig. 7 is a spectrogram of Hip-hop provided by an embodiment of the present invention.

Detailed description of the invention

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

The method for classifying audio files provided by an embodiment of the present invention is described in detail first.

Referring to Fig. 1, which is a schematic flowchart of a method for classifying audio files provided by an embodiment of the present invention, music is classified in advance and a spectrogram of each class of music is obtained; the method may include the following steps:

S101: for a target audio file to be classified, obtain a spectrogram of the target audio file;

Specifically, the audio file may be a music file. Music is classified in advance, for example into Jazz, Blues, Metal, Pop, Hip-hop and so on, and a spectrogram of each class of music is obtained. The spectrograms of these classes are shown in Fig. 3, Fig. 4, Fig. 5, Fig. 6 and Fig. 7, respectively. A target audio file to be classified may be divided into segments. For example, a 60 s piece of music is divided from the beginning into one segment every 5 s, giving 12 music segments.
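The segmentation step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_into_segments`, the 8 kHz sample rate and the decision to drop a trailing partial segment are all assumptions, since the description only gives the 60 s / 5 s example.

```python
def split_into_segments(samples, sample_rate, segment_seconds=5):
    """Split a sequence of audio samples into fixed-length segments.

    A trailing remainder shorter than one segment is dropped (an assumption;
    the description does not say how a partial final segment is handled).
    """
    segment_len = int(sample_rate * segment_seconds)
    if segment_len <= 0:
        raise ValueError("segment length must be positive")
    count = len(samples) // segment_len
    return [samples[i * segment_len:(i + 1) * segment_len] for i in range(count)]


# A 60 s clip at a hypothetical 8 kHz sample rate yields 12 five-second segments.
sample_rate = 8000
clip = [0.0] * (60 * sample_rate)
segments = split_into_segments(clip, sample_rate)
print(len(segments))  # 12
```

Starting from the beginning of the file and cutting every 5 s matches the example above; overlapping segments would be an equally plausible variant that the description does not rule out.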

Specifically, for one of the audio segments, the sound signal in the segment may be windowed and divided into frames according to a given window length and window shift; a fast Fourier transform is applied to the samples of each frame to obtain the spectrum values of the segment; the spectrum values may then be normalized and converted into values between 0 and 255 to generate the spectrogram of the segment. Every audio segment is processed in this way to obtain its spectrogram, thereby obtaining the spectrogram of the target audio file. Windowing, framing and the fast Fourier transform of sound signals are prior art and are not described further here.
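The windowing, framing, transform and normalization steps above might look like the sketch below. It relies on several assumptions the description does not fix: a Hann window, a window length of 256 samples with a shift of 128, magnitude spectra, and linear min-max scaling to the 0 to 255 range. The helper names `fft` and `spectrogram` are illustrative.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrogram(samples, window_len=256, hop=128):
    """Return a list of frames, each a list of 0-255 magnitudes per frequency bin."""
    # Hann window (assumed; the description only says "windowing").
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (window_len - 1))
              for i in range(window_len)]
    frames = []
    for start in range(0, len(samples) - window_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(window_len)]
        spectrum = fft(frame)
        # Keep the non-redundant half of the spectrum of a real-valued signal.
        frames.append([abs(c) for c in spectrum[:window_len // 2 + 1]])
    if not frames:
        return []
    # Linear min-max scaling to 0..255 (assumed form of the normalization).
    lo = min(min(f) for f in frames)
    hi = max(max(f) for f in frames)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return [[int(round((v - lo) * scale)) for v in f] for f in frames]


# A 1 kHz tone sampled at 8 kHz: the peak sits at bin 1000 / (8000 / 256) = 32.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(2048)]
image = spectrogram(tone)
peak_bin = max(range(len(image[0])), key=lambda b: image[0][b])
print(len(image), len(image[0]), peak_bin)  # 15 129 32
```

Each inner list is one column of the grayscale image. In practice a library FFT and a logarithmic magnitude scale are common choices, but the description does not specify either.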

S102: determine the class of the target audio file according to the similarity between the spectrogram of the target audio file and the spectrogram of each class of music.

Specifically, the spectrograms of music of the same type have similar textures, and the human eye can distinguish different music classes to a certain extent from the texture. In practice, a neural network may be used to determine the class of each audio segment according to the similarity between the spectrogram of each audio segment and the spectrogram of each class of music; the class of the target audio file is then determined from the classes of all the audio segments. This neural network may be a convolutional neural network (CNN).
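To make the idea of texture-sensitive filtering concrete, the sketch below shows the basic operation inside a convolutional layer: a small kernel slid over a grayscale spectrogram patch, followed by a ReLU. It is only an illustration of the building block, not the network used in the embodiment; the kernel values, the patch and the name `conv2d_relu` are hypothetical, and a real CNN would learn its kernels from labelled spectrograms.

```python
def conv2d_relu(image, kernel):
    """Valid 2-D cross-correlation of image with kernel, followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = sum(image[r + i][c + j] * kernel[i][j]
                      for i in range(kh) for j in range(kw))
            row.append(max(0, acc))  # ReLU keeps only positive responses
        out.append(row)
    return out


# Toy 4x4 spectrogram patch with a dark-to-bright vertical edge.
patch = [[0, 0, 255, 255] for _ in range(4)]
# Hand-written kernel that responds to dark-to-bright transitions along a row.
edge_kernel = [[-1, 1], [-1, 1]]
feature_map = conv2d_relu(patch, edge_kernel)
print(feature_map)  # [[0, 510, 0], [0, 510, 0], [0, 510, 0]]
```

The feature map is non-zero only where the patch contains the pattern the kernel matches, which is how stacked convolutional layers can pick up the texture differences between the spectrograms of different genres.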

For example, majority voting may be used: for the 12 music segments of one song, a convolutional neural network determines that 9 segments are Jazz, 2 segments are Blues and 1 segment is Pop; the final classification result is then Jazz, so the class of this piece of music is determined to be Jazz. A convolutional neural network (CNN) is a deep-learning-based algorithm for image classification and object detection; it is prior art and is not described further here.
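The voting step in this example can be sketched in a few lines. The function name `majority_vote` and the tie-breaking rule (the label seen first wins) are assumptions; the description does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(segment_labels):
    """Return the most frequent label; ties go to the label seen first."""
    if not segment_labels:
        raise ValueError("no segment labels to vote on")
    return Counter(segment_labels).most_common(1)[0][0]


# The 12-segment example from the description: 9 Jazz, 2 Blues, 1 Pop.
labels = ["Jazz"] * 9 + ["Blues"] * 2 + ["Pop"]
print(majority_vote(labels))  # Jazz
```

Voting over per-segment predictions means that a few misclassified segments do not change the label assigned to the whole file.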

It can be seen that determining the class of the target audio file from the similarity between its spectrogram and the spectrogram of each class of music does not involve the complex process of audio feature extraction and selection, so that audio files are classified by means of spectrograms.

Referring to Fig. 2, which is a schematic structural diagram of a device for classifying audio files provided by an embodiment of the present invention and corresponds to the flow shown in Fig. 1, music is classified in advance and a spectrogram of each class of music is obtained; the device may include an obtaining module 201 and a determining module 202.

The obtaining module 201 is configured to obtain, for a target audio file to be classified, a spectrogram of the target audio file;

Specifically, the obtaining module 201 may include a segmentation submodule and an obtaining submodule (not shown);

Specifically, the obtaining submodule may be configured to:

for each audio segment, perform a Fourier transform on each audio frame of the audio segment to obtain the spectrum values of the audio frame; and generate the spectrogram of the audio segment from the spectrum values of each of its audio frames.

The determining module 202 is configured to determine the class of the target audio file according to the similarity between the spectrogram of the target audio file and the spectrogram of each class of music.

Specifically, the determining module 202 may be configured to:

use a neural network to determine the class of each audio segment according to the similarity between the spectrogram of each audio segment and the spectrogram of each class of music; and determine the class of the target audio file according to the classes of all the audio segments.

Specifically, the neural network may be a convolutional neural network.

It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.

The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the description of the method embodiment.

A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiment can be carried out by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc.

The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims

1. A method for classifying audio files, characterized in that music is classified in advance and a spectrogram of each class of music is obtained; the method includes:

2. The method according to claim 1, wherein obtaining a spectrogram of the target audio file for a target audio file to be classified includes:

obtaining a spectrogram of each audio segment respectively.

3. The method according to claim 2, wherein determining the class of the target audio file according to the similarity between the spectrogram of the target audio file and the spectrogram of each class of music includes:

4. The method according to claim 2, wherein obtaining a spectrogram of each audio segment respectively includes:

5. The method according to claim 3, wherein the neural network is:

a convolutional neural network.

6. A device for classifying audio files, characterized in that music is classified in advance and a spectrogram of each class of music is obtained; the device includes:

7. The device according to claim 6, wherein the obtaining module includes:

8. The device according to claim 7, wherein the determining module is specifically configured to:

9. The device according to claim 7, wherein the obtaining submodule is specifically configured to:

10. The device according to claim 8, wherein the neural network is:

a convolutional neural network.