CN104036788B

CN104036788B - Method and device for identifying the sound quality of an audio file

Info

Publication number: CN104036788B
Application number: CN201410235733.3A
Authority: CN
Inventors: 田彪
Original assignee: Beijing Yinzhibang Culture Technology Co Ltd
Current assignee: Shenzhen Taile Culture Technology Co.,Ltd.
Priority date: 2014-05-29
Filing date: 2014-05-29
Publication date: 2016-10-05
Anticipated expiration: 2034-05-29
Also published as: CN104036788A

Abstract

The present invention provides a method and a device for identifying the sound quality of an audio file. In embodiments of the invention, a target audio file to be identified is obtained, and at least one of a time domain waveform feature and a frequency domain spectral line feature of the target audio file is obtained from it. Based on at least one of the time domain waveform feature and the frequency domain spectral line feature, the sound quality of the target audio file is identified as a first sound quality or a second sound quality, the first sound quality being higher than the second sound quality. In this way, genuinely high-quality audio files can be provided to users, so that users can enjoy genuinely high-quality audio.

Description

Method and device for identifying the sound quality of an audio file

[Technical field]

The present invention relates to audio processing technology, and in particular to a method and a device for identifying the sound quality of an audio file.

[Background]

The sound quality of an audio file refers to the fidelity of the original audio data after compression. A high-quality audio file can restore the original audio data completely, without any distortion; a low-quality audio file cannot restore the original audio data completely, and partial distortion results. Conversion techniques now exist that can convert a low-quality audio file into a pseudo-high-quality audio file. In fact, the sound quality of such a pseudo-high-quality audio file is the same as that of the audio file before conversion, and it is not genuinely high quality. Users who obtain these pseudo-high-quality audio files through music applications cannot enjoy genuinely high sound quality at all, which can damage the brand image of those applications and may even lead to legal disputes.

Therefore, in order to provide users with genuinely high-quality audio files that they can enjoy, effectively identifying the sound quality of audio files is a problem in urgent need of a solution.

[Summary of the invention]

Aspects of the present invention provide a method and a device for identifying the sound quality of an audio file, so as to identify the sound quality of audio files.

In one aspect of the present invention, a method for identifying the sound quality of an audio file is provided, including:

obtaining a target audio file to be identified;

obtaining, according to the target audio file, at least one of a time domain waveform feature of the target audio file and a frequency domain spectral line feature of the target audio file; and

identifying, according to at least one of the time domain waveform feature and the frequency domain spectral line feature, the sound quality of the target audio file as a first sound quality or a second sound quality, the first sound quality being higher than the second sound quality.

In the aspect above and any possible implementation, an implementation is further provided in which obtaining, according to the target audio file, at least one of the time domain waveform feature of the target audio file and the frequency domain spectral line feature of the target audio file includes:

determining the number of channels of the target audio file;

decoding the data blocks of the target audio file to obtain original audio data; and

obtaining, according to the number of channels and the original audio data, the channel audio data corresponding to each channel.

In the aspect above and any possible implementation, an implementation is further provided in which identifying, according to at least one of the time domain waveform feature and the frequency domain spectral line feature, the sound quality of the target audio file as the first sound quality or the second sound quality includes:

if the number of channels is greater than or equal to 2, obtaining, according to the channel audio data corresponding to each channel, first channel audio data and second channel audio data corresponding to at least two channels;

adding the first channel audio data and the second channel audio data to obtain mixed channel audio data;

if the mixed channel audio data is greater than or equal to the first channel audio data/N or the second channel audio data/M, identifying the sound quality of the target audio file as the first sound quality;

if the mixed channel audio data is less than the first channel audio data/N or the second channel audio data/M, identifying the sound quality of the target audio file as the second sound quality; where

N is a number greater than 1, and M is a number greater than 1.

If the pairwise differences among the values of a specified number of consecutive items of target channel audio data are all less than or equal to a first amplitude threshold, the sound quality of the target audio file is identified as the second sound quality, the target channel audio data being the channel audio data corresponding to any one of the channels; or

if the difference between the values of two consecutive items of target channel audio data is greater than or equal to a second amplitude threshold, and the signs of the two values are opposite, the sound quality of the target audio file is identified as the second sound quality, the target channel audio data being the channel audio data corresponding to any one of the channels.

In the aspect above and any possible implementation, an implementation is further provided in which, after obtaining, according to the number of channels and the original audio data, the channel audio data corresponding to each channel, the method further includes:

dividing target channel audio data into frames to obtain at least one frame of audio data, the target channel audio data being the channel audio data corresponding to any one of the channels;

applying a frequency domain transform to the at least one frame of audio data to obtain the frequency domain data corresponding to each frame of audio data;

obtaining, according to the frequency domain data corresponding to each frame of audio data, the energy component of that frequency domain data at each frequency; and

if, at at least one same frequency, the pairwise differences among the energy components of the frequency domain data corresponding to the frames of audio data are less than or equal to an energy threshold, identifying the sound quality of the target audio file as the second sound quality.

In the aspect above and any possible implementation, an implementation is further provided in which, before obtaining the target audio file to be identified, the method further includes:

obtaining a format parameter of a candidate audio file; and

according to the format parameter, either determining that the candidate audio file is the target audio file, or identifying the sound quality of the candidate audio file as the second sound quality.

In the aspect above and any possible implementation, an implementation is further provided in which the format parameter includes at least one of compression format, sample rate, sampling depth and bit rate.

In another aspect of the present invention, a device for identifying the sound quality of an audio file is provided, including:

an acquiring unit, configured to obtain a target audio file to be identified;

a feature unit, configured to obtain, according to the target audio file, at least one of a time domain waveform feature of the target audio file and a frequency domain spectral line feature of the target audio file; and

a recognition unit, configured to identify, according to at least one of the time domain waveform feature and the frequency domain spectral line feature, the sound quality of the target audio file as a first sound quality or a second sound quality, the first sound quality being higher than the second sound quality.

In the aspect above and any possible implementation, an implementation is further provided in which the feature unit is specifically configured to:

determine the number of channels of the target audio file;

decode the data blocks of the target audio file to obtain original audio data; and

In the aspect above and any possible implementation, an implementation is further provided in which the recognition unit is specifically configured to:

add the first channel audio data and the second channel audio data to obtain mixed channel audio data; and

N is a number greater than 1, and M is a number greater than 1.

In the aspect above and any possible implementation, an implementation is further provided in which the feature unit is further configured to:

divide target channel audio data into frames to obtain at least one frame of audio data, the target channel audio data being the channel audio data corresponding to any one of the channels; and

obtain, according to the frequency domain data corresponding to each frame of audio data, the energy component of that frequency domain data at each frequency; and

In the aspect above and any possible implementation, an implementation is further provided in which the recognition unit is further configured to:

obtain a format parameter of a candidate audio file; and

As can be seen from the technical solutions above, embodiments of the present invention obtain a target audio file to be identified and then obtain, according to the target audio file, at least one of a time domain waveform feature and a frequency domain spectral line feature of the target audio file. Based on at least one of these features, the sound quality of the target audio file can be identified as a first sound quality or a second sound quality, the first sound quality being higher than the second sound quality. In this way, genuinely high-quality audio files can be provided to users, so that users can enjoy genuinely high-quality audio.

In addition, the technical solution provided by the present invention is simple to operate and can effectively improve the efficiency of identifying the sound quality of audio files.

[Brief description of the drawings]

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of the method for identifying the sound quality of an audio file provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of a time domain waveform of the original audio data, i.e. the target channel audio data, in the embodiment corresponding to Fig. 1;

Fig. 3 is a schematic diagram of another time domain waveform of the original audio data, i.e. the target channel audio data, in the embodiment corresponding to Fig. 1;

Fig. 4 is a schematic diagram of the energy spectrum of the frequency domain data corresponding to the original audio data, i.e. the target channel audio data, in the embodiment corresponding to Fig. 1;

Fig. 5 is a schematic structural diagram of the device for identifying the sound quality of an audio file provided by another embodiment of the present invention.

[Detailed description]

To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

It should be noted that the terminals involved in the embodiments of the present invention may include, but are not limited to, mobile phones, personal digital assistants (PDA), wireless handheld devices, wireless netbooks, portable computers, personal computers (PC), MP3 players, MP4 players and the like.

In addition, the term "and/or" herein describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.

Fig. 1 is a schematic flowchart of a method for identifying the sound quality of an audio file provided by an embodiment of the present invention, as shown in Fig. 1.

101. Obtain a target audio file to be identified.

The target audio file may be an audio file in any of the coding formats of the prior art, for example a Moving Picture Experts Group (MPEG) Layer-3 (MP3) audio file, a Windows Media Audio (WMA) audio file, an Advanced Audio Coding (AAC) audio file, a Free Lossless Audio Codec (FLAC) audio file or an APE audio file. This embodiment places no particular limitation on this.

102. According to the target audio file, obtain at least one of a time domain waveform feature of the target audio file and a frequency domain spectral line feature of the target audio file.

The time domain waveform feature of the target audio file may include, but is not limited to, amplitude information of the original audio data.

The original audio data is a digital signal obtained by converting a sound signal, for example pulse code modulation (PCM) data obtained by sampling, quantizing and encoding the sound signal. It can be obtained by parsing the data blocks of the target audio file.

The frequency domain spectral line feature of the target audio file may include, but is not limited to, spectrum information of the original audio data.

103. According to at least one of the time domain waveform feature and the frequency domain spectral line feature, identify the sound quality of the target audio file as a first sound quality or a second sound quality, the first sound quality being higher than the second sound quality.

It should be noted that steps 101 to 103 may be executed by a processing device, which may be located in a local application (App) such as Baidu Music, or in a server on the network side, or partly in a local application and partly in a server on the network side.

It can be understood that the application may be a native application (native App) installed on the terminal, or a web page (web App) in a browser on the terminal. Any form that can actually carry out the processing of the audio data is acceptable; this embodiment places no limitation on this.

In this way, a target audio file to be identified is obtained, and at least one of its time domain waveform feature and frequency domain spectral line feature is obtained from it. Based on at least one of these features, the sound quality of the target audio file can be identified as the first sound quality or the second sound quality, the first sound quality being higher than the second sound quality. Genuinely high-quality audio files can therefore be provided to users, so that users can enjoy genuinely high-quality audio.

Optionally, in one possible implementation of this embodiment, before 101 the processing device may also obtain a format parameter of a candidate audio file. According to the format parameter, the processing device can then either determine that the candidate audio file is the target audio file, or identify the sound quality of the candidate audio file as the second sound quality.

The format parameter may include, but is not limited to, at least one of compression format, sample rate, sampling depth and bit rate.

The compression format is the method by which the original audio data is compressed by a given program, for example the MP3, WMA, AAC, FLAC or APE format.

The sample rate, also called the sampling speed or sampling frequency, defines the number of samples per second taken from a continuous signal to form a discrete signal. It is expressed in hertz (Hz).

The sampling depth is the number of bits used to represent the value of one sample, for example 8 bits, 16 bits or 24 bits.

The bit rate is the number of bits processed per unit of time, in bits per second (bps).

Specifically, the processing device can parse the frame header of the candidate audio file to obtain its format parameters.

For example, if the sampling depth is 8 bits, the sound quality of the candidate audio file is identified as the second sound quality; if the sampling depth is 16 bits, the candidate audio file is determined to be the target audio file.

As another example, if the sample rate is less than 44100 Hz, the sound quality of the candidate audio file is identified as the second sound quality; if the sample rate is greater than or equal to 44100 Hz, the candidate audio file is determined to be the target audio file.

As a further example, if the compression format is MP3 and the bit rate is less than 320 kilobits per second (kbps), the sound quality of the candidate audio file is identified as the second sound quality; if the compression format is MP3 and the bit rate is greater than or equal to 320 kbps, the candidate audio file is determined to be the target audio file.

In this way, by obtaining the format parameters of the candidate audio file, its sound quality can be identified in advance as the second sound quality. Such a candidate audio file no longer needs to be treated as a target audio file for further identification, which effectively improves the efficiency of identifying the sound quality of audio files.

Furthermore, since the candidate audio file does not need to be decoded and its format parameters can be obtained by parsing the frame header alone, the efficiency of identifying the sound quality of audio files is improved further.
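The pre-screening by format parameters described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `prefilter`, its argument names and the constants are hypothetical, the thresholds are the ones given as examples in the text, and treating every sampling depth below 16 bits as low quality is an assumption (the text only contrasts 8 bits with 16 bits).

```python
from typing import Optional

# Example thresholds from the text; the names are hypothetical.
MIN_SAMPLE_RATE_HZ = 44100
MIN_MP3_BITRATE_KBPS = 320
MIN_SAMPLING_DEPTH_BITS = 16  # assumption: the text only names 8 vs 16 bits


def prefilter(compression: str, sample_rate_hz: int, sampling_depth_bits: int,
              bitrate_kbps: Optional[int] = None) -> str:
    """Return "second" if the header fields alone indicate the second
    (lower) sound quality, otherwise "target" (needs further analysis)."""
    if sampling_depth_bits < MIN_SAMPLING_DEPTH_BITS:
        return "second"
    if sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        return "second"
    if (compression.upper() == "MP3" and bitrate_kbps is not None
            and bitrate_kbps < MIN_MP3_BITRATE_KBPS):
        return "second"
    return "target"
```

A file that passes this check is not thereby declared high quality; it only becomes a target audio file for the waveform and spectrum checks of steps 102 and 103.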

Optionally, in one possible implementation of this embodiment, in 102 the processing device can determine the number of channels of the target audio file and decode the data blocks of the target audio file to obtain the original audio data. The processing device can then obtain, according to the number of channels and the original audio data, the channel audio data corresponding to each channel. For a detailed description of the parsing and decoding methods, refer to the relevant prior art; it is not repeated here.

For example, the processing device can parse the frame header of the target audio file to determine the number of channels of the target audio file.

As another example, the processing device can parse the file header of the target audio file to determine the number of channels of the target audio file.

As another example, the processing device can parse other parts of the target audio file to determine the number of channels of the target audio file; this embodiment places no particular limitation on this.

As another example, the processing device can obtain the number of channels of the target audio file from a configuration file.

It can be understood that the two steps "determining the number of channels of the target audio file" and "decoding the data blocks of the target audio file to obtain original audio data" have no fixed order. The processing device may determine the number of channels first and then decode the data blocks, or decode the data blocks first and then determine the number of channels, or perform the two steps at the same time; this embodiment places no particular limitation on this.
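As a rough illustration of obtaining per-channel audio data from the decoded data and the number of channels, the sketch below assumes that decoding yields interleaved PCM samples (sample 0 of channel 0, sample 0 of channel 1, and so on). The source does not state the sample layout, so the interleaving and the name `split_channels` are assumptions.

```python
from typing import List, Sequence


def split_channels(pcm: Sequence[int], num_channels: int) -> List[List[int]]:
    """Split interleaved PCM samples into one sample list per channel."""
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    if len(pcm) % num_channels != 0:
        raise ValueError("sample count is not a multiple of num_channels")
    # Channel c owns every num_channels-th sample, starting at offset c.
    return [list(pcm[c::num_channels]) for c in range(num_channels)]
```

For a stereo file, `split_channels(pcm, 2)` returns the first channel's samples followed by the second channel's samples.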

Correspondingly, in one possible implementation of this embodiment, in 103, if the number of channels is greater than or equal to 2, the processing device can obtain, according to the channel audio data corresponding to each channel, first channel audio data and second channel audio data corresponding to at least two channels, and then add the first channel audio data and the second channel audio data to obtain mixed channel audio data.

If the mixed channel audio data is greater than or equal to the first channel audio data/N or the second channel audio data/M, the processing device can identify the sound quality of the target audio file as the first sound quality, where N is a number greater than 1 and M is a number greater than 1.

If the mixed channel audio data is less than the first channel audio data/N or the second channel audio data/M, the processing device can identify the sound quality of the target audio file as the second sound quality, where N is a number greater than 1 and M is a number greater than 1.
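The channel-mixing check can be sketched as below, under two assumptions that the source does not settle. First, the text does not say which quantity of the "audio data" is compared, so the sketch compares mean absolute amplitude. Second, the two conditions overlap as worded, so the sketch reads them as: first sound quality if the mixed level reaches either scaled channel level, second sound quality otherwise. The names `mean_abs` and `classify_by_mix` and the default values N = M = 2 are hypothetical.

```python
from typing import Sequence


def mean_abs(samples: Sequence[float]) -> float:
    """Mean absolute amplitude (an assumed measure of signal level)."""
    return sum(abs(s) for s in samples) / len(samples) if samples else 0.0


def classify_by_mix(first: Sequence[float], second: Sequence[float],
                    n: float = 2.0, m: float = 2.0) -> str:
    """Add two channels sample by sample and compare the mixed level
    with each channel's level divided by N or M (both greater than 1)."""
    if n <= 1 or m <= 1:
        raise ValueError("N and M must be greater than 1")
    mixed = [a + b for a, b in zip(first, second)]
    level = mean_abs(mixed)
    if level >= mean_abs(first) / n or level >= mean_abs(second) / m:
        return "first"
    return "second"
```

Under this reading, two channels that largely cancel when added (for example, one being the inverse of the other) produce a mixed level far below either channel and are flagged as the second sound quality.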

Correspondingly, in one possible implementation of this embodiment, in 103, if the pairwise differences among the values of a specified number (for example 3) of consecutive items of target channel audio data are all less than or equal to a first amplitude threshold, the corresponding waveform can be as shown in Fig. 2, and the processing device can identify the sound quality of the target audio file as the second sound quality. The target channel audio data may be the channel audio data corresponding to any one channel; this embodiment places no particular limitation on this. In Fig. 2, the horizontal axis represents time and the vertical axis represents amplitude.
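The flat-run condition above can be sketched as a sliding-window check. All pairwise differences within a window are at most the threshold exactly when the window's maximum minus its minimum is at most the threshold, which is what the code tests. The function name `has_flat_run` and the default values are hypothetical.

```python
from typing import Sequence


def has_flat_run(samples: Sequence[float], run_length: int = 3,
                 amp_threshold: float = 0.0) -> bool:
    """True if some window of run_length consecutive samples has all
    pairwise differences <= amp_threshold (a flat stretch of waveform)."""
    if run_length < 2:
        raise ValueError("run_length must be at least 2")
    for start in range(len(samples) - run_length + 1):
        window = samples[start:start + run_length]
        # max - min bounds every pairwise difference in the window.
        if max(window) - min(window) <= amp_threshold:
            return True
    return False
```

Note that digital silence also satisfies this condition, so a practical implementation would probably need an additional rule for silent passages; the source does not discuss this.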

Correspondingly, in one possible implementation of this embodiment, in 103, if the difference between the values of two consecutive items of target channel audio data is greater than or equal to a second amplitude threshold, and the signs of the two values are opposite, the corresponding waveform can be as shown in Fig. 3, and the processing device can identify the sound quality of the target audio file as the second sound quality. The target channel audio data may be the channel audio data corresponding to any one channel; this embodiment places no particular limitation on this. In Fig. 3, the horizontal axis represents time and the vertical axis represents amplitude.
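The abrupt-jump condition can be sketched in the same style: scan adjacent sample pairs and flag any pair whose values have opposite signs and differ by at least the second amplitude threshold. The function name `has_sign_flip_jump` is hypothetical, and treating a zero sample as having no sign (so it never counts as "opposite") is an assumption.

```python
from typing import Sequence


def has_sign_flip_jump(samples: Sequence[float], amp_threshold: float) -> bool:
    """True if two consecutive samples have opposite signs and differ
    by at least amp_threshold."""
    for prev, cur in zip(samples, samples[1:]):
        opposite = (prev > 0 and cur < 0) or (prev < 0 and cur > 0)
        if opposite and abs(cur - prev) >= amp_threshold:
            return True
    return False
```

With 16-bit samples, a threshold such as 30000 would only be reached by a swing across most of the amplitude range within a single sample period; that value is illustrative and does not come from the source.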

Optionally, in one possible implementation of this embodiment, in 102, after obtaining the channel audio data corresponding to each channel, the processing device can further divide the target channel audio data into frames to obtain at least one frame of audio data. The processing device can then apply a frequency domain transform to the at least one frame of audio data to obtain the frequency domain data corresponding to each frame of audio data. The target channel audio data may be the channel audio data corresponding to any one channel; this embodiment places no particular limitation on this.

Specifically, the frequency domain transform may include, but is not limited to, the fast Fourier transform (FFT).

For example, the processing device can divide the target channel audio data into frames of 20 ms, with a 50% data overlap between adjacent frames, to obtain at least one frame of audio data. The processing device can then apply an FFT to the at least one frame of audio data to obtain the frequency domain data corresponding to each frame of audio data, denoted A_{i,j}, where i is the index of the frequency bin, j is the index of the frame, and A_{i,j} is the frequency domain data of the j-th frame at the i-th frequency bin.
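The framing and transform step can be sketched with the standard library only. The sketch uses the 20 ms frame length and 50% overlap from the example above; it substitutes a direct discrete Fourier transform for the FFT, which computes the same values more slowly. The function names, the dropping of a trailing partial frame and the absence of a window function are assumptions.

```python
import cmath
from typing import List, Sequence


def split_frames(samples: Sequence[float], sample_rate_hz: int,
                 frame_ms: float = 20.0, overlap: float = 0.5) -> List[List[float]]:
    """Cut samples into fixed-length frames with the given overlap.
    A trailing partial frame is dropped (an assumption)."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    if frame_len < 1:
        raise ValueError("frame is shorter than one sample")
    hop = max(1, int(frame_len * (1 - overlap)))
    return [list(samples[s:s + frame_len])
            for s in range(0, len(samples) - frame_len + 1, hop)]


def dft(frame: Sequence[float]) -> List[complex]:
    """Direct O(n^2) DFT; an FFT yields the same result faster."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * i * k / n)
                for k, x in enumerate(frame))
            for i in range(n)]


def spectra(samples: Sequence[float], sample_rate_hz: int) -> List[List[complex]]:
    """result[j][i] corresponds to A_{i,j}: frame j, frequency bin i."""
    return [dft(f) for f in split_frames(samples, sample_rate_hz)]
```

At 44100 Hz, a 20 ms frame holds 882 samples and consecutive frames start 441 samples apart.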

Correspondingly, in one possible implementation of this embodiment, in 103 the processing device can obtain, according to the frequency domain data corresponding to each frame of audio data, the energy component of that frequency domain data at each frequency. If, at at least one same frequency, the pairwise differences among the energy components of the frequency domain data corresponding to the frames of audio data are less than or equal to the energy threshold, the corresponding energy spectrum can be as shown in Fig. 4, and the processing device can identify the sound quality of the target audio file as the second sound quality. In Fig. 4, the horizontal axis represents time, the vertical axis represents frequency, and the color of each point represents energy.

For example, from the frequency domain data A_{i,j} obtained for each frame of audio data, the processing device obtains the energy component E_{i,j} of that frequency domain data at each frequency, where i is the index of the frequency bin, j is the index of the frame, and E_{i,j} is the energy component of the j-th frame at the i-th frequency bin.
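The energy check can be sketched as follows. The source does not give a formula for the energy component, so E_{i,j} = |A_{i,j}|^2 is an assumption, as are the function names and the choice to require at least two frames before flagging anything. As with the time domain check, "all pairwise differences within the threshold" is tested as maximum minus minimum.

```python
from typing import List, Sequence


def energy_components(spectra: Sequence[Sequence[complex]]) -> List[List[float]]:
    """energies[j][i] is E_{i,j}, taken here as |A_{i,j}|^2 (assumed)."""
    return [[abs(a) ** 2 for a in frame] for frame in spectra]


def constant_energy_bins(energies: Sequence[Sequence[float]],
                         energy_threshold: float) -> List[int]:
    """Return the frequency bins whose energy varies by no more than
    energy_threshold across all frames."""
    if len(energies) < 2:
        return []  # assumption: one frame cannot show constancy over time
    flagged = []
    for i in range(len(energies[0])):
        column = [frame[i] for frame in energies]
        if max(column) - min(column) <= energy_threshold:
            flagged.append(i)
    return flagged
```

Under the criterion in the text, a non-empty result would lead to the second sound quality, since at least one frequency then shows near-constant energy across frames.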

In this embodiment, a target audio file to be identified is obtained, and at least one of its time domain waveform feature and frequency domain spectral line feature is obtained from it. Based on at least one of these features, the sound quality of the target audio file can be identified as the first sound quality or the second sound quality, the first sound quality being higher than the second sound quality. Genuinely high-quality audio files can therefore be provided to users, so that users can enjoy genuinely high-quality audio.

It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the order of actions described, because according to the present invention some steps may be performed in other orders or at the same time. Those skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

In the embodiments above, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the relevant descriptions of the other embodiments.

The structural representation of the tonequality identification device of the audio file that Fig. 5 provides for another embodiment of the present invention, As shown in Figure 5.The tonequality identification device of the audio file of the present embodiment can include acquiring unit 51, spy Levy unit 52 and recognition unit 53.Wherein,

Acquiring unit 51, for obtaining target audio file to be identified.

Feature unit 52, for according to described target audio file, it is thus achieved that described target audio file time At least one in the frequency domain spectral line characteristic of domain waveform feature and described target audio file.

Recognition unit 53, for according in described time domain waveform feature and described frequency domain spectral line characteristic at least One, identify that the tonequality of described target audio file is the first tonequality or the second tonequality, described first tonequality Higher than described second tonequality.

It should be noted that the tonequality identification device of audio file that the present embodiment is provided can be to process Device, may be located at the application (Application, App) of this locality such as, in Baidu's music, or also May be located in the server of network side, or can also be in the application that is located locally of a part, another portion Divide the server being positioned at network side.

In this way, the acquiring unit obtains the target audio file to be identified, and the feature unit obtains, according to the target audio file, at least one of the time-domain waveform feature and the frequency-domain spectral line feature of the target audio file. The recognition unit can then identify, according to at least one of these features, whether the sound quality of the target audio file is the first sound quality or the second sound quality, the first sound quality being higher than the second sound quality. Audio files of genuinely high sound quality can thus be provided to users.
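The cooperation of the three units can be sketched as a simple pipeline. This is only an illustrative skeleton: the class name `SoundQualityDevice`, its field names, and the string labels "first" and "second" are hypothetical and do not appear in the embodiment.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class SoundQualityDevice:
    """Hypothetical wiring of the three units shown in Fig. 5."""
    acquire: Callable[[str], Any]                 # role of acquiring unit 51
    extract: Callable[[Any], Dict[str, Any]]      # role of feature unit 52
    recognize: Callable[[Dict[str, Any]], str]    # role of recognition unit 53

    def identify(self, path: str) -> str:
        target = self.acquire(path)        # obtain the target audio file
        features = self.extract(target)    # time-domain / frequency-domain features
        return self.recognize(features)    # "first" or "second" sound quality
```

Each unit is passed in as a callable, which mirrors the statement that the units may reside in a local application, on a network-side server, or be split between the two.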

Optionally, in a possible implementation of this embodiment, the recognition unit may be further configured to obtain a format parameter of a candidate audio file and, according to the format parameter, either determine that the candidate audio file is the target audio file or identify the sound quality of the candidate audio file as the second sound quality.

Specifically, the recognition unit 53 may parse the frame header of the candidate audio file to obtain the format parameter of the candidate audio file.

In this way, the recognition unit obtains the format parameter of the candidate audio file and can identify in advance, according to the format parameter, that the sound quality of the candidate audio file is the second sound quality. Such a candidate audio file does not need to be treated as a target audio file for further identification, which effectively improves the efficiency of sound quality identification.
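The pre-screening step can be sketched as follows. The embodiment does not state concrete criteria here, so everything specific in this sketch is an assumption: the set of accepted formats, the minimum sample rate, the minimum sampling depth, and the names `FormatParams` and `prescreen` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical criteria; the text gives no concrete values.
ACCEPTED_FORMATS = {"flac", "ape", "wav"}
MIN_SAMPLE_RATE_HZ = 44100
MIN_SAMPLE_DEPTH_BITS = 16


@dataclass
class FormatParams:
    compression_format: str
    sample_rate_hz: int
    sample_depth_bits: int


def prescreen(params: FormatParams) -> str:
    """Return "target" if the candidate needs further analysis,
    or "second" if it can already be labelled second sound quality."""
    if params.compression_format.lower() not in ACCEPTED_FORMATS:
        return "second"
    if params.sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        return "second"
    if params.sample_depth_bits < MIN_SAMPLE_DEPTH_BITS:
        return "second"
    return "target"
```

Only candidates that return "target" would go on to the waveform and spectral analysis, which is where the efficiency gain described above comes from.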

Optionally, in a possible implementation of this embodiment, the feature unit 52 may be specifically configured to determine the number of channels of the target audio file; decode the data blocks of the target audio file to obtain original audio data; and obtain, according to the number of channels and the original audio data, the channel audio data corresponding to each channel. For detailed descriptions of the parsing method and the decoding method, reference may be made to the related prior art, and details are not repeated here.

For example, the feature unit 52 may parse the frame header of the target audio file to determine the number of channels of the target audio file.

As another example, the feature unit 52 may parse the file header of the target audio file to determine the number of channels of the target audio file.

As another example, the feature unit 52 may parse other parts of the target audio file to determine the number of channels of the target audio file, which is not specifically limited in this embodiment.

As another example, the feature unit 52 may obtain the number of channels of the target audio file from a configuration file.
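Obtaining per-channel data from the decoded audio can be sketched as below. The sketch assumes that the original audio data is a flat sequence of interleaved PCM samples (one sample per channel in turn), which the text does not state; the function name `split_channels` is hypothetical.

```python
from typing import List, Sequence


def split_channels(samples: Sequence[int], num_channels: int) -> List[List[int]]:
    """Split interleaved PCM samples into one list per channel."""
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    if len(samples) % num_channels != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    # Channel c owns samples c, c + num_channels, c + 2 * num_channels, ...
    return [list(samples[c::num_channels]) for c in range(num_channels)]
```

For a stereo file, `split_channels(data, 2)` would return the left and right channel data as two separate lists.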

Correspondingly, in a possible implementation of this embodiment, the recognition unit 53 may be specifically configured to: if the number of channels is greater than or equal to 2, obtain, from the channel audio data corresponding to each channel, first channel audio data and second channel audio data corresponding to at least two channels; add the first channel audio data and the second channel audio data to obtain mixed channel audio data; if the mixed channel audio data is greater than or equal to the first channel audio data/N or the second channel audio data/M, identify the sound quality of the target audio file as the first sound quality; and if the mixed channel audio data is less than the first channel audio data/N or the second channel audio data/M, identify the sound quality of the target audio file as the second sound quality; where N is a number greater than 1 and M is a number greater than 1.
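The mixing test can be sketched as follows. The text does not say how one block of channel audio data is compared with another, so this sketch assumes the comparison is made on the mean absolute amplitude, and it reads the "or" conditions as "first sound quality if either inequality holds, otherwise second". Those readings, the default values N = M = 2, and the function names are all assumptions.

```python
from typing import Sequence


def mean_abs(data: Sequence[float]) -> float:
    """Mean absolute amplitude, used here as an assumed level measure."""
    if not data:
        raise ValueError("empty channel data")
    return sum(abs(v) for v in data) / len(data)


def identify_by_mixing(first: Sequence[float], second: Sequence[float],
                       n: float = 2.0, m: float = 2.0) -> str:
    """Add two channels and compare the mixed level with each channel's level."""
    if n <= 1 or m <= 1:
        raise ValueError("N and M must be greater than 1")
    mixed = [a + b for a, b in zip(first, second)]
    level = mean_abs(mixed)
    if level >= mean_abs(first) / n or level >= mean_abs(second) / m:
        return "first"
    return "second"
```

Under these assumptions, two channels that largely cancel when added (for example, one being an inverted copy of the other) produce a mixed level far below either channel and are labelled second sound quality.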

Correspondingly, in a possible implementation of this embodiment, the recognition unit 53 may be specifically configured to: if the pairwise differences among the values of a specified number (for example, 3) of consecutive target channel audio data samples are all less than or equal to a first amplitude threshold, identify the sound quality of the target audio file as the second sound quality. The waveform corresponding to this case may be as shown in Fig. 2. The target channel audio data may be the channel audio data corresponding to any one of the channels, which is not specifically limited in this embodiment.
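A minimal sketch of this flat-run check follows. It relies on the fact that all pairwise absolute differences within a window are at most a threshold exactly when the window's maximum minus its minimum is at most that threshold. The function name and the default threshold of 0 are assumptions; the text gives only the example run length of 3.

```python
from typing import Sequence


def has_flat_run(samples: Sequence[float], run_length: int = 3,
                 amp_threshold: float = 0) -> bool:
    """True if some window of run_length consecutive samples is nearly constant."""
    if run_length < 2:
        raise ValueError("run_length must be at least 2")
    for start in range(len(samples) - run_length + 1):
        window = samples[start:start + run_length]
        # max - min bounds every pairwise difference in the window.
        if max(window) - min(window) <= amp_threshold:
            return True
    return False
```

A result of `True` corresponds to the case in which the target audio file is identified as the second sound quality.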

Correspondingly, in a possible implementation of this embodiment, the recognition unit 53 may be specifically configured to: if the difference between the values of two consecutive target channel audio data samples is greater than or equal to a second amplitude threshold, and the values of the two consecutive samples have opposite signs, identify the sound quality of the target audio file as the second sound quality. The waveform corresponding to this case may be as shown in Fig. 3. The target channel audio data may be the channel audio data corresponding to any one of the channels, which is not specifically limited in this embodiment.
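This jump check can be sketched as below. The function name is hypothetical and the threshold is left to the caller because the text gives no value for the second amplitude threshold; the difference is taken as an absolute value, which is an assumption.

```python
from typing import Sequence


def has_sign_flip_jump(samples: Sequence[float], amp_threshold: float) -> bool:
    """True if two adjacent samples differ by at least amp_threshold
    and have opposite signs."""
    for prev, cur in zip(samples, samples[1:]):
        # prev * cur < 0 holds only when the two values have opposite signs.
        if abs(cur - prev) >= amp_threshold and prev * cur < 0:
            return True
    return False
```

A result of `True` corresponds to the case in which the target audio file is identified as the second sound quality.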

Optionally, in a possible implementation of this embodiment, the feature unit 52 may be further configured to frame the target channel audio data to obtain at least one frame of audio data, and to perform a frequency-domain transform on the at least one frame of audio data to obtain the frequency-domain data corresponding to each frame of audio data. The target channel audio data may be the channel audio data corresponding to any one of the channels, which is not specifically limited in this embodiment.

For example, the feature unit 52 may frame the target channel audio data at intervals of 20 ms, with 50% data overlap between adjacent frames, to obtain at least one frame of audio data. The feature unit 52 may then perform FFT processing on the at least one frame of audio data to obtain the frequency-domain data corresponding to each frame of audio data, denoted A_{i,j}, where i denotes the index of the frequency point, j denotes the index of the frame, and A_{i,j} denotes the frequency-domain data of the j-th frame at the i-th frequency point.
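The framing and transform steps can be sketched as follows. The sketch assumes that "20 ms" is the frame length and that 50% overlap means a hop of half a frame. To stay free of external dependencies it uses a naive discrete Fourier transform instead of an FFT; the two yield the same values, and a real implementation would normally call an FFT routine. The function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def split_frames(samples: Sequence[float], sample_rate_hz: int,
                 frame_ms: float = 20.0, overlap: float = 0.5) -> List[List[float]]:
    """Cut samples into fixed-length frames with the given fractional overlap."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    if frame_len < 1 or not 0 <= overlap < 1:
        raise ValueError("invalid frame length or overlap")
    hop = max(1, int(frame_len * (1 - overlap)))
    last_start = len(samples) - frame_len
    return [list(samples[s:s + frame_len]) for s in range(0, last_start + 1, hop)]


def dft(frame: Sequence[float]) -> List[complex]:
    """Naive DFT returning bins 0 .. n // 2 of a real-valued frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def frames_to_spectra(frames: Sequence[Sequence[float]]) -> List[List[complex]]:
    """spectra[j][i] plays the role of A_{i,j}: frame j, frequency point i."""
    return [dft(frame) for frame in frames]
```

At a sample rate of 44100 Hz these defaults give 882-sample frames with a hop of 441 samples.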

Correspondingly, in a possible implementation of this embodiment, the recognition unit 53 may be specifically configured to obtain, according to the frequency-domain data corresponding to each frame of audio data, the energy component at each frequency point of the frequency-domain data corresponding to each frame of audio data; and, if at one or more identical frequency points the pairwise differences among the energy components of the frequency-domain data corresponding to the frames are less than or equal to an energy threshold, identify the sound quality of the target audio file as the second sound quality. The energy spectrum corresponding to this case may be as shown in Fig. 4.

For example, from the obtained frequency-domain data corresponding to each frame of audio data, denoted A_{i,j}, the recognition unit 53 obtains the energy component E_{i,j} at each frequency point of the frequency-domain data corresponding to each frame of audio data, where i denotes the index of the frequency point, j denotes the index of the frame, and E_{i,j} denotes the energy component of the j-th frame at the i-th frequency point.
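The energy check can be sketched as below. The text does not define how E_{i,j} is computed from A_{i,j}; this sketch assumes the common choice of the squared magnitude, E_{i,j} = |A_{i,j}|^2. The function names are hypothetical, and `spectra[j][i]` stands for A_{i,j}.

```python
from typing import List, Sequence


def energy_components(spectra: Sequence[Sequence[complex]]) -> List[List[float]]:
    """energies[j][i] = |A_{i,j}|^2 (assumed definition of E_{i,j})."""
    return [[abs(a) ** 2 for a in frame] for frame in spectra]


def has_constant_energy_line(energies: Sequence[Sequence[float]],
                             energy_threshold: float) -> bool:
    """True if, at some frequency point, the energy varies across all frames
    by no more than energy_threshold."""
    if not energies:
        return False
    for i in range(len(energies[0])):
        column = [frame[i] for frame in energies]
        # max - min bounds every pairwise difference across frames.
        if max(column) - min(column) <= energy_threshold:
            return True
    return False
```

A result of `True` corresponds to the case in which the target audio file is identified as the second sound quality. Note that with a single frame the check is trivially true, so in practice it only makes sense over multiple frames.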

In this embodiment, the acquiring unit obtains the target audio file to be identified, and the feature unit obtains, according to the target audio file, at least one of the time-domain waveform feature and the frequency-domain spectral line feature of the target audio file. The recognition unit can then identify, according to at least one of these features, whether the sound quality of the target audio file is the first sound quality or the second sound quality, the first sound quality being higher than the second sound quality. Audio files of genuinely high sound quality can thus be provided to users.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.

In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, an audio processing engine, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A sound quality identification method for an audio file, characterized by comprising:

obtaining a target audio file to be identified;

identifying, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, whether the sound quality of the target audio file is a first sound quality or a second sound quality, the first sound quality being higher than the second sound quality; wherein

the obtaining, according to the target audio file, of at least one of the time-domain waveform feature of the target audio file and the frequency-domain spectral line feature of the target audio file comprises:

determining the number of channels of the target audio file;

obtaining, according to the number of channels and the original audio data, the channel audio data corresponding to each channel;

after the obtaining, according to the number of channels and the original audio data, of the channel audio data corresponding to each channel, the method further comprises:

The method according to claim 1, characterized in that the identifying, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, of whether the sound quality of the target audio file is the first sound quality or the second sound quality comprises:

N is a number greater than 1; M is a number greater than 1.

5. The method according to any one of claims 1 to 4, characterized in that, before the obtaining of the target audio file to be identified, the method further comprises:

obtaining a format parameter of a candidate audio file;

6. The method according to claim 5, characterized in that the format parameter includes at least one of a compression format, a sample rate, a sampling depth, and a bit rate.

7. A sound quality identification device for an audio file, characterized by comprising:

an acquiring unit, configured to obtain a target audio file to be identified;

a recognition unit, configured to identify, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, whether the sound quality of the target audio file is a first sound quality or a second sound quality, the first sound quality being higher than the second sound quality; wherein

the feature unit is specifically configured to

determine the number of channels of the target audio file;

the feature unit is further configured to

The device according to claim 7, characterized in that the recognition unit is specifically configured to

N is a number greater than 1; M is a number greater than 1.

The device according to claim 7, characterized in that the recognition unit is specifically configured to

11. The device according to any one of claims 7 to 10, characterized in that the recognition unit is further configured to

obtain a format parameter of a candidate audio file; and

12. The device according to claim 11, characterized in that the format parameter includes at least one of a compression format, a sample rate, a sampling depth, and a bit rate.