[Summary of the Invention]
Aspects of the present invention provide a sound quality identification method and device for an audio file, so as to identify the sound quality of the audio file.
In one aspect of the present invention, a sound quality identification method for an audio file is provided, including:
obtaining a target audio file to be identified;
obtaining, according to the target audio file, at least one of a time-domain waveform feature of the target audio file and a frequency-domain spectral line feature of the target audio file; and
identifying, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, that the sound quality of the target audio file is a first sound quality or a second sound quality, the first sound quality being higher than the second sound quality.
In the above aspect and any possible implementation, an implementation is further provided in which obtaining, according to the target audio file, at least one of the time-domain waveform feature of the target audio file and the frequency-domain spectral line feature of the target audio file includes:
determining the number of channels of the target audio file;
decoding the data blocks of the target audio file to obtain original audio data; and
obtaining, according to the number of channels and the original audio data, the channel audio data corresponding to each channel.
In the above aspect and any possible implementation, an implementation is further provided in which identifying, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, that the sound quality of the target audio file is the first sound quality or the second sound quality includes:
if the number of channels is greater than or equal to 2, obtaining, according to the channel audio data corresponding to each channel, first channel audio data and second channel audio data corresponding to at least two channels;
adding the first channel audio data and the second channel audio data to obtain mixed channel audio data;
if the mixed channel audio data is greater than or equal to the first channel audio data/N or the second channel audio data/M, identifying that the sound quality of the target audio file is the first sound quality; and
if the mixed channel audio data is less than the first channel audio data/N or the second channel audio data/M, identifying that the sound quality of the target audio file is the second sound quality;
where N is a number greater than 1 and M is a number greater than 1.
In the above aspect and any possible implementation, an implementation is further provided in which identifying, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, that the sound quality of the target audio file is the first sound quality or the second sound quality includes:
if, among the values of a specified number of consecutive target channel audio data, the difference between every two values is less than or equal to a first amplitude threshold, identifying that the sound quality of the target audio file is the second sound quality, where the target channel audio data includes the channel audio data corresponding to any one channel among the channel audio data corresponding to each channel; or
if the difference between the values of two consecutive target channel audio data is greater than or equal to a second amplitude threshold, and the values of the two consecutive target channel audio data have opposite signs, identifying that the sound quality of the target audio file is the second sound quality, where the target channel audio data includes the channel audio data corresponding to any one channel among the channel audio data corresponding to each channel.
In the above aspect and any possible implementation, an implementation is further provided in which, after obtaining, according to the number of channels and the original audio data, the channel audio data corresponding to each channel, the method further includes:
framing target channel audio data to obtain at least one frame of audio data, where the target channel audio data includes the channel audio data corresponding to any one channel among the channel audio data corresponding to each channel; and
performing frequency-domain transform processing on the at least one frame of audio data to obtain the frequency-domain data corresponding to each frame of audio data.
In the above aspect and any possible implementation, an implementation is further provided in which identifying, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, that the sound quality of the target audio file is the first sound quality or the second sound quality includes:
obtaining, according to the frequency-domain data corresponding to each frame of audio data, the energy component at each frequency bin of the frequency-domain data corresponding to each frame of audio data; and
if, at at least one same frequency bin, the difference between every two of the energy components of the frequency-domain data corresponding to the frames of audio data is less than or equal to an energy threshold, identifying that the sound quality of the target audio file is the second sound quality.
In the above aspect and any possible implementation, an implementation is further provided in which, before obtaining the target audio file to be identified, the method further includes:
obtaining format parameters of a candidate audio file; and
according to the format parameters, either determining that the candidate audio file is the target audio file, or identifying that the sound quality of the candidate audio file is the second sound quality.
In the above aspect and any possible implementation, an implementation is further provided in which the format parameters include at least one of compression format, sample rate, sampling depth, and bit rate.
In another aspect of the present invention, a sound quality identification device for an audio file is provided, including:
an acquiring unit, configured to obtain a target audio file to be identified;
a feature unit, configured to obtain, according to the target audio file, at least one of a time-domain waveform feature of the target audio file and a frequency-domain spectral line feature of the target audio file; and
a recognition unit, configured to identify, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, that the sound quality of the target audio file is a first sound quality or a second sound quality, the first sound quality being higher than the second sound quality.
In the above aspect and any possible implementation, an implementation is further provided in which the feature unit is specifically configured to:
determine the number of channels of the target audio file;
decode the data blocks of the target audio file to obtain original audio data; and
obtain, according to the number of channels and the original audio data, the channel audio data corresponding to each channel.
In the above aspect and any possible implementation, an implementation is further provided in which the recognition unit is specifically configured to:
if the number of channels is greater than or equal to 2, obtain, according to the channel audio data corresponding to each channel, first channel audio data and second channel audio data corresponding to at least two channels;
add the first channel audio data and the second channel audio data to obtain mixed channel audio data; and
if the mixed channel audio data is greater than or equal to the first channel audio data/N or the second channel audio data/M, identify that the sound quality of the target audio file is the first sound quality;
if the mixed channel audio data is less than the first channel audio data/N or the second channel audio data/M, identify that the sound quality of the target audio file is the second sound quality;
where N is a number greater than 1 and M is a number greater than 1.
In the above aspect and any possible implementation, an implementation is further provided in which the recognition unit is specifically configured to:
if, among the values of a specified number of consecutive target channel audio data, the difference between every two values is less than or equal to a first amplitude threshold, identify that the sound quality of the target audio file is the second sound quality, where the target channel audio data includes the channel audio data corresponding to any one channel among the channel audio data corresponding to each channel; or
if the difference between the values of two consecutive target channel audio data is greater than or equal to a second amplitude threshold, and the values of the two consecutive target channel audio data have opposite signs, identify that the sound quality of the target audio file is the second sound quality, where the target channel audio data includes the channel audio data corresponding to any one channel among the channel audio data corresponding to each channel.
In the above aspect and any possible implementation, an implementation is further provided in which the feature unit is further configured to:
frame target channel audio data to obtain at least one frame of audio data, where the target channel audio data includes the channel audio data corresponding to any one channel among the channel audio data corresponding to each channel; and
perform frequency-domain transform processing on the at least one frame of audio data to obtain the frequency-domain data corresponding to each frame of audio data.
In the above aspect and any possible implementation, an implementation is further provided in which the recognition unit is specifically configured to:
obtain, according to the frequency-domain data corresponding to each frame of audio data, the energy component at each frequency bin of the frequency-domain data corresponding to each frame of audio data; and
if, at at least one same frequency bin, the difference between every two of the energy components of the frequency-domain data corresponding to the frames of audio data is less than or equal to an energy threshold, identify that the sound quality of the target audio file is the second sound quality.
In the above aspect and any possible implementation, an implementation is further provided in which the recognition unit is further configured to:
obtain format parameters of a candidate audio file; and
according to the format parameters, either determine that the candidate audio file is the target audio file, or identify that the sound quality of the candidate audio file is the second sound quality.
In the above aspect and any possible implementation, an implementation is further provided in which the format parameters include at least one of compression format, sample rate, sampling depth, and bit rate.
As can be seen from the above technical solution, the embodiments of the present invention obtain a target audio file to be identified and then obtain, according to the target audio file, at least one of the time-domain waveform feature of the target audio file and the frequency-domain spectral line feature of the target audio file. This makes it possible to identify, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, that the sound quality of the target audio file is the first sound quality or the second sound quality, the first sound quality being higher than the second sound quality. In this way, audio files of genuinely high sound quality can be provided to users, so that users can enjoy audio files of genuinely high sound quality.
In addition, the technical solution provided by the present invention is simple to operate and can effectively improve the efficiency of identifying the sound quality of audio files.
[Detailed Description of the Invention]
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that the terminal involved in the embodiments of the present invention may include, but is not limited to, a mobile phone, a personal digital assistant (Personal Digital Assistant, PDA), a wireless handheld device, a wireless netbook, a portable computer, a personal computer (Personal Computer, PC), an MP3 player, an MP4 player, and the like.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a schematic flowchart of a sound quality identification method for an audio file provided by an embodiment of the present invention, as shown in Fig. 1.
101. Obtain a target audio file to be identified.
The target audio file may be an audio file in any of the encoding formats in the prior art, for example, a Moving Picture Experts Group (MPEG) Layer-3 (MP3) format audio file, a WMA (Windows Media Audio) format audio file, an Advanced Audio Coding (AAC) format audio file, a Free Lossless Audio Codec (FLAC) format audio file, or an APE format audio file. This embodiment places no particular limitation on this.
102. According to the target audio file, obtain at least one of the time-domain waveform feature of the target audio file and the frequency-domain spectral line feature of the target audio file.
The time-domain waveform feature of the target audio file may include, but is not limited to, amplitude information of the original audio data.
The original audio data is the digital signal obtained by converting a sound signal, for example, by sampling, quantizing, and encoding the sound signal to obtain pulse code modulation (Pulse Code Modulation, PCM) data. It can be obtained by parsing the data blocks of the target audio file.
The frequency-domain spectral line feature of the target audio file may include, but is not limited to, spectrum information of the original audio data.
103. According to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, identify that the sound quality of the target audio file is the first sound quality or the second sound quality, the first sound quality being higher than the second sound quality.
It should be noted that steps 101 to 103 may be executed by a processing device. The processing device may be located in a local application (Application, App), for example, Baidu Music, or in a server on the network side, or partly in a local application and partly in a server on the network side.
It can be understood that the application may be a native application (native app) installed on the terminal, or a web page (web app) in a browser on the terminal. Any form that can actually process the audio data is acceptable, and this embodiment does not limit this.
In this way, a target audio file to be identified is obtained, and at least one of the time-domain waveform feature of the target audio file and the frequency-domain spectral line feature of the target audio file is then obtained according to the target audio file. This makes it possible to identify, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, that the sound quality of the target audio file is the first sound quality or the second sound quality, the first sound quality being higher than the second sound quality. Audio files of genuinely high sound quality can thus be provided to users, so that users can enjoy audio files of genuinely high sound quality.
Optionally, in a possible implementation of this embodiment, before 101, the processing device may further obtain format parameters of a candidate audio file. The processing device may then, according to the format parameters, either determine that the candidate audio file is the target audio file or identify that the sound quality of the candidate audio file is the second sound quality.
The format parameters may include, but are not limited to, at least one of compression format, sample rate, sampling depth, and bit rate.
The compression format is the compression method that a given program applies to the original audio data, for example, MP3, WMA, AAC, FLAC, or APE.
The sample rate, also called the sampling rate or sampling frequency, is the number of samples per second taken from a continuous signal to form a discrete signal. It is expressed in hertz (Hz).
The sampling depth is the number of bits used to represent the value of one sample point, for example, 8 bits, 16 bits, or 24 bits.
The bit rate is the number of bits processed per unit time, in bits per second (bps).
Specifically, the processing device may parse the frame header of the candidate audio file to obtain the format parameters of the candidate audio file.
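As an illustration of reading format parameters from a header without decoding the audio, the following minimal sketch uses Python's standard `wave` module. It assumes an uncompressed WAV container, which is not one of the formats named in this embodiment; the function names `read_format_parameters` and `make_demo_wav` and the dictionary keys are hypothetical and chosen only for this example.

```python
import io
import wave


def read_format_parameters(wav_file):
    """Read format parameters from a WAV header; no sample data is decoded."""
    with wave.open(wav_file, "rb") as reader:
        channels = reader.getnchannels()
        sample_rate = reader.getframerate()          # in Hz
        sampling_depth = reader.getsampwidth() * 8   # bytes per sample -> bits
    return {
        "channels": channels,
        "sample_rate": sample_rate,
        "sampling_depth": sampling_depth,
        # For uncompressed PCM the bit rate follows from the other parameters.
        "bit_rate": sample_rate * sampling_depth * channels,
    }


def make_demo_wav(channels=2, sample_rate=44100, sample_width=2, frames=4):
    """Build a tiny silent WAV file in memory so the example is self-contained."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(sample_rate)
        writer.writeframes(bytes(channels * sample_width * frames))
    buffer.seek(0)
    return buffer


params = read_format_parameters(make_demo_wav())
```

For compressed formats such as MP3, the same parameters would come from that format's own frame header, which would need a format-specific parser.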
For example, if the sampling depth is 8 bits, the sound quality of the candidate audio file is identified as the second sound quality; if the sampling depth is 16 bits, the candidate audio file is determined to be the target audio file.
As another example, if the sample rate is less than 44100 Hz, the sound quality of the candidate audio file is identified as the second sound quality; if the sample rate is greater than or equal to 44100 Hz, the candidate audio file is determined to be the target audio file.
As yet another example, if the compression format is MP3 and the bit rate is less than 320 kilobits per second (kbps), the sound quality of the candidate audio file is identified as the second sound quality; if the compression format is MP3 and the bit rate is greater than or equal to 320 kbps, the candidate audio file is determined to be the target audio file.
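The three examples above can be sketched as a single pre-filter. This is only one possible reading: combining the rules in one function, treating every depth below 16 bits like the 8-bit case, and the names `prefilter`, `TARGET`, and `SECOND_QUALITY` are assumptions made for illustration, not part of the embodiment.

```python
TARGET = "target"            # candidate goes on to waveform/spectrum analysis
SECOND_QUALITY = "second"    # candidate is labelled second sound quality at once


def prefilter(compression_format=None, sample_rate=None,
              sampling_depth=None, bit_rate_kbps=None):
    """Apply the example thresholds to whichever parameters are available."""
    # Assumption: any depth below 16 bits is handled like the 8-bit example.
    if sampling_depth is not None and sampling_depth < 16:
        return SECOND_QUALITY
    # Sample rate below 44100 Hz -> second sound quality.
    if sample_rate is not None and sample_rate < 44100:
        return SECOND_QUALITY
    # MP3 below 320 kbps -> second sound quality.
    if (compression_format == "MP3" and bit_rate_kbps is not None
            and bit_rate_kbps < 320):
        return SECOND_QUALITY
    return TARGET


print(prefilter(compression_format="MP3", sample_rate=44100, bit_rate_kbps=128))
```

A candidate that passes the pre-filter is not thereby judged to have the first sound quality; it only becomes the target audio file for steps 101 to 103.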
In this way, by obtaining the format parameters of the candidate audio file, the sound quality of the candidate audio file can be identified in advance as the second sound quality according to the format parameters, so that the candidate audio file does not need to be treated as a target audio file for further identification. This can effectively improve the efficiency of identifying the sound quality of audio files.
In addition, because the candidate audio file does not need to be decoded and its format parameters can be obtained merely by parsing the frame header, the efficiency of identifying the sound quality of audio files can be further improved.
Optionally, in a possible implementation of this embodiment, in 102, the processing device may determine the number of channels of the target audio file and decode the data blocks of the target audio file to obtain original audio data. The processing device may then obtain, according to the number of channels and the original audio data, the channel audio data corresponding to each channel. For details of the parsing and decoding methods, refer to the relevant content in the prior art, which is not repeated here.
For example, the processing device may parse the frame header of the target audio file to determine the number of channels of the target audio file.
As another example, the processing device may parse the file header of the target audio file to determine the number of channels of the target audio file.
As another example, the processing device may also parse other parts of the target audio file to determine the number of channels of the target audio file; this embodiment places no particular limitation on this.
As yet another example, the processing device may also obtain the number of channels of the target audio file from a configuration file.
It can be understood that the two steps "determining the number of channels of the target audio file" and "decoding the data blocks of the target audio file to obtain original audio data" have no fixed order. The processing device may determine the number of channels first and then decode the data blocks, or decode the data blocks first and then determine the number of channels, or perform the two steps at the same time. This embodiment places no particular limitation on this.
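Once the number of channels and the decoded original audio data are both available, the per-channel data can be separated. The sketch below assumes the decoded PCM samples are interleaved (one sample per channel in turn), which is common but is not stated in this embodiment; `split_channels` is a hypothetical helper name.

```python
def split_channels(samples, channel_count):
    """Split interleaved PCM samples into one list per channel.

    Assumes the layout [ch0, ch1, ..., ch0, ch1, ...]; other layouts
    (for example planar buffers) would need a different split.
    """
    if channel_count < 1:
        raise ValueError("channel_count must be at least 1")
    if len(samples) % channel_count != 0:
        raise ValueError("sample count is not a multiple of channel_count")
    # Every channel_count-th sample, starting at offset c, belongs to channel c.
    return [list(samples[c::channel_count]) for c in range(channel_count)]


left, right = split_channels([1, -1, 2, -2, 3, -3], 2)
```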
Correspondingly, in a possible implementation of this embodiment, in 103, if the number of channels is greater than or equal to 2, the processing device may obtain, according to the channel audio data corresponding to each channel, first channel audio data and second channel audio data corresponding to at least two channels, and then add the first channel audio data and the second channel audio data to obtain mixed channel audio data.
If the mixed channel audio data is greater than or equal to the first channel audio data/N or the second channel audio data/M, the processing device may identify that the sound quality of the target audio file is the first sound quality, where N is a number greater than 1 and M is a number greater than 1.
If the mixed channel audio data is less than the first channel audio data/N or the second channel audio data/M, the processing device may identify that the sound quality of the target audio file is the second sound quality, where N is a number greater than 1 and M is a number greater than 1.
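The mixed-channel comparison can be sketched as follows. The embodiment does not say which quantity of the "audio data" is compared, nor how the two "or" conditions interact, so this sketch makes two labelled assumptions: the mean absolute amplitude is used as the level of each signal, and the second-sound-quality condition is applied literally with the first sound quality as its complement. The names `mean_abs` and `classify_by_mix` and the default values of N and M are hypothetical.

```python
def mean_abs(samples):
    """Mean absolute amplitude, used here as the signal level (an assumption)."""
    return sum(abs(s) for s in samples) / len(samples)


def classify_by_mix(first, second, n=4.0, m=4.0):
    """Compare the level of first + second against first/N and second/M.

    first and second are equal-length, non-empty sample lists; n and m must
    be greater than 1. The defaults are illustrative only.
    """
    mixed = [a + b for a, b in zip(first, second)]
    mixed_level = mean_abs(mixed)
    # Assumption: second sound quality if the mix falls below either bound.
    if mixed_level < mean_abs(first) / n or mixed_level < mean_abs(second) / m:
        return "second"
    return "first"


# A second channel that is the inverted first channel cancels in the mix.
print(classify_by_mix([100, -200, 300, -400], [-100, 200, -300, 400]))
```

Under these assumptions, the check flags files whose channels largely cancel when added, such as a channel paired with its own phase-inverted copy.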
Correspondingly, in a possible implementation of this embodiment, in 103, if, among the values of a specified number (for example, 3) of consecutive target channel audio data, the difference between every two values is less than or equal to the first amplitude threshold, a situation whose waveform may be as shown in Fig. 2, the processing device may identify that the sound quality of the target audio file is the second sound quality. The target channel audio data may be the channel audio data corresponding to any one channel; this embodiment places no particular limitation on this. In Fig. 2, the abscissa represents time and the ordinate represents amplitude.
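A minimal sketch of this flat-run check is given below. It relies on the fact that all pairwise differences within a window are at most the threshold exactly when the window's maximum minus its minimum is at most the threshold. The function name `has_flat_run` and the default values are assumptions for illustration.

```python
def has_flat_run(samples, run_length=3, first_amplitude_threshold=0):
    """Return True if some run of run_length consecutive samples is flat.

    A run is flat when every pair of values in it differs by no more than
    first_amplitude_threshold, i.e. max(run) - min(run) <= threshold.
    """
    if run_length < 2:
        raise ValueError("run_length must be at least 2")
    for start in range(len(samples) - run_length + 1):
        window = samples[start:start + run_length]
        if max(window) - min(window) <= first_amplitude_threshold:
            return True
    return False


print(has_flat_run([0, 5, 9, 9, 9, 4]))
```

In practice the threshold and run length would need tuning, since digital silence also produces flat runs.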
Correspondingly, in a possible implementation of this embodiment, in 103, if the difference between the values of two consecutive target channel audio data is greater than or equal to the second amplitude threshold and the values of the two consecutive target channel audio data have opposite signs, a situation whose waveform may be as shown in Fig. 3, the processing device may identify that the sound quality of the target audio file is the second sound quality. The target channel audio data may be the channel audio data corresponding to any one channel; this embodiment places no particular limitation on this. In Fig. 3, the abscissa represents time and the ordinate represents amplitude.
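The sign-flip check can be sketched in the same way. The function name `has_sign_flip_jump` is hypothetical, and treating a zero sample as having no sign (so it never counts as "opposite") is an assumption of this sketch.

```python
def has_sign_flip_jump(samples, second_amplitude_threshold):
    """Return True if two consecutive samples have opposite signs and differ
    by at least second_amplitude_threshold."""
    for previous, current in zip(samples, samples[1:]):
        opposite_signs = previous * current < 0   # zero has no sign here
        if opposite_signs and abs(current - previous) >= second_amplitude_threshold:
            return True
    return False


# A jump from near positive full scale to near negative full scale (16-bit).
print(has_sign_flip_jump([10, 32000, -32000, -5], 60000))
```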
Optionally, in a possible implementation of this embodiment, in 102, after obtaining the channel audio data corresponding to each channel, the processing device may further frame the target channel audio data to obtain at least one frame of audio data, where the target channel audio data includes the channel audio data corresponding to any one channel among the channel audio data corresponding to each channel. The processing device may then perform frequency-domain transform processing on the at least one frame of audio data to obtain the frequency-domain data corresponding to each frame of audio data. The target channel audio data may be the channel audio data corresponding to any one channel; this embodiment places no particular limitation on this.
Specifically, the frequency-domain transform processing may include, but is not limited to, the fast Fourier transform (Fast Fourier Transform, FFT).
For example, the processing device may frame the target channel audio data at intervals of 20 ms, with 50% data overlap between adjacent frames, to obtain at least one frame of audio data. The processing device may then perform FFT processing on the at least one frame of audio data to obtain the frequency-domain data corresponding to each frame of audio data, denoted $A_{i,j}$, where $i$ is the index of the frequency bin, $j$ is the index of the frame, and $A_{i,j}$ is the frequency-domain data of the $j$-th frame at the $i$-th frequency bin.
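The framing and transform steps can be sketched as follows. To stay dependency-free, this sketch computes a direct discrete Fourier transform rather than an FFT; the two give the same values, and a real implementation would normally call an FFT routine. No window function is applied, and the names `split_frames`, `dft`, and `frequency_domain` are hypothetical.

```python
import cmath


def split_frames(samples, sample_rate, frame_ms=20, overlap=0.5):
    """Cut samples into frames of frame_ms milliseconds with the given overlap."""
    frame_length = int(sample_rate * frame_ms / 1000)
    if frame_length < 1:
        raise ValueError("sample_rate is too low for the requested frame size")
    hop = max(1, int(frame_length * (1 - overlap)))   # 50% overlap -> half frame
    last_start = len(samples) - frame_length
    return [samples[s:s + frame_length] for s in range(0, last_start + 1, hop)]


def dft(frame):
    """Direct DFT of one real-valued frame, bins 0 .. N/2 (slow, O(N^2))."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * i * t / n) for t in range(n))
        for i in range(n // 2 + 1)
    ]


def frequency_domain(samples, sample_rate):
    """Return spectra[j][i], corresponding to A_{i,j}: frame j, frequency bin i."""
    return [dft(frame) for frame in split_frames(samples, sample_rate)]


spectra = frequency_domain(list(range(50)), sample_rate=1000)
```

Note the index order: the nested list is addressed as `spectra[j][i]`, frame first, whereas the text writes the frequency index first in $A_{i,j}$.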
Correspondingly, in a possible implementation of this embodiment, in 103, the processing device may obtain, according to the frequency-domain data corresponding to each frame of audio data, the energy component at each frequency bin of the frequency-domain data corresponding to each frame of audio data. If, at at least one same frequency bin, the difference between every two of the energy components of the frequency-domain data corresponding to the frames of audio data is less than or equal to the energy threshold, a situation whose energy spectrum may be as shown in Fig. 4, the processing device may identify that the sound quality of the target audio file is the second sound quality. In Fig. 4, the abscissa represents time, the ordinate represents frequency, and the color of each point represents energy.
For example, according to the obtained frequency-domain data $A_{i,j}$ corresponding to each frame of audio data, the processing device obtains the energy component $E_{i,j}$ at each frequency bin of the frequency-domain data corresponding to each frame of audio data, where $i$ is the index of the frequency bin, $j$ is the index of the frame, and $E_{i,j}$ is the energy component of the $j$-th frame at the $i$-th frequency bin.
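The energy comparison can be sketched as follows. The embodiment does not define the energy component, so this sketch assumes the common choice of squared magnitude, $E_{i,j} = |A_{i,j}|^2$. The function names are hypothetical, and the input is a nested list addressed as `spectra[j][i]` (frame `j`, frequency bin `i`).

```python
def energy_components(spectra):
    """Squared magnitude of every bin: energies[j][i] = |spectra[j][i]|**2."""
    return [[abs(value) ** 2 for value in frame] for frame in spectra]


def has_constant_energy_bin(energies, energy_threshold):
    """Return True if, at some frequency bin, the energies of all frames differ
    pairwise by no more than energy_threshold (max - min <= threshold)."""
    if len(energies) < 2:
        return False   # at least two frames are needed for a comparison
    for i in range(len(energies[0])):
        column = [frame[i] for frame in energies]   # bin i across all frames
        if max(column) - min(column) <= energy_threshold:
            return True
    return False


energies = energy_components([[1 + 0j, 3 + 4j], [2 + 0j, 3 - 4j]])
print(has_constant_energy_bin(energies, 0.5))
```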
In this embodiment, a target audio file to be identified is obtained, and at least one of the time-domain waveform feature of the target audio file and the frequency-domain spectral line feature of the target audio file is then obtained according to the target audio file. This makes it possible to identify, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, that the sound quality of the target audio file is the first sound quality or the second sound quality, the first sound quality being higher than the second sound quality. Audio files of genuinely high sound quality can thus be provided to users, so that users can enjoy audio files of genuinely high sound quality.
In addition, the technical solution provided by the present invention is simple to operate and can effectively improve the efficiency of identifying the sound quality of audio files.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. However, those skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the relevant descriptions of the other embodiments.
Fig. 5 is a schematic structural diagram of a sound quality identification device for an audio file provided by another embodiment of the present invention, as shown in Fig. 5. The sound quality identification device of this embodiment may include an acquiring unit 51, a feature unit 52, and a recognition unit 53, where:
The acquiring unit 51 is configured to obtain a target audio file to be identified.
The target audio file may be an audio file in any of the encoding formats in the prior art, for example, a Moving Picture Experts Group (MPEG) Layer-3 (MP3) format audio file, a WMA (Windows Media Audio) format audio file, an Advanced Audio Coding (AAC) format audio file, a Free Lossless Audio Codec (FLAC) format audio file, or an APE format audio file. This embodiment places no particular limitation on this.
The feature unit 52 is configured to obtain, according to the target audio file, at least one of the time-domain waveform feature of the target audio file and the frequency-domain spectral line feature of the target audio file.
The time-domain waveform feature of the target audio file may include, but is not limited to, amplitude information of the original audio data.
The original audio data is the digital signal obtained by converting a sound signal, for example, by sampling, quantizing, and encoding the sound signal to obtain pulse code modulation (Pulse Code Modulation, PCM) data. It can be obtained by parsing the data blocks of the target audio file.
The frequency-domain spectral line feature of the target audio file may include, but is not limited to, spectrum information of the original audio data.
The recognition unit 53 is configured to identify, according to at least one of the time-domain
waveform feature and the frequency-domain spectral line feature, whether the sound quality of the
target audio file is a first sound quality or a second sound quality, where the first sound
quality is higher than the second sound quality.
It should be noted that the sound quality identification apparatus provided in this embodiment
may be a processing device. It may be located in a local application (Application, App), for
example, Baidu Music; or in a server on the network side; or partly in a local application and
partly in a server on the network side.
It can be understood that the application may be a native application (nativeAPP) installed on
a terminal, or a web page (webAPP) of a browser on the terminal. Any objectively existing form
that can process audio data is acceptable, and this embodiment imposes no limitation in this
respect.
In this way, the acquiring unit obtains the target audio file to be identified, and the feature
unit then obtains, according to the target audio file, at least one of the time-domain waveform
feature and the frequency-domain spectral line feature of the target audio file. The recognition
unit can then identify, according to at least one of these features, whether the sound quality
of the target audio file is the first sound quality or the second sound quality, the first sound
quality being higher than the second. Audio files of genuinely high sound quality can thus be
provided to users, allowing them to enjoy audio of genuinely high sound quality.
Optionally, in a possible implementation of this embodiment, the recognition unit may be further
configured to obtain a format parameter of a candidate audio file and, according to the format
parameter, either determine that the candidate audio file is the target audio file or identify
the sound quality of the candidate audio file as the second sound quality.
The format parameter may include, but is not limited to, at least one of a compression format,
a sample rate, a sampling depth and a bit rate.
The compression format is the compression method that a given program applies to the original
audio data, for example, the MP3 format, WMA format, AAC format, FLAC format or APE format.
The sample rate, also called the sampling speed or sampling frequency, is the number of samples
extracted per second from a continuous signal to form a discrete signal. It is expressed in
hertz (Hz).
The sampling depth is the number of bits used to represent the value of one sample point, for
example, 8 bits, 16 bits or 24 bits.
The bit rate is the number of bits processed per unit time, in bits per second (bps).
Specifically, the recognition unit 53 may parse the frame header of the candidate audio file to
obtain the format parameter of the candidate audio file.
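As an illustration of obtaining format parameters from a frame header without decoding, the
following minimal sketch reads the bit rate, sample rate and number of channels from the 4-byte
header of an MPEG-1 Layer III frame. The function name and the returned dictionary keys are
hypothetical. The sketch assumes the caller has already located the first frame (for example,
after any leading tag data) and it rejects other MPEG versions and layers.

```python
from typing import Dict

# MPEG-1 Layer III lookup tables; None marks "free" or reserved indices.
_BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                  128, 160, 192, 224, 256, 320, None]
_SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def parse_mpeg1_layer3_header(header: bytes) -> Dict[str, int]:
    """Extract format parameters from a 4-byte MPEG-1 Layer III frame header."""
    if len(header) < 4:
        raise ValueError("a frame header is 4 bytes long")
    b0, b1, b2, b3 = header[:4]
    # The first 11 bits are the frame sync and must all be set.
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        raise ValueError("frame sync not found")
    # Version bits 11 = MPEG-1, layer bits 01 = Layer III.
    if (b1 >> 3) & 0x03 != 0x03 or (b1 >> 1) & 0x03 != 0x01:
        raise ValueError("only MPEG-1 Layer III is handled in this sketch")
    bit_rate = _BITRATES_KBPS[b2 >> 4]
    sample_rate = _SAMPLE_RATES_HZ[(b2 >> 2) & 0x03]
    if bit_rate is None or sample_rate is None:
        raise ValueError("free-format or reserved header value")
    # Channel mode 11 is single channel; the other three modes carry two channels.
    channels = 1 if (b3 >> 6) == 0x03 else 2
    return {"bit_rate_kbps": bit_rate,
            "sample_rate_hz": sample_rate,
            "channels": channels}
```

Because only four bytes are inspected, no audio data needs to be decoded to obtain these values.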
For example, if the sampling depth is 8 bits, the sound quality of the candidate audio file is
identified as the second sound quality; if the sampling depth is 16 bits, the candidate audio
file is determined to be the target audio file.
As another example, if the sample rate is less than 44100 Hz, the sound quality of the candidate
audio file is identified as the second sound quality; if the sample rate is greater than or
equal to 44100 Hz, the candidate audio file is determined to be the target audio file.
As a further example, if the compression format is MP3 and the bit rate is less than 320
kilobits per second (kbps), the sound quality of the candidate audio file is identified as the
second sound quality; if the compression format is MP3 and the bit rate is greater than or equal
to 320 kbps, the candidate audio file is determined to be the target audio file.
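The three examples above can be sketched as a single pre-screening function. This is only a
minimal sketch: the class FormatParams and the function prescreen are hypothetical names, and
applying the three example rules together, in this order, is an assumption, since the text
presents them as separate alternatives. Treating every sampling depth below 16 bits like the
8-bit case is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FormatParams:
    """Format parameters read from a frame header (hypothetical container)."""
    compression_format: str            # e.g. "MP3", "FLAC"
    sample_rate_hz: int
    sampling_depth_bits: int
    bit_rate_kbps: Optional[int] = None


def prescreen(params: FormatParams) -> str:
    """Return "second" if the format parameters alone indicate the second
    sound quality, otherwise "target" (the file goes on to further analysis)."""
    # Assumption: any depth below 16 bits is handled like the 8-bit example.
    if params.sampling_depth_bits < 16:
        return "second"
    if params.sample_rate_hz < 44100:
        return "second"
    if (params.compression_format.upper() == "MP3"
            and params.bit_rate_kbps is not None
            and params.bit_rate_kbps < 320):
        return "second"
    return "target"
```

A file that returns "target" is not thereby judged to be of the first sound quality; it is only
passed on to the waveform and spectrum checks.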
In this way, the recognition unit obtains the format parameter of the candidate audio file and
can then, according to the format parameter, identify in advance that the sound quality of the
candidate audio file is the second sound quality. Such a candidate audio file need not be
treated as a target audio file for further identification, which effectively improves the
efficiency of sound quality identification of audio files.
In addition, because the candidate audio file does not need to be decoded and its format
parameter can be obtained merely by parsing the frame header, the efficiency of sound quality
identification is improved further.
Optionally, in a possible implementation of this embodiment, the feature unit 52 may be
specifically configured to determine the number of channels of the target audio file; decode the
data blocks of the target audio file to obtain original audio data; and obtain, according to the
number of channels and the original audio data, the channel audio data corresponding to each
channel. For detailed descriptions of the parsing method and the decoding method, refer to the
related content in the prior art; they are not repeated here.
For example, the feature unit 52 may parse the frame header of the target audio file to
determine the number of channels of the target audio file.
As another example, the feature unit 52 may parse the file header of the target audio file to
determine the number of channels of the target audio file.
As another example, the feature unit 52 may parse other parts of the target audio file to
determine the number of channels of the target audio file; this embodiment imposes no particular
limitation in this respect.
As a further example, the feature unit 52 may obtain the number of channels of the target audio
file from a configuration file.
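Once the number of channels is known, the channel audio data can be separated from the decoded
original audio data. The following minimal sketch assumes that the decoded PCM samples are
interleaved (sample 0 of channel 0, sample 0 of channel 1, sample 1 of channel 0, and so on);
the text does not specify the layout, and split_channels is a hypothetical name.

```python
from typing import List, Sequence


def split_channels(samples: Sequence[int], num_channels: int) -> List[List[int]]:
    """Split interleaved PCM samples into one list per channel."""
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    # Ignore a trailing incomplete group of samples, if any.
    usable = len(samples) - len(samples) % num_channels
    return [list(samples[ch:usable:num_channels]) for ch in range(num_channels)]
```

For a stereo file, split_channels(pcm, 2) returns two lists that can serve as the first channel
audio data and the second channel audio data.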
Correspondingly, in a possible implementation of this embodiment, the recognition unit 53 may be
specifically configured to: if the number of channels is greater than or equal to 2, obtain,
according to the channel audio data corresponding to each channel, first channel audio data and
second channel audio data corresponding to at least two channels; add the first channel audio
data and the second channel audio data to obtain mixed channel audio data; if the mixed channel
audio data is greater than or equal to the first channel audio data/N or the second channel
audio data/M, identify the sound quality of the target audio file as the first sound quality;
and if the mixed channel audio data is less than the first channel audio data/N or the second
channel audio data/M, identify the sound quality of the target audio file as the second sound
quality, where N is a number greater than 1 and M is a number greater than 1.
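The mixing check can be sketched as follows. The text does not state how two sets of channel
audio data are compared, so this sketch assumes that the mean absolute amplitude of each signal
is the quantity compared. It also assumes that the second sound quality is assigned only when
the mixed level falls below both scaled channel levels. The function names and the default
values of n and m are hypothetical.

```python
from typing import Sequence


def mean_abs(values: Sequence[float]) -> float:
    """Mean absolute amplitude; 0.0 for an empty sequence."""
    return sum(abs(v) for v in values) / len(values) if values else 0.0


def classify_by_mix(first: Sequence[float], second: Sequence[float],
                    n: float = 10.0, m: float = 10.0) -> str:
    """Return "first" or "second" sound quality from the mixed-channel level."""
    if n <= 1 or m <= 1:
        raise ValueError("n and m must both be greater than 1")
    # Sample-wise addition of the two channels gives the mixed channel data.
    mixed = [a + b for a, b in zip(first, second)]
    level = mean_abs(mixed)
    if level >= mean_abs(first) / n or level >= mean_abs(second) / m:
        return "first"
    return "second"
```

Under these assumptions, two channels that largely cancel when added (for example, one channel
being an inverted copy of the other) yield the second sound quality.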
Correspondingly, in a possible implementation of this embodiment, the recognition unit 53 may be
specifically configured to identify the sound quality of the target audio file as the second
sound quality if, among a specified number (for example, 3) of consecutive values of the target
channel audio data, every pairwise difference is less than or equal to a first amplitude
threshold. The waveform corresponding to this case may be as shown in Fig. 2. The target channel
audio data may be the channel audio data corresponding to any one of the channels; this
embodiment imposes no particular limitation in this respect.
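A minimal sketch of this check is given below. Requiring every pairwise difference within a
window to be at most the threshold is equivalent to requiring that the maximum minus the minimum
of the window be at most the threshold, which is what the code tests. The function name and the
default values are hypothetical; a practical threshold would have to be chosen so that ordinary
quiet passages are not flagged.

```python
from typing import Sequence


def has_flat_run(samples: Sequence[float], run_length: int = 3,
                 amplitude_threshold: float = 0) -> bool:
    """True if some run of `run_length` consecutive samples is nearly constant."""
    if run_length < 2:
        raise ValueError("run_length must be at least 2")
    for start in range(len(samples) - run_length + 1):
        window = samples[start:start + run_length]
        # All pairwise differences <= threshold  <=>  max - min <= threshold.
        if max(window) - min(window) <= amplitude_threshold:
            return True
    return False
```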
Correspondingly, in a possible implementation of this embodiment, the recognition unit 53 may be
specifically configured to identify the sound quality of the target audio file as the second
sound quality if the difference between two consecutive values of the target channel audio data
is greater than or equal to a second amplitude threshold and the two consecutive values have
opposite signs. The waveform corresponding to this case may be as shown in Fig. 3. The target
channel audio data may be the channel audio data corresponding to any one of the channels; this
embodiment imposes no particular limitation in this respect.
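This check can be sketched as a scan over adjacent sample pairs. The function name is
hypothetical, and the threshold value is left to the caller because the text does not give one.

```python
from typing import Sequence


def has_sign_flip_jump(samples: Sequence[float], amplitude_threshold: float) -> bool:
    """True if two consecutive samples differ by at least the threshold
    and have opposite signs."""
    for previous, current in zip(samples, samples[1:]):
        large_jump = abs(current - previous) >= amplitude_threshold
        opposite_signs = previous * current < 0
        if large_jump and opposite_signs:
            return True
    return False
```

A jump between two large values of the same sign is not flagged, because the sign condition must
hold together with the amplitude condition.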
Optionally, in a possible implementation of this embodiment, the feature unit 52 may be further
configured to divide the target channel audio data into frames to obtain at least one frame of
audio data, and to apply a frequency-domain transform to the at least one frame of audio data to
obtain the frequency-domain data corresponding to each frame of audio data. The target channel
audio data may be the channel audio data corresponding to any one of the channels; this
embodiment imposes no particular limitation in this respect.
Specifically, the frequency-domain transform may include, but is not limited to, the fast
Fourier transform (Fast Fourier Transform, FFT).
For example, the feature unit 52 may divide the target channel audio data into frames at an
interval of 20 ms, with 50% data overlap between adjacent frames, to obtain at least one frame
of audio data. The feature unit 52 may then apply an FFT to the at least one frame of audio data
to obtain the frequency-domain data corresponding to each frame, denoted A_{i,j}, where i is
the index of the frequency point, j is the index of the frame, and A_{i,j} is the
frequency-domain data of the j-th frame at the i-th frequency point.
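The framing and transform steps can be sketched as follows. To keep the example free of external
dependencies, a direct discrete Fourier transform stands in for the FFT; it produces the same
values but is much slower, so a real implementation would use an FFT routine. The function names
are hypothetical, and no window function is applied because the text does not mention one.

```python
import cmath
from typing import List, Sequence


def split_frames(samples: Sequence[float], sample_rate_hz: int,
                 frame_ms: int = 20, overlap: float = 0.5) -> List[List[float]]:
    """Cut the samples into frames of `frame_ms` with the given overlap."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    if frame_len < 1:
        raise ValueError("sample rate too low for the requested frame length")
    hop = max(1, int(frame_len * (1 - overlap)))
    # Only complete frames are kept.
    return [list(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]


def dft(frame: Sequence[float]) -> List[complex]:
    """Direct DFT of one frame (stand-in for an FFT)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * i * t / n) for t in range(n))
            for i in range(n)]


def frequency_domain(samples: Sequence[float], sample_rate_hz: int) -> List[List[complex]]:
    """Return result[j][i], corresponding to A_{i,j} (frame j, frequency point i)."""
    return [dft(frame) for frame in split_frames(samples, sample_rate_hz)]
```

At a sample rate of 44100 Hz, each 20 ms frame holds 882 samples and consecutive frames start
441 samples apart.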
Correspondingly, in a possible implementation of this embodiment, the recognition unit 53 may be
specifically configured to obtain, according to the frequency-domain data corresponding to each
frame of audio data, the energy component of that frequency-domain data at each frequency point;
and, if at at least one common frequency point the pairwise differences between the energy
components of the frames are all less than or equal to an energy threshold, identify the sound
quality of the target audio file as the second sound quality. The energy spectrum corresponding
to this case may be as shown in Fig. 4.
For example, the recognition unit 53 obtains, according to the obtained frequency-domain data
A_{i,j} corresponding to each frame of audio data, the energy component E_{i,j} of that
frequency-domain data at each frequency point, where i is the index of the frequency point, j is
the index of the frame, and E_{i,j} is the energy component of the j-th frame at the i-th
frequency point.
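The energy check can be sketched as follows. The text does not give a formula for the energy
component, so this sketch assumes the squared magnitude, E_{i,j} = |A_{i,j}|^2. The function
names are hypothetical, and the input is indexed as frames[j][i], that is, frame first and
frequency point second.

```python
from typing import List, Sequence


def energy_components(frames: Sequence[Sequence[complex]]) -> List[List[float]]:
    """Assumed definition: E_{i,j} = |A_{i,j}|**2, returned as energy[j][i]."""
    return [[abs(a) ** 2 for a in frame] for frame in frames]


def has_constant_energy_point(energy: Sequence[Sequence[float]],
                              energy_threshold: float) -> bool:
    """True if, at some frequency point, the energies of all frames differ
    pairwise by at most the threshold."""
    if len(energy) < 2:
        return False  # a pairwise comparison needs at least two frames
    num_points = min(len(frame) for frame in energy)
    for i in range(num_points):
        across_frames = [frame[i] for frame in energy]
        # All pairwise differences <= threshold  <=>  max - min <= threshold.
        if max(across_frames) - min(across_frames) <= energy_threshold:
            return True
    return False
```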
In this embodiment, the acquiring unit obtains the target audio file to be identified, and the
feature unit then obtains, according to the target audio file, at least one of the time-domain
waveform feature and the frequency-domain spectral line feature of the target audio file. The
recognition unit can then identify, according to at least one of these features, whether the
sound quality of the target audio file is the first sound quality or the second sound quality,
the first sound quality being higher than the second. Audio files of genuinely high sound
quality can thus be provided to users, allowing them to enjoy audio of genuinely high sound
quality.
In addition, the technical solution provided by the present invention is simple to operate and
can effectively improve the efficiency of sound quality identification of audio files.
Those skilled in the art can clearly understand that, for convenience and brevity of
description, the specific working processes of the systems, apparatuses and units described
above may refer to the corresponding processes in the foregoing method embodiments, and are not
repeated here.
In the several embodiments provided by the present invention, it should be understood that the
disclosed systems, apparatuses and methods may be implemented in other ways. For example, the
apparatus embodiments described above are merely illustrative. The division into units is only a
division by logical function, and other divisions are possible in an actual implementation: for
example, multiple units or components may be combined or integrated into another system, or some
features may be omitted or not performed. Furthermore, the mutual coupling, direct coupling or
communication connection shown or discussed may be an indirect coupling or communication
connection through some interfaces, apparatuses or units, and may be electrical, mechanical or
of another form.
The units described as separate components may or may not be physically separate, and the
components shown as units may or may not be physical units; that is, they may be located in one
place or distributed over multiple network units. Some or all of the units may be selected
according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated
into one processing unit, or each unit may exist physically on its own, or two or more units may
be integrated into one unit. The integrated unit may be implemented in the form of hardware, or
in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a
computer-readable storage medium. The software functional unit is stored in a storage medium and
includes several instructions that cause a computer device (which may be a personal computer, an
audio processing engine, a network device or the like) or a processor to perform some of the
steps of the methods described in the embodiments of the present invention. The storage medium
includes any medium that can store program code, such as a USB flash drive, a removable hard
disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory,
RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the
technical solutions of the present invention, not to limit them. Although the present invention
has been described in detail with reference to the foregoing embodiments, those of ordinary
skill in the art should understand that they may still modify the technical solutions described
in the foregoing embodiments, or make equivalent replacements of some of the technical features
therein, and that such modifications or replacements do not cause the essence of the
corresponding technical solutions to depart from the spirit and scope of the technical solutions
of the embodiments of the present invention.