CN102842310A - Method for extracting and utilizing audio features for repairing Chinese national folk music audios

Publication number
CN102842310A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012102849714A
Other languages
Chinese (zh)
Inventor
王劲松
李柏岩
宋辉
黄钢
袁征
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI XIEYAN SCIENCE AND TECHNOLOGY SERVICE Co Ltd
Original Assignee
SHANGHAI XIEYAN SCIENCE AND TECHNOLOGY SERVICE Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI XIEYAN SCIENCE AND TECHNOLOGY SERVICE Co Ltd
Priority to CN2012102849714A
Publication of CN102842310A
Legal status: Pending

Landscapes

  • Auxiliary Devices For Music (AREA)

Abstract

The invention discloses a method for extracting and using audio features that is particularly suited to repairing audio of Chinese national folk music. The method comprises the following steps: determining the music type of a sample audio and taking its values for the cultural features associated with that music type as its cultural feature set; extracting digital features from the digital audio signal of the sample audio and performing feature selection and classification to obtain the digital feature set of the sample audio; associating the cultural feature set with the digital feature set to build an audio feature database; and determining the music type of an audio to be repaired, taking its values for the cultural features of that music type as its cultural feature set, searching the audio feature database for a cultural feature set with a high matching degree, taking the digital feature set associated with that cultural feature set as the digital feature set of the audio to be repaired, and exporting it for use in repairing the audio. The method introduces the cultural features of the audio into the repair, so that the quality of the repair can be ensured.

Description

Method for extracting and using audio features for repairing Chinese national folk music audio
Technical field
The present invention relates to a method for extracting and using audio features, and in particular to a method for extracting and using audio features for repairing historical audio of Chinese national folk music.
Background art
Music is ubiquitous in people's lives. From ancient times to the present, music has run through the whole development of human society, recording countless fine moments of life and stirring people's souls and emotions.
To record this rich variety of music, people invented various music storage media, and various music processing techniques emerged along with them. In the era dominated by analog audio processing, audio was processed mainly with specialized equipment: mixing, delay, and alteration of audio were all performed by dedicated devices. Because the amplification, filtering, and delay circuits of such devices can all introduce new noise and distortion, and because the equipment was very expensive, the development of analog audio technology was held back to some extent.
With the rapid development of computer technology, computer-centered information processing plays an increasingly important role, and digital audio processing has advanced rapidly as well. Unlike analog audio processing, digital audio processing discretizes the analog signal in time and quantizes it in amplitude, turning it into a sequence of digital signals for storage and transmission. Once an audio signal is in digital form, all processing is in effect numerical processing, which can be implemented in software on a computer on the basis of digital signal processing theory and various algorithms. Software-based implementation has the advantages of low cost and flexible processing: a computer equipped with a sound card and audio processing software can perform all kinds of processing, and the result can be revised and reprocessed repeatedly. As computer processing power keeps increasing, the drawback of non-real-time operation is gradually being overcome.
Although the computer has brought great convenience to music processing, much early music is still stored as analog signals on old media. For example, the libraries of the major professional music conservatories in China generally hold large collections of audio materials from different periods and on different media, including many classic instrumental and vocal recordings. With the passage of time and the technical limitations of the media, however, some early audio materials are on the verge of becoming unusable. Cleaning, digitizing, and repairing these precious historical audio materials is an effective way to protect them and make them usable again. However, because the composer, the performer, the place and time of composition, and the place and time of performance differ, each piece of music can have a different expressive style (a distinctive combination of musical elements such as melody, rhythm, timbre, dynamics, harmony, texture, and form). In addition, the storage medium of a piece also affects its expressive style. In other words, the cultural features of a piece of music (such as composer, performer, and medium) influence its expressive style. If these are ignored when repairing historical audio materials, the repair will not be successful.
Those skilled in the art have therefore sought to develop a method for extracting and using audio features for historical audio repair, so that the cultural features of the audio are introduced into the repair of historical audio of Chinese national folk music.
Summary of the invention
In view of the above defects of the prior art, the technical problem to be solved by the present invention is to provide a method for extracting and using audio features for historical audio repair which, by building an audio feature database that associates the cultural feature set of an audio with its digital feature set, makes it possible during repair to derive from the database the digital features related to the cultural features of the audio and to use them in the repair.
To achieve the above object, the invention provides a method for extracting and using audio features for historical audio repair, characterized by comprising the steps of:
determining the music type of a sample audio, and determining the values of the sample audio for each cultural feature of the music type as the cultural feature set of the sample audio;
converting the sample audio into a digital audio signal in WAV format, and pre-processing the digital audio signal;
extracting digital features from the pre-processed digital audio signal, and using a classifier to perform feature selection and classification on the digital features to obtain the digital feature set of the sample audio;
associating the cultural feature set and the digital feature set of the sample audio to build the audio feature database of the music type;
determining the music type of an audio to be repaired; determining the values of the audio to be repaired for each cultural feature of the music type as the cultural feature set of the audio to be repaired; searching the audio feature database of the music type with the cultural feature set of the audio to be repaired to obtain the cultural feature set of the sample audio that has the highest matching degree with it; and using the digital feature set associated with that cultural feature set of the sample audio as the digital feature set of the audio to be repaired;
exporting the digital feature set of the audio to be repaired for use in repairing the audio to be repaired.
Further, the sample audio and the audio to be repaired are both folk music audio, and the music types include a guqin class, a Nanyin (southern Fujian music) class, and a khoomei (throat singing) class.
Further, the cultural features of the guqin class include qin school, style, medium, and era; the cultural features of the Nanyin class include tune name (qupai), instrument, medium, and era; and the cultural features of the khoomei class include vocalization position, medium, and era.
Further, the value of the sample audio for each cultural feature of the music type is a descriptive term.
Further, the pre-processing includes sampling-rate unification, channel merging, and windowed framing, and the sampling rate of the digital audio signal after sampling-rate unification is 16 kHz.
Further, when windowed framing is applied to the digital audio signal, a Hamming window is used as the window function, the frame shift is 1/2, and the window length is 512 samples.
Further, the digital features include pitch features, loudness features, timbre features, and rhythm features of the digital audio signal of the sample audio; the pitch features include the spectral peak of the digital audio signal; the loudness features include the low-energy frame ratio of the digital audio signal; the timbre features include the short-time zero-crossing rate, the spectral centroid, and the MFCCs of the digital audio signal; and the rhythm features include the beat sum, the strongest beat, and the strength of the strongest beat of the digital audio signal.
Further, the classifier is a support vector machine classifier.
Further, the algorithms used for feature selection on the digital audio signal are heuristic forward search (HBS) and heuristic backward search (HFS).
Further, the matching degree is the ratio of the number of elements of the cultural feature set of the audio to be repaired that, under fuzzy matching, coincide with elements of the cultural feature set of the sample audio to the number of elements in the cultural feature set of the audio to be repaired.
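The matching degree defined above can be illustrated with a minimal sketch. The fuzzy-matching rule is not specified in the text; the sketch assumes a string-similarity threshold from Python's standard `difflib`, and the names `fuzzy_equal`, `matching_degree`, `best_match`, and the threshold of 0.8 are hypothetical.

```python
from difflib import SequenceMatcher

def fuzzy_equal(a, b, threshold=0.8):
    """Assumed fuzzy rule: case-insensitive similarity ratio >= threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def matching_degree(target, sample):
    """Share of the target's elements that fuzzily match some sample element."""
    if not target:
        return 0.0
    hits = sum(1 for t in target if any(fuzzy_equal(t, s) for s in sample))
    return hits / len(target)

def best_match(target, database):
    """Return the (cultural_set, digital_set) pair with the highest matching degree."""
    return max(database, key=lambda pair: matching_degree(target, pair[0]))
```

Here `target` is the cultural feature set of the audio to be repaired and each database entry pairs a sample's cultural feature set with its digital feature set; the denominator is the size of the target set, as the definition states.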
In a preferred embodiment of the present invention, the method for extracting and using audio features for historical audio repair is applied to the repair of historical audio of Chinese national folk music, and audio feature databases are built for the guqin class, the Nanyin class, and the khoomei class. The steps are as follows. First, a plurality of sample audios are used and their music types are determined; the music types include the guqin class, the Nanyin class, and the khoomei class. Each music type has several cultural features: for example, those of the guqin class include qin school, style, medium, and era; those of the Nanyin class include tune name (qupai), instrument, medium, and era; and those of the khoomei class include vocalization position, medium, and era. The value of each sample audio for each cultural feature of its music type is then determined, and the set of these values is taken as the cultural feature set of the sample audio; each value is a descriptive term. In addition, the sample audio is converted into a digital audio signal in WAV format, and the signal is pre-processed by sampling-rate unification, channel merging, and windowed framing. Digital features are then extracted from the pre-processed signal. They comprise pitch, loudness, timbre, and rhythm features: the pitch features include the spectral peak; the loudness features include the low-energy frame ratio; the timbre features include the short-time zero-crossing rate, the spectral centroid, and the MFCCs; and the rhythm features include the beat sum, the strongest beat, and the strength of the strongest beat. A classifier is then used to perform feature selection and classification on these digital features, yielding the digital feature set of the sample audio. Finally, the cultural feature set and the digital feature set of the sample audio are associated, and the associated sets of every sample audio are stored in a database, thereby building the audio feature database of each music type.
The preferred embodiment also uses the method to obtain the digital feature set of an audio to be repaired for use in its repair. The steps are as follows. The music type of the audio to be repaired is determined, the value of the audio for each cultural feature of that music type is determined, and the set of these values is taken as its cultural feature set; each value is a descriptive term. The audio feature database of that music type is then searched, using the elements of the cultural feature set of the audio to be repaired as keywords and applying fuzzy matching, to obtain the cultural feature set of the sample audio with the highest matching degree. The digital feature set associated with that cultural feature set is then extracted as the digital feature set of the audio to be repaired. Finally, this digital feature set is exported for use in repairing the audio.
It can thus be seen that the invention uses audio features comprising both cultural features and digital features. From a plurality of sample audios it builds audio feature databases for a plurality of music types, each associating cultural feature sets with digital feature sets. When an audio is to be repaired, its cultural feature set is determined, the digital feature set of the sample audio whose cultural feature set is most similar is derived from the audio feature database of its music type, and that digital feature set is used in the repair. The repaired audio thus conforms better to its cultural features, ensuring the quality of the repair.
The concept, specific structure, and technical effects of the present invention are further described below with reference to the accompanying drawings, so that its objects, features, and effects can be fully understood.
Brief description of the drawings
Fig. 1 is a flowchart of building the audio feature database using the method for extracting and using audio features for historical audio repair of the present invention.
Fig. 2 is a flowchart of obtaining, using the method of the present invention, the digital feature set of an audio to be repaired for use in its repair.
Fig. 3 shows the results of a comparative experiment between audio repaired using the method of the present invention and audio repaired without using the present invention.
Detailed description of the embodiments
As shown in Fig. 1, in one embodiment of the present invention the method for extracting and using audio features for historical audio repair is applied to the repair of historical audio of Chinese national folk music. First, the method is used to build the audio feature database, in the following steps:
Step 101, obtain the cultural feature set.
Pieces of music can be divided into a plurality of music types according to differences in school, manner of performance, and form of expression, and pieces belonging to the same music type can be considered to share more similarity in their cultural features. The invention therefore first divides music into types, determines the cultural features of each type, and builds an audio feature database for each type.
In this embodiment, through the collection, organization, and analysis of various audio materials, music types including the guqin class, the Nanyin class, and the khoomei class were determined. It was also determined that the cultural features of the guqin class include qin school, style, medium, and era; those of the Nanyin class include tune name (qupai), instrument, medium, and era; and those of the khoomei class include vocalization position, medium, and era. Here, the medium is the carrier on which the audio is stored, for example a shellac disc, a vinyl record, magnetic recording tape, or a compact disc. The era is the time at which the audio was stored on that medium; for a dubbed copy, the time of the master recording prevails.
For each music type, a number of pieces are selected as sample audios. The selected sample audios must be in good condition, for example with faithful timbre and low noise. Pieces from repaired old records of good quality may be selected as sample audios.
The value of each sample audio for each cultural feature of its music type is determined, and the set of these values is taken as the cultural feature set of the sample audio. Each value is a descriptive term. For example, for the guqin piece "Xiangjiang Water Cloud" recorded on a shellac disc in 1930, the values for the cultural features of the guqin class are: qin school = Zhejiang school, style = graceful, medium = shellac disc, era = 1930. The cultural feature set of this sample audio is therefore {Zhejiang school, graceful, shellac disc, 1930}.
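As an illustration only, the per-type cultural features and the example above could be represented as follows. The dictionary keys, the English feature names, and the function `cultural_feature_set` are assumptions made for this sketch and do not appear in the original text.

```python
# Cultural feature names per music type, as listed in the text
# (the English identifiers are illustrative renderings).
CULTURAL_FEATURES = {
    "guqin": ("school", "style", "medium", "era"),
    "nanyin": ("tune_name", "instrument", "medium", "era"),
    "khoomei": ("vocal_position", "medium", "era"),
}

def cultural_feature_set(music_type, values):
    """Order the descriptive terms of one audio by the features of its music type."""
    names = CULTURAL_FEATURES[music_type]
    missing = [n for n in names if n not in values]
    if missing:
        raise ValueError(f"missing cultural features: {missing}")
    return [values[n] for n in names]

example = cultural_feature_set(
    "guqin",
    {"school": "Zhejiang school", "style": "graceful",
     "medium": "shellac disc", "era": "1930"},
)
```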
Step 102, audio conversion.
Each sample audio selected in step 101 is converted into a digital audio signal in WAV format.
Common audio formats include MP3 and WAV, so the audio format must first be unified. Because MP3 is essentially an encoding scheme for audio compression, which hinders the subsequent extraction of digital features, all sample audios in this embodiment are converted into WAV format, which is better suited to analysis.
Step 103, signal pre-processing.
The digital audio signal obtained in step 102 is pre-processed by sampling-rate unification, channel merging, and windowed framing.
Inconsistent sampling rates adversely affect the extraction of certain digital features, and an excessively high sampling rate does not increase the useful information that can be extracted from the music while incurring a large storage overhead. In this embodiment, all sample audios are therefore resampled to 16 kHz.
The two channels of each sample audio are merged into a single mono channel to simplify the extraction of musical features.
After the sampling rate has been unified and the channels merged, the digital audio signal is filtered and then windowed and framed. The frame rate is generally about 33 to 100 frames per second. Overlapping segmentation is used so that the transition between frames is smooth and continuity is preserved. The overlapping part of one frame and the next is called the frame shift, and the ratio of frame shift to frame length is generally between 0 and 1/2. Framing is implemented by weighting the signal with a movable finite-length window, that is, by multiplying the signal $s(n)$ by a window function $w(n)$ to form the windowed signal $S_w(n) = s(n)\,w(n)$. In this embodiment a Hamming window is used as the window function, the frame shift is 1/2, and the window length is 512 samples.
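The windowed framing just described can be sketched as follows, assuming the signal is already a mono list of samples at 16 kHz. The function names are hypothetical, and discarding trailing samples that do not fill a whole frame is an assumption of this sketch.

```python
import math

def hamming(length):
    """Hamming window w(n) = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def frame_signal(s, frame_len=512, hop=256):
    """Split s into half-overlapping frames and apply S_w(n) = s(n) * w(n)."""
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(s) - frame_len + 1, hop):
        frames.append([s[start + n] * w[n] for n in range(frame_len)])
    return frames
```

With a 512-sample window and a hop of 256 samples at 16 kHz, this gives 62.5 frames per second, which lies within the 33 to 100 frames per second mentioned in the text.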
Step 104, extract the digital features.
Digital features are extracted from the digital audio signal obtained by the pre-processing of step 103.
In this embodiment, through the collection, organization, and analysis of various audio materials, the classes of digital features were determined to include pitch, loudness, timbre, and rhythm features. The pitch features include the spectral peak of the digital audio signal; the loudness features include the low-energy frame ratio; the timbre features include the short-time zero-crossing rate, the spectral centroid, and the MFCCs; and the rhythm features include the beat sum, the strongest beat, and the strength of the strongest beat.
The power spectrum is a measure of the energy at each frequency, obtained by transforming the signal $S_w(n)$ from the time domain to the frequency domain; specifically, it is the sum of the squares of the real and imaginary parts of the Fourier transform of $S_w(n)$.
The magnitude spectrum is the square root of the sum of the squares of the real and imaginary parts of the Fourier transform of $S_w(n)$.
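The two spectra can be sketched directly from these definitions. The example below uses a naive discrete Fourier transform for clarity; a practical implementation would use an FFT routine, and the function names are illustrative.

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one windowed frame."""
    size = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * m * n / size)
                for n in range(size))
            for m in range(size)]

def power_spectrum(frame):
    """Sum of squared real and imaginary parts of each Fourier coefficient."""
    return [c.real ** 2 + c.imag ** 2 for c in dft(frame)]

def magnitude_spectrum(frame):
    """Square root of the power spectrum."""
    return [math.sqrt(p) for p in power_spectrum(frame)]
```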
The beat histogram is computed as follows: first the root-mean-square (RMS) short-time energy of $S_w(n)$ in each window is obtained; a fast Fourier transform (FFT) is then applied to the RMS sequence. The resulting energy spectrum of the RMS sequence expresses the periodicity of the energy of the music signal, and this periodicity is used to represent the beat.
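A minimal sketch of this computation follows. It uses a naive DFT in place of an FFT and keeps only the non-negative frequency bins; both choices, and the function names, are assumptions of the sketch rather than details given in the text.

```python
import cmath
import math

def rms(frame):
    """Root-mean-square value of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def beat_histogram(frames):
    """Energy spectrum of the per-frame RMS sequence (non-negative bins only)."""
    envelope = [rms(f) for f in frames]
    size = len(envelope)
    hist = []
    for m in range(size // 2 + 1):
        c = sum(envelope[k] * cmath.exp(-2j * math.pi * m * k / size)
                for k in range(size))
        hist.append(c.real ** 2 + c.imag ** 2)
    return hist
```

Bin m of the result corresponds to an energy periodicity of m times the frame rate divided by the number of frames, in hertz; multiplying by 60 converts it to beats per minute.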
The spectral peak is a 1-dimensional digital feature obtained by analyzing the spectral magnitude of $S_w(n)$ after the FFT. Peaks are detected by setting a threshold within a local region of the frequency domain of $S_w(n)$; every maximum within this threshold is regarded as a peak.
The low-energy frame ratio is a 1-dimensional digital feature that expresses how energy varies from frame to frame. It is the percentage of frames, among k adjacent frames, whose time-domain energy is lower than the average time-domain energy of those k frames. In this embodiment, k = 100.
The short-time zero-crossing rate $Z(i)$ is a 1-dimensional digital feature: the number of times the sample values of $S_w(n)$ in frame $i$ change from positive to negative or from negative to positive. It is computed as:
$$Z(i) = \frac{1}{2N} \sum_{n}^{N-1} \left| \operatorname{sgn}[x_i(n)] - \operatorname{sgn}[x_i(n-1)] \right|,$$
where
[Formula image BDA00001996514500062]
$N$ is the number of samples in frame $i$, and $x_i(n)$ is the time-domain amplitude of a sample.
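The zero-crossing formula can be sketched as below. Two points are assumptions of the sketch because the text does not fix them: the sign function is taken as +1 for non-negative values and -1 otherwise, and the sum runs from n = 1 to N - 1 so that x_i(n - 1) stays inside the frame.

```python
def sgn(x):
    """Assumed sign convention: +1 for x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1

def zero_crossing_rate(frame):
    """Z(i) = (1 / 2N) * sum over n of |sgn(x(n)) - sgn(x(n-1))|."""
    size = len(frame)
    total = sum(abs(sgn(frame[n]) - sgn(frame[n - 1])) for n in range(1, size))
    return total / (2 * size)
```

Each sign change contributes 2 to the sum, so the factor 1/(2N) turns the total into the number of crossings per sample.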
The spectral centroid $C(i)$ is a 1-dimensional digital feature that measures the spectral shape of frame $i$; a larger value corresponds to a brighter sound, with more energy at high frequencies. It is computed as:
[Formula image BDA00001996514500063]
where $x_i$ is the sample sequence of frame $i$ and $X_i(m)$ is the corresponding Fourier transform coefficient.
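Because the formula itself is not reproduced in the text, the following sketch assumes the commonly used definition of the spectral centroid, namely the magnitude-weighted mean of the frequency bin index over the non-negative bins. It may differ in detail from the formula in the original document, and the function name is hypothetical.

```python
import cmath
import math

def spectral_centroid(frame):
    """Assumed definition: sum(m * |X(m)|) / sum(|X(m)|) over bins 0..N/2."""
    size = len(frame)
    mags = []
    for m in range(size // 2 + 1):
        c = sum(frame[k] * cmath.exp(-2j * math.pi * m * k / size)
                for k in range(size))
        mags.append(abs(c))
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(m * mag for m, mag in enumerate(mags)) / total
```

The result is expressed in bins; multiplying by the sampling rate divided by the frame length would convert it to hertz.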
The MFCCs (Mel-frequency cepstral coefficients) are a 13-dimensional digital feature. They apply the auditory characteristics of the human ear to signal processing and are among the most useful features for speech and sound recognition and classification. The extraction flow is: compute the power spectrum of $S_w(n)$, compute the discrete cosine transform, compute the Mel-frequency cepstrum, and obtain the MFCCs.
The beat sum is a 1-dimensional digital feature: the sum of the strengths of all beats detected in a segment of the music signal.
The strongest beat is a 1-dimensional digital feature: the beat with the greatest strength in the beat histogram, obtained as the tempo at which the beat histogram takes its maximum value, in beats per minute.
The strength of the strongest beat is a 1-dimensional digital feature obtained as the ratio of the strength of the strongest beat to the sum of the strengths of all beats in the beat histogram; its range is (0, 1).
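The three rhythm features can be read off a beat histogram as in this sketch, which assumes the histogram is given as a mapping from tempo in beats per minute to beat strength. The representation and the function name are illustrative.

```python
def beat_features(histogram):
    """Return (beat sum, strongest beat in BPM, strength of the strongest beat)."""
    beat_sum = sum(histogram.values())
    strongest_beat = max(histogram, key=histogram.get)
    strength = histogram[strongest_beat] / beat_sum
    return beat_sum, strongest_beat, strength
```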
The above digital features of the sample audio form a 20-dimensional feature matrix. The standard deviation of each digital feature is then computed, each digital feature is combined with its standard deviation into a 40-dimensional vector, and this 40-dimensional vector is used as the classification feature vector.
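One plausible reading of this step is sketched below: each frame yields 20 feature values (1 spectral peak, 1 low-energy ratio, 1 zero-crossing rate, 1 spectral centroid, 13 MFCCs, and 3 rhythm values), and the 40-dimensional vector concatenates the per-feature means with the per-feature standard deviations. Using the mean as the representative value and the population standard deviation are assumptions of the sketch.

```python
from statistics import mean, pstdev

def classification_vector(frame_features):
    """frame_features: one 20-value row per frame -> 40 values (means, then std devs)."""
    columns = list(zip(*frame_features))
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]
```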
Step 105, obtain the digital feature set.
A classifier is applied to the classification feature vectors to verify the classification performance. The classifier may be a naive Bayes classifier, a BP neural network classifier, a k-nearest-neighbor classifier (with K = 3 and K = 5), a decision tree classifier, or a support vector machine classifier. A support vector machine (SVM) classifier is used in this embodiment.
The individual digital features and their various combinations influence classification performance differently: some features contribute greatly, while others contribute little or even degrade the classifier. Feature selection is therefore required. In this embodiment, two heuristic hybrid feature selection methods were designed: heuristic forward search (HFS) and heuristic backward search (HBS).
The HFS algorithm is executed in the following steps, with an SVM as the classifier in the experiments:
1) take all 40 dimensions of the vector as the initial feature subset $FS_{opt}$, and classify the data set with the classifier;
2) separate the misclassified samples from the test set $D_{te}$ to form the error data set $D_{er}$;
3) compute the ReliefF weight of each feature dimension and remove the feature with the lowest weight from $FS_{opt}$; note that each feature is removed at most once;
4) run a classification experiment on the data set restricted to the features in $FS_{opt}$; if the accuracy improves, return to step 2), otherwise go to the next step;
5) add the feature just removed back, and increase by 1 the number of searches needed to add a new feature. If this number exceeds a preset threshold, the algorithm terminates; otherwise return to step 2).
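The control flow of the steps above can be sketched as follows. The SVM and the ReliefF computation are not implemented here: `evaluate` and `weights` are hypothetical callables standing in for them, and computing the weights on the misclassified samples is an assumption, since the text does not say which data the weights are computed on.

```python
def hfs(features, evaluate, weights, max_failures=3):
    """
    Sketch of the procedure the text labels HFS (start full, remove features).
    evaluate(subset) -> (accuracy, misclassified_samples)   # hypothetical callable
    weights(subset, samples) -> {feature: ReliefF weight}    # hypothetical callable
    """
    subset = list(features)                        # step 1: all features
    best_acc, errors = evaluate(subset)            # steps 1-2
    tried, failures = set(), 0
    while failures <= max_failures:
        candidates = [f for f in subset if f not in tried]
        if not candidates or len(subset) == 1:
            break
        w = weights(subset, errors)                # step 3: weigh features
        worst = min(candidates, key=lambda f: w[f])
        tried.add(worst)                           # each feature removed at most once
        trial = [f for f in subset if f != worst]
        acc, trial_errors = evaluate(trial)        # step 4: re-classify
        if acc > best_acc:
            subset, best_acc, errors = trial, acc, trial_errors
        else:
            failures += 1                          # step 5: feature stays in the subset
    return subset, best_acc
```

The loop stops once the failure counter exceeds `max_failures`, which plays the role of the preset threshold in step 5.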
The HBS algorithm proceeds as follows; an SVM is used as the classifier in the experiments:
1) Set the optimal feature subset FS_opt to empty, compute the ReliefF weights of all features on the training data set D_tr, and add the feature with the highest weight to FS_opt;
2) Run the classification experiment using the features contained in FS_opt;
3) Separate the correctly classified samples from the test set D_te to form the correct data set D_ri;
4) Compute the ReliefF weight of each feature dimension on D_ri, and add to FS_opt the feature that has the highest weight and is not already in FS_opt;
5) Run the classification test with the features in FS_opt; if the classification accuracy improves, return to step 3), otherwise go to step 6);
6) Remove the feature that was just added, and increase the search count by 1. If the count exceeds a preset threshold, the algorithm stops; otherwise return to step 3).
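A comparable sketch of the HBS steps is given below. The callables are again hypothetical placeholders. One detail is an assumption added for the sketch: a feature rejected in step 6 is not proposed again, since otherwise step 4 would keep selecting the same feature.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical interfaces (assumptions, not from the patent):
#   classify(features) -> (accuracy, ids of correctly classified test samples, i.e. D_ri)
#   relieff_weights(features, sample_ids) -> {feature: weight};
#   sample_ids=None is taken here to mean "use the whole training set D_tr"
Classify = Callable[[List[str]], Tuple[float, List[int]]]
Weights = Callable[[List[str], Optional[List[int]]], Dict[str, float]]


def heuristic_backward_search(
    all_features: Sequence[str],
    classify: Classify,
    relieff_weights: Weights,
    max_failures: int,
) -> Tuple[List[str], float]:
    features = list(all_features)
    seed_weights = relieff_weights(features, None)            # step 1
    fs_opt = [max(features, key=lambda f: seed_weights[f])]
    best_acc, correct = classify(fs_opt)                      # steps 2-3
    rejected = set()                                          # assumption: never retry a rejected feature
    failures = 0
    while failures <= max_failures:
        candidates = [f for f in features if f not in fs_opt and f not in rejected]
        if not candidates:
            break
        weights = relieff_weights(features, correct)          # step 4, computed on D_ri
        strongest = max(candidates, key=lambda f: weights[f])
        trial = fs_opt + [strongest]
        acc, trial_correct = classify(trial)                  # step 5
        if acc > best_acc:
            fs_opt, best_acc, correct = trial, acc, trial_correct
        else:
            rejected.add(strongest)                           # step 6
            failures += 1
    return fs_opt, best_acc
```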
The ReliefF weight of each feature dimension mentioned above is the weight that the ReliefF algorithm assigns to that dimension. After the feature selection described above, a number of digital features of the sample audio are obtained; together they form the digital feature set of that sample audio.
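To give a feel for how such weights behave, here is a small sketch of the basic Relief algorithm, a simplification of ReliefF that uses a single nearest hit and a single nearest miss per sample. It is not the ReliefF variant used in the embodiment, whose parameters are not given here; the function name and the toy data are illustrative only.

```python
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Basic Relief: reward features that differ on the nearest sample of
    another class (miss) and agree on the nearest sample of the same class (hit)."""
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]   # avoid division by zero

    def diff(a: Sequence[float], b: Sequence[float], j: int) -> float:
        return abs(a[j] - b[j]) / span[j]               # range-normalised difference

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(a, b, j) for j in range(d))     # Manhattan distance

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda k: dist(X[i], X[k]))
        m = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[m], j) - diff(X[i], X[h], j)) / n
    return w


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.1], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
```

On this toy data the separating feature receives a positive weight and the noise feature a negative one, which is the ordering that HFS and HBS rely on.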
Step 106: build the audio feature database.
For each sample audio belonging to a given music type, the cultural feature set obtained in step 101 is associated with the digital feature set obtained in step 105, for example by pairing them as a vector, and the pair is stored in a database. This builds the audio feature database for that music type.
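One possible in-memory layout for such a database is sketched below. The class name, the method names, and the digital feature values are all hypothetical; the text does not specify a storage format beyond pairing the two sets.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

# One stored record: (cultural feature set, digital feature set)
Record = Tuple[Tuple[str, ...], Dict[str, float]]


@dataclass
class AudioFeatureDatabase:
    # music type -> records of sample audio belonging to that type
    records: Dict[str, List[Record]] = field(default_factory=dict)

    def add_sample(self, music_type: str, cultural: Sequence[str],
                   digital: Dict[str, float]) -> None:
        """Associate a sample's cultural feature set with its digital feature set."""
        self.records.setdefault(music_type, []).append((tuple(cultural), dict(digital)))

    def samples(self, music_type: str) -> List[Record]:
        return self.records.get(music_type, [])


db = AudioFeatureDatabase()
db.add_sample(
    "guqin",
    ["Zhe school", "graceful", "shellac disc", "1930"],
    {"spectral_centroid": 0.4, "zero_crossing_rate": 0.1},  # made-up values
)
```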
In this embodiment, separate audio feature databases are built for the guqin class, the Nanyin (southern Fujian music) class, and the khoomei (throat singing) class. In other embodiments of the invention, audio feature databases for other music types can be built with similar steps: for Chinese folk music, for example, databases for music types such as suona, bamboo flute, and zheng; for Western music, databases for music types such as piano, violin, flute, and opera.
Fig. 2 shows the flow by which the audio feature extraction and use method for historical audio repair of the present invention obtains, for an audio to be repaired, the digital feature set to be used in the repair. It comprises the following steps:
Step 201: obtain the cultural feature set.
Before the method is used to obtain the digital feature set of a given audio to be repaired, the music type of that audio must first be determined, followed by its value for each cultural feature of that music type. The specific method is the same as that described in step 101 for determining the music type of a sample audio and its values for the cultural features of that music type. For example, if the audio to be repaired is the guqin piece "Fisherman's Song" recorded on a shellac disc in 1930, its music type is first determined to be the guqin class, and its values for the cultural features of the guqin class are then determined to be: qin school = Zhe school, style = graceful, carrier = shellac disc, era = 1930. The cultural feature set of this audio to be repaired is therefore {Zhe school, graceful, shellac disc, 1930}.
Step 202: search the audio feature database.
Using the elements of the cultural feature set of the audio to be repaired obtained in step 201 as keywords, search the audio feature database of the music type to which the audio belongs, using fuzzy matching, and obtain the cultural feature set of the sample audio that has the highest matching degree with the cultural feature set of the audio to be repaired. The matching degree is the ratio of two counts: the number of elements of the cultural feature set of the audio to be repaired that, under fuzzy matching, coincide with elements of the sample audio's cultural feature set, divided by the total number of elements in the cultural feature set of the audio to be repaired.
For example, for the audio to be repaired that is the guqin piece "Fisherman's Song" recorded on a shellac disc in 1930, the cultural feature set is {Zhe school, graceful, shellac disc, 1930}. Its elements are Zhe school, graceful, shellac disc, and 1930, so the number of elements is 4. Using these elements as keywords, a fuzzy matching algorithm searches the audio feature database of the guqin class and returns the following results:
1. {Zhe school, graceful, shellac disc, 1930};
2. {Yushan school, tranquil and distant, shellac disc, 1930};
3. {Zhe school, graceful, polyethylene disc, 1950};
4. {Jiuyi school, vigorous, shellac disc, 1935}.
The matching degree between the cultural feature set of the audio to be repaired and the cultural feature set of each of the 4 search results can then be calculated. For result 1 the matching degree is 100% (4 of 4 elements coincide); for result 2 it is 50% (2 of 4); for result 3 it is 50% (2 of 4); for result 4 it is 25% (1 of 4). The cultural feature set with the highest matching degree is therefore that of search result 1.
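The matching degrees in this example can be reproduced with a few lines of code. The sketch below uses exact string equality as a stand-in for the fuzzy matching step, which the text does not specify further, and uses English renderings of the feature values as an assumption.

```python
from typing import List, Sequence


def matching_degree(query: Sequence[str], sample: Sequence[str]) -> float:
    """Share of the query's cultural feature values that also occur in the sample's set."""
    overlap = sum(1 for value in query if value in sample)
    return overlap / len(query)


query = ["Zhe school", "graceful", "shellac disc", "1930"]
results: List[List[str]] = [
    ["Zhe school", "graceful", "shellac disc", "1930"],
    ["Yushan school", "tranquil and distant", "shellac disc", "1930"],
    ["Zhe school", "graceful", "polyethylene disc", "1950"],
    ["Jiuyi school", "vigorous", "shellac disc", "1935"],
]

degrees = [matching_degree(query, r) for r in results]
print(degrees)  # [1.0, 0.5, 0.5, 0.25]
```

A real fuzzy matcher might also treat near values such as 1930 and 1935 as partial matches; the 25% figure given for result 4 suggests the example counts them as different.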
Step 203: obtain the digital feature set.
From the audio feature database of the music type to which the audio to be repaired belongs, obtain the digital feature set associated with the cultural feature set of the sample audio retrieved in step 202, and use it as the digital feature set of the audio to be repaired.
Step 204: feed into the audio repair.
Export the digital feature set of the audio to be repaired obtained in step 203 from the audio feature database of its music type, and send it to the audio repair software or program, where it takes part in setting the parameters used to repair that audio.
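Steps 202 to 204 taken together might look like the sketch below: pick the record with the highest matching degree and serialise its digital feature set for the repair tool. The JSON layout, the field names, and the feature values are assumptions made for illustration, since the text does not define an exchange format.

```python
import json
from typing import Dict, List, Sequence, Tuple

Record = Tuple[Sequence[str], Dict[str, float]]  # (cultural set, digital set)


def export_best_match(query: Sequence[str], records: List[Record]) -> str:
    """Return the digital feature set of the best-matching sample as a JSON string."""

    def degree(record: Record) -> float:
        cultural, _ = record
        return sum(1 for value in query if value in cultural) / len(query)

    cultural, digital = max(records, key=degree)
    payload = {
        "matched_cultural_set": list(cultural),
        "digital_feature_set": digital,
    }
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)


records: List[Record] = [
    (["Yushan school", "tranquil and distant", "shellac disc", "1930"],
     {"spectral_centroid": 0.2}),   # made-up value
    (["Zhe school", "graceful", "shellac disc", "1930"],
     {"spectral_centroid": 0.4}),   # made-up value
]
exported = export_best_match(["Zhe school", "graceful", "shellac disc", "1930"], records)
```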
Fig. 3 gives the results of a comparative experiment in which the audio feature extraction and use method for historical audio repair was applied to the repair of historical recordings of Chinese national folk music. The audio in group A was repaired using the present invention, whereas the audio in groups B, C, and D was repaired without it. Each group comprised 10 audio recordings, and members of a panel of music experts scored each recording in a double-blind evaluation on four aspects: signal-to-noise ratio, artistic value, timbre and sound quality, and dynamic variation. The results show that audio repaired using the present invention performs better in artistic value, timbre and sound quality, and dynamic variation. That is, audio repaired using the present invention conforms more closely to its cultural features, which safeguards the quality of the repair.
The preferred embodiments of the present invention have been described in detail above. It should be understood that a person of ordinary skill in the art can make many modifications and variations according to the concept of the invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning, or limited experimentation in accordance with the concept of the invention shall fall within the scope of protection defined by the claims.

Claims (10)

1. the method for audio feature extraction and use is used for the reparation of historical audio frequency, it is characterized in that, comprises step:
Confirm the music type of sample audio frequency, confirm that said sample audio frequency is about the cultural traits set as said sample audio frequency of the value of each cultural traits of said music type;
Said sample audio conversion is become the digital audio and video signals of WAV form, and said digital audio and video signals is carried out pre-service;
Extract numerical characteristic, use sorter that said numerical characteristic is carried out feature selecting and classification, obtain the numerical characteristic set of said sample audio frequency through said pretreated said digital audio and video signals;
The said cultural traits set of related said sample audio frequency and the set of said numerical characteristic are to set up the audio frequency characteristics database of said music type;
Confirm the music type of audio frequency to be repaired; Confirm that said audio frequency to be repaired is about the cultural traits set as said audio frequency to be repaired of the value of each cultural traits in the said music type; The cultural traits set of the said audio frequency to be repaired of retrieval in the audio frequency characteristics database of said music type; The said cultural traits set of the said sample audio frequency that the cultural traits sets match degree of acquisition and said audio frequency to be repaired is the highest, use is gathered as the numerical characteristic of said audio frequency to be repaired with the said numerical characteristic of the said cultural traits set associative of said sample audio frequency and is gathered;
The said numerical characteristic set of said audio frequency to be repaired is derived to be used for the reparation to said audio frequency to be repaired.
2. the method for audio feature extraction as claimed in claim 1 and use, wherein said sample audio frequency and said audio frequency to be repaired are all the audio frequency of folk music, and said music type comprises seven-stringed plucked instrument in some ways similar to the zither class, Fujian southern music class and exhales the wheat class.
3. the method for audio feature extraction as claimed in claim 2 and use, the cultural traits of wherein said seven-stringed plucked instrument in some ways similar to the zither class comprise qin group, style, carrier and age; The cultural traits of said Fujian southern music class comprise the name of tune, musical instrument, carrier and age; The said cultural traits of wheat class of exhaling comprise sounding position, carrier and age.
4. the method for audio feature extraction as claimed in claim 3 and use, wherein said sample audio frequency is a description entry about the value of the said cultural traits of said music type.
5. the method for audio feature extraction as claimed in claim 4 and use, wherein said pre-service comprise that unified sampling rate, sound channel merge and windowing divides frame, and the sampling rate of the said digital audio and video signals of the said unified sampling rate of process is 16kHz.
6. the method for audio feature extraction as claimed in claim 5 and use wherein when said digital audio and video signals being carried out said windowing divide frame, uses Hamming window as window function, and it is 1/2 that frame moves, and window length is the length of 512 sampled points.
7. like the method for claim 1 or 3 described audio feature extraction and use, wherein said numerical characteristic comprises tonality feature, loudness characteristic, tamber characteristic and the rhythm characteristic of the said digital audio and video signals of said sample audio frequency; Said tonality feature comprises the spectrum peak of said digital audio and video signals; Said loudness characteristic comprises the ratio of the low-yield frame of said digital audio and video signals; Said tamber characteristic comprises short-time zero-crossing rate, frequency spectrum barycenter and the MFCC of said digital audio and video signals; The beat intensity that said rhythm characteristic comprises said digital audio and video signals and, the strongest beat and the intensity of strong beat.
8. like the method for claim 1 or 3 described audio feature extraction and use, wherein said sorter is a support vector machine classifier.
9. like the method for claim 1 or 3 described audio feature extraction and use, the algorithm that wherein when the said numerical characteristic to said digital audio and video signals carries out said feature selecting, adopts is heuristic search forward and heuristic search backward.
10. like the method for claim 1 or 3 described audio feature extraction and use, wherein said matching degree is to use the ratio of the element number that the element number that the said cultural traits set of the said audio frequency to be repaired that fuzzy matching obtains overlaps with the said cultural traits set of said sample audio frequency and the said cultural traits of said audio frequency to be repaired gather.
CN2012102849714A 2012-08-10 2012-08-10 Method for extracting and utilizing audio features for repairing Chinese national folk music audios Pending CN102842310A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012102849714A CN102842310A (en) 2012-08-10 2012-08-10 Method for extracting and utilizing audio features for repairing Chinese national folk music audios

Publications (1)

Publication Number Publication Date
CN102842310A true CN102842310A (en) 2012-12-26

Family

ID=47369595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012102849714A Pending CN102842310A (en) 2012-08-10 2012-08-10 Method for extracting and utilizing audio features for repairing Chinese national folk music audios

Country Status (1)

Country Link
CN (1) CN102842310A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104464754A (en) * 2014-12-11 2015-03-25 北京中细软移动互联科技有限公司 Sound brand search method
CN105761728A (en) * 2015-12-02 2016-07-13 中国传媒大学 Chinese typical hearing culture symbol characteristic selection method
CN106202128A (en) * 2015-05-08 2016-12-07 富士通株式会社 The sorting technique of sequential file and categorizing system
CN106372257A (en) * 2016-10-09 2017-02-01 华中师范大学 Retrieval method and device of musical instruments
CN109147816A (en) * 2018-06-05 2019-01-04 安克创新科技股份有限公司 The method and apparatus of volume adjustment is carried out to music
CN109176541A (en) * 2018-09-06 2019-01-11 南京阿凡达机器人科技有限公司 A kind of method, equipment and storage medium realizing robot and dancing
CN116312636A (en) * 2023-03-21 2023-06-23 广州资云科技有限公司 Method, apparatus, computer device and storage medium for analyzing electric tone key

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005106844A1 (en) * 2004-04-29 2005-11-10 Koninklijke Philips Electronics N.V. Method of and system for classification of an audio signal
CN101477801A (en) * 2009-01-22 2009-07-08 东华大学 Method for detecting and eliminating pulse noise in digital audio signal
CN101882442A (en) * 2009-05-04 2010-11-10 上海音乐学院 Historical voice frequency noise detection and elimination method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙科: "中国民族音乐特征提取与分类技术的研究", 《中国优秀硕士学位论文全文数据库哲学与人文科学辑》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104464754A (en) * 2014-12-11 2015-03-25 北京中细软移动互联科技有限公司 Sound brand search method
CN106202128A (en) * 2015-05-08 2016-12-07 富士通株式会社 The sorting technique of sequential file and categorizing system
CN105761728A (en) * 2015-12-02 2016-07-13 中国传媒大学 Chinese typical hearing culture symbol characteristic selection method
CN106372257A (en) * 2016-10-09 2017-02-01 华中师范大学 Retrieval method and device of musical instruments
CN109147816A (en) * 2018-06-05 2019-01-04 安克创新科技股份有限公司 The method and apparatus of volume adjustment is carried out to music
CN109176541A (en) * 2018-09-06 2019-01-11 南京阿凡达机器人科技有限公司 A kind of method, equipment and storage medium realizing robot and dancing
CN109176541B (en) * 2018-09-06 2022-05-06 南京阿凡达机器人科技有限公司 Method, equipment and storage medium for realizing dancing of robot
CN116312636A (en) * 2023-03-21 2023-06-23 广州资云科技有限公司 Method, apparatus, computer device and storage medium for analyzing electric tone key
CN116312636B (en) * 2023-03-21 2024-01-09 广州资云科技有限公司 Method, apparatus, computer device and storage medium for analyzing electric tone key

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121226