CN110265051A - Sight-singing audio intelligent scoring modeling method applied to sing-along education - Google Patents
- Publication number
- CN110265051A (application CN201910480919.8A)
- Authority
- CN
- China
- Prior art keywords
- audio
- data
- rhythm
- pitch
- sightsinging
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/04—Real-time or near real-time messaging, e.g. instant messaging [IM]
Abstract
The present invention relates to a sight-singing audio intelligent scoring modeling method applied to sing-along education. Step 1: the system divides pre-collected solfège data containing expert scores in a 2:1 ratio, two parts serving as training data and one part as test data. Step 2: the audio data is denoised, silent segments without audio are cut out, and speech-enhancement data preprocessing is performed. Step 3: after preprocessing, audio features are extracted with the mel-frequency cepstral coefficient (MFCC) method, and pitch information is extracted. Step 4: frequency-domain features are extracted with the short-time Fourier transform (STFT), and the beat information they contain is extracted to form rhythm-based features. Step 5: scoring is modeled on the pitch, rhythm, and related feature information. The invention helps users improve their music sight-singing ability.
Description
Technical field
The present invention relates to a sight-singing audio intelligent scoring modeling method applied to sing-along education.
Background technique
This system lets users record audio and upload the audio files to the system's back-end server, where the solfège audio is scored intelligently and the assessment result is fed back to the client. The intelligent scoring module applies machine-learning modeling: by comparing the difference between the voice in the audio and the standard performance, it judges the singing from the two angles of rhythm and intonation, thereby achieving an accurate assessment, returns the result to the user, and helps users improve their music sight-singing ability.
Summary of the invention
The object of the present invention is to provide a sight-singing audio intelligent scoring modeling method applied to sing-along education that helps users improve their music sight-singing ability.
The above object is achieved through the following technical scheme:
A sight-singing audio intelligent scoring modeling method applied to sing-along education, in which data acquisition and preprocessing comprise the following steps:
Step 1: the system divides the pre-collected solfège data, which contain expert scores, in a 2:1 ratio; two parts serve as training data and one part as test data, and modeling is performed on the training data;
Step 2: the audio data is denoised, silent segments without audio are cut out, and speech-enhancement data preprocessing is performed;
Step 3: audio features are extracted from the audio data with the mel-frequency cepstral coefficient (MFCC) method, and pitch information is extracted;
Step 4: frequency-domain features are extracted from the audio data with the short-time Fourier transform (STFT), and the beat information they contain is extracted to form rhythm-based features;
Step 5: intonation and rhythm features are extracted from the standard audio according to steps 2 to 4;
Step 6: the intonation features obtained with the MFCC method are compared between the standard audio and the solfège audio using a dynamic time warping algorithm;
Step 7: the rhythm features obtained with the short-time Fourier method are compared between the standard audio and the solfège audio using a linear hash scaling algorithm;
Step 8: the matching vectors of pitch and rhythm serve as training data for training a neural network; when the error rate on the test data set is below 1%, the verification process ends;
Step 9: through the client interface of a WeChat mini-program, the user uploads sight-singing audio from individual practice; the uploaded audio goes through steps 2 to 4 and the processing of steps 6 and 7 and is then fed into the trained neural-network model, which outputs the corresponding rhythm and intonation scores; the scoring results output by the neural network are sent to the interface of the WeChat mini-program and displayed on the client;
Step 10: the corresponding intonation vector and rhythm vector are returned to the user client interface.
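The 2:1 division of step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation; the file names and the expert-score field are hypothetical:

```python
import random

def split_2_to_1(samples, seed=42):
    """Shuffle the expert-scored recordings and split them 2:1
    into training and test sets, as in step 1."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3  # two thirds for training
    return shuffled[:cut], shuffled[cut:]

# Hypothetical expert-scored solfege recordings.
dataset = [{"wav": f"take_{i}.wav", "expert_score": 60 + i} for i in range(30)]
train, test = split_2_to_1(dataset)
print(len(train), len(test))  # 20 10
```

Shuffling before the split avoids any ordering bias in the collected recordings leaking into the train/test partition.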
Beneficial effects:
1. The scoring of the invention reaches the level of professional grading; the error against the mean score of multiple human experts is small.
2. The scoring runs efficiently: the multi-angle scoring process completes within 5 seconds, meeting industrial application requirements.
3. The method is robust to noise and still scores well in the presence of some background noise.
4. The scoring process fuses multiple features and can judge the score from multiple angles such as rhythm and intonation.
Detailed description of the invention
Attached drawing 1 is training process schematic diagram of the invention.
Attached drawing 2 is scoring process schematic diagram of the invention.
Attached drawing 3 is the dimensional variation schematic diagram of the filter group of melscale of the invention.
Attached drawing 4 is of the invention by signal decomposition, and the convolution of two signals is converted into the addition schematic diagram of two signals.
Attached drawing 5 is the similitude schematic diagram between the present invention two time serieses of calculating.
Attached drawing 6 is cost matrix schematic diagram of the invention.
Attached drawing 7 is that audio of the invention does after Fourier transformation overfrequency to separate unlike signal schematic diagram.
Specific embodiment
A sight-singing audio intelligent scoring modeling method applied to sing-along education, characterized in that data acquisition and preprocessing comprise the following steps:
Step 1: the system divides the pre-collected solfège data, which contain expert scores, in a 2:1 ratio; two parts serve as training data and one part as test data, and modeling is performed on the training data;
Step 2: the audio data is denoised, silent segments without audio are cut out, and speech-enhancement data preprocessing is performed;
Step 3: audio features are extracted from the audio data with the mel-frequency cepstral coefficient (MFCC) method, and pitch information is extracted;
Step 4: frequency-domain features are extracted from the audio data with the short-time Fourier transform, and the beat information they contain is extracted to form rhythm-based features;
Step 5: intonation and rhythm features are extracted from the standard audio according to steps 2 to 4;
Step 6: the intonation features obtained with the MFCC method are compared between the standard audio and the solfège audio using a dynamic time warping algorithm;
Step 7: the rhythm features obtained with the short-time Fourier method are compared between the standard audio and the solfège audio using a linear hash scaling algorithm;
Step 8: the matching vectors of pitch and rhythm serve as training data for training a neural network; when the error rate on the test data set is below 1%, the verification process ends;
The neural-network training process comprises: (1) selecting the important parameters according to the characteristics of the data, including the activation function, the number of hidden layers of the neural network, the number of nodes in each hidden layer, the learning rate, and so on; (2) taking the features extracted from the training data and the comparison results against the standard-audio features as two vectors, and taking the professional scores given by the experts as the prediction target, training the neural network. The network approximates the target values with the back-propagation algorithm; after the training iterations, the error between the network output and the expert scores falls below a certain threshold, and when the error rate on the test data set is below 1%, the verification process ends. If the target's error range cannot be reached within 10,000 iterations, return to (1) and readjust the settings of the important parameters;
Step 9: through the client interface of a WeChat mini-program, the user uploads sight-singing audio from individual practice; the uploaded audio goes through steps 2 to 4 and the processing of steps 6 and 7 and is then fed into the trained neural-network model, which outputs the corresponding rhythm and intonation scores; the scoring results output by the neural network are sent to the interface of the WeChat mini-program and displayed on the client;
Step 10: the corresponding intonation vector and rhythm vector are returned to the user client interface.
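The training loop of step 8 can be sketched as follows. This is a minimal illustration with synthetic data, not the patent's actual network: the 8-dimensional matching vectors, the hidden-layer size of 16, and the learning rate are all assumed for the example; only the back-propagation training, the 1% stopping threshold, and the 10,000-iteration cap follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical matching vectors (pitch + rhythm) with synthetic
# "expert scores" in (0, 1) as the prediction target.
X = rng.uniform(-1.0, 1.0, size=(300, 8))
w_true = rng.uniform(-1.0, 1.0, size=(8, 1))
y = 1.0 / (1.0 + np.exp(-(X @ w_true)))

X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# (1) Important parameters: one hidden layer, 16 tanh units, learning rate 0.5.
W1 = rng.normal(0.0, 0.5, (8, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 1)); b2 = np.zeros(1)
lr = 0.5

def test_error():
    pred = sigmoid(np.tanh(X_test @ W1 + b1) @ W2 + b2)
    return float(np.abs(pred - y_test).mean())

initial_error = test_error()

# (2) Back-propagation training, stopping once the held-out error drops
# below the 1% threshold or after at most 10,000 iterations.
for _ in range(10_000):
    h = np.tanh(X_train @ W1 + b1)           # forward pass
    out = sigmoid(h @ W2 + b2)
    delta = (out - y_train) * out * (1.0 - out) / len(X_train)
    dW2 = h.T @ delta                        # back-propagated gradients
    db2 = delta.sum(axis=0)
    dh = delta @ W2.T * (1.0 - h ** 2)
    dW1 = X_train.T @ dh
    db1 = dh.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
    if test_error() < 0.01:
        break

final_error = test_error()
print(final_error < initial_error)  # did training reduce the held-out error?
```

In the patent's retry rule, failure to reach the error range within 10,000 iterations would send the process back to parameter selection (1); the sketch simply stops.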
Further, based on step 6, what is mainly compared is the matching degree of the pitch rise and fall between the piano pitch in the standard audio and the sight-singing audio. A linear pitch calibration method is used: the pitches of the voice and the piano are first linearly scaled so that their mean energies are identical, and on that basis the pitch-change matching vectors of the audio sequences are compared.
Further, based on step 7, what is mainly compared is the matching degree of the tempo variation between the piano rhythm in the standard audio and the sight-singing audio. A linear rhythm calibration method is used: the rhythm of the voice is linearly scaled so that its tempo change rate is identical to that of the piano, and on that basis the tempo-variation matching vectors of the audio sequences are compared.
Further, based on step 10, the interface parses the result, marks the positions in the score corresponding to the sight-sung song, and annotates in red the positions where the user's matching degree is poor.
Mel-frequency cepstral coefficients are the coefficients that make up the mel-frequency cepstrum. MFCC feature extraction comprises two key steps: transformation to the mel frequency scale, followed by cepstral analysis.
Further, the mel frequency is a nonlinear frequency scale based on the human ear's perception of equal pitch intervals. Its relationship with frequency in hertz is the standard mel-hertz relation:
Mel(f) = 2595 · log10(1 + f / 700)
So if the spacing is uniform on the mel scale, the corresponding spacing in hertz grows larger and larger; hence the size variation of the mel-scale filter bank shown in Fig. 3.
The mel-scale filter bank has high resolution in the low-frequency part, which is consistent with the auditory characteristics of the human ear; this is the physical meaning of the mel scale.
The meaning of this step is: first apply the Fourier transform to the time-domain signal to move it into the frequency domain, then use the mel-scale filter bank to slice the frequency-domain signal, and finally obtain one value for each frequency band.
Cepstral analysis applies a Fourier transform to the time-domain signal, takes the logarithm, and then applies an inverse Fourier transform. It can be divided into complex cepstrum, real cepstrum, and power cepstrum; ours is the power cepstrum.
What cepstral analysis achieves is: by decomposing the signal, the convolution of two signals is converted into the addition of two signals. An example follows:
Assume the spectrum above is X(k) and the time-domain signal is x(n), so that:
X(k) = DFT(x(n))
Consider splitting the frequency-domain X(k) into a product of two parts:
X(k) = H(k) E(k)
Assume the time-domain signals corresponding to the two parts are h(n) and e(n) respectively; then:
x(n) = h(n) * e(n)
At this point h(n) and e(n) cannot be separated from each other.
Take the logarithm of both sides in the frequency domain:
log(X(k)) = log(H(k)) + log(E(k))
Then apply the inverse Fourier transform:
IDFT(log(X(k))) = IDFT(log(H(k))) + IDFT(log(E(k)))
Denote the time-domain signal obtained at this point as:
x'(n) = h'(n) + e'(n)
Although the time-domain signal x'(n) obtained here, the cepstrum, differs from the original time-domain signal x(n), the convolution relation in the time domain has been converted into a linear additive relation.
The frequency-domain signal in the figure above can be split into a product of two parts: the envelope of the spectrum and the details of the spectrum. The peaks of the spectrum are the formants; they determine the envelope of the signal in the frequency domain and are the important information for distinguishing sounds, so the purpose of cepstral analysis is precisely to obtain the envelope information of the spectrum. The envelope part corresponds to the low-frequency information of the spectrum, and the detail part corresponds to the high-frequency information. Cepstral analysis converts the convolution relation of the two corresponding time-domain signals into a linear additive relation, so passing the cepstrum through a low-pass filter is enough to obtain the time-domain signal h'(n) corresponding to the envelope part.
The intonation comparison based on the dynamic time warping (DTW) algorithm is a method of measuring the similarity between two time series; it is mainly used in speech recognition to decide whether two utterances represent the same word.
The lengths of the two time series whose similarity is to be compared may be unequal; in speech recognition this appears as different people speaking at different speeds. Moreover, the articulation speed of different phonemes within the same word also differs: someone may drag the sound "A" very long, or pronounce "i" very short. In addition, two time series may differ only by a shift on the time axis, i.e. the series coincide once the shift is removed. In these complex situations, the distance or similarity between two time series cannot be computed effectively with the traditional Euclidean distance.
DTW computes the similarity between two time series by stretching and shortening them, as in Fig. 5.
Let the two time series whose similarity is to be computed be X and Y, with lengths |X| and |Y| respectively.
The warping path has the form W = w1, w2, ..., wK, where max(|X|, |Y|) ≤ K ≤ |X| + |Y|.
Each wk has the form (i, j), where i is an index into X and j is an index into Y.
The warping path W must start at w1 = (1, 1) and end at wK = (|X|, |Y|), which guarantees that every index of X and Y appears in W.
In addition, the i and j of wk = (i, j) must increase monotonically along W, which guarantees that the dotted lines in Fig. 5 do not cross; monotonically increasing here means:
wk = (i, j), wk+1 = (i', j')
i ≤ i' ≤ i + 1, j ≤ j' ≤ j + 1
The warping path we seek is the one with the shortest distance:
D(i, j) = Dist(i, j) + min[D(i-1, j), D(i, j-1), D(i-1, j-1)]
The final warping-path distance is D(|X|, |Y|).
It is solved with dynamic programming as in Fig. 6, which shows the cost matrix D; D(i, j) is the warping-path distance between the two prefix series of lengths i and j.
Extracting frequency-domain features with the Fourier transform allows different signals to be separated by frequency after the transform. The core idea of Fourier analysis is that any wave can be represented as a superposition of multiple sine waves; "wave" here covers everything from sound to light. Therefore, applying a Fourier series to a recorded sound separates the signals of its several frequencies. The Fourier transform is a method of analyzing a signal: it can analyze the components of a signal, and it can also synthesize a signal from those components. It shows that a function meeting certain conditions can be represented as a linear combination (or integral) of trigonometric functions (sines and/or cosines). Many waveforms could serve as the components of a signal, such as sine waves, square waves, and sawtooth waves; the Fourier transform uses sine waves as the components.
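The claim that the Fourier transform separates the frequency components of a sound can be demonstrated with a toy signal; the 50 Hz and 120 Hz components and the peak threshold are arbitrary choices for the example:

```python
import numpy as np

fs = 1000                      # sampling rate in Hz
t = np.arange(fs) / fs         # one second of signal

# A "recorded sound" made of two sine components, 50 Hz and 120 Hz.
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two components appear as two separate peaks on the frequency axis.
peaks = freqs[spectrum > 100]
print(peaks)  # [ 50. 120.]
```

Both frequencies fall on exact DFT bins here, so the peaks are sharp; with non-integer frequencies a window function would be applied first to limit spectral leakage.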
The frequency domain is encountered above all in radio-frequency and communication systems, and also in high-speed digital applications. Its most important property is that it is not real: it is a mathematical construct. The time domain is the only domain with objective existence; the frequency domain is a mathematical framework that follows specific rules.
The sine wave is the only waveform that exists in the frequency domain; this is the most important rule of the frequency domain. That is, the frequency domain is described in terms of sine waves, because any waveform in the time domain can be synthesized from sine waves. This is a very important property of sine waves, though not exclusive to them; many other families of waveforms have it as well. Still, using sine waves as the functional form in the frequency domain has its special advantages: some problems related to the electrical effects of interconnects become much clearer and easier to solve with sine waves, and transforming to the frequency domain and describing a problem with sine waves sometimes yields an answer faster than working in the time domain alone.
In practice, if one builds a circuit containing resistance, inductance, and capacitance and feeds it an arbitrary waveform, the output is usually a waveform resembling a sine wave; moreover, such waveforms are easily described as combinations of a few sine waves.
The rhythm comparison based on linear hash scaling compares two audio segments from the rhythm angle and must account for differences in the user's humming tempo. A linear scaling algorithm can quickly compute the linear distance between two sequences of different lengths. The main reason for using linear scaling in humming rhythm scoring is that the user's humming tempo is inconsistent with the tempo of the original performance; through linear scaling, the hummed segment can be stretched or compressed to keep its tempo consistent with the original. The key of the algorithm is to stretch the fundamental-frequency sequence extracted from the hummed segment to different degrees with uniform scaling factors and then compute the rhythm comparison against the corresponding original.
The linear scaling algorithm solves the mismatch between the humming tempo and the original tempo, but the premise of its reliability is that the humming tempo is exactly proportional to the original tempo, i.e. there is no occasional speeding up or slowing down; if the humming tempo varies, the linear scaling algorithm goes wrong.
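Linear scaling can be sketched as follows. This is an illustration under assumptions, not the patent's exact algorithm: the set of candidate scaling factors and the mean-absolute-difference comparison are hypothetical choices made for the example:

```python
import numpy as np

def linear_scale(sequence, factor):
    """Stretch or compress a fundamental-frequency sequence by a uniform
    factor using linear interpolation (the linear-scaling step)."""
    sequence = np.asarray(sequence, dtype=float)
    n_out = max(2, int(round(len(sequence) * factor)))
    old_x = np.linspace(0.0, 1.0, len(sequence))
    new_x = np.linspace(0.0, 1.0, n_out)
    return np.interp(new_x, old_x, sequence)

def best_match_distance(hummed, reference, factors=(0.5, 0.75, 1.0, 1.5, 2.0)):
    """Try several uniform scaling factors and keep the closest alignment
    with the reference, resampling to the reference length for comparison."""
    best = np.inf
    for f in factors:
        scaled = linear_scale(hummed, f)
        common = linear_scale(scaled, len(reference) / len(scaled))
        best = min(best, float(np.abs(common - reference).mean()))
    return best

reference = np.sin(np.linspace(0, 2 * np.pi, 100))  # original-song contour
hummed = np.sin(np.linspace(0, 2 * np.pi, 50))      # same tune, twice as fast
print(best_match_distance(hummed, reference) < 0.01)  # True
```

Because the hummed contour is a uniformly sped-up copy of the reference, one of the candidate factors aligns it almost exactly; a performance that speeds up and slows down mid-phrase would defeat this, which is precisely the limitation stated above.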
Of course, the above description is not a limitation of the present invention, nor is the invention limited to the examples above; variations, modifications, additions, or substitutions made by those skilled in the art within the essential scope of the present invention also belong to the protection scope of the invention.
Claims (4)
- 1. A sight-singing audio intelligent scoring modeling method applied to sing-along education, characterized in that data acquisition and preprocessing comprise the following steps: Step 1: the system divides the pre-collected solfège data, which contain expert scores, in a 2:1 ratio, two parts serving as training data and one part as test data, and performs modeling on the training data; Step 2: the audio data is denoised, silent segments without audio are cut out, and speech-enhancement data preprocessing is performed; Step 3: audio features are extracted from the audio data with the mel-frequency cepstral coefficient method, and pitch information is extracted; Step 4: frequency-domain features are extracted from the audio data with the short-time Fourier transform, and the beat information they contain is extracted to form rhythm-based features; Step 5: intonation and rhythm features are extracted from the standard audio according to steps 2 to 4; Step 6: the intonation features obtained with the mel cepstral coefficient method are compared between the standard audio and the solfège audio using a dynamic time warping algorithm; Step 7: the rhythm features obtained with the short-time Fourier method are compared between the standard audio and the solfège audio using a linear hash scaling algorithm; Step 8: the matching vectors of pitch and rhythm serve as training data for training a neural network, and when the error rate on the test data set is below 1% the verification process ends; Step 9: through the client interface of a WeChat mini-program, the user uploads sight-singing audio from individual practice; the uploaded audio goes through steps 2 to 4 and the processing of steps 6 and 7 and is then fed into the trained neural-network model, which outputs the corresponding rhythm and intonation scores, and the scoring results output by the neural network are sent to the interface of the WeChat mini-program and displayed on the client; Step 10: the corresponding intonation vector and rhythm vector are returned to the user client interface.
- 2. The sight-singing audio intelligent scoring modeling method applied to sing-along education according to claim 1, characterized in that, based on step 6, what is mainly compared is the matching degree of the pitch rise and fall between the piano pitch in the standard audio and the sight-singing audio; a linear pitch calibration method is used, in which the pitches of the voice and the piano are first linearly scaled so that their mean energies are identical, and on that basis the pitch-change matching vectors of the audio sequences are compared.
- 3. The sight-singing audio intelligent scoring modeling method applied to sing-along education according to claim 1, characterized in that, based on step 7, what is mainly compared is the matching degree of the tempo variation between the piano rhythm in the standard audio and the sight-singing audio; a linear rhythm calibration method is used, in which the rhythm of the voice is linearly scaled so that its tempo change rate is identical to that of the piano, and on that basis the tempo-variation matching vectors of the audio sequences are compared.
- 4. The sight-singing audio intelligent scoring modeling method applied to sing-along education according to claim 1, characterized in that, based on step 10, the interface parses the result, marks the positions in the score corresponding to the sight-sung song, and annotates in red the positions where the user's matching degree is poor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910480919.8A CN110265051A (en) | 2019-06-04 | 2019-06-04 | Sight-singing audio intelligent scoring modeling method applied to sing-along education |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910480919.8A CN110265051A (en) | 2019-06-04 | 2019-06-04 | Sight-singing audio intelligent scoring modeling method applied to sing-along education |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110265051A true CN110265051A (en) | 2019-09-20 |
Family
ID=67916665
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910480919.8A Pending CN110265051A (en) | 2019-06-04 | 2019-06-04 | Sight-singing audio intelligent scoring modeling method applied to sing-along education |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110265051A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111508526A (en) * | 2020-04-10 | 2020-08-07 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for detecting audio beat information and storage medium |
CN111653152A (en) * | 2020-05-18 | 2020-09-11 | 河南财政金融学院 | Using method of music education and exercise system |
CN113657184A (en) * | 2021-07-26 | 2021-11-16 | 广东科学技术职业学院 | Evaluation method and device for piano playing fingering |
CN113744721A (en) * | 2021-09-07 | 2021-12-03 | 腾讯音乐娱乐科技(深圳)有限公司 | Model training method, audio processing method, device and readable storage medium |
CN114093386A (en) * | 2021-11-10 | 2022-02-25 | 厦门大学 | Education-oriented multi-dimensional singing evaluation method |
CN115796653A (en) * | 2022-11-16 | 2023-03-14 | 中南大学 | Interview speech evaluation method and system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1737796A (en) * | 2005-09-08 | 2006-02-22 | Shanghai Jiao Tong University | Cross-type rapid matching method for digital music rhythm |
CN103514866A (en) * | 2012-06-28 | 2014-01-15 | Zeng Pingwei | Method and device for grading instrumental performance |
CN104143340A (en) * | 2014-07-28 | 2014-11-12 | Tencent Technology (Shenzhen) Co., Ltd. | Audio evaluation method and device |
CN106250400A (en) * | 2016-07-19 | 2016-12-21 | Tencent Technology (Shenzhen) Co., Ltd. | Audio data processing method, device and system |
CN106445964A (en) * | 2015-08-11 | 2017-02-22 | Tencent Technology (Shenzhen) Co., Ltd. | Audio information processing method and apparatus |
CN107767847A (en) * | 2017-09-29 | 2018-03-06 | Xiaoyezi (Beijing) Technology Co., Ltd. | Intelligent piano performance assessment method and system |
CN107967827A (en) * | 2017-12-29 | 2018-04-27 | Chongqing Normal University | Music education exercise system and method |
CN109461431A (en) * | 2018-12-24 | 2019-03-12 | Xiamen University | Sight-singing error score annotation method for music education |
CN109584904A (en) * | 2018-12-24 | 2019-04-05 | Xiamen University | Sight-singing audio solfège syllable recognition modeling method for music education |
History
- 2019-06-04: Application CN201910480919.8A filed in China; published as CN110265051A; legal status: Pending
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111508526A (en) * | 2020-04-10 | 2020-08-07 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Method and device for detecting audio beat information and storage medium |
CN111508526B (en) * | 2020-04-10 | 2022-07-01 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Method and device for detecting audio beat information and storage medium |
CN111653152A (en) * | 2020-05-18 | 2020-09-11 | Henan Finance University | Method for using a music education and training system |
CN113657184A (en) * | 2021-07-26 | 2021-11-16 | Guangdong Polytechnic of Science and Technology | Evaluation method and device for piano playing fingering |
CN113657184B (en) * | 2021-07-26 | 2023-11-07 | Guangdong Polytechnic of Science and Technology | Piano playing fingering evaluation method and device |
CN113744721A (en) * | 2021-09-07 | 2021-12-03 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Model training method, audio processing method, device and readable storage medium |
CN113744721B (en) * | 2021-09-07 | 2024-05-14 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Model training method, audio processing method, device and readable storage medium |
CN114093386A (en) * | 2021-11-10 | 2022-02-25 | Xiamen University | Education-oriented multi-dimensional singing evaluation method |
CN115796653A (en) * | 2022-11-16 | 2023-03-14 | Central South University | Interview speech evaluation method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110265051A (en) | Sight-singing audio intelligent scoring modeling method for music education | |
Dhingra et al. | Isolated speech recognition using MFCC and DTW | |
Tiwari | MFCC and its applications in speaker recognition | |
Patel et al. | Speech recognition and verification using MFCC & VQ | |
JP2020524308A (en) | Method, apparatus, computer device, program and storage medium for constructing voiceprint model | |
WO2017088364A1 (en) | Speech recognition method and device for dynamically selecting speech model | |
Prasomphan | Improvement of speech emotion recognition with neural network classifier by using speech spectrogram | |
Jancovic et al. | Bird species recognition using unsupervised modeling of individual vocalization elements | |
CN103996155A (en) | Intelligent interaction and psychological comfort robot service system | |
Li et al. | Speech emotion recognition using 1d cnn with no attention | |
Mansour et al. | Voice recognition using dynamic time warping and mel-frequency cepstral coefficients algorithms | |
CN102521281A (en) | Humming computer music searching method based on longest matching subsequence algorithm | |
WO2020248388A1 (en) | Method and device for training singing voice synthesis model, computer apparatus, and storage medium | |
Sefara | The effects of normalisation methods on speech emotion recognition | |
CN102411932B (en) | Methods for extracting and modeling Chinese speech emotion in combination with glottis excitation and sound channel modulation information | |
CN101178897A (en) | Speaking man recognizing method using base frequency envelope to eliminate emotion voice | |
CN103456302B (en) | A kind of emotional speaker recognition method based on the synthesis of emotion GMM Model Weight | |
CN107767881B (en) | Method and device for acquiring satisfaction degree of voice information | |
CN112002348B (en) | Method and system for recognizing speech anger emotion of patient | |
Tyagi et al. | Automatic identification of bird calls using spectral ensemble average voice prints | |
Wang | Speech recognition of oral English teaching based on deep belief network | |
CN109065073A (en) | Speech emotion recognition method based on deep SVM network model |
Piotrowska et al. | Machine learning-based analysis of English lateral allophones | |
CN109452932A (en) | Voice-based constitution identification method and apparatus |
Chien et al. | Evaluation of glottal inverse filtering algorithms using a physiologically based articulatory speech synthesizer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 2019-09-20