CN112863464A - Piano partner training method and system based on audio interaction - Google Patents


Info

Publication number
CN112863464A
CN112863464A
Authority
CN
China
Prior art keywords
information, performance, playing, score, data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110076077.7A
Other languages
Chinese (zh)
Inventor
夏雨
闫召曦
应笕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaoyezi Beijing Technology Co ltd
Original Assignee
Xiaoyezi Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaoyezi Beijing Technology Co ltd filed Critical Xiaoyezi Beijing Technology Co ltd
Priority to CN202110076077.7A priority Critical patent/CN112863464A/en
Publication of CN112863464A publication Critical patent/CN112863464A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/0008: Associated control or indicating means
    • G10H2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2240/00: Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121: Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H2250/00: Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/131: Mathematical functions for musical analysis, processing, synthesis or composition
    • G10H2250/215: Transforms, i.e. mathematical transforms into domains appropriate for musical signal processing, coding or compression
    • G10H2250/235: Fourier transform; Discrete Fourier Transform [DFT]; Fast Fourier Transform [FFT]
    • G10H2250/311: Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention discloses a piano partner training method based on audio interaction, which comprises the following steps: acquiring the user's current performance audio data and performing format conversion to obtain performance score data; comparing the performance score data with reference score data to generate a current comparison result representing the user's current performance; and determining error correction content and a subsequent playing plan based on the current comparison result and historical comparison results, generating prompt information, and prompting the user through voice. The invention also discloses a piano partner training system based on audio interaction. Through audio interaction with the user, the invention can analyze the user's performance data without relying on any screen device, and can give the user performance prompts by voice.

Description

Piano partner training method and system based on audio interaction
Technical Field
The invention relates to the technical field of pianos, in particular to a piano partner training method and system based on audio interaction.
Background
In the related art, a piano training system is generally implemented as a smart APP installed on a mobile phone or tablet. The APP identifies the played piano sound to judge the user's performance errors, and correction and practice proceed according to those errors. The interface between the APP and the user is mainly the screen of the phone or tablet: an electronic score is displayed on the screen, and the places where performance errors occur are marked on it. This approach relies on interaction through an electronic screen, so prolonged use can strain the user's eyes, and it cannot be applied to devices without a screen, which limits the usage scenarios.
Disclosure of Invention
In order to solve the above problems, an object of the present invention is to provide a piano partner training method and system based on audio interaction, which can process the user's performance data through audio interaction alone, without screen interaction, and can prompt the user through voice.
The invention provides a piano partner training method based on audio interaction, which comprises the following steps:
acquiring current playing audio data of a user and carrying out format conversion to obtain playing music score data;
comparing the performance music score data with reference music score data to generate a current comparison result for representing the current performance condition of the user;
and determining error correction content and a subsequent playing plan based on the current comparison result and historical comparison results, generating corresponding playing prompt information, and prompting the user through voice so that the user can adjust accordingly during the current performance and/or subsequent performances.
As a further improvement of the present invention, the acquiring the performance audio data of the user and performing format conversion to obtain the performance music score data includes:
acquiring performance audio data of a user;
performing fast Fourier transform on the performance audio data, and converting the performance audio data into frequency spectrum data, wherein the frequency spectrum data comprises multiple frames of data, and each frame of data is divided according to a preset frame length;
processing the frequency spectrum data through a deep neural network to obtain probability data of a plurality of keys, wherein each key corresponds to a plurality of frame probabilities, and each frame probability is a number between 0 and 1 representing whether the key is pressed down;
and converting the probability data of the keys into data with a preset format to obtain the music score data.
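The front end of this conversion (splitting the audio into frames of a preset frame length and applying a fast Fourier transform per frame to obtain multi-frame spectrum data) can be sketched as follows. The frame length, hop size and sample rate are illustrative assumptions, not values given in the patent.

```python
import numpy as np

def audio_to_spectrum(samples, frame_len=2048, hop=512):
    """Split audio into frames of a preset frame length and FFT each frame.

    Returns magnitude spectra of shape (n_frames, frame_len // 2 + 1),
    i.e. multiple frames of spectrum data, one row per frame.
    """
    n_frames = 1 + (len(samples) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([samples[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of a 440 Hz sine sampled at 16 kHz stands in for piano audio.
sr = 16000
t = np.arange(sr) / sr
spec = audio_to_spectrum(np.sin(2 * np.pi * 440 * t))
peak_hz = spec[0].argmax() * sr / 2048   # strongest bin lands near 440 Hz
```

In a real system the spectra would then be fed to the neural network described below rather than inspected for a single peak.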
As a further improvement of the present invention, for the probability data of each key, when a frame probability is greater than or equal to the preset probability, the key is regarded as pressed during that frame; otherwise the key is regarded as lifted.
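The press/lift decision is then a simple per-frame threshold on the probability data; the 0.5 cutoff below is an assumed value standing in for the "preset probability".

```python
def key_states(frame_probs, threshold=0.5):
    """Map each frame probability (a number between 0 and 1) to a
    pressed (True) / lifted (False) state for one key."""
    return [p >= threshold for p in frame_probs]

# The key is regarded as pressed only during frames 3 and 4.
states = key_states([0.1, 0.2, 0.9, 0.8, 0.3])
```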
As a further improvement of the invention, the deep neural network adopts a CNN LSTM network model,
the processing the frequency spectrum data through the deep neural network to obtain the probability data of the plurality of keys includes:
for each key, performing feature extraction on the frequency spectrum data corresponding to the key through the CNN network, providing the output of the CNN network to the LSTM network as input, and predicting the probability data of the key through the LSTM network, wherein the probability data of each key is a probability sequence.
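A rough sketch of the CNN-LSTM arrangement described (a CNN extracting features from each spectral frame, an LSTM turning them into a per-frame probability sequence): PyTorch, an 88-key output and the specific layer sizes are all assumptions, since the patent fixes none of these details.

```python
import torch
import torch.nn as nn

class KeyProbNet(nn.Module):
    """CNN feature extraction per spectral frame, LSTM over time,
    sigmoid so every output is a number between 0 and 1 per key."""
    def __init__(self, n_bins=1025, n_keys=88, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                 # per-frame feature extraction
            nn.Conv1d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(32),
        )
        self.lstm = nn.LSTM(32 * 32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_keys)     # one logit per key

    def forward(self, spec):                      # spec: (batch, frames, bins)
        b, f, n = spec.shape
        x = self.cnn(spec.reshape(b * f, 1, n)).reshape(b, f, -1)
        out, _ = self.lstm(x)                     # temporal modelling
        return torch.sigmoid(self.head(out))      # (batch, frames, n_keys)

probs = KeyProbNet()(torch.randn(1, 10, 1025))   # probability sequence per key
```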
As a further improvement of the invention, the score library comprises a plurality of reference scores. For each reference score, the reference score data comprises a plurality of reference slices cut according to a preset slice length. The reference slices comprise a plurality of left-hand reference slices representing segments of the left-hand performance part of the reference score, a plurality of right-hand reference slices representing segments of the right-hand performance part, and a plurality of two-hand reference slices representing segments of the two-hand performance part.
the comparing the performance music score data with the reference music score data to generate a current comparison result for representing the current performance condition of the user includes:
extracting a plurality of fragments of the performance music score data, and matching the plurality of fragments with each reference fragment in the music score library to determine a target music score corresponding to the performance music score data from the music score library;
aligning each playing note of the playing music score data with each reference note of the target music score to establish a corresponding relation between each playing note of the playing music score data and each reference note of the target music score;
and extracting performance information corresponding to each fragment in the performance music score data, comparing the performance information with reference information corresponding to the reference fragment at the corresponding position in the target music score, and generating a current comparison result for representing the current performance condition of the user.
As a further improvement of the present invention, the extracting a plurality of slices of the performance score data and matching the plurality of slices with respective reference slices in the score library to determine a target score corresponding to the performance score data from the score library includes:
slicing the performance music score data according to a preset slicing length to obtain a plurality of slices, and extracting a characteristic value of each slice, wherein the plurality of slices comprise a plurality of left-hand slices, a plurality of right-hand slices and a plurality of double-hand slices;
and searching a plurality of reference fragments which are correspondingly matched with the fragments in the music score library, and taking the reference music score with the maximum matching times of the reference fragments as the target music score.
As a further improvement of the invention, for each reference music score, each reference slice comprises reference feature values for characterizing the reference slice,
the searching a plurality of reference fragments which are correspondingly matched with the plurality of fragments in the music score library, and taking the reference music score with the most matching times of the reference fragments as the target music score comprises the following steps:
constructing an index over the reference characteristic values of all reference fragments in the score library;
for each fragment, comparing the characteristic value of the fragment with each reference characteristic value in the index to determine a reference fragment matched with the fragment;
after the plurality of fragments are matched respectively, determining the reference scores to which the matched reference fragments belong;
and sorting the determined reference scores by the number of successfully matched reference fragments, and taking the reference score with the most matches as the target score.
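The index-and-vote matching above can be illustrated with a toy fingerprint index. The dictionary index and the string feature values are stand-ins for whatever characteristic values the system actually computes per fragment.

```python
from collections import Counter

def build_index(score_library):
    """Map each reference fragment's characteristic value to the
    reference scores that contain a fragment with that value."""
    index = {}
    for name, features in score_library.items():
        for feat in features:
            index.setdefault(feat, []).append(name)
    return index

def match_target_score(index, played_features):
    """Vote each played fragment against the index; the reference score
    with the most successful fragment matches becomes the target score."""
    votes = Counter()
    for feat in played_features:
        for name in index.get(feat, []):
            votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None

library = {"minuet": ["a", "b", "c", "d"], "etude": ["c", "e", "f", "g"]}
idx = build_index(library)
target = match_target_score(idx, ["a", "b", "c"])   # "minuet" gets 3 votes
```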
As a further improvement of the present invention, the extracting performance information corresponding to each segment in the performance music score data, and comparing the performance information with reference information corresponding to a reference segment at a corresponding position in the target music score to generate a current comparison result for representing a current performance condition of a user includes:
extracting corresponding performance information for each fragment, searching a reference fragment at a corresponding position in the target music score, comparing the performance information of the fragment with the reference information corresponding to the searched reference fragment, and acquiring the current comparison result of the fragment, wherein the performance information comprises performance fragment information, performance pitch information, performance rhythm information, performance time value information, performance speed information, performance dynamics information and performance pedal information;
and determining whether the user's current playing segment, playing pitch, playing rhythm, playing duration, playing speed, playing dynamics and playing pedal are correct according to the comparison results of the plurality of fragments.
As a further improvement of the present invention, for each reference score, the performance section information of each reference piece includes the score page number, score line number, score bar number, and left-right hand information of the start and end positions of the score.
As a further improvement of the invention, for each reference score, each reference slice comprises a plurality of reference notes,
aligning each playing note of the playing music score data with each reference note of the target music score to establish a corresponding relation between each playing note of the playing music score data and each reference note of the target music score, and the method comprises the following steps:
extracting each playing note corresponding to each fragment;
searching a reference fragment at a corresponding position in the target music score;
and respectively aligning each playing note in the segment with each reference note in the found reference segment, and determining the corresponding relation between each playing note and each reference note, wherein the corresponding relation comprises the reference note corresponding to the playing note, the empty note corresponding to the playing note and the reference note corresponding to the empty note.
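The alignment step, including the empty-note correspondences (a played note with no reference partner, or a reference note that was never played), behaves like a global edit-distance alignment. A minimal sketch, assuming notes are compared by pitch name only:

```python
def align_notes(played, reference):
    """Globally align played notes to reference notes.

    Returns (played, reference) pairs where None marks an empty note:
    (note, None) is an extra played note, (None, note) a missed one.
    """
    n, m = len(played), len(reference)
    # cost[i][j] = best alignment cost of played[:i] vs reference[:j]
    cost = [[i + j for j in range(m + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if played[i - 1] == reference[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,   # match / wrong note
                             cost[i - 1][j] + 1,          # extra played note
                             cost[i][j - 1] + 1)          # missed reference note
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = 0 if i > 0 and j > 0 and played[i - 1] == reference[j - 1] else 1
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub:
            pairs.append((played[i - 1], reference[j - 1])); i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((played[i - 1], None)); i -= 1
        else:
            pairs.append((None, reference[j - 1])); j -= 1
    return pairs[::-1]

pairs = align_notes(["C4", "E4", "G4"], ["C4", "D4", "E4", "G4"])
# the skipped "D4" aligns to an empty played note
```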
As a further improvement of the present invention, for each reference music score, each reference slice includes reference information including reference segment information, reference pitch information, reference tempo information, reference duration information, reference speed information, reference force information, and reference pedal information,
the extracting of the corresponding performance information for each segment, searching for the reference segment at the corresponding position in the target music score, comparing the performance information of the segment with the reference information corresponding to the searched reference segment, and obtaining the current comparison result of the segment includes:
extracting corresponding performance segment information, performance pitch information, performance rhythm information, performance time value information, performance speed information, performance dynamics information and performance pedal information of each segment;
searching a reference fragment at a corresponding position in the target music score;
and comparing the playing segment information, the playing pitch information, the playing rhythm information, the playing duration information, the playing speed information, the playing force information and the playing pedal information of the fragment with the reference segment information, the reference pitch information, the reference rhythm information, the reference duration information, the reference speed information, the reference force information and the reference pedal information corresponding to the found reference fragment respectively to obtain the current comparison result of the fragment.
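The per-dimension comparison can be sketched as a tolerance check on each kind of information. The dimension names, units and tolerance values below are illustrative assumptions, not values from the patent.

```python
def compare_slice(performance, reference, tolerances):
    """Compare each dimension of performance information against the
    reference information; a dimension is 'ok' when the absolute
    deviation stays within that dimension's comparison threshold."""
    result = {}
    for dim, ref_value in reference.items():
        deviation = abs(performance.get(dim, 0) - ref_value)
        result[dim] = {"deviation": deviation,
                       "ok": deviation <= tolerances.get(dim, 0)}
    return result

result = compare_slice(
    performance={"tempo": 92, "dynamics": 70, "duration": 1.0},
    reference={"tempo": 96, "dynamics": 72, "duration": 1.0},
    tolerances={"tempo": 8, "dynamics": 10, "duration": 0.1},
)
# every dimension stays within its tolerance here
```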
As a further improvement of the present invention, the current comparison result comprises a fragment comparison result, a pitch comparison result, a tempo comparison result, a duration comparison result, a speed comparison result, a force comparison result and a pedal comparison result,
the determining error correction content and a subsequent playing plan based on the current comparison result and the historical comparison result, generating corresponding playing prompt information, and prompting a user through voice so that the user can perform corresponding adjustment in the current playing process or the subsequent playing process, includes:
when any one or more of the fragment comparison result, the pitch comparison result, the tempo comparison result, the duration comparison result, the speed comparison result, the force comparison result and the pedal comparison result exceeds its corresponding comparison threshold, determining error correction content and a subsequent playing plan according to the comparison thresholds corresponding to the various comparison results and the historical comparison results, generating corresponding playing prompt information, and prompting the user through voice so that the user can adjust accordingly during the current performance or subsequent performances.
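The prompt strategy (combining current and historical comparison results against thresholds to decide what to correct now and what to schedule for later practice) might look like the following sketch; the counting scheme and threshold are assumptions for illustration.

```python
def plan_prompts(current, history, threshold=2):
    """Decide which error dimensions to announce by voice.

    current: {dimension: error count} for this pass; history: the same
    dicts from earlier passes. A dimension is prompted once its combined
    count exceeds the threshold, and errors that also appeared in earlier
    passes are flagged for the subsequent playing plan.
    """
    prompts = []
    for dim, count in current.items():
        past = sum(h.get(dim, 0) for h in history)
        if count + past > threshold:
            note = ("recurring, schedule focused practice" if past
                    else "correct now")
            prompts.append(f"{dim}: {note}")
    return prompts

msgs = plan_prompts({"rhythm": 3, "pedal": 1}, history=[{"rhythm": 2}])
# only the repeated rhythm errors exceed the threshold
```

A real system would pass such prompts to text-to-speech rather than returning strings.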
The invention also provides a piano partner training system based on audio interaction, which comprises:
the musical instrument sound identification module is used for carrying out format conversion on the current playing audio data of the user to obtain playing music score data;
the music score identification module is used for extracting a plurality of fragments of the performance music score data and determining a target music score corresponding to the performance music score data based on the plurality of fragments;
a score alignment module, configured to align each playing note of the playing score data with each reference note of the target score to establish a corresponding relationship between each playing note of the playing score data and each reference note of the target score;
the error judgment module is used for extracting performance information corresponding to each fragment in the performance music score data, comparing the performance information with reference information corresponding to a reference fragment at a corresponding position in the target music score, and generating a current comparison result for representing the current performance condition of the user;
the prompt strategy module is used for determining error correction content and a subsequent playing plan based on the current comparison result and the historical comparison result and generating corresponding playing prompt information;
and the voice prompt module is used for prompting the user through voice so that the user can correspondingly adjust in the current playing process and/or the subsequent playing process.
As a further improvement of the present invention, the musical instrument sound identification module comprises:
acquiring performance audio data of a user;
performing fast Fourier transform on the performance audio data, and converting the performance audio data into frequency spectrum data, wherein the frequency spectrum data comprises multiple frames of data, and each frame of data is divided according to a preset frame length;
processing the frequency spectrum data through a deep neural network to obtain probability data of a plurality of keys, wherein each key corresponds to a plurality of frame probabilities, and each frame probability is a number between 0 and 1 representing whether the key is pressed down;
and converting the probability data of the keys into data with a preset format to obtain the music score data.
As a further improvement of the present invention, for the probability data of each key, when a frame probability is greater than or equal to the preset probability, the key is regarded as pressed during that frame; otherwise the key is regarded as lifted.
As a further improvement of the invention, the deep neural network adopts a CNN LSTM network model,
the processing the frequency spectrum data through the deep neural network to obtain the probability data of the plurality of keys includes:
for each key, performing feature extraction on the frequency spectrum data corresponding to the key through the CNN network, providing the output of the CNN network to the LSTM network as input, and predicting the probability data of the key through the LSTM network, wherein the probability data of each key is a probability sequence.
As a further improvement of the invention, the score library comprises a plurality of reference scores. For each reference score, the reference score data comprises a plurality of reference slices cut according to a preset slice length. The reference slices comprise a plurality of left-hand reference slices representing segments of the left-hand performance part of the reference score, a plurality of right-hand reference slices representing segments of the right-hand performance part, and a plurality of two-hand reference slices representing segments of the two-hand performance part.
the music score identification module comprises:
slicing the performance music score data according to a preset slicing length to obtain a plurality of slices, and extracting a characteristic value of each slice, wherein the plurality of slices comprise a plurality of left-hand slices, a plurality of right-hand slices and a plurality of double-hand slices;
and searching a plurality of reference fragments which are correspondingly matched with the fragments in the music score library, and taking the reference music score with the maximum matching times of the reference fragments as the target music score.
As a further improvement of the invention, for each reference music score, each reference slice comprises reference feature values for characterizing the reference slice,
the searching a plurality of reference fragments which are correspondingly matched with the plurality of fragments in the music score library, and taking the reference music score with the most matching times of the reference fragments as the target music score comprises the following steps:
constructing an index over the reference characteristic values of all reference fragments in the score library;
for each fragment, comparing the characteristic value of the fragment with each reference characteristic value in the index to determine a reference fragment matched with the fragment;
after the plurality of fragments are matched respectively, determining the reference scores to which the matched reference fragments belong;
and sorting the determined reference scores by the number of successfully matched reference fragments, and taking the reference score with the most matches as the target score.
As a further improvement of the present invention, the error judgment module includes:
extracting corresponding performance information for each fragment, searching a reference fragment at a corresponding position in the target music score, comparing the performance information of the fragment with the reference information corresponding to the searched reference fragment, and acquiring the current comparison result of the fragment, wherein the performance information comprises performance fragment information, performance pitch information, performance rhythm information, performance time value information, performance speed information, performance dynamics information and performance pedal information;
and determining whether the user's current playing segment, playing pitch, playing rhythm, playing duration, playing speed, playing dynamics and playing pedal are correct according to the comparison results of the plurality of fragments.
As a further improvement of the present invention, for each reference score, the performance section information of each reference piece includes the score page number, line number and bar number of the piece's start and end positions, together with left-hand/right-hand information.
As a further improvement of the invention, for each reference score, each reference slice comprises a plurality of reference notes,
the music score alignment module comprises:
extracting each playing note corresponding to each fragment;
searching a reference fragment at a corresponding position in the target music score;
and respectively aligning each playing note in the segment with each reference note in the found reference segment, and determining the corresponding relation between each playing note and each reference note, wherein the corresponding relation comprises the reference note corresponding to the playing note, the empty note corresponding to the playing note and the reference note corresponding to the empty note.
As a further improvement of the present invention, for each reference music score, each reference slice includes reference information, wherein the reference information includes reference segment information, reference pitch information, reference tempo information, reference duration information, reference speed information, reference force information, and reference pedal information,
the extracting of the corresponding performance information for each segment, searching for the reference segment at the corresponding position in the target music score, comparing the performance information of the segment with the reference information corresponding to the searched reference segment, and obtaining the current comparison result of the segment includes:
extracting corresponding performance segment information, performance pitch information, performance rhythm information, performance time value information, performance speed information, performance dynamics information and performance pedal information of each segment;
searching a reference fragment at a corresponding position in the target music score;
and comparing the playing segment information, the playing pitch information, the playing rhythm information, the playing duration information, the playing speed information, the playing force information and the playing pedal information of the fragment with the reference segment information, the reference pitch information, the reference rhythm information, the reference duration information, the reference speed information, the reference force information and the reference pedal information corresponding to the found reference fragment respectively to obtain the current comparison result of the fragment.
As a further improvement of the invention, the current comparison result comprises a fragment comparison result, a pitch comparison result, a tempo comparison result, a duration comparison result, a speed comparison result, a force comparison result and a pedal comparison result,
the prompt strategy module comprises:
when any one or more of the fragment comparison result, the pitch comparison result, the tempo comparison result, the duration comparison result, the speed comparison result, the force comparison result and the pedal comparison result exceeds its corresponding comparison threshold, determining error correction content and a subsequent playing plan according to the comparison thresholds corresponding to the various comparison results and the historical comparison results, generating corresponding playing prompt information, and prompting the user through voice so that the user can adjust accordingly during the current performance or subsequent performances.
The invention also provides an electronic device comprising a memory and a processor, the memory storing one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method.
The invention also provides a computer-readable storage medium having stored thereon a computer program for execution by a processor to perform the method.
The invention has the beneficial effects that: through audio interaction with the user, the user's performance data can be processed without relying on any screen device, and performance prompts can be given to the user by voice.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a flowchart illustrating a piano partner training method based on audio interaction according to an exemplary embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, if directional indications (such as up, down, left, right, front, and back) are involved in the embodiments of the present invention, they are only used to explain the relative positional relationship, movement, and the like of the components in a specific posture (as shown in the drawing); if the specific posture changes, the directional indication changes accordingly.
In addition, in the description of the present invention, the terms used are for illustrative purposes only and are not intended to limit the scope of the present invention. The terms "comprises" and/or "comprising" specify the presence of stated elements, steps, operations, and/or components, but do not preclude the presence or addition of one or more other elements, steps, operations, and/or components. The terms "first," "second," and the like may be used to describe various elements; they do not imply order, do not limit those elements, and are only used to distinguish one element from another. In the description of the present invention, "a plurality" means two or more unless otherwise specified. These and/or other aspects of the invention will become apparent to those of ordinary skill in the art from the following drawings and description of the embodiments, which are provided only to illustrate the described embodiments. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated in the present application may be employed without departing from the principles described herein.
The piano partner training method based on audio interaction in the embodiment of the invention is shown in fig. 1, and comprises the following steps:
acquiring playing audio data of a user and carrying out format conversion to obtain playing music score data;
comparing the performance music score data with reference music score data to generate a current comparison result for representing the current performance condition of the user;
and determining error correction content and a subsequent playing plan based on the current comparison result and the historical comparison result, generating corresponding playing prompt information, and prompting the user through voice so that the user can correspondingly adjust in the current playing process and/or the subsequent playing process.
In the related art of piano training systems, a smart APP is generally installed on a mobile phone or tablet; the APP recognizes the played piano sound to determine the user's playing errors, and correction and practice are carried out according to those errors. The interaction between the APP and the user relies mainly on the screen of the phone or tablet: an electronic score is displayed on the screen, and the places where playing errors occur are marked on it. This approach depends on interaction with an electronic screen, which can strain the user's eyes over long use, and it cannot be applied to devices without a screen, so its usage scenarios are limited.
The method of the invention interacts with the user through audio, so it can be applied to devices without a screen and suits more application scenarios. The hardware carrier may be, for example, a smart speaker, a mobile phone, or a tablet. The hardware carrier is connected to the server and the piano through the Internet, enabling data transfer among them. The audio data played by the user is transmitted to the server through the Internet for computation, and the computed result is transmitted back to the hardware carrier through the Internet for playback. Taking a smart speaker as an example: the speaker is connected to the server through the Internet; the audio played by the user is captured by a microphone at the piano and transmitted to the server through the Internet for computation; after the server completes the comparison, the resulting prompt information is transmitted directly to the speaker through the Internet and played through the speaker's loudspeaker.
In an optional implementation manner, the acquiring performance audio data of a user and performing format conversion to obtain performance music score data includes:
acquiring performance audio data of a user;
performing fast Fourier transform on the performance audio data, and converting the performance audio data into frequency spectrum data, wherein the frequency spectrum data comprises multiple frames of data, and each frame of data is divided according to a preset frame length;
processing the frequency spectrum data through a deep neural network to obtain probability data of a plurality of keys, wherein each key comprises a plurality of frame probabilities, and each frame probability is a number of 0-1 and is used for representing whether the key is pressed down or not;
and converting the probability data of the keys into data with a preset format to obtain the music score data.
It is to be understood that the performance audio data is the user's raw performance data, for example data in WAV format; the invention does not specifically limit the format of the performance audio data. When converting to spectrum data, the data may be divided by length, for example using spectrum data 32 ms long as one frame. When probability data is output by the deep neural network, each key corresponds to one probability sequence; for example, for a piano with 88 keys, probability data corresponding to the 88 keys is output, expressing the probability that each of the 88 keys is pressed. Since the spectrum data consists of multiple frames, one probability is generated per key for each frame, and this multi-frame probability sequence is the probability data for that key. The preset format is, for example, the MIDI format; the invention does not specifically limit the preset format.
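The framing step described above can be sketched as follows; the 16 kHz sample rate and the 32 ms frame length are illustrative assumptions (the patent fixes neither), and `frame_audio` is a hypothetical helper name.

```python
# Hypothetical sketch: split raw audio samples into fixed-length frames
# (32 ms at 16 kHz -> 512 samples per frame), the step that precedes
# computing a spectrum for each frame.

def frame_audio(samples, sample_rate=16000, frame_ms=32):
    frame_len = sample_rate * frame_ms // 1000  # samples per frame
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

frames = frame_audio([0.0] * 16000)  # one second of audio -> 31 full frames
```

Each frame would then be transformed (e.g. by FFT) into one frame of spectrum data.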
In an alternative embodiment, for the probability data of each key, the key is regarded as pressed when a frame probability is greater than or equal to the preset probability, and as lifted otherwise.
For example, when the preset format is the MIDI format, for the probability data of the same key, the key is considered pressed when a frame probability is greater than or equal to 0.5, and the converted MIDI data can then be played back through the speaker. The preset probability can be designed adaptively; the invention does not specifically limit it.
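A minimal sketch of this thresholding, turning one key's per-frame probabilities into note-on/note-off spans; `frames_to_note_events` is an illustrative name, not the patent's implementation.

```python
def frames_to_note_events(frame_probs, threshold=0.5):
    """Convert one key's per-frame probabilities into
    (note_on_frame, note_off_frame) events."""
    events, onset = [], None
    for i, p in enumerate(frame_probs):
        if p >= threshold and onset is None:
            onset = i                      # key goes down
        elif p < threshold and onset is not None:
            events.append((onset, i))      # key comes back up
            onset = None
    if onset is not None:                  # key still held at the end
        events.append((onset, len(frame_probs)))
    return events

frames_to_note_events([0.1, 0.7, 0.9, 0.3, 0.6, 0.2])
# -> [(1, 3), (4, 5)]
```

Such events map naturally onto MIDI note-on/note-off messages in the converted output.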
In an alternative embodiment, the deep neural network employs a CNN LSTM network model,
the frequency spectrum data is processed through the deep neural network to obtain probability data of a plurality of keys, and the probability data comprises the following steps:
for each key, performing feature extraction on the frequency spectrum data corresponding to the key through a CNN network, providing the output of the CNN network to an LSTM network as input, and predicting through the LSTM network to obtain probability data of the key;
that is, the features extracted by the CNN network from the spectrum data corresponding to each key are provided to the LSTM network as input, and the LSTM network outputs the probability data of the key, where the probability data of each key is a probability sequence.
It will be appreciated that CNN is a convolutional neural network and LSTM is a long short-term memory network; the invention combines the two, using the CNN network to extract features and the LSTM network to generate the sequence output. Feature extraction is performed on the spectrum data through the CNN network, including the spectrum data corresponding to each key. After the spectrum data corresponding to a key is input into the CNN network, the CNN output is passed to the LSTM network, which predicts the probability data for that key. For multiple keys, the features of the corresponding spectrum data are provided to the LSTM network through the CNN network in turn, and the LSTM network outputs each key's probability data. Put differently, for each key the multi-frame spectrum data corresponding to that key is input to the CNN network, and the LSTM network outputs the key's probability data, a sequence of probabilities predicted from the respective frames.
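A hedged PyTorch sketch of such a CNN+LSTM per-key model: each spectrum frame passes through a small CNN, the resulting frame-feature sequence passes through an LSTM, and a sigmoid head yields one 0-1 probability per frame. All layer sizes (and the 229 frequency bins) are assumptions; the patent does not specify the architecture.

```python
import torch
import torch.nn as nn

class KeyProbNet(nn.Module):
    """Per-key sketch: spectrum frames in, per-frame press probabilities out."""
    def __init__(self, freq_bins=229, cnn_channels=16, hidden=32):
        super().__init__()
        # CNN extracts a feature vector from each spectrum frame
        self.cnn = nn.Sequential(
            nn.Conv1d(1, cnn_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),          # -> (cnn_channels, 8) per frame
        )
        # LSTM models the frame sequence; sigmoid head gives a 0-1 probability
        self.lstm = nn.LSTM(cnn_channels * 8, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, spec):                  # spec: (batch, frames, freq_bins)
        b, t, f = spec.shape
        x = spec.reshape(b * t, 1, f)         # run the CNN on every frame
        x = self.cnn(x).reshape(b, t, -1)     # back to (batch, frames, features)
        x, _ = self.lstm(x)                   # (batch, frames, hidden)
        return self.head(x).squeeze(-1)       # (batch, frames) probabilities

probs = KeyProbNet()(torch.zeros(1, 10, 229))  # 10 frames -> 10 probabilities
```

The sigmoid output directly matches the "number of 0-1" per frame described above; running one such model per key (or one model with 88 output heads) yields the probability data for all keys.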
In an optional embodiment, the score library includes a plurality of reference scores, and for each reference score, the reference score data includes a plurality of reference slices sliced according to a preset slice length, the plurality of reference slices include a plurality of left-hand reference slices, a plurality of right-hand reference slices and a plurality of two-hand reference slices, wherein the plurality of left-hand reference slices are used for representing a plurality of slices of a left-hand performance part of the reference score, the plurality of right-hand reference slices are used for representing a plurality of slices of a right-hand performance part of the reference score, the plurality of two-hand reference slices are used for representing a plurality of slices of a two-hand performance part of the reference score,
the comparing the performance music score data with the reference music score data to generate a current comparison result for representing the current performance condition of the user includes:
extracting a plurality of fragments of the performance music score data, and matching the plurality of fragments with each reference fragment in the music score library to determine a target music score corresponding to the performance music score data from the music score library;
aligning each playing note of the playing music score data with each reference note of the target music score to establish a corresponding relation between each playing note of the playing music score data and each reference note of the target music score;
and extracting performance information corresponding to each fragment in the performance music score data, comparing the performance information with the performance information corresponding to the reference fragment at the corresponding position in the target music score, and generating a current comparison result for representing the current performance condition of the user.
The method of the invention slices each reference score in the score library; slicing can be understood as segmenting the score according to a preset slice length, for example into slices of 4, 5, 6, or 7 notes. When the same initial score is sliced, the same slice length can be used to obtain slices of equal length, or different slice lengths can be used to obtain slices of different lengths. Note, however, that when slicing with different slice lengths, every initial score must be processed identically, so that subsequent matching is more convenient and matching efficiency is improved. Correspondingly, the method of the invention also slices the performance score. The performance score undergoes the same slicing as the reference scores in the library, so a slice of the performance score data can find its corresponding reference slice in the library. After the target score is determined, the method aligns the notes of the performance score with the notes of the target score, ensuring that the subsequent slice-by-slice comparison proceeds smoothly. Comparing the performance information of scores slice by slice improves both comparison efficiency and comparison accuracy.
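The slicing above can be sketched as a fixed-length segmentation of a note sequence (here MIDI pitch numbers); `slice_notes` and the 4-note slice length are illustrative choices.

```python
# Slice a note sequence into fixed-length slices (e.g. 4 notes each),
# as done for both the reference scores and the performance score.
def slice_notes(notes, slice_len=4):
    return [notes[i:i + slice_len]
            for i in range(0, len(notes) - slice_len + 1, slice_len)]

slice_notes([60, 62, 64, 65, 67, 69, 71, 72])
# -> [[60, 62, 64, 65], [67, 69, 71, 72]]
```

Applying the same function to left-hand, right-hand, and two-hand note streams yields the three slice families described above.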
In an alternative embodiment, the extracting a plurality of slices of the performance score data and matching the plurality of slices with respective reference slices in the score library to determine a target score corresponding to the performance score data from the score library includes:
slicing the performance music score data according to a preset slicing length to obtain a plurality of slices, and extracting a characteristic value of each slice, wherein the plurality of slices comprise a plurality of left-hand slices, a plurality of right-hand slices and a plurality of double-hand slices;
and searching a plurality of reference fragments which are correspondingly matched with the fragments in the music score library, and taking the reference music score with the maximum matching times of the reference fragments as the target music score.
It can be understood that, for each reference score in the score library, the method of the invention may perform slicing separately on the left-hand data, the right-hand data, and the two-hand data, obtaining the left-hand reference slices, the right-hand reference slices, and the two-hand reference slices respectively. The slicing of the left-hand, right-hand, and two-hand data can each be carried out with the slicing method described above and is not repeated here. Correspondingly, the performance score also needs to be sliced for the left hand, the right hand, and both hands, using the same slicing method.
In an alternative embodiment, for each reference score, each reference slice comprises reference feature values for characterizing the reference slice,
the searching a plurality of reference fragments which are correspondingly matched with the plurality of fragments in the music score library, and taking the reference music score with the most matching times of the reference fragments as the target music score comprises the following steps:
constructing indexes for reference characteristic values of all reference fragments in the music library;
for each fragment, comparing the characteristic value of the fragment with each reference characteristic value in the index to determine a reference fragment matched with the fragment;
after the multiple slices are matched respectively, determining a reference music score to which the matched multiple reference slices belong;
and sequencing the determined reference music scores according to the times of successful matching of the reference fragments, and taking the reference music score with the most matching times in the sequencing as the target music score.
When the method of the invention matches slices, the average pitch and variance of the notes in a slice are used, for example, as the slice's feature values; the same feature extraction is applied to each reference slice and to each slice of the performance score data. The feature values of a slice and a reference slice are compared, and the match is deemed successful when the comparison result falls within a preset range; the preset range is not specifically limited. For example, if the first reference score has 8 successfully matched reference slices, the second reference score has 5, and the third reference score has 2, the first reference score is taken as the target score.
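A minimal sketch of this matching step, assuming the feature value of a slice is its (mean pitch, variance) pair and a slice matches when both values agree within a tolerance; the tolerance, the data model, and the score names are illustrative assumptions.

```python
def slice_feature(pitches):
    """Assumed feature value: (mean pitch, variance) of a slice's notes."""
    mean = sum(pitches) / len(pitches)
    var = sum((p - mean) ** 2 for p in pitches) / len(pitches)
    return mean, var

def pick_target_score(play_slices, score_library, tol=1.0):
    """Count, per reference score, how many played slices match one of its
    reference slices, and return the score with the most matches."""
    counts = {}
    for s in play_slices:
        mean, var = slice_feature(s)
        for name, ref_slices in score_library.items():
            for r in ref_slices:
                r_mean, r_var = slice_feature(r)
                if abs(mean - r_mean) <= tol and abs(var - r_var) <= tol:
                    counts[name] = counts.get(name, 0) + 1
                    break                  # one match per played slice per score
    return max(counts, key=counts.get)     # assumes at least one slice matched

library = {"Minuet": [[60, 62, 64, 65]], "Etude": [[70, 72, 74, 76]]}
pick_target_score([[60, 62, 64, 65], [60, 62, 64, 66]], library)
# -> "Minuet"
```

A real implementation would index the reference feature values, as the patent describes, rather than scanning every reference slice.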
In an optional implementation manner, the extracting performance information corresponding to each segment in the performance music score data, and comparing the performance information with the performance information corresponding to the reference segment at the corresponding position in the target music score to generate a current comparison result for characterizing a current performance condition of the user includes:
extracting corresponding performance information for each fragment, searching a reference fragment at a corresponding position in the target music score, comparing the performance information of the fragment with the reference information corresponding to the searched reference fragment, and acquiring the current comparison result of the fragment, wherein the performance information comprises performance fragment information, performance pitch information, performance rhythm information, performance time value information, performance speed information, performance dynamics information and performance pedal information;
and determining whether the current playing segment, playing pitch, playing rhythm, playing time value, playing speed, playing strength and playing pedal of the user are correct or not according to the comparison result of the plurality of segments.
When the method of the invention compares slices, it compares each slice's performance segment information, performance pitch information, performance rhythm information, performance duration information, performance speed information, performance dynamics information, and performance pedal information. Other information can of course also be compared; the invention does not specifically limit this, and the compared information can be designed adaptively according to the user's needs to obtain a corresponding comparison result and give the user a corresponding performance prompt.
In an alternative embodiment, for each reference score, the performance segment information of each reference slice includes the score page number, score line number, score bar number, and left-right hand information of the slice's start and end positions in the score.
In an alternative embodiment, for each reference music score, each reference slice includes a plurality of reference notes, and the aligning process of each performance note of the performance music score data and each reference note of the target music score to establish a corresponding relationship between each performance note of the performance music score data and each reference note of the target music score includes:
extracting each playing note corresponding to each fragment;
searching a reference fragment at a corresponding position in the target music score;
and respectively aligning each playing note in the segment with each reference note in the found reference segment, and determining the corresponding relation between each playing note and each reference note, wherein the corresponding relation comprises the reference note corresponding to the playing note, the empty note corresponding to the playing note and the reference note corresponding to the empty note.
The method of the invention aligns each played note in each slice with each reference note in the corresponding reference slice to find a one-to-one correspondence, which may be, for example: played note-reference note, played note-empty note, or empty note-reference note. After the correspondence is established, the next comparison step is carried out.
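The three correspondence types above (played-reference, played-empty, empty-reference) are exactly what an edit-distance alignment produces; the following is a hedged sketch of such an alignment, not the patent's own algorithm, with `None` standing for an empty note.

```python
def align_notes(played, ref):
    """Minimal-edit alignment of played vs reference notes; returns
    (played_note_or_None, reference_note_or_None) pairs."""
    n, m = len(played), len(ref)
    # dp[i][j] = minimum edits to align played[:i] with ref[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if played[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # played <-> reference
                           dp[i - 1][j] + 1,         # extra played note
                           dp[i][j - 1] + 1)         # missed reference note
    pairs, i, j = [], n, m                            # traceback
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and dp[i][j] ==
                dp[i - 1][j - 1] + (0 if played[i - 1] == ref[j - 1] else 1)):
            pairs.append((played[i - 1], ref[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((played[i - 1], None))       # played note, empty reference
            i -= 1
        else:
            pairs.append((None, ref[j - 1]))          # empty played, reference note
            j -= 1
    return pairs[::-1]

align_notes(["C", "E", "G"], ["C", "D", "E", "G"])
# -> [("C", "C"), (None, "D"), ("E", "E"), ("G", "G")]
```

Here the user skipped the reference note D, which surfaces as an empty-note-to-reference-note pair.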
In an alternative embodiment, for each reference music score, each reference slice comprises reference information, wherein the reference information comprises reference segment information, reference pitch information, reference rhythm information, reference duration information, reference speed information, reference strength information and reference pedal information,
the extracting of the corresponding performance information for each segment, searching for the reference segment at the corresponding position in the target music score, comparing the performance information of the segment with the performance information corresponding to the found reference segment, and obtaining the current comparison result of the segment includes:
extracting corresponding performance segment information, performance pitch information, performance rhythm information, performance time value information, performance speed information, performance dynamics information and performance pedal information of each segment;
searching a reference fragment at a corresponding position in the target music score;
and comparing the playing segment information, the playing pitch information, the playing rhythm information, the playing duration information, the playing speed information, the playing force information and the playing pedal information of the fragment with the reference segment information, the reference pitch information, the reference rhythm information, the reference duration information, the reference speed information, the reference force information and the reference pedal information corresponding to the found reference fragment respectively to obtain the current comparison result of the fragment.
The method of the invention compares the performance information of each slice with the reference information of the corresponding reference slice to determine whether the slice's performance segment, pitch, rhythm, duration, speed, dynamics, and pedal use are accurate, and after the comparison it can give the user corresponding prompt information, such as a wrong rhythm or an excessive playing speed.
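The field-by-field comparison can be sketched as below, assuming a simple per-slice data model and a tempo tolerance; the field names, the 10% tolerance, and the issue labels are illustrative assumptions.

```python
def compare_slice(performance, reference, tempo_tol=0.1):
    """Compare one slice's performance info against the reference info."""
    issues = []
    if performance["pitches"] != reference["pitches"]:
        issues.append("wrong note")
    if performance["rhythm"] != reference["rhythm"]:
        issues.append("wrong rhythm")
    if performance["speed"] > reference["speed"] * (1 + tempo_tol):
        issues.append("too fast")
    elif performance["speed"] < reference["speed"] * (1 - tempo_tol):
        issues.append("too slow")
    return issues

compare_slice(
    {"pitches": ["C", "E"], "rhythm": [1, 1], "speed": 135},
    {"pitches": ["C", "E"], "rhythm": [1, 1], "speed": 120},
)
# -> ["too fast"]  (135 exceeds 120 by more than 10%)
```

The per-slice issue lists, aggregated over all slices, form the current comparison result used for the prompts.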
In an alternative embodiment, the current comparison result includes a segment comparison result, a pitch comparison result, a tempo comparison result, a duration comparison result, a speed comparison result, a force comparison result, and a pedal comparison result,
the determining error correction content and a subsequent playing plan based on the current comparison result and the historical comparison result, generating prompt information, and prompting the user through voice so that the user can perform corresponding adjustment in the current playing process or the subsequent playing process includes:
when any one or more of the fragment comparison result, the pitch comparison result, the tempo comparison result, the duration comparison result, the speed comparison result, the strength comparison result and the pedal comparison result exceeds a corresponding comparison threshold;
and determining error correction contents and a subsequent playing plan according to the comparison threshold corresponding to various comparison results and historical comparison results, generating corresponding playing prompt information, and prompting the user through voice so that the user can perform corresponding adjustment in the current playing process or the subsequent playing process.
It is understood that the invention can detect (determine) errors including but not limited to wrong notes, wrong rhythms, unskilled segments, speed, dynamics, and so on, give the corresponding comparison (determination) result, and prompt by voice. The prompt information corresponding to the comparison (determination) result may be generated, for example, by inputting the comparison (determination) result into a neural network.
It will also be appreciated that the invention may automatically generate a practice plan for a period of time, the plan including, for example, the number of times to play, one-handed or two-handed practice, the range of segments to practice, the practice speed, and so on, such as playing 3 times with the right hand, 3 times with the left hand, and 2 times with both hands. When correcting errors according to the historical comparison results, the priority of each error can be calculated to decide which kind of error needs to be emphasized. When a serious error is found while the user is playing, the performance can be interrupted promptly with a prompt; otherwise only light auxiliary guidance is given ("good, once again"), and so on.
For example, when generating the prompt information: when the playing errors (in number and severity) exceed a certain threshold, an error prompt is given; when the playing errors are below a certain threshold, the system stays silent (interrupting the user's playing to prompt the error only when necessary); when the number of plays exceeds a certain threshold, the user may move on to the next section; when the playing time exceeds a certain threshold, the system can suggest taking a rest. When giving an error prompt, for example, the current playing error and the next playing step can be prompted, including but not limited to the segment to play, whether to use both hands, whether to slow down, whether to use a specific exercise, and so on. The above are illustrative examples of the generated prompt information; the invention does not specifically limit the prompt information for different scenarios.
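The thresholded prompting behaviour above might be sketched as a small policy function; all thresholds and return labels here are assumptions, not the patent's values.

```python
def choose_prompt(error_count, severe, play_count,
                  error_threshold=3, count_threshold=5):
    """Illustrative thresholded prompt policy for one pass of playing."""
    if severe or error_count > error_threshold:
        return "interrupt and correct"      # errors exceed the threshold
    if play_count > count_threshold:
        return "advance to next section"    # enough repetitions of this section
    if error_count == 0:
        return "encourage"                  # e.g. "good, once again"
    return "stay silent"                    # minor errors: no interruption

choose_prompt(error_count=5, severe=False, play_count=2)
# -> "interrupt and correct"
```

A richer policy would also weigh the historical comparison results to prioritise which error type to emphasise, as described above.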
For example, when prompting by voice, the location of the error can be described, such as: "please note page x, line x, bar x, the x-th note of the right hand, Do"; the error type can be described by speech, for example for a wrong note: "this note should be Do", or for a wrong rhythm: "note that the rhythm pattern is 'Da, Da'"; further, the next playing step can be prompted by voice, for example "[musical notation] please play bar x to bar x again", or "[musical notation] please play once slowly with the right hand". The above are illustrative examples of the voice prompt information; the invention does not specifically limit the voice prompts for different scenarios.
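Such voice prompts could be assembled from the comparison result with a simple template, along the lines of the examples above; the wording, the `error_prompt` name, and its parameters are illustrative assumptions.

```python
def error_prompt(page, line, bar, note_index, expected):
    """Format a wrong-note voice prompt from its location and expected pitch."""
    return (f"Please note page {page}, line {line}, bar {bar}, "
            f"note {note_index}: this note should be {expected}")

error_prompt(2, 3, 1, 4, "Do")
# -> "Please note page 2, line 3, bar 1, note 4: this note should be Do"
```

The formatted string would then be synthesised to speech and played through the speaker.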
The piano partner training system based on audio interaction comprises the following components:
the musical instrument sound identification module is used for carrying out format conversion on the current playing audio data of the user to obtain playing music score data;
the music score identification module is used for extracting a plurality of fragments of the performance music score data and determining a target music score corresponding to the performance music score data based on the plurality of fragments;
a score alignment module, configured to align each playing note of the playing score data with each reference note of the target score to establish a corresponding relationship between each playing note of the playing score data and each reference note of the target score;
the error judgment module is used for extracting performance information corresponding to each fragment in the performance music score data, comparing the performance information with the performance information corresponding to the reference fragment at the corresponding position in the target music score, and generating a current comparison result for representing the current performance condition of the user;
the prompt strategy module is used for determining error correction content and a subsequent playing plan based on the current comparison result and the historical comparison result and generating corresponding playing prompt information;
and the voice prompt module is used for prompting the user through voice so that the user can correspondingly adjust in the current playing process and/or the subsequent playing process.
In an alternative embodiment, the musical instrument sound identification module comprises:
acquiring performance audio data of a user;
performing fast Fourier transform on the performance audio data, and converting the performance audio data into frequency spectrum data, wherein the frequency spectrum data comprises multiple frames of data, and each frame of data is divided according to a preset frame length;
processing the frequency spectrum data through a deep neural network to obtain probability data of a plurality of keys, wherein each key comprises a plurality of frame probabilities, and each frame probability is a number of 0-1 and is used for representing whether the key is pressed down or not;
and converting the probability data of the keys into data with a preset format to obtain the music score data.
In an alternative embodiment, for the probability data of each key, the key is regarded as pressed when a frame probability is greater than or equal to the preset probability, and as lifted otherwise.
In an alternative embodiment, the deep neural network employs a CNN LSTM network model,
the frequency spectrum data is processed through the deep neural network to obtain probability data of a plurality of keys, and the probability data comprises the following steps:
for each key, performing feature extraction on the frequency spectrum data corresponding to the key through a CNN network, providing the output of the CNN network to an LSTM network as input, and predicting through the LSTM network to obtain probability data of the key;
inputting the features extracted from the spectrum data corresponding to the keys to the LSTM network through the CNN network, and outputting the probability data of the keys through the LSTM network, wherein the probability data of each key is a probability sequence.
In an optional embodiment, the score library includes a plurality of reference scores, and for each reference score, the reference score data includes a plurality of reference slices sliced according to a preset slice length, the plurality of reference slices include a plurality of left-hand reference slices, a plurality of right-hand reference slices and a plurality of two-hand reference slices, wherein the plurality of left-hand reference slices are used for representing a plurality of slices of a left-hand performance part of the reference score, the plurality of right-hand reference slices are used for representing a plurality of slices of a right-hand performance part of the reference score, the plurality of two-hand reference slices are used for representing a plurality of slices of a two-hand performance part of the reference score,
the music score identification module comprises:
slicing the performance music score data according to a preset slicing length to obtain a plurality of slices, and extracting a characteristic value of each slice, wherein the plurality of slices comprise a plurality of left-hand slices, a plurality of right-hand slices and a plurality of double-hand slices;
and searching a plurality of reference fragments which are correspondingly matched with the fragments in the music score library, and taking the reference music score with the maximum matching times of the reference fragments as the target music score.
In an alternative embodiment, for each reference score, each reference slice comprises reference feature values for characterizing the reference slice,
the searching the music score library for a plurality of reference slices that respectively match the plurality of slices, and taking the reference music score with the highest number of matched reference slices as the target music score comprises the following steps:
constructing an index of the reference characteristic values of all reference slices in the music score library;
for each slice, comparing the characteristic value of the slice with each reference characteristic value in the index to determine the reference slice matched with the slice;
after the plurality of slices are respectively matched, determining the reference music scores to which the matched reference slices belong;
and ranking the determined reference music scores by the number of successfully matched reference slices, and taking the reference music score with the highest number of matches as the target music score.
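The index-and-vote lookup above can be sketched as follows. Exact-match lookup of hashable feature tuples stands in for whatever feature comparison the disclosure leaves open, and the score names are hypothetical:

```python
from collections import Counter

def identify_score(played_features, score_library):
    """score_library maps a score name to its list of reference
    characteristic values.  Build an index from characteristic value to
    owning score, match each played slice against it, and return the
    score matched most often (the target score), or None if nothing
    matches."""
    index = {}                                    # feature value -> owning scores
    for name, ref_features in score_library.items():
        for f in ref_features:
            index.setdefault(f, []).append(name)
    votes = Counter()                             # score -> successful matches
    for f in played_features:
        for name in index.get(f, []):
            votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None

library = {
    "fur_elise":  [(76, 75, 76, 75, 76), (71, 74, 72, 69)],
    "ode_to_joy": [(64, 64, 65, 67), (67, 65, 64, 62)],
}
played = [(76, 75, 76, 75, 76), (71, 74, 72, 69), (60, 60)]
print(identify_score(played, library))  # fur_elise
```

A slice with a wrong note (here `(60, 60)`) simply fails to vote, so the ranking tolerates playing errors as long as enough slices still match.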
In an alternative embodiment, the error determination module includes:
extracting corresponding performance information for each fragment, searching for the reference fragment at the corresponding position in the target music score, comparing the performance information of the fragment with the reference information corresponding to the found reference fragment, and obtaining the current comparison result of the fragment, wherein the performance information comprises performance segment information, performance pitch information, performance rhythm information, performance duration information, performance speed information, performance dynamics information and performance pedal information;
and determining whether the user's current playing segment, pitch, rhythm, duration, speed, dynamics and pedal use are correct according to the comparison results of the plurality of fragments.
In an alternative embodiment, for each reference score, the performance segment information of each reference slice includes the score page number, score line number and score bar number of the start and end positions of the slice, together with left-right hand information.
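One way to carry this location metadata is a small record attached to each reference slice. The field names below are illustrative, since the text only enumerates the contents:

```python
from dataclasses import dataclass

@dataclass
class ScorePosition:
    page: int   # score page number
    line: int   # score line number
    bar: int    # score bar number

@dataclass
class ReferenceSegmentInfo:
    start: ScorePosition   # where the slice begins in the score
    end: ScorePosition     # where it ends
    hand: str              # 'left', 'right' or 'both'

seg = ReferenceSegmentInfo(ScorePosition(2, 1, 5), ScorePosition(2, 2, 8), "right")
print(seg.start.bar, seg.end.bar, seg.hand)  # 5 8 right
```

Carrying page/line/bar positions lets the voice prompt point the user at an exact place in the printed score ("page 2, line 2, bar 8, right hand").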
In an alternative embodiment, for each reference score, each reference slice includes a plurality of reference notes, and the score alignment module comprises:
extracting each playing note corresponding to each fragment;
searching a reference fragment at a corresponding position in the target music score;
and respectively aligning each playing note in the segment with each reference note in the found reference segment, and determining the corresponding relation between each playing note and each reference note, wherein the corresponding relation comprises the reference note corresponding to the playing note, the empty note corresponding to the playing note and the reference note corresponding to the empty note.
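This note-to-note alignment, including the empty-note correspondences for extra or missed notes, is essentially sequence alignment. A sketch using Python's difflib (the disclosure does not specify the alignment algorithm; gaps are modelled as None):

```python
from difflib import SequenceMatcher

def align_notes(played, reference):
    """Pair each played note with a reference note.  (note, None) marks
    an extra played note with no reference counterpart; (None, note)
    marks a reference note the user never played -- the 'empty note'
    correspondences described above."""
    sm = SequenceMatcher(a=played, b=reference, autojunk=False)
    pairs = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            pairs += list(zip(played[i1:i2], reference[j1:j2]))
        else:
            pairs += [(n, None) for n in played[i1:i2]]      # extra played notes
            pairs += [(None, n) for n in reference[j1:j2]]   # missed reference notes
    return pairs

played    = [60, 62, 64, 65]   # user skipped 63 and added 65
reference = [60, 62, 63, 64]
print(align_notes(played, reference))
```

With these pairs established, every later comparison (pitch, rhythm, duration) can be made note-by-note rather than position-by-position.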
In an alternative embodiment, for each reference score, each reference slice comprises reference information, wherein the reference information comprises reference segment information, reference pitch information, reference rhythm information, reference duration information, reference speed information, reference dynamics information and reference pedal information,
the extracting corresponding performance information for each fragment, searching for the reference fragment at the corresponding position in the target music score, comparing the performance information of the fragment with the reference information corresponding to the found reference fragment, and obtaining the current comparison result of the fragment comprises:
extracting, for each fragment, the corresponding performance segment information, performance pitch information, performance rhythm information, performance duration information, performance speed information, performance dynamics information and performance pedal information;
searching for the reference fragment at the corresponding position in the target music score;
and comparing the performance segment information, performance pitch information, performance rhythm information, performance duration information, performance speed information, performance dynamics information and performance pedal information of the fragment with the corresponding reference segment information, reference pitch information, reference rhythm information, reference duration information, reference speed information, reference dynamics information and reference pedal information of the found reference fragment, respectively, to obtain the current comparison result of the fragment.
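The dimension-by-dimension comparison can be sketched as below. The numeric values and tolerances are illustrative assumptions, since the disclosure does not fix the metrics:

```python
def compare_slice(perf, ref, tolerances):
    """Compare one performance fragment against its reference fragment
    across each information dimension; a dimension passes when the
    absolute difference is within its tolerance (default 0)."""
    return {dim: abs(perf[dim] - ref[dim]) <= tolerances.get(dim, 0)
            for dim in ref}

perf = {"pitch": 62, "tempo": 118, "duration": 0.48}
ref  = {"pitch": 62, "tempo": 120, "duration": 0.50}
tol  = {"pitch": 0,  "tempo": 5,   "duration": 0.05}
print(compare_slice(perf, ref, tol))
# {'pitch': True, 'tempo': True, 'duration': True}
```

Collecting one such result dictionary per fragment gives exactly the per-dimension comparison results that the prompt strategy consumes.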
In an optional embodiment, the comparison result includes a fragment comparison result, a pitch comparison result, a tempo comparison result, a duration comparison result, a speed comparison result, a strength comparison result and a pedal comparison result,
and the prompt strategy module comprises:
when any one or more of the fragment comparison result, the pitch comparison result, the tempo comparison result, the duration comparison result, the speed comparison result, the strength comparison result and the pedal comparison result exceeds the corresponding comparison threshold, determining error correction content and a subsequent playing plan according to the comparison threshold corresponding to each comparison result and the historical comparison results, generating corresponding playing prompt information, and prompting the user through voice so that the user can make corresponding adjustments in the current playing process or the subsequent playing process.
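The threshold-and-prompt logic can be sketched as follows; the threshold values and the prompt wording are illustrative assumptions:

```python
def build_prompts(current, history, thresholds):
    """Count, per dimension, how many comparison results (historical
    plus current) failed; a dimension whose failure count exceeds its
    threshold is flagged and turned into a spoken-prompt message."""
    counts = {}
    for result in history + [current]:
        for dim, ok in result.items():
            if not ok:
                counts[dim] = counts.get(dim, 0) + 1
    flagged = [d for d, c in counts.items() if c > thresholds.get(d, 0)]
    return [f"Watch your {d}: practice this passage slowly." for d in flagged]

history = [{"pitch": True, "tempo": False}, {"pitch": True, "tempo": False}]
current = {"pitch": True, "tempo": False}
print(build_prompts(current, history, {"tempo": 2, "pitch": 2}))
```

Because the decision looks at historical results too, a one-off slip stays silent while a recurring tempo problem triggers an error-correction prompt and can shape the subsequent playing plan.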
The disclosure also relates to an electronic device, including a server, a terminal and the like. The electronic device includes: at least one processor; a memory communicatively coupled to the at least one processor; and a communication component communicatively coupled to the memory, the communication component receiving and transmitting data under control of the processor; wherein the memory stores instructions executable by the at least one processor to implement the method of the above embodiments.
In an alternative embodiment, the memory is used as a non-volatile computer-readable storage medium for storing non-volatile software programs, non-volatile computer-executable programs, and modules. The processor executes various functional applications of the device and data processing, i.e., implements the method, by executing nonvolatile software programs, instructions, and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store a list of options, etc. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory and, when executed by the one or more processors, perform the methods of any of the method embodiments described above.
The above product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to the executed method; for technical details not described in this embodiment, reference may be made to the method provided by the embodiments of the present application.
The present disclosure also relates to a computer-readable storage medium for storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, as can be understood by those skilled in the art, all or part of the steps of the methods of the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions for causing a device (which may be a microcontroller, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Furthermore, those of ordinary skill in the art will appreciate that while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It will be understood by those skilled in the art that while the present invention has been described with reference to exemplary embodiments, various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (25)

1. A piano partner training method based on audio interaction is characterized by comprising the following steps:
acquiring current playing audio data of a user and carrying out format conversion to obtain playing music score data;
comparing the performance music score data with reference music score data to generate a current comparison result for representing the current performance condition of the user;
and determining error correction content and a subsequent playing plan based on the current comparison result and the historical comparison result, generating corresponding playing prompt information, and prompting the user through voice so that the user can correspondingly adjust in the current playing process and/or the subsequent playing process.
2. The method of claim 1, wherein the acquiring current playing audio data of a user and carrying out format conversion to obtain playing music score data comprises:
acquiring performance audio data of a user;
performing fast Fourier transform on the performance audio data, and converting the performance audio data into frequency spectrum data, wherein the frequency spectrum data comprises multiple frames of data, and each frame of data is divided according to a preset frame length;
processing the frequency spectrum data through a deep neural network to obtain probability data of a plurality of keys, wherein the probability data of each key comprises a plurality of frame probabilities, and each frame probability is a number between 0 and 1 used for representing whether the key is pressed down;
and converting the probability data of the keys into data with a preset format to obtain the music score data.
3. The method of claim 2, wherein, for the probability data of each key, the key is deemed to be pressed down when a frame probability is greater than or equal to a preset probability, and is otherwise deemed to be lifted.
4. The method of claim 2, wherein the deep neural network employs a CNN LSTM network model,
the frequency spectrum data is processed through the deep neural network to obtain probability data of a plurality of keys, and the probability data comprises the following steps:
for each key, performing feature extraction on the frequency spectrum data corresponding to the key through a CNN network, providing the output of the CNN network to an LSTM network as input, and predicting through the LSTM network to obtain probability data of the key;
that is, the features extracted by the CNN network from the spectrum data corresponding to each key are input to the LSTM network, and the LSTM network outputs the probability data of the key, wherein the probability data of each key is a probability sequence.
5. The method of claim 1, wherein the score library includes a plurality of reference scores, and for each reference score, the reference score data includes a plurality of reference slices sliced according to a preset slice length, the plurality of reference slices includes a plurality of left-hand reference slices, a plurality of right-hand reference slices and a plurality of two-hand reference slices, wherein the plurality of left-hand reference slices are used for characterizing a plurality of slices of a left-hand performance part of the reference score, the plurality of right-hand reference slices are used for characterizing a plurality of slices of a right-hand performance part of the reference score, and the plurality of two-hand reference slices are used for characterizing a plurality of slices of a two-hand performance part of the reference score,
the comparing the performance music score data with the reference music score data to generate a current comparison result for representing the current performance condition of the user includes:
extracting a plurality of fragments of the performance music score data, and matching the plurality of fragments with each reference fragment in the music score library to determine a target music score corresponding to the performance music score data from the music score library;
aligning each playing note of the playing music score data with each reference note of the target music score to establish a corresponding relation between each playing note of the playing music score data and each reference note of the target music score;
and extracting performance information corresponding to each fragment in the performance music score data, comparing the performance information with reference information corresponding to the reference fragment at the corresponding position in the target music score, and generating a current comparison result for representing the current performance condition of the user.
6. The method of claim 5, wherein the extracting a plurality of slices of the performance score data and matching the plurality of slices with respective reference slices in the score library to determine a target score corresponding to the performance score data from the score library comprises:
slicing the performance music score data according to a preset slice length to obtain a plurality of slices, and extracting a characteristic value of each slice, wherein the plurality of slices comprise a plurality of left-hand slices, a plurality of right-hand slices and a plurality of two-hand slices;
and searching the music score library for a plurality of reference slices that respectively match the plurality of slices, and taking the reference music score with the highest number of matched reference slices as the target music score.
7. The method of claim 6, wherein, for each reference score, each reference slice comprises reference feature values for characterizing the reference slice,
the searching the music score library for a plurality of reference slices that respectively match the plurality of slices, and taking the reference music score with the highest number of matched reference slices as the target music score comprises the following steps:
constructing an index of the reference characteristic values of all reference slices in the music score library;
for each slice, comparing the characteristic value of the slice with each reference characteristic value in the index to determine the reference slice matched with the slice;
after the plurality of slices are respectively matched, determining the reference music scores to which the matched reference slices belong;
and ranking the determined reference music scores by the number of successfully matched reference slices, and taking the reference music score with the highest number of matches as the target music score.
8. The method of claim 5, wherein the extracting performance information corresponding to each segment in the performance score data and comparing the performance information with reference information corresponding to a reference segment at a corresponding position in the target score to generate a current comparison result for representing a current performance condition of a user comprises:
extracting corresponding performance information for each fragment, searching a reference fragment at a corresponding position in the target music score, comparing the performance information of the fragment with the reference information corresponding to the searched reference fragment, and acquiring the current comparison result of the fragment, wherein the performance information comprises performance fragment information, performance pitch information, performance rhythm information, performance time value information, performance speed information, performance dynamics information and performance pedal information;
and determining whether the current playing segment, playing pitch, playing rhythm, playing time value, playing speed, playing strength and playing pedal of the user are correct or not according to the comparison results of the plurality of fragments.
9. The method of claim 8, wherein, for each reference score, the performance section information of each reference slice includes the score page number, score line number and score bar number of the start and end positions of the slice, together with left and right hand information.
10. The method of claim 5, wherein, for each reference score, each reference slice comprises a plurality of reference notes,
aligning each playing note of the playing music score data with each reference note of the target music score to establish a corresponding relation between each playing note of the playing music score data and each reference note of the target music score, and the method comprises the following steps:
extracting each playing note corresponding to each fragment;
searching a reference fragment at a corresponding position in the target music score;
and respectively aligning each playing note in the segment with each reference note in the found reference segment, and determining the corresponding relation between each playing note and each reference note, wherein the corresponding relation comprises the reference note corresponding to the playing note, the empty note corresponding to the playing note and the reference note corresponding to the empty note.
11. The method of claim 8, wherein each reference slice includes reference information including reference segment information, reference pitch information, reference tempo information, reference duration information, reference speed information, reference force information, and reference pedal information for each reference score,
the extracting of the corresponding performance information for each segment, searching for the reference segment at the corresponding position in the target music score, comparing the performance information of the segment with the reference information corresponding to the searched reference segment, and obtaining the current comparison result of the segment includes:
extracting corresponding performance segment information, performance pitch information, performance rhythm information, performance time value information, performance speed information, performance dynamics information and performance pedal information of each segment;
searching a reference fragment at a corresponding position in the target music score;
and comparing the playing segment information, the playing pitch information, the playing rhythm information, the playing duration information, the playing speed information, the playing force information and the playing pedal information of the fragment with the reference segment information, the reference pitch information, the reference rhythm information, the reference duration information, the reference speed information, the reference force information and the reference pedal information corresponding to the found reference fragment respectively to obtain the current comparison result of the fragment.
12. The method of claim 1, wherein the current alignment results comprise a segment alignment result, a pitch alignment result, a tempo alignment result, a duration alignment result, a speed alignment result, a force alignment result, and a pedal alignment result,
the determining error correction content and a subsequent playing plan based on the current comparison result and the historical comparison result, generating corresponding playing prompt information, and prompting a user through voice so that the user can perform corresponding adjustment in the current playing process or the subsequent playing process, includes:
when any one or more of the fragment comparison result, the pitch comparison result, the tempo comparison result, the duration comparison result, the speed comparison result, the strength comparison result and the pedal comparison result exceeds the corresponding comparison threshold, determining error correction content and a subsequent playing plan according to the comparison threshold corresponding to each comparison result and the historical comparison results, generating corresponding playing prompt information, and prompting the user through voice so that the user can make corresponding adjustments in the current playing process or the subsequent playing process.
13. A piano sparring system based on audio interaction, the system comprising:
the piano sound identification module is used for carrying out format conversion on the current playing audio data of the user to obtain playing music score data;
the music score identification module is used for extracting a plurality of fragments of the performance music score data and determining a target music score corresponding to the performance music score data based on the plurality of fragments;
a score alignment module, configured to align each playing note of the playing score data with each reference note of the target score to establish a corresponding relationship between each playing note of the playing score data and each reference note of the target score;
the error judgment module is used for extracting performance information corresponding to each fragment in the performance music score data, comparing the performance information with the performance information corresponding to the reference fragment at the corresponding position in the target music score, and generating a current comparison result for representing the current performance condition of the user;
the prompt strategy module is used for determining error correction content and a subsequent playing plan based on the current comparison result and the historical comparison result and generating corresponding playing prompt information;
and the voice prompt module is used for prompting the user through voice so that the user can correspondingly adjust in the current playing process and/or the subsequent playing process.
14. The system of claim 13, wherein the piano sound identification module comprises:
acquiring performance audio data of a user;
performing fast Fourier transform on the performance audio data, and converting the performance audio data into frequency spectrum data, wherein the frequency spectrum data comprises multiple frames of data, and each frame of data is divided according to a preset frame length;
processing the frequency spectrum data through a deep neural network to obtain probability data of a plurality of keys, wherein the probability data of each key comprises a plurality of frame probabilities, and each frame probability is a number between 0 and 1 used for representing whether the key is pressed down;
and converting the probability data of the keys into data with a preset format to obtain the music score data.
15. The system of claim 14, wherein, for the probability data of each key, the key is deemed to be pressed down when a frame probability is greater than or equal to a preset probability, and is otherwise deemed to be lifted.
16. The system of claim 14, wherein the deep neural network employs a CNN LSTM network model,
the frequency spectrum data is processed through the deep neural network to obtain probability data of a plurality of keys, and the probability data comprises the following steps:
for each key, performing feature extraction on the frequency spectrum data corresponding to the key through a CNN network, providing the output of the CNN network to an LSTM network as input, and predicting through the LSTM network to obtain probability data of the key;
that is, the features extracted by the CNN network from the spectrum data corresponding to each key are input to the LSTM network, and the LSTM network outputs the probability data of the key, wherein the probability data of each key is a probability sequence.
17. The system of claim 13, wherein the score library includes a plurality of reference scores, and for each reference score, the reference score data includes a plurality of reference slices sliced according to a preset slice length, the plurality of reference slices includes a plurality of left-hand reference slices, a plurality of right-hand reference slices and a plurality of two-hand reference slices, wherein the plurality of left-hand reference slices are used for characterizing a plurality of slices of a left-hand performance portion of the reference score, the plurality of right-hand reference slices are used for characterizing a plurality of slices of a right-hand performance portion of the reference score, and the plurality of two-hand reference slices are used for characterizing a plurality of slices of a two-hand performance portion of the reference score,
the music score identification module comprises:
slicing the performance music score data according to a preset slice length to obtain a plurality of slices, and extracting a characteristic value of each slice, wherein the plurality of slices comprise a plurality of left-hand slices, a plurality of right-hand slices and a plurality of two-hand slices;
and searching the music score library for a plurality of reference slices that respectively match the plurality of slices, and taking the reference music score with the highest number of matched reference slices as the target music score.
18. The system of claim 17, wherein, for each reference score, each reference slice comprises reference feature values for characterizing the reference slice,
the searching the music score library for a plurality of reference slices that respectively match the plurality of slices, and taking the reference music score with the highest number of matched reference slices as the target music score comprises the following steps:
constructing an index of the reference characteristic values of all reference slices in the music score library;
for each slice, comparing the characteristic value of the slice with each reference characteristic value in the index to determine the reference slice matched with the slice;
after the plurality of slices are respectively matched, determining the reference music scores to which the matched reference slices belong;
and ranking the determined reference music scores by the number of successfully matched reference slices, and taking the reference music score with the highest number of matches as the target music score.
19. The system of claim 13, wherein the error determination module comprises:
extracting corresponding performance information for each fragment, searching a reference fragment at a corresponding position in the target music score, comparing the performance information of the fragment with the reference information corresponding to the searched reference fragment, and acquiring the current comparison result of the fragment, wherein the performance information comprises performance fragment information, performance pitch information, performance rhythm information, performance time value information, performance speed information, performance dynamics information and performance pedal information;
and determining whether the current playing segment, playing pitch, playing rhythm, playing time value, playing speed, playing strength and playing pedal of the user are correct or not according to the comparison results of the plurality of fragments.
20. The system of claim 19, wherein, for each reference score, the performance section information of each reference slice includes the score page number, score line number and score bar number of the start and end positions of the slice, together with left and right hand information.
21. The system of claim 13, wherein, for each reference score, each reference slice comprises a plurality of reference notes,
the music score alignment module comprises:
extracting each playing note corresponding to each fragment;
searching a reference fragment at a corresponding position in the target music score;
and respectively aligning each playing note in the segment with each reference note in the found reference segment, and determining the corresponding relation between each playing note and each reference note, wherein the corresponding relation comprises the reference note corresponding to the playing note, the empty note corresponding to the playing note and the reference note corresponding to the empty note.
22. The system of claim 19, wherein each reference slice includes reference information for each reference score, wherein the reference information includes reference segment information, reference pitch information, reference tempo information, reference duration information, reference speed information, reference force information, and reference pedal information,
the extracting of the corresponding performance information for each segment, searching for the reference segment at the corresponding position in the target music score, comparing the performance information of the segment with the reference information corresponding to the searched reference segment, and obtaining the current comparison result of the segment includes:
extracting corresponding performance segment information, performance pitch information, performance rhythm information, performance time value information, performance speed information, performance dynamics information and performance pedal information of each segment;
searching a reference fragment at a corresponding position in the target music score;
and comparing the playing segment information, the playing pitch information, the playing rhythm information, the playing duration information, the playing speed information, the playing force information and the playing pedal information of the fragment with the reference segment information, the reference pitch information, the reference rhythm information, the reference duration information, the reference speed information, the reference force information and the reference pedal information corresponding to the found reference fragment respectively to obtain the current comparison result of the fragment.
23. The system of claim 13, wherein the current comparison results comprise a segment comparison result, a pitch comparison result, a rhythm comparison result, a duration comparison result, a speed comparison result, a force comparison result, and a pedal comparison result,
wherein the prompt strategy module is configured to:
determine that the current performance deviates from the target music score when any one or more of the segment, pitch, rhythm, duration, speed, force, and pedal comparison results exceeds the corresponding comparison threshold; and
determine the error-correction content and the subsequent playing plan according to the comparison thresholds corresponding to the various comparison results and the historical comparison results, generate the corresponding playing prompt information, and prompt the user by voice, so that the user can make the corresponding adjustment during the current performance or a subsequent performance.
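The prompt strategy of claim 23 can be sketched as a threshold check over the current comparison results, with the history of earlier results used to pick which dimensions go into the follow-up practice plan. The threshold values, the two-failure recurrence rule, and the message wording are all illustrative assumptions, not values taken from the patent.

```python
# Per-dimension comparison thresholds (assumed values for illustration)
THRESHOLDS = {"segment": 0.10, "pitch": 0.05, "rhythm": 0.10,
              "duration": 0.10, "speed": 0.15, "force": 0.20, "pedal": 0.20}

def build_prompt(current, history):
    """Return (spoken_prompt, practice_plan), or None if nothing exceeds a threshold."""
    errors = [d for d, v in current.items() if v > THRESHOLDS[d]]
    if not errors:
        return None
    # A dimension that has already failed in at least two earlier passes
    # is treated as recurring and added to the subsequent playing plan.
    recurring = [d for d in errors
                 if sum(past.get(d, 0) > THRESHOLDS[d] for past in history) >= 2]
    prompt = "Please check your " + ", ".join(errors) + "."
    plan = ["slow practice on " + d for d in recurring]
    return prompt, plan
```

With this sketch, a pitch deviation that also appeared in the last two practice runs triggers both an immediate voice prompt and a "slow practice" entry in the subsequent playing plan, matching the claim's distinction between adjustment during the current performance and during subsequent performances.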
24. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of any of claims 1-12.
25. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any one of claims 1-12.
CN202110076077.7A 2021-01-20 2021-01-20 Piano partner training method and system based on audio interaction Pending CN112863464A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110076077.7A CN112863464A (en) 2021-01-20 2021-01-20 Piano partner training method and system based on audio interaction

Publications (1)

Publication Number Publication Date
CN112863464A 2021-05-28

Family

ID=76007739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110076077.7A Pending CN112863464A (en) 2021-01-20 2021-01-20 Piano partner training method and system based on audio interaction

Country Status (1)

Country Link
CN (1) CN112863464A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108074438A (en) * 2016-11-18 2018-05-25 北京酷我科技有限公司 The error correction method and system of a kind of piano performance
CN108711336A (en) * 2018-04-27 2018-10-26 山东英才学院 A kind of piano performance points-scoring system and its method
CN109493853A (en) * 2018-09-30 2019-03-19 福建星网视易信息***有限公司 A kind of the determination method and terminal of audio similarity
CN110070847A (en) * 2019-03-28 2019-07-30 深圳芒果未来教育科技有限公司 Musical sound assessment method and Related product
CN111028615A (en) * 2019-11-29 2020-04-17 尤剑 Intelligent musical instrument playing teaching method, system and storage medium
CN111554256A (en) * 2020-04-21 2020-08-18 华南理工大学 Piano playing ability evaluation system based on strong and weak standards
CN111863026A (en) * 2020-07-27 2020-10-30 北京世纪好未来教育科技有限公司 Processing method and device for playing music by keyboard instrument and electronic device
CN112183658A (en) * 2020-10-14 2021-01-05 小叶子(北京)科技有限公司 Music score identification method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114333497A (en) * 2022-01-11 2022-04-12 平安科技(深圳)有限公司 Music partner training method, device, equipment and medium
CN114333497B (en) * 2022-01-11 2023-08-25 平安科技(深圳)有限公司 Music partner training method, device, equipment and medium
CN116450907A (en) * 2023-06-09 2023-07-18 深圳普菲特信息科技股份有限公司 Process route visual setting method, system and readable storage medium

Similar Documents

Publication Publication Date Title
CN108305643B (en) Method and device for determining emotion information
US11564090B1 (en) Audio verification
US7716049B2 (en) Method, apparatus and computer program product for providing adaptive language model scaling
EP2700071B1 (en) Speech recognition using multiple language models
WO2017113973A1 (en) Method and device for audio identification
US20130044885A1 (en) System And Method For Identifying Original Music
CN105304080A (en) Speech synthesis device and speech synthesis method
US20110093263A1 (en) Automated Video Captioning
WO2020155490A1 (en) Method and apparatus for managing music based on speech analysis, and computer device
CN112863464A (en) Piano partner training method and system based on audio interaction
WO2020237769A1 (en) Accompaniment purity evaluation method and related device
CN106898339B (en) Song chorusing method and terminal
CN111883100B (en) Voice conversion method, device and server
CN105989839A (en) Speech recognition method and speech recognition device
CN113823323A (en) Audio processing method and device based on convolutional neural network and related equipment
KR101813704B1 (en) Analyzing Device and Method for User's Voice Tone
JP5112978B2 (en) Speech recognition apparatus, speech recognition system, and program
JP6468258B2 (en) Voice dialogue apparatus and voice dialogue method
CN112908293B (en) Method and device for correcting pronunciations of polyphones based on semantic attention mechanism
CN114598933A (en) Video content processing method, system, terminal and storage medium
CN112712793A (en) ASR (error correction) method based on pre-training model under voice interaction and related equipment
JP2013050605A (en) Language model switching device and program for the same
CN108682423A (en) A kind of audio recognition method and device
CN107025902B (en) Data processing method and device
JP2008241970A (en) Speaker adaptation device, speaker adaptation method and speaker adaptation program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination