CN113689836B

CN113689836B - Method and terminal for converting audio into notes and displaying notes

Info

Publication number: CN113689836B
Application number: CN202110922957.1A
Authority: CN
Inventors: 王子亮; 陈勇; 苏财德; 邹应双
Original assignee: Fujian Star Net eVideo Information Systems Co Ltd
Current assignee: Fujian Star Net eVideo Information Systems Co Ltd
Priority date: 2021-08-12
Filing date: 2021-08-12
Publication date: 2023-08-18
Anticipated expiration: 2041-08-12
Also published as: CN113689836A

Abstract

The invention discloses a method and a terminal for converting audio into notes and displaying the notes. The method acquires the audio to be converted; converts the audio to be converted into a corresponding note set and generates a converted music score from the note set; and aligns the converted music score with the standard music score corresponding to the audio to be converted and displays them together. Because the user's singing or playing is converted into a music score and displayed against the standard music score, the user's performance is reflected more accurately and professionally, the user can easily see how the sung or played audio differs from the standard music score, and correcting, practising and improving the singing or playing becomes easier.

Description

Method and terminal for converting audio into notes and displaying notes

Technical Field

The present invention relates to the field of audio conversion, and in particular, to a method and terminal for converting audio into notes and displaying the notes.

Background

Existing singing or playing scoring systems usually show how the user sang or played by drawing pitch bars on an interface. Although this gives a rough picture of the performance, it does not show in detail how the performance differs from the standard music score or how to improve it.

Some music software can also convert audio sung by a user directly into notes and then turn those notes into music, which helps the user compose. However, this approach only produces a raw transcribed score: the user cannot see how the personally sung or played audio differs from the standard music score, and therefore cannot use it to improve further.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a method and a terminal for converting audio into notes and displaying the notes that let a user easily see the difference between sung or played audio and a standard music score.

To solve the above technical problem, the invention adopts the following technical solution:

a method of converting audio to notes and displaying the notes, comprising the steps of:

acquiring audio to be converted;

converting the audio to be converted into a corresponding note set, and generating a converted music score according to the note set;

and aligning the converted music score with the standard music score corresponding to the audio to be converted, and displaying them.

To solve the above technical problem, the invention adopts another technical solution:

a terminal for converting audio to notes and displaying the notes, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method for converting audio to notes and displaying the notes when executing the computer program.

The beneficial effects of the invention are as follows: the audio to be converted is converted into a corresponding note set, a converted music score is generated from the note set, and the converted music score is aligned with the standard music score corresponding to the audio to be converted and displayed. Because the user's singing or playing is turned into a music score and displayed against the standard music score, the user's performance is reflected more accurately and professionally, the user can easily see where the sung or played audio differs from the standard music score, and correcting, practising and improving the singing or playing becomes easier.

Drawings

FIG. 1 is a flowchart illustrating steps of a method for converting audio to notes and displaying the notes according to an embodiment of the invention;

FIG. 2 is a schematic diagram of a terminal for converting audio into notes and displaying the notes according to an embodiment of the present invention;

FIG. 3 is a schematic diagram showing two alignments performed in a method of converting audio into notes and displaying the notes according to an embodiment of the present invention.

Detailed Description

To describe the technical content, objectives and effects of the invention in detail, the invention is described below with reference to embodiments and the accompanying drawings.

Referring to fig. 1, a method for converting audio into notes and displaying the notes includes the steps of:

acquiring audio to be converted;

converting the audio to be converted into a corresponding note set, and generating a converted music score according to the note set;

and aligning the converted music score with the standard music score corresponding to the audio to be converted, and displaying them.

From the above description, the beneficial effect of the invention is that the user's singing or playing is converted into a music score and displayed aligned with the standard music score, so the user's performance is reflected more accurately and professionally, the user can easily see where the sung or played audio differs from the standard music score, and correcting, practising and improving the singing or playing becomes easier.

Further, the method further comprises the steps of:

highlighting notes for which the converted score is inconsistent with the standard score.

As can be seen from the above description, highlighting the notes where the converted music score differs from the standard music score lets the user see at a glance where the playing or singing is inaccurate, so that the user can correct it promptly.

Further, converting the audio to be converted into the corresponding note set includes:

obtaining a standard note set of the standard music score corresponding to the audio to be converted, and converting the audio to be converted into a corresponding note set with reference to the standard note set.

As can be seen from the above description, the audio is converted into the corresponding note set with reference to the standard music score. Using the standard note set of the standard music score makes it possible to find the comparison notes that match each standard note more reasonably, which improves how well the converted note set matches the standard note set and therefore determines the gap between the user's sung or played audio and the standard music score more accurately.

Further, converting the audio to be converted into the corresponding note set with reference to the standard note set includes:

converting the audio to be converted into a corresponding comparison note sequence;

aligning the comparison note sequence with the standard note set corresponding to the standard music score, and determining one or more comparison notes corresponding to each standard note in the standard note set;

taking all the comparison notes corresponding to each standard note as a comparison note set corresponding to the standard note;

determining, in each comparison note set, the comparison note with the highest matching degree with the corresponding standard note, to obtain a comparison note matched with each standard note in the standard note set;

and combining all the matched comparison notes to obtain a note set corresponding to the audio to be converted.

As can be seen from the above description, the comparison note sequence is first aligned with the standard note set of the standard music score to determine the comparison note set corresponding to each standard note; then, within each comparison note set, the comparison note that best matches the corresponding standard note is selected, giving comparison notes matched one-to-one with the standard notes. Determining the comparison note set for each standard note is the first alignment between comparison notes and standard notes, and it handles cases where the timing of the user's singing or playing does not match the music score. Searching each set for the best-matching comparison note is the second alignment, and it removes redundant comparison notes. Together, the two alignments greatly improve the accuracy with which the user's notes are matched to the standard notes, so the difference between the user's sung or played audio and the standard music score is determined accurately.

Further, each comparison note in the set of notes includes its corresponding pitch and duration values, and each standard note in the set of standard notes includes its corresponding pitch and duration values;

generating a converted music score from the note set comprises:

converting the pitch of each comparison note in the note set into a corresponding note name or sol-fa name;

converting the duration value of each comparison note in the note set into a corresponding duration according to the duration value of the corresponding standard note and a preset redundancy value;

and generating a converted music score according to the note name or sol-fa name and the duration corresponding to each comparison note in the note set.

As can be seen from the above description, when the duration value of a comparison note is converted into a duration on the music score, the conversion uses a preset redundancy value set from empirical values, so that the converted music score better reflects how the user actually sings or plays.

Further, converting the audio to be converted into the corresponding comparison note sequence includes:

extracting a pitch sequence from the audio to be converted;

and segmenting the pitch sequence to obtain a corresponding comparison note sequence.

As can be seen from the above description, extracting the pitch sequence from the audio to be converted and segmenting it yields the corresponding comparison note sequence, so the audio can be converted into a comparison note sequence conveniently and accurately.

Further, segmenting the pitch sequence to obtain the corresponding comparison note sequence includes:

detecting a zero value of the pitch sequence, and dividing the pitch sequence into a plurality of pitch segments according to the zero value to obtain a pitch segment set;

traversing the pitch segment set and, for each traversed pitch segment, determining one or more comparison notes corresponding to that pitch segment;

and combining the comparison notes corresponding to all the pitch segments to obtain the corresponding comparison note sequence.

As can be seen from the above description, after the pitch sequence is extracted, its zero values are detected and the sequence is divided at those zeros into pitch segments; one or more comparison notes are then determined for each pitch segment; finally, the comparison notes of all pitch segments are combined into the comparison note sequence. Splitting first and then combining ensures that the resulting comparison note sequence is accurate and reliable.
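A minimal Python sketch of the zero-value split described above, assuming the pitch sequence is a list of frame-level pitches in which unvoiced or silent frames are 0. The function name `split_on_zeros` is hypothetical; each segment keeps the index of its first frame so that note start and end times can be recovered later.

```python
from typing import List, Tuple

def split_on_zeros(pitches: List[float]) -> List[Tuple[int, List[float]]]:
    """Split a frame-level pitch sequence into voiced segments.

    A pitch of 0 marks an unvoiced or silent frame. Each returned segment is
    (start_index, pitches), where start_index points into the original sequence.
    """
    segments: List[Tuple[int, List[float]]] = []
    current: List[float] = []
    start = 0
    for i, p in enumerate(pitches):
        if p == 0:
            # A zero closes the segment being built, if any.
            if current:
                segments.append((start, current))
                current = []
        else:
            if not current:
                start = i  # first voiced frame of a new segment
            current.append(p)
    if current:  # the sequence may end on a voiced frame
        segments.append((start, current))
    return segments
```

For example, `split_on_zeros([0, 60, 60, 0, 0, 62, 64, 0])` yields two segments starting at frames 1 and 5.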

Further, determining one or more comparison notes corresponding to the pitch segment includes:

acquiring and buffering the pitches in the pitch segment one by one in time order;

each time a pitch is buffered, determining the average pitch difference of all buffered pitches;

judging whether the average pitch difference is greater than a preset threshold; if so, determining the median or average of all buffered pitches as the pitch of a new note;

determining the duration value of the new note from the sum of the durations corresponding to all buffered pitches;

determining a comparison note corresponding to the pitch segment according to that pitch and duration value;

clearing all buffered pitches, and returning to the step of acquiring and buffering the pitches in the pitch segment one by one in time order, until every pitch in the pitch segment has been traversed;

if not, returning to the step of acquiring and buffering the pitches in the pitch segment one by one in time order, until every pitch in the pitch segment has been traversed;

and determining all comparison notes generated in sequence as the one or more comparison notes corresponding to the pitch segment.

As can be seen from the above description, when the comparison notes of each pitch segment are determined, the average pitch difference of the consecutively acquired pitches is calculated and used to decide which pitches belong to the same note; the pitch and duration value of that note are then determined, and the note is taken as a comparison note of the pitch segment. In this way all comparison notes of each pitch segment can be determined accurately and quickly, which facilitates the subsequent note comparison.

Further, the method further comprises the steps of:

determining the starting time and the ending time of each comparison note in the comparison note sequence to obtain a time range corresponding to each comparison note;

aligning the comparison note sequence with the standard note set corresponding to the standard music score, determining one or more comparison notes corresponding to each standard note in the standard note set, and taking all comparison notes corresponding to each standard note as the comparison note set corresponding to that standard note, comprises the steps of:

the audio to be converted is singing audio; performing word segmentation on the audio to be converted according to the sung lyrics to obtain the time information of each sung word;

determining one or more comparison notes corresponding to each sung word according to the time information of the word and the time range of each comparison note in the comparison note sequence, and taking all comparison notes corresponding to each word as the comparison note set corresponding to that word;

and determining the comparison note set corresponding to each standard note in the standard note set according to the standard notes corresponding to each sung word and the comparison note set corresponding to each sung word.

As can be seen from the above description, for singing audio, the time information of each word can be obtained by segmenting the audio according to the sung lyrics; the comparison note set of each word is then determined from that time information; finally, the comparison note set of each standard note is determined from the comparison note set of each word and the standard notes corresponding to each word. Word segmentation by lyrics makes it easy to determine the comparison note set of each standard note in singing audio, and it also handles cases where the timing of the user's singing or playing does not match the standard music score. This operation realizes the first alignment between the user's singing or playing and the standard music score.
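The word-level grouping can be sketched as follows. This assumes each comparison note already carries start and end times in seconds and that the word spans come from the lyric alignment step; assigning a note to the word whose span contains the note's midpoint is an assumption, since the text does not say how partial overlaps are resolved. `Note` and `group_notes_by_word` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Note:
    pitch: float  # MIDI-style pitch
    start: float  # start time in seconds
    end: float    # end time in seconds

def group_notes_by_word(notes: List[Note],
                        word_spans: List[Tuple[float, float]]) -> List[List[Note]]:
    """Collect, for each sung word (start, end), the comparison notes whose
    midpoint falls inside that word's time span."""
    groups: List[List[Note]] = [[] for _ in word_spans]
    for note in notes:
        mid = (note.start + note.end) / 2
        for k, (w_start, w_end) in enumerate(word_spans):
            if w_start <= mid < w_end:
                groups[k].append(note)
                break  # assumed: a note belongs to at most one word
        # Notes whose midpoint lies outside every word span are dropped.
    return groups
```

For a word that maps to exactly one standard note, its group is directly that note's comparison note set; words spanning several standard notes still need a further split.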

Further, performing word segmentation on the audio to be converted according to the sung lyrics to obtain the time information of each sung word includes:

performing forced alignment between the audio to be converted and the corresponding sung lyrics with a preset speech recognition model to obtain the time information of each sung word, where the forced alignment uses the GOP (Goodness of Pronunciation) algorithm from spoken-language pronunciation assessment.

As can be seen from the above description, the GOP algorithm and the preset speech recognition model allow the audio to be segmented by the sung lyrics accurately and conveniently, and the time information of each sung word to be determined.

Further, aligning the comparison note sequence with the standard note set corresponding to the standard score includes:

performing DTW (dynamic time warping) alignment between the comparison note sequence and the standard note set corresponding to the standard score.

As can be seen from the above description, the DTW algorithm makes it convenient to align the comparison note sequence with the standard note set of the standard score; the approach works for sung, played and hummed audio alike, so it is general and widely applicable.

Further, each comparison note in the set of notes includes its corresponding pitch and duration values, and each standard note in the set of standard notes includes its corresponding pitch and duration values; the determining the comparison notes with the highest matching degree with the corresponding standard notes in each comparison note set comprises the following steps:

traversing each comparison note in the comparison note set, and for each traversed comparison note:

determining a first matching degree between the pitch of the comparison note and the pitch of the standard note corresponding to the comparison note set in which the comparison note lies;

determining a second matching degree between the duration value of the comparison note and the duration value of the standard note corresponding to the comparison note set in which the comparison note lies;

determining the matching degree between the comparison note and that standard note according to the first matching degree and the second matching degree;

and determining the comparison note with the highest matching degree in each comparison note set as the comparison note with the highest matching degree with the corresponding standard note.

As can be seen from the above description, the pitch and the duration value of each comparison note are matched separately against those of the corresponding standard note, the overall matching degree is determined from the two values, and the comparison note with the highest matching degree in the set is taken as the best match for that standard note. This removes redundant comparison notes, ensures the accuracy of the second alignment and improves the accuracy of note alignment.

Referring to fig. 2, a terminal for converting audio into notes and displaying the notes includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method for converting audio into notes and displaying the notes when executing the computer program.

The method and terminal for converting audio into notes and displaying the notes can be applied in any scenario where the user's sung or played audio needs to be evaluated, such as scoring or practice for players and singers, or karaoke (KTV) singing. Specific embodiments are described below:

Example 1

S1, acquiring audio to be converted, which can be audio sung by the user or audio played by the user;

S2, converting the audio to be converted into a corresponding note set, and generating a converted music score according to the note set;

wherein the converting the audio to be converted into the corresponding note set includes:

acquiring a standard note set of a standard music score corresponding to the audio to be converted, and converting the audio to be converted into a corresponding note set by referring to the standard note set;

S3, aligning the converted music score with the standard music score corresponding to the audio to be converted, and displaying them;

the aligned converted music score and the corresponding standard music score are displayed together on an interface;

the scores can be displayed line by line, with each line of the converted music score shown together with the corresponding line of the standard music score until the complete song has been displayed; alternatively, the whole converted music score can be displayed together with the whole standard music score;

to let the user see more intuitively where the singing or playing is inconsistent, the notes where the converted music score differs from the standard music score can be highlighted, for example in a different color or with an animation that draws the user's attention, so that the user can see at a glance where the singing or playing is inaccurate.
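A small sketch of the highlighting rule, assuming the converted and standard scores are already aligned one-to-one and that each note is represented as a (name, duration) pair. Plain-text brackets stand in for the color or animation an interface would use; all names here are hypothetical.

```python
from typing import List, Tuple

# A score note as (name, duration), e.g. ("1", "quarter") in numbered notation.
ScoreNote = Tuple[str, str]

def mismatched_indices(converted: List[ScoreNote], standard: List[ScoreNote]) -> List[int]:
    """Positions where the aligned converted note differs in name or duration."""
    # zip assumes the two scores are already aligned one-to-one.
    return [i for i, (c, s) in enumerate(zip(converted, standard)) if c != s]

def render_converted_line(converted: List[ScoreNote], standard: List[ScoreNote]) -> str:
    """Text rendering of one score line; mismatched notes are wrapped in brackets."""
    bad = set(mismatched_indices(converted, standard))
    return " ".join(f"[{name}]" if i in bad else name
                    for i, (name, _) in enumerate(converted))
```

A note is flagged even when only its duration differs, so a correctly pitched but rushed note is still highlighted.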

Example 2

The present embodiment further defines how the audio to be converted is converted into a corresponding set of notes with reference to the standard set of notes:

S21, converting the audio to be converted into a corresponding comparison note sequence;

specifically, extracting a pitch sequence from the audio to be converted;

segmenting the pitch sequence to obtain a corresponding comparison note sequence;

wherein extracting a pitch sequence from the audio to be converted comprises:

extracting a fundamental frequency sequence from the audio to be converted, where the fundamental frequency extraction algorithm can be short-time autocorrelation, the short-time average magnitude difference function (AMDF), the YIN algorithm, or the like;

smoothing the fundamental frequency sequence;

the smoothed fundamental frequency sequence is converted into a pitch sequence, and the conversion formula is as follows:

wherein f represents a fundamental frequency in the fundamental frequency sequence, and p represents a pitch corresponding to the fundamental frequency f after conversion;
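The conversion formula itself is missing from this text. As a sketch only, the code below assumes the common MIDI note-number mapping p = 69 + 12·log2(f / 440), which is consistent with pitch 60 being treated as C in the C-major example of this embodiment, and a 3-frame median filter for the smoothing step. Neither the formula nor the filter is confirmed by the source, and the function names are hypothetical.

```python
import math
from statistics import median
from typing import List

def smooth_f0(f0: List[float], width: int = 3) -> List[float]:
    """Median-smooth an F0 track in Hz; unvoiced frames (0 Hz) stay at 0."""
    half = width // 2
    out: List[float] = []
    for i, f in enumerate(f0):
        if f <= 0:
            out.append(0.0)
            continue
        # Only voiced neighbours take part, so silence does not drag the value down.
        window = [v for v in f0[max(0, i - half): i + half + 1] if v > 0]
        out.append(median(window))
    return out

def f0_to_pitch(f0: List[float]) -> List[float]:
    """Assumed MIDI mapping p = 69 + 12*log2(f/440); 0 Hz maps to pitch 0."""
    return [69 + 12 * math.log2(f / 440.0) if f > 0 else 0.0 for f in f0]
```

Keeping unvoiced frames at pitch 0 preserves the zero values that the later segmentation step splits on.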

S22, aligning the comparison note sequence with the standard note set corresponding to the standard music score, and determining one or more comparison notes corresponding to each standard note in the standard note set;

S23, taking all comparison notes corresponding to each standard note as the comparison note set corresponding to that standard note;

S24, determining, in each comparison note set, the comparison note with the highest matching degree with the corresponding standard note, to obtain a comparison note matched with each standard note in the standard note set;

S25, combining all the matched comparison notes to obtain the note set corresponding to the audio to be converted;

wherein, each comparison note in the note set comprises a corresponding pitch and duration value thereof, and each standard note in the standard note set comprises a corresponding pitch and duration value thereof;

the generating a converted score from the set of notes comprises:

converting the pitch of each comparison note in the note set into a corresponding note name or sol-fa name; for example, in C major, pitch 60 is mapped to "1" in numbered musical notation, pitch 62 is mapped to "2", and so on; in other alternative embodiments, the pitch can be mapped to a note name on the staff;

converting the duration value of each comparison note in the note set into a corresponding duration according to the duration value of the corresponding standard note and a preset redundancy value; for example, if the standard note is a half note and the duration value of the matched comparison note is close to that of the standard note, the comparison note is mapped to a half note; if it is close to half the duration value of the standard note, the comparison note is mapped to a quarter note; and so on.

More specifically, the closeness of the comparison note to the standard note can be judged with the preset redundancy value: when the duration value of the comparison note lies within the preset redundancy range of the standard note's duration value, the comparison note is mapped to the duration of the standard note. For example, with a preset redundancy value of 0.25, the comparison note matches when its duration value lies within (1 ± 0.25) times the duration value of the standard note. Specifically, if the duration value of the comparison note is 0.5 seconds and the standard note is a quarter note with a duration value of 0.6 seconds, the redundancy range is 0.6 × 0.75 = 0.45 seconds to 0.6 × 1.25 = 0.75 seconds; since 0.5 seconds falls within this range, the comparison note is mapped to the duration of the standard note, that is, a quarter note. The redundancy value is set from empirical values so that the conversion better matches how the user actually sings or plays.

generating a converted music score according to the note name or sol-fa name and the duration corresponding to each comparison note in the note set;

the converted music score consists of notes, each with a name and a duration; during the display of step S3, the names and durations of the standard notes and of the matched comparison notes can be shown side by side on the interface for comparison.
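The two mapping steps can be sketched as below. The conventions are assumptions for illustration: C major, numbered notation returned as a digit plus a separate octave offset, sharps for black keys, and candidate note values obtained by halving or doubling the standard note's value. The (1 ± redundancy) test follows the worked example above; the function names are hypothetical.

```python
from typing import Optional, Tuple

# Semitone offset above C -> numbered-notation degree in C major (sharps for black keys).
_C_MAJOR = {0: "1", 1: "#1", 2: "2", 3: "#2", 4: "3", 5: "4",
            6: "#4", 7: "5", 8: "#5", 9: "6", 10: "#6", 11: "7"}

DURATIONS = ["whole", "half", "quarter", "eighth", "sixteenth"]

def pitch_to_numbered(pitch: float) -> Tuple[str, int]:
    """Map a pitch (60 -> "1" in C major) to (degree, octave offset from 60..71)."""
    octave, semitone = divmod(round(pitch) - 60, 12)
    return _C_MAJOR[semitone], octave

def map_duration(value: float, std_value: float, std_name: str,
                 redundancy: float = 0.25) -> Optional[str]:
    """Map a measured duration (s) to a note value relative to the standard note.

    A candidate matches when value lies within (1 ± redundancy) * target.
    The standard value is tried first, then half, double, quarter, quadruple.
    """
    k = DURATIONS.index(std_name)
    for step in (0, 1, -1, 2, -2):
        j = k + step  # positive step = shorter note value
        if not 0 <= j < len(DURATIONS):
            continue
        target = std_value / (2 ** step)
        if (1 - redundancy) * target <= value <= (1 + redundancy) * target:
            return DURATIONS[j]
    return None  # no candidate within the redundancy range
```

With these assumptions, `map_duration(0.5, 0.6, "quarter")` reproduces the worked example and returns `"quarter"`, and a 0.6-second note measured against a 1.2-second half note maps to a quarter note.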

Example 3

This embodiment further defines how the pitch sequence is segmented to obtain the corresponding comparison note sequence, specifically:

detecting the zero values of the pitch sequence and dividing the pitch sequence at those zeros into a plurality of pitch segments to obtain a pitch segment set; that is, the pitch sequence is divided, at the positions of its zero values, into pitch segments arranged in time order;

traversing the pitch segments in the pitch segment set in time order and, for each traversed pitch segment, determining one or more comparison notes corresponding to it;

combining the comparison notes of all pitch segments in time order to obtain the corresponding comparison note sequence;

wherein determining one or more comparison notes corresponding to the pitch segment comprises:

each time a pitch is obtained, it can be buffered into a preset array pvInNote, and the average pitch difference meanpatchError of all pitches stored in pvInNote is then calculated;

meanpatchError is calculated as follows:

calculate the median of all pitches stored in pvInNote, then calculate the absolute difference between each stored pitch and that median, and average these absolute differences;

if meanpatchError is greater than the preset threshold, all pitches stored in pvInNote are regarded as belonging to one note, and the median or average of the stored pitches is determined as the pitch of that note;

the length of pvInNote multiplied by the time represented by one element of the array is used as the duration value of the note; the indexes of all pitches in pvInNote within the original pitch sequence are recorded, from which the start time and end time of the generated note in the original audio can be obtained; the newly generated note is added to the note array notes, pvInNote is emptied, and the step of acquiring and buffering the pitches in the pitch segment one by one in time order is executed again until every pitch in the pitch segment has been traversed;

determining all comparison notes generated in sequence as the one or more comparison notes corresponding to the pitch segment;

after all pitch segments have been traversed, the comparison notes of all pitch segments have been generated and are stored in the note array notes.
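A Python sketch of this note-building loop for one pitch segment, where `pv_in_note` and `mean_pitch_error` are renderings of pvInNote and meanpatchError. Two details are design choices rather than the text's literal wording: the pitch that pushes the error over the threshold starts the next note instead of being included in the closed one, and pitches left in the buffer at the end of the segment are flushed as a final note. `frame_dur` (seconds per pitch frame), `threshold`, and the function names are hypothetical.

```python
from statistics import median
from typing import List, NamedTuple

class CompNote(NamedTuple):
    pitch: float     # median of the note's pitches
    duration: float  # number of frames * frame_dur, in seconds
    start: float     # min frame index * frame_dur
    end: float       # max frame index * frame_dur

def mean_pitch_error(buf: List[float]) -> float:
    """meanpatchError: mean absolute deviation of the buffer from its median."""
    m = median(buf)
    return sum(abs(p - m) for p in buf) / len(buf)

def segment_to_notes(seg_start: int, pitches: List[float],
                     frame_dur: float, threshold: float) -> List[CompNote]:
    """Turn one voiced pitch segment into comparison notes.

    seg_start is the index of the segment's first frame in the full pitch sequence.
    """
    notes: List[CompNote] = []
    pv_in_note: List[float] = []  # buffered pitches of the note being built
    note_start = seg_start        # frame index of the first buffered pitch

    def flush(buf: List[float], first: int) -> None:
        last = first + len(buf) - 1
        notes.append(CompNote(median(buf), len(buf) * frame_dur,
                              first * frame_dur, last * frame_dur))

    for offset, p in enumerate(pitches):
        pv_in_note.append(p)
        if len(pv_in_note) > 1 and mean_pitch_error(pv_in_note) > threshold:
            # Design choice: the pitch that broke the note starts the next one.
            flush(pv_in_note[:-1], note_start)
            pv_in_note = [p]
            note_start = seg_start + offset
    if pv_in_note:  # assumed: leftover pitches at the segment end form a final note
        flush(pv_in_note, note_start)
    return notes
```

With 10 ms frames, a segment starting at frame 10 with pitches 60, 60, 60, 60, 64, 64, 64 and a threshold of 0.5 yields a 60 note spanning frames 10 to 13 and a 64 note spanning frames 14 to 16.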

Example 4

This embodiment further defines how the comparison note sequence is aligned with the standard note set corresponding to the standard score, specifically:

in an alternative embodiment, the audio to be converted is singing audio. The singing audio can contain the lyrics sung by the user; alternatively, the lyrics of the song being sung can be obtained separately:

specifically, when the note array notes is generated, the index of each pitch belonging to each comparison note is recorded, and the start time and end time of each note can be determined from the minimum and maximum indexes of its pitches: the minimum index multiplied by the duration of each pitch gives the start time, and the maximum index multiplied by the duration of each pitch gives the end time, where every pitch has the same duration.

performing word segmentation on the audio to be converted according to the sung lyrics to obtain the time information of each sung word;

determining one or more comparison notes corresponding to each sung word according to the time information of the word and the start and end times of the comparison notes, and taking them as the comparison note set corresponding to that word;

determining the comparison note set corresponding to each standard note in the standard note set according to the standard notes corresponding to each sung word and the comparison note set corresponding to each sung word;

if a word corresponds to one standard note, the comparison note set of the word is used as the comparison note set of that standard note; if a word corresponds to several standard notes, the DTW algorithm is used to align those standard notes with the comparison note set of the word and to divide the set further, giving the comparison note set of each of the word's standard notes;

performing word segmentation on the audio to be converted according to the sung lyrics to obtain the time information of each sung word comprises:

performing forced alignment between the audio to be converted and the corresponding sung lyrics with a preset speech recognition model to obtain the time information of each sung word; in this embodiment, a TDNN (time-delay neural network) speech recognition model is used, and the forced alignment uses the GOP (Goodness of Pronunciation) algorithm from spoken-language pronunciation assessment.

In another alternative embodiment, aligning the comparison note sequence with the standard note set corresponding to the standard score includes:

performing DTW alignment on the comparison note sequence and a standard note set corresponding to the standard score;

taking the comparison notes aligned with the same standard note as a comparison note set corresponding to the standard note, wherein the comparison notes aligned with the same standard note can be one or more;

in this embodiment, the application scenario where the user sings or humms or plays with lyrics is supported.
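For this whole-sequence variant, the DTW alignment and the grouping of comparison notes by the standard note they align to might look as follows. The pitch-only local cost, the function names, and returning comparison-note indices rather than note objects are assumptions made for illustration.

```python
from typing import List, Tuple


def dtw_path(cmp: List[float], std: List[float]) -> List[Tuple[int, int]]:
    """Textbook DTW over pitch values; returns (comparison, standard) index pairs."""
    n, m = len(cmp), len(std)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(cmp[i - 1] - std[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        # Diagonal first, so ties prefer advancing both sequences together.
        candidates = [(cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1)]
        _, i, j = min(candidates, key=lambda t: t[0])
    return path[::-1]


def group_by_standard_note(cmp_pitches: List[float],
                           std_pitches: List[float]) -> List[List[int]]:
    """Comparison-note indices aligned with each standard note (one or more per note)."""
    groups: List[List[int]] = [[] for _ in std_pitches]
    for ci, si in dtw_path(cmp_pitches, std_pitches):
        groups[si].append(ci)
    return groups
```

On a toy input of six comparison pitches `[60.0, 60.3, 59.7, 62.0, 61.8, 64.0]` against standard pitches `[60.0, 62.0, 64.0]`, the grouping is three, two and one comparison notes, the same shape as the primary alignment described for fig. 3.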

Example five

This embodiment further specifies how to determine, in each comparison note set, the comparison note with the highest matching degree with the corresponding standard note: within the comparison note set associated with a standard note, the comparison note most similar to that standard note is matched to it, so that finally each standard note has exactly one matched comparison note, namely:

determining, in each comparison note set, the comparison note with the highest matching degree as the comparison note that best matches the corresponding standard note;

for example, the comparison notes in the set are traversed; the pitch of each traversed comparison note is compared with the pitch of the corresponding standard note to obtain a score, which is stored in the array pitchscore; the scoring rule can be designed so that the closer the pitches, the higher the score;

the duration value of each traversed comparison note is then compared with the duration value of the corresponding standard note to obtain a score, which is stored in the array durationscore; the scoring rule is that the closer the duration values, the higher the score;

after the traversal, corresponding elements of the arrays pitchscore and durationscore are combined in proportion according to notascore = rate × pitchscore + (1 − rate) × durationscore, where rate can be set as needed; in this embodiment, rate is 0.5;

the maximum value in the array notascore and its index i are determined; the comparison note at index i is the comparison note with the highest matching degree with the standard note;
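The scoring and selection steps can be sketched as below, with rate = 0.5 as in this embodiment. The text only requires that closer pitches and closer duration values score higher; the concrete scoring functions here (10 points lost per semitone of pitch difference, and a shorter-to-longer duration ratio) are hypothetical choices, as are the class and function names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    pitch: float     # assumed unit: semitones (e.g. MIDI number)
    duration: float  # duration value, e.g. in beats


def pitch_score(c: Note, s: Note) -> float:
    # Hypothetical rule: 100 for identical pitch, minus 10 per semitone, floored at 0.
    return max(0.0, 100.0 - 10.0 * abs(c.pitch - s.pitch))


def duration_score(c: Note, s: Note) -> float:
    # Hypothetical rule: shorter/longer ratio scaled to 0..100.
    longer = max(c.duration, s.duration)
    return 100.0 * min(c.duration, s.duration) / longer if longer > 0 else 100.0


def best_match(candidates: List[Note], standard: Note, rate: float = 0.5) -> int:
    """Index i of the candidate maximizing notascore; candidates must be non-empty."""
    pitchscore = [pitch_score(c, standard) for c in candidates]
    durationscore = [duration_score(c, standard) for c in candidates]
    notascore = [rate * p + (1 - rate) * d for p, d in zip(pitchscore, durationscore)]
    return max(range(len(notascore)), key=notascore.__getitem__)
```

With a standard note `Note(60.0, 1.0)` and candidates `Note(60.5, 0.2)`, `Note(60.1, 0.9)`, `Note(62.0, 1.0)`, the second candidate wins at rate = 0.5: it is nearly in tune and nearly the right length, while the first is too short and the third is two semitones off.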

Fig. 3 shows the note alignment process of this embodiment:

Primary alignment: the boxes labeled standard note numbers 1, 2 and 3 represent standard notes 1, 2 and 3, respectively; the comparison note set formed by comparison notes 1-3 corresponds to standard note 1, the set formed by comparison notes 4-5 corresponds to standard note 2, and the set formed by comparison note 6 corresponds to standard note 3; that is, the comparison note set of each standard note is obtained by the fourth embodiment.

Secondary alignment: through the matching-degree calculation, comparison note 2 corresponds to standard note 1, comparison note 4 corresponds to standard note 2, and comparison note 6 corresponds to standard note 3; that is, the comparison note that best matches each standard note is obtained by the fifth embodiment.

Thus, these two alignments improve the accuracy of matching comparison notes to standard notes.

Example six

Referring to fig. 2, a terminal for converting audio into notes and displaying the notes includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method for converting audio into notes and displaying the notes according to any one of the first to fifth embodiments when the processor executes the computer program.

In summary, to solve the problem of aligning the user's notes with the notes of the standard score, the method and terminal provided by the invention first align the standard note set corresponding to the standard score with the comparison note sequence and determine a comparison note set for each standard note in the standard note set; they then determine, in each comparison note set, the comparison note with the highest matching degree with the corresponding standard note, obtaining comparison notes that match the standard notes one to one. Determining the comparison note set of each standard note realizes the first alignment between comparison notes and standard notes and solves the problem that the timing of the user's singing or playing is inconsistent with the score; searching, on this basis, for the comparison note that best matches each standard note realizes the second alignment and further eliminates redundant comparison notes.

The foregoing description is only illustrative of the present invention and is not intended to limit its scope; all equivalent changes made using the specification and drawings of the present invention, or direct or indirect applications in related technical fields, are likewise included in the scope of protection of the present invention.

Claims

1. A method of converting audio to notes and displaying the notes, comprising the steps of:

acquiring audio to be converted;

aligning and displaying the converted music score with a standard music score corresponding to the audio to be converted;

the converting the audio to be converted into the corresponding note set includes:

2. The method of converting audio to notes and displaying as recited in claim 1, wherein converting the audio to be converted into the corresponding note set with reference to the standard note set comprises:

3. A method of converting audio to notes and displaying according to claim 2, wherein each comparison note in the set of notes includes its corresponding pitch and duration value, and each standard note in the set of standard notes includes its corresponding pitch and duration value;

the generating a converted score from the set of notes comprises:

4. A method of converting audio to notes and displaying as defined in claim 2, wherein converting the audio to be converted into a corresponding comparison note sequence comprises:

extracting a pitch sequence from the audio to be converted;

5. A method of converting audio to notes and displaying according to claim 4, wherein slicing the pitch sequence to obtain the corresponding comparison note sequence includes:

6. A method of converting audio to notes and displaying according to claim 5, wherein determining one or more comparison notes corresponding to the pitch segment comprises:

7. A method of converting audio to notes and displaying according to any one of claims 2 to 6, further comprising the steps of:

8. The method of converting audio to notes and displaying according to claim 7, wherein segmenting the audio to be converted into words according to the sung lyrics to obtain the time information of each sung word includes:

performing strong alignment between the audio to be converted and the corresponding sung lyrics using a preset speech recognition model to obtain the time information of each sung word.

9. A method of converting audio to notes and displaying according to claim 2, wherein each comparison note in the note set includes its corresponding pitch and duration value, and each standard note in the standard note set includes its corresponding pitch and duration value; determining, in each comparison note set, the comparison note with the highest matching degree with the corresponding standard note comprises the following steps:

10. A terminal for converting audio to notes and displaying, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of a method for converting audio to notes and displaying according to any one of claims 1 to 9 when executing the computer program.