CN115331682B

CN115331682B - Method and device for correcting pitch of audio

Info

Publication number: CN115331682B
Application number: CN202110512936.2A
Authority: CN
Inventors: 张超; 朱洁
Original assignee: Beijing Qiyin Miaoxiao Technology Co., Ltd.
Current assignee: Beijing Qiyin Miaoxiao Technology Co., Ltd.
Priority date: 2021-05-11
Filing date: 2021-05-11
Publication date: 2024-07-02
Anticipated expiration: 2041-05-11
Also published as: CN115331682A

Abstract

The application discloses a method for correcting the pitch of audio, comprising: obtaining a fundamental frequency sequence of the audio, and obtaining an original pitch sequence of the audio based on the fundamental frequency sequence; determining the closest mode scale as a standard mode scale using a similarity algorithm; calculating a minimum pitch difference based on the standard mode scale; forming a pitch difference sequence to be corrected based on the minimum pitch difference of the fundamental frequency at each time point and the offset direction of the minimum pitch difference; and, taking the audio and the pitch difference sequence to be corrected as inputs, correcting the audio through resampling and then the PSOLA algorithm to obtain corrected audio. By searching for the most similar mode scale, the method determines the fundamental frequencies corresponding to the closest musical mode scale, so that the pitch difference sequence to be corrected can be calculated rapidly. The application also discloses a corresponding device for correcting the pitch of audio.

Description

Method and device for correcting pitch of audio

Technical Field

The invention relates to the fields of audio signal processing and algorithmic composition, and in particular to a method and a device for correcting the pitch of audio content based on musical mode information.

Background

With the development and rise of the online music industry, functions such as music playback, online singing (i.e., karaoke, singing songs over an accompaniment system) and voice-based social interaction have become popular directions in the mobile internet industry. Audio signal processing, such as audio correction in karaoke products (for example changing a male voice into a female voice, a female voice into a male voice, or applying special voice-changing effects), appears increasingly often in mobile internet products, and users' demand for personalized audio processing keeps growing.

However, when using karaoke software, not every user can sing every note at the correct pitch. Moreover, as the personalization demands of the younger generation grow stronger, users are no longer satisfied with singing in the same pitch and rhythm as the original performance, and often hum original melodies to create music in their own style. Because users often cannot accurately sing existing or original songs at a particular pitch, there is a need for a method and system for correcting pitch deviations when existing or original music is sung with or without accompaniment.

Disclosure of Invention

Therefore, the invention provides a method and a device for correcting the pitch of audio content based on music mode information.

Some embodiments of the present application provide a method of correcting the pitch of audio, comprising the steps of: acquiring a fundamental frequency sequence of the audio, wherein the fundamental frequency sequence comprises a plurality of time points and a fundamental frequency value at each time point; acquiring an original pitch sequence of the audio based on the fundamental frequency sequence; using a similarity algorithm to find, among different reference mode scales, the mode scale closest to the original pitch sequence, and taking it as a standard mode scale; generating a standard mode scale fundamental frequency reference table based on the standard mode scale; calculating, by a sorting method, a minimum pitch difference between the fundamental frequency at each time point of the fundamental frequency sequence and the fundamental frequencies in the standard mode scale fundamental frequency reference table; forming a pitch difference sequence to be corrected based on the minimum pitch difference of the fundamental frequency at each time point and the offset direction of the minimum pitch difference; and, taking the audio and the pitch difference sequence to be corrected as inputs, correcting the audio sequentially through resampling and a PSOLA algorithm to obtain corrected audio.

Some embodiments of the present application provide a method of correcting the pitch of audio, comprising the steps of: acquiring a fundamental frequency sequence of the audio, wherein the fundamental frequency sequence comprises a plurality of time points and a fundamental frequency value at each time point; acquiring an original pitch sequence of the audio based on the fundamental frequency sequence; using a cosine similarity algorithm to find, among different reference mode scales, the mode scale closest to the original pitch sequence, and taking it as a standard mode scale; generating a standard mode scale fundamental frequency reference table based on the standard mode scale; calculating, by a quicksort method, a minimum pitch difference between the fundamental frequency at each time point of the fundamental frequency sequence and the fundamental frequencies in the standard mode scale fundamental frequency reference table; forming a pitch difference sequence to be corrected based on the minimum pitch difference of the fundamental frequency at each time point and the offset direction of the minimum pitch difference; and, taking the audio and the pitch difference sequence to be corrected as inputs, correcting the audio sequentially through resampling and a PSOLA algorithm to obtain corrected audio.

In some embodiments, using the similarity algorithm or cosine similarity algorithm to find, among the different reference mode scales, the mode scale closest to the original pitch sequence as the standard mode scale comprises: using the different mode scales under a preset combination of modes as the reference mode scales; or using different mode scales determined by the user as the reference mode scales.

In some embodiments, using the similarity algorithm or cosine similarity algorithm to find, among the different reference mode scales, the mode scale closest to the original pitch sequence as the standard mode scale comprises: screening the reference mode scales using the twelve pitches in each octave as a reference.

In some embodiments, the number of pitches screened in a scale arrangement within an octave is increased or decreased according to the composition of the reference mode scale.

In some embodiments, before the reference mode scale with the greatest similarity is selected as the standard mode scale using the similarity algorithm or cosine similarity algorithm, the original pitch sequence is converted, according to the octave relation, into the same octave range as the reference mode scale.

In some embodiments, the different reference mode scales include one or more of: the common natural major scales in twelve different keys, medieval modes, modern mode scales based on pentatonic and blues scales, ethnic scales, and the like.

In some embodiments, generating the standard mode scale fundamental frequency reference table comprises: generating a standard pitch sequence from the number of octaves of the standard mode scale and the number of scale pitches in each octave, and converting the standard pitch sequence into a frequency sequence of standard pitches according to an international standard pitch-to-frequency table, thereby obtaining the standard mode scale fundamental frequency reference table.

In some embodiments, the number of octaves is preset or set by the user.

In some embodiments, the number of scale pitches in each octave is preset or set by the user.

In some embodiments, calculating the minimum pitch difference between the fundamental frequency of the audio and the fundamental frequencies in the standard mode scale fundamental frequency reference table using the quicksort method comprises: shifting the fundamental frequency of each sampling point stepwise in a first direction; stopping the shift and recording a first accumulated offset when the shifted value reaches the frequency of the nearest standard pitch in the standard mode scale fundamental frequency reference table, or when its difference from that frequency is smaller than a predefined minimum tolerance; and determining the first accumulated offset as the minimum pitch difference for the sampling point.

In some embodiments, calculating the minimum pitch difference between the fundamental frequency of the audio and the fundamental frequencies in the standard mode scale fundamental frequency reference table using the quicksort method comprises: shifting the fundamental frequency of each sampling point stepwise in a first direction; stopping the shift and recording a first accumulated offset when the shifted value reaches the frequency of the nearest standard pitch in the standard mode scale fundamental frequency reference table, or when its difference from that frequency is smaller than a predefined minimum tolerance; shifting the fundamental frequency of each sampling point stepwise in a second direction opposite to the first direction; stopping the shift and recording a second accumulated offset when the shifted value reaches the frequency of the nearest standard pitch in the standard mode scale fundamental frequency reference table, or when its difference from that frequency is smaller than the predefined minimum tolerance; and comparing the first accumulated offset with the second accumulated offset and determining the smaller one as the minimum pitch difference for the sampling point.

In some embodiments, the step size of the stepwise shift is preset or set by the user.

In some embodiments, the fundamental frequency sequence is determined using the pYIN algorithm.

In some embodiments, a sequence of multiples by which the pitch is to be raised is determined from the obtained pitch difference sequence to be corrected; the singing audio is resampled at 1/S times the sampling rate at which the user's singing voice was collected to obtain resampled audio; and the resampled audio is stretched to S times its length by the PSOLA algorithm using the multiple sequence.

In some embodiments, the multiple sequence by which the pitch to be corrected needs to be raised is a computed decimal array, or a single fixed decimal instead of an array.

In some embodiments, the method further comprises re-synthesizing the vocal audio based on the PSOLA algorithm, using each pitch difference as the pitch-shift coefficient of the formants at the corresponding time in the standard-pitch vocal fundamental frequency sequence.

Further embodiments of the present application provide an apparatus for correcting the pitch of audio comprising at least one processor and at least one memory including computer program code, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform the method of correcting the pitch of audio of any one of the above.

Compared with traditional pitch correction technology, the method provided by the application first uses a similarity algorithm, in particular cosine similarity, to find the most similar mode scale, so that the mode scale is obtained automatically from the collected vocal audio. Taking that mode scale as the reference standard, it then uses a sorting algorithm, in particular quicksort, to find the scale fundamental frequency closest to the vocal fundamental frequency at each time point. Combining the two methods reduces the amount of computation, so that the pitch difference sequence to be corrected can be calculated rapidly.

Compared with traditional pitch correction technology, the method also differs in its applicable scenarios. First, the application does not need to compare musical information extracted from the original singer's performance with musical information extracted from the user's singing, so it is better suited than the prior art to original works for which no original singer's performance exists. Second, the application can select mode scales as defined by the user, enabling singing in ethnic styles such as Japanese and Persian styles, and is thus better suited to personalized singing output.

Drawings

FIG. 1 is a schematic diagram of a method of modifying pitch of audio according to an embodiment of the application;

FIG. 2 is a schematic diagram of audio processing steps in a method of modifying the pitch of audio according to an embodiment of the application;

FIG. 3 is a schematic diagram of standard-scale processing steps in a method of modifying pitch of audio according to an embodiment of the present application;

FIGS. 4A and 4B are schematic diagrams of the sorting algorithm steps in a method of modifying the pitch of audio according to an embodiment of the application;

FIG. 5 is a schematic diagram of a resampling and PSOLA algorithm in a method of correcting a pitch of audio according to an embodiment of the application.

Detailed Description

The following describes specific embodiments of the present application in detail with reference to the drawings.

Definition of terms:

To clearly express the scope of the present application and to avoid ambiguity, the general terms used in the present application are defined as follows:

Mode: a system formed by organizing several musical tones of different pitches around a central tone (the tonic) according to certain interval relationships.

Mode scale: the tones of a mode arranged in order of pitch into a scale, with the tonic of the mode as both starting point and end point. Mode scales include natural modes, Indian modes, ancient modes, etc., and each includes major modes, minor modes, etc.

Interval: the pitch relationship between two tones, expressed in "degrees". Intervals include the perfect unison, minor second, augmented unison, major second, diminished third, minor third, augmented second, major third, diminished fourth, perfect fourth, augmented fourth, diminished fifth, perfect fifth, diminished sixth, major sixth, diminished seventh, major seventh, perfect octave, and the like.

It will be readily understood that the components of certain exemplary embodiments, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of some example embodiments of systems, methods, apparatus and computer program products related to interactive multimedia structures is not intended to limit the scope of certain embodiments, but is representative of selected example embodiments.

The features, structures, or characteristics of the example embodiments described throughout the specification may be combined in any suitable manner in one or more example embodiments. For example, the use of the phrases "certain embodiments," "some embodiments," or other similar language throughout this specification means that a particular feature, structure, or characteristic described in connection with the embodiments may be included in at least one embodiment. Thus, appearances of the phrases "in certain embodiments," "in some embodiments," "in other embodiments," or other similar language throughout this specification do not necessarily all refer to the same group of embodiments, and the described features and structures may be combined in any suitable manner in one or more example embodiments. Additionally, the phrase "set" refers to a set of one or more referenced set members. Thus, the phrases "a set," "one or more," and "at least one" or equivalent terms may be used interchangeably. In addition, unless explicitly stated otherwise, "or" is intended to mean "and/or".

In addition, if desired, different functions or operations discussed below may be performed in a different order and/or concurrently with each other. Furthermore, one or more of the described functions or operations may be optional or may be combined, if desired. As such, the following description should be considered as merely illustrative of the principles and teachings of certain exemplary embodiments, and not in limitation thereof.

For scenarios involving the human voice, such as a user humming an original song or adapting an existing song, the application provides a method and a system for correcting the pitch of the voice in the song to the pitches of a standard mode scale.

The application relates to a method for correcting the pitch of audio, in particular of vocal audio, which comprises: collecting the user's singing voice while humming a song at a given sampling rate to obtain an audio file in WAV format (step S100); performing audio processing on the audio file to obtain its fundamental frequency sequence and pitch sequence (step S200); comparing the collected vocal pitch sequence with different reference mode scales using a similarity algorithm, calculating the similarities, determining the closest reference mode scale as the standard mode scale, and generating a standard mode scale fundamental frequency reference table (step S300); comparing each standard pitch in the standard mode scale with the vocal fundamental frequency sequence using a sorting algorithm, and calculating the minimum pitch difference between each vocal fundamental frequency and the nearest standard pitch in the standard mode scale (step S400); and using the minimum pitch difference corresponding to each pitch to be corrected in the vocal fundamental frequency sequence as an input parameter to obtain a vocal fundamental frequency sequence lying on the standard mode scale (step S500).

The audio processing of the audio file may be performed using the pYIN algorithm. It may comprise deriving, using the pYIN algorithm, an audio fundamental frequency sequence comprising a plurality of time points (step S201). The fundamental frequency sequence may take the form of a sequence of records, each holding a time point and the fundamental frequency value at that time point. After the fundamental frequency sequence is obtained, the pitch sequence corresponding to the fundamental frequency of the audio at each time point can be obtained by referring to the international standard pitch-to-frequency table (Scientific Pitch Notation) (step S202).
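As an illustration of step S202, the following minimal sketch maps a fundamental frequency to the nearest equal-tempered pitch name. It assumes twelve-tone equal temperament with A4 = 440 Hz and treats non-positive frequencies as unvoiced frames; the function name `hz_to_note` and the sample values are hypothetical and are not taken from the application.

```python
import math

# Pitch names within one octave (sharps only, an assumption for brevity).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_note(freq_hz, a4_hz=440.0):
    """Return the nearest equal-tempered pitch name, e.g. 'A4'.

    Assumes A4 = 440 Hz; returns None for unvoiced frames (freq <= 0).
    """
    if freq_hz <= 0:
        return None
    # MIDI note number: 69 is A4, 12 semitones per octave.
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"


# Hypothetical fundamental frequency sequence: (time in seconds, f0 in Hz).
f0_sequence = [(0.00, 261.6), (0.01, 0.0), (0.02, 445.0)]
pitch_sequence = [(t, hz_to_note(f)) for t, f in f0_sequence]
```

Here a slightly sharp 445 Hz frame is still labelled A4, which is the behaviour the later correction steps rely on.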

After the pitch sequence of the audio is obtained, it may be compared, using a similarity algorithm such as cosine similarity, with one or more different reference mode scales preset by the system or selected by the user, to obtain the mode scale closest to the pitch sequence of the audio. Besides cosine similarity, the similarity may be calculated using measures such as Euclidean distance, Manhattan distance, Minkowski distance, Jaccard similarity, the Pearson correlation coefficient, and the like.

The reference mode scales may be any known mode scales, such as the natural major scales in twelve different keys, or one or more of medieval mode scales, modern mode scales based on pentatonic and blues scales, ethnic scales, etc.

Taking the natural major scale as an example, twelve different keys can be combined, including C major, G major, E major, B major, D major, A major, F major, B-flat major, E-flat major, A-flat major and D-flat major, as reference mode scales for comparison.
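The comparison against twelve major keys might be sketched as follows. The application does not state how the pitch sequence and the scales are turned into vectors, so this example assumes a 12-bin pitch-class histogram for the sung notes (which also folds all octaves into one) and a binary template for each scale. All function names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of the natural major scale


def pitch_class_histogram(midi_notes):
    """Count how often each of the 12 pitch classes occurs (octaves folded)."""
    hist = [0.0] * 12
    for m in midi_notes:
        hist[m % 12] += 1.0
    return hist


def scale_template(tonic_pc):
    """Binary 12-element vector marking the pitch classes of a major scale."""
    members = {(tonic_pc + s) % 12 for s in MAJOR_STEPS}
    return [1.0 if pc in members else 0.0 for pc in range(12)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_major_scale(midi_notes):
    """Return (tonic name, scores) for the most similar of the 12 major scales."""
    hist = pitch_class_histogram(midi_notes)
    scores = {
        NOTE_NAMES[t]: cosine_similarity(hist, scale_template(t)) for t in range(12)
    }
    return max(scores, key=scores.get), scores


# Hypothetical sung melody as MIDI note numbers: G4 A4 B4 C5 D5 E5 F#5 G5.
best_key, key_scores = closest_major_scale([67, 69, 71, 72, 74, 76, 78, 79])
```

Any of the other similarity measures named above could replace `cosine_similarity` without changing the rest of the sketch.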

After the standard mode scale of the audio is determined, a quicksort algorithm may be used to compare the fundamental frequency of each standard pitch in the standard mode scale with the fundamental frequency sequence of the audio, and to calculate the minimum pitch difference between each fundamental frequency of the audio and the closest standard pitch in the standard mode scale. For the comparison, pre-stored fundamental frequencies of the standard pitches of the standard mode scale may be used, or the standard pitches may be converted into frequencies according to the international standard pitch-to-frequency table (Scientific Pitch Notation) after the standard mode scale is determined. The number of standard pitches in the standard mode scale may be preset or set by the user. For example, for a determined standard mode scale, a number N of intervals may be selected according to the octave relation or another interval relation such as the fifth (step S301), and the number of tones in each interval may be selected (step S302). If one octave contains seven tones and N octaves are selected, a sequence of 7N standard pitches is generated; using this standard pitch sequence as a basis, the standard pitches are converted into frequencies according to the international standard pitch-to-frequency table (Scientific Pitch Notation) to obtain the standard mode scale fundamental frequency reference table (step S303).

The octave relation described above refers to the correspondence between pitches in different octaves; for example, the pitch C1 corresponds by octave to C2, C3, C4, C5, C6 and C7. When C major is determined to be the standard mode scale, the pitches of one or more of the ranges C1-B1, C2-B2, C3-B3, C4-B4, C5-B5 and C6-B6 can be selected as the pitches of the standard pitch sequence; that is, the number of octaves is preset or set by the user, and the number of scale pitches in each octave is preset or set by the user.
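Steps S301 to S303 can be illustrated with a short sketch that builds a 7N-entry frequency table for a major scale. It assumes equal temperament with A4 = 440 Hz in place of a lookup table, and the names `midi_to_hz` and `build_reference_table` as well as the chosen octave range are hypothetical.

```python
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of the natural major scale


def midi_to_hz(midi, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (assumes A4 = 440 Hz)."""
    return a4_hz * 2.0 ** ((midi - 69) / 12.0)


def build_reference_table(tonic_pc, first_octave, n_octaves, steps=MAJOR_STEPS):
    """Return the scale frequencies over n_octaves, in ascending order.

    tonic_pc is the pitch class of the tonic (0 = C), and first_octave is the
    scientific-pitch octave number of the lowest tonic.
    """
    table = []
    for octave in range(first_octave, first_octave + n_octaves):
        tonic_midi = 12 * (octave + 1) + tonic_pc  # e.g. C4 -> 60
        for step in steps:
            table.append(midi_to_hz(tonic_midi + step))
    return table


# C major over five octaves starting at C2: 7 * 5 = 35 standard pitches.
c_major_table = build_reference_table(tonic_pc=0, first_octave=2, n_octaves=5)
```

Passing a different `steps` list, for instance a five-note scale, changes the number of pitches per octave without touching the rest of the table construction.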

Finding the minimum pitch difference using the quicksort algorithm may include: shifting the fundamental frequency at each time point of the audio stepwise in a first direction, for example upwards by 2 Hz each time (+2 Hz); stopping when the shifted value reaches the frequency of the nearest standard pitch in the standard mode scale fundamental frequency reference table, or when its distance from that frequency is smaller than a preset minimum tolerance; and recording the accumulated offset (step 401A). The accumulated offset and the offset direction, as parameters of the minimum pitch difference, together with the time point, form the pitch difference sequence to be corrected (step 402A); its elements are the time point, the accumulated offset and the offset direction. This one-directional approach is most suitable when the audio as a whole is consistently too low or too high.

However, when it cannot be judged whether the audio as a whole is too low or too high, the minimum pitch difference can be found as follows: shifting the fundamental frequency at each time point of the audio stepwise in the first direction, for example upwards by 2 Hz each time (+2 Hz), stopping when the shifted value reaches the frequency of the nearest standard pitch in the standard mode scale fundamental frequency reference table, or when its distance from that frequency is smaller than a preset minimum tolerance, and recording a first accumulated offset (step 401B); repeating the shifting process in a second direction opposite to the first direction, stopping under the same condition, and recording a second accumulated offset (step 402B); and taking the smaller of the two accumulated offsets at the time point, recording it together with its offset direction as the minimum pitch difference at that time point, and calculating the minimum pitch difference at each time point in the fundamental frequency sequence of the audio to form the pitch difference sequence to be corrected (step 403B). The elements of the pitch difference sequence to be corrected include the time point, the pitch difference measured in frequency, and the offset direction.

The step size may be preset or configured to be settable by the user as desired. The minimum tolerance specifies how close to a standard pitch the shifted frequency must come before shifting stops. The minimum tolerance may likewise be preset or user adjustable.
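The one-directional and two-directional step shifts described above might look like the sketch below. It is an illustration under stated assumptions, not the application's implementation: the stop test uses "less than or equal to" the tolerance so that a 2 Hz step with a 1 Hz tolerance cannot skip over a reference pitch, a `max_steps` guard and a stop at 0 Hz are added for safety, and the sign convention (positive means raise) follows the later example flow. The names `step_offset` and `min_pitch_difference` are hypothetical.

```python
def step_offset(f0, table, step, tolerance, max_steps=10000):
    """Shift f0 by `step` Hz per iteration until it lies within `tolerance`
    of any reference frequency; return the accumulated offset in Hz.

    Returns None if no reference is reached (assumed guard, not in the source).
    """
    shifted = f0
    for n in range(max_steps + 1):
        if shifted <= 0:
            return None  # unvoiced frame or shifted below 0 Hz
        if any(abs(shifted - ref) <= tolerance for ref in table):
            return n * abs(step)
        shifted += step
    return None


def min_pitch_difference(f0, table, step=2.0, tolerance=1.0):
    """Two-directional search: return the signed minimum pitch difference.

    Positive means the frequency should be raised, negative lowered.
    """
    up = step_offset(f0, table, +step, tolerance)
    down = step_offset(f0, table, -step, tolerance)
    candidates = [(d, sign) for d, sign in ((up, +1), (down, -1)) if d is not None]
    if not candidates:
        return 0.0  # nothing to correct
    distance, sign = min(candidates)
    return sign * distance


# Hypothetical reference table (C4, D4, E4) and sung frequencies.
reference = [261.63, 293.66, 329.63]
diff_sequence = [min_pitch_difference(f, reference) for f in (270.0, 290.0, 0.0)]
```

Calling `step_offset` alone with a single direction corresponds to the one-directional variant of steps 401A and 402A.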

In addition to the quicksort algorithm, sorting methods such as insertion sort, bubble sort, merge sort and selection sort may be used to determine the minimum pitch difference at each time point in the fundamental frequency sequence of the audio and thereby form the pitch difference sequence to be corrected.
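The text does not spell out how a sorting algorithm yields the nearest reference frequency. One plausible reading, shown here purely as an assumption, is to sort the reference table once and then locate each fundamental frequency by binary search. The sketch uses Python's built-in `sorted` (Timsort, not quicksort) and the standard `bisect` module; the function name is hypothetical.

```python
import bisect


def nearest_reference_difference(f0, sorted_table):
    """Return the signed difference (reference - f0) to the closest entry.

    Assumes sorted_table is non-empty and sorted in ascending order.
    """
    i = bisect.bisect_left(sorted_table, f0)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sorted_table[max(i - 1, 0): i + 1]
    nearest = min(candidates, key=lambda ref: abs(ref - f0))
    return nearest - f0


# Hypothetical unsorted reference frequencies (E4, C4, D4), sorted once up front.
sorted_reference = sorted([329.63, 261.63, 293.66])
exact_differences = [
    nearest_reference_difference(f, sorted_reference) for f in (270.0, 290.0, 400.0)
]
```

Unlike the step-shift search, this returns the exact difference rather than a multiple of the step size, at a cost of one sort plus a logarithmic lookup per time point.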

After the pitch difference sequence to be corrected is obtained, the collected singing audio (in WAV, MP3, WMA or another common audio format), its fundamental frequencies, and the pitch difference sequence to be corrected are used as input parameters, and the pitch-shifting calculation for the pitches to be corrected is completed through resampling and the PSOLA algorithm to obtain a corrected audio file and, from it, a corrected vocal fundamental frequency sequence.

A sequence of multiples by which the pitch is to be raised can be determined from the pitch difference sequence to be corrected; for example, each multiple is measured as the scaling multiple of the peak value, i.e., obtained by a division operation (step S501). This comprises resampling the singing audio at 1/S times the sampling rate used when the user's singing voice was collected, to obtain resampled audio, and then stretching the resampled audio to S times its length using the PSOLA algorithm and the multiple sequence (step S502).
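To make the resampling half of step S502 concrete, the sketch below reads a signal at 1/S of its original rate using plain linear interpolation, which shortens it by roughly a factor of S. This is a simplification chosen for readability (production resamplers apply anti-aliasing filters), the function name is hypothetical, and the PSOLA time-stretch that restores the original duration is not shown.

```python
def resample_linear(samples, s):
    """Resample at 1/s of the original rate by linear interpolation.

    The output has about len(samples) / s samples; played back at the original
    rate, its pitch is scaled by s. Assumes s > 0 and a non-empty input.
    """
    n_out = max(1, round(len(samples) / s))
    last = len(samples) - 1
    out = []
    for k in range(n_out):
        pos = k * s                     # fractional read position in the input
        i = min(int(pos), last)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


# Hypothetical four-sample ramp, resampled with S = 2 and S = 0.5.
halved = resample_linear([0.0, 1.0, 2.0, 3.0], 2.0)
doubled = resample_linear([0.0, 1.0, 2.0, 3.0], 0.5)
```

Because resampling alone changes both pitch and duration, the subsequent PSOLA stretch by S is what brings the duration back while keeping the shifted pitch.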

In some embodiments, the multiple sequence by which the pitch to be corrected needs to be raised is a decimal array, such as [1.2, 2.1, -1.5, ..., 0.8]. In other embodiments, it may be a single fixed decimal instead of an array, for example to shift directly from the key of C to the key of D. In the pitch-shift coefficient calculation step, each pitch difference in the pitch difference sequence to be corrected is used as the pitch-shift coefficient of the formants at the corresponding moment in the standard-pitch vocal fundamental frequency sequence, and the vocal audio is re-synthesized based on the PSOLA algorithm (step S503).
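One way to read the "division operation" is that each multiple is the ratio of the corrected frequency to the sung frequency. The sketch below makes that assumption explicit; it also assumes the pitch differences are signed values in Hz and that unvoiced frames (f0 of 0) are left unchanged. Both function names are hypothetical.

```python
import math


def pitch_ratios(f0_values, diff_values):
    """Per-frame pitch-shift multiples: (f0 + diff) / f0.

    Assumes diff is a signed offset in Hz; frames with f0 <= 0 get ratio 1.0.
    """
    ratios = []
    for f0, diff in zip(f0_values, diff_values):
        ratios.append((f0 + diff) / f0 if f0 > 0 else 1.0)
    return ratios


def ratio_to_semitones(ratio):
    """Express a frequency ratio as a signed number of equal-tempered semitones."""
    return 12 * math.log2(ratio)


# Hypothetical frames: raise 220 Hz by 22 Hz, skip silence, lower 440 Hz by 44 Hz.
ratios = pitch_ratios([220.0, 0.0, 440.0], [22.0, 0.0, -44.0])
```

A fixed multiple for transposing a whole take, such as moving up one whole tone, would correspond to a constant ratio of `2 ** (2 / 12)` applied to every frame.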

It will be apparent that the described embodiments are merely some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.

The technical scheme of the present application is further described below in conjunction with application examples of the technical scheme of the present application.

The technical scheme of the application may be delivered as a standalone application program, App, applet or the like, or integrated into an application program, App or applet as a program module, so as to realize the automatic accompaniment function for humming.

The App or program module can offer the user three automatic pitch correction levels, "high", "medium" and "low". "High" correction corresponds to converting the seven-note natural major scale into a five-note scale, so that the fundamental frequency changes more at each time point; "medium" correction corresponds to determining the pitch difference sequence to be corrected through the unidirectional offset described above; and "low" correction corresponds to determining it through the bidirectional offset described above.

The program may also let the user choose a style of mode; after receiving the user's choice, the program determines the specific mode scale within the range of that style. For example, the user may be offered options such as "ancient Indian style", "ancient Chinese style" and "standard style", where each option corresponds to a set of mode scales; selecting a style selects that set, thereby determining the range of mode scales used for the similarity determination. For example, the ancient Indian style corresponds to the Indian scale sequences in twelve modes, and when the user selects the ancient Indian style, the twelve-mode Indian scale sequences are formed as the comparison standard.

The application scenario of the technical scheme is not particularly limited. For example, it may be implemented in a system that operates independently, or may run on a network system, for example one with a client-server architecture. In such a network system, an application may be deployed at the client to present a graphical user interface through the client's display device, capture user operations such as input and triggers through an input device such as a touch screen, collect the user's singing voice using a sound collection device built into or connected to the client, and play the corrected humming audio using a playback device such as the client's loudspeaker. The humming audio collected by the client may be processed at the server, and the corrected humming audio may be transmitted back to the client.

The client may record the user's humming in full at a certain sampling rate over a period of time and store the recording in an audio file, such as a WAV file. The server may acquire the humming by reading the audio file, e.g., a WAV file, at a certain sampling rate, or by reading the client's cache directly to obtain the audio captured by the client in real time.

In Example 1 of the method and system for correcting the pitch of audio of the present application, the correction option may be selected before audio collection starts. After the user triggers the "low" automatic pitch correction function, the following processing flow is entered:

Scheme 1A: the server side obtains the storage address of the humming audio file recorded by the user of the user side, and reads the audio wav format file of the user according to the negotiated 44100Hz sampling rate;

Scheme 2A: acquiring the fundamental frequency sung by the user and the standard pitch corresponding to that fundamental frequency using the pYIN algorithm; forming the fundamental frequency and pitch sequence corresponding to each time point; and storing the data in a database at the server.
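As a minimal sketch of the second half of this step, the mapping from each fundamental frequency to its nearest standard pitch can be illustrated as follows. It is assumed here that the per-frame fundamental frequencies have already been produced by a pYIN implementation, that unvoiced frames are reported as `None`, and that pitches follow twelve-tone equal temperament with A4 = 440 Hz; the function names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi(freq_hz, a4_hz=440.0):
    """Fractional MIDI note number of a frequency (A4 = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / a4_hz)


def midi_to_name(midi_number):
    """Scientific pitch name of an integer MIDI note, e.g. 60 -> 'C4'."""
    octave = midi_number // 12 - 1
    return f"{NOTE_NAMES[midi_number % 12]}{octave}"


def pitch_sequence(f0_sequence):
    """Map (time_s, f0_hz or None) frames to (time_s, f0_hz, nearest pitch name)."""
    out = []
    for time_s, f0_hz in f0_sequence:
        if f0_hz is None or f0_hz <= 0:
            continue  # skip unvoiced frames
        out.append((time_s, f0_hz, midi_to_name(round(hz_to_midi(f0_hz)))))
    return out
```

For instance, a frame at 440.0 Hz maps to "A4" and a frame at 261.63 Hz maps to "C4".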

Scheme 3A: forming the natural major scale sequences in the twelve keys (modes) as the comparison standard; comparing the pitch sequence obtained in Scheme 2A with the natural major scale sequence in each of the twelve keys using a cosine similarity algorithm to obtain the natural major scale with the maximum similarity; and determining and/or recording that scale as the scale of the track.
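One possible way to realize this comparison is sketched below. The vector representation is an assumption of this sketch and is not stated in this passage: the sung pitches are folded into a 12-bin pitch-class histogram, and each candidate key is represented by a binary 12-element template of the natural major scale. All function names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of the natural major scale


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pitch_class_histogram(midi_notes):
    """Count how often each of the 12 pitch classes occurs (octaves folded)."""
    hist = [0.0] * 12
    for m in midi_notes:
        hist[m % 12] += 1.0
    return hist


def scale_template(tonic_pc, steps=MAJOR_STEPS):
    """Binary 12-vector marking the pitch classes of the scale on the given tonic."""
    template = [0.0] * 12
    for s in steps:
        template[(tonic_pc + s) % 12] = 1.0
    return template


def closest_scale(midi_notes, steps=MAJOR_STEPS):
    """Return (tonic name, similarity) of the best-matching key among all twelve."""
    hist = pitch_class_histogram(midi_notes)
    scores = [cosine_similarity(hist, scale_template(t, steps)) for t in range(12)]
    best = max(range(12), key=lambda t: scores[t])
    return NOTE_NAMES[best], scores[best]
```

Passing a different `steps` tuple allows the same routine to be reused for other reference scales.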

Scheme 4A: if the scale of the track is determined to be C major, all natural major scale notes within each octave from "C2" through "C7" are selected to form the natural major scale sequence: "C1", "D1", "E1", "A7", "B7", "C7"; the natural major scale sequence is then converted to a frequency sequence with reference to the international standard pitch and frequency lookup table.
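A sketch of generating such a reference frequency sequence is given below. Instead of a stored lookup table, the sketch computes frequencies from twelve-tone equal temperament with A4 = 440 Hz, which is an assumption made here; the default range C2 to C7 follows the range named in the text, and the function names are hypothetical.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of the natural major scale


def midi_to_hz(midi_number, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note (A4 = 69)."""
    return a4_hz * 2.0 ** ((midi_number - 69) / 12.0)


def scale_frequency_table(tonic_pc, steps=MAJOR_STEPS, low_octave=2, high_octave=7):
    """Ascending (midi, hz) pairs of the scale, from the tonic in low_octave
    up to and including the tonic in high_octave."""
    low = 12 * (low_octave + 1) + tonic_pc
    high = 12 * (high_octave + 1) + tonic_pc
    allowed = {(tonic_pc + s) % 12 for s in steps}
    return [(m, midi_to_hz(m)) for m in range(low, high + 1) if m % 12 in allowed]
```

With `tonic_pc=0` (C) the table holds 36 entries, seven per octave over five octaves plus the closing C7.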

Scheme 5A: calculating the minimum pitch difference between the fundamental frequency sung by the user and the fundamental frequencies in the reference sequence using a quicksort method, to form the pitch difference sequence to be corrected, wherein a frequency that needs to be raised is marked as a positive number and a frequency that needs to be lowered is marked as a negative number.
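The following sketch illustrates one way to obtain the signed differences. It does not implement quicksort itself: it sorts the reference frequencies with Python's built-in `sorted` and then locates the nearest entry by binary search with `bisect`, which is a substitution made for brevity. Measuring the difference in Hz, and the function name, are likewise assumptions of this sketch.

```python
import bisect


def min_pitch_differences(f0_sequence, scale_hz):
    """For each (time_s, f0_hz) frame return (time_s, signed difference in Hz).

    A positive value means the sung frequency must be raised to reach the
    nearest reference frequency; a negative value means it must be lowered.
    """
    table = sorted(scale_hz)
    if not table:
        raise ValueError("reference frequency table is empty")
    diffs = []
    for time_s, f0_hz in f0_sequence:
        i = bisect.bisect_left(table, f0_hz)
        # Only the neighbours just below and just above f0 can be nearest.
        candidates = table[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda f: abs(f - f0_hz))
        diffs.append((time_s, nearest - f0_hz))
    return diffs
```

Because the table is sorted once, each frame costs only a logarithmic-time search rather than a scan over every reference frequency.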

Scheme 6A: using the pitch difference sequence to be corrected as an input parameter, performing resampling and the PSOLA algorithm in sequence to complete the correction calculation on the pitch, obtaining a corrected pitch sequence and the corresponding corrected voice audio file.
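Only the resampling half of this step is sketched below; the PSOLA stage, which restores the original duration, is deliberately omitted. The sketch assumes the pitch difference is expressed in Hz and uses plain linear interpolation; both function names are hypothetical. Reading the signal `ratio` times faster multiplies every frequency by `ratio` and divides the duration by `ratio` when the result is played at the original sampling rate.

```python
def pitch_ratio(f0_hz, diff_hz):
    """Frequency multiple needed to move f0 by diff_hz (positive diff raises pitch)."""
    return (f0_hz + diff_hz) / f0_hz


def resample_linear(samples, ratio):
    """Resample by linear interpolation, reading the input `ratio` times faster.

    The output has roughly len(samples) / ratio samples. A time-stretching
    stage such as PSOLA (not shown here) would then restore the duration.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if not samples:
        return []
    n_out = int((len(samples) - 1) / ratio) + 1
    out = []
    for k in range(n_out):
        pos = k * ratio          # fractional read position in the input
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out
```

A single fixed ratio is used here for simplicity; a per-frame ratio sequence would require applying the same idea segment by segment.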

Scheme 7A: returning the corrected voice audio file to the client.

In a second example of the method and system for correcting the pitch of audio, the application can convert a melody hummed by the user into the ancient Indian style. After the user triggers the ancient Indian style function, the corresponding processing flow is as follows:

Scheme 1B: the server obtains the storage address of the humming audio file recorded by the user at the client, and reads the user's WAV audio file at a sampling rate of 44100 Hz.

Scheme 2B: acquiring the fundamental frequency sung by the user and the standard pitch corresponding to that fundamental frequency using the pYIN algorithm; forming the fundamental frequency and pitch sequence corresponding to each time point; and storing the data in a database at the server.

Scheme 3B: using the Indian scale in each of the twelve keys (modes) as the comparison standard, where, for example, the Indian scale in the key of C consists of C, D, E, F#, G, A, B; comparing the pitch sequence obtained in Scheme 2B with the Indian scale sequence in each key using a cosine similarity algorithm to obtain the Indian scale with the maximum similarity; and determining and/or recording that scale as the scale of the audio.
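Reading the example in Scheme 3B as the notes C, D, E, F#, G, A, B, the twelve comparison scales can be derived by transposition, as sketched below. Spelling every note with sharps only (ignoring enharmonic flats) is a simplification assumed here, and the function names are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# The C-based example scale given in Scheme 3B.
INDIAN_SCALE_IN_C = ("C", "D", "E", "F#", "G", "A", "B")


def transpose_scale(note_names, semitones):
    """Shift every note of a scale up by the given number of semitones."""
    return [NOTE_NAMES[(NOTE_NAMES.index(n) + semitones) % 12] for n in note_names]


def scales_in_twelve_keys(scale_in_c):
    """Map each of the twelve tonics to the transposed scale starting on it."""
    return {NOTE_NAMES[k]: transpose_scale(scale_in_c, k) for k in range(12)}
```

Each resulting note list can then be compared against the sung pitch sequence in the same way as in the first example.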

Scheme 4B: according to the determined scale of the audio, selecting the scale sequences over the octaves of all pitches corresponding to that scale, and converting the Indian scale sequences into frequency sequences with reference to the international standard pitch and frequency lookup table. For example, if the scale of the audio is determined to be the Indian scale in the key of C, all Indian scale notes in each octave from "C2" to "C7" are selected and converted into frequency sequences with reference to the international standard pitch and frequency lookup table.

Scheme 5B: calculating the pitch difference between the fundamental frequency of the audio and the fundamental frequencies in the reference sequence using a quicksort method, to form the pitch difference sequence to be corrected, wherein a frequency that needs to be raised is marked as a positive number and a frequency that needs to be lowered is marked as a negative number.

Scheme 6B: using the pitch difference sequence to be corrected as an input parameter, performing resampling and the PSOLA algorithm in sequence to complete the correction calculation on the pitch sequence, obtaining a corrected pitch sequence and then the corrected voice audio file.

In some example embodiments, the functions of any of the methods, processes, signaling diagrams, algorithms, or flowcharts described herein may be implemented by software and/or computer program code or code portions stored in a memory or other computer readable or tangible medium and executed by a processor.

In some example embodiments, an apparatus may be included or associated with at least one software application, module, unit, or entity configured as arithmetic operations, or as a program or portion thereof (including added or updated software routines), executed by at least one operating processor. Programs, also referred to as program products or computer programs, including software routines, applets and macros, can be stored in any apparatus-readable data storage medium and can include program instructions for performing particular tasks.

A sequence is a unit of data structure that may include strings, lists, tuples, etc.

A computer program product may include one or more computer-executable components configured to perform some example embodiments when the program is run. The one or more computer-executable components may be at least one software code or code portion. The modification and configuration for implementing the functions of the example embodiments may be performed as routines that may be implemented as added or updated software routines. In one example, software routines may be downloaded into the apparatus.

By way of example, software or computer program code, or a portion of code, may be in source code form, object code form, or in some intermediate form, and may be stored on some carrier, distribution medium, or computer readable medium, which may be any entity or device capable of carrying the program. Such carriers may include, for example, recording media, computer memory, read-only memory, electro-optical and/or electronic carrier signals, telecommunications signals, and/or software distribution packages. Depending on the processing power required, the computer program may be executed in a single electronic digital computer or may be distributed among multiple computers. The computer readable medium or computer readable storage medium may be a non-transitory medium.

In other example embodiments, the functions may be performed by a circuit, such as through the use of an Application Specific Integrated Circuit (ASIC), a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or any other hardware and software combination. In yet another example embodiment, the functionality may be implemented as a signal, such as a non-tangible means that may be carried by an electromagnetic signal downloaded from the Internet or other network.

According to example embodiments, an apparatus such as a node, device or responsive element may be configured as a circuit, a computer or microprocessor (such as a single chip computer element) or a chipset, which may include at least a memory for providing storage capacity for arithmetic operations and/or an operation processor for performing arithmetic operations.

The example embodiments described herein are equally applicable to both singular and plural implementations, whether the language used to describe certain embodiments is in the singular or the plural. For example, embodiments describing the operation of a single computing device are equally applicable to embodiments that include multiple instances of a computing device, and vice versa.

Those of ordinary skill in the art will readily appreciate that the example embodiments described above may be implemented with operations in a different order and/or with hardware elements in configurations different from those disclosed. Thus, while some embodiments have been described based on these example embodiments, certain modifications, variations, and alternative constructions will be apparent to those of ordinary skill in the art while remaining within the spirit and scope of the example embodiments.

Claims

1. A method of modifying the pitch of audio, characterized by comprising the steps of:

acquiring a fundamental frequency sequence of the audio, wherein the fundamental frequency sequence comprises a plurality of sampling points, and each sampling point comprises a time point and the fundamental frequency value at that time point;

acquiring an original pitch sequence of the audio based on the fundamental frequency sequence;

calculating the closest tone scale to the original pitch sequence in different reference tone scales by using a similarity algorithm and taking the closest tone scale as a standard tone scale;

generating a standard-scale fundamental frequency reference table based on the standard-scale;

calculating, using a sorting method, a minimum pitch difference between the fundamental frequency at each time point of the fundamental frequency sequence and the fundamental frequencies in the standard-scale fundamental frequency reference table, wherein the calculating of the minimum pitch difference comprises: performing a step-size offset on the fundamental frequency of each sampling point in a first direction; stopping the offset and recording a first accumulated offset when the shifted frequency reaches the frequency corresponding to the nearest standard pitch in the standard-scale fundamental frequency reference table, or when the difference between the shifted frequency and the frequency corresponding to the nearest standard pitch in the standard-scale fundamental frequency reference table is smaller than a predefined minimum tolerance; and determining the first accumulated offset as the minimum pitch difference for the sampling point; or performing a step-size offset on the fundamental frequency of each sampling point in a first direction; stopping the offset and recording a first accumulated offset when the shifted frequency reaches the frequency corresponding to the nearest standard pitch in the standard-scale fundamental frequency reference table, or when the difference between the shifted frequency and the frequency corresponding to the nearest standard pitch in the standard-scale fundamental frequency reference table is smaller than a predefined minimum tolerance; performing a step-size offset on the fundamental frequency of each sampling point in a second direction opposite to the first direction; stopping the offset and recording a second accumulated offset when the shifted frequency reaches the frequency corresponding to the nearest standard pitch in the standard-scale fundamental frequency reference table, or when the difference between the shifted frequency and the frequency corresponding to the nearest standard pitch in the standard-scale fundamental frequency reference table is smaller than a predefined minimum tolerance; comparing the first accumulated offset with the second accumulated offset and determining the smaller accumulated offset as the minimum pitch difference for the sampling point;

forming a pitch difference sequence to be corrected based on the minimum pitch difference of the fundamental frequency at each time point and the offset direction of the minimum pitch difference;

taking the audio and the pitch difference sequence to be corrected as inputs, and correcting the audio through resampling and a PSOLA algorithm in sequence to obtain corrected audio.

2. A method of modifying the pitch of audio as claimed in claim 1, wherein: the calculating, using a similarity algorithm, a scale closest to the original pitch sequence among different reference scales as a standard scale includes: using different mode scales under a preset mode combination as the reference-mode musical scales; or using different mode scales determined by a user as the reference-mode musical scales.

3. A method of modifying the pitch of audio as claimed in claim 1, wherein: the calculating, using a similarity algorithm, a scale closest to the original pitch sequence among different reference scales as a standard scale includes: screening the reference-mode musical scales by taking the twelve pitches in each octave as references.

4. A method of modifying the pitch of audio as claimed in claim 3, wherein: the number of pitches screened in the musical scale arrangement within one octave is increased or decreased according to the constitution of the reference-mode musical scale.

5. A method of modifying the pitch of audio as claimed in claim 1, wherein: before the reference scale with the maximum similarity is selected as the standard-scale using the similarity algorithm, the original pitch sequence is converted, according to octave relationships, into the same octave range as the reference scale.

6. A method of modifying the pitch of audio as claimed in claim 1, wherein: the different reference-mode musical scales include one or more of natural major scales in twelve different modes, mid-palace-mode musical scales, modern mode scales based on pentatonic and blues scales, ethnic musical scales, and the like.

7. A method of modifying the pitch of audio as claimed in claim 1, wherein: the generation of the standard-scale fundamental frequency reference table of the standard-scale comprises the steps of generating an original pitch sequence according to the number of octaves of the standard-scale and the number of pitches of the scale in each octave, and converting the original pitch sequence into a frequency sequence of standard pitches according to an international standard pitch and frequency reference table, so that the standard-scale fundamental frequency reference table is obtained.

8. The method of modifying the pitch of audio of claim 7, wherein: the number of octaves is preset or set by a user; and/or the number of pitches of the diatonic scale in each octave is preset or set by the user.

9. A method of modifying the pitch of audio as claimed in claim 1, wherein: the offset of the step offset is preset or set by a user.

10. A method of modifying the pitch of audio as claimed in claim 1, wherein: a multiple sequence by which the pitch of the audio needs to be raised is determined according to the obtained pitch difference sequence to be corrected; the audio is resampled at 1/S times the sampling rate at which the user's singing voice was acquired, to obtain resampled audio; and the resampled audio is stretched to S times its length by the PSOLA algorithm using the multiple sequence by which the pitch to be corrected is to be raised.

11. The method of modifying the pitch of audio of claim 10, wherein: the multiple sequence by which the pitch to be corrected is to be raised is a calculated array of decimals or a single fixed decimal.

12. The method of modifying the pitch of audio of claim 11, wherein: the method further comprises using each pitch difference as the pitch variation coefficient of the formant at the corresponding moment in the standard pitch sequence, and correcting the audio based on the PSOLA algorithm to obtain the corrected audio.

13. A method of modifying the pitch of audio as claimed in claim 1, wherein: the range of the reference scales is determined before said calculating using a similarity algorithm; and/or the calculation method for the minimum pitch difference is determined before said calculating using a sorting method.

14. A method of modifying the pitch of audio as claimed in claim 1, wherein: the fundamental frequency sequence is determined using the pYIN algorithm; and/or the similarity algorithm is a cosine similarity algorithm; and/or the sorting method is a quicksort method.

15. Apparatus for modifying the pitch of audio comprising at least one processor and at least one memory including computer program code, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform the method of modifying the pitch of audio of any one of claims 1 to 14.
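As a non-limiting illustration of the two-direction step-size offset search recited in claim 1, a minimal sketch is given below. The step size, the tolerance, the use of Hz as the unit, and the function name `step_offset_search` are all assumptions of this sketch; the claims leave these values preset or user-defined.

```python
def step_offset_search(f0_hz, table_hz, step_hz=0.5, tolerance_hz=0.25):
    """Signed minimum offset (Hz) moving f0 onto a reference frequency.

    Walks upward and downward from f0 in fixed steps, stops each walk once
    the shifted frequency is within tolerance of a reference frequency, and
    keeps the smaller accumulated offset. Positive means raise, negative
    means lower. The tolerance should be at least half the step size,
    otherwise a walk could step over a reference frequency.
    """
    lowest = min(table_hz) - tolerance_hz
    highest = max(table_hz) + tolerance_hz

    def walk(direction):
        shifted, total = f0_hz, 0.0
        # Stop walking once the table range has been left in this direction.
        while (shifted <= highest) if direction > 0 else (shifted >= lowest):
            if any(abs(shifted - f) <= tolerance_hz for f in table_hz):
                return total
            shifted += direction * step_hz
            total += step_hz
        return None  # no reference frequency lies in this direction

    up, down = walk(+1), walk(-1)
    if up is None and down is None:
        raise ValueError("no reference frequency reachable")
    if down is None or (up is not None and up <= down):
        return up
    return -down
```

Because the walk advances in fixed steps, the result is accurate only to within the chosen tolerance, which is the trade-off this search makes against an exact nearest-neighbour lookup.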