CN113192533A - Audio processing method and device, electronic equipment and storage medium

Info

Publication number
CN113192533A
CN113192533A (application CN202110471498.XA)
Authority
CN
China
Prior art keywords
pitch
audio
user
data
voice data
Prior art date
Legal status
Pending
Application number
CN202110471498.XA
Other languages
Chinese (zh)
Inventor
陈晓
李轩
范欣悦
崔凡
邢文浩
郑羲光
张晨
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110471498.XA
Publication of CN113192533A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for particular use, for comparison or discrimination
    • G10L25/03: Speech or voice analysis techniques characterised by the type of extracted parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The present disclosure relates to an audio processing method, an audio processing apparatus, an electronic device, and a storage medium. The audio processing method includes: extracting a pitch feature of recorded voice data of a user; matching the extracted pitch feature with pitch data of background audio corresponding to the voice data; and adjusting the pitch of the background audio according to the matching result, so that the adjusted pitch of the background audio is adapted to the vocal range of the user.

Description

Audio processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of signal processing, and in particular, to an audio processing method and apparatus, an electronic device, and a storage medium.
Background
Most people are troubled by their own vocal range when singing: if the key of a favorite song is too high, or the song was written for a singer of the opposite sex whose range does not match, it is easy to fail to reach the high notes or to sing off-key. Users therefore typically modify the key of the accompaniment to fit their own range using a "raise/lower key" function. For many users, however, it is not intuitive to judge whether the key should be raised or lowered, or by how much; this can be determined only by repeated trial and error. Because such a manual adjustment scheme requires several attempts and does not always yield the optimal adjustment, the user may still end up singing off-key.
Disclosure of Invention
The present disclosure provides an audio processing method and apparatus, an electronic device, and a storage medium, to solve at least the problems in the related art that manual adjustment of the accompaniment key is time-consuming and possibly inaccurate.
According to a first aspect of embodiments of the present disclosure, there is provided an audio processing method, including: extracting a pitch feature of recorded voice data of a user; matching the extracted pitch feature with pitch data of background audio corresponding to the voice data; and adjusting the pitch of the background audio according to the matching result, so that the adjusted pitch of the background audio is adapted to the vocal range of the user.
Optionally, the extracting a pitch feature of the recorded voice data of the user includes: extracting a pitch feature of the voice data of the user in real time while recording the audio data of the user; or, extracting a pitch feature of the voice data of the user after recording the audio data of the user.
Optionally, the extracting, in real time, a pitch feature of the voice data of the user includes: performing voice activity detection on the recorded audio data to obtain voice data of a user; performing pitch detection on the voice data to extract pitch features of the voice data in real time.
Optionally, the matching the extracted pitch feature with the pitch data of the background audio corresponding to the voice data includes: acquiring the pitch data of the background audio; and matching the extracted pitch feature with the acquired pitch data of the background audio according to a predetermined matching manner, to obtain a pitch difference value between the matched pitch and the pitch of the background audio.
Optionally, the adjusting the pitch of the background audio according to the matching result includes: adjusting the pitch of the background audio according to the pitch difference value if the pitch difference value satisfies a predetermined condition.
Optionally, the audio processing method further includes: outputting a prompt about the failure of key finding in a case where the pitch difference value does not satisfy the predetermined condition.
Optionally, the audio processing method further includes: recording audio content that the user sings along with the pitch-adjusted background audio.
Optionally, the audio data corresponds to at least a portion of a song that the user desires to sing.
According to a second aspect of the embodiments of the present disclosure, there is provided an audio processing apparatus including: a feature extraction unit configured to extract a pitch feature of recorded voice data of a user; a matching unit configured to match the extracted pitch feature with pitch data of background audio corresponding to the voice data; and a pitch adjustment unit configured to adjust the pitch of the background audio according to the matching result, so that the adjusted pitch of the background audio is adapted to the vocal range of the user.
Optionally, the extracting a pitch feature of the recorded voice data of the user includes: extracting a pitch feature of the voice data of the user in real time while recording the audio data of the user; or, extracting a pitch feature of the voice data of the user after recording the audio data of the user.
Optionally, the extracting, in real time, a pitch feature of the voice data of the user includes: performing voice activity detection on the recorded audio data to obtain voice data of a user; performing pitch detection on the voice data to extract pitch features of the voice data in real time.
Optionally, the matching the extracted pitch feature with the pitch data of the background audio corresponding to the voice data includes: acquiring the pitch data of the background audio; and matching the extracted pitch feature with the acquired pitch data of the background audio according to a predetermined matching manner, to obtain a pitch difference value between the matched pitch and the pitch of the background audio.
Optionally, the adjusting the pitch of the background audio according to the matching result includes: adjusting the pitch of the background audio according to the pitch difference value if the pitch difference value satisfies a predetermined condition.
Optionally, the audio processing apparatus further includes: a prompt unit configured to output a prompt about the failure of key finding in a case where the pitch difference value does not satisfy the predetermined condition.
Optionally, the audio processing apparatus further includes: an audio recording unit configured to record audio content that the user sings along with the pitch-adjusted background audio.
Optionally, the audio data corresponds to at least a portion of a song that the user desires to sing.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the audio processing method as described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions, characterized in that the instructions, when executed by at least one processor, cause the at least one processor to perform the audio processing method as described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by a processor, implement the audio processing method as described above.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects: by extracting the pitch feature of the recorded voice data of the user, matching it with the pitch data of the background audio, and adjusting the pitch of the background audio according to the matching result, pitch adjustment becomes more accurate and efficient, making it convenient to obtain background audio that truly fits the user's own vocal range.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is an exemplary system architecture to which exemplary embodiments of the present disclosure may be applied;
fig. 2 is a flowchart of an audio processing method of an exemplary embodiment of the present disclosure;
fig. 3 is a schematic diagram illustrating an example of an audio processing method of an exemplary embodiment of the present disclosure;
fig. 4 is a screen display example showing when the audio processing method of the exemplary embodiment of the present disclosure is performed;
fig. 5 is a block diagram showing an audio processing apparatus of an exemplary embodiment of the present disclosure;
fig. 6 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Herein, the expression "at least one of the items" covers three parallel cases: "any one of the items", "a combination of any plurality of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. For another example, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
Fig. 1 illustrates an exemplary system architecture 100 in which exemplary embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables. A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages (e.g., video data upload requests, video data download requests). Various communication client applications, such as audio and video call software, audio and video recording software, instant messaging software, conference software, mailbox clients, and social platform software, may be installed on the terminal devices 101, 102, 103. The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that have a display screen and are capable of playing, recording, and editing audio and video, including but not limited to smartphones, tablet computers, laptop portable computers, and desktop computers. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module, which is not particularly limited herein.
The terminal devices 101, 102, 103 may be equipped with an image capturing device (e.g., a camera) to capture video data. In practice, the smallest visual unit that makes up a video is a frame: each frame is a static image, and a temporally successive sequence of frames is composited together to form a motion video. Further, the terminal devices 101, 102, 103 may also be equipped with a component (e.g., a speaker) for converting an electric signal into sound in order to play sound, and with a device (e.g., a microphone) for picking up sound and converting it into a digital audio signal.
The server 105 may be a server providing various services, for example, a background server providing support for multimedia applications installed on the terminal devices 101, 102, 103. The background server may analyze and store received data such as audio/video data upload requests, and may also receive audio/video data download requests sent by the terminal devices 101, 102, 103 and feed the requested audio/video data back to the terminal devices 101, 102, 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module, which is not particularly limited herein.
It should be noted that the audio processing method provided by the embodiment of the present disclosure is generally executed by a terminal device, but may also be executed by a server, or may also be executed by cooperation of the terminal device and the server. Accordingly, the audio processing means may be provided in the terminal device, in the server, or in both the terminal device and the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation, and the disclosure is not limited thereto.
Fig. 2 is a flowchart of an audio processing method of an exemplary embodiment of the present disclosure.
Referring to fig. 2, in step S210, a pitch feature of recorded voice data of a user is extracted. According to an exemplary embodiment, in step S210, the pitch feature of the user's voice data may be extracted in real time while the user's audio data is being recorded. By extracting the pitch feature in real time during recording, the feature is obtained more quickly, which in turn allows the pitch of the background audio (also referred to as the "accompaniment") to be adjusted quickly. Alternatively, the pitch feature of the user's voice data may be extracted after the user's audio data has been recorded. It should be noted that the present disclosure does not limit the timing of extracting the pitch feature of the user's voice data. Here, the audio data may correspond to at least a portion of a song that the user desires to sing. That is, the user may sing the entire song to record the audio data, or may sing only a short segment of it. The present disclosure can help a user find the accompaniment key that fits his or her vocal range even when only a portion of the user's singing is recorded. For example, a piece of audio data longer than 5 s and shorter than 30 s may be recorded, but the disclosure is not limited thereto. In addition, as an example, when recording the user's audio data, the audio data may be collected frame by frame in real time.
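To make the frame-by-frame collection concrete, the sketch below shows one way real-time capture could be wired up. It is illustrative only and not part of the patent: it assumes the third-party sounddevice package, and the sample rate, block size, and queue-based hand-off are arbitrary choices.

```python
# Hypothetical real-time capture: each recorded block lands in a queue so that
# VAD and pitch extraction can consume frames while recording continues.
import queue

import sounddevice as sd

frames: queue.Queue = queue.Queue()

def on_audio(indata, num_frames, time_info, status):
    # indata has shape (num_frames, channels); keep a mono copy of the block.
    frames.put(indata[:, 0].copy())

stream = sd.InputStream(samplerate=22050, channels=1, blocksize=512,
                        callback=on_audio)
stream.start()  # frames.get() now yields successive 512-sample blocks
```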
The following briefly describes how the pitch feature of the user's voice data may be extracted, taking as an example real-time extraction while the user's audio data is being recorded.
Specifically, Voice Activity Detection (VAD) may first be performed on the recorded audio data to obtain the voice data of the user. VAD eliminates the non-voice data in the recorded audio, so that clean voice data of the user can be obtained. How to perform VAD is well known to those skilled in the art and is therefore not described in detail here. After the voice data of the user is obtained, pitch detection may be performed on the voice data to extract its pitch feature in real time. Any pitch detection method may be employed here; the present disclosure places no limitation on the specific pitch detection manner.
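A minimal sketch of this extraction step follows. It is not the patented implementation: it assumes the librosa package, substitutes a simple RMS-energy gate for a real VAD, and uses pYIN as the pitch detector; the frame sizes, threshold, and pitch range are illustrative.

```python
import numpy as np
import librosa

def extract_pitch_features(audio: np.ndarray, sr: int,
                           energy_thresh: float = 0.01) -> np.ndarray:
    frame_len, hop = 2048, 512
    # Crude VAD: treat frames whose RMS energy exceeds the threshold as voice,
    # discarding silence and quiet background from the recording.
    rms = librosa.feature.rms(y=audio, frame_length=frame_len, hop_length=hop)[0]
    voiced = rms > energy_thresh
    # Frame-wise pitch detection with pYIN; any pitch detector would do here.
    f0, _, _ = librosa.pyin(audio, fmin=librosa.note_to_hz('C2'),
                            fmax=librosa.note_to_hz('C6'), sr=sr,
                            frame_length=frame_len, hop_length=hop)
    n = min(len(f0), len(voiced))
    f0 = np.where(voiced[:n], f0[:n], np.nan)
    return f0[~np.isnan(f0)]  # pitch feature: one value in Hz per voiced frame
```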
After the pitch feature of the user's voice data has been extracted, the extracted pitch feature may be matched with the pitch data of the background audio corresponding to the voice data in step S220. Specifically, the pitch data of the background audio may first be acquired, and the extracted pitch feature may then be matched with the acquired pitch data according to a predetermined matching manner, to obtain a pitch difference value between the matched pitch and the pitch of the background audio. As an example, the pitch data for each frame of the background audio may be obtained by reading a MIDI file of the background audio. Further, as an example, the predetermined matching manner may be based on a Dynamic Time Warping (DTW) algorithm. Specifically, a pitch matching the pitch of the background audio may be calculated according to the DTW algorithm, and the difference between the matched pitch and the pitch of the background audio may then be calculated. As an example, the matched pitch may be determined by calculating the Euclidean distance between the feature vector of the extracted pitch feature and the feature vector of the pitch of the background audio. Details of the DTW algorithm are well known to those skilled in the art and will not be described here. Although DTW-based matching is mentioned above, the matching manner is not limited thereto; other matching manners may be used, as long as a pitch difference value between the matched pitch and the pitch of the background audio can be obtained.
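The sketch below illustrates how the DTW variant of this matching step could look. It is an assumption-laden illustration rather than the patent's method: pitches are compared on the MIDI semitone scale, librosa's DTW implementation is used, and the pitch difference is taken as the median offset along the optimal alignment path, with its variance kept for the later consistency check.

```python
import numpy as np
import librosa

def match_pitch(user_f0_hz: np.ndarray, accomp_midi: np.ndarray):
    # Convert the sung pitch track from Hz to MIDI note numbers so that
    # differences come out directly in semitones (keys).
    user_midi = librosa.hz_to_midi(user_f0_hz)
    # DTW aligns the sung melody to the accompaniment melody despite timing
    # differences; wp is the optimal warping path of (user, accompaniment) pairs.
    _, wp = librosa.sequence.dtw(X=user_midi[np.newaxis, :],
                                 Y=np.asarray(accomp_midi, float)[np.newaxis, :],
                                 metric='euclidean')
    diffs = np.array([user_midi[i] - accomp_midi[j] for i, j in wp])
    # Median offset = estimated key difference; variance = match consistency.
    return float(np.median(diffs)), float(np.var(diffs))
```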
Next, in step S230, the pitch of the background audio may be adjusted according to the matching result, so that the adjusted pitch of the background audio is adapted to the vocal range of the user. Here, "adapted to the vocal range of the user" means that the adjusted pitch of the background audio matches the pitch feature of the user. For example, after the pitch difference between the matched pitch and the pitch of the background audio is obtained in step S220, background audio whose original pitch has been adjusted by that difference (for example, the original pitch plus the pitch difference) will match the user's pitch feature; in this case, the adjusted pitch of the background audio is considered adapted to the user's vocal range. After the pitch difference is obtained in step S220, it may first be determined whether the pitch difference satisfies a predetermined condition. If the pitch difference satisfies the predetermined condition, the pitch of the background audio may be adjusted according to the pitch difference. If the pitch difference does not satisfy the predetermined condition, a prompt about the failure of key finding may be output, prompting the user to record again. The reason for adjusting only when the predetermined condition is satisfied is that the user may have sung off-key while recording the audio data, which would make the obtained pitch difference inaccurate and, in turn, affect the accuracy of the pitch adjustment.
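Continuing the sketch, the adjustment step can gate on the consistency of the match and then transpose the accompaniment. Again this is illustrative only: the variance threshold is a made-up stand-in for the unspecified predetermined condition, and librosa's pitch shifter stands in for whatever pitch-shifting the product actually uses.

```python
import numpy as np
import librosa

def adjust_accompaniment(accomp_audio: np.ndarray, sr: int,
                         offset_semitones: float, variance: float,
                         max_variance: float = 4.0):
    # Predetermined condition (illustrative): the per-frame offsets must be
    # consistent; a large variance suggests off-key singing, so we abort.
    if variance > max_variance:
        return None  # caller then outputs a "key finding failed" prompt
    n_steps = int(round(offset_semitones))  # whole-semitone raise/lower value
    return librosa.effects.pitch_shift(accomp_audio, sr=sr, n_steps=n_steps)
```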
Optionally, the audio processing method shown in fig. 2 may further include, after the pitch of the background audio has been adjusted, recording the audio content that the user sings along with the pitch-adjusted background audio. Because the user sings along with background audio whose pitch has been adjusted, the recorded audio content fits the user's vocal range better, which helps avoid off-key singing and yields a better-sounding recording.
The audio processing method according to the embodiment of the present disclosure has been described above with reference to fig. 2. By extracting the pitch feature of the recorded voice data of the user, matching it with the pitch data of the background audio, and adjusting the pitch of the background audio according to the matching result, pitch adjustment becomes more accurate and efficient, making it convenient to obtain background audio that truly fits the user's own vocal range.
To facilitate an intuitive understanding of the audio processing method according to the embodiment of the present disclosure described above with reference to fig. 2, an example of the above audio processing method is briefly described below with further reference to fig. 3 in conjunction with fig. 4.
Fig. 3 is a schematic diagram illustrating an example of an audio processing method of an exemplary embodiment of the present disclosure. Referring to fig. 3, when starting to record a song, it may first be determined whether to activate the "sing to find the key" function. If the function is activated, key finding starts; if it is not, formal song recording may begin directly. For example, as shown in (a) of fig. 4, in response to the user selecting a specific menu or button on the user interface (e.g., "sing to find the key" shown in (a)), it may be determined that the function has been activated. In the user interface shown in fig. 4 (a), the user may also adjust settings manually, for example, raising or lowering the key, or enabling in-ear monitoring.
When key finding is started, recording data is collected frame by frame in real time, and VAD is performed on the recording data to eliminate the influence of non-voice data and obtain the user's voice data. If voice data of the user is obtained, pitch is detected frame by frame on the obtained voice data to extract its pitch feature (i.e., the singing pitch data); if no voice data is obtained, the recording continues to be collected frame by frame. As mentioned above in the description of fig. 2, a user may record audio data by singing only a short segment of an entire song. For example, as shown in (b) of fig. 4, only a part of the lyrics of the whole song may be selected for recording; while the user holds down the "hold to sing" button, audio data corresponding to that part of the lyrics is recorded. During recording, as shown in fig. 4 (c), a corresponding recording animation may be displayed on the user interface, and the lyrics currently being sung may be displayed enlarged. When the user releases the "hold to sing" button, the recording ends.
The longest and shortest recording durations for the sung take may be preset; for example, the longest may be set to 30 s and the shortest to 5 s. After pitch detection, it may be determined whether recording has finished; if not, the process returns to collecting the recording frame by frame. If recording has finished, it may then be determined whether the recorded audio length meets the duration requirement, for example, whether it is shorter than 5 s. When the recording is shorter than 5 s, the matching material is too short to obtain an accurate result, so the user may be prompted that matching has failed, after which the user may choose to start key finding again. When the recording duration is greater than or equal to 5 s, the extracted singing pitch data may be saved.
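As a trivial illustration of this duration gate (the 5 s and 30 s limits are the example values from the text; the function name is hypothetical):

```python
MIN_SEC, MAX_SEC = 5.0, 30.0  # example limits from the text

def take_is_long_enough(num_samples: int, sr: int) -> bool:
    # Recording is capped at MAX_SEC elsewhere; takes shorter than MIN_SEC give
    # the matching step too little material, so the user is asked to retry.
    return num_samples / sr >= MIN_SEC
```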
Subsequently, the accompaniment pitch data may be read; for example, the MIDI file corresponding to the sung lyrics may be read to obtain the pitch value corresponding to each frame of the accompaniment.
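One plausible way to turn such a MIDI file into per-frame pitch data is sketched below. The patent does not specify the tooling; this assumes the pretty_midi package, that the first instrument track carries the melody line, and that the hop interval matches the analysis hop used for the voice.

```python
import numpy as np
import pretty_midi

def midi_frame_pitch(path: str, hop_sec: float = 512 / 22050) -> np.ndarray:
    pm = pretty_midi.PrettyMIDI(path)
    melody = pm.instruments[0]  # assumption: track 0 holds the melody
    times = np.arange(0.0, pm.get_end_time(), hop_sec)
    pitch = np.full(len(times), np.nan)  # NaN marks rests (drop before DTW)
    for note in melody.notes:
        # Fill every analysis frame covered by this note with its MIDI pitch.
        pitch[(times >= note.start) & (times < note.end)] = note.pitch
    return pitch  # one MIDI note number (or NaN) per accompaniment frame
```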
Next, the singing pitch and the accompaniment pitch are matched; for example, a DTW algorithm may be used to obtain the matched pitch, and the pitch difference between the matched pitch and the original accompaniment pitch may then be calculated. When the pitch difference triggers the error condition (for example, when the variance of the pitch differences is larger than a set threshold, but not limited thereto), a matching error is prompted; otherwise, the matching is considered successful. If the matching is successful, the pitch of the accompaniment can be adjusted according to the calculated pitch difference. As shown in (d) of fig. 4, after the user releases the "hold to sing" button to complete recording, if the matching succeeds and the pitch of the accompaniment is adjusted according to the calculated pitch difference (i.e., key finding succeeds), information indicating how much the key has been raised or lowered may be displayed, for example, "key +3 has been set". Conversely, if the matching fails (i.e., key finding fails) when the user releases the "hold to sing" button, the user may be prompted to re-record, at which point key finding may be restarted. In addition, as shown in fig. 4 (e), when key finding fails, a prompt about the failure may be output on the user interface, for example, "Key finding failed, try another line".
Further, formal recording may be started after the pitch has been adjusted. Optionally, after the formal recording, the recorded audio may be previewed, edited, and published.
The audio processing method according to the embodiment of the present disclosure has been described above with reference to fig. 2 to fig. 4. According to this method, by extracting the pitch feature of the voice data, matching it with the pitch data of the accompaniment, and adjusting the pitch of the accompaniment according to the matching result, the pitch adjustment of the accompaniment becomes more accurate and efficient, making it convenient for the user to record better-sounding audio content with the adjusted accompaniment.
Fig. 5 is a block diagram illustrating an audio processing apparatus according to an exemplary embodiment of the present disclosure.
Referring to fig. 5, the audio processing apparatus 500 may include a feature extraction unit 501, a matching unit 502, and a pitch adjustment unit 503. Specifically, the feature extraction unit 501 may extract the pitch feature of the recorded voice data of the user. The matching unit 502 may match the extracted pitch feature with the pitch data of the background audio corresponding to the voice data. The pitch adjustment unit 503 may adjust the pitch of the background audio according to the matching result, so that the adjusted pitch of the background audio is adapted to the vocal range of the user. Specifically, the matching unit 502 may acquire the pitch data of the background audio, and then match the extracted pitch feature with the acquired pitch data according to a predetermined matching manner, to obtain a pitch difference value between the matched pitch and the pitch of the background audio. The pitch adjustment unit 503 may adjust the pitch of the background audio according to the pitch difference value if the pitch difference value satisfies a predetermined condition. Optionally, the audio processing apparatus 500 may further include a prompt unit (not shown), which may output a prompt about the failure of key finding in a case where the pitch difference value does not satisfy the predetermined condition. Optionally, the audio processing apparatus 500 may further include an audio recording unit (not shown), which may record the audio content that the user sings along with the pitch-adjusted background audio.
Since the audio processing method shown in fig. 2 can be performed by the audio processing apparatus 500 shown in fig. 5, and the feature extraction unit 501, the matching unit 502, and the pitch adjustment unit 503 can respectively perform operations corresponding to step S210, step S220, and step S230 in fig. 2, any relevant details related to the operations performed by the units in fig. 5 can be referred to in the corresponding description of fig. 2 to 4, and are not repeated here.
Furthermore, it should be noted that although the audio processing apparatus 500 is described above as being divided into units that respectively perform the corresponding processes, it is clear to those skilled in the art that the processes performed by the units described above may also be performed by the audio processing apparatus 500 without any specific unit division or without explicit demarcation between the units. In addition, the audio processing apparatus 500 may further include other units, for example, a storage unit.
Fig. 6 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Referring to fig. 6, the electronic device 600 may include at least one memory 601 and at least one processor 602, the at least one memory storing computer-executable instructions that, when executed by the at least one processor, cause the at least one processor 602 to perform an audio processing method according to an embodiment of the disclosure.
By way of example, the electronic device may be a PC, a tablet device, a personal digital assistant, a smartphone, or any other device capable of executing the above set of instructions. The electronic device need not be a single electronic device, but may be any collection of devices or circuits that can execute the above instructions (or instruction sets) individually or jointly. The electronic device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In an electronic device, a processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special-purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor may execute instructions or code stored in the memory, which may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory may be integral to the processor, e.g., RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the memory may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the memory.
In addition, the electronic device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device may be connected to each other via a bus and/or a network.
According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform an audio processing method according to an exemplary embodiment of the present disclosure. Examples of the computer-readable storage medium here include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a solid-state disk, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the program. The instructions or computer program in the computer-readable storage medium described above may be run in an environment deployed on computer apparatus such as a client, a host, a proxy device, or a server; further, in one example, the computer program and any associated data, data files, and data structures may be distributed across networked computer systems, so that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an embodiment of the present disclosure, there may also be provided a computer program product including computer instructions which, when executed by a processor, implement an audio processing method according to an exemplary embodiment of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An audio processing method, comprising:
extracting a pitch feature of recorded voice data of a user;
matching the extracted pitch feature with pitch data of background audio corresponding to the voice data;
and adjusting the pitch of the background audio according to the matching result, so that the adjusted pitch of the background audio is adapted to the vocal range of the user.
2. The audio processing method of claim 1, wherein the extracting a pitch feature of the recorded voice data of the user comprises:
extracting a pitch feature of the voice data of the user in real time while recording the audio data of the user; or
extracting a pitch feature of the voice data of the user after recording the audio data of the user.
3. The audio processing method of claim 2, wherein the extracting a pitch feature of the voice data of the user in real time comprises:
performing voice activity detection on the recorded audio data to obtain voice data of a user;
performing pitch detection on the voice data to extract pitch features of the voice data in real time.
4. The audio processing method of claim 1, wherein the matching the extracted pitch feature with the pitch data of the background audio corresponding to the voice data comprises:
acquiring pitch data of the background audio;
and matching the extracted pitch feature with the acquired pitch data of the background audio according to a predetermined matching manner, to obtain a pitch difference value between the matched pitch and the pitch of the background audio.
5. The audio processing method of claim 4, wherein the adjusting the pitch of the background audio according to the matching result comprises:
adjusting the pitch of the background audio according to the pitch difference value if the pitch difference value satisfies a predetermined condition.
6. The audio processing method of claim 4, further comprising:
and outputting a prompt about the failure of key finding in a case where the pitch difference value does not satisfy the predetermined condition.
7. The audio processing method of claim 1, further comprising:
and recording audio content that the user sings along with the pitch-adjusted background audio.
8. An audio processing apparatus comprising:
a feature extraction unit configured to extract a pitch feature of the recorded user's voice data;
a matching unit configured to match the extracted pitch feature with pitch data of background audio corresponding to the voice data;
a pitch adjustment unit configured to adjust the pitch of the background audio according to the matching result, so that the adjusted pitch of the background audio is adapted to the vocal range of the user.
9. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the audio processing method of any of claims 1 to 7.
10. A computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the audio processing method of any of claims 1 to 7.
CN202110471498.XA (priority and filing date 2021-04-29): Audio processing method and device, electronic equipment and storage medium. Publication: CN113192533A (pending)

Priority Applications (1)

Application Number: CN202110471498.XA; Priority Date: 2021-04-29; Filing Date: 2021-04-29; Title: Audio processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number: CN202110471498.XA; Priority Date: 2021-04-29; Filing Date: 2021-04-29; Title: Audio processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number: CN113192533A; Publication Date: 2021-07-30

Family

ID=76980621

Family Applications (1)

Application Number: CN202110471498.XA; Title: Audio processing method and device, electronic equipment and storage medium; Status: Pending; Publication: CN113192533A

Country Status (1)

CN: CN113192533A (Pending)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200813977A (en) * 2006-09-11 2008-03-16 Jotek Inc Automatic pitch following method and system for music accompaniment device
US7974838B1 (en) * 2007-03-01 2011-07-05 iZotope, Inc. System and method for pitch adjusting vocals
CN109272975A (en) * 2018-08-14 2019-01-25 无锡冰河计算机科技发展有限公司 Sing accompaniment automatic adjusting method, device and KTV jukebox
CN111046226A (en) * 2018-10-15 2020-04-21 阿里巴巴集团控股有限公司 Music tuning method and device
CN111179890A (en) * 2018-11-09 2020-05-19 百度在线网络技术(北京)有限公司 Voice accompaniment method and device, computer equipment and storage medium


Similar Documents

Publication Publication Date Title
US10559323B2 (en) Audio and video synchronizing perceptual model
JP4640407B2 (en) Signal processing apparatus, signal processing method, and program
US20200151212A1 (en) Music recommending method, device, terminal, and storage medium
CN110473525B (en) Method and device for acquiring voice training sample
WO2014161282A1 (en) Method and device for adjusting playback progress of video file
CN111309966B (en) Audio matching method, device, equipment and storage medium
CN111046226B (en) Tuning method and device for music
KR101590078B1 (en) Apparatus and method for voice archiving
CN112365868B (en) Sound processing method, device, electronic equipment and storage medium
CN112216259A (en) Method and device for aligning vocal accompaniment
CN108711415B (en) Method, apparatus and storage medium for correcting time delay between accompaniment and dry sound
CN108174133A (en) A kind of trial video methods of exhibiting, device, electronic equipment and storage medium
JP2017129720A (en) Information processing system, information processing apparatus, information processing method, and information processing program
WO2020015411A1 (en) Method and device for training adaptation level evaluation model, and method and device for evaluating adaptation level
WO2023128877A2 (en) Video generating method and apparatus, electronic device, and readable storage medium
US9502017B1 (en) Automatic audio remixing with repetition avoidance
CN114155852A (en) Voice processing method and device, electronic equipment and storage medium
CN112423019A (en) Method and device for adjusting audio playing speed, electronic equipment and storage medium
CN106775567B (en) Sound effect matching method and system
CN110798393B (en) Voiceprint bubble display method and terminal using voiceprint bubbles
CN113192533A (en) Audio processing method and device, electronic equipment and storage medium
WO2022160669A1 (en) Audio processing method and audio processing apparatus
CN112687247A (en) Audio alignment method and device, electronic equipment and storage medium
CN112114886B (en) Acquisition method and device for false wake-up audio
CN114822492B (en) Speech synthesis method and device, electronic equipment and computer readable storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination