CN112037739A - Data processing method and device and electronic equipment - Google Patents

Data processing method and device and electronic equipment

Info

Publication number
CN112037739A
CN112037739A (application CN202010907240.5A); granted publication CN112037739B
Authority
CN
China
Prior art keywords
audio
information
pool
alternative
beat
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010907240.5A
Other languages
Chinese (zh)
Other versions
CN112037739B (en)
Inventor
徐东
鲁霄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd filed Critical Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN202010907240.5A priority Critical patent/CN112037739B/en
Publication of CN112037739A publication Critical patent/CN112037739A/en
Application granted granted Critical
Publication of CN112037739B publication Critical patent/CN112037739B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 Details of electrophonic musical instruments
    • G10H 1/0008 Associated control or indicating means
    • G10H 1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/686 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H 2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H 2210/076 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of timing, tempo; Beat detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H 2210/101 Music Composition or musical creation; Tools or processes therefor
    • G10H 2210/125 Medley, i.e. linking parts of different musical pieces in one single piece, e.g. sound collage, DJ mix

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The embodiment of the invention provides a data processing method and device, an electronic device, and a storage medium. The data processing method comprises the following steps: selecting a first audio and a second audio from a medley candidate pool; acquiring music retrieval information of the first audio and of the second audio, wherein the music retrieval information comprises beat information and time information; if the similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold, smoothing the first audio according to its time information and smoothing the second audio according to its time information; and overlapping the smoothed first audio and the smoothed second audio to form a target audio. According to the embodiment of the invention, audios with similar beat information are selected for overlapping, and the audios to be overlapped are smoothed, so that high-quality medley audio can be generated efficiently.

Description

Data processing method and device and electronic equipment
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a data processing method, an apparatus, and an electronic device.
Background
In the current era of continuously developing internet technology, music is an indispensable part of daily life. As the pace of modern life accelerates, the demand for musical diversity grows day by day: besides listening to an audio from beginning to end, people increasingly want audio composed of audio segments of different styles and types, that is, medley audio.
At present, medley audio is mainly produced manually. In a manually produced medley, the styles of the component audios may differ markedly, so the resulting medley sounds abrupt, and the transitions between audios are not smooth enough, giving users a poor listening experience. Therefore, how to produce high-quality medley audio is a problem to be solved urgently.
Disclosure of Invention
The embodiment of the invention provides a data processing method, a data processing device, an electronic device, and a storage medium. By selecting audios with similar beat information for overlapping, and smoothing each audio to be overlapped, high-quality medley audio can be generated and the user experience enhanced.
In a first aspect, an embodiment of the present invention provides a data processing method, where the data processing method includes:
selecting a first audio and a second audio from a medley candidate pool;
acquiring music retrieval information of the first audio and music retrieval information of the second audio, wherein the music retrieval information comprises beat information and time information;
if the similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold, smoothing the first audio according to the time information of the first audio, and smoothing the second audio according to the time information of the second audio;
and overlapping the first audio subjected to the smoothing processing and the second audio subjected to the smoothing processing to form a target audio.
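The four steps above can be sketched as a toy pipeline. This is an illustrative reconstruction, not the patented implementation: the BPM-based similarity measure, the linear crossfade standing in for the smoothing, and the overlap length `n` are all assumptions.

```python
def beat_similarity(bpm_a, bpm_b):
    """Toy beat-information similarity in [0, 1]: 1.0 for identical BPM."""
    return 1.0 - abs(bpm_a - bpm_b) / max(bpm_a, bpm_b)

def fade_out(samples, n):
    """Smooth the first audio: linearly attenuate its last n samples to zero (n >= 2)."""
    out = list(samples)
    for i in range(n):
        out[-n + i] *= (n - 1 - i) / (n - 1)
    return out

def fade_in(samples, n):
    """Smooth the second audio: linearly ramp its first n samples up from zero (n >= 2)."""
    out = list(samples)
    for i in range(n):
        out[i] *= i / (n - 1)
    return out

def overlap(first, second, n):
    """Overlap the faded tail of the first audio with the faded head of the second."""
    mixed = [x + y for x, y in zip(first[-n:], second[:n])]
    return first[:-n] + mixed + second[n:]

def make_medley(first, bpm_first, second, bpm_second, threshold=0.9, n=4):
    """Compare beat information, smooth both audios, then overlap them into
    the target audio; returns None when the beats are too dissimilar."""
    if beat_similarity(bpm_first, bpm_second) <= threshold:
        return None
    return overlap(fade_out(first, n), fade_in(second, n), n)
```

Because the two linear fades sum to unity at every overlapped sample, joining two constant-amplitude snippets leaves the overlap region at full level, which is the smooth transition the method aims for.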
In a second aspect, an embodiment of the present invention provides a data processing apparatus, including:
the selection unit is used for selecting a first audio and a second audio from the medley candidate pool;
the acquisition unit is used for acquiring music retrieval information of the first audio and music retrieval information of the second audio, wherein the music retrieval information comprises beat information and time information;
the processing unit is used for smoothing the first audio according to the time information of the first audio and smoothing the second audio according to the time information of the second audio if the similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold;
and the processing unit is also used for performing overlapping processing on the first audio subjected to the smoothing processing and the second audio subjected to the smoothing processing to form target audio.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a processor, an input device, an output device, and a memory connected to one another, where the memory is used to store a computer program comprising program instructions, and the processor is configured to call the program instructions to perform the method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer program instructions for an electronic device, including a program for executing the method of the first aspect.
In the embodiment of the invention, a first audio and a second audio with similar beat information are selected from a medley candidate pool and are smoothed and overlapped, so that combining the first audio and the second audio yields a target audio of higher quality. Since the first audio and the second audio have similar beats, the target audio formed after overlapping them transitions smoothly in beat; in addition, because the first audio and the second audio are each smoothed before overlapping, the medley audio switches smoothly from one audio to the next and presents a better playing effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of a data processing system according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data processing method according to an embodiment of the present invention;
FIG. 3a is a schematic diagram of a smoothing process for a first audio and a second audio according to an embodiment of the present invention;
FIG. 3b is a schematic diagram of another smoothing process for the first audio and the second audio according to an embodiment of the present invention;
FIG. 4 is an audio diagram of a target audio provided by an embodiment of the invention;
FIG. 5 is a schematic diagram of smoothing a first audio, a second audio, and a third audio according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the descriptions of "first", "second", etc. referred to in the embodiments of the present invention are only for descriptive purposes, and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a technical feature defined as "first" or "second" may explicitly or implicitly include at least one such feature.
In the prior art, manual production of medley audio mainly selects two or more audio segments at random and splices them together. Because the audios are selected randomly, their styles and musical characteristics may differ greatly; in particular, their beat information may differ widely, so the produced medley sounds abrupt and unnatural, and the transitions between audios are not smooth enough, giving users a poor listening experience.
Based on the above analysis, embodiments of the present invention provide a data processing method and apparatus, an electronic device, and a storage medium. The data processing method comprises: selecting a first audio and a second audio from a medley candidate pool; acquiring music retrieval information of the first audio and of the second audio, wherein the music retrieval information comprises beat information and time information; if the similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold, smoothing the first audio according to its time information and smoothing the second audio according to its time information; and overlapping the smoothed first audio and the smoothed second audio to form a target audio. By implementing this data processing method, high-quality audios are selected into the medley candidate pool, audios with similar beat information are selected from the pool for overlapping, and the audios to be overlapped are smoothed, so that high-quality medley audio can be generated.
The electronic device in the embodiments of the present application may be a device providing voice and/or data connectivity to a user, for example a handheld device with a wireless connection function or an in-vehicle device, or another processing device connected to a wireless modem. The electronic device may communicate with a Radio Access Network (RAN), and may also be referred to as a wireless terminal, a Subscriber Unit, a Subscriber Station, a Mobile Station, a Remote Station, an Access Point, a Remote Terminal, an Access Terminal, a User Terminal, a User Agent, a User Device, or a User Equipment (UE). It may be a mobile terminal, such as a mobile (or "cellular") telephone or a computer with a mobile terminal, for example a portable, pocket, hand-held, computer-built-in, or vehicle-mounted mobile device that exchanges voice and/or data with the radio access network. For example, the electronic device may be a Personal Communication Service (PCS) phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, or a Personal Digital Assistant (PDA). Common electronic devices include mobile phones, tablet computers, laptop computers, palmtop computers, Mobile Internet Devices (MID), vehicles, roadside devices, aircraft, and wearable devices such as smart watches, smart bracelets, and pedometers, but the embodiments of the present application are not limited thereto.
In order to better understand the data processing method provided by the embodiment of the present invention, a system architecture diagram applicable to the embodiment of the present invention is described below. Referring to fig. 1, fig. 1 is a schematic diagram illustrating an architecture of a data processing system according to an embodiment of the present invention. As shown in fig. 1, the system architecture diagram includes at least: electronic device 110, server 120. The electronic device 110 and the server 120 may communicate data via a network, which includes but is not limited to a local area network, a wide area network, a wireless communication network, and the like.
In one possible implementation, the server 120 stores an audio information library, which contains a large number of audios (candidate audios) and records the audio information corresponding to each audio. The audio information may include, but is not limited to: audio name, audio format, number of channels, lyric information, and audio popularity index. An audio may be a song, a song segment, a music piece, and so on. The audio name is the name under which the audio was released; the audio format is the way the audio is stored, such as the lossless format FLAC or the lossy format MP3; the number of channels indicates in how many channels the audio is stored, such as mono or stereo; the lyric information is the lyric content of the audio; the audio popularity index measures how popular the audio is, generally using the play count as the index, though it can also be measured by indexes such as the number of searches or the number of purchasing users.
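A record in the audio information library described above could be modeled as follows; the field names and the example values are illustrative assumptions, not the patent's data schema.

```python
from dataclasses import dataclass

@dataclass
class AudioInfo:
    """One record in the audio information library (fields are illustrative)."""
    name: str        # audio name (release title)
    fmt: str         # storage format, e.g. "FLAC" (lossless) or "MP3" (lossy)
    channels: int    # 1 = mono, 2 = stereo, ...
    lyrics: str      # lyric content of the audio
    popularity: int  # popularity index, e.g. play count

# A tiny sample library; note the two entries sharing one audio name.
library = [
    AudioInfo("Song A", "FLAC", 2, "lyrics of Song A", 120000),
    AudioInfo("Song A", "MP3", 2, "lyrics of Song A", 90000),
    AudioInfo("Song B", "FLAC", 1, "lyrics of Song B", 50000),
]
```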
In a possible implementation, the electronic device 110 downloads the first audio and the second audio from the audio information library in the server 120 through the network, and performs feature extraction on each of them through a computer algorithm to obtain their music retrieval information, which may include beat information and time information. If the beat information of the first audio is similar to that of the second audio, the electronic device 110 smooths the first audio and the second audio and overlaps the smoothed audios to form a medley audio.
In one possible implementation, the electronic device 110 uploads the generated medley audio to the server 120 via the network; after receiving it, the server 120 stores the medley audio locally so that other electronic devices can download it from the server 120 via the network.
It should be understood that the system architecture diagram described in the embodiment of the present invention is for more clearly illustrating the technical solution of the embodiment of the present invention, and does not constitute a limitation to the technical solution provided in the embodiment of the present invention, and as a person of ordinary skill in the art knows that along with the evolution of the system architecture and the appearance of a new service scenario, the technical solution provided in the embodiment of the present invention is also applicable to similar technical problems.
Referring to fig. 2 based on the system architecture diagram of fig. 1, fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present invention. The method is applied to the electronic device, and as shown in fig. 2, the data processing method may include steps S210 to S240. Wherein:
step S210: and selecting a first audio and a second audio from the string burning candidate pool.
In a possible implementation, before the electronic device selects the first audio and the second audio from the medley candidate pool, it may first construct the pool as follows: the electronic device takes any audio from a pre-established audio information library as a candidate audio; it then performs feature extraction on the candidate audio to obtain its feature information, and decides according to the feature information whether to put the candidate audio into the medley candidate pool. It should be noted that the first audio and the second audio are any audios in the medley candidate pool, and the electronic device processes the first audio in the same way as the second audio; the following description therefore takes the processing of an arbitrary candidate audio as an example.
The feature information mentioned in the embodiment of the present invention includes direct information and Music Information Retrieval (MIR) information. The direct information of an audio includes, but is not limited to: audio name, audio format, number of channels, lyric information, and audio popularity index. The music retrieval information includes, but is not limited to: Beats Per Minute (BPM), downbeat, and time information, where the time information may include a chorus start time and a chorus end time.
For example, the electronic device performs feature extraction on the candidate audio to obtain its direct information, which may specifically include the following. First, the electronic device numbers the input audios. For instance, a user uploads 1000 audios to the server through a user terminal; the electronic device acquires the 1000 audios from the server and numbers them 1, 2, 3, ..., 1000. Then the electronic device queries the audio information library on the server through the network to obtain the audio name, audio format, number of channels, lyric information, audio popularity index, and other information corresponding to each audio, determines this information as the direct information of the audio, and records it as infoD. The audio name is the name under which the audio was released. The audio format is the way the audio is stored, including but not limited to: the lossy format MP3, the lossless format FLAC, WMA (Windows Media Audio), MIDI (Musical Instrument Digital Interface), TwinVQ (Transform-domain Weighted Interleave Vector Quantization), and AMR (Adaptive Multi-Rate). The number of channels indicates in how many channels the audio is stored, such as mono or stereo. The lyric information is the lyric content of the audio. The audio popularity index measures how popular the audio is, generally using the play count as the index; it can also be measured by indexes such as the number of searching users, clicking users, or purchasing users.
For example, the electronic device performs feature extraction on the candidate audio to obtain its music retrieval information, which may specifically include: the electronic device obtains the music retrieval information of the candidate audio through a neural-network-based information extraction technique, and records it as infoM. The music retrieval information may include, but is not limited to: Beats Per Minute (BPM), downbeat, and time information, where the time information may include a chorus start time and a chorus end time. Specifically, the BPM of an audio is the number of beats the audio contains per minute, and is used to measure the music speed; a downbeat generally refers to the first beat in a bar, i.e., the strong beat. The BPM of an audio may be obtained by a BPM detection and analysis tool (e.g., the MixMeister BPM Analyzer tool), and the downbeat may be obtained by analyzing the audio, for example by a technician.
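As a concrete illustration of the BPM and downbeat definitions above, the following toy functions work from beat timestamps that some detector is assumed to have produced already; this is not the analysis tool the text cites, and the 4/4 bar assumption for downbeats is illustrative.

```python
def bpm_from_beats(beat_times):
    """Estimate BPM from beat timestamps in seconds:
    BPM = 60 / (average inter-beat interval)."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beats")
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / (sum(intervals) / len(intervals))

def downbeats(beat_times, beats_per_bar=4):
    """Take each bar's first (strong) beat as the downbeat, assuming 4/4 time
    and that beat_times[0] starts a bar."""
    return beat_times[::beats_per_bar]
```

For beats spaced 0.5 s apart this yields 120 BPM, matching the definition of BPM as beats contained per minute.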
In a possible implementation, the electronic device determines the chorus start time and chorus end time of the candidate audio in three main steps: 1. the candidate audio is first transformed between the time domain and the frequency domain through the Constant Q Transform (CQT) to obtain its spectrum information, where the CQT gives the energy at each note frequency of the candidate audio, which helps describe its subjective auditory characteristics; 2. the candidate audio is then fed as an input to a Convolutional Recurrent Neural Network (CRNN) model, which extracts features from it and outputs audio segments at a plurality of time points; 3. finally, the audio segments at the different time points are traversed through a filter to obtain a prediction probability value for each time point; the time point corresponding to the maximum of these prediction probability values is determined as the chorus start time, and the time point corresponding to the minimum is determined as the chorus end time.
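The selection rule at the end of step 3 (start time at the maximum prediction probability, end time at the minimum) can be written down directly. The `(time, probability)` pair shape below is an assumption about how the filtered per-time-point outputs are represented.

```python
def chorus_bounds(predictions):
    """predictions: list of (time_point_seconds, probability) pairs, one per
    time point output by the filtering stage.
    Returns (chorus_start_time, chorus_end_time): the time of the maximum
    probability and the time of the minimum probability, per the text."""
    start_t, _ = max(predictions, key=lambda tp: tp[1])
    end_t, _ = min(predictions, key=lambda tp: tp[1])
    return start_t, end_t
```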
In one possible implementation, the feature information of the candidate audio includes its audio name, audio format, and number of channels. The electronic device judges whether the candidate audio meets the quality requirement according to this feature information, and if it does, puts the candidate audio into the medley candidate pool.
In a possible implementation, judging whether the candidate audio meets the quality requirement according to its feature information specifically includes: judging whether the audio name of the candidate audio is unique, whether its audio format is a preset format, and whether its number of channels is a preset number of channels. If the audio name is unique, the audio format is a preset format, and the number of channels is a preset number of channels, the candidate audio is judged to meet the quality requirement; if the audio name is not unique, or the audio format is not a preset format, or the number of channels is not a preset number of channels, the candidate audio is judged not to meet the quality requirement. The preset format may be a lossless format, and the preset number of channels may be two channels or more.
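The three checks combine into a single predicate. The concrete preset format list and the channel threshold below are assumptions, since the text only requires "a lossless format" and "two channels or more".

```python
def meets_quality(name_count, fmt, channels,
                  preset_fmts=("FLAC", "APE"), min_channels=2):
    """Quality gate for a candidate audio.
    name_count: how many audios in the library share this audio's name
                (1 means the name is unique).
    fmt:        must be one of the preset (lossless) formats.
    channels:   must reach the preset number of channels (stereo or more)."""
    return name_count == 1 and fmt in preset_fmts and channels >= min_channels
```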
In one possible implementation, the electronic device obtains the candidate audio from the audio information library, where the candidate audio may be any audio in the library, and judges whether its audio name is unique. A unique audio name means that only one audio in the library bears that name; a non-unique audio name means that multiple audios in the library share the same name. If the audio name of the candidate audio is not unique, all audios with the same name are obtained from the library, and the one with the highest audio popularity index is re-determined as the candidate audio; alternatively, one of these audios may be selected at random, or the one with the highest audio quality may be selected, as the new candidate audio. After the candidate audio has been re-determined, or after its audio name has been judged unique, the electronic device performs feature extraction on the candidate audio to obtain its feature information, and judges according to this feature information whether the audio format is a preset format and whether the number of channels is a preset number of channels; if both conditions hold, the candidate audio is put into the medley candidate pool.
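The default branch of the duplicate-name resolution above (re-pick by highest popularity index) can be sketched as a one-liner; the `(audio_id, popularity)` entry shape is an assumption.

```python
def resolve_duplicates(candidates):
    """Given all library entries sharing one audio name, re-determine the
    candidate audio as the entry with the highest popularity index.
    candidates: non-empty list of (audio_id, popularity) pairs."""
    return max(candidates, key=lambda entry: entry[1])
```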
In this way, the electronic device screens a large number of audios by the preset screening rule, obtaining audios of higher quality that are popular with users and placing them into the string burning candidate pool, thereby providing higher-quality material for the string-burned audio.
In a possible implementation manner, after performing feature extraction on the candidate pool audio, the electronic device checks the completeness of the resulting feature information. If the direct information contained in the feature information is missing, the candidate pool audio cannot be placed into the string burning candidate pool; another audio is selected from the audio information base as a new candidate pool audio, and the judgment process is executed again. If the music retrieval information contained in the feature information is missing, the music retrieval information of the candidate pool audio is re-acquired, specifically as follows: first, audio segments are randomly cut from the candidate pool audio, yielding a plurality of segments; then, the music retrieval information of each segment is calculated, giving one result per segment (each segment is processed in the same way the electronic device processes the candidate pool audio during feature extraction, which is not repeated here); finally, the per-segment results are combined by a weighted operation to obtain new music retrieval information for the candidate pool audio.
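The segment-based recomputation can be sketched as below. The per-segment estimator here is a stand-in (a real implementation would run the embodiment's own feature-extraction step on each segment), and the equal weights are an assumption, since the text does not specify the weighting.

```python
import random

def estimate_bpm(segment):
    # Stand-in for a real per-segment music-retrieval estimator
    # (illustrative only: it just averages the samples).
    return sum(segment) / len(segment)

def recompute_retrieval_info(samples, n_segments=3, seg_len=4, seed=0):
    """Randomly cut n_segments clips from the audio, estimate each,
    then combine the results by a weighted (here: equal-weight) sum."""
    rng = random.Random(seed)                       # seeded for reproducibility
    weights = [1.0 / n_segments] * n_segments       # assumed equal weights
    results = []
    for _ in range(n_segments):
        start = rng.randrange(len(samples) - seg_len + 1)
        results.append(estimate_bpm(samples[start:start + seg_len]))
    return sum(w * r for w, r in zip(weights, results))

info = recompute_retrieval_info(list(range(100)))
```

The weighted sum of per-segment estimates necessarily lies between the smallest and largest per-segment value, which is the property that makes the recomputed value a plausible whole-track estimate.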
Step S220: and acquiring music retrieval information of the first audio and music retrieval information of the second audio, wherein the music retrieval information comprises beat information and time information.
Specifically, the electronic device obtains the beat information and time information of the first audio and of the second audio from the string burning candidate pool. For example, the electronic device acquires the BPM, downbeat, chorus start time, and chorus end time of the first audio, and likewise the BPM, downbeat, chorus start time, and chorus end time of the second audio.
Step S230: and if the similarity between the beat information of the first audio and the beat information of the second audio is greater than the similarity threshold, smoothing the first audio according to the time information of the first audio, and smoothing the second audio according to the time information of the second audio.
In a possible implementation manner, the electronic device obtains a first audio from the string burning candidate pool along with its beat information, and selects from the pool, as the second audio, an audio whose beat information has a similarity to that of the first audio greater than the similarity threshold. The similarity threshold may be preset by the user. If more than one audio in the pool exceeds the threshold, the audio with the highest audio popularity index is selected as the second audio, or one is selected at random; the present invention is not limited in this respect.
For example, the electronic device obtains the BPM1 of the first audio from the string burning candidate pool, and selects from the pool the audios whose BPM has a similarity greater than 80% to BPM1 as candidates for the second audio. Suppose 3 such audios exist in the pool: audio 1, audio 2, and audio 3. The electronic device obtains the audio popularity index of each; if the index of audio 1 is 80%, that of audio 2 is 85%, and that of audio 3 is 90%, the electronic device takes the audio with the highest popularity index, i.e., audio 3, as the second audio.
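The BPM-similarity selection in this example can be sketched as follows. The similarity metric (1 minus the relative BPM difference) and the field names are assumptions made for illustration; the embodiment does not fix a particular metric.

```python
def bpm_similarity(bpm_a, bpm_b):
    """Assumed similarity metric: 1 minus the relative BPM difference."""
    return 1.0 - abs(bpm_a - bpm_b) / max(bpm_a, bpm_b)

def pick_second_audio(first, pool, threshold=0.8):
    """Select, among audios above the similarity threshold,
    the one with the highest popularity index."""
    matches = [a for a in pool
               if a is not first
               and bpm_similarity(first["bpm"], a["bpm"]) > threshold]
    if not matches:
        return None
    return max(matches, key=lambda a: a["popularity"])

pool = [
    {"name": "first",   "bpm": 120, "popularity": 0.00},
    {"name": "audio_1", "bpm": 118, "popularity": 0.80},
    {"name": "audio_2", "bpm": 122, "popularity": 0.85},
    {"name": "audio_3", "bpm": 125, "popularity": 0.90},
    {"name": "audio_4", "bpm": 60,  "popularity": 0.99},  # too dissimilar, excluded
]
second = pick_second_audio(pool[0], pool)   # audio_3: similar BPM, highest popularity
```

Note that `audio_4` is excluded despite its high popularity: popularity only breaks ties among audios that already pass the similarity threshold.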
In one possible implementation manner, the electronic device determines, between the first audio and the second audio, which is played earlier and which later; fade-out processing is applied to the ending portion of the audio played earlier, and fade-in processing to the beginning portion of the audio played later. If the electronic device determines that the first audio is played earlier, the second audio is played later: the electronic device fades out the ending portion of the first audio and fades in the beginning portion of the second audio. The ending portion of the first audio is the audio segment spanning a first time period traced back from the chorus end time of the first audio; the beginning portion of the second audio is the audio segment spanning a second time period extending forward from the chorus start time of the second audio. The first and second time periods may be set according to actual conditions and may be equal or different. Specifically, if the chorus end time of the first audio is 3 minutes 50 seconds and the first time period is 10 seconds, the ending portion is the segment from 3 minutes 40 seconds to 3 minutes 50 seconds of the first audio. If the chorus start time of the second audio is 0 minutes 00 seconds and the second time period is 10 seconds, the beginning portion is the segment from 0 minutes 00 seconds to 0 minutes 10 seconds of the second audio.
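The window arithmetic above reduces to two small helpers. Times are in seconds, and the 10-second periods are the example values from the text, not fixed parameters.

```python
def ending_segment(chorus_end_s, first_period_s=10):
    """Fade-out window of the audio played earlier:
    trace back first_period_s seconds from the chorus end time."""
    return (chorus_end_s - first_period_s, chorus_end_s)

def beginning_segment(chorus_start_s, second_period_s=10):
    """Fade-in window of the audio played later:
    extend second_period_s seconds forward from the chorus start time."""
    return (chorus_start_s, chorus_start_s + second_period_s)

# The worked example from the text: chorus end at 3 min 50 s, chorus start at 0 min 00 s.
assert ending_segment(3 * 60 + 50) == (220, 230)   # 3:40 .. 3:50
assert beginning_segment(0) == (0, 10)             # 0:00 .. 0:10
```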
For example, fig. 3a is a schematic diagram of smoothing the first audio and the second audio according to an embodiment of the present invention. In fig. 3a, the dotted circle at the upper end of the double-headed arrow marks the fade-out end time of the first audio, which may specifically be the chorus end time of the first audio; the solid circle at the lower end marks the fade-in end time of the second audio, which may specifically be the chorus start time of the second audio. The electronic device may fade out the ending portion of the first audio with a cosine function, and fade in the beginning portion of the second audio with a sine function.
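A cosine fade-out paired with a sine fade-in, as in fig. 3a, can be sketched as gain curves. A convenient property of this particular pairing (not stated in the text, but a common reason for choosing it) is that the squared gains sum to 1 at every instant, giving an equal-power crossfade.

```python
import math

def fade_out_gain(t, duration):
    """Cosine fade-out: gain 1 at t = 0, gain 0 at t = duration."""
    return math.cos(math.pi * t / (2.0 * duration))

def fade_in_gain(t, duration):
    """Sine fade-in: gain 0 at t = 0, gain 1 at t = duration."""
    return math.sin(math.pi * t / (2.0 * duration))

def apply_fades(tail, head):
    """Apply the fade-out to the earlier audio's tail and the
    fade-in to the later audio's head (equal-length sample lists)."""
    n = len(tail)
    faded_tail = [s * fade_out_gain(i, n - 1) for i, s in enumerate(tail)]
    faded_head = [s * fade_in_gain(i, n - 1) for i, s in enumerate(head)]
    return faded_tail, faded_head
```

Because cos² + sin² = 1, summing the two faded signals preserves total power through the transition.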
In a possible implementation manner, the playing order of the first audio and the second audio is obtained. The electronic device may determine the playing order according to audio duration: for example, if the duration of the first audio is greater than that of the second audio, the electronic device determines that the first audio is played earlier and the second audio later. Alternatively, the electronic device determines the playing order according to audio style: for example, if the style of the first audio is style 1 and that of the second audio is style 2, and the specified rule places style 1 before style 2, the electronic device determines that the first audio is played earlier and the second audio later. If instead the second audio precedes the first, the electronic device fades out the ending portion of the second audio and fades in the beginning portion of the first audio. The ending portion of the second audio is the audio segment spanning a third time period traced back from the chorus end time of the second audio; the beginning portion of the first audio is the audio segment spanning a fourth time period extending forward from the chorus start time of the first audio.
For example, fig. 3b is a schematic diagram of another smoothing of the first audio and the second audio according to an embodiment of the present invention. In fig. 3b, the dotted circle at the upper end of the double-headed arrow marks the fade-out end time of the second audio, which may specifically be the chorus end time of the second audio; the solid circle at the lower end marks the fade-in end time of the first audio, which may specifically be the chorus start time of the first audio. The electronic device may fade out the ending portion of the second audio with a cosine function, and fade in the beginning portion of the first audio with a sine function.
It should be noted that, besides fade-in/fade-out processing, the electronic device may also smooth the first audio and the second audio by seamless transition: specifically, the electronic device may overlap-add the two audios, i.e., superimpose the last n beats (n being a positive integer) of the ending portion of the first audio onto the first n beats of the beginning portion of the second audio, achieving a seamless connection between the first audio and the second audio.
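The overlap-add described here can be sketched on plain sample lists. For simplicity the overlap region is given directly as a sample count rather than derived from n beats, which is an assumption of the sketch.

```python
def overlap_add(first, second, overlap):
    """Seamless join: superimpose the last `overlap` samples of `first`
    onto the first `overlap` samples of `second`."""
    body = first[:-overlap]                                  # unchanged part of `first`
    mixed = [a + b for a, b in zip(first[-overlap:], second[:overlap])]
    return body + mixed + second[overlap:]

first = [1, 1, 1, 1]
second = [2, 2, 2, 2]
result = overlap_add(first, second, 2)
# result: [1, 1, 3, 3, 2, 2] — the two middle samples carry both audios
```

The joined length is `len(first) + len(second) - overlap`, which is what makes the transition seamless rather than a hard cut followed by the next track.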
In this way, the electronic device smooths the first audio and the second audio so that the transition between them is smoother, enhancing the user experience.
Step S240: and overlapping the first audio subjected to the smoothing processing and the second audio subjected to the smoothing processing to form a target audio.
In one possible implementation, the end time point of the fade-out of the first audio corresponds to a downbeat in the first audio (called the first downbeat to distinguish it from a downbeat of the second audio), and the end time point of the fade-in of the second audio corresponds to a downbeat in the second audio (called the second downbeat to distinguish it from a downbeat of the first audio). The electronic device aligns the end time point of the fade-out of the first audio with the end time point of the fade-in of the second audio, and makes the first downbeat coincide with the second downbeat, obtaining the target audio, i.e., the string-burned audio.
In one possible implementation, the end time point of the fade-out of the second audio corresponds to a downbeat in the second audio (called the third downbeat to distinguish it from a downbeat of the first audio), and the end time point of the fade-in of the first audio corresponds to a downbeat in the first audio (called the fourth downbeat to distinguish it from a downbeat of the second audio). The electronic device aligns the end time point of the fade-out of the second audio with the end time point of the fade-in of the first audio, and makes the third downbeat coincide with the fourth downbeat, obtaining the target audio, i.e., the string-burned audio.
As shown in fig. 4, fig. 4 is a schematic diagram of a target audio according to an embodiment of the present invention. The electronic device overlaps the smoothed first audio and the smoothed second audio: specifically, it aligns the end time point of the fade-out of the first audio with the end time point of the fade-in of the second audio, and vertically adds the time sequence of the first audio to the time sequence of the second audio to obtain the target audio. The target audio may be referred to as a string-burned audio.
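The alignment-and-vertical-addition step can be sketched as follows, assuming both audios are sample lists placed on a shared timeline and that the fade-out end of the first audio is not earlier than the fade-in end of the second (so the offset is non-negative). These assumptions, and the sample-index interface, are the sketch's, not the embodiment's.

```python
def align_and_sum(first, second, first_fade_end, second_fade_end):
    """Place `second` on the shared timeline so that its fade-in end sample
    coincides with the fade-out end sample of `first`, then add sample-wise.
    Assumes first_fade_end >= second_fade_end."""
    offset = first_fade_end - second_fade_end       # where `second` starts
    total = max(len(first), offset + len(second))
    target = [0.0] * total
    for i, s in enumerate(first):
        target[i] += s
    for i, s in enumerate(second):
        target[offset + i] += s
    return target

# first fades out ending at index 4; second fades in ending at index 2
target = align_and_sum([1.0] * 6, [2.0] * 5, first_fade_end=4, second_fade_end=2)
# target: [1.0, 1.0, 3.0, 3.0, 3.0, 3.0, 2.0]
```

Index 4 of the result carries both tracks, which is exactly the alignment the figure depicts: the two marked end points fall on the same instant of the shared timeline.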
In a possible implementation manner, after forming the target audio by overlapping the smoothed first audio and the smoothed second audio, the electronic device selects a third audio from the string burning candidate pool and acquires the music retrieval information of the target audio and of the third audio, where the music retrieval information includes beat information and time information. If the similarity between the beat information of the target audio and that of the third audio is greater than the similarity threshold, the target audio is smoothed according to its time information and the third audio according to its time information, and the smoothed target audio and the smoothed third audio are overlapped to form a new string-burned audio.
In a specific implementation, the electronic device takes the target audio as a new first audio and the third audio as a new second audio, and then re-executes the step of generating a target audio from a first audio and a second audio. As shown in fig. 5, after generating the target audio from the first audio and the second audio, the electronic device selects a third audio from the string burning candidate pool and combines the target audio with the third audio into a new string-burned audio; proceeding in this way, the electronic device can splice any number of audio segments into a string-burned audio of any duration. For example, the number of spliced audios may be limited by a preset threshold, so that splicing stops once the number of audios contained in the string-burned audio reaches the threshold. Alternatively, termination may be decided by other conditions: for example, the duration of the spliced string-burned audio may be limited by a preset threshold, so that splicing stops once the duration reaches the threshold.
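The iterative splicing loop, with a track-count threshold as the stop condition, can be sketched as below. Treating tracks as plain sample lists and joining them by a fixed-length overlap-add are simplifying assumptions of the sketch.

```python
def splice(target, nxt, overlap=2):
    """One splice step: overlap-add the join region of the current
    result (the running 'first audio') and the next track."""
    mixed = [a + b for a, b in zip(target[-overlap:], nxt[:overlap])]
    return target[:-overlap] + mixed + nxt[overlap:]

def build_string_burned(pool, max_tracks=3, overlap=2):
    """Keep treating the current result as the new 'first audio' and splicing
    the next pool track onto it until the track-count threshold is reached."""
    target, count = pool[0], 1
    for nxt in pool[1:]:
        if count >= max_tracks:
            break                       # preset threshold reached: stop splicing
        target = splice(target, nxt, overlap)
        count += 1
    return target, count

pool = [[1] * 4, [2] * 4, [3] * 4]
medley, used = build_string_burned(pool)
# medley: [1, 1, 3, 3, 5, 5, 3, 3] — three 4-sample tracks, two 2-sample joins
```

A duration-based stop condition would simply replace the `count >= max_tracks` test with a check on `len(target)`.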
According to the data processing method provided by the embodiment of the present invention, high-quality audios are first screened out by a preset screening rule and placed into the string burning candidate pool; the electronic device then selects audios with similar BPM from the pool, smooths them respectively, and overlaps the smoothed audios to obtain the string-burned audio. In this way, high-quality string-burned audio can be generated, enriching the user's listening experience; moreover, compared with manual production, the string-burned audio can be produced more efficiently, reducing the consumption of manpower and material resources and lowering the economic cost.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention. The data processing apparatus is configured to perform the steps performed by the electronic device in the method embodiments corresponding to figs. 2 to 5, and may include:
a selecting unit 610, configured to select a first audio and a second audio from a string burning candidate pool;
an obtaining unit 620, configured to obtain music retrieval information of the first audio and music retrieval information of the second audio, where the music retrieval information includes beat information and time information;
a processing unit 630, configured to, if a similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold, perform smoothing on the first audio according to the time information of the first audio, and perform smoothing on the second audio according to the time information of the second audio;
the processing unit 630 is further configured to perform overlapping processing on the smoothed first audio and the smoothed second audio to form a target audio.
In a possible implementation manner, before the selecting unit 610 selects the first audio and the second audio from the string burning candidate pool, a step of constructing the string burning candidate pool is further performed; wherein the step of constructing the string burning candidate pool includes:
the obtaining unit 620 obtains the candidate pool audio from a pre-constructed audio information base;
the processing unit 630 performs feature extraction on the candidate pool audio to obtain feature information of the candidate pool audio;
the processing unit 630 determines whether the candidate pool audio meets the quality requirement according to the feature information of the candidate pool audio, and if the candidate pool audio meets the quality requirement, places the candidate pool audio into the string burning candidate pool.
In a possible implementation manner, the candidate pool audio includes a plurality of time points, the feature information of the candidate pool audio includes music retrieval information of the candidate pool audio, and the time information in the music retrieval information includes a chorus start time and a chorus end time;
the processing unit 630 performs feature extraction on the candidate pool audio to obtain feature information of the candidate pool audio, including:
acquiring prediction probability values respectively corresponding to the time points contained in the candidate pool audio;
determining the time point corresponding to the maximum probability value among the plurality of prediction probability values as the chorus start time;
and determining the time point corresponding to the minimum probability value among the plurality of prediction probability values as the chorus end time.
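The max/min-probability rule for locating the chorus can be sketched directly. Where the per-time-point prediction probabilities come from (for example, a trained model) is outside this sketch; the toy values below are purely illustrative.

```python
def chorus_times(pred_probs):
    """Chorus start = time point with the maximum prediction probability;
    chorus end = time point with the minimum prediction probability."""
    start = max(range(len(pred_probs)), key=pred_probs.__getitem__)
    end = min(range(len(pred_probs)), key=pred_probs.__getitem__)
    return start, end

# toy per-second probabilities (illustrative values only)
probs = [0.10, 0.30, 0.90, 0.60, 0.05]
start, end = chorus_times(probs)
# start = 2 (probability 0.90), end = 4 (probability 0.05)
```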
In a possible implementation manner, the feature information of the candidate pool audio includes an audio name, an audio format, and a channel count of the candidate pool audio;
the processing unit 630 determines whether the candidate pool audio meets the quality requirement according to the feature information of the candidate pool audio, which includes:
determining whether the audio name of the candidate pool audio is unique, whether the audio format of the candidate pool audio is a preset format, and whether the channel count of the candidate pool audio is a preset channel count;
and if the audio name of the candidate pool audio is unique, the audio format is the preset format, and the channel count is the preset channel count, judging that the candidate pool audio meets the quality requirement.
In one possible implementation manner, the processing unit 630 performs smoothing on the first audio according to the time information of the first audio and performs smoothing on the second audio according to the time information of the second audio, including:
determining, between the first audio and the second audio, the audio played earlier and the audio played later;
applying fade-out processing to the ending portion of the audio played earlier, and fade-in processing to the beginning portion of the audio played later;
where the ending portion of the audio played earlier is the audio segment spanning a first time period traced back from the chorus end time of that audio, and the beginning portion of the audio played later is the audio segment spanning a second time period extending forward from the chorus start time of that audio.
In a possible implementation manner, the end time point of the fade-out of the audio played earlier corresponds to a downbeat in that audio, and the end time point of the fade-in of the audio played later corresponds to a downbeat in that audio.
In one possible implementation manner, the processing unit 630 performs an overlapping process on the smoothed first audio and the smoothed second audio to form a target audio, including:
aligning the end time point of the fade-out of the audio played earlier with the end time point of the fade-in of the audio played later;
and vertically adding the aligned audios to obtain the target audio.
In a possible implementation manner, the processing unit 630 vertically adds the aligned audio played earlier and audio played later to obtain the target audio, including:
making the downbeat in the audio played earlier coincide with the downbeat in the audio played later to obtain the target audio.
According to the data processing apparatus provided by the embodiment of the present invention, high-quality audios are first screened out by a preset screening rule and placed into the string burning candidate pool; the electronic device then selects audios with similar BPM from the pool, smooths them respectively, and overlaps the smoothed audios to obtain the string-burned audio. In this way, high-quality string-burned audio can be generated, enriching the user's listening experience; moreover, compared with manual production, the string-burned audio can be produced more efficiently, reducing the consumption of manpower and material resources and lowering the economic cost.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device is configured to execute the steps executed by the electronic device in the method embodiments corresponding to figs. 2 to 5, and includes: one or more processors 710, one or more input devices 720, one or more output devices 730, and a memory 740, connected by a bus 750. The memory 740 is used to store a computer program comprising program instructions, and the processor 710 is used to execute the program instructions stored in the memory 740 to perform the following operations:
selecting a first audio and a second audio from a string burning candidate pool; acquiring music retrieval information of the first audio and music retrieval information of the second audio, wherein the music retrieval information comprises beat information and time information; if the similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold, smoothing the first audio according to the time information of the first audio, and smoothing the second audio according to the time information of the second audio; and overlapping the first audio subjected to the smoothing processing and the second audio subjected to the smoothing processing to form a target audio.
In a possible implementation manner, before the processor 710 selects the first audio and the second audio from the string burning candidate pool, a step of constructing the string burning candidate pool is further included; wherein the step of constructing the string burning candidate pool includes:
acquiring candidate pool audio from a pre-constructed audio information base;
performing feature extraction on the candidate pool audio to obtain feature information of the candidate pool audio;
and determining whether the candidate pool audio meets the quality requirement according to the feature information of the candidate pool audio, and if the candidate pool audio meets the quality requirement, placing the candidate pool audio into the string burning candidate pool.
In a possible implementation manner, the candidate pool audio includes a plurality of time points, the feature information of the candidate pool audio includes music retrieval information of the candidate pool audio, and the time information in the music retrieval information includes a chorus start time and a chorus end time;
the processor 710 performs feature extraction on the candidate pool audio to obtain feature information of the candidate pool audio, including:
acquiring prediction probability values respectively corresponding to the time points contained in the candidate pool audio;
determining the time point corresponding to the maximum probability value among the plurality of prediction probability values as the chorus start time;
and determining the time point corresponding to the minimum probability value among the plurality of prediction probability values as the chorus end time.
In a possible implementation manner, the feature information of the candidate pool audio includes an audio name, an audio format, and a channel count of the candidate pool audio;
the processor 710 determines whether the candidate pool audio meets the quality requirement according to the feature information of the candidate pool audio, including:
determining whether the audio name of the candidate pool audio is unique, whether the audio format of the candidate pool audio is a preset format, and whether the channel count of the candidate pool audio is a preset channel count;
and if the audio name of the candidate pool audio is unique, the audio format is the preset format, and the channel count is the preset channel count, judging that the candidate pool audio meets the quality requirement.
In one possible implementation, the smoothing of the first audio by the processor 710 according to the time information of the first audio and the smoothing of the second audio according to the time information of the second audio includes:
determining, between the first audio and the second audio, the audio played earlier and the audio played later;
applying fade-out processing to the ending portion of the audio played earlier, and fade-in processing to the beginning portion of the audio played later;
where the ending portion of the audio played earlier is the audio segment spanning a first time period traced back from the chorus end time of that audio, and the beginning portion of the audio played later is the audio segment spanning a second time period extending forward from the chorus start time of that audio.
In a possible implementation manner, the end time point of the fade-out of the audio played earlier corresponds to a downbeat in that audio, and the end time point of the fade-in of the audio played later corresponds to a downbeat in that audio.
In one possible implementation, the processor 710 performs an overlapping process on the smoothed first audio and the smoothed second audio to form a target audio, including:
aligning the end time point of the fade-out of the audio played earlier with the end time point of the fade-in of the audio played later;
and vertically adding the aligned audios to obtain the target audio.
In one possible implementation manner, the processor 710 vertically adds the aligned audio played earlier and audio played later to obtain the target audio, including:
making the downbeat in the audio played earlier coincide with the downbeat in the audio played later to obtain the target audio.
According to the electronic device provided by the embodiment of the present invention, high-quality audios are first screened out by a preset screening rule and placed into the string burning candidate pool; the electronic device then selects audios with similar BPM from the pool, smooths them respectively, and overlaps the smoothed audios to obtain the string-burned audio. In this way, high-quality string-burned audio can be generated, enriching the user's listening experience; moreover, compared with manual production, the string-burned audio can be produced more efficiently, reducing the consumption of manpower and material resources and lowering the economic cost.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a processor, the steps performed by the electronic device in the foregoing embodiments may be performed.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium, and when executed, may include the processes of the above embodiments of the data processing method. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described with reference to a number of embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method of data processing, the method comprising:
selecting a first audio and a second audio from a medley candidate pool;
acquiring music retrieval information of the first audio and music retrieval information of the second audio, wherein the music retrieval information comprises beat information and time information;
if the similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold, smoothing the first audio according to the time information of the first audio, and smoothing the second audio according to the time information of the second audio;
and overlapping the smoothed first audio and the smoothed second audio to form a target audio.
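The smoothing-then-overlapping of claim 1 can be sketched as a crossfade over raw sample lists. The linear fade ramp and the fixed overlap length are assumptions; the claim does not prescribe a fade shape:

```python
def crossfade(first, second, overlap):
    """Fade out the tail of `first`, fade in the head of `second`,
    and add the two faded regions sample by sample (smoothing
    followed by overlapping, as in claim 1)."""
    out = list(first[:-overlap])
    for k in range(overlap):
        gain = k / overlap                              # ramps 0 -> 1 over the overlap
        faded_out = first[len(first) - overlap + k] * (1.0 - gain)
        faded_in = second[k] * gain
        out.append(faded_out + faded_in)
    out.extend(second[overlap:])
    return out

mix = crossfade([1.0] * 6, [0.5] * 6, overlap=4)
# len(mix) == 6 + 6 - 4 == 8
```

The overlapped region decays from the first track's level toward the second's, which is the audible effect the smoothing step is meant to produce.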
2. The method of claim 1, wherein before the selecting the first audio and the second audio from the medley candidate pool, the method further comprises a step of constructing the medley candidate pool; wherein the step of constructing the medley candidate pool comprises:
acquiring a candidate audio from a pre-constructed audio information base;
performing feature extraction on the candidate audio to obtain feature information of the candidate audio;
and judging, according to the feature information of the candidate audio, whether the candidate audio meets a quality requirement, and if the candidate audio meets the quality requirement, putting the candidate audio into the medley candidate pool.
3. The method according to claim 2, wherein the candidate audio comprises a plurality of time points, and the feature information of the candidate audio comprises a chorus start time and a chorus end time of the candidate audio;
the performing feature extraction on the candidate audio to obtain the feature information of the candidate audio comprises:
acquiring prediction probability values respectively corresponding to the time points contained in the candidate audio;
determining the time point corresponding to the maximum of the prediction probability values as the chorus start time;
and determining the time point corresponding to the minimum of the prediction probability values as the chorus end time.
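The chorus localization of claim 3 reduces to an argmax/argmin over per-time-point prediction probabilities. A minimal sketch, where the dict-of-probabilities representation is an assumption:

```python
def chorus_span(frame_probs):
    """Claim 3: the time point with the highest predicted probability
    is taken as the chorus start, the one with the lowest as the
    chorus end. `frame_probs` maps time points (seconds) to
    prediction probability values."""
    start = max(frame_probs, key=frame_probs.get)
    end = min(frame_probs, key=frame_probs.get)
    return start, end

probs = {10.0: 0.2, 42.5: 0.9, 73.0: 0.05, 95.0: 0.4}
span = chorus_span(probs)  # (42.5, 73.0)
```

How the probability values themselves are predicted (e.g. by a trained model) is outside the scope of this claim.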
4. The method of claim 2, wherein the feature information of the candidate audio comprises an audio name, an audio format, and a channel count of the candidate audio;
the judging, according to the feature information of the candidate audio, whether the candidate audio meets the quality requirement comprises:
judging whether the audio name of the candidate audio is unique, whether the audio format of the candidate audio is a preset format, and whether the channel count of the candidate audio is a preset channel count;
and if the audio name of the candidate audio is unique, the audio format of the candidate audio is the preset format, and the channel count of the candidate audio is the preset channel count, judging that the candidate audio meets the quality requirement.
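The quality check of claim 4 is a conjunction of three tests. A sketch, with the preset format and channel count chosen arbitrarily for illustration:

```python
def passes_quality_gate(entry, pool_names, preset_format="mp3", preset_channels=2):
    """Claim 4: a candidate audio enters the medley candidate pool only
    if its name is unique among the pool, its format matches the
    preset format, and its channel count matches the preset count."""
    return (entry["name"] not in pool_names
            and entry["format"] == preset_format
            and entry["channels"] == preset_channels)

pool_names = {"song1"}
ok = passes_quality_gate({"name": "song2", "format": "mp3", "channels": 2}, pool_names)
dup = passes_quality_gate({"name": "song1", "format": "mp3", "channels": 2}, pool_names)
```

Here `ok` is true and `dup` is false (duplicate name), so only the first candidate would be placed into the pool.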
5. The method according to claim 1, wherein the time information comprises a chorus start time and a chorus end time; the smoothing the first audio according to the time information of the first audio and the smoothing the second audio according to the time information of the second audio comprise:
determining, of the first audio and the second audio, the audio that is earlier in the playing order and the audio that is later in the playing order;
performing fade-out processing on the end portion of the earlier audio, and performing fade-in processing on the beginning portion of the later audio;
wherein the end portion of the earlier audio refers to an audio segment spanning a first time period traced back from the chorus end time of the earlier audio, and the beginning portion of the later audio refers to an audio segment spanning a second time period extending forward from the chorus start time of the later audio.
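The segment boundaries recited in claim 5 follow directly from the chorus times. In this sketch, times are in seconds and a single shared fade length stands in for the first and second time periods (the claim allows them to differ):

```python
def fade_windows(first_chorus_end, second_chorus_start, fade_len):
    """Claim 5: the fade-out segment runs backward from the chorus end
    of the earlier track; the fade-in segment runs forward from the
    chorus start of the later track. Returns (fade_out, fade_in) as
    (start, end) pairs in seconds."""
    fade_out = (first_chorus_end - fade_len, first_chorus_end)
    fade_in = (second_chorus_start, second_chorus_start + fade_len)
    return fade_out, fade_in

windows = fade_windows(75.0, 30.0, 5.0)
# ((70.0, 75.0), (30.0, 35.0))
```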
6. The method according to claim 5, wherein the end time point of the fade-out processing of the earlier audio corresponds to a downbeat in the earlier audio, and the end time point of the fade-in processing of the later audio corresponds to a downbeat in the later audio.
7. The method of claim 6, wherein the overlapping the smoothed first audio and the smoothed second audio to form a target audio comprises:
aligning the end time point of the fade-out processing of the earlier audio with the end time point of the fade-in processing of the later audio;
and vertically adding the aligned earlier audio and later audio to obtain the target audio.
8. The method of claim 7, wherein the vertically adding the aligned earlier audio and later audio to obtain the target audio comprises:
making the downbeats in the earlier audio coincide with the downbeats in the later audio to obtain the target audio.
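Claims 6 to 8 overlap the two tracks so that their downbeats coincide and then sum the overlapping samples ("vertical addition"). A sketch on raw sample lists, with downbeat positions given as sample indices (an assumption; the patent recites time points):

```python
def overlap_at_downbeats(first, second, db_first, db_second):
    """Claims 6-8: delay the later track so that one of its downbeat
    sample indices coincides with a downbeat index of the earlier
    track, then add the overlapping samples (vertical addition)."""
    shift = db_first - db_second          # samples by which to delay the second track
    length = max(len(first), shift + len(second))
    out = [0.0] * length
    for i, sample in enumerate(first):
        out[i] += sample
    for i, sample in enumerate(second):
        out[shift + i] += sample
    return out

mixed = overlap_at_downbeats([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5],
                             db_first=2, db_second=0)
# [1.0, 1.0, 1.5, 1.5, 0.5]
```

In practice the fades from claim 5 would already have been applied to the summed regions, so the addition does not clip.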
9. A data processing apparatus, comprising:
a selection unit, configured to select a first audio and a second audio from a medley candidate pool;
an acquisition unit, configured to acquire music retrieval information of the first audio and music retrieval information of the second audio, wherein the music retrieval information comprises beat information and time information;
a processing unit, configured to smooth the first audio according to the time information of the first audio and smooth the second audio according to the time information of the second audio if the similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold;
wherein the processing unit is further configured to overlap the smoothed first audio and the smoothed second audio to form a target audio.
10. An electronic device, comprising a memory and a processor, wherein the memory stores a set of program codes, and the processor calls the program codes stored in the memory to execute the data processing method according to any one of claims 1 to 8.
CN202010907240.5A 2020-09-01 2020-09-01 Data processing method and device and electronic equipment Active CN112037739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010907240.5A CN112037739B (en) 2020-09-01 2020-09-01 Data processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010907240.5A CN112037739B (en) 2020-09-01 2020-09-01 Data processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112037739A true CN112037739A (en) 2020-12-04
CN112037739B CN112037739B (en) 2024-02-27

Family

ID=73591005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010907240.5A Active CN112037739B (en) 2020-09-01 2020-09-01 Data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112037739B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113031903A (en) * 2021-03-23 2021-06-25 青岛海信移动通信技术股份有限公司 Electronic equipment and audio stream synthesis method thereof
WO2023010949A1 (en) * 2021-07-31 2023-02-09 华为技术有限公司 Method and apparatus for processing audio data

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07295581A (en) * 1994-04-21 1995-11-10 Fuji Electric Co Ltd Karaoke medley music composition device
WO2003032294A1 (en) * 2001-10-05 2003-04-17 Thomson Automatic music generation method and device and the applications thereof
JP2004053736A (en) * 2002-07-17 2004-02-19 Daiichikosho Co Ltd Method for using karaoke system
JP2008003269A (en) * 2006-06-22 2008-01-10 Matsushita Electric Ind Co Ltd Musical piece connection device
CN102903357A (en) * 2011-07-29 2013-01-30 华为技术有限公司 Method, device and system for extracting chorus of song
CN104778957A (en) * 2015-03-20 2015-07-15 广东欧珀移动通信有限公司 Song audio processing method and device
CN104778221A (en) * 2015-03-20 2015-07-15 广东欧珀移动通信有限公司 Music collaborate splicing method and device
FR3038440A1 (en) * 2015-07-02 2017-01-06 Soclip! METHOD OF EXTRACTING AND ASSEMBLING SONGS FROM MUSICAL RECORDINGS
CN107481706A (en) * 2017-08-08 2017-12-15 腾讯音乐娱乐(深圳)有限公司 Song medley method and device
CN107622775A (en) * 2015-03-20 2018-01-23 广东欧珀移动通信有限公司 The method and Related product of Noise song splicing
CN108831424A (en) * 2018-06-15 2018-11-16 广州酷狗计算机科技有限公司 Audio splicing method, apparatus and storage medium
CN110019922A (en) * 2017-12-07 2019-07-16 北京雷石天地电子技术有限公司 Audio climax recognition method and device
CN110377786A (en) * 2019-07-24 2019-10-25 中国传媒大学 Music emotion classification method
CN111459370A (en) * 2020-05-09 2020-07-28 Oppo广东移动通信有限公司 Song playing control method and device and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI Wei; FENG Xiangyi; WU Yiming; ZHANG Xulong: "A Survey of Main Melody Extraction Techniques for Popular Music", Computer Science, no. 05, 15 May 2017 (2017-05-15) *

Also Published As

Publication number Publication date
CN112037739B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
CN104620313B (en) Audio signal analysis
CN104395953B (en) The assessment of bat, chord and strong beat from music audio signal
EP1743286B1 (en) Feature extraction in a networked portable device
US8060008B2 (en) Mobile station and interface adapted for feature extraction from an input media sample
US20100205222A1 (en) Music profiling
US20050254366A1 (en) Method and apparatus for selecting an audio track based upon audio excerpts
CN111798821B (en) Sound conversion method, device, readable storage medium and electronic equipment
CN112037739B (en) Data processing method and device and electronic equipment
WO2015114216A2 (en) Audio signal analysis
KR20160106075A (en) Method and device for identifying a piece of music in an audio stream
CN106055659B (en) Lyric data matching method and equipment thereof
TW200813759A (en) A method and apparatus for accessing an audio file from a collection of audio files using tonal matching
CN103988256A (en) Sound processing device, sound processing method, program, recording medium, server device, sound replay device, and sound processing system
CN112399247A (en) Audio processing method, audio processing device and readable storage medium
CN111640411A (en) Audio synthesis method, device and computer readable storage medium
JP2002258874A (en) Method and system for trial listening to music, information treminal and music retrieval server
CN109271501A (en) A kind of management method and system of audio database
CN105575400A (en) Method, terminal, server, and system for obtaining song information
EP3644306B1 (en) Methods for analyzing musical compositions, computer-based system and machine readable storage medium
CN109510907B (en) Ring tone setting method and device
CN107025902B (en) Data processing method and device
CN110070891A (en) A kind of song recognition method, apparatus and storage medium
CN106066780B (en) Running data processing method and device
CN110400559B (en) Audio synthesis method, device and equipment
KR100849848B1 (en) Apparatus and method for outputting voice

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant