CN111862913A - Method, device, equipment and storage medium for converting voice into rap music - Google Patents


Info

Publication number
CN111862913A
Authority
CN
China
Prior art keywords: alignment, rhythm, information, music, period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010688502.3A
Other languages
Chinese (zh)
Other versions
CN111862913B (en)
Inventor
徐雯 (Xu Wen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Baiguoyuan Information Technology Co Ltd
Original Assignee
Guangzhou Baiguoyuan Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Baiguoyuan Information Technology Co Ltd filed Critical Guangzhou Baiguoyuan Information Technology Co Ltd
Priority to CN202010688502.3A priority Critical patent/CN111862913B/en
Publication of CN111862913A publication Critical patent/CN111862913A/en
Priority to PCT/CN2021/095236 priority patent/WO2022012164A1/en
Application granted granted Critical
Publication of CN111862913B publication Critical patent/CN111862913B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
        • G10: MUSICAL INSTRUMENTS; ACOUSTICS
            • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
                • G10H 1/00: Details of electrophonic musical instruments
                    • G10H 1/0008: Associated control or indicating means
                        • G10H 1/0025: Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
                • G10H 2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
                    • G10H 2210/101: Music composition or musical creation; tools or processes therefor
                        • G10H 2210/111: Automatic composing, i.e. using predefined musical rules
                    • G10H 2210/341: Rhythm pattern selection, synthesis or composition
            • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L 15/00: Speech recognition
                    • G10L 15/04: Segmentation; word boundary detection
                • G10L 21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
                    • G10L 21/02: Speech enhancement, e.g. noise reduction or echo cancellation
                        • G10L 21/0208: Noise filtering
                • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 to G10L 21/00
                    • G10L 25/78: Detection of presence or absence of voice signals
                        • G10L 25/87: Detection of discrete points within a voice signal
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02B: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
                • Y02B 20/00: Energy efficient lighting technologies, e.g. halogen lamps or gas discharge lamps
                    • Y02B 20/40: Control techniques providing energy savings, e.g. smart controller or presence detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention disclose a method, apparatus, device and storage medium for converting speech into rap music. The method comprises: recognizing an obtained speech segment and processing selected background music to obtain character attribute information for the characters in the speech segment and music rhythm information for the background music; determining, according to the character attribute information and the music rhythm information, at least one alignment period for aligning the speech segment with the background music, and obtaining an alignment information table for each alignment period; and, according to each alignment information table, aligning the characters in the speech segment with rhythm points in the background music, then forming rap audio after pitch-shift adjustment and special-effect processing. The method converts freely recorded speech into a rap segment matched to the background music without restricting the content to be converted: it ensures the speech can be recorded freely, simplifies the conversion process, avoids misalignment between spoken characters and musical rhythm points, and broadens the applicability of speech-to-rap conversion.

Description

Method, device, equipment and storage medium for converting voice into rap music
Technical Field
Embodiments of the invention relate to the technical field of music production, and in particular to a method, apparatus, device and storage medium for converting speech into rap music.
Background
With the popularity of karaoke software, research on pitch-shifting and voice-to-music algorithms has attracted wide attention, and interest in automatically tuning speech and turning talking into singing keeps growing. In recent years rap culture has entered the public eye. Rap music is characterized by a performer rapidly and rhythmically delivering a series of rhymes over background music. Producing rap music is typically a complicated process: for most people without audio-processing experience it requires learning professional audio software, and the manual editing involved is time-consuming.
To address this, some speech-conversion software aimed at non-specialists has appeared, but existing tools each have drawbacks. One speech-to-rap scheme requires the user to read specific lyrics aloud; because the lyrics are fully matched to the background music, the alignment between each character and its rhythm point is fixed, and the scheme cannot handle lyrics of unknown content and length. This restricts the user's creative freedom and limits the scheme's applicability. Another scheme uses complex algorithms for audio segmentation and audio alignment, which increases conversion difficulty; it still suffers from misalignment between spoken characters and musical rhythm points, and it handles music uploaded by the user poorly.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, apparatus, device and storage medium for converting speech into rap music, so as to solve the problems of restricted speech content and poor conversion quality in existing speech-conversion schemes.
In a first aspect, an embodiment of the present invention provides a method for converting speech into rap music, including:
recognizing an obtained speech segment and processing selected background music to obtain character attribute information for the characters in the speech segment and music rhythm information for the background music;
determining at least one alignment period for aligning the speech segment with the background music according to the character attribute information and the music rhythm information, and obtaining an alignment information table for each alignment period;
and controlling alignment of the characters in the speech segment with rhythm points in the background music according to each alignment information table, and forming rap audio after pitch-shift adjustment and special-effect processing.
In a second aspect, an embodiment of the present invention provides an apparatus for converting speech into rap music, including:
an information determining module, configured to recognize an obtained speech segment and process selected background music to obtain character attribute information for the characters in the speech segment and music rhythm information for the background music;
an alignment information determining module, configured to determine at least one alignment period for aligning the speech segment with the background music according to the character attribute information and the music rhythm information, and to obtain an alignment information table for each alignment period;
and a conversion control module, configured to control alignment of the characters in the speech segment with rhythm points in the background music according to each alignment information table, and to form rap audio after pitch-shift adjustment and special-effect processing.
In a third aspect, an embodiment of the present invention provides a computer device, including:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for converting speech into rap music provided by the first aspect of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for converting speech into rap music provided by an embodiment of the first aspect of the present invention.
With the method, apparatus, device and storage medium provided by the embodiments of the invention, an obtained speech segment is first recognized and the selected background music processed to obtain character attribute information for the characters in the speech segment and music rhythm information for the background music; at least one alignment period for matching the speech segment to the background music is then determined from the character attribute information and the music rhythm information, and an alignment information table is obtained for each period; finally, the characters in the speech segment are aligned with the rhythm points in the background music according to each alignment information table, and rap audio is formed after pitch-shift adjustment and special-effect processing. This scheme converts a freely recorded speech segment into a rap segment matched to the background music, removes the need for tedious manual audio editing, and makes rap production feasible for non-specialists. Compared with existing speech-to-rap methods, it places no restriction on the content to be converted, guarantees free recording of that content, simplifies the conversion process, avoids misalignment between spoken characters and musical rhythm points, and broadens the applicability of speech-to-rap conversion.
Drawings
Fig. 1 is a schematic flow chart illustrating a method for converting speech into rap music according to an embodiment of the present invention;
fig. 2 is a schematic flow chart illustrating a method for converting speech into rap music according to a second embodiment of the present invention;
FIG. 3 is a flowchart of determining an alignment period in the method for converting speech into rap music according to this embodiment;
FIG. 4 is a flowchart of determining an alignment unit, and its alignment unit information, within an alignment period in the method according to this embodiment;
FIG. 5 is a flowchart of another implementation of determining an alignment unit, and its alignment unit information, within an alignment period;
fig. 6 is a block diagram illustrating a structure of an apparatus for converting speech into rap music according to a third embodiment of the present invention;
fig. 7 is a schematic diagram of a hardware structure of a computer device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
In the description of the present application, it is to be understood that the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not necessarily used to describe a particular order or sequence, nor are they to be construed as indicating or implying relative importance. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
Embodiment One
Fig. 1 is a flowchart of a method for converting speech into rap music according to the first embodiment of the present invention. The method is applicable to converting a speech segment recorded by a user into rap music, and may be executed by an apparatus for converting speech into rap music; the apparatus may be implemented in software and/or hardware and is typically integrated on a computer device.
In a typical application, a background-music selection interface is first presented to the user, so that the user's chosen background music is obtained; a speech-input interface is then presented, through which a speech segment is obtained either recorded in real time via a record button or uploaded pre-recorded via an upload button. The method described below then converts the obtained speech segment into a rap segment matched to the background music.
As shown in fig. 1, a method for converting speech into rap music provided by an embodiment of the present invention specifically includes the following operations:
s101, recognizing the obtained voice segment and processing the selected background music to obtain character attribute information of characters in the voice segment and music rhythm information of the background music.
In this embodiment, the obtained speech segment is a segment recorded by the user, in real time or in advance, before this step is performed, and the selected background music is the piece the user chose from the background-music library before this step is performed.
In this step, speech recognition is performed on the speech segment to obtain character attribute information such as each character's sequence number, pronunciation duration (character start and end times), and the start position of its first vowel. Rhythm detection is also performed on the background music to obtain music rhythm information such as the total number of rhythm points, the position of each rhythm point, and the number of rhythm points contained in each divided beat cycle.
Note that this embodiment does not limit the specific techniques used for speech recognition, character detection, or rhythm-point detection, as long as the required character attribute information and music rhythm information can be obtained.
S102, determining at least one alignment period for aligning the speech segment with the background music according to the character attribute information and the music rhythm information, and obtaining an alignment information table for each alignment period.
Beyond the speech recognition and beat detection of the preceding step, the key operation in turning user speech into a music segment is aligning the characters of the speech segment with the rhythm points of the background music. Alignment can be viewed as splitting the speech into individual characters and placing each character on a strong, regularly recurring accent point; some initial or middle characters may additionally be repeated to reinforce the rhythm. Accordingly, when this embodiment converts speech into rap music, this step determines the alignment period(s) and the corresponding alignment information table used to align the speech segment with the background music.
Specifically, an alignment period is the smallest repeating unit that contains enough rhythm points to align with all characters in the speech segment; that is, from some starting time, the rhythm of the background music repeats with a fixed period T whose rhythm points can accommodate all the characters. An alignment information table is a table recording, for one alignment period, the correspondence between the rhythm points and the characters to be aligned (e.g. rhythm-point sequence number and character sequence number) and the speed-change ratio applied when aligning each pair.
The specific implementation of this step can be expressed as:
the total number of texts included in the speech segment can be determined from the text attribute information, the total number of rhythm points of the rhythm points included in the background music can be determined from the music rhythm information, and the cycle information of the beat cycle formed by dividing the rhythm points can be determined. The beat cycle is understood to be a minimum rhythm repetition unit found from the rhythm point, i.e. the rhythm of the background music repeats with a fixed cycle Z from a certain time.
Then, from the total number of characters and the number of rhythm points contained in one beat cycle, it is determined whether the existing beat cycle can serve as an alignment period. If so, each beat cycle is used directly as an alignment period; if not, the period length is extended until a period that can serve as an alignment period is obtained.
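The period-selection rule just described can be sketched as follows. This is an illustration of one plausible reading, not the patent's exact algorithm: the function name and the strategy of merging whole consecutive beat cycles until one period holds enough rhythm points are assumptions.

```python
import math

def choose_alignment_period(total_chars, points_per_cycle, cycle_seconds):
    """Return (alignment-period length in seconds, rhythm points per period).

    Merges k consecutive beat cycles so that a single alignment period
    contains at least `total_chars` rhythm points. Hypothetical sketch.
    """
    if points_per_cycle <= 0:
        raise ValueError("a beat cycle must contain at least one rhythm point")
    k = math.ceil(total_chars / points_per_cycle)  # cycles to merge
    return k * cycle_seconds, k * points_per_cycle

# e.g. 10 characters, 4 rhythm points per 2-second beat cycle:
# three cycles are merged into a 6-second period with 12 points
```

When one beat cycle already holds enough points, `k` is 1 and the beat cycle itself is the alignment period, matching the "use each beat cycle directly" branch above.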
Next, because every alignment period repeats the same rhythm, one alignment period can be selected arbitrarily. For each character in the speech segment, the rhythm point it should align to within that period is determined from the character's start and end times, the start position of its first vowel, and the rhythm-point information of the period, together with the speed-change ratio required for the alignment. This yields an information table containing rhythm-point sequence numbers, the associated character sequence numbers, and the corresponding speed-change ratios, which is used as the alignment information table of that alignment period.
Finally, that table serves as the alignment information table for every complete alignment period; for an incomplete alignment period, the relevant part of the table is extracted to form its own table. This step therefore yields at least one alignment period and an alignment information table for each.
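A minimal sketch of building such a table, under assumptions not stated in the patent: characters are paired with rhythm points sequentially, and the speed-change ratio is taken as the character's duration divided by the gap to the next rhythm point (a ratio above 1 means the character must be sped up to fit).

```python
def build_alignment_table(chars, beat_times):
    """chars: list of dicts with 'seq', 'start', 'end' (seconds).
    beat_times: rhythm-point start times within one alignment period.
    Returns one row per character with the rhythm-point sequence number,
    character sequence number, and speed-change ratio. Sequential pairing
    and this ratio definition are illustrative assumptions."""
    table = []
    for i, ch in enumerate(chars):
        if i + 1 < len(beat_times):
            beat_gap = beat_times[i + 1] - beat_times[i]
        else:
            # last character: reuse the final beat interval
            beat_gap = beat_times[-1] - beat_times[-2]
        duration = ch["end"] - ch["start"]
        table.append({"point_seq": i + 1, "char_seq": ch["seq"],
                      "ratio": duration / beat_gap})
    return table
```

For example, a 0.5-second character landing on beats 0.25 s apart gets a ratio of 2.0, i.e. it must be played at double speed.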
S103, controlling alignment of the characters in the speech segment with the rhythm points in the background music according to each alignment information table, and forming rap audio after pitch-shift adjustment and special-effect processing.
In this step, the matched characters and rhythm points are read directly from the alignment periods (formed by dividing the background music's rhythm points) and the alignment information tables (which record the character-to-rhythm-point correspondence of the speech segment). The characters in the speech segment are thereby aligned to the rhythm points in the background music, and the aligned audio is speed-changed according to the corresponding speed-change ratio.
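The patent does not specify the speed-change mechanism. As a toy illustration only, uniform linear-interpolation resampling changes playback speed by a given ratio; note it also shifts pitch, whereas a production converter would more likely use a time-scale modification technique such as PSOLA or a phase vocoder to change speed and pitch independently.

```python
def resample_speed(samples, ratio):
    """Naive speed change by linear-interpolation resampling.
    ratio > 1 speeds playback up (fewer output samples).
    This also transposes pitch; it is a sketch, not the patent's method."""
    n_out = max(1, int(len(samples) / ratio))
    out = []
    for j in range(n_out):
        pos = j * ratio               # fractional read position
        i = int(pos)
        frac = pos - i
        a = samples[min(i, len(samples) - 1)]
        b = samples[min(i + 1, len(samples) - 1)]
        out.append(a + (b - a) * frac)  # linear interpolation
    return out
```

Applying a ratio of 2.0 halves the sample count (double speed); a ratio of 0.5 doubles it.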
In the method for converting speech into rap music provided by this embodiment, the obtained speech segment is first recognized and the selected background music processed to obtain character attribute information for the characters in the speech segment and music rhythm information for the background music; at least one alignment period for matching the speech segment to the background music is then determined from that information, and an alignment information table is obtained for each period; finally, alignment of the characters in the speech segment with the rhythm points in the background music is controlled according to each alignment information table, and rap audio is formed after pitch-shift adjustment and special-effect processing. This converts a freely recorded speech segment into a rap segment matched to the background music, removes the need for tedious manual audio editing, and makes rap production feasible for non-specialists. Compared with existing speech-to-rap methods, it places no restriction on the content to be converted, guarantees free recording of that content, simplifies the conversion process, avoids misalignment between spoken characters and musical rhythm points, and broadens the applicability of speech-to-rap conversion.
As an optional refinement of the first embodiment, before determining the alignment period(s) and obtaining the alignment information tables, the method further includes: if the total number of characters in the character attribute information is greater than the total number of rhythm points in the music rhythm information, ending the speech-to-rap conversion and prompting the user to obtain the speech segment or the background music again.
Note that S102 and S103 implicitly assume that the total number of characters obtained in S101 does not exceed the total number of rhythm points in the music rhythm information. When this condition is not satisfied, conversion cannot proceed: the subsequent steps are abandoned and the user is prompted to re-record the speech segment. Alternatively, the user may instead be prompted to reselect the background music.
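The guard described above amounts to a single comparison; a minimal sketch (the function name and exception type are illustrative):

```python
def check_convertible(total_chars, total_rhythm_points):
    """Conversion proceeds only when the speech segment's character
    count does not exceed the number of rhythm points in the music."""
    if total_chars > total_rhythm_points:
        raise ValueError(
            "speech has more characters than the music has rhythm points: "
            "re-record a shorter segment or choose other background music")
    return True
```

In a real application the raised error would be surfaced to the user as the re-record or reselect prompt.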
This optional operation ensures that the speech segment to be converted can actually be matched to the background music, improving the user experience of speech-to-rap conversion.
Embodiment Two
Fig. 2 is a flowchart of a method for converting speech into rap music according to the second embodiment of the present invention, which refines the first embodiment. In this embodiment, recognizing the obtained speech segment and processing the selected background music is further specified as: performing noise reduction and endpoint detection on the user's speech segment, then, via speech recognition of the processed segment, obtaining each character's sequence number, start and end times, first-vowel start position, and the total character count, to form the character attribute information of the speech segment; and performing rhythm-point detection and beat-cycle division on the user's chosen background music, determining the total number of rhythm points, the rhythm-point sequence numbers, and the cycle information of each beat cycle, to form the music rhythm information of the background music. The cycle information includes: the cycle number, the number of rhythm points in the beat cycle, and each rhythm point's sequence number and start time.
Likewise, determining the alignment period(s) and obtaining the alignment information tables is further specified as: determining at least one alignment period for aligning the speech segment with the background music from the total character count in the character attribute information and the cycle information of each beat cycle in the music rhythm information; selecting one complete alignment period as the rhythm section to be aligned, and determining at least one alignment unit and its alignment unit information from the character attribute information and the rhythm-point information of the rhythm points in that section; and aggregating the alignment unit information into the current alignment information table of the section, from which the alignment information tables of the remaining alignment periods are determined.
As shown in fig. 2, the method for converting speech into rap music provided in the second embodiment specifically includes the following operations:
s201, carrying out noise reduction processing and end point detection processing on the voice section selected by the user, and obtaining the character sequence number, the starting and ending time, the initial position of the first vowel and the character total amount of each character in the voice section through the voice recognition of the processed voice section to form character attribute information of the voice section.
In this embodiment, a noise-reduction strategy from audio processing is applied to the recorded speech segment, an endpoint-detection strategy removes silent portions from the denoised segment, and a speech-recognition strategy then recognizes the processed segment to obtain the relevant information of each character composing it.
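The patent names endpoint detection but not a specific algorithm. A common minimal approach is short-time-energy silence trimming, sketched below; the frame length and energy threshold are assumed values, not from the disclosure.

```python
def trim_silence(samples, frame_len=4, energy_thresh=0.1):
    """Drop leading and trailing frames whose mean squared amplitude
    falls below `energy_thresh`: a toy stand-in for endpoint detection."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples), frame_len)]

    def energy(frame):
        return sum(x * x for x in frame) / len(frame)

    voiced = [i for i, f in enumerate(frames) if energy(f) >= energy_thresh]
    if not voiced:
        return []  # no speech detected at all
    start = voiced[0] * frame_len
    stop = (voiced[-1] + 1) * frame_len
    return samples[start:stop]
```

Real systems would typically combine energy with zero-crossing rate or a voice-activity-detection model, but the structure (frame, score, trim) is the same.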
The obtained information includes the total number of characters in the whole speech segment, each character's sequence number, its start and end times within the segment, and the start position of the first vowel of its pronunciation. The start/end times and first-vowel position are relative time points: following the playing order of the whole segment, the first character's start time is taken as 0 seconds. In this embodiment, this information is recorded as the character attribute information of the speech segment.
For example, table 1 shows a data table effect display of text attribute information, as shown in table 1, each column in table 1 may be regarded as a text attribute item, which at least may include a text sequence number, a text start time, a start time of a first vowel in a text, and a text end time, and a row number of the table may be regarded as a total number of texts included in a speech segment.
Table 1 text attribute information for text in speech segment
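As an illustrative sketch only (not part of the patent), the text attribute information of S201 can be held as one record per character. The field names and the shape of the recognizer output below are assumptions; the embodiment specifies which attributes are recorded, not a concrete data layout.

```python
from dataclasses import dataclass

@dataclass
class CharAttr:
    serial: int          # character serial number within the speech segment
    start: float         # character start time, relative to segment start (s)
    vowel_start: float   # start time of the first vowel of the character (s)
    end: float           # character end time (s)

def build_text_attributes(recognized):
    """recognized: list of (start, vowel_start, end) triples, assumed to come
    from speech recognition of the denoised, endpoint-trimmed segment."""
    table = [CharAttr(i, s, v, e) for i, (s, v, e) in enumerate(recognized)]
    return table, len(table)   # attribute table and total character count

# Two recognized characters; the first starts at the 0-second reference point.
attrs, total = build_text_attributes([(0.0, 0.05, 0.30), (0.30, 0.34, 0.62)])
```

As in Table 1, each record corresponds to one row, and the row count gives the total number of characters in the speech segment.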
S202, performing rhythm point detection and beat period division on the background music selected by the user, and determining the total number of rhythm points contained in the background music, the serial number of each rhythm point, and the period information of each beat period, which together form the music tempo information of the background music.
In this embodiment, a rhythm point detection strategy from audio processing may be adopted to detect strong rhythmic accent points (i.e., rhythm points) in the background music, and a beat division strategy may then be adopted to find the recurrence pattern of the detected rhythm points, so that beat periods are divided at the smallest repeating rhythmic unit. For a piece of background music, each detected rhythm point carries certain attribute information, such as its serial number, the total number of rhythm points, and its position (i.e., the relative time of the rhythm point). After beat division, corresponding period information is also formed for each beat period; for example, the period information may include the period number, the number of rhythm points contained in the beat period, the serial number of each of those rhythm points, and their start times. This embodiment may aggregate this information to form the music tempo information.
For example, this embodiment provides the music tempo information in the form of a data table, so that it is displayed as an information table. Table 2 shows such a data table for the music tempo information. Table 2 is a cascaded table: its first column lists the beat periods identified by period number, and its second column lists the rhythm point serial numbers. The rhythm points belonging to a given beat period (for example, the period with period number 1) are cascaded beneath that period number together with their positions (i.e., start times), and the number of cascaded rows under each period number gives the number of rhythm points in that beat period.
TABLE 2 music tempo information corresponding to background music
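The cascaded structure of Table 2 can be sketched as a mapping from period number to the rhythm points it contains. This is an illustrative reconstruction only: the grouping function and the fixed points-per-period value are assumptions, and actual rhythm point detection (onset/beat tracking) is outside its scope.

```python
def build_tempo_info(rhythm_points, points_per_period):
    """Group detected rhythm points, given as (serial, start_time) pairs,
    into beat periods; period numbers start at 1 as in Table 2."""
    periods = {}
    for i in range(0, len(rhythm_points), points_per_period):
        periods[i // points_per_period + 1] = rhythm_points[i:i + points_per_period]
    return periods

# 10 rhythm points spaced 0.5 s apart, 4 points per beat period:
points = [(n, 0.5 * n) for n in range(10)]
periods = build_tempo_info(points, 4)   # periods 1 and 2 complete, period 3 incomplete
```

The row count under each period number plays the role of the per-period rhythm point count described above; the trailing short group corresponds to an incomplete beat period.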
In this embodiment, the following S203 to S205 show specific implementations of determining an alignment period and an alignment information table required for aligning a speech segment with background music according to the text attribute information and the music tempo information.
S203, determining at least one alignment period for aligning the voice segment with the background music according to the total amount of characters in the character attribute information and the period information of each beat period in the music rhythm information.
In this embodiment, this step first determines how many alignment periods the whole piece of background music can contain for aligning all the characters in the speech segment. When the number of alignment periods is greater than 1, the last alignment period may be an incomplete period (i.e., one that does not accommodate all the characters). This step is equivalent to a coarse division of the background music into alignment periods. Throughout the division, the total number of characters in the speech segment is compared with the number of rhythm points in one complete beat period of the background music to decide whether the beat period can serve directly as an alignment period, or whether beat periods must be merged to obtain an alignment period.
Further, fig. 3 is a flowchart illustrating an implementation of determining an alignment period in the method for converting voice into rap music according to this embodiment, and as shown in fig. 3, determining at least one alignment period for aligning the voice segment with the background music according to the total amount of characters in the character attribute information and the period information of each beat period in the music tempo information may specifically be optimized as follows:
S2031, selecting a complete beat period, and acquiring the number of rhythm points in the corresponding period information.
It is understood that at least one beat period can be detected in a whole piece of background music. When exactly one beat period is detected, that period is considered complete; when the number of beat periods is greater than 1, the last period formed by the division may be incomplete, that is, it does not contain all the rhythm points of a fixed full period. In this embodiment, one complete beat period may be selected, and the number of rhythm points may be read from its corresponding period information. The number of rhythm points in every complete beat period is the same.
S2032, judging whether the number of the rhythm points is more than or equal to the total amount of the characters, if so, executing S2033; if not, S2034 is executed.
The purpose of the determination in this step is mainly to determine whether a complete beat period obtained by current detection can accommodate all the characters in the obtained speech segment, if yes, then S2033 is executed; if not, S2034 needs to be performed.
S2033, regarding each beat period as an alignment period.
Following the above determination, when the number of rhythm points is greater than or equal to the total number of characters, the beat period can be used directly as an alignment period. It is understood that when one complete beat period satisfies this condition, every other detected complete beat period can likewise be regarded as a complete alignment period, and any incomplete beat period contained in the background music can be regarded as an incomplete alignment period.
S2034, judging whether the number of beat periods included in the background music is greater than 1; if yes, executing S2035; if not, executing S2036.
In connection with the above determination, when the number of the rhythm points is smaller than the total number of the characters, it is equivalent to that a complete beat period cannot accommodate all the characters in the obtained speech segment, and at this time, the beat periods need to be combined through this step, and the condition of the combination is that the number of the beat periods included in the background music is at least two. Whether the number of the beat cycles in the background music is greater than 1 or not can be continuously judged through the step, if yes, a merging condition is met, and S2035 can be continuously executed; otherwise, it is equivalent to that the background music does not match the speech segment, and S2036 needs to be executed.
S2035, merging beat periods pairwise in order of period number to form at least one new beat period, and returning to execute S2031.
In this embodiment, when the number of beat periods is greater than 1, beat periods may be merged pairwise in order of period number to form new beat periods, and the period information of each newly formed beat period changes correspondingly. Taking Table 2 as an example, suppose the two beat periods with period numbers 1 and 2 are merged into a new beat period; the number of rhythm points in the new period is the sum of the rhythm point counts of the original two. After this pairwise merging, the number of beat periods is about half the original number (when the original number is odd, the last unpaired period remains, giving half the original number rounded up). The flow then returns to S2031 to determine the alignment period again from the period information of the newly formed beat periods, and this repeats until a suitable alignment period is found, or the subsequent speech-to-rap conversion is terminated if the search fails.
S2036, ending the process of converting the voice segment into rap music, and giving a prompt for reacquiring the voice segment or background music.
In this embodiment, if the number of the beat cycles is only one, and the number of the tempo points is still less than the total amount of the text, it may be considered that the speech segment does not match the selected background music, and the speech segment needs to be uploaded again or recorded again through the operation of an optional embodiment of this embodiment, or the background music is reselected.
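The loop of S2031-S2036 can be condensed into a small sketch. It tracks only the counts (the number of rhythm points in a fully merged period and the number of periods), which is enough to decide success or failure; that an odd leftover period keeps its smaller point count is deliberately ignored here, and the halving formula is otherwise an assumption based on the description above.

```python
def find_alignment_period(points_per_period, n_periods, total_chars):
    """Return the number of rhythm points per alignment period, or None
    when the background music cannot accommodate the speech segment."""
    while points_per_period < total_chars:          # S2032: can it hold all characters?
        if n_periods <= 1:                          # S2034 fails -> S2036: abort
            return None
        points_per_period *= 2                      # S2035: merge periods pairwise
        n_periods = n_periods // 2 + n_periods % 2  # odd leftover stays unmerged
    return points_per_period                        # S2033: use as alignment period
```

For example, 4 beat periods of 4 rhythm points against a 5-character segment merge once into 8-point periods; a single 4-point period against the same segment fails and triggers the re-acquisition prompt of S2036.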
S204, selecting a complete alignment cycle as a rhythm section to be aligned, and determining at least one alignment unit and corresponding alignment unit information according to the character attribute information and rhythm point information of rhythm points to be aligned in the rhythm section to be aligned.
After the alignment period division is performed, the matching of each character included in the voice segment with respect to each rhythm point in the alignment period can be determined by using one alignment period as a reference in this step. In this embodiment, matching of the rhythm point included in a period of time and the text in the voice segment is regarded as an alignment unit, and information of each alignment unit specifically includes a sequence number of the rhythm point, a text sequence number of the text matched with the rhythm point, and a gear ratio required for aligning the rhythm point and the text matched with the rhythm point.
Each alignment unit has alignment unit information which at least comprises a rhythm point serial number, a character serial number and a speed ratio. Meanwhile, because the number of rhythm points included in each alignment cycle is the same, and the music rhythms are the same, the step can only determine the alignment unit and the information of the alignment unit for any complete alignment cycle.
Specifically, the process of determining the alignment units and the alignment unit information in this step may be described as follows. First, the alignment period selected for information determination is recorded as the rhythm segment to be aligned, and the rhythm point information of that alignment period is used directly as the rhythm point information of the rhythm points to be aligned. Next, an alignment matching value for aligning characters with rhythm points is determined from the text attribute information and the rhythm point information. The range into which this alignment matching value falls is then looked up in a preset rhythm point-character alignment rule table. Finally, the alignment units in the rhythm segment to be aligned, and the alignment unit information of each, are determined from the alignment rule associated with that range. The ranges and the corresponding alignment rules in the rhythm point-character alignment rule table can be preset from historical experience.
Further, fig. 4 shows a flowchart for implementing determining information of an alignment unit and an alignment unit in an alignment period in the method for converting speech into rap music according to this embodiment, and as shown in fig. 4, determining at least one alignment unit and corresponding information of the alignment unit may specifically be optimized according to the text attribute information and rhythm point information of a rhythm point to be aligned in a rhythm segment to be aligned:
it is to be understood that the following S2041 to S2048 of the present embodiment are specific expansion execution of the above S204.
S2041, selecting a complete alignment cycle as a rhythm section to be aligned, forming rhythm blocks to be aligned corresponding to the rhythm points to be aligned one by one based on rhythm point information of the rhythm points to be aligned in the rhythm section to be aligned, and recording the number of the rhythm points to be aligned as the initial number of the remaining points.
In this embodiment, a complete alignment cycle may be selected from the determined alignment cycles as a to-be-aligned rhythm segment determined by the alignment information table, where rhythm points to be aligned in the to-be-aligned rhythm segment are rhythm points included in the alignment cycle, and rhythm point information of the included rhythm points is rhythm point information of the rhythm points to be aligned.
It should be noted that, in this embodiment, the interval formed between two adjacent rhythm points to be aligned may be recorded as one rhythm block to be aligned. Therefore, this step forms as many rhythm blocks to be aligned as there are rhythm points to be aligned in the rhythm segment; that is, the formed rhythm blocks correspond one to one with the rhythm points to be aligned. A corresponding block number may be set for each rhythm block to be aligned, and the number of rhythm points to be aligned is then preferably recorded as the initial number of remaining points.
S2042, determining the ratio of the number of the remaining points to the total amount of the characters in the character attribute information, and recording the ratio as an alignment matching value.
In this embodiment, to match each to-be-aligned rhythm point in the to-be-aligned rhythm segment with a character in the voice segment, a ratio of each to-be-aligned rhythm point, which is not matched with the character in the to-be-aligned rhythm segment, to a total amount of the character is determined through this step, and this ratio is recorded as an alignment matching value.
It can be understood that when there are no matched rhythm points in the rhythm segment to be aligned, the rhythm points to be matched are all rhythm points to be aligned, and therefore, initially, the number of the remaining points is initialized to the number of included rhythm points to be aligned.
S2043, searching a preset rhythm point-character alignment rule table, and determining the length ratio range to which the alignment matching value belongs.
In this embodiment, a rhythm point-character alignment rule table is preset, the rule table is a binary association table, and two associated objects are a length ratio range and an alignment rule, respectively. The length ratio range can be specifically set by the ratio of the number of unmatched rhythm points in one alignment period to the total amount of characters included in the whole voice segment. Preferably, the length ratio ranges of 6 different intervals are formed based on historical experience in the embodiment, and are respectively: (0, 0.2), (0.2, 0.8), (0.8, 1), (1, 1.1), (1.1, 1.3) and (1.3, ∞).
In this embodiment, the length ratio range of the obtained alignment matching value in the rhythm point-character alignment rule table may be determined.
And S2044, determining the rhythm block to be aligned with the matched characters according to the alignment rule corresponding to the length ratio range, and recording as a candidate alignment unit.
The length ratio range to which the alignment matching value belongs is determined through the above steps, and the alignment rule associated with the length ratio range can be obtained.
In this embodiment, matching between a text and a rhythm point may be regarded as matching between the text and a rhythm block to be aligned, and based on an alignment rule corresponding to the length ratio range, a text (the number of texts is uncertain, but is at least 1) matching with each rhythm block to be aligned may be determined for each rhythm block to be aligned, and the matched rhythm block to be aligned may be used as a candidate alignment unit.
Following the above description of the rhythm point-character alignment rule table, corresponding to different length ratio ranges, the present embodiment sets a corresponding alignment rule, and exemplarily, a preset rhythm point-character alignment rule table is given in table 3. In this step, the alignment rules corresponding to the length ratio ranges in table 3 are used to perform character matching for the remaining rhythm points (remaining rhythm segments to be aligned).
TABLE 3 rhythm point-literal alignment rule Table
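The alignment matching value of S2042 and the range lookup of S2043 reduce to a ratio and an interval search. The six ranges below are taken directly from the text; treating each range as exclusive at its lower bound and inclusive at its upper bound is an assumption, since the embodiment does not state how boundary values are classified.

```python
LENGTH_RATIO_RANGES = [
    (0.0, 0.2), (0.2, 0.8), (0.8, 1.0),
    (1.0, 1.1), (1.1, 1.3), (1.3, float("inf")),
]

def length_ratio_range(remaining_points, total_chars):
    """Return the Table 3 range that the alignment matching value falls in."""
    value = remaining_points / total_chars   # alignment matching value (S2042)
    for low, high in LENGTH_RATIO_RANGES:
        if low < value <= high:
            return (low, high)
    return None
```

The two lookups of the worked example below follow directly: 8/5 = 1.6 falls in (1.3, ∞), and 2/5 = 0.4 falls in (0.2, 0.8).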
S2045, counting the number of the remaining rhythm blocks to be aligned to obtain a new number of remaining points.
After one round of alignment matching via S2044, unmatched rhythm blocks to be aligned may still exist. In this step, the number of remaining unmatched rhythm blocks in the rhythm segment to be aligned is counted and taken as the new number of remaining points.
S2046, determining whether the number of the remaining points is 0, if so, executing S2047; if not, the process returns to step S2042.
Through the step, whether the number of the remaining points is 0 or not can be judged, if so, the remaining rhythm blocks to be aligned in the rhythm section to be aligned can be considered to be 0, that is, all the rhythm blocks to be aligned are matched, and at this time, the operation of S2047 can be executed; if not, it may be considered that there is an unmatched to-be-aligned rhythm block in the to-be-aligned rhythm segment, and at this time, the operation of determining the aligned matching value in S2042 may be returned to be executed again.
It can be understood that, based on the operation in this step, when all the rhythm blocks to be aligned in one rhythm section to be aligned are matched, the number of candidate alignment units formed by the rhythm blocks to be aligned is actually the same as the number of included rhythm points to be aligned. That is, it can be considered that one candidate alignment unit exists for one to-be-aligned rhythm point (to-be-aligned rhythm block), and the unit numbers of the formed candidate alignment units may be sequentially marked in an increasing order from 0 according to the alignment order.
To facilitate better understanding of how candidate alignment units are determined, this embodiment provides an example. Assume that the number of rhythm points to be aligned in one rhythm segment is 8, so the currently determined number of remaining points is 8, and that the speech segment obtained from the user contains 5 characters, for example "light yellow long skirt". The process of matching "light yellow long skirt" against the 8 remaining rhythm points to determine each candidate alignment unit may be described as follows:
1) The alignment match value is: 8/5 falls within the length ratio range of (1.3, ∞), and the corresponding alignment rule can be obtained by looking up table 3 above.
2) And matching the characters with the rhythm points according to the alignment rule associated with the length ratio range (1.3, infinity).
Specifically, the alignment rule is: "starting from the first word, select 10% word length to repeat from the first remaining rhythm point; then match remaining rhythm points one by one with 100% word length in word-sequence order; then, starting from the last word, select remaining rhythm points of 20% word length for repetition". Based on this rule, it is first necessary to repeat 10% of the word length, i.e., 0.5 characters, starting from the first character of "light yellow long skirt"; when the word length to be repeated is less than 1, it is rounded down, so the number of characters to repeat here is 0. Next, rhythm points of 100% word length are selected from the first remaining rhythm point and matched in sequence with the 5 characters, so that rhythm blocks 0 to 4, formed from rhythm points 0 to 4, correspond to the 5 characters "light", "yellow", "color", "long" and "skirt". Then 20% of the word length, i.e., 1.5 characters, must be repeated starting from the last character; rounding down again gives 1 character to repeat, namely the last character "skirt", and the next rhythm block to be aligned corresponds to this repeated "skirt". This completes the matching of characters with rhythm points under the alignment rule for the range (1.3, ∞); the currently determined candidate alignment units have unit numbers 0-5, and their corresponding characters are, in order: "light", "yellow", "color", "long", "skirt" and "skirt".
3) After the above operation, 2 of the 8 rhythm blocks to be aligned remain unmatched, so the number of remaining points is greater than 0 and the alignment matching value must be determined again. The new alignment matching value is 2/5 = 0.4, which falls within the length ratio range (0.2, 0.8); looking up Table 3 gives the corresponding alignment rule.
4) And matching the characters with the rhythm points according to the alignment rules associated with the length ratio range (0.2, 0.8).
Specifically, the alignment rule is: "when L ≤ 0.5, randomly select characters of L word length to repeat, adjust the positions of the matched rhythm points and characters, and add the repeats after the selected characters; when L > 0.5, randomly select characters of 50% word length to repeat, adjust the positions of the matched rhythm points and characters, add the repeats after the selected characters, and add silent segments to the remaining rhythm points of (L − 0.5) word length, where L is the alignment matching value."
Analyzing the alignment matching value 0.4 against this rule: 0.4 is less than 0.5, so the operation of randomly selecting 40% of the word length (i.e., 2 characters) is performed directly. Suppose the randomly selected character serial numbers (from 0-4) are 1 and 3, whose characters are "yellow" and "long" respectively. The previously matched "light yellow long skirt" then needs to be adjusted so that each repeated character sits directly after the position of its selected character; under the alignment rule, the characters matched with the remaining two rhythm blocks to be aligned are "yellow" and "long" respectively, forming new candidate alignment units matching those two characters. Because the matched sequence is adjusted, the characters corresponding to the candidate alignment units after this operation are, in order: "light", "yellow", "yellow", "color", "long", "long", "skirt" and "skirt".
5) After the above operation, the remaining unmatched rhythm blocks to be aligned are 0, that is, the number of remaining points is 0, which meets the matching condition of the ending candidate alignment unit, so that the above operation can be ended.
After the step 5), 8 candidate aligned units with unit numbers of 0-7 in sequence can be formed. Thus, the alignment and matching of the characters in the voice section to the rhythm section to be aligned are completed.
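The round-down behaviour used throughout the worked example (0.5 characters → 0 repeats, 1.5 characters → 1 repeat) can be captured in one small helper. This is a sketch of that single detail only, not of the full rule table; the function name is illustrative.

```python
import math

def repeat_count(word_length_fraction, total_chars):
    """Number of characters to repeat for a given word-length fraction,
    rounded down as in the example above."""
    return math.floor(word_length_fraction * total_chars)

# 10% of 5 characters -> 0 repeats; 20% of 5 characters -> 1 repeat.
```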
S2047, determining at least one alignment unit according to the unit duration of each candidate alignment unit and the matched character attribute information of the matched characters, and obtaining a corresponding speed ratio.
From the above description, the number of candidate alignment units determined from the rhythm segment to be aligned equals the number of rhythm blocks to be aligned that it contains. One rhythm block to be aligned is the interval from a rhythm point to the adjacent next rhythm point, or to the rhythm end point (the latter applying mainly to the last rhythm point); that is, the duration of a rhythm block to be aligned is the interval between two rhythm points (or between the last rhythm point and the rhythm end point). In this embodiment, since each candidate alignment unit corresponds to one rhythm block to be aligned, the duration of that rhythm block may be used as the unit duration of the corresponding candidate alignment unit.
It should be noted that after the characters matched with a candidate alignment unit are determined, the pronunciation of those characters must be aligned with the unit duration of the candidate alignment unit. In the simplest case, alignment is just mixing the pronunciation of the matched characters while the audio of the candidate alignment unit plays. However, some characters have a short pronunciation time but a long matching unit duration, and others a long pronunciation time but a short unit duration; to align such characters with the unit, the pronunciation rate must be adjusted, for example by stretching the pronunciation (slowing it down) or compressing it (speeding it up) until it equals the unit duration.
In this embodiment, the ratio value of the text that needs to be stretched or compressed is recorded as the gear ratio, and the step may specifically determine the gear ratio required when the matched text is aligned with the corresponding candidate aligning unit according to the unit duration of the candidate aligning unit and the attribute information of the matched text matched with the candidate aligning unit (e.g., the text start/stop time of the matched text, the start position of the first vowel in the text, etc.).
However, when aligning characters with the unit to be aligned by stretching or compressing the character pronunciation, the degree of stretching or compression is limited; if alignment alone were considered and the pronunciation were stretched or compressed without limit, the audio produced by the actual alignment operation would risk being distorted.
Therefore, this step may further compare the calculated speed ratio against a set suitability condition to determine whether the corresponding candidate alignment unit is suitable as an alignment unit. If so, the candidate alignment unit is taken directly as an alignment unit, and the corresponding speed ratio as the speed ratio of that alignment unit. If not, silence filling, or merging of two or more candidate alignment units, must be performed on the candidate alignment unit until an alignment unit satisfying the suitability condition is obtained, and the speed ratio that passes the suitability determination is taken as the speed ratio of that alignment unit.
Through the operation of the step, the candidate aligning units with the determined number of rhythm points to be aligned can finally form at least one aligning unit, each aligning unit at least comprises one rhythm point and at least one matched character, and the gear ratio of each aligning unit can be regarded as a proportion value required for stretching or shrinking the character when the included character is aligned with the included rhythm point.
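One plausible reading of the speed ratio of S2047 is the factor by which the character audio must be stretched (ratio > 1) or compressed (ratio < 1) to fill the unit duration, bounded to avoid audible distortion. Both the formula and the numeric bounds below are assumptions; the embodiment states only that the degree of stretching or compression is limited.

```python
def speed_ratio(unit_duration, pronunciation_duration):
    """Assumed ratio: >1 stretches the pronunciation, <1 compresses it."""
    return unit_duration / pronunciation_duration

def is_suitable(ratio, low=0.5, high=2.0):
    """Placeholder suitability condition; the bounds are illustrative."""
    return low <= ratio <= high

# A 0.3 s pronunciation filling a 0.6 s unit needs a 2x stretch:
ratio = speed_ratio(0.6, 0.3)
```

A candidate unit whose ratio falls outside the bounds would, per the text, be handled by silence filling or by merging candidate units before the ratio is recomputed.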
S2048, determining the unit serial number of each alignment unit, the initial rhythm point serial number in the rhythm points, the character serial number of the matched characters and the gear ratio as corresponding alignment unit information.
It can be known that, when the determination operations of the alignment units and the corresponding gear ratios are performed, the unit number of each alignment unit and the rhythm point number of each rhythm point included in the alignment unit are also obtained, and at the same time, the character number of each character matched in the alignment unit can also be obtained. Through the operation of the step, the information can be summarized for each alignment unit, so that corresponding alignment unit information is formed corresponding to each alignment unit.
S205, summarizing the information of each alignment unit to form a current alignment information table of the rhythm section to be aligned, and determining the alignment information table of each remaining alignment period according to the current alignment information table.
In this embodiment, at least one alignment unit included in the rhythm segment to be aligned and corresponding information of each alignment unit may be determined through the step S204, and in this step, the determined information of each alignment unit may be arranged and summarized according to the unit number sequence of the alignment unit, so as to form a current alignment information table. And then, the alignment information table of each of the remaining alignment periods determined in S203 may be determined according to the current alignment information table.
Specifically, for the remaining other alignment periods, if the period is a complete alignment period, the current alignment information table may be copied as the corresponding alignment information table directly; if the alignment period is not complete, the alignment unit information of the same row as the rhythm points included in the alignment period can be taken out from the current alignment information table to form a corresponding alignment information table.
Table 4 alignment information table formed based on information of each alignment unit in one alignment period

Unit number | Start rhythm point number | Character serial number | Gear ratio
1           | 1                         | 2                       | 1.0
2           | 2                         | 3, 4                    | 1.2
3           | 3                         | 5                       | 0.9
Illustratively, table 4 shows an alignment information table formed based on information of each alignment unit in an alignment period, and as shown in table 4, each column in the alignment information table corresponds to attribute information of an alignment unit, and may include: the unit number of the alignment unit, the rhythm point number of the initial rhythm point in the alignment unit, the character number of each matched character and the gear ratio required for alignment, and the row number of the alignment information table represents the unit number of the alignment unit in the alignment cycle.
Further, the determining of the alignment information table of the remaining alignment periods according to the current alignment information table may be embodied as: for each remaining alignment period, if the alignment period is a complete period, taking the current alignment information table as an alignment information table of the alignment period; if the alignment period is an incomplete period, determining the number of target points of the rhythm points included in the alignment period; and selecting the alignment unit information of the target point rows in the current alignment information table in a reverse order to form the alignment information table of the alignment period.
The foregoing description of this embodiment specifically provides a process for determining an alignment information table of the remaining alignment periods in the background music, and for an incomplete alignment period, assuming that 2 rhythm points are included, two rows of alignment unit information may be directly selected from the current alignment information table from bottom to top to form a corresponding alignment information table.
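As a rough sketch of this step (not part of the patent; the row layout and function name are assumed for illustration), the per-period tables could be derived from the current alignment information table as follows, with an incomplete period taking the last n rows of the table:

```python
# Sketch (assumed data model): each row of the "current alignment information
# table" is (unit_number, start_rhythm_point, character_numbers, speed_ratio),
# mirroring Table 4. Names and layout are illustrative, not from the patent.

def tables_for_remaining_periods(current_table, period_point_counts):
    """period_point_counts: rhythm-point count of each remaining alignment period.
    A period with as many points as the full table is complete and gets a copy;
    an incomplete period gets the last n rows (selected bottom-up)."""
    tables = []
    for n_points in period_point_counts:
        if n_points >= len(current_table):        # complete alignment period
            tables.append(list(current_table))
        else:                                      # incomplete: last n_points rows
            tables.append(current_table[-n_points:])
    return tables

current = [(1, 1, [2], 1.0), (2, 2, [3, 4], 1.2), (3, 3, [5], 0.9)]
print(tables_for_remaining_periods(current, [3, 2]))
```

With the 2-point incomplete period of the example, the second returned table contains the bottom two rows of Table 4.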
S206, controlling the alignment of the characters in the voice section and the rhythm points in the background music according to the alignment information tables, and forming a rap audio after tone modification and special effect processing.
In this embodiment, the alignment information table formed for each alignment period contains at least one alignment unit and its alignment unit information, and each item of alignment unit information includes the rhythm point number actually used to align characters with the rhythm point, the numbers of the matched characters, the speed-change ratio required for alignment, and so on. After the alignment information table of each alignment period has been obtained through the above steps, this step controls, for each item of alignment unit information in each table, the alignment of the corresponding rhythm point with its matched characters at the corresponding speed-change ratio, thereby aligning the characters in the speech segment with the rhythm points in the background music.
It should be noted that, when this step controls the alignment of the characters in the speech segment with their matched rhythm points, the matching within each alignment period is in effect performed as follows: first, the audio data actually corresponding to each alignment unit is extracted from the speech segment according to the pronunciation occupation duration of the characters contained in the alignment unit (the interval from the first-vowel start point of the alignment unit to the first-vowel start point of the next unit); then the audio data of each alignment unit is speed-adjusted according to that unit's speed-change ratio; finally, pitch adjustment, special effects, and similar processing are applied to the speed-adjusted audio data, forming the converted rap music.
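A minimal numeric sketch of the per-unit variable-speed step (pitch adjustment and special effects omitted): segment boundaries and ratios below are illustrative assumptions, and the naive nearest-neighbour resample used here changes pitch along with speed, whereas a real implementation would use a pitch-preserving time-stretch algorithm such as WSOLA or a phase vocoder.

```python
# Illustrative sketch, not the patent's implementation.

def stretch_segment(samples, ratio):
    """Resample a segment so its length becomes ratio * original length
    (ratio > 1 stretches, ratio < 1 compresses)."""
    out_len = max(1, round(len(samples) * ratio))
    return [samples[min(len(samples) - 1, int(i / ratio))] for i in range(out_len)]

def align_units(audio, units):
    """units: (start_sample, end_sample, speed_ratio) per alignment unit,
    spanning first-vowel start of the unit to first-vowel start of the next."""
    out = []
    for start, end, ratio in units:
        out.extend(stretch_segment(audio[start:end], ratio))
    return out

audio = list(range(100))
print(len(align_units(audio, [(0, 50, 1.2), (50, 100, 0.9)])))  # 60 + 45 = 105
```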
The method for converting speech into rap music provided by the second embodiment of the present invention specifies the operations for determining the character attribute information and the music rhythm information, as well as the specific operations for determining the alignment periods and the associated alignment information tables required to align the speech segment with the background music. With the method of this embodiment, after the user selects background music and records a segment of speech with arbitrary content, an alignment strategy for matching and speed-changing characters against rhythm points is determined from the obtained rhythm point positions, the start-stop times of individual characters, and the vowel start times, so that the rap music formed by aligning the characters with the rhythm points can be obtained in a short time. The implementation of the whole technical solution simplifies the complicated process of manual audio editing and production, and makes it possible for non-professional audio processors to produce rap music; meanwhile, compared with existing speech-to-rap methods, the speech content to be converted need not be restricted, which guarantees free recording of the content, simplifies the conversion process, avoids misalignment between speech characters and music rhythm points, and expands the application range of converting speech into rap music.
As an optional embodiment of the second embodiment of the present invention, before step S202 determines the total number of rhythm points, the rhythm point numbers, and the period information of each beat period contained in the background music and forms the music rhythm information of the background music, this optional embodiment further optimizes the detected rhythm points as follows:
acquiring the detected initial rhythm points, and determining the interval duration formed by two adjacent initial rhythm points; and determining and deleting rhythm points to be deleted in the initial rhythm points according to the average word length of the characters included in the voice section and the interval duration to obtain effective rhythm points in the background music.
This optional embodiment specifically gives an operation for optimizing the rhythm points detected from the background music (referred to here as initial rhythm points): from the detected points, it removes rhythm points for which the interval between two adjacent points is less than half the average word length.
Specifically, the average word length is the ratio of the duration occupied by all characters to the total number of characters. In general, if the interval between two adjacent rhythm points is less than half the average word length, it is unfavorable for aligning characters with rhythm points, so one of the two adjacent points needs to be deleted; the retained point then forms a new interval with the preceding or following point, and the newly formed interval can be re-examined by the method of this optional embodiment. Invalid rhythm points are thus removed by cyclic updating, and the valid rhythm points are retained.
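Sketched in code, assuming rhythm points are given as start times in seconds and that the later of two too-close points is the one deleted (the patent allows deleting either of the pair):

```python
# Hedged sketch of the rhythm-point pruning described above: drop any point
# whose interval to the previously kept point is less than half the average
# word length, so the kept point forms a new interval with the next candidate.

def prune_rhythm_points(points, avg_word_len):
    points = sorted(points)
    valid = []
    for p in points:
        if valid and p - valid[-1] < avg_word_len / 2:
            continue        # too close to the previous kept point: delete it
        valid.append(p)
    return valid

# average word length 1.0 s -> intervals under 0.5 s are invalid
print(prune_rhythm_points([0.0, 0.3, 1.0, 1.4, 2.0], 1.0))
```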
As another optional embodiment of the second embodiment of the present invention, the execution of the above S2047 is further refined. Fig. 5 is an expanded flowchart for determining the alignment units and alignment unit information within an alignment period in the second embodiment of the present invention. As shown in fig. 5, at least one alignment unit is determined, and the corresponding speed-change ratio obtained, from the unit duration of each candidate alignment unit in combination with the attribute information of the matched characters, specifically:
it is to be appreciated that the present alternative embodiment is a specific implementation of S2047 described above. Through the operation of S2046, a certain number of candidate aligning units can be obtained in the rhythm segment to be aligned, and the following operation of this alternative embodiment can implement the determining operation of the aligning units and the corresponding gear ratios of the aligning units from among the candidate aligning units.
S1, selecting an unselected candidate alignment unit as the current processing unit, in the order of the unit serial numbers.
In this embodiment, the candidate alignment units in the rhythm segment to be aligned have corresponding unit serial numbers, and in this step, a candidate alignment unit that has not been selected before may be selected first according to the sequence of the unit serial numbers as the current processing unit, and the non-selection may be understood as not being selected as the current processing unit.
Illustratively, this step first selects the first candidate alignment unit as the current processing unit.
S2, determining the current speed-change ratio of the current processing unit according to the unit duration of the current processing unit, in combination with the start-stop times and first-vowel start positions of the characters matched respectively to the current processing unit and to the next adjacent candidate alignment unit.
From the above description of the embodiment it can be seen that aligning characters with a candidate alignment unit essentially means aligning the actual pronunciation duration of the characters with the unit duration of the candidate alignment unit. This can be achieved by stretching or compressing the pronunciation of the characters, and the amount of stretching or compression is determined by a speed-change ratio, which is equivalent to the ratio of the unit duration to the actual pronunciation duration of the characters.
It should be noted that, for a character, the actual pronunciation starts at the start position of its first vowel, and its actual end can be taken as the first-vowel start position of the next character. Considering characters together with candidate alignment units, the actual pronunciation duration of all characters matched to one candidate alignment unit runs from the first-vowel position of the first matched character of that unit to the first-vowel position of the first matched character of the next adjacent candidate alignment unit. This step can therefore determine the actual pronunciation duration of all characters matched to the current processing unit from the start-stop times of those characters and the first-vowel start position in the next adjacent candidate alignment unit, and obtain the current speed-change ratio of the current processing unit from the known unit duration and the determined actual pronunciation duration.
Specifically, in this embodiment, the determination of the current speed-change ratio of the current processing unit, from the unit duration of the current processing unit in combination with the start-stop times and first-vowel start positions of the characters matched respectively to the current processing unit and to the next adjacent candidate alignment unit, may be further refined as:
and S21, determining the pronunciation occupation time of all matched characters in the current processing unit according to the start-stop time of all matched characters and the initial position of the first vowel of the current processing unit.
In this step, the matched-character attribute information of all characters matched to the current processing unit may be obtained, specifically the start-stop time of each character and the start position of its first vowel, and from this information the pronunciation occupation duration of all matched characters in the current processing unit can be determined.
For example, assuming that there is only one word currently in the current processing unit, the start and stop time of the word is t1 and t2, respectively, and the start position of the first vowel is t3, where t1< t3< t2, the pronunciation occupation time of the word in the current processing unit is actually t2-t 3.
Assuming instead that the current processing unit contains two characters, the start-stop times of the first character are t1 and t2 and its first vowel starts at t3, while the start-stop times of the second character are t2 and t4 and its first vowel starts at t5, where t1 < t3 < t2 < t5 < t4, then the pronunciation occupation duration of the two characters in the current processing unit is t4-t3 (or, excluding the second character's leading consonant interval, (t2-t3) + (t4-t5)). It can be seen that the pronunciation occupation duration of all characters matched to the current processing unit is simply the total span of all the characters minus the interval from the start of the first character to its first vowel.
It should be noted that the pronunciation occupation duration is not yet the actual pronunciation duration of all characters matched to the current processing unit; the actual pronunciation duration also includes the vowel interval duration of the first character matched to the next candidate alignment unit adjacent to the current processing unit, which is obtained in S22. The purpose of determining the pronunciation occupation duration in this way is to align the first-vowel position of each character in the speech segment to the rhythm point of its matched alignment unit, since characters aligned in this manner play back better than characters whose heads are aligned directly to the rhythm points.
S22, determining the vowel interval duration of the first character matched to the next candidate alignment unit adjacent to the current processing unit, according to that character's start-stop time and first-vowel start position.
Continuing the above example in which the current processing unit contains two characters, assume the first character matched to the next adjacent candidate alignment unit has start-stop times t4 and t6 and a first-vowel start position t7, where t1 < t3 < t2 < t5 < t4 < t7 < t6; the vowel interval duration of that first character is then t7-t4.
From S21 and S22 it can be determined that the actual pronunciation duration of all matched characters in the current processing unit is the sum of their pronunciation occupation duration and the vowel interval duration of the first character matched to the next adjacent candidate alignment unit. In the above example, the actual pronunciation duration of all matched characters in the current processing unit is (t4-t3) + (t7-t4).
S23, taking the ratio of the unit duration of the current processing unit to the determined actual pronunciation duration as the current speed-change ratio of the current processing unit, where the actual pronunciation duration is the sum of the pronunciation occupation duration and the vowel interval duration.
Following the above example, assuming that the unit duration of the current processing unit is t, the current speed-change ratio of the current processing unit can be expressed as: t/[(t4-t3) + (t7-t4)].
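Using the example's symbols, the S21–S23 computation can be checked numerically; the time values below are illustrative assumptions, not from the patent:

```python
# t  : unit duration of the current processing unit
# t3 : first-vowel start of the unit's first matched character
# t4 : end of the unit's last character (= start of the next unit's first character)
# t7 : first-vowel start of the next unit's first character

def speed_ratio(t, t3, t4, t7):
    occupation = t4 - t3                  # pronunciation occupation duration (S21)
    vowel_gap = t7 - t4                   # vowel interval of next unit's first character (S22)
    return t / (occupation + vowel_gap)   # S23: unit duration / actual pronunciation duration

print(speed_ratio(t=0.66, t3=0.10, t4=0.55, t7=0.65))  # expected ~1.2: stretch by 20%
```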
S3, comparing the current speed-change ratio with a first ratio value and a second ratio value, where the second ratio value is larger than the first ratio value.
After the current speed-change ratio of the current processing unit is determined in S2, it may be compared with the set first and second ratio values to judge whether matching the characters at the current speed-change ratio satisfies the normal stretching/compression condition.
In this embodiment, a speed-change ratio between the first and second ratio values is preferably taken to satisfy the normal stretching/compression condition; a ratio smaller than the first ratio value fails the compression condition, and a ratio larger than the second ratio value fails the stretching condition.
S4, if the current speed-change ratio is greater than or equal to the first ratio value and less than or equal to the second ratio value, determining the current processing unit to be an alignment unit and recording the current speed-change ratio as the speed-change ratio of that alignment unit, then executing S7.
Specifically, when the current speed-change ratio lies between the first and second ratio values (inclusive), it is considered to satisfy the normal stretching/compression condition; the current processing unit can then be taken directly as an alignment unit with the current speed-change ratio as its speed-change ratio, and the operation continues at S7.
S5, if the current speed-change ratio is larger than the second ratio value, determining a silence duration to fill into the current processing unit, determining a new current speed-change ratio from the silence duration, and then executing S3.
Specifically, when the current speed-change ratio is greater than the second ratio value, it is considered not to satisfy the normal stretching condition: the unit duration of the current processing unit is effectively longer than the actual pronunciation duration of all matched characters, so a silence duration needs to be added to the current processing unit to increase the actual pronunciation duration of the characters.
The silence duration added in this step is preferably the start-stop duration of one character. The step then re-determines the current speed-change ratio, with the unit duration as numerator and the sum of the silence duration and the determined actual pronunciation duration as denominator, and returns to S3 to perform the ratio comparison again.
S6, if the current speed-change ratio is smaller than the first ratio value, merging the current processing unit with the next adjacent candidate alignment unit to form a new current processing unit, and returning to execute S2.
Specifically, when the current speed-change ratio is smaller than the first ratio value, it is considered not to satisfy the normal compression condition: the unit duration of the current processing unit is effectively shorter than the actual pronunciation duration of all matched characters, so the current processing unit needs to be merged with another candidate alignment unit.
In this embodiment, the candidate alignment unit to be merged is preferably the next candidate alignment unit adjacent to the current processing unit. The unit duration of the newly formed current processing unit is then the sum of the original unit duration and the unit duration of that next candidate alignment unit, and the step returns to S2 to recalculate the actual pronunciation duration of all characters matched to the newly formed current processing unit.
It should be noted that when a next candidate alignment unit is merged into the existing current processing unit, that unit is also considered to have been selected; when S1 is subsequently executed, it is skipped and not selected as the current processing unit.
S7, judging whether all candidate alignment units have been selected to participate in the above processing; if so, executing S8, and if not, returning to execute S1.
After an alignment unit has been determined through the above steps, there may still be unselected candidate alignment units in the rhythm segment to be aligned, which this step checks: if all candidate alignment units have been selected to participate in the above processing, S8 may be executed; otherwise the flow returns to S1 to select an unselected candidate alignment unit and repeat the above operations.
S8, summarizing the determined alignment units and their corresponding speed-change ratios.
This step sums up the determined alignment units and their corresponding speed-change ratios to obtain the at least one alignment unit and speed-change ratio contained in the rhythm segment to be aligned.
This optional embodiment gives the implementation process for determining the effective alignment units and their corresponding speed-change ratios in the rhythm segment to be aligned. Its implementation ensures effective alignment of the rhythm points in the rhythm segment to be aligned with the characters in the speech segment and avoids misalignment between speech characters and music rhythm points, thereby providing effective theoretical support for the speech-to-rap conversion of this embodiment.
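The S1–S8 loop of this optional embodiment can be sketched as follows; the threshold values, the fixed silence increment, and the handling of a final unit with no neighbour left to merge are illustrative assumptions, not taken from the patent:

```python
R1, R2 = 0.5, 2.0   # assumed first/second ratio values (R1 < R2)
SILENCE = 0.2       # assumed silence duration added per fill, in seconds

def determine_alignment_units(candidates):
    """candidates: list of dicts with 'duration' (unit duration) and 'speech'
    (actual pronunciation duration of the matched characters), in unit-number
    order. Returns (unit_duration, speed_ratio) pairs for the accepted units."""
    units = []
    i = 0
    while i < len(candidates):                      # S1: next unselected candidate
        duration = candidates[i]['duration']
        speech = candidates[i]['speech']
        i += 1
        while True:
            ratio = duration / speech               # S2: current speed-change ratio
            if R1 <= ratio <= R2:                   # S3/S4: within normal range
                units.append((duration, ratio))
                break
            if ratio > R2:                          # S5: pad speech with silence
                speech += SILENCE
            elif i < len(candidates):               # S6: merge next candidate
                duration += candidates[i]['duration']
                speech += candidates[i]['speech']
                i += 1                              # merged unit counts as selected
            else:                                   # assumption: no neighbour left,
                units.append((duration, ratio))     # accept the unit as-is
                break
    return units                                    # S7/S8: summarize results

cands = [{'duration': 1.0, 'speech': 0.9},
         {'duration': 3.0, 'speech': 1.0},
         {'duration': 0.4, 'speech': 1.2}]
print(determine_alignment_units(cands))
```

In this trace, the second unit fails the stretching condition (ratio 3.0) and is padded with silence until its ratio falls inside [R1, R2], illustrating the S5 loop back to S3.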
EXAMPLE III
Fig. 6 is a block diagram of an apparatus for converting speech into rap music according to a third embodiment of the present invention. The apparatus is suitable for converting speech recorded by a user into rap music, may be implemented in software and/or hardware, and may generally be integrated on a computer device. As shown in fig. 6, the apparatus includes: an information determination module 31, an alignment information determination module 32, and a conversion control module 33.
An information determining module 31, configured to identify an obtained speech segment and process the selected background music to obtain text attribute information of a text in the speech segment and music tempo information of the background music;
an alignment information determining module 32, configured to determine, according to the text attribute information and the music tempo information, at least one alignment period for aligning the speech segment with the background music, and obtain an alignment information table of each alignment period;
and the conversion control module 33 is configured to control, according to each alignment information table, alignment of characters in the speech segment and rhythm points in the background music, and form a rap audio after pitch adjustment and special effect processing.
The apparatus for converting speech into rap music provided by the third embodiment of the present invention effectively converts a speech segment of arbitrary content recorded by the user into a rap segment matched to the background music, simplifies the complicated process of manual audio editing and production, and makes it possible for non-professional audio processors to produce rap music; meanwhile, compared with existing speech-to-rap methods, the speech content to be converted need not be restricted, which guarantees free recording of the content, simplifies the conversion process, avoids misalignment between speech characters and music rhythm points, and expands the application range of converting speech into rap music.
Example four
Fig. 7 is a schematic diagram of a hardware structure of a computer device according to a fourth embodiment of the present invention, specifically, the computer device includes: a processor and a storage device. At least one instruction is stored in the storage device, and the instructions are executed by the processor, so that the computer device executes the method for converting the voice into the rap music according to the embodiment of the method.
Referring to fig. 7, the computer device may specifically include: a processor 40, a storage device 41, a display 42, an input device 43, an output device 44, and a communication device 45. The number of processors 40 in the computer device may be one or more, and one processor 40 is taken as an example in fig. 7. The number of storage devices 41 in the computer device may be one or more, and one storage device 41 is taken as an example in fig. 7. The processor 40, the storage device 41, the display 42, the input device 43, the output device 44, and the communication device 45 of the computer device may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 7.
Specifically, in the embodiment, when the processor 40 executes one or more programs stored in the storage device 41, the following operations are specifically implemented: identifying the obtained voice segment and processing the selected background music to obtain character attribute information of characters in the voice segment and music rhythm information of the background music; determining at least one alignment period for aligning the voice segment with the background music according to the character attribute information and the music rhythm information, and obtaining an alignment information table of each alignment period; and controlling the alignment of the characters in the voice section and rhythm points in the background music according to the alignment information tables, and forming a rap audio after tone change adjustment and special effect processing.
Embodiments of the present invention further provide a computer-readable storage medium, where a program in the storage medium, when executed by a processor of a computer device, enables the computer device to perform the method for converting speech into rap music according to the above embodiments. Illustratively, the method for converting speech into rap music according to the above embodiments includes: recognizing the obtained voice segment and processing the selected background music to obtain character attribute information of characters in the voice segment and music rhythm information of the background music; determining at least one alignment period for aligning the voice segment with the background music according to the character attribute information and the music rhythm information, and obtaining an alignment information table of each alignment period; and controlling the alignment of the characters in the voice section and rhythm points in the background music according to the alignment information tables, and forming a rap audio after tone change adjustment and special effect processing.
It should be noted that, as for the embodiments of the apparatus, the computer device, and the storage medium, since they are basically similar to the embodiments of the method, the description is relatively simple, and in the relevant places, reference may be made to the partial description of the embodiments of the method.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes instructions for causing a computer device (which may be a robot, a personal computer, a server, or a network device) to execute the method for converting voice into rap music according to any embodiment of the present invention.
It should be noted that, in the above apparatus for converting speech into rap music, the included units and modules are only divided according to functional logic, but not limited to the above division, as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for convenience of distinguishing from each other and are not used for limiting the protection scope of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by suitable instruction execution devices. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in more detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (13)

1. A method for converting speech to rap music, comprising:
recognizing the obtained voice segment and processing the selected background music to obtain character attribute information of characters in the voice segment and music rhythm information of the background music;
determining at least one alignment period for aligning the voice segment with the background music according to the character attribute information and the music rhythm information, and obtaining an alignment information table of each alignment period;
and controlling the alignment of the characters in the voice section and rhythm points in the background music according to the alignment information tables, and forming a rap audio after tone change adjustment and special effect processing.
2. The method according to claim 1, wherein the recognizing the obtained speech segment and processing the selected background music to obtain text attribute information of text in the speech segment and music tempo information of the background music comprises:
carrying out noise reduction processing and end point detection processing on the voice section selected by the user, and obtaining the character serial number, the starting and stopping time, the initial position of the first vowel and the character total amount of each character in the voice section through the voice recognition of the processed voice section to form character attribute information of the voice section;
detecting rhythm points and dividing rhythm periods of background music selected by a user, and determining that the background music comprises the total quantity of the rhythm points, the serial numbers of the rhythm points and the period information of each rhythm period to form music rhythm information of the background music;
wherein the period information includes: the cycle number, the number of rhythm points of the rhythm points included in the beat cycle, the serial number of the rhythm points of each rhythm point and the start time of the rhythm points.
3. The method according to claim 2, before determining that the background music contains the total number of tempo points, the number of tempo points, and the period information of each beat period, and constitutes the music tempo information of the background music, further comprising:
acquiring the detected initial rhythm points, and determining the interval duration formed by two adjacent initial rhythm points;
and determining and deleting rhythm points to be deleted in the initial rhythm points according to the average word length of the words in the voice section and the interval duration to obtain effective rhythm points in the background music.
4. The method according to claim 2, wherein the determining at least one alignment period for aligning the speech segment with the background music according to the text attribute information and the music tempo information, and obtaining an alignment information table of each alignment period, comprises:
determining at least one alignment period for aligning the speech segment with the background music according to the total number of characters in the text attribute information and the period information of each beat period in the music tempo information;
selecting one complete alignment period as a rhythm segment to be aligned, and determining at least one alignment unit and the corresponding alignment unit information according to the text attribute information and the rhythm point information of the rhythm points to be aligned in the rhythm segment to be aligned;
and summarizing the alignment unit information of each alignment unit to form a current alignment information table of the rhythm segment to be aligned, and determining the alignment information tables of the remaining alignment periods according to the current alignment information table.
5. The method according to claim 4, wherein the determining at least one alignment period for aligning the speech segment with the background music according to the total number of characters in the text attribute information and the period information of each beat period in the music tempo information comprises:
judging whether the number of rhythm points in the period information corresponding to a complete beat period is greater than or equal to the total number of characters;
if so, taking each beat period as an alignment period;
if not, when the number of beat periods contained in the background music is greater than 1, merging every two beat periods in order of period serial number to form at least one new beat period, and returning to continue comparing the number of rhythm points with the total number of characters.
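The judging-and-merging loop of this claim can be sketched as below. It assumes each beat period is a list of rhythm-point start times and that "a complete beat period" means the first period; these, and the function name, are assumptions, since the claim leaves the data layout open.

```python
def alignment_periods(beat_periods, total_chars):
    """Merge adjacent beat periods pairwise (in serial-number order)
    until a single complete period holds at least `total_chars` rhythm
    points; each resulting period then serves as one alignment period."""
    periods = [list(p) for p in beat_periods]
    while len(periods[0]) < total_chars and len(periods) > 1:
        # combine every two periods; an unpaired last period is kept as-is
        periods = [periods[i] + (periods[i + 1] if i + 1 < len(periods) else [])
                   for i in range(0, len(periods), 2)]
    return periods
```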
6. The method according to claim 4, wherein the determining the alignment information tables of the remaining alignment periods according to the current alignment information table comprises:
for each remaining alignment period, if the alignment period is a complete period, taking the current alignment information table as the alignment information table of the alignment period;
if the alignment period is an incomplete period, determining the target point number of rhythm points included in the alignment period;
and selecting, in reverse order, a number of rows of alignment unit information equal to the target point number from the current alignment information table, to form the alignment information table of the alignment period.
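A sketch of this table derivation for a remaining period. It assumes a "row" is any per-unit record and that a period counts as complete when it has at least as many rhythm points as the table has rows; both are assumptions, since the claim does not fix the table layout or the completeness test.

```python
def table_for_period(current_table, period_point_count):
    """Reuse the current alignment information table for a complete
    period; for an incomplete period, take rows in reverse order until
    the target point number is covered."""
    if period_point_count >= len(current_table):  # complete period
        return list(current_table)
    return current_table[::-1][:period_point_count]
```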
7. The method according to claim 4, wherein the determining at least one alignment unit and the corresponding alignment unit information according to the text attribute information and the rhythm point information of the rhythm points to be aligned in the rhythm segment to be aligned comprises:
forming rhythm blocks to be aligned in one-to-one correspondence with the rhythm points to be aligned based on the rhythm point information of the rhythm points to be aligned in the rhythm segment to be aligned, and recording the number of rhythm points to be aligned as the initial remaining point number;
determining the ratio of the remaining point number to the total number of characters in the text attribute information, and recording the ratio as an alignment matching value;
looking up a preset rhythm point-character alignment rule table, and determining the length ratio range to which the alignment matching value belongs;
determining, according to the alignment rule corresponding to the length ratio range, the rhythm blocks to be aligned and their matched characters, and recording them as candidate alignment units;
counting the number of remaining rhythm blocks to be aligned as the new remaining point number, and returning to re-execute the determination of the alignment matching value until the remaining point number is 0;
determining at least one alignment unit and obtaining the corresponding speed ratio according to the unit duration of each candidate alignment unit and the text attribute information of its matched characters;
and determining the unit serial number of each alignment unit, the serial number of its starting rhythm point, the character serial numbers of its matched characters and the speed ratio as the corresponding alignment unit information.
8. The method according to claim 7, wherein the determining at least one alignment unit and obtaining the corresponding speed ratio according to the unit duration of each candidate alignment unit and the text attribute information of its matched characters comprises:
a) selecting an unselected candidate alignment unit as a current processing unit in order of unit serial number;
b) determining the current speed ratio of the current processing unit according to the unit duration of the current processing unit, in combination with the start-stop times of the characters matched in the current processing unit and in the next adjacent candidate alignment unit and the start position of the first vowel;
c) comparing the current speed ratio with a set first speed ratio value and a set second speed ratio value, the second speed ratio value being greater than the first speed ratio value;
d) if the current speed ratio is greater than or equal to the first speed ratio value and less than or equal to the second speed ratio value, determining the current processing unit as an alignment unit, recording the current speed ratio as the speed ratio of the alignment unit, and then executing step g);
e) if the current speed ratio is greater than the second speed ratio value, determining a mute duration for filling the current processing unit, determining a new current speed ratio according to the mute duration, and then returning to step c);
f) if the current speed ratio is less than the first speed ratio value, merging the current processing unit with the next adjacent candidate alignment unit to form a new current processing unit, and returning to execute step b);
g) judging whether all the candidate alignment units have been selected for processing; if so, executing step h); if not, returning to execute step a);
h) aggregating the determined alignment units and their corresponding speed ratios.
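Steps a)–h) above can be sketched as a single loop. This is an illustrative reading under several assumptions not fixed by the claim: each candidate alignment unit is a `(unit_duration, pronunciation_duration)` pair, the speed ratio is `unit_duration / pronunciation_duration`, the mute duration of step e) is counted into the pronunciation so the ratio drops to exactly the second value, and the default threshold values are placeholders.

```python
def assign_speed_ratios(units, low=0.5, high=2.0):
    """Accept a unit whose ratio lies in [low, high] (steps c-d); pad with
    silence when the ratio exceeds `high` (step e); merge with the next
    candidate when the ratio falls below `low` (step f)."""
    result = []
    i = 0
    while i < len(units):                       # steps a) and g)
        unit_dur, speech_dur = units[i]
        i += 1
        ratio = unit_dur / speech_dur           # step b)
        while ratio < low and i < len(units):   # step f): merge with next unit
            nxt_dur, nxt_speech = units[i]
            i += 1
            unit_dur += nxt_dur
            speech_dur += nxt_speech
            ratio = unit_dur / speech_dur
        if ratio > high:                        # step e): fill with silence
            mute = unit_dur / high - speech_dur
            speech_dur += mute
            ratio = unit_dur / speech_dur       # now exactly `high`
        result.append((unit_dur, round(ratio, 6)))  # steps d) and h)
    return result
```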
9. The method according to claim 8, wherein the determining the current speed ratio of the current processing unit according to the unit duration of the current processing unit, in combination with the start-stop times of the matched characters in the current processing unit and in the next adjacent candidate alignment unit and the start position of the first vowel, comprises:
determining the pronunciation duration of all the matched characters in the current processing unit according to their start-stop times and first-vowel start positions;
determining the vowel interval duration of the first character matched by the next candidate alignment unit adjacent to the current processing unit, according to the start-stop time of that character and the start position of its first vowel;
and taking the ratio of the unit duration of the current processing unit to a determined actual pronunciation duration as the current speed ratio of the current processing unit, wherein the actual pronunciation duration is the sum of the pronunciation duration and the vowel interval duration.
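The ratio of this claim can be sketched directly. The data layout is an assumption: `char_times` holds the `(start, end)` pairs of the unit's matched characters (here the pronunciation duration is taken as the sum of their start-stop spans), and `next_first_char` is `(start_time, first_vowel_start)` of the first character matched by the next unit, whose vowel interval duration is the time from the character start to its first vowel.

```python
def current_speed_ratio(unit_duration, char_times, next_first_char):
    """Claim 9's formula: unit duration divided by the actual pronunciation
    duration, i.e. the pronunciation duration of the unit's matched
    characters plus the vowel interval of the next unit's first character."""
    pronunciation = sum(end - start for start, end in char_times)
    next_start, next_vowel = next_first_char
    vowel_interval = next_vowel - next_start
    actual = pronunciation + vowel_interval   # actual pronunciation duration
    return unit_duration / actual
```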
10. The method according to any one of claims 1 to 9, wherein before determining, according to the text attribute information and the music tempo information, at least one alignment period for aligning the speech segment with the background music, and obtaining the alignment information table of each alignment period, the method further comprises:
if the total number of characters in the text attribute information is greater than the total number of rhythm points in the music tempo information, ending the processing of converting the speech segment into rap music and prompting the user to re-acquire the speech segment or the background music.
11. An apparatus for converting speech into rap music, comprising:
an information determining module, configured to recognize the obtained speech segment and process the selected background music to obtain text attribute information of text in the speech segment and music tempo information of the background music;
an alignment information determining module, configured to determine, according to the text attribute information and the music tempo information, at least one alignment period for aligning the speech segment with the background music, and obtain an alignment information table of each alignment period;
and a conversion control module, configured to control the alignment of the characters in the speech segment with the rhythm points in the background music according to the alignment information tables, and to form rap audio after pitch-shift adjustment and special effect processing.
12. A computer device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of converting speech into rap music according to any one of claims 1-10.
13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of converting speech into rap music according to any one of claims 1-10.
CN202010688502.3A 2020-07-16 2020-07-16 Method, device, equipment and storage medium for converting voice into rap music Active CN111862913B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010688502.3A CN111862913B (en) 2020-07-16 2020-07-16 Method, device, equipment and storage medium for converting voice into rap music
PCT/CN2021/095236 WO2022012164A1 (en) 2020-07-16 2021-05-21 Method and apparatus for converting voice into rap music, device, and storage medium

Publications (2)

Publication Number Publication Date
CN111862913A true CN111862913A (en) 2020-10-30
CN111862913B CN111862913B (en) 2023-09-05

Family

ID=72984100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010688502.3A Active CN111862913B (en) 2020-07-16 2020-07-16 Method, device, equipment and storage medium for converting voice into rap music

Country Status (2)

Country Link
CN (1) CN111862913B (en)
WO (1) WO2022012164A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669849A (en) * 2020-12-18 2021-04-16 百度国际科技(深圳)有限公司 Method, apparatus, device and storage medium for outputting information
CN112700781A (en) * 2020-12-24 2021-04-23 江西台德智慧科技有限公司 Voice interaction system based on artificial intelligence
CN112712783A (en) * 2020-12-21 2021-04-27 北京百度网讯科技有限公司 Method and apparatus for generating music, computer device and medium
CN113823281A (en) * 2020-11-24 2021-12-21 北京沃东天骏信息技术有限公司 Voice signal processing method, device, medium and electronic equipment
WO2022012164A1 (en) * 2020-07-16 2022-01-20 百果园技术(新加坡)有限公司 Method and apparatus for converting voice into rap music, device, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114566191A (en) * 2022-02-25 2022-05-31 腾讯音乐娱乐科技(深圳)有限公司 Sound correcting method for recording and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5811707A (en) * 1994-06-24 1998-09-22 Roland Kabushiki Kaisha Effect adding system
CN101399036A (en) * 2007-09-30 2009-04-01 三星电子株式会社 Device and method for conversing voice to be rap music
CN103440862A (en) * 2013-08-16 2013-12-11 北京奇艺世纪科技有限公司 Method, device and equipment for synthesizing voice and music
CN107170464A (en) * 2017-05-25 2017-09-15 厦门美图之家科技有限公司 A kind of changing speed of sound method and computing device based on music rhythm
CN111402843A (en) * 2020-03-23 2020-07-10 北京字节跳动网络技术有限公司 Rap music generation method and device, readable medium and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103035235A (en) * 2011-09-30 2013-04-10 西门子公司 Method and device for transforming voice into melody
CN105931625A (en) * 2016-04-22 2016-09-07 成都涂鸦科技有限公司 Rap music automatic generation method based on character input
CN111862913B (en) * 2020-07-16 2023-09-05 广州市百果园信息技术有限公司 Method, device, equipment and storage medium for converting voice into rap music


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant