WO2023065641A1 - Speech synthesis method and system based on text editor, and storage medium - Google Patents

Speech synthesis method and system based on text editor, and storage medium

Info

Publication number
WO2023065641A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
operation instruction
function operation
pinyin
Prior art date
Application number
PCT/CN2022/090728
Other languages
French (fr)
Chinese (zh)
Inventor
侯炜健
陈闽川
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023065641A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 Prosody rules derived from text; Stress or intonation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/047 Architecture of speech synthesisers

Definitions

  • the present application relates to the technical field of artificial intelligence, in particular to a speech synthesis method, system and storage medium based on a text editor.
  • The Speech Synthesis Markup Language (SSML) specification is usually based on the Extensible Markup Language (XML).
  • Extensible Markup Language has strict format requirements, so when Speech Synthesis Markup Language is entered by hand, a single extra or missing character may make the entire markup invalid and therefore unparseable. Because it is difficult for users to write standard-conforming Speech Synthesis Markup Language, this approach is error-prone, and a single mistake can prevent the speech from being synthesized.
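  • The fragility described above can be illustrated with a minimal sketch. It uses Python's standard XML parser on two hand-written SSML-style strings; the sample strings and the helper name `is_parseable` are illustrative assumptions and do not come from the application.

```python
import xml.etree.ElementTree as ET

# Well-formed SSML-style markup: every opening tag has a matching close.
GOOD = '<speak>Hello<break time="400ms"/>world</speak>'
# The same markup with a single "/" missing from the final closing tag.
BAD = '<speak>Hello<break time="400ms"/>world<speak>'


def is_parseable(markup: str) -> bool:
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_parseable(GOOD))
print(is_parseable(BAD))
```

  • One missing character is enough for the parser to reject the whole string, which is the failure mode a visual editor is meant to keep away from the user.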
  • the embodiment of the present application provides a text editor-based speech synthesis method, comprising the following steps:
  • a speech synthesis result is obtained according to the unprocessed data, the target data, the target correction data and the target modification data.
  • the embodiment of the present application provides a speech synthesis system based on a text editor, including:
  • the data input module is used to obtain the text data to be processed
  • the data selection module is used to obtain the data selection function operation instruction according to the text data to be processed
  • a target data acquisition module configured to obtain unprocessed data and target data within a first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data;
  • a target correction module configured to obtain a correction function operation instruction according to the target data, and obtain target correction data corresponding to the target data according to the correction function operation instruction;
  • a target modification module configured to obtain a modification function operation instruction according to the target data, and obtain target modification data corresponding to the target data according to the modification function operation instruction;
  • a speech synthesis module configured to obtain a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
  • the embodiment of the present application provides a speech synthesis system based on a text editor, including: a memory, a processor, and a computer program stored in the memory and operable on the processor, wherein the processor executes the computer program to implement a text editor-based speech synthesis method, and the text editor-based speech synthesis method includes: obtaining text data to be processed; obtaining a data selection function operation instruction according to the text data to be processed; obtaining unprocessed data and target data within a first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data; obtaining a correction function operation instruction according to the target data, and obtaining target correction data corresponding to the target data according to the correction function operation instruction; obtaining a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction; and obtaining a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
  • the embodiment of the present application provides a computer-readable storage medium, which stores computer-executable instructions, and the computer-executable instructions are used to execute a speech synthesis method based on a text editor, wherein the speech synthesis method based on the text editor includes the following steps: obtaining text data to be processed; obtaining a data selection function operation instruction according to the text data to be processed; obtaining unprocessed data and target data within a first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data; obtaining a correction function operation instruction according to the target data, and obtaining target correction data corresponding to the target data according to the correction function operation instruction; obtaining a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction; and obtaining a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
  • the speech synthesis method, system and storage medium based on the text editor proposed in the embodiment of the present application first obtain the text data to be processed; then obtain the data selection function operation instruction according to the text data to be processed, and obtain the unprocessed data and the target data within the first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data; then obtain the correction function operation instruction according to the target data, and obtain the target correction data corresponding to the target data according to the correction function operation instruction; then obtain the modification function operation instruction according to the target data, and obtain the target modification data corresponding to the target data according to the modification function operation instruction; and finally obtain the speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
  • This embodiment can effectively reduce the errors that arise because the speech synthesis markup language is difficult to write, and thereby safeguard the quality of speech synthesis.
  • FIG. 1 is a schematic diagram of a text editor according to an embodiment of the present application;
  • FIG. 2 is a flow chart of a text editor-based speech synthesis method according to an embodiment of the present application;
  • FIG. 3 is a schematic flow chart of obtaining target pinyin data according to an embodiment of the present application;
  • FIG. 4 is a diagram of a specific example of candidate pinyin results according to an embodiment of the present application;
  • FIG. 5 is a diagram of a specific example of target pinyin data according to an embodiment of the present application;
  • FIG. 6 is a schematic flow chart of the polyphone pronunciation correction function operation instruction according to an embodiment of the present application;
  • FIG. 7 is a diagram of a specific display example of English data according to an embodiment of the present application;
  • FIG. 8 is a diagram of a specific display example of number symbol data according to an embodiment of the present application;
  • FIG. 9 is a diagram of a specific example of target corrected number symbol data according to an embodiment of the present application;
  • FIG. 10 is a schematic flow chart of obtaining target modification data according to an embodiment of the present application;
  • FIG. 11 is a diagram of a specific example of target pause data according to an embodiment of the present application;
  • FIG. 12 is a diagram of a specific example of target mute data according to an embodiment of the present application;
  • FIG. 13 is a diagram of a specific example of target special effect sound data according to an embodiment of the present application;
  • FIG. 14 is a schematic flow chart of obtaining target priority modification data according to an embodiment of the present application;
  • FIG. 15 is a diagram of a specific example of the multi-speaker function according to an embodiment of the present application;
  • FIG. 16 is a diagram of a specific example of the local speed change function according to an embodiment of the present application;
  • FIG. 17 is a diagram of specific examples of various modification functions according to an embodiment of the present application;
  • FIG. 18 is a diagram of a specific example of continuous reading data within the second selection range according to an embodiment of the present application;
  • FIG. 19 is a diagram of a specific example of the target data within the second selection range according to an embodiment of the present application;
  • FIG. 20 is a diagram of a specific example of continuous reading data according to an embodiment of the present application;
  • FIG. 21 is a schematic structural diagram of a speech synthesis system based on a text editor according to an embodiment of the present application.
  • Speech synthesis technology has gradually developed to be able to use AI technology to generate high-quality speech.
  • the embodiments of the present application provide a speech synthesis method, system and storage medium based on a text editor.
  • the embodiments of the present application can effectively reduce the errors that arise because the speech synthesis markup language is difficult to write, and thereby safeguard the quality of speech synthesis.
  • the speech synthesis method in the embodiment of the present application is based on a text editor, which can be used to input text data to be processed, and the text data to be processed may include Chinese character data, English data, number data, symbol data, etc.
  • the above text data to be processed can be processed through functional operation instructions such as data selection function operation instructions, correction function operation instructions and modification function operation instructions.
  • the speech synthesis result can also be output in a speech synthesis markup language format. For example, after data processing is performed on the text data to be processed in a text editor, a speech synthesis result is obtained.
  • the text editor can implement an editable function based on a web page.
  • the text editor can be similar to a rich text editor.
  • the speech synthesis effect is realized based on a text editor, which solves the problem that users are prone to make mistakes because it is difficult to write a standard speech synthesis markup language.
  • the text editor can be set to be visualized, so that the indirect editing process of the speech synthesis markup language can also be visualized to improve interactivity.
  • the text editor in the embodiment of the present application can also be operated normally through a web browser. With such a setting, the user can run the text editor anytime and anywhere based on the network without downloading other software.
  • the text editor can also implement a saving function. By selecting the saved text data to be processed and continuing to edit the saved text data to be processed, a speech synthesis result, such as an audio result, is obtained, thereby improving practicability.
  • The text editor of the embodiment of the present application may include modification functions, such as an audition function, a pinyin/English translation display function, a continuous reading function, a local speed change function, a pause function, a mute function, a multi-speaker function, a special effect function, a pitch change function, a volume change function, a background sound function, and a multi-tone function (that is, an optional customer service tone); correction functions, such as a pinyin (polyphonic character) pronunciation correction function, an English pronunciation correction function, and a number symbol pronunciation correction function; and also a new text function, a save text function, a download function, and a function for exporting speech synthesis markup language text (that is, exporting SSML).
  • Each function of the text editor corresponds to a function operation instruction. For example, the audition function corresponds to the audition function operation instruction, the local speed change function corresponds to the local speed change function operation instruction, and the volume change function corresponds to the volume change function operation instruction.
  • For example, the modification function operation instruction includes an audition (trial listening) function operation instruction: the audition function operation instruction is obtained according to the target data, and the target audition data is obtained according to the audition function operation instruction.
  • Likewise, the modification function operation instruction includes a pitch change (transposition) function operation instruction: the pitch change function operation instruction is obtained according to the target data, and the target pitch change data is obtained according to the pitch change function operation instruction, and so on. The desired target processing data is obtained through the function operation instructions, which increases the diversity of speech synthesis.
  • the text data corresponding to each line 1, 2, 3...7 is input text data to be processed. Since the text data to be processed needs to be selected, the unprocessed data and target data can be obtained according to the data selection function operation instructions. It can be understood that the target data is the data that needs to be processed, and after data processing is performed on the target data, the target processing data can be obtained.
  • The word "Le" represents the target data (or pinyin data), and the pinyin above the word "Le" represents the target correction data (or target pinyin data).
  • "Yangtze River Bridge" indicates the target data (or continuous reading data), and the continuous reading symbols on its left and right sides indicate the target modification data (or target continuous reading data). The character "long" can also represent polyphonic data, and the pinyin above the character "long" then represents the target correction data (or target polyphonic data).
  • The numbers 50 and -50 may represent target modification data (or target local speed change data); that is, the magnitude of the number is the magnitude of the speed change.
  • 400ms and 600ms may represent target modification data (or target mute data), where 400ms and 600ms are the durations of silence. It can be understood that, in other embodiments, when the target modification data is a number, the number can express volume, pitch, speech speed, etc., and is not limited to this embodiment.
  • the embodiment of the present application provides a text editor-based speech synthesis method, including the following steps:
  • Step S100: obtain the text data to be processed;
  • Chinese character data, English data, number data, symbol data or combined data can be input through the text editor, and the text data to be processed may be data commonly used by users.
  • Step S200: obtain the data selection function operation instruction according to the text data to be processed;
  • Step S300: obtain unprocessed data and target data within the first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data;
  • The target data within the first selection range is obtained according to the data selection function operation instruction. It can be understood that the target data can be part of the text data to be processed or the entire text data to be processed; if the target data is the entire text data to be processed, there is no unprocessed data;
  • Step S400: obtain the correction function operation instruction according to the target data, and obtain the target correction data corresponding to the target data according to the correction function operation instruction;
  • The correction function operation instruction is executed on the target data, and the target correction data corresponding to the target data is obtained according to the correction function operation instruction.
  • Erroneous data can be found by processing the text data to be processed in advance to obtain an audio result, for example by playing that audio result through the audition function, and can then be corrected.
  • The correction function operation instruction in this embodiment may be a pinyin pronunciation correction function operation instruction, an English pronunciation correction function operation instruction, a number symbol pronunciation correction function operation instruction, a typo correction function operation instruction, and the like.
  • Step S500: obtain the modification function operation instruction according to the target data, and obtain the target modification data corresponding to the target data according to the modification function operation instruction;
  • The modification function operation instruction is executed on the target data, and the target modification data corresponding to the target data is obtained according to the modification function operation instruction.
  • The target data is modified so that it carries a given modification effect.
  • The modification function operation instruction may be a continuous reading function operation instruction, a speed change function operation instruction, a scene function operation instruction, an insertion point function operation instruction, etc., so as to obtain the target modification data corresponding to the target data.
  • Step S600: obtain a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
  • This embodiment executes the correction function operation instruction on the target data to obtain the target correction data, and executes the modification function operation instruction on the target data to obtain the target modification data.
  • The target correction data and the target modification data add correction and modification effects to the corresponding target data.
  • The text data to be processed also includes unprocessed data other than the target data.
  • At output, the unprocessed data, the target data, the target correction data and the target modification data are combined to obtain the final speech synthesis result. It can be understood that even if the text data to be processed includes unprocessed data, the unprocessed data still has corresponding audio or pronunciation, so the unprocessed data also needs to be part of the speech synthesis result.
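  • Steps S100 to S600 can be sketched as a small data flow. This is a minimal illustration rather than the claimed implementation: the `Selection` class, the character-offset representation of the selection range, and the plan format returned by `synthesize_plan` are all assumptions introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Selection:
    """Hypothetical record of one selection range and the edits applied to it."""
    start: int                       # inclusive character offset
    end: int                         # exclusive character offset
    correction: str | None = None    # target correction data, e.g. "yue4"
    modifications: list[str] = field(default_factory=list)  # target modification data


def split_text(text: str, sel: Selection) -> tuple[str, str, str]:
    """Steps S200-S300: split the text into (unprocessed, target, unprocessed)."""
    return text[:sel.start], text[sel.start:sel.end], text[sel.end:]


def synthesize_plan(text: str, sel: Selection) -> list[dict]:
    """Step S600: combine unprocessed data, target data and both kinds of edits."""
    before, target, after = split_text(text, sel)
    plan: list[dict] = []
    if before:
        plan.append({"text": before})  # unprocessed data is still spoken
    plan.append({
        "text": target,
        "correction": sel.correction,
        "modifications": list(sel.modifications),
    })
    if after:
        plan.append({"text": after})
    return plan


text = "I like music very much"
sel = Selection(start=7, end=12, correction="yue4", modifications=["pause"])
plan = synthesize_plan(text, sel)
```

  • When the selection covers the whole text, the plan contains a single entry, which corresponds to the case where there is no unprocessed data.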
  • This embodiment can act on Chinese character data, English data or number symbol data.
  • If, after using the audition function, the user thinks that the pronunciation or reading of some text data to be processed is incorrect, the user can obtain the data selection function operation instruction according to the text data to be processed, so as to obtain the target data within the first selection range, and then execute the correction function operation instruction on the target data to perform the correction.
  • In some embodiments, the target data includes pinyin data, the target correction data includes target pinyin data, and the correction function operation instruction includes a pinyin pronunciation correction function operation instruction. Obtaining the correction function operation instruction according to the target data, and obtaining the target correction data corresponding to the target data according to the correction function operation instruction, includes:
  • Step S410: obtain at least one candidate pinyin result corresponding to the pinyin data according to the pinyin data, wherein the candidate pinyin results are sorted according to the pronunciation probability value corresponding to each candidate pinyin result;
  • The text editor can sort the candidate pinyin results according to the pronunciation probability value corresponding to each candidate pinyin result.
  • The pinyin data within the first selection range is obtained.
  • The first selection range may include a single item of pinyin data or multiple items of pinyin data.
  • The pinyin data includes monophonic data and polyphonic data.
  • When the pinyin data is monophonic data, there is one candidate pinyin result, and the correct pinyin can be obtained through that candidate pinyin result.
  • When the pinyin data is polyphonic data, the candidate pinyin results can be sorted according to the pronunciation probability values corresponding to the at least two candidate pinyin results.
  • Step S420: obtain the pinyin pronunciation correction function operation instruction according to the candidate pinyin results;
  • Step S430: obtain the target pinyin data corresponding to the pinyin data according to the pinyin pronunciation correction function operation instruction.
  • The pinyin pronunciation correction function operation instruction is obtained according to the candidate pinyin results, and the target pinyin data corresponding to the pinyin data is then obtained according to that instruction.
  • The pinyin pronunciation correction function operation instruction can be executed on the pinyin data, so that the user can correct the pinyin data and obtain the target pinyin data.
  • the candidate pinyin results can be displayed in the form of pinyin plus tones.
  • The candidate pinyin results can be placed above or below the corresponding pinyin data, or beside the corresponding pinyin data, and so on.
  • As an example, the character "乐" (rendered as "music" or "happy" in translation) represents the pinyin data of this embodiment. By selecting this character, at least two candidate pinyin results corresponding to it are obtained; this embodiment has four candidate pinyin results, le4, yue4, yao4 and lao4, sorted according to the pronunciation probability value, where the digit 4 denotes the tone.
  • According to the pinyin pronunciation correction function operation instruction, the target pinyin data corresponding to the character is obtained, which is yue4 in this embodiment.
  • The target pinyin data and the corresponding pinyin data, i.e. the character "乐", can be set to the same color, and that color differs from the color of the unprocessed data. This shows which pinyin data has had the pinyin pronunciation correction function operation instruction executed on it and improves recognition.
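  • Steps S410 to S430 can be sketched as follows. The four candidate readings are the ones named in this embodiment; the probability values, the function names and the idea of validating the user's choice against the candidate list are assumptions made for illustration only.

```python
from __future__ import annotations


def rank_candidates(probabilities: dict[str, float]) -> list[str]:
    """Step S410: sort candidate pinyin results from most to least probable."""
    return sorted(probabilities, key=probabilities.get, reverse=True)


def correct_pinyin(candidates: list[str], choice: str) -> str:
    """Steps S420-S430: accept the user's choice only if it is a listed candidate."""
    if choice not in candidates:
        raise ValueError(f"{choice!r} is not a candidate pinyin result")
    return choice


# Hypothetical pronunciation probability values for the four candidates.
probs = {"yue4": 0.40, "lao4": 0.02, "le4": 0.55, "yao4": 0.03}
candidates = rank_candidates(probs)
target_pinyin = correct_pinyin(candidates, "yue4")
```

  • Sorting by probability puts the most likely reading first, so a correction usually takes a single selection from the top of the list.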
  • In some embodiments, the pinyin pronunciation correction function operation instruction includes a polyphone pronunciation correction function operation instruction, and obtaining at least one candidate pinyin result corresponding to the pinyin data according to the pinyin data includes:
  • Step S411: obtain the polyphone pronunciation correction function operation instruction according to the pinyin data;
  • Step S412: obtain at least two candidate pinyin results corresponding to the pinyin data according to the polyphone pronunciation correction function operation instruction.
  • The pinyin data in this embodiment can include monophonic data and polyphonic data.
  • When the pinyin data is polyphonic data, the polyphone pronunciation correction function operation instruction is obtained, and at least two candidate pinyin results corresponding to the pinyin data are obtained according to that instruction, so that users can correct the pronunciation of polyphonic characters according to the pinyin data, which realizes functional diversity.
  • The target pinyin data can be displayed in a special color above the pinyin data, and the pinyin data changes color synchronously, so that the user can directly see on which pinyin data the pinyin pronunciation correction function operation instruction has been executed.
  • The text editor obtains the correction function operation instruction according to the target number symbol data, and obtains the target corrected number symbol data corresponding to the target number symbol data according to the correction function operation instruction. As shown in FIG. 9, the date indicates the target corrected number symbol data.
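  • One way such a number symbol correction could be exported is with the `say-as` element of SSML. The sketch below is an assumption about the export format, not something stated in the application: the sample string "2021-10-20" is invented, and the set of accepted `interpret-as` values depends on the synthesizer.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(text: str, interpret_as: str) -> str:
    """Wrap number or symbol text in an SSML <say-as> element."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


# Hypothetical example: mark a digit string so that it is read as a date.
markup = say_as("2021-10-20", "date")
print(markup)
```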
  • In some embodiments, the modification function operation instruction includes an insertion point function operation instruction. Obtaining the modification function operation instruction according to the target data, and obtaining the target modification data corresponding to the target data according to the modification function operation instruction, includes:
  • Step S510: obtain the target insertion position according to the target data, wherein the target insertion position is the left position and/or the right position corresponding to the target data;
  • Step S520: obtain the insertion point function operation instruction according to the target insertion position;
  • Step S530: obtain the target modification data corresponding to the target insertion position according to the insertion point function operation instruction, wherein the target modification data includes at least one of target pause data, target mute data, and target special effect sound data.
  • The insertion point function operation instruction can be executed on the target data, so that the target data has a modification effect.
  • For example, the left position corresponding to the target data is obtained according to the target data as the target insertion position, and an insertion point function operation instruction, such as a pause function operation instruction, a mute function operation instruction or a special effect sound function operation instruction, is executed at that left position. The target modification data corresponding to the left position, such as target pause data, target mute data or target special effect sound data, is then obtained according to the insertion point function operation instruction.
  • According to the pause function operation instruction, the target pause data corresponding to the target insertion position is obtained. The target pause data can include short pause data and long pause data, wherein the durations of the short pause and the long pause can be determined by preset pause times.
  • According to the mute function operation instruction, the target mute data corresponding to the target insertion position is obtained; 400ms, 600ms, and 800ms in FIG. 12 are the target mute data, and the numbers indicate the mute duration.
  • According to the special effect sound function operation instruction, the target special effect sound data corresponding to the target insertion position is obtained.
  • The target special effect sound data can be breath data, sigh data, heartbeat data, cough data, mouse click data, keyboard typing data, QQ message reminder data, etc.
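  • Steps S510 to S530 can be sketched as placing a marker to the left or right of the target data. The sketch assumes that the export uses the standard SSML `break` and `audio` elements; the function names and the file name `cough.wav` are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def silence(duration_ms: int) -> str:
    """Render target mute data as an SSML <break> of the given duration."""
    return f'<break time="{int(duration_ms)}ms"/>'


def sound_effect(src: str) -> str:
    """Render target special effect sound data as an SSML <audio> element."""
    return f"<audio src={quoteattr(src)}/>"


def insert_at(target: str, marker: str, side: str = "left") -> str:
    """Place the marker at the left or right insertion position of the target data."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    body = escape(target)
    return marker + body if side == "left" else body + marker


left = insert_at("hello", silence(400))
right = insert_at("hello", sound_effect("cough.wav"), side="right")
```

  • Because the marker is generated by the editor rather than typed by the user, the duration and tag syntax are always well formed.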
  • In some embodiments, the modification function operation instructions include a continuous reading function operation instruction, a speed change function operation instruction, and a scene function operation instruction. Obtaining the modification function operation instruction according to the target data, and obtaining the target modification data corresponding to the target data according to the modification function operation instruction, includes:
  • Step S550: obtain the continuous reading function operation instruction, the speed change function operation instruction and the scene function operation instruction according to the target data;
  • Step S560: obtain the target priority modification data corresponding to the target data according to the priority corresponding to the continuous reading function operation instruction, the priority corresponding to the speed change function operation instruction, and the priority corresponding to the scene function operation instruction, wherein the priority corresponding to the continuous reading function operation instruction is higher than the priority corresponding to the speed change function operation instruction, and the priority corresponding to the speed change function operation instruction is higher than the priority corresponding to the scene function operation instruction.
  • Accordingly, the continuous reading function has a higher priority than the speed change function, and the speed change function has a higher priority than the scene function.
  • the scene includes multiple tones, multiple speakers, and background sounds.
  • accordingly, the scene function includes a multi-tone function, a multi-speaker function, and a background sound function. In one embodiment, the multi-tone, multi-speaker, and background sound functions form the outermost layer (the lowest priority), followed by the speed change function (such as the local speed change function), and finally the continuous reading function (the highest priority).
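The layering described above (scene outermost, speed change in the middle, continuous reading innermost) can be sketched as a sort by priority followed by wrapping. The `PRIORITY` table, the bracket notation, and the function name are assumptions chosen for illustration; the source only fixes the relative order.

```python
# Hypothetical priority table: a larger number means higher priority,
# and higher priority ends up closer to the text (inner layer).
PRIORITY = {"scene": 1, "speed": 2, "continuous": 3}


def nest_modifications(text, modifications):
    """Wrap text so the highest-priority function is innermost."""
    ordered = sorted(modifications, key=lambda name: PRIORITY[name], reverse=True)
    wrapped = text
    for name in ordered:
        # Each later (lower-priority) function wraps everything built so far.
        wrapped = f"[{name}={modifications[name]}]{wrapped}[/{name}]"
    return wrapped
```

For example, applying a scene value, a speed value of "25", and continuous reading to the same range yields the scene layer on the outside regardless of the order in which the user applied them.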
  • the multi-speaker function indicates that in some scenarios, the user wishes to synthesize a speech synthesis result of a two-person dialogue or even a multi-person dialogue.
  • the multi-tone function means switching between different tones according to the application scenario, such as customer service, telemarketing, broadcast, happy, angry, sad, etc.
  • the background sound function represents background music, white noise; in some embodiments, background sound can be applied to an audiobook scene.
  • the system may mistakenly insert a pause inside a longer piece of target data, such as Chinese character data.
  • for example, "Wuhan Yangtze River Bridge" may be wrongly broken into "Wuhan Mayor / Jiang Bridge", so the continuous reading function needs to ensure that there is no pause in the middle of the Chinese character data within the selected range.
  • this embodiment can perform a local speed change function, or a pitch change function, volume change function, etc. for the target data within the selected range.
  • FIG. 17 is a diagram of a specific embodiment in which various modification functions are applied.
  • "customer service Rongrong" is the target multi-tone third modification data, obtained according to the multi-tone function operation instruction and the priority corresponding to that instruction;
  • the parameter value "25" is the target speed change second modification data, obtained according to the speed change function operation instruction and the priority corresponding to that instruction.
  • the target priority modification data in this embodiment includes the target continuous reading first modification data, the target speed change second modification data, and the target multi-tone third modification data.
  • other different modification functions may also be applied, not limited to this embodiment, and will not be repeated here.
  • multiple different modification functions can also be applied to the target data corresponding to different intervals within the first selection range; this is not limited to this embodiment and will not be repeated here.
  • by re-acquiring the data selection function operation instruction, the first selection range corresponding to the original target data is deleted, and the target data in the second selection range is then obtained according to the re-acquired instruction; changing the selection range in this way yields different target data.
  • the second selection range may be smaller than, larger than, or equal to the first selection range. Specifically, the step of obtaining the target data within the second selection range, after the data selection function operation instruction is re-acquired and the first selection range is deleted, may include multiple different embodiments.
  • the second selection range can be processed according to the principle that "the selection range is a new range, and other old ranges are removed".
  • define a modification function without parameters as a parameterless function (for example, the continuous reading function, for which the continuous reading data marked by the target continuous reading data can be directly split and merged); define a modification function with parameters as a parameter function (for example, the speed change function, the pitch change function, and the volume change function).
  • after steps S100 to S500 are executed, that is, after the continuous reading function operation instruction is executed, the step of re-acquiring the data selection function operation instruction is executed: the first selection range is deleted, and the continuous reading data in the second selection range is obtained.
  • the target continuous reading data obtained in step S500 is still retained, and is correspondingly set on the left and right sides of the target data in the second selection range.
  • the data on the left shows the target continuous reading data and the continuous reading data of the first selection range, obtained after the continuous reading function operation instruction is executed. When the data selection function operation instruction is then re-acquired, the original first selection range is deleted and the continuous reading data in the second selection range is obtained. The data on the right shows the continuous reading data in the second selection range and the corresponding target continuous reading data.
  • the target correction data obtained through step S400 and/or the target modification data obtained through step S500 will still be retained. That is, this embodiment changes the target data in the first selection range: after the first selection range is deleted, the original target correction data and/or target modification data are retained, which yields the target data in the second selection range together with the corresponding target correction data and/or target modification data. In other words, the modification function corresponding to the target modification data is retained.
  • after steps S100 to S500 are executed, that is, after the modification function operation instruction is executed, the data selection function operation instruction is re-acquired, the first selection range is deleted, and the target data within the second selection range is obtained.
  • when the second selection range corresponding to the target data changes, the parameter value of the corresponding target modification data also changes.
  • there are three kinds of selection conditions corresponding to the second selection range: the single-sided selection condition, the internal selection condition, and the both-sides selection condition.
  • the single-sided selection condition indicates that unprocessed data is selected on the left or the right of the target data within the first selection range, and the second selection range includes part of the target data; in this case, the first selection range can be greater than, equal to, or smaller than the second selection range;
  • the internal selection condition indicates that part of the target data is selected within the first selection range, that is, the first selection range is larger than the second selection range;
  • the both-sides selection condition indicates that unprocessed data is selected on both the left and the right of the target data in the first selection range, and all the target data in the first selection range is included in the second selection range, that is, the second selection range is larger than the first selection range.
  • the single-sided selection condition of this embodiment is described as follows. Specifically, the data on the left side of FIG. 19 shows the target modification data and the target data in the first selection range, obtained after a modification function operation instruction (a parameter function operation instruction) is executed on the target data in the first selection range. The data selection function operation instruction is then re-acquired under the single-sided selection condition; at this point, the original first selection range is deleted and the target data in the second selection range is obtained.
  • the data on the right side of the figure shows the target data in the second selection range and the corresponding target modification data.
  • the second selection range includes the first interval range and the second interval range.
  • the first interval range includes the first target data, which is the data in the first selection range on which the data selection function operation instruction has not been re-executed but the modification function operation instruction has been executed; the second interval range includes the second target data, which is the data on which the data selection function operation instruction has been re-executed.
  • the target modification data corresponding to the first target data remains unchanged, that is, the modification function corresponding to the target data in the first selection range and the parameter value of that function are both retained; for the second target data, the modification function in the corresponding target modification data is unchanged but the parameter value is changed, that is, the modification function corresponding to the target data in the first selection range is retained while the parameter value changes.
  • the parameter value corresponding to the second target data may be a preset value, such as "50" in FIG. 19.
  • the internal selection condition of this embodiment is described by taking the third row of data as an example.
  • the data on the left side of FIG. 19 shows the target modification data and the target data in the first selection range, obtained after a modification function operation instruction (a parameter function operation instruction) is executed on the target data in the first selection range. The data selection function operation instruction is then re-acquired under the internal selection condition.
  • at this point, the original first selection range is deleted and the target data in the second selection range is obtained.
  • the data on the right side of the figure shows the target data in the second selection range and the corresponding target modification data.
  • the second selection range includes the first interval range and the second interval range.
  • the first interval range includes the first target data;
  • the second interval range includes the second target data;
  • for the modification function and parameter value corresponding to the first target data and the second target data, refer to the description of the single-sided selection condition above, which will not be repeated here.
  • because this embodiment selects part of the target data in the middle of the first selection range, three sets of target modification data corresponding to the target data are obtained; the data "1" and "4", which correspond to the parameter value 25, both belong to the first target data, that is, the data in the first selection range on which the data selection function operation instruction has not been re-executed but the modification function operation instruction has been executed.
  • under the both-sides selection condition, the data on the left side of FIG. 19 shows the target modification data and the target data in the first selection range, obtained after a modification function operation instruction (a parameter function operation instruction) is executed on the target data in the first selection range. The data selection function operation instruction is then re-acquired under the both-sides selection condition; at this point, the original first selection range is deleted and the target data in the second selection range is obtained.
  • the data on the right side of the figure shows the target data in the second selection range and the corresponding target modification data.
  • the second selection range includes the second interval range.
  • the second interval range includes the second target data;
  • the second target data is the data on which the data selection function operation instruction has been re-executed.
  • the modification function in the target modification data corresponding to the second target data is unchanged but the parameter value is changed, that is, the modification function corresponding to the target data in the first selection range is retained while the parameter value changes.
  • the parameter value corresponding to the second target data may be a preset value, such as "50" in FIG. 19.
  • the magnitude of the parameter values in FIG. 19 can indicate the speed for a speed change, the pitch for a pitch change, the volume for a volume change, and so on.
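The three selection conditions and their effect on parameter values can be condensed into one rule: the newly selected range takes the preset value, and any part of the old range left outside it keeps the old value. The sketch below illustrates that reading under stated assumptions: the `Span` record, the half-open character offsets, and the function names are hypothetical, and the values 25 and 50 simply mirror those cited for FIG. 19.

```python
from dataclasses import dataclass

PRESET_VALUE = 50  # preset parameter value, mirroring the "50" cited for FIG. 19


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    value: int  # parameter value of a parameter function, e.g. speed


def classify(old, new_start, new_end):
    """Name the selection condition of a new range relative to the old one."""
    if (new_start, new_end) == (old.start, old.end):
        return "same"
    if not (new_start < old.end and new_end > old.start):
        return "disjoint"
    extends_left = new_start < old.start
    extends_right = new_end > old.end
    if extends_left and extends_right:
        return "both-sides"
    if extends_left or extends_right:
        return "single-sided"
    return "internal"


def reselect(old, new_start, new_end):
    """Return the spans after re-selection, ordered left to right."""
    spans = []
    if old.start < new_start:  # leftover of the old range on the left
        spans.append(Span(old.start, min(old.end, new_start), old.value))
    spans.append(Span(new_start, new_end, PRESET_VALUE))
    if new_end < old.end:      # leftover of the old range on the right
        spans.append(Span(max(old.start, new_end), old.end, old.value))
    return spans
```

With an old range of four characters at value 25, selecting the middle two produces three spans (25, 50, 25), which matches the internal-selection case where the outer data keeps the parameter value 25.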
  • after step S300, if the data selection function operation instruction is directly re-acquired and the first selection range is deleted to obtain the target data in the second selection range, then when step S400 and/or step S500 is executed, the resulting target correction data and target modification data correspond to the target data within the second selection range.
  • the speech synthesis result can be changed, so that the user can customize the desired audio and improve the applicability.
  • the modification function operation instruction includes continuous reading function operation instruction
  • the target data includes continuous reading data
  • the modification function operation instruction is obtained according to the target data
  • the target modification data corresponding to the target data is obtained according to the modification function operation instruction, including: if the number of continuous reading data in the second selection range is one, cancelling the continuous reading function operation instruction.
  • the modification function operation instruction includes a continuous reading function operation instruction. It should be noted that when the target data includes continuous reading data, if the number of continuous reading data in the second selection range is one, it is meaningless to execute the continuous reading function operation instruction on that data.
  • the continuous reading function operation instruction is therefore cancelled, which means that the text editor refuses to execute it.
  • the continuous reading function operation instruction is obtained according to the continuous reading data
  • the target continuous reading data corresponding to the continuous reading data is obtained according to the continuous reading function operation instruction.
  • a continuous reading symbol mark can be used to represent it.
  • the target continuous reading data, that is, the continuous reading symbol mark, corresponds to the continuous reading data and can be set on the left and right sides of the continuous reading data.
  • the number of continuous reading data is at least two; for example, the continuous reading data "continuous" has a count of two, since each of its two characters (rendered in this translation as "Continuous" and "continued") counts as one. The delete function operation instruction is then executed on the continuous reading data until the number of continuous reading data in the second selection range is one; at this point, the continuous reading function operation instruction is also cancelled, so that the continuous reading data is converted into unprocessed data, as in the third row on the right side of the arrow in FIG. 20.
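The cancellation rule can be sketched in a few lines. This is an illustrative sketch only: the list-of-units representation and the square-bracket markers standing in for the continuous reading symbol mark are assumptions, not the notation used in the figures.

```python
def apply_continuous_reading(units):
    """Mark units for continuous reading, or cancel if fewer than two remain.

    `units` is a hypothetical list with one string per counted unit.
    """
    if len(units) < 2:
        # With a single unit there is nothing to join, so the instruction
        # is cancelled and the data is returned as unprocessed text.
        return "".join(units)
    # Markers are set on the left and right sides of the joined data.
    return "[" + "".join(units) + "]"
```

Deleting units one at a time and re-applying the function reproduces the behaviour described above: the markers disappear as soon as only one unit is left.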
  • an embodiment of the present application also provides a speech synthesis system based on a text editor, including:
  • a data input module 100 configured to obtain text data to be processed
  • the data selection module 200 is used to obtain the data selection function operation instruction according to the text data to be processed
  • the target data acquisition module 300 is used to obtain unprocessed data and target data within the first selection range according to the data selection function operation instruction, wherein the text data to be processed includes unprocessed data and target data;
  • the target correction module 400 is configured to obtain a correction function operation instruction according to the target data, and obtain target correction data corresponding to the target data according to the correction function operation instruction;
  • the target modification module 500 is configured to obtain a modification function operation instruction according to the target data, and obtain target modification data corresponding to the target data according to the modification function operation instruction;
  • the speech synthesis module 600 is configured to obtain a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
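The division of work among modules 100 to 600 can be sketched as a single session object. This is a minimal sketch under stated assumptions: the class, its method names, and the dictionary handed to the synthesizer are all hypothetical, and the actual synthesis backend is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EditorSession:
    text: str                           # text data to be processed (module 100)
    selection: Optional[tuple] = None   # first selection range (module 200)
    corrections: dict = field(default_factory=dict)    # module 400
    modifications: dict = field(default_factory=dict)  # module 500

    def select(self, start, end):
        """Record the selection range; must be called before the getters below."""
        if not 0 <= start < end <= len(self.text):
            raise ValueError("selection out of range")
        self.selection = (start, end)

    def target_data(self):
        """Module 300: the text inside the selection range."""
        start, end = self.selection
        return self.text[start:end]

    def unprocessed_data(self):
        """Module 300: the text before and after the selection range."""
        start, end = self.selection
        return self.text[:start], self.text[end:]

    def synthesis_request(self):
        """Module 600 input: everything the synthesizer would need."""
        before, after = self.unprocessed_data()
        return {
            "unprocessed": [before, after],
            "target": self.target_data(),
            "corrections": dict(self.corrections),
            "modifications": dict(self.modifications),
        }
```

A caller would create a session, call `select`, fill `corrections` and `modifications` through the editor's functions, and pass the result of `synthesis_request()` to whatever synthesis engine is in use.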
  • an embodiment of the present application also provides a speech synthesis system based on a text editor, which includes: a memory, a processor, and a computer program stored in the memory and operable on the processor. When the processor executes the computer program, a text editor-based speech synthesis method is implemented, wherein the method includes: acquiring text data to be processed; obtaining a data selection function operation instruction according to the text data to be processed; obtaining unprocessed data and target data within the first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data; obtaining a correction function operation instruction according to the target data, and obtaining target correction data corresponding to the target data according to the correction function operation instruction; obtaining a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction; and obtaining a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
  • the processor and memory can be connected by a bus or other means.
  • memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage devices.
  • the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the text editor-based speech synthesis system in this embodiment can be applied to the text editor-based speech synthesis method in the above embodiments. The system and the method share the same inventive concept, so these embodiments have the same implementation principle and technical effect, which will not be described in detail here.
  • the non-transitory software programs and instructions required to implement the text editor-based speech synthesis method of the above embodiments are stored in the memory and, when executed by the processor, perform that method, for example, the method steps S100 to S600 in FIG. 2, method steps S410 to S430 in FIG. 3, method steps S411 to S412 in FIG. 6, method steps S510 to S530 in FIG. 10, and method steps S550 to S560 in FIG. 14 described above.
  • an embodiment of the present application also provides a computer-readable storage medium that stores computer-executable instructions. When the computer-executable instructions are executed by a processor or a controller, for example, by a processor in the above text editor-based speech synthesis system embodiment, the processor is caused to execute the text editor-based speech synthesis method in the above embodiments, for example, the method steps S100 to S600 in FIG. 2, method steps S410 to S430 in FIG. 3, method steps S411 to S412 in FIG. 6, method steps S510 to S530 in FIG. 10, and method steps S550 to S560 in FIG. 14 described above.
  • the computer-readable storage medium may be non-volatile or volatile.
  • computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and that can be accessed by a computer.
  • communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Document Processing Apparatus (AREA)

Abstract

Disclosed are a speech synthesis method and system based on a text editor, and a storage medium. The present application can be widely applied to the technical field of artificial intelligence. The method of the present application comprises the following steps: acquiring text data to be processed; acquiring a data selection function operation instruction according to said text data; according to the data selection function operation instruction, obtaining unprocessed data, and target data which is within a first selection range, wherein said text data comprises the unprocessed data and the target data; acquiring a correction function operation instruction according to the target data, and obtaining target corrected data according to the correction function operation instruction; acquiring a modification function operation instruction according to the target data, and obtaining target modified data according to the modification function operation instruction; and obtaining a speech synthesis result according to the unprocessed data, the target data, the target corrected data and the target modified data. By means of the present application, errors caused by the fact that a speech synthesis markup language is difficult to write can be effectively reduced.

Description

Speech synthesis method, system and storage medium based on text editor
This application claims priority to Chinese patent application No. 202111236415.5, filed with the China Patent Office on October 22, 2021 and entitled "Speech synthesis method, system and storage medium based on text editor", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the technical field of artificial intelligence, and in particular to a speech synthesis method, system and storage medium based on a text editor.
Background
With the rapid development of artificial intelligence (AI), speech synthesis technology has gradually developed to the point where AI can be used to generate high-quality speech. For example, a user inputs a piece of text and, after selecting the required voice library, can quickly synthesize extremely realistic audio. The W3C currently defines the Speech Synthesis Markup Language (SSML).
Technical problem
The following is the technical problem of the prior art recognized by the inventors: in the related art, speech synthesis markup language specifications are usually based on the Extensible Markup Language (XML). However, XML has strict format requirements, so when speech synthesis markup is entered, a single extra or missing character may make the entire markup invalid and therefore impossible to parse. With this approach to speech synthesis, users usually find it difficult to write well-formed speech synthesis markup, so errors are easy to make and the speech cannot be synthesized.
Technical solution
In a first aspect, an embodiment of the present application provides a text editor-based speech synthesis method, including the following steps:
acquiring text data to be processed;
obtaining a data selection function operation instruction according to the text data to be processed;
obtaining unprocessed data and target data within a first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data;
obtaining a correction function operation instruction according to the target data, and obtaining target correction data corresponding to the target data according to the correction function operation instruction;
obtaining a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction;
obtaining a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
In a second aspect, an embodiment of the present application provides a speech synthesis system based on a text editor, including:
a data input module, configured to acquire text data to be processed;
a data selection module, configured to obtain a data selection function operation instruction according to the text data to be processed;
a target data acquisition module, configured to obtain unprocessed data and target data within a first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data;
a target correction module, configured to obtain a correction function operation instruction according to the target data, and obtain target correction data corresponding to the target data according to the correction function operation instruction;
a target modification module, configured to obtain a modification function operation instruction according to the target data, and obtain target modification data corresponding to the target data according to the modification function operation instruction;
a speech synthesis module, configured to obtain a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
In a third aspect, an embodiment of the present application provides a speech synthesis system based on a text editor, including: a memory, a processor, and a computer program stored in the memory and operable on the processor, wherein the processor, when executing the computer program, implements a text editor-based speech synthesis method, and the method includes: acquiring text data to be processed; obtaining a data selection function operation instruction according to the text data to be processed; obtaining unprocessed data and target data within a first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data; obtaining a correction function operation instruction according to the target data, and obtaining target correction data corresponding to the target data according to the correction function operation instruction; obtaining a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction; and obtaining a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute a text editor-based speech synthesis method, wherein the method includes the following steps: acquiring text data to be processed; obtaining a data selection function operation instruction according to the text data to be processed; obtaining unprocessed data and target data within a first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data; obtaining a correction function operation instruction according to the target data, and obtaining target correction data corresponding to the target data according to the correction function operation instruction; obtaining a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction; and obtaining a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
Beneficial effects
The text-editor-based speech synthesis method, system and storage medium proposed in the embodiments of the present application first acquire text data to be processed; then acquire a data selection function operation instruction according to the text data to be processed, and thereby obtain unprocessed data and target data within a first selection range, wherein the text data to be processed includes the unprocessed data and the target data; next acquire a correction function operation instruction according to the target data and obtain, according to the correction function operation instruction, target correction data corresponding to the target data; acquire a modification function operation instruction according to the target data and obtain, according to the modification function operation instruction, target modification data corresponding to the target data; and obtain a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data. The embodiments can effectively reduce errors caused by the difficulty of writing Speech Synthesis Markup Language by hand, thereby ensuring the speech synthesis result.
Brief description of the drawings
The accompanying drawings provide a further understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments of the present application, they serve to explain the technical solution of the present application and do not limit it.
FIG. 1 is a schematic diagram of a text editor according to an embodiment of the present application;
FIG. 2 is a flowchart of a text-editor-based speech synthesis method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of obtaining target pinyin data according to an embodiment of the present application;
FIG. 4 is a diagram of a specific example of candidate pinyin results according to an embodiment of the present application;
FIG. 5 is a diagram of a specific example of target pinyin data according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of the polyphonic-character pronunciation correction function operation instruction according to an embodiment of the present application;
FIG. 7 is a diagram of a specific display example of English data according to an embodiment of the present application;
FIG. 8 is a diagram of a specific display example of number-symbol data according to an embodiment of the present application;
FIG. 9 is a diagram of a specific example of target corrected number-symbol data according to an embodiment of the present application;
FIG. 10 is a schematic flowchart of obtaining target modification data according to an embodiment of the present application;
FIG. 11 is a diagram of a specific example of target pause data according to an embodiment of the present application;
FIG. 12 is a diagram of a specific example of target silence data according to an embodiment of the present application;
FIG. 13 is a diagram of a specific example of target special-effect sound data according to an embodiment of the present application;
FIG. 14 is a schematic flowchart of obtaining target priority modification data according to an embodiment of the present application;
FIG. 15 is a diagram of a specific example of the multi-speaker function according to an embodiment of the present application;
FIG. 16 is a diagram of a specific example of the local speed-change function according to an embodiment of the present application;
FIG. 17 is a diagram of a specific example of multiple different modification functions according to an embodiment of the present application;
FIG. 18 is a diagram of a specific example of continuous-reading data within a second selection range according to an embodiment of the present application;
FIG. 19 is a diagram of a specific example of target data within the second selection range according to an embodiment of the present application;
FIG. 20 is a diagram of a specific example of continuous-reading data according to an embodiment of the present application;
FIG. 21 is a schematic structural diagram of a text-editor-based speech synthesis system according to an embodiment of the present application.
Embodiments of the invention
To make the purpose, technical solution and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present application and not to limit it.
It should be noted that although functional modules are divided in the device schematic and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that in the device, or in an order different from that in the flowcharts. The terms "first", "second" and the like in the specification, the claims and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence.
With the rapid development of artificial intelligence, speech synthesis technology has evolved to use AI techniques to generate high-quality speech. In the related art, Speech Synthesis Markup Language (SSML) specifications are generally based on Extensible Markup Language (XML). However, XML has strict format requirements, so when SSML is entered by hand, a single extra or missing character may render the entire SSML document invalid and therefore unparseable. Because users usually find it difficult to write well-formed SSML, this approach is error-prone, and the speech may fail to be synthesized.
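The strictness described above can be illustrated with a minimal sketch. It uses Python's standard XML parser; the two markup strings are hypothetical examples and are not taken from the application. Dropping a single character from an otherwise valid document is enough to make parsing fail.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical markup, for illustration only.
valid = '<speak>Hello<break time="400ms"/>world</speak>'
# The same markup with a single "/" dropped from the break element.
broken = '<speak>Hello<break time="400ms">world</speak>'

print(is_well_formed(valid))   # True
print(is_well_formed(broken))  # False
```

An editor that generates the markup on the user's behalf avoids this class of error, because the user never types the tags directly.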
On this basis, the embodiments of the present application provide a text-editor-based speech synthesis method, system and storage medium. The embodiments can effectively reduce errors caused by the difficulty of writing Speech Synthesis Markup Language (SSML) by hand, thereby ensuring the speech synthesis result.
It can be understood that the speech synthesis method of the embodiments is based on a text editor, which can be used to input text data to be processed; the text data to be processed may include Chinese character data, English data, numeric data, symbol data, and so on. After Chinese character data, English data, numeric data or symbol data is entered in the text editor, the text data to be processed can be processed through function operation instructions such as the data selection function operation instruction, the correction function operation instruction and the modification function operation instruction, so as to finally output a speech synthesis result. It can be understood that the speech synthesis result may also be output in SSML format. For example, after the text data to be processed is processed in the text editor, a speech synthesis result is obtained; converting that result into SSML text through data conversion makes it convenient to import the SSML text into other systems for further use.
In some embodiments, the text editor provides its editing functions in a web page. For example, the text editor may be similar to a rich text editor. Achieving speech synthesis through a text editor solves the problem that users make mistakes because it is difficult to write well-formed Speech Synthesis Markup Language.
In some embodiments, the text editor may be visual, so that the indirect editing of the Speech Synthesis Markup Language also becomes visual, which improves interactivity. The text editor helps users customize the audio they want, which improves applicability.
In some embodiments, the text editor of the embodiments may also run in a web browser. With this arrangement, the user can run the text editor anytime and anywhere over the network without downloading other software. In another embodiment, the text editor also provides a save function: the user selects saved text data to be processed and continues editing it to obtain a speech synthesis result, for example an audio result, which improves practicality.
The text editor of the embodiments may include: modification functions, such as an audition function, a pinyin/English translation display function, a continuous-reading function, a local speed-change function, a pause function, a silence function, a multi-speaker function, a special-effect function, a pitch-change function, a volume-change function, a background-sound function, and a multi-tone function (that is, a selectable customer-service tone); correction functions, such as a pinyin (polyphonic-character) pronunciation correction function, an English pronunciation correction function, and a number-symbol pronunciation correction function; and also a new-text function, a save-text function, a download function, a function for exporting Speech Synthesis Markup Language text (that is, exporting SSML), and so on.
It can be understood that, when the text editor is running, each function of the text editor has a corresponding function operation instruction. For example, the audition function corresponds to an audition function operation instruction, the local speed-change function corresponds to a local speed-change function operation instruction, and the volume-change function corresponds to a volume-change function operation instruction. In one embodiment, the modification function operation instruction includes the audition function operation instruction: the audition function operation instruction is acquired according to the target data, and target audition data is obtained according to the audition function operation instruction. As another example, the modification function operation instruction includes a pitch-change function operation instruction: the pitch-change function operation instruction is acquired according to the target data, and target pitch-change data is obtained according to the pitch-change function operation instruction. Obtaining the desired target processed data through function operation instructions increases the diversity of speech synthesis.
It can be understood that, in the text editor shown in FIG. 1, the text data in each of lines 1, 2, 3 ... 7 is the input text data to be processed. Because data must be selected from the text data to be processed, unprocessed data and target data are obtained according to the data selection function operation instruction. It can be understood that the target data is the data that needs to be processed, and that target processed data is obtained after the target data is processed. For example, in polyphonic-character correction, the character "乐" is the target data (or pinyin data), and the pinyin above "乐" is the target correction data (or target pinyin data). In continuous reading, "长江大桥" (Yangtze River Bridge) is the target data (or continuous-reading data), and the continuous-reading markers on its left and right sides are the target modification data (or target continuous-reading data); the character "长" may also be polyphonic-character data, in which case the pinyin above "长" is the target correction data (or target polyphonic-character data). In local speed change, the numbers 50 and -50 may represent the target modification data (or target local speed-change data); that is, the magnitude of the number is the magnitude of the speed change. In silence, 400ms and 600ms may represent the target modification data (or target silence data), indicating the silence duration. It can be understood that, in other embodiments, when the target modification data is a number, the number may represent the volume level, the pitch level, the speech rate, and so on, and is not limited to this embodiment.
Specifically, referring to FIG. 2, an embodiment of the present application provides a text-editor-based speech synthesis method, including the following steps:
Step S100: acquire text data to be processed;
It should be noted that Chinese character data, English data, numeric data, symbol data or combined data (such as number-symbol data or mixed Chinese-English data) may be entered through the text editor, and the text data to be processed may be data commonly used by the user.
Step S200: acquire a data selection function operation instruction according to the text data to be processed;
Step S300: obtain, according to the data selection function operation instruction, unprocessed data and target data within a first selection range, wherein the text data to be processed includes the unprocessed data and the target data;
It should be noted that the data selection function operation instruction is executed on the text data to be processed, and the target data within the first selection range is obtained according to that instruction. It can be understood that the target data may be part of the text data to be processed or the whole of it; if the target data is the whole of the text data to be processed, there is no unprocessed data;
Step S400: acquire a correction function operation instruction according to the target data, and obtain, according to the correction function operation instruction, target correction data corresponding to the target data;
Next, the correction function operation instruction is executed on the target data, and the target correction data corresponding to the target data is obtained according to the correction function operation instruction. In this embodiment, when the target data in the text data to be processed contains erroneous data, the erroneous data needs to be corrected. It can be understood that the erroneous data may be identified by first processing the text data to be processed to obtain an audio result, for example by playing the audio result through the audition function. A mispronunciation in the audio result indicates that the target data contains erroneous data, such as a pronunciation error or a wrongly written character. Correspondingly, the correction function operation instruction of this embodiment may be a pinyin pronunciation correction function operation instruction, an English pronunciation correction function operation instruction, a number-symbol pronunciation correction function operation instruction, a wrongly-written-character correction function operation instruction, and so on.
Step S500: acquire a modification function operation instruction according to the target data, and obtain, according to the modification function operation instruction, target modification data corresponding to the target data;
The modification function operation instruction is executed on the target data, and the target modification data corresponding to the target data is obtained according to the modification function operation instruction. This embodiment modifies the target data so that the target data carries a certain modification effect. For example, the modification function operation instruction may be a continuous-reading function operation instruction, a speed-change function operation instruction, a scene function operation instruction, an insertion-point function operation instruction, and so on, each yielding the corresponding target modification data for the target data.
Step S600: obtain a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
It should be noted that, in this embodiment, the correction function operation instruction is executed on the target data to obtain the target correction data, and the modification function operation instruction is executed on the target data to obtain the target modification data. The target correction data and the target modification data add a correction effect and a modification effect to the corresponding target data, while the text data to be processed also includes unprocessed data other than the target data. At output time, the unprocessed data, the target data, the target correction data and the target modification data are combined to obtain the final speech synthesis result. It can be understood that, although the unprocessed data has not been edited, it still has corresponding audio or pronunciation, so it must also form part of the speech synthesis result.
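Steps S100 to S600 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the `Segment` structure, the function names, the character-index selection and the `"speed:50"` label are all hypothetical, and the final step returns a plan for a synthesizer rather than audio.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One span of the text to be processed (hypothetical structure)."""
    text: str
    is_target: bool = False
    correction: Optional[str] = None  # target correction data, e.g. "yue4"
    modifications: List[str] = field(default_factory=list)  # target modification data


def select(text: str, start: int, end: int) -> List[Segment]:
    """S200-S300: split the text into unprocessed data and target data in [start, end)."""
    parts = [
        Segment(text[:start]),
        Segment(text[start:end], is_target=True),
        Segment(text[end:]),
    ]
    # Empty spans are dropped, so selecting the whole text leaves no unprocessed data.
    return [p for p in parts if p.text]


def correct(segments: List[Segment], correction: str) -> None:
    """S400: attach target correction data to the target data."""
    for seg in segments:
        if seg.is_target:
            seg.correction = correction


def modify(segments: List[Segment], modification: str) -> None:
    """S500: attach target modification data to the target data."""
    for seg in segments:
        if seg.is_target:
            seg.modifications.append(modification)


def synthesis_plan(segments: List[Segment]) -> List[dict]:
    """S600: combine unprocessed, target, correction and modification data.

    A real system would pass this plan to a synthesizer; here it is only returned.
    """
    return [
        {"text": s.text, "pinyin": s.correction, "effects": list(s.modifications)}
        for s in segments
    ]


segments = select("音乐很好听", 1, 2)  # S100 input, with "乐" selected as target data
correct(segments, "yue4")
modify(segments, "speed:50")
for item in synthesis_plan(segments):
    print(item)
```

Keeping the unprocessed spans in the same list as the target span reflects the point made above: text the user never edited still has to be spoken, so it stays in the plan with no correction or effects attached.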
It can be understood that even AI-based speech synthesis cannot guarantee that every character is read exactly as the user intends. Reading accurately here means not only that Chinese character data and English data are pronounced correctly, but also that number-symbol data is read out in its customary way.
This embodiment can be applied to a single item of Chinese character data, English data or number-symbol data. If, after using the audition function, the user considers the pronunciation or reading of some text data to be processed to be wrong, a data selection function operation instruction can be acquired according to that text data to obtain the target data within the first selection range; the correction function operation instruction is then executed on the target data to perform the correction.
Referring to FIG. 3, it can be understood that the target data includes pinyin data, the target correction data includes target pinyin data, and the correction function operation instruction includes a pinyin pronunciation correction function operation instruction. Acquiring the correction function operation instruction according to the target data, and obtaining the target correction data corresponding to the target data according to the correction function operation instruction, includes:
Step S410: obtain, according to the pinyin data, at least one candidate pinyin result corresponding to the pinyin data, wherein the candidate pinyin results are sorted according to the pronunciation probability value corresponding to each candidate pinyin result;
It can be understood that the text editor can sort the candidate pinyin results according to the pronunciation probability value corresponding to each candidate pinyin result. It should be noted that the pinyin data within the first selection range is obtained according to the data selection function operation instruction. The first selection range may contain a single item of pinyin data or multiple items. Taking a single item of pinyin data as an example, at least one candidate pinyin result corresponding to the pinyin data is obtained according to the pinyin data. It can be understood that pinyin data includes single-pronunciation character data and polyphonic-character data. When the pinyin data is single-pronunciation character data, there is one candidate pinyin result; this covers the case where the default pinyin recorded by the text editor for that character is wrong, and the correct pinyin can be obtained from the candidate pinyin result. When the pinyin data is polyphonic-character data, there are at least two candidate pinyin results; if the default pinyin recorded by the text editor for that polyphonic character is wrong, the candidate pinyin results can be sorted by their corresponding pronunciation probability values.
Step S420: acquire the pinyin pronunciation correction function operation instruction according to the candidate pinyin results;
Step S430: obtain, according to the pinyin pronunciation correction function operation instruction, the target pinyin data corresponding to the pinyin data.
Next, the pinyin pronunciation correction function operation instruction is acquired according to the candidate pinyin results, and the target pinyin data corresponding to the pinyin data is obtained according to that instruction.
It can be understood that, through this embodiment, the pinyin pronunciation correction function operation instruction can be executed on the pinyin data, which makes it easy for the user to correct the pinyin data and obtain the target pinyin data.
Referring to FIG. 4, it can be understood that a candidate pinyin result may be displayed as pinyin plus a tone number. The candidate pinyin results may be placed above or below the corresponding pinyin data, or to one side of it. In FIG. 4, the character "乐" is the pinyin data of this embodiment. When "乐" is selected, at least two candidate pinyin results corresponding to "乐" are obtained; in this embodiment there are four, le4, yue4, yao4 and lao4, sorted by pronunciation probability value. The digit 4 denotes the tone.
Referring to FIG. 5, in some embodiments, the target pinyin data corresponding to "乐", namely yue4 in this embodiment, is obtained according to the pinyin pronunciation correction function operation instruction. The target pinyin data and the corresponding pinyin data (that is, the character "乐") may be shown in the same color, and that color differs from the color of the unprocessed data. This distinguishes the target pinyin data and indicates that the pinyin pronunciation correction function operation instruction has been executed on the pinyin data, which improves recognizability.
Referring to FIG. 6, it can be understood that the pinyin pronunciation correction function operation instruction includes a polyphonic-character pronunciation correction function operation instruction. Obtaining, according to the pinyin data, at least one candidate pinyin result corresponding to the pinyin data includes:
Step S411: acquire the polyphonic-character pronunciation correction function operation instruction according to the pinyin data;
Step S412: obtain, according to the polyphonic-character pronunciation correction function operation instruction, at least two candidate pinyin results corresponding to the pinyin data.
It can be understood that the pinyin data of this embodiment may include single-pronunciation character data and polyphonic-character data. When the pinyin data is polyphonic-character data, the polyphonic-character pronunciation correction function operation instruction is acquired, and at least two candidate pinyin results corresponding to the pinyin data are obtained according to that instruction. This makes it easy for the user to correct the pronunciation of polyphonic characters based on the pinyin data and adds to the variety of functions.
For example, referring to FIG. 4, when the user selects pinyin data such as "乐" and clicks the polyphonic-character pronunciation correction function, the text editor acquires the polyphonic-character pronunciation correction function operation instruction, automatically retrieves all possible candidate pinyin results for "乐", sorts them by their pronunciation probability values, and presents them to the user for selection. After the user clicks a candidate pinyin result, the text editor acquires the pinyin pronunciation correction function operation instruction according to that candidate and obtains the target pinyin data corresponding to the pinyin data according to the instruction. Referring to FIG. 5, in one embodiment the target pinyin data may be displayed in a special color above the pinyin data such as "乐", and the pinyin data changes color at the same time, so that the user can see directly which pinyin data has had the pinyin pronunciation correction function operation instruction applied.
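The candidate lookup and selection just described can be sketched as follows. The probability values are invented for illustration and chosen only so that the order matches the le4, yue4, yao4, lao4 order given for FIG. 4; the table, function names and returned dictionary are hypothetical.

```python
from typing import Dict, List

# Hypothetical pronunciation probabilities; illustrative values only.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "乐": {"yue4": 0.30, "le4": 0.60, "lao4": 0.03, "yao4": 0.07},
}


def candidate_pinyin(char: str) -> List[str]:
    """S410: return candidate pinyin results, most probable first."""
    probs = CANDIDATES.get(char, {})
    return sorted(probs, key=lambda p: probs[p], reverse=True)


def apply_correction(char: str, choice: str) -> Dict[str, str]:
    """S420-S430: record the user's choice as the target pinyin data."""
    if choice not in candidate_pinyin(char):
        raise ValueError(f"{choice!r} is not a candidate for {char!r}")
    return {"char": char, "pinyin": choice}


print(candidate_pinyin("乐"))          # ['le4', 'yue4', 'yao4', 'lao4']
print(apply_correction("乐", "yue4"))  # {'char': '乐', 'pinyin': 'yue4'}
```

Restricting the choice to the listed candidates mirrors the click-to-select interaction: the user picks from the editor's list instead of typing pinyin freely.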
It can be understood that Chinese character data usually has a fixed, finite set of pronunciations, whereas the pronunciation of English data such as English words is often varied (for example, British versus American pronunciation) and ambiguous (one word may have several pronunciations, as with "read"). When the user wants the speech synthesis result to match a particular expectation, an editing function operation instruction can be acquired according to the target data so that the corresponding phonetic symbols can be entered for the target data. The text editor can then read out the English word according to the phonetic symbols specified by the user, as shown in FIG. 7.
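One way such a user-specified pronunciation could be carried into exported markup is the SSML `phoneme` element, which takes `alphabet` and `ph` attributes. The sketch below assumes that export format; the application does not state which markup the editor emits for English words, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def phoneme_markup(word: str, ipa: str) -> str:
    """Wrap a word in an SSML phoneme element carrying the user's IPA string."""
    element = ET.Element("phoneme", {"alphabet": "ipa", "ph": ipa})
    element.text = word
    return ET.tostring(element, encoding="unicode")


# "read" has two pronunciations; the user picks one explicitly.
present_tense = phoneme_markup("read", "riːd")
past_tense = phoneme_markup("read", "rɛd")
print(present_tense)
print(past_tense)
```

Building the element through an XML library rather than string concatenation keeps the output well-formed even if the word or the phonetic string contains characters such as `<` or `&`.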
It can be understood that, when numbers are mixed with symbols, there are often several possible readings. Several default readings can therefore be set in the text editor, for example a date reading, a sequence reading, an arithmetic reading, a score reading or a range reading. Referring to FIG. 8 and FIG. 9, when the user considers that the text editor's default reading of the number-symbol data is wrong or is not what the user wants, a data selection function operation instruction is acquired according to the input number-symbol data to obtain target number-symbol data. For example, the text editor pops up all possible readings of the target number-symbol data for the user to choose from. After the user chooses, the text editor acquires the correction function operation instruction according to the target number-symbol data and obtains, according to the correction function operation instruction, the target corrected number-symbol data corresponding to the target number-symbol data, as shown in FIG. 9, where "date" denotes the target corrected number-symbol data.
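The idea of offering several readings for one number-symbol token can be sketched as follows. The five mode names follow the readings listed above, but everything else is an assumption made for illustration: the `a-b` token shape, the English renderings, and the function names are hypothetical, and the editor described in the application presumably produces Chinese readings.

```python
import re
from typing import Dict

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def candidate_readings(token: str) -> Dict[str, str]:
    """Return possible readings of an "a-b" token, keyed by reading mode."""
    match = re.fullmatch(r"(\d+)-(\d+)", token)
    if match is None:
        return {}
    a, b = int(match.group(1)), int(match.group(2))
    readings = {
        "sequence": f"{a} {b}",
        "arithmetic": f"{a} minus {b}",
        "score": f"{a} to {b}",
        "range": f"from {a} to {b}",
    }
    # A date reading is only offered when the numbers can be a month and a day.
    if 1 <= a <= 12 and 1 <= b <= 31:
        readings["date"] = f"{MONTHS[a - 1]} {b}"
    return readings


def choose_reading(token: str, mode: str) -> str:
    """Record the user's choice as the target corrected number-symbol data."""
    readings = candidate_readings(token)
    if mode not in readings:
        raise ValueError(f"mode {mode!r} is not available for {token!r}")
    return readings[mode]


print(candidate_readings("10-1"))
print(choose_reading("10-1", "date"))  # October 1
```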
Referring to FIG. 10, it can be understood that the modification function operation instruction includes an insertion point function operation instruction, and that obtaining the modification function operation instruction according to the target data and obtaining, according to the modification function operation instruction, the target modification data corresponding to the target data includes:
Step S510: obtaining a target insertion position according to the target data, wherein the target insertion position is the left-side position and/or the right-side position corresponding to the target data;
Step S520: obtaining the insertion point function operation instruction according to the target insertion position;
Step S530: obtaining, according to the insertion point function operation instruction, the target modification data corresponding to the target insertion position, wherein the target modification data includes at least one of target pause data, target mute data, and target special-effect sound data.
It can be understood that this embodiment can execute the insertion point function operation instruction on the target data, so that the target data carries a modification effect. For example, the left-side position corresponding to the target data is obtained according to the target data and serves as the target insertion position. An insertion point function operation instruction, such as a pause function operation instruction, a mute function operation instruction, or a special-effect sound function operation instruction, is executed at that left-side position, so that the target modification data corresponding to that position, such as target pause data, target mute data, or target special-effect sound data, is obtained according to the insertion point function operation instruction.
That is, referring to FIG. 11, target pause data corresponding to the target insertion position is obtained according to the pause function operation instruction. The target pause data may include short pause data and long pause data, and the durations of the short pause and the long pause may be determined by preset pause times.
Referring to FIG. 12, target mute data corresponding to the target insertion position is obtained according to the mute function operation instruction. In FIG. 12, 400 ms, 600 ms, and 800 ms are the target mute data, and the number indicates the mute duration.
Referring to FIG. 13, target special-effect sound data corresponding to the target insertion position is obtained according to the special-effect sound function operation instruction. The target special-effect sound data may be breathing data, sighing data, heartbeat data, coughing data, mouse click data, keyboard typing data, QQ message alert data, and the like.
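Steps S510 to S530 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the application: the `Marker` type, the function names, and the inline `[kind:value]` rendering are all hypothetical, and the target data is assumed to be a half-open character range `[start, end)` in the text.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical marker kinds; the description names pause, mute, and special-effect sound data.
MARKER_KINDS = {"pause", "mute", "effect"}


@dataclass(frozen=True)
class Marker:
    kind: str      # "pause", "mute", or "effect"
    value: str     # e.g. "short", "600ms", "cough"
    position: int  # character offset at which the marker is inserted


def insertion_position(start: int, end: int, side: str) -> int:
    """Step S510: the insertion position is the left or right edge of the target span."""
    if side == "left":
        return start
    if side == "right":
        return end
    raise ValueError("side must be 'left' or 'right'")


def insert_marker(start: int, end: int, side: str, kind: str, value: str) -> Marker:
    """Steps S520 and S530: turn an insertion point instruction into modification data."""
    if kind not in MARKER_KINDS:
        raise ValueError(f"unknown marker kind: {kind}")
    return Marker(kind, value, insertion_position(start, end, side))


def render(text: str, markers: List[Marker]) -> str:
    """Show the markers inline so the result can be inspected."""
    pieces, cursor = [], 0
    for marker in sorted(markers, key=lambda m: m.position):
        pieces.append(text[cursor:marker.position])
        pieces.append(f"[{marker.kind}:{marker.value}]")
        cursor = marker.position
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    text = "hello world"
    marker = insert_marker(6, 11, "left", "mute", "600ms")  # target data is "world"
    print(render(text, [marker]))
```

In this sketch the marker is stored separately from the text, so the same target data can carry a marker on either side, or on both, without the text itself being altered.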
Referring to FIG. 14, it can be understood that the modification function operation instruction includes a continuous reading function operation instruction, a speed change function operation instruction, and a scene function operation instruction, and that obtaining the modification function operation instruction according to the target data and obtaining, according to the modification function operation instruction, the target modification data corresponding to the target data includes:
Step S550: obtaining the continuous reading function operation instruction, the speed change function operation instruction, and the scene function operation instruction according to the target data;
Step S560: obtaining target priority modification data corresponding to the target data according to the priority of the continuous reading function operation instruction, the priority of the speed change function operation instruction, and the priority of the scene function operation instruction, wherein the priority of the continuous reading function operation instruction is defined to be higher than that of the speed change function operation instruction, and the priority of the speed change function operation instruction is defined to be higher than that of the scene function operation instruction.
It can be understood that when several modification function operation instructions need to be executed on the target data, for example the continuous reading function operation instruction, the speed change function operation instruction, and the scene function operation instruction, they can be ordered according to their respective priorities to obtain the target priority modification data. When the speech synthesis result obtained from the target priority modification data is played, it is played according to the priorities of the modification function operation instructions.
Based on the logical characteristics of the different functions and the order in which the system processes data when synthesizing speech, when the user repeatedly applies different modification functions to a segment of target data, the priority of the continuous reading function operation instruction can be defined to be higher than that of the speed change function operation instruction, and the priority of the speed change function operation instruction higher than that of the scene function operation instruction. Accordingly, the continuous reading function takes priority over the speed change function, and the speed change function takes priority over the scene function.
It should be noted that scenes include multiple tones, multiple speakers, and background sound. Correspondingly, the scene function includes a multi-tone function, a multi-speaker function, and a background sound function. In one embodiment, the multi-tone function, multi-speaker function, or background sound function forms the outermost layer (lowest priority), followed by the speed change function (for example, a local speed change function), and finally the continuous reading function (highest priority).
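The layering described above can be sketched as follows. This is a hedged illustration only: the tag syntax, the `PRIORITY` table, and the `nest` function are assumptions introduced here, and the description does not specify any markup format. The sketch wraps the highest-priority function innermost and the lowest-priority function outermost.

```python
from typing import List, Optional, Tuple

# Priorities as defined in the description: continuous reading > speed change > scene.
PRIORITY = {"continuous": 3, "speed": 2, "scene": 1}

Modifier = Tuple[str, Optional[str]]  # (kind, parameter value or None)


def nest(text: str, modifiers: List[Modifier]) -> str:
    """Wrap text so that higher-priority modifiers sit closer to the text."""
    ordered = sorted(modifiers, key=lambda m: PRIORITY[m[0]], reverse=True)
    result = text
    for kind, value in ordered:
        opening = f"<{kind}>" if value is None else f"<{kind}={value}>"
        result = f"{opening}{result}</{kind}>"
    return result


if __name__ == "__main__":
    # The order in which the user applied the functions does not matter.
    applied = [("scene", "customer-service"), ("continuous", None), ("speed", "25")]
    print(nest("123456789", applied))
```

Sorting by priority before wrapping means the nesting is the same regardless of the order in which the user applied the functions.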
Referring to FIG. 15, the multi-speaker function covers scenarios in which the user wishes to synthesize a two-person or even multi-person dialogue as the speech synthesis result.
The multi-tone function switches between different tones according to the application scenario, for example customer service, telemarketing, broadcasting, happy, angry, or sad.
The background sound function provides background music or white noise; in some embodiments, background sound can be applied to audiobook scenes.
It can be understood that, to make the speech synthesis effect richer, it is sometimes necessary to control a segment of the text data to be processed.
Specifically, the system may incorrectly insert a pause inside a longer piece of target data such as Chinese character data. For example, "武汉市长江大桥" (Wuhan Yangtze River Bridge) may be wrongly segmented as "武汉市长/江大桥" (Mayor of Wuhan / Jiang Daqiao). The continuous reading function therefore needs to ensure that there is no pause inside the Chinese character data within the selected range.
Referring to FIG. 16, in an audiobook scene, a passage of Chinese character data may sometimes need to be read with varying speed and varying pitch in order to fit the situation better. Whereas traditional speech synthesis technology can only apply the same parameter value to a whole sentence or paragraph of text, this embodiment can apply a local speed change function, or a pitch change function, a volume change function, and so on, to the target data within the selected range.
FIG. 17 shows a specific example in which several different modification functions are applied in this embodiment. "Customer service Rongrong" represents the target multi-tone third modification data obtained according to the multi-tone function operation instruction and its priority; the continuous reading symbol marks at the left-side and right-side positions of "123456789" represent the target continuous reading first modification data obtained according to the continuous reading function operation instruction and its priority; and the parameter value "25" represents the target speed change second modification data obtained according to the speed change function operation instruction and its priority. That is, the target priority modification data of this embodiment includes the target continuous reading first modification data, the target speed change second modification data, and the target multi-tone third modification data. In other embodiments, other modification functions may also be applied; the application is not limited to this embodiment, and details are not repeated here.
It can be understood that, in one embodiment, several different modification functions may also be applied separately to the target data in different intervals within the first selection range; the application is not limited to this embodiment, and details are not repeated here.
It can be understood that, after the unprocessed data and the target data within the first selection range are obtained according to the data selection function operation instruction, the method includes: obtaining the data selection function operation instruction again and deleting the first selection range, to obtain target data within a second selection range.
In this embodiment, the data selection function operation instruction is obtained again so that the first selection range corresponding to the original target data is deleted; the target data within the second selection range can then be obtained according to the newly obtained data selection function operation instruction. Different selection ranges can thus be switched to obtain different target data.
It can be understood that the second selection range may be smaller than, larger than, or equal to the first selection range. Specifically, the step of obtaining the data selection function operation instruction again and deleting the first selection range to obtain the target data within the second selection range may be implemented in several different embodiments.
For example, in one embodiment, data in the second selection range may be processed according to the principle that "the selected range becomes a new interval, and the remainder splits the old interval". In addition, a modification function without parameters is defined as a parameterless function (for example, the continuous reading function, which means that the continuous reading data corresponding to the target continuous reading data can be split and merged directly), and a modification function with parameters is defined as a parameterized function (for example, the speed change function, the pitch change function, the volume change function, and other such modification functions).
Embodiments of the parameterless function and the parameterized function are described in detail below.
1. For a parameterless function, where the second selection range contains modification functions of the same type (for example, one or two):
For example, after steps S100 to S500, that is, after the continuous reading function operation instruction has been executed, the step of obtaining the data selection function operation instruction again is executed. The first selection range is then deleted and the continuous reading data within the second selection range is obtained. The target continuous reading data obtained in step S500 is retained and is placed accordingly on the left and right sides of the target data within the second selection range.
As shown in FIG. 18, the data on the left shows the target continuous reading data and the continuous reading data of the first selection range, obtained after the continuous reading function operation instruction is applied to the continuous reading data of the first selection range. When the data selection function operation instruction is then obtained again, the original first selection range is deleted and the continuous reading data within the second selection range is obtained. The data on the right shows the continuous reading data within the second selection range and the corresponding target continuous reading data.
That is, in one embodiment, when the step of obtaining the data selection function operation instruction again occurs after the correction function operation instruction executed in step S400 and/or the modification function operation instruction executed in step S500, the target correction data obtained in step S400 and/or the target modification data obtained in step S500 is still retained. In other words, this embodiment changes the target data within the first selection range: after the first selection range is deleted, the original target correction data and/or target modification data is retained, so that the target data of the second selection range and the corresponding target correction data and/or target modification data are obtained. The modification function corresponding to the target modification data is thus retained.
2. For a parameterized function:
After steps S100 to S500, that is, after the modification function operation instruction has been executed, the step of obtaining the data selection function operation instruction again, deleting the first selection range, and obtaining the target data within the second selection range is executed. The second selection range corresponding to the target data changes, and the parameter value of the corresponding target modification data also changes.
For example, when the second selection range contains different types of modification functions, or contains both target modification data and unprocessed data, the selection conditions for the second selection range include three kinds: a one-sided selection condition, an internal selection condition, and a two-sided selection condition.
Specifically, the one-sided selection condition for the second selection range means that unprocessed data is selected at the left-side position or the right-side position of the target data within the first selection range, and the second selection range includes part of the target data within the first selection range; in this case the first selection range may be larger than, equal to, or smaller than the second selection range;
the internal selection condition for the second selection range means that part of the target data is selected within the first selection range, that is, the first selection range is larger than the second selection range;
the two-sided selection condition for the second selection range means that unprocessed data is selected at both the left-side position and the right-side position of the target data within the first selection range, and the second selection range includes all of the target data within the first selection range, that is, the second selection range is larger than the first selection range.
Referring to FIG. 19, the one-sided selection condition of this embodiment is described using the first and second rows of data as examples. Specifically, the data on the left in FIG. 19 shows the target modification data and the target data of the first selection range, obtained after a modification function operation instruction (a parameterized function operation instruction) is applied to the target data of the first selection range. The data selection function operation instruction is then obtained again under the one-sided selection condition; the original first selection range is deleted and the target data within the second selection range is obtained. The data on the right in the figure shows the target data within the second selection range and the corresponding target modification data.
It can be understood that, in the data on the right, the second selection range includes a first interval range and a second interval range. The first interval range contains first target data, which is data in the first selection range on which the data selection function operation instruction has not been re-executed but on which the modification function operation instruction has been executed. The second interval range contains second target data, which is data on which the data selection function operation instruction has been re-executed. The target modification data corresponding to the first target data is unchanged; that is, the modification function corresponding to the target data within the first selection range and the parameter value of that modification function are retained. For the second target data, the modification function in the corresponding target modification data is unchanged but the parameter value changes; that is, the modification function corresponding to the target data within the first selection range is retained, but the parameter value is changed. It can be understood that the parameter value for the second target data may be a preset value, such as "50" in FIG. 19.
Referring to FIG. 19, the internal selection condition of this embodiment is described using the third row of data as an example. Specifically, the data on the left in FIG. 19 shows the target modification data and the target data of the first selection range, obtained after a modification function operation instruction (a parameterized function operation instruction) is applied to the target data of the first selection range. The data selection function operation instruction is then obtained again under the internal selection condition; the original first selection range is deleted and the target data within the second selection range is obtained. The data on the right in the figure shows the target data within the second selection range and the corresponding target modification data.
It can be understood that, in the data on the right, the second selection range includes a first interval range and a second interval range. The first interval range contains the first target data and the second interval range contains the second target data; for the modification functions and parameter values corresponding to the first target data and the second target data, refer to the description of the one-sided selection condition above, which is not repeated here. Because this embodiment selects part of the target data in the middle of the first selection range, three groups of target modification data corresponding to the target data are obtained. The "1" and "4" corresponding to the parameter value 25 both belong to the first target data, that is, data in the first selection range on which the data selection function operation instruction has not been re-executed but on which the modification function operation instruction has been executed.
Referring to FIG. 19, the two-sided selection condition of this embodiment is described using the fourth row of data as an example. Specifically, the data on the left in FIG. 19 shows the target modification data and the target data of the first selection range, obtained after a modification function operation instruction (a parameterized function operation instruction) is applied to the target data of the first selection range. The data selection function operation instruction is then obtained again under the two-sided selection condition; the original first selection range is deleted and the target data within the second selection range is obtained. The data on the right in the figure shows the target data within the second selection range and the corresponding target modification data.
It can be understood that, in the data on the right, the second selection range includes a second interval range. The second interval range contains second target data, which is data on which the data selection function operation instruction has been re-executed. The modification function in the target modification data corresponding to the second target data is unchanged but the parameter value changes; that is, the modification function corresponding to the target data within the first selection range is retained, but the parameter value is changed. It can be understood that the parameter value for the second target data may be a preset value, such as "50" in FIG. 19.
It can be understood that the magnitude of a parameter value in FIG. 19 may indicate how fast the speed change is, how high the pitch change is, how loud the volume change is, and so on.
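The three selection conditions for a parameterized function can be sketched with half-open intervals. The sketch below is an illustration under assumptions, not the implementation of the application: the `reselect` function and the tuple layout are hypothetical, positions are character offsets, and the preset value 50 and the original value 25 are taken from the example values mentioned for FIG. 19.

```python
from typing import List, Tuple

Interval = Tuple[int, int, int]  # (start, end, parameter value), half-open [start, end)


def reselect(old: Interval, new: Tuple[int, int], preset: int = 50) -> List[Interval]:
    """Apply 'the selected range becomes a new interval, the remainder splits the old interval'.

    The new selection receives the preset parameter value; any part of the old
    interval outside the new selection keeps its original parameter value.
    """
    a, b, value = old
    c, d = new
    pieces: List[Interval] = []
    if a < min(b, c):            # part of the old interval left of the new selection
        pieces.append((a, min(b, c), value))
    pieces.append((c, d, preset))
    if max(a, d) < b:            # part of the old interval right of the new selection
        pieces.append((max(a, d), b, value))
    return sorted(pieces)


if __name__ == "__main__":
    old = (0, 4, 25)
    print(reselect(old, (2, 6)))         # one-sided selection
    print(reselect(old, (1, 3)))         # internal selection
    print(reselect((1, 3, 25), (0, 4)))  # two-sided selection
```

With an internal selection the old interval is split into two retained pieces around the new one, which matches the three groups of target modification data described for the third row of FIG. 19.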
It can be understood that, in another embodiment, if, after step S300, the step of obtaining the data selection function operation instruction again, deleting the first selection range, and obtaining the target data within the second selection range is executed directly, and step S400 and/or step S500 is executed afterwards, the target correction data and the target modification data then correspond to the target data within the second selection range. This embodiment differs considerably from the embodiments described above.
With this arrangement, the speech synthesis result can be changed, so that the user can customize the desired audio, which improves applicability.
It can be understood that the modification function operation instruction includes a continuous reading function operation instruction and the target data includes continuous reading data, and that obtaining the modification function operation instruction according to the target data and obtaining, according to the modification function operation instruction, the target modification data corresponding to the target data includes: if the number of items of continuous reading data within the second selection range is one, revoking the continuous reading function operation instruction.
In this embodiment, the modification function operation instruction includes the continuous reading function operation instruction. It should be noted that, when the target data includes continuous reading data, executing the continuous reading function operation instruction is meaningless if the second selection range contains only one item of continuous reading data.
In some embodiments, when the number of items of continuous reading data within the second selection range is one, the continuous reading function operation instruction is revoked; that is, the text editor refuses to execute the continuous reading function operation instruction.
In some embodiments, the continuous reading function operation instruction is obtained according to the continuous reading data, and target continuous reading data corresponding to the continuous reading data is obtained according to the continuous reading function operation instruction. The target continuous reading data may be represented by continuous reading symbol marks. The target continuous reading data, that is, the continuous reading symbol marks, corresponds to the continuous reading data and may be placed on the left and right sides of the continuous reading data. When a delete function operation instruction is executed on the target continuous reading data, the continuous reading function operation instruction is revoked accordingly; that is, the continuous reading function is cancelled.
Referring to FIG. 20, take as an example the case in which the second "连续" ("continuous") on the left is the continuous reading data and the other occurrences of "连续" are unprocessed data. When either one of the target continuous reading data (that is, the continuous reading symbol marks) on the left and right sides of the continuous reading data "连续" is deleted, or when both are deleted, a delete function operation instruction is executed on the target continuous reading data. The continuous reading function operation instruction is then revoked, that is, the continuous reading function is cancelled, and the continuous reading data is converted into unprocessed data, as shown by the data to the right of the arrow in FIG. 20. As another example, suppose the second selection range contains several items of continuous reading data, for example at least two; for the continuous reading data "连续", the number is two, with "连" counting as one and "续" as one. If the delete function operation instruction is then executed on the continuous reading data until only one item of continuous reading data remains within the second selection range, the continuous reading function operation instruction is likewise revoked and the continuous reading data is converted into unprocessed data, as shown by the data in the third row to the right of the arrow in FIG. 20.
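The revocation rule for continuous reading can be sketched as a small check on the size of the selection. The function names and the span representation below are hypothetical and are introduced only for illustration; a span is assumed to be a half-open range of character positions, so its length is the number of items of continuous reading data.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]  # half-open range [start, end) of character positions


def request_continuous(start: int, end: int) -> Optional[Span]:
    """Return the span to mark, or None when the instruction is revoked.

    Continuous reading over a single character is meaningless, so a selection
    containing fewer than two characters is refused.
    """
    return (start, end) if end - start >= 2 else None


def delete_char_in_span(span: Span) -> Optional[Span]:
    """Delete one character inside a marked span; revoke when one character remains."""
    start, end = span
    return request_continuous(start, end - 1)


def delete_marker(span: Span) -> None:
    """Deleting either continuous reading mark cancels the function for the span."""
    return None


if __name__ == "__main__":
    span = request_continuous(2, 4)     # two characters: accepted
    print(span)
    print(delete_char_in_span(span))    # one character left: revoked
    print(request_continuous(2, 3))     # single character: refused
```

Returning `None` stands in for converting the continuous reading data back into unprocessed data.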
It can be understood that the above text-editor-based speech synthesis method can be applied to intelligent diagnosis and treatment and to remote consultation.
Referring to FIG. 21, an embodiment of the present application further provides a text-editor-based speech synthesis system, including:
a data input module 100, configured to obtain text data to be processed;
a data selection module 200, configured to obtain a data selection function operation instruction according to the text data to be processed;
a target data acquisition module 300, configured to obtain unprocessed data and target data within a first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data;
a target correction module 400, configured to obtain a correction function operation instruction according to the target data, and to obtain target correction data corresponding to the target data according to the correction function operation instruction;
a target modification module 500, configured to obtain a modification function operation instruction according to the target data, and to obtain target modification data corresponding to the target data according to the modification function operation instruction;
a speech synthesis module 600, configured to obtain a speech synthesis result according to the unprocessed data, the target data, the target correction data, and the target modification data.
需说明的是,本申请方法实施例的内容均适用于本***实施例,本***实施例所具体实现的功能与上述方法实施例相同,并且达到的有益效果与上述方法达到的有益效果也相同,在此不再赘述。It should be noted that the contents of the method embodiments of the present application are all applicable to the system embodiments, and the functions realized by the system embodiments are the same as those of the above-mentioned method embodiments, and the beneficial effects achieved are also the same as those achieved by the above-mentioned methods , which will not be repeated here.
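The division into modules 100 to 600 can be sketched as a small pipeline. The code below is a hedged illustration only: `EditorState`, `select`, `correct`, `modify` and `synthesize` are hypothetical names that do not appear in the application, and `synthesize` returns an annotated string as a stand-in for the input that a real speech synthesis engine would receive.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class EditorState:
    """Hypothetical editor state; every field name is an illustrative assumption."""
    text: str                                   # text data to be processed (module 100)
    selection: Tuple[int, int] = (0, 0)         # first selection range as [start, end)
    corrections: Dict[int, str] = field(default_factory=dict)    # char index -> pinyin
    modifications: Dict[int, str] = field(default_factory=dict)  # insert position -> tag


def select(state: EditorState, start: int, end: int) -> str:
    """Modules 200/300: record the selection range and return the target data."""
    if not 0 <= start < end <= len(state.text):
        raise ValueError("selection out of range")
    state.selection = (start, end)
    return state.text[start:end]


def correct(state: EditorState, index: int, pinyin: str) -> None:
    """Module 400: store a corrected pronunciation for a selected character."""
    start, end = state.selection
    if not start <= index < end:
        raise ValueError("index is outside the current selection")
    state.corrections[index] = pinyin


def modify(state: EditorState, side: str, tag: str) -> None:
    """Module 500: attach a modification tag at the left or right edge of the selection."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    start, end = state.selection
    state.modifications[start if side == "left" else end] = tag


def synthesize(state: EditorState) -> str:
    """Module 600 stand-in: merge unprocessed text, corrections and modifications."""
    parts = []
    for i, ch in enumerate(state.text):
        if i in state.modifications:
            parts.append(f"[{state.modifications[i]}]")
        parts.append(ch)
        if i in state.corrections:
            parts.append(f"({state.corrections[i]})")
    if len(state.text) in state.modifications:
        parts.append(f"[{state.modifications[len(state.text)]}]")
    return "".join(parts)


state = EditorState("银行行长")
select(state, 1, 2)                # target data: the second character
correct(state, 1, "hang2")         # corrected pinyin for that character
modify(state, "right", "pause")    # pause inserted to the right of the target data
print(synthesize(state))
```

Characters outside the selection pass through unchanged, which mirrors how the unprocessed data and the target data together make up the text data to be processed.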
In addition, an embodiment of the present application further provides a text editor-based speech synthesis system, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements a text editor-based speech synthesis method, and the text editor-based speech synthesis method includes: acquiring text data to be processed; acquiring a data selection function operation instruction according to the text data to be processed; obtaining unprocessed data and target data within a first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data; acquiring a correction function operation instruction according to the target data, and obtaining target correction data corresponding to the target data according to the correction function operation instruction; acquiring a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction; and obtaining a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
The processor and the memory may be connected by a bus or by other means.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It should be noted that the text editor-based speech synthesis system in this embodiment can be applied to carry out the text editor-based speech synthesis method of the above embodiments. The system in this embodiment and the method of the above embodiments share the same inventive concept, so these embodiments have the same implementation principle and technical effects, which are not described in detail here.
The non-transitory software programs and instructions required to implement the text editor-based speech synthesis method of the above embodiments are stored in the memory and, when executed by the processor, carry out that method, for example, method steps S100 to S600 in FIG. 2, method steps S410 to S430 in FIG. 3, method steps S411 to S412 in FIG. 6, method steps S510 to S530 in FIG. 10, and method steps S550 to S560 in FIG. 14 described above.
The embodiment of the text editor-based speech synthesis system described above is merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. When the computer-executable instructions are executed by a processor or controller, for example, by a processor in the above embodiment of the text editor-based speech synthesis system, they cause the processor to carry out the text editor-based speech synthesis method of the above embodiments, for example, method steps S100 to S600 in FIG. 2, method steps S410 to S430 in FIG. 3, method steps S411 to S412 in FIG. 6, method steps S510 to S530 in FIG. 10, and method steps S550 to S560 in FIG. 14 described above. The computer-readable storage medium may be non-volatile or volatile.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above can be implemented as software, firmware, hardware, or an appropriate combination thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The preferred implementations of the present application have been described in detail above, but the present application is not limited to these implementations. Those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the present application, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present application.

Claims (20)

  1. A text editor-based speech synthesis method, comprising the following steps:
    acquiring text data to be processed;
    acquiring a data selection function operation instruction according to the text data to be processed;
    obtaining unprocessed data and target data within a first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data;
    acquiring a correction function operation instruction according to the target data, and obtaining target correction data corresponding to the target data according to the correction function operation instruction;
    acquiring a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction;
    obtaining a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
  2. The text editor-based speech synthesis method according to claim 1, wherein the target data includes pinyin data, the target correction data includes target pinyin data, and the correction function operation instruction includes a pinyin pronunciation correction function operation instruction,
    the acquiring a correction function operation instruction according to the target data, and obtaining target correction data corresponding to the target data according to the correction function operation instruction, comprises:
    obtaining, according to the pinyin data, at least one candidate pinyin result corresponding to the pinyin data, wherein the candidate pinyin results are sorted according to the pronunciation probability value corresponding to each candidate pinyin result;
    acquiring the pinyin pronunciation correction function operation instruction according to the candidate pinyin results;
    obtaining target pinyin data corresponding to the pinyin data according to the pinyin pronunciation correction function operation instruction.
  3. The text editor-based speech synthesis method according to claim 2, wherein the pinyin pronunciation correction function operation instruction includes a polyphonic character pronunciation correction function operation instruction,
    the obtaining, according to the pinyin data, at least one candidate pinyin result corresponding to the pinyin data comprises:
    acquiring the polyphonic character pronunciation correction function operation instruction according to the pinyin data;
    obtaining at least two candidate pinyin results corresponding to the pinyin data according to the polyphonic character pronunciation correction function operation instruction.
  4. The text editor-based speech synthesis method according to claim 1, wherein the modification function operation instruction includes an insertion point function operation instruction,
    the acquiring a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction, comprises:
    acquiring a target insertion position according to the target data, wherein the target insertion position is a left position and/or a right position corresponding to the target data;
    acquiring the insertion point function operation instruction according to the target insertion position;
    obtaining target modification data corresponding to the target insertion position according to the insertion point function operation instruction, wherein the target modification data includes at least one of target pause data, target silence data, and target sound effect data.
  5. The text editor-based speech synthesis method according to claim 1, wherein the modification function operation instruction includes a continuous reading function operation instruction, a speed change function operation instruction, and a scene function operation instruction,
    the acquiring a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction, comprises:
    acquiring the continuous reading function operation instruction, the speed change function operation instruction and the scene function operation instruction according to the target data;
    obtaining target priority modification data corresponding to the target data according to the priority of the continuous reading function operation instruction, the priority of the speed change function operation instruction and the priority of the scene function operation instruction, wherein the priority of the continuous reading function operation instruction is defined to be higher than that of the speed change function operation instruction, and the priority of the speed change function operation instruction is defined to be higher than that of the scene function operation instruction.
  6. The text editor-based speech synthesis method according to claim 1, wherein, after the unprocessed data and the target data within the first selection range are obtained according to the data selection function operation instruction, the speech synthesis method further comprises:
    acquiring the data selection function operation instruction again, and deleting the first selection range, to obtain target data within a second selection range.
  7. The text editor-based speech synthesis method according to claim 6, wherein the modification function operation instruction includes a continuous reading function operation instruction, the target data includes continuous reading data, and the acquiring a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction, comprises:
    revoking the continuous reading function operation instruction if the number of items of continuous reading data within the second selection range is one.
  8. A text editor-based speech synthesis system, comprising:
    a data input module, configured to acquire text data to be processed;
    a data selection module, configured to acquire a data selection function operation instruction according to the text data to be processed;
    a target data acquisition module, configured to obtain unprocessed data and target data within a first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data;
    a target correction module, configured to acquire a correction function operation instruction according to the target data, and to obtain target correction data corresponding to the target data according to the correction function operation instruction;
    a target modification module, configured to acquire a modification function operation instruction according to the target data, and to obtain target modification data corresponding to the target data according to the modification function operation instruction;
    a speech synthesis module, configured to obtain a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
  9. A text editor-based speech synthesis system, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements a text editor-based speech synthesis method:
    wherein the text editor-based speech synthesis method comprises:
    acquiring text data to be processed;
    acquiring a data selection function operation instruction according to the text data to be processed;
    obtaining unprocessed data and target data within a first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data;
    acquiring a correction function operation instruction according to the target data, and obtaining target correction data corresponding to the target data according to the correction function operation instruction;
    acquiring a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction;
    obtaining a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
  10. The text editor-based speech synthesis system according to claim 9, wherein the target data includes pinyin data, the target correction data includes target pinyin data, and the correction function operation instruction includes a pinyin pronunciation correction function operation instruction,
    the acquiring a correction function operation instruction according to the target data, and obtaining target correction data corresponding to the target data according to the correction function operation instruction, comprises:
    obtaining, according to the pinyin data, at least one candidate pinyin result corresponding to the pinyin data, wherein the candidate pinyin results are sorted according to the pronunciation probability value corresponding to each candidate pinyin result;
    acquiring the pinyin pronunciation correction function operation instruction according to the candidate pinyin results;
    obtaining target pinyin data corresponding to the pinyin data according to the pinyin pronunciation correction function operation instruction.
  11. The text editor-based speech synthesis system according to claim 10, wherein the pinyin pronunciation correction function operation instruction includes a polyphonic character pronunciation correction function operation instruction,
    the obtaining, according to the pinyin data, at least one candidate pinyin result corresponding to the pinyin data comprises:
    acquiring the polyphonic character pronunciation correction function operation instruction according to the pinyin data;
    obtaining at least two candidate pinyin results corresponding to the pinyin data according to the polyphonic character pronunciation correction function operation instruction.
  12. The text editor-based speech synthesis system according to claim 9, wherein the modification function operation instruction includes an insertion point function operation instruction,
    the acquiring a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction, comprises:
    acquiring a target insertion position according to the target data, wherein the target insertion position is a left position and/or a right position corresponding to the target data;
    acquiring the insertion point function operation instruction according to the target insertion position;
    obtaining target modification data corresponding to the target insertion position according to the insertion point function operation instruction, wherein the target modification data includes at least one of target pause data, target silence data, and target sound effect data.
  13. The text editor-based speech synthesis system according to claim 9, wherein the modification function operation instruction includes a continuous reading function operation instruction, a speed change function operation instruction, and a scene function operation instruction,
    the acquiring a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction, comprises:
    acquiring the continuous reading function operation instruction, the speed change function operation instruction and the scene function operation instruction according to the target data;
    obtaining target priority modification data corresponding to the target data according to the priority of the continuous reading function operation instruction, the priority of the speed change function operation instruction and the priority of the scene function operation instruction, wherein the priority of the continuous reading function operation instruction is defined to be higher than that of the speed change function operation instruction, and the priority of the speed change function operation instruction is defined to be higher than that of the scene function operation instruction.
  14. The text editor-based speech synthesis system according to claim 9, wherein, after the unprocessed data and the target data within the first selection range are obtained according to the data selection function operation instruction, the speech synthesis method further comprises:
    acquiring the data selection function operation instruction again, and deleting the first selection range, to obtain target data within a second selection range.
  15. A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute a text editor-based speech synthesis method, wherein the text editor-based speech synthesis method comprises the following steps:
    acquiring text data to be processed;
    acquiring a data selection function operation instruction according to the text data to be processed;
    obtaining unprocessed data and target data within a first selection range according to the data selection function operation instruction, wherein the text data to be processed includes the unprocessed data and the target data;
    acquiring a correction function operation instruction according to the target data, and obtaining target correction data corresponding to the target data according to the correction function operation instruction;
    acquiring a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction;
    obtaining a speech synthesis result according to the unprocessed data, the target data, the target correction data and the target modification data.
  16. The computer-readable storage medium according to claim 15, wherein the target data includes pinyin data, the target correction data includes target pinyin data, and the correction function operation instruction includes a pinyin pronunciation correction function operation instruction,
    the acquiring a correction function operation instruction according to the target data, and obtaining target correction data corresponding to the target data according to the correction function operation instruction, comprises:
    obtaining, according to the pinyin data, at least one candidate pinyin result corresponding to the pinyin data, wherein the candidate pinyin results are sorted according to the pronunciation probability value corresponding to each candidate pinyin result;
    acquiring the pinyin pronunciation correction function operation instruction according to the candidate pinyin results;
    obtaining target pinyin data corresponding to the pinyin data according to the pinyin pronunciation correction function operation instruction.
  17. The computer-readable storage medium according to claim 16, wherein the pinyin pronunciation correction function operation instruction includes a polyphonic character pronunciation correction function operation instruction,
    the obtaining, according to the pinyin data, at least one candidate pinyin result corresponding to the pinyin data comprises:
    acquiring the polyphonic character pronunciation correction function operation instruction according to the pinyin data;
    obtaining at least two candidate pinyin results corresponding to the pinyin data according to the polyphonic character pronunciation correction function operation instruction.
  18. The computer-readable storage medium according to claim 15, wherein the modification function operation instruction includes an insertion point function operation instruction,
    the acquiring a modification function operation instruction according to the target data, and obtaining target modification data corresponding to the target data according to the modification function operation instruction, comprises:
    acquiring a target insertion position according to the target data, wherein the target insertion position is a left position and/or a right position corresponding to the target data;
    acquiring the insertion point function operation instruction according to the target insertion position;
    obtaining target modification data corresponding to the target insertion position according to the insertion point function operation instruction, wherein the target modification data includes at least one of target pause data, target silence data, and target sound effect data.
  19. The computer-readable storage medium according to claim 15, wherein the modification function operation instruction comprises a continuous reading function operation instruction, a speed change function operation instruction, and a scene function operation instruction,
    and the acquiring a modification function operation instruction according to the target data, and obtaining, according to the modification function operation instruction, target modification data corresponding to the target data comprises:
    acquiring the continuous reading function operation instruction, the speed change function operation instruction, and the scene function operation instruction according to the target data; and
    obtaining target priority modification data corresponding to the target data according to the priority corresponding to the continuous reading function operation instruction, the priority corresponding to the speed change function operation instruction, and the priority corresponding to the scene function operation instruction, wherein the priority corresponding to the continuous reading function operation instruction is defined to be higher than the priority corresponding to the speed change function operation instruction, and the priority corresponding to the speed change function operation instruction is defined to be higher than the priority corresponding to the scene function operation instruction.
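The priority ordering defined in claim 19 (continuous reading over speed change over scene) can be sketched as a simple ranking. The claim does not say how the priorities are combined into the target priority modification data, so the sketch below only orders the instructions; the names `PRIORITY` and `order_by_priority` and the numeric ranks are assumptions.

```python
# Assumed numeric ranks; only their relative order comes from the claim.
PRIORITY = {
    "continuous_reading": 3,
    "speed_change": 2,
    "scene": 1,
}


def order_by_priority(instructions: list[str]) -> list[str]:
    """Sort modification instructions from highest to lowest priority."""
    return sorted(instructions, key=lambda name: PRIORITY[name], reverse=True)
```

A caller could then apply the instructions in the returned order, or keep only the first one, depending on how conflicts between modifications are meant to be resolved.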
  20. The computer-readable storage medium according to claim 15, wherein, after the obtaining, according to the data selection function operation instruction, unprocessed data and target data within a first selection range, the speech synthesis method further comprises:
    re-acquiring the data selection function operation instruction, and deleting the first selection range, to obtain target data within a second selection range.
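The reselection step of claim 20 can be pictured as an editor that holds at most one active selection range, so that a new selection discards the previous one. This is a minimal sketch under that assumption; the class `Selection`, its method `select`, and the use of half-open character offsets are hypothetical and not taken from the application.

```python
class Selection:
    """Holds a text and at most one active selection range."""

    def __init__(self, text: str):
        self.text = text
        self.range = None  # (start, end) of the active selection, if any

    def select(self, start: int, end: int) -> str:
        """Select text[start:end], replacing any earlier selection range."""
        if not 0 <= start < end <= len(self.text):
            raise ValueError("selection range is outside the text")
        self.range = (start, end)  # the previous range is dropped here
        return self.text[start:end]
```

With `s = Selection("今天天气很好")`, a call to `s.select(0, 2)` followed by `s.select(2, 4)` leaves only the second range active, and the second call returns the target data within that range.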
PCT/CN2022/090728 2021-10-22 2022-04-29 Speech synthesis method and system based on text editor, and storage medium WO2023065641A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111236415.5 2021-10-22
CN202111236415.5A CN113963681A (en) 2021-10-22 2021-10-22 Speech synthesis method, system and storage medium based on text editor

Publications (1)

Publication Number Publication Date
WO2023065641A1 (en) 2023-04-27

Family

ID=79466238

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090728 WO2023065641A1 (en) 2021-10-22 2022-04-29 Speech synthesis method and system based on text editor, and storage medium

Country Status (2)

Country Link
CN (1) CN113963681A (en)
WO (1) WO2023065641A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963681A (en) * 2021-10-22 2022-01-21 平安科技(深圳)有限公司 Speech synthesis method, system and storage medium based on text editor

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011170191A (en) * 2010-02-19 2011-09-01 Fujitsu Ltd Speech synthesis device, speech synthesis method and speech synthesis program
CN110767209A (en) * 2019-10-31 2020-02-07 标贝(北京)科技有限公司 Speech synthesis method, apparatus, system and storage medium
CN111142667A (en) * 2019-12-27 2020-05-12 苏州思必驰信息科技有限公司 System and method for generating voice based on text mark
CN111199724A (en) * 2019-12-31 2020-05-26 出门问问信息科技有限公司 Information processing method and device and computer readable storage medium
CN112037756A (en) * 2020-07-31 2020-12-04 北京搜狗科技发展有限公司 Voice processing method, apparatus and medium
CN113963681A (en) * 2021-10-22 2022-01-21 平安科技(深圳)有限公司 Speech synthesis method, system and storage medium based on text editor

Also Published As

Publication number Publication date
CN113963681A (en) 2022-01-21

Similar Documents

Publication Publication Date Title
US7062437B2 (en) Audio renderings for expressing non-audio nuances
JP4145796B2 (en) Method and system for writing dictation of text files and for correcting text
US20090204399A1 (en) Speech data summarizing and reproducing apparatus, speech data summarizing and reproducing method, and speech data summarizing and reproducing program
JP2009522614A (en) Method and system for text editing and score reproduction
WO2017062961A1 (en) Methods and systems for interactive multimedia creation
CN104412320A (en) Automated performance technology using audio waveform data
US20060195318A1 (en) System for correction of speech recognition results with confidence level indication
WO2023065641A1 (en) Speech synthesis method and system based on text editor, and storage medium
GB2444539A (en) Altering text attributes in a text-to-speech converter to change the output speech characteristics
US7094960B2 (en) Musical score display apparatus
CN109243450A (en) Interactive voice recognition method and system
JP2018142286A (en) Program for making electronic book
CN101667173A (en) Music numerical notation input editing system
KR100830689B1 (en) Method of reproducing multimedia for educating foreign language by chunking and Media recorded thereby
JP2005517216A (en) Transcription method and apparatus assisted in fast and pattern recognition of spoken and written words
US7376332B2 (en) Information processing method and information processing apparatus
KR100383061B1 (en) A learning method using a digital audio with caption data
US20150154000A1 (en) Information processing device, information processing method, and program
JPH07160289A (en) Voice recognition method and device
KR20000063615A (en) Method of Reproducing Audio Signal Corresponding to Partially Designated Text Data and Reproducing Apparatus for the Same
CN110085227B (en) Method and device for editing voice skill file, electronic equipment and readable medium
JPS6184771A (en) Voice input device
CN107679068A (en) The information of multimedia file imports and display methods, mobile terminal and storage device
JP4200093B2 (en) Lyric telop display system for karaoke equipment
KR102377038B1 (en) Method for generating speaker-labeled text

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22882266; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)