CN106294310B

CN106294310B - Tibetan tone prediction method and system

Info

Publication number: CN106294310B
Application number: CN201510325742.6A
Authority: CN
Inventors: 祖漪清; 尹大勇; 高杰; 朱荣华; 王影; 胡国平; 胡郁; 刘庆峰
Original assignee: Xun Feizhi Metamessage Science And Technology Ltd
Current assignee: Xun Feizhi Metamessage Science And Technology Ltd
Priority date: 2015-06-12
Filing date: 2015-06-12
Publication date: 2019-05-03
Anticipated expiration: 2035-06-12
Also published as: CN106294310A

Abstract

The invention discloses a Tibetan tone prediction method and system. The method comprises: receiving a Tibetan text to be processed; performing word segmentation on the text to obtain word units; determining the part of speech of each word unit according to its context information in the text; predicting the prosodic boundaries of the text and adjusting the word unit boundaries at the prosodic boundaries according to the parts of speech of the word units at those boundaries; and, according to the part of speech of each word unit, performing tone prediction on the syllable units of the text after the word unit boundaries are adjusted, to obtain the tone information of the text. The invention addresses the tone change problem of multi-tone mode words at different prosodic boundaries and effectively improves the performance of Tibetan speech systems.

Description

Tibetan tone prediction method and system

Technical Field

The invention relates to the field of Tibetan information processing, in particular to a Tibetan tone prediction method and system.

Background

Speech synthesis is an important component of speech information processing. It refers to the process of converting text and outputting it as speech, with the synthesized speech being as natural and intelligible as possible. Text processing is the front-end text analysis stage of a speech synthesis system. Because of the distinctive pronunciation of Tibetan, tone prediction is a key research topic in Tibetan text processing.

Tibetan comprises the Lhasa, Kham (Kangba) and Amdo (Anduo) dialects, of which the Lhasa dialect is the principal one. A prominent phonetic feature of the Lhasa dialect is the presence of tones, and "Tibetan" below refers primarily to the Lhasa dialect. Tibetan speech synthesis mainly involves word-sound conversion, tone prediction and the like. In short, tone is the rise and fall of pitch. Owing to the particularities of Tibetan, a multi-tone mode word carries different tone information at different prosodic boundaries, and a change of tone can seriously affect how the meaning is understood. If the tone type cannot be predicted accurately, the performance of Tibetan word-sound conversion is degraded. Existing Tibetan tone prediction methods are generally rule-based: the initials and finals of syllables are classified, and the tone type of a syllable is obtained by looking up a tone type table according to the combination of its initial class and final class. However, these methods analyze the tone change mechanism incompletely and do not take the characteristics of Tibetan into account, so they cannot accurately predict the tones of multi-tone mode words at different prosodic boundaries, which reduces the naturalness of synthesized Tibetan speech and can even affect its intelligibility.

Disclosure of Invention

The embodiment of the invention provides a Tibetan tone prediction method and system, which solve the problem that the tones of multi-tone mode words in Tibetan are different at different prosodic boundaries so as to enable Tibetan speech to be synthesized more naturally.

Therefore, the embodiment of the invention provides the following technical scheme:

a Tibetan tone prediction method comprises the following steps:

receiving a Tibetan language text to be processed;

performing word segmentation processing on the Tibetan language text to be processed to obtain word units;

determining the part of speech of the word unit according to the context environment information of the word unit in the Tibetan language text to be processed;

predicting a prosodic boundary of the Tibetan language text to be processed, and adjusting a word unit boundary at the prosodic boundary according to the part of speech of a word unit at the prosodic boundary;

and performing tone prediction on the syllable unit of the Tibetan language text to be processed after the word unit boundary is adjusted according to the part of speech of each word unit to obtain tone information of the Tibetan language text to be processed.

Preferably, the determining the part of speech of each word unit according to the context and environment information of each word unit in the Tibetan language text to be processed includes:

dividing the Tibetan language text to be processed into sentences;

predicting the part of speech of the word unit in the sentence;

determining a type of the word unit;

and adjusting the part of speech of the word unit according to the type of the word unit.

Preferably, the sentence dividing of the Tibetan language text to be processed includes:

predicting the first-level part of speech of each word unit, wherein the first-level parts of speech comprise: verbs, content words, pronouns, function words, general affixes, and verb configuration affixes;

if the first-level part of speech of the word unit preceding a single plumb sign is a verb or a verb configuration affix, the position of the single plumb sign is a sentence boundary;

if the first-level part of speech of the word unit preceding a single plumb sign is not a verb or a verb configuration affix, the sentence boundary is predicted by a statistical modeling method.

Preferably, the predicting the first-level part of speech of each word unit includes:

acquiring candidate first-level parts of speech of each word unit;

extracting context-related features of the current word unit;

and determining the first-level part of speech of the current word unit from its candidate first-level parts of speech by a statistical modeling method according to the context-related features of the current word unit.

Preferably, the type of the word unit includes any one or more of the following: multi-tone mode words, function words, affixes, and regular words.

Preferably, the adjusting the word unit boundary at the prosodic boundary according to the part of speech of the word unit at the prosodic boundary includes:

when the word unit at the prosodic boundary is a multi-tone mode word unit and the part of speech is a verb or an adjective, splitting the multi-tone mode word unit by taking syllables as a unit, and performing subsequent tone prediction by using the split syllable unit.

A Tibetan tone prediction system, comprising:

the receiving module is used for receiving the Tibetan language text;

the word segmentation module is used for carrying out word segmentation processing on the Tibetan language text to be processed to obtain word units;

the part-of-speech determining module is used for determining the part of speech of the word unit according to the context environment information of the word unit in the Tibetan language text to be processed;

the word unit boundary adjusting module is used for predicting the prosodic boundary of the Tibetan language text to be processed and adjusting the word unit boundary at the prosodic boundary according to the part of speech of the word unit at the prosodic boundary;

and the tone prediction module is used for carrying out tone prediction on the syllable unit of the Tibetan language text to be processed after the word unit boundary is adjusted according to the part of speech of each word unit to obtain tone information of the Tibetan language text to be processed.

Preferably, the part of speech determination module includes:

the sentence dividing unit is used for dividing the Tibetan language text to be processed into sentences;

the part of speech prediction unit is used for predicting the part of speech of each word unit in the sentence;

the word type determining unit is used for determining the type of each word unit;

and the part-of-speech adjusting unit is used for adjusting the part of speech of the word unit according to the type of the word unit.

Preferably, the sentence dividing unit includes:

a first-level part-of-speech predicting subunit, configured to predict the first-level part of speech of each word unit, where the first-level parts of speech include: verbs, content words, pronouns, function words, general affixes, and verb configuration affixes;

a first sentence boundary determining subunit, configured to determine that the position of a single plumb sign is a sentence boundary if the first-level part of speech of the word unit preceding the single plumb sign is a verb or a verb configuration affix;

a second sentence boundary determining subunit, configured to predict the sentence boundary by a statistical modeling method if the first-level part of speech of the word unit preceding a single plumb sign is not a verb or a verb configuration affix.

Preferably, the first-level part-of-speech predicting subunit is specifically configured to: obtain candidate first-level parts of speech of each word unit through table lookup; extract context-related features of the current word unit; and determine the first-level part of speech of the current word unit from its candidate first-level parts of speech by a statistical modeling method according to the context-related features of the current word unit.

Preferably, the word unit boundary adjusting module is specifically configured to: when the word unit at the prosodic boundary is a multi-tone mode word unit and the part of speech is a verb or an adjective, splitting the multi-tone mode word unit by taking syllables as a unit, and performing subsequent tone prediction by using the split syllable unit.

The Tibetan tone prediction method and the system provided by the embodiment of the invention have the advantages that the word segmentation processing is carried out on the received Tibetan text to be processed to obtain each word unit, the type of each word unit is determined, then the word unit boundary at the acquired prosodic boundary is adjusted according to the type of the word unit, and the word unit tone of the Tibetan text to be processed after the word unit boundary is adjusted is predicted. In the process of predicting the tone of the word unit, the word unit boundary at the prosodic boundary in the Tibetan language text is adjusted according to the part of speech of the word unit, and the tone of the word unit at the prosodic boundary is predicted according to the part of speech of the adjusted word unit at the prosodic boundary, so that the problem of different tones of multi-tone mode words in the Tibetan language at different prosodic boundaries is solved, the naturalness of Tibetan language voice synthesis is improved, and the intelligibility of the Tibetan language voice is ensured.

Drawings

In order to more clearly illustrate the embodiments of the present application or technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.

FIG. 1 is a flow chart of a Tibetan tone prediction method according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a Tibetan tone prediction system according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a Tibetan language text processing system according to an embodiment of the present invention.

Detailed Description

To enable those skilled in the art to better understand the solutions of the embodiments of the invention, the invention is described in further detail below with reference to the drawings and embodiments. The following examples are illustrative only and are not to be construed as limiting the invention.

For a better understanding of the present invention, the prior-art Tibetan tone prediction method is first briefly described. Existing methods generally predict the tones of a text to be processed with a rule-based approach, for example by looking up a tone type table according to the combination of a syllable's initial and final to obtain the syllable's tone type. The citation tone of a Tibetan syllable is determined by its initial and its final. The initial is composed of a prefix letter, a superscript letter, a base letter and a subscript letter, and initials are divided into a high class and a low class, indicating whether the starting pitch of the syllable's tone is high or low. In general, the initial is high when the base letter is turbid (voiced) and low when the base letter is clear (voiceless). Prefix letters and other added letters can change the tone class of the base letter. The final is composed of a vowel sign, a suffix letter and a second suffix letter, and finals are divided into three types according to their coda: long finals, short (checked) finals, and single-vowel finals.

When predicting the tone of a syllable in the current word unit, the type combination of the syllable's initial and final is first determined from an initial and final type table, which is generally constructed by domain experts. The tone type table is then searched to determine the tone type for that combination. The tone type table contains the tone types of the various initial-final combinations and is generally constructed by a rule-based method, the rules describing the linguistic characteristics of Tibetan.

For example, for a given syllable (the Tibetan glyphs of this example are missing from this text), the base letter together with the prefix letter forms the initial, which belongs to the high class, the prefix letter not changing the class of the initial; the vowel sign and the suffix letter form the final, which is a checked final. The combination for this syllable is therefore "high initial, checked final", and looking up the tone type table gives a falling tone (f). However, the prior art does not take the characteristics of Tibetan itself into account; for example, it does not consider that the tones of multi-tone mode words at different prosodic boundaries vary with part of speech, which degrades the performance of the speech system.
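As an illustration of the prior-art table lookup just described, the following is a minimal Python sketch. The names `TONE_TABLE` and `lookup_tone` and the string labels for initial classes and final types are assumptions made for this example. Only the single entry shown is taken from the worked example in the text (a high initial with the final type labelled "checked" here gives a falling tone); an actual table would be built by domain experts.

```python
from typing import Dict, Optional, Tuple

# Tone labels follow the text: h (high), f (falling), l (low), r (rising).
# Only this one entry comes from the worked example; every other
# initial/final combination is deliberately left unlisted in this sketch.
TONE_TABLE: Dict[Tuple[str, str], str] = {
    ("high", "checked"): "f",
}


def lookup_tone(initial_class: str, final_type: str) -> Optional[str]:
    """Return the tone for an initial-class/final-type pair, or None if unlisted."""
    return TONE_TABLE.get((initial_class, final_type))
```

Returning `None` for unlisted combinations keeps the sketch from guessing at table entries that the text does not give.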

The Tibetan tone prediction method and the Tibetan tone prediction system provided by the invention have the advantages that word segmentation processing is carried out on a Tibetan language text to be processed to obtain each word unit, the part of speech of each word unit is predicted, then the prosodic boundary of the Tibetan language text to be processed is obtained, the word boundary is adjusted according to the part of speech of the prosodic boundary word unit, the problem that multiple tone mode words are different in tone at different prosodic boundaries is solved, and then tone prediction is carried out on the Tibetan language text to be processed after the word unit boundary is adjusted to obtain tone information of the Tibetan language text to be processed. Because the boundary of the word unit at the prosodic boundary is adjusted and the tone of the word unit is predicted according to the part of speech of the word unit after the word boundary is adjusted, the problem of tone change of a multi-tone mode word at different prosodic boundaries is solved, the predicted tone is more accurate, and the application effect of a voice system is effectively improved.

In order to better understand the technical solutions and effects of the present invention, the following detailed descriptions will be made with reference to the flowcharts and specific embodiments.

FIG. 1 is a flowchart of the Tibetan tone prediction method provided in the embodiment of the present invention. The method includes the following steps:

Step 101, receiving a Tibetan language text to be processed.

Step 102, performing word segmentation on the Tibetan language text to be processed to obtain word units.

In this embodiment, word segmentation can be performed according to the Tibetan word segmentation principle.

Specifically, the text is first divided into segments using the double plumb sign (double shad) as the delimiter; word segmentation is then performed within each span delimited by a double plumb sign or a single plumb sign (single shad), yielding the word segmentation string of the text to be processed, bounded by double or single plumb signs.
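The delimiter-based part of this step can be sketched in Python as follows. This is only an illustration under stated assumptions: the single plumb sign is assumed to be U+0F0D (TIBETAN MARK SHAD), the double plumb sign is assumed to be either U+0F0E (TIBETAN MARK NYIS SHAD) or two consecutive U+0F0D, and the actual word segmenter is passed in as a hypothetical callable `segment_words`, since the text does not specify it. The function names are likewise assumptions.

```python
import re
from typing import Callable, List

SINGLE = "\u0f0d"  # assumed code point of the single plumb sign
DOUBLE = "\u0f0e"  # assumed code point of the double plumb sign

# Assumption: a double plumb sign may also be typed as two single ones.
_SEGMENT_DELIM = re.compile(f"{DOUBLE}|{SINGLE}{{2}}")


def split_segments(text: str) -> List[str]:
    """Split the text into segments at double plumb signs."""
    return [p.strip() for p in _SEGMENT_DELIM.split(text) if p.strip()]


def split_chunks(segment: str) -> List[str]:
    """Split one segment into chunks at single plumb signs."""
    return [c.strip() for c in segment.split(SINGLE) if c.strip()]


def segment_text(
    text: str, segment_words: Callable[[str], List[str]]
) -> List[List[List[str]]]:
    """Return word units grouped by segment and by chunk within each segment."""
    return [
        [segment_words(chunk) for chunk in split_chunks(seg)]
        for seg in split_segments(text)
    ]
```

Keeping the segment and chunk grouping in the result preserves where the plumb signs were, which the later sentence-division step relies on.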

Furthermore, after the word segmentation string of the text to be processed is obtained, case particles in it that may be attached case particles can be marked. An attached case particle is a case particle with an agglutinative property that follows a word unit: it merges with the last syllable of that word unit, in the form of a suffix letter, into a single syllable, and in pronunciation it forms part of that syllable's final.

Step 103, determining the part of speech of each word unit according to the context environment information of each word unit in the Tibetan language text to be processed.

In practical applications, the part-of-speech of each word unit may be determined by looking up a table or using a statistical modeling method, and in this embodiment, the part-of-speech of each word unit is predicted by using a statistical modeling method according to context-related features of each word unit in a sentence, and then the part-of-speech of each word unit is adjusted according to the type of the word unit to ensure the accuracy of the obtained part-of-speech of the word unit, which may specifically include:

dividing the Tibetan language text to be processed into sentences;

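predicting the part of speech of each word unit in the sentence;

determining the type of each word unit;

and adjusting the part of speech of each word unit according to its type.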

Step a) Dividing the Tibetan language text to be processed into sentences

The basic structure of a Tibetan sentence is 'subject-object-predicate'; therefore, when dividing the text to be processed into sentences, the position of the predicate verb is determined first. This embodiment determines whether the boundary at a single plumb sign is a sentence boundary based on the first-level part of speech of the word unit preceding it. The first-level parts of speech include: verbs, content words, pronouns, function words, general affixes, and verb configuration affixes. Dividing the Tibetan text to be processed into sentences may include:

predicting the first-level part of speech of each word unit;

and determining sentence boundaries according to the primary parts of speech of each word unit.

In practical application, the first-level part of speech of a word unit can be predicted by a statistical modeling method. In a specific prediction, the candidate first-level parts of speech of each word unit are first obtained by table lookup; the context-related features of the current word unit are then extracted, which may be the candidate first-level parts of speech of the previous and/or next word unit, the candidate first-level parts of speech of the current word unit, and the like; finally, the first-level part of speech of the current word unit is determined from its candidates by a statistical model, such as a decision tree model, according to those context-related features. The first-level parts of speech are specifically as follows:

1. content words, which specifically include: nouns, adjectives, adverbs, and numerals;

2. pronouns, for example the personal pronouns meaning "I", "you" and "he";

3. function words, which specifically include: case particles, conjunctions, and prepositions;

4. verbs;

5. general affixes, which specifically include: noun affixes, adjective affixes, adverb affixes, and verb constituent affixes, where a verb constituent affix is a verb affix that changes the original meaning of the verb it is added to;

6. verb configuration affixes: a verb configuration affix is a verb affix that is generally added after a verb without changing the verb's original meaning and that expresses a syntactic function. A verb affix string is the concatenated form that results when several verb configuration affixes occur together.
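The three-stage prediction of first-level parts of speech (candidate lookup, context feature extraction, model decision) might be sketched in Python as below. The tag strings, the `lexicon` dictionary, and the `choose` callable are all assumptions for illustration; `choose` stands in for a trained statistical model such as the decision tree mentioned in the text, which this sketch does not implement.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical tag names for the six first-level parts of speech.
FIRST_LEVEL_TAGS = (
    "verb", "content", "pronoun", "function",
    "general_affix", "verb_config_affix",
)

Features = Dict[str, Tuple[str, ...]]


def context_features(candidates: Sequence[Sequence[str]], i: int) -> Features:
    """Collect candidate tags of the previous, current and next word unit."""
    prev_c = tuple(candidates[i - 1]) if i > 0 else ()
    next_c = tuple(candidates[i + 1]) if i + 1 < len(candidates) else ()
    return {"prev": prev_c, "cur": tuple(candidates[i]), "next": next_c}


def predict_first_level(
    units: Sequence[str],
    lexicon: Dict[str, List[str]],
    choose: Callable[[Features, Sequence[str]], str],
) -> List[str]:
    """Pick one first-level tag per unit from its table-lookup candidates."""
    # Assumption: units missing from the lexicon may take any tag.
    candidates = [lexicon.get(u, list(FIRST_LEVEL_TAGS)) for u in units]
    tags = []
    for i, cands in enumerate(candidates):
        if len(cands) == 1:
            tags.append(cands[0])  # a single candidate needs no model decision
        else:
            tags.append(choose(context_features(candidates, i), cands))
    return tags
```

Passing the model in as a callable keeps the sketch independent of any particular learning library.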

Determining sentence boundaries according to the first-level part of speech of each word unit specifically includes: judging whether the first-level part of speech of the word unit preceding a single plumb sign is a verb or a verb configuration affix; if so, the single plumb sign is considered a sentence boundary; otherwise, the sentence boundary is predicted by a statistical modeling method, for example with a decision tree model. This yields the sentence boundaries of the Tibetan language text to be processed.

It should be noted that, in this embodiment, the boundary corresponding to a double plumb sign is both a segment boundary and a sentence boundary, whereas the boundary corresponding to a single plumb sign is not necessarily a sentence boundary and may instead be a phrase boundary or a word boundary.
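The sentence-boundary decision at single plumb signs can be illustrated with the following Python sketch. It assumes the input is one segment already split into chunks at single plumb signs, each chunk being a list of (word, first-level tag) pairs. The tag strings and the `fallback` callable, which stands in for the statistical model used when the rule does not fire, are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Tagged = Tuple[str, str]  # (word unit, first-level part of speech)

# Hypothetical tag names for the two categories that trigger the rule.
VERBAL = {"verb", "verb_config_affix"}


def split_sentences(
    chunks: Sequence[Sequence[Tagged]],
    fallback: Callable[[Sequence[Sequence[Tagged]], int], bool],
) -> List[List[Tagged]]:
    """Group chunks into sentences; each chunk ends at a single plumb sign."""
    sentences: List[List[Tagged]] = []
    current: List[Tagged] = []
    for i, chunk in enumerate(chunks):
        if not chunk:
            continue
        current.extend(chunk)
        last_tag = chunk[-1][1]
        # Rule first; the statistical fallback is consulted only otherwise.
        if last_tag in VERBAL or fallback(chunks, i):
            sentences.append(current)
            current = []
    if current:
        # The end of the segment (a double plumb sign) closes the sentence.
        sentences.append(current)
    return sentences
```

Any words remaining at the end are emitted as a final sentence, reflecting that a double plumb sign is always a sentence boundary.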

Step b) predicting the part of speech of each word unit in the sentence

A first-level part of speech covers several different parts of speech; for example, the first-level part of speech "content word" covers nouns, adjectives, adverbs and numerals. The first-level part of speech and the tone of a word unit are therefore not in one-to-one correspondence. Moreover, the tone of a multi-tone mode word may differ with its part of speech; for example, its tone as a noun may differ from its tone as an adjective. Therefore, after sentence division, this embodiment predicts the specific part of speech of each word unit, namely its secondary part of speech, according to the context features of the word unit in the sentence. Secondary parts of speech are parts of speech in the usual sense, such as noun and adjective. In this embodiment, secondary part-of-speech prediction is performed on each word unit in each sentence; the prediction method is the same as in the prior art, for example a statistical modeling method, and yields the secondary part of speech of each word unit.

Step c) determining the type of word unit

The same word unit in a Tibetan text can take different parts of speech in different contexts and thus show different tones. Because the secondary part of speech is predicted over the wide scope of a whole sentence, the prediction is prone to error. To predict the tone information of the Tibetan text more accurately, the accuracy of the secondary parts of speech obtained in step b) must be ensured. The secondary parts of speech predicted in step b) can be further adjusted according to the type of each word unit, so the type of each word unit must be determined first.

In practical application, word units can be divided into four types according to how their tones behave in different contexts, namely multi-tone mode words, function words, affixes and regular words, as follows:

1. Function words: in citation form, function words with different initials have different tones. With a high initial the tone is h (high) or f (falling), depending on the suffix letter; with a low initial the tone is l (low) or r (rising), depending on the suffix letter. In running speech, almost all function words are read with l or r, i.e. the lower tone values; this applies to case particles, conjunctions, prepositions, state words, modal particles and the like.

2. Multi-tone mode words: a multi-tone mode word unit has different tones when used as different parts of speech. For example, the word whose Latin transcription is khrom skin has the tone pattern hh as a noun and fr as a verb.

3. Affixes: pronounced in a weak form, similar to the neutral tone of Mandarin Chinese, e.g. ba, wa, bo, pa, po, ma, mo, and the like.

4. Regular words: their tones follow an inherent tone change rule. For example, when two syllables that are each read with the low tone l in isolation form a word, the word is read not as ll but as lh; that is, the low tone of the second syllable becomes a high tone.

In this embodiment, the types of word units can be obtained by looking up various word unit type dictionaries, and can also be obtained by manually labeling the types of the Tibetan language text to be processed in advance.
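A dictionary-based type lookup, together with the one regular-word tone change rule stated above, could be sketched in Python as follows. The type names, the `type_dicts` structure, and the choice to treat any unlisted word as a regular word are assumptions for illustration; the sketch encodes only the single rule given in the text (two low syllables are read low then high).

```python
from typing import Dict, List, Set


def word_type(unit: str, type_dicts: Dict[str, Set[str]]) -> str:
    """Look the unit up in per-type dictionaries.

    Assumption: a unit found in no dictionary is treated as a regular word.
    """
    for name in ("multi_tone", "function", "affix"):
        if unit in type_dicts.get(name, set()):
            return name
    return "regular"


def regular_word_sandhi(citation_tones: List[str]) -> List[str]:
    """Apply the single rule from the text: a disyllabic l+l word is read l+h."""
    if citation_tones == ["l", "l"]:
        return ["l", "h"]
    return list(citation_tones)
```

Other tone change rules for regular words are not given in the text, so every other tone sequence is returned unchanged.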

Step d) Adjusting the part of speech of the word unit according to the type of the word unit

In the embodiment, the part of speech of each word unit is adjusted according to the type of each word unit, so that the accuracy of the secondary part of speech of each word unit is ensured. For example, the part-of-speech of a word unit may be adjusted based on its type and its contextually relevant characteristics. Furthermore, the part of speech of the word unit can be adjusted through a statistical modeling method. Specifically, the part-of-speech adjustment of a word unit can be classified into the following cases:

1. Part-of-speech adjustment of function word units

The role of function words is to connect different sentence components. Function words with different roles may be identical in form yet act as different parts of speech in actual context; that is, a function word may belong to more than one part-of-speech category. In this embodiment, the part of speech of a function word is adjusted according to the context of the word unit or by a statistical modeling method, to obtain the accurate secondary part of speech of each word unit. For example, a given case particle may also be a conjunction, and whether it is a case particle or a conjunction at a specific position in the sentence can be determined from how the sentence components are connected.

2. Part-of-speech adjustment of multi-tone mode word units

The text to be processed is divided into segments with the adjusted function words as boundaries, and part-of-speech adjustment is then performed on the multi-tone mode word units, specifically according to the context or by a statistical modeling method. For example, the same multi-tone mode word unit (marked between two "/" symbols) has different parts of speech in the following two sentences:

① (the Tibetan example sentence is missing from this text)

where the word unit is a verb at the end of the sentence, meaning "to occur" or "to appear", and its tone pattern is rl;

② (the Tibetan example sentence is missing from this text)

where the word unit is a noun meaning "history" or "biography", and its tone pattern is lr.

When the part of speech is adjusted, the adjustment is made according to the context in which the word unit occurs.

3. Affix unit and regular word unit part-of-speech adjustment

For affix units and regular word units, the secondary part of speech predicted for each word unit is used directly, and this embodiment performs no part-of-speech adjustment. It should be noted that affixes are of two kinds: independent affixes, and affix parts fixedly attached to other word units. In this embodiment the two kinds are given the same part of speech; for example, both may be tagged as affix.

Step 104, predicting the prosodic boundary of the Tibetan language text to be processed, and adjusting the word unit boundary at the prosodic boundary according to the part of speech of the word unit at the prosodic boundary.

A prosodic boundary is a pause between words that serves to express semantic information in spoken communication. The part of speech of a multi-tone mode word at a prosodic boundary is related to that boundary, and its tone changes with its part of speech, so the word unit boundary at the prosodic boundary needs to be adjusted according to the part of speech of the word unit there.

In this embodiment, prosodic boundary prediction is first performed on the text to be processed; predicting pause positions, for example, is a form of prosodic boundary prediction. The prediction process is the same as in the prior art; for example, a statistical modeling method can be applied to the context-related information of the word units. The word unit boundary at each prosodic boundary is then adjusted according to the adjusted secondary part of speech of the word unit there. Specifically, when the word unit at the prosodic boundary is a multi-tone mode word unit and its secondary part of speech is verb or adjective, the multi-tone mode word unit is split into syllables, and tone prediction is performed on the resulting syllable units.
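The boundary adjustment rule of this step can be sketched in Python as follows. The `WordUnit` record, its field names, and the tag strings are hypothetical, and the prosodic boundary flags are assumed to have been set by a separate predictor that is not shown. Keeping the boundary flag only on the last split syllable is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class WordUnit:
    syllables: List[str]
    pos: str                       # adjusted secondary part of speech
    word_type: str                 # e.g. "multi_tone", "function", "affix", "regular"
    at_prosodic_boundary: bool = False


def adjust_boundaries(units: Sequence[WordUnit]) -> List[WordUnit]:
    """Split multi-tone mode verbs/adjectives at prosodic boundaries into syllables."""
    out: List[WordUnit] = []
    for u in units:
        should_split = (
            u.at_prosodic_boundary
            and u.word_type == "multi_tone"
            and u.pos in {"verb", "adjective"}
            and len(u.syllables) > 1
        )
        if not should_split:
            out.append(u)
            continue
        last = len(u.syllables) - 1
        for k, syl in enumerate(u.syllables):
            # Assumption: only the final syllable still sits at the boundary.
            out.append(WordUnit([syl], u.pos, u.word_type, k == last))
    return out
```

Units that do not meet all conditions pass through unchanged, so the function can be applied to a whole sentence at once.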

Step 105, performing tone prediction on the syllable units of the Tibetan language text to be processed, after the word unit boundaries are adjusted, according to the part of speech of each word unit, to obtain the tone information of the Tibetan language text to be processed.

In practical application, the tone prediction unit may use a syllable unit as a tone bearing unit to perform tone prediction on each syllable unit in all word units in the Tibetan language text to be processed, for example, the tone of each syllable unit is determined by looking up a tone type table according to the tone prediction characteristics, where the tone prediction characteristics may be the initial category of the current syllable, the final category of the current syllable, the initial and final categories of syllables before and after the current syllable, the position of the current syllable in the word unit, the length of the word unit in which the current syllable is located, the part of speech of the word unit in which the current syllable is located, and the like. In addition, the tone of the word unit can be adjusted according to the tone rule of the Tibetan pronunciation, for example, all affixes are set as weak reading, and specifically, all syllable units of the affixes can be set as weak reading, and the common affixes are ba, wa, bo, pa, po, ma, mo.
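The tone prediction features listed above could be gathered per syllable as in the following Python sketch. The input layout (a list of `(syllables, part of speech, word type)` triples), the feature key names, and the `classify` callable that maps a syllable to its initial and final categories are assumptions; the sketch only assembles features and the weak-reading flag for affixes, and leaves the actual tone table lookup aside.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Word = Tuple[List[str], str, str]  # (syllables, part of speech, word type)
Classes = Tuple[str, str]          # (initial category, final category)


def syllable_features(
    words: Sequence[Word], classify: Callable[[str], Classes]
) -> List[Dict[str, object]]:
    """Build one feature dictionary per syllable, in text order."""
    # Flatten to (syllable, word index, index within word).
    flat = [
        (syl, wi, si)
        for wi, (syls, _, _) in enumerate(words)
        for si, syl in enumerate(syls)
    ]
    feats: List[Dict[str, object]] = []
    for k, (syl, wi, si) in enumerate(flat):
        syls, pos, wtype = words[wi]
        initial, final = classify(syl)
        feats.append({
            "initial": initial,
            "final": final,
            "prev": classify(flat[k - 1][0]) if k > 0 else None,
            "next": classify(flat[k + 1][0]) if k + 1 < len(flat) else None,
            "position": si,
            "word_length": len(syls),
            "pos": pos,
            "weak": wtype == "affix",  # affix syllables are set to weak reading
        })
    return feats
```

Neighbouring syllables are taken across word boundaries here, which is one possible reading of "syllables before and after the current syllable".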

Further, the tone of each syllable unit obtained in this embodiment can be applied in the field of speech synthesis. For example, the tone prediction of this embodiment can be performed after the word-sound conversion of the Tibetan text is completed, so that the finally synthesized Tibetan speech is more natural.

In the Tibetan tone prediction method provided by the embodiment of the invention, the received Tibetan text to be processed is segmented into word units, the type of each word unit is determined, and the word unit boundaries at the predicted prosodic boundaries are then adjusted according to the type of the word unit. As a result, when tone prediction is performed on the boundary-adjusted Tibetan text according to the part of speech of each word unit, the influence of the part of speech of a multi-tone mode word on the tone of the word unit at a prosodic boundary is taken into account. This solves the problem that multi-tone mode words in Tibetan take different tones at different prosodic boundaries, and makes Tibetan speech synthesis more natural.

Correspondingly, the present invention further provides a Tibetan tone prediction system, as shown in fig. 2, including:

a receiving module 201, configured to receive a Tibetan language text to be processed;

a word segmentation module 202, configured to perform word segmentation processing on the Tibetan language text to be processed to obtain word units;

a part-of-speech determining module 203, configured to determine, according to context environment information of the word unit in the Tibetan text to be processed, a part-of-speech of the word unit;

a word unit boundary adjusting module 204, configured to predict a prosodic boundary of the Tibetan language text to be processed, and adjust a word unit boundary at the prosodic boundary according to a part of speech of a word unit at the prosodic boundary;

a tone prediction module 205, configured to perform tone prediction on the syllable units of the Tibetan language text to be processed, after the word unit boundaries have been adjusted, according to the part of speech of each word unit, so as to obtain tone information of the Tibetan language text to be processed.

In this embodiment, the part of speech of each word unit is determined from the context-related features of the word unit in the sentence in which it is located, and the part-of-speech determining module 203 includes:

Wherein, sentence boundaries are predicted from the first-level part of speech of the word unit preceding a single vertical sign "|" in the Tibetan language text to be processed, or by a statistical modeling method, and the sentence dividing unit comprises:

In practical applications, the first sentence boundary determining subunit is specifically configured to: obtain candidate first-level parts of speech of each word unit through table lookup; extract context-related features of the current word unit; and determine the first-level part of speech of the current word unit from its candidate first-level parts of speech by a statistical modeling method according to those context-related features. The features are, for example, the first-level part-of-speech information of the word units before and after the current word unit, the position of the current word unit in the sentence, and the like.
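The two decisions described above, choosing a first-level part of speech from table-lookup candidates and deciding whether a single vertical sign is a sentence boundary, can be illustrated with the following sketch. The specification does not name the statistical model, so a toy score table stands in for it here; `CANDIDATE_POS`, `TRANSITION_SCORE`, the word identifiers "w1" to "w3" and the label "verb_affix" are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical candidate table: word unit -> possible first-level parts of speech.
CANDIDATE_POS: Dict[str, List[str]] = {
    "w1": ["noun"],
    "w2": ["noun", "verb"],
    "w3": ["verb_affix"],
}

# Toy stand-in for the statistical model: score of (previous POS, candidate POS).
TRANSITION_SCORE: Dict[Tuple[str, str], float] = {
    ("<s>", "noun"): 2.0,
    ("noun", "noun"): 1.0,
    ("noun", "verb"): 2.0,
    ("verb", "verb_affix"): 2.0,
}


def extract_features(tags_so_far: List[str], index: int) -> Tuple[str, int]:
    """Context features: previous first-level POS and position in the sentence."""
    prev_pos = tags_so_far[index - 1] if index > 0 else "<s>"
    return prev_pos, index


def predict_first_level_pos(words: List[str]) -> List[str]:
    """Pick, left to right, the best-scoring candidate POS for each word unit."""
    tags: List[str] = []
    for i, word in enumerate(words):
        candidates = CANDIDATE_POS.get(word, ["unknown"])
        prev_pos, _position = extract_features(tags, i)
        tags.append(max(candidates, key=lambda c: TRANSITION_SCORE.get((prev_pos, c), 0.0)))
    return tags


def is_sentence_boundary(pos_before_sign: str, statistical_fallback: Callable[[], bool]) -> bool:
    """Rule first: a verb or verb affix before the sign marks a sentence boundary."""
    if pos_before_sign in {"verb", "verb_affix"}:
        return True
    # Otherwise defer to a statistical predictor supplied by the caller.
    return statistical_fallback()
```

A real system would replace the score table with a trained model and would likely use richer features, including the part of speech of the following word unit.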

In this embodiment, the word unit boundary adjusting module 204 may be specifically configured to: when the word unit at the prosodic boundary is a multi-tone mode word unit and the part of speech is a verb or an adjective, splitting the multi-tone mode word unit by taking syllables as a unit, and performing tone prediction by using the split syllable unit.

Finally, the tone prediction module 205 performs tone prediction on the syllable unit of the Tibetan language text to be processed after the word unit boundary is adjusted according to the part of speech of each word unit obtained by the part of speech determination module 203 to obtain tone information of the Tibetan language text to be processed.
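The data flow between modules 201 to 205 can be pictured as a simple pipeline. The sketch below only shows how the stages might be chained; the class name `TonePredictionSystem`, the callable signatures and the `(unit, part of speech)` pairing are assumptions made for illustration, and each stage is left as a pluggable function.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Unit = Tuple[str, str]  # (unit text, part of speech); hypothetical representation


@dataclass
class TonePredictionSystem:
    segment: Callable[[str], List[str]]                    # word segmentation (module 202)
    determine_pos: Callable[[List[str]], List[str]]        # part-of-speech determination (module 203)
    adjust_boundaries: Callable[[List[Unit]], List[Unit]]  # boundary adjustment (module 204)
    predict_tone: Callable[[Unit], str]                    # tone prediction (module 205)

    def run(self, text: str) -> List[Tuple[str, str]]:
        """Receive text (module 201) and return (unit, tone) pairs."""
        words = self.segment(text)
        tagged = list(zip(words, self.determine_pos(words)))
        adjusted = self.adjust_boundaries(tagged)
        return [(unit, self.predict_tone((unit, pos))) for unit, pos in adjusted]
```

Keeping each stage behind a plain callable makes it possible to swap a rule-based stage for a statistical one without touching the rest of the pipeline.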

Of course, in practical applications, the system may further include a storage module (not shown) for storing dictionary information, tone prediction results, and the like. This makes it convenient for a computer to process the Tibetan language text automatically and to store related information such as the synthesized speech.

In practical applications, the system may be applied in the field of Tibetan speech synthesis. For example, as shown in fig. 3, the Tibetan tone prediction system 301 may be used together with the word-sound conversion system 302 and the like as a subsystem of the text processing system 400, which processes the Tibetan text so as to improve the naturalness of Tibetan speech synthesis.

In the Tibetan tone prediction system provided by the embodiment of the invention, the word segmentation module 202 segments the Tibetan text to be processed received by the receiving module 201 into word units, the part-of-speech determining module 203 determines the type of each word unit, and the word unit boundaries at the prosodic boundaries of the Tibetan text are adjusted according to the result of the part-of-speech determining module 203. As a result, when the system performs tone prediction on a word unit at a prosodic boundary according to its part of speech, the influence of the part of speech of a multi-tone mode word on the tone of that word unit is taken into account. This solves the problem that multi-tone mode words in Tibetan take different tones at different prosodic boundaries, and makes Tibetan speech synthesis more natural.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, they are described in a relatively simple manner, and reference may be made to some descriptions of method embodiments for relevant points. The above-described system embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The embodiments of the present invention have been described in detail above, and the specific examples used herein are merely intended to facilitate understanding of the method and system of the present invention. Meanwhile, a person skilled in the art may, according to the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A Tibetan tone prediction method is characterized by comprising the following steps:

receiving a Tibetan language text to be processed;

2. The method of claim 1, wherein the determining the part of speech of the word unit according to the context information of the word unit in the Tibetan language text to be processed comprises:

dividing the Tibetan language text to be processed into sentences;

predicting the part of speech of the word unit in the sentence;

determining a type of the word unit;

3. The method of claim 2, wherein dividing the Tibetan language text to be processed into sentences comprises:

if the first-level part of speech of the word unit before the single vertical sign "|" is a verb or verb configuration affix, the single vertical sign is a sentence boundary;

if the first-level part of speech of the word unit before the single vertical sign "|" is not a verb or verb configuration affix, the sentence boundary is predicted through a statistical modeling method.

4. The method of claim 3, wherein predicting the first-level part-of-speech of each word unit comprises:

acquiring candidate primary parts of speech of each word unit;

extracting context relevant characteristics of the current word unit;

5. The method of claim 2, wherein the type of word unit comprises any one or more of: multi-tone mode words, null words, affixes, and regular words.

6. The method of claim 1, wherein adjusting the word unit boundaries at prosodic boundaries based on the part-of-speech of the word unit at the prosodic boundaries comprises:

7. A Tibetan tone prediction system, comprising:

the receiving module is used for receiving the Tibetan language text to be processed;

8. The system of claim 7, wherein the part of speech determination module comprises:

9. The system of claim 8, wherein the sentence segmentation unit comprises:

the first sentence boundary determining subunit is used for determining that the single vertical sign is a sentence boundary if the first-level part of speech of the word unit before the single vertical sign "|" is a verb or a verb configuration affix;

and the second sentence boundary determining subunit is used for predicting the sentence boundary through a statistical modeling method if the first-level part of speech of the word unit before the single vertical sign "|" is not a verb or verb configuration affix.

10. The system of claim 9, wherein the first sentence boundary determination subunit is specifically configured to: obtain candidate first-level parts of speech of each word unit through table lookup; extract context-related features of the current word unit; and determine the first-level part of speech of the current word unit from its candidate first-level parts of speech by a statistical modeling method according to the context-related features of the current word unit.

11. The system of claim 7, wherein the word unit boundary adjustment module is specifically configured to: when the word unit at the prosodic boundary is a multi-tone mode word unit and the part of speech is a verb or an adjective, splitting the multi-tone mode word unit by taking syllables as a unit, and performing subsequent tone prediction by using the split syllable unit.