CN108090038A - Text sentence segmentation method and system - Google Patents

Text sentence segmentation method and system

Info

Publication number
CN108090038A
Authority
CN
China
Prior art keywords
punctuate
text
word
feature
data
Legal status
Granted
Application number
CN201610993731.XA
Other languages
Chinese (zh)
Other versions
CN108090038B (en)
Inventor
占吉清
高建清
王智国
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Application filed by iFlytek Co., Ltd.
Priority to CN201610993731.XA
Publication of CN108090038A
Application granted
Publication of CN108090038B
Legal status: Active


Classifications

    • G06F 40/211 — Natural language analysis; syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/289 — Natural language analysis; phrasal analysis, e.g. finite state techniques or chunking
    • G10L 15/063 — Speech recognition; creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/26 — Speech recognition; speech-to-text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text sentence segmentation method and system. The method includes: collecting in advance a small amount of text data and its corresponding speech data, and building a long-term memory segmentation model based on text segmentation features and acoustic segmentation features; when segmenting text, obtaining the text to be segmented and its corresponding speech data; extracting text segmentation features and acoustic segmentation features from the text to be segmented and from its corresponding speech data, respectively; and segmenting the text to be segmented according to the extracted text segmentation features, the acoustic segmentation features, and the long-term memory segmentation model. The invention can effectively improve the accuracy of text sentence segmentation.

Description

Text sentence segmentation method and system
Technical field
The present invention relates to the field of natural language processing, and in particular to a text sentence segmentation method and system.
Background art
In recent years, with the practical deployment of speech recognition technology and the rapid development of hardware storage devices, more and more people have become accustomed to capturing audio on storage devices and then converting the recorded speech data into text data with transcription tools for preservation, rather than relying on the traditional manual practice of taking notes while listening to capture important information. However, when speech recognition is performed on speech data, the resulting recognized text is usually one continuous, uninterrupted stream, which greatly hinders the user's reading and comprehension. For example, a recognized text might read "it is hard to get things done at this administrative service hall on the one hand traffic pressure is heavy now and parking is also difficult on the other hand at this window inside this administrative service center we can see that often some matters are handled by season and by month". Such a long passage without any break marks is laborious to read, whereas the same recognized text with break marks added is much easier to understand: "it is hard to get things done at this administrative service hall / on the one hand traffic pressure is heavy now / parking is also difficult / on the other hand / at this window / inside this administrative service center / we can see / that often / some matters / are handled by season and by month". Researchers have therefore begun to study how to segment text into sentences in order to improve the user's reading experience.
Existing segmentation methods generally use only the word-vector information of the text data and perform segmentation directly with a sequence-labeling approach. However, word vectors can only describe the text data itself and cannot describe the information carried by the corresponding speech data, so the segmentation accuracy is relatively low. In addition, the prior art generally segments with a sequence-labeling model that can remember only a limited amount of history and cannot remember the future information of each word, which further reduces segmentation accuracy. For example, in a sentence such as "how can I do something that will make her change her mind", when the sequence model reaches the word "something", if it cannot remember the earlier word "how", the break decision at "something" is likely to be wrong. Likewise, in a sentence such as "the words you say are words that express a question", if the model cannot remember the future information following the first "words", the break decision at that position may also be wrong.
Summary of the invention
Embodiments of the present invention provide a text sentence segmentation method and system to improve the accuracy of text sentence segmentation.
To this end, the present invention provides the following technical solutions:
A text sentence segmentation method, including:
collecting in advance a small amount of text data and its corresponding speech data, and building a long-term memory segmentation model based on text segmentation features and acoustic segmentation features;
when segmenting text, obtaining the text to be segmented and its corresponding speech data;
extracting text segmentation features and acoustic segmentation features from the text to be segmented and from its corresponding speech data, respectively;
segmenting the text to be segmented according to the extracted text segmentation features, the acoustic segmentation features, and the long-term memory segmentation model.
Preferably, collecting a small amount of text data and its corresponding speech data and building the long-term memory segmentation model based on text segmentation features and acoustic segmentation features includes:
collecting a small amount of text data and its corresponding speech data;
taking the text data as training data and labeling the segmentation labels of the training data;
extracting text segmentation features from the training data, and extracting acoustic segmentation features from the speech data corresponding to the training data;
taking the extracted text segmentation features and acoustic segmentation features as training features, and building the long-term memory segmentation model from the training features and the segmentation labels of the training data.
Preferably, the method further includes: collecting in advance a large amount of plain text data and building a text segmentation model, the text segmentation model including an input layer, one or more hidden layers, and an output layer;
extracting text segmentation features from the training data includes:
performing word segmentation on the training data and computing the word vector of each word;
feeding the word vector of each word into the text segmentation model in turn, and obtaining the text segmentation feature of each word from the output of the last hidden layer of the text segmentation model.
Preferably, collecting a large amount of plain text data and building the text segmentation model includes:
collecting a large amount of plain text data;
labeling the segmentation labels of the text data according to the punctuation positions in the plain text data;
performing word segmentation on the plain text data and computing the word vector of each word;
building the text segmentation model from the word vector of each word in the plain text data and the segmentation labels.
Preferably, the text segmentation model is a bidirectional LSTM structure or a bidirectional RNN structure, and the inputs of each word's hidden-layer node are the outputs of the hidden-layer nodes of the previous word and the next word together with the output of the preceding layer for the current word.
Preferably, extracting acoustic segmentation features from the speech data corresponding to the training data includes:
aligning the training data with its corresponding speech data;
extracting acoustic segmentation features from the aligned training data and its speech data, the acoustic segmentation features including any one or more of the following: inter-word pause duration, word-final pitch trend, average phoneme duration within the word, average vowel-phoneme duration within the word, the speaker's historical average speaking rate, word-final energy trend, and word tone.
Preferably, the long-term memory segmentation model includes an input layer, a normalization layer, one or more hidden layers, and an output layer; the normalization layer is used to normalize the different segmentation features received at the input layer; and the inputs of each word's hidden-layer node are the outputs of the hidden-layer nodes of the previous word and the next word together with the output of the preceding layer for the current word.
Preferably, the method further includes:
after the segmentation result of the text to be segmented is obtained, feeding the segmented text back to the user; or
after the segmentation result of the text to be segmented is obtained, adding break marks at the break positions and feeding the marked text back to the user.
A text sentence segmentation system, including:
a long-term memory segmentation model construction module, configured to collect in advance a small amount of text data and its corresponding speech data and to build a long-term memory segmentation model based on text segmentation features and acoustic segmentation features;
a receiving module, configured to obtain, when text is to be segmented, the text to be segmented and its corresponding speech data;
a text segmentation feature extraction module, configured to extract text segmentation features from the text to be segmented;
an acoustic segmentation feature extraction module, configured to extract acoustic segmentation features from the speech data corresponding to the text to be segmented;
a decision module, configured to segment the text to be segmented according to the text segmentation features, the acoustic segmentation features, and the long-term memory segmentation model.
Preferably, the long-term memory segmentation model construction module includes:
a first data collection unit, configured to collect a small amount of text data and its corresponding speech data;
a first labeling unit, configured to take the text data as training data and to label the segmentation labels of the training data;
a first feature extraction unit, configured to extract text segmentation features from the training data;
a second feature extraction unit, configured to extract acoustic segmentation features from the speech data corresponding to the training data;
a first training unit, configured to take the text segmentation features extracted by the first feature extraction unit and the acoustic segmentation features extracted by the second feature extraction unit as training features and to build the long-term memory segmentation model from the training features and the segmentation labels of the training data.
Preferably, the system further includes:
a text segmentation model construction module, configured to collect in advance a large amount of plain text data and to build a text segmentation model, the text segmentation model including an input layer, one or more hidden layers, and an output layer;
the first feature extraction unit includes:
a first word-segmentation subunit, configured to perform word segmentation on the training data and to compute the word vector of each word;
a first extraction subunit, configured to feed the word vectors obtained by the word-segmentation subunit into the text segmentation model in turn and to obtain the text segmentation feature of each word from the output of the last hidden layer of the text segmentation model.
Preferably, the text segmentation model construction module includes:
a second data collection unit, configured to collect a large amount of plain text data;
a second labeling unit, configured to label the segmentation labels of the text data according to the punctuation positions in the plain text data;
a second word-segmentation unit, configured to perform word segmentation on the plain text data and to compute the word vector of each word;
a second training unit, configured to build the text segmentation model from the word vector of each word in the plain text data and the segmentation labels.
Preferably, the text segmentation model is a bidirectional LSTM structure or a bidirectional RNN structure, and the inputs of each word's hidden-layer node are the outputs of the hidden-layer nodes of the previous word and the next word together with the output of the preceding layer for the current word.
Preferably, the second feature extraction unit includes:
an alignment subunit, configured to align the training data with its corresponding speech data;
a second extraction subunit, configured to extract acoustic segmentation features from the aligned training data and its speech data, the acoustic segmentation features including any one or more of the following: inter-word pause duration, word-final pitch trend, average phoneme duration within the word, average vowel-phoneme duration within the word, the speaker's historical average speaking rate, word-final energy trend, and word tone.
Preferably, the long-term memory segmentation model includes an input layer, a normalization layer, one or more hidden layers, and an output layer; the normalization layer is used to normalize the different segmentation features received at the input layer; and the inputs of each word's hidden-layer node are the outputs of the hidden-layer nodes of the previous word and the next word together with the output of the preceding layer for the current word.
Preferably, the system further includes:
a feedback module, configured to feed the segmented text back to the user after the decision module obtains the segmentation result of the text to be segmented, or to add break marks at the break positions and feed the marked text back to the user after the decision module obtains the segmentation result of the text to be segmented.
In the text sentence segmentation method and system provided by the embodiments of the present invention, a small amount of text data and its corresponding speech data are collected in advance, and a long-term memory segmentation model based on text segmentation features and acoustic segmentation features is built. When text is to be segmented, text segmentation features and acoustic segmentation features are extracted from the text to be segmented and from its corresponding speech data, respectively, and segmentation decisions are made using the extracted segmentation features and the pre-built long-term memory segmentation model. Because the text segmentation information and the acoustic segmentation information of the corresponding speech data are fully exploited as inputs to the long-term memory segmentation model, the model remembers both the history information and the future information of each word or character, and the length of this memory is not limited, which effectively guarantees the accuracy of the text segmentation prediction. Furthermore, the method and system of the embodiments of the present invention can display the segmented text to the user, or display the text to the user after break marks have been added, improving the user's experience of reading the text.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of building the long-term memory segmentation model in an embodiment of the present invention;
Fig. 2 is a flow chart of building the text segmentation model in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the topology of the text segmentation model in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the topology of the long-term memory segmentation model in an embodiment of the present invention;
Fig. 5 is a flow chart of the text sentence segmentation method of an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a text sentence segmentation system of an embodiment of the present invention.
Detailed description of the embodiments
In order to enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings and specific implementations.
The word vectors of text data can only describe the text data itself and cannot describe the information carried by the corresponding speech data, whereas speech data in fact contains strong segmentation cues: for example, the tone at a sentence break is often a falling tone, the pitch (fundamental frequency, F0) at the end of a word becomes smaller and smaller near a break, and there are also obvious changes in the energy of the speech data and in the pause duration between words. In view of this characteristic, the embodiments of the present invention provide a text sentence segmentation method and system. A small amount of text data and its corresponding speech data are collected in advance, and a long-term memory segmentation model based on text segmentation features and acoustic segmentation features is built. When segmenting text, text segmentation features and acoustic segmentation features are extracted from the text to be segmented and from its corresponding speech data, respectively, and a word-by-word segmentation decision is made using the extracted segmentation features and the pre-built long-term memory segmentation model. Because the text segmentation information and the acoustic segmentation information of the corresponding speech data are fully exploited as inputs to the long-term memory segmentation model, the model remembers both the history information and the future information of each word or character, and the length of this memory is not limited, which effectively guarantees the accuracy of the text segmentation prediction.
The process of building the long-term memory segmentation model is shown in Fig. 1 and includes the following steps:
Step 101: collect a small amount of text data and its corresponding speech data.
Speech data corresponding to the text data means that the content of the text data is identical to that of the speech data. In practice, speech data and its recognized text can be collected directly, or other text data with corresponding speech data can be collected, such as the text and audio of audiobooks. In addition, text data from multiple domains can be collected according to application requirements, or only text data from a single domain; for example, if the text to be segmented belongs to the education domain, text data from the education domain that has corresponding speech data can be collected directly.
Step 102: take the text data as training data and label the segmentation labels of the training data.
If the collected text data contains punctuation, the punctuation positions are taken as break positions and segmentation labels are marked at those positions. The punctuation may be commas, full stops, question marks, exclamation marks, etc., and may of course also be other punctuation marks such as ellipses, enumeration commas, colons, or semicolons. If the collected text data contains no punctuation — for example, if the text data is recognized text — segmentation labels can be added at each break position of the text data by manual annotation, for instance using 0 or 1 to mark the break positions, where 1 denotes a break and 0 denotes no break, or vice versa.
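As a concrete illustration of this labeling scheme, the following minimal Python sketch (not taken from the patent; the punctuation set and the 0/1 convention are assumptions consistent with the description above) derives word-level segmentation labels from punctuated, word-segmented text:

```python
# Minimal sketch: derive 0/1 segmentation labels from punctuated, word-segmented text.
# Assumption: a word is labeled 1 when a break punctuation mark immediately follows it.

BREAK_PUNCT = {",", ".", "?", "!", "，", "。", "？", "！", "、", "：", "；"}  # assumed set

def label_segmentation(tokens):
    """tokens: list of words and punctuation marks, in order.
    Returns (words, labels): punctuation removed, label 1 = break after the word."""
    words, labels = [], []
    for tok in tokens:
        if tok in BREAK_PUNCT:
            if labels:
                labels[-1] = 1          # mark a break after the preceding word
        else:
            words.append(tok)
            labels.append(0)            # default: no break after this word
    return words, labels

# Example
print(label_segmentation(["今天", "天气", "很", "好", "，", "我们", "出去", "玩", "。"]))
# -> (['今天', '天气', '很', '好', '我们', '出去', '玩'], [0, 0, 0, 1, 0, 0, 1])
```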
Step 103: extract text segmentation features from the training data, and extract acoustic segmentation features from the speech data corresponding to the training data.
It should be noted that the text segmentation features and the acoustic segmentation features are extracted for each individual word or character; that is, the text segmentation feature and the acoustic segmentation feature of every word or character in the training data need to be extracted. Specifically, the text segmentation feature can directly use the word-vector or character-vector information of each word or character, or use information derived from each word vector or character vector; the embodiments of the present invention place no limitation on this.
The extraction of the specific text segmentation features and acoustic segmentation features is described in detail later.
Step 104: take the extracted text segmentation features and acoustic segmentation features as training features, and build the long-term memory segmentation model from the training features and the segmentation labels of the training data.
In the embodiments of the present invention, the long-term memory segmentation model may specifically adopt a bidirectional LSTM (Long Short-Term Memory) structure or a bidirectional RNN (Recurrent Neural Network) structure, and includes an input layer, a normalization layer, one or more hidden layers, and an output layer; the normalization layer normalizes the different segmentation features received at the input layer. Taking word-level processing as an example, in this model the inputs of each word's hidden-layer node are the outputs of the hidden-layer nodes of the previous word and the next word together with the output of the preceding layer for the current word. In this way, after the hidden layers have memorized the history information and the future information of each word in the text, the hidden-node output corresponding to each word is used as the input of the output layer; the output layer may output, for each word, the probabilities of breaking and of not breaking, or a decision on whether to break after that word.
The model parameters are trained using the segmentation features extracted from the small amount of collected text data with speech data together with the segmentation labels of the text data; after training, the long-term memory segmentation model is obtained. The specific training procedure is similar to the prior art and is not described in detail here.
As mentioned above, in the embodiments of the present invention the text segmentation feature may be information derived from each word vector or character vector. For example, a large amount of plain text data is collected in advance, a text segmentation model is built, and the text segmentation feature of each word or character is extracted with this text segmentation model.
The text segmentation model includes an input layer, one or more hidden layers, and an output layer, and may specifically adopt a DNN (Deep Neural Network), CNN (Convolutional Neural Network), bidirectional LSTM, or bidirectional RNN structure. The bidirectional LSTM structure is taken as an example below.
The process of building the text segmentation model in an embodiment of the present invention is shown in Fig. 2 and includes the following steps:
Step 201: collect a large amount of plain text data.
Plain text data refers to text data that has no corresponding speech data; it can be collected from the network, for example.
Step 202: label the segmentation labels of the text data according to the punctuation positions in the plain text data.
The punctuation positions in the plain text data are located in turn. The punctuation may be full stops, commas, question marks, exclamation marks, etc., and of course may also be other punctuation marks such as ellipses, enumeration commas, colons, or semicolons; the specific punctuation is not limited. The punctuation positions are taken as break positions — wherever there is punctuation, a break is needed — and a segmentation label is marked at each break position, for example using 0 or 1, where 1 denotes a break and 0 denotes no break, or vice versa.
Step 203: perform word segmentation on the plain text data and compute the word vector of each word.
When the plain text data is segmented into words, the punctuation in the text data is retained, with each punctuation mark treated as a separate token. The specific word segmentation method is the same as in the prior art, for example a conditional-random-field-based method. The word vectors of the segmented words are also obtained in the same way as in the prior art, for example by vectorizing each word directly with word2vec, or by randomly initializing the word vector of each word and then training the initialized vectors with a neural-network-based method.
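As an illustration of this step, the sketch below uses the jieba segmenter and the gensim word2vec implementation as stand-ins; the patent does not name any particular tools, and the corpus path and vector dimensionality are arbitrary assumptions:

```python
# Minimal sketch of step 203, assuming jieba for Chinese word segmentation
# and gensim (>= 4.x) for word2vec training.
import jieba
from gensim.models import Word2Vec

def tokenize_keep_punct(line):
    # jieba emits punctuation marks as separate tokens, so they are kept automatically
    return [tok for tok in jieba.lcut(line) if tok.strip()]

corpus = [tokenize_keep_punct(line) for line in open("plain_text.txt", encoding="utf-8")]

# Train 100-dimensional word vectors (the dimensionality here is an arbitrary choice).
w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, workers=4)
vec = w2v.wv[corpus[0][0]]   # word vector of the first token, shape (100,)
```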
Step 204: build the text segmentation model from the word vector of each word in the plain text data and the segmentation labels.
The text segmentation model can remember the history information and the future information of each word at the same time. The history information refers to the information of one or more words before the current word; the future information refers to the information of one or more words after the current word. How many words before or after the current word are memorized is determined by the application requirements: for example, the information of all words before the current word may be memorized, and the number of words after the current word to be memorized may be determined arbitrarily.
Fig. 3 shows one topology of the text segmentation model, which specifically includes an input layer, hidden layers, and an output layer. To strengthen the memory of the history information and future information of each word, the hidden-layer nodes are bidirectionally connected, i.e., the inputs of each word's hidden-layer node are the outputs of the hidden-layer nodes of the previous word and the next word together with the output of the node in the preceding layer for the current word. For the first word, the inputs of its hidden-layer node are the output of the last word's hidden-layer node, the output of the second word's hidden-layer node, and the output of the preceding-layer node for the current word; for the last word, the inputs of its hidden-layer node are the output of the previous word's hidden-layer node, the output of the first word's hidden-layer node, and the output of the preceding-layer node for the current word. There may be one or more hidden layers; Fig. 3 shows a schematic with a single hidden layer only.
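The following PyTorch sketch is a hedged illustration of a bidirectional-LSTM text segmentation model of this general shape; the patent does not specify a framework, layer sizes, or this exact wiring. A method is included that exposes the last hidden layer's outputs, which are reused later as text segmentation features:

```python
# Sketch of a bidirectional-LSTM text segmentation model (framework and sizes assumed).
import torch
import torch.nn as nn

class TextSegmentationModel(nn.Module):
    def __init__(self, embed_dim=100, hidden_dim=128, num_layers=1):
        super().__init__()
        # Bidirectional LSTM: each position sees both past (history) and future context.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                              bidirectional=True, batch_first=True)
        # Output layer: per-word scores for "break" vs "no break".
        self.out = nn.Linear(2 * hidden_dim, 2)

    def hidden_features(self, word_vecs):
        """word_vecs: (batch, seq_len, embed_dim) -> last hidden layer outputs,
        shape (batch, seq_len, 2*hidden_dim), usable as text segmentation features."""
        feats, _ = self.bilstm(word_vecs)
        return feats

    def forward(self, word_vecs):
        feats = self.hidden_features(word_vecs)
        return self.out(feats)          # (batch, seq_len, 2) logits per word

# Example: a batch of one sentence with 7 words and 100-dimensional word vectors.
model = TextSegmentationModel()
logits = model(torch.randn(1, 7, 100))
print(logits.shape)                     # torch.Size([1, 7, 2])
```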
During model training, the word vectors of each sentence of text data are fed in turn through the input layer. After the hidden layers have memorized the history information and the future information of each word in the text data, the hidden-node output corresponding to each word is used as the input of the output layer, and the output is, for each word, the probabilities of breaking and of not breaking, or a decision on whether to break after that word.
The model parameters — the transformation matrices and biases used when transforming the features at each layer — are trained using the large amount of collected plain text data; after training, the text segmentation model is obtained. The specific training procedure is similar to the prior art and is not described in detail here.
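A minimal training loop for the sketch above might look as follows; the loss function, optimizer, and batching are assumptions, not prescriptions of the patent:

```python
# Hedged sketch of training the text segmentation model with 0/1 labels per word.
import torch
import torch.nn as nn

model = TextSegmentationModel()     # the class defined in the previous sketch (assumed in scope)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(word_vecs, labels):
    """word_vecs: (batch, seq_len, embed_dim); labels: (batch, seq_len) with 0/1 entries."""
    optimizer.zero_grad()
    logits = model(word_vecs)                                   # (batch, seq_len, 2)
    loss = criterion(logits.reshape(-1, 2), labels.reshape(-1)) # per-word classification
    loss.backward()
    optimizer.step()
    return loss.item()

# One toy step with random tensors standing in for real word vectors and labels.
loss = train_step(torch.randn(4, 7, 100), torch.randint(0, 2, (4, 7)))
```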
It should be noted that if the segmentation decision is made at the character level, then in the above process of building the text segmentation model no word segmentation is needed in step 203; instead the character vector of each character is computed, and accordingly in step 204 the text segmentation model is built from the character vector of each character in the plain text data. During model training, the character vectors of each sentence of text data are fed in turn through the input layer; after the hidden layers have memorized the history information and the future information of each character, the hidden-node output corresponding to each character is used as the input of the output layer, and the output is, for each character, the probabilities of breaking and of not breaking, or a decision on whether to break after that character.
For convenience, the description below takes word-level segmentation decisions as an example. Based on the above text segmentation model, the process of extracting the text segmentation features of the training data in step 103 of Fig. 1 is as follows: first, the training data is segmented into words and the word vector of each word is computed; then the word vectors of each piece of training data are fed into the text segmentation model in turn, and the text segmentation feature of each word is obtained from the output of the last hidden layer of the text segmentation model.
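With the earlier sketch, this feature extraction step amounts to reading the last hidden layer's outputs; the snippet below is illustrative only and assumes the trained `model` from the sketches above is in scope:

```python
# Text segmentation features = outputs of the last hidden layer of the trained text model.
import torch

with torch.no_grad():
    word_vecs = torch.randn(1, 7, 100)                 # word vectors of one training sentence
    text_feats = model.hidden_features(word_vecs)      # (1, 7, 256): one feature vector per word
```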
Because the above text segmentation model is trained on massive plain text data, it encapsulates the segmentation information of that massive plain text corpus. The text segmentation features extracted with this model (i.e., the hidden-layer information) therefore contain not only the segmentation information of the massive plain text data but also the history information and the future information of each word, which is more conducive to accurately building the long-term memory segmentation model.
The process of extracting acoustic segmentation features from the speech data corresponding to the training data in step 103 of Fig. 1 is now described; it specifically includes the following steps:
(1) Align the training data with its corresponding speech data, for example by aligning the text data with the speech data using a dynamic-programming method or any other alignment method; the embodiments of the present invention place no limitation on this.
(2) Extract acoustic segmentation features from the aligned training data and its speech data.
Because the speech data exhibits different acoustic behavior before and after a break, in order to better build the long-term memory model the embodiments of the present invention extract acoustic segmentation features from the speech data to describe the acoustic behavior of the text data. In the embodiments of the present invention, the extracted acoustic segmentation features may include any one or more of the following: inter-word pause duration, word-final pitch trend, average phoneme duration within the word, average vowel-phoneme duration within the word, the speaker's historical average speaking rate, word-final energy trend, and word tone.
Each of the above acoustic segmentation features is described in detail below.
1) Inter-word pause duration
The inter-word pause duration refers to the time interval between the speech data corresponding to the current word and the speech data corresponding to the next word. During extraction, the time span between the end position of the current word and the start position of the next word is obtained directly; for the last word, the value is 0.
2) Word-final pitch trend
The word-final pitch trend refers to the sequence of fundamental-frequency (F0) values around the end of the current word. During extraction, the end position of the current word is located in the speech data, and starting from that end position the F0 values of several frames of speech data are taken before and after it in turn as the pitch trend at the end of the current word. The trend can be represented as a 1 × n vector, where n is the total number of frames taken before and after plus the current frame; how many frames to take is determined by the application requirements or experimental results. For example, taking the F0 values of 7 frames before and 4 frames after the end position, plus the current frame, gives 12 F0 values in total, i.e., a 12-dimensional pitch-trend vector.
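The sketch below illustrates features 1) and 2) under the assumption that a forced alignment has already produced per-word start and end times and that a frame-level F0 track is available; the alignment and pitch-tracking tools, the data layout, and the frame shift are all assumptions, not part of the patent:

```python
# Hedged sketch: inter-word pause duration and word-final F0 trend.
# Assumes words = [{"word": ..., "start": sec, "end": sec}, ...] from forced alignment
# and f0 = list of frame-level F0 values with a fixed frame shift.
FRAME_SHIFT = 0.01  # seconds per frame (assumed)

def pause_durations(words):
    """Pause between each word and the next; 0 for the last word."""
    return [max(0.0, words[i + 1]["start"] - words[i]["end"]) if i + 1 < len(words) else 0.0
            for i in range(len(words))]

def word_final_f0_trend(word, f0, back=7, fwd=4):
    """F0 values of `back` frames before and `fwd` frames after the word-end frame,
    plus the end frame itself -> a (back + fwd + 1)-dimensional vector (12 by default)."""
    end_frame = int(word["end"] / FRAME_SHIFT)
    trend = []
    for k in range(end_frame - back, end_frame + fwd + 1):
        trend.append(f0[k] if 0 <= k < len(f0) else 0.0)   # pad out-of-range frames with 0
    return trend
```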
3) Average phoneme duration within the word
The average phoneme duration within the word refers to the average duration of the phonemes contained in the current word. During extraction, the current word is first converted into a phoneme sequence (the conversion is the same as in the prior art) and the number of phonemes in the word is counted; then the duration of the speech data corresponding to the current word is obtained, and the ratio of the current word's duration to the number of phonemes gives the average phoneme duration within the word.
4) Average vowel-phoneme duration within the word
The average vowel-phoneme duration within the word refers to the average duration of the vowel phonemes contained in the current word. During extraction, the number of vowel phonemes in the word is counted from the phoneme sequence of the current word; the speech data corresponding to each vowel is located in the speech data corresponding to the current word, and the duration of each vowel phoneme is obtained; the durations of all vowel phonemes in the current word are summed to obtain the total vowel-phoneme duration of the current word, and the ratio of the total vowel-phoneme duration to the number of vowel phonemes gives the average vowel-phoneme duration within the word.
5) Speaker's historical average speaking rate
The speaker's historical average speaking rate refers to the speaker's average speaking speed in the speech data up to and including the current word. During calculation, the total number of words up to the current word and the total duration of the speech data up to the current word are counted directly, and the ratio of the total number of words to the total duration gives the speaker's historical average speaking rate.
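Features 3) to 5) are simple ratios over the alignment; the sketch below is illustrative only, reuses the assumed alignment format above, and adds an assumed per-phone time layout and a crude vowel test that are not specified by the patent:

```python
# Hedged sketch: average phoneme duration, average vowel-phoneme duration,
# and the speaker's historical average speaking rate.
# Assumes each aligned word also carries its phoneme sequence with per-phone times:
# word = {"word": ..., "start": s, "end": e,
#         "phones": [{"ph": "ang1", "start": s, "end": e}, ...]}
VOWELS = {"a", "o", "e", "i", "u", "v"}   # assumed vowel inventory

def is_vowel(ph):
    return ph.rstrip("12345")[:1] in VOWELS   # crude check on the phone label, assumption only

def avg_phone_duration(word):
    n = len(word["phones"])
    return (word["end"] - word["start"]) / n if n else 0.0

def avg_vowel_duration(word):
    vowels = [p for p in word["phones"] if is_vowel(p["ph"])]
    if not vowels:
        return 0.0
    return sum(p["end"] - p["start"] for p in vowels) / len(vowels)

def history_avg_rate(words, i):
    """Words per second of speech up to and including word i."""
    total_time = words[i]["end"] - words[0]["start"]
    return (i + 1) / total_time if total_time > 0 else 0.0
```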
6) Word-final energy trend
The word-final energy trend refers to the energy variation of the speech data at the end of the current word. During extraction, the end position of the current word is located in the speech data, and starting from that end position the energy value, its first-order difference, and its second-order difference are computed for several frames of speech data before and after the end position. The result is represented as an m × 3 matrix, where m is the total number of frames taken before and after plus the current frame and can be determined by the application requirements or experimental results, and the 3 dimensions correspond to the energy value, the first-order difference, and the second-order difference.
7) Word tone
The word-tone feature refers to the tone of the current word; the specific extraction method is the same as in the prior art. There are five tones in total — the neutral tone, the first (high level) tone, the second (rising) tone, the third (low/dipping) tone, and the fourth (falling) tone — and the tone of each word can be represented by a digit, for example 0 to 4 for the neutral, first, second, third, and fourth tones respectively.
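A corresponding sketch for features 6) and 7) follows; the frame-level energies, the frame shift, and the way the tone is read off the phone label are assumptions, since the patent leaves the exact computation to the prior art:

```python
# Hedged sketch: word-final energy trend (m x 3 matrix) and a digit-coded word tone.
FRAME_SHIFT = 0.01  # seconds per frame (assumed)

def word_final_energy_trend(word, energy, back=7, fwd=4):
    """Rows: frames around the word-end frame; columns: energy, 1st difference, 2nd difference."""
    end_frame = int(word["end"] / FRAME_SHIFT)
    frames = [energy[k] if 0 <= k < len(energy) else 0.0
              for k in range(end_frame - back, end_frame + fwd + 1)]
    d1 = [0.0] + [frames[i] - frames[i - 1] for i in range(1, len(frames))]
    d2 = [0.0] + [d1[i] - d1[i - 1] for i in range(1, len(d1))]
    return [[frames[i], d1[i], d2[i]] for i in range(len(frames))]   # m x 3

def word_tone(word):
    """Tone digit (1-4) of the word's final syllable, read off its last phone label; 0 = neutral."""
    last = word["phones"][-1]["ph"]
    return int(last[-1]) if last[-1] in "1234" else 0
```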
It should be noted that if the segmentation decision is made at the character level, the above acoustic segmentation features likewise need to be adapted accordingly.
Based on the text segmentation features and acoustic segmentation features extracted above, one topology of the long-term memory segmentation model is shown in Fig. 4 and specifically includes an input layer, normalization layers, hidden layers, and an output layer. The normalization layers normalize the different input segmentation features; in Fig. 4, two levels of normalization layers are used to normalize the input segmentation features. The structure between the hidden-layer nodes is the same as in the text segmentation model shown in Fig. 3, and there may be one or more hidden layers; Fig. 4 shows a single hidden layer only as an example.
Because the dynamic ranges of the values of the different segmentation features above differ considerably, in practical applications the different segmentation features can be divided into different groups as inputs to the model. In Fig. 4, for example, the above segmentation features are divided into four groups, namely:
Group 1: inter-word pause duration, average phoneme duration within the word, average vowel-phoneme duration within the word, speaker's historical average speaking rate, and word tone, denoted SentDVec1;
Group 2: word-final pitch trend, denoted SentDVec2;
Group 3: word-final energy trend, denoted SentDVec3;
Group 4: text segmentation features, denoted HiddenVec.
As shown in Fig. 4, SentDVec1 passes directly through a normalization layer to obtain the normalized feature Vec1; SentDVec2 and SentDVec3 are concatenated into a single vector and passed through a normalization layer to obtain the normalized feature Vec2. The feature normalization is given by formula (1):
Vec_i = f(W * SentDVec_i + b)        (1)
where Vec_i is the normalized feature vector, SentDVec_i is the feature vector before normalization, and W and b are the normalization weight and the normalization bias; they are also parameters of the long-term memory segmentation model, and their specific values can be obtained by training on a large amount of data.
The feature vectors obtained from the first-level normalization layers are concatenated, and the concatenated feature vector is used as the input feature of the second normalization layer to obtain the second normalized feature vector SEDVec; the specific normalization is the same as formula (1) and is not repeated here.
The feature vector SEDVec produced by the second-level normalization layer, together with the text segmentation feature HiddenVec, is used as the input of the hidden layer. After the hidden layer has memorized the history information and the future information of each word in the text, the hidden-node output corresponding to each word is used as the input of the output layer, and the output is, for each word, the probabilities of breaking and of not breaking, or a decision on whether to break after that word.
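The following sketch puts these pieces together in PyTorch; it is purely illustrative — the activation f, the layer sizes, the feature dimensions, and the framework are assumptions consistent with, but not prescribed by, the description above:

```python
# Hedged sketch of the long-term memory segmentation model of Fig. 4:
# two levels of normalization layers, a bidirectional-LSTM hidden layer, and an output layer.
import torch
import torch.nn as nn

class LongTermMemorySegmenter(nn.Module):
    def __init__(self, d_sent1=5, d_sent2=12, d_sent3=36, d_hiddenvec=256,
                 d_norm=64, hidden_dim=128):
        super().__init__()
        # First-level normalization: Vec_i = f(W * SentDVec_i + b), formula (1).
        self.norm1_a = nn.Sequential(nn.Linear(d_sent1, d_norm), nn.Tanh())            # -> Vec1
        self.norm1_b = nn.Sequential(nn.Linear(d_sent2 + d_sent3, d_norm), nn.Tanh())  # -> Vec2
        # Second-level normalization over the concatenation [Vec1; Vec2] -> SEDVec.
        self.norm2 = nn.Sequential(nn.Linear(2 * d_norm, d_norm), nn.Tanh())
        # Hidden layer: bidirectional LSTM over [SEDVec; HiddenVec] per word.
        self.bilstm = nn.LSTM(d_norm + d_hiddenvec, hidden_dim,
                              bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, 2)    # break / no-break logits per word

    def forward(self, sent1, sent2, sent3, hiddenvec):
        vec1 = self.norm1_a(sent1)                                  # (B, T, d_norm)
        vec2 = self.norm1_b(torch.cat([sent2, sent3], dim=-1))      # (B, T, d_norm)
        sedvec = self.norm2(torch.cat([vec1, vec2], dim=-1))        # (B, T, d_norm)
        h, _ = self.bilstm(torch.cat([sedvec, hiddenvec], dim=-1))
        return self.out(h)                                          # (B, T, 2)

# Toy call: one sentence of 7 words; d_sent3=36 assumes the 12x3 energy matrix is flattened.
m = LongTermMemorySegmenter()
logits = m(torch.randn(1, 7, 5), torch.randn(1, 7, 12),
           torch.randn(1, 7, 36), torch.randn(1, 7, 256))
```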
It should be noted that in practical applications the long-term memory segmentation model may also use only one level of normalization layers, or of course no normalization layer at all, in which case the segmentation features must be normalized accordingly before model training; the embodiments of the present invention place no limitation on this.
Based on the long-term memory segmentation model built in advance from a small amount of collected text data and its corresponding speech data, the flow of the text sentence segmentation method of an embodiment of the present invention is shown in Fig. 5 and includes the following steps:
Step 501: collect in advance a small amount of text data and its corresponding speech data, and build the long-term memory segmentation model based on text segmentation features and acoustic segmentation features.
The specific process of building the long-term memory segmentation model has been described in detail above and is not repeated here.
Step 502: when segmenting text, obtain the text to be segmented and its corresponding speech data.
The speech data may be recorded according to application requirements, and the corresponding recognized text is obtained by performing speech recognition on the recorded speech data.
Step 503: extract text segmentation features and acoustic segmentation features from the text to be segmented and from its corresponding speech data, respectively.
It should be noted that in practical applications the text segmentation feature may directly use the word-vector or character-vector information of each word or character, or use information derived from each word vector or character vector — for example, the text segmentation feature extracted with the text segmentation model as described above, i.e., the output of the last hidden layer of the text segmentation model.
Likewise, the acoustic segmentation features extracted from the speech data corresponding to the text data may include any one or more of the following: inter-word pause duration, word-final pitch trend, average phoneme duration within the word, average vowel-phoneme duration within the word, the speaker's historical average speaking rate, word-final energy trend, and word tone.
Step 504: segment the text to be segmented according to the extracted text segmentation features, the acoustic segmentation features, and the long-term memory segmentation model.
The segmentation features of the text to be segmented are used as the input features of the long-term memory segmentation model, and the text to be segmented is segmented with the long-term memory segmentation model to obtain, for each word or character, the probabilities of breaking and of not breaking. If the break probability is greater than a preset threshold, a break is made after the current word or character; otherwise no break is made after the current word or character. Of course, the output of the long-term memory segmentation model may also directly be a decision on whether to break after the current word or character.
It should be noted that after the segmentation result of the text is obtained, the segmented text may be fed back to the user directly, or fed back after marks have been added at the corresponding break positions; the break marks, such as slashes or spaces, delimit the segmented text, and the embodiments of the present invention do not limit the specific mark symbols. Of course, suitable punctuation may also be added at the break positions before feeding the text back to the user, which can further improve the user's reading experience.
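As a final illustrative sketch, the decision and feedback steps could look like this; the 0.5 threshold and the slash mark are example choices, not requirements of the patent:

```python
# Hedged sketch: threshold the per-word break probability and add "/" marks at breaks.
import torch

def segment(words, logits, threshold=0.5):
    """words: list of tokens; logits: (len(words), 2) output of the segmentation model."""
    probs = torch.softmax(logits, dim=-1)[:, 1]          # probability of "break" per word
    breaks = [bool(p > threshold) for p in probs]
    marked = []
    for word, brk in zip(words, breaks):
        marked.append(word)
        if brk:
            marked.append("/")                           # example break mark
    return " ".join(marked), breaks

text, breaks = segment(["今天", "天气", "很", "好", "我们", "出去", "玩"],
                       torch.randn(7, 2))
```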
In the text sentence segmentation method provided by the embodiments of the present invention, a small amount of text data and its corresponding speech data are collected in advance, and a long-term memory segmentation model based on text segmentation features and acoustic segmentation features is built. When text is to be segmented, text segmentation features and acoustic segmentation features are extracted from the text to be segmented and from its corresponding speech data, respectively, and a word-by-word segmentation decision is made using the extracted segmentation features and the pre-built long-term memory segmentation model. Because the text segmentation information and the acoustic segmentation information of the corresponding speech data are fully exploited as inputs to the long-term memory segmentation model, the model remembers both the history information and the future information of each word, and the length of this memory is not limited and can be determined by the application requirements, which effectively guarantees the accuracy of the text segmentation prediction. Furthermore, the method of the embodiments of the present invention can display the segmented text to the user, or display the text to the user after punctuation has been added, improving the user's experience of reading the text.
Correspondingly, an embodiment of the present invention also provides a text sentence segmentation system; Fig. 6 is a schematic structural diagram of the system.
In this embodiment, the system includes:
a long-term memory segmentation model construction module 601, configured to collect in advance a small amount of text data and its corresponding speech data and to build a long-term memory segmentation model based on text segmentation features and acoustic segmentation features;
a receiving module 602, configured to obtain, when text is to be segmented, the text to be segmented and its corresponding speech data;
a text segmentation feature extraction module 603, configured to extract text segmentation features from the text to be segmented;
an acoustic segmentation feature extraction module 604, configured to extract acoustic segmentation features from the speech data corresponding to the text to be segmented;
a decision module 605, configured to segment the text to be segmented according to the text segmentation features, the acoustic segmentation features, and the long-term memory segmentation model.
It should be noted that in practical applications the text segmentation feature may directly use the word-vector information of each word or the character-vector information of each character, or use information derived from each word vector or character vector — for example, the text segmentation feature extracted with the text segmentation model as described above, i.e., the output of the last hidden layer of the text segmentation model.
The above long-term memory segmentation model construction module 601 may specifically include the following units:
a first data collection unit, configured to collect a small amount of text data and its corresponding speech data;
a first labeling unit, configured to take the text data as training data and to label the segmentation labels of the training data;
a first feature extraction unit, configured to extract text segmentation features from the training data;
a second feature extraction unit, configured to extract acoustic segmentation features from the speech data corresponding to the training data;
a first training unit, configured to take the text segmentation features extracted by the first feature extraction unit and the acoustic segmentation features extracted by the second feature extraction unit as training features and to build the long-term memory segmentation model from the training features and the segmentation labels of the training data.
Likewise, the above first feature extraction unit may directly use the word vector of each word or the character vector of each character in the training data as the text segmentation feature of that word or character, or extract derived information from each word vector or character vector as the text segmentation feature — for example, a large amount of plain text data is collected in advance by a text segmentation model construction module, a text segmentation model is built, and the text segmentation feature of each word or character is extracted with that model. For example, one specific structure of the first feature extraction unit may include a first word-segmentation subunit and a first extraction subunit, where the first word-segmentation subunit is configured to perform word segmentation on the training data and to compute the word vector of each word, and the first extraction subunit is configured to feed the word vectors obtained by the word-segmentation subunit into the text segmentation model in turn and to obtain the text segmentation feature of each word from the output of the last hidden layer of the text segmentation model.
The text segmentation model construction module may be a part of the system of the present invention, or may be a physical entity or logical unit independent of the system of the present invention. One specific structure of the text segmentation model construction module may include the following units:
a second data collection unit, configured to collect a large amount of plain text data;
a second labeling unit, configured to label the segmentation labels of the text data according to the punctuation positions in the plain text data;
a second word-segmentation unit, configured to perform word segmentation on the plain text data and to compute the word vector of each word;
a second training unit, configured to build the text segmentation model from the word vector of each word in the plain text data and the segmentation labels.
One topology of the text segmentation model is shown in Fig. 3 and includes an input layer, one or more hidden layers, and an output layer; structures such as DNN, CNN, bidirectional LSTM, or bidirectional RNN may specifically be used, and the embodiments of the present invention place no limitation on this.
Based on the above text segmentation model, the first feature extraction unit can feed the word vector of each word into the text segmentation model in turn and obtain the text segmentation feature of each word from the output of the last hidden layer of the text segmentation model.
The above second feature extraction unit may include an alignment subunit and a second extraction subunit, where the alignment subunit is configured to align the training data with its corresponding speech data, and the second extraction subunit is configured to extract acoustic segmentation features from the aligned training data and its speech data. The acoustic segmentation features include any one or more of the following: inter-word pause duration, word-final pitch trend, average phoneme duration within the word, average vowel-phoneme duration within the word, the speaker's historical average speaking rate, word-final energy trend, and word tone; the definition and extraction of each acoustic segmentation feature have been explained above and are not repeated here.
One topology of the long-term memory segmentation model is shown in Fig. 4 and includes an input layer, normalization layers, one or more hidden layers, and an output layer. The normalization layers normalize the different segmentation features received at the input layer, and the inputs of each word's hidden-layer node are the outputs of the hidden-layer nodes of the previous word and the next word together with the output of the preceding layer for the current word.
Using the long-term memory segmentation model, the above decision module 605 takes the segmentation features of the text to be segmented as the input features of the long-term memory segmentation model and segments the text to be segmented with the model, obtaining, for each word, the probabilities of breaking and of not breaking. If the break probability is greater than a preset threshold, a break is made after the current word; otherwise no break is made after the current word. Of course, the output of the long-term memory segmentation model may also directly be a decision on whether to break after the current word.
In another embodiment of the system of the present invention, the system may further include: a feedback module, configured to feed the segmented text back to the user after the decision module 605 obtains the segmentation result of the text to be segmented, or to feed the text back to the user after break marks have been added at the break positions once the decision module 605 has obtained the segmentation result.
Text punctuate system provided in an embodiment of the present invention collects a small amount of text data and its corresponding voice number in advance According to building the long-term memory punctuate model based on text punctuate feature and acoustics punctuate feature;When making pauses in reading unpunctuated ancient writings to text, difference root According to punctuate text and its corresponding voice data extraction text punctuate feature and acoustics punctuate feature is treated, the punctuate spy of extraction is utilized The long-term memory punctuate model built in advance of seeking peace carries out punctuate judgement by word.Due to taking full advantage of text punctuate information and phase The acoustics punctuate information for the voice data answered, as the input of long-term memory punctuate model, so that memory is every simultaneously The historical information and Future Information of a word, and the length remembered does not limit, and the note can be specifically determined according to application demand Recall length, be effectively guaranteed the accuracy to text punctuate prediction.Further, the system of the embodiment of the present invention can also incite somebody to action Text after punctuate is shown to user or is shown to user after the text is added punctuate, improves user and reads text Experience.
Each embodiment in this specification is described by the way of progressive, identical similar portion between each embodiment Point just to refer each other, and the highlights of each of the examples are difference from other examples.It is real especially for system For applying example, since it is substantially similar to embodiment of the method, so describing fairly simple, related part is referring to embodiment of the method Part explanation.System embodiment described above is only schematical, wherein described be used as separating component explanation Unit may or may not be physically separate, the component shown as unit may or may not be Physical location, you can be located at a place or can also be distributed in multiple network element.It can be according to the actual needs Some or all of module therein is selected to realize the purpose of this embodiment scheme.Those of ordinary skill in the art are not paying In the case of creative work, you can to understand and implement.
The embodiments of the present invention have been described in detail above. Specific examples are used herein to illustrate the method and system of the present invention, and the description of the above embodiments is only intended to help understand them. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementations and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (16)

  1. A text sentence-breaking method, characterized by comprising:
    collecting in advance a small amount of text data and its corresponding speech data, and building a long-term memory sentence-break model based on text break features and acoustic break features;
    when a text is to be broken into sentences, obtaining the text to be broken and its corresponding speech data;
    extracting text break features and acoustic break features from the text to be broken and from its corresponding speech data, respectively;
    breaking the text to be broken into sentences according to the extracted text break features, the acoustic break features and the long-term memory sentence-break model.
  2. The method according to claim 1, characterized in that collecting a small amount of text data and its corresponding speech data and building a long-term memory sentence-break model based on text break features and acoustic break features comprises:
    collecting a small amount of text data and its corresponding speech data;
    taking the text data as training data, and labelling the break labels of the training data;
    extracting text break features from the training data, and extracting acoustic break features from the speech data corresponding to the training data;
    taking the extracted text break features and acoustic break features as training features, and building the long-term memory sentence-break model using the training features and the break labels of the training data.
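A minimal training sketch for a model of this kind is given below, assuming the fused text and acoustic break features are 40-dimensional per word and using a bidirectional LSTM with a sigmoid output trained against the break labels; the network size, optimiser, loss and number of steps are illustrative choices not fixed by the claim.

```python
import torch
import torch.nn as nn

class BreakModel(nn.Module):
    """Illustrative memory-based break model: bidirectional LSTM over the
    fused per-word features, sigmoid output = break probability per word."""
    def __init__(self, feat_dim=40, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, x):                  # x: (batch, words, feat_dim)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.out(h)).squeeze(-1)

model = BreakModel()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

# Toy batch: 2 utterances, 5 words each, 40-dim fused features, 0/1 break labels.
features = torch.randn(2, 5, 40)
labels = torch.tensor([[0., 0., 1., 0., 1.], [0., 1., 0., 0., 1.]])

for _ in range(10):                        # a few illustrative training steps
    optimiser.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimiser.step()
```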
  3. The method according to claim 2, characterized in that the method further comprises: collecting a large amount of plain text data in advance and building a text sentence-break model, the text sentence-break model comprising an input layer, one or more hidden layers and an output layer;
    extracting text break features from the training data comprises:
    segmenting the training data into words, and computing the word vector of each word;
    feeding the word vector of each word into the text sentence-break model in turn, and obtaining the text break feature of each word from the output of the last hidden layer of the text sentence-break model.
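The feature-extraction step of this claim can be pictured with the sketch below: word vectors are pushed through a (normally pre-trained) text break model and the output of its last hidden layer is kept as the per-word text break feature. The two hidden layers, the embedding size and the hidden size are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TextBreakModel(nn.Module):
    """Illustrative text break model: hidden layers plus an output layer;
    only the last hidden layer's output is reused as the text break feature."""
    def __init__(self, emb_dim=100, hidden=64):
        super().__init__()
        self.hidden_layers = nn.LSTM(emb_dim, hidden, num_layers=2,
                                     batch_first=True, bidirectional=True)
        self.output_layer = nn.Linear(2 * hidden, 2)   # break / no break

    def last_hidden(self, word_vectors):    # (1, num_words, emb_dim)
        h, _ = self.hidden_layers(word_vectors)
        return h                             # (1, num_words, 2 * hidden)

text_model = TextBreakModel()                # would normally be pre-trained
word_vectors = torch.randn(1, 6, 100)        # word vectors of a 6-word sentence
text_break_features = text_model.last_hidden(word_vectors)
print(text_break_features.shape)             # torch.Size([1, 6, 128])
```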
  4. The method according to claim 3, characterized in that collecting a large amount of plain text data and building a text sentence-break model comprises:
    collecting a large amount of plain text data;
    labelling the break labels of the text data according to the punctuation positions in the plain text data;
    segmenting the plain text data into words, and computing the word vector of each word;
    building the text sentence-break model from the word vector of each word in the plain text data and the break labels.
  5. The method according to claim 3, characterized in that the text sentence-break model is a bidirectional LSTM structure or a bidirectional RNN structure, and the input of the hidden node of each word consists of the outputs of the hidden nodes of the previous word and the following word together with the output of the lower layer for the current word.
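The recurrence described here is the standard bidirectional pattern; the hand-rolled sketch below shows how each word's hidden state combines its lower-layer input with the previous word's state (left-to-right pass) and the following word's state (right-to-left pass). The tanh cell, weight shapes and random inputs are illustrative simplifications of an LSTM cell.

```python
import numpy as np

def bidirectional_layer(inputs, Wf, Uf, Wb, Ub):
    """inputs: (num_words, in_dim) lower-layer outputs for each word."""
    T, H = inputs.shape[0], Uf.shape[0]
    fwd, bwd = np.zeros((T, H)), np.zeros((T, H))
    for t in range(T):                       # uses the previous word's state
        prev = fwd[t - 1] if t > 0 else np.zeros(H)
        fwd[t] = np.tanh(inputs[t] @ Wf + prev @ Uf)
    for t in reversed(range(T)):             # uses the following word's state
        nxt = bwd[t + 1] if t < T - 1 else np.zeros(H)
        bwd[t] = np.tanh(inputs[t] @ Wb + nxt @ Ub)
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 100))            # six 100-dim word vectors
out = bidirectional_layer(x,
                          rng.standard_normal((100, 64)),
                          rng.standard_normal((64, 64)),
                          rng.standard_normal((100, 64)),
                          rng.standard_normal((64, 64)))
print(out.shape)                             # (6, 128)
```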
  6. The method according to claim 2, characterized in that extracting acoustic break features from the speech data corresponding to the training data comprises:
    aligning the speech data corresponding to the training data;
    extracting the acoustic break features from the aligned training data and its speech data, the acoustic break features comprising any one or more of the following: the pause duration between words, the word-final fundamental-frequency trend, the average phoneme duration within a word, the average vowel-phoneme duration within a word, the speaker's historical average speaking rate, the word-final energy trend, and the word tone.
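Two of these acoustic features can be computed from a word-level time alignment as in the sketch below; the alignment representation (per-word start/end times in seconds and a phoneme count) is an assumed format for illustration, not one specified in the patent.

```python
def pause_after(alignment, i):
    """Silence between the end of word i and the start of word i + 1."""
    if i + 1 >= len(alignment):
        return 0.0
    return max(0.0, alignment[i + 1]["start"] - alignment[i]["end"])

def mean_phoneme_duration(alignment, i):
    """Average phoneme duration inside word i."""
    word = alignment[i]
    return (word["end"] - word["start"]) / max(1, word["num_phonemes"])

alignment = [
    {"word": "今天", "start": 0.00, "end": 0.42, "num_phonemes": 4},
    {"word": "天气", "start": 0.45, "end": 0.90, "num_phonemes": 4},
    {"word": "不错", "start": 1.30, "end": 1.75, "num_phonemes": 4},
]
print(pause_after(alignment, 1), mean_phoneme_duration(alignment, 1))
```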
  7. The method according to claim 1, characterized in that the long-term memory sentence-break model comprises an input layer, a normalization layer, one or more hidden layers and an output layer; the normalization layer is used to normalize the different break features fed in by the input layer; and the input of the hidden node of each word consists of the outputs of the hidden nodes of the previous word and the following word together with the output of the lower layer for the current word.
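The claim does not say how the normalization layer normalizes its inputs; a common choice, shown below purely as an assumption, is per-dimension z-scoring so that text and acoustic break features with very different scales can be fed to the same hidden layers.

```python
import numpy as np

def normalise(features, eps=1e-8):
    """Per-dimension z-scoring of the stacked per-word break features."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)

feats = np.array([[0.12, 220.0, 3.0],      # e.g. pause (s), F0 (Hz), tone id
                  [0.03, 180.0, 1.0],
                  [0.45, 150.0, 4.0]])
print(normalise(feats))
```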
  8. The method according to any one of claims 1-7, characterized in that the method further comprises:
    after the sentence-break result of the text to be broken is obtained, feeding the broken text back to the user; or
    after the sentence-break result of the text to be broken is obtained, adding punctuation marks at the positions where breaks are needed and feeding the resulting text back to the user.
  9. 9. a kind of text punctuate system, which is characterized in that including:
    Long-term memory punctuate model construction module, for collecting a small amount of text data and its corresponding voice data in advance, structure Long-term memory punctuate model based on text punctuate feature and acoustics punctuate feature;
    Receiving module treats punctuate text and its corresponding voice data for when making pauses in reading unpunctuated ancient writings to text, obtaining;
    Text punctuate characteristic extracting module, for treating punctuate Text Feature Extraction text punctuate feature according to;
    Acoustics punctuate characteristic extracting module, for treating that the corresponding voice data extraction acoustics punctuate of punctuate text is special according to Sign;
    Judgment module, it is right for according to the text punctuate feature, acoustics punctuate feature and the long-term memory punctuate model It is described to treat that punctuate text is made pauses in reading unpunctuated ancient writings.
  10. The system according to claim 9, characterized in that the long-term memory sentence-break model building module comprises:
    a first data collection unit, configured to collect a small amount of text data and its corresponding speech data;
    a first labelling unit, configured to take the text data as training data and label the break labels of the training data;
    a first feature extraction unit, configured to extract text break features from the training data;
    a second feature extraction unit, configured to extract acoustic break features from the speech data corresponding to the training data;
    a first training unit, configured to take the text break features extracted by the first feature extraction unit and the acoustic break features extracted by the second feature extraction unit as training features, and to build the long-term memory sentence-break model using the training features and the break labels of the training data.
  11. The system according to claim 10, characterized in that the system further comprises:
    a text sentence-break model building module, configured to collect a large amount of plain text data in advance and build a text sentence-break model, the text sentence-break model comprising an input layer, one or more hidden layers and an output layer;
    the first feature extraction unit comprises:
    a first word-segmentation subunit, configured to segment the training data into words and compute the word vector of each word;
    a first extraction subunit, configured to feed the word vector of each word obtained by the word-segmentation subunit into the text sentence-break model in turn, and to obtain the text break feature of each word from the output of the last hidden layer of the text sentence-break model.
  12. The system according to claim 11, characterized in that the text sentence-break model building module comprises:
    a second data collection unit, configured to collect a large amount of plain text data;
    a second labelling unit, configured to label the break labels of the text data according to the punctuation positions in the plain text data;
    a second word-segmentation unit, configured to segment the plain text data into words and compute the word vector of each word;
    a second training unit, configured to build the text sentence-break model from the word vector of each word in the plain text data and the break labels.
  13. The system according to claim 11, characterized in that the text sentence-break model is a bidirectional LSTM structure or a bidirectional RNN structure, and the input of the hidden node of each word consists of the outputs of the hidden nodes of the previous word and the following word together with the output of the lower layer for the current word.
  14. The system according to claim 10, characterized in that the second feature extraction unit comprises:
    an alignment subunit, configured to align the speech data corresponding to the training data;
    a second extraction subunit, configured to extract the acoustic break features from the aligned training data and its speech data, the acoustic break features comprising any one or more of the following: the pause duration between words, the word-final fundamental-frequency trend, the average phoneme duration within a word, the average vowel-phoneme duration within a word, the speaker's historical average speaking rate, the word-final energy trend, and the word tone.
  15. The system according to claim 9, characterized in that the long-term memory sentence-break model comprises an input layer, a normalization layer, one or more hidden layers and an output layer; the normalization layer is used to normalize the different break features fed in by the input layer; and the input of the hidden node of each word consists of the outputs of the hidden nodes of the previous word and the following word together with the output of the lower layer for the current word.
  16. The system according to any one of claims 9-15, characterized in that the system further comprises:
    a feedback module, configured to feed the broken text back to the user after the judgment module obtains the sentence-break result of the text to be broken, or to add punctuation marks at the positions where breaks are needed and feed the resulting text back to the user after the judgment module obtains the sentence-break result of the text to be broken.
CN201610993731.XA 2016-11-11 2016-11-11 Text sentence-breaking method and system Active CN108090038B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610993731.XA CN108090038B (en) 2016-11-11 2016-11-11 Text sentence-breaking method and system

Publications (2)

Publication Number Publication Date
CN108090038A (en) 2018-05-29
CN108090038B CN108090038B (en) 2022-01-14

Family

ID=62168015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610993731.XA Active CN108090038B (en) 2016-11-11 2016-11-11 Text sentence-breaking method and system

Country Status (1)

Country Link
CN (1) CN108090038B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1016985A2 (en) * 1998-12-30 2000-07-05 Xerox Corporation Method and system for topic based cross indexing of text and audio
CN105185374A (en) * 2015-09-11 2015-12-23 百度在线网络技术(北京)有限公司 Prosodic hierarchy annotation method and device
CN105427869A (en) * 2015-11-02 2016-03-23 北京大学 Session emotion autoanalysis method based on depth learning
CN105609107A (en) * 2015-12-23 2016-05-25 北京奇虎科技有限公司 Text processing method and device based on voice identification
CN105632484A (en) * 2016-02-19 2016-06-01 上海语知义信息技术有限公司 Voice synthesis database pause information automatic marking method and system

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
CATHERINE LAI ET AL.: "Automatic Paragraph Segmentation with Lexical and Prosodic Features", 《INTERSPEECH 2016》 *
CATHERINE LAI ET AL.: "Automatic Paragraph Segmentation with Lexical and Prosodic Features", 《RECERCAT》 *
CATHERINE LAI ET AL.: "Automatic Paragraph Segmentation with Lexical and Prosodic Features", 《RESEARCHGATE》 *
CATHERINE LAI ET AL.: "Automatic Paragraph Segmentation with Lexical and Prosodic Features", 《THE UNIVERSITY OF EDINBURGH》 *
CATHERINE LAI ET AL.: "Automatic Paragraph Segmentation with Lexical and Prosodic Features", 《WAYBACK MACHINE》 *
CHENGLIN XU ET AL.: "A Deep Neural Network Approach for Sentence Boundary Detection in Broadcast News", 《INTERSPEECH 2014》 *
TATSUYA KAWAHARA: "Re: Confirm the publication date of conference papers", 《INTERSPEECH》 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110364145A (en) * 2018-08-02 2019-10-22 腾讯科技(深圳)有限公司 A kind of method and device of the method for speech recognition, voice punctuate
CN109741806A (en) * 2019-01-07 2019-05-10 北京推想科技有限公司 A kind of Medical imaging diagnostic reports auxiliary generating method and its device
JP2022010403A (en) * 2019-02-08 2022-01-14 ヤフー株式会社 Information processing device, information processing method, and information processing program
JP2020129306A (en) * 2019-02-08 2020-08-27 ヤフー株式会社 Information processing device, information processing method, and information processing program
JP7258988B2 (en) 2019-02-08 2023-04-17 ヤフー株式会社 Information processing device, information processing method and information processing program
CN111429880A (en) * 2020-03-04 2020-07-17 苏州驰声信息科技有限公司 Method, system, device and medium for cutting paragraph audio
CN111261162B (en) * 2020-03-09 2023-04-18 北京达佳互联信息技术有限公司 Speech recognition method, speech recognition apparatus, and storage medium
CN111261162A (en) * 2020-03-09 2020-06-09 北京达佳互联信息技术有限公司 Speech recognition method, speech recognition apparatus, and storage medium
CN111652002A (en) * 2020-06-16 2020-09-11 北京字节跳动网络技术有限公司 Text division method, device, equipment and computer readable medium
CN111652002B (en) * 2020-06-16 2023-04-18 抖音视界有限公司 Text division method, device, equipment and computer readable medium
CN112002328A (en) * 2020-08-10 2020-11-27 中央广播电视总台 Subtitle generating method and device, computer storage medium and electronic equipment
CN112002328B (en) * 2020-08-10 2024-04-16 中央广播电视总台 Subtitle generation method and device, computer storage medium and electronic equipment
CN112183084A (en) * 2020-09-07 2021-01-05 北京达佳互联信息技术有限公司 Audio and video data processing method, device and equipment
CN112183084B (en) * 2020-09-07 2024-03-15 北京达佳互联信息技术有限公司 Audio and video data processing method, device and equipment
US11763099B1 (en) 2022-04-27 2023-09-19 VoyagerX, Inc. Providing translated subtitle for video content
US11770590B1 (en) 2022-04-27 2023-09-26 VoyagerX, Inc. Providing subtitle for video content in spoken language
US11947924B2 (en) 2022-04-27 2024-04-02 VoyagerX, Inc. Providing translated subtitle for video content

Also Published As

Publication number Publication date
CN108090038B (en) 2022-01-14

Similar Documents

Publication Publication Date Title
CN108090038A (en) Text punctuate method and system
CN109065032B (en) External corpus speech recognition method based on deep convolutional neural network
CN106156003B (en) A kind of question sentence understanding method in question answering system
CN105427858B (en) Realize the method and system that voice is classified automatically
CN107039034B (en) Rhythm prediction method and system
CN106297776B (en) A kind of voice keyword retrieval method based on audio template
CN107818164A (en) A kind of intelligent answer method and its system
CN105702250B (en) Speech recognition method and device
CN107346340A (en) A kind of user view recognition methods and system
CN108711421A (en) A kind of voice recognition acoustic model method for building up and device and electronic equipment
CN108549658A (en) A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree
CN109902159A (en) A kind of intelligent O&M statement similarity matching process based on natural language processing
CN108122035B (en) End-to-end modeling method and system
CN110491393B (en) Training method of voiceprint representation model and related device
CN107797987B (en) Bi-LSTM-CNN-based mixed corpus named entity identification method
CN108197294A (en) A kind of text automatic generation method based on deep learning
CN110148408A (en) A kind of Chinese speech recognition method based on depth residual error
CN110265063B (en) Lie detection method based on fixed duration speech emotion recognition sequence analysis
CN110188175A (en) A kind of question and answer based on BiLSTM-CRF model are to abstracting method, system and storage medium
CN110010136A (en) The training and text analyzing method, apparatus, medium and equipment of prosody prediction model
CN108389575A (en) Audio data recognition methods and system
CN107977353A (en) A kind of mixing language material name entity recognition method based on LSTM-CNN
CN105810191A (en) Prosodic information-combined Chinese dialect identification method
CN110473571A (en) Emotion identification method and device based on short video speech
CN107797988A (en) A kind of mixing language material name entity recognition method based on Bi LSTM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant