CN109887499A

CN109887499A - Automatic speech sentence segmentation algorithm based on a recurrent neural network

Info

Publication number: CN109887499A
Application number: CN201910289742.3A
Authority: CN
Inventors: 张亚飞; 张卫山
Original assignee: China University of Petroleum East China
Current assignee: China University of Petroleum East China
Priority date: 2019-04-11
Filing date: 2019-04-11
Publication date: 2019-06-14

Abstract

The invention proposes an automatic speech sentence segmentation algorithm based on a recurrent neural network. It uses pattern mining and analysis with a long short-term memory (LSTM) network, combining speech information with text information, to segment speech into sentences automatically. The algorithm has a training phase and an operation phase. In the training phase, a corresponding data set is collected, namely audio files and their matching text files. Speech recognition is then used to convert the full stops in the text files into sentence-break labels, and the LSTM network is trained to optimize its parameters. In the operation phase, the user simply inputs an audio file, the LSTM network outputs the corresponding sentence-break points, and a corresponding program cuts the audio at those points. The result is automatic sentence segmentation of speech.

Description

Automatic speech sentence segmentation algorithm based on a recurrent neural network

Technical field

The present invention relates to the fields of the Internet and deep learning, and in particular to an automatic speech sentence segmentation algorithm based on a recurrent neural network.

Background art

An automatic speech sentence segmentation algorithm based on a recurrent neural network performs pattern mining and analysis with an LSTM network and combines speech information with text information to segment speech automatically. The existing technologies closest to the present invention are the following:

(1) Methods based on pause duration and minimum energy: these methods rely on the fact that speakers pause between sentences. They locate sentence pauses by applying a minimum pause duration and a minimum energy threshold. Such methods are simple to operate. However, several parameters must be configured manually, so an optimal setting is hard to find.
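To make the baseline in (1) concrete, the sketch below computes short-time frame energy. It treats frames below an energy threshold as silent and reports the centre of every silent run at least a minimum pause long as a sentence break. The function name and all default values (25 ms frames, 10 ms hop, energy threshold, 300 ms minimum pause) are hypothetical choices made for illustration. They are exactly the kind of hand-tuned parameters the text criticises, and the patent does not specify them.

```python
import numpy as np

def energy_pause_breaks(samples, sample_rate, frame_ms=25, hop_ms=10,
                        energy_threshold=1e-3, min_pause_ms=300):
    """Hypothetical baseline: return sample indices at the centre of long low-energy runs."""
    frame = int(sample_rate * frame_ms / 1000)   # frame length in samples
    hop = int(sample_rate * hop_ms / 1000)       # hop between frame starts
    n_frames = 1 + max(0, (len(samples) - frame) // hop)
    # Mean squared amplitude of each frame (short-time energy).
    energy = np.array([np.mean(samples[i * hop:i * hop + frame] ** 2)
                       for i in range(n_frames)])
    silent = energy < energy_threshold
    min_frames = int(np.ceil(min_pause_ms / hop_ms))  # pause length in frames
    breaks, start = [], None
    # A trailing False acts as a sentinel that closes a silent run at the end.
    for i, is_silent in enumerate(np.append(silent, False)):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            if i - start >= min_frames:
                centre_frame = (start + i - 1) // 2
                breaks.append(centre_frame * hop + frame // 2)
            start = None
    return breaks
```

For example, with 16 kHz audio, a 1 s tone followed by 0.5 s of silence and another 1 s tone yields a single break near sample 20000. Every threshold above must be re-tuned for each speaker and recording condition.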

(2) Methods based on a Gaussian mixture model and hidden Markov model (GMM-HMM): these methods combine a Gaussian mixture model with a hidden Markov model and model the acoustics through probabilistic inference. They output a transcription, which can serve as a basis for sentence segmentation. Their advantage is a small model that is easy to port to embedded platforms. Their disadvantage is that they cannot make full use of contextual information.

(3) Methods based on a deep neural network and hidden Markov model (DNN-HMM): in these methods, a deep neural network learns latent representations. These are combined with the probabilistic inference of the Markov model to output a transcription that is used for sentence segmentation. A deep neural network can learn deep nonlinear feature transformations, but it cannot use historical information to assist the current segmentation decision.

Of these, the pause-duration and minimum-energy method requires manual configuration of multiple parameters and is inefficient. The GMM-HMM and DNN-HMM methods both use only the information currently being processed and cannot exploit context or history. They therefore cannot segment effectively at points where the decision depends on context. By contrast, the recurrent-neural-network-based algorithm of the present invention makes effective use of historical and contextual information. Because historical information persists in the network state, the algorithm achieves better results when performing sentence segmentation.

Summary of the invention

To overcome the shortcomings and defects of the prior art, the invention proposes an automatic speech sentence segmentation algorithm based on a recurrent neural network. It performs pattern mining and analysis with an LSTM network and combines speech information with text information to segment speech automatically.

The technical solution of the present invention is as follows:

An automatic speech sentence segmentation algorithm based on a recurrent neural network, characterized in that it comprises an LSTM network, a speech recognition module, a text label conversion module, and a loss function evaluation module, and includes the following steps:

Step (1): In the LSTM module, we exploit the LSTM network's ability to handle dependencies between earlier and later elements of sequence data. The network maps the sequence into a high-dimensional space, converting it into a stack of speech embedding vectors that serve as features.
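Step (1) can be pictured as a single-layer LSTM that reads acoustic feature frames one at a time and emits one embedding vector per frame. The sketch below is a minimal NumPy forward pass under the following assumptions. The class name `LSTMEncoder`, the randomly initialised weights, and the input and hidden sizes are hypothetical. The patent does not state the feature type, network depth, or dimensions, and training is omitted here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMEncoder:
    """Hypothetical single-layer LSTM (forward pass only) mapping frames to embeddings."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # Weights for the input, forget, candidate and output gates, stacked row-wise.
        self.W = rng.uniform(-scale, scale, (4 * hidden_dim, input_dim))
        self.U = rng.uniform(-scale, scale, (4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def __call__(self, frames):
        """frames: (T, input_dim) array -> embeddings: (T, hidden_dim) array."""
        H = self.hidden_dim
        h = np.zeros(H)  # hidden state carried from frame to frame
        c = np.zeros(H)  # cell state: the long-term memory
        outputs = []
        for x in frames:
            z = self.W @ x + self.U @ h + self.b
            i = sigmoid(z[:H])             # input gate
            f = sigmoid(z[H:2 * H])        # forget gate
            g = np.tanh(z[2 * H:3 * H])    # candidate cell update
            o = sigmoid(z[3 * H:])         # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)
```

For instance, `LSTMEncoder(40, 64)(frames)` turns a `(T, 40)` feature matrix into a `(T, 64)` stack of embeddings. Each output depends only on the current and earlier frames, which is the history dependence the text relies on.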

Step (2): In the speech recognition module, we convert the speech embedding vectors toward text, using a bidirectional LSTM to map between embedding vectors. The bidirectional LSTM makes full use of dependencies in both directions. Because the processed information from previous inputs is fed in as input to the next step, the network can retain useful information over long time spans.
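The bidirectional LSTM of step (2) can be sketched as two independent LSTM passes: one left to right and one right to left over the time-reversed input. The backward outputs are re-aligned to time order and concatenated with the forward outputs. This block is self-contained, so it repeats a compact LSTM step. The helper names `lstm_pass`, `make_params`, and `bilstm` and the random initialisation are hypothetical, not taken from the patent.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(frames, W, U, b):
    """Run one LSTM direction over frames (T, D); return hidden states (T, H)."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    out = []
    for x in frames:
        z = W @ x + U @ h + b
        i, f, g, o = z[:H], z[H:2 * H], z[2 * H:3 * H], z[3 * H:]
        c = _sigmoid(f) * c + _sigmoid(i) * np.tanh(g)
        h = _sigmoid(o) * np.tanh(c)
        out.append(h)
    return np.stack(out)

def make_params(input_dim, hidden_dim, rng):
    """Hypothetical random initialisation for one direction: (W, U, b)."""
    s = 1.0 / np.sqrt(hidden_dim)
    return (rng.uniform(-s, s, (4 * hidden_dim, input_dim)),
            rng.uniform(-s, s, (4 * hidden_dim, hidden_dim)),
            np.zeros(4 * hidden_dim))

def bilstm(frames, fwd_params, bwd_params):
    """Concatenate a left-to-right pass and a right-to-left pass: (T, 2H)."""
    forward = lstm_pass(frames, *fwd_params)
    # Reverse time, run the backward LSTM, then reverse again to re-align.
    backward = lstm_pass(frames[::-1], *bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=1)
```

The output at time t therefore sees the past through the forward half and the future through the backward half. This is what lets a break decision use both earlier and later context.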

Step (3): Text label conversion module. The method is mainly intended for automatic sentence segmentation of speech, so the key information in the text is its sentence-break information, namely the full stops. The text label conversion module therefore extracts the sentence-break information to serve as labels for supervised training.
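A minimal sketch of step (3), assuming a whitespace-tokenised English transcript: each word is labelled 1 if a full stop follows it and 0 otherwise, and the full stops are removed from the tokens. The function name `fullstop_labels` is hypothetical. Other punctuation such as commas is left attached to its word. The patent does not describe how these word-level labels are aligned to audio frames or to the recognizer's output, so that alignment is not shown.

```python
FULL_STOPS = ".\u3002"  # ASCII full stop and the CJK full stop (U+3002)

def fullstop_labels(text):
    """Hypothetical helper: split text into words and mark sentence-final words with 1."""
    tokens, labels = [], []
    for raw in text.split():
        word = raw.rstrip(FULL_STOPS)
        if not word:
            # A stray full stop separated by whitespace closes the previous word.
            if labels:
                labels[-1] = 1
            continue
        tokens.append(word)
        labels.append(1 if raw != word else 0)
    return tokens, labels
```

For example, `fullstop_labels("I like tea. She likes coffee.")` gives the tokens `['I', 'like', 'tea', 'She', 'likes', 'coffee']` with labels `[0, 0, 1, 0, 0, 1]`.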

Step (4): In the loss function evaluation module, we compute the similarity between the label information and the labels of the corresponding transcription output by the speech conversion module. This yields the corresponding loss value, which is then used to update the parameters of the whole network.
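The patent calls the step (4) comparison a "similarity calculation" without naming the loss. The sketch below substitutes binary cross-entropy between predicted break probabilities and the 0/1 full-stop labels, which is a common choice for this kind of per-position tagging. It is an assumption, not the patent's stated method. For brevity, only a logistic output layer over fixed features is updated by gradient descent. The function names, learning rate, and step count are hypothetical.

```python
import numpy as np

def bce_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between break probabilities and 0/1 labels."""
    p = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

def train_output_layer(features, labels, lr=0.5, steps=200):
    """Gradient descent on a logistic break classifier over fixed features (T, D)."""
    w = np.zeros(features.shape[1])
    b = 0.0
    history = []
    for _ in range(steps):
        probs = 1 / (1 + np.exp(-(features @ w + b)))
        history.append(bce_loss(probs, labels))
        grad = probs - labels  # derivative of BCE w.r.t. the logit for a sigmoid output
        w -= lr * features.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b, history
```

With all weights at zero, every probability is 0.5 and the initial loss is exactly ln 2. The recorded loss should then fall as the classifier learns the labels. In the full method, the same gradient would also be propagated back through the LSTM layers.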

Beneficial effects of the present invention:

(1) The LSTM network establishes dependencies between earlier and later parts of the audio sequence and learns contextual information effectively for feature mapping.

(2) The speech recognition module transcribes the audio according to certain phonetic rules. The sentence breaks in the transcription then allow the segmentation performance of the whole model to be evaluated effectively.

(3) Embedding the automatic sentence segmentation module into other platforms can improve platform quality and effectively raise the learning efficiency of English learners, who can then practice with English material of their own choosing.

Brief description of the drawings

To explain the embodiments of the invention and the technical solutions of the prior art more clearly, the drawings needed in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention. A person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is the overall flow chart of the automatic speech sentence segmentation algorithm based on a recurrent neural network according to the present invention.

Specific embodiment

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art obtains from them without creative effort fall within the protection scope of the present invention.

As shown in Fig. 1, the automatic speech sentence segmentation algorithm based on a recurrent neural network of the invention is characterized in that it comprises an LSTM network, a speech recognition module, a text label conversion module, and a loss function evaluation module.

The detailed process of the automatic speech sentence segmentation algorithm based on a recurrent neural network is described below with reference to Fig. 1:

In the automatic speech sentence segmentation algorithm of the invention, the LSTM network establishes dependencies between earlier and later parts of the audio sequence and learns contextual information effectively for feature mapping. The speech recognition module transcribes the audio according to certain phonetic rules, and the sentence breaks in the transcription allow the segmentation performance of the whole model to be evaluated effectively. Embedding the automatic sentence segmentation module into other platforms can improve platform quality and effectively raise the learning efficiency of English learners, who can then practice with English material of their own choosing.
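The operation phase says only that the network outputs sentence-break points and that "a corresponding program" cuts the audio. The sketch below is one hypothetical way to do that cutting, assuming the network emits one break probability per frame. Runs of consecutive frames above a threshold are collapsed to their peak frame, converted to a sample index via the hop size, and the waveform is split there. The function names, the 0.5 threshold, and the per-frame output format are all assumptions.

```python
import numpy as np

def cut_points(break_probs, hop_samples, threshold=0.5):
    """Turn per-frame break probabilities into sample indices, one per run above threshold."""
    above = np.append(break_probs >= threshold, False)  # sentinel closes a final run
    points, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            # Keep only the most confident frame of each run.
            peak = start + int(np.argmax(break_probs[start:i]))
            points.append(peak * hop_samples)
            start = None
    return points

def split_audio(samples, points):
    """Split a 1-D sample array at the given sorted indices into sentence segments."""
    return [seg for seg in np.split(samples, points) if len(seg) > 0]
```

For example, frame probabilities `[0.1, 0.2, 0.9, 0.7, 0.1, 0.1, 0.8, 0.1]` with a 100-sample hop give cut points `[200, 600]`. An 800-sample signal is then split into segments of 200, 400, and 200 samples.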

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its protection scope.

Claims

1. An automatic speech sentence segmentation algorithm based on a recurrent neural network, characterized in that it comprises an LSTM network, a speech recognition module, a text label conversion module, and a loss function evaluation module, and includes the following steps:

Step (1): In the LSTM module, we exploit the LSTM network's ability to handle dependencies between earlier and later elements of sequence data. The network maps the sequence into a high-dimensional space, converting it into a stack of speech embedding vectors that serve as features.

Step (2): In the speech recognition module, we convert the speech embedding vectors toward text, using a bidirectional LSTM to map between embedding vectors. The bidirectional LSTM makes full use of dependencies in both directions. Because the processed information from previous inputs is fed in as input to the next step, the network can retain useful information over long time spans.

Step (3): Text label conversion module. The method is mainly intended for automatic sentence segmentation of speech, so the key information in the text is its sentence-break information, namely the full stops. The text label conversion module therefore extracts the sentence-break information to serve as labels for supervised training.

Step (4): In the loss function evaluation module, we compute the similarity between the label information and the labels of the corresponding transcription output by the speech conversion module. This yields the corresponding loss value, which is then used to update the parameters of the whole network.