CN110377889B - Text editing method and system based on feedforward sequence memory neural network - Google Patents
- Publication number
- CN110377889B (application CN201910487145.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a text editing method based on a feedforward sequence memory neural network, which belongs to the technical field of speech signal processing and comprises the following steps: acquiring an original text to be edited; receiving editing voice data; performing speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command; and performing semantic understanding on the editing command and executing the editing command. In this technical scheme, the improved feedforward sequence memory neural network is used for speech recognition, making text editing accurate and efficient.
Description
Technical Field
The invention belongs to the technical field of voice signal processing, and particularly relates to a text editing method and system based on a feedforward sequence memory neural network.
Background
With the popularity of mobile phones, people receive a large amount of text every day on portable devices such as mobile phones and tablet computers, for example short messages, messages pushed by instant messaging or other software, web page content, and text news. When a user wants to edit text content of interest, the cursor must first be positioned at that content, and the selected text is then subjected to follow-up operations such as inserting text at the cursor position or replacing the selected text; this editing process is cumbersome and inconvenient. In the prior art, voice data recorded by the user is received, and the corresponding editing operation is then performed on the editing object according to that voice data. The user can thus select the editing object in the text directly and quickly, without a complex text selection operation, and can edit it directly through voice input, which simplifies the text editing process. However, the current approach operates directly on the received voice data without any speech processing, and under strong far-field and noise interference the performance of the speech recognition system is not ideal, so text editing becomes inaccurate.
Disclosure of Invention
To remedy the defects of the prior art, the invention aims to provide a text editing method based on a feedforward sequence memory neural network, which uses an improved feedforward sequence memory neural network for speech recognition so that text editing is more accurate and efficient.
In order to solve the technical problems, the invention adopts the following technical scheme:
In one aspect, the invention provides a text editing method based on a feedforward sequence memory neural network, comprising the following specific steps:
s1: acquiring an original text to be edited;
s2: receiving editing voice data;
s3: performing speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command;
s4: performing semantic understanding on the editing command and executing the editing command.
Further preferably, in the improved feedforward sequence memory neural network, a low-dimensional linear projection layer is inserted between the hidden layers of a feedforward fully connected neural network, a memory module is attached to the linear projection layer, and skip connections are added between adjacent memory modules so that the output of a lower-layer memory module can be added directly to a higher-layer memory module.
Further preferably, the memory module is a tap-delay structure that encodes the hidden-layer outputs at the current and previous moments through a set of coefficients into a fixed-size representation.
Further preferably, the memory module uses scalar-based or vector-based encoding.
Further preferably, the encoding of the memory module introduces a stride factor.
In another aspect, the invention also provides a text editing system based on a feedforward sequence memory neural network, comprising:
an acquisition unit configured to acquire an original text to be edited;
a receiving unit configured to receive editing voice data;
a recognition unit configured to perform speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command;
and an output unit configured to perform semantic understanding on the editing command, execute the editing command, and output the edited text.
In another aspect, the present invention also provides an apparatus, including:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform any of the text editing methods based on a feedforward sequence memory neural network according to the embodiments of the invention.
In another aspect, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the text editing methods based on a feedforward sequence memory neural network according to the embodiments of the invention.
Compared with the prior art, the invention has the beneficial effects that:
according to the text editing method based on the feedforward sequence memory neural network, which is disclosed by the invention, the original text to be edited is obtained, the voice data input by the user is received, and the corresponding editing operation is executed on the editing object according to the voice data, so that the user can directly and rapidly select the editing object in the text without complex text selection operation when editing the text, and the user can directly edit the editing object through voice input, so that the text editing process is simplified. In addition, the voice recognition is carried out on the edited voice data by adopting a feedforward sequence memory neural network based on improvement, so that the text editing is more accurate and efficient.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings, in which:
FIG. 1 is a schematic flow diagram of one embodiment of the present invention;
FIG. 2 is a block diagram of an improved feedforward sequence memory neural network.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
As shown in Fig. 1, an embodiment of the present invention provides a text editing method based on a feedforward sequence memory neural network, which specifically comprises the steps of:
s1: acquiring an original text to be edited;
s2: receiving editing voice data;
s3: performing speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command;
s4: performing semantic understanding on the editing command and executing the editing command.
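The steps S1 to S4 can be sketched as a small pipeline. This is an illustrative sketch only: `recognize` and `understand` stand in for the improved feedforward sequence memory recognizer and the semantic-understanding module, which the patent does not specify as code, and the command set (replace, insert_after, delete) is an assumed example.

```python
def edit_text_by_voice(original_text, audio, recognize, understand):
    """Apply a spoken editing command to a text (steps S1-S4).

    recognize:  speech recognizer, audio -> command text (step S3)
    understand: semantic parser, command text -> (action, target, payload) (step S4)
    """
    command_text = recognize(audio)                     # S3: speech -> editing command
    action, target, payload = understand(command_text)  # S4: semantic understanding
    if action == "replace":
        return original_text.replace(target, payload, 1)
    if action == "insert_after":
        idx = original_text.find(target) + len(target)
        return original_text[:idx] + payload + original_text[idx:]
    if action == "delete":
        return original_text.replace(target, "", 1)
    return original_text  # unrecognized command: leave the text unchanged
```

For example, with a recognizer that transcribes the utterance as "replace cat with dog" and a parser that yields ("replace", "cat", "dog"), the text "the cat sat" becomes "the dog sat".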
In the improved feedforward sequence memory neural network, a low-dimensional linear projection layer is inserted between the hidden layers of a feedforward fully connected neural network, a memory module is attached to the linear projection layer, and skip connections are added between adjacent memory modules so that the output of a lower-layer memory module can be added directly to a higher-layer memory module.
The memory module is a tap-delay structure: the hidden-layer outputs at the current and previous moments are encoded through a set of coefficients into a fixed-size representation.
The memory module uses scalar-based or vector-based encoding.
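A minimal numerical sketch of one such layer, assuming vector-based coefficients and a direct identity skip connection from the lower memory module; the names `W_proj`, `a`, `c` and all shapes are illustrative assumptions, not the patent's notation:

```python
import numpy as np

def cfsmn_layer(h, W_proj, a, c, s1=1, s2=1, skip=None):
    """One cFSMN block: low-rank linear projection + tap-delay memory module.

    h:      (T, D) hidden-layer outputs over T time steps
    W_proj: (D, P) low-dimensional linear projection
    a:      (N1+1, P) look-back coefficients (current frame + N1 past frames)
    c:      (N2, P)   look-ahead coefficients (N2 future frames)
    skip:   optional (T, P) memory output of a lower layer, added directly
            (the jump connection between adjacent memory modules)
    """
    p = h @ W_proj                          # projection-layer output per time step
    T, _ = p.shape
    m = np.zeros_like(p)
    for t in range(T):
        for i in range(a.shape[0]):         # encode current and strided past frames
            if t - s1 * i >= 0:
                m[t] += a[i] * p[t - s1 * i]
        for j in range(1, c.shape[0] + 1):  # encode strided future frames
            if t + s2 * j < T:
                m[t] += c[j - 1] * p[t + s2 * j]
    return m if skip is None else m + skip  # direct addition from the lower module
```

Stacking calls and passing each layer's `m` as the next layer's `skip` mimics the accumulation of lower-layer memory outputs into higher-layer modules described above.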
The encoding of the memory module introduces a stride factor; the encoding of the $\ell$-th cFSMN layer is

$$\tilde{p}_t^{\ell} = \tilde{p}_t^{\ell-1} + p_t^{\ell} + \sum_{i=0}^{N_1} a_i^{\ell} \odot p_{t-s_1\cdot i}^{\ell} + \sum_{j=1}^{N_2} c_j^{\ell} \odot p_{t+s_2\cdot j}^{\ell} \tag{1}$$

where $\tilde{p}_t^{\ell-1}$ denotes the output of the memory module of the previous cFSMN layer, $p_t^{\ell}$ is the output of the linear projection layer, $a_i^{\ell}$ and $c_j^{\ell}$ are the encoding coefficients, and $s_1$ and $s_2$ are the strides for looking back into the history and forward into the future, respectively. If $s_1 = 2$, one input is taken every other moment when the history is encoded; with the same filter order a longer history can thus be seen, so long-term correlations can be modeled more effectively.
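A worked illustration of the look-back sum in formula (1) with scalar coefficients; the function name and the toy values are assumptions for illustration:

```python
def strided_history(p, t, a, s1):
    """Look-back part of the memory encoding: sum_i a[i] * p[t - s1*i].
    p: list of scalar projection outputs; frames before t = 0 are skipped."""
    return sum(coeff * p[t - s1 * i] for i, coeff in enumerate(a) if t - s1 * i >= 0)

p = [1.0, 2.0, 3.0, 4.0, 5.0]
a = [1.0, 1.0, 1.0]                    # filter order N1 = 2
print(strided_history(p, 4, a, s1=1))  # frames 4,3,2 -> 5+4+3 = 12.0
print(strided_history(p, 4, a, s1=2))  # frames 4,2,0 -> 5+3+1 = 9.0
```

With the same filter order, the stride s1 = 2 reaches back to frame 0 instead of frame 2, so a longer history is encoded without any extra coefficients.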
The performance of the improved feedforward sequence memory neural network (cFSMN) of this embodiment on the SWB database, together with the number of model parameters and the training time per iteration, is compared with that of the existing Sigmoid-DNN, LSTM, BLSTM, sFSMN, and vFSMN speech recognition systems in Table 1:
Table 1: Performance, model parameters, and training time per iteration of speech recognition systems on the SWB database
The experimental results show that models that can effectively model long-term correlations, such as LSTM and FSMN, achieve significant performance improvements over DNN. One LSTM training iteration takes 9.5 hours, while one BLSTM iteration takes 23.2 hours. This is because the NVIDIA Tesla K20 GPU has only 3 GB of memory, so BLSTM trained with BPTT can only use 16-sentence parallelism, whereas LSTM can use 64-sentence parallelism. The proposed vFSMN achieves a small performance improvement over BLSTM. The model structure of the vFSMN is simpler and its training is faster: one vFSMN iteration takes approximately 6.9 hours, about a 3-fold training acceleration compared with BLSTM. However, the vFSMN has more model parameters than BLSTM. The proposed cFSMN further reduces the overall model size to 74 MB, a 60% reduction in the number of parameters compared with BLSTM. More importantly, each iteration requires only 3.0 hours, approximately a 7-fold training acceleration compared with BLSTM. Furthermore, the cFSMN-based model obtains a word error rate of 12.5%, an absolute performance improvement of 0.9% over BLSTM.
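The speedup figures quoted above follow directly from the per-iteration times in Table 1 (a quick arithmetic check):

```python
# Hours per training iteration, as quoted from Table 1
blstm, vfsmn, cfsmn = 23.2, 6.9, 3.0
print(round(blstm / vfsmn, 1))  # about 3.4x: the roughly 3-fold vFSMN acceleration
print(round(blstm / cfsmn, 1))  # about 7.7x: the approximately 7-fold cFSMN acceleration
```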
The improved feedforward sequence memory neural network is denoted 216-N×[2048-P(N1,N2)]-M×2048-P-8911, where N and M represent the number of cFSMN layers and standard fully connected layers, respectively, P is the number of nodes of the low-rank linear projection layer, and N1 and N2 represent the look-back and look-ahead filter orders, respectively. Performance tests of different configurations of the improved feedforward sequence memory neural network (cFSMN) acoustic model on the FSH task are shown in Table 2:
Table 2: Performance on the FSH task of different configurations of cFSMN acoustic models that use shortcut connections to train deep layers
The experimental results exp1 and exp2 show that, with the memory-module encoding of formula (1), setting a large stride allows more context information to be seen and thus better performance to be obtained. From exp2 to exp6, the number of cFSMN layers is gradually increased and the model performance gradually improves. Finally, by adding the skip connections, a deep cFSMN comprising 12 cFSMN layers and 2 fully connected layers, denoted Deep-cFSMN, can be successfully trained, obtaining a word error rate of 9.3% on the Hub5e00 test set.
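As a rough tally of the dense-weight count implied by the notation above; the wiring (input, then N cFSMN blocks, then M fully connected layers, then projection, then output) is inferred rather than stated, biases and memory-filter coefficients are ignored, and the helper name is hypothetical:

```python
def approx_dense_params(input_dim, hidden, proj, n_cfsmn, n_fc, output_dim):
    """Count only the dense weight matrices of an assumed
    input - N x [hidden-proj] - M x hidden - proj - output topology."""
    total = input_dim * hidden                          # input -> first hidden layer
    total += n_cfsmn * (hidden * proj + proj * hidden)  # each block: hidden->proj->next hidden
    total += (n_fc - 1) * hidden * hidden               # remaining fully connected layers
    total += hidden * proj + proj * output_dim          # final low-rank projection -> output
    return total
```

The low-rank projection P is what keeps this total small: every 2048-to-2048 connection is factored through a P-dimensional bottleneck, which is the source of the parameter reduction reported above.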
In another aspect, the invention also provides a text editing system based on a feedforward sequence memory neural network, comprising:
an acquisition unit configured to acquire an original text to be edited;
a receiving unit configured to receive editing voice data;
a recognition unit configured to perform speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command;
and an output unit configured to perform semantic understanding on the editing command, execute the editing command, and output the edited text.
In another aspect, the present invention also provides an apparatus, including:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform any of the text editing methods based on a feedforward sequence memory neural network according to the embodiments of the invention.
In another aspect, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the text editing methods based on a feedforward sequence memory neural network according to the embodiments of the invention.
The foregoing description covers only the preferred embodiments of the present application and illustrates the principles of the technology employed. Persons skilled in the art will appreciate that the scope of the invention referred to in this application is not limited to the specific combinations of features described above, but also covers other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example (but not limited to) embodiments formed by substituting the features described above with technical features of similar function disclosed in this application.
Other technical features besides those described in the specification are known to those skilled in the art, and are not described herein in detail to highlight the innovative features of the present invention.
Claims (5)
1. A text editing method based on a feedforward sequence memory neural network, characterized by comprising the following specific steps:
s1: acquiring an original text to be edited;
s2: receiving editing voice data;
s3: performing speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command;
s4: performing semantic understanding on the editing command and executing the editing command;
wherein in the improved feedforward sequence memory neural network, a low-dimensional linear projection layer is inserted between the hidden layers of a feedforward fully connected neural network, a memory module is attached to the linear projection layer, and skip connections are added between adjacent memory modules so that the output of a lower-layer memory module can be added directly to a higher-layer memory module;
the memory module is a tap-delay structure, and the hidden-layer outputs at the current and previous moments are encoded through a set of coefficients into a fixed-size representation;
the memory module uses scalar-based or vector-based encoding.
2. The text editing method based on a feedforward sequence memory neural network according to claim 1, characterized in that the encoding of the memory module introduces a stride factor.
3. A text editing system based on a feedforward sequence memory neural network, comprising:
an acquisition unit configured to acquire an original text to be edited;
a receiving unit configured to receive editing voice data;
a recognition unit configured to perform speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command;
and an output unit configured to perform semantic understanding on the editing command, execute the editing command, and output the edited text;
wherein in the improved feedforward sequence memory neural network, a low-dimensional linear projection layer is inserted between the hidden layers of a feedforward fully connected neural network, a memory module is attached to the linear projection layer, and skip connections are added between adjacent memory modules so that the output of a lower-layer memory module can be added directly to a higher-layer memory module;
the memory module is a tap-delay structure, and the hidden-layer outputs at the current and previous moments are encoded through a set of coefficients into a fixed-size representation;
the memory module uses scalar-based or vector-based encoding.
4. An apparatus, the apparatus comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform a text editing method based on a feedforward sequence memory neural network of any of claims 1-2.
5. A computer readable storage medium storing a computer program which when executed by a processor implements a method of text editing based on a feedforward sequence memory neural network according to any of claims 1-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910487145.1A CN110377889B (en) | 2019-06-05 | 2019-06-05 | Text editing method and system based on feedforward sequence memory neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110377889A CN110377889A (en) | 2019-10-25 |
CN110377889B true CN110377889B (en) | 2023-06-20 |
Family
ID=68249843
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910487145.1A Active CN110377889B (en) | 2019-06-05 | 2019-06-05 | Text editing method and system based on feedforward sequence memory neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110377889B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016101688A1 (en) * | 2014-12-25 | 2016-06-30 | 清华大学 | Continuous voice recognition method based on deep long-and-short-term memory recurrent neural network |
CN106919977A (en) * | 2015-12-25 | 2017-07-04 | 科大讯飞股份有限公司 | A kind of feedforward sequence Memory Neural Networks and its construction method and system |
Non-Patent Citations (1)
Title |
---|
Automatic speech recognition based on time-domain modeling; Wang Haikun et al.; Computer Engineering and Applications; 2017-10-15 (No. 20); full text * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||