CN110377889B - Text editing method and system based on feedforward sequence memory neural network - Google Patents
- Publication number
- CN110377889B (application CN201910487145.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a text editing method based on a feedforward sequence memory neural network, which belongs to the technical field of speech signal processing and comprises the following steps: acquiring an original text to be edited; receiving editing voice data; performing speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command; and performing semantic understanding on the editing command and executing the editing command. In this technical scheme, the improved feedforward sequence memory neural network is used for speech recognition, making text editing accurate and efficient.
Description
Technical Field
The invention belongs to the technical field of voice signal processing, and particularly relates to a text editing method and system based on a feedforward sequence memory neural network.
Background
With the popularity of mobile phones, people receive a large amount of text every day on portable devices such as mobile phones and tablet computers, for example short messages, messages pushed by instant messaging or other software, web page content, and text news. When a user wants to edit text content of interest, the cursor must first be positioned at that content, and the selected text is then subjected to follow-up operations such as inserting text at the cursor position or replacing the selected text; this editing process is cumbersome and inconvenient. In the prior art, voice data recorded by the user is received, and the corresponding editing operation is then performed on the editing object according to that voice data. The user can thus select the editing object in the text directly and quickly, without a complex text selection operation, and can edit it directly through voice input, which simplifies the text editing process. However, the current approach operates directly on the received voice data without any speech processing, and under strong far-field and noise interference the performance of the speech recognition system is not ideal, so text editing becomes inaccurate.
Disclosure of Invention
To remedy the defects of the prior art, the invention aims to provide a text editing method based on a feedforward sequence memory neural network, which uses an improved feedforward sequence memory neural network for speech recognition so that text editing is more accurate and efficient.
In order to solve the technical problems, the invention adopts the following technical scheme:
In one aspect, the invention provides a text editing method based on a feedforward sequence memory neural network, comprising the following specific steps:
s1: acquiring an original text to be edited;
s2: receiving editing voice data;
s3: performing speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command;
s4: performing semantic understanding on the editing command and executing the editing command.
Further preferably, in the improved feedforward sequence memory neural network, a low-dimensional linear projection layer is inserted between the hidden layers of a feedforward fully connected neural network, a memory module is attached to the linear projection layer, and skip connections are added between adjacent memory modules so that the output of a lower-layer memory module can be added directly to a higher-layer memory module.
Further preferably, the memory module is a tap-delay structure that encodes the hidden-layer outputs at the current and previous moments through a set of coefficients into a fixed-size representation.
Further preferably, the memory module uses scalar-based or vector-based encoding.
Further preferably, the encoding of the memory module introduces a stride factor.
In another aspect, the invention also provides a text editing system based on a feedforward sequence memory neural network, comprising:
an acquisition unit configured to acquire an original text to be edited;
a receiving unit configured to receive editing voice data;
a recognition unit configured to perform speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command;
and an output unit configured to perform semantic understanding on the editing command, execute the editing command, and output the edited text.
In another aspect, the present invention also provides an apparatus, including:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform any of the text editing methods based on a feedforward sequence memory neural network according to the embodiments of the invention.
In another aspect, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the text editing methods based on a feedforward sequence memory neural network according to the embodiments of the invention.
Compared with the prior art, the invention has the beneficial effects that:
according to the text editing method based on the feedforward sequence memory neural network, which is disclosed by the invention, the original text to be edited is obtained, the voice data input by the user is received, and the corresponding editing operation is executed on the editing object according to the voice data, so that the user can directly and rapidly select the editing object in the text without complex text selection operation when editing the text, and the user can directly edit the editing object through voice input, so that the text editing process is simplified. In addition, the voice recognition is carried out on the edited voice data by adopting a feedforward sequence memory neural network based on improvement, so that the text editing is more accurate and efficient.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings, in which:
FIG. 1 is a schematic flow diagram of one embodiment of the present invention;
FIG. 2 is a block diagram of an improved feedforward sequence memory neural network.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
As shown in Fig. 1, an embodiment of the present invention provides a text editing method based on a feedforward sequence memory neural network, which specifically comprises the steps of:
s1: acquiring an original text to be edited;
s2: receiving editing voice data;
s3: performing speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command;
s4: performing semantic understanding on the editing command and executing the editing command.
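The steps S1 to S4 can be sketched as a small pipeline. This is an illustrative sketch only: `recognize` and `understand` stand in for the improved feedforward sequence memory recognizer and the semantic-understanding module, which the patent does not specify as code, and the command set (replace, insert_after, delete) is an assumed example.

```python
def edit_text_by_voice(original_text, audio, recognize, understand):
    """Apply a spoken editing command to a text (steps S1-S4).

    recognize:  speech recognizer, audio -> command text (step S3)
    understand: semantic parser, command text -> (action, target, payload) (step S4)
    """
    command_text = recognize(audio)                     # S3: speech -> editing command
    action, target, payload = understand(command_text)  # S4: semantic understanding
    if action == "replace":
        return original_text.replace(target, payload, 1)
    if action == "insert_after":
        idx = original_text.find(target) + len(target)
        return original_text[:idx] + payload + original_text[idx:]
    if action == "delete":
        return original_text.replace(target, "", 1)
    return original_text  # unrecognized command: leave the text unchanged
```

For example, with a recognizer that transcribes the utterance as "replace cat with dog" and a parser that yields ("replace", "cat", "dog"), the text "the cat sat" becomes "the dog sat".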
In the improved feedforward sequence memory neural network, a low-dimensional linear projection layer is inserted between the hidden layers of a feedforward fully connected neural network, a memory module is attached to the linear projection layer, and skip connections are added between adjacent memory modules so that the output of a lower-layer memory module can be added directly to a higher-layer memory module.
The memory module is a tap-delay structure: the hidden-layer outputs at the current and previous moments are encoded through a set of coefficients into a fixed-size representation.
The memory module uses scalar-based or vector-based encoding.
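A minimal numerical sketch of one such layer, assuming vector-based coefficients and a direct identity skip connection from the lower memory module; the names `W_proj`, `a`, `c` and all shapes are illustrative assumptions, not the patent's notation:

```python
import numpy as np

def cfsmn_layer(h, W_proj, a, c, s1=1, s2=1, skip=None):
    """One cFSMN block: low-rank linear projection + tap-delay memory module.

    h:      (T, D) hidden-layer outputs over T time steps
    W_proj: (D, P) low-dimensional linear projection
    a:      (N1+1, P) look-back coefficients (current frame + N1 past frames)
    c:      (N2, P)   look-ahead coefficients (N2 future frames)
    skip:   optional (T, P) memory output of a lower layer, added directly
            (the jump connection between adjacent memory modules)
    """
    p = h @ W_proj                          # projection-layer output per time step
    T, _ = p.shape
    m = np.zeros_like(p)
    for t in range(T):
        for i in range(a.shape[0]):         # encode current and strided past frames
            if t - s1 * i >= 0:
                m[t] += a[i] * p[t - s1 * i]
        for j in range(1, c.shape[0] + 1):  # encode strided future frames
            if t + s2 * j < T:
                m[t] += c[j - 1] * p[t + s2 * j]
    return m if skip is None else m + skip  # direct addition from the lower module
```

Stacking calls and passing each layer's `m` as the next layer's `skip` mimics the accumulation of lower-layer memory outputs into higher-layer modules described above.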
The encoding of the memory module introduces a stride factor; the encoding of the $\ell$-th cFSMN layer is

$$\tilde{p}_t^{\ell} = \tilde{p}_t^{\ell-1} + p_t^{\ell} + \sum_{i=0}^{N_1} a_i^{\ell} \odot p_{t-s_1\cdot i}^{\ell} + \sum_{j=1}^{N_2} c_j^{\ell} \odot p_{t+s_2\cdot j}^{\ell} \tag{1}$$

where $\tilde{p}_t^{\ell-1}$ denotes the output of the memory module of the previous cFSMN layer, $p_t^{\ell}$ is the output of the linear projection layer, $a_i^{\ell}$ and $c_j^{\ell}$ are the encoding coefficients, and $s_1$ and $s_2$ are the strides for looking back into the history and forward into the future, respectively. If $s_1 = 2$, one input is taken every other moment when the history is encoded; with the same filter order a longer history can thus be seen, so long-term correlations can be modeled more effectively.
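A worked illustration of the look-back sum in formula (1) with scalar coefficients; the function name and the toy values are assumptions for illustration:

```python
def strided_history(p, t, a, s1):
    """Look-back part of the memory encoding: sum_i a[i] * p[t - s1*i].
    p: list of scalar projection outputs; frames before t = 0 are skipped."""
    return sum(coeff * p[t - s1 * i] for i, coeff in enumerate(a) if t - s1 * i >= 0)

p = [1.0, 2.0, 3.0, 4.0, 5.0]
a = [1.0, 1.0, 1.0]                    # filter order N1 = 2
print(strided_history(p, 4, a, s1=1))  # frames 4,3,2 -> 5+4+3 = 12.0
print(strided_history(p, 4, a, s1=2))  # frames 4,2,0 -> 5+3+1 = 9.0
```

With the same filter order, the stride s1 = 2 reaches back to frame 0 instead of frame 2, so a longer history is encoded without any extra coefficients.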
The performance of the improved feedforward sequence memory neural network (cFSMN) of this embodiment on the SWB database, together with the number of model parameters and the training time per iteration, is compared with that of the existing Sigmoid-DNN, LSTM, BLSTM, sFSMN, and vFSMN speech recognition systems in Table 1:
Table 1: Performance, model parameters, and training time per iteration of speech recognition systems on the SWB database
The experimental results show that models that can effectively model long-term correlations, such as LSTM and FSMN, achieve significant performance improvements over DNN. One LSTM training iteration takes 9.5 hours, while one BLSTM iteration takes 23.2 hours. This is because the NVIDIA Tesla K20 GPU has only 3 GB of memory, so BLSTM trained with BPTT can only use 16-sentence parallelism, whereas LSTM can use 64-sentence parallelism. The proposed vFSMN achieves a small performance improvement over BLSTM. The model structure of the vFSMN is simpler and its training is faster: one vFSMN iteration takes approximately 6.9 hours, about a 3-fold training acceleration compared with BLSTM. However, the vFSMN has more model parameters than BLSTM. The proposed cFSMN further reduces the overall model size to 74 MB, a 60% reduction in the number of parameters compared with BLSTM. More importantly, each iteration requires only 3.0 hours, approximately a 7-fold training acceleration compared with BLSTM. Furthermore, the cFSMN-based model obtains a word error rate of 12.5%, an absolute performance improvement of 0.9% over BLSTM.
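The speedup figures quoted above follow directly from the per-iteration times in Table 1 (a quick arithmetic check):

```python
# Hours per training iteration, as quoted from Table 1
blstm, vfsmn, cfsmn = 23.2, 6.9, 3.0
print(round(blstm / vfsmn, 1))  # about 3.4x: the roughly 3-fold vFSMN acceleration
print(round(blstm / cfsmn, 1))  # about 7.7x: the approximately 7-fold cFSMN acceleration
```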
The improved feedforward sequence memory neural network is denoted 216-N×[2048-P(N1,N2)]-M×2048-P-8911, where N and M represent the number of cFSMN layers and standard fully connected layers, respectively, P is the number of nodes of the low-rank linear projection layer, and N1 and N2 represent the look-back and look-ahead filter orders, respectively. Performance tests of different configurations of the improved feedforward sequence memory neural network (cFSMN) acoustic model on the FSH task are shown in Table 2:
Table 2: Performance on the FSH task of different configurations of cFSMN acoustic models that use shortcut connections to train deep layers
The experimental results exp1 and exp2 show that, with the memory-module encoding of formula (1), setting a large stride allows more context information to be seen and thus better performance to be obtained. From exp2 to exp6, the number of cFSMN layers is gradually increased and the model performance gradually improves. Finally, by adding the skip connections, a deep cFSMN comprising 12 cFSMN layers and 2 fully connected layers, denoted Deep-cFSMN, can be successfully trained, obtaining a word error rate of 9.3% on the Hub5e00 test set.
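As a rough tally of the dense-weight count implied by the notation above; the wiring (input, then N cFSMN blocks, then M fully connected layers, then projection, then output) is inferred rather than stated, biases and memory-filter coefficients are ignored, and the helper name is hypothetical:

```python
def approx_dense_params(input_dim, hidden, proj, n_cfsmn, n_fc, output_dim):
    """Count only the dense weight matrices of an assumed
    input - N x [hidden-proj] - M x hidden - proj - output topology."""
    total = input_dim * hidden                          # input -> first hidden layer
    total += n_cfsmn * (hidden * proj + proj * hidden)  # each block: hidden->proj->next hidden
    total += (n_fc - 1) * hidden * hidden               # remaining fully connected layers
    total += hidden * proj + proj * output_dim          # final low-rank projection -> output
    return total
```

The low-rank projection P is what keeps this total small: every 2048-to-2048 connection is factored through a P-dimensional bottleneck, which is the source of the parameter reduction reported above.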
In another aspect, the invention also provides a text editing system based on a feedforward sequence memory neural network, comprising:
an acquisition unit configured to acquire an original text to be edited;
a receiving unit configured to receive editing voice data;
a recognition unit configured to perform speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command;
and an output unit configured to perform semantic understanding on the editing command, execute the editing command, and output the edited text.
In another aspect, the present invention also provides an apparatus, including:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform any of the text editing methods based on a feedforward sequence memory neural network according to the embodiments of the invention.
In another aspect, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the text editing methods based on a feedforward sequence memory neural network according to the embodiments of the invention.
The foregoing description covers only the preferred embodiments of the present application and illustrates the principles of the technology employed. Persons skilled in the art will appreciate that the scope of the invention referred to in this application is not limited to the specific combinations of features described above, but also covers other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example (but not limited to) embodiments formed by substituting the features described above with technical features of similar function disclosed in this application.
Other technical features besides those described in the specification are known to those skilled in the art, and are not described herein in detail to highlight the innovative features of the present invention.
Claims (5)
1. A text editing method based on a feedforward sequence memory neural network, characterized by comprising the following specific steps:
s1: acquiring an original text to be edited;
s2: receiving editing voice data;
s3: performing speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command;
s4: performing semantic understanding on the editing command and executing the editing command;
wherein in the improved feedforward sequence memory neural network, a low-dimensional linear projection layer is inserted between the hidden layers of a feedforward fully connected neural network, a memory module is attached to the linear projection layer, and skip connections are added between adjacent memory modules so that the output of a lower-layer memory module can be added directly to a higher-layer memory module;
the memory module is a tap-delay structure, and the hidden-layer outputs at the current and previous moments are encoded through a set of coefficients into a fixed-size representation;
the memory module uses scalar-based or vector-based encoding.
2. The text editing method based on a feedforward sequence memory neural network according to claim 1, characterized in that the encoding of the memory module introduces a stride factor.
3. A text editing system based on a feedforward sequence memory neural network, comprising:
an acquisition unit configured to acquire an original text to be edited;
a receiving unit configured to receive editing voice data;
a recognition unit configured to perform speech recognition on the editing voice data using an improved feedforward sequence memory neural network to obtain an editing command;
and an output unit configured to perform semantic understanding on the editing command, execute the editing command, and output the edited text;
wherein in the improved feedforward sequence memory neural network, a low-dimensional linear projection layer is inserted between the hidden layers of a feedforward fully connected neural network, a memory module is attached to the linear projection layer, and skip connections are added between adjacent memory modules so that the output of a lower-layer memory module can be added directly to a higher-layer memory module;
the memory module is a tap-delay structure, and the hidden-layer outputs at the current and previous moments are encoded through a set of coefficients into a fixed-size representation;
the memory module uses scalar-based or vector-based encoding.
4. An apparatus, the apparatus comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform a text editing method based on a feedforward sequence memory neural network of any of claims 1-2.
5. A computer readable storage medium storing a computer program which when executed by a processor implements a method of text editing based on a feedforward sequence memory neural network according to any of claims 1-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910487145.1A CN110377889B (en) | 2019-06-05 | 2019-06-05 | Text editing method and system based on feedforward sequence memory neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110377889A CN110377889A (en) | 2019-10-25 |
CN110377889B true CN110377889B (en) | 2023-06-20 |
Family
ID=68249843
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910487145.1A Active CN110377889B (en) | 2019-06-05 | 2019-06-05 | Text editing method and system based on feedforward sequence memory neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110377889B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016101688A1 (en) * | 2014-12-25 | 2016-06-30 | 清华大学 | Continuous voice recognition method based on deep long-and-short-term memory recurrent neural network |
CN106919977A (en) * | 2015-12-25 | 2017-07-04 | 科大讯飞股份有限公司 | A kind of feedforward sequence Memory Neural Networks and its construction method and system |
Non-Patent Citations (1)
Title |
---|
Automatic speech recognition based on time-domain modeling; Wang Haikun et al.; Computer Engineering and Applications; 2017-10-15 (No. 20); full text * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||