CN113836910A - Text recognition method and system based on multilevel semantics - Google Patents

Text recognition method and system based on multilevel semantics

Info

Publication number
CN113836910A
CN113836910A (application no. CN202111094473.9A)
Authority
CN
China
Prior art keywords
sentence
word
obtaining
text
semantics
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111094473.9A
Other languages
Chinese (zh)
Inventor
孔浩冉
白振昊
陈园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN202111094473.9A
Publication of CN113836910A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/205 - Parsing
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/30 - Semantic analysis
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a text recognition method and system based on multilevel semantics. The method acquires the text data to be recognized; extracts the words of the text data to obtain a word vector for each word; obtains feature representations of the words from the word vectors and a first bidirectional long short-term memory network, and obtains word-level local sentence semantic representations under different perspectives by combining a first attention network; obtains feature representations of the sentences from the word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtains sentence-level global sentence semantic representations under different perspectives by combining a second attention network; and obtains the text recognition result from the global sentence semantic representations. The method not only highlights the contribution of important words and sentences to the text semantics but also extends semantic extraction from a single perspective to multiple perspectives, thereby improving the accuracy of text recognition.

Description

Text recognition method and system based on multilevel semantics
Technical Field
The disclosure relates to the technical field of text data processing, and in particular relates to a text recognition method and system based on multilevel semantics.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Text recognition is a technique that processes printed text, handwritten text, or pictures containing words in the real world and converts them into machine-encoded text. With the development and application of computer and artificial intelligence technology, text recognition is widely used in many fields; for example, text recognition must be performed before confirming the security classification of government documents and the like. Text recognition technology is an important component of computer vision, forms a basis for machines to perceive the world, and is a hotspot of artificial intelligence research.
Text feature extraction is the core of text recognition. Most current approaches to text feature extraction are the document frequency method, the information gain method, the mutual information method, statistical methods, and the like. These algorithms all follow the idea of word-frequency statistics, so the semantic relations among words, which serve as important reference indicators, are missing from the feature extraction process, and some text features extracted in this way cannot effectively represent the topic the text intends to express. The word embedding model proposed by researchers is a new word-vector representation that establishes semantic relations among words, but it cannot distinguish semantic similarity from descriptive relevance. With the help of word vectors, researchers proposed an improved TextRank method that extracts and generates keywords from scientific publications based on pre-trained word vectors; however, because its low-level feature layer is separated from its high-level feature layer, its extraction of the overall semantics of the text is insufficient. For short social texts, some researchers proposed learning keywords and their context information with a deep recurrent neural network in order to extract the keywords in the text.
The inventors found that these methods neglect the global semantic information of the text, extract from a single perspective, and do not consider how much different words and sentences contribute to the text semantics, resulting in insufficient extraction of text semantic features.
Disclosure of Invention
To remedy the deficiencies of the prior art, the present disclosure provides a text recognition method and system based on multilevel semantics. When text semantic features are extracted, the deep semantic relations among words are taken into account: a two-layer text semantic extraction method based on an attention mechanism extracts semantic sequences in both the forward and backward directions, extracts local semantic information at the word level and global semantic information at the sentence level, and converts the single-vector expression of semantics into a feature matrix. This highlights the contribution of important words and sentences to the text semantics, extends text semantic extraction from a single perspective to multiple perspectives, and improves the accuracy of text recognition.
To achieve this purpose, the present disclosure adopts the following technical scheme:
the first aspect of the disclosure provides a text recognition method based on multilevel semantics.
A text recognition method based on multilevel semantics comprises the following processes:
acquiring the text data to be recognized;
extracting the words of the text data to obtain a word vector for each word;
obtaining feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtaining word-level local sentence semantic representations under different perspectives by combining a first attention network;
obtaining feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtaining sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
and obtaining a text recognition result from the obtained global sentence semantic representations.
Further, word embedding is carried out by using a Skip-gram model to obtain a word vector of each word.
Further, in the first bidirectional long short-term memory network and the second bidirectional long short-term memory network, each partial feature vector is generated by concatenating the forward long short-term memory network and the backward long short-term memory network.
Further, the first attention network describes the weights of the words in a sentence as a two-dimensional weight matrix, and different rows of the matrix represent information from different perspectives of the sentence.
Further, the word-level sentence semantic representations under different perspectives are the product of the feature representation of the words and the weight matrix of the first attention network, and the weight matrix is subject to the constraint:

P = ||A·A^T − I||_F²

wherein A is the weight matrix, I is the identity matrix, and ||·||_F denotes the Frobenius norm.
Further, dropout is applied to the word-level local sentence semantic representations and the feature representations of the words to mitigate the influence of overfitting.
Further, according to the feature representations of the sentences, a second attention network is used to obtain a weight matrix representing the interrelations of the sentences under multiple perspectives, and the weight matrix is multiplied by the sentence features to obtain the global sentence semantic representation matrix of the text.
A second aspect of the disclosure provides a system for text recognition based on multilevel semantics.
A system for text recognition based on multilevel semantics, comprising:
a data acquisition module configured to: acquire the text data to be recognized;
a word vector extraction module configured to: extract the words of the text data to obtain a word vector for each word;
a local sentence semantic representation acquisition module configured to: obtain feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtain word-level local sentence semantic representations under different perspectives by combining a first attention network;
a global sentence semantic representation acquisition module configured to: obtain feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtain sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
a text recognition module configured to: obtain a text recognition result from the obtained global sentence semantic representations.
A third aspect of the present disclosure provides a computer-readable storage medium, on which a program is stored, which when executed by a processor implements the steps in the multilevel semantic based text recognition method according to the first aspect of the present disclosure.
A fourth aspect of the present disclosure provides an electronic device, including a memory, a processor, and a program stored in the memory and executable on the processor, where the processor executes the program to implement the steps in the multilevel semantic based text recognition method according to the first aspect of the present disclosure.
Compared with the prior art, the beneficial effects of the present disclosure are:
according to the method, the system, the medium or the electronic equipment, when text semantic features are extracted, deep semantic relations among words are considered, a two-layer text semantic extraction method based on an attention mechanism is used for extracting semantic sequences in two aspects of forward and backward, extracting local semantic information at a word level and global semantic information at a sentence level, and converting single vector expression of semantics into a feature matrix form, so that not only is the contribution of important words and sentences to text semantics highlighted, but also multi-view text semantic extraction is expanded from a single view angle, and the accuracy of text recognition is improved.
Advantages of additional aspects of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
Fig. 1 is a schematic framework diagram of the text recognition method based on multilevel semantics according to Example 1 of the present disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Example 1:
Example 1 of the present disclosure provides a text recognition method based on multilevel semantics, which comprises the following processes:
acquiring the text data to be recognized;
extracting the words of the text data to obtain a word vector for each word;
obtaining feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtaining word-level local sentence semantic representations under different perspectives by combining a first attention network;
obtaining feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtaining sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
and obtaining a text recognition result from the obtained global sentence semantic representations.
In this embodiment, a word embedding matrix carrying local feature information and a sentence embedding matrix carrying global feature information are extracted using attention models and a two-layer bidirectional LSTM (BiLSTM) deep network. To extract more valuable text semantic information, a double-layer BiLSTM module is attached to the output of the word vector layer to learn the deep semantic relations among words, and an attention mechanism is introduced at the output of the two BiLSTM layers to learn local semantic information from multiple perspectives; the sentence features are then fed into a double-layer BiLSTM network to learn the deep semantic relations among sentences, and the global semantic information of the text is acquired through sentence-level attention. The whole process comprises four stages, word encoding, word attention, sentence encoding, and sentence attention, as shown in FIG. 1.
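For orientation, the following is a minimal Python sketch of how the four stages could be composed, assuming a classification head as the recognition output; the class and module names, tensor shapes, and the Linear classifier are illustrative assumptions rather than the patent's definitive implementation, and the individual stages are sketched in S1 to S3 below.

import torch.nn as nn

class MultilevelSemanticTextRecognizer(nn.Module):
    """Word encoding -> word attention -> sentence encoding -> sentence attention -> classifier."""
    def __init__(self, word_encoder, word_attention, sentence_attention, feat_dim, num_classes):
        super().__init__()
        self.word_encoder = word_encoder              # stage 1: BiLSTM over Skip-gram word vectors
        self.word_attention = word_attention          # stage 2: multi-view word-level attention
        self.sentence_attention = sentence_attention  # stages 3-4: sentence BiLSTM + attention
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, word_vectors):                  # (batch, m sentences, n words, embed_dim)
        b, m, n, d = word_vectors.shape
        H = self.word_encoder(word_vectors.reshape(b * m, n, d))   # word features H
        M, _ = self.word_attention(H)                              # local semantics M = AH
        M = M.reshape(b, m, M.size(1), -1).transpose(1, 2)         # regroup: (b, r views, m, dim)
        F_global = self.sentence_attention(M)                      # global feature matrix
        return self.classifier(F_global.flatten(1))                # recognition result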
Specifically, the method comprises the following steps:
s1: context-dependent word vector construction
Bidirectional neural networks learn the feature information of sentences by accepting vectorized representations of the text; word embedding is typically used to map each word into a vector space. Here word embedding is carried out with the Skip-gram model, which predicts the context words of a given center word so that the semantic information of each word is quantized in its embedding vector. The learning objective can be written as the maximum likelihood function:

L = Σ_{w∈C} Σ_{u∈Context(w)} log p(u|w)  (1)

where w is any word in the corpus C and u is a word in the context of w. Assuming that each factor of p(u|w) is an independent logistic regression whose label L^w(u) is 0 or 1, the objective function can be rewritten as:

L = Σ_{w∈C} Σ_u { L^w(u)·log σ(v(w)^T·θ^u) + [1 − L^w(u)]·log[1 − σ(v(w)^T·θ^u)] }  (2)

where σ is the sigmoid function, v(w) is the vector of word w, and θ^u is the auxiliary parameter vector associated with u. For a given training example w, u is fixed, so equation (2) has two variables, v(w) and θ^u. Taking partial derivatives with respect to v(w) and θ^u gives their update expressions:

θ^u := θ^u + η·[L^w(u) − σ(v(w)^T·θ^u)]·v(w)  (3)

v(w) := v(w) + η·Σ_u [L^w(u) − σ(v(w)^T·θ^u)]·θ^u  (4)

where η is the learning rate. After multiple update iterations, the word vector of each word in the sentence s = {w₁, w₂, …, wₙ} is obtained.
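For illustration, a minimal NumPy sketch of the Skip-gram update rules (3) and (4) with negative sampling; the vocabulary size, embedding dimension, learning rate, and the sample indices in the usage line are hypothetical values, not parameters from the patent.

import numpy as np

rng = np.random.default_rng(0)
V, D, eta = 5000, 100, 0.025                # vocab size, embedding dim, learning rate (assumed)
v = rng.normal(scale=0.1, size=(V, D))      # input vectors v(w)
theta = np.zeros((V, D))                    # output vectors theta^u

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(w, u_pos, neg_samples):
    """One Skip-gram step for center word w: u_pos carries label 1, negatives label 0."""
    grad_w = np.zeros(D)
    for u, label in [(u_pos, 1.0)] + [(u, 0.0) for u in neg_samples]:
        g = label - sigmoid(v[w] @ theta[u])    # L^w(u) - sigma(v(w)^T theta^u)
        grad_w += g * theta[u]                  # accumulate gradient for update (4)
        theta[u] += eta * g * v[w]              # update (3)
    v[w] += eta * grad_w                        # update (4)

# hypothetical usage: center word 10, context word 42, 5 negative samples
train_pair(10, 42, rng.integers(0, V, size=5))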
S2: word-level text semantic information extraction
A deep structure obtains higher-level representations of the text layer by layer. On the basis of the word vectors obtained by the Skip-gram model, a deep bidirectional long short-term memory model is obtained by stacking BiLSTM networks, each formed by a forward-sequence hidden layer and a reverse-sequence hidden layer.
The forward part →h_t scans from the first word of the sentence to the last: for a sentence of n words, t traverses 1 to n. The backward part ←h_t starts from the last word, i.e., t traverses n to 1. In BiLSTM, the hidden layer output is usually used as the text representation:

h_t = [→h_t || ←h_t]  (5)

where →h_t denotes the forward propagation fusing the preceding context and ←h_t denotes the backward propagation fusing the following context.

In the feature fusion layer, each partial feature vector is generated by concatenating the forward and backward LSTM outputs, so that at step t the input vector fuses both contexts and yields the context-fused text semantic representation of the BiLSTM model.

Stacking BiLSTM layer by layer forms a deep network structure. In this deep network structure, after normalization, the feature representation of the words is H = (h₁, h₂, …, hₙ), where h_t = →h_t || ←h_t and || denotes the vector concatenation operation.
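A minimal PyTorch sketch of the stacked BiLSTM word encoder producing H = (h₁, …, hₙ); the layer sizes, two-layer depth, and the shapes in the usage line are assumptions for illustration.

import torch
import torch.nn as nn

class WordEncoder(nn.Module):
    """Deep BiLSTM over word vectors; each h_t concatenates forward and backward states."""
    def __init__(self, embed_dim=100, hidden_dim=128, num_layers=2, dropout=0.5):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                              bidirectional=True, batch_first=True, dropout=dropout)

    def forward(self, word_vectors):          # (batch, n, embed_dim)
        H, _ = self.bilstm(word_vectors)      # (batch, n, 2*hidden_dim): h_t = [fwd || bwd]
        return H

# hypothetical usage: batch of 4 sentences, 20 words each, 100-dim Skip-gram vectors
H = WordEncoder()(torch.randn(4, 20, 100))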
In recent years, the attention mechanism has been applied to natural language processing; it can learn the weight of each word in a sentence sequence and attend more closely to the more important word information in the text. An attention mechanism is introduced into the sentence expression, mapping it into a two-dimensional matrix and obtaining a matrix-level sentence semantic representation from multiple perspectives. The word-level attention model weight matrix A is:

A = softmax(W_s2·tanh(W_s1·H^T))  (6)

where W_s1 and W_s2 are training parameters of the model.
To avoid the learned weights all being the same, the attention weight matrix A is constrained to satisfy:

P = ||A·A^T − I||_F²  (7)

where I is the identity matrix and ||·||_F denotes the Frobenius norm, with P minimized as a penalty term. Under the constraint of formula (7), the attention weights concentrate on the words with greater influence on text semantic extraction, while the attention weight distributions at different perspectives differ from one another as much as possible; the finally synthesized sentence semantics are expressed as M = AH. The word-level attention model describes the weights of the words in a sentence as a two-dimensional matrix whose different rows represent different levels of information about the sentence, i.e., the sentence semantics are read from different perspectives to obtain a matrix-level semantic representation. This makes the sentence expression information richer and solves the problem of information loss when a sentence is compressed into a one-dimensional vector.
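A sketch of the word-level attention of formulas (6) and (7), in the spirit of structured self-attention; the dimensions d_a and r (number of perspectives) are assumed hyperparameters, and the penalty is returned so it can be added to the training loss.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewAttention(nn.Module):
    """A = softmax(W_s2 tanh(W_s1 H^T)); M = A H; penalty ||AA^T - I||_F^2."""
    def __init__(self, hidden_dim=256, d_a=64, r=8):
        super().__init__()
        self.W_s1 = nn.Linear(hidden_dim, d_a, bias=False)
        self.W_s2 = nn.Linear(d_a, r, bias=False)

    def forward(self, H):                                   # (batch, n, hidden_dim)
        A = F.softmax(self.W_s2(torch.tanh(self.W_s1(H))).transpose(1, 2), dim=-1)  # (batch, r, n)
        M = A @ H                                           # (batch, r, hidden_dim): M = AH
        I = torch.eye(A.size(1), device=A.device)
        penalty = ((A @ A.transpose(1, 2) - I) ** 2).sum(dim=(1, 2)).mean()  # formula (7)
        return M, penalty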
Dropout is applied to the sentence semantic matrix M and the word semantic representation H to mitigate overfitting: part of the nodes in the network are randomly removed during the training stage, and all nodes are used during the testing stage. A hyperparameter p is introduced such that, in an iteration, each node of the network is retained with probability p and removed with probability 1 − p. Therefore, in each iteration only the sub-network formed by the retained nodes is trained, i.e., only the node parameters in the sub-network are updated, while the remaining node parameters keep the result of the previous iteration. This improves the generalization ability of the model and accelerates the training process.
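A brief sketch of the train/test behavior of dropout as described above; the tensor shape and the probability value are assumptions, and note that PyTorch's p argument is the removal probability, i.e., 1 − p in the text's notation.

import torch
import torch.nn as nn

M = torch.randn(4, 8, 256)   # hypothetical sentence semantic matrices
drop = nn.Dropout(p=0.5)     # PyTorch's p = removal probability (the text's 1 - p)
M_train = drop(M)            # training mode: random nodes zeroed, survivors scaled by 1/(1-p)
drop.eval()
M_test = drop(M)             # testing mode: all nodes used unchanged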
S3: sentence-level text semantic information extraction
The sentence semantics produced by the attention mechanism are expressed as a two-dimensional matrix that describes, from different perspectives, the contribution of the words to the text semantics; in the sentence-level global semantic information extraction, the sentences are encoded from each perspective.
Suppose the semantic vector of sentence s_j (j = 1, 2, …, m) under the i-th perspective is m_j^i. A BiLSTM network takes the semantic vectors of the m sentences under perspective i (i = 1, …, r) as input and combines the forward and backward context information to obtain the hidden layer output, so that the semantics are expressed as:

h_i = BiLSTM(m_1^i, m_2^i, …, m_m^i), i = 1, …, r

where r is the number of perspectives and h_i represents the semantics of the m sentences under the i-th perspective.
The attention model introduced at the sentence level measures the importance of a sentence within the text. A similarity metric function f(h_i, h_j) (formula (8)) is computed between h_i and h_j, incorporating a position code M_ij, for which a forward position code and a backward position code can be used. A softmax normalization is applied to the similarity metric matrix f(h_i, h_j) to obtain the weights, yielding the weight matrix A_s of the sentence interrelations across the r perspectives. Multiplying A_s by the sentence features H_s = (h₁, h₂, …, h_r) gives the global feature matrix representation of the text, F = A_s·H_s.
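A sketch of the sentence-level stage under stated assumptions: since the patent figure for formula (8) is not reproduced in the available text, the similarity function here is taken as a dot product with a learned additive bias standing in for the position code M_ij, and the last hidden state summarizes each perspective; all dimensions are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceAttention(nn.Module):
    """Encode per-perspective sentence semantics with a BiLSTM, then weight them by
    a softmax-normalized similarity matrix to form the global feature matrix F."""
    def __init__(self, sent_dim=256, hidden_dim=128, r=8):
        super().__init__()
        self.bilstm = nn.LSTM(sent_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.pos_bias = nn.Parameter(torch.zeros(r, r))   # stand-in for the position code M_ij

    def forward(self, M):                 # (batch, r, m, sent_dim): m sentences per perspective
        b, r, m, d = M.shape
        out, _ = self.bilstm(M.reshape(b * r, m, d))
        h = out[:, -1, :].reshape(b, r, -1)               # h_i: summary of m sentences, view i
        sim = h @ h.transpose(1, 2) + self.pos_bias       # assumed f(h_i, h_j): dot product + bias
        A_s = F.softmax(sim, dim=-1)                      # weight matrix of sentence interrelations
        return A_s @ h                                    # global feature matrix F = A_s H_s

# hypothetical usage: batch 4, r=8 perspectives, m=10 sentences, 256-dim sentence vectors
F_global = SentenceAttention()(torch.randn(4, 8, 10, 256))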
Example 2:
Example 2 of the present disclosure provides a text recognition system based on multilevel semantics, comprising:
a data acquisition module configured to: acquire the text data to be recognized;
a word vector extraction module configured to: extract the words of the text data to obtain a word vector for each word;
a local sentence semantic representation acquisition module configured to: obtain feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtain word-level local sentence semantic representations under different perspectives by combining a first attention network;
a global sentence semantic representation acquisition module configured to: obtain feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtain sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
a text recognition module configured to: obtain a text recognition result from the obtained global sentence semantic representations.
The working method of the system is the same as that of the text recognition method based on multilevel semantics provided in Example 1 and is not repeated here.
Example 3:
Example 3 of the present disclosure provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, it implements the steps in the text recognition method based on multilevel semantics according to Example 1 of the present disclosure, namely:
acquiring the text data to be recognized;
extracting the words of the text data to obtain a word vector for each word;
obtaining feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtaining word-level local sentence semantic representations under different perspectives by combining a first attention network;
obtaining feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtaining sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
and obtaining a text recognition result from the obtained global sentence semantic representations.
The detailed steps are the same as those of the text recognition method based on multilevel semantics provided in Example 1 and are not repeated here.
Example 4:
Example 4 of the present disclosure provides an electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor; when executing the program, the processor implements the steps in the text recognition method based on multilevel semantics according to Example 1 of the present disclosure, namely:
acquiring the text data to be recognized;
extracting the words of the text data to obtain a word vector for each word;
obtaining feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtaining word-level local sentence semantic representations under different perspectives by combining a first attention network;
obtaining feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtaining sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
and obtaining a text recognition result from the obtained global sentence semantic representations.
The detailed steps are the same as those of the text recognition method based on multilevel semantics provided in Example 1 and are not repeated here.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. A text recognition method based on multilevel semantics is characterized in that: the method comprises the following steps:
acquiring the text data to be recognized;
extracting the words of the text data to obtain a word vector for each word;
obtaining feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtaining word-level local sentence semantic representations under different perspectives by combining a first attention network;
obtaining feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtaining sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
and obtaining a text recognition result from the obtained global sentence semantic representations.
2. The method for text recognition based on multilevel semantics of claim 1, wherein:
word embedding is carried out using a Skip-gram model to obtain the word vector of each word.
3. The method for text recognition based on multilevel semantics of claim 1, wherein:
in the first bidirectional long short-term memory network and the second bidirectional long short-term memory network, each partial feature vector is generated by concatenating the forward long short-term memory network and the backward long short-term memory network.
4. The method for text recognition based on multilevel semantics of claim 1, wherein:
the first attention network describes the weights of the words in a sentence as a two-dimensional weight matrix, and different rows of the matrix represent information from different perspectives of the sentence.
5. The method for text recognition based on multilevel semantics of claim 4, wherein:
the word-level sentence semantic representations under different perspectives are the product of the feature representation of the words and the weight matrix of the first attention network, and the weight matrix is subject to the constraint:

P = ||A·A^T − I||_F²

wherein A is the weight matrix, I is the identity matrix, and ||·||_F denotes the Frobenius norm.
6. The method for text recognition based on multilevel semantics of claim 4, wherein:
dropout is applied to the word-level local sentence semantic representations and the feature representations of the words to mitigate the influence of overfitting.
7. The method for text recognition based on multilevel semantics of claim 1, wherein:
according to the feature representations of the sentences, a second attention network is used to obtain a weight matrix representing the interrelations of the sentences under multiple perspectives, and the weight matrix is multiplied by the sentence features to obtain the global sentence semantic representation matrix of the text.
8. A text recognition system based on multilevel semantics is characterized in that: the method comprises the following steps:
a data acquisition module configured to: acquire the text data to be recognized;
a word vector extraction module configured to: extract the words of the text data to obtain a word vector for each word;
a local sentence semantic representation acquisition module configured to: obtain feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtain word-level local sentence semantic representations under different perspectives by combining a first attention network;
a global sentence semantic representation acquisition module configured to: obtain feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtain sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
a text recognition module configured to: obtain a text recognition result from the obtained global sentence semantic representations.
9. A computer-readable storage medium, on which a program is stored, which, when being executed by a processor, carries out the steps of the method for multilevel semantic based text recognition according to any of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for text recognition based on multilevel semantics according to any one of claims 1 to 7 when executing the program.
CN202111094473.9A 2021-09-17 2021-09-17 Text recognition method and system based on multilevel semantics Pending CN113836910A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111094473.9A CN113836910A (en) 2021-09-17 2021-09-17 Text recognition method and system based on multilevel semantics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111094473.9A CN113836910A (en) 2021-09-17 2021-09-17 Text recognition method and system based on multilevel semantics

Publications (1)

Publication Number Publication Date
CN113836910A true CN113836910A (en) 2021-12-24

Family

ID=78959949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111094473.9A Pending CN113836910A (en) 2021-09-17 2021-09-17 Text recognition method and system based on multilevel semantics

Country Status (1)

Country Link
CN (1) CN113836910A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558487A (en) * 2018-11-06 2019-04-02 华南师范大学 Document Classification Method based on the more attention networks of hierarchy
CN109902293A (en) * 2019-01-30 2019-06-18 华南理工大学 A kind of file classification method based on part with global mutually attention mechanism
CN110162636A (en) * 2019-05-30 2019-08-23 中森云链(成都)科技有限责任公司 Text mood reason recognition methods based on D-LSTM
WO2020253052A1 (en) * 2019-06-18 2020-12-24 平安普惠企业管理有限公司 Behavior recognition method based on natural semantic understanding, and related device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116151241A (en) * 2023-04-19 2023-05-23 湖南马栏山视频先进技术研究院有限公司 Entity identification method and device

Similar Documents

Publication Publication Date Title
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN108733792B (en) Entity relation extraction method
Zhou et al. End-to-end learning of semantic role labeling using recurrent neural networks
CN108549658B (en) Deep learning video question-answering method and system based on attention mechanism on syntax analysis tree
CN111950287B (en) Entity identification method based on text and related device
CN111274800A (en) Inference type reading understanding method based on relational graph convolution network
CN110110318B (en) Text steganography detection method and system based on cyclic neural network
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN111400494B (en) Emotion analysis method based on GCN-Attention
CN111046178A (en) Text sequence generation method and system
CN113435211A (en) Text implicit emotion analysis method combined with external knowledge
CN113705196A (en) Chinese open information extraction method and device based on graph neural network
Li et al. Image describing based on bidirectional LSTM and improved sequence sampling
CN113553418A (en) Visual dialog generation method and device based on multi-modal learning
CN115687609A (en) Zero sample relation extraction method based on Prompt multi-template fusion
CN111444328A (en) Natural language automatic prediction inference method with interpretation generation
CN111339256A (en) Method and device for text processing
CN113836910A (en) Text recognition method and system based on multilevel semantics
Prabhakar et al. Performance analysis of hybrid deep learning models with attention mechanism positioning and focal loss for text classification
CN114065769B (en) Method, device, equipment and medium for training emotion reason pair extraction model
CN116629361A (en) Knowledge reasoning method based on ontology learning and attention mechanism
CN112131879A (en) Relationship extraction system, method and device
CN115495579A (en) Method and device for classifying text of 5G communication assistant, electronic equipment and storage medium
CN115456173A (en) Generalized artificial neural network unsupervised local learning method, system and application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination