CN113836910A - Text recognition method and system based on multilevel semantics - Google Patents

Text recognition method and system based on multilevel semantics

Info

Publication number
CN113836910A
CN113836910A (application no. CN202111094473.9A)
Authority
CN
China
Prior art keywords
sentence
word
obtaining
text
semantics
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111094473.9A
Other languages
Chinese (zh)
Inventor
孔浩冉
白振昊
陈园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN202111094473.9A
Publication of CN113836910A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/205 - Parsing
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/30 - Semantic analysis
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a text recognition method and system based on multilevel semantics. The method acquires the text data to be recognized; extracts the words of the text data to obtain a word vector for each word; obtains feature representations of the words from the word vectors and a first bidirectional long short-term memory network, and obtains word-level local sentence semantic representations under different perspectives by combining a first attention network; obtains feature representations of the sentences from the word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtains sentence-level global sentence semantic representations under different perspectives by combining a second attention network; and obtains the text recognition result from the global sentence semantic representations. The method not only highlights the contribution of important words and sentences to the text semantics but also extends semantic extraction from a single perspective to multiple perspectives, thereby improving the accuracy of text recognition.

Description

Text recognition method and system based on multilevel semantics
Technical Field
The disclosure relates to the technical field of text data processing, and in particular relates to a text recognition method and system based on multilevel semantics.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Text recognition is a technique that processes printed text, handwritten text, or pictures containing words in the real world and converts them into machine-encoded text. With the development and application of computer and artificial intelligence technology, text recognition is widely used in many fields; for example, text recognition must be performed before confirming the security classification of government documents and the like. Text recognition technology is an important component of computer vision, forms a basis for machines to perceive the world, and is a hotspot of artificial intelligence research.
Text feature extraction is the core of text recognition. Most current approaches to text feature extraction are the document frequency method, the information gain method, the mutual information method, statistical methods, and the like. These algorithms all follow the idea of word-frequency statistics, so the semantic relations among words, which serve as important reference indicators, are missing from the feature extraction process, and some text features extracted in this way cannot effectively represent the topic the text intends to express. The word embedding model proposed by researchers is a new word-vector representation that establishes semantic relations among words, but it cannot distinguish semantic similarity from descriptive relevance. With the help of word vectors, researchers proposed an improved TextRank method that extracts and generates keywords from scientific publications based on pre-trained word vectors; however, because its low-level feature layer is separated from its high-level feature layer, its extraction of the overall semantics of the text is insufficient. For short social texts, some researchers proposed learning keywords and their context information with a deep recurrent neural network in order to extract the keywords in the text.
The inventors found that these methods neglect the global semantic information of the text, extract from a single perspective, and do not consider how much different words and sentences contribute to the text semantics, resulting in insufficient extraction of text semantic features.
Disclosure of Invention
To remedy the deficiencies of the prior art, the present disclosure provides a text recognition method and system based on multilevel semantics. When text semantic features are extracted, the deep semantic relations among words are taken into account: a two-layer text semantic extraction method based on an attention mechanism extracts semantic sequences in both the forward and backward directions, extracts local semantic information at the word level and global semantic information at the sentence level, and converts the single-vector expression of semantics into a feature matrix. This highlights the contribution of important words and sentences to the text semantics, extends text semantic extraction from a single perspective to multiple perspectives, and improves the accuracy of text recognition.
To achieve this purpose, the present disclosure adopts the following technical scheme:
the first aspect of the disclosure provides a text recognition method based on multilevel semantics.
A text recognition method based on multilevel semantics comprises the following processes:
acquiring the text data to be recognized;
extracting the words of the text data to obtain a word vector for each word;
obtaining feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtaining word-level local sentence semantic representations under different perspectives by combining a first attention network;
obtaining feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtaining sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
and obtaining a text recognition result from the obtained global sentence semantic representations.
Further, word embedding is carried out by using a Skip-gram model to obtain a word vector of each word.
Further, in the first bidirectional long short-term memory network and the second bidirectional long short-term memory network, each partial feature vector is generated by concatenating the forward long short-term memory network and the backward long short-term memory network.
Further, the first attention network describes the weights of the words in a sentence as a two-dimensional weight matrix, and different rows of the matrix represent information from different perspectives of the sentence.
Further, the word-level sentence semantic representations under different perspectives are the product of the feature representation of the words and the weight matrix of the first attention network, and the weight matrix is subject to the constraint:

P = ||A·A^T − I||_F²

wherein A is the weight matrix, I is the identity matrix, and ||·||_F denotes the Frobenius norm.
Further, dropout is applied to the word-level local sentence semantic representations and the feature representations of the words to mitigate the influence of overfitting.
Further, according to the feature representations of the sentences, a second attention network is used to obtain a weight matrix representing the interrelations of the sentences under multiple perspectives, and the weight matrix is multiplied by the sentence features to obtain the global sentence semantic representation matrix of the text.
A second aspect of the disclosure provides a system for text recognition based on multilevel semantics.
A system for text recognition based on multilevel semantics, comprising:
a data acquisition module configured to: acquire the text data to be recognized;
a word vector extraction module configured to: extract the words of the text data to obtain a word vector for each word;
a local sentence semantic representation acquisition module configured to: obtain feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtain word-level local sentence semantic representations under different perspectives by combining a first attention network;
a global sentence semantic representation acquisition module configured to: obtain feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtain sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
a text recognition module configured to: obtain a text recognition result from the obtained global sentence semantic representations.
A third aspect of the present disclosure provides a computer-readable storage medium, on which a program is stored, which when executed by a processor implements the steps in the multilevel semantic based text recognition method according to the first aspect of the present disclosure.
A fourth aspect of the present disclosure provides an electronic device, including a memory, a processor, and a program stored in the memory and executable on the processor, where the processor executes the program to implement the steps in the multilevel semantic based text recognition method according to the first aspect of the present disclosure.
Compared with the prior art, the beneficial effects of the present disclosure are:
according to the method, the system, the medium or the electronic equipment, when text semantic features are extracted, deep semantic relations among words are considered, a two-layer text semantic extraction method based on an attention mechanism is used for extracting semantic sequences in two aspects of forward and backward, extracting local semantic information at a word level and global semantic information at a sentence level, and converting single vector expression of semantics into a feature matrix form, so that not only is the contribution of important words and sentences to text semantics highlighted, but also multi-view text semantic extraction is expanded from a single view angle, and the accuracy of text recognition is improved.
Advantages of additional aspects of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
Fig. 1 is a schematic framework diagram of the text recognition method based on multilevel semantics according to Example 1 of the present disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Example 1:
Example 1 of the present disclosure provides a text recognition method based on multilevel semantics, which comprises the following processes:
acquiring the text data to be recognized;
extracting the words of the text data to obtain a word vector for each word;
obtaining feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtaining word-level local sentence semantic representations under different perspectives by combining a first attention network;
obtaining feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtaining sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
and obtaining a text recognition result from the obtained global sentence semantic representations.
In this embodiment, a word embedding matrix carrying local feature information and a sentence embedding matrix carrying global feature information are extracted using attention models and a two-layer bidirectional LSTM (BiLSTM) deep network. To extract more valuable text semantic information, a double-layer BiLSTM module is attached to the output of the word vector layer to learn the deep semantic relations among words, and an attention mechanism is introduced at the output of the two BiLSTM layers to learn local semantic information from multiple perspectives; the sentence features are then fed into a double-layer BiLSTM network to learn the deep semantic relations among sentences, and the global semantic information of the text is acquired through sentence-level attention. The whole process comprises four stages, word encoding, word attention, sentence encoding, and sentence attention, as shown in FIG. 1.
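For orientation, the following is a minimal Python sketch of how the four stages could be composed, assuming a classification head as the recognition output; the class and module names, tensor shapes, and the Linear classifier are illustrative assumptions rather than the patent's definitive implementation, and the individual stages are sketched in S1 to S3 below.

import torch.nn as nn

class MultilevelSemanticTextRecognizer(nn.Module):
    """Word encoding -> word attention -> sentence encoding -> sentence attention -> classifier."""
    def __init__(self, word_encoder, word_attention, sentence_attention, feat_dim, num_classes):
        super().__init__()
        self.word_encoder = word_encoder              # stage 1: BiLSTM over Skip-gram word vectors
        self.word_attention = word_attention          # stage 2: multi-view word-level attention
        self.sentence_attention = sentence_attention  # stages 3-4: sentence BiLSTM + attention
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, word_vectors):                  # (batch, m sentences, n words, embed_dim)
        b, m, n, d = word_vectors.shape
        H = self.word_encoder(word_vectors.reshape(b * m, n, d))   # word features H
        M, _ = self.word_attention(H)                              # local semantics M = AH
        M = M.reshape(b, m, M.size(1), -1).transpose(1, 2)         # regroup: (b, r views, m, dim)
        F_global = self.sentence_attention(M)                      # global feature matrix
        return self.classifier(F_global.flatten(1))                # recognition result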
Specifically, the method comprises the following steps:
s1: context-dependent word vector construction
Bidirectional neural networks learn the feature information of sentences by accepting vectorized representations of the text; word embedding is typically used to map each word into a vector space. Here word embedding is carried out with the Skip-gram model, which predicts the context words of a given center word so that the semantic information of each word is quantized in its embedding vector. The learning objective can be written as the maximum likelihood function:

L = Σ_{w∈C} Σ_{u∈Context(w)} log p(u|w)  (1)

where w is any word in the corpus C and u is a word in the context of w. Assuming that each factor of p(u|w) is an independent logistic regression whose label L^w(u) is 0 or 1, the objective function can be rewritten as:

L = Σ_{w∈C} Σ_u { L^w(u)·log σ(v(w)^T·θ^u) + [1 − L^w(u)]·log[1 − σ(v(w)^T·θ^u)] }  (2)

where σ is the sigmoid function, v(w) is the vector of word w, and θ^u is the auxiliary parameter vector associated with u. For a given training example w, u is fixed, so equation (2) has two variables, v(w) and θ^u. Taking partial derivatives with respect to v(w) and θ^u gives their update expressions:

θ^u := θ^u + η·[L^w(u) − σ(v(w)^T·θ^u)]·v(w)  (3)

v(w) := v(w) + η·Σ_u [L^w(u) − σ(v(w)^T·θ^u)]·θ^u  (4)

where η is the learning rate. After multiple update iterations, the word vector of each word in the sentence s = {w₁, w₂, …, wₙ} is obtained.
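For illustration, a minimal NumPy sketch of the Skip-gram update rules (3) and (4) with negative sampling; the vocabulary size, embedding dimension, learning rate, and the sample indices in the usage line are hypothetical values, not parameters from the patent.

import numpy as np

rng = np.random.default_rng(0)
V, D, eta = 5000, 100, 0.025                # vocab size, embedding dim, learning rate (assumed)
v = rng.normal(scale=0.1, size=(V, D))      # input vectors v(w)
theta = np.zeros((V, D))                    # output vectors theta^u

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(w, u_pos, neg_samples):
    """One Skip-gram step for center word w: u_pos carries label 1, negatives label 0."""
    grad_w = np.zeros(D)
    for u, label in [(u_pos, 1.0)] + [(u, 0.0) for u in neg_samples]:
        g = label - sigmoid(v[w] @ theta[u])    # L^w(u) - sigma(v(w)^T theta^u)
        grad_w += g * theta[u]                  # accumulate gradient for update (4)
        theta[u] += eta * g * v[w]              # update (3)
    v[w] += eta * grad_w                        # update (4)

# hypothetical usage: center word 10, context word 42, 5 negative samples
train_pair(10, 42, rng.integers(0, V, size=5))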
S2: word-level text semantic information extraction
A deep structure obtains higher-level representations of the text layer by layer. On the basis of the word vectors obtained by the Skip-gram model, a deep bidirectional long short-term memory model is obtained by stacking BiLSTM networks, each formed by a forward-sequence hidden layer and a reverse-sequence hidden layer.
The forward part →h_t scans from the first word of the sentence to the last: for a sentence of n words, t traverses 1 to n. The backward part ←h_t starts from the last word, i.e., t traverses n to 1. In BiLSTM, the hidden layer output is usually used as the text representation:

h_t = [→h_t || ←h_t]  (5)

where →h_t denotes the forward propagation fusing the preceding context and ←h_t denotes the backward propagation fusing the following context.

In the feature fusion layer, each partial feature vector is generated by concatenating the forward and backward LSTM outputs, so that at step t the input vector fuses both contexts and yields the context-fused text semantic representation of the BiLSTM model.

Stacking BiLSTM layer by layer forms a deep network structure. In this deep network structure, after normalization, the feature representation of the words is H = (h₁, h₂, …, hₙ), where h_t = →h_t || ←h_t and || denotes the vector concatenation operation.
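A minimal PyTorch sketch of the stacked BiLSTM word encoder producing H = (h₁, …, hₙ); the layer sizes, two-layer depth, and the shapes in the usage line are assumptions for illustration.

import torch
import torch.nn as nn

class WordEncoder(nn.Module):
    """Deep BiLSTM over word vectors; each h_t concatenates forward and backward states."""
    def __init__(self, embed_dim=100, hidden_dim=128, num_layers=2, dropout=0.5):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                              bidirectional=True, batch_first=True, dropout=dropout)

    def forward(self, word_vectors):          # (batch, n, embed_dim)
        H, _ = self.bilstm(word_vectors)      # (batch, n, 2*hidden_dim): h_t = [fwd || bwd]
        return H

# hypothetical usage: batch of 4 sentences, 20 words each, 100-dim Skip-gram vectors
H = WordEncoder()(torch.randn(4, 20, 100))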
In recent years, the attention mechanism has been applied to natural language processing; it can learn the weight of each word in a sentence sequence and attend more closely to the more important word information in the text. An attention mechanism is introduced into the sentence expression, mapping it into a two-dimensional matrix and obtaining a matrix-level sentence semantic representation from multiple perspectives. The word-level attention model weight matrix A is:

A = softmax(W_s2·tanh(W_s1·H^T))  (6)

where W_s1 and W_s2 are training parameters of the model.
To avoid the learned weights all being the same, the attention weight matrix A is constrained to satisfy:

P = ||A·A^T − I||_F²  (7)

where I is the identity matrix and ||·||_F denotes the Frobenius norm, with P minimized as a penalty term. Under the constraint of formula (7), the attention weights concentrate on the words with greater influence on text semantic extraction, while the attention weight distributions at different perspectives differ from one another as much as possible; the finally synthesized sentence semantics are expressed as M = AH. The word-level attention model describes the weights of the words in a sentence as a two-dimensional matrix whose different rows represent different levels of information about the sentence, i.e., the sentence semantics are read from different perspectives to obtain a matrix-level semantic representation. This makes the sentence expression information richer and solves the problem of information loss when a sentence is compressed into a one-dimensional vector.
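A sketch of the word-level attention of formulas (6) and (7), in the spirit of structured self-attention; the dimensions d_a and r (number of perspectives) are assumed hyperparameters, and the penalty is returned so it can be added to the training loss.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewAttention(nn.Module):
    """A = softmax(W_s2 tanh(W_s1 H^T)); M = A H; penalty ||AA^T - I||_F^2."""
    def __init__(self, hidden_dim=256, d_a=64, r=8):
        super().__init__()
        self.W_s1 = nn.Linear(hidden_dim, d_a, bias=False)
        self.W_s2 = nn.Linear(d_a, r, bias=False)

    def forward(self, H):                                   # (batch, n, hidden_dim)
        A = F.softmax(self.W_s2(torch.tanh(self.W_s1(H))).transpose(1, 2), dim=-1)  # (batch, r, n)
        M = A @ H                                           # (batch, r, hidden_dim): M = AH
        I = torch.eye(A.size(1), device=A.device)
        penalty = ((A @ A.transpose(1, 2) - I) ** 2).sum(dim=(1, 2)).mean()  # formula (7)
        return M, penalty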
Dropout is applied to the sentence semantic matrix M and the word semantic representation H to mitigate overfitting: part of the nodes in the network are randomly removed during the training stage, and all nodes are used during the testing stage. A hyperparameter p is introduced such that, in an iteration, each node of the network is retained with probability p and removed with probability 1 − p. Therefore, in each iteration only the sub-network formed by the retained nodes is trained, i.e., only the node parameters in the sub-network are updated, while the remaining node parameters keep the result of the previous iteration. This improves the generalization ability of the model and accelerates the training process.
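A brief sketch of the train/test behavior of dropout as described above; the tensor shape and the probability value are assumptions, and note that PyTorch's p argument is the removal probability, i.e., 1 − p in the text's notation.

import torch
import torch.nn as nn

M = torch.randn(4, 8, 256)   # hypothetical sentence semantic matrices
drop = nn.Dropout(p=0.5)     # PyTorch's p = removal probability (the text's 1 - p)
M_train = drop(M)            # training mode: random nodes zeroed, survivors scaled by 1/(1-p)
drop.eval()
M_test = drop(M)             # testing mode: all nodes used unchanged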
S3: sentence-level text semantic information extraction
The sentence semantics produced by the attention mechanism are expressed as a two-dimensional matrix that describes, from different perspectives, the contribution of the words to the text semantics; in the sentence-level global semantic information extraction, the sentences are encoded from each perspective.
Suppose the semantic vector of sentence s_j (j = 1, 2, …, m) under the i-th perspective is m_j^i. A BiLSTM network takes the semantic vectors of the m sentences under perspective i (i = 1, …, r) as input and combines the forward and backward context information to obtain the hidden layer output, so that the semantics are expressed as:

h_i = BiLSTM(m_1^i, m_2^i, …, m_m^i), i = 1, …, r

where r is the number of perspectives and h_i represents the semantics of the m sentences under the i-th perspective.
The attention model introduced at the sentence level measures the importance of a sentence within the text. A similarity metric function f(h_i, h_j) (formula (8)) is computed between h_i and h_j, incorporating a position code M_ij, for which a forward position code and a backward position code can be used. A softmax normalization is applied to the similarity metric matrix f(h_i, h_j) to obtain the weights, yielding the weight matrix A_s of the sentence interrelations across the r perspectives. Multiplying A_s by the sentence features H_s = (h₁, h₂, …, h_r) gives the global feature matrix representation of the text, F = A_s·H_s.
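A sketch of the sentence-level stage under stated assumptions: since the patent figure for formula (8) is not reproduced in the available text, the similarity function here is taken as a dot product with a learned additive bias standing in for the position code M_ij, and the last hidden state summarizes each perspective; all dimensions are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceAttention(nn.Module):
    """Encode per-perspective sentence semantics with a BiLSTM, then weight them by
    a softmax-normalized similarity matrix to form the global feature matrix F."""
    def __init__(self, sent_dim=256, hidden_dim=128, r=8):
        super().__init__()
        self.bilstm = nn.LSTM(sent_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.pos_bias = nn.Parameter(torch.zeros(r, r))   # stand-in for the position code M_ij

    def forward(self, M):                 # (batch, r, m, sent_dim): m sentences per perspective
        b, r, m, d = M.shape
        out, _ = self.bilstm(M.reshape(b * r, m, d))
        h = out[:, -1, :].reshape(b, r, -1)               # h_i: summary of m sentences, view i
        sim = h @ h.transpose(1, 2) + self.pos_bias       # assumed f(h_i, h_j): dot product + bias
        A_s = F.softmax(sim, dim=-1)                      # weight matrix of sentence interrelations
        return A_s @ h                                    # global feature matrix F = A_s H_s

# hypothetical usage: batch 4, r=8 perspectives, m=10 sentences, 256-dim sentence vectors
F_global = SentenceAttention()(torch.randn(4, 8, 10, 256))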
Example 2:
Example 2 of the present disclosure provides a text recognition system based on multilevel semantics, comprising:
a data acquisition module configured to: acquire the text data to be recognized;
a word vector extraction module configured to: extract the words of the text data to obtain a word vector for each word;
a local sentence semantic representation acquisition module configured to: obtain feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtain word-level local sentence semantic representations under different perspectives by combining a first attention network;
a global sentence semantic representation acquisition module configured to: obtain feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtain sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
a text recognition module configured to: obtain a text recognition result from the obtained global sentence semantic representations.
The working method of the system is the same as that of the text recognition method based on multilevel semantics provided in Example 1 and is not repeated here.
Example 3:
Example 3 of the present disclosure provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, it implements the steps in the text recognition method based on multilevel semantics according to Example 1 of the present disclosure, namely:
acquiring the text data to be recognized;
extracting the words of the text data to obtain a word vector for each word;
obtaining feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtaining word-level local sentence semantic representations under different perspectives by combining a first attention network;
obtaining feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtaining sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
and obtaining a text recognition result from the obtained global sentence semantic representations.
The detailed steps are the same as those of the text recognition method based on multilevel semantics provided in Example 1 and are not repeated here.
Example 4:
Example 4 of the present disclosure provides an electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor; when executing the program, the processor implements the steps in the text recognition method based on multilevel semantics according to Example 1 of the present disclosure, namely:
acquiring the text data to be recognized;
extracting the words of the text data to obtain a word vector for each word;
obtaining feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtaining word-level local sentence semantic representations under different perspectives by combining a first attention network;
obtaining feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtaining sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
and obtaining a text recognition result from the obtained global sentence semantic representations.
The detailed steps are the same as those of the text recognition method based on multilevel semantics provided in Example 1 and are not repeated here.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. A text recognition method based on multilevel semantics is characterized in that: the method comprises the following steps:
acquiring the text data to be recognized;
extracting the words of the text data to obtain a word vector for each word;
obtaining feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtaining word-level local sentence semantic representations under different perspectives by combining a first attention network;
obtaining feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtaining sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
and obtaining a text recognition result from the obtained global sentence semantic representations.
2. The method for text recognition based on multilevel semantics of claim 1, wherein:
word embedding is carried out using a Skip-gram model to obtain the word vector of each word.
3. The method for text recognition based on multilevel semantics of claim 1, wherein:
in the first bidirectional long short-term memory network and the second bidirectional long short-term memory network, each partial feature vector is generated by concatenating the forward long short-term memory network and the backward long short-term memory network.
4. The method for text recognition based on multilevel semantics of claim 1, wherein:
the first attention network describes the weights of the words in a sentence as a two-dimensional weight matrix, and different rows of the matrix represent information from different perspectives of the sentence.
5. The method for text recognition based on multilevel semantics of claim 4, wherein:
the word-level sentence semantic representations under different perspectives are the product of the feature representation of the words and the weight matrix of the first attention network, and the weight matrix is subject to the constraint:

P = ||A·A^T − I||_F²

wherein A is the weight matrix, I is the identity matrix, and ||·||_F denotes the Frobenius norm.
6. The method for text recognition based on multilevel semantics of claim 4, wherein:
dropout is applied to the word-level local sentence semantic representations and the feature representations of the words to mitigate the influence of overfitting.
7. The method for text recognition based on multilevel semantics of claim 1, wherein:
according to the feature representations of the sentences, a second attention network is used to obtain a weight matrix representing the interrelations of the sentences under multiple perspectives, and the weight matrix is multiplied by the sentence features to obtain the global sentence semantic representation matrix of the text.
8. A text recognition system based on multilevel semantics is characterized in that: the method comprises the following steps:
a data acquisition module configured to: acquire the text data to be recognized;
a word vector extraction module configured to: extract the words of the text data to obtain a word vector for each word;
a local sentence semantic representation acquisition module configured to: obtain feature representations of the words from the obtained word vectors and a first bidirectional long short-term memory network, and obtain word-level local sentence semantic representations under different perspectives by combining a first attention network;
a global sentence semantic representation acquisition module configured to: obtain feature representations of the sentences from the obtained word-level sentence semantic representations under different perspectives and a second bidirectional long short-term memory network, and obtain sentence-level global sentence semantic representations under different perspectives by combining a second attention network;
a text recognition module configured to: obtain a text recognition result from the obtained global sentence semantic representations.
9. A computer-readable storage medium, on which a program is stored, which, when being executed by a processor, carries out the steps of the method for multilevel semantic based text recognition according to any of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for text recognition based on multilevel semantics according to any one of claims 1 to 7 when executing the program.
CN202111094473.9A 2021-09-17 2021-09-17 Text recognition method and system based on multilevel semantics Pending CN113836910A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111094473.9A CN113836910A (en) 2021-09-17 2021-09-17 Text recognition method and system based on multilevel semantics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111094473.9A CN113836910A (en) 2021-09-17 2021-09-17 Text recognition method and system based on multilevel semantics

Publications (1)

Publication Number Publication Date
CN113836910A true CN113836910A (en) 2021-12-24

Family

ID=78959949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111094473.9A Pending CN113836910A (en) 2021-09-17 2021-09-17 Text recognition method and system based on multilevel semantics

Country Status (1)

Country Link
CN (1) CN113836910A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558487A (en) * 2018-11-06 2019-04-02 华南师范大学 Document Classification Method based on the more attention networks of hierarchy
CN109902293A (en) * 2019-01-30 2019-06-18 华南理工大学 A kind of file classification method based on part with global mutually attention mechanism
CN110162636A (en) * 2019-05-30 2019-08-23 中森云链(成都)科技有限责任公司 Text mood reason recognition methods based on D-LSTM
WO2020253052A1 (en) * 2019-06-18 2020-12-24 平安普惠企业管理有限公司 Behavior recognition method based on natural semantic understanding, and related device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116151241A (en) * 2023-04-19 2023-05-23 湖南马栏山视频先进技术研究院有限公司 Entity identification method and device

Similar Documents

Publication Publication Date Title
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN108733792B (en) Entity relation extraction method
Zhou et al. End-to-end learning of semantic role labeling using recurrent neural networks
CN108549658B (en) Deep learning video question-answering method and system based on attention mechanism on syntax analysis tree
CN111950287B (en) Entity identification method based on text and related device
CN111274800A (en) Inference type reading understanding method based on relational graph convolution network
CN110110318B (en) Text steganography detection method and system based on cyclic neural network
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN111400494B (en) Emotion analysis method based on GCN-Attention
CN111046178A (en) Text sequence generation method and system
CN113435211A (en) Text implicit emotion analysis method combined with external knowledge
CN113705196A (en) Chinese open information extraction method and device based on graph neural network
Li et al. Image describing based on bidirectional LSTM and improved sequence sampling
CN113553418A (en) Visual dialog generation method and device based on multi-modal learning
CN115687609A (en) Zero sample relation extraction method based on Prompt multi-template fusion
CN111444328A (en) Natural language automatic prediction inference method with interpretation generation
CN111339256A (en) Method and device for text processing
CN113836910A (en) Text recognition method and system based on multilevel semantics
Prabhakar et al. Performance analysis of hybrid deep learning models with attention mechanism positioning and focal loss for text classification
CN114065769B (en) Method, device, equipment and medium for training emotion reason pair extraction model
CN116629361A (en) Knowledge reasoning method based on ontology learning and attention mechanism
CN112131879A (en) Relationship extraction system, method and device
CN115495579A (en) Method and device for classifying text of 5G communication assistant, electronic equipment and storage medium
CN115456173A (en) Generalized artificial neural network unsupervised local learning method, system and application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination