CN116483995A - Text recognition method and device - Google Patents


Info

Publication number
CN116483995A
CN116483995A
Authority
CN
China
Prior art keywords
intention
text
recognition
knowledge
slot filling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310262350.4A
Other languages
Chinese (zh)
Inventor
王哲
陈子骁
庄光庭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Avatr Technology Chongqing Co Ltd
Original Assignee
Avatr Technology Chongqing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Avatr Technology Chongqing Co Ltd filed Critical Avatr Technology Chongqing Co Ltd
Priority to CN202310262350.4A priority Critical patent/CN116483995A/en
Publication of CN116483995A publication Critical patent/CN116483995A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiment of the invention relates to the technical field of text recognition and discloses a text recognition method and a text recognition device. The text recognition method comprises the following steps: acquiring a text to be recognized, and extracting slot filling features and intention recognition features of the text to be recognized; then, based on a cross attention mechanism, carrying out semantic information fusion on the slot filling features and the intention recognition features to obtain a slot filling stream and an intention recognition stream; and finally, acquiring classification information and entity information of the text to be recognized based on the intention recognition stream and the slot filling stream respectively. By applying the technical scheme of the invention, semantic information fusion is carried out on the slot filling features and the intention recognition features, so that the intention recognition stream and the slot filling stream can each use the other's semantic information, which promotes the accuracy of the prediction reasoning of both streams and improves the fault tolerance of text recognition.

Description

Text recognition method and device
Technical Field
The embodiment of the invention relates to the technical field of text recognition, in particular to a text recognition method and device.
Background
At present, voice assistants and intelligent chat robots are widely applied in various fields and provide a more convenient way for human-computer interaction: the robot makes judgments according to the semantics of the input text, thereby fulfilling the user's requirements and facilitating people's production and daily life. The way the robot understands and parses the semantics of human input is natural language understanding (Natural Language Understanding, NLU).
Conventional NLU models typically connect the two tasks by a pipeline (linear communication model), which propagates errors of the upstream task to the downstream task, resulting in error accumulation. For example, if a user inputs "I want to go to the nearest movie theater to watch movie A" and the upstream intention recognition erroneously recognizes the user intention as "listen to music" instead of "watch movie A", the downstream slot filling will, following the erroneous "listen to music" intention, attempt to parse entities such as "musician" and "song" of the "listen to music" semantic slots from the input text. Obviously, this attempt is futile. Moreover, implementation of this approach relies on high precision of the upstream task and on hand-designed semantic slot rules for each intention, lacking flexibility and heuristic capability.
Disclosure of Invention
In view of the above problems, an embodiment of the present invention provides a text recognition method and apparatus, which are used to solve the problem in the prior art that a conventional NLU model connects the two tasks in a pipeline (linear communication model) manner, so that errors of an upstream task propagate to a downstream task and accumulate.
According to an aspect of an embodiment of the present invention, there is provided a text recognition method including:
Acquiring a text to be identified;
extracting slot filling features and intention recognition features of the text to be recognized;
based on a cross attention mechanism, carrying out semantic information fusion on the slot filling features and the intention recognition features to obtain a slot filling stream fused with intention knowledge and an intention recognition stream fused with entity knowledge;
acquiring classification information of the text to be identified based on the intention identification flow of the fusion entity knowledge;
and acquiring entity information of the text to be identified based on the slot filling flow of the fusion intention knowledge.
In an optional manner, the step of obtaining the classification information of the text to be identified based on the intention recognition flow of the fusion entity knowledge includes:
decoding the intention recognition flow of the fused entity knowledge by using a multi-layer perceptron to obtain a sequence set of intention labels corresponding to the intention recognition flow of the fused entity knowledge;
and acquiring the intention label with the largest vote in the sequence set of the intention labels by utilizing a voting mechanism, wherein the intention label with the largest vote is the classification information of the text to be identified.
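The voting step above can be sketched in a few lines. This is an illustrative sketch only, not the patent's implementation; the label names and the token-level predictions are hypothetical placeholders for whatever the multi-layer perceptron would actually emit:

```python
from collections import Counter

# Hypothetical token-level intent labels emitted by the MLP decoder,
# one label per token-level word vector of the input text.
token_intents = ["PlayMusic", "PlayMusic", "GetWeather", "PlayMusic", "PlayMusic"]

# Majority vote: the intent label with the most votes becomes the
# classification information for the whole text.
votes = Counter(token_intents)
sentence_intent = votes.most_common(1)[0][0]
print(sentence_intent)  # PlayMusic
```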
In an optional manner, the step of obtaining the entity information of the text to be identified based on the slot filling flow of the fusion intention knowledge includes:
Decoding a plurality of labels of the slot filling flow fusing the intention knowledge by using a conditional random field model to obtain a sequence set of entity class labels corresponding to the slot filling flow fusing the intention knowledge;
searching an optimal solution of the sequence set of the entity class labels, wherein the optimal solution is entity information of the text to be identified.
In an optional manner, the step of extracting the slot filling feature and the intention recognition feature of the text to be recognized includes:
converting the text to be identified into an embedded representation;
and obtaining the slot filling characteristic and the intention recognition characteristic based on the embedded representation and the sine and cosine position codes.
In an optional manner, the step of decoding the intent recognition stream of the fused entity knowledge by using the multi-layer perceptron to obtain the sequence set of intent labels corresponding to the intent recognition stream of the fused entity knowledge includes:
inputting the intention recognition flow of the fused entity knowledge into a multi-layer perceptron, and outputting an intention label corresponding to a word vector of each mark level in the intention recognition flow of the fused entity knowledge by an output end of the multi-layer perceptron;
and establishing a sequence set of intention labels corresponding to the intention recognition flow of the fused entity knowledge according to the intention labels corresponding to the word vectors of each mark level.
In an optional manner, the step of searching for the optimal solution of the sequence set of entity class labels includes:
and searching the optimal solution of the sequence set of the entity class labels by adopting a Viterbi algorithm.
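The Viterbi search over the label sequence can be sketched as a small dynamic program. This is a minimal illustration under assumed inputs (per-token emission scores and label-to-label transition scores standing in for what the conditional random field model would supply), not the patent's actual decoder:

```python
import numpy as np

def viterbi(emissions: np.ndarray, transitions: np.ndarray) -> list:
    """Best label sequence given token scores and label-transition scores.

    emissions: (n_tokens, n_labels) score of each label at each token.
    transitions: (n_labels, n_labels) score of moving from label i to j.
    """
    n_tokens, n_labels = emissions.shape
    score = emissions[0].copy()                  # best score ending in each label
    back = np.zeros((n_tokens, n_labels), dtype=int)
    for t in range(1, n_tokens):
        # candidate[i, j] = best path ending in i, then transition i->j, then emit j
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # trace back the optimal solution
    path = [int(score.argmax())]
    for t in range(n_tokens - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```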
In an alternative way, the same encoder is used to extract both the slot filling features and the intent recognition features of the text to be recognized.
According to another aspect of an embodiment of the present invention, there is provided a text recognition apparatus including:
the acquisition module is used for acquiring the text to be identified;
the extraction module is used for extracting slot filling features and intention recognition features of the text to be recognized;
the fusion module is used for carrying out semantic information fusion on the slot filling features and the intention recognition features based on a cross attention mechanism to obtain a slot filling stream fused with intention knowledge and an intention recognition stream fused with entity knowledge;
the intention recognition reasoning module is used for acquiring the classification information of the text to be recognized based on the intention recognition flow of the fusion entity knowledge;
and the slot filling reasoning module is used for acquiring the entity information of the text to be identified based on the slot filling flow of the fusion intention knowledge.
In an optional manner, the intent recognition reasoning module is further configured to decode the intent recognition stream of the fused entity knowledge by using a multi-layer perceptron, and obtain a sequence set of intent labels corresponding to the intent recognition stream of the fused entity knowledge;
and acquiring the intention label with the largest vote in the sequence set of the intention labels by utilizing a voting mechanism, wherein the intention label with the largest vote is the classification information of the text to be identified.
In an optional manner, the slot filling inference module is further configured to decode a plurality of labels of the slot filling stream of the fused intention knowledge by using a conditional random field model, to obtain a sequence set of entity class labels corresponding to the slot filling stream of the fused intention knowledge;
searching an optimal solution of the sequence set of the entity class labels, wherein the optimal solution is entity information of the text to be identified.
In an alternative manner, the extracting module is further configured to convert the text to be identified into an embedded representation;
and obtaining the slot filling characteristic and the intention recognition characteristic based on the embedded representation and the sine and cosine position codes.
In an optional manner, the intent recognition reasoning module is further configured to input the intent recognition stream of the fused entity knowledge to a multi-layer perceptron, and an output end of the multi-layer perceptron outputs an intent label corresponding to a word vector of each mark level in the intent recognition stream of the fused entity knowledge;
And establishing a sequence set of intention labels corresponding to the intention recognition flow of the fused entity knowledge according to the intention labels corresponding to the word vectors of each mark level.
In an optional manner, the slot filling inference module is further configured to search for an optimal solution of the sequence set of entity class labels using a viterbi algorithm.
According to another aspect of an embodiment of the present invention, there is provided a text recognition apparatus including: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is configured to hold at least one executable instruction that causes the processor to:
acquiring a text to be identified;
extracting slot filling features and intention recognition features of the text to be recognized;
based on a cross attention mechanism, carrying out semantic information fusion on the slot filling features and the intention recognition features to obtain a slot filling stream fused with intention knowledge and an intention recognition stream fused with entity knowledge;
acquiring classification information of the text to be identified based on the intention identification flow of the fusion entity knowledge;
And acquiring entity information of the text to be identified based on the slot filling flow of the fusion intention knowledge.
In an optional manner, the step of obtaining the classification information of the text to be identified based on the intention recognition flow of the fusion entity knowledge includes:
decoding the intention recognition flow of the fused entity knowledge by using a multi-layer perceptron to obtain a sequence set of intention labels corresponding to the intention recognition flow of the fused entity knowledge;
and acquiring the intention label with the largest vote in the sequence set of the intention labels by utilizing a voting mechanism, wherein the intention label with the largest vote is the classification information of the text to be identified.
In an optional manner, the step of obtaining the entity information of the text to be identified based on the slot filling flow of the fusion intention knowledge includes:
decoding a plurality of labels of the slot filling flow fusing the intention knowledge by using a conditional random field model to obtain a sequence set of entity class labels corresponding to the slot filling flow fusing the intention knowledge;
searching an optimal solution of the sequence set of the entity class labels, wherein the optimal solution is entity information of the text to be identified.
In an optional manner, the step of extracting the slot filling feature and the intention recognition feature of the text to be recognized includes:
converting the text to be identified into an embedded representation;
and obtaining the slot filling characteristic and the intention recognition characteristic based on the embedded representation and the sine and cosine position codes.
In an optional manner, the step of decoding the intent recognition stream of the fused entity knowledge by using the multi-layer perceptron to obtain the sequence set of intent labels corresponding to the intent recognition stream of the fused entity knowledge includes:
inputting the intention recognition flow of the fused entity knowledge into a multi-layer perceptron, and outputting an intention label corresponding to a word vector of each mark level in the intention recognition flow of the fused entity knowledge by an output end of the multi-layer perceptron;
and establishing a sequence set of intention labels corresponding to the intention recognition flow of the fused entity knowledge according to the intention labels corresponding to the word vectors of each mark level.
In an optional manner, the step of searching for the optimal solution of the sequence set of entity class labels includes:
and searching the optimal solution of the sequence set of the entity class labels by adopting a Viterbi algorithm.
In an alternative way, the same encoder is used to extract both the slot filling features and the intent recognition features of the text to be recognized.
According to yet another aspect of an embodiment of the present invention, there is provided a computer-readable storage medium having stored therein at least one executable instruction for causing a text recognition apparatus/device to:
acquiring a text to be identified;
extracting slot filling features and intention recognition features of the text to be recognized;
based on a cross attention mechanism, carrying out semantic information fusion on the slot filling features and the intention recognition features to obtain a slot filling stream fused with intention knowledge and an intention recognition stream fused with entity knowledge;
acquiring classification information of the text to be identified based on the intention identification flow of the fusion entity knowledge;
and acquiring entity information of the text to be identified based on the slot filling flow of the fusion intention knowledge.
In an optional manner, the step of obtaining the classification information of the text to be identified based on the intention recognition flow of the fusion entity knowledge includes:
decoding the intention recognition flow of the fused entity knowledge by using a multi-layer perceptron to obtain a sequence set of intention labels corresponding to the intention recognition flow of the fused entity knowledge;
And acquiring the intention label with the largest vote in the sequence set of the intention labels by utilizing a voting mechanism, wherein the intention label with the largest vote is the classification information of the text to be identified.
In an optional manner, the step of obtaining the entity information of the text to be identified based on the slot filling flow of the fusion intention knowledge includes:
decoding a plurality of labels of the slot filling flow fusing the intention knowledge by using a conditional random field model to obtain a sequence set of entity class labels corresponding to the slot filling flow fusing the intention knowledge;
searching an optimal solution of the sequence set of the entity class labels, wherein the optimal solution is entity information of the text to be identified.
In an optional manner, the step of extracting the slot filling feature and the intention recognition feature of the text to be recognized includes:
converting the text to be identified into an embedded representation;
and obtaining the slot filling characteristic and the intention recognition characteristic based on the embedded representation and the sine and cosine position codes.
In an optional manner, the step of decoding the intent recognition stream of the fused entity knowledge by using the multi-layer perceptron to obtain the sequence set of intent labels corresponding to the intent recognition stream of the fused entity knowledge includes:
Inputting the intention recognition flow of the fused entity knowledge into a multi-layer perceptron, and outputting an intention label corresponding to a word vector of each mark level in the intention recognition flow of the fused entity knowledge by an output end of the multi-layer perceptron;
and establishing a sequence set of intention labels corresponding to the intention recognition flow of the fused entity knowledge according to the intention labels corresponding to the word vectors of each mark level.
In an optional manner, the step of searching for the optimal solution of the sequence set of entity class labels includes:
and searching the optimal solution of the sequence set of the entity class labels by adopting a Viterbi algorithm.
In an alternative way, the same encoder is used to extract both the slot filling features and the intent recognition features of the text to be recognized.
According to the embodiment of the invention, the text to be identified is obtained, and the slot filling characteristics and the intention identification characteristics of the text to be identified are extracted; then, based on a cross attention mechanism, carrying out semantic information fusion on the slot filling features and the intention recognition features to obtain a slot filling stream fused with intention knowledge and an intention recognition stream fused with entity knowledge; and finally, acquiring the classification information of the text to be identified and the entity information of the text to be identified based on the intention identification flow of the fused entity knowledge and the slot filling flow of the fused intention knowledge respectively. By applying the technical scheme of the invention, the semantic information fusion is carried out on the slot filling features and the intention recognition features, so that the intention recognition flow of the fused entity knowledge and the slot filling flow of the fused intention knowledge can use the semantic information of the other party, thereby simultaneously promoting the accuracy of the intention recognition flow of the fused entity knowledge and the prediction reasoning of the slot filling flow of the fused intention knowledge and improving the fault tolerance of text recognition.
The foregoing description is only an overview of the technical solutions of the embodiments of the present invention, and may be implemented according to the content of the specification, so that the technical means of the embodiments of the present invention can be more clearly understood, and the following specific embodiments of the present invention are given for clarity and understanding.
Drawings
The drawings are only for purposes of illustrating embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
fig. 1 is a schematic flow chart of an embodiment of a text recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an overall framework of a joint learning architecture provided by an embodiment of the present invention;
FIG. 3 illustrates a scaled dot product attention schematic provided by an embodiment of the present invention;
FIG. 4 shows a schematic Cross-Transformer structure based on cross attention provided by an embodiment of the present invention;
FIG. 5 is a diagram showing a comparison of a joint learning method and a stack propagation method of a common encoder according to the present invention;
fig. 6 is a schematic structural diagram of an embodiment of a text recognition device according to an embodiment of the present invention;
Fig. 7 shows a schematic structural diagram of a text recognition device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein.
Fig. 1 shows a flowchart of an embodiment of a text recognition method provided by an embodiment of the present invention, which is performed by a text recognition device. As shown in fig. 1, the text recognition method includes the steps of:
step 110: and acquiring a text to be identified.
The text to be recognized may be content directly input by the user in text form, or content input by the user in voice form and converted into text form by a speech recognition engine.
Step 120: and extracting the slot filling characteristics and the intention recognition characteristics of the text to be recognized.
For example, as shown in fig. 2, for the user's input text the text recognition device derives two information streams, a slot filling stream and an intention recognition stream, and transforms the input text into an embedded representation of distributed vectors through a Token Embedding module (which maps each token to a vector of fixed dimension), so that the data can be modeled in vector space. For example, the Token Embedding module converts each word of the input text into a 768-dimensional vector representation. Take the input text "I like strawberries": it is tokenized before being fed into the encoder layers, and two special tokens are inserted at the beginning ([CLS]) and end ([SEP]) of the tokenization result. The tokenization process uses WordPiece tokenization (a sub-word level tokenization algorithm), yielding: [CLS], I, like, straw, ##berries, [SEP], i.e. 6 tokens. The Token Embedding module converts each WordPiece token into a 768-dimensional vector, so the 6 tokens in the example are converted into a matrix of size (6, 768), or (1, 6, 768) with a batch dimension.
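The tokenize-then-embed step can be sketched as follows. This is an illustrative sketch only: the token list is the WordPiece result described above, while the vocabulary and the randomly initialized embedding table are hypothetical stand-ins for the learned Token Embedding module:

```python
import numpy as np

# WordPiece-style tokens for "I like strawberries", wrapped in [CLS]/[SEP].
tokens = ["[CLS]", "I", "like", "straw", "##berries", "[SEP]"]
d_model = 768

rng = np.random.default_rng(0)
vocab = {tok: i for i, tok in enumerate(tokens)}          # toy vocabulary
embedding_table = rng.normal(size=(len(vocab), d_model))  # learned in practice

ids = np.array([vocab[t] for t in tokens])
embeddings = embedding_table[ids][np.newaxis, :, :]  # add batch dimension

print(embeddings.shape)  # (1, 6, 768)
```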
The embedded representation generated for the input text is e0, e1, e2, e3, …; the embedded representations at even vector positions are added to the sine position codes, and those at odd vector positions are added to the cosine position codes, producing a sequence representation with position information.
The sine position coding formula is:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
The cosine position coding formula is:
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where PE is the Position Encoding matrix, pos is the position of the token in the input text, the even index positions 2i are calculated with the sin function, the odd index positions 2i+1 are calculated with the cos function, and d_model is the input size of the encoder/decoder used.
For example, when the embedded representation generated from the input text is a 4-dimensional vector, it contains 4 vector positions: 0 (even), 1 (odd), 2 (even) and 3 (odd). For even positions, the sine formula is used to obtain the corresponding position code, which is added to the content vector of the embedded representation at that position; for odd positions, the cosine formula is used, and the resulting position code is added to the content vector at that position. In both cases the result is a sequence-input vector carrying position information (the content vector plus the position code vector).
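The sine/cosine position coding described above can be sketched directly from the formulas. This is a minimal illustration (the zero-valued embedded representation is a placeholder for real content vectors):

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal position codes: sin on even indices 2i, cos on odd indices 2i+1."""
    pos = np.arange(seq_len)[:, np.newaxis]        # (seq_len, 1)
    i = np.arange(d_model // 2)[np.newaxis, :]     # (1, d_model/2)
    angle = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)   # even vector positions
    pe[:, 1::2] = np.cos(angle)   # odd vector positions
    return pe

# Add the position codes to a (placeholder) 4-token embedded representation.
embeddings = np.zeros((4, 8))
with_pos = embeddings + positional_encoding(4, 8)
```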
Then, the sequence with the position information is input to a Transformer Encoder coding module, which encodes the sequence representation to generate slot filling features (Slot Filling) and intention recognition features (Intent Detection) rich in contextual semantic knowledge. Through the introduction of position coding, the text recognition method fully considers the context semantics of the input text.
Step 130: and based on a cross attention mechanism, carrying out semantic information fusion on the slot filling features and the intention recognition features to obtain a slot filling stream fused with intention knowledge and an intention recognition stream fused with entity knowledge.
The invention aims to solve the error propagation problem of the traditional NLU pipeline method and to improve on general joint learning methods, which either only shallowly fuse the semantic representations of intention recognition and slot filling, or only loosely combine the objective functions of the two tasks. To this end, a joint learning architecture for intention recognition and slot filling based on cross attention is provided; fig. 2 is a schematic diagram of the overall framework of the joint learning architecture.
Attention is an important technique in the field of deep learning. It can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values and output are all vectors; the output is calculated as a weighted sum of the values, with the weight assigned to each value computed by a compatibility function of the query with the corresponding key. A scaled dot-product attention diagram is shown in fig. 3.
Q, K and V correspond to the query matrix, the key matrix and the value matrix, respectively. The weights of the corresponding vectors, i.e. the attention scores, are calculated by dot-product matching, and the values are then weighted and summed according to these attention scores.
The slot filling features and the intention recognition features are each linearly mapped to obtain a first matrix Q, a first matrix K and a first matrix V for the slot filling features, and a second matrix Q, a second matrix K and a second matrix V for the intention recognition features.
When a slot filling flow fusing intention knowledge is obtained, firstly, a first key value pair of a slot filling characteristic is determined based on the first matrix K and the first matrix V, and then, the attention weight of intention recognition is determined based on the second matrix Q and the first key value pair; finally, based on the attention weight of intention recognition and the first matrix V, obtaining a slot filling flow fusing intention knowledge;
when the intention recognition flow of the fusion entity knowledge is obtained, firstly, a second key value pair of the intention recognition feature is determined based on the second matrix K and the second matrix V, and then, the attention weight of slot filling is determined based on the first matrix Q and the second key value pair; finally, based on the attention weight of the slot filling and the second matrix V, an intention recognition flow of the fusion entity knowledge is obtained.
Unlike the self-attention mechanism (where Q, K and V all come from linear mappings of the same sequence data, i.e. a sequence attends to itself), cross attention matches the query of one sequence against the key-value pairs of another sequence, thereby completing the information fusion. Fig. 4 shows a Cross-Transformer structure diagram based on cross attention.
For two different sequence input representations, each is linearly mapped to generate a query matrix Q, a key matrix K and a value matrix V. The query of one party is matched against the key-value pairs of the other party to compute a weight matrix representing the attention of one party to the other; the other party's values are then weighted and summed to generate the cross attention result. In the Cross-Transformer structure, the output attention result (multi-head attention) further undergoes residual connection, layer normalization, a feedforward neural network, and a second residual connection and layer normalization, so that both attention parties (the slot filling semantic representation fused with intention knowledge, and the intention semantic representation fused with entity knowledge) incorporate the context information of the other party, thereby guiding each other's signal output.
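The core fusion step can be sketched as follows: each stream's query attends to the other stream's key-value pairs. This is an illustrative single-head sketch under assumed shapes, with random features and random projection matrices standing in for the encoder outputs and learned mappings; residual connections, layer normalization and the feedforward network of the Cross-Transformer are omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d = 6, 16

# Hypothetical encoder outputs for the two streams.
slot_feats = rng.normal(size=(seq_len, d))    # slot filling features
intent_feats = rng.normal(size=(seq_len, d))  # intention recognition features

# Separate linear mappings per stream (learned in practice).
Wq_s, Wk_s, Wv_s = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
Wq_i, Wk_i, Wv_i = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

def attend(Q, K, V):
    w = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(w - w.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V

# Slot filling stream fused with intention knowledge:
# the intention query matches the slot stream's key-value pairs.
slot_fused = attend(intent_feats @ Wq_i, slot_feats @ Wk_s, slot_feats @ Wv_s)

# Intention recognition stream fused with entity knowledge:
# the slot query matches the intention stream's key-value pairs.
intent_fused = attend(slot_feats @ Wq_s, intent_feats @ Wk_i, intent_feats @ Wv_i)
```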
Another core idea of the text recognition method provided by the embodiment of the invention is joint learning based on a cross-attention mechanism. Joint learning is an important technique in the field of deep learning: two strongly correlated tasks are trained together in association, so that the accuracy and consistency of the model improve jointly. There are roughly three kinds of joint learning methods in the NLU field:
The first joint learning method simply combines the two tasks at the level of the objective function: the objective functions of the intention recognition flow fusing entity knowledge and of the slot filling flow fusing intention knowledge are added together, and no signal is shared or transmitted between the two models. Such joint learning without feature fusion merely trains two tasks side by side and lacks the essence of a joint model.
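As a minimal illustration of this first method, the "combination" amounts to nothing more than adding two scalar objectives; the loss values and the weighting factor below are hypothetical:

```python
# naive joint learning at the objective-function level only:
# the two task losses are added, and no features or signals are shared
def joint_loss(intent_loss, slot_loss, alpha=0.5):
    """Weighted sum of the two objectives; alpha is a hypothetical trade-off weight."""
    return alpha * intent_loss + (1 - alpha) * slot_loss

print(joint_loss(0.8, 0.4))
```

Gradients from this sum still flow into two independent models, which is why the text calls it joint training without the essence of jointness.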
Fig. 5 is a schematic diagram comparing a shared-encoder joint learning method with a stack propagation method according to an embodiment of the present invention.
The second joint learning method uses a shared encoder: the information of the two tasks is deeply fused, but because the two tasks differ in objective and data input while sharing one encoder, the method lacks interpretability.
The third joint learning method is stack propagation (Stack-Propagation): the intention recognition flow fusing entity knowledge and the slot filling flow fusing intention knowledge share one encoder, and the intention recognition result is used as guidance to explicitly transmit semantic information to slot filling. This greatly improves the interpretability of the stack propagation structure, but the structure demands a sufficiently powerful encoder; when the encoder capability is limited, the approach cannot realize its full potential.
It should be noted that the slot filling feature and the intention recognition feature of the text to be recognized are obtained with the same coding mode. Specifically, based on the idea of cross-attention, the invention completely separates the slot filling flow and the intention recognition flow; in the semantic fusion stage, the attention mechanism lets each party heuristically extract the semantic information of the other, and the same coding mode (sine and cosine position coding) is used to obtain slot filling features and intention recognition features rich in contextual semantic knowledge.
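The sine and cosine position coding mentioned here is presumably the standard sinusoidal encoding from the Transformer literature; under that assumption, a minimal sketch (sequence length and dimension are hypothetical):

```python
import numpy as np

def sinusoidal_position_encoding(seq_len, d_model):
    """Standard sine/cosine position encoding: even dims use sin, odd dims use cos."""
    pos = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]                # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)   # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# add position information to a (hypothetical) embedding of a 6-token text
embeddings = np.zeros((6, 16))
with_pos = embeddings + sinusoidal_position_encoding(6, 16)
print(with_pos.shape)  # (6, 16)
```

Because the encoding is deterministic, applying the same function to both streams gives the two feature sets identical positional grounding before the cross-attention fusion.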
The text recognition method provided by the embodiment of the invention thus provides a new NLU joint learning architecture: the idea of cross-attention completely separates the two information streams (the intention recognition stream fusing entity knowledge and the slot filling stream fusing intention knowledge), while in the semantic fusion stage the attention mechanism lets each party heuristically extract the semantic information of the other. Encoder and decoder are completely separated, yet the semantic features are deeply fused, forming an interpretable new NLU joint learning architecture in which the two streams promote each other; by improving the interpretability and the depth of the joint modeling, the efficiency and accuracy of the architecture are improved.
Step 140: acquiring the classification information of the text to be recognized based on the intention recognition flow fusing entity knowledge.
Specifically, the conventional intention recognition task can be regarded as a sentence-level text classification task: for an input sentence, its intent category is output. At the output decoding layer, the intention recognition flow fusing entity knowledge is first input to a multi-layer perceptron, whose output end produces an intention label for the word vector at each token position in the flow. These token-level intention labels are then collected into a sequence set of intention labels corresponding to the intention recognition flow fusing entity knowledge. During prediction and inference, a voting mechanism is applied over this sequence set, and the intention label with the most votes is selected as the intention label of the intention recognition flow fusing entity knowledge. In other words, the scheme converts intention recognition from a conventional sentence-level text classification task into a task over multiple token-level (token-level) predictions, and the voting mechanism used in the inference stage improves the fault tolerance of the text recognition method.
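The token-level voting in the inference stage can be illustrated with a short sketch; the label names are hypothetical examples, not labels from the patent:

```python
from collections import Counter

def vote_intent(token_intent_labels):
    """Pick the intent label predicted at the most token positions (majority vote)."""
    return Counter(token_intent_labels).most_common(1)[0][0]

# hypothetical per-token intent predictions for one sentence:
# one token was misclassified, but the vote still recovers the right intent
labels = ["play_music", "play_music", "set_alarm", "play_music", "play_music"]
print(vote_intent(labels))  # play_music
```

This is the source of the claimed fault tolerance: a single mislabeled token no longer flips the sentence-level decision.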
Step 150: acquiring the entity information of the text to be recognized based on the slot filling flow fusing intention knowledge.
Specifically, in the decoding stage, because the slot filling flow fusing intention knowledge is a sequence labeling task, the dependency relationships among labels must be fully considered. The text recognition method provided by the embodiment of the invention therefore uses a conditional random field model for output decoding: a label prediction is made for each input token of the slot filling flow fusing intention knowledge and the corresponding entity class label is determined, yielding a sequence set of entity class labels corresponding to the flow. During prediction and inference, to reduce algorithm complexity, the Viterbi algorithm is used to search the sequence set of entity class labels for an optimal solution, which serves as the entity information of the text to be recognized.
Specifically, an optimal search path in the sequence set of entity class labels is first determined based on the Viterbi algorithm; then, based on the optimal search path, an optimal solution is determined from the sequence set of entity class labels. The Viterbi algorithm is a dynamic programming algorithm for finding the Viterbi path most likely to have generated the observed event sequence, and it determines the optimal search path according to three principles: 1. if the optimal search path passes through point M of the network, the sub-path from the start point to M must itself be the optimal path from the start point to M; 2. assuming there are k states at time i, there are k optimal paths from the start to those k states, and the final optimal path must pass through one of them; 3. when computing the shortest path for state i+1, only the optimal paths from the start to the current k state values and the transitions from the current state values to state i+1 need be considered. For example, the optimal path at t=3 equals the optimal path to each state node at t=2 plus the best step from t=2 to t=3.
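The principles above can be sketched as a minimal Viterbi decoder over token-level scores. The emission and transition scores here are hypothetical toy numbers; a real CRF decoder would use its learned potentials:

```python
import numpy as np

def viterbi(emissions, transitions):
    """emissions: (T, S) per-token label scores; transitions: (S, S) score of
    moving from label i to label j. Returns the highest-scoring label path."""
    T, S = emissions.shape
    score = emissions[0].copy()               # best path score ending in each label at t=0
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # total[i, j]: best path ending in label i at t-1, then transition i -> j
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]              # best final label
    for t in range(T - 1, 0, -1):             # follow back-pointers to recover the path
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# toy example: 3 tokens, 2 labels; transitions favour staying in the same label,
# so the lone token preferring label 1 is overruled by its neighbours
emissions = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
transitions = np.array([[0.6, 0.0], [0.0, 0.4]])
print(viterbi(emissions, transitions))  # [0, 0, 0]
```

Storing only the best score per label at each step, plus back-pointers, is exactly what keeps the complexity linear in sequence length rather than exponential.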
It should be noted that, depending on the actual application scenario, the text recognition method provided by the embodiment of the present invention may give different feedback content based on the classification information and the entity information of the text to be recognized. For example, for a conversational chat robot, the scheme may further determine the semantic information of the text to be recognized based on its classification information and entity information, and then determine corresponding feedback information based on that semantic information: if the text to be recognized is a question, the conversational chat robot gives a corresponding answer according to the determined semantic information. For another example, for a commodity selling robot, if the text to be recognized specifies a price range for a target commodity, the robot returns commodity information within that price range according to the determined semantic information. By the text recognition method provided by the invention, more accurate text semantic recognition yields a good user experience in different application scenarios.
According to the text recognition method provided by the embodiment of the invention, the text to be recognized is acquired, and its slot filling features and intention recognition features are extracted; then, based on a cross-attention mechanism, semantic information fusion is performed on the slot filling features and the intention recognition features to obtain a slot filling stream fusing intention knowledge and an intention recognition stream fusing entity knowledge; finally, the classification information and the entity information of the text to be recognized are acquired based on the intention recognition stream fusing entity knowledge and the slot filling stream fusing intention knowledge, respectively. By applying the technical scheme of the invention, the semantic information fusion lets each stream use the semantic information of the other, thereby simultaneously improving the accuracy of prediction and inference for both streams and improving the fault tolerance of text recognition.
Fig. 6 shows a schematic structural diagram of an embodiment of a text recognition device according to an embodiment of the present invention. As shown in fig. 6, the text recognition apparatus 600 includes: the acquisition module 610, the extraction module 620, the fusion module 630, the intent recognition inference module 640, and the slot fill inference module 650.
The obtaining module 610 is configured to obtain text to be identified.
The extracting module 620 is configured to extract a slot filling feature and an intention recognition feature of the text to be recognized.
The fusion module 630 is configured to perform semantic information fusion on the slot filling feature and the intention recognition feature based on a cross attention mechanism, so as to obtain a slot filling stream fused with intention knowledge and an intention recognition stream fused with entity knowledge.
The intention recognition reasoning module 640 is configured to obtain classification information of the text to be recognized based on the intention recognition flow of the fused entity knowledge.
The slot filling inference module 650 is configured to obtain entity information of the text to be identified based on the slot filling flow of the fusion intention knowledge.
In an optional manner, the intent recognition inference module 640 is further configured to decode the intent recognition stream of the fused entity knowledge by using a multi-layer perceptron, to obtain a sequence set of intent labels corresponding to the intent recognition stream of the fused entity knowledge.
And acquiring the intention label with the largest vote in the sequence set of the intention labels by utilizing a voting mechanism, wherein the intention label with the largest vote is the classification information of the text to be identified.
In an optional manner, the slot filling inference module 650 is further configured to decode a plurality of labels of the slot filling stream of the fused intention knowledge by using a conditional random field model, to obtain a sequence set of entity class labels corresponding to the slot filling stream of the fused intention knowledge.
Searching an optimal solution of the sequence set of the entity class labels, wherein the optimal solution is entity information of the text to be identified.
In an alternative manner, the extraction module 620 is further configured to convert the text to be recognized into an embedded representation.
And obtaining the slot filling characteristic and the intention recognition characteristic based on the embedded representation and the sine and cosine position codes.
In an optional manner, the intent recognition inference module 640 is further configured to input the intent recognition stream of the fused entity knowledge to a multi-layer perceptron, and an output end of the multi-layer perceptron outputs an intent label corresponding to a word vector of each label level in the intent recognition stream of the fused entity knowledge.
And establishing a sequence set of intention labels corresponding to the intention recognition flow of the fused entity knowledge according to the intention labels corresponding to the word vectors of each mark level.
In an alternative manner, the slot fill inference module 650 is further configured to search for an optimal solution for the sequence set of entity class labels using a viterbi algorithm.
According to the text recognition device provided by the embodiment of the invention, the text to be recognized is acquired, and its slot filling features and intention recognition features are extracted; then, based on a cross-attention mechanism, semantic information fusion is performed on the slot filling features and the intention recognition features to obtain a slot filling stream fusing intention knowledge and an intention recognition stream fusing entity knowledge; finally, the classification information and the entity information of the text to be recognized are acquired based on the intention recognition stream fusing entity knowledge and the slot filling stream fusing intention knowledge, respectively. By applying the technical scheme of the invention, the semantic information fusion lets each stream use the semantic information of the other, thereby simultaneously improving the accuracy of prediction and inference for both streams and improving the fault tolerance of text recognition.
Fig. 7 is a schematic structural diagram of a text recognition device according to an embodiment of the present invention; the specific embodiments of the present invention do not limit the specific implementation of the text recognition device.
As shown in fig. 7, the text recognition device may include: a processor 702, a communication interface (Communications Interface) 704, a memory 706, and a communication bus 708.
The processor 702, the communication interface 704, and the memory 706 communicate with each other via the communication bus 708. The communication interface 704 is used for communicating with network elements of other devices, such as clients or other servers. The processor 702 is configured to execute the program 710, and may specifically perform the relevant steps in the text recognition method embodiments described above.
In particular, program 710 may include program code including computer-executable instructions.
The processor 702 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the text recognition device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 706 is used for storing the program 710. The memory 706 may comprise a high-speed RAM memory, and may further comprise a non-volatile memory, such as at least one disk memory.
Program 710 may be specifically invoked by processor 702 to cause a text recognition device to:
and acquiring a text to be identified.
And extracting the slot filling characteristics and the intention recognition characteristics of the text to be recognized.
And based on a cross attention mechanism, carrying out semantic information fusion on the slot filling features and the intention recognition features to obtain a slot filling stream fused with intention knowledge and an intention recognition stream fused with entity knowledge.
And acquiring the classification information of the text to be identified based on the intention identification flow of the fusion entity knowledge.
And acquiring entity information of the text to be identified based on the slot filling flow of the fusion intention knowledge.
In an optional manner, the step of obtaining the classification information of the text to be identified based on the intention recognition flow of the fusion entity knowledge includes:
and decoding the intention recognition flow of the fused entity knowledge by using a multi-layer perceptron to obtain a sequence set of intention labels corresponding to the intention recognition flow of the fused entity knowledge.
And acquiring the intention label with the largest vote in the sequence set of the intention labels by utilizing a voting mechanism, wherein the intention label with the largest vote is the classification information of the text to be identified.
In an optional manner, the step of obtaining the entity information of the text to be identified based on the slot filling flow of the fusion intention knowledge includes:
and decoding a plurality of labels of the slot filling flow fusing the intention knowledge by using a conditional random field model to obtain a sequence set of entity class labels corresponding to the slot filling flow fusing the intention knowledge.
Searching an optimal solution of the sequence set of the entity class labels, wherein the optimal solution is entity information of the text to be identified.
In an optional manner, the step of extracting the slot filling feature and the intention recognition feature of the text to be recognized includes:
and converting the text to be identified into an embedded representation.
And obtaining the slot filling characteristic and the intention recognition characteristic based on the embedded representation and the sine and cosine position codes.
In an optional manner, the step of decoding the intent recognition stream of the fused entity knowledge by using the multi-layer perceptron to obtain the sequence set of intent labels corresponding to the intent recognition stream of the fused entity knowledge includes:
Inputting the intention recognition flow of the fused entity knowledge into a multi-layer perceptron, and outputting an intention label corresponding to a word vector of each mark level in the intention recognition flow of the fused entity knowledge by an output end of the multi-layer perceptron;
and establishing a sequence set of intention labels corresponding to the intention recognition flow of the fused entity knowledge according to the intention labels corresponding to the word vectors of each mark level.
In an optional manner, the step of searching for the optimal solution of the sequence set of entity class labels includes:
and searching the optimal solution of the sequence set of the entity class labels by adopting a Viterbi algorithm.
In an alternative way, the same encoder is used to extract both the slot filling features and the intent recognition features of the text to be recognized.
In the text recognition device provided by the embodiment of the invention, the processor 702 invokes the stored program 710 to cause the text recognition device to perform the following operations: acquiring the text to be recognized, and extracting its slot filling features and intention recognition features; then, based on a cross-attention mechanism, performing semantic information fusion on the slot filling features and the intention recognition features to obtain a slot filling stream fusing intention knowledge and an intention recognition stream fusing entity knowledge; finally, acquiring the classification information and the entity information of the text to be recognized based on the intention recognition stream fusing entity knowledge and the slot filling stream fusing intention knowledge, respectively. By applying the technical scheme of the invention, the semantic information fusion lets each stream use the semantic information of the other, thereby simultaneously improving the accuracy of prediction and inference for both streams and improving the fault tolerance of text recognition.
An embodiment of the present invention provides a computer readable storage medium storing at least one executable instruction that, when executed on a text recognition device/apparatus, causes the text recognition device/apparatus to perform the text recognition method in any of the method embodiments described above.
The executable instructions may be specifically for causing a text recognition device/apparatus to:
and acquiring a text to be identified.
And extracting the slot filling characteristics and the intention recognition characteristics of the text to be recognized.
And based on a cross attention mechanism, carrying out semantic information fusion on the slot filling features and the intention recognition features to obtain a slot filling stream fused with intention knowledge and an intention recognition stream fused with entity knowledge.
And acquiring the classification information of the text to be identified based on the intention identification flow of the fusion entity knowledge.
And acquiring entity information of the text to be identified based on the slot filling flow of the fusion intention knowledge.
In an optional manner, the step of obtaining the classification information of the text to be identified based on the intention recognition flow of the fusion entity knowledge includes:
and decoding the intention recognition flow of the fused entity knowledge by using a multi-layer perceptron to obtain a sequence set of intention labels corresponding to the intention recognition flow of the fused entity knowledge.
And acquiring the intention label with the largest vote in the sequence set of the intention labels by utilizing a voting mechanism, wherein the intention label with the largest vote is the classification information of the text to be identified.
In an optional manner, the step of obtaining the entity information of the text to be identified based on the slot filling flow of the fusion intention knowledge includes:
and decoding a plurality of labels of the slot filling flow fusing the intention knowledge by using a conditional random field model to obtain a sequence set of entity class labels corresponding to the slot filling flow fusing the intention knowledge.
Searching an optimal solution of the sequence set of the entity class labels, wherein the optimal solution is entity information of the text to be identified.
In an optional manner, the step of extracting the slot filling feature and the intention recognition feature of the text to be recognized includes:
and converting the text to be identified into an embedded representation.
And obtaining the slot filling characteristic and the intention recognition characteristic based on the embedded representation and the sine and cosine position codes.
In an optional manner, the step of decoding the intent recognition stream of the fused entity knowledge by using the multi-layer perceptron to obtain the sequence set of intent labels corresponding to the intent recognition stream of the fused entity knowledge includes:
Inputting the intention recognition flow of the fused entity knowledge into a multi-layer perceptron, and outputting an intention label corresponding to a word vector of each mark level in the intention recognition flow of the fused entity knowledge by an output end of the multi-layer perceptron;
and establishing a sequence set of intention labels corresponding to the intention recognition flow of the fused entity knowledge according to the intention labels corresponding to the word vectors of each mark level.
In an optional manner, the step of searching for the optimal solution of the sequence set of entity class labels includes:
and searching the optimal solution of the sequence set of the entity class labels by adopting a Viterbi algorithm.
In an alternative way, the same encoder is used to extract both the slot filling features and the intent recognition features of the text to be recognized.
The at least one executable instruction stored in the storage medium is operable to cause the text recognition device/apparatus to: acquire the text to be recognized, and extract its slot filling features and intention recognition features; then, based on a cross-attention mechanism, perform semantic information fusion on the slot filling features and the intention recognition features to obtain a slot filling stream fusing intention knowledge and an intention recognition stream fusing entity knowledge; finally, acquire the classification information and the entity information of the text to be recognized based on the intention recognition stream fusing entity knowledge and the slot filling stream fusing intention knowledge, respectively. By applying the technical scheme of the invention, the semantic information fusion lets each stream use the semantic information of the other, thereby simultaneously improving the accuracy of prediction and inference for both streams and improving the fault tolerance of text recognition.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. In addition, embodiments of the present invention are not directed to any particular programming language.
In the description provided herein, numerous specific details are set forth. It will be appreciated, however, that embodiments of the invention may be practiced without these specific details. Similarly, in the above description of exemplary embodiments of the invention, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. The claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component, and they may furthermore be divided into a plurality of sub-modules or sub-units or sub-components, except where at least some of such features and/or processes or elements are mutually exclusive.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated.

Claims (10)

1. A text recognition method, characterized in that the text recognition method comprises:
acquiring a text to be identified;
Extracting slot filling features and intention recognition features of the text to be recognized;
based on a cross attention mechanism, carrying out semantic information fusion on the slot filling features and the intention recognition features to obtain a slot filling stream fused with intention knowledge and an intention recognition stream fused with entity knowledge;
acquiring classification information of the text to be identified based on the intention identification flow of the fusion entity knowledge;
and acquiring entity information of the text to be identified based on the slot filling flow of the fusion intention knowledge.
2. The text recognition method according to claim 1, wherein the step of acquiring the classification information of the text to be recognized based on the intention recognition flow of the fused entity knowledge includes:
decoding the intention recognition flow of the fused entity knowledge by using a multi-layer perceptron to obtain a sequence set of intention labels corresponding to the intention recognition flow of the fused entity knowledge;
and acquiring the intention label with the largest vote in the sequence set of the intention labels by utilizing a voting mechanism, wherein the intention label with the largest vote is the classification information of the text to be identified.
3. The text recognition method according to claim 2, wherein the step of decoding the intention recognition stream fused with entity knowledge by using the multi-layer perceptron to obtain the sequence set of intention labels comprises:
inputting the intention recognition stream fused with entity knowledge into the multi-layer perceptron, and outputting, at an output end of the multi-layer perceptron, an intention label corresponding to each token-level word vector in the intention recognition stream fused with entity knowledge; and
establishing the sequence set of intention labels corresponding to the intention recognition stream fused with entity knowledge according to the intention labels corresponding to the token-level word vectors.
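Claims 2–3 decode a token-level intention label for each word vector and then vote. A minimal sketch of the voting mechanism, assuming the multi-layer perceptron has already emitted one label per token (all labels here are hypothetical):

```python
from collections import Counter

def vote_intent(token_intent_labels):
    # Each token-level intention label casts one vote; the label with the
    # most votes becomes the utterance-level classification information.
    counts = Counter(token_intent_labels)
    label, _votes = counts.most_common(1)[0]
    return label

# Hypothetical per-token labels emitted by the MLP decoder:
print(vote_intent(["navigate", "navigate", "play_music", "navigate"]))  # → navigate
```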
4. The text recognition method according to claim 1, wherein the step of acquiring the entity information of the text to be recognized based on the slot filling stream fused with intention knowledge comprises:
decoding a plurality of labels of the slot filling stream fused with intention knowledge by using a conditional random field model to obtain a sequence set of entity class labels corresponding to the slot filling stream fused with intention knowledge; and
searching for an optimal solution of the sequence set of entity class labels, wherein the optimal solution is the entity information of the text to be recognized.
5. The text recognition method according to claim 1, wherein the step of extracting the slot filling features and the intention recognition features of the text to be recognized comprises:
converting the text to be recognized into an embedded representation;
adding the embedded representation and sine-cosine position codes to obtain a sequence representation with position information; and
encoding the sequence representation with position information to obtain the slot filling features and the intention recognition features.
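The sine-cosine position codes of claim 5 can be sketched as the standard Transformer-style encoding, summed element-wise with the embedded representation (dimensions and values here are illustrative, not taken from the patent):

```python
import math

def sinusoidal_position_encoding(seq_len, d_model):
    # Even dimensions use sin, odd dimensions use cos (Transformer-style).
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def add_position_codes(embeddings):
    # Element-wise sum of the embedded representation and the position codes
    # yields the sequence representation with position information.
    pe = sinusoidal_position_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(row, prow)] for row, prow in zip(embeddings, pe)]
```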
6. The text recognition method according to claim 4, wherein the step of searching for the optimal solution of the sequence set of entity class labels comprises:
determining an optimal search path in the sequence set of entity class labels based on a Viterbi algorithm; and
determining the optimal solution from the sequence set of entity class labels based on the optimal search path.
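Claims 4 and 6 decode the slot labels with a conditional random field whose optimal label sequence is found by Viterbi search. A toy decoder over log-space scores — the emission and transition scores below are hypothetical stand-ins for a trained CRF's parameters:

```python
def viterbi(emissions, transitions, labels):
    # emissions: per-position {label: score}; transitions: {(prev, cur): score}.
    # Returns the highest-scoring label path (the "optimal search path").
    best = {lab: (emissions[0][lab], [lab]) for lab in labels}
    for emit in emissions[1:]:
        nxt = {}
        for lab in labels:
            # Pick the best predecessor for the current label.
            prev_lab, (prev_score, prev_path) = max(
                best.items(),
                key=lambda item: item[1][0] + transitions.get((item[0], lab), 0.0),
            )
            score = prev_score + transitions.get((prev_lab, lab), 0.0) + emit[lab]
            nxt[lab] = (score, prev_path + [lab])
        best = nxt
    return max(best.values(), key=lambda sp: sp[0])[1]
```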
7. The text recognition method according to claim 1, wherein the step of performing, based on the cross-attention mechanism, semantic information fusion on the slot filling features and the intention recognition features to obtain the slot filling stream fused with intention knowledge and the intention recognition stream fused with entity knowledge comprises:
linearly mapping the slot filling features to obtain a first matrix Q, a first matrix K and a first matrix V of the slot filling features;
linearly mapping the intention recognition features to obtain a second matrix Q, a second matrix K and a second matrix V of the intention recognition features;
obtaining the slot filling stream fused with intention knowledge based on the first matrix Q, the second matrix K and the second matrix V; and
obtaining the intention recognition stream fused with entity knowledge based on the second matrix Q, the first matrix K and the first matrix V.
8. The text recognition method according to claim 7, wherein the step of obtaining the intention recognition stream fused with entity knowledge based on the second matrix Q, the first matrix K and the first matrix V comprises:
determining a first key-value pair of the slot filling features based on the first matrix K and the first matrix V;
determining an attention weight for intention recognition based on the second matrix Q and the first key-value pair; and
obtaining the intention recognition stream fused with entity knowledge based on the attention weight for intention recognition and the first matrix V;
and the step of obtaining the slot filling stream fused with intention knowledge based on the first matrix Q, the second matrix K and the second matrix V comprises:
determining a second key-value pair of the intention recognition features based on the second matrix K and the second matrix V;
determining an attention weight for slot filling based on the first matrix Q and the second key-value pair; and
obtaining the slot filling stream fused with intention knowledge based on the attention weight for slot filling and the second matrix V.
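The Q/K/V exchange in claims 7–8 amounts to scaled dot-product cross-attention, where each stream's queries attend over the other stream's key-value pairs. A minimal plain-Python sketch of one attention direction (single head, with the learned linear mappings omitted; matrix names follow the claims):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(Q, K, V):
    # Scaled dot-product attention: queries from one stream, key-value
    # pairs from the other, so each stream absorbs the other's knowledge.
    d_k = len(K[0])
    fused = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # the attention weights of claim 8
        fused.append([sum(w * v[j] for w, v in zip(weights, V))
                      for j in range(len(V[0]))])
    return fused

# Intention stream fused with entity knowledge: the second matrix Q attends
# over the first (K, V) pair taken from the slot filling features,
# and vice versa for the slot filling stream.
```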
9. The text recognition method according to any one of claims 1 to 8, wherein, after the step of acquiring the entity information of the text to be recognized based on the slot filling stream fused with intention knowledge, the method further comprises:
determining semantic information of the text to be recognized based on the classification information of the text to be recognized and the entity information of the text to be recognized; and
determining corresponding feedback information based on the semantic information of the text to be recognized.
10. A text recognition device, characterized in that the text recognition device comprises:
an acquisition module, configured to acquire a text to be recognized;
an extraction module, configured to extract slot filling features and intention recognition features of the text to be recognized;
a fusion module, configured to perform, based on a cross-attention mechanism, semantic information fusion on the slot filling features and the intention recognition features to obtain a slot filling stream fused with intention knowledge and an intention recognition stream fused with entity knowledge;
an intention recognition reasoning module, configured to acquire classification information of the text to be recognized based on the intention recognition stream fused with entity knowledge; and
a slot filling reasoning module, configured to acquire entity information of the text to be recognized based on the slot filling stream fused with intention knowledge.
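The data flow through the claimed modules could be wired as below; each module is a placeholder callable (all names hypothetical), and only the order of operations mirrors claims 1 and 10:

```python
class TextRecognizer:
    """Sketch of the claimed device: one callable per claimed module."""

    def __init__(self, extractor, fuser, intent_decoder, slot_decoder):
        self.extractor = extractor          # extraction module
        self.fuser = fuser                  # cross-attention fusion module
        self.intent_decoder = intent_decoder  # intention recognition reasoning module
        self.slot_decoder = slot_decoder      # slot filling reasoning module

    def recognize(self, text):
        slot_feat, intent_feat = self.extractor(text)
        slot_stream, intent_stream = self.fuser(slot_feat, intent_feat)
        intent = self.intent_decoder(intent_stream)  # classification information
        entities = self.slot_decoder(slot_stream)    # entity information
        return intent, entities
```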
CN202310262350.4A 2023-03-17 2023-03-17 Text recognition method and device Pending CN116483995A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310262350.4A CN116483995A (en) 2023-03-17 2023-03-17 Text recognition method and device


Publications (1)

Publication Number Publication Date
CN116483995A true CN116483995A (en) 2023-07-25

Family

ID=87216809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310262350.4A Pending CN116483995A (en) 2023-03-17 2023-03-17 Text recognition method and device

Country Status (1)

Country Link
CN (1) CN116483995A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117874611A (en) * 2023-12-29 2024-04-12 汉王科技股份有限公司 Text classification method, device, electronic equipment and storage medium
CN117874611B (en) * 2023-12-29 2024-07-12 汉王科技股份有限公司 Text classification method, device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination