CN114330370A - Natural language processing system and method based on artificial intelligence - Google Patents
- Publication number
- CN114330370A (application number CN202210260510.7A)
- Authority
- CN
- China
- Prior art keywords
- natural language
- layer
- language processing
- sample
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Machine Translation (AREA)
Abstract
The invention provides an artificial-intelligence-based natural language processing system and method, which obtain a raw data set of natural language information, perform anomaly analysis on the raw data set, and generate an abnormal value set for it; the sample data in the abnormal value set are removed from the raw data set, the remaining information data set is input into a semantic matching model for recognition, and a semantic matching result is determined; the predicted values of the matching results, taking the estimated loss values into account, are sorted by size, and the sorted sequence is the natural language processing result. As usage time grows, each layer is continuously optimized, so that the accuracy of the natural language processing gradually improves.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to a natural language processing system and a natural language processing method based on artificial intelligence.
Background
With the advent of the big data age, the internet faces an explosion of text information. Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. NLP is a science integrating linguistics, computer science and mathematics; research in this field therefore involves natural language, the language people use every day, and is closely related to linguistics. NLP techniques typically include text processing, semantic understanding, machine translation, question answering, and knowledge graphs.
People generally retrieve the information they need through search engines. Text matching is a core problem in natural language understanding, with concrete applications in search, advertising, recommendation, intelligent customer service, and other real-world fields. Many natural language understanding tasks, such as paraphrase identification, duplicate question detection, natural language inference, and machine reading comprehension, can be formalized as text matching problems.
For the study of text matching, traditional methods mainly focus on manually defined features. With the rise of deep learning, many researchers use deep representation learning for text matching, and deep self-encoding language models have recently been widely applied to natural language understanding tasks; their strong language representation capability can improve the performance of those tasks.
However, the existing pre-training and fine-tuning method for self-encoding language models is not tailored to a specific text matching task. It searches only on keyword matching, considers information only at the grammatical level, and returns only web pages related to the result, without attending to semantic matching. When users find it difficult to express their needs as keywords, it is therefore hard for them to obtain accurate text information.
For example, patent document CN105608201A discloses a text matching method supporting multi-keyword expressions, including: a grammar conversion stage, which converts a multi-keyword expression into multiple groups of keywords; a keyword matching stage, which takes the groups of keywords output by the grammar conversion stage as input and applies a keyword matching algorithm to obtain the keywords appearing in the text; and a matching degree determination stage, which takes the text containing the matched keywords as input and determines the degree of match between those keywords and the groups of keywords from the grammar conversion stage. However, this scheme has complex matching logic expressions and requires strong processing system support.
For example, patent document CN113283235B provides a method and system for predicting user tags, including: acquiring a user text set and a preset keyword library; obtaining each approximate word in a user text through the keywords, taking the keywords corresponding to the top-m approximate words ranked by degree of association, determining the n-dimensional vectors matched with those keywords, and building a feature matrix from the m n-dimensional vectors; inputting the feature matrix into a neural network for training to obtain a prediction model; and predicting the text of the user to be processed with the prediction model to obtain a predicted user tag. However, this scheme also searches only on keyword matching and ignores semantic matching, so users find it difficult to acquire accurate text information.
Disclosure of Invention
In order to solve the technical problem, the invention provides a natural language processing method based on artificial intelligence, which comprises the following steps:
s1, acquiring a natural language information original data set, and performing anomaly analysis on the original data set to generate an anomaly value set of the original data set;
s2, removing sample data in the abnormal value set from the original data set, inputting the information data set from which the sample data in the abnormal value set is removed into a semantic matching model for recognition, and determining a semantic matching result;
and S3, after multiplying the predicted value of the matching result by the estimated loss value, sequencing according to the product size, wherein the sequence obtained after sequencing is the natural language processing result.
Further, step S1 specifically includes: randomly selecting m samples from the original data set to form a network topology, in which n NODEs form the NODE set NODE = {node_1, node_2, …, node_n} and the node path length set is L_node = {L_1, …, L_i, …, L_n}; the path length standard deviation of the network topology is

σ = sqrt((1/n) · Σ_{i=1..n} (L_i − L̄)²), where L̄ is the mean node path length;

the set of path length standard deviations of the network topology is σ_s = {σ_1, …, σ_i, …, σ_n}, with maximum σ_max and minimum σ_min;

the abnormal value S of each sample point is calculated from the normalized standard deviation index, where the path length set of the m sample points is H_d = {h_1, …, h_i, …, h_m};

the abnormal value of each sample point is calculated, and the abnormal values are combined into an abnormal value set N;

m samples are randomly selected multiple times and their abnormal value sets are calculated, forming an abnormal value set N_total that covers the original data set.
Further, in step S2, the semantic matching model includes an input layer, an intermediate layer and an output layer; the input layer calculates the weight of each input vector using a forward-inverse frequency algorithm; the intermediate layer adopts a multi-layer bidirectional feature extraction model; and the output layer calculates its output using a failure estimation model.
Further, the forward-inverse frequency algorithm specifically includes:

calculating the inverse frequency IDF(E) of the input vector E:

IDF(E) = log(P / n_E);

where P is the total number of vectors in the training vector set and n_E is the number of times the input vector E appears in the training vector set;

calculating the input vector weight K(E, D_i):

K(E, D_i) = TF(E, D_i) · IDF(E) / Z;

where TF(E, D_i) is the frequency of the input vector E in training vector set D_i, and Z is a normalization factor.
Further, the multi-layer bidirectional feature extraction model has three sublayers, namely a bidirectional Transformer coding layer, an interaction layer and a normalization layer.
Further, in each bidirectional Transformer coding layer, the input matrix X and the matrix K composed of the input vector weights calculated by the forward-inverse frequency algorithm are used as input, and the output matrix Z of the bidirectional Transformer coding layer is calculated as

Z = softmax(Q · K^T / √d) · X;

where d is the dimension of the input matrix X, Q represents the vector sequence of the input vector groups E1, …, En, and B is the number of encodings.
Further, in the interaction layer, let the output matrices in the left and right directions be Z_1 and Z_2; the interaction matrices of the two output matrices are calculated as follows:

R_1 = Z_1 · Z_2^T;

R_2 = Z_2 · Z_1^T;

where R_1 is the interaction matrix of Z_1 and R_2 is the interaction matrix of Z_2;

the final output matrix R_mul after passing through each side's coding layers is calculated, where H is the number of coding layers, R_i is the output matrix of the i-th coding layer, and the function C(R_i) splices all H coding layers together:

R_mul = C(R_i), i = 1, …, H;

the layer-normalized output is LN(R_mul), where the LN function denotes a layer normalization function.
Further, the layer-normalized output matrices on the left and right sides are denoted v1 and v2, respectively, and v1 and v2 are subjected to the matching operation:

y' = F([v1; v2; v1 ⊙ v2; |v1 − v2|]);

where y' represents the predicted value of the matching result of the two texts, v1 ⊙ v2 multiplies the corresponding elements of v1 and v2 one by one, and the function F inputs the concatenation of the 4 vectors into a classifier for processing and outputs the predicted value of the matching result.
Further, calculating the estimated loss value of the predicted matching value using the failure estimation model specifically includes:

predicting the loss value Lp of a sample of the output layer based on the i-th left training sample and the corresponding right training samples among the obtained left training samples;

where s_i refers to the similarity between the head and tail samples in the i-th left training sample, s'_ij refers to the similarity between the head sample and the reference sample included in the j-th right training sample among the right training samples corresponding to the i-th left training sample, I is the number of right training samples, i is an integer no greater than the total number of left training samples, and j is an integer no greater than I;

where e_i1 refers to the vector representation of the head sample in the i-th left training sample, e_i2 refers to the vector representation of the tail sample in the i-th left training sample, and γ is an empirical parameter.
The invention also provides a natural language processing system based on artificial intelligence, which is used for realizing the natural language processing method.
The invention has the technical effects and advantages that:
1. The invention performs natural language processing by artificial-intelligence-based deep learning, improves the program's comprehension by examining and using patterns in the data, tunes the input weights to improve prediction accuracy, and continuously optimizes each layer as usage time grows, so that the accuracy of natural language processing gradually improves.
2. The invention uses the semantic matching model as the language processing module of the natural language processing system, helping the system process language quickly; it provides rich linguistic feature information to the artificial-intelligence deep neural network, reduces the network's computation load, facilitates fast natural language processing, and improves processing efficiency.
3. The invention recognizes natural language information after removing abnormal data points and determines the semantic matching result, reducing the data processing load on information acquisition hardware, simplifying the hardware structure, and making the system suitable for large-scale popularization.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
FIG. 1 is a flow chart of a natural language processing method based on artificial intelligence according to the present invention.
Fig. 2 is a schematic diagram of the network topology of the present invention.
FIG. 3 is a schematic structural diagram of the semantic matching model of the present invention.
Fig. 4 is a schematic architecture diagram of the intermediate layer and the output layer of the present invention.
Fig. 5 is a data diagram of data processing using the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, a flow chart of the artificial intelligence-based natural language processing method according to the present invention includes the following steps:
and S1, acquiring the natural language information original data set, and performing anomaly analysis on the original data set to generate an anomaly value set of the original data set.
According to the characteristics of abnormal data, abnormal values can be divided into abnormally large values, abnormally small values, zero values, negative values and missing values. The causes of zero and negative values are complex; they need to be screened out for manual identification, and whether a zero or negative value is actually abnormal must be judged against the actual situation of the data. Abnormally large and small values are values that deviate from the normal regularity of the data, not simply values beyond some threshold: even data within the normal range is judged abnormal if it is inconsistent with the pattern of data at adjacent moments. Missing values are caused by object abnormality; simply deleting or zeroing them would affect the accuracy of data at nearby moments, so such abnormal values need to be corrected. In the present embodiment, only abnormal values that deviate from the normal regularity of the data are analyzed.
Specifically, the original data set is a data set with M data items, and m samples are randomly selected from it to form a network topology, as shown in fig. 2. The network topology in the figure has n NODEs, which form the NODE set NODE = {node_1, node_2, …, node_n}, and the node path length set is L_node = {L_1, …, L_i, …, L_n}. The path length standard deviation of the network topology is calculated as

σ = sqrt((1/n) · Σ_{i=1..n} (L_i − L̄)²), where L̄ is the mean node path length.

If the set of path length standard deviations of the network topology is σ_s = {σ_1, …, σ_i, …, σ_n}, with maximum σ_max and minimum σ_min, the set is normalized to obtain the normalized index

σ̂_i = (σ_i − σ_min) / (σ_max − σ_min).

In the network topology, the path length set of the m sample points is H_d = {h_1, …, h_i, …, h_m}, and the abnormal value S of each sample point is calculated from the normalized index and the sample path lengths.
Using the abnormal value calculation formula, the abnormal value of each sample point is computed with weighting, and the abnormal values are combined into an abnormal value set N; repeating the above steps, m samples are randomly selected multiple times and their abnormal value sets calculated, finally forming an abnormal value set N_total that covers all M data items.
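The S1 scoring described above can be sketched as follows. This is a minimal sketch under stated assumptions: each sample's node path lengths are given as a list, the abnormal score is taken to be the min-max-normalized path length standard deviation, and the flagging threshold of 0.8 is a hypothetical choice, not taken from the patent.

```python
import statistics

def normalized_deviation_scores(path_length_sets):
    # one path-length standard deviation per sample's topology
    sigmas = [statistics.pstdev(lengths) for lengths in path_length_sets]
    s_min, s_max = min(sigmas), max(sigmas)
    span = (s_max - s_min) or 1.0  # guard when all deviations coincide
    # min-max normalization into [0, 1]
    return [(s - s_min) / span for s in sigmas]

def outlier_set(path_length_sets, threshold=0.8):
    # indices of samples whose normalized deviation exceeds the threshold
    scores = normalized_deviation_scores(path_length_sets)
    return {i for i, score in enumerate(scores) if score > threshold}
```

Repeating this over several random draws of m samples and taking the union of the resulting sets would correspond to building N_total.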
And S2, removing the sample data in the abnormal value set from the original data set, inputting the information data set from which the sample data in the abnormal value set is removed into a semantic matching model for recognition, and determining a semantic matching result.
In the embodiment of the application, considering that actual semantic information may contain errors, using semantic vector input avoids the semantic interference such errors would cause. The information input to the semantic matching model for recognition is therefore processed by semantic vector word segmentation; the method does not depend on any pre-training or word segmentation technique, and so introduces no errors caused by inaccuracy of those techniques.
Fig. 3 is a schematic structural diagram of the semantic matching model, which comprises an input layer, an intermediate layer and an output layer. In the input layer, E1, …, En denote the input vectors of the semantic matching model; the intermediate layer adopts a multi-layer bidirectional Transformer feature extraction model; in the output layer, T1, …, Tn denote the output vectors of the semantic matching model. The semantic matching model is used to obtain word vectors, which facilitates the application of a subsequent text classifier.
For the input layer, in order to strengthen the influence of the input vectors of the semantic matching model, the weight of each input vector is calculated using the forward-inverse frequency algorithm.

The forward-inverse frequency algorithm is a weighted statistical algorithm used in information retrieval and text mining to evaluate the importance of a piece of semantic information to a data set or corpus.

Let the input vector be E; the forward-inverse frequency weight is the forward frequency × the inverse frequency, where TF denotes the forward frequency and IDF the inverse frequency.
The inverse frequency IDF(E) of the input vector E is calculated as follows:

IDF(E) = log(P / n_E);

where P is the total number of vectors in the training vector set and n_E is the number of times the input vector E appears in the training vector set. The input vector weight K(E, D_i) calculated using the forward-inverse frequency algorithm is:

K(E, D_i) = TF(E, D_i) · IDF(E) / Z;

where TF(E, D_i) is the frequency of the input vector E in training vector set D_i, and Z is a normalization factor.
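A minimal sketch of this forward-inverse frequency weighting, under stated assumptions: the training vector set is modeled as a list of token lists, n_E is counted as the number of entries containing the term, TF is normalized by entry length, and no separate normalization factor is applied.

```python
import math

def inverse_frequency(term, documents):
    # n_E: number of entries in the training set containing the term (assumption)
    n_e = sum(1 for doc in documents if term in doc)
    # IDF(E) = log(P / n_E), with a zero fallback for unseen terms
    return math.log(len(documents) / n_e) if n_e else 0.0

def weight(term, doc, documents):
    # forward frequency TF(E, D_i), normalized by entry length (assumption)
    tf = doc.count(term) / len(doc)
    return tf * inverse_frequency(term, documents)
```

In practice a smoothing constant is often added inside the logarithm to avoid zero IDF for terms present in every entry; the patent does not specify one.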
Fig. 4 is a schematic diagram of the architecture of the middle layer and the output layer.
The intermediate layer adopts a multi-layer bidirectional Transformer feature extraction model, the multi-layer bidirectional Transformer feature extraction model has three sub-layers, namely a bidirectional Transformer coding layer, an interaction layer and a normalization layer, and the structure of the multi-layer bidirectional Transformer feature extraction model is shown in fig. 4.
The groups of input vectors E1, …, En form an input matrix X, which is input into the multi-layer bidirectional Transformer feature extraction model. The calculation realized in the model is as follows: in each bidirectional Transformer coding layer, the input matrix X and the matrix K composed of the input vector weights calculated by the forward-inverse frequency algorithm are used as input, and the output matrix Z of the coding layer is calculated as

Z = softmax(Q · K^T / √d) · X;

where d is the dimension of the input matrix X, Q represents the vector sequence of the input vector groups E1, …, En, and B is the number of encoding passes, i.e., the number of bidirectional Transformer coding layers.
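The coding-layer computation can be sketched as standard scaled dot-product attention. Treating the weight matrix K as the attention keys and the input matrix X as the values is an assumption about how the patent combines its inputs, and the function name is hypothetical.

```python
import numpy as np

def coding_layer_output(Q, K, X):
    d = X.shape[-1]  # dimension d of the input matrix X
    scores = Q @ K.T / np.sqrt(d)
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X  # output matrix Z
```

Applying this B times in sequence would correspond to the B encoding passes of the bidirectional coding layers.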
In the interaction layer, since the Transformer coding layer is bidirectional, the output matrices in the left and right directions are denoted Z_1 and Z_2, and the interaction matrices of the two output matrices are calculated as follows:

R_1 = Z_1 · Z_2^T;

R_2 = Z_2 · Z_1^T;

where R_1 is the interaction matrix of Z_1 and R_2 is the interaction matrix of Z_2.

The final output matrix R_mul after passing through each side's coding layers is

R_mul = C(R_i), i = 1, …, H;

where H is the number of coding layers, R_i is the output matrix of the i-th coding layer, the function C(R_i) splices all H coding layers together, and the dimension of R_mul matches that of the input matrix X.

In the normalization layer, the LN function computes the layer-normalized output matrix, expressed as LN(R_mul).
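The interaction and normalization sub-layers described above can be sketched as follows; the epsilon guard and the choice of normalizing over the last axis are implementation assumptions.

```python
import numpy as np

def interaction(Z1, Z2):
    R1 = Z1 @ Z2.T  # interaction matrix of Z1
    R2 = Z2 @ Z1.T  # interaction matrix of Z2
    return R1, R2

def layer_norm(R, eps=1e-6):
    # LN: normalize each row to zero mean and unit deviation
    mean = R.mean(axis=-1, keepdims=True)
    std = R.std(axis=-1, keepdims=True)
    return (R - mean) / (std + eps)

def normalized_output(coding_layer_outputs):
    # C(R_i), i = 1..H: splice the H coding-layer outputs together
    R_mul = np.concatenate(coding_layer_outputs, axis=-1)
    return layer_norm(R_mul)
```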
in the output layer, assuming that output matrixes on the left side and the right side after layer normalization are respectively represented as v1 and v2, v1 and v2 are input into the matching layer to perform matching operation, and the matching result of the two texts is calculated as follows:
wherein y' represents a predicted value of a matching result of two texts, and v1 v2 represents that corresponding elements of v1 and v2 are in phase-by-phaseMultiplication emphasizes the identity between two texts, while | V1-V2 | emphasizes the difference between two texts, the function F represents the concatenation vector V =that would be 4 vectorsThe input is input to a classifier to process and output a predicted value of a matching result.
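The matching operation can be sketched as below; using a logistic classifier as the function F is a stand-in assumption, since the patent does not specify the classifier.

```python
import numpy as np

def match_features(v1, v2):
    # concatenation vector V = [v1; v2; v1*v2; |v1 - v2|]
    return np.concatenate([v1, v2, v1 * v2, np.abs(v1 - v2)])

def predict_match(v1, v2, w, b=0.0):
    # hypothetical classifier F: logistic regression over the 4-way features
    z = match_features(v1, v2) @ w + b
    return 1.0 / (1.0 + np.exp(-z))  # predicted matching value y'
```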
In the process of detecting specific data in a big-data embedded network, the concatenation vector V of the 4 vectors obtained by the above formula is taken as the basis and fused with the fractional Fourier transform for data matching processing; data classification space guidance is performed, a K-L data classifier is constructed, and big-data embedded data classification is realized with this classifier. The specific steps are as follows:

In the classification process of big-data embedded data, with the concatenation vector of the 4 vectors obtained by the above formula as the basis, a fractional Fourier transform is defined, in which K_α(t, u) denotes the transform kernel of the fractional Fourier transform, α denotes the data-feature matching rotation angle, F^p denotes the form of the transform operator, and U denotes the attribute set of the data clustering features.

In the classification process of the big-data embedded data, data matching based on the fractional Fourier transform is realized by using the rotational additivity in the fractional Fourier domain, where p denotes the order of the fractional Fourier domain of the specific data when it is a positive real number, q denotes the order when it is a negative real number, and F^{p+q} denotes the fractional Fourier domain for big-data embedded data classification.

The classifier obtained from the above formulas yields the energy distribution of the specific data across different frequencies in the big-data embedded network, thereby realizing the detection of specific data in the big-data network.
An estimated loss value for the predicted matching value y' is calculated using the failure estimation model: the layer-normalized output matrices on the left and right sides are taken as the left and right training samples, respectively, and input into the failure estimation model of the output layer to obtain the estimated loss value.

The estimated loss value indicates the efficiency of the output layer in predicting samples and measures its prediction performance: the smaller the estimated loss value, the better the output layer's sample prediction performance, i.e., the higher the prediction accuracy.
In the embodiment of the present application, the estimated loss value Lp of the output layer is obtained based on the i-th left training sample and the corresponding right training samples among the obtained left training samples;

where s_i refers to the similarity between the head and tail samples in the i-th left training sample, s'_ij refers to the similarity between the head sample and the reference sample included in the j-th right training sample among the right training samples corresponding to the i-th left training sample, I is the number of right training samples, i is an integer no greater than the total number of left training samples, and j may be an integer no greater than I.

In a preferred embodiment, the similarity is expressed in terms of cosine similarity, so that s_i is computed from cos(e_i1, e_i2), where e_i1 refers to the vector representation of the head sample in the i-th left training sample, e_i2 refers to the vector representation of the tail sample in the i-th left training sample, and γ is an empirical parameter, typically 1.5. The similarity s'_ij between the head sample and the reference sample included in the j-th right training sample can be calculated in the same way and is not repeated here.
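A sketch of the cosine-similarity-based loss estimation; the margin-style way of combining the left and right similarities with gamma = 1.5 is an assumption, since the exact loss formula is not reproduced here.

```python
import numpy as np

def cosine_similarity(e1, e2):
    return float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2)))

def estimated_loss(left_pairs, right_pairs, gamma=1.5):
    loss = 0.0
    for (l_head, l_tail), rights in zip(left_pairs, right_pairs):
        s_left = cosine_similarity(l_head, l_tail)  # head/tail similarity s_i
        for (r_head, r_ref) in rights:
            # margin-style penalty (assumption): right-sample similarity
            # should stay at least gamma below the left-sample similarity
            loss += max(0.0, gamma - s_left + cosine_similarity(r_head, r_ref))
    return loss
```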
And S3, after multiplying the predicted value of the matching result by the estimated loss value, sequencing according to the product size, wherein the sequence obtained after sequencing is the natural language processing result. Fig. 5 is a data diagram showing the data processing by the above steps.
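Taken together, steps S1 to S3 can be sketched as the following pipeline; all three callables are hypothetical stand-ins for the models described above.

```python
def process_natural_language(raw_samples, detect_outliers, match_semantics, estimate_loss):
    outliers = detect_outliers(raw_samples)                  # S1: anomaly analysis
    cleaned = [s for s in raw_samples if s not in outliers]  # S2: remove outliers
    matches = [(s, match_semantics(s)) for s in cleaned]     # S2: semantic matching
    # S3: rank by predicted match value multiplied by estimated loss value
    ranked = sorted(matches, key=lambda p: p[1] * estimate_loss(p[0]), reverse=True)
    return [s for s, _ in ranked]
```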
The invention also provides a natural language processing system based on artificial intelligence, which is used for realizing the natural language processing method.
The system comprises an acquisition module and a processor, wherein the acquisition module is used for acquiring the original data set of natural language information; in a preferred embodiment, the raw data set of natural language information further comprises at least two data sets to be trained.
The processor is used to remove the sample data in the abnormal value set from the original data set, input the information data set from which those samples have been removed into the semantic matching model for identification, and determine a semantic matching result; the predicted values of the matching results are multiplied by the estimated loss values and sorted by product size, and the sorted sequence is the natural language processing result. The processor provided in this embodiment may be deployed in a computer device whose configuration and performance can vary widely; it may include one or more central processing units (CPUs), memory, and one or more storage media (e.g., one or more mass storage devices) storing applications or data. The memory and storage medium may be transient or persistent storage. The program stored on the storage medium may include one or more modules, each of which may comprise a series of instruction operations for the server. Further, the processor may be configured to communicate with the storage medium and execute the series of instruction operations from the storage medium.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
Embodiments of the present application also provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method described in the foregoing embodiments.
Embodiments of the present application also provide a computer program product including a program, which, when run on a computer, causes the computer to perform the methods described in the foregoing embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, and the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The above-mentioned embodiments express only several implementations of the present application; their description is specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (10)
1. A natural language processing method based on artificial intelligence is characterized by comprising the following steps:
s1, acquiring a natural language information original data set, and performing anomaly analysis on the original data set to generate an anomaly value set of the original data set;
s2, removing sample data in the abnormal value set from the original data set, inputting the information data set from which the sample data in the abnormal value set is removed into a semantic matching model for recognition, and determining a semantic matching result;
and S3, multiplying the predicted value of the matching result by the estimated loss value and sorting by product size, wherein the sequence obtained after sorting is the natural language processing result.
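The ranking in step S3 can be sketched as follows; a minimal illustration in Python, assuming plain lists of scores (the names `rank_matches`, `predictions`, and `losses` are illustrative, not from the patent):

```python
def rank_matches(predictions, losses):
    """Step S3 sketch: multiply each predicted match value by its estimated
    loss value, then return candidate indices sorted by product, descending."""
    products = [p * l for p, l in zip(predictions, losses)]
    return sorted(range(len(products)), key=lambda i: products[i], reverse=True)


# Example: three candidates; the second has the largest product 0.5 * 0.8 = 0.4.
order = rank_matches([0.9, 0.5, 0.7], [0.1, 0.8, 0.2])
```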
2. The natural language processing method based on artificial intelligence of claim 1, wherein the step S1 specifically includes: randomly selecting m samples from the original data set to form a network topology structure, wherein the n nodes of the network topology form the node set NODE = {node_1, node_2, …, node_n}, the node path length set is L_node = {L_1, …, L_i, …, L_n}, and the standard deviation of the path length of the network topology structure is:
wherein the set of path length standard deviations of the network topology is σ = {σ_1, …, σ_i, …, σ_n}, with maximum value σ_max and minimum value σ_min;
the abnormal value of each sample point is calculated as:
wherein the path length set of the m sample points is H_d = {h_1, …, h_i, …, h_m};
Calculating an abnormal value of each sample point, and combining the abnormal values into an abnormal value set N;
repeating the random selection of m samples multiple times, calculating an abnormal value set each time, and forming a total abnormal value set N_total that covers the original data set.
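The sampling-based outlier step of claim 2 resembles path-length-based anomaly scoring. A minimal sketch, assuming the abnormal value measures how far a sample's path length deviates from the mean in units of the standard deviation; the patent's exact formulas are not reproduced in the text, so `outlier_scores` and `outlier_set` are hypothetical stand-ins:

```python
import random
import statistics


def outlier_scores(path_lengths):
    """Score each sample by |h_i - mean| / std of the path lengths
    (a z-score style stand-in for the claim's unspecified formula)."""
    mu = statistics.mean(path_lengths)
    sigma = statistics.pstdev(path_lengths)
    if sigma == 0:
        return [0.0] * len(path_lengths)
    return [abs(h - mu) / sigma for h in path_lengths]


def outlier_set(data, m, rounds, threshold, path_length):
    """Repeat the random-sampling step: draw m samples per round, score
    their path lengths, and collect every sample above the threshold."""
    outliers = set()
    for _ in range(rounds):
        sample = random.sample(data, m)
        lengths = [path_length(x) for x in sample]
        for x, score in zip(sample, outlier_scores(lengths)):
            if score > threshold:
                outliers.add(x)
    return outliers
```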
3. The artificial intelligence based natural language processing method according to claim 1, wherein in step S2, the semantic matching model includes an input layer, an intermediate layer and an output layer; the input layer calculates the weight of each input vector by using a forward and inverse frequency algorithm; the intermediate layer adopts a multi-layer bidirectional feature extraction model; the output layer calculates an output vector using a failure estimation model.
4. The artificial intelligence based natural language processing method according to claim 3, wherein the forward and inverse frequency algorithm specifically comprises:
calculating the inverse frequency IDF(E) of the input vector E:
IDF(E) = log(P / n_E);
wherein P is the total number of vectors in the training vector set, and n_E is the number of times the input vector E appears in the training vector set;
calculating the input vector weight K(E, D_i):
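The inverse-frequency part of claim 4 is fully specified and can be checked directly; the weight K(E, D_i) is not, since its formula is not reproduced in the text, so only IDF(E) is shown here:

```python
import math


def inverse_frequency(total_vectors, occurrences):
    """IDF(E) = log(P / n_E), per claim 4: P is the size of the training
    vector set, n_E the number of times input vector E appears in it."""
    return math.log(total_vectors / occurrences)


# A vector appearing in 10 of 100 training vectors gets IDF = log(10).
idf = inverse_frequency(100, 10)
```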
5. The artificial intelligence based natural language processing method of claim 3, wherein the multi-layered bidirectional feature extraction model has three sub-layers, which are a bidirectional Transformer coding layer, an interaction layer and a normalization layer.
6. The artificial intelligence based natural language processing method according to claim 5, wherein in each bidirectional Transformer coding layer, the input matrix X and the matrix K formed from the weight of each input vector calculated by the forward and inverse frequency algorithm are taken as input, and the output matrix Z of the bidirectional Transformer coding layer is calculated:
where d is the dimension of the input matrix X, Q represents the vector sequence of the input vectors E_1, …, E_n, and B is the number of encodings.
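The claim's formula for Z is not reproduced in the text. For comparison only, standard scaled dot-product attention, which likewise combines a query matrix Q, a key matrix K, and the dimension d, can be sketched as follows (an assumption, not the patent's exact computation):

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    """Standard scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Shown as a reference point for the claim's unspecified Z computation."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Numerically stable softmax over the last axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V


Z = scaled_dot_product_attention(np.eye(2), np.eye(2), np.eye(2))
```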
7. The artificial intelligence based natural language processing method of claim 5,
in the interaction layer, the output matrices in the left and right directions are denoted Z_1 and Z_2, and the interaction matrices of the two output matrices are calculated as follows:
R_1 = Z_1 · Z_2^T;
R_2 = Z_2 · Z_1^T;
wherein R_1 is the interaction matrix of Z_1 and R_2 is the interaction matrix of Z_2;
the final output matrix R_mul after passing through each coding layer is calculated, where H is the number of coding layers, R_i represents the output matrix of the i-th coding layer, and the function C(R_i) denotes splicing all H coding layers together:
R_mul = C(R_i), i = 1, …, H;
wherein the LN function represents a layer normalization function.
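The interaction and concatenation steps of claim 7 follow directly from the stated formulas; `layer_norm` below is a standard layer normalization standing in for the unspecified LN function:

```python
import numpy as np


def interact(Z1, Z2):
    """Interaction matrices of claim 7: R1 = Z1 · Z2^T and R2 = Z2 · Z1^T."""
    return Z1 @ Z2.T, Z2 @ Z1.T


def layer_norm(R, eps=1e-6):
    """Standard layer normalization over the last axis (assumed form of LN)."""
    mu = R.mean(axis=-1, keepdims=True)
    sigma = R.std(axis=-1, keepdims=True)
    return (R - mu) / (sigma + eps)


def concat_layers(outputs):
    """C(R_i): splice the outputs of all H coding layers together."""
    return np.concatenate(outputs, axis=-1)
```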
8. The artificial intelligence based natural language processing method according to claim 7, wherein the output matrices of the left and right sides after layer normalization are denoted v1 and v2, respectively, and v1 and v2 are subjected to a matching operation:
wherein y' represents the predicted value of the matching result of the two texts, v1 ⊙ v2 denotes multiplying the corresponding elements of v1 and v2 one by one, and the function F denotes inputting the concatenation of the 4 vectors into a classifier for processing and outputting the predicted value of the matching result.
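The exact four vectors concatenated before the classifier F are not spelled out in the text; a common choice in matching models, assumed here, is [v1, v2, v1 ⊙ v2, |v1 − v2|]:

```python
import numpy as np


def match_features(v1, v2):
    """Build a 4-vector concatenation for the classifier F of claim 8.
    The specific four components are an assumption, not from the patent:
    [v1, v2, element-wise product, element-wise absolute difference]."""
    return np.concatenate([v1, v2, v1 * v2, np.abs(v1 - v2)])


features = match_features(np.array([1.0, 2.0]), np.array([3.0, 4.0]))
```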
9. The artificial intelligence based natural language processing method according to claim 8, wherein the calculating an estimated failure value of the predicted value of the matching result using the failure estimation model specifically includes:
predicting the loss value Lp of a sample of the output layer based on the i-th left training sample among the obtained left training samples and its corresponding right training samples:
wherein the first similarity term may refer to the similarity between the head and tail samples in the i-th left training sample, and the second denotes the similarity between the head sample and the reference sample included in the j-th right training sample among the right training samples corresponding to the i-th left training sample; I is the number of right training samples, i is an integer less than or equal to the total number of left training samples, and j is an integer less than or equal to I;
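The loss formula image is not reproduced in the text; a softmax-style ranking loss, contrasting the positive head–tail similarity against the I right-sample similarities, is one plausible reading and is assumed here:

```python
import math


def sample_loss(pos_similarity, neg_similarities):
    """Hypothetical reading of claim 9's Lp: a softmax ranking loss over
    one positive similarity and a list of contrast (right-sample)
    similarities. With no contrast samples the loss is zero."""
    denom = math.exp(pos_similarity) + sum(math.exp(s) for s in neg_similarities)
    return -math.log(math.exp(pos_similarity) / denom)
```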
10. A natural language processing system based on artificial intelligence, wherein the natural language processing system is used for implementing the natural language processing method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210260510.7A CN114330370B (en) | 2022-03-17 | 2022-03-17 | Natural language processing system and method based on artificial intelligence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114330370A true CN114330370A (en) | 2022-04-12 |
CN114330370B CN114330370B (en) | 2022-05-20 |
Family
ID=81033553
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210260510.7A Active CN114330370B (en) | 2022-03-17 | 2022-03-17 | Natural language processing system and method based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114330370B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108776694A (en) * | 2018-06-05 | 2018-11-09 | 哈尔滨工业大学 | A kind of time series abnormal point detecting method and device |
CN109657947A (en) * | 2018-12-06 | 2019-04-19 | 西安交通大学 | A kind of method for detecting abnormality towards enterprises ' industry classification |
US20190318407A1 (en) * | 2015-07-17 | 2019-10-17 | Devanathan GIRIDHARI | Method for product search using the user-weighted, attribute-based, sort-ordering and system thereof |
CN110956224A (en) * | 2019-08-01 | 2020-04-03 | 平安科技(深圳)有限公司 | Evaluation model generation method, evaluation data processing method, evaluation model generation device, evaluation data processing equipment and medium |
CN111666502A (en) * | 2020-07-08 | 2020-09-15 | 腾讯科技(深圳)有限公司 | Abnormal user identification method and device based on deep learning and storage medium |
CN111753527A (en) * | 2020-06-29 | 2020-10-09 | 平安科技(深圳)有限公司 | Data analysis method and device based on natural language processing and computer equipment |
CN111860850A (en) * | 2019-04-28 | 2020-10-30 | 第四范式(北京)技术有限公司 | Model training method, information processing method and device and electronic equipment |
CN111882431A (en) * | 2020-08-04 | 2020-11-03 | 武汉众邦银行股份有限公司 | Intelligent message pushing method based on NLP deep learning |
CN113011911A (en) * | 2021-01-21 | 2021-06-22 | 腾讯科技(深圳)有限公司 | Data prediction method, device, medium and electronic equipment based on artificial intelligence |
CN113158076A (en) * | 2021-04-05 | 2021-07-23 | 北京工业大学 | Social robot detection method based on variational self-coding and K-nearest neighbor combination |
CN113436698A (en) * | 2021-08-27 | 2021-09-24 | 之江实验室 | Automatic medical term standardization system and method integrating self-supervision and active learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111401077B (en) | Language model processing method and device and computer equipment | |
CN116194912A (en) | Method and system for aspect-level emotion classification using graph diffusion transducers | |
CN109408743B (en) | Text link embedding method | |
CN109344399B (en) | Text similarity calculation method based on stacked bidirectional lstm neural network | |
Tran et al. | Ensemble application of ELM and GPU for real-time multimodal sentiment analysis | |
CN110990555B (en) | End-to-end retrieval type dialogue method and system and computer equipment | |
CN111797196A (en) | Service discovery method combining attention mechanism LSTM and neural topic model | |
Grzegorczyk | Vector representations of text data in deep learning | |
CN112307048B (en) | Semantic matching model training method, matching method, device, equipment and storage medium | |
CN112988970A (en) | Text matching algorithm serving intelligent question-answering system | |
CN115497465A (en) | Voice interaction method and device, electronic equipment and storage medium | |
CN116304748A (en) | Text similarity calculation method, system, equipment and medium | |
Somogyi | The Application of Artificial Intelligence | |
CN116975271A (en) | Text relevance determining method, device, computer equipment and storage medium | |
Lin et al. | Lifelong Text-Audio Sentiment Analysis learning | |
CN113516094A (en) | System and method for matching document with review experts | |
Menon et al. | Improving ranking in document based search systems | |
CN111666375A (en) | Matching method of text similarity, electronic equipment and computer readable medium | |
CN114330370B (en) | Natural language processing system and method based on artificial intelligence | |
CN116680407A (en) | Knowledge graph construction method and device | |
CN114595324A (en) | Method, device, terminal and non-transitory storage medium for power grid service data domain division | |
Bouallégue et al. | Learning deep wavelet networks for recognition system of Arabic words | |
CN114003773A (en) | Dialogue tracking method based on self-construction multi-scene | |
CN113590755A (en) | Word weight generation method and device, electronic equipment and storage medium | |
CN113157892A (en) | User intention processing method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20221101 Address after: 1709, F13, Block A, Yard 93, Jianguo Road, Chaoyang District, Beijing 100022 Patentee after: Li Jin Address before: 300000 No. 201-10, unit 2, building 2, No. 39, Gaoxin Sixth Road, Binhai science and Technology Park, high tech Zone, Binhai New Area, Tianjin Patentee before: Tianjin Sirui Information Technology Co.,Ltd. |