CN116662582A - Specific domain business knowledge retrieval method and retrieval device based on natural language - Google Patents

Specific domain business knowledge retrieval method and retrieval device based on natural language

Info

Publication number
CN116662582A
CN116662582A (application CN202310954971.9A)
Authority
CN
China
Prior art keywords
knowledge
word
intention
vector
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310954971.9A
Other languages
Chinese (zh)
Other versions
CN116662582B (en)
Inventor
邱洪涛
高渐朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Ict Information Technology Co ltd
Original Assignee
Chengdu Ict Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Ict Information Technology Co ltd filed Critical Chengdu Ict Information Technology Co ltd
Priority to CN202310954971.9A priority Critical patent/CN116662582B/en
Publication of CN116662582A publication Critical patent/CN116662582A/en
Application granted granted Critical
Publication of CN116662582B publication Critical patent/CN116662582B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/02 Knowledge representation; Symbolic representation
    • G06N 5/022 Knowledge engineering; Knowledge acquisition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a natural-language-based method and device for retrieving business knowledge in a specific field, comprising the following steps: constructing a pre-training language model; performing feature representation on the business knowledge data to obtain a database of feature vectors; constructing a language understanding model; inputting a natural sentence and obtaining the query vector of the search question through the language understanding model; calculating the similarity between the query vector and the feature vectors in the database; and returning the business knowledge corresponding to the top k feature vectors. By constructing the pre-training language model and the language understanding model, the application can better understand the user's query intention and thus match and retrieve the relevant business knowledge more accurately. Meanwhile, by calculating the similarity between the query vector and the feature vectors in the database, the business knowledge most relevant to the query can be found more quickly, greatly improving retrieval efficiency.

Description

Specific domain business knowledge retrieval method and retrieval device based on natural language
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a specific field business knowledge retrieval method and a retrieval device based on natural language.
Background
The existing knowledge retrieval technologies mainly include keyword search, text matching, semantic matching and the like. These methods generally require manual classification of the knowledge base, and then match the entered keywords or query sentences through specific search algorithms or rules to return relevant results. Such traditional knowledge retrieval methods struggle to process large-scale, multi-domain and diversified data, and suffer from problems such as low search efficiency, low matching accuracy and inability to understand complex semantics.
Existing knowledge retrieval methods, especially keyword search and simple text matching, cannot understand the true intention and complex semantics of a query, and cannot process fuzzy queries. For example, if a query sentence entered by a user contains ambiguous, non-keyword expressions, conventional knowledge retrieval methods may fail to return correct results.
With the development of the internet and information technology, the amount of business knowledge data in a specific field is increasing, which makes knowledge retrieval more difficult. Existing knowledge retrieval methods may not be able to efficiently process large-scale data and are difficult to adapt to ever-changing and updated knowledge bases.
Therefore, a more intelligent, flexible and efficient knowledge retrieval method is needed to improve the efficiency and accuracy of knowledge retrieval, and simultaneously, the knowledge retrieval method can adapt to a large-scale, diverse and continuously-changing data environment.
Disclosure of Invention
The technical problem to be solved by the application is that traditional knowledge retrieval methods can only perform simple keyword matching and often fail to return correct results. The application aims to provide a natural-language-based method and device for retrieving business knowledge in a specific field, realizing retrieval through natural language, better understanding the query intention, and matching and retrieving the relevant business knowledge more accurately.
The application is realized by the following technical scheme:
a specific domain business knowledge retrieval method based on natural language comprises the following steps:
constructing a pre-training language model aiming at a specific field;
carrying out feature representation on the business knowledge data through the pre-training language model to obtain a database formed by feature vectors;
constructing a language understanding model for knowledge retrieval;
inputting a natural sentence of the search question to the language understanding model, and obtaining the query vector of the search question through the language understanding model;
sending the query vector to a database, and calculating the similarity between the query vector and the feature vector in the database;
sorting in descending order of the similarity, and returning the business knowledge corresponding to the first k feature vectors;
and completing the retrieval of the business knowledge in the specific field.
Specifically, the construction method of the pre-training language model comprises the following steps:
constructing a Transformer model, and collecting a text corpus of business knowledge in the specific field;
masking the text corpus to obtain preprocessed data; the masking strategy includes word-level masking and character-level masking;
word embedding and position encoding are carried out on the preprocessed data, so that first data are obtained;
inputting the first data into a Transformer encoder, and processing the first data layer by layer through a plurality of encoders connected in series to obtain second data;
a mask language model is constructed and the second data is input to the mask language model, and the mask language model outputs predicted data for the pre-processed data and takes the predicted data as a feature representation.
Specifically, the training method of the Transformer encoder comprises the following steps:
inputting the first data into the attention layer of the encoder, applying linear transformations to the query, key and value matrices of the attention layer several times, calculating dot-product attention scores, concatenating the per-head outputs, and applying a linear transformation to obtain the multi-head attention output;
adding the input and the output of the attention layer through residual connection, normalizing the hidden layer in the neural network into standard normal distribution through normalization, and obtaining a first vector;
inputting the obtained first vector to a feedforward layer of an encoder, wherein the feedforward layer projects the first vector to a high-dimensional space and obtains a feedforward output;
adding the input and the output of the feedforward layer through residual connection, normalizing the hidden layer in the neural network into standard normal distribution through normalization, and obtaining the output of the encoder;
specifically, the training optimization objective function of the mask language model is:wherein->For the number of words masked, +.>Is dictionary (I)>For masking language model parameters, ++>The likelihood functions of the structure are predicted for the model mask.
Specifically, the method for constructing the language understanding model comprises the following steps:
constructing an intent encoder based on LSTM and outputting intent characteristics of word levels of natural sentences;
constructing an embedder, integrating the words and the corresponding business knowledge according to the intention characteristics of the words, and obtaining a knowledge characteristic set of the word level;
obtaining a knowledge context vector through a knowledge feature set;
constructing a decoder, taking the word-level intention features and the knowledge context vector as input, and outputting the word-level intention prediction results through the decoder;
outputting the sentence-level intention prediction result from the word-level intention prediction results: $y^{S} = \arg\max_{1 \le j \le m} \sum_{i=1}^{T} \mathbb{1}\left(y_i = e_j\right)$, where $T$ is the length of the sentence, $m$ is the number of intention labels, $e_j$ is the $m$-dimensional 0-1 vector whose $j$-th bit is 1 and whose other bits are 0, $y_i$ is the intention prediction result at the $i$-th word level, and $\mathbb{1}(\cdot)$ is an indicator function measuring whether the predicted word-level label matches the actual label;
and converting the words in the intention prediction result into word vectors through a pre-trained word embedding model, and averaging all the word vectors to obtain the query vector.
Specifically, the method for obtaining the intention characteristic of the word level comprises the following steps:
constructing a BERT encoder; the words are input into the BERT encoder, which outputs T d-dimensional hidden-layer state vectors plus 1 special vector used for classification tasks; the hidden-layer state vectors represent the word-level semantic slot value features, and the special vector represents the sentence-level intention feature; wherein T is the length of the input natural sentence and d is the hidden dimension;
the semantic slot value features are input to the intention encoder, with the special vector as its initial hidden state; the intention encoder then emits the corresponding T d-dimensional word-level intention features.
Specifically, the method for obtaining the knowledge feature set at the word level comprises the following steps:
determining the intention feature corresponding to the $i$-th word $x_i$, determining the n pieces of business knowledge corresponding to the $i$-th word, and obtaining the concept set $C_i = \{c_1, c_2, \dots, c_n\}$ of the n pieces of business knowledge;
Calculating a correlation coefficient b between each business knowledge in the concept set and the ith word by using an attention mechanism;
after weighting calculation is carried out through the correlation coefficient b, n-dimensional knowledge characteristics of the i-th word are obtained;
and embedding the position code of the i-th word into the n knowledge features corresponding to the i-th word to obtain the knowledge feature set.
Specifically, the method for obtaining the knowledge context vector comprises the following steps:
calculating the correlation coefficient $\alpha_{ik}$ of the $i$-th word and each knowledge feature through an attention mechanism, and obtaining the multidimensional knowledge context vector of the $i$-th word through weighted calculation: $\alpha_{ik} = \operatorname{softmax}_k\left((h_i \oplus s_i)^{\top} W k_{ik}\right)$, $c_i = \sum_{k=1}^{n} \alpha_{ik} k_{ik}$, where $k_{ik}$ are the elements in the knowledge feature set, $W$ is a trained weight parameter, $h_i$ is the intention feature corresponding to the word, and $s_i$ is the semantic slot value feature corresponding to the word;
performing layer normalization on the multidimensional knowledge context vectors to obtain the knowledge context vector of the natural sentence.
Specifically, the method for obtaining the intention prediction result at the word level comprises the following steps:
taking the word-level intention features and the knowledge context vector as input, mapping out new word-level intention features through an LSTM-based decoder, and processing the intention features through layer normalization;
calculating the word-level intention detection result $y_i = \operatorname{softmax}\left(W^{I} \tilde{h}_i + b^{I}\right)$, where $W^{I}$ and $b^{I}$ are trainable parameters and $\tilde{h}_i$ is the layer-normalized decoder feature of the $i$-th word.
A natural language based domain specific business knowledge retrieval device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method as described above when the computer program is executed.
Compared with the prior art, the application has the following advantages and beneficial effects:
the application can better understand the query intention of the user by constructing the pre-training language model and the language understanding model, thereby more accurately matching and retrieving the related business knowledge. Meanwhile, by calculating the similarity between the query vector and the feature vector in the database, the service knowledge most relevant to the query can be found more quickly, and the retrieval efficiency is greatly improved.
By constructing the feature vector database, the application can effectively organize and manage large-scale business knowledge data. The pre-trained language model can continuously learn and adapt to new business knowledge, and the real-time performance and accuracy of retrieval are guaranteed.
By analyzing natural sentences through the language understanding model, the application can not only understand the user's query intention but also handle complex semantics and fuzzy queries. The number of returned results can be determined automatically according to the distribution of the similarities, providing a more intelligent retrieval service.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the application and together with the description serve to explain the principles of the application.
Fig. 1 is a flow chart of a specific domain business knowledge retrieval method based on natural language according to the present application.
FIG. 2 is a schematic flow chart of constructing a pre-trained language model according to the present application.
FIG. 3 is a schematic flow chart of constructing a language understanding model according to the present application.
Detailed Description
The present application will be described in further detail below with reference to the drawings and embodiments, in order to make the objects, technical solutions and advantages of the present application more apparent. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not restrictive of it.
It should be further noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
Embodiments of the present application and features of the embodiments may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Example 1
As shown in fig. 1, a specific domain business knowledge retrieval method based on natural language is provided, which includes:
firstly, constructing a pre-training language model aiming at a specific field, and carrying out feature representation on business knowledge data through the pre-training language model to obtain a database formed by feature vectors;
in order to obtain a model that can understand domain-specific language characteristics. Pre-trained language models are typically trained with large amounts of unlabeled text data to learn the semantics and grammar rules of the language. For a particular domain, we can choose the relevant text data to train so that the model can understand and generate the language of that domain. This step is self-supervised learning based on deep learning, and commonly used models include BERT, GPT, etc.
To convert the business knowledge data into a machine-understandable form, the pre-trained language model converts the text data into vectors in a high-dimensional space that preserve the semantic information of the text. These feature vectors are then organized into a database for subsequent knowledge retrieval.
Secondly, constructing a language understanding model for knowledge retrieval, inputting the natural sentence of the search question into the language understanding model, and obtaining the query vector of the search question through the language understanding model.
And thirdly, sending the query vector to a database, and calculating the similarity between the query vector and the feature vector in the database.
The goal is to find the business knowledge most relevant to the query. A similarity measure such as cosine similarity or Euclidean distance is generally used to compare the query vector with the feature vectors in the database.
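As an illustration only, cosine similarity between a query vector and the database of feature vectors might be computed as in the following Python sketch; the function and variable names are hypothetical, not part of the disclosure.

```python
import numpy as np

def cosine_similarities(query_vec: np.ndarray, db_matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector of shape (d,) and every
    feature vector in the database, stored as rows of db_matrix (N, d)."""
    q = query_vec / np.linalg.norm(query_vec)
    db = db_matrix / np.linalg.norm(db_matrix, axis=1, keepdims=True)
    return db @ q  # (N,) similarity scores in [-1, 1]
```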
Fourthly, sorting according to the descending order of the similarity, and returning the business knowledge corresponding to the first k feature vectors; and completing the retrieval of the business knowledge in the specific field.
The k value may be dynamic, determining how many results to return based on the complexity of the query and the distribution of the similarities. Meanwhile, an explanation interface can be provided to help the user understand the returned results.
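Building on the sketch above, a dynamic top-k selection could look like the following; using a min_sim threshold as the mechanism for making k dynamic is an assumption for illustration, since the disclosure does not fix how k is determined.

```python
def retrieve_top_k(query_vec, db_matrix, knowledge_items, k=5, min_sim=None):
    """Return up to k knowledge entries in descending similarity order.
    If min_sim is given, results below the threshold are dropped, so the
    number of returned results adapts to the similarity distribution."""
    sims = cosine_similarities(query_vec, db_matrix)
    order = np.argsort(-sims)  # indices sorted by descending similarity
    hits = [i for i in order if min_sim is None or sims[i] >= min_sim]
    return [(knowledge_items[i], float(sims[i])) for i in hits[:k]]
```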
In this embodiment, the first step and the second step may be performed synchronously or asynchronously.
Example two
As shown in fig. 2, this embodiment describes the method for constructing the pre-training language model, the method comprising:
Constructing a Transformer model and collecting a text corpus of business knowledge in the specific field.
First, a model based on the Transformer architecture is constructed. The Transformer is a self-attention based model that can handle variable-length input sequences while taking into account the interactions between all elements in a sequence. Then, a large amount of text data related to the specific field is collected and used to train the model so that it understands and adapts to that field.
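For concreteness, the following PyTorch sketch previews the encoder structure described later in this embodiment: multi-head self-attention, residual connections with layer normalization, and a feed-forward sub-layer. The class names, the post-norm arrangement, the GELU activation and the BERT-base-like dimensions are illustrative assumptions, not details fixed by the disclosure.

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Project Q/K/V per head, apply scaled dot-product attention,
    concatenate the heads, and apply a final linear map."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_k, self.h = d_model // n_heads, n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (B, T, d_model)
        B, T, _ = x.shape
        split = lambda t: t.view(B, T, self.h, self.d_k).transpose(1, 2)
        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5  # (B, h, T, T)
        out = (scores.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, T, -1)
        return self.w_o(out)  # concatenated heads -> final linear map

class EncoderLayer(nn.Module):
    """One encoder layer: each sub-layer is wrapped in a residual
    connection followed by layer normalization (post-norm)."""
    def __init__(self, d_model=768, n_heads=12, d_ff=3072):
        super().__init__()
        self.attn = MultiHeadSelfAttention(d_model, n_heads)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.norm1(x + self.attn(x))  # residual + normalization
        return self.norm2(x + self.ff(x))  # residual + normalization

# A stack of identical layers processes the first data layer by layer:
encoder = nn.Sequential(*[EncoderLayer() for _ in range(12)])
```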
Masking the text corpus to obtain the preprocessed data; the masking strategy includes word-level masking and character-level masking.
When preprocessing the data, some words or characters are masked at random, and the model is then asked to predict these masked portions. This helps the model learn to understand context and the relationships between words.
Word embedding and position encoding are performed on the preprocessed data to obtain the first data. Word embedding is the process of converting words or characters into real-valued vectors such that semantically similar words or characters lie closer together in vector space. Position encoding adds position information to the word vectors, because in natural language processing the order and position of words is often crucial to understanding semantics.
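As one illustration, the sinusoidal position encoding of the original Transformer paper could be used here; the disclosure does not specify the encoding scheme, so this choice (and an even dimension d) is an assumption.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...);
    d_model is assumed even."""
    pos = np.arange(seq_len)[:, None]              # (T, 1)
    i = np.arange(0, d_model, 2)[None, :]          # (1, d/2)
    angles = pos / np.power(10000.0, i / d_model)  # (T, d/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2], pe[:, 1::2] = np.sin(angles), np.cos(angles)
    return pe

# first data = word embeddings plus position encodings (names illustrative):
# first_data = embedding_matrix[token_ids] + positional_encoding(T, d)
```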
Inputting the first data into a Transformer encoder and processing it layer by layer through a plurality of encoders connected in series to obtain the second data. The encoder in the Transformer model is formed by stacking a plurality of identical layers, each mainly comprising a multi-head self-attention mechanism (Multi-Head Self-Attention) and a feed-forward neural network (Feed-Forward Neural Network). Both sub-layers have residual connections and layer normalization. The first data (the word-embedded and position-encoded input) is first fed into the self-attention mechanism and then processed layer by layer through the feed-forward neural network to obtain the second data.
A mask language model is constructed and the second data is input to it; the mask language model outputs predicted data for the preprocessed data, and these predictions are taken as the feature representation. The mask language model is a pre-training model whose goal is to predict the masked portions of the input. The second data processed by the Transformer encoder is input into the mask language model, which then predicts the masked portions; this allows the model to learn the internal structure and context of the language during the pre-training phase.
The training optimization objective function of the mask language model in this embodiment is $\mathcal{L}(\theta) = -\sum_{i=1}^{M} \log P(x_i \mid \tilde{x}; \theta)$, $x_i \in V$, where $M$ is the number of masked words, $V$ is the dictionary, $\theta$ denotes the mask language model parameters, and $P(x_i \mid \tilde{x}; \theta)$ is the likelihood the model predicts for the masked word $x_i$ given the masked sequence $\tilde{x}$.
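Assuming the objective takes the reconstructed form above, the masking and the loss over masked positions could be implemented roughly as follows in PyTorch; the 15% masking rate and the -100 ignore-index convention are borrowed from common BERT implementations rather than stated in the disclosure (BERT's 80/10/10 replacement split is omitted for brevity).

```python
import torch
import torch.nn.functional as F

def mask_tokens(input_ids: torch.Tensor, mask_token_id: int, mask_prob: float = 0.15):
    """Randomly mask tokens; labels keep the original ids at masked
    positions and -100 elsewhere, so only masked words enter the loss."""
    labels = input_ids.clone()
    masked = torch.rand(input_ids.shape) < mask_prob
    labels[~masked] = -100                 # ignore unmasked positions
    corrupted = input_ids.clone()
    corrupted[masked] = mask_token_id      # replace masked words with [MASK]
    return corrupted, labels

def mlm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Mean negative log-likelihood over the M masked words only.
    logits: (B, T, |V|) prediction scores; labels: (B, T)."""
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1), ignore_index=-100)
```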
The training method of the Transformer encoder comprises the following steps:
inputting first data into an attention layer of an encoder, performing linear transformation on a query matrix, a key matrix and a value matrix of the attention layer for a plurality of times, calculating dot product attention scores, splicing the attention scores, and performing linear transformation to obtain multi-head attention scores; the input data (the embedded and position-coded data, i.e., the first data) is divided into three parts of "query", "key", and "value", which are all subjected to a linear transformation. The dot product of the query and the key is then calculated to obtain an attention score that indicates which portions of the input data the model should pay more attention to. This process is performed in a number of different representation spaces, each of which is called a "head", and finally, the outputs of all heads are spliced together and then subjected to a linear transformation to obtain the output of multiple heads of attention.
Adding the input and the output of the attention layer through a residual connection, and normalizing the hidden layer of the neural network toward a standard normal distribution, to obtain the first vector. The residual connection (Residual Connection) and normalization operations both serve to prevent vanishing or exploding gradients during training. A residual connection adds the input directly to the output, while normalization makes the output of the hidden layer approach a standard normal distribution in every dimension.
Inputting the obtained first vector into the feed-forward layer of the encoder, which projects the first vector into a high-dimensional space and produces a feed-forward output. The feed-forward layer (Feed-Forward layer) is a fully connected neural network that processes the output of the self-attention mechanism and passes it to the next stage. It projects the input into a high-dimensional space and then back to the original dimension, producing the feed-forward output.
Adding the input and the output of the feed-forward layer through a residual connection, and normalizing the hidden layer toward a standard normal distribution, to obtain the output of the encoder. This mirrors the second step, only applied to the feed-forward layer: by summing the input and output of the feed-forward layer and normalizing the result, the final output of the encoder is obtained.
Example III
As shown in fig. 3, the present embodiment describes a method for constructing a language understanding model, the method including:
constructing an intent encoder based on LSTM and outputting intent characteristics of word levels of natural sentences; LSTM (Long Short Term Memory) is a special Recurrent Neural Network (RNN) that can effectively capture long-range dependencies when processing sequence data. In this step, an LSTM-based encoder is used to read the input natural sentence and generate a feature vector for each word that contains the intent information. This feature vector captures the semantic information of the word in the whole sentence and its relationship to other words.
Constructing an embedder, integrating the words and the corresponding business knowledge according to the intention characteristics of the words, and obtaining a knowledge characteristic set of the word level; and integrating the intention features with the corresponding business knowledge to generate a knowledge feature set, so as to realize the association of the intention features and the business knowledge.
Obtaining a knowledge context vector through a knowledge feature set; the knowledge feature set contains knowledge features for each word that together form a set containing information for all words. These features are then integrated into a single vector, which is called the knowledge context vector.
Constructing a decoder, taking the word-level intention features and the knowledge context vector as input, and outputting the word-level intention prediction results through the decoder. The previously generated intent features and knowledge context vectors are fed into the decoder (which may also be an LSTM model), whose task is to predict the intent of each word.
Outputting the sentence-level intention prediction result from the word-level intention prediction results: $y^{S} = \arg\max_{1 \le j \le m} \sum_{i=1}^{T} \mathbb{1}\left(y_i = e_j\right)$, where $T$ is the length of the sentence, $m$ is the number of intention labels, $e_j$ is the $m$-dimensional 0-1 vector whose $j$-th bit is 1 and whose other bits are 0, $y_i$ is the intention prediction result at the $i$-th word level, and $\mathbb{1}(\cdot)$ is an indicator function measuring whether the predicted word-level label matches the actual label. In other words, the word-level intent predictions generated by the decoder are aggregated into a sentence-level intent prediction.
And converting the words in the intention prediction result into word vectors through a pre-trained word embedding model, then averaging all the word vectors to obtain the query vector. Each word is converted to a word vector using a pre-trained word embedding model (e.g., Word2Vec or GloVe), and all word vectors are then averaged to obtain the query vector. This query vector can be seen as a single vector representing the semantics of the whole sentence and can be used for subsequent retrieval tasks.
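A minimal sketch of this averaging step, assuming the pre-trained embeddings are available as a plain word-to-vector mapping (the 300-dimensional default and the names are illustrative):

```python
import numpy as np

def query_vector(words, word_vectors, dim=300):
    """Average the embeddings of the words kept by the intent prediction
    to obtain a single query vector for retrieval; out-of-vocabulary
    words are simply skipped in this sketch."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```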
The method for obtaining the intention characteristics at the word level comprises the following steps:
constructing a BERT encoder, inputting words into the BERT encoder, outputting T hidden layer state vectors with d dimensions and 1 semantic slot value characteristic of word level by using the hidden layer state vectors, wherein d is a special vector for classifying tasks; representing the intention characteristic of the sentence level by using a special vector; wherein T is the length of the input natural language, and d is the dimension; BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained deep learning model for natural language processing tasks. In this step, each word is input into the BERT encoder. For each word input, the BERT encoder outputs a d-dimensional hidden layer state vector. These vectors capture the deep semantic information of each word. In addition to these vectors, the BERT encoder outputs a special vector that is commonly used to perform classification tasks. The hidden layer state vector is used to represent semantic slot value features for each word that reflect the semantic role of each word in a particular context. Meanwhile, a special vector is used to represent the intended feature of the whole sentence, which can grasp the overall meaning and purpose of the sentence.
The semantic slot value features are input to the intention encoder, with the special vector as its initial hidden state; the encoder then emits the corresponding T d-dimensional word-level intention features. The intention encoder processes the word-level semantic slot value features, and its initial hidden state is the previously acquired special vector, which carries the intention information of the entire sentence. Through this process, the intention encoder generates the word-level intention features from the semantic slot value features of each word together with the intention feature of the entire sentence.
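A minimal sketch of such an intention encoder, assuming a single-layer LSTM whose initial hidden state is the special [CLS] vector (the class name and the zero-initialized cell state are illustrative choices):

```python
import torch
import torch.nn as nn

class IntentEncoder(nn.Module):
    """Read the T word-level semantic slot value features and emit T
    d-dimensional word-level intention features, seeded with the
    sentence-level special vector."""
    def __init__(self, d: int):
        super().__init__()
        self.lstm = nn.LSTM(d, d, batch_first=True)

    def forward(self, slot_features, cls_vector):
        # slot_features: (B, T, d); cls_vector: (B, d)
        h0 = cls_vector.unsqueeze(0)  # (1, B, d) initial hidden state
        c0 = torch.zeros_like(h0)     # cell state: assumed zero-initialized
        intent_features, _ = self.lstm(slot_features, (h0, c0))
        return intent_features        # (B, T, d)
```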
By using an attention mechanism and position codes, the business knowledge of the specific field is associated with the words to obtain a detailed knowledge feature set. The method for obtaining the word-level knowledge feature set comprises the following steps:
determining the i-th wordCorresponding intention characteristics, determining n pieces of business knowledge corresponding to the ith word, and obtaining a concept set of n pieces of business knowledge>
Calculating a correlation coefficient b between each piece of business knowledge in the concept set and the $i$-th word using an attention mechanism. The attention mechanism is a computational model that can measure the correlation between inputs; in this step it is used to compute the correlation between the $i$-th word and each piece of business knowledge in the concept set.
After weighted calculation is carried out through the correlation coefficient b, n-dimensional knowledge features of the ith word are obtained, and the knowledge features can reflect the correlation between the ith word and each business knowledge;
and embedding the position code of the i-th word into the n-bit knowledge feature corresponding to the i-th word to obtain a knowledge feature set. The location information of the i-th word is encoded into its knowledge features, and the encoding process ensures that the location information of each word is preserved.
The knowledge context vector obtaining method comprises the following steps:
calculating the correlation coefficient of the ith word and the knowledge feature through an attention mechanismAnd obtaining a multidimensional knowledge context vector of the ith word through weighted calculation: />,/>Wherein->For elements in the knowledge feature set, W is a trained weight parameter, ++>For the corresponding intention feature of words, +.>The semantic slot value characteristics corresponding to the words are obtained; first, a relevance coefficient between the i-th word and each element in the knowledge feature set is calculated using an attention mechanism. The correlation coefficient is calculated by comparing the intention characteristic corresponding to each word and the semantic slot value characteristic with the elements in the knowledge characteristic set, and the obtained correlation coefficient can be used for subsequent weighted calculation. The multi-dimensional knowledge context vector of each word can be obtained by carrying out weighted calculation on the trained weight parameters W. This vector is effectively a weighted sum of the knowledge feature set and the relevance coefficients of each word, which ensures the location of the knowledge context vector for each word in feature space.
Performing layer normalization on the multidimensional knowledge context vectors to obtain the knowledge context vector of the natural sentence. After the multidimensional knowledge context vector of each word is obtained, layer normalization is applied to these vectors; the purpose of this step is to ensure the stability of the model and the effectiveness of training. The normalized knowledge context vectors of the individual words are then combined to form the knowledge context vector of the whole natural sentence.
In summary, through the attention mechanism and layer normalization, the knowledge context vector of each word is obtained by computing correlation coefficients over the knowledge features and performing a weighted combination, and these vectors are further combined into the knowledge context vector of the whole natural sentence. This vector contains the contextual knowledge information of every word in the sentence and can be used for subsequent natural language understanding and generation tasks.
The method for obtaining the intention prediction result of the word level comprises the following steps:
taking the intention characteristics of word level and knowledge context vectors as input, mapping out the intention characteristics of word level through a decoder based on LSTM, and processing the intention characteristics through layer standardization; first, the intent feature and knowledge context vector at word level are fetched as inputs to the LSTM based decoder. The task of the decoder is to map these inputs to new space, and these features enable higher level representation, including richer information, under the decoder's processing. Then, the intent features of these new mappings are subjected to a layer normalization process. The purpose of doing so is to ensure the stability of the model and the effectiveness of training, to make the characteristic value in a fixed range, and to reduce the difficulty of model training.
Calculating the word-level intention detection result $y_i = \operatorname{softmax}\left(W^{I} \tilde{h}_i + b^{I}\right)$, where $W^{I}$ and $b^{I}$ are trainable parameters and $\tilde{h}_i$ is the layer-normalized decoder feature of the $i$-th word. After the decoder and the normalization process, the intention features of each word are available; these features must then be converted into actual intent predictions.
This step is essentially a classification problem, i.e., deciding which intent label each word belongs to. It is performed by a classifier whose parameters are trainable and are optimized to obtain the best predictions. Other steps may also be involved in this process, such as training the decoder and training and optimizing the classifier; these depend on the specific circumstances and data, and may therefore need to be adjusted in practice.
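A sketch of such a classifier, assuming it is the single linear-plus-softmax layer suggested by the formula above (class name and shapes are illustrative):

```python
import torch
import torch.nn as nn

class WordIntentClassifier(nn.Module):
    """Map each word's layer-normalized decoder feature to a distribution
    over the m intention labels; the linear layer holds the trainable
    parameters W_I and b_I mentioned in the text."""
    def __init__(self, d: int, m: int):
        super().__init__()
        self.proj = nn.Linear(d, m)

    def forward(self, decoder_feats):  # (B, T, d)
        return torch.softmax(self.proj(decoder_feats), dim=-1)  # (B, T, m)
```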
Example IV
A natural language based domain specific business knowledge retrieval device comprises a memory, a processor and a computer program stored in the memory and executable on the processor; the processor implements the steps of the method described above when executing the computer program.
The memory may be used to store software programs and modules, and the processor executes various functional applications of the terminal and data processing by running the software programs and modules stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an execution program required for at least one function, and the like.
The storage data area may store data created according to the use of the terminal, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
A computer readable storage medium storing a computer program which when executed by a processor performs the steps of a natural language based domain specific business knowledge retrieval method as described above.
Computer readable media may include computer storage media and communication media without loss of generality. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to the above. The above-described system memory and mass storage devices may be collectively referred to as memory.
In the description of the present specification, reference to the terms "one embodiment/manner," "some embodiments/manner," "example," "a particular example," "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment/manner or example is included in at least one embodiment/manner or example of the application. In this specification, the schematic representations of the above terms are not necessarily for the same embodiment/manner or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments/modes or examples. Furthermore, the various embodiments/modes or examples described in this specification and the features of the various embodiments/modes or examples can be combined and combined by persons skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
It will be appreciated by persons skilled in the art that the above embodiments are provided for clarity of illustration only and are not intended to limit the scope of the application. Other variations or modifications of the above-described application will be apparent to those of skill in the art, and are still within the scope of the application.

Claims (10)

1. A specific domain business knowledge retrieval method based on natural language is characterized by comprising the following steps:
constructing a pre-training language model aiming at a specific field;
carrying out feature representation on the business knowledge data through the pre-training language model to obtain a database formed by feature vectors;
constructing a language understanding model for knowledge retrieval;
inputting a natural sentence of the search question to the language understanding model, and obtaining the query vector of the search question through the language understanding model;
sending the query vector to a database, and calculating the similarity between the query vector and the feature vector in the database;
sorting in descending order of the similarity, and returning the business knowledge corresponding to the first k feature vectors;
and completing the retrieval of the business knowledge in the specific field.
2. The natural language-based domain-specific business knowledge retrieval method according to claim 1, wherein the pre-training language model construction method comprises:
constructing a Transformer model, and collecting a text corpus of business knowledge in the specific field;
masking the text corpus to obtain preprocessed data; the masking strategy includes word-level masking and character-level masking;
word embedding and position encoding are carried out on the preprocessed data, so that first data are obtained;
inputting the first data into a Transformer encoder, and processing the first data layer by layer through a plurality of encoders connected in series to obtain second data;
a mask language model is constructed and the second data is input to the mask language model, and the mask language model outputs predicted data for the pre-processed data and takes the predicted data as a feature representation.
3. The method for domain-specific business knowledge retrieval based on natural language according to claim 2, wherein the training method of the Transformer encoder comprises:
inputting the first data into the attention layer of the encoder, applying linear transformations to the query, key and value matrices of the attention layer several times, calculating dot-product attention scores, concatenating the per-head outputs, and applying a linear transformation to obtain the multi-head attention output;
adding the input and the output of the attention layer through residual connection, normalizing the hidden layer in the neural network into standard normal distribution through normalization, and obtaining a first vector;
inputting the obtained first vector to a feedforward layer of an encoder, wherein the feedforward layer projects the first vector to a high-dimensional space and obtains a feedforward output;
and adding the input and the output of the feedforward layer through residual connection, normalizing the hidden layer in the neural network into standard normal distribution through normalization, and obtaining the output of the encoder.
4. The natural language based domain specific business knowledge retrieval method according to claim 2, wherein the training optimization objective function of the mask language model is $\mathcal{L}(\theta) = -\sum_{i=1}^{M} \log P(x_i \mid \tilde{x}; \theta)$, $x_i \in V$, where $M$ is the number of masked words, $V$ is the dictionary, $\theta$ denotes the mask language model parameters, and $P(x_i \mid \tilde{x}; \theta)$ is the likelihood the model predicts for the masked word $x_i$ given the masked sequence $\tilde{x}$.
5. The method for retrieving domain-specific business knowledge based on natural language according to claim 1, wherein the method for constructing the language understanding model comprises the following steps:
constructing an intent encoder based on LSTM and outputting intent characteristics of word levels of natural sentences;
constructing an embedder, integrating the words and the corresponding business knowledge according to the intention characteristics of the words, and obtaining a knowledge characteristic set of the word level;
obtaining a knowledge context vector through a knowledge feature set;
constructing a decoder, taking the word-level intention features and the knowledge context vector as input, and outputting the word-level intention prediction results through the decoder;
outputting the sentence-level intention prediction result from the word-level intention prediction results: $y^{S} = \arg\max_{1 \le j \le m} \sum_{i=1}^{T} \mathbb{1}\left(y_i = e_j\right)$, where $T$ is the length of the sentence, $m$ is the number of intention labels, $e_j$ is the $m$-dimensional 0-1 vector whose $j$-th bit is 1 and whose other bits are 0, $y_i$ is the intention prediction result at the $i$-th word level, and $\mathbb{1}(\cdot)$ is an indicator function measuring whether the predicted word-level label matches the actual label;
and converting the words in the intention prediction result into word vectors through a pre-trained word embedding model, and averaging all the word vectors to obtain the query vector.
6. The method for retrieving domain-specific business knowledge based on natural language as claimed in claim 5, wherein the method for obtaining the intention feature at word level comprises:
constructing a BERT encoder; the words are input into the BERT encoder, which outputs T d-dimensional hidden-layer state vectors plus 1 special vector used for classification tasks; the hidden-layer state vectors represent the word-level semantic slot value features, and the special vector represents the sentence-level intention feature; wherein T is the length of the input natural sentence and d is the hidden dimension;
the semantic slot value features are input to the intention encoder, with the special vector as its initial hidden state; the intention encoder then emits the corresponding T d-dimensional word-level intention features.
7. The method for retrieving domain-specific business knowledge based on natural language as claimed in claim 6, wherein the obtaining method of the knowledge feature set at word level comprises:
determining the intention feature corresponding to the $i$-th word $x_i$, determining the n pieces of business knowledge corresponding to the $i$-th word, and obtaining the concept set $C_i = \{c_1, c_2, \dots, c_n\}$ of the n pieces of business knowledge;
Calculating a correlation coefficient b between each business knowledge in the concept set and the ith word by using an attention mechanism;
after weighting calculation is carried out through the correlation coefficient b, n-dimensional knowledge characteristics of the i-th word are obtained;
and embedding the position code of the i-th word into the n knowledge features corresponding to the i-th word to obtain the knowledge feature set.
8. The domain-specific business knowledge retrieval method based on natural language as claimed in claim 6, wherein the knowledge context vector obtaining method comprises:
calculating the correlation coefficient $\alpha_{ik}$ of the $i$-th word and each knowledge feature through an attention mechanism, and obtaining the multidimensional knowledge context vector of the $i$-th word through weighted calculation: $\alpha_{ik} = \operatorname{softmax}_k\left((h_i \oplus s_i)^{\top} W k_{ik}\right)$, $c_i = \sum_{k=1}^{n} \alpha_{ik} k_{ik}$, where $k_{ik}$ are the elements in the knowledge feature set, $W$ is a trained weight parameter, $h_i$ is the intention feature corresponding to the word, and $s_i$ is the semantic slot value feature corresponding to the word;
performing layer normalization on the multidimensional knowledge context vectors to obtain the knowledge context vector of the natural sentence.
9. The method for retrieving domain-specific business knowledge based on natural language according to claim 8, wherein the method for obtaining the intention prediction result at word level comprises:
taking the word-level intention features and the knowledge context vector as input, mapping out new word-level intention features through an LSTM-based decoder, and processing the intention features through layer normalization;
calculating the word-level intention detection result $y_i = \operatorname{softmax}\left(W^{I} \tilde{h}_i + b^{I}\right)$, where $W^{I}$ and $b^{I}$ are trainable parameters.
10. A natural language based domain specific business knowledge retrieval device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-9 when the computer program is executed.
CN202310954971.9A 2023-08-01 2023-08-01 Specific domain business knowledge retrieval method and retrieval device based on natural language Active CN116662582B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310954971.9A CN116662582B (en) 2023-08-01 2023-08-01 Specific domain business knowledge retrieval method and retrieval device based on natural language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310954971.9A CN116662582B (en) 2023-08-01 2023-08-01 Specific domain business knowledge retrieval method and retrieval device based on natural language

Publications (2)

Publication Number Publication Date
CN116662582A true CN116662582A (en) 2023-08-29
CN116662582B (en) 2023-10-10

Family

ID=87721037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310954971.9A Active CN116662582B (en) 2023-08-01 2023-08-01 Specific domain business knowledge retrieval method and retrieval device based on natural language

Country Status (1)

Country Link
CN (1) CN116662582B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116992942A (en) * 2023-09-26 2023-11-03 苏州元脑智能科技有限公司 Natural language model optimization method, device, natural language model, equipment and medium
CN117093729A (en) * 2023-10-17 2023-11-21 北方健康医疗大数据科技有限公司 Retrieval method, system and retrieval terminal based on medical scientific research information
CN117171413A (en) * 2023-09-07 2023-12-05 滨州八爪鱼网络科技有限公司 Data processing system and method for digital collection management
CN117473071A (en) * 2023-12-27 2024-01-30 珠海格力电器股份有限公司 Data retrieval method, device, equipment and computer readable medium
CN117992068A (en) * 2024-04-02 2024-05-07 天津南大通用数据技术股份有限公司 LSTM and TRM combined intelligent database grammar analysis method

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109074353A (en) * 2016-10-10 2018-12-21 微软技术许可有限责任公司 The combination of language understanding and information retrieval
CN110516055A (en) * 2019-08-16 2019-11-29 西北工业大学 A kind of cross-platform intelligent answer implementation method for teaching task of combination BERT
CN111625641A (en) * 2020-07-30 2020-09-04 浙江大学 Dialog intention recognition method and system based on multi-dimensional semantic interaction representation model
CN111680511A (en) * 2020-04-21 2020-09-18 华东师范大学 Military field named entity identification method with cooperation of multiple neural networks
CN112000805A (en) * 2020-08-24 2020-11-27 平安国际智慧城市科技股份有限公司 Text matching method, device, terminal and storage medium based on pre-training model
CN113377844A (en) * 2021-06-29 2021-09-10 哈尔滨工业大学 Dialogue type data fuzzy retrieval method and device facing large relational database
CN113962219A (en) * 2021-10-13 2022-01-21 国网浙江省电力有限公司电力科学研究院 Semantic matching method and system for knowledge retrieval and question answering of power transformer
CN114047929A (en) * 2022-01-12 2022-02-15 广东省科技基础条件平台中心 Knowledge enhancement-based user defined function identification method, device and medium
CN114416927A (en) * 2022-01-24 2022-04-29 招商银行股份有限公司 Intelligent question and answer method, device, equipment and storage medium
CN114491024A (en) * 2021-12-31 2022-05-13 长城信息股份有限公司 Small sample-based specific field multi-label text classification method
US11334565B1 (en) * 2016-10-28 2022-05-17 Intuit, Inc. System to convert natural-language financial questions into database queries
CN114817494A (en) * 2022-04-02 2022-07-29 华南理工大学 Knowledge type retrieval type dialogue method based on pre-training and attention interaction network
CN115048447A (en) * 2022-06-27 2022-09-13 华中科技大学 Database natural language interface system based on intelligent semantic completion
US20220292262A1 (en) * 2021-03-10 2022-09-15 At&T Intellectual Property I, L.P. System and method for hybrid question answering over knowledge graph
CN115292457A (en) * 2022-06-30 2022-11-04 腾讯科技(深圳)有限公司 Knowledge question answering method and device, computer readable medium and electronic equipment
CN115309879A (en) * 2022-08-05 2022-11-08 中国石油大学(华东) Multi-task semantic parsing model based on BART
CN115510814A (en) * 2022-11-09 2022-12-23 东南大学 Chapter-level complex problem generation method based on double planning
CN115759062A (en) * 2022-10-09 2023-03-07 阿里巴巴(中国)有限公司 Knowledge injection-based text and image pre-training model processing method and text and image retrieval system
CN116257616A (en) * 2023-03-14 2023-06-13 山东师范大学 Entity relation extraction method and system for music field

Patent Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109074353A (en) * 2016-10-10 2018-12-21 微软技术许可有限责任公司 The combination of language understanding and information retrieval
US11334565B1 (en) * 2016-10-28 2022-05-17 Intuit, Inc. System to convert natural-language financial questions into database queries
CN110516055A (en) * 2019-08-16 2019-11-29 西北工业大学 A kind of cross-platform intelligent answer implementation method for teaching task of combination BERT
CN111680511A (en) * 2020-04-21 2020-09-18 华东师范大学 Military field named entity identification method with cooperation of multiple neural networks
CN111625641A (en) * 2020-07-30 2020-09-04 浙江大学 Dialog intention recognition method and system based on multi-dimensional semantic interaction representation model
CN112000805A (en) * 2020-08-24 2020-11-27 平安国际智慧城市科技股份有限公司 Text matching method, device, terminal and storage medium based on pre-training model
US20220292262A1 (en) * 2021-03-10 2022-09-15 At&T Intellectual Property I, L.P. System and method for hybrid question answering over knowledge graph
CN113377844A (en) * 2021-06-29 2021-09-10 哈尔滨工业大学 Dialogue type data fuzzy retrieval method and device facing large relational database
CN113962219A (en) * 2021-10-13 2022-01-21 国网浙江省电力有限公司电力科学研究院 Semantic matching method and system for knowledge retrieval and question answering of power transformer
CN114491024A (en) * 2021-12-31 2022-05-13 长城信息股份有限公司 Small sample-based specific field multi-label text classification method
CN114047929A (en) * 2022-01-12 2022-02-15 广东省科技基础条件平台中心 Knowledge enhancement-based user defined function identification method, device and medium
CN114416927A (en) * 2022-01-24 2022-04-29 招商银行股份有限公司 Intelligent question and answer method, device, equipment and storage medium
CN114817494A (en) * 2022-04-02 2022-07-29 华南理工大学 Knowledge type retrieval type dialogue method based on pre-training and attention interaction network
CN115048447A (en) * 2022-06-27 2022-09-13 华中科技大学 Database natural language interface system based on intelligent semantic completion
CN115292457A (en) * 2022-06-30 2022-11-04 腾讯科技(深圳)有限公司 Knowledge question answering method and device, computer readable medium and electronic equipment
CN115309879A (en) * 2022-08-05 2022-11-08 中国石油大学(华东) Multi-task semantic parsing model based on BART
CN115759062A (en) * 2022-10-09 2023-03-07 阿里巴巴(中国)有限公司 Knowledge injection-based text and image pre-training model processing method and text and image retrieval system
CN115510814A (en) * 2022-11-09 2022-12-23 东南大学 Chapter-level complex problem generation method based on double planning
CN116257616A (en) * 2023-03-14 2023-06-13 山东师范大学 Entity relation extraction method and system for music field

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孙毅等 (Sun Yi et al.): "A Survey of Knowledge Enhancement Methods for Pre-trained Natural Language Models" (自然语言预训练模型知识增强方法综述), Journal of Chinese Information Processing (中文信息学报), vol. 35, no. 7, pp. 10-29 *
赵良等 (Zhao Liang et al.): "Extracting Relations in the Food Safety Domain Using BERT and an Improved PCNN Model" (用BERT和改进PCNN模型抽取食品安全领域关系), Transactions of the Chinese Society of Agricultural Engineering (农业工程学报), vol. 38, no. 8, pp. 263-270 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117171413A (en) * 2023-09-07 2023-12-05 滨州八爪鱼网络科技有限公司 Data processing system and method for digital collection management
CN117171413B (en) * 2023-09-07 2024-03-08 滨州八爪鱼网络科技有限公司 Data processing system and method for digital collection management
CN116992942A (en) * 2023-09-26 2023-11-03 苏州元脑智能科技有限公司 Natural language model optimization method, device, natural language model, equipment and medium
CN116992942B (en) * 2023-09-26 2024-02-02 苏州元脑智能科技有限公司 Natural language model optimization method, device, natural language model, equipment and medium
CN117093729A (en) * 2023-10-17 2023-11-21 北方健康医疗大数据科技有限公司 Retrieval method, system and retrieval terminal based on medical scientific research information
CN117093729B (en) * 2023-10-17 2024-01-09 北方健康医疗大数据科技有限公司 Retrieval method, system and retrieval terminal based on medical scientific research information
CN117473071A (en) * 2023-12-27 2024-01-30 珠海格力电器股份有限公司 Data retrieval method, device, equipment and computer readable medium
CN117473071B (en) * 2023-12-27 2024-04-05 珠海格力电器股份有限公司 Data retrieval method, device, equipment and computer readable medium
CN117992068A (en) * 2024-04-02 2024-05-07 天津南大通用数据技术股份有限公司 LSTM and TRM combined intelligent database grammar analysis method

Also Published As

Publication number Publication date
CN116662582B (en) 2023-10-10

Similar Documents

Publication Publication Date Title
CN116662582B (en) Specific domain business knowledge retrieval method and retrieval device based on natural language
CN111611361B (en) Intelligent reading, understanding, question answering system of extraction type machine
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
CN108932342A (en) A kind of method of semantic matches, the learning method of model and server
CN113010693A (en) Intelligent knowledge graph question-answering method fusing pointer to generate network
CN112989834A (en) Named entity identification method and system based on flat grid enhanced linear converter
CN110990555B (en) End-to-end retrieval type dialogue method and system and computer equipment
CN111414481A (en) Chinese semantic matching method based on pinyin and BERT embedding
CN114548101B (en) Event detection method and system based on backtracking sequence generation method
CN111985205A (en) Aspect level emotion classification model
CN111462749A (en) End-to-end dialogue system and method based on dialogue state guidance and knowledge base retrieval
CN116150335A (en) Text semantic retrieval method under military scene
CN114781375A (en) Military equipment relation extraction method based on BERT and attention mechanism
CN115600597A (en) Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium
CN113705238A (en) Method and model for analyzing aspect level emotion based on BERT and aspect feature positioning model
CN115796182A (en) Multi-modal named entity recognition method based on entity-level cross-modal interaction
Parvin et al. Transformer-based local-global guidance for image captioning
CN117435716B (en) Data processing method and system of power grid man-machine interaction terminal
CN114492796A (en) Multitask learning sign language translation method based on syntax tree
CN116955579B (en) Chat reply generation method and device based on keyword knowledge retrieval
CN111581365B (en) Predicate extraction method
Sabharwal et al. Introduction to word embeddings
CN115964497A (en) Event extraction method integrating attention mechanism and convolutional neural network
CN116167353A (en) Text semantic similarity measurement method based on twin long-term memory network
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant