CN116662582A - Specific domain business knowledge retrieval method and retrieval device based on natural language - Google Patents

Specific domain business knowledge retrieval method and retrieval device based on natural language

Info

Publication number
CN116662582A
CN116662582A (application CN202310954971.9A)
Authority
CN
China
Prior art keywords
knowledge
word
intention
vector
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310954971.9A
Other languages
Chinese (zh)
Other versions
CN116662582B (en)
Inventor
邱洪涛
高渐朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Ict Information Technology Co ltd
Original Assignee
Chengdu Ict Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Ict Information Technology Co ltd filed Critical Chengdu Ict Information Technology Co ltd
Priority to CN202310954971.9A priority Critical patent/CN116662582B/en
Publication of CN116662582A publication Critical patent/CN116662582A/en
Application granted granted Critical
Publication of CN116662582B publication Critical patent/CN116662582B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/02 Knowledge representation; Symbolic representation
    • G06N 5/022 Knowledge engineering; Knowledge acquisition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a natural-language-based method and device for retrieving business knowledge in a specific field, comprising the following steps: constructing a pre-training language model; performing feature representation on the business knowledge data to obtain a database of feature vectors; constructing a language understanding model; inputting a natural sentence and obtaining the query vector of the search question through the language understanding model; calculating the similarity between the query vector and the feature vectors in the database; and returning the business knowledge corresponding to the top k feature vectors. By constructing the pre-training language model and the language understanding model, the application can better understand the user's query intention and thus match and retrieve the relevant business knowledge more accurately. Meanwhile, by calculating the similarity between the query vector and the feature vectors in the database, the business knowledge most relevant to the query can be found more quickly, greatly improving retrieval efficiency.

Description

Specific domain business knowledge retrieval method and retrieval device based on natural language
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a specific field business knowledge retrieval method and a retrieval device based on natural language.
Background
The existing knowledge retrieval technologies mainly include keyword search, text matching, semantic matching and the like. These methods generally require manual classification of the knowledge base, and then match the entered keywords or query sentences through specific search algorithms or rules to return relevant results. Such traditional knowledge retrieval methods struggle to process large-scale, multi-domain and diversified data, and suffer from problems such as low search efficiency, low matching accuracy and inability to understand complex semantics.
Existing knowledge retrieval methods, especially keyword search and simple text matching, cannot understand the true intention and complex semantics of a query, and cannot process fuzzy queries. For example, if a query sentence entered by a user contains ambiguous, non-keyword expressions, conventional knowledge retrieval methods may fail to return correct results.
With the development of the internet and information technology, the amount of business knowledge data in a specific field is increasing, which makes knowledge retrieval more difficult. Existing knowledge retrieval methods may not be able to efficiently process large-scale data and are difficult to adapt to ever-changing and updated knowledge bases.
Therefore, a more intelligent, flexible and efficient knowledge retrieval method is needed to improve the efficiency and accuracy of knowledge retrieval, and simultaneously, the knowledge retrieval method can adapt to a large-scale, diverse and continuously-changing data environment.
Disclosure of Invention
The technical problem to be solved by the application is that traditional knowledge retrieval methods can only perform simple keyword matching and often fail to return correct results. The application aims to provide a natural-language-based method and device for retrieving business knowledge in a specific field, realizing retrieval through natural language, better understanding the query intention, and matching and retrieving the relevant business knowledge more accurately.
The application is realized by the following technical scheme:
a specific domain business knowledge retrieval method based on natural language comprises the following steps:
constructing a pre-training language model aiming at a specific field;
carrying out feature representation on the business knowledge data through the pre-training language model to obtain a database formed by feature vectors;
constructing a language understanding model for knowledge retrieval;
inputting a natural sentence of the search question to the language understanding model, and obtaining the query vector of the search question through the language understanding model;
sending the query vector to a database, and calculating the similarity between the query vector and the feature vector in the database;
sorting in descending order of the similarity, and returning the business knowledge corresponding to the first k feature vectors;
and completing the retrieval of the business knowledge in the specific field.
Specifically, the construction method of the pre-training language model comprises the following steps:
constructing a Transformer model, and collecting a text corpus of business knowledge in the specific field;
masking the text corpus to obtain preprocessed data; the masking strategy includes word-level masking and character-level masking;
word embedding and position encoding are carried out on the preprocessed data, so that first data are obtained;
inputting the first data into a Transformer encoder, and processing the first data layer by layer through a plurality of encoders connected in series to obtain second data;
a mask language model is constructed and the second data is input to the mask language model, and the mask language model outputs predicted data for the pre-processed data and takes the predicted data as a feature representation.
Specifically, the training method of the Transformer encoder comprises the following steps:
inputting the first data into the attention layer of the encoder, applying linear transformations to the query, key and value matrices of the attention layer several times, calculating dot-product attention scores, concatenating the per-head outputs, and applying a linear transformation to obtain the multi-head attention output;
adding the input and the output of the attention layer through residual connection, normalizing the hidden layer in the neural network into standard normal distribution through normalization, and obtaining a first vector;
inputting the obtained first vector to a feedforward layer of an encoder, wherein the feedforward layer projects the first vector to a high-dimensional space and obtains a feedforward output;
adding the input and the output of the feedforward layer through residual connection, normalizing the hidden layer in the neural network into standard normal distribution through normalization, and obtaining the output of the encoder;
specifically, the training optimization objective function of the mask language model is:wherein->For the number of words masked, +.>Is dictionary (I)>For masking language model parameters, ++>The likelihood functions of the structure are predicted for the model mask.
Specifically, the method for constructing the language understanding model comprises the following steps:
constructing an intent encoder based on LSTM and outputting intent characteristics of word levels of natural sentences;
constructing an embedder, integrating the words and the corresponding business knowledge according to the intention characteristics of the words, and obtaining a knowledge characteristic set of the word level;
obtaining a knowledge context vector through a knowledge feature set;
constructing a decoder, taking the word-level intention features and the knowledge context vector as input, and outputting the word-level intention prediction results through the decoder;
outputting the sentence-level intention prediction result from the word-level intention prediction results: $y^{S} = \arg\max_{1 \le j \le m} \sum_{i=1}^{T} \mathbb{1}\left(y_i = e_j\right)$, where $T$ is the length of the sentence, $m$ is the number of intention labels, $e_j$ is the $m$-dimensional 0-1 vector whose $j$-th bit is 1 and whose other bits are 0, $y_i$ is the intention prediction result at the $i$-th word level, and $\mathbb{1}(\cdot)$ is an indicator function measuring whether the predicted word-level label matches the actual label;
and converting the words in the intention prediction result into word vectors through a pre-trained word embedding model, and averaging all the word vectors to obtain the query vector.
Specifically, the method for obtaining the intention characteristic of the word level comprises the following steps:
constructing a BERT encoder; the words are input into the BERT encoder, which outputs T d-dimensional hidden-layer state vectors plus 1 special vector used for classification tasks; the hidden-layer state vectors represent the word-level semantic slot value features, and the special vector represents the sentence-level intention feature; wherein T is the length of the input natural sentence and d is the hidden dimension;
the semantic slot value features are input to the intention encoder, with the special vector as its initial hidden state; the intention encoder then emits the corresponding T d-dimensional word-level intention features.
Specifically, the method for obtaining the knowledge feature set at the word level comprises the following steps:
determining the intention feature corresponding to the $i$-th word $x_i$, determining the n pieces of business knowledge corresponding to the $i$-th word, and obtaining the concept set $C_i = \{c_1, c_2, \dots, c_n\}$ of the n pieces of business knowledge;
Calculating a correlation coefficient b between each business knowledge in the concept set and the ith word by using an attention mechanism;
after weighting calculation is carried out through the correlation coefficient b, n-dimensional knowledge characteristics of the i-th word are obtained;
and embedding the position code of the i-th word into the n knowledge features corresponding to the i-th word to obtain the knowledge feature set.
Specifically, the method for obtaining the knowledge context vector comprises the following steps:
calculating the correlation coefficient $\alpha_{ik}$ of the $i$-th word and each knowledge feature through an attention mechanism, and obtaining the multidimensional knowledge context vector of the $i$-th word through weighted calculation: $\alpha_{ik} = \operatorname{softmax}_k\left((h_i \oplus s_i)^{\top} W k_{ik}\right)$, $c_i = \sum_{k=1}^{n} \alpha_{ik} k_{ik}$, where $k_{ik}$ are the elements in the knowledge feature set, $W$ is a trained weight parameter, $h_i$ is the intention feature corresponding to the word, and $s_i$ is the semantic slot value feature corresponding to the word;
performing layer normalization on the multidimensional knowledge context vectors to obtain the knowledge context vector of the natural sentence.
Specifically, the method for obtaining the intention prediction result at the word level comprises the following steps:
taking the word-level intention features and the knowledge context vector as input, mapping out new word-level intention features through an LSTM-based decoder, and processing the intention features through layer normalization;
calculating the word-level intention detection result $y_i = \operatorname{softmax}\left(W^{I} \tilde{h}_i + b^{I}\right)$, where $W^{I}$ and $b^{I}$ are trainable parameters and $\tilde{h}_i$ is the layer-normalized decoder feature of the $i$-th word.
A natural language based domain specific business knowledge retrieval device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method as described above when the computer program is executed.
Compared with the prior art, the application has the following advantages and beneficial effects:
the application can better understand the query intention of the user by constructing the pre-training language model and the language understanding model, thereby more accurately matching and retrieving the related business knowledge. Meanwhile, by calculating the similarity between the query vector and the feature vector in the database, the service knowledge most relevant to the query can be found more quickly, and the retrieval efficiency is greatly improved.
By constructing the feature vector database, the application can effectively organize and manage large-scale business knowledge data. The pre-trained language model can continuously learn and adapt to new business knowledge, and the real-time performance and accuracy of retrieval are guaranteed.
By analyzing natural sentences through the language understanding model, the application can not only understand the user's query intention but also handle complex semantics and fuzzy queries. The number of returned results can be determined automatically according to the distribution of the similarities, providing a more intelligent retrieval service.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the application and together with the description serve to explain the principles of the application.
Fig. 1 is a flow chart of a specific domain business knowledge retrieval method based on natural language according to the present application.
FIG. 2 is a schematic flow chart of constructing a pre-trained language model according to the present application.
FIG. 3 is a schematic flow chart of constructing a language understanding model according to the present application.
Detailed Description
The present application will be described in further detail below with reference to the drawings and embodiments, in order to make the objects, technical solutions and advantages of the present application more apparent. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not restrictive of it.
It should be further noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
Embodiments of the present application and features of the embodiments may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Example 1
As shown in fig. 1, a specific domain business knowledge retrieval method based on natural language is provided, which includes:
firstly, constructing a pre-training language model aiming at a specific field, and carrying out feature representation on business knowledge data through the pre-training language model to obtain a database formed by feature vectors;
in order to obtain a model that can understand domain-specific language characteristics. Pre-trained language models are typically trained with large amounts of unlabeled text data to learn the semantics and grammar rules of the language. For a particular domain, we can choose the relevant text data to train so that the model can understand and generate the language of that domain. This step is self-supervised learning based on deep learning, and commonly used models include BERT, GPT, etc.
To convert the business knowledge data into a machine-understandable form, the pre-trained language model converts the text data into vectors in a high-dimensional space that preserve the semantic information of the text. These feature vectors are then organized into a database for subsequent knowledge retrieval.
Secondly, constructing a language understanding model for knowledge retrieval, inputting the natural sentence of the search question into the language understanding model, and obtaining the query vector of the search question through the language understanding model.
And thirdly, sending the query vector to a database, and calculating the similarity between the query vector and the feature vector in the database.
The goal is to find the business knowledge most relevant to the query. A similarity measure such as cosine similarity or Euclidean distance is generally used to compare the query vector with the feature vectors in the database.
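As an illustration only, cosine similarity between a query vector and the database of feature vectors might be computed as in the following Python sketch; the function and variable names are hypothetical, not part of the disclosure.

```python
import numpy as np

def cosine_similarities(query_vec: np.ndarray, db_matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector of shape (d,) and every
    feature vector in the database, stored as rows of db_matrix (N, d)."""
    q = query_vec / np.linalg.norm(query_vec)
    db = db_matrix / np.linalg.norm(db_matrix, axis=1, keepdims=True)
    return db @ q  # (N,) similarity scores in [-1, 1]
```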
Fourthly, sorting according to the descending order of the similarity, and returning the business knowledge corresponding to the first k feature vectors; and completing the retrieval of the business knowledge in the specific field.
The k value may be dynamic, determining how many results to return based on the complexity of the query and the distribution of the similarities. Meanwhile, an explanation interface can be provided to help the user understand the returned results.
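Building on the sketch above, a dynamic top-k selection could look like the following; using a min_sim threshold as the mechanism for making k dynamic is an assumption for illustration, since the disclosure does not fix how k is determined.

```python
def retrieve_top_k(query_vec, db_matrix, knowledge_items, k=5, min_sim=None):
    """Return up to k knowledge entries in descending similarity order.
    If min_sim is given, results below the threshold are dropped, so the
    number of returned results adapts to the similarity distribution."""
    sims = cosine_similarities(query_vec, db_matrix)
    order = np.argsort(-sims)  # indices sorted by descending similarity
    hits = [i for i in order if min_sim is None or sims[i] >= min_sim]
    return [(knowledge_items[i], float(sims[i])) for i in hits[:k]]
```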
In this embodiment, the first step and the second step may be performed synchronously or asynchronously.
Example two
As shown in fig. 2, this embodiment describes the method for constructing the pre-training language model, the method comprising:
Constructing a Transformer model and collecting a text corpus of business knowledge in the specific field.
First, a model based on the Transformer architecture is constructed. The Transformer is a self-attention based model that can handle variable-length input sequences while taking into account the interactions between all elements in a sequence. Then, a large amount of text data related to the specific field is collected and used to train the model so that it understands and adapts to that field.
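For concreteness, the following PyTorch sketch previews the encoder structure described later in this embodiment: multi-head self-attention, residual connections with layer normalization, and a feed-forward sub-layer. The class names, the post-norm arrangement, the GELU activation and the BERT-base-like dimensions are illustrative assumptions, not details fixed by the disclosure.

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Project Q/K/V per head, apply scaled dot-product attention,
    concatenate the heads, and apply a final linear map."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_k, self.h = d_model // n_heads, n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (B, T, d_model)
        B, T, _ = x.shape
        split = lambda t: t.view(B, T, self.h, self.d_k).transpose(1, 2)
        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5  # (B, h, T, T)
        out = (scores.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, T, -1)
        return self.w_o(out)  # concatenated heads -> final linear map

class EncoderLayer(nn.Module):
    """One encoder layer: each sub-layer is wrapped in a residual
    connection followed by layer normalization (post-norm)."""
    def __init__(self, d_model=768, n_heads=12, d_ff=3072):
        super().__init__()
        self.attn = MultiHeadSelfAttention(d_model, n_heads)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.norm1(x + self.attn(x))  # residual + normalization
        return self.norm2(x + self.ff(x))  # residual + normalization

# A stack of identical layers processes the first data layer by layer:
encoder = nn.Sequential(*[EncoderLayer() for _ in range(12)])
```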
Masking the text corpus to obtain the preprocessed data; the masking strategy includes word-level masking and character-level masking.
When preprocessing the data, some words or characters are masked at random, and the model is then asked to predict these masked portions. This helps the model learn to understand context and the relationships between words.
Word embedding and position encoding are performed on the preprocessed data to obtain the first data. Word embedding is the process of converting words or characters into real-valued vectors such that semantically similar words or characters lie closer together in vector space. Position encoding adds position information to the word vectors, because in natural language processing the order and position of words is often crucial to understanding semantics.
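As one illustration, the sinusoidal position encoding of the original Transformer paper could be used here; the disclosure does not specify the encoding scheme, so this choice (and an even dimension d) is an assumption.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...);
    d_model is assumed even."""
    pos = np.arange(seq_len)[:, None]              # (T, 1)
    i = np.arange(0, d_model, 2)[None, :]          # (1, d/2)
    angles = pos / np.power(10000.0, i / d_model)  # (T, d/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2], pe[:, 1::2] = np.sin(angles), np.cos(angles)
    return pe

# first data = word embeddings plus position encodings (names illustrative):
# first_data = embedding_matrix[token_ids] + positional_encoding(T, d)
```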
Inputting the first data into a Transformer encoder and processing it layer by layer through a plurality of encoders connected in series to obtain the second data. The encoder in the Transformer model is formed by stacking a plurality of identical layers, each mainly comprising a multi-head self-attention mechanism (Multi-Head Self-Attention) and a feed-forward neural network (Feed-Forward Neural Network). Both sub-layers have residual connections and layer normalization. The first data (the word-embedded and position-encoded input) is first fed into the self-attention mechanism and then processed layer by layer through the feed-forward neural network to obtain the second data.
A mask language model is constructed and the second data is input to it; the mask language model outputs predicted data for the preprocessed data, and these predictions are taken as the feature representation. The mask language model is a pre-training model whose goal is to predict the masked portions of the input. The second data processed by the Transformer encoder is input into the mask language model, which then predicts the masked portions; this allows the model to learn the internal structure and context of the language during the pre-training phase.
The training optimization objective function of the mask language model in this embodiment is $\mathcal{L}(\theta) = -\sum_{i=1}^{M} \log P(x_i \mid \tilde{x}; \theta)$, $x_i \in V$, where $M$ is the number of masked words, $V$ is the dictionary, $\theta$ denotes the mask language model parameters, and $P(x_i \mid \tilde{x}; \theta)$ is the likelihood the model predicts for the masked word $x_i$ given the masked sequence $\tilde{x}$.
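Assuming the objective takes the reconstructed form above, the masking and the loss over masked positions could be implemented roughly as follows in PyTorch; the 15% masking rate and the -100 ignore-index convention are borrowed from common BERT implementations rather than stated in the disclosure (BERT's 80/10/10 replacement split is omitted for brevity).

```python
import torch
import torch.nn.functional as F

def mask_tokens(input_ids: torch.Tensor, mask_token_id: int, mask_prob: float = 0.15):
    """Randomly mask tokens; labels keep the original ids at masked
    positions and -100 elsewhere, so only masked words enter the loss."""
    labels = input_ids.clone()
    masked = torch.rand(input_ids.shape) < mask_prob
    labels[~masked] = -100                 # ignore unmasked positions
    corrupted = input_ids.clone()
    corrupted[masked] = mask_token_id      # replace masked words with [MASK]
    return corrupted, labels

def mlm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Mean negative log-likelihood over the M masked words only.
    logits: (B, T, |V|) prediction scores; labels: (B, T)."""
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1), ignore_index=-100)
```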
The training method of the Transformer encoder comprises the following steps:
inputting first data into an attention layer of an encoder, performing linear transformation on a query matrix, a key matrix and a value matrix of the attention layer for a plurality of times, calculating dot product attention scores, splicing the attention scores, and performing linear transformation to obtain multi-head attention scores; the input data (the embedded and position-coded data, i.e., the first data) is divided into three parts of "query", "key", and "value", which are all subjected to a linear transformation. The dot product of the query and the key is then calculated to obtain an attention score that indicates which portions of the input data the model should pay more attention to. This process is performed in a number of different representation spaces, each of which is called a "head", and finally, the outputs of all heads are spliced together and then subjected to a linear transformation to obtain the output of multiple heads of attention.
Adding the input and the output of the attention layer through a residual connection, and normalizing the hidden layer of the neural network toward a standard normal distribution, to obtain the first vector. The residual connection (Residual Connection) and normalization operations both serve to prevent vanishing or exploding gradients during training. A residual connection adds the input directly to the output, while normalization makes the output of the hidden layer approach a standard normal distribution in every dimension.
Inputting the obtained first vector into the feed-forward layer of the encoder, which projects the first vector into a high-dimensional space and produces a feed-forward output. The feed-forward layer (Feed-Forward layer) is a fully connected neural network that processes the output of the self-attention mechanism and passes it to the next stage. It projects the input into a high-dimensional space and then back to the original dimension, producing the feed-forward output.
Adding the input and the output of the feed-forward layer through a residual connection, and normalizing the hidden layer toward a standard normal distribution, to obtain the output of the encoder. This mirrors the second step, only applied to the feed-forward layer: by summing the input and output of the feed-forward layer and normalizing the result, the final output of the encoder is obtained.
Example III
As shown in fig. 3, the present embodiment describes a method for constructing a language understanding model, the method including:
constructing an intent encoder based on LSTM and outputting intent characteristics of word levels of natural sentences; LSTM (Long Short Term Memory) is a special Recurrent Neural Network (RNN) that can effectively capture long-range dependencies when processing sequence data. In this step, an LSTM-based encoder is used to read the input natural sentence and generate a feature vector for each word that contains the intent information. This feature vector captures the semantic information of the word in the whole sentence and its relationship to other words.
Constructing an embedder, integrating the words and the corresponding business knowledge according to the intention characteristics of the words, and obtaining a knowledge characteristic set of the word level; and integrating the intention features with the corresponding business knowledge to generate a knowledge feature set, so as to realize the association of the intention features and the business knowledge.
Obtaining a knowledge context vector through a knowledge feature set; the knowledge feature set contains knowledge features for each word that together form a set containing information for all words. These features are then integrated into a single vector, which is called the knowledge context vector.
Constructing a decoder, taking the word-level intention features and the knowledge context vector as input, and outputting the word-level intention prediction results through the decoder. The previously generated intent features and knowledge context vectors are fed into the decoder (which may also be an LSTM model), whose task is to predict the intent of each word.
Outputting the sentence-level intention prediction result from the word-level intention prediction results: $y^{S} = \arg\max_{1 \le j \le m} \sum_{i=1}^{T} \mathbb{1}\left(y_i = e_j\right)$, where $T$ is the length of the sentence, $m$ is the number of intention labels, $e_j$ is the $m$-dimensional 0-1 vector whose $j$-th bit is 1 and whose other bits are 0, $y_i$ is the intention prediction result at the $i$-th word level, and $\mathbb{1}(\cdot)$ is an indicator function measuring whether the predicted word-level label matches the actual label. In other words, the word-level intent predictions generated by the decoder are aggregated into a sentence-level intent prediction.
And converting the words in the intention prediction result into word vectors through a pre-trained word embedding model, then averaging all the word vectors to obtain the query vector. Each word is converted to a word vector using a pre-trained word embedding model (e.g., Word2Vec or GloVe), and all word vectors are then averaged to obtain the query vector. This query vector can be seen as a single vector representing the semantics of the whole sentence and can be used for subsequent retrieval tasks.
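A minimal sketch of this averaging step, assuming the pre-trained embeddings are available as a plain word-to-vector mapping (the 300-dimensional default and the names are illustrative):

```python
import numpy as np

def query_vector(words, word_vectors, dim=300):
    """Average the embeddings of the words kept by the intent prediction
    to obtain a single query vector for retrieval; out-of-vocabulary
    words are simply skipped in this sketch."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```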
The method for obtaining the intention characteristics at the word level comprises the following steps:
constructing a BERT encoder, inputting words into the BERT encoder, outputting T hidden layer state vectors with d dimensions and 1 semantic slot value characteristic of word level by using the hidden layer state vectors, wherein d is a special vector for classifying tasks; representing the intention characteristic of the sentence level by using a special vector; wherein T is the length of the input natural language, and d is the dimension; BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained deep learning model for natural language processing tasks. In this step, each word is input into the BERT encoder. For each word input, the BERT encoder outputs a d-dimensional hidden layer state vector. These vectors capture the deep semantic information of each word. In addition to these vectors, the BERT encoder outputs a special vector that is commonly used to perform classification tasks. The hidden layer state vector is used to represent semantic slot value features for each word that reflect the semantic role of each word in a particular context. Meanwhile, a special vector is used to represent the intended feature of the whole sentence, which can grasp the overall meaning and purpose of the sentence.
The semantic slot value features are input to the intention encoder, with the special vector as its initial hidden state; the encoder then emits the corresponding T d-dimensional word-level intention features. The intention encoder processes the word-level semantic slot value features, and its initial hidden state is the previously acquired special vector, which carries the intention information of the entire sentence. Through this process, the intention encoder generates the word-level intention features from the semantic slot value features of each word together with the intention feature of the entire sentence.
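A minimal sketch of such an intention encoder, assuming a single-layer LSTM whose initial hidden state is the special [CLS] vector (the class name and the zero-initialized cell state are illustrative choices):

```python
import torch
import torch.nn as nn

class IntentEncoder(nn.Module):
    """Read the T word-level semantic slot value features and emit T
    d-dimensional word-level intention features, seeded with the
    sentence-level special vector."""
    def __init__(self, d: int):
        super().__init__()
        self.lstm = nn.LSTM(d, d, batch_first=True)

    def forward(self, slot_features, cls_vector):
        # slot_features: (B, T, d); cls_vector: (B, d)
        h0 = cls_vector.unsqueeze(0)  # (1, B, d) initial hidden state
        c0 = torch.zeros_like(h0)     # cell state: assumed zero-initialized
        intent_features, _ = self.lstm(slot_features, (h0, c0))
        return intent_features        # (B, T, d)
```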
By using an attention mechanism and position codes, the business knowledge of the specific field is associated with the words to obtain a detailed knowledge feature set. The method for obtaining the word-level knowledge feature set comprises the following steps:
determining the i-th wordCorresponding intention characteristics, determining n pieces of business knowledge corresponding to the ith word, and obtaining a concept set of n pieces of business knowledge>
Calculating a correlation coefficient b between each piece of business knowledge in the concept set and the $i$-th word using an attention mechanism. The attention mechanism is a computational model that can measure the correlation between inputs; in this step it is used to compute the correlation between the $i$-th word and each piece of business knowledge in the concept set.
After weighted calculation is carried out through the correlation coefficient b, n-dimensional knowledge features of the ith word are obtained, and the knowledge features can reflect the correlation between the ith word and each business knowledge;
and embedding the position code of the i-th word into the n-bit knowledge feature corresponding to the i-th word to obtain a knowledge feature set. The location information of the i-th word is encoded into its knowledge features, and the encoding process ensures that the location information of each word is preserved.
The knowledge context vector obtaining method comprises the following steps:
calculating the correlation coefficient of the ith word and the knowledge feature through an attention mechanismAnd obtaining a multidimensional knowledge context vector of the ith word through weighted calculation: />,/>Wherein->For elements in the knowledge feature set, W is a trained weight parameter, ++>For the corresponding intention feature of words, +.>The semantic slot value characteristics corresponding to the words are obtained; first, a relevance coefficient between the i-th word and each element in the knowledge feature set is calculated using an attention mechanism. The correlation coefficient is calculated by comparing the intention characteristic corresponding to each word and the semantic slot value characteristic with the elements in the knowledge characteristic set, and the obtained correlation coefficient can be used for subsequent weighted calculation. The multi-dimensional knowledge context vector of each word can be obtained by carrying out weighted calculation on the trained weight parameters W. This vector is effectively a weighted sum of the knowledge feature set and the relevance coefficients of each word, which ensures the location of the knowledge context vector for each word in feature space.
Performing layer normalization on the multidimensional knowledge context vectors to obtain the knowledge context vector of the natural sentence. After the multidimensional knowledge context vector of each word is obtained, layer normalization is applied to these vectors; the purpose of this step is to ensure the stability of the model and the effectiveness of training. The normalized knowledge context vectors of the individual words are then combined to form the knowledge context vector of the whole natural sentence.
In summary, through the attention mechanism and layer normalization, the knowledge context vector of each word is obtained by computing correlation coefficients over the knowledge features and performing a weighted combination, and these vectors are further combined into the knowledge context vector of the whole natural sentence. This vector contains the contextual knowledge information of every word in the sentence and can be used for subsequent natural language understanding and generation tasks.
The method for obtaining the intention prediction result of the word level comprises the following steps:
taking the intention characteristics of word level and knowledge context vectors as input, mapping out the intention characteristics of word level through a decoder based on LSTM, and processing the intention characteristics through layer standardization; first, the intent feature and knowledge context vector at word level are fetched as inputs to the LSTM based decoder. The task of the decoder is to map these inputs to new space, and these features enable higher level representation, including richer information, under the decoder's processing. Then, the intent features of these new mappings are subjected to a layer normalization process. The purpose of doing so is to ensure the stability of the model and the effectiveness of training, to make the characteristic value in a fixed range, and to reduce the difficulty of model training.
Calculating the word-level intention detection result $y_i = \operatorname{softmax}\left(W^{I} \tilde{h}_i + b^{I}\right)$, where $W^{I}$ and $b^{I}$ are trainable parameters and $\tilde{h}_i$ is the layer-normalized decoder feature of the $i$-th word. After the decoder and the normalization process, the intention features of each word are available; these features must then be converted into actual intent predictions.
This step is essentially a classification problem, i.e., deciding which intent label each word belongs to. It is performed by a classifier whose parameters are trainable and are optimized to obtain the best predictions. Other steps may also be involved in this process, such as training the decoder and training and optimizing the classifier; these depend on the specific circumstances and data, and may therefore need to be adjusted in practice.
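A sketch of such a classifier, assuming it is the single linear-plus-softmax layer suggested by the formula above (class name and shapes are illustrative):

```python
import torch
import torch.nn as nn

class WordIntentClassifier(nn.Module):
    """Map each word's layer-normalized decoder feature to a distribution
    over the m intention labels; the linear layer holds the trainable
    parameters W_I and b_I mentioned in the text."""
    def __init__(self, d: int, m: int):
        super().__init__()
        self.proj = nn.Linear(d, m)

    def forward(self, decoder_feats):  # (B, T, d)
        return torch.softmax(self.proj(decoder_feats), dim=-1)  # (B, T, m)
```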
Example IV
A natural language based domain specific business knowledge retrieval device comprises a memory, a processor and a computer program stored in the memory and executable on the processor; the processor implements the steps of the method described above when executing the computer program.
The memory may be used to store software programs and modules, and the processor executes various functional applications of the terminal and data processing by running the software programs and modules stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an execution program required for at least one function, and the like.
The storage data area may store data created according to the use of the terminal, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
A computer readable storage medium storing a computer program which when executed by a processor performs the steps of a natural language based domain specific business knowledge retrieval method as described above.
Computer readable media may include computer storage media and communication media without loss of generality. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to the above. The above-described system memory and mass storage devices may be collectively referred to as memory.
In the description of the present specification, reference to the terms "one embodiment/manner," "some embodiments/manner," "example," "a particular example," "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment/manner or example is included in at least one embodiment/manner or example of the application. In this specification, the schematic representations of the above terms are not necessarily for the same embodiment/manner or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments/modes or examples. Furthermore, the various embodiments/modes or examples described in this specification and the features of the various embodiments/modes or examples can be combined and combined by persons skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
It will be appreciated by persons skilled in the art that the above embodiments are provided for clarity of illustration only and are not intended to limit the scope of the application. Other variations or modifications of the above-described application will be apparent to those of skill in the art, and are still within the scope of the application.

Claims (10)

1. A specific domain business knowledge retrieval method based on natural language is characterized by comprising the following steps:
constructing a pre-training language model aiming at a specific field;
carrying out feature representation on the business knowledge data through the pre-training language model to obtain a database formed by feature vectors;
constructing a language understanding model for knowledge retrieval;
inputting a natural sentence of the search question to the language understanding model, and obtaining the query vector of the search question through the language understanding model;
sending the query vector to a database, and calculating the similarity between the query vector and the feature vector in the database;
sorting in descending order of the similarity, and returning the business knowledge corresponding to the first k feature vectors;
and completing the retrieval of the business knowledge in the specific field.
2. The natural language-based domain-specific business knowledge retrieval method according to claim 1, wherein the pre-training language model construction method comprises:
constructing a Transformer model, and collecting a text corpus of business knowledge in the specific field;
masking the text corpus to obtain preprocessed data; the masking strategy includes word-level masking and character-level masking;
word embedding and position encoding are carried out on the preprocessed data, so that first data are obtained;
inputting the first data into a Transformer encoder, and processing the first data layer by layer through a plurality of encoders connected in series to obtain second data;
a mask language model is constructed and the second data is input to the mask language model, and the mask language model outputs predicted data for the pre-processed data and takes the predicted data as a feature representation.
3. The method for domain-specific business knowledge retrieval based on natural language according to claim 2, wherein the training method of the Transformer encoder comprises:
inputting the first data into the attention layer of the encoder, applying linear transformations to the query, key and value matrices of the attention layer several times, calculating dot-product attention scores, concatenating the per-head outputs, and applying a linear transformation to obtain the multi-head attention output;
adding the input and the output of the attention layer through residual connection, normalizing the hidden layer in the neural network into standard normal distribution through normalization, and obtaining a first vector;
inputting the obtained first vector to a feedforward layer of an encoder, wherein the feedforward layer projects the first vector to a high-dimensional space and obtains a feedforward output;
and adding the input and the output of the feedforward layer through residual connection, normalizing the hidden layer in the neural network into standard normal distribution through normalization, and obtaining the output of the encoder.
4. The natural language based domain specific business knowledge retrieval method according to claim 2, wherein the training optimization objective function of the mask language model is $\mathcal{L}(\theta) = -\sum_{i=1}^{M} \log P(x_i \mid \tilde{x}; \theta)$, $x_i \in V$, where $M$ is the number of masked words, $V$ is the dictionary, $\theta$ denotes the mask language model parameters, and $P(x_i \mid \tilde{x}; \theta)$ is the likelihood the model predicts for the masked word $x_i$ given the masked sequence $\tilde{x}$.
5. The method for retrieving domain-specific business knowledge based on natural language according to claim 1, wherein the method for constructing the language understanding model comprises the following steps:
constructing an intent encoder based on LSTM and outputting intent characteristics of word levels of natural sentences;
constructing an embedder, integrating the words and the corresponding business knowledge according to the intention characteristics of the words, and obtaining a knowledge characteristic set of the word level;
obtaining a knowledge context vector through a knowledge feature set;
constructing a decoder, taking the word-level intention features and the knowledge context vector as input, and outputting the word-level intention prediction results through the decoder;
outputting the sentence-level intention prediction result from the word-level intention prediction results: $y^{S} = \arg\max_{1 \le j \le m} \sum_{i=1}^{T} \mathbb{1}\left(y_i = e_j\right)$, where $T$ is the length of the sentence, $m$ is the number of intention labels, $e_j$ is the $m$-dimensional 0-1 vector whose $j$-th bit is 1 and whose other bits are 0, $y_i$ is the intention prediction result at the $i$-th word level, and $\mathbb{1}(\cdot)$ is an indicator function measuring whether the predicted word-level label matches the actual label;
and converting the words in the intention prediction result into word vectors through a pre-trained word embedding model, and averaging all the word vectors to obtain the query vector.
6. The method for retrieving domain-specific business knowledge based on natural language as claimed in claim 5, wherein the method for obtaining the intention feature at word level comprises:
constructing a BERT encoder; the words are input into the BERT encoder, which outputs T d-dimensional hidden-layer state vectors plus 1 special vector used for classification tasks; the hidden-layer state vectors represent the word-level semantic slot value features, and the special vector represents the sentence-level intention feature; wherein T is the length of the input natural sentence and d is the hidden dimension;
the semantic slot value features are input to the intention encoder, with the special vector as its initial hidden state; the intention encoder then emits the corresponding T d-dimensional word-level intention features.
7. The method for retrieving domain-specific business knowledge based on natural language as claimed in claim 6, wherein the obtaining method of the knowledge feature set at word level comprises:
determining the intention feature corresponding to the $i$-th word $x_i$, determining the n pieces of business knowledge corresponding to the $i$-th word, and obtaining the concept set $C_i = \{c_1, c_2, \dots, c_n\}$ of the n pieces of business knowledge;
Calculating a correlation coefficient b between each business knowledge in the concept set and the ith word by using an attention mechanism;
after weighting calculation is carried out through the correlation coefficient b, n-dimensional knowledge characteristics of the i-th word are obtained;
and embedding the position code of the i-th word into the n knowledge features corresponding to the i-th word to obtain the knowledge feature set.
8. The domain-specific business knowledge retrieval method based on natural language as claimed in claim 6, wherein the knowledge context vector obtaining method comprises:
calculating the correlation coefficient $\alpha_{ik}$ of the $i$-th word and each knowledge feature through an attention mechanism, and obtaining the multidimensional knowledge context vector of the $i$-th word through weighted calculation: $\alpha_{ik} = \operatorname{softmax}_k\left((h_i \oplus s_i)^{\top} W k_{ik}\right)$, $c_i = \sum_{k=1}^{n} \alpha_{ik} k_{ik}$, where $k_{ik}$ are the elements in the knowledge feature set, $W$ is a trained weight parameter, $h_i$ is the intention feature corresponding to the word, and $s_i$ is the semantic slot value feature corresponding to the word;
performing layer normalization on the multidimensional knowledge context vectors to obtain the knowledge context vector of the natural sentence.
9. The method for retrieving domain-specific business knowledge based on natural language according to claim 8, wherein the method for obtaining the intention prediction result at word level comprises:
taking the word-level intention features and the knowledge context vector as input, mapping out new word-level intention features through an LSTM-based decoder, and processing the intention features through layer normalization;
calculating the word-level intention detection result $y_i = \operatorname{softmax}\left(W^{I} \tilde{h}_i + b^{I}\right)$, where $W^{I}$ and $b^{I}$ are trainable parameters.
10. A natural language based domain specific business knowledge retrieval device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-9 when the computer program is executed.
CN202310954971.9A 2023-08-01 2023-08-01 Specific domain business knowledge retrieval method and retrieval device based on natural language Active CN116662582B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310954971.9A CN116662582B (en) 2023-08-01 2023-08-01 Specific domain business knowledge retrieval method and retrieval device based on natural language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310954971.9A CN116662582B (en) 2023-08-01 2023-08-01 Specific domain business knowledge retrieval method and retrieval device based on natural language

Publications (2)

Publication Number Publication Date
CN116662582A true CN116662582A (en) 2023-08-29
CN116662582B (en) 2023-10-10

Family

ID=87721037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310954971.9A Active CN116662582B (en) 2023-08-01 2023-08-01 Specific domain business knowledge retrieval method and retrieval device based on natural language

Country Status (1)

Country Link
CN (1) CN116662582B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116992942A (en) * 2023-09-26 2023-11-03 苏州元脑智能科技有限公司 Natural language model optimization method, device, natural language model, equipment and medium
CN117093729A (en) * 2023-10-17 2023-11-21 北方健康医疗大数据科技有限公司 Retrieval method, system and retrieval terminal based on medical scientific research information
CN117171413A (en) * 2023-09-07 2023-12-05 滨州八爪鱼网络科技有限公司 Data processing system and method for digital collection management
CN117473071A (en) * 2023-12-27 2024-01-30 珠海格力电器股份有限公司 Data retrieval method, device, equipment and computer readable medium
CN117992068A (en) * 2024-04-02 2024-05-07 天津南大通用数据技术股份有限公司 LSTM and TRM combined intelligent database grammar analysis method

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109074353A (en) * 2016-10-10 2018-12-21 微软技术许可有限责任公司 The combination of language understanding and information retrieval
CN110516055A (en) * 2019-08-16 2019-11-29 西北工业大学 A kind of cross-platform intelligent answer implementation method for teaching task of combination BERT
CN111625641A (en) * 2020-07-30 2020-09-04 浙江大学 Dialog intention recognition method and system based on multi-dimensional semantic interaction representation model
CN111680511A (en) * 2020-04-21 2020-09-18 华东师范大学 Military field named entity identification method with cooperation of multiple neural networks
CN112000805A (en) * 2020-08-24 2020-11-27 平安国际智慧城市科技股份有限公司 Text matching method, device, terminal and storage medium based on pre-training model
CN113377844A (en) * 2021-06-29 2021-09-10 哈尔滨工业大学 Dialogue type data fuzzy retrieval method and device facing large relational database
CN113962219A (en) * 2021-10-13 2022-01-21 国网浙江省电力有限公司电力科学研究院 Semantic matching method and system for knowledge retrieval and question answering of power transformer
CN114047929A (en) * 2022-01-12 2022-02-15 广东省科技基础条件平台中心 Knowledge enhancement-based user defined function identification method, device and medium
CN114416927A (en) * 2022-01-24 2022-04-29 招商银行股份有限公司 Intelligent question and answer method, device, equipment and storage medium
CN114491024A (en) * 2021-12-31 2022-05-13 长城信息股份有限公司 Small sample-based specific field multi-label text classification method
US11334565B1 (en) * 2016-10-28 2022-05-17 Intuit, Inc. System to convert natural-language financial questions into database queries
CN114817494A (en) * 2022-04-02 2022-07-29 华南理工大学 Knowledge type retrieval type dialogue method based on pre-training and attention interaction network
CN115048447A (en) * 2022-06-27 2022-09-13 华中科技大学 Database natural language interface system based on intelligent semantic completion
US20220292262A1 (en) * 2021-03-10 2022-09-15 At&T Intellectual Property I, L.P. System and method for hybrid question answering over knowledge graph
CN115292457A (en) * 2022-06-30 2022-11-04 腾讯科技(深圳)有限公司 Knowledge question answering method and device, computer readable medium and electronic equipment
CN115309879A (en) * 2022-08-05 2022-11-08 中国石油大学(华东) Multi-task semantic parsing model based on BART
CN115510814A (en) * 2022-11-09 2022-12-23 东南大学 Chapter-level complex problem generation method based on double planning
CN115759062A (en) * 2022-10-09 2023-03-07 阿里巴巴(中国)有限公司 Knowledge injection-based text and image pre-training model processing method and text and image retrieval system
CN116257616A (en) * 2023-03-14 2023-06-13 山东师范大学 Entity relation extraction method and system for music field

Patent Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109074353A (en) * 2016-10-10 2018-12-21 微软技术许可有限责任公司 The combination of language understanding and information retrieval
US11334565B1 (en) * 2016-10-28 2022-05-17 Intuit, Inc. System to convert natural-language financial questions into database queries
CN110516055A (en) * 2019-08-16 2019-11-29 西北工业大学 A kind of cross-platform intelligent answer implementation method for teaching task of combination BERT
CN111680511A (en) * 2020-04-21 2020-09-18 华东师范大学 Military field named entity identification method with cooperation of multiple neural networks
CN111625641A (en) * 2020-07-30 2020-09-04 浙江大学 Dialog intention recognition method and system based on multi-dimensional semantic interaction representation model
CN112000805A (en) * 2020-08-24 2020-11-27 平安国际智慧城市科技股份有限公司 Text matching method, device, terminal and storage medium based on pre-training model
US20220292262A1 (en) * 2021-03-10 2022-09-15 At&T Intellectual Property I, L.P. System and method for hybrid question answering over knowledge graph
CN113377844A (en) * 2021-06-29 2021-09-10 哈尔滨工业大学 Dialogue type data fuzzy retrieval method and device facing large relational database
CN113962219A (en) * 2021-10-13 2022-01-21 国网浙江省电力有限公司电力科学研究院 Semantic matching method and system for knowledge retrieval and question answering of power transformer
CN114491024A (en) * 2021-12-31 2022-05-13 长城信息股份有限公司 Small sample-based specific field multi-label text classification method
CN114047929A (en) * 2022-01-12 2022-02-15 广东省科技基础条件平台中心 Knowledge enhancement-based user defined function identification method, device and medium
CN114416927A (en) * 2022-01-24 2022-04-29 招商银行股份有限公司 Intelligent question and answer method, device, equipment and storage medium
CN114817494A (en) * 2022-04-02 2022-07-29 华南理工大学 Knowledge type retrieval type dialogue method based on pre-training and attention interaction network
CN115048447A (en) * 2022-06-27 2022-09-13 华中科技大学 Database natural language interface system based on intelligent semantic completion
CN115292457A (en) * 2022-06-30 2022-11-04 腾讯科技(深圳)有限公司 Knowledge question answering method and device, computer readable medium and electronic equipment
CN115309879A (en) * 2022-08-05 2022-11-08 中国石油大学(华东) Multi-task semantic parsing model based on BART
CN115759062A (en) * 2022-10-09 2023-03-07 阿里巴巴(中国)有限公司 Knowledge injection-based text and image pre-training model processing method and text and image retrieval system
CN115510814A (en) * 2022-11-09 2022-12-23 东南大学 Chapter-level complex problem generation method based on double planning
CN116257616A (en) * 2023-03-14 2023-06-13 山东师范大学 Entity relation extraction method and system for music field

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孙毅等 (Sun Yi et al.): "A Survey of Knowledge Enhancement Methods for Pre-trained Natural Language Models" (自然语言预训练模型知识增强方法综述), Journal of Chinese Information Processing (中文信息学报), vol. 35, no. 7, pp. 10-29 *
赵良等 (Zhao Liang et al.): "Extracting Relations in the Food Safety Domain Using BERT and an Improved PCNN Model" (用BERT和改进PCNN模型抽取食品安全领域关系), Transactions of the Chinese Society of Agricultural Engineering (农业工程学报), vol. 38, no. 8, pp. 263-270 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117171413A (en) * 2023-09-07 2023-12-05 滨州八爪鱼网络科技有限公司 Data processing system and method for digital collection management
CN117171413B (en) * 2023-09-07 2024-03-08 滨州八爪鱼网络科技有限公司 Data processing system and method for digital collection management
CN116992942A (en) * 2023-09-26 2023-11-03 苏州元脑智能科技有限公司 Natural language model optimization method, device, natural language model, equipment and medium
CN116992942B (en) * 2023-09-26 2024-02-02 苏州元脑智能科技有限公司 Natural language model optimization method, device, natural language model, equipment and medium
CN117093729A (en) * 2023-10-17 2023-11-21 北方健康医疗大数据科技有限公司 Retrieval method, system and retrieval terminal based on medical scientific research information
CN117093729B (en) * 2023-10-17 2024-01-09 北方健康医疗大数据科技有限公司 Retrieval method, system and retrieval terminal based on medical scientific research information
CN117473071A (en) * 2023-12-27 2024-01-30 珠海格力电器股份有限公司 Data retrieval method, device, equipment and computer readable medium
CN117473071B (en) * 2023-12-27 2024-04-05 珠海格力电器股份有限公司 Data retrieval method, device, equipment and computer readable medium
CN117992068A (en) * 2024-04-02 2024-05-07 天津南大通用数据技术股份有限公司 LSTM and TRM combined intelligent database grammar analysis method

Also Published As

Publication number Publication date
CN116662582B (en) 2023-10-10

Similar Documents

Publication Publication Date Title
CN116662582B (en) Specific domain business knowledge retrieval method and retrieval device based on natural language
CN111611361B (en) Intelligent reading, understanding, question answering system of extraction type machine
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
CN108932342A (en) A kind of method of semantic matches, the learning method of model and server
CN113010693A (en) Intelligent knowledge graph question-answering method fusing pointer to generate network
CN112989834A (en) Named entity identification method and system based on flat grid enhanced linear converter
CN110990555B (en) End-to-end retrieval type dialogue method and system and computer equipment
CN111414481A (en) Chinese semantic matching method based on pinyin and BERT embedding
CN114548101B (en) Event detection method and system based on backtracking sequence generation method
CN111985205A (en) Aspect level emotion classification model
CN111462749A (en) End-to-end dialogue system and method based on dialogue state guidance and knowledge base retrieval
CN116150335A (en) Text semantic retrieval method under military scene
CN114781375A (en) Military equipment relation extraction method based on BERT and attention mechanism
CN115600597A (en) Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium
CN113705238A (en) Method and model for analyzing aspect level emotion based on BERT and aspect feature positioning model
CN115796182A (en) Multi-modal named entity recognition method based on entity-level cross-modal interaction
Parvin et al. Transformer-based local-global guidance for image captioning
CN117435716B (en) Data processing method and system of power grid man-machine interaction terminal
CN114492796A (en) Multitask learning sign language translation method based on syntax tree
CN116955579B (en) Chat reply generation method and device based on keyword knowledge retrieval
CN111581365B (en) Predicate extraction method
Sabharwal et al. Introduction to word embeddings
CN115964497A (en) Event extraction method integrating attention mechanism and convolutional neural network
CN116167353A (en) Text semantic similarity measurement method based on twin long-term memory network
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant