CN114064859A - Knowledge extraction method, apparatus, device, medium, and program product - Google Patents

Knowledge extraction method, apparatus, device, medium, and program product

Info

Publication number
CN114064859A
Authority
CN
China
Prior art keywords
knowledge
search
target
statement
search statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111437931.4A
Other languages
Chinese (zh)
Inventor
崔林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
ICBC Technology Co Ltd
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
ICBC Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC) and ICBC Technology Co Ltd
Priority to CN202111437931.4A
Publication of CN114064859A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/332 - Query formulation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/3331 - Query processing
    • G06F 16/334 - Query execution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/10 - Text processing
    • G06F 40/194 - Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a knowledge extraction method relating to the fields of finance and data processing technology, including: in response to receiving a target search statement, determining, from a historical search statement database according to the target search statement, a historical search statement having an association relationship with the target search statement; determining a rough knowledge set from a preconfigured operation and maintenance knowledge base according to the target search statement and the historical search statement; and extracting a target knowledge document from the rough knowledge set according to a search word vector generated by fusing the target search statement and the historical search statement. The present disclosure also provides a knowledge extraction apparatus, an electronic device, a readable storage medium, and a computer program product.

Description

Knowledge extraction method, apparatus, device, medium, and program product
Technical Field
The present disclosure relates to the field of finance and data processing, and more particularly, to a knowledge extraction method, apparatus, electronic device, computer-readable storage medium, and computer program product.
Background
With social and economic development, the construction of computing infrastructure such as data centers has become increasingly important. Operation and maintenance is a key link in a data center and an important support for ensuring its stable operation. Operation and maintenance is a knowledge-intensive field covering a wide range of topics and domains, with complicated problems that make it difficult to enter, master, and pass on. At present, accumulating, searching, and propagating knowledge in the operation and maintenance field remains very difficult.
Daily operation and maintenance work produces a large number of valuable system documents, contingency plans, problem solutions, institutional rules, operation manuals, and the like, most of which are scattered among individual experts or stored in tools such as Confluence.
In the course of developing the concept of the present disclosure, the inventor found that obtaining knowledge by consulting experts is inefficient, and that extracting knowledge through a tool such as Confluence suffers from low accuracy.
Disclosure of Invention
In view of the above, the present disclosure provides a knowledge extraction method, a knowledge extraction apparatus, an electronic device, a readable storage medium, and a computer program product.
According to a first aspect of the present disclosure, there is provided a knowledge extraction method, including:
in response to receiving a target search statement, determining a historical search statement having an association relation with the target search statement from a historical search statement database according to the target search statement;
determining a rough knowledge set from a preconfigured operation and maintenance knowledge base according to the target search statement and the historical search statement; and
extracting a target knowledge document from the rough knowledge set according to a search word vector generated by fusing the target search statement and the historical search statement.
According to an embodiment of the present disclosure, the determining, from a historical search statement database according to the target search statement, a historical search statement having an association relationship with the target search statement includes:
determining a candidate historical search statement from the historical search statement database according to the target search statement;
inputting the target search statement and the candidate historical search statement into a pre-trained neural network model, and outputting a similarity result; and
determining the candidate historical search statement as the historical search statement when the similarity result satisfies a preset condition.
According to an embodiment of the present disclosure, the determining a candidate historical search statement from the historical search statement database according to the target search statement includes:
acquiring a reception time of the target search statement;
sorting the search statements in the historical search statement database according to the reception time to generate a sorting result; and
determining the candidate historical search statement according to the sorting result.
According to an embodiment of the present disclosure, the determining, according to the target search statement and the historical search statement, a rough knowledge set from a preconfigured operation and maintenance knowledge base includes:
determining, from the operation and maintenance knowledge base, an initial knowledge set corresponding to each of multiple recall algorithms based on the target search statement and the historical search statement; and
generating the rough knowledge set from the plurality of initial knowledge sets.
According to an embodiment of the present disclosure, each knowledge document in the initial knowledge sets has a score calculated by the corresponding recall algorithm;
the generating the rough knowledge set from the plurality of initial knowledge sets includes:
normalizing the scores of the knowledge documents in the plurality of initial knowledge sets to generate normalized initial knowledge sets;
performing deduplication on the knowledge documents in the plurality of normalized initial knowledge sets to generate deduplicated initial knowledge sets; and
sorting the knowledge documents in the deduplicated initial knowledge sets according to the scores to generate the rough knowledge set.
According to an embodiment of the present disclosure, the extracting a target knowledge document from the rough knowledge set according to the search word vector generated by fusing the target search statement and the historical search statement includes:
performing word embedding processing on the target search statement and the historical search statement respectively to generate a first search word vector corresponding to the target search statement and a second search word vector corresponding to the historical search statement;
fusing the first search word vector and the second search word vector to generate the search word vector; and
extracting the target knowledge document from the rough knowledge set according to the search word vector.
According to an embodiment of the present disclosure, the knowledge extraction method further includes:
acquiring an original operation and maintenance knowledge document;
performing word segmentation on the operation and maintenance knowledge sentences in the original operation and maintenance knowledge document to generate a plurality of original operation and maintenance knowledge words;
generating an operation and maintenance knowledge graph according to the plurality of original operation and maintenance knowledge words, wherein the operation and maintenance knowledge graph includes nodes and edges, the nodes represent the original operation and maintenance knowledge words and carry word weights, and the edges represent the front-to-back relationship, in the operation and maintenance knowledge sentences, of the nodes they connect.
According to an embodiment of the present disclosure, the knowledge extraction method further includes:
calculating node weights for the operation and maintenance knowledge statements in the original operation and maintenance knowledge document.
According to an embodiment of the present disclosure, the extracting a target knowledge document from the rough knowledge set according to the target search statement and the historical search statement includes:
determining, from the operation and maintenance knowledge graph, word weights corresponding to the knowledge documents in the rough knowledge set;
generating a rough knowledge vector set according to the knowledge documents in the rough knowledge set and the word weights;
performing similarity calculation on the search word vector and the knowledge document vectors and word weights in the rough knowledge vector set to generate a refined knowledge set; and
determining the target knowledge document from the refined knowledge set.
According to a second aspect of the present disclosure, there is provided a knowledge extraction apparatus comprising:
a first determination module configured to, in response to receiving a target search statement, determine, from a historical search statement database according to the target search statement, a historical search statement having an association relationship with the target search statement;
a second determination module configured to determine a rough knowledge set from a preconfigured operation and maintenance knowledge base according to the target search statement and the historical search statement; and
an extraction module configured to extract a target knowledge document from the rough knowledge set according to a search word vector generated by fusing the target search statement and the historical search statement.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: one or more processors; a memory for storing one or more instructions, wherein the one or more instructions, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to implement the method as described above.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising computer executable instructions for implementing the method as described above when executed.
According to the embodiments of the present disclosure, the historical search statements related to the target search statement are first determined according to the target search statement; the knowledge in the operation and maintenance knowledge base is then coarsely screened according to the target search statement and the historical search statements; and the target knowledge document is extracted from the rough knowledge set according to the search word vector generated by fusing the target search statement and the historical search statements. In this way, a fused search is performed over similar questions, the emphasis of the search is highlighted, and the technical effect of improving the accuracy of knowledge extraction is achieved.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates a system architecture 100 to which the knowledge extraction method may be applied, according to an embodiment of the disclosure;
FIG. 2 schematically illustrates a flow diagram of a knowledge extraction method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a flow chart of a method of determining a historical search statement according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a method of determining candidate historical search statements according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a method of determining a coarse knowledge set according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a method of generating a coarse knowledge set according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow chart of a method of extracting a target knowledge document according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow diagram of a method of generating an operation and maintenance knowledge-graph according to an embodiment of the disclosure;
FIG. 9 schematically illustrates a flow chart of a method of determining a target knowledge document according to an embodiment of the present disclosure;
FIG. 10 schematically shows a block diagram of a knowledge extraction apparatus 1000 according to an embodiment of the present disclosure; and
FIG. 11 schematically shows a block diagram of an electronic device 1100 adapted to implement a knowledge extraction method according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, such a construction is generally intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where a convention analogous to "at least one of A, B, or C, etc." is used, such a construction is generally intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
In the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations, necessary security measures have been taken, and public order and good customs are not violated.
With social and economic development, the construction of computing infrastructure such as data centers has become increasingly important. Operation and maintenance is a key link in a data center and an important support for ensuring its stable operation. Operation and maintenance is a knowledge-intensive field covering a wide range of topics and domains, with complicated problems that make it difficult to enter, master, and pass on. At present, accumulating, searching, and propagating knowledge in the operation and maintenance field remains very difficult.
Daily operation and maintenance work produces a large number of valuable system documents, contingency plans, problem solutions, institutional rules, operation manuals, and the like, most of which are scattered among individual experts or stored in tools such as Confluence.
In the course of developing the concept of the present disclosure, the inventor found that obtaining knowledge by consulting experts is inefficient, and that extracting knowledge through a tool such as Confluence suffers from low accuracy.
To at least partially solve the technical problems in the related art, the present disclosure provides a knowledge extraction method, including: in response to receiving a target search statement, determining, from a historical search statement database according to the target search statement, a historical search statement having an association relationship with the target search statement; determining a rough knowledge set from a preconfigured operation and maintenance knowledge base according to the target search statement and the historical search statement; and extracting a target knowledge document from the rough knowledge set according to a search word vector generated by fusing the target search statement and the historical search statement. The present disclosure also provides a knowledge extraction apparatus, an electronic device, a computer storage medium, and a computer program product.
It should be noted that the knowledge extraction method and apparatus provided by the embodiments of the present disclosure may be used in the financial field and the data processing technology field, for example for the accumulation, search, and propagation of operation and maintenance knowledge of banking outlets. The knowledge extraction method and apparatus provided by the embodiments of the present disclosure may also be used in any field other than the financial field and the data processing technology field, for example in a chat robot that can quickly return a correct response to a user's question. The application fields of the knowledge extraction method and apparatus provided by the embodiments of the present disclosure are not limited.
Fig. 1 schematically illustrates a system architecture 100 to which the knowledge extraction method may be applied, according to an embodiment of the disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a shopping-like application, a web browser application, a search-like application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the method of knowledge extraction provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the means for knowledge extraction provided by the embodiments of the present disclosure may be generally disposed in the server 105. The method of knowledge extraction provided by embodiments of the present disclosure may be performed by a server or server cluster that is distinct from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the apparatus for knowledge extraction provided by the embodiments of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Alternatively, the method for extracting knowledge provided by the embodiment of the present disclosure may also be performed by the terminal device 101, 102, or 103, or may also be performed by another terminal device different from the terminal device 101, 102, or 103. Accordingly, the apparatus for knowledge extraction provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103, or in another terminal device different from the terminal device 101, 102, or 103.
For example, the history search statement may originally be stored in any one of the terminal apparatuses 101, 102, or 103 (for example, but not limited to, the terminal apparatus 101), or may be stored on an external storage apparatus and may be imported into the terminal apparatus 101. Then, the terminal device 101 may locally execute the knowledge extraction method provided by the embodiment of the present disclosure, or send the data acquisition instruction to another terminal device, server, or server cluster, and execute the knowledge extraction method provided by the embodiment of the present disclosure by another terminal device, server, or server cluster that receives the data acquisition instruction.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flow diagram of a knowledge extraction method according to an embodiment of the present disclosure.
As shown in fig. 2, the knowledge extraction method of this embodiment includes operations S201 to S203.
In operation S201, in response to receiving the target search sentence, a history search sentence having an association relationship with the target search sentence is determined from the history search sentence database according to the target search sentence.
According to an embodiment of the present disclosure, a knowledge extraction apparatus may receive a target search statement transmitted by a front end.
According to an embodiment of the present disclosure, a large number of history search sentences may be stored in advance in the history search sentence database.
According to the embodiment of the disclosure, the history search sentence having the association relation with the target search sentence can be determined in a manner of performing text matching on the history search sentence and the target search sentence and judging whether the two sentences have the same meaning.
According to embodiments of the present disclosure, text matching may include data preprocessing, feature engineering, and modeling. The data preprocessing may include extracting noun phrases, translation models, removing co-occurrence words, and the like. The feature engineering may include statistical features such as co-occurrence words, weighted co-occurrence words, and special co-occurrence words; representation features such as Word2Vec, Word Mover's Distance, and DocVector; NLP features such as keyword features and TF-IDF distance; and graph features such as graph statistical features, graph structure features, and graph propagation features. The modeling may include traditional models such as XGBoost, deep models such as CNN and LSTM-RNN, fusion models, and the like.
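For illustration only, the following minimal Python sketch computes a few co-occurrence-style statistical features of the kind listed above; the function name and the whitespace tokenisation are assumptions (Chinese text would first pass through a word segmenter), not part of the disclosed method.

```python
def overlap_features(query: str, candidate: str) -> dict:
    """Simple co-occurrence-word statistics of the kind used as matching features."""
    q_words, c_words = set(query.split()), set(candidate.split())
    shared = q_words & c_words
    union = q_words | c_words
    return {
        "co_occurrence_count": len(shared),                                # shared words
        "jaccard": len(shared) / len(union) if union else 0.0,             # overlap ratio
        "query_coverage": len(shared) / len(q_words) if q_words else 0.0,  # share of query covered
    }
```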
In operation S202, a rough knowledge set is determined from the preconfigured operation and maintenance knowledge base according to the target search statement and the historical search statement.
According to the embodiment of the present disclosure, the operation and maintenance knowledge base may include metadata (Meta), status data (Status), and event data (Event). The metadata models the operation and maintenance entity world, including the attributes, composition, and associations of operation and maintenance entities; the status data reflects the state of the system, characterizing service liveness, resource consumption, capacity, and the like; the event data describes changes made to the system, service state exceptions, and the like.
According to the embodiment of the disclosure, the operation and maintenance knowledge statement corresponding to the target search statement can be obtained from the operation and maintenance knowledge base configured in advance according to various recall algorithms to serve as the recall statement, and the recalled operation and maintenance knowledge statement serves as a rough-ranking knowledge set.
In operation S203, a target knowledge document is extracted from the rough knowledge set based on a search term vector generated by fusing the target search term and the history search term.
According to an embodiment of the present disclosure, the target knowledge document may be a search answer corresponding to a target search sentence or a history search sentence.
According to the embodiment of the present disclosure, the historical search sentences related to the target search sentence are first determined according to the target search sentence; the knowledge in the operation and maintenance knowledge base is then coarsely screened according to the target search sentence and the historical search sentences; and the target knowledge document is extracted from the rough knowledge set according to the search word vector generated by fusing the target search sentence and the historical search sentences. A fused search is thereby performed over similar questions, the emphasis of the search is highlighted, and the accuracy of knowledge extraction is improved.
The method shown in fig. 2 is further described with reference to fig. 3-8 in conjunction with specific embodiments.
FIG. 3 schematically shows a flow diagram of a method of determining a historical search statement according to an embodiment of the disclosure.
As shown in fig. 3, the method of determining a history search statement of this embodiment includes operations S301 to S303.
In operation S301, a candidate history search sentence is determined from the history search sentence database according to the target search sentence.
According to an embodiment of the present disclosure, the candidate historical search sentences may be determined according to the reception time of the target search sentence. For example, a time window of 5 minutes may be set; that is, historical search sentences received within 5 minutes of the target search sentence may be determined as candidate historical search sentences.
In operation S302, the target search statement and the candidate historical search statement are input into the neural network model trained in advance, and a similarity result is output.
According to the embodiment of the present disclosure, the similarity between the target search sentence and a candidate historical search sentence may be determined based on QQ matching (in the style of the Quora Question Pairs task), a text matching problem that mainly determines whether two questions have the same meaning.
According to an embodiment of the present disclosure, the neural network model may include a multi-way multi-layer LSTM (Long Short-Term Memory network), a distance-based bag-of-words model, an embedding model based on text sentence vectors, and the like.
According to an embodiment of the present disclosure, calculating the similarity of the text may include a representation of the text and a distance metric measuring the similarity of the text.
According to an embodiment of the present disclosure, the text may be represented with a set-of-words (SOW) model, a bag-of-words (BOW) model, a normalized bag-of-words (nBOW) model, a term frequency-inverse document frequency (TF-IDF) model, an N-gram language model, LSI/LDA models, a Word2vec model, a GloVe model, a Doc2vec model, and the like.
According to embodiments of the present disclosure, the distance metric may include Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, normalized Euclidean distance, Mahalanobis distance, cosine of the included angle, Hamming distance, Jaccard distance and Jaccard similarity coefficient, correlation coefficient and correlation distance, information entropy, and the like.
According to the embodiment of the present disclosure, in QQ matching, features of the target search sentence and the candidate historical search sentence may be extracted with a multi-path, multi-layer LSTM, and the similarity result is determined from the output vectors.
According to an embodiment of the present disclosure, the similarity result may characterize whether the target search sentence and the candidate history search sentence are synonymous.
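As a hedged illustration of this step, the sketch below encodes the two sentences with a shared two-layer bidirectional LSTM and scores their similarity with cosine distance; it is only a simplified stand-in for the multi-path, multi-layer LSTM described above, and the vocabulary size, dimensions, and threshold are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseLSTMMatcher(nn.Module):
    """Encode two token-id sequences with a shared LSTM and compare them."""
    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden_dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, num_layers=2,
                               batch_first=True, bidirectional=True)

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> sentence vector of shape (batch, 2 * hidden_dim)
        _, (h_n, _) = self.encoder(self.embedding(token_ids))
        # concatenate the last layer's forward and backward hidden states
        return torch.cat([h_n[-2], h_n[-1]], dim=-1)

    def forward(self, query_ids: torch.Tensor, candidate_ids: torch.Tensor) -> torch.Tensor:
        # cosine similarity in [-1, 1]; a higher value suggests the sentences are synonymous
        return F.cosine_similarity(self.encode(query_ids), self.encode(candidate_ids), dim=-1)

# A candidate would be kept when the similarity exceeds a preset threshold, e.g. 0.8 (assumed).
```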
In operation S303, in the case where the similarity result satisfies the preset condition, the candidate history search sentence is determined as the history search sentence.
According to an embodiment of the present disclosure, the preset condition may be a magnitude relationship between the similarity result and a preset threshold, for example, in a case that the similarity result is greater than the preset threshold, the candidate historical search statement is determined as the historical search statement; but not limited to this, it may also be determined that the candidate historical search statement is a historical search statement if the similarity result is less than the preset threshold.
According to the embodiment of the present disclosure, the preset threshold may be flexibly set by a person skilled in the art according to an actual application situation, and the embodiment of the present disclosure does not limit a specific value of the preset threshold.
FIG. 4 schematically illustrates a flow chart of a method of determining candidate historical search statements according to an embodiment of the disclosure.
As shown in fig. 4, the method of determining candidate history search sentences according to this embodiment includes operations S401 to S403.
In operation S401, a reception time of a target search sentence is acquired.
According to embodiments of the present disclosure, the reception time may characterize a time at which the target search sentence is received.
According to the embodiment of the present disclosure, the preset time range can be flexibly set by a person skilled in the art according to the actual application situation; the embodiment of the present disclosure does not limit the preset time range.
In operation S402, search sentences in the historical search sentence database are sorted according to the reception time, and a sorting result is generated.
According to an embodiment of the present disclosure, search sentences and reception times respectively corresponding to each search sentence may be stored in the history search sentence database.
According to the embodiment of the disclosure, some search sentences to be screened are determined from the search sentences in the historical search sentence database according to a preset time range, and then the search sentences to be screened and the target search sentences are sorted according to the receiving time difference to generate a sorting result.
According to embodiments of the present disclosure, the sorting result may characterize the search order of the search sentences. For example, the preset time range may be set to 5 minutes. If the target search sentence is received at 09:00, search sentence 1 at 09:01, search sentence 2 at 09:03, and search sentence 3 at 09:07, then search sentence 1 and search sentence 2 are selected as the sentences to be screened according to the preset time range. Since the reception time difference from the target search sentence is 1 minute for search sentence 1 and 3 minutes for search sentence 2, they are sorted accordingly and the sorting result is generated as: search sentence 1, search sentence 2.
In operation S403, candidate history search sentences are determined according to the sorting result.
According to the embodiment of the disclosure, a plurality of candidate historical search sentences can be output and displayed according to the sorting result; in order to reduce redundant data and improve the efficiency of data transmission, further, after determining the sorting results of multiple candidate historical search sentences, the data of the top N candidate historical search sentences may be displayed, where N may be set according to actual conditions.
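For illustration only, a minimal sketch of operations S401 to S403 follows; the function name, the 5-minute window, and the top-N cutoff are assumptions.

```python
from datetime import datetime, timedelta

def select_candidates(target_time: datetime,
                      history: list[tuple[str, datetime]],
                      window_minutes: int = 5,
                      top_n: int = 10) -> list[str]:
    """Keep historical search sentences received within the time window,
    sorted by how close their reception time is to the target's."""
    window = timedelta(minutes=window_minutes)
    in_window = [(sentence, abs(target_time - received))
                 for sentence, received in history
                 if abs(target_time - received) <= window]
    in_window.sort(key=lambda item: item[1])              # smallest time difference first
    return [sentence for sentence, _ in in_window[:top_n]]

# Mirroring the example above (dates are arbitrary): the sentences received at 09:01 and
# 09:03 are kept, in that order, while the one received at 09:07 falls outside the window.
history = [("search sentence 1", datetime(2021, 11, 29, 9, 1)),
           ("search sentence 2", datetime(2021, 11, 29, 9, 3)),
           ("search sentence 3", datetime(2021, 11, 29, 9, 7))]
print(select_candidates(datetime(2021, 11, 29, 9, 0), history))
```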
According to the embodiment of the present disclosure, the candidate historical search sentences are determined from the historical search sentence database according to the sorting result generated from the reception time of the target search sentence, the similarity between the target search sentence and each candidate historical search sentence is then calculated with the neural network model, and a candidate historical search sentence is determined as a historical search sentence when the similarity result satisfies the preset condition. Similar questions in the historical search sentence database can thus be identified and used as historical search sentences, so that the subsequent fused search can be performed over these similar questions.
FIG. 5 schematically illustrates a flow chart of a method of determining a coarse knowledge set according to an embodiment of the present disclosure.
As shown in fig. 5, the method of determining a coarse knowledge set of this embodiment includes operations S501 to S502.
In operation S501, based on the target search statement and the historical search statement, an initial knowledge set corresponding to each recall algorithm is determined from the operation and maintenance knowledge base by using a plurality of recall algorithms, respectively.
According to embodiments of the present disclosure, the recall algorithms may include hit recall, interest-tag-based recall, collaborative-filtering-based recall, and the like, implemented for example with the TF-IDF algorithm, the BM25 algorithm, and so on.
According to the embodiment of the present disclosure, the TF-IDF algorithm is a common weighting technique in information retrieval and text mining, used to measure how important a word is to a document: the importance of a word increases with its frequency in the document and decreases with its frequency across the document library. In its standard form:

$$TF(t, d) = \frac{\text{number of occurrences of word } t \text{ in document } d}{\text{total number of words in document } d}, \qquad IDF(t) = \log\frac{N}{1 + n(t)}$$

$$TFIDF(t, d) = TF(t, d) \times IDF(t)$$

where N is the total number of documents in the library and n(t) is the number of documents containing the word t.

According to the embodiment of the present disclosure, the BM25 algorithm obtains the score of the query sentence by summing, over the morphemes (Chinese word segments) of the query sentence, their scores against the document, Score(Q, d). The weights of morphemes and documents can be computed in many ways; the IDF algorithm is usually adopted:

$$Score(Q, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$

$$W_i = IDF(q_i) = \log\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$

$$R(q_i, d) = \frac{f_i \cdot (k_1 + 1)}{f_i + K} \cdot \frac{qf_i \cdot (k_2 + 1)}{qf_i + k_2}, \qquad K = k_1 \cdot \left(1 - b + b \cdot \frac{dl}{avgdl}\right)$$

where Q denotes the query statement, d denotes a recall statement, q_i denotes the i-th word (morpheme), n denotes the number of words in the query statement, W_i denotes the weight between the i-th word and the recall statement, K denotes the length adjustment factor, and k_1, k_2, and b are tuning factors; R(q_i, d) is the relevance score between the i-th query word and the recall statement; f_i denotes the frequency of the morpheme (word) in the recall statement, and qf_i denotes the frequency of the morpheme (word) in the query statement; empirically, k_1 = k_2 = 2 and b = 0.75 are typical settings; dl is the document length, avgdl is the average document length in the document library, N is the total number of documents, and n(q_i) is the number of documents containing q_i.
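For illustration only, a minimal Python sketch of BM25 scoring over segmented recall statements follows; it adds 1 inside the logarithm to keep the IDF non-negative, a common variant that is an assumption here, as are the function name and parameters.

```python
import math
from collections import Counter

def bm25_scores(query_words: list[str],
                documents: list[list[str]],
                k1: float = 2.0, k2: float = 2.0, b: float = 0.75) -> list[float]:
    """Score every segmented recall statement against the segmented query with BM25."""
    n_docs = len(documents)
    avgdl = sum(len(d) for d in documents) / max(n_docs, 1)
    doc_freq = {w: sum(1 for d in documents if w in d) for w in set(query_words)}  # n(q_i)
    query_tf = Counter(query_words)

    scores = []
    for doc in documents:
        doc_tf = Counter(doc)
        K = k1 * (1 - b + b * len(doc) / avgdl)            # document length adjustment
        score = 0.0
        for w in set(query_words):
            fi, qfi = doc_tf[w], query_tf[w]
            # +1.0 keeps the IDF non-negative even for very frequent words
            idf = math.log((n_docs - doc_freq[w] + 0.5) / (doc_freq[w] + 0.5) + 1.0)
            score += idf * (fi * (k1 + 1) / (fi + K)) * (qfi * (k2 + 1) / (qfi + k2))
        scores.append(score)
    return scores
```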
According to the embodiment of the present disclosure, any single recall algorithm may have blind spots; by using multiple recall algorithms, the answers related to the associated similar questions can all be retrieved from the operation and maintenance knowledge base, which ensures both the recall rate and the recall accuracy.
In operation S502, a coarse knowledge set is generated from a plurality of initial knowledge sets.
FIG. 6 schematically shows a flow diagram of a method of generating a coarse knowledge set according to an embodiment of the disclosure.
As shown in fig. 6, the method of generating a rough knowledge set of this embodiment includes operations S601 to S603.
In operation S601, a normalization process is performed on scores of knowledge documents in a plurality of initial knowledge sets to generate a normalized initial knowledge set.
According to an embodiment of the present disclosure, each knowledge document in the plurality of initial knowledge sets may have a score calculated by the corresponding recall algorithm.
According to an embodiment of the present disclosure, the normalization processing method may include (0, 1) normalization, Z-score normalization, min-max normalization, log function conversion, atan function conversion, Sigmoid function, and the like.
In operation S602, deduplication processing is performed on knowledge documents in the plurality of normalized initial knowledge sets, so as to generate a deduplication initial knowledge set.
According to an embodiment of the present disclosure, the deduplication process may include deduplication based on an information digest algorithm, deduplication based on a simhash algorithm, deduplication based on a bloom filter principle, and the like.
In operation S603, the knowledge documents in the multiple deduplication initial knowledge sets are sorted according to the scores, and a rough knowledge set is generated.
According to embodiments of the present disclosure, knowledge documents may be ranked by score from high to low.
According to the embodiment of the present disclosure, a plurality of recall sentences can be taken as the rough knowledge set according to the sorting result; further, after the sorting results of the plurality of recall sentences are determined, the top-N recall sentences can be taken as the rough knowledge set, where N can be set according to actual conditions.
According to the embodiment of the present disclosure, an initial knowledge set corresponding to each recall algorithm is determined from the operation and maintenance knowledge base by using multiple recall algorithms based on the target search statement; score normalization and deduplication are performed on the knowledge documents in the initial knowledge sets; and the rough knowledge set is generated by sorting according to the scores. The reliability of storing the operation and maintenance knowledge statements is thereby improved, redundant data is reduced, and data transmission efficiency is improved.
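As an illustration only (the names, the min-max normalisation choice, and the top-N cutoff are assumptions), the rough knowledge set could be assembled roughly as follows:

```python
def build_rough_knowledge_set(initial_sets: list[list[tuple[str, float]]],
                              top_n: int = 50) -> list[tuple[str, float]]:
    """Merge per-recall-algorithm initial sets: normalise scores per set,
    de-duplicate documents (keeping the best score), then sort by score."""
    merged: dict[str, float] = {}
    for knowledge_set in initial_sets:
        if not knowledge_set:
            continue
        scores = [score for _, score in knowledge_set]
        lo, hi = min(scores), max(scores)
        for doc, score in knowledge_set:
            norm = (score - lo) / (hi - lo) if hi > lo else 1.0   # min-max normalisation
            merged[doc] = max(merged.get(doc, 0.0), norm)         # de-duplication
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]                                         # keep the top-N documents
```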
FIG. 7 schematically shows a flowchart of a method of extracting a target knowledge document according to an embodiment of the present disclosure.
As shown in FIG. 7, the method of extracting a target knowledge document of this embodiment includes operations S701 to S703.
In operation S701, word embedding processing is performed on the target search sentence and the history search sentence, respectively, to generate a first search word vector corresponding to the target search sentence and a second search word vector corresponding to the history search sentence.
According to an embodiment of the present disclosure, the word embedding processing method may include, but is not limited to, a skip-gram based model, a CBOW (Continuous Bag-of-Words) based model, other embedding methods, and the like.
According to an embodiment of the present disclosure, embedding is a way to convert discrete variables into a continuous vector representation; it not only reduces the dimensionality of the discrete space but also allows similar sentences to be found near one another in the resulting vector space.
In operation S702, the first search term vector and the second search term vector are fused to generate a search term vector.
According to an embodiment of the present disclosure, the fusing process may include vector fusing the first search term vector and the second search term vector to generate a search term vector, where the search term vector may include a new vector obtained after vector fusing.
According to the embodiment of the disclosure, the fusion processing can adopt methods such as multitask learning, knowledge graph-based text generation, memory network-based text generation, text generation by combining distribution-sampling and the like.
In operation S703, a target knowledge document is extracted from the rough knowledge set according to the search term vector.
According to the embodiment of the present disclosure, word embedding processing is first performed on the target search statement and the historical search statement to generate the first search word vector and the second search word vector; the first search word vector and the second search word vector are then fused to generate the search word vector; and the target knowledge document is extracted from the rough knowledge set according to the search word vector. A fused search is thus performed over similar questions, the search features are further refined, the emphasis of the search is highlighted, and the accuracy of knowledge extraction is improved.
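For illustration only, the sketch below stands in for the embedding and fusion steps with averaged word vectors and a weighted average; the 0.7 weight, the averaging strategy, and the names are assumptions rather than the disclosed fusion method.

```python
import numpy as np

def sentence_vector(words: list[str],
                    word_vectors: dict[str, np.ndarray],
                    dim: int = 128) -> np.ndarray:
    """Average the word embeddings of a segmented sentence."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def fuse_search_vectors(target_vec: np.ndarray,
                        history_vecs: list[np.ndarray],
                        target_weight: float = 0.7) -> np.ndarray:
    """Fuse the first search word vector (target) with the second search word
    vectors (history) into a single search word vector."""
    if not history_vecs:
        return target_vec
    fused = target_weight * target_vec + (1.0 - target_weight) * np.mean(history_vecs, axis=0)
    norm = np.linalg.norm(fused)
    return fused / norm if norm > 0 else fused
```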
FIG. 8 schematically shows a flowchart of a method of generating an operation and maintenance knowledge graph according to an embodiment of the present disclosure.
As shown in fig. 8, the operation and maintenance knowledge graph generation method of this embodiment includes operations S801 to S803.
In operation S801, an original operation and maintenance knowledge document is acquired.
According to an embodiment of the present disclosure, the original operation and maintenance knowledge documents may include system documents, plans, problem solutions, institutional rules, operation manuals, and the like of the data center.
In operation S802, a word segmentation process is performed on the operation and maintenance knowledge sentence in the original operation and maintenance knowledge document to generate a plurality of original operation and maintenance knowledge words.
According to the embodiment of the present disclosure, the original operation and maintenance knowledge document can be split by section, and the knowledge fragments are stored line by line in ES (Elasticsearch, a Lucene-based search server).
According to embodiments of the present disclosure, ES may be replaced with the Solr search application server, MySQL or Oracle databases, and the like; one piece of content in the original operation and maintenance knowledge document can be treated as one operation and maintenance knowledge statement.
According to the embodiment of the present disclosure, the target search statement can be segmented into words to obtain the original operation and maintenance knowledge words corresponding to the target search statement.
According to the embodiment of the present disclosure, key words stand out in the original operation and maintenance knowledge document, so extracting word weights on the basis of the original operation and maintenance knowledge document better highlights the importance of the words; an original operation and maintenance knowledge word is a word obtained by performing word segmentation on the original operation and maintenance knowledge document.
According to the embodiment of the present disclosure, the operation and maintenance knowledge sentences in the original operation and maintenance knowledge document can be vectorized, and the obtained vectors are used as the word vectors of the operation and maintenance knowledge words.
In operation S803, an operation and maintenance knowledge graph is generated according to a plurality of original operation and maintenance knowledge words, where the operation and maintenance knowledge graph includes nodes and edges, the nodes represent the original operation and maintenance knowledge words, the nodes have word weights, and the edges represent the context of the nodes connected by the edges in the operation and maintenance knowledge sentence.
According to the embodiment of the disclosure, the node weight can be calculated for the operation and maintenance knowledge statement in the original operation and maintenance knowledge document.
According to an embodiment of the present disclosure, the operation and maintenance knowledge graph may include the correspondence between the plurality of original operation and maintenance knowledge words and their word weights.
According to an embodiment of the present disclosure, the front-to-back relationship may be expressed as follows: an edge from the preceding word to the current word is an in-edge of the current word (counted in its in-degree), and an edge from the current word to the following word is an out-edge of the current word (counted in its out-degree).
According to the embodiment of the present disclosure, an original operation and maintenance knowledge document can be decomposed into original operation and maintenance knowledge words to form the operation and maintenance knowledge graph, and the importance (node weight) of each word is calculated according to the number of its in-edges and out-edges, specifically as follows:
a) Perform word segmentation on the original operation and maintenance knowledge document and generate an initial graph structure; set the initialization weight of each node, i.e. the initial value of $WS(V_j)$, to the reciprocal of the number of nodes in the graph structure (the default); set the out-degree weight $w_{jk}$ of each node, defaulting to its out-degree, i.e. the number of edges pointing from this node to other nodes; and set the in-degree weight $w_{ij}$ of each node, defaulting to its in-degree, i.e. the number of edges pointing to this node from other nodes.
b) Obtain the word weight of each node according to the word-importance formula:

$$WS(V_j) = (1 - d) + d \cdot \sum_{V_i \in In(V_j)} \frac{w_{ij}}{\sum_{V_k \in Out(V_i)} w_{ik}} \cdot WS(V_i)$$

where $\sum_{V_k \in Out(V_i)} w_{ik}$ is the sum of all out-degree weights of node $V_i$, $WS(V_i)$ is the importance weight of node $V_i$, and $d$ is the damping coefficient, set to 0.85 by default; the word weight of each node is obtained after several iterations.
c) Obtain the original sentence graph structure, i.e. the operation and maintenance knowledge graph, according to the word weight of each node.
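For illustration only, a minimal iteration of the node-weight formula above (with unit edge weights, so each edge contributes 1 to the in- and out-degree weights) might look as follows; the names and iteration count are assumptions.

```python
def word_weights(edges: list[tuple[str, str]], d: float = 0.85, iterations: int = 20) -> dict[str, float]:
    """Iterate the TextRank-style node-weight formula on a directed word graph whose
    edges follow the front-to-back order of words in the knowledge sentences."""
    nodes = {w for edge in edges for w in edge}
    if not nodes:
        return {}
    out_edges = {n: [] for n in nodes}
    in_edges = {n: [] for n in nodes}
    for src, dst in edges:
        out_edges[src].append(dst)
        in_edges[dst].append(src)

    ws = {n: 1.0 / len(nodes) for n in nodes}      # initial weight: reciprocal of node count
    for _ in range(iterations):
        # each new weight reads the previous iteration's weights
        ws = {j: (1 - d) + d * sum(ws[i] / len(out_edges[i])
                                   for i in in_edges[j] if out_edges[i])
              for j in nodes}
    return ws
```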
According to the embodiment of the present disclosure, a plurality of original operation and maintenance knowledge words are generated by performing word segmentation on the operation and maintenance knowledge sentences in the original operation and maintenance knowledge document, and the operation and maintenance knowledge graph is generated according to the plurality of original operation and maintenance knowledge words. This improves the reliability of storing the operation and maintenance knowledge statements and the reliability of subsequently recalling them; at the same time, the accuracy of obtaining the operation and maintenance knowledge graph is improved, as is the reliability of obtaining the word vector set.
FIG. 9 schematically shows a flow chart of a method of determining a target knowledge document according to an embodiment of the present disclosure.
As shown in FIG. 9, the method of determining a target knowledge document of this embodiment includes operations S901-S904.
In operation S901, word weights corresponding to knowledge documents in the rough knowledge set are determined from the operation and maintenance knowledge graph.
According to the embodiment of the disclosure, the operation and maintenance knowledge words and the word weight of each operation and maintenance knowledge word corresponding to the target search statement and each recall statement can be determined according to the operation and maintenance knowledge map, the target search statement and each recall statement.
According to the embodiment of the present disclosure, it is first judged whether an operation and maintenance knowledge word exists in the operation and maintenance knowledge graph. If it does, its word weight is obtained from the graph; otherwise, the word weight of the operation and maintenance knowledge word corresponding to the target search statement may be set to the reciprocal of the number of nodes in the graph, so that the importance of the word is neither highlighted nor weakened. The original operation and maintenance knowledge words belonging to each recall sentence can likewise be obtained from the operation and maintenance knowledge graph, used as the operation and maintenance knowledge words corresponding to that recall sentence, and their word weights obtained from the graph.
In operation S902, a set of coarse knowledge vectors is generated from the knowledge documents and word weights in the coarse knowledge set.
According to embodiments of the present disclosure, a set of coarse knowledge vectors may be generated from knowledge documents and word weights in the coarse knowledge set.
According to an embodiment of the present disclosure, the rough knowledge vector set may include each original operation and maintenance knowledge word and its respective word weight and word vector.
In operation S903, similarity calculation is performed on the search term vector and the knowledge document vectors in the rough-ranking knowledge vector set and the term weights, so as to generate a fine-ranking knowledge set.
According to the embodiment of the disclosure, similarity calculation can be performed according to the word weight and the word vector of the operation and maintenance known word of the target search statement and the word weight and the word vector of the operation and maintenance known word of each recall statement.
In operation S904, a target knowledge document is determined from the refined knowledge set.
According to the embodiment of the present disclosure, the sorting results of the plurality of recall sentences can be determined according to the similarity calculation result of each recall sentence, so as to determine the target knowledge document from the refined knowledge set.
According to the embodiment of the present disclosure, the similarity calculation result of the recall sentence may be used as the ranking score of the recall sentence, and the recall sentences may be ranked from high to low according to the ranking score.
According to the embodiment of the present disclosure, similarity calculation is performed between the search word vector and the knowledge document vectors and word weights in the rough knowledge vector set to generate the refined knowledge set, and the target knowledge document is then determined from the refined knowledge set. That is, the coarsely screened operation and maintenance knowledge is further finely screened, which improves the accuracy of knowledge extraction.
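For illustration only, the refinement step could be sketched as a weighted-cosine re-ranking; the weighting scheme, the default dimension, and the top-N cutoff are assumptions.

```python
import numpy as np

def weighted_doc_vector(words: list[str],
                        word_vectors: dict[str, np.ndarray],
                        weights: dict[str, float],
                        default_weight: float,
                        dim: int = 128) -> np.ndarray:
    """Weight each word vector by its knowledge-graph word weight (falling back
    to the reciprocal-of-node-count default) and sum."""
    vec = sum(weights.get(w, default_weight) * word_vectors[w]
              for w in words if w in word_vectors)
    return vec if isinstance(vec, np.ndarray) else np.zeros(dim)

def refine(search_vec: np.ndarray,
           rough_set: list[tuple[str, np.ndarray]],
           top_n: int = 5) -> list[tuple[str, float]]:
    """Rank the rough knowledge set by cosine similarity against the fused
    search word vector and keep the most similar documents."""
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0
    ranked = sorted(((doc, cosine(search_vec, vec)) for doc, vec in rough_set),
                    key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]
```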
According to the embodiment of the present disclosure, the operation and maintenance knowledge items are stored and vectorized in the operation and maintenance knowledge base. The historical search statement is first determined according to the target search statement, and the knowledge in the operation and maintenance knowledge base is then coarsely screened according to the target search statement and the historical search statement. The target search statement and the historical search statement are fused so that a fused search is performed over similar questions, the search features are further refined, and the emphasis of the search is highlighted. Similarity calculation is then performed between the search word vector and the knowledge document vectors and word weights in the rough knowledge vector set to generate the refined knowledge set, from which the target knowledge document is determined, improving search accuracy. In addition, multiple recall algorithms are used so that all the answers related to the associated similar questions are retrieved from the operation and maintenance knowledge base, which ensures the recall rate and recall accuracy, improves the accuracy of knowledge extraction, lowers the operation and maintenance threshold, and improves operation and maintenance efficiency.
Fig. 10 schematically shows a block diagram of a knowledge extraction apparatus 1000 according to an embodiment of the present disclosure.
As shown in fig. 10, the knowledge extraction apparatus 1000 includes a first determination module 1001, a second determination module 1002, and an extraction module 1003.
A first determining module 1001, configured to, in response to receiving a target search statement, determine, according to the target search statement, a historical search statement from a historical search statement database, where the historical search statement has an association relationship with the target search statement;
a second determining module 1002, configured to determine a rough knowledge set from a preconfigured operation and maintenance knowledge base according to a target search statement and the historical search statement; and
the extraction module 1003 is configured to extract a target knowledge document from the rough knowledge set according to a search word vector generated by fusing the target search statement and the historical search statement.
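As a rough illustration only, the cooperation of these modules can be expressed as a single pipeline; the function names below are placeholders introduced for this sketch and do not appear in the disclosure.

```python
def extract_knowledge(target_statement,
                      determine_history,    # first determination module 1001
                      determine_rough_set,  # second determination module 1002
                      fuse_statements,      # extraction module 1003: build the search word vector
                      fine_rank):           # extraction module 1003: fine-rank the rough knowledge set
    """Chain the modules: history lookup -> coarse screening -> fusion and fine ranking."""
    historical_statements = determine_history(target_statement)
    rough_set = determine_rough_set(target_statement, historical_statements)
    search_word_vector = fuse_statements(target_statement, historical_statements)
    return fine_rank(search_word_vector, rough_set)  # the target knowledge document
```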
According to an embodiment of the present disclosure, the first determination module 1001 includes a first determination unit, an output unit, and a second determination unit.
A first determining unit, configured to determine a candidate historical search statement from a historical search statement database according to the target search statement.
And the output unit is used for inputting the target search statement and the candidate historical search statement into the pre-trained neural network model and outputting a similarity result.
And the second determining unit is used for determining the candidate historical search statement as the historical search statement under the condition that the similarity result meets a preset condition.
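The disclosure does not specify the architecture of the pre-trained neural network model or the form of the preset condition; the sketch below assumes a text encoder (`encode`, a placeholder) that maps a statement to a fixed-size vector, and a cosine-similarity threshold as the preset condition, both of which are assumptions.

```python
import numpy as np

def select_historical_statements(target_statement, candidate_statements, encode, threshold=0.8):
    """Keep the candidates whose similarity with the target meets the (assumed) preset condition."""
    target_vec = encode(target_statement)
    selected = []
    for candidate in candidate_statements:
        cand_vec = encode(candidate)
        sim = float(np.dot(target_vec, cand_vec)
                    / (np.linalg.norm(target_vec) * np.linalg.norm(cand_vec) + 1e-12))
        if sim >= threshold:                 # the preset condition, assumed to be a threshold
            selected.append((candidate, sim))
    return selected
```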
According to an embodiment of the present disclosure, the first determination unit includes an acquisition subunit, a generation subunit, and a determination subunit.
And the acquisition subunit is used for acquiring the receiving time of the target search statement.
And the generating subunit is used for sorting the search sentences in the historical search sentence database according to the receiving time to generate a sorting result.
And the determining subunit is used for determining the candidate historical search sentences according to the sorting result.
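The disclosure only states that the stored search sentences are sorted by receiving time; the recency window and the candidate limit in the sketch below are assumptions introduced for illustration.

```python
from datetime import timedelta

def candidate_history(history_db, receive_time, window=timedelta(hours=24), limit=50):
    """Sort stored (statement, timestamp) pairs by receiving time and keep the most recent as candidates."""
    recent = [(stmt, ts) for stmt, ts in history_db
              if timedelta(0) <= receive_time - ts <= window]
    recent.sort(key=lambda item: item[1], reverse=True)   # newest first
    return [stmt for stmt, _ in recent[:limit]]
```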
According to an embodiment of the present disclosure, the second determining module 1002 includes a third determining unit and a first generation unit.
And the third determining unit is used for determining an initial knowledge set corresponding to each recall algorithm from the operation and maintenance knowledge base by utilizing a plurality of recall algorithms respectively based on the target search statement and the historical search statement.
And the first generation unit is used for generating a rough knowledge set according to the plurality of initial knowledge sets.
According to an embodiment of the present disclosure, the first generation unit includes a first processing subunit, a second processing subunit, and a sorting subunit.
And the first processing subunit is used for carrying out normalization processing on the scores of the knowledge documents in the plurality of initial knowledge sets to generate a normalized initial knowledge set.
And the second processing subunit is used for performing duplication removal processing on the knowledge documents in the plurality of normalized initial knowledge sets to generate duplication-removed initial knowledge sets.
And the sorting subunit is used for sorting the knowledge documents in the plurality of duplication-removed initial knowledge sets according to the scores to generate a rough knowledge set.
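A minimal sketch of this coarse-screening merge is given below; min-max normalization and keep-the-highest-score de-duplication are assumptions, since the disclosure only requires that the scores be normalized, the documents de-duplicated, and the result ordered by score.

```python
def merge_recall_sets(initial_sets):
    """Merge per-recall-algorithm result sets ({doc_id: score} dicts) into the rough knowledge set."""
    merged = {}
    for scores in initial_sets:
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0                                   # avoid division by zero
        for doc_id, score in scores.items():
            norm = (score - lo) / span                            # normalization
            merged[doc_id] = max(merged.get(doc_id, 0.0), norm)   # de-duplication
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)  # rough knowledge set
```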
According to an embodiment of the present disclosure, the extraction module 1003 includes a second generating unit, a third generating unit, and an extraction unit.
And the second generating unit is used for performing word embedding processing on the target search statement and the historical search statement respectively to generate a first search word vector corresponding to the target search statement and a second search word vector corresponding to the historical search statement.
And the third generating unit is used for fusing the first search word vector and the second search word vector to generate the search word vector.
And the extraction unit is used for extracting the target knowledge document from the rough knowledge set according to the search word vector.
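Neither the embedding model nor the fusion rule is fixed by the disclosure; the sketch below averages the word embeddings of each statement and fuses the two vectors with a weighting that favours the target statement (`embed`, the word lists, and the 0.7/0.3 weighting are all assumptions).

```python
import numpy as np

def fuse_search_vectors(target_words, history_words, embed, target_weight=0.7):
    """Build the first and second search word vectors and fuse them into a single search word vector."""
    first = np.mean([embed(w) for w in target_words], axis=0)    # first search word vector
    second = np.mean([embed(w) for w in history_words], axis=0)  # second search word vector
    return target_weight * first + (1.0 - target_weight) * second
```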
According to the embodiment of the present disclosure, the knowledge extraction apparatus further includes an acquisition module, a first processing module, and a generation module.
And the acquisition module is used for acquiring the original operation and maintenance knowledge document.
The first processing module is used for carrying out word segmentation processing on operation and maintenance knowledge sentences in the original operation and maintenance knowledge document to generate a plurality of original operation and maintenance knowledge words.
The generation module is used for generating an operation and maintenance knowledge graph according to the plurality of original operation and maintenance knowledge words, wherein the operation and maintenance knowledge graph includes nodes and edges, the nodes represent the original operation and maintenance knowledge words and have word weights, and the edges represent the preceding-following relationship, within the operation and maintenance knowledge sentence, of the nodes they connect.
According to an embodiment of the present disclosure, the knowledge extraction apparatus further includes a second processing module.
And the second processing module is used for calculating the node weight of the operation and maintenance knowledge statement in the original operation and maintenance knowledge document.
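A minimal sketch of the graph construction follows; it links words that are adjacent within a sentence and approximates the node weight by occurrence count (`tokenize` stands in for the word-segmentation step, and the counting-based weight is an assumption, since the disclosure does not fix the weighting scheme; a TextRank-style score would fit equally well).

```python
from collections import defaultdict

def build_knowledge_graph(knowledge_sentences, tokenize):
    """Nodes are segmented O&M knowledge words with weights; edges record the preceding-following relation."""
    edges = defaultdict(set)
    node_weights = defaultdict(float)
    for sentence in knowledge_sentences:
        words = tokenize(sentence)
        for prev_word, next_word in zip(words, words[1:]):
            edges[prev_word].add(next_word)   # prev_word precedes next_word in this sentence
        for word in words:
            node_weights[word] += 1.0         # assumed node weight: occurrence count
    return edges, node_weights
```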
According to an embodiment of the present disclosure, the extraction module 1003 includes a fourth determining unit, a fourth generating unit, a calculating unit, and a fifth determining unit.
And the fourth determining unit is used for determining, from the operation and maintenance knowledge graph, the word weights corresponding to the knowledge documents in the rough knowledge set.
And the fourth generating unit is used for generating a rough knowledge vector set according to the knowledge documents in the rough knowledge set and the word weights.
And the calculating unit is used for performing similarity calculation between the search word vector and the knowledge document vectors and word weights in the rough knowledge vector set to generate a fine knowledge set.
And the fifth determining unit is used for determining the target knowledge document from the fine knowledge set.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any plurality of the first determining module 1001, the second determining module 1002 and the extracting module 1003 may be combined and implemented in one module/unit/sub-unit, or any one of the modules/units/sub-units may be split into a plurality of modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present disclosure, at least one of the first determining module 1001, the second determining module 1002, and the extracting module 1003 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented in any one of three implementations of software, hardware, and firmware, or in a suitable combination of any of them. Alternatively, at least one of the first determining module 1001, the second determining module 1002 and the extracting module 1003 may be at least partly implemented as a computer program module, which when executed may perform the respective function.
It should be noted that the knowledge extraction device portion in the embodiment of the present disclosure corresponds to the knowledge extraction method portion in the embodiment of the present disclosure, and the description of the knowledge extraction device portion specifically refers to the knowledge extraction method portion, and is not repeated herein.
Fig. 11 schematically illustrates a block diagram of an electronic device 1100 adapted to implement a method of knowledge extraction according to an embodiment of the disclosure. The electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 11, a computer electronic device 1100 according to an embodiment of the present disclosure includes a processor 1101, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. The processor 1101 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1101 may also include on-board memory for caching purposes. The processor 1101 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to the embodiments of the present disclosure.
In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 are stored. The processor 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. The processor 1101 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1102 and/or the RAM 1103. It is noted that the programs may also be stored in one or more memories other than the ROM 1102 and the RAM 1103. The processor 1101 may also perform various operations of the method flows according to the embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 1100 may also include an input/output (I/O) interface 1105, which is also connected to the bus 1104. The electronic device 1100 may also include one or more of the following components connected to the I/O interface 1105: an input portion 1106 including a keyboard, a mouse, and the like; an output portion 1107 including a signal output unit such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, or the like. The communication section 1109 performs communication processing via a network such as the internet. A drive 1110 is also connected to the I/O interface 1105 as necessary. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as necessary, so that a computer program read therefrom is installed into the storage section 1108 as necessary.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 1109 and/or installed from the removable medium 1111. The computer program, when executed by the processor 1101, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 1102 and/or the RAM 1103 and/or one or more memories other than the ROM 1102 and the RAM 1103 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method provided by the embodiments of the present disclosure; when the computer program product is run on an electronic device, the program code is configured to cause the electronic device to implement the knowledge extraction method provided by the embodiments of the present disclosure.
The computer program, when executed by the processor 1101, performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, and downloaded and installed via the communication section 1109 and/or installed from the removable medium 1111. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or integrations of the features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or integrations are not expressly recited in the present disclosure. In particular, various combinations and/or integrations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (13)

1. A method of knowledge extraction, comprising:
in response to receiving a target search statement, determining a historical search statement having an association relation with the target search statement from a historical search statement database according to the target search statement;
determining a rough knowledge set from a preconfigured operation and maintenance knowledge base according to the target search statement and the historical search statement; and
and extracting a target knowledge document from the rough knowledge set according to a search word vector generated by fusing the target search statement and the historical search statement.
2. The method of claim 1, wherein the determining, according to the target search statement, a historical search statement having an association relation with the target search statement from a historical search statement database comprises:
determining candidate historical search sentences from the historical search sentence database according to the target search sentences;
inputting the target search statement and the candidate historical search statement into a neural network model which is trained in advance, and outputting a similarity result;
and determining the candidate historical search sentences as the historical search sentences under the condition that the similarity result meets a preset condition.
3. The method of claim 2, wherein the determining candidate historical search sentences from the historical search sentence database according to the target search sentence comprises:
acquiring the receiving time of the target search statement;
sorting the search sentences in the historical search sentence database according to the receiving time to generate a sorting result;
and determining the candidate historical search sentences according to the sorting result.
4. The method of claim 1, wherein the determining a rough knowledge set from a preconfigured operation and maintenance knowledge base according to the target search statement and the historical search statement comprises:
respectively utilizing multiple recall algorithms to determine an initial knowledge set respectively corresponding to each recall algorithm from the operation and maintenance knowledge base based on the target search statement and the historical search statement;
and generating the rough knowledge set according to a plurality of initial knowledge sets.
5. The method of claim 4, wherein knowledge documents in the initial knowledge set have scores calculated based on a recall algorithm;
the generating the coarse knowledge set from the plurality of initial knowledge sets comprises:
normalizing the scores of the knowledge documents in the initial knowledge sets to generate normalized initial knowledge sets;
carrying out duplication removal processing on knowledge documents in the plurality of normalized initial knowledge sets to generate duplication removal initial knowledge sets;
and sequencing the knowledge documents in the plurality of duplication-removing initial knowledge sets according to the scores to generate the rough knowledge set.
6. The method of claim 1, wherein the extracting a target knowledge document from the rough knowledge set according to a search word vector generated by fusing the target search statement and the historical search statement comprises:
performing word embedding processing on the target search statement and the historical search statement respectively to generate a first search word vector corresponding to the target search statement and a second search word vector corresponding to the historical search statement;
fusing the first search word vector and the second search word vector to generate the search word vector;
and extracting a target knowledge document from the rough knowledge set according to the search word vector.
7. The method of claim 1, further comprising:
acquiring an original operation and maintenance knowledge document;
carrying out word segmentation on operation and maintenance knowledge sentences in the original operation and maintenance knowledge document to generate a plurality of original operation and maintenance knowledge words;
and generating an operation and maintenance knowledge graph according to the plurality of original operation and maintenance knowledge words, wherein the operation and maintenance knowledge graph comprises nodes and edges, the nodes represent the original operation and maintenance knowledge words and have word weights, and the edges represent the preceding-following relationship, within the operation and maintenance knowledge sentence, of the nodes they connect.
8. The method of claim 7, further comprising:
and calculating the node weight of the operation and maintenance knowledge statement in the original operation and maintenance knowledge document.
9. The method of claim 8, wherein the extracting a target knowledge document from the rough knowledge set according to the target search statement and the historical search statement comprises:
determining word weights corresponding to knowledge documents in the rough-ranking knowledge set from the operation and maintenance knowledge graph;
generating a rough knowledge vector set according to the knowledge documents in the rough knowledge set and the word weight;
performing similarity calculation between the search word vector and the knowledge document vectors and word weights in the rough knowledge vector set to generate a fine knowledge set;
and determining the target knowledge document from the fine knowledge set.
10. A knowledge extraction device, comprising:
the first determination module is used for responding to the receiving of a target search statement, and determining a historical search statement which has an association relation with the target search statement from a historical search statement database according to the target search statement;
the second determining module is used for determining a rough knowledge set from a preconfigured operation and maintenance knowledge base according to the target search statement and the historical search statement; and
and the extraction module is used for extracting the target knowledge document from the rough knowledge set according to the search word vector generated by fusing the target search statement and the historical search statement.
11. An electronic device, comprising:
one or more processors;
a memory to store one or more instructions that,
wherein the one or more instructions, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
12. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 9.
13. A computer program product comprising computer executable instructions for implementing the method of any one of claims 1 to 9 when executed.
CN202111437931.4A 2021-11-29 2021-11-29 Knowledge extraction method, apparatus, device, medium, and program product Pending CN114064859A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111437931.4A CN114064859A (en) 2021-11-29 2021-11-29 Knowledge extraction method, apparatus, device, medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111437931.4A CN114064859A (en) 2021-11-29 2021-11-29 Knowledge extraction method, apparatus, device, medium, and program product

Publications (1)

Publication Number Publication Date
CN114064859A true CN114064859A (en) 2022-02-18

Family

ID=80277242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111437931.4A Pending CN114064859A (en) 2021-11-29 2021-11-29 Knowledge extraction method, apparatus, device, medium, and program product

Country Status (1)

Country Link
CN (1) CN114064859A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118132731A (en) * 2024-05-06 2024-06-04 杭州数云信息技术有限公司 Dialogue method and device, storage medium, terminal and computer program product

Similar Documents

Publication Publication Date Title
US11334635B2 (en) Domain specific natural language understanding of customer intent in self-help
CN111797214A (en) FAQ database-based problem screening method and device, computer equipment and medium
CN113887701A (en) Generating outputs for neural network output layers
WO2019217096A1 (en) System and method for automatically responding to user requests
US11068479B2 (en) Method and system for analytic based connections among user types in an online platform
CN109766441B (en) Text classification method, device and system
CN114861889B (en) Deep learning model training method, target object detection method and device
CN113986864A (en) Log data processing method and device, electronic equipment and storage medium
CN114036398A (en) Content recommendation and ranking model training method, device, equipment and storage medium
CN112686053A (en) Data enhancement method and device, computer equipment and storage medium
CN114064859A (en) Knowledge extraction method, apparatus, device, medium, and program product
CN111737607B (en) Data processing method, device, electronic equipment and storage medium
CN112329429A (en) Text similarity learning method, device, equipment and storage medium
CN116383340A (en) Information searching method, device, electronic equipment and storage medium
CN115759292A (en) Model training method and device, semantic recognition method and device, and electronic device
CN115238676A (en) Method and device for identifying hot spots of bidding demands, storage medium and electronic equipment
CN114048315A (en) Method and device for determining document tag, electronic equipment and storage medium
CN112926295A (en) Model recommendation method and device
CN113095078A (en) Associated asset determination method and device and electronic equipment
CN114154477A (en) Text data processing method and device, electronic equipment and readable storage medium
CN112926297A (en) Method, apparatus, device and storage medium for processing information
RU2823436C1 (en) System for automatic determination of subject matter of text documents based on explicable artificial intelligence methods
CN111914536B (en) Viewpoint analysis method, viewpoint analysis device, viewpoint analysis equipment and storage medium
CN113177122A (en) Associated asset determination method and device and electronic equipment
CN114840666A (en) Retrieval method, retrieval apparatus, electronic device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination