CN117407492A - Keyword sequence generation method, system and equipment based on knowledge graph - Google Patents


Publication number
CN117407492A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311713478.4A
Other languages
Chinese (zh)
Other versions
CN117407492B (en)
Inventor
黎海情
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Ocean University
Original Assignee
Guangdong Ocean University
Application filed by Guangdong Ocean University
Priority to CN202311713478.4A
Publication of CN117407492A
Application granted
Publication of CN117407492B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the field of data processing and provides a knowledge-graph-based keyword sequence generation method, system and equipment. Input information is acquired from a client and fed into a generative model to obtain a generated text; a plurality of triples are acquired from the generated text; the elements of the triples are marked in the generated text, a logic diagram is filled according to the order of the marked elements to obtain a keyword sequence tree, and the keyword sequence tree is output to the client. This improves the quality of the generated text and also supports a deeper logical hierarchy and stronger contextual relevance in the text.

Description

Keyword sequence generation method, system and equipment based on knowledge graph
Technical Field
The invention belongs to the field of data processing, and particularly relates to a method, a system and equipment for generating a keyword sequence based on a knowledge graph.
Background
In linguistics, keyword sequence generation is a key research topic with an important influence on tasks such as automatic summarization, machine translation and question-answering systems. However, conventional keyword sequence generation methods are usually statistical or rule-based; they focus mainly on surface word-order relationships and have difficulty processing and understanding complex semantic relationships in text. Such methods tend to perform poorly on complex, ambiguous or indirect semantic relationships. Although some deep-learning-based keyword sequence generation methods can capture certain logical relationships, they still lack the ability to understand and process deep logical structure. For example, the artificial-intelligence-based assisted writing method described in patent publication No. CN116992834A cannot accurately understand and depict causal, sequential or comparative relationships in text. In actual language use, the meaning of a word is often affected by its context. Conventional keyword sequence generation methods, however, often lack an effective context processing mechanism; with the writing assistance method described in patent document CN109582839B, for example, the generated text may have problems with contextual consistency and coherence. In the real world, the relationships between entities are complex and varied and may include, among others, belongs-to, located-in and owned-by relationships, and conventional keyword sequence generation methods often have difficulty handling such complex entity relationships accurately. Because of these shortcomings, the industry has begun to explore new methods in the hope of improving the quality and logical depth of keyword sequence generation.
Disclosure of Invention
The invention aims to provide a knowledge-graph-based keyword sequence generation method, system and equipment, so as to solve one or more technical problems in the prior art and to provide at least one beneficial alternative or creation condition.
In order to achieve the above object, according to an aspect of the present invention, there is provided a keyword sequence generation method based on a knowledge graph, the method comprising the steps of:
acquiring input information from a client, and inputting the information into a generative model to obtain a generated text; acquiring a plurality of triples from the generated text; marking the elements of the triples in the generated text, filling a logic diagram according to the order of the marked elements to obtain a keyword sequence tree, and outputting the keyword sequence tree to the client.
Further, the input information acquired from the client is one or more character strings.
Further, the generative model is a generative pre-trained language model.
Further, entity relationship extraction is performed on the generated text to obtain a plurality of different triples.
Further, the elements of the plurality of triples are marked in the generated text as follows: the generated text is segmented into words, the elements of each triplet are marked on the corresponding segmented words, and the order and positional relationship of the elements of each triplet in the generated text are recorded.
Further, the logic diagram filling performed according to the order of the marked elements to obtain the keyword sequence tree may specifically be performed according to a first logic diagram constraint, a second logic diagram constraint and/or a third logic diagram constraint:
the triples are connected with the knowledge graph, and the nodes corresponding to the elements of the triples are searched for in the knowledge graph, wherein each triplet consists of a head entity, an entity relationship and a tail entity as its elements;
First logic diagram constraint: in the generated text, when the tail entity of a triplet precedes its head entity, its entity relationship follows its head entity, and elements of other triples are present between its tail entity and its head entity:
each element of the other triples located between the tail entity and the head entity is called a prepositioned guest node corresponding to the triplet, and the triplet is called the target triplet; the shortest paths between each prepositioned guest node and the head entity of the target triplet, and between each prepositioned guest node and the tail entity of the target triplet, are calculated; each shortest path between a prepositioned guest node and the head entity and each shortest path between a prepositioned guest node and the tail entity is taken as a prepositioned guest sub-path; the shortest path passing through all prepositioned guest nodes is obtained as the prepositioned guest sub-graph; and the logic diagram is filled according to the prepositioned guest sub-paths and the prepositioned guest sub-graph.
In linguistic research, the relationship between a preceding word and a following word can be explored in depth. A prepositioned guest node may represent the preceding word, while the head entity and the tail entity may represent the topic and the focus, respectively. This constraint helps to model that relationship and reveal the deep structure of the text, in particular the relationship between topic and focus in a sentence, so that relationships implicit in the text can be identified and used more accurately. This improves the accuracy and richness of keyword sequence generation, helps to construct a more complex and detailed knowledge graph, and supports deeper text understanding and generation.
Second logic diagram constraint: in the generated text, when the tail entity of a triplet precedes its head entity, its entity relationship also precedes its head entity, and elements of other triples are present between the head entity and its entity relationship:
each element of the other triples located between the head entity and its entity relationship is called a centrally located guest node corresponding to the triplet, and the triplet is called the target triplet; the shortest paths between each centrally located guest node and the head entity of the target triplet, and between each centrally located guest node and the tail entity of the target triplet, are calculated; each shortest path between a centrally located guest node and the head entity and each shortest path between a centrally located guest node and the tail entity is taken as a centrally located guest sub-path; the shortest path passing through all centrally located guest nodes is obtained as the centrally located guest sub-graph; and the logic diagram is filled according to the centrally located guest sub-paths and the centrally located guest sub-graph.
In linguistics, centrally located guest nodes correspond to intermediary elements that semantically connect topic and focus. Identifying and modeling these nodes allows complex semantic relationships to be better understood and depicted, which helps to capture the complexity and diversity of language, particularly when dealing with ambiguous, vague or indirect semantic relationships. The constraint helps to render indirect or implicit relationships in the text, supports richer and more varied text generation, and enhances the consistency and completeness of knowledge-graph-based text.
Third logic diagram constraint: in the generated text, when the head entity of a triplet precedes its tail entity and also precedes its entity relationship, and elements of other triples are present between the tail entity and its entity relationship:
each element of the other triples located between the tail entity of the target triplet and its entity relationship is called a post-guest node corresponding to the triplet, and the triplet is called the target triplet; the shortest paths between each post-guest node and the head entity of the target triplet, and between each post-guest node and the tail entity of the target triplet, are calculated; each shortest path between a post-guest node and the head entity and each shortest path between a post-guest node and the tail entity is taken as a post-guest sub-path; the shortest path passing through all post-guest nodes is obtained as the post-guest sub-graph; and the logic diagram is filled according to the post-guest sub-paths and the post-guest sub-graph.
In linguistics, post-guest nodes may represent context or references. These elements can influence or modify how topic and focus are understood, and analyzing these nodes allows a more accurate context model to be established. This helps to explore context effects in language, to capture and express causal relationships and logical structures in text more comprehensively, to strengthen the use of knowledge graphs in text generation, and to improve the consistency and logic of the text content.
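The positional tests in the three constraints can be illustrated with a minimal Python sketch. It assumes that each element has already been assigned a single integer position in the generated text; the function name `classify_constraint`, the dictionary `others` (element name to position, for elements of other triples) and the returned labels are hypothetical and not part of the patent text.

```python
from typing import Dict, List, Optional, Tuple


def classify_constraint(
    head: int, relation: int, tail: int, others: Dict[str, int]
) -> Tuple[Optional[str], List[str]]:
    """Return which constraint applies to a target triplet and its guest nodes.

    head, relation, tail: positions of the target triplet's elements in the text.
    others: positions of elements that belong to other triples (assumed input).
    """

    def between(a: int, b: int) -> List[str]:
        # Elements of other triples strictly between two positions, in text order.
        lo, hi = sorted((a, b))
        inside = [name for name, pos in others.items() if lo < pos < hi]
        return sorted(inside, key=lambda name: others[name])

    if tail < head < relation:
        kind, guests = "first", between(tail, head)       # prepositioned guest nodes
    elif tail < head and relation < head:
        kind, guests = "second", between(relation, head)  # centrally located guest nodes
    elif head < tail and head < relation:
        kind, guests = "third", between(tail, relation)   # post-guest nodes
    else:
        return None, []
    # A constraint only applies when at least one guest node is present.
    return (kind, guests) if guests else (None, [])


if __name__ == "__main__":
    print(classify_constraint(head=5, relation=7, tail=1, others={"a": 3, "b": 9}))
```

When no ordering matches, or no element of another triplet lies in the relevant span, the sketch reports that no constraint applies.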
The output of conventional text generation or writing assistance based on large models is often shallow: the generated text merely resembles certain patterns without the support of deeper logical content, and such text does not meet the standard required in application scenarios such as teaching or document production. The usual solution in the software industry is to blindly enlarge the tensor parameter scale of the language model and mechanically turn it into a knowledge base; this ill-advised approach arises because the deep internal rules of linguistics have not been traced and studied. The three logic diagram constraints of the method analyze the entities and relationships in the text in detail and thereby strengthen the role of the knowledge graph in text generation, so that the generated text is more accurate, rich and logical. By taking the contextual relationships of entities in the text into account, logic diagram filling also handles context dependence better and improves the consistency and logic of the text. The method can be applied to many types of text generation task, such as automatic summarization, story generation and dialogue systems, and can improve the text generation quality and logical depth of such applications.
Further, the logic diagram filling may be performed as follows: the extension nodes are sorted by their superposition counts to form an extension node sequence; a plurality of semantic connection triples are obtained by screening the extension node sequence; the target extension nodes are connected through the semantic connection triples to form what is called the keyword sequence tree; text is generated from the keyword sequence tree to obtain a logic filling text; and the generated text is supplemented with the logic filling text. Logic diagram filling helps to mine deep logical relationships that are not obvious in the text, such as implicit causal, sequential or contrastive relationships, making the generated text deeper and more comprehensive. It also introduces more related entities and relationships, which increases the richness and diversity of the text and avoids repetition and monotony.
Further, preferably, a semantic connection relationship means that two extension nodes belong to the same triplet in the knowledge graph, or that the triplet to which one extension node belongs and the triplet to which the other extension node belongs share a coinciding node. In the first case, the triplet to which both extension nodes belong is called a semantic connection triplet; in the second case, the two triples that share the coinciding node are also called semantic connection triples.
The invention also provides a knowledge-graph-based keyword sequence generation system. The system implements the steps of the knowledge-graph-based keyword sequence generation method described above and can run on a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud data center. The runnable system may include, but is not limited to, a processor, a memory and a server cluster, and the processor executes the computer program to run the following units:
the generating unit is used for acquiring input information from the client and obtaining a generated text by inputting the information to the generating model;
the extraction unit is used for acquiring a plurality of triples from the generated text;
the logic unit is used for marking the elements in the triples in the generated text, and filling a logic diagram according to the marked sequence of the elements to obtain a keyword sequence tree;
and the output unit is used for outputting the keyword sequence tree to the client.
Correspondingly, the invention also provides an electronic device, a readable storage medium and a computer program product:
an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the knowledge-graph-based keyword sequence generation method.
A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the steps of the knowledge-graph-based keyword sequence generation method.
A computer program product comprising a computer program which, when executed by a processor, implements the steps of the knowledge-graph-based keyword sequence generation method.
The beneficial effects of the invention are as follows: the invention provides a knowledge-graph-based keyword sequence generation method, system and equipment, in which input information is acquired from a client and fed into a generative model to obtain a generated text; a plurality of triples are acquired from the generated text; the elements of the triples are marked in the generated text, a logic diagram is filled according to the order of the marked elements to obtain a keyword sequence tree, and the keyword sequence tree is output to the client. This improves the quality of the generated text and also supports a deeper logical hierarchy and stronger contextual relevance in the text.
Drawings
The above and other features of the present invention will become more apparent from the detailed description of its embodiments given in conjunction with the accompanying drawings, in which like reference characters designate like or similar elements. The drawings described below are merely some examples of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without inventive effort. In the drawings:
FIG. 1 is a flow chart of a knowledge-graph-based keyword sequence generation method;
FIG. 2 is a system block diagram of a knowledge-graph-based keyword sequence generation system.
Detailed Description
The conception, specific structure and technical effects of the present invention are described clearly and completely below with reference to the embodiments and the drawings, so that its objects, aspects and effects can be fully understood. It should be noted that, where there is no conflict, the embodiments and the features of the embodiments may be combined with each other.
In the description of the present invention, "several" means one or more and "a plurality" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. The terms "first" and "second" are used only to distinguish technical features and should not be construed as indicating or implying relative importance, the number of the technical features indicated, or their order of precedence.
FIG. 1 shows a flowchart of the knowledge-graph-based keyword sequence generation method according to the present invention. The method, system and equipment according to an embodiment of the present invention are described below with reference to FIG. 1.
The invention provides a knowledge-graph-based keyword sequence generation method, which specifically comprises the following steps:
acquiring input information from a client, and obtaining a generated text by inputting the information to a generation model;
acquiring a plurality of triples from the generated text;
marking elements in the triples in the generated text, and filling a logic diagram according to the marked sequence of the elements to obtain a keyword sequence tree;
and outputting the keyword sequence tree to the client.
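The four steps above can be read as a small pipeline. The following Python sketch only shows how the steps chain together; the generative model, the triple extractor and the logic diagram filler are passed in as callables because the patent does not fix their implementation. The name `run_pipeline` and the representation of the keyword sequence tree as a dictionary of node lists are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# (head entity, entity relationship, tail entity)
Triple = Tuple[str, str, str]
# Hypothetical tree representation: parent node -> list of child nodes.
KeywordTree = Dict[str, List[str]]


def run_pipeline(
    user_input: str,
    generate_text: Callable[[str], str],
    extract_triples: Callable[[str], List[Triple]],
    fill_logic_diagram: Callable[[str, List[Triple]], KeywordTree],
) -> KeywordTree:
    """Chain the steps: generate text, extract triples, fill the logic diagram."""
    generated_text = generate_text(user_input)        # step 1: generative model
    triples = extract_triples(generated_text)         # step 2: triples from the text
    return fill_logic_diagram(generated_text, triples)  # step 3: keyword sequence tree


if __name__ == "__main__":
    # Stand-in callables; a real system would plug in a language model,
    # an entity relationship extractor and the constraint-based filler.
    tree = run_pipeline(
        "coral reef",
        lambda text: text + " shelters fish",
        lambda text: [("coral reef", "shelters", "fish")],
        lambda text, triples: {h: [t] for h, _, t in triples},
    )
    print(tree)  # step 4: this result would be returned to the client
```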
Further, the input information acquired from the client is one or more character strings.
Further, the generative model is a generative pre-trained language model.
Further, entity relationship extraction is performed on the generated text to obtain a plurality of different triples.
Further, the elements of the plurality of triples are marked in the generated text as follows: the generated text is segmented into words, the elements of each triplet are marked on the corresponding segmented words, and the order and positional relationship of the elements of each triplet in the generated text are recorded.
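As a rough illustration of the marking step, the sketch below records, for each triplet, the index of the first segmented word that matches each element. It assumes the text has already been segmented into a token list and that every element corresponds to exactly one token; the function name `mark_elements` and the use of -1 for an element that does not occur are hypothetical choices.

```python
from typing import Dict, List, Tuple

# (head entity, entity relationship, tail entity)
Triple = Tuple[str, str, str]


def mark_elements(tokens: List[str], triples: List[Triple]) -> List[Dict[str, int]]:
    """Record the position of each triplet element in the segmented text."""
    first_index: Dict[str, int] = {}
    for i, token in enumerate(tokens):
        first_index.setdefault(token, i)  # keep the first occurrence only

    marks = []
    for head, relation, tail in triples:
        marks.append(
            {
                "head": first_index.get(head, -1),
                "relation": first_index.get(relation, -1),
                "tail": first_index.get(tail, -1),
            }
        )
    return marks


if __name__ == "__main__":
    tokens = "the reef shelters fish and fish eat algae".split()
    triples = [("reef", "shelters", "fish"), ("fish", "eat", "algae")]
    print(mark_elements(tokens, triples))
```

The recorded positions are what the three logic diagram constraints compare when they test the order of head entity, entity relationship and tail entity.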
Further, the logic diagram is filled according to the order of the marked elements to obtain the keyword sequence tree, specifically:
the knowledge graph is a fully connected graph: it contains no isolated nodes, every node is an element of at least one triplet, all nodes are connected to the root node of the graph, and a connecting path exists between the nodes;
first logic diagram constraint: in the generated text, when the tail entity of a triplet precedes its head entity, its entity relationship follows its head entity, and elements of other triples are present between its tail entity and its head entity:
each element of the other triples located between the tail entity and the head entity is called a prepositioned guest node corresponding to the triplet, and the triplet is then called the target triplet; the shortest paths between each prepositioned guest node and the head entity of the target triplet, and between each prepositioned guest node and the tail entity of the target triplet, are calculated; each shortest path between a prepositioned guest node and the head entity and each shortest path between a prepositioned guest node and the tail entity is taken as a prepositioned guest sub-path; the shortest path passing through all prepositioned guest nodes is obtained as the prepositioned guest sub-graph; and the logic diagram is filled according to the prepositioned guest sub-paths and the prepositioned guest sub-graph. In some embodiments, this is specifically done as follows: the prepositioned guest sub-paths and the prepositioned guest sub-graph are superimposed, and the number of times each node in their union is superimposed is recorded; the arithmetic mean of the superposition counts of the nodes in the union is calculated as the average superposition number; the ratio of the superposition count of each node in the union to the average superposition number is calculated; the nodes in the union are sorted by this ratio and are then called extension nodes, forming an extension node sequence; the extension nodes in the sequence that have a semantic connection relationship with the head entity or the tail entity of the target triplet, or with other extension nodes in the sequence, are screened out as target extension nodes; a plurality of semantic connection triples are obtained through the target extension nodes, and the target extension nodes are connected to form what is called the keyword sequence tree; text is generated from the keyword sequence tree and used as the logic filling text; and the logic filling text is used to supplement the generated text between the tail entity and the head entity of the target triplet;
second logic diagram constraint: in the generated text, when the tail entity of a triplet precedes its head entity, its entity relationship also precedes its head entity, and elements of other triples are present between the head entity and its entity relationship:
each element of the other triples located between the head entity and its entity relationship is called a centrally located guest node corresponding to the triplet, and the triplet is called the target triplet; the shortest paths between each centrally located guest node and the head entity of the target triplet, and between each centrally located guest node and the tail entity of the target triplet, are calculated; each shortest path between a centrally located guest node and the head entity and each shortest path between a centrally located guest node and the tail entity is taken as a centrally located guest sub-path; the shortest path passing through all centrally located guest nodes is obtained as the centrally located guest sub-graph; and the logic diagram is filled according to the centrally located guest sub-paths and the centrally located guest sub-graph. In some embodiments, this may specifically be done as follows: the centrally located guest sub-paths and the centrally located guest sub-graph are superimposed, and the number of times each node in their union is superimposed is recorded; the arithmetic mean of the superposition counts of the nodes in the union is calculated as the average superposition number; the ratio of the superposition count of each node in the union to the average superposition number is calculated; the nodes in the union are sorted by this ratio and are then called extension nodes, forming an extension node sequence; the extension nodes in the sequence that have a semantic connection relationship with the head entity or the tail entity of the target triplet, or with other extension nodes in the sequence, are screened out as target extension nodes; a plurality of semantic connection triples are obtained through the target extension nodes, and the target extension nodes are connected to form what is called the keyword sequence tree; text is generated from the keyword sequence tree and used as the logic filling text; and the logic filling text is used to supplement the generated text between the head entity and the entity relationship of the target triplet;
third logic diagram constraint: in the generated text, when the head entity of a triplet precedes its tail entity and also precedes its entity relationship, and elements of other triples are present between the tail entity and its entity relationship:
each element of the other triples located between the tail entity of the target triplet and its entity relationship is called a post-guest node corresponding to the triplet, and the triplet is called the target triplet; the shortest paths between each post-guest node and the head entity of the target triplet, and between each post-guest node and the tail entity of the target triplet, are calculated; each shortest path between a post-guest node and the head entity and each shortest path between a post-guest node and the tail entity is taken as a post-guest sub-path; the shortest path passing through all post-guest nodes is obtained as the post-guest sub-graph; and the logic diagram is filled according to the post-guest sub-paths and the post-guest sub-graph. In some embodiments, this may specifically be done as follows: the post-guest sub-paths and the post-guest sub-graph are superimposed, and the number of times each node in their union is superimposed is recorded; the arithmetic mean of the superposition counts of the nodes in the union is calculated as the average superposition number; the ratio of the superposition count of each node in the union to the average superposition number is calculated; the nodes in the union are sorted by this ratio and are then called extension nodes, forming an extension node sequence; the extension nodes in the sequence that have a semantic connection relationship with the head entity or the tail entity of the target triplet, or with other extension nodes in the sequence, are screened out as target extension nodes; a plurality of semantic connection triples are obtained through the target extension nodes, and the target extension nodes are connected to form what is called the keyword sequence tree; text is generated from the keyword sequence tree and used as the logic filling text; and the logic filling text is used to supplement the generated text between the tail entity and the entity relationship of the target triplet.
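The superposition step shared by the three constraints can be sketched in Python as follows. The sketch takes the guest sub-paths and the guest sub-graph as already computed node lists, counts how many of them contain each node, divides each count by the arithmetic mean of the counts, and sorts by that ratio. Sorting in descending order, counting a node at most once per path, and the function name `extension_node_sequence` are assumptions; the text specifies the ratio but not the sort direction.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def extension_node_sequence(
    sub_paths: Sequence[Sequence[str]], sub_graph: Sequence[str]
) -> List[Tuple[str, float]]:
    """Rank the nodes in the union of sub-paths and sub-graph by superposition ratio."""
    counts: Counter = Counter()
    for path in list(sub_paths) + [list(sub_graph)]:
        counts.update(set(path))  # assumption: one count per path containing the node
    if not counts:
        return []

    # Average superposition number: arithmetic mean of the counts over the union.
    average = sum(counts.values()) / len(counts)

    # Assumption: higher ratio first, ties broken by node name for determinism.
    ranked = sorted(counts, key=lambda node: (-counts[node] / average, node))
    return [(node, counts[node] / average) for node in ranked]


if __name__ == "__main__":
    sub_paths = [
        ["g1", "a", "head"],
        ["g1", "b", "tail"],
        ["g2", "a", "head"],
        ["g2", "a", "b", "tail"],
    ]
    sub_graph = ["g1", "a", "g2"]
    for node, ratio in extension_node_sequence(sub_paths, sub_graph):
        print(node, round(ratio, 3))
```

In this toy input, node `a` lies on four of the five paths while the mean count is 16/6, so its ratio is 1.5 and it heads the extension node sequence.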
In some embodiments, text is generated from the keyword sequence tree using a knowledge-enhanced text generation method (which may include, but is not limited to, a Pointer-Generator, for example), and the resulting text is used as the logic filling text.
In some embodiments, within the knowledge graph structure, Dijkstra's algorithm may be used to obtain the shortest path passing through all preset nodes, and a depth-first search algorithm may also be used to obtain each shortest path between two nodes.
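A minimal sketch of the pairwise shortest-path step, using Dijkstra's algorithm from the Python standard library, is given below. It assumes an undirected knowledge graph with unit edge weights stored as a nested dictionary; the helper names `undirected`, `shortest_path` and `guest_sub_paths` are hypothetical. The sketch covers only the paths from each guest node to the head and tail entities, not the single path through all guest nodes.

```python
import heapq
from typing import Dict, Iterable, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]


def undirected(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected graph with unit edge weights (an assumption)."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, {})[b] = 1.0
        graph.setdefault(b, {})[a] = 1.0
    return graph


def shortest_path(graph: Graph, source: str, target: str) -> Optional[List[str]]:
    """Dijkstra's algorithm; returns the node list of one shortest path, or None."""
    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def guest_sub_paths(graph: Graph, guests: List[str], head: str, tail: str) -> List[List[str]]:
    """Shortest paths from every guest node to the head and tail entities."""
    paths = []
    for guest in guests:
        for endpoint in (head, tail):
            path = shortest_path(graph, guest, endpoint)
            if path is not None:
                paths.append(path)
    return paths


if __name__ == "__main__":
    g = undirected([("guest", "a"), ("a", "head"), ("guest", "tail"), ("a", "tail")])
    print(guest_sub_paths(g, ["guest"], "head", "tail"))
```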
In some embodiments, the external knowledge graph that is accessed may include, but is not limited to, DBpedia, Freebase, or a self-built knowledge base or graph database; the external knowledge graph may be designed by the user, so that the supplementary content that is generated is convenient to produce and easy to vary.
Further, according to the various embodiments described above, the method of performing logic diagram filling may in some embodiments be expressed as: sorting the extension nodes by their superposition counts to form an extension node sequence; screening the extension node sequence to obtain a plurality of semantic connection triples; connecting the target extension nodes through the semantic connection triples to form what is called the keyword sequence tree; generating text from the keyword sequence tree to obtain a logic filling text; and supplementing the generated text with the logic filling text.
Further, in some embodiments, the semantic connection relation means that two extension nodes belong to the same triplet in the knowledge graph, or that a triplet to which one extension node belongs and a triplet to which the other extension node belongs share a coincident node in the knowledge graph. In the first case, the triplet to which both extension nodes belong may be referred to as a semantic connection triplet; in the second case, the two triples that share the coincident node may also be referred to as semantic connection triples.
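The two cases of the semantic connection relation can be sketched in Python as follows. The function name `semantic_connection_triples` is hypothetical, and the sketch assumes that a node "belongs to" a triplet when it is the triplet's head or tail entity, which the text does not state explicitly.

```python
def semantic_connection_triples(triples, node_a, node_b):
    """Return the semantic connection triples linking two extension nodes.

    Case 1: both nodes belong to the same triplet -> that triplet.
    Case 2: a triplet of node_a and a triplet of node_b share a node
            -> both triples.
    An empty list means no semantic connection relation was found.
    """
    of_a = [t for t in triples if node_a in (t[0], t[2])]
    of_b = [t for t in triples if node_b in (t[0], t[2])]

    shared = [t for t in of_a if t in of_b]
    if shared:
        return shared

    result = []
    for ta in of_a:
        for tb in of_b:
            if {ta[0], ta[2]} & {tb[0], tb[2]}:   # coincident node exists
                for t in (ta, tb):
                    if t not in result:
                        result.append(t)
    return result

# Hypothetical knowledge-graph triples, for illustration only.
kg = [("Paris", "capital_of", "France"),
      ("France", "member_of", "EU"),
      ("Berlin", "capital_of", "Germany")]
print(semantic_connection_triples(kg, "Paris", "EU"))
```

Here "Paris" and "EU" are connected through the coincident node "France", so both triples are returned, while "Paris" and "Berlin" yield an empty list.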
Further, in some embodiments, the method for generating a keyword sequence based on a knowledge graph is applied to writing assistance, for example, the method and steps thereof of the present invention may be applied to writing assistance methods, writing assistance clients, and the like.
The knowledge-graph-based keyword sequence generation system runs on any computing device among a desktop computer, a notebook computer, a palmtop computer or a cloud data center, and the computing device comprises: a processor, a memory, and a computer program stored in the memory and running on the processor, wherein the processor implements the steps of the knowledge-graph-based keyword sequence generation method when executing the computer program; the system on which the method runs may include, but is not limited to, a processor, a memory, and a server cluster.
An embodiment of the invention provides a knowledge-graph-based keyword sequence generation system, as shown in Fig. 2. The system of this embodiment comprises: a processor, a memory, and a computer program stored in the memory and executable on the processor; the steps of the above embodiment of the knowledge-graph-based keyword sequence generation method are implemented when the processor executes the computer program, and the computer program runs in the following units of the system:
the generating unit is used for acquiring input information from the client and obtaining a generated text by inputting the information to the generating model;
the extraction unit is used for acquiring a plurality of triples from the generated text;
the logic unit is used for marking the elements in the triples in the generated text, and filling a logic diagram according to the marked sequence of the elements to obtain a keyword sequence tree;
and the output unit is used for outputting the keyword sequence tree to the client.
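The division into units listed above can be sketched in Python as a small pipeline. This is only an outline under stated assumptions: the generation model and the triple extractor are passed in as hypothetical callables (`generate`, `extract`), the marking step locates elements by plain substring search rather than word segmentation, and the logic diagram filling that produces the keyword sequence tree is left out.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]

def mark_elements(text: str, triples: List[Triple]):
    """Mark each triple element in the text and record its position.

    Returns (position, element, role, triple_index) tuples sorted by
    position, i.e. in the order in which the elements appear in the text.
    """
    roles = ("head", "relation", "tail")
    marks = []
    for index, triple in enumerate(triples):
        for role, element in zip(roles, triple):
            position = text.find(element)   # first occurrence only
            if position >= 0:
                marks.append((position, element, role, index))
    return sorted(marks)

def run_pipeline(prompt: str,
                 generate: Callable[[str], str],
                 extract: Callable[[str], List[Triple]]) -> Dict:
    text = generate(prompt)                 # generating unit
    triples = extract(text)                 # extraction unit
    marks = mark_elements(text, triples)    # first half of the logic unit
    return {"text": text, "triples": triples, "marks": marks}  # output unit

# Stand-ins for a real generation model and relation extractor.
result = run_pipeline(
    "radioactivity",
    generate=lambda p: "Marie Curie discovered polonium in Paris",
    extract=lambda t: [("Marie Curie", "discovered", "polonium")],
)
print(result["marks"])
```

The recorded positions are what the ordering conditions on head entity, entity relation and tail entity would be evaluated against before any logic diagram filling takes place.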
In order to better unify the linear relation and probability relation of numerical values between physical quantities of different units, dimensionless processing can be performed on different physical quantities.
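One common way to make quantities with different units comparable is min-max scaling to the range [0, 1]. The Python sketch below shows this as one possible reading of "dimensionless processing"; the text does not say which method is intended, so the choice of min-max scaling and the function name `min_max_scale` are assumptions.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1]; constant input maps to zeros."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]

print(min_max_scale([2, 4, 6]))
```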
Preferably, any variable in the present invention that is not explicitly defined may be a manually set threshold.
The knowledge-graph-based keyword sequence generation system can run on computing devices such as a desktop computer, a notebook computer, a palmtop computer, or a cloud data center. The system includes, but is not limited to, a processor and a memory. Those skilled in the art will understand that the examples given are merely examples of the knowledge-graph-based keyword sequence generation method, system and device and do not limit them; the system may include more or fewer components than the examples, may combine some components, or may use different components. For example, the knowledge-graph-based keyword sequence generation system may further include an input/output device, a network access device, a bus, and the like.
The invention also provides an electronic device, a readable storage medium and a computer program product:
an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the knowledge-graph-based keyword sequence generation method and the steps thereof.
A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the knowledge-graph-based keyword sequence generation method and the steps thereof.
A computer program product comprising a computer program which, when executed by a processor, implements the knowledge-graph-based keyword sequence generation method and the steps thereof.
Wherein the electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present invention may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the knowledge-graph-based keyword sequence generation system and connects the various parts of the whole system through various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the knowledge-graph-based keyword sequence generation method, system and device by running or executing the computer program and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the handset (such as audio data, a phonebook, etc.), and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, flash card, at least one disk storage device, flash memory device, or other volatile solid-state storage device.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, so long as the desired result of the technical solution of the present disclosure is achieved, and the present disclosure is not limited herein.
The invention provides a knowledge-graph-based keyword sequence generation method, system and device, wherein input information is acquired from a client and a generated text is obtained by inputting the information to a generation model; a plurality of triples are acquired from the generated text; and the elements of the triples are marked in the generated text, logic diagram filling is performed according to the marked order of the elements to obtain a keyword sequence tree, and the keyword sequence tree is output to the client. This not only improves the quality of the generated text, but also provides strong support for deepening the logical hierarchy of the text and enhancing its contextual relevance.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A knowledge-graph-based keyword sequence generation method, characterized by comprising the following steps: acquiring input information, and obtaining a generated text by inputting the information to a generation model; acquiring a plurality of triples from the generated text; marking the elements of the triples in the generated text, performing logic diagram filling according to the marked order of the elements to obtain a keyword sequence tree, and outputting the keyword sequence tree.
2. The knowledge-graph-based keyword sequence generation method of claim 1, wherein the input information is one or more character strings obtained from a client.
3. The knowledge-graph-based keyword sequence generation method of claim 2, wherein the generation model is a generative pre-trained language model.
4. The method for generating a keyword sequence based on a knowledge graph of claim 3, wherein the entity relationship extraction is performed on the generated text to obtain a plurality of different triples.
5. The knowledge-graph-based keyword sequence generation method of claim 4, wherein marking the elements of the plurality of triples in the generated text comprises: segmenting the generated text into words, marking the elements of each triplet in the corresponding segmented words, and recording the order and positional relation of the elements of each triplet in the generated text.
6. The knowledge-graph-based keyword sequence generation method of claim 1, wherein the logic graph filling is performed according to a first logic graph constraint, a second logic graph constraint and/or a third logic graph constraint:
wherein the first logic diagram constraint is: in the generated text, when the tail entity of a triplet is in front of the head entity and the entity relation is behind the head entity of the triplet, and elements of other triples exist between the tail entity and the head entity, each element of other triples existing between the tail entity and the head entity is called a prepositioned guest node corresponding to the triplet; a prepositioned guest sub-path and a prepositioned guest subgraph are obtained from the prepositioned guest nodes, and logic diagram filling is performed according to the prepositioned guest sub-path and the prepositioned guest subgraph;
wherein the second logic diagram constraint is: in the generated text, when the tail entity of a triplet is in front of the head entity and the entity relation is in front of the head entity, and elements of other triples exist between the head entity and the entity relation, each element of other triples existing between the head entity and the entity relation is called a centrally-located guest node corresponding to the triplet; a centrally-located guest sub-path and a centrally-located guest subgraph are obtained from the centrally-located guest nodes, and logic diagram filling is performed according to the centrally-located guest sub-path and the centrally-located guest subgraph;
wherein the third logic diagram constraint is: in the generated text, when the tail entity of a triplet is behind the head entity and the entity relation is in front of the head entity, and elements of other triples exist between the tail entity and the entity relation, each element of other triples existing between the tail entity of the target triplet and the entity relation of the target triplet is called a post-guest node corresponding to the triplet; a post-guest sub-path and a post-guest subgraph are obtained from the post-guest nodes, and logic diagram filling is performed according to the post-guest sub-path and the post-guest subgraph.
7. The knowledge-graph-based keyword sequence generation method of claim 6, wherein the logic diagram filling is performed as follows: sorting the nodes in the union by the ratio of each node's superposition count to the average superposition number, that is, sorting by the superposition counts of the extension nodes to form an extension node sequence; screening the extension node sequence to obtain a plurality of semantic connection triples; connecting the target extension nodes through the semantic connection triples to form what is called the keyword sequence tree; generating text from the keyword sequence tree to obtain a logic filling text; and supplementing the generated text with the logic filling text.
8. The knowledge-graph-based keyword sequence generation method of claim 7, wherein the semantic connection relation means that two extension nodes belong to the same triplet in the knowledge graph, or that a triplet to which one extension node belongs and a triplet to which the other extension node belongs share a coincident node in the knowledge graph; wherein the same triplet to which the two extension nodes belong is called a semantic connection triplet, and the two triples that share a coincident node are also called semantic connection triples.
9. A knowledge-graph-based keyword sequence generation system, characterized in that the system runs on any computing device among a desktop computer, a notebook computer or a cloud data center, and the computing device comprises: a processor, a memory, and a computer program stored in the memory and running on the processor, the processor implementing the steps of the knowledge-graph-based keyword sequence generation method of any one of claims 1 to 8 when executing the computer program.
10. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 8.
CN202311713478.4A 2023-12-14 2023-12-14 Keyword sequence generation method, system and equipment based on knowledge graph Active CN117407492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311713478.4A CN117407492B (en) 2023-12-14 2023-12-14 Keyword sequence generation method, system and equipment based on knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311713478.4A CN117407492B (en) 2023-12-14 2023-12-14 Keyword sequence generation method, system and equipment based on knowledge graph

Publications (2)

Publication Number Publication Date
CN117407492A true CN117407492A (en) 2024-01-16
CN117407492B CN117407492B (en) 2024-02-23

Family

ID=89500250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311713478.4A Active CN117407492B (en) 2023-12-14 2023-12-14 Keyword sequence generation method, system and equipment based on knowledge graph

Country Status (1)

Country Link
CN (1) CN117407492B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241209A (en) * 2020-01-03 2020-06-05 北京百度网讯科技有限公司 Method and apparatus for generating information
CN114490984A (en) * 2022-01-21 2022-05-13 深圳壹账通科技服务有限公司 Question-answer knowledge extraction method, device, equipment and medium based on keyword guidance
CN114491077A (en) * 2022-02-15 2022-05-13 平安科技(深圳)有限公司 Text generation method, device, equipment and medium
US20230334241A1 (en) * 2022-04-19 2023-10-19 International Business Machines Corporation Syntactic and semantic autocorrect learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
毛兴静 等: "基于关键词异构图的生成式摘要研究", 《计算机科学》, 27 September 2023 (2023-09-27), pages 1 - 15 *

Also Published As

Publication number Publication date
CN117407492B (en) 2024-02-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant