CN114595334A

CN114595334A - Language analysis method and system based on dual-graph fusion, and terminal device

Info

Publication number: CN114595334A
Application number: CN202011428641.9A
Authority: CN
Inventors: 曲道奎; 刘世昌; 陈烁; 王晓东; 王海鹏; 王晓峰
Original assignee: Shandong Siasun Industrial Software Research Institute Co Ltd
Current assignee: Shandong Siasun Industrial Software Research Institute Co Ltd
Priority date: 2020-12-07
Filing date: 2020-12-07
Publication date: 2022-06-07

Abstract

The invention relates to the technical field of robots and provides a language parsing method, a system and a terminal device based on dual-graph fusion. The language parsing method comprises the following steps: extracting knowledge information and event information from input text to obtain a first extraction result and a second extraction result, respectively; searching a pre-established knowledge graph and a pre-established event logic graph for associated knowledge information according to the first extraction result and the second extraction result; and parsing the text in combination with the retrieved knowledge-event graph information. The dual-graph-based language parsing method enables a machine to understand human language more accurately and thus to provide more accurate services, so that users can better experience the convenience brought by artificial intelligence. In addition, the fused application of the two graphs is a new mode of graph application that can be extended to many fields, such as search, knowledge reasoning and visual decision-making, thereby deepening the development and application of knowledge graphs.

Description

Language analysis method and system based on dual-graph fusion, and terminal device

Technical Field

The invention relates to the field of intelligent recognition, and in particular to a language parsing method, a language parsing system, a terminal device and a computer-readable storage medium based on dual-graph fusion.

Background

In general, the most common application of knowledge graphs to semantic understanding is the question-answering system. There, the knowledge graph serves only to represent and store knowledge, supplying data for answer generation. It takes no part in the semantic understanding process itself, so its potential is not fully exploited, and knowledge graphs built on semantic networks have not been applied to semantic understanding proper. The event logic graph, a relatively new form of event representation, has a structure similar to that of the knowledge graph and likewise represents knowledge as a graph. However, the two have always been used separately and independently, whereas in reality events and entities are closely related. Using each graph in isolation therefore greatly limits the application and development of graphs in many fields.

Therefore, a new technical solution is needed to solve the above technical problems.

Disclosure of Invention

In view of this, embodiments of the present invention provide a language parsing method, a system and a terminal device based on dual-graph fusion. They enable a machine to understand human language more accurately and to provide more accurate services, so that users can better experience the convenience brought by artificial intelligence.

The first aspect of the embodiments of the present invention provides a language parsing method based on dual-graph fusion, where the language parsing method includes:

performing two-way extraction of knowledge information and event information from input text to obtain a first extraction result and a second extraction result, respectively;

searching a pre-established graph relation library for the corresponding knowledge-event graph according to the first extraction result and the second extraction result; and

parsing the text information in combination with the related information retrieved from the knowledge-event graph.

Optionally, in another embodiment provided by the present application, performing two-way extraction of knowledge information and event information from the input text to obtain the first extraction result and the second extraction result, respectively, comprises:

extracting at least two keywords of the text information and the relationships between the keywords, and taking the relation values corresponding to the keywords as the first extraction result, wherein the relation values correspond to relation values in the knowledge-event graph; and

extracting the logical relationships of the text information as the second extraction result, wherein the logical relationships are the event logic relationships contained in the text information, and the events expressed are given an abstract logical representation.
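The first extraction result can be pictured as a lookup of relation values between pairs of extracted keywords. The sketch below is a minimal illustration under stated assumptions, not the patented implementation: the in-memory triple list `KG_TRIPLES` and the function `first_extraction_result` are hypothetical, and the second (event-logic) result is left aside.

```python
from itertools import combinations

# Toy knowledge graph as (subject, predicate, object) triples. The entities and
# predicate names are hypothetical and serve only as an illustration.
KG_TRIPLES = [
    ("Beijing", "capital_of", "China"),
    ("Apple Inc.", "produces", "iPhone"),
    ("China", "located_in", "Asia"),
]


def first_extraction_result(keywords, triples):
    """Return the graph triples that link any pair of extracted keywords.

    Only relation values that already exist in the graph are kept, so every
    value in the result corresponds to an edge of the knowledge graph.
    """
    if len(keywords) < 2:
        raise ValueError("at least two keywords are required")
    result = []
    for a, b in combinations(keywords, 2):
        for s, p, o in triples:
            # Accept the keyword pair in either direction.
            if (s, o) in ((a, b), (b, a)):
                result.append((s, p, o))
    return result


print(first_extraction_result(["China", "Beijing"], KG_TRIPLES))
# [('Beijing', 'capital_of', 'China')]
```

Restricting the result to relations already present in the graph mirrors the requirement that the relation values correspond to relation values in the knowledge-event graph.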

Optionally, in another embodiment provided by the present application, extracting at least two keywords of the text information comprises:

extracting the at least two keywords of the text information through synonym processing, content-word extraction, syntactic analysis, ambiguity resolution and model matching.
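The keyword-extraction chain can be sketched for short English queries as follows. This is a simplified illustration, not the method of the patent: the lexicons `SYNONYMS`, `STOPWORDS`, `AMBIGUOUS` and `GRAPH_ENTITIES` are hypothetical, syntactic analysis is omitted, and "model matching" is reduced to matching tokens against known graph entities.

```python
import re

# Hypothetical lexical resources; a real system would load curated lexicons.
SYNONYMS = {"cellphone": "phone", "mobile": "phone", "purchase": "buy"}
STOPWORDS = {"i", "want", "to", "a", "an", "the", "new", "please"}
AMBIGUOUS = {"apple": ["fruit", "company"]}
GRAPH_ENTITIES = {"apple", "phone", "buy"}


def extract_keywords(text):
    """Return (keywords, ambiguous_words) for a short English query."""
    tokens = re.findall(r"[a-z]+", text.lower())         # crude tokenizer
    tokens = [SYNONYMS.get(t, t) for t in tokens]        # synonym processing
    content = [t for t in tokens if t not in STOPWORDS]  # content-word extraction
    # "Model matching" reduced to matching known graph entities; dict.fromkeys
    # removes duplicates while keeping the original order.
    keywords = [t for t in dict.fromkeys(content) if t in GRAPH_ENTITIES]
    ambiguous = [t for t in keywords if t in AMBIGUOUS]  # flagged for disambiguation
    return keywords, ambiguous


print(extract_keywords("I want to purchase a new Apple cellphone"))
# (['buy', 'apple', 'phone'], ['apple'])
```

Ambiguous words are only flagged here; resolving them would draw on the graphs, as described for the semantic understanding step.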

Optionally, in another embodiment provided by the present application, the data of the knowledge-event graph is stored in the form of RDF triples, and the data comprises:

structured data, semi-structured data, and unstructured data.
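As one illustration of the RDF storage form, a structured or semi-structured record can be flattened into RDF triples serialized as N-Triples. This is a minimal sketch under assumptions: the namespace `http://example.org/kg/` and the helpers `iri`, `literal` and `record_to_ntriples` are hypothetical, IRI escaping is simplified, and unstructured text would first have to pass through the extraction step.

```python
# Hypothetical namespace; a real graph would use its own IRIs.
BASE = "http://example.org/kg/"


def iri(local_name):
    """Build an IRI term (spaces replaced; no further escaping in this sketch)."""
    return "<" + BASE + str(local_name).replace(" ", "_") + ">"


def literal(value):
    """Build an N-Triples string literal with the basic escapes."""
    text = str(value)
    # Backslash must be escaped first so later escapes are not doubled.
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, esc)
    return '"' + text + '"'


def record_to_ntriples(subject, record, object_props=()):
    """Convert one structured record (a table row or JSON object) to N-Triples.

    Properties listed in object_props point to other resources and become
    IRIs; all other values become literals.
    """
    lines = []
    for prop, value in record.items():
        obj = iri(value) if prop in object_props else literal(value)
        lines.append(f"{iri(subject)} {iri(prop)} {obj} .")
    return lines


for line in record_to_ntriples("Beijing", {"capital_of": "China", "label": "Beijing"},
                               object_props={"capital_of"}):
    print(line)
```

Keeping entity-to-entity links as IRIs and plain attributes as literals is what allows a graph database to traverse the relationships after import.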

A second aspect of the embodiments of the present invention provides a language parsing system based on dual-graph fusion, the language parsing system comprising:

an extraction module, configured to perform two-way extraction of knowledge information and event information from input text to obtain a first extraction result and a second extraction result, respectively;

a searching module, configured to search a pre-established graph relation library for the corresponding knowledge-event graph according to the first extraction result and the second extraction result; and

a parsing module, configured to parse the text information in combination with the related information retrieved from the knowledge-event graph.

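Optionally, in another embodiment provided by the present application, the extraction module is specifically configured to: extract at least two keywords of the text information and the relationships between the keywords, and take the relation values corresponding to the keywords as the first extraction result, wherein the relation values correspond to relation values in the knowledge-event graph; and extract the logical relationships of the text information as the second extraction result.

Optionally, in another embodiment provided by the present application, when extracting at least two keywords of the text information, the extraction module is specifically configured to extract the at least two keywords through synonym processing, content-word extraction, syntactic analysis, ambiguity resolution and model matching.

Optionally, in another embodiment provided by the present application, the data of the knowledge-event graph is stored in the form of RDF triples, and the data comprises: structured data, semi-structured data, and unstructured data.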

A third aspect of the embodiments of the present invention provides a terminal device, which comprises a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the method according to any embodiment of the first aspect.

A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any embodiment of the first aspect.

Compared with the prior art, the embodiments of the present invention have the following beneficial effects: the dual-graph-based language parsing method enables a machine to understand human language more accurately and thus to provide more accurate services, so that users can better experience the convenience brought by artificial intelligence. In addition, the fused application of the two graphs is a new mode of graph application that can be extended to many fields, such as search, knowledge reasoning and visual decision-making, thereby deepening the development and application of knowledge graphs.

Drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a language parsing method based on dual-graph fusion according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of a language parsing method based on dual-graph fusion according to another embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a language parsing system based on dual-graph fusion according to a second embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a terminal device according to a third embodiment of the present invention.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Example one

Fig. 1 is a schematic flowchart of a language parsing method based on dual-graph fusion according to an embodiment of the present invention. The method may include the following steps:

S101: Perform two-way extraction of knowledge information and event information from the input text to obtain a first extraction result and a second extraction result, respectively.

S102: Search a pre-established graph relation library for the corresponding knowledge-event graph according to the first extraction result and the second extraction result.

S103: Parse the text information in combination with the related information retrieved from the knowledge-event graph.
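Steps S101 to S103 can be sketched end to end on a toy fused graph. This is a minimal illustration under stated assumptions, not the patented system: the graph contents, `ENTITIES`, `EVENTS`, `ParseResult` and the `s101_extract`/`s102_search`/`s103_parse` functions are all hypothetical, extraction is naive substring matching, and the search simply returns the one-hop neighbourhood of the extracted nodes.

```python
from dataclasses import dataclass, field

# Toy fused graph: knowledge-graph facts, event-logic edges and a cross-graph
# link share one triple list. All names are hypothetical.
FUSED_GRAPH = [
    ("umbrella", "used_for", "rain"),            # entity fact
    ("rain", "causes", "carry umbrella"),        # event-logic edge
    ("carry umbrella", "involves", "umbrella"),  # event -> entity link
]
ENTITIES = {"umbrella", "rain"}
EVENTS = {"carry umbrella"}


@dataclass
class ParseResult:
    keywords: list
    events: list
    evidence: list = field(default_factory=list)


def s101_extract(text):
    """Two-way extraction by naive substring matching (entities, events)."""
    low = text.lower()
    return ([e for e in sorted(ENTITIES) if e in low],
            [e for e in sorted(EVENTS) if e in low])


def s102_search(first, second, graph):
    """Retrieve every fused-graph triple that touches an extracted node."""
    nodes = set(first) | set(second)
    return [t for t in graph if t[0] in nodes or t[2] in nodes]


def s103_parse(first, second, subgraph):
    """Attach the retrieved triples to the extraction results as evidence."""
    return ParseResult(first, second, subgraph)


def parse_text(text):
    first, second = s101_extract(text)
    return s103_parse(first, second, s102_search(first, second, FUSED_GRAPH))


print(parse_text("Will it rain tomorrow?"))
```

For the query above, the entity "rain" retrieves both the knowledge-graph fact about umbrellas and the event-logic edge "rain causes carry umbrella", which is the kind of combined evidence the fused graph is meant to supply to the parsing step.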

Optionally, extracting knowledge information and event information from the input text to obtain the first extraction result and the second extraction result, respectively, comprises: extracting at least two keywords of the text information and the relationships between the keywords, and taking the relation values corresponding to the keywords as the first extraction result, wherein the relation values correspond to relation values in the knowledge-event graph; and extracting the logical relationships of the text information as the second extraction result.

The language parsing method provided by the present application is described in detail below with reference to Fig. 2. The knowledge extraction step is divided into two parts: one extracts the entities (i.e. keywords), entity relationships (i.e. relationships between keywords), attributes and attribute values of the knowledge graph; the other extracts the events and event relationships of the event logic graph. Knowledge units, namely the entities, relationships and attributes of the knowledge graph and the events and event relationships of the event logic graph, are extracted from data of different structures, and a series of high-quality factual expressions is formed from them, laying the foundation for constructing the upper-layer graph. Two modes are combined to extract the two types of graph knowledge. In the first, joint mode, content words are extracted by lexical analysis according to constructed templates, rules and an event knowledge base (comprising an event connective lexicon, an effect word lexicon and a causal pattern base), while supervised learning simultaneously extracts the knowledge graph, the event logic graph and the relationships between events and entities. In the second, extract-then-link mode, the knowledge for the knowledge graph and for the event logic graph is first extracted from the data separately using basic methods, and the relationships between the two types of extracted knowledge are then established by combining rules with supervised learning according to a constructed relation knowledge base.
The key technical point lies in how the two types of graph knowledge are extracted: the joint mode and the extract-then-link mode are used in combination, and which mode to apply in a particular case can be decided according to the form and content of the data.
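The rule-based part of event-relation extraction, and the choice between the two modes, might look roughly like this. It is a hedged sketch under assumptions: the English connective patterns in `CAUSAL_PATTERNS` stand in for the causal pattern base, and `choose_mode` with its "structured record versus free text" heuristic is hypothetical; the supervised-learning components are not shown.

```python
import re

# Hypothetical causal pattern base (English connectives, for illustration only).
CAUSAL_PATTERNS = [
    re.compile(r"because (?P<cause>[^,]+), (?P<effect>[^.]+)", re.I),
    re.compile(r"(?P<cause>[^.]+?),? so (?P<effect>[^.]+)", re.I),
    re.compile(r"(?P<effect>[^.]+?) because (?P<cause>[^.]+)", re.I),
]


def extract_causal_pairs(sentence):
    """Rule-based event-relation extraction: return [(cause, 'causes', effect)]."""
    for pattern in CAUSAL_PATTERNS:
        m = pattern.search(sentence)
        if m:
            return [(m.group("cause").strip(), "causes", m.group("effect").strip())]
    return []


def choose_mode(record):
    """Pick an extraction mode from the form of the data (hypothetical heuristic)."""
    if isinstance(record, dict):
        # Fields already separate entities from events: extract each, then link.
        return "extract-then-link"
    # Free text: extract entities, events and their links jointly.
    return "joint"


print(extract_causal_pairs("Because it rained, the match was cancelled."))
# [('it rained', 'causes', 'the match was cancelled')]
```

The patterns are deliberately naive (they ignore nested clauses and negation); they only show how a connective lexicon turns a sentence into an abstract cause-effect pair for the event logic graph.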

Optionally, regarding the database of the pre-established knowledge-event graph: the schema of a graph is equivalent to its data model and directly determines the scope and manner of the graph's application, so schema design is the key to graph construction. In addition, because the data comes from diverse sources, the extracted knowledge needs to be fused during construction. Fusion takes place at two levels, the data schema layer and the data layer. The schema layer is handled in the schema design; data-layer fusion generally covers merging entities and merging entity attributes and relationships, with synonyms, aliases and similar cases as the main concerns. The key technical points are twofold: in schema design, new relationships and attributes are added to connect the knowledge graph with the event logic graph; and in fusion, the event background is added to the fusion process as an element.
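Both key points can be illustrated with a small sketch: a schema extended with a cross-graph relation, and data-layer fusion that merges aliases while keeping the event background of each fact. The relation name `involves`, the alias table and the record layout are hypothetical choices made for this example only.

```python
from collections import defaultdict

# Hypothetical schema extension: "involves" is a new cross-graph relation that
# links an event (event logic graph) to an entity (knowledge graph).
SCHEMA = {
    "capital_of": ("Entity", "Entity"),
    "causes": ("Event", "Event"),
    "involves": ("Event", "Entity"),
}

# Hypothetical alias table: surface form -> canonical entity name.
ALIASES = {"NYC": "New York City", "Big Apple": "New York City"}


def valid_triple(subject_type, relation, object_type):
    """Check a triple's node types against the extended schema."""
    return SCHEMA.get(relation) == (subject_type, object_type)


def fuse_entities(records):
    """Data-layer fusion: merge records that name the same entity.

    records: iterable of (name, attributes, event_context) tuples. Attribute
    values are unioned, and each (attribute, value) pair remembers the event
    contexts it came from, so the event background survives fusion.
    """
    fused = defaultdict(lambda: defaultdict(set))
    for name, attrs, context in records:
        canonical = ALIASES.get(name, name)
        for key, value in attrs.items():
            fused[canonical][(key, value)].add(context)
    return {entity: {pair: sorted(ctxs) for pair, ctxs in attrs.items()}
            for entity, attrs in fused.items()}


print(fuse_entities([("NYC", {"country": "USA"}, "travel"),
                     ("Big Apple", {"country": "USA"}, "news")]))
# {'New York City': {('country', 'USA'): ['news', 'travel']}}
```

Recording the event context per attribute value, rather than discarding it after merging, is one possible way to "add the event background as an element" of fusion; the source does not specify the exact mechanism.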

The data sources of the knowledge-event graph include structured, semi-structured and unstructured data. Because the knowledge-event graph organizes data as a network, an ordinary relational database may not meet the practical storage requirements, so the data is stored as Resource Description Framework (RDF) triples. RDF provides the knowledge graph with a standard data model for describing entities, attributes and relationships, and it can also express events. The extracted knowledge is converted into RDF and stored in graph databases. The key technical point is that, given the characteristics and requirements of different graph databases, two databases are used in combination: the main knowledge data is stored in a Neo4j graph database, while smaller datasets that differ in requirements or knowledge domain are stored in Apache Jena Fuseki as auxiliary stores alongside the large database. The data is thus stored in a distributed manner, and knowledge can be located more conveniently and quickly. This addresses the limitation that Neo4j, in the edition used, can host only one database, so domain knowledge cannot be separated into multiple databases, whereas Fuseki can host multiple databases, which makes it easier to store and call domain-specific knowledge independently. Moreover, because data volume is used as the criterion for choosing the database at storage time, access speed is not affected even though two databases are used.
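The storage rule can be summarized as a routing function. This is a sketch under assumptions: the `Dataset` fields, the function `choose_store` and the cutoff `SMALL_DATASET_TRIPLES` are hypothetical, since the source states only that the main knowledge goes to Neo4j and smaller, domain-specific data goes to Fuseki, with data volume as the selection criterion.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the source gives no concrete threshold.
SMALL_DATASET_TRIPLES = 100_000


@dataclass
class Dataset:
    domain: str
    n_triples: int
    is_core: bool  # part of the main knowledge data?


def choose_store(ds: Dataset) -> str:
    """Return the backend (and, for Fuseki, the per-domain dataset) for ds."""
    if ds.is_core:
        return "neo4j"                    # single main graph database
    if ds.n_triples <= SMALL_DATASET_TRIPLES:
        return f"fuseki:{ds.domain}"      # one Fuseki dataset per domain
    return "neo4j"                        # large data stays in the main store


print(choose_store(Dataset("medical", 20_000, is_core=False)))  # fuseki:medical
```

Keying Fuseki datasets by domain reflects the stated motivation: each domain's knowledge can then be stored and queried independently.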

Semantic understanding: semantic understanding draws on many techniques, typically synonym processing, content-word extraction, syntactic analysis, ambiguity resolution, model matching and answer generation, and deep learning has also begun to be applied to it. Whatever the technical method, the key to semantic understanding is grasping the meaning that the words express, so that the machine knows which operation to perform. Humans can understand the meaning of language largely because they possess a reserve of knowledge, stored in a language-based organization unique to humans. The knowledge graph and the event logic graph build a semantic network that expresses knowledge in this same human form of organization, and human knowledge is expressed mainly with events and entities as its subjects; graph knowledge can therefore be applied directly to the semantic understanding process.
The key technical point is that processes such as content-word extraction and syntactic analysis are grounded in the knowledge graph and the event logic graph: the meaning of an extracted content word can be judged from the corresponding entities and attributes in the graph. For example, the extracted word "apple" may refer either to a fruit or to a mobile phone; the event logic graph additionally supplies event background information for the entity, and the actual meaning is determined by combining this evidence with rules. Adding these knowledge bases to the understanding process makes semantic understanding more accurate, and using the fused graphs allows answers to be generated from multiple dimensions according to the understood intention, so that the answers are more natural and closer to the way humans respond.
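The "apple" example can be sketched as overlap scoring between the query context and each candidate sense. The sense inventory `SENSES`, the function `disambiguate` and the `event_weight` parameter are hypothetical; the sketch only shows how knowledge-graph attributes and event-logic context could be combined, not the rules used in the patent.

```python
# Hypothetical sense inventory: candidate entities with knowledge-graph
# attributes and the events they take part in (from the event logic graph).
SENSES = {
    "apple (fruit)": {"attributes": {"edible", "sweet", "tree"},
                      "events": {"eat", "peel", "harvest"}},
    "Apple (phone)": {"attributes": {"battery", "screen", "ios"},
                      "events": {"charge", "call", "unlock"}},
}


def disambiguate(context_words, senses=SENSES, event_weight=2):
    """Return the sense whose attributes and events best match the context.

    Event matches are weighted more heavily (a hypothetical choice) to model
    the event background supplied by the event logic graph. Returns None
    when nothing in the context matches any sense.
    """
    ctx = set(context_words)
    scores = {
        name: len(ctx & info["attributes"]) + event_weight * len(ctx & info["events"])
        for name, info in senses.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate(["charge", "screen"]))  # Apple (phone)
print(disambiguate(["eat", "sweet"]))      # apple (fruit)
```

Returning None when no evidence matches leaves the decision to other rules instead of guessing a default sense.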

The language parsing method provided by the present application uses a natural language understanding approach that fuses the event logic graph with the knowledge graph. It not only realizes the fused construction and application of the two graphs, making effective use of the relationships between them, but also adds a knowledge basis to the semantic understanding process, helping to understand semantic intention more effectively and accurately. In addition, required knowledge is searched for in the fused graph during answer generation, which widens coverage and better matches the way humans answer. Moreover, knowledge graphs and natural language understanding are two important research fields of artificial intelligence, and applying them together more effectively promotes the development of both.

Example two

Fig. 3 is a schematic structural diagram of a language parsing system based on dual-graph fusion according to the second embodiment of the present invention. For convenience of description, only the parts related to this embodiment are shown.

The language parsing system may be a software unit, a hardware unit or a combined software-hardware unit built into a robot, or it may be integrated into a computer or other terminal as an independent plug-in.

The dual-graph-based language parsing system comprises:

an extraction module 31, configured to perform two-way extraction of knowledge information and event information from input text to obtain a first extraction result and a second extraction result, respectively;

a searching module 32, configured to search a pre-established graph relation library for the corresponding knowledge-event graph according to the first extraction result and the second extraction result; and

a parsing module 33, configured to parse the text information in combination with the retrieved knowledge-event graph.

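Optionally, in another embodiment provided by the present application, when extracting at least two keywords of the text information, the extraction module is specifically configured to extract the at least two keywords through synonym processing, content-word extraction, syntactic analysis, ambiguity resolution and model matching.

Optionally, in another embodiment provided by the present application, the data of the knowledge-event graph is stored in the form of RDF triples, and the data comprises: structured data, semi-structured data, and unstructured data.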

For the working process of the dual-graph-based language parsing system, refer to the implementation process of the dual-graph-based language parsing method described above; details are not repeated here.

Example three

Fig. 4 is a schematic structural diagram of a terminal device according to the third embodiment of the present invention. As shown in Fig. 4, the terminal device 4 of this embodiment comprises a processor 40, a memory 41, and a computer program 42 stored in the memory 41 and executable on the processor 40, for example a program implementing the dual-graph-based language parsing method. When executing the computer program 42, the processor 40 implements the steps of the method embodiment in Example one, such as steps S101 to S103 shown in Fig. 1; alternatively, when executing the computer program 42, the processor 40 implements the functions of the modules/units in the above system embodiment, such as the functions of modules 31 to 33 shown in Fig. 3.

Illustratively, the computer program 42 may be divided into one or more modules/units, which are stored in the memory 41 and executed by the processor 40 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and these segments describe the execution process of the computer program 42 in the terminal device 4. For example, the computer program 42 may be divided into an extraction module, a searching module and a parsing module, whose specific functions are as described for modules 31 to 33 in Example two.

The terminal device 4 may be a desktop computer, a laptop, a palmtop computer, a cloud server or another computing device. The terminal device may include, but is not limited to, a processor 40 and a memory 41. Those skilled in the art will appreciate that Fig. 4 is merely an example of the terminal device 4 and does not limit it; the terminal device 4 may include more or fewer components than shown, combine certain components, or use different components. For example, the terminal device may also include input/output devices, network access devices, buses, etc.

The processor 40 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. The memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used for storing the computer program and other programs and data required by the terminal device. The memory 41 may also be used to temporarily store data that has been output or is to be output.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art would appreciate that the modules, elements, and/or method steps of the various embodiments described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A language parsing method based on dual-graph fusion, characterized by comprising the following steps:

2. The method of claim 1, wherein performing extraction of knowledge information and event information on the input text to obtain a first extraction result and a second extraction result, respectively, comprises:

3. The method of claim 2, wherein extracting at least two keywords of the text information comprises:

4. The language parsing method according to claim 1, wherein the data of the knowledge-event graph is stored in the form of RDF triples, and the data comprises:

structured data, semi-structured data, and unstructured data.

5. A language parsing system based on dual-graph fusion, the language parsing system comprising:

6. The language parsing system of claim 5, wherein the extraction module is specifically configured to:

7. The language parsing system of claim 6, wherein, when extracting at least two keywords of the text information, the extraction module is specifically configured to:

8. The language parsing system according to claim 5, wherein the data of the knowledge-event graph is stored in the form of RDF triples, and the data comprises:

structured data, semi-structured data, and unstructured data.

9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 4 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.