WO2022078145A1 - Cross-DIKW modal text ambiguity processing method oriented to essential computation and reasoning - Google Patents

Cross-DIKW modal text ambiguity processing method oriented to essential computation and reasoning

Info

Publication number
WO2022078145A1
Authority
WO
WIPO (PCT)
Prior art keywords
resources
text
target
resource
information
Prior art date
Application number
PCT/CN2021/118178
Other languages
English (en)
French (fr)
Inventor
段玉聪
胡时京
Original Assignee
海南大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 海南大学
Priority to CA3136527A
Publication of WO2022078145A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • The present application relates to the technical field of software engineering, and in particular to a method, a system, an electronic device, and a storage medium for processing text ambiguity across DIKW modalities, oriented to essential computation and reasoning.
  • Ambiguity means that text content can be understood with different purposes; that is, information resources with different purposes can be derived from the typed resources in the content. Ambiguity has two causes. The first is missing content: some data resources or information resources are absent, which widens the range of possible understandings, so that understandings with different purposes arise during derivation. The second is redundant content: redundant data resources or information resources, when combined with different typed resources during derivation, lead to understandings with different purposes. In related technologies, text ambiguity is mainly handled by machine learning models, but the recognition accuracy of such models depends heavily on the richness of the training samples, so they cannot handle text ambiguity effectively.
  • The purpose of this application is to provide a cross-DIKW modal text ambiguity processing method, a system, an electronic device and a storage medium oriented to essential computation and reasoning, which can accurately identify and eliminate the ambiguity existing in a text.
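  • The present application provides a cross-DIKW modal text ambiguity processing method oriented to essential computation and reasoning, and the method includes:
  • acquiring a target text, and determining target data resources and target information resources in the target text;
  • querying related resources of the target text according to the target data resources and/or the target information resources, and determining the textual meanings of the target text according to the related resources;
  • if the number of textual meanings of the target text is greater than 1, acquiring supplementary resources of the target text, and generating a conditional restriction text of the target text according to the supplementary resources;
  • taking the textual meaning that conforms to the conditional restriction text as the actual textual meaning of the target text, and modifying the target text according to the actual textual meaning.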
  • Determining the target data resources and target information resources in the target text includes:
  • wherein the resource types include data resources, information resources and knowledge resources; the data resources are resources in the data map, the information resources are resources in the information map, and the knowledge resources are resources in the knowledge map;
  • Performing cross-modal transformation on the target text to obtain the target data resources and target information resources includes:
  • Querying the related resources of the target text according to the target data resources and/or the target information resources includes:
  • querying the related resources of the target text from the associated text according to the target data resources and/or the target information resources.
  • Acquiring supplementary resources of the target text includes:
  • using an information resource whose degree of association with the target information resource in the information map is greater than the preset value as a supplementary resource of the target text.
  • The method further includes:
  • The target text is a text with redundant data resources or redundant information resources.
  • The textual meaning of the target text is derived by combining the related resources with each of the target data resources and each of the target information resources, respectively.
  • the present application also provides a cross-DIKW modal text ambiguity processing system for essential computation and reasoning, the system comprising:
  • a text analysis module configured to acquire a target text and determine target data resources and target information resources in the target text;
  • a meaning determination module configured to query the relevant resources of the target text according to the target data resources and/or the target information resources, and determine the textual meaning of the target text according to the relevant resources;
  • a resource supplement module configured to acquire supplementary resources of the target text if the number of textual meanings of the target text is greater than 1, and generate a conditional restriction text of the target text according to the supplementary resources;
  • a text modification module configured to take the textual meaning of the text that meets the condition restriction as the actual textual meaning of the target text, and modify the target text according to the actual textual meaning.
  • The present application also provides a storage medium on which a computer program is stored; when the computer program is executed, the steps of the above-mentioned cross-DIKW modal text ambiguity processing method oriented to essential computation and reasoning are implemented.
  • The present application also provides an electronic device, including a memory and a processor, where a computer program is stored in the memory; when the processor invokes the computer program in the memory, the steps of the above-mentioned cross-DIKW modal text ambiguity processing method oriented to essential computation and reasoning are implemented.
  • The present application provides a cross-DIKW modal text ambiguity processing method oriented to essential computation and reasoning, including: acquiring a target text, and determining target data resources and target information resources in the target text; querying related resources of the target text according to the target data resources and/or the target information resources, and determining the textual meanings of the target text according to the related resources; if the number of textual meanings of the target text is greater than 1, acquiring supplementary resources of the target text and generating a conditional restriction text of the target text according to the supplementary resources; taking the textual meaning that conforms to the conditional restriction text as the actual textual meaning of the target text, and modifying the target text according to the actual textual meaning.
  • After acquiring the target text, the present application determines the target data resources and target information resources contained in the target text, then queries the related resources of the target text according to the target data resources and the target information resources, and determines the textual meanings of the target text according to the related resources.
  • The present application generates the conditional restriction text of the target text according to the supplementary resources of the target text, takes the textual meaning that conforms to the conditional restriction text as the actual textual meaning of the target text, and then modifies the target text according to the actual textual meaning, thereby eliminating the ambiguity in the target text.
  • It can be seen that the present application can accurately identify and eliminate the ambiguity existing in a text.
  • The present application also provides a cross-DIKW modal text ambiguity processing system, a storage medium, and an electronic device oriented to essential computation and reasoning, which have the above beneficial effects and are not described again here.
  • FIG. 1 is a flowchart of a method for processing text ambiguity across DIKW modality oriented to essential computing and reasoning provided by an embodiment of the present application;
  • FIG. 2 is a schematic structural diagram of a cross-DIKW modal text ambiguity processing system for essential computing and reasoning provided by an embodiment of the present application.
  • Resource elements can include three forms of data resources, information resources and knowledge resources.
  • Graph refers to the result of integrating resource elements.
  • The graphs of resource elements include the data graph, the information graph and the knowledge graph.
  • DIKW refers to data, information, knowledge and wisdom.
  • The DIKW model is a model that can be used to help understand the relationship between data, information, knowledge and wisdom.
  • A data graph is a collection of data resources held in various data structures, such as arrays, linked lists, stacks, queues, trees, and graphs.
  • Data resources are basic individual items of numerical or other types of information obtained through observation.
  • The information graph conveys, through the context and combination of data resources, information that is suitable for analysis and interpretation after concept mapping and the combination of related relationships.
  • The knowledge graph is essentially a semantic network, including a collection of statistical rules summarized from information resources.
  • The knowledge graph contains rich semantic relations.
  • The edge density and node density of the knowledge graph can be improved through information reasoning and entity linking on the knowledge graph.
  • The unstructured nature of the knowledge graph makes it possible for it to link to itself seamlessly.
  • the embodiments provided in this application can be applied to the remote sensing field, that is, text ambiguity processing can be realized based on the data map, information map and knowledge map related to the remote sensing field.
  • FIG. 1 is a flowchart of a method for processing text ambiguity across DIKW modalities oriented to essential calculation and reasoning provided by an embodiment of the present application.
  • S101 Acquire a target text, and determine a target data resource and a target information resource in the target text;
  • This embodiment may perform a data resource extraction operation on the target text to obtain the target data resources, and may also perform an information resource extraction operation on the target text to obtain the target information resources.
  • A template that pairs data resources with their corresponding information resources can be used to determine the target data resources and target information resources that conform to the template.
  • Text analysis can be performed on the target text, and the target data resources and the target information resources can be determined according to the text analysis result.
  • A resource set including sample data resources and sample information resources can also be text-matched against the target text to obtain the target data resources and the target information resources.
  • For example, for the target text "Summer night, user A stays in the study.", the target data resources can include (Location: Study), (time: night) and (season: summer), and the target information resources can include $I_0$ (summer night in the study).
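  • The text-matching variant of S101 can be illustrated with a minimal sketch. It assumes a small hand-written lexicon; the names `LEXICON` and `extract_data_resources` are hypothetical and are not part of the application, which does not prescribe a concrete extraction algorithm.

```python
import re

# Hypothetical lexicon mapping surface words to (type, value) data resources.
LEXICON = {
    "study": ("Location", "Study"),
    "night": ("time", "night"),
    "summer": ("season", "summer"),
}

def extract_data_resources(text):
    """Return the typed data resources whose keywords occur in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    found = []
    for word in words:
        resource = LEXICON.get(word)
        if resource is not None and resource not in found:
            found.append(resource)
    return found

resources = extract_data_resources("Summer night, user A stays in the study.")
```

  • Under these assumptions the call recovers the three data resources of the example; a real system would replace the lexicon with the template or text-analysis methods described above.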
  • S102 Query the relevant resources of the target text according to the target data resources and/or the target information resources, and determine the textual meaning of the target text according to the relevant resources;
  • The associated text of the target text can be obtained, and the textual meanings of the target text can be determined based on the data resources and information resources in the associated text. If the target text has several textual meanings that differ from one another, the target text is ambiguous.
  • The context of the target text may be used as the associated text, and other texts that are related to the target text may also be used as the associated text.
  • This embodiment may query the related resources of the target text from the associated text according to the target data resources and/or the target information resources.
  • This embodiment may combine the related resources with each of the target data resources and each of the target information resources to deduce the textual meanings of the target text.
  • For example, the target text may be "Night, Huaweing is in the study", and the related resources identified in the associated text may be "Spring Festival" and "Exam".
  • The related resources and the target text can be displayed on the human-computer interaction interface, so that the user can determine the textual meaning of the target text.
  • S103 If the number of textual meanings of the target text is greater than 1, acquire supplementary resources of the target text, and generate a conditional restriction text of the target text according to the supplementary resources;
  • The ambiguous target text mentioned in this embodiment may be a text with missing data resources or information resources; the missing content weakens the constraints on how the content can be understood.
  • In this case, combining different knowledge resources can deduce information resources with different purposes. Adding restrictions on content understanding narrows the range of possible understandings, so that only one of the derived information resources with different purposes is retained, thereby eliminating the ambiguity.
  • The content is modeled based on the data graph, the information graph and the knowledge graph.
  • Missing content can be divided into two categories: missing information resources and missing data resources.
  • The ambiguous target text mentioned in this embodiment may also be a text with redundant data resources or redundant information resources, that is, the text contains data resources or information resources that give several understandings with different purposes on the same issue. The content is modeled based on the data graph, the information graph and the knowledge graph. Redundant content can be divided into two categories: redundant information resources and redundant data resources.
  • The supplementary resources of the target text can be queried from the data map according to the target data resources; in this embodiment they can also be queried from the information map according to the target information resources.
  • The data map includes a large number of data resources, and there is a certain degree of association between the data resources in the data map.
  • The supplementary resources of the target text can be queried according to the degree of association of the data resources in the data map.
  • A data resource in the data map whose degree of association with the target data resource is greater than a preset value can be used as a supplementary resource of the target text.
  • For example, if the target data resource is "winter", the data resources in the data map whose degree of association with the target data resource is greater than the preset value may include "warmth index", "snow probability" and the like.
  • The information map includes a large number of information resources, and there is a certain degree of association between the information resources in the information map.
  • The supplementary resources of the target text can be queried according to the degree of association of the information resources in the information map.
  • An information resource in the information map whose degree of association with the target information resource is greater than the preset value is used as a supplementary resource of the target text.
  • For example, if the target information resource is "Einstein is in class", the information resources in the information map whose degree of association with the target information resource is greater than the preset value may include "Einstein is a physicist", "Einstein is already married" and the like.
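  • The threshold query described above can be sketched as follows. The association degrees, the extra entry "beach umbrella", and the names `ASSOCIATION` and `supplementary_resources` are assumptions made for illustration; the application does not specify how the degree of association is computed or stored.

```python
# Hypothetical association degrees between a target resource and other
# resources in the same map (data map or information map).
ASSOCIATION = {
    "winter": {"warmth index": 0.9, "snow probability": 0.8, "beach umbrella": 0.1},
}

def supplementary_resources(target, association, preset_value):
    """Return the resources whose association degree with target exceeds preset_value."""
    degrees = association.get(target, {})
    return sorted(name for name, degree in degrees.items() if degree > preset_value)

result = supplementary_resources("winter", ASSOCIATION, 0.5)
```

  • The same function applies to the information map when the keys are information resources instead of data resources.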
  • S104 Use the textual meaning of the text that meets the condition restriction as the actual textual meaning of the target text, and modify the target text according to the actual textual meaning.
  • This embodiment may take the textual meaning that conforms to the conditional restriction text as the actual textual meaning of the target text.
  • For example, the target text may have the following two textual meanings: (1) "Einstein's occupation is a student"; (2) "Einstein's occupation is a teacher". If the conditional restriction text corresponding to the supplementary resources is "Einstein was a physicist, and Einstein was married at that time", the conditional restriction text can be used to determine the actual textual meaning of the target text.
  • The target text can be modified according to the actual textual meaning, so as to eliminate the ambiguity in the target text.
  • The conditional restriction text and each textual meaning of the target text can be displayed on the human-computer interaction interface, so that the user can determine which textual meaning conforms to the conditional restriction text.
  • In this embodiment, the target data resources and target information resources contained in the target text are determined, the related resources of the target text are then queried according to the target data resources and the target information resources, and the textual meanings of the target text are determined according to the related resources.
  • The conditional restriction text of the target text is generated according to the supplementary resources of the target text, the textual meaning that conforms to the conditional restriction text is taken as the actual textual meaning of the target text, and the target text is modified according to the actual textual meaning, thereby eliminating the ambiguity in the target text. It can be seen that this embodiment can accurately identify and eliminate the ambiguity existing in a text.
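  • The selection in S104 can be sketched as a filter over candidate meanings. This is only an illustration: the function `resolve_meaning` and the table `ruled_out_by` are hypothetical, and the rule that "Einstein is a physicist" excludes the "student" reading is an assumption introduced here, not something stated in the application.

```python
def resolve_meaning(meanings, conditional_facts, ruled_out_by):
    """Return the meanings that no fact in the conditional restriction text rules out."""
    if len(meanings) > 1:
        return [
            m for m in meanings
            if not any(fact in ruled_out_by.get(m, set()) for fact in conditional_facts)
        ]
    # With at most one meaning there is no ambiguity to resolve.
    return list(meanings)

meanings = ["Einstein's occupation is a student", "Einstein's occupation is a teacher"]
conditional_facts = ["Einstein is a physicist", "Einstein is already married"]
# Assumption for illustration only: being a physicist rules out the "student" reading.
ruled_out_by = {"Einstein's occupation is a student": {"Einstein is a physicist"}}

actual = resolve_meaning(meanings, conditional_facts, ruled_out_by)
```

  • If more than one meaning survives the filter, the embodiment's fallback of showing the candidates on the human-computer interaction interface would apply.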
  • The target data resources and target information resources in the target text can be determined as follows: determine the resource type of the target text; perform cross-modal transformation on the target text to obtain the target data resources and target information resources; wherein the resource types include data resources, information resources and knowledge resources, data resources are resources in the data map, information resources are resources in the information map, and knowledge resources are resources in the knowledge map.
  • The above-mentioned cross-modal transformation is a transformation operation between any two of data resources, information resources, knowledge resources, and mixed data-information resources.
  • Mixed data-information resources are resources in which data resources and information resources are mixed.
  • Specifically, the following operations may be performed: determine whether the target text is a data resource; if so, set the target text as the target data resource; if not, perform cross-modal transformation on the target text to obtain the target data resource; determine whether the target text is an information resource; if so, set the target text as the target information resource; if not, perform cross-modal transformation on the target text to obtain the target information resource.
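  • The branching described above can be sketched as below. The function names and the placeholder `toy_transform` are hypothetical; the actual cross-modal transformation is not specified at this level of detail, so the placeholder only marks where it would run.

```python
def to_target_resource(text, resource_type, wanted_type, transform):
    """Return text unchanged if it already has the wanted type, else transform it."""
    if resource_type == wanted_type:
        return text
    return transform(text, resource_type, wanted_type)

def toy_transform(text, source_type, target_type):
    # Placeholder only: a real transformation would re-express the content
    # of the text as a resource of the target type.
    return f"{target_type}({text})"

same = to_target_resource("season: summer", "data", "data", toy_transform)
converted = to_target_resource("summer night in the study", "information", "data", toy_transform)
```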
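  • Scenario 1: Handling of ambiguity caused by missing information resources.
  • $K_1$: $R_{\mathrm{IN}}(T_{\mathrm{ACTIVITY}}(\mathrm{Study}),\ T_{\mathrm{PLACE}}(\mathrm{Studyroom}))$
  • $K_2$: $R_{\mathrm{AT}}(T_{\mathrm{ACTIVITY}}(\mathrm{Sleep}),\ T_{\mathrm{TIME}}(\mathrm{Night}))$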
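  • One possible way to hold such nested knowledge resources in code is sketched below. The classes `T` and `R` and the helper `render` are hypothetical names chosen to mirror the notation; the application does not define a storage format.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class T:
    """A typed element such as T_ACTIVITY(Study); value may be empty."""
    type: str
    value: str = ""

@dataclass(frozen=True)
class R:
    """A relation such as R_IN(arg1, arg2); arguments may themselves be relations."""
    name: str
    args: Tuple[Union["T", "R"], ...]

def render(term):
    """Render a term back into the R_NAME(T_TYPE(value), ...) notation."""
    if isinstance(term, T):
        return f"T_{term.type}({term.value})" if term.value else f"T_{term.type}"
    return f"R_{term.name}(" + ", ".join(render(a) for a in term.args) + ")"

K1 = R("IN", (T("ACTIVITY", "Study"), T("PLACE", "Studyroom")))
K2 = R("AT", (T("ACTIVITY", "Sleep"), T("TIME", "Night")))
```

  • Because both classes are frozen dataclasses, terms are hashable and can be stored in sets or used as dictionary keys when building a graph.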
  • This embodiment can narrow the scope of content understanding by adding related data resources or information resources, thereby eliminating ambiguity.
  • two scenarios of increasing data resources and increasing information resources are discussed respectively.
  • D 1 (T FACILITY (INS(AIR_CONDITION Bedroom ))
  • $K_3$: $R_{\mathrm{IS}}(T_{\mathrm{SEASON}}(\mathrm{Summer}),\ T_{\mathrm{TEMPERATURE}}(\mathrm{High}))$
  • $K_4$: $R_{\mathrm{LIKE}}(T_{\mathrm{PERSON}},\ R_{\mathrm{IN}}(T_{\mathrm{ACTIVITY}}(\mathrm{Sleep}),\ R_{\mathrm{IS}}(T_{\mathrm{PLACE}},\ T_{\mathrm{TEMPERATURE}}(\mathrm{Low}))))$
  • The information resources $I_{new1}$ and $I_{new2}$ with different purposes are derived from the known data resource $D_0$ and the information resource $I_0$ in combination with the relevant knowledge resources (i.e., the related resources mentioned above).
  • The relationship between $I_{new3}$ and each of $I_{new1}$ and $I_{new2}$ is then judged; the information resources supported by $I_{new3}$ are retained, and the other information resources are deleted.
  • $K_5$: $R_{\mathrm{AT}}(R_{\mathrm{DO}}(T_{\mathrm{PERSON}}(R_{\mathrm{LIKE}}(T_{\mathrm{PERSON}},\ T_{\mathrm{ACTIVITY}}(\mathrm{Study}))),\ T_{\mathrm{ACTIVITY}}(\mathrm{Study})),\ T_{\mathrm{TIME}}(\mathrm{Night}))$
  • The information resources $I_{new1}$ and $I_{new2}$ with different purposes are derived from the known data resource $D_0$ and the information resource $I_0$ in combination with the relevant knowledge resources (i.e., the related resources mentioned above).
  • The relationship between $I_{new3}$ and each of $I_{new1}$ and $I_{new2}$ is then judged; the information resources opposed by $I_{new3}$ are deleted, and the other information resources are retained.
  • Scenario 2: Handling of ambiguity caused by missing data resources.
  • The text content "The seniority of user A is higher than that of user B." can correspond to the following data and information resources:
  • $K_1$: $R_{\mathrm{PROBABLY\_GREATER\_THAN}}(T_{\mathrm{AGE}}(T_{\mathrm{PERSON}}(T_{\mathrm{SENIORITY}}(\mathrm{High}))),\ T_{\mathrm{AGE}}(T_{\mathrm{PERSON}}(T_{\mathrm{SENIORITY}}(\mathrm{Low}))))$
  • $D_1$: User A is mature in mind.
  • $D_2$: User B is naive.
  • Combining $D_1$ and $D_2$, the information resource "user A is more mature than user B" can be deduced.
  • When judging the relative age of user A and user B, the above data resources add a restriction on the maturity relationship between user A and user B, which further narrows the range of possible understandings of the content and supports the previously derived information resource "user A is older than user B".
  • The information resources $I_{new1}$ and $I_{new2}$ with different purposes are deduced from the known data resource $D_0$ and the information resource $I_0$ in combination with the relevant knowledge resources.
  • The relationship between $I_{new3}$ and each of $I_{new1}$ and $I_{new2}$ is then judged; the information resources opposed by $I_{new3}$ are deleted, and the other information resources are retained.
  • $I_1$: $R_{\mathrm{RESPECT}}(B,\ A)$
  • $K_3$: $R_{\mathrm{RESPECT}}(T_{\mathrm{PERSON}}(T_{\mathrm{STATUS}}(\mathrm{Low})),\ T_{\mathrm{PERSON}}(T_{\mathrm{STATUS}}(\mathrm{High})))$
  • The information resources $I_{new1}$ and $I_{new2}$ with different purposes are derived from the known data resource $D_0$ and the information resource $I_0$ in combination with the relevant knowledge resources (i.e., the related resources mentioned above).
  • The relationship between $I_{new3}$ and each of $I_{new1}$ and $I_{new2}$ is then judged; the information resources opposed by $I_{new3}$ are deleted, and the other information resources are retained.
  • Scenario 3: Handling of ambiguity caused by redundancy in information resources.
  • The text content "User A likes to play basketball, but user A hates sports." can correspond to the following information resources:
  • $K_1$: Playing basketball is a sport.
  • $K_2$: The relation "hate" and the relation "like" are contradictory.
  • Combining "user A hates sports" with $K_1$ (playing basketball is a kind of sport),
  • a new information resource $I_{new1}$ can be deduced: user A hates playing basketball. From $K_2$, the information resource "user A likes to play basketball" contradicts $I_{new1}$. So, for the question of "user A's attitude towards playing basketball", the two information resources give understandings with different purposes; that is, there is redundancy in the information resources in the content.
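  • $K_1$: $R_{\mathrm{BELONGTO}}(T_{\mathrm{ACTIVITY}}(\mathrm{PlayBasketball}),\ T_{\mathrm{ACTIVITY}}(\mathrm{Sports}))$
  • $K_2$: $R_{\mathrm{OPPOSE}}(T_{\mathrm{RELATION}}(\mathrm{Like}),\ T_{\mathrm{RELATION}}(\mathrm{Hate}))$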
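  • How the two knowledge resources expose the redundancy can be sketched as follows. The triple encoding of facts and the rule "hating a category implies hating its members" are simplifying assumptions for illustration; the names `BELONGS_TO`, `OPPOSES`, `derive` and `contradictions` are hypothetical.

```python
BELONGS_TO = {"PlayBasketball": "Sports"}        # mirrors K1
OPPOSES = {("Like", "Hate"), ("Hate", "Like")}   # mirrors K2

# Facts from the text, encoded as (person, relation, object) triples.
facts = {("A", "Like", "PlayBasketball"), ("A", "Hate", "Sports")}

def derive(facts):
    """Apply the assumed rule: hating a category implies hating its members."""
    derived = set(facts)
    for person, relation, obj in facts:
        if relation == "Hate":
            for activity, category in BELONGS_TO.items():
                if category == obj:
                    derived.add((person, "Hate", activity))
    return derived

def contradictions(facts):
    """Return pairs of facts on the same person and object with opposed relations."""
    found = []
    items = sorted(facts)
    for i, (p1, r1, o1) in enumerate(items):
        for p2, r2, o2 in items[i + 1:]:
            if p1 == p2 and o1 == o2 and (r1, r2) in OPPOSES:
                found.append(((p1, r1, o1), (p2, r2, o2)))
    return found

conflicts = contradictions(derive(facts))
```

  • The single conflict found pairs the derived "A hates playing basketball" with the stated "A likes to play basketball", which is the redundancy that the supplementary resources must then resolve.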
  • The redundant information resources contradict each other, so one of them must be wrong. Adding related data resources or information resources can help judge which of the redundant information resources is right and which is wrong, thereby eliminating the ambiguity. Below, the two cases of adding data resources and adding information resources are discussed separately.
  • The relevant data resource $D_1$ is known: user A often appears on the basketball court. The relevant knowledge resources are $K_3$: the main purpose of the basketball court is to play basketball; $K_4$: people who often play basketball like to play basketball. Combining $D_1$ and $K_3$: user A often appears on the basketball court, so user A often plays basketball. Combining this with $K_4$: user A often plays basketball, and people who often play basketball are likely to like playing basketball, so user A is likely to like playing basketball, which supports the information resource "user A likes to play basketball". When one information resource has supporting data and the contradicting one has none, the supported one tends to be judged correct and the other erroneous, thereby eliminating the ambiguity.
  • D 1 (A
  • $K_3$: $R_{\mathrm{IN}}(T_{\mathrm{ACTIVITY}}(\mathrm{PlayBasketball}),\ T_{\mathrm{PLACE}}(\mathrm{BasketballCourt}))$
  • $K_4$: $R_{\mathrm{LIKE}}(T_{\mathrm{PERSON}}(R_{\mathrm{DO}}(\mathrm{person},\ T_{\mathrm{ACTIVITY}}(\mathrm{PlayBasketball}))),\ T_{\mathrm{ACTIVITY}}(\mathrm{PlayBasketball}))$
  • The related data resource $D_{related}$ is used to further deduce an information resource $I_{new}$ that helps judge right from wrong.
  • The relevant information resource $I_1$ is known: user A is a member of the basketball school team.
  • The relevant knowledge resource is $K_5$: members of the basketball school team often play basketball. Combining $I_1$ and $K_5$: user A is a member of the basketball school team, so user A often plays basketball.
  • Combining this with $K_4$: user A often plays basketball, and people who often play basketball are likely to like playing basketball, so user A is likely to like playing basketball, which supports the information resource "user A likes to play basketball". When one information resource has supporting data and the contradicting one has none, the supported one tends to be judged correct and the other erroneous, thereby eliminating the ambiguity.
  • $I_1$: $R_{\mathrm{IS\_A\_MEMBER\_OF}}(A,\ T_{\mathrm{GROUP}}(\mathrm{INS}(\mathrm{BasketballTeam})))$
  • $K_5$: $R_{\mathrm{DO}}(T_{\mathrm{PERSON}}(R_{\mathrm{IS\_A\_MEMBER\_OF}}(\mathrm{person},\ T_{\mathrm{GROUP}}(\mathrm{BasketballTeam}))),\ T_{\mathrm{ACTIVITY}}(\mathrm{PlayBasketball}))$
  • The related information resource $I_{related}$ is used to further deduce an information resource $I_{new}$ that helps judge right from wrong.
  • Scenario 4: Handling of ambiguity caused by redundancy in data resources.
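  • $K_1$: $R_{\mathrm{IS}}(R_{\mathrm{IN}}(T_{\mathrm{PLACE}}(\mathrm{Hainan}),\ T_{\mathrm{SEASON}}(\mathrm{Summer})),\ T_{\mathrm{TEMPERATURE}}(\mathrm{High}))$
  • The related data resource $D_{related}$ is used to further deduce a data resource $D_{new}$ that helps judge right from wrong.
  • Information resource $I_1$: one of the redundant data resources is sourced from the Bureau of Meteorology.
  • Information resource $I_2$: the other redundant data resource comes from the network.
  • Knowledge resource $K_2$: data from professional institutions are more reliable than data from the Internet. Combining information resources $I_1$, $I_2$ and knowledge resource $K_2$, it can be derived that the data resource from the Bureau of Meteorology is more reliable than the data resource from the network. From this it can be determined which data resource is correct and which is erroneous, thereby eliminating the ambiguity.
  • $K_2$: $R_{\mathrm{RELIABLE\_THAN}}(T_{\mathrm{DATA}}(R_{\mathrm{FROM}}(\mathrm{data},\ T_{\mathrm{INSTITUTE}})),\ T_{\mathrm{DATA}}(R_{\mathrm{FROM}}(\mathrm{data},\ T_{\mathrm{INTERNET}})))$
  • The related information resource $I_{related}$ is used to further deduce an information resource $I_{new}$ that helps judge right from wrong.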
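  • The reliability-based choice between redundant data resources can be sketched as a ranking over sources. The numeric ranks and the two sample temperature readings are hypothetical values chosen for illustration; only the ordering "professional institution above Internet" comes from the knowledge resource above.

```python
# Assumed numeric encoding of the ordering: institution data outranks Internet data.
RELIABILITY = {"institute": 2, "internet": 1}

def pick_reliable(candidates):
    """candidates: list of (value, source); return the value from the most reliable source."""
    best = max(candidates, key=lambda candidate: RELIABILITY[candidate[1]])
    return best[0]

# Hypothetical redundant data resources about summer temperature in Hainan.
redundant = [("temperature: low", "internet"), ("temperature: high", "institute")]
chosen = pick_reliable(redundant)
```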
  • the types of resources used as transformation objects can be mainly divided into two types: data resources and information resources. The following discusses the two situations where the transformation objects are data resources and the transformation objects are information resources.
  • There are three ways to derive the target data resource $D_0$: derivation from data resources combined with knowledge resources, derivation from information resources combined with knowledge resources, and derivation from data resources combined with information resources and knowledge resources. These three derivation modes are discussed separately below.
  • $D_1$: User A is 10 years old this year.
  • $K_1$: People younger than 15 should go to school.
  • Combining $D_1$ and $K_1$: user A is 10 years old this year, which is less than 15, so user A should go to school. The target data resource "user A's occupation is a student" can be further deduced.
  • $K_1$: $R_{\mathrm{SHOULD}}(T_{\mathrm{PERSON}}(R_{\mathrm{LESS\_THAN}}(T_{\mathrm{AGE}},\ 15)),\ T_{\mathrm{ACTIVITY}}(\mathrm{Education}))$
  • I 0 ⁇ D 0 (A
  • $I_1$: User A often goes to school.
  • $I_2$: User A does not have a teacher qualification certificate.
  • The relevant knowledge resources are $K_2$: students and teachers need to go to school frequently; $K_3$: teachers have teacher qualification certificates.
  • Combining $I_1$ and $K_2$: user A often goes to school, so user A is a student or a teacher.
  • Combining $I_2$ and $K_3$: user A does not have a teacher qualification certificate, so user A is not a teacher. Since user A is a student or a teacher, and user A is not a teacher, the target data resource "user A's occupation is a student" can be further deduced.
  • $I_1$: $R_{\mathrm{GO\_TO}}(A,\ T_{\mathrm{PLACE}}(\mathrm{INS}(\mathrm{School})))$
  • $K_2$: $R_{\mathrm{GO\_TO}}(T_{\mathrm{OCCUPATION}}(\mathrm{Student})\ \mathrm{AND}\ T_{\mathrm{OCCUPATION}}(\mathrm{Teacher}),\ T_{\mathrm{PLACE}}(\mathrm{School}))$
  • $K_3$: $R_{\mathrm{OWN}}(T_{\mathrm{OCCUPATION}}(\mathrm{Teacher}),\ T_{\mathrm{LICENCE}}(\mathrm{INS}(\mathrm{TeacherCertification})))$
  • I 0 ⁇ D 0 (A
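  • The elimination used in this derivation mode (student or teacher, and not a teacher, therefore student) can be sketched as below. The function `infer_occupation` and its boolean inputs are hypothetical simplifications of the information resources and knowledge resources involved.

```python
def infer_occupation(goes_to_school, has_teacher_certificate):
    """Return the set of occupations still consistent with the known facts."""
    candidates = set()
    if goes_to_school:
        # Students and teachers both go to school frequently.
        candidates = {"Student", "Teacher"}
    if not has_teacher_certificate:
        # Teachers hold a certificate, so no certificate means not a teacher.
        candidates.discard("Teacher")
    return candidates

occupation = infer_occupation(goes_to_school=True, has_teacher_certificate=False)
```

  • When exactly one candidate remains, it can be taken as the target data resource; with a certificate present both candidates remain and further resources would be needed.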
  • The relevant data resource $D_1$ is known: user A is 10 years old this year; the relevant information resource is $I_1$: user A often goes to school.
  • The relevant knowledge resources are $K_2$: students and teachers need to go to school frequently; $K_4$: the age of teachers is generally more than 20.
  • Combining $I_1$ and $K_2$: user A often goes to school, so user A is a student or a teacher.
  • Combining $D_1$ and $K_4$: user A is 10 years old this year, and teachers are generally older than 20, so user A is not a teacher. Since user A is a student or a teacher, and user A is not a teacher, the target data resource "user A's occupation is a student" can be further deduced.
  • $I_1$: $R_{\mathrm{GO\_TO}}(A,\ T_{\mathrm{PLACE}}(\mathrm{INS}(\mathrm{School})))$
  • $K_2$: $R_{\mathrm{GO\_TO}}(T_{\mathrm{OCCUPATION}}(\mathrm{Student})\ \mathrm{AND}\ T_{\mathrm{OCCUPATION}}(\mathrm{Teacher}),\ T_{\mathrm{PLACE}}(\mathrm{School}))$
  • $K_4$: $R_{\mathrm{GREATER\_THAN}}(T_{\mathrm{AGE}}(T_{\mathrm{OCCUPATION}}(\mathrm{Teacher})),\ 20)$
  • I 0 ⁇ D 0 (A
  • There are three ways of deriving the target information resource $I_0$: derivation from data resources combined with knowledge resources, derivation from information resources combined with knowledge resources, and derivation from data resources combined with information resources and knowledge resources. These three derivation modes are discussed separately below.
  • The relevant data resource $D_1$ is known: user A often appears on the football field. The relevant knowledge resources are $K_1$: the main purpose of the football field is to play football; $K_2$: people who often play football like to play football. Combining $D_1$ and $K_1$: user A often appears on the football field, so user A often plays football. Combining this with $K_2$: user A often plays football, and people who often play football are likely to like playing football, so the target information resource "user A likes to play football" can be further deduced.
  • D 1 (A
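  • $K_1$: $R_{\mathrm{IN}}(T_{\mathrm{ACTIVITY}}(\mathrm{PlaySoccer}),\ T_{\mathrm{PLACE}}(\mathrm{SoccerCourt}))$
  • $K_2$: $R_{\mathrm{LIKE}}(T_{\mathrm{PERSON}}(R_{\mathrm{DO}}(\mathrm{person},\ T_{\mathrm{ACTIVITY}}(\mathrm{PlaySoccer}))),\ T_{\mathrm{ACTIVITY}}(\mathrm{PlaySoccer}))$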
  • $I_1$: User A is a member of the football school team.
  • $K_2$: People who often play football like to play football.
  • $K_3$: Members of the football school team often play football.
  • Combining $I_1$ and $K_3$: user A is a member of the football school team, so user A often plays football.
  • Combining this with $K_2$: user A often plays football, and people who often play football are likely to like playing football, so the target information resource "user A likes to play football" can be further deduced.
  • $I_1$: $R_{\mathrm{IS\_A\_MEMBER\_OF}}(A,\ T_{\mathrm{GROUP}}(\mathrm{INS}(\mathrm{SoccerTeam})))$
  • $K_2$: $R_{\mathrm{LIKE}}(T_{\mathrm{PERSON}}(R_{\mathrm{DO}}(\mathrm{person},\ T_{\mathrm{ACTIVITY}}(\mathrm{PlaySoccer}))),\ T_{\mathrm{ACTIVITY}}(\mathrm{PlaySoccer}))$
  • $K_3$: $R_{\mathrm{DO}}(T_{\mathrm{PERSON}}(R_{\mathrm{IS\_A\_MEMBER\_OF}}(\mathrm{person},\ T_{\mathrm{GROUP}}(\mathrm{SoccerTeam}))),\ T_{\mathrm{ACTIVITY}}(\mathrm{PlaySoccer}))$
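  • The two-step chain in this derivation mode (team member, therefore often plays, therefore likes playing) can be sketched as simple forward chaining. Encoding each knowledge resource as a single premise/conclusion pair of strings is an assumption made for brevity; `RULES` and `forward_chain` are hypothetical names.

```python
# Each rule is (premise, conclusion); the strings stand in for resources.
RULES = [
    ("member_of_soccer_team", "often_plays_soccer"),   # team members often play
    ("often_plays_soccer", "likes_playing_soccer"),    # frequent players like playing
]

def forward_chain(facts, rules):
    """Repeatedly apply rules until no new conclusion can be added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

derived = forward_chain({"member_of_soccer_team"}, RULES)
```

  • The sketch treats the rules as certain, whereas the text describes the second step as only likely; a fuller model would attach a degree of confidence to each conclusion.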
  • $K_4$: People who often watch football news are interested in football sports events.
  • $K_5$: Sports include playing football, playing basketball and so on.
  • Combining $D_2$ and $K_4$: user A often reads football news, so user A is interested in football events. Because user A's interest in football may go no further than watching football matches, the information "user A is interested in football matches" alone cannot directly lead to the conclusion that user A likes to play football.
  • Combining $I_2$ and $K_5$: user A likes sports, and sports include playing football. Because user A may be more interested in playing basketball or other sports, this information alone is not enough to infer that user A likes to play football. However, since it was previously deduced that user A is interested in football events, combining that with the information "user A likes sports" allows the target information resource "user A likes to play football" to be deduced.
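  • $$D_2 = (A \mid T_{NEWS}(Soccer))$$
  • $$I_2 = R_{LIKE}(A,\ T_{ACTIVITY}(INS(SportsActivity)))$$
  • $$K_4 = R_{INTERESTED\_IN}(T_{PERSON}(R_{READ}(person,\ T_{NEWS}(Soccer))),\ T_{SPORTS}(Soccer))$$
  • $$K_5 = R_{INCLUDE}(T_{ACTIVITY}(SportsActivity),\ T_{ACTIVITY}(PlaySoccer,\ PlayBasketball,\ \ldots))$$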
  • FIG. 2 is a schematic structural diagram of a cross-DIKW-modality text ambiguity processing system oriented to essential computing and reasoning, provided by an embodiment of the present application.
  • The system may include:
  • a text analysis module 100, configured to acquire a target text and determine the target data resources and target information resources in the target text;
  • a meaning determination module 200, configured to query the relevant resources of the target text according to the target data resources and/or the target information resources, and to determine the textual meanings of the target text according to the relevant resources;
  • a resource supplement module 300, configured to, if the number of textual meanings of the target text is greater than 1, acquire supplementary resources of the target text and generate a conditional restriction text of the target text according to the supplementary resources;
  • a text modification module 400, configured to take the textual meaning that satisfies the conditional restriction text as the actual textual meaning of the target text, and to modify the target text according to the actual textual meaning.
  • In this embodiment, after the target text is acquired, the target data resources and target information resources it contains are determined; the relevant resources of the target text are then queried according to these resources, and the textual meanings of the target text are determined according to the relevant resources.
  • A conditional restriction text is generated from the supplementary resources of the target text, the textual meaning that satisfies the conditional restriction text is taken as the actual textual meaning, and the target text is modified accordingly, which eliminates the ambiguity in the target text. This embodiment can therefore accurately identify and eliminate ambiguity in text.
  • Further, the text analysis module 100 includes:
  • a type determination unit, configured to determine the resource type of the target text, where the resource types include data resources, information resources and knowledge resources; data resources are resources in the data graph, information resources are resources in the information graph, and knowledge resources are resources in the knowledge graph;
  • a modal conversion unit, configured to perform cross-modal conversion on the target text to obtain the target data resources and target information resources, where cross-modal conversion is a conversion operation between any two of data resources, information resources, knowledge resources, and mixed data-and-information resources.
  • Further, the modal conversion unit is configured to judge whether the target text is a data resource: if so, the target text is set as the target data resource; if not, cross-modal conversion is performed on the target text to obtain the target data resource. It is also configured to judge whether the target text is an information resource: if so, the target text is set as the target information resource; if not, cross-modal conversion is performed on the target text to obtain the target information resource.
  • Further, the meaning determination module 200 is configured to acquire the associated text of the target text, and to query the relevant resources of the target text from the associated text according to the target data resources and/or the target information resources.
  • Further, the resource supplement module 300 is configured to take, as supplementary resources of the target text, the data resources in the data graph whose degree of association with the target data resource is greater than a preset value; and/or to take, as supplementary resources of the target text, the information resources in the information graph whose degree of association with the target information resource is greater than the preset value.
  • Further, the system also includes a text type determination module, configured to determine, after it has been determined that the number of textual meanings of the target text is greater than 1, that the target text is a text with missing data resources or information resources, or that the target text is a text with redundant data resources or redundant information resources.
  • Further, the meaning determination module 200 is configured to combine the relevant resources with each target data resource and each target information resource to derive the textual meanings of the target text.
  • The present application also provides a storage medium on which a computer program is stored; when the computer program is executed, the steps provided by the above embodiments can be implemented.
  • The storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
  • The present application also provides an electronic device, which may include a memory and a processor. A computer program is stored in the memory, and when the processor invokes the computer program in the memory, the steps provided by the above embodiments can be implemented.
  • The electronic device may of course also include components such as network interfaces and a power supply.
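As an illustration only, the interplay of modules 100 to 400 can be sketched in Python. The function names, the substring-matching analysis step and the toy rule tables below are assumptions made for this sketch; the application does not prescribe an implementation.

```python
def analyze(text, known_resources):
    """Module 100 (sketch): return the known resources that occur in the text."""
    return [r for r in known_resources if r in text]


def determine_meanings(targets, rules):
    """Module 200 (sketch): map each target resource to the meaning it suggests."""
    return [rules[t] for t in targets if t in rules]


def disambiguate(text, known_resources, rules, supported):
    """Modules 300/400 (sketch): if several meanings exist, keep the supported ones."""
    meanings = determine_meanings(analyze(text, known_resources), rules)
    if len(meanings) <= 1:
        return meanings  # zero or one meaning: nothing to disambiguate
    return [m for m in meanings if m in supported]


# Toy data (hypothetical): two resources suggest two different purposes.
text = "On a summer night, user A stays in the study."
rules = {"study": "user A is studying", "night": "user A is sleeping"}
result = disambiguate(text, ["study", "night", "summer"], rules,
                      supported={"user A is sleeping"})
```

Here `supported` stands in for the outcome of checking each meaning against the conditional restriction text; in a fuller system that check would itself be derived from the supplementary resources.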

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

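A cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning, together with a system, an electronic device and a storage medium. The method comprises: acquiring a target text, and determining the target data resources and target information resources in the target text (S101); querying the relevant resources of the target text according to the target data resources and/or the target information resources, and determining the textual meanings of the target text according to the relevant resources (S102); if the number of textual meanings of the target text is greater than 1, acquiring supplementary resources of the target text and generating a conditional restriction text of the target text according to the supplementary resources (S103); and taking the textual meaning that satisfies the conditional restriction text as the actual textual meaning of the target text, and modifying the target text according to the actual textual meaning (S104). The method can accurately identify and eliminate ambiguity in text.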

Description

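Cross-DIKW-Modality Text Ambiguity Processing Method Oriented to Essential Computing and Reasoning
This application claims priority to Chinese patent application No. 202011103480.6, entitled "Cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning", filed with the China Patent Office on October 15, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of software engineering, and in particular to a cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning, a system, an electronic device and a storage medium.
Background
With the arrival of the big data era, the scale of data keeps growing. Association analysis of data can yield a great deal of information, including highly sensitive content such as private or confidential matters. Data and information resources can be generalized and logically reasoned into knowledge, and knowledge resources can in turn act on data resources and information resources to compute and infer new data and information resources that are valuable for a specific target, or even to make predictions about that target.
Ambiguity means that a piece of text can be understood with several different purposes; that is, the typed resources in the content can be reasoned over in several ways to obtain information resources with different purposes. Ambiguity has two causes. The first is missing content: when some data resources or information resources are absent, the content can be understood in a wide range of ways, and derivation yields understandings with different purposes. The second is redundant content: redundant data resources or information resources, when combined with different typed resources during derivation, yield understandings with different purposes. In the related art, text ambiguity is mainly handled by machine learning models, but the recognition accuracy of such models depends heavily on the richness of the training samples, so they cannot handle text ambiguity effectively.
Therefore, how to accurately identify and eliminate ambiguity in text is a technical problem that those skilled in the art currently need to solve.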
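Summary
The purpose of the present application is to provide a cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning, a system, an electronic device and a storage medium, which can accurately identify and eliminate ambiguity in text.
To solve the above technical problem, the present application provides a cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning, the method comprising:
acquiring a target text, and determining the target data resources and target information resources in the target text;
querying the relevant resources of the target text according to the target data resources and/or the target information resources, and determining the textual meanings of the target text according to the relevant resources;
if the number of textual meanings of the target text is greater than 1, acquiring supplementary resources of the target text, and generating a conditional restriction text of the target text according to the supplementary resources;
taking the textual meaning that satisfies the conditional restriction text as the actual textual meaning of the target text, and modifying the target text according to the actual textual meaning.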
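Optionally, determining the target data resources and target information resources in the target text comprises:
determining the resource type of the target text, where the resource types include data resources, information resources and knowledge resources; data resources are resources in the data graph, information resources are resources in the information graph, and knowledge resources are resources in the knowledge graph;
performing cross-modal conversion on the target text to obtain the target data resources and target information resources, where cross-modal conversion is a conversion operation between any two of data resources, information resources, knowledge resources, and mixed data-and-information resources.
Optionally, performing cross-modal conversion on the target text to obtain the target data resources and target information resources comprises:
judging whether the target text is a data resource; if so, setting the target text as the target data resource; if not, performing cross-modal conversion on the target text to obtain the target data resource;
judging whether the target text is an information resource; if so, setting the target text as the target information resource; if not, performing cross-modal conversion on the target text to obtain the target information resource.
Optionally, querying the relevant resources of the target text according to the target data resources and/or the target information resources comprises:
acquiring the associated text of the target text;
querying the relevant resources of the target text from the associated text according to the target data resources and/or the target information resources.
Optionally, acquiring the supplementary resources of the target text comprises:
taking, as supplementary resources of the target text, the data resources in the data graph whose degree of association with the target data resource is greater than a preset value;
and/or taking, as supplementary resources of the target text, the information resources in the information graph whose degree of association with the target information resource is greater than the preset value.
Optionally, after determining that the number of textual meanings of the target text is greater than 1, the method further comprises:
determining that the target text is a text with missing data resources or information resources;
or determining that the target text is a text with redundant data resources or redundant information resources.
Optionally, determining the textual meanings of the target text according to the relevant resources comprises:
combining the relevant resources with each target data resource and each target information resource to derive the textual meanings of the target text.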
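The present application also provides a cross-DIKW-modality text ambiguity processing system oriented to essential computing and reasoning, the system comprising:
a text analysis module, configured to acquire a target text and determine the target data resources and target information resources in the target text;
a meaning determination module, configured to query the relevant resources of the target text according to the target data resources and/or the target information resources, and to determine the textual meanings of the target text according to the relevant resources;
a resource supplement module, configured to, if the number of textual meanings of the target text is greater than 1, acquire supplementary resources of the target text and generate a conditional restriction text of the target text according to the supplementary resources;
a text modification module, configured to take the textual meaning that satisfies the conditional restriction text as the actual textual meaning of the target text, and to modify the target text according to the actual textual meaning.
The present application also provides a storage medium on which a computer program is stored; when executed, the computer program implements the steps of the above cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning.
The present application also provides an electronic device comprising a memory and a processor, where a computer program is stored in the memory, and the processor implements the steps of the above method when it invokes the computer program in the memory.
The present application provides a cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning, comprising: acquiring a target text, and determining the target data resources and target information resources in the target text; querying the relevant resources of the target text according to the target data resources and/or the target information resources, and determining the textual meanings of the target text according to the relevant resources; if the number of textual meanings of the target text is greater than 1, acquiring supplementary resources of the target text and generating a conditional restriction text of the target text according to the supplementary resources; and taking the textual meaning that satisfies the conditional restriction text as the actual textual meaning of the target text, and modifying the target text according to the actual textual meaning.
After acquiring the target text, the present application determines the target data resources and target information resources it contains, queries the relevant resources of the target text according to them, and determines the textual meanings of the target text according to the relevant resources. It then generates a conditional restriction text from the supplementary resources of the target text, takes the textual meaning that satisfies the conditional restriction text as the actual textual meaning, and modifies the target text accordingly, which eliminates the ambiguity in the target text. The present application can therefore accurately identify and eliminate ambiguity in text. The present application also provides a corresponding system, storage medium and electronic device, which have the same beneficial effects and are not described again here.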
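Brief Description of the Drawings
To explain the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning, provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a cross-DIKW-modality text ambiguity processing system oriented to essential computing and reasoning, provided by an embodiment of the present application.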
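Detailed Description
To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present application.
The following embodiments can be implemented with a multi-modal content ambiguity judgment system based on a data graph, an information graph and a knowledge graph. Resource elements take three forms: data resources, information resources and knowledge resources. A graph is the result of integrating resource elements, and the graphs of resource elements comprise the data graph, the information graph and the knowledge graph. DIKW stands for Data, Information, Knowledge and Wisdom; the DIKW model helps in understanding the relationships among data, information, knowledge and wisdom.
The data graph is a collection of data resources held in data structures such as arrays, linked lists, stacks, queues, trees and graphs. Its elements are basic individual items of numeric or other information obtained through observation. The information graph is conveyed through data resources and the context formed by combining them; it is information suitable for analysis and interpretation, obtained after concept mapping and the combination of related relationships. The knowledge graph is essentially a semantic network and includes the collection of statistical rules summarized from information resources. The knowledge graph contains rich semantic relationships; information reasoning and entity linking on the knowledge graph increase its edge density and node density, and its schema-free nature allows it to be linked seamlessly. For those skilled in the art, the concepts of data graph, information graph, knowledge graph, data resource, information resource and knowledge resource are clear; see the descriptions in documents such as 《投入驱动的存储与计算一体化的事务处理效率优化方法》 (an investment-driven method for optimizing transaction processing efficiency with integrated storage and computing) and "Modelling Data, Information and Knowledge for Security Protection of Hybrid IoT and Edge Resources".
The embodiments provided by the present application can be applied to the field of remote sensing, that is, text ambiguity processing can be implemented on the basis of data graphs, information graphs and knowledge graphs related to remote sensing.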
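Referring to FIG. 1, FIG. 1 is a flowchart of a cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning, provided by an embodiment of the present application.
The specific steps may include:
S101: acquire a target text, and determine the target data resources and target information resources in the target text.
After acquiring the target text, this embodiment may perform a data resource extraction operation on the target text to obtain the target data resources, and may perform an information resource extraction operation to obtain the target information resources. As one feasible implementation, templates corresponding to data resources and information resources may be used to determine the target data resources and target information resources that match the templates. This embodiment may also perform text analysis on the target text and determine the target data resources and target information resources from the analysis result, or match a resource collection containing sample data resources and sample information resources against the target text to obtain them.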
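The following example illustrates how target data resources and target information resources are extracted from a target text.
For the target text "On a summer night, user A stays in the study.", the target data resources may include [Figure PCTCN2021118178-appb-000001] (place: study), [Figure PCTCN2021118178-appb-000002] (time: night) and [Figure PCTCN2021118178-appb-000003] (season: summer), and the information resources may include $I_0$ (in the study on a summer night):
Figure PCTCN2021118178-appb-000004
Figure PCTCN2021118178-appb-000005
Figure PCTCN2021118178-appb-000006
Figure PCTCN2021118178-appb-000007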
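The template-based extraction mentioned for S101 can be sketched as a vocabulary lookup. This is a minimal sketch, not the application's method: the `TYPE_TERMS` vocabulary and the function name are hypothetical, and the example sentence is an English rendering of the target text above.

```python
# Hypothetical type vocabulary; a real system would draw these terms from the data graph.
TYPE_TERMS = {
    "PLACE": ["study", "bedroom"],
    "TIME": ["night", "morning"],
    "SEASON": ["summer", "winter"],
}


def extract_data_resources(text, type_terms=TYPE_TERMS):
    """Return {type: term} for every vocabulary term found in the text."""
    lowered = text.lower()
    found = {}
    for resource_type, terms in type_terms.items():
        for term in terms:
            if term in lowered:
                found[resource_type] = term
    return found


resources = extract_data_resources("On a summer night, user A stays in the study.")
```

Each extracted pair corresponds to one typed data resource (place, time, season); forming the information resource $I_0$ would additionally require relating these pairs to user A.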
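S102: query the relevant resources of the target text according to the target data resources and/or the target information resources, and determine the textual meanings of the target text according to the relevant resources.
This embodiment may acquire the associated text of the target text and determine the textual meanings of the target text from the data resources and information resources in the associated text. Any two textual meanings of the target text differ from each other, and this difference is where the ambiguity of the target text lies. The context of the target text may be used as the associated text, as may any other text connected with the target text. After the associated text is acquired, the relevant resources of the target text may be queried from it according to the target data resources and/or the target information resources.
After the relevant resources of the target text are obtained, this embodiment may combine them with each target data resource and each target information resource to derive the textual meanings of the target text. For example, if the target text is "At night, Xiao Ming is in the study" and the relevant resources determined from the associated text are "Spring Festival" and "exam", two textual meanings are obtained: "At night, Xiao Ming is in the study staying up to see in the New Year" and "At night, Xiao Ming is in the study revising". As one feasible implementation, the relevant resources and the target text may be displayed on a human-computer interaction interface so that the user can determine the textual meanings of the target text.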
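The derivation in S102 can be sketched as follows, using the Xiao Ming example. The mapping `activity_for` is a hand-written stand-in for the knowledge that links a relevant resource to an activity; it and the function name are assumptions of this sketch.

```python
def derive_meanings(target_text, related_resources, activity_for):
    """Combine each relevant resource with the target text into one candidate meaning."""
    meanings = []
    for resource in related_resources:
        activity = activity_for.get(resource)
        if activity is None:
            continue  # no knowledge links this resource to the target text
        meaning = f"{target_text} {activity}"
        if meaning not in meanings:
            meanings.append(meaning)
    return meanings


# Toy knowledge (hypothetical): what each relevant resource suggests Xiao Ming is doing.
activity_for = {"Spring Festival": "staying up for the New Year", "exam": "revising"}
meanings = derive_meanings("At night, Xiao Ming is in the study,",
                           ["Spring Festival", "exam"], activity_for)
is_ambiguous = len(meanings) > 1  # more than one meaning signals ambiguity (checked in S103)
```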
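S103: if the number of textual meanings of the target text is greater than 1, acquire supplementary resources of the target text, and generate a conditional restriction text of the target text according to the supplementary resources.
If the target text has exactly one textual meaning, it contains no ambiguity; if it has more than one, it is ambiguous. An ambiguous target text may be a text with missing data resources or information resources: the missing content loosens the restrictions on how the content can be understood. When the scope of understanding is wide, combining different knowledge resources yields information resources with different purposes. Adding restrictions narrows the scope of understanding until only one of the derived information resources remains, which eliminates the ambiguity. When content is modelled with the data graph, information graph and knowledge graph, missing content falls into two classes on the graphs: missing information resources and missing data resources. An ambiguous target text may also be a text with redundant data resources or redundant information resources, where the redundant resources give understandings with different purposes of the same question. Modelled on the graphs in the same way, redundant content falls into two classes: redundant information resources and redundant data resources.
This embodiment may query the supplementary resources of the target text from the data graph according to the target data resources, and may also query them from the information graph according to the target information resources.
The data graph contains a large number of data resources with varying degrees of association between them, and the supplementary resources of the target text can be queried according to these degrees of association; for example, the data resources in the data graph whose degree of association with the target data resource is greater than a preset value can be used as supplementary resources. If the target data resource is "winter", such data resources may include "warmth index" and "snowfall probability". Likewise, the information graph contains a large number of information resources with varying degrees of association between them, and the information resources whose degree of association with the target information resource is greater than the preset value can be used as supplementary resources. If the target information resource is "Einstein is in class", such information resources may include "Einstein is a physicist" and "Einstein was already married at the time".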
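The threshold rule for selecting supplementary resources can be sketched as below. The nested-dictionary graph layout and the numeric association degrees are invented for illustration; the application does not specify how the degree of association is computed.

```python
def supplementary_resources(graph, target, threshold):
    """Return the resources whose association degree with `target` exceeds `threshold`."""
    neighbours = graph.get(target, {})
    return sorted(name for name, degree in neighbours.items() if degree > threshold)


# Toy data graph (hypothetical degrees in [0, 1]).
data_graph = {
    "winter": {"warmth index": 0.9, "snowfall probability": 0.8, "sunscreen sales": 0.1},
}
supplements = supplementary_resources(data_graph, "winter", threshold=0.5)
```

The same function applies unchanged to an information graph, since only the association degrees are consulted.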
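S104: take the textual meaning that satisfies the conditional restriction text as the actual textual meaning of the target text, and modify the target text according to the actual textual meaning.
After the conditional restriction text of the target text is obtained, the textual meaning that satisfies it is taken as the actual textual meaning of the target text. Continuing the example of the target information resource "Einstein is in class", the target text has two textual meanings: "1. Einstein's occupation is student" and "2. Einstein's occupation is teacher". If the conditional restriction text corresponding to the supplementary resources is "Einstein is a physicist, and Einstein was already married at the time", it can be used to determine the actual textual meaning of the target text. The target text can then be modified according to the actual textual meaning to eliminate its ambiguity. As one feasible implementation, the conditional restriction text and each textual meaning of the target text may be displayed on a human-computer interaction interface so that the user can determine which textual meaning satisfies the conditional restriction text.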
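One way to picture the selection in S104 is a filter over the candidate meanings. In this sketch the consistency table is hand-written; the source does not say which meaning the restriction selects in the Einstein example, so treating "teacher" as the consistent reading is purely an assumption made here.

```python
def actual_meanings(meanings, restriction, consistent_with):
    """Keep the meanings that are consistent with every restricting condition."""
    return [m for m in meanings
            if all(cond in consistent_with.get(m, set()) for cond in restriction)]


meanings = ["Einstein's occupation is student", "Einstein's occupation is teacher"]
restriction = ["Einstein is a physicist", "Einstein was already married"]
# Assumed judgement for the sketch: only the 'teacher' reading fits both conditions.
consistent_with = {
    "Einstein's occupation is teacher": set(restriction),
    "Einstein's occupation is student": set(),
}
actual = actual_meanings(meanings, restriction, consistent_with)
```

In practice the consistency judgement would come from reasoning over the knowledge graph or, as the text notes, from a user choosing on the interaction interface.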
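In this embodiment, after the target text is acquired, the target data resources and target information resources it contains are determined; the relevant resources of the target text are queried according to them, and the textual meanings of the target text are determined according to the relevant resources. A conditional restriction text is generated from the supplementary resources of the target text, the textual meaning that satisfies it is taken as the actual textual meaning, and the target text is modified accordingly, which eliminates the ambiguity in the target text. This embodiment can therefore accurately identify and eliminate ambiguity in text.
As a further description of the embodiment corresponding to FIG. 1, the target data resources and target information resources in the target text can be determined as follows: determine the resource type of the target text, and perform cross-modal conversion on the target text to obtain the target data resources and target information resources. The resource types include data resources, information resources and knowledge resources; data resources are resources in the data graph, information resources are resources in the information graph, and knowledge resources are resources in the knowledge graph. Cross-modal conversion is a conversion operation between any two of data resources, information resources, knowledge resources, and mixed data-and-information resources, where a mixed data-and-information resource is a resource in which data resources and information resources are mixed.
Specifically, the following operations may be performed during cross-modal conversion: judge whether the target text is a data resource; if so, set the target text as the target data resource; if not, perform cross-modal conversion on the target text to obtain the target data resource. Judge whether the target text is an information resource; if so, set the target text as the target information resource; if not, perform cross-modal conversion on the target text to obtain the target information resource.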
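The two judgements above share one shape: use the resource as-is when it already has the wanted type, otherwise convert it. A minimal sketch follows, in which the dictionary representation and `toy_convert` are placeholders assumed for illustration; the real conversion is the derivation over the graphs described in the modal conversion cases.

```python
def to_target_resource(resource, wanted_type, convert):
    """Return the resource itself if it already has the wanted type, else convert it."""
    if resource["type"] == wanted_type:
        return resource
    return convert(resource, wanted_type)


def toy_convert(resource, wanted_type):
    # Placeholder for cross-modal conversion; it only relabels and records provenance.
    return {"type": wanted_type, "value": resource["value"],
            "derived_from": resource["type"]}


text_resource = {"type": "information", "value": "user A often goes to school"}
target_info = to_target_resource(text_resource, "information", toy_convert)
target_data = to_target_resource(text_resource, "data", toy_convert)
```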
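The flow described in the above embodiments is illustrated below through embodiments in practical applications.
Scenario 1: handling ambiguity caused by missing information resources.
Text content: "On a summer night, user A stays in the study." It may correspond to the following data and information resources:
Figure PCTCN2021118178-appb-000008
Figure PCTCN2021118178-appb-000009
Figure PCTCN2021118178-appb-000010
Figure PCTCN2021118178-appb-000011
Because the content lacks the information resource "what user A is doing in the study", it is ambiguous. For example, combining the data resource [Figure PCTCN2021118178-appb-000012] ("the user stays in the study") with the knowledge resource $K_1$ ("a study is a place for studying"), it can be derived that user A is in the study in order to study. Combining the data resource [Figure PCTCN2021118178-appb-000013] ("night") with the knowledge resource $K_2$ ("people generally sleep at night"), it can be derived that user A may be in the study in order to sleep. In the absence of other relevant resources both derivations are valid, yet they produce information resources with different purposes, which causes the ambiguity.
The above is expressed symbolically as follows:
Known: $$K_1 = R_{IN}(T_{ACTIVITY}(Study),\ T_{PLACE}(Studyroom))$$
$$K_2 = R_{AT}(T_{ACTIVITY}(Sleep),\ T_{TIME}(Night))$$
The following can be derived:
Figure PCTCN2021118178-appb-000014
Figure PCTCN2021118178-appb-000015
Figure PCTCN2021118178-appb-000016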
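In this embodiment, the scope of understanding can be narrowed by adding relevant data resources or information resources, thereby eliminating the ambiguity. The two cases, adding data resources and adding information resources, are discussed separately below.
Approach A1: adding data resources
Suppose the relevant data resources $D_1$ (the air conditioner in the bedroom is broken) and $D_2$ (the air conditioner in the study works) are known. Combining them with the known data resource [Figure PCTCN2021118178-appb-000017] ("summer") and the knowledge resource $K_3$ ("summer is hot"), it can be derived that the bedroom is hot and the study is cool. These data resources add a restriction on the environment of the study, narrowing the scope of understanding to temperature-related matters. Combining further with the knowledge resource $K_4$ ("people like to sleep in cool places"), it can be derived that the cool study is suitable for sleeping. This supports the previously derived information resource "user A is in the study in order to sleep" and thus eliminates the ambiguity.
The above is expressed symbolically as follows:
Known: $$D_1 = (T_{FACILITY}(INS(AIR\_CONDITION_{Bedroom})) \mid T_{CONDITION}(Broken))$$
$$D_2 = (T_{FACILITY}(INS(AIR\_CONDITION_{Studyroom})) \mid T_{CONDITION}(Normal))$$
$$K_3 = R_{IS}(T_{SEASON}(Summer),\ T_{TEMPERATURE}(High))$$
$$K_4 = R_{LIKE}(T_{PERSON},\ R_{IN}(T_{ACTIVITY}(Sleep),\ R_{IS}(T_{PLACE},\ T_{TEMPERATURE}(Low))))$$
The following can be derived:
Figure PCTCN2021118178-appb-000018
Figure PCTCN2021118178-appb-000019
Figure PCTCN2021118178-appb-000020
Figure PCTCN2021118178-appb-000021
Approach A1 is implemented by the following algorithm:
From the known data resource $D_0$ and information resource $I_0$, combined with the relevant knowledge resources (the relevant resources mentioned above), derive information resources $I_{new1}$ and $I_{new2}$ with different purposes.
Retrieve the relevant data resources $D_{related}$ (the supplementary resources mentioned above) from the data graph.
From $D_{related}$, combined with relevant information resources and knowledge resources, further derive an information resource $I_{new3}$ that narrows the scope of understanding.
Determine the relationship between $I_{new3}$ and $I_{new1}$, $I_{new2}$; keep the information resource that $I_{new3}$ supports and delete the others.
Set the single remaining information resource as the final result, eliminating the ambiguity.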
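Approach A2: adding information resources
Suppose the relevant information resource $I_1$ (user A does not like studying) is known. Combining it with [Figure PCTCN2021118178-appb-000022] ("night") and the knowledge resource $K_5$ ("only people who like studying are likely to study at night"), it can be derived that user A is unlikely to be studying in the study at this time. This excludes the information resource "user A is in the study in order to study" from the scope of understanding; the only remaining information resource, "user A is in the study in order to sleep", is the final result, and the ambiguity is eliminated.
The above is expressed symbolically as follows:
Known: $$I_1 = !R_{LIKE}(A,\ T_{ACTIVITY}(Study))$$
$$K_5 = R_{AT}(R_{DO}(T_{PERSON}(R_{LIKE}(T_{PERSON},\ T_{ACTIVITY}(Study))),\ T_{ACTIVITY}(Study)),\ T_{TIME}(Night))$$
The following can be derived:
Figure PCTCN2021118178-appb-000023
Figure PCTCN2021118178-appb-000024
Approach A2 is implemented by the following algorithm:
From the known data resource $D_0$ and information resource $I_0$, combined with the relevant knowledge resources (the relevant resources mentioned above), derive information resources $I_{new1}$ and $I_{new2}$ with different purposes.
Retrieve the relevant information resources $I_{related}$ (the supplementary resources mentioned above) from the information graph.
From $I_{related}$, combined with relevant data resources and knowledge resources, further derive an information resource $I_{new3}$ that narrows the scope of understanding.
Determine the relationship between $I_{new3}$ and $I_{new1}$, $I_{new2}$; delete the information resource that $I_{new3}$ opposes and keep the others.
Set the single remaining information resource as the final result, eliminating the ambiguity.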
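Approaches A1 and A2 differ only in the final filtering step: A1 keeps what the newly derived information supports, while A2 deletes what it opposes. Both can be sketched with one function. The `supports` and `opposes` relations are supplied by hand here; deriving them is the reasoning step the application performs over the graphs, which this sketch does not attempt.

```python
def resolve(candidates, evidence, supports=(), opposes=()):
    """Filter candidate information resources using newly derived evidence.

    `supports` and `opposes` are collections of (evidence, candidate) pairs.
    Returns the single surviving candidate, or None if ambiguity remains.
    """
    # A2-style step: delete every candidate that some evidence opposes.
    remaining = [c for c in candidates
                 if not any((e, c) in opposes for e in evidence)]
    # A1-style step: if any candidate is positively supported, keep only those.
    supported = [c for c in remaining
                 if any((e, c) in supports for e in evidence)]
    if supported:
        remaining = supported
    return remaining[0] if len(remaining) == 1 else None


STUDY = "user A is in the study to study"
SLEEP = "user A is in the study to sleep"

# Approach A1: "the study is cool" supports the sleeping interpretation.
a1 = resolve([STUDY, SLEEP], ["the study is cool"],
             supports={("the study is cool", SLEEP)})
# Approach A2: "user A is unlikely to study at night" opposes the studying interpretation.
a2 = resolve([STUDY, SLEEP], ["user A is unlikely to study at night"],
             opposes={("user A is unlikely to study at night", STUDY)})
```

Returning `None` when more than one candidate survives makes it explicit that further supplementary resources are needed.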
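Scenario 2: handling ambiguity caused by missing data resources.
Text content: "User A is of a more senior generation than user B." It may correspond to the following data and information resources:
Figure PCTCN2021118178-appb-000025
Figure PCTCN2021118178-appb-000026
Figure PCTCN2021118178-appb-000027
Because the content lacks data resources about the ages of user A and user B, the information resource "the relative ages of user A and user B" can be understood with different purposes. The knowledge resource $K_1$ ("a person of a more senior generation is probably older") allows the information resource "user A is probably older than user B" to be derived. However, there are many cases of people who are senior in generation but younger in age, so the information resource "user A may be younger than user B" cannot be ruled out.
The above is expressed symbolically as follows:
Known: $$K_1 = R_{PROBABLY\_GREATER\_THAN}(T_{AGE}(T_{PERSON}(T_{SENIORITY}(High))),\ T_{AGE}(T_{PERSON}(T_{SENIORITY}(Low))))$$
The following can be derived:
Figure PCTCN2021118178-appb-000028
Figure PCTCN2021118178-appb-000029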
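Here too, the scope of understanding can be narrowed by adding relevant data resources or information resources, thereby eliminating the ambiguity. The two cases are discussed separately below.
Approach B1: adding data resources
Suppose the relevant data resources $D_1$ (user A is mentally mature) and $D_2$ (user B is mentally naive) are known. Combining $D_1$ and $D_2$, the information resource "user A is more mature than user B" can be derived. When judging the relative ages of user A and user B, these data resources add a restriction on their relative mental maturity and further narrow the scope of understanding. This supports the previously derived information resource "user A is older than user B".
The above is expressed symbolically as follows:
Known: $$D_1 = (A \mid T_{MIND}(Mature))$$
$$D_2 = (B \mid T_{MIND}(Naive))$$
The following can be derived:
Figure PCTCN2021118178-appb-000030
Figure PCTCN2021118178-appb-000031
Approach B1 is implemented by the following algorithm:
From the known data resource $D_0$ and information resource $I_0$, combined with the relevant knowledge resources, derive information resources $I_{new1}$ and $I_{new2}$ with different purposes.
Retrieve the relevant data resources $D_{related}$ from the data graph.
From $D_{related}$, combined with relevant information resources and knowledge resources, further derive an information resource $I_{new3}$ that narrows the scope of understanding.
Determine the relationship between $I_{new3}$ and $I_{new1}$, $I_{new2}$; delete the information resource that $I_{new3}$ opposes and keep the others.
Set the single remaining information resource as the final result, eliminating the ambiguity.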
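Approach B2: adding information resources
Suppose the relevant information resource $I_1$ (user B treats user A with great respect) is known, together with the knowledge resource $K_3$ ("people of lower status treat people of higher status with respect"). Combining $I_1$ and $K_3$, the information resource "user A has a higher status than user B" can be derived. When judging the relative ages of user A and user B, this information resource adds a restriction on their relative status and further narrows the scope of understanding. This supports the previously derived information resource "user A is older than user B".
The above is expressed symbolically as follows:
Known: $$I_1 = R_{RESPECT}(B,\ A)$$
$$K_3 = R_{RESPECT}(T_{PERSON}(T_{STATUS}(Low)),\ T_{PERSON}(T_{STATUS}(High)))$$
The following can be derived:
Figure PCTCN2021118178-appb-000032
Figure PCTCN2021118178-appb-000033
Approach B2 is implemented by the following algorithm:
From the known data resource $D_0$ and information resource $I_0$, combined with the relevant knowledge resources (the relevant resources mentioned above), derive information resources $I_{new1}$ and $I_{new2}$ with different purposes.
Retrieve the relevant information resources $I_{related}$ (the supplementary resources mentioned above) from the information graph.
From $I_{related}$, combined with relevant data resources and knowledge resources, further derive an information resource $I_{new3}$ that narrows the scope of understanding.
Determine the relationship between $I_{new3}$ and $I_{new1}$, $I_{new2}$; delete the information resource that $I_{new3}$ opposes and keep the others.
Set the single remaining information resource as the final result, eliminating the ambiguity.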
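Scenario 3: handling ambiguity caused by redundant information resources.
Text content: "User A likes playing basketball, and user A hates sports." It may correspond to the following information resources:
Figure PCTCN2021118178-appb-000034
Figure PCTCN2021118178-appb-000035
There are knowledge resources $K_1$: playing basketball is a sport; and $K_2$: the relation "hate" contradicts the relation "like". From [Figure PCTCN2021118178-appb-000036] and $K_1$: user A hates sports and playing basketball is a kind of sport, so a new information resource $I_{new1}$ can be derived: user A hates playing basketball. By $K_2$, [Figure PCTCN2021118178-appb-000037] contradicts $I_{new1}$. On the question of user A's attitude towards playing basketball, [Figure PCTCN2021118178-appb-000038] and [Figure PCTCN2021118178-appb-000039] therefore give understandings with different purposes; that is, the information resources in the content are redundant.
The above is expressed symbolically as follows:
Known: $$K_1 = R_{BELONGTO}(T_{ACTIVITY}(PlayBasketball),\ T_{ACTIVITY}(Sports))$$
$$K_2 = R_{OPPOSE}(T_{RELATION}(Like),\ T_{RELATION}(Hate))$$
The following can be derived:
Figure PCTCN2021118178-appb-000040
Figure PCTCN2021118178-appb-000041
The derivation shows that the redundant information resources [Figure PCTCN2021118178-appb-000042] and [Figure PCTCN2021118178-appb-000043] contradict each other, so one of them must be wrong. Relevant data resources or information resources can be added to help judge which of the redundant information resources is correct, thereby eliminating the ambiguity. The two cases, adding data resources and adding information resources, are discussed separately below.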
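Detecting the contradiction in Scenario 3 amounts to propagating each statement down the "belongs to" hierarchy ($K_1$) and then looking for opposite relations ($K_2$) on the same subject and object. The triple encoding and function name below are assumptions of this sketch, and the propagation covers one level of the hierarchy only.

```python
def find_conflicts(facts, belongs_to, opposite):
    """Report pairs of contradictory facts.

    facts:      set of (relation, subject, object) triples
    belongs_to: list of (child, parent) pairs, e.g. ("PlayBasketball", "Sports")
    opposite:   maps a relation to the relation that contradicts it
    """
    derived = set(facts)
    # K1-style propagation: a relation to a parent category also holds for its children.
    for relation, subject, obj in facts:
        for child, parent in belongs_to:
            if parent == obj:
                derived.add((relation, subject, child))
    conflicts = []
    for relation, subject, obj in sorted(derived):
        other = opposite.get(relation)
        if other is not None and (other, subject, obj) in derived:
            conflicts.append(((relation, subject, obj), (other, subject, obj)))
    return conflicts


facts = {("like", "A", "PlayBasketball"), ("hate", "A", "Sports")}
conflicts = find_conflicts(facts,
                           belongs_to=[("PlayBasketball", "Sports")],
                           opposite={"like": "hate"})
```

Listing the opposition in one direction only (`like` to `hate`) avoids reporting each conflicting pair twice.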
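Approach C1: adding data resources
Suppose a spatial data resource $D_1$ related to user A is known: basketball court. The relevant knowledge resources are $K_3$: the main purpose of a basketball court is playing basketball; and $K_4$: people who often play basketball like playing basketball. Combining $D_1$ and $K_3$: user A often appears on the basketball court, so user A often plays basketball. Combining this with $K_4$: people who often play basketball probably like playing basketball, so user A probably likes playing basketball, which supports the information resource [Figure PCTCN2021118178-appb-000044]. When the information resource [Figure PCTCN2021118178-appb-000045] has supporting data and the information resource [Figure PCTCN2021118178-appb-000046] has none, [Figure PCTCN2021118178-appb-000047] tends to be judged correct and [Figure PCTCN2021118178-appb-000048] wrong, which eliminates the ambiguity.
The above is expressed symbolically as follows:
Known: $$D_1 = (A \mid T_{PLACE}(INS(BasketballCourt)))$$
$$K_3 = R_{IN}(T_{ACTIVITY}(PlayBasketball),\ T_{PLACE}(BasketballCourt))$$
$$K_4 = R_{LIKE}(T_{PERSON}(R_{DO}(person,\ T_{ACTIVITY}(PlayBasketball))),\ T_{ACTIVITY}(PlayBasketball))$$
The following can be derived:
Figure PCTCN2021118178-appb-000049
Figure PCTCN2021118178-appb-000050
Figure PCTCN2021118178-appb-000051
Approach C1 is implemented by the following algorithm:
The conflicting information resources are known:
Figure PCTCN2021118178-appb-000052
Figure PCTCN2021118178-appb-000053
Retrieve the relevant data resources $D_{related}$ from the data graph.
From $D_{related}$, combined with relevant information resources and knowledge resources, further derive an information resource $I_{new}$ that helps judge which is correct.
Determine the relationship between $I_{new}$ and [Figure PCTCN2021118178-appb-000054]; keep the result that $I_{new}$ supports and delete the other.
Set the result supported by $I_{new}$ as the final result, eliminating the ambiguity.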
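Approach C2: adding information resources
Suppose the relevant information resource $I_1$ (user A is a member of the school basketball team) is known, together with the knowledge resource $K_5$: members of the school basketball team often play basketball. Combining $I_1$ and $K_5$: user A is a member of the school basketball team, so user A often plays basketball. Combining this with $K_4$: people who often play basketball probably like playing basketball, so user A probably likes playing basketball, which supports the information resource [Figure PCTCN2021118178-appb-000055]. When the information resource [Figure PCTCN2021118178-appb-000056] has supporting data and the information resource [Figure PCTCN2021118178-appb-000057] has none, [Figure PCTCN2021118178-appb-000058] tends to be judged correct and [Figure PCTCN2021118178-appb-000059] wrong, which eliminates the ambiguity.
The above is expressed symbolically as follows:
Known: $$I_1 = R_{IS\_A\_MEMBER\_OF}(A,\ T_{GROUP}(INS(BasketballTeam)))$$
$$K_5 = R_{DO}(T_{PERSON}(R_{IS\_A\_MEMBER\_OF}(person,\ T_{GROUP}(BasketballTeam))),\ T_{ACTIVITY}(PlayBasketball))$$
The following can be derived:
Figure PCTCN2021118178-appb-000060
Figure PCTCN2021118178-appb-000061
Figure PCTCN2021118178-appb-000062
Approach C2 is implemented by the following algorithm:
The conflicting information resources are known:
Figure PCTCN2021118178-appb-000063
Figure PCTCN2021118178-appb-000064
Retrieve the relevant information resources $I_{related}$ from the information graph.
From $I_{related}$, combined with relevant data resources and knowledge resources, further derive an information resource $I_{new}$ that helps judge which is correct.
Determine the relationship between $I_{new}$ and [Figure PCTCN2021118178-appb-000065]; keep the result that $I_{new}$ supports and delete the other.
Set the result supported by $I_{new}$ as the final result, eliminating the ambiguity.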
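Scenario 4: handling ambiguity caused by redundant data resources.
The content contains both the data resource [Figure PCTCN2021118178-appb-000066] (today's temperature is 30 degrees) and the data resource [Figure PCTCN2021118178-appb-000067] (today's temperature is 20 degrees). It may correspond to the following data resources:
Figure PCTCN2021118178-appb-000068
Figure PCTCN2021118178-appb-000069
On the question of today's temperature, the data resources [Figure PCTCN2021118178-appb-000070] and [Figure PCTCN2021118178-appb-000071] contradict each other, so one of the redundant data resources [Figure PCTCN2021118178-appb-000072] and [Figure PCTCN2021118178-appb-000073] must be wrong. Relevant data resources or information resources can be added to help judge which of the redundant data resources is correct, thereby eliminating the ambiguity. The two cases, adding data resources and adding information resources, are discussed separately below.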
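Approach D1: adding data resources
Suppose the data resources $D_1$ (season: summer) and $D_2$ (place: Hainan) are known, together with the knowledge resource $K_1$: temperatures in Hainan are high in summer. Combining $D_1$, $D_2$ and $K_1$, it can be derived that today's temperature should be high, which supports the data resource [Figure PCTCN2021118178-appb-000074]. When the data resource [Figure PCTCN2021118178-appb-000075] has supporting data and the data resource [Figure PCTCN2021118178-appb-000076] has none, [Figure PCTCN2021118178-appb-000077] tends to be judged correct and [Figure PCTCN2021118178-appb-000078] wrong, which eliminates the ambiguity.
The above is expressed symbolically as follows:
Known: $$D_1 = (T_{SEASON}(Summer))$$
$$D_2 = (T_{PLACE}(Hainan))$$
$$K_1 = R_{IS}(R_{IN}(T_{PLACE}(Hainan),\ T_{SEASON}(Summer)),\ T_{TEMPERATURE}(High))$$
The following can be derived:
Figure PCTCN2021118178-appb-000079
Figure PCTCN2021118178-appb-000080
Approach D1 is implemented by the following algorithm:
The conflicting data resources are known:
Figure PCTCN2021118178-appb-000081
Figure PCTCN2021118178-appb-000082
Retrieve the relevant data resources $D_{related}$ from the data graph.
From $D_{related}$, combined with relevant information resources and knowledge resources, further derive a data resource $D_{new}$ that helps judge which is correct.
Determine the relationship between $D_{new}$ and [Figure PCTCN2021118178-appb-000083]; keep the result that $D_{new}$ supports and delete the other.
Set the result supported by $D_{new}$ as the final result, eliminating the ambiguity.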
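Approach D1 can be sketched as checking each conflicting reading against the temperature level that the context predicts. The 25-degree boundary between "High" and "Low" is an assumption introduced for this sketch (the source only says the temperature "should be high"), as are the function and parameter names.

```python
def resolve_by_context(readings, context, climate_knowledge, high_threshold=25):
    """Keep the reading that matches the level expected for the given context.

    climate_knowledge maps (place, season) to "High" or "Low" (the role of K1).
    high_threshold is an assumed boundary in degrees Celsius.
    """
    expected = climate_knowledge.get(context)
    if expected is None:
        return None  # no knowledge for this context: the conflict stays unresolved

    def level(celsius):
        return "High" if celsius >= high_threshold else "Low"

    kept = [r for r in readings if level(r) == expected]
    return kept[0] if len(kept) == 1 else None


climate_knowledge = {("Hainan", "Summer"): "High"}
today = resolve_by_context([30, 20], ("Hainan", "Summer"), climate_knowledge)
```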
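Approach D2: adding information resources
Suppose the information resource $I_1$ (the data resource [Figure PCTCN2021118178-appb-000084] comes from the meteorological bureau) and the information resource $I_2$ (the data resource [Figure PCTCN2021118178-appb-000085] comes from the Internet) are known, together with the knowledge resource $K_2$: data from professional institutions is more reliable than data from the Internet. Combining $I_1$, $I_2$ and $K_2$, it can be derived that the data resource [Figure PCTCN2021118178-appb-000086] is more reliable than the data resource [Figure PCTCN2021118178-appb-000087]. It can therefore be judged that [Figure PCTCN2021118178-appb-000088] is correct and [Figure PCTCN2021118178-appb-000089] is wrong, which eliminates the ambiguity.
The above is expressed symbolically as follows:
Known:
Figure PCTCN2021118178-appb-000090
Figure PCTCN2021118178-appb-000091
$$K_2 = R_{RELIABLE\_THAN}(T_{DATA}(R_{FROM}(data,\ T_{INSTITUTE})),\ T_{DATA}(R_{FROM}(data,\ T_{INTERNET})))$$
The following can be derived:
Figure PCTCN2021118178-appb-000092
Figure PCTCN2021118178-appb-000093
Approach D2 is implemented by the following algorithm:
The conflicting data resources are known:
Figure PCTCN2021118178-appb-000094
Figure PCTCN2021118178-appb-000095
Retrieve the relevant information resources $I_{related}$ from the information graph.
From $I_{related}$, combined with relevant data resources, further derive an information resource $I_{new}$ that helps judge which is correct.
Determine the relationship between $I_{new}$ and [Figure PCTCN2021118178-appb-000096]; keep the result that $I_{new}$ supports and delete the other.
Set the result supported by $I_{new}$ as the final result, eliminating the ambiguity.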
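Approach D2 reduces to ranking the sources of the conflicting data. In the sketch below, $K_2$ is encoded as a numeric ranking whose values are arbitrary, and the assignment of the 30-degree reading to the meteorological bureau is an assumption: the formula images that state which reading comes from which source are not available in this text.

```python
# K2 as a ranking: professional institutions outrank the Internet (values are arbitrary).
RELIABILITY = {"institute": 2, "internet": 1}


def most_reliable(readings):
    """readings: list of (value, source_kind). Return the value from the most reliable source."""
    value, _source = max(readings, key=lambda reading: RELIABILITY[reading[1]])
    return value


# Assumed provenance for illustration only.
today = most_reliable([(30, "institute"), (20, "internet")])
```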
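Both the detection of ambiguity and the addition of relevant typed resources to eliminate it require cross-modal conversion from existing typed resources to new typed resources. The typed resource that is the object of conversion is mainly either a data resource or an information resource; the two cases are discussed below.
Modal conversion case 1:
Suppose the object of conversion is a data resource: "the occupation of user A". It is expressed symbolically as follows:
$$D_0 = (A \mid T_{OCCUPATION}(INS(Student)))$$
$D_0$ can be derived in three ways: from data resources combined with knowledge resources, from information resources combined with knowledge resources, and from data resources together with information resources combined with knowledge resources. The three derivation modes are discussed in turn below.
Derivation from data resources combined with knowledge resources proceeds as follows:
Suppose there is a relevant data resource $D_1$: user A is 10 years old; and a relevant knowledge resource $K_1$: people younger than 15 should go to school. Combining $D_1$ and $K_1$: user A is 10 years old, which is less than 15, so user A should go to school. The target data resource "the occupation of user A is student" can then be derived.
The above is expressed symbolically as follows:
Known: $$D_1 = (A \mid T_{AGE}(10))$$
$$K_1 = R_{SHOULD}(T_{PERSON}(R_{LESS\_THAN}(T_{AGE},\ 15)),\ T_{ACTIVITY}(Education))$$
The following can be derived:
Figure PCTCN2021118178-appb-000097
Figure PCTCN2021118178-appb-000098
$$I_0 \rightarrow D_0 = (A \mid T_{OCCUPATION}(INS(Student)))$$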
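Derivation from information resources combined with knowledge resources proceeds as follows:
Suppose there are relevant information resources $I_1$: user A often goes to school; and $I_2$: user A does not have a teaching certificate. The knowledge resources are $K_2$: students and teachers need to go to school often; and $K_3$: teachers hold a teaching certificate. Combining $I_1$ and $K_2$: user A often goes to school, so user A is a student or a teacher. Combining $I_2$ and $K_3$: user A does not have a teaching certificate, so user A is not a teacher. Since user A is a student or a teacher and is not a teacher, the target data resource "the occupation of user A is student" can be derived.
The above is expressed symbolically as follows:
Known: $$I_1 = R_{GO\_TO}(A,\ T_{PLACE}(INS(School)))$$
$$I_2 = !R_{OWN}(A,\ T_{LICENCE}(INS(TeacherCertification)))$$
$$K_2 = R_{GO\_TO}(T_{OCCUPATION}(Student)\ AND\ T_{OCCUPATION}(Teacher),\ T_{PLACE}(School))$$
$$K_3 = R_{OWN}(T_{OCCUPATION}(Teacher),\ T_{LICENCE}(INS(TeacherCertification)))$$
The following can be derived:
Figure PCTCN2021118178-appb-000099
Figure PCTCN2021118178-appb-000100
Figure PCTCN2021118178-appb-000101
$$I_0 \rightarrow D_0 = (A \mid T_{OCCUPATION}(INS(Student)))$$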
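Derivation from data resources together with information resources, combined with knowledge resources, proceeds as follows:
Suppose there is a relevant data resource $D_1$: user A is 10 years old; and a relevant information resource $I_1$: user A often goes to school. The knowledge resources are $K_2$: students and teachers need to go to school often; and $K_4$: teachers are generally older than 20. Combining $I_1$ and $K_2$: user A often goes to school, so user A is a student or a teacher. Combining $D_1$ and $K_4$: user A is 10 years old and teachers are generally older than 20, so user A is not a teacher. Since user A is a student or a teacher and is not a teacher, the target data resource "the occupation of user A is student" can be derived.
The above is expressed symbolically as follows:
Known: $$D_1 = (A \mid T_{AGE}(10))$$
$$I_1 = R_{GO\_TO}(A,\ T_{PLACE}(INS(School)))$$
$$K_2 = R_{GO\_TO}(T_{OCCUPATION}(Student)\ AND\ T_{OCCUPATION}(Teacher),\ T_{PLACE}(School))$$
$$K_4 = R_{GREATER\_THAN}(T_{AGE}(T_{OCCUPATION}(Teacher)),\ 20)$$
The following can be derived:
Figure PCTCN2021118178-appb-000102
Figure PCTCN2021118178-appb-000103
Figure PCTCN2021118178-appb-000104
$$I_0 \rightarrow D_0 = (A \mid T_{OCCUPATION}(INS(Student)))$$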
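The mixed derivation above is an elimination argument: $I_1$ with $K_2$ proposes the candidates, and $D_1$ with $K_4$ removes one. A minimal sketch follows, with the function and parameter names assumed for illustration.

```python
def infer_occupation(age, goes_to_school, min_teacher_age=20):
    """Derive the occupation by elimination, or return None if it stays undetermined."""
    # I1 + K2: someone who often goes to school is a student or a teacher.
    candidates = {"Student", "Teacher"} if goes_to_school else set()
    # D1 + K4: teachers are generally older than min_teacher_age.
    if age <= min_teacher_age:
        candidates.discard("Teacher")
    return next(iter(candidates)) if len(candidates) == 1 else None


occupation = infer_occupation(age=10, goes_to_school=True)
```

For an adult who often goes to school the function returns `None`, reflecting that both candidates remain and further resources would be needed.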
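Modal conversion case 2:
Suppose the object of conversion is an information resource: "user A likes playing football". It is expressed symbolically as follows:
$$I_0 = R_{LIKE}(A,\ T_{ACTIVITY}(INS(PlaySoccer)))$$
$I_0$ can be derived in three ways: from data resources combined with knowledge resources, from information resources combined with knowledge resources, and from data resources together with information resources combined with knowledge resources. The three derivation modes are discussed in turn below.
Derivation from data resources combined with knowledge resources proceeds as follows:
Suppose there is a spatial data resource $D_1$ related to user A: football field. The relevant knowledge resources are $K_1$: the main purpose of a football field is playing football; and $K_2$: people who often play football like playing football. Combining $D_1$ and $K_1$: user A often appears on the football field, so user A often plays football. Combining this with $K_2$: people who often play football probably like playing football, so the target information resource "user A likes playing football" can be derived.
The above is expressed symbolically as follows:
Known: $$D_1 = (A \mid T_{PLACE}(INS(SoccerCourt)))$$
$$K_1 = R_{IN}(T_{ACTIVITY}(PlaySoccer),\ T_{PLACE}(SoccerCourt))$$
$$K_2 = R_{LIKE}(T_{PERSON}(R_{DO}(person,\ T_{ACTIVITY}(PlaySoccer))),\ T_{ACTIVITY}(PlaySoccer))$$
The following can be derived:
Figure PCTCN2021118178-appb-000105
Figure PCTCN2021118178-appb-000106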
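Derivation from information resources combined with knowledge resources proceeds as follows:
Suppose there is an information resource $I_1$: user A is a member of the school football team. The relevant knowledge resources are $K_2$: people who often play football like playing football; and $K_3$: members of the school football team often play football. Combining $I_1$ and $K_3$: user A is a member of the school football team, so user A often plays football. Combining this with $K_2$: people who often play football probably like playing football, so the target information resource "user A likes playing football" can be derived.
The above is expressed symbolically as follows:
Known: $$I_1 = R_{IS\_A\_MEMBER\_OF}(A,\ T_{GROUP}(INS(SoccerTeam)))$$
$$K_2 = R_{LIKE}(T_{PERSON}(R_{DO}(person,\ T_{ACTIVITY}(PlaySoccer))),\ T_{ACTIVITY}(PlaySoccer))$$
$$K_3 = R_{DO}(T_{PERSON}(R_{IS\_A\_MEMBER\_OF}(person,\ T_{GROUP}(SoccerTeam))),\ T_{ACTIVITY}(PlaySoccer))$$
The following can be derived:
Figure PCTCN2021118178-appb-000107
Figure PCTCN2021118178-appb-000108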
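Derivation from data resources together with information resources, combined with knowledge resources, proceeds as follows:
Suppose there is a reading data resource $D_2$ related to user A: football news; and an information resource $I_2$: user A likes sports. The knowledge resources are $K_4$: people who often read football news are interested in football events; and $K_5$: sports include playing football, playing basketball and so on. Combining $D_2$ and $K_4$: user A often reads football news, so user A is interested in football events. Because this interest may extend no further than watching matches, the information "user A is interested in football events" does not by itself show that user A likes playing football. Combining $I_2$ and $K_5$: user A likes sports, and sports include playing football. Because user A may be more interested in basketball or other sports, this information alone is also insufficient. However, taken together with the earlier conclusion that user A is interested in football events, the information "user A likes sports" allows the target information resource "user A likes playing football" to be derived.
The above is expressed symbolically as follows:
Known: $$D_2 = (A \mid T_{NEWS}(Soccer))$$
$$I_2 = R_{LIKE}(A,\ T_{ACTIVITY}(INS(SportsActivity)))$$
$$K_4 = R_{INTERESTED\_IN}(T_{PERSON}(R_{READ}(person,\ T_{NEWS}(Soccer))),\ T_{SPORTS}(Soccer))$$
$$K_5 = R_{INCLUDE}(T_{ACTIVITY}(SportsActivity),\ T_{ACTIVITY}(PlaySoccer,\ PlayBasketball,\ \ldots))$$
The following can be derived:
Figure PCTCN2021118178-appb-000109
Figure PCTCN2021118178-appb-000110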
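The point of the mixed derivation is that neither clue suffices alone; only their conjunction yields the target information resource. A minimal sketch follows, with the `SPORTS` set and function name assumed for illustration.

```python
SPORTS = {"PlaySoccer", "PlayBasketball"}  # K5, as a toy list


def likes_playing_soccer(reads_soccer_news, likes_sports):
    """Combine both clues; either one alone is treated as insufficient."""
    interested_in_soccer_events = reads_soccer_news  # D2 + K4
    soccer_is_a_sport = "PlaySoccer" in SPORTS       # K5
    return interested_in_soccer_events and likes_sports and soccer_is_a_sport


conclusion = likes_playing_soccer(reads_soccer_news=True, likes_sports=True)
```

A boolean conjunction is the simplest reading of the argument; a fuller treatment might attach a likelihood to the conclusion, since the source itself describes it as probable rather than certain.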
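Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a cross-DIKW-modality text ambiguity processing system oriented to essential computing and reasoning, provided by an embodiment of the present application.
The system may include:
a text analysis module 100, configured to acquire a target text and determine the target data resources and target information resources in the target text;
a meaning determination module 200, configured to query the relevant resources of the target text according to the target data resources and/or the target information resources, and to determine the textual meanings of the target text according to the relevant resources;
a resource supplement module 300, configured to, if the number of textual meanings of the target text is greater than 1, acquire supplementary resources of the target text and generate a conditional restriction text of the target text according to the supplementary resources;
a text modification module 400, configured to take the textual meaning that satisfies the conditional restriction text as the actual textual meaning of the target text, and to modify the target text according to the actual textual meaning.
In this embodiment, after the target text is acquired, the target data resources and target information resources it contains are determined; the relevant resources of the target text are queried according to them, and the textual meanings of the target text are determined according to the relevant resources. A conditional restriction text is generated from the supplementary resources of the target text, the textual meaning that satisfies it is taken as the actual textual meaning, and the target text is modified accordingly, which eliminates the ambiguity in the target text. This embodiment can therefore accurately identify and eliminate ambiguity in text.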
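Further, the text analysis module 100 includes:
a type determination unit, configured to determine the resource type of the target text, where the resource types include data resources, information resources and knowledge resources; data resources are resources in the data graph, information resources are resources in the information graph, and knowledge resources are resources in the knowledge graph;
a modal conversion unit, configured to perform cross-modal conversion on the target text to obtain the target data resources and target information resources, where cross-modal conversion is a conversion operation between any two of data resources, information resources, knowledge resources, and mixed data-and-information resources.
Further, the modal conversion unit is configured to judge whether the target text is a data resource: if so, the target text is set as the target data resource; if not, cross-modal conversion is performed on the target text to obtain the target data resource. It is also configured to judge whether the target text is an information resource: if so, the target text is set as the target information resource; if not, cross-modal conversion is performed on the target text to obtain the target information resource.
Further, the meaning determination module 200 is configured to acquire the associated text of the target text, and to query the relevant resources of the target text from the associated text according to the target data resources and/or the target information resources.
Further, the resource supplement module 300 is configured to take, as supplementary resources of the target text, the data resources in the data graph whose degree of association with the target data resource is greater than a preset value; and/or to take, as supplementary resources of the target text, the information resources in the information graph whose degree of association with the target information resource is greater than the preset value.
Further, the system also includes:
a text type determination module, configured to determine, after it has been determined that the number of textual meanings of the target text is greater than 1, that the target text is a text with missing data resources or information resources, or that the target text is a text with redundant data resources or redundant information resources.
Further, the meaning determination module 200 is configured to combine the relevant resources with each target data resource and each target information resource to derive the textual meanings of the target text.
Since the embodiments of the system correspond to the embodiments of the method, refer to the description of the method embodiments for the system embodiments; they are not repeated here.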
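The present application also provides a storage medium on which a computer program is stored; when the computer program is executed, the steps provided by the above embodiments can be implemented. The storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
The present application also provides an electronic device, which may include a memory and a processor. A computer program is stored in the memory, and when the processor invokes the computer program in the memory, the steps provided by the above embodiments can be implemented. The electronic device may of course also include components such as network interfaces and a power supply.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the parts that are the same or similar can be cross-referenced. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; refer to the description of the method for the relevant details. It should be noted that a person of ordinary skill in the art can make improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of the present application.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.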

Claims (10)

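  1. A cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning, characterized by comprising:
    acquiring a target text, and determining the target data resources and target information resources in the target text;
    querying the relevant resources of the target text according to the target data resources and/or the target information resources, and determining the textual meanings of the target text according to the relevant resources;
    if the number of textual meanings of the target text is greater than 1, acquiring supplementary resources of the target text, and generating a conditional restriction text of the target text according to the supplementary resources;
    taking the textual meaning that satisfies the conditional restriction text as the actual textual meaning of the target text, and modifying the target text according to the actual textual meaning.
  2. The method according to claim 1, characterized in that determining the target data resources and target information resources in the target text comprises:
    determining the resource type of the target text, where the resource types include data resources, information resources and knowledge resources; data resources are resources in the data graph, information resources are resources in the information graph, and knowledge resources are resources in the knowledge graph;
    performing cross-modal conversion on the target text to obtain the target data resources and target information resources, where cross-modal conversion is a conversion operation between any two of data resources, information resources, knowledge resources, and mixed data-and-information resources.
  3. The method according to claim 2, characterized in that performing cross-modal conversion on the target text to obtain the target data resources and target information resources comprises:
    judging whether the target text is a data resource; if so, setting the target text as the target data resource; if not, performing cross-modal conversion on the target text to obtain the target data resource;
    judging whether the target text is an information resource; if so, setting the target text as the target information resource; if not, performing cross-modal conversion on the target text to obtain the target information resource.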
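  4. The method according to claim 1, characterized in that querying the relevant resources of the target text according to the target data resources and/or the target information resources comprises:
    acquiring the associated text of the target text;
    querying the relevant resources of the target text from the associated text according to the target data resources and/or the target information resources.
  5. The method according to claim 1, characterized in that acquiring the supplementary resources of the target text comprises:
    taking, as supplementary resources of the target text, the data resources in the data graph whose degree of association with the target data resource is greater than a preset value;
    and/or taking, as supplementary resources of the target text, the information resources in the information graph whose degree of association with the target information resource is greater than the preset value.
  6. The method according to claim 1, characterized in that, after determining that the number of textual meanings of the target text is greater than 1, the method further comprises:
    determining that the target text is a text with missing data resources or information resources;
    or determining that the target text is a text with redundant data resources or redundant information resources.
  7. The method according to any one of claims 1 to 6, characterized in that determining the textual meanings of the target text according to the relevant resources comprises:
    combining the relevant resources with each target data resource and each target information resource to derive the textual meanings of the target text.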
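  8. A cross-DIKW-modality text ambiguity processing system oriented to essential computing and reasoning, characterized by comprising:
    a text analysis module, configured to acquire a target text and determine the target data resources and target information resources in the target text;
    a meaning determination module, configured to query the relevant resources of the target text according to the target data resources and/or the target information resources, and to determine the textual meanings of the target text according to the relevant resources;
    a resource supplement module, configured to, if the number of textual meanings of the target text is greater than 1, acquire supplementary resources of the target text and generate a conditional restriction text of the target text according to the supplementary resources;
    a text modification module, configured to take the textual meaning that satisfies the conditional restriction text as the actual textual meaning of the target text, and to modify the target text according to the actual textual meaning.
  9. An electronic device, characterized by comprising a memory and a processor, where a computer program is stored in the memory, and the processor, when invoking the computer program in the memory, implements the steps of the cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning according to any one of claims 1 to 7.
  10. A storage medium, characterized in that computer-executable instructions are stored in the storage medium, and when loaded and executed by a processor, the computer-executable instructions implement the steps of the cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning according to any one of claims 1 to 7.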
PCT/CN2021/118178 2020-10-15 2021-09-14 Cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning WO2022078145A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3136527A CA3136527C (en) 2020-10-15 2021-09-14 Cross-dikw-mode ambiguity processing method oriented to essential computing and reasoning

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011103480.6A CN112232085B (zh) 2020-10-15 2020-10-15 Cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning
CN202011103480.6 2020-10-15

Publications (1)

Publication Number Publication Date
WO2022078145A1 true WO2022078145A1 (zh) 2022-04-21

Family

ID=74117326

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/118178 WO2022078145A1 (zh) 2020-10-15 2021-09-14 Cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning

Country Status (2)

Country Link
CN (1) CN112232085B (zh)
WO (1) WO2022078145A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
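CN112232085B (zh) * 2020-10-15 2021-10-08 海南大学 Cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning
CN113538179A (zh) * 2021-06-11 2021-10-22 海南大学 DIKW-based intelligent patent application method and system
CN114039865B (zh) * 2021-08-30 2023-03-31 海南大学 Intent-computing-oriented cross-DIKW-modality transmission and optimization system
CN113810480B (zh) * 2021-09-03 2022-09-16 海南大学 Emotional communication method based on DIKW content objects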

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160124939A1 (en) * 2014-10-31 2016-05-05 International Business Machines Corporation Disambiguation in mention detection
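KR20190094078A (ko) * 2018-01-17 2019-08-12 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. Text processing method and apparatus based on ambiguous entity words
CN110969022A (zh) * 2018-09-29 2020-04-07 北京国双科技有限公司 Semantic determination method and related device
CN111651570A (zh) * 2020-05-13 2020-09-11 深圳追一科技有限公司 Text sentence processing method and apparatus, electronic device and storage medium
CN112232085A (zh) * 2020-10-15 2021-01-15 海南大学 Cross-DIKW-modality text ambiguity processing method oriented to essential computing and reasoning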

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8112402B2 (en) * 2007-02-26 2012-02-07 Microsoft Corporation Automatic disambiguation based on a reference resource
US9443005B2 (en) * 2012-12-14 2016-09-13 Instaknow.Com, Inc. Systems and methods for natural language processing
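CN106997399A (zh) * 2017-05-24 2017-08-01 海南大学 Design method for a classified question answering system based on an associated architecture of data graph, information graph, knowledge graph and wisdom graph
US20200073996A1 (en) * 2018-08-28 2020-03-05 Stitched.IO Limited Methods and Systems for Domain-Specific Disambiguation of Acronyms or Homonyms
CN111368548A (zh) * 2018-12-07 2020-07-03 北京京东尚科信息技术有限公司 Semantic recognition method and apparatus, electronic device and computer-readable storage medium
CN110633366B (zh) * 2019-07-31 2022-12-16 国家计算机网络与信息安全管理中心 Short text classification method, apparatus and storage medium
CN110704641B (zh) * 2019-10-11 2023-04-07 零犀(北京)科技有限公司 Ten-thousand-level intent classification method, apparatus, storage medium and electronic device
CN111538844B (zh) * 2020-03-20 2022-03-25 华为技术有限公司 Method and apparatus for generating a target-domain knowledge base and for question answering
CN111723188A (zh) * 2020-06-23 2020-09-29 宁波富万信息科技有限公司 Artificial-intelligence-based sentence display method for a question answering system, and electronic device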

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LEI YUXIAO, DUAN YUCONG: "Personality Classification and Conversion Method of Virtual Community Personnel Based on DIKW Graph", JOURNAL OF APPLIED SCIENCES - YINGYONG KEXUE XUEBAO, SHANGHAI, CN, vol. 38, no. 5, 1 September 2020 (2020-09-01), pages 803-824, XP055921314, ISSN: 0255-8297, DOI: 10.3969/j.issn.0255-8297.2020.05.011 *

Also Published As

Publication number Publication date
CN112232085B (zh) 2021-10-08
CN112232085A (zh) 2021-01-15

Similar Documents

Publication Publication Date Title
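WO2022078145A1 (zh) Cross-DIKW modal text ambiguity processing method oriented to essential computing and reasoning
WO2021000676A1 (zh) Question answering method, question answering apparatus, computer device, and storage medium
KR102564144B1 (ko) Method, apparatus, device, and medium for determining text relevance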
US10963794B2 (en) Concept analysis operations utilizing accelerators
US20160171095A1 (en) Identifying and Displaying Relationships Between Candidate Answers
US20180349355A1 (en) Artificial Intelligence Based Method and Apparatus for Constructing Comment Graph
US9336485B2 (en) Determining answers in a question/answer system when answer is not contained in corpus
JP2957702B2 (ja) Semantic object modeling system for generating relational database schemas
US8126915B2 (en) Expanding the scope of an annotation to an entity level
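CN107180045B (zh) Method for extracting geographic entity relationships contained in Internet text
CN111324752B (zh) Image and text retrieval method based on graph neural network structure modeling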
US20090182723A1 (en) Ranking search results using author extraction
CN112232082B (zh) Multi-semantic analysis method for multimodal DIKW content oriented to essential computing
US20180285448A1 (en) Producing personalized selection of applications for presentation on web-based interface
KR101441219B1 (ko) Automatic association of information entities
Cambazoglu et al. An intent taxonomy for questions asked in web search
Khalid et al. Supporting scholarly search by query expansion and citation analysis
Gasparetti Discovering prerequisite relations from educational documents through word embeddings
Faisal et al. A novel framework for social web forums’ thread ranking based on semantics and post quality features
He et al. Sentiment classification technology based on Markov logic networks
CN115878761B (zh) Event context generation method, device, and medium
Lu et al. A novel approach towards large scale cross-media retrieval
US9305103B2 (en) Method or system for semantic categorization
US20130138480A1 (en) Method and apparatus for exploring and selecting data sources
Bramer Inducer: a public domain workbench for data mining

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21879191

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21879191

Country of ref document: EP

Kind code of ref document: A1