CN111259659B - Information processing method and device - Google Patents

Information processing method and device

Info

Publication number
CN111259659B
Authority
CN
China
Prior art keywords
attribute
processed
knowledge type
candidate
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010034694.6A
Other languages
Chinese (zh)
Other versions
CN111259659A (en)
Inventor
李千
王赵煜
史亚冰
梁海金
蒋烨
张扬
朱勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010034694.6A priority Critical patent/CN111259659B/en
Publication of CN111259659A publication Critical patent/CN111259659A/en
Application granted granted Critical
Publication of CN111259659B publication Critical patent/CN111259659B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/22 — Matching criteria, e.g. proximity measures
    • G06F18/25 — Fusion techniques
    • G06F18/251 — Fusion techniques of input or preprocessed data

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the application discloses an information processing method and device. One embodiment of the method includes the following steps: acquiring a word combination to be processed, where the word combination includes an entity and an attribute of the entity; determining, in a preset structured data set, a knowledge type corresponding to the word combination, and determining attributes belonging to that knowledge type as candidate attributes, where there are at least two candidate attributes; and determining, based on the similarity between the at least two candidate attributes and the word combination, the candidate attribute corresponding to the attribute in the word combination. The method and device can quickly and accurately determine, in a preset structured data set, the candidate attribute corresponding to the attribute in a word combination, which helps to automatically associate unfamiliar word combinations with structured data, avoids manual effort, and improves association efficiency and accuracy.

Description

Information processing method and device
Technical Field
The embodiments of the application relate to the field of computer technology, in particular to the field of Internet technology, and more particularly to an information processing method and device.
Background
With the development of Internet technology, massive amounts of information are produced on the Internet every day. This information varies widely in source and content, and how to collect and organize it is a problem to be solved.
Because vocabulary is used flexibly, the same word may be used in multiple ways in different scenarios; as a result, collected vocabulary usually needs to be sorted manually.
Disclosure of Invention
The embodiment of the application provides an information processing method and device.
In a first aspect, an embodiment of the present application provides an information processing method, including: acquiring a word combination to be processed, where the word combination includes an entity and an attribute of the entity; determining, in a preset structured data set, a knowledge type corresponding to the word combination, and determining attributes belonging to that knowledge type as candidate attributes, where there are at least two candidate attributes; and determining, based on the similarity between the at least two candidate attributes and the word combination, the candidate attribute corresponding to the attribute in the word combination.
In some embodiments, the word combination to be processed further includes an attribute value associated with the attribute, and determining, in the preset structured data set, the knowledge type corresponding to the word combination includes: determining, in the preset structured data set, the knowledge type to which the concept of the entity belongs and the knowledge type to which the concept of the attribute value belongs, where there is at least one knowledge type for each of the entity and the attribute value.
In some embodiments, determining, in the preset structured data set, the knowledge type corresponding to the word combination to be processed includes: performing hypernym processing on the entity to obtain a hypernym of the entity; and determining, in the preset structured data set, the knowledge type corresponding to the hypernym of the entity, and taking it as the knowledge type corresponding to the word combination.
In some embodiments, the word combination to be processed further includes an attribute value associated with the attribute, and the method further includes: performing hypernym processing on the attribute value to obtain a hypernym of the attribute value. Determining the knowledge type corresponding to the hypernym of the entity and taking it as the knowledge type corresponding to the word combination then includes: determining the knowledge type corresponding to the hypernym of the entity and the knowledge type corresponding to the hypernym of the attribute value; and taking both as the knowledge types corresponding to the word combination to be processed.
In some embodiments, before the candidate attribute corresponding to the attribute in the word combination is determined based on the similarity between the at least two candidate attributes and the word combination, the method further includes: determining, for each of at least two of the entity, the attribute, and the attribute value in the word combination to be processed, features of that item, where each of the at least two items has at least two features; and fusing the features of the at least two items and taking the fusion result as the feature of the word combination. Determining the candidate attribute corresponding to the attribute based on the similarity between the at least two candidate attributes and the word combination then includes: ranking the similarities between the feature of the word combination and the features of the at least two candidate attributes; and taking the candidate attribute with the highest similarity in the resulting ranking as the candidate attribute corresponding to the attribute in the word combination.
In some embodiments, ranking the similarities between the feature of the word combination to be processed and the features of the at least two candidate attributes includes: inputting the feature of the word combination and the features of the at least two candidate attributes into a pre-trained ranking model, so that the ranking model ranks the similarities between them.
In some embodiments, the method further includes: determining, for each of at least one of the entity, the attribute, and the attribute value in the word combination to be processed, features of that item, where each feature includes a fusion of a Jaccard feature and a bag-of-words feature; fusing the features of the at least two items to obtain a target fusion feature; determining the similarity between the target fusion feature and the feature of each determined candidate attribute; and selecting, in descending order of similarity, a preset number or preset proportion of candidate attributes from the determined candidate attributes as the at least two candidate attributes.
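The pre-selection step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the character-level features, the equal-weight fusion, and the function names are all assumptions, since the patent only names a Jaccard feature and a bag-of-words feature without fixing how they are computed or fused.

```python
from collections import Counter
import math

def jaccard(a: str, b: str) -> float:
    """Character-level Jaccard similarity between two strings (assumed granularity)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between character bag-of-words count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def fused_score(phrase: str, candidate: str) -> float:
    # Fuse the Jaccard feature and the bag-of-words feature by averaging;
    # the equal weighting is an assumption, not specified by the patent.
    return 0.5 * jaccard(phrase, candidate) + 0.5 * bow_cosine(phrase, candidate)

def preselect(phrase: str, attrs: list, k: int = 2) -> list:
    """Keep the top-k candidate attributes by fused similarity (descending)."""
    return sorted(attrs, key=lambda a: fused_score(phrase, a), reverse=True)[:k]
```

Selecting a preset number `k` (or a proportion) of candidates this way keeps only the most promising attributes for the more expensive ranking-model step.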
In some embodiments, the pre-trained ranking model may be trained as follows: acquiring a sample set, where the sample set includes positive samples and negative samples, a positive sample includes a positive-sample word combination and an attribute sample, a negative sample includes a negative-sample word combination and an attribute sample, and the similarity between the features of the positive-sample word combination and the features of the attribute sample is greater than the similarity between the features of the negative-sample word combination and the features of the attribute sample; inputting a sample sequence composed of a plurality of samples in the sample set into a ranking model to be trained, and predicting a ranking of the similarities between the features within the samples of the sequence; and training the ranking model based on the predicted ranking to obtain the pre-trained ranking model.
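The positive/negative sample contrast described above is the classic setup for pairwise learning-to-rank. The sketch below uses a linear model with a pairwise hinge loss; the patent does not specify the model family or loss, so both are assumptions chosen for brevity.

```python
import random

def train_ranker(samples, dim, epochs=50, lr=0.1):
    """samples: list of (pos_features, neg_features) pairs, each a list[float].
    Learns a weight vector w such that w·pos > w·neg for each pair
    (pairwise hinge loss; model and loss choice are assumptions)."""
    w = [0.0] * dim
    for _ in range(epochs):
        random.shuffle(samples)
        for pos, neg in samples:
            margin = sum(wi * (p - n) for wi, p, n in zip(w, pos, neg))
            if margin < 1.0:  # violated pair: push the two scores apart
                for i in range(dim):
                    w[i] += lr * (pos[i] - neg[i])
    return w

def score(w, feats):
    """Similarity score the trained ranker assigns to a feature vector."""
    return sum(wi * f for wi, f in zip(w, feats))
```

After training, candidates can be ranked by `score` and the highest-scoring one taken as the match, as in the embodiments above.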
In some embodiments, acquiring the sample set includes: taking a word combination in the preset structured data set that belongs to the knowledge type and corresponds to a target attribute as the positive-sample word combination, where the target attribute is the candidate attribute corresponding to the attribute in the word combination to be processed; and taking a word combination in the preset structured data set that belongs to the knowledge type and does not correspond to the target attribute as the negative-sample word combination.
In a second aspect, an embodiment of the present application provides an information processing apparatus, including: an acquisition unit configured to acquire a word combination to be processed, where the word combination includes an entity and an attribute of the entity; a candidate determining unit configured to determine, in a preset structured data set, a knowledge type corresponding to the word combination, and determine attributes belonging to that knowledge type as candidate attributes, where there are at least two candidate attributes; and an attribute determining unit configured to determine, based on the similarity between the at least two candidate attributes and the word combination, the candidate attribute corresponding to the attribute in the word combination.
In some embodiments, the word combination to be processed further includes an attribute value associated with the attribute, and the candidate determining unit is configured to determine the knowledge type corresponding to the word combination in the preset structured data set as follows: determining, in the preset structured data set, the knowledge type to which the concept of the entity belongs and the knowledge type to which the concept of the attribute value belongs, where there is at least one knowledge type for each of the entity and the attribute value.
In some embodiments, the candidate determining unit is configured to determine, in the preset structured data set, the knowledge type corresponding to the word combination to be processed as follows: performing hypernym processing on the entity to obtain a hypernym of the entity; and determining, in the preset structured data set, the knowledge type corresponding to the hypernym of the entity, and taking it as the knowledge type corresponding to the word combination.
In some embodiments, the word combination to be processed further includes an attribute value associated with the attribute, and the apparatus further includes: a hypernym unit configured to perform hypernym processing on the attribute value to obtain a hypernym of the attribute value. The candidate determining unit is configured to determine the knowledge type corresponding to the hypernym of the entity, and take it as the knowledge type corresponding to the word combination, as follows: determining the knowledge type corresponding to the hypernym of the entity and the knowledge type corresponding to the hypernym of the attribute value; and taking both as the knowledge types corresponding to the word combination to be processed.
In some embodiments, the apparatus further includes: a feature determining unit configured to determine, before the candidate attribute corresponding to the attribute is determined based on the similarity between the at least two candidate attributes and the word combination, features for each of at least two of the entity, the attribute, and the attribute value in the word combination, where each of the at least two items has at least two features; and a fusion unit configured to fuse the features of the at least two items and take the fusion result as the feature of the word combination. The attribute determining unit is further configured to determine the candidate attribute corresponding to the attribute as follows: ranking the similarities between the feature of the word combination and the features of the at least two candidate attributes; and taking the candidate attribute with the highest similarity in the resulting ranking as the candidate attribute corresponding to the attribute in the word combination.
In some embodiments, the attribute determining unit is further configured to rank the similarities between the feature of the word combination to be processed and the features of the at least two candidate attributes as follows: inputting the feature of the word combination and the features of the at least two candidate attributes into a pre-trained ranking model, so that the ranking model ranks the similarities between them.
In some embodiments, the apparatus further includes: a first determining unit configured to determine, for each of at least one of the entity, the attribute, and the attribute value in the word combination to be processed, features of that item, where each feature includes a fusion of a Jaccard feature and a bag-of-words feature; a target determining unit configured to fuse the features of the at least two items to obtain a target fusion feature; a similarity determining unit configured to determine the similarity between the target fusion feature and the feature of each determined candidate attribute; and a selecting unit configured to select, in descending order of similarity, a preset number or preset proportion of candidate attributes from the determined candidate attributes as the at least two candidate attributes.
In some embodiments, the pre-trained ranking model may be trained as follows: acquiring a sample set, where the sample set includes positive samples and negative samples, a positive sample includes a positive-sample word combination and an attribute sample, a negative sample includes a negative-sample word combination and an attribute sample, and the similarity between the features of the positive-sample word combination and the features of the attribute sample is greater than the similarity between the features of the negative-sample word combination and the features of the attribute sample; inputting a sample sequence composed of a plurality of samples in the sample set into the ranking model to be trained, and predicting a ranking of the similarities between the features within the samples of the sequence; and training the ranking model to be trained based on the predicted ranking to obtain the pre-trained ranking model.
In some embodiments, acquiring the sample set includes: taking a word combination in the preset structured data set that belongs to the knowledge type and corresponds to the target attribute as the positive-sample word combination, where the target attribute is the candidate attribute corresponding to the attribute in the word combination to be processed; and taking a word combination in the preset structured data set that belongs to the knowledge type and does not correspond to the target attribute as the negative-sample word combination.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the information processing method.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as in any of the embodiments of the information processing method.
The information processing scheme provided by the embodiments of the application acquires a word combination to be processed, where the word combination includes an entity and an attribute of the entity; determines, in a preset structured data set, a knowledge type corresponding to the word combination, and determines attributes belonging to that knowledge type as candidate attributes, where there are at least two candidate attributes; and determines, based on the similarity between the at least two candidate attributes and the word combination, the candidate attribute corresponding to the attribute in the word combination. The scheme can quickly and accurately determine, in a preset structured data set, the candidate attribute corresponding to the attribute in a word combination, which helps to automatically associate unfamiliar word combinations with structured data, avoids manual effort, and improves association efficiency and accuracy.
Drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings, in which:
FIG. 1 is an exemplary system architecture diagram in which some embodiments of the present application may be applied;
FIG. 2 is a flow chart of one embodiment of an information processing method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of an information processing method according to the present application;
FIG. 4 is a flow chart of yet another embodiment of an information processing method according to the present application;
FIG. 5 is a schematic structural view of an embodiment of an information processing apparatus according to the present application;
FIG. 6 is a schematic diagram of a computer system suitable for implementing the electronic device of some embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the information processing methods or information processing apparatuses of the present application may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as video-type applications, live applications, instant messaging tools, mailbox clients, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices with display screens, including but not limited to smartphones, tablets, e-book readers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, such as a background server providing support for the terminal devices 101, 102, 103. The background server may analyze and process the received data such as the word combination, and feed back the processing result (for example, the candidate attribute corresponding to the attribute in the word combination) to the terminal device.
It should be noted that the information processing method provided in the embodiment of the present application may be executed by the server 105 or the terminal devices 101, 102, 103, and accordingly, the information processing apparatus may be provided in the server 105 or the terminal devices 101, 102, 103.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 is shown according to one embodiment of the information processing method of the present application. The information processing method comprises the following steps:
step 201, a word combination to be processed is obtained, wherein the word combination to be processed includes an entity and an attribute of the entity.
In this embodiment, the execution subject of the information processing method (e.g., the server or terminal device shown in FIG. 1) may acquire the word combination to be processed. The word combination may include at least two words; for example, it may include an entity, such as a person's name, and may further include an attribute of the entity.
Step 202, determine, in a preset structured data set, a knowledge type corresponding to the word combination to be processed, and determine attributes belonging to that knowledge type as candidate attributes, where there are at least two candidate attributes.
In this embodiment, the execution subject may determine, in the preset structured data set, the knowledge type (type) corresponding to the word combination to be processed. A knowledge type is a superordinate concept, for example a superordinate concept of an entity. The execution subject can determine the knowledge type corresponding to the word combination in various ways. For example, it may obtain the ID (identity) of the entity word in the word combination and determine the knowledge type to which that ID belongs. Alternatively, it may determine the hypernym of the entity, determine the knowledge type of a synonym of that hypernym, and take that knowledge type as the knowledge type corresponding to the word combination. In the preset structured data set, each knowledge type has several attributes belonging to it.
The preset structured data set is a data set conforming to a preset constraint (schema). For example, the schema may constrain a word combination to consist of three connected words, with the first and last words being names of people, e.g., the word combination "Zhang San - wife - Liu Ying".
For example, if an entity is a person's name, the knowledge types may include "singer", "person", and "thing". The attributes belonging to the "person" knowledge type, i.e., the attributes under that knowledge type, include "name", "wife", "blood type", "age", and "gender". Attributes belonging to the "thing" knowledge type include "name", "attribute", and so on.
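The knowledge-type-to-attributes relationship in this example can be sketched as a simple lookup. The schema contents and function name below are hypothetical, filled in from the example for illustration only.

```python
# Hypothetical structured data set: each knowledge type maps to the
# attributes that belong to it (contents taken from the example above).
SCHEMA = {
    "singer": ["name", "representative work"],
    "person": ["name", "wife", "blood type", "age", "gender"],
    "thing":  ["name", "attribute"],
}

def candidate_attributes(knowledge_types):
    """Collect the attributes under each matched knowledge type
    as the pool of candidate attributes for step 203."""
    cands = []
    for kt in knowledge_types:
        cands.extend(SCHEMA.get(kt, []))
    return cands
```

A word combination matched to the "person" type would thus yield "name", "wife", "blood type", "age", and "gender" as its candidate attributes.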
Step 203, determine, based on the similarity between the at least two candidate attributes and the word combination to be processed, the candidate attribute corresponding to the attribute in the word combination.
In this embodiment, the execution subject may determine the candidate attribute corresponding to the attribute in the word combination based on the similarity between the word combination to be processed and each of the at least two candidate attributes. In practice, this can be done in various ways. For example, the execution subject may rank the similarities between each candidate attribute and the word combination, and take the candidate attribute with the highest similarity as the one corresponding to the attribute in the word combination.
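The rank-and-take-highest step can be expressed compactly. This is a sketch only: the dot-product similarity and the names `best_candidate` and `cand_feats` are assumptions, since the patent leaves the similarity measure open.

```python
def dot(u, v):
    """Dot-product similarity between two equal-length feature vectors
    (one possible similarity measure; the patent does not fix one)."""
    return sum(a * b for a, b in zip(u, v))

def best_candidate(phrase_feat, cand_feats, sim=dot):
    """cand_feats: mapping from candidate-attribute name to feature vector.
    Returns the candidate whose feature is most similar to the word
    combination's feature, i.e., the top of the similarity ranking."""
    return max(cand_feats, key=lambda name: sim(phrase_feat, cand_feats[name]))
```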
Specifically, the similarity between a candidate attribute and the word combination to be processed may be the similarity between the feature of the candidate attribute and the feature of the word combination. The features of the word combination here may include features of the attribute in the word combination, features of the attribute value, features of the entity, and so on.
When the execution subject determines the candidate attribute corresponding to the attribute, it establishes a mapping between the attribute and the candidate attribute. In this way, the execution subject can use the association between the candidate attribute and other words in the preset structured data set as the association of the attribute, and can also use the knowledge type to which the candidate attribute belongs as the knowledge type of the attribute.
Optionally, after step 203, the execution subject may compare the similarity of the candidate attribute corresponding to the attribute in the word combination with a preset similarity threshold. If the similarity is greater than or equal to the threshold, the candidate attribute can be associated with the preset structured data set, or with a knowledge graph. If it is smaller than the threshold, the candidate attribute is discarded.
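The optional threshold check can be sketched as a small guard. The threshold value 0.8 and the function name are assumptions; the patent only says a preset threshold is used.

```python
SIM_THRESHOLD = 0.8  # preset similarity threshold; the value is an assumption

def associate_or_discard(candidate, similarity, dataset):
    """Associate the matched candidate attribute into the structured data
    set (or a knowledge graph) only if its similarity clears the preset
    threshold; otherwise discard it."""
    if similarity >= SIM_THRESHOLD:
        dataset.append(candidate)
        return True
    return False  # below threshold: candidate is discarded
```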
The method provided by this embodiment can quickly and accurately determine, in a preset structured data set, the candidate attribute corresponding to the attribute in a word combination, which helps to automatically associate unfamiliar word combinations with structured data, avoids manual effort, and improves association efficiency and accuracy.
In some optional implementations of this embodiment, the word combination to be processed further includes an attribute value associated with the attribute, and step 202 may include: determining, in the preset structured data set, the knowledge type to which the concept of the entity belongs and the knowledge type to which the concept of the attribute value belongs, where there is at least one knowledge type for each of the entity and the attribute value.
In these optional implementations, the execution subject may determine the knowledge type directly from the preset structured data set. In particular, a word combination may be an SPO triple, i.e., entity, attribute, attribute value (Subject-Predicate-Object). The preset structured data set has at least one level of knowledge types; for example, three levels of knowledge types might be "singer", "person", and "thing". The execution subject can determine the concept of the entity, i.e., its paraphrase, and determine the knowledge type to which that concept belongs in the preset structured data set. It may also determine the concept of the attribute value and the knowledge type to which that concept belongs. Specifically, in the preset structured data set, several concepts belong to each knowledge type, where a concept may be the concept of an entity, the concept of an attribute value, and so on.
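The concept-to-knowledge-type lookup for both ends of an SPO triple can be sketched as below. The concept index contents and function name are hypothetical, used only to illustrate the union of the two lookups.

```python
# Hypothetical concept index: each concept (paraphrase) maps to the
# knowledge types it belongs to in the preset structured data set.
CONCEPT_TYPES = {
    "person name": ["person"],
    "occupation":  ["person", "thing"],
}

def types_for_spo(subject_concept, object_concept):
    """Combine the knowledge types of the entity's concept (Subject) and
    the attribute value's concept (Object) as the knowledge types
    corresponding to the word combination."""
    types = (CONCEPT_TYPES.get(subject_concept, [])
             + CONCEPT_TYPES.get(object_concept, []))
    return list(dict.fromkeys(types))  # deduplicate, preserving order
```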
The execution subject may combine the knowledge types determined for the entity and the attribute value as the knowledge type corresponding to the word combination to be processed.
These implementations can accurately determine the knowledge type directly from the preset structured data set.
In some optional implementations of this embodiment, step 202 may include: performing hypernym processing on the entity to obtain a hypernym of the entity; and determining, in the preset structured data set, the knowledge type corresponding to the hypernym of the entity, and taking it as the knowledge type corresponding to the word combination to be processed.
In these alternative implementations, the execution body may perform hypernym processing on the entity to obtain a hypernym of the entity. The execution body may then determine the knowledge type corresponding to the hypernym and use it as the knowledge type corresponding to the phrase to be processed. Specifically, performing hypernym processing on an original word means generalizing it upward to obtain a hypernym that covers a larger range, with higher generality, than the original word.
The execution body may determine the knowledge type corresponding to the hypernym of the entity in various manners. For example, the execution body may simply use, among the knowledge types in the preset structured data set, the knowledge type completely consistent with the hypernym as the knowledge type corresponding to the hypernym of the entity. As another example, the execution body may use a knowledge type whose word-sense similarity to the hypernym is greater than a preset value, for example 95%, as the knowledge type corresponding to the hypernym of the entity. Further, the execution body may first determine the knowledge type completely consistent with the hypernym, then determine the knowledge types whose semantic similarity to the hypernym is greater than a preset threshold, and use both as the knowledge types corresponding to the hypernym of the entity.
These implementations can determine the knowledge type more comprehensively through hypernyms, improving the recall rate of knowledge-type determination.
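The two matching strategies above — exact consistency plus a similarity threshold — can be sketched as follows. The type list is invented, and `difflib.SequenceMatcher` is only a surface-string placeholder for the word-sense similarity the patent describes, so that the sketch runs without a semantic model.

```python
from difflib import SequenceMatcher

# Illustrative knowledge types; a real structured data set would supply these.
KNOWN_TYPES = ["singer", "person", "musician", "thing"]

def match_types(hypernym, threshold=0.8):
    """Knowledge types matching a hypernym: exact hits first, then types
    whose (placeholder) similarity to the hypernym exceeds the threshold."""
    matches = [t for t in KNOWN_TYPES if t == hypernym]   # fully consistent
    for t in KNOWN_TYPES:
        sim = SequenceMatcher(None, hypernym, t).ratio()  # stand-in similarity
        if t not in matches and sim > threshold:
            matches.append(t)
    return matches

print(match_types("singer"))   # exact hit
print(match_types("persons"))  # near-match above the threshold
```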
In some optional application scenarios of these implementations, the phrase to be processed also includes an attribute value of the attribute. The method may further include: performing hypernym processing on the attribute value to obtain a hypernym of the attribute value. In these scenarios, determining the knowledge type corresponding to the hypernym of the entity and using it as the knowledge type corresponding to the phrase to be processed may include: determining the knowledge type corresponding to the hypernym of the entity, and determining the knowledge type corresponding to the hypernym of the attribute value; and using both the knowledge type corresponding to the hypernym of the entity and the knowledge type corresponding to the hypernym of the attribute value as the knowledge types corresponding to the phrase to be processed.
In these application scenarios, the execution body may perform hypernym processing not only on the entity but also on the attribute value, so as to obtain a hypernym of the attribute value, and may determine the knowledge type corresponding to the hypernym of the attribute value in the same manner as for the hypernym of the entity. The execution body may then use the knowledge type corresponding to the hypernym of the entity together with the knowledge type corresponding to the hypernym of the attribute value as the knowledge types corresponding to the word combination to be processed.
These application scenarios determine hypernyms of the attribute value as well, so that the attribute value is utilized in addition to the entity, further improving the recall rate of knowledge-type determination.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the information processing method according to the present embodiment. In the application scenario of fig. 3, the execution body 301 may acquire a phrase to be processed, "Zhang San-Fu" 302, where the phrase to be processed includes the entity "Zhang San" and the attribute "Fu" of the entity. The execution body 301 determines, in a preset structured data set, the knowledge types "singer" and "person" 303 corresponding to the word combination to be processed, and determines the attributes belonging to those knowledge types as candidate attributes 304, where the candidate attributes number 25. Based on the similarity between each candidate attribute and the word combination to be processed, the execution body 301 determines the candidate attribute "wife" 305 corresponding to the attribute "Fu" in the word combination to be processed.
With further reference to fig. 4, a flow 400 of yet another embodiment of an information processing method is shown. The flow 400 of the information processing method includes the steps of:
Step 401, obtaining a word combination to be processed, wherein the word combination to be processed includes an entity and an attribute of the entity.
In this embodiment, the execution body of the information processing method (e.g., the server or the terminal device shown in fig. 1) may acquire the phrase to be processed. The word combination here may comprise a plurality of words; for example, it may comprise an entity, such as a person's name. Furthermore, the phrase may also include an attribute of the entity.
Step 402, determining a knowledge type corresponding to the word combination to be processed in a preset structured data set, and determining an attribute belonging to the knowledge type as a candidate attribute, wherein the candidate attribute comprises at least two.
In this embodiment, the executing body may determine, in a preset structured data set, a knowledge type corresponding to a word combination to be processed. The execution body can determine the knowledge type corresponding to the word combination in various modes, and determine at least two candidate attributes.
Step 403, determining, for each of at least two of the entity, the attribute, and the attribute value in the word combination to be processed, the features of that item, wherein each of the at least two items has at least two features.
In this embodiment, the execution body may determine features for the various parts of the word combination. The feature here may be a feature from which common substrings have been removed. Specifically, the features may include at least one of: a Jaccard feature, a bag-of-words (BoW) feature, a feature obtained by word-embedding processing with a Generalized Regression Neural Network (GRNN) model, and a word-embedding feature obtained with a skip-gram model.
In practice, the feature of an entity, attribute, or attribute value may be at least one of: features obtained by general word segmentation, features obtained by refined word segmentation (whose granularity is smaller than that of general word segmentation), and features of hypernyms. Furthermore, the features of the word combination may also include co-occurrence features between the entity and the attribute value in the word combination, and Jaccard features of the entity and the attribute value within the determined knowledge type.
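Two of the features named above can be sketched in a few lines. Character-level tokenization for the Jaccard feature and whitespace tokenization for the bag-of-words feature are simplifying assumptions; the patent's word-segmentation granularity would replace them in practice.

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity between the character sets of two strings:
    |intersection| / |union|."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def bag_of_words(text, vocab):
    """Bag-of-words count vector of a text over a fixed vocabulary."""
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

print(jaccard("wife", "wifes"))  # 0.8 (4 shared chars out of 5 total)
print(bag_of_words("the singer the wife", ["singer", "wife", "the"]))
# [1, 1, 2]
```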
Step 404, fusing the features of the at least two items, and taking the fusion result as the feature of the word combination to be processed.
In this embodiment, the execution body may determine the features of the various parts of the word combination and fuse them. In particular, the features are vectors, so fusion between features may be implemented as vector concatenation.
In practice, a variety of features may be determined for each item, such as the Jaccard feature and the bag-of-words feature. The execution body may compute a weighted average of the various features of any one item, using weights preset for those features, and use the weighted-average result as the feature of that item.
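The two fusion schemes described above — concatenating the features of the different parts into one vector, and weight-averaging the several features of a single part — can be sketched as follows. The feature values and weights are invented for illustration.

```python
def concat_features(*feature_vectors):
    """Fuse features of different parts (entity, attribute, value)
    by vector concatenation."""
    fused = []
    for v in feature_vectors:
        fused.extend(v)
    return fused

def weighted_average(vectors, weights):
    """Fuse several same-length feature vectors of one item by a
    weighted average with preset weights."""
    assert len(vectors) == len(weights)
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]

entity_feat, value_feat = [0.5, 0.75], [0.25, 0.25]
print(concat_features(entity_feat, value_feat))             # [0.5, 0.75, 0.25, 0.25]
print(weighted_average([entity_feat, value_feat], [3, 1]))  # [0.4375, 0.625]
```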
Step 405, ranking the similarities between the feature of the word combination to be processed and the features of the at least two candidate attributes.
In this embodiment, the execution body may determine the similarity between the feature of the phrase to be processed and the feature of each candidate attribute, and rank the similarities to obtain the similarity sequence.
In some alternative implementations of the present embodiment, step 405 may include: and inputting the feature of the word combination to be processed and the features of the at least two candidate attributes into a pre-trained ranking model to rank the similarity between the feature of the word combination to be processed and the features of the at least two candidate attributes through the pre-trained ranking model.
In these alternative implementations, the execution body may output the ranked similarities through a pre-trained ranking model. The ranking model here may be an LTR (learning to rank) model. Specifically, the execution body may input a sequence to the ranking model, where each element in the sequence includes the feature of the word combination to be processed and the feature of one candidate attribute; different elements include the features of different candidate attributes. The execution body may use the ranking model to determine the similarity between the word-combination feature and the candidate-attribute feature in each element, then rank the similarities with the ranking model and obtain the similarity sequence.
These implementations may utilize a ranking model to accurately rank the similarity.
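As a rough stand-in for the ranking step, the sketch below scores each candidate attribute's feature vector against the word-combination feature with cosine similarity and sorts descending. A trained LTR model would replace the cosine scorer, and all feature values and attribute names are invented.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_candidates(query_feat, candidates):
    """candidates: list of (name, feature) pairs.
    Returns (name, score) pairs sorted by descending similarity."""
    scored = [(name, cosine(query_feat, feat)) for name, feat in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = rank_candidates([1.0, 0.0],
                         [("wife", [0.9, 0.1]), ("birthplace", [0.1, 0.9])])
print(ranked[0][0])  # 'wife' — the top of the similarity sequence
```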
In some alternative application scenarios of these implementations, the pre-trained ranking model may be trained as follows: acquiring a sample set, where the sample set includes positive samples and negative samples, a positive sample includes a positive-sample word combination and an attribute sample, a negative sample includes a negative-sample word combination and an attribute sample, and the similarity between the features of the positive-sample word combination and the features of the attribute sample is greater than the similarity between the features of the negative-sample word combination and the features of the attribute sample; inputting a sample sequence consisting of a plurality of samples from the sample set into the ranking model to be trained, and predicting a ranking result of the similarities among the features within the samples of the sequence; and training the ranking model to be trained based on the predicted ranking result, to obtain the pre-trained ranking model.
In these optional application scenarios, the execution body may input samples from the sample set into the ranking model to be trained, so as to predict the similarity between the features in each input sample and the order among those similarities. The samples input at any one time may include positive samples and/or negative samples. The samples in the sample set may be input in one or more batches, each batch being a sample sequence comprising a plurality of samples.
Specifically, the features within each sample herein refer to the features of the positive sample word combination and the features of the attribute sample, or the features of the negative sample word combination and the features of the attribute sample.
In practice, the similarity between the features of a positive-sample word combination and the features of the attribute sample may be greater than a similarity threshold. The similarity between the features of a negative-sample word combination and the features of the attribute sample may be less than that similarity threshold, or less than another similarity threshold whose value is smaller than the first.
These application scenarios can train the ranking model with positive and negative samples, thereby obtaining an accurate ranking model.
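The training requirement above — that a positive sample's similarity should exceed a negative sample's — is commonly expressed as a pairwise hinge loss. The hinge formulation and the margin value below are assumptions for illustration; the embodiment only specifies the ordering constraint, not a particular loss.

```python
def pairwise_hinge_loss(pos_score, neg_score, margin=0.2):
    """Pairwise ranking loss: zero when the positive sample's similarity
    beats the negative sample's by at least `margin`, linear otherwise."""
    return max(0.0, margin - (pos_score - neg_score))

print(pairwise_hinge_loss(0.9, 0.3))    # 0.0 — well separated, no penalty
print(pairwise_hinge_loss(0.5, 0.45))   # small positive loss — too close
```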
Optionally, acquiring the sample set in these application scenarios may include: using word combinations in the preset structured data set that belong to the knowledge type and correspond to a target attribute as positive-sample word combinations, where the target attribute is the candidate attribute corresponding to the attribute in the word combination to be processed; and using word combinations in the preset structured data set that belong to the knowledge type but do not correspond to the target attribute as negative-sample word combinations.
In these alternative application scenarios, word combinations already associated with the preset structured data set that belong to the above knowledge type and correspond to the target attribute may be used as positive-sample word combinations. In addition, the execution body may use word combinations that belong to the knowledge type but do not correspond to the target attribute as negative-sample word combinations. Specifically, the preset structured data set contains word combinations corresponding to each attribute, and word combinations not corresponding to a given attribute also exist.
Using word combinations from the preset structured data set as samples increases the number of samples, making the trained ranking model more accurate.
Step 406, taking the candidate attribute corresponding to the highest similarity in the obtained similarity sequence as the candidate attribute corresponding to the attribute in the word combination to be processed.
In this embodiment, the execution body may use the candidate attribute with the highest similarity as the candidate attribute corresponding to the attribute in the word combination to be processed.
According to this embodiment, determining multiple features for at least two parts of the word combination represents the word combination more accurately, thereby improving the accuracy of similarity determination.
In some optional implementations of this embodiment, the method may further include: for each of at least one of the entity, the attribute, and the attribute value in the phrase to be processed, determining the features of that item, where the features of each item include a fusion of the Jaccard feature and the bag-of-words feature; fusing the features of the at least two items to obtain a target fusion feature; determining the similarity between the target fusion feature and the feature of each determined candidate attribute; and selecting, from the determined candidate attributes, a preset number or a preset proportion of candidate attributes in descending order of similarity, as the at least two candidate attributes.
In these alternative implementations, the execution body may perform screening of candidate attributes before step 403, so as to screen the at least two candidate attributes. Specifically, the execution subject may determine, for each of at least one of the entity, the attribute, and the attribute value, a plurality of features of the one, including a jaccard feature and a bag of words feature.
The execution body may obtain the similarity sequence after ranking the similarities. In this way, the execution body may take, from the similarity sequence, a preset number or a preset proportion of similarities in descending order, and use the candidate attributes corresponding to those similarities as the at least two candidate attributes in steps 402 and 405.
These implementations first perform a preliminary screening of candidate attributes, improving the efficiency of determining the candidate attribute corresponding to the word combination.
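The preliminary screening can be sketched as a top-k (or top-proportion) cut over similarity scores. Candidate names and scores below are illustrative.

```python
def prescreen(scored_candidates, top_k=None, top_ratio=None):
    """scored_candidates: list of (name, similarity) pairs.
    Keeps either a preset number (top_k) or a preset proportion
    (top_ratio) of candidates, in descending order of similarity."""
    ordered = sorted(scored_candidates, key=lambda pair: pair[1], reverse=True)
    if top_k is None:
        top_k = max(1, int(len(ordered) * top_ratio))
    return [name for name, _ in ordered[:top_k]]

cands = [("wife", 0.92), ("spouse", 0.88), ("birthplace", 0.31), ("height", 0.12)]
print(prescreen(cands, top_k=2))        # ['wife', 'spouse']
print(prescreen(cands, top_ratio=0.5))  # ['wife', 'spouse']
```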
With further reference to fig. 5, as an implementation of the method shown in the foregoing figures, the present application provides an embodiment of an information processing apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the embodiment of the apparatus may further include the same or corresponding features or effects as the embodiment of the method shown in fig. 2, except for the features described below. The device can be applied to various electronic equipment.
As shown in fig. 5, the information processing apparatus 500 of the present embodiment includes: an acquisition unit 501, a candidate determination unit 502, and an attribute determination unit 503. Wherein the obtaining unit 501 is configured to obtain a word combination to be processed, where the word combination to be processed includes an entity, and an attribute of the entity; a candidate determining unit 502 configured to determine, in a preset structured data set, a knowledge type corresponding to the to-be-processed phrase, and determine an attribute belonging to the knowledge type as a candidate attribute, where the candidate attribute includes at least two; the attribute determining unit 503 is configured to determine a candidate attribute corresponding to an attribute in the word combination to be processed based on the similarity between at least two candidate attributes and the word combination to be processed.
In some embodiments, the acquisition unit 501 of the information processing apparatus 500 may acquire the phrase to be processed. The word combination here may comprise at least two words, e.g. may comprise an entity, e.g. the entity may be a person's name. Furthermore, the phrase may also include attributes of the entity.
In some embodiments, the candidate determining unit 502 may determine, in a preset structured data set, a knowledge type corresponding to the word combination to be processed. The knowledge type is a superordinate concept, for example, may be a superordinate concept of an entity. The execution body can determine the knowledge type corresponding to the word combination in various modes.
In some embodiments, the attribute determining unit 503 may determine, based on the word combination to be processed and the similarity between each candidate attribute in the at least two candidate attributes, a candidate attribute corresponding to the attribute in the word combination. In practice, the execution entity may determine the candidate attribute corresponding to the attribute in the word combination in various manners.
In some optional implementations of this embodiment, the phrase to be processed further includes an attribute value associated with the attribute; the candidate determining unit is configured to determine, in the preset structured data set, the knowledge type corresponding to the phrase to be processed as follows: determining, in the preset structured data set, the knowledge type to which the concept of the entity belongs and the knowledge type to which the concept of the attribute value belongs, where there is at least one knowledge type for the entity and at least one knowledge type for the attribute value.
In some optional implementations of this embodiment, the candidate determining unit is configured to determine, in the preset structured data set, the knowledge type corresponding to the phrase to be processed as follows: performing hypernym processing on the entity to obtain a hypernym of the entity; and determining, in the preset structured data set, a knowledge type corresponding to the hypernym of the entity, and using it as the knowledge type corresponding to the phrase to be processed.
In some optional implementations of this embodiment, the phrase to be processed further includes an attribute value associated with the attribute; the apparatus further comprises: a hypernym unit configured to perform hypernym processing on the attribute value to obtain a hypernym of the attribute value; and the candidate determining unit is configured to determine the knowledge type corresponding to the hypernym of the entity, and use it as the knowledge type corresponding to the phrase to be processed, as follows: determining the knowledge type corresponding to the hypernym of the entity, and determining the knowledge type corresponding to the hypernym of the attribute value; and using both the knowledge type corresponding to the hypernym of the entity and the knowledge type corresponding to the hypernym of the attribute value as the knowledge types corresponding to the word combination to be processed.
In some optional implementations of this embodiment, the apparatus further includes: a feature determining unit configured to determine, before the candidate attribute corresponding to the attribute in the word combination to be processed is determined based on the similarity between the at least two candidate attributes and the word combination to be processed, the features of each of at least two of the entity, the attribute, and the attribute value in the word combination to be processed, where each of the at least two items has at least two features; and a fusion unit configured to fuse the features of each of the at least two items and take the fusion result as the feature of the phrase to be processed. The attribute determining unit is further configured to determine the candidate attribute corresponding to the attribute in the word combination to be processed, based on the similarity between the at least two candidate attributes and the word combination to be processed, as follows: ranking the similarities between the feature of the phrase to be processed and the features of the at least two candidate attributes; and taking the candidate attribute corresponding to the highest similarity in the obtained similarity sequence as the candidate attribute corresponding to the attribute in the word combination to be processed.
In some optional implementations of the present embodiment, the attribute determining unit is further configured to perform the ranking of the similarity between the feature of the phrase to be processed and the feature of at least two of the candidate attributes as follows: and inputting the feature of the word combination to be processed and the features of at least two candidate attributes into a pre-trained ranking model to rank the similarity between the feature of the word combination to be processed and the features of the at least two candidate attributes through the pre-trained ranking model.
In some optional implementations of this embodiment, the apparatus further includes: a first determining unit configured to determine, for each of at least one of the entity, the attribute, and the attribute value in the word combination to be processed, the features of that item, where the features of each item include a fusion of the Jaccard feature and the bag-of-words feature; a target determining unit configured to fuse the features of each of the at least two items to obtain a target fusion feature; a similarity determining unit configured to determine the similarity between the target fusion feature and the feature of each determined candidate attribute; and a selecting unit configured to select, from the determined candidate attributes, a preset number or a preset proportion of candidate attributes in descending order of similarity, as the at least two candidate attributes.
In some optional implementations of the present embodiment, the pre-trained ranking model may be trained as follows: acquiring a sample set, where the sample set includes positive samples and negative samples, a positive sample includes a positive-sample word combination and an attribute sample, a negative sample includes a negative-sample word combination and an attribute sample, and the similarity between the features of the positive-sample word combination and the features of the attribute sample is greater than the similarity between the features of the negative-sample word combination and the features of the attribute sample; inputting a sample sequence consisting of a plurality of samples from the sample set into the ranking model to be trained, and predicting a ranking result of the similarities among the features within the samples of the sequence; and training the ranking model to be trained based on the predicted ranking result, to obtain the pre-trained ranking model.
In some optional implementations of this embodiment, acquiring the sample set includes: using word combinations in the preset structured data set that belong to the knowledge type and correspond to the target attribute as the positive-sample word combinations, where the target attribute is the candidate attribute corresponding to the attribute in the word combination to be processed; and using word combinations in the preset structured data set that belong to the knowledge type but do not correspond to the target attribute as the negative-sample word combinations.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic device 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 6 may represent one device or a plurality of devices as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing means 601. It should be noted that, the computer readable medium according to the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In an embodiment of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
Whereas in embodiments of the present disclosure, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software, or may be implemented by hardware. The described units may also be provided in a processor, for example, described as: a processor includes an acquisition unit, a candidate determination unit, and an attribute determination unit. The names of these units do not constitute a limitation on the unit itself in some cases, and for example, the acquisition unit may also be described as "a unit that acquires a phrase to be processed".
As another aspect, the present application also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be present alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring word combinations to be processed, wherein the word combinations to be processed comprise an entity and attributes of the entity; determining a knowledge type corresponding to the word combination to be processed in a preset structured data set, and determining an attribute belonging to the knowledge type as a candidate attribute, wherein the candidate attribute comprises at least two; and determining the candidate attribute corresponding to the attribute in the word combination to be processed based on the similarity between at least two candidate attributes and the word combination to be processed.
The foregoing description is only a preferred embodiment of the present application and an illustration of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the invention referred to in this application is not limited to the specific combinations of the features described above, and is also intended to cover other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example, solutions formed by interchanging the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (15)

1. An information processing method, the method comprising:
acquiring word combinations to be processed, wherein the word combinations to be processed comprise an entity and attributes of the entity;
determining, in a preset structured data set, a knowledge type corresponding to the word combination to be processed, and determining attributes belonging to the knowledge type as candidate attributes, wherein there are at least two candidate attributes, the preset structured data set is a data set conforming to a preset constraint, the preset constraint is a constraint on the words in a word combination, the knowledge type is a superordinate concept of the entity, at least one level of knowledge type exists in the preset structured data set, and each knowledge type has attributes belonging to that knowledge type;
and determining the candidate attribute corresponding to the attribute in the word combination to be processed based on the similarity between at least two candidate attributes and the word combination to be processed.
2. The method of claim 1, wherein the word combination to be processed further comprises an attribute value associated with the attribute; and
the determining, in the preset structured data set, the knowledge type corresponding to the word combination to be processed comprises:
determining, in the preset structured data set, a knowledge type that is a superordinate concept of the entity and a knowledge type that is a superordinate concept of the attribute value, wherein there is at least one knowledge type for each of the entity and the attribute value.
3. The method of claim 1, wherein the determining, in the preset structured data set, the knowledge type corresponding to the word combination to be processed comprises:
performing hypernym extraction on the entity to obtain a hypernym of the entity; and
determining, in the preset structured data set, a knowledge type corresponding to the hypernym of the entity, and taking that knowledge type as the knowledge type corresponding to the word combination to be processed.
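Illustratively (the hypernym table, the type index, and all names below are hypothetical, not part of the claims), the two steps of claim 3 might be sketched as:

```python
# Hypothetical hypernym table and knowledge-type index.
HYPERNYMS = {"Mount Tai": "mountain", "The Yangtze": "river"}
TYPE_INDEX = {"mountain": "GeographicFeature", "river": "GeographicFeature"}

def knowledge_type_of(entity):
    hypernym = HYPERNYMS[entity]   # step 1: hypernym extraction on the entity
    return TYPE_INDEX[hypernym]    # step 2: knowledge type of that hypernym

print(knowledge_type_of("Mount Tai"))  # GeographicFeature
```

Going through the hypernym rather than the entity itself lets entities never seen in the structured data set still be mapped to a knowledge type, provided their hypernym is known.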
4. The method according to claim 3, wherein the word combination to be processed further comprises an attribute value associated with the attribute;
the method further comprises:
performing hypernym extraction on the attribute value to obtain a hypernym of the attribute value; and
the determining the knowledge type corresponding to the hypernym of the entity and taking that knowledge type as the knowledge type corresponding to the word combination to be processed comprises:
determining the knowledge type corresponding to the hypernym of the entity, and determining the knowledge type corresponding to the hypernym of the attribute value; and
taking the knowledge type corresponding to the hypernym of the entity and the knowledge type corresponding to the hypernym of the attribute value as the knowledge type corresponding to the word combination to be processed.
5. The method according to claim 2 or 4, wherein, before determining the candidate attribute corresponding to the attribute in the word combination to be processed based on the similarity between the at least two candidate attributes and the word combination to be processed, the method further comprises:
determining features for each of at least two of the entity, the attribute, and the attribute value in the word combination to be processed, wherein each of the at least two has at least two features;
fusing the features of each of the at least two, and taking the fusion result as the feature of the word combination to be processed; and
the determining the candidate attribute corresponding to the attribute in the word combination to be processed based on the similarity between the at least two candidate attributes and the word combination to be processed comprises:
ranking the similarities between the feature of the word combination to be processed and the features of the at least two candidate attributes; and
taking the candidate attribute corresponding to the highest similarity in the resulting ranking as the candidate attribute corresponding to the attribute in the word combination to be processed.
6. The method of claim 5, wherein the ranking the similarities between the feature of the word combination to be processed and the features of the at least two candidate attributes comprises:
inputting the feature of the word combination to be processed and the features of the at least two candidate attributes into a pre-trained ranking model, to rank, by the pre-trained ranking model, the similarities between the feature of the word combination to be processed and the features of the at least two candidate attributes.
7. The method of claim 5, wherein the method further comprises:
determining features for each of the entity, the attribute, and the attribute value in the word combination to be processed, wherein the features of each include a fusion of a Jaccard feature and a bag-of-words feature;
fusing the features of each to obtain a target fusion feature;
determining the similarity between the target fusion feature and the feature of each determined candidate attribute; and
selecting, from the determined candidate attributes in descending order of similarity, a preset number or a preset proportion of candidate attributes as the at least two candidate attributes.
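As a non-authoritative sketch of claim 7's feature fusion and pre-selection (the averaging fusion, tokenization, and all names are assumptions; the claim does not specify how the Jaccard and bag-of-words features are combined):

```python
import math

def jaccard(a_tokens, b_tokens):
    """Jaccard similarity over token sets."""
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0

def bow_cosine(a_tokens, b_tokens):
    """Cosine similarity between bag-of-words count vectors."""
    vocab = sorted(set(a_tokens) | set(b_tokens))
    va = [a_tokens.count(w) for w in vocab]
    vb = [b_tokens.count(w) for w in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(y * y for y in vb))
    return dot / (na * nb) if na and nb else 0.0

def fused_similarity(phrase, candidate):
    a, b = phrase.split(), candidate.split()
    # hypothetical fusion: a plain average of the two similarity signals
    return 0.5 * jaccard(a, b) + 0.5 * bow_cosine(a, b)

def top_k_candidates(phrase, candidates, k=2):
    """Pre-select the k most similar candidate attributes (descending order)."""
    ranked = sorted(candidates, key=lambda c: fused_similarity(phrase, c), reverse=True)
    return ranked[:k]

print(top_k_candidates("birth date", ["date of birth", "place of birth", "height"]))
```

The pre-selected top-k candidates would then serve as the "at least two candidate attributes" handed to the finer-grained ranking of claim 5.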
8. The method of claim 6, wherein the pre-trained ranking model is trained by:
acquiring a sample set, wherein the sample set comprises positive samples and negative samples, a positive sample comprises a positive sample word combination and an attribute sample, a negative sample comprises a negative sample word combination and an attribute sample, and the similarity between the features of the positive sample word combination and the features of the attribute sample is greater than the similarity between the features of the negative sample word combination and the features of the attribute sample;
inputting a sample sequence consisting of a plurality of samples in the sample set into a ranking model to be trained, to predict a ranking result of the similarities between the features within the samples of the sample sequence; and
training the ranking model to be trained based on the predicted ranking result, to obtain the pre-trained ranking model.
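The claim leaves the ranking objective unspecified; as a hedged illustration, a simple pairwise hinge-margin update (a common stand-in for learning-to-rank training, not necessarily the embodiments' model) could train a linear scorer so that positive samples outrank negatives:

```python
def score(w, features):
    """Linear score of a (word combination, attribute) feature vector."""
    return sum(wi * fi for wi, fi in zip(w, features))

def train_pairwise(pairs, dim, epochs=50, lr=0.1):
    """pairs: list of (positive_features, negative_features) tuples.

    Whenever a positive sample fails to outscore its paired negative
    sample by a margin of 1, nudge the weights toward the positive one.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for pos, neg in pairs:
            if score(w, pos) - score(w, neg) < 1.0:  # margin violated
                w = [wi + lr * (p - n) for wi, p, n in zip(w, pos, neg)]
    return w
```

After training, ranking the candidate attributes of claim 6 reduces to sorting them by `score`; the feature vectors here are assumed to be the fused features of the preceding claims.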
9. The method of claim 8, wherein the acquiring a sample set comprises:
taking a word combination which belongs to the knowledge type and corresponds to a target attribute in the preset structured data set as the positive sample word combination, wherein the target attribute is the candidate attribute corresponding to the attribute in the word combination to be processed; and
taking a word combination which belongs to the knowledge type and does not correspond to the target attribute in the preset structured data set as the negative sample word combination.
10. An information processing apparatus, the apparatus comprising:
an acquisition unit configured to acquire a word combination to be processed, wherein the word combination to be processed comprises an entity and an attribute of the entity;
a candidate determining unit configured to determine, in a preset structured data set, a knowledge type corresponding to the word combination to be processed, and determine attributes belonging to the knowledge type as candidate attributes, wherein there are at least two candidate attributes, the preset structured data set is a data set conforming to a preset constraint, the preset constraint is a constraint on the words in a word combination, the knowledge type is a superordinate concept of the entity, at least one level of knowledge type exists in the preset structured data set, and each knowledge type has attributes belonging to that knowledge type; and
an attribute determining unit configured to determine, based on the similarity between the at least two candidate attributes and the word combination to be processed, the candidate attribute corresponding to the attribute in the word combination to be processed.
11. The apparatus of claim 10, wherein the word combination to be processed further comprises an attribute value associated with the attribute; and
the candidate determining unit is configured to determine, in the preset structured data set, the knowledge type corresponding to the word combination to be processed by:
determining, in the preset structured data set, a knowledge type that is a superordinate concept of the entity and a knowledge type that is a superordinate concept of the attribute value, wherein there is at least one knowledge type for each of the entity and the attribute value.
12. The apparatus according to claim 10, wherein the candidate determining unit is configured to determine, in the preset structured data set, the knowledge type corresponding to the word combination to be processed by:
performing hypernym extraction on the entity to obtain a hypernym of the entity; and
determining, in the preset structured data set, a knowledge type corresponding to the hypernym of the entity, and taking that knowledge type as the knowledge type corresponding to the word combination to be processed.
13. The apparatus of claim 12, wherein the word combination to be processed further comprises an attribute value associated with the attribute;
the apparatus further comprises:
a hypernym unit configured to perform hypernym extraction on the attribute value to obtain a hypernym of the attribute value; and
the candidate determining unit is configured to determine the knowledge type corresponding to the hypernym of the entity and take that knowledge type as the knowledge type corresponding to the word combination to be processed by:
determining the knowledge type corresponding to the hypernym of the entity, and determining the knowledge type corresponding to the hypernym of the attribute value; and
taking the knowledge type corresponding to the hypernym of the entity and the knowledge type corresponding to the hypernym of the attribute value as the knowledge type corresponding to the word combination to be processed.
14. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs which,
when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-9.
15. A computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-9.
CN202010034694.6A 2020-01-14 2020-01-14 Information processing method and device Active CN111259659B (en)


Publications (2)

Publication Number Publication Date
CN111259659A CN111259659A (en) 2020-06-09
CN111259659B true CN111259659B (en) 2023-07-04



