CN114201603A - Entity classification method, device, storage medium, processor and electronic device


Info

Publication number
CN114201603A
Authority
CN
China
Prior art keywords
entity
classification
predicted
instances
type
Prior art date
Legal status
Pending
Application number
CN202111301031.7A
Other languages
Chinese (zh)
Inventor
钱爽
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202111301031.7A
Publication of CN114201603A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an entity classification method, an entity classification device, a storage medium, a processor and an electronic device. The method includes: acquiring an entity to be predicted; constructing a plurality of instances using the entity to be predicted and a plurality of relationship types, wherein each of the plurality of instances comprises a text portion, a question portion and an answer portion; classifying each of the plurality of instances to obtain a plurality of classification labels; and determining an attribution category of the entity to be predicted based on the plurality of classification labels. The method solves the technical problems of low entity risk auditing efficiency and low accuracy caused by the variable risk types and the complex, hard-to-judge risk-type criteria in the entity risk classification process, so that the risk types of different knowledge entities can be accurately identified in the automatic production flow of knowledge graph knowledge, effectively improving the efficiency and accuracy of entity risk classification.

Description

Entity classification method, device, storage medium, processor and electronic device
Technical Field
The invention relates to the field of natural language processing, in particular to an entity classification method, an entity classification device, a storage medium, a processor and an electronic device.
Background
Entity classification techniques generally refer to techniques for classifying entities in text form into specified categories based on a series of features. In the automatic production process of a knowledge graph, entity risk classification is a key link in ensuring that the risk knowledge graph stays closely tied to risk identification application scenarios. A large number of knowledge entities exist in a risk knowledge graph; accurately identifying the different risk types of these entities and recommending them to different risk auditing scenarios to assist operators in risk auditing can fill operators' knowledge blind spots and improve auditing efficiency and accuracy.
The main differences between entity risk classification and traditional entity classification are as follows. First, the risk categories of entities in the risk identification scenario are not completely fixed and may change dynamically, in stages, as the target task changes. Second, the risk type of an entity cannot be judged directly from the surface information of the entity name; it must be judged by combining a large amount of background knowledge about the entity and the entity type.
Therefore, how to establish a new entity risk classification method based on these two differences, apply it to actual scenarios and generate practical application value has become a key problem for entity classification technology in the field of natural language processing. In view of the above problems, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the invention provide an entity classification method, an entity classification device, a storage medium, a processor and an electronic device, which at least solve the technical problems of low entity risk auditing efficiency and low accuracy caused by the variable risk types and the complex, hard-to-judge risk-type criteria in the entity risk classification process, so that the risk types of different knowledge entities are accurately identified in the automatic production process of knowledge graph knowledge and the efficiency and accuracy of entity risk classification are effectively improved.
According to an aspect of an embodiment of the present invention, there is provided an entity classification method, including: acquiring an entity to be predicted; constructing a plurality of instances by adopting an entity to be predicted and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; classifying each instance in the multiple instances to obtain multiple classification labels; an attribution category of the entity to be predicted is determined based on the plurality of classification tags.
According to another aspect of the embodiments of the present invention, there is also provided an entity classification method, including: receiving an entity to be predicted from a client; constructing a plurality of instances using the entity to be predicted and a plurality of relationship types, classifying each of the plurality of instances to obtain a plurality of classification labels, and determining the attribution type of the entity to be predicted based on the plurality of classification labels, wherein each of the plurality of instances comprises: a text portion, a question portion and an answer portion; and feeding back the attribution type of the entity to be predicted to the client.
According to another aspect of the embodiments of the present invention, there is also provided an entity classification method, including: acquiring a knowledge entity to be predicted from a knowledge graph; building a plurality of instances using the knowledge entity and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; classifying each instance in the multiple instances to obtain multiple classification labels; an entity type of the knowledge entity is determined based on the plurality of classification tags.
According to another aspect of the embodiments of the present invention, there is also provided an entity classification apparatus, including: the acquisition module is used for acquiring an entity to be predicted; a construction module, configured to construct a plurality of instances using the entity to be predicted and a plurality of relationship types, where each of the plurality of instances includes: a text portion, a question portion and an answer portion; the processing module is used for carrying out classification processing on each example in the multiple examples to obtain multiple classification labels; and the classification module is used for determining the attribution type of the entity to be predicted based on the plurality of classification labels.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium, wherein the storage medium includes a stored program, and when the program runs, the apparatus on which the storage medium is located is controlled to execute any one of the entity classification methods described above.
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a program, where the program executes any one of the entity classification methods described above.
According to another aspect of the embodiments of the present invention, there is also provided an electronic apparatus, including: a processor; and a memory coupled to the processor for providing instructions to the processor for processing the following processing steps: acquiring an entity to be predicted; constructing a plurality of instances by adopting an entity to be predicted and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; classifying each instance in the multiple instances to obtain multiple classification labels; an attribution category of the entity to be predicted is determined based on the plurality of classification tags.
In the embodiment of the invention, the entity to be predicted is obtained; constructing a plurality of instances by adopting an entity to be predicted and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; classifying each instance in the multiple instances to obtain multiple classification labels; and determining the attribution type of the entity to be predicted based on the plurality of classification labels.
It is easy to note that a plurality of classification labels are obtained by classifying a plurality of instances constructed in advance from the entity to be predicted and a plurality of relationship types, and the attribution type of the entity to be predicted is determined based on these classification labels. This achieves the purposes of providing a more appropriate risk type and classifying the entity to be predicted more accurately, and the technical effects of improving the efficiency and accuracy of entity risk classification. It thereby solves the technical problems of low entity risk auditing efficiency and low accuracy caused by the variable risk types and the complex, hard-to-judge risk-type criteria in the entity risk classification process, so that the risk types of different knowledge entities are accurately identified in the automatic production process of knowledge graph knowledge and the efficiency and accuracy of entity risk classification are effectively improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of a hardware structure of a computer terminal (or mobile device) for implementing an entity classification method according to the prior art;
FIG. 2 is a flow chart of a method of entity classification according to an embodiment of the present invention;
FIG. 3 is a flow diagram of an alternative entity risk classification according to an embodiment of the invention;
FIG. 4 is a schematic diagram of an alternative example of constructing a question-answer in accordance with embodiments of the present invention;
FIG. 5 is a block diagram of an alternative reading understanding model for entity classification according to an embodiment of the present invention;
FIG. 6 is a flow chart of an alternative entity classification method according to an embodiment of the invention;
fig. 7 is a diagram illustrating an alternative classification of entities at a cloud server according to an embodiment of the present invention;
FIG. 8 is a flow diagram of another alternative entity classification method according to an embodiment of the invention;
FIG. 9 is a schematic structural diagram of an entity classification apparatus according to an embodiment of the present invention;
fig. 10 is a block diagram of another configuration of a computer terminal according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms or terms appearing in the description of the embodiments of the present application are applicable to the following explanations:
entity: an entity in natural language refers to an object or concept that exists in the objective world and can be distinguished from each other, which appears in a sentence.
Bidirectional Encoder Representations from Transformers (Bert) model: the Bert model was proposed by Devlin et al. in 2019. It uses a Transformer encoder to encode bidirectional context information and adopts a multi-task scheme to consider semantic information of different granularities, namely the Masked Language Model and Next Sentence Prediction tasks, which capture word-level and sentence-level feature descriptions respectively. The Bert model is mainly divided into two stages: a pre-training stage (Pre-training) and a fine-tuning stage (Fine-tuning). In the pre-training stage, the Bert pre-trained language model trains the two subtasks on a large amount of unlabeled data. In the fine-tuning stage, the Bert downstream task model initializes its parameters from the result of the Bert pre-trained language model and is then fine-tuned on labeled data. At present, the Bert model has been widely used in various natural language processing tasks to learn semantic representations of words.
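For illustration only, and not as part of the patented embodiments, the pre-train-then-fine-tune usage pattern described above can be sketched with the open-source Hugging Face transformers library as follows; the checkpoint name "bert-base-chinese" and the example sentence are assumptions made for this sketch:

# A minimal sketch (not part of the patent): loading a pre-trained Bert model
# and encoding a sentence, assuming the Hugging Face "transformers" library.
from transformers import BertTokenizer, BertModel

# "bert-base-chinese" is an assumed checkpoint name, chosen for illustration.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

inputs = tokenizer("实体分类示例文本", return_tensors="pt")  # assumed example text
outputs = model(**inputs)

# The [CLS] hidden state is commonly used as a sequence-level representation
# on top of which downstream (fine-tuning) layers are trained with labeled data.
cls_vector = outputs.last_hidden_state[:, 0, :]
print(cls_vector.shape)  # (1, hidden_size), e.g. (1, 768)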
Knowledge graph: a knowledge graph is a structured semantic knowledge base used to quickly describe concepts and their interrelationships in the physical world. The knowledge graph obtains simple and clear entity, relation and entity triples by effectively processing, processing and integrating data of the complicated and intricate documents, and finally, quick response and reasoning on knowledge are realized by aggregating a large amount of knowledge. Due to the strong semantic processing capability and the open interconnection capability, the knowledge graph is widely applied to the fields of intelligent search, intelligent question answering, personalized recommendation, information analysis, fraud prevention and the like.
Example 1
There is also provided, in accordance with an embodiment of the present invention, an embodiment of an entity classification method. It should be noted that the steps illustrated in the flowchart of the figure may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
The method provided in embodiment 1 of the present application can be executed in a mobile terminal, a computer terminal or a similar computing device. Fig. 1 shows a hardware structure block diagram of a computer terminal (or mobile device) for implementing the entity classification method. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, …, 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, the computer terminal may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the entity classification method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, so as to implement the entity classification method. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
It should be noted here that, in some alternative embodiments, the computer device (or mobile device) shown in fig. 1 above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that fig. 1 is only one specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
Under the above operating environment, the present application provides an entity classification method as shown in fig. 2. Fig. 2 is a flowchart of an entity classification method according to an embodiment of the present invention, as shown in fig. 2, the entity classification method includes:
step S202, obtaining an entity to be predicted;
step S204, a plurality of examples are constructed by adopting the entity to be predicted and a plurality of relation types, wherein each example in the plurality of examples comprises: a text portion, a question portion and an answer portion;
step S206, classifying each instance in the multiple instances to obtain multiple classification labels;
step S208, determining the attribution type of the entity to be predicted based on the plurality of classification labels.
It is easy to note that a plurality of classification labels are obtained by classifying a plurality of instances constructed in advance from the entity to be predicted and a plurality of relationship types, and the attribution type of the entity to be predicted is determined based on these classification labels. This achieves the purposes of providing a more appropriate risk type and classifying the entity to be predicted more accurately, and the technical effects of improving the efficiency and accuracy of entity risk classification. It thereby solves the technical problems of low entity risk auditing efficiency and low accuracy caused by the variable risk types and the complex, hard-to-judge risk-type criteria in the entity risk classification process, so that the risk types of different knowledge entities are accurately identified in the automatic production process of knowledge graph knowledge and the efficiency and accuracy of entity risk classification are effectively improved.
Optionally, the entity classification method provided in the embodiment of the present application may be, but is not limited to, applied to entity risk classification, where when the method is used in an entity risk classification task, the classification tags are various risk type tags, and the final classification result is determination of an actual attribution risk category of an entity to be predicted. By adopting the entity classification method in the embodiment of the application, the corresponding classification label can be determined according to the actual situation, and a more accurate entity classification result can be obtained.
Alternatively, the entity to be predicted may be an object or concept that exists in the objective world and is distinguishable from other objects or concepts; the profile of the entity to be predicted may be in text format, and the attribute of the entity to be predicted may be text, a picture, a video, or the like. The same entity appearing in different sentence contexts may have different meanings and belong to different semantic types, so the classification of entities is of great significance in practical applications. After the profile and the attribute of the entity to be predicted are obtained, the entity is classified by the entity classification method in the embodiment of the present application.
Alternatively, the relationship type may consist of a type profile and a relationship attribute of an existing type. For example, the relationship attribute of a certain relationship type is "computer hardware", and its type profile is "computer hardware refers to the general name of the various physical devices in a computer system, composed of electronic, mechanical and optoelectronic elements, which form an organic whole according to the requirements of the system structure and provide the material basis on which computer software runs". Learning the relationship type may help the machine determine whether an entity belongs to that relationship type.
Optionally, the above instance may refer to an example actually existing in the objective world, and one instance may include a plurality of parts; in this embodiment, an instance includes a text part, a question part and an answer part. The method in this embodiment converts the entity classification task into question-answer instances that are fed into a machine reading comprehension model for classification, enables analysis, mining and information extraction over massive entity data, and classifies entities quickly and accurately by means of a neural network.
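For orientation only, the overall flow of steps S202 to S208 might be organized as in the following Python sketch; the helper functions are hypothetical placeholders standing in for the components described in the later embodiments, not the patent's actual implementation:

# Hypothetical end-to-end sketch of steps S202-S208; all helper callables
# are placeholders for the components described later in the text.
from typing import Callable, Dict, List

def classify_entity(entity: Dict,
                    relation_types: List[Dict],
                    build_instances: Callable[[Dict, List[Dict]], List[Dict]],
                    classify_instance: Callable[[Dict], int],
                    decide_category: Callable[[List[int]], List[str]]) -> List[str]:
    # Step S204: build one question-answer instance per relationship type.
    instances = build_instances(entity, relation_types)
    # Step S206: classify every instance, yielding one label (0/1) each.
    labels = [classify_instance(inst) for inst in instances]
    # Step S208: map the label vector back to attribution categories.
    return decide_category(labels)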
In an alternative embodiment, in step S204, a plurality of instances are constructed using the entity to be predicted and a plurality of relationship types, including the following method steps:
step S241, obtaining the entity brief introduction and the entity attribute of the entity to be predicted;
step S242, obtaining a type profile and a relationship attribute of each relationship type in the plurality of relationship types;
step S243, constructing the multiple instances by using the entity profiles, the entity attributes, the type profiles, and the relationship attributes.
Optionally, the entity to be predicted is an entity in text form containing a series of features, and when the entity is obtained, its entity profile and entity attribute can be obtained using existing resources; the plurality of relationship types already exist, and the type profile and relationship attribute of each relationship type can likewise be obtained using existing resources. The entity profile, the entity attributes, the type profiles and the relationship attributes are then used to construct the instances, where each instance comprises: a text portion, a question portion, and an answer portion.
Fig. 3 is a flowchart of an optional entity risk classification according to an embodiment of the present invention. As shown in fig. 3, in a practical application of entity risk classification, there is an entity "entity A" in text form to be predicted, whose true entity risk type is "risk type 01". According to this optional embodiment, the entity risk classification process for "entity A" is as follows:
First, the entity is preprocessed to obtain entity information and entity type information.
The entity A to be predicted is obtained, and the entity profile and the entity attribute of entity A are obtained using a preset type resource library, wherein the entity profile is a text containing the basic information and brief introduction of entity A, marked as "entity profile A", and the entity attribute is "text".
Thirty existing entity risk types are determined, namely "risk type 01" to "risk type 30", and the type profile and relationship attribute of each of the 30 entity risk types are obtained using resources such as Wikipedia. The type profile of "risk type 01" is a text containing the basic information and brief introduction of "risk type 01", marked as "type profile 01", and the relationship attribute of "risk type 01" is "keyword 01"; the type profile of "risk type 30" is a text containing the basic information and brief introduction of "risk type 30", marked as "type profile 30", and the relationship attribute of "risk type 30" is "keyword 30".
Next, question-answer instances are constructed: a plurality of instances are built from the obtained "entity profile A", the attribute "text" of entity A, the type profiles "type profile 01" to "type profile 30" of the 30 entity risk types, and the relationship attributes "keyword 01" to "keyword 30" of the 30 entity risk types.
In an alternative embodiment, in step S243, constructing the plurality of instances using the entity profile, the entity attributes, the type profile, and the relationship attributes includes the following method steps:
step S2431, adopting the entity brief introduction and the type brief introduction to construct the text part;
step S2432, adopting the entity attribute and the relationship attribute to construct the question part;
and step S2433, constructing the answer part by adopting a plurality of preset options.
Optionally, for each instance, the text part is formed by splicing an entity brief description text to be predicted and a type brief description text; for each example, the question part is a question constructed based on the entity attribute to be predicted and the relationship attribute; for each example, the answer part is constructed by a plurality of preset options.
Still taking the entity risk classification of entity A in actual use as an example, fig. 4 is a schematic diagram of an optional question-answer construction example according to an embodiment of the present invention. As shown in fig. 4, a plurality of instances are constructed based on the obtained "entity profile A", the attribute "text" of entity A, the type profiles "type profile 01" to "type profile 30" of the entity risk types, and the relationship attributes "keyword 01" to "keyword 30" of the 30 entity risk types; the specific construction steps refer to steps S2431 to S2433:
construction example 01:
the article section: "entity profile A", "type profile 01".
Problem part: is the text "entity A" belong to "keyword 01"?
And an answer part: yes/no.
Construction example 02:
the article section: "entity profile A", "type profile 02".
Problem part: is the text "entity A" belong to "keyword 02"?
And an answer part: yes/no.
……
Construction example 30:
the article section: "entity profile A", "type profile 30".
Problem part: is the text "entity A" belong to "keyword 30"?
And an answer part: yes/no.
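A minimal sketch of how such question-answer instances could be assembled in code is given below; all names and sample data are placeholders mirroring the "entity A" example and are not part of the patent:

# Hypothetical sketch of the instance-construction step (S2431-S2433);
# the sample data below is a placeholder mirroring the "entity A" example.
from typing import Dict, List

def build_instances(entity_name: str, entity_profile: str,
                    type_profiles: Dict[str, str],
                    type_keywords: Dict[str, str]) -> List[Dict]:
    instances = []
    for type_name, type_profile in type_profiles.items():
        keyword = type_keywords[type_name]
        instances.append({
            # Text part: entity profile spliced with the type profile.
            "text": f"{entity_profile} {type_profile}",
            # Question part: built from the entity attribute and the relationship attribute.
            "question": f'Does the text "{entity_name}" belong to "{keyword}"?',
            # Answer part: preset options.
            "options": ["yes", "no"],
            "type": type_name,
        })
    return instances

# Usage with placeholder data for "entity A" and two of the 30 risk types.
instances = build_instances(
    entity_name="entity A",
    entity_profile="entity profile A",
    type_profiles={"risk type 01": "type profile 01", "risk type 02": "type profile 02"},
    type_keywords={"risk type 01": "keyword 01", "risk type 02": "keyword 02"},
)
print(len(instances))  # one instance per relationship type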
In an optional embodiment, in step S206, the classifying process is performed on each of the multiple instances to obtain the multiple classification tags, including the following steps:
step S261, classifying each of the multiple instances by using a reading understanding model to obtain the multiple classification labels, where the reading understanding model is obtained by deep learning training using a data set, and the data set is determined by the multiple instances.
Optionally, the multiple instance determination data sets are used for deep learning training to obtain a reading understanding model, and the reading understanding model is used for performing classification processing on each instance of the multiple instances to obtain the multiple classification labels. Because the reading understanding model for classifying the multiple instances is obtained through deep learning training based on the data set determined by the multiple instances in the scheme, the applicability of the multiple classification labels to the multiple instances is necessarily ensured, and the problem of inaccurate classification caused by dynamic change of entity classes in the traditional classification method is avoided.
In an optional embodiment, the entity classification method further includes the following method steps:
step S210, dividing the plurality of instances into a first partial instance and a second partial instance, wherein the first partial instance is a positive instance of the plurality of instances, and the second partial instance is a negative instance of the plurality of instances;
step S212, based on the similarity degree of the types of the first part of examples and the second part of examples, carrying out negative example sampling processing with different proportions to obtain a negative example sampling result, wherein the similarity degree of the types is in direct proportion to the negative example sampling proportion;
step S214, determining the data set by using the first partial example and partial sampling results in the negative example sampling results;
and S216, training by using the data set as training data to obtain a reading understanding model.
Optionally, the positive examples among the plurality of instances are divided into the first partial instances, and the negative examples are divided into the second partial instances, where a positive example is an instance whose answer part is the affirmative option and a negative example is an instance whose answer part is the negative option. The negative examples are sampled according to how similar their types are to the type of the current positive example: if a negative example is highly similar in type to the current positive example, it is sampled at a large proportion; if it is only slightly similar, it is sampled at a small proportion. The data set is divided into a test set and a training set; the test set is left unprocessed, and the training set is constructed as follows: all positive examples are added to the training set, and part of the negative examples are extracted and added to the training set to complete the sampling. The reading understanding model is then obtained by training on the training set.
It should be noted that by adopting the sampling method in the scheme, a data set with uniformly distributed difficult samples and simple samples can be formed, so that more challenging negative examples are generated, and the model can learn the differences among different classes.
Still taking the entity risk classification of entity A in actual use as an example, as shown in fig. 3, the 30 instances constructed above are divided into positive and negative examples, where the positive example is the instance whose answer is affirmative (the instance built with "keyword 01") and the negative examples are the instances whose answers are negative (the instances built with "keyword 02" through "keyword 30"). All positive examples are added to the training set, the degree of similarity between each negative example and the type of the current positive example is judged, and the sampling proportion of each negative example is determined accordingly. The basis for determining the sampling proportion is: a negative example with a high degree of similarity to the current positive example gets a large sampling proportion, and a negative example with a low degree of similarity gets a small sampling proportion. Part of the negative examples are then extracted and added to the training set, yielding the training set used to train the model.
Specifically, for example, type attribute words "keyword 01a", "keyword 01b" and "keyword 01c" that are similar in meaning to "keyword 01" are determined. If a negative example belongs to at least one of "keyword 01a", "keyword 01b" and "keyword 01c", it is considered highly similar to the type of the current positive example and is assigned a high sampling scale factor; otherwise, it is considered to have a low degree of similarity to the type of the current positive example and is assigned a low sampling scale factor. All instances belonging to "keyword 01" are added to the training set D, and 5 instances are extracted from the negative examples and added to the training set D.
Corresponding question-answer examples are generated for all the examples using the above rules, obtaining corresponding data sets D1-D30; these data sets are merged and shuffled to form the training set Ds. Based on the final training set Ds, a reading understanding model corresponding to the question-answer instances is obtained by training a Bert model.
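A minimal sketch of similarity-proportional negative sampling is given below; the similarity scores, the base sampling rate and the function names are assumptions made for illustration:

# Hypothetical sketch of similarity-proportional negative sampling (S210-S214).
import random
from typing import Dict, List

def sample_negatives(negatives: List[Dict],
                     similarity: Dict[str, float],
                     base_rate: float = 0.1,
                     seed: int = 0) -> List[Dict]:
    """Keep each negative instance with a probability proportional to how similar
    its type is to the current positive type (more similar -> more likely kept),
    producing harder negatives for the model to learn from."""
    rng = random.Random(seed)
    kept = []
    for inst in negatives:
        sim = similarity.get(inst["type"], 0.0)               # assumed range 0.0-1.0
        keep_prob = min(1.0, base_rate + sim * (1.0 - base_rate))
        if rng.random() < keep_prob:
            kept.append(inst)
    return kept

def build_training_set(positives: List[Dict], negatives: List[Dict],
                       similarity: Dict[str, float]) -> List[Dict]:
    data = positives + sample_negatives(negatives, similarity)
    random.Random(0).shuffle(data)    # merge and shuffle, as described above
    return data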
In an alternative embodiment, the reading understanding model comprises: an input layer, an encoding layer and an output layer, and the entity classification method further comprises the following method steps:
step S218, in the input layer, converting a text portion in the data set into a text sequence, converting a problem portion in the data set into a problem sequence, and concatenating the text sequence and the problem sequence into a target sequence, where the target sequence is an input sequence of the coding layer;
step S220, in the coding layer, performing coding processing on the target sequence, and outputting a target vector, where the target vector is used to indicate a correlation between a text portion in the data set and a question portion in the data set;
step S222, in the output layer, the target vector outputs entity type probability distribution through a full-connection network.
Optionally, the reading understanding model may include: an input layer, an encoding layer, and an output layer. The encoder used in the encoding layer may pre-train the language model for Bert.
Wherein the input layer is configured to generate a target sequence based on the data set, and to serve as an input sequence for the coding layer, and the target sequence is generated by: acquiring a text part in the data set and converting the text part into a text sequence; acquiring a problem part in the data set and converting the problem part into a problem sequence; and splicing the text sequence and the problem sequence to obtain the target sequence.
Wherein the coding layer is configured to generate a target vector based on the target sequence and serve as an input vector of the output layer, and the target vector is generated by: and acquiring the target sequence, coding the target sequence, and obtaining and outputting a target vector, wherein the target vector is used for representing the relevance between the text part in the data set and the problem part in the data set.
The output layer is used for calculating and outputting entity type probability distribution based on the target vector, wherein the calculation process can be realized through a full-connection network.
Alternatively, fig. 5 is a schematic structural diagram of an alternative reading understanding model for entity classification according to an embodiment of the present invention; as shown in fig. 5, the reading understanding model includes an input layer, an encoding layer, and an output layer:
1) input layer
As shown in fig. 5, the input data of the input layer is the data set composed of all positive examples and part of the negative examples, each containing a text part and a question part, and the input layer of the reading understanding model performs the following operations: the text part in the data set is obtained and converted into a text sequence (Document Sequence), denoted D = {d1, ..., di, ..., dn}, where di is the i-th element of the text and i ranges from 1 to n (n is the length of the sentence); the question part in the data set is obtained and converted into a question sequence (Question Sequence), denoted Q = {q1, ..., qj, ..., qm}, where qj is the j-th element of the question and j ranges from 1 to m (m is the length of the question sequence); the converted text sequence D is then spliced with the question sequence Q to obtain the target sequence X = [CLS, d1, ..., di, ..., dn, SEP, q1, ..., qj, ..., qm, SEP], where the beginning of the target sequence is marked with "CLS" and the separation and the end are marked with "SEP", which serves as the input sequence in the standard Bert encoding-layer format.
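Illustratively, the splicing into the [CLS] ... [SEP] ... [SEP] target sequence can be realized with a standard Bert tokenizer, as in the following sketch (the tokenizer library and checkpoint name are assumptions; the patent itself only specifies the sequence format):

# Sketch of producing the [CLS] text [SEP] question [SEP] target sequence;
# passing a text/question pair to a Bert tokenizer yields exactly this layout.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint

text = "entity profile A type profile 01"              # text part (placeholder)
question = 'Does the text "entity A" belong to "keyword 01"?'  # question part (placeholder)

encoded = tokenizer(text, question, truncation=True, max_length=512)
# encoded["input_ids"] corresponds to X = [CLS, d1..dn, SEP, q1..qm, SEP];
# encoded["token_type_ids"] distinguishes the text segment from the question segment.
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"])[:8])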
2) Coding layer
Still as shown in fig. 5, semantic information is learned using the Bert fine-tuning model. Specifically, during the learning process three vectors are obtained for each token (in a standard Bert model, typically the token, segment and position embeddings); the encoding operation is then performed to obtain the encoded target vector h, which is output and represents the relevance between the text part in the data set and the question part in the data set:
h = Bert(X)
where h ∈ R^d and d is the Bert output unit size.
3) Output layer
Still as shown in fig. 5, the target vector h output by the coding layer is sent to the full-connection network and the entity type probability distribution P is calculated:
P = sigmoid(W·h + b)
where sigmoid is the activation function of the fully-connected network, the weight coefficient W ∈ R^(d×n), and the bias coefficient b ∈ R^n. The return value of the sigmoid function lies between 0 and 1: a return value of 0 indicates that the entity is unrelated to the type, a return value of 1 indicates that the entity is related to the type, and for return values between 0 and 1, the closer the value is to 0 the smaller the entity-to-type correlation, and the closer it is to 1 the larger the entity-to-type correlation.
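A minimal sketch of such a model is given below in PyTorch (the framework choice is an assumption; the patent does not prescribe one). Following the formula above, the fully-connected layer maps the d-dimensional target vector h to an n-dimensional sigmoid output:

# Hypothetical PyTorch sketch of the reading-understanding model:
# Bert encoding layer -> target vector h ([CLS] state) -> fully connected + sigmoid.
import torch
import torch.nn as nn
from transformers import BertModel

class ReadingComprehensionClassifier(nn.Module):
    def __init__(self, num_types: int, checkpoint: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(checkpoint)   # encoding layer
        d = self.bert.config.hidden_size                    # Bert output size d
        self.fc = nn.Linear(d, num_types)                   # W ∈ R^(d×n), b ∈ R^n

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        out = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask,
                        token_type_ids=token_type_ids)
        h = out.last_hidden_state[:, 0, :]                  # target vector h
        return torch.sigmoid(self.fc(h))                    # P = sigmoid(W·h + b)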
In an alternative embodiment, in step S208, the determining the attribution category of the entity to be predicted based on the plurality of classification tags includes the following steps:
step S281, when the values of the plurality of classification labels are not all the first values, determining the attribution type of the entity to be predicted by using the classification label whose value is the second value in the plurality of classification labels;
step S282, when all the values of the plurality of classification tags are the first value, determining the attribution type of the entity to be predicted by using the classification tag with the highest prediction probability among the plurality of classification tags.
Optionally, the values of the plurality of classification labels are a first value or a second value, the first value may be set to 0, and the second value may be set to 1; when the values of the plurality of classification labels are not all 0, determining the attribution type of the entity to be predicted by using the classification label with the value of 1 in the plurality of classification labels; and when the values of the plurality of classification labels are all 0, determining the attribution type of the entity to be predicted by using the classification label with the highest prediction probability in the plurality of classification labels.
Still taking the entity risk classification of entity A in actual use as an example, as shown in fig. 3, in the post-processing stage the entity A to be predicted is paired with all 30 entity types to construct the question-answer instances, and the trained model is used to predict the instances, giving 30 classification labels with values of 0 or 1: [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1].
Obviously, the values of the 30 classification labels are not all 0, and at this time, the attribution type of the entity to be predicted is determined by using the classification label with the value of 1 in the 30 classification labels, that is, "entity a" belongs to "risk type 01" and "risk type 30".
If, after the entity A to be predicted is paired with all 30 entity types to construct the question-answer instances and the trained model is used to predict them, the 30 classification labels with values of 0 or 1 are instead obtained as: [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0].
At this time, when the values of the 30 classification labels are all 0, the classification label with the highest prediction probability in the 30 classification labels should be used to determine the attribution type of the entity to be predicted. If the corresponding 30 prediction probabilities are: [0.7620033634417748,0.1388555675800121,0.2596139521380908,0.4325283008530042,0.24238841649395962,0.4082061891682502,0.1781496170716906,0.10886773581501563,0.06036736362770878,0.3287696247872377,0.3198184198034486,0.2928648547591263,0.4324529417656455,0.29852280541357246,0.2595469739637038,0.4323834268278667,0.1476391270067722,0.04205311802982234,0.18918742155292834,0.06506514503916522,0.10705794383051248,0.16008469976844203,0.008171617462894354,0.17319397234972866,0.11130817377887536,0.478273845637069,0.34554487024967784,0.2703587190987574,0.27691884067223166,0.61653045891421438].
The value with the maximum prediction probability is 0.7620033634417748, which corresponds to "risk type 01", so entity A is determined to belong to "risk type 01".
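A minimal sketch of this post-processing rule (steps S281 and S282) is given below; the placeholder labels and probabilities are illustrative only:

# Hypothetical sketch of the post-processing rule: use the labels whose value
# is 1; if every label is 0, fall back to the type with the highest probability.
from typing import List

def decide_category(labels: List[int], probs: List[float],
                    type_names: List[str]) -> List[str]:
    positive = [name for name, lab in zip(type_names, labels) if lab == 1]
    if positive:                       # not all labels are the first value (0)
        return positive
    best = max(range(len(probs)), key=probs.__getitem__)
    return [type_names[best]]          # all labels are 0: take the arg-max type

# Example mirroring the text: the labels select "risk type 01" and "risk type 30".
types = [f"risk type {i:02d}" for i in range(1, 31)]
labels = [1] + [0] * 28 + [1]
probs = [0.9] * 30                     # placeholder probabilities
print(decide_category(labels, probs, types))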
According to the embodiment of the application, the classification labels obtained by classifying the entity to be predicted and the multiple instances constructed by the multiple relation types in advance can be used, the attribution type of the entity to be predicted is determined based on the classification labels, the purposes of providing a more appropriate risk type and performing more accurate classification on the entity to be predicted are achieved, and therefore the technical effects of improving the efficiency and the accuracy of entity risk classification are achieved.
An embodiment of the present invention further provides an entity classification method, where the entity classification method is executed on a cloud server, fig. 6 is a flowchart of an optional entity classification method according to an embodiment of the present invention, and as shown in fig. 6, the entity classification method includes:
step S602, receiving an entity to be predicted from a client;
step S604, constructing multiple instances by using the entity to be predicted and multiple relationship types, classifying each of the multiple instances to obtain multiple classification tags, and determining an attribution type of the entity to be predicted based on the multiple classification tags, where each of the multiple instances includes: a text portion, a question portion and an answer portion;
step S606, the attribution type of the entity to be predicted is fed back to the client.
Optionally, fig. 7 is a schematic diagram of an optional entity classification performed on a cloud server according to an embodiment of the present invention, as shown in fig. 7, a client uploads an entity to be classified to the cloud server, the cloud server analyzes the entity to be classified by using an entity classification model, constructs multiple instances by using the entity to be predicted and multiple relationship types, performs classification processing on each instance of the multiple instances to obtain multiple classification tags, and determines an attribution type of the entity to be predicted based on the multiple classification tags, where each instance of the multiple instances includes: a text portion, a question portion, and an answer portion.
And then, the cloud server feeds back the classification result to the client, and the final classification result is displayed to the user through a graphical user interface of the client. The optional way of displaying the classification result on the graphical user interface has been described in the above embodiments, and is not described herein again.
It should be noted that the entity classification method provided in the embodiment of the present application may be, but is not limited to, applicable to an actual application scenario of entity risk classification, and the entity to be classified is analyzed by using an entity classification model through an interaction manner between the SaaS server and the client, so as to obtain an attribution type corresponding to the entity to be classified, and display a returned classification result on the client.
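Purely as an illustration of this client/server interaction (the patent does not specify a web framework, route or payload format; all such names below are assumptions), a minimal cloud-side sketch might look as follows:

# Hypothetical sketch of a cloud-side classification endpoint; the framework,
# route name and payload fields are assumptions, not part of the patent.
from flask import Flask, request, jsonify

app = Flask(__name__)

def classify_entity(entity: dict) -> list:
    # Placeholder for the instance construction + reading-understanding model
    # + post-processing pipeline described in the embodiments above.
    return ["risk type 01"]

@app.route("/classify", methods=["POST"])
def classify():
    entity = request.get_json()                 # entity to be predicted, from the client
    categories = classify_entity(entity)        # attribution categories
    return jsonify({"categories": categories})  # fed back to the client

if __name__ == "__main__":
    app.run()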
An embodiment of the present invention further provides an entity classification method, which may be used in an automatic production process of knowledge-graph knowledge, and fig. 8 is a flowchart of another alternative entity classification method according to an embodiment of the present invention, as shown in fig. 8, the entity classification method includes:
step S802, acquiring a knowledge entity to be predicted from a knowledge graph;
step S804, a plurality of examples are constructed by adopting the knowledge entity and a plurality of relation types, wherein each example in the plurality of examples comprises: a text portion, a question portion and an answer portion;
step S806, classifying each instance in the multiple instances to obtain multiple classification labels;
step S808, determining an entity type of the knowledge entity based on the plurality of classification tags.
In an alternative embodiment, in step S806, a classification process is performed on each of the multiple instances to obtain multiple classification tags, including the following steps:
step S8061, classifying each instance in the multiple instances by using a reading understanding model to obtain multiple classification labels, wherein the reading understanding model is obtained by deep learning training by using a data set, and the data set is determined by the multiple instances.
The knowledge entity to be predicted may be obtained from a knowledge graph, and the plurality of instances are constructed by using the knowledge entity and the plurality of relationship types, where each instance in the plurality of instances includes: a text portion, a question portion and an answer portion; classifying each of the multiple instances to obtain multiple classification labels; an entity type of the knowledge entity is determined based on the plurality of classification tags. The optional manner of classifying each of the multiple instances to obtain multiple classification tags has been described in the above embodiments, and is not described herein again.
In the automatic production process of knowledge graph knowledge, the entity classification method provided by the embodiment of the present invention can be used to accurately identify the risk types of different knowledge entities, thereby effectively improving the efficiency and accuracy of entity risk classification.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
According to an embodiment of the present invention, an embodiment of an apparatus for implementing the entity classification method is further provided, and fig. 9 is a schematic structural diagram of an entity classification apparatus according to an embodiment of the present invention, as shown in fig. 9, the entity classification apparatus includes: an obtaining module 901, a constructing module 902, a processing module 903, and a classifying module 904, wherein,
an obtaining module 901, configured to obtain an entity to be predicted; a constructing module 902, configured to construct a plurality of instances by using the entity to be predicted and a plurality of relationship types, where each of the plurality of instances includes: a text portion, a question portion and an answer portion; a processing module 903, configured to perform classification processing on each of the multiple instances to obtain multiple classification labels; a classification module 904, configured to determine an attribution category of the entity to be predicted based on the plurality of classification tags.
Optionally, the building module 902 is further configured to: acquiring the entity brief introduction and the entity attribute of the entity to be predicted; obtaining a type brief introduction and a relation attribute of each relation type in the plurality of relation types; the plurality of instances are constructed using the entity profiles, the entity attributes, the type profiles, and the relationship attributes.
Optionally, the building module 902 is further configured to: constructing the text portion using the entity profile and the type profile; constructing the problem part by adopting the entity attribute and the relationship attribute; and constructing the answer part by adopting a plurality of preset options.
Optionally, the processing module 903 is further configured to: and classifying each of the plurality of instances by using a reading understanding model to obtain the plurality of classification labels, wherein the reading understanding model is obtained by deep learning training by using a data set, and the data set is determined by the plurality of instances.
Optionally, the processing module 903 is further configured to: dividing the plurality of examples into a first part example and a second part example, wherein the first part example is a positive example in the plurality of examples, and the second part example is a negative example in the plurality of examples; carrying out negative example sampling processing with different proportions on the basis of the similarity degree of the types of the first part of examples and the second part of examples to obtain a negative example sampling result, wherein the similarity degree of the types is in direct proportion to the negative example sampling proportion; determining the data set by using the first partial example and partial sampling results in the negative example sampling results; and training by using the data set as training data to obtain the reading understanding model.
Optionally, the reading understanding model comprises: an input layer, an encoding layer, and an output layer, and the processing module 903 is further configured to: in the input layer, converting a text portion in the data set into a text sequence, converting a problem portion in the data set into a problem sequence, and splicing the text sequence and the problem sequence into a target sequence, wherein the target sequence is an input sequence of the coding layer; in the coding layer, coding the target sequence and outputting a target vector, wherein the target vector is used for representing the relevance between a text part in the data set and a problem part in the data set; in the output layer, the target vector outputs an entity type probability distribution through a full-connection network.
Optionally, the classifying module 904 is further configured to: when the values of the plurality of classification labels are not all a first value, determine the attribution category of the entity to be predicted using the classification label whose value is a second value among the plurality of classification labels; and when the values of the plurality of classification labels are all the first value, determine the attribution category of the entity to be predicted using the classification label with the highest prediction probability among the plurality of classification labels.
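The label-resolution rule can be sketched as follows, assuming the first value is encoded as 0 (no match) and the second value as 1 (match) and that each label carries its prediction probability; the 0/1 encoding, the field names, and the probability tie-break among multiple matches are assumptions made for illustration.

```python
def resolve_category(labels):
    """Pick the attribution category from per-instance classification labels.

    Each element of `labels` is assumed to look like
    {"type": <relationship type>, "value": 0 or 1, "prob": <prediction probability>}.
    """
    matches = [lab for lab in labels if lab["value"] == 1]
    if matches:
        # not all labels take the first value: use a label whose value is the second value
        # (breaking ties by prediction probability is an assumption of this sketch)
        return max(matches, key=lambda lab: lab["prob"])["type"]
    # all labels take the first value: fall back to the label with the highest prediction probability
    return max(labels, key=lambda lab: lab["prob"])["type"]
```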
It should be noted here that the obtaining module 901, the constructing module 902, the processing module 903 and the classifying module 904 correspond to steps S202 to S208 in Embodiment 1. The examples and application scenarios implemented by these four modules are the same as those of the corresponding steps, but are not limited to the disclosure of Embodiment 1. It should also be noted that the modules described above, as part of the apparatus, may be run in the computer terminal 10 provided in Embodiment 1.
In the embodiment of the present invention, an entity to be predicted is acquired; a plurality of instances are constructed using the entity to be predicted and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; each instance in the plurality of instances is classified to obtain a plurality of classification labels; and the attribution category of the entity to be predicted is determined based on the plurality of classification labels. It is easy to note that the plurality of classification labels are obtained by classifying the plurality of instances constructed in advance from the entity to be predicted and the plurality of relationship types, and the attribution category of the entity to be predicted is determined based on these classification labels. In this way, a more appropriate risk type can be assigned and the entity to be predicted can be classified more accurately, which improves the efficiency and the accuracy of entity risk classification. This solves the technical problems of low entity risk auditing efficiency and low accuracy caused by the variable risk types in the entity risk classification process and the complex and difficult basis for judging entity risk types. As a result, in the automated production flow of knowledge graph knowledge, the risk types of different knowledge entities are accurately identified, and the efficiency and the accuracy of entity risk classification are effectively improved.
It should be noted that, reference may be made to the relevant description in embodiment 1 for a preferred implementation of this embodiment, and details are not described here again.
Example 3
There is also provided, in accordance with an embodiment of the present invention, an embodiment of an electronic device, which may be any one of a group of computing devices. The electronic device includes: a processor and a memory, wherein:
a memory coupled to the processor for providing instructions to the processor for processing the following processing steps: acquiring an entity to be predicted; constructing a plurality of instances by adopting the entity to be predicted and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; classifying each example in the plurality of examples to obtain a plurality of classification labels; and determining the attribution type of the entity to be predicted based on the plurality of classification labels.
In the embodiment of the present invention, an entity to be predicted is acquired; a plurality of instances are constructed using the entity to be predicted and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; each instance in the plurality of instances is classified to obtain a plurality of classification labels; and the attribution type of the entity to be predicted is determined based on the plurality of classification labels.
It is easy to note that the plurality of classification labels are obtained by classifying the plurality of instances constructed in advance from the entity to be predicted and the plurality of relationship types, and the attribution type of the entity to be predicted is determined based on these classification labels. In this way, a more appropriate risk type can be assigned and the entity to be predicted can be classified more accurately, which improves the efficiency and the accuracy of entity risk classification. This solves the technical problems of low entity risk auditing efficiency and low accuracy caused by the variable risk types in the entity risk classification process and the complex and difficult basis for judging entity risk types. As a result, in the automated production flow of knowledge graph knowledge, the risk types of different knowledge entities are accurately identified, and the efficiency and the accuracy of entity risk classification are effectively improved.
It should be noted that, reference may be made to the relevant description in embodiment 1 for a preferred implementation of this embodiment, and details are not described here again.
Example 4
According to the embodiment of the invention, the embodiment of the computer terminal is also provided, and the computer terminal can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute the program code of the following steps in the entity classification method: acquiring an entity to be predicted; constructing a plurality of instances by adopting the entity to be predicted and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; classifying each example in the plurality of examples to obtain a plurality of classification labels; and determining the attribution type of the entity to be predicted based on the plurality of classification labels.
Optionally, fig. 10 is a block diagram of another computer terminal according to an embodiment of the present invention, and as shown in fig. 10, the computer terminal may include: one or more processors 122 (only one of which is shown), memory 124, and peripherals interface 126.
The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the entity classification method and apparatus in the embodiments of the present invention, and the processor executes various functional applications and data processing by operating the software programs and modules stored in the memory, so as to implement the entity classification method. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and these remote memories may be connected to the computer terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring an entity to be predicted; constructing a plurality of instances by adopting the entity to be predicted and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; classifying each example in the plurality of examples to obtain a plurality of classification labels; and determining the attribution type of the entity to be predicted based on the plurality of classification labels.
Optionally, the processor may further execute the program code of the following steps: acquiring the entity profile and the entity attributes of the entity to be predicted; obtaining the type profile and the relationship attributes of each relationship type in the plurality of relationship types; and constructing the plurality of instances using the entity profile, the entity attributes, the type profiles, and the relationship attributes.
Optionally, the processor may further execute the program code of the following steps: constructing the plurality of instances using the entity profile, the entity attributes, the type profile, and the relationship attributes includes: constructing the text portion using the entity profile and the type profile; constructing the question portion using the entity attributes and the relationship attributes; and constructing the answer portion using a plurality of preset options.
Optionally, the processor may further execute the program code of the following steps: classifying each of the plurality of instances by using a reading understanding model to obtain the plurality of classification labels, wherein the reading understanding model is obtained by deep learning training by using a data set, and the data set is determined by the plurality of instances.
Optionally, the processor may further execute the program code of the following steps: dividing the plurality of instances into first partial instances and second partial instances, wherein the first partial instances are positive instances among the plurality of instances and the second partial instances are negative instances among the plurality of instances; performing negative example sampling at different proportions based on the degree of type similarity between the first partial instances and the second partial instances to obtain a negative example sampling result, wherein the degree of type similarity is directly proportional to the negative example sampling proportion; determining the data set using the first partial instances and part of the negative example sampling result; and training the reading understanding model by using the data set as training data.
Optionally, the processor may further execute the program code of the following steps: in the input layer, converting the text portion in the data set into a text sequence, converting the question portion in the data set into a question sequence, and splicing the text sequence and the question sequence into a target sequence, wherein the target sequence is the input sequence of the encoding layer; in the encoding layer, encoding the target sequence and outputting a target vector, wherein the target vector is used for representing the relevance between the text portion in the data set and the question portion in the data set; and in the output layer, mapping the target vector to an entity type probability distribution through a fully connected network.
Optionally, the processor may further execute the program code of the following steps: when the values of the plurality of classification labels are not all the first values, determining the attribution type of the entity to be predicted by using the classification label with the value of the second value in the plurality of classification labels; and when the values of the plurality of classification labels are all the first numerical values, determining the attribution type of the entity to be predicted by using the classification label with the highest prediction probability in the plurality of classification labels.
The processor can call the information and application programs stored in the memory through the transmission device to execute the following steps: receiving an entity to be predicted from a client; constructing a plurality of instances using the entity to be predicted and a plurality of relationship types, classifying each instance in the plurality of instances to obtain a plurality of classification labels, and determining the attribution type of the entity to be predicted based on the plurality of classification labels, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; and feeding back the attribution type of the entity to be predicted to the client.
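By way of illustration only, the sketch below shows how such a client-server interaction could be exposed as a simple HTTP service using Flask; the endpoint path, the RELATIONSHIP_TYPES list and the classify_instance helper are hypothetical, and build_instances and resolve_category refer to the sketches given earlier. This is one possible realization, not the implementation disclosed here.

```python
from flask import Flask, jsonify, request

# build_instances and resolve_category are the sketches given earlier in this description;
# classify_instance and RELATIONSHIP_TYPES are assumed to be provided by the deployment.
app = Flask(__name__)

@app.route("/classify", methods=["POST"])
def classify_entity():
    # receive the entity to be predicted from the client
    entity = request.get_json()["entity"]
    # build instances, classify each one, and resolve the attribution type
    instances = build_instances(entity, RELATIONSHIP_TYPES)
    labels = [classify_instance(inst) for inst in instances]
    category = resolve_category(labels)
    # feed the attribution type back to the client
    return jsonify({"category": category})

if __name__ == "__main__":
    app.run()
```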
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a knowledge entity to be predicted from a knowledge graph; building a plurality of instances using the knowledge entity and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; classifying each instance in the multiple instances to obtain multiple classification labels; an entity type of the knowledge entity is determined based on the plurality of classification tags.
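As a further illustration, a sketch of the knowledge graph variant is given below; the untyped_entities and set_entity_type methods of the knowledge graph object, and the helper functions passed in, are assumptions made for this sketch of the automated knowledge-production flow.

```python
def classify_knowledge_entities(knowledge_graph, relationship_types,
                                classify_instance, resolve_category):
    """Assign an entity type to every untyped knowledge entity in a knowledge graph.

    `knowledge_graph` is assumed to expose an `untyped_entities()` iterator and a
    `set_entity_type(entity_id, entity_type)` method; both are assumptions.
    """
    for entity in knowledge_graph.untyped_entities():
        # build one instance per candidate relationship type and classify each one
        instances = build_instances(entity, relationship_types)
        labels = [classify_instance(inst) for inst in instances]
        # write the resolved entity type back into the knowledge graph
        knowledge_graph.set_entity_type(entity["id"], resolve_category(labels))
```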
Optionally, the processor may further execute the program code of the following steps: and classifying each instance in the multiple instances by using a reading understanding model to obtain multiple classification labels, wherein the reading understanding model is obtained by deep learning training by using a data set, and the data set is determined by the multiple instances.
The embodiment of the invention provides a scheme for entity classification: an entity to be predicted is acquired; a plurality of instances are constructed using the entity to be predicted and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; each instance in the plurality of instances is classified to obtain a plurality of classification labels; and the attribution type of the entity to be predicted is determined based on the plurality of classification labels.
It is easy to note that the plurality of classification labels are obtained by classifying the plurality of instances constructed in advance from the entity to be predicted and the plurality of relationship types, and the attribution type of the entity to be predicted is determined based on these classification labels. In this way, a more appropriate risk type can be assigned and the entity to be predicted can be classified more accurately, which improves the efficiency and the accuracy of entity risk classification. This solves the technical problems of low entity risk auditing efficiency and low accuracy caused by the variable risk types in the entity risk classification process and the complex and difficult basis for judging entity risk types. As a result, in the automated production flow of knowledge graph knowledge, the risk types of different knowledge entities are accurately identified, and the efficiency and the accuracy of entity risk classification are effectively improved.
It can be understood by those skilled in the art that the structure shown in fig. 10 is only an illustration, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 10 does not limit the structure of the electronic device described above. For example, the computer terminal may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 10, or have a configuration different from that shown in fig. 10.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the computer-readable storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Embodiments of a computer-readable storage medium are also provided according to embodiments of the present invention. Optionally, in this embodiment, the storage medium may be configured to store the program code executed by the entity classification method provided in embodiment 1.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring an entity to be predicted; constructing a plurality of instances by adopting the entity to be predicted and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; classifying each example in the plurality of examples to obtain a plurality of classification labels; and determining the attribution type of the entity to be predicted based on the plurality of classification labels.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring the entity profile and the entity attributes of the entity to be predicted; obtaining the type profile and the relationship attributes of each relationship type in the plurality of relationship types; and constructing the plurality of instances using the entity profile, the entity attributes, the type profiles, and the relationship attributes.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: constructing the plurality of instances using the entity profile, the entity attributes, the type profile, and the relationship attributes includes: constructing the text portion using the entity profile and the type profile; constructing the question portion using the entity attributes and the relationship attributes; and constructing the answer portion using a plurality of preset options.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: classifying each of the plurality of instances by using a reading understanding model to obtain the plurality of classification labels, wherein the reading understanding model is obtained by deep learning training by using a data set, and the data set is determined by the plurality of instances.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: dividing the plurality of instances into first partial instances and second partial instances, wherein the first partial instances are positive instances among the plurality of instances and the second partial instances are negative instances among the plurality of instances; performing negative example sampling at different proportions based on the degree of type similarity between the first partial instances and the second partial instances to obtain a negative example sampling result, wherein the degree of type similarity is directly proportional to the negative example sampling proportion; determining the data set using the first partial instances and part of the negative example sampling result; and training the reading understanding model by using the data set as training data.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: in the input layer, converting the text portion in the data set into a text sequence, converting the question portion in the data set into a question sequence, and splicing the text sequence and the question sequence into a target sequence, wherein the target sequence is the input sequence of the encoding layer; in the encoding layer, encoding the target sequence and outputting a target vector, wherein the target vector is used for representing the relevance between the text portion in the data set and the question portion in the data set; and in the output layer, mapping the target vector to an entity type probability distribution through a fully connected network.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: when the values of the plurality of classification labels are not all the first values, determining the attribution type of the entity to be predicted by using the classification label with the value of the second value in the plurality of classification labels; and when the values of the plurality of classification labels are all the first numerical values, determining the attribution type of the entity to be predicted by using the classification label with the highest prediction probability in the plurality of classification labels.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: receiving an entity to be predicted from a client; constructing a plurality of instances using the entity to be predicted and a plurality of relationship types, classifying each instance in the plurality of instances to obtain a plurality of classification labels, and determining the attribution type of the entity to be predicted based on the plurality of classification labels, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; and feeding back the attribution type of the entity to be predicted to the client.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a knowledge entity to be predicted from a knowledge graph; building a plurality of instances using the knowledge entity and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion; classifying each instance in the multiple instances to obtain multiple classification labels; an entity type of the knowledge entity is determined based on the plurality of classification tags.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: and classifying each instance in the multiple instances by using a reading understanding model to obtain multiple classification labels, wherein the reading understanding model is obtained by deep learning training by using a data set, and the data set is determined by the multiple instances.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the above methods according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention.

Claims (14)

1. An entity classification method, comprising:
acquiring an entity to be predicted;
constructing a plurality of instances by adopting the entity to be predicted and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion;
classifying each instance in the multiple instances to obtain multiple classification labels;
determining an attribution type of the entity to be predicted based on the plurality of classification labels.
2. The entity classification method according to claim 1, wherein constructing the plurality of instances using the entity to be predicted and the plurality of relationship types comprises:
acquiring an entity profile and entity attributes of the entity to be predicted;
obtaining a type profile and relationship attributes of each relationship type in the plurality of relationship types;
building the plurality of instances using the entity profile, the entity attributes, the type profile, and the relationship attributes.
3. The entity classification method of claim 2, wherein constructing the plurality of instances using the entity profile, the entity attributes, the type profile, and the relationship attributes comprises:
constructing the text portion using the entity profile and the type profile;
constructing the question portion using the entity attributes and the relationship attributes;
and constructing the answer portion using a plurality of preset options.
4. The entity classification method according to claim 1, wherein the classifying each of the plurality of instances to obtain the plurality of classification labels comprises:
and classifying each instance in the multiple instances by using a reading understanding model to obtain the multiple classification labels, wherein the reading understanding model is obtained by deep learning training by using a data set, and the data set is determined by the multiple instances.
5. The entity classification method according to claim 4, further comprising:
dividing the plurality of instances into a first partial instance and a second partial instance, wherein the first partial instance is a positive instance of the plurality of instances and the second partial instance is a negative instance of the plurality of instances;
performing negative example sampling at different proportions based on a degree of type similarity between the first partial instance and the second partial instance to obtain a negative example sampling result, wherein the degree of type similarity is directly proportional to the negative example sampling proportion;
determining the data set by using the first partial instance and a partial sampling result in the negative example sampling result;
and training to obtain the reading understanding model by using the data set as training data.
6. The entity classification method according to claim 5, wherein the reading understanding model comprises: an input layer, a coding layer and an output layer, and the entity classification method further comprises the following steps:
in the input layer, converting a text part in the data set into a text sequence, converting a question part in the data set into a question sequence, and splicing the text sequence and the question sequence into a target sequence, wherein the target sequence is an input sequence of the coding layer;
in the coding layer, performing coding processing on the target sequence and outputting a target vector, wherein the target vector is used for representing the relevance between a text part in the data set and a question part in the data set;
in the output layer, the target vector outputs an entity type probability distribution over a fully connected network.
7. The entity classification method according to claim 1, wherein determining the home category of the entity to be predicted based on the plurality of classification labels comprises:
when the values of the plurality of classification labels are not all the first values, determining the attribution type of the entity to be predicted by using the classification label with the value of the second value in the plurality of classification labels;
and when the values of the plurality of classification labels are all the first numerical values, determining the attribution type of the entity to be predicted by using the classification label with the highest prediction probability in the plurality of classification labels.
8. An entity classification method, comprising:
receiving an entity to be predicted from a client;
constructing a plurality of instances by adopting the entity to be predicted and a plurality of relationship types, classifying each instance in the plurality of instances to obtain a plurality of classification labels, and determining the attribution type of the entity to be predicted based on the plurality of classification labels, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion;
and feeding back the attribution type of the entity to be predicted to the client.
9. An entity classification method, comprising:
acquiring a knowledge entity to be predicted from a knowledge graph;
constructing a plurality of instances using the knowledge entity and a plurality of relationship types, wherein each instance of the plurality of instances comprises: a text portion, a question portion and an answer portion;
classifying each instance in the multiple instances to obtain multiple classification labels;
determining an entity type for the knowledge entity based on the plurality of classification tags.
10. The entity classification method according to claim 9, wherein the classifying each of the plurality of instances to obtain the plurality of classification labels comprises:
and classifying each instance in the multiple instances by using a reading understanding model to obtain the multiple classification labels, wherein the reading understanding model is obtained by deep learning training by using a data set, and the data set is determined by the multiple instances.
11. An entity classification apparatus, comprising:
the acquisition module is used for acquiring an entity to be predicted;
a construction module, configured to construct a plurality of instances by using the entity to be predicted and a plurality of relationship types, where each of the plurality of instances includes: a text portion, a question portion and an answer portion;
the processing module is used for carrying out classification processing on each example in the plurality of examples to obtain a plurality of classification labels;
and the classification module is used for determining the attribution type of the entity to be predicted based on the plurality of classification labels.
12. A storage medium comprising a stored program, wherein the program, when executed, controls a device on which the storage medium is located to perform the entity classification method of any one of claims 1 to 10.
13. A processor, configured to run a program, wherein the program when running performs the entity classification method of any one of claims 1 to 10.
14. An electronic device, comprising:
a processor; and
a memory coupled to the processor for providing instructions to the processor for processing the following processing steps:
step 1, acquiring an entity to be predicted;
step 2, constructing a plurality of instances by adopting the entity to be predicted and a plurality of relationship types, wherein each instance in the plurality of instances comprises: a text portion, a question portion and an answer portion;
step 3, classifying each instance in the multiple instances to obtain multiple classification labels;
and 4, determining the attribution type of the entity to be predicted based on the plurality of classification labels.
CN202111301031.7A 2021-11-04 2021-11-04 Entity classification method, device, storage medium, processor and electronic device Pending CN114201603A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111301031.7A CN114201603A (en) 2021-11-04 2021-11-04 Entity classification method, device, storage medium, processor and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111301031.7A CN114201603A (en) 2021-11-04 2021-11-04 Entity classification method, device, storage medium, processor and electronic device

Publications (1)

Publication Number Publication Date
CN114201603A true CN114201603A (en) 2022-03-18

Family

ID=80646787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111301031.7A Pending CN114201603A (en) 2021-11-04 2021-11-04 Entity classification method, device, storage medium, processor and electronic device

Country Status (1)

Country Link
CN (1) CN114201603A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165309A (en) * 2018-08-06 2019-01-08 北京邮电大学 Negative training sample acquisition method, device and model training method, device
CN110276075A (en) * 2019-06-21 2019-09-24 腾讯科技(深圳)有限公司 Model training method, name entity recognition method, device, equipment and medium
CN112287095A (en) * 2020-12-30 2021-01-29 中航信移动科技有限公司 Method and device for determining answers to questions, computer equipment and storage medium
CN112464647A (en) * 2020-11-23 2021-03-09 北京智源人工智能研究院 Recommendation system-oriented negative sampling method and device and electronic equipment
CN112686044A (en) * 2021-01-18 2021-04-20 华东理工大学 Medical entity zero sample classification method based on language model
CN113221573A (en) * 2021-05-31 2021-08-06 平安科技(深圳)有限公司 Entity classification method and device, computing equipment and storage medium
CN114238571A (en) * 2021-12-15 2022-03-25 平安科技(深圳)有限公司 Model training method, knowledge classification method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination