CN115168573A - Entity relationship processing method, device and computer readable storage medium - Google Patents

Info

Publication number
CN115168573A
Authority
CN
China
Prior art keywords
target
entity
text
entities
relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210720266.8A
Other languages
Chinese (zh)
Inventor
于翠楠
王飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Suikun Intelligent Technology Co ltd
Original Assignee
Nanjing Suikun Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Primary Health Care (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Public Health (AREA)
  • Medicinal Chemistry (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an entity relationship processing method, an entity relationship processing device, and a computer-readable storage medium. The method comprises: acquiring a target text; extracting a plurality of target entities corresponding to pharmacokinetic parameters from the target text; determining a plurality of target entity relationships formed by pairs of target entities; determining a target entity relationship type corresponding to the plurality of target entity relationships; and determining, based on the target entity relationship type, target entity relationship chains corresponding to the plurality of target entities in the target text. The invention solves the technical problem that entity relationships in pharmacokinetics are difficult to extract.

Description

Entity relationship processing method, device and computer readable storage medium
Technical Field
The present invention relates to the field of entity relationship extraction, and in particular, to a method and an apparatus for processing an entity relationship, and a computer-readable storage medium.
Background
In the related art, entity extraction is usually performed by sequence labeling or pointer labeling, and relationship extraction is performed by a pipeline method or by joint extraction. When these methods are used to extract entity relationships from pharmacokinetic text, however, data labeling is difficult, labeled text is costly to obtain, and the models are difficult to train.
Therefore, in the related art, there is a technical problem that extraction of entity relationships in pharmacokinetics is difficult.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a method and a device for processing entity relationships and a computer readable storage medium, which at least solve the technical problem of difficult extraction of entity relationships in pharmacokinetics.
According to an aspect of the embodiments of the present invention, there is provided an entity relationship processing method, including: acquiring a target text; extracting a plurality of target entities corresponding to pharmacokinetic parameters from a target text; determining a plurality of target entity relationships formed by pairwise target entities; determining target entity relationship types corresponding to a plurality of target entity relationships; and determining target entity relationship chains corresponding to a plurality of target entities in the target text based on the target entity relationship types.
Optionally, obtaining the target text includes: acquiring an initial text; recognizing the initial text by adopting a text recognition model to obtain a recognition result, wherein the text recognition model is obtained by training a plurality of groups of sample data, and the plurality of groups of sample data comprise: a sample text, and identification information identifying whether the sample text is a valid text including pharmacokinetic parameters; in the case where the recognition result identifies the initial text as valid text including pharmacokinetic parameters, the initial text is determined to be the target text.
Optionally, extracting a plurality of target entities corresponding to the pharmacokinetic parameters from the target text includes: extracting the plurality of target entities corresponding to pharmacokinetic parameters from the target text by using an entity training model, wherein the entity training model is obtained by training a pre-training model based on hyper-parameters, and the hyper-parameters are obtained by validating the pre-training model using k-fold cross-validation.
Optionally, determining a plurality of target entity relationships formed by pairs of target entities includes: combining the target entities pairwise to obtain a plurality of pairwise combinations of the target entities; and classifying the pairwise combinations using a relationship classification model to obtain the plurality of target entity relationships, wherein the pairwise combinations of target entities have predetermined labeled relationships.
Optionally, determining a target entity relationship type corresponding to the target entity relationships includes: generating a relationship graph among a plurality of target entities based on the plurality of target entity relationships; and searching a target entity relation type corresponding to the target entity relations from the plurality of preset entity relation types based on the relation graph.
Optionally, searching a target entity relationship type corresponding to a plurality of target entity relationships from a plurality of predetermined entity relationship types based on the relationship graph includes: determining the matching degree between the relationship graph and a plurality of preset entity relationship types respectively, wherein the matching degree is determined based on the similarity of the relationship structure between the relationship graph and the preset entity relationship types; and determining a target entity relationship type from a plurality of predetermined entity relationship types based on the matching degree.
Optionally, the pharmacokinetic parameters include at least one of: medicine, parameter index, parameter value, test group and administration mode.
According to another aspect of the embodiments of the present invention, there is also provided an entity relationship processing apparatus, including: the acquisition module is used for acquiring a target text; the extraction module is used for extracting a plurality of target entities corresponding to the pharmacokinetic parameters from the target text; the first determining module is used for determining a plurality of target entity relationships formed by pairwise target entities; the second determining module is used for determining target entity relationship types corresponding to the target entity relationships; and the third determining module is used for determining target entity relationship chains corresponding to a plurality of target entities in the target text based on the target entity relationship types.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus where the computer-readable storage medium is located is controlled to execute any one of the entity relationship processing methods described above.
According to another aspect of the embodiments of the present invention, there is also provided a computer device, including: a memory and a processor, the memory storing a computer program; and a processor for executing the computer program stored in the memory, wherein the computer program causes the processor to execute any one of the entity relationship processing methods described above when the computer program runs.
In the embodiments of the invention, a plurality of target entities corresponding to pharmacokinetic parameters are extracted from a target text, and the target entity relationship type between every two target entities is determined, so that all one-to-one entity relationships in the target text can be identified. A complete target entity relationship chain can then be determined from these one-to-one relationships. Entity relationships are thereby extracted directly from the target text, which solves the technical problem that entity relationships in pharmacokinetics are difficult to extract.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram of an entity relationship processing method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of entity relationship type 1, according to an alternative embodiment of the present invention;
FIG. 3 is a schematic diagram of entity relationship type 2 in accordance with an alternative embodiment of the present invention;
FIG. 4 is a schematic diagram of entity relationship type 3 in accordance with an alternative embodiment of the present invention;
FIG. 5 is a schematic diagram of entity relationship type 4 in accordance with an alternative embodiment of the present invention;
FIG. 6 is a schematic diagram of entity relationship type 5 in accordance with an alternative embodiment of the present invention;
FIG. 7 is a schematic diagram of entity relationship type 6 in accordance with an alternative embodiment of the present invention;
FIG. 8 is a schematic diagram of entity relationship type 7 in accordance with an alternative embodiment of the present invention;
FIG. 9 is a schematic diagram of entity relationship type 8 in accordance with an alternative embodiment of the present invention;
FIG. 10 is a schematic diagram of entity relationship type 9 in accordance with an alternative embodiment of the present invention;
FIG. 11 is a structural block diagram of an entity relationship processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Description of the terms
Pharmacokinetics: a discipline that applies mathematical principles and methods to quantitatively study the absorption, distribution, metabolism, and excretion of drugs in organisms, and to describe how blood drug concentration changes over time.
Entity extraction: a subtask of information extraction that extracts predefined entity information, such as times and places, from text data.
Relationship extraction: determining the relationships between pairs of entities in a text.
k-fold cross-validation: a common method for model evaluation, also called rotation estimation. The original data is divided into k subsets. Each subset serves once as the validation set while the remaining k-1 subsets are used as the training set, which yields k models. Each model is evaluated on its own validation set, and the k errors are averaged to obtain the cross-validation error. Cross-validation makes effective use of limited data, and its result approximates the model's performance on the test set, so it can serve as a criterion for model optimization.
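The k-fold procedure defined above can be sketched in plain Python. This is a minimal illustration rather than the patent's implementation: the helper names `k_fold_indices` and `cross_validation_error`, and the `train_and_score` callback that trains on one index set and returns the error on the other, are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k (train, validation) pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread samples as evenly as possible: the first `extra` folds get one more.
    base, extra = divmod(n_samples, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for i, val in enumerate(folds):
        # Every fold except fold i goes into the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits


def cross_validation_error(
    n_samples: int,
    k: int,
    train_and_score: Callable[[Sequence[int], Sequence[int]], float],
) -> float:
    """Average the validation error returned for each of the k folds."""
    errors = [train_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(errors) / len(errors)
```

In practice the samples would normally be shuffled before splitting; that step is left out here to keep the sketch deterministic.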
According to an embodiment of the present invention, a method embodiment of entity relationship processing is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps may be performed in an order different from the one shown or described here.
Fig. 1 is a flowchart of an entity relationship processing method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
step S102, acquiring a target text;
step S104, extracting a plurality of target entities corresponding to pharmacokinetic parameters from the target text;
step S106, determining a plurality of target entity relations formed by pairwise target entities;
step S108, determining target entity relationship types corresponding to a plurality of target entity relationships;
step S110, based on the target entity relationship type, determining a target entity relationship chain corresponding to a plurality of target entities in the target text.
Through the above steps, a plurality of target entities corresponding to pharmacokinetic parameters are extracted from the target text, and the target entity relationship type between every two target entities is determined, so that all one-to-one entity relationships in the target text can be identified. A complete target entity relationship chain can then be determined from these one-to-one relationships. Entity relationships are thereby extracted directly from the target text, which solves the technical problem that entity relationships in pharmacokinetics are difficult to extract.
As an alternative embodiment, the target text may be obtained in various ways, for example: acquiring an initial text; recognizing the initial text using a text recognition model to obtain a recognition result, wherein the text recognition model is trained on a plurality of groups of sample data, each group comprising a sample text and identification information indicating whether the sample text is a valid text including pharmacokinetic parameters; and, in the case where the recognition result identifies the initial text as a valid text including pharmacokinetic parameters, determining the initial text to be the target text. Recognizing the initial text with the trained text recognition model and taking only valid texts that include pharmacokinetic parameters as target texts screens the text content before entity extraction and relationship extraction. Texts that include no pharmacokinetic parameters are not labeled or processed needlessly, which improves the efficiency of entity labeling and extraction.
As an alternative embodiment, the plurality of target entities corresponding to pharmacokinetic parameters may be extracted from the target text in various ways, for example: extracting the plurality of target entities from the target text by using an entity training model, wherein the entity training model is obtained by training a pre-training model based on hyper-parameters, and the hyper-parameters are obtained by validating the pre-training model using k-fold cross-validation. Introducing k-fold cross-validation into the training process verifies the effectiveness of the model and determines the training hyper-parameters. The entity training model is then obtained using the hyper-parameters determined by cross-validation, which ensures the quality of its entity extraction and labeling.
As an alternative embodiment, the plurality of target entity relationships formed by pairs of target entities may be determined in various ways, for example: combining the target entities pairwise to obtain a plurality of pairwise combinations of the target entities; and classifying the pairwise combinations using a relationship classification model to obtain the plurality of target entity relationships, wherein the pairwise combinations of target entities have predetermined labeled relationships. For example, for entities A, B, and C, the pairwise combinations "A-B" and "B-C" are formed, and classifying them gives the relationship type of each pair: "A-B" may be the relationship between a dosage and a drug, and "B-C" the relationship between a drug and a mode of administration.
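The pairwise combination step can be sketched with `itertools.combinations` from the Python standard library. The `Entity` tuple and the `candidate_pairs` helper are assumptions made for illustration; the label names follow the abbreviations used in the figures (DRUG, PARAM, VALUE, GROUP, ADMIN).

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    text: str
    label: str  # e.g. "DRUG", "PARAM", "VALUE", "GROUP", "ADMIN"
    start: int  # character offset of the entity in the source text


def candidate_pairs(entities: List[Entity]) -> List[Tuple[Entity, Entity]]:
    """Return every unordered pair once, with the earlier entity first."""
    ordered = sorted(entities, key=lambda e: e.start)
    return list(combinations(ordered, 2))
```

For n extracted entities this produces n(n-1)/2 candidate pairs, each of which would then be passed to the relationship classification model.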
As an alternative embodiment, the target entity relationship type corresponding to the plurality of target entity relationships may be determined in various ways, for example: generating a relationship graph among the plurality of target entities based on the plurality of target entity relationships; and looking up, based on the relationship graph, the target entity relationship type corresponding to the target entity relationships from a plurality of predetermined entity relationship types. Because the target entity relationship type can be looked up directly from the predetermined types using the relationship graph among the target entities, the type is determined directly from the graph, which greatly reduces the complexity of relationship computation.
As an alternative embodiment, the target entity relationship type corresponding to the plurality of target entity relationships may be looked up from the plurality of predetermined entity relationship types based on the relationship graph in various ways, for example: determining the matching degree between the relationship graph and each of the predetermined entity relationship types, wherein the matching degree is determined based on the similarity of relationship structure between the relationship graph and the predetermined entity relationship type; and determining the target entity relationship type from the plurality of predetermined entity relationship types based on the matching degree. The matching degree shows which predetermined entity relationship type the relationship graph is closest to. Because the matching degree is determined from the relationship structure, the target entity relationship type can be determined accurately.
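The text does not specify how structural similarity is measured. As one possible reading, the sketch below reduces the relationship graph and each predetermined type to a set of label-level edges and uses Jaccard similarity as the matching degree. The similarity measure, the helper names, and the example templates in the usage note are all assumptions, not the patent's definition.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (entity label, entity label), e.g. ("PARAM", "VALUE")


def matching_degree(graph_edges: Set[Edge], template_edges: Set[Edge]) -> float:
    """Jaccard similarity between two sets of label-level edges."""
    union = graph_edges | template_edges
    if not union:
        return 1.0  # two empty structures are treated as identical
    return len(graph_edges & template_edges) / len(union)


def best_type(graph_edges: Set[Edge], templates: Dict[str, Set[Edge]]) -> str:
    """Return the name of the template with the highest matching degree."""
    return max(templates, key=lambda name: matching_degree(graph_edges, templates[name]))
```

With two hypothetical templates, one containing the edges PARAM-DRUG and PARAM-VALUE and another that also contains GROUP-VALUE, a graph with all three edges would match the second template with degree 1.0 and the first with degree 2/3.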
As an alternative embodiment, the pharmacokinetic parameters include at least one of: medicine, parameter index, parameter value, test group and administration mode. By setting the pharmacokinetic parameters, the relation among the pharmacokinetic parameters can be directly and accurately determined from the target text, and the extraction of the pharmacokinetic entity relation can be efficiently completed.
Based on the above embodiments and alternative embodiments, the present invention proposes an alternative implementation, which is described below.
Entity extraction and relationship extraction are common tasks in natural language processing. The main methods for entity extraction are sequence labeling and pointer labeling. Sequence labeling marks each word in the text with the entity it belongs to, or marks it as belonging to no entity. Pointer labeling locates the start and end positions of each entity. The main methods for relationship extraction are joint extraction and the pipeline method. Joint extraction extracts entities and relationships at the same time, whereas the pipeline method first extracts the entities and then extracts the relationships between them.
Sequence labeling and pointer labeling for entity extraction, and joint extraction and the pipeline method for relationship extraction, are mainly implemented with deep learning. Deep learning methods usually need large amounts of labeled data to achieve good results. For pharmacokinetic text, however, labeled data is scarce and costly to obtain, and the recall and precision of the models are low. For relationship extraction, the main problems are that labeling relationship data is complex and that obtaining a large amount of labeled text is costly. For joint extraction models, the main problems are high parameter complexity and difficult training.
To address the high data labeling cost and low model accuracy of deep learning, the invention provides a method that combines a pre-screening model with multi-task training.
First, because entities are sparse in pharmacokinetic text, a classification model is trained on a small amount of data labeled with whether each text contains valid entities. This model is used to pre-screen texts and remove those that clearly contain no valid entity. The model uses a pre-trained model as its base model to extract text features and adds a fully connected layer for classification. It is trained on the labeled data, and the weights of the cross-entropy loss function are adjusted to balance precision and recall, so that recall is raised as far as possible while precision remains acceptable. k-fold cross-validation is used to verify the effectiveness of the model and to determine the training hyper-parameters. The model is then trained on all training data with those hyper-parameters to obtain the pre-screening model. The pre-screening model screens a large amount of unlabeled text that may contain entities and removes the texts that clearly contain no valid entity. The texts that the model judges to contain valid entities are then labeled manually.
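One common way to trade precision for recall through the loss weights is to give the positive class a larger weight in a binary cross-entropy loss, so that missing a text with valid entities costs more than wrongly keeping one without. The sketch below illustrates that idea only. The function name `weighted_bce` and the `pos_weight` parameter are assumptions, and the source does not state which weighting scheme is actually used.

```python
import math
from typing import Sequence


def weighted_bce(
    probs: Sequence[float],
    labels: Sequence[int],
    pos_weight: float = 1.0,
    eps: float = 1e-12,
) -> float:
    """Mean binary cross-entropy with an extra weight on positive examples.

    probs[i] is the predicted probability that text i contains a valid entity;
    labels[i] is 1 if it does and 0 otherwise.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```

Setting `pos_weight` above 1 increases the penalty for false negatives, which tends to push the classifier toward higher recall at some cost in precision.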
Because valid entities are sparse in pharmacokinetic text, this method reduces the time spent labeling texts that contain no valid entity and concentrates the labeling effort on texts that do. This improves the efficiency of entity labeling and yields a large amount of valid labeled data, which in turn improves the precision and recall of the model's predictions.
To address the problems that the relationships between entities in the relationship extraction task are complex and that the model is difficult to train, the invention provides a relationship chain extraction method based on an entity category relationship graph. The following description takes the extraction of pharmacokinetic parameter relationships as an example.
First, the entities in the text are extracted by an entity extraction model. The extracted entities are then combined pairwise, and a relationship classification model predicts the relationship between the two entities of each pair, including whether a relationship exists and, if so, its specific type. The relationship classification model is trained on relationship labeling data for pairs of entities. Finally, a graph search algorithm obtains the corresponding relationship chains from the start nodes, which completes the extraction of relationship chains between entities. The specific steps are as follows:
1. Entity extraction model training
When the entity extraction model is trained, the task of judging whether the text contains valid entities and the sequence labeling task are trained at the same time. That is, during sequence labeling training the model also judges whether the text being processed contains valid entities. The loss function of the model is the sum of the valid-entity text classification loss and the sequence labeling loss. Multi-task training lets the model identify texts that contain valid entities more accurately and label the entities in them, which further improves its accuracy. During training, k-fold cross-validation is used to verify the effectiveness of the model and to determine the training hyper-parameters. The model is then trained on all training data with those hyper-parameters to obtain the entity extraction model.
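The summed multi-task loss can be sketched as follows. The source states only that the total loss is the sum of the two task losses; the use of cross-entropy for both terms, the unweighted sum, and the function names are assumptions. Predicted probabilities are assumed to lie strictly between 0 and 1.

```python
import math
from typing import Sequence


def sequence_loss(token_probs: Sequence[Sequence[float]], tags: Sequence[int]) -> float:
    """Mean negative log-likelihood of the gold tag at each token."""
    return -sum(math.log(p[t]) for p, t in zip(token_probs, tags)) / len(tags)


def validity_loss(p_valid: float, is_valid: int) -> float:
    """Binary cross-entropy for 'does this text contain a valid entity?'."""
    return -math.log(p_valid if is_valid else 1.0 - p_valid)


def multitask_loss(
    token_probs: Sequence[Sequence[float]],
    tags: Sequence[int],
    p_valid: float,
    is_valid: int,
) -> float:
    """Sum of the sequence labeling loss and the text validity loss."""
    return sequence_loss(token_probs, tags) + validity_loss(p_valid, is_valid)
```

Because both terms are computed from the same text, a shared encoder would receive gradients from both tasks during training.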
2. Relationship classification model training
The relationship classification model is trained on relationship labeling data for pairs of entities. The training data is first augmented: all entities in a piece of text are paired with one another to form candidate relationships. Of the two entities in a candidate relationship, the one that appears earlier in the text is the subject entity and the one that appears later is the object entity. If the candidate relationship exists in the annotation with the same subject and object positions, it is recorded as a forward relationship; if it exists in the annotation with subject and object reversed, it is recorded as a reverse relationship; if it does not exist in the annotation, the two entities are recorded as having no relationship. All texts are processed in the same way to obtain the augmented training data, on which the relationship classification model is then trained.
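The augmentation rule above can be sketched as a small labeling function. The data layout (entities as `(start offset, text)` tuples, annotations as a set of `(subject, object)` pairs), the label strings, and the helper name `build_training_pairs` are assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Set, Tuple

Span = Tuple[int, str]        # (start offset in the text, entity text)
Relation = Tuple[Span, Span]  # annotated (subject entity, object entity)


def build_training_pairs(
    entities: List[Span], annotated: Set[Relation]
) -> List[Tuple[Span, Span, str]]:
    """Label every entity pair as 'forward', 'reverse', or 'none'."""
    examples = []
    # Sorting by offset makes `first` the entity that appears earlier in the text.
    for first, second in combinations(sorted(entities), 2):
        if (first, second) in annotated:
            label = "forward"   # annotation agrees with text order
        elif (second, first) in annotated:
            label = "reverse"   # annotated subject appears later in the text
        else:
            label = "none"      # pair is not annotated as related
        examples.append((first, second, label))
    return examples
```

Every pair receives a label, so unannotated pairs become explicit negative examples for the classifier.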
3. Entity extraction
The trained entity extraction model performs entity extraction on the text to obtain the entities it contains.
4. Inter-entity relationship identification
The entities extracted in step 3 are combined pairwise, and the relationship between the entities of each pair is identified by the relationship classification model trained in step 2. Once the relationships between all entity pairs in the text have been extracted, they form a relationship graph between the entities. Taking the entity relationships of pharmacokinetic parameters as an example, the parameter relationship type graphs take the following forms. In the figures, DRUG refers to a drug, PARAM is an abbreviation of parameter, VALUE refers to a specific value, GROUP refers to a test group, and ADMIN is an abbreviation of administration and refers to the mode of administration.
FIGS. 2 to 10 are schematic diagrams of entity relationship types 1 to 9, respectively, according to alternative embodiments of the present invention.
Take entity relationship type 1 in FIG. 2 as an example. As shown in FIG. 2, "average half-life", "propranolol", "men", "women", "1.83hr" and "2.1hr" are recognized as valid pharmacokinetic entities: "average half-life" is a parameter-type entity, "propranolol" is a drug-type entity, "men" and "women" are test-group-type entities, and "1.83hr" and "2.1hr" are value-type entities. The entity relationship between each pair of entities is then determined. As shown in the figure, an entity relationship exists between "average half-life" and "propranolol", between "average half-life" and "1.83hr", and between "average half-life" and "2.1hr", and relationships also exist between the test-group entities ("men", "women") and the value entities ("1.83hr", "2.1hr"). The target entity relationships between pairs of entities in the target text are thus determined, and the target entity relationship chain shown in the figure is then found according to these target entity relationships.
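The pairwise combination and graph construction of step 4 can be illustrated with a minimal sketch. The function names `is_related` and `build_relation_graph` and the table `ALLOWED_TYPE_PAIRS` are hypothetical; in particular, the type-based rule is only a stand-in for the trained relationship classification model, and unlike a real model it cannot tell which value belongs to which test group.

```python
from itertools import combinations

# Entities from the FIG. 2 example, as (text, type) pairs.
ENTITIES = [
    ("average half-life", "PARAM"),
    ("propranolol", "DRUG"),
    ("men", "GROUP"),
    ("women", "GROUP"),
    ("1.83hr", "VALUE"),
    ("2.1hr", "VALUE"),
]

# Hypothetical stand-in for the trained relationship classification model:
# a pair counts as related when its combination of entity types is allowed.
ALLOWED_TYPE_PAIRS = {
    frozenset({"PARAM", "DRUG"}),
    frozenset({"PARAM", "VALUE"}),
    frozenset({"GROUP", "VALUE"}),
}


def is_related(a, b):
    """Return True if the (assumed) type rule links entities a and b."""
    return frozenset({a[1], b[1]}) in ALLOWED_TYPE_PAIRS


def build_relation_graph(entities, classifier=is_related):
    """Combine entities pairwise and record every related pair as an undirected edge."""
    graph = {text: set() for text, _ in entities}
    for a, b in combinations(entities, 2):
        if classifier(a, b):
            graph[a[0]].add(b[0])
            graph[b[0]].add(a[0])
    return graph


graph = build_relation_graph(ENTITIES)
```

With six entities, `combinations` produces 15 candidate pairs, which is why the cost of this step grows quadratically with the number of entities in a text.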
5. Extracting entity relationship chains
The relationship chains are extracted by searching from all starting nodes in the entity relationship graph formed from the pairwise entity relationships obtained in step 4.
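A minimal sketch of such a search follows. The text does not define a starting node or an edge direction, so this sketch assumes a directed graph in which a starting node is one with no incoming edge and a chain is any path from a starting node to a node with no further successors. The function `extract_chains` is hypothetical, and the pairing of values with test groups in the sample edges is assumed purely for illustration.

```python
def extract_chains(edges):
    """Return every path from a starting node to a terminal node.

    edges maps a node to the list of nodes it points to (assumed direction).
    """
    targets = {t for successors in edges.values() for t in successors}
    nodes = set(edges) | targets
    # Assumption: a starting node is a node that no edge points to.
    starts = sorted(nodes - targets)
    chains = []

    def walk(node, path):
        path = path + [node]
        # Skip nodes already on the path so that cycles cannot loop forever.
        successors = [n for n in edges.get(node, []) if n not in path]
        if not successors:
            chains.append(path)
            return
        for nxt in successors:
            walk(nxt, path)

    for start in starts:
        walk(start, [])
    return chains


# Illustrative directed edges; the value-to-group pairing is an assumption.
EDGES = {
    "propranolol": ["average half-life"],
    "average half-life": ["1.83hr", "2.1hr"],
    "1.83hr": ["men"],
    "2.1hr": ["women"],
}

chains = extract_chains(EDGES)
for chain in chains:
    print(" -> ".join(chain))
```

Each printed chain reads as one complete parameter record: drug, parameter, value, test group.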
For extraction scenarios similar to pharmacokinetic parameter relationship extraction, where the relationship types among the entities in a text are well defined and limited in number, the relationship types can be labeled at the same time as the entities, so that a model can learn the relationship types of the entities in the text. The relationships between entities can therefore be extracted directly once the entities have been extracted.
Alternative embodiments of the invention have the following advantages:
1. Entity annotation data for pharmacokinetic parameter extraction take a long time and cost a great deal to obtain. The pre-screening method provided by the optional embodiment of the invention uses the valid-text pre-screening model to filter out, in advance, texts that contain no entities. This alleviates the sparsity of valid pharmacokinetic parameters, improves labeling efficiency, reduces the imbalance between valid and invalid entities, and improves the accuracy and recall of model prediction.
2. In pharmacokinetic parameter relationship extraction, relationship data are complex to annotate and costly to obtain, and models are difficult to train. The optional embodiment of the invention provides a relationship extraction method based on an entity relationship graph, which extracts relationships directly after the entities are extracted, without identifying the relationship between entity pairs.
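The pre-screening step named in the first advantage can be sketched as a simple filter placed in front of entity extraction. The keyword rule `looks_valid` and the hint list `PK_HINTS` are hypothetical stand-ins for the trained valid-text pre-screening model, which the text describes as a model trained on labeled sample texts rather than a keyword match.

```python
from typing import Callable, Iterable, List

# Hypothetical hints; the real pre-screening model is a trained classifier.
PK_HINTS = ("half-life", "clearance", "cmax", "auc")


def looks_valid(text: str) -> bool:
    """Stand-in classifier: flag a text that mentions any pharmacokinetic hint."""
    lowered = text.lower()
    return any(hint in lowered for hint in PK_HINTS)


def prescreen(texts: Iterable[str],
              is_valid: Callable[[str], bool] = looks_valid) -> List[str]:
    """Keep only the texts that the classifier marks as valid."""
    return [t for t in texts if is_valid(t)]


sample = [
    "The average half-life of propranolol was measured in men and women.",
    "Patients were recruited from three hospitals.",
    "Cmax was reached after two hours.",
]
kept = prescreen(sample)
```

Only the texts in `kept` would be passed on to annotation and entity extraction, which is where the saving in labeling effort comes from.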
In testing, the optional embodiment of the invention used the trained entity extraction model to extract entities from the raw data not screened by the pre-screening model (9.6 million texts in total) and obtained 19,464 texts containing valid entities. Entity extraction on the texts filtered by the pre-screening model (32,460 texts in total) yielded 15,511 texts containing valid entities. The test results show that the pre-screening model filters out a large number of texts without valid entities, which improves text labeling efficiency and model prediction accuracy. In addition, relationship extraction no longer requires labeled data or model training, which reduces the labeling cost and model training cost for relationship data.
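The reported figures can be turned into ratios with a few lines of arithmetic. This sketch assumes that the 32,460 filtered texts were drawn from the same 9.6 million (960 × 10,000) raw texts, which the passage implies but does not state; the variable names are illustrative.

```python
total_texts = 9_600_000      # raw texts, not pre-screened
hits_unscreened = 19_464     # raw texts found to contain valid entities
screened_texts = 32_460      # texts kept by the pre-screening model
hits_screened = 15_511       # kept texts found to contain valid entities

# Share of texts with valid entities before and after pre-screening.
density_before = hits_unscreened / total_texts
density_after = hits_screened / screened_texts

# Share of the valid-entity texts that survive pre-screening.
retained = hits_screened / hits_unscreened

# Share of the raw corpus that still has to be processed.
volume_kept = screened_texts / total_texts

print(f"density before: {density_before:.3%}")
print(f"density after:  {density_after:.1%}")
print(f"retained:       {retained:.1%}")
print(f"volume kept:    {volume_kept:.3%}")
```

Under that assumption, pre-screening keeps roughly 0.34% of the corpus while retaining about 79.7% of the texts with valid entities, and the share of useful texts rises from about 0.2% to about 47.8%.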
FIG. 11 is a structural block diagram of an entity relationship processing apparatus according to an embodiment of the present invention. As shown in FIG. 11, the apparatus includes an obtaining module 1101, an extracting module 1102, a first determining module 1103, a second determining module 1104 and a third determining module 1105.
The obtaining module 1101 is configured to obtain a target text. The extracting module 1102, connected to the obtaining module 1101, is configured to extract a plurality of target entities corresponding to pharmacokinetic parameters from the target text. The first determining module 1103, connected to the extracting module 1102, is configured to determine a plurality of target entity relationships, each formed between two of the target entities. The second determining module 1104, connected to the first determining module 1103, is configured to determine the target entity relationship type corresponding to the plurality of target entity relationships. The third determining module 1105, connected to the second determining module 1104, is configured to determine, based on the target entity relationship type, the target entity relationship chains corresponding to the plurality of target entities in the target text.
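One way to picture how the five modules connect is as a pipeline of callables, each feeding the next. This is only a structural sketch: the class `EntityRelationPipeline`, its field names, the type aliases and the stub functions in the usage example are all hypothetical, and the real modules would wrap the trained models described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Entity = Tuple[str, str]      # (entity text, entity type)
Relation = Tuple[str, str]    # (entity text, entity text)
Chain = List[str]


@dataclass
class EntityRelationPipeline:
    obtain: Callable[[str], str]                                    # module 1101
    extract: Callable[[str], List[Entity]]                          # module 1102
    determine_relations: Callable[[List[Entity]], List[Relation]]   # module 1103
    determine_type: Callable[[List[Relation]], str]                 # module 1104
    determine_chains: Callable[[List[Relation], str], List[Chain]]  # module 1105

    def run(self, raw: str) -> List[Chain]:
        """Pass the text through the five modules in the order they are connected."""
        text = self.obtain(raw)
        entities = self.extract(text)
        relations = self.determine_relations(entities)
        relation_type = self.determine_type(relations)
        return self.determine_chains(relations, relation_type)


# Usage with trivial stubs in place of the trained models.
pipeline = EntityRelationPipeline(
    obtain=str.strip,
    extract=lambda text: [("propranolol", "DRUG"), ("average half-life", "PARAM")],
    determine_relations=lambda ents: [(ents[0][0], ents[1][0])],
    determine_type=lambda rels: "type 1",
    determine_chains=lambda rels, rel_type: [list(r) for r in rels],
)
result = pipeline.run("  The average half-life of propranolol ...  ")
```

Keeping each module behind a plain callable mirrors the later remark that the division into units is logical and that units may be combined or split in an actual implementation.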
According to the embodiment of the present invention, a computer-readable storage medium is further provided, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus where the computer-readable storage medium is located is controlled to execute any one of the entity relationship processing methods described above.
According to an embodiment of the present invention, there is also provided a computer apparatus including: a memory and a processor, the memory storing a computer program; and a processor for executing the computer program stored in the memory, wherein the computer program causes the processor to execute any one of the entity relationship processing methods described above when the computer program runs. The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. An entity relationship processing method, comprising:
acquiring a target text;
extracting a plurality of target entities corresponding to pharmacokinetic parameters from the target text;
determining a plurality of target entity relations formed by two target entities;
determining target entity relationship types corresponding to the target entity relationships;
and determining target entity relationship chains corresponding to the target entities in the target text based on the target entity relationship types.
2. The method of claim 1, wherein obtaining the target text comprises:
acquiring an initial text;
recognizing the initial text by adopting a text recognition model to obtain a recognition result, wherein the text recognition model is obtained by training a plurality of groups of sample data, and the plurality of groups of sample data comprise: a sample text, and identification information identifying whether the sample text is a valid text including pharmacokinetic parameters;
determining the initial text as the target text in case that the recognition result identifies the initial text as a valid text including pharmacokinetic parameters.
3. The method of claim 1, wherein the extracting a plurality of target entities corresponding to pharmacokinetic parameters from the target text comprises:
and extracting a plurality of target entities corresponding to pharmacokinetic parameters from the target text by adopting an entity training model, wherein the entity training model is obtained by training a pre-training model based on a hyper-parameter, and the hyper-parameter is obtained by verifying the pre-training model based on k-fold cross-validation.
4. The method of claim 1, wherein determining the plurality of target entity relationships between two of the plurality of target entities comprises:
combining the target entities pairwise to obtain a plurality of pairwise combination relations included by the target entities;
and classifying the pairwise combination relations by adopting a relation classification model to obtain the target entity relations between the target entities, wherein the target entities have preset labeling relations.
5. The method of claim 1, wherein the determining the target entity relationship type corresponding to the plurality of target entity relationships comprises:
generating a relationship graph between the plurality of target entities based on the plurality of target entity relationships;
and searching a target entity relation type corresponding to the target entity relations from a plurality of preset entity relation types based on the relation graph.
6. The method according to claim 5, wherein the searching for a target entity relationship type corresponding to the target entity relationships from a plurality of predetermined entity relationship types based on the relationship graph comprises:
determining a degree of matching between the relationship graph and the plurality of predetermined entity relationship types respectively, wherein the degree of matching is determined based on similarity of relationship structures between the relationship graph and the plurality of predetermined entity relationship types;
and determining the target entity relationship type from the plurality of predetermined entity relationship types based on the matching degree.
7. The method of any one of claims 1 to 6, wherein the pharmacokinetic parameters include at least one of: medicine, parameter index, parameter value, test group and administration mode.
8. An entity relationship processing apparatus, comprising:
the acquisition module is used for acquiring a target text;
the extraction module is used for extracting a plurality of target entities corresponding to the pharmacokinetic parameters from the target text;
the first determining module is used for determining a plurality of target entity relations formed by two target entities;
a second determining module, configured to determine target entity relationship types corresponding to the multiple target entity relationships;
and a third determining module, configured to determine, based on the target entity relationship type, target entity relationship chains corresponding to the multiple target entities in the target text.
9. A computer-readable storage medium, comprising a stored program, wherein when the program is run, the program controls an apparatus in which the computer-readable storage medium is located to execute the entity relationship processing method according to any one of claims 1 to 7.
10. A computer device, comprising: a memory and a processor, wherein
the memory stores a computer program;
the processor is configured to execute the computer program stored in the memory, and when the computer program runs, the processor is enabled to execute the entity relationship processing method according to any one of claims 1 to 7.
CN202210720266.8A 2022-06-23 2022-06-23 Entity relationship processing method, device and computer readable storage medium Withdrawn CN115168573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210720266.8A CN115168573A (en) 2022-06-23 2022-06-23 Entity relationship processing method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210720266.8A CN115168573A (en) 2022-06-23 2022-06-23 Entity relationship processing method, device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115168573A true CN115168573A (en) 2022-10-11

Family

ID=83486435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210720266.8A Withdrawn CN115168573A (en) 2022-06-23 2022-06-23 Entity relationship processing method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115168573A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221011