CN112766903B - Method, device, equipment and medium for identifying adverse event - Google Patents

Publication number
CN112766903B
Authority
CN
China
Prior art keywords
text
word
adverse event
word segmentation
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110065632.6A
Other languages
Chinese (zh)
Other versions
CN112766903A (en
Inventor
赵奇
金毅
黄晞益
刘戈
朱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AstraZeneca Investment China Co Ltd
Original Assignee
AstraZeneca Investment China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AstraZeneca Investment China Co Ltd filed Critical AstraZeneca Investment China Co Ltd
Priority to CN202110065632.6A priority Critical patent/CN112766903B/en
Publication of CN112766903A publication Critical patent/CN112766903A/en
Application granted granted Critical
Publication of CN112766903B publication Critical patent/CN112766903B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure provides a method, apparatus, device, and medium for identifying adverse events. The method includes: selectively acquiring text to be identified from one or more data sources; selecting, according to the type of the text, a recognition model corresponding to that type; and performing semantic recognition on the text using the selected recognition model to identify adverse events in the text. With this method, different recognition models can be selected for different text types, and the text to be identified is then checked for adverse events using the selected model, so that manual screening is not needed. This avoids missed or delayed reports of adverse events and allows adverse events to be reported promptly and accurately. Places such as companies or hospitals that manufacture or use medicines or medical devices can thereby improve the accuracy and timeliness of adverse event reporting, and goals such as resource saving, efficiency improvement, and empowerment are achieved across the whole process of adverse event identification and reporting.

Description

Method, device, equipment and medium for identifying adverse event
Technical Field
The present disclosure relates to the field of medicine, and more particularly, to a method, apparatus, device, and medium for identifying adverse events.
Background
In recent years, as regulations have become stricter, the requirements for reporting adverse events (AEs) have risen steadily. The national drug administration requires drug marketing authorization holders to establish a sound adverse drug event monitoring system and to report adverse events promptly. Overdue reporting of adverse events may lead to suspension of a product or even revocation of the drug approval certificate. Pharmaceutical companies therefore now require all staff to report an adverse event to the pharmacovigilance department on the day they learn of it, so that product safety can be assessed in time and patient safety ensured.
In addition, places where medicines or medical devices are used (such as hospitals) see more and more adverse events. These include events arising from drug exposure during pregnancy (maternal or paternal), drug exposure during lactation, overdose, drug abuse, misuse, adverse events accompanying overdose or misuse, occupational exposure, lack of efficacy and disease progression, exposure to pathogens, drug interactions, medical device malfunction, death from unknown causes, suicide or attempted suicide, unexpected benefits, and so on. If such adverse events are not reported promptly and effectively to the relevant pharmacovigilance department, these places face serious risks such as compensation claims, accountability, and even shutdown, and the safety of patients or users is difficult to ensure.
At present, adverse events are identified by manual screening. However, as the sources of adverse event information become more complex and diverse, more and more people are needed for manual screening. Because existing human resources are limited, the compliance risk of adverse event reporting is increasingly prominent.
With complex and diverse information sources and limited human resources, the need to avoid missed or delayed AE reports and to report AEs promptly and accurately is also growing.
Therefore, a method for automatically identifying adverse events is needed that can handle different adverse event sources separately, so as to avoid missed or delayed AE reports and to report AEs promptly and accurately.
Disclosure of Invention
In view of the above problems, the present disclosure provides a method for identifying adverse events. The method can handle different adverse event sources separately, so as to avoid missed or delayed AE reports and to report AEs promptly and accurately. This helps places such as companies or hospitals that manufacture or use medicines or medical devices to improve the accuracy and timeliness of adverse event reporting, and achieves goals such as resource saving, efficiency improvement, and empowerment across the whole process of AE identification and reporting.
An embodiment of the present disclosure provides a method for identifying adverse events, which comprises the following steps: selectively acquiring text to be identified from one or more data sources; selecting, according to the type of the text, a recognition model corresponding to the type of the text; and performing semantic recognition on the text using the selected recognition model to identify adverse events in the text.
According to an embodiment of the present disclosure, the type of the text is determined based on the length of the text and/or the source of the text.
According to an embodiment of the present disclosure, selecting, according to the type of the text, a recognition model corresponding to the type of the text includes: in the case that the type of the text is a first type, selecting a first recognition model. The first recognition model comprises a word segmenter, a converter, a feature extractor, and a classifier, wherein the word segmenter segments sentences in the text, the converter converts the segmentation results into vector sequences, the feature extractor extracts semantic features based on the vector sequences, and the classifier judges, based on the extracted semantic features, whether the text contains an adverse event.
According to an embodiment of the present disclosure, the word segmenter comprises a first word segmenter and a second word segmenter, wherein the first word segmenter splits the text character by character and the second word segmenter splits the text word by word; the converter comprises a first converter and a second converter, wherein the first converter converts the segmentation result of the first word segmenter into a character vector sequence, and the second converter converts the segmentation result of the second word segmenter into a word vector sequence.
According to an embodiment of the present disclosure, the second word segmenter splitting the text word by word includes: generating, from a dictionary tree (trie) built from a general dictionary and a domain-specific dictionary, a directed acyclic graph of all possible word segmentations of each sentence in the text, thereby segmenting the text word by word.
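The trie-and-DAG segmentation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the sample dictionaries, the function names, and the fewest-tokens rule used to pick one path are all assumptions (the source does not say how a final segmentation is chosen from the graph).

```python
from typing import Dict, Iterable, List

END = ""  # sentinel key marking the end of a dictionary word (never a real character)


def build_trie(words: Iterable[str]) -> dict:
    """Build a nested-dict prefix tree from dictionary words."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def build_dag(sentence: str, trie: dict) -> Dict[int, List[int]]:
    """Map each start index to all end indices (exclusive) of candidate words.

    A single character is always a candidate, so every index has an edge.
    """
    dag: Dict[int, List[int]] = {}
    for start in range(len(sentence)):
        ends = [start + 1]
        node = trie
        for pos in range(start, len(sentence)):
            node = node.get(sentence[pos])
            if node is None:
                break
            if END in node and pos + 1 not in ends:
                ends.append(pos + 1)
        dag[start] = ends
    return dag


def all_segmentations(sentence: str, dag: Dict[int, List[int]], start: int = 0) -> List[List[str]]:
    """Enumerate every path through the DAG as a list of tokens."""
    if start == len(sentence):
        return [[]]
    return [
        [sentence[start:end]] + rest
        for end in dag[start]
        for rest in all_segmentations(sentence, dag, end)
    ]


# Hypothetical sample dictionaries; real ones would be far larger.
GENERAL_DICT = {"服用", "后", "出现"}
DOMAIN_DICT = {"阿司匹林", "皮疹"}

sentence = "服用阿司匹林后出现皮疹"  # "a rash appeared after taking aspirin"
trie = build_trie(GENERAL_DICT | DOMAIN_DICT)
dag = build_dag(sentence, trie)
candidates = all_segmentations(sentence, dag)
# Assumption: pick the path with the fewest tokens as a simple stand-in
# for whatever scoring rule a production segmenter would apply.
best = min(candidates, key=len)
print(best)
```

Merging the general and domain dictionaries into one trie is what lets a drug name such as the one in the sample sentence survive as a single token instead of being split into characters.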
According to an embodiment of the present disclosure, performing semantic recognition on the text using the selected recognition model to identify adverse events in the text includes: splitting the sentences in the text character by character using the first word segmenter, and splitting the sentences in the text word by word using the second word segmenter; converting the segmentation result of the first word segmenter into a character vector sequence using the first converter, and converting the segmentation result of the second word segmenter into a word vector sequence using the second converter; extracting semantic features based on the character vector sequence and the word vector sequence using the feature extractor; and judging, using the classifier and based on the extracted semantic features, whether the text contains an adverse event, wherein the text is determined to contain an adverse event when the probability that an adverse event occurs in the text is greater than a preset threshold.
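The order of operations in the first recognition model can be sketched as a small pipeline. It assumes the first segmenter yields single characters and the second yields dictionary words. The function `identify_with_first_model`, the toy components, the one-dimensional "vectors", the hand-picked weights, and the 0.5 threshold are hypothetical placeholders; the source does not specify the actual embedding, feature extractor, or classifier.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def identify_with_first_model(
    text: str,
    char_segment: Callable[[str], List[str]],
    word_segment: Callable[[str], List[str]],
    char_convert: Callable[[List[str]], List[Vector]],
    word_convert: Callable[[List[str]], List[Vector]],
    extract_features: Callable[[List[Vector], List[Vector]], Vector],
    classify: Callable[[Vector], float],
    threshold: float = 0.5,
) -> Tuple[bool, float]:
    """Segment, convert, extract features, classify, then apply the threshold."""
    char_vectors = char_convert(char_segment(text))
    word_vectors = word_convert(word_segment(text))
    features = extract_features(char_vectors, word_vectors)
    probability = classify(features)
    return probability > threshold, probability


# Toy stand-ins below are assumptions for illustration only.
SYMPTOM_CHARS = {"疹"}
SYMPTOM_WORDS = {"皮疹"}


def toy_char_segment(text: str) -> List[str]:
    return [c for c in text if not c.isspace()]


def toy_word_segment(text: str) -> List[str]:
    return text.split()  # input is pre-spaced to keep the sketch short


def toy_char_convert(tokens: List[str]) -> List[Vector]:
    return [[1.0 if t in SYMPTOM_CHARS else 0.0] for t in tokens]


def toy_word_convert(tokens: List[str]) -> List[Vector]:
    return [[1.0 if t in SYMPTOM_WORDS else 0.0] for t in tokens]


def toy_extract(char_vectors: List[Vector], word_vectors: List[Vector]) -> Vector:
    # Max-pool each sequence, then concatenate the two pooled values.
    def pooled(seq: List[Vector]) -> float:
        return max((v[0] for v in seq), default=0.0)
    return [pooled(char_vectors), pooled(word_vectors)]


def toy_classify(features: Vector) -> float:
    # Logistic function over hand-picked weights.
    z = 2.0 * features[0] + 2.0 * features[1] - 1.0
    return 1.0 / (1.0 + math.exp(-z))


components = (toy_char_segment, toy_word_segment, toy_char_convert,
              toy_word_convert, toy_extract, toy_classify)
flagged, prob = identify_with_first_model("服用 后 出现 皮疹", *components)
clean, clean_prob = identify_with_first_model("今天 拜访 医生", *components)
```

Passing the components in as callables keeps the sketch neutral about which embedding or network is used at each stage.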
According to an embodiment of the present disclosure, selecting, according to the type of the text, a recognition model corresponding to the type of the text includes: in the case that the type of the text is a second type, selecting a second recognition model. The second recognition model comprises a named entity identifier, an adverse event name identifier, a semantic role identifier, a semantic role filter, and an event determiner, wherein the named entity identifier identifies named entities in the text; the adverse event name identifier identifies adverse event names in the text; the semantic role identifier identifies, according to the identified named entities and adverse event names, the semantic roles involved in the occurrence of an adverse event in a sentence of the text; the semantic role filter selects at least a portion of the roles according to the identified semantic roles and a preset rule; and the event determiner determines, according to the selected roles and preset trigger words, whether the text contains an adverse event.
According to an embodiment of the present disclosure, performing semantic recognition on the text using the selected recognition model to identify adverse events in the text includes: identifying named entities in the text using the named entity identifier; identifying adverse event names in the text using the adverse event name identifier; identifying, using the semantic role identifier and according to the identified named entities and adverse event names, the semantic roles involved in the occurrence of an adverse event in a sentence of the text; selecting, using the semantic role filter, at least a portion of the roles according to the identified semantic roles and a preset rule; and determining, using the event determiner and according to the selected roles and preset trigger words, whether the text contains an adverse event, wherein the text is determined to contain an adverse event when the selected roles and the preset trigger words satisfy a preset event triplet.
According to an embodiment of the present disclosure, the second recognition model further includes a coreference resolver configured to perform coreference resolution in the text, thereby determining the association between a drug and an adverse event, wherein identifying adverse events in the text using the selected recognition model further includes: after the semantic role identifier identifies the semantic roles involved in the occurrence of an adverse event in a sentence of the text according to the identified named entities and adverse event names, performing coreference resolution in the text using the coreference resolver.
According to the embodiment of the disclosure, the adverse event at least comprises the following three elements: subjects, causes, and adverse outcomes.
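A minimal sketch of the event determiner's final check follows, under the assumption that the preset event triplet corresponds to the three elements just listed (subject, cause, adverse outcome). The role labels, the trigger words, and the function `determine_event` are hypothetical; the source does not give their concrete form.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class EventTriplet:
    subject: str
    cause: str
    outcome: str


REQUIRED_ROLES = ("subject", "cause", "outcome")  # assumed role labels


def determine_event(
    selected_roles: Dict[str, str],
    sentence_tokens: Iterable[str],
    trigger_words: Set[str],
) -> Optional[EventTriplet]:
    """Return a triplet only if a trigger word is present and all three roles are filled."""
    if not any(token in trigger_words for token in sentence_tokens):
        return None
    if not all(selected_roles.get(role) for role in REQUIRED_ROLES):
        return None
    return EventTriplet(
        subject=selected_roles["subject"],
        cause=selected_roles["cause"],
        outcome=selected_roles["outcome"],
    )


# Hypothetical trigger words and role assignments for one sentence.
TRIGGERS = {"developed", "experienced"}
tokens = ["patient", "developed", "rash", "after", "taking", "aspirin"]
roles = {"subject": "patient", "cause": "aspirin", "outcome": "rash"}

event = determine_event(roles, tokens, TRIGGERS)
no_cause = determine_event({"subject": "patient", "outcome": "rash"}, tokens, TRIGGERS)
no_trigger = determine_event(roles, ["patient", "asked", "about", "aspirin"], TRIGGERS)
```

Returning `None` when any element is missing mirrors the requirement that an adverse event include at least all three elements.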
According to an embodiment of the present disclosure, the method further includes: feeding back the identification result for the adverse event through a preset reporter.
According to an embodiment of the present disclosure, the identification model is an identification model of the medical field, and the adverse event is an adverse event of the medical field.
The embodiment of the disclosure provides a device for identifying adverse events, which comprises: an acquisition module configured to selectively acquire text to be identified from one or more data sources; a selection module configured to select an identification model corresponding to a type of the text according to the type of the text; and an identification module configured to perform semantic identification on the text using the selected identification model to identify adverse events in the text.
According to an embodiment of the present disclosure, the type of the text is determined based on the length of the text and/or the source of the text.
According to an embodiment of the present disclosure, the selection module is configured to: in the case that the type of the text is a first type, select a first recognition model. The first recognition model comprises a word segmenter, a converter, a feature extractor, and a classifier, wherein the word segmenter segments sentences in the text, the converter converts the segmentation results into vector sequences, the feature extractor extracts semantic features based on the vector sequences, and the classifier judges, based on the extracted semantic features, whether the text contains an adverse event.
According to an embodiment of the present disclosure, the word segmenter comprises a first word segmenter and a second word segmenter, wherein the first word segmenter splits the text character by character and the second word segmenter splits the text word by word; the converter comprises a first converter and a second converter, wherein the first converter converts the segmentation result of the first word segmenter into a character vector sequence, and the second converter converts the segmentation result of the second word segmenter into a word vector sequence.
According to an embodiment of the present disclosure, the second word segmenter splitting the text word by word includes: generating, from a dictionary tree (trie) built from a general dictionary and a domain-specific dictionary, a directed acyclic graph of all possible word segmentations of each sentence in the text, thereby segmenting the text word by word.
According to an embodiment of the present disclosure, the identification module is configured to: split the sentences in the text character by character using the first word segmenter, and split the sentences in the text word by word using the second word segmenter; convert the segmentation result of the first word segmenter into a character vector sequence using the first converter, and convert the segmentation result of the second word segmenter into a word vector sequence using the second converter; extract semantic features based on the character vector sequence and the word vector sequence using the feature extractor; and judge, using the classifier and based on the extracted semantic features, whether the text contains an adverse event, wherein the text is determined to contain an adverse event when the probability that an adverse event occurs in the text is greater than a preset threshold.
According to an embodiment of the present disclosure, the selection module is configured to: in the case that the type of the text is a second type, select a second recognition model. The second recognition model comprises a named entity identifier, an adverse event name identifier, a semantic role identifier, a semantic role filter, and an event determiner, wherein the named entity identifier identifies named entities in the text; the adverse event name identifier identifies adverse event names in the text; the semantic role identifier identifies, according to the identified named entities and adverse event names, the semantic roles involved in the occurrence of an adverse event in a sentence of the text; the semantic role filter selects at least a portion of the roles according to the identified semantic roles and a preset rule; and the event determiner determines, according to the selected roles and preset trigger words, whether the text contains an adverse event.
According to an embodiment of the present disclosure, the identification module is configured to: identify named entities in the text using the named entity identifier; identify adverse event names in the text using the adverse event name identifier; identify, using the semantic role identifier and according to the identified named entities and adverse event names, the semantic roles involved in the occurrence of an adverse event in a sentence of the text; select, using the semantic role filter, at least a portion of the roles according to the identified semantic roles and a preset rule; and determine, using the event determiner and according to the selected roles and preset trigger words, whether the text contains an adverse event, wherein the text is determined to contain an adverse event when the selected roles and the preset trigger words satisfy a preset event triplet.
According to an embodiment of the present disclosure, the second recognition model further includes a coreference resolver configured to perform coreference resolution in the text, thereby determining the association between a drug and an adverse event, wherein identifying adverse events in the text using the selected recognition model further includes: after the semantic role identifier identifies the semantic roles involved in the occurrence of an adverse event in a sentence of the text according to the identified named entities and adverse event names, performing coreference resolution in the text using the coreference resolver.
The embodiment of the disclosure provides equipment for identifying adverse events, which comprises: a processor, and a memory storing computer executable instructions that when executed by the processor cause the processor to perform the method of any of the above.
The disclosed embodiments provide a computer-readable recording medium storing computer-executable instructions, wherein the computer-executable instructions, when executed by a processor, cause the processor to perform the method of any one of the above.
Embodiments of the present disclosure provide a method, an apparatus, a device, and a medium for identifying adverse events. With the method, different recognition models can be selected for different text types, and the text to be identified is then checked for adverse events using the selected model, without manual screening. This avoids missed or delayed AE reports and allows AEs to be reported promptly and accurately, so that places such as companies or hospitals that manufacture or use medicines or medical devices can improve the accuracy and timeliness of adverse event reporting, and goals such as resource saving, efficiency improvement, and empowerment are achieved across the whole process of AE identification and reporting.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are required to be used in the description of the embodiments will be briefly described below. It should be apparent that the drawings in the following description are only some exemplary embodiments of the present disclosure, and that other drawings may be obtained from these drawings by those of ordinary skill in the art without undue effort.
Fig. 1 illustrates a flowchart of a method of identifying adverse events according to an embodiment of the present disclosure.
Fig. 2A illustrates a block diagram of a first recognition model, according to an embodiment of the present disclosure.
Fig. 2B illustrates a flowchart of adverse event recognition of text using a first recognition model according to an embodiment of the present disclosure.
Fig. 2C is an example of adverse event recognition of a first type of text using a first recognition model.
Fig. 3A illustrates a block diagram of a second recognition model, according to an embodiment of the present disclosure.
Fig. 3B illustrates a flow chart for adverse event recognition of text using a second recognition model in accordance with an embodiment of the present disclosure.
Fig. 3C is an example of adverse event recognition of a second type of text using a second recognition model.
Fig. 3D and 3E illustrate examples of the result after semantic role labeling according to embodiments of the present disclosure.
Fig. 4 illustrates a block diagram of an apparatus 400 for identifying adverse events according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, exemplary embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present disclosure and not all of the embodiments of the present disclosure, and that the present disclosure is not limited by the example embodiments described herein.
In the present specification and drawings, substantially the same or similar steps and elements are denoted by the same or similar reference numerals, and repeated descriptions of the steps and elements will be omitted. Meanwhile, in the description of the present disclosure, the terms "first," "second," and the like are used merely to distinguish the descriptions, and are not to be construed as indicating or implying relative importance or order.
In the present specification and drawings, elements are described in the singular or plural form according to an embodiment. However, the singular and plural forms are properly selected for the proposed case only for convenience of explanation and are not intended to limit the present disclosure thereto. Accordingly, the singular may include the plural and the plural may include the singular unless the context clearly indicates otherwise.
In the prior art, adverse events (AEs) are identified by manual screening. With complex and diverse sources of adverse event information and limited human resources, the risks of identifying AEs by manual screening are increasingly prominent, and so is the compliance risk of adverse event reporting for places such as companies or hospitals that manufacture or use medicines or medical devices.
To solve these problems, the present disclosure provides a method for identifying adverse events. The method can handle different adverse event sources separately, so as to avoid missed or delayed AE reports and to report AEs promptly and accurately. This helps places such as companies or hospitals that manufacture or use medicines or medical devices to improve the accuracy and timeliness of adverse event reporting, and achieves goals such as resource saving, efficiency improvement, and empowerment across the whole process of AE identification and reporting.
The method for identifying adverse events provided in the present disclosure will be described in detail with reference to the accompanying drawings.
Fig. 1 illustrates a flowchart of a method of identifying adverse events according to an embodiment of the present disclosure.
Referring to fig. 1, text to be recognized may be selectively acquired from one or more data sources at step S110.
By way of example, the one or more data sources may be papers published online, where a published paper is a paper related to the medical field that may describe adverse events occurring at places such as companies or hospitals that manufacture or use medicines or medical devices. Such a paper is generally a chapter-level document and is often particularly long.
As another example, the one or more data sources may be data recorded by a call center of a company, hospital, or similar place that manufactures or uses medicines or medical devices. Typically, a person calls the call center and an operator there faithfully records the content of the call, which may include adverse events reported or mentioned by a patient or customer. The recorded content of a call is usually relatively short, e.g., a few sentences.
As yet another example, the one or more data sources may be visit records, where a visit record is typically a summary written by a medical representative of a pharmaceutical company or the like after visiting a doctor, in which an adverse event mentioned by the doctor may appear. Typically, the content of a visit record is of moderate length, between that of the two sources above.
As yet another example, the selection among the data sources described above may be performed manually or through an automated configuration. For example, papers published online or data recorded by the call center may be manually selected as the one or more data sources. Alternatively, an automated configuration may be preset in which data recorded by the call center is selected as the data source during certain time periods (e.g., 00:00 to 07:00 each day), and papers published online or visit records are selected as the data sources during another time period (e.g., 08:00 to 20:00 each day); or all of the data sources may be selected for all time periods. This is not limited herein, and other time periods or data sources may be selected in other manners.
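The time-window configuration in the example above might be expressed as follows. This is only a sketch: the source names, the `SCHEDULE` structure, and the function `sources_for` are hypothetical, and the two windows simply reuse the example periods from the text.

```python
from datetime import time
from typing import List, Sequence, Tuple

# (window start, window end, data sources active in that window); names are assumed.
SCHEDULE: List[Tuple[time, time, List[str]]] = [
    (time(0, 0), time(7, 0), ["call_center"]),
    (time(8, 0), time(20, 0), ["published_papers", "visit_records"]),
]


def sources_for(now: time,
                schedule: Sequence[Tuple[time, time, List[str]]] = SCHEDULE,
                default: Sequence[str] = ()) -> List[str]:
    """Return the data sources configured for the window containing `now`."""
    for start, end, sources in schedule:
        if start <= now <= end:
            return list(sources)
    # Outside every window: fall back to the caller-supplied default.
    return list(default)


night = sources_for(time(3, 30))
day = sources_for(time(12, 0))
gap = sources_for(time(7, 30))
```

Passing all source names as `default` would approximate the "all data sources for all time periods" variant for times that fall outside the configured windows.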
The text to be identified may or may not include adverse event information, and need not be obtained from the data sources in the examples described above.
In step S120, a recognition model corresponding to the type of the text may be selected according to the type of the text.
According to embodiments of the present disclosure, the type of the text may be determined based on the length of the text and/or the source of the text. The text types may include a first type and a second type.
As an example, the type of the text may be determined based on its length. For example, text whose content is particularly long may be determined to be of the first type, text whose content is relatively short may be determined to be of the second type, and text whose content is of moderate length may be determined to be of the first type.
As another example, the type of the text may be determined based on its source. For example, text from a paper published online may be determined to be of the first type, text from a visit record may be determined to be of the first type, and text from data recorded by a call center may be determined to be of the second type.
As yet another example, the type of the text may be determined based on both its length and its source. For example, text that comes from a paper published online and is particularly long may be determined to be of the first type; text that comes from a visit record and is of moderate length may be determined to be of the first type; text that comes from a visit record and is relatively short may be determined to be of the second type; text that comes from data recorded by a call center and is particularly long may be determined to be of the first type; and text that comes from data recorded by a call center and is relatively short may be determined to be of the second type.
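The three examples above can be condensed into one small function. Note that in every combined example the length decides the outcome, so this sketch lets length take precedence whenever the text is available and falls back to the source otherwise. The 200-character cutoff, the source labels, and the function `determine_text_type` are assumptions; the source gives no numeric boundary between "short" and "moderate".

```python
from typing import Optional

FIRST_TYPE = "first"
SECOND_TYPE = "second"


def determine_text_type(text: Optional[str] = None,
                        source: Optional[str] = None,
                        short_max_chars: int = 200) -> str:
    """Classify text as first or second type from its length and/or source."""
    if text is not None:
        # Relatively short text -> second type; moderate or long -> first type.
        return SECOND_TYPE if len(text) <= short_max_chars else FIRST_TYPE
    if source is not None:
        # Source-only rule: call-center data -> second type, others -> first type.
        return SECOND_TYPE if source == "call_center" else FIRST_TYPE
    raise ValueError("need the text, its source, or both")


short_visit = determine_text_type("x" * 50, source="visit_record")
long_call = determine_text_type("x" * 5000, source="call_center")
call_only = determine_text_type(source="call_center")
paper_only = determine_text_type(source="published_paper")
```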
According to an embodiment of the present disclosure, the recognition model may be a first recognition model or a second recognition model.
In accordance with an embodiment of the present disclosure, in the case where the type of text is a first type, a first recognition model corresponding to the first type is selected, which will be described in detail below in connection with fig. 2A.
In accordance with an embodiment of the present disclosure, in the case where the type of text is the second type, a second recognition model corresponding to the second type is selected, which will be described in detail below in connection with fig. 3A.
At step S130, semantic recognition may be performed on the text using the selected recognition model to identify adverse events in the text.
According to embodiments of the present disclosure, semantic recognition may be performed on the text using the selected first recognition model to identify adverse events in the text, the recognition process being described in detail below in connection with fig. 2B.
According to embodiments of the present disclosure, semantic recognition may be performed on the text using the selected second recognition model to identify adverse events in the text, the recognition process being described in detail below in connection with fig. 3B.
The method of identifying adverse events according to embodiments of the present disclosure has been described above in connection with fig. 1. With this method, a recognition model can be selected according to the text type, so that adverse events in the text are identified automatically rather than manually. Adverse events can thus be identified and reported in a timely and accurate manner, which improves the accuracy and timeliness of adverse event reporting by companies, hospitals, and other places that make or use medicines or medical devices, and serves goals such as resource saving, efficiency improvement, and enablement across the overall process of identifying and reporting adverse events.
The first recognition model, the second recognition model, and the recognition process thereof will be described in detail with reference to fig. 2A and 3A.
Fig. 2A illustrates a block diagram of a first recognition model, according to an embodiment of the present disclosure.
Referring to fig. 2A, the first recognition model 200 may include a word segmenter 210, a converter 220, a feature extractor 230, and a classifier 240.
The word segmenter 210 may be used to segment sentences in the text. The word segmenter 210 may include a first word segmenter, which may be used to segment the text character by character, and a second word segmenter, which may be used to segment the text word by word.
According to the embodiment of the disclosure, segmenting the text character by character with the first word segmenter may include splitting the sentences in the text into individual characters, so as to obtain a character-granularity segmentation.
As an example, the text to be recognized may be "dizziness after one patient takes aspirin", and the character-granularity segmentation obtained by splitting the original Chinese sentence character by character may be rendered approximately in English as "one/[measure word]/patient/take/use/a/s/pi/rin/after/appear/head/dizzy".
According to the embodiment of the disclosure, segmenting the text word by word with the second word segmenter may include generating a directed acyclic graph (Directed Acyclic Graph, DAG) of all possible segmentations of the sentence from a dictionary tree built from a general dictionary and a domain-specific dictionary, and then performing Viterbi decoding to obtain the optimal segmentation result, so as to obtain a word-granularity segmentation. The general dictionary may be obtained from an open-source lexicon such as a thesaurus, and the domain-specific dictionary is a medical-domain dictionary that may be constructed according to actual needs.
As an example, the text to be recognized may be "dizziness after one patient takes aspirin", and the word-granularity segmentation obtained by segmenting the sentence word by word may be "one/patient/take/aspirin/after/appear/dizziness".
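The two segmentation granularities can be illustrated with a toy sketch. It rests on assumptions: the dictionary `FREQ` and its frequencies are invented, an unspaced English string stands in for a Chinese sentence, and the best path through the DAG is found with a Viterbi-style dynamic program over unigram frequencies, which is one common scoring choice rather than the scoring specified by the disclosure.

```python
import math

# Hypothetical merged dictionary (general + domain-specific): word -> frequency.
FREQ = {"a": 5, "as": 3, "aspirin": 50, "pi": 2, "pirin": 1, "rin": 1,
        "take": 40, "takes": 30, "s": 1, "patient": 60}
TOTAL = sum(FREQ.values())

def segment_chars(sentence):
    """Character-granularity segmentation: one segment per character."""
    return list(sentence)

def build_dag(sentence):
    """Map each start index to every end index that closes a dictionary word."""
    dag = {}
    for i in range(len(sentence)):
        ends = [j for j in range(i + 1, len(sentence) + 1) if sentence[i:j] in FREQ]
        dag[i] = ends or [i + 1]  # fall back to a single character
    return dag

def segment_words(sentence):
    """Pick the highest-probability path through the DAG, scanning right to left."""
    dag = build_dag(sentence)
    n = len(sentence)
    best = {n: (0.0, n)}  # position -> (best log-probability of suffix, end of first word)
    for i in range(n - 1, -1, -1):
        best[i] = max(
            (math.log(FREQ.get(sentence[i:j], 1) / TOTAL) + best[j][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(sentence[i:j])
        i = j
    return words

print(segment_words("patienttakesaspirin"))
```

Because every extra word multiplies in another probability below one, the dynamic program prefers the long domain term "aspirin" over fragments such as "as" followed by "pirin".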
The converter 220 may be configured to convert the segmentation results into vector sequences. The converter 220 may include a first converter configured to convert the segmentation result of the first word segmenter into a character vector sequence, and a second converter configured to convert the segmentation result of the second word segmenter into a word vector sequence.
According to embodiments of the present disclosure, the first converter may fine-tune an open-source pre-trained language model (such as the ALBERT model) using medical-field articles and apply it to the segmentation result of the first word segmenter (e.g., "one/[measure word]/patient/take/use/a/s/pi/rin/after/appear/head/dizzy" as described above) to obtain a semantic representation specialized for the medical field, namely a character vector sequence (char-sequence). The medical-field articles may be obtained from published medicine-related articles.
According to an embodiment of the disclosure, the second converter may convert the segmentation result of the second word segmenter (e.g., "one/patient/take/aspirin/after/appear/dizziness") into a word vector sequence (word-sequence) using an algorithm that converts words into vectors (e.g., the CBOW algorithm).
Feature extractor 230 may be used to extract semantic features based on the vector sequence.
According to an embodiment of the present disclosure, the feature extractor 230 may feed the obtained character vector sequence (char-sequence) and word vector sequence (word-sequence) separately into a deep learning model and extract their semantic features using the convolutions in the model, so as to obtain two sentence vectors. The deep learning model may be trained in advance on labeled training samples; the samples may come from data recorded by the call center, and the labels may indicate which data contain adverse events and which do not.
As an example, the semantic features described above may be extracted using convolution kernels of lengths 3, 4, and 5 in the deep learning model.
As an example, the deep learning model may be a convolutional neural network (Convolutional Neural Network, CNN) and, more specifically, may be an algorithm that classifies text using a convolutional neural network, such as the TextCNN algorithm.
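To make the convolution step concrete, the following pure-Python sketch slides kernels of lengths 3, 4, and 5 over a vector sequence and keeps the maximum activation of each, in the spirit of TextCNN. The all-ones kernels, the ReLU floor, and the function names are illustrative assumptions; a trained model would learn many kernels per length.

```python
def conv_max_pool(seq, kernel):
    """Slide `kernel` (k x d) over `seq` (n x d); return the max ReLU activation."""
    k = len(kernel)
    best = 0.0  # ReLU floor, also returned when the sequence is shorter than the kernel
    for start in range(len(seq) - k + 1):
        act = sum(
            kernel[r][c] * seq[start + r][c]
            for r in range(k)
            for c in range(len(kernel[r]))
        )
        best = max(best, act)
    return best

def sentence_vector(seq, kernels):
    """One pooled feature per kernel; together they form the sentence vector."""
    return [conv_max_pool(seq, kern) for kern in kernels]

# Toy 5-token sequence with 2-dimensional vectors and all-ones kernels of lengths 3, 4, 5.
seq = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]
kernels = [[[1, 1] for _ in range(k)] for k in (3, 4, 5)]
print(sentence_vector(seq, kernels))
```

The same routine would be run once on the char-sequence and once on the word-sequence, giving the two sentence vectors mentioned above.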
Classifier 240 may be used to determine whether adverse events are contained in the text based on the extracted semantic features.
According to the embodiment of the disclosure, the classifier 240 may concatenate the two sentence vectors obtained above and input the result to a fully connected layer of the deep learning model, which is trained with a cross-entropy loss function, so as to obtain a classification result indicating whether the text contains an adverse event. As an example, it may be determined that the text contains an adverse event when the probability that an adverse event occurs in the text is greater than a predetermined threshold, which may be 50%, 60%, or the like.
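The classification step can be sketched as follows, assuming a single fully connected layer with two outputs (no AE, AE) followed by a softmax. The weights, bias, and function name are hypothetical placeholders for trained parameters.

```python
import math

def classify(char_vec, word_vec, weights, bias, threshold=0.5):
    """Concatenate both sentence vectors, apply a 2-output dense layer and softmax,
    and compare the AE probability with the threshold."""
    features = char_vec + word_vec  # list concatenation
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    p_ae = exps[1] / sum(exps)
    return p_ae, p_ae > threshold

# Hypothetical parameters: row 0 scores "no AE", row 1 scores "AE".
weights = [[0, 0, 0, 0], [1, 1, 1, 1]]
bias = [0, 0]
p, has_ae = classify([1, 0], [0, 1], weights, bias, threshold=0.5)
print(round(p, 4), has_ae)
```

Raising the threshold (for instance to 0.6) trades recall for precision; the disclosure leaves the exact value open.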
The first recognition model and the components included therein are described above in connection with fig. 2A, and as can be appreciated from the foregoing, the process of performing adverse event recognition on the text using the first recognition model may be a process as illustrated in fig. 2B, and in particular, fig. 2B illustrates a flowchart of performing adverse event recognition on the text using the first recognition model according to an embodiment of the disclosure.
Referring to fig. 2B, in step S210, sentences in the text may be segmented character by character using the first word segmenter and segmented word by word using the second word segmenter.
In step S220, the segmentation result of the first word segmenter may be converted into a character vector sequence using the first converter, and the segmentation result of the second word segmenter may be converted into a word vector sequence using the second converter.
In step S230, semantic features may be extracted from the character vector sequence and the word vector sequence using the feature extractor.
In step S240, the classifier may determine whether an adverse event is included in the text based on the extracted semantic features, where it is determined that the adverse event is included in the text if the probability of occurrence of the adverse event in the text is greater than a predetermined threshold.
For details of each step in fig. 2B, refer to the description of the corresponding parts of fig. 2A, which is not repeated here. The recognition of adverse events in text using the first recognition model is described below by way of example with reference to fig. 2C.
Fig. 2C is an example of adverse event recognition of a first type of text using a first recognition model.
Referring to fig. 2C, the text to be recognized may be "dizziness after one patient takes aspirin" as described above.
First, in step S2021, the text to be recognized is segmented character by character using the first word segmenter, and in step S2011, it is segmented word by word using the second word segmenter, yielding "one/[measure word]/patient/take/use/a/s/pi/rin/after/appear/head/dizzy" and "one/patient/take/aspirin/after/appear/dizziness", respectively.
Next, in step S2022, the character-level segmentation result is processed by the first converter, and in step S2012, the word-level segmentation result is processed by the second converter, to obtain two different vector sequences, namely the character vector sequence (char-sequence) and the word vector sequence (word-sequence).
Then, in step S2023 and step S2013, the feature extractor extracts semantic features from each of the vector sequences obtained above through the convolution layer, so as to obtain a new sentence vector from each.
Next, in step S2030, the classifier processes the concatenation of the two new sentence vectors to obtain a binary classification result; if the probability that an adverse event occurs in the text is greater than a predetermined threshold, it can be determined that the text contains an adverse event.
Finally, in step S2040, an AE recognition result is obtained.
The first recognition model and the process of recognizing the first type of text using the first recognition model have been described in detail above in conjunction with fig. 2A to 2C, and the second recognition model and the process of recognizing the second type of text using the second recognition model will be described in detail below in conjunction with fig. 3A to 3E.
Fig. 3A illustrates a block diagram of a second recognition model, according to an embodiment of the present disclosure.
Referring to fig. 3A, the second recognition model 300 may include a named entity identifier 310, an adverse event name identifier 320, a semantic role identifier 330, a semantic role filter 340, and an event determiner 350.
Named entity identifier 310 may be used to identify named entities in text.
According to embodiments of the present disclosure, the named entity identifier 310 may identify named entities such as patients, drugs, manufacturers, and times by performing sequence modeling on the text using a Transformer model combined with a conditional random field (Conditional Random Field, CRF).
By way of example, the text to be identified may be "Professor Sunzhen had two patients who took aspirin, 100 mg/d, and subsequently developed vomiting and dizziness symptoms. There may also be a few patients who developed bleeding symptoms after combined administration of ticagrelor [Shandong Qinghua Pharmaceutical]." After the named entity identifier 310 identifies the named entities in the text, the sequence labeling result may be: "Professor Sunzhen [PER] had two patients [PER] who took aspirin [MED] and subsequently developed vomiting and dizziness symptoms. There may also be a few patients [PER] who developed bleeding symptoms after combined administration of ticagrelor [Shandong Qinghua Pharmaceutical] [MED]." Here, PER denotes a person and MED denotes a drug.
The adverse event name identifier 320 may be used to identify the adverse event name in text.
According to embodiments of the present disclosure, the adverse event name identifier 320 may generate a dictionary tree from the international medical terminology dictionary (Medical Dictionary for Regulatory Activities, MedDRA) and then use pattern recognition to find the adverse event names that appear in the text.
As an example, after the adverse event name identifier 320 identifies adverse event names in the sequence labeling result, the result may be: "Professor Sunzhen [PER] had two patients [PER] who took aspirin [MED] and subsequently developed vomiting and dizziness symptoms [AE]. There may also be a few patients [PER] who developed bleeding symptoms [AE] after combined administration of ticagrelor [Shandong Qinghua Pharmaceutical] [MED]." Here, AE denotes an adverse event, and the preceding "vomiting and dizziness symptoms" and "bleeding symptoms" are the adverse event names.
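A dictionary tree lookup of this kind might be sketched as below. The term list is a hypothetical stand-in rather than an extract of MedDRA, and greedy longest-match scanning is an assumed matching strategy chosen for illustration.

```python
END = object()  # sentinel key that marks the end of a term in the trie

def build_trie(terms):
    """Build a character-level dictionary tree from a list of terms."""
    root = {}
    for term in terms:
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[END] = term
    return root

def find_terms(text, trie):
    """Return (start, end, term) for the longest term starting at each position."""
    hits, i = [], 0
    while i < len(text):
        node, j, last = trie, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                last = (i, j, node[END])
        if last:
            hits.append(last)
            i = last[1]  # continue after the matched term
        else:
            i += 1
    return hits

# Hypothetical adverse event terms standing in for dictionary entries.
trie = build_trie(["vomiting", "vomit", "dizziness", "bleeding"])
text = "patients developed vomiting and dizziness"
hits = find_terms(text, trie)
print([term for _, _, term in hits])
```

The returned spans can then be tagged as [AE] in the labeled sequence, alongside the [PER] and [MED] tags.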
Semantic role identifier 330 may be used to identify semantic roles for adverse events occurring in sentences of text based on the identified named entities and adverse event names.
According to the embodiment of the disclosure, the semantic role identifier 330 may perform semantic role labeling on the content of the text to be identified using an open-source model (e.g., BERT-BLSTM-CRF) and find the semantic role components associated with the occurrence of the adverse event in the sentence, including core semantic roles (e.g., patient, doctor) and auxiliary semantic roles (e.g., drug, AE).
By way of example, the semantic role identifier 330 may perform semantic role identification on the name recognition result obtained above, and the resulting semantic role labeling may be: "Professor Sunzhen [PER] [A0] had two patients [PER] [A0] who took aspirin [MED] [A1] and subsequently developed vomiting and dizziness symptoms [AE] [A1]. There may also be a few patients [PER] [A0] who developed bleeding symptoms [AE] [A1] after combined administration of ticagrelor [Shandong Qinghua Pharmaceutical] [MED] [A1]." Here, A0 denotes the agent (the performer of the action) and A1 denotes the recipient (the receiver of the action).
Semantic role filter 340 may be configured to screen out at least a portion of the roles for further processing, based on the identified semantic roles and predetermined rules.
According to an embodiment of the present disclosure, the semantic role filter 340 may use rules formulated in advance to filter out components that cannot validly serve as roles. The rules may be heuristic rules; for example, a rule may target conditional clauses (e.g., "even if", "if"), other fuzzy statements (e.g., "possibly"), or retain only specific drug roles. Roles that satisfy a rule are filtered out, and the remaining roles are retained as the screened roles.
As an example, when semantic role filtering is performed on the semantic role labeling result obtained above, no role satisfies the rules, so the information of all entities is retained.
As another example, if the text were only "There may also be a few patients taking ticagrelor [Shandong Qinghua Pharmaceutical]", the patient role would be "there may also be a few patients", which is a fuzzy statement from which no specific patient information can be extracted, so it satisfies the rule and is filtered out; the drug role is ticagrelor [Shandong Qinghua Pharmaceutical], which is not one of the required specific drug roles, so it is judged invalid by the rules and is likewise filtered out.
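The heuristic filtering just described can be sketched with two simple rules. The marker words in `FUZZY_WORDS`, the drug list `SPECIFIC_DRUGS`, and the role dictionaries are all hypothetical; real rules would be formulated for the deployment at hand.

```python
# Hypothetical rule data: conditional or fuzzy marker words, and the drugs of interest.
FUZZY_WORDS = {"if", "even", "may", "possibly"}
SPECIFIC_DRUGS = {"aspirin"}

def filter_roles(roles, fuzzy=FUZZY_WORDS, drugs=SPECIFIC_DRUGS):
    """Drop roles found in conditional or fuzzy wording, and drug roles outside the list."""
    kept = []
    for role in roles:
        words = set(role["text"].lower().split())
        if words & fuzzy:
            continue  # conditional or fuzzy statement
        if role["type"] == "MED" and not (words & drugs):
            continue  # not one of the specific drug roles
        kept.append(role)
    return kept

roles = [
    {"type": "PER", "text": "two patients"},
    {"type": "MED", "text": "aspirin"},
    {"type": "AE", "text": "vomiting and dizziness symptoms"},
    {"type": "PER", "text": "there may also be a few patients"},
    {"type": "MED", "text": "ticagrelor [Shandong Qinghua Pharmaceutical]"},
]
print([r["text"] for r in filter_roles(roles)])
```

In this sketch the fuzzy patient role and the non-listed drug role are removed, while the remaining three roles pass through to the event determiner.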
The event determiner 350 may be configured to determine whether the text contains an adverse event according to the screened roles and a predetermined trigger word.
According to the embodiment of the present disclosure, the event determiner 350 may determine whether the screened roles and the predetermined trigger word satisfy a preset event triplet and, if they do, determine that the text contains an adverse event. The trigger word may be a verb such as "use" or "occur". The preset event triplet may consist of three elements, which may be a patient, a trigger word, and a drug or adverse event name, respectively. As an example, the preset event triplet may be "patient-use-drug" or "patient-occur-AE".
As an example, the screened roles and the predetermined trigger words are extracted and checked against the designed event triplet templates, which yields a "two patients [Patient]-take-aspirin [Medicine]" event and a "two patients [Patient]-occur-vomiting and dizziness symptoms [AE]" event. These satisfy the judgment condition of the preset event triplet, so they are judged to be AE entries.
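Template matching against the preset event triplets can be sketched as follows. The tuple layout of `candidates` and the verb normalization table `TRIGGERS` are assumptions introduced for illustration only.

```python
# Preset event triplet templates from the text: patient-use-drug and patient-occur-AE.
TEMPLATES = {("Patient", "use", "Medicine"), ("Patient", "occur", "AE")}

# Hypothetical normalization of surface verbs to the template trigger words.
TRIGGERS = {"take": "use", "use": "use", "occur": "occur", "develop": "occur"}

def match_triplets(candidates):
    """candidates: (subject text, subject type, verb, object text, object type).
    Return the (subject, verb, object) events that fit a preset template."""
    events = []
    for s_text, s_type, verb, o_text, o_type in candidates:
        trigger = TRIGGERS.get(verb)
        if trigger and (s_type, trigger, o_type) in TEMPLATES:
            events.append((s_text, verb, o_text))
    return events

candidates = [
    ("two patients", "Patient", "take", "aspirin", "Medicine"),
    ("two patients", "Patient", "occur", "vomiting and dizziness symptoms", "AE"),
    ("Professor Sunzhen", "Doctor", "have", "two patients", "Patient"),
]
events = match_triplets(candidates)
print(events)
```

Following the text, a non-empty result means the preset event triplet is satisfied and the matched events are treated as AE entries.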
The second recognition model and the components included therein are described above in connection with fig. 3A, and as can be seen from the foregoing, the process of performing adverse event recognition on the text using the second recognition model may be a process as shown in fig. 3B, and in particular, fig. 3B shows a flowchart of performing adverse event recognition on the text using the second recognition model according to an embodiment of the disclosure.
Referring to fig. 3B, in step S310, named entities in text may be identified using the named entity identifier.
In step S320, the adverse event name in the text may be identified using the adverse event name identifier.
In step S330, the semantic role identifier may be used to identify, according to the identified named entities and adverse event names, the semantic roles associated with the occurrence of the adverse event in the sentences of the text.
In step S340, the semantic role filter may be used to screen out at least a portion of the roles according to the identified semantic roles and predetermined rules.
In step S350, the event determiner may determine whether the text contains an adverse event according to the screened roles and the predetermined trigger word, where the text is determined to contain an adverse event if the screened roles and the predetermined trigger word satisfy a preset event triplet.
The details of each step in fig. 3B may be referred to the description of the corresponding parts in fig. 3A, and will not be repeated here.
For particularly long text (e.g., chapter-level text), references are often conflicting or ambiguous, so it is necessary to determine what each reference denotes. The second recognition model provided in the present disclosure therefore further includes a coreference resolver, which may be used to perform coreference resolution on the text and thereby determine the association between a drug and an adverse event.
According to the embodiment of the disclosure, for particularly long text (such as chapter-level text), the coreference resolver may establish causal links between drug names and drug effects and, for reference conflicts in the context, perform entity coreference resolution based on the similarity of entity attributes, thereby determining the association between the drug and the AE.
By way of example, the particularly long text may be "32 patients in the experimental group took aspirin effervescent tablets, 100 mg/d ……… aspirin has an anticoagulant effect, …… which has been confirmed in practical use. Due to the anticoagulant effect, 2 patients developed local bleeding symptoms." Here, the sentence "due to the anticoagulant effect, 2 patients developed local bleeding symptoms" lacks a drug subject, and must be linked to the aspirin mentioned earlier through the anticoagulant effect. Substituting aspirin for the anticoagulant effect gives "aspirin caused 2 patients to develop local bleeding symptoms".
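Attribute-based linking of this kind can be sketched very roughly as a word-overlap match between a mention and the attributes already collected for each drug. The attribute table and the overlap score are assumptions; an actual resolver would use a richer similarity measure.

```python
def resolve_reference(mention, drug_attributes):
    """Return the drug whose known attributes overlap most with the mention, or None."""
    words = set(mention.lower().split())
    best, best_score = None, 0
    for drug, attrs in drug_attributes.items():
        score = len(words & attrs)
        if score > best_score:
            best, best_score = drug, score
    return best

# Hypothetical attributes gathered from earlier sentences of the same document.
drug_attributes = {
    "aspirin": {"anticoagulant", "effect"},
    "ticagrelor": {"antiplatelet"},
}
print(resolve_reference("anticoagulant effect", drug_attributes))
```

Once "anticoagulant effect" resolves to aspirin, the subject-less sentence can be rewritten with aspirin as the cause of the local bleeding symptoms.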
As can be seen from the above, the process of identifying an adverse event in text using the second recognition model may further include a step of performing coreference resolution on the text using the coreference resolver. This step follows the identification, by the semantic role identifier and according to the identified named entities and adverse event names, of the semantic roles associated with the occurrence of the adverse event in the sentences of the text; for example, the coreference resolution step may be inserted between step S330 and step S340 of fig. 3B.
The above will be described below by way of example in connection with fig. 3C.
Fig. 3C is an example of adverse event recognition of a second type of text using a second recognition model.
Referring to fig. 3C, the text to be identified may be "Professor Sunzhen had two patients who took aspirin, 100 mg/d, and subsequently developed vomiting and dizziness symptoms. There may also be a few patients who developed bleeding symptoms after combined administration of ticagrelor [Shandong Qinghua Pharmaceutical]."
First, in step S3040, after the named entity identifier identifies the named entities in the text, the sequence labeling result is: "Professor Sunzhen [PER] had two patients [PER] who took aspirin [MED] and subsequently developed vomiting and dizziness symptoms. There may also be a few patients [PER] who developed bleeding symptoms after combined administration of ticagrelor [Shandong Qinghua Pharmaceutical] [MED]."
Next, in step S3010, the adverse event name identifier identifies adverse event names in the sequence labeling result, and the result is: "Professor Sunzhen [PER] had two patients [PER] who took aspirin [MED] and subsequently developed vomiting and dizziness symptoms [AE]. There may also be a few patients [PER] who developed bleeding symptoms [AE] after combined administration of ticagrelor [Shandong Qinghua Pharmaceutical] [MED]."
Next, in step S3020, the semantic role identifier performs semantic role identification on the name recognition result, and the semantic role labeling result is: "Professor Sunzhen [PER] [A0] had two patients [PER] [A0] who took aspirin [MED] [A1] and subsequently developed vomiting and dizziness symptoms [AE] [A1]. There may also be a few patients [PER] [A0] who developed bleeding symptoms [AE] [A1] after combined administration of ticagrelor [Shandong Qinghua Pharmaceutical] [MED] [A1]." The semantic role labeling result is shown in figs. 3D and 3E, in which n denotes a noun, nh a person name, v a verb, m a numeral, u an auxiliary word, c a conjunction, wp punctuation, nd an adverb, A0 the agent, A1 the recipient, TMP a time, ADV an adverbial, and DIS a discourse marker.
Since the text of the above example is not chapter-level text, no coreference resolution step is needed; that is, step S2030 need not be performed and the process proceeds directly to step S2050. In step S2050, the text is filtered by the semantic role filter; since the example text contains no statements that satisfy the rules (such as fuzzy statements), nothing is filtered out and all entity information is retained.
Finally, in step S2060, the event determiner extracts from the example text a "two patients [Patient]-take-aspirin [Medicine]" event and a "two patients [Patient]-occur-vomiting and dizziness symptoms [AE]" event, which satisfy the judgment condition of the preset event triplet, so they are judged to be AE entries.
As can be seen from the foregoing, the above example text involves no coreference resolution step; another example is described below for a better understanding of the present disclosure.
Referring again to fig. 3C, the text to be identified may be "32 patients in the experimental group took aspirin effervescent tablets, 100 mg/d ……… aspirin has an anticoagulant effect, …… which has been confirmed in actual use. Due to the anticoagulant effect, 2 patients developed local bleeding symptoms ……".
First, in step S3040, after the named entity identifier identifies the named entities in the text, the sequence labeling result is: "32 patients [PER] in the experimental group took aspirin effervescent tablets, 100 mg/d [MED] ……… aspirin [MED] has an anticoagulant effect, …… which has been confirmed in actual use. Due to the anticoagulant effect, 2 patients [PER] developed local bleeding symptoms ……".
Next, in step S3010, the adverse event name identifier identifies adverse event names in the sequence labeling result, and the result is: "32 patients [PER] in the experimental group took aspirin effervescent tablets, 100 mg/d [MED] ……… aspirin [MED] has an anticoagulant effect, …… which has been confirmed in actual use. Due to the anticoagulant effect, 2 patients [PER] developed local bleeding symptoms [AE] ……".
Next, in step S3020, the semantic role identifier performs semantic role identification on the name recognition result, and the semantic role labeling result is: "32 patients [PER] [A0] in the experimental group took aspirin effervescent tablets, 100 mg/d [MED] [A1] ……… aspirin [MED] [A0] has an anticoagulant effect [A1], …… which has been confirmed in actual use. Due to the anticoagulant effect, 2 patients [PER] [A0] developed local bleeding symptoms [AE] [A1] ……".
Next, in step S2030, coreference resolution is performed on the above text using the coreference resolver. Specifically, in the text "32 patients in the experimental group took aspirin effervescent tablets, 100 mg/d ……… aspirin has an anticoagulant effect, …… which has been confirmed in actual use. Due to the anticoagulant effect, 2 patients developed local bleeding symptoms", the sentence "due to the anticoagulant effect, 2 patients developed local bleeding symptoms" lacks a drug subject and must be linked to the aspirin mentioned earlier through the anticoagulant effect. Substituting aspirin for the anticoagulant effect gives "aspirin [MED] [A0] caused 2 patients [PER] [A0] to develop local bleeding symptoms [AE] [A1]".
Then, in step S2050, the text is filtered by the semantic role filter; since the example text contains no statements that satisfy the rules (such as fuzzy statements), nothing is filtered out and all entity information is retained.
Finally, in step S2060, the event determiner extracts from the example text a "[Patient]-take-aspirin effervescent tablet [Medicine]" event and a "[Patient]-cause-local bleeding symptoms in 2 patients [AE]" event, which satisfy the judgment condition of the preset event triplet, so they are judged to be AE entries.
According to an embodiment of the present disclosure, an adverse event may include at least the following three elements: a subject, a cause, and an adverse effect. The examples above take the patient as the subject, the drug as the cause, and the adverse reaction occurring after administration of the drug as the adverse effect. Other variants according to the embodiments of the present disclosure are of course also possible. For example, the subject element may be a subject or user who makes or uses a drug or medical device; the cause element may be the drug or medical device; and the adverse effect element may be an adverse effect arising from drug exposure during pregnancy (maternal or paternal), drug exposure during lactation, drug overdose, drug abuse, misuse, off-label use accompanied by an adverse event, medication error, occupational exposure, lack of efficacy and disease progression, exposure to pathogens, drug interaction, a medical device (fault), death of unknown cause, suicide or attempted suicide, or an unexpected benefit. All of these fall within the scope of the present disclosure, and those skilled in the art can readily obtain the corresponding adverse event identification results using the method of identifying adverse events disclosed above, so the details are not repeated here.
According to the embodiment of the disclosure, the method for identifying an adverse event may further include feeding back the identification result regarding the adverse event to a predetermined reporter. When the text to be identified is found to contain an adverse event, the identified specific AE entry may be uploaded to a reporting system by the predetermined reporter, who may be a person responsible for reporting in the reporting system.
The first recognition model, the second recognition model, and the detailed processes of identifying text with them have been described above in conjunction with figs. 2A to 2C and figs. 3A to 3E, respectively. In addition, across different data sources, the method for identifying adverse events provided by the present disclosure achieves a precision (accuracy rate) greater than 50% and a recall greater than 99%; see Table 1 below.
TABLE 1
Referring to Table 1, recall represents the ratio of the number of AEs identified across all texts to be identified to the total number of AEs. For example, if 100 texts to be identified contain 4 AEs, the recall is 75% when 3 of them are identified and 100% when all 4 are identified. The precision (accuracy rate) represents the ratio of the number of identified AEs that are true AEs to the total number of identified AEs. For example, if 100 texts to be identified contain 4 AEs and 5 AEs are identified, of which 4 are true AEs, the precision is 80%.
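The two metrics can be checked with a few lines of arithmetic. The function names are illustrative, and the inputs reproduce the worked figures given in the text.

```python
def recall(true_identified, total_actual):
    """Share of all actual AEs that were identified."""
    return true_identified / total_actual if total_actual else 0.0

def precision(true_identified, total_identified):
    """Share of identified AEs that are true AEs."""
    return true_identified / total_identified if total_identified else 0.0

# Worked figures from the text: 4 actual AEs among 100 texts.
print(recall(3, 4))     # 3 of the 4 AEs identified
print(recall(4, 4))     # all 4 AEs identified
print(precision(4, 5))  # 5 AEs identified, 4 of them true
```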
As can be seen from table 1, the method for identifying adverse events provided by the present disclosure has an identifying effect exceeding the respective target values, and has a very good identifying effect.
The method for identifying adverse events provided by the present disclosure has been described in detail above with reference to figs. 1 to 3E. As the detailed description shows, the method can select different recognition models according to different text types and then perform the corresponding adverse event identification on the text to be identified with the selected model, without manual screening. This avoids missed or delayed AE reports and allows AEs to be reported in a timely and accurate manner, thereby helping companies or hospitals that make or use medicines or medical devices to improve the accuracy and timeliness of adverse event reporting, and serving goals such as resource saving, efficiency improvement, and enablement across the whole flow of AE identification and reporting. In addition, as can be seen from Table 1, the method recalls drug adverse events almost completely (recall above 99%), so that very few adverse events are missed. On this basis, the resources required for manual review are greatly reduced.
The present disclosure provides an apparatus for identifying an adverse event, in addition to the above-described method for identifying an adverse event, and an apparatus for identifying an adverse event according to an embodiment of the present disclosure will be described with reference to fig. 4.
Fig. 4 illustrates a block diagram of an apparatus 400 for identifying adverse events according to an embodiment of the present disclosure.
Referring to fig. 4, the apparatus 400 for identifying an adverse event may include an acquisition module 410, a selection module 420, and an identification module 430.
The acquisition module 410 may be configured to selectively acquire text to be identified from one or more data sources.
By way of example, the one or more data sources may be papers published online, where each paper relates to the medical field and may describe adverse events arising at a company or hospital that manufactures or uses a drug or medical device. Such papers are generally chapter-level texts of considerable length.
As another example, the one or more data sources may be data recorded by a call center of a company or hospital or the like that manufactures or uses drugs or medical devices. Typically, a person calls the call center, and an operator at the call center faithfully records the content of the call, which may include adverse events reported or mentioned by a patient or a client. The recorded content of a call is typically relatively short, e.g., several sentences.
As yet another example, the one or more data sources may be data from visit records, where a visit record is typically a visit summary that a medical representative of a pharmaceutical company or the like writes after visiting a doctor, and in which an adverse event mentioned by the doctor may be recorded. Typically, a visit record is of moderate length, between the lengths of the two sources above.
The selection module 420 may be configured to select a recognition model corresponding to the type of text according to the type of text.
According to embodiments of the present disclosure, the type of text may be determined based on the length of the text and/or the source of the text. The text types may include a first type and a second type.
According to an embodiment of the present disclosure, the recognition models may include a first recognition model and a second recognition model.
According to an embodiment of the present disclosure, in a case where the type of the text is a first type, a first recognition model corresponding to the first type is selected.
According to an embodiment of the present disclosure, in case the type of the text is a second type, a second recognition model corresponding to the second type is selected.
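The selection logic described above can be sketched as a small dispatch function. This is an illustrative sketch only: the `Text` class, the length cutoff of 200 characters, the `call_center` source label, and the assumption that short texts map to the first type are all hypothetical, since this passage does not state which lengths or sources correspond to which type.

```python
from dataclasses import dataclass

FIRST_TYPE = "first"
SECOND_TYPE = "second"


@dataclass
class Text:
    content: str
    source: str  # hypothetical labels, e.g. "call_center", "visit_record", "paper"


def text_type(text: Text, max_short_length: int = 200,
              short_sources: tuple = ("call_center",)) -> str:
    """Decide the text type from its source and/or its length.

    Assumption: short texts (or texts from typically short sources) are
    treated as the first type; everything else is the second type.
    """
    if text.source in short_sources or len(text.content) <= max_short_length:
        return FIRST_TYPE
    return SECOND_TYPE


def select_model(text: Text, models: dict):
    """Return the recognition model registered for the text's type."""
    return models[text_type(text)]
```

Keeping the type decision separate from the model lookup means the cutoff or the source list can be tuned without touching the recognition models themselves.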
The recognition module 430 may be configured to perform semantic recognition on the text using the selected recognition model to identify adverse events in the text.
According to embodiments of the present disclosure, semantic recognition may be performed on the text using the selected first recognition model to identify adverse events in the text.
According to embodiments of the present disclosure, semantic recognition may be performed on the text using the selected second recognition model to identify adverse events in the text.
According to an embodiment of the present disclosure, the first recognition model may include a word segmenter, a converter, a feature extractor, and a classifier.
The word segmenter may be used to segment sentences in the text. The word segmenter may include a first word segmenter, which may be used to segment the text character by character, and a second word segmenter, which may be used to segment the text word by word.
The converter may be configured to convert the segmentation result into a vector sequence. The converter may include a first converter configured to convert the segmentation result of the first word segmenter into a character vector sequence, and a second converter configured to convert the segmentation result of the second word segmenter into a word vector sequence.
A feature extractor may be used to extract semantic features based on the vector sequence.
The classifier may be used to determine whether adverse events are contained in the text based on the extracted semantic features.
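To make the two segmentation granularities concrete, the sketch below splits a sentence character by character and, separately, word by word using a dictionary tree (trie) and a directed acyclic graph of candidate words, as the claims describe for the second word segmenter. The example dictionaries are hypothetical, and choosing the path with the fewest tokens is an assumption made here for illustration; the disclosure only states that the graph of all segmentation possibilities is generated.

```python
GENERAL_DICTIONARY = ["患者", "服用", "出现"]   # hypothetical general words
DOMAIN_DICTIONARY = ["阿司匹林", "胃出血"]      # hypothetical medical terms
END = "$end"  # marker key; never collides with a single-character key


def segment_chars(sentence: str) -> list:
    """First segmenter: one token per character."""
    return list(sentence)


def build_trie(words) -> dict:
    """Build a dictionary tree from the given words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def build_dag(sentence: str, trie: dict) -> dict:
    """Map each start index to every end index that closes a candidate word."""
    dag = {}
    for start in range(len(sentence)):
        ends = [start]  # a single character is always a valid fallback token
        node = trie
        for end in range(start, len(sentence)):
            node = node.get(sentence[end])
            if node is None:
                break
            if END in node and end != start:
                ends.append(end)
        dag[start] = ends
    return dag


def segment_words(sentence: str, trie: dict) -> list:
    """Second segmenter: pick the DAG path with the fewest tokens (assumption)."""
    dag, n = build_dag(sentence, trie), len(sentence)
    best = [0] * (n + 1)   # best[i]: fewest tokens needed to cover sentence[i:]
    choice = [0] * n       # choice[i]: end index of the token starting at i
    for start in range(n - 1, -1, -1):
        best[start], choice[start] = min(
            (1 + best[end + 1], end) for end in dag[start])
    tokens, i = [], 0
    while i < n:
        tokens.append(sentence[i:choice[i] + 1])
        i = choice[i] + 1
    return tokens
```

For the sentence "患者服用阿司匹林后出现胃出血" (the patient developed gastric bleeding after taking aspirin), `segment_words` with a trie built from both dictionaries keeps the domain terms "阿司匹林" and "胃出血" whole, while `segment_chars` yields one token per character.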
According to an embodiment of the present disclosure, the second recognition model may include a named entity identifier, an adverse event name identifier, a semantic role identifier, a semantic role filter, and an event determiner.
According to an embodiment of the present disclosure, in a case where the type of the text is the first type, the recognition module may be configured to: segment sentences in the text character by character using the first word segmenter, and segment sentences in the text word by word using the second word segmenter; convert the segmentation result of the first word segmenter into a character vector sequence using the first converter, and convert the segmentation result of the second word segmenter into a word vector sequence using the second converter; extract semantic features based on the character vector sequence and the word vector sequence using the feature extractor; and determine, using the classifier, whether the text contains an adverse event based on the extracted semantic features, wherein the text is determined to contain an adverse event if the probability that an adverse event occurs in the text is greater than a preset threshold, which may be, for example, 50% or 60%.
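The data flow of the first recognition model can be sketched as a function that wires the components together and applies the threshold. This is a structural sketch only: every component is passed in as a callable, the function and parameter names are hypothetical, and no particular segmenter, embedding, or classifier is implied.

```python
from typing import Callable, Sequence


def recognize_first_type(
    text: str,
    char_segmenter: Callable[[str], Sequence[str]],
    word_segmenter: Callable[[str], Sequence[str]],
    char_converter: Callable[[Sequence[str]], Sequence],
    word_converter: Callable[[Sequence[str]], Sequence],
    feature_extractor: Callable[[Sequence, Sequence], object],
    classifier: Callable[[object], float],
    threshold: float = 0.5,
) -> bool:
    """Return True if the text is judged to contain an adverse event."""
    # Two parallel branches: character-level and word-level segmentation.
    char_vectors = char_converter(char_segmenter(text))
    word_vectors = word_converter(word_segmenter(text))
    # Semantic features are extracted from both vector sequences together.
    features = feature_extractor(char_vectors, word_vectors)
    # The classifier is assumed to return a probability in [0, 1].
    probability = classifier(features)
    if not 0.0 <= probability <= 1.0:
        raise ValueError("classifier must return a probability")
    # Strictly greater than the preset threshold, as stated in the text.
    return probability > threshold
```

Because the comparison is strict, a probability exactly equal to the threshold is not reported as an adverse event.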
According to an embodiment of the present disclosure, in a case where the type of the text is the second type, the recognition module may be configured to: identify named entities in the text using the named entity identifier; identify adverse event names in the text using the adverse event name identifier; identify, using the semantic role identifier, the semantic roles relating to the occurrence of an adverse event in sentences of the text according to the identified named entities and adverse event names; screen out at least a portion of the roles according to the identified semantic roles and a preset rule using the semantic role filter; and determine, using the event determiner, whether the text contains an adverse event according to the screened roles and preset trigger words, wherein the text is determined to contain an adverse event if the screened roles and the preset trigger words satisfy a preset event triplet.
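The final step of this pipeline, the event determiner, can be sketched as a check that the screened roles and a trigger word together fill a triplet. The triplet layout used here (cause, trigger word, adverse outcome), the role names, and the trigger words are assumptions made for illustration; this passage does not define the exact form of the preset event triplet.

```python
from typing import NamedTuple, Optional


class EventTriple(NamedTuple):
    cause: str      # e.g. a drug name (hypothetical role name)
    trigger: str    # a preset trigger word found in the sentence
    outcome: str    # the adverse outcome (hypothetical role name)


def determine_event(roles: dict, tokens: list,
                    trigger_words: set) -> Optional[EventTriple]:
    """Return the filled triplet if all three slots are present, else None."""
    # First token of the sentence that is one of the preset trigger words.
    trigger = next((tok for tok in tokens if tok in trigger_words), None)
    cause = roles.get("cause")
    outcome = roles.get("adverse_outcome")
    if trigger and cause and outcome:
        return EventTriple(cause, trigger, outcome)
    return None
```

A text would then be reported as containing an adverse event whenever `determine_event` returns a triplet rather than `None`.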
According to an embodiment of the present disclosure, the second recognition model may further include a coreference resolver configured to perform coreference resolution in the text, thereby determining the association between a drug and an adverse event. In this case, identifying adverse events in the text using the selected recognition model further includes: after the semantic roles relating to the occurrence of an adverse event in sentences of the text have been identified by the semantic role identifier according to the identified named entities and adverse event names, performing coreference resolution in the text using the coreference resolver.
Since details of the foregoing operations are described in the course of describing the method for identifying adverse events according to the present disclosure, details thereof will not be repeated herein for brevity, and reference may be made to the descriptions above with respect to fig. 1 to 3E.
Methods and apparatuses for identifying adverse events according to embodiments of the present disclosure have been described above with reference to figs. 1 to 4. However, it should be understood that the various modules in the apparatus shown in fig. 4 may each be configured as software, hardware, firmware, or any combination thereof that performs a specific function. For example, these modules may correspond to application-specific integrated circuits, to pure software code, or to a combination of software and hardware. By way of example, the device described with reference to fig. 4 may be, but is not limited to, a personal computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing program instructions.
It should be noted that, although the apparatus 400 for identifying an adverse event is described as being divided into modules for respectively performing the corresponding processes, it is apparent to those skilled in the art that the processes performed by the modules may be performed without any specific division of the modules or without explicit demarcation between the modules. Furthermore, the apparatus described above with reference to fig. 4 is not limited to include the above-described modules, but some other modules (e.g., a memory module, a data processing module, etc.) may be added as needed, or the above modules may be combined as well.
Further, the method of identifying adverse events according to the present disclosure may be recorded in a computer-readable recording medium. In particular, according to the present disclosure, there may be provided a computer-readable recording medium storing computer-executable instructions that, when executed by a processor, cause the processor to perform the method of identifying adverse events as described above. Examples of the computer-readable recording medium may include magnetic media (e.g., hard disk, floppy disk, and magnetic tape); optical media (e.g., CD-ROM and DVD); magneto-optical media (e.g., optical disks); and hardware devices that are specially configured to store and execute program instructions (e.g., read-only memory (ROM), random access memory (RAM), flash memory, etc.). Further, in accordance with the present disclosure, there may be provided an apparatus comprising a processor and a memory having computer-executable instructions stored therein, wherein the computer-executable instructions, when executed by the processor, cause the processor to perform the method of identifying adverse events as described above. Examples of computer-executable instructions include both machine code, such as that produced by a compiler, and files containing higher-level code that may be executed by the computer using an interpreter.
In addition, some operations in the method for identifying adverse events according to the present disclosure may be implemented in software, some operations may be implemented in hardware, and furthermore, the operations may be implemented in a combination of software and hardware.
It is noted that the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises at least one executable instruction for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In general, the various example embodiments of the disclosure may be implemented in hardware or special purpose circuits, software, firmware, logic, or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While aspects of the embodiments of the present disclosure are illustrated or described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The exemplary embodiments of the present disclosure described in detail above are illustrative only and are not limiting. Those skilled in the art will understand that various modifications and combinations of these embodiments or features thereof may be made without departing from the principles and spirit of the disclosure, and such modifications should fall within the scope of the disclosure.

Claims (21)

1. A method of identifying an adverse event, comprising:
selectively retrieving text to be identified from one or more data sources;
selecting a recognition model corresponding to the type of the text according to the type of the text; and
performing semantic recognition on the text using the selected recognition model to identify adverse events in the text,
wherein the selecting, according to the type of the text, a recognition model corresponding to the type of the text includes:
in the case that the type of the text is a second type, selecting a second recognition model, wherein the second recognition model comprises: a named entity identifier, an adverse event name identifier, a semantic role identifier, a semantic role filter, and an event determiner, wherein the named entity identifier is used for identifying a named entity in the text, the adverse event name identifier is used for identifying an adverse event name in the text, the semantic role identifier is used for identifying semantic roles relating to the occurrence of an adverse event in sentences of the text according to the identified named entity and adverse event name, the semantic role filter is used for screening out at least a portion of the roles according to the identified semantic roles and a preset rule, and the event determiner is used for determining whether the text contains an adverse event according to the screened roles and preset trigger words.
2. The method of claim 1, wherein the type of text is determined based on a length of the text and/or a source of the text.
3. The method of claim 2, wherein the selecting the recognition model corresponding to the type of text according to the type of text further comprises:
in the case that the type of the text is a first type, selecting a first recognition model, wherein the first recognition model comprises: a word segmenter, a converter, a feature extractor, and a classifier, wherein the word segmenter is used for segmenting sentences in the text, the converter is used for converting segmentation results into vector sequences, the feature extractor is used for extracting semantic features based on the vector sequences, and the classifier is used for determining whether the text contains an adverse event based on the extracted semantic features.
4. The method of claim 3, wherein the word segmenter comprises a first word segmenter for segmenting the text character by character and a second word segmenter for segmenting the text word by word;
the converter comprises a first converter and a second converter, wherein the first converter is used for converting the segmentation result of the first word segmenter into a character vector sequence, and the second converter is used for converting the segmentation result of the second word segmenter into a word vector sequence.
5. The method of claim 4, wherein the second word segmenter being used for segmenting the text word by word comprises:
generating a directed acyclic graph of all possible word segmentations of the sentences in the text according to a dictionary tree generated from a general dictionary and a domain-specific dictionary, thereby segmenting the text word by word.
6. The method of claim 4, wherein the performing semantic recognition on the text to identify adverse events in the text using the selected recognition model comprises:
segmenting sentences in the text character by character using the first word segmenter, and segmenting sentences in the text word by word using the second word segmenter;
converting the segmentation result of the first word segmenter into a character vector sequence using the first converter, and converting the segmentation result of the second word segmenter into a word vector sequence using the second converter;
extracting semantic features based on the character vector sequence and the word vector sequence using the feature extractor; and
determining, using the classifier, whether the text contains an adverse event based on the extracted semantic features,
wherein the text is determined to contain an adverse event in the case that the probability of an adverse event occurring in the text is greater than a preset threshold.
7. The method of claim 1, wherein the performing semantic recognition on the text to identify adverse events in the text using the selected recognition model comprises:
identifying a named entity in the text using the named entity identifier;
identifying an adverse event name in the text using the adverse event name identifier;
identifying, using the semantic role identifier, semantic roles relating to the occurrence of an adverse event in sentences of the text according to the identified named entity and adverse event name;
screening out at least a portion of the roles according to the identified semantic roles and a preset rule using the semantic role filter; and
determining, using the event determiner, whether the text contains an adverse event according to the screened roles and preset trigger words,
wherein the text is determined to contain an adverse event in the case that the screened roles and the preset trigger words satisfy a preset event triplet.
8. The method of claim 7, wherein the second recognition model further comprises: a coreference resolver for performing coreference resolution in the text, thereby determining the association between a drug and an adverse event,
wherein the identifying adverse events in the text using the selected recognition model further comprises:
after identifying, using the semantic role identifier, the semantic roles relating to the occurrence of an adverse event in sentences of the text according to the identified named entity and adverse event name, performing coreference resolution in the text using the coreference resolver.
9. The method of any one of claims 1 to 8, wherein the adverse event comprises at least three elements: subjects, causes, and adverse outcomes.
10. The method of claim 9, further comprising: feeding back the identification result regarding the adverse event through a preset reporter.
11. The method of claim 1, wherein the recognition model is a medical-domain recognition model and the adverse event is a medical-domain adverse event.
12. An apparatus for identifying adverse events, comprising:
an acquisition module configured to selectively acquire text to be identified from one or more data sources;
a selection module configured to select a recognition model corresponding to a type of the text according to the type of the text; and
a recognition module configured to perform semantic recognition on the text using the selected recognition model to identify adverse events in the text,
wherein the selection module is configured to:
in the case that the type of the text is a second type, select a second recognition model, wherein the second recognition model comprises: a named entity identifier, an adverse event name identifier, a semantic role identifier, a semantic role filter, and an event determiner, wherein the named entity identifier is used for identifying a named entity in the text, the adverse event name identifier is used for identifying an adverse event name in the text, the semantic role identifier is used for identifying semantic roles relating to the occurrence of an adverse event in sentences of the text according to the identified named entity and adverse event name, the semantic role filter is used for screening out at least a portion of the roles according to the identified semantic roles and a preset rule, and the event determiner is used for determining whether the text contains an adverse event according to the screened roles and preset trigger words.
13. The apparatus of claim 12, wherein the type of text is determined based on a length of the text and/or a source of the text.
14. The apparatus of claim 13, wherein the selection module is further configured to:
in the case that the type of the text is a first type, select a first recognition model, wherein the first recognition model comprises: a word segmenter, a converter, a feature extractor, and a classifier, wherein the word segmenter is used for segmenting sentences in the text, the converter is used for converting segmentation results into vector sequences, the feature extractor is used for extracting semantic features based on the vector sequences, and the classifier is used for determining whether the text contains an adverse event based on the extracted semantic features.
15. The apparatus of claim 14, wherein the word segmenter comprises a first word segmenter for segmenting the text character by character and a second word segmenter for segmenting the text word by word;
the converter comprises a first converter and a second converter, wherein the first converter is used for converting the segmentation result of the first word segmenter into a character vector sequence, and the second converter is used for converting the segmentation result of the second word segmenter into a word vector sequence.
16. The apparatus of claim 15, wherein the second word segmenter being used for segmenting the text word by word comprises:
generating a directed acyclic graph of all possible word segmentations of the sentences in the text according to a dictionary tree generated from a general dictionary and a domain-specific dictionary, thereby segmenting the text word by word.
17. The apparatus of claim 15, wherein the recognition module is configured to:
segment sentences in the text character by character using the first word segmenter, and segment sentences in the text word by word using the second word segmenter;
convert the segmentation result of the first word segmenter into a character vector sequence using the first converter, and convert the segmentation result of the second word segmenter into a word vector sequence using the second converter;
extract semantic features based on the character vector sequence and the word vector sequence using the feature extractor; and
determine, using the classifier, whether the text contains an adverse event based on the extracted semantic features,
wherein the text is determined to contain an adverse event in the case that the probability of an adverse event occurring in the text is greater than a preset threshold.
18. The apparatus of claim 12, wherein the recognition module is configured to:
identify a named entity in the text using the named entity identifier;
identify an adverse event name in the text using the adverse event name identifier;
identify, using the semantic role identifier, semantic roles relating to the occurrence of an adverse event in sentences of the text according to the identified named entity and adverse event name;
screen out at least a portion of the roles according to the identified semantic roles and a preset rule using the semantic role filter; and
determine, using the event determiner, whether the text contains an adverse event according to the screened roles and preset trigger words,
wherein the text is determined to contain an adverse event in the case that the screened roles and the preset trigger words satisfy a preset event triplet.
19. The apparatus of claim 18, wherein the second recognition model further comprises: a coreference resolver for performing coreference resolution in the text, thereby determining the association between a drug and an adverse event,
wherein the recognition module is further configured to:
after identifying, using the semantic role identifier, the semantic roles relating to the occurrence of an adverse event in sentences of the text according to the identified named entity and adverse event name, perform coreference resolution in the text using the coreference resolver.
20. An apparatus for identifying adverse events, comprising:
a processor, and
a memory storing computer-executable instructions that, when executed by the processor, cause the processor to perform the method of any one of claims 1-11.
21. A computer readable recording medium storing computer executable instructions, wherein the computer executable instructions when executed by a processor cause the processor to perform the method of any one of claims 1-11.
CN202110065632.6A 2021-01-18 2021-01-18 Method, device, equipment and medium for identifying adverse event Active CN112766903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110065632.6A CN112766903B (en) 2021-01-18 2021-01-18 Method, device, equipment and medium for identifying adverse event

Publications (2)

Publication Number Publication Date
CN112766903A CN112766903A (en) 2021-05-07
CN112766903B true CN112766903B (en) 2024-02-06

Family

ID=75702951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110065632.6A Active CN112766903B (en) 2021-01-18 2021-01-18 Method, device, equipment and medium for identifying adverse event

Country Status (1)

Country Link
CN (1) CN112766903B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122416A (en) * 2017-03-31 2017-09-01 北京大学 A kind of Chinese event abstracting method
CN108231059A (en) * 2017-11-27 2018-06-29 北京搜狗科技发展有限公司 Treating method and apparatus, the device for processing
CN109582949A (en) * 2018-09-14 2019-04-05 阿里巴巴集团控股有限公司 Event element abstracting method, calculates equipment and storage medium at device
CN109657158A (en) * 2018-11-29 2019-04-19 山西大学 A kind of adverse drug events information extracting method based on social network data
CN109670174A (en) * 2018-12-14 2019-04-23 腾讯科技(深圳)有限公司 A kind of training method and device of event recognition model
CN110597994A (en) * 2019-09-17 2019-12-20 北京百度网讯科技有限公司 Event element identification method and device
CN111669757A (en) * 2020-06-15 2020-09-15 国家计算机网络与信息安全管理中心 Terminal fraud call identification method based on conversation text word vector
CN112015901A (en) * 2020-09-08 2020-12-01 迪爱斯信息技术股份有限公司 Text classification method and device and warning situation analysis system
CN112131882A (en) * 2020-09-30 2020-12-25 绿盟科技集团股份有限公司 Multi-source heterogeneous network security knowledge graph construction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant