CN110795941A - Named entity identification method and system based on external knowledge and electronic equipment - Google Patents

Info

Publication number
CN110795941A
Authority
CN
China
Prior art keywords
initial
named entity
named
external knowledge
entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911034091.XA
Other languages
Chinese (zh)
Other versions
CN110795941B (en
Inventor
宋思睿
宋彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovation Workshop (guangzhou) Artificial Intelligence Research Co Ltd
Original Assignee
Innovation Workshop (guangzhou) Artificial Intelligence Research Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovation Workshop (guangzhou) Artificial Intelligence Research Co Ltd filed Critical Innovation Workshop (guangzhou) Artificial Intelligence Research Co Ltd
Priority to CN201911034091.XA priority Critical patent/CN110795941B/en
Publication of CN110795941A publication Critical patent/CN110795941A/en
Application granted granted Critical
Publication of CN110795941B publication Critical patent/CN110795941B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention provides a named entity identification method, a named entity identification system and electronic equipment based on external knowledge.

Description

Named entity identification method and system based on external knowledge and electronic equipment
[ technical field ]
The invention relates to the field of named entity identification, in particular to a named entity identification method and system based on external knowledge and electronic equipment.
[ background of the invention ]
A named entity (NE) is an entity object that exists in reality, expressed as a word or a phrase of several words; for example, "Haidian District, Beijing" is a named entity that refers to a real location.
Named entity recognition (NER) takes a text as input, detects all the named entities that the text contains, and classifies each detected named entity. In general, named entities can be classified as names of people, places, organizations, and so on. In specific fields such as medical or financial text, named entities can be classified differently, for example as protein names and DNA names, or as company names and job titles.
However, existing named entity recognition methods ignore external knowledge, which leads to recognition errors and low recognition accuracy.
[ summary of the invention ]
To overcome the low recognition accuracy of existing named entity recognition methods, the invention provides a named entity recognition method based on external knowledge, a named entity recognition system based on external knowledge, and electronic equipment.
To solve the above technical problem, the invention provides the following technical scheme: a named entity recognition method based on external knowledge, comprising the following steps. Step S1: obtaining at least one text, wherein the text contains at least one word arranged in sequence, and identifying a plurality of initial named entities formed from a plurality of words in the text; step S2: acquiring at least one external knowledge database, inputting every two initial named entities into the external knowledge database, and acquiring the initial relationship vector corresponding to the two initial named entities; and step S3: obtaining the final real named entities based on the initial relationship vectors and the initial named entities, and identifying the named entity category corresponding to each real named entity.
Preferably, the step S3 specifically includes the following steps: step S31: obtaining a temporary weight between each two initial named entities based on the initial relationship vector; step S32: obtaining a final weight between every two initial named entities relative to all the initial named entities based on a plurality of temporary weights; and step S33: and obtaining an external knowledge vector corresponding to each initial named entity relative to all initial named entities based on the final weight, obtaining a final real named entity based on the external knowledge vector, and identifying a named entity category corresponding to the real named entity.
Preferably, the step S33 specifically includes the following steps: step S331: based on the final weight, obtaining an external knowledge vector corresponding to each initial named entity relative to all initial named entities; step S332: judging whether the initial named entity is a real named entity or not based on the word vector corresponding to the initial named entity and the corresponding external knowledge vector, if so, entering a step S333, and if not, entering a step S334; step S333: obtaining a final real named entity, and identifying a named entity type corresponding to the real named entity; and step S334: the current initial named entity is deleted, and the next initial named entity is selected and then the process returns to the step S332.
Preferably, the step S1 specifically includes the following steps: step S11: obtaining at least one text, wherein the text contains at least one character which is arranged in sequence; and step S12: and sequentially predicting the prediction label of each word, and sequentially combining a plurality of corresponding words into a corresponding initial named entity according to the prediction labels.
Preferably, in step S1, the words include chinese words and/or english words.
Preferably, in the step S12, the prediction threshold is decreased to increase the number of the identified initial named entities when predicting the prediction tag of each word.
The invention also provides a named entity recognition system based on external knowledge, comprising: an initial identification unit, configured to acquire at least one text, wherein the text contains at least one word arranged in sequence, and to identify a plurality of initial named entities formed from a plurality of words in the text; an external knowledge embedding unit, configured to acquire at least one external knowledge database, input every two initial named entities into the external knowledge database, and acquire the initial relationship vector corresponding to the two initial named entities; and a named entity identification unit, configured to obtain the final real named entities based on the initial relationship vectors and the initial named entities, and to identify the named entity category corresponding to each real named entity.
Preferably, the named entity identifying unit further comprises: a temporary weight calculation unit, configured to obtain a temporary weight between each two initial named entities based on the initial relationship vector; a final weight calculation unit, configured to obtain a final weight between each two of the initial named entities with respect to all of the initial named entities based on a plurality of temporary weights; and the named entity distinguishing unit is used for obtaining an external knowledge vector corresponding to each initial named entity relative to all the initial named entities based on the final weight, obtaining a final real named entity based on the external knowledge vector, and identifying the named entity type corresponding to the real named entity.
Preferably, the initial identifying unit further includes: the text acquisition unit is used for acquiring at least one text, and the text contains at least one character which is arranged in sequence; and the label prediction unit is used for sequentially predicting the prediction label of each word and sequentially combining a plurality of corresponding words into a corresponding initial named entity according to the prediction labels.
The invention also provides an electronic device comprising a memory and a processor, the memory having stored therein a computer program arranged to execute the external knowledge based named entity recognition method of any of the above when run; the processor is arranged to execute the external knowledge based named entity recognition method of any of the above by means of the computer program.
Compared with the prior art, the named entity recognition method, system and electronic equipment based on external knowledge provided by the invention have the following advantages:
1. At least one text is obtained and a plurality of initial named entities, each consisting of a plurality of words, are preliminarily identified. An initial relationship vector between every two initial named entities is obtained from the external knowledge database, and the real named entities are obtained based on the initial relationship vectors and the initial named entities. Introducing information from the external knowledge database lets the preliminarily identified initial named entities be judged a second time, so that the real named entities are screened out of the initial named entities and the accuracy of named entity recognition is improved.
2. Temporary weights between every two initial named entities are calculated from the initial relationship vectors. The temporary weights are then integrated into final weights between every two initial named entities relative to all the initial named entities. Finally, the external knowledge vector corresponding to each initial named entity is obtained from the final weights and used to determine the final real named entities.
3. The real named entities are retained in turn and the initial named entities that are not real named entities are deleted, so that all the real named entities in the text are finally obtained, the miss rate for named entities is reduced, and the accuracy and recall of named entity recognition are improved.
4. The words in the text include Chinese characters and/or English words, so the method can handle text in different languages and is more widely applicable.
5. When the prediction tag of each word is predicted, the prediction threshold is lowered so that more initial named entities are identified in the same text. With the lower threshold, as many initial named entities as possible become candidates for real named entities, which prevents the misses that a higher threshold would cause and improves recognition accuracy.
[ description of the drawings ]
Fig. 1 is a flowchart of a named entity recognition method based on external knowledge according to a first embodiment of the present invention.
Fig. 2 is a detailed flowchart of step S1 in the external knowledge-based named entity recognition method according to the first embodiment of the present invention.
Fig. 3 is a detailed flowchart of step S3 in the external knowledge-based named entity recognition method according to the first embodiment of the present invention.
Fig. 4 is a detailed flowchart of step S33 in the external knowledge-based named entity recognition method according to the first embodiment of the present invention.
Fig. 5 is a block diagram of a named entity recognition system based on external knowledge according to a second embodiment of the present invention.
Fig. 6 is a block diagram of an initial recognition unit in a named entity recognition system based on external knowledge according to a second embodiment of the present invention.
Fig. 7 is a block diagram of a named entity recognition unit in a named entity recognition system based on external knowledge according to a second embodiment of the present invention.
Fig. 8 is a block diagram of an electronic device according to a third embodiment of the invention.
Description of reference numerals:
1. an initial identification unit; 2. an external knowledge embedding unit; 3. a named entity recognition unit;
11. a text acquisition unit; 12. a label prediction unit; 31. a temporary weight calculation unit; 32. a final weight calculation unit; 33. a named entity discriminating unit;
10. a memory; 20. a processor;
[ detailed description ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, a first embodiment of the present invention provides a named entity recognition method based on external knowledge, including the following steps:
Step S1: obtaining at least one text, wherein the text contains at least one word arranged in sequence, and identifying a plurality of initial named entities formed from a plurality of words in the text.
It is understood that in step S1 the text includes at least one word arranged in sequence, the words include Chinese characters, English words, characters of other languages, or any combination of these, and the words contain a plurality of named entities. In the present embodiment the words are described as Chinese characters, but the invention is not limited to this.
It is understood that in step S1 the initial named entities composed of a plurality of words in the text may be identified by any one of a rule-based method, a feature-template-based method (e.g. the generative model HMM or the discriminative model CRF), or a neural-network-based method (e.g. CNN or RNN). The present embodiment takes a BiLSTM-CRF model as an example.
In step S1, the initial named entities obtained are only preliminarily identified. Among them there may be named entities that were identified incorrectly, and some named entities may have been missed. For example, in the present embodiment the input text is "Kobe and five great Lakers stars come to Beijing" (科比和五大湖人球星来到北京), and the initial named entities are "Kobe" (科比), "Great Lakes" (五大湖), "Lakers" (湖人) and "Beijing" (北京). "Great Lakes" is not a real named entity in this text.
Step S2: acquiring at least one external knowledge database, inputting every two initial named entities into the external knowledge database, and acquiring the initial relationship vector corresponding to the two initial named entities.
It is to be understood that, in step S2, the external knowledge database is a preset database that contains at least the relationship vectors between the named entities in the input text. A relationship vector represents the real relationship between two named entities, and a calculation based on the relationship vectors can determine whether an initial named entity is a real named entity.
It is to be appreciated that in step S2, each time two initial named entities are input, the external knowledge database outputs the initial relationship vector for that pair.
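The pairwise lookup in step S2 can be sketched as follows. This is a minimal illustration that assumes the external knowledge database can be modelled as an in-memory dictionary keyed by unordered entity pairs. The names `KNOWLEDGE_DB`, `lookup_relation` and `pairwise_relations`, the vector dimension, the zero-vector fallback for unknown pairs, and all numeric values are hypothetical and are not taken from the patent.

```python
from itertools import combinations

DIM = 3  # hypothetical relation-vector dimension

# Hypothetical toy database: unordered entity pair -> relation vector.
KNOWLEDGE_DB = {
    frozenset(["Kobe", "Lakers"]): [0.9, 0.1, 0.0],
    frozenset(["Kobe", "Beijing"]): [0.2, 0.5, 0.1],
}

def lookup_relation(a, b):
    """Return the relation vector for a pair, or zeros if unknown (assumption)."""
    return KNOWLEDGE_DB.get(frozenset([a, b]), [0.0] * DIM)

def pairwise_relations(entities):
    """Query the database once for every pair of initial named entities."""
    return {(a, b): lookup_relation(a, b) for a, b in combinations(entities, 2)}

candidates = ["Kobe", "Great Lakes", "Lakers", "Beijing"]
relations = pairwise_relations(candidates)
```

With four candidate entities the sketch issues six queries, one per unordered pair.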
Step S3: obtaining the final real named entities based on the initial relationship vectors and the initial named entities, and identifying the named entity category corresponding to each real named entity.
It is to be understood that, in step S3, a calculation over the initial relationship vectors and the initial named entities determines which of the initial named entities are real named entities. For example, the three real named entities "Kobe" (科比), "Lakers" (湖人) and "Beijing" (北京) are identified from the four initial named entities "Kobe", "Great Lakes" (五大湖), "Lakers" and "Beijing", and "Great Lakes" is excluded because it is not a real named entity in the text. This improves the accuracy with which named entities in the input text are judged.
It can be understood that, in step S3, the real named entities are classified to obtain the category of each real named entity, so that the user can distinguish the identified category and the domain of the real named entity, the user can understand the text content conveniently, and the accuracy of identifying the named entities in the input text is improved.
Referring to fig. 2, step S1 (obtaining at least one text that contains at least one word arranged in sequence, and identifying a plurality of initial named entities formed from a plurality of words in the text) specifically includes steps S11 to S12:
step S11: obtaining at least one text, wherein the text contains at least one word arranged in sequence; and
step S12: predicting the prediction tag of each word in turn, and combining the corresponding words in sequence into initial named entities according to the prediction tags.
It is to be understood that in step S12 the prediction tag of each character is predicted in turn, and the attribute of the character is identified from its tag. For example, in the text "Kobe and five great Lakers stars come to Beijing" (科比和五大湖人球星来到北京), the character "科" is tagged B-PER and the adjacent character "比" is tagged E-PER, which yields the initial named entity "Kobe" (科比).
It is understood that the prediction tags can follow several schemes, such as BIES and BIO, where B indicates that the word is the beginning (Begin) of an NE, I indicates that the word is in the middle (Inside) of an NE, O indicates that the word does not belong to any NE (Outside), E indicates that the word is the end (End) of an NE, and S indicates that the word is an NE on its own (Single). An NE tag often also carries the NE category, such as B-LOC and B-PER, which represent the beginning of a place name and the beginning of a person name, respectively.
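Combining tagged characters into entities can be sketched as below. This is a minimal illustration of decoding a BIES/O tag sequence, not the patent's implementation. The function name `decode_bies`, the shortened toy sentence 科比来到北京 ("Kobe comes to Beijing") and its tags are assumptions made for the example.

```python
def decode_bies(chars, tags):
    """Combine characters into (text, category) spans from BIES/O tags."""
    entities, buffer, category = [], [], None
    for ch, tag in zip(chars, tags):
        prefix, _, cat = tag.partition("-")
        if prefix == "S":                      # single-character entity
            entities.append((ch, cat))
            buffer, category = [], None
        elif prefix == "B":                    # start a new span
            buffer, category = [ch], cat
        elif prefix == "I" and buffer and cat == category:
            buffer.append(ch)                  # continue the open span
        elif prefix == "E" and buffer and cat == category:
            buffer.append(ch)                  # close the open span
            entities.append(("".join(buffer), category))
            buffer, category = [], None
        else:                                  # "O" or inconsistent tags: drop partial span
            buffer, category = [], None
    return entities

chars = list("科比来到北京")
tags = ["B-PER", "E-PER", "O", "O", "B-LOC", "E-LOC"]
entities = decode_bies(chars, tags)
print(entities)
```

A single tag sequence of this kind yields non-overlapping spans only; dropping inconsistent partial spans is a design choice of this sketch.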
Optionally, in step S12 the prediction threshold is lowered when the prediction tag of each word is predicted, so that more initial named entities are identified in the same text. With the lower threshold, as many initial named entities as possible become candidates for real named entities, which prevents the misses that a higher threshold would cause and improves recognition accuracy.
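The patent does not say how the prediction threshold is applied. One possible reading, sketched below purely as an assumption, is that a word receives its most probable entity tag whenever that tag's probability reaches the threshold, and "O" otherwise. The function `predict_tag` and the probabilities are hypothetical.

```python
def predict_tag(tag_probs, threshold):
    """Pick the best entity tag if its probability reaches the threshold, else "O"."""
    entity_probs = {t: p for t, p in tag_probs.items() if t != "O"}
    best_tag = max(entity_probs, key=entity_probs.get)
    return best_tag if entity_probs[best_tag] >= threshold else "O"

# Hypothetical tagger output for one word.
probs = {"O": 0.55, "B-LOC": 0.40, "B-PER": 0.05}

strict = predict_tag(probs, threshold=0.5)   # high threshold: word is left untagged
lenient = predict_tag(probs, threshold=0.3)  # lowered threshold: word becomes a candidate
print(strict, lenient)
```

Under this reading, lowering the threshold turns borderline words into candidates, and the later external-knowledge step is responsible for filtering out the false ones.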
It is understood that steps S11-S12 are only one embodiment of this example, and the embodiment is not limited to steps S11-S12.
Referring to fig. 3, step S3 (obtaining the final real named entities based on the initial relationship vectors and the initial named entities, and identifying the named entity category corresponding to each real named entity) specifically includes steps S31 to S33:
step S31: obtaining a temporary weight between every two initial named entities based on the initial relationship vector;
step S32: obtaining a final weight between every two initial named entities relative to all the initial named entities based on the plurality of temporary weights; and
step S33: obtaining, based on the final weights, the external knowledge vector corresponding to each initial named entity relative to all the initial named entities, obtaining the final real named entities based on the external knowledge vectors, and identifying the named entity category corresponding to each real named entity.
It is to be understood that in step S31 the two initial named entities and the corresponding initial relationship vector may be input into a feedforward neural network trained to produce temporary weights, and the feedforward neural network calculates the temporary weight of the two input initial named entities.
It is to be appreciated that in step S31 the temporary weight characterizes the relationship weight, in the external knowledge database, between the two initial named entities currently input. For example, in the present embodiment, the temporary weights between any two initial named entities, such as "Kobe" and "Beijing", "Kobe" and "Great Lakes", or "Kobe" and "Lakers", can be obtained through step S31.
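A minimal sketch of such a scoring network follows. The patent does not specify the architecture, so the single tanh hidden layer, the concatenation of the two entity vectors with the relation vector, the function names, and every parameter value are assumptions for illustration only. A real system would learn the parameters during training.

```python
import math

def feed_forward_score(x, w_hidden, b_hidden, w_out, b_out):
    """One tanh hidden layer followed by a linear output; returns a scalar."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def temporary_weight(vec_i, vec_j, relation, params):
    """Score one entity pair from its two word vectors and its relation vector."""
    x = vec_i + vec_j + relation  # list concatenation into one input vector
    return feed_forward_score(x, *params)

# Hypothetical, untrained parameters for a 6-dimensional input and 2 hidden units.
params = ([[0.1] * 6, [-0.1] * 6], [0.0, 0.0], [1.0, 0.5], 0.2)

beta = temporary_weight([1.0, 0.0], [0.0, 1.0], [0.5, 0.5], params)
print(beta)
```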
It is understood that in step S32, the temporal weight between any two initial named entities can be obtained through step S31, and then through the following formula:
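$w_{i,j} = \dfrac{e^{\beta_{i,j}}}{\sum_{j'} e^{\beta_{i,j'}}}$
wherein i and j are the word vector sequence numbers corresponding to two different initial named entities, $w_{i,j}$ is the final weight of the two initial named entities, $\beta_{i,j}$ is the temporary weight, e is the base of the natural logarithm, and the sum runs over all initial named entities $j'$ paired with entity i.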
Based on the above formula, the final weight between every two initial named entities relative to all the initial named entities can be calculated. The final weight represents how strongly two initial named entities are related compared with the other initial named entities: the larger the final weight, the closer the relationship. For example, in the present embodiment, step S32 calculates the final weights of "Kobe" and "Beijing", "Kobe" and "Great Lakes", and "Kobe" and "Lakers", and the final weights express the relative proportions of the temporary weights. For instance, the final weight of "Kobe" and "Lakers" is the highest, followed by that of "Kobe" and "Beijing", and that of "Kobe" and "Great Lakes" is the lowest. The combination of "Kobe" and "Great Lakes" therefore has the lowest weight in the text, but this alone cannot yet determine whether either "Kobe" or "Great Lakes" is a real named entity.
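The normalisation in step S32 can be sketched as follows, assuming it is a softmax over the temporary weights of one entity against all the others. That assumption matches the text's reference to the base of the natural logarithm e, but the function name `final_weights` and the numeric temporary weights are hypothetical.

```python
import math

def final_weights(temp_weights):
    """Softmax over the temporary weights of one entity against all others."""
    m = max(temp_weights.values())  # subtract the max for numerical stability
    exps = {j: math.exp(b - m) for j, b in temp_weights.items()}
    total = sum(exps.values())
    return {j: v / total for j, v in exps.items()}

# Hypothetical temporary weights between "Kobe" and the other candidates.
beta_kobe = {"Lakers": 2.0, "Beijing": 1.0, "Great Lakes": -1.0}
w_kobe = final_weights(beta_kobe)
print(w_kobe)
```

The resulting weights sum to 1 and preserve the ordering of the temporary weights, so the pair with the weakest relationship receives the smallest share.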
It is understood that, in step S33, the following formula is used:
$ek_i = \sum_j w_{i,j} \, k_{i,j}$
wherein $ek_i$ is the external knowledge vector corresponding to an initial named entity, and $k_{i,j}$ is the initial relationship vector of two initial named entities in the external knowledge database.
The external knowledge vector $ek_i$ corresponding to an initial named entity is obtained from this formula. The external knowledge vector characterizes the final knowledge representation of an initial named entity after all other initial named entities have been considered and all relevant external knowledge has been consulted, and whether the corresponding initial named entity is a real named entity can be judged based on the external knowledge vector.
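The weighted sum that produces the external knowledge vector can be sketched as follows. The function name `external_knowledge_vector` and the toy weights and relation vectors are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def external_knowledge_vector(weights, relations):
    """Compute ek_i = sum over j of w_ij * k_ij for one entity i."""
    dim = len(next(iter(relations.values())))
    ek = [0.0] * dim
    for j, w in weights.items():
        for d, value in enumerate(relations[j]):
            ek[d] += w * value
    return ek

# Hypothetical final weights and relation vectors for one entity.
weights = {"Lakers": 0.5, "Beijing": 0.5}
relations = {"Lakers": [1.0, 0.0], "Beijing": [0.0, 2.0]}

ek = external_knowledge_vector(weights, relations)
print(ek)  # [0.5, 1.0]
```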
Further, the word vector of the initial named entity and the corresponding external knowledge vector are input into a feedforward neural network trained for real named entity recognition, and the feedforward neural network judges the input word vector and external knowledge vector to select the final real named entities. For example, in this embodiment, since the final weights of "Great Lakes" with "Kobe", "Beijing" and "Lakers" are all low, the feedforward neural network determines that "Great Lakes" is not a real named entity and excludes it. It then outputs the three real named entities "Kobe", "Beijing" and "Lakers" and classifies them: "Kobe" is a person name, "Beijing" is a place name, and "Lakers" is a team name.
It is understood that steps S31-S33 are only one embodiment of this example, and the embodiment is not limited to steps S31-S33.
Referring to fig. 4, step S33 (obtaining, based on the final weights, the external knowledge vector corresponding to each initial named entity relative to all the initial named entities, obtaining the final real named entities based on the external knowledge vectors, and identifying the named entity category corresponding to each real named entity) specifically includes steps S331 to S334:
step S331: based on the final weight, obtaining an external knowledge vector corresponding to each initial named entity relative to all initial named entities;
step S332: judging whether the initial named entity is a real named entity or not based on the word vector corresponding to the initial named entity and the corresponding external knowledge vector, if so, entering a step S333, and if not, entering a step S334;
step S333: obtaining a final real named entity, and identifying a named entity type corresponding to the real named entity; and
step S334: the current initial named entity is deleted, and the next initial named entity is selected and then the process returns to the step S332.
It is to be understood that, in step S332, all real named entities in the text are finally obtained by sequentially retaining the real named entities and deleting the initial named entities other than the real named entities.
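The retain-or-delete loop of steps S332 to S334 can be sketched as follows. The discriminator is replaced here by a stand-in that thresholds precomputed scores. The function `filter_entities`, the scores, and the 0.5 cut-off are all hypothetical, whereas the patent uses a trained feedforward neural network for this decision.

```python
def filter_entities(candidates, is_real):
    """Keep candidates judged real (S333) and drop the rest (S334)."""
    kept = []
    for entity in candidates:
        if is_real(entity):      # S332: discriminator decision
            kept.append(entity)  # S333: retain the real named entity
        # S334: otherwise discard it and move on to the next candidate
    return kept

# Hypothetical discriminator scores for the running example.
scores = {"Kobe": 0.93, "Great Lakes": 0.12, "Lakers": 0.88, "Beijing": 0.95}
real = filter_entities(list(scores), lambda e: scores[e] >= 0.5)
print(real)
```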
It is understood that steps S331 to S334 are only one implementation of this embodiment, and the implementation is not limited to steps S331 to S334.
Referring to fig. 5, a named entity recognition system based on external knowledge is further provided in the second embodiment of the present invention. The external knowledge-based named entity recognition system can include:
the initial identification unit 1, configured to acquire at least one text, wherein the text contains at least one word arranged in sequence, and to identify a plurality of initial named entities formed from a plurality of words in the text;
the external knowledge embedding unit 2 is used for acquiring at least one external knowledge database, inputting every two initial named entities into the external knowledge database and acquiring initial relationship vectors corresponding to the two initial named entities; and
and the named entity identification unit 3 is used for obtaining a final real named entity based on the initial relationship vector and the initial named entities and identifying the named entity type corresponding to the real named entity.
Referring to fig. 6, the initial recognition unit 1 further includes:
the text acquiring unit 11 is configured to acquire at least one text, where the text contains at least one word arranged in sequence; and
and a label prediction unit 12, configured to sequentially predict a prediction label of each word, and sequentially combine a plurality of corresponding words into a corresponding initial named entity according to the prediction label.
Referring to fig. 7, the named entity recognition unit 3 further includes:
a temporal weight calculation unit 31, configured to obtain a temporal weight between each two initial named entities based on the initial relationship vector;
a final weight calculation unit 32, configured to obtain a final weight between each two of the initial named entities with respect to all the initial named entities based on a plurality of temporary weights; and
and a named entity distinguishing unit 33, configured to obtain, based on the final weight, an external knowledge vector corresponding to each initial named entity and relative to all initial named entities, obtain, based on the external knowledge vector, a final real named entity, and identify a named entity category corresponding to the real named entity.
It can be understood that the named entity recognition system based on external knowledge according to the second embodiment of the present invention is particularly suitable for use in a system for recognizing named entities by combining an external knowledge database, and the system obtains real named entities by combining a plurality of initially recognized named entities with the external knowledge database to perform weight calculation, and introduces information from the external knowledge database to help recognition of the named entities, thereby effectively improving the accuracy of named entity recognition.
Referring to fig. 8, a third embodiment of the present invention provides an electronic device for implementing the external knowledge-based named entity recognition method. The electronic device includes a memory 10 and a processor 20. The memory 10 stores a computer program that, when run, is configured to execute the steps in any one of the above embodiments of the external knowledge-based named entity recognition method. The processor 20 is arranged to perform the steps of any one of the above embodiments of the external knowledge-based named entity recognition method by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one of a plurality of network devices of a computer network.
The electronic equipment is particularly suitable for equipment for conducting named entity recognition by combining an external knowledge database, real named entities are obtained by conducting weight calculation on a plurality of initially recognized named entities by combining the external knowledge database, information from the external knowledge database is introduced to help recognition of the named entities, and accuracy of named entity recognition can be effectively improved.
Compared with the prior art, the named entity identification method, the named entity identification system and the electronic equipment based on the external knowledge have the following advantages that:
1. the method comprises the steps of preliminarily identifying a plurality of initial named entities consisting of a plurality of words by obtaining at least one text, obtaining an initial relation vector between every two initial named entities based on the initial named entities and an external knowledge database, obtaining real named entities based on the initial relation vector and the initial named entities, helping the named entities to be identified by introducing information from the external knowledge database, enabling the initial named entities to be further judged after preliminary identification by combining external knowledge, further screening and obtaining the real named entities based on the external knowledge information in the initial named entities, and improving the accuracy of named entity identification.
2. The method comprises the steps of obtaining temporary weights between every two initial named entities through calculation according to initial relation vectors, obtaining final weights between every two initial named entities relative to all the initial named entities based on the temporary weights to integrate the temporary weights, and finally obtaining external knowledge vectors corresponding to the initial named entities through the final weights to judge final real named entities.
3. The real named entities are retained in sequence and the initial named entities that are not real named entities are deleted, so that all real named entities in the text are finally obtained. This reduces the miss rate and improves both the accuracy and the recall of named entity recognition.
4. The words in the text include Chinese characters and/or English words, so the named entity recognition method based on external knowledge provided by the invention can adapt to text in various languages and is more widely applicable.
5. When the prediction label of each word is predicted, the prediction threshold is lowered to increase the number of initial named entities identified in the same text. With a lower threshold, as many initial named entities as possible are identified as candidates for real named entities, which prevents the missed detections that a higher threshold would cause and improves recognition accuracy.
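The weight integration described in advantage 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the formulas of the invention: it assumes that a temporary weight is the dot product of a relation vector with a hypothetical parameter vector `scoring`, that the final weight is a softmax of the temporary weights over all other initial named entities, and that the external knowledge vector is the weighted sum of the relation vectors. The function name and signature are illustrative only.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def external_knowledge_vectors(
    relation: Dict[Tuple[int, int], Vector],  # (i, j) -> initial relation vector
    num_entities: int,
    scoring: Vector,  # hypothetical scoring parameters (an assumption)
) -> List[Vector]:
    """Return one external knowledge vector per initial named entity."""
    dim = len(scoring)
    result: List[Vector] = []
    for i in range(num_entities):
        others = [j for j in range(num_entities) if j != i]
        if not others:
            # A single candidate has no pairwise relations to aggregate.
            result.append([0.0] * dim)
            continue
        # Temporary weight: one raw score per pair (i, j).
        temp = [dot(relation[(i, j)], scoring) for j in others]
        # Final weight: normalise each score against all other candidates
        # (softmax, shifted by the maximum for numerical stability).
        peak = max(temp)
        exps = [math.exp(t - peak) for t in temp]
        total = sum(exps)
        final = [e / total for e in exps]
        # External knowledge vector: relation vectors weighted by final weights.
        vec = [0.0] * dim
        for weight, j in zip(final, others):
            for k in range(dim):
                vec[k] += weight * relation[(i, j)][k]
        result.append(vec)
    return result
```

Under these assumptions, the final weights for each initial named entity sum to one, so each external knowledge vector is a weighted average of that entity's relation vectors to every other candidate.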
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart.
When executed by a processor, the computer program performs the above-described functions defined in the method of the present application. It should be noted that the computer memory described herein may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer memory may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
More specific examples of computer memory may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as follows: a processor includes an initial recognition unit, an external knowledge embedding unit, and a named entity recognition unit. The initial recognition unit may also be described as "a unit that obtains at least one text containing at least one word arranged in sequence and identifies a plurality of initial named entities formed from a plurality of words in the text".
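The three-unit arrangement can be sketched in software as a small composition of callables. This is only an illustration under assumptions: the class name `NamedEntityRecognizer`, the field names, and the callable signatures are hypothetical and are not taken from the application. Each unit is left as a pluggable function so that the sketch does not presume any particular model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Relations = Dict[Tuple[str, str], Vector]


@dataclass
class NamedEntityRecognizer:
    # Initial recognition unit: text -> initial named entities (candidates).
    initial_recognition: Callable[[str], List[str]]
    # External knowledge embedding unit: candidates -> pairwise relation vectors.
    knowledge_embedding: Callable[[List[str]], Relations]
    # Named entity recognition unit: decides whether one candidate is real
    # and, if so, returns its type.
    entity_recognition: Callable[[str, List[str], Relations], Tuple[bool, str]]

    def run(self, text: str) -> List[Tuple[str, str]]:
        candidates = self.initial_recognition(text)
        relations = self.knowledge_embedding(candidates)
        results: List[Tuple[str, str]] = []
        for candidate in candidates:
            is_real, entity_type = self.entity_recognition(
                candidate, candidates, relations
            )
            if is_real:
                # Real named entities are kept in order; the rest are dropped.
                results.append((candidate, entity_type))
        return results
```

A toy usage, with stand-in functions that are purely illustrative:

```python
recognizer = NamedEntityRecognizer(
    initial_recognition=lambda text: text.split(),
    knowledge_embedding=lambda candidates: {},
    entity_recognition=lambda c, all_c, rel: (c[0].isupper(), "NAME"),
)
entities = recognizer.run("Alice met bob")
```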
As another aspect, the present application also provides a computer memory, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer memory carries one or more programs that, when executed by the apparatus, cause the apparatus to: obtain at least one text, wherein the text contains at least one word arranged in sequence, and identify a plurality of initial named entities formed from a plurality of words in the text; obtain at least one external knowledge database, input every two initial named entities into the external knowledge database, and obtain the initial relation vector corresponding to the two initial named entities; and obtain a final real named entity based on the initial relation vectors and the initial named entities, and identify the named entity type corresponding to the real named entity.
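The pairwise lookup in the second operation can be sketched as follows. This is a minimal sketch, assuming the external knowledge database can be modelled as an in-memory mapping from an ordered entity pair to a vector, and assuming that pairs absent from the database receive a zero vector. Both choices, along with the names `KnowledgeBase` and `initial_relation_vectors`, are assumptions made for illustration.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical stand-in for the external knowledge database.
KnowledgeBase = Dict[Tuple[str, str], Vector]


def initial_relation_vectors(
    entities: List[str], kb: KnowledgeBase, dim: int
) -> Dict[Tuple[str, str], Vector]:
    """Look up one relation vector for every ordered pair of candidates."""
    relations: Dict[Tuple[str, str], Vector] = {}
    for a, b in permutations(entities, 2):
        # Assumption: unknown pairs fall back to a zero vector of size dim.
        relations[(a, b)] = list(kb.get((a, b), [0.0] * dim))
    return relations
```

With three candidates this produces six ordered pairs. A symmetric database could halve the work by using unordered pairs instead.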
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent alterations and improvements made within the spirit of the present invention should be included in the scope of the present invention.

Claims (10)

1. A named entity recognition method based on external knowledge, characterized in that the method comprises the following steps:
step S1: obtaining at least one text, wherein the text contains at least one word arranged in sequence, and identifying a plurality of initial named entities formed from a plurality of words in the text;
step S2: acquiring at least one external knowledge database, inputting every two initial named entities into the external knowledge database, and acquiring the initial relation vector corresponding to the two initial named entities; and
step S3: obtaining a final real named entity based on the initial relation vectors and the initial named entities, and identifying the named entity type corresponding to the real named entity.
2. A method for named entity recognition based on external knowledge as in claim 1, characterized in that step S3 specifically includes the following steps:
step S31: obtaining a temporary weight between every two initial named entities based on the initial relation vector;
step S32: obtaining a final weight between every two initial named entities relative to all the initial named entities based on a plurality of temporary weights; and
step S33: obtaining an external knowledge vector corresponding to each initial named entity relative to all initial named entities based on the final weight, obtaining a final real named entity based on the external knowledge vector, and identifying the named entity type corresponding to the real named entity.
3. A method for named entity recognition based on external knowledge as in claim 2, characterized by: the step S33 specifically includes the following steps:
step S331: based on the final weight, obtaining an external knowledge vector corresponding to each initial named entity relative to all initial named entities;
step S332: judging whether the initial named entity is a real named entity or not based on the word vector corresponding to the initial named entity and the corresponding external knowledge vector, if so, entering a step S333, and if not, entering a step S334;
step S333: obtaining a final real named entity, and identifying a named entity type corresponding to the real named entity; and
step S334: the current initial named entity is deleted, and the next initial named entity is selected and then the process returns to the step S332.
4. A method for named entity recognition based on external knowledge as in claim 1, characterized by: the step S1 specifically includes the following steps:
step S11: obtaining at least one text, wherein the text contains at least one word arranged in sequence; and
step S12: sequentially predicting the prediction label of each word, and sequentially combining a plurality of corresponding words into a corresponding initial named entity according to the prediction labels.
5. A method for named entity recognition based on external knowledge as in claim 1, characterized in that, in step S1, the words include Chinese characters and/or English words.
6. A method for named entity recognition based on external knowledge as in claim 4, characterized in that, in step S12, the prediction threshold is lowered when predicting the prediction label of each word, so as to increase the number of identified initial named entities.
7. An external knowledge-based named entity recognition system, comprising:
an initial recognition unit, configured to acquire at least one text, wherein the text contains at least one word arranged in sequence, and to identify a plurality of initial named entities formed from a plurality of words in the text;
an external knowledge embedding unit, configured to acquire at least one external knowledge database, input every two initial named entities into the external knowledge database, and acquire the initial relation vector corresponding to the two initial named entities; and
a named entity recognition unit, configured to obtain a final real named entity based on the initial relation vectors and the initial named entities, and to identify the named entity type corresponding to the real named entity.
8. The system for named entity recognition based on external knowledge of claim 7, wherein the named entity recognition unit further comprises:
a temporary weight calculation unit, configured to obtain a temporary weight between every two initial named entities based on the initial relation vector;
a final weight calculation unit, configured to obtain a final weight between every two initial named entities relative to all of the initial named entities based on a plurality of temporary weights; and
a named entity distinguishing unit, configured to obtain an external knowledge vector corresponding to each initial named entity relative to all the initial named entities based on the final weight, obtain a final real named entity based on the external knowledge vector, and identify the named entity type corresponding to the real named entity.
9. The system for named entity recognition based on external knowledge of claim 7, wherein the initial recognition unit further comprises:
a text acquisition unit, configured to acquire at least one text, wherein the text contains at least one word arranged in sequence; and
a label prediction unit, configured to sequentially predict the prediction label of each word and to sequentially combine a plurality of corresponding words into a corresponding initial named entity according to the prediction labels.
10. An electronic device comprising a memory and a processor, characterized in that: the memory has stored therein a computer program arranged to execute, when run, the named entity recognition method based on external knowledge of any one of claims 1 to 6;
the processor is arranged to execute the named entity recognition method based on external knowledge of any one of claims 1 to 6 by means of the computer program.
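Steps S11 and S12 together with the lowered threshold of claim 6 can be illustrated with a short sketch. It assumes a BIO-style labelling scheme ("B" begins an entity, "I" continues it, "O" is outside) and a per-word confidence score, with words whose score falls below the threshold treated as outside. The labelling scheme, the score semantics, and the function name `merge_candidates` are assumptions made for illustration and are not specified by the claims.

```python
from typing import List


def merge_candidates(
    words: List[str],
    tags: List[str],       # assumed BIO-style prediction labels
    scores: List[float],   # assumed confidence of each predicted label
    threshold: float,
    sep: str = "",         # "" suits Chinese characters, " " suits English words
) -> List[str]:
    """Combine consecutive labelled words into initial named entities."""
    candidates: List[str] = []
    current: List[str] = []
    for word, tag, score in zip(words, tags, scores):
        inside = tag != "O" and score >= threshold
        if inside and tag.startswith("B"):
            # A new entity begins; close any entity still open.
            if current:
                candidates.append(sep.join(current))
            current = [word]
        elif inside:
            # Continuation label: extend the open entity, or start one.
            current.append(word)
        else:
            # Outside an entity, or not confident enough: close the entity.
            if current:
                candidates.append(sep.join(current))
            current = []
    if current:
        candidates.append(sep.join(current))
    return candidates
```

Under these assumptions, lowering `threshold` can only keep more labelled words, so the set of initial named entities passed on to the later steps tends to grow, at the cost of more false candidates that the external knowledge step must then screen out.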
CN201911034091.XA 2019-10-26 2019-10-26 Named entity identification method and system based on external knowledge and electronic equipment Active CN110795941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911034091.XA CN110795941B (en) 2019-10-26 2019-10-26 Named entity identification method and system based on external knowledge and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911034091.XA CN110795941B (en) 2019-10-26 2019-10-26 Named entity identification method and system based on external knowledge and electronic equipment

Publications (2)

Publication Number Publication Date
CN110795941A (en) 2020-02-14
CN110795941B (en) 2024-04-05

Family

ID=69441690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911034091.XA Active CN110795941B (en) 2019-10-26 2019-10-26 Named entity identification method and system based on external knowledge and electronic equipment

Country Status (1)

Country Link
CN (1) CN110795941B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268339A (en) * 2013-05-17 2013-08-28 中国科学院计算技术研究所 Recognition method and system of named entities in microblog messages
CN105138515A (en) * 2015-09-02 2015-12-09 百度在线网络技术(北京)有限公司 Named entity recognition method and device
CN108875051A (en) * 2018-06-28 2018-11-23 中译语通科技股份有限公司 Knowledge mapping method for auto constructing and system towards magnanimity non-structured text
CN110083838A (en) * 2019-04-29 2019-08-02 西安交通大学 Biomedical relation extraction method based on multilayer neural network Yu external knowledge library

Also Published As

Publication number Publication date
CN110795941B (en) 2024-04-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant