CN112749561B

CN112749561B - Entity identification method and equipment

Info

Publication number: CN112749561B
Application number: CN202010303388.8A
Authority: CN
Inventors: 黄婷
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-04-17
Filing date: 2020-04-17
Publication date: 2023-11-03
Anticipated expiration: 2040-04-17
Also published as: CN112749561A

Abstract

The embodiment of the invention provides an entity identification method and equipment; the method comprises the following steps: acquiring information to be processed and text information corresponding to the information to be processed, and obtaining text information to be identified; performing entity recognition on the text information to be recognized by adopting at least one basic entity recognition model to obtain at least one recognition result corresponding to the text information to be recognized; at least one basic entity recognition model refers to at least one means for entity recognition; extracting text features of text information to be identified by adopting a fusion entity identification model to obtain target text features, and carrying out feature construction on at least one identification result to obtain basic model features; combining the basic model features and the target text features to perform entity recognition to obtain a target recognition result; the fusion entity recognition model is used for carrying out entity recognition on the text information to be recognized by utilizing at least one recognition result. By the embodiment of the invention, the accuracy of entity identification can be improved.

Description

Entity identification method and equipment

Technical Field

The present invention relates to natural language processing technology in the field of artificial intelligence, and in particular, to a method and apparatus for entity recognition.

Background

Named entity recognition (Named Entity Recognition, NER) is the process of recognizing the entities in a text and the entity types to which they belong, and is a fundamental task in natural language processing. Named entity recognition can improve the execution efficiency of information processing and assist the information processing itself; it therefore plays an important role in information processing.

Generally, named entity recognition is performed by applying a plurality of named entity recognition methods to a text to obtain a plurality of corresponding recognition results, and then either selecting one recognition result from the plurality of recognition results or combining the plurality of recognition results to obtain a final recognition result. However, because the final recognition result is obtained by selecting from or combining the plurality of recognition results, it is strongly correlated with them; when the plurality of recognition results contains an erroneous recognition result, the accuracy of the final recognition result is low, and the accuracy of entity recognition is therefore low.

Disclosure of Invention

The embodiment of the invention provides an entity identification method and equipment, which can improve the accuracy of entity identification.

The technical scheme of the embodiment of the invention is realized as follows:

the embodiment of the invention provides an entity identification method, which comprises the following steps:

acquiring information to be processed and text information corresponding to the information to be processed, and obtaining text information to be identified;

performing entity recognition on the text information to be recognized by adopting at least one basic entity recognition model to obtain at least one recognition result corresponding to the text information to be recognized; the at least one basic entity recognition model refers to at least one mode for entity recognition;

extracting text features of the text information to be identified by adopting a fusion entity identification model to obtain target text features, and carrying out feature construction on at least one identification result to obtain basic model features; combining the basic model features and the target text features to perform entity recognition to obtain a target recognition result;

the fusion entity recognition model is used for carrying out entity recognition on the text information to be recognized by utilizing the at least one recognition result, the target text characteristics refer to characteristics about characters and character strings in the text information to be recognized, and the target recognition result is the entity recognition result of the information to be processed.
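The feature-combination step above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the per-character feature layout, the function names (`extract_text_features`, `build_base_model_features`, `combine_features`), the model names, and the tag strings are all hypothetical. The fusion entity recognition model itself is omitted; the sketch only shows how target text features and basic model features might be joined before entity recognition.

```python
from typing import Dict, List


def extract_text_features(text: str) -> List[Dict[str, str]]:
    """Per-character features about characters and character strings (assumed layout)."""
    features = []
    for i, ch in enumerate(text):
        features.append({
            "char": ch,
            # Character bigram starting here, as a simple character-string feature.
            "bigram": text[i:i + 2],
        })
    return features


def build_base_model_features(results: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Turn each base model's per-character tags into features keyed by model name."""
    length = len(next(iter(results.values())))
    return [
        {f"tag_from_{name}": tags[i] for name, tags in results.items()}
        for i in range(length)
    ]


def combine_features(text_feats: List[Dict[str, str]],
                     model_feats: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge both feature sets position by position."""
    return [{**t, **m} for t, m in zip(text_feats, model_feats)]


text = "Wang Xiaoming"
# Hypothetical recognition results from two base models; the second one is incomplete.
recognition_results = {
    "dictionary": ["B-PER"] + ["I-PER"] * 12,
    "network": ["B-PER"] + ["I-PER"] * 3 + ["O"] * 9,
}
features = combine_features(
    extract_text_features(text),
    build_base_model_features(recognition_results),
)
print(features[0])
```

Because every position carries both the raw text features and each base model's tag, a downstream model can learn when to trust or override an individual recognition result instead of merely choosing among them.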

The embodiment of the invention provides an entity identification device, which comprises:

the information acquisition module is used for acquiring information to be processed and text information corresponding to the information to be processed to obtain text information to be identified;

the basic recognition module is used for carrying out entity recognition on the text information to be recognized by adopting at least one basic entity recognition model to obtain at least one recognition result corresponding to the text information to be recognized; the at least one basic entity recognition model refers to at least one mode for entity recognition;

the fusion recognition module is used for extracting text features of the text information to be recognized by adopting a fusion entity recognition model to obtain target text features, and carrying out feature construction on the at least one recognition result to obtain basic model features; combining the basic model features and the target text features to perform entity recognition to obtain a target recognition result; the fusion entity recognition model is used for carrying out entity recognition on the text information to be recognized by utilizing the at least one recognition result, the target text characteristics refer to characteristics about characters and character strings in the text information to be recognized, and the target recognition result is the entity recognition result of the information to be processed.

The embodiment of the invention provides entity identification equipment, which comprises the following components:

a memory for storing executable instructions;

and the processor is used for realizing the entity identification method provided by the embodiment of the invention when executing the executable instructions stored in the memory.

The embodiment of the invention provides a computer readable storage medium which stores executable instructions for realizing the entity identification method provided by the embodiment of the invention when being executed by a processor.

The embodiment of the invention has the following beneficial effects: the obtained target recognition result is obtained by combining at least one recognition result and text features corresponding to the text information to be recognized to perform feature construction; therefore, on one hand, even if at least one recognition result has an error recognition result, text features corresponding to the text information to be recognized can correct the error recognition result; on the other hand, the text features corresponding to the text information to be identified are combined with at least one identification result to perform feature construction to obtain the features to be identified for entity identification, so that the features to be identified are rich, and therefore, when the features to be identified are subjected to entity identification, the accuracy of the obtained target identification result is high; in summary, the entity identification method provided by the embodiment of the invention improves the accuracy of entity identification.

Drawings

FIG. 1 is a schematic diagram of an exemplary process for entity identification;

FIG. 2 is a schematic diagram of an alternative architecture of an entity identification system provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of a structure of the server in FIG. 2 according to an embodiment of the present invention;

FIG. 4 is a schematic flow chart of an alternative entity identification method according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a relationship of entity recognition models provided by an embodiment of the present invention;

FIG. 6 is a schematic flow chart of an alternative method for extracting target text features according to an embodiment of the present invention;

FIG. 7 is a schematic flow chart of an alternative method for obtaining basic model features according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of an exemplary structure of input features of a fusion entity recognition model provided by an embodiment of the present invention;

FIG. 9 is a schematic diagram of an exemplary format of a feature to be identified provided by an embodiment of the present invention;

FIG. 10 is a schematic flow chart of another alternative entity identification method according to an embodiment of the present invention;

FIG. 11 is a schematic flow chart of an alternative entity identification method according to an embodiment of the present invention;

FIG. 12 is a schematic diagram of another alternative architecture of an entity identification system provided by an embodiment of the present invention;

FIG. 13 is a blockchain architecture diagram of a blockchain network according to an embodiment of the present invention;

FIG. 14 is a functional architecture diagram of a blockchain network provided by an embodiment of the present invention;

FIG. 15 is a schematic illustration of an exemplary entity identification application provided by an embodiment of the present invention;

FIG. 16 is a flow chart illustrating an exemplary entity identification provided by an embodiment of the present invention.

Detailed Description

The present invention will be further described in detail with reference to the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present invention more apparent, and the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by those skilled in the art without making any inventive effort are within the scope of the present invention.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the embodiments of the invention is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.

Before describing embodiments of the present invention in further detail, the terms and terminology involved in the embodiments of the present invention are described; the following explanations apply to these terms and terminology.

1) Entity type refers to the type to which an entity belongs, such as place name, person name, organization name, movie and TV play, novel, animation, game, event, song, and App (Application) name. An entity, also known as a named entity, is an instance of a concept; for example, "person name" is a concept (or entity type) and "Wang Xiaoming" is a "person name" entity, while "time" is an entity type and "mid-autumn festival" is a "time" entity. Because entities are also known as named entities, entity identification is the same as named entity identification.

2) Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence that studies the theories and methods enabling effective communication between humans and computers in natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics; research in this field therefore involves natural language, namely the language people use in daily life, and is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robotic question answering, knowledge graph techniques, and the like.

3) Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use the knowledge to obtain optimal results.

4) Machine learning (Machine Learning, ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and how it reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various areas of artificial intelligence. Machine learning typically includes techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.

5) An artificial neural network is a mathematical model that mimics the structure and function of a biological neural network; exemplary structures herein include deep neural networks (Deep Neural Networks, DNN), convolutional neural networks (Convolutional Neural Network, CNN), and recurrent neural networks (Recurrent Neural Network, RNN), among others.

6) A loss function, also known as a cost function, is a function that maps the value of a random event or its related random variable to a non-negative real number to represent the "risk" or "loss" of the random event.

7) Blockchain (Blockchain) is a storage structure of encrypted, chained transactions formed by blocks (Block).

8) A blockchain network (Blockchain Network) is the set of nodes that incorporate new blocks into the blockchain by way of consensus.

9) Ledger (Ledger) is a generic term for the blockchain (also known as ledger data) and the state database synchronized with the blockchain. The blockchain records transactions in the form of files in a file system; the state database records the transactions in the blockchain as different types of key (Key) value (Value) pairs, to support quick queries of the transaction data in the blockchain.

10) Smart contracts (Smart Contracts), also known as chaincode (Chaincode) or application code, are programs deployed in the nodes of a blockchain network; a node executes the smart contract invoked in a received transaction to update or query the key-value data of the state database.

11) Consensus (Consensus) is a process in a blockchain network by which the nodes involved agree on the transactions in a block; the agreed block is appended to the tail of the blockchain and used to update the state database.
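As a rough illustration of terms 7) to 11), the following sketch models a ledger as a hash-chained list of blocks plus a key-value state database. It is a simplified single-process toy, not a blockchain network: consensus and smart contracts are omitted, and the class name `Ledger`, its methods, and the transaction layout (`key`, `value`) are assumptions introduced only for this example.

```python
import hashlib
import json
from typing import Dict, List


def _digest(prev_hash: str, transactions: List[dict]) -> str:
    """Hash a block body deterministically."""
    body = {"prev_hash": prev_hash, "transactions": transactions}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class Ledger:
    """A chain of blocks plus a state database kept in sync with it."""

    def __init__(self) -> None:
        self.blocks: List[dict] = []
        self.state: Dict[str, str] = {}

    def append_block(self, transactions: List[dict]) -> dict:
        """Append a block to the tail of the chain and update the state database."""
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        block = {
            "prev_hash": prev_hash,
            "transactions": transactions,
            "hash": _digest(prev_hash, transactions),
        }
        self.blocks.append(block)
        # The state database stores key-value pairs for quick queries.
        for tx in transactions:
            self.state[tx["key"]] = tx["value"]
        return block

    def verify(self) -> bool:
        """Check that every block links to its predecessor and is unmodified."""
        prev_hash = "0" * 64
        for block in self.blocks:
            if block["prev_hash"] != prev_hash:
                return False
            if block["hash"] != _digest(block["prev_hash"], block["transactions"]):
                return False
            prev_hash = block["hash"]
        return True


ledger = Ledger()
ledger.append_block([{"key": "text-1", "value": "Wang Xiaoming: person name"}])
ledger.append_block([{"key": "text-2", "value": "mid-autumn festival: time"}])
print(ledger.verify(), ledger.state["text-1"])
```

Changing any recorded transaction afterwards makes `verify()` return `False`, which is the property the chained hashes are meant to provide.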

It should be noted that artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of sensing, reasoning, and decision-making.

In addition, artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operating/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

With the research and advancement of artificial intelligence technology, artificial intelligence has been studied and applied in a variety of fields, for example, smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and smart customer service. With the development of technology, artificial intelligence will be applied in more fields and will be of increasing value; for example, artificial intelligence may also be applied in the field of text processing. The application of artificial intelligence in the text processing field according to the embodiments of the present invention is described below.

Generally, entity recognition is performed by applying a plurality of entity recognition methods to a text to obtain a plurality of corresponding recognition results, and then either selecting one recognition result from the plurality of recognition results, for example, with a "learn to rank" model, or combining the plurality of recognition results through predefined rules to obtain a final recognition result. However, because the final recognition result is obtained by selecting from or combining the plurality of recognition results, it is strongly correlated with them; when the plurality of recognition results contains an erroneous recognition result, the accuracy of the final recognition result is low, and the accuracy of entity recognition is therefore low. In addition, the predefined rules are set case by case, so the implementation contains a large amount of hard-coded logic and is not easy to maintain.
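The weakness described above can be seen in a small sketch of rule-based merging. The rule used here (per position, take the first non-"O" tag in a fixed priority order) is a hypothetical example of a predefined rule, and the function name `merge_by_rule`, the model names, and the tags are assumptions for illustration only.

```python
from typing import Dict, List


def merge_by_rule(results: Dict[str, List[str]], priority: List[str]) -> List[str]:
    """Hard-coded rule: per position, take the first non-"O" tag in priority order."""
    length = len(next(iter(results.values())))
    merged = []
    for i in range(length):
        tag = "O"
        for name in priority:
            if results[name][i] != "O":
                tag = results[name][i]
                break
        merged.append(tag)
    return merged


results = {
    "dictionary": ["S-LOC", "O", "O"],  # suppose the first tag is a wrong match
    "network": ["O", "O", "S-PER"],
}
print(merge_by_rule(results, ["dictionary", "network"]))
# prints ['S-LOC', 'O', 'S-PER']
```

The merged output can only contain tags that some input already proposed, so the assumed wrong "S-LOC" tag passes straight through to the final result.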

The plurality of entity recognition methods employed for entity recognition of a text include dictionary-based entity recognition methods and network-model-based entity recognition methods. In a dictionary-based entity recognition method, the entities of various entity types are collected to form a prefix tree, and the text is then matched against the prefix tree to obtain the entity recognition result of the text; however, the matching does not consider the context of each word in the text, which may cause recall problems.
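A minimal sketch of dictionary-based recognition with a prefix tree follows. The nested-dict tree, the `"$type"` end marker, the longest-match strategy, and the function names are assumptions chosen for brevity; the dictionary entries reuse the examples from the term definitions above.

```python
from typing import Dict, List, Tuple


def build_prefix_tree(entities: Dict[str, str]) -> dict:
    """Build a nested-dict prefix tree; the "$type" key marks the end of an entity."""
    root: dict = {}
    for name, entity_type in entities.items():
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node["$type"] = entity_type
    return root


def match_entities(text: str, tree: dict) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) for the longest dictionary match at each position."""
    matches = []
    i = 0
    while i < len(text):
        node, best = tree, None
        for j in range(i, len(text)):
            node = node.get(text[j])
            if node is None:
                break
            if "$type" in node:
                best = (i, j + 1, node["$type"])
        if best:
            matches.append(best)
            i = best[1]  # continue after the matched entity
        else:
            i += 1
    return matches


tree = build_prefix_tree({
    "Wang Xiaoming": "person name",
    "mid-autumn festival": "time",
})
text = "Wang Xiaoming travels during the mid-autumn festival"
for start, end, entity_type in match_entities(text, tree):
    print(text[start:end], "->", entity_type)
```

The matcher looks only at the characters of the candidate span, so it returns the same type for a string wherever it appears; this is the lack of context information noted above.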

For entity recognition methods based on network models, the entity recognition result of the text is determined by the network model. Generally, a network model for entity recognition includes an input layer, a context encoding layer, and a tag decoding layer. The input layer is used to vectorize the input text and is typically implemented with "Pre-trained word embedding", "Character-level embedding", "POS tag", or "Gazetteer" features, etc.; the context encoding layer is used to semantically encode the vector representation output by the input layer and is usually implemented with a CNN, an RNN, a language model, or a Transformer, etc.; the tag decoding layer is used to decode the semantic encoding result to obtain the entity recognition result and is generally implemented with "Softmax", "CRF", "RNN", or "Pointer network", etc.
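The three-layer structure can be sketched structurally as below. This is a toy with untrained random weights, so the predicted tags are meaningless; it only shows how data flows through the layers. The neighbour-averaging encoder stands in for a CNN/RNN/Transformer, the random lookup table stands in for pre-trained embeddings, and all names, the tag set, and the dimension are assumptions.

```python
import math
import random
from typing import Dict, List

TAGS = ["O", "B-PER", "I-PER", "E-PER", "S-LOC"]
DIM = 4
rng = random.Random(0)
EMBEDDINGS: Dict[str, List[float]] = {}
# One weight row per tag for the decoding layer (random, untrained).
WEIGHTS = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in TAGS]


def input_layer(tokens: List[str]) -> List[List[float]]:
    """Map each token to a vector (stand-in for an embedding lookup)."""
    vectors = []
    for tok in tokens:
        if tok not in EMBEDDINGS:
            EMBEDDINGS[tok] = [rng.uniform(-1, 1) for _ in range(DIM)]
        vectors.append(EMBEDDINGS[tok])
    return vectors


def context_encoding_layer(vectors: List[List[float]]) -> List[List[float]]:
    """Average each vector with its neighbours so it reflects local context."""
    encoded = []
    for i in range(len(vectors)):
        window = vectors[max(0, i - 1): i + 2]
        encoded.append([sum(col) / len(window) for col in zip(*window)])
    return encoded


def tag_decoding_layer(encoded: List[List[float]]) -> List[str]:
    """Score every tag per token and pick the most probable one via softmax."""
    tags = []
    for vec in encoded:
        scores = [sum(w * x for w, x in zip(row, vec)) for row in WEIGHTS]
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        tags.append(TAGS[probs.index(max(probs))])
    return tags


tokens = ["was", "born", "in"]
print(tag_decoding_layer(context_encoding_layer(input_layer(tokens))))
```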

Referring to fig. 1, fig. 1 is a schematic flow chart illustrating an exemplary entity identification process. As shown in FIG. 1, the input text 1-1 "M … J … J … was born in B … and N … Y …" is input into a network model 1-2 for entity recognition to obtain the entity recognition result 1-3 "M … { B-PER } J … { I-PER } J … { E-PER } was { O } born { O } in { O } B … { S-LOC } , { O } N … { B-LOC } Y … { E-LOC }". The network model 1-2 includes an input layer 1-21, a context encoding layer 1-22, and a tag decoding layer 1-23, so that entity recognition is performed on the input text 1-1 by the input layer 1-21, the context encoding layer 1-22, and the tag decoding layer 1-23.
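The tags in the recognition result of FIG. 1 follow the common B/I/E/S/O convention (begin, inside, end, single-token entity, outside). A minimal sketch of turning such a tag sequence into entities is given below; the function name `decode_bioes` and the whitespace joining of tokens are assumptions, and the elided tokens are kept exactly as they appear in the example.

```python
from typing import List, Tuple


def decode_bioes(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) pairs from B/I/E/S/O tags."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "S":
            entities.append((token, entity_type))
            current, current_type = [], None
        elif prefix == "B":
            current, current_type = [token], entity_type
        elif prefix in ("I", "E") and current and entity_type == current_type:
            current.append(token)
            if prefix == "E":
                entities.append((" ".join(current), current_type))
                current, current_type = [], None
        else:
            # "O" or an inconsistent tag closes any entity that is still open.
            current, current_type = [], None
    return entities


tokens = ["M …", "J …", "J …", "was", "born", "in", "B …", ",", "N …", "Y …"]
tags = ["B-PER", "I-PER", "E-PER", "O", "O", "O", "S-LOC", "O", "B-LOC", "E-LOC"]
print(decode_bioes(tokens, tags))
# prints [('M … J … J …', 'PER'), ('B …', 'LOC'), ('N … Y …', 'LOC')]
```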

In addition, errors in a recognition result fall into two categories: errors caused by the recognition method itself, for example, when the accuracy of the network model used for entity recognition is low; and errors caused by a change in the definition of an entity type. A change in the definition of an entity type arises in two cases. One case is a change of application scenario: for example, in some application scenarios the person-name type does not include movie and television characters, game characters, etc., while in other application scenarios it does, so that when different application scenarios use the same entity recognition method, an erroneous entity recognition result is obtained. The other case is a change between application stages within the same application scenario. For example, in the first stage, the entity types include a drama name, a book name, a game name, a person-name type, a game commentator, a UGC (User Generated Content) uploader, a character name, and the like, and the two corresponding entity recognition methods are a dictionary-based entity recognition method and an entity recognition method based on a multi-layer convolutional neural network model. In the second stage, the entity types are redefined, and the new entity types include a person-name type (covering the person-name type, game commentator, UGC uploader, and character name of the first stage, as well as the competition organizations of the first stage, such as "NBA", euro, and "WWE"), an IP (Intellectual Property) type, a place-name type, and an organization type; at this time, when entity recognition is performed with the two entity recognition methods of the first stage, an erroneous recognition result is obtained.

Based on the above, the embodiment of the invention provides an entity identification method and equipment, which can correct erroneous recognition results during entity recognition and can use richer features to be identified for entity recognition, thereby improving the accuracy of the entity recognition result. An exemplary application of the entity identification device provided by the embodiment of the present invention is described below. The entity identification device may be implemented as various types of user terminals, such as a smart phone, a tablet computer, or a notebook computer, and may also be implemented as a server. In the following, an exemplary application in which the entity identification device is implemented as a server is described.

It should be noted that the embodiments of the present invention may also be implemented in combination with a blockchain technology, where blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, encryption algorithms, and the like. The blockchain is essentially a decentralised database, which is a series of data blocks generated by cryptographic methods, each data block containing a batch of information of network transactions for verifying the validity (anti-counterfeiting) of the information and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer. For the entity identification method combined with the blockchain technology provided by the embodiment of the invention, refer to the following description specifically.

Referring to fig. 2, fig. 2 is an optional architecture diagram of an entity identification system according to an embodiment of the present invention; as shown in fig. 2, to support an entity identification application, the terminal 200 is connected to the server 400 through the network 300, and the network 300 may be a wide area network or a local area network, or a combination of both.

A terminal 200 for transmitting information to be processed to the server 400 through the network 300.

The server 400 is configured to obtain information to be processed and text information corresponding to the information to be processed from the terminal 200 through the network 300, so as to obtain text information to be identified; performing entity recognition on the text information to be recognized by adopting at least one basic entity recognition model to obtain at least one recognition result corresponding to the text information to be recognized; extracting text features of text information to be identified by adopting a fusion entity identification model to obtain target text features, and carrying out feature construction on at least one identification result to obtain basic model features; and combining the basic model features and the target text features to perform entity recognition to obtain a target recognition result.

Referring to fig. 3, fig. 3 is a schematic structural diagram of the server in fig. 2 according to an embodiment of the present invention; the server 400 shown in fig. 3 includes: at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430. The various components in server 400 are coupled together by bus system 440. It is understood that the bus system 440 is used to enable connected communication between these components. The bus system 440 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled in fig. 3 as bus system 440.

The processor 410 may be an integrated circuit chip having signal processing capabilities, such as a general purpose processor, a digital signal processor (DSP, Digital Signal Processor), or another programmable logic device, discrete gate or transistor logic, or discrete hardware components, where the general purpose processor may be a microprocessor or any conventional processor.

The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable presentation of the media content. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.

Memory 450 includes volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile memory may be a read only memory (ROM, Read Only Memory), and the volatile memory may be a random access memory (RAM, Random Access Memory). The memory 450 described in embodiments of the present invention is intended to comprise any suitable type of memory. Memory 450 optionally includes one or more storage devices physically remote from processor 410.

In some embodiments, memory 450 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.

an operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, e.g., a framework layer, a core library layer, and a driver layer;

a network communication module 452 for reaching other computing devices via one or more (wired or wireless) network interfaces 420, where exemplary network interfaces 420 include Bluetooth, Wireless Fidelity (Wi-Fi), universal serial bus (USB, Universal Serial Bus), and the like;

a display module 453 for enabling presentation of information (e.g., a user interface for operating peripheral devices and displaying content and information) via one or more output devices 431 (e.g., a display screen, speakers, etc.) associated with the user interface 430;

an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.

In some embodiments, the entity identifying apparatus provided in the embodiments of the present invention may be implemented in software, and fig. 3 shows the entity identifying apparatus 455 stored in the memory 450, which may be software in the form of a program, a plug-in, or the like, including the following software modules: the information acquisition module 4551, the base recognition module 4552, the fusion recognition module 4553, the application module 4554, and the blockchain module 4555, the functions of each of which will be described below.

In other embodiments, the entity recognition device provided in the embodiments of the present invention may be implemented in hardware. By way of example, the entity recognition device may be a processor in the form of a hardware decoding processor that is programmed to perform the entity recognition method provided in the embodiments of the present invention; for example, the processor in the form of a hardware decoding processor may employ one or more application specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field programmable gate arrays (FPGA, Field-Programmable Gate Array), or other electronic components.

In the following, an entity identification method provided by the embodiment of the present invention will be described in connection with exemplary applications and implementations of a server provided by the embodiment of the present invention.

Referring to fig. 4, fig. 4 is a schematic flowchart of an alternative entity identification method according to an embodiment of the present invention, and will be described with reference to the steps shown in fig. 4.

S101, acquiring information to be processed and text information corresponding to the information to be processed, and obtaining text information to be identified.

In the embodiment of the invention, when the entity recognition device performs entity recognition, it first obtains the object on which entity recognition is to be performed, that is, the information to be processed. The information to be processed may be in any format, such as text, video, audio or document; because the entity recognition device can perform entity recognition only on information in text format, it acquires the text information corresponding to the information to be processed, thereby obtaining the text information to be identified, on which entity recognition is then performed.

It should be noted that, after the entity recognition device obtains the information to be processed, information to be processed that is in text format is used directly as the text information to be recognized, while information to be processed in a non-text format such as audio or video is converted into text information to obtain the text information to be recognized.
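The handling of S101 can be illustrated with a minimal sketch. The function name `get_text_to_recognize`, the format labels and the `converters` mapping are hypothetical names introduced only for illustration; how audio or video is actually converted to text is not specified in this description.

```python
from typing import Callable, Dict, Union

Payload = Union[str, bytes]
# Hypothetical converter type, e.g. a speech-to-text routine for audio.
Converter = Callable[[Payload], str]


def get_text_to_recognize(payload: Payload, fmt: str,
                          converters: Dict[str, Converter]) -> str:
    """Return the text to be recognized for one piece of information."""
    if fmt == "text":
        # Text-format information is used directly.
        return payload if isinstance(payload, str) else payload.decode("utf-8")
    if fmt not in converters:
        raise ValueError(f"no converter registered for format {fmt!r}")
    # Non-text formats (audio, video, ...) are converted to text first.
    return converters[fmt](payload)
```

A caller would register one converter per non-text format and pass text through unchanged.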

S102, performing entity recognition on text information to be recognized by adopting at least one basic entity recognition model to obtain at least one recognition result corresponding to the text information to be recognized; at least one underlying entity recognition model refers to at least one means for performing entity recognition.

It should be noted that different entity recognition methods, that is, at least one basic entity recognition model, are preset in the entity recognition device, and are used for performing entity recognition by adopting at least one entity recognition mode; that is, at least one basic entity recognition model refers to at least one means for performing entity recognition, such as a dictionary-based entity recognition method, an entity recognition method based on various network models, and the like.

In the embodiment of the invention, after the entity recognition equipment obtains the text information to be recognized, each basic entity recognition model in at least one basic entity recognition model is utilized to perform entity recognition on the text information to be recognized to obtain a recognition result corresponding to each basic entity recognition model, so that when the entity recognition of the text information to be recognized by the at least one basic entity recognition model is completed, at least one entity recognition result corresponding to the at least one basic entity recognition model is also obtained; it is easy to know that at least one entity recognition result corresponds to at least one basic entity recognition model one by one.
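As a rough sketch of S102, each basic model can be treated as a function from text to a list of entities, so that running all models yields exactly one recognition result per model. The `Entity` tuple layout, the toy lexicon and both toy models are assumptions made for illustration, not the models of the embodiment.

```python
from typing import Callable, Dict, List, Tuple

# Assumed layout of one entity: (start index, end index exclusive, entity type).
Entity = Tuple[int, int, str]
BaseModel = Callable[[str], List[Entity]]


def run_base_models(text: str,
                    models: Dict[str, BaseModel]) -> Dict[str, List[Entity]]:
    """Run every basic model once; results map one-to-one to the models."""
    return {name: model(text) for name, model in models.items()}


def toy_dictionary_model(text: str) -> List[Entity]:
    # Hypothetical one-entry entity dictionary.
    lexicon = {"Paris": "address"}
    found = []
    for word, etype in lexicon.items():
        start = text.find(word)
        if start != -1:
            found.append((start, start + len(word), etype))
    return found


def toy_network_model(text: str) -> List[Entity]:
    # Placeholder standing in for a neural-network-based recognizer.
    return []


results = run_base_models(
    "I love Paris",
    {"dictionary": toy_dictionary_model, "network": toy_network_model},
)
```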

S103, extracting text features of text information to be identified by adopting a fusion entity identification model to obtain target text features, and carrying out feature construction on at least one identification result to obtain basic model features; combining the basic model features and the target text features to perform entity recognition to obtain a target recognition result; the fusion entity recognition model is used for carrying out entity recognition on the text information to be recognized by utilizing at least one recognition result, the target text characteristics refer to characteristics about characters and character strings in the text information to be recognized, and the target recognition result is the entity recognition result of the information to be processed.

In the embodiment of the invention, after at least one recognition result is obtained, the entity recognition device does not directly select one recognition result from the at least one recognition result as a final recognition result, does not combine the at least one recognition result into the final recognition result, but utilizes the at least one recognition result to construct a feature for carrying out entity recognition again, combines the extracted text feature of the text information to be recognized, namely the target text feature, together serves as the feature for carrying out entity recognition again, and carries out entity recognition again, wherein the obtained recognition result is the final recognition result, namely the target recognition result.

It should be noted that the fusion entity recognition model is used for performing entity recognition on the text information to be recognized by using the at least one recognition result, and may be, for example, the sequence labeling model "BI-LSTM-CRF". The target text features refer to features related to the characters and character strings in the text information to be recognized, such as vector features of the characters, features corresponding to the relation between the characters and the character strings, features corresponding to the character strings, and the like. The target recognition result is the entity recognition result of the information to be processed. In addition, the entity types targeted by the individual basic entity recognition models may be the same or different, and the entity types targeted by the at least one basic entity recognition model and those targeted by the fusion entity recognition model may be the same or different. When they are different, the entity types targeted by the fusion entity recognition model include the entity types targeted by the at least one basic entity recognition model; for example, the name type includes the name, the character and the movie name, which are different entity types among the entity types targeted by the at least one basic entity recognition model.

It should be further noted that, training data in the training process of the fusion entity identification model may be manually labeled, or may be labeled by means of a tool, which is not particularly limited in the embodiment of the present invention; and the corpus of the labels can be video search strings, news headlines and the like.

It can be understood that, since the obtained target recognition result is obtained by performing feature construction by combining at least one recognition result and text features corresponding to the text information to be recognized; therefore, on one hand, even if at least one recognition result has an error recognition result, text features corresponding to the text information to be recognized can correct the error recognition result; on the other hand, the text features corresponding to the text information to be identified are combined with at least one identification result to perform feature construction to obtain the features to be identified for entity identification, so that the features to be identified are rich, and therefore, when the features to be identified are subjected to entity identification, the accuracy of the obtained target identification result is high; in summary, the entity identification method provided by the embodiment of the invention improves the accuracy of entity identification.

Further, in an embodiment of the present invention, the at least one basic entity recognition model includes at least one of a dictionary recognition model, an artificial intelligence recognition model, and a new entity type recognition model. The dictionary recognition model refers to a manner of performing entity recognition based on historical entity types by using an entity dictionary, where a historical entity type is a historically defined entity type; that is, the entity types targeted by the dictionary recognition model are historically defined entity types, and the entity recognition method adopted is a dictionary-based entity recognition method. The artificial intelligence recognition model refers to a manner of performing entity recognition based on historical entity types by using a network model, where the network model is an artificial neural network model; that is, the entity types targeted by the artificial intelligence recognition model are historically defined entity types, and the entity recognition method adopted is a network-model-based entity recognition method, such as a CNN model. The new entity type recognition model refers to a manner of performing entity recognition based on new entity types, where a new entity type is a newly defined entity type; that is, the entity types targeted by the new entity type recognition model are newly defined entity types, and the entity recognition method adopted is a dictionary-based or network-model-based entity recognition method, such as a CNN model based on an attention mechanism. Here, the new entity types include the historical entity types.

Referring to fig. 5, fig. 5 is a schematic diagram of a relationship of an entity recognition model according to an embodiment of the present invention; as shown in FIG. 5, at least one basic entity recognition model 5-1 includes three sub-models, namely a dictionary recognition model 5-11, an artificial intelligence recognition model 5-12 and a new entity type recognition model 5-13, the outputs of which are inputs to the fusion entity recognition model 5-2.

Further, in the embodiment of the present invention, before S102, that is, before the entity recognition device adopts at least one basic entity recognition model to perform entity recognition on the text information to be processed to obtain at least one recognition result respectively, the entity recognition method further includes a training process of a new entity type recognition model.

Here, the entity recognition device labels training data (such as video search strings) by using the dictionary recognition model and the artificial intelligence recognition model, and selects the training data for which the two labeling results are the same to form the target training data for training the new entity type recognition model.
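The selection of target training data by agreement between two labelers can be sketched as follows. The per-character tag lists and the two toy labelers are hypothetical stand-ins for the dictionary recognition model and the artificial intelligence recognition model.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed labeler interface: one tag per character of the input text.
Labeler = Callable[[str], Sequence[str]]


def select_agreeing_samples(samples: Sequence[str], label_a: Labeler,
                            label_b: Labeler) -> List[Tuple[str, List[str]]]:
    """Keep only samples on which both labelers produce identical tags."""
    selected = []
    for text in samples:
        tags_a = list(label_a(text))
        tags_b = list(label_b(text))
        if tags_a == tags_b:
            selected.append((text, tags_a))
    return selected


def toy_dictionary_labeler(text: str) -> List[str]:
    # Toy rule: upper-case characters are tagged as person names.
    return ["PER" if ch.isupper() else "O" for ch in text]


def toy_network_labeler(text: str) -> List[str]:
    # Toy rule that only knows the characters A and B.
    return ["PER" if ch in "AB" else "O" for ch in text]


target_training_data = select_agreeing_samples(
    ["aAb", "aCb"], toy_dictionary_labeler, toy_network_labeler
)
```

Only the first sample survives, because the two toy labelers disagree on the character "C" in the second one.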

Further, referring to fig. 6, fig. 6 is a schematic flow chart of an alternative method for extracting target text features according to an embodiment of the present invention; as shown in fig. 6, in embodiment S103 of the present invention, the entity recognition device extracts text features of text information to be recognized, and obtains target text features, including S1031 and S1032, and the steps shown in fig. 6 are described below.

S1031, word segmentation processing is carried out on the text information to be identified, and a character string sequence is obtained.

It should be noted that, when the entity recognition device performs feature extraction on the text information to be recognized, the feature extraction is performed in units of characters; therefore, word segmentation needs to be performed on the text information to be recognized first, and feature extraction is then performed on the characters in each character string. Here, the result of word segmentation is a character string sequence, that is, a sequence formed by the character strings of the text information to be recognized in order; the sequence includes at least one character string, and each character string is a character string in the text information to be recognized. In addition, a character is the minimum unit composing the text; for example, a character may be a single Chinese character, or a single word in another language, and a character string may be a word including at least one character.

S1032, extracting character text features corresponding to the current target character, thereby realizing acquisition of character string text features corresponding to the current character string, and further realizing acquisition of target text features corresponding to the character string sequence; the current character string is any character string in the character string sequence, the current character is any character in the current character string, the character string text feature comprises at least one character text feature, the target text feature comprises at least one character string text feature, and the character text feature comprises at least one of semantic feature of the current character, position information of the current character in the current character string, semantic feature of the current character string and part-of-speech information of the current character string.

It should be noted that, the entity recognition device traverses each character string in the character string sequence, and in the traversing process, the current character string traversed is the current character string, which is easy to know, and is any character string in the character string sequence; for the current character string, the entity recognition equipment traverses each character in the current character string, and in the traversing process, the traversed current character is the current character, and the current character is easily known to be any character in the current character string.

In the embodiment of the invention, for the current character, the entity recognition equipment extracts text features from at least one of four dimensions (the character, the position of the character in the character string, the character string and the part of speech of the character string), wherein the extracted features are text features corresponding to the current character, namely character text features; the entity recognition equipment extracts text features according to the extraction process of the text features of the current character aiming at each character in the current character string, so that when the extraction of the text features of each character of the current character string is completed, the extraction of the text features of the current character string is completed, and the text features of the character string comprising at least one text feature are also obtained; it is easy to know that the number of character text features included in the character string text feature is the number of characters included in the current character string. The entity recognition equipment extracts text features according to the extraction process of the text features of the current character string aiming at each character string in the character string sequence, so that when the extraction of the text features of each character string in the character string sequence is completed, the extraction of the text features of the character string sequence is completed, and target text features comprising at least one character string text feature are obtained; it is easy to know that the number of character string text features included in the target text feature is the number of character strings included in the character string sequence.

It should be noted that the character text features include at least one of the semantic features of the current character, the position information of the current character in the current character string, the semantic features of the current character string and the part-of-speech information of the current character string. When the character text features include a feature constructed based on the dimension of the position of the character in the character string, the character text features include the position information of the current target character in the current character string. For example, this position information may be represented by IOBES (Intermediate, Other, Begin, End, Single) labels, where B represents the beginning, I represents the middle, E represents the end, S represents a single character, and O represents other and is used to mark irrelevant characters; the feature corresponding to the position information may be obtained by mapping the IOBES labels onto 5 labels respectively.
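A minimal sketch of the position dimension: given an already segmented character string sequence, each character receives a B, I, E or S label according to its position in its string, and the label is then mapped onto five slots. The one-hot encoding and the slot order in `TAGS` are assumptions; the description only states that the IOBES labels are mapped onto 5 labels.

```python
from typing import List

# Assumed slot order for the five labels.
TAGS = ["I", "O", "B", "E", "S"]


def position_tags(words: List[str]) -> List[str]:
    """Label every character by its position inside its character string."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty strings produced by a segmenter
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def one_hot(tag: str) -> List[int]:
    """Map one IOBES label onto a five-element indicator vector."""
    return [1 if tag == t else 0 for t in TAGS]


tags = position_tags(["dance", "a", "to"])
vectors = [one_hot(t) for t in tags]
```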

Further, referring to fig. 7, fig. 7 is a schematic flow chart of an alternative method for obtaining basic model features according to an embodiment of the present invention; as shown in fig. 7, in embodiment S103 of the present invention, the entity recognition device performs feature construction on at least one recognition result to obtain a basic model feature, including S1033-S1035, and the following description will be made with reference to the steps shown in fig. 7.

S1033, respectively carrying out characteristic construction of the character entity type and characteristic construction of the character position on the current recognition result to obtain the current character entity type characteristic and the current character position characteristic, thereby realizing obtaining at least one current character entity type characteristic and at least one current character position characteristic corresponding to the at least one recognition result; the current recognition result is any one of the at least one recognition result.

It should be noted that, the entity recognition device traverses each recognition result in at least one recognition result, and in the traversing process, the currently traversed recognition result is the current recognition result; it is easy to know that the current recognition result is any one of the at least one recognition result.

In the embodiment of the invention, the identification result comprises an entity and an entity type corresponding to the entity, wherein the entity consists of at least one character; therefore, the entity recognition device can perform feature construction on the entity type corresponding to the character in the current recognition result, and when the feature construction of the entity type corresponding to the character in the current recognition result is completed, the obtained construction feature is the current character entity type feature corresponding to the current recognition result. The recognition result comprises the position information of the characters in the corresponding character strings; therefore, the entity recognition device can perform feature construction on the position information of the character in the current recognition result in the corresponding character string, and when the feature construction of the position information of the character in the current recognition result in the corresponding character string is completed, the obtained construction feature is the current character position feature corresponding to the current recognition result. Thus, when the feature construction of the at least one recognition result is completed, the corresponding at least one current character entity type feature and at least one current character position feature can be obtained. At least one current character entity type feature and at least one current character position feature are in one-to-one correspondence with at least one recognition result.

It should be noted that the current character entity type features refer to the set formed by the features corresponding to the entity type to which each character belongs in the current recognition result; the current character position features refer to the set of features corresponding to the position of each character in its corresponding word in the current recognition result. In addition, the entity types include not only the types to which the various entities belong, but also types that distinguish entity words from non-entity words; a non-entity word is, for example, a single character, another character, an irrelevant character, or the like.
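The feature construction of S1033 can be sketched by expanding one recognition result into two per-character sequences: the entity type of each character and its position inside the entity. The `(start, end, type)` span layout and the use of "O" for characters outside any entity are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed layout of one entity: (start index, end index exclusive, entity type).
Entity = Tuple[int, int, str]


def char_type_and_position(text_len: int, entities: List[Entity]
                           ) -> Tuple[List[str], List[str]]:
    """Expand entity spans into per-character type and position labels."""
    types = ["O"] * text_len       # "O": character belongs to no entity
    positions = ["O"] * text_len
    for start, end, etype in entities:
        for i in range(start, end):
            types[i] = etype
            if end - start == 1:
                positions[i] = "S"     # single-character entity
            elif i == start:
                positions[i] = "B"     # first character of the entity
            elif i == end - 1:
                positions[i] = "E"     # last character of the entity
            else:
                positions[i] = "I"     # middle character of the entity
    return types, positions


types, positions = char_type_and_position(5, [(1, 4, "PER")])
```

Both sequences have one entry per character of the text, which is what allows them to be fused later at character granularity.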

S1034, counting the voting times of each character belonging to each entity type in the entity types in the text information to be identified from at least one identification result, and obtaining voting characteristics.

In the embodiment of the invention, the entity identification equipment counts the voting times of each character belonging to each entity type in the entity types in the text information to be identified according to at least one identification result and the entity type to which the entity belongs, and takes the voting times as voting characteristics.

It should be noted that, when the error in the at least one recognition result is caused by the change of the definition of the entity type, S1034 includes counting, from the at least one recognition result, the number of votes in which each character in the text information to be recognized belongs to each entity type in the new entity type, and obtaining the voting feature. Here, when the entity type corresponding to at least one recognition result is the historical entity type, statistics of voting times is performed according to the mapping relationship between the historical entity type and the new entity type.
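The voting statistics of S1034 can be sketched as a per-character count over the recognition results, with historical entity types first mapped to new entity types. The type names in `NEW_TYPES` and the example mapping `{"name": "person"}` are hypothetical; the actual mapping relationship is not given in this description.

```python
from typing import Dict, List

# Assumed new entity types, in the order used for the vote vector.
NEW_TYPES = ["IP", "person", "address", "organization"]


def vote_features(per_model_types: List[List[str]],
                  type_map: Dict[str, str]) -> List[List[int]]:
    """Count, per character, how many models voted for each new entity type."""
    n_chars = len(per_model_types[0]) if per_model_types else 0
    votes = [[0] * len(NEW_TYPES) for _ in range(n_chars)]
    for types in per_model_types:
        for i, etype in enumerate(types):
            # Historical types are translated; new types pass through unchanged.
            mapped = type_map.get(etype, etype)
            if mapped in NEW_TYPES:
                votes[i][NEW_TYPES.index(mapped)] += 1
    return votes


votes = vote_features(
    [["name", "O"], ["person", "person"], ["O", "address"]],
    {"name": "person"},  # hypothetical historical-to-new mapping
)
```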

S1035, fusing at least one current character entity type feature, at least one current character position feature and voting features to obtain basic model features.

In the embodiment of the invention, after the entity recognition equipment obtains at least one current character entity type feature, at least one current character position feature and voting feature, the granularity corresponding to the at least one current character entity type feature, the at least one current character position feature and the voting feature is a character; therefore, the entity recognition device can fuse at least one current character entity type feature, at least one current character position feature and voting feature by taking characters as units, and the fusion result is a basic model feature.

Further, in the embodiment S1033 of the present invention, the entity recognition device performs the feature construction of the character entity type and the feature construction of the character position on the current recognition result to obtain the current character entity type feature and the current character position feature, which includes S10331-S10334, and the following steps are respectively described.

S10331, obtaining the entity type corresponding to each character in the current recognition result, and obtaining the entity type information corresponding to the character.

In the embodiment of the invention, because the current recognition result comprises the entity and the entity type corresponding to the entity, and the entity is composed of at least one character, the entity recognition equipment can acquire the entity type corresponding to each character in the current recognition result, namely the entity type information corresponding to the character; it is easy to know that the character-corresponding entity type information corresponds to each character.

S10332, carrying out feature construction on the entity type information corresponding to the characters to obtain the entity type features corresponding to the characters, thereby obtaining the current character entity type features corresponding to the current recognition result; the current character entity type feature includes at least one character corresponding entity type feature.

In the embodiment of the invention, after the entity recognition equipment obtains the entity type information corresponding to the characters, the entity type information corresponding to the characters is subjected to characteristic construction, so that the entity type characteristics corresponding to the characters are obtained; and because the character corresponding entity type feature is corresponding to each character in the current recognition result, when the feature construction of all the characters in the current recognition result is completed, the feature construction of the current recognition result is completed, and the current character entity type feature comprising at least one character corresponding entity type feature is obtained; it is easy to know that the current character entity type feature corresponds to the current recognition result, and the number of the character corresponding entity type features included in the current character entity type feature is the number of characters in the text information to be recognized.

S10333, obtaining the position of each character in the current recognition result in the corresponding character string, and obtaining the character-corresponding character string position information.

In the embodiment of the invention, because the current recognition result includes the position information of each character in the character string to which it belongs, the entity recognition device can acquire the position of each character in the current recognition result in the corresponding character string, namely the character-corresponding character string position information; it is easy to know that the character-corresponding character string position information corresponds to each character.

S10334, carrying out feature construction on character corresponding character string position information to obtain character corresponding character string position features, thereby realizing acquisition of current character position features corresponding to current recognition results; the current character position feature includes at least one character corresponding to a character string position feature.

In the embodiment of the invention, after the entity recognition device obtains the character-corresponding character string position information, it performs feature construction on that information to obtain the character-corresponding character string position feature. Because the character-corresponding character string position feature corresponds to each character in the current recognition result, when the feature construction of all the characters in the current recognition result is completed, the feature construction of the current recognition result is completed, and the current character position feature including at least one character-corresponding character string position feature is obtained. It is easy to know that the current character position feature corresponds to the current recognition result, and the number of character-corresponding character string position features included in the current character position feature is the number of characters in the text information to be recognized.

Further, in an embodiment of the present invention, when the dictionary recognition model is included in the at least one basic entity recognition model, S1036 and S1037 are further included before S1035; that is, before the entity recognition device fuses the at least one current character entity type feature, the at least one current character position feature and the voting feature to obtain the basic model feature, the entity recognition method further includes S1036 and S1037, which are described below.

S1036, obtaining dictionary recognition results from at least one recognition result; the dictionary recognition result is a result of recognizing text information to be recognized by the dictionary recognition model.

In the embodiment of the invention, when the text information to be recognized is considered to be a complete entity, the accuracy of the recognition result corresponding to the dictionary recognition model is high; therefore, in order to enhance the confidence of the recognition result obtained by the dictionary recognition model to the text information to be recognized in this case, the entity recognition apparatus first obtains the result of recognition of the dictionary recognition model to the text information to be recognized from at least one recognition result, at which time the dictionary recognition result is obtained.

S1037, obtaining a length ratio of the entity length corresponding to each character in the dictionary recognition result to the text length, and obtaining at least one length ratio corresponding to the dictionary recognition result.

In the embodiment of the invention, the entity recognition device adds the feature of the ratio of the entity length to the text length, so as to strengthen the dictionary recognition result when the text information to be recognized is a complete entity. It is easy to know that the length ratio corresponding to a character that does not belong to any entity is 0, and the number of length ratios contained in the at least one length ratio is the number of characters in the text information to be identified.

It should be noted that, on each character of the same entity, the length ratio of the corresponding entity length to the text length is the same.
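The length ratio of S1037 can be sketched directly from these properties: every character of an entity receives the entity length divided by the text length, and every other character receives 0. The `(start, end, type)` span layout is an assumption made for illustration.

```python
from typing import List, Tuple

# Assumed layout of one entity: (start index, end index exclusive, entity type).
Entity = Tuple[int, int, str]


def length_ratios(text_len: int, entities: List[Entity]) -> List[float]:
    """Per-character ratio of entity length to text length (0 outside entities)."""
    ratios = [0.0] * text_len
    for start, end, _ in entities:
        ratio = (end - start) / text_len
        for i in range(start, end):
            ratios[i] = ratio  # identical on every character of the entity
    return ratios


partial = length_ratios(4, [(0, 2, "PER")])
complete = length_ratios(3, [(0, 3, "IP")])
```

When the whole text is one entity, every character carries the ratio 1.0, which is the case the feature is meant to strengthen.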

Accordingly, in the embodiment S1035 of the present invention, the entity recognition device fuses at least one current character entity type feature, at least one current character position feature and a voting feature to obtain a basic model feature, including: the entity recognition device fuses at least one length ratio, at least one current character entity type feature, at least one current character position feature and voting feature to obtain a basic model feature.

Referring to fig. 8, fig. 8 is a schematic structural diagram of the input features of the fusion entity recognition model according to an embodiment of the present invention. As shown in fig. 8, in the input features of the fusion entity recognition model, the feature 8-1 corresponding to each character includes two parts: the model features 8-21 corresponding to each character (the features corresponding to each character in the basic model features) and the text features 8-22 corresponding to each character (the features corresponding to each character in the target text features). The model features 8-21 corresponding to each character include the dictionary recognition features 8-31 (features constructed from the recognition result of the dictionary recognition model for each character), the network model recognition features 8-32 (features constructed from the recognition result of the artificial intelligence recognition model for each character), the new entity type network model recognition features 8-33 (features constructed from the recognition result of the new entity type recognition model for each character), and the recognition result voting features 8-34 (the features corresponding to each character in the voting features).
Here, the dictionary recognition features 8-31 corresponding to each character further include the entity type feature 8-311 corresponding to each character (current character entity type feature), the position feature 8-312 of the character in the entity word (current character position feature), and the ratio 8-313 of the entity length corresponding to each character to the text length (length ratio). The network model recognition features 8-32 corresponding to each character further include the entity type feature 8-321 corresponding to each character (current character entity type feature) and the position feature 8-322 of the character in the entity word (current character position feature). The new entity type network model recognition features 8-33 corresponding to each character further include the entity type feature 8-331 corresponding to each character (current character entity type feature) and the position feature 8-332 of the character in the entity word (current character position feature). In addition, when the new entity types include an IP type, a person name type, an address type and an organization type, the recognition result voting features 8-34 corresponding to each character include the number of votes 8-341 for the IP type, the number of votes 8-342 for the person name type, the number of votes 8-343 for the address type and the number of votes 8-344 for the organization type. The text features 8-22 corresponding to each character include a character vector 8-221 (semantic feature of the current character), a word vector 8-222 (semantic feature of the current character string), a part-of-speech vector 8-223 (part-of-speech information of the current character string), and a position feature 8-224 of the character in the word (position information of the current character in the current character string).

It should be noted that, because richer features yield more accurate entity recognition results, the fusion entity recognition model in the embodiment of the present invention, which adopts both the target text features and the basic model features, was compared with a comparison model that adopts only the target text features. Both models performed entity recognition of the IP type, the name type, the address type and the organization type and were evaluated using three indexes: precision (P for short), recall (R for short) and F1 score (F1 for short); the evaluation results are shown in Table 1:

TABLE 1

When the dictionary recognition model, the artificial intelligence recognition model, the new entity type recognition model and the fusion entity recognition model are each used to perform entity recognition of the IP type, the person name type, the address type and the organization type, and evaluation is carried out by using F1, the evaluation results are shown in table 2:

TABLE 2
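The three indexes used for the evaluation can be illustrated with a minimal sketch. It assumes that entities are compared as exact (start, end, entity type) spans; this matching criterion, the function name `precision_recall_f1` and the sample spans are illustrative assumptions and are not taken from the embodiment.

```python
def precision_recall_f1(predicted, gold):
    """Compute entity-level precision, recall and F1.

    `predicted` and `gold` are sets of (start, end, entity_type) tuples.
    Exact-span matching is an assumption made for illustration.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: one of the two predictions matches a gold entity.
gold = {(0, 3, "PER"), (4, 6, "ORG")}
predicted = {(0, 3, "PER"), (7, 9, "ADDR")}
p, r, f1 = precision_recall_f1(predicted, gold)
```

Here F1 is the harmonic mean of P and R, so a model has to do well on both to score highly.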

Further, in the embodiment of the present invention, S103, in which the entity recognition device performs entity recognition by combining the basic model features and the target text features to obtain the target recognition result, includes S1038-S10310; each step is described below.

S1038, splicing the basic model features and the target text features to obtain features to be identified.

In the embodiment of the invention, because the basic model features and the target text features are features taking characters as granularity, the entity identification equipment can fuse the basic model features and the target text features by taking the characters as units; here, fusion is achieved by stitching, and the stitched features are features to be identified.

Illustratively, based on the feature structure shown in fig. 8, the feature to be identified is represented in the "libsvm" data format, that is, "character 1st feature: feature value; 2nd feature: feature value; ……; 15th feature: feature value"; for the text to be identified "bean duckling. Jackson dance teaching!", the corresponding input format of the feature to be identified is shown in fig. 9; wherein serial number 1 is followed by each word (character) of the text to be identified, serial numbers 2 to 16 correspond to the 15 features in fig. 8, and sentences are separated from one another by an "N/n" delimiter.
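The character-level splicing of S1038 can be sketched as follows. This is a minimal illustration only: the helper names `splice_features` and `to_indexed_line`, the sample feature values and the exact "index:value" line layout are assumptions, since the actual layout is defined by fig. 9.

```python
def splice_features(base_model_features, text_features):
    """Concatenate the two per-character feature lists character by character."""
    if len(base_model_features) != len(text_features):
        raise ValueError("both feature sequences must cover the same characters")
    return [base + text for base, text in zip(base_model_features, text_features)]


def to_indexed_line(char, features):
    """Render one character as 'char 1:value 2:value ...' (hypothetical layout)."""
    pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    return f"{char} {pairs}"


# Hypothetical features for a two-character text "AB".
base = [["PER", "B", 0.5], ["PER", "E", 0.5]]  # basic model features
text = [["n", "B"], ["n", "E"]]                # target text features
spliced = splice_features(base, text)
lines = [to_indexed_line(c, f) for c, f in zip("AB", spliced)]
```

Because both feature groups are built at character granularity, the splice is a plain per-character concatenation and needs no alignment step.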

S1039, carrying out semantic coding on the features to be identified to obtain a semantic coding result.

In the embodiment of the invention, the entity identification equipment performs semantic coding on the feature to be identified, so that a semantic coding result is obtained, and the semantic coding result is used for determining the entity identification result of the information to be processed.

S10310, decoding the semantic coding result to realize entity identification, and obtaining a target identification result.

In the embodiment of the invention, after the entity identification equipment obtains the semantic coding result, the entity identification equipment decodes the semantic coding result, so that the entity identification of the text information to be identified can be realized, and the decoding result is the target identification result.
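The decoding of S10310 can be illustrated with a small Viterbi decoder, which is the usual way of decoding with a "CRF" layer such as the one named in the application example of fig. 16. The sketch assumes that the semantic coding result is available as one score per tag for each character; the score values, the transition scores and the O/B/I tag set are hypothetical.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions[t][j]   : score of tag j at character t (the encoder output)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(tags)
    scores = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at position t.
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_i)
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
        scores = new_scores
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: scores[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[i] for i in path]


tags = ["O", "B", "I"]
emissions = [[0.1, 2.0, 0.0], [0.2, 0.0, 1.5], [1.0, 0.1, 0.3]]
# Hypothetical transition scores: "O" followed by "I" is strongly penalised.
transitions = [[0, 0, -10], [0, 0, 1], [0, 0, 1]]
decoded = viterbi_decode(emissions, transitions, tags)
```

The transition scores let the decoder prefer globally consistent tag sequences over tags chosen independently for each character.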

Further, referring to fig. 10, fig. 10 is another alternative flow chart of the entity identification method according to the embodiment of the present invention; as shown in fig. 10, in the embodiment of the present invention, after S103, the entity identification method further includes S104 and S105, and the following description is given with reference to the steps shown in fig. 10.

S104, extracting the entity and the entity type corresponding to the entity from the target identification result to obtain the entity information to be processed.

In the embodiment of the invention, after the entity identification device obtains the target identification result, the entity and the entity type corresponding to the entity are included in the target identification result; therefore, the entity recognition device can extract the entity and the entity type corresponding to the entity from the target recognition result, and the extracted entity and the entity type corresponding to the entity are used as the entity information to be processed.
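As a sketch of S104, the following assumes that the target identification result is a per-character tag sequence in a "B-type / I-type / O" scheme; that tagging scheme, the function name `extract_entities` and the sample text are assumptions for illustration, not a format stated in the embodiment.

```python
def extract_entities(chars, tags):
    """Collect (entity_text, entity_type) pairs from per-character tags."""
    entities, current, current_type = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity starts; close the previous one if it is still open.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(ch)
        else:
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities


chars = list("Tom@HQ")
tags = ["B-PER", "I-PER", "I-PER", "O", "B-ORG", "I-ORG"]
entity_info = extract_entities(chars, tags)
```

The resulting list of entities and entity types plays the role of the entity information to be processed that is passed on to S105.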

S105, in a preset resource library, carrying out information retrieval according to the entity information to be processed to obtain an information retrieval result; the information retrieval result is information associated with the information to be processed in a preset resource library.

In the embodiment of the invention, the information associated with the information to be processed is acquired from a preset resource library according to the obtained target identification result by carrying out entity identification on the information to be processed; therefore, after the entity identification equipment obtains the entity information to be processed from the target identification result, the entity information to be processed is used as a search keyword, and information search is carried out in a preset resource library, so that an information search result is obtained; it is easy to know that the information retrieval result is information associated with the information to be processed in the preset resource library.

Further, referring to fig. 11, fig. 11 is a schematic flow chart of an alternative entity identification method according to an embodiment of the present invention; as shown in fig. 11, in the embodiment of the present invention, after S103, the entity identification method further includes S106, and the following description is given with reference to the steps shown in fig. 11.

S106, sending the information to be processed and the target identification result to the blockchain network, so that the nodes of the blockchain network fill the information to be processed and the target identification result into a new block and, when consensus on the new block is reached, append the new block to the tail of the blockchain to complete the uplink.

It should be noted that the entity identification device uplinks the information to be processed and the target identification result, so as to ensure that they cannot be tampered with.

Referring to fig. 12, fig. 12 is another schematic diagram of an alternative architecture of an entity identification system according to an embodiment of the present invention, including a blockchain network 600 (consensus nodes 610-1 to 610-3 are shown by way of example), an authentication center 700, a business entity 800, and a business entity 900, which are described below respectively.

The type of the blockchain network 600 is flexible and may be, for example, any of a public chain, a private chain, or a consortium chain. Taking a public chain as an example, any electronic device of a business entity, such as a user terminal or a server, can access the blockchain network 600 without authorization; taking a consortium chain as an example, an electronic device (e.g., a terminal/server) under the jurisdiction of a business entity can access the blockchain network 600 after the business entity is authorized, and then becomes a client node in the blockchain network 600.

In some embodiments, the client node may act only as an observer of the blockchain network 600, i.e., provide functionality that supports the business entity in initiating transactions (e.g., for storing data on the chain or querying data on the chain), while the functions of the consensus nodes of the blockchain network 600, such as the ordering function, the consensus service, and the ledger function, may be implemented by the client node by default or selectively (e.g., depending on the specific business needs of the business entity). Thus, the data and business processing logic of the business entity can be migrated to the blockchain network 600 to the greatest extent, and the credibility and traceability of the data and of the business processing are achieved through the blockchain network 600.

Nodes in the blockchain network 600 receive transactions submitted by client nodes of different business entities (e.g., the client node 810, i.e., the entity identification device, which belongs to the business entity 800 shown in fig. 12) and execute the transactions to update or query the ledger; the various intermediate or final results of executing the transactions may be returned to and displayed in the client nodes of the business entities.

An exemplary application of the blockchain network is described below, taking as an example a plurality of business entities that access the blockchain network to implement entity identification.

With continued reference to fig. 12, the business entity 800 may be an entity identification system and the business entity 900 may be a retrieval system based on entity identification. Each registers with the authentication center 700 to obtain its own digital certificate, which includes the public key of the business entity and a digital signature issued by the authentication center 700 over the public key and the identity information of the business entity. The digital certificate is attached to a transaction together with the business entity's digital signature for the transaction and sent to the blockchain network, so that the blockchain network can extract the digital certificate and the signature from the transaction, verify the reliability of the message (i.e., that it has not been tampered with) and the identity information of the business entity that sent it, and verify according to that identity, for example, whether the business entity has the authority to initiate the transaction. A client run by an electronic device (e.g., a terminal or a server) under the control of a business entity may request access to the blockchain network 600 to become a client node.

The client node 810 of the business entity 800 is configured to obtain the information to be processed and the target identification result corresponding to the information to be processed, and to send the information to be processed and the target identification result to the blockchain network 600.

For the operation of sending the information to be processed and the target identification result to the blockchain network 600, business logic may be set in the client node 810 in advance, so that once the target identification result corresponding to the information to be processed has been obtained, the client node 810 automatically sends the information to be processed and the target identification result to the blockchain network 600; alternatively, a business person of the business entity 800 logs in to the client node 810, manually packages the information to be processed and the target identification result, and sends them to the blockchain network 600. When sending, the client node 810 generates a transaction corresponding to the update operation according to the information to be processed and the target identification result, and specifies in the transaction the smart contract that needs to be invoked to implement the update operation and the parameters passed to the smart contract; the transaction also carries the digital certificate of the client node 810 and a digital signature (e.g., obtained by encrypting a digest of the transaction using the private key in the digital certificate of the client node 810). The client node 810 then broadcasts the transaction to the consensus nodes in the blockchain network 600.

When a consensus node in the blockchain network 600 receives the transaction, it verifies the digital certificate and the digital signature carried by the transaction; after that verification succeeds, it confirms, according to the identity of the business entity 800 carried in the transaction, whether the business entity 800 has the transaction authority. Failure of either the digital signature verification or the authority verification causes the transaction to fail. After verification succeeds, the consensus node attaches its own digital signature (e.g., obtained by encrypting a digest of the transaction using the private key of the consensus node 610-1) and continues to broadcast the transaction in the blockchain network 600.

After receiving a successfully verified transaction, a consensus node in the blockchain network 600 fills the transaction into a new block and broadcasts the new block. When a consensus node in the blockchain network 600 receives the broadcast new block, it performs a consensus process on the new block; if consensus is reached, it appends the new block to the tail of the blockchain it stores, updates the state database according to the result of the transaction, and executes the transaction in the new block: for a submitted transaction that updates the information to be processed and the target identification result, a key-value pair comprising the information to be processed and the target identification result is added to the state database.

A business person of the business entity 900 logs in to the client node 910 (terminal 400) and inputs a query request for the entity identification result of the information to be processed. The client node 910 generates, from the query request, a transaction corresponding to the update operation/query operation, and specifies in the transaction the smart contract that needs to be invoked to implement the update operation/query operation and the parameters passed to the smart contract; the transaction also carries the digital certificate of the client node 910 and a digital signature (e.g., obtained by encrypting a digest of the transaction using the private key in the digital certificate of the client node 910). The client node 910 then broadcasts the transaction to the consensus nodes in the blockchain network 600. The query request is used for querying the information to be processed and the target identification result.

After the consensus nodes in the blockchain network 600 receive the transaction, verify it, fill it into a block and reach consensus, the filled new block is appended to the tail of the blockchain stored by each node, the state database is updated according to the result of the transaction, and the transaction in the new block is executed; for example, for a submitted transaction that queries the identification result of the information to be processed, the key-value pair corresponding to the information to be processed is queried from the state database, and the transaction result is returned.

It should be noted that fig. 12 exemplarily shows a process of directly uplinking the information to be processed and the target identification result; in other embodiments, when the data size of the information to be processed and the target identification result is large, the client node 810 may uplink only the hash of the pair formed by the information to be processed and the target identification result, and store the original information to be processed and the original target identification result in a distributed file system or database. After the client node 910 obtains the information to be processed and the target identification result from the distributed file system or database, it may verify them against the corresponding hash in the blockchain network 600, thereby reducing the workload of the uplink operation.
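The hash-on-chain, data-off-chain variant can be sketched as follows. The dictionaries `on_chain_hashes` and `off_chain_store` are in-memory stand-ins for the blockchain state and for the distributed file system or database; the use of SHA-256 over a JSON serialization of the pair is an assumption, since the embodiment does not name a hash function.

```python
import hashlib
import json

on_chain_hashes = {}   # stand-in for the chain: record id -> digest
off_chain_store = {}   # stand-in for the distributed file system / database


def digest(info, result):
    """Hash the (information to be processed, target identification result) pair."""
    payload = json.dumps({"info": info, "result": result},
                         sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def store(record_id, info, result):
    """Keep the full data off chain and only its digest on chain."""
    off_chain_store[record_id] = {"info": info, "result": result}
    on_chain_hashes[record_id] = digest(info, result)


def verify(record_id):
    """Recompute the digest of the off-chain data and compare with the chain."""
    record = off_chain_store[record_id]
    return digest(record["info"], record["result"]) == on_chain_hashes[record_id]


store("doc-1", "Tom joins HQ", [["Tom", "PER"], ["HQ", "ORG"]])
intact = verify("doc-1")
off_chain_store["doc-1"]["result"][0][1] = "ORG"  # simulate off-chain tampering
still_valid = verify("doc-1")
```

Only a fixed-length digest is written to the chain, while any change to the off-chain copy is still detectable.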

As an example of a blockchain, referring to fig. 13, fig. 13 is a schematic structural diagram of a blockchain in a blockchain network according to an embodiment of the present invention. The header of each block may include the hash value of all transactions in the block and also the hash value of all transactions in the previous block. The record of a newly generated transaction is filled into a block and, after consensus among the nodes in the blockchain network, the block is appended to the tail of the blockchain so that the chain grows; the hash-based chain structure between the blocks ensures that the transactions in the blocks are tamper-resistant and forgery-resistant.
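The chain structure of fig. 13 can be sketched with a toy model in which each block header stores a hash over its own transactions and the corresponding hash of the previous block. SHA-256, the dictionary layout and the helper names are illustrative assumptions; consensus is omitted entirely.

```python
import hashlib


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def transactions_hash(transactions):
    """Hash over all transactions of a block (a flat stand-in for a hash tree)."""
    return sha256_hex("".join(sha256_hex(tx) for tx in transactions))


def make_block(transactions, previous_block=None):
    return {
        "transactions": list(transactions),
        "tx_hash": transactions_hash(transactions),
        "prev_tx_hash": previous_block["tx_hash"] if previous_block else "",
    }


def chain_is_valid(chain):
    """Check every header against its transactions and against its predecessor."""
    for index, block in enumerate(chain):
        if block["tx_hash"] != transactions_hash(block["transactions"]):
            return False
        if index > 0 and block["prev_tx_hash"] != chain[index - 1]["tx_hash"]:
            return False
    return True


genesis = make_block(["store doc-1"])
second = make_block(["store doc-2", "query doc-1"], previous_block=genesis)
chain = [genesis, second]
valid_before = chain_is_valid(chain)
genesis["transactions"][0] = "store doc-X"  # tamper with an old transaction
valid_after = chain_is_valid(chain)
```

Changing a stored transaction invalidates the header of its block, and recomputing that header would in turn break the link held by the next block.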

Referring to fig. 14, fig. 14 is a schematic diagram of a functional architecture of a blockchain network according to an embodiment of the present invention, including an application layer 601, a consensus layer 602, a network layer 603, a data layer 604, and a resource layer 605, which are described below.

The resource layer 605 encapsulates computing resources, storage resources, and communication resources that implement the various consensus nodes 610 in the blockchain network 600.

The data layer 604 encapsulates the various data structures that implement the ledger, including a blockchain implemented as files in a file system, a key-value state database, and proofs of existence (e.g., hash trees of the transactions in the blocks).

The network layer 603 encapsulates the functions of a Point-to-Point (P2P) network protocol, a data propagation mechanism and a data verification mechanism, an access authentication mechanism, and business entity identity management.

The P2P network protocol enables communication between nodes in the blockchain network 600; the data propagation mechanism ensures the propagation of transactions in the blockchain network 600; the data verification mechanism ensures the reliability of data transmission between nodes based on cryptographic methods (e.g., digital certificates, digital signatures, public/private key pairs); the access authentication mechanism authenticates the identity of a business entity joining the blockchain network 600 according to the actual business scenario and, when the authentication passes, grants the business entity the authority to access the blockchain network 600; the business entity identity management stores the identities of the business entities that are allowed to access the blockchain network 600, as well as their permissions (e.g., the types of transactions they can initiate).

The consensus layer 602 encapsulates the mechanism by which the consensus nodes in the blockchain network 600 reach agreement on blocks (i.e., the consensus mechanism), transaction management, and ledger management. The consensus mechanism includes consensus algorithms such as PoS, PoW and DPoS, and supports pluggable consensus algorithms.

Transaction management is used to verify the digital signature carried in a transaction received by a node, verify the identity information of the business entity, and determine according to the identity information whether the business entity has the authority to conduct the transaction (the relevant information is read from the business entity identity management); every business entity authorized to access the blockchain network 600 holds a digital certificate issued by the authentication center, and signs the transactions it submits with the private key in its own digital certificate, thereby declaring its legal identity.

Ledger management is used to maintain the blockchain and the state database. A block on which consensus has been reached is appended to the tail of the blockchain, and the transactions in that block are executed: when a transaction includes an update operation, the key-value pairs in the state database are updated; when a transaction includes a query operation, the key-value pairs in the state database are queried and the query result is returned to the client node of the business entity. Query operations along multiple dimensions of the state database are supported, including: querying a block according to the block sequence number (e.g., a hash value of a transaction); querying a block according to the block hash value; querying a block according to the transaction serial number; querying a transaction according to the transaction serial number; querying the account data of a business entity according to the account (serial number) of the business entity; and querying the blockchain in a channel according to the channel name.

The application layer 601 encapsulates various services that can be implemented by the blockchain network, including tracing, certification and verification of transactions, etc.

In the following, an exemplary application of the embodiment of the present invention in a practical application scenario will be described.

Illustratively, referring to fig. 15, fig. 15 is a schematic illustration of an exemplary entity identification application provided by an embodiment of the present invention; as shown in fig. 15, in the index establishment stage of the search scenario, the document title 15-21 (text information to be identified) of the document 15-11 (information to be processed) is obtained, the entity identification method 15-3 provided by the embodiment of the present invention is used to perform entity identification on the document title 15-21 to obtain the target identification result 15-41, and an inverted index is then built according to the target identification result 15-41, so that the document 15-11 is stored in the index library 15-5 (preset resource library).

In the retrieval stage of the search scenario, the corresponding text information is extracted from the input information 15-12 (information to be processed) to obtain the input text 15-22 (text information to be identified); the entity identification method 15-3 provided by the embodiment of the present invention is used to perform entity identification on the input text 15-22 to obtain the target identification result 15-42, and documents are recalled from the index library 15-5 according to the target identification result 15-42 to obtain the recalled document 15-7 (information retrieval result). The recalled document 15-7 serves as a supplement to the documents recalled in other ways; the two are merged and sorted, and the retrieval result 15-8 is output so as to present the retrieval result.
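The two stages of the search scenario can be sketched with a minimal in-memory inverted index keyed by (entity, entity type). The function names, the document identifiers and the sample entities are hypothetical; a production index library would also need ranking and the merging with other recall paths.

```python
from collections import defaultdict


def build_inverted_index(doc_entities):
    """Index establishment stage: map each recognized entity to its documents."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            index[entity].add(doc_id)
    return index


def recall_documents(index, query_entities):
    """Retrieval stage: recall every document that shares an entity with the query."""
    recalled = set()
    for entity in query_entities:
        recalled |= index.get(entity, set())
    return sorted(recalled)


# Hypothetical recognition results for three document titles.
doc_entities = {
    "doc-1": [("Tom", "PER"), ("HQ", "ORG")],
    "doc-2": [("HQ", "ORG")],
    "doc-3": [("Main Street", "ADDR")],
}
index = build_inverted_index(doc_entities)
recalled = recall_documents(index, [("HQ", "ORG")])
```

Keying on the pair of entity and entity type means that a query entity only recalls documents in which the same string was recognized with the same type.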

Referring to fig. 16, in the process of performing entity identification on the document title 15-21 by using the entity identification method 15-3 provided by the embodiment of the present invention, first, at least one recognition result 16-11 of the document title 16-1 (the document title 15-21 in fig. 15) is obtained, and the feature construction module 16-2 constructs the following features from the at least one recognition result 16-11: at least one length ratio 16-21, at least one current character entity type feature 16-22, at least one current character position feature 16-23, and a voting feature 16-24; the feature construction module 16-2 is also used to extract the target text feature 16-25 of the document title 16-1. Next, the at least one length ratio 16-21, the at least one current character entity type feature 16-22, the at least one current character position feature 16-23, the voting feature 16-24, and the target text feature 16-25 are concatenated at character granularity to obtain the feature to be identified 16-26. Then, the feature to be identified 16-26 is input sequentially to the "Bi LSTM" model 16-3 (including a plurality of neurons 16-31) and the "CRF" layer 16-4, and the target identification result 16-5 (the target identification result 15-41 in fig. 15) is thereby obtained. Here, the feature construction module 16-2, the "Bi LSTM" model 16-3 and the "CRF" layer 16-4 together constitute the fusion entity identification model. In addition, the process of performing entity identification on the input text 15-22 by using the entity identification method 15-3 is similar to the process of performing entity identification on the document title 15-21, and is not repeated here.

The following continues with a description of an exemplary structure in which the entity recognition device 455 provided by embodiments of the present invention is implemented as software modules; in some embodiments, as shown in fig. 3, the software modules of the entity recognition device 455 stored in the memory 450 may include:

the information acquisition module 4551 is configured to acquire information to be processed and text information corresponding to the information to be processed, so as to obtain text information to be identified;

the basic recognition module 4552 is configured to perform entity recognition on the text information to be recognized by using at least one basic entity recognition model, so as to obtain at least one recognition result that corresponds to the text information to be recognized respectively; the at least one basic entity recognition model refers to at least one mode for entity recognition;

the fusion recognition module 4553 is configured to extract text features of the text information to be recognized by using a fusion entity recognition model to obtain target text features, and perform feature construction on the at least one recognition result to obtain basic model features; combining the basic model features and the target text features to perform entity recognition to obtain a target recognition result; the fusion entity recognition model is used for carrying out entity recognition on the text information to be recognized by utilizing the at least one recognition result, the target text characteristics refer to characteristics about characters and character strings in the text information to be recognized, and the target recognition result is the entity recognition result of the information to be processed.

Further, the at least one basic entity recognition model includes at least one of a dictionary recognition model, an artificial intelligence recognition model, and a new entity type recognition model; the dictionary recognition model refers to a mode of performing entity recognition based on historical entity types by using an entity dictionary, the artificial intelligence recognition model refers to a mode of performing entity recognition based on historical entity types by using a network model, the new entity type recognition model refers to a mode of performing entity recognition based on new entity types, and the new entity types include the historical entity types.

Further, the fusion recognition module 4553 is further configured to perform word segmentation on the text information to be recognized to obtain a string sequence; extracting character text features corresponding to the current character, thereby realizing the acquisition of character string text features corresponding to the current character string, and further realizing the acquisition of the target text features corresponding to the character string sequence; the current character string is any character string in the character string sequence, the current character is any character in the current character string, the character string text feature comprises at least one character text feature, the target text feature comprises at least one character string text feature, and the character text feature comprises at least one of semantic features of the current character, position information of the current character in the current character string, semantic features of the current character string and part-of-speech information of the current character string.
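One of the character text features, the position of the current character in the current character string, can be sketched as follows. The sketch assumes that word segmentation has already produced the character string sequence and encodes positions with a B/M/E/S scheme (begin, middle, end, single); this encoding is an assumption, since the embodiment does not fix one.

```python
def char_position_features(string_sequence):
    """Return (character, character string, position label) for every character."""
    features = []
    for string in string_sequence:
        for i, ch in enumerate(string):
            if len(string) == 1:
                position = "S"   # single-character string
            elif i == 0:
                position = "B"   # first character of the string
            elif i == len(string) - 1:
                position = "E"   # last character of the string
            else:
                position = "M"   # any character in between
            features.append((ch, string, position))
    return features


# Hypothetical segmentation result of a short text.
position_features = char_position_features(["ab", "c", "def"])
```

Each character also carries its enclosing character string, so string-level features such as a word vector or part-of-speech vector could be attached to it in the same pass.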

Further, the fusion recognition module 4553 is further configured to perform a feature construction of a character entity type and a feature construction of a character position on the current recognition result respectively, so as to obtain a current character entity type feature and a current character position feature, thereby obtaining at least one current character entity type feature and at least one current character position feature corresponding to the at least one recognition result; the current recognition result is any recognition result in the at least one recognition result; counting the voting times of each character belonging to each entity type in the text information to be identified from the at least one identification result to obtain voting characteristics; and fusing the at least one current character entity type feature, the at least one current character position feature and the voting feature to obtain the basic model feature.
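The voting feature described above can be sketched as a per-character count over the recognition results. The sketch assumes that every recognition result has been aligned to one entity type label (or `None`) per character; the label strings and the sample results are hypothetical.

```python
def voting_features(recognition_results, entity_types):
    """Count, for every character, how many results voted for each entity type.

    recognition_results: one list per basic model, holding the entity type
    assigned to each character (None when the character is not in an entity).
    """
    n_chars = len(recognition_results[0])
    votes = [[0] * len(entity_types) for _ in range(n_chars)]
    for result in recognition_results:
        for position, label in enumerate(result):
            if label in entity_types:
                votes[position][entity_types.index(label)] += 1
    return votes


entity_types = ["IP", "PER", "ADDR", "ORG"]
results = [
    ["PER", "PER", None],    # dictionary recognition model
    ["PER", "ORG", None],    # artificial intelligence recognition model
    ["PER", "PER", "ADDR"],  # new entity type recognition model
]
votes = voting_features(results, entity_types)
```

For the four entity types used in fig. 8 this yields four vote counts per character, matching features 8-341 to 8-344.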

Further, the fusion recognition module 4553 is further configured to obtain an entity type corresponding to each character in the current recognition result, so as to obtain character corresponding entity type information; performing feature construction on the entity type information corresponding to the characters to obtain entity type features corresponding to the characters, thereby obtaining the current character entity type features corresponding to the current recognition result; the current character entity type feature comprises at least one character corresponding entity type feature; acquiring the position of each character in the current recognition result in the corresponding character string to obtain character corresponding character string position information; performing feature construction on the character corresponding to character string position information to obtain character corresponding to character string position features, thereby obtaining the current character position features corresponding to the current recognition result; the current character position feature includes at least one character corresponding to a character string position feature.

Further, when the at least one basic entity recognition model includes a dictionary recognition model, the fusion recognition module 4553 is further configured to obtain a dictionary recognition result from the at least one recognition result; the dictionary recognition result is a result of recognizing the text information to be recognized by the dictionary recognition model; and obtaining a length ratio of the entity length corresponding to each character in the dictionary identification result to the text length to obtain at least one length ratio corresponding to the dictionary identification result.

Correspondingly, the fusion recognition module 4553 is further configured to fuse the at least one length ratio, the at least one current character entity type feature, the at least one current character position feature, and the voting feature to obtain the basic model feature.
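The length ratio obtained from the dictionary recognition result can be sketched as follows. The sketch assumes that dictionary matches are given as (start, end) character offsets with an exclusive end, and that characters outside every matched entity receive a ratio of 0.0; both conventions are assumptions for illustration.

```python
def length_ratio_features(text, dictionary_entities):
    """Per character: length of its dictionary entity divided by the text length."""
    ratios = [0.0] * len(text)
    for start, end in dictionary_entities:
        ratio = (end - start) / len(text)
        for i in range(start, end):
            ratios[i] = ratio
    return ratios


# Hypothetical 10-character text with one dictionary match on characters 0..3.
ratios = length_ratio_features("abcdefghij", [(0, 4)])
```

A match that covers a large share of the text yields a ratio close to 1, so the feature gives the fusion model a rough indication of how much of the text the dictionary entity accounts for.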

Further, the fusion recognition module 4553 is further configured to splice the basic model feature and the target text feature to obtain a feature to be recognized; carrying out semantic coding on the features to be identified to obtain a semantic coding result; and decoding the semantic coding result to realize entity identification, and obtaining the target identification result.

Further, the entity identification device 455 further includes an application module 4554 configured to extract an entity and an entity type corresponding to the entity from the target identification result, so as to obtain entity information to be processed; and, in a preset resource library, to perform information retrieval according to the entity information to be processed to obtain an information retrieval result; the information retrieval result is information associated with the information to be processed in the preset resource library.

Further, the entity identification device 455 further includes a blockchain module 4555 configured to send the information to be processed and the target identification result to a blockchain network, so that the nodes of the blockchain network fill the information to be processed and the target identification result into a new block and, when consensus on the new block is reached, append the new block to the tail of the blockchain to complete the uplink.

Embodiments of the present invention provide a computer-readable storage medium storing executable instructions that, when executed by a processor, cause the processor to perform an entity identification method provided by embodiments of the present invention, for example, the entity identification method shown in fig. 4.

In some embodiments, the computer-readable storage medium may be an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a flash memory, a magnetic surface memory, an optical disk, or a CD-ROM; it may also be any of various devices that include one of the above memories or any combination thereof.

In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.

As an example, the executable instructions may, but need not, correspond to files in a file system. They may be stored as part of a file that holds other programs or data (for example, in one or more scripts in a Hyper Text Markup Language (HTML) document), in a single file dedicated to the program in question, or in multiple coordinated files (for example, files that store one or more modules, subroutines, or portions of code).

As an example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices located at one site or, alternatively, distributed across multiple sites and interconnected by a communication network.

In summary, in the embodiments of the present invention, the target recognition result is obtained from features constructed by combining the at least one recognition result with the text features corresponding to the text information to be recognized. On the one hand, even if the at least one recognition result contains an erroneous recognition result, the text features corresponding to the text information to be recognized can correct it. On the other hand, combining the text features with the at least one recognition result yields rich features to be recognized, so that entity recognition performed on those features produces a target recognition result of high accuracy. The entity identification method provided by the embodiments of the present invention therefore improves the accuracy of entity identification.

The foregoing describes merely exemplary embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and scope of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method of entity identification, comprising:

the fusion entity recognition model is used for carrying out entity recognition on the text information to be recognized by utilizing the at least one recognition result, the target text characteristics refer to characteristics about characters and character strings in the text information to be recognized, and the target recognition result is the entity recognition result of the information to be processed;

the performing feature construction on the at least one recognition result to obtain basic model features comprises: performing feature construction of the character entity type and feature construction of the character position on a current recognition result, respectively, to obtain a current character entity type feature and a current character position feature, thereby obtaining at least one current character entity type feature and at least one current character position feature corresponding to the at least one recognition result, the current recognition result being any recognition result in the at least one recognition result; counting, from the at least one recognition result, the number of votes by which each character in the text information to be identified belongs to each entity type, to obtain a voting feature; and fusing the at least one current character entity type feature, the at least one current character position feature and the voting feature to obtain the basic model features;

the combining the basic model features and the target text features to perform entity recognition to obtain a target recognition result comprises: splicing the basic model features and the target text features to obtain features to be identified; performing semantic encoding on the features to be identified to obtain a semantic encoding result; and decoding the semantic encoding result to realize entity recognition and obtain the target recognition result.
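As a non-limiting illustration of the voting feature and the splicing step recited above, a minimal Python sketch follows. The per-character label lists, the `ENTITY_TYPES` inventory, and the names `voting_features` and `splice` are assumptions made for this sketch; the claim does not fix a label scheme or a data layout, and the semantic encoding and decoding steps are not shown.

```python
from typing import List

# Hypothetical entity type inventory; "O" marks a character outside any entity.
ENTITY_TYPES = ["O", "PERSON", "SONG"]

def voting_features(results: List[List[str]]) -> List[List[int]]:
    """Count, per character, how many base models voted for each entity type.

    results[m][i] is the entity type that base model m assigned to character i.
    Returns one count vector per character, ordered like ENTITY_TYPES.
    """
    if not results:
        return []
    length = len(results[0])
    if any(len(labels) != length for labels in results):
        raise ValueError("all recognition results must cover the same characters")
    votes = [[0] * len(ENTITY_TYPES) for _ in range(length)]
    for labels in results:
        for i, label in enumerate(labels):
            votes[i][ENTITY_TYPES.index(label)] += 1
    return votes

def splice(base_model_features: List[list], target_text_features: List[list]) -> List[list]:
    """Concatenate, per character, base model features with text features."""
    return [b + t for b, t in zip(base_model_features, target_text_features)]
```

In this sketch a character on which the base models disagree keeps the full vote distribution rather than a single majority label, which leaves the later encoding and decoding stage free to weigh the votes against the text features.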

2. The method of claim 1, wherein the at least one base entity recognition model comprises at least one of a dictionary recognition model, an artificial intelligence recognition model, and a new entity type recognition model;

the dictionary recognition model refers to a mode of performing entity recognition based on a historical entity type by using an entity dictionary, the artificial intelligence recognition model refers to a mode of performing entity recognition based on a historical entity type by using a network model, the new entity type recognition model refers to a mode of performing entity recognition based on a new entity type, and the new entity type comprises the historical entity type.

3. The method according to claim 1 or 2, wherein the extracting text features of the text information to be identified, to obtain target text features, comprises:

performing word segmentation on the text information to be identified to obtain a character string sequence;

extracting a character text feature corresponding to a current character, thereby obtaining a character string text feature corresponding to a current character string, and in turn obtaining the target text features corresponding to the character string sequence;

the current character string is any character string in the character string sequence, the current character is any character in the current character string, the character string text feature comprises at least one character text feature, the target text feature comprises at least one character string text feature, and the character text feature comprises at least one of semantic features of the current character, position information of the current character in the current character string, semantic features of the current character string and part-of-speech information of the current character string.
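As a non-limiting illustration of the character text features recited above, the following Python sketch expands an already segmented character string sequence into one feature record per character. The (string, part of speech) input pairs, the S/B/M/E position scheme, and the function names are assumptions made for this sketch; the semantic features are omitted and would in practice be supplied by an embedding lookup on the `char` and `string` fields.

```python
from typing import Dict, List, Tuple

def position_tag(index: int, length: int) -> str:
    """Position of a character in its string: S(ingle), B(egin), M(iddle), E(nd)."""
    if length == 1:
        return "S"
    if index == 0:
        return "B"
    return "E" if index == length - 1 else "M"

def character_text_features(segments: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Expand (string, part_of_speech) pairs into one feature record per character."""
    features = []
    for string, pos in segments:
        for i, ch in enumerate(string):
            features.append({
                "char": ch,                                  # key for a character embedding
                "position": position_tag(i, len(string)),    # position in the current string
                "string": string,                            # key for a string embedding
                "pos": pos,                                  # part of speech of the string
            })
    return features
```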

4. The method according to claim 1, wherein the performing feature construction of the character entity type and feature construction of the character position on the current recognition result to obtain the current character entity type feature and the current character position feature respectively includes:

acquiring an entity type corresponding to each character in the current recognition result to obtain character-corresponding entity type information;

performing feature construction on the character-corresponding entity type information to obtain a character-corresponding entity type feature, thereby obtaining the current character entity type feature corresponding to the current recognition result, the current character entity type feature comprising at least one character-corresponding entity type feature;

acquiring the position of each character in the current recognition result within its corresponding character string to obtain character-corresponding string position information;

performing feature construction on the character-corresponding string position information to obtain a character-corresponding string position feature, thereby obtaining the current character position feature corresponding to the current recognition result, the current character position feature comprising at least one character-corresponding string position feature.
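As a non-limiting illustration of the two feature constructions recited above, the following Python sketch turns one recognition result into per-character entity type features and position features. The span representation `(start, end_exclusive, entity_type)`, the one-hot encoding, the `ENTITY_TYPES` and `POSITIONS` inventories, and the reading of "corresponding character string" as the recognized entity span are all assumptions made for this sketch.

```python
from typing import List, Tuple

ENTITY_TYPES = ["O", "PERSON", "SONG"]  # hypothetical type inventory
POSITIONS = ["O", "B", "M", "E", "S"]   # hypothetical position inventory

def one_hot(value: str, inventory: List[str]) -> List[int]:
    """Encode a value as a one-hot vector over the given inventory."""
    return [1 if item == value else 0 for item in inventory]

def type_and_position_features(text: str, spans: List[Tuple[int, int, str]]):
    """Build per-character type and position features for one recognition result."""
    types = ["O"] * len(text)
    positions = ["O"] * len(text)
    for start, end, entity_type in spans:
        for i in range(start, end):
            types[i] = entity_type
            if end - start == 1:
                positions[i] = "S"   # single-character entity
            elif i == start:
                positions[i] = "B"   # first character of the entity
            elif i == end - 1:
                positions[i] = "E"   # last character of the entity
            else:
                positions[i] = "M"   # interior character
    type_features = [one_hot(t, ENTITY_TYPES) for t in types]
    position_features = [one_hot(p, POSITIONS) for p in positions]
    return type_features, position_features
```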

5. The method of claim 1, wherein when the at least one base entity recognition model comprises a dictionary recognition model, before the fusing the at least one current character entity type feature, the at least one current character position feature, and the voting feature to obtain the basic model feature, the method further comprises:

acquiring dictionary recognition results from the at least one recognition result; the dictionary recognition result is a result of recognizing the text information to be recognized by the dictionary recognition model;

acquiring a length ratio of an entity length corresponding to each character in the dictionary recognition result to a text length to obtain at least one length ratio corresponding to the dictionary recognition result;

correspondingly, the fusing the at least one current character entity type feature, the at least one current character position feature and the voting feature to obtain the basic model feature comprises the following steps:

and fusing the at least one length ratio, the at least one current character entity type characteristic, the at least one current character position characteristic and the voting characteristic to obtain the basic model characteristic.
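As a non-limiting illustration of the length ratio recited above, the following Python sketch assigns to each character the length of the dictionary-matched entity that covers it, divided by the text length. The span representation `(start, end_exclusive, entity_type)`, the value 0.0 for characters outside any entity, and the name `length_ratios` are assumptions made for this sketch.

```python
from typing import List, Tuple

def length_ratios(text_length: int, spans: List[Tuple[int, int, str]]) -> List[float]:
    """Per character: length of the covering dictionary entity / text length."""
    if text_length == 0:
        return []
    ratios = [0.0] * text_length  # characters outside any entity keep 0.0
    for start, end, _entity_type in spans:
        ratio = (end - start) / text_length
        for i in range(start, end):
            ratios[i] = ratio
    return ratios
```

Under these assumptions a dictionary hit that spans most of the text receives a ratio close to 1, while a short hit inside a long text receives a small ratio, which gives the fusion model a signal for how much weight the dictionary match deserves.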

6. The method according to claim 1 or 2, wherein after the combining the basic model feature and the target text feature for entity recognition, the method further comprises:

extracting an entity and an entity type corresponding to the entity from the target identification result to obtain entity information to be processed;

in a preset resource library, carrying out information retrieval according to the entity information to be processed to obtain an information retrieval result; and the information retrieval result is information associated with the information to be processed in the preset resource library.

7. The method according to claim 1 or 2, wherein after the combining the basic model feature and the target text feature for entity recognition, the method further comprises:

transmitting the information to be processed and the target identification result to a blockchain network, so that a node of the blockchain network fills the information to be processed and the target identification result into a new block and, when consensus is reached on the new block, appends the new block to the tail of the blockchain to complete the on-chain recording.

8. An entity identification device, comprising:

a memory for storing executable instructions;

a processor for implementing the method of any one of claims 1 to 7 when executing executable instructions stored in said memory.

9. An entity identification device, comprising:

the fusion recognition module is used for extracting text features of the text information to be recognized by adopting a fusion entity recognition model to obtain target text features, and carrying out feature construction on the at least one recognition result to obtain basic model features; combining the basic model features and the target text features to perform entity recognition to obtain a target recognition result; the fusion entity recognition model is used for carrying out entity recognition on the text information to be recognized by utilizing the at least one recognition result, the target text features refer to features about characters and character strings in the text information to be recognized, and the target recognition result is the entity recognition result of the information to be processed;

the fusion recognition module is further used for respectively carrying out characteristic construction of the character entity type and characteristic construction of the character position on the current recognition result to obtain the current character entity type characteristic and the current character position characteristic, so that at least one current character entity type characteristic and at least one current character position characteristic corresponding to the at least one recognition result are obtained; the current recognition result is any recognition result in the at least one recognition result; counting the voting times of each character belonging to each entity type in the text information to be identified from the at least one identification result to obtain voting characteristics; fusing the at least one current character entity type feature, the at least one current character position feature and the voting feature to obtain the basic model feature;

the fusion recognition module is further used for splicing the basic model features and the target text features to obtain features to be recognized; carrying out semantic encoding on the features to be recognized to obtain a semantic encoding result; and decoding the semantic encoding result to realize entity recognition and obtain the target recognition result.

10. A computer readable storage medium storing computer executable instructions or a computer program, which when executed by a processor, implement the method of any one of claims 1 to 7.