CN110134965A - Method, apparatus, equipment and computer readable storage medium for information processing - Google Patents

Method, apparatus, equipment and computer readable storage medium for information processing

Info

Publication number
CN110134965A
Authority
CN
China
Prior art keywords
instance
feature
entity
similitude
description information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910426142.7A
Other languages
Chinese (zh)
Other versions
CN110134965B (en)
Inventor
方舟
冯知凡
张扬
陆超
朱勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910426142.7A
Publication of CN110134965A
Application granted
Publication of CN110134965B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

According to example embodiments of the present disclosure, a method, apparatus, device, and computer-readable storage medium for information processing are provided. A method for information processing comprises: obtaining features of a first entity and features of a second entity; generating a first entity representation based on the features of the first entity; generating a second entity representation based on the features of the second entity; determining a feature similarity between the features of the first entity and the features of the second entity; and determining an entity similarity between the first entity and the second entity based on the first entity representation, the second entity representation, and the feature similarity. In this way, the scheme can combine a similarity model for the description information of entities with the similarity of entity features to achieve more accurate entity disambiguation.

Description

Method, apparatus, equipment and computer readable storage medium for information processing
Technical field
Embodiments of the present disclosure relate generally to the field of information processing, and more particularly to a method, apparatus, device, and computer-readable storage medium for information processing.
Background
With the rapid development of network technology, the volume of information keeps growing, and so does the demand for accurately obtaining the requested information. However, because natural language is ambiguous, the search results returned to users are often inaccurate. Existing disambiguation schemes cannot meet users' search needs, which degrades the search experience.
Summary of the invention
According to example embodiments of the present disclosure, a scheme for information processing is provided.
In a first aspect of the present disclosure, a method for information processing is provided, comprising: obtaining features of a first entity and features of a second entity; generating a first entity representation based on the features of the first entity; generating a second entity representation based on the features of the second entity; determining a feature similarity between the features of the first entity and the features of the second entity; and determining an entity similarity between the first entity and the second entity based on the first entity representation, the second entity representation, and the feature similarity.
In a second aspect of the present disclosure, an information processing apparatus is provided, comprising: a feature obtaining module configured to obtain features of a first entity and features of a second entity; a first entity representation generation module configured to generate a first entity representation based on the features of the first entity; a second entity representation generation module configured to generate a second entity representation based on the features of the second entity; a feature similarity determination module configured to determine a feature similarity between the features of the first entity and the features of the second entity; and an entity similarity determination module configured to determine an entity similarity between the first entity and the second entity based on the first entity representation, the second entity representation, and the feature similarity.
In a third aspect of the present disclosure, an electronic device is provided, comprising one or more processors and a storage device for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored; the program, when executed by a processor, implements the method according to the first aspect of the present disclosure.
It should be understood that the content described in this Summary is not intended to identify key or essential features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief description of the drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
Fig. 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;
Fig. 2 shows a flowchart of a method for information processing according to some embodiments of the present disclosure;
Fig. 3 shows a schematic diagram of a similarity model according to some embodiments of the present disclosure;
Fig. 4 shows a schematic block diagram of an apparatus for information processing according to some embodiments of the present disclosure; and
Fig. 5 shows a block diagram of a computing device capable of implementing some embodiments of the present disclosure.
Detailed description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
In the description of the embodiments of the present disclosure, the term "include" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first", "second", and the like may refer to different or identical objects. Other explicit and implicit definitions may also be included below.
The term "entity" may refer to an article, a fragment, or a piece of content with relatively sparse information, such as a semi-structured or unstructured article, fragment, or piece of content on a network.
Conventionally, there are two kinds of solutions for entity disambiguation. The first is based on supervised models and targets the disambiguation of structured entities with rich information. Because it relies on rich structured entity information, it has difficulty capturing the deep semantic information in entity information when handling semi-structured or unstructured entities with sparse information, and therefore has difficulty effectively handling the problems associated with such entities. In addition, it requires a large amount of labeled data and complex feature construction, which makes it costly.
The second is based on template and rule matching. For example, it may perform entity disambiguation simply by matching certain features of entities, such as names or aliases. Although it achieves high accuracy, it cannot be flexibly applied to various types of entities. When a new type of entity appears, new templates or rule-matching methods must be generated, which leads to high migration cost and poor scalability. To this end, a solution for entity disambiguation is proposed herein.
In general, according to embodiments of the present disclosure, an entity to be disambiguated (referred to herein as the "first entity") and an ambiguous entity that may be identical to the first entity (referred to herein as the "second entity") can be obtained. The first entity may be a semi-structured or unstructured entity with sparse information, such as an article on a network. Similarly, the second entity may also be a semi-structured or unstructured entity with sparse information, such as another article on the network.
Then, a selection of the features of the first entity and the second entity to be compared can be received, and those features can be compared using rule methods (for example, exact comparison, edit-distance comparison, time comparison, text-similarity comparison, co-occurrence comparison, numeric comparison, and type comparison).
In addition, a similarity model method can be used to determine the probability that the first entity and the second entity are the same entity. The similarity model method may use a deep model such as a Siamese (twin) neural network model. A conventional deep model determines this probability using only the text descriptions, that is, the description information, of the first entity and the second entity. In contrast, besides the description information of the two entities, this scheme can also combine the similarity between the features of the first entity and the features of the second entity, as well as the probability that the first entity has an ambiguous entity, to determine the probability that the first entity and the second entity are the same entity.
In this way, the scheme can combine rule methods and the similarity model method to achieve more accurate entity disambiguation. Further, within the similarity model method, the scheme can combine the similarity model for the description information of entities with the similarity of entity features to determine the similarity between entities more accurately.
Hereinafter, specific examples of the scheme are described in more detail with reference to Figs. 1 to 5. Fig. 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented. The example environment 100 includes a computing device 110 and a storage device 120. The computing device 110 may include, but is not limited to, any device with computing capability, such as a cloud computing device, mainframe computer, server, personal computer, desktop computer, laptop computer, tablet computer, or personal digital assistant. The storage device 120 may include any physical or virtual storage device with storage capability, such as a database, cloud storage device, magnetic storage device, or optical storage device.
The computing device 110 may obtain a first entity 130 from the storage device 120. As described above, the first entity 130 may be a semi-structured or unstructured entity with sparse information, such as an article on a network. The computing device 110 may preprocess the first entity 130 to represent it with several features (for example, identifier, type, description information, key-value pairs, related entities, and multimedia content).
In addition, the computing device 110 may also obtain from the storage device 120 a second entity 140 that may be identical to the first entity 130. Similarly, the second entity 140 may also be a semi-structured or unstructured entity with sparse information, such as another article on the network. Therefore, in some embodiments, the computing device 110 may also preprocess the second entity 140 to represent it with several features.
Then, the computing device 110 may receive a selection of the features of the first entity 130 and the second entity 140 to be compared, and compare those features using rule methods (for example, exact comparison, edit-distance comparison, time comparison, text-similarity comparison, co-occurrence comparison, numeric comparison, and type comparison).
In addition, the computing device 110 may also use the similarity model method to determine the probability that the first entity 130 and the second entity 140 are the same entity. In the similarity model method, a deep model such as a Siamese (twin) neural network model may be used to determine the similarity 150 between the first entity 130 and the second entity 140. The Siamese neural network model may include two mapping units with identical parameters (for example, each using a bidirectional long short-term memory (Bi-LSTM) model), a fully connected unit (for example, a fully connected layer), and a classification unit (for example, using a softmax model).
Specifically, based on the description information of the first entity 130 (referred to herein as the "first description information") and the description information of the second entity 140 (referred to herein as the "second description information"), the computing device 110 may generate a vector representing the first description information (referred to herein as the "first text representation") and a vector representing the second description information (referred to herein as the "second text representation"). In some embodiments, the computing device 110 may apply the first description information and the second description information to a word-vector unit to map them to the first text representation and the second text representation. The word-vector unit may use, for example, a BERT model, but is not limited thereto; it may also use, for example, a Word2Vec model or an ELMo model.
The computing device 110 may apply the generated first text representation and second text representation to the two mapping units with identical parameters (referred to herein as the "first mapping unit" and the "second mapping unit", respectively) to generate entity representations for the first entity 130 and the second entity 140 (referred to herein as the "first entity representation" and the "second entity representation", respectively), such as low-dimensional hidden-layer vectors of the first description information and the second description information. The mapping units may use, for example, a Bi-LSTM model, but are not limited thereto; they may also use, for example, a CNN model or an RNN model. The computing device 110 may then apply the first entity representation and the second entity representation to the fully connected unit.
Further, the computing device 110 may also generate the similarity between the features of the first entity 130 and the features of the second entity 140 (referred to herein as the "feature similarity"). In some embodiments, the feature similarity may be characterized by: the co-occurrence of the entities' features, the similarity of the entities' description information, the consistency of the entities' types, the similarity of the entities' multimedia content, and so on.
In addition, the computing device 110 may also obtain a predetermined probability that the first entity 130 has an ambiguous entity. The computing device 110 may then likewise apply the feature similarity and this probability to the fully connected unit.
Thus, the fully connected first entity representation, second entity representation, feature similarity, and probability that an ambiguous entity exists can be applied to the classification unit to determine the similarity 150 between the first entity 130 and the second entity 140 (referred to herein as the "entity similarity"). The classification unit may use, for example, a softmax model, but is not limited thereto; it may also use, for example, a contrastive loss model, a cosine distance model, or a Euclidean distance model.
In this way, the scheme can combine rule methods and the similarity model method to achieve more accurate entity disambiguation. In the rule methods, the scheme allows a selection of the entity features to be compared to be received, so that the features to be compared and the rule methods used to compare them can be configured in combination, which increases the flexibility, adaptability, and robustness of the scheme. Further, in the similarity model method, the scheme can combine the similarity model for the description information of entities with the similarity of entity features, so that the similarity between entities can be determined more accurately.
Fig. 2 shows a flowchart of a method 200 for information processing according to some embodiments of the present disclosure. For example, the method 200 may be performed at the computing device 110 shown in Fig. 1 or at another suitable device. In addition, the method 200 may include additional steps not shown and/or may omit steps that are shown; the scope of the present disclosure is not limited in this respect.
At 210, the computing device 110 obtains the features of the first entity and the features of the second entity. As described above, the first entity and the second entity may be semi-structured or unstructured entities with sparse information, such as articles on a network. In some embodiments, the computing device 110 may extract the features of the first entity from the first entity. Similarly, the computing device 110 may extract the features of the second entity from the second entity. For example, the computing device 110 may obtain the first entity and the second entity and preprocess them so that each is represented by several features. A feature may be, for example, an identifier, a type, description information, a key-value pair, a related entity, or multimedia content.
The identifier may indicate, for example, the name or alias of an entity. The type may indicate the type of an entity or a type-related label. In some embodiments, if an obtained entity lacks a type, the computing device 110 performs type prediction on the entity to determine its type.
The description information may be, for example, a text description of an entity. In some embodiments, if an obtained entity lacks description information, the computing device 110 generates the description information from the entity's other features. A key-value pair may be, for example, attribute information extracted from the description information and represented in key-value form.
For example, the key-value pair extracted from the description information "'Qi Li Xiang' is an album released by Zhou Jielun in 2004" may be represented as "Zhou Jielun, works, Qi Li Xiang". In addition, predetermined kinds of information in the description information, such as numbers, places, persons, and organizations, may be represented separately. For example, the key-value pairs extracted from the description information "C. Ronaldo transferred to Juventus in 2018" may be represented as "number: 2018, organization: Juventus".
A related entity may indicate, for example, another entity related to the entity; for instance, an entity involving Zhou Jielun is related to an entity involving Kun Ling. The multimedia content may include, for example, images and links of the entity.
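The feature set described above can be pictured as a simple record. The following is a minimal sketch only: the class name `EntityFeatures`, its field names, and the helper `missing_features` are hypothetical and do not come from the disclosure; the helper merely flags the two features that the text says may be predicted or generated when absent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EntityFeatures:
    """Hypothetical container for the features listed in the description."""
    identifiers: List[str]                                      # names and aliases
    entity_type: Optional[str] = None                           # may be missing, then predicted
    description: Optional[str] = None                           # free-text description
    key_values: Dict[str, str] = field(default_factory=dict)    # attribute key-value pairs
    related_entities: List[str] = field(default_factory=list)
    multimedia: List[str] = field(default_factory=list)         # image or link references


def missing_features(entity: EntityFeatures) -> List[str]:
    """Report which features would still need to be predicted or generated."""
    missing = []
    if entity.entity_type is None:
        missing.append("entity_type")
    if not entity.description:
        missing.append("description")
    return missing


article = EntityFeatures(
    identifiers=["C. Ronaldo"],
    description="C. Ronaldo transferred to Juventus in 2018",
    key_values={"number": "2018", "organization": "Juventus"},
)
print(missing_features(article))
```

Here the type is absent, so a type-prediction step would be the next stage under the embodiment described above.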
In some embodiments, the computing device 110 may perform entity disambiguation only on candidate entities associated with the first entity rather than on all entities, to reduce the computational complexity of entity disambiguation. For example, the computing device 110 may obtain the identifier of the first entity based on the features of the first entity, and compare the identifier of the first entity with the identifier of a candidate entity. Only when the identifier similarity exceeds a predetermined threshold does the computing device 110 obtain the features of the second entity for subsequent entity disambiguation.
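The candidate filtering step can be sketched as follows. This is an illustration under assumptions: the disclosure does not say how identifier similarity is measured, so Python's `difflib.SequenceMatcher` ratio is used as a stand-in, and the function names and the threshold of 0.8 are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def identifier_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; SequenceMatcher is a stand-in measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_candidates(first_id: str, candidate_ids: Dict[str, str],
                      threshold: float = 0.8) -> List[str]:
    """Return the keys of candidates whose identifier similarity exceeds the threshold."""
    return [key for key, ident in candidate_ids.items()
            if identifier_similarity(first_id, ident) > threshold]


# Hypothetical candidate store: entity key -> identifier (name or alias).
candidates = {"e1": "Zhou Jielun", "e2": "Zhou Jie Lun", "e3": "Juventus"}
print(select_candidates("Zhou Jielun", candidates))
```

Only the candidates that pass this cheap check would have their full features loaded for the more expensive comparison steps.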
Then, the computing device 110 may use the similarity model method to determine the entity similarity between the first entity and the second entity. Hereinafter, the method 200 is described in conjunction with the similarity model 300 shown in Fig. 3. The similarity model 300 is used to determine the probability that the first entity and the second entity are the same entity. The similarity model may use, for example, a Siamese (twin) neural network model to determine the similarity between the first entity and the second entity. The Siamese neural network model may include two mapping units 330 and 335 with identical parameters (for example, bidirectional long short-term memory (Bi-LSTM) models), a fully connected unit 340 (for example, a fully connected layer), and a classification unit 350 (for example, a softmax model).
At 220, the computing device 110 generates the first entity representation based on the features of the first entity. In some embodiments, the computing device 110 may obtain the first description information for the first entity from the features of the first entity. The computing device 110 may generate, based on the first description information, a first text representation that represents the first description information. The computing device 110 may then apply the first text representation to the first mapping unit 330 of the similarity model to map the first text representation to the first entity representation.
As an example of generating the first entity representation, the computing device 110 may apply the first description information to a first word-vector unit 320 to map the first description information to the first text representation. The first word-vector unit 320 may use, for example, a BERT model, but is not limited thereto; it may also use, for example, a Word2Vec model or an ELMo model. The computing device 110 may then apply the generated first text representation to the first mapping unit 330 to generate the first entity representation. The first mapping unit 330 may use, for example, a Bi-LSTM model, but is not limited thereto; it may also use, for example, a CNN model or an RNN model. The first entity representation may be, for example, a low-dimensional hidden-layer vector of the first description information.
At 230, the computing device 110 generates the second entity representation based on the features of the second entity. In some embodiments, the computing device 110 may obtain the second description information for the second entity from the features of the second entity. The computing device 110 may generate, based on the second description information, a second text representation that represents the second description information. The computing device 110 may then apply the second text representation to the second mapping unit 335 of the similarity model to map the second text representation to the second entity representation. The first mapping unit 330 and the second mapping unit 335 have identical parameters.
Similarly to generating the first entity representation, when generating the second entity representation the computing device 110 may, for example, apply the second description information to a second word-vector unit 325 to map the second description information to the second text representation. The computing device 110 may then apply the generated second text representation to the second mapping unit 335 to generate the second entity representation. The second mapping unit 335 has the same parameters as the first mapping unit 330 and may use the same model, such as a Bi-LSTM model, a CNN model, or an RNN model. The second entity representation may be, for example, a low-dimensional hidden-layer vector of the second description information.
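The parameter sharing between the two mapping units can be illustrated with a toy sketch. It is not the disclosed model: a deterministic pseudo-random token average stands in for the word-vector unit (BERT, Word2Vec, or ELMo in the text), and a single tanh projection stands in for the mapping unit (Bi-LSTM, CNN, or RNN in the text). All names and dimensions are hypothetical. The point shown is only that one parameter set serves both branches.

```python
import math
import random
import zlib
from typing import List

EMBED_DIM = 8   # size of the toy text representation
HIDDEN_DIM = 4  # size of the low-dimensional entity representation


def text_representation(description: str) -> List[float]:
    """Toy word-vector unit: average of deterministic pseudo-random token vectors."""
    tokens = description.lower().split()
    total = [0.0] * EMBED_DIM
    for token in tokens:
        # Seed from a stable checksum so the same token always gets the same vector.
        rng = random.Random(zlib.crc32(token.encode("utf-8")))
        for i in range(EMBED_DIM):
            total[i] += rng.uniform(-1.0, 1.0)
    return [v / max(len(tokens), 1) for v in total]


class MappingUnit:
    """Toy mapping unit: one tanh projection with untrained random weights."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
                        for _ in range(HIDDEN_DIM)]

    def __call__(self, text_vec: List[float]) -> List[float]:
        return [math.tanh(sum(w * x for w, x in zip(row, text_vec)))
                for row in self.weights]


# Both branches share one parameter set, so a single MappingUnit instance
# plays the role of the first and the second mapping unit.
shared_unit = MappingUnit(seed=42)
first_repr = shared_unit(text_representation("album released by Zhou Jielun in 2004"))
second_repr = shared_unit(text_representation("album released by Zhou Jielun in 2004"))
other_repr = shared_unit(text_representation("football club based in Turin"))
print(len(first_repr), first_repr == second_repr)
```

Because the parameters are shared, identical descriptions always map to identical entity representations, which is the property a twin network relies on when it compares two inputs.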
At 240, the computing device 110 determines the feature similarity between the features of the first entity and the features of the second entity. In some embodiments, the computing device 110 may apply the features of the first entity and the features of the second entity to a comparing unit 360 to determine the feature similarity. The feature similarity may be determined in any of the following ways. In some embodiments, the computing device 110 may determine the co-occurrence of the features of the first entity and the features of the second entity, for example the co-occurrence of the related entities of the first entity and the related entities of the second entity. In other embodiments, the computing device 110 may determine the similarity between the description information of the first entity and the description information of the second entity, for example the probabilistic latent semantic analysis (PLSA) similarity of the description information or the similarity of substrings of the description information. In still other embodiments, the computing device 110 may determine the type consistency between the type of the first entity and the type of the second entity, such as whether the types are identical, whether the types are in a hypernym-hyponym relation, or whether the type prediction results agree. Additionally or alternatively, the computing device 110 may determine the similarity between the multimedia content of the first entity and the multimedia content of the second entity, such as the similarity of the entities' images or the similarity of the links the entities reference.
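Two of the feature comparisons listed above can be sketched as follows. The scoring choices are assumptions rather than the disclosed formulas: co-occurrence of related entities is measured here with Jaccard overlap, and type consistency is scored 1.0 for identical types, 0.5 for a hypernym-hyponym pair, and 0.0 otherwise. The function names and the `hypernyms` mapping are hypothetical.

```python
from typing import Dict, List, Set


def cooccurrence(first_related: Set[str], second_related: Set[str]) -> float:
    """Jaccard overlap of related entities (an assumed co-occurrence measure)."""
    if not first_related and not second_related:
        return 0.0
    return len(first_related & second_related) / len(first_related | second_related)


def type_consistency(first_type: str, second_type: str,
                     hypernyms: Dict[str, str]) -> float:
    """1.0 for identical types, 0.5 for a hypernym-hyponym pair, else 0.0 (assumed scores)."""
    if first_type == second_type:
        return 1.0
    if hypernyms.get(first_type) == second_type or hypernyms.get(second_type) == first_type:
        return 0.5
    return 0.0


def feature_similarity(first: dict, second: dict,
                       hypernyms: Dict[str, str]) -> List[float]:
    """Collect the per-feature scores into one feature-similarity vector."""
    return [
        cooccurrence(set(first["related"]), set(second["related"])),
        type_consistency(first["type"], second["type"], hypernyms),
    ]


first = {"type": "singer", "related": ["Kun Ling", "Juventus"]}
second = {"type": "person", "related": ["Kun Ling"]}
print(feature_similarity(first, second, {"singer": "person"}))
```

A fuller version would append further scores, such as description-text similarity or multimedia similarity, to the same vector.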
At 250, the computing device 110 may determine the entity similarity between the first entity and the second entity based on the first entity representation, the second entity representation, and the feature similarity. In some embodiments, the computing device 110 may apply the first entity representation, the second entity representation, and the feature similarity to the fully connected unit 340 of the similarity model to generate a fully connected representation, such as a fully connected vector. Further, in some embodiments, the computing device 110 may also obtain the probability that the first entity has an ambiguous entity. In this case, the computing device 110 may apply the obtained probability, together with the first entity representation, the second entity representation, and the feature similarity, to the fully connected unit 340 of the similarity model to generate the fully connected representation.
The computing device 110 may then apply the fully connected representation to the classification unit 350 to generate, as the entity similarity, the probability that the first entity and the second entity are the same entity. As described above, the classification unit 350 may be, for example, a softmax model, but is not limited thereto; it may also be, for example, a contrastive loss model, a cosine distance model, or a Euclidean distance model. When the entity similarity exceeds a predetermined threshold, the computing device 110 may determine that the first entity and the second entity are similar entities.
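The fully connected and classification steps can be sketched as a concatenation followed by one linear layer and a softmax. This is a minimal illustration with untrained random weights, not the disclosed network: the function names, the two-class layout (index 1 meaning "same entity"), and the threshold of 0.5 are all assumptions.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def entity_similarity(first_repr: Sequence[float], second_repr: Sequence[float],
                      feature_sim: Sequence[float], ambiguity_prob: float,
                      weights: Sequence[Sequence[float]],
                      biases: Sequence[float]) -> float:
    """Concatenate all inputs, apply one linear layer, and return P(same entity)."""
    joined = list(first_repr) + list(second_repr) + list(feature_sim) + [ambiguity_prob]
    logits = [sum(w * x for w, x in zip(row, joined)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)[1]  # index 1 is the assumed "same entity" class


rng = random.Random(0)
first_repr, second_repr = [0.2, -0.1], [0.25, -0.05]
feature_sim, ambiguity_prob = [0.5, 1.0], 0.3
input_dim = len(first_repr) + len(second_repr) + len(feature_sim) + 1
# Untrained weights: in practice these would be learned from labeled entity pairs.
weights = [[rng.uniform(-1.0, 1.0) for _ in range(input_dim)] for _ in range(2)]
biases = [0.0, 0.0]

score = entity_similarity(first_repr, second_repr, feature_sim,
                          ambiguity_prob, weights, biases)
THRESHOLD = 0.5  # hypothetical decision threshold
print(score, score > THRESHOLD)
```

Feeding the feature similarity and the ambiguity probability into the same layer as the two entity representations is what lets the classifier weigh text-based and feature-based evidence jointly.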
The similarity model method has been described above. In some embodiments, the similarity model method may be performed in combination with rule methods. For example, in the rule methods, the computing device 110 may receive a selection of the features of the first entity and the second entity to be compared. The computing device 110 may compare the selected features of the first entity with the selected features of the second entity to determine whether the first entity and the second entity are different entities.
Such comparison includes exact comparison, edit-distance comparison, time comparison, text-similarity comparison, co-occurrence comparison, numeric comparison, or type comparison. Exact comparison may, for example, exactly match two character strings to determine whether they are identical. Edit-distance comparison may, for example, return the minimum edit (Levenshtein) distance between two character strings, which may be a continuous value between 0 and 1. Time comparison may, for example, use a customizable threshold: if the absolute value of the difference between two values is less than the threshold, the values are determined to be identical. Text-similarity comparison may, for example, compute the PLSA similarity of two values. Co-occurrence comparison may, for example, determine whether one character string occurs within another character string. Numeric comparison may, for example, compare two floating-point numbers with a customizable threshold: if the absolute value of the difference between the two values is less than the threshold, the values are determined to be identical. Type comparison may, for example, compare the similarity between the types of two entities.
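Several of the rule comparisons above are simple enough to sketch directly. The following is an illustration under assumptions: the function names are hypothetical, and the edit distance is scaled into [0, 1] by dividing by the longer string length, which is one common normalization and is not stated in the disclosure. The PLSA and type comparisons are omitted because they require trained models or a type hierarchy.

```python
def exact_match(a: str, b: str) -> bool:
    """Exact comparison: the two strings must be identical."""
    return a == b


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Scale the edit distance into [0, 1] by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def within_threshold(x: float, y: float, threshold: float) -> bool:
    """Shared by time and numeric comparison: identical if |x - y| < threshold."""
    return abs(x - y) < threshold


def cooccurs(a: str, b: str) -> bool:
    """Co-occurrence comparison: one string occurs within the other."""
    return a in b or b in a


print(normalized_edit_distance("kitten", "sitting"))
print(within_threshold(2018.0, 2019.0, threshold=2.0))
print(cooccurs("Juventus", "transferred to Juventus in 2018"))
```

Keeping each rule as a separate function makes it straightforward to configure which rule applies to which selected feature, in line with the configurable combination described earlier.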
In certain embodiments, determining first instance and second instance for the feelings of different entities using above-mentioned rule and method Under condition, similarity model method can be executed to determine entity similitude by calculating equipment 110.Alternatively, in first instance and In the case that two entities are identical entity, similarity model method can also be executed to determine entity similitude by calculating equipment 110. Although in addition, herein Similarity Model method is described as executing after rule and method, Similarity Model method and The execution sequence of rule and method is not restricted by, such as Similarity Model method can execute before rule and method, or parallel It executes.
In this manner, the present solution can combine the rule-based method and the similarity model method to perform entity disambiguation, thereby achieving more accurate entity disambiguation. Further, in the similarity model method, the present solution can combine a deep model over the description information of the entities with the feature similarity of the entities, so as to determine the similarity between entities more accurately.
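The combination of the rule-based method and the similarity model method can be sketched as follows. This is a minimal sketch under stated assumptions: the names `rule_says_different` and `disambiguate` are hypothetical, the policy that a single disagreeing feature marks the pair as different is an assumption, and `similarity_model` is a placeholder callable standing in for the trained model. The sketch returns both results and does not prescribe how they are merged, since the text leaves the execution order and the combination open.

```python
from typing import Callable, Dict, Tuple

Entity = Dict[str, object]
# A comparator returns True when two feature values are considered the same.
Comparator = Callable[[object, object], bool]


def rule_says_different(first: Entity, second: Entity,
                        comparators: Dict[str, Comparator]) -> bool:
    """Rule-based method: compare only the selected features."""
    for feature, same in comparators.items():
        if feature in first and feature in second:
            if not same(first[feature], second[feature]):
                return True  # assumed policy: one disagreement is enough
    return False


def disambiguate(first: Entity, second: Entity,
                 comparators: Dict[str, Comparator],
                 similarity_model: Callable[[Entity, Entity], float]
                 ) -> Tuple[bool, float]:
    """Run the rule-based check, then the (placeholder) similarity model."""
    different = rule_says_different(first, second, comparators)
    score = similarity_model(first, second)
    return different, score


# Hypothetical usage with made-up entities and a stub model.
selected = {
    "type": lambda a, b: a == b,
    "birth_year": lambda a, b: abs(a - b) < 1,
}
entity_a = {"id": "Li Na", "type": "person", "birth_year": 1982}
entity_b = {"id": "Li Na", "type": "person", "birth_year": 1963}
different, score = disambiguate(entity_a, entity_b, selected,
                                lambda x, y: 0.12)
```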
Fig. 4 shows a schematic block diagram of an apparatus 400 for information processing according to an embodiment of the present disclosure. As shown in Fig. 4, the apparatus 400 includes: a feature acquisition module 410 configured to obtain a feature of a first entity and a feature of a second entity; a first entity representation generation module 420 configured to generate a first entity representation based on the feature of the first entity; a second entity representation generation module 430 configured to generate a second entity representation based on the feature of the second entity; a feature similarity determination module 440 configured to determine a feature similarity between the feature of the first entity and the feature of the second entity; and an entity similarity determination module 450 configured to determine an entity similarity between the first entity and the second entity based on the first entity representation, the second entity representation, and the feature similarity.
In certain embodiments, the feature acquisition module 410 includes: a first entity feature acquisition module configured to extract the feature of the first entity from the first entity, the feature including at least one of the following: an identifier, a type, description information, key-value pairs, related entities, and multimedia content.
In certain embodiments, the feature acquisition module 410 includes: a first entity identifier acquisition module configured to obtain the identifier of the first entity; and a second entity feature acquisition module configured to obtain the feature of the second entity in response to an identifier similarity between the identifier of the first entity and the identifier of a candidate entity exceeding a predetermined threshold.
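The threshold-based selection of the second entity can be illustrated with a short sketch. The identifier similarity measure is not specified in this passage, so the sketch substitutes `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in; the function names, the `"id"` key, and the threshold value of 0.8 are likewise assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def identifier_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; the actual measure is not specified."""
    return SequenceMatcher(None, a, b).ratio()


def select_second_entities(first_id: str,
                           candidates: List[Dict[str, object]],
                           threshold: float = 0.8) -> List[Dict[str, object]]:
    """Keep candidates whose identifier similarity exceeds the threshold.

    Only for these candidates would the features be fetched and passed on
    to the later comparison steps.
    """
    return [candidate for candidate in candidates
            if identifier_similarity(first_id, str(candidate["id"])) > threshold]


# Hypothetical candidate entities.
candidates = [
    {"id": "Apple Inc.", "type": "company"},
    {"id": "Banana Republic", "type": "company"},
]
selected = select_second_entities("Apple Inc", candidates)
```

Filtering by identifier first keeps the more expensive representation and similarity computations limited to plausible matches.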
In certain embodiments, the first entity representation generation module 420 includes: a first description information acquisition module configured to obtain, from the feature of the first entity, first description information for the first entity; a first text representation generation module configured to generate, based on the first description information, a first text representation that represents the first description information; and a first entity representation mapping module configured to apply the first text representation to a first mapping unit of the similarity model, so as to map the first text representation to the first entity representation.
In certain embodiments, the second entity representation generation module 430 includes: a second description information acquisition module configured to obtain, from the feature of the second entity, second description information for the second entity; a second text representation generation module configured to generate, based on the second description information, a second text representation that represents the second description information; and a second entity representation mapping module configured to apply the second text representation to a second mapping unit of the similarity model, so as to map the second text representation to the second entity representation, wherein the first mapping unit and the second mapping unit have identical parameters.
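The requirement that the two mapping units have identical parameters can be pictured as two references to one parameter set. The sketch below is an assumption-laden illustration: the class name `MappingUnit` is hypothetical, a single dense layer with a tanh activation stands in for whatever network the mapping unit actually uses, the text representations are assumed to be fixed-length vectors, and the weights are random rather than trained.

```python
import math
import random
from typing import List


class MappingUnit:
    """One dense layer with tanh, standing in for a mapping unit."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, text_repr: List[float]) -> List[float]:
        """Map a text representation to an entity representation."""
        return [math.tanh(sum(w * x for w, x in zip(row, text_repr)) + b)
                for row, b in zip(self.weights, self.bias)]


# Both mapping units refer to the same object, so their parameters
# are identical by construction.
shared = MappingUnit(in_dim=4, out_dim=3)
first_mapping_unit = shared
second_mapping_unit = shared

# Hypothetical text representations of two description texts.
first_entity_repr = first_mapping_unit([0.2, 0.1, 0.0, 0.5])
second_entity_repr = second_mapping_unit([0.2, 0.1, 0.0, 0.5])
```

Because the parameters are shared, identical description texts map to identical entity representations, which keeps the two representations directly comparable.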
In certain embodiments, the entity similarity determination module 450 includes: a fully connected representation generation module configured to apply the first entity representation, the second entity representation, and the feature similarity to a fully connected unit of the similarity model, so as to generate a fully connected representation; and a first entity similarity determination module configured to apply the fully connected representation to a classification unit, so as to generate, as the entity similarity, the probability that the first entity and the second entity are the same entity.
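The flow from the two entity representations and the feature similarity to a probability can be sketched as below. The details are assumptions: the inputs are combined by simple concatenation, the fully connected unit is a single dense layer with ReLU, and the classification unit is a single logistic output. All function and parameter names are hypothetical, and the weights in the usage lines are toy values rather than trained parameters.

```python
import math
from typing import List, Sequence


def dense(inputs: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Affine layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def entity_similarity(first_repr: Sequence[float],
                      second_repr: Sequence[float],
                      feature_sims: Sequence[float],
                      fc_weights: Sequence[Sequence[float]],
                      fc_bias: Sequence[float],
                      cls_weights: Sequence[float],
                      cls_bias: float) -> float:
    """Probability that the two entities are the same entity."""
    # Assumed combination: concatenate both representations and the
    # feature similarity values into one input vector.
    joined = list(first_repr) + list(second_repr) + list(feature_sims)
    # Fully connected unit (dense layer followed by ReLU).
    fc_repr = [max(0.0, v) for v in dense(joined, fc_weights, fc_bias)]
    # Classification unit (logistic output).
    logit = dense(fc_repr, [cls_weights], [cls_bias])[0]
    return 1.0 / (1.0 + math.exp(-logit))


# Toy usage with untrained, made-up parameters.
probability = entity_similarity(
    first_repr=[0.1, 0.2],
    second_repr=[0.1, 0.3],
    feature_sims=[0.9],
    fc_weights=[[0.5, -0.2, 0.5, -0.2, 1.0], [0.1, 0.1, 0.1, 0.1, 0.1]],
    fc_bias=[0.0, 0.0],
    cls_weights=[1.0, -1.0],
    cls_bias=0.0,
)
```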
In certain embodiments, the entity similarity determination module 450 includes: a probability acquisition module configured to obtain a probability that the first entity has an ambiguous entity; and a second entity similarity determination module configured to determine the entity similarity between the first entity and the second entity based on the first entity representation, the second entity representation, the feature similarity, and the probability.
In certain embodiments, the entity similarity determination module 450 includes: a selection receiving module configured to receive a selection of the features of the first entity and the second entity that are to be compared; a comparison module configured to compare the selected feature of the first entity with the selected feature of the second entity to determine whether the first entity and the second entity are different entities; and a third entity similarity determination module configured to determine the entity similarity in response to the first entity and the second entity being different entities.
Fig. 5 shows a schematic block diagram of an example device 500 that can be used to implement embodiments of the present disclosure. The device 500 can be used to implement the computing device 110 of Fig. 1. As shown, the device 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to computer program instructions stored in a read-only memory (ROM) 502 or computer program instructions loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the device 500. The CPU 501, the ROM 502, and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A plurality of components in the device 500 are connected to the I/O interface 505, including: an input unit 506, such as a keyboard or a mouse; an output unit 507, such as various types of displays and loudspeakers; a storage unit 508, such as a magnetic disk or an optical disc; and a communication unit 509, such as a network interface card, a modem, or a wireless communication transceiver. The communication unit 509 allows the device 500 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The processing unit 501 performs the methods and processes described above, such as process 200. For example, in some embodiments, process 200 may be implemented as a computer software program that is tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the CPU 501, one or more steps of process 200 described above may be performed. Alternatively, in other embodiments, the CPU 501 may be configured to perform process 200 by any other suitable means (for example, by means of firmware).
The functions described herein may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In addition, although the operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, in order to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Similarly, although several specific implementation details are contained in the discussion above, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (20)

1. An information processing method, comprising:
obtaining a feature of a first entity and a feature of a second entity;
generating a first entity representation based on the feature of the first entity;
generating a second entity representation based on the feature of the second entity;
determining a feature similarity between the feature of the first entity and the feature of the second entity; and
determining an entity similarity between the first entity and the second entity based on the first entity representation, the second entity representation, and the feature similarity.
2. The method according to claim 1, wherein obtaining the feature of the first entity comprises:
extracting the feature of the first entity from the first entity, the feature including at least one of the following: an identifier, a type, description information, key-value pairs, related entities, and multimedia content.
3. The method according to claim 1, wherein obtaining the feature of the second entity comprises:
obtaining the identifier of the first entity; and
in response to an identifier similarity between the identifier of the first entity and the identifier of a candidate entity exceeding a predetermined threshold, obtaining the feature of the second entity.
4. The method according to claim 1, wherein generating the first entity representation comprises:
obtaining, from the feature of the first entity, first description information for the first entity;
generating, based on the first description information, a first text representation that represents the first description information; and
applying the first text representation to a first mapping unit of a similarity model, so as to map the first text representation to the first entity representation.
5. The method according to claim 4, wherein generating the second entity representation comprises:
obtaining, from the feature of the second entity, second description information for the second entity;
generating, based on the second description information, a second text representation that represents the second description information; and
applying the second text representation to a second mapping unit of the similarity model, so as to map the second text representation to the second entity representation, wherein the first mapping unit and the second mapping unit have identical parameters.
6. The method according to claim 1, wherein determining the entity similarity comprises:
applying the first entity representation, the second entity representation, and the feature similarity to a fully connected unit of a similarity model, so as to generate a fully connected representation; and
applying the fully connected representation to a classification unit, so as to generate, as the entity similarity, the probability that the first entity and the second entity are the same entity.
7. The method according to claim 1, wherein determining the entity similarity comprises:
obtaining a probability that the first entity has an ambiguous entity; and
determining the entity similarity between the first entity and the second entity based on the first entity representation, the second entity representation, the feature similarity, and the probability.
8. The method according to claim 1, wherein determining the entity similarity comprises:
receiving a selection of the features of the first entity and the second entity that are to be compared;
comparing the selected feature of the first entity with the selected feature of the second entity to determine whether the first entity and the second entity are different entities; and
in response to the first entity and the second entity being different entities, determining the entity similarity.
9. The method according to claim 8, wherein the comparison includes at least one of the following: an exact comparison, an edit-distance comparison, a time comparison, a text-similarity comparison, a co-occurrence comparison, a numeric comparison, and a type comparison.
10. An information processing apparatus, comprising:
a feature acquisition module configured to obtain a feature of a first entity and a feature of a second entity;
a first entity representation generation module configured to generate a first entity representation based on the feature of the first entity;
a second entity representation generation module configured to generate a second entity representation based on the feature of the second entity;
a feature similarity determination module configured to determine a feature similarity between the feature of the first entity and the feature of the second entity; and
an entity similarity determination module configured to determine an entity similarity between the first entity and the second entity based on the first entity representation, the second entity representation, and the feature similarity.
11. The apparatus according to claim 10, wherein obtaining the feature of the first entity comprises:
extracting the feature of the first entity from the first entity, the feature including at least one of the following: an identifier, a type, description information, key-value pairs, related entities, and multimedia content.
12. The apparatus according to claim 10, wherein obtaining the feature of the second entity comprises:
obtaining the identifier of the first entity; and
in response to an identifier similarity between the identifier of the first entity and the identifier of a candidate entity exceeding a predetermined threshold, obtaining the feature of the second entity.
13. The apparatus according to claim 10, wherein generating the first entity representation comprises:
obtaining, from the feature of the first entity, first description information for the first entity;
generating, based on the first description information, a first text representation that represents the first description information; and
applying the first text representation to a first mapping unit of a similarity model, so as to map the first text representation to the first entity representation.
14. The apparatus according to claim 13, wherein generating the second entity representation comprises:
obtaining, from the feature of the second entity, second description information for the second entity;
generating, based on the second description information, a second text representation that represents the second description information; and
applying the second text representation to a second mapping unit of the similarity model, so as to map the second text representation to the second entity representation, wherein the first mapping unit and the second mapping unit have identical parameters.
15. The apparatus according to claim 10, wherein determining the entity similarity comprises:
applying the first entity representation, the second entity representation, and the feature similarity to a fully connected unit of a similarity model, so as to generate a fully connected representation; and
applying the fully connected representation to a classification unit, so as to generate, as the entity similarity, the probability that the first entity and the second entity are the same entity.
16. The apparatus according to claim 10, wherein determining the entity similarity comprises:
obtaining a probability that the first entity has an ambiguous entity; and
determining the entity similarity between the first entity and the second entity based on the first entity representation, the second entity representation, the feature similarity, and the probability.
17. The apparatus according to claim 10, wherein determining the entity similarity comprises:
receiving a selection of the features of the first entity and the second entity that are to be compared;
comparing the selected feature of the first entity with the selected feature of the second entity to determine whether the first entity and the second entity are different entities; and
in response to the first entity and the second entity being different entities, determining the entity similarity.
18. The apparatus according to claim 17, wherein the comparison includes at least one of the following: an exact comparison, an edit-distance comparison, a time comparison, a text-similarity comparison, a co-occurrence comparison, a numeric comparison, and a type comparison.
19. An electronic device, comprising:
one or more processors; and
a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-9.
20. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-9.
CN201910426142.7A 2019-05-21 2019-05-21 Method, apparatus, device and computer readable storage medium for information processing Active CN110134965B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910426142.7A CN110134965B (en) 2019-05-21 2019-05-21 Method, apparatus, device and computer readable storage medium for information processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910426142.7A CN110134965B (en) 2019-05-21 2019-05-21 Method, apparatus, device and computer readable storage medium for information processing

Publications (2)

Publication Number Publication Date
CN110134965A true CN110134965A (en) 2019-08-16
CN110134965B CN110134965B (en) 2023-08-18

Family

ID=67572367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910426142.7A Active CN110134965B (en) 2019-05-21 2019-05-21 Method, apparatus, device and computer readable storage medium for information processing

Country Status (1)

Country Link
CN (1) CN110134965B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674304A (en) * 2019-10-09 2020-01-10 北京明略软件***有限公司 Entity disambiguation method and device, readable storage medium and electronic equipment
CN112163109A (en) * 2020-09-24 2021-01-01 中国科学院计算机网络信息中心 Entity disambiguation method and system based on picture
US20210065046A1 (en) * 2019-08-29 2021-03-04 International Business Machines Corporation System for identifying duplicate parties using entity resolution
WO2021179708A1 (en) * 2020-10-20 2021-09-16 平安科技(深圳)有限公司 Named-entity recognition method and apparatus, computer device and readable storage medium
US11544477B2 (en) 2019-08-29 2023-01-03 International Business Machines Corporation System for identifying duplicate parties using entity resolution

Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2664997A2 (en) * 2012-05-18 2013-11-20 Xerox Corporation System and method for resolving named entity coreference
CN105808689A (en) * 2016-03-03 2016-07-27 中国地质大学(武汉) Drainage system entity semantic similarity measurement method based on artificial neural network
US9710544B1 (en) * 2016-05-19 2017-07-18 Quid, Inc. Pivoting from a graph of semantic similarity of documents to a derivative graph of relationships between entities mentioned in the documents
US20170255693A1 (en) * 2016-03-04 2017-09-07 Microsoft Technology Licensing, Llc Providing images for search queries
CN107506486A (en) * 2017-09-21 2017-12-22 北京航空航天大学 A kind of relation extending method based on entity link
CN107861939A (en) * 2017-09-30 2018-03-30 昆明理工大学 A kind of domain entities disambiguation method for merging term vector and topic model
CN107943860A (en) * 2017-11-08 2018-04-20 北京奇艺世纪科技有限公司 The recognition methods and device that the training method of model, text are intended to
CN107992480A (en) * 2017-12-25 2018-05-04 东软集团股份有限公司 A kind of method, apparatus for realizing entity disambiguation and storage medium, program product
CN108280061A (en) * 2018-01-17 2018-07-13 北京百度网讯科技有限公司 Text handling method based on ambiguity entity word and device
CN108288067A (en) * 2017-09-12 2018-07-17 腾讯科技(深圳)有限公司 Training method, bidirectional research method and the relevant apparatus of image text Matching Model
CN108376160A (en) * 2018-02-12 2018-08-07 北京大学 A kind of Chinese knowledge mapping construction method and system
CN108389614A (en) * 2018-03-02 2018-08-10 西安交通大学 The method for building medical image collection of illustrative plates based on image segmentation and convolutional neural networks
CN108399163A (en) * 2018-03-21 2018-08-14 北京理工大学 Bluebeard compound polymerize the text similarity measure with word combination semantic feature
AU2018201708A1 (en) * 2017-03-09 2018-09-27 Tata Consultancy Services Limited Method and system for mapping attributes of entities
CN108681537A (en) * 2018-05-08 2018-10-19 中国人民解放军国防科技大学 Chinese entity linking method based on neural network and word vector
US20180349350A1 (en) * 2017-06-01 2018-12-06 Beijing Baidu Netcom Science And Technology Co., Ltd. Artificial intelligence based method and apparatus for checking text
JP6462970B1 (en) * 2018-05-21 2019-01-30 楽天株式会社 Classification device, classification method, generation method, classification program, and generation program
CN109299462A (en) * 2018-09-20 2019-02-01 武汉理工大学 Short text similarity calculating method based on multidimensional convolution feature
JPWO2018042665A1 (en) * 2016-09-05 2019-04-11 富士通株式会社 Information presentation method, apparatus, and program
CN109635297A (en) * 2018-12-11 2019-04-16 湖南星汉数智科技有限公司 A kind of entity disambiguation method, device, computer installation and computer storage medium
CN109710760A (en) * 2018-12-20 2019-05-03 泰康保险集团股份有限公司 Clustering method, device, medium and the electronic equipment of short text

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130311467A1 (en) * 2012-05-18 2013-11-21 Xerox Corporation System and method for resolving entity coreference
EP2664997A2 (en) * 2012-05-18 2013-11-20 Xerox Corporation System and method for resolving named entity coreference
CN105808689A (en) * 2016-03-03 2016-07-27 中国地质大学(武汉) Drainage system entity semantic similarity measurement method based on artificial neural network
US20170255693A1 (en) * 2016-03-04 2017-09-07 Microsoft Technology Licensing, Llc Providing images for search queries
US9710544B1 (en) * 2016-05-19 2017-07-18 Quid, Inc. Pivoting from a graph of semantic similarity of documents to a derivative graph of relationships between entities mentioned in the documents
JPWO2018042665A1 (en) * 2016-09-05 2019-04-11 富士通株式会社 Information presentation method, apparatus, and program
AU2018201708A1 (en) * 2017-03-09 2018-09-27 Tata Consultancy Services Limited Method and system for mapping attributes of entities
US20180349350A1 (en) * 2017-06-01 2018-12-06 Beijing Baidu Netcom Science And Technology Co., Ltd. Artificial intelligence based method and apparatus for checking text
CN108288067A (en) * 2017-09-12 2018-07-17 腾讯科技(深圳)有限公司 Training method, bidirectional research method and the relevant apparatus of image text Matching Model
CN107506486A (en) * 2017-09-21 2017-12-22 北京航空航天大学 A kind of relation extending method based on entity link
CN107861939A (en) * 2017-09-30 2018-03-30 昆明理工大学 A kind of domain entities disambiguation method for merging term vector and topic model
CN107943860A (en) * 2017-11-08 2018-04-20 北京奇艺世纪科技有限公司 The recognition methods and device that the training method of model, text are intended to
CN107992480A (en) * 2017-12-25 2018-05-04 东软集团股份有限公司 A kind of method, apparatus for realizing entity disambiguation and storage medium, program product
CN108280061A (en) * 2018-01-17 2018-07-13 北京百度网讯科技有限公司 Text handling method based on ambiguity entity word and device
CN108376160A (en) * 2018-02-12 2018-08-07 北京大学 A kind of Chinese knowledge mapping construction method and system
CN108389614A (en) * 2018-03-02 2018-08-10 西安交通大学 The method for building medical image collection of illustrative plates based on image segmentation and convolutional neural networks
CN108399163A (en) * 2018-03-21 2018-08-14 北京理工大学 Bluebeard compound polymerize the text similarity measure with word combination semantic feature
CN108681537A (en) * 2018-05-08 2018-10-19 中国人民解放军国防科技大学 Chinese entity linking method based on neural network and word vector
JP6462970B1 (en) * 2018-05-21 2019-01-30 楽天株式会社 Classification device, classification method, generation method, classification program, and generation program
CN109299462A (en) * 2018-09-20 2019-02-01 武汉理工大学 Short text similarity calculating method based on multidimensional convolution feature
CN109635297A (en) * 2018-12-11 2019-04-16 湖南星汉数智科技有限公司 A kind of entity disambiguation method, device, computer installation and computer storage medium
CN109710760A (en) * 2018-12-20 2019-05-03 泰康保险集团股份有限公司 Clustering method, device, medium and the electronic equipment of short text

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
刘永平;胡忠顺;阳德青;肖仰华;: "基于给定实体和属性的相似实体推荐方法", 计算机工程, no. 10, pages 181 - 186 *
吴卫祖;刘利群;谢冬青;: "基于神经网络的异构网络向量化表示方法", 计算机科学, no. 05, pages 272 - 275 *
阳怡林;周杰;李弼程;: "基于聚类集成的人名消歧算法", 计算机应用研究, no. 09, pages 2716 - 2720 *
马晓军;郭剑毅;王红斌;张志坤;线岩团;余正涛;: "融合词向量和主题模型的领域实体消歧", 模式识别与人工智能, no. 12, pages 1130 - 1137 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210065046A1 (en) * 2019-08-29 2021-03-04 International Business Machines Corporation System for identifying duplicate parties using entity resolution
US11544477B2 (en) 2019-08-29 2023-01-03 International Business Machines Corporation System for identifying duplicate parties using entity resolution
US11556845B2 (en) * 2019-08-29 2023-01-17 International Business Machines Corporation System for identifying duplicate parties using entity resolution
CN110674304A (en) * 2019-10-09 2020-01-10 北京明略软件***有限公司 Entity disambiguation method and device, readable storage medium and electronic equipment
CN112163109A (en) * 2020-09-24 2021-01-01 中国科学院计算机网络信息中心 Entity disambiguation method and system based on picture
WO2021179708A1 (en) * 2020-10-20 2021-09-16 平安科技(深圳)有限公司 Named-entity recognition method and apparatus, computer device and readable storage medium

Also Published As

Publication number Publication date
CN110134965B (en) 2023-08-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant