[Summary of the Invention]
In view of this, embodiments of the present invention provide a method and a device for acquiring type relationships, which can automatically acquire the types of entities and the relationships between those types, thereby improving the efficiency of acquiring entity types and the relationships between types and reducing the cost of acquiring them.
In one aspect, an embodiment of the present invention provides a method for acquiring a type relationship, comprising:
obtaining entities and a description text of each entity;
obtaining the type corresponding to each entity;
generating a description text of each type according to the description texts of the entities corresponding to that type; and
extracting, according to a specified type relationship, M groups of types that satisfy the specified type relationship from the description texts of the types, where M is a positive integer.
In the aspect above and any possible implementation thereof, an implementation is further provided in which obtaining the type corresponding to each entity comprises:
aggregating the entities by type according to type classification knowledge, so as to obtain the type corresponding to each entity; or
inputting each entity into a type classification model, so that the type classification model classifies each entity by type, so as to obtain the type corresponding to each entity.
In the aspect above and any possible implementation thereof, an implementation is further provided in which generating the description text of each type according to the description texts of the entities corresponding to that type comprises:
performing word segmentation on the description texts of the entities corresponding to each type, so as to obtain word segmentation results;
matching a type knowledge base against each of the word segmentation results;
if a word segmentation result contains a keyword defined in the type knowledge base, extracting a text fragment that contains that word segmentation result; and
generating the description text of each type according to the extracted text fragments.
In the aspect above and any possible implementation thereof, an implementation is further provided in which extracting, according to the specified type relationship, the M groups of types that satisfy the specified type relationship from the description texts of the types comprises:
obtaining a specified relationship template, where the relationship template corresponds to a type relationship and contains text content indicating the type relationship between two types;
performing character matching in the description text of each type by using the relationship template, and extracting N groups of types from the description texts of the types, where N is a positive integer greater than or equal to M; and
obtaining, from the N extracted groups of types, the M groups of types that satisfy the specified type relationship.
In the aspect above and any possible implementation thereof, an implementation is further provided in which obtaining, from the N extracted groups of types, the M groups of types that satisfy the specified type relationship comprises:
normalizing the names of P types among the N groups of types, where P is a positive integer; and
for the N groups of types after normalization, merging the N groups of types into the M groups of types according to the types shared by different groups and the specified type relationship.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the method further comprises: adding the specified type relationship and the M groups of types that satisfy it to a knowledge graph.
In another aspect, an embodiment of the present invention provides a device for acquiring a type relationship, comprising:
a receiving module, configured to obtain entities and a description text of each entity;
a classification module, configured to obtain the type corresponding to each entity;
a generation module, configured to generate a description text of each type according to the description texts of the entities corresponding to that type; and
an acquisition module, configured to extract, according to a specified type relationship, M groups of types that satisfy the specified type relationship from the description texts of the types, where M is a positive integer.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the classification module is specifically configured to:
aggregate the entities by type according to type classification knowledge, so as to obtain the type corresponding to each entity; or
input each entity into a type classification model, so that the type classification model classifies each entity by type, so as to obtain the type corresponding to each entity.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the generation module is specifically configured to:
perform word segmentation on the description texts of the entities corresponding to each type, so as to obtain word segmentation results;
match a type knowledge base against each of the word segmentation results;
if a word segmentation result contains a keyword defined in the type knowledge base, extract a text fragment that contains that word segmentation result; and
generate the description text of each type according to the extracted text fragments.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the acquisition module is specifically configured to:
obtain a specified relationship template, where the relationship template corresponds to a type relationship and contains text content indicating the type relationship between two types;
perform character matching in the description text of each type by using the relationship template, and extract N groups of types from the description texts of the types, where N is a positive integer greater than or equal to M; and
obtain, from the N extracted groups of types, the M groups of types that satisfy the specified type relationship.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the acquisition module, when obtaining from the N extracted groups of types the M groups of types that satisfy the specified type relationship, is specifically configured to:
normalize the names of P types among the N groups of types, where P is a positive integer; and
for the N groups of types after normalization, merge the N groups of types into the M groups of types according to the types shared by different groups and the specified type relationship.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the device further comprises: a processing module, configured to add the specified type relationship and the M groups of types that satisfy it to a knowledge graph.
As can be seen from the technical solutions above, the embodiments of the present invention have the following beneficial effects:
The technical solution provided by the embodiments of the present invention can automatically acquire the types of entities and the relationships between those types. Compared with the prior-art approach of collecting the relationships between entity types manually, the embodiments collect these relationships automatically and avoid manual collection, which improves the efficiency of acquiring entity types and the relationships between types and reduces the cost of acquiring them.
Embodiment One
An embodiment of the present invention provides a method for acquiring a type relationship. Referring to Fig. 1, which is a schematic flowchart of the method for acquiring a type relationship provided by the embodiment of the present invention, the method comprises the following steps:
S101: obtain entities and the description text of each entity.
Specifically, a number of given entities can be obtained, together with the description text of each of the given entities.
Taking companies as the entities, the description text of a company can be a description of the company's business. For example, it can include a description of the company's scope of business, the company's name, the products the company produces, and the names of the suppliers of the raw materials and necessary components the company needs for production.
For example, for the entity "Huawei Tech Co., Ltd", the description text can be "Huawei Tech Co., Ltd is a private communication technology company that produces and sells communication equipment, headquartered at the Huawei Base, Bantian, Longgang District, Shenzhen, Guangdong Province, China. Huawei's products mainly cover switching networks, transmission networks, wireless and wired fixed access networks and data communication networks within communication networks, as well as wireless terminal products, and Huawei provides hardware, software, services and solutions to telecom operators and private network owners around the world".
S102: obtain the type corresponding to each entity.
Specifically, for each given entity, the type corresponding to the entity needs to be obtained first.
By way of example, methods for obtaining the type corresponding to each entity can include, but are not limited to, the following two:
First: aggregate the entities by type according to type classification knowledge, so as to obtain the type corresponding to each entity.
It can be understood that the type classification knowledge includes a type classification tree, in which the nodes are types and the child nodes of a type are type expansion words.
In a specific implementation, word segmentation can be performed on the description text of each given entity to obtain the word segmentation result corresponding to that description text. The types or type expansion words in the type classification tree are then used to perform character matching in each word segmentation result. If a word segmentation result hits a type or a type expansion word, the type of the entity corresponding to that word segmentation result can be taken to be the type that was hit, or the type corresponding to the type expansion word that was hit. In this way, the type of an entity can be determined from the word segmentation result of its description text. The entities are thereby aggregated by type and the type corresponding to each entity is obtained, which is equivalent to obtaining the entities corresponding to each type.
The following example takes companies as the entities and the industry to which a company belongs as the type of the entity.
For the entity "Huawei Tech Co., Ltd", the description text is "Huawei Tech Co., Ltd is a private communication technology company that produces and sells communication equipment, headquartered at the Huawei Base, Bantian, Longgang District, Shenzhen, Guangdong Province, China. Huawei's products mainly cover switching networks, transmission networks, wireless and wired fixed access networks and data communication networks within communication networks, as well as wireless terminal products, and Huawei provides hardware, software, services and solutions to telecom operators and private network owners around the world".
After word segmentation is performed on this description text, each type or each type's expansion words are used to perform character matching in the word segmentation result. The word segmentation result is found to hit the type "communication equipment manufacturing industry", which is a subtype of "manufacturing industry"; in the embodiment of the present invention, the subtype "communication equipment manufacturing industry" can be taken as the type of the entity. Alternatively, the word segmentation result may hit type expansion words such as "telephone set", "mobile phone" or "network switch", in which case the type of the entity can likewise be determined to be "communication equipment manufacturing industry" from the type corresponding to the expansion words that were hit. In this way, the type of the entity "Huawei Tech Co., Ltd" is obtained, and the same method also yields the entities corresponding to the type "communication equipment manufacturing industry". Clustering the entities thus yields the type of each entity and the entities corresponding to each type.
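The first method can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the tree contents, the function names `classify_entity` and `aggregate_by_type`, and the whitespace-based word segmentation are all assumptions chosen for an English example (a Chinese description text would need a real word segmenter).

```python
import re

# Hypothetical type classification tree: parent type -> subtype -> type expansion words.
TYPE_TREE = {
    "manufacturing": {
        "communication equipment manufacturing": ["telephone", "mobile phone", "network switch"],
        "automobile manufacturing": ["passenger car", "truck"],
    },
}

def classify_entity(description, tree=TYPE_TREE):
    """Return the subtypes whose name or expansion word is hit by the description."""
    # Crude word segmentation: lowercase words joined by single spaces.
    text = " ".join(re.findall(r"\w+", description.lower()))
    hits = []
    for subtypes in tree.values():
        for subtype, expansion_words in subtypes.items():
            # Pad with spaces so that only whole words (or word sequences) match.
            if any(f" {kw} " in f" {text} " for kw in [subtype, *expansion_words]):
                hits.append(subtype)
    return hits

def aggregate_by_type(entities):
    """Group entity names by type; `entities` maps entity name -> description text."""
    grouped = {}
    for name, description in entities.items():
        for type_name in classify_entity(description):
            grouped.setdefault(type_name, []).append(name)
    return grouped
```

Calling `aggregate_by_type` on a mapping of entity names to description texts returns, for each type that was hit, the list of entities assigned to it, which corresponds to obtaining the entities for each type.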
Second: input each entity into a type classification model, so that the type classification model classifies each entity by type, so as to obtain the type corresponding to each entity.
In a specific implementation, a large number of entities, their description texts and their corresponding types can be used as training samples, and machine learning can be performed on the training samples to generate the type classification model. Then, for each given entity, the name of the entity or the description text of the entity can be input into the type classification model so that the model identifies the type of the entity. The type classification model obtains and outputs the type of each entity, which is equivalent to obtaining the entities corresponding to each type.
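The embodiment does not name a learning algorithm, so the following sketch substitutes a multinomial naive Bayes classifier with add-one smoothing purely for illustration. The class name `NaiveBayesTypeClassifier`, the `tokenize` helper and the sample format are assumptions, not part of the source.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Crude word segmentation for English text (an assumption for this sketch)."""
    return re.findall(r"\w+", text.lower())

class NaiveBayesTypeClassifier:
    """Multinomial naive Bayes over description-text tokens, with add-one smoothing."""

    def fit(self, samples):
        """Train on an iterable of (description_text, type_label) pairs."""
        self.word_counts = defaultdict(Counter)  # type label -> token counts
        self.doc_counts = Counter()              # type label -> number of samples
        self.vocab = set()
        for text, label in samples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        """Return the type label with the highest log posterior for the text."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

After `fit` is called on labelled description texts, `predict` takes the description text (or name) of a new entity and returns a type label, mirroring the train-then-classify flow described above.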
S103: generate the description text of each type according to the description texts of the entities corresponding to that type.
Specifically, since the type of each entity is obtained in S102, the entities corresponding to each type can be obtained from the types of the entities. In this step, a description text is generated for each type according to the description texts of the entities corresponding to that type.
By way of example, methods for generating the description text of each type according to the description texts of the entities corresponding to that type can include, but are not limited to, the following:
First, word segmentation is performed on the description texts of the entities corresponding to each type to obtain word segmentation results. Then, the type knowledge base is matched against each of the word segmentation results; if a word segmentation result contains a keyword defined in the type knowledge base, a text fragment containing that word segmentation result is extracted. Finally, the description text of each type is generated from the extracted text fragments.
It should be noted that the type knowledge base can include the names, aliases and type expansion words of the types, among other things.
In a specific implementation, for each type, word segmentation can be performed on the description text of each entity corresponding to the type to obtain the word segmentation result of that entity's description text. Character matching is then performed in each word segmentation result using the type knowledge base. If a word segmentation result hits the name, an alias or a type expansion word of the type, a text fragment containing that word segmentation result, such as the sentence or the passage containing it, can be extracted from the entity's description text. In this way, the corresponding text fragments are extracted from the description texts of the entities corresponding to the type, the extracted fragments are assembled into a set, and the set serves as the description text of the type. Applying this procedure to every type yields the description texts of all types.
Taking companies as the entities and the industry to which an entity belongs as the type, the description text of a type can include "the raw materials of industry a are provided by industry b", "industry a relies on industry b" or "industry a is affected by industry b". Each of these can serve as the description text of industry a, where industry a is a type of company.
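Step S103 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the knowledge base contents and the names `TYPE_KB`, `split_sentences` and `build_type_description` are hypothetical, sentences are used as the text fragments, and keywords are matched as lowercase substrings rather than against a true word segmentation result.

```python
import re

# Hypothetical type knowledge base: type name -> aliases and type expansion words (lowercase).
TYPE_KB = {
    "communication equipment manufacturing": ["telecom equipment", "network switch"],
}

def split_sentences(text):
    """Split a description text into sentence-level fragments."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def build_type_description(type_name, entity_descriptions, kb=TYPE_KB):
    """Collect the sentences that hit the type's name, alias or expansion word.

    The returned list of fragments plays the role of the type's description text.
    """
    keywords = [type_name, *kb.get(type_name, [])]
    fragments = []
    for description in entity_descriptions:
        for sentence in split_sentences(description):
            lowered = sentence.lower()
            if any(kw in lowered for kw in keywords) and sentence not in fragments:
                fragments.append(sentence)
    return fragments
```

Only the sentences that mention the type survive, so unrelated facts about an entity, such as its headquarters or founding year, do not enter the description text of the type.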
S104: extract, according to a specified type relationship, M groups of types that satisfy the specified type relationship from the description texts of the types, where M is a positive integer.
Specifically, after the description text of each type is obtained, a number of groups of types that satisfy the specified type relationship can be extracted from the description texts of the types according to the specified type relationship, where each group of types can include two different types.
By way of example, in the embodiment of the present invention, methods for extracting the M groups of types that satisfy the specified type relationship from the description texts of the types can include, but are not limited to, the following:
First, a specified relationship template is obtained, where the relationship template corresponds to a type relationship and contains text content indicating the type relationship between two types. Then, character matching is performed in the description text of each type by using the relationship template, and N groups of types are extracted from the description texts of the types, where N is a positive integer greater than or equal to M. Finally, the M groups of types that satisfy the specified type relationship are obtained from the N extracted groups of types.
It can be understood that, because a relationship template defines the type relationship between two types, when the template is matched against the description text of a type and content in the description text matches the characters of the template, a group of types can be extracted from the description text. The description text of a type can contain several text fragments, so several groups of types can be extracted from it. The two types in each extracted group satisfy the type relationship corresponding to the relationship template, and the type relationship between the two types is thereby obtained.
Taking companies as the entities and the industry to which an entity belongs as the type, the type relationship corresponding to the specified relationship template is an industry relationship. For example, the relationship templates can include "the raw materials of xx are provided by xx", "xx relies on xx" or "xx is affected by xx". The industry relationship corresponding to these templates is the "downstream-upstream relationship": in each group of two industries extracted from the description text of a type with these templates, the former is the downstream industry of the latter and the latter is the upstream industry of the former. The relationship between the two industries is therefore that of upstream industry and downstream industry.
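Template matching can be sketched with regular expressions, as below. The English template wording, the list `TYPE_NAMES` of known type names and the helper names `compile_template` and `extract_groups` are assumptions made for illustration; the source specifies only that the template text is character-matched against the description text.

```python
import re

# Hypothetical vocabulary of known type names (lowercase).
TYPE_NAMES = ["smartphone industry", "chip industry", "silicon industry"]
# Longest names first so the alternation prefers the most specific name.
TYPE_ALT = "|".join(re.escape(t) for t in sorted(TYPE_NAMES, key=len, reverse=True))

# Relationship templates for the downstream-upstream relationship:
# {a} is the downstream type, {b} is the upstream type.
TEMPLATES = [
    "the raw materials of the {a} are provided by the {b}",
    "the {a} relies on the {b}",
    "the {a} is affected by the {b}",
]

def compile_template(template):
    """Turn a template with {a} and {b} slots into a compiled regular expression."""
    prefix, rest = template.split("{a}")
    middle, suffix = rest.split("{b}")
    return re.compile(
        re.escape(prefix) + f"(?P<a>{TYPE_ALT})" + re.escape(middle)
        + f"(?P<b>{TYPE_ALT})" + re.escape(suffix),
        re.IGNORECASE,
    )

PATTERNS = [compile_template(t) for t in TEMPLATES]

def extract_groups(fragments):
    """Return the (downstream, upstream) groups found in a type's description text."""
    groups = []
    for fragment in fragments:
        for pattern in PATTERNS:
            for match in pattern.finditer(fragment):
                group = (match["a"].lower(), match["b"].lower())
                if group not in groups:
                    groups.append(group)
    return groups
```

Each returned pair is one of the N groups of types; the order inside the pair carries the direction of the relationship, with the former type downstream of the latter.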
By way of example, in the embodiment of the present invention, methods for obtaining the M groups of types that satisfy the specified type relationship from the N extracted groups of types can include, but are not limited to, the following:
First, the names of P types among the N groups of types are normalized, where P is a positive integer less than 2N. Then, for the N groups of types after normalization, the N groups of types are merged into the M groups of types according to the types shared by different groups and the specified type relationship.
For example, if two groups of types contain the same type but under different names, one being the exact name and the other an alias, the name of that type in both groups can be unified to the exact name. Once the names are unified, two groups that contain an identical type can easily be merged into a new group of types, which still satisfies the type relationship satisfied by the two original groups.
Taking industries as the types, suppose the first group of types satisfying the upstream-downstream industry relationship is "industry 1-industry 2" and the second group is "industry 2-industry 3". The two groups can then be combined into a single group, "industry 1-industry 2-industry 3", in which each industry is the upstream industry of the one after it and each industry is the downstream industry of the one before it.
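The normalization and merging steps can be sketched as follows. The alias table and the function names `normalize_groups` and `merge_groups` are hypothetical, and the merge rule shown (join two groups when the last type of one equals the first type of the other) is one simple reading of the "industry 1-industry 2-industry 3" example rather than the only possible strategy.

```python
def normalize_groups(groups, aliases):
    """Replace every alias with its exact name; `aliases` maps alias -> exact name."""
    return [tuple(aliases.get(t, t) for t in group) for group in groups]

def merge_groups(groups):
    """Merge groups that share an end type into longer chains.

    Two chains are joined when the last type of one equals the first type of
    the other, so the order within a chain keeps expressing the relationship.
    """
    chains = [list(g) for g in dict.fromkeys(groups)]  # drop duplicates, keep order
    merged = True
    while merged:
        merged = False
        for i, left in enumerate(chains):
            for j, right in enumerate(chains):
                if i != j and left[-1] == right[0]:
                    chains[i] = left + right[1:]
                    del chains[j]
                    merged = True
                    break
            if merged:
                break
    return [tuple(c) for c in chains]
```

Every merge reduces the number of chains by one, so N groups become M groups with M no greater than N, and groups that share no end type are left unchanged.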
Optionally, in a possible implementation of this embodiment, after the M groups of types satisfying the specified type relationship are obtained in S104, the type relationship and the M groups of types satisfying it can be added to a knowledge graph.
For example, the M groups of types obtained can be added under that type relationship in the knowledge graph, where a group among the M groups can include more than two types. Alternatively, each type in the knowledge graph can be annotated with another type and with the type relationship between the two.
Taking industries as the types, the industry relationship and the groups of industries that satisfy it can be added to a business-related or market-related knowledge graph, indicating that the two or more industries in each group have that industry relationship. This makes it possible to mine the relationships between industries and obtain the relationship between two industries. Relationships between industries can be used to reveal business supply chains and are an important component in processing competitive business intelligence. Relationships between industries are therefore of great practical value.
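As a rough illustration of adding the result to a knowledge graph, the sketch below stores each relationship as a set of directed edges between adjacent types in a group. The `KnowledgeGraph` class, its methods and the relation label `"upstream_of"` are hypothetical; a production system would more likely use an existing graph store.

```python
from collections import defaultdict

class KnowledgeGraph:
    """A toy in-memory graph: relation name -> set of (head, tail) type pairs."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_groups(self, relation, groups):
        """Add every adjacent pair of each group as an edge under the relation."""
        for group in groups:
            for head, tail in zip(group, group[1:]):
                self.edges[relation].add((head, tail))

    def neighbors(self, relation, head):
        """Return the types linked from `head` under the relation, sorted by name."""
        return sorted(t for h, t in self.edges[relation] if h == head)
```

Storing adjacent pairs corresponds to annotating each type with another type and the relationship between the two, so a group of three types contributes two edges.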
An embodiment of the present invention further provides a device embodiment that implements the steps and methods of the method embodiment above.
Referring to Fig. 2, which is a functional block diagram of the device for acquiring a type relationship provided by the embodiment of the present invention, the device comprises:
a receiving module 21, configured to obtain entities and the description text of each entity;
a classification module 22, configured to obtain the type corresponding to each entity;
a generation module 23, configured to generate the description text of each type according to the description texts of the entities corresponding to that type; and
an acquisition module 24, configured to extract, according to a specified type relationship, M groups of types that satisfy the specified type relationship from the description texts of the types, where M is a positive integer.
In a specific implementation, the classification module 22 is specifically configured to:
aggregate the entities by type according to type classification knowledge, so as to obtain the type corresponding to each entity; or input each entity into a type classification model, so that the type classification model classifies each entity by type, so as to obtain the type corresponding to each entity.
In a specific implementation, the generation module 23 is specifically configured to:
perform word segmentation on the description texts of the entities corresponding to each type, so as to obtain word segmentation results;
match the type knowledge base against each of the word segmentation results;
if a word segmentation result contains a keyword defined in the type knowledge base, extract a text fragment that contains that word segmentation result; and
generate the description text of each type according to the extracted text fragments.
In a specific implementation, the acquisition module 24 is specifically configured to:
obtain a specified relationship template, where the relationship template corresponds to a type relationship and contains text content indicating the type relationship between two types;
perform character matching in the description text of each type by using the relationship template, and extract N groups of types from the description texts of the types, where N is a positive integer greater than or equal to M; and
obtain, from the N extracted groups of types, the M groups of types that satisfy the specified type relationship.
In a specific implementation, the acquisition module 24, when obtaining from the N extracted groups of types the M groups of types that satisfy the specified type relationship, is specifically configured to:
normalize the names of P types among the N groups of types, where P is a positive integer; and
for the N groups of types after normalization, merge the N groups of types into the M groups of types according to the types shared by different groups and the specified type relationship.
Optionally, in a possible implementation of this embodiment, the device further comprises:
a processing module 25, configured to add the specified type relationship and the M groups of types that satisfy it to a knowledge graph.
Since the units in this embodiment can perform the method shown in Fig. 1, for the parts of this embodiment not described in detail, refer to the description of Fig. 1.
The technical solution of the embodiment of the present invention has the following beneficial effects:
In the embodiment of the present invention, entities and the description text of each entity are obtained; the type corresponding to each entity is then obtained, and the description text of each type is generated according to the description texts of the entities corresponding to that type; M groups of types satisfying a specified type relationship are then extracted from the description texts of the types according to the specified type relationship, where M is a positive integer.
The technical solution provided by the embodiment of the present invention can automatically acquire the types of entities and the relationships between those types. Compared with the prior-art approach of collecting the relationships between entity types manually, the embodiment collects these relationships automatically and avoids manual collection, which improves the efficiency of acquiring entity types and the relationships between types, reduces the cost of acquiring them, and saves manpower and material resources.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system, device and units described above can be found in the corresponding processes of the foregoing method embodiment and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, device and method can be implemented in other ways. For example, the device embodiment described above is merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed can be implemented through certain interfaces, and the indirect coupling or communication connection between devices or units can be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they can be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention can be integrated into one processing unit, each unit can exist physically on its own, or two or more units can be integrated into one unit. The integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit can be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions that cause a computer device (which can be a personal computer, a server, a network device or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.