Disclosure of Invention
The technical problem to be solved by the present application is to provide a knowledge-graph-based method for constructing enterprise association relationships, together with a corresponding construction system. The method and system are based on enterprise business information and, optionally, combine enterprise-related structured and unstructured data to mine, analyze, and construct information about each dimension of an enterprise. They can reduce the demand on computing resources while improving the accuracy of enterprise information and greatly enriching the data dimensions.
To solve this technical problem, the application discloses an enterprise association relationship construction method comprising the following steps.
Step S110: using a graph computing data structure, construct a knowledge graph from the equity data and senior management data in the enterprise business information; the knowledge graph reflects the shareholder investment relationships and senior management appointment relationships of enterprises and marks identical natural persons.
Step S120: based on the enterprise business information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature.
Step S130: based on enterprise structured information, extend the attributes of the enterprise nodes in the knowledge graph.
Step S140: based on the enterprise structured information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature.
Step S150: extract structured triple information from enterprise unstructured information.
Step S160: based on the enterprise unstructured information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature.
Step S170: extend and update the knowledge graph based on timeliness information.
This enterprise association relationship construction method is embodiment one of the application, in which the enterprise association relationship is constructed based on enterprise business information.
Further, in step S110, the equity data and senior management data in the enterprise business information are first cleaned, and the knowledge graph is then constructed from the cleaned data. This avoids the extra burden that invalid or erroneous data would place on graph construction and prevents such data from interfering with subsequent operations.
Further, the data cleaning comprises one or more of: validity cleaning of basic enterprise attributes, validity checking and cleaning of equity ratios, validity cleaning of senior management data, data consistency checking, invalid data elimination, and missing data filling. This is a preferred implementation of data cleaning.
Further, in step S110, the equity data and senior management data in the enterprise business information are standardized, and the knowledge graph is then constructed from the standardized data. This facilitates subsequent graph operations and avoids errors or deviations caused by non-standardized data.
Further, the data standardization includes one or more of the following operations: first, the business-registered address information is decomposed and standardized; second, the business-registered mailbox domain names and website domain names are standardized and public domain names are deleted; third, the business-registered telephone information is standardized. This is a preferred implementation of data standardization.
Further, the equity data and senior management data in the enterprise business information are first cleaned and then standardized, and the knowledge graph is constructed from the cleaned and standardized data. This is a preferred ordering of the two operations.
Furthermore, in the knowledge graph, each enterprise and each of its direct shareholders and senior managers is a node; the direct investment relationship from a direct shareholder node to an enterprise node is represented by a first-class edge; the appointment relationship of a senior manager node at an enterprise node is represented by a second-class edge; and the same-natural-person relationship is represented by a third-class edge. This is a preferred implementation of constructing the knowledge graph.
Further, each node has three attributes: entity ID, entity name, and entity type. An enterprise node additionally has the following attributes: business-registered address, business-registered mailbox domain name, business-registered website domain name, business-registered telephone, former names of the enterprise, and product names of the enterprise. The node attributes are used in subsequent graph computations.
Further, every edge has a type attribute that distinguishes the different classes of edges.
Further, in step S120, when the senior manager nodes connected to any two enterprise nodes by second-class edges are the same node, or are connected to each other by a third-class edge, a fourth-class edge representing a shared senior manager is added between the two enterprise nodes. This is a preferred implementation of adding, based on enterprise business information, edges that indicate an associated feature between enterprises.
Further, in step S170, the update mode comprises one or a combination of real-time update, incremental update, and full update. These are several common ways of updating a knowledge graph.
Furthermore, the enterprise business data is updated in real time. This is a preferred implementation.
Further, the enterprise structured information and the enterprise unstructured information are updated incrementally; the incremental update includes updating, in the knowledge graph, the connections of the edges that indicate associated features between enterprises and the attributes of those edges, where the edge attributes include timeliness attributes. This is a preferred implementation, and it emphasizes that the timeliness information in the knowledge graph is updated as well.
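The incremental update described above can be sketched as an upsert over an edge store. This is a minimal illustration only: the dictionary layout, the key shape `(node_a, node_b, edge_type)`, the attribute name `latest_created`, and the convention that `None` removes an edge are all assumptions made for the example, not details given in the application.

```python
def apply_increment(edges, batch):
    """Apply one incremental batch to an edge store (hypothetical layout).

    edges: dict mapping (node_a, node_b, edge_type) -> attribute dict
    batch: list of (key, attrs) pairs; attrs=None is assumed to mean
           "this relation no longer holds".
    """
    for key, attrs in batch:
        if attrs is None:
            edges.pop(key, None)          # drop a connection that disappeared
        elif key in edges:
            edges[key].update(attrs)      # refresh attributes, incl. timeliness
        else:
            edges[key] = dict(attrs)      # add a newly observed connection
    return edges


edges = {("e1", "e2", 7): {"domains": ["x.example"], "latest_created": "2020-01-01"}}
batch = [
    (("e1", "e2", 7), {"latest_created": "2021-06-30"}),
    (("e2", "e3", 9), {"latest_created": "2021-05-01"}),
]
apply_increment(edges, batch)
```

Only the edges named in the batch are touched, which is what distinguishes this mode from a full update that rebuilds every edge.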
The application also discloses an enterprise association relationship construction method comprising the following steps.
Step S110: using a graph computing data structure, construct a knowledge graph from the equity data and senior management data in the enterprise business information; the knowledge graph reflects the shareholder investment relationships and senior management appointment relationships of enterprises and marks identical natural persons.
Step S120: based on the enterprise business information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature.
Step S330: based on enterprise structured information, extend the attributes of the enterprise nodes in the knowledge graph.
Step S340: based on the enterprise structured information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature.
Step S170: extend and update the knowledge graph based on timeliness information.
This enterprise association relationship construction method is embodiment two of the application, in which the enterprise association relationship is constructed based on enterprise business information and enterprise structured information.
Further, in step S330, the node attributes in the knowledge graph are extended based on the intellectual property information of enterprises. The intellectual property information of enterprises is collected, and three items of timeliness information are attached to each piece of intellectual property information: application time, grant time, and expiration time. The name of the enterprise that owns each piece of intellectual property information is obtained and searched for in the entity name attribute and the former-name attribute of each enterprise node to find the corresponding enterprise node; an intellectual property attribute is added to that node, and the intellectual property information is stored in it. The intellectual property attribute includes the intellectual property type, application time, grant time, and expiration time. This is a first implementation of extending node attributes in the knowledge graph based on enterprise structured information.
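The lookup by current and former enterprise name can be sketched as a name index over the enterprise nodes. The field names (`name`, `former_names`, `owner`, `ip_id`, `applied`, `granted`, `expires`) and the exact-string matching are assumptions made for this illustration; the application does not specify how names are compared.

```python
def attach_ip(nodes, ip_records):
    """Attach IP records to enterprise nodes by current or former name.

    nodes: dict node_id -> {"name": str, "former_names": [str, ...], ...}
    ip_records: list of dicts with an "owner" name plus IP fields.
    Returns the records whose owner matched no node. Mutates `nodes`.
    """
    # Index every current and former name to its node id (exact match assumed).
    index = {}
    for node_id, node in nodes.items():
        for name in [node["name"], *node.get("former_names", [])]:
            index[name] = node_id

    unmatched = []
    for rec in ip_records:
        node_id = index.get(rec["owner"])
        if node_id is None:
            unmatched.append(rec)
            continue
        entry = {k: rec.get(k) for k in ("ip_id", "ip_type", "applied", "granted", "expires")}
        nodes[node_id].setdefault("ip", []).append(entry)
    return unmatched


nodes = {"e1": {"name": "Acme Ltd", "former_names": ["Acme Works"]}}
records = [
    {"owner": "Acme Works", "ip_id": "TM-1", "ip_type": "trademark",
     "applied": "2015-03-01", "granted": "2016-04-01", "expires": "2026-04-01"},
    {"owner": "Unknown Co", "ip_id": "PT-9", "ip_type": "patent",
     "applied": "2018-01-01", "granted": "2019-01-01", "expires": "2038-01-01"},
]
unmatched = attach_ip(nodes, records)
```

The trademark registered under the former name "Acme Works" ends up on node `e1`, while the record with no matching node is returned for separate handling.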
Further, in step S330, the node attributes in the knowledge graph are extended based on the business card information of enterprise employees. Business card information of enterprise employees is collected and personal private information is removed, so that the remaining business card information is public information about the enterprise. A creation time is added to each piece of business card information. A hash value is computed over the public information of each business card, and business cards whose public information has the same hash value are aggregated into one piece of enterprise business card template information, whose creation time is the earliest creation time among the aggregated business cards. For each piece of enterprise business card template information, the enterprise name is searched for in one or more of the entity name attribute of each enterprise node, the former-name attribute, the trademark information in the intellectual property attribute, and the product-name attribute, to find the corresponding enterprise node. A business card template attribute is added to that node; it includes the enterprise name, address, mailbox domain name, website domain name, enterprise telephone, and creation time. This is a second implementation of extending node attributes in the knowledge graph based on enterprise structured information.
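The hash-based aggregation of business cards into templates can be sketched as follows. The choice of SHA-256 over a JSON serialization, the field names in `PUBLIC_FIELDS`, and the use of ISO date strings for creation times are assumptions for the example; the application only states that a hash of the public information is used.

```python
import hashlib
import json

# Fields assumed to be the enterprise's public information on a card.
PUBLIC_FIELDS = ("company", "address", "mail_domain", "web_domain", "phone")


def build_templates(cards):
    """Group cards by a hash of their public fields.

    Each resulting template keeps the earliest creation time of its cards.
    Personal fields (names, mobile numbers) are never copied into a template.
    """
    templates = {}
    for card in cards:
        public = {f: card.get(f) for f in PUBLIC_FIELDS}
        digest = hashlib.sha256(
            json.dumps(public, sort_keys=True).encode("utf-8")
        ).hexdigest()
        tpl = templates.setdefault(digest, {**public, "created": card["created"]})
        # ISO-8601 date strings order correctly under plain string comparison.
        tpl["created"] = min(tpl["created"], card["created"])
    return list(templates.values())


acme = {"company": "Acme Ltd", "address": "1 Example Road",
        "mail_domain": "acme.example", "web_domain": "acme.example",
        "phone": "12345678"}
cards = [
    {**acme, "person": "A", "mobile": "111", "created": "2020-05-01"},
    {**acme, "person": "B", "mobile": "222", "created": "2019-02-01"},
    {"company": "Other Ltd", "address": "2 Example Road",
     "mail_domain": "other.example", "web_domain": "other.example",
     "phone": "87654321", "person": "C", "created": "2021-01-01"},
]
templates = build_templates(cards)
```

Two cards from different employees of the same enterprise collapse into one template, so later matching against enterprise nodes runs once per template rather than once per card.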
Further, step S340 includes any one or more of the following operations. When the intellectual property attributes of any two enterprise nodes contain at least one identical piece of intellectual property information, judged by intellectual property ID, a fifth-class edge representing shared intellectual property is added between the two enterprise nodes. When the sets formed by the business-registered address attribute and the business card template attribute of any two enterprise nodes have at least one identical or similar address in common, where similar means located in the same office building, a sixth-class edge representing a shared address is added between the two enterprise nodes. When the sets formed by the business-registered mailbox domain name attribute and the business card template attribute of any two enterprise nodes have at least one identical mailbox domain name in common, a seventh-class edge representing a shared mailbox domain name is added between the two enterprise nodes; the seventh-class edge has a timeliness attribute. When the sets formed by the business-registered website domain name attribute and the business card template attribute of any two enterprise nodes have at least one identical website domain name in common, an eighth-class edge representing a shared website domain name is added between the two enterprise nodes; the eighth-class edge has a timeliness attribute. When the sets formed by the business-registered telephone attribute and the business card template attribute of any two enterprise nodes have at least one identical telephone in common, meaning that the main telephone numbers are the same after the area code and extension are removed, a ninth-class edge representing a shared telephone is added between the two enterprise nodes. These are preferred implementations of adding, based on enterprise structured information, edges that indicate associated features between enterprises.
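The shared-value rules above (same mailbox domain, same website domain, same telephone) all follow one pattern, which can be sketched with an inverted index from value to enterprises. The function name, the input layout, and the edge key shape `(node_a, node_b, edge_type)` are assumptions for the illustration. The "similar address" rule would need an extra step that maps addresses to buildings and is not covered here.

```python
from collections import defaultdict
from itertools import combinations


def shared_value_edges(values_by_enterprise, edge_type):
    """Return {(a, b, edge_type): shared values} for enterprises sharing a value.

    values_by_enterprise: dict enterprise_id -> set of values, for example the
    union of the registered mailbox domain and the domains on its card templates.
    """
    # Inverted index: value -> enterprises that carry it.
    index = defaultdict(set)
    for ent, values in values_by_enterprise.items():
        for value in values:
            index[value].add(ent)

    edges = {}
    for value, ents in index.items():
        # Only enterprises that actually share a value are paired.
        for a, b in combinations(sorted(ents), 2):
            edges.setdefault((a, b, edge_type), set()).add(value)
    return edges


domains = {
    "e1": {"x.example"},
    "e2": {"x.example", "y.example"},
    "e3": {"y.example"},
    "e4": {"z.example"},
}
edges = shared_value_edges(domains, 7)  # 7 = seventh-class edge
```

Indexing by value avoids comparing every pair of enterprises, which is one way such a construction could keep computing cost down.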
Further, the fifth-class to ninth-class edges all have timeliness attributes. For the two enterprise nodes connected by any fifth-class edge, the earliest application time, the earliest grant time, and the latest expiration time among the identical intellectual property information in their intellectual property attributes are taken as the timeliness attributes of the fifth-class edge. If the business card template attributes of the two enterprise nodes connected by any sixth-class edge contain identical or similar address information, the latest creation time in the business card template attributes of the two nodes is taken as the timeliness attribute of the sixth-class edge. If the business card template attributes of the two enterprise nodes connected by any seventh-class edge contain the same mailbox domain name, the latest creation time in the business card template attributes of the two nodes is taken as the timeliness attribute of the seventh-class edge. If the business card template attributes of the two enterprise nodes connected by any eighth-class edge contain the same website domain name, the latest creation time in the business card template attributes of the two nodes is taken as the timeliness attribute of the eighth-class edge. If the business card template attributes of the two enterprise nodes connected by any ninth-class edge contain the same telephone, the latest creation time in the business card template attributes of the two nodes is taken as the timeliness attribute of the ninth-class edge. This specifies the timeliness attributes of the edges that are added based on enterprise structured information to indicate associated features between enterprises.
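The timeliness attributes of a fifth-class edge can be sketched as a min/max over the intellectual property records shared by the two nodes. The record fields (`ip_id`, `applied`, `granted`, `expires`) and the ISO date strings are assumptions for the example, as is the simplification that both nodes hold identical dates for a shared record, so only one side's copy is read.

```python
def fifth_edge_timeliness(ip_a, ip_b):
    """Timeliness attributes for the edge between two enterprise nodes.

    ip_a, ip_b: lists of IP records (dicts) held by each node.
    Returns None when the nodes share no IP record (no fifth-class edge).
    """
    ids_b = {rec["ip_id"] for rec in ip_b}
    shared = [rec for rec in ip_a if rec["ip_id"] in ids_b]
    if not shared:
        return None
    return {
        "earliest_applied": min(rec["applied"] for rec in shared),
        # A pending application may have no grant date yet; skip those.
        "earliest_granted": min(
            (rec["granted"] for rec in shared if rec.get("granted")), default=None
        ),
        "latest_expires": max(rec["expires"] for rec in shared),
    }


p1 = {"ip_id": "P1", "applied": "2015-01-01", "granted": "2016-01-01", "expires": "2035-01-01"}
p2 = {"ip_id": "P2", "applied": "2012-06-01", "granted": "2014-06-01", "expires": "2032-06-01"}
p3 = {"ip_id": "P3", "applied": "2010-01-01", "granted": "2011-01-01", "expires": "2030-01-01"}
timeliness = fifth_edge_timeliness([p1, p2, p3], [p1, p2])
```

Record `P3` is held by only one node, so its earlier application date does not affect the edge.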
Further, step S120 forms a first group and steps S330 to S340 form a second group; the two groups may be executed in that order, in the reverse order, simultaneously, or alternately. This states that the order of these steps in embodiment two is not limited.
The application also discloses an enterprise association relationship construction method comprising the following steps.
Step S110: using a graph computing data structure, construct a knowledge graph from the equity data and senior management data in the enterprise business information; the knowledge graph reflects the shareholder investment relationships and senior management appointment relationships of enterprises and marks identical natural persons.
Step S120: based on the enterprise business information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature.
Step S430: extract structured triple information from enterprise unstructured information.
Step S440: based on the enterprise unstructured information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature.
Step S170: extend and update the knowledge graph based on timeliness information.
This enterprise association relationship construction method is embodiment three of the application, in which the enterprise association relationship is constructed based on enterprise business information and enterprise unstructured information.
Further, in step S430, the triple information is extracted from court judgment documents involving the enterprise. A triple is defined as entity-relationship-entity, where the relationship is one of: co-plaintiff relationship, co-defendant relationship, and plaintiff-defendant relationship. This is a first implementation of extracting triples from enterprise unstructured information.
Further, in step S430, the triple information is extracted from bidding documents involving the enterprise. A triple is defined as entity-relationship-entity, where the relationship is one of: common bidder relationship, common winning bidder relationship, and winning bidder relationship. This is a second implementation of extracting triples from enterprise unstructured information.
Further, in step S440, each entity in the triple information is searched for in the knowledge graph against one or more of the entity name attribute of each enterprise node, the former-name attribute, the trademark information in the intellectual property attribute, and the product-name attribute, to find the two enterprise nodes corresponding to the triple information; when any two enterprise nodes correspond to the same triple information, a tenth-class edge representing an unstructured-information relationship is added between the two enterprise nodes. This is a preferred implementation of adding, based on enterprise unstructured information, edges that indicate associated features between enterprises.
Furthermore, no matter how many pieces of triple information correspond to two enterprise nodes, only one tenth-class edge is added between them; the attribute of the tenth-class edge is the union of the relationships in all the triple information corresponding to the two enterprise nodes; and the tenth-class edge has timeliness attributes, namely the earliest creation time and the latest creation time among the corresponding triple information. This is a first specific implementation of the edges added based on enterprise unstructured information.
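The single-edge variant can be sketched as a fold over the matched triples. The tuple layout `(enterprise_a, relation, enterprise_b, created)`, the ISO date strings, and the decision to treat the edge as undirected by sorting the endpoint pair are assumptions for the illustration; direction matters for a plaintiff-defendant relationship and would need to be recorded separately.

```python
def merge_triples(triples):
    """Collapse all triples between two enterprises into one tenth-class edge.

    triples: iterable of (enterprise_a, relation, enterprise_b, created).
    Returns {(a, b): {"relations": set, "earliest": str, "latest": str}}.
    """
    edges = {}
    for a, relation, b, created in triples:
        key = tuple(sorted((a, b)))  # undirected: (e2, e1) and (e1, e2) coincide
        edge = edges.setdefault(
            key, {"relations": set(), "earliest": created, "latest": created}
        )
        edge["relations"].add(relation)                     # union of relationships
        edge["earliest"] = min(edge["earliest"], created)   # earliest creation time
        edge["latest"] = max(edge["latest"], created)       # latest creation time
    return edges


triples = [
    ("e2", "co-defendant", "e1", "2019-03-01"),
    ("e1", "co-plaintiff", "e2", "2021-07-15"),
    ("e1", "co-defendant", "e3", "2020-01-01"),
]
tenth_edges = merge_triples(triples)
```

The alternative variant, with one edge per triple, would simply keep each tuple as its own edge with its own creation time instead of merging by endpoint pair.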
Furthermore, when a plurality of pieces of triple information correspond to two enterprise nodes, a plurality of tenth-class edges are added between the two enterprise nodes; the attribute of each tenth-class edge is the relationship in the one piece of triple information it corresponds to; and each tenth-class edge has a timeliness attribute, namely the creation time of the corresponding triple information. This is a second specific implementation of the edges added based on enterprise unstructured information.
Further, step S120 forms a first group and steps S430 to S440 form a third group; the two groups may be executed in that order, in the reverse order, simultaneously, or alternately. This states that the order of these steps in embodiment three is not limited.
The application also discloses an enterprise association relationship construction method comprising the following steps.
Step S110: using a graph computing data structure, construct a knowledge graph from the equity data and senior management data in the enterprise business information; the knowledge graph reflects the shareholder investment relationships and senior management appointment relationships of enterprises and marks identical natural persons.
Step S120: based on the enterprise business information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature.
Step S130: based on enterprise structured information, extend the attributes of the enterprise nodes in the knowledge graph.
Step S140: based on the enterprise structured information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature.
Step S150: extract structured triple information from enterprise unstructured information.
Step S160: based on the enterprise unstructured information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature.
Step S170: extend and update the knowledge graph based on timeliness information.
This enterprise association relationship construction method is embodiment four of the application, in which the enterprise association relationship is constructed based on enterprise business information, enterprise structured information, and enterprise unstructured information.
Further, step S120 forms a first group, steps S330 to S340 form a second group, and steps S430 to S440 form a third group; the three groups may be executed in that order, in the reverse order, simultaneously, or alternately. This states that the order of these steps is not limited.
The application also discloses an enterprise association relationship construction system comprising a graph construction module, a first extension module, and a timeliness update module. The graph construction module is configured to construct, using a graph computing data structure, a knowledge graph from the equity data and senior management data in the enterprise business information; the knowledge graph reflects the shareholder investment relationships and senior management appointment relationships of enterprises and marks identical natural persons. The first extension module is configured to add to the knowledge graph, based on the enterprise business information, one or more edges indicating that enterprises share an associated feature. The timeliness update module is configured to extend and update the knowledge graph based on timeliness information. This enterprise association relationship construction system is embodiment one of the application, in which the enterprise association relationship is constructed based on enterprise business information.
The application also discloses an enterprise association relationship construction system comprising a graph construction module, a first extension module, a second extension module, a third extension module, and a timeliness update module. The graph construction module is configured to construct, using a graph computing data structure, a knowledge graph from the equity data and senior management data in the enterprise business information; the knowledge graph reflects the shareholder investment relationships and senior management appointment relationships of enterprises and marks identical natural persons. The first extension module is configured to add to the knowledge graph, based on the enterprise business information, one or more edges indicating that enterprises share an associated feature. The second extension module is configured to extend the attributes of the enterprise nodes in the knowledge graph based on enterprise structured information. The third extension module is configured to add to the knowledge graph, based on the enterprise structured information, one or more edges indicating that enterprises share an associated feature. The timeliness update module is configured to extend and update the knowledge graph based on timeliness information. This enterprise association relationship construction system is embodiment two of the application, in which the enterprise association relationship is constructed based on enterprise business information and enterprise structured information.
The application also discloses an enterprise association relationship construction system comprising a graph construction module, a first extension module, an information extraction module, a fourth extension module, and a timeliness update module. The graph construction module is configured to construct, using a graph computing data structure, a knowledge graph from the equity data and senior management data in the enterprise business information; the knowledge graph reflects the shareholder investment relationships and senior management appointment relationships of enterprises and marks identical natural persons. The first extension module is configured to add to the knowledge graph, based on the enterprise business information, one or more edges indicating that enterprises share an associated feature. The information extraction module is configured to extract structured triple information from enterprise unstructured information. The fourth extension module is configured to add to the knowledge graph, based on the enterprise unstructured information, one or more edges indicating that enterprises share an associated feature. The timeliness update module is configured to extend and update the knowledge graph based on timeliness information. This enterprise association relationship construction system is embodiment three of the application, in which the enterprise association relationship is constructed based on enterprise business information and enterprise unstructured information.
The application also discloses an enterprise association relationship construction system comprising a graph construction module, a first extension module, a second extension module, a third extension module, an information extraction module, a fourth extension module, and a timeliness update module. The graph construction module is configured to construct, using a graph computing data structure, a knowledge graph from the equity data and senior management data in the enterprise business information; the knowledge graph reflects the shareholder investment relationships and senior management appointment relationships of enterprises and marks identical natural persons. The first extension module is configured to add to the knowledge graph, based on the enterprise business information, one or more edges indicating that enterprises share an associated feature. The second extension module is configured to extend the attributes of the enterprise nodes in the knowledge graph based on enterprise structured information. The third extension module is configured to add to the knowledge graph, based on the enterprise structured information, one or more edges indicating that enterprises share an associated feature. The information extraction module is configured to extract structured triple information from enterprise unstructured information. The fourth extension module is configured to add to the knowledge graph, based on the enterprise unstructured information, one or more edges indicating that enterprises share an associated feature. The timeliness update module is configured to extend and update the knowledge graph based on timeliness information. This enterprise association relationship construction system is embodiment four of the application, in which the enterprise association relationship is constructed based on enterprise business information, enterprise structured information, and enterprise unstructured information.
The technical effect of the application is as follows: the knowledge graph is constructed and stored in a graph database; enterprise business information, enterprise structured information, and/or enterprise unstructured information serve as data sources; the associated features of enterprises are represented in the knowledge graph through node attributes, the creation of edges, and edge attributes; and timeliness information is provided throughout.
Detailed Description
Referring to Fig. 1, embodiment one of the enterprise association relationship construction method provided by the present application includes the following steps.
Step S110: using a graph computing data structure, construct a knowledge graph from the equity data and senior management data in the enterprise business information; the knowledge graph reflects the shareholder investment relationships and senior management appointment relationships of enterprises and marks identical natural persons.
The enterprise business information refers to the information an enterprise registers with the administration for industry and commerce, and includes the enterprise name, enterprise address, registered capital, equity data, senior management data, and the like. The equity data refers to the direct shareholders of the enterprise and their capital contribution ratios. The senior management data refers to information on the senior managers of the enterprise, such as legal representatives, directors, and supervisors.
Preferably, in step S110, the equity data and senior management data in the enterprise business information are first cleaned, and the knowledge graph is then constructed from the cleaned data. The data cleaning comprises one or more of: validity cleaning of basic enterprise attributes, validity checking and cleaning of equity ratios, validity cleaning of senior management data, data consistency checking, invalid data elimination, and missing data filling.
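Two of the cleaning operations, invalid data elimination and equity ratio validity checking, can be sketched as below. The concrete rules (required names, a ratio in the range (0, 1], direct ratios of one enterprise summing to at most 100%) and the field names are assumptions chosen for illustration; the application names the kinds of cleaning but not the rules.

```python
def clean_equity_records(records):
    """Drop equity records that fail basic validity checks (illustrative rules)."""
    cleaned = []
    for rec in records:
        # Invalid-data elimination: both names are required.
        if not rec.get("shareholder") or not rec.get("enterprise"):
            continue
        # Equity-ratio validity: assumed to be a fraction in (0, 1].
        ratio = rec.get("ratio")
        if ratio is None or not (0.0 < ratio <= 1.0):
            continue
        cleaned.append(rec)
    return cleaned


def ratios_consistent(records, enterprise, tol=1e-6):
    """Consistency check: direct shareholders' ratios must not exceed 100% in total."""
    total = sum(r["ratio"] for r in records if r["enterprise"] == enterprise)
    return total <= 1.0 + tol


raw = [
    {"shareholder": "A", "enterprise": "X", "ratio": 0.6},
    {"shareholder": "", "enterprise": "X", "ratio": 0.4},   # missing name
    {"shareholder": "B", "enterprise": "X", "ratio": 1.5},  # impossible ratio
    {"shareholder": "C", "enterprise": "X", "ratio": 0.4},
]
cleaned = clean_equity_records(raw)
```

Records rejected here never become nodes or edges, so they cannot add load to graph construction or distort later edge derivation.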
Preferably, in step S110, the equity data and senior management data in the enterprise business information are standardized, and the knowledge graph is then constructed from the standardized data. The data standardization includes one or more of the following operations. First, the business-registered address information is decomposed and standardized, each address being decomposed into its province, city, district, road, and park. Second, the business-registered mailbox domain names and website domain names are standardized: they are uniformly converted to upper case or to lower case, all punctuation marks are converted to half-width characters, and public domain names such as 163.com, qq.com, sina.com, gmail.com, and sina.com.cn are deleted. Third, the business-registered telephone information is standardized, each telephone number being decomposed into area code, main telephone number, and extension number.
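The domain and telephone standardization can be sketched with the Python standard library. The use of Unicode NFKC normalization to fold full-width characters, the choice of lower case, and the hyphen-separated telephone format `area-main-extension` are assumptions for the example; real registration data would need more tolerant parsing.

```python
import re
import unicodedata

# Public mail domains listed in the text; they are shared by unrelated
# enterprises, so they carry no association signal and are dropped.
PUBLIC_DOMAINS = {"163.com", "qq.com", "sina.com", "gmail.com", "sina.com.cn"}


def normalize_domain(raw):
    """Fold full-width characters, lower-case, and drop public domains."""
    # NFKC maps full-width letters and punctuation to their ASCII forms.
    domain = unicodedata.normalize("NFKC", raw).strip().lower()
    return None if domain in PUBLIC_DOMAINS else domain


def split_phone(raw):
    """Split a number into area code, main number, and extension.

    Assumes parts are separated by hyphens or spaces; a two-part number is
    read as area code plus main number.
    """
    text = unicodedata.normalize("NFKC", raw).strip()
    parts = [p for p in re.split(r"[-\s]+", text) if p]
    if not parts:
        raise ValueError("empty telephone number")
    if len(parts) == 1:
        area, main, ext = None, parts[0], None
    elif len(parts) == 2:
        area, main, ext = parts[0], parts[1], None
    else:
        area, main, ext = parts[0], parts[1], parts[2]
    return {"area": area, "main": main, "ext": ext}
```

For instance, `normalize_domain("QQ.com")` returns `None`, and `split_phone("021-12345678-801")` separates the main number `12345678` that later telephone comparisons rely on.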
Preferably, the equity data and senior management data in the enterprise business information are first cleaned and then standardized, and the knowledge graph is constructed from the cleaned and standardized data.
Referring to Fig. 2, the construction of the knowledge graph specifically includes the following steps.
Step S210: take each enterprise in the enterprise business information, together with its direct shareholders and senior managers, as a node in the graph. Each node has three attributes: entity ID, entity name, and entity type. The entity ID is a unique ID assigned to each node as its unique identifier. The entity name is the name of an organization or of a natural person. Entity types include E, P, G, S, Z, and so on, where E denotes the various types of enterprise, such as individually owned businesses, sole proprietorship enterprises, partnerships, and enterprise legal persons; P denotes a natural person; G denotes a government agency; S denotes a public institution; and Z denotes a social organization. An enterprise node additionally has the following attributes: business-registered address, business-registered mailbox domain name, business-registered website domain name, business-registered telephone, former names of the enterprise, and product names of the enterprise.
Step S220: based on the equity data of each enterprise, add a first-class edge representing the direct investment relationship between the enterprise node and each of its direct shareholder nodes. A first-class edge is directed, for example from the direct shareholder to the enterprise; the opposite direction may be used instead. The attribute of a first-class edge is the direct investment ratio.
Step S230: and adding a second class of edges representing the high management and duties relationship between the enterprise nodes and the high manager nodes thereof based on the high management and duties data of each enterprise. The second type of edge may or may not have a direction. The attribute of the second class of edges is the job title of the job.
Step S240: and adding third edges representing the relation of the same natural person between every two natural person nodes which have the same name and are actually the same natural person. The third type of edge is preferably non-directional. The attributes of the third class of edges are the same natural human relationship.
The execution sequence of steps S220 to S240 is not strictly limited, and the two are allowed to be either interchanged in sequence, or performed simultaneously or interleaved.
Preferably, every edge in the knowledge graph has a type attribute that distinguishes the first class of edges, the second class of edges, … ….
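The node and edge structure built in steps S210 to S240 can be sketched with plain Python data classes. This is only an illustration under stated assumptions: the class names, field names, edge type labels and sample entities are hypothetical, and a production system would more likely use a graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    entity_id: str
    name: str
    entity_type: str                 # "E", "P", "G", "S" or "Z"
    attrs: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    edge_type: str                   # hypothetical labels, one per edge class
    directed: bool
    attrs: dict = field(default_factory=dict)


class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}              # entity_id -> Node
        self.edges = []              # list of Edge

    def add_node(self, node):
        self.nodes[node.entity_id] = node

    def add_edge(self, source, target, edge_type, directed, **attrs):
        # Refuse edges whose endpoints were never added as nodes.
        for endpoint in (source, target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node: {endpoint}")
        self.edges.append(Edge(source, target, edge_type, directed, attrs))


graph = KnowledgeGraph()
graph.add_node(Node("n1", "Example Trading Co.", "E"))
graph.add_node(Node("n2", "Zhang San", "P"))
graph.add_node(Node("n3", "Zhang San", "P"))

# First class: directed, shareholder -> enterprise, with the investment proportion.
graph.add_edge("n2", "n1", "investment", True, ratio=0.6)
# Second class: position relationship, carrying the position title.
graph.add_edge("n3", "n1", "position", False, title="director")
# Third class: undirected link between two nodes that are the same natural person.
graph.add_edge("n2", "n3", "same_person", False)
```

The `edge_type` field plays the role of the type attribute that distinguishes the classes of edges.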
Step S120: based on the enterprise business information, one or more edges characterizing enterprises as having associated features are added to the knowledge graph.
For example, when two enterprise nodes are each connected by a second-class edge to a senior executive node, and the two senior executive nodes are the same node or are connected to each other by a third-class edge, a fourth class of edge representing a shared senior executive is added between the two enterprise nodes. A fourth-class edge is preferably undirected.
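One possible way to derive the fourth-class edges is sketched below. It merges natural person nodes linked by third-class edges with a small union-find structure and then pairs up the enterprises served by each merged person. Treating the same-person relationship as transitive is an assumption of this sketch, as are the function and parameter names.

```python
from collections import defaultdict
from itertools import combinations


def same_executive_edges(positions, same_person_pairs):
    """positions: (person_id, enterprise_id) pairs from second-class edges.
    same_person_pairs: (person_id, person_id) pairs from third-class edges.
    Returns a set of undirected enterprise pairs, each stored as a sorted tuple."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Merge person nodes that are marked as the same natural person.
    for a, b in same_person_pairs:
        parent[find(a)] = find(b)

    # Group enterprises by the merged person who holds a position in them.
    by_person = defaultdict(set)
    for person_id, enterprise_id in positions:
        by_person[find(person_id)].add(enterprise_id)

    edges = set()
    for enterprises in by_person.values():
        edges.update(combinations(sorted(enterprises), 2))
    return edges
```

Storing each pair as a sorted tuple keeps the edge undirected and prevents the same pair from being added twice.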
Step S170: the knowledge graph is extended and updated based on the aging information. This step updates the node attributes in the knowledge graph, updates the connections and the attributes of the edges characterizing enterprises as having associated features, and pays particular attention to updating the aging attributes.
For example, a real-time update mode is adopted, preferably for the enterprise business data. The data to be updated is acquired and parsed into an agreed format; the corresponding node is queried in the knowledge graph; the fields to be updated are compared to determine whether and how to update; and finally the knowledge graph is updated.
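The real-time update flow can be illustrated with a small field-comparison routine. The record layout (`entity_id` plus a `fields` dictionary) stands in for the agreed format and is hypothetical, and creating a node when none is found is an assumption of this sketch rather than something the text specifies.

```python
def apply_realtime_update(nodes, record):
    """Compare the incoming fields with the stored node and apply only the changes.
    nodes: dict mapping entity_id -> dict of node attributes.
    Returns the set of field names that were written."""
    node = nodes.get(record["entity_id"])
    if node is None:
        # Assumption: an unknown entity is inserted as a new node.
        nodes[record["entity_id"]] = dict(record["fields"])
        return set(record["fields"])
    changed = {key for key, value in record["fields"].items() if node.get(key) != value}
    for key in changed:
        node[key] = record["fields"][key]
    return changed
```

Returning the changed field names lets the caller decide whether any dependent edges need to be recomputed.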
As another example, an incremental update mode is adopted, preferably for the enterprise structured information and the enterprise unstructured information, for example as a scheduled daily incremental update. The newly added data is desensitized, aggregated and otherwise analyzed, then fused into the existing knowledge graph, and the connections and attributes of the edges characterizing enterprises as having associated features are updated. Because the enterprise structured information and the enterprise unstructured information carry aging information, the aging attributes of these edges must be updated at the same time. For example, the aging information of certain edges is the latest creation time of the associated enterprise business card templates; if the creation time of a newly added enterprise business card template is later than the currently recorded creation time, the aging attribute of those edges is updated accordingly.
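The aging rule in the example above reduces to a guarded assignment. A minimal sketch, with a hypothetical attribute name `latest_created`:

```python
from datetime import date


def refresh_latest_created(edge_attrs, new_created):
    """Move the edge's aging attribute forward only if the new template is later.
    Returns True when the attribute was changed."""
    current = edge_attrs.get("latest_created")
    if current is None or new_created > current:
        edge_attrs["latest_created"] = new_created
        return True
    return False


edge = {"latest_created": date(2021, 3, 1)}
refresh_latest_created(edge, date(2020, 12, 31))   # older template: no change
refresh_latest_created(edge, date(2022, 1, 5))     # newer template: attribute updated
```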
As a further example, a full update mode is adopted. For stability or in special cases, the knowledge graph may be fully updated, either periodically or manually. From the full structured data, an initial knowledge graph and the edges characterizing enterprises as having associated features are constructed through data processing and big-data computation and analysis. The full unstructured data is then converted into triple information, which is used to update the knowledge graph and the edges characterizing enterprises as having associated features.
In the first embodiment, only the enterprise business information is utilized to construct the enterprise association relationship.
Referring to fig. 3, a second embodiment of the enterprise association relationship construction method provided by the present application includes the following steps.
Step S110: using a graph-computing data structure, a knowledge graph is constructed from the equity data and the senior executive data in the enterprise business information; the knowledge graph reflects the shareholder investment relationships and senior executive position relationships of enterprises and marks identical natural persons.
Step S120: based on the enterprise business information, one or more edges characterizing enterprises as having associated features are added to the knowledge graph.
Step S330: the attributes of the enterprise nodes in the knowledge graph are extended based on the enterprise structured information. The enterprise structured information refers to information that is already well structured, such as the intellectual property information of an enterprise and the business card information of its employees, and is characterized by high accuracy.
For example, node attributes in the knowledge graph are extended based on the intellectual property information of the enterprise.
Intellectual property information of enterprises, including trademarks, patents, software copyrights, copyrights in works, qualification certifications and the like, is collected, and a unique intellectual property ID is constructed for each piece of intellectual property information. Because intellectual property information is time-sensitive, three items of timeliness information are added to each piece: application time, authorization time and expiration time. The name of the enterprise that owns each piece of intellectual property information is looked up, and this name is searched against the entity name attribute and the past name information attribute of each enterprise node to find the enterprise node corresponding to the intellectual property information. An intellectual property attribute is added to the corresponding enterprise node, the intellectual property information is added under that attribute, and the knowledge graph is updated. The intellectual property attribute comprises the intellectual property type, application time, authorization time, expiration time and the like. Since the intellectual property attribute of an enterprise node carries timeliness information, that timeliness information is also updated into the knowledge graph.
Preferably, if a piece of intellectual property information is matched to a plurality of enterprise nodes at the same time, the piece of intellectual property information is added to intellectual property attributes of the plurality of enterprise nodes at the same time.
Preferably, if the intellectual property attribute of an enterprise node contains a plurality of pieces of intellectual property information, these pieces are aggregated into an array, and the array is added under the intellectual property attribute of the enterprise node to update the knowledge graph.
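The matching and aggregation of intellectual property information can be sketched as follows. The dictionary keys (`name`, `past_names`, `owner_name`, `intellectual_property`) are hypothetical, and exact string equality is assumed for name matching.

```python
def attach_intellectual_property(nodes, ip_records):
    """Append each record to every enterprise node whose current or past name
    equals the record's owner name. Records accumulate in a list per node."""
    for record in ip_records:
        for node in nodes.values():
            known_names = {node["name"], *node.get("past_names", [])}
            if record["owner_name"] in known_names:
                node.setdefault("intellectual_property", []).append(record)


nodes = {
    "e1": {"name": "Acme Ltd", "past_names": ["Acme Trading"]},
    "e2": {"name": "Beta Ltd"},
}
records = [
    {"ip_id": "ip-1", "type": "trademark", "owner_name": "Acme Trading",
     "applied": "2018-05-01", "granted": "2019-02-01", "expires": "2029-02-01"},
    {"ip_id": "ip-2", "type": "patent", "owner_name": "Acme Ltd",
     "applied": "2020-01-10", "granted": "2021-06-30", "expires": "2040-01-10"},
]
attach_intellectual_property(nodes, records)
```

Because the inner loop visits every node, a record whose owner name matches several enterprise nodes is added to all of them.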
As another example, node attributes in the knowledge-graph are extended based on business card information for business employees.
Business card information of enterprise employees is collected, and personal privacy information such as the employee name, position, department, personal telephone and personal mailbox is removed, yielding desensitized business card information. The remaining business card information is public information of the enterprise, mainly comprising the enterprise name, address, mailbox domain name, website domain name, enterprise telephone and the like. Preferably, data cleaning and data standardization are performed on these data. A creation time is added to each piece of business card information to reflect its timeliness. A hash value is calculated over the public information of each business card, and the public information of business cards with the same hash value is aggregated to obtain enterprise business card template information. The enterprise business card template information carries all the public information of the business cards as well as timeliness information; its creation time is the earliest creation time among all the aggregated business cards. For each piece of enterprise business card template information, the enterprise name is searched against one or more of the entity name attribute of each enterprise node, the past name information attribute of the enterprise, the trademark information in the intellectual property attribute and the product name information attribute of the enterprise, with the priority entity name > past name > trademark > product name, to find the enterprise node corresponding to the enterprise business card template information.
A business card template attribute is added to the corresponding enterprise node; it comprises the enterprise name, address, mailbox domain name, website domain name, enterprise telephone, creation time and the like, and the knowledge graph is updated. Because the business card template attribute of an enterprise node carries timeliness information, that timeliness information is also updated into the knowledge graph.
Preferably, if a plurality of pieces of enterprise business card template information are matched to the same enterprise node, they are aggregated into an array and then added under the business card template attribute of that enterprise node.
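The desensitization and hash-based aggregation of business cards can be sketched as below. The text only says that a hash value is calculated; the use of SHA-256 over a JSON serialization, the field names and the ISO date strings are assumptions made for this illustration.

```python
import hashlib
import json

# Hypothetical names for the public fields kept after desensitization.
PUBLIC_FIELDS = ("company", "address", "email_domain", "website_domain", "phone")


def build_templates(cards):
    """Aggregate cards with identical public information into templates.
    Each template keeps the earliest creation time of its cards.
    Dates are ISO strings, so string comparison matches chronological order."""
    templates = {}
    for card in cards:
        # Keeping only the public fields drops names, positions and other private data.
        public = {key: card.get(key, "") for key in PUBLIC_FIELDS}
        payload = json.dumps(public, sort_keys=True, ensure_ascii=False)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        template = templates.setdefault(digest, {**public, "created": card["created"]})
        template["created"] = min(template["created"], card["created"])
    return templates


cards = [
    {"employee": "A", "company": "Acme Ltd", "address": "1 Main Rd",
     "email_domain": "acme.cn", "website_domain": "acme.cn", "phone": "12345678",
     "created": "2021-03-01"},
    {"employee": "B", "company": "Acme Ltd", "address": "1 Main Rd",
     "email_domain": "acme.cn", "website_domain": "acme.cn", "phone": "12345678",
     "created": "2020-07-15"},
    {"employee": "C", "company": "Beta Ltd", "address": "9 Side Rd",
     "email_domain": "beta.cn", "website_domain": "beta.cn", "phone": "87654321",
     "created": "2022-01-05"},
]
templates = build_templates(cards)
```

Serializing with `sort_keys=True` makes the hash independent of the order in which the fields were collected.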
Step S340: one or more edges characterizing the enterprise having associated features are expanded in the knowledge graph based on the enterprise structured information.
For example, when the intellectual property attributes of any two enterprise nodes contain at least one identical piece of intellectual property information, as judged by the intellectual property ID, a fifth class of edge representing the same intellectual property is added between the two enterprise nodes. A fifth-class edge has aging attributes: among the identical pieces of intellectual property information in the intellectual property attributes of the two connected enterprise nodes, the earliest application time, the earliest authorization time and the latest expiration time are taken as the aging attributes of the edge.
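A sketch of the fifth-class edge computation follows. It assumes that a shared record carries the same times on both nodes, so the times are read from the first node only; the key names and the ISO date strings are likewise assumptions.

```python
def shared_ip_edge(ip_a, ip_b):
    """Return the attributes of a fifth-class edge, or None if no ID is shared.
    ip_a, ip_b: lists of intellectual property records of two enterprise nodes."""
    ids_b = {record["ip_id"] for record in ip_b}
    shared = [record for record in ip_a if record["ip_id"] in ids_b]
    if not shared:
        return None
    return {
        "type": "same_intellectual_property",
        "earliest_application": min(r["applied"] for r in shared),
        "earliest_authorization": min(r["granted"] for r in shared),
        "latest_expiration": max(r["expires"] for r in shared),
    }


ip_1 = {"ip_id": "ip-1", "applied": "2018-05-01", "granted": "2019-02-01", "expires": "2029-02-01"}
ip_2 = {"ip_id": "ip-2", "applied": "2017-01-10", "granted": "2020-06-30", "expires": "2037-01-10"}
ip_3 = {"ip_id": "ip-3", "applied": "2015-01-01", "granted": "2016-01-01", "expires": "2025-01-01"}
edge = shared_ip_edge([ip_1, ip_2, ip_3], [ip_1, ip_2])
```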
As another example, when the business-registered address attributes and the business card template attributes of any two enterprise nodes contain at least one identical or similar address, where similar means located in the same office building, a sixth class of edge representing a common address is added between the two enterprise nodes. A sixth-class edge has an aging attribute: if the identical or similar address appears in the business card template attributes of the two connected enterprise nodes, the latest creation time in those business card template attributes is taken as the aging attribute of the edge.
As another example, when, for any two enterprise nodes, the sets formed by the business-registered mailbox domain name attribute together with the business card template attribute contain at least one identical mailbox domain name, a seventh class of edge representing the same mailbox domain name is added between the two enterprise nodes. A seventh-class edge has an aging attribute: if the identical mailbox domain name appears in the business card template attributes of the two connected enterprise nodes, the latest creation time in those business card template attributes is taken as the aging attribute of the edge.
As another example, when, for any two enterprise nodes, the sets formed by the business-registered website domain name attribute together with the business card template attribute contain at least one identical website domain name, an eighth class of edge representing the same website domain name is added between the two enterprise nodes. An eighth-class edge has an aging attribute: if the identical website domain name appears in the business card template attributes of the two connected enterprise nodes, the latest creation time in those business card template attributes is taken as the aging attribute of the edge.
As another example, when, for any two enterprise nodes, the sets formed by the business-registered telephone attribute together with the business card template attribute contain at least one identical telephone, where identical means that the main numbers are the same after the area code and the extension number are removed, a ninth class of edge representing the same telephone is added between the two enterprise nodes. A ninth-class edge has an aging attribute: if the identical telephone appears in the business card template attributes of the two connected enterprise nodes, the latest creation time in those business card template attributes is taken as the aging attribute of the edge.
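The seventh, eighth and ninth classes of edges follow one pattern: collect the registered value and the template values of each node, intersect the two sets, and date the edge from the templates. A generic sketch is given below. The key names are hypothetical, and taking the latest creation time only over templates that carry a shared value is one reading of the text, not a confirmed rule.

```python
def shared_value_edge(node_a, node_b, registered_key, template_key):
    """Return edge attributes if the two nodes share a value, else None.
    Each node may hold one registered value and a list of template dicts."""

    def values(node):
        found = {t[template_key] for t in node.get("templates", []) if t.get(template_key)}
        if node.get(registered_key):
            found.add(node[registered_key])
        return found

    shared = values(node_a) & values(node_b)
    if not shared:
        return None
    # Aging attribute: latest creation time among templates carrying a shared value.
    times = [
        t["created"]
        for node in (node_a, node_b)
        for t in node.get("templates", [])
        if t.get(template_key) in shared
    ]
    return {"shared": shared, "latest_created": max(times) if times else None}


a = {"registered_email_domain": "acme.cn", "templates": []}
b = {"templates": [
    {"email_domain": "acme.cn", "created": "2021-03-01"},
    {"email_domain": "acme.cn", "created": "2022-01-05"},
    {"email_domain": "other.cn", "created": "2023-09-09"},
]}
edge = shared_value_edge(a, b, "registered_email_domain", "email_domain")
```

For telephones, the compared value would be the main number with the area code and extension already removed.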
Step S170: the knowledge graph is extended and updated based on the aging information.
In the second embodiment, the enterprise business information and the enterprise structural information are simultaneously utilized to construct the enterprise association relationship. Wherein, the step S120 is the first group, the steps S330 to S340 are the second group, the execution order of the two groups is not strictly limited, and the two groups are allowed to be either exchanged in order, or performed simultaneously or performed alternately.
Referring to fig. 4, a third embodiment of the enterprise association relationship construction method provided by the present application includes the following steps.
Step S110: using a graph-computing data structure, a knowledge graph is constructed from the equity data and the senior executive data in the enterprise business information; the knowledge graph reflects the shareholder investment relationships and senior executive position relationships of enterprises and marks identical natural persons.
Step S120: based on the enterprise business information, one or more edges characterizing enterprises as having associated features are added to the knowledge graph.
Step S430: structured triple information is extracted from the enterprise unstructured information. The enterprise unstructured information refers to free text related to an enterprise, which must be converted into structured information that a computer can read and understand, using technologies such as natural language processing. Because errors may occur during this structuring, unstructured information generally serves to enrich and supplement the construction of enterprise association relationships.
In this step, machine learning techniques are used to extract the required entity information from the free text relating to the enterprise. For example, the BERT, BiLSTM and CRF algorithms are employed to extract enterprise names as entities from court judgment documents and bidding documents. Machine learning techniques are also employed to classify the relationship between two entities in the free text. For example, the BERT and MLP algorithms are used to determine the relationship between any two entities in the court judgment documents and bidding documents. Since many pairs of entities have no relationship, "no relationship" is one of the possible outputs of the BERT and MLP algorithms. A pair of entities together with their relationship constitutes one piece of structured triple information. Since the unstructured data is time-sensitive, timeliness information is also added to each relationship as its creation time.
For example, the triple information is extracted from the court judgment documents of the enterprise. The triple is defined as: entity-relationship-entity. The relationship here is one of: co-plaintiff relationship, co-defendant relationship, plaintiff-defendant relationship.
As another example, the triple information is extracted from the bidding documents of the enterprise. The triple is defined as: entity-relationship-entity. The relationship here is one of: common bidder relationship, common winning bidder relationship, winning bidder relationship.
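The shape of the triple information can be illustrated independently of the extraction models. The sketch below assumes that an upstream classifier (not shown, and not implemented here) yields `(entity, relation, entity)` predictions, with a hypothetical label `no_relation` for unrelated pairs; it only shows how such predictions become time-stamped triples.

```python
from dataclasses import dataclass

NO_RELATION = "no_relation"   # hypothetical label for the "no relationship" output


@dataclass(frozen=True)
class Triple:
    head: str        # first entity, e.g. an enterprise name
    relation: str    # e.g. "co_defendant" or "common_bidder"
    tail: str        # second entity
    created: str     # creation time attached for timeliness


def to_triples(predictions, created):
    """Keep only related entity pairs and attach the creation time to each."""
    return [
        Triple(head, relation, tail, created)
        for head, relation, tail in predictions
        if relation != NO_RELATION
    ]


predictions = [
    ("Acme Ltd", "co_defendant", "Beta Ltd"),
    ("Acme Ltd", NO_RELATION, "Gamma Ltd"),
]
triples = to_triples(predictions, "2022-01-05")
```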
Step S440: adding one or more edges in the knowledge graph, which characterize the enterprise as having associated features, based on the unstructured information of the enterprise.
For each entity in the triple information, one or more of the following attributes of each enterprise node are searched in the knowledge graph: the entity name attribute, the past name information attribute of the enterprise, the trademark information in the intellectual property attribute, and the product name information attribute of the enterprise. The search and matching follow the priority entity name > past name > trademark > product name, so as to find the two enterprise nodes corresponding to the triple information.
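The prioritized matching can be sketched as a sequence of lookups that stops at the first level producing a hit. The attribute keys are hypothetical, exact string equality is assumed, and placing the product name last in the priority order is an assumption of this sketch.

```python
def match_enterprise(name, nodes):
    """Return the IDs of enterprise nodes matching the name at the highest
    priority level that yields any match; return [] if nothing matches."""
    lookups = (
        lambda node: [node["name"]],                 # 1. entity name
        lambda node: node.get("past_names", []),     # 2. past name
        lambda node: node.get("trademarks", []),     # 3. trademark
        lambda node: node.get("product_names", []),  # 4. product name (assumed last)
    )
    for lookup in lookups:
        hits = [node_id for node_id, node in nodes.items() if name in lookup(node)]
        if hits:
            return hits
    return []


nodes = {
    "e1": {"name": "Acme Ltd"},
    "e2": {"name": "Beta Ltd", "past_names": ["Acme Ltd"], "trademarks": ["Rocket"]},
}
```

With these sample nodes, `match_enterprise("Acme Ltd", nodes)` stops at the entity name level and returns only `e1`, even though `e2` carries the same string as a past name.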
When any two enterprise nodes correspond to the same piece of triple information, a tenth class of edge representing an unstructured-information relationship is added between the two enterprise nodes.
In one implementation, only one tenth-class edge is added between two enterprise nodes, no matter how many pieces of triple information correspond to them. The attribute of the tenth-class edge is the union of the relationships of all the triple information corresponding to the two enterprise nodes. The tenth-class edge has aging attributes: the earliest creation time and the latest creation time among the corresponding triple information.
In another implementation, if two enterprise nodes correspond to a plurality of pieces of triple information, a plurality of tenth-class edges are added between them. The attribute of each tenth-class edge is the relationship of one piece of triple information corresponding to the two enterprise nodes. Each tenth-class edge has an aging attribute, namely the creation time of its corresponding triple information.
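The two implementations differ only in whether the matched triples are grouped by enterprise pair. A minimal sketch of both follows; treating the tenth-class edge as undirected (each pair stored as a sorted tuple) is an assumption, since the text does not state a direction, and the function names are hypothetical.

```python
from collections import defaultdict


def merged_edges(matched):
    """One edge per enterprise pair.
    matched: iterable of (enterprise_a, enterprise_b, relation, created)."""
    grouped = defaultdict(list)
    for a, b, relation, created in matched:
        grouped[tuple(sorted((a, b)))].append((relation, created))
    return {
        pair: {
            "relations": {relation for relation, _ in items},      # union of relations
            "earliest_created": min(created for _, created in items),
            "latest_created": max(created for _, created in items),
        }
        for pair, items in grouped.items()
    }


def separate_edges(matched):
    """One edge per triple, each carrying its own relation and creation time."""
    return [
        (tuple(sorted((a, b))), {"relation": relation, "created": created})
        for a, b, relation, created in matched
    ]


matched = [
    ("e1", "e2", "co_defendant", "2020-07-15"),
    ("e2", "e1", "common_bidder", "2022-01-05"),
]
```

The merged form keeps the graph sparse, while the separate form preserves the creation time of every individual relationship.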
Step S170: the knowledge graph is extended and updated based on the aging information.
In the third embodiment, the enterprise business information and the enterprise unstructured information are simultaneously utilized to construct the enterprise association relationship. Wherein step S120 is the first group, steps S430 to S440 are the third group, the execution order of the two groups is not strictly limited, and the two groups are allowed to be either exchanged in order, or performed simultaneously or performed alternately.
Referring to fig. 5, a fourth embodiment of the method for constructing an enterprise association relationship provided by the present application includes the following steps.
Step S110: using a graph-computing data structure, a knowledge graph is constructed from the equity data and the senior executive data in the enterprise business information; the knowledge graph reflects the shareholder investment relationships and senior executive position relationships of enterprises and marks identical natural persons.
Step S120: based on the enterprise business information, one or more edges characterizing enterprises as having associated features are added to the knowledge graph.
Step S330: the attributes of the enterprise nodes in the knowledge graph are extended based on the enterprise structured information. The enterprise structured information refers to information that is already well structured, such as the intellectual property information of an enterprise and the business card information of its employees, and is characterized by high accuracy.
Step S340: based on the enterprise structured information, one or more edges characterizing enterprises as having associated features are added to the knowledge graph.
Step S430: structured triple information is extracted from the enterprise unstructured information.
Step S440: based on the enterprise unstructured information, one or more edges characterizing enterprises as having associated features are added to the knowledge graph.
Step S170: the knowledge graph is extended and updated based on the aging information.
In the fourth embodiment, the enterprise business information, the enterprise structured information and the enterprise unstructured information are simultaneously utilized to construct the enterprise association relationship. Wherein, step S120 is a first group, steps S330 to S340 are a second group, and steps S430 to S440 are a third group, the execution order of the three groups is not strictly limited, and the three groups are either interchanged in order, or performed simultaneously, or performed alternately, all of which are allowed.
Referring to fig. 6, in correspondence with the first embodiment of the enterprise association relationship construction method, the present application further provides a first embodiment of an enterprise association relationship construction system. The enterprise incidence relation building system 600 comprises a map building module 610, a first extending module 620 and an aging updating module 670.
The map construction module 610 is used for constructing, with a graph-computing data structure and from the equity data and the senior executive data in the enterprise business information, a knowledge graph that reflects the shareholder investment relationships and senior executive position relationships of enterprises and marks identical natural persons. In the constructed knowledge graph, each enterprise and its direct shareholders and senior executives are each taken as a node. Each node contains three attributes: entity ID, entity name and entity type. An enterprise node additionally has the following attributes: business-registered address, business-registered mailbox domain name, business-registered website domain name, business-registered telephone, past name information of the enterprise and product name information of the enterprise. The direct investment relationship and the direct investment proportion from a direct shareholder node to an enterprise node are represented by a directed first-class edge. The position relationship and the position title of a senior executive node in an enterprise node are represented by a second-class edge. The same-natural-person relationship is represented by a third-class edge.
The first expansion module 620 is used for expanding and adding one or more edges which characterize the enterprise and have associated features in the knowledge graph based on the enterprise business information.
The age update module 670 is used to extend and update the knowledge-graph based on age information.
In the first embodiment, only the enterprise business information is utilized to construct the enterprise association relationship.
Referring to fig. 7, in correspondence with the second embodiment of the enterprise association relationship construction method, the present application further provides a second embodiment of an enterprise association relationship construction system. The enterprise association relationship building system 700 comprises a map building module 610, a first extension module 620, a second extension module 730, a third extension module 740 and an aging updating module 670. The map building module 610, the first extension module 620 and the aging updating module 670 are the same as in the first embodiment and are not described again.
The second extension module 730 is used to extend the attributes of the enterprise nodes in the knowledge graph based on the enterprise structured information.
The third extension module 740 is configured to add one or more edges in the knowledge-graph that characterize the business as having associated features based on the business structured information.
In the second embodiment, the enterprise business information and the enterprise structural information are simultaneously utilized to construct the enterprise association relationship.
Referring to fig. 8, in correspondence with the third embodiment of the enterprise association relationship construction method, the present application further provides a third embodiment of an enterprise association relationship construction system. The enterprise association relationship building system 800 comprises a map building module 610, a first extension module 620, an information extraction module 830, a fourth extension module 840 and an aging updating module 670. The map building module 610, the first extension module 620 and the aging updating module 670 are the same as in the first embodiment and are not described again.
The information extraction module 830 is used to extract structured triple information from the enterprise unstructured information.
The fourth extension module 840 is used to add, based on the enterprise unstructured information, one or more edges in the knowledge graph that characterize enterprises as having associated features.
In the third embodiment, the enterprise business information and the enterprise unstructured information are simultaneously utilized to construct the enterprise association relationship.
Referring to fig. 9, in correspondence with the fourth embodiment of the enterprise association relationship construction method, the present application further provides a fourth embodiment of an enterprise association relationship construction system. The enterprise association relationship building system 900 comprises a map building module 610, a first extension module 620, a second extension module 730, a third extension module 740, an information extraction module 830, a fourth extension module 840 and an aging updating module 670. The map building module 610, the first extension module 620 and the aging updating module 670 are the same as in the first embodiment; the second extension module 730 and the third extension module 740 are the same as in the second embodiment; the information extraction module 830 and the fourth extension module 840 are the same as in the third embodiment. None of them is described again.
In the fourth embodiment, the enterprise business information, the enterprise structured information and the enterprise unstructured information are simultaneously utilized to construct the enterprise association relationship.
The method and the system construct and store, based on a graph database, a knowledge graph that reflects the equity investment relationships and senior executive position relationships of enterprises and marks identical natural persons, and extend the node attributes and the edges characterizing enterprise association features in the knowledge graph based on the enterprise business information, the enterprise structured information and the enterprise unstructured information. This facilitates the subsequent computation and storage, through graph computing, of enterprise association relationships and suspected enterprise association relationships.
The above are merely preferred embodiments of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.