CN110781246A - Enterprise association relationship construction method and system - Google Patents

Info

Publication number
CN110781246A
Authority
CN
China
Prior art keywords
enterprise
information
knowledge graph
nodes
business
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910878683.3A
Other languages
Chinese (zh)
Inventor
丁凯
龙腾
陈青山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Linguan Data Technology Co.,Ltd.
Shanghai Shengteng Data Technology Co.,Ltd.
Shanghai yingwuchu Data Technology Co.,Ltd.
Shanghai Hehe Information Technology Development Co Ltd
Original Assignee
Shanghai Shengteng Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shengteng Data Technology Co Ltd
Priority to CN201910878683.3A
Publication of CN110781246A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a method for constructing enterprise association relationships, comprising the following steps. Step S110: using a graph-computing data structure, construct a knowledge graph from the equity data and senior-management data in the enterprise business information; the graph reflects each enterprise's shareholder-investment and senior-management appointment relationships and marks nodes that denote the same natural person. Step S120: based on the enterprise business information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature. Step S170: extend and update the knowledge graph based on timeliness information. The method uses a graph database to construct and store the knowledge graph. It represents the association features of enterprises through node attributes and through the creation and attributes of edges in the knowledge graph, drawing on enterprise business information, enterprise structured information and/or enterprise unstructured information, and it carries timeliness information throughout.

Description

Enterprise association relationship construction method and system
Technical Field
The present application relates to a data storage and operation method based on graph computing, and more particularly to a method for storing and constructing enterprise-related data based on graph computing.
Background
Enterprise association relationships are the relationships between an enterprise's controlling shareholders, actual controllers, directors, supervisors, and senior managers and the enterprises they directly or indirectly control, as well as other relationships that may lead to a transfer of the company's interests. Association relationships between enterprises are becoming increasingly complicated, and banks and related institutions frequently face risks and actual losses because associated enterprises are not sufficiently identified. Effective identification of associated enterprises has become a key element in preventing group customers from obtaining credit from multiple lenders, excessive credit, and related-party guarantees, and it is necessary and timely for protecting the safety of bank credit assets. Collecting data on enterprises and individual customers from various sources in a lawful and compliant manner, then fusing, analyzing, and mining it to determine the association relationships between enterprises, therefore helps with functions such as financial risk prevention and control and financial information statistics and prediction.
Graph theory is a branch of mathematics that takes graphs as its object of study. A graph in graph theory is a figure formed by a number of given nodes (also called vertices or points) and edges (also called lines or arcs) connecting the nodes. Such a graph is generally used to describe a certain relationship between certain things: nodes represent the things, edges represent the relationship between two things, and the attributes or weights of the nodes and/or edges describe the characteristics of the things and/or relationships.
A knowledge graph is an application of graph computing and consists of a number of nodes and edges. Nodes represent knowledge, and edges between nodes represent relationships between pieces of knowledge. If there is a relationship between two nodes, they are connected by an edge. Knowledge graphs are typically stored in graph databases, such as the commonly used Neo4j. At present, knowledge-graph applications are mainly found in information retrieval, and knowledge-graph techniques for mining enterprise information are relatively lacking.
Disclosure of Invention
The technical problem to be solved by the application is to provide a knowledge-graph-based method for constructing enterprise association relationships and a corresponding construction system. Starting from enterprise business information, the application optionally combines structured and unstructured data related to the enterprise to mine, analyze, and construct information on each dimension of the enterprise. The application can reduce the demand for computing resources while improving the accuracy of enterprise information and greatly increasing the number of data dimensions.
To solve this technical problem, the application discloses a method for constructing enterprise association relationships, comprising the following steps. Step S110: using a graph-computing data structure, construct a knowledge graph from the equity data and senior-management data in the enterprise business information; the graph reflects each enterprise's shareholder-investment and senior-management appointment relationships and marks nodes that denote the same natural person. Step S120: based on the enterprise business information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature. Step S130: based on enterprise structured information, extend the attributes of the enterprise nodes in the knowledge graph. Step S140: based on the enterprise structured information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature. Step S150: extract structured triple information from enterprise unstructured information. Step S160: based on the enterprise unstructured information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature. Step S170: extend and update the knowledge graph based on timeliness information. This method is the first embodiment of the application; it constructs enterprise association relationships based on enterprise business information.
Further, in step S110, the equity data and senior-management data in the enterprise business information are first cleaned, and the knowledge graph is then constructed from the cleaned data. This keeps invalid or erroneous data from adding burden to graph construction and from interfering with subsequent operations.
Further, the data cleaning comprises one or more of: validity cleaning of basic enterprise attributes, validity checking and cleaning of equity proportions, validity cleaning of senior-management data, data consistency checking, removal of invalid data, and filling of missing data. This is a preferred implementation of data cleaning.
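As a minimal sketch of one of the cleaning operations listed above (validity checking of equity proportions), the following Python example drops malformed equity rows. The record layout, the rule that a proportion must lie in (0, 1], and the rule that an enterprise's proportions must not sum above 1 are illustrative assumptions; the source does not specify concrete validity rules.

```python
from dataclasses import dataclass


@dataclass
class EquityRecord:
    shareholder: str
    enterprise: str
    ratio: float  # shareholding proportion; assumed to be a fraction in (0, 1]


def clean_equity(records):
    """Drop invalid rows, then drop enterprises whose proportions sum above 1.

    Both rules are hypothetical examples of 'validity' checks.
    """
    valid = [
        r for r in records
        if r.shareholder and r.enterprise and 0.0 < r.ratio <= 1.0
    ]
    totals = {}
    for r in valid:
        totals[r.enterprise] = totals.get(r.enterprise, 0.0) + r.ratio
    # A small tolerance absorbs floating-point rounding.
    return [r for r in valid if totals[r.enterprise] <= 1.0 + 1e-6]
```

A production cleaner would likely log or repair rejected rows rather than silently discard them; the sketch only shows the filtering idea.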
Further, in step S110, the equity data and senior-management data in the enterprise business information are standardized, and the knowledge graph is then constructed from the standardized data. This facilitates subsequent graph operations and avoids errors or deviations caused by non-standardized data.
Further, the data standardization includes one or more of the following operations: first, decomposing and standardizing the address recorded in the business registration; second, standardizing the mailbox domain name and website domain name recorded in the business registration, and deleting public domain names; third, standardizing the telephone information recorded in the business registration. This is a preferred implementation of data standardization.
Further, the equity data and senior-management data in the enterprise business information are first cleaned and then standardized, and the knowledge graph is constructed from the cleaned and standardized data. This is a preferred ordering.
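The second and third operations above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the `PUBLIC_DOMAINS` set is a hypothetical example list (the source names no specific public domains), and the phone format "area code, main number, extension" with a leading "0" on the area code is an assumed convention.

```python
import re

# Hypothetical examples of public e-mail domains; the source does not list any.
PUBLIC_DOMAINS = {"gmail.com", "qq.com", "163.com"}


def normalize_email_domain(email):
    """Return the lower-cased mailbox domain, or None for public/empty domains."""
    domain = email.strip().lower().rpartition("@")[2]
    if not domain or domain in PUBLIC_DOMAINS:
        return None
    return domain


def normalize_phone(raw):
    """Return the main number with area code and extension removed.

    Assumes digit groups are separated by non-digits and that an area code,
    when present, is the first group and starts with '0'.
    """
    parts = [p for p in re.split(r"\D+", raw) if p]
    if not parts:
        return None
    if len(parts) > 1 and parts[0].startswith("0"):
        parts = parts[1:]  # drop the area code
    return parts[0]  # any later group is treated as an extension
```

For example, `normalize_phone("021-12345678-801")` keeps only the middle group, so two enterprises listing different extensions of one switchboard normalize to the same value.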
Further, in the knowledge graph, each enterprise and each of its direct shareholders and senior managers is a node; a first-class edge represents the direct investment of a direct-shareholder node in an enterprise node; a second-class edge represents the appointment of a senior-manager node at an enterprise node; a third-class edge represents the same-natural-person relationship. This is a preferred implementation of constructing the knowledge graph.
Further, each node has three attributes: entity ID, entity name, and entity type. An enterprise node additionally has the following attributes: registered business address, registered mailbox domain name, registered website domain name, registered telephone, former names of the enterprise, and product names of the enterprise. The node attributes are used in subsequent graph computations.
Further, every edge has a type attribute, which distinguishes the different classes of edges.
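The node and edge model described above can be sketched with plain Python data classes. The class names, field names, and the in-memory `Graph` container are assumptions made for illustration; the source stores the graph in a graph database and does not prescribe a schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class EdgeType(Enum):
    INVESTS_IN = 1   # first class: direct shareholder -> enterprise
    MANAGES = 2      # second class: senior manager -> enterprise
    SAME_PERSON = 3  # third class: two nodes denote one natural person


@dataclass
class Node:
    entity_id: str
    name: str
    entity_type: str  # e.g. "enterprise" or "person"
    attrs: dict = field(default_factory=dict)  # extra enterprise attributes


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # entity_id -> Node
    edges: list = field(default_factory=list)  # (src_id, dst_id, EdgeType)

    def add_node(self, node):
        self.nodes[node.entity_id] = node

    def add_edge(self, src, dst, edge_type):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((src, dst, edge_type))


g = Graph()
g.add_node(Node("e1", "Enterprise A", "enterprise", {"registered_phone": "12345678"}))
g.add_node(Node("p1", "Person X", "person"))
g.add_edge("p1", "e1", EdgeType.INVESTS_IN)
g.add_edge("p1", "e1", EdgeType.MANAGES)
```

Keeping the edge class as an explicit attribute lets later steps add further edge classes without changing the container.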
Further, in step S120, when two enterprise nodes are connected by second-class edges to the same senior-manager node, or to senior-manager nodes that are themselves connected by a third-class edge, a fourth-class edge representing a shared senior manager is added between the two enterprise nodes. This gives a preferred implementation of adding edges that characterize enterprise association features based on enterprise business information.
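One way to derive these fourth-class edges is sketched below. The function name and the tuple-based edge representation are hypothetical. The sketch also assumes that "same natural person" is transitive and merges such manager nodes with a small union-find; the source only speaks of manager nodes directly connected by a third-class edge.

```python
from itertools import combinations


def shared_manager_edges(manages, same_person):
    """Return enterprise pairs that should receive a fourth-class edge.

    manages: iterable of (manager_id, enterprise_id) second-class edges.
    same_person: iterable of (manager_id, manager_id) third-class edges.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_person:
        parent[find(a)] = find(b)

    # Group enterprises by the natural person who manages them.
    by_person = {}
    for manager, enterprise in manages:
        by_person.setdefault(find(manager), set()).add(enterprise)

    pairs = set()
    for enterprises in by_person.values():
        pairs.update(combinations(sorted(enterprises), 2))
    return pairs
```

Grouping by person first means only enterprises that actually share a manager are paired, rather than testing every pair of enterprises in the graph.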
Further, in step S170, the update mode includes one or a combination of real-time update, incremental update, and full update. These are several common ways of updating a knowledge graph.
Further, enterprise business data is updated in real time. This is a preferred implementation.
Further, enterprise structured information and enterprise unstructured information are updated incrementally. The incremental update includes updating, in the knowledge graph, the connections of the edges that characterize enterprise association features and the attributes of those edges, which include timeliness attributes. This is a preferred implementation and emphasizes that timeliness information is also updated in the knowledge graph.
The application also discloses a method for constructing enterprise association relationships, comprising the following steps. Step S110: using a graph-computing data structure, construct a knowledge graph from the equity data and senior-management data in the enterprise business information; the graph reflects each enterprise's shareholder-investment and senior-management appointment relationships and marks nodes that denote the same natural person. Step S120: based on the enterprise business information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature. Step S330: based on enterprise structured information, extend the attributes of the enterprise nodes in the knowledge graph. Step S340: based on the enterprise structured information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature. Step S170: extend and update the knowledge graph based on timeliness information. This method is the second embodiment of the application; it constructs enterprise association relationships based on enterprise business information and enterprise structured information.
Further, in step S330, the node attributes in the knowledge graph are extended based on the enterprise's intellectual property information. The intellectual property information of enterprises is collected, and three items of timeliness information are added to each piece: application time, grant (authorization) time, and expiration time. The name of the enterprise that owns each piece of intellectual property information is looked up and searched for in the entity name attribute and the former-name attribute of each enterprise node to find the corresponding enterprise node. An intellectual property attribute is added to that node, and the intellectual property information is added to it. The intellectual property attribute includes the intellectual property type, application time, grant time, and expiration time. This gives a first implementation of extending node attributes in the knowledge graph based on enterprise structured information.
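The lookup by current or former name can be sketched as below. The dictionary layout, the function names, and the use of exact string matching are assumptions for illustration; the source does not say how names are compared.

```python
def build_name_index(nodes):
    """Map every current and former enterprise name to its node ID.

    nodes: dict entity_id -> {"name": str, "past_names": list of str, ...}
    """
    index = {}
    for entity_id, node in nodes.items():
        for name in [node["name"], *node.get("past_names", [])]:
            index.setdefault(name, entity_id)  # first node wins on a name clash
    return index


def attach_ip(nodes, index, record):
    """Add one intellectual property record to the matching enterprise node.

    Returns the matched node ID, or None when no node carries the owner name.
    """
    entity_id = index.get(record["owner"])
    if entity_id is None:
        return None
    nodes[entity_id].setdefault("intellectual_property", []).append({
        "ip_id": record["ip_id"],
        "type": record["type"],
        "applied": record["applied"],
        "granted": record["granted"],
        "expires": record["expires"],
    })
    return entity_id
```

Indexing former names alongside the current name lets a record filed under an old enterprise name still reach the right node.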
Further, in step S330, the node attributes in the knowledge graph are extended based on the business card information of enterprise employees. Business card information of enterprise employees is collected and personal private information is removed; the remaining business card information is the public information of the enterprise. A creation time is added to each piece of business card information. A hash value is calculated over the public information of each business card, and the public information of business cards with the same hash value is aggregated to obtain enterprise business card template information; the creation time of a template is the earliest creation time of all the business cards aggregated into it. For each piece of enterprise business card template information, the enterprise name is searched for in one or more of the entity name attribute of each enterprise node, the former-name attribute of the enterprise, the trademark information in the intellectual property attribute, and the product-name attribute of the enterprise, to find the enterprise node corresponding to the template. A business card template attribute is added to the corresponding enterprise node; it includes the enterprise name, address, mailbox domain name, website domain name, enterprise telephone, and creation time. This gives a second implementation of extending node attributes in the knowledge graph based on enterprise structured information.
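The hash-and-aggregate step can be sketched as follows. The field names in `PUBLIC_FIELDS`, the choice of SHA-256, and the ISO date strings for creation time are assumptions; the source names neither a hash function nor a card schema.

```python
import hashlib

# Assumed public fields of a business card; private fields are simply ignored.
PUBLIC_FIELDS = ("company", "address", "email_domain", "site_domain", "phone")


def build_templates(cards):
    """Aggregate cards with identical public information into templates.

    cards: dicts holding PUBLIC_FIELDS plus 'created' (ISO date string).
    Each template keeps the earliest creation time of its cards.
    """
    templates = {}
    for card in cards:
        public = {k: card.get(k, "") for k in PUBLIC_FIELDS}
        payload = "\x1f".join(public[k] for k in PUBLIC_FIELDS)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        tpl = templates.setdefault(key, {**public, "created": card["created"]})
        # ISO date strings compare correctly as plain strings.
        tpl["created"] = min(tpl["created"], card["created"])
    return list(templates.values())
```

Because only the public fields feed the hash, cards from different employees of one enterprise collapse into a single template.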
Further, step S340 includes any one or more of the following operations. When the intellectual property attributes of two enterprise nodes contain at least one identical piece of intellectual property information, judged by intellectual property ID, a fifth-class edge representing shared intellectual property is added between the two enterprise nodes. When the sets formed by the registered business address attribute and the business card template attribute of two enterprise nodes contain at least one identical or similar address, where similar means located in the same office building, a sixth-class edge representing a shared address is added between the two enterprise nodes. When the sets formed by the registered mailbox domain name attribute and the business card template attribute of two enterprise nodes contain at least one identical mailbox domain name, a seventh-class edge representing a shared mailbox domain name is added between the two enterprise nodes; the seventh-class edge has a timeliness attribute. When the sets formed by the registered website domain name attribute and the business card template attribute of two enterprise nodes contain at least one identical website domain name, an eighth-class edge representing a shared website domain name is added between the two enterprise nodes; the eighth-class edge has a timeliness attribute. When the sets formed by the registered telephone attribute and the business card template attribute of two enterprise nodes contain at least one identical telephone, meaning that the main numbers are the same after area codes and extension numbers are removed, a ninth-class edge representing a shared telephone is added between the two enterprise nodes.
This gives a preferred implementation of adding edges that characterize enterprise association features based on enterprise structured information.
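The operations above share one pattern: two enterprises are linked when their attribute sets have at least one value in common. A minimal sketch of that pattern, assuming the values (domains, main phone numbers, intellectual property IDs) have already been collected and normalized per enterprise; the function name and return shape are hypothetical:

```python
from collections import defaultdict
from itertools import combinations


def shared_value_edges(values_by_enterprise, edge_class):
    """Link enterprises that have at least one attribute value in common.

    values_by_enterprise: dict enterprise_id -> set of normalized values.
    Returns {(id_a, id_b, edge_class): set of shared values}.
    """
    # Inverted index: value -> enterprises that carry it.
    index = defaultdict(set)
    for enterprise, values in values_by_enterprise.items():
        for value in values:
            index[value].add(enterprise)

    edges = {}
    for value, enterprises in index.items():
        for a, b in combinations(sorted(enterprises), 2):
            edges.setdefault((a, b, edge_class), set()).add(value)
    return edges
```

The inverted index avoids comparing every pair of enterprises directly; only enterprises that share a value are ever paired. Address similarity ("same office building") would need an extra normalization step that this sketch does not attempt.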
Further, the fifth-class through ninth-class edges all have timeliness attributes. For any fifth-class edge, the earliest application time, the earliest grant time, and the latest expiration time among the identical intellectual property information in the intellectual property attributes of the two connected enterprise nodes are taken as the edge's timeliness attributes. For any sixth-class edge, if the business card template attributes of the two connected enterprise nodes contain the same or similar address information, the latest creation time in those business card template attributes is taken as the edge's timeliness attribute. For any seventh-class edge, if the business card template attributes of the two connected enterprise nodes contain the same mailbox domain name, the latest creation time in those business card template attributes is taken as the edge's timeliness attribute. For any eighth-class edge, if the business card template attributes of the two connected enterprise nodes contain the same website domain name, the latest creation time in those business card template attributes is taken as the edge's timeliness attribute. For any ninth-class edge, if the business card template attributes of the two connected enterprise nodes contain the same telephone, the latest creation time in those business card template attributes is taken as the edge's timeliness attribute. This gives the timeliness attributes of the edges that characterize enterprise association features based on enterprise structured information.
Further, step S120 forms a first group and steps S330 to S340 form a second group; the two groups may be executed in that order, in the reverse order, simultaneously, or alternately. This indicates that the order of these steps in the second embodiment is not restricted.
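For the fifth-class edge, the rule "earliest application, earliest grant, latest expiration" can be sketched as below. The tuple layout, the ISO date strings, and the function name are assumptions made for illustration.

```python
def fifth_edge_timeliness(ip_a, ip_b):
    """Compute timeliness attributes for a fifth-class edge.

    ip_a, ip_b: dict ip_id -> (applied, granted, expires), ISO date strings,
    taken from the intellectual property attributes of the two nodes.
    Returns None when the nodes share no intellectual property ID.
    """
    shared = ip_a.keys() & ip_b.keys()
    if not shared:
        return None
    records = [rec for ip_id in shared for rec in (ip_a[ip_id], ip_b[ip_id])]
    return {
        "earliest_applied": min(r[0] for r in records),
        "earliest_granted": min(r[1] for r in records),
        "latest_expires": max(r[2] for r in records),
    }
```

Only intellectual property held by both nodes contributes; records owned by one side alone do not affect the edge.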
The application also discloses a method for constructing enterprise association relationships, comprising the following steps. Step S110: using a graph-computing data structure, construct a knowledge graph from the equity data and senior-management data in the enterprise business information; the graph reflects each enterprise's shareholder-investment and senior-management appointment relationships and marks nodes that denote the same natural person. Step S120: based on the enterprise business information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature. Step S430: extract structured triple information from enterprise unstructured information. Step S440: based on the enterprise unstructured information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature. Step S170: extend and update the knowledge graph based on timeliness information. This method is the third embodiment of the application; it constructs enterprise association relationships based on enterprise business information and enterprise unstructured information.
Further, in step S430, triple information is extracted from court documents involving the enterprise. A triple is defined as entity-relationship-entity, where the relationship is one of: co-plaintiff relationship, co-defendant relationship, and plaintiff-defendant relationship. This gives a first implementation of extracting triples from enterprise unstructured information.
Further, in step S430, triple information is extracted from the enterprise's bidding documents. A triple is defined as entity-relationship-entity, where the relationship is one of: co-bidder relationship, co-winning-bidder relationship, and winning-bidder relationship. This gives a second implementation of extracting triples from enterprise unstructured information.
Further, in step S440, each entity in the triple information is searched for in the knowledge graph against one or more of the entity name attribute of each enterprise node, the former-name attribute of the enterprise, the trademark information in the intellectual property attribute, and the product-name attribute of the enterprise, to find the two enterprise nodes corresponding to the triple. When two enterprise nodes correspond to the same triple, a tenth-class edge representing an unstructured-information relationship is added between them. This gives a preferred implementation of adding edges that characterize enterprise association features in the knowledge graph based on enterprise unstructured information.
Further, no matter how many triples correspond to two enterprise nodes, only one tenth-class edge is added between them. The attribute of the tenth-class edge is the union of the relationships of all triples corresponding to the two enterprise nodes. The tenth-class edge has timeliness attributes: the earliest and the latest creation time among the triples corresponding to the two enterprise nodes. This gives a first specific implementation of the edges added to the knowledge graph to characterize enterprise association features based on enterprise unstructured information.
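A minimal sketch of this single-edge variant follows. It assumes the triple entities have already been resolved to enterprise node IDs and that creation times are ISO date strings; it also treats the enterprise pair as unordered, which discards the direction of relationships such as plaintiff-defendant. These are illustrative assumptions, not details from the source.

```python
def merge_triples(triples):
    """Collapse all triples between two enterprises into one tenth-class edge.

    triples: iterable of (enterprise_a, relation, enterprise_b, created).
    Returns {(id_1, id_2): {"relations", "earliest", "latest"}}.
    """
    edges = {}
    for a, relation, b, created in triples:
        key = tuple(sorted((a, b)))  # unordered pair (an assumption)
        edge = edges.setdefault(
            key, {"relations": set(), "earliest": created, "latest": created}
        )
        edge["relations"].add(relation)
        edge["earliest"] = min(edge["earliest"], created)
        edge["latest"] = max(edge["latest"], created)
    return edges
```

The alternative of one edge per triple would simply skip the merge and keep each triple's own relationship and creation time on its own edge.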
Further, when multiple triples correspond to two enterprise nodes, multiple tenth-class edges are added between them. The attribute of each tenth-class edge is the relationship of one triple corresponding to the two enterprise nodes, and each tenth-class edge has a timeliness attribute, namely the creation time of its triple. This gives a second specific implementation of the edges added to the knowledge graph to characterize enterprise association features based on enterprise unstructured information.
Further, step S120 forms a first group and steps S430 to S440 form a third group; the two groups may be executed in that order, in the reverse order, simultaneously, or alternately. This indicates that the order of these steps in the third embodiment is not restricted.
The application also discloses a method for constructing enterprise association relationships, comprising the following steps. Step S110: using a graph-computing data structure, construct a knowledge graph from the equity data and senior-management data in the enterprise business information; the graph reflects each enterprise's shareholder-investment and senior-management appointment relationships and marks nodes that denote the same natural person. Step S120: based on the enterprise business information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature. Step S130: based on enterprise structured information, extend the attributes of the enterprise nodes in the knowledge graph. Step S140: based on the enterprise structured information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature. Step S150: extract structured triple information from enterprise unstructured information. Step S160: based on the enterprise unstructured information, add to the knowledge graph one or more edges indicating that enterprises share an associated feature. Step S170: extend and update the knowledge graph based on timeliness information. This method is the fourth embodiment of the application; it constructs enterprise association relationships based on enterprise business information, enterprise structured information, and enterprise unstructured information.
Further, step S120 forms a first group, steps S330 to S340 form a second group, and steps S430 to S440 form a third group; the three groups may be executed in that order, in the reverse order, simultaneously, or alternately. This indicates that the order of these steps is not restricted.
The application also discloses a system for constructing enterprise association relationships, comprising a graph construction module, a first extension module, and a timeliness update module. The graph construction module uses a graph-computing data structure to construct, from the equity data and senior-management data in the enterprise business information, a knowledge graph that reflects each enterprise's shareholder-investment and senior-management appointment relationships and marks nodes that denote the same natural person. The first extension module adds to the knowledge graph, based on the enterprise business information, one or more edges indicating that enterprises share an associated feature. The timeliness update module extends and updates the knowledge graph based on timeliness information. This system is the first embodiment of the application; it constructs enterprise association relationships based on enterprise business information.
The application also discloses a system for constructing enterprise association relationships, comprising a graph construction module, a first extension module, a second extension module, a third extension module, and a timeliness update module. The graph construction module uses a graph-computing data structure to construct, from the equity data and senior-management data in the enterprise business information, a knowledge graph that reflects each enterprise's shareholder-investment and senior-management appointment relationships and marks nodes that denote the same natural person. The first extension module adds to the knowledge graph, based on the enterprise business information, one or more edges indicating that enterprises share an associated feature. The second extension module extends the attributes of the enterprise nodes in the knowledge graph based on enterprise structured information. The third extension module adds to the knowledge graph, based on the enterprise structured information, one or more edges indicating that enterprises share an associated feature. The timeliness update module extends and updates the knowledge graph based on timeliness information. This system is the second embodiment of the application; it constructs enterprise association relationships based on enterprise business information and enterprise structured information.
The application also discloses a system for constructing enterprise association relationships, comprising a graph construction module, a first extension module, an information extraction module, a fourth extension module, and a timeliness update module. The graph construction module uses a graph-computing data structure to construct, from the equity data and senior-management data in the enterprise business information, a knowledge graph that reflects each enterprise's shareholder-investment and senior-management appointment relationships and marks nodes that denote the same natural person. The first extension module adds to the knowledge graph, based on the enterprise business information, one or more edges indicating that enterprises share an associated feature. The information extraction module extracts structured triple information from enterprise unstructured information. The fourth extension module adds to the knowledge graph, based on the enterprise unstructured information, one or more edges indicating that enterprises share an associated feature. The timeliness update module extends and updates the knowledge graph based on timeliness information. This system is the third embodiment of the application; it constructs enterprise association relationships based on enterprise business information and enterprise unstructured information.
The application also discloses a system for constructing enterprise association relationships, comprising a graph construction module, a first extension module, a second extension module, a third extension module, an information extraction module, a fourth extension module, and a timeliness update module. The graph construction module uses a graph-computing data structure to construct, from the equity data and senior-management data in the enterprise business information, a knowledge graph that reflects each enterprise's shareholder-investment and senior-management appointment relationships and marks nodes that denote the same natural person. The first extension module adds to the knowledge graph, based on the enterprise business information, one or more edges indicating that enterprises share an associated feature. The second extension module extends the attributes of the enterprise nodes in the knowledge graph based on enterprise structured information. The third extension module adds to the knowledge graph, based on the enterprise structured information, one or more edges indicating that enterprises share an associated feature. The information extraction module extracts structured triple information from enterprise unstructured information. The fourth extension module adds to the knowledge graph, based on the enterprise unstructured information, one or more edges indicating that enterprises share an associated feature. The timeliness update module extends and updates the knowledge graph based on timeliness information. This system is the fourth embodiment of the application; it constructs enterprise association relationships based on enterprise business information, enterprise structured information, and enterprise unstructured information.
The technical effects of the application are as follows: the knowledge graph is constructed and stored in a graph database; enterprise business information, enterprise structured information and/or enterprise unstructured information serve as data sources; the associated features of enterprises are represented in the knowledge graph through node attributes, edges and edge attributes; and timeliness information is provided extensively.
Drawings
Fig. 1 is a flowchart of a first embodiment of an enterprise association relationship building method provided by the present application.
Fig. 2 is a detailed flowchart of the method of constructing a knowledge-graph in step S110.
Fig. 3 is a flowchart of a second embodiment of an enterprise association relationship building method provided by the present application.
Fig. 4 is a flowchart of a third embodiment of an enterprise association relationship building method provided by the present application.
Fig. 5 is a flowchart of a fourth embodiment of an enterprise association relationship building method provided by the present application.
Fig. 6 is a schematic structural diagram of a first embodiment of an enterprise association relationship building system provided by the present application.
Fig. 7 is a schematic structural diagram of a second embodiment of the enterprise association relationship building system provided by the present application.
Fig. 8 is a schematic structural diagram of a third embodiment of an enterprise association relationship building system provided by the present application.
Fig. 9 is a schematic structural diagram of a fourth embodiment of the enterprise association relation building system provided by the present application.
Description of the reference numerals in the figures: 600, 700, 800 and 900 are enterprise association relationship construction systems; 610 is a graph construction module; 620 is a first expansion module; 670 is a timeliness updating module; 730 is a second expansion module; 740 is a third expansion module; 830 is an information extraction module; 840 is a fourth expansion module.
Detailed Description
Referring to fig. 1, a first embodiment of the enterprise association relationship construction method provided by the present application includes the following steps.
Step S110: From the equity data and senior management data in the enterprise business information, construct, using a graph-computing data structure, a knowledge graph that reflects the shareholder investment relationships and senior management position relationships of enterprises and marks identical natural persons.
The enterprise business information refers to the information that an enterprise registers with the administration for industry and commerce, including the enterprise name, enterprise address, registered capital, equity data, senior management data and the like. The equity data refers to the direct shareholders of the enterprise and their capital contribution ratios. The senior management data refers to information on the senior managers of the enterprise, such as legal representatives, directors and supervisors.
Preferably, in step S110, data cleaning is first performed on the equity data and senior management data in the enterprise business information, and the knowledge graph is then constructed from the cleaned data. The data cleaning includes one or more of: validity cleaning of basic enterprise attributes, validity checking and cleaning of equity ratios, validity cleaning of senior management data, data consistency checking, removal of invalid data, and filling of missing data.
Preferably, in step S110, data standardization is first performed on the equity data and senior management data in the enterprise business information, and the knowledge graph is then constructed from the standardized data. The data standardization includes one or more of the following operations. First, the registered address information is decomposed and standardized: each address is decomposed into its province, city, district, road and industrial park. Second, the registered mailbox domain names and website domain names are standardized: they are uniformly converted to upper-case or to lower-case letters, all punctuation marks are converted to half-width characters, and public domain names such as 163.com, qq.com, sina.com, gmail.com and sina.com.cn are deleted. Third, the registered telephone information is standardized: each telephone number is decomposed into an area code, a main number and an extension number.
Preferably, data cleaning is performed first on the equity data and senior management data in the enterprise business information, data standardization is performed next, and the knowledge graph is constructed from the cleaned and standardized data.
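The domain-name and telephone standardization described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names, the use of Unicode NFKC folding for the half-width conversion, the choice of lower case, and the assumed "area-main-extension" telephone layout are all assumptions made for the example.

```python
import re
import unicodedata

# Public domains named in the text; the set is illustrative, not exhaustive.
PUBLIC_DOMAINS = {"163.com", "qq.com", "sina.com", "gmail.com", "sina.com.cn"}

def normalize_domain(raw):
    """Fold full-width characters to half-width, lower-case the domain,
    and return None for public domains (which are dropped)."""
    domain = unicodedata.normalize("NFKC", raw).strip().lower()
    return None if domain in PUBLIC_DOMAINS else domain

def split_phone(raw):
    """Split a number written as area-main-extension (assumed layout)
    into area code, main number and extension."""
    parts = [p for p in re.split(r"\D+", unicodedata.normalize("NFKC", raw)) if p]
    if len(parts) == 3:
        area, main, ext = parts
    elif len(parts) == 2:
        area, main, ext = parts[0], parts[1], ""
    else:
        area, main, ext = "", "".join(parts), ""
    return {"area_code": area, "main_number": main, "extension": ext}
```

Comparing telephone numbers on `main_number` alone, as the later matching steps do, then becomes a plain string comparison.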
Referring to fig. 2, the construction of the knowledge graph specifically includes the following steps.
Step S210: Each enterprise in the enterprise business information, together with each of its direct shareholders and senior managers, is taken as a node in the graph. Each node has three attributes: entity ID, entity name and entity type. The entity ID is a unique ID assigned to each node as its unique identifier. The entity name is the name of an organization or of a natural person. Entity types include E, P, G, S, Z and so on, where E represents enterprises of all kinds, such as individually-owned businesses, sole proprietorships, partnerships and corporate legal persons; P represents a natural person; G represents a government agency; S represents a public institution; and Z represents a social organization. An enterprise node additionally has the following attributes: registered address, registered mailbox domain name, registered website domain name, registered telephone, past name information of the enterprise, and product name information of the enterprise.
Step S220: Based on the equity data of each enterprise, add a first-type edge representing a direct investment relationship between the enterprise node and each of its direct shareholder nodes. The first-type edge is directed; the direction may, for example, run from the direct shareholder to the enterprise, or the opposite way. The attribute of a first-type edge is the direct investment ratio.
Step S230: Based on the senior management data of each enterprise, add a second-type edge representing a senior management position relationship between the enterprise node and each of its senior manager nodes. The second-type edge may be directed or undirected. The attribute of a second-type edge is the title of the position held.
Step S240: Add a third-type edge representing a same-natural-person relationship between every two natural person nodes that have the same name and are in fact the same natural person. The third-type edge is preferably undirected. The attribute of a third-type edge is the same-natural-person relationship.
The execution order of steps S220 to S240 is not strictly limited: the three steps may be executed in any order, simultaneously, or interleaved.
Preferably, every edge in the knowledge graph has a type attribute that distinguishes first-type edges, second-type edges, and so on.
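Steps S210 to S240 can be sketched with plain dictionaries as follows. This is a minimal sketch under stated assumptions: the input record layout, the field names and the function `build_graph` are hypothetical, and the pairs of person nodes that are "actually the same natural person" are taken as an input, because the text does not say how that determination is made.

```python
def build_graph(companies, same_person_pairs):
    """companies: list of dicts with 'id', 'name', 'shareholders', 'executives'.
    same_person_pairs: person-node ID pairs already judged to be one natural person."""
    nodes, edges = {}, []
    for c in companies:
        nodes[c["id"]] = {"name": c["name"], "type": "E"}           # S210: enterprise node
        for s in c["shareholders"]:
            nodes.setdefault(s["id"], {"name": s["name"], "type": s["type"]})
            # S220: directed edge, shareholder -> enterprise, carrying the ratio
            edges.append({"type": 1, "src": s["id"], "dst": c["id"], "ratio": s["ratio"]})
        for e in c["executives"]:
            nodes.setdefault(e["id"], {"name": e["name"], "type": "P"})
            # S230: position edge carrying the job title
            edges.append({"type": 2, "src": e["id"], "dst": c["id"], "title": e["title"]})
    for a, b in same_person_pairs:
        # S240: undirected same-natural-person edge
        edges.append({"type": 3, "src": a, "dst": b, "relation": "same natural person"})
    return nodes, edges
```

The `type` field on every edge plays the role of the edge type attribute mentioned above; a graph database would store the same information as edge labels.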
Step S120: Based on the enterprise business information, add to the knowledge graph one or more edges representing that enterprises share an associated feature.
For example, when two enterprise nodes are each connected by a second-type edge to a senior manager node, and those senior manager nodes are the same node or are connected to each other by a third-type edge, a fourth-type edge representing a shared senior manager is added between the two enterprise nodes. The fourth-type edge is preferably undirected.
Step S170: Expand and update the knowledge graph based on the timeliness information. This step updates the node attributes in the knowledge graph, as well as the connections and attributes of the edges representing that enterprises share an associated feature, with particular attention to updating the timeliness attributes.
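The fourth-type edge rule can be sketched as below. The function name and the tuple-based inputs are hypothetical. One assumption goes slightly beyond the text: person nodes linked by third-type edges are merged with a union-find structure, which treats "same natural person" as transitive, whereas the text only mentions nodes directly connected by a third-type edge.

```python
from itertools import combinations

def same_executive_edges(exec_edges, same_person_edges):
    """exec_edges: (person_id, company_id) pairs (second-type edges).
    same_person_edges: (person_id, person_id) pairs (third-type edges).
    Returns the sorted company pairs that should receive a fourth-type edge."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_person_edges:
        parent[find(a)] = find(b)

    companies_by_person = {}
    for person, company in exec_edges:
        companies_by_person.setdefault(find(person), set()).add(company)

    pairs = set()
    for comps in companies_by_person.values():
        pairs.update(combinations(sorted(comps), 2))
    return sorted(pairs)
```

Each returned pair is unordered, matching the preference that fourth-type edges be undirected.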
For example, a real-time update mode is adopted, preferably for enterprise business data. The data to be updated is acquired and parsed into an agreed format; the corresponding nodes in the knowledge graph are queried; the fields to be updated are compared to decide whether and how to update; and finally the knowledge graph is updated.
As another example, an incremental update mode is adopted, preferably for the enterprise structured information and the enterprise unstructured information, for example as a scheduled incremental update every day. Newly added data is desensitized, aggregated and otherwise processed, then merged into the existing knowledge graph, and the connections and attributes of the edges representing that enterprises share an associated feature are updated. Because the enterprise structured information and the enterprise unstructured information carry timeliness information, the timeliness attributes of those edges must be updated at the same time. For example, the timeliness information of certain edges is the latest creation time of the associated business card templates; if the creation time of a newly added enterprise business card template is later than the creation time currently recorded, the timeliness attribute of those edges is updated accordingly.
As a further example, a full update mode is adopted. For the sake of stability, or in special cases, the knowledge graph may be fully updated either periodically or manually. From the full set of structured data, after data processing and big-data computation and analysis, an initial knowledge graph and the edges representing that enterprises share an associated feature are constructed. The full set of unstructured data is then converted into triple information, and the knowledge graph and the edges representing that enterprises share an associated feature are updated.
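The compare-then-update step of the real-time mode might look like the following sketch. The function name `apply_update` and the flat dictionary representation of a node are assumptions; the "agreed format" is simply taken to be a dictionary of field values.

```python
def apply_update(node, incoming):
    """Compare incoming fields with the stored node and write only the
    fields that differ. Returns the changed fields (empty if nothing changed)."""
    changed = {k: v for k, v in incoming.items() if node.get(k) != v}
    node.update(changed)
    return changed
```

Returning the changed fields lets the caller decide whether dependent edges also need to be recomputed.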
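The timeliness rule in the incremental example can be sketched as follows; the field name `latest_created` and the function name are hypothetical.

```python
from datetime import date

def update_edge_time(edge, new_template_created):
    """Advance the edge's timeliness attribute only when the newly added
    business card template was created later than the recorded time."""
    if new_template_created > edge["latest_created"]:
        edge["latest_created"] = new_template_created
        return True
    return False
```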
In the first embodiment, only the enterprise business information is utilized to construct the enterprise association relationship.
Referring to fig. 3, a second embodiment of the enterprise association relationship construction method provided by the present application includes the following steps.
Step S110: From the equity data and senior management data in the enterprise business information, construct, using a graph-computing data structure, a knowledge graph that reflects the shareholder investment relationships and senior management position relationships of enterprises and marks identical natural persons.
Step S120: Based on the enterprise business information, add to the knowledge graph one or more edges representing that enterprises share an associated feature.
Step S330: Based on the enterprise structured information, add attributes to the enterprise nodes in the knowledge graph. The enterprise structured information refers to information that is already well structured, such as intellectual property information of an enterprise and business card information of its employees, and is characterized by high accuracy.
For example, node attributes in the knowledge graph are extended based on the intellectual property information of enterprises.
Intellectual property information of enterprises, including trademarks, patents, software copyrights, copyrights in works, qualification certifications and the like, is collected, and a unique intellectual property ID is constructed for each piece of intellectual property information. Because intellectual property information is time-sensitive, three items of timeliness information are added to each piece: application time, authorization time and expiration time. The name of the enterprise that owns each piece of intellectual property information is looked up and searched against the entity name attribute and the past name information attribute of each enterprise node to find the corresponding enterprise node. An intellectual property attribute is added to that enterprise node, the intellectual property information is added under the attribute, and the knowledge graph is updated. The intellectual property attribute includes the intellectual property type, application time, authorization time, expiration time and the like. Because the intellectual property attribute of an enterprise node carries timeliness information, that timeliness information is also updated into the knowledge graph.
Preferably, if a piece of intellectual property information is matched to a plurality of enterprise nodes at the same time, the piece of intellectual property information is added to intellectual property attributes of the plurality of enterprise nodes at the same time.
Preferably, if the intellectual property attribute of an enterprise node contains a plurality of pieces of intellectual property information, they are aggregated into an array, and the array is added under the intellectual property attribute of the enterprise node to update the knowledge graph.
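The matching and aggregation just described can be sketched as follows. The record fields and the function `attach_ip` are hypothetical, and exact string equality stands in for whatever name matching the real system uses.

```python
def attach_ip(nodes, ip_records):
    """nodes: {node_id: {'name': str, 'past_names': [str], ...}}.
    ip_records: dicts with 'ip_id', 'owner', 'type', 'applied', 'granted', 'expires'.
    Each record is appended to the array of every enterprise node whose
    entity name or past names match the owner name."""
    for rec in ip_records:
        for node in nodes.values():
            if rec["owner"] == node["name"] or rec["owner"] in node.get("past_names", []):
                node.setdefault("intellectual_property", []).append(rec)
    return nodes
```

A record that matches several nodes is added to each of them, and a node with several records ends up with an array, as in the two preferred variants above.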
As another example, node attributes in the knowledge-graph are extended based on business card information for business employees.
Business card information of enterprise employees is collected, and personal privacy information such as employee names, positions, departments, personal telephone numbers and personal mailboxes is removed, yielding desensitized business card information. The remaining business card information is public information of the enterprise, mainly the enterprise name, address, mailbox domain name, website domain name, enterprise telephone and the like. Preferably, data cleaning and data standardization are performed on this data. A creation time is added to each piece of business card information to reflect timeliness. A hash value is calculated over the public information of each business card, and business cards whose public information has the same hash value are aggregated to obtain enterprise business card template information. The enterprise business card template information carries all of the public information of the business cards together with timeliness information; its creation time is the earliest creation time among all the aggregated business cards. For each piece of enterprise business card template information, the enterprise name is searched against one or more of the following attributes of each enterprise node: the entity name attribute, the past name information attribute, the trademark information in the intellectual property attribute, and the product name information attribute. Matching follows the priority entity name > past name > trademark > product name, and the enterprise node corresponding to the enterprise business card template information is thereby found.
A business card template attribute, which includes the enterprise name, address, mailbox domain name, website domain name, enterprise telephone, creation time and the like, is then added to the corresponding enterprise node, and the knowledge graph is updated. Because the business card template attribute of an enterprise node carries timeliness information, that timeliness information is also updated into the knowledge graph.
Preferably, if a plurality of pieces of enterprise business card template information match the same enterprise node, they are aggregated into an array and then added under the business card template attribute of that enterprise node.
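The hash-based aggregation and the priority matching can be sketched as below. All names are hypothetical; SHA-256 is an arbitrary choice because the text does not name a hash function, and creation times are assumed to be ISO date strings so that string comparison follows chronological order.

```python
import hashlib

PUBLIC_FIELDS = ("company", "address", "email_domain", "website_domain", "phone")

def build_templates(cards):
    """cards: desensitized dicts with PUBLIC_FIELDS plus 'created' (ISO date string).
    Cards with identical public information collapse into one template."""
    templates = {}
    for card in cards:
        public = {f: card.get(f, "") for f in PUBLIC_FIELDS}
        key = hashlib.sha256("|".join(public.values()).encode("utf-8")).hexdigest()
        tpl = templates.setdefault(key, {**public, "created": card["created"]})
        tpl["created"] = min(tpl["created"], card["created"])  # earliest creation time
    return list(templates.values())

def match_node(company_name, nodes):
    """Match by priority: entity name > past name > trademark > product name.
    Returns the node IDs found at the first priority level that has a hit."""
    checks = (
        lambda n: company_name == n["name"],
        lambda n: company_name in n.get("past_names", []),
        lambda n: company_name in n.get("trademarks", []),
        lambda n: company_name in n.get("product_names", []),
    )
    for check in checks:
        hits = [nid for nid, n in nodes.items() if check(n)]
        if hits:
            return hits
    return []
```

Stopping at the first level with a hit is one reading of "priority"; a system that scores all levels and ranks them would be another.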
Step S340: Based on the enterprise structured information, add to the knowledge graph one or more edges representing that enterprises share an associated feature.
For example, when the intellectual property attributes of two enterprise nodes contain at least one identical piece of intellectual property information, as judged by intellectual property ID, a fifth-type edge representing shared intellectual property is added between the two enterprise nodes. The fifth-type edge has timeliness attributes: among the identical intellectual property information in the intellectual property attributes of the two enterprise nodes connected by a fifth-type edge, the earliest application time, the earliest authorization time and the latest expiration time are taken as the timeliness attributes of that edge.
As another example, when the registered address attributes and the business card template attributes of two enterprise nodes contain at least one identical or similar address, where similar means that the two enterprises are located in the same office building, a sixth-type edge representing a common address is added between the two enterprise nodes. The sixth-type edge has a timeliness attribute: if the business card template attributes of the two enterprise nodes connected by a sixth-type edge contain identical or similar address information, the latest creation time in the business card template attributes of the two nodes is taken as the timeliness attribute of that edge.
As another example, when the sets formed by the registered mailbox domain name attribute and the business card template attribute of two enterprise nodes contain at least one identical mailbox domain name, a seventh-type edge representing a shared mailbox domain name is added between the two enterprise nodes. The seventh-type edge has a timeliness attribute: if the business card template attributes of the two enterprise nodes connected by a seventh-type edge contain the same mailbox domain name, the latest creation time in the business card template attributes of the two nodes is taken as the timeliness attribute of that edge.
As another example, when the sets formed by the registered website domain name attribute and the business card template attribute of two enterprise nodes contain at least one identical website domain name, an eighth-type edge representing a shared website domain name is added between the two enterprise nodes. The eighth-type edge has a timeliness attribute: if the business card template attributes of the two enterprise nodes connected by an eighth-type edge contain the same website domain name, the latest creation time in the business card template attributes of the two nodes is taken as the timeliness attribute of that edge.
As another example, when the sets formed by the registered telephone attribute and the business card template attribute of two enterprise nodes contain at least one identical telephone, where identical means that the main numbers are the same after the area code and the extension number are removed, a ninth-type edge representing a shared telephone is added between the two enterprise nodes. The ninth-type edge has a timeliness attribute: if the business card template attributes of the two enterprise nodes connected by a ninth-type edge contain the same telephone, the latest creation time in the business card template attributes of the two nodes is taken as the timeliness attribute of that edge.
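The fifth-type edge and its three timeliness attributes can be sketched as follows. The function and field names are hypothetical, and the times are assumed to be ISO date strings so that `min` and `max` follow chronological order.

```python
from itertools import combinations

def same_ip_edges(ip_by_node):
    """ip_by_node: {node_id: [record, ...]}, each record having
    'ip_id', 'applied', 'granted', 'expires' (ISO date strings)."""
    edges = []
    for a, b in combinations(sorted(ip_by_node), 2):
        ids_b = {r["ip_id"] for r in ip_by_node[b]}
        shared = [r for r in ip_by_node[a] if r["ip_id"] in ids_b]  # judged by IP ID
        if shared:
            edges.append({
                "type": 5,
                "nodes": (a, b),
                "earliest_applied": min(r["applied"] for r in shared),
                "earliest_granted": min(r["granted"] for r in shared),
                "latest_expires": max(r["expires"] for r in shared),
            })
    return edges
```

Comparing every pair of nodes is quadratic and only suitable for illustration; indexing nodes by intellectual property ID would avoid the pairwise scan.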
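The seventh- to ninth-type edges follow one pattern: pool the registered value with the values in the business card templates, intersect the pools of two nodes, and record the latest template creation time. A minimal sketch of that pattern follows. The function name and node layout are hypothetical, and taking the latest creation time only over templates that actually hold a shared value is one interpretation of the text.

```python
def shared_value_edge(node_a, node_b, registered_key, template_key, edge_type):
    """Return an edge dict if the two nodes share a value, else None.
    Nodes are dicts with an optional registered value and a 'templates' list
    whose items carry the same kind of value plus 'created' (ISO date string)."""
    def values(node):
        vals = {node[registered_key]} if node.get(registered_key) else set()
        vals.update(t[template_key] for t in node.get("templates", []) if t.get(template_key))
        return vals

    common = values(node_a) & values(node_b)
    if not common:
        return None
    times = [t["created"]
             for n in (node_a, node_b)
             for t in n.get("templates", [])
             if t.get(template_key) in common]
    return {"type": edge_type, "shared": sorted(common),
            "latest_created": max(times) if times else None}
```

For the ninth-type edge the compared value would be the main number with the area code and extension already removed; the sixth-type edge needs an extra similarity test for addresses in the same office building, which this sketch does not cover.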
Step S170: Expand and update the knowledge graph based on the timeliness information.
In the second embodiment, the enterprise business information and the enterprise structured information are used together to construct the enterprise association relationship. Step S120 forms a first group and steps S330 to S340 form a second group; the execution order of the two groups is not strictly limited, and they may be executed in either order, simultaneously, or interleaved.
Referring to fig. 4, a third embodiment of the enterprise association relationship construction method provided by the present application includes the following steps.
Step S110: From the equity data and senior management data in the enterprise business information, construct, using a graph-computing data structure, a knowledge graph that reflects the shareholder investment relationships and senior management position relationships of enterprises and marks identical natural persons.
Step S120: Based on the enterprise business information, add to the knowledge graph one or more edges representing that enterprises share an associated feature.
Step S430: Extract structured triple information from the enterprise unstructured information. The enterprise unstructured information refers to free-text information relating to enterprises, which must be converted, through techniques such as natural language processing, into structured information that a computer can read and understand. Errors may occur during this structuring, so unstructured information generally serves to enrich and supplement the construction of enterprise association relationships.
In this step, machine learning techniques are used to extract the required entity information from free text relating to enterprises. For example, the BERT, BiLSTM and CRF algorithms are used to extract enterprise names as entities from court judgment documents and bidding documents. Machine learning techniques are also used to classify the relationship between two entities in the free text; for example, the BERT and MLP algorithms are used to determine the relationship between any two entities in court judgment documents and bidding documents. Because many pairs of entities have no relationship, "no relationship" is one of the possible outputs of the BERT and MLP algorithms. A pair of entities and the relationship between them constitute one piece of structured triple information. Because unstructured data is time-sensitive, timeliness information, namely a creation time, is also added for each relationship.
For example, triple information is extracted from the court judgment documents of enterprises. A triple is defined as entity-relationship-entity, where the relationship is one of: co-plaintiff relationship, co-defendant relationship, or plaintiff-defendant relationship.
As another example, triple information is extracted from the bidding documents of enterprises. A triple is defined as entity-relationship-entity, where the relationship is one of: common bidder relationship, common winning bidder relationship, or winning bidder relationship.
Step S440: Based on the enterprise unstructured information, add to the knowledge graph one or more edges representing that enterprises share an associated feature.
For each entity in the triple information, one or more of the following attributes of each enterprise node in the knowledge graph are searched: the entity name attribute, the past name information attribute, the trademark information in the intellectual property attribute, and the product name information attribute. Matching follows the priority entity name > past name > trademark > product name, and the two enterprise nodes corresponding to the triple information are thereby found.
When two enterprise nodes correspond to the same piece of triple information, a tenth-type edge representing a relationship derived from unstructured information is added between the two enterprise nodes.
In one implementation, only one tenth-type edge is added between two enterprise nodes no matter how many pieces of triple information correspond to them. The attribute of the tenth-type edge is the union of the relationships in all the triple information corresponding to the two enterprise nodes. The tenth-type edge has timeliness attributes: the earliest creation time and the latest creation time among the triple information corresponding to the two enterprise nodes.
In another implementation, if two enterprise nodes correspond to a plurality of pieces of triple information, a plurality of tenth-type edges are added between them. The attribute of each tenth-type edge is the relationship in one piece of triple information corresponding to the two enterprise nodes. Each tenth-type edge has a timeliness attribute, namely the creation time of the corresponding triple information.
Step S170: Expand and update the knowledge graph based on the timeliness information.
In the third embodiment, the enterprise business information and the enterprise unstructured information are used together to construct the enterprise association relationship. Step S120 forms a first group and steps S430 to S440 form a third group; the execution order of the two groups is not strictly limited, and they may be executed in either order, simultaneously, or interleaved.
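The two implementations of the tenth-type edge can be compared in a short sketch. The tuple layout of a triple and both function names are hypothetical; the entities are assumed to have been resolved to enterprise node IDs already, creation times are ISO date strings, and edges are treated as undirected by sorting the node pair, which the text does not specify.

```python
def merge_triples(triples):
    """First implementation: one edge per node pair, carrying the union of
    relationships plus the earliest and latest creation times.
    triples: (node_a, relation, node_b, created) tuples."""
    edges = {}
    for a, rel, b, created in triples:
        key = tuple(sorted((a, b)))
        e = edges.setdefault(key, {"type": 10, "relations": set(),
                                   "earliest": created, "latest": created})
        e["relations"].add(rel)
        e["earliest"] = min(e["earliest"], created)
        e["latest"] = max(e["latest"], created)
    return edges

def one_edge_per_triple(triples):
    """Second implementation: one edge per triple, each with its own creation time."""
    return [{"type": 10, "nodes": tuple(sorted((a, b))), "relation": rel, "created": created}
            for a, rel, b, created in triples]
```

The merged form keeps the graph sparse and suits queries such as "are these two enterprises related at all", while the per-triple form preserves when each individual relationship was observed.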
Referring to fig. 5, a fourth embodiment of the method for constructing an enterprise association relationship provided by the present application includes the following steps.
Step S110: From the equity data and senior management data in the enterprise business information, construct, using a graph-computing data structure, a knowledge graph that reflects the shareholder investment relationships and senior management position relationships of enterprises and marks identical natural persons.
Step S120: Based on the enterprise business information, add to the knowledge graph one or more edges representing that enterprises share an associated feature.
Step S330: Based on the enterprise structured information, add attributes to the enterprise nodes in the knowledge graph. The enterprise structured information refers to information that is already well structured, such as intellectual property information of an enterprise and business card information of its employees, and is characterized by high accuracy.
Step S340: Based on the enterprise structured information, add to the knowledge graph one or more edges representing that enterprises share an associated feature.
Step S430: Extract structured triple information from the enterprise unstructured information.
Step S440: Based on the enterprise unstructured information, add to the knowledge graph one or more edges representing that enterprises share an associated feature.
Step S170: Expand and update the knowledge graph based on the timeliness information.
In the fourth embodiment, the enterprise business information, the enterprise structured information and the enterprise unstructured information are used together to construct the enterprise association relationship. Step S120 forms a first group, steps S330 to S340 form a second group, and steps S430 to S440 form a third group; the execution order of the three groups is not strictly limited, and they may be executed in any order, simultaneously, or interleaved.
Referring to fig. 6, corresponding to the first embodiment of the enterprise association relationship construction method, the present application further provides a first embodiment of an enterprise association relationship construction system. The enterprise association relationship construction system 600 comprises a graph construction module 610, a first extension module 620 and a timeliness update module 670.
The graph construction module 610 is used to construct a knowledge graph, using a graph-computation data structure, from the equity data and executive data in the enterprise business information; the knowledge graph reflects the shareholder investment and executive position relationships of enterprises and marks nodes that are the same natural person. In the constructed knowledge graph, each enterprise, its direct shareholders and its executives are each a node. Each node has three attributes: entity ID, entity name and entity type. An enterprise node additionally has the following attributes: registered business address, registered mailbox domain name, registered website domain name, registered telephone, former names of the enterprise, and product names of the enterprise. The direct investment relationship and the direct investment proportion from a direct shareholder node to an enterprise node are represented by a directed first-class edge. The position relationship of an executive node at an enterprise node, together with the position held, is represented by a second-class edge. The same-natural-person relationship is represented by a third-class edge.
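The node and edge layout described above can be sketched in Python as follows. This is only an illustrative sketch: the class names, the `attrs` dictionaries, the `proportion` and `position` keys, and the sample entities are assumptions made here for illustration and do not come from the application, which stores the graph in a graph database.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    ENTERPRISE = "enterprise"
    PERSON = "person"


class EdgeType(Enum):
    INVESTMENT = 1    # first class: direct shareholder -> enterprise (directed)
    POSITION = 2      # second class: executive -> enterprise
    SAME_PERSON = 3   # third class: two nodes are the same natural person


@dataclass
class Node:
    entity_id: str
    entity_name: str
    entity_type: EntityType
    # Extra attributes (e.g. registered address, domains) for enterprise nodes.
    attrs: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    edge_type: EdgeType
    attrs: dict = field(default_factory=dict)


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.entity_id] = node

    def add_edge(self, edge):
        # Reject edges whose endpoints have not been registered as nodes.
        if edge.src not in self.nodes or edge.dst not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(edge)


def build_example():
    """Build a tiny graph with hypothetical sample data."""
    kg = KnowledgeGraph()
    kg.add_node(Node("E1", "Example Co.", EntityType.ENTERPRISE,
                     {"registered_website_domain": "example.com"}))
    kg.add_node(Node("P1", "Zhang San", EntityType.PERSON))  # shareholder record
    kg.add_node(Node("P2", "Zhang San", EntityType.PERSON))  # executive record
    kg.add_edge(Edge("P1", "E1", EdgeType.INVESTMENT, {"proportion": 0.6}))
    kg.add_edge(Edge("P2", "E1", EdgeType.POSITION, {"position": "director"}))
    kg.add_edge(Edge("P1", "P2", EdgeType.SAME_PERSON))
    return kg
```

Keeping the edge type as an explicit attribute mirrors the requirement that each edge carry a type so that different classes of edges can be told apart.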
The first extension module 620 is used to add to the knowledge graph, based on the enterprise business information, one or more edges representing association features between enterprises.
The timeliness update module 670 is used to extend and update the knowledge graph based on timeliness information.
In the first embodiment, only the enterprise business information is utilized to construct the enterprise association relationship.
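As an illustration of the kind of edge the first extension module derives, the following Python sketch implements the same-executive rule stated in claim 10: two enterprises receive a fourth-class edge when they share an executive node, or when their executive nodes are linked by a third-class (same natural person) edge. The function name and the tuple-based inputs are hypothetical. The sketch also merges same-person links transitively with a union-find structure, which is an assumption beyond the wording of the claim.

```python
from collections import defaultdict
from itertools import combinations


def same_executive_edges(position_edges, same_person_pairs):
    """Return enterprise pairs that should get a fourth-class edge.

    position_edges: iterable of (executive_id, enterprise_id) second-class edges.
    same_person_pairs: iterable of (person_id, person_id) third-class edges.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge person nodes marked as the same natural person.
    for a, b in same_person_pairs:
        parent[find(a)] = find(b)

    # Group enterprises by the natural person holding a position there.
    enterprises_by_person = defaultdict(set)
    for person, enterprise in position_edges:
        enterprises_by_person[find(person)].add(enterprise)

    # Every pair of enterprises sharing a person gets one undirected edge.
    pairs = set()
    for enterprises in enterprises_by_person.values():
        pairs.update(combinations(sorted(enterprises), 2))
    return sorted(pairs)
```

For example, with positions `[("P1", "E1"), ("P1", "E2"), ("P2", "E3"), ("P3", "E4")]` and the same-person pair `("P2", "P3")`, the sketch pairs E1 with E2 through the shared node P1, and E3 with E4 through the same-person link.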
Referring to fig. 7, corresponding to the second embodiment of the enterprise association relationship construction method, the present application further provides a second embodiment of an enterprise association relationship construction system. The enterprise association relationship construction system 700 comprises a graph construction module 610, a first extension module 620, a second extension module 730, a third extension module 740 and a timeliness update module 670. Modules 610, 620 and 670 are the same as in the first embodiment and are not described again.
The second extension module 730 is used to extend the attributes of the enterprise nodes in the knowledge graph based on the enterprise structured information.
The third extension module 740 is used to add to the knowledge graph, based on the enterprise structured information, one or more edges representing association features between enterprises.
In the second embodiment, the enterprise business information and the enterprise structured information are used together to construct the enterprise association relationship.
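One way the second extension module might derive enterprise business card templates, following the procedure in claim 16 (strip personal information, hash the remaining public information, aggregate cards with equal hashes, keep the earliest creation time), is sketched below. The field names, the choice of SHA-256, and the ISO-format date strings are assumptions made for illustration; the application only states that a hash value is computed.

```python
import hashlib
import json

# Hypothetical names for the public (non-personal) fields of a business card.
PUBLIC_FIELDS = ("company", "address", "email_domain", "website_domain", "phone")


def template_key(card):
    """Hash only the public fields, so personal data never affects the key."""
    public = {k: card.get(k, "") for k in PUBLIC_FIELDS}
    payload = json.dumps(public, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def aggregate_templates(cards):
    """Group cards with identical public information into templates.

    Each card is a dict with the public fields plus a "created" date in
    ISO format (YYYY-MM-DD), which compares correctly as a string.
    """
    templates = {}
    for card in cards:
        key = template_key(card)
        if key not in templates:
            template = {k: card.get(k, "") for k in PUBLIC_FIELDS}
            template["created"] = card["created"]
            templates[key] = template
        else:
            # The template's creation time is the earliest among its cards.
            existing = templates[key]
            existing["created"] = min(existing["created"], card["created"])
    return list(templates.values())
```

Because only the public fields are copied into a template, personal details such as the card holder's name are dropped as a side effect of aggregation.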
Referring to fig. 8, corresponding to the third embodiment of the enterprise association relationship construction method, the present application further provides a third embodiment of an enterprise association relationship construction system. The enterprise association relationship construction system 800 comprises a graph construction module 610, a first extension module 620, an information extraction module 830, a fourth extension module 840 and a timeliness update module 670. Modules 610, 620 and 670 are the same as in the first embodiment and are not described again.
The information extraction module 830 is used to extract structured triple information from the enterprise unstructured information.
The fourth extension module 840 is used to add to the knowledge graph, based on the enterprise unstructured information, one or more edges representing association features between enterprises.
In the third embodiment, the enterprise business information and the enterprise unstructured information are used together to construct the enterprise association relationship.
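The work of the fourth extension module can be illustrated with the merging variant described in claim 24, in which all triples that map to the same pair of enterprise nodes are folded into a single tenth-class edge whose attribute is the union of their relationships and whose timeliness attribute is the earliest and latest creation time. The following Python sketch is written under stated assumptions: the function name, the 4-tuple input format, the `name_to_node` lookup table, and the treatment of the edge as undirected are all hypothetical.

```python
def merge_triples(triples, name_to_node):
    """Fold extracted triples into at most one edge per enterprise pair.

    triples: iterable of (entity_a, relation, entity_b, created), where
             created is an ISO-format date string (YYYY-MM-DD).
    name_to_node: maps an enterprise name to its node ID in the graph.
    """
    edges = {}
    for entity_a, relation, entity_b, created in triples:
        node_a = name_to_node.get(entity_a)
        node_b = name_to_node.get(entity_b)
        # Skip triples whose entities cannot be matched to two distinct nodes.
        if node_a is None or node_b is None or node_a == node_b:
            continue
        key = tuple(sorted((node_a, node_b)))
        edge = edges.setdefault(
            key, {"relations": set(), "earliest": created, "latest": created}
        )
        edge["relations"].add(relation)
        edge["earliest"] = min(edge["earliest"], created)
        edge["latest"] = max(edge["latest"], created)
    return edges
```

The alternative in claim 25, one tenth-class edge per triple, would instead append a separate edge record for each triple rather than merging by node pair.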
Referring to fig. 9, corresponding to the fourth embodiment of the enterprise association relationship construction method, the present application further provides a fourth embodiment of an enterprise association relationship construction system. The enterprise association relationship construction system 900 comprises a graph construction module 610, a first extension module 620, a second extension module 730, a third extension module 740, an information extraction module 830, a fourth extension module 840 and a timeliness update module 670. Modules 610, 620 and 670 are the same as in the first embodiment, modules 730 and 740 are the same as in the second embodiment, and modules 830 and 840 are the same as in the third embodiment; they are not described again.
In the fourth embodiment, the enterprise business information, the enterprise structured information and the enterprise unstructured information are simultaneously utilized to construct the enterprise association relationship.
Based on a graph database, the method and system construct and store a knowledge graph that reflects the equity investment and executive position relationships of enterprises and marks nodes that are the same natural person, and extend the knowledge graph with node attributes and edges representing enterprise association features based on the enterprise business information, the enterprise structured information and the enterprise unstructured information. This facilitates the subsequent computation and storage, by graph computation, of enterprise association relationships and suspected enterprise association relationships.
The above are merely preferred embodiments of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (32)

1. An enterprise association relationship construction method, characterized by comprising the following steps:
step S110: constructing, using a graph-computation data structure, a knowledge graph from the equity data and executive data in the enterprise business information, the knowledge graph reflecting the shareholder investment and executive position relationships of enterprises and marking nodes that are the same natural person;
step S120: adding to the knowledge graph, based on the enterprise business information, one or more edges representing association features between enterprises;
step S170: extending and updating the knowledge graph based on timeliness information.
2. The method as claimed in claim 1, wherein in step S110, the equity data and executive data in the enterprise business information are first cleaned, and the knowledge graph is then constructed from the cleaned data.
3. The method as claimed in claim 2, wherein the data cleaning comprises one or more of: validity cleaning of basic enterprise attributes, validity checking and cleaning of equity proportions, validity cleaning of executive data, data consistency checking, invalid data removal, and missing data filling.
4. The method as claimed in claim 1, wherein in step S110, the equity data and executive data in the enterprise business information are first standardized, and the knowledge graph is then constructed from the standardized data.
5. The method of claim 4, wherein the data standardization comprises one or more of the following operations: first, decomposing and standardizing the address information registered by the enterprise; second, standardizing the mailbox domain names and website domain names registered by the enterprise, and removing public domain names; third, standardizing the telephone information registered by the enterprise.
6. The method as claimed in claim 2 or 4, wherein the equity data and executive data in the enterprise business information are first cleaned and then standardized, and the knowledge graph is then constructed from the cleaned and standardized data.
7. The method as claimed in claim 1, wherein in the knowledge graph, each enterprise, its direct shareholders and its executives are each a node; the direct investment relationship from a direct shareholder node to an enterprise node is represented by a first-class edge; the executive position relationship of an executive node at an enterprise node is represented by a second-class edge; the same-natural-person relationship is represented by a third-class edge.
8. The method of claim 7, wherein each node has three attributes: entity ID, entity name and entity type; an enterprise node additionally has the following attributes: registered business address, registered mailbox domain name, registered website domain name, registered telephone, former names of the enterprise, and product names of the enterprise.
9. The method of claim 7, wherein each edge has a type attribute for distinguishing edges of different types.
10. The method according to claim 1, wherein in step S120, when the executive nodes connected to any two enterprise nodes by second-class edges are the same node, or are connected to each other by a third-class edge, a fourth-class edge representing the same executive is added between the two enterprise nodes.
11. The method according to claim 1, wherein in step S170, the updating mode comprises one or a combination of real-time updating, incremental updating and full updating.
12. The method as claimed in claim 11, wherein a real-time updating mode is adopted for the enterprise business data.
13. The method according to claim 11, wherein an incremental updating mode is adopted for the enterprise structured information and the enterprise unstructured information, comprising updating, in the knowledge graph, the connections of the edges representing association features between enterprises and the attributes of those edges; the attributes of an edge include a timeliness attribute.
14. An enterprise association relationship construction method, characterized by comprising the following steps:
step S110: constructing, using a graph-computation data structure, a knowledge graph from the equity data and executive data in the enterprise business information, the knowledge graph reflecting the shareholder investment and executive position relationships of enterprises and marking nodes that are the same natural person;
step S120: adding to the knowledge graph, based on the enterprise business information, one or more edges representing association features between enterprises;
step S330: extending the attributes of the enterprise nodes in the knowledge graph based on the enterprise structured information;
step S340: adding to the knowledge graph, based on the enterprise structured information, one or more edges representing association features between enterprises;
step S170: extending and updating the knowledge graph based on timeliness information.
15. The method according to claim 14, wherein in step S330, the node attributes in the knowledge graph are extended based on the intellectual property information of enterprises;
intellectual property information of enterprises is collected, and three items of timeliness information are added to each piece of intellectual property information: application time, authorization time and expiration time; the name of the enterprise to which each piece of intellectual property information belongs is looked up and searched for in the entity name attribute and the former-name attribute of each enterprise node to find the enterprise node corresponding to that intellectual property information; an intellectual property attribute is added to the corresponding enterprise node, and the intellectual property information is added to that attribute; the intellectual property attribute includes the intellectual property type, application time, authorization time and expiration time.
16. The method according to claim 14, wherein in step S330, the node attributes in the knowledge graph are extended based on the business card information of enterprise employees;
business card information of enterprise employees is collected and personal privacy information is removed, the remaining business card information being public information of the enterprise; a creation time is added to each piece of business card information; a hash value is calculated over the public information of each business card, and the public information of business cards with the same hash value is aggregated to obtain enterprise business card template information; the creation time of a piece of enterprise business card template information is the earliest creation time among all the aggregated business cards; for each piece of enterprise business card template information, the enterprise name is searched for in one or more of the entity name attribute, the former-name attribute, the trademark information in the intellectual property attribute and the product name attribute of each enterprise node, to find the enterprise node corresponding to the template information; a business card template attribute is added to the corresponding enterprise node, the business card template attribute comprising the enterprise name, address, mailbox domain name, website domain name, enterprise telephone and creation time.
17. The method according to claim 14, wherein step S340 includes any one or more of the following operations:
when the intellectual property attributes of any two enterprise nodes contain at least one identical piece of intellectual property information, judged by intellectual property ID, a fifth-class edge representing the same intellectual property is added between the two enterprise nodes;
when the registered business address attributes and business card template attributes of any two enterprise nodes contain at least one identical or similar address, where similar means that the two enterprises are located in the same office building, a sixth-class edge representing a common address is added between the two enterprise nodes;
when the sets formed by the registered mailbox domain name attribute and the business card template attribute of any two enterprise nodes share at least one identical mailbox domain name, a seventh-class edge representing the same mailbox domain name is added between the two enterprise nodes; the seventh-class edge has a timeliness attribute;
when the sets formed by the registered website domain name attribute and the business card template attribute of any two enterprise nodes share at least one identical website domain name, an eighth-class edge representing the same website domain name is added between the two enterprise nodes; the eighth-class edge has a timeliness attribute;
when the sets formed by the registered telephone attribute and the business card template attribute of any two enterprise nodes share at least one identical telephone, meaning that the main telephone numbers are the same after the area code and the extension are removed, a ninth-class edge representing the same telephone is added between the two enterprise nodes.
18. The method of claim 17, wherein the fifth-class to ninth-class edges have timeliness attributes;
for the two enterprise nodes connected by any fifth-class edge, the earliest application time, the earliest authorization time and the latest expiration time of the identical intellectual property information in their intellectual property attributes are taken as the timeliness attribute of the fifth-class edge;
if the business card template attributes of the two enterprise nodes connected by any sixth-class edge contain identical or similar address information, the latest creation time in the business card template attributes of the two enterprise nodes is taken as the timeliness attribute of the sixth-class edge;
if the business card template attributes of the two enterprise nodes connected by any seventh-class edge contain the same mailbox domain name, the latest creation time in the business card template attributes of the two enterprise nodes is taken as the timeliness attribute of the seventh-class edge;
if the business card template attributes of the two enterprise nodes connected by any eighth-class edge contain the same website domain name, the latest creation time in the business card template attributes of the two enterprise nodes is taken as the timeliness attribute of the eighth-class edge;
if the business card template attributes of the two enterprise nodes connected by any ninth-class edge contain the same telephone, the latest creation time in the business card template attributes of the two enterprise nodes is taken as the timeliness attribute of the ninth-class edge.
19. The method as claimed in claim 14, wherein step S120 forms a first group and steps S330 to S340 form a second group, and the two groups are performed in either order, simultaneously, or interleaved.
20. An enterprise association relationship construction method, characterized by comprising the following steps:
step S110: constructing, using a graph-computation data structure, a knowledge graph from the equity data and executive data in the enterprise business information, the knowledge graph reflecting the shareholder investment and executive position relationships of enterprises and marking nodes that are the same natural person;
step S120: adding to the knowledge graph, based on the enterprise business information, one or more edges representing association features between enterprises;
step S430: extracting structured triple information from the enterprise unstructured information;
step S440: adding to the knowledge graph, based on the enterprise unstructured information, one or more edges representing association features between enterprises;
step S170: extending and updating the knowledge graph based on timeliness information.
21. The method as claimed in claim 20, wherein in step S430, the triple information is extracted from judgment documents involving the enterprise; a triple is defined as: entity-relationship-entity; the relationship here is one of: co-plaintiff relationship, co-defendant relationship, plaintiff-defendant relationship.
22. The method as claimed in claim 20, wherein in step S430, the triple information is extracted from bidding documents involving the enterprise; a triple is defined as: entity-relationship-entity; the relationship here is one of: co-bidder relationship, co-winning-bidder relationship, winning-bidder relationship.
23. The method according to claim 20, wherein in step S440, each entity in the triple information is searched for in one or more of the entity name attribute, the former-name attribute, the trademark information in the intellectual property attribute and the product name attribute of each enterprise node in the knowledge graph, to find the two enterprise nodes corresponding to the triple information; when any two enterprise nodes correspond to the same triple information, a tenth-class edge representing an unstructured-information relationship is added between the two enterprise nodes.
24. The method according to claim 23, wherein no matter how many pieces of triple information correspond to two enterprise nodes, only one tenth-class edge is added between the two enterprise nodes; the attribute of the tenth-class edge is the union of the relationships of all the triple information corresponding to the two enterprise nodes; the tenth-class edge has a timeliness attribute: the earliest creation time and the latest creation time of the triple information corresponding to the two enterprise nodes.
25. The method according to claim 23, wherein when a plurality of pieces of triple information correspond to two enterprise nodes, a plurality of tenth-class edges are added between the two enterprise nodes; the attribute of each tenth-class edge is the relationship of one piece of triple information corresponding to the two enterprise nodes; each tenth-class edge has a timeliness attribute, namely the creation time of the corresponding triple information.
26. The method as claimed in claim 20, wherein step S120 forms a first group and steps S430 to S440 form a third group, and the two groups are performed in either order, simultaneously, or interleaved.
27. An enterprise association relationship construction method, characterized by comprising the following steps:
step S110: constructing, using a graph-computation data structure, a knowledge graph from the equity data and executive data in the enterprise business information, the knowledge graph reflecting the shareholder investment and executive position relationships of enterprises and marking nodes that are the same natural person;
step S120: adding to the knowledge graph, based on the enterprise business information, one or more edges representing association features between enterprises;
step S130: extending the attributes of the enterprise nodes in the knowledge graph based on the enterprise structured information;
step S140: adding to the knowledge graph, based on the enterprise structured information, one or more edges representing association features between enterprises;
step S150: extracting structured triple information from the enterprise unstructured information;
step S160: adding to the knowledge graph, based on the enterprise unstructured information, one or more edges representing association features between enterprises;
step S170: extending and updating the knowledge graph based on timeliness information.
28. The method as claimed in claim 27, wherein step S120 forms a first group, steps S130 to S140 form a second group, and steps S150 to S160 form a third group, and the three groups are performed in any order, simultaneously, or interleaved.
29. An enterprise association relationship construction system, characterized by comprising a graph construction module, a first extension module and a timeliness update module;
the graph construction module is used to construct, using a graph-computation data structure, a knowledge graph from the equity data and executive data in the enterprise business information, the knowledge graph reflecting the shareholder investment and executive position relationships of enterprises and marking nodes that are the same natural person;
the first extension module is used to add to the knowledge graph, based on the enterprise business information, one or more edges representing association features between enterprises;
the timeliness update module is used to extend and update the knowledge graph based on timeliness information.
30. An enterprise association relationship construction system, characterized by comprising a graph construction module, a first extension module, a second extension module, a third extension module and a timeliness update module;
the graph construction module is used to construct, using a graph-computation data structure, a knowledge graph from the equity data and executive data in the enterprise business information, the knowledge graph reflecting the shareholder investment and executive position relationships of enterprises and marking nodes that are the same natural person;
the first extension module is used to add to the knowledge graph, based on the enterprise business information, one or more edges representing association features between enterprises;
the second extension module is used to extend the attributes of the enterprise nodes in the knowledge graph based on the enterprise structured information;
the third extension module is used to add to the knowledge graph, based on the enterprise structured information, one or more edges representing association features between enterprises;
the timeliness update module is used to extend and update the knowledge graph based on timeliness information.
31. An enterprise association relationship construction system, characterized by comprising a graph construction module, a first extension module, an information extraction module, a fourth extension module and a timeliness update module;
the graph construction module is used to construct, using a graph-computation data structure, a knowledge graph from the equity data and executive data in the enterprise business information, the knowledge graph reflecting the shareholder investment and executive position relationships of enterprises and marking nodes that are the same natural person;
the first extension module is used to add to the knowledge graph, based on the enterprise business information, one or more edges representing association features between enterprises;
the information extraction module is used to extract structured triple information from the enterprise unstructured information;
the fourth extension module is used to add to the knowledge graph, based on the enterprise unstructured information, one or more edges representing association features between enterprises;
the timeliness update module is used to extend and update the knowledge graph based on timeliness information.
32. An enterprise association relationship construction system, characterized by comprising a graph construction module, a first extension module, a second extension module, a third extension module, an information extraction module, a fourth extension module and a timeliness update module;
the graph construction module is used to construct, using a graph-computation data structure, a knowledge graph from the equity data and executive data in the enterprise business information, the knowledge graph reflecting the shareholder investment and executive position relationships of enterprises and marking nodes that are the same natural person;
the first extension module is used to add to the knowledge graph, based on the enterprise business information, one or more edges representing association features between enterprises;
the second extension module is used to extend the attributes of the enterprise nodes in the knowledge graph based on the enterprise structured information;
the third extension module is used to add to the knowledge graph, based on the enterprise structured information, one or more edges representing association features between enterprises;
the information extraction module is used to extract structured triple information from the enterprise unstructured information;
the fourth extension module is used to add to the knowledge graph, based on the enterprise unstructured information, one or more edges representing association features between enterprises;
the timeliness update module is used to extend and update the knowledge graph based on timeliness information.
CN201910878683.3A 2019-09-18 2019-09-18 Enterprise association relationship construction method and system Pending CN110781246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910878683.3A CN110781246A (en) 2019-09-18 2019-09-18 Enterprise association relationship construction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910878683.3A CN110781246A (en) 2019-09-18 2019-09-18 Enterprise association relationship construction method and system

Publications (1)

Publication Number Publication Date
CN110781246A true CN110781246A (en) 2020-02-11

Family

ID=69383519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910878683.3A Pending CN110781246A (en) 2019-09-18 2019-09-18 Enterprise association relationship construction method and system

Country Status (1)

Country Link
CN (1) CN110781246A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414485A (en) * 2020-03-17 2020-07-14 北京恒通慧源大数据技术有限公司 Enterprise customer association relation map construction method and device, storage and computer
CN111444189A (en) * 2020-04-17 2020-07-24 贝壳技术有限公司 Data processing method, device, medium and electronic equipment
CN111984643A (en) * 2020-06-29 2020-11-24 联想(北京)有限公司 Knowledge graph construction method and device, knowledge graph system and equipment
CN112115278A (en) * 2020-09-28 2020-12-22 中国建设银行股份有限公司 Actual control person relation mining method and device based on knowledge graph
CN112487209A (en) * 2020-12-15 2021-03-12 厦门市美亚柏科信息股份有限公司 String mark behavior analysis method based on knowledge graph, terminal equipment and storage medium
CN112633889A (en) * 2020-11-12 2021-04-09 中科金审(北京)科技有限公司 Enterprise gene sequencing system and method
CN113315866A (en) * 2021-07-21 2021-08-27 深圳市百诺信科技有限公司 Enterprise telephone updating method and related product
CN113407734A (en) * 2021-07-14 2021-09-17 重庆富民银行股份有限公司 Construction method of knowledge map system based on real-time big data
CN113538147A (en) * 2021-07-27 2021-10-22 北京金堤征信服务有限公司 Stock right detail data generation method and device and electronic equipment
CN113591158A (en) * 2021-07-23 2021-11-02 新奥数能科技有限公司 Method and device for processing newly added information of energy enterprise, computer equipment and medium
CN113779273A (en) * 2021-09-16 2021-12-10 平安国际智慧城市科技股份有限公司 Method, device, computer and medium for mining enterprise information based on knowledge graph
CN113849579A (en) * 2021-09-27 2021-12-28 支付宝(杭州)信息技术有限公司 Knowledge graph data processing method and system based on knowledge view
CN114385833A (en) * 2022-03-23 2022-04-22 支付宝(杭州)信息技术有限公司 Method and device for updating knowledge graph
CN115774793A (en) * 2023-01-29 2023-03-10 上海蜜度信息技术有限公司 Detection method and system for timeliness of mechanism, electronic device and storage medium
CN117408584A (en) * 2023-12-07 2024-01-16 国网智能电网研究院有限公司 Carbon asset operation data model construction method, device, equipment and medium
CN113849579B (en) * 2021-09-27 2024-06-28 支付宝(杭州)信息技术有限公司 Knowledge graph data processing method and system based on knowledge view

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414485A (en) * 2020-03-17 2020-07-14 北京恒通慧源大数据技术有限公司 Enterprise customer association relation map construction method and device, storage and computer
CN111414485B (en) * 2020-03-17 2022-09-30 北京恒通慧源大数据技术有限公司 Enterprise customer association relationship map construction method and device, storage and computer
CN111444189A (en) * 2020-04-17 2020-07-24 贝壳技术有限公司 Data processing method, device, medium and electronic equipment
CN111444189B (en) * 2020-04-17 2021-04-16 北京房江湖科技有限公司 Data processing method, device, medium and electronic equipment
CN111984643A (en) * 2020-06-29 2020-11-24 联想(北京)有限公司 Knowledge graph construction method and device, knowledge graph system and equipment
CN112115278A (en) * 2020-09-28 2020-12-22 中国建设银行股份有限公司 Actual control person relation mining method and device based on knowledge graph
CN112633889A (en) * 2020-11-12 2021-04-09 中科金审(北京)科技有限公司 Enterprise gene sequencing system and method
CN112487209A (en) * 2020-12-15 2021-03-12 厦门市美亚柏科信息股份有限公司 String mark behavior analysis method based on knowledge graph, terminal equipment and storage medium
CN113407734A (en) * 2021-07-14 2021-09-17 重庆富民银行股份有限公司 Construction method of knowledge map system based on real-time big data
CN113315866A (en) * 2021-07-21 2021-08-27 深圳市百诺信科技有限公司 Enterprise telephone updating method and related product
CN113591158A (en) * 2021-07-23 2021-11-02 新奥数能科技有限公司 Method and device for processing newly added information of energy enterprise, computer equipment and medium
CN113538147A (en) * 2021-07-27 2021-10-22 北京金堤征信服务有限公司 Stock right detail data generation method and device and electronic equipment
CN113538147B (en) * 2021-07-27 2024-02-09 北京金堤征信服务有限公司 Stock right detail data generation method and device and electronic equipment
CN113779273A (en) * 2021-09-16 2021-12-10 平安国际智慧城市科技股份有限公司 Method, device, computer and medium for mining enterprise information based on knowledge graph
CN113849579A (en) * 2021-09-27 2021-12-28 支付宝(杭州)信息技术有限公司 Knowledge graph data processing method and system based on knowledge view
CN113849579B (en) * 2021-09-27 2024-06-28 支付宝(杭州)信息技术有限公司 Knowledge graph data processing method and system based on knowledge view
CN114385833A (en) * 2022-03-23 2022-04-22 支付宝(杭州)信息技术有限公司 Method and device for updating knowledge graph
WO2023179176A1 (en) * 2022-03-23 2023-09-28 支付宝(杭州)信息技术有限公司 Knowledge graph updating method and apparatus
CN115774793A (en) * 2023-01-29 2023-03-10 上海蜜度信息技术有限公司 Detection method and system for timeliness of mechanism, electronic device and storage medium
CN115774793B (en) * 2023-01-29 2023-05-30 上海蜜度信息技术有限公司 Mechanism timeliness detection method, system, electronic equipment and storage medium
CN117408584A (en) * 2023-12-07 2024-01-16 国网智能电网研究院有限公司 Carbon asset operation data model construction method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN110781246A (en) Enterprise association relationship construction method and system
Ilyas et al. Data cleaning
US10762561B2 (en) Systems and methods for improving computation efficiency in the detection of fraud indicators for loans
Kościelniak et al. BIG DATA in decision making processes of enterprises
CN110597870A (en) Enterprise relation mining method
CN110826976A (en) Enterprise actual controller operation system and method
WO2019109698A1 (en) Method and apparatus for determining target user group
CN110825817B (en) Enterprise suspected association judgment method and system
CN109635276B (en) Information matching method and terminal
US20230061746A1 (en) Managing hierarchical data structures for entity matching
CN111125116B (en) Method and system for positioning code field in service table and corresponding code table
CN111383004A (en) Method for extracting entity position of digital currency, method for extracting information and device thereof
US10311093B2 (en) Entity resolution from documents
CN111191123A (en) Business information pushing method and device, readable storage medium and computer equipment
CN113722617A (en) Method and device for identifying actual office address of enterprise and electronic equipment
CN113987190A (en) Data quality check rule extraction method and system
CN110765317A (en) Enterprise beneficiary operation system and method
CN115687787A (en) Industry policy target group portrait construction method, system and storage medium
Lan et al. Urban morphology and residential differentiation across Great Britain, 1881–1901
CN110705297A (en) Enterprise name-identifying method, system, medium and equipment
CN110781311A (en) Enterprise consistent action calculation system and method
CN113792081A (en) Method and system for automatically checking data assets
CN112416992A (en) Industry type identification method, system and equipment based on big data and keywords
CN112527813A (en) Data processing method and device of business system, electronic equipment and storage medium
CN109919811B (en) Insurance agent culture scheme generation method based on big data and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210223

Address after: Room 1105-1123, 1256 and 1258 Wanrong Road, Jing'an District, Shanghai, 200436

Applicant after: Shanghai Hehe Information Technology Co., Ltd.

Applicant after: Shanghai Shengteng Data Technology Co.,Ltd.

Applicant after: Shanghai Linguan Data Technology Co.,Ltd.

Applicant after: Shanghai yingwuchu Data Technology Co.,Ltd.

Address before: Room 1601-120, 238 Jiangchang Third Road, Jing'an District, Shanghai, 200436

Applicant before: Shanghai Shengteng Data Technology Co.,Ltd.