WO2022267769A1 - Method and apparatus for generating graph data - Google Patents


Info

Publication number
WO2022267769A1
Authority
WO
WIPO (PCT)
Prior art keywords
vertex
entity
account
entity account
relationship
Application number
PCT/CN2022/093771
Other languages
English (en)
French (fr)
Inventor
Huang Ke (黄科)
Original Assignee
Alipay (Hangzhou) Information Technology Co., Ltd. (支付宝(杭州)信息技术有限公司)
Application filed by Alipay (Hangzhou) Information Technology Co., Ltd.
Publication of WO2022267769A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Definitions

  • The embodiments of this specification relate generally to the field of benchmark testing, and in particular to a method and an apparatus for generating graph data for benchmark testing.
  • The embodiments of this specification provide a method and an apparatus for generating graph data for benchmark testing, with which graph data for benchmark testing can be generated efficiently.
  • In one aspect, a method for generating graph data for benchmark testing includes: creating a plurality of entity vertices and the corresponding entity account vertices of each entity vertex; creating an ownership relationship between each entity vertex and its corresponding entity account vertices; determining a start entity account vertex set and an end entity account vertex set from the created entity account vertices, where the two sets share no entity account vertex; and creating account association relationships between entity account vertices based on the start entity account vertex set and the end entity account vertex set.
  • In some embodiments, the account vertex attributes of each entity account vertex include account association attributes.
  • The method may further include: creating account attribute vertices based on the account association attributes of each entity account vertex; and, according to those account association attributes, creating account attribute relationships between account attribute vertices and between each account attribute vertex and the corresponding entity account vertex.
  • The entity vertices include personal vertices and organization vertices.
  • The entity account vertices include personal account vertices and organization account vertices.
  • The account attribute vertices include at least one of an account registration address, a registration phone number, a login network address, and a login physical address.
  • The account attribute relationships include at least one of a location relationship, a phone registration relationship, a login network address relationship, and a login physical address relationship.
  • The method may further include: acquiring vertex out-degree distribution information of the entity vertices.
  • Creating the corresponding entity account vertices of each entity vertex may then include: creating them according to the vertex out-degree distribution information.
  • In some embodiments, the account vertex attributes of each entity account vertex include a vertex out-degree and a vertex in-degree, and creating account association relationships between entity account vertices based on the start and end entity account vertex sets may include: determining a selection probability for each start entity account vertex and each end entity account vertex according to the vertex out-degree of each start entity account vertex in the start set and the vertex in-degree of each end entity account vertex in the end set; selecting, based on these selection probabilities, at least one start entity account vertex and a corresponding end entity account vertex from the start and end sets; calculating the attribute distance between the selected start entity account vertex and the corresponding end entity account vertex; determining, based on the calculated attribute distance, a relationship creation probability between the selected start and end entity account vertices; and creating an account association relationship between them according to the relationship creation probability.
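The selection, distance and probability steps above can be sketched in Python as follows. This is a minimal sketch under assumptions not taken from the specification: the remaining out-degree/in-degree of each vertex is used directly as its selection weight and is decremented when an edge is created, `attribute_distance` simply counts differing attributes, and `creation_probability` decays exponentially with distance. All function names are hypothetical.

```python
import math
import random


def attribute_distance(a: dict, b: dict, keys=("City", "Phone")) -> int:
    """Hypothetical distance: number of listed attributes on which a and b differ."""
    return sum(a.get(k) != b.get(k) for k in keys)


def creation_probability(distance: float, base: float = 0.9, scale: float = 1.0) -> float:
    """Hypothetical rule: the creation probability decays exponentially with distance."""
    return base * math.exp(-distance / scale)


def one_pass(rng: random.Random, start_set, end_set, out_deg: dict, in_deg: dict,
             attrs: dict, edges: set, tries: int) -> int:
    """One pass of select -> distance -> probability -> create; returns edges created.

    out_deg / in_deg hold the *remaining* out-/in-degree of each vertex and serve
    as selection weights (an assumption); they are decremented on creation.
    """
    starts, ends = sorted(start_set), sorted(end_set)
    created = 0
    for _ in range(tries):
        w_out = [out_deg[v] for v in starts]
        w_in = [in_deg[v] for v in ends]
        if sum(w_out) == 0 or sum(w_in) == 0:
            break  # no start or no end vertex can accept another edge
        u = rng.choices(starts, weights=w_out)[0]  # degree-weighted start vertex
        v = rng.choices(ends, weights=w_in)[0]     # degree-weighted end vertex
        p = creation_probability(attribute_distance(attrs[u], attrs[v]))
        if (u, v) not in edges and rng.random() < p:
            edges.add((u, v))
            out_deg[u] -= 1
            in_deg[v] -= 1
            created += 1
    return created
```

Because start and end vertices come from disjoint sets, every created pair is a directed edge from a start account to an end account.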
  • In some embodiments, the account association relationship creation process is executed cyclically until no new account association relationship is created, where the relationship creation probability used in each cycle is determined from the previous cycle.
  • In one example, the relationship creation probability of each cycle is obtained by decaying the probability used in the previous cycle.
  • Alternatively, the process from selecting the start entity account vertices and corresponding end entity account vertices through creating the account association relationships is executed cyclically until the number of created account association relationships reaches a predetermined number.
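Both stopping rules can share one driver loop. The sketch below is a hypothetical illustration: `one_pass(p)` stands for any single selection/creation pass that uses creation probability `p` and returns the number of edges it created, and halving the probability each round (`decay=0.5`) is an assumed decay factor, not one given in the specification.

```python
from typing import Callable, Optional


def run_until_stable(one_pass: Callable[[float], int], p0: float = 0.9,
                     decay: float = 0.5, target: Optional[int] = None,
                     max_rounds: int = 100) -> int:
    """Repeat creation passes with a decaying probability; return total edges created.

    Stops when a pass creates no new relationship, when `target` relationships
    exist (the predetermined-number variant), or after max_rounds as a safeguard.
    """
    total, p = 0, p0
    for _ in range(max_rounds):
        new = one_pass(p)
        total += new
        if new == 0 or (target is not None and total >= target):
            break
        p *= decay  # next round's probability is derived from this round's
    return total
```

For example, a pass that creates `int(10 * p)` edges yields 9, 4, 2, 1 and then 0 new edges in successive rounds, so the loop stops with 16 in total; with `target=10` it stops after the second round with 13.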
  • The method may further include: obtaining vertex out-degree/in-degree distribution information of the entity account vertices, and determining the vertex out-degree and vertex in-degree of each entity account vertex according to that distribution information.
  • The method may further include: acquiring social network out-degree/in-degree distribution information, and creating acquaintance/affiliation relationships between the entity vertices according to it.
  • In that case, determining the relationship creation probability between the selected start and end entity account vertices may include: determining it based on both the calculated attribute distance and the acquaintance/affiliation relationship between the entity vertices to which the selected start and end entity account vertices respectively belong.
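One way to fold the acquaintance/affiliation relationship into the creation probability is a multiplicative boost, sketched below. The boost factor, the exponential distance decay and the `knows` set of owner pairs are all assumptions made for illustration; the specification does not give a formula.

```python
import math
from typing import FrozenSet, Set


def creation_probability(distance: float, owner_u: str, owner_v: str,
                         knows: Set[FrozenSet[str]], base: float = 0.5,
                         scale: float = 1.0, boost: float = 1.8) -> float:
    """Distance-decayed probability, boosted when the two owning entities are related.

    owner_u / owner_v are the entity vertices that own the start and end accounts;
    `knows` holds unordered pairs of entities joined by an acquaintance/affiliation edge.
    """
    p = base * math.exp(-distance / scale)
    if frozenset((owner_u, owner_v)) in knows:
        p *= boost
    return min(p, 1.0)  # keep the result a valid probability
```

With these defaults, two accounts at distance 0 get probability 0.5 if their owners are strangers and 0.9 if the owners know each other.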
  • Creating the corresponding entity account vertices of the entity vertices according to the vertex out-degree distribution information may include: creating, for each entity vertex, the corresponding entity account vertices and business application vertices according to the vertex out-degree distribution information; and creating an application relationship between each business application vertex and the corresponding entity vertex.
  • The method may further include: extracting a plurality of first entity vertices from the plurality of entity vertices.
  • Creating the corresponding entity account vertices of each entity vertex may then include: creating the corresponding entity account vertices of each first entity vertex.
  • In another aspect, a method for generating graph data for benchmark testing includes: creating a plurality of entity vertices via each vertex generation framework; extracting, via a vertex block framework, a plurality of first entity vertices from the created entity vertices for each vertex generation framework; creating, via each vertex generation framework, the corresponding entity account vertices of each extracted first entity vertex, and creating an ownership relationship between each entity account vertex and the corresponding entity vertex; extracting, via the vertex block framework, a start entity account vertex set and an end entity account vertex set from the created entity account vertices for each vertex relationship generation framework; and creating, via each vertex relationship generation framework, account association relationships between entity account vertices based on the extracted start and end entity account vertex sets.
  • In some embodiments, the account vertex attributes of each entity account vertex include account association attributes.
  • The method may further include: creating, via each vertex generation framework, account attribute vertices based on the account association attributes of the respective entity account vertices; and creating, based on those account association attributes, account attribute relationships between account attribute vertices and between each account attribute vertex and the corresponding entity account vertex.
  • The process from entity vertex extraction by the vertex block framework through account association relationship creation by the vertex relationship generation frameworks is executed cyclically.
  • The vertex extraction performed by the vertex block framework is sampling without replacement, repeated until all vertices have been extracted.
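Sampling without replacement can be implemented by shuffling the entity vertices once and handing out consecutive slices, one per vertex generation framework per round, until the pool is empty. The sketch below assumes a fixed `block_size` and a hypothetical `block_rounds` generator; how the actual vertex block framework sizes its blocks is not stated.

```python
import random
from typing import Iterable, Iterator, List, Optional


def block_rounds(vertex_ids: Iterable[int], n_frameworks: int, block_size: int,
                 seed: Optional[int] = None) -> Iterator[List[List[int]]]:
    """Yield one list of blocks per round, one block per vertex generation framework.

    Shuffling once and slicing is equivalent to sampling without replacement,
    so every vertex is handed out exactly once before the pool is exhausted.
    """
    if block_size < 1 or n_frameworks < 1:
        raise ValueError("block_size and n_frameworks must be positive")
    pool = list(vertex_ids)
    random.Random(seed).shuffle(pool)
    while pool:
        blocks = []
        for _ in range(n_frameworks):
            blocks.append(pool[:block_size])  # may be short or empty in the last round
            pool = pool[block_size:]
        yield blocks
```

With 10 vertices, two frameworks and blocks of 3, the first round hands out 6 vertices and the second round the remaining 4.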
  • In some embodiments, the account vertex attributes of each entity account vertex include a vertex out-degree and a vertex in-degree, and creating, via each vertex relationship generation framework, account association relationships between entity account vertices based on the start and end entity account vertex sets may include: determining the selection probability of each start entity account vertex and each end entity account vertex according to the vertex out-degree of each start entity account vertex in the start set and the vertex in-degree of each end entity account vertex in the end set; and cyclically executing the following process until the number of created account association relationships reaches a first predetermined number M: selecting, based on these selection probabilities, at least one start entity account vertex and a corresponding end entity account vertex from the start and end sets; calculating the attribute distance between the selected start and end entity account vertices; determining, based on the calculated attribute distance, the relationship creation probability between them; and creating an account association relationship according to that probability.
  • The first predetermined number M = P/K, where P is the total out-degree of the plurality of entity account vertices and K is the number of loop executions.
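As a worked example of M = P/K: if the entity account vertices have out-degrees 3, 1, 4 and 2, then P = 10, and with K = 4 loop executions each loop targets 2.5 relationships. The sketch below rounds up so that K loops can cover all P out-degree slots; the rounding rule and the function name are assumptions.

```python
import math
from typing import Iterable


def per_loop_target(out_degrees: Iterable[int], k_loops: int) -> int:
    """First predetermined number M = P / K, rounded up (rounding is an assumption)."""
    if k_loops < 1:
        raise ValueError("k_loops must be at least 1")
    p_total = sum(out_degrees)  # P: total out-degree of all entity account vertices
    return math.ceil(p_total / k_loops)
```

For the example above, `per_loop_target([3, 1, 4, 2], 4)` returns 3.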
  • The account association relationship creation process may be executed cyclically until no new account association relationship is created, where the relationship creation probability used in each cycle is determined from the previous cycle.
  • In one example, the relationship creation probability of each cycle is obtained by decaying the probability used in the previous cycle.
  • The method may further include: obtaining, via the data distribution interface of each vertex generation framework, vertex out-degree/in-degree distribution information of the entity account vertices, and determining the vertex out-degree and vertex in-degree of each entity account vertex according to that distribution information.
  • The method may further include: obtaining social network out-degree/in-degree distribution information via the data distribution interface of each vertex generation framework, and creating acquaintance/affiliation relationships between the entity vertices according to that distribution information.
  • Determining the relationship creation probability between the selected start and end entity account vertices may then include: determining it based on both the calculated attribute distance and the acquaintance/affiliation relationship between the entity vertices to which the selected start and end entity account vertices respectively belong.
  • The method may further include: acquiring vertex out-degree distribution information of the entity vertices via the data distribution interface of each vertex generation framework, and determining, via each vertex generation framework, the vertex out-degree of each entity vertex according to the obtained distribution information.
  • Creating, via each vertex generation framework, the corresponding entity account vertices of each extracted first entity vertex may include: creating them based on the vertex out-degree of each extracted first entity vertex.
  • In another aspect, an apparatus for generating graph data for benchmark testing includes: a vertex generation unit that creates a plurality of entity vertices and the corresponding entity account vertices of each entity vertex; an ownership relationship generation unit that creates an ownership relationship between each entity vertex and the corresponding entity account vertices; a vertex block unit that determines a start entity account vertex set and an end entity account vertex set from the created entity account vertices, where the two sets share no entity account vertex; and an association relationship generation unit that creates account association relationships between entity account vertices based on the start and end entity account vertex sets.
  • In another aspect, an apparatus for generating graph data for benchmark testing includes: at least two vertex generation frameworks, each deployed at a first device; at least two vertex relationship generation frameworks, each deployed at a second device; and a vertex block framework deployed at a third device. Each vertex generation framework is configured to: create a plurality of entity vertices; create the corresponding entity account vertices of each first entity vertex extracted by the vertex block framework; and create an ownership relationship between each entity account vertex and the corresponding entity vertex. The vertex block framework is configured to: extract a plurality of first entity vertices from the created entity vertices for each vertex generation framework; and extract a start entity account vertex set and an end entity account vertex set from the created entity account vertices for each vertex relationship generation framework. Each vertex relationship generation framework is configured to create account association relationships between entity account vertices based on the extracted start and end entity account vertex sets.
  • The apparatus may further include: a data distribution interface, deployed at each first device, for obtaining vertex out-degree distribution information, where the vertex out-degree of each entity vertex is determined according to the corresponding vertex out-degree distribution information.
  • The account vertex attributes of each entity account vertex include a vertex out-degree and a vertex in-degree.
  • Each vertex relationship generation framework is configured to: determine the selection probability of each start entity account vertex and each end entity account vertex according to the vertex out-degree of each start entity account vertex in the start set and the vertex in-degree of each end entity account vertex in the end set; and cyclically execute the following process until the number of created account association relationships reaches a first predetermined number M: select, based on these selection probabilities, at least one start entity account vertex and a corresponding end entity account vertex from the start and end sets; calculate the attribute distance between the selected start and end entity account vertices; determine, based on the calculated attribute distance, the relationship creation probability between them; and create an account association relationship between the selected start and end entity account vertices according to that probability.
  • The apparatus may further include: a data distribution interface, deployed at each first device, for obtaining vertex out-degree/in-degree distribution information of the entity account vertices, where the vertex out-degree and vertex in-degree of each entity account vertex are determined according to the corresponding distribution information.
  • The apparatus may further include: a data distribution interface, deployed at each first device, for obtaining social network out-degree/in-degree distribution information. Each vertex generation framework creates acquaintance/affiliation relationships between the entity vertices according to the obtained social network out-degree/in-degree distribution information, and the relationship creation probability between the selected start and end entity account vertices is determined based on the calculated attribute distance and the acquaintance/affiliation relationship between the entity vertices to which those accounts respectively belong.
  • Some or all of the first devices may each be the same device as one of the second devices, and/or the first device
  • The third device may be the same as one of the first devices and/or one of the second devices.
  • In another aspect, a system for generating graph data for benchmark testing includes: at least two first devices, each deployed with a vertex generation framework; at least two second devices, each deployed with a vertex relationship generation framework; and a third device deployed with a vertex block framework.
  • Each vertex generation framework is configured to: create a plurality of entity vertices; create the corresponding entity account vertices of each first entity vertex extracted by the vertex block framework; and create an ownership relationship between each entity account vertex and the corresponding entity vertex.
  • The vertex block framework is configured to: extract a plurality of first entity vertices from the created entity vertices for each vertex generation framework; and extract a start entity account vertex set and an end entity account vertex set from the created entity account vertices for each vertex relationship generation framework.
  • Each vertex relationship generation framework is configured to create account association relationships between entity account vertices based on the extracted start and end entity account vertex sets.
  • In another aspect, an apparatus for generating graph data for benchmark testing includes: at least one processor, a memory coupled to the at least one processor, and a computer program stored in the memory, where the at least one processor executes the computer program to implement the method described above.
  • In another aspect, a computer-readable storage medium stores executable instructions that, when executed, cause a processor to perform the method described above.
  • In another aspect, a computer program product includes a computer program that, when executed by a processor, implements the method described above.
  • FIG. 1 shows an example flowchart of a graph data generating method according to a first embodiment of the present specification.
  • Fig. 2 shows an example flow chart of the process of creating an account association relationship according to the first embodiment of this specification.
  • Fig. 3 shows another exemplary flow chart of the process of creating an account association relationship according to the first embodiment of this specification.
  • Fig. 4 shows an example schematic diagram of a graph data generation process according to the first embodiment of the present specification.
  • Fig. 5 is a schematic diagram showing an example of a data structure of graph data according to the first embodiment of the present specification.
  • Fig. 6 shows a block diagram of an apparatus for generating graph data applied to a benchmark test according to the first embodiment of the present specification.
  • FIG. 7 shows a block diagram of a system for generating graph data applied to benchmark tests according to a second embodiment of the present specification.
  • FIG. 8 shows an example flowchart of a graph data generating method according to the second embodiment of the present specification.
  • Fig. 9 shows an example flow chart of the process of creating an account association relationship according to the second embodiment of this specification.
  • Fig. 10 shows a block diagram of a graph data generating device according to a second embodiment of the present specification.
  • Fig. 11 shows an example block diagram of a vertex generation framework according to a second embodiment of the present specification.
  • Fig. 12 shows an example block diagram of a vertex relationship generation framework according to the second embodiment of the present specification.
  • Fig. 13 shows a schematic diagram of an example of an apparatus for generating graph data based on a computer system according to an embodiment of the present specification.
  • The term “comprising” and its variants are open terms meaning “including but not limited to”.
  • The term “based on” means “based at least in part on”.
  • The terms “one embodiment” and “an embodiment” mean “at least one embodiment”.
  • The term “another embodiment” means “at least one other embodiment”.
  • The terms “first”, “second”, and so on may refer to different objects or to the same object. Other definitions, explicit or implicit, may be given below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.
  • Benchmark testing is the quantitative, comparable testing of a given performance metric across a class of test objects by means of scientifically designed testing methods, testing tools, and testing systems.
  • For example, benchmarking the floating-point operations, data access bandwidth, and latency of computer CPUs lets users see clearly whether the computing performance and job throughput of each CPU meet the application's requirements.
  • Likewise, benchmarking a database management system on performance metrics such as ACID (Atomicity, Consistency, Isolation, Durability), query time, and online transaction processing capability helps users choose the database system that best meets their needs.
  • LDBC SNB DATAGEN, proposed by the LDBC (Linked Data Benchmark Council), is the data generator for SNB (Social Network Benchmark), a benchmark based on social networks.
  • The data generated by LDBC SNB DATAGEN ranges in scale from 100 MB to 1 TB.
  • However, the data scenarios generated by LDBC SNB DATAGEN are highly customized and difficult to modify, and they differ considerably from the requirements of some application scenarios (for example, financial application scenarios).
  • In addition, LDBC SNB DATAGEN uses the attribute distance between two vertices as the factor that influences the relationship creation probability, so its relationship generation logic is relatively simple.
  • Moreover, when LDBC SNB DATAGEN divides vertices into blocks during relationship generation (for example, because of physical bottlenecks of the computer hardware), relationships cannot be generated between vertices in different blocks.
  • To address these issues, embodiments of this specification provide a solution for generating graph data for benchmark testing.
  • In this solution, a plurality of entity vertices and the corresponding entity account vertices of each entity vertex are created via the vertex generation framework, and an ownership relationship is created between each entity vertex and its corresponding entity account vertices.
  • A start entity account vertex set and an end entity account vertex set, which share no entity account vertex, are then determined from the created entity account vertices via the vertex block framework. Finally, account association relationships between entity account vertices are created via the vertex relationship generation framework based on the start and end entity account vertex sets.
  • In this specification, the term “account” refers to a carrier that reflects increases and decreases in asset data and their results, such as a financial asset account, a digital asset account, or another type of data asset account.
  • The term “account data” may include financial asset data (e.g., fund data, loan data, liability data), digital asset data, or other types of asset data.
  • The term “account association relationship” refers to any type of relationship that may occur between two accounts, for example, an account data transfer relationship, an account binding relationship, an account affiliation relationship, or another type of relationship between accounts.
  • Fig. 1 shows an example flowchart of a graph data generation method 100 according to the first embodiment of the present specification.
  • the graph data generating method shown in FIG. 1 is executed by a graph data generating device, and components of the graph data generating device can be deployed on the same device or on different devices.
  • Each entity vertex may have entity vertex attributes.
  • The entity vertex attributes may include a vertex out-degree.
  • The corresponding entity account vertices can be created based on the vertex out-degree of each entity vertex.
  • The entity vertex attributes may include an entity identifier (entity ID), which uniquely identifies the entity vertex.
  • The entity identifier may be a globally unique identifier, for example, a globally unique integer created based on the corresponding block number.
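The specification does not say how the block number enters the identifier. One common scheme, sketched here as a hypothetical example, places the block number in the high bits and a per-block counter in the low bits, so vertices created independently in different blocks never collide. `LOCAL_BITS` and both function names are assumptions.

```python
from typing import Tuple

LOCAL_BITS = 32  # assumption: at most 2**32 vertices per block


def make_entity_id(block_number: int, local_index: int) -> int:
    """Combine a block number and a per-block index into one global integer ID."""
    if block_number < 0:
        raise ValueError("block_number must be non-negative")
    if not 0 <= local_index < (1 << LOCAL_BITS):
        raise ValueError("local_index out of range for LOCAL_BITS")
    return (block_number << LOCAL_BITS) | local_index


def split_entity_id(entity_id: int) -> Tuple[int, int]:
    """Recover (block_number, local_index) from a global ID."""
    return entity_id >> LOCAL_BITS, entity_id & ((1 << LOCAL_BITS) - 1)
```

For example, `make_entity_id(3, 7)` returns 12884901895, and `split_entity_id` maps it back to `(3, 7)`.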
  • Entities may include individual entities and organizational entities.
  • Accordingly, entity vertices may include personal vertices (Person) and organization vertices (Organization).
  • In one example, the vertex out-degree of each entity vertex may be a preset fixed value.
  • In another example, the vertex out-degree of each entity vertex may be determined based on vertex out-degree distribution information input via a data distribution interface. For example, integers may be generated randomly according to that distribution information (e.g., a power-law distribution).
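A truncated discrete power law is one concrete choice for the out-degree distribution. The sketch below, with assumed parameters `alpha`, `k_min` and `k_max`, draws integers with probability proportional to k^(-alpha) using the standard library's `random.choices`.

```python
import random
from typing import List, Optional


def sample_out_degrees(n: int, alpha: float = 2.0, k_min: int = 1,
                       k_max: int = 1000, seed: Optional[int] = None) -> List[int]:
    """Draw n integer out-degrees from a truncated discrete power law.

    P(k) is proportional to k**(-alpha) for k_min <= k <= k_max.
    """
    if not 1 <= k_min <= k_max:
        raise ValueError("require 1 <= k_min <= k_max")
    rng = random.Random(seed)
    support = range(k_min, k_max + 1)
    weights = [k ** -alpha for k in support]  # unnormalized; choices() normalizes
    return rng.choices(support, weights=weights, k=n)
```

With `alpha=2.0` most entity vertices receive an out-degree of 1 or 2, while a small number of hub vertices receive large out-degrees.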
  • The entity vertex attributes may also include a vertex in-degree.
  • The entity vertex attributes may also include an entity name.
  • For a personal vertex, the entity name may include a First Name and a Last Name.
  • For an organization vertex, the entity name may include an Organization Name.
  • The created entity account vertices may include personal account vertices (PersonalAccount) and organizational account vertices (OrganizationalAccount).
  • The account vertex attributes of each entity account vertex may include a vertex identifier, an account creation date (CreateDate), an account validity flag (IsBlocked), and so on.
  • The account validity flag IsBlocked may be represented by a Boolean value indicating whether the account is valid. For example, a Boolean value of "1" may denote valid and "0" invalid; in another example, the encoding may be reversed.
  • The value of CreateDate, of type DateTime, can be generated by a random generator within a bounded time range.
  • The value of IsBlocked can be generated by a random generator.
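A minimal sketch of creating one account vertex with randomly generated CreateDate and IsBlocked values. The time range, the 5% blocked ratio and the `AccountVertex` type are assumptions; here `is_blocked=True` marks an invalid account, which corresponds to the reversed encoding mentioned above.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AccountVertex:
    vertex_id: int
    label: str              # "PersonalAccount" or "OrganizationalAccount"
    create_date: datetime   # CreateDate
    is_blocked: bool        # IsBlocked: True means the account is invalid here


def random_datetime(rng: random.Random, start: datetime, end: datetime) -> datetime:
    """Uniformly random DateTime in [start, end] at one-second granularity."""
    span = int((end - start).total_seconds())
    return start + timedelta(seconds=rng.randint(0, span))


def make_account(rng: random.Random, vertex_id: int, label: str,
                 start: datetime, end: datetime,
                 blocked_ratio: float = 0.05) -> AccountVertex:
    """Create one account vertex with random CreateDate and IsBlocked values."""
    return AccountVertex(
        vertex_id=vertex_id,
        label=label,
        create_date=random_datetime(rng, start, end),
        is_blocked=rng.random() < blocked_ratio,
    )
```

Passing a seeded `random.Random` keeps benchmark data reproducible across runs.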
  • In some embodiments, business application vertices may also be created.
  • The specific form of a business application vertex can be determined by the specific application scenario.
  • Examples of business application vertices include loan application (LoanApplication) vertices, financing application vertices, and the like.
  • The vertex attributes of a LoanApplication vertex may include a vertex ID and LoanAmount.
  • The value of LoanAmount is a Decimal value.
  • In this case, the corresponding entity account vertices and business application vertices are created for each entity vertex based on its vertex out-degree.
  • Entity account vertices and business application vertices may, for example, be collectively referred to as entity association vertices.
  • An ownership relationship is created between each entity vertex and its corresponding entity account vertices.
  • An application relationship (Apply) is created between each entity vertex and its corresponding business application vertices.
  • The application relationship may also have a relationship attribute (ApplyDate), whose value is generated by a random generator within a bounded time range.
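The sketch below expands one entity vertex into `out_degree` associated vertices, each becoming either a PersonalAccount (linked by an ownership edge) or a LoanApplication (linked by an Apply edge carrying ApplyDate). The 80/20 split between the two kinds, the `Own` edge label, the amount range and the `Graph` container are hypothetical; the specification does not state how the out-degree is divided.

```python
import random
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from decimal import Decimal
from typing import Dict, List, Tuple


@dataclass
class Graph:
    vertices: Dict[int, dict] = field(default_factory=dict)
    # (source id, edge label, target id, edge attributes)
    edges: List[Tuple[int, str, int, dict]] = field(default_factory=list)


def expand_entity(g: Graph, rng: random.Random, entity_id: int, out_degree: int,
                  next_id: int, account_ratio: float = 0.8,
                  start: datetime = datetime(2020, 1, 1),
                  end: datetime = datetime(2022, 1, 1)) -> int:
    """Create out_degree associated vertices for one entity; return the next free id."""
    span = int((end - start).total_seconds())
    for _ in range(out_degree):
        vid, next_id = next_id, next_id + 1
        if rng.random() < account_ratio:
            g.vertices[vid] = {"label": "PersonalAccount"}
            g.edges.append((entity_id, "Own", vid, {}))  # ownership relationship
        else:
            # LoanAmount as a Decimal with two fractional digits
            amount = Decimal(rng.randint(10_000, 50_000_000)) / 100
            g.vertices[vid] = {"label": "LoanApplication", "LoanAmount": amount}
            apply_date = start + timedelta(seconds=rng.randint(0, span))
            g.edges.append((entity_id, "Apply", vid, {"ApplyDate": apply_date}))
    return next_id
```

Returning the next free id lets a caller expand many entity vertices in sequence without id collisions.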
  • Each entity account vertex may also have account vertex attributes.
  • The account vertex attributes may include account association attributes.
  • Examples of account association attributes include, but are not limited to, an account registration address, a registration phone number (Phone), a login network address (IP), and a login physical address (MAC).
  • The account registration address may be, for example, the city in which the account was registered (City).
  • The login network address (IP) may be, for example, an IP address used to log in to the account.
  • The login physical address (MAC) may be the physical address of a device used to log in to the account, for example, its MAC address.
  • The registration phone (Phone), login network addresses (IP), login physical addresses (MAC), and registration address (City) of a personal account (PersonalAccount) or organizational account (OrganizationalAccount) are generated when that account is created.
  • The value of City is randomly selected from a city data resource library.
  • The value of Phone is randomly selected from a telephone data resource library.
  • For IP, the number of IP addresses is first generated by a random generator, and that many IP addresses are then randomly selected from a network address data resource library.
  • For MAC, the number of MAC addresses is likewise generated by a random generator, and that many MAC addresses are then randomly selected from a physical address data resource library.
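A sketch of generating the association attributes for one new account. The small in-memory pools are hypothetical stand-ins for the city, telephone, network address and physical address data resource libraries, whose contents and interfaces are not described; `random.sample` draws the chosen number of IP and MAC addresses without repeats.

```python
import random
from typing import Dict, List, Union

# Hypothetical in-memory stand-ins for the data resource libraries.
CITY_POOL = ["Hangzhou", "Shanghai", "Beijing", "Shenzhen"]
PHONE_POOL = [f"1380000{n:04d}" for n in range(1000)]
IP_POOL = [f"10.0.{a}.{b}" for a in range(4) for b in range(256)]
MAC_POOL = [f"02:00:00:00:{a:02x}:{b:02x}" for a in range(4) for b in range(256)]


def make_association_attributes(rng: random.Random, max_ip: int = 3,
                                max_mac: int = 2) -> Dict[str, Union[str, List[str]]]:
    """Generate City, Phone, IP and MAC attributes for one newly created account."""
    n_ip = rng.randint(1, max_ip)    # how many IP addresses this account logs in from
    n_mac = rng.randint(1, max_mac)  # how many devices this account logs in from
    return {
        "City": rng.choice(CITY_POOL),
        "Phone": rng.choice(PHONE_POOL),
        "IP": rng.sample(IP_POOL, n_ip),    # distinct addresses, no repeats
        "MAC": rng.sample(MAC_POOL, n_mac),
    }
```

Keeping the pools small relative to the number of accounts makes shared IPs, devices and phone numbers common, which produces the dense attribute-level links a financial benchmark may want.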
  • account attribute vertices may also be created based on the account-associated attributes of each entity account vertex, and, according to those attributes, account attribute relationships are created among the account attribute vertices and between each account attribute vertex and the corresponding entity account vertex.
  • account attribute relationships include, but are not limited to, at least one of a location relationship (IsLocatedIn), a phone registration relationship (SignUpWithPhone), a login network address relationship (SignInWithIP), and a login physical address relationship (SignInWithMAC).
  • an account attribute relationship SignInWithIP is created between PersonalAccount and account attribute vertex IP, and the account attribute relationship has a relationship attribute SignInDate.
  • the value of SignInDate is generated within a limited time range by a random generator.
  • An account attribute relationship SignInWithMAC is created between PersonalAccount and account attribute vertex MAC, and the account attribute relationship has a relationship attribute SignInDate.
  • the value of SignInDate is generated within a limited time range by a random generator.
  • An account attribute relationship SignUpWithPhone is created between PersonalAccount and account attribute vertex Phone, and the account attribute relationship has a relationship attribute SignUpDate.
  • the value of SignUpDate is generated by a random generator within a bounded time range. An account attribute relationship IsLocatedIn is created between PersonalAccount and the account attribute vertex City, and another IsLocatedIn relationship is created between the account attribute vertex Phone and the account attribute vertex City.
  • the start entity account vertex set and the end entity account vertex set are determined from the created entity account vertices, and the two sets share no entity account vertex.
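A sketch of turning one PersonalAccount's attributes into account attribute vertices and relationships. It assumes edges are stored as (source, relationship, target, properties) tuples, that the bounded time range runs from 2020-01-01 to 2021-12-31, and that the Phone vertex is located in the same City as the account; the function names and these choices are hypothetical.

```python
import random
from datetime import date, timedelta

# Hypothetical bounded time range for SignUpDate / SignInDate values.
RANGE_START, RANGE_END = date(2020, 1, 1), date(2021, 12, 31)


def random_date(rng: random.Random) -> date:
    """Pick a date uniformly from [RANGE_START, RANGE_END]."""
    return RANGE_START + timedelta(days=rng.randint(0, (RANGE_END - RANGE_START).days))


def build_attribute_edges(account_id: str, attrs: dict, rng: random.Random):
    """Return (attribute vertices, edges) for one PersonalAccount vertex."""
    account = ("PersonalAccount", account_id)
    city, phone = ("City", attrs["City"]), ("Phone", attrs["Phone"])
    vertices = {city, phone}
    edges = [
        (account, "IsLocatedIn", city, {}),
        (account, "SignUpWithPhone", phone, {"SignUpDate": random_date(rng)}),
        # Assumption: the Phone vertex is located in the account's registration city.
        (phone, "IsLocatedIn", city, {}),
    ]
    for ip in attrs["IP"]:
        vertices.add(("IP", ip))
        edges.append((account, "SignInWithIP", ("IP", ip), {"SignInDate": random_date(rng)}))
    for mac in attrs["MAC"]:
        vertices.add(("MAC", mac))
        edges.append((account, "SignInWithMAC", ("MAC", mac), {"SignInDate": random_date(rng)}))
    return vertices, edges


attrs = {"City": "Hangzhou", "Phone": "13800000001",
         "IP": ["10.0.0.1", "10.0.0.2"], "MAC": ["02:00:00:00:00:01"]}
vertices, edges = build_attribute_edges("PA-1", attrs, random.Random(7))
```

Keying attribute vertices by (type, value) lets several accounts that share an IP or phone point at the same attribute vertex.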
  • the start entity account vertex is used as the start point of the edge relationship of graph data
  • the end entity account vertex is used as the end point of the edge relationship of graph data.
  • the created entity account vertices may be classified into a set of origin entity account vertices and a set of end entity account vertices.
  • the start entity account vertex set and the end entity account vertex set may also be extracted from the created entity account vertices.
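One way to classify the created entity account vertices into two non-overlapping sets is a random shuffle followed by a cut. This is a hypothetical sketch; the parameter start_fraction and the shuffle-based split are assumptions, not taken from the specification.

```python
import random


def split_start_end(account_ids, start_fraction: float = 0.5, seed: int = 0):
    """Randomly partition account vertices into disjoint start and end sets."""
    rng = random.Random(seed)
    shuffled = list(account_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * start_fraction)
    # Every vertex lands in exactly one set, so the sets never overlap.
    return set(shuffled[:cut]), set(shuffled[cut:])


starts, ends = split_start_end(range(10))
```

Because the sets are disjoint, every edge created later runs from a start vertex to a different end vertex, which rules out self-loops.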
  • graph data refers to directed graph data.
  • an account association relationship between entity account vertices is created, thereby creating required graph data.
  • examples of the account association relationship between two accounts may include, but are not limited to, an account data transfer relationship, an account binding relationship, and other types of association relationships that may occur between accounts.
  • Examples of account data transfer relationships may include, but are not limited to, account fund transfer relationships, loan data transfer relationships, liability data transfer relationships, and the like.
  • for example, the created graph data may be financial graph data, and the account association relationship may be a transfer relationship.
  • multiple first entity vertices may also be extracted from multiple entity vertices. Then, create entity account vertices corresponding to each of the extracted first entity vertices.
  • Fig. 2 shows an example flow chart of an account association relationship creation process 200 according to the first embodiment of this specification.
  • the account vertex attributes of the entity account vertex include vertex out-degree and vertex in-degree.
  • the selection probability of each start entity account vertex and each end entity account vertex is determined based on the vertex out-degree of each start entity account vertex and the vertex in-degree of each end entity account vertex. For example, the selection probability of a start entity account vertex is its vertex out-degree divided by the total vertex out-degree of the start entity account vertex set, so the selection probabilities of all start entity account vertices in the set sum to 1.
  • the selection probability of the terminal entity account vertex is determined based on dividing the vertex in-degree of the terminal entity account vertex by the total vertex in-degree of the terminal entity account vertex set.
  • the sum of the selection probabilities of each terminal entity account vertex in each terminal entity account vertex set is 1.
  • in one example, the vertex in-degree used to determine the selection probability is the vertex in-degree recorded in the vertex attribute information of the terminal entity account vertex.
  • in another example, the vertex in-degree used is obtained by subtracting the in-degree contributed by entity vertices from the vertex in-degree recorded in the vertex attribute information of the terminal entity account vertex.
  • after the selection probabilities of each start entity account vertex and each end entity account vertex are determined, at 220, at least one start entity account vertex and its corresponding end entity account vertex are selected from the start entity account vertex set and the end entity account vertex set based on these selection probabilities.
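The degree-based selection probabilities described above amount to normalizing degrees within each set. A minimal sketch; the vertex names and the dictionary in_from_entity (in-degree contributed by entity vertices, used in the second variant) are hypothetical.

```python
def selection_probabilities(degree_by_vertex: dict) -> dict:
    """Divide each vertex's degree by the set's total degree (probabilities sum to 1)."""
    total = sum(degree_by_vertex.values())
    if total <= 0:
        raise ValueError("the vertex set has zero total degree")
    return {v: d / total for v, d in degree_by_vertex.items()}


# Start set: vertex out-degree taken from the account vertex attributes.
p_start = selection_probabilities({"A": 3, "B": 1})

# End set, second variant: attribute in-degree minus the (hypothetical)
# in-degree contributed by entity vertices.
in_degree_attr = {"X": 4, "Y": 2}
in_from_entity = {"X": 1, "Y": 1}
p_end = selection_probabilities(
    {v: d - in_from_entity.get(v, 0) for v, d in in_degree_attr.items()}
)
```

Here A has three times B's out-degree, so it is three times as likely to be picked as a start vertex.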
  • the selection process of the entity account vertex is a random selection process based on the selection probability.
  • the selected start entity account vertices may include one or more start entity account vertices, each with a corresponding end entity account vertex.
  • at 230, the attribute distance between each selected start entity account vertex and the corresponding end entity account vertex is calculated. For example, when the selected start and end entity account vertices share several attributes of the same types, an attribute distance D may be calculated for each attribute type. If both vertices have a registration address, a registration phone, and a login network address, attribute distances D1 to D3 can be calculated for these three attributes respectively.
  • when there are multiple attribute distances, an integrated attribute distance may be determined from them, and the relationship creation probability is then determined based on the integrated attribute distance.
  • alternatively, different weights may be assigned to the individual attribute distances, and the relationship creation probability is determined based on each attribute distance and its weight.
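The probability-weighted random selection of start/end pairs can be sketched with random.choices. The helper select_pairs and the fixed pair count n_pairs are illustrative assumptions.

```python
import random


def select_pairs(p_start: dict, p_end: dict, n_pairs: int, rng: random.Random) -> list:
    """Draw n_pairs (start, end) pairs; each vertex is picked with its selection probability."""
    starts = rng.choices(list(p_start), weights=list(p_start.values()), k=n_pairs)
    ends = rng.choices(list(p_end), weights=list(p_end.values()), k=n_pairs)
    # The start and end sets are disjoint, so no pair can be a self-loop.
    return list(zip(starts, ends))


pairs = select_pairs({"A": 1.0, "B": 0.0}, {"X": 0.75, "Y": 0.25}, 4, random.Random(0))
```

High-degree vertices are drawn more often, which is what reproduces a realistic skewed degree distribution in the generated graph.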
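The specification does not define the distance metric, so the following is only a hypothetical choice for illustration: 0/1 distance for scalar attributes (same city or phone or not) and Jaccard distance for set-valued attributes such as lists of login IPs.

```python
def attribute_distances(a: dict, b: dict, keys=("City", "Phone", "IP")) -> dict:
    """Per-attribute distance between two account vertices (hypothetical metric).

    Scalar attributes: 0.0 if equal, 1.0 otherwise.
    Set-valued attributes (lists of IPs, MACs): Jaccard distance.
    """
    out = {}
    for k in keys:
        x, y = a[k], b[k]
        if isinstance(x, (list, set, tuple)):
            sx, sy = set(x), set(y)
            union = sx | sy
            out[k] = 1.0 - len(sx & sy) / len(union) if union else 0.0
        else:
            out[k] = 0.0 if x == y else 1.0
    return out


a = {"City": "Hangzhou", "Phone": "138-1", "IP": ["10.0.0.1", "10.0.0.2"]}
b = {"City": "Hangzhou", "Phone": "138-2", "IP": ["10.0.0.2"]}
d = attribute_distances(a, b)   # D1 (City), D2 (Phone), D3 (IP)
```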
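Both variants, an integrated distance and a weighted combination, can be sketched with a weighted mean followed by a decreasing mapping to a probability. The exponential mapping and the parameters p_max and alpha are assumptions; it also assumes, as one plausible choice, that closer attributes should make a relationship more likely.

```python
import math
from typing import Dict, Optional


def creation_probability(distances: Dict[str, float],
                         weights: Optional[Dict[str, float]] = None,
                         p_max: float = 0.9, alpha: float = 2.0) -> float:
    """Map attribute distances to a relationship creation probability.

    Integrated distance = weighted mean of the per-attribute distances
    (equal weights when none are given); the probability then decays
    exponentially with that distance.
    """
    if weights is None:
        weights = {k: 1.0 for k in distances}
    total_w = sum(weights[k] for k in distances)
    d = sum(weights[k] * distances[k] for k in distances) / total_w
    return p_max * math.exp(-alpha * d)


dists = {"City": 0.0, "Phone": 1.0, "IP": 0.5}
p_equal = creation_probability(dists)                                             # d = 0.5
p_weighted = creation_probability(dists, {"City": 2.0, "Phone": 1.0, "IP": 1.0})  # d = 0.375
```

Weighting City more heavily pulls the integrated distance toward the matching city, raising the probability.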
  • the created account association relationship may include, for example, account data transfer relationship, account binding relationship, account affiliation relationship, and other types of association relationship that may occur between accounts.
  • the account data transfer relationship may be, for example, an account data transfer behavior.
  • multiple account association relationships can be created between each selected start entity account vertex and the corresponding end entity account vertex, so that the number of created account association relationships reaches a predetermined number.
  • the above account association relationship creation may be a cyclic process. Specifically, for each start entity account vertex and its corresponding end entity account vertex, the relationship creation probability determined at 240 is used as the initial relationship creation probability, and the following process is executed cyclically until no account association relationship is created: in each cycle, an attempt is made to create an account association relationship between the start entity account vertex and the corresponding end entity account vertex with the current relationship creation probability, and it is then determined whether a relationship was actually created. If so, the relationship creation probability used in the current cycle is attenuated to obtain the current relationship creation probability for the next cycle, and the next cycle is executed.
  • if no account association relationship is created in the current cycle, the loop ends.
  • the attenuation processing may include, but is not limited to, attenuating the relationship creation probability according to a linear or nonlinear attenuation function.
  • the function expression of the linear attenuation function or the nonlinear attenuation function may be any suitable function expression determined based on a specific application scenario.
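The cyclic creation with attenuation can be sketched as a loop that stops at the first failed attempt. The specific decay functions (a fixed linear step and a constant multiplicative factor) and the safety cap max_edges are hypothetical choices.

```python
import random
from typing import Callable


def linear_decay(p: float, step: float = 0.2) -> float:
    """Linear attenuation: subtract a fixed step, floored at 0."""
    return max(0.0, p - step)


def exponential_decay(p: float, factor: float = 0.5) -> float:
    """Nonlinear attenuation: multiply by a constant factor each cycle."""
    return p * factor


def create_with_attenuation(p_initial: float, rng: random.Random,
                            decay: Callable[[float], float] = exponential_decay,
                            max_edges: int = 1000) -> int:
    """Return how many relationships one start/end pair receives.

    Each cycle creates a relationship with the current probability; after a
    success the probability is attenuated, and the first failure ends the loop.
    max_edges is a safety cap in case decay never reduces the probability.
    """
    p, created = p_initial, 0
    while created < max_edges and rng.random() < p:
        created += 1
        p = decay(p)
    return created


rng = random.Random(11)
counts = [create_with_attenuation(0.9, rng) for _ in range(5)]
```

Attenuation makes repeated relationships between the same pair progressively less likely, so most pairs get one or a few edges while a small number get many.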
  • Fig. 3 shows another exemplary flow chart of an account association relationship creation process 300 according to the first embodiment of this specification.
  • the account vertex attributes of the entity account vertex include vertex out-degree and vertex in-degree.
  • the selection probabilities of each start entity account vertex and each end entity account vertex are determined based on the vertex out-degree of each start entity account vertex in the start entity account vertex set and the vertex in-degree of each end entity account vertex in the end entity account vertex set. For the process of determining the selection probability, reference may be made to the process described above with reference to FIG. 2.
  • in each cycle, at 320, at least one start entity account vertex and its corresponding end entity account vertex are selected from the start entity account vertex set and the end entity account vertex set based on the selection probabilities of the start and end entity account vertices.
  • the selection process of the entity account vertex is a random selection process based on the selection probability.
  • the selected start entity account vertices may include one or more start entity account vertices, each with a corresponding end entity account vertex.
  • the attribute distance between the selected origin entity account vertex and the corresponding end entity account vertex is calculated.
  • for the calculation of the attribute distance, reference may be made to the process described above with reference to 230 in FIG. 2.
  • an initial relationship creation probability between each selected origin entity account vertex and a corresponding end entity account vertex is determined based on the calculated attribute distances.
  • for determining the initial relationship creation probability, reference may be made to the process described above with reference to 240 of FIG. 2.
  • in each loop, at 350, an account association relationship is created between each selected start entity account vertex and the corresponding end entity account vertex according to the current relationship creation probability.
  • in the first loop, the current relationship creation probability is the initial relationship creation probability.
  • the process may further include obtaining the vertex out-degree/in-degree distribution information of entity account vertices, and determining the vertex out-degree and vertex in-degree of each entity account vertex according to the acquired distribution information.
  • the account association relationship creation process shown in FIG. 2 or FIG. 3 may further include acquiring social network out-degree/in-degree distribution information and creating acquaintance/affiliation relationships between entity vertices according to that information. Then, when determining the relationship creation probability, the probability between the selected start entity account vertex and end entity account vertex is determined based on the calculated attribute distance and on the acquaintance/affiliation relationship between the entity vertices to which the two account vertices respectively belong.
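If the out-degree/in-degree distribution information is given as a discrete {degree: probability} table (an assumed format; the specification does not fix one), per-account degrees can be drawn as follows. The account identifiers and distributions are hypothetical.

```python
import random


def sample_degrees(distribution: dict, n: int, rng: random.Random) -> list:
    """Draw n degree values from a discrete {degree: probability} distribution."""
    values = list(distribution)
    return rng.choices(values, weights=[distribution[v] for v in values], k=n)


rng = random.Random(1)
accounts = ["PA-1", "PA-2", "OA-1"]
out_dist = {1: 0.6, 2: 0.3, 5: 0.1}   # hypothetical out-degree distribution
in_dist = {1: 0.7, 3: 0.3}            # hypothetical in-degree distribution
degrees = {
    a: {"out": o, "in": i}
    for a, o, i in zip(accounts,
                       sample_degrees(out_dist, len(accounts), rng),
                       sample_degrees(in_dist, len(accounts), rng))
}
```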
  • FIG. 4 shows an example schematic diagram of a graph data generation process 400 according to an embodiment of the present specification.
  • FIG. 5 shows an exemplary schematic diagram of a data structure of graph data according to an embodiment of the present specification.
  • entity vertices, entity account vertices, and account attribute vertices are created in the vertex generation framework, each with a different creation mechanism. Creating entity vertices does not require any data input.
  • the creation of the entity account vertex needs to input the created entity vertex, and the creation of the account attribute vertex needs the account association attribute of the created entity account vertex.
  • the ownership relationship between each entity account vertex and the corresponding entity vertex is also created, as are the account attribute relationships among account attribute vertices and between each account attribute vertex and the corresponding entity account vertex.
  • account association relationships, for example the transfer relationship (Transfer), are created between entity account vertices. As shown in FIG. 5, the transfer relationship has a relationship attribute TransferAmount.
  • the value of TransferAmount is a Decimal value.
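Since TransferAmount is a Decimal value, a generator can draw whole cents and build an exact two-place decimal instead of rounding a float. The amount range and the edge tuple layout are hypothetical.

```python
import random
from decimal import Decimal


def random_transfer_amount(rng: random.Random,
                           low: Decimal = Decimal("0.01"),
                           high: Decimal = Decimal("50000.00")) -> Decimal:
    """Uniformly pick an amount in whole cents between low and high (inclusive)."""
    cents = rng.randint(int(low * 100), int(high * 100))
    return (Decimal(cents) / 100).quantize(Decimal("0.01"))


rng = random.Random(3)
transfer_edge = ("PA-1", "Transfer", "PA-2",
                 {"TransferAmount": random_transfer_amount(rng)})
```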
  • Fig. 6 shows a block diagram of an apparatus 600 for generating graph data applied to a benchmark test according to the first embodiment of the present specification.
  • the apparatus 600 includes a vertex generation unit 610 , an ownership relationship generation unit 620 , a vertex block unit 630 and an association relationship generation unit 640 .
  • the vertex generation unit 610 is configured to create a plurality of entity vertices and corresponding entity account vertices of each entity vertex.
  • the operation of the vertex generation unit 610 may refer to the operation described above with reference to 110 of FIG. 1 .
  • the ownership relationship generation unit 620 is configured to create an ownership relationship between each entity vertex and the corresponding entity account vertex. For operations of the ownership relationship generating unit 620, reference may be made to the operations described above with reference to 120 in FIG. 1 .
  • the vertex block unit 630 is configured to determine a start entity account vertex set and an end entity account vertex set from the created entity account vertices, and the two sets share no entity account vertex.
  • the operation of the vertex block unit 630 may refer to the operation described above with reference to 130 of FIG. 1.
  • the association relationship generation unit 640 is configured to create an account association relationship between entity account vertices based on the starting entity account vertex set and the end entity account vertex set.
  • for operations of the association relationship generation unit 640, reference may be made to the operations described above with reference to 140 in FIG. 1 and the operations described with reference to FIG. 2 or FIG. 3.
  • the ownership relationship generation unit 620 and the association relationship generation unit 640 may be implemented by the same relationship generation unit.
  • the vertex block unit 630 may also be configured to extract a plurality of first entity vertices from the plurality of entity vertices. Then, the vertex generation unit 610 creates entity account vertices corresponding to each extracted first entity vertex.
  • the vertex generation unit 610 may also be configured to create a service application vertex for each entity vertex.
  • the apparatus 600 may also include an application relationship generating unit (not shown).
  • the application relationship generating unit is configured to create an application relationship (Apply) between each service application vertex and the corresponding entity vertex.
  • the application relationship generation unit may be implemented by the same unit as the ownership relationship generation unit 620 and the association relationship generation unit 640, or may be implemented by different units.
  • the apparatus 600 may further include a data distribution information acquiring unit (not shown).
  • the data distribution information obtaining unit may be configured to obtain vertex out-degree distribution information of entity vertices.
  • the vertex generating unit 610 creates the corresponding entity account vertex of each entity vertex according to the acquired vertex out-degree distribution information.
  • the data distribution information obtaining unit may also be configured to obtain the vertex out-degree/in-degree distribution information of the entity account vertex.
  • the vertex generation unit 610 determines the vertex out-degree and vertex in-degree of each entity account vertex according to the acquired vertex out-degree/in-degree distribution information.
  • the data distribution information obtaining unit may also be configured to obtain social network out-degree/in-degree distribution information.
  • the apparatus 600 may further include an entity vertex relationship generation unit (not shown).
  • the entity vertex relationship generation unit creates acquaintance/affiliation relationship between entity vertices according to the acquired social network out-degree/in-degree distribution information.
  • the association relationship generation unit 640 determines the relationship creation probability between the selected start entity account vertex and end entity account vertex based on the calculated attribute distance and on the acquaintance/affiliation relationship between the entity vertices to which the two account vertices respectively belong.
  • the entity vertex relationship generation unit may be implemented by the same unit as the application relationship generation unit, the ownership relationship generation unit 620 and the association relationship generation unit 640, or may be implemented by different units.
  • with the graph data generation scheme of the first embodiment of this specification, test graph data having a realistic graph data structure can be generated for use in benchmark tests.
  • the graph data generation scheme is particularly suitable for generating financial graph data.
  • FIG. 7 shows a block diagram of a system 700 for generating graph data for benchmarking according to a second embodiment of the present specification.
  • the system 700 includes M first devices 710 - 1 to 710 -M, N second devices 720 - 1 to 720 -N, and a third device 730 .
  • the values of M and N may be the same or different.
  • the specific values of M and N can be determined according to specific application scenarios, for example, based on the scale of graph data that needs to be generated in the application scenario.
  • the first device, the second device and the third device may be any type of server device or terminal device with computing capability or processing capability.
  • examples of the server device may include, but are not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster.
  • Examples of terminal devices may include, but are not limited to: any one of smart terminal devices such as smart phones, personal computers (personal computers, PCs), notebook computers, tablet computers, e-readers, network TVs, and wearable devices.
  • the first device, the second device, and the third device may communicate directly or perform data transmission via network communication.
  • the network may be any one or more of a wired network or a wireless network.
  • networks may include, but are not limited to, cable networks, fiber optic networks, telecommunications networks, intranets, the Internet, local area networks (LANs), wide area networks (WANs), wireless local area networks (WLANs), metropolitan area networks (MANs), the Public Switched Telephone Network (PSTN), Bluetooth networks, ZigBee networks, Near Field Communication (NFC), in-device buses, in-device lines, and the like, or any combination thereof.
  • Each of the first devices 710 - 1 to 710 -M may be deployed with a data distribution interface 711 and a vertex generation framework 712 .
  • Each of the second devices 720 - 1 to 720 -N may be deployed with the vertex relationship generation framework 721 .
  • the third device 730 may be deployed with a vertex block framework 731.
  • the term "framework" may be used interchangeably with "unit", "module", "platform", and the like.
  • the data distribution interface 711 may be configured to acquire (for example, from user input) vertex out-degree distribution information or vertex out-degree/in-degree distribution information.
  • the out-degree of a vertex refers to the number of edges starting from the vertex.
  • the in-degree of a vertex is the number of edges ending at that vertex.
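The two definitions above can be checked on a directed edge list with collections.Counter. The edge list is a made-up example.

```python
from collections import Counter


def vertex_degrees(edges):
    """Count out-degree (edges starting at a vertex) and in-degree (edges ending at it)."""
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    return out_degree, in_degree


out_degree, in_degree = vertex_degrees([("A", "X"), ("A", "Y"), ("B", "X")])
```

Here A has out-degree 2 and X has in-degree 2; a vertex with no outgoing edge, such as X, has out-degree 0.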
  • the vertex out-degree distribution information may be used by the vertex generation framework 712 to determine the vertex out-degree of each created entity vertex.
  • the data distribution interface 711 may also be configured to obtain the vertex out-degree/in-degree distribution information of entity account vertices.
  • the vertex generation framework 712 determines the vertex out-degree and vertex in-degree of each entity account vertex according to the vertex out-degree/in-degree distribution information of the entity account vertex.
  • the data distribution interface 711 may also be configured to acquire social network out-degree/in-degree distribution information. The acquired social network out-degree/in-degree distribution information is used by the vertex generation framework 712 to create acquaintance/affiliation relationships between the created entity vertices.
  • each of the first devices 710-1 to 710-M may correspond to one of the vertex blocks partitioned by the vertex block framework 731, and the vertex generation framework 712 on each first device is configured to process the vertex blocks received from the vertex block framework 731.
  • the vertex generation framework 712 on each first device is configured to create a plurality of entity vertices.
  • the entity vertices created by each vertex generation framework 712 can be sent to the vertex block framework 731, and can also be stored in the same data storage space (data memory or data storage unit), so that the vertex block framework 731 can retrieve the data from the data storage space.
  • the vertex block framework 731 is configured to extract an entity vertex block for each vertex generation framework 712 from the created entity vertices; each vertex generation framework 712 corresponds to one entity vertex block, and each entity vertex block includes a plurality of entity vertices.
  • the entity vertex extraction performed by the vertex block framework 731 is random extraction without replacement, and the extraction passes together cover all the created entity vertices.
  • for example, assuming that 100 entity vertices have been created and there are 10 vertex generation frameworks 712, the vertex block framework 731 performs 10 random extraction passes that divide the 100 entity vertices into 10 entity vertex blocks, which may contain the same or different numbers of entity vertices. During random extraction, entity vertices drawn in a previous pass are not returned to the entity vertex pool for the current pass.
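The 100-vertex, 10-block example can be sketched as successive random draws without replacement. The even-share block size rule is a hypothetical choice; the specification allows blocks of different sizes.

```python
import random


def extract_blocks(vertex_ids, n_blocks: int, rng: random.Random) -> list:
    """Split vertices into n_blocks blocks by successive draws without replacement."""
    pool = list(vertex_ids)
    blocks = []
    for i in range(n_blocks):
        # Even share of what is left; the last pass takes all remaining vertices.
        size = len(pool) // (n_blocks - i)
        block = rng.sample(pool, size)
        chosen = set(block)
        pool = [v for v in pool if v not in chosen]  # drawn vertices are not put back
        blocks.append(block)
    return blocks


blocks = extract_blocks(range(100), 10, random.Random(2))
```

Because the final pass takes whatever remains, every created entity vertex ends up in exactly one block even when the count does not divide evenly.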
  • the extracted 10 entity vertex blocks may be distributed to each vertex generation framework 712 , for example.
  • after each vertex generation framework 712 obtains the plurality of first entity vertices (an entity vertex block) extracted by the vertex block framework 731, it is further configured to create a corresponding entity account vertex for each first entity vertex. In another example, each vertex generation framework 712 may also create service application vertices.
  • the specific form of the business application vertex can be determined based on the specific application scenario. For example, in a financial application scenario, business application vertices may include loan application (LoanApplication) vertices, financing application vertices, and the like.
  • each vertex generation framework 712 is configured to create a corresponding entity account vertex and a service application vertex for each first entity vertex based on the obtained vertex out-degree of each first entity vertex.
  • the entity account vertex and the service application vertex may be collectively referred to as an entity association vertex, for example.
  • the created entity account vertex can be sent to the vertex block framework 731 or stored in the same data storage space for the vertex block framework 731 to obtain from the data storage space.
  • each vertex generation framework 712 is configured to create an ownership relationship (Owe) between each entity account vertex and the corresponding first entity vertex.
  • when each vertex generation framework 712 also creates service application vertices, it creates, in addition to the ownership relationship between each entity account vertex and the corresponding first entity vertex, an application relationship (Apply) between each service application vertex and the corresponding first entity vertex.
  • each vertex generation framework 712 is also configured to create account attribute vertices based on the account-associated attributes of each entity account vertex and, according to those attributes, to create account attribute relationships among the account attribute vertices and between each account attribute vertex and the corresponding entity account vertex.
  • the vertex block framework 731 can also be configured to extract a start entity account vertex set and an end entity account vertex set for each vertex relationship generation framework from the created entity account vertices. As with entity vertices, this extraction is performed without replacement, and it may continue until all entity account vertices have been extracted.
  • each vertex relationship generation framework 721 is configured to create account association relationships between entity account vertices based on the received start entity account vertex set and end entity account vertex set, thereby creating the required graph data.
  • for example, the graph data may be financial graph data, and the account association relationship may be a transfer relationship.
  • in some examples, the first devices may omit the data distribution interface 711.
  • in FIG. 7, the first devices, the second devices, and the third device are shown as different devices.
  • in other examples, some or all of the first devices 710-1 to 710-M may each be the same device as one of the second devices 720-1 to 720-N.
  • the vertex generation framework and the vertex relationship generation framework can be deployed on one device at the same time.
  • the third device 730 may be the same as one of the first devices 710-1 to 710-M and/or the second devices 720-1 to 720-N.
  • the vertex generation framework and the vertex block framework, the vertex relation generation framework and the vertex block framework, or the vertex generation framework, vertex relation generation framework, and vertex block framework can be deployed on a device at the same time.
  • FIG. 8 shows an example flowchart of a graph data generation method 800 according to an embodiment of the present specification.
  • the entity vertex attributes of each entity vertex may include the vertex out-degree.
  • the vertex out-degree of each entity vertex may be determined based on the vertex out-degree distribution information acquired through the data distribution interface at the first device where the vertex generation framework is located.
  • the created entity vertices can be sent to the vertex block framework, and can also be stored in a common data storage space for acquisition by the vertex block framework.
  • the vertex out-degree/in-degree distribution information may be acquired via the data distribution interface at the first device where the vertex generation framework is located.
  • the operations from 820 to 860 are executed in a loop until the loop is executed a predetermined number of times, for example, K times.
  • the vertex block framework at the third device extracts an entity vertex block for each vertex generation framework from the created entity vertices; each vertex generation framework corresponds to one entity vertex block, and each entity vertex block includes a plurality of first entity vertices.
  • the plurality of first entity vertex blocks extracted by the vertex block framework may be distributed to the corresponding vertex generation framework.
  • the entity vertices used for entity vertex extraction include all entity vertices created in step 810 .
  • the entity vertex extraction process of the vertex block framework adopts the entity vertex extraction process described above with reference to FIG. 7 .
  • each vertex generation framework creates a corresponding entity account vertex for each first entity vertex based on the vertex out-degree of each extracted first entity vertex, and creates an ownership relationship between each entity account vertex and the corresponding entity vertex.
  • the created entity account vertex may be sent to the vertex block framework, and may also be stored in a common data storage space for acquisition by the vertex block framework.
  • the vertex block framework extracts a start entity account vertex set and an end entity account vertex set for each vertex relationship generation framework from the created entity account vertices.
  • each vertex relationship generation framework creates account association relationships between entity account vertices based on its extracted start entity account vertex set and end entity account vertex set. The process of creating an account association relationship will be described in detail below with reference to FIG. 9.
  • at 860, it is determined whether a predetermined number of cycles (for example, K) has been reached.
  • if so, the process ends; otherwise, the flow returns to step 820 to execute the next cycle.
  • the graph data generation method described in FIG. 8 may also be varied in ways corresponding to the variations of the graph data generation method described in FIG. 1.
  • FIG. 9 shows an example flowchart of an account association relationship creation process 850 according to an embodiment of the present specification.
  • the account association relationship creation process is a process performed by a single vertex relationship generation framework.
  • each starting point entity account vertex in the starting point entity account vertex set and the vertex in-degree of each end point entity account vertex in the end point entity account vertex set determine each starting point entity account vertex and each The selection probability of the terminal entity account vertex.
  • the first predetermined number M P/K
  • P is the The total out-degree number of created multiple entity account vertices (all entity account vertices).
  • P may also be a preset predetermined value used to indicate the total number of account association relationships that need to be created.
  • in each loop, at 852, at least one start entity account vertex and a corresponding end entity account vertex are selected from the start entity account vertex set and the end entity account vertex set based on the selection probabilities of the start and end entity account vertices.
  • one start entity account vertex and one end entity account vertex are selected each time.
  • multiple starting point entity account vertices and corresponding end point entity account vertices may also be selected each time.
  • the selection process of the entity account vertex is a random selection process based on the selection probability.
  • the attribute distance between the selected origin entity account vertex and destination entity account vertex is calculated.
  • for the calculation of the attribute distance, reference may be made to the process described above with reference to 230 in FIG. 2.
  • an initial relationship creation probability between the selected origin entity account vertex and destination entity account vertex is determined. For the determination process of the initial relationship creation probability, reference may be made to the process described above with reference to 240 in FIG. 2 .
  • steps 855 to 857 are executed in a loop until no new account association relationship is created.
  • an account association relationship is created between the selected origin entity account vertex and end entity account vertex based on the current relationship creation probability.
  • at step 858, it is determined whether the number of created account association relationships has reached the first predetermined number M. If so, flow proceeds to 860 of FIG. 8; if not, return to 852 and execute the next loop.
  • the social network out-degree/in-degree distribution information may also be obtained via the corresponding data distribution interface of each vertex generation framework. Then, at each vertex generation framework, acquaintance/affiliation relationships are created between entity vertices according to the obtained social network out-degree/in-degree distribution information, for example, an acquaintance/affiliation relationship between a person vertex and an organization vertex.
  • when determining the initial relationship creation probability, in addition to the attribute distance between the selected start entity account vertex and end entity account vertex, the acquaintance/affiliation relationship between the entity vertices to which they respectively belong is also taken into account.
  • the distance between the selected start entity account vertex and end entity account vertex is determined.
  • the process of creating an account association relationship based on the relationship creation probability is shown as a cyclic process.
  • multiple account association relationships may also be created at one time without performing a cyclic process.
  • all 100 entity vertices created by the vertex generation frameworks are randomly partitioned into 10 entity vertex blocks, each containing 10 entity vertices.
  • the vertex block framework then distributes one entity vertex block to each vertex generation framework.
  • each vertex generation framework creates corresponding entity account vertices according to the vertex out-degree of each entity vertex, and creates an ownership relationship between the created entity account vertices and corresponding entity vertices.
  • the vertex block framework randomly partitions all the created entity account vertices into 10 entity account vertex blocks, each including a start entity account vertex set and an end entity account vertex set; the blocks share no entity account vertices. The vertex block framework then distributes one entity account vertex block to each vertex relationship generation framework. After receiving its block, each vertex relationship generation framework creates account association relationships between entity account vertices according to the start entity account vertex set and the end entity account vertex set. This cycle is repeated 5 times until a predetermined number of account association relationships has been created.
  • because the vertex generation process and the vertex relationship generation process are distributed across multiple vertex generation frameworks and multiple vertex relationship generation frameworks, graph data of any scale can be generated easily.
  • by deploying the application-scenario-related vertex generation, vertex relationship generation and attribute relationship generation processes and the application-scenario-independent vertex block process on different processing frameworks, the scenario-related processes are decoupled from the scenario-independent data block process, which makes the application scenario easy to modify and extend.
  • because the start entity account vertex set and the end entity account vertex set are extracted by the vertex block framework, relationships can also be generated between vertices in different blocks.
  • when creating account association relationships, an initial relationship creation probability is determined and relationships are created based on it; after each relationship is created, the probability is attenuated and used to create further relationships. Repeating this cycle multiple times makes the created account association relationships better match real application scenarios.
  • FIG. 10 shows a block diagram of a graph data generation device 1000 according to an embodiment of the present specification.
  • the graph data generation device 1000 includes multiple (for example, M) data distribution interfaces 1010, multiple (for example, M) vertex generation frameworks 1020, multiple (for example, N) vertex relationship generation frameworks 1030, and a vertex block framework 1040.
  • M and N may be the same or different.
  • Each data distribution interface 1010 and its corresponding vertex generation framework 1020 are deployed on a first device, and each vertex relationship generation framework 1030 is deployed on a second device.
  • the vertex block framework 1040 is deployed on a third device.
  • the data distribution interface 1010 is configured to obtain vertex out-degree distribution information of entity vertices.
  • Each vertex generation framework 1020 is configured to create a plurality of entity vertices, and the entity vertex attributes of each entity vertex include vertex out-degree, wherein the vertex out-degree of each entity vertex can be determined based on the acquired vertex out-degree distribution information.
  • the vertex block framework 1040 is configured to extract a plurality of first entity vertices for each vertex generation framework from the created entity vertices. Each vertex generation framework 1020 is further configured to create corresponding entity account vertices for each first entity vertex extracted by the vertex block framework, based on that vertex's out-degree, and to create an ownership relationship between each entity account vertex and the corresponding first entity vertex.
  • the vertex block framework 1040 is further configured to extract a start entity account vertex set and an end entity account vertex set for each vertex relationship generation framework from the created entity account vertex.
  • Each vertex relationship generating framework 1030 is configured to create an account association relationship between entity account vertices based on the extracted starting point entity account vertex set and end point entity account vertex set.
  • the data distribution interface 1010 may also be configured to obtain the vertex out-degree/in-degree distribution information of the entity account vertex.
  • each vertex generation framework 1020 may determine the vertex out-degree and vertex in-degree of each entity account vertex based on the acquired vertex out-degree/in-degree distribution information.
  • FIG. 11 shows an example block diagram of a vertex generation framework 1100 according to an embodiment of the specification.
  • the vertex generation framework 1100 includes an entity vertex creation unit 1110 , an entity vertex receiving unit 1120 , an associated vertex creation unit 1130 , an account attribute vertex creation unit 1140 and a relationship creation unit 1150 .
  • the entity vertex creation unit 1110 is configured to create a plurality of entity vertices.
  • the vertex out-degree distribution information of entity vertices may be obtained via the data distribution interface, and the entity vertex creation unit 1110 may determine the vertex out-degree of each entity vertex based on the obtained vertex out-degree distribution information.
  • the entity vertex receiving unit 1120 is configured to receive a plurality of corresponding first entity vertices from the vertex block framework.
  • when the vertex block framework and the vertex generation framework are located in the same device, the entity vertex receiving unit 1120 may not be needed.
  • the associated vertex creation unit 1130 is configured to create a corresponding entity account vertex for each first entity vertex based on the vertex out-degree of each first entity vertex received from the vertex block framework.
  • the relationship creation unit 1150 is configured to create an ownership relationship between the created entity account vertex and the corresponding entity vertex.
  • the associated vertex creation unit 1130 is configured to create the corresponding entity account vertices and business application vertices.
  • the relationship creation unit 1150 is configured to create an ownership relationship between the created entity account vertex and the corresponding entity vertex, and create an application relationship between each business application vertex and the corresponding entity vertex.
  • the account attribute vertex creation unit 1140 is configured to create an account attribute vertex based on the account association attributes of each entity account vertex.
  • the relationship creation unit 1150 is configured to create, according to the account association attributes, account attribute relationships between account attribute vertices and between each account attribute vertex and the corresponding entity account vertex.
  • the account attribute vertex creating unit 1140 may not be needed.
  • the entity vertex creation unit 1110, the associated vertex creation unit 1130 and the account attribute vertex creation unit 1140 may be implemented by the same unit.
  • FIG. 12 shows an example block diagram of a vertex relationship generation framework 1200 according to an embodiment of the specification.
  • the vertex relationship generation framework 1200 includes a selection probability determination unit 1210 , an entity account vertex selection unit 1220 , an attribute distance calculation unit 1230 , a relationship creation probability determination unit 1240 and a relationship creation unit 1250 .
  • the selection probability determination unit 1210 is configured to determine the selection probability of each start entity account vertex and each end entity account vertex according to the vertex out-degree of each start entity account vertex in the start entity account vertex set and the vertex in-degree of each end entity account vertex in the end entity account vertex set.
  • the entity account vertex selection unit 1220 , the attribute distance calculation unit 1230 , the relationship creation probability determination unit 1240 and the relationship creation unit 1250 perform operations cyclically until the created account association relationship reaches the first predetermined number M.
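  • the entity account vertex selection unit 1220 is configured to select, based on the selection probabilities, at least one start entity account vertex and a corresponding end entity account vertex from the start entity account vertex set and the end entity account vertex set.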
  • the attribute distance calculating unit 1230 is configured to calculate the attribute distance between the selected starting point entity account vertex and end point entity account vertex.
  • the relationship creation probability determining unit 1240 is configured to determine an initial relationship creation probability between the selected start entity account vertex and the end entity account vertex based on the calculated attribute distance.
  • the relationship creation unit 1250 is configured to execute the following process cyclically until no new account association relationship is created: based on the current relationship creation probability, create an account association relationship between the selected start entity account vertex and end entity account vertex, where the relationship creation probability used in each cycle is obtained by attenuating the relationship creation probability of the previous cycle.
  • the data distribution interface can be configured to obtain social network out-degree/in-degree distribution information.
  • the relationship creating unit 1250 may be configured to create an acquaintance/affiliation relationship between entity vertices according to the acquired social network out-degree/in-degree distribution information.
  • the relationship creation probability determination unit 1240 is configured to determine the initial relationship creation probability between the selected start entity account vertex and end entity account vertex based on the calculated attribute distance and the acquaintance/affiliation relationship between the entity vertices to which they respectively belong.
  • the vertex generation framework and the corresponding vertex relationship generation framework can be deployed on the same device.
  • the relationship creation unit 1150 may also be included in the vertex relationship generation framework as one of its components, instead of being a component of the vertex generation framework.
  • the above graph data generation device can be realized by hardware, software or a combination of hardware and software.
  • Fig. 13 shows a schematic diagram of a graph data generation device 1300 implemented based on a computer system according to an embodiment of the present specification.
  • the graph data generation device 1300 may include at least one processor 1310, a storage (such as a non-volatile storage) 1320, a memory 1330 and a communication interface 1340, which are connected together via a bus 1360.
  • At least one processor 1310 executes at least one computer-readable instruction stored or encoded in a memory (ie, the aforementioned elements implemented in software).
  • computer-executable instructions are stored in the memory which, when executed, cause the at least one processor 1310 to: create a plurality of entity vertices and the corresponding entity account vertices for each entity vertex; create an ownership relationship between each entity vertex and the corresponding entity account vertex; determine a start entity account vertex set and an end entity account vertex set from the created entity account vertices, the two sets sharing no entity account vertex; and create account association relationships between entity account vertices based on the start entity account vertex set and the end entity account vertex set.
  • computer-executable instructions are stored in the memory which, when executed, cause the at least one processor 1310 to: create, via each vertex generation framework, a plurality of entity vertices; extract, via the vertex block framework, a plurality of first entity vertices for each vertex generation framework from the created entity vertices; create, via each vertex generation framework, the corresponding entity account vertices for each extracted first entity vertex, and create an ownership relationship between each entity account vertex and the corresponding entity vertex; extract, via the vertex block framework, a start entity account vertex set and an end entity account vertex set for each vertex relationship generation framework from the created entity account vertices; and create, via each vertex relationship generation framework, account association relationships between entity account vertices based on the extracted start entity account vertex set and end entity account vertex set.
  • a program product such as a machine-readable medium (eg, a non-transitory machine-readable medium) is provided.
  • the machine-readable medium may have instructions (that is, the aforementioned elements implemented in software) which, when executed by the machine, cause the machine to perform the various operations and functions described above in conjunction with FIGS. 1-12 in the various embodiments of this specification.
  • specifically, a system or device equipped with a readable storage medium may be provided, on which software program code implementing the functions of any of the above embodiments is stored, and a computer or processor of the system or device reads and executes the instructions stored in the readable storage medium.
  • the program code read from the readable medium itself can realize the functions of any one of the above-mentioned embodiments, so the machine-readable code and the readable storage medium storing the machine-readable code constitute a part of the present invention.
  • Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tape, non-volatile memory cards and ROM.
  • the program code can be downloaded from a server computer or cloud via a communication network.
  • a computer program product includes a computer program that, when executed by a processor, causes the processor to perform the various operations and functions described above in conjunction with FIGS. 1-12 in the various embodiments of this specification.
  • the execution order of each step is not fixed, and can be determined as required.
  • the device structures described in the above embodiments may be physical or logical structures; that is, some units may be implemented by the same physical entity, some units may be implemented by multiple physical entities, or some units may be implemented jointly by components in multiple independent devices.
  • the hardware units or modules may be implemented mechanically or electrically.
  • a hardware unit, module or processor may include permanently dedicated circuitry or logic (such as a dedicated processor, FPGA or ASIC) to perform the corresponding operations.
  • the hardware unit or processor may also include programmable logic or circuits (such as a general-purpose processor or other programmable processors), which can be temporarily configured by software to complete corresponding operations.
  • the specific implementation may be mechanical, a dedicated permanent circuit, or a temporarily configured circuit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

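A method and apparatus for generating graph data for benchmark testing. A plurality of entity vertices and the corresponding entity account vertices of each entity vertex are created via a vertex generation framework (110), and an ownership relationship is created between each entity vertex and the corresponding entity account vertex (120). A start entity account vertex set and an end entity account vertex set are determined from the created entity account vertices via a vertex block framework (130), with no entity account vertex shared between the start entity account vertex set and the end entity account vertex set. Then, account association relationships between entity account vertices are created via a vertex relationship generation framework based on the start entity account vertex set and the end entity account vertex set (140).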

Description

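Method and Apparatus for Graph Data Generation
Technical Field
The embodiments of this specification generally relate to the field of benchmark testing, and in particular to a method and apparatus for generating graph data for benchmark testing.
Background
As graph computing technology matures, graph databases and graph computing are increasingly widely used in fields such as finance, customer service and healthcare, and especially in finance. Before an application built on graph data is put into use, it must be benchmarked using graph data, and only applications that pass the benchmark are allowed into service. How to efficiently generate graph data for benchmark testing has therefore become a pressing problem.
Summary
In view of the above, the embodiments of this specification provide a method and apparatus for generating graph data for benchmark testing, with which graph data for benchmark testing can be generated efficiently.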
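According to one aspect of the embodiments of this specification, a method for generating graph data for benchmark testing is provided, including: creating a plurality of entity vertices and the corresponding entity account vertices for each entity vertex; creating an ownership relationship between each entity vertex and the corresponding entity account vertex; determining a start entity account vertex set and an end entity account vertex set from the created entity account vertices, where the start entity account vertex set and the end entity account vertex set share no entity account vertex; and creating account association relationships between entity account vertices based on the start entity account vertex set and the end entity account vertex set.
In one example of the above aspect, the account vertex attributes of each entity account vertex include account association attributes, and the method may further include: creating account attribute vertices based on the account association attributes of each entity account vertex; and creating, according to the account association attributes, account attribute relationships between account attribute vertices and between each account attribute vertex and the corresponding entity account vertex.
In one example of the above aspect, the entity vertices include person vertices and organization vertices, the entity account vertices include personal account vertices and organizational account vertices, and the account attribute vertices include at least one of an account registration address, a registration phone number, a login network address and a login physical address, where the account attribute relationships include at least one of a located-in relationship, a phone registration relationship, a login network address relationship and a login physical address relationship.
In one example of the above aspect, the method may further include: obtaining vertex out-degree distribution information of the entity vertices. In addition, creating the corresponding entity account vertices for each entity vertex may include: creating the corresponding entity account vertices for each entity vertex according to the vertex out-degree distribution information.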
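In one example of the above aspect, the account vertex attributes of each entity account vertex include a vertex out-degree and a vertex in-degree, and creating account association relationships between entity account vertices based on the start entity account vertex set and the end entity account vertex set may include: determining the selection probability of each start entity account vertex and each end entity account vertex according to the vertex out-degree of each start entity account vertex in the start entity account vertex set and the vertex in-degree of each end entity account vertex in the end entity account vertex set; selecting, based on these selection probabilities, at least one start entity account vertex and a corresponding end entity account vertex from the start entity account vertex set and the end entity account vertex set; calculating the attribute distance between each selected start entity account vertex and the corresponding end entity account vertex; determining, based on the calculated attribute distance, the relationship creation probability between the selected start entity account vertex and the corresponding end entity account vertex; and creating an account association relationship between the selected start entity account vertex and the corresponding end entity account vertex according to the relationship creation probability.
In one example of the above aspect, the account association relationship creation process is executed in a loop until no new account association relationship is created, where the relationship creation probability used in each iteration is obtained by attenuating the relationship creation probability of the previous iteration.
In one example of the above aspect, the steps from selecting the start entity account vertex and the corresponding end entity account vertex through creating the account association relationship are executed in a loop until the number of created account association relationships reaches a predetermined number.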
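In one example of the above aspect, the method may further include: obtaining vertex out-degree/in-degree distribution information of the entity account vertices; and determining the vertex out-degree and vertex in-degree of each entity account vertex according to the vertex out-degree/in-degree distribution information.
In one example of the above aspect, the method may further include: obtaining social network out-degree/in-degree distribution information; and creating acquaintance/affiliation relationships between the entity vertices according to the social network out-degree/in-degree distribution information. In addition, determining the relationship creation probability between the selected start entity account vertex and end entity account vertex based on the calculated attribute distance may include: determining the relationship creation probability based on the calculated attribute distance and the acquaintance/affiliation relationship between the entity vertices to which the selected start entity account vertex and end entity account vertex respectively belong.
In one example of the above aspect, creating the corresponding entity account vertices for the plurality of entity vertices according to the vertex out-degree distribution information may include: creating, according to the vertex out-degree distribution information, the corresponding entity account vertices and business application vertices for each entity vertex; and creating an application relationship between each business application vertex and the corresponding entity vertex.
In one example of the above aspect, the method may further include: extracting a plurality of first entity vertices from the plurality of entity vertices. In addition, creating the corresponding entity account vertices for each entity vertex may include: creating the corresponding entity account vertices for each first entity vertex.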
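According to another embodiment of this specification, a method for generating graph data for benchmark testing is provided, including: creating, via each vertex generation framework, a plurality of entity vertices; extracting, via a vertex block framework, a plurality of first entity vertices for each vertex generation framework from the created entity vertices; creating, via each vertex generation framework, the corresponding entity account vertices for each extracted first entity vertex, and creating an ownership relationship between each entity account vertex and the corresponding entity vertex; extracting, via the vertex block framework, a start entity account vertex set and an end entity account vertex set for each vertex relationship generation framework from the created entity account vertices; and creating, via each vertex relationship generation framework, account association relationships between entity account vertices based on the extracted start entity account vertex set and end entity account vertex set.
In one example of the above aspect, the account vertex attributes of each entity account vertex include account association attributes, and the method may further include: creating, via each vertex generation framework, account attribute vertices based on the account association attributes of its own entity account vertices, and creating, based on the account association attributes, account attribute relationships between account attribute vertices and between each account attribute vertex and the corresponding entity account vertex.
In one example of the above aspect, the steps from the entity vertex extraction by the vertex block framework through the account association relationship creation by each vertex relationship generation framework are executed in a loop.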
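In one example of the above aspect, the vertex extraction performed by the vertex block framework is sampling without replacement, and it continues until all vertices have been extracted.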
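Extraction without replacement until every vertex has been drawn amounts to a random partition of the vertex list. The sketch below is one possible way to do it (shuffle, then slice), assuming vertices are identified by plain integers; the function name `partition_without_replacement` and the fixed `seed` are hypothetical. The 100-vertex, 10-block usage mirrors the worked example earlier in this description.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_without_replacement(items: Sequence[T], num_blocks: int,
                                  seed: int = 0) -> List[List[T]]:
    """Randomly split items into num_blocks disjoint blocks.

    Every item is drawn exactly once (sampling without replacement), so the
    blocks together cover all items and no item appears in two blocks.
    """
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    # Round-robin slicing after the shuffle keeps block sizes within 1 of each other.
    return [shuffled[i::num_blocks] for i in range(num_blocks)]


# 100 entity vertices -> 10 blocks of 10, one block per vertex generation framework.
blocks = partition_without_replacement(list(range(100)), num_blocks=10)
```

Shuffling once and slicing is equivalent to repeatedly drawing without replacement, but it runs in linear time and guarantees disjoint blocks by construction.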
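In one example of the above aspect, the account vertex attributes of each entity account vertex include a vertex out-degree and a vertex in-degree, and creating, via each vertex relationship generation framework, account association relationships between entity account vertices based on the start entity account vertex set and the end entity account vertex set may include: determining the selection probability of each start entity account vertex and each end entity account vertex according to the vertex out-degree of each start entity account vertex in the start entity account vertex set and the vertex in-degree of each end entity account vertex in the end entity account vertex set; and executing the following process in a loop until the created account association relationships reach a first predetermined number M: selecting, based on the selection probabilities, at least one start entity account vertex and a corresponding end entity account vertex from the start entity account vertex set and the end entity account vertex set; calculating the attribute distance between the selected start entity account vertex and end entity account vertex; determining, based on the calculated attribute distance, the relationship creation probability between the selected start entity account vertex and end entity account vertex; and creating an account association relationship between the selected start entity account vertex and end entity account vertex based on the relationship creation probability.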
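In one example of the above aspect, the first predetermined number M = P/K, where P is the total out-degree of the plurality of entity account vertices and K is the number of loop executions.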
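Written out, the per-cycle quota is the total account out-degree spread evenly over the K cycles. Here d^+(v) denotes the out-degree of entity account vertex v and V_acct the set of all entity account vertices. The numbers in the second line are purely illustrative, and the text does not say how M is rounded when K does not divide P.

```latex
M = \frac{P}{K}, \qquad P = \sum_{v \in V_{\mathrm{acct}}} d^{+}(v)
\qquad\text{e.g. } P = 1000,\; K = 5 \;\Rightarrow\; M = \frac{1000}{5} = 200
```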
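In one example of the above aspect, the account association relationship creation process is executed in a loop until no new account association relationship is created, where the relationship creation probability used in each iteration is obtained by attenuating the relationship creation probability of the previous iteration.
In one example of the above aspect, the method may further include: obtaining, via the data distribution interface corresponding to each vertex generation framework, vertex out-degree/in-degree distribution information of the entity account vertices; and determining, via each vertex generation framework, the vertex out-degree and vertex in-degree of each entity account vertex according to the obtained vertex out-degree/in-degree distribution information.
In one example of the above aspect, the method may further include: obtaining, via the data distribution interface corresponding to each vertex generation framework, social network out-degree/in-degree distribution information; and creating, via each vertex generation framework, acquaintance/affiliation relationships between the entity vertices according to the obtained social network out-degree/in-degree distribution information. In addition, determining the relationship creation probability between the selected start entity account vertex and end entity account vertex based on the calculated attribute distance may include: determining the relationship creation probability based on the calculated attribute distance and the acquaintance/affiliation relationship between the entity vertices to which the selected start entity account vertex and end entity account vertex respectively belong.
In one example of the above aspect, the method may further include: obtaining, via the data distribution interface corresponding to each vertex generation framework, vertex out-degree distribution information of the entity vertices, and determining, via each vertex generation framework, the vertex out-degree of each entity vertex according to the obtained vertex out-degree distribution information. In addition, creating, via each vertex generation framework, the corresponding entity account vertices for each extracted first entity vertex may include: creating, via each vertex generation framework, the corresponding entity account vertices for each extracted first entity vertex based on that vertex's out-degree.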
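According to another aspect of the embodiments of this specification, an apparatus for generating graph data for benchmark testing is provided, including: a vertex generation unit that creates a plurality of entity vertices and the corresponding entity account vertices for each entity vertex; an ownership relationship generation unit that creates an ownership relationship between each entity vertex and the corresponding entity account vertex; a vertex block unit that determines a start entity account vertex set and an end entity account vertex set from the created entity account vertices, the start entity account vertex set and the end entity account vertex set sharing no entity account vertex; and an association relationship generation unit that creates account association relationships between entity account vertices based on the start entity account vertex set and the end entity account vertex set.
According to another aspect of the embodiments of this specification, an apparatus for generating graph data for benchmark testing is provided, including: at least two vertex generation frameworks, each deployed at a first device; at least two vertex relationship generation frameworks, each deployed at a second device; and a vertex block framework deployed at a third device. Each vertex generation framework is configured to: create a plurality of entity vertices; create the corresponding entity account vertices for each first entity vertex extracted by the vertex block framework; and create an ownership relationship between each entity account vertex and the corresponding entity vertex. The vertex block framework is configured to extract a plurality of first entity vertices for each vertex generation framework from the created entity vertices, and to extract a start entity account vertex set and an end entity account vertex set for each vertex relationship generation framework from the created entity account vertices. Each vertex relationship generation framework is configured to create account association relationships between entity account vertices based on the extracted start entity account vertex set and end entity account vertex set.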
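In one example of the above aspect, the apparatus may further include: a data distribution interface deployed at each first device that obtains vertex out-degree distribution information, where the vertex out-degree of each entity vertex is determined based on the corresponding vertex out-degree distribution information.
In one example of the above aspect, the account vertex attributes of each entity account vertex include a vertex out-degree and a vertex in-degree. Each vertex relationship generation framework is configured to: determine the selection probability of each start entity account vertex and each end entity account vertex according to the vertex out-degree of each start entity account vertex in the start entity account vertex set and the vertex in-degree of each end entity account vertex in the end entity account vertex set; and execute the following process in a loop until the created account association relationships reach a first predetermined number M: select, based on the selection probabilities, at least one start entity account vertex and a corresponding end entity account vertex from the start entity account vertex set and the end entity account vertex set; calculate the attribute distance between the selected start entity account vertex and end entity account vertex; determine, based on the calculated attribute distance, the relationship creation probability between the selected start entity account vertex and end entity account vertex; and create an account association relationship between the selected start entity account vertex and end entity account vertex based on the relationship creation probability.
In one example of the above aspect, the apparatus may further include: a data distribution interface deployed at each first device that obtains vertex out-degree/in-degree distribution information of the entity account vertices, where the vertex out-degree and vertex in-degree of each entity account vertex are determined according to the corresponding vertex out-degree/in-degree distribution information.
In one example of the above aspect, the apparatus may further include: a data distribution interface deployed at each first device that obtains social network out-degree/in-degree distribution information. Each vertex generation framework creates acquaintance/affiliation relationships between the entity vertices according to the obtained social network out-degree/in-degree distribution information, and the relationship creation probability between the selected start entity account vertex and end entity account vertex is determined based on the calculated attribute distance and the acquaintance/affiliation relationship between the entity vertices to which they respectively belong.
In one example of the above aspect, some or each of the plurality of first devices may be the same as one of the plurality of second devices, and/or the third device may be the same as one of the plurality of first devices and/or the plurality of second devices.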
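According to another aspect of the embodiments of this specification, a system for generating graph data for benchmark testing is provided, including: at least two first devices, each deployed with a vertex generation framework; at least two second devices, each deployed with a vertex relationship generation framework; and a third device deployed with a vertex block framework. Each vertex generation framework is configured to: create a plurality of entity vertices; create the corresponding entity account vertices for each first entity vertex extracted by the vertex block framework; and create an ownership relationship between each entity account vertex and the corresponding entity vertex. The vertex block framework is configured to extract a plurality of first entity vertices for each vertex generation framework from the created entity vertices, and to extract a start entity account vertex set and an end entity account vertex set for each vertex relationship generation framework from the created entity account vertices. Each vertex relationship generation framework is configured to create account association relationships between entity account vertices based on the extracted start entity account vertex set and end entity account vertex set.
According to another aspect of the embodiments of this specification, an apparatus for generating graph data for benchmark testing is provided, including: at least one processor, a memory coupled to the at least one processor, and a computer program stored in the memory, where the at least one processor executes the computer program to implement the method described above.
According to another aspect of the embodiments of this specification, a computer-readable storage medium is provided, storing executable instructions that, when executed, cause a processor to perform the method described above.
According to another aspect of the embodiments of this specification, a computer program product is provided, including a computer program that, when executed by a processor, implements the method described above.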
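Brief Description of the Drawings
A further understanding of the nature and advantages of this specification can be gained by referring to the following drawings. In the drawings, similar components or features may have the same reference numerals.
FIG. 1 shows an example flowchart of a graph data generation method according to the first embodiment of this specification.
FIG. 2 shows one example flowchart of an account association relationship creation process according to the first embodiment of this specification.
FIG. 3 shows another example flowchart of an account association relationship creation process according to the first embodiment of this specification.
FIG. 4 shows an example schematic diagram of a graph data generation process according to the first embodiment of this specification.
FIG. 5 shows an example schematic diagram of the data structure of graph data according to the first embodiment of this specification.
FIG. 6 shows a block diagram of an apparatus for generating graph data for benchmark testing according to the first embodiment of this specification.
FIG. 7 shows a block diagram of a system for generating graph data for benchmark testing according to the second embodiment of this specification.
FIG. 8 shows an example flowchart of a graph data generation method according to the second embodiment of this specification.
FIG. 9 shows an example flowchart of an account association relationship creation process according to the second embodiment of this specification.
FIG. 10 shows a block diagram of a graph data generation apparatus according to the second embodiment of this specification.
FIG. 11 shows an example block diagram of a vertex generation framework according to the second embodiment of this specification.
FIG. 12 shows an example block diagram of a vertex relationship generation framework according to the second embodiment of this specification.
FIG. 13 shows an example schematic diagram of a graph data generation apparatus implemented based on a computer system according to an embodiment of this specification.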
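Detailed Description
The subject matter described herein will now be discussed with reference to example implementations. It should be understood that these implementations are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and do not limit the scope of protection, applicability or examples set forth in the claims. The functions and arrangements of the elements discussed may be changed without departing from the scope of protection of this specification. Various examples may omit, substitute or add various processes or components as needed. For example, the described methods may be performed in an order different from that described, and individual steps may be added, omitted or combined. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "include" and its variants are open-ended terms meaning "including but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first", "second" and so on may refer to different or identical objects. Other definitions, explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.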
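Before an application built on graph data is put into use, it must be benchmarked using graph data, and only applications that pass the benchmark are allowed into service. Benchmarking refers to quantitative and comparable testing of a given performance metric of a class of test objects, using scientifically designed test methods, test tools and test systems. For example, benchmarking a computer CPU on metrics such as floating-point performance and data access bandwidth and latency lets users see clearly whether each CPU's computing performance and job throughput meet an application's requirements. Likewise, benchmarking a database management system on its ACID (atomicity, consistency, isolation, durability) properties, query time, online transaction processing capability and similar metrics helps users choose the database system that best fits their needs.
LDBC SNB DATAGEN, proposed by the Linked Data Benchmark Council (LDBC), is the data generator for the Social Network Benchmark (SNB), a benchmark based on social networks. The data generated by LDBC SNB DATAGEN ranges in scale from 100 MB to 1 TB. However, the data scenarios it generates are highly customized and hard to modify, and they differ considerably from the needs of some application scenarios (for example, financial application scenarios). In addition, LDBC SNB DATAGEN uses the attribute distance between the attributes of two vertices as the factor influencing the relationship creation probability, so its relationship generation logic is relatively simple. Furthermore, when vertices must be partitioned into blocks during relationship generation (for example, because of physical hardware limits), the LDBC SNB DATAGEN approach cannot generate relationships between vertices in different blocks.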
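In view of the above, the embodiments of this specification provide a scheme for generating graph data for benchmark testing. In this scheme, a plurality of entity vertices and the corresponding entity account vertices of each entity vertex are created via vertex generation frameworks, and an ownership relationship is created between each entity vertex and the corresponding entity account vertex. A start entity account vertex set and an end entity account vertex set are determined from the created entity account vertices via a vertex block framework, with no entity account vertex shared between the two sets. Then, account association relationships between entity account vertices are created via vertex relationship generation frameworks based on the start entity account vertex set and the end entity account vertex set.
In this specification, the term "account" refers to a carrier that reflects increases and decreases in asset data and their results, for example, a financial asset account, a digital asset account or another type of data asset account. The term "account data" may include financial asset data (e.g., funds data, lending data and liability data), digital asset data or other types of asset data. The term "account association relationship" refers to any type of relationship that may occur between two accounts, for example, an account data transfer relationship, an account binding relationship, an account affiliation relationship and other types of association relationships that can occur between accounts.
The system, method and apparatus for graph data generation according to the embodiments of this specification are described below with reference to FIGS. 1 to 12.
FIG. 1 shows an example flowchart of a graph data generation method 100 according to the first embodiment of this specification. The graph data generation method shown in FIG. 1 is executed by a graph data generation apparatus, whose components may be deployed on the same device or on different devices.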
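As shown in FIG. 1, at 110, a plurality of entity vertices and the corresponding entity account vertices of each entity vertex are created. In one example, each entity vertex may have entity vertex attributes. The entity vertex attributes may include a vertex out-degree; accordingly, the corresponding entity account vertices may be created based on the vertex out-degree of each entity vertex. The entity vertex attributes may also include an entity identifier that uniquely identifies the entity vertex. The entity identifier may be, for example, a globally unique identifier, such as a globally unique integer created from the corresponding block number. In one example (e.g., a financial application scenario), entities may include individual entities and organizational entities; accordingly, entity vertices may include person vertices (Person) and organization vertices (Organization). In one example, the vertex out-degree of each entity vertex may be a preset fixed value. In another example, the vertex out-degree of each entity vertex may be determined based on vertex out-degree distribution information input, for example, via a data distribution interface; for instance, integers may be generated randomly according to the vertex out-degree distribution information (e.g., a power-law distribution). In another example, the entity vertex attributes may also include a vertex in-degree; accordingly, the vertex out-degree and vertex in-degree of each entity vertex may be preset, or determined based on vertex out-degree/in-degree distribution information input, for example, via a data distribution interface. The entity vertex attributes may further include an entity name. For example, when the entity vertex is a person vertex, the entity name may include a first name (First Name) and a last name (Last Name); when the entity vertex is an organization vertex, the entity name may include an organization name (Organization Name).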
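The power-law out-degree sampling and block-based identifiers described above could look roughly like the following. This is a minimal sketch under stated assumptions. The truncated form P(k) ∝ k^-alpha, the defaults for `alpha` and `max_degree`, the constant `BLOCK_CAPACITY`, and the id formula `block_id * BLOCK_CAPACITY + local_index` are all hypothetical choices, not taken from the source.

```python
import random
from typing import Dict, List

BLOCK_CAPACITY = 1_000_000  # hypothetical upper bound on entity vertices per block


def sample_power_law_degree(rng: random.Random, alpha: float, max_degree: int) -> int:
    """Draw an integer out-degree k in [1, max_degree] with P(k) proportional to k ** -alpha."""
    degrees = range(1, max_degree + 1)
    weights = [k ** -alpha for k in degrees]
    return rng.choices(degrees, weights=weights, k=1)[0]


def make_entity_id(block_id: int, local_index: int) -> int:
    """Globally unique integer id derived from the block number (hypothetical scheme)."""
    if not 0 <= local_index < BLOCK_CAPACITY:
        raise ValueError("local_index exceeds BLOCK_CAPACITY")
    return block_id * BLOCK_CAPACITY + local_index


def create_entity_vertices(block_id: int, count: int, alpha: float = 2.0,
                           max_degree: int = 50, seed: int = 0) -> List[Dict]:
    """Create Person/Organization vertices with power-law out-degrees for one block."""
    rng = random.Random(seed * 7919 + block_id)  # separate random stream per block
    return [
        {
            "id": make_entity_id(block_id, i),
            "type": rng.choice(["Person", "Organization"]),
            "out_degree": sample_power_law_degree(rng, alpha, max_degree),
        }
        for i in range(count)
    ]
```

Because each id embeds its block number, vertex generation frameworks can create vertices in parallel without coordinating, and the ids still never collide.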
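In one example, the created entity account vertices may include personal account vertices (PersonalAccount) and organizational account vertices (OrganizationalAccount). In one example, the account vertex attributes of each entity account vertex may include a vertex identifier, an account creation date (CreateDate), an account validity flag (IsBlocked) and so on. The account validity flag IsBlocked may be a Boolean value indicating whether the account is valid; for example, the Boolean value "1" may indicate valid and "0" invalid, or the reverse in another example. In one example, the value of CreateDate (a DateTime) may be generated by a random generator within a bounded time range, and the value of IsBlocked may also be generated by a random generator.
In another example, a business application vertex may also be created for each entity vertex. The specific form of the business application vertex may depend on the application scenario. For example, in a financial application scenario, business application vertices may include loan application (LoanApplication) vertices, financing application vertices and so on. A LoanApplication vertex may have a vertex identifier and LoanAmount as its vertex attributes, where LoanAmount takes a Decimal value. Accordingly, the corresponding entity account vertices and business application vertices are created for each entity vertex based on its vertex out-degree. Here, entity account vertices and business application vertices may collectively be referred to as entity-associated vertices.
At 120, an ownership relationship (Owe) is created between each entity vertex and the corresponding entity account vertex. In another example, when business application vertices are also created, in addition to creating an ownership relationship between each entity account vertex and the corresponding entity vertex, an application relationship (Apply) may also be created between each business application vertex and the corresponding entity vertex. The application relationship may have a relationship attribute ApplyDate, whose value is generated by a random generator within a bounded time range.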
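The account creation and ownership steps can be sketched as below. The sketch assumes that an entity with out-degree d owns exactly d accounts. It also assumes dict-based vertices, tuple edges, the id format in `account_id`, and a hypothetical `blocked_ratio` parameter for drawing IsBlocked. The relationship name Owe is kept exactly as written in the text.

```python
import random
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def random_datetime(rng: random.Random, start: datetime, end: datetime) -> datetime:
    """Uniform random DateTime in [start, end] (the 'bounded time range')."""
    return start + timedelta(seconds=rng.uniform(0, (end - start).total_seconds()))


def create_accounts(entity: Dict, rng: random.Random, start: datetime, end: datetime,
                    blocked_ratio: float = 0.1) -> Tuple[List[Dict], List[Tuple]]:
    """Create one account vertex per unit of the entity's out-degree, plus Owe edges."""
    kind = "PersonalAccount" if entity["type"] == "Person" else "OrganizationalAccount"
    accounts, owe_edges = [], []
    for i in range(entity["out_degree"]):
        account_id = f"{entity['id']}-acct-{i}"  # hypothetical id scheme
        accounts.append({
            "id": account_id,
            "type": kind,
            "CreateDate": random_datetime(rng, start, end),
            # Which Boolean value means "valid" is a convention; either encoding is allowed.
            "IsBlocked": rng.random() < blocked_ratio,
        })
        owe_edges.append((entity["id"], "Owe", account_id))  # entity -> account
    return accounts, owe_edges
```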
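In another example, each entity account vertex may also have account vertex attributes, which may include account association attributes. When the entity account vertices include personal account vertices (PersonalAccount) and organizational account vertices (OrganizationalAccount), examples of account association attributes may include, but are not limited to, an account registration address, a registration phone number (Phone), a login network address (IP) and a login physical address (MAC). The account registration address may be, for example, the city where the account was registered (City). The login network address (IP) may be, for example, the IP address used when logging in to the account. The login physical address (MAC) may be the physical address of the device used when logging in to the account, for example, a MAC address.
The registration phone number (Phone), login network address (IP), login physical address (MAC) and registration address (City) of a personal account PersonalAccount or an organizational account OrganizationalAccount are created when the personal or organizational account is created. The City value is drawn at random from a city data resource pool, and the Phone value is drawn at random from a phone data resource pool. The number of IP addresses is produced by a random generator, and that many IP addresses are then drawn at random from a network address data resource pool. Likewise, the number of MAC addresses is produced by a random generator, and that many MAC addresses are then drawn at random from a physical address data resource pool.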
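Drawing account attributes from resource pools might be sketched as follows. The pools are passed in as plain lists here. The caps `max_ips` and `max_macs`, and the lower bound of one IP and one MAC per account, are hypothetical assumptions, since the text only says the counts come from a random generator.

```python
import random
from typing import Dict, Sequence


def sample_account_attributes(rng: random.Random, cities: Sequence[str], phones: Sequence[str],
                              ips: Sequence[str], macs: Sequence[str],
                              max_ips: int = 3, max_macs: int = 2) -> Dict:
    """Draw City/Phone once and a random number of distinct IP/MAC addresses from resource pools."""
    n_ip = rng.randint(1, min(max_ips, len(ips)))     # how many IPs this account logs in from
    n_mac = rng.randint(1, min(max_macs, len(macs)))  # how many devices (MACs) it uses
    return {
        "City": rng.choice(cities),
        "Phone": rng.choice(phones),
        "IP": rng.sample(list(ips), n_ip),     # without replacement: no duplicate IPs per account
        "MAC": rng.sample(list(macs), n_mac),
    }
```

Because the pools are shared across all accounts, different accounts will sometimes draw the same IP, MAC or phone, which is what later lets attribute vertices connect otherwise unrelated accounts.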
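In one example, when the entity account vertices have account association attributes, account attribute vertices may also be created based on the account association attributes of each entity account vertex, and account attribute relationships may be created, according to the account association attributes, between account attribute vertices and between each account attribute vertex and the corresponding entity account vertex. Examples of account attribute relationships may include, but are not limited to, at least one of a located-in relationship (IsLocatedIn), a phone registration relationship (SignUpWithPhone), a login network address relationship (SignInWithIP) and a login physical address relationship (SignInWithMAC). For example, an account attribute relationship SignInWithIP is created between a PersonalAccount and an account attribute vertex IP; this relationship has the relationship attribute SignInDate, whose value is generated by a random generator within a bounded time range. An account attribute relationship SignInWithMAC is created between a PersonalAccount and an account attribute vertex MAC; this relationship also has the relationship attribute SignInDate, generated by a random generator within a bounded time range. An account attribute relationship SignUpWithPhone is created between a PersonalAccount and an account attribute vertex Phone; this relationship has the relationship attribute SignUpDate, generated by a random generator within a bounded time range. An account attribute relationship IsLocatedIn is created between a PersonalAccount and an account attribute vertex City, and another IsLocatedIn relationship is created between an account attribute vertex Phone and an account attribute vertex City.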
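The relationship types listed above map directly onto an edge list. In the sketch below, an edge is a `(source, type, target, attributes)` tuple. The `phone_city` lookup is a hypothetical input, because the text does not say how a phone's city is chosen. Dates are drawn uniformly at day granularity, which is also an assumption.

```python
import random
from datetime import date, timedelta
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str, Dict]  # (source, relationship type, target, relationship attributes)


def random_date(rng: random.Random, start: date, end: date) -> date:
    """Uniform random date in [start, end]."""
    return start + timedelta(days=rng.randint(0, (end - start).days))


def build_attribute_edges(account_id: str, attrs: Dict, phone_city: Dict[str, str],
                          rng: random.Random, start: date, end: date) -> List[Edge]:
    """Link an account to its attribute vertices using the relationship types named in the text."""
    edges: List[Edge] = []
    for ip in attrs["IP"]:
        edges.append((account_id, "SignInWithIP", ip, {"SignInDate": random_date(rng, start, end)}))
    for mac in attrs["MAC"]:
        edges.append((account_id, "SignInWithMAC", mac, {"SignInDate": random_date(rng, start, end)}))
    edges.append((account_id, "SignUpWithPhone", attrs["Phone"],
                  {"SignUpDate": random_date(rng, start, end)}))
    edges.append((account_id, "IsLocatedIn", attrs["City"], {}))
    # Phone -> City; phone_city is a hypothetical lookup table.
    edges.append((attrs["Phone"], "IsLocatedIn", phone_city[attrs["Phone"]], {}))
    return edges
```

The account attribute vertices are then simply the distinct targets of these edges. An IP used by two accounts becomes one shared vertex with two incoming SignInWithIP edges.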
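After the entity account vertices are created as above, at 130, a start entity account vertex set and an end entity account vertex set are determined from the created entity account vertices, with no entity account vertex shared between the start entity account vertex set and the end entity account vertex set. In this specification, a start entity account vertex serves as the source of an edge in the graph data, and an end entity account vertex serves as the target of an edge. In one example, the created entity account vertices may be classified into the start entity account vertex set and the end entity account vertex set. In another example, the start entity account vertex set and the end entity account vertex set may be extracted from the created entity account vertices. In this specification, graph data refers to directed graph data.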
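A simple way to obtain two disjoint sets is to shuffle the account ids and cut the list. The sketch below does this; the split ratio `start_ratio` is a hypothetical parameter, since the text does not fix the relative sizes of the start and end sets.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def split_start_end(account_ids: Sequence[Hashable], start_ratio: float = 0.5,
                    seed: int = 0) -> Tuple[List, List]:
    """Randomly split accounts into disjoint start (edge source) and end (edge target) sets."""
    if not 0.0 < start_ratio < 1.0:
        raise ValueError("start_ratio must be in (0, 1)")
    rng = random.Random(seed)
    ids = list(account_ids)
    rng.shuffle(ids)
    cut = int(len(ids) * start_ratio)
    return ids[:cut], ids[cut:]
```

Disjointness also guarantees that no self-loop (an account linked to itself) can be generated from a single split.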
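At 140, account association relationships between entity account vertices are created based on the start entity account vertex set and the end entity account vertex set, thereby producing the required graph data. In this specification, examples of account association relationships between two accounts may include, but are not limited to, account data transfer relationships, account binding relationships and other types of association relationships that can occur between accounts. Examples of account data transfer relationships may include, but are not limited to, account fund transfer relationships, lending data transfer relationships and liability data transfer relationships. In one example, the created graph data may be financial graph data, and the account association relationships may be transfer relationships.
In one example, for the graph data generation method shown in FIG. 1, a plurality of first entity vertices may also be extracted from the plurality of entity vertices, and the corresponding entity account vertices are then created for each extracted first entity vertex.
FIG. 2 shows one example flowchart of an account association relationship creation process 200 according to the first embodiment of this specification. In the example of FIG. 2, the account vertex attributes of each entity account vertex include a vertex out-degree and a vertex in-degree.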
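As shown in FIG. 2, at 210, the selection probability of each start entity account vertex and each end entity account vertex is determined according to the vertex out-degree of each start entity account vertex in the start entity account vertex set and the vertex in-degree of each end entity account vertex in the end entity account vertex set. For example, the selection probability of a start entity account vertex is its vertex out-degree divided by the total vertex out-degree of its start entity account vertex set, so the selection probabilities of the start entity account vertices in each start entity account vertex set sum to 1. Similarly, the selection probability of an end entity account vertex is its vertex in-degree divided by the total vertex in-degree of its end entity account vertex set, so the selection probabilities of the end entity account vertices in each end entity account vertex set sum to 1. In one example, the vertex in-degree used to determine the selection probability is the vertex in-degree in the vertex attribute information of the end entity account vertex. In another example, the vertex in-degree used is obtained by removing, from the vertex in-degree in the vertex attribute information of the end entity account vertex, the in-degree that comes from entity vertices.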
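The degree-proportional selection probabilities can be computed as below. The optional `excluded` mapping covers the second variant, in which in-degree coming from entity vertices is removed first. The function name and the clamp at zero are illustrative choices, not taken from the source.

```python
from typing import Dict, Hashable, Mapping, Optional


def selection_probabilities(degrees: Mapping[Hashable, int],
                            excluded: Optional[Mapping[Hashable, int]] = None) -> Dict[Hashable, float]:
    """Degree-proportional selection probabilities that sum to 1 within one vertex set.

    degrees:  out-degrees for a start set, or in-degrees for an end set.
    excluded: optional per-vertex degree to subtract first, e.g. the in-degree
              that comes from entity vertices rather than from other accounts.
    """
    excluded = excluded or {}
    effective = {v: max(d - excluded.get(v, 0), 0) for v, d in degrees.items()}
    total = sum(effective.values())
    if total == 0:
        raise ValueError("total degree is zero; no vertex can be selected")
    return {v: d / total for v, d in effective.items()}
```

For example, out-degrees {"a": 1, "b": 3} give probabilities {"a": 0.25, "b": 0.75}.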
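After the selection probabilities have been determined, at 220, at least one source entity account vertex and a corresponding destination entity account vertex are selected from the source and destination sets based on those probabilities. Here, the selection is a random process weighted by the selection probabilities. The selection may yield one or more source entity account vertices, each paired with one corresponding destination entity account vertex.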
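Weighted random selection of step 220 maps directly onto Python's `random.Random.choices`, which accepts per-item weights. In the sketch below (helper name hypothetical), the source side and the destination side are drawn independently, which is one reasonable reading of "each source vertex has one corresponding destination vertex", not necessarily the only one.

```python
import random


def select_pairs(src_probs: dict, dst_probs: dict, k: int, rng: random.Random):
    """Draw k (source, destination) pairs, each side weighted by its selection probability."""
    sources = list(src_probs)
    destinations = list(dst_probs)
    picked_src = rng.choices(sources, weights=[src_probs[s] for s in sources], k=k)
    picked_dst = rng.choices(destinations, weights=[dst_probs[d] for d in destinations], k=k)
    return list(zip(picked_src, picked_dst))


# Zero-probability vertices are never chosen, so this draw is deterministic.
pairs = select_pairs({"a1": 1.0, "a2": 0.0}, {"b1": 0.0, "b2": 1.0}, k=5, rng=random.Random(3))
```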
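At 230, the attribute distance between each selected source entity account vertex and its corresponding destination entity account vertex is calculated. For example, when the selected source and destination vertices share several attributes of the same types, an attribute distance D may be calculated for each shared attribute type. For instance, if both vertices have a registration address, a registration phone, and a login network address, attribute distances D1 to D3 may be calculated from these three attributes respectively.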
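The text does not define the distance metric. The Python sketch below is one plausible choice, stated as an assumption: a 0/1 mismatch distance for single-valued attributes (City, Phone) and the Jaccard distance for multi-valued attributes such as IP lists. One distance is returned per attribute type present on both vertices, matching D1 to D3 in the example above.

```python
def jaccard_distance(a, b) -> float:
    """1 - |intersection| / |union|; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def attribute_distances(src_attrs: dict, dst_attrs: dict) -> dict:
    """One distance per attribute type that both accounts have."""
    distances = {}
    for key in src_attrs.keys() & dst_attrs.keys():
        s, d = src_attrs[key], dst_attrs[key]
        if isinstance(s, (list, set, tuple)):
            distances[key] = jaccard_distance(s, d)   # multi-valued, e.g. IP lists
        else:
            distances[key] = 0.0 if s == d else 1.0   # single-valued, e.g. City, Phone
    return distances


d = attribute_distances(
    {"City": "Hangzhou", "Phone": "138-1", "IP": ["10.0.0.1", "10.0.0.2"]},
    {"City": "Hangzhou", "Phone": "138-2", "IP": ["10.0.0.2", "10.0.0.3"]},
)
```

The two accounts share a city (distance 0), differ in phone (distance 1), and share one of three distinct IPs (distance 2/3).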
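At 240, the relationship creation probability between each selected source vertex and its corresponding destination vertex is determined from the calculated attribute distance, for example through a function $P = f(D)$ relating the attribute distance $D$ to the relationship creation probability $P$. When there are several attribute distances, in one example they may first be combined into an integrated attribute distance, from which the relationship creation probability is then determined. Alternatively, the relationship creation probability may be determined from a function $P = f(D_1, \ldots, D_i)$, where $i$ is the number of attributes. Different weights may also be assigned to the individual attribute distances, and the relationship creation probability determined from the distances together with their weights.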
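The function $f$ is left open by the text. A minimal Python sketch of the weighted variant follows, assuming an exponential form $P = p_{\max} \exp(-s \sum_i w_i D_i)$, so that P decreases as the weighted attribute distance grows. The form, `p_max`, `scale`, and the default weight of 1.0 are all illustrative assumptions.

```python
import math


def relation_probability(distances: dict, weights: dict,
                         p_max: float = 0.9, scale: float = 1.0) -> float:
    """P = p_max * exp(-scale * sum_i w_i * D_i): closer accounts are likelier to be linked.

    Attributes missing from `weights` get weight 1.0.
    """
    weighted = sum(weights.get(name, 1.0) * dist for name, dist in distances.items())
    return p_max * math.exp(-scale * weighted)


near = relation_probability({"City": 0.0, "IP": 0.5}, {"City": 2.0, "IP": 1.0})
far = relation_probability({"City": 1.0, "IP": 1.0}, {"City": 2.0, "IP": 1.0})
```

An exponential keeps P within (0, p_max] for any non-negative distance, so no clipping is needed; any monotonically decreasing function would serve the same purpose.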
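After the relationship creation probabilities have been determined, at 250, account association relationships are created between each selected source vertex and its corresponding destination vertex according to the relationship creation probability. In this specification, the created account association relationships may include, for example, account data transfer relationships, account binding relationships, account affiliation relationships, and other types of relationships that can occur between accounts. An account data transfer relationship may be, for example, an account data transfer action. For example, if the source entity account vertex is "Zhang San" and the destination entity account vertex is "Li Si", one account data transfer relationship between them may be "Zhang San transferred XX yuan to Li Si on February 18", while "Zhang San transferred XX yuan to Li Si on August 20" is a separate account data transfer relationship.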
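In one example, multiple account association relationships may be created, according to the relationship creation probability, between each selected source vertex and its corresponding destination vertex, until a predetermined number of account association relationships has been created.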
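In another example, the creation of account association relationships may be an iterative process. Specifically, for each source vertex and its corresponding destination vertex, the relationship creation probability determined at 240 is taken as the initial relationship creation probability, and the following process is repeated until no account association relationship is created: in each iteration, an attempt is made to create an account association relationship between the source vertex and the destination vertex using the current relationship creation probability, and it is then determined whether a relationship was created. If one was created, the probability used in the current iteration is decayed to obtain the current relationship creation probability for the next iteration, and the next iteration is performed. If none was created, the loop ends. Examples of the decay include, but are not limited to, applying a linear or a nonlinear decay function to the relationship creation probability. The expression of the decay function may be any suitable expression chosen for the specific application scenario.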
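The decay loop can be sketched in Python as below. It returns how many relationships one (source, destination) pair receives. The linear and geometric (nonlinear) decay rules, their parameters, and the `max_edges` safety cap are assumptions; the text only requires that each success decays the probability and that the first failure ends the loop.

```python
import random


def create_with_decay(p_initial: float, rng: random.Random, mode: str = "linear",
                      step: float = 0.2, factor: float = 0.5, max_edges: int = 10_000) -> int:
    """Count relationships created for one pair before the first failed attempt."""
    created, p = 0, p_initial
    while created < max_edges and rng.random() < p:
        created += 1
        if mode == "linear":
            p = max(0.0, p - step)  # linear decay, floored at zero
        else:
            p = p * factor          # geometric (nonlinear) decay
    return created
```

Because the probability shrinks after every success, most pairs receive zero or one relationship while a few receive several, which mimics repeated transfers between the same two accounts.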
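FIG. 3 shows another example flowchart of an account association relationship creation process 300 according to the first embodiment of this specification. In the example of FIG. 3, the account vertex attributes of the entity account vertices include a vertex out-degree and a vertex in-degree.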
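At 310, the selection probabilities of the source and destination entity account vertices are determined from the out-degrees of the source vertices in the source set and the in-degrees of the destination vertices in the destination set, as described above with reference to FIG. 2.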
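After the selection probabilities have been determined, steps 320 to 380 are performed repeatedly until the number of created account association relationships reaches a predetermined number.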
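Specifically, in each iteration, at 320, at least one source entity account vertex and a corresponding destination entity account vertex are selected from the source and destination sets based on the selection probabilities. Here, the selection is a random process weighted by the selection probabilities. The selection may yield one or more source entity account vertices, each paired with one corresponding destination entity account vertex.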
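At 330, the attribute distance between each selected source vertex and its corresponding destination vertex is calculated, as described above with reference to 230 of FIG. 2.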
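At 340, the initial relationship creation probability between each selected source vertex and its corresponding destination vertex is determined from the calculated attribute distance, as described above with reference to 240 of FIG. 2.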
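After the initial relationship creation probabilities have been determined, steps 350 to 370 are performed repeatedly for each source vertex and its corresponding destination vertex until no account association relationship is created.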
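Specifically, in each iteration, at 350, an account association relationship is created between each selected source vertex and its corresponding destination vertex according to the current relationship creation probability; in the first iteration, the current probability is the initial relationship creation probability. Next, at 360, it is determined whether an account association relationship was created. If so, at 370, the probability used in the current iteration is decayed to obtain the current relationship creation probability for the next iteration, and the process returns to 350. If not, the process proceeds to 380.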
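At 380, it is determined whether the number of created account association relationships has reached the predetermined number. If it has, the process ends; otherwise, the process returns to 320 for the next iteration.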
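Process 300 as a whole (steps 310 to 380) can be condensed into one Python function. This is a self-contained sketch under stated assumptions: the caller supplies a `distance(s, d)` function, the initial probability uses an illustrative exponential form with geometric decay, and `random.Random.choices` receives raw degrees as weights, which is equivalent to passing the normalized selection probabilities. Unlike the flowchart, which checks the target only at 380, the inner loop also stops as soon as the target is reached, so the result never overshoots.

```python
import math
import random


def generate_edges(src_deg: dict, dst_deg: dict, distance, target: int,
                   rng: random.Random, p_max: float = 0.9, decay: float = 0.5):
    """Steps 320-380: pick a pair, derive its probability, create edges with decay."""
    sources, destinations = list(src_deg), list(dst_deg)
    src_w = [src_deg[s] for s in sources]       # 310: proportional to out-degree
    dst_w = [dst_deg[d] for d in destinations]  # 310: proportional to in-degree
    edges = []
    while len(edges) < target:                               # 380
        s = rng.choices(sources, weights=src_w)[0]           # 320
        d = rng.choices(destinations, weights=dst_w)[0]      # 320
        p = p_max * math.exp(-distance(s, d))                # 330-340
        while len(edges) < target and rng.random() < p:      # 350-360
            edges.append((s, d))
            p *= decay                                       # 370
    return edges


edges = generate_edges({"a1": 2, "a2": 1}, {"b1": 1}, lambda s, d: 0.0,
                       target=8, rng=random.Random(5))
```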
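In another example, the account association relationship creation process shown in FIG. 2 or FIG. 3 may further include obtaining vertex out-degree/in-degree distribution information for the entity account vertices, and determining the out-degree and in-degree of each entity account vertex according to the obtained distribution information.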
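The format of the "distribution information" is not specified. The Python sketch below assumes it arrives as a histogram mapping each degree value to a relative frequency and draws one degree per vertex from it; the helper name and the example histograms are hypothetical. Drawing out-degrees and in-degrees independently, as here, does not force the two totals to match; a real generator may need to reconcile them.

```python
import random


def assign_degrees(vertex_ids, degree_histogram: dict, rng: random.Random) -> dict:
    """Give each vertex a degree drawn from {degree: relative frequency}."""
    values = list(degree_histogram)
    weights = [degree_histogram[v] for v in values]
    return {v: rng.choices(values, weights=weights)[0] for v in vertex_ids}


rng = random.Random(11)
accounts = [f"acct-{i}" for i in range(50)]
out_degree = assign_degrees(accounts, {1: 0.6, 2: 0.3, 10: 0.1}, rng)  # heavy-tailed shape
in_degree = assign_degrees(accounts, {0: 0.2, 1: 0.5, 5: 0.3}, rng)
```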
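In another example, the account association relationship creation process shown in FIG. 2 or FIG. 3 may further include obtaining social network out-degree/in-degree distribution information and creating acquaintance/affiliation relationships between entity vertices according to it. The relationship creation probability between the selected source and destination entity account vertices is then determined from both the calculated attribute distance and the acquaintance/affiliation relationship between the entity vertices to which the two account vertices respectively belong.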
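How the acquaintance/affiliation relationship enters the probability is left open. One simple assumption, sketched in Python below, multiplies the distance-based probability by a boost factor (capped at 1.0) when the owning entities know each other, ignoring edge direction. The helper name, the `knows` set representation, and the boost value are hypothetical.

```python
def probability_with_social(base_p: float, src_owner: str, dst_owner: str,
                            knows: set, boost: float = 2.0) -> float:
    """Raise P when the owning entities know each other (edge direction ignored)."""
    related = (src_owner, dst_owner) in knows or (dst_owner, src_owner) in knows
    return min(1.0, base_p * boost) if related else base_p


knows = {("Zhang San", "Li Si")}
```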
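FIG. 4 shows an example schematic diagram of a graph data generation process 400 according to an embodiment of this specification. FIG. 5 shows an example schematic diagram of the data structure of the graph data according to an embodiment of this specification.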
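As shown in FIG. 4, in this graph data generation process, entity vertices, entity account vertices, and account attribute vertices are created in the vertex generation framework, each through a different creation mechanism. Creating entity vertices requires no data input; creating entity account vertices requires the already created entity vertices as input; and creating account attribute vertices requires the account association attributes of the already created entity account vertices. The vertex generation framework also creates the ownership relationship between each entity account vertex and its corresponding entity vertex, as well as the account attribute relationships between account attribute vertices and between each account attribute vertex and its corresponding entity account vertex. The vertex relationship generation framework creates the account association relationships between entity account vertices, for example transfer relationships (Transfer). As shown in FIG. 5, a transfer relationship has the relationship attribute TransferAmount, whose value is a Decimal.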
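The vertex and edge records of FIG. 5 can be modelled with Python dataclasses, as in the minimal sketch below. The class and helper names are hypothetical; the `Transfer` label, the `TransferAmount` property, and its Decimal type come from the text. Using `decimal.Decimal` rather than `float` keeps monetary amounts exact.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class Vertex:
    vid: str
    label: str                      # e.g. "PersonalAccount", "City", "IP"


@dataclass
class Edge:
    src: str
    label: str                      # e.g. "IsLocatedIn", "SignInWithIP", "Transfer"
    dst: str
    props: dict = field(default_factory=dict)


def transfer(src: str, dst: str, amount: str) -> Edge:
    """Transfer edge whose TransferAmount is an exact Decimal, not a float."""
    return Edge(src, "Transfer", dst, {"TransferAmount": Decimal(amount)})


account = Vertex("acct-1", "PersonalAccount")
t = transfer("acct-1", "acct-2", "100.10")
```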
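FIG. 6 shows a block diagram of an apparatus 600 for generating graph data for benchmarking according to the first embodiment of this specification. As shown in FIG. 6, the apparatus 600 includes a vertex generation unit 610, an ownership relationship generation unit 620, a vertex partitioning unit 630, and an association relationship generation unit 640.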
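The vertex generation unit 610 is configured to create a plurality of entity vertices and a corresponding entity account vertex for each entity vertex. Its operation is described above with reference to 110 of FIG. 1.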
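The ownership relationship generation unit 620 is configured to create an ownership relationship between each entity vertex and its corresponding entity account vertex. Its operation is described above with reference to 120 of FIG. 1.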
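The vertex partitioning unit 630 is configured to determine a source entity account vertex set and a destination entity account vertex set from the created entity account vertices, where the two sets share no entity account vertex. Its operation is described above with reference to 130 of FIG. 1.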
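The association relationship generation unit 640 is configured to create account association relationships between entity account vertices based on the source and destination entity account vertex sets. Its operation is described above with reference to 140 of FIG. 1 and with reference to FIG. 2 or FIG. 3.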
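In another example, the ownership relationship generation unit 620 and the association relationship generation unit 640 may be implemented by the same relationship generation unit.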
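In another example, the vertex partitioning unit 630 may also be configured to extract a plurality of first entity vertices from the plurality of entity vertices, after which the vertex generation unit 610 creates a corresponding entity account vertex for each extracted first entity vertex.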
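In another example, the vertex generation unit 610 may also be configured to create a business application vertex for each entity vertex. Correspondingly, the apparatus 600 may further include an application relationship generation unit (not shown) configured to create an application relationship (Apply) between each business application vertex and its corresponding entity vertex. The application relationship generation unit may be implemented by the same unit as the ownership relationship generation unit 620 and the association relationship generation unit 640, or by a different unit.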
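In another example, the apparatus 600 may further include a data distribution information acquisition unit (not shown) configured to obtain vertex out-degree distribution information for the entity vertices; the vertex generation unit 610 then creates the corresponding entity account vertices of each entity vertex according to that information. The data distribution information acquisition unit may also be configured to obtain vertex out-degree/in-degree distribution information for the entity account vertices; the vertex generation unit 610 then determines the out-degree and in-degree of each entity account vertex according to that information. The data distribution information acquisition unit may further be configured to obtain social network out-degree/in-degree distribution information. Correspondingly, the apparatus 600 may further include an entity vertex relationship generation unit (not shown), which creates acquaintance/affiliation relationships between entity vertices according to the social network out-degree/in-degree distribution information. The association relationship generation unit 640 then determines the relationship creation probability between the selected source and destination entity account vertices from the calculated attribute distance and the acquaintance/affiliation relationship between the entity vertices to which they respectively belong. Likewise, the entity vertex relationship generation unit may be implemented by the same unit as the application relationship generation unit, the ownership relationship generation unit 620, and the association relationship generation unit 640, or by a different unit.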
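The graph data generation scheme of the first embodiment of this specification can generate test graph data whose structure resembles that of real graph data, for use in benchmarking. The scheme is particularly suitable for generating financial graph data.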
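FIG. 7 shows a block diagram of a system 700 for generating graph data for benchmarking according to the second embodiment of this specification.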
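As shown in FIG. 7, the system 700 includes M first devices 710-1 to 710-M, N second devices 720-1 to 720-N, and a third device 730. M and N may be equal or different; their specific values depend on the application scenario, for example on the scale of the graph data to be generated. The first, second, and third devices may be any type of server device or terminal device with computing or processing capability. Examples of server devices include, but are not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster. Examples of terminal devices include, but are not limited to, any smart terminal such as a smartphone, a personal computer (PC), a laptop, a tablet, an e-reader, a network TV, or a wearable device.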
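The first, second, and third devices may exchange data by communicating directly or via a network. In some embodiments, the network may be any one or more of a wired network and a wireless network. Examples of the network include, but are not limited to, a cable network, an optical fiber network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, near field communication (NFC), an intra-device bus, an intra-device line, or any combination thereof.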
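Each of the first devices 710-1 to 710-M may be deployed with a data distribution interface 711 and a vertex generation framework 712. Each of the second devices 720-1 to 720-N may be deployed with a vertex relationship generation framework 721. The third device 730 may be deployed with a vertex partitioning framework 731. In this specification, the term "framework" may be used interchangeably with "unit", "module", "platform", and the like.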
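The data distribution interface 711 may be configured to obtain (for example, from user input) vertex out-degree distribution information or vertex out-degree/in-degree distribution information. Here, the out-degree of a vertex is the number of edges that start at it, and the in-degree of a vertex is the number of edges that end at it. The vertex generation framework 712 may use the vertex out-degree distribution information to determine the out-degree of each created entity vertex. The data distribution interface 711 may also be configured to obtain vertex out-degree/in-degree distribution information for the entity account vertices, from which the vertex generation framework 712 determines the out-degree and in-degree of each entity account vertex. In addition, the data distribution interface 711 may be configured to obtain social network out-degree/in-degree distribution information, which the vertex generation framework 712 uses to create acquaintance/affiliation relationships between the created entity vertices.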
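The out-degree and in-degree definitions above can be checked against an edge list with a few lines of Python; the helper name is hypothetical. `collections.Counter` returns 0 for vertices with no edges in the given direction.

```python
from collections import Counter


def degrees(edges):
    """Out-degree counts edges starting at a vertex; in-degree counts edges ending at it."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return out_deg, in_deg


out_deg, in_deg = degrees([("a", "b"), ("a", "c"), ("b", "c")])
```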
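Each of the first devices 710-1 to 710-M may correspond to one of the vertex blocks produced by the vertex partitioning framework 731, and the vertex generation framework 712 in each first device is configured to process the vertex block it receives from the vertex partitioning framework 731.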
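Specifically, the vertex generation framework 712 on each first device is configured to create a plurality of entity vertices. The entity vertices created by each vertex generation framework 712 may be sent to the vertex partitioning framework 731, or stored in a shared data storage space (a data store or data storage unit) from which the vertex partitioning framework 731 retrieves them.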
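The vertex partitioning framework 731 is configured to extract, from the created entity vertices, one entity vertex block for each vertex generation framework 712; each entity vertex block includes a plurality of first entity vertices. The extraction performed by the vertex partitioning framework 731 is random sampling without replacement, and each extraction pass continues until all created entity vertices have been drawn. For example, if the vertex generation frameworks have created 100 entity vertices in total and there are 10 vertex generation frameworks, the vertex partitioning framework 731 performs 10 random draws, dividing the 100 entity vertices into 10 entity vertex blocks; the blocks may contain the same or different numbers of entity vertices. Entity vertices drawn in an earlier draw are not returned to the pool for later draws. The 10 extracted entity vertex blocks may then be distributed, for example, to the respective vertex generation frameworks 712.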
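Drawing without replacement until the pool is empty is equivalent to shuffling the pool once and dealing it out, as in the Python sketch below (helper name hypothetical). Round-robin dealing yields blocks whose sizes differ by at most one; the text also allows unequal block sizes, which this sketch does not model.

```python
import random


def partition_without_replacement(items, n_blocks: int, rng: random.Random):
    """Shuffle once, then deal round-robin: every item lands in exactly one block."""
    pool = list(items)
    rng.shuffle(pool)
    return [pool[i::n_blocks] for i in range(n_blocks)]


# The example above: 100 entity vertices, 10 vertex generation frameworks.
blocks = partition_without_replacement(range(100), 10, random.Random(2024))
```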
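After each vertex generation framework 712 obtains its first entity vertices (its entity vertex block) from the vertex partitioning framework 731, it is further configured to create, for each first entity vertex, the corresponding entity account vertices based on that vertex's out-degree. In another example, each vertex generation framework 712 may also generate business application vertices, whose specific form depends on the application scenario. For example, in a financial scenario, business application vertices may include loan application (LoanApplication) vertices, financing application vertices, and the like. In that case, each vertex generation framework 712 is configured to create, for each first entity vertex and based on its out-degree, the corresponding entity account vertices and business application vertices; these may be collectively referred to as entity association vertices. Likewise, the created entity account vertices may be sent to the vertex partitioning framework 731, or stored in a shared data storage space from which the vertex partitioning framework 731 retrieves them.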
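Once the entity account vertices have been created, each vertex generation framework 712 is configured to create an ownership relationship (Own) between each entity account vertex and its corresponding first entity vertex. In another example, where each vertex generation framework 712 also creates business application vertices, it additionally creates an application relationship (Apply) between each business application vertex and its corresponding first entity vertex.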
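The following Python sketch illustrates account creation for one entity vertex block, assuming that an entity's out-degree equals the number of accounts it owns. The helper name, the account-ID format, and the edge label `Own` for the ownership relationship are assumptions. Prefixing IDs with a framework identifier is one way to keep account IDs unique when many frameworks create accounts in parallel.

```python
def create_accounts(entity_out_degree: dict, framework_id: str):
    """One account per unit of out-degree; each account gets an ownership edge."""
    accounts, owns = [], []
    for entity_id, out_degree in entity_out_degree.items():
        for k in range(out_degree):
            account_id = f"{framework_id}:{entity_id}:acct{k}"  # unique across frameworks
            accounts.append(account_id)
            owns.append((entity_id, "Own", account_id))
    return accounts, owns


accounts, owns = create_accounts({"p1": 2, "org1": 1}, framework_id="gen-0")
```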
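In addition, where each entity account vertex has account association attributes, each vertex generation framework 712 is further configured to create account attribute vertices from those attributes and, based on them, to create account attribute relationships between account attribute vertices and between each account attribute vertex and its corresponding entity account vertex.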
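After the vertex generation frameworks 712 have created the entity account vertices, the vertex partitioning framework 731 may further be configured to extract, from the created entity account vertices, a source entity account vertex set and a destination entity account vertex set for each vertex relationship generation framework. This extraction is likewise sampling without replacement and may continue until all entity account vertices have been drawn.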
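After receiving its source and destination entity account vertex sets, each vertex relationship generation framework 721 is configured to create account association relationships between entity account vertices based on those sets, thereby producing the required graph data. In one example, the graph data may be financial graph data, and the account association relationships may be transfer relationships. The account association relationship creation process of the vertex relationship generation framework 721 is described in detail below with reference to the drawings.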
In other embodiments of this specification, the first devices may omit the data distribution interface 711.
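Furthermore, although the first, second, and third devices are shown as separate devices in the example of FIG. 7, in other embodiments some or all of the first devices 710-1 to 710-M may each be the same device as one of the second devices 720-1 to 720-N; in other words, a vertex generation framework and a vertex relationship generation framework may be deployed on the same device. In another example, the third device 730 may be the same device as one of the first devices 710-1 to 710-M and/or the second devices 720-1 to 720-N; in other words, a single device may host a vertex generation framework together with the vertex partitioning framework, a vertex relationship generation framework together with the vertex partitioning framework, or all three.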
FIG. 8 shows an example flowchart of a graph data generation method 800 according to an embodiment of this specification.
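As shown in FIG. 8, at 810, a plurality of entity vertices are created at the vertex generation framework of each first device. In one example, the entity vertex attributes of each entity vertex may include a vertex out-degree, which may be determined from the vertex out-degree distribution information obtained through the data distribution interface of the first device hosting that vertex generation framework. In one example, the created entity vertices may be sent to the vertex partitioning framework or stored in a shared data storage space from which the vertex partitioning framework retrieves them. When the entity vertex attributes include both an out-degree and an in-degree, vertex out-degree/in-degree distribution information may be obtained through the data distribution interface of the first device hosting the vertex generation framework.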
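After the vertex generation frameworks have created the entity vertices, operations 820 to 860 are performed repeatedly for a predetermined number of iterations, for example K.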
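Specifically, in each iteration, at 820, the vertex partitioning framework at the third device extracts, from the created entity vertices, one entity vertex block for each vertex generation framework; each block includes a plurality of first entity vertices. When the vertex partitioning framework and the vertex generation frameworks reside on different devices, the extracted entity vertex blocks may be distributed to the corresponding vertex generation frameworks. Note that in every iteration, the pool available for extraction consists of all entity vertices created at 810. The vertex partitioning framework performs the extraction as described above with reference to FIG. 7.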
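At 830, each vertex generation framework creates, for each extracted first entity vertex, the corresponding entity account vertices based on its out-degree, and creates an ownership relationship between each entity account vertex and its corresponding entity vertex. In one example, the created entity account vertices may be sent to the vertex partitioning framework or stored in a shared data storage space from which the vertex partitioning framework retrieves them.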
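At 840, the vertex partitioning framework extracts, from the created entity account vertices, a source entity account vertex set and a destination entity account vertex set for each vertex relationship generation framework.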
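At 850, each vertex relationship generation framework creates account association relationships between entity account vertices based on its extracted source and destination sets. This process is described in detail below with reference to FIG. 9.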
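At 860, it is determined whether the predetermined number of iterations (for example, K) has been reached. If so, the process ends; otherwise, the process returns to 820 for the next iteration.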
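The graph data generation method described with reference to FIG. 8 may also be modified in ways corresponding to the modifications of the method described with reference to FIG. 1.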
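FIG. 9 shows an example flowchart of the account association relationship creation process 850 according to an embodiment of this specification. This process is performed by a single vertex relationship generation framework.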
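As shown in FIG. 9, at 851, the selection probability of each source entity account vertex and each destination entity account vertex is determined from the out-degrees of the source vertices in the source set and the in-degrees of the destination vertices in the destination set.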
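After the selection probabilities have been determined, steps 852 to 858 are performed repeatedly until the number of created account association relationships reaches a first predetermined number M. In one example, when the process from the entity vertex extraction of the vertex partitioning framework to the account association relationship creation of the vertex relationship generation frameworks is repeated K times, the first predetermined number is M = P/K, where P is the total out-degree of all created entity account vertices. In another example, P may instead be a preset value indicating the total number of account association relationships to be created.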
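The arithmetic M = P/K is straightforward when K divides P. The text does not say how to handle a remainder; the Python sketch below assumes integer division with the remainder spread over the first rounds, so the per-round targets always sum to P. The helper name is hypothetical.

```python
def per_round_targets(total_out_degree: int, rounds: int) -> list:
    """Split P relationships over K rounds: P // K each, remainder spread over early rounds."""
    base, extra = divmod(total_out_degree, rounds)
    return [base + (1 if r < extra else 0) for r in range(rounds)]
```

For example, P = 5000 and K = 5 gives M = 1000 in every round, while P = 10 and K = 3 gives targets of 4, 3 and 3.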
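Specifically, in each iteration, at 852, at least one source entity account vertex and a corresponding destination entity account vertex are selected from the source and destination sets based on the selection probabilities. In one example, one source vertex and one destination vertex are selected each time; in another example, several source vertices and their corresponding destination vertices may be selected each time. Here, the selection is a random process weighted by the selection probabilities.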
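At 853, the attribute distance between the selected source and destination entity account vertices is calculated, as described above with reference to 230 of FIG. 2.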
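At 854, the initial relationship creation probability between the selected source and destination entity account vertices is determined from the calculated attribute distance D, as described above with reference to 240 of FIG. 2.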
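Next, steps 855 to 857 are performed repeatedly until no new account association relationship is created. At 855, an account association relationship is created between the selected source and destination entity account vertices based on the current relationship creation probability. At 856, it is determined whether an account association relationship was created. If so, at 857, the probability used in the current iteration is decayed to obtain the current relationship creation probability for the next iteration, and the process returns to 855.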
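If no account association relationship was created, the process proceeds to 858. At 858, it is determined whether the number of created account association relationships has reached the first predetermined number M. If it has, the process proceeds to 860 of FIG. 8; otherwise, it returns to 852 for the next iteration.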
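The graph data generation method according to the second embodiment of this specification has been described above with reference to FIGS. 7 to 9. Note that the embodiments described with reference to the drawings are merely illustrative; in other embodiments, various adaptive modifications may be made to them.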
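For example, in other embodiments, social network out-degree/in-degree distribution information may also be obtained through the data distribution interface corresponding to each vertex generation framework. Each vertex generation framework then creates acquaintance/affiliation relationships between entity vertices according to that information, for example between personal vertices and/or organizational vertices. Correspondingly, when the initial relationship creation probability is determined, the acquaintance/affiliation relationship between the entity vertices to which the selected source and destination entity account vertices respectively belong must be considered in addition to their attribute distance. In other words, the initial relationship creation probability between the selected source and destination entity account vertices is determined from both the calculated attribute distance and that acquaintance/affiliation relationship.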
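Furthermore, in the example of FIG. 9, the probability-based creation of account association relationships is shown as an iterative process. In other embodiments, multiple account association relationships may instead be created in a single pass without iteration.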
The graph data generation process according to the second embodiment of this specification is illustrated below with an example.
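In this example, assume there are 10 vertex generation frameworks, 10 vertex relationship generation frameworks, and 1 vertex partitioning framework. After the 10 vertex generation frameworks have generated 100 entity vertices in total, 5 iterations are performed to generate the graph data. In each iteration, the vertex partitioning framework randomly partitions all 100 entity vertices into 10 entity vertex blocks of 10 entity vertices each and distributes one block to each vertex generation framework. On receiving its block, each vertex generation framework creates the corresponding entity account vertices according to the out-degree of each entity vertex and creates ownership relationships between the created entity account vertices and the corresponding entity vertices.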
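The vertex partitioning framework then randomly partitions all created entity account vertices into 10 entity account vertex blocks, each consisting of a source entity account vertex set and a destination entity account vertex set; the resulting vertex sets share no entity account vertex. The vertex partitioning framework distributes one entity account vertex block to each vertex relationship generation framework, which then creates account association relationships between entity account vertices from its source and destination sets. This is repeated 5 times, until the predetermined number of account association relationships has been created.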
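The partitioning side of this worked example can be simulated in a single Python process, as sketched below. It is not a distributed implementation: frameworks are represented only by list positions, relationship creation itself is omitted, and the 1-to-3 range for each entity's out-degree (taken as the number of accounts it owns) as well as the even source/destination split are assumptions. The structure follows the text: every round re-partitions all 100 entity vertices, creates fresh accounts, and deals them out as disjoint source/destination sets.

```python
import random


def simulate_rounds(num_entities=100, n_gen=10, n_rel=10, rounds=5, seed=0):
    """Single-process stand-in for the partition / generate / partition loop."""
    rng = random.Random(seed)
    # Assumed: an entity's out-degree = number of accounts it owns, drawn from 1..3.
    entities = {f"e{i}": rng.randint(1, 3) for i in range(num_entities)}
    history = []
    for r in range(rounds):
        pool = list(entities)
        rng.shuffle(pool)
        entity_blocks = [pool[i::n_gen] for i in range(n_gen)]  # one block per generator
        accounts = [f"r{r}:{e}:a{k}"
                    for block in entity_blocks for e in block for k in range(entities[e])]
        rng.shuffle(accounts)
        account_blocks = [accounts[i::n_rel] for i in range(n_rel)]  # one per relation framework
        splits = [(set(b[: len(b) // 2]), set(b[len(b) // 2:])) for b in account_blocks]
        history.append((entity_blocks, splits))
    return history


history = simulate_rounds()
```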
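With the graph data generation scheme of the second embodiment of this specification, vertex generation and vertex relationship generation are distributed across multiple vertex generation frameworks and multiple vertex relationship generation frameworks, making it easy to generate graph data of any scale. In addition, the scheme runs the scenario-dependent processes (vertex generation, vertex relationship generation, and attribute relationship generation) and the scenario-independent vertex partitioning process on different processing frameworks, decoupling the two so that the application scenario can be modified and extended. Moreover, when account association relationships are created, the source and destination entity account vertex sets for each vertex relationship generation framework are drawn at random from all created entity account vertices, which ensures that relationships can also be generated between vertices that originated in different blocks.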
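In addition, with the above scheme, an initial relationship creation probability is determined and used to create account association relationships; after each relationship is created, the probability is decayed and used to attempt further relationships, and this is repeated several times. As a result, the created account association relationships better match real application scenarios.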
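FIG. 10 shows a block diagram of a graph data generation apparatus 1000 according to an embodiment of this specification. As shown in FIG. 10, the apparatus 1000 includes a plurality of (for example, M) data distribution interfaces 1010, a plurality of (for example, M) vertex generation frameworks 1020, a plurality of (for example, N) vertex relationship generation frameworks 1030, and a vertex partitioning framework 1040. M and N may be equal or different. Each data distribution interface 1010 is deployed together with one vertex generation framework 1020 on a first device, each vertex relationship generation framework 1030 is deployed on a second device, and the vertex partitioning framework 1040 is deployed on a third device.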
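The data distribution interface 1010 is configured to obtain vertex out-degree distribution information for the entity vertices. Each vertex generation framework 1020 is configured to create a plurality of entity vertices whose entity vertex attributes include a vertex out-degree; the out-degree of each entity vertex may be determined from the obtained vertex out-degree distribution information.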
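The vertex partitioning framework 1040 is configured to extract, from the created entity vertices, a plurality of first entity vertices for each vertex generation framework. Each vertex generation framework 1020 is further configured to create, for each first entity vertex extracted by the vertex partitioning framework, the corresponding entity account vertices based on its out-degree, and to create an ownership relationship between each entity account vertex and its corresponding first entity vertex.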
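The vertex partitioning framework 1040 is further configured to extract, from the created entity account vertices, a source entity account vertex set and a destination entity account vertex set for each vertex relationship generation framework.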
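Each vertex relationship generation framework 1030 is configured to create account association relationships between entity account vertices based on the extracted source and destination entity account vertex sets.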
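The data distribution interface 1010 may also be configured to obtain vertex out-degree/in-degree distribution information for the entity account vertices, from which each vertex generation framework 1020 may determine the out-degree and in-degree of each entity account vertex.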
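FIG. 11 shows an example block diagram of a vertex generation framework 1100 according to an embodiment of this specification. As shown in FIG. 11, the vertex generation framework 1100 includes an entity vertex creation unit 1110, an entity vertex receiving unit 1120, an association vertex creation unit 1130, an account attribute vertex creation unit 1140, and a relationship creation unit 1150.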
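The entity vertex creation unit 1110 is configured to create a plurality of entity vertices. In one example, vertex out-degree distribution information for the entity vertices may be obtained via the data distribution interface, and the entity vertex creation unit 1110 may determine the out-degree of each entity vertex from that out-degree distribution information.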
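After the vertex partitioning framework extracts entity vertices from the created entity vertices, the entity vertex receiving unit 1120 is configured to receive the corresponding first entity vertices from the vertex partitioning framework. When the vertex partitioning framework and the vertex generation framework reside on the same device, the entity vertex receiving unit 1120 may be omitted.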
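The association vertex creation unit 1130 is configured to create, for each first entity vertex received from the vertex partitioning framework, the corresponding entity account vertices based on its out-degree. The relationship creation unit 1150 is configured to create ownership relationships between the created entity account vertices and their corresponding entity vertices.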
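In addition, where business application vertices exist, the association vertex creation unit 1130 is configured to create, for each received first entity vertex and based on its out-degree, the corresponding entity account vertices and business application vertices. Correspondingly, the relationship creation unit 1150 is configured to create ownership relationships between the created entity account vertices and their corresponding entity vertices, and application relationships between the business application vertices and their corresponding entity vertices.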
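Where the entity account vertices have account association attributes, the account attribute vertex creation unit 1140 is configured to create account attribute vertices from those attributes. Correspondingly, the relationship creation unit 1150 is configured to create account attribute relationships, according to the account association attributes, between account attribute vertices and between each account attribute vertex and its corresponding entity account vertex. Where the entity account vertices have no account association attributes, the account attribute vertex creation unit 1140 may be omitted.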
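Note that in other embodiments, some or all of the entity vertex creation unit 1110, the association vertex creation unit 1130, and the account attribute vertex creation unit 1140 may be implemented by the same unit.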
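FIG. 12 shows an example block diagram of a vertex relationship generation framework 1200 according to an embodiment of this specification. As shown in FIG. 12, the vertex relationship generation framework 1200 includes a selection probability determination unit 1210, an entity account vertex selection unit 1220, an attribute distance calculation unit 1230, a relationship creation probability determination unit 1240, and a relationship creation unit 1250.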
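The selection probability determination unit 1210 is configured to determine the selection probability of each source entity account vertex and each destination entity account vertex from the out-degrees of the source vertices in the source set and the in-degrees of the destination vertices in the destination set.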
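The entity account vertex selection unit 1220, the attribute distance calculation unit 1230, the relationship creation probability determination unit 1240, and the relationship creation unit 1250 operate repeatedly until the number of created account association relationships reaches the first predetermined number M.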
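Specifically, in each iteration, the entity account vertex selection unit 1220 is configured to select at least one source entity account vertex and a corresponding destination entity account vertex from the source and destination sets based on the selection probabilities.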
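The attribute distance calculation unit 1230 is configured to calculate the attribute distance between the selected source and destination entity account vertices.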
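The relationship creation probability determination unit 1240 is configured to determine the initial relationship creation probability between the selected source and destination entity account vertices from the calculated attribute distance.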
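The relationship creation unit 1250 is configured to repeat the following until no new account association relationship is created: create an account association relationship between the selected source and destination entity account vertices based on the current relationship creation probability, where the probability used in each iteration is obtained by decaying the one used in the previous iteration.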
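In addition, the data distribution interface may be configured to obtain social network out-degree/in-degree distribution information. In that case, the relationship creation unit 1250 may be configured to create acquaintance/affiliation relationships between entity vertices according to that information, and the relationship creation probability determination unit 1240 is configured to determine the initial relationship creation probability between the selected source and destination entity account vertices from the calculated attribute distance and the acquaintance/affiliation relationship between the entity vertices to which they respectively belong.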
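In one example of this specification, vertex generation frameworks and vertex relationship generation frameworks may be in one-to-one correspondence. In that case, each vertex generation framework may be deployed on the same device as its corresponding vertex relationship generation framework, and the relationship creation unit 1150 may then be a component of the vertex relationship generation framework rather than of the vertex generation framework.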
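The graph data generation method, apparatus, and system according to embodiments of this specification have been described above with reference to FIGS. 1 to 12. The graph data generation apparatus above may be implemented in hardware, in software, or in a combination of hardware and software.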
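FIG. 13 shows a schematic diagram of a graph data generation apparatus 1300 implemented on a computer system according to an embodiment of this specification. As shown in FIG. 13, the apparatus 1300 may include at least one processor 1310, a storage (for example, non-volatile storage) 1320, a memory 1330, and a communication interface 1340, all connected via a bus 1360. The at least one processor 1310 executes at least one computer-readable instruction (that is, the elements implemented in software as described above) stored or encoded in the storage.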
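In one embodiment, the storage holds computer-executable instructions that, when executed, cause the at least one processor 1310 to: create a plurality of entity vertices and a corresponding entity account vertex for each entity vertex; create an ownership relationship between each entity vertex and its corresponding entity account vertex; determine a source entity account vertex set and a destination entity account vertex set from the created entity account vertices, where the two sets share no entity account vertex; and create account association relationships between entity account vertices based on the source and destination entity account vertex sets.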
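In another embodiment, the storage holds computer-executable instructions that, when executed, cause the at least one processor 1310 to: create a plurality of entity vertices via each vertex generation framework; extract, via the vertex partitioning framework, a plurality of first entity vertices from the created entity vertices for each vertex generation framework; create, via each vertex generation framework, a corresponding entity account vertex for each extracted first entity vertex and an ownership relationship between each entity account vertex and its corresponding entity vertex; extract, via the vertex partitioning framework, a source entity account vertex set and a destination entity account vertex set from the created entity account vertices for each vertex relationship generation framework; and create, via each vertex relationship generation framework, account association relationships between entity account vertices based on the extracted source and destination sets.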
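It should be understood that the computer-executable instructions stored in the storage, when executed, cause the at least one processor 1310 to perform the various operations and functions described above with reference to FIGS. 1 to 12 in the embodiments of this specification.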
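According to one embodiment, a program product such as a machine-readable medium (for example, a non-transitory machine-readable medium) is provided. The machine-readable medium may carry instructions (that is, the elements implemented in software as described above) that, when executed by a machine, cause the machine to perform the various operations and functions described above with reference to FIGS. 1 to 12 in the embodiments of this specification. Specifically, a system or apparatus equipped with a readable storage medium may be provided, the medium storing software program code that implements the functions of any of the above embodiments, and a computer or processor of the system or apparatus reads and executes the instructions stored in the medium.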
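In this case, the program code read from the readable medium can itself implement the functions of any of the above embodiments; therefore, the machine-readable code and the readable storage medium storing it form part of the present invention.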
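Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, and DVD-RW), magnetic tape, non-volatile memory cards, and ROM. Alternatively, the program code may be downloaded from a server computer or the cloud over a communication network.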
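According to one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, causes the processor to perform the various operations and functions described above with reference to FIGS. 1 to 12 in the embodiments of this specification.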
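Those skilled in the art will understand that various variations and modifications may be made to the embodiments disclosed above without departing from the essence of the invention. The scope of protection of the invention shall therefore be defined by the appended claims.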
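It should be noted that not all steps and units in the above flowcharts and system structure diagrams are required; some steps or units may be omitted as needed. The execution order of the steps is not fixed and may be determined as needed. The apparatus structures described in the above embodiments may be physical or logical structures; that is, some units may be implemented by the same physical entity, some may be implemented by multiple physical entities, or some may be implemented jointly by components in multiple independent devices.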
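In the above embodiments, hardware units or modules may be implemented mechanically or electrically. For example, a hardware unit, module, or processor may include permanently dedicated circuitry or logic (such as a dedicated processor, FPGA, or ASIC) to perform the corresponding operations. A hardware unit or processor may also include programmable logic or circuitry (such as a general-purpose processor or another programmable processor) that is temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, dedicated permanent circuitry, or temporarily configured circuitry) may be chosen based on cost and time considerations.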
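The detailed description set forth above in connection with the drawings describes exemplary embodiments but does not represent all embodiments that may be implemented or that fall within the scope of the claims. The term "exemplary" as used throughout this specification means "serving as an example, instance, or illustration" and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details to provide an understanding of the described techniques; however, these techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form to avoid obscuring the concepts of the described embodiments.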
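The above description of the present disclosure is provided to enable any person of ordinary skill in the art to make or use the present disclosure. Various modifications to the present disclosure will be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other variations without departing from the scope of protection of the present disclosure. Thus, the present disclosure is not limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.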

Claims (32)

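  1. A method for generating graph data for benchmarking, comprising:
    creating a plurality of entity vertices and a corresponding entity account vertex for each entity vertex;
    creating an ownership relationship between each entity vertex and the corresponding entity account vertex;
    determining a source entity account vertex set and a destination entity account vertex set according to the created entity account vertices, wherein the source entity account vertex set and the destination entity account vertex set have no entity account vertex in common; and
    creating account association relationships between entity account vertices based on the source entity account vertex set and the destination entity account vertex set.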
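  2. The method of claim 1, wherein account vertex attributes of each entity account vertex comprise account association attributes, the method further comprising:
    creating account attribute vertices based on the account association attributes of each entity account vertex; and
    creating, according to the account association attributes, account attribute relationships between account attribute vertices and/or between each account attribute vertex and the corresponding entity account vertex.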
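  3. The method of claim 2, wherein the entity vertices comprise personal vertices and organizational vertices, the entity account vertices comprise personal account vertices and organizational account vertices, and the account attribute vertices comprise at least one of an account registration address, a registration phone, a login network address, and a login physical address,
    wherein the account attribute relationships comprise at least one of a located-in relationship, a phone registration relationship, a login network address relationship, and a login physical address relationship.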
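  4. The method of claim 1, further comprising:
    obtaining vertex out-degree distribution information of the entity vertices,
    wherein creating the corresponding entity account vertex for each entity vertex comprises:
    creating the corresponding entity account vertex for each entity vertex according to the vertex out-degree distribution information.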
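  5. The method of claim 1, wherein account vertex attributes of each entity account vertex comprise a vertex out-degree and a vertex in-degree, and creating account association relationships between entity account vertices based on the source entity account vertex set and the destination entity account vertex set comprises:
    determining a selection probability of each source entity account vertex and each destination entity account vertex according to the vertex out-degree of each source entity account vertex in the source entity account vertex set and the vertex in-degree of each destination entity account vertex in the destination entity account vertex set;
    selecting, based on the selection probabilities of the source entity account vertices and the destination entity account vertices, at least one source entity account vertex and a corresponding destination entity account vertex from the source entity account vertex set and the destination entity account vertex set;
    calculating an attribute distance between the selected source entity account vertex and the corresponding destination entity account vertex;
    determining, based on the calculated attribute distance, a relationship creation probability between the selected source entity account vertex and the corresponding destination entity account vertex; and
    creating, according to the relationship creation probability, an account association relationship between the selected source entity account vertex and the corresponding destination entity account vertex.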
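  6. The method of claim 5, wherein the creation of the account association relationship is performed iteratively until no new account association relationship is created, and the relationship creation probability used in each iteration is obtained by decaying the relationship creation probability of the previous iteration.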
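  7. The method of claim 5, wherein the process from the selection of the source entity account vertex and the corresponding destination entity account vertex to the creation of the account association relationship is performed iteratively until the number of created account association relationships reaches a predetermined number.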
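  8. The method of claim 5, further comprising:
    obtaining vertex out-degree/in-degree distribution information of the entity account vertices; and
    determining the vertex out-degree and the vertex in-degree of each entity account vertex according to the vertex out-degree/in-degree distribution information.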
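  9. The method of claim 5, further comprising:
    obtaining social network out-degree/in-degree distribution information; and
    creating acquaintance/affiliation relationships between the entity vertices according to the social network out-degree/in-degree distribution information,
    wherein determining, based on the calculated attribute distance, the relationship creation probability between the selected source entity account vertex and destination entity account vertex comprises:
    determining the relationship creation probability between the selected source entity account vertex and destination entity account vertex based on the calculated attribute distance and the acquaintance/affiliation relationship between the entity vertices to which the selected source entity account vertex and destination entity account vertex respectively belong.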
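  10. The method of claim 4, wherein creating the corresponding entity account vertices of the plurality of entity vertices according to the vertex out-degree distribution information comprises:
    creating, according to the vertex out-degree distribution information, the corresponding entity account vertex and a business application vertex for each entity vertex; and
    creating an application relationship between each business application vertex and the corresponding entity vertex.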
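  11. The method of claim 1, further comprising:
    extracting a plurality of first entity vertices from the plurality of entity vertices,
    wherein creating the corresponding entity account vertex for each entity vertex comprises:
    creating the corresponding entity account vertex for each first entity vertex.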
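  12. A method for generating graph data for benchmarking, comprising:
    creating, via each vertex generation framework, a plurality of entity vertices;
    extracting, via a vertex partitioning framework, a plurality of first entity vertices from the created entity vertices for each vertex generation framework;
    creating, via each vertex generation framework, a corresponding entity account vertex for each extracted first entity vertex, and creating an ownership relationship between each entity account vertex and the corresponding entity vertex;
    extracting, via the vertex partitioning framework, a source entity account vertex set and a destination entity account vertex set from the created entity account vertices for each vertex relationship generation framework; and
    creating, via each vertex relationship generation framework, account association relationships between entity account vertices based on the extracted source entity account vertex set and destination entity account vertex set.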
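  13. The method of claim 12, wherein account vertex attributes of each entity account vertex comprise account association attributes, the method further comprising:
    creating, via each vertex generation framework, account attribute vertices based on the account association attributes of its entity account vertices, and creating, according to the account association attributes, account attribute relationships between account attribute vertices and between each account attribute vertex and the corresponding entity account vertex.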
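  14. The method of claim 12, wherein the process from the entity vertex extraction of the vertex partitioning framework to the account association relationship creation of the vertex relationship generation frameworks is performed iteratively.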
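  15. The method of claim 12, wherein the vertex extraction of the vertex partitioning framework is extraction without replacement and continues until all vertices have been extracted.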
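  16. The method of claim 12 or 14, wherein account vertex attributes of each entity account vertex comprise a vertex out-degree and a vertex in-degree,
    wherein creating, via each vertex relationship generation framework, account association relationships between entity account vertices based on the source entity account vertex set and the destination entity account vertex set comprises:
    determining a selection probability of each source entity account vertex and each destination entity account vertex according to the vertex out-degree of each source entity account vertex in the source entity account vertex set and the vertex in-degree of each destination entity account vertex in the destination entity account vertex set;
    iteratively performing the following until the created account association relationships reach a first predetermined number M:
    selecting, based on the selection probabilities of the source entity account vertices and the destination entity account vertices, at least one source entity account vertex and a corresponding destination entity account vertex from the source entity account vertex set and the destination entity account vertex set;
    calculating an attribute distance between the selected source entity account vertex and destination entity account vertex;
    determining, based on the calculated attribute distance, a relationship creation probability between the selected source entity account vertex and destination entity account vertex; and
    creating, based on the relationship creation probability, an account association relationship between the selected source entity account vertex and destination entity account vertex.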
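  17. The method of claim 16, wherein the first predetermined number M = P/K, where P is the total out-degree of the plurality of entity account vertices and K is the number of iterations.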
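  18. The method of claim 16, wherein the creation of the account association relationship is performed iteratively until no new account association relationship is created, and the relationship creation probability used in each iteration is obtained by decaying the relationship creation probability of the previous iteration.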
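  19. The method of claim 16, further comprising:
    obtaining, via the data distribution interface corresponding to each vertex generation framework, vertex out-degree/in-degree distribution information of the entity account vertices; and
    determining, via each vertex generation framework, the vertex out-degree and the vertex in-degree of each entity account vertex according to the obtained vertex out-degree/in-degree distribution information.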
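  20. The method of claim 16, further comprising:
    obtaining, via the data distribution interface corresponding to each vertex generation framework, social network out-degree/in-degree distribution information; and
    creating, via each vertex generation framework, acquaintance/affiliation relationships between the entity vertices according to the obtained social network out-degree/in-degree distribution information,
    wherein determining, based on the calculated attribute distance, the relationship creation probability between the selected source entity account vertex and destination entity account vertex comprises:
    determining the relationship creation probability between the selected source entity account vertex and destination entity account vertex based on the calculated attribute distance and the acquaintance/affiliation relationship between the entity vertices to which the selected source entity account vertex and destination entity account vertex respectively belong.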
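  21. The method of claim 12, further comprising:
    obtaining, via the data distribution interface corresponding to each vertex generation framework, vertex out-degree distribution information of the entity vertices; and
    determining, via each vertex generation framework, the vertex out-degree of each entity vertex according to the obtained vertex out-degree distribution information,
    wherein creating, via each vertex generation framework, the corresponding entity account vertex for each extracted first entity vertex comprises:
    creating, via each vertex generation framework, the corresponding entity account vertex for each extracted first entity vertex based on the vertex out-degree of that first entity vertex.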
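  22. An apparatus for generating graph data for benchmarking, comprising:
    a vertex generation unit that creates a plurality of entity vertices and a corresponding entity account vertex for each entity vertex;
    an ownership relationship generation unit that creates an ownership relationship between each entity vertex and the corresponding entity account vertex;
    a vertex partitioning unit that determines a source entity account vertex set and a destination entity account vertex set according to the created entity account vertices, wherein the source entity account vertex set and the destination entity account vertex set have no entity account vertex in common; and
    an association relationship generation unit that creates account association relationships between entity account vertices based on the source entity account vertex set and the destination entity account vertex set.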
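  23. An apparatus for generating graph data for benchmarking, comprising:
    at least two vertex generation frameworks, each deployed at a first device;
    at least two vertex relationship generation frameworks, each deployed at a second device; and
    a vertex partitioning framework deployed at a third device,
    wherein each vertex generation framework is configured to:
    create a plurality of entity vertices;
    create a corresponding entity account vertex for each first entity vertex extracted by the vertex partitioning framework; and
    create an ownership relationship between each entity account vertex and the corresponding entity vertex;
    the vertex partitioning framework is configured to extract a plurality of first entity vertices from the created entity vertices for each vertex generation framework, and to extract a source entity account vertex set and a destination entity account vertex set from the created entity account vertices for each vertex relationship generation framework; and
    each vertex relationship generation framework is configured to create account association relationships between entity account vertices based on the extracted source entity account vertex set and destination entity account vertex set.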
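  24. The apparatus of claim 23, further comprising:
    a data distribution interface, deployed at each first device, that obtains vertex out-degree distribution information,
    wherein the vertex out-degree of each entity vertex is determined based on the corresponding vertex out-degree distribution information.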
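  25. The apparatus of claim 23, wherein account vertex attributes of each entity account vertex comprise a vertex out-degree and a vertex in-degree,
    and each vertex relationship generation framework is configured to:
    determine a selection probability of each source entity account vertex and each destination entity account vertex according to the vertex out-degree of each source entity account vertex in the source entity account vertex set and the vertex in-degree of each destination entity account vertex in the destination entity account vertex set;
    iteratively perform the following until the created account association relationships reach a first predetermined number M:
    select, based on the selection probabilities of the source entity account vertices and the destination entity account vertices, at least one source entity account vertex and a corresponding destination entity account vertex from the source entity account vertex set and the destination entity account vertex set;
    calculate an attribute distance between the selected source entity account vertex and destination entity account vertex;
    determine, based on the calculated attribute distance, a relationship creation probability between the selected source entity account vertex and destination entity account vertex; and
    create, based on the relationship creation probability, an account association relationship between the selected source entity account vertex and destination entity account vertex.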
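  26. The apparatus of claim 25, further comprising:
    a data distribution interface, deployed at each first device, that obtains vertex out-degree/in-degree distribution information of the entity account vertices,
    wherein the vertex out-degree and the vertex in-degree of each entity account vertex are determined according to the corresponding vertex out-degree/in-degree distribution information.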
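  27. The apparatus of claim 25, further comprising:
    a data distribution interface, deployed at each first device, that obtains social network out-degree/in-degree distribution information,
    wherein each vertex generation framework creates acquaintance/affiliation relationships between the entity vertices according to the obtained social network out-degree/in-degree distribution information, and each vertex relationship generation framework determines the relationship creation probability between the selected source entity account vertex and destination entity account vertex based on the calculated attribute distance and the acquaintance/affiliation relationship between the entity vertices to which the selected source entity account vertex and destination entity account vertex respectively belong.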
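  28. The apparatus of claim 23, wherein some or each of the plurality of first devices is the same device as one of the plurality of second devices, and/or the third device is the same device as one of the plurality of first devices and/or the plurality of second devices.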
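  29. A system for generating graph data for benchmarking, comprising:
    at least two first devices, each deployed with a vertex generation framework;
    at least two second devices, each deployed with a vertex relationship generation framework; and
    a third device deployed with a vertex partitioning framework,
    wherein each vertex generation framework is configured to:
    create a plurality of entity vertices;
    create a corresponding entity account vertex for each first entity vertex extracted by the vertex partitioning framework; and
    create an ownership relationship between each entity account vertex and the corresponding entity vertex;
    the vertex partitioning framework is configured to extract a plurality of first entity vertices from the created entity vertices for each vertex generation framework, and to extract a source entity account vertex set and a destination entity account vertex set from the created entity account vertices for each vertex relationship generation framework; and
    each vertex relationship generation framework is configured to create account association relationships between entity account vertices based on the extracted source entity account vertex set and destination entity account vertex set.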
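  30. An apparatus for generating graph data for benchmarking, comprising:
    at least one processor,
    a memory coupled to the at least one processor, and
    a computer program stored in the memory, wherein the at least one processor executes the computer program to implement the method of any one of claims 1 to 11 or the method of any one of claims 12 to 21.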
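  31. A computer-readable storage medium storing executable instructions that, when executed, cause a processor to perform the method of any one of claims 1 to 11 or the method of any one of claims 12 to 21.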
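  32. A computer program product comprising a computer program that, when executed by a processor, implements the method of any one of claims 1 to 11 or the method of any one of claims 12 to 21.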
PCT/CN2022/093771 2021-06-24 2022-05-19 Method and apparatus for graph data generation WO2022267769A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110702337.7 2021-06-24
CN202110702337.7A CN113254351B (zh) 2021-06-24 2021-06-24 Graph data generation method and apparatus

Publications (1)

Publication Number Publication Date
WO2022267769A1 true WO2022267769A1 (zh) 2022-12-29

Family

ID=77189434

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/093771 WO2022267769A1 (zh) 2021-06-24 2022-05-19 Method and apparatus for graph data generation

Country Status (2)

Country Link
CN (1) CN113254351B (zh)
WO (1) WO2022267769A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
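CN113254351B (zh) * 2021-06-24 2022-02-15 Alipay (Hangzhou) Information Technology Co., Ltd. Graph data generation method and apparatus
CN113688068B (zh) * 2021-10-25 2022-02-15 Alipay (Hangzhou) Information Technology Co., Ltd. Graph data loading method and apparatus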

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
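CN108171519A (zh) * 2016-12-07 2018-06-15 Alibaba Group Holding Limited Business data processing method, account identification method and apparatus, and computer terminal
US20180302430A1 (en) * 2017-04-14 2018-10-18 Microsoft Technology Licensing, Llc SYSTEM AND METHOD FOR DETECTING CREATION OF MALICIOUS new USER ACCOUNTS BY AN ATTACKER
CN110287688A (zh) * 2019-06-28 2019-09-27 Jingdong Digital Technology Holdings Co., Ltd. Associated account analysis method and apparatus, and computer-readable storage medium
CN110517097A (zh) * 2019-09-09 2019-11-29 Ping An Puhui Enterprise Management Co., Ltd. Method, apparatus, device and storage medium for identifying abnormal users
CN113254351A (zh) * 2021-06-24 2021-08-13 Alipay (Hangzhou) Information Technology Co., Ltd. Graph data generation method and apparatus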

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8924269B2 (en) * 2006-05-13 2014-12-30 Sap Ag Consistent set of interfaces derived from a business object model
CN107018000A (zh) * 2016-01-27 2017-08-04 Alibaba Group Holding Limited Account association method and apparatus

Also Published As

Publication number Publication date
CN113254351A (zh) 2021-08-13
CN113254351B (zh) 2022-02-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22827267

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22827267

Country of ref document: EP

Kind code of ref document: A1