CN116932496A

CN116932496A - Method and device for determining common object, storage medium and electronic equipment

Info

Publication number: CN116932496A
Application number: CN202210356135.6A
Authority: CN
Inventors: 许杰; 蒋杰; 李晓森; 肖品; 欧阳文; 陶阳宇
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-04-06
Filing date: 2022-04-06
Publication date: 2023-10-24

Abstract

The application discloses a method and a device for determining a common object, a storage medium, and electronic equipment. The method comprises the following steps: acquiring a neighbor object identification set of each object in an object relationship graph; performing fragment compression on the neighbor object identification set of each object to obtain and store a group of compressed fragments corresponding to each object; decompressing the stored group of compressed fragments corresponding to each object to obtain a group of decompressed fragments corresponding to each object; sorting and splicing the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object to obtain a neighbor object identification sequence corresponding to each object; and determining, according to the neighbor object identification sequence corresponding to each object, a common object corresponding to two objects having a connection relationship in the object relationship graph. The method and the device solve the technical problem that common objects are determined with low efficiency.

Description

Method and device for determining common object, storage medium and electronic equipment

Technical Field

The present application relates to the field of computers, and in particular, to a method and apparatus for determining a common object, a storage medium, and an electronic device.

Background

Determining common objects in a graph structure is one of the graph mining algorithms. Its calculation logic is simple: generally, the intersection of the neighbor sets of two nodes is calculated. However, when acquiring common objects, the related art generally involves a large number of operations such as disk reads and writes and memory copies, which slows down overall performance and lowers efficiency. Therefore, the related art has the problem that common objects are determined with low efficiency.

In view of the above problems, no effective solution has been proposed at present.

Disclosure of Invention

The embodiment of the application provides a method and a device for determining a common object, a storage medium and electronic equipment, and aims to at least solve the technical problem that the determination efficiency of the common object is low.

According to an aspect of an embodiment of the present application, there is provided a method for determining a common object, including: acquiring a neighbor object identification set of each object in an object relationship graph, wherein the neighbor object identification set of each object comprises identifiers of neighbor objects having a connection relationship with that object; performing fragment compression on the neighbor object identification set of each object to obtain and store a group of compressed fragments corresponding to each object, wherein each compressed fragment comprises an array of a first data type, and each member of the array of the first data type is used for recording the identifier of one neighbor object; decompressing the stored group of compressed fragments corresponding to each object to obtain a group of decompressed fragments corresponding to each object, wherein each decompressed fragment comprises an array of a second data type, each member of the array of the second data type is used for recording the identifier of one neighbor object, and the number of bits occupied by the second data type is greater than the number of bits occupied by the first data type; sorting and splicing the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object to obtain a neighbor object identification sequence corresponding to each object; and determining, according to the neighbor object identification sequence corresponding to each object, a common object corresponding to two objects having a connection relationship in the object relationship graph.

According to another aspect of the embodiment of the present application, there is also provided a common object determining apparatus, including: a first obtaining unit, configured to obtain a neighbor object identifier set of each object in an object relationship graph, where the neighbor object identifier set of each object includes an identifier of a neighbor object having a connection relationship with each object; the first compression unit is used for carrying out fragment compression on the neighbor object identification set of each object to obtain and store a group of compressed fragments corresponding to each object, wherein each compressed fragment comprises an array of first data types, and each member in the array of the first data types is used for recording the identification of one neighbor object; the decompression unit is used for decompressing the group of compressed fragments corresponding to each stored object to obtain a group of decompressed fragments corresponding to each object, wherein each decompressed fragment comprises an array of second data types, each member in the array of second data types is used for recording the identification of one neighbor object, and the number of bits occupied by the second data types is larger than the number of bits occupied by the first data types; the processing unit is used for sequencing and splicing the identifiers of the neighbor objects recorded in the decompressed fragments corresponding to each object to obtain a neighbor object identifier sequence corresponding to each object; and the determining unit is used for determining a common object corresponding to the two objects with the connection relationship in the object relationship graph according to the neighbor object identification sequence corresponding to each object.

As an alternative, the first compression unit includes: the first execution module is configured to execute the following steps for the neighbor object identifier set of each object, where each object is a current object when the following steps are executed: writing the neighbor object identifiers in the neighbor object identifier set of the current object into a target memory in batches, then compressing the neighbor object identifiers recorded in the target memory into one compressed fragment, and storing the compressed fragment, wherein the maximum value of the number of the neighbor object identifiers written into the target memory in each batch is N, N is the array size of the array of the second data type, and N is a positive integer greater than or equal to 1.

As an alternative, the first execution module includes: the execution submodule, configured to repeatedly execute the following steps for the neighbor object identifier set of the current object until all neighbor object identifiers in the set have been written into the target memory, where the target memory is an initial memory of N×M bytes allocated in advance, and M is the number of bytes occupied by the second data type: determining whether the number of neighbor object identifiers in the neighbor object identifier set of the current object that have not been written into the target memory is greater than or equal to N; when that number is greater than or equal to N, writing N of the neighbor object identifiers that have not been written into the target memory, compressing the N neighbor object identifiers in the target memory into one compressed fragment, and emptying the target memory; and when that number is smaller than N, writing all of the remaining neighbor object identifiers into the target memory, compressing the neighbor object identifiers in the target memory into one compressed fragment, and emptying the target memory.
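The batch-wise writing and compression described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function name `compress_in_fragments`, the batch size of 4, and the use of a 32-bit unsigned integer as the first data type are hypothetical choices, and the sketch assumes every identifier fits in 32 bits.

```python
import struct


def compress_in_fragments(neighbor_ids, n=4):
    """Split neighbor IDs into batches of at most n and narrow each batch.

    Assumption: IDs are conceptually 64-bit values (the second data type)
    that fit in 32 bits, so each batch is packed as little-endian uint32
    values (standing in for the first data type).
    """
    fragments = []
    target_memory = []  # stands in for the pre-allocated N x M byte buffer
    for neighbor_id in neighbor_ids:
        target_memory.append(neighbor_id)
        if len(target_memory) == n:
            # A full batch of n identifiers becomes one compressed fragment.
            fragments.append(struct.pack(f"<{n}I", *target_memory))
            target_memory.clear()
    if target_memory:
        # Fewer than n identifiers remain: compress them as the last fragment.
        fragments.append(struct.pack(f"<{len(target_memory)}I", *target_memory))
        target_memory.clear()
    return fragments


fragments = compress_in_fragments([7, 3, 9, 1, 12, 5, 8, 2, 10, 4], n=4)
fragment_sizes = [len(f) for f in fragments]  # size of each fragment in bytes
```

With ten identifiers and a batch size of 4, the sketch produces two full fragments and one partial fragment, each using 4 bytes per identifier instead of 8.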

As an alternative, the decompression unit includes: the second execution module is configured to execute the following steps for the set of compressed fragments corresponding to each object, where each object is a current object when the following steps are executed: decompressing the P groups of compressed fragments corresponding to the current object respectively to obtain P groups of decompressed fragments, wherein P is a positive integer greater than or equal to 1; and sequencing and splicing the identifiers of the neighbor objects recorded in the decompressed fragments of the P groups to obtain a neighbor object identifier sequence corresponding to the current object, wherein the neighbor object identifiers in the neighbor object identifier sequence corresponding to the current object are identical to the neighbor object identifiers in the neighbor object identifier set of the current object.
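Decompression followed by sorting and splicing might look like the sketch below. The names `decompress_fragment` and `build_neighbor_sequence` are hypothetical, and the fragment layout (little-endian 32-bit values widened to a 64-bit array) is an assumption chosen only so that the second data type occupies more bits than the first.

```python
import struct
from array import array


def decompress_fragment(fragment):
    """Widen one fragment of packed uint32 IDs into a 64-bit signed array."""
    count = len(fragment) // 4
    return array("q", struct.unpack(f"<{count}I", fragment))


def build_neighbor_sequence(fragments):
    """Decompress every fragment, then sort and splice into one sequence."""
    spliced = array("q")
    for fragment in fragments:
        spliced.extend(decompress_fragment(fragment))
    return array("q", sorted(spliced))


# Two hypothetical compressed fragments holding the neighbor IDs of one object.
fragments = [struct.pack("<4I", 7, 3, 9, 1), struct.pack("<2I", 12, 5)]
sequence = build_neighbor_sequence(fragments)
```

The resulting sequence contains exactly the identifiers stored in the fragments, now in ascending order, which is the precondition for a linear-time intersection later on.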

As an alternative, the apparatus further includes: the second compression unit, configured to, after the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object are sorted and spliced to obtain the neighbor object identifier sequence corresponding to each object, compress the neighbor object identifier sequence corresponding to each object to obtain and store a target array of the first data type corresponding to each object; the above-mentioned determination unit includes: the decompression module, configured to decompress the stored target array of the first data type corresponding to each object to obtain the neighbor object identification sequence corresponding to each object; and the first determining module, configured to determine, according to the neighbor object identification sequence corresponding to each object, a common object corresponding to two objects having a connection relationship in the object relationship graph.

As an alternative, the determining unit includes: the searching module, configured to search, in the case that the neighbor object identifier sequences corresponding to the objects comprise the neighbor object identifier sequence of a first object and the neighbor object identifier sequence of a second object, for neighbor object identifiers included in both the neighbor object identifier sequence of the first object and that of the second object; and the second determining module, configured to determine, in the case that such neighbor object identifiers are found, the objects represented by the found neighbor object identifiers as common objects corresponding to the first object and the second object.

As an alternative, the first obtaining unit includes: the loading module is used for loading the object relation graph from the distributed storage system, and distributing each edge structure in the object relation graph to each computing node of the distributed storage system for management, wherein the edge structure is used for representing at least two neighbor objects with connection relations, and the at least two neighbor objects comprise at least one group of source objects and target objects; the above-mentioned determination unit includes: the calling module is used for calling each computing node to traverse the edge structure managed by each computing node to obtain each group of source objects and target objects corresponding to the edge structure managed by each computing node; and the third determining module is used for determining the common object corresponding to each group of source object and target object according to the neighbor object identification sequence corresponding to each object.

As an alternative, the determining unit includes: the acquisition module, configured to acquire a group of neighbor object identification sequences to be confirmed from the neighbor object identification sequences corresponding to each object, wherein the group of neighbor object identification sequences to be confirmed comprises a first neighbor object identification sequence and a second neighbor object identification sequence; the creating module, configured to create a first pointer for the first neighbor object identification sequence and a second pointer for the second neighbor object identification sequence, wherein the first pointer is used for indicating the sequence starting point position of the first neighbor object identification sequence, and the second pointer is used for indicating the sequence starting point position of the second neighbor object identification sequence; a first moving module, configured to move the position of the first pointer in the first neighbor object identification sequence by at least one sequence unit in a target direction when the identifier at the position indicated by the first pointer is smaller than the identifier at the position indicated by the second pointer; a second moving module, configured to move the position of the second pointer in the second neighbor object identification sequence by the at least one sequence unit in the target direction when the identifier at the position indicated by the first pointer is greater than the identifier at the position indicated by the second pointer; a third moving module, configured to, when the identifier at the position indicated by the first pointer is equal to the identifier at the position indicated by the second pointer, add the equal identifier to an intersection set, move the position of the second pointer in the second neighbor object identification sequence by the at least one sequence unit in the target direction, and move the position of the first pointer in the first neighbor object identification sequence by the at least one sequence unit in the target direction; and a fourth determining module, configured to determine, when the first pointer has traversed all positions in the first neighbor object identification sequence and the second pointer has traversed all positions in the second neighbor object identification sequence, the objects corresponding to the identifiers in the intersection set as the common objects corresponding to the two objects having a connection relationship in the object relationship graph.
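The pointer-based intersection described above can be illustrated with a short sketch. The function name `intersect_sorted` is hypothetical, and the sketch assumes both sequences are sorted in ascending order without duplicates, with the target direction being toward the end of each sequence.

```python
def intersect_sorted(first_sequence, second_sequence):
    """Return the identifiers present in both ascending sequences."""
    intersection = []
    first_pointer = 0   # starts at the beginning of the first sequence
    second_pointer = 0  # starts at the beginning of the second sequence
    while first_pointer < len(first_sequence) and second_pointer < len(second_sequence):
        first_id = first_sequence[first_pointer]
        second_id = second_sequence[second_pointer]
        if first_id < second_id:
            first_pointer += 1
        elif first_id > second_id:
            second_pointer += 1
        else:
            # Equal identifiers: record the common neighbor, advance both pointers.
            intersection.append(first_id)
            first_pointer += 1
            second_pointer += 1
    return intersection


common = intersect_sorted([1, 2, 6, 7], [3, 4, 5, 6])
```

The sketch stops as soon as either pointer runs off its sequence, because no further matches are possible after that point; each sequence is scanned at most once, so the cost is linear in the combined length.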

As an alternative, the apparatus further includes: a second obtaining unit, configured to obtain, after the neighbor object identifier set of each object is obtained in the object relationship graph, the object identifier corresponding to each object; a third obtaining unit, configured to obtain a calculation result by taking the object identifier corresponding to each object modulo a preset value; and a distribution unit, configured to distribute each object to its corresponding partition according to the calculation result;

the first compression unit includes: the segmentation module is used for compressing the neighbor object identification set of each object in a segmentation way to obtain a group of compressed segments corresponding to each object; and the storage module is used for storing the compressed fragments corresponding to each object into the corresponding partitions.
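As a rough illustration of the modulo-based distribution, the sketch below assigns each object to a partition by taking its identifier modulo a preset value. The function name `assign_partitions` and the preset value of 3 are assumptions made for the example.

```python
from collections import defaultdict


def assign_partitions(object_ids, num_partitions):
    """Group object IDs by (object ID modulo the preset partition count)."""
    partitions = defaultdict(list)
    for object_id in object_ids:
        partitions[object_id % num_partitions].append(object_id)
    return dict(partitions)


partitions = assign_partitions([1, 2, 3, 4, 5, 6, 7], num_partitions=3)
```

The compressed fragments of an object would then be stored in the partition that the object's identifier maps to, so any node can locate them from the identifier alone.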

According to yet another aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the method of determining the common object as above.

According to still another aspect of the embodiment of the present application, there is further provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the above-mentioned method for determining a common object by using the computer program.

In the embodiment of the application, a neighbor object identification set of each object is acquired in an object relationship graph, wherein the neighbor object identification set of each object comprises identifiers of neighbor objects having a connection relationship with that object; fragment compression is performed on the neighbor object identification set of each object to obtain and store a group of compressed fragments corresponding to each object, wherein each compressed fragment comprises an array of a first data type, and each member of the array of the first data type is used for recording the identifier of one neighbor object; the stored group of compressed fragments corresponding to each object is decompressed to obtain a group of decompressed fragments corresponding to each object, wherein each decompressed fragment comprises an array of a second data type, each member of the array of the second data type is used for recording the identifier of one neighbor object, and the number of bits occupied by the second data type is greater than the number of bits occupied by the first data type; the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object are sorted and spliced to obtain a neighbor object identification sequence corresponding to each object; and a common object corresponding to two objects having a connection relationship in the object relationship graph is determined according to the neighbor object identification sequence corresponding to each object. Adding a compression step to the process of determining the common object reduces the dependence on storage resources; in subsequent operations on the data, because the data is compressed, less traffic is generated when the data is pulled, which saves communication cost during the calculation. This achieves the technical effect of improving the efficiency of determining the common object and solves the technical problem that common objects are determined with low efficiency.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:

FIG. 1 is a schematic illustration of an application environment of an alternative method of determination of common objects according to an embodiment of the present application;

FIG. 2 is a schematic illustration of a flow of an alternative method of determining a common object according to an embodiment of the application;

FIG. 3 is a schematic diagram of an alternative method of determining a common object according to an embodiment of the present application;

FIG. 4 is a schematic diagram of another alternative method of determining a common object according to an embodiment of the present application;

FIG. 5 is a schematic diagram of another alternative method of determining a common object according to an embodiment of the present application;

FIG. 6 is a schematic diagram of another alternative method of determining a common object according to an embodiment of the present application;

FIG. 7 is a schematic diagram of another alternative method of determining a common object according to an embodiment of the present application;

FIG. 8 is a schematic diagram of another alternative method of determining a common object according to an embodiment of the present application;

FIG. 9 is a schematic diagram of another alternative method of determining a common object according to an embodiment of the present application;

FIG. 10 is a schematic diagram of another alternative method of determining a common object according to an embodiment of the present application;

FIG. 11 is a schematic diagram of another alternative method of determining a common object according to an embodiment of the present application;

FIG. 12 is a schematic diagram of an alternative common object determination apparatus according to an embodiment of the present application;

FIG. 13 is a schematic structural view of an alternative electronic device according to an embodiment of the present application.

Detailed Description

In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

According to an aspect of the embodiments of the present application, a method for determining a common object is provided. Optionally, as an alternative implementation, the method may be applied, but is not limited to being applied, in the environment shown in FIG. 1. The environment may include, but is not limited to, a user device 102, a network 110, and a server 112, where the user device 102 may include, but is not limited to, a display 108, a processor 106, and a memory 104.

The specific process comprises the following steps:

Step S102, the user device 102 obtains a common object determination request, wherein the common object determination request is used for requesting determination of a common object in the object relationship graph 1022;

Steps S104-S106, the user device 102 sends the common object determination request to the server 112 through the network 110;

Step S108, the server 112 looks up, through the database 114, the object relationship graph 1022 corresponding to the common object determination request; acquires, through the processing engine 116, the neighbor object identification set of each object in the object relationship graph; performs fragment compression on the neighbor object identification set of each object to obtain and store a group of compressed fragments corresponding to each object; decompresses the stored group of compressed fragments corresponding to each object to obtain a group of decompressed fragments corresponding to each object; sorts and splices the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object to obtain a neighbor object identification sequence corresponding to each object; and determines, according to the neighbor object identification sequence corresponding to each object, determination information of the common object corresponding to two objects having a connection relationship in the object relationship graph;

Steps S110-S112, the server 112 sends the determination information of the common object to the user device 102 via the network 110, and the processor 106 in the user device 102 displays the determination information on the display 108 and stores the determination information in the memory 104.

In addition to the example shown in fig. 1, the above steps may be performed independently by the user device 102, i.e., the user device 102 performs the steps of obtaining the object relationship diagram 1022, compressing the neighbor object identification set, and so on, thereby reducing the processing pressure of the server. The user device 102 includes, but is not limited to, a handheld device (e.g., a mobile phone), a notebook computer, a desktop computer, a vehicle-mounted device, etc., and the application is not limited to a particular implementation of the user device 102.

Optionally, as an optional implementation manner, as shown in fig. 2, the method for determining the common object includes:

S202, acquiring a neighbor object identification set of each object in an object relationship graph, wherein the neighbor object identification set of each object comprises identifiers of neighbor objects having a connection relationship with that object;

S204, performing fragment compression on the neighbor object identification set of each object to obtain and store a group of compressed fragments corresponding to each object, wherein each compressed fragment comprises an array of a first data type, and each member of the array of the first data type is used for recording the identifier of one neighbor object;

S206, decompressing the stored group of compressed fragments corresponding to each object to obtain a group of decompressed fragments corresponding to each object, wherein each decompressed fragment comprises an array of a second data type, each member of the array of the second data type is used for recording the identifier of one neighbor object, and the number of bits occupied by the second data type is greater than the number of bits occupied by the first data type;

S208, sorting and splicing the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object to obtain a neighbor object identification sequence corresponding to each object;

S210, determining, according to the neighbor object identification sequence corresponding to each object, a common object corresponding to two objects having a connection relationship in the object relationship graph.

Optionally, in this embodiment, the method for determining the common object may be applied, but is not limited to being applied, to association degree calculation and recommendation in a social network. For example, the more common friends two nodes have, the higher the degree of association between them, or the more their social circles overlap, which may be used as a basis for social or product recommendation.

Optionally, in this embodiment, the above method for determining the common object may be applied to large-scale high-performance computing scenarios, for example a large-scale graph structure (object relationship graph) with, e.g., a trillion edges. In such scenarios the related art usually takes an unacceptably long time and occupies a large amount of resources, whereas this embodiment can complete high-performance computing while occupying fewer resources.

As a further illustration, a neighbor set (neighbor object identification set) of each node (object) in the object relationship graph is determined, and then the intersection of at least two sets is taken to obtain the common objects of some or all of the nodes. When the scale of the graph structure of the object relationship graph does not reach a target threshold, the whole object relationship graph can be stored on a single machine, and the neighbor set of each node is obtained there. When the scale reaches the target threshold, the graph cannot be stored on a single machine, so distributed storage and computation are adopted; for example, the method for determining the common object is executed based on a Parameter Server (PS for short). Adding a compression step to the process of determining the common object reduces the dependence on storage resources. In subsequent operations on the data, because the data is compressed, less traffic is generated when the data is pulled, which saves communication cost during the calculation. In addition, because the PS architecture is fully in-memory, the calculation involves no disk reads or writes, which improves data processing efficiency. The PS may be used, without limitation, to store and update ultra-large-scale parameters in a distributed manner in the machine learning field.

Alternatively, in the present embodiment, in the graph structure data (object relationship graph), a neighbor (an object having an association relationship) of a certain node (object) is generally referred to as a neighbor object of the node, and a common object may be understood as, but is not limited to, a common neighbor of two nodes. As a further example, if the neighbor set of node 5 includes 1, 2, 6, 7 and the neighbor set of node 7 includes 3, 4, 5, 6, then intersecting the two neighbor sets shows that nodes 5 and 7 have a common neighbor 6, optionally as shown in graph structure 302 of FIG. 3.
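The example of nodes 5 and 7 can be reproduced with a small sketch that builds neighbor sets from an edge list. The edge list below is hypothetical: it was chosen only so that the neighbor sets of nodes 5 and 7 match the ones given above, and it is not taken from FIG. 3.

```python
from collections import defaultdict

# Hypothetical undirected edges consistent with the neighbor sets in the text.
edges = [(5, 1), (5, 2), (5, 6), (5, 7), (7, 3), (7, 4), (7, 6)]

# Build the neighbor object identification set of every node.
neighbors = defaultdict(set)
for source, target in edges:
    neighbors[source].add(target)
    neighbors[target].add(source)

# The common neighbors of two connected nodes are the intersection of their sets.
common_neighbors = neighbors[5] & neighbors[7]
```

Here plain Python sets stand in for the compressed, sorted sequences used by the method; the intersection step is the same in spirit.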

Optionally, in this embodiment, the above method for determining the common object may be implemented based on a large-scale distributed framework. The large-scale distributed framework may be understood as a high-performance distributed computing platform that combines a parameter server function with a large-scale data processing capability and supports machine learning, deep learning, and various graph algorithms. The parameter server function may be understood as a function implemented by a high-performance distributed machine learning platform developed on the basis of the Parameter Server concept, and the large-scale data processing capability may be understood as a function implemented by a fast, general-purpose computing engine designed specifically for large-scale data processing. Specifically, as shown in FIG. 4, the first module 402 may be understood as a PS or Angel Parameter Server, and the second module 404 may be understood as a Spark Driver, which may serve as a module for applying to the cluster for resources, invoking master information, and handling and analyzing tasks. The steps are as follows:

Step S402, each computing node (the number of computing nodes in the figure is only an example; the step may be understood as being executed by each of the plurality of computing nodes) pushes neighbor nodes to the first module 402 (such as the PS), so that the first module 402 generates an initial adjacency list (a neighbor object identification set);

Step S404, the first module 402 sorts and compresses the initial adjacency list to obtain a compressed ordered adjacency list (a neighbor object identification sequence);

Step S406, the computing node pulls the compressed ordered adjacency list from the first module 402;

Step S408, the computing node decompresses the pulled compressed ordered adjacency list and computes the common objects.
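Steps S402 to S408 can be illustrated with a single-process sketch. This is not the Angel or Spark API: `ToyParameterServer`, `compress_ids`, and `decompress_ids` are hypothetical names, and the codec (8-byte packing followed by zlib) is only a stand-in assumption, since the text does not name a specific compression algorithm.

```python
import struct
import zlib
from collections import defaultdict


def compress_ids(ids):
    """Stand-in codec (assumption): pack IDs as 8-byte integers, then deflate."""
    return zlib.compress(struct.pack(f"<{len(ids)}q", *ids))


def decompress_ids(blob):
    """Inverse of compress_ids: inflate, then unpack the 8-byte integers."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}q", raw))


class ToyParameterServer:
    """In-memory stand-in for the first module 402 (hypothetical)."""

    def __init__(self):
        self._initial = defaultdict(list)  # initial adjacency list
        self._compressed = {}              # compressed ordered adjacency list

    def push_neighbors(self, node, neighbor_ids):
        # S402: a computing node pushes neighbors of `node`.
        self._initial[node].extend(neighbor_ids)

    def sort_and_compress(self):
        # S404: sort each adjacency list, then compress it.
        for node, ids in self._initial.items():
            self._compressed[node] = compress_ids(sorted(ids))
        self._initial.clear()

    def pull(self, node):
        # S406: a computing node pulls the compressed ordered list.
        return self._compressed[node]


ps = ToyParameterServer()
ps.push_neighbors(5, [7, 2, 6, 1])
ps.push_neighbors(7, [6, 3, 5, 4])
ps.sort_and_compress()

# S408: decompress on the computing node and compute the common objects.
seq5 = decompress_ids(ps.pull(5))
seq7 = decompress_ids(ps.pull(7))
common = sorted(set(seq5) & set(seq7))
```

In this toy the server and the computing node share one process; in a real deployment only the compressed bytes would cross the network between them.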

It should be noted that, in this embodiment, the amount of data handled by each of the two operations, pushing neighbors to the PS and pulling adjacency lists from the PS, may be adjusted according to the memory that is actually available. Specifically, the number of nodes in each graph partition (Graph Partition) is usually relatively large; if all of the data were pushed to or pulled from the PS at once, the excessive traffic would lower efficiency and could easily exhaust memory. In practice, the data in each partition is typically subdivided into multiple batches that are processed one at a time, and the batch size can be estimated from the memory applied for, for example 1024. The batch size can be reduced appropriately when computing resources are limited, thereby ensuring that the task runs stably. In addition, sorting and compressing the initial adjacency list requires the PS to have a certain computing capability; a common PS usually supports only storage and access, whereas the Angel PS allows computing functions to be customized, which is a key factor in enabling this embodiment to be executed successfully.
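The batching of a partition described above can be sketched as follows. The helper `iter_batches` and the partition of 2500 node IDs are hypothetical; only the example batch size of 1024 comes from the text.

```python
def iter_batches(node_ids, batch_size=1024):
    """Yield consecutive slices of node_ids holding at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(node_ids), batch_size):
        yield node_ids[start:start + batch_size]


# Hypothetical partition of 2500 node IDs, pushed or pulled one batch at a time.
partition = list(range(2500))
batch_sizes = [len(batch) for batch in iter_batches(partition, 1024)]
print(batch_sizes)  # prints [1024, 1024, 452]
```

Lowering `batch_size` trades more round trips for a smaller peak memory footprint per push or pull.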

Optionally, in this embodiment, fragment compression may be, but is not limited to being, used to compress a data type that occupies more bytes into a data type that occupies fewer bytes for storage, and fragment decompression may be, but is not limited to being, used to restore the data type that occupies fewer bytes to the original data type that occupies more bytes so that the relevant operations can be performed. The relevant operations may include, but are not limited to, at least one of the following: sorting and splicing the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object; and determining, according to the neighbor object identification sequence corresponding to each object, the common object corresponding to two objects having a connection relationship in the object relationship graph.

In other words, in this embodiment, the compressed data is decompressed for use before the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object are sorted and spliced into the neighbor object identification sequence corresponding to each object. After the neighbor object identification sequence corresponding to each object is obtained, it may be compressed again by the above fragment compression in order to save total storage space. Further, the recompressed neighbor object identification sequence may be decompressed again by the above fragment decompression, so that the common object corresponding to two objects having a connection relationship in the object relationship graph is determined from the decompressed neighbor object identification sequence corresponding to each object. Similarly, after the common object corresponding to the two objects having a connection relationship in the object relationship graph is determined, the resulting common-object data may be, but is not limited to being, compressed again by the above fragment compression. Because compressed data is pulled, the data volume is smaller than that of uncompressed data, which further reduces the communication pressure between nodes.

Optionally, in this embodiment, the data of the second data type may be, but is not limited to, data that occupies more bytes of storage space, such as an array of the long integer (Long) type, where one Long value typically occupies 8 bytes of storage space; the data of the first data type may be, but is not limited to, data that occupies fewer bytes of storage space, such as an array of the byte (Byte) type, which can reduce the storage space to about half or less of that of the original Long-type array. This assignment is consistent with the rest of this embodiment, in which compressed fragments hold arrays of the first data type and the second data type occupies more bits than the first data type.

Optionally, in this embodiment, sorting and splicing the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object may, but is not limited to, provide traversable data for the subsequent operation of determining the common object; for example, the ordered neighbor object identification sequence may be, but is not limited to being, scanned in order so as to determine the common object corresponding to two objects having a connection relationship in the object relationship graph.

It should be noted that, a neighbor object identifier set of each object is obtained in the object relationship graph, where the neighbor object identifier set of each object includes identifiers of neighbor objects having a connection relationship with each object; performing fragment compression on the neighbor object identification set of each object to obtain and store a group of compressed fragments corresponding to each object, wherein each compressed fragment comprises an array of first data types, and each group member in the array of first data types is used for recording the identification of one neighbor object; decompressing a group of compressed fragments corresponding to each stored object to obtain a group of decompressed fragments corresponding to each object, wherein each decompressed fragment comprises an array of second data types, each member of the array of second data types is used for recording the identification of one neighbor object, and the number of bits occupied by the second data types is larger than the number of bits occupied by the first data types; sequencing and splicing the identifiers of the neighbor objects recorded in a group of decompressed fragments corresponding to each object to obtain a neighbor object identifier sequence corresponding to each object; according to the neighbor object identification sequence corresponding to each object, determining a common object corresponding to two objects with a connection relationship in the object relationship graph, and reducing the dependence on storage resources by adding a compression step in the determination process of the common object; in the process of carrying out subsequent operation on the data, the communication traffic is reduced when the data is pulled because the data is already compressed, so that the communication overhead in the calculation process is saved.

As a further example, continuing with the scenario shown in FIG. 3 and as shown in FIG. 5: a neighbor object identification set of each object (the neighbor object identification set of each object in neighbor object identification set 502) is obtained in the object relationship graph (graph structure 302), wherein the neighbor object identification set 502 of each object includes identifiers of the neighbor objects that have a connection relationship with that object; fragment compression is performed on the neighbor object identification set 502 of each object to obtain and store a group of compressed fragments corresponding to each object (the group of fragments corresponding to each object in fragment set 504), wherein each compressed fragment includes an array of the first data type and each member of that array records the identifier of one neighbor object; the stored group of compressed fragments corresponding to each object is decompressed to obtain a group of decompressed fragments corresponding to each object (which may also be understood as the neighbor object identification set of each object in neighbor object identification set 506), wherein each decompressed fragment includes an array of the second data type, each member of that array records the identifier of one neighbor object, and the second data type occupies more bits than the first data type; the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object are sorted and spliced to obtain the neighbor object identification sequence corresponding to each object (the neighbor object identification sequence corresponding to each object in neighbor object identification sequence set 508); and, according to the neighbor object identification sequence corresponding to each object, the common object corresponding to two objects having a connection relationship in the object relationship graph (the common object corresponding to two objects having a connection relationship in common object set 510) is determined.

According to the embodiment provided by the application, the neighbor object identification set of each object is obtained in the object relationship graph, wherein the neighbor object identification set of each object includes identifiers of the neighbor objects that have a connection relationship with that object; fragment compression is performed on the neighbor object identification set of each object to obtain and store a group of compressed fragments corresponding to each object, wherein each compressed fragment includes an array of the first data type and each member of that array records the identifier of one neighbor object; the stored group of compressed fragments corresponding to each object is decompressed to obtain a group of decompressed fragments corresponding to each object, wherein each decompressed fragment includes an array of the second data type, each member of that array records the identifier of one neighbor object, and the second data type occupies more bits than the first data type; the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object are sorted and spliced to obtain the neighbor object identification sequence corresponding to each object; and the common object corresponding to two objects having a connection relationship in the object relationship graph is determined according to the neighbor object identification sequence corresponding to each object. Adding a compression step to the process of determining the common object reduces the dependence on storage resources, and because the data is already compressed when it is pulled for subsequent operations, the communication traffic is reduced and the communication overhead of the calculation is saved, thereby achieving the technical effect of improving the efficiency of determining the common object.

As an optional solution, performing fragment compression on the neighbor object identifier set of each object to obtain and store a group of compressed fragments corresponding to each object, including:

the following steps are performed on the neighbor object identification set of each object, wherein each object is the current object when the following steps are performed:

and writing the neighbor object identifiers in the neighbor object identifier set of the current object into the target memory in batches, then compressing the neighbor object identifiers recorded in the target memory into a compressed fragment, and storing the compressed fragment, wherein the maximum value of the number of the neighbor object identifiers written into the target memory in each batch is N, N is the array size of the array of the second data type, and N is a positive integer greater than or equal to 1.

Optionally, in this embodiment, the fragment compression may be, but is not limited to, dynamic compression, where dynamic compression may be, but is not limited to, compressing the neighbor object identification set of any object as soon as it is acquired while asynchronously acquiring the neighbor object identification sets of other objects during the compression. In addition, when the neighbor object identifiers in the neighbor object identification set of the current object are compressed, the array size of each compression pass may be limited; this size may be determined according to the available memory and may be reduced appropriately when computing resources are limited, so as to ensure that the task runs stably. The array size of each compression pass is the upper limit on the number of neighbor object identifiers written into the target memory in each batch.

As an alternative, writing the neighbor object identifiers in the neighbor object identifier set of the current object into the target memory in batches, and then compressing the neighbor object identifiers recorded in the target memory into a compressed fragment, including:

S1, repeatedly executing the following steps until the neighbor object identification set of the current object has been written into a target memory, wherein the target memory is a pre-allocated initial memory of N×M bytes, and M is the number of bytes occupied by the second data type:

S2, determining whether the number of neighbor object identifiers in the neighbor object identification set of the current object that have not been written into the target memory is greater than or equal to N;

S3, in the case that the number of neighbor object identifiers that have not been written into the target memory is greater than or equal to N, writing N of those neighbor object identifiers into the target memory, compressing the N neighbor object identifiers in the target memory into a compressed fragment, and emptying the target memory;

S4, in the case that the number of neighbor object identifiers that have not been written into the target memory is smaller than N, writing the remaining neighbor object identifiers into the target memory, compressing the neighbor object identifiers in the target memory into a compressed fragment, and emptying the target memory.

Optionally, in this embodiment, it is assumed that the neighbor object identifiers in the original neighbor object identification set are held in an array of the long integer (Long) data type, and that these identifiers are then stored in the form of multiple compressed fragments. Specifically, a batch size parameter is set to represent the maximum number of neighbors that each fragment can hold. For example, when the batch size is 3 (N is 3), an initial memory of 3×8 bytes (N×M bytes) is first allocated and neighbors are written into it in sequence; when the number of written neighbors reaches 3, the fragment is compressed into an array of the Byte data type, and the originally allocated initial memory is then emptied. These steps are repeated until all neighbors have been processed, with any final batch of fewer than 3 neighbors also compressed into a fragment. At that point multiple compressed neighbor fragments are stored, which can reduce the space to about half of that of the original Long-type array.

It should be noted that the following steps are repeatedly executed on the neighbor object identification set of the current object until that set has been written into the target memory, wherein the target memory is a pre-allocated initial memory of N×M bytes, and M is the number of bytes occupied by the second data type: determining whether the number of neighbor object identifiers in the neighbor object identification set of the current object that have not been written into the target memory is greater than or equal to N; in the case that this number is greater than or equal to N, writing N of those neighbor object identifiers into the target memory, compressing the N neighbor object identifiers in the target memory into a compressed fragment, and emptying the target memory; and in the case that this number is smaller than N, writing the remaining neighbor object identifiers into the target memory, compressing the neighbor object identifiers in the target memory into a compressed fragment, and emptying the target memory.

As a further illustration, optionally as shown in FIG. 6, assume that N is 3, and determine, in the neighbor object identification set 602 of the current object, whether the number of neighbor object identifiers not yet written into the target memory is greater than or equal to N. As shown in FIG. 6 (a), when the neighbor object identifiers not yet written into the target memory are 1, 2, 3, 4, 5, 6, 7, 8, their number is greater than or equal to 3, so 3 of them (such as 1, 2, 3) are written into the target memory 604, the 3 neighbor object identifiers in the target memory 604 are compressed into a compressed fragment 606 (fragment 1), and the target memory 604 is emptied. Further, as shown in FIG. 6 (b), when the neighbor object identifiers not yet written into the target memory are 4, 5, 6, 7, 8, their number is greater than or equal to 3, so 3 of them (such as 4, 5, 6) are written into the target memory 604, the 3 neighbor object identifiers in the target memory 604 are compressed into a compressed fragment 608 (fragment 2), and the target memory 604 is emptied. In addition, as shown in FIG. 6 (c), when the neighbor object identifiers not yet written into the target memory are 7, 8, their number is less than 3, so the remaining identifiers (7, 8) are written into the target memory 604, the neighbor object identifiers in the target memory 604 are compressed into a compressed fragment 610 (fragment 3), and the target memory 604 is emptied.
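The batch-wise fragment compression walked through above can be sketched as follows. The text does not specify how Long values are encoded into Byte arrays, so the variable-length (LEB128-style) encoding used here is an assumption chosen for illustration, and `encode_varints` and `compress_in_batches` are hypothetical names.

```python
from typing import Iterable, List


def encode_varints(ids: Iterable[int]) -> bytes:
    """Encode non-negative integers as variable-length bytes (assumed codec)."""
    out = bytearray()
    for value in ids:
        if value < 0:
            raise ValueError("identifiers must be non-negative")
        while True:
            low7 = value & 0x7F
            value >>= 7
            if value:
                out.append(low7 | 0x80)  # more bytes follow for this value
            else:
                out.append(low7)         # last byte of this value
                break
    return bytes(out)


def compress_in_batches(neighbor_ids: Iterable[int], batch_size: int) -> List[bytes]:
    """Write IDs into a bounded buffer; compress and empty it when it fills."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    fragments: List[bytes] = []
    buffer: List[int] = []  # plays the role of the target memory (at most N IDs)
    for neighbor_id in neighbor_ids:
        buffer.append(neighbor_id)
        if len(buffer) == batch_size:     # N IDs written: compress one fragment
            fragments.append(encode_varints(buffer))
            buffer.clear()                # empty the target memory
    if buffer:                            # fewer than N IDs remain
        fragments.append(encode_varints(buffer))
    return fragments


# The FIG. 6 walk-through: identifiers 1..8 with N = 3 give three fragments.
fragments = compress_in_batches([1, 2, 3, 4, 5, 6, 7, 8], 3)
print(len(fragments))  # prints 3
```

With this assumed codec, small identifiers take one byte each instead of the 8 bytes of a Long; the saving for large identifiers is smaller.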

Through the embodiment provided by the application, the following steps are repeatedly executed on the neighbor object identification set of the current object until that set has been written into the target memory, wherein the target memory is a pre-allocated initial memory of N×M bytes, and M is the number of bytes occupied by the second data type: determining whether the number of neighbor object identifiers in the neighbor object identification set of the current object that have not been written into the target memory is greater than or equal to N; in the case that this number is greater than or equal to N, writing N of those neighbor object identifiers into the target memory, compressing the N neighbor object identifiers in the target memory into a compressed fragment, and emptying the target memory; and in the case that this number is smaller than N, writing the remaining neighbor object identifiers into the target memory, compressing the neighbor object identifiers in the target memory into a compressed fragment, and emptying the target memory, thereby achieving the effect of saving storage space.

As an alternative, decompressing a set of compressed fragments corresponding to each stored object to obtain a set of decompressed fragments corresponding to each object, and sorting and splicing the identifiers of the neighbor objects recorded in the set of decompressed fragments corresponding to each object to obtain a neighbor object identifier sequence corresponding to each object, including:

Executing the following steps on a group of compressed fragments corresponding to each object, wherein each object is a current object when executing the following steps:

S1, decompressing each of the P compressed fragments corresponding to the current object to obtain P decompressed fragments, wherein P is a positive integer greater than or equal to 1;

S2, sorting and splicing the identifiers of the neighbor objects recorded in the P decompressed fragments to obtain the neighbor object identification sequence corresponding to the current object, wherein the neighbor object identifiers in the neighbor object identification sequence corresponding to the current object are identical to the neighbor object identifiers in the neighbor object identification set of the current object.
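Steps S1 and S2 above can be sketched as follows, assuming that each compressed fragment is a byte string holding variable-length (LEB128-style) integers. That codec is an illustrative assumption, not something the text specifies, and `decode_varints` and `build_neighbor_sequence` are hypothetical names.

```python
from typing import List


def decode_varints(blob: bytes) -> List[int]:
    """Decode variable-length bytes back into integers (assumed codec)."""
    ids: List[int] = []
    value, shift = 0, 0
    for byte in blob:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:       # continuation bit set: more bytes for this value
            shift += 7
        else:                 # last byte of this value
            ids.append(value)
            value, shift = 0, 0
    return ids


def build_neighbor_sequence(fragments: List[bytes]) -> List[int]:
    """Decompress every fragment, splice the results, and sort them."""
    spliced: List[int] = []
    for fragment in fragments:
        spliced.extend(decode_varints(fragment))  # S1: decompress each fragment
    spliced.sort()                                # S2: order the spliced IDs
    return spliced


# Three hypothetical fragments holding [7, 2, 6], [1], and [300].
fragments = [bytes([7, 2, 6]), bytes([1]), b"\xac\x02"]
sequence = build_neighbor_sequence(fragments)
print(sequence)  # prints [1, 2, 6, 7, 300]
```

The resulting sequence contains exactly the identifiers that were in the fragments, only reordered, which matches the requirement that the sequence and the original set hold the same identifiers.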

As an alternative solution, after sorting and splicing the identifiers of the neighbor objects recorded in a group of decompressed fragments corresponding to each object to obtain a neighbor object identifier sequence corresponding to each object, the method further includes: respectively compressing the neighbor object identification sequences corresponding to each object to obtain and store a target array of the first data type corresponding to each object;

as an alternative, determining a common object corresponding to two objects having a connection relationship in the object relationship graph according to a neighbor object identifier sequence corresponding to each object, including: respectively decompressing the stored target arrays of the first data type corresponding to each object to obtain a neighbor object identification sequence corresponding to each object; and determining a common object corresponding to the two objects with the connection relationship in the object relationship graph according to the neighbor object identification sequence corresponding to each object.

Optionally, in this embodiment, because what is stored are the fragments of the compressed neighbor set while the calculation needs to pull the complete neighbor set, the method may, but is not limited to, first decompress each compressed fragment into an array of the second data type, splice the arrays together, sort the result, and finally recompress the spliced and sorted neighbor set into an array of the first data type, where the recompression is performed to save total storage space. In addition, when the neighbor set of a node is pulled, it must be decompressed before it is used in the calculation. Because compressed data is pulled, the data volume can be about half that of the original uncompressed data, which greatly reduces the communication pressure between nodes.

It should be noted that the neighbor object identification sequence corresponding to each object is compressed to obtain and store the target array of the first data type corresponding to each object; the stored target array of the first data type corresponding to each object is decompressed to obtain the neighbor object identification sequence corresponding to each object; and the common object corresponding to two objects having a connection relationship in the object relationship graph is determined according to the neighbor object identification sequence corresponding to each object.

As a further example, continuing with the scenario shown in FIG. 5 and as shown in FIG. 7: a neighbor object identification set of each object (the neighbor object identification set of each object in neighbor object identification set 502) is obtained in the object relationship graph (graph structure 302), wherein the neighbor object identification set 502 of each object includes identifiers of the neighbor objects that have a connection relationship with that object; fragment compression is performed on the neighbor object identification set 502 of each object to obtain and store a group of compressed fragments corresponding to each object (the group of fragments corresponding to each object in fragment set 504), wherein each compressed fragment includes an array of the first data type and each member of that array records the identifier of one neighbor object; the stored group of compressed fragments corresponding to each object is decompressed to obtain a group of decompressed fragments corresponding to each object (which may also be understood as the neighbor object identification set of each object in neighbor object identification set 506), wherein each decompressed fragment includes an array of the second data type, each member of that array records the identifier of one neighbor object, and the second data type occupies more bits than the first data type; the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object are sorted and spliced to obtain the neighbor object identification sequence corresponding to each object (the neighbor object identification sequence corresponding to each object in neighbor object identification sequence set 508); the neighbor object identification sequence corresponding to each object is compressed to obtain and store the target array of the first data type corresponding to each object (the target array of the first data type corresponding to each object in neighbor object identification sequence set 702); the stored target array of the first data type corresponding to each object is decompressed to obtain the neighbor object identification sequence corresponding to each object (the neighbor object identification sequence corresponding to each object in neighbor object identification sequence set 704); and, according to the neighbor object identification sequence corresponding to each object, the common object corresponding to two objects having a connection relationship in the object relationship graph (the common object corresponding to two objects having a connection relationship in common object set 706) is determined.

By the embodiment of the application, the neighbor object identification sequences corresponding to each object are respectively compressed to obtain and store the target array of the first data type corresponding to each object; respectively decompressing the stored target arrays of the first data type corresponding to each object to obtain a neighbor object identification sequence corresponding to each object; according to the neighbor object identification sequence corresponding to each object, the common object corresponding to the two objects with the connection relation in the object relation diagram is determined, and the effect of reducing the communication pressure during data processing is achieved.

As an alternative, determining a common object corresponding to two objects having a connection relationship in the object relationship graph according to a neighbor object identifier sequence corresponding to each object, including:

S1, in the case that the neighbor object identification sequences corresponding to the objects include the neighbor object identification sequence of a first object and the neighbor object identification sequence of a second object, searching for neighbor object identifiers that are included in both the neighbor object identification sequence of the first object and the neighbor object identification sequence of the second object;

S2, in the case that a neighbor object identifier included in both the neighbor object identification sequence of the first object and the neighbor object identification sequence of the second object is found, determining the object represented by the found neighbor object identifier as a common object corresponding to the first object and the second object.

Optionally, in this embodiment, the common object corresponding to the first object and the second object may be determined by, but is not limited to, finding the intersection of the neighbor object identification sequence of the first object and the neighbor object identification sequence of the second object.
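Because the neighbor object identification sequences are ordered, one possible way to find their intersection is a single linear pass over both sequences. The sketch below assumes that both inputs are sorted in ascending order; `intersect_sorted` is a hypothetical helper and is not necessarily the exact procedure of this embodiment.

```python
from typing import List, Sequence


def intersect_sorted(first: Sequence[int], second: Sequence[int]) -> List[int]:
    """Return the identifiers present in both ascending sequences."""
    i = j = 0
    common: List[int] = []
    while i < len(first) and j < len(second):
        if first[i] == second[j]:
            common.append(first[i])  # identifier found in both sequences
            i += 1
            j += 1
        elif first[i] < second[j]:
            i += 1                   # advance the position with the smaller ID
        else:
            j += 1
    return common


# Sequences of nodes 5 and 7 from the earlier example.
print(intersect_sorted([1, 2, 6, 7], [3, 4, 5, 6]))  # prints [6]
```

Each position only moves forward, so the pass costs time proportional to the combined length of the two sequences and needs no auxiliary hash set.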

As an alternative, acquiring a neighbor object identifier set of each object in the object relationship graph includes: loading an object relation graph from a distributed storage system, and distributing each edge structure in the object relation graph to each computing node of the distributed storage system for management, wherein the edge structure is used for representing at least two neighbor objects with a connection relation, and the at least two neighbor objects comprise at least one group of source objects and target objects;

as an alternative, determining a common object corresponding to two objects having a connection relationship in the object relationship graph according to a neighbor object identifier sequence corresponding to each object, including: each computing node is called to traverse the edge structure managed by each computing node, and each group of source objects and target objects corresponding to the edge structure managed by each computing node are obtained; and determining common objects corresponding to each group of source objects and target objects according to the neighbor object identification sequences corresponding to each object.

Optionally, in this embodiment, the distributed processing manner may process the object data asynchronously and/or in parallel, so as to improve the processing efficiency of the object data.

It should be noted that, loading an object relationship graph from a distributed storage system, and distributing each edge structure in the object relationship graph to each computing node of the distributed storage system for management, where the edge structure is used to represent at least two neighbor objects with a connection relationship, and the at least two neighbor objects include at least one group of source objects and target objects; each computing node is called to traverse the edge structure managed by each computing node, and each group of source objects and target objects corresponding to the edge structure managed by each computing node are obtained; and determining common objects corresponding to each group of source objects and target objects according to the neighbor object identification sequences corresponding to each object.

As a further example, optionally based on the scenario shown in FIG. 5 and as shown in FIG. 8: the object relationship graph (graph structure 302) is loaded from the distributed storage system, and each edge structure 802 in the object relationship graph is distributed to a computing node of the distributed storage system (such as computing node 1, computing node 2, or computing node 3) for management; a neighbor object identification set of each object (the neighbor object identification set of each object in neighbor object identification set 502) is acquired according to the edge structures 802, wherein the neighbor object identification set 502 of each object includes identifiers of the neighbor objects that have a connection relationship with that object; fragment compression is performed on the neighbor object identification set 502 of each object to obtain and store a group of compressed fragments corresponding to each object (the group of fragments corresponding to each object in fragment set 504), wherein each compressed fragment includes an array of the first data type and each member of that array records the identifier of one neighbor object; the stored group of compressed fragments corresponding to each object is decompressed to obtain a group of decompressed fragments corresponding to each object (which may also be understood as the neighbor object identification set of each object in neighbor object identification set 506), wherein each decompressed fragment includes an array of the second data type, each member of that array records the identifier of one neighbor object, and the second data type occupies more bits than the first data type; the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object are sorted and spliced to obtain the neighbor object identification sequence corresponding to each object (the neighbor object identification sequence corresponding to each object in neighbor object identification sequence set 508); each computing node is called to traverse the edge structures 802 that it manages, so as to obtain each group of source object and target object corresponding to those edge structures 802; and the common object corresponding to each group of source object and target object (the common object corresponding to two objects having a connection relationship in common object set 806) is determined according to the neighbor object identification sequence corresponding to each object.

According to the embodiment provided by the application, an object relation graph is loaded from a distributed storage system, each edge structure in the object relation graph is distributed to each computing node of the distributed storage system for management, wherein the edge structure is used for representing at least two neighbor objects with a connection relation, and the at least two neighbor objects comprise at least one group of source objects and target objects; each computing node is called to traverse the edge structure managed by each computing node, and each group of source objects and target objects corresponding to the edge structure managed by each computing node are obtained; and determining common objects corresponding to each group of source objects and target objects according to the neighbor object identification sequences corresponding to each object, thereby realizing the effect of improving the processing efficiency of the object data.

S1, acquiring a group of neighbor object identification sequences to be confirmed from the neighbor object identification sequences corresponding to each object, wherein the group of neighbor object identification sequences to be confirmed comprises a first neighbor object identification sequence and a second neighbor object identification sequence;

S2, creating a first pointer for the first neighbor object identification sequence and a second pointer for the second neighbor object identification sequence, wherein the first pointer is used for indicating the sequence starting position of the first neighbor object identification sequence, and the second pointer is used for indicating the sequence starting position of the second neighbor object identification sequence;

S3, in the case that the identification at the position indicated by the first pointer is smaller than the identification at the position indicated by the second pointer, moving the first pointer by at least one sequence unit in the target direction within the first neighbor object identification sequence;

S4, in the case that the identification at the position indicated by the first pointer is larger than the identification at the position indicated by the second pointer, moving the second pointer by at least one sequence unit in the target direction within the second neighbor object identification sequence;

S5, in the case that the identification at the position indicated by the first pointer is equal to the identification at the position indicated by the second pointer, adding the equal identification to the intersection set, moving the second pointer by at least one sequence unit in the target direction within the second neighbor object identification sequence, and moving the first pointer by at least one sequence unit in the target direction within the first neighbor object identification sequence;

S6, in the case that the first pointer has traversed all positions in the first neighbor object identification sequence and the second pointer has traversed all positions in the second neighbor object identification sequence, determining the objects corresponding to the identifications in the intersection set as the common objects corresponding to the two objects having a connection relationship in the object relationship graph.
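Steps S1 to S6 can be sketched as follows. This is a minimal illustration, assuming both sequences are sorted in ascending order, the target direction is toward the end of the sequence, and each move is exactly one sequence unit; the function name `intersect_sorted` is hypothetical. The loop stops as soon as either pointer reaches the end of its sequence, which yields the same intersection as traversing both sequences completely, because no further equal identifications can occur.

```python
from typing import List, Sequence


def intersect_sorted(first: Sequence[int], second: Sequence[int]) -> List[int]:
    """Return the identifications present in both ascending sequences."""
    i, j = 0, 0  # S2: both pointers start at the sequence starting position
    common: List[int] = []
    while i < len(first) and j < len(second):
        if first[i] < second[j]:
            i += 1  # S3: advance the first pointer
        elif first[i] > second[j]:
            j += 1  # S4: advance the second pointer
        else:
            common.append(first[i])  # S5: equal identification joins the intersection
            i += 1
            j += 1
    return common  # S6: identifications of the common objects


print(intersect_sorted([1, 2, 6, 7], [2, 5, 7]))  # [2, 7]
```

Each identification is visited at most once, so the cost is linear in the combined length of the two sequences, which is why the sequences are sorted beforehand.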

Optionally, in this embodiment, in order to facilitate computation of an intersection (common object) of a neighbor set (a neighbor object identifier sequence corresponding to each object), an adjacency table (identifiers of neighbor objects recorded in a group of decompressed fragments corresponding to each object) needs to be ordered first, and based on the ordered adjacency table, more efficient intersection computation can be completed.

Further illustratively, as shown in FIG. 9, a group of neighbor object identification sequences to be confirmed is obtained from the neighbor object identification sequences corresponding to each object, where the group of neighbor object identification sequences to be confirmed includes a first neighbor object identification sequence 902 and a second neighbor object identification sequence 904; a first pointer is created for the first neighbor object identification sequence 902 and a second pointer is created for the second neighbor object identification sequence 904, where the first pointer is used for indicating the sequence starting position of the first neighbor object identification sequence 902, and the second pointer is used for indicating the sequence starting position of the second neighbor object identification sequence 904; in the case that the identification at the position indicated by the first pointer is smaller than the identification at the position indicated by the second pointer, the first pointer is moved by at least one sequence unit in the target direction within the first neighbor object identification sequence 902; in the case that the identification at the position indicated by the first pointer is larger than the identification at the position indicated by the second pointer, the second pointer is moved by at least one sequence unit in the target direction within the second neighbor object identification sequence 904; in the case that the identification at the position indicated by the first pointer is equal to the identification at the position indicated by the second pointer, the equal identification is added to the intersection set, the second pointer is moved by at least one sequence unit in the target direction within the second neighbor object identification sequence 904, and the first pointer is moved by at least one sequence unit in the target direction within the first neighbor object identification sequence 902; and in the case that the first pointer has traversed all positions in the first neighbor object identification sequence 902 and the second pointer has traversed all positions in the second neighbor object identification sequence 904, the objects corresponding to the identifications in the intersection set are determined to be the common objects corresponding to the two objects having a connection relationship in the object relationship graph.

According to the embodiment provided by the application, a group of neighbor object identification sequences to be confirmed is obtained from the neighbor object identification sequences corresponding to each object, wherein the group of neighbor object identification sequences to be confirmed comprises a first neighbor object identification sequence and a second neighbor object identification sequence; a first pointer is created for the first neighbor object identification sequence and a second pointer is created for the second neighbor object identification sequence, wherein the first pointer is used for indicating the sequence starting position of the first neighbor object identification sequence, and the second pointer is used for indicating the sequence starting position of the second neighbor object identification sequence; in the case that the identification at the position indicated by the first pointer is smaller than the identification at the position indicated by the second pointer, the first pointer is moved by at least one sequence unit in the target direction within the first neighbor object identification sequence; in the case that the identification at the position indicated by the first pointer is larger than the identification at the position indicated by the second pointer, the second pointer is moved by at least one sequence unit in the target direction within the second neighbor object identification sequence; in the case that the identification at the position indicated by the first pointer is equal to the identification at the position indicated by the second pointer, the equal identification is added to the intersection set, the second pointer is moved by at least one sequence unit in the target direction within the second neighbor object identification sequence, and the first pointer is moved by at least one sequence unit in the target direction within the first neighbor object identification sequence; and in the case that the first pointer has traversed all positions in the first neighbor object identification sequence and the second pointer has traversed all positions in the second neighbor object identification sequence, the objects corresponding to the identifications in the intersection set are determined to be the common objects corresponding to the two objects having a connection relationship in the object relationship graph, thereby achieving the effect of improving the determination efficiency of the common objects.

As an alternative, after acquiring the neighbor object identifier set of each object in the object relationship graph, the method includes: obtaining an object identifier corresponding to each object; obtaining a calculation result by taking the object identifier corresponding to each object modulo a preset value; and distributing each object to a corresponding partition according to the calculation result.

As an alternative, performing fragment compression on the neighbor object identifier set of each object to obtain and store a group of compressed fragments corresponding to each object includes: performing fragment compression on the neighbor object identifier set of each object to obtain a group of compressed fragments corresponding to each object; and storing the group of compressed fragments corresponding to each object into the corresponding partition.

Optionally, in this embodiment, the calculation result obtained by taking the object identifier corresponding to each object modulo the preset value may be, but is not limited to being, determined by calculating a hash value, and the partition corresponding to each object may be determined based on the calculation result. For example, assume that there are 7 objects (1, 2, 3, 4, 5, 6, 7) and 3 partitions (partition 0, partition 1, and partition 2); the value obtained by taking the object ID modulo the number of partitions may be used as the partition ID. Object 1 modulo 3 is 1, so object 1 is assigned to partition 1; object 6 modulo 3 is 0, so object 6 is assigned to partition 0.
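The modulo-based assignment in this example can be sketched as follows. This is a minimal illustration only; the function name `assign_partitions` and the in-memory dictionary layout are assumptions made for the sketch, not part of the described system.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def assign_partitions(object_ids: Iterable[int], num_partitions: int) -> Dict[int, List[int]]:
    """Map each partition ID to the object IDs whose ID modulo num_partitions equals it."""
    partitions: Dict[int, List[int]] = defaultdict(list)
    for object_id in object_ids:
        partitions[object_id % num_partitions].append(object_id)
    return dict(partitions)


layout = assign_partitions(range(1, 8), 3)  # objects 1..7, 3 partitions
for partition_id in sorted(layout):
    print(partition_id, layout[partition_id])
# 0 [3, 6]
# 1 [1, 4, 7]
# 2 [2, 5]
```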

It should be noted that an object identifier corresponding to each object is obtained; a calculation result is obtained by taking the object identifier corresponding to each object modulo a preset value; each object is distributed to a corresponding partition according to the calculation result; fragment compression is performed on the neighbor object identification set of each object to obtain a group of compressed fragments corresponding to each object; and the group of compressed fragments corresponding to each object is stored into the corresponding partition.

Further by way of example, continuing with the scenario shown in FIG. 3 and as shown in FIG. 10, a neighbor object identification set of each object (the neighbor object identification set of each object in the neighbor object identification set 1002) is obtained in the object relationship graph (graph structure 302), and the neighbor object identification set of each object is assigned to its corresponding partition (e.g., partition 1, partition 2, partition 3), wherein the neighbor object identification set 502 of each object includes identifications of neighbor objects having a connection relationship with that object; fragment compression is performed on the neighbor object identification set 1002 of each object to obtain and store a group of compressed fragments corresponding to each object (the group of fragments corresponding to each object in the fragment set 1004), wherein each compressed fragment comprises an array of the first data type, and each member of the array of the first data type is used for recording the identification of one neighbor object; the stored group of compressed fragments corresponding to each object is decompressed to obtain a group of decompressed fragments corresponding to each object (which may also be understood as the neighbor object identification set of each object in the neighbor object identification set 1006), wherein each decompressed fragment comprises an array of the second data type, each member of the array of the second data type is used for recording the identification of one neighbor object, and the number of bits occupied by the second data type is greater than the number of bits occupied by the first data type; the identifications of the neighbor objects recorded in the group of decompressed fragments corresponding to each object are sorted and spliced to obtain a neighbor object identification sequence corresponding to each object (the neighbor object identification sequence corresponding to each object in the neighbor object identification sequence set 1006); and the common objects corresponding to two objects having a connection relationship in the object relationship graph (the common objects corresponding to two objects with a connection relationship in the common object set 1008) are determined according to the neighbor object identification sequence corresponding to each object.

Optionally, in this embodiment, the neighbor set (neighbor object identifier set) corresponding to each object may initially be, but is not limited to being, empty; each computing node may then, but is not limited to, traverse the edge structures in its respective partition and dynamically push the neighbors of each object. For example, for the edge structure "1-5", object 1 has object 5 as a neighbor and object 5 has object 1 as a neighbor, so neighbor 5 is added to object 1 and neighbor 1 is added to object 5. Furthermore, the computing nodes traverse and push neighbor objects in parallel.

According to the embodiment provided by the application, the object identification corresponding to each object is obtained; obtaining a calculation result obtained by modulo a preset value by an object identifier corresponding to each object; distributing each object to a corresponding partition according to the calculation result; performing fragment compression on the neighbor object identification set of each object to obtain a group of compressed fragments corresponding to each object; and storing a group of compressed fragments corresponding to each object into the corresponding partition, and managing each object in a partition mode by using a distributed processing mode, so that the execution of the subsequent determination steps of the common objects is facilitated, and the effect of improving the determination efficiency of the common objects is realized.

As an alternative, for ease of understanding, the above method for determining a common object is applied to a parameter server (PS) architecture to obtain the common friends (common objects) of two nodes. Obtaining the common friends of two nodes requires first knowing the neighbor set of each node and then computing the intersection of the two sets. On this basis, this embodiment provides a large-scale, high-performance computing scheme based on a parameter server; the computing process is shown in FIG. 11, and the specific steps are as follows:

Step S1102, loading the graph structure 1102. The graph structure 1102 is loaded from the distributed storage system so that the edges (edge set 1104) in the graph structure are distributed to the computing nodes; as shown in step S1102, 11 edges are distributed to 3 computing nodes (e.g., computing node 1, computing node 2, computing node 3).

Step S1104, generating an adjacency table 1106 on the parameter server. The data structure formed by all nodes in the graph structure 1102 and their corresponding neighbor sets may be, but is not limited to being, referred to as the adjacency table 1106. In this embodiment, the adjacency table 1106 is stored in a distributed manner on the parameter server: a large adjacency table 1106 is divided into a plurality of partitions (such as partition 1, partition 2, partition 3) that are evenly distributed over different computing nodes, and each partition contains a part of the nodes and their corresponding neighbor sets. A hash value may be calculated to determine which partition a node and its neighbor set are assigned to. For example, with 7 nodes in total and 3 PS partitions in FIG. 11, in the simplest case the node ID modulo the number of partitions gives the PS partition to which the node belongs: node 1 modulo 3 is 1, so node 1 is assigned to partition 1; node 6 modulo 3 is 0, so node 6 is assigned to partition 0.

Optionally, in this embodiment, the neighbor sets of all nodes on the PS are initially empty. Each computing node in step S1102 traverses the edges in its respective partition and dynamically pushes the neighbor nodes of each node onto the PS. For example, for the edge "1-5", node 1 has node 5 as a neighbor and node 5 has node 1 as a neighbor, so neighbor 5 is added to node 1 on the PS and neighbor 1 is added to node 5. The computing nodes traverse their edges and push neighbor nodes in parallel.
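The push step can be sketched as follows. This is a single-process illustration only: the edge shards below are hypothetical (the actual 11 edges of FIG. 11 are not reproduced here), the name `push_neighbors` is an assumption, and the shards are processed one after another rather than in parallel as in the embodiment.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int]


def push_neighbors(adjacency: Dict[int, List[int]], edges: Iterable[Edge]) -> None:
    """For every edge, record each endpoint as a neighbor of the other endpoint."""
    for src, dst in edges:
        adjacency[src].append(dst)
        adjacency[dst].append(src)


# Neighbor sets start out empty, as in the embodiment.
adjacency: Dict[int, List[int]] = defaultdict(list)

# Hypothetical edge shards, one per computing node.
edge_shards: List[List[Edge]] = [
    [(1, 5), (5, 6)],
    [(2, 5), (2, 6)],
    [(5, 7), (6, 7)],
]
for shard in edge_shards:
    push_neighbors(adjacency, shard)

print(adjacency[5])          # [1, 6, 2, 7]  (unordered: write order)
print(sorted(adjacency[5]))  # [1, 2, 6, 7]
print(sorted(adjacency[6]))  # [2, 5, 7]
```

The neighbor lists come out in write order rather than sorted order, which is why a separate sorting step is needed before intersections are computed.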

Step S1106, sorting the adjacency table 1106 to obtain an adjacency table 1108. Since each neighbor set of the adjacency table 1106 obtained in step S1104 is unordered, the adjacency table 1106 needs to be sorted first in order to facilitate computing the intersection of neighbor sets.

Step S1108, calculating common friends. Once the adjacency table on the PS is ready, the computation of common friends can begin. Specifically, each computing node traverses each edge within its respective partition and, for the source node and target node of each edge, obtains the neighbor set from the corresponding PS partition. For example, for the edge "5-6", the neighbor set [1,2,6,7] of node 5 is obtained from PS partition 2, and the neighbor set [2,5,7] of node 6 is obtained from PS partition 0.
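The per-edge lookup in step S1108 can be sketched as follows for the edge "5-6". The nested dictionary standing in for the PS partitions and the names `pull_neighbors` and `common_friends` are assumptions made for illustration. A plain set intersection is used here for brevity; on sorted neighbor sequences the two-pointer traversal of steps S1 to S6 would give the same result.

```python
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3  # three PS partitions, as in the example

# Hypothetical PS layout: partition ID -> {node ID -> sorted neighbor set}.
ps_partitions: Dict[int, Dict[int, List[int]]] = {
    0: {6: [2, 5, 7]},
    2: {5: [1, 2, 6, 7]},
}


def pull_neighbors(node_id: int) -> List[int]:
    """Fetch a node's neighbor set from the partition given by node ID modulo partition count."""
    return ps_partitions[node_id % NUM_PARTITIONS][node_id]


def common_friends(edge: Tuple[int, int]) -> List[int]:
    """Return the sorted common neighbors of the edge's source and target nodes."""
    src, dst = edge
    return sorted(set(pull_neighbors(src)) & set(pull_neighbors(dst)))


print(common_friends((5, 6)))  # [2, 7]
```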

Optionally, for a graph structure of particularly large scale (such as one on the order of a trillion edges), this embodiment proposes a method for generating the adjacency table with dynamic compression, which can greatly reduce both the memory occupied on the parameter server and the communication overhead during calculation. Compared with the calculation process shown in FIG. 11, there are several differences:

In the process of generating the adjacency table in step S1104, the data structure of the neighbor set of each node on the original PS is an array of the Long data type, in which node IDs of the Long type are arranged in the order in which they are written, and one Long value occupies 8 bytes. The basic principle of the method of this embodiment is instead to store the neighbor set in the form of compressed fragments. Specifically, a batch size parameter is set to represent the maximum number of neighbors that each fragment can hold. Assuming that the batch size is 3, an initial memory of 3 × 8 bytes is first allocated and neighbors are written into it in sequence; when the number of written neighbors reaches 3, the fragment is compressed into an array of the Byte data type. The initially allocated memory is then emptied, and these steps are repeated until all neighbors have been obtained. At that point a plurality of compressed neighbor fragments are stored, which can reduce the space required by a factor of two or more compared with the original Long-type array.
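The fragment-by-fragment compression can be sketched as follows. The text does not name a compression codec, so this sketch swaps in a variable-length integer (varint) encoding as an assumption: each non-negative ID is stored in as few bytes as it needs instead of a fixed 8 bytes. The function names and the use of a Python list as the write buffer are likewise hypothetical.

```python
from typing import Iterable, List


def encode_fragment(ids: Iterable[int]) -> bytes:
    """Varint-encode non-negative IDs: 7 payload bits per byte, high bit marks continuation."""
    out = bytearray()
    for value in ids:
        while value >= 0x80:
            out.append((value & 0x7F) | 0x80)
            value >>= 7
        out.append(value)
    return bytes(out)


def compress_in_fragments(neighbor_ids: Iterable[int], batch_size: int) -> List[bytes]:
    """Buffer up to batch_size neighbors, compress the full buffer, empty it, and repeat."""
    fragments: List[bytes] = []
    buffer: List[int] = []  # stands in for the pre-allocated batch_size * 8 byte memory
    for neighbor_id in neighbor_ids:
        buffer.append(neighbor_id)
        if len(buffer) == batch_size:
            fragments.append(encode_fragment(buffer))
            buffer.clear()
    if buffer:  # remaining neighbors (fewer than batch_size) form the last fragment
        fragments.append(encode_fragment(buffer))
    return fragments


fragments = compress_in_fragments([6, 1, 7, 2], batch_size=3)
print([len(f) for f in fragments])  # [3, 1]
```

In this toy case the four IDs occupy 4 bytes in total instead of 4 × 8 = 32 bytes; the actual saving depends on the magnitude of the IDs and on the codec used.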

In addition, because the PS stores compressed fragments of the neighbor set while the complete neighbor set needs to be pulled during calculation, the data processing operation on the PS also differs from step S1106 in FIG. 11. Originally only the neighbor set needed to be sorted; now the compressed fragments need to be decompressed into Long-type arrays, spliced together, and sorted, and the spliced and sorted neighbor set is finally compressed again into a Byte-type array, where the recompression serves to save overall storage space. In the process of calculating the common friends in step S1108, each computing node pulls the neighbor sets of the nodes from the PS, decompresses them, and then performs the calculation. Because compressed data is pulled, the data volume is roughly halved compared with the original version, which greatly reduces the communication pressure between nodes.
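The decompress, splice, sort, and recompress sequence on the PS can be sketched as follows. As the codec is not specified in the text, a varint encoding of non-negative IDs is assumed here purely for illustration, and all function names are hypothetical.

```python
from typing import Iterable, List


def encode(ids: Iterable[int]) -> bytes:
    """Varint-encode non-negative IDs into a byte array (assumed codec)."""
    out = bytearray()
    for value in ids:
        while value >= 0x80:
            out.append((value & 0x7F) | 0x80)
            value >>= 7
        out.append(value)
    return bytes(out)


def decode(fragment: bytes) -> List[int]:
    """Inverse of encode: rebuild the list of integer IDs from a byte array."""
    ids: List[int] = []
    value, shift = 0, 0
    for byte in fragment:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes belong to the current ID
        else:
            ids.append(value)
            value, shift = 0, 0
    return ids


def merge_fragments(fragments: List[bytes]) -> bytes:
    """Decompress every fragment, splice the IDs, sort them, and recompress as one array."""
    merged: List[int] = []
    for fragment in fragments:
        merged.extend(decode(fragment))
    merged.sort()
    return encode(merged)


stored = [encode([6, 1, 7]), encode([2])]  # unordered fragments of one node
packed = merge_fragments(stored)           # what the PS keeps after sorting
print(decode(packed))                      # [1, 2, 6, 7]
```

A computing node that pulls `packed` only needs one `decode` call before it can run the intersection on the sorted sequence.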

Through this embodiment of the application, common-friend calculation for ultra-large-scale networks at the level of trillions of relation chains can be realized with limited resources, improving the running speed and reducing resource consumption. In the context of association-degree calculation and recommendation in a social network, the more common friends two objects have, the closer their relationship is or the more their circles overlap, which can serve as a basis for social or product recommendation.

It will be appreciated that in the specific embodiments of the present application, related data such as user information is involved, and when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions.

It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.

According to another aspect of the embodiment of the present application, there is also provided a common object determining apparatus for implementing the above-described common object determining method. As shown in fig. 12, the apparatus includes:

a first obtaining unit 1202, configured to obtain, in an object relationship graph, a neighbor object identifier set of each object, where the neighbor object identifier set of each object includes an identifier of a neighbor object that has a connection relationship with each object;

A first compression unit 1204, configured to perform fragment compression on a neighbor object identifier set of each object, to obtain and store a set of compressed fragments corresponding to each object, where each compressed fragment includes an array of first data types, and each member of the array of first data types is used to record an identifier of a neighbor object;

the decompression unit 1206 is configured to decompress a set of compressed fragments corresponding to each stored object to obtain a set of decompressed fragments corresponding to each object, where each decompressed fragment includes an array of second data types, each member in the array of second data types is configured to record an identifier of a neighboring object, and the number of bits occupied by the second data type is greater than the number of bits occupied by the first data type;

the processing unit 1208 is configured to sort and splice the identifiers of the neighbor objects recorded in the decompressed set of slices corresponding to each object, so as to obtain a neighbor object identifier sequence corresponding to each object;

the determining unit 1210 is configured to determine, according to the neighbor object identification sequence corresponding to each object, a common object corresponding to two objects having a connection relationship in the object relationship graph.

Optionally, in this embodiment, the above apparatus for determining a common object may be applied to, but is not limited to, association-degree calculation and recommendation in a social network. For example, the more common friends two nodes have, the higher the degree of association between them or the more their circles overlap, which may serve as a basis for social or product recommendation.

Optionally, in this embodiment, the above apparatus for determining a common object may be applied to, but is not limited to, large-scale high-performance computing scenarios, such as a very large graph structure (object relationship graph), e.g. one on the order of a trillion edges. In such scenarios the related technology is usually unacceptable because it takes too long and occupies too many resources, whereas this embodiment can complete high-performance computation while occupying fewer resources.

Specific embodiments may refer to examples shown in the above method for determining the common object, which are not described herein.

As an alternative, the first compression unit 1204 includes:

the first execution module is configured to execute the following steps for a neighbor object identifier set of each object, where each object is a current object when executing the following steps:

As an alternative, the first execution module includes:

the execution submodule is used for repeatedly executing the following steps for the neighbor object identification set of the current object until the neighbor object identification set of the current object has been written into the target memory, wherein the target memory is a pre-allocated initial memory of N×M bytes, and M is the number of bytes occupied by the second data type:

Determining whether the number of neighbor object identifiers which are not written into a target memory is greater than or equal to N in a neighbor object identifier set of the current object;

under the condition that the number of neighbor object identifiers which are not written into the target memory is greater than or equal to N, N neighbor object identifiers which are not written into the target memory are written into the target memory, the N neighbor object identifiers in the target memory are compressed into a compressed fragment, and the target memory is emptied;

and under the condition that the number of neighbor object identifiers which are not written into the target memory is smaller than N, writing the neighbor object identifiers which are not written into the target memory, compressing the neighbor object identifiers in the target memory into a compressed fragment, and clearing the target memory.

As an alternative, the decompression unit 1206 includes:

the second execution module is configured to execute the following steps for a group of compressed fragments corresponding to each object, where each object is a current object when the following steps are executed:

decompressing the P groups of compressed fragments corresponding to the current object respectively to obtain P groups of decompressed fragments, wherein P is a positive integer greater than or equal to 1;

Sequencing and splicing the identifiers of the neighbor objects recorded in the P groups of decompressed fragments to obtain a neighbor object identifier sequence corresponding to the current object, wherein the neighbor object identifiers in the neighbor object identifier sequence corresponding to the current object are identical to the neighbor object identifiers in the neighbor object identifier set of the current object.

As an alternative, the apparatus further includes: the second compression unit is used for sorting and splicing the identifiers of the neighbor objects recorded in the decompressed fragments corresponding to each object to obtain a neighbor object identifier sequence corresponding to each object, and then respectively compressing the neighbor object identifier sequence corresponding to each object to obtain and store a target array of the first data type corresponding to each object;

the determining unit 1210 includes: the decompression module is used for respectively decompressing the stored target arrays of the first data type corresponding to each object to obtain a neighbor object identification sequence corresponding to each object; and the first determining module is used for determining a common object corresponding to the two objects with the connection relation in the object relation diagram according to the neighbor object identification sequence corresponding to each object.

As an alternative, the determining unit 1210 includes:

the searching module is used for searching the neighbor object identifiers included in the neighbor object identifier sequence of the first object and the neighbor object identifier sequence of the second object under the condition that the neighbor object identifier sequence corresponding to each object comprises the neighbor object identifier sequence of the first object and the neighbor object identifier sequence of the second object;

and the second determining module is used for determining the object represented by the searched neighbor object identifier as a common object corresponding to the first object and the second object under the condition that the neighbor object identifier included in the neighbor object identifier sequence of the first object and the neighbor object identifier sequence of the second object are searched.

As an alternative, the first obtaining unit 1202 includes: the loading module is used for loading the object relation graph from the distributed storage system, and distributing each edge structure in the object relation graph to each computing node of the distributed storage system for management, wherein the edge structure is used for representing at least two neighbor objects with a connection relation, and the at least two neighbor objects comprise at least one group of source objects and target objects;

The determining unit 1210 includes: the calling module is used for calling each computing node to traverse the edge structure managed by each computing node to acquire each group of source objects and target objects corresponding to the edge structure managed by each computing node; and the third determining module is used for determining common objects corresponding to each group of source objects and target objects according to the neighbor object identification sequences corresponding to each object.

As an alternative, the determining unit 1210 includes:

the acquisition module is used for acquiring a group of neighbor object identification sequences to be confirmed from the neighbor object identification sequences corresponding to each object, wherein the group of neighbor object identification sequences to be confirmed comprises a first neighbor object identification sequence and a second neighbor object identification sequence;

the creating module is used for creating a first pointer for the first neighbor object identification sequence and a second pointer for the second neighbor object identification sequence, wherein the first pointer is used for indicating the sequence starting point position of the first neighbor object identification sequence, and the second pointer is used for indicating the sequence starting point position of the second neighbor object identification sequence;

The first moving module is used for moving the position of the first pointer in the first neighbor object identification sequence by at least one sequence unit towards the target direction under the condition that the identification of the position indicated by the first pointer is smaller than the identification of the position indicated by the second pointer;

the second moving module is used for moving the position of the second pointer in the second neighbor object identification sequence by at least one sequence unit towards the target direction under the condition that the identification of the position indicated by the first pointer is larger than the identification of the position indicated by the second pointer;

a third moving module, configured to add the equal identifier to the intersection set, and move the position of the second pointer in the second neighbor object identifier sequence by at least one sequence unit toward the target direction, and move the position of the first pointer in the first neighbor object identifier sequence by at least one sequence unit toward the target direction, when the identifier of the position indicated by the first pointer is equal to the identifier of the position indicated by the second pointer;

and the fourth determining module is used for determining the objects corresponding to the identifiers in the intersection set as the common objects corresponding to the two objects with the connection relationship in the object relationship graph, under the condition that the first pointer has traversed all the positions in the first neighbor object identification sequence and the second pointer has traversed all the positions in the second neighbor object identification sequence.
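The pointer-based comparison carried out by the creating module, the three moving modules and the fourth determining module can be illustrated with a minimal Python sketch. It assumes that both neighbor object identification sequences are sorted in ascending order and that the target direction is toward the end of each sequence; the function name `intersect_sorted` and the sample identifiers are hypothetical and do not appear in the application.

```python
from typing import List, Sequence


def intersect_sorted(first: Sequence[int], second: Sequence[int]) -> List[int]:
    """Return the identifiers present in both ascending-sorted sequences."""
    i = 0  # first pointer, starts at the beginning of the first sequence
    j = 0  # second pointer, starts at the beginning of the second sequence
    intersection: List[int] = []
    while i < len(first) and j < len(second):
        if first[i] < second[j]:
            i += 1  # identifier at the first pointer is smaller: advance it
        elif first[i] > second[j]:
            j += 1  # identifier at the second pointer is smaller: advance it
        else:
            intersection.append(first[i])  # equal: record, advance both
            i += 1
            j += 1
    return intersection


# Hypothetical neighbor sequences of two connected objects.
print(intersect_sorted([1, 3, 5, 7], [3, 4, 5, 8]))
```

The sketch stops as soon as either pointer passes the end of its sequence, because no further equal identifiers can be found after that point; this is a simplification of the condition above, under which both pointers traverse all positions.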

As an alternative, the apparatus further includes: a second acquisition unit, used for acquiring the object identifier corresponding to each object after the neighbor object identification set of each object is acquired in the object relation graph; a third acquisition unit, used for obtaining a calculation result by taking the object identifier corresponding to each object modulo a preset value; and a distribution unit, used for distributing each object to its corresponding partition according to the calculation result;

the first compression unit 1204 includes: the segmentation module is used for performing fragment compression on the neighbor object identification set of each object to obtain a group of compressed fragments corresponding to each object; and the storage module is used for storing the group of compressed fragments corresponding to each object to the corresponding partition.
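As a hedged illustration of the distribution unit and the first compression unit, the following Python sketch assigns each object to a partition by taking its identifier modulo a preset value, compresses its neighbor identifiers in batches of at most N, and stores the resulting fragments in that partition. The constants `N` and `NUM_PARTITIONS`, the function names, and the use of Python's `array` type code `"I"` (an unsigned integer, typically 32 bits) as the first data type are assumptions made for the example only; the application does not prescribe them.

```python
from array import array
from collections import defaultdict
from typing import Dict, Iterable, List

N = 3               # hypothetical maximum number of identifiers per batch
NUM_PARTITIONS = 2  # hypothetical preset value used for the modulo


def compress_in_batches(neighbor_ids: Iterable[int], n: int = N) -> List[array]:
    """Fill a reusable buffer with up to n identifiers, emit a fragment, clear it."""
    buffer: List[int] = []  # stands in for the pre-allocated target memory
    fragments: List[array] = []
    for neighbor_id in neighbor_ids:
        buffer.append(neighbor_id)
        if len(buffer) == n:
            # Assumed narrowing: identifiers must fit the "I" element type.
            fragments.append(array("I", buffer))
            buffer.clear()
    if buffer:  # last batch holding fewer than n identifiers
        fragments.append(array("I", buffer))
        buffer.clear()
    return fragments


def store_by_partition(
    neighbor_sets: Dict[int, Iterable[int]],
    num_partitions: int = NUM_PARTITIONS,
) -> Dict[int, Dict[int, List[array]]]:
    """Map partition index -> object identifier -> compressed fragments."""
    partitions: Dict[int, Dict[int, List[array]]] = defaultdict(dict)
    for object_id, neighbors in neighbor_sets.items():
        partition = object_id % num_partitions  # modulo a preset value
        partitions[partition][object_id] = compress_in_batches(neighbors)
    return dict(partitions)


stored = store_by_partition({4: [9, 2, 7, 5], 7: [1]})
print(stored)
```

In this sketch an empty final batch is simply skipped rather than stored as an empty fragment, which is a design choice of the example.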

According to a further aspect of the embodiments of the present application, there is also provided an electronic device for implementing the above-mentioned method of determining a common object, the electronic device comprising a memory 1302 and a processor 1304, as shown in fig. 13, the memory 1302 having stored therein a computer program, the processor 1304 being arranged to perform the steps of any of the method embodiments described above by means of the computer program.

Alternatively, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of the computer network.

Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:

S1, acquiring a neighbor object identification set of each object in an object relation graph, wherein the neighbor object identification set of each object comprises identifications of neighbor objects with connection relation with each object;

S2, performing fragment compression on the neighbor object identification set of each object to obtain and store a group of compressed fragments corresponding to each object, wherein each compressed fragment comprises an array of a first data type, and each member in the array of the first data type is used for recording the identification of one neighbor object;

S3, decompressing the stored group of compressed fragments corresponding to each object to obtain a group of decompressed fragments corresponding to each object, wherein each decompressed fragment comprises an array of a second data type, each member in the array of the second data type is used for recording the identification of one neighbor object, and the number of bits occupied by the second data type is larger than the number of bits occupied by the first data type;

S4, sequencing and splicing the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object to obtain a neighbor object identifier sequence corresponding to each object;

S5, determining a common object corresponding to the two objects with the connection relationship in the object relationship graph according to the neighbor object identification sequence corresponding to each object.
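The five steps above can be sketched end to end in Python. This is only an illustration under stated assumptions: the graph is given as a list of edges treated as undirected, the first data type is modeled by the `array` type code `"I"` (typically 32 bits) and the second data type by `"q"` (64 bits), so that "compression" here is plain narrowing of identifiers that fit the smaller type, and all function names and sample data are hypothetical. The actual compression scheme is not specified in this passage.

```python
from array import array
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def neighbor_sets(edges: Iterable[Edge]) -> Dict[int, Set[int]]:
    """S1: collect the neighbor identifiers of every object (undirected, assumed)."""
    result: Dict[int, Set[int]] = {}
    for u, v in edges:
        result.setdefault(u, set()).add(v)
        result.setdefault(v, set()).add(u)
    return result


def compress(ids: Iterable[int], n: int = 2) -> List[array]:
    """S2: split into fragments of at most n identifiers of the narrower type."""
    id_list = list(ids)
    return [array("I", id_list[k:k + n]) for k in range(0, len(id_list), n)]


def decompress(fragments: List[array]) -> List[array]:
    """S3: widen every fragment to the 64-bit element type."""
    return [array("q", list(fragment)) for fragment in fragments]


def sort_and_splice(fragments: List[array]) -> List[int]:
    """S4: concatenate the fragments and sort the identifiers in ascending order."""
    return sorted(x for fragment in fragments for x in fragment)


def intersect(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Two-pointer intersection of two ascending sequences."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            out.append(a[i])
            i += 1
            j += 1
    return out


def common_objects(edges: List[Edge]) -> Dict[Edge, List[int]]:
    """S5: for each connected pair, intersect the two neighbor sequences."""
    stored = {obj: compress(ids) for obj, ids in neighbor_sets(edges).items()}
    sequences = {obj: sort_and_splice(decompress(f)) for obj, f in stored.items()}
    return {(u, v): intersect(sequences[u], sequences[v]) for u, v in edges}


print(common_objects([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]))
```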

Alternatively, it will be understood by those skilled in the art that the structure shown in fig. 13 is only schematic, and the electronic device may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device (Mobile Internet Devices, MID), or a PAD. Fig. 13 does not limit the structure of the electronic device described above. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 13, or have a different configuration than shown in fig. 13.

The memory 1302 may be used to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for determining a common object in the embodiments of the present application. The processor 1304 executes the software programs and modules stored in the memory 1302 to perform various functional applications and data processing, that is, to implement the above method for determining a common object. The memory 1302 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1302 may further include memory located remotely from the processor 1304, and such remote memory may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1302 may be specifically, but not exclusively, configured to store information such as the neighbor object identifier set, the group of compressed fragments corresponding to each object, and the common object. As an example, as shown in fig. 13, the memory 1302 may include, but is not limited to, the first acquiring unit 1202, the first compressing unit 1204, the decompressing unit 1206, the processing unit 1208, and the determining unit 1210 of the above-described common object determining device. In addition, the memory may include, but is not limited to, other module units of the above-described common object determining device, which are not described in detail in this example.

Optionally, the transmission device 1306 is configured to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 1306 comprises a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and routers via a network cable so as to communicate with the internet or a local area network. In one example, the transmission device 1306 is a Radio Frequency (RF) module, which is used for communicating wirelessly with the internet.

In addition, the electronic device further includes: a display 1308, configured to display information such as the neighbor object identifier set, the group of compressed fragments corresponding to each object, and the common object; and a connection bus 1313 for connecting the respective module parts in the above-described electronic device.

In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by a plurality of nodes connected through network communication. The nodes may form a Peer-To-Peer (P2P) network, and any type of computing device, such as a server or a terminal, may become a node in the blockchain system by joining the Peer-To-Peer network.

According to one aspect of the present application, there is provided a computer program product comprising a computer program/instructions containing program code for executing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via a communication portion, and/or installed from a removable medium. When executed by a central processing unit, the computer program performs the various functions provided by the embodiments of the present application.

The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

It should be noted that the computer system of the electronic device is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.

The computer system includes a central processing unit (Central Processing Unit, CPU), which can execute various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) or a program loaded from a storage section into a random access memory (Random Access Memory, RAM). Various programs and data required for system operation are also stored in the random access memory. The CPU, the ROM and the RAM are connected to each other by a bus. An Input/Output interface (i.e., I/O interface) is also connected to the bus.

The following components are connected to the input/output interface: an input section including a keyboard, a mouse, and the like; an output section including a Cathode Ray Tube (CRT) or a liquid crystal display (Liquid Crystal Display, LCD), a speaker, and the like; a storage section including a hard disk or the like; and a communication section including a network interface card such as a local area network card, a modem, and the like. The communication section performs communication processing via a network such as the internet. A drive is also connected to the input/output interface as needed. Removable media such as magnetic disks, optical disks, magneto-optical disks, and semiconductor memories are mounted on the drive as needed, so that a computer program read therefrom can be installed into the storage section as needed.

In particular, the processes described in the various method flowcharts may be implemented as computer software programs according to embodiments of the application. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via a communication portion, and/or installed from a removable medium. The computer program, when executed by a central processing unit, performs the various functions defined in the system of the application.

According to one aspect of the present application, there is provided a computer-readable storage medium storing computer instructions. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various alternative implementations described above.

Alternatively, in the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for executing steps S1 to S5 described above.

Alternatively, in this embodiment, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be performed by a program instructing hardware related to a terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.

The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the method described in the embodiments of the present application.

In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis; for a portion that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described apparatus embodiments are merely exemplary; for example, the division of the units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, units or modules, and may be in electrical or other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are also intended to fall within the scope of protection of the present application.

Claims

1. A method of determining a common object, comprising:

acquiring a neighbor object identification set of each object in an object relation graph, wherein the neighbor object identification set of each object comprises identifications of neighbor objects with connection relation with each object;

performing fragment compression on the neighbor object identification set of each object to obtain and store a group of compressed fragments corresponding to each object, wherein each compressed fragment comprises an array of a first data type, and each member in the array of the first data type is used for recording the identification of one neighbor object;

decompressing the stored group of compressed fragments corresponding to each object to obtain a group of decompressed fragments corresponding to each object, wherein each decompressed fragment comprises an array of a second data type, each member of the array of the second data type is used for recording an identifier of one neighbor object, and the number of bits occupied by the second data type is greater than the number of bits occupied by the first data type;

sequencing and splicing the identifiers of the neighbor objects recorded in a group of decompressed fragments corresponding to each object to obtain a neighbor object identifier sequence corresponding to each object;

and determining a common object corresponding to the two objects with the connection relationship in the object relationship graph according to the neighbor object identification sequence corresponding to each object.

2. The method according to claim 1, wherein performing fragment compression on the neighbor object identifier set of each object to obtain and store a group of compressed fragments corresponding to each object includes:

executing the following steps on the neighbor object identification set of each object, wherein each object is a current object when executing the following steps:

Writing the neighbor object identifiers in the neighbor object identifier set of the current object into a target memory in batches, then compressing the neighbor object identifiers recorded in the target memory into one compressed fragment, and storing the compressed fragment, wherein the maximum value of the number of the neighbor object identifiers written into the target memory in each batch is N, N is the array size of the array of the second data type, and N is a positive integer greater than or equal to 1.

3. The method according to claim 2, wherein writing the neighbor object identifiers in the neighbor object identifier set of the current object into the target memory in batches, and then compressing the neighbor object identifiers recorded in the target memory into one of the compressed fragments comprises:

repeatedly executing the following steps on the neighbor object identification set of the current object until the neighbor object identification set of the current object has been written into the target memory, wherein the target memory is a pre-allocated initial memory of N multiplied by M bytes, M bytes being the storage size occupied by the second data type:

Determining whether the number of neighbor object identifiers which are not written into the target memory is greater than or equal to N in a neighbor object identifier set of the current object;

when the number of neighbor object identifiers which are not written into the target memory is greater than or equal to N, N neighbor object identifiers in the neighbor object identifiers which are not written into the target memory are written into the target memory, the N neighbor object identifiers in the target memory are compressed into one compressed fragment, and the target memory is emptied;

and under the condition that the number of the neighbor object identifiers which are not written into the target memory is smaller than N, writing the neighbor object identifiers which are not written into the target memory, compressing the neighbor object identifiers in the target memory into a compressed fragment, and emptying the target memory.

4. The method of claim 1, wherein decompressing the stored group of compressed fragments corresponding to each object to obtain a group of decompressed fragments corresponding to each object, and sequencing and splicing the identifiers of the neighbor objects recorded in the group of decompressed fragments corresponding to each object to obtain a neighbor object identifier sequence corresponding to each object, comprises:

Executing the following steps on the group of compressed fragments corresponding to each object, wherein when executing the following steps, each object is a current object:

and sequencing and splicing the identifiers of the neighbor objects recorded in the P groups of decompressed fragments to obtain a neighbor object identifier sequence corresponding to the current object, wherein the neighbor object identifiers in the neighbor object identifier sequence corresponding to the current object are identical to the neighbor object identifiers in the neighbor object identifier set of the current object.

5. The method according to claim 1, wherein,

after the identifiers of the neighbor objects recorded in the decompressed set of fragments corresponding to each object are sequenced and spliced to obtain a neighbor object identifier sequence corresponding to each object, the method further comprises: respectively compressing the neighbor object identification sequences corresponding to each object to obtain and store a target array of the first data type corresponding to each object;

The determining, according to the neighbor object identifier sequence corresponding to each object, a common object corresponding to two objects having a connection relationship in the object relationship graph includes: respectively decompressing the stored target arrays of the first data type corresponding to each object to obtain a neighbor object identification sequence corresponding to each object; and determining a common object corresponding to the two objects with the connection relationship in the object relationship graph according to the neighbor object identification sequence corresponding to each object.

6. The method according to any one of claims 1 to 5, wherein the determining, according to the neighbor object identification sequence corresponding to each object, a common object corresponding to two objects having a connection relationship in the object relationship graph includes:

searching for neighbor object identifiers that are contained in both the neighbor object identification sequence of the first object and the neighbor object identification sequence of the second object, under the condition that the neighbor object identification sequences corresponding to the objects comprise the neighbor object identification sequence of the first object and the neighbor object identification sequence of the second object;

and under the condition that a neighbor object identifier contained in both the neighbor object identifier sequence of the first object and the neighbor object identifier sequence of the second object is found, determining the object represented by the found neighbor object identifier as a common object corresponding to the first object and the second object.

7. The method according to any one of claims 1 to 5, wherein,

the obtaining the neighbor object identification set of each object in the object relation graph comprises the following steps: loading the object relation graph from a distributed storage system, and distributing each edge structure in the object relation graph to each computing node of the distributed storage system for management, wherein the edge structure is used for representing at least two neighbor objects with connection relations, and the at least two neighbor objects comprise at least one group of source objects and target objects;

the determining, according to the neighbor object identifier sequence corresponding to each object, a common object corresponding to two objects having a connection relationship in the object relationship graph includes: each computing node is called to traverse the edge structure managed by each computing node, and each group of source objects and target objects corresponding to the edge structure managed by each computing node are obtained; and determining the common object corresponding to each group of source object and target object according to the neighbor object identification sequence corresponding to each object.

8. The method according to any one of claims 1 to 5, wherein the determining, according to the neighbor object identification sequence corresponding to each object, a common object corresponding to two objects having a connection relationship in the object relationship graph includes:

Acquiring a group of neighbor object identification sequences to be confirmed from the neighbor object identification sequences corresponding to each object, wherein the group of neighbor object identification sequences to be confirmed comprises a first neighbor object identification sequence and a second neighbor object identification sequence;

creating a first pointer for the first neighbor object identification sequence and a second pointer for the second neighbor object identification sequence, wherein the first pointer is used for indicating the sequence starting point position of the first neighbor object identification sequence, and the second pointer is used for indicating the sequence starting point position of the second neighbor object identification sequence;

moving the position of the first pointer in the first neighbor object identification sequence by at least one sequence unit towards a target direction under the condition that the identification of the position indicated by the first pointer is smaller than the identification of the position indicated by the second pointer;

moving the position of the second pointer in the second neighbor object identification sequence by the at least one sequence unit towards the target direction under the condition that the identification of the position indicated by the first pointer is larger than the identification of the position indicated by the second pointer;

Adding the equal identification to an intersection set under the condition that the identification of the position indicated by the first pointer is equal to the identification of the position indicated by the second pointer, moving the position of the second pointer in the second neighbor object identification sequence by at least one sequence unit towards the target direction, and moving the position of the first pointer in the first neighbor object identification sequence by at least one sequence unit towards the target direction;

and determining the objects corresponding to the identifications in the intersection set as the common objects corresponding to the two objects with the connection relation in the object relation graph, under the condition that the first pointer has traversed all the positions in the first neighbor object identification sequence and the second pointer has traversed all the positions in the second neighbor object identification sequence.

9. The method according to any one of claims 1 to 5, wherein,

after the neighbor object identification set of each object is obtained in the object relation graph, the method further comprises: acquiring an object identifier corresponding to each object; obtaining a calculation result by taking the object identifier corresponding to each object modulo a preset value; and distributing each object to its corresponding partition according to the calculation result;

the performing fragment compression on the neighbor object identifier set of each object to obtain and store a group of compressed fragments corresponding to each object includes: performing fragment compression on the neighbor object identification set of each object to obtain a group of compressed fragments corresponding to each object; and storing the group of compressed fragments corresponding to each object into the corresponding partition.

10. A common object determining apparatus, comprising:

a first obtaining unit, configured to obtain a neighbor object identifier set of each object in an object relationship graph, where the neighbor object identifier set of each object includes an identifier of a neighbor object that has a connection relationship with each object;

the first compression unit is used for carrying out fragment compression on the neighbor object identification set of each object to obtain and store a group of compressed fragments corresponding to each object, wherein each compressed fragment comprises an array of a first data type, and each member of the array of the first data type is used for recording the identification of one neighbor object;

the decompression unit is used for decompressing the stored group of compressed fragments corresponding to each object to obtain a group of decompressed fragments corresponding to each object, wherein each decompressed fragment comprises an array of a second data type, each member in the array of the second data type is used for recording the identification of one neighbor object, and the number of bits occupied by the second data type is larger than the number of bits occupied by the first data type;

The processing unit is used for sequencing and splicing the identifiers of the neighbor objects recorded in the decompressed fragments corresponding to each object to obtain a neighbor object identifier sequence corresponding to each object;

and the determining unit is used for determining a common object corresponding to the two objects with the connection relationship in the object relationship graph according to the neighbor object identification sequence corresponding to each object.

11. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program when run performs the method of any one of claims 1 to 9.

12. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 9.

13. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1 to 9 by means of the computer program.