CN103177414A - Structure-based dependency graph node similarity concurrent computation method - Google Patents
- Publication number: CN103177414A (application CN2013101022817A / CN201310102281A)
- Authority: CN (China)
- Prior art keywords: matrix, similarity, GPU, node, CPU
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Classifications: Information Retrieval, DB Structures and FS Structures Therefor; Image Analysis
Abstract
The invention discloses a structure-based parallel method for computing graph-node similarity. The method comprises the following steps: the CPU end, acting as the host, reads in a plurality of story texts or images and builds a graph model to obtain the adjacency matrix of the graph; the GPU end, acting as the device, receives the adjacency matrix output by the CPU end and performs the computation on it; the GPU end then obtains the resulting matrix and transfers it to the CPU end. With this parallel method, the similarity computation is greatly accelerated while high accuracy is maintained, meeting the efficiency and precision requirements of computing over massive media data. Experimental results show that, at comparable accuracy, the acceleration algorithm provided by the method achieves an average speed-up of more than 100 times over the existing algorithm.
Description
Technical field
The present invention relates to the field of media computation, and in particular to a structure-based parallel method for computing graph-node similarity.
Background technology
At present, in the media computation field, problems such as image segmentation, content retrieval, and matching are solved by designing a graph model and diffusing similarity between nodes to obtain the corresponding result. Put simply, graph-node similarity computation is a means of evaluating the structural similarity of nodes in a graph (for example, superpixels).
In the prior art, descriptors are usually used to measure the similarity between two nodes, and similarity diffusion is carried out based on the similarity and adjacency relations among a node's neighbours.
In the course of realizing the present invention, the inventors found at least the following shortcomings and defects in the prior art:
As the graph grows, the running time of similarity diffusion increases greatly and the computational complexity rises, even reaching O(kn^4), which cannot satisfy the needs of practical applications.
Summary of the invention
The invention provides a structure-based parallel method for computing graph-node similarity. The method reduces computational complexity and running time and satisfies the needs of practical applications, as described in detail hereinafter:
A structure-based graph-node similarity parallel computation method, comprising the following steps:
(1) the CPU end, as the host, reads in a plurality of story texts or images, builds a graph model, and obtains the adjacency matrix;
(2) the GPU end, as the device, receives the adjacency matrix transmitted by the CPU end, and the GPU end computes the adjacency matrix;
(3) the GPU end obtains the resulting adjacency matrix and transfers it to the CPU end.
When the CPU end reads in a plurality of story texts as the host, the step in which the GPU end computes the adjacency matrix W is specifically: the GPU end computes the first adjacency matrix, that is:
1) computing, from the location indices of nodes a and b in the first adjacency matrix, the block index and thread index in the grid corresponding to the node pair (a, b), where grid is the grid of the GPU kernel function, block is a thread block in the grid, and thread is a thread in the thread block;
2) the GPU end assigns one thread to the similarity computation of each node pair (a, b) in the first adjacency matrix, that is: the thread corresponding to the node pair (a, b) is located via the block index and thread index, and that thread computes the similarity of (a, b) on the GPU.
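A minimal sketch of the pair-to-thread mapping described in steps 1) and 2), written in Python for illustration. The 1-D grid of 1-D thread blocks and the block size of 256 are assumptions; the patent does not fix the kernel launch configuration.

```python
# Hypothetical CUDA-style indexing for an n x n matrix of node pairs.
# Each pair (a, b) is flattened row-major and split into a block index
# and a thread index, mirroring steps 1) and 2) above.
BLOCK = 256  # assumed threads per block; not specified in the patent

def pair_to_indices(a, b, n, block=BLOCK):
    """Map the node pair (a, b) to (block index, thread index)."""
    linear = a * n + b  # row-major position of the pair in the matrix
    return linear // block, linear % block

def indices_to_pair(block_idx, thread_idx, n, block=BLOCK):
    """Inverse lookup: which pair a given thread computes."""
    linear = block_idx * block + thread_idx
    return linear // n, linear % n
```

With n = 100 the pair (3, 5) flattens to 305 and lands in block 1, thread 49; the inverse lookup recovers the pair, which is exactly the search performed by each GPU thread in step 2).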
When the CPU end reads in a plurality of images as the host, the step in which the GPU end computes the adjacency matrix W is specifically: the GPU end computes the second adjacency matrix, comprising:
1) searching the location matrix P_{k-1} of the (k-1)-th iteration for nonzero values, and recording the row index, column index, and value of each nonzero element in the three arrays row, col, and value respectively;
2) computing the location matrix P_k of the k-th iteration from the location matrix P_{k-1};
3) computing the sum of the diagonal elements;
4) adding the M_k obtained in this iteration into S(a, b): S(a, b) = S(a, b) + M_k.
When the CPU end reads in a plurality of images as the host, the method further comprises:
the CPU end obtains the transition matrix T, and the GPU end, as the device, receives the transition matrix T.
When the CPU end reads in a plurality of images as the host, the method further comprises: storing the transition matrix T as a sparse matrix in row-compressed storage. The step in which the GPU end computes the adjacency matrix W is specifically: the GPU end computes the second adjacency matrix, comprising:
1) the CPU end loops K times, each time invoking the GPU-end kernel function to compute similarities in parallel;
2) the GPU end passes the computation results back to the CPU end;
3) the GPU end computes the similarity value S(a, b) of the corresponding element in the similarity matrix S: S(a, b) = S(a, b) + M_k.
The CPU-end loop that invokes the GPU-end kernel function K times to compute similarities in parallel specifically comprises:
a) computing the index x of the nonzero value in T_i;
b) computing the index y of the nonzero value in T_j;
c) computing the similarity for the corresponding indices;
d) computing the position of the node pair (a, b) in the location matrix.
The beneficial effects of the technical scheme provided by the invention are: the CPU end, as the host, reads in a plurality of story texts or images, builds a graph model, and obtains the adjacency matrix; the GPU end computes the adjacency matrix and transfers the result to the CPU end. The method improves the accuracy of graph-node similarity computation, reduces computational complexity and running time, and satisfies the needs of practical applications. Experimental results show that, at comparable accuracy, the acceleration algorithm proposed by the invention achieves an average speed-up of more than 100 times.
Description of drawings
Fig. 1(a) and Fig. 1(b) are the original images;
Fig. 1(c) and Fig. 1(d) are the saliency-detection results computed by this method;
Fig. 1(e) and Fig. 1(f) are the saliency-detection results computed by the prior art;
Fig. 2 is a flowchart of the structure-based graph-node similarity parallel computation method;
Fig. 3 is another flowchart of the structure-based graph-node similarity parallel computation method;
Fig. 4 is another flowchart of the structure-based graph-node similarity parallel computation method.
Embodiment
To make the purpose, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The CUDA programming model distinguishes two concepts: host and device. The computer's CPU is commonly called the host (Host) and its GPU the device (Device). A typical system may contain one host and several devices.
When programming with the CUDA model, tasks can be assigned to the host end and the device end respectively. The host end handles program logic, transactional work, and computation suited to serial execution, while the device end carries out thread-level tasks suited to high parallelism. The CPU and GPU have separate memory address spaces: the host's main memory and the device's video memory. CUDA manipulates host memory with the same syntax and functions as ordinary C; operating on video memory requires calling the memory-management functions of the CUDA API, which cover allocating, freeing, and initializing video-memory space, as well as copying data between main memory and video memory. After the parallelizable parts of a program have been identified, those tasks can be handed over to the GPU for computation.
To reduce computational complexity and running time and satisfy the needs of practical applications, an embodiment of the present invention provides a structure-based graph-node similarity parallel computation method, comprising the following steps:
Embodiment 1
101: the CPU end, as the host, reads in a plurality of story texts, builds a graph model, and obtains the first adjacency matrix W of the graph;
A node in the graph represents a word in a story, and an edge between two nodes represents the similarity between them. According to the first similarity-measurement rule, edges are established in the graph model between words within a story and between words of different stories, yielding the first adjacency matrix W of the graph.
The first similarity-measurement rule is set according to the needs of the practical application. For example: the similarity between words within a story is jointly determined by the frequency of word A, the frequency of word B, the number of times A and B occur together, and whether the distance between the two words is less than a preset value; the similarity between words of different stories is determined by the frequency of word A and the frequency of word B. The preset value is set according to the needs of the practical application.
102: the GPU end, as the device, receives the first adjacency matrix W transmitted by the CPU end, and the GPU end computes the first adjacency matrix W;
The first adjacency matrix W is determined by the similarities of all node pairs in the matrix. This step specifically comprises:
1) assigning computation tasks to GPU threads: computing, from the location indices of nodes a and b in the first adjacency matrix W, the block index and thread index in the grid corresponding to the node pair (a, b), where grid is the grid of the GPU kernel function, block is a thread block in the grid, and thread is a thread in the thread block;
2) the GPU end assigns one thread to the similarity computation of each node pair (a, b) in the first adjacency matrix W, that is: the thread corresponding to the node pair (a, b) is located via the block index and thread index, and that thread computes the similarity of (a, b) on the GPU:
For a given directed graph G, let s(a, b) denote the similarity between node a and node b. The SimRank similarity of the two nodes is defined as follows:
for a = b, s(a, b) = 1; for a ≠ b, s(a, b) is computed as
s(a, b) = C / (|I(a)| · |I(b)|) · Σ_{i=1}^{|I(a)|} Σ_{j=1}^{|I(b)|} s(I_i(a), I_j(b))
where C is a constant coefficient between 0 and 1, |I(a)| and |I(b)| denote the numbers of in-neighbours of a and b respectively, I_i(a) denotes the i-th in-neighbour of node a, and I_j(b) denotes the j-th in-neighbour of node b.
In this all-pairs algorithm, the similarities between all node pairs are computed simultaneously for use by the next iteration. Let R_k(a, b) denote the SimRank similarity of the pair (a, b) at the k-th iteration; then S(a, b) = lim_{k→∞} R_k(a, b).
R_k(a, b) is computed iteratively. At initialization, R_0(a, b) = 0 when a is not equal to b, and R_0(a, b) = 1 otherwise; the iteration then proceeds by the following formula:
R_{k+1}(a, b) = C / (|I(a)| · |I(b)|) · Σ_{i=1}^{|I(a)|} Σ_{j=1}^{|I(b)|} R_k(I_i(a), I_j(b))
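The all-pairs iteration above can be sketched serially with NumPy. The matrix form R ← C · Wᵀ R W, with W the adjacency matrix column-normalized by in-degree and the diagonal pinned to 1, is a standard dense equivalent of the per-pair update; this is a sketch of the mathematics, not the patent's CUDA kernel.

```python
import numpy as np

def simrank(adj, C=0.5, iters=6):
    """All-pairs SimRank by dense matrix iteration.

    adj[i, j] = 1 means an edge i -> j; each entry R[a, b] plays the role
    of one GPU thread's node pair in the patent's parallel scheme.
    """
    indeg = adj.sum(axis=0)
    # column-normalize; guard against zero in-degree
    W = adj / np.where(indeg == 0, 1.0, indeg)
    R = np.eye(adj.shape[0])
    for _ in range(iters):
        R = C * (W.T @ R @ W)      # R_{k+1} from R_k
        np.fill_diagonal(R, 1.0)   # s(a, a) = 1 by definition
    return R

# tiny example: node 2 points to both 0 and 1, so 0 and 1 share an in-neighbour
A = np.array([[0, 0, 0],
              [0, 0, 0],
              [1, 1, 0]], dtype=float)
print(simrank(A)[0, 1])  # 0.5 = C * s(2, 2)
```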
103: the GPU end obtains the first adjacency matrix W and transfers it to the CPU end.
Embodiment 2
201: the CPU end, as the host, takes a plurality of input images, builds a graph model, and obtains the second adjacency matrix W and the transition matrix T of the graph;
A node in the graph represents a superpixel in an image, and an edge between two nodes represents the similarity between them. According to the second similarity-measurement rule, edges are established in the graph model between superpixels within one image and between superpixels of different images; the second adjacency matrix W is then computed, the transition matrix T is computed from W, and the algorithm parameters, the constant decay factor C and the error err, are read in.
The second similarity-measurement rule is set according to the needs of the practical application. For example: a distance descriptor is computed for each superpixel; the pairwise similarity between superpixels within an image is computed from the distance descriptors, and the pairwise similarity between superpixels of different images is computed from the distance descriptors as well.
During implementation, the CPU end also needs to:
1) initialize the similarity matrix and the location matrix: the similarity matrix is initialized to zero, S(a, b) = 0; the location matrix is initialized to 1 at the entry corresponding to the starting pair.
This follows from a property of the algorithm: the similarity between two nodes a and b equals the probability that two random walkers, starting from a and b respectively and walking randomly on the reversed directed graph, meet for the first time.
After k steps, the location matrix describes two random walkers a and b that have not met before: its element [i, j] is the probability that walkers a and b arrive at nodes i and j respectively after walking k steps.
2) compute the iteration count K: K = (int) log_Δ C.
Here C denotes the decay factor, C = 0.5, and Δ denotes the error, Δ = 0.01.
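A small numeric check of the iteration count under the stated parameters. The formula is read here as K = ⌊log Δ / log C⌋, i.e. the truncation depth at which the decaying term C^K reaches the order of the error bound Δ; taken literally as log_Δ C the value would be below 1, so this reading is an assumption.

```python
import math

C = 0.5       # decay factor
delta = 0.01  # error bound

# truncation depth: number of iterations retained in the similarity series
K = int(math.log(delta) / math.log(C))
print(K)  # 6
```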
202: the GPU end, as the device, receives the second adjacency matrix W and the transition matrix T transmitted by the CPU end, and the GPU end computes the second adjacency matrix W;
This step specifically comprises:
1) searching the matrix P_{k-1} (the location matrix of the (k-1)-th iteration) for nonzero values, and recording the row index, column index, and value of each nonzero element in the three arrays row, col, and value respectively;
While scanning the matrix P_{k-1}, whenever a nonzero element is encountered, its row number is stored in the array row, its column number in the array col, and its value in the array value.
2) computing the location matrix P_k of the k-th iteration from the matrix P_{k-1}, by the formula P_k = Tᵀ · P_{k-1} · T, where T_i and T_j are row vectors of the transition matrix T, T_i' is the transpose of T_i, and |V| is the number of neighbours of a node.
During implementation, the algorithm of the corresponding kernel function is as follows: first, compute from the built-in variables the abscissa and ordinate to be computed in the location matrix; then initialize the location matrix P_k so that P_k[a, b] = 0; finally, loop n times (n being the number of graph nodes) computing:
P_k[a, b] = P_k[a, b] + T[i, a] * T[j, b] * value[m]
where T[i, a] is the element in row i, column a of the transition matrix T, T[j, b] is the element in row j, column b of the transition matrix T, and value[m] is the value of P_{k-1}[i, j].
3) computing the sum of the diagonal elements;
4) adding the M_k obtained in this iteration into S(a, b): S(a, b) = S(a, b) + M_k.
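A serial sketch of steps 1) to 4) for a single node pair, under explicit assumptions: the location-matrix recurrence is taken as P_k = Tᵀ P_{k-1} T, and M_k is taken as the diagonal sum of P_k weighted by C^k; the patent states only "add M_k into S(a, b)", so the C^k weighting and the omission of first-meeting bookkeeping are simplifications.

```python
import numpy as np

def pair_similarity(T, a, b, C=0.5, K=6):
    """Iterate the location matrix for the pair (a, b) and accumulate
    the weighted diagonal sums M_k into S(a, b)."""
    n = T.shape[0]
    P = np.zeros((n, n))
    P[a, b] = 1.0                      # both walkers at their start nodes
    S = 0.0
    for k in range(1, K + 1):
        P = T.T @ P @ T                # step 2): propagate both walkers
        M_k = (C ** k) * np.trace(P)   # step 3): diagonal sum = walkers meet
        S += M_k                       # step 4): accumulate into S(a, b)
    return S

# uniform 2-node transition matrix: the walkers coincide with probability 1/2 at each step
T = np.array([[0.5, 0.5],
              [0.5, 0.5]])
print(pair_similarity(T, 0, 1))  # 0.4921875 = sum over k=1..6 of 0.5^k * 0.5
```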
203: the GPU end obtains the second adjacency matrix W and transfers it to the CPU end.
Embodiment 3
When the graph is large and sparse, in order to improve the speed of computing the second adjacency matrix, the method can also compute the second adjacency matrix by step 302.
301: the CPU end, as the host, takes a plurality of input images, builds a graph model, obtains the second adjacency matrix W and the transition matrix T of the graph, and stores T as a sparse matrix in CRS (Compressed Row Storage) format;
This storage format keeps the following vectors in contiguous memory: the val array stores the nonzero matrix elements in row-major order, the col array stores the column index of each element in the val array, and the rowptr vector stores the index in the val array at which each row begins.
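The CRS layout just described can be illustrated directly; val, col, and rowptr below follow the definitions in the text.

```python
def to_crs(M):
    """Convert a dense matrix (list of rows) to compressed row storage:
    val    - nonzero values in row-major order
    col    - column index of each value in val
    rowptr - index in val at which each row begins (length = rows + 1)
    """
    val, col, rowptr = [], [], [0]
    for row in M:
        for j, x in enumerate(row):
            if x != 0:
                val.append(x)
                col.append(j)
        rowptr.append(len(val))
    return val, col, rowptr

M = [[0, 3, 0],
     [2, 0, 1],
     [0, 0, 0]]
print(to_crs(M))  # ([3, 2, 1], [1, 0, 2], [0, 1, 3, 3])
```

The empty third row appears only as a repeated entry in rowptr, which is what makes the format economical for the large sparse transition matrices this embodiment targets.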
A node in the graph represents a superpixel in an image, and an edge between two nodes represents the similarity between them. According to the second similarity-measurement rule, edges are established in the graph model between superpixels within one image and between superpixels of different images; the second adjacency matrix W is then computed, the transition matrix T is computed from W, and the algorithm parameters, the constant C and the error err, are read in.
The second similarity-measurement rule is set according to the needs of the practical application. For example: a distance descriptor is computed for each superpixel; the pairwise similarity between superpixels within an image is computed from the distance descriptors, and the pairwise similarity between superpixels of different images is computed from the distance descriptors as well.
During implementation, the CPU end also needs to:
1) initialize the similarity matrix and the location matrix: the similarity matrix is initialized to zero, S(a, b) = 0; the location matrix is initialized to 1 at the entry corresponding to the starting pair.
This follows from a property of the algorithm: the similarity between two nodes a and b equals the probability that two random walkers, starting from a and b respectively and walking randomly on the reversed directed graph, meet for the first time.
After k steps, the location matrix describes two random walkers a and b that have not met before: its element [i, j] is the probability that walkers a and b arrive at nodes i and j respectively after walking k steps.
2) compute the iteration count K: K = (int) log_Δ C.
302: the GPU end, as the device, receives the second adjacency matrix W and the transition matrix T transmitted by the CPU end, and the GPU end computes the second adjacency matrix W;
This step specifically comprises:
1) the CPU end loops K times, each time invoking the GPU-end kernel function to compute similarities in parallel;
For each pair of corresponding nonzero elements in the transition matrix T, the kernel function is invoked and one thread is started; the thread computes from the corresponding nonzero values and adds its result to the corresponding position in the location matrix P_k. This mainly comprises the following steps:
a) computing the index x of the nonzero value in T_i: the computation determines that this thread needs to use the x-th nonzero value in T_i;
b) computing the index y of the nonzero value in T_j: the computation determines that this thread needs to use the y-th nonzero value in T_j;
c) computing the similarity for the corresponding indices: according to x and y, the corresponding values in T_i and T_j are fetched and used in the computation; the similarity is denoted s;
d) computing the position of the node pair (a, b) in the location matrix: from the indices x and y, the index in the location matrix at which s should be inserted is computed.
Here the transition matrix T is the result of column-normalizing the adjacency matrix of the reversed directed graph.
2) the GPU end passes the computation results back to the CPU end: the matrix P_k is passed back to the CPU end;
3) the GPU end computes the similarity value S(a, b) of the corresponding element in the similarity matrix S: S(a, b) = S(a, b) + M_k.
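Steps a) to d) can be mimicked serially: each pairing of a nonzero of T_i with a nonzero of T_j plays the role of one GPU thread. The CRS-like row structure Trows and the scatter into a dense P_k are illustrative simplifications, not the patent's kernel.

```python
def position_update(Trows, Pprev_nonzeros, n):
    """One 'kernel launch' per nonzero of P_{k-1}: scatter
    T[i, a] * T[j, b] * value into P_k[a, b].

    Trows[i] is a list of (column, value) pairs for row i of T
    (the col/val halves of a CRS row); Pprev_nonzeros is a list of
    (i, j, value) triples for the nonzeros of P_{k-1}.
    """
    Pk = [[0.0] * n for _ in range(n)]
    for i, j, v in Pprev_nonzeros:
        for a, tia in Trows.get(i, []):      # a) x-th nonzero of T_i
            for b, tjb in Trows.get(j, []):  # b) y-th nonzero of T_j
                s = tia * tjb * v            # c) similarity contribution
                Pk[a][b] += s                # d) position (a, b) in P_k
    return Pk

# rows of T: row 0 has 0.5 in column 1, row 1 has 1.0 in column 0
Trows = {0: [(1, 0.5)], 1: [(0, 1.0)]}
print(position_update(Trows, [(0, 1, 1.0)], 2))  # [[0.0, 0.0], [0.5, 0.0]]
```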
303: the GPU end obtains the second adjacency matrix W and transfers it to the CPU end.
The feasibility of the method is verified below with concrete experiments, described in detail hereinafter:
1. Story texts
Five story texts are chosen, each corresponding to a topic id: 20001, 20015, 20039, 20070, and 20076. The five story texts are processed with the prior art and with this method respectively to obtain the corresponding adjacency matrices and hence the similarity of each story text; the average intra-topic similarity, the average inter-topic similarity, and the average running time are then computed, as shown in Table 1 and Table 2 respectively.
Table 1
Table 2
Comparing Table 1 and Table 2 shows that the ratios of average intra-topic to average inter-topic similarity obtained by the prior art and by this method differ little, but the average running time of this method is far smaller than that of the prior art.
2. Co-saliency detection on multiple images
Fig. 1(a) and Fig. 1(b) are processed with the prior art and with this method simultaneously. Fig. 1(c) and Fig. 1(d) show the saliency-detection results computed by this algorithm, and Fig. 1(e) and Fig. 1(f) show those of the original serial algorithm. As can be seen from the figures, while the saliency results differ little, the running time of this method is far smaller than that of the prior art; see Table 3.
Table 3
| | Running time |
|---|---|
| Serial SimRank | 1.98 minutes |
| Parallel SimRank | 1.27 minutes |
It will be appreciated by those skilled in the art that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the serial numbers of the above embodiments of the invention are for description only and do not indicate the relative merits of the embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (6)
1. A structure-based graph-node similarity parallel computation method, characterized in that the method comprises the following steps:
(1) the CPU end, as the host, reads in a plurality of story texts or images, builds a graph model, and obtains the adjacency matrix;
(2) the GPU end, as the device, receives the adjacency matrix transmitted by the CPU end, and the GPU end computes the adjacency matrix;
(3) the GPU end obtains the resulting adjacency matrix and transfers it to the CPU end.
2. The structure-based graph-node similarity parallel computation method according to claim 1, characterized in that, when the CPU end reads in a plurality of story texts as the host, the step in which the GPU end computes the adjacency matrix W is specifically: the GPU end computes the first adjacency matrix, namely:
1) computing, from the location indices of nodes a and b in the first adjacency matrix, the block index and thread index in the grid corresponding to the node pair (a, b), where grid is the grid of the GPU kernel function, block is a thread block in the grid, and thread is a thread in the thread block;
2) the GPU end assigns one thread to the similarity computation of each node pair (a, b) in the first adjacency matrix, that is: the thread corresponding to the node pair (a, b) is located via the block index and thread index, and that thread computes the similarity of (a, b) on the GPU.
3. The structure-based graph-node similarity parallel computation method according to claim 2, characterized in that, when the CPU end reads in a plurality of images as the host, the step in which the GPU end computes the adjacency matrix W is specifically: the GPU end computes the second adjacency matrix, comprising:
1) searching the location matrix P_{k-1} of the (k-1)-th iteration for nonzero values, and recording the row index, column index, and value of each nonzero element in the three arrays row, col, and value respectively;
2) computing the location matrix P_k of the k-th iteration from the location matrix P_{k-1};
3) computing the sum of the diagonal elements;
4) adding the M_k obtained in this iteration into S(a, b): S(a, b) = S(a, b) + M_k.
4. The structure-based graph-node similarity parallel computation method according to claim 1, characterized in that, when the CPU end reads in a plurality of images as the host, the method further comprises:
the CPU end obtains the transition matrix T, and the GPU end, as the device, receives the transition matrix T.
5. The structure-based graph-node similarity parallel computation method according to claim 4, characterized in that, when the CPU end reads in a plurality of images as the host, the method further comprises: storing the transition matrix T as a sparse matrix in row-compressed storage. The step in which the GPU end computes the adjacency matrix W is specifically: the GPU end computes the second adjacency matrix, comprising:
1) the CPU end loops K times, each time invoking the GPU-end kernel function to compute similarities in parallel;
2) the GPU end passes the computation results back to the CPU end;
3) the GPU end computes the similarity value S(a, b) of the corresponding element in the similarity matrix S: S(a, b) = S(a, b) + M_k.
6. The structure-based graph-node similarity parallel computation method according to claim 5, characterized in that the CPU-end loop that invokes the GPU-end kernel function K times to compute similarities in parallel specifically comprises:
a) computing the index x of the nonzero value in T_i;
b) computing the index y of the nonzero value in T_j;
c) computing the similarity for the corresponding indices;
d) computing the position of the node pair (a, b) in the location matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310102281.7A CN103177414B (en) | 2013-03-27 | 2013-03-27 | Structure-based graph-node similarity parallel computation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103177414A true CN103177414A (en) | 2013-06-26 |
CN103177414B CN103177414B (en) | 2015-12-09 |
Family
ID=48637247
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310102281.7A Active CN103177414B (en) | 2013-03-27 | 2013-03-27 | Structure-based graph-node similarity parallel computation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103177414B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103427844A (en) * | 2013-07-26 | 2013-12-04 | 华中科技大学 | High-speed lossless data compression method based on GPU-CPU hybrid platform |
CN104360985A (en) * | 2014-10-20 | 2015-02-18 | 浪潮电子信息产业股份有限公司 | Method and device for realizing clustering algorithm based on MIC |
WO2016138836A1 (en) * | 2015-03-03 | 2016-09-09 | 华为技术有限公司 | Similarity measurement method and equipment |
CN106204669A (en) * | 2016-07-05 | 2016-12-07 | 电子科技大学 | A kind of parallel image compression sensing method based on GPU platform |
CN106202224A (en) * | 2016-06-29 | 2016-12-07 | 北京百度网讯科技有限公司 | Search processing method and device |
WO2018149299A1 (en) * | 2017-02-20 | 2018-08-23 | 平安科技(深圳)有限公司 | Method of identifying social insurance fraud, device, apparatus, and computer storage medium |
CN110263209A (en) * | 2019-06-27 | 2019-09-20 | 北京百度网讯科技有限公司 | Method and apparatus for generating information |
CN110851987A (en) * | 2019-11-14 | 2020-02-28 | 上汽通用五菱汽车股份有限公司 | Method, apparatus and storage medium for predicting calculated duration based on acceleration ratio |
CN111078957A (en) * | 2019-12-18 | 2020-04-28 | 无锡恒鼎超级计算中心有限公司 | Storage method based on graph storage structure |
CN111860588A (en) * | 2020-06-12 | 2020-10-30 | 华为技术有限公司 | Training method for graph neural network and related equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050231521A1 (en) * | 2004-04-16 | 2005-10-20 | John Harper | System for reducing the number of programs necessary to render an image |
CN102436545A (en) * | 2011-10-13 | 2012-05-02 | 苏州东方楷模医药科技有限公司 | Diversity analysis method based on chemical structure with CPU (Central Processing Unit) acceleration |
Non-Patent Citations (4)
- K.A. Hawick, et al., "Parallel graph component labelling with GPUs and CUDA", Parallel Computing
- Pawan Harish, et al., "Accelerating Large Graph Algorithms on the GPU Using CUDA", High Performance Computing – HiPC 2007, Lecture Notes in Computer Science
- Yongpeng Zhang, et al., "Large-Scale Multi-Dimensional Document Clustering on GPU Clusters", Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
- Zhang Cong, et al., "Research on parallel acceleration of mathematical morphology operations based on GPU", Electronic Design Engineering
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103427844A (en) * | 2013-07-26 | 2013-12-04 | 华中科技大学 | High-speed lossless data compression method based on GPU-CPU hybrid platform |
CN103427844B (en) * | 2013-07-26 | 2016-03-02 | 华中科技大学 | A kind of high-speed lossless data compression method based on GPU and CPU mixing platform |
CN104360985A (en) * | 2014-10-20 | 2015-02-18 | 浪潮电子信息产业股份有限公司 | Method and device for realizing clustering algorithm based on MIC |
CN105989154B (en) * | 2015-03-03 | 2020-07-14 | 华为技术有限公司 | Similarity measurement method and equipment |
US10579703B2 (en) | 2015-03-03 | 2020-03-03 | Huawei Technologies Co., Ltd. | Similarity measurement method and device |
CN105989154A (en) * | 2015-03-03 | 2016-10-05 | 华为技术有限公司 | Similarity measurement method and equipment |
WO2016138836A1 (en) * | 2015-03-03 | 2016-09-09 | 华为技术有限公司 | Similarity measurement method and equipment |
CN106202224A (en) * | 2016-06-29 | 2016-12-07 | 北京百度网讯科技有限公司 | Search processing method and device |
CN106202224B (en) * | 2016-06-29 | 2022-01-07 | 北京百度网讯科技有限公司 | Search processing method and device |
CN106204669A (en) * | 2016-07-05 | 2016-12-07 | 电子科技大学 | A kind of parallel image compression sensing method based on GPU platform |
WO2018149299A1 (en) * | 2017-02-20 | 2018-08-23 | 平安科技(深圳)有限公司 | Method of identifying social insurance fraud, device, apparatus, and computer storage medium |
CN110263209B (en) * | 2019-06-27 | 2021-07-09 | 北京百度网讯科技有限公司 | Method and apparatus for generating information |
CN110263209A (en) * | 2019-06-27 | 2019-09-20 | 北京百度网讯科技有限公司 | Method and apparatus for generating information |
CN110851987A (en) * | 2019-11-14 | 2020-02-28 | 上汽通用五菱汽车股份有限公司 | Method, apparatus and storage medium for predicting calculated duration based on acceleration ratio |
CN110851987B (en) * | 2019-11-14 | 2022-09-09 | 上汽通用五菱汽车股份有限公司 | Method, apparatus and storage medium for predicting calculated duration based on acceleration ratio |
CN111078957A (en) * | 2019-12-18 | 2020-04-28 | 无锡恒鼎超级计算中心有限公司 | Storage method based on graph storage structure |
CN111860588A (en) * | 2020-06-12 | 2020-10-30 | 华为技术有限公司 | Training method for graph neural network and related equipment |
CN111860588B (en) * | 2020-06-12 | 2024-06-21 | 华为技术有限公司 | Training method for graphic neural network and related equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103177414A (en) | Structure-based dependency graph node similarity concurrent computation method | |
US8400458B2 (en) | Method and system for blocking data on a GPU | |
CN108921188B (en) | Parallel CRF method based on Spark big data platform | |
CN108170639B (en) | Tensor CP decomposition implementation method based on distributed environment | |
US20180121388A1 (en) | Symmetric block sparse matrix-vector multiplication | |
CN108140061B (en) | Method, storage medium, and system for determining co-occurrence in graph | |
CN105739951A (en) | GPU-based L1 minimization problem fast solving method | |
CN106709503A (en) | Large spatial data clustering algorithm K-DBSCAN based on density | |
CN106202224B (en) | Search processing method and device | |
US11037356B2 (en) | System and method for executing non-graphical algorithms on a GPU (graphics processing unit) | |
CN113435521A (en) | Neural network model training method and device and computer readable storage medium | |
CN114138231B (en) | Method, circuit and SOC for executing matrix multiplication operation | |
CN109460398A (en) | Complementing method, device and the electronic equipment of time series data | |
CN106484532B (en) | GPGPU parallel calculating method towards SPH fluid simulation | |
CN110264392B (en) | Strong connection graph detection method based on multiple GPUs | |
CN114492753A (en) | Sparse accelerator applied to on-chip training | |
CN104572588A (en) | Matrix inversion processing method and device | |
CN117093538A (en) | Sparse Cholesky decomposition hardware acceleration system and solving method thereof | |
CN104156268B (en) | The load distribution of MapReduce and thread structure optimization method on a kind of GPU | |
CN106844024A (en) | The GPU/CPU dispatching methods and system of a kind of self study run time forecast model | |
CN110059813A (en) | The method, device and equipment of convolutional neural networks is updated using GPU cluster | |
DE102023105572A1 (en) | Efficient matrix multiplication and addition with a group of warps | |
US20220129755A1 (en) | Incorporating a ternary matrix into a neural network | |
CN114116208A (en) | Short wave radiation transmission mode three-dimensional acceleration method based on GPU | |
US9600446B2 (en) | Parallel multicolor incomplete LU factorization preconditioning processor and method of use thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
| TR01 | Transfer of patent right | | Effective date of registration: 2022-01-12. Patentee after: Shenzhen kanghongtai Technology Co., Ltd., Room 4009a, No. 38, Liuxian Third Road, District 72, Bao'an District, Shenzhen, Guangdong Province 518000 (office space). Patentee before: Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072 |