CN103177414A - Structure-based dependency graph node similarity concurrent computation method - Google Patents
- Publication number: CN103177414A (application CN2013101022817A / CN201310102281A)
- Authority: CN (China)
- Prior art keywords: matrix, similarity, GPU, node, CPU
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Classifications: Information Retrieval, DB Structures and FS Structures Therefor; Image Analysis
Abstract
The invention discloses a structure-based parallel method for computing graph-node similarity. The method comprises the following steps: the CPU end, acting as the host, reads in a plurality of story texts or images and builds a graph model to obtain the adjacency matrix of the graph; the GPU end, acting as the device, receives the adjacency matrix output by the CPU end and performs the computation on it; the GPU end then obtains the resulting matrix and transfers it to the CPU end. With this parallel method, the similarity computation is greatly accelerated while high accuracy is maintained, meeting the efficiency and precision requirements of computing over massive media data. Experimental results show that, at comparable accuracy, the acceleration algorithm provided by the method achieves an average speed-up of more than 100 times over the existing algorithm.
Description
Technical field
The present invention relates to the field of media computation, and in particular to a structure-based parallel method for computing graph-node similarity.
Background technology
At present, in the media computation field, problems such as image segmentation, content retrieval, and matching are solved by designing a graph model and diffusing similarity between nodes to obtain the corresponding result. Put simply, graph-node similarity computation is a means of evaluating the structural similarity of nodes in a graph (for example, superpixels).
In the prior art, descriptors are usually used to measure the similarity between two nodes, and similarity diffusion is carried out based on the similarity and adjacency relations among a node's neighbours.
In the course of realizing the present invention, the inventors found at least the following shortcomings and defects in the prior art:
As the graph grows, the running time of similarity diffusion increases greatly and the computational complexity rises, even reaching O(kn^4), which cannot satisfy the needs of practical applications.
Summary of the invention
The invention provides a structure-based parallel method for computing graph-node similarity. The method reduces computational complexity and running time and satisfies the needs of practical applications, as described in detail hereinafter:
A structure-based graph-node similarity parallel computation method, comprising the following steps:
(1) the CPU end, as the host, reads in a plurality of story texts or images, builds a graph model, and obtains the adjacency matrix;
(2) the GPU end, as the device, receives the adjacency matrix transmitted by the CPU end, and the GPU end computes the adjacency matrix;
(3) the GPU end obtains the resulting adjacency matrix and transfers it to the CPU end.
When the CPU end reads in a plurality of story texts as the host, the step in which the GPU end computes the adjacency matrix W is specifically: the GPU end computes the first adjacency matrix, that is:
1) computing, from the location indices of nodes a and b in the first adjacency matrix, the block index and thread index in the grid corresponding to the node pair (a, b), where grid is the grid of the GPU kernel function, block is a thread block in the grid, and thread is a thread in the thread block;
2) the GPU end assigns one thread to the similarity computation of each node pair (a, b) in the first adjacency matrix, that is: the thread corresponding to the node pair (a, b) is located via the block index and thread index, and that thread computes the similarity of (a, b) on the GPU.
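A minimal sketch of the pair-to-thread mapping described in steps 1) and 2), written in Python for illustration. The 1-D grid of 1-D thread blocks and the block size of 256 are assumptions; the patent does not fix the kernel launch configuration.

```python
# Hypothetical CUDA-style indexing for an n x n matrix of node pairs.
# Each pair (a, b) is flattened row-major and split into a block index
# and a thread index, mirroring steps 1) and 2) above.
BLOCK = 256  # assumed threads per block; not specified in the patent

def pair_to_indices(a, b, n, block=BLOCK):
    """Map the node pair (a, b) to (block index, thread index)."""
    linear = a * n + b  # row-major position of the pair in the matrix
    return linear // block, linear % block

def indices_to_pair(block_idx, thread_idx, n, block=BLOCK):
    """Inverse lookup: which pair a given thread computes."""
    linear = block_idx * block + thread_idx
    return linear // n, linear % n
```

With n = 100 the pair (3, 5) flattens to 305 and lands in block 1, thread 49; the inverse lookup recovers the pair, which is exactly the search performed by each GPU thread in step 2).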
When the CPU end reads in a plurality of images as the host, the step in which the GPU end computes the adjacency matrix W is specifically: the GPU end computes the second adjacency matrix, comprising:
1) searching the location matrix P_{k-1} of the (k-1)-th iteration for nonzero values, and recording the row index, column index, and value of each nonzero element in the three arrays row, col, and value respectively;
2) computing the location matrix P_k of the k-th iteration from the location matrix P_{k-1};
3) computing the sum of the diagonal elements;
4) adding the M_k obtained in this iteration into S(a, b): S(a, b) = S(a, b) + M_k.
When the CPU end reads in a plurality of images as the host, the method further comprises:
the CPU end obtains the transition matrix T, and the GPU end, as the device, receives the transition matrix T.
When the CPU end reads in a plurality of images as the host, the method further comprises: storing the transition matrix T as a sparse matrix in row-compressed storage. The step in which the GPU end computes the adjacency matrix W is specifically: the GPU end computes the second adjacency matrix, comprising:
1) the CPU end loops K times, each time invoking the GPU-end kernel function to compute similarities in parallel;
2) the GPU end passes the computation results back to the CPU end;
3) the GPU end computes the similarity value S(a, b) of the corresponding element in the similarity matrix S: S(a, b) = S(a, b) + M_k.
The CPU-end loop that invokes the GPU-end kernel function K times to compute similarities in parallel specifically comprises:
a) computing the index x of the nonzero value in T_i;
b) computing the index y of the nonzero value in T_j;
c) computing the similarity for the corresponding indices;
d) computing the position of the node pair (a, b) in the location matrix.
The beneficial effects of the technical scheme provided by the invention are: the CPU end, as the host, reads in a plurality of story texts or images, builds a graph model, and obtains the adjacency matrix; the GPU end computes the adjacency matrix and transfers the result to the CPU end. The method improves the accuracy of graph-node similarity computation, reduces computational complexity and running time, and satisfies the needs of practical applications. Experimental results show that, at comparable accuracy, the acceleration algorithm proposed by the invention achieves an average speed-up of more than 100 times.
Description of drawings
Fig. 1(a) and Fig. 1(b) are the original images;
Fig. 1(c) and Fig. 1(d) are the saliency-detection results computed by this method;
Fig. 1(e) and Fig. 1(f) are the saliency-detection results computed by the prior art;
Fig. 2 is a flowchart of the structure-based graph-node similarity parallel computation method;
Fig. 3 is another flowchart of the structure-based graph-node similarity parallel computation method;
Fig. 4 is another flowchart of the structure-based graph-node similarity parallel computation method.
Embodiment
To make the purpose, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The CUDA programming model distinguishes two concepts: host and device. The computer's CPU is commonly called the host (Host) and its GPU the device (Device). A typical system may contain one host and several devices.
When programming with the CUDA model, tasks can be assigned to the host end and the device end respectively. The host end handles program logic, transactional work, and computation suited to serial execution, while the device end carries out thread-level tasks suited to high parallelism. The CPU and GPU have separate memory address spaces: the host's main memory and the device's video memory. CUDA manipulates host memory with the same syntax and functions as ordinary C; operating on video memory requires calling the memory-management functions of the CUDA API, which cover allocating, freeing, and initializing video-memory space, as well as copying data between main memory and video memory. After the parallelizable parts of a program have been identified, those tasks can be handed over to the GPU for computation.
To reduce computational complexity and running time and satisfy the needs of practical applications, an embodiment of the present invention provides a structure-based graph-node similarity parallel computation method, comprising the following steps:
Embodiment 1
101: the CPU end, as the host, reads in a plurality of story texts, builds a graph model, and obtains the first adjacency matrix W of the graph;
A node in the graph represents a word in a story, and an edge between two nodes represents the similarity between them. According to the first similarity-measurement rule, edges are established in the graph model between words within a story and between words of different stories, yielding the first adjacency matrix W of the graph.
The first similarity-measurement rule is set according to the needs of the practical application. For example: the similarity between words within a story is jointly determined by the frequency of word A, the frequency of word B, the number of times A and B occur together, and whether the distance between the two words is less than a preset value; the similarity between words of different stories is determined by the frequency of word A and the frequency of word B. The preset value is set according to the needs of the practical application.
102: the GPU end, as the device, receives the first adjacency matrix W transmitted by the CPU end, and the GPU end computes the first adjacency matrix W;
The first adjacency matrix W is determined by the similarities of all node pairs in the matrix. This step specifically comprises:
1) assigning computation tasks to GPU threads: computing, from the location indices of nodes a and b in the first adjacency matrix W, the block index and thread index in the grid corresponding to the node pair (a, b), where grid is the grid of the GPU kernel function, block is a thread block in the grid, and thread is a thread in the thread block;
2) the GPU end assigns one thread to the similarity computation of each node pair (a, b) in the first adjacency matrix W, that is: the thread corresponding to the node pair (a, b) is located via the block index and thread index, and that thread computes the similarity of (a, b) on the GPU:
For a given directed graph G, let s(a, b) denote the similarity between node a and node b. The SimRank similarity of the two nodes is defined as follows:
for a = b, s(a, b) = 1; for a ≠ b, s(a, b) is computed as
s(a, b) = C / (|I(a)| · |I(b)|) · Σ_{i=1}^{|I(a)|} Σ_{j=1}^{|I(b)|} s(I_i(a), I_j(b))
where C is a constant coefficient between 0 and 1, |I(a)| and |I(b)| denote the numbers of in-neighbours of a and b respectively, I_i(a) denotes the i-th in-neighbour of node a, and I_j(b) denotes the j-th in-neighbour of node b.
In this all-pairs algorithm, the similarities between all node pairs are computed simultaneously for use by the next iteration. Let R_k(a, b) denote the SimRank similarity of the pair (a, b) at the k-th iteration; then S(a, b) = lim_{k→∞} R_k(a, b).
R_k(a, b) is computed iteratively. At initialization, R_0(a, b) = 0 when a is not equal to b, and R_0(a, b) = 1 otherwise; the iteration then proceeds by the following formula:
R_{k+1}(a, b) = C / (|I(a)| · |I(b)|) · Σ_{i=1}^{|I(a)|} Σ_{j=1}^{|I(b)|} R_k(I_i(a), I_j(b))
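The all-pairs iteration above can be sketched serially with NumPy. The matrix form R ← C · Wᵀ R W, with W the adjacency matrix column-normalized by in-degree and the diagonal pinned to 1, is a standard dense equivalent of the per-pair update; this is a sketch of the mathematics, not the patent's CUDA kernel.

```python
import numpy as np

def simrank(adj, C=0.5, iters=6):
    """All-pairs SimRank by dense matrix iteration.

    adj[i, j] = 1 means an edge i -> j; each entry R[a, b] plays the role
    of one GPU thread's node pair in the patent's parallel scheme.
    """
    indeg = adj.sum(axis=0)
    # column-normalize; guard against zero in-degree
    W = adj / np.where(indeg == 0, 1.0, indeg)
    R = np.eye(adj.shape[0])
    for _ in range(iters):
        R = C * (W.T @ R @ W)      # R_{k+1} from R_k
        np.fill_diagonal(R, 1.0)   # s(a, a) = 1 by definition
    return R

# tiny example: node 2 points to both 0 and 1, so 0 and 1 share an in-neighbour
A = np.array([[0, 0, 0],
              [0, 0, 0],
              [1, 1, 0]], dtype=float)
print(simrank(A)[0, 1])  # 0.5 = C * s(2, 2)
```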
103: the GPU end obtains the first adjacency matrix W and transfers it to the CPU end.
Embodiment 2
201: the CPU end, as the host, takes a plurality of input images, builds a graph model, and obtains the second adjacency matrix W and the transition matrix T of the graph;
A node in the graph represents a superpixel in an image, and an edge between two nodes represents the similarity between them. According to the second similarity-measurement rule, edges are established in the graph model between superpixels within one image and between superpixels of different images; the second adjacency matrix W is then computed, the transition matrix T is computed from W, and the algorithm parameters, the constant decay factor C and the error err, are read in.
The second similarity-measurement rule is set according to the needs of the practical application. For example: a distance descriptor is computed for each superpixel; the pairwise similarity between superpixels within an image is computed from the distance descriptors, and the pairwise similarity between superpixels of different images is computed from the distance descriptors as well.
During implementation, the CPU end also needs to:
1) initialize the similarity matrix and the location matrix: the similarity matrix is initialized to zero, S(a, b) = 0; the location matrix is initialized to 1 at the entry corresponding to the starting pair.
This follows from a property of the algorithm: the similarity between two nodes a and b equals the probability that two random walkers, starting from a and b respectively and walking randomly on the reversed directed graph, meet for the first time.
After k steps, the location matrix describes two random walkers a and b that have not met before: its element [i, j] is the probability that walkers a and b arrive at nodes i and j respectively after walking k steps.
2) compute the iteration count K: K = (int) log_Δ C.
Here C denotes the decay factor, C = 0.5, and Δ denotes the error, Δ = 0.01.
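A small numeric check of the iteration count under the stated parameters. The formula is read here as K = ⌊log Δ / log C⌋, i.e. the truncation depth at which the decaying term C^K reaches the order of the error bound Δ; taken literally as log_Δ C the value would be below 1, so this reading is an assumption.

```python
import math

C = 0.5       # decay factor
delta = 0.01  # error bound

# truncation depth: number of iterations retained in the similarity series
K = int(math.log(delta) / math.log(C))
print(K)  # 6
```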
202: the GPU end, as the device, receives the second adjacency matrix W and the transition matrix T transmitted by the CPU end, and the GPU end computes the second adjacency matrix W;
This step specifically comprises:
1) searching the matrix P_{k-1} (the location matrix of the (k-1)-th iteration) for nonzero values, and recording the row index, column index, and value of each nonzero element in the three arrays row, col, and value respectively;
While scanning the matrix P_{k-1}, whenever a nonzero element is encountered, its row number is stored in the array row, its column number in the array col, and its value in the array value.
2) computing the location matrix P_k of the k-th iteration from the matrix P_{k-1}, by the formula P_k = Tᵀ · P_{k-1} · T, where T_i and T_j are row vectors of the transition matrix T, T_i' is the transpose of T_i, and |V| is the number of neighbours of a node.
During implementation, the algorithm of the corresponding kernel function is as follows: first, compute from the built-in variables the abscissa and ordinate to be computed in the location matrix; then initialize the location matrix P_k so that P_k[a, b] = 0; finally, loop n times (n being the number of graph nodes) computing:
P_k[a, b] = P_k[a, b] + T[i, a] * T[j, b] * value[m]
where T[i, a] is the element in row i, column a of the transition matrix T, T[j, b] is the element in row j, column b of the transition matrix T, and value[m] is the value of P_{k-1}[i, j].
3) computing the sum of the diagonal elements;
4) adding the M_k obtained in this iteration into S(a, b): S(a, b) = S(a, b) + M_k.
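A serial sketch of steps 1) to 4) for a single node pair, under explicit assumptions: the location-matrix recurrence is taken as P_k = Tᵀ P_{k-1} T, and M_k is taken as the diagonal sum of P_k weighted by C^k; the patent states only "add M_k into S(a, b)", so the C^k weighting and the omission of first-meeting bookkeeping are simplifications.

```python
import numpy as np

def pair_similarity(T, a, b, C=0.5, K=6):
    """Iterate the location matrix for the pair (a, b) and accumulate
    the weighted diagonal sums M_k into S(a, b)."""
    n = T.shape[0]
    P = np.zeros((n, n))
    P[a, b] = 1.0                      # both walkers at their start nodes
    S = 0.0
    for k in range(1, K + 1):
        P = T.T @ P @ T                # step 2): propagate both walkers
        M_k = (C ** k) * np.trace(P)   # step 3): diagonal sum = walkers meet
        S += M_k                       # step 4): accumulate into S(a, b)
    return S

# uniform 2-node transition matrix: the walkers coincide with probability 1/2 at each step
T = np.array([[0.5, 0.5],
              [0.5, 0.5]])
print(pair_similarity(T, 0, 1))  # 0.4921875 = sum over k=1..6 of 0.5^k * 0.5
```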
203: the GPU end obtains the second adjacency matrix W and transfers it to the CPU end.
Embodiment 3
When the graph is large and sparse, in order to improve the speed of computing the second adjacency matrix, the method can also compute the second adjacency matrix by step 302.
301: the CPU end, as the host, takes a plurality of input images, builds a graph model, obtains the second adjacency matrix W and the transition matrix T of the graph, and stores T as a sparse matrix in CRS (Compressed Row Storage) format;
This storage format keeps the following vectors in contiguous memory: the val array stores the nonzero matrix elements in row-major order, the col array stores the column index of each element in the val array, and the rowptr vector stores the index in the val array at which each row begins.
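The CRS layout just described can be illustrated directly; val, col, and rowptr below follow the definitions in the text.

```python
def to_crs(M):
    """Convert a dense matrix (list of rows) to compressed row storage:
    val    - nonzero values in row-major order
    col    - column index of each value in val
    rowptr - index in val at which each row begins (length = rows + 1)
    """
    val, col, rowptr = [], [], [0]
    for row in M:
        for j, x in enumerate(row):
            if x != 0:
                val.append(x)
                col.append(j)
        rowptr.append(len(val))
    return val, col, rowptr

M = [[0, 3, 0],
     [2, 0, 1],
     [0, 0, 0]]
print(to_crs(M))  # ([3, 2, 1], [1, 0, 2], [0, 1, 3, 3])
```

The empty third row appears only as a repeated entry in rowptr, which is what makes the format economical for the large sparse transition matrices this embodiment targets.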
A node in the graph represents a superpixel in an image, and an edge between two nodes represents the similarity between them. According to the second similarity-measurement rule, edges are established in the graph model between superpixels within one image and between superpixels of different images; the second adjacency matrix W is then computed, the transition matrix T is computed from W, and the algorithm parameters, the constant C and the error err, are read in.
The second similarity-measurement rule is set according to the needs of the practical application. For example: a distance descriptor is computed for each superpixel; the pairwise similarity between superpixels within an image is computed from the distance descriptors, and the pairwise similarity between superpixels of different images is computed from the distance descriptors as well.
During implementation, the CPU end also needs to:
1) initialize the similarity matrix and the location matrix: the similarity matrix is initialized to zero, S(a, b) = 0; the location matrix is initialized to 1 at the entry corresponding to the starting pair.
This follows from a property of the algorithm: the similarity between two nodes a and b equals the probability that two random walkers, starting from a and b respectively and walking randomly on the reversed directed graph, meet for the first time.
After k steps, the location matrix describes two random walkers a and b that have not met before: its element [i, j] is the probability that walkers a and b arrive at nodes i and j respectively after walking k steps.
2) compute the iteration count K: K = (int) log_Δ C.
302: the GPU end, as the device, receives the second adjacency matrix W and the transition matrix T transmitted by the CPU end, and the GPU end computes the second adjacency matrix W;
This step specifically comprises:
1) the CPU end loops K times, each time invoking the GPU-end kernel function to compute similarities in parallel;
For each pair of corresponding nonzero elements in the transition matrix T, the kernel function is invoked and one thread is started; the thread computes from the corresponding nonzero values and adds its result to the corresponding position in the location matrix P_k. This mainly comprises the following steps:
a) computing the index x of the nonzero value in T_i: the computation determines that this thread needs to use the x-th nonzero value in T_i;
b) computing the index y of the nonzero value in T_j: the computation determines that this thread needs to use the y-th nonzero value in T_j;
c) computing the similarity for the corresponding indices: according to x and y, the corresponding values in T_i and T_j are fetched and used in the computation; the similarity is denoted s;
d) computing the position of the node pair (a, b) in the location matrix: from the indices x and y, the index in the location matrix at which s should be inserted is computed.
Here the transition matrix T is the result of column-normalizing the adjacency matrix of the reversed directed graph.
2) the GPU end passes the computation results back to the CPU end: the matrix P_k is passed back to the CPU end;
3) the GPU end computes the similarity value S(a, b) of the corresponding element in the similarity matrix S: S(a, b) = S(a, b) + M_k.
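Steps a) to d) can be mimicked serially: each pairing of a nonzero of T_i with a nonzero of T_j plays the role of one GPU thread. The CRS-like row structure Trows and the scatter into a dense P_k are illustrative simplifications, not the patent's kernel.

```python
def position_update(Trows, Pprev_nonzeros, n):
    """One 'kernel launch' per nonzero of P_{k-1}: scatter
    T[i, a] * T[j, b] * value into P_k[a, b].

    Trows[i] is a list of (column, value) pairs for row i of T
    (the col/val halves of a CRS row); Pprev_nonzeros is a list of
    (i, j, value) triples for the nonzeros of P_{k-1}.
    """
    Pk = [[0.0] * n for _ in range(n)]
    for i, j, v in Pprev_nonzeros:
        for a, tia in Trows.get(i, []):      # a) x-th nonzero of T_i
            for b, tjb in Trows.get(j, []):  # b) y-th nonzero of T_j
                s = tia * tjb * v            # c) similarity contribution
                Pk[a][b] += s                # d) position (a, b) in P_k
    return Pk

# rows of T: row 0 has 0.5 in column 1, row 1 has 1.0 in column 0
Trows = {0: [(1, 0.5)], 1: [(0, 1.0)]}
print(position_update(Trows, [(0, 1, 1.0)], 2))  # [[0.0, 0.0], [0.5, 0.0]]
```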
303: the GPU end obtains the second adjacency matrix W and transfers it to the CPU end.
The feasibility of the method is verified below with concrete experiments, described in detail hereinafter:
1. Story texts
Five story texts are chosen, each corresponding to a topic id: 20001, 20015, 20039, 20070, and 20076. The five story texts are processed with the prior art and with this method respectively to obtain the corresponding adjacency matrices and hence the similarity of each story text; the average intra-topic similarity, the average inter-topic similarity, and the average running time are then computed, as shown in Table 1 and Table 2 respectively.
Table 1
Table 2
Comparing Table 1 and Table 2 shows that the ratios of average intra-topic to average inter-topic similarity obtained by the prior art and by this method differ little, but the average running time of this method is far smaller than that of the prior art.
2. Co-saliency detection on multiple images
Fig. 1(a) and Fig. 1(b) are processed with the prior art and with this method simultaneously. Fig. 1(c) and Fig. 1(d) show the saliency-detection results computed by this algorithm, and Fig. 1(e) and Fig. 1(f) show those of the original serial algorithm. As can be seen from the figures, while the saliency results differ little, the running time of this method is far smaller than that of the prior art; see Table 3.
Table 3
| | Running time |
|---|---|
| Serial SimRank | 1.98 minutes |
| Parallel SimRank | 1.27 minutes |
It will be appreciated by those skilled in the art that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the serial numbers of the above embodiments of the invention are for description only and do not indicate the relative merits of the embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (6)
1. A structure-based graph-node similarity parallel computation method, characterized in that the method comprises the following steps:
(1) the CPU end, as the host, reads in a plurality of story texts or images, builds a graph model, and obtains the adjacency matrix;
(2) the GPU end, as the device, receives the adjacency matrix transmitted by the CPU end, and the GPU end computes the adjacency matrix;
(3) the GPU end obtains the resulting adjacency matrix and transfers it to the CPU end.
2. The structure-based graph-node similarity parallel computation method according to claim 1, characterized in that, when the CPU end reads in a plurality of story texts as the host, the step in which the GPU end computes the adjacency matrix W is specifically: the GPU end computes the first adjacency matrix, namely:
1) computing, from the location indices of nodes a and b in the first adjacency matrix, the block index and thread index in the grid corresponding to the node pair (a, b), where grid is the grid of the GPU kernel function, block is a thread block in the grid, and thread is a thread in the thread block;
2) the GPU end assigns one thread to the similarity computation of each node pair (a, b) in the first adjacency matrix, that is: the thread corresponding to the node pair (a, b) is located via the block index and thread index, and that thread computes the similarity of (a, b) on the GPU.
3. The structure-based graph-node similarity parallel computation method according to claim 2, characterized in that, when the CPU end reads in a plurality of images as the host, the step in which the GPU end computes the adjacency matrix W is specifically: the GPU end computes the second adjacency matrix, comprising:
1) searching the location matrix P_{k-1} of the (k-1)-th iteration for nonzero values, and recording the row index, column index, and value of each nonzero element in the three arrays row, col, and value respectively;
2) computing the location matrix P_k of the k-th iteration from the location matrix P_{k-1};
3) computing the sum of the diagonal elements;
4) adding the M_k obtained in this iteration into S(a, b): S(a, b) = S(a, b) + M_k.
4. The structure-based graph-node similarity parallel computation method according to claim 1, characterized in that, when the CPU end reads in a plurality of images as the host, the method further comprises:
the CPU end obtains the transition matrix T, and the GPU end, as the device, receives the transition matrix T.
5. The structure-based graph-node similarity parallel computation method according to claim 4, characterized in that, when the CPU end reads in a plurality of images as the host, the method further comprises: storing the transition matrix T as a sparse matrix in row-compressed storage. The step in which the GPU end computes the adjacency matrix W is specifically: the GPU end computes the second adjacency matrix, comprising:
1) the CPU end loops K times, each time invoking the GPU-end kernel function to compute similarities in parallel;
2) the GPU end passes the computation results back to the CPU end;
3) the GPU end computes the similarity value S(a, b) of the corresponding element in the similarity matrix S: S(a, b) = S(a, b) + M_k.
6. The structure-based graph-node similarity parallel computation method according to claim 5, characterized in that the CPU-end loop that invokes the GPU-end kernel function K times to compute similarities in parallel specifically comprises:
a) computing the index x of the nonzero value in T_i;
b) computing the index y of the nonzero value in T_j;
c) computing the similarity for the corresponding indices;
d) computing the position of the node pair (a, b) in the location matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310102281.7A CN103177414B (en) | 2013-03-27 | 2013-03-27 | Structure-based graph-node similarity parallel computation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103177414A true CN103177414A (en) | 2013-06-26 |
CN103177414B CN103177414B (en) | 2015-12-09 |
Family
ID=48637247
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310102281.7A Active CN103177414B (en) | 2013-03-27 | 2013-03-27 | Structure-based graph-node similarity parallel computation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103177414B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103427844A (en) * | 2013-07-26 | 2013-12-04 | 华中科技大学 | High-speed lossless data compression method based on GPU-CPU hybrid platform |
CN104360985A (en) * | 2014-10-20 | 2015-02-18 | 浪潮电子信息产业股份有限公司 | Method and device for realizing clustering algorithm based on MIC |
WO2016138836A1 (en) * | 2015-03-03 | 2016-09-09 | 华为技术有限公司 | Similarity measurement method and equipment |
CN106204669A (en) * | 2016-07-05 | 2016-12-07 | 电子科技大学 | A kind of parallel image compression sensing method based on GPU platform |
CN106202224A (en) * | 2016-06-29 | 2016-12-07 | 北京百度网讯科技有限公司 | Search processing method and device |
WO2018149299A1 (en) * | 2017-02-20 | 2018-08-23 | 平安科技(深圳)有限公司 | Method of identifying social insurance fraud, device, apparatus, and computer storage medium |
CN110263209A (en) * | 2019-06-27 | 2019-09-20 | 北京百度网讯科技有限公司 | Method and apparatus for generating information |
CN110851987A (en) * | 2019-11-14 | 2020-02-28 | 上汽通用五菱汽车股份有限公司 | Method, apparatus and storage medium for predicting calculated duration based on acceleration ratio |
CN111078957A (en) * | 2019-12-18 | 2020-04-28 | 无锡恒鼎超级计算中心有限公司 | Storage method based on graph storage structure |
CN111860588A (en) * | 2020-06-12 | 2020-10-30 | 华为技术有限公司 | Training method for graph neural network and related equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050231521A1 (en) * | 2004-04-16 | 2005-10-20 | John Harper | System for reducing the number of programs necessary to render an image |
CN102436545A (en) * | 2011-10-13 | 2012-05-02 | 苏州东方楷模医药科技有限公司 | Diversity analysis method based on chemical structure with CPU (Central Processing Unit) acceleration |
Non-Patent Citations (4)
- K.A. Hawick, et al., "Parallel graph component labelling with GPUs and CUDA", Parallel Computing
- Pawan Harish, et al., "Accelerating Large Graph Algorithms on the GPU Using CUDA", High Performance Computing – HiPC 2007, Lecture Notes in Computer Science
- Yongpeng Zhang, et al., "Large-Scale Multi-Dimensional Document Clustering on GPU Clusters", Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
- Zhang Cong, et al., "Research on parallel acceleration of mathematical morphology operations based on GPU", Electronic Design Engineering
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103427844A (en) * | 2013-07-26 | 2013-12-04 | 华中科技大学 | High-speed lossless data compression method based on GPU-CPU hybrid platform |
CN103427844B (en) * | 2013-07-26 | 2016-03-02 | 华中科技大学 | A kind of high-speed lossless data compression method based on GPU and CPU mixing platform |
CN104360985A (en) * | 2014-10-20 | 2015-02-18 | 浪潮电子信息产业股份有限公司 | Method and device for realizing clustering algorithm based on MIC |
CN105989154B (en) * | 2015-03-03 | 2020-07-14 | 华为技术有限公司 | Similarity measurement method and equipment |
US10579703B2 (en) | 2015-03-03 | 2020-03-03 | Huawei Technologies Co., Ltd. | Similarity measurement method and device |
CN105989154A (en) * | 2015-03-03 | 2016-10-05 | 华为技术有限公司 | Similarity measurement method and equipment |
WO2016138836A1 (en) * | 2015-03-03 | 2016-09-09 | 华为技术有限公司 | Similarity measurement method and equipment |
CN106202224A (en) * | 2016-06-29 | 2016-12-07 | 北京百度网讯科技有限公司 | Search processing method and device |
CN106202224B (en) * | 2016-06-29 | 2022-01-07 | 北京百度网讯科技有限公司 | Search processing method and device |
CN106204669A (en) * | 2016-07-05 | 2016-12-07 | 电子科技大学 | A kind of parallel image compression sensing method based on GPU platform |
WO2018149299A1 (en) * | 2017-02-20 | 2018-08-23 | 平安科技(深圳)有限公司 | Method of identifying social insurance fraud, device, apparatus, and computer storage medium |
CN110263209B (en) * | 2019-06-27 | 2021-07-09 | 北京百度网讯科技有限公司 | Method and apparatus for generating information |
CN110263209A (en) * | 2019-06-27 | 2019-09-20 | 北京百度网讯科技有限公司 | Method and apparatus for generating information |
CN110851987A (en) * | 2019-11-14 | 2020-02-28 | 上汽通用五菱汽车股份有限公司 | Method, apparatus and storage medium for predicting calculated duration based on acceleration ratio |
CN110851987B (en) * | 2019-11-14 | 2022-09-09 | 上汽通用五菱汽车股份有限公司 | Method, apparatus and storage medium for predicting calculated duration based on acceleration ratio |
CN111078957A (en) * | 2019-12-18 | 2020-04-28 | 无锡恒鼎超级计算中心有限公司 | Storage method based on graph storage structure |
CN111860588A (en) * | 2020-06-12 | 2020-10-30 | 华为技术有限公司 | Training method for graph neural network and related equipment |
CN111860588B (en) * | 2020-06-12 | 2024-06-21 | 华为技术有限公司 | Training method for graphic neural network and related equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103177414A (en) | Structure-based dependency graph node similarity concurrent computation method | |
US8400458B2 (en) | Method and system for blocking data on a GPU | |
CN108921188B (en) | Parallel CRF method based on Spark big data platform | |
CN108170639B (en) | Tensor CP decomposition implementation method based on distributed environment | |
US20180121388A1 (en) | Symmetric block sparse matrix-vector multiplication | |
CN108140061B (en) | Method, storage medium, and system for determining co-occurrence in graph | |
CN105739951A (en) | GPU-based L1 minimization problem fast solving method | |
CN106709503A (en) | Large spatial data clustering algorithm K-DBSCAN based on density | |
CN106202224B (en) | Search processing method and device | |
US11037356B2 (en) | System and method for executing non-graphical algorithms on a GPU (graphics processing unit) | |
CN113435521A (en) | Neural network model training method and device and computer readable storage medium | |
CN114138231B (en) | Method, circuit and SOC for executing matrix multiplication operation | |
CN109460398A (en) | Complementing method, device and the electronic equipment of time series data | |
CN106484532B (en) | GPGPU parallel calculating method towards SPH fluid simulation | |
CN110264392B (en) | Strong connection graph detection method based on multiple GPUs | |
CN114492753A (en) | Sparse accelerator applied to on-chip training | |
CN104572588A (en) | Matrix inversion processing method and device | |
CN117093538A (en) | Sparse Cholesky decomposition hardware acceleration system and solving method thereof | |
CN104156268B (en) | The load distribution of MapReduce and thread structure optimization method on a kind of GPU | |
CN106844024A (en) | The GPU/CPU dispatching methods and system of a kind of self study run time forecast model | |
CN110059813A (en) | The method, device and equipment of convolutional neural networks is updated using GPU cluster | |
DE102023105572A1 (en) | Efficient matrix multiplication and addition with a group of warps | |
US20220129755A1 (en) | Incorporating a ternary matrix into a neural network | |
CN114116208A (en) | Short wave radiation transmission mode three-dimensional acceleration method based on GPU | |
US9600446B2 (en) | Parallel multicolor incomplete LU factorization preconditioning processor and method of use thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
| TR01 | Transfer of patent right | | Effective date of registration: 2022-01-12. Patentee after: Shenzhen kanghongtai Technology Co., Ltd., Room 4009a, No. 38, Liuxian Third Road, District 72, Bao'an District, Shenzhen, Guangdong Province 518000 (office space). Patentee before: Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072 |