CN103488775A - Computing system and computing method for big data processing - Google Patents

Computing system and computing method for big data processing Download PDF

Info

Publication number
CN103488775A
CN103488775A CN201310455174.2A CN103488775B
Authority
CN
China
Prior art keywords
computation model
module
computation
computing
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310455174.2A
Other languages
Chinese (zh)
Other versions
CN103488775B (en)
Inventor
王鹏
韩冀中
王伟平
孟丹
张云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN201310455174.2A priority Critical patent/CN103488775B/en
Publication of CN103488775A publication Critical patent/CN103488775A/en
Application granted granted Critical
Publication of CN103488775B publication Critical patent/CN103488775B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a computing system and computing method for big data processing. The computing system comprises, from bottom to top, a bottom-layer module, a middle-layer module and a top-layer module; the middle-layer module in turn comprises a message-transmission module and computation-model modules. The bottom-layer module adopts the Hadoop Distributed File System (HDFS) and is used for storing data. The message-transmission module transmits messages between computation-model modules running on different computing nodes. Guided by these messages, the computation-model modules on different nodes work cooperatively, each building a computation model of a particular type to process the data. The top-layer module provides a programming interface for each computation model, combines the computations expressed by the different models in serial order, and at the same time configures the models to share data through an in-memory pipeline. With this computing system and method, application programs can be written against multiple computation models within a single system, allowing more complex problems to be solved.

Description

A computing system and computing method for big data processing
Technical field
The present invention relates to the field of big data processing, and in particular to a computing system and computing method based on hybrid programming.
Background technology
In recent years, with the rapid development of social informatization, data have grown explosively in scientific research, industrial production, commerce and the Internet alike. In many applications, data volumes are rapidly developing from the TB (terabyte) level to the PB (petabyte) level or even higher orders of magnitude, and computation frameworks oriented to big data processing have become a hot topic. The open-source Hadoop system is currently widely applied in industry. Although the MapReduce model provided by Hadoop is simple and easy to use, the model has limitations and its expressive power is limited: when solving hard problems such as iterative computation and graph analysis, it is difficult to map the algorithm onto the MapReduce model, the development workload is large and the resulting jobs run inefficiently. A variety of computation frameworks has therefore appeared — for example Dryad, Piccolo, Pregel and Spark — which have greatly enriched the means of big data processing. These frameworks, however, are oriented to particular problem domains, within which they solve problems quickly and efficiently; Pregel, for instance, targets large-scale graph computation and has clear advantages in applications such as web-link analysis, disease-spread paths and traffic-route optimization. With today's big data processing tasks growing ever more diverse, no single "universal" framework suits all application scenarios, and data-processing platforms that fuse multiple computation frameworks have become the trend of the times. On such a "unified" platform, the means of big data processing grow richer, problems become easier to solve, and processing efficiency keeps improving.
The current solution is to host multiple computation frameworks on one cluster through a resource management system; typical systems include Mesos and YARN. Such systems allow multiple frameworks to share the same cluster resources, but they also have obvious shortcomings, mainly in three respects: (1) the programming threshold is high, since the programmer must master multiple programming languages (e.g. C for MPI, Java for Hadoop), so usability is poor; (2) code cannot be reused across frameworks, even though code reuse is extremely important in modern software engineering, so development efficiency is low; (3) different jobs share data through a distributed file system (e.g. HDFS, the Hadoop Distributed File System), incurring a large amount of disk I/O overhead, so resources are wasted.
In summary, existing methods and systems that fuse multiple computation frameworks still suffer from poor usability, low development efficiency and wasted resources.
Summary of the invention
The technical problem to be solved by the present invention is to provide a computing system and computing method for big data processing, so as to overcome the limitations of the existing methods for fusing multiple computation frameworks.
The technical scheme by which the present invention solves the above technical problem is as follows: a computing system for big data processing, which runs on a plurality of computing nodes and comprises, from bottom to top, three layers of modules, namely a bottom-layer module, a middle-layer module and a top-layer module, the middle-layer module in turn comprising a message-transmission module and computation-model modules;
the bottom-layer module adopts HDFS and is used for storing data;
the message-transmission module transmits messages between computation-model modules running on different computing nodes;
the computation-model modules running on different computing nodes work cooperatively according to the messages delivered by the message-transmission module, each building a computation model of a particular type to process the data read from HDFS;
the top-layer module provides a corresponding programming interface for each particular type of computation model, combines the computations expressed by different computation models in serial order, and at the same time configures the different computation models to share data through an in-memory pipeline.
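The role described for the top-layer module — a common interface per computation model plus serial combination with in-memory hand-off — can be illustrated with a minimal Go sketch (Go being the language the embodiment below adopts). All names here (`Stage`, `Driver`, `upperStage`) are illustrative assumptions, not interfaces from the patent.

```go
package main

import (
	"fmt"
	"strings"
)

// Stage is one segmented calculation flow expressed in some computation model.
type Stage interface {
	Name() string
	// Run consumes the previous stage's output and returns this stage's
	// output; the real system would stream records through an in-memory
	// pipeline rather than pass a slice.
	Run(input []string) []string
}

// Driver combines the computations expressed by different models in serial.
type Driver struct{ stages []Stage }

func (d *Driver) Add(s Stage) { d.stages = append(d.stages, s) }

// Execute runs the stages back to back, handing each stage the previous
// stage's in-memory output.
func (d *Driver) Execute(input []string) []string {
	data := input
	for _, s := range d.stages {
		data = s.Run(data)
	}
	return data
}

// upperStage is a toy stage used only to exercise the interface.
type upperStage struct{}

func (upperStage) Name() string { return "upper" }
func (upperStage) Run(in []string) []string {
	out := make([]string, len(in))
	for i, s := range in {
		out[i] = strings.ToUpper(s)
	}
	return out
}

func main() {
	d := &Driver{}
	d.Add(upperStage{})
	fmt.Println(d.Execute([]string{"pagerank", "sort"}))
}
```

In this sketch a BSP stage and a DAG stage would simply be two more `Stage` implementations added to the same driver, in the serial order the application requires.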
On the basis of the above technical scheme, the present invention can be further improved as follows.
Further, the data stored by the bottom-layer module comprise an input data set, an intermediate-result data set and an output-result data set.
Further, the message-transmission module comprises a transmitter and a receiver; the transmitter receives messages from the computation-model module on the same computing node and sends them to the receiver of the designated computing node, while the receiver receives, from the network, the messages sent by the transmitters of other computing nodes and forwards them to the computation-model module on its own node.
Further, the messages exchanged among the message-transmission module, the computation-model module, the transmitter and the receiver comprise request messages and response messages.
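The request/response scheme above can be sketched in Go as follows; the type and function names (`Message`, `send`, `receive`) are hypothetical, and a buffered channel stands in for the network between a transmitter and a receiver.

```go
package main

import "fmt"

// MsgKind distinguishes the two message classes.
type MsgKind int

const (
	Request  MsgKind = iota // asks a peer to perform an operation
	Response                // reports whether the operation was executed
)

// Message is what the transmitter/receiver pair carries between nodes.
type Message struct {
	Kind MsgKind
	From string // source computing node
	To   string // destination computing node
	Body string // the operation to perform, or its execution status
}

// send models the transmitter: it forwards a message from the local
// computation-model module to the destination node's receiver.
func send(network chan<- Message, m Message) { network <- m }

// receive models the receiver: it takes a message off the network and
// hands it to the computation-model module on its own node.
func receive(network <-chan Message) Message { return <-network }

func main() {
	network := make(chan Message, 1)
	send(network, Message{Kind: Request, From: "node-1", To: "node-2", Body: "compute"})
	fmt.Println(receive(network).Body)
}
```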
Further, the computation-model module comprises a number of controllers, each adopting a computation model of a particular type, and processors that adopt the same computation model as their controller; the controller coordinates the execution flow of the computation model it adopts, while the processor receives the request messages sent by the controller, performs the data-processing operations the controller specifies, and reports response messages back to the controller; the processor also reads input data from HDFS and outputs computation results to HDFS.
Further, the top-layer module also provides a corresponding configuration for each particular type of computation model, including setting the input directory, the output directory, the number of computing nodes and the output-log directory; the input directory specifies the path of the data set to be processed, and the output directory specifies the storage path of the final computation result.
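The per-model configuration listed above might look like the following Go struct; all field and method names are assumptions for illustration, not the patent's API.

```go
package main

import (
	"errors"
	"fmt"
)

// JobConfig mirrors the configuration items named in the text.
type JobConfig struct {
	InputDir  string // path of the data set to be processed (on HDFS)
	OutputDir string // storage path of the final computation result
	NumNodes  int    // number of computing nodes
	LogDir    string // output-log directory
}

// Validate checks the fields a driver would minimally require.
func (c JobConfig) Validate() error {
	if c.InputDir == "" || c.OutputDir == "" {
		return errors.New("input and output directories must be set")
	}
	if c.NumNodes < 1 {
		return errors.New("at least one computing node is required")
	}
	return nil
}

func main() {
	cfg := JobConfig{InputDir: "/data/in", OutputDir: "/data/out", NumNodes: 4, LogDir: "/logs"}
	fmt.Println(cfg.Validate() == nil)
}
```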
Further, when the computations expressed by different computation models are combined in serial order, the top-layer module also provides a programming interface for serially combining multiple computation models.
Further, the different types of computation models comprise the BSP (Bulk Synchronous Parallel) computation model, the DAG (Directed Acyclic Graph) computation model, the GraphLab computation model and the Spark computation model. The DAG model is a comparatively general computation model, adopted for example by Microsoft's Dryad framework; the Pregel graph-computation framework proposed by Google derives from the BSP model; GraphLab is a computation model based on asynchronous execution; and Spark introduced the concept of resilient data sets, using in-memory computation to improve data-processing efficiency.
The technical scheme of the present invention also comprises a computing method for big data processing, which adopts the above computing system; its concrete steps comprise:
Step 1, upload the data set to be processed to HDFS;
Step 2, according to the application requirements of the data set, split its overall calculation flow into a plurality of serial segmented calculation flows;
Step 3, choose a computation model of a particular type for each segmented calculation flow, and use the provided programming interface to write the corresponding code for each model;
Step 4, combine the computations expressed by the different computation models in serial order, and at the same time configure the models to share data through an in-memory pipeline;
Step 5, supply the relevant configuration of the application program through the provided programming interface;
Step 6, execute the written code.
Further, the computation models comprise the BSP, DAG, GraphLab and Spark computation models.
The beneficial effects of the invention are as follows: its distinguishing feature is support for multiple computation models within one computing system, overcoming the limitation of current computation frameworks that support only a single model. Developers can write application programs with multiple computation models in one system and combine those computations together, thereby solving more complex problems. The proposed computing system supports the two mainstream computation models, DAG and BSP, and is also compatible with other models such as GraphLab and Spark.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the computing system for big data processing of the present invention;
Fig. 2 is a flow diagram of the computing method for big data processing of the present invention;
Fig. 3 is a comparison chart of the different data-sharing modes in an application example of the present invention.
In the drawings, the components denoted by the reference numerals are as follows:
1, bottom-layer module; 2, middle-layer module; 3, top-layer module; 21, message-transmission module; 22, computation-model module; 211, transmitter; 212, receiver; 221, controller; 222, processor.
Embodiment
The principles and features of the present invention are described below with reference to the drawings; the examples given serve only to explain the invention and are not intended to limit its scope.
As shown in Fig. 1, the present embodiment provides a computing system for big data processing. It runs on a plurality of computing nodes and comprises, from bottom to top, three layers of modules, namely a bottom-layer module 1, a middle-layer module 2 and a top-layer module 3, the middle-layer module 2 in turn comprising a message-transmission module 21 and a computation-model module 22.
The bottom-layer module 1 adopts HDFS and is used for storing data; the data it stores comprise an input data set, an intermediate-result data set and an output-result data set.
The message-transmission module 21 transmits messages between the computation-model modules running on different computing nodes and comprises a transmitter 211 and a receiver 212. The transmitter 211 receives messages from the computation-model module 22 on the same computing node and sends them to the receiver 212 of the designated computing node; the receiver 212 receives, from the network, the messages sent by the transmitters of other computing nodes and forwards them to the computation-model module 22 on its own node. The messages comprise request messages ("request"-class messages asking that some operation be performed) and response messages ("response"-class messages reporting whether the operation has been executed).
The computation-model modules 22 running on different computing nodes work cooperatively according to the messages delivered by the message-transmission module 21, each building a computation model of a particular type to process data. The computation-model module 22 comprises a number of controllers 221 adopting computation models of different types, and processors 222 that adopt the same computation model as their controller. The controller 221 coordinates the execution flow of the computation model it adopts; the processor 222 receives the request messages sent by the controller 221, performs the data-processing operations the controller 221 specifies, and reports response messages back to the controller 221. The processor 222 also reads input data from HDFS and outputs computation results to HDFS. The computation models comprise the BSP, DAG, GraphLab and Spark computation models.
The top-layer module 3 provides a corresponding programming interface for each particular type of computation model, combines the computations expressed by different computation models in serial order, and at the same time configures the models to share data through an in-memory pipeline. The top-layer module also provides a corresponding configuration for each computation model, including setting the input directory, the output directory, the number of computing nodes and the output-log directory; the input directory specifies the path of the data set to be processed, and the output directory specifies the storage path of the final computation result. When the split segmented calculation flows are processed, the processors obtain data from HDFS; and because the different computation models are configured to share data through the in-memory pipeline, each segment also reads the output of the previous segment from memory through the pipeline, processes it, and hands its own output on to the next segment. Throughout the whole process, input and intermediate data can be read from HDFS, and intermediate and output data can be written to HDFS.
In addition, when the computations expressed by different computation models are combined in serial order, the individual models must be serially composed, and the top-layer module provides the programming interface for serially combining multiple computation models. For example, given computation models A, B and C to be combined in the order A->B->C->A, the top-layer module correspondingly provides the programming interface for executing this A->B->C->A serial combination.
Based on the above computing system, and as shown in Fig. 2, the technical scheme of the present invention also comprises a computing method for big data processing, whose concrete steps comprise:
Step 1, upload the data set to be processed to HDFS;
Step 2, according to the application requirements of the data set, split its overall calculation flow into a plurality of serial segmented calculation flows;
Step 3, choose a computation model of a particular type for each segmented calculation flow, and use the provided programming interface to write the corresponding code for each model;
Step 4, combine the computations expressed by the different computation models in serial order, and at the same time configure the models to share data through an in-memory pipeline;
Step 5, supply the relevant configuration of the application program through the provided programming interface;
Step 6, execute the written code.
Based on the above scheme, the basic procedure for using the computing system and method is given below.
(1) Connect all the nodes that will participate in the computation through a switch, establishing a cluster whose nodes can communicate with one another.
(2) Deploy the open-source Hadoop system on each node of the cluster, and start the HDFS of the Hadoop system.
(3) Upload the data set to be processed to HDFS and, using the computing system proposed by this embodiment, write the data-processing application program; for the concrete implementation method, see the relevant sections above.
(4) Use a compiler to compile the written application program into an executable file.
(5) Choose a batch of nodes from the cluster to run the job. The nodes are divided into two classes: one master node, with the remainder serving as computing nodes.
(6) Start the executable program on the master node with its parameter set to master; this node is hereinafter referred to simply as the master.
(7) Start the executable program on each computing node with its parameter set to worker; these nodes are hereinafter referred to simply as workers.
(8) After the master starts, it carries out the following steps:
1) First, a queue of controller objects (Controller) is created.
2) The master pops a controller object from the queue and executes it. The controller executes as follows:
[1] The master's controller object is responsible for distributing computation tasks to the workers; it sends control commands to the workers and waits for their feedback.
[2] According to the workers' feedback, the controller schedules the next step of the computation process.
Steps [1] and [2] above repeat continuously until one segmented calculation flow has finished executing, at which point the controller exits.
3) When a segmented calculation flow has finished executing, the master takes the next controller from the queue and continues; the whole process runs until the controller queue is empty.
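The master's controller-queue behaviour can be sketched as a simple drain loop; `Controller` and `runMaster` are illustrative names, and the real controller would dispatch tasks to workers over the network rather than call a local function.

```go
package main

import "fmt"

// Controller stands in for one controller object: a model name plus a run
// function. In the real system, Run would distribute tasks to workers and
// wait for their feedback before scheduling the next step.
type Controller struct {
	Name string
	Run  func()
}

// runMaster drains the controller queue in order, executing each controller
// until the queue is empty, and returns the execution order.
func runMaster(queue []Controller) []string {
	var order []string
	for len(queue) > 0 {
		c := queue[0]
		queue = queue[1:]
		c.Run()
		order = append(order, c.Name)
	}
	return order
}

func main() {
	q := []Controller{
		{Name: "BSP", Run: func() {}},
		{Name: "DAG", Run: func() {}},
	}
	fmt.Println(runMaster(q))
}
```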
(9) After a worker starts, it carries out the following steps:
1) First, it sends registration information to the master.
2) It creates a queue of processors (handlers), then waits for the master's control commands.
3) On receiving a control command from the master, it calls the corresponding handler to execute the compute function.
4) It reports its execution state to the master.
5) If it receives an exit command, it quits the program.
Steps 3) and 4) above repeat continuously.
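The worker's command loop can be sketched as follows; the command strings and handler names are assumptions, and a channel stands in for the master's control connection.

```go
package main

import "fmt"

// Handler stands in for one processor's compute function.
type Handler func() string

// runWorker models the loop above: wait for commands, dispatch each one to
// the matching handler, collect the status reports that would be sent back
// to the master, and stop on an exit command.
func runWorker(commands <-chan string, handlers map[string]Handler) []string {
	var reports []string
	for cmd := range commands {
		if cmd == "exit" {
			break
		}
		if h, ok := handlers[cmd]; ok {
			reports = append(reports, h())
		} else {
			reports = append(reports, "unknown command: "+cmd)
		}
	}
	return reports
}

func main() {
	commands := make(chan string, 3)
	commands <- "bsp"
	commands <- "dag"
	commands <- "exit"
	close(commands)
	handlers := map[string]Handler{
		"bsp": func() string { return "bsp done" },
		"dag": func() string { return "dag done" },
	}
	fmt.Println(runWorker(commands, handlers))
}
```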
(10) When the master's controller queue is empty, all segmented calculation flows have finished and the whole job has executed. The master notifies all workers to exit the computation and then exits itself; the computation of the whole job is complete.
(11) After the computation finishes, the computation results are saved in HDFS; inspect them in the output directory.
Step 4 above embodies the method, based on the computing system described earlier, that this embodiment proposes; it is summarized as follows:
First, according to the requirements of the application, the computation is split into a plurality of serial segmented calculation flows.
Second, a suitable computation model is chosen for each segmented calculation flow, and the model-specific processing functions are written. For graph-computation phases the BSP computation model is adopted; for other types of computation the DAG computation model is adopted. The computing system of this embodiment provides DAG and BSP base-class libraries, and the developer needs to write the compute functions of the BSP base class and of the DAG base class.
Finally, using the driver interface provided by the framework, the serial execution order of the code of the several segmented calculation flows is specified and the corresponding parameters are configured, including the input directory, the output directory and the sharing mode of the intermediate data set. Usually, to improve processing efficiency, the online intermediate-data-set sharing mode is adopted rather than the offline one: the former shares data through the in-memory pipeline, while the latter shares data through HDFS.
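The difference between the two sharing modes can be illustrated with a toy in-memory pipeline: the producing segment writes records into memory and the consuming segment reads them concurrently, with no disk round-trip. This is only a sketch of the idea, not the system's implementation; offline sharing would instead write the intermediate set to HDFS and read it back.

```go
package main

import "fmt"

// pipeline connects two segmented calculation flows through memory: the
// producer writes records as they are computed and the consumer reads them
// concurrently off a channel, so the intermediate data set never touches disk.
func pipeline(produce func(chan<- int), consume func(<-chan int) int) int {
	ch := make(chan int, 16) // the in-memory pipeline
	go func() {
		produce(ch)
		close(ch)
	}()
	return consume(ch)
}

func main() {
	sum := pipeline(
		func(out chan<- int) { // stage 1: emit some values
			for i := 1; i <= 4; i++ {
				out <- i
			}
		},
		func(in <-chan int) int { // stage 2: aggregate them
			s := 0
			for v := range in {
				s += v
			}
			return s
		})
	fmt.Println(sum)
}
```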
The implementation steps of this basic procedure on a shared cluster are now described concretely through an application example. Suppose that PageRank is first computed over a web data set to obtain the PageRank value of each page, and the pages are then sorted by PageRank value in descending order. This computation can be decomposed into two segmented calculation flows: PageRank computation using the BSP model, followed by sorting using the DAG model.
The application example adopts the Go language and uses the open-source distributed file system HDFS to build the experimental environment; the concrete implementation steps are as follows.
(1) Connect all the nodes that will participate in the computation through a switch, establishing a cluster whose nodes can communicate with one another.
(2) Deploy the open-source Hadoop system on each node of the cluster, and start the HDFS of the Hadoop system.
(3) Upload the data set to be processed to HDFS.
(4) Write the data-processing application program:
First, according to the requirements of the application, split the computation into two serial segmented calculation flows: a PageRank-computation phase and a sorting phase;
Second, for the PageRank phase, adopt the BSP computation model, inherit from the BSP base class and write the corresponding compute function; for the sorting phase, adopt the DAG computation model, inherit from the DAG base class and write the corresponding compute function;
Finally, using the driver API provided by the top-layer module, add the written BSP subclass and DAG subclass to the execution phases, configure the input directory and output directory, and set the sharing mode of the intermediate data set.
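As a rough illustration of the two segments — not the patent's actual base classes — the following Go sketch shows one BSP-style PageRank superstep followed by a sort-by-rank stage on a toy single-node graph. Function names, the damping treatment, and the handling of dangling vertices are simplifying assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// pagerankStep performs one BSP-style superstep: each vertex sends
// rank/outdegree to its out-neighbours, then ranks are recombined with
// damping factor d. Vertices with no out-edges simply contribute nothing.
func pagerankStep(adj map[int][]int, rank map[int]float64, d float64) map[int]float64 {
	contrib := make(map[int]float64)
	for v, outs := range adj {
		if len(outs) == 0 {
			continue
		}
		share := rank[v] / float64(len(outs))
		for _, u := range outs {
			contrib[u] += share
		}
	}
	next := make(map[int]float64, len(rank))
	for v := range rank {
		next[v] = (1-d)/float64(len(rank)) + d*contrib[v]
	}
	return next
}

// sortByRank is the second segment, expressible as a DAG stage: it returns
// the vertices in descending PageRank order.
func sortByRank(rank map[int]float64) []int {
	vs := make([]int, 0, len(rank))
	for v := range rank {
		vs = append(vs, v)
	}
	sort.Slice(vs, func(i, j int) bool { return rank[vs[i]] > rank[vs[j]] })
	return vs
}

func main() {
	adj := map[int][]int{1: {2}, 2: {1}, 3: {1}}
	rank := map[int]float64{1: 1.0 / 3, 2: 1.0 / 3, 3: 1.0 / 3}
	fmt.Println(sortByRank(pagerankStep(adj, rank, 0.85)))
}
```

In the real system the superstep would run across workers under the BSP controller, with the intermediate rank table handed to the sorting stage through the pipeline.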
(5) Use the go build command to compile the written application program into an executable file.
(6) Choose a batch of nodes from the cluster to run the job. The nodes are divided into two classes: one master node, with the remainder serving as computing nodes.
(7) Start the executable program on the master node with its parameter set to master.
(8) Start the executable program on each computing node with its parameter set to worker.
(9) After the master starts, it carries out the following steps:
1) First, a queue of controller objects (Controller) is created, holding two objects in total: a BSP controller and a DAG controller.
2) The master pops the BSP controller object from the queue and executes it.
3) After the BSP controller object finishes executing, the master takes the next controller from the controller-object queue, namely the DAG controller, and invokes its execution.
4) After the DAG controller finishes executing, the queue is empty, and the master notifies the workers to exit the computation.
5) The master itself exits the computation.
(10) After a worker starts, it carries out the following steps:
1) First, it sends registration information to the master.
2) It creates a queue of processors (handlers), holding two objects in total: a BSP handler and a DAG handler. It then waits for the master's control commands.
3) According to the control command received from the master, it calls the BSP handler to execute the corresponding compute function.
4) It reports its execution state to the master.
5) According to the control command received from the master, it calls the DAG handler to execute the corresponding compute function.
6) It reports its execution state to the master.
7) On receiving the exit command, the worker quits the program.
(11) After the computation finishes, the computation results are saved in HDFS; inspect them in the output directory.
In this application example, the test data sets were generated by a benchmark program, producing graph data sets with 4 million, 8 million and 16 million vertices. PageRank was first computed over each data set, and the computed PageRank values were then sorted in descending order. The PageRank computation was done with the BSP model and the sorting computation with the DAG model; the intermediate data set produced by the PageRank computation was supplied to the subsequent sorting computation. The application example also compared the running times under the two intermediate-data sharing modes of step (4): sharing data through HDFS versus sharing data through the in-memory pipeline. Fig. 3 gives the comparison of the two intermediate-data-set sharing modes. As the comparison in Fig. 3 shows, because the in-memory pipeline mode avoids a large amount of HDFS disk I/O overhead, its running time is markedly shorter than that of the HDFS mode.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. a computing system of processing for large data, it is characterized in that, it runs on a plurality of computing nodes, and comprises successively from bottom to up three layers of module, be respectively bottom module, middle layer module and top module, middle layer module comprises again transmission of messages module and computation model module:
Described bottom module, it adopts the Hadoop distributed file system, for storing data;
Described transmission of messages module, it is for realizing pass-along message between the computation model module of different computing node operations;
The described computation model module in the operation of different computing nodes, its message according to the transmission of described transmission of messages module realizes collaborative work, and the computation model that builds separately particular type is processed the data that read from the Hadoop distributed file system;
Described top module, it is used to the computation model of each particular type that corresponding DLL (dynamic link library) is provided, and combines in the mode of serial the calculating that different computation models are expressed, and arranges between different computation models simultaneously and shares data based on the internal memory pipeline system.
2. computing system according to claim 1, is characterized in that, the data of described bottom module stores comprise input data set, intermediate result data set and Output rusults data set.
3. computing system according to claim 1, it is characterized in that, described transmission of messages module comprises transmitter and receiver, described transmitter is used for from the computation model module receipt message in same computing node, and the message of reception is sent to the receiver of the computing node of appointment, the message that described receiver sends for the transmitter that receives different computing nodes from network, and the message of reception is transmitted to the computation model module in same computing node.
4. according to the described computing system of claim 1 or 3, it is characterized in that, described message comprises request message and response message.
5. computing system according to claim 1, it is characterized in that, described computation model module comprises the controller of the computation model of some employing particular types, also comprise the processor that adopts the identical calculations model with each controller, described controller is for coordinating the execution flow process of its computation model adopted, the request message that described processor sends for receiving controller, the data processing operation of implementation controller appointment, and report response message to controller; Described processor is also for reading the input data of Hadoop distributed file system, and to Hadoop distributed file system output result of calculation.
6. The computing system according to claim 1, wherein the top-level module is further configured to provide a corresponding configuration for each particular type of computation model, including setting an input directory, an output directory, the number of computing nodes, and an output-log directory; the input directory specifies the path of the data set to be processed, and the output directory specifies the storage path of the final computation results.
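The configuration items enumerated in claim 6 might be expressed, purely as an illustration, as a small validated dictionary. The key names and paths below are hypothetical, not taken from the patent.

```python
# Hypothetical per-model configuration; key names are illustrative only.
config = {
    "input_dir": "hdfs:///user/app/input",    # path of the data set to be processed
    "output_dir": "hdfs:///user/app/output",  # storage path of the final results
    "num_nodes": 8,                           # number of computing nodes
    "log_dir": "hdfs:///user/app/logs",       # output-log directory
}

def validate(cfg):
    """Check that every configuration item required by claim 6 is present."""
    required = {"input_dir", "output_dir", "num_nodes", "log_dir"}
    missing = required - cfg.keys()
    if missing:
        raise ValueError(f"missing configuration items: {sorted(missing)}")
    return True
```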
7. The computing system according to claim 1, wherein, when the computations expressed by different computation models are combined in a serial manner, the top-level module is further configured to provide a programming interface for serially combining multiple computation models.
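The serial combination of claim 7, with stages sharing data through memory as in claim 1, can be sketched with in-memory buffers standing in for the pipeline. The `Stage` abstraction and stage functions are hypothetical; each stage stands in for one computation model's segment of the overall flow.

```python
import io

class Stage:
    """One segmented computation flow, executed under some computation model."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def run(self, pipe_in, pipe_out):
        # Consume the upstream pipe line by line and write transformed output.
        for line in pipe_in:
            pipe_out.write(self.fn(line))

def run_serial(stages, source):
    """Chain stages serially; each stage's output feeds the next through
    an in-memory buffer rather than a file on disk."""
    pipe_in = io.StringIO(source)
    for stage in stages:
        pipe_out = io.StringIO()
        stage.run(pipe_in, pipe_out)
        pipe_out.seek(0)
        pipe_in = pipe_out  # downstream stage reads what upstream wrote
    return pipe_in.read()

stages = [
    Stage("normalize", lambda s: s.lower()),
    Stage("tag", lambda s: "ok:" + s),
]
out = run_serial(stages, "A\nB\n")  # each line lowered, then tagged
```

The design point the claim captures is that intermediate results never touch the distributed file system between stages; only the final stage's output would be written back.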
8. The computing system according to claim 1, 5, 6 or 7, wherein the computation models comprise the BSP computation model, the DAG computation model, the GraphLab computation model, and the Spark computation model.
9. A computing method for big data processing, applicable to a plurality of computing nodes, wherein the method comprises the following steps:
Step 1: upload the data set to be processed to the Hadoop distributed file system;
Step 2: according to the application requirements of the data set to be processed, split the overall computation flow into a plurality of serial segmented computation flows;
Step 3: choose a computation model of a particular type for each segmented computation flow, and use the provided programming interfaces to write the code for each particular type of computation model;
Step 4: combine the computations expressed by the different computation models in a serial manner, and arrange for data to be shared between the different computation models through an in-memory pipeline mechanism;
Step 5: provide the relevant configuration of the application program according to the provided programming interface;
Step 6: execute the written code.
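The six steps of claim 9 can be sketched as a single driver. This is an illustrative sketch under stated assumptions: a dictionary stands in for HDFS, each "computation model" is a callable, and all names (`run_application`, `split_plan`, `models`, `writers`) are hypothetical.

```python
def run_application(dataset, split_plan, models, writers, config):
    """Illustrative driver following steps 1-6 of claim 9 (all names hypothetical)."""
    hdfs = {}  # stands in for the Hadoop distributed file system
    # Step 1: upload the data set to be processed.
    hdfs[config["input_dir"]] = dataset
    # Steps 2-3: the overall flow has been split into serial segments, each
    # paired with a chosen computation model and user-written code.
    segments = [(models[name], writers[name]) for name in split_plan]
    # Steps 4-6: run the segments serially, sharing data through memory;
    # only the final result is written back.
    data = hdfs[config["input_dir"]]
    for model, code in segments:
        data = model(code, data)  # the model executes the written code on the data
    hdfs[config["output_dir"]] = data
    return hdfs[config["output_dir"]]

# Two toy "computation models": an element-wise map and a filter.
models = {"map": lambda fn, d: [fn(x) for x in d],
          "filter": lambda fn, d: [x for x in d if fn(x)]}
writers = {"map": lambda x: x * 2, "filter": lambda x: x > 2}
cfg = {"input_dir": "/in", "output_dir": "/out"}
result = run_application([1, 2, 3], ["map", "filter"], models, writers, cfg)  # -> [4, 6]
```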
10. The computing method according to claim 9, wherein the computation models comprise the BSP computation model, the DAG computation model, the GraphLab computation model, and the Spark computation model.
CN201310455174.2A 2013-09-29 2013-09-29 Computing system and computing method for big data processing Active CN103488775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310455174.2A CN103488775B (en) 2013-09-29 2013-09-29 Computing system and computing method for big data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310455174.2A CN103488775B (en) 2013-09-29 2013-09-29 Computing system and computing method for big data processing

Publications (2)

Publication Number Publication Date
CN103488775A true CN103488775A (en) 2014-01-01
CN103488775B CN103488775B (en) 2016-08-10

Family

ID=49829001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310455174.2A Active CN103488775B (en) 2013-09-29 2013-09-29 A kind of calculating system processed for big data and computational methods

Country Status (1)

Country Link
CN (1) CN103488775B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063228A (en) * 2014-07-02 2014-09-24 中央民族大学 Pipeline data processing system
CN104881491A (en) * 2015-06-11 2015-09-02 广州市云润大数据服务有限公司 Software development system based on big data platform
CN105335215A (en) * 2015-12-05 2016-02-17 中国科学院苏州生物医学工程技术研究所 Monte-Carlo simulation accelerating method and system based on cloud computing
CN105404611A (en) * 2015-11-09 2016-03-16 南京大学 Matrix model based multi-calculation-engine automatic selection method
CN105468770A (en) * 2015-12-09 2016-04-06 合一网络技术(北京)有限公司 Data processing method and system
CN105700998A (en) * 2016-01-13 2016-06-22 浪潮(北京)电子信息产业有限公司 Method and device for monitoring and analyzing performance of parallel programs
CN105956049A (en) * 2016-04-26 2016-09-21 乐视控股(北京)有限公司 Data output control method and device
CN106021484A (en) * 2016-05-18 2016-10-12 中国电子科技集团公司第三十二研究所 Customizable multi-mode big data processing system based on memory calculation
CN106155635A (en) * 2015-04-03 2016-11-23 北京奇虎科技有限公司 A kind of data processing method and device
CN107025099A (en) * 2016-02-01 2017-08-08 北京大学 A kind of asynchronous figure based on deque's model calculates realization method and system
CN107807983A (en) * 2017-10-30 2018-03-16 辽宁大学 A kind of parallel processing framework and design method for supporting extensive Dynamic Graph data query
CN108255619A (en) * 2017-12-28 2018-07-06 新华三大数据技术有限公司 A kind of data processing method and device
CN110440858A (en) * 2019-09-12 2019-11-12 武汉轻工大学 Grain condition monitoring system and method
WO2020052241A1 (en) * 2018-09-11 2020-03-19 Huawei Technologies Co., Ltd. Heterogeneous scheduling for sequential compute dag
US10606654B2 (en) 2015-04-29 2020-03-31 Huawei Technologies Co., Ltd. Data processing method and apparatus
CN111625243A (en) * 2020-05-13 2020-09-04 北京字节跳动网络技术有限公司 Cross-language task processing method and device and electronic equipment
CN111679860A (en) * 2020-08-12 2020-09-18 上海冰鉴信息科技有限公司 Distributed information processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102169505A (en) * 2011-05-16 2011-08-31 苏州两江科技有限公司 Recommendation system building method based on cloud computing
US20120182891A1 (en) * 2011-01-19 2012-07-19 Youngseok Lee Packet analysis system and method using hadoop based parallel computation
CN102880510A (en) * 2012-09-24 2013-01-16 中国科学院对地观测与数字地球科学中心 Parallel programming method oriented to data intensive application based on multiple data architecture centers
CN103279390A (en) * 2012-08-21 2013-09-04 中国科学院信息工程研究所 Parallel processing system for small operation optimizing

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120182891A1 (en) * 2011-01-19 2012-07-19 Youngseok Lee Packet analysis system and method using hadoop based parallel computation
CN102169505A (en) * 2011-05-16 2011-08-31 苏州两江科技有限公司 Recommendation system building method based on cloud computing
CN103279390A (en) * 2012-08-21 2013-09-04 中国科学院信息工程研究所 Parallel processing system for small operation optimizing
CN102880510A (en) * 2012-09-24 2013-01-16 中国科学院对地观测与数字地球科学中心 Parallel programming method oriented to data intensive application based on multiple data architecture centers

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHRISTOPHER OLSTON et al.: "Automatic Optimization of Parallel Dataflow Programs", USENIX '08: 2008 USENIX Annual Technical Conference *
MICHAEL ISARD et al.: "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks", Proceedings of the 2nd European Conference on Computer Systems (EuroSys '07) *
WANG Peng et al.: "Advances in Programming Models for Data-Intensive Computing", Journal of Computer Research and Development *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063228A (en) * 2014-07-02 2014-09-24 中央民族大学 Pipeline data processing system
CN106155635B (en) * 2015-04-03 2020-09-18 北京奇虎科技有限公司 Data processing method and device
CN106155635A (en) * 2015-04-03 2016-11-23 北京奇虎科技有限公司 A kind of data processing method and device
US10606654B2 (en) 2015-04-29 2020-03-31 Huawei Technologies Co., Ltd. Data processing method and apparatus
CN104881491A (en) * 2015-06-11 2015-09-02 广州市云润大数据服务有限公司 Software development system based on big data platform
CN105404611A (en) * 2015-11-09 2016-03-16 南京大学 Matrix model based multi-calculation-engine automatic selection method
CN105404611B (en) * 2015-11-09 2018-07-20 南京大学 A kind of automatic selecting method of more computing engines based on matrix model
CN105335215A (en) * 2015-12-05 2016-02-17 中国科学院苏州生物医学工程技术研究所 Monte-Carlo simulation accelerating method and system based on cloud computing
CN105335215B (en) * 2015-12-05 2019-02-05 中国科学院苏州生物医学工程技术研究所 A kind of Monte Carlo simulation accelerated method and system based on cloud computing
CN105468770A (en) * 2015-12-09 2016-04-06 合一网络技术(北京)有限公司 Data processing method and system
CN105700998A (en) * 2016-01-13 2016-06-22 浪潮(北京)电子信息产业有限公司 Method and device for monitoring and analyzing performance of parallel programs
CN107025099A (en) * 2016-02-01 2017-08-08 北京大学 A kind of asynchronous figure based on deque's model calculates realization method and system
CN107025099B (en) * 2016-02-01 2019-12-27 北京大学 Asynchronous graph calculation implementation method and system based on double-queue model
CN105956049A (en) * 2016-04-26 2016-09-21 乐视控股(北京)有限公司 Data output control method and device
CN106021484A (en) * 2016-05-18 2016-10-12 中国电子科技集团公司第三十二研究所 Customizable multi-mode big data processing system based on memory calculation
CN107807983A (en) * 2017-10-30 2018-03-16 辽宁大学 A kind of parallel processing framework and design method for supporting extensive Dynamic Graph data query
CN107807983B (en) * 2017-10-30 2021-08-24 辽宁大学 Design method of parallel processing framework supporting large-scale dynamic graph data query
CN108255619B (en) * 2017-12-28 2019-09-17 新华三大数据技术有限公司 A kind of data processing method and device
CN108255619A (en) * 2017-12-28 2018-07-06 新华三大数据技术有限公司 A kind of data processing method and device
WO2020052241A1 (en) * 2018-09-11 2020-03-19 Huawei Technologies Co., Ltd. Heterogeneous scheduling for sequential compute dag
CN110440858A (en) * 2019-09-12 2019-11-12 武汉轻工大学 Grain condition monitoring system and method
CN111625243A (en) * 2020-05-13 2020-09-04 北京字节跳动网络技术有限公司 Cross-language task processing method and device and electronic equipment
CN111679860A (en) * 2020-08-12 2020-09-18 上海冰鉴信息科技有限公司 Distributed information processing method and device

Also Published As

Publication number Publication date
CN103488775B (en) 2016-08-10

Similar Documents

Publication Publication Date Title
CN103488775A (en) Computing system and computing method for big data processing
Gu et al. Liquid: Intelligent resource estimation and network-efficient scheduling for deep learning jobs on distributed GPU clusters
CN102541640B (en) Cluster GPU (graphic processing unit) resource scheduling system and method
US10469588B2 (en) Methods and apparatus for iterative nonspecific distributed runtime architecture and its application to cloud intelligence
CN110851237B (en) Container cross-isomerism cluster reconstruction method for domestic platform
CN105117286A (en) Task scheduling and pipelining executing method in MapReduce
Gu et al. Partitioning and offloading in smart mobile devices for mobile cloud computing: State of the art and future directions
CN104965689A (en) Hybrid parallel computing method and device for CPUs/GPUs
Desell et al. Malleable applications for scalable high performance computing
CN106293757A (en) Robotic system software's framework and its implementation and device
CN102193831B (en) Method for establishing hierarchical mapping/reduction parallel programming model
Yang et al. Reliable dynamic service chain scheduling in 5G networks
CN116011562A (en) Operator processing method, operator processing device, electronic device and readable storage medium
Płóciennik et al. Approaches to distributed execution of scientific workflows in kepler
CN103049305A (en) Multithreading method of dynamic code conversion of loongson multi-core central processing unit (CPU) simulation
CN102141917A (en) Method for realizing multi-service linkage based on IronPython script language
Li et al. Smart simulation cloud (simulation cloud 2.0)—the newly development of simulation cloud
Cui et al. A scheduling algorithm for multi-tenants instance-intensive workflows
Agarwal et al. Towards an MPI-like framework for the Azure cloud platform
Olaniyan et al. Multipoint synchronization for fog-controlled Internet of Things
Jiang et al. An optimized resource scheduling strategy for Hadoop speculative execution based on non-cooperative game schemes
Liu A Programming Model for the Cloud Platform
De Munck et al. Design and performance evaluation of a conservative parallel discrete event core for GES
CN112506496A (en) Method and system for building system-on-chip development environment
CN105468451A (en) Job scheduling system of computer cluster on the basis of high-throughput sequencing data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant