CN107463595A - A data processing method and system based on Spark

Publication number
CN107463595A
Authority
CN
China
Prior art keywords
operator
subjob
subtask
complicated
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710335307.0A
Other languages
Chinese (zh)
Inventor
木伟民
张云
李名扬
张明诚
王伟平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201710335307.0A
Publication of CN107463595A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems

Abstract

The invention discloses a data processing method and system based on Spark. The method is: 1) the user selects operators according to the requirements of the document to be processed, configures the parameters of the selected operators, and then establishes the connections between the selected operators, generating the XML file of a scene; the scene's XML file contains the XML content of each selected operator and the connections among the operators; 2) a corresponding directed acyclic graph (DAG) is generated from the scene's XML file; 3) the DAG is cut into a number of subtasks (subJobs) that can be executed in a distributed computing environment, and the subJobs obtained from the cut are executed under the Spark computing framework, thereby processing the document. The invention can interface with various kinds of heterogeneous data and improves the flexibility of data processing.

Description

A data processing method and system based on Spark
Technical field
The present invention relates to a data processing method and system based on Spark, and belongs to the field of computer software technology.
Background
Most existing big-data preprocessing systems are built on Hadoop. In Hadoop, intermediate results are stored in the HDFS file system, which incurs considerable extra overhead. Spark instead uses the RDD concept, which lets it keep data in memory transparently and greatly reduces disk reads and writes during data processing. Some other big-data preprocessing systems are built on Spark, but they are not general-purpose.
The present system has the following characteristics: it provides a large number of operator interfaces, and users can define their own scenes to process specific files accordingly; users can define custom operators as needed; the system is a further encapsulation of Spark, so users do not need to use the basic Spark API when defining operators; the system can move data to HDFS from the different data sources the user specifies; and the system can handle different types of files. The invention solves the technical problems of low efficiency and lack of generality in existing big-data preprocessing systems.
Most existing similar systems are not general-purpose: users can only use the function operators the system provides and cannot customize them to their own needs, so these systems cannot be applied to more flexible application scenarios, and all of them have problems, to a greater or lesser degree, with performance and scalability.
Summary of the invention
The object of the invention is to provide a data processing method and system based on Spark that can interface with various kinds of heterogeneous data.
The technical solution of the invention is:
A data processing method based on a distributed computing platform, comprising the following steps:
1) The user selects operators according to the requirements of the document to be processed, configures the parameters of the selected operators, and then establishes the connections between the selected operators, generating the XML file of a scene; the scene's XML file contains the XML content of each selected operator and the connections among the operators;
2) A corresponding directed acyclic graph (DAG) is generated from the scene's XML file;
3) The DAG is cut into a number of subtasks (subJobs) that can be executed in a distributed computing environment, and the subJobs obtained from the cut are executed under the Spark computing framework, thereby processing the document.
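Steps 1) and 2) can be illustrated with a minimal sketch. The patent does not publish the scene XML schema, so the element names (`scene`, `operator`, `connection`), the attributes (`id`, `class`, `from`, `to`) and the class names other than the idea of a `class` attribute are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical scene XML: element, attribute and class names are illustrative only.
SCENE_XML = """
<scene name="student-merge">
  <operator id="in1" class="AI.FileInput"/>
  <operator id="in2" class="AI.FileInput"/>
  <operator id="add_col" class="SC.AddColumn"/>
  <operator id="add_pts" class="SC.AddPoints"/>
  <operator id="merge" class="SC.MergeTables"/>
  <operator id="out" class="AO.TableOutput"/>
  <connection from="in1" to="add_col"/>
  <connection from="in2" to="add_pts"/>
  <connection from="add_col" to="merge"/>
  <connection from="add_pts" to="merge"/>
  <connection from="merge" to="out"/>
</scene>
"""

def build_dag(scene_xml):
    """Return (operators, successors) parsed from a scene XML string."""
    root = ET.fromstring(scene_xml.strip())
    # Map each operator id to its declared class.
    operators = {op.get("id"): op.get("class") for op in root.iter("operator")}
    successors = defaultdict(list)
    for edge in root.iter("connection"):
        src, dst = edge.get("from"), edge.get("to")
        if src not in operators or dst not in operators:
            raise ValueError(f"connection references unknown operator: {src} -> {dst}")
        successors[src].append(dst)
    return operators, dict(successors)

operators, successors = build_dag(SCENE_XML)
```

Here `successors` is an adjacency list keyed by upstream operator id; operators with no downstream connection simply have no entry.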
Further, the DAG is cut into subJobs as follows:
21) Read the scene's XML file, obtain the type of each operator, and determine whether any complex operator is present; a complex operator is an operator whose operation object is the complete data set;
22) If there is no complex operator, treat the whole scene as one subJob. If there is a complex operator, first treat each operator in the DAG as an independent subJob, then merge subJobs according to the set rules. Operators fall into two classes, adapter operators and computation operators: adapter operators comprise adapter input operators and adapter output operators, and computation operators comprise simple computation operators and complex computation operators. The set rules are:
1) a simple computation operator followed by a simple computation operator: merge
2) a simple computation operator followed by a complex computation operator: do not merge
3) a complex computation operator followed by a simple computation operator: do not merge
4) a complex computation operator followed by a complex computation operator: do not merge
5) an adapter input operator followed by a simple computation operator: merge
6) an adapter input operator followed by a complex computation operator: do not merge
7) a simple computation operator followed by an adapter output operator: merge
8) a complex computation operator followed by an adapter output operator: do not merge
23) For each subJob produced by step 22): if the end of the subJob is neither an adapter output operator nor a complex operator, add a sink operator at the end of the subJob, where the sink operator stores data into a Hive temporary table; if the start of the subJob is neither an adapter input operator nor a complex operator, add a scan operator at the start of the subJob, where the scan operator reads data from the Hive temporary table.
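The eight rules and the sink/scan step can be written down as a small lookup table. This is only a sketch: the kind labels, the function names and the choice to leave unlisted pairs unmerged are assumptions, not part of the patent text.

```python
# Operator kinds named in the text (the string labels are illustrative).
INPUT, OUTPUT, SIMPLE, COMPLEX = "adapter_input", "adapter_output", "simple", "complex"

# The eight rules: (upstream kind, downstream kind) -> merge?
MERGE_RULES = {
    (SIMPLE, SIMPLE): True,
    (SIMPLE, COMPLEX): False,
    (COMPLEX, SIMPLE): False,
    (COMPLEX, COMPLEX): False,
    (INPUT, SIMPLE): True,
    (INPUT, COMPLEX): False,
    (SIMPLE, OUTPUT): True,
    (COMPLEX, OUTPUT): False,
}

def should_merge(upstream, downstream):
    # Assumption: pairs not covered by the eight rules are left unmerged.
    return MERGE_RULES.get((upstream, downstream), False)

def wrap_subjob(kinds):
    """Add scan/sink markers to a non-empty subJob given as an ordered list of kinds."""
    wrapped = list(kinds)
    if wrapped[0] not in (INPUT, COMPLEX):
        wrapped.insert(0, "scan")   # scan reads from the Hive temporary table
    if wrapped[-1] not in (OUTPUT, COMPLEX):
        wrapped.append("sink")      # sink writes to the Hive temporary table
    return wrapped
```

For instance, a subJob consisting of an adapter input operator followed by a simple computation operator gets only a trailing sink, while a subJob made of a single complex operator is left unchanged.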
Further, in step 2), the DAG is validated to determine whether it contains a cycle, a sub-cycle, or a break; if any of these is found, execution stops and the result is fed back to the user's interface.
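The text does not say how this validation is carried out. One possible sketch uses Kahn's algorithm to detect cycles (which also covers a cycle confined to part of the graph) and treats a "break" as the graph not being connected when edge direction is ignored; both interpretations, and the function name, are assumptions.

```python
from collections import defaultdict, deque

def validate_dag(nodes, edges):
    """Return a list of problems found; an empty list means the graph passed.

    nodes is a list of operator ids, edges a list of (upstream, downstream) pairs.
    """
    problems = []
    succ, undirected = defaultdict(list), defaultdict(set)
    indegree = {n: 0 for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)
        indegree[dst] += 1
        undirected[src].add(dst)
        undirected[dst].add(src)

    # Kahn's algorithm: nodes that never reach in-degree 0 lie on or behind a cycle.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in succ[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if visited != len(nodes):
        problems.append("cycle")

    # Break check: ignoring direction, every operator must be reachable from the first.
    if nodes:
        seen, stack = {nodes[0]}, [nodes[0]]
        while stack:
            for neighbour in undirected[stack.pop()]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        if len(seen) != len(nodes):
            problems.append("break")
    return problems
```

A caller would stop execution and report to the interface whenever the returned list is non-empty.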
Further, in step 3), before a subJob is executed it is first scanned; if a Reduce operator is found during the scan, a ReduceSink operator is added before that operator; if none is found, nothing is done. The subJob is executed after the scan.
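A minimal sketch of this scan, assuming a subJob is represented as a list of (upstream, downstream) edges and that one ReduceSink is placed on each incoming edge of a Reduce operator, as the embodiment later describes. The node naming scheme is hypothetical.

```python
def insert_reduce_sinks(edges, reduce_ops):
    """Return a new edge list with a ReduceSink node before every Reduce operator."""
    rewritten = []
    for src, dst in edges:
        if dst in reduce_ops:
            sink = f"ReduceSink({src}->{dst})"   # hypothetical node name
            rewritten.append((src, sink))
            rewritten.append((sink, dst))
        else:
            rewritten.append((src, dst))         # no Reduce operator: leave the edge as is
    return rewritten

subjob_edges = [("add_col", "merge"), ("add_pts", "merge"), ("merge", "out")]
scanned = insert_reduce_sinks(subjob_edges, reduce_ops={"merge"})
```

When `reduce_ops` is empty the edge list comes back unchanged, matching the "nothing is done" case.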
A data processing system based on a distributed computing platform, characterized by comprising a management unit, an execution unit, and a computing unit, wherein:
The management unit lets the user select operators according to the requirements of the document to be processed, configure the parameters of the selected operators, and then establish the connections between the selected operators to generate the XML file of a scene; the scene's XML file contains the XML content of each selected operator and the connections among the operators;
The computing unit generates the corresponding directed acyclic graph (DAG) from the scene's XML file;
The execution unit cuts the DAG into subJobs that can be executed in a distributed computing environment and then submits the subJobs to the distributed computing platform for execution.
As shown in Fig. 1, the main processing flow of the system is:
First, according to their own document-processing needs, the user drags operators on the interface, configures operator parameters, and connects the operators to generate a scene. (Each operator has its own XML file; when operators are dragged to build a scene, the scene's XML file is generated from the connections among the operators in the scene. The scene's XML file contains the XML content of each operator and the connections among the operators.) When the scene runs, its XML file is submitted to the backend and parsed, and the corresponding DAG (directed acyclic graph) is generated from the operator connection order recorded in the scene's XML file.
The system controller then cuts the DAG into many subJobs according to the relevant rules and submits the subJobs to the executor, which runs them under the Spark computing framework, while real-time running status and results are fed back to the interface.
Finally, the processed files are published to HDFS so that downstream programs can analyze and mine them further.
The Spark-based offline data processing system provided by the invention can be divided into four parts: the management layer, the execution layer, the computing layer, and system monitoring and operations (O&M).
The main functional modules of each part are as follows:
(1) Management layer:
1) Interface:
Provides friendly user interaction; user information can be added, deleted, modified, and queried. The interface lists the information of each operator so that users can select and use operators easily. Within their own permissions, users can add, delete, modify, and query operators. While a scene is running, the interface shows its running status and reports the scene's progress to the user.
2) Process management:
Provides storage, control, and execution (either immediate or scheduled) for each scene. Scheduled execution is controlled by Cron.
3) User management
Provides management of platform user registration, deletion, and allocation of resources and permissions.
4) Operator management
Provides registration, update, and deletion of platform operators.
5) Resource management
Manages the computing and storage resources of each user.
6) Permission management
Manages each user's operator usage permissions, data access permissions, and execution permissions.
(2) Execution layer: this part parses the user-defined application-layer DAG and converts it into Spark Jobs, that is, tasks that can be executed in a distributed computing environment, submits them to the Spark framework for execution, and collects the Spark Jobs' runtime information.
1) Metadata:
Provides storage for operators, processes, and tasks.
2) Scheduler:
1. Parser
Provides XML parsing.
2. Controller
Controls the execution of tasks.
3) Executor
Receives execution requests from the controller and executes tasks; provides hot standby and load balancing.
(3) Computing layer: the system is built for big data. The user generates a scene by dragging operators, configuring operator parameters, and connecting operators, which yields an operator DAG. Combined with the Spark computing platform, the system implements data input, computation, and output. The invention is based on the Spark computing framework.
1) Spark
The Apache Spark computing engine.
2) HDFS
The Hadoop Distributed File System.
(4) System monitoring and O&M:
Provides monitoring of scene execution progress and operator running status, and provides operations and maintenance for the data preprocessing platform.
Compared with the prior art, the invention has the following advantages:
1. The system provides a large number of operators, and users can define their own scenes;
2. Users can define custom operators as needed;
3. The system is a further encapsulation of Spark, so users do not need to use the basic Spark API;
4. Because the system uses a DAG, it has the extensibility and flexibility of directed acyclic graphs.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Detailed description
The invention is described in further detail below with reference to a specific embodiment, which does not limit the scope of the invention in any way.
Two student table files are processed. Table 1 has the columns id, name, and Chinese score; Table 2 has the columns id, name, and math score. The required final result is: add a "grade" column to the Table 1 file, with every value set to second grade; add 3 points to every student's math score in the Table 2 file; and finally merge the two tables into one.
The user draws the operators on the interface: two adapter input operators, one operator implementing the "add column" function, one operator implementing the "add points" function, one operator implementing the "merge two tables" function, and one adapter output operator.
The user configures the parameters of each operator on the interface. "Add column" operator: the added column is "grade" and its content is "two". "Add points" operator: the column to increase is "math score" and the increment is "3". Adapter input operator 1: the extracted file is Table 1. Adapter input operator 2: the extracted file is Table 2. Merge operator: the two tables are merged on id. Adapter output operator: sets the name of the output table.
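The effect of the three computation operators in this example can be mimicked in plain Python. This is not the patent's operator implementation and does not use Spark; the sample rows, the English column names and the function names are made up for illustration.

```python
# Made-up sample rows; the columns follow the example in the text.
table1 = [{"id": 1, "name": "Li", "chinese": 88}, {"id": 2, "name": "Wang", "chinese": 92}]
table2 = [{"id": 1, "name": "Li", "math": 75}, {"id": 2, "name": "Wang", "math": 81}]

def add_column(rows, column, value):
    """'Add column' operator: append a constant-valued column to every row."""
    return [{**row, column: value} for row in rows]

def add_points(rows, column, points):
    """'Add points' operator: increase a numeric column by a fixed amount."""
    return [{**row, column: row[column] + points} for row in rows]

def merge_on(left, right, key):
    """'Merge two tables' operator: join rows that share the same key value."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]

result = merge_on(add_column(table1, "grade", "two"), add_points(table2, "math", 3), "id")
```

Each function returns new rows rather than mutating its input, which mirrors how a chain of operators passes data downstream.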
The user connects the operators with lines to establish their upstream and downstream relationships. The inputs and outputs of each operator are marked in the operator's XML file, and the fact that the output of one operator is the input of another determines the connections between operators in the XML. The user then clicks Save, which generates the DAG scene. When the user clicks Execute, the scene's XML file is sent to the backend and parsed to obtain the upstream and downstream relationships among all operators in the scene. The backend program validates the DAG to determine whether it contains a cycle, a sub-cycle, or a break; if any of these is found, the program stops and returns the result to the front-end interface; if the DAG is normal, execution continues. The controller then cuts, merges, and combines the DAG to generate subJobs; in this example there is exactly one subJob. After generating the subJob, the controller submits it to the executor. The executor first scans the subJob; the scan has two possible outcomes: if a Reduce operator is found, a ReduceSink operator is added before it; if none is found, nothing is done. In this example the Reduce operator is the merge operator, so a ReduceSink operator is added between the merge operator and each of its parent nodes. The subJob is then executed. During processing, progress is displayed on the interface in real time; when processing finishes, the result is shown on the interface, and the user can retrieve the processed file from HDFS.
The key technical points of the invention are:
1. Facing heterogeneous data, how to process data files of different formats
Faced with heterogeneous data, the system parses each kind with a different method so that files of every type are ultimately parsed into one unified format, which makes it easy for the system to recognize and process them. In the invention, the input operators convert all files into the Avro format.
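The idea of one parser per source format feeding a single unified representation can be sketched as follows. The format labels and function names are assumptions, and the unified form here is a plain list of string-valued dictionaries; serializing to Avro, as the invention does, would need a third-party library and is left out.

```python
import csv
import io
import json

def parse_csv(text):
    """Parse CSV text with a header row into a list of dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def parse_json_lines(text):
    """Parse one JSON object per line into a list of dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# One parser per supported input format (hypothetical format labels).
PARSERS = {"csv": parse_csv, "jsonl": parse_json_lines}

def to_unified_records(text, fmt):
    """Dispatch to a format-specific parser and return uniform string-valued records."""
    records = PARSERS[fmt](text)
    return [{key: str(value) for key, value in record.items()} for record in records]

csv_rows = to_unified_records("id,name\n1,Li\n", "csv")
json_rows = to_unified_records('{"id": 1, "name": "Li"}\n', "jsonl")
```

Both inputs end up as the same record shape, so downstream operators need to understand only one format.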
2. How to convert the scene built by the user into a program that can run on Spark
Determine whether the DAG scene built by the user (the whole scene is one Job) contains a complex operator (complex operator names begin with "CO.CO"): read the scene's XML file and obtain the class attribute of each operator one by one. The class attribute shows the operator type. The operator type can also be read from the description.xml in the operator's registration package. A complex operator is an operator whose operation object is the complete data set.
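A short sketch of this check, assuming the scene XML stores each operator as an `operator` element with `id` and `class` attributes. Only the "CO.CO" prefix comes from the text; the element names and the other class names are hypothetical.

```python
import xml.etree.ElementTree as ET

COMPLEX_PREFIX = "CO.CO"  # naming convention stated in the text

def complex_operators(scene_xml):
    """Return the ids of operators whose class attribute marks them as complex."""
    root = ET.fromstring(scene_xml.strip())
    return [op.get("id") for op in root.iter("operator")
            if (op.get("class") or "").startswith(COMPLEX_PREFIX)]

# Hypothetical scene: only the middle operator follows the complex naming convention.
scene = """
<scene>
  <operator id="in1" class="AI.FileInput"/>
  <operator id="dedup" class="CO.CO.Deduplicate"/>
  <operator id="out" class="AO.TableOutput"/>
</scene>
"""

found = complex_operators(scene)
is_single_subjob = not found   # no complex operator: the whole scene is one subJob
```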
If the scene contains no complex operator, the whole scene is a single subJob and is sent directly to the executor.
If the scene contains a complex operator, three steps are applied to the Job. First, each operator in the DAG is cut into an independent subJob. Second, the subJobs covered by the eight rules below are merged into one subJob. Operators fall into two classes, adapter operators and computation operators: adapter operators comprise adapter input operators and adapter output operators, and computation operators comprise simple computation operators and complex computation operators (complex operators for short). Third, a sink operator (which lands data) or a scan operator (which picks data up) is added to some subJobs: if the end of a subJob is neither an adapter output operator nor a complex operator, a sink is added at the end of that subJob (sink stores data into a Hive temporary table); if the start of a subJob is neither an adapter input operator nor a complex operator, a scan is added at the start of that subJob (scan reads data from the Hive temporary table). At this point the subJobs have been fully constructed.
The eight rules are as follows:
1) a simple computation operator followed by a simple computation operator: merge
2) a simple computation operator followed by a complex computation operator: do not merge
3) a complex computation operator followed by a simple computation operator: do not merge
4) a complex computation operator followed by a complex computation operator: do not merge
5) an adapter input operator followed by a simple computation operator: merge
6) an adapter input operator followed by a complex computation operator: do not merge
7) a simple computation operator followed by an adapter output operator: merge
8) a complex computation operator followed by an adapter output operator: do not merge
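The first two steps, cutting every operator into its own subJob and then merging along the edges the rules allow, can be sketched with a union-find structure. The patent does not name a data structure, so union-find is a choice made here; the kind labels and operator ids are also assumptions. Only the three "merge" pairs from the eight rules are listed, since every other pair stays separate.

```python
# The three pairs the eight rules mark as "merge": (upstream kind, downstream kind).
MERGEABLE = {("simple", "simple"), ("adapter_input", "simple"), ("simple", "adapter_output")}

def build_subjobs(kinds, edges):
    """kinds: operator id -> kind; edges: list of (upstream, downstream) ids."""
    parent = {op: op for op in kinds}          # step 1: every operator is its own subJob

    def find(op):
        while parent[op] != op:
            parent[op] = parent[parent[op]]    # path halving keeps lookups short
            op = parent[op]
        return op

    for src, dst in edges:                     # step 2: merge along edges the rules allow
        if (kinds[src], kinds[dst]) in MERGEABLE:
            parent[find(src)] = find(dst)

    groups = {}
    for op in kinds:
        groups.setdefault(find(op), []).append(op)
    return sorted(sorted(group) for group in groups.values())

kinds = {"in1": "adapter_input", "clean": "simple", "dedup": "complex",
         "fmt": "simple", "out": "adapter_output"}
edges = [("in1", "clean"), ("clean", "dedup"), ("dedup", "fmt"), ("fmt", "out")]
subjobs = build_subjobs(kinds, edges)
```

In this hypothetical chain the complex operator `dedup` stays alone, which is why the subJob before it would need a sink and the subJob after it a scan.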
After the subJobs have been cut, the partial order of the whole scene (Job) is known. Each subJob is sent to the executor and scanned; if a Reduce operator is found during the scan, a ReduceSink operator is added before it. A second scan then builds the execution flow of the Transformations from the dependencies between RDDs (Resilient Distributed Datasets), finally forming a Job that can be executed on Spark, that is, a Spark Job. For example, map(func) applies func to each element of the RDD on which map is called and returns a new RDD; filter(func) applies func to each element of the RDD on which filter is called and returns an RDD containing the elements for which func returns true.
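How a chain of transformations is built up through parent dependencies can be shown with a toy class. `MiniRDD` is not Spark's RDD and is not distributed; it is a hypothetical stand-in that only records lineage and evaluates lazily, to make the `map` and `filter` semantics concrete.

```python
class MiniRDD:
    """Toy stand-in for an RDD: records its parent and a transformation, computes lazily."""

    def __init__(self, source=None, parent=None, transform=None):
        self._source, self._parent, self._transform = source, parent, transform

    def map(self, func):
        # Returns a new MiniRDD that depends on this one; nothing is computed yet.
        return MiniRDD(parent=self, transform=lambda items: (func(x) for x in items))

    def filter(self, func):
        # Keeps only the elements for which func returns a true value.
        return MiniRDD(parent=self, transform=lambda items: (x for x in items if func(x)))

    def collect(self):
        # Walk the dependency chain back to the source, then apply transformations in order.
        if self._parent is None:
            return list(self._source)
        return list(self._transform(self._parent.collect()))

scores = MiniRDD(source=[75, 81, 58])
passed = scores.map(lambda s: s + 3).filter(lambda s: s >= 80)
```

Calling `passed.collect()` is the point at which the recorded chain is actually executed, loosely analogous to an action triggering a job in Spark.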
At startup, following the Job's partial order, all nodes in the DAG that have no predecessor are started at the same time, which achieves parallel execution; when a node finishes, it is deleted from the graph, and the process repeats until execution ends.
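This scheduling loop can be sketched with a thread pool. As a simplification, the sketch waits for a whole wave of predecessor-free nodes to finish before removing them and computing the next wave, whereas the text removes each node as soon as it finishes. The function name and the `run` callback are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def run_in_waves(nodes, edges, run):
    """Run every node after its predecessors; nodes with no predecessor run together."""
    remaining = set(nodes)
    pending_edges = set(edges)
    waves = []
    with ThreadPoolExecutor() as pool:
        while remaining:
            blocked = {dst for _, dst in pending_edges}
            ready = sorted(remaining - blocked)       # nodes with no unfinished predecessor
            if not ready:
                raise ValueError("graph contains a cycle")
            list(pool.map(run, ready))                # wait for the whole wave to finish
            waves.append(ready)
            remaining -= set(ready)                   # delete finished nodes from the graph
            pending_edges = {(s, d) for s, d in pending_edges if s in remaining}
    return waves

finished = []
waves = run_in_waves(
    ["in1", "in2", "add_col", "add_pts", "merge", "out"],
    [("in1", "add_col"), ("in2", "add_pts"), ("add_col", "merge"),
     ("add_pts", "merge"), ("merge", "out")],
    finished.append,
)
```

The returned `waves` list shows which nodes were eligible to run in parallel at each round.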

Claims (7)

1. A data processing method based on a distributed computing platform, comprising the following steps:
1) the user selects operators according to the requirements of the document to be processed, configures the parameters of the selected operators, and then establishes the connections between the selected operators, generating the XML file of a scene; the scene's XML file contains the XML content of each selected operator and the connections among the operators;
2) a corresponding directed acyclic graph (DAG) is generated from the scene's XML file;
3) the DAG is cut into a number of subtasks (subJobs) that can be executed in a distributed computing environment, and the subJobs obtained from the cut are executed under the Spark computing framework, thereby processing the document.
2. The method of claim 1, characterized in that the DAG is cut into subJobs as follows:
21) read the scene's XML file, obtain the type of each operator, and determine whether any complex operator is present; a complex operator is an operator whose operation object is the complete data set;
22) if there is no complex operator, treat the whole scene as one subJob; if there is a complex operator, first treat each operator in the DAG as an independent subJob, then merge subJobs according to the set rules; operators fall into two classes, adapter operators and computation operators; adapter operators comprise adapter input operators and adapter output operators, and computation operators comprise simple computation operators and complex computation operators;
the set rules are:
1) a simple computation operator followed by a simple computation operator: merge
2) a simple computation operator followed by a complex computation operator: do not merge
3) a complex computation operator followed by a simple computation operator: do not merge
4) a complex computation operator followed by a complex computation operator: do not merge
5) an adapter input operator followed by a simple computation operator: merge
6) an adapter input operator followed by a complex computation operator: do not merge
7) a simple computation operator followed by an adapter output operator: merge
8) a complex computation operator followed by an adapter output operator: do not merge
23) for each subJob produced by step 22): if the end of the subJob is neither an adapter output operator nor a complex operator, add a sink operator at the end of the subJob, where the sink operator stores data into a Hive temporary table; if the start of the subJob is neither an adapter input operator nor a complex operator, add a scan operator at the start of the subJob, where the scan operator reads data from the Hive temporary table.
3. The method of claim 1 or 2, characterized in that in step 2) the DAG is validated to determine whether it contains a cycle, a sub-cycle, or a break; if any of these is found, execution stops and the result is fed back to the user's interface.
4. The method of claim 1 or 2, characterized in that in step 3), before a subJob is executed, it is first scanned; if a Reduce operator is found during the scan, a ReduceSink operator is added before that operator; if none is found, nothing is done; the subJob is executed after the scan.
5. A data processing system based on a distributed computing platform, characterized by comprising a management unit, an execution unit, and a computing unit, wherein:
the management unit lets the user select operators according to the requirements of the document to be processed, configure the parameters of the selected operators, and then establish the connections between the selected operators to generate the XML file of a scene; the scene's XML file contains the XML content of each selected operator and the connections among the operators;
the computing unit generates the corresponding directed acyclic graph (DAG) from the scene's XML file;
the execution unit cuts the DAG into subJobs that can be executed in a distributed computing environment and then submits the subJobs to the distributed computing platform for execution.
6. The system of claim 5, characterized in that the computing unit reads the scene's XML file, obtains the type of each operator, and determines whether any complex operator is present, a complex operator being an operator whose operation object is the complete data set; if there is no complex operator, the whole scene is treated as one subJob; if there is a complex operator, each operator in the DAG is first treated as an independent subJob, and subJobs are then merged according to the set rules; operators fall into two classes, adapter operators and computation operators; adapter operators comprise adapter input operators and adapter output operators, and computation operators comprise simple computation operators and complex computation operators; the set rules are:
1) a simple computation operator followed by a simple computation operator: merge
2) a simple computation operator followed by a complex computation operator: do not merge
3) a complex computation operator followed by a simple computation operator: do not merge
4) a complex computation operator followed by a complex computation operator: do not merge
5) an adapter input operator followed by a simple computation operator: merge
6) an adapter input operator followed by a complex computation operator: do not merge
7) a simple computation operator followed by an adapter output operator: merge
8) a complex computation operator followed by an adapter output operator: do not merge
then, for each subJob produced by the above processing: if the end of the subJob is neither an adapter output operator nor a complex operator, a sink operator is added at the end of the subJob, where the sink operator stores data into a Hive temporary table; if the start of the subJob is neither an adapter input operator nor a complex operator, a scan operator is added at the start of the subJob, where the scan operator reads data from the Hive temporary table.
7. The system of claim 5 or 6, characterized in that the execution unit scans each subJob obtained from the cut; if a Reduce operator is found, a ReduceSink operator is added before that operator; the subJob is then submitted to the distributed computing platform for execution.
CN201710335307.0A 2017-05-12 2017-05-12 A kind of data processing method and system based on Spark Pending CN107463595A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710335307.0A CN107463595A (en) 2017-05-12 2017-05-12 A kind of data processing method and system based on Spark

Publications (1)

Publication Number Publication Date
CN107463595A true CN107463595A (en) 2017-12-12

Family

ID=60543751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710335307.0A Pending CN107463595A (en) 2017-05-12 2017-05-12 A kind of data processing method and system based on Spark

Country Status (1)

Country Link
CN (1) CN107463595A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062251A (en) * 2018-01-09 2018-05-22 福建星瑞格软件有限公司 A kind of server resource recovery method and computer equipment
CN108628605A (en) * 2018-04-28 2018-10-09 百度在线网络技术(北京)有限公司 Stream data processing method, device, server and medium
CN108733832A (en) * 2018-05-28 2018-11-02 北京阿可科技有限公司 The distributed storage method of directed acyclic graph
CN108984155A (en) * 2018-05-17 2018-12-11 阿里巴巴集团控股有限公司 Flow chart of data processing setting method and device
CN109063056A (en) * 2018-07-20 2018-12-21 阿里巴巴集团控股有限公司 A kind of data query method, system and terminal device
CN109117141A (en) * 2018-09-04 2019-01-01 深圳市木瓜移动科技有限公司 Simplify method, apparatus, the electronic equipment, computer readable storage medium of programming
CN109445926A (en) * 2018-11-09 2019-03-08 杭州玳数科技有限公司 Data task dispatching method and data task dispatch system
CN110297632A (en) * 2019-06-12 2019-10-01 百度在线网络技术(北京)有限公司 Code generating method and device
CN110532447A (en) * 2019-08-29 2019-12-03 上海云从汇临人工智能科技有限公司 A kind of business data processing method, device, medium and equipment
CN110851283A (en) * 2019-11-14 2020-02-28 百度在线网络技术(北京)有限公司 Resource processing method and device and electronic equipment
CN110888720A (en) * 2019-10-08 2020-03-17 北京百度网讯科技有限公司 Task processing method and device, computer equipment and storage medium
CN111291106A (en) * 2020-05-13 2020-06-16 成都四方伟业软件股份有限公司 Efficient flow arrangement method and system for ETL system
CN111625243A (en) * 2020-05-13 2020-09-04 北京字节跳动网络技术有限公司 Cross-language task processing method and device and electronic equipment
CN112130851A (en) * 2020-08-04 2020-12-25 中科天玑数据科技股份有限公司 Modeling method for artificial intelligence, electronic equipment and storage medium
CN112632113A (en) * 2020-12-31 2021-04-09 北京九章云极科技有限公司 Operator management method and operator management system
CN113342346A (en) * 2021-05-18 2021-09-03 北京百度网讯科技有限公司 Operator registration method, device, equipment and storage medium of deep learning framework

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104052811A (en) * 2014-06-17 2014-09-17 华为技术有限公司 Service scheduling method and device and system
CN104360903A (en) * 2014-11-18 2015-02-18 北京美琦华悦通讯科技有限公司 Method for realizing task data decoupling in spark operation scheduling system
CN105354089A (en) * 2015-10-15 2016-02-24 北京航空航天大学 Streaming data processing model and system supporting iterative calculation
CN105354242A (en) * 2015-10-15 2016-02-24 北京航空航天大学 Distributed data processing method and device
CN105426504A (en) * 2015-11-27 2016-03-23 陕西艾特信息化工程咨询有限责任公司 Distributed data analysis processing method based on memory computation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
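Yin Rong (殷荣): "Design and Implementation of an Offline Data Processing Engine Based on the DAG Model" (《基于DAG模型的离线数据处理引擎的设计与实现》), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *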

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062251A (en) * 2018-01-09 2018-05-22 福建星瑞格软件有限公司 Server resource recovery method and computer equipment
CN108628605A (en) * 2018-04-28 2018-10-09 百度在线网络技术(北京)有限公司 Stream data processing method, device, server and medium
CN108984155A (en) * 2018-05-17 2018-12-11 阿里巴巴集团控股有限公司 Data processing flow setting method and device
CN108984155B (en) * 2018-05-17 2021-09-07 创新先进技术有限公司 Data processing flow setting method and device
CN108733832A (en) * 2018-05-28 2018-11-02 北京阿可科技有限公司 Distributed storage method for directed acyclic graphs
CN108733832B (en) * 2018-05-28 2019-04-30 北京阿可科技有限公司 Distributed storage method for directed acyclic graphs
CN109063056A (en) * 2018-07-20 2018-12-21 阿里巴巴集团控股有限公司 Data query method, system and terminal device
CN109117141A (en) * 2018-09-04 2019-01-01 深圳市木瓜移动科技有限公司 Method, device, electronic equipment and computer readable storage medium for simplifying programming
CN109117141B (en) * 2018-09-04 2021-09-24 深圳市木瓜移动科技有限公司 Method, device, electronic equipment and computer readable storage medium for simplifying programming
CN109445926A (en) * 2018-11-09 2019-03-08 杭州玳数科技有限公司 Data task dispatching method and data task dispatch system
CN110297632A (en) * 2019-06-12 2019-10-01 百度在线网络技术(北京)有限公司 Code generating method and device
CN110532447A (en) * 2019-08-29 2019-12-03 上海云从汇临人工智能科技有限公司 Business data processing method, device, medium and equipment
CN110888720A (en) * 2019-10-08 2020-03-17 北京百度网讯科技有限公司 Task processing method and device, computer equipment and storage medium
CN110851283A (en) * 2019-11-14 2020-02-28 百度在线网络技术(北京)有限公司 Resource processing method and device and electronic equipment
CN111625243A (en) * 2020-05-13 2020-09-04 北京字节跳动网络技术有限公司 Cross-language task processing method and device and electronic equipment
CN111291106A (en) * 2020-05-13 2020-06-16 成都四方伟业软件股份有限公司 Efficient flow arrangement method and system for ETL system
CN112130851A (en) * 2020-08-04 2020-12-25 中科天玑数据科技股份有限公司 Modeling method for artificial intelligence, electronic equipment and storage medium
CN112130851B (en) * 2020-08-04 2022-04-15 中科天玑数据科技股份有限公司 Modeling method for artificial intelligence, electronic equipment and storage medium
CN112632113A (en) * 2020-12-31 2021-04-09 北京九章云极科技有限公司 Operator management method and operator management system
CN113342346A (en) * 2021-05-18 2021-09-03 北京百度网讯科技有限公司 Operator registration method, device, equipment and storage medium of deep learning framework
US11625248B2 (en) 2021-05-18 2023-04-11 Beijing Baidu Netcom Science Technology Co., Ltd. Operator registration method and apparatus for deep learning framework, device and storage medium

Similar Documents

Publication Publication Date Title
CN107463595A (en) A kind of data processing method and system based on Spark
CN105550268B (en) Big data process modeling analysis engine
US10764370B2 (en) Hybrid cloud migration delay risk prediction engine
US6799314B2 (en) Work flow management method and work flow management system of controlling a work flow
CN106022007B (en) Cloud platform system and method for biological omics big data computing
US10769147B2 (en) Batch data query method and apparatus
US8560636B2 (en) Methods and systems for providing a virtual network process context for network participant processes in a networked business process
CA2845059C (en) Test script generation system
CN110168518A (en) User interface for preparing and curating data for subsequent analysis
CN105893509B (en) Labeling and interpretation system and method for big data analysis models
US20070083875A1 (en) Method of delegating activity in service oriented architectures using continuations
US20120054335A1 (en) Methods and systems for managing quality of services for network participants in a networked business process
WO2016165321A1 (en) Method and apparatus for establishing requirement meta model for high-speed train
US20110126199A1 (en) Method and Apparatus for Communicating During Automated Data Processing
US20070106515A1 (en) Automated interactive statistical call visualization using abstractions stack model framework
CN116560626A (en) Data processing method, system, equipment and storage medium based on custom rules
US8819619B2 (en) Method and system for capturing user interface structure in a model based software system
CN109284324A (en) Scheduling device for big data workflow tasks based on the Apache Oozie framework
CN113568604B (en) Method and device for updating wind control strategy and computer readable storage medium
CN106294185A (en) Automated test framework and method based on a five-layer architecture
CN112686580A (en) Workflow definition method and system capable of customizing flow
WO2022253165A1 (en) Scheduling method, system, server and computer readable storage medium
CN104660697B (en) Sensor network service composition method based on Kepler scientific workflow
US11586643B2 (en) Enabling dynamic data capture with database objects
CN111160403B (en) API (application program interface) multiplexing discovery method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2017-12-12