CN107463595A - A Spark-based data processing method and system - Google Patents
A Spark-based data processing method and system
- Publication number
- CN107463595A (application CN201710335307.0A)
- Authority
- CN
- China
- Prior art keywords
- operator
- subjob
- subtask
- complex
- scene
- Legal status (as assumed by Google Patents; not a legal conclusion)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/21—Design, administration or maintenance of databases
- G06F16/25—Integrating or interfacing systems involving database management systems
Abstract
The invention discloses a Spark-based data processing method and system. The method comprises: 1) a user selects operators according to the requirements of the file to be processed, configures the parameters of the selected operators, and then establishes the connections between the selected operators to generate an XML file of a scene, the scene's XML file containing the XML content of each selected operator and the connections between the operators; 2) a corresponding directed acyclic graph (DAG) is generated from the scene's XML file; 3) the DAG is cut into several subtasks (subJobs) that can be executed in a distributed computing environment, and the resulting subJobs are executed under the Spark computing framework to process the file. The invention can interface with diverse heterogeneous data and improves the flexibility of data processing.
Description
Technical field
The present invention relates to a Spark-based data processing method and system and belongs to the field of computer software technology.
Background art
Most existing big-data preprocessing systems are developed on Hadoop. Hadoop stores intermediate results in the HDFS file system, which incurs considerable extra overhead. Spark, by contrast, is built on the RDD (Resilient Distributed Dataset) abstraction, which lets it keep data transparently in memory and greatly reduces disk reads and writes during data processing. Some big-data preprocessing systems have also been developed on Spark, but they are not general-purpose.
The present system has the following characteristics: it provides a large number of operator interfaces, and users can define custom scenes to process specific files; users can define their own operators as needed; the system further encapsulates Spark, so users do not need to use Spark's basic API when defining operators; the system can move data from different user-specified data sources into HDFS; and the system can handle different types of files. The invention solves the technical problems of low efficiency and lack of generality in existing big-data preprocessing systems.
Most existing similar systems lack generality: users can only use the functional operators the system provides and cannot customize them to their own needs, so these systems cannot be applied to flexible application scenarios, and all of them have performance and scalability problems to some degree.
Summary of the invention
The object of the invention is to provide a Spark-based data processing method and system that can interface with diverse heterogeneous data.
The technical solution of the invention is as follows:
A data processing method based on a distributed computing platform comprises the following steps:
1) The user selects operators according to the requirements of the file to be processed and configures the parameters of the selected operators, then establishes the connections between the selected operators to generate the XML file of a scene; the scene's XML file contains the XML content of each selected operator and the connections between the operators;
2) A corresponding directed acyclic graph (DAG) is generated from the scene's XML file;
3) The DAG is cut into several subtasks (subJobs) that can be executed in a distributed computing environment, and the resulting subJobs are executed under the Spark computing framework, thereby processing the file.
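To make steps 1) and 2) concrete, here is a minimal sketch, assuming a hypothetical scene XML layout in which each `<operator>` element carries an `id` and a `class` attribute plus `<input>`/`<output>` children naming the data sets it reads and writes; the patent does not publish its schema. An edge is derived wherever one operator's output is another operator's input, and Kahn's algorithm then yields an execution order. All ids and class names are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque

# Hypothetical scene XML: element and attribute names are illustrative assumptions.
SCENE_XML = """
<scene name="students">
  <operator id="in1" class="IN.Input"><output>t1</output></operator>
  <operator id="in2" class="IN.Input"><output>t2</output></operator>
  <operator id="addCol" class="SO.AddColumn"><input>t1</input><output>t1g</output></operator>
  <operator id="addPts" class="SO.AddPoints"><input>t2</input><output>t2p</output></operator>
  <operator id="merge" class="SO.MergeById"><input>t1g</input><input>t2p</input><output>m</output></operator>
  <operator id="out" class="OUT.Output"><input>m</input></operator>
</scene>
"""


def parse_scene(xml_text):
    """Return (operator classes, adjacency list) built from a scene XML string."""
    root = ET.fromstring(xml_text.strip())
    classes, producers, consumers = {}, {}, defaultdict(list)
    for op in root.iter("operator"):
        op_id = op.get("id")
        classes[op_id] = op.get("class")
        for out in op.findall("output"):
            producers[out.text] = op_id
        for inp in op.findall("input"):
            consumers[inp.text].append(op_id)
    # An edge A -> B exists when a data set written by A is read by B.
    edges = defaultdict(list)
    for data_set, src in producers.items():
        for dst in consumers.get(data_set, []):
            edges[src].append(dst)
    return classes, {op_id: edges[op_id] for op_id in classes}


def topological_order(adj):
    """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
    indegree = {n: 0 for n in adj}
    for targets in adj.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for t in adj[n]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    if len(order) != len(adj):
        raise ValueError("scene graph contains a cycle")
    return order


classes, dag = parse_scene(SCENE_XML)
print(topological_order(dag))
```

Deriving edges from matching input and output names, rather than storing them separately, means the DAG stays consistent with whatever each operator declares it reads and writes.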
Further, the method for directed acyclic graph DAG being cut into some subtask subJob is:
21) Read the scene's XML file, obtain the type of each operator, and determine whether a complex operator is present, where a complex operator is an operator whose operation object is the full data set;
22) If there is no complex operator, the whole scene is treated as one subJob. If there is a complex operator, each operator in the DAG is first treated as an independent subJob, and the subJobs are then merged according to set rules. Operators fall into two classes: adaptation operators and computation operators. Adaptation operators include adaptation input operators and adaptation output operators; computation operators include simple computation operators and complex computation operators. The set rules are:
1) A simple computation operator connected to a simple computation operator: merge
2) A simple computation operator connected to a complex computation operator: do not merge
3) A complex computation operator connected to a simple computation operator: do not merge
4) A complex computation operator connected to a complex computation operator: do not merge
5) An adaptation input operator connected to a simple computation operator: merge
6) An adaptation input operator connected to a complex computation operator: do not merge
7) A simple computation operator connected to an adaptation output operator: merge
8) A complex computation operator connected to an adaptation output operator: do not merge
23) For each subJob produced by step 22): if the end of the subJob is not an adaptation output operator or a complex operator, a sink operator is appended to the end of the subJob; the sink operator stores data in a Hive temporary table. If the top of the subJob is not an adaptation input operator or a complex operator, a scan operator is added to the top of the subJob; the scan operator reads data from a Hive temporary table.
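The merge rules in step 22) amount to a lookup table over (upstream kind, downstream kind) pairs. The sketch below is an illustration, not the patent's implementation: it starts with every operator as its own subJob and unions two subJobs across an edge only when rule 1), 5) or 7) allows it. Pairs the eight rules do not mention are assumed not to merge, and the kind labels and operator names are hypothetical.

```python
# Operator kinds named in the rules (labels are illustrative).
IN, OUT, SIMPLE, COMPLEX = "in", "out", "simple", "complex"

# (upstream kind, downstream kind) pairs that rules 1), 5) and 7) allow to merge.
# Every other pair, including those forbidden by rules 2)-4), 6) and 8), stays split.
MERGEABLE = {(SIMPLE, SIMPLE), (IN, SIMPLE), (SIMPLE, OUT)}


def split_into_subjobs(kinds, edges):
    """Group operators into subJobs.

    kinds: {operator: kind}; edges: [(upstream, downstream)].
    Returns a sorted list of subJobs, each a sorted list of operator names.
    """
    # A scene without complex operators is a single subJob.
    if COMPLEX not in kinds.values():
        return [sorted(kinds)]

    parent = {op: op for op in kinds}  # union-find: every operator starts alone

    def find(op):
        while parent[op] != op:
            parent[op] = parent[parent[op]]  # path halving
            op = parent[op]
        return op

    for src, dst in edges:
        if (kinds[src], kinds[dst]) in MERGEABLE:
            parent[find(src)] = find(dst)

    groups = {}
    for op in kinds:
        groups.setdefault(find(op), []).append(op)
    return sorted(sorted(group) for group in groups.values())


kinds = {"in": IN, "clean": SIMPLE, "trim": SIMPLE, "sort": COMPLEX, "out": OUT}
edges = [("in", "clean"), ("clean", "trim"), ("trim", "sort"), ("sort", "out")]
print(split_into_subjobs(kinds, edges))  # [['clean', 'in', 'trim'], ['out'], ['sort']]
```

One caveat: merging across edges can, in some graphs, make the subJob graph cyclic (for example, when a simple operator feeds both a complex operator and, directly, a simple operator downstream of that complex operator). The sketch does not check for this, and the patent text does not say how such cases are handled.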
Further, in step 2), the DAG is checked to determine whether it contains a cycle, a sub-cycle, or a break; if any of these is found, execution stops and the result is fed back to the user's interface.
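A minimal sketch of the step-2) check, assuming "cycle" means any directed cycle (including a self-loop) and "break" means the graph is not connected when edge directions are ignored. The patent does not define how a sub-cycle differs from a cycle, so this sketch reports both simply as a cycle. The function names are hypothetical.

```python
from collections import deque


def has_cycle(nodes, succ):
    """Iterative three-colour DFS: 0 = unvisited, 1 = on the current path, 2 = finished."""
    colour = {n: 0 for n in nodes}
    for start in nodes:
        if colour[start]:
            continue
        colour[start] = 1
        stack = [(start, iter(succ[start]))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:            # every child explored
                colour[node] = 2
                stack.pop()
            elif colour[child] == 1:     # back edge (or self-loop): cycle found
                return True
            elif colour[child] == 0:
                colour[child] = 1
                stack.append((child, iter(succ[child])))
    return False


def is_connected(nodes, edges):
    """True if every node is reachable from the first one, ignoring edge direction."""
    if not nodes:
        return True
    neighbours = {n: set() for n in nodes}
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, queue = {nodes[0]}, deque([nodes[0]])
    while queue:
        for m in neighbours[queue.popleft()]:
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return len(seen) == len(nodes)


def validate_scene(nodes, edges):
    """Return the list of problems found; an empty list means the DAG may run."""
    nodes = list(nodes)
    succ = {n: [] for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)
    problems = []
    if has_cycle(nodes, succ):
        problems.append("cycle")
    if not is_connected(nodes, edges):
        problems.append("disconnected")
    return problems
```

A non-empty result corresponds to the "stop and feed the result back to the interface" branch.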
Further, in step 3), before a subJob is executed, it is first scanned: if a Reduce operator is found during the scan, a ReduceSink operator is added before that operator; otherwise nothing is done. The subJob is executed after the scan.
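The ReduceSink step can be sketched as a graph rewrite that splices a new node into every edge entering a Reduce operator. This matches the embodiment, where ReduceSink operators are added between the merge operator and each of its parents. The `"reduce"` kind label and the generated node names are assumptions. (The name echoes Hive's ReduceSinkOperator, which marks where records are shuffled to reducers; the patent does not state that connection.)

```python
def insert_reduce_sinks(kinds, edges, reduce_kind="reduce"):
    """Insert a ReduceSink node on every edge that enters a Reduce operator.

    kinds: {operator: kind}; edges: [(src, dst)]. Returns new (kinds, edges);
    the inputs are not modified.
    """
    new_kinds = dict(kinds)
    new_edges = []
    for src, dst in edges:
        if kinds[dst] == reduce_kind:
            sink = f"ReduceSink_{src}_{dst}"  # hypothetical naming scheme
            new_kinds[sink] = "reduceSink"
            new_edges += [(src, sink), (sink, dst)]
        else:
            new_edges.append((src, dst))
    return new_kinds, new_edges


kinds = {"addCol": "map", "addPts": "map", "merge": "reduce", "out": "output"}
edges = [("addCol", "merge"), ("addPts", "merge"), ("merge", "out")]
new_kinds, new_edges = insert_reduce_sinks(kinds, edges)
```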
A data processing system based on a distributed computing platform, characterized in that it comprises a management unit, an execution unit, and a computing unit, wherein:
the management unit is used by the user to select operators according to the requirements of the file to be processed, configure the parameters of the selected operators, and then establish the connections between the selected operators to generate the scene's XML file; the scene's XML file contains the XML content of each selected operator and the connections between the operators;
the computing unit is used to generate the corresponding DAG from the scene's XML file;
the execution unit is used to cut the DAG into subJobs that can be executed in a distributed computing environment, and then submit the subJobs to the distributed computing platform for execution.
As shown in Fig. 1, the main processing flow of the system is as follows:
First, according to the requirements of the file to be processed, the user drags operators onto the interface, configures operator parameters, and connects the operators to generate a scene. (Each operator has its own XML file; when operators are dragged into a scene, the scene's XML file is generated from the connections between the operators in the scene. The scene's XML file contains the XML content of each operator and the connections between operators.) When the scene is run, its XML file is submitted to the back end for parsing, and a corresponding DAG (directed acyclic graph) is generated according to the connection order of the operators recorded in the scene's XML file.
The system controller then cuts the DAG into multiple subJobs according to the relevant rules and submits them to the executor, which runs them under the Spark computing framework; the real-time running status and results are fed back to the interface.
Finally, the processed files are written to HDFS so that downstream programs can further analyze and mine them.
The Spark-based offline data processing system provided by the invention can be divided into four parts: the management layer, the execution layer, the computation layer, and system monitoring and O&M.
The main functional modules of each part are as follows:
(1) Management layer:
1) Interface:
Provides friendly user interaction: users can add, delete, modify, and query specific information. The interface lists the information of each operator so that users can select and use operators easily. Within their own permissions, users can add, delete, modify, and query operators. While a scene is running, its execution status is shown on the interface so that the user can follow its progress.
2) Process management:
Provides storage, control, and execution of each scene (execution is either immediate or scheduled). Scheduled execution is controlled by Cron.
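The patent only states that scheduled runs are controlled by Cron. As an illustration, and assuming standard five-field cron syntax (minute, hour, day of month, month, day of week with 0 = Sunday), a scheduler could test whether a scene is due at a given minute as follows. This simplified matcher is written for this example only: it does not accept names such as `MON`, `7` for Sunday, or special strings such as `@daily`.

```python
from datetime import datetime

# Allowed ranges for minute, hour, day of month, month, day of week (0 = Sunday).
BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def field_matches(field, value, lo, hi):
    """Match one cron field: '*', 'a', 'a-b', optional '/step', comma-separated lists."""
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = end = int(rng)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expr, when):
    """Return True if the datetime `when` satisfies the five-field cron expression."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    values = [when.minute, when.hour, when.day, when.month,
              (when.weekday() + 1) % 7]  # Python: Monday = 0; cron: Sunday = 0
    minute_ok, hour_ok, dom_ok, month_ok, dow_ok = (
        field_matches(f, v, lo, hi) for f, v, (lo, hi) in zip(fields, values, BOUNDS))
    # Classic cron rule: when both day fields are restricted, either one may match.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return minute_ok and hour_ok and month_ok and day_ok


due = cron_matches("30 2 * * *", datetime(2017, 5, 12, 2, 30))  # True: 02:30 every day
```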
3) User management:
Manages platform user registration and deletion, and the allocation of resources and permissions.
4) Operator management:
Provides registration, update, and deletion of platform operators.
5) Resource management:
Manages each user's computing and storage resources.
6) Permission management:
Manages each user's operator access, data access, and execution permissions.
(2) Execution layer: this part parses the user-defined application-layer DAG and converts it into tasks (Spark Jobs) that can be executed in a distributed computing environment, submits them to the Spark framework for execution, and collects the Spark Jobs' runtime information.
1) Metadata:
Provides storage of operators, processes, and tasks.
2) Scheduler:
1. Parser
Provides XML parsing.
2. Controller
Controls task execution.
3) Executor:
Receives execution requests from the controller and executes the tasks; provides hot standby and load balancing.
(3) Computation layer: the system targets big data. Users generate a scene by dragging operators, configuring operator parameters, and connecting operators with lines, which yields an operator DAG. Together with the Spark computing platform, the system implements data input, computation, and output. The invention is based on the Spark computing framework.
1) Spark
Apache Spark computing engine.
2) HDFS
Hadoop Distributed File System.
(4) System monitoring and O&M:
Monitors scene execution progress and operator running status, and provides O&M for the data preprocessing platform.
Compared with the prior art, the invention has the following advantages:
1. The system provides a large number of operators, and users can define custom scenes;
2. Users can define custom operators according to their own needs;
3. The system further encapsulates Spark, so users do not need to use Spark's basic API;
4. Because the system uses a DAG, it inherits the extensibility and flexibility of directed acyclic graphs.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the invention.
Embodiment
The invention is described in further detail below with reference to a specific embodiment, which does not limit the scope of the invention in any way.
Two student table files are processed. Table 1 contains id, name, and Chinese score; Table 2 contains id, name, and math score. The required result is: add a grade column to the Table 1 file, with every student in second grade; add 3 points to every student's math score in the Table 2 file; and finally merge the two tables into one.
The user drags the following operators onto the interface: two adaptation input operators, one operator implementing the "add column" function, one implementing the "add points" function, one implementing the "merge two tables" function, and one adaptation output operator.
The user configures the operator parameters on the interface. "Add column" operator: the added column is "grade" and its content is "two". "Add points" operator: the column to increase is "math score" and the added score is "3". Adaptation input operator 1: the extracted file is Table 1. Adaptation input operator 2: the extracted file is Table 2. Merge operator: the two tables are merged on id. Adaptation output operator: specifies the name of the output table.
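The data transformation this scene expresses can be reproduced in plain Python to check the expected result. The row values below are invented for illustration (the patent gives none), and each function is a stand-in for one operator rather than the system's Spark implementation.

```python
# Hypothetical sample rows; the patent does not give the table contents.
table1 = [{"id": 1, "name": "Li", "chinese": 88}, {"id": 2, "name": "Wang", "chinese": 92}]
table2 = [{"id": 1, "name": "Li", "math": 75}, {"id": 2, "name": "Wang", "math": 81}]


def add_column(rows, column, value):
    """'Add column' operator: give every row a constant value in a new column."""
    return [{**row, column: value} for row in rows]


def add_points(rows, column, points):
    """'Add points' operator: add a fixed number of points to one numeric column."""
    return [{**row, column: row[column] + points} for row in rows]


def merge_on(left, right, key):
    """'Merge two tables' operator: inner join on `key`; shared columns such as name are taken from `right`."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


result = merge_on(add_column(table1, "grade", "two"), add_points(table2, "math", 3), "id")
```

On Spark itself, the same steps would map naturally onto `DataFrame.withColumn` (with `pyspark.sql.functions.lit("two")` for the constant grade and `col("math") + 3` for the extra points) followed by `DataFrame.join(other, on="id")`, after dropping one of the duplicate name columns.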
The user connects the operators with lines to express their upstream/downstream relationships, which marks the input and output of each operator in the operators' XML files; because the output of one operator is the input of some other operator, the connections between operators can be determined from the XML. The user then clicks Save, which generates the DAG scene. When the user clicks Execute, the scene's XML file is sent to the back end for parsing to obtain the upstream/downstream relationships among all operators in the scene. The back-end program checks the DAG to determine whether it contains a cycle, a sub-cycle, or a break; if any of these is found, the program stops and feeds the result back to the front-end interface. If the DAG is valid, execution continues. The controller then cuts, merges, and combines the DAG to generate subJobs; in this example there is exactly one subJob. After generating the subJob, the controller submits it to the executor, which first scans it. The scan has two possible outcomes: if a Reduce operator is found during the scan, a ReduceSink operator is added before that operator; otherwise nothing is done. In this example the Reduce operator is the merge operator, so ReduceSink operators are added between the merge operator and each of its parent nodes.
The subJob is then executed. During processing, the progress is displayed on the interface in real time; after processing ends, the results are displayed, and the user can retrieve the processed file from HDFS.
The key technical points of the invention are:
1. Handling heterogeneous data: how to process data files in different formats
Faced with heterogeneous data, the system parses each file type with a different method so that every file is ultimately parsed into one unified format, which lets the system easily recognize and process files. In the invention, the input operators convert all files into Avro format.
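The normalization idea can be sketched with one parser per input format feeding a shared record shape. The sketch below is an illustration, not the patent's code: the format keys, the function names, and the choice to treat every value as a nullable string are assumptions. It only builds an Avro-style schema dictionary; actually serializing to Avro would require an Avro library, which this sketch does not use.

```python
import csv
import io
import json


def parse_csv(text):
    """CSV with a header row -> list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_json_lines(text):
    """One JSON object per line -> list of dicts (values keep their JSON types)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical registry: one parser per input format.
PARSERS = {"csv": parse_csv, "jsonl": parse_json_lines}


def to_common_records(text, fmt, fields):
    """Parse `text` as `fmt` and project each record onto `fields` as nullable strings."""
    return [{f: None if row.get(f) is None else str(row[f]) for f in fields}
            for row in PARSERS[fmt](text)]


def avro_style_schema(name, fields):
    """Avro-style record schema for the unified records (every field a nullable string)."""
    return {"type": "record", "name": name,
            "fields": [{"name": f, "type": ["null", "string"]} for f in fields]}


csv_rows = to_common_records("id,name\n1,Li\n", "csv", ["id", "name"])
json_rows = to_common_records('{"id": 1, "name": "Li"}\n', "jsonl", ["id", "name"])
```

Both inputs end up as identical records, so every downstream operator needs to understand only one representation.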
2. How to convert a user-built scene into a program that can run on Spark
Determine whether the user-built DAG scene (the whole scene is one Job) contains a complex operator (complex operator names start with "CO.CO"): read the scene's XML file and obtain the class attribute of each operator in turn; the class attribute indicates the operator type. The operator type can also be read from description.xml in the operator's registration package. A complex operator is an operator whose operation object is the full data set.
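Detecting complex operators by this naming convention can be sketched as a scan over the class attributes in the scene XML. The `<operator>` element name and its `id`/`class` attributes are an assumed layout; only the "CO.CO" prefix comes from the text above.

```python
import xml.etree.ElementTree as ET

COMPLEX_PREFIX = "CO.CO"  # naming convention stated in the text


def complex_operators(scene_xml):
    """Return the ids of operators whose class attribute marks them as complex."""
    root = ET.fromstring(scene_xml)
    return [op.get("id") for op in root.iter("operator")
            if (op.get("class") or "").startswith(COMPLEX_PREFIX)]


def is_single_subjob(scene_xml):
    """A scene with no complex operator is sent to the executor as one subJob."""
    return not complex_operators(scene_xml)


scene = ('<scene><operator id="a" class="IN.Input"/>'
         '<operator id="b" class="CO.CO.Sort"/></scene>')
```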
If the whole scene contains no complex operator, the whole scene is one subJob and is sent directly to the executor.
If the scene contains a complex operator, three operations are performed on the Job. First, each operator in the DAG is cut into an independent subJob. Second, subJobs covered by the eight rules below are merged into one subJob. Operators fall into two main classes, adaptation operators and computation operators: adaptation operators include adaptation input operators and adaptation output operators, and computation operators include simple computation operators and complex computation operators (complex operators for short). Third, sink (landing) or scan (pick-up) operators are added to some subJobs: if the end of a subJob is not an adaptation output operator or a complex operator, a sink is appended to its end (the sink stores data in a Hive temporary table); if the top of a subJob is not an adaptation input operator or a complex operator, a scan is added to its top (the scan reads data from a Hive temporary table). At this point the subJobs are fully constructed.
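The third operation can be sketched for a subJob represented as a linear chain of (name, kind) pairs ordered from top to end. That representation, the kind labels, and the placeholder `"sink"`/`"scan"` entries are assumptions; the actual reads and writes of Hive temporary tables are not modelled.

```python
IN, OUT, SIMPLE, COMPLEX = "in", "out", "simple", "complex"


def add_boundary_operators(subjob):
    """subjob: list of (name, kind) pairs ordered from top to end.

    Returns a new list with a scan operator prepended and/or a sink operator
    appended, following the rule described above.
    """
    ops = list(subjob)
    top_kind, end_kind = ops[0][1], ops[-1][1]
    if end_kind not in (OUT, COMPLEX):
        ops.append(("sink", "sink"))     # would write results to a Hive temporary table
    if top_kind not in (IN, COMPLEX):
        ops.insert(0, ("scan", "scan"))  # would read input from a Hive temporary table
    return ops
```

A subJob that already starts with an input operator and ends with an output operator therefore passes through unchanged, while a lone simple operator cut off between two complex operators gets both a scan and a sink.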
The eight rules are as follows:
1) A simple computation operator connected to a simple computation operator: merge
2) A simple computation operator connected to a complex computation operator: do not merge
3) A complex computation operator connected to a simple computation operator: do not merge
4) A complex computation operator connected to a complex computation operator: do not merge
5) An adaptation input operator connected to a simple computation operator: merge
6) An adaptation input operator connected to a complex computation operator: do not merge
7) A simple computation operator connected to an adaptation output operator: merge
8) A complex computation operator connected to an adaptation output operator: do not merge
After the subJobs have been split, the partial order of the whole scene (Job) is obtained. Each subJob is then sent to the executor and scanned; if a Reduce operator is found during the scan, a ReduceSink operator is added before that operator. A second scan then follows the execution flow of the transformations to build the dependencies between RDDs (Resilient Distributed Datasets), finally forming a job that can be executed on Spark, i.e., a Spark Job. For example, map(func) applies func to each element of the RDD on which map is called and returns a new RDD; filter(func) applies func to each element of the RDD on which filter is called and returns a new RDD containing only the elements for which func returns true.
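Spark's RDD transformations such as map and filter are lazy: they record a lineage of dependencies, and nothing runs until an action such as collect is called. The toy class below is written for illustration and is not part of PySpark; it mimics that behaviour in plain Python to show how chaining transformations builds the dependency chain that the second scan assembles.

```python
class LineageRDD:
    """Toy stand-in for an RDD: transformations only record lineage."""

    def __init__(self, source, steps=()):
        self.source = source       # in-memory data instead of a partitioned data set
        self.steps = tuple(steps)  # recorded (name, func) transformations

    def map(self, func):
        """Record a map; returns a new object, like RDD.map returns a new RDD."""
        return LineageRDD(self.source, self.steps + (("map", func),))

    def filter(self, func):
        """Record a filter that keeps elements for which func is true."""
        return LineageRDD(self.source, self.steps + (("filter", func),))

    def lineage(self):
        return [name for name, _ in self.steps]

    def collect(self):
        """Action: only now are the recorded transformations applied, in order."""
        data = list(self.source)
        for name, func in self.steps:
            if name == "map":
                data = [func(x) for x in data]
            else:
                data = [x for x in data if func(x)]
        return data


rdd = LineageRDD([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)
print(rdd.lineage(), rdd.collect())  # ['map', 'filter'] [20, 30, 40]
```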
When the program starts, all nodes in the DAG that have no predecessor are started at the same time according to the Job's partial order, so that they execute in parallel. When a node finishes, it is deleted from the graph, and the process repeats until execution completes.
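This scheduling loop is essentially Kahn's topological sort run in rounds. Below is a minimal sketch with a thread pool, assuming each subJob can be started through a callable `run` (a hypothetical hook). For simplicity it waits for a whole round to finish before starting the next one, whereas the text removes each node as soon as it completes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_waves(nodes, edges, run):
    """Run every node whose predecessors have finished, round by round.

    nodes: list of subJob ids; edges: (before, after) pairs from the partial order;
    run: callable invoked once per node. Returns the list of rounds.
    """
    preds = {n: set() for n in nodes}
    for before, after in edges:
        preds[after].add(before)
    remaining = set(nodes)
    waves = []
    with ThreadPoolExecutor() as pool:
        while remaining:
            ready = sorted(n for n in remaining if not preds[n])
            if not ready:
                raise ValueError("cycle: no node without predecessors")
            list(pool.map(run, ready))   # nodes in one round run concurrently
            waves.append(ready)
            remaining -= set(ready)
            for n in remaining:
                preds[n] -= set(ready)   # delete finished nodes from the graph
    return waves


finished = []
waves = run_in_waves(["a", "b", "c", "d"], [("a", "c"), ("b", "c"), ("c", "d")], finished.append)
```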
Claims (7)
1. A data processing method based on a distributed computing platform, comprising the following steps:
1) a user selects operators according to the requirements of the file to be processed and configures the parameters of the selected operators, then establishes the connections between the selected operators to generate an XML file of a scene, the scene's XML file containing the XML content of each selected operator and the connections between the operators;
2) a corresponding directed acyclic graph (DAG) is generated from the scene's XML file;
3) the DAG is cut into several subtasks (subJobs) that can be executed in a distributed computing environment, and the resulting subJobs are executed under the Spark computing framework to process the file.
2. The method according to claim 1, characterized in that the DAG is cut into several subJobs as follows:
21) reading the scene's XML file, obtaining the type of each operator, and determining whether a complex operator is present, a complex operator being an operator whose operation object is the full data set;
22) if no complex operator is present, treating the scene as one subJob; if a complex operator is present, treating each operator in the DAG as an independent subJob and then merging the subJobs according to set rules; the operators are divided into two classes, adaptation operators and computation operators, the adaptation operators comprising adaptation input operators and adaptation output operators, and the computation operators comprising simple computation operators and complex computation operators;
the set rules comprise:
1) a simple computation operator connected to a simple computation operator: merge;
2) a simple computation operator connected to a complex computation operator: do not merge;
3) a complex computation operator connected to a simple computation operator: do not merge;
4) a complex computation operator connected to a complex computation operator: do not merge;
5) an adaptation input operator connected to a simple computation operator: merge;
6) an adaptation input operator connected to a complex computation operator: do not merge;
7) a simple computation operator connected to an adaptation output operator: merge;
8) a complex computation operator connected to an adaptation output operator: do not merge;
23) for each subJob obtained in step 22), if the end of the subJob is not an adaptation output operator or a complex operator, appending a sink operator to the end of the subJob, the sink operator storing data in a Hive temporary table; if the top of the subJob is not an adaptation input operator or a complex operator, adding a scan operator to the top of the subJob, the scan operator reading data from a Hive temporary table.
3. The method according to claim 1 or 2, characterized in that, in step 2), the DAG is checked to determine whether it contains a cycle, a sub-cycle, or a break; if any of these is present, execution stops and the result is fed back to the user's interface.
4. The method according to claim 1 or 2, characterized in that, in step 3), each subJob is first scanned before it is executed; if a Reduce operator is found during the scan, a ReduceSink operator is added before that operator, and otherwise nothing is done; the subJob is executed after the scan.
5. A data processing system based on a distributed computing platform, characterized by comprising a management unit, an execution unit, and a computing unit, wherein:
the management unit is used by a user to select operators according to the requirements of the file to be processed, configure the parameters of the selected operators, and then establish the connections between the selected operators to generate an XML file of a scene, the scene's XML file containing the XML content of each selected operator and the connections between the operators;
the computing unit is configured to generate a corresponding directed acyclic graph (DAG) from the scene's XML file;
the execution unit is configured to cut the DAG into subJobs that can be executed in a distributed computing environment and then submit the subJobs to the distributed computing platform for execution.
6. The system according to claim 5, characterized in that the computing unit reads the scene's XML file, obtains the type of each operator, and determines whether a complex operator is present, a complex operator being an operator whose operation object is the full data set; if no complex operator is present, the scene is treated as one subJob; if a complex operator is present, each operator in the DAG is treated as an independent subJob and the subJobs are then merged according to set rules; the operators are divided into two classes, adaptation operators and computation operators, the adaptation operators comprising adaptation input operators and adaptation output operators, and the computation operators comprising simple computation operators and complex computation operators; the set rules comprise:
1) a simple computation operator connected to a simple computation operator: merge;
2) a simple computation operator connected to a complex computation operator: do not merge;
3) a complex computation operator connected to a simple computation operator: do not merge;
4) a complex computation operator connected to a complex computation operator: do not merge;
5) an adaptation input operator connected to a simple computation operator: merge;
6) an adaptation input operator connected to a complex computation operator: do not merge;
7) a simple computation operator connected to an adaptation output operator: merge;
8) a complex computation operator connected to an adaptation output operator: do not merge;
then, for each subJob obtained above, if the end of the subJob is not an adaptation output operator or a complex operator, a sink operator is appended to the end of the subJob, the sink operator storing data in a Hive temporary table; if the top of the subJob is not an adaptation input operator or a complex operator, a scan operator is added to the top of the subJob, the scan operator reading data from a Hive temporary table.
7. The system according to claim 5 or 6, characterized in that the execution unit scans each subJob obtained by cutting; if a Reduce operator is found, a ReduceSink operator is added before that operator; the subJob is then submitted to the distributed computing platform for execution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710335307.0A (CN107463595A) | 2017-05-12 | 2017-05-12 | A Spark-based data processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107463595A | 2017-12-12 |
Family
ID=60543751
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710335307.0A (CN107463595A, pending) | A Spark-based data processing method and system | 2017-05-12 | 2017-05-12 |
Country Status (1)
Country | Link |
---|---|
CN | CN107463595A |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108062251A (en) * | 2018-01-09 | 2018-05-22 | 福建星瑞格软件有限公司 | A kind of server resource recovery method and computer equipment |
CN108628605A (en) * | 2018-04-28 | 2018-10-09 | 百度在线网络技术(北京)有限公司 | Stream data processing method, device, server and medium |
CN108733832A (en) * | 2018-05-28 | 2018-11-02 | 北京阿可科技有限公司 | The distributed storage method of directed acyclic graph |
CN108984155A (en) * | 2018-05-17 | 2018-12-11 | 阿里巴巴集团控股有限公司 | Flow chart of data processing setting method and device |
CN109063056A (en) * | 2018-07-20 | 2018-12-21 | 阿里巴巴集团控股有限公司 | A kind of data query method, system and terminal device |
CN109117141A (en) * | 2018-09-04 | 2019-01-01 | 深圳市木瓜移动科技有限公司 | Simplify method, apparatus, the electronic equipment, computer readable storage medium of programming |
CN109445926A (en) * | 2018-11-09 | 2019-03-08 | 杭州玳数科技有限公司 | Data task dispatching method and data task dispatch system |
CN110297632A (en) * | 2019-06-12 | 2019-10-01 | 百度在线网络技术(北京)有限公司 | Code generating method and device |
CN110532447A (en) * | 2019-08-29 | 2019-12-03 | 上海云从汇临人工智能科技有限公司 | A kind of business data processing method, device, medium and equipment |
CN110851283A (en) * | 2019-11-14 | 2020-02-28 | 百度在线网络技术(北京)有限公司 | Resource processing method and device and electronic equipment |
CN110888720A (en) * | 2019-10-08 | 2020-03-17 | 北京百度网讯科技有限公司 | Task processing method and device, computer equipment and storage medium |
CN111291106A (en) * | 2020-05-13 | 2020-06-16 | 成都四方伟业软件股份有限公司 | Efficient flow arrangement method and system for ETL system |
CN111625243A (en) * | 2020-05-13 | 2020-09-04 | 北京字节跳动网络技术有限公司 | Cross-language task processing method and device and electronic equipment |
CN112130851A (en) * | 2020-08-04 | 2020-12-25 | 中科天玑数据科技股份有限公司 | Modeling method for artificial intelligence, electronic equipment and storage medium |
CN112632113A (en) * | 2020-12-31 | 2021-04-09 | 北京九章云极科技有限公司 | Operator management method and operator management system |
CN113342346A (en) * | 2021-05-18 | 2021-09-03 | 北京百度网讯科技有限公司 | Operator registration method, device, equipment and storage medium of deep learning framework |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104052811A (en) * | 2014-06-17 | 2014-09-17 | 华为技术有限公司 | Service scheduling method and device and system |
CN104360903A (en) * | 2014-11-18 | 2015-02-18 | 北京美琦华悦通讯科技有限公司 | Method for realizing task data decoupling in spark operation scheduling system |
CN105354089A (en) * | 2015-10-15 | 2016-02-24 | 北京航空航天大学 | Streaming data processing model and system supporting iterative calculation |
CN105354242A (en) * | 2015-10-15 | 2016-02-24 | 北京航空航天大学 | Distributed data processing method and device |
CN105426504A (en) * | 2015-11-27 | 2016-03-23 | 陕西艾特信息化工程咨询有限责任公司 | Distributed data analysis processing method based on memory computation |
2017
- 2017-05-12: CN201710335307.0A filed in CN, published as CN107463595A (active, pending)
Non-Patent Citations (1)
Title |
---|
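Yin Rong (殷荣): "Design and Implementation of an Offline Data Processing Engine Based on the DAG Model", China Master's Theses Full-text Database, Information Science and Technology Series * |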
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108062251A (en) * | 2018-01-09 | 2018-05-22 | 福建星瑞格软件有限公司 | A kind of server resource recovery method and computer equipment |
CN108628605A (en) * | 2018-04-28 | 2018-10-09 | 百度在线网络技术(北京)有限公司 | Stream data processing method, device, server and medium |
CN108984155A (en) * | 2018-05-17 | 2018-12-11 | 阿里巴巴集团控股有限公司 | Flow chart of data processing setting method and device |
CN108984155B (en) * | 2018-05-17 | 2021-09-07 | 创新先进技术有限公司 | Data processing flow setting method and device |
CN108733832A (en) * | 2018-05-28 | 2018-11-02 | 北京阿可科技有限公司 | The distributed storage method of directed acyclic graph |
CN108733832B (en) * | 2018-05-28 | 2019-04-30 | 北京阿可科技有限公司 | The distributed storage method of directed acyclic graph |
CN109063056A (en) * | 2018-07-20 | 2018-12-21 | 阿里巴巴集团控股有限公司 | A kind of data query method, system and terminal device |
CN109117141A (en) * | 2018-09-04 | 2019-01-01 | 深圳市木瓜移动科技有限公司 | Simplify method, apparatus, the electronic equipment, computer readable storage medium of programming |
CN109117141B (en) * | 2018-09-04 | 2021-09-24 | 深圳市木瓜移动科技有限公司 | Method, device, electronic equipment and computer readable storage medium for simplifying programming |
CN109445926A (en) * | 2018-11-09 | 2019-03-08 | 杭州玳数科技有限公司 | Data task dispatching method and data task dispatch system |
CN110297632A (en) * | 2019-06-12 | 2019-10-01 | 百度在线网络技术(北京)有限公司 | Code generating method and device |
CN110532447A (en) * | 2019-08-29 | 2019-12-03 | 上海云从汇临人工智能科技有限公司 | A kind of business data processing method, device, medium and equipment |
CN110888720A (en) * | 2019-10-08 | 2020-03-17 | 北京百度网讯科技有限公司 | Task processing method and device, computer equipment and storage medium |
CN110851283A (en) * | 2019-11-14 | 2020-02-28 | 百度在线网络技术(北京)有限公司 | Resource processing method and device and electronic equipment |
CN111625243A (en) * | 2020-05-13 | 2020-09-04 | 北京字节跳动网络技术有限公司 | Cross-language task processing method and device and electronic equipment |
CN111291106A (en) * | 2020-05-13 | 2020-06-16 | 成都四方伟业软件股份有限公司 | Efficient flow arrangement method and system for ETL system |
CN112130851A (en) * | 2020-08-04 | 2020-12-25 | 中科天玑数据科技股份有限公司 | Modeling method for artificial intelligence, electronic equipment and storage medium |
CN112130851B (en) * | 2020-08-04 | 2022-04-15 | 中科天玑数据科技股份有限公司 | Modeling method for artificial intelligence, electronic equipment and storage medium |
CN112632113A (en) * | 2020-12-31 | 2021-04-09 | 北京九章云极科技有限公司 | Operator management method and operator management system |
CN113342346A (en) * | 2021-05-18 | 2021-09-03 | 北京百度网讯科技有限公司 | Operator registration method, device, equipment and storage medium of deep learning framework |
US11625248B2 (en) | 2021-05-18 | 2023-04-11 | Beijing Baidu Netcom Science Technology Co., Ltd. | Operator registration method and apparatus for deep learning framework, device and storage medium |
Similar Documents
Publication | Title |
---|---|
CN107463595A (en) | A kind of data processing method and system based on Spark |
CN105550268B (en) | Big data process modeling analysis engine |
US10764370B2 (en) | Hybrid cloud migration delay risk prediction engine |
US6799314B2 (en) | Work flow management method and work flow management system of controlling a work flow |
CN106022007B (en) | The cloud platform system and method learning big data and calculating is organized towards biology |
US10769147B2 (en) | Batch data query method and apparatus |
US8560636B2 (en) | Methods and systems for providing a virtual network process context for network participant processes in a networked business process |
CA2845059C (en) | Test script generation system |
CN110168518A (en) | Prepare and arrange the user interface of the data for subsequent analysis |
CN105893509B (en) | A kind of label of big data analysis model and explain system and method |
US20070083875A1 (en) | Method of delegating activity in service oriented architectures using continuations |
US20120054335A1 (en) | Methods and systems for managing quality of services for network participants in a networked business process |
WO2016165321A1 (en) | Method and apparatus for establishing requirement meta model for high-speed train |
US20110126199A1 (en) | Method and Apparatus for Communicating During Automated Data Processing |
US20070106515A1 (en) | Automated interactive statistical call visualization using abstractions stack model framework |
CN116560626A (en) | Data processing method, system, equipment and storage medium based on custom rules |
US8819619B2 (en) | Method and system for capturing user interface structure in a model based software system |
CN109284324A (en) | The dispatching device of flow tasks based on Apache Oozie frame processing big data |
CN113568604B (en) | Method and device for updating wind control strategy and computer readable storage medium |
CN106294185A (en) | Automated test frames based on five layers of framework and method |
CN112686580A (en) | Workflow definition method and system capable of customizing flow |
WO2022253165A1 (en) | Scheduling method, system, server and computer readable storage medium |
CN104660697B (en) | Based on Kepler scientific workflow Sensor Network service combining methods |
US11586643B2 (en) | Enabling dynamic data capture with database objects |
CN111160403B (en) | API (application program interface) multiplexing discovery method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171212 |