CN105975600A - Big data processing task optimization method and device - Google Patents
- Publication number
- CN105975600A (application CN201610308355.6A)
- Authority
- CN
- China
- Prior art keywords
- task
- data
- intermediate data
- process method
- business
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5017—Task decomposition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a task optimization method and device for big data processing. The method comprises: analyzing the data processing logic of multiple tasks; determining the data relationships between the tasks according to that logic; and analyzing the data relationships to determine whether to merge multiple tasks into one task or to split one of the tasks into multiple tasks. The method and device reduce the number of computing tasks to be executed in the data warehouse, saving computing resources and improving the processing efficiency of the data warehouse.
Description
Technical field
The invention belongs to the computer field and, specifically, relates to a task optimization method and device for big data processing.
Background
With the rapid development of the Internet, many Internet companies have accumulated data on the order of terabytes. Every day the data warehouse receives data from different ecosystems, such as user data records from mobile phones, smart TVs, and video websites, as part of the big data resource.
Data enters the data warehouse through the warehouse's entry machines and is organized into layers inside the warehouse, and both stages require data processing. Each data processing procedure is a set of multiple tasks, and each task has its own inherent processing logic; for example, task 1 reads the data of some fields in table A and writes it to table B. Sometimes, when many data engineers need the same data, different engineers may derive it from the existing data along different paths, which produces many duplicate tasks, or tasks that are not duplicates but serve the same purpose. Sometimes a task updates certain fields of a table slowly, so that subsequent tasks that process other fields of the same table must also wait until the table has been fully updated before they can continue, which lengthens task processing time.
These problems all arise because the inherent processing logic of the tasks has not been analyzed adequately. They waste a large amount of computing resources and reduce the processing speed of the data warehouse.
Summary of the invention
In view of this, embodiments of the present invention provide a task optimization method and device for big data processing, to solve the technical problem in the prior art that computing resources are wasted because the inherent processing logic of tasks is not analyzed adequately.
To solve the above technical problem, the invention discloses a task optimization method for big data processing, comprising: analyzing the data processing logic of multiple tasks; determining the data relationships between the multiple tasks according to their data processing logic; and analyzing the data relationships to determine whether to merge the multiple tasks into one task or to split one of the multiple tasks into multiple tasks.
To solve the above technical problem, the invention also discloses a task optimization device for big data processing, comprising: an analysis module for analyzing the data processing logic of multiple tasks; a determining module for determining the data relationships between the multiple tasks according to their data processing logic; and a processing module for analyzing the data relationships to determine whether to merge the multiple tasks into one task or to split one of the multiple tasks into multiple tasks.
Compared with the prior art, the task optimization method and device for big data processing provided by the embodiments of the present invention analyze the data processing logic of multiple tasks in the data warehouse to obtain the data relationships between the tasks, and determine from these relationships whether to merge or split tasks. This improves the task execution efficiency of the data warehouse and helps make rational use of its computing resources.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a task optimization method for big data processing provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a task optimization method for big data processing provided by an embodiment of the present invention;
Fig. 3 is a flowchart of a task optimization method for big data processing provided by an embodiment of the present invention;
Fig. 4 is a block diagram of a task optimization device for big data processing provided by an embodiment of the present invention.
Detailed description of the invention
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
In the embodiments of the present invention, the computing tasks in the data warehouse are analyzed. The data processing logic of each task is analyzed to find the logical relationships and data dependencies between tasks, and the intermediate data produced between tasks and the execution status of the tasks are examined to find points where existing tasks can be optimized. Existing tasks are then merged or split as appropriate, which saves the computing resources of the data warehouse and improves task execution efficiency. The task optimization methods provided by the embodiments of the present invention are described in turn below.
Fig. 1 shows a task optimization method for big data processing provided by an embodiment of the present invention. The method is applicable to a server and comprises the following steps.
S10: analyze the data processing logic of multiple tasks.
The data processing logic includes processing objects and a computation method. The processing objects include source data, target data, and so on; for example, task T01 reads the data of three fields from table A and writes it to table B. The computation method is the method by which target data is generated from source data. If a task reads data from table A and writes it directly to table B, no computation method exists; if it performs a calculation on the data read from table A and writes the result to table B, then the task has a computation method between table A and table B.
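As an illustration only, the data processing logic of a task could be captured in a small record such as the one sketched below. The `Task` class, its field names, and the `"sum"` computation label are hypothetical and do not come from the patent; the sketch only mirrors the distinction between processing objects (source and target tables) and an optional computation method.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical record of one task's data processing logic."""
    name: str
    source: str                        # table the task reads from
    target: str                        # table the task writes to
    fields: Tuple[str, ...]            # fields read from the source table
    computation: Optional[str] = None  # None means the data is copied as-is

    def has_computation(self) -> bool:
        # A plain read-and-write task has no computation method.
        return self.computation is not None


# T01 copies three fields from table A to table B: no computation method.
t01 = Task("T01", source="A", target="B", fields=("f1", "f2", "f3"))

# A variant that calculates on the data before writing it to table B.
t01_calc = Task("T01c", source="A", target="B",
                fields=("f1", "f2", "f3"), computation="sum")
```

Keeping the record immutable makes it safe to use as a dictionary key or set member when tasks are later compared with one another.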
S11: determine the data relationships between the multiple tasks according to their data processing logic.
The data relationships include the intermediate data between tasks and the data dependencies. For example, task T01 reads the data of three fields from table A and writes it to table B, and task T02 filters the data in table B and writes the data that meets a preset condition to table C. Table B is then regarded as the intermediate data between task T01 and task T02.
A data dependency means that a task executed later relies on data output by a task executed earlier. Suppose task T01 reads the data of the first and second fields from table A and writes it directly to table B, and also reads the data of the third and fourth fields from table A, performs predictive analysis on it, and writes the prediction result to table B. Task T02 reads the first-field and second-field data from table B, filters it, and writes the filtered result to table C. Task T03 reads the prediction result from table B, evaluates it, and writes the evaluation result to table D. Tasks T02 and T03 thus rely on the output data of task T01, and each has a data dependency on task T01.
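The relationships described in this step can be sketched in a few lines. The function below is a minimal illustration, assuming each task is described only by the tables it reads and writes; the function name `data_relationships` and the dictionary layout are assumptions made for this example, not part of the patent.

```python
def data_relationships(tasks):
    """Derive intermediate tables and task dependencies.

    tasks: dict mapping task name -> {"reads": set of tables,
                                      "writes": set of tables}.
    Returns (intermediates, depends_on), where depends_on maps each
    task to the set of tasks whose output it reads.
    """
    intermediates = set()
    depends_on = {name: set() for name in tasks}
    for later, later_io in tasks.items():
        for earlier, earlier_io in tasks.items():
            if earlier == later:
                continue
            # A table written by one task and read by another is
            # intermediate data, and creates a data dependency.
            shared = earlier_io["writes"] & later_io["reads"]
            if shared:
                intermediates |= shared
                depends_on[later].add(earlier)
    return intermediates, depends_on


# The T01/T02/T03 example: T02 and T03 both read table B written by T01.
tasks = {
    "T01": {"reads": {"A"}, "writes": {"B"}},
    "T02": {"reads": {"B"}, "writes": {"C"}},
    "T03": {"reads": {"B"}, "writes": {"D"}},
}
intermediates, depends_on = data_relationships(tasks)
```

A table-level view like this is deliberately coarse; the splitting case discussed later needs the dependency tracked per field rather than per table.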
S12: analyze the data relationships to determine whether to merge the multiple tasks into one task or to split one of the multiple tasks into multiple tasks.
The analysis checks whether there is intermediate data that will not be used, and whether a task executed earlier affects the execution efficiency of a task executed later. If there is intermediate data that will not be used, the tasks can be merged, which reduces the number of tasks to execute and saves the computing resources of the data warehouse. If an earlier task affects the execution efficiency of a later task, the earlier task is split into multiple tasks according to the data dependencies, and the output data of one of the resulting tasks serves as the input data of the later task. The later task can then obtain the data it relies on sooner and complete execution, which improves its execution efficiency.
The case in which multiple tasks in the data warehouse are merged is described first. Fig. 2 shows a task optimization method for big data processing provided by an embodiment of the present invention. The method is applicable to a server and comprises the following steps.
S20: analyze the data processing logic of multiple tasks.
The data processing logic includes processing objects and a computation method. The processing objects include source data, target data, and so on; for example, task T01 reads the data of three fields from table A and writes it to table B. The computation method is the method by which target data is generated from source data. If a task reads data from table A and writes it directly to table B, no computation method exists; if it performs a calculation on the data read from table A and writes the result to table B, then the task has a computation method between table A and table B.
S21: determine the intermediate data produced between the multiple tasks according to their data processing logic.
The logical relationships between the tasks are found from their data processing logic. For example, task T01 reads the data of three fields from table A and writes it to table B; task T02 filters the data of the three fields in table B and writes the data that meets a preset condition to table C; task T03 reads the data in table C and appends it to table D. Tasks T01 to T03 are therefore executed in sequence according to the logical relationships among them. Once the logical relationships are known, the intermediate data produced between the tasks can be determined; in this example, table B and table C are intermediate data.
Different data engineers may set up different calculations to obtain the target data, and sometimes they also produce intermediate data, based on the actual needs of the business they are responsible for, for use in other calculations. It is therefore necessary to determine whether this intermediate data will be used, that is, whether it needs to be saved.
S22: analyze the usage state of the intermediate data to determine whether it needs to continue to be saved.
The usage state includes whether the intermediate data can be used for other calculations and whether the intermediate data is itself the final result of another task chain. Whether intermediate data needs to be saved can therefore be judged in several ways.
In one embodiment, step S22 can be implemented as the following steps.
S220: analyze, according to business requirements, whether the intermediate data is used in the business.
The business requirements include whether the data is used to calculate other business data and whether the intermediate data is itself a final result that the business needs. For example, suppose intermediate data B records the smart TV sales of each store in Shanghai from January to March 2016. If the business also needs to select the five stores with the highest sales, intermediate data B will still be used. Alternatively, if intermediate data B is itself the final result of a task chain that counts smart TV sales in Shanghai from January to March 2016, it also needs to be used.
S221: when the intermediate data is not used in the business, determine that it does not need to continue to be saved.
In this way, whether intermediate data in a task chain needs to be saved is judged from the actual demand for the data in the preset business logic.
In another embodiment, step S22 can also be implemented as the following steps.
S222: measure the accumulated duration for which the intermediate data has not been used; when the accumulated duration reaches a preset threshold, mark the intermediate data as unused.
For data identified as intermediate data in a task chain, the accumulated duration for which it is not used can be measured. For example, as long as no read operation occurs on intermediate data B, it is not being used; when intermediate data B is read, the accumulated duration is cleared and timing restarts. If no read operation occurs on intermediate data B within a preset duration (for example, 12 hours), intermediate data B is marked as unused.
To reduce the probability of misjudgment, the number of times the intermediate data has been marked as unused can also be counted. If the data is still not used during the following preset durations, it is marked as data that will not be used again.
S223: when the number of times the intermediate data has been marked as unused is greater than or equal to a preset threshold, determine that it does not need to continue to be saved.
For example, if intermediate data B has been marked as unused 10 times in a row, it can be considered that the data does not need to continue to be saved.
Intermediate data that will not be used usually arises because different data engineers configure tasks manually when they obtain target data in different ways; such configuration is fairly arbitrary, and the resulting data is not used by other data engineers.
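Steps S222 and S223 amount to a timer plus a counter, which can be sketched as follows. The class `IdleTracker` and its method names are hypothetical, and the 12-hour window and 10 marks are taken from the examples in the text. Resetting the mark count on a read is an assumption made here to match the "10 times in a row" wording; the patent does not spell out that detail.

```python
class IdleTracker:
    """Hypothetical tracker for one piece of intermediate data."""

    def __init__(self, idle_threshold_hours=12, mark_threshold=10):
        self.idle_threshold_hours = idle_threshold_hours
        self.mark_threshold = mark_threshold
        self.idle_hours = 0          # accumulated duration without a read
        self.consecutive_marks = 0   # times marked "unused" in a row

    def record_read(self):
        # A read clears the accumulated duration and restarts timing.
        # Assumption: it also breaks the run of consecutive marks.
        self.idle_hours = 0
        self.consecutive_marks = 0

    def advance(self, hours):
        # Each full idle window without a read adds one "unused" mark.
        self.idle_hours += hours
        while self.idle_hours >= self.idle_threshold_hours:
            self.idle_hours -= self.idle_threshold_hours
            self.consecutive_marks += 1

    def can_discard(self):
        # S223: enough marks means the data need not be saved any longer.
        return self.consecutive_marks >= self.mark_threshold


tracker = IdleTracker()
tracker.advance(12 * 10)   # ten idle windows with no read operation
print(tracker.can_discard())
```

Requiring several consecutive marks rather than a single one is what lowers the chance of discarding data that is merely read infrequently.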
S23: when the intermediate data does not need to be saved, merge the multiple tasks into one task according to the data processing logic.
In the above example, if table B is judged to be intermediate data that does not need to be saved, tasks T01 and T02 are merged into T12 according to the data processing logic. The processing objects of the merged task T12 are table A and table C, and the computation method is merged accordingly: read the data of three fields from table A, filter it according to the preset condition, and write the filtered result to table C. If table C is judged to be intermediate data that does not need to be saved, tasks T02 and T03 are merged into T23. The processing objects of T23 are table B and table D, and the merged computation method filters the data of the three fields in table B and appends the filtered result to table D. If both table B and table C are judged to be intermediate data that does not need to be saved, tasks T01, T02, and T03 are merged into T13. The processing objects of T13 are table A and table D, and the merged computation method reads the data of three fields from table A, filters it according to the preset condition, and appends the filtered result to table D.
In other words, if intermediate data that will not be used exists between two tasks, the two tasks can be merged into one; if several such pieces of intermediate data occur consecutively, the corresponding tasks can be merged into one task. This reduces the number of computing tasks that need to be executed in the data warehouse, saves computing resources, and helps improve the processing efficiency of the data warehouse.
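The merging rule can be illustrated with a short sketch over a linear task chain. The function `merge_chain`, the tuple layout, and the `"+"` naming of merged tasks are assumptions for this example, and the sketch further assumes that no task outside the chain reads the discarded tables.

```python
def merge_chain(chain, discardable):
    """Merge adjacent tasks whose connecting table need not be saved.

    chain: list of (name, source, target, steps) in execution order.
    discardable: set of intermediate tables that need not be saved.
    """
    merged = [chain[0]]
    for name, source, target, steps in chain[1:]:
        prev_name, prev_source, prev_target, prev_steps = merged[-1]
        if prev_target == source and source in discardable:
            # Skip the intermediate table: the merged task reads the
            # earlier source and writes the later target.
            merged[-1] = (prev_name + "+" + name, prev_source,
                          target, prev_steps + steps)
        else:
            merged.append((name, source, target, steps))
    return merged


chain = [
    ("T01", "A", "B", ["read three fields"]),
    ("T02", "B", "C", ["filter by preset condition"]),
    ("T03", "C", "D", ["append"]),
]

only_b = merge_chain(chain, {"B"})        # T01 and T02 become one task
both = merge_chain(chain, {"B", "C"})     # all three become one task
```

With only table B discardable the result has two tasks, the first reading table A and writing table C; with both tables discardable a single task reads table A and appends to table D.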
In one embodiment, the above task optimization method for big data processing may further include the following steps.
S24: judge, according to the data processing logic, whether multiple tasks exist that produce the same intermediate data.
S25: when multiple tasks exist that produce the same intermediate data, merge those tasks into one task.
Such tasks come from the configurations of different data engineers. For example, table A is known to everyone. A first engineer needs to extract the data of three fields in table A and write it to table B, perform predictive analysis on the data in table B, and output the analysis result to table C. A second engineer needs to extract the data of the same three fields in table A and write it to table B, filter the data in table B, and output the result to table D. There are then two tasks that read the data of three fields from table A and write it to table B. These two tasks are merged into one, and the follow-up tasks configured by the first and second engineers share the output of the merged task.
Merging the tasks that produce the same intermediate data further reduces the number of computing tasks and saves computing resources.
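Detecting tasks that produce the same intermediate data can be sketched as grouping tasks by a key built from their data processing logic. The function `merge_duplicates`, the tuple layout, and the task names are hypothetical; treating two tasks as identical when source, field set, target, and computation all match is an assumption of this sketch.

```python
def merge_duplicates(tasks):
    """Keep one task per distinct data processing logic.

    tasks: list of (name, source, fields, target, computation).
    Returns (kept, alias), where alias maps each dropped task name
    to the name of the kept task that produces the same data.
    """
    seen = {}
    kept, alias = [], {}
    for name, source, fields, target, computation in tasks:
        # Field order does not matter, so compare fields as a set.
        key = (source, frozenset(fields), target, computation)
        if key in seen:
            alias[name] = seen[key]
        else:
            seen[key] = name
            kept.append((name, source, fields, target, computation))
    return kept, alias


tasks = [
    ("first_extract", "A", ("f1", "f2", "f3"), "B", None),
    ("first_predict", "B", ("f1", "f2", "f3"), "C", "predict"),
    ("second_extract", "A", ("f3", "f1", "f2"), "B", None),
    ("second_filter", "B", ("f1", "f2", "f3"), "D", "filter"),
]
kept, alias = merge_duplicates(tasks)
```

The alias map is what lets the second engineer's follow-up task be rewired to the output of the surviving extraction task.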
The case in which one task in the data warehouse is split into multiple tasks is described next. For this case, an embodiment of the present invention also provides a task optimization method for big data processing, applicable to a server. As shown in Fig. 3, the method comprises the following steps.
S30: analyze the data processing logic of multiple tasks.
S31: determine the data dependencies between the multiple tasks according to their data processing logic.
A data dependency means that a task executed later relies on data output by a task executed earlier.
For example, task T01 reads the data of the first and second fields from table A and writes it directly to table B, and also reads the data of the third and fourth fields from table A, performs predictive analysis on it, and writes the prediction result to table B. Task T02 reads the first-field and second-field data from table B, filters it, and writes the filtered result to table C. Task T03 reads the prediction result from table B, evaluates it, and writes the evaluation result to table D. Task T02 therefore depends on the first-field and second-field data output by T01, and task T03 depends on the prediction result output by T01.
S32: judge, according to the data dependencies, whether a task executed earlier among the multiple tasks affects the execution efficiency of a task executed later.
In the above example, tasks T02 and T03 can execute only after task T01 has finished. Because predictive analysis is relatively slow, task T02 cannot start even once the first-field and second-field data has been written to table B; it must wait until the prediction result has also been written to table B and task T01 has finished, although task T02 has no dependency on the prediction result in table B. The earlier task T01 has thus affected the execution efficiency of the later task T02 by delaying its start.
This situation usually arises because data engineers did not consider the task configuration thoroughly.
S33: when it is judged that the earlier task affects the execution efficiency of the later task, split the earlier task into multiple tasks according to the data dependencies, so that the later task can obtain the data it relies on sooner and start executing.
In the above example, task T01 is split into task T011 and task T012. Task T011 reads the data of the first and second fields from table A and writes it to table B; task T012 reads the data of the third and fourth fields from table A, performs predictive analysis, and writes the prediction result to table B. Task T011 executes much faster, and once it has finished, task T02 can start without waiting for task T012 to finish, which improves the execution efficiency of the later task T02.
In the embodiments of the present invention, when the execution of an earlier task is found to harm the execution efficiency of a later task, the earlier task is split according to the data dependencies. The purpose is for the tasks produced by the split to deliver the data the later task relies on sooner, so that the later task can start earlier than before the split and overall execution efficiency improves.
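One way to picture the split is to group the outputs of the earlier task by which later tasks read them. The function `split_task` and the output names below are assumptions made for illustration; the patent describes the split only through the T011/T012 example and does not prescribe a grouping algorithm.

```python
def split_task(outputs, consumers):
    """Group an earlier task's outputs by the later tasks that read them.

    outputs: list of output names produced by the earlier task.
    consumers: dict mapping later task name -> set of outputs it reads.
    Returns one list of outputs per sub-task.
    """
    groups = {}
    for output in outputs:
        # Outputs read by the same set of later tasks stay together.
        readers = frozenset(task for task, needed in consumers.items()
                            if output in needed)
        groups.setdefault(readers, []).append(output)
    return [sorted(group) for group in groups.values()]


# T01 writes two copied fields and one prediction result to table B.
outputs = ["field1", "field2", "prediction"]
consumers = {"T02": {"field1", "field2"}, "T03": {"prediction"}}
subtasks = split_task(outputs, consumers)
```

Here the first group corresponds to T011 (the plain copy that T02 waits for) and the second to T012 (the slower predictive analysis that only T03 needs).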
The following are device embodiments of the present invention, which are used to carry out the method embodiments described above.
Fig. 4 shows a task optimization device for big data processing provided by an embodiment of the present invention, comprising:
an analysis module 40 for analyzing the data processing logic of multiple tasks; a determining module 41 for determining the data relationships between the multiple tasks according to their data processing logic; and a processing module 42 for analyzing the data relationships to determine whether to merge the multiple tasks into one task or to split one of the multiple tasks into multiple tasks.
In one embodiment, the determining module 41 further includes a first determining submodule for determining, according to the data processing logic of the multiple tasks, the intermediate data produced between them.
The processing module 42 further includes: an analysis submodule for analyzing the usage state of the intermediate data to determine whether it needs to continue to be saved; and a first merging submodule for merging the multiple tasks into one task according to the data processing logic when the intermediate data does not need to be saved.
In one embodiment, the analysis submodule further includes: an analysis unit that analyzes, according to business requirements, whether the intermediate data is used in the business; and a first determining unit that determines, when the intermediate data is not used in the business, that it does not need to continue to be saved.
In one embodiment, the analysis submodule further includes: a marking unit that measures the accumulated duration for which the intermediate data has not been used and, when the accumulated duration reaches a preset threshold, marks the intermediate data as unused; and a second determining unit that determines, when the number of times the intermediate data has been marked as unused is greater than or equal to a preset threshold, that it does not need to continue to be saved.
In one embodiment, the processing module 42 also includes: a first judging submodule for judging, according to the data processing logic, whether multiple tasks exist that produce the same intermediate data; and a second merging submodule for merging those tasks into one task when they exist.
In one embodiment, the determining module 41 further includes a second determining submodule for determining the data dependencies between the multiple tasks according to their data processing logic.
The processing module 42 includes: a second judging submodule for judging, according to the data dependencies, whether a task executed earlier among the multiple tasks affects the execution efficiency of a task executed later; and a splitting submodule for splitting the earlier task into multiple tasks according to the data dependencies when it is judged that the earlier task affects the execution efficiency of the later task, and for using the output data of one of the resulting tasks as the input data of the later task.
In addition, in the embodiments of the present invention, the above functional modules can be implemented by a hardware processor.
An embodiment of the present invention also provides a server, comprising: a processor; and a memory for storing instructions executable by the processor. The processor is configured to: analyze the data processing logic of multiple tasks; determine the data relationships between the multiple tasks according to their data processing logic; and analyze the data relationships to determine whether to merge the multiple tasks into one task or to split one of the multiple tasks into multiple tasks.
In one embodiment, determining the data relationships between the multiple tasks according to their data processing logic includes: determining, according to the data processing logic of the multiple tasks, the intermediate data produced between them.
Analyzing the data relationships to determine whether to merge the multiple tasks into one task includes: analyzing the usage state of the intermediate data to determine whether it needs to continue to be saved; and, when the intermediate data does not need to be saved, merging the multiple tasks into one task according to the data processing logic.
In one embodiment, analyzing the usage state of the intermediate data to determine whether it needs to continue to be saved includes: analyzing, according to business requirements, whether the intermediate data is used in the business; and, when the intermediate data is not used in the business, determining that it does not need to continue to be saved.
In one embodiment, analyzing the usage state of the intermediate data to determine whether it needs to continue to be saved includes: measuring the accumulated duration for which the intermediate data has not been used and, when the accumulated duration reaches a preset threshold, marking the intermediate data as unused; and, when the number of times the intermediate data has been marked as unused is greater than or equal to a preset threshold, determining that it does not need to continue to be saved.
In one embodiment, analyzing the data relationships to determine whether to merge the multiple tasks into one task also includes: judging, according to the data processing logic, whether multiple tasks exist that produce the same intermediate data; and, when such tasks exist, merging them into one task.
In one embodiment, determining the data relationships between the multiple tasks according to their data processing logic includes: determining the data dependencies between the multiple tasks according to their data processing logic.
Analyzing the data relationships to determine whether to split one of the multiple tasks into multiple tasks includes: judging, according to the data dependencies, whether a task executed earlier among the multiple tasks affects the execution efficiency of a task executed later; and, when it is judged that the earlier task affects the execution efficiency of the later task, splitting the earlier task into multiple tasks according to the data dependencies and using the output data of one of the resulting tasks as the input data of the later task.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (12)
1. A task optimization method in big data processing, characterized by comprising:
analyzing the data processing logic of multiple tasks;
determining a data relationship between the multiple tasks according to the data processing logic of the multiple tasks;
analyzing the data relationship to determine whether to merge the multiple tasks into one task or to split one of the multiple tasks into multiple tasks.
2. The method according to claim 1, characterized in that determining the data relationship between the multiple tasks according to the data processing logic of the multiple tasks comprises:
determining, according to the data processing logic of the multiple tasks, the intermediate data produced between the multiple tasks;
and analyzing the data relationship to determine whether to merge the multiple tasks into one task comprises:
analyzing the use state of the intermediate data to determine whether the intermediate data needs to continue to be saved;
when the intermediate data does not need to be saved, merging the multiple tasks into one task according to the data processing logic.
3. The method according to claim 2, characterized in that analyzing the use state of the intermediate data to determine whether the intermediate data needs to continue to be saved comprises:
analyzing, according to business requirements, whether the intermediate data is used in the business;
when the intermediate data is not used in the business, determining that the intermediate data does not need to continue to be saved.
4. The method according to claim 2, characterized in that analyzing the use state of the intermediate data to determine whether the intermediate data needs to continue to be saved comprises:
accumulating the duration for which the intermediate data has not been used, and when the accumulated duration reaches a preset threshold, marking the intermediate data as unused data;
when the number of times the intermediate data has been marked as unused data is greater than or equal to a preset threshold, determining that the intermediate data does not need to continue to be saved.
5. The method according to claim 2, characterized in that analyzing the data relationship to determine whether to merge the multiple tasks into one task further comprises:
judging, according to the data processing logic, whether multiple tasks that can produce the same intermediate data exist at the same time;
when multiple tasks that can produce the same intermediate data exist at the same time, merging the multiple tasks that can produce the same intermediate data into one task.
6. The method according to claim 1, characterized in that determining the data relationship between the multiple tasks according to the data processing logic of the multiple tasks comprises:
determining the data dependency relationship between the multiple tasks according to the data processing logic of the multiple tasks;
and analyzing the data relationship to determine whether to split one of the multiple tasks into multiple tasks comprises:
judging, according to the data dependency relationship, whether a task executed first among the multiple tasks affects the execution efficiency of a task executed later;
when it is judged that the task executed first affects the execution efficiency of the task executed later, splitting the task executed first into multiple tasks according to the data dependency relationship, and using the output data of one of the split-out tasks as the input data of the task executed later.
7. A task optimization device in big data processing, characterized by comprising:
an analysis module, configured to analyze the data processing logic of multiple tasks;
a determining module, configured to determine a data relationship between the multiple tasks according to the data processing logic of the multiple tasks;
a processing module, configured to analyze the data relationship to determine whether to merge the multiple tasks into one task or to split one of the multiple tasks into multiple tasks.
8. The device according to claim 7, characterized in that the determining module comprises:
a first determining submodule, configured to determine, according to the data processing logic of the multiple tasks, the intermediate data produced between the multiple tasks;
and the processing module comprises:
an analysis submodule, configured to analyze the use state of the intermediate data to determine whether the intermediate data needs to continue to be saved;
a first merging submodule, configured to merge the multiple tasks into one task according to the data processing logic when the intermediate data does not need to be saved.
9. The device according to claim 8, characterized in that the analysis submodule comprises:
an analysis unit, configured to analyze, according to business requirements, whether the intermediate data is used in the business;
a first determining unit, configured to determine, when the intermediate data is not used in the business, that the intermediate data does not need to continue to be saved.
10. The device according to claim 8, characterized in that the analysis submodule comprises:
a marking unit, configured to accumulate the duration for which the intermediate data has not been used, and to mark the intermediate data as unused data when the accumulated duration reaches a preset threshold;
a second determining unit, configured to determine, when the number of times the intermediate data has been marked as unused data is greater than or equal to a preset threshold, that the intermediate data does not need to continue to be saved.
11. The device according to claim 8, characterized in that the processing module further comprises:
a first judging submodule, configured to judge, according to the data processing logic, whether multiple tasks that can produce the same intermediate data exist at the same time;
a second merging submodule, configured to merge the multiple tasks that can produce the same intermediate data into one task when multiple tasks that can produce the same intermediate data exist at the same time.
12. The device according to claim 8, characterized in that the determining module comprises:
a second determining submodule, configured to determine the data dependency relationship between the multiple tasks according to the data processing logic of the multiple tasks;
and the processing module comprises:
a second judging submodule, configured to judge, according to the data dependency relationship, whether a task executed first among the multiple tasks affects the execution efficiency of a task executed later;
a splitting submodule, configured to split the task executed first into multiple tasks according to the data dependency relationship when it is judged that the task executed first affects the execution efficiency of the task executed later, and to use the output data of one of the split-out tasks as the input data of the task executed later.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610308355.6A CN105975600A (en) | 2016-05-11 | 2016-05-11 | Big data processing task optimization method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105975600A true CN105975600A (en) | 2016-09-28 |
Family
ID=56992907
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610308355.6A Pending CN105975600A (en) | 2016-05-11 | 2016-05-11 | Big data processing task optimization method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105975600A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108628675A (en) * | 2018-05-14 | 2018-10-09 | 五八有限公司 | Data processing method, apparatus, device and computer-readable storage medium |
CN109992416A (en) * | 2019-03-20 | 2019-07-09 | 跬云(上海)信息科技有限公司 | Multi-tenant service method and device based on precomputed OLAP model |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102456031A (en) * | 2010-10-26 | 2012-05-16 | 腾讯科技(深圳)有限公司 | MapReduce system and method for processing data streams |
CN102932416A (en) * | 2012-09-26 | 2013-02-13 | 东软集团股份有限公司 | Intermediate data storage method, processing method and device in information flow task |
CN103793530A (en) * | 2014-02-26 | 2014-05-14 | 北京京东尚科信息技术有限公司 | Method, device and system for cleaning up business data regularly |
CN104391748A (en) * | 2014-11-21 | 2015-03-04 | 浪潮电子信息产业股份有限公司 | Mapreduce calculation process optimization method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108960119B (en) | Commodity recognition algorithm for multi-angle video fusion of unmanned sales counter | |
CN112181758B (en) | Fault root cause positioning method based on network topology and real-time alarm | |
CN109271970A (en) | Face datection model training method and device | |
US8811750B2 (en) | Apparatus and method for extracting edge in image | |
US11580560B2 (en) | Identity resolution for fraud ring detection | |
CN106384219A (en) | Warehouse partition assisted analysis method and device | |
CN105678323A (en) | Image-based-on method and system for analysis of users | |
CN111931809A (en) | Data processing method and device, storage medium and electronic equipment | |
CN111815432A (en) | Financial service risk prediction method and device | |
CN112532643B (en) | Flow anomaly detection method, system, terminal and medium based on deep learning | |
CN107748898A (en) | File classifying method, device, computing device and computer-readable storage medium | |
CN105975600A (en) | Big data processing task optimization method and device | |
TW201732655A (en) | Mining method and device for target characteristic data | |
CN113434685A (en) | Information classification processing method and system | |
CN113543117B (en) | Prediction method and device for number portability user and computing equipment | |
WO2020239910A3 (en) | An intelligent computer aided decision support system | |
CN106909454B (en) | Rule processing method and equipment | |
CN111160797A (en) | Wind control model construction method and device, storage medium and terminal | |
Pourbafrani et al. | Remaining time prediction for processes with inter-case dynamics | |
CN109977848A (en) | Training method and device, the computer equipment and readable medium of pornographic detection model | |
CN105975577A (en) | Data optimization method and device in big data processing | |
CN113641906A (en) | System, method, device, processor and medium for realizing similar target person identification processing based on fund transaction relation data | |
CN114913321A (en) | Object attention mining method and system based on local-to-global knowledge migration | |
CN106372236A (en) | Comment data processing method and device | |
CN105468726B (en) | Method for computing data and system based on local computing and distributed computing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20160928 |