CN105975600A - Big data processing task optimization method and device - Google Patents

Big data processing task optimization method and device

Info

Publication number
CN105975600A
Authority
CN
China
Prior art keywords
task
data
intermediate data
process method
business
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610308355.6A
Other languages
Chinese (zh)
Inventor
刘宏斌
国铁龙
向滔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LeTV Holding Beijing Co Ltd
LeTV Information Technology Beijing Co Ltd
Original Assignee
LeTV Holding Beijing Co Ltd
LeTV Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LeTV Holding Beijing Co Ltd and LeTV Information Technology Beijing Co Ltd
Priority to CN201610308355.6A
Publication of CN105975600A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5017 Task decomposition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a task optimization method and device for big data processing. The method comprises the following steps: analyzing the data processing logic of multiple tasks; determining the data relationship between the tasks according to their data processing logic; and analyzing the data relationship to determine whether to merge the tasks into one task or to split one of the tasks into multiple tasks. The method and device can reduce the number of computing tasks to be executed in the data warehouse, thereby saving computing resources and improving the processing efficiency of the data warehouse.

Description

Task optimization method and device for big data processing
Technical field
The invention belongs to the computer field and, specifically, relates to a task optimization method and device for big data processing.
Background
With the rapid development of the Internet, many Internet companies have accumulated data on the order of terabytes. Every day the data warehouse receives data from different ecosystems, such as user data records from mobile phones, smart TVs, and video websites, as part of the big data resource.
Data enters the data warehouse through the warehouse's entry machines and is organized into layers inside the warehouse. Each of these stages requires data processing, and each data processing procedure is a set of multiple tasks. Each task has its own inherent processing logic; for example, task 1 reads the data of some fields in table A and writes it to table B. Sometimes, when many data engineers need the same data, the paths by which different engineers derive that data from the existing data may differ, which produces many duplicate tasks, or tasks that are not duplicates but serve the same purpose. In other cases, a task updates certain fields of a table slowly, so that a subsequent task that processes other fields must also wait for the table update to finish before it can continue, which lengthens the task processing time.
These problems all arise because the inherent processing logic of the tasks has not been analyzed adequately. They waste a large amount of computing resources and reduce the processing speed of the data warehouse.
Summary of the invention
In view of this, embodiments of the present invention provide a task optimization method and device for big data processing, to solve the technical problem in the prior art that computing resources are wasted because the inherent processing logic of tasks is not analyzed adequately.
To solve the above technical problem, the invention discloses a task optimization method for big data processing, comprising: analyzing the data processing logic of multiple tasks; determining the data relationship between the multiple tasks according to their data processing logic; and analyzing the data relationship to determine whether to merge the multiple tasks into one task or to split one of the multiple tasks into multiple tasks.
To solve the above technical problem, the invention also discloses a task optimization device for big data processing, comprising: an analysis module for analyzing the data processing logic of multiple tasks; a determining module for determining the data relationship between the multiple tasks according to the data processing logic of the multiple tasks; and a processing module for analyzing the data relationship to determine whether to merge the multiple tasks into one task or to split one of the multiple tasks into multiple tasks.
Compared with the prior art, the task optimization method and device for big data processing provided by the embodiments of the present invention analyze the data processing logic of multiple tasks in the data warehouse to obtain the data relationship between the tasks, and determine from this data relationship whether tasks should be merged or split. This improves the task execution efficiency of the data warehouse and helps the data warehouse use its computing resources sensibly.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a task optimization method for big data processing provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a task optimization method for big data processing provided by an embodiment of the present invention;
Fig. 3 is a flowchart of a task optimization method for big data processing provided by an embodiment of the present invention;
Fig. 4 is a block diagram of a task optimization device for big data processing provided by an embodiment of the present invention.
Detailed description
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In the embodiments of the present invention, the computing tasks in a data warehouse are analyzed. The data processing logic of each task is analyzed, and the logical relationships and data dependencies between the tasks are derived from that logic. The intermediate data produced between tasks and the execution of the tasks are then analyzed to find points at which the existing tasks can be optimized, and the existing tasks are merged or split as appropriate, which saves computing resources in the data warehouse and improves task execution efficiency. The task optimization methods provided by the embodiments of the present invention are described one by one below.
Fig. 1 shows a task optimization method for big data processing provided by an embodiment of the present invention. The method is applicable to a server and comprises the following steps.
S10: Analyze the data processing logic of multiple tasks.
The data processing logic includes the processing objects and the computation method. The processing objects include the source data, the target data, and so on; for example, task T01 reads the data of three fields from table A and writes it to table B. The computation method is the method by which the target data is generated from the source data. If a task reads data from table A and writes it directly to table B, no computation method is involved; if the task performs a calculation on the data read from table A and writes the result to table B, a computation method exists between table A and table B.
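As an illustration only, the data processing logic of a task could be recorded as a small data structure. The sketch below is not taken from the patent: the class name `Task`, its attributes, and the field names `f1` to `f3` are hypothetical, and the computation method is represented simply by a label.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical record of one task's data processing logic."""
    name: str
    source: str                     # source table (processing object)
    target: str                     # target table (processing object)
    fields: Tuple[str, ...]         # fields read from the source table
    compute: Optional[str] = None   # label of the computation method; None means a plain copy


def has_computation(task: Task) -> bool:
    """A task that copies data directly involves no computation method."""
    return task.compute is not None


# T01 copies three fields from table A to table B, so it has no computation method.
t01_copy = Task("T01", "A", "B", ("f1", "f2", "f3"))
# A variant that calculates on the data before writing does have one.
t01_calc = Task("T01", "A", "B", ("f1", "f2", "f3"), compute="sum")
```

Keeping the processing objects and the computation method in separate attributes makes the later steps (finding intermediate tables and dependencies) a matter of comparing sources and targets.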
S11: Determine the data relationship between the multiple tasks according to the data processing logic of the multiple tasks.
The data relationship includes the intermediate data between tasks and the data dependencies between tasks. For example, task T01 reads the data of three fields from table A and writes it to table B, and task T02 filters the data in table B and writes the data that meets a preset condition to table C. Table B is then regarded as intermediate data between task T01 and task T02.
A data dependency means that a task executed later relies on the data output by a task executed earlier. Suppose task T01 reads the data of the first and second fields from table A and writes it directly to table B, and also reads the data of the third and fourth fields from table A, performs predictive analysis on that data, and writes the prediction result to table B. Task T02 reads the data of the first and second fields from table B, filters it, and writes the filtered result to table C. Task T03 reads the prediction result from table B, evaluates it, and writes the evaluation result to table D. Task T02 and task T03 thus both rely on the output data of task T01, and each of them has a data dependency on task T01.
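The dependency example can be sketched in code by recording, for each task, which table columns it reads and writes. This is a minimal sketch under assumptions: the patent does not prescribe a representation, and the names `Task`, `col`, `dependencies`, and the column labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    reads: frozenset    # (table, column) pairs the task reads
    writes: frozenset   # (table, column) pairs the task writes


def col(table, *names):
    """Build a set of (table, column) pairs."""
    return frozenset((table, n) for n in names)


def dependencies(tasks):
    """Map each task to the earlier tasks whose output columns it reads.

    Assumes `tasks` is listed in execution order.
    """
    deps = {}
    for i, later in enumerate(tasks):
        deps[later.name] = {e.name for e in tasks[:i] if e.writes & later.reads}
    return deps


t01 = Task("T01", col("A", "f1", "f2", "f3", "f4"), col("B", "f1", "f2", "prediction"))
t02 = Task("T02", col("B", "f1", "f2"), col("C", "filtered"))
t03 = Task("T03", col("B", "prediction"), col("D", "evaluation"))

deps = dependencies([t01, t02, t03])
```

Tracking columns rather than whole tables is a design choice made here so that the later split decision can see that T02 and T03 rely on different parts of T01's output.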
S12: Analyze the data relationship to determine whether to merge the multiple tasks into one task or to split one of the multiple tasks into multiple tasks.
The analysis checks whether any intermediate data will not be used, and whether a task executed earlier affects the execution efficiency of a task executed later. If there is intermediate data that will not be used, the tasks around it can be merged, which reduces the number of tasks to execute and saves computing resources in the data warehouse. If an earlier task affects the execution efficiency of a later task, the earlier task is split into multiple tasks according to the data dependencies, and the output data of one of the resulting tasks serves as the input data of the later task. The later task can then obtain the data it relies on sooner and complete its execution, which improves its execution efficiency.
The case in which multiple tasks in the data warehouse are merged is described first. Fig. 2 shows a task optimization method for big data processing provided by an embodiment of the present invention. The method is applicable to a server and comprises the following steps.
S20: Analyze the data processing logic of multiple tasks.
The data processing logic includes the processing objects and the computation method. The processing objects include the source data, the target data, and so on; for example, task T01 reads the data of three fields from table A and writes it to table B. The computation method is the method by which the target data is generated from the source data. If a task reads data from table A and writes it directly to table B, no computation method is involved; if the task performs a calculation on the data read from table A and writes the result to table B, a computation method exists between table A and table B.
S21: Determine the intermediate data produced between the multiple tasks according to the data processing logic of the multiple tasks.
The logical relationships between the tasks are derived from their data processing logic. For example, task T01 reads the data of three fields from table A and writes it to table B; task T02 filters the data of the three fields in table B and writes the data that meets a preset condition to table C; task T03 reads the data in table C and appends it to table D. Tasks T01 to T03 are thus executed in sequence according to the logical relationships between them. Once the logical relationships between the tasks are known, the intermediate data produced between the tasks can be determined; in this example, table B and table C are identified as intermediate data.
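One simple way to read the T01 to T03 example is that a table counts as intermediate data when one task writes it and another task reads it. The sketch below assumes that reading; the function name `intermediate_tables` and the tuple layout are hypothetical.

```python
def intermediate_tables(tasks):
    """Return tables that are written by one task and read by another.

    `tasks` is a list of (name, source_table, target_table) tuples.
    """
    written = {target for _, _, target in tasks}
    read = {source for _, source, _ in tasks}
    return sorted(written & read)


chain = [
    ("T01", "A", "B"),  # read three fields from A, write to B
    ("T02", "B", "C"),  # filter B, write matches to C
    ("T03", "C", "D"),  # append C to D
]
found = intermediate_tables(chain)
```

Table A is only read and table D is only written, so neither is reported; only B and C sit between two tasks.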
Different data engineers may configure different ways of computing the target data, and may also produce some intermediate data for other calculations according to the actual needs of the business they are responsible for. It is therefore necessary to determine whether this intermediate data will be used, that is, whether it needs to be kept.
S22: Analyze the usage status of the intermediate data to determine whether the intermediate data needs to be kept.
The usage status covers whether the intermediate data is used in other calculations and whether the intermediate data is itself the final result of another task chain. Whether intermediate data needs to be kept can therefore be judged in several ways.
In one embodiment, step S22 may be further implemented as the following steps.
S220: Analyze, according to business requirements, whether the intermediate data is used in the business.
The business requirements cover whether the data is used to calculate other business data and whether the intermediate data is also a final result that the business needs. For example, suppose intermediate data B records the smart TV sales of each store in Shanghai from January to March 2016. If the business also needs to pick out the five stores with the highest sales, intermediate data B will still be used. Alternatively, if intermediate data B is itself the final result of a task chain that compiles smart TV sales in Shanghai from January to March 2016, intermediate data B also needs to be used.
S221: When the intermediate data is not used in the business, determine that the intermediate data does not need to be kept.
In this way, whether intermediate data in a task chain needs to be kept is judged from the actual demand for the data in the preset business logic.
In another embodiment, step S22 may instead be implemented as the following steps.
S222: Count the accumulated duration for which the intermediate data has not been used; when the accumulated duration reaches a preset threshold, mark the intermediate data as unused data.
For data identified as intermediate data in a task chain, the accumulated duration for which it has not been used can be counted. For example, as long as no read operation on intermediate data B occurs, intermediate data B is not being used. When intermediate data B is read, the accumulated duration is cleared and timing restarts. If no read operation on intermediate data B occurs within the preset duration (for example, 12 hours), intermediate data B is marked as unused data.
To reduce the chance of misjudgment, the number of times the intermediate data has been marked as unused data can also be counted. If the data is still not used during the next preset duration, the intermediate data is marked as unused data again.
S223: When the number of times the intermediate data has been marked as unused data is greater than or equal to a preset threshold, determine that the intermediate data does not need to be kept.
For example, if intermediate data B has been marked as unused data 10 times in a row, the data can be considered not to need keeping.
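Steps S222 and S223 amount to an idle timer plus a counter of consecutive marks. The sketch below is one possible reading, not the patented implementation: the class name `IdleTracker` and its methods are hypothetical, time is passed in explicitly as hours so the behavior is deterministic, and resetting the mark count on a read is an assumption drawn from the phrase "10 times in a row".

```python
class IdleTracker:
    """Hypothetical tracker for one piece of intermediate data."""

    def __init__(self, idle_limit_hours=12.0, mark_limit=10):
        self.idle_limit = idle_limit_hours   # preset duration from S222
        self.mark_limit = mark_limit         # preset count from S223
        self.window_start = 0.0              # start of the current idle window
        self.marks = 0                       # consecutive "unused" marks

    def on_read(self, now):
        """A read clears the accumulated duration and restarts timing."""
        self.window_start = now
        self.marks = 0  # assumption: marks must be consecutive

    def check(self, now):
        """Add one mark per fully elapsed idle window; True means the data need not be kept."""
        while now - self.window_start >= self.idle_limit:
            self.marks += 1
            self.window_start += self.idle_limit
        return self.marks >= self.mark_limit


tracker = IdleTracker()
after_60h = tracker.check(60.0)    # five idle windows so far
marks_after_60h = tracker.marks
tracker.on_read(61.0)              # table B is read, everything restarts
after_180h = tracker.check(180.0)  # nine windows since the read
after_181h = tracker.check(181.0)  # tenth window completes
```

Requiring several consecutive marks rather than one trades a longer retention period for a lower risk of discarding data that is read only occasionally.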
Intermediate data that will not be used usually appears because different data engineers configured tasks by hand while obtaining target data in different ways. Such data is fairly arbitrary and will not be used by other data engineers.
S23: When the intermediate data does not need to be kept, merge the multiple tasks into one task according to the data processing logic.
In the example above, if table B is judged to be intermediate data that does not need to be kept, tasks T01 and T02 are merged into T12 according to the data processing logic. The processing objects of the merged task T12 are table A and table C, and the computation method is merged accordingly: read the data of three fields from table A, filter it according to the preset condition, and write the filtered result to table C. If table C is judged to be intermediate data that does not need to be kept, tasks T02 and T03 are merged into T23 according to the data processing logic. The processing objects of the merged task T23 are table B and table D, and the computation method is merged accordingly: filter the data of the three fields in table B and append the filtered result to table D. If both table B and table C are judged to be intermediate data that does not need to be kept, tasks T01, T02, and T03 are merged into T13 according to the data processing logic. The processing objects of the merged task T13 are table A and table D, and the computation method is merged accordingly: read the data of three fields from table A, filter it according to the preset condition, and append the filtered result to table D.
In other words, if intermediate data that will not be used exists between two tasks, the two tasks can be merged into one task; if several such pieces of intermediate data occur in succession, multiple tasks can be merged into one task. This reduces the number of computing tasks that the data warehouse has to execute, saves computing resources, and helps improve the processing efficiency of the data warehouse.
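The three merge cases can be reproduced with a short routine that walks a linear task chain and fuses neighbors whenever the table between them has been judged unnecessary. This is a sketch under assumptions: the names `Task`, `merge_chain`, and the step labels are hypothetical, merged tasks are named by joining the original names rather than T12, T23, or T13, and a table is assumed to appear in `discardable` only if no other task reads it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    source: str
    target: str
    steps: List[str]   # ordered description of the computation method


def merge_chain(chain, discardable):
    """Merge adjacent tasks whose connecting table need not be kept."""
    if not chain:
        return []
    merged = [chain[0]]
    for task in chain[1:]:
        prev = merged[-1]
        if prev.target == task.source and prev.target in discardable:
            # Fuse: keep the first source, the last target, and all steps in order.
            merged[-1] = Task(prev.name + "+" + task.name, prev.source,
                              task.target, prev.steps + task.steps)
        else:
            merged.append(task)
    return merged


def example_chain():
    return [
        Task("T01", "A", "B", ["read three fields"]),
        Task("T02", "B", "C", ["filter by preset condition"]),
        Task("T03", "C", "D", ["append to target"]),
    ]


only_b = merge_chain(example_chain(), {"B"})
only_c = merge_chain(example_chain(), {"C"})
both = merge_chain(example_chain(), {"B", "C"})
```

With only table B discardable the result reads from A and writes to C, matching the T12 case; with both tables discardable a single task remains that reads from A and appends to D.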
In one embodiment, the task optimization method for big data processing described above may further include the following steps.
S24: Judge, according to the data processing logic, whether multiple tasks that produce the same intermediate data exist at the same time.
S25: When multiple tasks that produce the same intermediate data exist at the same time, merge those tasks into one task.
Tasks that produce the same intermediate data come from the configurations of different data engineers. For example, suppose table A is known to everyone. A first engineer needs to extract the data of three fields from table A and write it to table B, perform predictive analysis on the data in table B, and output the analysis result to table C. A second engineer needs to extract the data of the same three fields from table A and write it to table B, filter the data in table B, and output the result to table D. There are then two tasks that read the data of the three fields from table A and write it to table B. These two tasks are merged into one, and the follow-up tasks configured by both engineers share the output of the merged task.
Merging multiple tasks that produce the same intermediate data further reduces the number of computing tasks and saves computing resources.
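Detecting tasks that produce the same intermediate data can be sketched as grouping tasks by a signature of their data processing logic. The signature layout (source table, fields, operation, target table) and all names below are assumptions made for illustration; the patent does not define how equality of tasks is tested.

```python
def merge_duplicate_tasks(tasks):
    """Map every task name to the first task with the same signature.

    `tasks` is a list of (name, signature) pairs, where a signature is
    (source_table, fields, operation, target_table).
    """
    first_by_signature = {}
    replacement = {}
    for name, signature in tasks:
        canonical = first_by_signature.setdefault(signature, name)
        replacement[name] = canonical
    return replacement


fields = ("f1", "f2", "f3")
tasks = [
    ("first_extract", ("A", fields, "copy", "B")),
    ("first_predict", ("B", fields, "predict", "C")),
    ("second_extract", ("A", fields, "copy", "B")),   # same logic as first_extract
    ("second_filter", ("B", fields, "filter", "D")),
]
replacement = merge_duplicate_tasks(tasks)
kept = sorted(set(replacement.values()))
```

Here the second engineer's extraction maps onto the first engineer's, so three tasks remain and both follow-up tasks read the single copy of table B.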
The case in which one task in the data warehouse is split into multiple tasks is described next. An embodiment of the present invention further provides a task optimization method for big data processing, applicable to a server. As shown in Fig. 3, the method comprises the following steps.
S30: Analyze the data processing logic of multiple tasks.
S31: Determine the data dependencies between the multiple tasks according to the data processing logic of the multiple tasks.
A data dependency means that a task executed later relies on the data output by a task executed earlier.
For example, task T01 reads the data of the first and second fields from table A and writes it directly to table B, and also reads the data of the third and fourth fields from table A, performs predictive analysis on that data, and writes the prediction result to table B. Task T02 reads the data of the first and second fields from table B, filters it, and writes the filtered result to table C. Task T03 reads the prediction result from table B, evaluates it, and writes the evaluation result to table D. Task T02 thus depends on the first and second field data output by T01, and task T03 depends on the prediction result output by T01.
S32: Judge, according to the data dependencies, whether a task executed earlier among the multiple tasks affects the execution efficiency of a task executed later.
In the example above, task T02 and task T03 cannot execute until task T01 has finished. Because predictive analysis is relatively slow, task T02 cannot start even after the data of the first and second fields has been written to table B. It has to wait until the prediction result has been written to table B and task T01 has finished, even though task T02 has no dependency on the prediction result in table B. The earlier task T01 therefore affects the execution efficiency of the later task T02 by delaying the time at which T02 starts.
This situation usually arises because the data engineer did not think the task configuration through thoroughly.
S33: When it is judged that the earlier task affects the execution efficiency of the later task, split the earlier task into multiple tasks according to the data dependencies, so that the later task can obtain the data it relies on sooner and start executing.
In the example above, task T01 is split into task T011 and task T012. Task T011 reads the data of the first and second fields from table A and writes it to table B; task T012 reads the data of the third and fourth fields from table A, performs predictive analysis, and writes the prediction result to table B. Task T011 executes relatively quickly, and as soon as it has finished, task T02 can start without waiting for task T012 to finish, which improves the execution efficiency of the later task T02.
In the embodiments of the present invention, when the execution of an earlier task is found to harm the execution efficiency of a later task, the earlier task is split according to the data dependencies. The purpose is to let the tasks produced by the split deliver the data that the later task relies on sooner, so that the later task can start earlier than it could before the split, which improves overall execution efficiency.
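The split of T01 into T011 and T012 can be illustrated by grouping the earlier task's outputs by the processing step that produces them and then recomputing which sub-task each later task relies on. This is a sketch under assumptions: the function name `split_task`, the step labels, and the column names are hypothetical, and each processing step is assumed to be independently executable.

```python
def split_task(name, steps, consumers):
    """Split one task into one sub-task per processing step.

    steps:     mapping of step label -> set of output columns it writes
    consumers: mapping of later task name -> set of columns it reads
    Returns (sub-task outputs, new dependencies of each later task).
    """
    parts = {}
    for i, outputs in enumerate(steps.values(), start=1):
        parts[f"{name}{i}"] = set(outputs)
    new_deps = {
        consumer: sorted(p for p, outs in parts.items() if outs & needed)
        for consumer, needed in consumers.items()
    }
    return parts, new_deps


steps = {
    "copy": {"f1", "f2"},          # fast: copy the first and second fields to table B
    "predict": {"prediction"},     # slow: predictive analysis on the third and fourth fields
}
consumers = {
    "T02": {"f1", "f2"},           # filters the copied fields
    "T03": {"prediction"},         # evaluates the prediction result
}
parts, new_deps = split_task("T01", steps, consumers)
```

After the split, T02 depends only on the fast sub-task T011, so it no longer has to wait for the predictive analysis performed by T012.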
Device embodiments of the present invention, which are used to perform the method embodiments above, are described below.
Fig. 4 shows a task optimization device for big data processing provided by an embodiment of the present invention, including:
an analysis module 40 for analyzing the data processing logic of multiple tasks; a determining module 41 for determining the data relationship between the multiple tasks according to the data processing logic of the multiple tasks; and a processing module 42 for analyzing the data relationship to determine whether to merge the multiple tasks into one task or to split one of the multiple tasks into multiple tasks.
In one embodiment, the determining module 41 further includes: a first determining submodule for determining the intermediate data produced between the multiple tasks according to the data processing logic of the multiple tasks;
the processing module 42 further includes: an analysis submodule for analyzing the usage status of the intermediate data to determine whether the intermediate data needs to be kept; and a first merging submodule for merging the multiple tasks into one task according to the data processing logic when the intermediate data does not need to be kept.
In one embodiment, the analysis submodule further includes: an analysis unit that analyzes, according to business requirements, whether the intermediate data is used in the business; and a first determining unit that determines that the intermediate data does not need to be kept when the intermediate data is not used in the business.
In one embodiment, the analysis submodule further includes: a marking unit that counts the accumulated duration for which the intermediate data has not been used and, when the accumulated duration reaches a preset threshold, marks the intermediate data as unused data; and a second determining unit that determines that the intermediate data does not need to be kept when the number of times the intermediate data has been marked as unused data is greater than or equal to a preset threshold.
In one embodiment, the processing module 42 also includes: a first judging submodule for judging, according to the data processing logic, whether multiple tasks that produce the same intermediate data exist at the same time; and a second merging submodule for merging the multiple tasks that produce the same intermediate data into one task when such tasks exist at the same time.
In one embodiment, the determining module 41 further includes: a second determining submodule for determining the data dependencies between the multiple tasks according to the data processing logic of the multiple tasks;
the processing module 42 includes: a second judging submodule for judging, according to the data dependencies, whether a task executed earlier among the multiple tasks affects the execution efficiency of a task executed later; and a splitting submodule for, when it is judged that the earlier task affects the execution efficiency of the later task, splitting the earlier task into multiple tasks according to the data dependencies and using the output data of one of the resulting tasks as the input data of the later task.
Come real additionally, the embodiment of the present invention can be passed through hardware processor (hardware processor) Existing each functional module above-mentioned.
The embodiment of the present invention additionally provides a kind of server, and this server includes: include processor;For The memorizer of storage processor executable;Wherein, processor is configured to: analyze multiple task Data process method;Data process method according to multiple tasks determines that the data between multiple task are closed System;Described data relationship is analyzed, it is determined whether by multiple task mergings be a task or will A task in multiple tasks splits into multiple task.
In one embodiment, the described data process method according to multiple tasks determines between multiple task Data relationship include: according to the data process method of multiple tasks determine between multiple task produce in Between data;
Described described data relationship is analyzed, it is determined whether be a task bag by multiple task mergings Include: analyze the use state of intermediate data to determine that intermediate data is the need of continuing to be saved;Work as centre When data need not be saved, it is a task according to data process method by multiple task mergings.
In one embodiment, the use state of described analysis intermediate data is to determine whether intermediate data needs Continue to be saved and include: analyze whether intermediate data is used in business according to business demand;In the middle of Between data when being not used in business, determine that intermediate data needs not continue to be saved.
In one embodiment, the use state of described analysis intermediate data is to determine whether intermediate data needs Continue to be saved to include: the accumulation duration being not used by of median average evidence, when accumulation duration reaches During pre-determined threshold, labelling intermediate data is the data being not used;When intermediate data is marked as not made Data number of times more than or equal to pre-determined threshold time, determine that intermediate data needs not continue to be saved.
In one embodiment, described data relationship is analyzed, it is determined whether by multiple task mergings Be that a task also includes: according to data process method judge whether to exist simultaneously multiple can produce identical The task of intermediate data;When there is the multiple task of can produce identical intermediate data simultaneously, by multiple The task merging that can produce identical intermediate data is a task.
In one embodiment, determining the data relationship between the multiple tasks according to the data processing logic of the multiple tasks includes: determining, according to the data processing logic of the multiple tasks, the data dependency relationship between the multiple tasks;
Analyzing the data relationship to determine whether to split one of the multiple tasks into multiple tasks includes: judging, according to the data dependency relationship, whether a task that executes first among the multiple tasks affects the execution efficiency of a task that executes later; and, when it is judged that the earlier task affects the execution efficiency of the later task, splitting the earlier task into multiple tasks according to the data dependency relationship, and using the output data of one of the split-out tasks as the input data of the later task.
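The effect of the split can be illustrated with a small dependency graph. This sketch assumes that each task has a known duration, that the split-out parts can run in parallel, and that the later task needs the output of only one part; the function names, the graph encoding, and the durations are all hypothetical.

```python
from typing import Dict, List, Tuple

# task name -> (duration, names of the tasks it depends on)
Graph = Dict[str, Tuple[float, List[str]]]


def finish_times(graph: Graph) -> Dict[str, float]:
    """Earliest finish time of each task, assuming unlimited parallelism."""
    done: Dict[str, float] = {}

    def finish(name: str) -> float:
        if name not in done:
            duration, deps = graph[name]
            start = max((finish(d) for d in deps), default=0.0)
            done[name] = start + duration
        return done[name]

    for name in graph:
        finish(name)
    return done


def split_task(graph: Graph, task: str, parts: Dict[str, float],
               consumer: str, needed_part: str) -> Graph:
    """Replace `task` with `parts`; `consumer` then waits only for `needed_part`."""
    _, deps = graph[task]
    new_graph: Graph = {k: v for k, v in graph.items() if k != task}
    for part_name, duration in parts.items():
        new_graph[part_name] = (duration, list(deps))
    for name, (duration, task_deps) in list(new_graph.items()):
        if task in task_deps:
            # The consumer depends on one part; any other dependent on all parts.
            repl = [needed_part] if name == consumer else list(parts)
            new_deps = [x for d in task_deps for x in (repl if d == task else [d])]
            new_graph[name] = (duration, new_deps)
    return new_graph
```

In a graph where task B waits for a 10-unit task A, splitting A into a 4-unit part that B needs and a 6-unit part that it does not lets B start as soon as the first part finishes.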
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of their technical features, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A task optimization method in big data processing, characterized by comprising:
Analyzing the data processing logic of multiple tasks;
Determining the data relationship between the multiple tasks according to the data processing logic of the multiple tasks;
Analyzing the data relationship to determine whether to merge the multiple tasks into one task or to split one of the multiple tasks into multiple tasks.
2. The method according to claim 1, characterized in that determining the data relationship between the multiple tasks according to the data processing logic of the multiple tasks includes:
Determining, according to the data processing logic of the multiple tasks, the intermediate data produced between the multiple tasks;
Analyzing the data relationship to determine whether to merge the multiple tasks into one task includes:
Analyzing the usage state of the intermediate data to determine whether the intermediate data needs to continue to be saved;
When the intermediate data does not need to be saved, merging the multiple tasks into one task according to the data processing logic.
3. The method according to claim 2, characterized in that analyzing the usage state of the intermediate data to determine whether the intermediate data needs to continue to be saved includes:
Analyzing, according to business requirements, whether the intermediate data is used by the business;
When the intermediate data is not used by the business, determining that the intermediate data does not need to continue to be saved.
4. The method according to claim 2, characterized in that analyzing the usage state of the intermediate data to determine whether the intermediate data needs to continue to be saved includes:
Accumulating the duration for which the intermediate data has not been used and, when the accumulated duration reaches a preset threshold, marking the intermediate data as unused data;
When the number of times the intermediate data has been marked as unused data is greater than or equal to a preset threshold, determining that the intermediate data does not need to continue to be saved.
5. The method according to claim 2, characterized in that analyzing the data relationship to determine whether to merge the multiple tasks into one task further includes:
Judging, according to the data processing logic, whether multiple tasks exist at the same time that can produce identical intermediate data;
When multiple tasks that can produce identical intermediate data exist at the same time, merging the multiple tasks that can produce identical intermediate data into one task.
6. The method according to claim 1, characterized in that determining the data relationship between the multiple tasks according to the data processing logic of the multiple tasks includes:
Determining, according to the data processing logic of the multiple tasks, the data dependency relationship between the multiple tasks;
Analyzing the data relationship to determine whether to split one of the multiple tasks into multiple tasks includes:
Judging, according to the data dependency relationship, whether a task that executes first among the multiple tasks affects the execution efficiency of a task that executes later;
When it is judged that the earlier task affects the execution efficiency of the later task, splitting the earlier task into multiple tasks according to the data dependency relationship, and using the output data of one of the split-out tasks as the input data of the later task.
7. A task optimization device in big data processing, characterized by comprising:
An analysis module, configured to analyze the data processing logic of multiple tasks;
A determination module, configured to determine the data relationship between the multiple tasks according to the data processing logic of the multiple tasks;
A processing module, configured to analyze the data relationship to determine whether to merge the multiple tasks into one task or to split one of the multiple tasks into multiple tasks.
8. The device according to claim 7, characterized in that the determination module includes:
A first determination submodule, configured to determine, according to the data processing logic of the multiple tasks, the intermediate data produced between the multiple tasks;
The processing module includes:
An analysis submodule, configured to analyze the usage state of the intermediate data to determine whether the intermediate data needs to continue to be saved;
A first merging submodule, configured to merge the multiple tasks into one task according to the data processing logic when the intermediate data does not need to be saved.
9. The device according to claim 8, characterized in that the analysis submodule includes:
An analysis unit, configured to analyze, according to business requirements, whether the intermediate data is used by the business;
A first determination unit, configured to determine, when the intermediate data is not used by the business, that the intermediate data does not need to continue to be saved.
10. The device according to claim 8, characterized in that the analysis submodule includes:
A marking unit, configured to accumulate the duration for which the intermediate data has not been used and, when the accumulated duration reaches a preset threshold, mark the intermediate data as unused data;
A second determination unit, configured to determine, when the number of times the intermediate data has been marked as unused data is greater than or equal to a preset threshold, that the intermediate data does not need to continue to be saved.
11. The device according to claim 8, characterized in that the processing module further includes:
A first judging submodule, configured to judge, according to the data processing logic, whether multiple tasks exist at the same time that can produce identical intermediate data;
A second merging submodule, configured to merge the multiple tasks that can produce identical intermediate data into one task when such tasks exist at the same time.
12. The device according to claim 8, characterized in that the determination module includes:
A second determination submodule, configured to determine, according to the data processing logic of the multiple tasks, the data dependency relationship between the multiple tasks;
The processing module includes:
A second judging submodule, configured to judge, according to the data dependency relationship, whether a task that executes first among the multiple tasks affects the execution efficiency of a task that executes later;
A splitting submodule, configured to split the earlier task into multiple tasks according to the data dependency relationship when it is judged that the earlier task affects the execution efficiency of the later task, and to use the output data of one of the split-out tasks as the input data of the later task.
CN201610308355.6A 2016-05-11 2016-05-11 Big data processing task optimization method and device Pending CN105975600A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610308355.6A CN105975600A (en) 2016-05-11 2016-05-11 Big data processing task optimization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610308355.6A CN105975600A (en) 2016-05-11 2016-05-11 Big data processing task optimization method and device

Publications (1)

Publication Number Publication Date
CN105975600A true CN105975600A (en) 2016-09-28

Family

ID=56992907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610308355.6A Pending CN105975600A (en) 2016-05-11 2016-05-11 Big data processing task optimization method and device

Country Status (1)

Country Link
CN (1) CN105975600A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108628675A (en) * 2018-05-14 2018-10-09 五八有限公司 A kind of data processing method, device, equipment and computer readable storage medium
CN109992416A (en) * 2019-03-20 2019-07-09 跬云(上海)信息科技有限公司 Multi-tenant method of servicing and device based on precomputation OLAP model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102456031A (en) * 2010-10-26 2012-05-16 腾讯科技(深圳)有限公司 MapReduce system and method for processing data streams
CN102932416A (en) * 2012-09-26 2013-02-13 东软集团股份有限公司 Intermediate data storage method, processing method and device in information flow task
CN103793530A (en) * 2014-02-26 2014-05-14 北京京东尚科信息技术有限公司 Method, device and system for cleaning up business data regularly
CN104391748A (en) * 2014-11-21 2015-03-04 浪潮电子信息产业股份有限公司 Mapreduce calculation process optimization method

Similar Documents

Publication Publication Date Title
CN108960119B (en) Commodity recognition algorithm for multi-angle video fusion of unmanned sales counter
CN112181758B (en) Fault root cause positioning method based on network topology and real-time alarm
CN109271970A (en) Face datection model training method and device
US8811750B2 (en) Apparatus and method for extracting edge in image
US11580560B2 (en) Identity resolution for fraud ring detection
CN106384219A (en) Warehouse partition assisted analysis method and device
CN105678323A (en) Image-based-on method and system for analysis of users
CN111931809A (en) Data processing method and device, storage medium and electronic equipment
CN111815432A (en) Financial service risk prediction method and device
CN112532643B (en) Flow anomaly detection method, system, terminal and medium based on deep learning
CN107748898A (en) File classifying method, device, computing device and computer-readable storage medium
CN105975600A (en) Big data processing task optimization method and device
TW201732655A (en) Mining method and device for target characteristic data
CN113434685A (en) Information classification processing method and system
CN113543117B (en) Prediction method and device for number portability user and computing equipment
WO2020239910A3 (en) An intelligent computer aided decision support system
CN106909454B (en) Rule processing method and equipment
CN111160797A (en) Wind control model construction method and device, storage medium and terminal
Pourbafrani et al. Remaining time prediction for processes with inter-case dynamics
CN109977848A (en) Training method and device, the computer equipment and readable medium of pornographic detection model
CN105975577A (en) Data optimization method and device in big data processing
CN113641906A (en) System, method, device, processor and medium for realizing similar target person identification processing based on fund transaction relation data
CN114913321A (en) Object attention mining method and system based on local-to-global knowledge migration
CN106372236A (en) Comment data processing method and device
CN105468726B (en) Method for computing data and system based on local computing and distributed computing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160928