CN109684401A - Data processing method, device and system - Google Patents
Data processing method, device and system
- Publication number: CN109684401A (also listed as CN201811654198.XA)
- Authority: CN (China)
- Prior art keywords: data, keyword, calculated, packet, data packet
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Landscapes: Data Exchanges In Wide-Area Networks (AREA)
Abstract
The present invention provides a data processing method, device and system, and relates to the technical field of big data. The method includes: if a data packet to be calculated is received, obtaining the keyword in the data packet to be calculated; calculating the heat of the keyword; judging whether the heat is higher than a preset heat threshold; if so, breaking up the data packet to be calculated to generate multiple sub-data packets; and performing data calculation on the multiple sub-data packets according to a preset calculation rule. The data processing method, device and system provided by the invention obtain the keyword in the data packet to be calculated and, when the heat of the keyword is judged to be too high, break up the data packet to be calculated into multiple sub-data packets, which are then calculated according to the preset calculation rule. By judging the keyword heat in the data packet in advance and monitoring the data state in real time, data can be processed in time and data skew can be avoided in advance.
Description
Technical field
The present invention relates to the technical field of big data, and in particular to a data processing method, device and system.
Background art
With the arrival of the cloud era, big data has attracted more and more attention. In big data computing, the thorniest problem is data skew. In general, data skew means that a data packet contains far more data entries for one keyword than for other keywords, so that the data processing node holding that keyword has a much larger processing load than the other data processing nodes and runs slowly, or never finishes, when processing the data. Data skew typically occurs during a shuffle; operators such as group by and reduceByKey in the code may trigger a shuffle operation.
Currently, one solution to data skew is to preprocess the data so as to avoid executing shuffle operators in Spark. This approach treats the symptom rather than the cause: data skew still occurs during the data preprocessing. Most other schemes act only after data skew has occurred, for example by filtering out the few keywords that cause the skew; they cannot avoid data skew in advance, do not process the data in time, and cannot handle sudden events.
No effective solution has yet been proposed for the above technical problem that data processing is not timely and data skew is difficult to avoid in advance.
Summary of the invention
In view of this, the purpose of the present invention is to provide a data processing method, device and system, so as to alleviate the above technical problem that data processing is not timely and data skew is difficult to avoid in advance.
In a first aspect, an embodiment of the present invention provides a data processing method, comprising: if a data packet to be calculated is received, obtaining the keyword in the data packet to be calculated; calculating the heat of the keyword; judging whether the heat is higher than a preset heat threshold; if so, breaking up the data packet to be calculated to generate multiple sub-data packets; and performing data calculation on the multiple sub-data packets according to a preset calculation rule.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein the step of breaking up the data packet to be calculated to generate multiple sub-data packets comprises: obtaining a preset break-up ratio, and dividing the data packet to be calculated into multiple sub-data packets according to the break-up ratio, wherein the break-up ratio includes the number of sub-data packets and the proportion of each sub-data packet relative to the data packet to be calculated; and implanting a data packet identification code in each sub-data packet, the data packet identification code indicating the data packet to which the sub-data packet belongs.
With reference to the first possible implementation of the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein the step of performing data calculation on the multiple sub-data packets according to the preset calculation rule comprises: extracting the data packet identification code of each sub-data packet; sending each sub-data packet to the primary node corresponding to its data packet identification code, so that the primary node performs data calculation on the sub-data packet according to the preset calculation rule; and obtaining the calculation result of each primary node and sending the calculation results to a secondary node for data calculation, until the secondary node is a terminal node, at which point the calculation result is output, wherein the data calculation rule of the secondary node is consistent with the preset calculation rule.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein the method further comprises: when source data is received, generating the data packet to be calculated according to preset list items; wherein the data packet to be calculated includes the preset list items and the entries corresponding to the preset list items, and the entries include keywords.
With reference to the third possible implementation of the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein the step of calculating the heat of the keyword comprises: counting the number of keywords contained in each list item of the data packet to be calculated; and determining that number as the heat of the keyword.
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein the method further comprises: when the heat is judged to be lower than the preset heat threshold, performing data calculation on the data packet to be calculated according to the preset calculation rule.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, wherein the method further comprises: extracting the keyword and the heat of the keyword to generate a keyword heat table; and displaying the keyword heat table.
In a second aspect, an embodiment of the present invention further provides a data processing device, comprising: a keyword obtaining module, configured to obtain the keyword in a data packet to be calculated if the data packet to be calculated is received; a heat calculation module, configured to calculate the heat of the keyword; a judgment module, configured to judge whether the heat is higher than a preset heat threshold; a data break-up module, configured to break up the data packet to be calculated and generate multiple sub-data packets if the judgment result is yes; and a first computing module, configured to perform data calculation on the multiple sub-data packets according to a preset calculation rule.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, wherein the data break-up module is further configured to: obtain a preset break-up ratio, and divide the data packet to be calculated into multiple sub-data packets according to the break-up ratio, wherein the break-up ratio includes the number of sub-data packets and the proportion of each sub-data packet relative to the data packet to be calculated; and implant a data packet identification code in each sub-data packet, the data packet identification code indicating the data packet to which the sub-data packet belongs.
With reference to the first possible implementation of the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, wherein the first computing module is further configured to: extract the data packet identification code of each sub-data packet; send each sub-data packet to the primary node corresponding to its data packet identification code, so that the primary node performs data calculation on the sub-data packet according to the preset calculation rule; and obtain the calculation result of each primary node and send the calculation results to a secondary node for data calculation, until the secondary node is a terminal node, at which point the calculation result is output, wherein the data calculation rule of the secondary node is consistent with the preset calculation rule.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, wherein the device further comprises: a generation module, configured to generate the data packet to be calculated according to preset list items when source data is received; wherein the data packet to be calculated includes the preset list items and the entries corresponding to the preset list items, and the entries include keywords.
With reference to the third possible implementation of the second aspect, an embodiment of the present invention provides a fourth possible implementation of the second aspect, wherein the heat calculation module is further configured to: count the number of keywords contained in each list item of the data packet to be calculated; and determine that number as the heat of the keyword.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides a fifth possible implementation of the second aspect, wherein the device further comprises: a second computing module, configured to perform data calculation on the data packet to be calculated according to the preset calculation rule when the heat is judged to be lower than the preset heat threshold.
With reference to the second aspect, an embodiment of the present invention provides a sixth possible implementation of the second aspect, wherein the device further comprises: an extraction module, configured to extract the keyword and the heat of the keyword and generate a keyword heat table; and a display module, configured to display the keyword heat table.
In a third aspect, an embodiment of the present invention further provides a data processing system, the system comprising a memory and a processor, wherein the memory is used to store a program that supports the processor in executing any of the above methods, and the processor is configured to execute the program stored in the memory.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium for storing computer program instructions; when a computer executes the computer program instructions, the method described in the first aspect is performed.
The embodiments of the present invention bring the following beneficial effects:
The data processing method, device and system provided by the embodiments of the present invention obtain the keyword in the data packet to be calculated, judge against a preset heat threshold whether the heat of the keyword is too high, break up the data packet to be calculated into multiple sub-data packets, and finally perform data calculation on them according to a preset calculation rule. By judging the keyword heat in the data packet in advance and monitoring the data state in real time, data can be processed in time and data skew can be avoided in advance.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention are realized and attained by the structure particularly pointed out in the description, the claims and the accompanying drawings.
To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a data processing method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another data processing method provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of a data processing device provided by an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of another data processing device provided by an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of yet another data processing device provided by an embodiment of the present invention;
Fig. 6 is a structural schematic diagram of a data processing system provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Currently, the thorniest problem in big data computing is data skew, which typically occurs during a shuffle. When big data is processed with a Spark job, stages are generally divided according to shuffle operators: if the code executes a shuffle operator (such as group by or reduceByKey), a stage boundary is marked at that operator. When a stage starts executing, each of its tasks pulls, over the network, all the keywords it needs to process from the worker nodes where the tasks of the previous stage ran, and performs an aggregation on all pulled entries that share the same keyword. This process is the shuffle, and when the amount of data corresponding to one keyword is especially large, data skew results.
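The effect described above can be illustrated with a small sketch. It is not Spark code: the toy hash, the record set and the partition count are all assumptions chosen for illustration, and Spark's own partitioner works differently. The sketch only shows that routing entries by keyword sends every entry of a dominant keyword to the same partition.

```python
def partition_by_key(records, num_partitions):
    """Assign each (keyword, value) record to a partition by hashing its keyword,
    so that all records sharing a keyword land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for keyword, value in records:
        # Toy deterministic hash (sum of character codes); an assumption used
        # instead of Python's built-in hash(), which is salted for strings.
        index = sum(ord(ch) for ch in keyword) % num_partitions
        partitions[index].append((keyword, value))
    return partitions

# Hypothetical input in which keyword "a" dominates the data set.
records = [("a", i) for i in range(1000)] + [("b", 1), ("c", 2), ("d", 3)]
partitions = partition_by_key(records, num_partitions=4)
sizes = [len(p) for p in partitions]
print(sizes)
```

One partition receives all 1000 entries for keyword a while the others receive one entry each, so the task that owns that partition determines how long the whole stage takes.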
There are many solutions to data skew, including data preprocessing, filtering out the few keywords that cause the skew, and increasing the parallelism of the shuffle operation. In the data preprocessing scheme, the data is preprocessed with Hive, that is, aggregated by keyword in advance; because the aggregation has already been done, the Spark job no longer needs to execute it with a shuffle operator. Since data skew typically occurs during a shuffle and this scheme completely avoids executing shuffle operators in Spark, no data skew arises in Spark. However, because the data itself is unevenly distributed, data skew still appears during the preprocessing; the skew is merely moved forward into the preprocessing stage, which keeps the Spark program from skewing.
Other schemes, such as filtering out the few keywords that cause data skew or increasing the parallelism of the shuffle operation, are applied after data skew has occurred; they cannot respond in time to sudden data skew and cannot avoid data skew in advance.
Based on this, the data processing method, device and system provided by the embodiments of the present invention can process data in time and avoid data skew in advance.
To facilitate understanding of this embodiment, a data processing method disclosed in an embodiment of the present invention is first described in detail.
As shown in Fig. 1, an embodiment of the present invention provides a data processing method applied to a data processing system in a terminal device. The terminal device may be a terminal that includes a database system, such as a computer or a server. The data processing system may be the database management system SQL Server (Structured Query Language Server), an application based on the big data processing framework Spark, or an application built on an ETL (Extract Transform Load) architecture, among others.
Taking a desktop computer as the user terminal device and a Spark-based application as the data processing system, the application background of the data processing method provided by the embodiment of the present invention is described below. A running Spark job involves multiple worker nodes, which are responsible for executing tasks. After a Spark job is submitted, the Driver process is started, and the cluster manager starts multiple Executor processes on the worker nodes according to the resource parameters of the Spark job. The Driver is responsible for scheduling and executing the job code: it divides the job into multiple stages, each of which executes a fragment of the code, and creates a batch of tasks for each stage; these tasks are assigned to the Executor processes for execution. After all tasks of a stage have finished, the intermediate results are written to local disk files on each worker node, and the Driver then schedules the next stage, whose tasks take as input the intermediate results output by the previous stage. This repeats until all the code logic has been executed and all the data has been computed.
The data processing method provided by the embodiment of the present invention can be applied during the shuffle of a Spark job to handle the data skew caused by shuffle operators. Taking this application background as an example, the method comprises the following steps:
Step S102: if a data packet to be calculated is received, obtain the keyword in the data packet to be calculated;
Step S104: calculate the heat of the keyword.
In a specific implementation, if the code of a Spark job executes a shuffle operator (such as group by), a stage is marked off at that operator. When a stage starts executing, each of its tasks pulls all the keywords it needs to process from the worker nodes where the tasks of the previous stage ran. Each worker node holds multiple data packets, and each data packet contains keywords.
When the user terminal receives a data packet to be calculated pulled from a worker node, it can obtain each keyword in the data packet through the Metrics function and calculate the heat of each keyword, where the heat of a keyword is the total number of occurrences of that keyword in the data packet to be calculated.
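Since the heat of a keyword is defined here as its number of occurrences in the packet, the calculation can be sketched in a few lines of plain Python. The representation of a packet as a list of (keyword, value) pairs and the name `keyword_heat` are assumptions made for this illustration; the text does not specify how the Metrics function is implemented.

```python
from collections import Counter

def keyword_heat(packet):
    """Count how many entries each keyword has in the packet.

    `packet` is assumed to be a list of (keyword, value) entries; the number
    of entries per keyword is treated as that keyword's heat.
    """
    return Counter(keyword for keyword, _value in packet)

# Hypothetical data packet pulled from a worker node.
packet = [("a", 2), ("a", 8), ("a", 9), ("a", 5), ("b", 3), ("b", 6)]
heat = keyword_heat(packet)
print(dict(heat))
```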
Step S106: judge whether the heat is higher than the preset heat threshold.
In a specific implementation, the heat threshold should be set in advance. The threshold is a limit on the total number of occurrences of a keyword; specifically, it is the critical value at which the number of occurrences of a keyword in a worker node's data packet to be calculated becomes large enough to cause data skew.
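The judgment in step S106 amounts to a strict comparison of each keyword's heat with the preset threshold. The following minimal sketch assumes the heat values are already available as a dictionary; the threshold value 3 and the function name are hypothetical, since the text does not say how the threshold is chosen.

```python
def exceeds_threshold(heat, heat_threshold):
    """Return the keywords whose heat is strictly higher than the threshold."""
    return sorted(k for k, count in heat.items() if count > heat_threshold)

# Hypothetical heat values and threshold; in practice the threshold would be
# tuned to the entry count at which a single node starts to lag.
heat = {"a": 4, "b": 2}
HEAT_THRESHOLD = 3
hot = exceeds_threshold(heat, HEAT_THRESHOLD)
print(hot)
```

A keyword whose heat merely equals the threshold is not treated as hot here, following the wording "higher than" in the step.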
Step S108: if so, break up the data packet to be calculated to generate multiple sub-data packets.
Step S110: perform data calculation on the multiple sub-data packets according to the preset calculation rule.
In a specific implementation, if the heat (total count) of a keyword in a worker node's data packet to be calculated is higher than the preset heat threshold, data skew is likely to occur, so the packet needs to be handled. An ETL tool can be used to extract the data in the data packet to be calculated and break it up into multiple sub-data packets. After the sub-data packets are generated, the redistribute function can be used to process the sub-data packets.
The data processing method provided by the embodiment of the present invention obtains the keyword in the data packet to be calculated, judges against the preset heat threshold whether the heat of the keyword is too high, breaks up the data packet to be calculated into multiple sub-data packets, and finally performs data calculation on them according to the preset calculation rule. By judging the keyword heat in the data packet in advance and monitoring the data state in real time, the method can process data in time and avoid data skew in advance.
On the basis of the method shown in Fig. 1, an embodiment of the present invention provides another data processing method, which further describes how the data packet to be calculated is broken up and how the sub-data packets are calculated. As shown in Fig. 2, the method comprises the following steps:
Step S202: if a data packet to be calculated is received, obtain the keyword in the data packet to be calculated;
Step S204: calculate the heat of the keyword;
Step S206: judge whether the heat is higher than the preset heat threshold;
If so, execute step S208; if not, execute step S218;
Step S208: obtain the preset break-up ratio, and divide the data packet to be calculated into multiple sub-data packets according to the break-up ratio.
In a specific implementation, when the user terminal device is a computer that includes a database processing system, an ETL tool can be used to extract the data in the data packet to be calculated and break it up into multiple sub-data packets. The number of sub-data packets generated is determined by the break-up ratio, which includes the number of sub-data packets and the proportion of each sub-data packet relative to the data packet to be calculated. The break-up ratio can be configured through the redistribute function.
Take a data packet to be calculated that contains keyword a as an example. When the database processing system receives the data packet to be calculated, it can obtain keyword a in the data packet through the Metrics function and calculate its heat. When the heat of keyword a is higher than the preset heat threshold, the data packet to be calculated is extracted with the ETL tool and multiple sub-data packets are generated according to the break-up ratio set in the redistribute function. Depending on the break-up ratio and the size of the data packet to be calculated, the packet may be divided into 2, 4 or 6 sub-data packets, for example; the specific number of sub-data packets should be determined according to actual conditions.
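One way to read a break-up ratio that "includes the number of sub-data packets and the proportion of each" is as a list of shares, one per sub-data packet. The sketch below follows that reading, which is an assumption; the function name `break_up` and the contiguous slicing strategy are also illustrative and are not taken from the redistribute function or any ETL tool.

```python
def break_up(packet, ratios):
    """Split `packet` into len(ratios) sub-data packets.

    ratios[i] is the share of entries given to sub-data packet i; the shares
    must sum to 1. Entries are assigned in contiguous slices.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    sub_packets = []
    start = 0
    cumulative = 0.0
    for ratio in ratios:
        cumulative += ratio
        end = round(cumulative * len(packet))  # slice boundary for this share
        sub_packets.append(packet[start:end])
        start = end
    return sub_packets

packet = [("a", 2), ("a", 8), ("a", 9), ("a", 5)]
halves = break_up(packet, [0.5, 0.5])
quarters = break_up(packet, [0.25, 0.25, 0.25, 0.25])
print(halves)
print(quarters)
```

Using cumulative boundaries rather than rounding each share separately keeps every entry in exactly one sub-data packet even when the shares do not divide the packet evenly.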
Step S210: implant a data packet identification code in each sub-data packet, the data packet identification code indicating the data packet to which the sub-data packet belongs.
Taking four sub-data packets that contain the keyword entries (a, 2), (a, 8), (a, 9) and (a, 5) respectively as an example, the data packet identification codes A and B are implanted into these four sub-data packets, after which their entries are (A, a, 2), (A, a, 8), (B, a, 9) and (B, a, 5) respectively. In a specific implementation, different random codes can be implanted according to actual needs, which is not limited by the embodiments of the present invention.
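The implantation step can be sketched as prefixing every entry of a sub-data packet with its code. The function name `implant_codes` and the explicit list of codes are assumptions for illustration; the text leaves open how codes are generated and assigned.

```python
def implant_codes(sub_packets, codes):
    """Prefix every entry of sub_packets[i] with the identification code codes[i]."""
    if len(sub_packets) != len(codes):
        raise ValueError("one code per sub-data packet is required")
    return [
        [(code, keyword, value) for keyword, value in sub_packet]
        for sub_packet, code in zip(sub_packets, codes)
    ]

# The four sub-data packets from the example, tagged with codes A, A, B, B.
sub_packets = [[("a", 2)], [("a", 8)], [("a", 9)], [("a", 5)]]
tagged = implant_codes(sub_packets, ["A", "A", "B", "B"])
print(tagged)
```

Because the code becomes part of the grouping key, entries for the same keyword a no longer all hash to one place: those tagged A and those tagged B can be routed to different nodes.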
Step S212: extract the data packet identification code of each sub-data packet;
Step S214: send each sub-data packet to the primary node corresponding to its data packet identification code, so that the primary node performs data calculation on the sub-data packet according to the preset calculation rule.
The above steps divide one data packet to be calculated into multiple sub-data packets, each of which is calculated according to the preset calculation rule. A group by statement in Spark SQL can be used: the data packet identification code A is extracted from the sub-data packets containing (A, a, 2) and (A, a, 8), and the data calculation result is sent to the corresponding primary node, that is, a worker node; the data packet identification code B is extracted from the sub-data packets containing (B, a, 9) and (B, a, 5), and the data calculation result is sent to another corresponding primary node, that is, another worker node.
In a specific implementation, the data processing method provided by the embodiment of the present invention breaks up the data packet to be calculated into multiple sub-data packets and implants codes in them, so that data originally processed by a single task is distributed across multiple tasks, which solves the problem of a single task processing too much data. The data packet identification code implanted in each sub-data packet is then extracted, the calculation is performed according to the preset calculation rule, and the calculation results are sent to the corresponding primary nodes.
Step S216: obtain the calculation result of each primary node and send the calculation results to a secondary node for data calculation, until the secondary node is a terminal node, at which point the calculation result is output, wherein the data calculation rule of the secondary node is consistent with the preset calculation rule.
The calculation results of the sub-data packets are stored in the primary nodes and need to be pulled to the same secondary node, where a task continues the data calculation according to the preset calculation rule. When the secondary node is the terminal node, the data calculation ends and the calculation result is output.
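The primary and secondary calculation can be sketched as a two-stage aggregation in plain Python. The function names and the in-memory grouping are assumptions for illustration; in a real job each group would be handled by a task on a separate node.

```python
from collections import defaultdict

def primary_stage(tagged_entries, rule):
    """Group entries by (code, keyword) and apply `rule` within each group,
    as each primary node would for the sub-data packets routed to it."""
    groups = defaultdict(list)
    for code, keyword, value in tagged_entries:
        groups[(code, keyword)].append(value)
    return [(code, keyword, rule(values)) for (code, keyword), values in groups.items()]

def secondary_stage(partials, rule):
    """Drop the identification code and apply the same rule across the
    partial results of each keyword."""
    groups = defaultdict(list)
    for _code, keyword, value in partials:
        groups[keyword].append(value)
    return {keyword: rule(values) for keyword, values in groups.items()}

tagged = [("A", "a", 2), ("A", "a", 8), ("B", "a", 9), ("B", "a", 5)]
partials = primary_stage(tagged, max)    # one partial maximum per code
result = secondary_stage(partials, max)  # final maximum per keyword
print(partials, result)
```

Applying the same rule in both stages works as written for rules such as maximum, minimum or sum. A count is different: the second stage would have to add up the partial counts rather than count them again.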
Step S218: perform data calculation on the data packet to be calculated according to the preset calculation rule.
That is, when the heat is judged to be lower than the preset heat threshold, data calculation is performed on the data packet to be calculated according to the preset calculation rule. In this case there is usually no risk of data skew, so the break-up and identification-code implantation described above are not needed and the data packet to be calculated is calculated directly according to the preset calculation rule.
In the data processing method provided by the embodiment of the present invention, the data packet is usually generated from source data. In general, the source data may be raw data, which the user can process according to actual needs. The user can therefore input the raw data into a computer or server; when the computer or server receives the source data, it can generate the data packet to be calculated according to preset list items, where the data packet to be calculated includes the list items and the entries corresponding to the list items, and the entries include keywords.
In a specific implementation, an ETL tool can be used to extract the source data in the database system and generate the data packet to be calculated according to the preset list items. The form given in Table 1 is taken as an example.
Table 1: Data packet T to be calculated
f1 | f2
a | 2
a | 8
a | 9
a | 5
b | 3
b | 6
Table 1 is the data packet to be calculated that the ETL tool extracts from the source data. Its name is T and its list items are f1 and f2, where a list item is the basis on which the source data is classified: f1 indicates that the column holds keywords, and f2 holds the entry corresponding to each keyword. List item f1 of Table 1 contains six entries, namely the keywords a, a, a, a, b, b; list item f2 contains the six corresponding entries: 2, 8, 9 and 5 for keyword a, and 3 and 6 for keyword b.
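Generating the packet from source data according to preset list items can be pictured as projecting each source row onto those items. In the sketch below the source rows, the extra `ts` field and the function name `build_packet` are hypothetical; only the list items f1 and f2 and the resulting entries come from Table 1.

```python
def build_packet(source_rows, list_items):
    """Project each source row onto the preset list items (columns)."""
    return [tuple(row[item] for item in list_items) for row in source_rows]

# Hypothetical source rows; the "ts" field is not a preset list item and is dropped.
source_rows = [
    {"f1": "a", "f2": 2, "ts": 1},
    {"f1": "a", "f2": 8, "ts": 2},
    {"f1": "a", "f2": 9, "ts": 3},
    {"f1": "a", "f2": 5, "ts": 4},
    {"f1": "b", "f2": 3, "ts": 5},
    {"f1": "b", "f2": 6, "ts": 6},
]
packet_T = build_packet(source_rows, ["f1", "f2"])
print(packet_T)
```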
Based on the above data packet to be calculated, the step of calculating the heat of the keyword may include: counting the number of keywords contained in each list item of the data packet to be calculated, and determining that number as the heat of the keyword.
In a specific implementation, after the computer or server receives the data packet to be calculated, the Metrics module can obtain the keywords a and b in data packet T and count the total number of occurrences of each, after which the subsequent heat judgment is performed.
Further, so that the user can view and analyze the keyword heat during the current data processing, the method further includes: extracting the keywords and their heat, generating a keyword heat table, and displaying the keyword heat table for the user to view.
Taking a data packet to be calculated in the form of Table 1 as an example, the Metrics module extracts the keywords a and b in Table 1 together with their heat values, 4 and 2, and a keyword heat table is generated and displayed, which may take the form shown in Table 2.
Table 2: Keyword heat table
Keyword | Heat
a | 4
b | 2
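Rendering such a table from already computed heat values is straightforward. The sketch below formats the values from Table 2 as text rows, hottest keyword first; the ordering, the column labels and the function name `heat_table` are assumptions, since the text does not describe how the table is displayed.

```python
def heat_table(heat):
    """Return the keyword heat table as text lines, hottest keyword first.

    `heat` maps each keyword to its heat; ties are ordered by keyword.
    """
    lines = ["Keyword | Heat"]
    for keyword, count in sorted(heat.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{keyword} | {count}")
    return lines

table = heat_table({"b": 2, "a": 4})
print("\n".join(table))
```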
In a specific implementation, taking big data processing with a Spark job as an example, the whole flow of the data processing method provided by the embodiment of the present invention is described below:
(1) When the database system has source data to be processed, the source data is extracted with an ETL tool and the data packet to be calculated is generated according to the preset list items; it may take the form given in Table 1, which includes the data packet to be calculated with keyword a and the data packet to be calculated with keyword b;
(2) The Metrics module extracts keyword a and its heat 4, and keyword b and its heat 2, from Table 1 and generates a keyword heat table, which may take the form given in Table 2;
(3) The break-up ratio and the heat threshold are set in advance through the redistribute function. When the heat of keyword a is higher than the heat threshold, data skew is likely, so the data packet to be calculated is broken up with the ETL tool into four sub-data packets containing (a, 2), (a, 8), (a, 9) and (a, 5). Meanwhile, assuming that the heat of keyword b is lower than the heat threshold, the data packet to be calculated that contains keyword b is processed directly according to the preset calculation rule until the calculation result is output. The calculation rule may be, for example, a maximum-value statistic, a minimum-value statistic or a total-count statistic. Taking the maximum-value rule as an example, if the heat of keyword b is lower than the heat threshold, the entry (b, 6) in Table 1 is selected directly.
(4) If the heat of keyword a is higher than the heat threshold, then, taking a preset scatter ratio of 0.5 as an example, the group by statement in Spark SQL is used to embed a data packet identification code, according to the scatter ratio, into the sub-data packets containing (a, 2), (a, 8), (a, 9), (a, 5). Taking the identification codes A and B as an example, the sub-data packets after embedding are those containing (A, a, 2), (A, a, 8), (B, a, 9), (B, a, 5);
(5) The data packet identification codes in the sub-data packets containing (A, a, 2), (A, a, 8) and (B, a, 9), (B, a, 5) are extracted respectively. The sub-data packets that carried identification code A before extraction are sent to the same primary node, and those that carried identification code B before extraction are sent to another primary node, so that the two groups of sub-data packets, with their codes extracted, are each computed on the corresponding primary node according to the preset computation rule, and the calculation results are stored. For example, (A, a, 2) and (A, a, 8) are sent to one primary node, which computes the maximum value of the entries for a and obtains (A, a, 8); (B, a, 9) and (B, a, 5) are sent to another primary node, which computes the maximum value of the entries for a and obtains (B, a, 9).
(6) The secondary node pulls the calculation results stored on the two primary nodes and continues the data calculation according to the preset computation rule. When the secondary node is the terminal node, the entire calculation process is complete and the calculation result is output. For example, the results (A, a, 8) and (B, a, 9) are pulled from the primary nodes, the maximum value statistics rule is applied again to obtain (B, a, 9), and the maximum value statistical result (a, 9) of keyword a is output.
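Steps (3) to (6) can be walked through with a small plain-Python sketch. It does not use Spark, an ETL tool, or the redistribute function named in the text; it only mimics the data flow. The threshold value 3, the helper names, and the entry `("b", 1)` are assumptions made for the example (the source gives no threshold value, and Table 1 is not reproduced here). The text does not say what happens when the heat equals the threshold; the sketch treats that case as "not scattered".

```python
from collections import defaultdict

HEAT_THRESHOLD = 3    # hypothetical; the source only says the threshold is preset
SCATTER_RATIO = 0.5   # from step (4): each sub-data packet holds half of the entries
CODES = ["A", "B"]    # identification codes from the worked example


def scatter(keyword, values):
    """Steps (3)-(4): split the entries of a hot keyword and tag each with a code."""
    size = max(1, round(len(values) * SCATTER_RATIO))
    return [
        (CODES[min(i // size, len(CODES) - 1)], keyword, value)
        for i, value in enumerate(values)
    ]


def primary_max(tagged):
    """Step (5): each primary node keeps the maximum of the entries routed to it."""
    partial = {}
    for code, keyword, value in tagged:
        if code not in partial or value > partial[code][2]:
            partial[code] = (code, keyword, value)
    return list(partial.values())


def secondary_max(partials):
    """Step (6): the secondary node merges the primary results and drops the code."""
    _code, keyword, value = max(partials, key=lambda item: item[2])
    return (keyword, value)


def process(packet):
    grouped = defaultdict(list)
    for keyword, value in packet:
        grouped[keyword].append(value)
    results = []
    for keyword, values in grouped.items():
        if len(values) > HEAT_THRESHOLD:   # hot keyword: scatter, then two-stage max
            results.append(secondary_max(primary_max(scatter(keyword, values))))
        else:                              # otherwise compute directly
            results.append((keyword, max(values)))
    return results


# (b, 1) is a hypothetical second entry for keyword b.
packet = [("a", 2), ("a", 8), ("a", 9), ("a", 5), ("b", 6), ("b", 1)]
print(process(packet))  # [('a', 9), ('b', 6)]
```

With these inputs, `scatter` reproduces the tagged entries (A, a, 2), (A, a, 8), (B, a, 9), (B, a, 5) of step (4), and `primary_max` reproduces the two primary results (A, a, 8) and (B, a, 9) of step (5).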
The data processing method provided by the embodiment of the present invention obtains the keyword in the data packet to be calculated, judges against the preset heat threshold whether the heat of the keyword is too high, scatters the data packet to be calculated into multiple sub-data packets when it is, and finally performs data calculation on them according to the preset computation rule. By judging the keyword heat in the data packet in advance and monitoring the data state in real time, data can be processed in time and data skew can be avoided before it occurs.
Corresponding to the data processing method provided by the above embodiment, an embodiment of the present invention further provides a data processing device, which is arranged in a terminal device, where the terminal device may be a computer, a server, or the like that contains a database system.
As shown in the schematic structural diagram of a data processing device in Fig. 3, the device includes the following structure:
Keyword acquisition module 30, configured to acquire the keyword in the data packet to be calculated if the data packet to be calculated is received;
Heat computing module 32, configured to calculate the heat of the keyword;
Judgment module 34, configured to judge whether the heat is higher than the preset heat threshold;
Data scattering module 36, configured to scatter the data packet to be calculated and generate multiple sub-data packets if the judgment result is yes;
First computing module 38, configured to perform data calculation on the multiple sub-data packets according to the preset computation rule.
The above data scattering module is further configured to: obtain the preset scatter ratio and divide the data to be calculated into multiple sub-data packets according to the scatter ratio, where the scatter ratio includes the number of sub-data packets and the proportion of each sub-data packet relative to the data packet to be calculated; and embed a data packet identification code in each sub-data packet, where the data packet identification code is used to indicate the data packet to which the sub-data packet belongs.
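One way to read "the scatter ratio includes the number of sub-data packets and the proportion of each sub-data packet" is as a list of shares, one per sub-data packet. The sketch below follows that reading; it is an assumption, not the representation used by the patent. The function name `scatter_by_shares` and the rounding of cumulative boundaries are illustrative choices.

```python
from itertools import accumulate


def scatter_by_shares(entries, shares, codes):
    """Split entries into len(shares) sub-data packets.

    Sub-data packet i receives roughly shares[i] of the entries, and every
    entry in it is prefixed with the identification code codes[i].
    """
    if len(shares) != len(codes):
        raise ValueError("one identification code is needed per share")
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    n = len(entries)
    # Rounded cumulative boundaries: each entry lands in exactly one sub-data packet.
    bounds = [0] + [round(c * n) for c in accumulate(shares)]
    bounds[-1] = n
    return [
        [(code,) + tuple(entry) for entry in entries[lo:hi]]
        for code, lo, hi in zip(codes, bounds, bounds[1:])
    ]


hot = [("a", 2), ("a", 8), ("a", 9), ("a", 5)]
print(scatter_by_shares(hot, [0.5, 0.5], ["A", "B"]))
```

With two equal shares this yields the same grouping as the worked example, (A, a, 2), (A, a, 8) and (B, a, 9), (B, a, 5). Unequal shares such as `[0.5, 0.25, 0.25]` give sub-data packets of sizes 2, 1 and 1 for the same four entries.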
The above first computing module is further configured to: extract the data packet identification code of each sub-data packet; send each sub-data packet to the primary node corresponding to its data packet identification code, so that the primary node performs data calculation on the sub-data packet according to the preset computation rule; and obtain the calculation result of each primary node and send the calculation results to a secondary node for data calculation, until the calculation result is output when the secondary node is the terminal node, where the data computation rule of the secondary node is consistent with the preset computation rule.
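The routing and level-by-level merging described above can be sketched as a small reduction loop. This is a single-process illustration, not a distributed implementation; `primary_results`, `reduce_in_levels` and the `fan_in` parameter are hypothetical names introduced for the example.

```python
def primary_results(tagged, rule):
    """Group tagged entries by identification code and apply `rule` per group."""
    by_code = {}
    for code, _keyword, value in tagged:
        by_code.setdefault(code, []).append(value)
    return [rule(values) for values in by_code.values()]


def reduce_in_levels(results, rule, fan_in=2):
    """Merge results level by level until a single (terminal) result remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(results)
    if not level:
        raise ValueError("nothing to reduce")
    while len(level) > 1:
        # Each node of the next level applies the same rule to up to fan_in results.
        level = [rule(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


tagged = [("A", "a", 2), ("A", "a", 8), ("B", "a", 9), ("B", "a", 5)]
print(reduce_in_levels(primary_results(tagged, max), max))  # 9, maximum value statistics
print(reduce_in_levels(primary_results(tagged, min), min))  # 2, minimum value statistics
```

Reusing the same rule at every level works for maximum and minimum statistics, because applying the rule to partial results gives the same answer as applying it to all values at once. A total quantity rule would need its partial counts added together at the upper level; the source does not spell out that case.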
On the basis of the data processing device shown in Fig. 3, an embodiment of the present invention further provides another data processing device. As shown in the schematic structural diagram of another data processing device in Fig. 4, in addition to the structure shown in Fig. 3, the device further includes:
Generation module 40, configured to generate the data packet to be calculated according to the preset list items when source data is received; where the data packet to be calculated includes the preset list items and the entries corresponding to the preset list items, and the entries contain the keyword.
Further, the above heat computing module is also configured to count, in the data packet to be calculated, the quantity of the keyword included in each list item, and to determine that quantity as the heat of the keyword.
The data processing device shown in Fig. 4 further includes: a second computing module 42, configured to perform data calculation on the data packets to be calculated according to the preset computation rule when the heat is judged to be lower than the preset heat threshold.
Further, in another data processing device as shown in Fig. 5, the device further includes:
Extraction module 44, configured to extract the keyword and the heat of the keyword and generate a keyword heat table;
Display module 46, configured to display the keyword heat table.
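As a rough illustration of what the display step might produce, the sketch below renders a keyword-to-heat mapping in the pipe-delimited layout of Table 2. The function name `render_heat_table` and the choice to list the hottest keyword first are assumptions; the text only requires that the table be shown to the user.

```python
def render_heat_table(heat):
    """Render {keyword: heat} as text in the layout of Table 2, hottest keyword first."""
    rows = sorted(heat.items(), key=lambda item: (-item[1], item[0]))
    lines = ["Keyword | Heat |"]
    lines += [f"{keyword} | {value} |" for keyword, value in rows]
    return "\n".join(lines)


print(render_heat_table({"b": 2, "a": 4}))
```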
The data processing device provided by the embodiment of the present invention has the same technical features as the data processing method provided by the above embodiment, so it can solve the same technical problem and achieve the same technical effect. For brevity, where the device embodiment does not mention something, reference may be made to the corresponding content in the foregoing method embodiment.
An embodiment of the present invention further provides a data processing system. The system includes a memory and a processor; the memory is used to store a program that supports the processor in executing any of the above methods, and the processor is configured to execute the program stored in the memory.
Further, an embodiment of the present invention also provides a computer storage medium for storing computer program instructions. When a computer executes the computer program instructions, the data processing method described in the above embodiment is performed.
Referring to Fig. 6, an embodiment of the present invention further provides a schematic structural diagram of a data processing system, comprising a processor 600, a memory 601, a bus 602 and a communication interface 603, where the processor 600, the communication interface 603 and the memory 601 are connected through the bus 602. The processor 600 is configured to execute executable modules stored in the memory 601, such as computer programs. The memory 601 may include high-speed random access memory (RAM, Random Access Memory) and may also include non-volatile memory, for example at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 603 (which may be wired or wireless), and may use the Internet, a wide area network, a local network, a metropolitan area network, and the like. The bus 602 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one bidirectional arrow is shown in Fig. 6, but this does not mean that there is only one bus or one type of bus. The memory 601 is used to store a program, and the processor 600 executes the program after receiving an execution instruction. The method performed by the data processing device disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 600 or implemented by the processor 600. The processor 600 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 600 or by instructions in the form of software. The processor 600 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 601; the processor 600 reads the information in the memory 601 and completes the steps of the above method in combination with its hardware.
The computer program product of the data processing method, device and system provided by the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the method described in the foregoing method embodiment; for the specific implementation, refer to the method embodiment, which is not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system and device described above may refer to the corresponding processes in the foregoing method embodiment, and are not repeated here.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, intended to illustrate its technical solution rather than to limit it, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (16)
1. A data processing method, characterized by comprising:
if a data packet to be calculated is received, acquiring a keyword in the data packet to be calculated;
calculating the heat of the keyword;
judging whether the heat is higher than a preset heat threshold;
if so, scattering the data packet to be calculated to generate a plurality of sub-data packets;
performing data calculation on the plurality of sub-data packets according to a preset computation rule.
2. The method according to claim 1, characterized in that the step of scattering the data packet to be calculated to generate a plurality of sub-data packets comprises:
obtaining a preset scatter ratio, and dividing the data to be calculated into a plurality of sub-data packets according to the scatter ratio, wherein the scatter ratio comprises the number of the sub-data packets and the proportion of each sub-data packet relative to the data packet to be calculated;
embedding a data packet identification code in each sub-data packet, the data packet identification code being used to indicate the data packet to which the sub-data packet belongs.
3. The method according to claim 2, characterized in that the step of performing data calculation on the plurality of sub-data packets according to the preset computation rule comprises:
extracting the data packet identification code of each sub-data packet;
sending each sub-data packet to the primary node corresponding to its data packet identification code, so that the primary node performs data calculation on the sub-data packet according to the preset computation rule;
obtaining the calculation result of each primary node, and sending the calculation results to a secondary node for data calculation, until the calculation result is output when the secondary node is the terminal node, wherein the data computation rule of the secondary node is consistent with the preset computation rule.
4. The method according to claim 1, characterized in that the method further comprises:
when source data is received, generating the data packet to be calculated according to preset list items;
wherein the data packet to be calculated comprises the preset list items and the entries corresponding to the preset list items, and the entries contain the keyword.
5. The method according to claim 4, characterized in that the step of calculating the heat of the keyword comprises:
counting, in the data packet to be calculated, the quantity of the keyword included in each list item;
determining the quantity as the heat of the keyword.
6. The method according to claim 1, characterized in that the method further comprises:
when the heat is judged to be lower than the preset heat threshold, performing data calculation on the data packet to be calculated according to the preset computation rule.
7. The method according to claim 1, characterized in that the method further comprises:
extracting the keyword and the heat of the keyword, and generating a keyword heat table;
displaying the keyword heat table.
8. A data processing device, characterized by comprising:
a keyword acquisition module, configured to acquire a keyword in a data packet to be calculated if the data packet to be calculated is received;
a heat computing module, configured to calculate the heat of the keyword;
a judgment module, configured to judge whether the heat is higher than a preset heat threshold;
a data scattering module, configured to scatter the data packet to be calculated and generate a plurality of sub-data packets if the judgment result is yes;
a first computing module, configured to perform data calculation on the plurality of sub-data packets according to a preset computation rule.
9. The device according to claim 8, characterized in that the data scattering module is further configured to:
obtain a preset scatter ratio, and divide the data to be calculated into a plurality of sub-data packets according to the scatter ratio, wherein the scatter ratio comprises the number of the sub-data packets and the proportion of each sub-data packet relative to the data packet to be calculated;
embed a data packet identification code in each sub-data packet, the data packet identification code being used to indicate the data packet to which the sub-data packet belongs.
10. The device according to claim 9, characterized in that the first computing module is further configured to:
extract the data packet identification code of each sub-data packet;
send each sub-data packet to the primary node corresponding to its data packet identification code, so that the primary node performs data calculation on the sub-data packet according to the preset computation rule;
obtain the calculation result of each primary node, and send the calculation results to a secondary node for data calculation, until the calculation result is output when the secondary node is the terminal node, wherein the data computation rule of the secondary node is consistent with the preset computation rule.
11. The device according to claim 8, characterized in that the device further comprises:
a generation module, configured to generate the data packet to be calculated according to preset list items when source data is received;
wherein the data packet to be calculated comprises the preset list items and the entries corresponding to the preset list items, and the entries contain the keyword.
12. The device according to claim 11, characterized in that the heat computing module is further configured to:
count, in the data packet to be calculated, the quantity of the keyword included in each list item;
determine the quantity as the heat of the keyword.
13. The device according to claim 10, characterized in that the device further comprises:
a second computing module, configured to perform data calculation on the data packet to be calculated according to the preset computation rule when the heat is judged to be lower than the preset heat threshold.
14. The device according to claim 8, characterized in that the device further comprises:
an extraction module, configured to extract the keyword and the heat of the keyword and generate a keyword heat table;
a display module, configured to display the keyword heat table.
15. A data processing system, characterized in that the system comprises a memory and a processor, the memory is used to store a program that supports the processor in executing the method of any one of claims 1 to 7, and the processor is configured to execute the program stored in the memory.
16. A computer storage medium, characterized in that it is used to store computer program instructions, and when a computer executes the computer program instructions, the method according to any one of claims 1 to 7 is performed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811654198.XA CN109684401A (en) | 2018-12-30 | 2018-12-30 | Data processing method, device and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109684401A true CN109684401A (en) | 2019-04-26 |
Family
ID=66190387
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811654198.XA Pending CN109684401A (en) | 2018-12-30 | 2018-12-30 | Data processing method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109684401A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130159364A1 (en) * | 2011-12-20 | 2013-06-20 | UT-Battelle, LLC Oak Ridge National Laboratory | Parallel log structured file system collective buffering to achieve a compact representation of scientific and/or dimensional data |
CN105095413A (en) * | 2015-07-09 | 2015-11-25 | 北京京东尚科信息技术有限公司 | Method and apparatus for solving data skew |
CN106293938A (en) * | 2016-08-05 | 2017-01-04 | 飞思达技术(北京)有限公司 | Solve the method for data skew in big data calculation process |
CN107220123A (en) * | 2017-05-25 | 2017-09-29 | 郑州云海信息技术有限公司 | One kind solves Spark data skew method and system |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112631860A (en) * | 2020-12-21 | 2021-04-09 | 常州微亿智造科技有限公司 | Industrial Internet of things data transmission Worker service monitoring method and device |
CN117009094A (en) * | 2023-10-07 | 2023-11-07 | 联通在线信息科技有限公司 | Data oblique scattering method and device, electronic equipment and storage medium |
CN117009094B (en) * | 2023-10-07 | 2024-02-23 | 联通在线信息科技有限公司 | Data oblique scattering method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190426 |