CN109684401A - Data processing method, device and system - Google Patents

Data processing method, device and system

Info

Publication number
CN109684401A
Authority
CN
China
Prior art keywords
data
keyword
calculated
packet
data packet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811654198.XA
Other languages
Chinese (zh)
Inventor
郑舒力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Beijing Kingsoft Cloud Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Beijing Kingsoft Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd, Beijing Kingsoft Cloud Technology Co Ltd filed Critical Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN201811654198.XA priority Critical patent/CN109684401A/en
Publication of CN109684401A publication Critical patent/CN109684401A/en
Pending legal-status Critical Current

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention provides a data processing method, device and system, and relates to the technical field of big data. The method includes: if a data packet to be calculated is received, obtaining the keyword in the data packet to be calculated; calculating the heat (popularity) of the keyword; judging whether the heat is higher than a preset heat threshold; if so, breaking up the data packet to be calculated to generate multiple sub-data packets; and performing data calculation on the multiple sub-data packets according to a preset calculation rule. The data processing method, device and system provided by the present invention obtain the keyword in the data packet to be calculated, break up the data packet into multiple sub-data packets when the heat of the keyword is judged to be too high, and finally perform data calculation on the sub-data packets according to the preset calculation rule. By judging the keyword heat in the data packet in advance and monitoring the data state in real time, data can be processed in time and data skew is avoided in advance.

Description

Data processing method, device and system
Technical field
The present invention relates to the technical field of big data, and in particular to a data processing method, device and system.
Background
With the arrival of the cloud era, big data has attracted more and more attention. In big data computing, the thorniest problem is data skew. In general, data skew means that a data packet contains far more data entries for one keyword than for the other keywords, so that the data processing node handling that keyword has a much larger processing load than the other data processing nodes and therefore runs slowly, or never finishes, when processing the data. Data skew usually occurs during a shuffle; using operators such as group by or reduceByKey in code may trigger a shuffle operation.
At present, solutions to data skew include preprocessing the data so that shuffle-type operators need not be executed in Spark. This approach treats the symptoms rather than the root cause, because data skew still occurs during preprocessing. Most other solutions act only after data skew has occurred, for example by filtering out the few keywords that cause the skew; they cannot avoid data skew in advance, the data is not processed in time, and sudden events cannot be handled.
No effective solution has yet been proposed for the above technical problem that data is not processed in time and data skew is difficult to avoid in advance.
Summary of the invention
In view of this, the object of the present invention is to provide a data processing method, device and system, to alleviate the above technical problem that data is not processed in time and data skew is difficult to avoid in advance.
In a first aspect, an embodiment of the present invention provides a data processing method, comprising: if a data packet to be calculated is received, obtaining the keyword in the data packet to be calculated; calculating the heat of the keyword; judging whether the heat is higher than a preset heat threshold; if so, breaking up the data packet to be calculated to generate multiple sub-data packets; and performing data calculation on the multiple sub-data packets according to a preset calculation rule.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein the step of breaking up the data packet to be calculated to generate multiple sub-data packets includes: obtaining a preset break-up ratio, and dividing the data packet to be calculated into multiple sub-data packets according to the break-up ratio, wherein the break-up ratio includes the number of sub-data packets and the proportion of each sub-data packet relative to the data packet to be calculated; and embedding a data packet identification code in each sub-data packet, the data packet identification code indicating the data packet to which the sub-data packet belongs.
With reference to the first possible implementation of the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein the step of performing data calculation on the multiple sub-data packets according to the preset calculation rule includes: extracting the data packet identification code of each sub-data packet; sending each sub-data packet to the primary node corresponding to its data packet identification code, so that the primary node performs data calculation on the sub-data packet according to the preset calculation rule; and obtaining the calculation result of each primary node and sending the calculation results to a secondary node for data calculation, until the secondary node is the final node, at which point the calculation result is output, wherein the data calculation rule of the secondary node is consistent with the preset calculation rule.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein the method further includes: when source data is received, generating the data packet to be calculated according to preset table items, wherein the data packet to be calculated includes the preset table items and the entries corresponding to the preset table items, and the entries contain keywords.
With reference to the third possible implementation of the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein the step of calculating the heat of the keyword includes: counting the number of keywords contained in each table item of the data packet to be calculated; and determining that number as the heat of the keyword.
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein the method further includes: when the heat is judged to be lower than the preset heat threshold, performing data calculation on the data packet to be calculated according to the preset calculation rule.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, wherein the method further includes: extracting the keyword and the heat of the keyword to generate a keyword heat table; and displaying the keyword heat table.
In a second aspect, an embodiment of the present invention further provides a data processing device, comprising: a keyword obtaining module, configured to obtain the keyword in a data packet to be calculated if the data packet to be calculated is received; a heat calculation module, configured to calculate the heat of the keyword; a judgment module, configured to judge whether the heat is higher than a preset heat threshold; a data break-up module, configured to break up the data packet to be calculated to generate multiple sub-data packets if the judgment result is yes; and a first calculation module, configured to perform data calculation on the multiple sub-data packets according to a preset calculation rule.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, wherein the data break-up module is further configured to: obtain a preset break-up ratio, and divide the data packet to be calculated into multiple sub-data packets according to the break-up ratio, wherein the break-up ratio includes the number of sub-data packets and the proportion of each sub-data packet relative to the data packet to be calculated; and embed a data packet identification code in each sub-data packet, the data packet identification code indicating the data packet to which the sub-data packet belongs.
With reference to the first possible implementation of the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, wherein the first calculation module is further configured to: extract the data packet identification code of each sub-data packet; send each sub-data packet to the primary node corresponding to its data packet identification code, so that the primary node performs data calculation on the sub-data packet according to the preset calculation rule; and obtain the calculation result of each primary node and send the calculation results to a secondary node for data calculation, until the secondary node is the final node, at which point the calculation result is output, wherein the data calculation rule of the secondary node is consistent with the preset calculation rule.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, wherein the device further includes: a generation module, configured to generate the data packet to be calculated according to preset table items when source data is received, wherein the data packet to be calculated includes the preset table items and the entries corresponding to the preset table items, and the entries contain keywords.
With reference to the third possible implementation of the second aspect, an embodiment of the present invention provides a fourth possible implementation of the second aspect, wherein the heat calculation module is further configured to: count the number of keywords contained in each table item of the data packet to be calculated; and determine that number as the heat of the keyword.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides a fifth possible implementation of the second aspect, wherein the device further includes: a second calculation module, configured to perform data calculation on the data packet to be calculated according to the preset calculation rule when the heat is judged to be lower than the preset heat threshold.
With reference to the second aspect, an embodiment of the present invention provides a sixth possible implementation of the second aspect, wherein the device further includes: an extraction module, configured to extract the keyword and the heat of the keyword and generate a keyword heat table; and a display module, configured to display the keyword heat table.
In a third aspect, an embodiment of the present invention further provides a data processing system. The system includes a memory and a processor; the memory is used to store a program that supports the processor in executing any of the above methods, and the processor is configured to execute the program stored in the memory.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium for storing computer program instructions; when a computer executes the computer program instructions, the method described in the first aspect is performed.
The embodiments of the present invention bring the following beneficial effects:
The data processing method, device and system provided by the embodiments of the present invention obtain the keyword in the data packet to be calculated, judge according to a preset heat threshold whether the heat of the keyword is too high, break up the data packet to be calculated into multiple sub-data packets, and finally perform data calculation on the sub-data packets according to a preset calculation rule. By judging the keyword heat in the data packet in advance and monitoring the data state in real time, data can be processed in time and data skew is avoided in advance.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by implementing the present invention. The objects and other advantages of the present invention are realized and obtained by the structures particularly pointed out in the description, the claims and the drawings.
To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a data processing method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another data processing method provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of a data processing device provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of another data processing device provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of yet another data processing device provided by an embodiment of the present invention;
Fig. 6 is a structural diagram of a data processing system provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
At present, the thorniest problem in big data computing is data skew, which usually occurs during a shuffle. When a Spark job is used for big data processing, stages are generally divided according to shuffle-type operators: if a shuffle-type operator (such as group by or reduceByKey) is executed in the code, a stage boundary is placed at that operator. When a stage starts executing, each of its Tasks pulls, over the network, all the keywords it needs to process from the worker nodes where the Tasks of the previous stage ran, and performs an aggregation on all the records that share the same keyword. This process is the shuffle, and when the data volume corresponding to one keyword is particularly large, data skew results.
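The effect described above can be illustrated with a small simulation in plain Python rather than Spark. The partitioning function, the task count and the sample records are all assumptions made for illustration; a real engine uses its own hash partitioner.

```python
from collections import defaultdict


def shuffle_by_key(records, num_tasks):
    """Route every record with the same keyword to the same task, as a shuffle does."""
    tasks = defaultdict(list)
    for key, value in records:
        # Illustrative deterministic partitioner (an assumption, not Spark's hash).
        task_id = sum(key.encode("utf-8")) % num_tasks
        tasks[task_id].append((key, value))
    return tasks


# Hypothetical skewed input: keyword "a" has far more records than "b" or "c".
records = [("a", i) for i in range(1000)] + [("b", 1), ("b", 2), ("c", 3)]
tasks = shuffle_by_key(records, num_tasks=3)

# Number of records each task must aggregate.
load = {task_id: len(rows) for task_id, rows in sorted(tasks.items())}
print(load)
```

Because all records of one keyword land on a single task, the task that receives the hot keyword carries almost the entire load while the others sit nearly idle.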
There are many solutions to data skew, including data preprocessing, filtering out the few keywords that cause the skew, and increasing the parallelism of the shuffle operation. In the data preprocessing scheme, the data is preprocessed with Hive, that is, aggregated by keyword in advance. Because the data has already been aggregated, the Spark job no longer needs shuffle-type operators to perform this kind of operation. Since data skew usually occurs during a shuffle and this scheme completely avoids executing shuffle-type operators in Spark, no data skew arises in the Spark job. However, because the data itself is unevenly distributed, data skew still appears during preprocessing; the skew has merely been moved forward into the preprocessing stage, so that the Spark program itself does not experience it.
The other schemes, such as filtering out the few keywords that cause data skew or increasing the parallelism of the shuffle operation, are applied after data skew has occurred. They cannot respond to sudden data skew in time and cannot avoid data skew in advance.
Based on this, the data processing method, device and system provided by the embodiments of the present invention can process data in time and avoid data skew in advance.
To facilitate understanding of this embodiment, a data processing method disclosed in the embodiments of the present invention is first described in detail.
As shown in Fig. 1, an embodiment of the present invention provides a data processing method applied to a data processing system in a terminal device. The terminal device may be a terminal such as a computer or server that contains a database system. The data processing system may be the database management system SQL Server (Structured Query Language Server), an application based on the big data processing framework Spark, or an application that uses an ETL (Extract Transform Load) architecture, and so on.
The application background of the method is described by taking as an example a user terminal device that is a desktop computer whose data processing system is a Spark-based application. A running Spark job involves multiple worker nodes, which are responsible for executing Tasks. After a Spark job is submitted, a Driver process is started, and the cluster manager starts multiple Executor processes on the worker nodes according to the resource parameters of the Spark job. The Driver is responsible for scheduling and executing the job code: it divides the job into multiple stages, each of which executes part of the code, and creates a batch of Tasks for each stage; these Tasks are assigned to the Executor processes for execution. After all Tasks of a stage have finished, the intermediate results are written to local disk files on the worker nodes, and the Driver schedules the next stage, whose Tasks take as input the intermediate results output by the previous stage. This cycle repeats until all the code logic has been executed and all the data has been computed.
The data processing method provided by the embodiments of the present invention can be applied during the shuffle of a Spark job, to process data skew caused by shuffle-type operators. Taking this application background as an example, the method includes the following steps:
Step S102: if a data packet to be calculated is received, obtain the keyword in the data packet to be calculated;
Step S104: calculate the heat of the keyword.
In a specific implementation, if the code of a Spark job executes a shuffle-type operator (such as group by), a stage boundary is placed at that operator. When a stage starts executing, each of its Tasks pulls all the keywords it needs to process from the worker nodes where the Tasks of the previous stage ran. A worker node holds multiple data packets, and each data packet contains keywords.
When the user terminal receives a data packet to be calculated pulled from a worker node, the Metrics function can be used to obtain each keyword in the data packet and calculate its heat, where the heat of a keyword is the total number of occurrences of that keyword in the data packet to be calculated.
Step S106: judge whether the heat is higher than a preset heat threshold.
In a specific implementation, the heat threshold is set in advance. It is a threshold on the total number of occurrences of a keyword: specifically, the critical count above which a keyword in a worker node's data packet to be calculated is numerous enough to cause data skew.
Step S108: if so, break up the data packet to be calculated to generate multiple sub-data packets.
Step S110: perform data calculation on the multiple sub-data packets according to a preset calculation rule.
In a specific implementation, if the heat (total count) of a keyword in a worker node's data packet to be calculated is higher than the preset heat threshold, data skew may occur at any time, so the data packet needs to be handled. An ETL tool can be used to extract the data in the data packet to be calculated and break it up into multiple sub-data packets. After the sub-data packets have been generated, the redistribute function can be used to process each of them.
The data processing method provided by the embodiments of the present invention obtains the keyword in the data packet to be calculated, judges according to the preset heat threshold whether the heat of the keyword is too high, breaks up the data packet to be calculated into multiple sub-data packets, and finally performs data calculation on the sub-data packets according to the preset calculation rule. By judging the keyword heat in the data packet in advance and monitoring the data state in real time, the method can process data in time and avoid data skew in advance.
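Steps S102 to S110 can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: the function names, the round-robin break-up and the threshold values are assumptions, and the calculation rule is assumed to be one that can be re-applied to partial results (for example max, min or sum).

```python
from collections import Counter


def keyword_heat(packet):
    """S102/S104: heat of a keyword = number of records carrying that keyword."""
    return Counter(key for key, _ in packet)


def break_up(packet, num_sub_packets):
    """S108: split the packet round-robin into num_sub_packets sub-packets."""
    return [packet[i::num_sub_packets] for i in range(num_sub_packets)]


def process(packet, heat_threshold, num_sub_packets, rule):
    heat = keyword_heat(packet)
    # S106: compare the hottest keyword against the preset threshold.
    if heat and max(heat.values()) > heat_threshold:
        sub_packets = break_up(packet, num_sub_packets)
        # S110: apply the rule per sub-packet, then once more to the partial results.
        partial = [rule([v for _, v in sub]) for sub in sub_packets if sub]
        return rule(partial)
    # Below the threshold: calculate on the whole packet directly.
    return rule([v for _, v in packet])


packet = [("a", 2), ("a", 8), ("a", 9), ("a", 5)]
hot_result = process(packet, heat_threshold=3, num_sub_packets=2, rule=max)
cold_result = process(packet, heat_threshold=10, num_sub_packets=2, rule=max)
print(hot_result, cold_result)
```

Both paths give the same answer for a re-applicable rule; the difference is that the hot path spreads the work over several smaller units.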
On the basis of the method shown in Fig. 1, an embodiment of the present invention further provides another data processing method, which describes in more detail how the data packet to be calculated is broken up and how the sub-data packets are calculated. As shown in Fig. 2, the method includes the following steps:
Step S202: if a data packet to be calculated is received, obtain the keyword in the data packet to be calculated;
Step S204: calculate the heat of the keyword;
Step S206: judge whether the heat is higher than a preset heat threshold;
If so, execute step S208; if not, execute step S218;
Step S208: obtain a preset break-up ratio, and divide the data packet to be calculated into multiple sub-data packets according to the break-up ratio.
In a specific implementation, when the user terminal device is a computer containing a database processing system, an ETL tool can be used to extract the data in the data packet to be calculated and break it up into multiple sub-data packets. The number of sub-data packets generated is determined by the break-up ratio, which includes the number of sub-data packets and the proportion of each sub-data packet relative to the data packet to be calculated. The break-up ratio can be configured with the redistribute function.
Take a data packet to be calculated that contains keyword a as an example. When the database processing system receives the data packet, the Metrics function can be used to obtain keyword a and calculate its heat. When the heat of keyword a is higher than the preset heat threshold, the ETL tool extracts the data packet and generates multiple sub-data packets according to the break-up ratio set in the redistribute function. Depending on the break-up ratio and the size of the data packet, the data packet may be divided into, for example, 2, 4 or 6 sub-data packets; the exact number should be determined according to the actual situation.
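One possible reading of the break-up ratio is a list of shares, one per sub-data packet, that sum to 1. The sketch below splits a packet on that assumption; the function name `break_up_by_ratio` and the representation of the ratio are hypothetical and are not specified by the source.

```python
def break_up_by_ratio(rows, shares):
    """Split rows into len(shares) sub-packets; each share is that sub-packet's
    fraction of the original packet, and the shares must sum to 1."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    sub_packets, start, cumulative = [], 0, 0.0
    for share in shares:
        cumulative += share
        # Cut at the cumulative boundary so that no row is lost or duplicated.
        end = round(cumulative * len(rows))
        sub_packets.append(rows[start:end])
        start = end
    return sub_packets


rows = [("a", 2), ("a", 8), ("a", 9), ("a", 5)]
halves = break_up_by_ratio(rows, [0.5, 0.5])
quarters = break_up_by_ratio(rows, [0.25, 0.25, 0.25, 0.25])
print(halves)
print(quarters)
```

Using cumulative boundaries rather than rounding each share independently keeps the sub-packets a true partition of the input even when the shares do not divide the row count evenly.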
Step S210: embed a data packet identification code in each sub-data packet; the data packet identification code indicates the data packet to which the sub-data packet belongs.
Take 4 sub-data packets containing the keywords (a, 2), (a, 8), (a, 9) and (a, 5) respectively as an example. Data packet identification codes A and B are embedded in the 4 sub-data packets, after which their keywords are (A, a, 2), (A, a, 8), (B, a, 9) and (B, a, 5) respectively. In a specific implementation, different random codes can be chosen for embedding according to actual needs; the embodiments of the present invention place no limitation on this.
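The embedding in step S210 amounts to prefixing each record with a code. A minimal sketch, reproducing the example values above; the function name `embed_packet_id` and the list-of-tuples representation are assumptions.

```python
def embed_packet_id(sub_packets, id_codes):
    """Prefix every record of each sub-packet with that sub-packet's ID code."""
    if len(sub_packets) != len(id_codes):
        raise ValueError("one ID code is required per sub-packet")
    return [
        [(code, key, value) for key, value in sub]
        for sub, code in zip(sub_packets, id_codes)
    ]


# The four sub-data packets from the example, tagged with codes A, A, B, B.
sub_packets = [[("a", 2)], [("a", 8)], [("a", 9)], [("a", 5)]]
tagged = embed_packet_id(sub_packets, ["A", "A", "B", "B"])
print(tagged)
```

After tagging, the grouping key is effectively (code, keyword) instead of the keyword alone, which is what lets records of one hot keyword be spread over several nodes.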
Step S212: extract the data packet identification code of each sub-data packet;
Step S214: send each sub-data packet to the primary node corresponding to its data packet identification code, so that the primary node performs data calculation on the sub-data packet according to the preset calculation rule.
The above steps divide one data packet to be calculated into multiple sub-data packets, each of which is calculated according to the preset calculation rule. A group by statement in Spark SQL can be used for this: the data packet identification code A is extracted from the sub-data packets containing the keywords (A, a, 2) and (A, a, 8), and the data calculation result is sent to the corresponding primary node, that is, a worker node; the data packet identification code B is extracted from the sub-data packets containing the keywords (B, a, 9) and (B, a, 5), and the data calculation result is sent to another corresponding primary node, that is, another worker node.
In a specific implementation, the data processing method provided by the embodiments of the present invention breaks up the data packet to be calculated into multiple sub-data packets and embeds codes in them, so that data originally processed by a single Task is distributed across multiple Tasks, which solves the problem of a single Task processing too much data. The data packet identification code embedded in each sub-data packet is then extracted, the calculation is performed according to the preset calculation rule, and the calculation results are sent to the corresponding primary nodes.
Step S216: obtain the calculation result of each primary node and send the calculation results to a secondary node for data calculation, until the secondary node is the final node, at which point the calculation result is output, wherein the data calculation rule of the secondary node is consistent with the preset calculation rule.
The primary nodes store the calculation results of the sub-data packets. These results need to be pulled to the same secondary node, where a Task continues the data calculation according to the preset calculation rule. When the secondary node is the final node, the data calculation ends and the calculation result is output.
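Steps S212 to S216 describe a two-stage aggregation. The plain-Python sketch below models each primary node as one (identification code, keyword) group and the secondary node as a second grouping by keyword alone. The function names are hypothetical, and the rule is assumed to be re-applicable to partial results, as the maximum-value rule is.

```python
from collections import defaultdict


def primary_stage(tagged_records, rule):
    """Group by (ID code, keyword) and apply the rule to each group,
    one group standing in for one primary node."""
    groups = defaultdict(list)
    for code, key, value in tagged_records:
        groups[(code, key)].append(value)
    return {group: rule(values) for group, values in groups.items()}


def secondary_stage(primary_results, rule):
    """Drop the ID code, regroup by keyword and apply the same rule again."""
    groups = defaultdict(list)
    for (_code, key), partial in primary_results.items():
        groups[key].append(partial)
    return {key: rule(values) for key, values in groups.items()}


tagged_records = [("A", "a", 2), ("A", "a", 8), ("B", "a", 9), ("B", "a", 5)]
primary = primary_stage(tagged_records, max)
final = secondary_stage(primary, max)
print(primary)
print(final)
```

The secondary stage only sees one partial result per primary node, so its input is small regardless of how hot the keyword was.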
Step S218: perform data calculation on the data packet to be calculated according to the preset calculation rule.
That is, when the heat is judged to be lower than the preset heat threshold, data calculation is performed on the data packet to be calculated according to the preset calculation rule. In this case there is usually no risk of data skew, so the breaking up and identification code embedding described above are unnecessary, and the data packet is calculated directly according to the preset calculation rule.
In the data processing method provided by the embodiments of the present invention, the data packet is usually generated from source data. In general, the source data may be raw data, which the user can process according to actual needs. The user can therefore input the raw data into a computer or server; when the computer or server receives the source data, it can generate the data packet to be calculated according to preset table items, wherein the data packet to be calculated includes the table items and the entries corresponding to the table items, and the entries contain keywords.
In a specific implementation, an ETL tool can be used to extract the source data from the database system and generate the data packet to be calculated according to the preset table items. The form given in Table 1 is used as an example.
Table 1: Data packet T to be calculated
f1    f2
a     2
a     8
a     9
a     5
b     3
b     6
Table 1 is the data packet to be calculated that the ETL tool extracted from the source data; its name is T and its table items are f1 and f2. A table item is the basis on which the source data is classified (in effect, a column heading): f1 indicates that the column holds keywords, and f2 indicates that the column holds the entry corresponding to each keyword. Table item f1 in Table 1 contains six entries, the keywords a, a, a, a, b and b. Table item f2 contains the six corresponding entries: 2, 8, 9 and 5 for keyword a, and 3 and 6 for keyword b.
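As a rough illustration of generating a data packet from source data according to preset table items, the sketch below keeps only the columns f1 and f2 of each source row. The function name `build_packet` and the extra `ts` column in the source rows are hypothetical; the f1 and f2 values are those of Table 1.

```python
def build_packet(source_rows, table_items):
    """Keep only the preset table items (columns) of each source row."""
    return [{item: row[item] for item in table_items} for row in source_rows]


# Hypothetical source data; "ts" is an invented column that is not a table item.
source_rows = [
    {"f1": "a", "f2": 2, "ts": 1},
    {"f1": "a", "f2": 8, "ts": 2},
    {"f1": "a", "f2": 9, "ts": 3},
    {"f1": "a", "f2": 5, "ts": 4},
    {"f1": "b", "f2": 3, "ts": 5},
    {"f1": "b", "f2": 6, "ts": 6},
]
packet_t = build_packet(source_rows, ["f1", "f2"])
print(packet_t)
```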
Based on the above data packet to be calculated, the step of calculating the heat of the keyword may include: counting the number of keywords contained in each table item of the data packet to be calculated, and determining that number as the heat of the keyword.
In a specific implementation, after the computer or server receives the data packet to be calculated, the Metrics module can obtain keywords a and b in data packet T, count the total number of occurrences of each, and then carry out the subsequent heat judgment.
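The heat calculation for data packet T reduces to counting keyword occurrences in column f1. A minimal sketch using the standard library; the threshold value of 3 is an assumption chosen so that keyword a counts as hot and keyword b does not.

```python
from collections import Counter

# Data packet T from Table 1 as (f1, f2) rows.
packet_t = [("a", 2), ("a", 8), ("a", 9), ("a", 5), ("b", 3), ("b", 6)]

# Heat of a keyword = number of rows whose f1 column holds that keyword.
heat = Counter(f1 for f1, _f2 in packet_t)

HEAT_THRESHOLD = 3  # hypothetical preset threshold
hot_keywords = sorted(k for k, n in heat.items() if n > HEAT_THRESHOLD)
print(dict(heat), hot_keywords)
```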
Further, so that the user can view and analyze keyword heat during the current data processing, the method further includes: extracting the keywords and their heat, generating a keyword heat table, and displaying the keyword heat table for the user to view.
Taking a data packet to be calculated in the form of Table 1 as an example, the Metrics module extracts keywords a and b in Table 1 together with their heat values, 4 and 2, and generates and displays the keyword heat table, which may take the form shown in Table 2.
Table 2 Keyword heat table
Keyword    Heat
a          4
b          2
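As a rough illustration of generating a table such as Table 2 for display, the sketch below renders a keyword-to-heat mapping as plain text. The function name `format_heat_table` and the hottest-first ordering are assumptions made for the example; the source does not specify how the table is rendered.

```python
def format_heat_table(heat):
    """Render {keyword: heat} as a plain-text table, hottest keyword first."""
    # Sort by descending heat, then alphabetically so the output is deterministic.
    rows = sorted(heat.items(), key=lambda item: (-item[1], item[0]))
    lines = ["Keyword  Heat"]
    lines += [f"{keyword:<7}  {count}" for keyword, count in rows]
    return "\n".join(lines)

table_2 = format_heat_table({"a": 4, "b": 2})
print(table_2)
```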
In a specific implementation, the whole flow of the data processing method provided by the embodiment of the present invention is described below, taking big data processing with a Spark job as an example:
(1) When the database system has source data to be processed, the source data is extracted with an ETL tool and a data packet to be calculated is generated according to the pre-set list items. The packet may take the form of Table 1, which contains the data to be calculated for keyword a and for keyword b;
(2) The Metrics module extracts keyword a with its heat 4 and keyword b with its heat 2 from Table 1 and generates the keyword heat table, which may take the form of Table 2;
(3) The break-up ratio and the heat threshold are pre-set through the redistribute function. When the heat of keyword a is higher than the heat threshold, there is a tendency toward data skew, so the ETL tool breaks up the data packet to be calculated and generates four sub-packets containing (a, 2), (a, 8), (a, 9) and (a, 5). Meanwhile, assuming the heat of keyword b is lower than the heat threshold, the data to be calculated for keyword b is processed directly according to the pre-set computation rule until the calculation result is output. The computation rule may be, for example, a maximum-value statistic, a minimum-value statistic or a total-count statistic. Taking the maximum-value statistic as an example, if the heat of keyword b is lower than the heat threshold, the item (b, 6) in Table 1 is selected directly.
(4) If the heat of keyword a is higher than the heat threshold, then, taking a pre-set break-up ratio of 0.5 as an example, a GROUP BY statement in Spark SQL implants data packet identification codes into the sub-packets containing (a, 2), (a, 8), (a, 9) and (a, 5) according to the break-up ratio. Taking identification codes A and B as an example, the sub-packets after implantation contain (A, a, 2), (A, a, 8), (B, a, 9) and (B, a, 5);
(5) The identification codes in the sub-packets containing (A, a, 2), (A, a, 8) and (B, a, 9), (B, a, 5) are extracted. The sub-packets that carried code A before extraction are sent to one primary node, and those that carried code B are sent to another primary node, so that the two groups of sub-packets are each calculated at their primary node according to the pre-set computation rule and the results are stored. For example, (A, a, 2) and (A, a, 8) are sent to one primary node, which calculates the maximum of the entries for a and obtains (A, a, 8); (B, a, 9) and (B, a, 5) are sent to another primary node, which obtains (B, a, 9).
(6) A secondary node pulls the results stored at the two primary nodes and continues the calculation according to the pre-set computation rule. When the secondary node is the terminal node, the whole calculation is complete and the result is output. For example, the results (A, a, 8) and (B, a, 9) are pulled from the primary nodes, the maximum-value statistic is applied again to obtain (B, a, 9), and the maximum-value result (a, 9) for keyword a is output.
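Steps (2) to (6) can be traced end to end with a small single-process Python sketch. It is not Spark code and not the patented implementation: the primary and secondary nodes are simulated by ordinary function calls, the threshold value 3 is an assumption (the source gives no number), and the name `process` is hypothetical. The break-up ratio of 0.5 is modelled as splitting the hot keyword's rows into two halves tagged A and B.

```python
from collections import Counter, defaultdict

TABLE_1 = [("a", 2), ("a", 8), ("a", 9), ("a", 5), ("b", 3), ("b", 6)]
HEAT_THRESHOLD = 3    # assumed value; the source does not state one
CODES = ["A", "B"]    # identification codes from the example (ratio 0.5, two halves)

def process(packet, rule=max):
    """Apply `rule` per keyword, breaking up keywords whose heat exceeds the threshold."""
    heat = Counter(keyword for keyword, _ in packet)
    by_keyword = defaultdict(list)
    for keyword, value in packet:
        by_keyword[keyword].append(value)

    result = {}
    for keyword, values in by_keyword.items():
        if heat[keyword] <= HEAT_THRESHOLD:
            # Not hot: compute directly (the equal-to-threshold case is
            # unspecified in the source and is treated as not hot here).
            result[keyword] = rule(values)
            continue
        # Hot keyword: break up into two tagged sub-packets.
        half = len(values) // 2
        sub_packets = {CODES[0]: values[:half], CODES[1]: values[half:]}
        # "Primary nodes": one partial result per identification code.
        partials = [rule(vals) for vals in sub_packets.values()]
        # "Secondary node": same rule applied to the partial results.
        result[keyword] = rule(partials)
    return result

result = process(TABLE_1)
print(result)
```

With the maximum-value rule, keyword a is split into [2, 8] and [9, 5], giving partial results 8 and 9 and a final value of 9, while keyword b is computed directly and gives 6.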
The data processing method provided by the embodiment of the present invention obtains the keyword in the data packet to be calculated, judges against the pre-set heat threshold whether the heat of the keyword is too high, and if so breaks up the data packet to be calculated into multiple sub-packets, which are then calculated according to the pre-set computation rule. By judging keyword heat in the data packet in advance, the data state is monitored in real time, data can be processed in time, and data skew is avoided before it occurs.
Corresponding to the data processing method provided by the above embodiment, the embodiment of the present invention also provides a data processing apparatus. The apparatus is arranged in a terminal device, which may be a computer, a server or the like that includes a database system. As shown in the structural schematic diagram of a data processing apparatus in Fig. 3, the apparatus includes the following structure:
Keyword acquisition module 30, configured to acquire the keyword in the data packet to be calculated if the data packet to be calculated is received;
Heat calculation module 32, configured to calculate the heat of the keyword;
Judgment module 34, configured to judge whether the heat is higher than the pre-set heat threshold;
Data break-up module 36, configured to break up the data packet to be calculated and generate multiple sub-packets if the judgment result is yes;
First calculation module 38, configured to perform data calculation on the multiple sub-packets according to the pre-set computation rule.
The data break-up module is also configured to: obtain the pre-set break-up ratio and divide the data to be calculated into multiple sub-packets according to the break-up ratio, wherein the break-up ratio includes the number of sub-packets and the share of each sub-packet relative to the data packet to be calculated; and implant a data packet identification code in each sub-packet, the identification code being used to indicate the data packet to which the sub-packet belongs.
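One possible reading of the break-up ratio is a list of shares, one per sub-packet, so that the length of the list gives the number of sub-packets and each value gives that sub-packet's share. The sketch below follows that reading; the function name `scatter`, the list-of-shares representation and the contiguous slicing are assumptions for illustration, not details taken from the source.

```python
def scatter(rows, shares, codes):
    """Split rows into len(shares) sub-packets and tag each row with its sub-packet code."""
    if len(shares) != len(codes):
        raise ValueError("one identification code is needed per sub-packet")
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must add up to 1")
    tagged, start, cumulative = [], 0, 0.0
    for i, (code, share) in enumerate(zip(codes, shares)):
        cumulative += share
        # The last sub-packet always takes the remaining rows, so none are lost to rounding.
        end = len(rows) if i == len(shares) - 1 else round(cumulative * len(rows))
        tagged += [(code, *row) for row in rows[start:end]]
        start = end
    return tagged

hot_rows = [("a", 2), ("a", 8), ("a", 9), ("a", 5)]
tagged = scatter(hot_rows, shares=[0.5, 0.5], codes=["A", "B"])
print(tagged)
```

With shares of 0.5 and 0.5 and codes A and B, the four rows for keyword a become (A, a, 2), (A, a, 8), (B, a, 9), (B, a, 5), as in the worked example of the method embodiment.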
The first calculation module is also configured to: extract the data packet identification code of each sub-packet; send each sub-packet to the primary node corresponding to its identification code, so that the primary node performs data calculation on the sub-packet according to the pre-set computation rule; and obtain the calculation result of each primary node and send it to a secondary node for further data calculation, outputting the calculation result when the secondary node is the terminal node, wherein the data computation rule of the secondary node is consistent with the pre-set computation rule.
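The "continue until the secondary node is the terminal node" behaviour can be pictured as a level-by-level merge of partial results. The following sketch simulates it in one process; the function name `reduce_to_terminal` and the `fan_in` parameter (how many results each node pulls) are hypothetical and not part of the source.

```python
def reduce_to_terminal(partials, rule=max, fan_in=2):
    """Merge partial results level by level until a single terminal result remains."""
    if not partials:
        raise ValueError("no partial results to merge")
    if fan_in < 2:
        raise ValueError("each node must pull at least two results")
    level = list(partials)
    while len(level) > 1:
        # Each node at the next level pulls up to `fan_in` results and applies the same rule.
        level = [rule(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]

primary_results = [8, 9]   # results for (A, a, 8) and (B, a, 9) from the example
final = reduce_to_terminal(primary_results)
print(final)
```

Reusing the same rule at every level gives the correct answer for maximum-value and minimum-value statistics. For a total-count statistic the merge step would presumably have to add the partial counts rather than count them again, a detail the source does not spell out.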
On the basis of the data processing apparatus shown in Fig. 3, the embodiment of the present invention further provides another data processing apparatus. As shown in the structural schematic diagram of Fig. 4, in addition to the structure shown in Fig. 3, the apparatus further includes:
Generation module 40, configured to generate the data packet to be calculated according to the pre-set list items when source data is received; wherein the data packet to be calculated includes the pre-set list items and the entries corresponding to the pre-set list items, and the entries include the keyword.
Further, the heat calculation module is also configured to count, in the data packet to be calculated, the number of keywords contained in each list item, and to determine that number as the heat of the keyword.
The data processing apparatus shown in Fig. 4 further includes: second calculation module 42, configured to perform data calculation on the data packets to be calculated according to the pre-set computation rule when the heat is judged to be lower than the pre-set heat threshold.
Further, as shown in Fig. 5, another data processing apparatus further includes:
Extraction module 44, configured to extract the keyword and the heat of the keyword and to generate the keyword heat table;
Display module 46, configured to display the keyword heat table.
The data processing apparatus provided by the embodiment of the present invention has the same technical features as the data processing method provided by the above embodiment, so it solves the same technical problem and achieves the same technical effect. For brevity, where the apparatus embodiment does not mention a point, reference may be made to the corresponding content of the foregoing method embodiment.
The embodiment of the present invention also provides a data processing system. The system includes a memory and a processor; the memory is used to store a program that supports the processor in executing any of the above methods, and the processor is configured to execute the program stored in the memory.
Further, the embodiment of the present invention also provides a computer storage medium for storing computer program instructions. When a computer executes the computer program instructions, the data processing method described in the above embodiment is performed.
Referring to Fig. 6, the embodiment of the present invention also provides a structural schematic diagram of a data processing system, comprising: a processor 600, a memory 601, a bus 602 and a communication interface 603, where the processor 600, the communication interface 603 and the memory 601 are connected through the bus 602. The processor 600 is used to execute executable modules stored in the memory 601, such as computer programs.
The memory 601 may include high-speed random access memory (RAM, Random Access Memory) and may also include non-volatile memory, for example at least one magnetic disk storage. The communication connection between this system network element and at least one other network element is realized through at least one communication interface 603 (which may be wired or wireless), and may use the Internet, a wide area network, a local network, a metropolitan area network and so on. The bus 602 may be an ISA bus, a PCI bus, an EISA bus or the like. The bus may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, only one double-headed arrow is shown in Fig. 6, but this does not mean that there is only one bus or one type of bus.
The memory 601 is used to store a program, and the processor 600 executes the program after receiving an execution instruction. The method performed by the data processing apparatus disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 600 or implemented by the processor 600. The processor 600 may be an integrated circuit chip with signal processing capability. In implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 600 or by instructions in the form of software. The processor 600 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short) and the like; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor.
The steps of the method disclosed in connection with the embodiments of the present invention may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in this field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory or a register. The storage medium is located in the memory 601; the processor 600 reads the information in the memory 601 and completes the steps of the above method in combination with its hardware.
The computer program product of the data processing method, apparatus and system provided by the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the method described in the foregoing method embodiment; for the specific implementation, refer to the method embodiment, which is not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiment and are not repeated here.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk.
Finally, it should be noted that the embodiments described above are only specific implementations of the present invention, intended to illustrate its technical solution rather than to limit it, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field can, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (16)

1. A data processing method, characterized by comprising:
if a data packet to be calculated is received, obtaining the keyword in the data packet to be calculated;
calculating the heat of the keyword;
judging whether the heat is higher than a pre-set heat threshold;
if so, breaking up the data packet to be calculated to generate multiple sub-packets;
performing data calculation on the multiple sub-packets according to a pre-set computation rule.
2. The method according to claim 1, characterized in that the step of breaking up the data packet to be calculated to generate multiple sub-packets comprises:
obtaining a pre-set break-up ratio and dividing the data to be calculated into multiple sub-packets according to the break-up ratio, wherein the break-up ratio includes the number of sub-packets and the share of each sub-packet relative to the data packet to be calculated;
implanting a data packet identification code in each sub-packet, the data packet identification code being used to indicate the data packet to which the sub-packet belongs.
3. The method according to claim 2, characterized in that the step of performing data calculation on the multiple sub-packets according to a pre-set computation rule comprises:
extracting the data packet identification code of each sub-packet;
sending each sub-packet to the primary node corresponding to its data packet identification code, so that the primary node performs data calculation on the sub-packet according to the pre-set computation rule;
obtaining the calculation result of each primary node and sending the calculation result to a secondary node for data calculation, and outputting the calculation result when the secondary node is the terminal node, wherein the data computation rule of the secondary node is consistent with the pre-set computation rule.
4. The method according to claim 1, characterized in that the method further comprises:
when source data is received, generating the data packet to be calculated according to pre-set list items;
wherein the data packet to be calculated includes the pre-set list items and the entries corresponding to the pre-set list items, and the entries include the keyword.
5. The method according to claim 4, characterized in that the step of calculating the heat of the keyword comprises:
counting, in the data packet to be calculated, the number of the keywords contained in each list item;
determining the number as the heat of the keyword.
6. The method according to claim 1, characterized in that the method further comprises:
when the heat is judged to be lower than the pre-set heat threshold, performing data calculation on the data packet to be calculated according to the pre-set computation rule.
7. The method according to claim 1, characterized in that the method further comprises:
extracting the keyword and the heat of the keyword, and generating a keyword heat table;
displaying the keyword heat table.
8. A data processing apparatus, characterized by comprising:
a keyword acquisition module, configured to obtain the keyword in a data packet to be calculated if the data packet to be calculated is received;
a heat calculation module, configured to calculate the heat of the keyword;
a judgment module, configured to judge whether the heat is higher than a pre-set heat threshold;
a data break-up module, configured to break up the data packet to be calculated and generate multiple sub-packets if the judgment result is yes;
a first calculation module, configured to perform data calculation on the multiple sub-packets according to a pre-set computation rule.
9. The apparatus according to claim 8, characterized in that the data break-up module is also configured to:
obtain a pre-set break-up ratio and divide the data to be calculated into multiple sub-packets according to the break-up ratio, wherein the break-up ratio includes the number of sub-packets and the share of each sub-packet relative to the data packet to be calculated;
implant a data packet identification code in each sub-packet, the data packet identification code being used to indicate the data packet to which the sub-packet belongs.
10. The apparatus according to claim 9, characterized in that the first calculation module is also configured to:
extract the data packet identification code of each sub-packet;
send each sub-packet to the primary node corresponding to its data packet identification code, so that the primary node performs data calculation on the sub-packet according to the pre-set computation rule;
obtain the calculation result of each primary node and send the calculation result to a secondary node for data calculation, and output the calculation result when the secondary node is the terminal node, wherein the data computation rule of the secondary node is consistent with the pre-set computation rule.
11. The apparatus according to claim 8, characterized in that the apparatus further comprises:
a generation module, configured to generate the data packet to be calculated according to pre-set list items when source data is received;
wherein the data packet to be calculated includes the pre-set list items and the entries corresponding to the pre-set list items, and the entries include the keyword.
12. The apparatus according to claim 11, characterized in that the heat calculation module is also configured to:
count, in the data packet to be calculated, the number of the keywords contained in each list item;
determine the number as the heat of the keyword.
13. The apparatus according to claim 10, characterized in that the apparatus further comprises:
a second calculation module, configured to perform data calculation on the data packet to be calculated according to the pre-set computation rule when the heat is judged to be lower than the pre-set heat threshold.
14. The apparatus according to claim 8, characterized in that the apparatus further comprises:
an extraction module, configured to extract the keyword and the heat of the keyword and to generate a keyword heat table;
a display module, configured to display the keyword heat table.
15. A data processing system, characterized in that the system comprises a memory and a processor, the memory being used to store a program that supports the processor in executing the method of any one of claims 1 to 7, and the processor being configured to execute the program stored in the memory.
16. A computer storage medium, characterized in that it is used to store computer program instructions, and when a computer executes the computer program instructions, the method according to any one of claims 1 to 7 is performed.
CN201811654198.XA 2018-12-30 2018-12-30 Data processing method, device and system Pending CN109684401A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811654198.XA CN109684401A (en) 2018-12-30 2018-12-30 Data processing method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811654198.XA CN109684401A (en) 2018-12-30 2018-12-30 Data processing method, device and system

Publications (1)

Publication Number Publication Date
CN109684401A true CN109684401A (en) 2019-04-26

Family

ID=66190387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811654198.XA Pending CN109684401A (en) 2018-12-30 2018-12-30 Data processing method, device and system

Country Status (1)

Country Link
CN (1) CN109684401A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112631860A (en) * 2020-12-21 2021-04-09 常州微亿智造科技有限公司 Industrial Internet of things data transmission Worker service monitoring method and device
CN117009094A (en) * 2023-10-07 2023-11-07 联通在线信息科技有限公司 Data oblique scattering method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130159364A1 (en) * 2011-12-20 2013-06-20 UT-Battelle, LLC Oak Ridge National Laboratory Parallel log structured file system collective buffering to achieve a compact representation of scientific and/or dimensional data
CN105095413A (en) * 2015-07-09 2015-11-25 北京京东尚科信息技术有限公司 Method and apparatus for solving data skew
CN106293938A (en) * 2016-08-05 2017-01-04 飞思达技术(北京)有限公司 Solve the method for data skew in big data calculation process
CN107220123A (en) * 2017-05-25 2017-09-29 郑州云海信息技术有限公司 One kind solves Spark data skew method and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112631860A (en) * 2020-12-21 2021-04-09 常州微亿智造科技有限公司 Industrial Internet of things data transmission Worker service monitoring method and device
CN117009094A (en) * 2023-10-07 2023-11-07 联通在线信息科技有限公司 Data oblique scattering method and device, electronic equipment and storage medium
CN117009094B (en) * 2023-10-07 2024-02-23 联通在线信息科技有限公司 Data oblique scattering method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20190426