A Metric Calculation Method for a Big Data Environment
Technical Field
The invention belongs to the technical field of big data statistics and, specifically, relates to a metric calculation method for a big data environment.
Background
With the rapid development of the Internet, the Internet of Things, wireless sensors, and cloud computing, the global volume of data has grown explosively, and human society has entered the era of big data. The characteristics of big data include:
1. Huge data volume, rising from the TB level to the PB level;
2. Diverse data types, including network logs, video, pictures, geographic location information, and so on;
3. Low value density: the massive stored data contains enormous value, but the data that is actually meaningful is only a very small part of it.
To address these characteristics of big data, many large companies, such as Alibaba Cloud and Amazon's cloud, provide one-stop solutions that span storage through computation. However, Alibaba Cloud does not provide a specific metric calculation scheme; it merely provides an open service environment, which essentially amounts to selling servers. Amazon provides a data warehouse for data statistics, but the service is expensive: the user must know the data warehouse interface well, developing the service code is cumbersome, and maintenance is difficult.
Summary of the Invention
To address the above deficiencies in the prior art, the present invention proposes a metric calculation method for a big data environment. It separates the metric calculation scheduling code from the business calculation SQL, and stores all of the SQL for reading, preprocessing, calculating, and storing results in a database, so that new requirements can essentially be met with zero code.
To achieve the above object, the solution adopted by the present invention is a metric calculation method for a big data environment, comprising the following steps:
S1: Initialization. Obtain the job scheduling information and the metric calculation SQL information from the database, and initialize the parameters used in the SQL;
S2: According to the initialization information, load the event data of the game from the cloud storage server into memory, and create the corresponding event views for use in subsequent steps;
S3: If the event data contains dirty or erroneous data, use SparkSQL to clean the event data loaded in step S2; if it does not, jump directly to step S4;
S4: Calculate the intermediate tables using SparkSQL, and create the intermediate table views;
S5: Using the event views and the intermediate table views, calculate the game metrics and, once the calculation is complete, insert the results into the database for display;
S6: Depending on the calculation time and the system requirements, either end the run, or update the calculation time and return to step S1 to repeat the run.
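The control flow of steps S1 to S6 can be sketched as a small driver loop. The Python sketch below is illustrative only: the step functions are passed in as plain callables, the names `Steps`, `run`, and `has_dirty_data` are assumptions rather than part of the method as described, and a one-day calculation period is assumed for the time update in S6.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List


@dataclass
class Steps:
    """Hypothetical bundle of callables, one per step S1 to S5."""
    initialize: Callable[[date], dict]                # S1
    load_events: Callable[[dict, date], None]         # S2
    has_dirty_data: Callable[[dict], bool]            # guard for S3
    clean_events: Callable[[dict], None]              # S3
    compute_mid_tables: Callable[[dict, date], None]  # S4
    compute_metrics: Callable[[dict, date], None]     # S5


def run(steps: Steps, start: date, end: date) -> List[date]:
    """Run S1 to S5 for each day in [start, end]; S6 decides whether to loop."""
    processed = []
    st = start
    while st <= end:                          # S6: stop after the last period
        config = steps.initialize(st)         # S1
        steps.load_events(config, st)         # S2
        if steps.has_dirty_data(config):      # S3 is skipped for clean data
            steps.clean_events(config)
        steps.compute_mid_tables(config, st)  # S4
        steps.compute_metrics(config, st)     # S5
        processed.append(st)
        st += timedelta(days=1)               # S6: update the calculation time (assumed daily)
    return processed
```

Passing the steps in as callables keeps the scheduling loop free of business logic, which mirrors the separation of scheduling code from calculation SQL described above.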
Further, step S1 includes the following steps:
S11: According to the database table data, construct the event load information EventLoadMap, which contains the file name of each event on the cloud storage server and the name of the view created after the event is loaded into memory;
S12: Check the corresponding event count E. If E < 0, the game has no event data to load, so the entire calculation process for the game ends; if E ≥ 0, go to step S13;
S13: According to the database data, construct the event data cleaning information CleanDirtyDataMap, which contains the names of the views that need cleaning and the SparkSQL that cleans the data;
S14: Construct the intermediate table calculation information MidDataMap, which contains the intermediate table view names and the SparkSQL that calculates each intermediate table;
S15: Construct the metric calculation information MetricComputeMap, which contains the name of the database table into which each metric is inserted once calculated and the SparkSQL that calculates the metric.
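The four structures built in S11 to S15 can be pictured as plain dictionaries filled from configuration rows. The sketch below is a minimal illustration under stated assumptions: the `(kind, key, value)` row layout, the `JobConfig` container, and `build_config` are hypothetical, and an empty event map is treated as the "no event data to load" case of S12, which is an interpretation of the condition in the text.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class JobConfig(NamedTuple):
    event_load_map: Dict[str, str]              # S11: event name -> view name
    clean_dirty_data_map: Dict[str, str]        # S13: view name -> cleaning SQL
    mid_data_map: Dict[str, str]                # S14: intermediate view name -> SQL
    metric_compute_map: Dict[str, List[str]]    # S15: result table -> metric SQLs


# Hypothetical layout of one configuration row read from the database.
Row = Tuple[str, str, str]  # (kind, key, value)


def build_config(rows: Iterable[Row]) -> Optional[JobConfig]:
    events: Dict[str, str] = {}
    cleaning: Dict[str, str] = {}
    mid: Dict[str, str] = {}
    metrics: Dict[str, List[str]] = {}
    for kind, key, value in rows:
        if kind == "event":
            events[key] = value
        elif kind == "clean":
            cleaning[key] = value
        elif kind == "mid":
            mid[key] = value
        elif kind == "metric":
            # One result table may hold several metrics, each with its own SQL.
            metrics.setdefault(key, []).append(value)
    if not events:
        # S12 (assumed reading): nothing to load, so the run for this game ends.
        return None
    return JobConfig(events, cleaning, mid, metrics)
```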
Further, step S2 includes the following steps:
S21: Obtain the calculation start time ST;
S22: Traverse the event load information EventLoadMap constructed in step S11 to obtain the event name EventName and the name ViewName of the view to be created;
S23: From the obtained event name EventName, construct the event file path OssSource on the cloud storage server, and read the corresponding event data using Spark;
S24: If the event file corresponding to the view name exists, create a view named ViewName; if the file does not exist, go to step S25;
S25: If the traversal of the event load information EventLoadMap is complete, go to step S31; otherwise return to step S22 and continue loading event data.
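The loading loop of S21 to S25 can be sketched without Spark by letting a dictionary stand in for the cloud storage server. In the sketch below the path layout produced by `oss_source`, the bucket name, and the in-memory `storage` dictionary are all assumptions made for illustration; only the loop structure follows the steps above.

```python
from datetime import date
from typing import Dict, List


def oss_source(base: str, event_name: str, st: date) -> str:
    """S23: build the event file path. The layout is an assumption."""
    return f"{base}/{event_name}/{st:%Y-%m-%d}.json"


def load_events(event_load_map: Dict[str, str], st: date,
                storage: Dict[str, List[dict]],
                base: str = "oss://bucket/events") -> Dict[str, List[dict]]:
    """Return a view-name -> rows mapping for every event file that exists."""
    views: Dict[str, List[dict]] = {}
    for event_name, view_name in event_load_map.items():  # S22
        path = oss_source(base, event_name, st)            # S23
        if path in storage:                                # S24: file exists
            views[view_name] = storage[path]
        # S25: a missing file is skipped and the loop moves on
    return views
```

In an actual Spark job, the dictionary lookup would correspond to reading the file into a DataFrame and registering it as a temporary view under `ViewName`.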
Further, the data cleaning of step S3 includes the following steps:
S31: Traverse the event data cleaning information CleanDirtyDataMap constructed in step S13 to obtain the event view name ViewName and the DeleteSQL to be executed;
S32: Using the ViewName obtained in step S31, initialize the parameters in the DeleteSQL;
S33: If the event view targeted by the cleaning SQL exists, execute the SQL; otherwise, go to step S34;
S34: If the traversal of the event data cleaning information CleanDirtyDataMap is complete, go to step S41; otherwise return to step S31 and continue cleaning event data.
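The cleaning loop of S31 to S34 can be illustrated as follows. This is a sketch under clear assumptions: Python's built-in sqlite3 stands in for SparkSQL, ordinary tables stand in for event views, and the `$view` placeholder syntax (via `string.Template`) is a hypothetical choice for the parameter initialization of S32.

```python
import sqlite3
from string import Template
from typing import Dict


def view_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Check whether a table or view with this name is registered."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type IN ('table', 'view') AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def clean_events(conn: sqlite3.Connection,
                 clean_dirty_data_map: Dict[str, str]) -> int:
    """Run each cleaning SQL whose target exists; return the rows removed."""
    removed = 0
    for view_name, delete_sql in clean_dirty_data_map.items():  # S31
        sql = Template(delete_sql).substitute(view=view_name)    # S32
        if view_exists(conn, view_name):                         # S33
            removed += conn.execute(sql).rowcount
        # S34: a missing view is skipped and the loop moves on
    return removed
```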
Further, the intermediate table calculation of step S4 includes the following steps:
S41: Traverse the intermediate table calculation information MidDataMap constructed in step S14 to obtain the intermediate table view name MidDataViewName;
S42: According to the intermediate table view name, copy the intermediate table data from the cloud storage server into memory;
S43: If the intermediate table data exists, create the view; if it does not, execute the intermediate table InitSql to construct the structure of the intermediate table view;
S44: Execute DeleteSql to delete the intermediate table data for the ST period;
S45: Execute ComputeSql to calculate the intermediate table data, and re-create the intermediate table view;
S46: Using Spark, upload the data to OSS;
S47: If the traversal of the intermediate table calculation information MidDataMap is complete, go to step S51; otherwise return to step S41 and continue the intermediate table calculation.
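Deleting the ST period before recomputing it (S44, then S45) makes the step safe to re-run for the same period. The sketch below illustrates that property with sqlite3 as a stand-in for SparkSQL; the table names `mid_active` and `v_login`, the three SQL strings, and the `:st` parameter are hypothetical examples, not SQL taken from the method.

```python
import sqlite3

# Hypothetical InitSql, DeleteSql and ComputeSql for one intermediate table.
INIT_SQL = "CREATE TABLE IF NOT EXISTS mid_active (day TEXT, uid INTEGER)"
DELETE_SQL = "DELETE FROM mid_active WHERE day = :st"
COMPUTE_SQL = (
    "INSERT INTO mid_active "
    "SELECT DISTINCT day, uid FROM v_login WHERE day = :st"
)


def refresh_mid_table(conn: sqlite3.Connection, st: str) -> None:
    """Rebuild the rows of period `st`; repeated calls give the same result."""
    conn.execute(INIT_SQL)                 # S43: create the structure if missing
    conn.execute(DELETE_SQL, {"st": st})   # S44: drop any earlier result for ST
    conn.execute(COMPUTE_SQL, {"st": st})  # S45: recompute the ST period
```

Because the old rows for ST are removed first, running the job twice for the same period does not duplicate data, and rows for other periods are left untouched.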
Further, the game metric calculation of step S5 includes the following steps:
S51: Traverse the metric calculation information MetricComputeMap constructed in step S15 to obtain the database table name for the metrics and the set C of calculation SQL for the metrics contained in that table;
S52: Traverse the metric calculation SQL set C, initialize the parameters of each SQL, and then perform the metric calculation;
S53: If set C has been fully traversed, merge the calculated metrics; otherwise return to step S52;
S54: If historical metric data needs to be updated, go to the next step; otherwise skip to step S57;
S55: Read the historical metric data and restore the metric dimensions of that data;
S56: Merge the new and old data, update the metric dimension information in the database, and map the dimension information of the merged metric data, replacing it with the ids of the dimension information;
S57: According to the delete SQL, delete the corresponding historical data, and insert the new metric data;
S58: If the traversal of the metric calculation information MetricComputeMap is not complete, jump to step S51 and continue the metric calculation; otherwise end the metric calculation for the ST period and jump to step S6.
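The dimension mapping in S56 replaces repeated dimension values with a compact id. A minimal sketch follows; the function name `map_dimensions`, the `dim_id` field, and the rule of assigning the next integer to an unseen combination (which assumes existing ids run from 1 to n without gaps) are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def map_dimensions(rows: List[dict], dim_keys: Tuple[str, ...],
                   dim_ids: Dict[tuple, int]) -> List[dict]:
    """Replace the dimension fields of each metric row with a dimension id.

    `dim_ids` plays the role of the dimension information kept in the
    database; it is updated in place when a new combination appears.
    """
    out = []
    for row in rows:
        dims = tuple(row[k] for k in dim_keys)
        # Reuse the known id, or register the next free one (assumed scheme).
        dim_id = dim_ids.setdefault(dims, len(dim_ids) + 1)
        rest = {k: v for k, v in row.items() if k not in dim_keys}
        out.append({"dim_id": dim_id, **rest})
    return out
```

For example, two rows that share the platform and region values end up carrying the same `dim_id`, so the dimension values are stored once rather than on every metric row.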
The beneficial effects of the invention are as follows. Using SparkSQL, the invention writes the metric calculation logic entirely in SQL and stores the SQL in the database. Adding a new metric requires only a few SQL statements, so new requirements are met with essentially zero code. After the metrics are calculated, common fields are extracted, which compresses the metric data and also speeds up metric queries. The method can be widely applied in systems that provide simple data statistics functions.
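The "zero code" claim can be illustrated with a toy engine that executes whatever SQL it finds in a configuration table. The sketch below uses sqlite3 in place of both the configuration database and SparkSQL; the `metric_sql` table, its columns, and `compute_all` are hypothetical names, and each stored statement is assumed to return a single value.

```python
import sqlite3
from typing import Dict


def compute_all(conn: sqlite3.Connection) -> Dict[str, int]:
    """Run every metric SQL stored in the (hypothetical) metric_sql table."""
    results = {}
    stored = conn.execute(
        "SELECT name, sql_text FROM metric_sql ORDER BY name"
    ).fetchall()
    for name, sql_text in stored:
        # Each stored statement is assumed to yield one row with one column.
        results[name] = conn.execute(sql_text).fetchone()[0]
    return results
```

Adding a metric then means inserting one more row into `metric_sql`; `compute_all` itself does not change.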
Brief Description of the Drawings
Fig. 1 is a flow chart of the calculation method of the present invention.
Fig. 2 is a flow chart of the initialization in the calculation method of the present invention.
Fig. 3 is a flow chart of event loading in the calculation method of the present invention.
Fig. 4 is a flow chart of event data cleaning in the calculation method of the present invention.
Fig. 5 is a flow chart of the intermediate table calculation in the calculation method of the present invention.
Fig. 6 is a flow chart of the metric calculation in the calculation method of the present invention.
Detailed Description
To make the object, technical solutions, and advantages of the present invention clearer, the technical solutions of the invention are described clearly and completely below. Obviously, the described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without creative effort shall fall within the protection scope of the invention.
The invention is further described below with reference to the drawings:
With reference to Figs. 1 to 6, and taking the technical field of online games as an example, the present invention provides a metric calculation method for a big data environment, comprising the following steps:
S1: Initialization. Obtain the job scheduling information and the metric calculation SQL information from the database, and initialize the parameters used in the SQL;
S2: According to the initialization information, load the event data of the game from the cloud storage server into memory, and create the corresponding views for use in subsequent steps;
S3: Using SparkSQL, clean the event data loaded in step S2. This step is optional: if the event data contains no dirty or erroneous data, jump directly to step S4;
S4: Calculate the intermediate tables using SparkSQL, and create the intermediate table views. The intermediate tables mainly store historical data, which is used to calculate metrics that depend on historical data;
S5: Using the event views and the intermediate table views, calculate the game metrics. The metric calculation is somewhat more complex, as it includes updating and merging some metric data; once the calculation is complete, the results are inserted into the database for display;
S6: Finally, depending on the calculation time and the system requirements, either end the whole process, or update the calculation time, return to step S1, and execute the whole process again.
In this embodiment, the initialization of step S1 includes the following sub-steps:
S11: According to the database table data, construct the event load information EventLoadMap, which has two parts: the file name of each event on the cloud storage server, and the name of the view created after the event is loaded into memory;
S12: Check the corresponding event count E. If E < 0, the game has no event data to load, so the entire calculation process for the game ends; otherwise, go to step S13;
S13: According to the database data, construct the event data cleaning information CleanDirtyDataMap, which has two parts: the names of the views that need cleaning, and the SparkSQL that cleans the data; this SQL can undergo some parameter processing;
S14: Construct the intermediate table calculation information MidDataMap, which has two parts: the intermediate table view names, and the SparkSQL that calculates each intermediate table; this SQL can undergo some parameter processing;
S15: Construct the metric calculation information MetricComputeMap, which has two parts: the name of the database table into which each metric is inserted once calculated, and the SparkSQL that calculates the metric; this SQL can undergo some parameter processing.
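The "parameter processing" applied to the stored SQL can be pictured as filling placeholders with values derived from the calculation time. The sketch below is one possible form, not the method's own: the `$st` and `$et` placeholder names, the one-day window, and the helpers `init_sql_params` and `render_sql` are assumptions.

```python
from datetime import date, timedelta
from string import Template


def init_sql_params(st: date) -> dict:
    """Hypothetical parameter set derived from the calculation start time ST."""
    return {
        "st": st.isoformat(),                        # start of the period
        "et": (st + timedelta(days=1)).isoformat(),  # end of the period (assumed daily)
    }


def render_sql(sql_template: str, params: dict) -> str:
    # substitute() raises KeyError for a placeholder with no value,
    # so a misspelled parameter in the stored SQL fails before execution.
    return Template(sql_template).substitute(params)
```

Plain text substitution is adequate here only because the SQL and the parameters both come from the operator's own configuration; values from untrusted input would need bound parameters instead.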
In this embodiment, the game event data loading of step S2 includes the following sub-steps:
S21: Obtain the calculation start time ST. Because the system stores the event data of the game in files named according to time, the time must be obtained first;
S22: Traverse the event load information EventLoadMap constructed in step S11 to obtain the event name EventName and the name ViewName of the view to be created;
S23: From the obtained event name EventName, construct the event file path OssSource on the cloud storage server, and read the corresponding event data using Spark;
S24: If the event file corresponding to the view name exists, create a view named ViewName; if the file does not exist, go to step S25;
S25: If the traversal of the event load information EventLoadMap is complete, go to step S31; otherwise return to step S22 and continue loading event data.
In this embodiment, the game event data cleaning of step S3 includes the following sub-steps:
S31: Traverse the event data cleaning information CleanDirtyDataMap constructed in step S13 to obtain the event view name ViewName and the DeleteSQL to be executed;
S32: Using the ViewName obtained in step S31, initialize the parameters in the DeleteSQL;
S33: If the event view targeted by the cleaning SQL exists, execute the SQL; otherwise, go to step S34;
S34: If the traversal of the event data cleaning information CleanDirtyDataMap is complete, go to step S41; otherwise return to step S31 and continue cleaning event data.
In this embodiment, the game intermediate table calculation of step S4 includes the following sub-steps:
S41: Traverse the intermediate table calculation information MidDataMap constructed in step S14 to obtain the intermediate table view name MidDataViewName;
S42: According to the intermediate table view name, copy the intermediate table data from the cloud storage server into memory;
S43: If the intermediate table data exists, create the view; otherwise, execute the intermediate table InitSql to construct the structure of the intermediate table view;
S44: Execute DeleteSql to delete the intermediate table data for the ST period;
S45: Execute ComputeSql to calculate the intermediate table data, and re-create the intermediate table view;
S46: Using Spark, upload the data to OSS;
S47: If the traversal of the intermediate table calculation information MidDataMap is complete, go to step S51; otherwise return to step S41 and continue the intermediate table calculation.
In this embodiment, the game metric calculation of step S5 includes the following sub-steps:
S51: Traverse the metric calculation information MetricComputeMap constructed in step S15 to obtain the database table name for the metrics and the set C of calculation SQL for the metrics contained in that table;
S52: Traverse the metric calculation SQL set C, initialize the parameters of each SQL, and then perform the metric calculation;
S53: If set C has been fully traversed, merge the calculated metrics; otherwise, return to step S52;
S54: If historical metric data needs to be updated, go to the next step; otherwise skip to step S57;
S55: Read the historical metric data and restore the metric dimensions of that data;
S56: Merge the new and old data, update the metric dimension information in the database, and map the dimension information of the merged metric data, replacing it with the ids of the dimension information;
S57: According to the delete SQL, delete the corresponding historical data, and insert the new metric data;
S58: If the traversal of the metric calculation information MetricComputeMap is not complete, jump to step S51 and continue the metric calculation; otherwise end the metric calculation for the ST period and jump to step S6.
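The history update of S55 and S56 can be sketched as two small functions: one restores dimension values from stored ids, the other merges old and new rows. The text does not specify the merge rule, so the sketch assumes that a new row replaces the historical row with the same dimension values; the names `restore_dimensions`, `merge_metrics`, and the `dim_id` field are likewise illustrative.

```python
from typing import Dict, List, Tuple


def restore_dimensions(history: List[dict],
                       id_to_dims: Dict[int, dict]) -> List[dict]:
    """S55: expand each stored dim_id back into its dimension fields."""
    restored = []
    for row in history:
        values = {k: v for k, v in row.items() if k != "dim_id"}
        restored.append({**id_to_dims[row["dim_id"]], **values})
    return restored


def merge_metrics(old: List[dict], new: List[dict],
                  dim_keys: Tuple[str, ...]) -> List[dict]:
    """S56: merge rows keyed by dimension values; new rows win (assumption)."""
    merged = {tuple(r[k] for k in dim_keys): r for r in old}
    merged.update({tuple(r[k] for k in dim_keys): r for r in new})
    return list(merged.values())
```

After the merge, the dimension fields would be mapped back to ids and the result written out by the delete-then-insert of S57.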
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.