CN108268639A

CN108268639A - An index calculation method in a big data environment

Info

Publication number: CN108268639A
Application number: CN201810048169.2A
Authority: CN
Inventors: 尹学渊; 蒋自国
Original assignee: Chengdu Hi Turn House Culture Communication Co Ltd
Current assignee: Chengdu Potential Artificial Intelligence Technology Co ltd
Priority date: 2018-01-18
Filing date: 2018-01-18
Publication date: 2018-07-10
Anticipated expiration: 2038-01-18
Also published as: CN108268639B

Abstract

The invention discloses an index calculation method in a big data environment, comprising the following steps: reading data from a data storage server; cleaning the data; executing SparkSQL to calculate middle tables; calculating game indexes by executing SparkSQL statements written in advance; and judging whether the calculation time is the current day: if so, the calculation stops; if not, the index for the next day is calculated. Using SparkSQL, the invention writes the index calculation logic entirely in SQL and stores the SQL in a database, so adding a new index requires only a few SQL statements, and new requirements can be met with essentially zero code. After the indexes have been calculated, common fields are extracted to compress the index data, which also improves index query speed. The method can be widely applied in systems with simple data statistics functions.

Description

An index calculation method in a big data environment

Technical field

The invention belongs to the technical field of big data statistics and, specifically, relates to an index calculation method in a big data environment.

Background technology

With the rapid development of the Internet, the Internet of Things, wireless sensors, and cloud computing, the global volume of data has grown explosively, and human society has entered the era of big data. The characteristics of big data include:

1. The data volume is huge, rising from the TB level to the PB level;

2. The data types are diverse, including web logs, video, pictures, geographic location information, and so on;

3. The value density is low: the massive stored data contains great value, but the data that is actually meaningful is only a very small part of it.

In response to these characteristics of big data, many large companies, such as Alibaba Cloud and Amazon cloud, provide one-stop solutions covering everything from storage to computation. However, Alibaba Cloud does not provide a specific index calculation scheme; it merely provides an open service environment, which essentially amounts to selling servers. Amazon provides a data warehouse for data statistics, but the service is too expensive: users must be quite familiar with the data warehouse's interfaces, and the service code is cumbersome to develop and difficult to maintain.

Invention content

To address the above deficiencies in the prior art, the present invention proposes an index calculation method in a big data environment that separates the index calculation scheduling code from the business calculation SQL and stores all of the SQL for reading data, preprocessing, calculating, and storing results in a database. In this way, new requirements can be met with essentially zero code.

To achieve the above object, the solution adopted by the present invention is an index calculation method in a big data environment, comprising the following steps:

S1, initialization: obtain the job scheduling information and the index calculation SQL information from the database, and initialize the parameters to be used in the SQL;

S2, according to the initialization information, load the game's event data from the cloud storage server into memory, and create the corresponding event views for use in subsequent steps;

S3, if the event data contains dirty or erroneous data, use SparkSQL to clean the event data loaded in step S2; if the event data contains no dirty or erroneous data, jump directly to step S4;

S4, use SparkSQL to calculate the middle tables and create the middle table views;

S5, use the event views and middle table views to calculate the game indexes and, once they have been calculated, insert them into the database for display;

S6, according to the calculation time and system requirements, either end this run, or update the calculation time, return to step S1, and repeat the operation.
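The control flow of steps S1 to S6 can be pictured as a loop over calculation days. The following is only a minimal sketch under stated assumptions: the function name `run_pipeline` and the idea of passing the per-day steps as callables are hypothetical, and the stop condition follows the abstract (stop once the calculation time reaches the current day, otherwise continue with the next day).

```python
from datetime import date, timedelta


def run_pipeline(start, today, steps):
    """Run every step for each day from `start` up to and including `today`.

    `steps` is a list of callables standing in for S1-S5 (hypothetical);
    each receives the current calculation day.
    """
    processed = []
    current = start
    while True:
        for step in steps:          # S1-S5 for the current calculation time
            step(current)
        processed.append(current)
        if current >= today:        # S6: calculation time is the current day, stop
            break
        current += timedelta(days=1)  # S6: update the calculation time, back to S1
    return processed
```

A call such as `run_pipeline(date(2018, 1, 16), date(2018, 1, 18), [load, clean, compute])` would run the three hypothetical steps once for each of the three days.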

Further, step S1 includes the following steps:

S11, according to the database table data, construct the event load information EventLoadMap, which includes the file name corresponding to each event in the cloud storage server and the name of the view created after the event is loaded into memory;

S12, determine the corresponding event count E: if E<0, the game has no event data that needs to be loaded, so the entire calculation process for the game ends; if E≥0, go to step S13;

S13, according to the database data, construct the event data cleaning information CleanDirtyDataMap, which includes the names of the views that need data cleaning and the SparkSQL for cleaning the data;

S14, construct the middle table calculation information MidDataMap, which includes the middle table view names and the SparkSQL for calculating the middle tables;

S15, construct the index calculation information MetricComputeMap, which includes the name of the database table into which each index is to be inserted once calculated and the SparkSQL for calculating the index.
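As an illustration of steps S11 and S13 to S15, the four lookup structures might be built from configuration rows roughly as follows. This is a sketch under assumptions, not the patented implementation: the table `job_config`, its columns (`game_id`, `kind`, `name`, `payload`), and the `${...}` placeholder syntax are all hypothetical, and sqlite3 stands in for whatever database actually holds the SQL.

```python
import sqlite3
from string import Template


def load_job_config(conn, game_id):
    """Build EventLoadMap, CleanDirtyDataMap, MidDataMap and MetricComputeMap.

    Assumes a hypothetical table job_config(id, game_id, kind, name, payload)
    where `kind` is one of the four map names.
    """
    config = {
        "EventLoadMap": {},        # event file name -> view name
        "CleanDirtyDataMap": {},   # view name -> cleaning SQL
        "MidDataMap": {},          # middle table view name -> calculation SQL
        "MetricComputeMap": {},    # target table name -> list of index SQL
    }
    rows = conn.execute(
        "SELECT kind, name, payload FROM job_config "
        "WHERE game_id = ? ORDER BY id",
        (game_id,),
    )
    for kind, name, payload in rows:
        if kind == "MetricComputeMap":
            # One target table may hold several indexes, so keep a list of SQL.
            config[kind].setdefault(name, []).append(payload)
        else:
            config[kind][name] = payload
    return config


def init_params(sql, **params):
    """Fill ${name} placeholders in a stored SQL string (assumed syntax)."""
    return Template(sql).substitute(params)
```

Under this layout, adding a new index amounts to inserting one more `MetricComputeMap` row, which is the "zero code" property the description emphasizes.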

Further, step S2 includes the following steps:

S21, obtain the calculation start time ST;

S22, traverse the event load information EventLoadMap constructed in step S11 to obtain the event name EventName and the name ViewName of the view to be created;

S23, according to the obtained event name EventName, construct the event file path OssSource on the cloud storage server, and use Spark to read the corresponding event data;

S24, if the event file corresponding to the view name exists, create a view named ViewName; if the file does not exist, go to step S25;

S25, if traversal of the event load information EventLoadMap is complete, go to step S31; otherwise return to step S22 and continue loading event data.
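The loading loop of steps S21 to S25 can be sketched with a local directory standing in for the cloud storage server. Everything specific here is an assumption made for illustration: the path layout `<root>/<EventName>/<ST>.csv`, the CSV format, and the use of a plain dictionary of row lists in place of Spark views.

```python
import csv
from pathlib import Path


def load_events(event_load_map, root, st):
    """Load each event file that exists and register it under its view name.

    event_load_map maps EventName -> ViewName; `root` and the file layout
    are hypothetical stand-ins for the cloud storage server.
    """
    views = {}
    for event_name, view_name in event_load_map.items():   # S22
        source = Path(root) / event_name / f"{st}.csv"      # S23 (assumed layout)
        if not source.exists():                             # S24: no file, skip
            continue
        with source.open(newline="") as handle:
            views[view_name] = list(csv.DictReader(handle))
    return views                                            # S25: traversal done
```

Events whose file is missing for the given ST simply produce no view, which is why the later steps check that a view exists before running SQL against it.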

Further, the data cleaning of step S3 includes the following steps:

S31, traverse the event data cleaning information CleanDirtyDataMap constructed in step S13 to obtain the event view name ViewName and the DeleteSQL to be executed;

S32, according to the ViewName obtained in step S31, initialize the parameters in DeleteSQL;

S33, if the event view targeted by the data cleaning SQL exists, execute the SQL; otherwise, go to step S34;

S34, if traversal of the event data cleaning information CleanDirtyDataMap is complete, go to step S41; otherwise return to step S31 and continue cleaning event data.
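The cleaning loop of steps S31 to S34 might look roughly like the sketch below. It uses sqlite3 tables as a stand-in for SparkSQL views, which is an assumption made only so the example runs with the standard library; the `{view}` placeholder and the function names are likewise hypothetical.

```python
import sqlite3


def view_exists(conn, name):
    """Return True if a table or view called `name` exists in the database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type IN ('table', 'view') AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def clean_events(conn, clean_dirty_data_map):
    """Run each cleaning statement whose target view exists; return those views."""
    cleaned = []
    for view_name, delete_sql in clean_dirty_data_map.items():  # S31
        sql = delete_sql.format(view=view_name)                  # S32: fill parameters
        if view_exists(conn, view_name):                         # S33
            conn.execute(sql)
            cleaned.append(view_name)
    return cleaned                                               # S34: traversal done
```

Because the cleaning rules are data rather than code, a new rule for another event only needs a new entry in the map.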

Further, the middle table calculation of step S4 includes the following steps:

S41: traverse the middle table calculation information MidDataMap constructed in step S14 to obtain the middle table view name MidDataViewName;

S42: according to the middle table view name, copy the middle table data from the cloud storage server to memory;

S43: if the middle table data exists, create a view; if there is no such middle table data, execute the middle table's InitSql to construct the structure of the middle table view;

S44: execute DeleteSql to delete the middle table data for the ST period;

S45: execute ComputeSql to calculate the middle table data, and re-create the middle table view;

S46: use Spark to upload the data to OSS;

S47: if traversal of the middle table calculation information MidDataMap is complete, go to step S51; if it is not complete, return to step S41 and continue the calculation.
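One consequence of steps S43 to S45 is that recomputing the same ST period does not duplicate rows, because the period's data is deleted before it is calculated again. The sketch below illustrates that delete-then-compute pattern with sqlite3 as a stand-in for SparkSQL; the function names, the `:st` parameter style, and the `first_login` example table are assumptions, and the download from and upload to cloud storage (S42, S46) are omitted.

```python
import sqlite3


def table_exists(conn, name):
    """Return True if a table called `name` exists."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def compute_mid_table(conn, name, init_sql, delete_sql, compute_sql, st):
    """Recompute one middle table for period `st` and return its rows."""
    if not table_exists(conn, name):        # S43: no data yet, build the structure
        conn.execute(init_sql)
    conn.execute(delete_sql, {"st": st})    # S44: drop rows for the ST period
    conn.execute(compute_sql, {"st": st})   # S45: recompute rows for the ST period
    return conn.execute(f"SELECT * FROM {name} ORDER BY 1").fetchall()
```

With a middle table that records each user's first login day, for example, running the function for one day and then for the next accumulates history, while rerunning a day leaves the table unchanged.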

Further, the game index calculation of step S5 includes the following steps:

S51: traverse the index calculation information MetricComputeMap constructed in step S15 to obtain the database table name for the indexes and the set C of calculation SQL for the indexes that the table contains;

S52: traverse the index calculation SQL set C, initialize the parameters of each SQL statement traversed, and then execute the index calculation;

S53: if set C has been fully traversed, merge the calculated indexes; if set C has not been fully traversed, return to step S52;

S54: if historical index data needs to be updated, proceed to the next step; otherwise skip to step S57;

S55: read the historical index data, and restore the index dimensions of that data;

S56: merge the new and old data, update the index dimension information in the database, and map the dimension information of the merged index data, replacing it with the ids of the dimension information;

S57: according to the delete SQL, delete the corresponding historical data, and insert the new index data;

S58: if traversal of the index calculation information MetricComputeMap has not ended, jump to step S51 and continue the index calculation; otherwise end the index calculation for the ST period and jump to step S6.
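Steps S55 and S56 describe restoring dimension information from stored ids, merging old and new index data, and mapping dimensions back to ids. The sketch below is one possible reading, under several assumptions: index rows are `(dimensions, metric, value)` tuples, dimension ids are consecutive integers starting at 1, and a newly calculated value replaces the stored value for the same dimensions and metric (the source does not spell out the merge rule).

```python
def encode_dimensions(rows, dim_ids):
    """Replace each row's dimension tuple with an integer id.

    `dim_ids` maps dimension tuple -> id and is extended in place for
    unseen dimensions (assumes ids are consecutive, starting at 1).
    """
    encoded = []
    for dims, metric, value in rows:
        dim_id = dim_ids.setdefault(dims, len(dim_ids) + 1)
        encoded.append((dim_id, metric, value))
    return encoded


def merge_metrics(old_encoded, new_rows, dim_ids):
    """Merge stored (id, metric, value) rows with new (dims, metric, value) rows."""
    id_to_dims = {dim_id: dims for dims, dim_id in dim_ids.items()}
    # S55: restore the dimensions of the historical rows.
    merged = {(id_to_dims[d], m): v for d, m, v in old_encoded}
    # S56: merge, letting new values replace old ones (assumed rule).
    for dims, metric, value in new_rows:
        merged[(dims, metric)] = value
    # S56: map dimension information back to ids.
    return encode_dimensions([(d, m, v) for (d, m), v in merged.items()], dim_ids)
```

Storing a small integer id instead of repeating the full dimension fields in every row is one way to obtain the compression and faster lookups that the description attributes to extracting common fields.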

The advantages of the invention are as follows. Using SparkSQL, the invention writes the index calculation logic entirely in SQL and stores the SQL in a database, so adding a new index requires only a few SQL statements, and new requirements can be met with essentially zero code. After the indexes have been calculated, common fields are extracted to compress the index data, which also improves index query speed. The method can be widely applied in systems with simple data statistics functions.

Description of the drawings

Fig. 1 is a flow chart of the calculation method of the present invention.

Fig. 2 is a flow chart of initialization in the calculation method of the present invention.

Fig. 3 is a flow chart of event loading in the calculation method of the present invention.

Fig. 4 is a flow chart of event data cleaning in the calculation method of the present invention.

Fig. 5 is a flow chart of middle table calculation in the calculation method of the present invention.

Fig. 6 is a flow chart of index calculation in the calculation method of the present invention.

Specific embodiment

To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

The invention is further described below with reference to the accompanying drawings:

With reference to Figs. 1 to 6, and taking the technical field of online games as an example, the present invention provides an index calculation method in a big data environment, comprising the following steps:

S2, according to the initialization information, load the game's event data from the cloud storage server into memory, and create the corresponding views for use in subsequent steps;

S3, use SparkSQL to clean the event data loaded in step S2. This step is optional: if the event data contains no dirty or erroneous data, jump directly to step S4;

S4, use SparkSQL to calculate the middle tables and create the middle table views. The middle tables mainly store historical data, which is used to calculate indexes that depend on historical data;

S5, use the event views and middle table views to calculate the game indexes. Index calculation is slightly more complex and includes updating and merging some index data; once the indexes have been calculated, they are inserted into the database for display;

S6, finally, according to the calculation time and system requirements, either end the whole process, or update the calculation time, return to step S1, and execute the whole process again.

In the present embodiment, the initialization of step S1 includes the following sub-steps:

S11, according to the database table data, construct the event load information EventLoadMap, which contains two parts: the file name corresponding to each event in the cloud storage server, and the name of the view created after the event is loaded into memory;

S12, determine the corresponding event count E: if E<0, the game has no event data that needs to be loaded, so the entire calculation process for the game ends; otherwise, go to step S13;

S13, according to the database data, construct the event data cleaning information CleanDirtyDataMap, which contains two parts: the names of the views that need data cleaning, and the SparkSQL for cleaning the data; this SQL may undergo some parameter processing;

S14, construct the middle table calculation information MidDataMap, which contains two parts: the middle table view names, and the SparkSQL for calculating the middle tables; this SQL may undergo some parameter processing;

S15, construct the index calculation information MetricComputeMap, which contains two parts: the name of the database table into which each index is to be inserted once calculated, and the SparkSQL for calculating the index; this SQL may undergo some parameter processing.

In the present embodiment, the game event data loading of step S2 includes the following sub-steps:

S21, obtain the calculation start time ST; because this system stores game event data in files named according to time, the time must be obtained first;

In the present embodiment, the game event data cleaning of step S3 includes the following sub-steps:

S31, traverse the event cleaning information CleanDirtyDataMap constructed in step S13 to obtain the event view name ViewName and the DeleteSQL to be executed;

S34, if traversal of the event cleaning information CleanDirtyDataMap is complete, go to step S41; otherwise return to step S31 and continue cleaning event data.

In the present embodiment, the game middle table calculation of step S4 includes the following sub-steps:

S41, traverse the middle table calculation information MidDataMap constructed in step S14 to obtain the middle table view name MidDataViewName;

S42, according to the middle table view name, copy the middle table data from the cloud storage server to memory;

S43, if the middle table data exists, create a view; otherwise, execute the middle table's InitSql to construct the structure of the middle table view;

S44, execute deleteSql to delete the middle table data for the ST period;

S45, execute computeSql to calculate the middle table data, and re-create the middle table view;

S46, use Spark to upload the data to OSS;

S47, if traversal of the middle table calculation information MidDataMap is complete, go to step S51; otherwise return to step S41 and continue the calculation.

In the present embodiment, the game index calculation of step S5 includes the following sub-steps:

S51, traverse the index calculation information MetricComputeMap constructed in step S15 to obtain the database table name for the indexes and the set C of calculation SQL for the indexes that the table contains;

S52, traverse the index calculation SQL set C, initialize the parameters of each SQL statement traversed, and then execute the index calculation;

S53, if set C has been fully traversed, merge the calculated indexes; otherwise, return to step S52;

S54, if historical index data needs to be updated, proceed to the next step; otherwise skip to step S57;

S55, read the historical index data, and restore the index dimensions of that data;

S56, merge the new and old data, update the index dimension information in the database, and map the dimension information of the merged index data, replacing it with the ids of the dimension information;

S57, according to the delete SQL, delete the corresponding historical data, and insert the new index data;

S58, if traversal of the index calculation information MetricComputeMap has not ended, jump to step S51 and continue the index calculation; otherwise end the index calculation for the ST period and jump to step S6.

Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An index calculation method in a big data environment, characterized by comprising the following steps:

S1, initialization: obtain the job scheduling information and the index calculation SQL information from the database, and initialize the parameters to be used in the SQL;

S2, according to the initialization information, load the game's event data from the cloud storage server into memory, and create the corresponding event views for use in subsequent steps;

S3, if the event data contains dirty or erroneous data, use SparkSQL to clean the event data loaded in step S2; if the event data contains no dirty or erroneous data, jump directly to step S4;

S5, use the event views and middle table views to calculate the game indexes and, once they have been calculated, insert them into the database for display;

S6, according to the calculation time and system requirements, either end this run, or update the calculation time, return to step S1, and repeat the operation.

2. The index calculation method in a big data environment according to claim 1, characterized in that step S1 includes the following steps:

S12, determine the corresponding event count E: if E<0, the game has no event data that needs to be loaded, so the entire calculation process for the game ends; if E≥0, go to step S13;

S13, according to the database data, construct the event data cleaning information CleanDirtyDataMap, the event data cleaning information CleanDirtyDataMap including the names of the views that need data cleaning and the SparkSQL for cleaning the data;

S14, construct the middle table calculation information MidDataMap, the middle table calculation information MidDataMap including the middle table view names and the SparkSQL for calculating the middle tables;

S15, construct the index calculation information MetricComputeMap, the index calculation information MetricComputeMap including the name of the database table into which each index is to be inserted once calculated and the SparkSQL for calculating the index.

3. The index calculation method in a big data environment according to claim 1, characterized in that step S2 includes the following steps:

S21, obtain the calculation start time ST;

S22, traverse the event load information EventLoadMap constructed in step S11 to obtain the event name EventName and the name ViewName of the view to be created;

S23, according to the event name EventName of step S22, construct the event file path OssSource on the cloud storage server, and use Spark to read the corresponding event data;

S24, if the event file corresponding to the view name exists, create a view named ViewName; if the file does not exist, go to step S25;

S25, if traversal of the event load information EventLoadMap is complete, go to step S31; otherwise return to step S22 and continue loading event data.

4. The index calculation method in a big data environment according to claim 1, characterized in that the data cleaning of step S3 includes the following steps:

S31, traverse the event data cleaning information CleanDirtyDataMap of step S13 to obtain the event view name ViewName and the DeleteSQL to be executed;

S32, according to the ViewName of step S31, initialize the parameters in DeleteSQL;

S34, if traversal of the event data cleaning information CleanDirtyDataMap is complete, go to step S41; otherwise return to step S31 and continue cleaning event data.

5. The index calculation method in a big data environment according to claim 1, characterized in that the middle table calculation of step S4 includes the following steps:

S41: traverse the middle table calculation information MidDataMap of step S14 to obtain the middle table view name MidDataViewName;

S44: execute DeleteSql to delete the middle table data for the ST period;

S46: use Spark to upload the data to OSS;

S47: if traversal of the middle table calculation information MidDataMap is complete, go to step S51; if it is not complete, return to step S41 and continue the calculation.

6. The index calculation method in a big data environment according to claim 1, characterized in that the game index calculation of step S5 includes the following steps:

S51: traverse the index calculation information MetricComputeMap of step S15 to obtain the database table name for the indexes and the set C of calculation SQL for the indexes that the table contains;

S52: traverse the index calculation SQL set C, initialize the parameters of each SQL statement traversed, and then execute the index calculation;

S53: if set C has been fully traversed, merge the calculated indexes; if set C has not been fully traversed, return to step S52;

S56: merge the new and old data, update the index dimension information in the database, and map the dimension information of the merged index data, replacing it with the ids of the dimension information;

S58: if traversal of the index calculation information MetricComputeMap has not ended, jump to step S51 and continue the index calculation; otherwise end the index calculation for the ST period and jump to step S6.