CN109145027A - Data statistical approach, device, equipment and computer readable storage medium - Google Patents

Publication number
CN109145027A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710467826.2A
Other languages
Chinese (zh)
Inventor
范晓亮
余俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp

Abstract

The invention discloses a data statistics method comprising: obtaining an original data stream on which statistics are to be computed; retrieving and parsing a preset data statistics configuration file according to the original data stream, to obtain preprocessing class information and statistical condition information for each preset type of statistical report to be generated; preprocessing the original data stream according to the preprocessing class information, to obtain the corresponding statistics metadata; and computing statistics on the statistics metadata according to the statistical condition information, and generating the corresponding statistical report from the statistical result. The invention also discloses a data statistics apparatus, a data statistics device and a computer-readable storage medium. The invention can improve the flexibility of code that performs data statistics on the Spark platform and reduce the cost of developing and maintaining that code.

Description

Data statistics method, apparatus, device and computer-readable storage medium
Technical field
The present invention relates to the field of big data processing, and in particular to a data statistics method, apparatus, device and computer-readable storage medium.
Background
Spark is a general-purpose computing engine designed for large-scale data processing. In recent years, with the emergence and gradual maturing of the Spark big data platform, how to implement parallelized machine learning and data mining algorithms on the Spark platform has become a focus of attention both in China and abroad.
At present, when data analysis is performed on the Spark platform, developers must write new statistics code every time a new kind of business statistic is added, which costs considerable time and labor. Moreover, changes to the business and its statistical rules cause frequent code modifications, which raises the probability of errors and the cost of testing. As time passes and the business keeps changing, the code becomes increasingly bloated and hard to maintain. The flexibility of existing code for performing data statistics on the Spark platform therefore needs to be improved.
Summary of the invention
The main object of the present invention is to provide a data statistics method, apparatus and computer-readable storage medium that improve the flexibility of code performing data statistics on the Spark platform and reduce the cost of developing and maintaining that code.
To achieve the above object, the present invention provides a data statistics method comprising the following steps:
obtaining an original data stream on which statistics are to be computed;
retrieving and parsing a preset data statistics configuration file according to the original data stream, to obtain preprocessing class information and statistical condition information for each preset type of statistical report to be generated;
preprocessing the original data stream according to the preprocessing class information, to obtain the corresponding statistics metadata;
computing statistics on the statistics metadata according to the statistical condition information, and generating the corresponding statistical report from the statistical result.
Preferably, before the step of obtaining the original data stream, the method further comprises:
setting the types of statistical report to be generated, and setting the corresponding preprocessing class information and statistical condition information for each type of statistical report to be generated;
saving the settings to the data statistics configuration file.
Preferably, the step of obtaining the original data stream comprises:
sending a message data request to a preset service message system;
receiving the message data stream that the service message system returns in response to the message data request, and using that message data stream as the original data stream on which statistics are to be computed.
Preferably, the step of preprocessing the original data stream according to the preprocessing class information to obtain the corresponding statistics metadata comprises:
dividing the original data stream into several preprocessing classes according to the separator in the preprocessing class information;
converting the format of the original data stream under each preprocessing class according to a preset format conversion rule, to obtain the corresponding statistics metadata.
Preferably, the step of computing statistics on the statistics metadata according to the statistical condition information and generating the corresponding statistical report from the statistical result comprises:
performing a logical operation on the statistics metadata according to the logical operation expression in the statistical condition information;
generating the corresponding statistical report from the result of the logical operation.
Preferably, after the step of generating the corresponding statistical report from the result of the logical operation, the method further comprises:
storing the generated statistical report in the database of the Spark platform.
Preferably, the data statistics method further comprises:
receiving a modification instruction from a user, and modifying the data statistics configuration file according to the modification instruction.
In addition, to achieve the above object, the present invention also provides a data statistics apparatus comprising:
an obtaining module, for obtaining an original data stream on which statistics are to be computed;
a retrieving and parsing module, for retrieving and parsing a preset data statistics configuration file according to the original data stream, to obtain preprocessing class information and statistical condition information for each preset type of statistical report to be generated;
a preprocessing module, for preprocessing the original data stream according to the preprocessing class information, to obtain the corresponding statistics metadata;
a statistics module, for computing statistics on the statistics metadata according to the statistical condition information, and generating the corresponding statistical report from the statistical result.
In addition, to achieve the above object, the present invention also provides a data statistics device comprising a memory, a processor, and a data statistics program stored in the memory and executable on the processor, wherein the data statistics program, when executed by the processor, implements the following steps:
obtaining an original data stream on which statistics are to be computed;
retrieving and parsing a preset data statistics configuration file according to the original data stream, to obtain preprocessing class information and statistical condition information for each preset type of statistical report to be generated;
preprocessing the original data stream according to the preprocessing class information, to obtain the corresponding statistics metadata;
computing statistics on the statistics metadata according to the statistical condition information, and generating the corresponding statistical report from the statistical result.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium on which a data statistics program is stored, wherein the data statistics program, when executed by a processor, implements the following steps:
obtaining an original data stream on which statistics are to be computed;
retrieving and parsing a preset data statistics configuration file according to the original data stream, to obtain preprocessing class information and statistical condition information for each preset type of statistical report to be generated;
preprocessing the original data stream according to the preprocessing class information, to obtain the corresponding statistics metadata;
computing statistics on the statistics metadata according to the statistical condition information, and generating the corresponding statistical report from the statistical result.
In the present invention, the Spark platform obtains an original data stream on which statistics are to be computed; retrieves and parses a preset data statistics configuration file according to the original data stream, to obtain preprocessing class information and statistical condition information for each preset type of statistical report to be generated; preprocesses the original data stream according to the preprocessing class information, to obtain the corresponding statistics metadata; and computes statistics on the statistics metadata according to the statistical condition information, generating the corresponding statistical report from the statistical result. By placing a data statistics configuration file on the Spark platform and writing into it the preprocessing class information and statistical condition information of each preset type of statistical report to be generated, the invention performs data statistics on the Spark platform. Compared with the prior art, when the business or its statistical rules change, developers only need to modify the configuration information in the data statistics configuration file and do not need to develop new statistics code. This improves the flexibility of code that performs data statistics on the Spark platform and reduces the cost of developing and maintaining that code.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the device in the hardware operating environment involved in the embodiments of the present invention;
Fig. 2 is a flow diagram of a first embodiment of the data statistics method of the present invention;
Fig. 3 is a flow diagram of the statistical analysis of original call detail records (CDRs) in an embodiment of the present invention;
Fig. 4 is a flow diagram of a second embodiment of the data statistics method of the present invention;
Fig. 5 is a flow diagram of a third embodiment of the data statistics method of the present invention;
Fig. 6 is a flow diagram of a fourth embodiment of the data statistics method of the present invention;
Fig. 7 is a functional block diagram of an embodiment of the data statistics apparatus of the present invention.
The realization of the object, the functional features and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
The main solution of the embodiments of the present invention is: obtaining an original data stream on which statistics are to be computed; retrieving and parsing a preset data statistics configuration file according to the original data stream, to obtain preprocessing class information and statistical condition information for each preset type of statistical report to be generated; preprocessing the original data stream according to the preprocessing class information, to obtain the corresponding statistics metadata; and computing statistics on the statistics metadata according to the statistical condition information, generating the corresponding statistical report from the statistical result.
In the prior art, when data analysis is performed on the Spark platform, developers must write new statistics code every time a new kind of business statistic is added, which costs considerable time and labor. Moreover, changes to the business and its statistical rules cause frequent code modifications, which raises the probability of errors and the cost of testing. As time passes and the business keeps changing, the code becomes increasingly bloated and hard to maintain.
By placing a data statistics configuration file on the Spark platform and writing into it the preprocessing class information and statistical condition information of each preset type of statistical report to be generated, the present invention performs data statistics on the Spark platform. Compared with the prior art, when the business or its statistical rules change, developers only need to modify the configuration information in the data statistics configuration file and do not need to develop new statistics code. This improves the flexibility of code that performs data statistics on the Spark platform and reduces the cost of developing and maintaining that code.
The present invention provides a data statistics method.
Fig. 1 is a schematic structural diagram of the device in the hardware operating environment involved in the embodiments of the present invention.
The data statistics device of the embodiments of the present invention may be a server, a PC or a virtual machine.
As shown in Fig. 1, the data statistics device may include: a processor 1001 (for example a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 provides the connections and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be high-speed RAM or stable non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage device separate from the processor 1001.
Those skilled in the art will understand that the device structure shown in Fig. 1 does not limit the device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may contain an operating system, a network communication module, a user interface module and a data statistics program.
In the data statistics device shown in Fig. 1, the network interface 1004 is mainly used to connect to a background server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user terminal) and exchange data with it; and the processor 1001 may be used to call the data statistics program stored in the memory 1005 and perform the following operations:
obtaining an original data stream on which statistics are to be computed;
retrieving and parsing a preset data statistics configuration file according to the original data stream, to obtain preprocessing class information and statistical condition information for each preset type of statistical report to be generated;
preprocessing the original data stream according to the preprocessing class information, to obtain the corresponding statistics metadata;
computing statistics on the statistics metadata according to the statistical condition information, and generating the corresponding statistical report from the statistical result.
Further, the processor 1001 may call the data statistics program stored in the memory 1005 and also perform the following operations:
setting the types of statistical report to be generated, and setting the corresponding preprocessing class information and statistical condition information for each type of statistical report to be generated;
saving the settings to the data statistics configuration file.
Further, the processor 1001 may call the data statistics program stored in the memory 1005 and also perform the following operations:
sending a message data request to a preset service message system;
receiving the message data stream that the service message system returns in response to the message data request, and using that message data stream as the original data stream on which statistics are to be computed.
Further, the processor 1001 may call the data statistics program stored in the memory 1005 and also perform the following operations:
dividing the original data stream into several preprocessing classes according to the separator in the preprocessing class information;
converting the format of the original data stream under each preprocessing class according to a preset format conversion rule, to obtain the corresponding statistics metadata.
Further, the processor 1001 may call the data statistics program stored in the memory 1005 and also perform the following operations:
performing a logical operation on the statistics metadata according to the logical operation expression in the statistical condition information;
generating the corresponding statistical report from the result of the logical operation.
Further, the processor 1001 may call the data statistics program stored in the memory 1005 and also perform the following operation:
storing the generated statistical report in the database of the Spark platform.
Further, the processor 1001 may call the data statistics program stored in the memory 1005 and also perform the following operation:
receiving a modification instruction from a user, and modifying the data statistics configuration file according to the modification instruction.
Based on the above hardware structure, embodiments of the data statistics method of the present invention are proposed.
Referring to Fig. 2, Fig. 2 is a flow diagram of the first embodiment of the data statistics method of the present invention. The data statistics method includes:
Step S10: obtain an original data stream on which statistics are to be computed;
The data statistics method of the present invention is applied to the Spark platform, a currently popular platform for big data computation and statistics. By computing and aggregating big data, the Spark platform can implement a variety of machine learning and data mining tasks.
In this embodiment, the Spark platform first obtains the original data stream on which statistics are to be computed. In one implementation, step S10 may include:
Step S11: send a message data request to a preset service message system;
Step S12: receive the message data stream that the service message system returns in response to the message data request, and use that message data stream as the original data stream on which statistics are to be computed.
The preset service message system may be Apache Kafka, a distributed publish-subscribe messaging system mainly used to process active streaming data. Compared with conventional messaging systems, Apache Kafka differs in the following ways: 1) it is designed as a distributed system and is easy to scale out; 2) it provides high throughput for both publishing and subscribing; 3) it supports multiple subscribers and can automatically rebalance consumers when a failure occurs; 4) it persists messages to disk, so it can be used for batch consumption. Because of these advantages, Apache Kafka is now widely used for message processing.
Specifically, the Spark platform sends a message data request to Kafka, receives the message data stream that Kafka returns in response to the request, and then uses that message data stream as the original data stream on which statistics are to be computed. The preset service message system need not be Apache Kafka; it may also be another service message system such as RabbitMQ or Apache ActiveMQ, and can be chosen flexibly in a specific implementation.
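The request and receive exchange of steps S11 and S12 can be sketched as follows. This is only an illustration under stated assumptions: `StubMessageSystem`, its `request` method and `acquire_original_stream` are hypothetical names, and the stub is an in-memory stand-in rather than the client API of Kafka, RabbitMQ or ActiveMQ.

```python
from collections import deque
from typing import Iterator, List


class StubMessageSystem:
    """Hypothetical in-memory stand-in for a service message system."""

    def __init__(self, messages: List[str]):
        self._queue = deque(messages)

    def request(self, max_messages: int) -> List[str]:
        """Answer one message data request with up to max_messages messages."""
        batch = []
        while self._queue and len(batch) < max_messages:
            batch.append(self._queue.popleft())
        return batch


def acquire_original_stream(system: StubMessageSystem,
                            batch_size: int = 2) -> Iterator[str]:
    """S11: send requests; S12: yield the returned messages as the original data stream."""
    while True:
        batch = system.request(batch_size)
        if not batch:
            return  # nothing left to consume in this sketch
        yield from batch


stream = list(acquire_original_stream(StubMessageSystem(["a|1", "b|2", "c|3"])))
print(stream)
```

Keeping the message system behind a small interface like this is one way to make the source swappable, which matches the statement that the service message system can be chosen flexibly.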
Step S20: retrieve and parse a preset data statistics configuration file according to the original data stream, to obtain preprocessing class information and statistical condition information for each preset type of statistical report to be generated;
In a specific implementation, a different data statistics configuration file can be set up in advance for each service message system and then stored on the Spark platform. The data statistics configuration file may be in a format such as ini or xml; no limitation is imposed here.
After obtaining the original data stream, the Spark platform retrieves the data statistics configuration file that corresponds to the original data stream and parses it, obtaining the preprocessing class information and statistical condition information of each preset type of statistical report to be generated. The types and number of statistical reports to be generated can be set flexibly by administrators according to business needs. Each statistical report to be generated has its own preprocessing class information and statistical condition information. The preprocessing class information represents the preprocessing configuration of the original data stream, including the names and number of the preprocessing classes, the preprocessing rules, the preprocessing separator and so on. The statistical condition information represents the operation logic configuration of the data statistics, including the operands, the operation expression and so on.
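To make the parsing step concrete, the sketch below reads an ini-style configuration with Python's standard `configparser` module. The layout is an assumption made for illustration: the section name `call_duration_report` and the keys `preprocess_classes`, `separator`, `operands` and `expression` are hypothetical and are not specified by the source.

```python
import configparser

# Hypothetical configuration: one section per type of statistical report.
CONFIG_TEXT = """
[call_duration_report]
preprocess_classes = caller,callee,duration
separator = |
operands = duration
expression = duration > 60
"""


def parse_statistics_config(text: str) -> dict:
    """Return preprocessing class info and statistical condition info per report type."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    reports = {}
    for name in parser.sections():
        section = parser[name]
        reports[name] = {
            "preprocess": {
                "classes": [c.strip() for c in section["preprocess_classes"].split(",")],
                "separator": section["separator"],
            },
            "condition": {
                "operands": [o.strip() for o in section["operands"].split(",")],
                "expression": section["expression"],
            },
        }
    return reports


reports = parse_statistics_config(CONFIG_TEXT)
print(reports["call_duration_report"])
```

Interpolation is disabled so that a `%` operator in an expression would be read literally.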
Step S30: preprocess the original data stream according to the preprocessing class information, to obtain the corresponding statistics metadata;
In this step, the Spark platform preprocesses the original data stream according to the preprocessing class information it has obtained, producing the corresponding statistics metadata. Specifically, the Spark platform can first split the original data stream using the separator in the preprocessing class information and then convert the format of the split data, obtaining the statistics metadata for each type of statistical report to be generated. In addition, the preprocessing of the original data stream may include logical checking, screening and cleaning of the data. These steps can be combined with the splitting and format conversion described above so that the resulting statistics metadata is more concise and accurate, which makes the subsequent statistical analysis easier.
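The optional checking, screening and cleaning can be illustrated with a small sketch. The rules used here (skip blank lines, require a fixed field count, reject empty fields) and the name `clean_records` are assumptions chosen for the example; the source does not specify which checks are applied.

```python
def clean_records(lines, separator="|", expected_fields=3):
    """Drop blank or malformed records and strip whitespace from each field."""
    cleaned = []
    for line in lines:
        if not line.strip():
            continue  # cleaning: skip blank lines
        fields = [field.strip() for field in line.split(separator)]
        if len(fields) != expected_fields:
            continue  # logical check: wrong number of fields
        if any(field == "" for field in fields):
            continue  # screening: reject records with an empty field
        cleaned.append(fields)
    return cleaned


raw = ["alice | bob | 75", "", "broken|record", "carol||12", "dave|erin|30"]
cleaned = clean_records(raw)
print(cleaned)
```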
Step S40: compute statistics on the statistics metadata according to the statistical condition information, and generate the corresponding statistical report from the statistical result.
After obtaining the statistics metadata, the Spark platform computes statistics on it according to the statistical condition information and generates the corresponding statistical report from the statistical result. Specifically, the Spark platform can perform a logical operation on the statistics metadata according to the logical operation expression in the statistical condition information and generate the corresponding statistical report from the result of that operation. The statistical conditions can be set and modified flexibly; for example, they may also include classifying and labeling the statistics metadata.
Referring to Fig. 3, Fig. 3 is a flow diagram of the statistical analysis of original call detail records (CDRs) in an embodiment of the present invention. The original CDRs are the original data stream obtained from the service message system. The Spark platform first preprocesses the original CDRs according to the preprocessing classes in the data statistics configuration file, generating a preprocessed report for each class, that is, the statistics metadata. It then performs a logical operation on the statistics metadata according to the statistical conditions in the data statistics configuration file to generate the corresponding statistical reports. Finally, the statistical reports are stored in the database of the Spark platform.
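The CDR flow of Fig. 3 can be sketched end to end in plain Python. Everything specific in this example is hypothetical: the CDR layout `caller|callee|duration`, the report name `long_calls_per_caller` and the `min_duration` condition are illustrative assumptions, and plain lists stand in for distributed Spark datasets.

```python
from collections import Counter

CDRS = ["alice|bob|75", "alice|carol|20", "dave|erin|130"]

# Hypothetical configuration for one type of statistical report.
CONFIG = {
    "long_calls_per_caller": {
        "separator": "|",
        "fields": ["caller", "callee", "duration"],
        "min_duration": 60,  # stand-in for the statistical condition
    }
}


def preprocess(lines, separator, fields):
    """Split each CDR and convert it into a record (the statistics metadata)."""
    records = []
    for line in lines:
        record = dict(zip(fields, line.split(separator)))
        record["duration"] = int(record["duration"])
        records.append(record)
    return records


def run_statistics(lines, config):
    """Preprocess, apply the condition, and build one report per configured type."""
    reports = {}
    for name, cfg in config.items():
        metadata = preprocess(lines, cfg["separator"], cfg["fields"])
        matched = [r for r in metadata if r["duration"] >= cfg["min_duration"]]
        reports[name] = dict(Counter(r["caller"] for r in matched))
    return reports


reports = run_statistics(CDRS, CONFIG)
print(reports)
```

Adding a second report type in this sketch only requires a new entry in `CONFIG`, which mirrors the configuration-driven idea of the method.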
It should be noted that when a new kind of business statistic is added, administrators do not need to develop new statistics code. They only need to modify the preprocessing class information and statistical condition information accordingly, for example by adding or deleting preprocessing classes or modifying the logical operation expression in the statistical conditions. This is more convenient to implement and more flexible.
In this embodiment, the Spark platform obtains an original data stream on which statistics are to be computed; retrieves and parses a preset data statistics configuration file according to the original data stream, to obtain preprocessing class information and statistical condition information for each preset type of statistical report to be generated; preprocesses the original data stream according to the preprocessing class information, to obtain the corresponding statistics metadata; and computes statistics on the statistics metadata according to the statistical condition information, generating the corresponding statistical report from the statistical result. By placing a data statistics configuration file on the Spark platform and writing into it the preprocessing class information and statistical condition information of each preset type of statistical report to be generated, this embodiment performs data statistics on the Spark platform. Compared with the prior art, when the business or its statistical rules change, developers only need to modify the configuration information in the data statistics configuration file and do not need to develop new statistics code. This improves the flexibility of code that performs data statistics on the Spark platform and reduces the cost of developing and maintaining that code.
Further, referring to Fig. 4, Fig. 4 is a flow diagram of the second embodiment of the data statistics method of the present invention. Based on the embodiment shown in Fig. 2, the following steps may also be included before step S10:
Step S50: set the types of statistical report to be generated, and set the corresponding preprocessing class information and statistical condition information for each type of statistical report to be generated;
Step S60: save the settings to the data statistics configuration file.
In this embodiment, the Spark platform can display an interactive interface through which administrators configure the data statistics configuration file. Specifically, the Spark platform sets the types of statistical report to be generated according to the administrators' setting instructions, and sets the corresponding preprocessing class information and statistical condition information for each type. The types and number of statistical reports to be generated can be set flexibly by administrators according to business needs. For a group-buying business, for example, administrators can set up several types of statistical report, such as a report on the gender of group-buying users, a report on product page views, a report on transaction volume and a report on transaction times. For each statistical report to be generated, the corresponding preprocessing class information and statistical condition information must be set. The preprocessing class information represents the preprocessing configuration of the original data stream, including the names and number of the preprocessing classes, the preprocessing rules, the preprocessing separator and so on. The statistical condition information represents the operation logic configuration of the data statistics, including the operands, the operation expression and so on. The Spark platform then saves the settings to the data statistics configuration file.
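Steps S50 and S60 can be illustrated with the standard `configparser` module. The report names, keys and expressions below are hypothetical examples inspired by the group-buying scenario, and the configuration is written to an in-memory buffer rather than to a file on the platform so that the sketch stays self-contained.

```python
import configparser
import io


def build_config(report_settings: dict) -> str:
    """S50/S60: turn per-report settings into ini text ready to be saved."""
    parser = configparser.ConfigParser(interpolation=None)
    for report_type, settings in report_settings.items():
        parser[report_type] = settings  # one section per report type
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical settings for two group-buying reports.
settings = {
    "user_gender_report": {
        "preprocess_classes": "user_id,gender",
        "separator": ",",
        "expression": "gender == 'F'",
    },
    "transaction_volume_report": {
        "preprocess_classes": "order_id,amount",
        "separator": ",",
        "expression": "amount > 0",
    },
}

config_text = build_config(settings)

# Reading the text back shows that the saved settings survive a round trip.
reloaded = configparser.ConfigParser(interpolation=None)
reloaded.read_string(config_text)
print(reloaded.sections())
```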
This embodiment sets up different types of statistical report to be generated according to the type of business, and saves the preprocessing class information and statistical condition information of each statistical report to be generated in the data statistics configuration file, which provides the basis for the subsequent data statistics performed by the Spark platform.
Further, referring to Fig. 5, Fig. 5 is a flow diagram of the third embodiment of the data statistics method of the present invention. Based on the embodiment shown in Fig. 2, step S30 may include:
Step S31: divide the original data stream into several preprocessing classes according to the separator in the preprocessing class information;
Step S32: convert the format of the original data stream under each preprocessing class according to a preset format conversion rule, to obtain the corresponding statistics metadata.
Step S40 may include:
Step S41: perform a logical operation on the statistics metadata according to the logical operation expression in the statistical condition information;
Step S42: generate the corresponding statistical report from the result of the logical operation.
In this embodiment, the preprocessing class information includes a separator configured through a splitter (classifier). The Spark platform divides the original data stream into several preprocessing classes according to this separator and assigns a business logic name to each preprocessing class. It then converts the format of the original data stream under each preprocessing class according to the preset format conversion rule to obtain the corresponding statistics metadata. The format conversion rule may convert plain text into xml format, json format, form format or the like, so that the original data stream matches the parameter format required by the configured data statistics interface. After preprocessing, the original data stream serves as the statistics metadata for the subsequent statistical analysis.
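As a minimal sketch of steps S31 and S32, assuming json as the target format, the function below splits one plain-text record on the configured separator, pairs each value with a business logic name and serializes the result. The function name `to_statistics_metadata` and the class names `caller`, `callee` and `duration` are hypothetical.

```python
import json


def to_statistics_metadata(line: str, separator: str, class_names: list) -> str:
    """S31: split on the separator; S32: convert the named values to json."""
    values = line.split(separator)
    if len(values) != len(class_names):
        raise ValueError(
            f"expected {len(class_names)} values, got {len(values)}: {line!r}"
        )
    # Pair each value with the business logic name of its preprocessing class.
    return json.dumps(dict(zip(class_names, values)), sort_keys=True)


metadata = to_statistics_metadata("alice|bob|75", "|", ["caller", "callee", "duration"])
print(metadata)
```

Raising an error on a field-count mismatch is a design choice of this sketch; an implementation could equally discard or log such records.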
Further, the statistical condition information includes a logical operation expression. When analyzing the statistics metadata, the Spark platform performs a logical operation on the statistics metadata according to this expression. The supported syntax includes, but is not limited to, ||, &&, +, -, *, /, %, ==, !=, >, >=, <, <=, (), in and if ... else, and the result of the logical operation may be of a data type such as, but not limited to, int or string. The Spark platform then generates the corresponding statistical report from the result of the logical operation.
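One possible way to evaluate such expressions safely is sketched below, using Python's standard `ast` module instead of `eval`. This is an assumption about how an evaluator could be built, not the evaluator described by the source: it covers only a subset of the listed syntax (no if ... else), and it rewrites `&&` and `||` to Python's `and` and `or` by simple text replacement.

```python
import ast
import operator

_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Mod: operator.mod}
_COMPARE = {ast.Eq: operator.eq, ast.NotEq: operator.ne, ast.Gt: operator.gt,
            ast.GtE: operator.ge, ast.Lt: operator.lt, ast.LtE: operator.le,
            ast.In: lambda a, b: a in b}


def evaluate(expression: str, record: dict):
    """Evaluate a logical operation expression against one metadata record."""
    source = expression.replace("&&", " and ").replace("||", " or ").strip()
    return _eval(ast.parse(source, mode="eval").body, record)


def _eval(node, record):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return record[node.id]  # operands are looked up in the record
    if isinstance(node, (ast.List, ast.Tuple)):
        return [_eval(element, record) for element in node.elts]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_eval(node.left, record), _eval(node.right, record))
    if isinstance(node, ast.BoolOp):
        # Simplification: all operands are evaluated (no short-circuiting).
        values = [bool(_eval(value, record)) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare):
        left = _eval(node.left, record)
        for op, comparator in zip(node.ops, node.comparators):
            if type(op) not in _COMPARE:
                raise ValueError(f"unsupported comparison: {type(op).__name__}")
            right = _eval(comparator, record)
            if not _COMPARE[type(op)](left, right):
                return False
            left = right
        return True
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


record = {"duration": 75, "caller": "alice"}
print(evaluate("duration > 60 && caller in ['alice', 'bob']", record))
print(evaluate("duration % 60 + 1", record))
```

Walking the syntax tree and rejecting unknown node types keeps configuration authors from running arbitrary code, which is the main reason this sketch avoids `eval`.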
Further, after step S42, the method may also include:
Step S43: storing the generated statistical report form into the database of the Spark platform.
In this step, the Spark platform stores the generated statistical report form into the platform database, that is, it loads the statistical report form into the database so that administrative staff can view and analyze it.
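By way of illustration only, storing a generated statistical report form can be sketched as follows. The embodiment does not name the database used by the Spark platform, so the sketch substitutes an in-memory SQLite database from the Python standard library; the table name, column names, and report name are hypothetical.

```python
import sqlite3

def store_report(connection, report_name, rows):
    """Persist one statistical report form as (item, value) rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS statistical_report "
        "(report_name TEXT, item TEXT, value INTEGER)"
    )
    connection.executemany(
        "INSERT INTO statistical_report VALUES (?, ?, ?)",
        [(report_name, item, value) for item, value in rows],
    )
    connection.commit()

# SQLite stands in for the platform database in this sketch.
connection = sqlite3.connect(":memory:")
store_report(connection, "long_calls", [("1001", 2), ("1003", 5)])
stored = connection.execute(
    "SELECT item, value FROM statistical_report "
    "WHERE report_name = ? ORDER BY item",
    ("long_calls",),
).fetchall()
```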
Further, referring to Fig. 6, Fig. 6 is a flow diagram of the fourth embodiment of the data statistical approach of the present invention. Based on the embodiment shown in Fig. 2 above, after step S40, the method may also include:
Step S70: receiving a modification instruction from the user, and modifying the data statistics configuration file according to the modification instruction.
In the present embodiment, when the business or the statistical rules change, administrative staff do not need to develop new statistics code and only need to modify the data statistics configuration file. For example, according to the user's modification instruction, the Spark platform can add or delete pretreatment classes in the data statistics configuration file, or modify the logical operation expression in the statistical condition. This is convenient to implement and highly flexible. The Spark platform subsequently performs data statistics according to the modified data statistics configuration file.
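By way of illustration only, applying a modification instruction to an ini-format data statistics configuration file can be sketched as follows. The section names, option names, and the dictionary form of the instruction are all hypothetical; the embodiment does not define the layout of the configuration file or of the instruction.

```python
import configparser
import io

INITIAL = """
[pretreatment.call]
separator = |

[statistics.long_calls]
expression = duration > 60
"""

def apply_modification(config, instruction):
    """Apply one user modification instruction to the parsed configuration."""
    action = instruction["action"]
    if action == "add_class":
        config.add_section(instruction["section"])
        for key, value in instruction["options"].items():
            config.set(instruction["section"], key, value)
    elif action == "delete_class":
        config.remove_section(instruction["section"])
    elif action == "set_expression":
        config.set(instruction["section"], "expression", instruction["expression"])
    else:
        raise ValueError(f"unknown action: {action}")

# interpolation=None keeps "%" in expressions from being treated as a placeholder.
config = configparser.ConfigParser(interpolation=None)
config.read_string(INITIAL)
apply_modification(config, {"action": "add_class",
                            "section": "pretreatment.sms",
                            "options": {"separator": ","}})
apply_modification(config, {"action": "set_expression",
                            "section": "statistics.long_calls",
                            "expression": "duration % 60 > 30"})

buffer = io.StringIO()
config.write(buffer)   # the rewritten file content that later runs would read
```

No statistics code changes in this sketch; only the configuration content does, which is the point made in the paragraph above.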
The present invention also provides a data statistics device.
Referring to Fig. 7, Fig. 7 is a functional block diagram of one embodiment of the data statistics device of the present invention. The data statistics device of the present invention includes:
Obtaining module 10, for obtaining the original data stream to be counted;
The data statistics device of the present invention is applied to the Spark platform. Spark is a currently popular big data computing and statistics platform; by computing and aggregating big data, the Spark platform can implement various kinds of machine learning and data mining.
In the present embodiment, the obtaining module 10 first obtains the original data stream to be counted. As one implementation, the obtaining module 10 can send a message data request to a preset service message system and then receive the message data stream that the service message system returns in response to the request, which serves as the original data stream to be counted.
The above preset service message system can be Apache Kafka, a distributed publish-subscribe message system mainly used for processing active streaming data. Compared with conventional message systems, Apache Kafka differs in the following ways: 1) it is designed as a distributed system and is easy to scale out; 2) it provides high throughput for both publishing and subscribing; 3) it supports multiple subscribers and can automatically rebalance consumers on failure; 4) it persists messages to disk and can therefore be used for batch consumption. Because of these advantages, Apache Kafka is now widely used for message processing.
Specifically, the obtaining module 10 sends a message data request to Kafka, receives the message data stream that Kafka returns in response to the request, and then uses that message data stream as the original data stream to be counted. Of course, the preset service message system need not be Apache Kafka; it can also be a service message system such as RabbitMQ or Apache ActiveMQ, and can be set flexibly in a specific implementation.
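By way of illustration only, the interchangeability of service message systems can be sketched by having the obtaining step depend on a small interface rather than on one broker. The interface `MessageSource`, its `request` method, and the in-memory stand-in below are hypothetical; a real deployment would wrap an actual Kafka, RabbitMQ, or ActiveMQ client behind the same interface, and no real client API is shown here.

```python
from collections import deque
from typing import Iterator, List, Protocol

class MessageSource(Protocol):
    """Hypothetical interface that any service message system adapter would satisfy."""
    def request(self, max_messages: int) -> Iterator[str]: ...

class InMemorySource:
    """Stand-in for a real broker client, used only to keep the sketch runnable."""
    def __init__(self, messages):
        self._pending = deque(messages)

    def request(self, max_messages: int) -> Iterator[str]:
        # Return at most max_messages of the messages still pending.
        for _ in range(min(max_messages, len(self._pending))):
            yield self._pending.popleft()

def obtain_original_stream(source: MessageSource, max_messages: int = 100) -> List[str]:
    """Send a message data request and treat the reply as the original data stream."""
    return list(source.request(max_messages))

source = InMemorySource(["call|1001|35", "sms|1002|1", "call|1003|120"])
first_batch = obtain_original_stream(source, max_messages=2)
second_batch = obtain_original_stream(source, max_messages=2)
```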
Call parsing module 20, for calling and parsing a preset data statistics configuration file according to the original data stream, to obtain the pretreatment category information and statistical condition information of the statistical report forms to be generated of preset types;
In a specific implementation, different data statistics configuration files can be set in advance for different service message systems, and the configured data statistics configuration files are then stored in the Spark platform. The format of the data statistics configuration file can include ini, xml, and so on, and is not limited here.
After the original data stream is obtained, the call parsing module 20 calls the corresponding data statistics configuration file according to the original data stream and parses it, to obtain the pretreatment category information and statistical condition information of the statistical report forms to be generated of preset types. The types and number of statistical report forms to be generated can be set flexibly by administrative staff according to business needs. Each statistical report form to be generated has its own pretreatment category information and statistical condition information. The pretreatment category information represents the pretreatment configuration of the original data stream, including the names and number of pretreatment classes, the pretreatment rules, the pretreatment separator, and so on. The statistical condition information represents the operation logic configuration of the data statistics, including operands, operation expressions, and so on.
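By way of illustration only, parsing an ini-format data statistics configuration file into pretreatment category information and statistical condition information can be sketched as follows. The layout is an assumption: one `report.<name>` section per statistical report form to be generated, with hypothetical option names for the class name, separator, fields, operands, and expression.

```python
import configparser

CONFIG_TEXT = """
[report.long_calls]
pretreatment_class = call
separator = |
fields = user,duration
operands = duration
expression = duration > 60
"""

def parse_statistics_config(text):
    """Return, per report, its pretreatment information and statistical condition."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    reports = {}
    for section in parser.sections():
        if not section.startswith("report."):
            continue  # ignore sections that do not describe a report
        options = parser[section]
        reports[section[len("report."):]] = {
            "pretreatment": {
                "class_name": options["pretreatment_class"],
                "separator": options["separator"],
                "fields": options["fields"].split(","),
            },
            "condition": {
                "operands": options["operands"].split(","),
                "expression": options["expression"],
            },
        }
    return reports

reports = parse_statistics_config(CONFIG_TEXT)
```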
Preprocessing module 30, for pre-processing the original data stream according to the pretreatment category information, to obtain the corresponding statistics metadata;
The preprocessing module 30 pre-processes the original data stream according to the obtained pretreatment category information, to obtain the corresponding statistics metadata. Specifically, the preprocessing module 30 can first split the original data stream according to the separator in the pretreatment category information and then perform format conversion on the split original data stream, thereby obtaining the statistics metadata corresponding to the different types of statistical report forms to be generated. In addition, the pretreatment of the original data stream can also include logical checks, screening, cleaning, and the like on the data. These can be combined with the above pretreatment scheme so that the resulting statistics metadata is more concise and accurate, which facilitates subsequent statistical analysis.
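By way of illustration only, the additional logical check, screening, and cleaning can be sketched as follows. The concrete rules are assumptions chosen for the example: whitespace is trimmed, records missing a required field are dropped, and exact duplicates are removed. The embodiment does not state which checks are applied.

```python
def clean_records(records, required_fields):
    """Trim values, drop incomplete records, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        # Cleaning: normalize every value to a trimmed string.
        normalized = {key: str(value).strip() for key, value in record.items()}
        # Logical check: every required field must be present and non-empty.
        if any(not normalized.get(field) for field in required_fields):
            continue
        # Screening: skip records already seen after normalization.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned

records = [
    {"user": " 1001 ", "duration": "35"},
    {"user": "1001", "duration": "35"},   # duplicate once trimmed
    {"user": "", "duration": "10"},       # empty required field
    {"user": "1002"},                     # missing required field
]
cleaned = clean_records(records, required_fields=["user", "duration"])
```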
Statistical module 40, for counting the statistics metadata according to the statistical condition information and generating the corresponding statistical report form according to the statistical result.
After the statistics metadata is obtained, the statistical module 40 counts the obtained statistics metadata according to the statistical condition information and generates the corresponding statistical report form according to the statistical result. Specifically, the statistical module 40 can perform a logical operation on the statistics metadata according to the logical operation expression in the statistical condition information, and generate the corresponding statistical report form according to the logical operation result. The statistical condition can be set and modified flexibly; for example, the statistical condition can also include performing classification and marking on the statistics metadata.
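By way of illustration only, classification and marking followed by counting can be sketched as follows. The classification rule (a 60-unit duration threshold) and the record fields are hypothetical; in the embodiment the rule would come from the statistical condition information rather than from code.

```python
from collections import Counter

def build_report(metadata, classify):
    """Mark each record with a class label and count the records per label."""
    counts = Counter(classify(record) for record in metadata)
    return sorted(counts.items())

metadata = [{"duration": 35}, {"duration": 120}, {"duration": 75}]
report = build_report(
    metadata,
    classify=lambda record: "long" if record["duration"] > 60 else "short",
)
```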
Referring to Fig. 3, Fig. 3 is a flow diagram of statistical analysis of an original CDR (call detail record) in an embodiment of the present invention. The original CDR is the original data stream obtained from the service message system. The data statistics device first pre-processes the original CDR according to the pretreatment classes in the data statistics configuration file and generates a corresponding pretreatment report for each class, namely the statistics metadata. It then performs logical operations on the statistics metadata according to the statistical conditions in the data statistics configuration file to generate the corresponding statistical report forms. Finally, the statistical report forms are stored in the database of the Spark platform.
It should be noted that, when a new kind of business statistics is added, administrative staff do not need to develop new statistics code again and only need to make the corresponding modifications to the above pretreatment category information and statistical condition information, for example adding or deleting pretreatment classes or modifying the logical operation expression in the statistical condition. This is convenient to implement and highly flexible.
In the present embodiment, the obtaining module 10 obtains the original data stream to be counted; the call parsing module 20 calls and parses a preset data statistics configuration file according to the original data stream, to obtain the pretreatment category information and statistical condition information of the statistical report forms to be generated of preset types; the preprocessing module 30 pre-processes the original data stream according to the pretreatment category information, to obtain the corresponding statistics metadata; and the statistical module 40 counts the statistics metadata according to the statistical condition information and generates the corresponding statistical report form according to the statistical result. The present embodiment sets a data statistics configuration file in the Spark platform and writes into it the pretreatment category information and statistical condition information of the statistical report forms to be generated of preset types, thereby implementing data statistics based on the Spark platform. Compared with the prior art, when the business or the statistical rules change, developers only need to modify the configuration information in the data statistics configuration file and do not need to develop new statistics code. This improves the flexibility of the code that performs data statistics based on the Spark platform and reduces the development and maintenance cost of the code.
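By way of illustration only, the cooperation of the four modules can be sketched as a chain of four interchangeable callables. The function `run_statistics`, the inline stand-ins for each module, and the sample threshold are hypothetical and far simpler than a Spark job; the sketch only shows the order of the data flow.

```python
def run_statistics(obtain, parse_config, preprocess, count):
    """Chain the four modules: obtain, call and parse, pre-process, count."""
    stream = obtain()
    pretreatment_info, condition_info = parse_config(stream)
    metadata = preprocess(stream, pretreatment_info)
    return count(metadata, condition_info)

report = run_statistics(
    # Obtaining module: returns the original data stream.
    obtain=lambda: ["call|1001|35", "call|1003|120"],
    # Call parsing module: returns (pretreatment info, statistical condition info).
    parse_config=lambda stream: ({"separator": "|"}, {"threshold": 60}),
    # Preprocessing module: splits each record and keeps the duration as an int.
    preprocess=lambda stream, info: [
        int(line.split(info["separator"])[2]) for line in stream
    ],
    # Statistical module: counts the records that satisfy the condition.
    count=lambda metadata, condition: {
        "long_calls": sum(1 for duration in metadata if duration > condition["threshold"])
    },
)
```

Changing the business rule in this sketch means changing only what `parse_config` returns, which mirrors the configuration-driven design summarized above.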
The present invention also provides data statistics equipment.
The data statistics equipment of the present invention includes: a memory, a processor, and a data statistics program stored on the memory and runnable on the processor, the data statistics program implementing the following steps when executed by the processor:
Obtaining an original data stream to be counted;
Calling and parsing a preset data statistics configuration file according to the original data stream, to obtain the pretreatment category information and statistical condition information of the statistical report forms to be generated of preset types;
Pre-processing the original data stream according to the pretreatment category information, to obtain corresponding statistics metadata;
Counting the statistics metadata according to the statistical condition information, and generating a corresponding statistical report form according to the statistical result.
For the method implemented when the data statistics program running on the processor is executed, refer to the embodiments of the data statistical approach of the present invention; details are not repeated here.
The present invention also provides a computer readable storage medium.
A data statistics program is stored on the computer readable storage medium of the present invention, and the data statistics program implements the following steps when executed by a processor:
Obtaining an original data stream to be counted;
Calling and parsing a preset data statistics configuration file according to the original data stream, to obtain the pretreatment category information and statistical condition information of the statistical report forms to be generated of preset types;
Pre-processing the original data stream according to the pretreatment category information, to obtain corresponding statistics metadata;
Counting the statistics metadata according to the statistical condition information, and generating a corresponding statistical report form according to the statistical result.
For the method implemented when the data statistics program running on the processor is executed, refer to the embodiments of the data statistical approach of the present invention; details are not repeated here.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. In the absence of further restrictions, an element limited by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or system that includes that element.
The serial numbers of the above embodiments of the invention are for description only and do not represent the advantages or disadvantages of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A data statistical approach, characterized in that the data statistical approach includes the following steps:
Obtaining an original data stream to be counted;
Calling and parsing a preset data statistics configuration file according to the original data stream, to obtain the pretreatment category information and statistical condition information of the statistical report forms to be generated of preset types;
Pre-processing the original data stream according to the pretreatment category information, to obtain corresponding statistics metadata;
Counting the statistics metadata according to the statistical condition information, and generating a corresponding statistical report form according to the statistical result.
2. The data statistical approach as described in claim 1, characterized in that, before the step of obtaining the original data stream to be counted, the approach further includes:
Setting the types of statistical report forms to be generated, and setting corresponding pretreatment category information and statistical condition information for each type of statistical report form to be generated;
Saving the setting result into the data statistics configuration file.
3. The data statistical approach as described in claim 1, characterized in that the step of obtaining the original data stream to be counted includes:
Sending a message data request to a preset service message system;
Receiving the message data stream returned by the service message system based on the message data request, and using the message data stream as the original data stream to be counted.
4. The data statistical approach as claimed in any one of claims 1 to 3, characterized in that the step of pre-processing the original data stream according to the pretreatment category information to obtain corresponding statistics metadata includes:
Dividing the original data stream into several pretreatment classes according to the separator in the pretreatment category information;
Performing format conversion on the original data stream under each pretreatment class according to a preset format conversion rule, to obtain the corresponding statistics metadata.
5. The data statistical approach as claimed in claim 4, characterized in that the step of counting the statistics metadata according to the statistical condition information and generating a corresponding statistical report form according to the statistical result includes:
Performing a logical operation on the statistics metadata according to the logical operation expression in the statistical condition information;
Generating the corresponding statistical report form according to the logical operation result.
6. The data statistical approach as claimed in claim 5, characterized in that, after the step of generating the corresponding statistical report form according to the logical operation result, the approach further includes:
Storing the generated statistical report form into the database of the Spark platform.
7. The data statistical approach as described in claim 1, characterized in that the data statistical approach further includes:
Receiving a modification instruction from the user, and modifying the data statistics configuration file according to the modification instruction.
8. A data statistics device, characterized in that the data statistics device includes:
An obtaining module, for obtaining an original data stream to be counted;
A call parsing module, for calling and parsing a preset data statistics configuration file according to the original data stream, to obtain the pretreatment category information and statistical condition information of the statistical report forms to be generated of preset types;
A preprocessing module, for pre-processing the original data stream according to the pretreatment category information, to obtain corresponding statistics metadata;
A statistical module, for counting the statistics metadata according to the statistical condition information and generating a corresponding statistical report form according to the statistical result.
9. Data statistics equipment, characterized in that the data statistics equipment includes: a memory, a processor, and a data statistics program stored on the memory and runnable on the processor, the data statistics program implementing the following steps when executed by the processor:
Obtaining an original data stream to be counted;
Calling and parsing a preset data statistics configuration file according to the original data stream, to obtain the pretreatment category information and statistical condition information of the statistical report forms to be generated of preset types;
Pre-processing the original data stream according to the pretreatment category information, to obtain corresponding statistics metadata;
Counting the statistics metadata according to the statistical condition information, and generating a corresponding statistical report form according to the statistical result.
10. A computer readable storage medium, characterized in that a data statistics program is stored on the computer readable storage medium, and the data statistics program implements the following steps when executed by a processor:
Obtaining an original data stream to be counted;
Calling and parsing a preset data statistics configuration file according to the original data stream, to obtain the pretreatment category information and statistical condition information of the statistical report forms to be generated of preset types;
Pre-processing the original data stream according to the pretreatment category information, to obtain corresponding statistics metadata;
Counting the statistics metadata according to the statistical condition information, and generating a corresponding statistical report form according to the statistical result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710467826.2A CN109145027A (en) 2017-06-19 2017-06-19 Data statistical approach, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN109145027A (en) 2019-01-04

Family

ID=64804601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710467826.2A Pending CN109145027A (en) 2017-06-19 2017-06-19 Data statistical approach, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109145027A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination