Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module", "component", and "unit" may be used interchangeably.
The terminal may be implemented in various forms. For example, the terminal described in the present invention may include a mobile terminal such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, a wearable device, a smart band, a pedometer, and the like, and a fixed terminal such as a Digital TV, a desktop computer, and the like.
The following description takes a mobile terminal as an example. Those skilled in the art will understand that, except for elements used specifically for mobile purposes, the construction according to the embodiments of the present invention can also be applied to a fixed terminal.
Referring to fig. 1, which is a schematic diagram of a hardware structure of a mobile terminal for implementing various embodiments of the present invention, the mobile terminal 100 may include: an RF (Radio Frequency) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, and a power supply 111. Those skilled in the art will appreciate that the mobile terminal structure shown in fig. 1 does not limit the mobile terminal, which may include more or fewer components than those shown, combine some components, or use a different arrangement of components.
The following describes each component of the mobile terminal in detail with reference to fig. 1:
The radio frequency unit 101 may be configured to receive and transmit signals during information transmission and reception or during a call. Specifically, it receives downlink information from a base station and passes it to the processor 110 for processing, and it transmits uplink data to the base station. Typically, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 can communicate with a network and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplex-Long Term Evolution), and TDD-LTE (Time Division Duplex-Long Term Evolution).
WiFi is a short-range wireless transmission technology. Through the WiFi module 102, the mobile terminal can help a user send and receive e-mail, browse web pages, access streaming media, and the like, providing the user with wireless broadband internet access. Although fig. 1 shows the WiFi module 102, it is understood that it is not an essential component of the mobile terminal and may be omitted as needed without changing the essence of the invention.
The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the WiFi module 102 or stored in the memory 109 into an audio signal and output as sound when the mobile terminal 100 is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output unit 103 may also provide audio output related to a specific function performed by the mobile terminal 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 103 may include a speaker, a buzzer, and the like.
The A/V input unit 104 is used to receive audio or video signals. The A/V input unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 106, stored in the memory 109 (or another storage medium), or transmitted via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 may receive sound in a phone call mode, a recording mode, a voice recognition mode, or the like, and may process such sound into audio data. In the phone call mode, the processed audio (voice) data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 101 and then output. The microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the course of receiving and transmitting audio signals.
The mobile terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 1061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 1061 and/or a backlight when the mobile terminal 100 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
The display unit 106 is used to display information input by a user or information provided to the user. The display unit 106 may include a display panel 1061, which may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like.
The user input unit 107 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile terminal. Specifically, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, may collect touch operations performed by a user on or near it (e.g., operations performed on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory) and drive a corresponding connection device according to a predetermined program. The touch panel 1071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 110, and can receive and execute commands sent by the processor 110. The touch panel 1071 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. The other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
Further, the touch panel 1071 may cover the display panel 1061, and when the touch panel 1071 detects a touch operation thereon or nearby, the touch panel 1071 transmits the touch operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although the touch panel 1071 and the display panel 1061 are shown in fig. 1 as two separate components to implement the input and output functions of the mobile terminal, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the mobile terminal, and is not limited herein.
The interface unit 108 serves as an interface through which at least one external device is connected to the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the mobile terminal 100 or may be used to transmit data between the mobile terminal 100 and external devices.
The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created through use of the mobile terminal (such as audio data, a phonebook, etc.). Further, the memory 109 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 110 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by operating or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby performing overall monitoring of the mobile terminal. Processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The mobile terminal 100 may further include a power supply 111 (e.g., a battery) for supplying power to various components, and preferably, the power supply 111 may be logically connected to the processor 110 via a power management system, so as to manage charging, discharging, and power consumption management functions via the power management system.
Although not shown in fig. 1, the mobile terminal 100 may further include a bluetooth module or the like, which is not described in detail herein.
In order to facilitate understanding of the embodiments of the present invention, a communication network system on which the mobile terminal of the present invention is based is described below.
Referring to fig. 2, fig. 2 is an architecture diagram of a communication Network system according to an embodiment of the present invention, where the communication Network system is an LTE system of a universal mobile telecommunications technology, and the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an IP service 204 of an operator, which are in communication connection in sequence.
Specifically, the UE201 may be the terminal 100 described above, and is not described herein again.
The E-UTRAN202 includes eNodeB2021 and other eNodeBs 2022, among others. Among them, the eNodeB2021 may be connected with other eNodeB2022 through backhaul (e.g., X2 interface), the eNodeB2021 is connected to the EPC203, and the eNodeB2021 may provide the UE201 access to the EPC 203.
The EPC203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving Gateway) 2034, a PGW (PDN Gateway) 2035, a PCRF (Policy and Charging Rules Function) 2036, and the like. The MME2031 is a control node that handles signaling between the UE201 and the EPC203 and provides bearer and connection management. The HSS2032 provides registers, such as a home location register (not shown), and holds subscriber-specific information about service characteristics, data rates, and the like. All user data may be sent through the SGW2034. The PGW2035 may provide IP address assignment and other functions for the UE201. The PCRF2036 is the policy and charging control decision point for service data flows and IP bearer resources; it selects and provides available policy and charging control decisions for a policy and charging enforcement function (not shown).
The IP services 204 may include the internet, intranets, IMS (IP Multimedia Subsystem), or other IP services, among others.
Although the LTE system is described as an example, it should be understood by those skilled in the art that the present invention is not limited to the LTE system, but may also be applied to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems.
Based on the above mobile terminal hardware structure and communication network system, the present invention provides various embodiments of the method.
As shown in fig. 3, a first embodiment of the present invention provides a method for data collection auditing, which includes the following steps:
Step S1: collecting audit data: performing daily audit statistics on the number of times each event is generated on the client to obtain corresponding audit data, and storing the audit data locally.
Specifically, each time a system preset event or a user-defined event is generated on the client, the number of occurrences of each event is counted by event name, and the result is written into local storage for the corresponding date and the corresponding user. The system preset events may include events such as startup, session, and install. A session event is a special type: it is generated only at the next startup, and the time of that startup is indeterminate (it may not fall on the same day). For this type of data, the audit data needs to be reported again to update the historical audit data for the day of the event.
The custom event is an event newly defined by each software client for performing relevant data statistics, such as a song requesting behavior of a music player, address search of map software, and the like.
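By way of a non-limiting illustration, the per-event counting of step S1 might be sketched as follows. The class name `LocalAuditStore`, its methods, and the in-memory dictionary standing in for the client's local storage are assumptions made for this sketch only and are not part of the described embodiment.

```python
from collections import defaultdict
from datetime import date


class LocalAuditStore:
    """In-memory stand-in for the client's local audit storage (hypothetical)."""

    def __init__(self):
        # (user_id, day) -> {event_name: count}
        self._counts = defaultdict(lambda: defaultdict(int))

    def record(self, user_id, event_name, day):
        """Increment the counter for one generated event, keyed by event name."""
        self._counts[(user_id, day)][event_name] += 1

    def audit_data(self, user_id, day):
        """Return the per-event counts stored for a user on a given date."""
        return dict(self._counts[(user_id, day)])


store = LocalAuditStore()
d = date(2017, 2, 14)
store.record("user-1", "startup", d)
store.record("user-1", "session", d)
store.record("user-1", "song_request", d)  # a custom event (name is illustrative)
# A session that is only generated at the next startup is still counted
# against the day it belongs to, so that day's audit data changes and
# would need to be reported again.
store.record("user-1", "session", d)
```

Keying the counters by user and date keeps a late session event attached to the day whose historical audit data it updates.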
Step S2: reporting audit data: based on the client's existing reporting policy for daily data, packaging the audit data together with the client's daily data and sending them to the data acquisition server side.
Specifically, the client already has a reporting policy (such as real-time, timed-interval, or batch reporting) under which it reports the generated daily data to the data acquisition server side. Because of the client's traffic and performance constraints, audit data cannot be sent frequently, so it may be packaged and sent along with the client's daily data under that existing policy. Generally, audit data is processed about 4-5 times a day, and a user uses the mobile terminal for at most about 12-15 hours a day. Therefore, audit data is merged at a timed interval of 3 hours, 4-5 times a day, and the merged audit data is packaged with the client's daily data and sent to the data acquisition server side.
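The merging and packaging of step S2 might be sketched as follows, purely as an illustration. The function names, the payload keys `daily` and `audit`, and the sample records are hypothetical; only the 3-hour interval and the 12-15 hour usage figures come from the description.

```python
from collections import Counter

MERGE_INTERVAL_HOURS = 3  # merge interval stated in the description


def merge_audit_records(records):
    """Sum per-event counts from several local audit records."""
    merged = Counter()
    for record in records:
        merged.update(record)  # adds counts for matching event names
    return dict(merged)


def build_payload(daily_data, pending_audit_records):
    """Package the merged audit data together with the client's daily data."""
    return {
        "daily": daily_data,
        "audit": merge_audit_records(pending_audit_records),
    }


def reports_per_day(usage_hours, interval_hours=MERGE_INTERVAL_HOURS):
    """Number of merged audit reports for a given number of usage hours."""
    return usage_hours // interval_hours


payload = build_payload(
    daily_data=[{"event": "startup"}, {"event": "session"}],
    pending_audit_records=[{"startup": 1, "session": 2}, {"session": 1}],
)
```

With 12 to 15 hours of use per day, `reports_per_day` yields 4 to 5 reports, which matches the frequency given above.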
Step S3: storing audit data: after the daily data and the audit data are separated on the data acquisition server side, storing them in partitions according to the reporting date.
Specifically, step S3 is performed as follows. First, on the data acquisition server side, the daily data and the audit data are separated by a Flume system and then stored, according to the reporting date, into the corresponding partitions of an HDFS system. Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating, and transmitting massive logs; it supports customizing various data senders in the log system to collect data, and it can perform simple processing on the data and write it to various (customizable) data receivers. HDFS, the Hadoop Distributed File System, is a distributed file system designed to run on commodity hardware; it is highly fault-tolerant, provides high-throughput data access, is well suited to applications with large data sets, and allows streaming access to the data in the file system.
Second, a daily statistical task may be set to import the audit data stored in the HDFS system into the corresponding partition of a Hive table according to the reporting date, so as to obtain a corresponding audit statistics daily table. Through the audit statistics daily table, the daily total audit data reported by the client can be queried quickly. Hive is a data warehouse infrastructure built on Hadoop that provides a series of tools for data extraction, transformation, and loading (ETL), and is a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Hive defines a simple SQL-like query language, called HQL, that allows users familiar with SQL to query data; the language also allows developers familiar with MapReduce to develop custom mappers and reducers for complex analysis that the built-in mappers and reducers cannot complete. Hive has no dedicated data format; it works well on top of Thrift, controls delimiters, and also allows users to specify data formats.
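As a rough, non-limiting sketch of the separation and date-partitioned storage in step S3, the following uses plain Python in place of Flume, HDFS, and Hive. The directory layout `/data/<kind>/ds=<date>`, the payload keys, and all function names are assumptions made for illustration and do not reflect any actual deployment.

```python
from collections import defaultdict


def split_payload(payload):
    """Separate the daily data and the audit data of one reported package."""
    return payload.get("daily", []), payload.get("audit", {})


def partition_path(kind, report_date):
    """Build a date-partitioned directory path (layout is hypothetical)."""
    return f"/data/{kind}/ds={report_date}"


def store_by_partition(payloads):
    """Group the separated records under their reporting-date partition."""
    partitions = defaultdict(list)
    for payload in payloads:
        daily, audit = split_payload(payload)
        ds = payload["report_date"]
        partitions[partition_path("daily", ds)].extend(daily)
        partitions[partition_path("audit", ds)].append(audit)
    return dict(partitions)


partitions = store_by_partition([
    {"report_date": "2017-02-14", "daily": [{"event": "startup"}], "audit": {"startup": 1}},
    {"report_date": "2017-02-14", "daily": [{"event": "session"}], "audit": {"session": 1}},
])
```

Keeping one partition per reporting date means a daily statistical task only has to read the partition for the date it is importing.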
Step S4: comparing the original audit data: setting a daily comparison task on the data acquisition server side to check the consistency between the audit data reported by the client on the previous day and the actual statistical data of the data acquisition server side, and issuing an early warning when they are inconsistent.
Specifically, the previous day's total audit data reported by the client can be queried quickly through the audit statistics daily table. In the daily timed comparison, the data acquisition server side compares the previous day's audit data in the audit statistics daily table with its own actual statistical data and issues an early warning when they are inconsistent. The data acquisition server side also counts the user behavior of the client: each time a system preset event or a user-defined event is generated on the client, the server side counts it synchronously from the data fed back, which yields the actual statistical data of the data acquisition server side. This actual statistical data often suffers data loss due to network problems, data acquisition timeouts, software bugs, and the like. If the comparison finds that the audit data in the audit statistics daily table is inconsistent with the actual statistical data, this indicates that data may have been lost on the client or that the client's SDK has a design defect. The early warning allows such data loss to be discovered in time so that the data can subsequently be repaired, and it helps uncover defects in the SDK design so that developers can fix them promptly.
For example, to audit the number of session records of the app store for February 14, 2017, the result of select count(audit_session) from audit where appid = … and ds = '2017-02-14' (without considering deduplication) may be compared with the result of select count(1) from events2 where appid = … and ds = '2017-02-14' and xwhat = 'session'. If the results are inconsistent, an early warning may be issued.
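The daily consistency comparison of step S4 might be sketched as follows, with an in-memory SQLite database standing in for the Hive tables. The table names `audit_daily` and `server_events`, their columns, the `appstore` identifier, and the sample rows are all hypothetical. Summing a stored per-event count on the audit side is likewise an assumption of this sketch, not the query of the embodiment.

```python
import sqlite3


def session_totals(conn, appid, ds):
    """Return (client-reported, server-counted) session totals for one day."""
    reported = conn.execute(
        "SELECT COALESCE(SUM(cnt), 0) FROM audit_daily "
        "WHERE appid = ? AND ds = ? AND event = 'session'",
        (appid, ds),
    ).fetchone()[0]
    counted = conn.execute(
        "SELECT COUNT(1) FROM server_events "
        "WHERE appid = ? AND ds = ? AND event = 'session'",
        (appid, ds),
    ).fetchone()[0]
    return reported, counted


def check_consistency(conn, appid, ds):
    """Return None when the totals agree, else an early-warning message."""
    reported, counted = session_totals(conn, appid, ds)
    if reported == counted:
        return None
    return f"{appid} {ds}: client reported {reported}, server counted {counted}"


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE audit_daily (appid TEXT, ds TEXT, event TEXT, cnt INTEGER);
    CREATE TABLE server_events (appid TEXT, ds TEXT, event TEXT);
    INSERT INTO audit_daily VALUES ('appstore', '2017-02-14', 'session', 3);
    INSERT INTO server_events VALUES ('appstore', '2017-02-14', 'session');
    INSERT INTO server_events VALUES ('appstore', '2017-02-14', 'session');
    """
)
warning = check_consistency(conn, "appstore", "2017-02-14")
```

In this sample the client reported three sessions while the server side counted two, so `warning` holds a message that a daily comparison task could forward as the early warning.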
Step S5: comparing the audit data after cleaning: performing deduplication on the audit data reported by the client on the previous day and on the actual statistical data of the server side respectively, then checking the two for consistency, and issuing an early warning when they are inconsistent.
Specifically, the audit data reported by the client on the previous day and the actual statistical data of the server side may each be further deduplicated and then compared for consistency, with an early warning issued when the results are inconsistent.
For example, to audit the active-user data of the app store for February 14, 2017, the number of sessions may be computed from that day's audit data of the app store after deduplication by user (each user counts as only one session), and the result compared with the data of the statistics background. If the results are inconsistent, an early warning may be issued.
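The deduplicated comparison of step S5 might be sketched as follows. The record shape with a `user_id` field and the function names are assumptions made for illustration.

```python
def active_users(session_records):
    """Deduplicate session records by user: each user counts only once."""
    return {record["user_id"] for record in session_records}


def compare_active_users(client_records, server_records):
    """Return None when the deduplicated counts agree, else a warning message."""
    client_count = len(active_users(client_records))
    server_count = len(active_users(server_records))
    if client_count == server_count:
        return None
    return f"active users differ: client {client_count}, server {server_count}"


client = [{"user_id": "u1"}, {"user_id": "u1"}, {"user_id": "u2"}]
server = [{"user_id": "u1"}, {"user_id": "u2"}, {"user_id": "u2"}]
```

Here both sides reduce to two active users, so `compare_active_users(client, server)` returns `None` even though the duplicated records differ between the two sides.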
As shown in fig. 4, a second embodiment of the present invention provides a system 300 for data collection and audit, where the system 300 includes an audit data collection module 310, an audit data reporting module 320, an audit data storage module 330, an original audit data comparison module 340, and a cleaned audit data comparison module 350.
The audit data collection module 310 is mainly used for performing daily data audit statistics on the generation times of each event on the client to obtain corresponding audit data, and performing local storage.
Specifically, each time a system preset event or a user-defined event is generated on the client, the audit data collection module 310 counts the number of occurrences of each event by event name and writes the result into local storage for the corresponding date and the corresponding user. The system preset events may include events such as startup, session, and install. A session event is a special type: it is generated only at the next startup, and the time of that startup is indeterminate (it may not fall on the same day). For this type of data, the audit data needs to be reported again to update the historical audit data for the day of the event. A custom event is an event newly defined by a software client for its own data statistics, such as a song request in a music player or an address search in map software.
The audit data reporting module 320 is mainly used for packaging and sending the audit data and the current daily data of the client to the data acquisition server side based on the reporting strategy of the current daily data of the client.
Specifically, the client already has a reporting policy (such as real-time, timed-interval, or batch reporting) under which it reports the generated daily data to the data acquisition server side. Because of the client's traffic and performance constraints, audit data cannot be sent frequently, so the audit data reporting module 320 may package and send the audit data along with the client's daily data under that existing policy. Generally, audit data is processed about 4-5 times a day, and a user uses the mobile terminal for at most about 12-15 hours a day. Therefore, the audit data reporting module 320 may merge the audit data at a timed interval of 3 hours, 4-5 times a day, package the merged audit data with the client's daily data, and send them to the data acquisition server side.
The audit data storage module 330 is mainly used for performing partition storage on the daily data and the audit data according to a reporting date after the daily data and the audit data are separated on the data acquisition server, and the audit data storage module 330 includes a storage processing unit 331 and a timing statistical unit 332.
Specifically, first, the storage processing unit 331 separates the daily data from the audit data on the data acquisition server side through a Flume system and then stores them, according to the reporting date, into the corresponding partitions of an HDFS system. Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating, and transmitting massive logs; it supports customizing various data senders in the log system to collect data, and it can perform simple processing on the data and write it to various (customizable) data receivers. HDFS, the Hadoop Distributed File System, is a distributed file system designed to run on commodity hardware; it is highly fault-tolerant, provides high-throughput data access, is well suited to applications with large data sets, and allows streaming access to the data in the file system.
Second, the timing statistical unit 332 may set a daily statistical task to import the audit data stored in the HDFS system into the corresponding partition of a Hive table according to the reporting date, so as to obtain a corresponding audit statistics daily table. Through the audit statistics daily table, the daily total audit data reported by the client can be queried quickly. Hive is a data warehouse infrastructure built on Hadoop that provides a series of tools for data extraction, transformation, and loading (ETL), and is a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Hive defines a simple SQL-like query language, called HQL, that allows users familiar with SQL to query data; the language also allows developers familiar with MapReduce to develop custom mappers and reducers for complex analysis that the built-in mappers and reducers cannot complete. Hive has no dedicated data format; it works well on top of Thrift, controls delimiters, and also allows users to specify data formats.
The original audit data comparison module 340 is mainly used for setting a daily comparison task on the data acquisition server, comparing the audit data reported by the client in the previous day with the actual statistical data of the data acquisition server, and giving an early warning when the comparison result is inconsistent.
Specifically, the previous day's total audit data reported by the client can be queried quickly through the audit statistics daily table. In the daily timed comparison on the data acquisition server side, the original audit data comparison module 340 compares the previous day's audit data in the audit statistics daily table with the actual statistical data of the data acquisition server side and issues an early warning when they are inconsistent. The data acquisition server side also counts the user behavior of the client: each time a system preset event or a user-defined event is generated on the client, the server side counts it synchronously from the data fed back, which yields the actual statistical data of the data acquisition server side. This actual statistical data often suffers data loss due to network problems, data acquisition timeouts, software bugs, and the like. If the original audit data comparison module 340 finds that the audit data in the audit statistics daily table is inconsistent with the actual statistical data, this indicates that data may have been lost on the client or that the client's SDK has a design defect. The early warning allows such data loss to be discovered in time so that the data can subsequently be repaired, and it helps uncover defects in the SDK design so that developers can fix them promptly.
For example, to audit the number of session records of the app store for February 14, 2017, the result of select count(audit_session) from audit where appid = … and ds = '2017-02-14' (without considering deduplication) may be compared with the result of select count(1) from events2 where appid = … and ds = '2017-02-14' and xwhat = 'session'. If the results are inconsistent, an early warning may be issued.
The cleaned audit data comparison module 350 is mainly used for performing deduplication on the audit data reported by the client on the previous day and on the actual statistical data of the data acquisition server side respectively, then checking the two for consistency, and issuing an early warning when they are inconsistent.
Specifically, the cleaned audit data comparison module 350 may further deduplicate the audit data reported by the client on the previous day and the actual statistical data of the server side, then compare the two for consistency, and issue an early warning when the results are inconsistent.
For example, to audit the active-user data of the app store for February 14, 2017, the number of sessions may be computed from that day's audit data of the app store after deduplication by user (each user counts as only one session), and the result compared with the data of the statistics background. If the results are inconsistent, an early warning may be issued.
The third embodiment of the present invention also provides a computer-readable storage medium, which stores one or more programs, which are executable by one or more processors to implement the following specific steps as shown in fig. 3:
Step S1: collecting audit data: performing daily data audit statistics on the number of times each event is generated on the client to obtain corresponding audit data, and storing the audit data locally.
Specifically, whenever a system preset event or a user-defined event is generated on the client, the number of occurrences of each event is counted by event name, and the counting result is written into local storage under the corresponding date and the corresponding user. The system preset events may include events such as startup, session, and install. The session event is a special type: it is only generated at the next startup, and the time of that startup is not fixed (it may not be on the same day). For this type of data, the audit data needs to be reported again to update the historical audit data of the day on which the event occurred. A custom event is an event newly defined by a software client for its own data statistics, such as a song request in a music player or an address search in map software.
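The counting in step S1 can be illustrated with a minimal sketch. It assumes an in-memory dictionary as a stand-in for the client's local storage; the class name `AuditCounter`, its methods, and the sample event name `song_request` are hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict
from datetime import date


class AuditCounter:
    """Counts event occurrences per (date, user, event name)."""

    def __init__(self):
        # Hypothetical in-memory stand-in for the client's local storage.
        self._counts = defaultdict(int)

    def record(self, day: date, user_id: str, event_name: str) -> None:
        # Called each time a preset or custom event is generated.
        self._counts[(day.isoformat(), user_id, event_name)] += 1

    def snapshot(self, day: date, user_id: str) -> dict:
        # Audit data for one user and one date: event name -> count.
        key_day = day.isoformat()
        return {
            event: n
            for (d, u, event), n in self._counts.items()
            if d == key_day and u == user_id
        }


counter = AuditCounter()
today = date(2017, 2, 14)
for name in ["startup", "session", "song_request", "song_request"]:
    counter.record(today, "user-1", name)

# A session that is only generated at the next startup is attributed to the
# earlier day, so that day's audit data changes and must be reported again.
counter.record(date(2017, 2, 13), "user-1", "session")
```

Keying the counts by date as well as by user is what allows a late session event to update the historical audit data of an earlier day without disturbing the current day's counts.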
Step S2: reporting audit data: based on the client's existing reporting strategy for daily data, packaging the audit data together with the client's daily data and sending it to the data acquisition server side.
Specifically, the client already has a reporting strategy (such as real-time, timed interval, or batch reporting) by which it reports the daily data it generates to the data acquisition server side. Because of the client's traffic and performance constraints, audit data cannot be sent frequently; instead, it is packaged with the client's daily data and sent to the data acquisition server side using the client's original reporting strategy. Generally, audit data is merged about 4-5 times a day: a user uses the mobile terminal for at most about 12-15 hours a day, so the audit data is merged at a timing interval of 3 hours, that is, 4-5 times per day, and the merged audit data is packaged with the client's daily data and sent to the data acquisition server side.
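The merging and piggybacked reporting of step S2 might look like the following sketch. Only the 3-hour interval comes from the description; the record layout `(hour_of_day, event_name, count)`, the JSON payload shape, and the function names are assumptions made for illustration.

```python
import json
from collections import Counter

MERGE_INTERVAL_HOURS = 3  # interval named in the description


def window_index(hour: int) -> int:
    # Maps an hour of the day (0-23) to its 3-hour merge window.
    return hour // MERGE_INTERVAL_HOURS


def merge_audit(records):
    """Merge (hour_of_day, event_name, count) tuples per 3-hour window."""
    merged = {}
    for hour, event, count in records:
        merged.setdefault(window_index(hour), Counter())[event] += count
    return merged


def build_payload(daily_events, audit_counts):
    # Hypothetical payload: audit data rides along with the daily data
    # under its own key, so no separate request is needed.
    return json.dumps(
        {"events": daily_events, "audit": dict(audit_counts)}, sort_keys=True
    )


records = [(9, "session", 1), (10, "startup", 1), (11, "session", 2), (13, "session", 1)]
merged = merge_audit(records)
payload = build_payload([{"event": "startup", "ts": "2017-02-14T10:00:00"}], merged[3])
```

With a usage day of 12-15 hours, a 3-hour window yields the 4-5 merges per day mentioned above, and each merged window adds only one small object to a report the client was going to send anyway.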
Step S3: storing audit data: after the daily data and the audit data are separated on the data acquisition server side, storing each of them in partitions according to the reporting date.
Specifically, the step of storing audit data proceeds as follows. First, the daily data and the audit data are separated on the data acquisition server side by a Flume system, and are then stored into the corresponding partitions of an HDFS system according to the reporting date. Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating, and transmitting massive logs; it supports customizing various data senders in the log system to collect data, and also provides the ability to perform simple processing on data and write it to various (customizable) data receivers. HDFS, the Hadoop Distributed File System, is a distributed file system designed to run on general-purpose (commodity) hardware; it is highly fault-tolerant, provides high-throughput data access, is well suited to applications on large-scale data sets, and supports streaming access to the data in the file system.
Secondly, a daily statistics task can be set up to import the audit data stored in the HDFS system into the corresponding partition of a Hive table according to the reporting date, yielding the corresponding audit statistics daily table. Hive is a data warehouse infrastructure built on Hadoop. It provides a set of tools for data extraction, transformation, and loading (ETL), and is a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Hive defines a simple SQL-like query language called HQL, which lets users familiar with SQL query the data; the language also lets developers familiar with MapReduce plug in custom mappers and reducers to handle complex analysis that the built-in mappers and reducers cannot complete. Hive has no dedicated data format; it works well on top of Thrift, controls delimiters, and allows users to specify data formats. Through the audit statistics daily table, the total audit data reported by the client each day can be queried quickly.
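The separation and date partitioning of step S3 can be sketched in plain Python, independent of Flume, HDFS, or Hive. The field names `type` and `report_date`, the root directory `/data`, and the `ds=<date>` directory layout are all assumptions chosen for illustration; they are not specified by the embodiment.

```python
from collections import defaultdict


def route(records, root="/data"):
    """Split mixed records into daily/audit streams, partitioned by date."""
    partitions = defaultdict(list)
    for rec in records:
        # Hypothetical marker field distinguishing audit data from daily data.
        kind = "audit" if rec.get("type") == "audit" else "daily"
        # One partition directory per stream and reporting date.
        path = f"{root}/{kind}/ds={rec['report_date']}"
        partitions[path].append(rec)
    return dict(partitions)


incoming = [
    {"type": "event", "report_date": "2017-02-14", "event": "startup"},
    {"type": "audit", "report_date": "2017-02-14", "event": "startup", "cnt": 1},
    {"type": "audit", "report_date": "2017-02-15", "event": "session", "cnt": 2},
]
partitions = route(incoming)
for path in sorted(partitions):
    print(path, len(partitions[path]))
```

Partitioning by reporting date means the daily statistics task only has to read one directory per day when it loads the audit statistics daily table.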
Step S4: comparing the original audit data: setting a daily comparison task on the data acquisition server side to compare the audit data reported by the client on the previous day with the actual statistical data of the data acquisition server side for consistency, and issuing an early warning when the comparison result is inconsistent.
Specifically, the total audit data reported by the client on the previous day can be queried quickly through the audit statistics daily table. During the scheduled daily comparison on the data acquisition server side, the previous day's audit data in the audit statistics daily table is checked against the actual statistical data of the data acquisition server side, and an early warning is issued when the two are inconsistent. The data acquisition server side also counts the user behavior of the client: each time a system preset event or a user-defined event is generated on the client, the server side counts it from the data fed back, and the result is the actual statistical data of the data acquisition server side. This actual statistical data often suffers data loss caused by network problems, data acquisition timeouts, software bugs, and the like. Therefore, if the previous day's audit data in the audit statistics daily table is found to be inconsistent with the actual statistical data of the data acquisition server side, this indicates that data loss may have occurred for the client or that the client's own SDK design is defective. The early warning allows such data loss to be discovered in time, so that the relevant data can subsequently be repaired, and it also helps the client find defects in its SDK design so that developers can carry out the corresponding maintenance in time.
For example, to audit the number of session records of the application store for February 14, 2017, the result of select count(audit_session) from audit where appid = ... and ds = '2017-02-14' (without considering deduplication) may be compared with the result of select count(1) from events2 where appid = ... and ds = '2017-02-14' and xwhat = 'session'. If the comparison results are inconsistent, an early warning can be issued.
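A comparison of this kind can be tried out with the following sketch, which uses Python's built-in `sqlite3` module as a stand-in for the Hive tables. The table names `audit_daily` and `server_events`, their columns, and the sample rows are hypothetical; in particular, the sketch assumes the audit table stores per-report counts that are summed, which may differ from the schema behind the queries above.

```python
import sqlite3


def compare_raw_counts(conn, appid, ds, event):
    """Return (audit_total, server_total, consistent) for one app, day, event."""
    audit_total = conn.execute(
        "SELECT COALESCE(SUM(cnt), 0) FROM audit_daily "
        "WHERE appid = ? AND ds = ? AND event = ?",
        (appid, ds, event),
    ).fetchone()[0]
    server_total = conn.execute(
        "SELECT COUNT(1) FROM server_events "
        "WHERE appid = ? AND ds = ? AND event = ?",
        (appid, ds, event),
    ).fetchone()[0]
    return audit_total, server_total, audit_total == server_total


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE audit_daily (appid TEXT, ds TEXT, event TEXT, cnt INTEGER);
    CREATE TABLE server_events (appid TEXT, ds TEXT, event TEXT);
    """
)
# Client-side audit data: two merged reports totalling five sessions.
conn.executemany(
    "INSERT INTO audit_daily VALUES (?, ?, ?, ?)",
    [("appstore", "2017-02-14", "session", 3), ("appstore", "2017-02-14", "session", 2)],
)
# Server-side data: only four session events arrived, so one was lost.
conn.executemany(
    "INSERT INTO server_events VALUES (?, ?, ?)",
    [("appstore", "2017-02-14", "session")] * 4,
)

audit_total, server_total, consistent = compare_raw_counts(
    conn, "appstore", "2017-02-14", "session"
)
if not consistent:
    # Stand-in for the early warning; a real system would notify an operator.
    print(f"WARNING: audit={audit_total} server={server_total}")
```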
Step S5: comparing the cleaned audit data: performing deduplication on the audit data reported by the client on the previous day and on the actual statistical data of the data acquisition server side, respectively, then comparing the two for consistency, and issuing an early warning when the comparison results are inconsistent.
Specifically, in addition to the comparison of the original data, both the previous day's audit data reported by the client and the actual statistical data of the data acquisition server side can be deduplicated before being compared, and an early warning is issued when the deduplicated results are inconsistent.
For example, to audit the active user data of the application store for February 14, 2017, the number of sessions in that day's audit data can be calculated after deduplication by user (each user counts as only one session), and the result is compared with the data of the statistics backend. If the comparison results are inconsistent, an early warning can be issued.
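The deduplicated comparison can be sketched as follows, assuming each record is a dictionary with hypothetical `user_id` and `event` fields. The consistency check compares the deduplicated counts as the example describes; the list of users missing on the server side is an extra diagnostic added here for illustration.

```python
def active_users(records):
    """Distinct user ids with at least one session record."""
    return {r["user_id"] for r in records if r["event"] == "session"}


def compare_active_users(audit_records, server_records):
    audit_users = active_users(audit_records)
    server_users = active_users(server_records)
    return {
        "audit": len(audit_users),
        "server": len(server_users),
        # Users seen in the audit data but absent from the server data.
        "missing_on_server": sorted(audit_users - server_users),
        "consistent": len(audit_users) == len(server_users),
    }


audit_records = [
    {"user_id": "u1", "event": "session"},
    {"user_id": "u1", "event": "session"},  # same user, counted once
    {"user_id": "u2", "event": "session"},
    {"user_id": "u3", "event": "session"},
]
server_records = [
    {"user_id": "u1", "event": "session"},
    {"user_id": "u2", "event": "session"},
]
result = compare_active_users(audit_records, server_records)
if not result["consistent"]:
    print("WARNING: active user counts differ:", result)
```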
The invention provides a method and a system for data acquisition and audit, and a computer-readable storage medium. A data acquisition audit mechanism is added at the client to perform daily data audit statistics on the number of times each event is generated on the client and obtain corresponding audit data. A daily comparison task is then set at the data acquisition server side to compare the audit data reported by the client on the previous day with the actual statistical data of the server side for consistency, and an early warning is issued when the comparison result is inconsistent. In this way, data loss affecting the client is discovered in time, the relevant data can subsequently be repaired according to the early warning, and the client is helped to find defects in its own SDK design so that developers can carry out the corresponding maintenance in time.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.