CN110597798A

CN110597798A - Data detection method based on Thrift

Info

Publication number: CN110597798A
Application number: CN201910873984.7A
Authority: CN
Inventors: 陈隽; 毛立花; 仇力; 符文俊; 周誉淼; 王家海
Original assignee: Shandong ICity Information Technology Co., Ltd.
Current assignee: Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date: 2019-09-17
Filing date: 2019-09-17
Publication date: 2019-12-20
Anticipated expiration: 2039-09-17
Also published as: CN110597798B

Abstract

The invention discloses a Thrift-based data detection method and relates to the technical field of data detection. The method comprises configuring, on the management platform where Thrift is located, a product line and a detection scheme for the product line data, and calling a Spark service or an SQL service to perform data detection according to that detection scheme. Quality detection is performed on the data in the various databases of each product line and the detection results are fed back, achieving quality control in the data production process. By setting relevant configuration rules and threshold conditions, detection reports generated for data that exceeds the early warning value are provided to quality control personnel in time so that the causes of errors can be analyzed, ultimately improving data quality and customer satisfaction.

Description

Data detection method based on Thrift

Technical Field

The invention discloses a data detection method based on Thrift, and relates to the technical field of data detection.

Background

With the spread of the internet and of digital terminal devices such as sensors, data of all kinds is growing explosively and exponentially, and collecting and processing that data has become a key requirement of the digital era. Because internet data is disordered, data processing becomes more difficult and complex for operation and maintenance personnel, and the acquired data cannot be mined effectively and in time to extract valuable content, so the point of generating massive data is lost. It is therefore important to check the integrity and consistency of data before processing and mining it, and at the same time to sort out the various indicators of the data and define an acceptable error range, namely an early warning value.

The invention provides a Thrift-based data detection method in which a product line and a detection scheme for the product line data are configured on the management platform where Thrift is located, and a Spark service or an SQL service is called to perform data detection according to that detection scheme. The data in the various databases of each product line can be subjected to quality detection and the detection results fed back, realizing quality control in the data production process. By setting relevant configuration rules and threshold conditions, a detection report generated for data that exceeds the early warning value can be provided to quality control personnel in time so that the causes of errors can be analyzed, ultimately improving data quality and customer satisfaction.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a Thrift-based data detection method that uses Thrift communication to perform integrity and consistency quality detection on the data in the various databases of each product line and to feed back the detection results for quality control in the data production process. Quality control personnel produce analysis reports on data that exceeds the early warning value and analyze the causes of errors, ultimately improving data quality and customer satisfaction.

The specific scheme provided by the invention is as follows:

A Thrift-based data detection method, in which a product line and a detection scheme for the product line data are configured on the management platform where Thrift is located, a Thrift calling interface is configured, and a Spark service or an SQL service is called to perform data detection according to the detection scheme of the product line data.

In the Thrift-based data detection method, integrity and consistency detection schemes for the product line data are configured, and the corresponding detection rules of the integrity and consistency detection schemes are configured separately.

In the Thrift-based data detection method, the corresponding detection rules of the integrity and consistency detection schemes are recorded through a configuration table.

In the consistency detection scheme of the Thrift-based data detection method, a configuration table cluster is used to represent configuration tables of the same type.

In the Thrift-based data detection method, a Spark service detection task is started according to the detection scheme of the product line data, the corresponding Spark interface is called through the Thrift calling interface, and a Spark task is generated to carry out data detection in Yarn-Cluster mode;

or an SQL service detection task is started according to the detection scheme of the product line data, the interface corresponding to the SQL service is called through the Thrift calling interface, and the SQL service carries out the data detection.

A Thrift-based data detection system comprises a management platform where Thrift is located,

on which a product line and a detection scheme for the product line data are configured and a Thrift calling interface is configured, and which calls a Spark service or an SQL service to perform data detection according to the detection scheme of the product line data.

The management platform in the Thrift-based data detection system configures integrity and consistency detection schemes for the product line data, and configures the corresponding detection rules of the integrity and consistency detection schemes separately.

The management platform in the Thrift-based data detection system records the corresponding detection rules of the integrity and consistency detection schemes through a configuration table.

The management platform in the Thrift-based data detection system starts a Spark service detection task according to the detection scheme of the product line data, calls the corresponding Spark interface through the Thrift calling interface, and generates a Spark task to perform data detection in Yarn-Cluster mode;

or the management platform starts an SQL service detection task according to the detection scheme of the product line data and calls the interface corresponding to the SQL service through the Thrift calling interface, and the SQL service performs the data detection.

The invention has the advantages that:

Drawings

FIG. 1 is a schematic flow chart of the operation of the system of the present invention;

FIG. 2 is a schematic diagram of the Yarn-Cluster mode of operation of the Spark service;

FIG. 3 is a schematic view of the detection mode of the present invention;

FIG. 4 is a schematic diagram of the remote invocation framework of the management platform where Thrift is located in the present invention.

Detailed Description

The invention provides a Thrift-based data detection method in which a product line and a detection scheme for the product line data are configured on the management platform where Thrift is located, a Thrift calling interface is configured at the same time, and a Spark service or an SQL service is called to perform data detection according to the detection scheme of the product line data.

The invention also provides a Thrift-based data detection system corresponding to the method, which comprises a management platform where Thrift is located.

The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.

Using the method of the invention, a product line can be created and configured on the management platform where Thrift is located: the relevant basic information is filled in, database resources are selected or added, and group members and a responsible person are selected. Depending on their permissions, users can configure and operate the newly created product line and can also manage and configure the other product lines they participate in on the management platform. A detection scheme for the product line data is then created, and its database and detection rules are configured. To meet diverse user requirements, the invention detects both integrity and consistency, and the corresponding detection rules of the integrity and consistency detection schemes are configured separately.

After the detection scheme is created, a detection task can be started. The detection task is created according to the detection scheme of the product line data, and parameters such as the detection period and the running mode are set. A detection task has five states: not started, to be executed, executing, stopped, and completed. Managing the configuration of the detection scheme separately from the setting of the task running period makes later operations such as task optimization easier for the user, and the five running states let the user know the running status of a task in time and decide on the next operation.

After a detection task is started, its state changes from not started to to be executed. The back end puts the task into a container of tasks to be detected and traverses the container at regular intervals; the implementation class corresponding to the SQL service or the Spark service is called through the interface configured through Thrift to compute the detection and generate a detection report. Reports are distinguished by task type into those of one-off tasks and those of periodic tasks; even while a periodic task is still executing, the report results can be viewed once the task has generated the detection report of its first period. The display of detection report results is therefore not limited to completed detection tasks, which improves working efficiency.
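The task lifecycle described above can be illustrated with a minimal Java sketch. It is not the patented implementation: the class, field and method names are hypothetical, the container is modelled as a plain list, and the detector function merely stands in for the Spark or SQL implementation class reached through Thrift.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch of the detection task lifecycle; all names here are illustrative. */
class DetectionTaskQueue {

    /** The five task states named in the description. */
    enum TaskState { NOT_STARTED, TO_BE_EXECUTED, EXECUTING, STOPPED, COMPLETED }

    /** Hypothetical task record: a one-off or periodic task and its reports. */
    static final class Task {
        final String name;
        final boolean periodic;
        TaskState state = TaskState.NOT_STARTED;
        final List<String> reports = new ArrayList<>();

        Task(String name, boolean periodic) {
            this.name = name;
            this.periodic = periodic;
        }
    }

    /** The "container" of tasks waiting to be detected. */
    private final List<Task> container = new ArrayList<>();

    /** Starting a task moves it from NOT_STARTED to TO_BE_EXECUTED. */
    void start(Task task) {
        if (task.state != TaskState.NOT_STARTED) {
            throw new IllegalStateException("already started: " + task.name);
        }
        task.state = TaskState.TO_BE_EXECUTED;
        container.add(task);
    }

    /** Stopping removes an unfinished task from the container. */
    void stop(Task task) {
        if (task.state == TaskState.COMPLETED) {
            return;
        }
        task.state = TaskState.STOPPED;
        container.remove(task);
    }

    /**
     * One timed traversal of the container. The detector stands in for the
     * Spark or SQL implementation class and returns a report.
     */
    void traverseOnce(Function<Task, String> detector) {
        for (Task task : new ArrayList<>(container)) {
            task.state = TaskState.EXECUTING;
            task.reports.add(detector.apply(task));
            if (!task.periodic) {
                // One-off tasks finish after a single report.
                task.state = TaskState.COMPLETED;
                container.remove(task);
            }
            // Periodic tasks stay EXECUTING; their reports are already readable.
        }
    }
}
```

In a running service, `traverseOnce` would typically be invoked on a timer, for example from a `java.util.concurrent.ScheduledExecutorService`; that wiring is left out to keep the sketch short.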

In the process, the corresponding detection rules of the integrity and consistency detection schemes are recorded through the configuration table. Before an integrity detection scheme is created, the required field detection rules are configured among the detection rules; the integrity detection scheme lets the user configure rules such as whether a field in a table may be empty and whether it may be repeated. The consistency detection scheme lets the user configure rules on whether the data of configuration tables of the same type is consistent across different time periods; during configuration, options for aggregating, counting or grouping over multiple fields are provided, which enriches the consistency detection reports. When a consistency detection scheme is created, an existing configuration table cluster is selected to represent the type of configuration table to be detected; if no suitable cluster exists, a new table cluster can be created as required and the user selects configuration tables of the same type for it. Adding configuration table clusters improves the user experience and reduces the probability of errors caused by selecting tables of different types.
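As an illustration of the integrity rules described above (whether a field may be empty, whether it may be repeated), the following Java sketch turns such rules into SQL queries that count violations. The rule structure, the class name and the choice of generated SQL are assumptions made for the example; the patent does not specify how its rules are evaluated.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: turn integrity rules into SQL checks. Names are illustrative. */
class IntegrityRuleSql {

    /** Hypothetical rule for one field: may it be empty, may it repeat? */
    static final class FieldRule {
        final String field;
        final boolean nullable;
        final boolean repeatable;

        FieldRule(String field, boolean nullable, boolean repeatable) {
            this.field = field;
            this.nullable = nullable;
            this.repeatable = repeatable;
        }
    }

    /** Reject anything that is not a plain identifier, since names are concatenated into SQL. */
    static String ident(String name) {
        if (name == null || !name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("not a plain identifier: " + name);
        }
        return name;
    }

    /** Each returned query counts violations; a result of 0 means the rule holds. */
    static List<String> buildChecks(String table, List<FieldRule> rules) {
        String t = ident(table);
        List<String> checks = new ArrayList<>();
        for (FieldRule rule : rules) {
            String f = ident(rule.field);
            if (!rule.nullable) {
                checks.add("SELECT COUNT(*) FROM " + t + " WHERE " + f + " IS NULL");
            }
            if (!rule.repeatable) {
                // Counts distinct non-null values that occur more than once.
                checks.add("SELECT COUNT(*) FROM (SELECT " + f + " FROM " + t
                        + " WHERE " + f + " IS NOT NULL GROUP BY " + f
                        + " HAVING COUNT(*) > 1) dup");
            }
        }
        return checks;
    }
}
```

Generating one counting query per rule keeps each report entry traceable to a single configured rule; a violation count above the configured early warning value would then be reported.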

In the process, the Thrift of the management platform defines the data types and services through an Interface Definition Language (IDL). The Thrift code compiler generates target-language code from the Thrift interface definition file; the method uses Java, and the generated code is responsible for implementing the RPC protocol layer and transport layer.
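To make the division of labour concrete, the sketch below hand-writes in plain Java what a protocol layer and a transport layer do for a single call: the caller encodes the method name and arguments into bytes, and the receiving side decodes them and dispatches to an implementation class. This is only an analogy. The byte format is not Thrift's wire format, no Thrift library is used, and the `DetectionService` interface with its `runDetection` method is hypothetical; with Thrift, the equivalent code is generated from the IDL file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hand-rolled illustration of RPC layers; NOT Thrift's actual wire format. */
class RpcLayersSketch {

    /** Hypothetical service contract; with Thrift it would come from the IDL file. */
    interface DetectionService {
        String runDetection(String schemeId, int periodMinutes);
    }

    /** Client side: the protocol encodes the call, the byte array plays the transport. */
    static byte[] encodeCall(String schemeId, int periodMinutes) {
        ByteArrayOutputStream transport = new ByteArrayOutputStream();
        try (DataOutputStream protocol = new DataOutputStream(transport)) {
            protocol.writeUTF("runDetection");
            protocol.writeUTF(schemeId);
            protocol.writeInt(periodMinutes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return transport.toByteArray();
    }

    /** Server side: decode the message and dispatch to the implementation class. */
    static String dispatch(byte[] message, DetectionService impl) {
        try (DataInputStream protocol =
                     new DataInputStream(new ByteArrayInputStream(message))) {
            String method = protocol.readUTF();
            if (!"runDetection".equals(method)) {
                throw new IllegalArgumentException("unknown method: " + method);
            }
            String schemeId = protocol.readUTF();
            int periodMinutes = protocol.readInt();
            return impl.runDetection(schemeId, periodMinutes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because this plumbing is generated, the service author only writes the interface definition and the implementation class, which is the arrangement the description relies on for both the Spark and SQL services.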

In the process, the Spark service comprises the implementation class of the corresponding interface in the Thrift interface project, configuration information such as the port provided for calls, and the project that, when the service is called, parses the task according to its configuration and generates the detection report through Spark computation and analysis; the cluster environment required for submitting Spark tasks is configured as well. To improve running speed and optimize the user experience, the Spark task generated by each detection task is submitted to the Yarn cluster for execution; that is, in the production environment Spark tasks are submitted in Yarn-Cluster mode (see FIG. 2). Because the machine hosting the Driver is selected at random for each submission, a surge in network card traffic on any single machine is effectively avoided.
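Submitting in Yarn-Cluster mode corresponds to the `--master yarn --deploy-mode cluster` options of `spark-submit`. The Java sketch below only assembles such a command line; the main class, jar path and task identifier are hypothetical placeholders, and the patent does not say how its service actually launches the job.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: assemble a spark-submit command for Yarn-Cluster mode. */
class SparkSubmitCommand {

    /**
     * Builds the argument list. mainClass, jarPath and taskId are
     * illustrative inputs, not values taken from the patent.
     */
    static List<String> build(String mainClass, String jarPath, String taskId) {
        List<String> command = new ArrayList<>();
        command.add("spark-submit");
        command.add("--master");
        command.add("yarn");
        // In cluster deploy mode the Driver runs inside the Yarn cluster,
        // not on the machine that submits the job.
        command.add("--deploy-mode");
        command.add("cluster");
        command.add("--class");
        command.add(mainClass);
        command.add(jarPath);
        // Application argument: which detection task to compute.
        command.add(taskId);
        return command;
    }
}
```

The resulting list could be handed to `java.lang.ProcessBuilder` to start the submission from the service process.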

In the process, the SQL service likewise comprises the implementation class of the corresponding interface in the Thrift interface project and configuration information such as the port provided for calls; the SQL service markedly improves SQL query efficiency for small data volumes. The method uses Thrift communication to detect the data in the various databases of each product line, either through Spark analysis with tasks submitted to the cluster in Yarn-Cluster mode or through SQL analysis, and to feed back the detection results. Integrity and consistency detection is thereby realized for data from multiple data sources, and data detection can be organized by product line, with different timed tasks set and corresponding detection reports generated for each.

The system can achieve the same effect, wherein a management platform of the system can be used as a client, a Spark service is used as a server, and an SQL service is used as another server to jointly realize the whole data detection service.
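The client and server roles can be sketched as a small routing table on the management platform side. Everything in the Java example below is an assumption made for illustration: the host names and ports, the `Engine` enum, and in particular the row-count rule in `chooseEngine`. The description only states that the SQL service is more efficient for small data volumes and that the engine follows the detection scheme; it does not describe an automatic choice.

```java
import java.util.EnumMap;
import java.util.Map;

/** Sketch: the management platform as a client of two detection servers. */
class DetectionRouter {

    /** The two back-end services named in the description. */
    enum Engine { SPARK, SQL }

    /** Hypothetical address of a Thrift server. */
    static final class Endpoint {
        final String host;
        final int port;

        Endpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    private final Map<Engine, Endpoint> servers = new EnumMap<>(Engine.class);

    /** Records where the server for an engine listens. */
    void register(Engine engine, Endpoint endpoint) {
        servers.put(engine, endpoint);
    }

    /**
     * ASSUMPTION: pick SQL for small tables and Spark otherwise.
     * The threshold is a made-up tuning knob, not part of the patent.
     */
    static Engine chooseEngine(long estimatedRows, long sqlRowLimit) {
        return estimatedRows <= sqlRowLimit ? Engine.SQL : Engine.SPARK;
    }

    /** Looks up the server that a detection task should be sent to. */
    Endpoint resolve(Engine engine) {
        Endpoint endpoint = servers.get(engine);
        if (endpoint == null) {
            throw new IllegalStateException("no server registered for " + engine);
        }
        return endpoint;
    }
}
```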

The user can create and configure a product line on the management platform where Thrift is located, fill in the relevant basic information, select or add database resources, and select group members and a responsible person. Depending on their permissions, users can configure and operate the newly created product line and can also manage and configure the other product lines they participate in on the management platform. The user also creates a detection scheme for the product line data and configures its database and detection rules. To meet diverse user requirements, the invention detects both integrity and consistency, and the corresponding detection rules of the integrity and consistency detection schemes are configured separately.

In the process, the management platform records the corresponding detection rules of the integrity and consistency detection schemes through the configuration table. Before creating an integrity detection scheme, the user can configure the required field detection rules among the detection rules; the integrity detection scheme lets the user configure rules such as whether a field in a table may be empty and whether it may be repeated. The consistency detection scheme lets the user configure rules on whether the data of configuration tables of the same type is consistent across different time periods; during configuration, options for aggregating, counting or grouping over multiple fields are provided, which enriches the consistency detection reports. When a consistency detection scheme is created, an existing configuration table cluster is selected to represent the type of configuration table to be detected; if no suitable cluster exists, a new table cluster can be created as required and the user selects configuration tables of the same type for it. Adding configuration table clusters improves the user experience and reduces the probability of errors caused by selecting tables of different types.
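A consistency rule of the kind described above, comparing grouped counts of the same type of table across two time periods against an early warning value, might be evaluated as in the following Java sketch. Using the relative deviation between periods as the measure is an assumption of the example, as are all class and method names; the patent does not define the comparison formula.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Sketch: compare per-group counts of two periods against a warning value. */
class ConsistencyCheck {

    /** Relative change between two periods; infinite if a group appears from nothing. */
    static double relativeDeviation(long previous, long current) {
        if (previous == 0) {
            return current == 0 ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return Math.abs((double) current - previous) / Math.abs((double) previous);
    }

    /**
     * Returns, in sorted order, the groups whose deviation exceeds the
     * early warning value. A group missing from one period counts as 0.
     */
    static List<String> flagGroups(Map<String, Long> previous,
                                   Map<String, Long> current,
                                   double warningValue) {
        TreeSet<String> groups = new TreeSet<>(previous.keySet());
        groups.addAll(current.keySet());
        List<String> flagged = new ArrayList<>();
        for (String group : groups) {
            long before = previous.getOrDefault(group, 0L);
            long after = current.getOrDefault(group, 0L);
            if (relativeDeviation(before, after) > warningValue) {
                flagged.add(group);
            }
        }
        return flagged;
    }
}
```

The two input maps would come from the grouping and counting queries configured in the scheme, one per time period; only the flagged groups need to appear in the report sent to quality control personnel.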

In the process, the Thrift of the management platform defines data types and services through an Interface Definition Language (IDL). The Thrift code compiler generates target-language code from the Thrift interface definition file; the system uses Java, and the generated code is responsible for implementing the RPC protocol layer and transport layer.

In the process, the SQL service likewise comprises the implementation class of the corresponding interface in the Thrift interface project and configuration information such as the port provided for calls; the SQL service markedly improves SQL query efficiency for small data volumes. The system of the invention uses Thrift communication to detect the data in the various databases of each product line, either through Spark analysis with tasks submitted to the cluster in Yarn-Cluster mode or through SQL analysis, and to feed back the detection results. Integrity and consistency detection is thereby realized for data from multiple data sources, and data detection can be organized by product line, with different timed tasks set and corresponding detection reports generated for each.

The above embodiments are merely preferred embodiments given to fully illustrate the present invention, and the scope of protection of the present invention is not limited to them. Equivalent substitutions or changes made by those skilled in the art on the basis of the present invention all fall within the scope of protection of the present invention, which is defined by the claims.

Claims

1. A data detection method based on Thrift is characterized in that a detection scheme of a product line and product line data of a management platform where the Thrift is located is configured, a Thrift calling interface is configured at the same time, and a Spark service or an SQL service is called according to the detection scheme of the product line data to perform data detection.

2. The method for Thrift-based data detection as claimed in claim 1, wherein integrity and consistency detection schemes for the product line data are configured, and corresponding detection rules of the integrity and consistency detection schemes are configured respectively.

3. The Thrift-based data detection method of claim 2, wherein the corresponding detection rules of the integrity and consistency detection scheme are recorded via the configuration table.

4. The method for Thrift-based data detection according to claim 3, wherein the consistency detection scheme utilizes configuration table clusters to represent the same type of configuration tables.

5. The data detection method based on the Thrift according to any one of claims 1 to 4, wherein a Spark service detection task is started according to a detection scheme of product line data, a Spark corresponding interface is called through a Thrift call interface, and a Spark task is generated to carry out data detection in a Yarn-Cluster mode;

6. A data detection system based on Thrift is characterized by comprising a management platform where the Thrift is located,

7. The Thrift-based data detection system of claim 6, wherein the management platform configures integrity and consistency detection schemes for the product line data, and configures corresponding detection rules of the integrity and consistency detection schemes, respectively.

8. The Thrift-based data detection system of claim 7, wherein the management platform records the corresponding detection rules of the integrity and consistency detection scheme via the configuration table.

9. The Thrift-based data detection system of claim 8, wherein the management platform records the corresponding detection rules of the integrity and consistency detection scheme via the configuration table.

10. The data detection system based on the Thrift as claimed in any one of claims 6 to 9, wherein the management platform starts a Spark service detection task according to a detection scheme of product line data, calls a Spark corresponding interface through a Thrift call interface, generates a Spark task, and performs data detection in a Yarn-Cluster mode;