CN117033405A - Processing method and device of data probe request, processor and electronic equipment

Info

Publication number: CN117033405A
Application number: CN202311029243.3A
Authority: CN
Inventors: 雷经纬; 徐嘉禛; 于子烨; 罗响
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-08-15
Filing date: 2023-08-15
Publication date: 2023-11-10

Abstract

The application discloses a method, an apparatus, a processor, and an electronic device for processing a data probe request (also called a data exploration request). The method is applied in the technical field of big data and comprises the following steps: when a data exploration request is received, determining a data exploration task according to the request source of the data exploration request; determining a data exploration algorithm according to the data exploration task and/or the data exploration request; determining a data exploration plan according to the data exploration algorithm and the data exploration task; and performing data exploration in a big data cluster according to the data exploration plan to obtain a data exploration result. The application addresses the problem in the related art that, because big data clusters hold huge data volumes with varied data structures, conventional data exploration tools designed only for specific data types and a certain data volume cannot meet the data exploration requirements of large-scale heterogeneous data.

Description

Processing method and device of data probe request, processor and electronic equipment

Technical Field

The present application relates to the field of big data technologies, and in particular, to a method, an apparatus, a processor, and an electronic device for processing a data probe request.

Background

As big data is applied ever more deeply and widely in the fintech industry, the industry urgently needs standardized data research, development, and operation. In the prior art, a financial institution investigates and preprocesses the composition and quality of its data in the following steps. First, basic statistics are computed on the data the business cares about, for example null-field statistics, extremum statistics, and data distribution statistics computed through a relational database. Next, the data is checked for missing and abnormal values, for example by using a pivot table or descriptive statistics to check whether content is missing and, if so, filling or deleting the missing values. Then, the consistency of the data is verified and corrected: according to the characteristics of the data and the service requirements, information such as a customer's name, address, and telephone number is kept consistent across the data stored by the different service systems of the financial institution. Finally, after data sampling preview, composition analysis, and quality detection are complete, data developers carry out data workflow development or data quality governance according to the data exploration report, so as to achieve a lean, large-scale, and sustainable service pipeline.

However, big data storage and computation in the fintech industry are usually supported by ultra-large-scale heterogeneous big data clusters, whereas conventional data exploration tools are often customized only for specific data types and data volumes. Such tools cannot meet the needs of large-scale heterogeneous data processing, so conventional data exploration tools and methods process data in batches inefficiently.

No effective solution has yet been proposed for the problem in the related art that, because big data clusters hold huge data volumes with varied data structures, conventional data exploration tools designed only for specific data types and a certain data volume cannot meet the data exploration requirements of large-scale heterogeneous data.

Disclosure of Invention

The main purpose of the application is to provide a method, an apparatus, a processor, and an electronic device for processing a data exploration request, so as to solve the problem in the related art that, because big data clusters hold huge data volumes with varied data structures, conventional data exploration tools designed only for specific data types and a certain data volume cannot meet the data exploration requirements of large-scale heterogeneous data.

To achieve the above object, according to one aspect of the present application, there is provided a method of processing a data probe request, the method comprising: under the condition that a data exploration request is received, determining a data exploration task according to a request source of the data exploration request, wherein the data exploration task is used for exploring abnormal data in service data or processing the service data; determining a data exploration algorithm according to the data exploration task and/or the data exploration request; determining a data exploration plan according to the data exploration algorithm and the data exploration task; and performing data exploration in the large data cluster according to the data exploration plan to obtain a data exploration result.

Further, determining the data probe task according to the request source of the data probe request includes: determining a data exploration range according to a request source of the data exploration request; obtaining target data information according to the data to be probed in the data probing range; and generating the data exploration task according to the target data information.

Further, the target data information includes at least the following: target cluster information of the data to be explored, target database information of the data to be explored, target data table information of the data to be explored, the submission time of the data exploration task, and running state information of the data exploration task.

Further, the request source of the data exploration request includes at least one of the following: a timed batch job and an asynchronous near-real-time query. Determining the data exploration range according to the request source of the data exploration request includes: in the case where the request source is the timed batch job, the data exploration range includes at least one of a hotspot table and a preset data table, where a hotspot table is a table whose access count exceeds a preset number; in the case where the request source is the asynchronous near-real-time query, the data exploration range includes at least the target data table, where the target data table is the data table that the data exploration request indicates is to be queried.
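As a non-limiting illustration, the scope rule above could be sketched in Python as follows. The names `RequestSource` and `determine_scope`, the sample table names, and the choice to take the union of hotspot tables and preset tables for a timed batch job are assumptions made for this sketch; the application only requires that the range include at least one of the two.

```python
from enum import Enum


class RequestSource(Enum):
    TIMED_BATCH_JOB = "timed_batch_job"
    ASYNC_NEAR_REAL_TIME_QUERY = "async_near_real_time_query"


def determine_scope(source, access_counts, preset_tables, hot_threshold,
                    requested_table=None):
    """Return the sorted list of tables inside the data exploration range."""
    if source is RequestSource.TIMED_BATCH_JOB:
        # A hotspot table is one whose access count exceeds the preset number.
        hot = {t for t, n in access_counts.items() if n > hot_threshold}
        # Assumption: the batch job covers hotspot tables and preset tables together.
        return sorted(hot | set(preset_tables))
    if source is RequestSource.ASYNC_NEAR_REAL_TIME_QUERY:
        if requested_table is None:
            raise ValueError("an asynchronous query must name a target table")
        return [requested_table]
    raise ValueError(f"unknown request source: {source!r}")


# Hypothetical access statistics for three tables.
access_counts = {"txn_detail": 1200, "branch_dim": 15, "customer": 640}
batch_scope = determine_scope(
    RequestSource.TIMED_BATCH_JOB, access_counts, ["audit_log"], hot_threshold=500
)
query_scope = determine_scope(
    RequestSource.ASYNC_NEAR_REAL_TIME_QUERY, access_counts, [], 500, "branch_dim"
)
```

Here `batch_scope` holds the two tables accessed more than 500 times plus the preset table, while `query_scope` holds only the table named by the query.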

Further, determining a data probe algorithm according to the data probe task and/or the data probe request comprises: determining a target database in which the data to be probed are located according to the target database information; under the condition that the target database is a structured database, determining that the data exploration algorithm is a natural language processing algorithm; determining the data exploration algorithm as an image processing algorithm under the condition that the data stored in the target database is picture data; in the case where the data stored in the target database is time-series related data, the data exploration algorithm is determined to be a time-series algorithm.
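As a non-limiting illustration, the three conditions above could be checked in sequence as in the following Python sketch. The function name, the string labels, and the order in which the conditions are tested are assumptions made for illustration; the application states only which algorithm family corresponds to each condition.

```python
def select_algorithm_by_data(database_kind, data_kind):
    """Pick an algorithm family from the database kind and the stored data kind."""
    # Conditions are tested in the order the text lists them (assumed precedence).
    if database_kind == "structured":
        return "natural_language_processing"
    if data_kind == "picture":
        return "image_processing"
    if data_kind == "time_series":
        return "time_series"
    raise ValueError(f"no algorithm rule for {database_kind!r} / {data_kind!r}")


# Hypothetical labels for two non-structured stores.
picture_choice = select_algorithm_by_data("object_store", "picture")
series_choice = select_algorithm_by_data("object_store", "time_series")
```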

Further, determining a data exploration algorithm according to the data exploration task and/or the data exploration request comprises: determining the service requirement of the data exploration task according to the data exploration request; in the case where the service requirement is to compile statistics on data or perform calculations on data, determining that the data exploration algorithm is a statistical algorithm; in the case where the service requirement is to predict a data trend or to build a data model, determining that the data exploration algorithm is a machine learning algorithm; and in the case where the service requirement is to mine data relationships, determining that the data exploration algorithm is a data mining algorithm.
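As a non-limiting illustration, the correspondence between service requirements and algorithm families can be held in a lookup table. The enum members and string labels below are hypothetical names chosen for this sketch.

```python
from enum import Enum


class Requirement(Enum):
    STATISTICS = "statistics"
    PREDICT_TREND = "predict_trend"
    BUILD_MODEL = "build_model"
    MINE_RELATIONSHIPS = "mine_relationships"


# One entry per service requirement named in the text.
ALGORITHM_BY_REQUIREMENT = {
    Requirement.STATISTICS: "statistical",
    Requirement.PREDICT_TREND: "machine_learning",
    Requirement.BUILD_MODEL: "machine_learning",
    Requirement.MINE_RELATIONSHIPS: "data_mining",
}


def select_algorithm_by_requirement(requirement):
    """Return the algorithm family for a service requirement."""
    return ALGORITHM_BY_REQUIREMENT[requirement]


trend_choice = select_algorithm_by_requirement(Requirement.PREDICT_TREND)
```

A table keeps the rule set in one place, so adding a new service requirement means adding one entry rather than another branch.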

Further, determining a data exploration plan in accordance with the data exploration algorithm and the data exploration task includes: determining the priority of the data exploration task according to the task information of the data exploration task; performing task splitting on the data exploration task according to the data exploration algorithm to obtain a plurality of subtasks; determining the dependency relationship among the plurality of subtasks and the dependency relationship between the data exploration task and other tasks to obtain a task dependency relationship; and determining the data exploration plan according to the priority of the data exploration task, the plurality of subtasks and the task dependency relationship.
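As a non-limiting illustration, the priority, the subtasks, and their dependency relationships can be combined into a plan by ordering the subtasks topologically. The sketch below uses the standard-library `graphlib` module (Python 3.9 or later); the class `ExplorationPlan`, the function `build_plan`, and the subtask names are assumptions, and dependencies on other tasks could be represented as additional nodes in the same mapping.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass
class ExplorationPlan:
    task_id: str
    priority: int
    ordered_subtasks: list


def build_plan(task_id, priority, dependencies):
    """Build a plan; `dependencies` maps each subtask to the subtasks it waits for."""
    # static_order() yields every subtask after all of its prerequisites
    # and raises graphlib.CycleError if the dependencies are circular.
    order = list(TopologicalSorter(dependencies).static_order())
    return ExplorationPlan(task_id, priority, order)


# Hypothetical subtasks obtained by splitting one data exploration task.
deps = {
    "sample_preview": set(),
    "null_statistics": {"sample_preview"},
    "extremum_statistics": {"sample_preview"},
    "report": {"null_statistics", "extremum_statistics"},
}
plan = build_plan("task-001", 1, deps)
```

In the resulting plan, `sample_preview` runs first and `report` runs last; the two statistics subtasks do not depend on each other and could run in parallel.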

Further, performing data exploration in the big data cluster according to the data exploration plan, and obtaining a data exploration result includes: determining a target cluster in the big data cluster, wherein the target cluster comprises static cluster resources and dynamic cluster resources, the static cluster resources have preset quantity of cluster resources, and the dynamic cluster resources adjust the quantity of the cluster resources owned by the target cluster according to the load condition of the target cluster and the quantity of the data exploration tasks to be executed; and processing the data exploration task according to the target cluster and the data exploration plan to obtain the data exploration result.
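As a non-limiting illustration, the split between static and dynamic cluster resources could be computed as below. The sizing formula, the parameter names, and the notion of a "resource unit" are assumptions made for this sketch; the application states only that the dynamic part is adjusted according to the load of the target cluster and the number of data exploration tasks to be executed.

```python
def target_cluster_capacity(static_units, max_dynamic_units, cluster_load,
                            pending_tasks, units_per_task=1):
    """Return total resource units: a fixed static part plus a dynamic part."""
    if not 0.0 <= cluster_load <= 1.0:
        raise ValueError("cluster_load must be between 0.0 and 1.0")
    # Units the pending tasks need beyond what the static pool already provides.
    wanted = max(0, pending_tasks * units_per_task - static_units)
    # Dynamic units that may be borrowed shrink as the cluster gets busier.
    headroom = int(max_dynamic_units * (1.0 - cluster_load))
    return static_units + min(wanted, headroom)


busy = target_cluster_capacity(4, 10, cluster_load=0.5, pending_tasks=12)
idle = target_cluster_capacity(4, 10, cluster_load=0.0, pending_tasks=2)
saturated = target_cluster_capacity(4, 10, cluster_load=1.0, pending_tasks=50)
```

With these inputs, a half-loaded cluster grants 5 extra units on top of the 4 static ones, an idle cluster with few tasks uses only the static pool, and a saturated cluster grants no dynamic units at all.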

Further, after processing the data exploration task in accordance with the target cluster and the data exploration plan, the method further comprises: and monitoring the execution state of the data exploration task, and sending reminding information to a first object under the condition that the data exploration task is in a preset state so as to remind the first object to process the data exploration task, wherein the first object refers to an operation and maintenance person.
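As a non-limiting illustration, the monitoring step could be sketched as a check over task states that calls a notification callback. Which states count as the "preset state" is not specified in the application; the set `ALERT_STATES`, the function `check_tasks`, and the message wording are assumptions.

```python
# Assumption: these states require attention from operation and maintenance staff.
ALERT_STATES = {"failed", "timed_out"}


def check_tasks(task_states, notify):
    """Send a reminder for every task in an alert state; return the alerted task IDs."""
    alerted = []
    for task_id, state in task_states.items():
        if state in ALERT_STATES:
            notify(f"Data exploration task {task_id} is in state '{state}'")
            alerted.append(task_id)
    return alerted


# A list stands in for the real notification channel in this sketch.
messages = []
alerted = check_tasks(
    {"t1": "running", "t2": "failed", "t3": "succeeded"}, messages.append
)
```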

Further, after performing data exploration in the big data cluster according to the data exploration plan to obtain a data exploration result, the method further comprises: receiving a query request of a target object for the data exploration result; determining query information according to the identity information of the target object and the query request; and inquiring the data exploration result according to the inquiry information to obtain an inquiry result, and sending the inquiry result to the target object.

Further, the query information includes information to be queried, a desensitization strategy and a visual display strategy, and the query is performed on the data exploration result according to the query information, and the obtaining of the query result includes: inquiring in the data exploration result to obtain the information to be inquired; desensitizing the information to be queried according to the desensitization strategy to obtain desensitized data; and calculating and processing the desensitized data according to the visual display strategy to obtain the query result.
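As a non-limiting illustration, the three stages above (query, desensitize, prepare for display) could be chained as follows. The masking rule, the per-group count used as a display series, and all function and field names are assumptions made for this sketch; the application does not prescribe a particular desensitization or visual display strategy.

```python
def mask_middle(value, keep_head=3, keep_tail=4):
    """Replace the middle of a string with asterisks; mask fully if it is too short."""
    if len(value) <= keep_head + keep_tail:
        return "*" * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + "*" * hidden + value[-keep_tail:]


def run_query(result_rows, wanted_fields, masked_fields, group_by):
    """Select fields, desensitize them, then aggregate for display."""
    # 1. Query: keep only the information to be queried.
    selected = [{f: row[f] for f in wanted_fields} for row in result_rows]
    # 2. Desensitization strategy: mask the sensitive fields.
    for row in selected:
        for f in masked_fields:
            row[f] = mask_middle(str(row[f]))
    # 3. Visual display strategy (assumed): count rows per group for a chart.
    counts = {}
    for row in selected:
        counts[row[group_by]] = counts.get(row[group_by], 0) + 1
    return {"rows": selected, "chart_series": counts}


# Hypothetical rows from a data exploration result.
rows = [
    {"branch": "A", "phone": "13800001234", "balance": 10},
    {"branch": "A", "phone": "13900005678", "balance": 20},
    {"branch": "B", "phone": "123", "balance": 30},
]
query_result = run_query(rows, ["branch", "phone"], ["phone"], group_by="branch")
```

Masking happens before aggregation, so no unmasked value reaches the display stage.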

To achieve the above object, according to another aspect of the present application, there is provided a processing apparatus for a data probe request, the apparatus comprising: a first determining unit, configured to determine a data probing task according to a request source of a data probing request in a case where the data probing request is received, where the data probing task is used for probing abnormal data in service data or is used for processing the service data; the second determining unit is used for determining a data exploration algorithm according to the data exploration task and/or the data exploration request; a third determining unit for determining a data exploration plan according to the data exploration algorithm and the data exploration task; and the exploration unit is used for carrying out data exploration in the big data cluster according to the data exploration plan to obtain a data exploration result.

Further, the first determination unit includes: a first determining subunit, configured to determine a data probing range according to a request source of the data probing request; the acquisition subunit is used for acquiring target data information according to the data to be probed in the data probing range; and the generating subunit is used for generating the data exploration task according to the target data information.

Further, the request source of the data exploration request includes at least one of the following: a timed batch job and an asynchronous near-real-time query. The first determining subunit is configured such that: in the case where the request source is the timed batch job, the data exploration range includes at least one of a hotspot table and a preset data table, where a hotspot table is a table whose access count exceeds a preset number; in the case where the request source is the asynchronous near-real-time query, the data exploration range includes at least the target data table, where the target data table is the data table that the data exploration request indicates is to be queried.

Further, the second determining unit includes: the second determining subunit is used for determining a target database in which the data to be probed are located according to the target database information; a third determining subunit, configured to determine that the data exploration algorithm is a natural language processing algorithm when the target database is a structured database; a fourth determining subunit, configured to determine that the data exploration algorithm is an image processing algorithm when the data stored in the target database is picture data; and a fifth determining subunit, configured to determine that the data exploration algorithm is a time sequence algorithm when the data stored in the target database is data related to time sequence.

Further, the second determining unit includes: a sixth determining subunit, configured to determine the service requirement of the data exploration task according to the data exploration request; a seventh determining subunit, configured to determine that the data exploration algorithm is a statistical algorithm in the case where the service requirement is to compile statistics on data or perform calculations on data; an eighth determining subunit, configured to determine that the data exploration algorithm is a machine learning algorithm in the case where the service requirement is to predict a data trend or to build a data model; and a ninth determining subunit, configured to determine that the data exploration algorithm is a data mining algorithm in the case where the service requirement is to mine data relationships.

Further, the third determination unit includes: a tenth determining subunit, configured to determine a priority of the data probing task according to task information of the data probing task; the splitting subunit is used for splitting the data exploration task according to the data exploration algorithm to obtain a plurality of subtasks; an eleventh determination subunit, configured to determine a dependency relationship between the plurality of subtasks and a dependency relationship between the data exploration task and other tasks, to obtain a task dependency relationship; a twelfth determining subunit, configured to determine the data exploration plan according to the priority of the data exploration task, the multiple subtasks, and the task dependency relationship.

Further, the probing unit includes: a thirteenth determining subunit, configured to determine a target cluster in the big data cluster, where the target cluster includes a static cluster resource and a dynamic cluster resource, where the static cluster resource has a preset number of cluster resource amounts, and the dynamic cluster resource adjusts the number of cluster resources owned by the target cluster according to a load situation of the target cluster and the number of data probing tasks to be executed; and the first processing subunit is used for processing the data exploration task according to the target cluster and the data exploration plan to obtain the data exploration result.

Further, the probing unit further comprises: and the sending subunit is used for monitoring the execution state of the data exploration task after the data exploration task is processed according to the target cluster and the data exploration plan, and sending reminding information to a first object to remind the first object to process the data exploration task under the condition that the data exploration task is in a preset state, wherein the first object refers to an operation and maintenance personnel.

Further, the apparatus further comprises: the receiving unit is used for receiving a query request of a target object for the data exploration result after the data exploration is carried out in the big data cluster according to the data exploration plan to obtain the data exploration result; a fourth determining unit, configured to determine query information according to the identity information of the target object and the query request; and the query unit is used for querying the data exploration result according to the query information to obtain a query result and sending the query result to the target object.

Further, the query information includes information to be queried, a desensitization strategy and a visual display strategy, and the query unit includes: the query subunit is used for querying in the data exploration result to obtain the information to be queried; the desensitization subunit is used for desensitizing the information to be queried according to the desensitization strategy to obtain desensitized data; and the second processing subunit is used for calculating and processing the desensitized data according to the visual display strategy to obtain the query result.

To achieve the above object, according to an aspect of the present application, there is provided a processor for executing a program, wherein the program executes a method for processing a data probe request as described in any one of the above.

To achieve the above object, according to one aspect of the present application, there is provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for processing a data probe request according to any one of the above.

The application adopts the following steps: when a data exploration request is received, determining a data exploration task according to the request source of the data exploration request, where the data exploration task is used to explore abnormal data in service data or to process the service data; determining a data exploration algorithm according to the data exploration task and/or the data exploration request; determining a data exploration plan according to the data exploration algorithm and the data exploration task; and performing data exploration in a big data cluster according to the data exploration plan to obtain a data exploration result. This solves the problem in the related art that, because big data clusters hold huge data volumes with varied data structures, conventional data exploration tools designed only for specific data types and a certain data volume cannot meet the data exploration requirements of large-scale heterogeneous data. Determining the request source of the data exploration request identifies the data to be explored and its data format, so that the specific task information and the data exploration algorithm of the data exploration task can be chosen according to the data format, which makes it possible to process large-scale heterogeneous data. Determining the data exploration plan lets the big data cluster schedule the data exploration task reasonably, which keeps the execution of the task from affecting the normal operation of the production environment, improves exploration efficiency, keeps the financial institution's normal business running stably, and improves the institution's working efficiency.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:

FIG. 1 is a flowchart of a method for processing a data probe request according to a first embodiment of the present application;

FIG. 2 is a schematic diagram I of an alternative method for processing a data probe request according to the first embodiment of the present application;

FIG. 3 is a schematic diagram II of an alternative method for processing a data probe request according to the first embodiment of the present application;

FIG. 4 is a schematic diagram of a processing device for a data probe request according to a second embodiment of the present application;

FIG. 5 is a schematic diagram I of an alternative data probe request processing system provided in accordance with a third embodiment of the present application;

FIG. 6 is a schematic diagram II of an alternative data probe request processing system provided in accordance with the third embodiment of the present application;

FIG. 7 is a schematic diagram III of an alternative data probe request processing system provided in accordance with the third embodiment of the present application;

FIG. 8 is a schematic diagram of an electronic device for processing a data probe request according to a sixth embodiment of the present application.

Detailed Description

It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.

It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, user information stored in a financial institution, etc.) and the data (including, but not limited to, data for analysis, stored data, displayed data, user data stored in a financial institution, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related country and region, and are provided with corresponding operation entries for the user to select authorization or rejection.

In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the application herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Embodiment 1

The present application is described below with reference to preferred implementation steps. FIG. 1 is a flowchart of a method for processing a data probe request according to a first embodiment of the present application. As shown in FIG. 1, the method includes the following steps:

Step S101, in the case where a data exploration request is received, determining a data exploration task according to the request source of the data exploration request, where the data exploration task is used to explore abnormal data in service data or to process service data.

In the first embodiment, data exploration refers to cleaning, processing, converting, analyzing, and mining the data in a big data cluster. Data exploration is mainly used to ensure the quality and accuracy of the data, and it can also help business personnel of a financial institution with operations such as decision making, financial risk prediction, and financial trend prediction. The data in the big data cluster is the large volume of transaction data generated when a financial institution transacts with its customers.

After the data exploration request is received, the data to be explored is determined according to the request source of the data exploration request, together with the specific operations to be performed on that data, which yields the data exploration task. The request source may be a timed task that checks data in batches in the big data cluster, or a task in which business personnel of a financial institution query transaction data in the big data cluster.

Step S102, determining a data exploration algorithm according to the data exploration task and/or the data exploration request.

In the first embodiment, the data exploration algorithm refers to an algorithm for processing data in the big data cluster. For example, to make a financial institution's product recommendations more accurate, a clustering algorithm may be used to cluster the customer information in the transaction data of the big data cluster into different customer types, and different products are then recommended to customers according to their type. As another example, to reduce the probability of bad debts when a financial institution lends to a customer, information such as the customer's liabilities and income can be modeled with a machine learning algorithm, and the model can predict the financial risk of each loan.

Step S103, determining a data exploration plan according to a data exploration algorithm and a data exploration task.

In the first embodiment, the data to be explored is determined according to the data exploration task, and the specific steps for executing the data exploration task are then determined according to the data exploration algorithm, which yields the data exploration plan. When there are multiple data exploration tasks, they can be sorted by urgency to determine the order in which they are executed.
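As a non-limiting illustration, ordering several data exploration tasks by urgency can be done with a priority queue. In the sketch below, a smaller urgency number means a more urgent task and ties are broken by submission order; both conventions, the function name, and the task names are assumptions.

```python
import heapq


def execution_order(tasks):
    """Order (urgency, task_id) pairs; the lowest urgency number runs first."""
    # The sequence number keeps tasks of equal urgency in submission order.
    heap = [(urgency, seq, task_id) for seq, (urgency, task_id) in enumerate(tasks)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, task_id = heapq.heappop(heap)
        order.append(task_id)
    return order


order = execution_order(
    [(2, "nightly_hot_tables"), (1, "ad_hoc_query"), (2, "preset_tables")]
)
```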

Step S104, data exploration is carried out in the big data cluster according to the data exploration plan, and a data exploration result is obtained.

In the first embodiment, to improve the efficiency of the data exploration task, cluster resources are allocated to the target cluster that executes the data exploration plan according to the load capacity of the big data cluster. The data exploration result is the result obtained after exploring the data in the big data cluster. Table 1 is an example of a data exploration result. As shown in Table 1, the data exploration result includes two tables, "data exploration result - table basic information" and "data exploration result - table field details". The "table basic information" table records the cluster information, database information, table information, and the like of the service data explored in the data exploration task, and the "table field details" table records the field information and the like of the explored service data.

Table 1: Example of a data exploration result
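As a non-limiting illustration, a result with a "table basic information" part and a "table field details" part could be produced by a simple profiling pass. The specific columns below (row count, null count, minimum, maximum) are assumptions borrowed from the statistics named in the Background section; they are not taken from the table of the application, and the function and sample names are hypothetical.

```python
def profile_table(cluster, database, table, rows):
    """Return (basic_info, field_details) for a list of row dictionaries."""
    basic_info = {
        "cluster": cluster,
        "database": database,
        "table": table,
        "row_count": len(rows),
    }
    field_names = sorted({name for row in rows for name in row})
    field_details = []
    for name in field_names:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        field_details.append({
            "field": name,
            "null_count": len(values) - len(present),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        })
    return basic_info, field_details


# Hypothetical transaction rows; None marks a missing value.
sample_rows = [
    {"amount": 120.0, "currency": "CNY"},
    {"amount": None, "currency": "CNY"},
    {"amount": 35.5, "currency": "USD"},
]
basic_info, field_details = profile_table(
    "cluster_a", "core_db", "txn_detail", sample_rows
)
```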

In summary, in the method for processing a data probe request according to the first embodiment of the present application, when a data exploration request is received, a data exploration task is determined according to the request source of the data exploration request, where the data exploration task is used to explore abnormal data in service data or to process service data; a data exploration algorithm is determined according to the data exploration task and/or the data exploration request; a data exploration plan is determined according to the data exploration algorithm and the data exploration task; and data exploration is performed in a big data cluster according to the data exploration plan to obtain a data exploration result. This solves the problem in the related art that, because big data clusters hold huge data volumes with varied data structures, conventional data exploration tools designed only for specific data types and a certain data volume cannot meet the data exploration requirements of large-scale heterogeneous data. Determining the request source of the data exploration request identifies the data to be explored and its data format, so that the specific task information and the data exploration algorithm of the data exploration task can be chosen according to the data format, which makes it possible to process large-scale heterogeneous data. Determining the data exploration plan lets the big data cluster schedule the data exploration task reasonably, which keeps the execution of the task from affecting the normal operation of the production environment, improves exploration efficiency, keeps the financial institution's normal business running stably, and improves the institution's working efficiency.

Optionally, in the method for processing a data probe request according to the first embodiment of the present application, determining a data probe task according to a request source of the data probe request includes: determining a data exploration range according to a request source of a data exploration request; obtaining target data information according to data to be probed in a data probing range; and generating a data exploration task according to the target data information.

In the first embodiment, the data probe range refers to a specific data table (e.g., a data table in a structured database) or specific data information (e.g., picture data, a graph in a graph database) related to data to be probed in the data probe request. The target data information is specific information of data to be probed in the data probe request, for example, database information, cluster information, data table information and the like where the data to be probed is located.

By analyzing the data probing range and the target data information of the data to be probed from the data probing request, the data structure of the data to be probed can be identified, and the probing task is then generated according to that data structure. This avoids the problem that traditional data probing tools cannot process data with different data structures when a big data cluster holds many kinds of data, and broadens the range of data to which data probing can be applied.

Optionally, in the method for processing a data probe request according to the first embodiment of the present application, the target data information includes at least the following information: target cluster information of data to be probed, target database information of the data to be probed, target data table information of the data to be probed, submitting time of a data probing task and running state information of the data probing task.

Specifically, table two is an example of the fields contained in the data probing task table in this scheme. The information contained in the target data information may be as shown in table two: the data exploration query ID, the data exploration target cluster name, the data exploration target database name, the data exploration target schema name, the data exploration target table name, the data exploration task submission time and the data exploration task running state. The query ID identifies the data exploration task; the target cluster name, target database name, target schema name and target table name indicate the storage position of the data to be explored; and the task submission time and task running state are task information of the data exploration task. To avoid mislocating the data to be probed, the data probe target cluster name and the data probe target database name must be unique identifiers.

Table two: Data exploration task table

By defining the data information contained in the target data information, the data structure of the data to be probed can be deeply analyzed, corresponding data probing tasks can be generated aiming at the data to be probed of different data structures, and the effect of probing heterogeneous data is achieved.
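The generation of a data exploration task from the target data information can be sketched as follows. This is a minimal illustration only: the class name `ProbeTask`, the function `generate_task`, the dictionary keys and the initial state name `SUBMITTED` are assumptions introduced here, and only the field list follows the description of the task table above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid


@dataclass
class ProbeTask:
    """One record of the data exploration task table (names are illustrative)."""
    target_cluster: str
    target_database: str
    target_schema: str
    target_table: str
    # Query ID, submission time and running state are filled in on creation.
    probe_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    submitted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    status: str = "SUBMITTED"  # assumed initial state name

    def location(self) -> str:
        """Fully qualified storage position of the data to be probed."""
        return ".".join([self.target_cluster, self.target_database,
                         self.target_schema, self.target_table])


def generate_task(target_info: dict) -> ProbeTask:
    """Build a task from target data information; reject incomplete input."""
    required = ("cluster", "database", "schema", "table")
    missing = [key for key in required if not target_info.get(key)]
    if missing:
        raise ValueError(f"missing target data information: {missing}")
    return ProbeTask(target_info["cluster"], target_info["database"],
                     target_info["schema"], target_info["table"])
```

Rejecting incomplete target data information at creation time is one way to reduce the risk of mislocating the data to be probed; other validation strategies are equally possible.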

Optionally, in the method for processing a data probe request according to the first embodiment of the present application, the request source of the data probe request includes at least one of the following: a timed batch job and an asynchronous near real-time query. Determining the data exploration scope according to the request source of the data probe request includes: in the case where the request source of the data probe request is a timed batch job, the data probe range includes at least one of a hot spot table and a preset data table, where a hot spot table is a table whose number of accesses exceeds a preset number; in the case where the request source of the data probe request is an asynchronous near real-time query, the data probe range includes at least a target data table, where the target data table is the data table that the data probe request indicates is to be queried.

In one embodiment, the source of the data probe request is a timed batch job or an asynchronous near real-time query.

A timed batch job is a batch inspection job for transaction data in the big data cluster. It is characterized by a large data exploration scale (for example, data exploration is performed on ten thousand tables), a low data timeliness requirement (for example, a timed batch job initiated on one day may return its exploration results the next day), and heavy use of cluster resources. The data probe range of a timed batch job is typically hot spot tables whose data changes frequently, including but not limited to: tables that the audit log statistics of the big data cluster show to be used frequently, tables with large data volumes detected by the daily routine scan of the big data cluster, data tables contained in a data probing whitelist configured according to service requirements, and tables whose data composition and data quality have been queried through the operation and maintenance platform. Table three is an example of the fields contained in a data probing whitelist.

Table three: Data exploration whitelist

An asynchronous quasi-real-time query is a data probe request initiated by business personnel (e.g., data engineers, data analysts, development engineers, test engineers, etc.) of a financial institution. It is characterized by a small data probe scale (e.g., a data analyst probing one or more tables), a high data timeliness requirement (e.g., a data analyst querying one table in real time and requiring the probe results to be returned as soon as possible), and light use of cluster resources. Because the data scale of an asynchronous quasi-real-time query is small, the business personnel explicitly specify the name of the table in which the data to be probed is located (i.e., the target data table) when initiating the query.

By analyzing the request source of the data probe request, the data probe range can be determined, so that the service requirement of the data probe request is met, and the problem of inaccurate data probe result caused by insufficient data to be probed is avoided.
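The determination of the data probe range from the request source can be sketched as follows. The function name `determine_scope`, the source labels and the default threshold of 100 accesses are assumptions made for illustration; the text only requires that a hot spot table be a table whose access count exceeds a preset number.

```python
from typing import Dict, Iterable, List, Optional

TIMED_BATCH = "timed_batch"            # assumed label for a timed batch job
ASYNC_QUERY = "async_near_real_time"   # assumed label for an asynchronous query


def determine_scope(source: str,
                    access_counts: Optional[Dict[str, int]] = None,
                    whitelist: Iterable[str] = (),
                    requested_tables: Iterable[str] = (),
                    hot_threshold: int = 100) -> List[str]:
    """Return the tables to probe for a request from the given source."""
    if source == TIMED_BATCH:
        counts = access_counts or {}
        # Hot spot tables: access count exceeds the preset number.
        hot = {table for table, n in counts.items() if n > hot_threshold}
        # Preset tables, e.g. those configured in a probing whitelist.
        return sorted(hot | set(whitelist))
    if source == ASYNC_QUERY:
        # The requester names the target tables; drop duplicates, keep order.
        tables = list(dict.fromkeys(requested_tables))
        if not tables:
            raise ValueError("an asynchronous query must name a target table")
        return tables
    raise ValueError(f"unknown request source: {source!r}")
```

For example, `determine_scope(TIMED_BATCH, {"a": 150, "b": 20}, whitelist=["c"])` selects table `a` as a hot spot table and table `c` from the whitelist, while table `b` is left out.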

Optionally, in the method for processing a data probe request according to the first embodiment of the present application, determining the data probe algorithm according to the data probe task and/or the data probe request includes: determining a target database in which the data to be probed are located according to the target database information; under the condition that the target database is a structured database, determining that the data exploration algorithm is a natural language processing algorithm; under the condition that the data stored in the target database is picture data, determining a data exploration algorithm as an image processing algorithm; in the case where the data stored in the target database is time-series related data, the data exploration algorithm is determined to be a time-series algorithm.

In the first embodiment, in order to obtain a data probing result that meets a data probing request, a cluster or a database where data to be probed is located may be determined according to target data information in a data probing task, so as to determine a data type of the data to be probed, and then a data probing algorithm used by the data probing task is determined according to the data type of the data to be probed. In addition, various data exploration algorithms may be combined to perform the data exploration tasks.

In particular, natural language processing algorithms may be employed when data exploration tasks are used to process and analyze data text in a structured data cluster, such as when classifying customer text or recognizing a customer's emotion.

When the data exploration task is used to process and analyze picture data, an image processing algorithm can be adopted, for example, to recognize a face captured by a camera, to recognize an object in a surveillance picture, or to track and locate a target in a surveillance video.

When the data exploration task is used for exploration of IOT clusters or operation and maintenance clusters or data with strong correlation to time sequence, a time sequence algorithm can be adopted, for example, when profit trends of financial products in one quarter are analyzed, the time sequence algorithm can be adopted for processing.

The data exploration algorithm is determined according to the data type of the data to be explored, and the data with different data structures can be processed by adopting the data exploration algorithm matched with the data to be explored, so that the effect of meeting the requirements of data exploration requests is achieved.
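The selection of a data exploration algorithm by data type, including the combination of several algorithms for one task, can be sketched as a simple lookup. The data type labels and algorithm labels below are hypothetical names chosen for this illustration; the mapping itself follows the three cases described above.

```python
from typing import Iterable, List

# Mapping from the data type of the data to be probed to an algorithm family.
ALGORITHM_BY_DATA_TYPE = {
    "structured": "natural_language_processing",
    "picture": "image_processing",
    "time_series": "time_series",
}


def select_algorithms(data_types: Iterable[str]) -> List[str]:
    """Return the algorithm families for the given data types, in order.

    Several data types may be given, since algorithms can be combined.
    """
    selected: List[str] = []
    for data_type in data_types:
        try:
            algorithm = ALGORITHM_BY_DATA_TYPE[data_type]
        except KeyError:
            raise ValueError(
                f"no exploration algorithm registered for {data_type!r}"
            ) from None
        if algorithm not in selected:
            selected.append(algorithm)
    return selected
```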

Optionally, in the method for processing a data probe request according to the first embodiment of the present application, determining the data probe algorithm according to the data probe task and/or the data probe request includes: determining the service requirement of the data exploration task according to the data exploration request; in the case where the service requirement is to compile statistics on data and calculate data, determining that the data exploration algorithm is a statistical algorithm; in the case where the service requirement is to predict a data trend or to build a data model, determining that the data exploration algorithm is a machine learning algorithm; in the case where the service requirement is to mine data relationships, determining that the data exploration algorithm is a data mining algorithm.

In the first embodiment, in order to obtain a data probe result satisfying the data probe request, in addition to determining a data probe algorithm according to the data type of the data to be probed, the data probe algorithm may be determined according to the service requirement of the data probe request, so as to ensure the accuracy and efficiency of data probing.

In particular, when the data exploration task is used to identify outliers and distribution conditions in the data to be explored, statistical algorithms may be employed, such as averaging, standard deviation, maximum, minimum, median, etc. of the field values in the target data table.
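A statistical algorithm of this kind can be sketched with the Python standard library. The function names and the rule of flagging values more than `z_threshold` standard deviations from the mean are assumptions made for this illustration; the text does not prescribe a particular outlier rule.

```python
import statistics
from typing import Dict, List, Optional, Sequence


def profile_field(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Basic statistics for one numeric field; None counts as a null value."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("field contains no non-null values")
    return {
        "null_count": len(values) - len(present),
        "mean": statistics.fmean(present),
        "std": statistics.pstdev(present),   # population standard deviation
        "min": min(present),
        "max": max(present),
        "median": statistics.median(present),
    }


def find_outliers(values: Sequence[Optional[float]],
                  z_threshold: float = 2.0) -> List[float]:
    """Flag values whose distance from the mean exceeds z_threshold * std."""
    stats = profile_field(values)
    if stats["std"] == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values
            if v is not None
            and abs(v - stats["mean"]) / stats["std"] > z_threshold]
```

A mean-based rule is sensitive to the very outliers it looks for; a median-based rule could be substituted where the field is heavily skewed.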

When the data exploration task is used for discovering potential relations and rules in the data, a data mining algorithm can be adopted, for example, a clustering algorithm or a classification algorithm in the data mining algorithm can be adopted when the clients are classified according to the client information of the clients; potential rules for customer and financial risk may also be mined based on customer information and transaction information for the customer.

When the data exploration task is used to build a data model and predict the behavior and trend of data, a machine learning algorithm may be employed, for example, to model the customer with supervised learning, unsupervised learning, semi-supervised learning, based on the customer's transaction information, to calculate the customer's risk level information, and predict the customer's ability to repay loans.

By analyzing the service requirement of the data exploration request from the actual production condition and determining the data exploration algorithm used by the data exploration task according to the service requirement, the data exploration result is consistent with the result required by the service personnel, and the accuracy and exploration efficiency of the data exploration result are improved.

Optionally, in the method for processing a data probe request according to the first embodiment of the present application, determining a data probe plan according to a data probe algorithm and a data probe task includes: determining the priority of the data exploration task according to the task information of the data exploration task; performing task splitting on the data exploration task according to a data exploration algorithm to obtain a plurality of subtasks; determining the dependency relationship among a plurality of subtasks and the dependency relationship between the data exploration task and other tasks to obtain task dependency relationship; and determining a data exploration plan according to the priority of the data exploration task, the plurality of subtasks and the task dependency relationship.

In the first embodiment, the task information of the data probing task may be a task type of the data probing task (i.e. the above-mentioned timed batch job and asynchronous quasi-real-time query), a cluster load condition of executing the data probing task, information such as an emergency degree of the data probing task, and an execution frequency of the data probing task (e.g. a period such as a day, a week, a month or a quarter).

In order to improve the exploration efficiency of the data exploration task and ensure the timeliness of the data exploration task, the priority of the data exploration task can be determined according to the task information of the data exploration task, the data exploration task is split into detailed subtasks, the task dependency relationship is defined, and a data exploration plan is generated.

Specifically, since the cluster resources of the target cluster that executes the data probing task may also be occupied by other tasks, the priority of the data probing task needs to be determined, so that the target cluster first executes the data probing tasks and other tasks with higher priority, and executes the data probing tasks and other tasks with lower priority when the cluster is idle or at a preset time, thereby ensuring the timeliness of the data probing tasks. For example, for asynchronous near real-time queries with higher timeliness requirements, the priority may be set to the highest priority, and for timed batch jobs with lower timeliness requirements, the priority may be set to a lower priority.

Then, the data exploration task can be disassembled into a stepwise specific query task according to the data exploration algorithm, for example, a preset task disassembly template is set, and the data exploration task is disassembled into three steps: acquiring data to be probed according to the target data information; cleaning and preprocessing data to be probed; and analyzing and calculating the preprocessed data to be probed by adopting a data probing algorithm to obtain a data probing result.

Next, for subtasks that have dependency relationships, the predecessor tasks that must be completed before each subtask is executed need to be determined, and the execution order of the subtasks is then determined, so as to obtain the task dependency relationship. In addition, since there may be dependency relationships between different data probing tasks, the execution order of the data probing tasks may also be determined according to the dependency relationships between them.

And finally, determining the execution sequence of a plurality of subtasks or the data exploration task according to the priority of the data exploration task, the plurality of subtasks and the task dependency relationship, and obtaining a data exploration plan.

By splitting the data exploration tasks and determining the execution sequence of the data exploration tasks, the timeliness of the data exploration tasks is improved, and therefore the working efficiency of the financial institution is improved.
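The construction of a data exploration plan from priorities, subtasks and dependencies can be sketched with a topological sort. The step names, the numeric priority values and the function names are assumptions for this illustration; the three-step split mirrors the task disassembly template described above, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Set, Tuple

# Lower number runs first (assumed convention).
PRIORITY = {"async_near_real_time": 0, "timed_batch": 1}
# The three-step template: acquire data, clean/preprocess, analyze.
STEPS = ("fetch", "clean", "analyze")

Node = Tuple[str, str]  # (task id, step name)


def split_task(task_id: str) -> Dict[Node, Set[Node]]:
    """Split one task into subtasks; each step depends on the previous step."""
    graph: Dict[Node, Set[Node]] = {(task_id, STEPS[0]): set()}
    for prev, cur in zip(STEPS, STEPS[1:]):
        graph[(task_id, cur)] = {(task_id, prev)}
    return graph


def build_plan(tasks: Dict[str, str],
               task_deps: Optional[Dict[str, Set[str]]] = None) -> List[Node]:
    """Order all subtasks. tasks maps task id to request source;
    task_deps maps a task id to the ids of tasks it must wait for."""
    graph: Dict[Node, Set[Node]] = {}
    for task_id in tasks:
        graph.update(split_task(task_id))
    for task_id, deps in (task_deps or {}).items():
        for dep in deps:
            # A task may start only after the task it depends on has finished.
            graph[(task_id, STEPS[0])].add((dep, STEPS[-1]))

    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    plan: List[Node] = []
    ready: List[Node] = []
    while sorter.is_active():
        ready.extend(sorter.get_ready())
        # Among runnable subtasks, pick the one from the highest-priority task.
        ready.sort(key=lambda n: (PRIORITY[tasks[n[0]]], n[0], STEPS.index(n[1])))
        node = ready.pop(0)
        plan.append(node)
        sorter.done(node)
    return plan
```

With no dependencies between tasks, all subtasks of an asynchronous query are ordered before those of a timed batch job; a declared dependency overrides priority where the two conflict.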

Optionally, in the method for processing a data probe request according to the first embodiment of the present application, performing data probing in a large data cluster according to a data probe plan, and obtaining a data probe result includes: determining a target cluster in a big data cluster, wherein the target cluster comprises static cluster resources and dynamic cluster resources, the static cluster resources have preset quantity of cluster resources, and the dynamic cluster resources adjust the quantity of the cluster resources owned by the target cluster according to the load condition of the target cluster and the quantity of data exploration tasks to be executed; and processing the data exploration task according to the target cluster and the data exploration plan to obtain a data exploration result.

In the first embodiment, in order to improve the execution efficiency of the data exploration task, two kinds of cluster resources, namely static cluster resources and dynamic cluster resources, may be applied for in the big data cluster. The amount of dynamic cluster resources can be adjusted dynamically (i.e., the target cluster is scaled out or in) by monitoring the load of the big data cluster and the task queue of data probing tasks in real time, which improves both the resource utilization of the big data cluster and the execution efficiency of the data probing tasks. The static cluster resources are cluster resources of a preset amount (for example, a preset number of cluster nodes or a preset capacity of memory) allocated by the big data cluster, and can be used to process data exploration tasks whose cluster resource requirements are relatively stable. In addition, appropriate resources may be allocated to a data probe task according to the request source of the data probe request and the priority of the data probe task; for example, more cluster nodes or more memory may be allocated to a timed batch job.

After cluster resources used by the data exploration task are allocated, the data exploration task is executed according to the data exploration plan, and a data exploration result is obtained. Because of the large variance in the data size of the data probe results from the different request sources, the data probe results may be distributed according to the request source of the data probe request. Specifically, for the data exploration results of the timing batch operation, the data exploration results can be written into a file, and transmitted back to a database in the form of the file for the related business personnel in the financial institution to inquire; for asynchronous near real-time queries initiated by business personnel at a financial institution, the data probe results may be communicated back to the business personnel in the form of structured data (e.g., a data table).

When the data exploration task is executed, key information of the execution process can be printed into a log file, and the data exploration result is also recorded in the log file so as to maintain the execution flow of the data exploration task later.

The dynamic cluster resources capable of flexibly adjusting the cluster resource quantity are arranged in the big data clusters, so that the cluster resources of the big data clusters can be elastically allocated according to actual production conditions, the utilization rate of the cluster resources and the execution efficiency of data exploration tasks are improved, meanwhile, by arranging the static cluster resources with fixed cluster resource quantity, part of data exploration tasks can be ensured to run stably, and further the stability and reliability of the data exploration tasks are ensured.
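The split between static and dynamic cluster resources can be sketched as a sizing rule. Every parameter here (tasks handled per node, the cap on dynamic nodes, the load ceiling) is a hypothetical value chosen for illustration, and the rule of not scaling out while the big data cluster is heavily loaded is one possible policy rather than the one the embodiment necessarily uses.

```python
import math
from typing import Tuple


def size_target_cluster(static_nodes: int,
                        queued_tasks: int,
                        cluster_load: float,
                        tasks_per_node: int = 4,
                        max_dynamic_nodes: int = 16,
                        load_ceiling: float = 0.8) -> Tuple[int, int]:
    """Return (static node count, dynamic node count) for the target cluster.

    cluster_load is the utilisation of the big data cluster in [0, 1].
    """
    if not 0.0 <= cluster_load <= 1.0:
        raise ValueError("cluster_load must be between 0 and 1")
    # Nodes needed to work through the queue of data exploration tasks.
    needed = math.ceil(queued_tasks / tasks_per_node)
    # Static nodes are always present; dynamic nodes cover the remainder.
    dynamic = max(0, needed - static_nodes)
    if cluster_load >= load_ceiling:
        # Do not scale out while the big data cluster itself is busy,
        # so that production workloads are not affected.
        dynamic = 0
    return static_nodes, min(dynamic, max_dynamic_nodes)
```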

Optionally, in the method for processing a data probe request according to the first embodiment of the present application, after processing a data probe task according to a target cluster and a data probe plan, the method further includes: and monitoring the execution state of the data exploration task, and sending reminding information to the first object under the condition that the data exploration task is in a preset state so as to remind the first object to process the data exploration task, wherein the first object is an operation and maintenance person.

In the first embodiment, while the data exploration task is being executed, its execution status can be monitored in real time, and when the data exploration task is in a preset state, the operation and maintenance personnel or the business personnel of the financial institution (i.e., the first object) are reminded to handle it accordingly. The preset state may be a state in which the data probing task is waiting, has timed out, has failed, or has encountered a data error.

In addition, the task information of the data exploration task can be adjusted according to its execution status; for example, the allocation of cluster resources can be adjusted, the data exploration plan modified, or the data exploration algorithm changed according to the resource consumption of the data exploration task.

The task execution condition of the data exploration task is monitored in real time, so that the data exploration task in an abnormal state (namely the preset state) can be processed in time, and the task information of the data exploration task can be adjusted in time, thereby ensuring the normal operation of the data exploration task and improving the execution efficiency of the data exploration task.
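The monitoring step can be sketched as a check over the current task states. The state names and the `notify` callback are assumptions for this illustration; in practice the callback could send a message to the operation and maintenance personnel through whatever channel is in use.

```python
from typing import Callable, Dict, List

# Preset states that require attention (state names are assumed).
ALERT_STATES = {"WAITING", "TIMEOUT", "FAILED", "DATA_ERROR"}


def check_tasks(states: Dict[str, str],
                notify: Callable[[str], None]) -> List[str]:
    """Send a reminder for every task in a preset state; return their ids."""
    flagged: List[str] = []
    for task_id, state in sorted(states.items()):
        if state in ALERT_STATES:
            notify(f"task {task_id} is in state {state}")
            flagged.append(task_id)
    return flagged
```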

Optionally, in the method for processing a data probe request according to the first embodiment of the present application, after performing data probing in a large data cluster according to a data probe plan to obtain a data probe result, the method further includes: receiving a query request of a target object on a data exploration result; determining query information according to the identity information of the target object and the query request; and inquiring the data exploration result according to the inquiry information to obtain an inquiry result, and sending the inquiry result to the target object.

After the data exploration result is obtained in the first embodiment, a query request of a business person (i.e., the target object) in the financial institution for the data exploration result can be received, and the query request of the target object is responded, and the query result is returned. The target object may be a person in a financial institution such as a data engineer, a data analyst, a test engineer, a data product manager, a development engineer, and a project manager.

Specifically, after receiving a query request of a target object, identity authentication needs to be performed on the target object, and identity information of the target object is determined. Then, determining data to be queried and data sources (such as test environment and production environment) of the data to be queried according to the query request, and determining query information of the data to be queried according to the authenticated identity information, wherein the query information can be information such as query requirements of the data to be queried, desensitization strategies of the data to be queried, visual display strategies of the data to be queried and the like. And finally, disassembling the query request into a specific SQL query statement, and carrying out asynchronous scheduling through the big data cluster to obtain a query result so as to avoid the blocking of the big data cluster caused by excessive query requests.

By verifying the identity information of the target object, the data returned by the query result can be determined according to the identity information, so that the problem of leakage of customer information in a financial institution is avoided, meanwhile, asynchronous scheduling is performed through the big data cluster, the problem of blocking of the big data cluster due to excessive query requests is avoided, and the stability and the working efficiency of the big data cluster are improved.

Optionally, in the method for processing a data probe request according to the first embodiment of the present application, the query information includes information to be queried, a desensitization policy, and a visual display policy, and the querying is performed on a data probe result according to the query information, where obtaining the query result includes: inquiring in the data exploration result to obtain information to be inquired; desensitizing the information to be queried according to a desensitization strategy to obtain desensitized data; and calculating and processing the desensitized data according to the visual display strategy to obtain a query result.

In the first embodiment, desensitizing and visualizing the data in the query result prevents leakage of customer information in the financial institution, improves the data security of the financial institution, and presents the query result to the target object more clearly and intuitively.

Specifically, the queried data (i.e. the information to be queried described above) is obtained according to the data source of the queried data.

Then, determining a desensitization strategy according to the authenticated identity information, and carrying out desensitization treatment on the information to be queried according to the desensitization strategy to obtain desensitized data. The desensitization method can be data shielding, data encryption, data disturbance and the like.

Next, the desensitized data is calculated and processed according to the determined visual display strategy to obtain the query result. Visual display strategies can be divided into three types. The first is a visual chart strategy, in which visual charts (such as line charts, bar charts, pie charts and scatter charts) are used to display the data distribution, data trends and relationships of the desensitized data intuitively, so as to help the target object with data analysis and decision making. The second is a multidimensional display strategy, in which suitable data dimensions are selected according to the query requirement of the target object and the data exploration results are displayed from different angles, so as to help the target object find rules and anomalies in the data, for example by displaying data distribution by time, region or user attributes. The third is an interactive strategy, in which an interactive data display interface is provided so that the target object can explore and analyze the data flexibly, for example through interactive query functions such as chart switching, data filtering and data aggregation.
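Desensitization by data shielding followed by aggregation for a chart can be sketched as follows. The role names, the field names, the per-role policy and the choice to keep only trailing characters visible are all hypothetical; the sketch covers only the data shielding method, not data encryption or data disturbance.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Per-role policy: field name -> number of trailing characters left visible.
# Roles, fields and values are hypothetical examples.
POLICY_BY_ROLE = {
    "data_analyst": {"name": 0, "phone": 4},
}
DEFAULT_POLICY = {"name": 0, "phone": 0}  # unknown roles see nothing in clear


def mask(value: str, keep: int = 4) -> str:
    """Replace all but the last `keep` characters with asterisks."""
    visible = value[len(value) - keep:] if 0 < keep < len(value) else ""
    return "*" * (len(value) - len(visible)) + visible


def desensitize(rows: List[Dict[str, str]], role: str) -> List[Dict[str, str]]:
    """Return masked copies of the rows according to the role's policy."""
    policy = POLICY_BY_ROLE.get(role, DEFAULT_POLICY)
    return [{key: mask(val, policy[key]) if key in policy else val
             for key, val in row.items()}
            for row in rows]


def chart_series(rows: List[Dict[str, str]],
                 dimension: str) -> List[Tuple[str, int]]:
    """Count rows per value of one dimension, e.g. as input for a bar chart."""
    return sorted(Counter(row[dimension] for row in rows).items())
```

Masking before aggregation means that the visualization step never handles clear-text customer fields, which matches the order of the steps described above.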

Alternatively, in the first embodiment, the flow of acquiring the probing result in this embodiment may be as shown in fig. 2. Step S201, a data probe request is received, and the flow starts. In step S202, it is determined whether the source of the data probe request is a timed batch operation or an asynchronous quasi-real-time query. Step S203, the data exploration scope is defined according to the request source, and the data exploration task is determined. Step S204, determining a data exploration algorithm used by the data exploration task according to the data exploration scope and the request source of the data exploration request. Step S205, applying cluster resources to the big data cluster according to the actual production condition, wherein the cluster resources comprise dynamic cluster resources supporting elastic allocation and static cluster resources with fixed resource quantity. And S206, splitting the data exploration task, and scheduling the split task according to the cluster resource condition. Step S207, a data exploration result is obtained, and the data exploration result is sent to a target object which initiates a data exploration request. Step S208, the flow ends.

Alternatively, in the first embodiment, the flow of querying the data probing result according to the present embodiment may be as shown in fig. 3. Step S301, a query request is received, and the flow starts. In step S302, identity authentication is performed on the target object applying for the query request to obtain identity information (e.g. data engineer/data analyst in fig. 3). Step S303, determining the data to be queried according to the query request. And step S304, scheduling the query request according to the data to be queried and the cluster resource condition. Step S305, obtaining a query result. Step S306, desensitizing the query result. Step S307, the desensitized data is calculated and visualized, and displayed to the target object. Step S308, the flow ends.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.

Example two

The second embodiment of the present application further provides a processing device for a data probe request. It should be noted that the processing device for a data probe request of the second embodiment of the present application may be used to execute the processing method for a data probe request provided by the first embodiment of the present application. The processing device for a data probe request according to the second embodiment of the present application is described below.

Fig. 4 is a schematic diagram of a processing apparatus for a data probe request according to a second embodiment of the present application. As shown in fig. 4, the apparatus includes: a first determination unit 401, a second determination unit 402, a third determination unit 403, and a probe unit 404.

Specifically, the first determining unit 401 is configured to determine, in a case of receiving a data probe request, a data probe task according to a request source of the data probe request, where the data probe task is used for probing abnormal data in service data or is used for processing the service data.

A second determining unit 402 is arranged for determining a data probing algorithm based on the data probing task and/or the data probing request.

A third determining unit 403 for determining a data exploration plan in dependence of the data exploration algorithm and the data exploration task.

And the probing unit 404 is configured to perform data probing in the large data cluster according to the data probing plan, so as to obtain a data probing result.
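The cooperation of the four units can be sketched as a small composition in which each unit is a callable. The class name `ProbeRequestProcessor` and the callable signatures are assumptions for this illustration and do not describe the actual structure of the device.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ProbeRequestProcessor:
    """Chains the four units; each unit is supplied as a callable."""
    determine_task: Callable[[dict], Any]            # first determining unit 401
    determine_algorithm: Callable[[Any, dict], Any]  # second determining unit 402
    determine_plan: Callable[[Any, Any], Any]        # third determining unit 403
    probe: Callable[[Any], Any]                      # probing unit 404

    def handle(self, request: dict) -> Any:
        """Run one data probe request through the units in order."""
        task = self.determine_task(request)
        algorithm = self.determine_algorithm(task, request)
        plan = self.determine_plan(algorithm, task)
        return self.probe(plan)
```

Passing the units in as callables keeps each one replaceable, for example when a different algorithm selection rule is needed for a new data type.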

In the processing device for a data probe request provided in the second embodiment of the present application, the first determining unit 401 determines, when a data probe request is received, a data probe task according to the request source of the data probe request, where the data probe task is used for probing abnormal data in service data or for processing service data; the second determining unit 402 determines a data probe algorithm according to the data probe task and/or the data probe request; the third determining unit 403 determines a data exploration plan according to the data exploration algorithm and the data exploration task; and the probing unit 404 performs data probing in the big data cluster according to the data probing plan to obtain a data probing result. This solves the problem in the related art that, because a big data cluster holds a huge volume of data with varied data structures, traditional data probing tools designed only for specific data types and limited data volumes cannot meet the probing requirements of large-scale heterogeneous data.
Determining the request source of the data exploration request identifies the data to be explored and its data format, so that the specific task information and the data exploration algorithm of the data exploration task can be determined for different data formats, which makes it possible to process large-scale heterogeneous data. Determining the data exploration plan allows the big data cluster to schedule the data exploration task reasonably, which prevents the execution of the data exploration task from affecting the normal operation of the production environment, improves the exploration efficiency of the data exploration task, ensures the stable operation of the normal business of the financial institution, and improves the working efficiency of the financial institution.

Optionally, in the processing apparatus for a data probe request provided in the second embodiment of the present application, the first determining unit 401 includes: a first determining subunit, configured to determine a data probe range according to a request source of the data probe request; the acquisition subunit is used for acquiring target data information according to the data to be probed in the data probing range; and the generating subunit is used for generating a data exploration task according to the target data information.

Optionally, in the processing device for a data probe request according to the second embodiment of the present application, the target data information includes at least the following information: target cluster information of data to be probed, target database information of the data to be probed, target data table information of the data to be probed, submitting time of a data probing task and running state information of the data probing task.

Optionally, in the processing device for a data probe request according to the second embodiment of the present application, the request source of the data probe request includes at least one of the following: a timed batch job and an asynchronous near-real-time query. The first determining subunit is configured such that: in the case where the request source of the data probe request is a timed batch job, the data probe range includes at least one of a hot spot table and a preset data table, where the hot spot table is a table whose number of accesses exceeds a preset number; and in the case where the request source of the data probe request is an asynchronous near-real-time query, the data probe range includes at least a target data table, where the target data table is the data table that the data probe request indicates is to be queried.
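As an illustration only, the range determination described above can be sketched in Python. The names RequestSource, ProbeRequest and determine_probe_range, and the parameters access_counts, preset_tables and hot_threshold, are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Optional


class RequestSource(Enum):
    TIMED_BATCH_JOB = "timed_batch_job"
    ASYNC_NEAR_REAL_TIME_QUERY = "async_near_real_time_query"


@dataclass(frozen=True)
class ProbeRequest:
    source: RequestSource
    target_table: Optional[str] = None  # only used by near-real-time queries


def determine_probe_range(
    request: ProbeRequest,
    access_counts: Dict[str, int],
    preset_tables: Iterable[str],
    hot_threshold: int,
) -> List[str]:
    """Return the tables to probe for the given request source."""
    if request.source is RequestSource.TIMED_BATCH_JOB:
        # Hot spot tables: access count exceeds the preset number.
        hot = {t for t, n in access_counts.items() if n > hot_threshold}
        return sorted(hot | set(preset_tables))
    if request.target_table is None:
        raise ValueError("a near-real-time query must name a target table")
    return [request.target_table]
```

For a timed batch job the sketch merges hot spot tables with the preset tables; for a near-real-time query it returns only the table named in the request.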

Optionally, in the processing apparatus for a data probe request provided in the second embodiment of the present application, the second determining unit 402 includes: the second determining subunit is used for determining a target database in which the data to be probed are located according to the target database information; the third determining subunit is configured to determine that the data exploration algorithm is a natural language processing algorithm when the target database is a structured database; a fourth determining subunit, configured to determine that the data exploration algorithm is an image processing algorithm when the data stored in the target database is picture data; and a fifth determining subunit for determining that the data probing algorithm is a time series algorithm in the case where the data stored in the target database is data related to time series.

Optionally, in the processing apparatus for a data probe request provided in the second embodiment of the present application, the second determining unit 402 includes: a sixth determining subunit, configured to determine a service requirement of the data probing task according to the data probing request; a seventh determining subunit, configured to determine that the data exploration algorithm is a statistical algorithm when the service requirement is statistical data and calculation data; an eighth determining subunit, configured to determine that the data exploration algorithm is a machine learning algorithm when the service requirement is a predicted data trend or a data model is established; and the ninth determining subunit is configured to determine that the data exploration algorithm is a data mining algorithm when the service requirement is to mine the data relationship.
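The two selection rules described above (by the kind of stored data and by the service requirement) amount to lookup tables. The following Python sketch reproduces the mappings as stated in the text; the key strings, the function name select_algorithm, and the rule that a service requirement takes precedence over the data kind are assumptions made for illustration.

```python
from typing import Optional

# Kind of data in the target database -> algorithm family (as stated in the text).
ALGORITHM_BY_DATA_KIND = {
    "structured": "natural_language_processing",
    "picture": "image_processing",
    "time_series": "time_series",
}

# Service requirement -> algorithm family (as stated in the text).
ALGORITHM_BY_REQUIREMENT = {
    "statistics_and_calculation": "statistical",
    "trend_prediction": "machine_learning",
    "model_building": "machine_learning",
    "relationship_mining": "data_mining",
}


def select_algorithm(data_kind: Optional[str] = None,
                     requirement: Optional[str] = None) -> str:
    """Pick an algorithm family; the requirement wins if given (assumption)."""
    if requirement is not None and requirement in ALGORITHM_BY_REQUIREMENT:
        return ALGORITHM_BY_REQUIREMENT[requirement]
    if data_kind is not None and data_kind in ALGORITHM_BY_DATA_KIND:
        return ALGORITHM_BY_DATA_KIND[data_kind]
    raise ValueError("no algorithm can be determined from the inputs")
```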

Optionally, in the processing apparatus for a data probe request provided in the second embodiment of the present application, the third determining unit 403 includes: a tenth determination subunit, configured to determine a priority of the data probing task according to task information of the data probing task; the splitting unit is used for splitting tasks of the data exploration task according to the data exploration algorithm to obtain a plurality of subtasks; an eleventh determination subunit, configured to determine a dependency relationship between a plurality of subtasks and a dependency relationship between a data exploration task and other tasks, so as to obtain a task dependency relationship; a twelfth determining subunit, configured to determine the data exploration plan according to the priority of the data exploration task, the plurality of subtasks and the task dependency relationship.
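One way to picture how a plan could be assembled from a priority, a set of subtasks and their dependencies is a topological ordering. The sketch below uses Python's standard graphlib module; ProbePlan, build_plan and the subtask names are hypothetical, and the application does not state that a topological sort is used.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class ProbePlan:
    priority: int
    ordered_subtasks: Tuple[str, ...]


def build_plan(priority: int, dependencies: Dict[str, Set[str]]) -> ProbePlan:
    """dependencies maps each subtask to the subtasks it must wait for."""
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    order = tuple(TopologicalSorter(dependencies).static_order())
    return ProbePlan(priority=priority, ordered_subtasks=order)
```

Every subtask appears in the ordering only after all subtasks it depends on, so a scheduler can execute the tuple from left to right.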

Optionally, in the apparatus for processing a data probe request according to the second embodiment of the present application, the probe unit 404 includes: a thirteenth determining subunit, configured to determine a target cluster in the big data cluster, where the target cluster includes a static cluster resource and a dynamic cluster resource, the static cluster resource has a preset number of cluster resources, and the dynamic cluster resource adjusts the number of cluster resources owned by the target cluster according to a load condition of the target cluster and a number of data probing tasks to be executed; and the first processing subunit is used for processing the data exploration task according to the target cluster and the data exploration plan to obtain a data exploration result.
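The split between static and dynamic cluster resources can be illustrated with a small sizing function. This is a sketch under assumed rules: the parameters tasks_per_unit, max_dynamic_units and high_load, and the policy of adding one unit of headroom under high load, are hypothetical and are not specified in the application.

```python
import math
from typing import Tuple


def target_cluster_size(
    static_units: int,
    load_ratio: float,
    pending_tasks: int,
    tasks_per_unit: int = 4,
    max_dynamic_units: int = 16,
    high_load: float = 0.8,
) -> Tuple[int, int]:
    """Return (static_units, dynamic_units) for the target cluster."""
    # Units needed to run all pending probe tasks.
    needed = math.ceil(pending_tasks / tasks_per_unit)
    # Dynamic units cover whatever the static allocation cannot.
    dynamic = max(0, needed - static_units)
    if load_ratio >= high_load:
        dynamic += 1  # assumed headroom when the cluster is heavily loaded
    return static_units, min(dynamic, max_dynamic_units)
```

The static part never changes, while the dynamic part grows with the backlog and the load and is capped by an upper bound.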

Optionally, in the apparatus for processing a data probe request according to the second embodiment of the present application, the probe unit 404 further includes: the sending subunit is used for monitoring the execution state of the data exploration task after the data exploration task is processed according to the target cluster and the data exploration plan, and sending reminding information to the first object to remind the first object to process the data exploration task when the data exploration task is in a preset state, wherein the first object refers to an operation and maintenance person.
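The monitoring step can be sketched as a check of task states against a set of preset states, with a callback that delivers the reminder. The state names "failed" and "timed_out", the function monitor_tasks and the notify callback are assumptions; the application does not say which states trigger a reminder or how it is delivered.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical preset states that require attention from operations staff.
ALERT_STATES: FrozenSet[str] = frozenset({"failed", "timed_out"})


def monitor_tasks(
    task_states: Dict[str, str],
    notify: Callable[[str], None],
    alert_states: FrozenSet[str] = ALERT_STATES,
) -> List[str]:
    """Send a reminder for each task in a preset state; return their ids."""
    alerted = []
    for task_id, state in task_states.items():
        if state in alert_states:
            notify(f"Data probe task {task_id} is in state '{state}' and needs handling")
            alerted.append(task_id)
    return alerted
```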

Optionally, in the processing device for a data probe request provided in the second embodiment of the present application, the device further includes: the receiving unit is used for receiving a query request of the target object on the data exploration result after the data exploration is carried out in the big data cluster according to the data exploration plan to obtain the data exploration result; a fourth determining unit, configured to determine query information according to the identity information of the target object and the query request; and the query unit is used for querying the data exploration result according to the query information to obtain a query result and sending the query result to the target object.

Optionally, in the processing device for a data probe request provided in the second embodiment of the present application, the query information includes information to be queried, a desensitization policy, and a visual display policy, and the query unit includes: the inquiring subunit is used for inquiring in the data exploration result to obtain information to be inquired; the desensitization subunit is used for desensitizing the information to be queried according to a desensitization strategy to obtain desensitized data; and the second processing subunit is used for calculating and processing the desensitized data according to the visual display strategy to obtain a query result.
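The three stages of the query unit (query, desensitize, prepare for display) can be sketched as a short pipeline. The masking rule (keep the last four characters), the field names and the choice of sorted rows as the display form are assumptions for illustration; the application does not define a concrete desensitization or display strategy.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def mask_value(value: str, keep_last: int = 4) -> str:
    """Replace all but the last keep_last characters with '*'."""
    if len(value) <= keep_last:
        return "*" * len(value)
    return "*" * (len(value) - keep_last) + value[-keep_last:]


def query_probe_result(
    result: Dict[str, Any],
    wanted_fields: Iterable[str],
    sensitive_fields: Set[str],
) -> List[Tuple[str, Any]]:
    # 1. Query: keep only the information to be queried.
    queried = {k: result[k] for k in wanted_fields if k in result}
    # 2. Desensitize: mask fields named by the desensitization strategy.
    desensitized = {
        k: mask_value(str(v)) if k in sensitive_fields else v
        for k, v in queried.items()
    }
    # 3. Display: return rows sorted by field name (assumed display strategy).
    return sorted(desensitized.items())
```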

The processing device for a data probe request includes a processor and a memory, where the first determining unit 401, the second determining unit 402, the third determining unit 403, the probe unit 404, and the like are stored as program units, and the processor executes the program units stored in the memory to implement corresponding functions.

The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and the efficiency of probing data in the big data cluster is improved by adjusting kernel parameters.

The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.

Example III

The third embodiment of the present application further provides a processing system for a data probe request. It should be noted that the processing system for a data probe request of the third embodiment may be used to execute the processing method for a data probe request provided by the first embodiment of the present application. The processing system for a data probe request according to the third embodiment is described below.

Fig. 5 is a schematic diagram of a processing system for a data probe request according to a third embodiment of the present application. As shown in fig. 5, the system includes: a master control 1, a configuration parameter management unit 2, a parameter analysis module 3, an environment inspection module 4, a timing task unit 5, a probing task scheduling module 6, an elastic resource management module 7, a probing algorithm management module 8, an external interface unit 9, a probing rule management module 10 and a report detail query module 11.

Specifically, the master control 1 connects in series the configuration parameter management unit 2, the timing task unit 5 and the external interface unit 9, which form the three core modules of the processing system for a data probe request. The modules are loosely coupled, and the service implementation modules are separated from the parameter configuration management module, which improves the efficiency of operation and maintenance deployment and of data query.

The configuration parameter management unit 2 consists of the parameter analysis module 3 and the environment inspection module 4. It checks the validity of input parameters, loads the necessary configuration parameters and pre-checks the availability of the environment. If the pre-check passes and all configuration parameters are loaded, it sends a signal to the master control 1 to indicate that the management tool enters the running state.
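The start-up sequence of this unit (validate parameters, load them, pre-check the environment, then signal the master control) can be sketched as follows. The parameter names in REQUIRED_PARAMS and the is_reachable callback are hypothetical; the application does not list the actual configuration parameters or how availability is checked.

```python
from typing import Callable, Dict, Mapping

# Hypothetical configuration parameters that must be present and non-empty.
REQUIRED_PARAMS = ("cluster_url", "scheduler_url", "result_db")


def parse_params(raw: Mapping[str, str]) -> Dict[str, str]:
    """Validity check: every required parameter must be provided."""
    missing = [k for k in REQUIRED_PARAMS if not raw.get(k)]
    if missing:
        raise ValueError(f"missing configuration parameters: {missing}")
    return {k: raw[k] for k in REQUIRED_PARAMS}


def environment_ok(params: Mapping[str, str],
                   is_reachable: Callable[[str], bool]) -> bool:
    """Pre-check: every configured endpoint must be reachable."""
    return all(is_reachable(v) for v in params.values())


def startup(raw: Mapping[str, str],
            is_reachable: Callable[[str], bool]) -> bool:
    """Return True when the master control may enter the running state."""
    try:
        params = parse_params(raw)
    except ValueError:
        return False
    return environment_ok(params, is_reachable)
```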

The timing task unit 5 is composed of the probing task scheduling module 6, the elastic resource management module 7 and the probing algorithm management module 8. It is the core module of the processing system for a data probe request and is responsible for interacting with a plurality of large-scale heterogeneous big data clusters and with job scheduling systems.

The probing task scheduling module 6 is responsible for the business logic implementation of the data probing tasks.

The elastic resource management module 7 is responsible for using an elastic computing resource platform to dynamically allocate and release computing resources for data exploration as needed, so as to meet the computing requirements of data exploration at minimum cost.

The probing algorithm management module 8 is responsible for generating and optimizing data probing algorithms, on the premise of security compliance, according to the characteristics of the accessed big data cluster, so as to shorten calculation time and improve the efficiency and timeliness of data probing.

The external interface unit 9 is composed of the probing rule management module 10 and the report detail query module 11, and is responsible for connecting the processing system for a data probe request to the interfaces of external systems, so as to realize rule management and visualization of probing data.

The probe rules management module 10 provides an interface for maintaining data probe rules, wherein the maintenance functions include: add, delete, modify, and query.
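The four maintenance functions can be pictured as a minimal in-memory store. The class ProbeRuleStore and the rule fields used in the usage example are hypothetical; the application does not describe how rules are stored or what a rule contains.

```python
from typing import Any, Dict


class ProbeRuleStore:
    """In-memory store for data probe rules keyed by rule id."""

    def __init__(self) -> None:
        self._rules: Dict[str, Dict[str, Any]] = {}

    def add(self, rule_id: str, rule: Dict[str, Any]) -> None:
        if rule_id in self._rules:
            raise KeyError(f"rule {rule_id!r} already exists")
        self._rules[rule_id] = dict(rule)

    def delete(self, rule_id: str) -> None:
        del self._rules[rule_id]  # raises KeyError if the rule is unknown

    def modify(self, rule_id: str, **changes: Any) -> None:
        self._rules[rule_id].update(changes)

    def query(self, rule_id: str) -> Dict[str, Any]:
        return dict(self._rules[rule_id])  # copy, so callers cannot mutate the store
```

For example, a rule could be added with store.add("r1", {"column": "phone", "check": "not_null"}) and later changed with store.modify("r1", check="unique").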

Report detail query module 11 is responsible for providing a detail query interface and a visual presentation interface for data probe results of a data probe task.

Fig. 6 is a schematic diagram of data interaction between a processing system for a data probe request and other systems of a financial institution according to a third embodiment of the present application. As shown in fig. 6, the systems involved include: a data development operation platform (development test instance) 12, a data development operation platform (production instance) 13, a data exploration API (development test instance) 14, a data exploration API (production instance) 15, a big data production cluster 16, a big data distributed scheduling platform (development test instance) 17, a big data distributed scheduling platform (production instance) 18, a big data platform database 19 and a processing system 20 for data exploration requests.

Specifically, based on the data security and compliance requirements of financial institutions, the data development operation platform (development test instance) 12 is an instance of the data development operation platform deployed in the development and test environment, and the data development operation platform (production instance) 13 is an instance deployed in the production environment; the two instances serve business personnel with different roles. The data development operation platform (development test instance) 12 is a development module whose main users are data development engineers, and the data development operation platform (production instance) 13 is an operation and maintenance module whose main users are data operation and maintenance engineers. A task listening interface is deployed in both instances. To ensure that data is used safely and compliantly, the data development operation platform (development test instance) 12 is open to data development engineers, who do not come into contact with the original data of the production environment. A data request submitted to the data development operation platform (development test instance) 12 first undergoes role identity authentication, and the desensitized data exploration service is provided only to users whose role attribute is data development engineer.
The data development operation platform (production instance) 13 is open to data product managers and data analysts, who have access to the production environment and need data services. A request submitted to the data development operation platform (production instance) 13 first undergoes role identity authentication, and the real-time production data exploration service is provided only to users whose role attribute is data product manager or data analyst.
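The role check described above can be sketched as a lookup from platform instance to permitted roles and offered service. The identifiers "dev_test", "production" and the role and service strings are hypothetical labels for the roles and services named in the text.

```python
# Roles admitted by each platform instance (labels are illustrative).
ALLOWED_ROLES = {
    "dev_test": {"data_development_engineer"},
    "production": {"data_product_manager", "data_analyst"},
}

# Service offered by each platform instance after authentication.
SERVICE = {
    "dev_test": "desensitized_data_exploration",
    "production": "real_time_production_data_exploration",
}


def authorize(instance: str, role: str) -> str:
    """Return the service for an authenticated role, or raise PermissionError."""
    if role not in ALLOWED_ROLES[instance]:
        raise PermissionError(f"role {role!r} may not use the {instance} instance")
    return SERVICE[instance]
```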

The data exploration API (development test instance) 14 is an instance of the data exploration API deployed in the development and test environment, and the data exploration API (production instance) 15 is an instance of the data exploration API deployed in the production environment; the two instances serve business personnel with different roles. The data exploration API (development test instance) 14 is configured with a query interface, deployed in the development and test environment, for the big data production cluster 16. It integrates an interface for querying and displaying batch data exploration results and listens for requests from the data development operation platform (development test instance) 12; a single call returns the data exploration results stored in the big data platform database 19, and an interface for formatting results for data visualization is also provided. The data exploration API (production instance) 15 is an internal service query interface, deployed in the production environment, for the big data production cluster 16. It integrates interfaces for submitting, querying and displaying a single data exploration task and listens for requests from the data development operation platform (production instance) 13; a single call is directed at a single-table data exploration task on the big data production cluster 16 (typically a very large-scale cluster, which may include heterogeneous big data clusters such as distributed databases, Hadoop, HBase and ClickHouse). Specifically, a data product manager or data analyst can submit tasks, query task status and query task result details through the data exploration API (production instance) 15.

The big data production cluster 16 is the infrastructure of the big data platform. It supplies the storage and computing resources of the big data platform, provides access applications with an elastic allocation management mechanism and a reliable, highly available disaster recovery mechanism for the big data infrastructure, shields them from underlying single-point faults, and is also the physical equipment on which service data is ultimately stored.

The big data distributed scheduling platform (development test instance) 17 is deployed in the development and test environment, and the big data distributed scheduling platform (production instance) 18 is deployed in the production environment. The big data distributed scheduling platform consists of a plurality of nodes in a distributed deployment; the nodes are stateless and can be scaled out horizontally without a fixed limit. Task scheduling means executing a given job based on preset trigger conditions (e.g., time points, time intervals, judgment mechanisms, etc.). The control node (Scheduler) is responsible for job scheduling, which covers job status management, sending and receiving job scheduling requests, and allocating specific jobs; the execution node (Worker) is responsible for executing specific jobs such as data loading, cleansing, processing and export.
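The division of labour between the control node and the execution nodes can be sketched with a time-interval trigger and round-robin allocation. The classes Job and Scheduler, the tick method and the round-robin policy are assumptions for illustration; the application does not specify the allocation strategy.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterable, List, Tuple


@dataclass
class Job:
    name: str
    interval_s: int          # trigger condition: run every interval_s seconds
    next_run: int = 0        # next time point at which the job is due
    status: str = "waiting"  # job status tracked by the control node


class Scheduler:
    """Control node: tracks job status and allocates due jobs to workers."""

    def __init__(self, workers: Iterable[str]) -> None:
        self._workers = cycle(list(workers))  # round-robin allocation (assumed)
        self.jobs: List[Job] = []

    def submit(self, job: Job) -> None:
        self.jobs.append(job)

    def tick(self, now: int) -> List[Tuple[str, str]]:
        """Dispatch every job whose trigger time has been reached."""
        assignments = []
        for job in self.jobs:
            if now >= job.next_run:
                worker = next(self._workers)
                job.status = "dispatched"
                job.next_run = now + job.interval_s
                assignments.append((job.name, worker))
        return assignments
```

Calling tick with the current time returns the (job, worker) pairs to execute; the workers themselves would carry out the loading, cleansing, processing or export work.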

The large data platform database 19 is a relational database storing the scheduling configuration and operational analysis data of the large data platform in the development test environment.

In the third embodiment, the timing chart for processing data probe requests initiated by service personnel of different role types may be as shown in fig. 7. There are two role types: the data engineer, who develops data tasks, and the data analyst, who carries out business work. The data engineer is mainly responsible for building data pipelines, and the data engineer's read-write authority mainly covers simulated or desensitized data in the development and test environment. The data analyst's work is closely tied to production service data, and the data analyst is the core service user of the data platform, so the data analyst's read-write authority mainly covers the big data cluster of the production environment. A data probe request initiated by the data engineer is processed as follows: the data engineer submits a data exploration request, and a data exploration task is generated; the big data distributed scheduling platform (production instance) 18 schedules the data exploration task; the data exploration result is acquired from the big data production cluster 16; the data exploration result is returned to the data development operation platform (development test instance) 12; and the data exploration result is visualized and returned to the data engineer. A data probe request initiated by the data analyst is processed as follows: the data analyst submits a data exploration request, and a data exploration task is generated; the identity information is verified by the data development operation platform (development test instance) 12; the data exploration result is acquired from the big data production cluster 16; the data exploration result is returned to the data development operation platform (development test instance) 12; and the data exploration result is desensitized, visualized and returned to the data analyst.

A fourth embodiment of the present invention provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements a method of processing a data probe request.

The fifth embodiment of the invention provides a processor for running a program, wherein the program, when run, executes the processing method for a data probe request.

As shown in fig. 8, a sixth embodiment of the present invention provides an electronic device, where the device includes a processor, a memory, and a program stored in the memory and executable on the processor, and the processor implements the following steps when executing the program: under the condition that a data exploration request is received, determining a data exploration task according to a request source of the data exploration request, wherein the data exploration task is used for exploring abnormal data in service data or processing the service data; determining a data exploration algorithm according to the data exploration task and/or the data exploration request; determining a data exploration plan according to a data exploration algorithm and a data exploration task; and performing data exploration in the large data cluster according to the data exploration plan to obtain a data exploration result.

The processor also realizes the following steps when executing the program: determining the data probe task from the request source of the data probe request includes: determining a data exploration range according to a request source of a data exploration request; obtaining target data information according to data to be probed in a data probing range; and generating a data exploration task according to the target data information.

The processor also realizes the following steps when executing the program: the target data information at least comprises the following information: target cluster information of data to be probed, target database information of the data to be probed, target data table information of the data to be probed, submitting time of a data probing task and running state information of the data probing task.

The processor also realizes the following steps when executing the program: the request source of the data probe request includes at least one of the following: a timed batch job and an asynchronous near-real-time query; and determining the data exploration range according to the request source of the data exploration request includes: in the case where the request source of the data probe request is a timed batch job, the data probe range includes at least one of a hot spot table and a preset data table, where the hot spot table is a table whose number of accesses exceeds a preset number; in the case where the request source of the data probe request is an asynchronous near-real-time query, the data probe range includes at least a target data table, where the target data table is the data table that the data probe request indicates is to be queried.

The processor also realizes the following steps when executing the program: determining a data probe algorithm from the data probe task and/or the data probe request comprises: determining a target database in which the data to be probed are located according to the target database information; under the condition that the target database is a structured database, determining that the data exploration algorithm is a natural language processing algorithm; under the condition that the data stored in the target database is picture data, determining a data exploration algorithm as an image processing algorithm; in the case where the data stored in the target database is time-series related data, the data exploration algorithm is determined to be a time-series algorithm.

The processor also realizes the following steps when executing the program: determining a data probe algorithm from the data probe task and/or the data probe request comprises: determining the service requirement of the data exploration task according to the data exploration request; in the case where the service requirement is to compile statistics and perform calculations on data, determining that the data exploration algorithm is a statistical algorithm; in the case where the service requirement is to predict a data trend or establish a data model, determining that the data exploration algorithm is a machine learning algorithm; and in the case where the service requirement is to mine data relationships, determining that the data exploration algorithm is a data mining algorithm.

The processor also realizes the following steps when executing the program: determining a data exploration plan based on a data exploration algorithm and a data exploration task includes: determining the priority of the data exploration task according to the task information of the data exploration task; performing task splitting on the data exploration task according to a data exploration algorithm to obtain a plurality of subtasks; determining the dependency relationship among a plurality of subtasks and the dependency relationship between the data exploration task and other tasks to obtain task dependency relationship; and determining a data exploration plan according to the priority of the data exploration task, the plurality of subtasks and the task dependency relationship.

The processor also realizes the following steps when executing the program: performing data exploration in a large data cluster according to a data exploration plan, wherein obtaining a data exploration result comprises the following steps: determining a target cluster in a big data cluster, wherein the target cluster comprises static cluster resources and dynamic cluster resources, the static cluster resources have preset quantity of cluster resources, and the dynamic cluster resources adjust the quantity of the cluster resources owned by the target cluster according to the load condition of the target cluster and the quantity of data exploration tasks to be executed; and processing the data exploration task according to the target cluster and the data exploration plan to obtain a data exploration result.

The processor also realizes the following steps when executing the program: after processing the data probe task according to the target cluster and the data probe plan, the method further comprises: and monitoring the execution state of the data exploration task, and sending reminding information to the first object under the condition that the data exploration task is in a preset state so as to remind the first object to process the data exploration task, wherein the first object is an operation and maintenance person.

The processor also realizes the following steps when executing the program: after performing data exploration in the big data cluster according to the data exploration plan to obtain a data exploration result, the method further comprises the following steps: receiving a query request of a target object on a data exploration result; determining query information according to the identity information of the target object and the query request; and inquiring the data exploration result according to the inquiry information to obtain an inquiry result, and sending the inquiry result to the target object.

The processor also realizes the following steps when executing the program: the query information comprises information to be queried, a desensitization strategy and a visual display strategy, the query is performed on the data exploration result according to the query information, and the query result is obtained by the steps of: inquiring in the data exploration result to obtain information to be inquired; desensitizing the information to be queried according to a desensitization strategy to obtain desensitized data; and calculating and processing the desensitized data according to the visual display strategy to obtain a query result.

The device herein may be a server, PC, PAD, cell phone, etc.

The application also provides a computer program product adapted to perform, when executed on a data processing device, a program initialized with the method steps of: under the condition that a data exploration request is received, determining a data exploration task according to a request source of the data exploration request, wherein the data exploration task is used for exploring abnormal data in service data or processing the service data; determining a data exploration algorithm according to the data exploration task and/or the data exploration request; determining a data exploration plan according to a data exploration algorithm and a data exploration task; and performing data exploration in the large data cluster according to the data exploration plan to obtain a data exploration result.

When executed on a data processing device, the computer program product is further adapted to carry out a program initialized with the method steps of: determining the data probe task from the request source of the data probe request includes: determining a data exploration range according to the request source of the data exploration request; obtaining target data information according to the data to be probed in the data probing range; and generating a data exploration task according to the target data information.

When executed on a data processing device, the computer program product is further adapted to carry out a program initialized with the method steps of: the target data information at least comprises the following information: target cluster information of the data to be probed, target database information of the data to be probed, target data table information of the data to be probed, the submission time of the data probing task and running state information of the data probing task.

When executed on a data processing device, the computer program product is further adapted to carry out a program initialized with the method steps of: the request source of the data probe request includes at least one of the following: a timed batch job and an asynchronous near-real-time query; and determining the data exploration range according to the request source of the data exploration request includes: in the case where the request source of the data probe request is a timed batch job, the data probe range includes at least one of a hot spot table and a preset data table, where the hot spot table is a table whose number of accesses exceeds a preset number; in the case where the request source of the data probe request is an asynchronous near-real-time query, the data probe range includes at least a target data table, where the target data table is the data table that the data probe request indicates is to be queried.

When executed on a data processing device, the computer program product is further adapted to carry out a program initialized with the method steps of: determining a data probe algorithm from the data probe task and/or the data probe request comprises: determining the target database in which the data to be probed is located according to the target database information; in the case where the target database is a structured database, determining that the data exploration algorithm is a natural language processing algorithm; in the case where the data stored in the target database is picture data, determining that the data exploration algorithm is an image processing algorithm; and in the case where the data stored in the target database is time-series related data, determining that the data exploration algorithm is a time-series algorithm.

When executed on a data processing device, the computer program product is further adapted to carry out a program initialized with the method steps of: determining a data probe algorithm from the data probe task and/or the data probe request comprises: determining the service requirement of the data exploration task according to the data exploration request; in the case where the service requirement is to compile statistics and perform calculations on data, determining that the data exploration algorithm is a statistical algorithm; in the case where the service requirement is to predict a data trend or establish a data model, determining that the data exploration algorithm is a machine learning algorithm; and in the case where the service requirement is to mine data relationships, determining that the data exploration algorithm is a data mining algorithm.

When executed on a data processing device, the computer program product is further adapted to carry out a program initialized with the following method steps: determining the data exploration plan according to the data exploration algorithm and the data exploration task comprises: determining the priority of the data exploration task according to the task information of the data exploration task; splitting the data exploration task into a plurality of subtasks according to the data exploration algorithm; determining the dependency relationships among the plurality of subtasks and the dependency relationships between the data exploration task and other tasks to obtain a task dependency relationship; and determining the data exploration plan according to the priority of the data exploration task, the plurality of subtasks, and the task dependency relationship.
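One plausible reading of the planning step is that the subtasks are ordered so that every dependency runs first, and that plans are then scheduled by priority. The sketch below uses a topological sort from the Python standard library for the ordering. The names `ProbePlan`, `build_probe_plan`, and `schedule`, and the convention that a larger number means a higher priority, are assumptions; the text does not state how the plan is represented. Dependencies on other tasks can be modeled by adding those tasks as extra nodes in the same graph.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class ProbePlan:
    task_id: str
    priority: int                      # assumption: larger means more urgent
    ordered_subtasks: Tuple[str, ...]  # execution order honoring dependencies


def build_probe_plan(
    task_id: str,
    priority: int,
    dependencies: Dict[str, Set[str]],
) -> ProbePlan:
    """Build a plan from a map of subtask -> subtasks it depends on."""
    try:
        # static_order() yields each node only after all its predecessors.
        order = tuple(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        raise ValueError(f"cyclic task dependency in {task_id}") from exc
    return ProbePlan(task_id, priority, order)


def schedule(plans: Iterable[ProbePlan]) -> List[ProbePlan]:
    """Order plans so that higher-priority tasks are dispatched first."""
    return sorted(plans, key=lambda p: p.priority, reverse=True)
```

A cyclic dependency is reported as an error rather than silently broken, since a plan with a cycle can never be executed.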

When executed on a data processing device, the computer program product is further adapted to carry out a program initialized with the following method steps: performing data exploration in the big data cluster according to the data exploration plan to obtain the data exploration result comprises: determining a target cluster in the big data cluster, wherein the target cluster comprises static cluster resources and dynamic cluster resources, the static cluster resources have a preset quantity of cluster resources, and the dynamic cluster resources adjust the quantity of cluster resources owned by the target cluster according to the load condition of the target cluster and the quantity of data exploration tasks to be executed; and processing the data exploration task according to the target cluster and the data exploration plan to obtain the data exploration result.
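The split between static and dynamic cluster resources can be illustrated with a small sizing function. The text only says that the dynamic part follows the cluster load and the number of pending tasks; the concrete policy below (a fixed number of tasks per worker, one extra worker above a load threshold, and a cap on dynamic workers) is an assumption chosen for the example, as are all parameter names.

```python
import math


def target_cluster_size(
    static_workers: int,
    max_dynamic_workers: int,
    load_ratio: float,
    pending_tasks: int,
    tasks_per_worker: int = 4,
    load_threshold: float = 0.8,
) -> int:
    """Return the total worker count: the static base plus a dynamic part."""
    # Workers needed to absorb the pending exploration tasks.
    needed = math.ceil(pending_tasks / tasks_per_worker)
    # Only the shortfall beyond the static base is requested dynamically.
    dynamic = max(0, needed - static_workers)
    # Assumed policy: add one worker when the cluster is heavily loaded.
    if load_ratio > load_threshold:
        dynamic += 1
    # The static base is always kept; the dynamic part is capped.
    return static_workers + min(dynamic, max_dynamic_workers)
```

With 4 static workers and at most 6 dynamic workers, a lightly loaded cluster with 8 pending tasks stays at 4 workers, while a heavily loaded cluster with 40 pending tasks grows to the cap of 10.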

When executed on a data processing device, the computer program product is further adapted to carry out a program initialized with the following method steps: after processing the data exploration task according to the target cluster and the data exploration plan, the method further comprises: monitoring the execution state of the data exploration task, and, in the case where the data exploration task is in a preset state, sending reminder information to a first object to remind the first object to handle the data exploration task, wherein the first object is an operation and maintenance person.
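The monitoring step can be sketched as a check against a set of alerting states. The text does not say which states count as the preset state, so treating `FAILED` and `TIMED_OUT` as alerting states is an assumption, and `TaskState`, `ALERT_STATES`, and `check_and_notify` are hypothetical names. The `notify` callback stands in for whatever channel reaches the operation and maintenance person.

```python
from enum import Enum
from typing import Callable


class TaskState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    TIMED_OUT = "timed_out"


# Assumption: these are the preset states that require human attention.
ALERT_STATES = frozenset({TaskState.FAILED, TaskState.TIMED_OUT})


def check_and_notify(
    task_id: str,
    state: TaskState,
    notify: Callable[[str], None],
) -> bool:
    """Send a reminder if the task is in an alerting state.

    Returns True when a reminder was sent, False otherwise.
    """
    if state in ALERT_STATES:
        notify(f"data exploration task {task_id} needs attention: {state.value}")
        return True
    return False
```

Passing the notification channel in as a callback keeps the state check independent of any particular messaging system.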

When executed on a data processing device, the computer program product is further adapted to carry out a program initialized with the following method steps: after performing data exploration in the big data cluster according to the data exploration plan to obtain the data exploration result, the method further comprises: receiving a query request from a target object for the data exploration result; determining query information according to the identity information of the target object and the query request; and querying the data exploration result according to the query information to obtain a query result, and sending the query result to the target object.

When executed on a data processing device, the computer program product is further adapted to carry out a program initialized with the following method steps: the query information comprises information to be queried, a desensitization strategy, and a visual display strategy, and querying the data exploration result according to the query information to obtain the query result comprises: querying the data exploration result to obtain the information to be queried; desensitizing the information to be queried according to the desensitization strategy to obtain desensitized data; and computing on and processing the desensitized data according to the visual display strategy to obtain the query result.
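The three query steps (select, desensitize, prepare for display) can be sketched as a small pipeline. The masking rule (hide all but the last four characters) and the display step (count rows per field value) are assumptions that stand in for the desensitization strategy and the visual display strategy, which the text does not define. All function and field names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def mask_value(value: str, keep_last: int = 4) -> str:
    """Replace all but the last keep_last characters with asterisks."""
    if len(value) <= keep_last:
        return "*" * len(value)  # too short to reveal anything safely
    hidden = len(value) - keep_last
    return "*" * hidden + value[hidden:]


def query_probe_result(
    rows: Iterable[Dict[str, object]],
    wanted_fields: List[str],
    sensitive_fields: Set[str],
) -> List[Dict[str, object]]:
    """Select the wanted fields, then mask those marked as sensitive."""
    result = []
    for row in rows:
        selected = {f: row[f] for f in wanted_fields}
        for f in sensitive_fields:
            if f in selected:
                selected[f] = mask_value(str(selected[f]))
        result.append(selected)
    return result


def count_by(rows: Iterable[Dict[str, object]], field: str) -> Dict[object, int]:
    """Aggregate rows per field value, for example as input to a bar chart."""
    return dict(Counter(row[field] for row in rows))
```

Masking happens before the display step, so aggregated views are computed only from data that has already been desensitized.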

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises that element.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims

1. A method for processing a data probe request, comprising:

in the case where a data probe request is received, determining a data exploration task according to a request source of the data probe request, wherein the data exploration task is used for exploring abnormal data in service data or for processing the service data;

determining a data exploration algorithm according to the data exploration task and/or the data probe request;

determining a data exploration plan according to the data exploration algorithm and the data exploration task;

and performing data exploration in a big data cluster according to the data exploration plan to obtain a data exploration result.

2. The method of claim 1, wherein determining the data exploration task according to the request source of the data probe request comprises:

determining a data exploration scope according to the request source of the data probe request;

obtaining target data information according to the data to be probed within the data exploration scope;

and generating the data exploration task according to the target data information.

3. The method according to claim 2, wherein the target data information comprises at least the following information: target cluster information of the data to be probed, target database information of the data to be probed, target data table information of the data to be probed, the submission time of the data exploration task, and running state information of the data exploration task.

4. The method according to claim 3, wherein the request source of the data probe request comprises at least one of a timed batch job and an asynchronous near real-time query, and determining the data exploration scope according to the request source of the data probe request comprises:

in the case where the request source of the data probe request is the timed batch job, the data exploration scope includes at least one of a hot spot table and a preset data table, wherein a hot spot table is a table whose number of accesses exceeds a preset number;

in the case where the request source of the data probe request is the asynchronous near real-time query, the data exploration scope includes at least the target data table, wherein the target data table is the data table that the data probe request indicates is to be queried.

5. The method according to claim 3, wherein determining the data exploration algorithm according to the data exploration task and/or the data probe request comprises:

determining, according to the target database information, a target database in which the data to be probed is located;

in the case where the target database is a structured database, determining that the data exploration algorithm is a natural language processing algorithm;

in the case where the data stored in the target database is picture data, determining that the data exploration algorithm is an image processing algorithm;

and in the case where the data stored in the target database is time-series data, determining that the data exploration algorithm is a time-series algorithm.

6. The method according to claim 1, wherein determining the data exploration algorithm according to the data exploration task and/or the data probe request comprises:

determining a service requirement of the data exploration task according to the data probe request;

in the case where the service requirement is to compile statistics on data and perform calculations on data, determining that the data exploration algorithm is a statistical algorithm;

in the case where the service requirement is to predict a data trend or to build a data model, determining that the data exploration algorithm is a machine learning algorithm;

and in the case where the service requirement is to mine data relationships, determining that the data exploration algorithm is a data mining algorithm.

7. The method of claim 1, wherein determining a data exploration plan in accordance with the data exploration algorithm and the data exploration task comprises:

determining the priority of the data exploration task according to the task information of the data exploration task;

performing task splitting on the data exploration task according to the data exploration algorithm to obtain a plurality of subtasks;

determining the dependency relationship among the plurality of subtasks and the dependency relationship between the data exploration task and other tasks to obtain a task dependency relationship;

and determining the data exploration plan according to the priority of the data exploration task, the plurality of subtasks and the task dependency relationship.

8. The method of claim 1, wherein performing data exploration in the big data cluster according to the data exploration plan to obtain the data exploration result comprises:

determining a target cluster in the big data cluster, wherein the target cluster comprises static cluster resources and dynamic cluster resources, the static cluster resources have a preset quantity of cluster resources, and the dynamic cluster resources adjust the quantity of cluster resources owned by the target cluster according to the load condition of the target cluster and the quantity of data exploration tasks to be executed;

and processing the data exploration task according to the target cluster and the data exploration plan to obtain the data exploration result.

9. The method of claim 8, wherein after processing the data exploration task according to the target cluster and the data exploration plan, the method further comprises:

monitoring the execution state of the data exploration task, and, in the case where the data exploration task is in a preset state, sending reminder information to a first object to remind the first object to handle the data exploration task, wherein the first object is an operation and maintenance person.

10. The method of claim 1, wherein after performing data exploration in the big data cluster according to the data exploration plan to obtain the data exploration result, the method further comprises:

receiving a query request from a target object for the data exploration result;

determining query information according to the identity information of the target object and the query request;

and querying the data exploration result according to the query information to obtain a query result, and sending the query result to the target object.

11. The method of claim 10, wherein the query information includes information to be queried, a desensitization strategy, and a visual display strategy, and querying the data exploration result according to the query information to obtain the query result comprises:

querying the data exploration result to obtain the information to be queried;

desensitizing the information to be queried according to the desensitization strategy to obtain desensitized data;

and calculating and processing the desensitized data according to the visual display strategy to obtain the query result.

12. A processing apparatus for a data probe request, comprising:

a first determining unit, configured to determine, in the case where a data probe request is received, a data exploration task according to a request source of the data probe request, wherein the data exploration task is used for exploring abnormal data in service data or for processing the service data;

a second determining unit, configured to determine a data exploration algorithm according to the data exploration task and/or the data probe request;

a third determining unit, configured to determine a data exploration plan according to the data exploration algorithm and the data exploration task;

and an exploration unit, configured to perform data exploration in a big data cluster according to the data exploration plan to obtain a data exploration result.

13. A processor, characterized in that the processor is arranged to run a program, wherein the program when run performs the method of processing a data probe request according to any of claims 1 to 11.

14. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of processing a data probe request of any of claims 1-11.