CN111858251B

CN111858251B - Data security audit method and system based on big data computing technology

Info

Publication number: CN111858251B
Application number: CN202010713842.7A
Authority: CN
Inventors: 刘迎风; 冯桂安; 梁满; 冯骏; 何怡; 傅行晓; 周亚美
Original assignee: Shanghai Big Data Center
Current assignee: Shanghai Big Data Center
Priority date: 2020-07-22
Filing date: 2020-07-22
Publication date: 2024-04-19
Anticipated expiration: 2040-07-22
Also published as: CN111858251A

Abstract

The invention discloses a data security audit method and system based on big data computing technology, belonging to the field of big data security. The method comprises: collecting log data of a server and sending the log data to a stream processing platform; receiving one or more pieces of log data, parsing the log data, and sending the parsed log data to at least one data destination; classifying the parsed log data by judging whether it is real-time data or non-real-time data, sending the real-time data to the stream processing platform for storage and the non-real-time data to a data center for storage; analyzing and processing each class of log data to obtain an analysis result; and generating corresponding alarm information according to the analysis result and outputting the alarm information. The beneficial effects of the invention are that log collection and storage are implemented on Flume, with task scheduling and task monitoring introduced so that the collection sources and output destinations of Flume logs are enriched, and that data security auditing, alarm monitoring management and processing, and security risk identification are implemented on Flink.

Description

Data security audit method and system based on big data computing technology

Technical Field

The invention relates to the field of big data security, and in particular to a data security audit method and system based on big data computing technology.

Background

In recent years, data security audit systems have become increasingly important. Such a system monitors and records the operations performed on a data server, analyzes those operations intelligently and in real time by inspecting network data, and records them in an audit database for later querying, analysis and filtering, thereby monitoring and auditing user operations on the target system. In particular, when public data resources from various industries are integrated and utilized, a data security audit system is urgently needed to safeguard the secure application, sharing, exchange and opening of data.

In existing data circulation and use, audit protection measures are lacking. Security events are investigated by having many staff manually set filter conditions and search a massive log library. This is inefficient, the results are strongly affected by human factors, and audits are neither timely nor thorough enough, so data security audit requirements cannot be met and security risks remain in the circulation and use of data. Moreover, traditional big data computing methods are constrained by disk read/write performance and network performance, so querying, computing and storing real-time data is inefficient. A data security audit method and system based on big data computing technology is therefore urgently needed to meet practical requirements.

Disclosure of Invention

In order to solve the above technical problems, the invention provides a data security audit method and system based on big data computing technology.

The invention solves the above technical problems by adopting the following technical solution:

The invention provides a data security audit method based on big data computing technology, comprising the following steps:

Step S1, collecting log data of a server and sending the collected log data to a stream processing platform;

Step S2, receiving one or more pieces of log data in the stream processing platform, parsing the log data, and sending the parsed log data to at least one data destination;

Step S3, classifying the parsed log data by judging whether the log data is real-time data or non-real-time data:

if the log data is real-time data, sending it to the stream processing platform for storage;

if the log data is non-real-time data, sending it to a data center for storage;

Step S4, analyzing and processing the real-time and non-real-time log data separately, according to the classification in step S3, to obtain and output an analysis result;

Step S5, generating corresponding alarm information according to the analysis result and outputting the alarm information.
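To make the data flow of steps S1 to S5 concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not the patented implementation: the text does not say how real-time and non-real-time data are told apart, so the sketch assumes a hypothetical age cutoff (`REALTIME_MAX_AGE_S`), models the stream processing platform and the data center as plain lists, and uses a made-up analysis rule (access to a "sensitive" table) for step S4.

```python
from dataclasses import dataclass, field

# Hypothetical freshness cutoff: the text does not define how "real-time"
# is decided, so a record younger than this many seconds counts as real-time.
REALTIME_MAX_AGE_S = 60


@dataclass
class Stores:
    stream: list = field(default_factory=list)       # stand-in for the stream processing platform
    data_center: list = field(default_factory=list)  # stand-in for the data center (HDFS)


def classify_and_store(records, now, stores):
    """Step S3: route each parsed record by its age."""
    for rec in records:
        if now - rec["ts"] <= REALTIME_MAX_AGE_S:
            stores.stream.append(rec)
        else:
            stores.data_center.append(rec)


def analyze(records, sensitive_tables):
    """Step S4 (hypothetical rule): keep records that touch a sensitive table."""
    return [r for r in records if r["table"] in sensitive_tables]


def make_alarms(hits, source):
    """Step S5: turn analysis results into alarm messages."""
    return [f"[{source}] user={h['user']} accessed {h['table']}" for h in hits]


# Usage: three already-parsed records (steps S1 and S2 are assumed done).
stores = Stores()
records = [
    {"ts": 995, "user": "alice", "table": "citizen_id"},
    {"ts": 100, "user": "bob", "table": "citizen_id"},
    {"ts": 990, "user": "carol", "table": "weather"},
]
classify_and_store(records, now=1000, stores=stores)
online = make_alarms(analyze(stores.stream, {"citizen_id"}), "online")
offline = make_alarms(analyze(stores.data_center, {"citizen_id"}), "offline")
print(online)   # ['[online] user=alice accessed citizen_id']
print(offline)  # ['[offline] user=bob accessed citizen_id']
```

In a real deployment the two lists would be Kafka topics and HDFS paths, and the analysis would run in the online and offline engines described below.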

Preferably, in step S1, the collection status and collection amount of the stream processing platform and of the log data are continuously managed and monitored during log data collection.

Preferably, online analysis processing is performed on the real-time data, and offline analysis processing is performed on the non-real-time data;

The online analysis comprises the following steps:

Step A1, classifying the real-time data and storing the classified real-time data in a cluster of the stream processing platform, wherein the cluster comprises a global event and at least one internal event;

Step A2, performing real-time association analysis on the global event and the at least one internal event;

Step A3, judging whether the event is an internal event:

if yes, going to step A4;

if not, generating the internal event and storing it as one of the internal events of the cluster;

Step A4, outputting a first analysis result when it is judged that debugging and monitoring are required;
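The text does not define the structure of the "global event" and the "internal events", so the following sketch of steps A2 to A4 is only one possible reading, and every name in it is hypothetical: the global event is modeled as a per-user activity counter across all events, internal events are grouped by event type, a previously unseen event type is registered as a new internal event without producing a result (mirroring the "if not" branch), and a first analysis result is emitted only for event types flagged as monitored.

```python
from collections import Counter


class EventCluster:
    """Hypothetical model of the cluster in step A1: a global event (here a
    per-user counter over all events) plus internal events keyed by type."""

    def __init__(self, monitored_types):
        self.global_event = Counter()   # per-user activity across all event types
        self.internal_events = {}       # event type -> list of stored events
        self.monitored_types = set(monitored_types)

    def process(self, event):
        # Step A2: associate the incoming event with the global event.
        self.global_event[event["user"]] += 1
        # Step A3: is this event type already an internal event of the cluster?
        if event["type"] not in self.internal_events:
            # No: generate the internal event, store it, and produce no result.
            self.internal_events[event["type"]] = [event]
            return None
        self.internal_events[event["type"]].append(event)
        # Step A4: output a first analysis result only if monitoring is required.
        if event["type"] in self.monitored_types:
            return {
                "type": event["type"],
                "user": event["user"],
                "user_total": self.global_event[event["user"]],
            }
        return None


cluster = EventCluster(monitored_types={"export"})
results = [
    cluster.process(e)
    for e in [
        {"type": "export", "user": "alice"},
        {"type": "export", "user": "alice"},
        {"type": "login", "user": "bob"},
    ]
]
print(results)  # [None, {'type': 'export', 'user': 'alice', 'user_total': 2}, None]
```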

The offline analysis comprises the following steps:

Step B1, pre-storing offline rules and issuing the offline rules to the stream processing platform;

Step B2, receiving the offline rules and retrieving the log data from the data center according to the offline rules;

Step B3, performing batch analysis on the non-real-time log data, outputting a second analysis result, and publishing the second analysis result to the stream processing platform;

Step B4, receiving the second analysis result and sending it to a document database.

Preferably, in step S2, at least one parsing node parses the log data, and the parsing comprises the following steps:

Step 21: initializing the log data;

Step 22: extracting valid log information from the log data;

Step 23: processing the log information to obtain log data of at least one data type, and sending the log data to the at least one data destination respectively.

Preferably, in step S1, the log collection system is controlled to collect the log data through functional configuration of the log collection system, the functional configuration including the collection frequency, the collection time period, and the starting and stopping of tasks.

The invention also provides a data security audit system based on big data computing technology, which applies the above data security audit method and comprises:

A task scheduling module, connected with the log acquisition system and used for collecting log data of the server and sending the collected log data to the stream processing platform;

A parsing module, connected with the stream processing platform and used for receiving one or more pieces of log data in the stream processing platform, parsing the log data, and sending the parsed log data to at least one data destination;

An audit analysis module, connected with the parsing module and used for classifying the parsed log data by judging whether the log data is real-time data or non-real-time data;

The audit analysis module further analyzes and processes the log data to obtain and output an analysis result;

An alarm module, connected with the audit analysis module and used for generating corresponding alarm information according to the analysis result and outputting the alarm information.

Preferably, the data security audit system further comprises a monitoring module, connected with the log acquisition system and the stream processing platform respectively, for continuously managing and monitoring the collection status and collection amount of the stream processing platform and the log acquisition system during log data collection.

Preferably, the audit analysis module comprises:

The online analysis engine is connected with the stream processing platform and is used for carrying out real-time association analysis on the global event and a plurality of internal events of the stream processing platform and outputting a first analysis result;

The offline analysis engine is connected with the data center and is used for retrieving the log data in the data center according to the issued offline rules, performing batch analysis on the non-real-time log data, and outputting a second analysis result.

Preferably, the alarm module includes:

The first alarm unit is connected with the online analysis engine and used for generating and outputting corresponding first alarm information according to the first analysis result;

The second alarm unit is connected with the offline analysis engine and is used for generating corresponding second alarm information according to the second analysis result and outputting the second alarm information.

Preferably, the parsing module includes a plurality of parsing nodes, and each parsing node is provided with a parser for initializing the log data, extracting valid log information from the log data, obtaining the log data of at least one data type according to the log information, and sending the log data to at least one data destination respectively.

The invention has the beneficial effects that:

According to the invention, log collection and storage are implemented on the open-source log collection framework Flume. Flume is functionally extended with the concepts of task scheduling and task monitoring, and the collection sources and output destinations of the log collection system are enriched. Functional modules for data security auditing, alarm monitoring management and processing, and security risk identification are developed by modeling on the open-source stream processing engine Flink, yielding a complete data security audit system. Security auditing covers the whole data life cycle of collection, transmission, storage, processing, exchange and destruction, which provides more comprehensive security management services for a big data resource platform and ensures the normal circulation and use of data. Meanwhile, the system continuously inspects, discovers and gives early warning of abnormal and illegal behaviors in the business support system, promptly discovers operations involving confidential data, accurately and quickly locates the operators responsible, and retains relevant evidence for holding them accountable.

Drawings

FIG. 1 is a flow chart of the data security audit method based on big data computing technology according to the present invention;

FIG. 2 is a flow chart of log data parsing according to the present invention;

FIG. 3 is a flow chart of online analysis according to the present invention;

FIG. 4 is a flow chart of offline analysis according to the present invention;

FIG. 5 is a block diagram of task scheduling and monitoring according to the present invention;

FIG. 6 is a schematic diagram of the operation of the stream processing engine (Flink) according to the present invention;

FIG. 7 is a flow chart of the online policy according to the present invention;

FIG. 8 is a flow chart of the offline policy according to the present invention;

FIG. 9 is a block diagram of the data security audit system according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the scope of protection of the invention.

It should be noted that, where no conflict arises, the embodiments of the present invention and the features in them may be combined with each other.

The invention is further described below with reference to the drawings and specific examples, which are not intended to be limiting.

The invention provides a data security audit method based on big data computing technology in the field of big data security. As shown in FIG. 1 and FIG. 5, the data security audit method comprises the following steps:

Step S1, collecting log data of a server and sending the collected log data to the stream processing platform;

Step S2, receiving one or more pieces of log data in the stream processing platform, parsing the log data, and sending the parsed log data to at least one data destination;

Step S3, classifying the parsed log data: if the log data is real-time data, sending it to the stream processing platform for storage; if it is non-real-time data, sending it to a data center for storage;

Step S4, analyzing and processing the log data according to the classification in step S3 to obtain and output an analysis result;

Step S5, generating corresponding alarm information according to the analysis result and outputting the alarm information.

Specifically, functional configuration of the log acquisition system 1 is performed from the web end to control the log acquisition system 1 to aggregate log data in batches; the log data collected by the log acquisition system 1 are then retrieved and published to the stream processing platform.

Further, in this embodiment, the log collection system 1 is a distributed, highly reliable and highly available collection system based on the open-source framework Flume; it can collect, aggregate and move log data from a large number of different data sources into a data center (Hadoop Distributed File System, HDFS) for storage.

The open-source stream processing platform is Apache Kafka, which is written in Scala and Java. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the activity stream data generated by consumers on a website. Through the parallel loading mechanism of the Hadoop Distributed File System (HDFS), Kafka unifies online and offline message processing, and it also provides real-time messaging through clusters.

Specifically, one or more batches of subscribed log data in the stream processing platform are read by subscription and parsed separately. The parsing methods include multilevel JSON flattening, regular-expression parsing of irregular text, and database table field mapping. The parsed log data are converted and sent to at least one data destination, where the data destinations include the open-source stream processing platform (Kafka), the distributed file system (HDFS), the Lucene-based search server (Elasticsearch), HttpFS, the open-source database (HBase), files, relational databases and the like. This solves the problem that the log acquisition system has only a single output destination: Flume is functionally extended with the concept of task scheduling, and log data are parsed by multiple parsing nodes and output to different destinations, enriching the collection sources and output destinations of the log acquisition system.
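Two of the parsing methods named above, multilevel JSON flattening and database table field mapping, together with the fan-out of one parsed record to several destinations, can be sketched in a few lines of Python. The field names, the `FIELD_MAP` table and the in-memory `sinks` standing in for Kafka, HDFS and Elasticsearch are all hypothetical; a real deployment would write through each system's own client.

```python
import json


def flatten(obj, prefix="", sep="."):
    """Multilevel JSON flattening: nested object keys become dotted paths.
    Lists are kept as values rather than expanded."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


# Hypothetical mapping from flattened JSON paths to database table columns.
FIELD_MAP = {"user.name": "user_name", "req.ip": "client_ip", "req.action": "action"}


def map_fields(flat, field_map):
    """Database table field mapping: keep only mapped paths, renamed to columns."""
    return {col: flat[path] for path, col in field_map.items() if path in flat}


def fan_out(row, sinks):
    """Send the same parsed row to every configured destination."""
    for sink in sinks.values():
        sink.append(row)


raw = '{"user": {"name": "alice"}, "req": {"ip": "10.0.0.5", "action": "SELECT"}, "extra": [1, 2]}'
row = map_fields(flatten(json.loads(raw)), FIELD_MAP)
sinks = {"kafka": [], "hdfs": [], "elasticsearch": []}  # in-memory stand-ins
fan_out(row, sinks)
print(row)  # {'user_name': 'alice', 'client_ip': '10.0.0.5', 'action': 'SELECT'}
```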

Specifically, during log auditing, the log data are classified and stored according to whether they are real-time or non-real-time: real-time log data are sent to the stream processing platform for storage, and non-real-time log data are sent to the data center for storage;

Audit analysis is performed by retrieving the real-time data from the stream processing platform or the non-real-time data from the data center; an analysis result is obtained and output, and corresponding alarm information and/or monitoring and debugging information is generated according to the analysis result and output.

Both log auditing and security identification alarms use a computing engine based on the open-source component Apache Flink. Flink is a stream processing engine implemented in Java. It is highly capable: it can process both streaming data (stream data) and batch data, covering the functionality of both the general-purpose compute engine Spark and Spark Streaming. Unlike Spark, however, Flink is essentially stream-only and treats batch processing as a special case of streaming.

In a further preferred embodiment, as shown in FIG. 6, Flink mainly comprises three components: JobClient, JobManager and TaskManager.

The user submits a Flink program to the JobClient, which sends the program to the JobManager; the JobManager receives the job and acknowledges receipt to the JobClient. The JobManager then schedules the job: it first allocates the resources the job requires, namely slots on the TaskManagers, and after allocation submits each individual task to the corresponding TaskManager. A TaskManager receives a task and starts a thread to execute it. When a task's state changes, for example when computation starts or completes, the TaskManager reports the state back to the JobManager, and it also reports task state periodically. Once the job has finished executing, the JobManager returns the results to the JobClient.
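As a rough mental model of the submission flow just described, the toy Python simulation below lets a JobClient hand tasks to a JobManager, which assigns each task to a free slot on a TaskManager, records state reports and returns the results. It is not Flink's API, and it simplifies heavily: tasks run inline rather than in separate threads, and there is no parallelism, checkpointing or failure recovery. All class and task names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaskManager:
    name: str
    slots: int
    reports: list = field(default_factory=list)  # state changes reported to the JobManager

    def run(self, task_id, fn, arg):
        # A real TaskManager runs each task in its own thread; this toy runs it inline.
        self.reports.append((task_id, "RUNNING"))
        result = fn(arg)
        self.reports.append((task_id, "FINISHED"))
        return result


class JobManager:
    def __init__(self, task_managers):
        self.task_managers = task_managers

    def submit(self, tasks):
        # Allocate one free slot per task before anything runs.
        free_slots = [tm for tm in self.task_managers for _ in range(tm.slots)]
        if len(tasks) > len(free_slots):
            raise RuntimeError("not enough task slots for this job")
        # Dispatch each task to the TaskManager owning its slot and gather results.
        return {task_id: tm.run(task_id, fn, arg)
                for (task_id, fn, arg), tm in zip(tasks, free_slots)}


def job_client(job_manager, tasks):
    """The JobClient submits the job and receives the results back."""
    return job_manager.submit(tasks)


tms = [TaskManager("tm-1", slots=2), TaskManager("tm-2", slots=1)]
jm = JobManager(tms)
out = job_client(jm, [("count-a", len, "abc"), ("count-b", len, "de"), ("upper", str.upper, "x")])
print(out)  # {'count-a': 3, 'count-b': 2, 'upper': 'X'}
```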

The invention implements log collection and storage on Flume, and implements data security auditing, alarm monitoring management and processing, and security risk identification through modeling on the Flink engine. Security auditing is performed throughout the data life cycle of collection, transmission, storage, processing, exchange and destruction, ensuring the normal circulation and use of data, realizing a data security audit system, and providing more comprehensive security management services for a big data resource platform. Meanwhile, the system can promptly discover operations involving confidential data and accurately locate the operators responsible; it inspects, discovers and gives early warning of various anomalies and violations in the business support system, providing relevant evidence that can be used for accountability.

In a preferred embodiment of the data security audit method, the log collection system 1 and the stream processing platform (Kafka) are continuously managed and monitored during log data collection: the collection status and collection amount of the log data are monitored, and log collection and storage are tracked in real time, so that users can follow the collection status and collection amount of the log data as it happens.
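One simple way to track "collection status and collection amount" is to count, per source, how many records the collection agent has read and how many the stream processing platform has acknowledged; the difference is a backlog. The sketch below assumes exactly that model; the class name, the `max_lag` threshold and the source names are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


class CollectionMonitor:
    """Hypothetical monitor: per source, counts records read by the collection
    agent and records acknowledged by the stream processing platform."""

    def __init__(self):
        self.collected = defaultdict(int)
        self.published = defaultdict(int)

    def on_collect(self, source, n=1):
        self.collected[source] += n

    def on_publish(self, source, n=1):
        self.published[source] += n

    def status(self, max_lag):
        """Report collection amount and backlog per source, flagging large backlogs."""
        report = {}
        for source, count in self.collected.items():
            lag = count - self.published[source]
            report[source] = {"collected": count, "lag": lag, "ok": lag <= max_lag}
        return report


mon = CollectionMonitor()
mon.on_collect("web-01", 100)
mon.on_publish("web-01", 95)
mon.on_collect("db-01", 50)
mon.on_publish("db-01", 10)
report = mon.status(max_lag=10)
print(report["db-01"])  # {'collected': 50, 'lag': 40, 'ok': False}
```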

In a preferred embodiment, the data security audit method performs online analysis processing on real-time data and offline analysis processing on non-real-time data;

As shown in FIG. 7, the online analysis comprises:

Step A1, classifying the real-time data and storing the classified real-time data in a cluster of the stream processing platform, wherein the cluster comprises a global event and at least one internal event;

Step A2, performing real-time association analysis on the global event and the at least one internal event;

Step A3, judging whether the event is an internal event:

if yes, going to step A4;

if not, generating an internal event and storing it as one of the internal events of the cluster;

Step A4, outputting a first analysis result when it is judged that debugging and monitoring are required;

As shown in FIG. 8, the offline analysis comprises:

A plurality of offline rules are stored in advance and issued through the stream processing platform. On receiving the offline rules, the log data of the data center are retrieved according to them, and batch analysis is performed on the non-real-time original log data using the configured parameters of a list DB and a baseline DB. A second analysis result is output and published to Kafka, from which it is consumed by subscription and sent to the document database (Elasticsearch, ES). Offline analysis can thus analyze past logs in batches and generate different alarm information for different parameter configurations.
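The role of the list DB and the baseline DB in the offline batch analysis is not spelled out, so the sketch below assumes one plausible interpretation: the list DB holds watched accounts, the baseline DB holds each user's normal access count, and an offline rule flags users who are listed or who exceed their baseline by a configurable factor. All names and numbers are hypothetical. In the described system, the returned alarms would then be published to Kafka and written to Elasticsearch.

```python
from collections import Counter

# Hypothetical offline rule and reference data; names and values are illustrative only.
OFFLINE_RULE = {"baseline_factor": 2.0}
LIST_DB = {"mallory"}                    # e.g. a watch list of accounts
BASELINE_DB = {"alice": 10, "bob": 5}    # e.g. normal access counts per user


def batch_analyze(logs, rule, list_db, baseline_db):
    """Batch-analyze historical logs and return second analysis results (alarms)."""
    counts = Counter(entry["user"] for entry in logs)
    alarms = []
    for user, n in sorted(counts.items()):
        if user in list_db:
            alarms.append({"user": user, "reason": "listed", "count": n})
        elif user in baseline_db and n > baseline_db[user] * rule["baseline_factor"]:
            alarms.append({"user": user, "reason": "above_baseline", "count": n})
    return alarms


history = [{"user": "alice"}] * 25 + [{"user": "bob"}] * 4 + [{"user": "mallory"}]
alarms = batch_analyze(history, OFFLINE_RULE, LIST_DB, BASELINE_DB)
print(alarms)
# [{'user': 'alice', 'reason': 'above_baseline', 'count': 25}, {'user': 'mallory', 'reason': 'listed', 'count': 1}]
```

Changing `baseline_factor` or the contents of the two reference tables changes which alarms are produced, which matches the text's point that different parameter configurations yield different alarm information.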

In a preferred embodiment of the data security audit method, as shown in FIG. 2, in step S2 at least one parsing node parses the log data, and the parsing comprises:

Step 21: initializing the log data;

Step 22: extracting valid log information from the log data;

Step 23: processing the log information to obtain log data of at least one data type, and sending the log data to the at least one data destination respectively.

Specifically, in this embodiment, the original log data are first formatted and the valid log information is extracted from the text, which reduces the difficulty of parsing. The extracted log information is then parsed by multilevel JSON flattening, regular-expression parsing of irregular text, or database table field mapping, and the parsed log is dynamically enriched; the enrichment includes the region and country derived from the IP address.
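Regular-expression parsing of an irregular text line followed by IP-based enrichment might look like the sketch below. The log line format, the regex and the `GEO_TABLE` lookup are assumptions made for illustration; a production system would typically consult a GeoIP database rather than a hard-coded table.

```python
import ipaddress
import re

# Hypothetical pattern for an irregular text log line such as
# "2020-07-22 10:15:03 user=alice ip=10.1.2.3 op=DELETE table=orders"
LINE_RE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"user=(?P<user>\S+)\s+ip=(?P<ip>\S+)\s+op=(?P<op>\w+)"
)

# Hypothetical IP-to-location table standing in for a GeoIP database.
GEO_TABLE = [
    (ipaddress.ip_network("10.1.0.0/16"), {"country": "CN", "region": "Shanghai"}),
]


def parse_line(line):
    """Steps 21-22: trim the raw line, then extract fields; None if no match."""
    m = LINE_RE.search(line.strip())
    return m.groupdict() if m else None


def enrich(record):
    """Dynamically complete the record with country/region derived from its IP."""
    addr = ipaddress.ip_address(record["ip"])
    for net, location in GEO_TABLE:
        if addr in net:
            return {**record, **location}
    return {**record, "country": "unknown", "region": "unknown"}


rec = enrich(parse_line("  2020-07-22 10:15:03 user=alice ip=10.1.2.3 op=DELETE table=orders "))
print(rec["country"], rec["region"])  # CN Shanghai
```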

In a preferred embodiment of the data security audit method, the log collection system 1 is configured through parameters that control how it collects log data on the web server: the starting and stopping of a collection task can be controlled by configuring the task's start and stop times, time period and start frequency, and the collection frequency, collection time period and collection amount can be configured to control what is collected during the collection process.
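The configurable items listed above (task on/off, collection time period, collection frequency and collection amount) map naturally onto a small scheduling check. The sketch assumes a daily time window that does not cross midnight and uses hypothetical field names; it only decides whether a run is due and how many pending lines to take, rather than driving Flume itself.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class CollectionTaskConfig:
    """Hypothetical collection-task settings mirroring the items in the text."""
    enabled: bool        # task switched on or off
    window_start: time   # start of the daily collection time period
    window_end: time     # end of the daily period (assumed not to cross midnight)
    interval_s: int      # collection frequency: minimum seconds between runs
    max_records: int     # collection amount: maximum lines taken per run


def should_run(cfg, now, last_run):
    """Return True if a collection run is due at `now`."""
    if not cfg.enabled:
        return False
    if not (cfg.window_start <= now.time() <= cfg.window_end):
        return False
    return last_run is None or (now - last_run).total_seconds() >= cfg.interval_s


def collect(cfg, pending):
    """Take at most `max_records` pending log lines in one run."""
    return pending[: cfg.max_records]


cfg = CollectionTaskConfig(True, time(8, 0), time(20, 0), interval_s=300, max_records=2)
print(should_run(cfg, datetime(2020, 7, 22, 9, 0), datetime(2020, 7, 22, 8, 50)))  # True
print(collect(cfg, ["a", "b", "c"]))  # ['a', 'b']
```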

The invention also provides a data security audit system based on big data computing technology, which applies the above data security audit method. As shown in FIG. 9, the system comprises:

The task scheduling module 2, connected with the log acquisition system 1, for collecting log data of the server and sending the collected log data to the stream processing platform;

The parsing module 3, connected with the stream processing platform, for receiving one or more pieces of log data in the stream processing platform, parsing the log data, and sending the parsed log data to at least one data destination;

The audit analysis module 5, connected with the parsing module 3, for classifying the parsed log data by judging whether the log data is real-time data or non-real-time data;

The audit analysis module 5 analyzes and processes the log data to obtain and output an analysis result;

The alarm module 4, connected with the audit analysis module 5, for generating corresponding alarm information according to the analysis result and outputting the alarm information.

Specifically, in this embodiment, the data security audit system comprises the task scheduling module 2, the parsing module 3, the audit analysis module 5 and the alarm module 4;

The task scheduling module 2, based on the Flume framework, controls the log acquisition system 1 to collect log data of the server by configuring the collection frequency, the collection time period, and the starting and stopping of tasks on the log acquisition system 1, and sends the collected log data to the stream processing platform; this functionally extends Flume with the concept of task scheduling;

The parsing module 3 subscribes to one or more pieces of log data in the stream processing platform, parses the log data, and outputs the parsed log data to multiple destinations, enriching the collection sources and output destinations of Flume logs.

The audit analysis module 5 determines whether the parsed log data is real-time data or non-real-time data and stores it accordingly;

The audit analysis module 5 retrieves the stored log data for audit analysis and then outputs an analysis result;

The alarm module 4 generates corresponding alarm information according to the analysis result and outputs the alarm information.

In a preferred embodiment, the data security audit system further comprises a monitoring module 6, connected with the log acquisition system 1 and the stream processing platform respectively, for continuously managing and monitoring the collection status and collection amount of the stream processing platform and the log acquisition system during log data collection.

In a preferred embodiment of the data security audit system, the audit analysis module 5 comprises:

The online analysis engine 51, connected with the stream processing platform; based on the Flink framework, it performs real-time association analysis on the global event and a plurality of internal events of the stream processing platform and outputs a first analysis result;

The offline analysis engine 52, connected with the data center; based on the Flink framework, it retrieves the original log data in the data center according to the issued offline rules, performs batch analysis on the non-real-time original log data, and outputs a second analysis result.

Specifically, the online analysis engine 51 and the offline analysis engine 52 are both based on the Flink framework. Flink provides predefined window assigners, such as tumbling windows, sliding windows, session windows and global windows. The real-time online analysis engine can create windows to perform windowed analysis on the real-time log stream, generating alarm signals and monitoring and debugging information in real time and passing them to operators, who can then respond promptly and reduce losses.
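The core of a tumbling-window alarm can be shown without Flink: each event is assigned to a fixed, non-overlapping window, counts are kept per key and window, and an alarm fires when a count exceeds a threshold. In Flink's DataStream API this would roughly correspond to keying the stream by user and applying a `TumblingEventTimeWindows` assigner; the plain-Python sketch below, with a hypothetical threshold and sample events, only illustrates the windowing logic over a finite list.

```python
from collections import defaultdict


def tumbling_window_counts(events, size_s):
    """Assign each (timestamp, key) event to the non-overlapping window
    [k * size_s, (k + 1) * size_s) and count events per (window, key)."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - ts % size_s
        counts[(window_start, key)] += 1
    return dict(counts)


def window_alarms(events, size_s, threshold):
    """Alarm for every key whose count within a single window exceeds the threshold."""
    return sorted(
        (start, key, n)
        for (start, key), n in tumbling_window_counts(events, size_s).items()
        if n > threshold
    )


events = [(0, "alice"), (10, "alice"), (20, "alice"), (30, "bob"), (65, "alice"), (70, "alice")]
print(window_alarms(events, size_s=60, threshold=2))  # [(0, 'alice', 3)]
```

A sliding window would differ only in assigning each event to several overlapping windows; a session window would close a key's window after a gap of inactivity.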

In a preferred embodiment of the data security audit system, the alarm module 4 comprises:

The first alarm unit 41, connected with the online analysis engine 51, for generating and outputting corresponding first alarm information according to the first analysis result;

The second alarm unit 42, connected with the offline analysis engine 52, for generating and outputting corresponding second alarm information according to the second analysis result.

In a preferred embodiment, the data security audit system further comprises an audit report module, connected with the audit analysis module 5 and the alarm module 4 respectively, for generating a corresponding audit report according to the analysis result and the alarm information.

In a preferred embodiment, the data security audit system comprises a plurality of parsing nodes, each provided with a Parser host acting as a parser. The Parser host initializes the log data, extracts valid log information from the log data, obtains log data of at least one data type according to the log information, and sends the log data to at least one data destination respectively.

Specifically, the log information is processed to obtain data of different data types, which are sent to the data destinations; the data destinations include Elasticsearch, HBase/HDFS, Druid, CSV files and other types of output.


The foregoing description is only illustrative of the preferred embodiments of the present invention and is not to be construed as limiting the scope of the invention, and it will be appreciated by those skilled in the art that equivalent substitutions and obvious variations may be made using the description and illustrations of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A data security audit method based on big data computing technology, characterized by comprising the following steps:

S5, generating corresponding alarm information according to the analysis result and outputting the alarm information;

Online analysis processing is performed on the real-time data, the online analysis comprising the following steps:

Step A3, judging whether the event is an internal event:

if yes, going to step A4;

if not, generating the internal event, returning to step A1, and storing the generated internal event as one of the internal events of the cluster;

Step A4, outputting a first analysis result when it is judged that debugging and monitoring are required.

2. The data security audit method based on big data computing technology according to claim 1, wherein in step S1, the collection status and collection amount of the stream processing platform and of the log data are continuously managed and monitored during log data collection.

3. The data security audit method based on big data computing technology according to claim 1, wherein offline analysis processing is performed on the non-real-time data;

the offline analysis step comprises the following steps:

4. The data security audit method based on big data computing technology according to claim 1, wherein in step S2, at least one parsing node parses the log data, and the parsing comprises the following steps:

Step 21: initializing the log data;

Step 22: extracting valid log information from the log data;

5. The data security audit method based on big data computing technology according to claim 1, wherein in step S1, the log collection system is controlled to collect the log data through functional configuration of the log collection system, the functional configuration including the collection frequency, the collection time period, and the starting and stopping of tasks.

6. A data security audit system based on big data computing technology, characterized in that it applies the data security audit method based on big data computing technology according to any one of claims 1-5, and comprises:

7. The data security audit system based on big data computing technology according to claim 6, further comprising a monitoring module respectively connected to the log collection system and the stream processing platform, and configured to continuously manage and monitor collection conditions and collection amounts of the stream processing platform and the log collection system during the collection of the log data.

8. The data security audit system based on big data computing technology according to claim 6, wherein the audit analysis module comprises:

9. The data security audit system based on big data computing technology according to claim 8, wherein the alarm module comprises:

10. The data security audit system based on big data computing technology according to claim 6, wherein the parsing module includes a plurality of parsing nodes, and a parser is correspondingly disposed at each parsing node, and is configured to initialize the log data, extract valid log information from the log data, obtain the log data of at least one data type according to the log information, and send the log data to at least one data destination respectively.