CN113672668A - Log real-time processing method and device in big data scene - Google Patents


Info

Publication number
CN113672668A
Authority
CN
China
Prior art keywords
log
database
data
real
updated
Legal status
Pending
Application number
CN202110993417.2A
Other languages
Chinese (zh)
Inventor
李井新
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
ICBC Technology Co Ltd
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
ICBC Technology Co Ltd
Application filed by Industrial and Commercial Bank of China Ltd ICBC and ICBC Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80 Database-specific techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention belongs to the technical field of big data and provides a method and a device for processing logs in real time in a big data scenario. The method comprises: collecting a plurality of message bodies from a log storage server; parsing and filtering the message bodies to obtain the data before and after each log update, the operation type, and the names of the database and table on which each operation was performed; and searching the logs according to this information and the user's search conditions. Without intruding on the code of the business system, the invention can accurately process the relevant log data from massive log volumes on the order of billions of records.

Description

Log real-time processing method and device in big data scene
Technical Field
The application belongs to the technical field of big data, and particularly relates to a method and device for processing logs in real time in a big data scenario.
Background
In the prior art, most applications use a relational database to manage data such as commodity (product) information and category information. In a commodity system, for example, billions of records, including product and SKU information, are currently stored in MySQL databases. Log processing schemes emerged so that the operations performed on such a large volume of data could be clearly recorded and quickly searched.
The existing log system architecture is the ELK architecture, which can collect, store, and search logs. ELK stands for Elasticsearch + Logstash + Kibana. Logstash is mainly used for log collection: it can read data from a database and transfer it to Elasticsearch for storage. Elasticsearch provides search, tokenization (word segmentation), data storage, and other functions. Kibana is a web-based graphical interface for visually aggregating, analyzing, and searching the log data stored in Elasticsearch. In this architecture, Logstash is resource-intensive, consuming a large amount of CPU and memory while running, and it has no message queue buffer, so there is a risk of data loss. Another architecture therefore adds a message queue, with the following data flow: the Logstash agent on each node first sends data to Redis or Kafka, from which the data is passed on to Logstash; Logstash processes the data and then sends it to Elasticsearch. Introducing Redis or Kafka as a buffer mitigates the risk of data loss.
As the above analysis shows, Logstash collects data by reading the database tables directly, so the operation data it reads is the state after incremental updates. This process has the following disadvantages:
(1) the pre-update data in the table cannot be obtained;
(2) because the tables are read directly, this can put a very heavy load on the database;
(3) only limited operations can be performed on the data read from the table, and the data cannot be operated on directly.
In addition, Elasticsearch is better suited to retrieval, that is, to scenarios dominated by read operations. Analysis of the actual business shows that the commodity log system mostly performs write operations; reads occur only in a few scenarios, such as troubleshooting. Therefore, a different storage technology, HBase, which supports massive data storage and better matches this business scenario, is chosen for the commodity log system. Finally, Kibana's data visualization functions have too high a learning cost for operators without a technical background.
Disclosure of Invention
The invention can be applied to big data technology in the financial field, and can also be used in any field other than finance.
In order to solve the technical problems, the invention provides the following technical scheme:
In a first aspect, the present invention provides a method for processing logs in real time in a big data scenario, including:
collecting a plurality of message bodies from a log storage server;
parsing and filtering the plurality of message bodies to obtain the data before and after the log is updated, the operation type, the name of the database on which the operation was performed, and the name of the table on which the operation was performed;
and searching the log according to the data before and after the log is updated, the operation type, the database name and table name of the operation, and the user's search conditions.
In an embodiment, parsing and filtering the plurality of message bodies to obtain the data before and after the log is updated, the operation type, the database name of the operation, and the table name of the operation includes:
determining, according to the message body types, whether the plurality of message bodies belong to operations of the same transaction;
if so, parsing and filtering the message bodies of the operations of that transaction to obtain the data before and after the log is updated, the operation type, the database name of the operation, and the table name of the operation.
In an embodiment, searching the log according to the data before and after the log is updated, the operation type, the database name and table name of the operation, and the user's search conditions includes:
packaging the data before and after the log is updated, the operation type, the database name of the operation, and the table name of the operation into an object;
storing the object in an HBase database;
and searching the log in the HBase database according to the search conditions.
In one embodiment, collecting the plurality of message bodies from the log storage server includes:
collecting the plurality of message bodies by using the Canal middleware.
In an embodiment, before the plurality of message bodies are collected from the log storage server, the method further includes:
configuring, according to the current service, the address of the database whose logs are to be recorded and the tables whose operations are to be recorded.
In a second aspect, the present invention provides an apparatus for processing logs in real time in a big data scenario, the apparatus including:
a message body collection module, configured to collect a plurality of message bodies from the log storage server;
a message body parsing module, configured to parse and filter the plurality of message bodies to obtain the data before and after the log is updated, the operation type, the database name of the operation, and the table name of the operation;
and a log search module, configured to search the log according to the data before and after the log is updated, the operation type, the database name and table name of the operation, and the user's search conditions.
In one embodiment, the message body parsing module includes:
a message body judging unit, configured to determine, according to the message body types, whether the plurality of message bodies belong to operations of the same transaction;
and a message body parsing unit, configured to, if so, parse and filter the message bodies of the operations of that transaction to obtain the data before and after the log is updated, the operation type, the database name of the operation, and the table name of the operation.
In one embodiment, the log search module includes:
an object generation unit, configured to package the data before and after the log is updated, the operation type, the database name of the operation, and the table name of the operation into an object;
an object storage unit, configured to store the object in an HBase database;
and a log search unit, configured to search the log in the HBase database according to the search conditions.
In one embodiment, the message body collection module includes:
a message body collection unit, configured to collect the plurality of message bodies by using Canal.
In one embodiment, the apparatus for processing logs in real time in a big data scenario further includes:
a record table configuration unit, configured to configure, according to the current service, the database address of the log and the tables whose operations are to be recorded.
In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of the method for processing logs in real time in a big data scenario.
In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for processing logs in real time in a big data scenario.
As can be seen from the above description, the embodiments of the present invention provide a method and an apparatus for processing logs in real time in a big data scenario: a plurality of message bodies are first collected from a log storage server; the message bodies are then parsed and filtered to obtain the data before and after the log is updated, the operation type, the database name of the operation, and the table name of the operation; and finally the log is searched according to this information and the user's search conditions. The invention requires no intrusion into business code and does not affect the online service system. It supports real-time processing of hundreds of millions of log records per day, and the log system can be quickly integrated into any system that uses a MySQL database, reducing integration cost. In addition, the invention can record massive amounts of historical operation data and query historical operation logs quickly and accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a first flowchart of a method for processing logs in real time in a big data scenario according to an embodiment of the present invention;
FIG. 2 is a flowchart of step 200 according to an embodiment of the present invention;
FIG. 3 is a flowchart of step 300 according to an embodiment of the present invention;
FIG. 4 is a flowchart of step 100 according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method for processing logs in real time in a big data scenario according to an embodiment of the present invention;
FIG. 6 is a schematic flowchart of a method for processing logs in real time in a big data scenario according to an embodiment of the present invention;
FIG. 7 is a diagram of the architecture of a log system according to an embodiment of the present invention;
FIG. 8 is a first block diagram of an apparatus for processing logs in real time in a big data scenario according to an embodiment of the present invention;
FIG. 9 is a block diagram of the message body parsing module 20 according to an embodiment of the present invention;
FIG. 10 is a block diagram of the log search module 30 according to an embodiment of the present invention;
FIG. 11 is a block diagram of the message body collection module 10 according to an embodiment of the present invention;
FIG. 12 is a second block diagram of an apparatus for processing logs in real time in a big data scenario according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of this application and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other where no conflict arises. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
An embodiment of the present invention provides a specific implementation of a method for processing logs in real time in a big data scenario. Referring to FIG. 1, the method specifically includes the following steps:
Step 100: collect a plurality of message bodies from the log storage server.
It should be noted that each message body here consists of three parts: a message type, a target, and the message content; a timestamp may also be added where appropriate, depending on the characteristics of the data source.
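A minimal Java sketch of the three-part message body, assuming it is modeled as an immutable record whose timestamp is optional. The names `MessageBody` and `MessageType` and the three enum constants are hypothetical; the text does not enumerate the concrete message types.

```java
import java.time.Instant;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical message types; the text does not enumerate them. */
enum MessageType { TRANSACTION_BEGIN, ROW_DATA, TRANSACTION_END }

/**
 * Three-part message body (type, target, content) plus an optional timestamp
 * that is only filled in when the data source provides one.
 */
record MessageBody(MessageType type, String target, String content, Instant timestamp) {

    MessageBody {
        // The three mandatory parts must be present; timestamp may be null.
        Objects.requireNonNull(type, "type");
        Objects.requireNonNull(target, "target");
        Objects.requireNonNull(content, "content");
    }

    /** For data sources that carry no timestamp. */
    MessageBody(MessageType type, String target, String content) {
        this(type, target, content, null);
    }

    Optional<Instant> timestampIfPresent() {
        return Optional.ofNullable(timestamp);
    }
}
```

Keeping the timestamp optional reflects the text's note that it is added only "as appropriate" for a given data source.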
Step 200: parse and filter the plurality of message bodies to obtain the data before and after the log is updated, the operation type, the database name of the operation, and the table name of the operation.
Specifically, the data in one transaction is parsed and filtered to obtain the data before and after the update, the operation type (update, delete, or insert), and the database name and table name of the operation. In addition, custom logic is applied to certain tables and fields that require special handling. The result is finally packaged into an object containing information such as the table type, the business primary key (the table's primary key), the operation time, the operation content, the operator, the operation source, and the operation IP address, and the object is stored in the HBase database.
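The parse, filter, and package step can be sketched as follows, assuming each row change arrives as before and after column maps. `RowChange`, `OperationLog`, `ChangeParser`, and the `db.table` whitelist are hypothetical names; the table type is simply taken to be the table name, and the "operation content" is assumed to be the list of changed columns. This is an illustration, not the patent's actual implementation.

```java
import java.time.Instant;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;
import java.util.StringJoiner;

enum OperationType { INSERT, UPDATE, DELETE }

/** One parsed row change: before/after column images (empty map when absent). */
record RowChange(String database, String table, OperationType type,
                 Map<String, String> before, Map<String, String> after) {}

/** Packaged log object; the field list follows the text, the names are illustrative. */
record OperationLog(String tableType, String businessKey, Instant operationTime,
                    String operationContent, String operator,
                    String operationSource, String operationIp) {}

final class ChangeParser {
    private final Set<String> monitoredTables; // configured "db.table" names
    private final String keyColumn;            // assumed primary-key column

    ChangeParser(Set<String> monitoredTables, String keyColumn) {
        this.monitoredTables = monitoredTables;
        this.keyColumn = keyColumn;
    }

    /** Drops changes to unmonitored tables; otherwise packages the change. */
    Optional<OperationLog> toLog(RowChange c, Instant time, String operator,
                                 String source, String ip) {
        if (!monitoredTables.contains(c.database() + "." + c.table())) {
            return Optional.empty(); // filtered out: table not configured for logging
        }
        // A DELETE has no after-image, so read the key from the before-image.
        Map<String, String> image = c.type() == OperationType.DELETE ? c.before() : c.after();
        // Assumption: the table name doubles as the "table type".
        return Optional.of(new OperationLog(c.table(), image.get(keyColumn), time,
                describe(c), operator, source, ip));
    }

    /** Lists only the columns whose values differ, as "col: old -> new". */
    static String describe(RowChange c) {
        Set<String> columns = new LinkedHashSet<>(c.before().keySet());
        columns.addAll(c.after().keySet());
        StringJoiner out = new StringJoiner("; ");
        for (String col : columns) {
            String oldValue = c.before().get(col);
            String newValue = c.after().get(col);
            if (!Objects.equals(oldValue, newValue)) {
                out.add(col + ": " + oldValue + " -> " + newValue);
            }
        }
        return out.toString();
    }
}
```

Because binlog row events carry both images, the diff can be computed without querying the source table again, which is the property the background section finds missing in table-reading collectors.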
Step 300: search the log according to the data before and after the log is updated, the operation type, the database name and table name of the operation, and the user's search conditions.
Specifically, data is read from HBase according to the search conditions submitted by the search page and returned to the front end for display to the user. The search page is implemented mainly with Velocity templates and JavaScript, and mainly provides table selection, a business primary key input box, keyword filtering, and time period selection. The corresponding data is retrieved from HBase according to the user's search criteria, paginated, and returned to the front-end page for display.
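A sketch of the search page's filtering and paging logic, with an in-memory list standing in for HBase so that the example runs without a cluster. `LogEntry`, `SearchQuery`, zero-based page numbers, and newest-first ordering are assumptions, not details given in the text.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** A stored log row (hypothetical subset of the packaged object). */
record LogEntry(String table, String businessKey, Instant time, String content) {}

/**
 * Search conditions from the page: table (required), business key and keyword
 * (null = any), time window [from, to), zero-based page number and page size.
 */
record SearchQuery(String table, String businessKey, String keyword,
                   Instant from, Instant to, int page, int pageSize) {}

final class LogSearch {
    /** Filters, orders newest first, and returns one page of results. */
    static List<LogEntry> search(List<LogEntry> logs, SearchQuery q) {
        return logs.stream()
                .filter(e -> e.table().equals(q.table()))
                .filter(e -> q.businessKey() == null || e.businessKey().equals(q.businessKey()))
                .filter(e -> q.keyword() == null || e.content().contains(q.keyword()))
                .filter(e -> !e.time().isBefore(q.from()) && e.time().isBefore(q.to()))
                .sorted(Comparator.comparing(LogEntry::time).reversed())
                .skip((long) q.page() * q.pageSize())
                .limit(q.pageSize())
                .collect(Collectors.toList());
    }
}
```

In a real deployment, the table, primary key, and time range would more likely be pushed down into an HBase row-key range scan, leaving only keyword filtering to be applied after the read.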
As can be seen from the above description, the embodiment of the present invention provides a method for processing logs in real time in a big data scenario: a plurality of message bodies are first collected from a log storage server; the message bodies are then parsed and filtered to obtain the data before and after the log is updated, the operation type, the database name of the operation, and the table name of the operation; and finally the log is searched according to this information and the user's search conditions. The invention requires no intrusion into business code and does not affect the online service system. It supports real-time processing of hundreds of millions of log records per day, and the log system can be quickly integrated into any system that uses a MySQL database, reducing integration cost. In addition, the invention can record massive amounts of historical operation data and query historical operation logs quickly and accurately.
In one embodiment, referring to FIG. 2, step 200 further includes:
Step 201: determine, according to the message body types, whether the plurality of message bodies belong to operations of the same transaction;
Step 202: if so, parse and filter the message bodies of the operations of that transaction to obtain the data before and after the log is updated, the operation type, the database name of the operation, and the table name of the operation.
In steps 201 and 202, whether the operations belong to the same transaction is first determined according to the type of each message body, and the messages of one transaction are processed together. The data in one transaction is parsed and filtered to obtain the data before and after the update, the operation type (update, delete, or insert), and the database name and table name of the operation. Custom logic is applied to certain tables and fields that require special handling, and the result is finally packaged into an object containing information such as the table type, the business primary key (the table's primary key), the operation time, the operation content, the operator, the operation source, and the operation IP address, and stored in the HBase database.
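The transaction grouping in step 201 can be sketched as below. The sketch assumes that messages arrive in binlog order with explicit begin and end markers, similar in spirit to the transaction-begin, row-data, and transaction-end entry types that Canal emits; `EntryType`, `Message`, and `TransactionGrouper` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical message kinds mirroring begin / row data / end markers. */
enum EntryType { TRANSACTION_BEGIN, ROW_DATA, TRANSACTION_END }

record Message(EntryType type, String payload) {}

final class TransactionGrouper {
    /**
     * Collects the ROW_DATA messages between each BEGIN and its END into one list
     * per transaction, so that a whole transaction is parsed and stored together.
     * Row data outside a BEGIN/END pair is ignored, and an unterminated trailing
     * transaction is not emitted (a streaming consumer would carry it over).
     */
    static List<List<Message>> group(List<Message> stream) {
        List<List<Message>> transactions = new ArrayList<>();
        List<Message> current = null; // null while outside a transaction
        for (Message m : stream) {
            switch (m.type()) {
                case TRANSACTION_BEGIN -> current = new ArrayList<>();
                case ROW_DATA -> {
                    if (current != null) current.add(m);
                }
                case TRANSACTION_END -> {
                    if (current != null) {
                        transactions.add(current);
                        current = null;
                    }
                }
            }
        }
        return transactions;
    }
}
```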
In one embodiment, referring to FIG. 3, step 300 further includes:
Step 301: package the data before and after the log is updated, the operation type, the database name of the operation, and the table name of the operation into an object;
Step 302: store the object in an HBase database.
HBase is a distributed, column-oriented storage system built on top of HDFS. It is suitable when real-time reads and writes and random access to very large data sets are required. HBase was designed from the ground up to solve the problem that traditional databases are difficult to scale horizontally: with HBase, horizontal scaling is achieved simply by adding nodes, which increases storage capacity, and very large sparse tables can be managed on clusters of inexpensive commodity hardware. Although there are many strategies and implementations for data storage and access, most solutions, especially relational ones, were not designed with very large scale and distribution in mind. Many enterprises scale their databases through replication and partitioning to break through the limits of a single node, but these capabilities are usually added as an afterthought and are complex to install and maintain. They also compromise certain RDBMS features, such as joins, complex queries, triggers, views, and foreign key constraints, which become very costly or even impossible to implement on a large RDBMS.
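The text does not specify how rows are keyed in HBase. One common layout, shown here purely as an assumption, is `table|businessKey|reversedTimestamp`: because HBase sorts rows lexicographically by row key, all operations on one record form a contiguous range, newest first, which suits the search page's table plus primary key lookups. `LogRowKey` is a hypothetical helper in plain Java; a real client would convert these strings to bytes for a put or scan.

```java
import java.time.Instant;

/**
 * Hypothetical row-key layout: table|businessKey|reversedTimestamp.
 * A fixed-width reversed timestamp makes newer operations sort first
 * within the prefix of one record.
 */
final class LogRowKey {
    private static final char SEP = '|';

    /** Assumes non-negative epoch millis and no '|' inside table or key. */
    static String of(String table, String businessKey, Instant operationTime) {
        long reversed = Long.MAX_VALUE - operationTime.toEpochMilli();
        // Zero-pad to 19 digits (width of Long.MAX_VALUE) so string order matches numeric order.
        return table + SEP + businessKey + SEP + String.format("%019d", reversed);
    }

    /** Inclusive start key for scanning every operation on one record. */
    static String scanStart(String table, String businessKey) {
        return table + SEP + businessKey + SEP;
    }

    /** Exclusive stop key: '}' is the character right after '|', so it closes the prefix. */
    static String scanStop(String table, String businessKey) {
        return table + SEP + businessKey + (char) (SEP + 1);
    }
}
```

The separator must not appear inside table names or business keys; a production design might hash or escape the key parts instead.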
It can be understood that the technical effects of the present invention can also be achieved by using Elasticsearch instead of HBase.
Step 303: search the log in the HBase database according to the search conditions.
HBase approaches the scalability problem from a different angle: it was built from the ground up to scale linearly by adding nodes. HBase is not a relational database and does not support SQL, but it has strengths that an RDBMS lacks: it can host large, sparse tables on clusters of commodity servers. HBase is an open-source implementation of Google Bigtable. Just as Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS; just as Bigtable relies on MapReduce to process its massive data, HBase uses Hadoop MapReduce; and just as Bigtable uses Chubby as its coordination service, HBase uses ZooKeeper.
In one embodiment, referring to FIG. 4, step 100 further includes:
Step 101: collect the plurality of message bodies by using the Canal middleware.
The open-source middleware Canal is chosen to collect data from the database. Its advantages are as follows: Canal masquerades as a MySQL slave and listens to the binary log (binlog) of the MySQL master, so no code intrusion is required; data can be synchronized in near real time; and even with massive data volumes, it adds little load to the database and does not affect online services.
Step 101 is implemented as follows: Canal simulates the MySQL slave interaction protocol, masquerades as a MySQL slave, and sends a dump request to the MySQL master; on receiving the dump request, the MySQL master starts pushing the binary log to the slave (that is, Canal); Canal then parses the binary log objects (originally a byte stream).
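Canal's Java client consumes binlog data in batches that are explicitly acknowledged or rolled back, so the consumption position only advances after a batch has been processed. The sketch below shows that pattern against a hypothetical `BatchSource` interface (modeled on, but not identical to, the Canal client's connector), with an in-memory `InMemorySource` so that it runs without MySQL or Canal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** A pulled batch; its id is what gets acked or rolled back. */
record Batch(long id, List<String> entries) {}

/** Hypothetical pull/ack/rollback contract in the style of a Canal client connection. */
interface BatchSource {
    Batch getWithoutAck(int maxSize); // empty batch when nothing is pending
    void ack(long batchId);           // commit: the position moves past this batch
    void rollback(long batchId);      // reject: the batch will be delivered again
}

final class ConsumerLoop {
    /** Pulls until drained; acks only after the handler succeeds. Returns acked batch count. */
    static int drain(BatchSource source, int batchSize, Consumer<List<String>> handler) {
        int acked = 0;
        while (true) {
            Batch batch = source.getWithoutAck(batchSize);
            if (batch.entries().isEmpty()) return acked;
            try {
                handler.accept(batch.entries());
                source.ack(batch.id());
                acked++;
            } catch (RuntimeException e) {
                source.rollback(batch.id()); // position unchanged, so nothing is skipped
                throw e;
            }
        }
    }
}

/** In-memory stand-in used to exercise the loop. */
final class InMemorySource implements BatchSource {
    private final List<String> pending;
    private int position = 0; // committed (acked) position
    private int inFlight = 0; // size of the outstanding unacked batch
    private long nextId = 1;

    InMemorySource(List<String> entries) { this.pending = new ArrayList<>(entries); }

    public Batch getWithoutAck(int maxSize) {
        int end = Math.min(position + maxSize, pending.size());
        inFlight = end - position;
        return new Batch(nextId++, List.copyOf(pending.subList(position, end)));
    }

    public void ack(long batchId) { position += inFlight; inFlight = 0; }

    public void rollback(long batchId) { inFlight = 0; }

    int position() { return position; }
}
```

Acknowledging only after the handler returns gives at-least-once delivery: a crash between storing and acking replays the batch, so writes to the log store should be idempotent (for example, keyed by a deterministic row key).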
In an embodiment, referring to FIG. 5, before step 100, the method for processing logs in real time in a big data scenario further includes:
Step 400: configure, according to the current service, the database address of the log and the tables whose operations are to be recorded.
Specifically, the database address and the corresponding tables for which operation logs are recorded are configured according to the service characteristics, and a ZooKeeper address is configured so that the service connects to the ZooKeeper registry. ZooKeeper records the consumption position of the MySQL binlog, that is, the data update time point of the MySQL operation currently being processed. This is used to monitor data processing: if there is a delay, the position that processing has reached can be determined. The Server side collects data from MySQL and sends it to the Client side.
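Step 400's configuration can be sketched as a small value object that holds the database and ZooKeeper addresses and decides which `schema.table` names are recorded. The comma-separated regular-expression format loosely follows Canal's table-filter convention, but `LogSourceConfig` and its constructor are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Hypothetical per-service configuration: source database address,
 * ZooKeeper address, and comma-separated "schema.table" regular expressions.
 */
final class LogSourceConfig {
    final String dbAddress;
    final String zkAddress;
    private final List<Pattern> tablePatterns;

    LogSourceConfig(String dbAddress, String zkAddress, String tableFilter) {
        this.dbAddress = dbAddress;
        this.zkAddress = zkAddress;
        // Compile each non-empty entry once, up front.
        this.tablePatterns = Arrays.stream(tableFilter.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(Pattern::compile)
                .collect(Collectors.toList());
    }

    /** True if operations on schema.table should be recorded (full-string match). */
    boolean isMonitored(String schema, String table) {
        String name = schema + "." + table;
        return tablePatterns.stream().anyMatch(p -> p.matcher(name).matches());
    }
}
```

Using full matches rather than substring matches keeps a pattern such as `shop\.product` from accidentally selecting `shop.product_backup`.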
It can be understood that ZooKeeper is a distributed, open-source coordination service for distributed applications, an open-source implementation of Google's Chubby, and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming services, distributed synchronization, and group services. ZooKeeper reaches agreement with its Zab (ZooKeeper Atomic Broadcast) protocol, which, like Multi-Paxos, relies on an elected leader. Basic Paxos has a livelock problem: when multiple servers submit proposals in an interleaved fashion, it is possible that none of them succeeds. Electing a leader avoids this, because only the leader can submit proposals.
In a specific embodiment, the present invention further provides a specific implementation of the method for processing logs in real time in a big data scenario which, referring to FIG. 6, includes the following.
Referring to FIG. 7, an architecture of a log system is first provided. The architecture is divided into a Canal Server service, a Canal Client service, and a log search service.
The Canal Server service replaces the Logstash service of the prior art and provides data collection: it collects data from MySQL and sends it to the Canal Client service. The Canal Client service parses, filters, and otherwise processes the collected data according to the service characteristics, and then stores it in the HBase database. The web application of the log search service provides an operation page for searching real-time logs quickly and accurately.
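The division of labor among the three services can be illustrated in a single process: a producer thread plays the Canal Server role, a consumer thread plays the Canal Client role, a `BlockingQueue` stands in for the network link, and a map stands in for HBase. Everything here, including the `rowKey=content` message format and the end-of-stream sentinel, is a hypothetical simplification rather than the deployed services.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

/** In-process sketch of the Server / Client / search roles; all names are hypothetical. */
final class LogPipeline {
    private static final String END = "__END__"; // sentinel that stops the client

    /** Store standing in for HBase: row key -> stored content. */
    final Map<String, String> store = new ConcurrentHashMap<>();

    /** Runs the "server" (producer) and "client" (consumer) threads and waits for both. */
    void run(List<String> capturedChanges) throws InterruptedException {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(16);
        Thread server = new Thread(() -> {
            try {
                for (String change : capturedChanges) channel.put(change); // collect and forward
                channel.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread client = new Thread(() -> {
            try {
                for (String msg = channel.take(); !msg.equals(END); msg = channel.take()) {
                    String[] parts = msg.split("=", 2);                   // parse "rowKey=content"
                    if (parts.length == 2) store.put(parts[0], parts[1]); // filter malformed, then store
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        server.start();
        client.start();
        server.join();
        client.join();
    }

    /** Search role: read back by row key. */
    String find(String rowKey) {
        return store.get(rowKey);
    }
}
```

The bounded queue gives natural back-pressure: if parsing falls behind, the collecting side blocks instead of buffering without limit.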
It should be noted that the backend services of the framework are implemented in Java. Because the framework can be extended horizontally to other applications and is applicable to any scenario that requires recording MySQL operation history, specific databases and tables are not described below. The invention chooses the open-source middleware Canal to collect data from the database. Its advantages are as follows: Canal masquerades as a MySQL slave and listens to the binary log (binlog) of the MySQL master, so no code intrusion is required; data can be synchronized in near real time; and even with massive data volumes, it adds little load to the database and does not affect online services.
Based on the above log system architecture, the log real-time processing method in the big data scenario provided by the present embodiment includes the following steps:
S0: configure, according to the current service, the database address of the log and the tables whose operations are to be recorded.
The Canal Server service configures, according to the service characteristics, the database address and the corresponding tables for which operation logs are recorded, and configures a ZooKeeper address to connect to the ZooKeeper registry. ZooKeeper records the consumption position of the MySQL binlog, that is, the data update time point of the MySQL operation currently being processed. This is used to monitor data processing: if there is a delay, the position that processing has reached can be determined. The Server side collects data from MySQL and sends it to the Client side.
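The delay check described above can be sketched by comparing the execution time of the last processed binlog event, as recorded with the consumption position, against the current time. `BinlogPosition`, `LagMonitor`, and the threshold value are assumptions for illustration; the text does not say how the delay is computed.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical recorded position: binlog file, offset, and the event's execution time. */
record BinlogPosition(String journalName, long offset, Instant executeTime) {}

final class LagMonitor {
    private final Duration threshold;

    LagMonitor(Duration threshold) {
        this.threshold = threshold;
    }

    /** Lag = now minus the execution time of the last processed event, never negative. */
    Duration lag(BinlogPosition last, Instant now) {
        Duration d = Duration.between(last.executeTime(), now);
        return d.isNegative() ? Duration.ZERO : d;
    }

    /** True when processing has fallen further behind than the configured threshold. */
    boolean isDelayed(BinlogPosition last, Instant now) {
        return lag(last, now).compareTo(threshold) > 0;
    }
}
```

Clamping to zero guards against small clock differences between the MySQL host and the monitoring host producing a negative lag.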
S1: collect a plurality of message bodies from the log storage server.
The Canal Server service collects data from MySQL and sends it to the Canal Client service.
S2: Parse and filter the plurality of message bodies to obtain the data before and after the log update, the operation type, the name of the database operated on, and the name of the table operated on.
The Client service receives the data sent by the Server. It first determines, according to the message body type, whether the messages belong to the same transaction, and processes the messages of one transaction together. The data within a transaction is then parsed and filtered to obtain the pre-update and post-update data, the operation type (update, delete or insert), and the database name and table name of the operation. Tables and fields that require special handling are processed with custom logic. Finally, the result is packaged into an object containing information such as the table type, business primary key (table primary key), operation time, operation content, operator, operation source and operation IP, and stored in an HBase database.
S3: Search the log according to the data before and after the log update, the operation type, the database name and table name of the operation, and the user's search conditions.
The log search service reads data from HBase according to the search conditions submitted from the search page and returns it to the front end for display. The search page is implemented mainly with Velocity templates and JavaScript and provides table selection, a business primary key input box, keyword filtering and time period selection. The service looks up the matching data in HBase according to the user's search terms, pages the results, and returns them to the front-end page for display.
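The transaction grouping, parsing and filtering of S2 can be sketched as follows. `BodyKind`, `MessageBody`, `RowChange` and `OperationLog` are hypothetical simplifications that loosely mirror Canal's transaction-begin, row-data and transaction-end entries; they are not Canal's classes. The primary-key column `id` and operator column `operator` are assumed names, operation source and IP are omitted, and updates that change no column are filtered out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

enum BodyKind { TRANSACTION_BEGIN, ROW_DATA, TRANSACTION_END }

enum OpType { INSERT, UPDATE, DELETE }

// One changed row: column -> value before and after the operation.
record RowChange(OpType op, Map<String, String> before, Map<String, String> after) {}

// Simplified message body as received by the Client.
record MessageBody(BodyKind kind, String schema, String table, long executeTime,
                   List<RowChange> rows) {}

record Change(String before, String after) {}

// Packaged object destined for HBase.
record OperationLog(String database, String table, String primaryKey, String operator,
                    OpType op, long operateTime, Map<String, Change> changes) {}

final class TransactionParser {
    static final String PK_COLUMN = "id";             // hypothetical column name
    static final String OPERATOR_COLUMN = "operator"; // hypothetical column name

    // Buffers ROW_DATA between BEGIN and END so each transaction is emitted as
    // one batch. Rows outside a BEGIN/END pair are ignored in this sketch.
    static List<List<OperationLog>> parse(List<MessageBody> bodies) {
        List<List<OperationLog>> transactions = new ArrayList<>();
        List<OperationLog> current = null;
        for (MessageBody b : bodies) {
            switch (b.kind()) {
                case TRANSACTION_BEGIN -> current = new ArrayList<>();
                case ROW_DATA -> {
                    if (current != null) {
                        for (RowChange rc : b.rows()) {
                            OperationLog log = toLog(b, rc);
                            // Filter: drop updates that changed no column.
                            if (!log.changes().isEmpty()) current.add(log);
                        }
                    }
                }
                case TRANSACTION_END -> {
                    if (current != null) transactions.add(current);
                    current = null;
                }
            }
        }
        return transactions;
    }

    // Diffs before/after column by column; DELETE carries only a before-image,
    // so the primary key and operator are read from it in that case.
    static OperationLog toLog(MessageBody b, RowChange rc) {
        Map<String, String> image = rc.op() == OpType.DELETE ? rc.before() : rc.after();
        Set<String> columns = new LinkedHashSet<>(rc.before().keySet());
        columns.addAll(rc.after().keySet());
        Map<String, Change> changes = new LinkedHashMap<>();
        for (String col : columns) {
            String before = rc.before().get(col);
            String after = rc.after().get(col);
            if (!Objects.equals(before, after)) changes.put(col, new Change(before, after));
        }
        return new OperationLog(b.schema(), b.table(), image.get(PK_COLUMN),
                image.get(OPERATOR_COLUMN), rc.op(), b.executeTime(), changes);
    }
}
```

Buffering until the transaction-end marker means a transaction's rows are stored together, and a transaction whose end marker has not yet arrived is not emitted.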
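The search conditions and paging described above can be sketched in Java over an in-memory list. `LogRecord`, `LogQuery`, `LogPage` and `LogSearch` are hypothetical names; against HBase the same conditions would more likely be mapped onto a row-key range scan plus filters, which is an assumption since the row-key design is not given in the text. Results are returned newest first, with 1-based page numbers.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical stored record; in the described system it is read from HBase.
record LogRecord(String table, String primaryKey, long operateTime, String content) {}

// Conditions from the search page: table, optional business primary key,
// optional keyword, time window [from, to], and 1-based paging.
record LogQuery(String table, String primaryKey, String keyword,
                long from, long to, int page, int pageSize) {}

record LogPage(List<LogRecord> items, int total) {}

final class LogSearch {
    static LogPage search(List<LogRecord> store, LogQuery q) {
        if (q.page() < 1 || q.pageSize() < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        List<LogRecord> hits = store.stream()
                .filter(r -> r.table().equals(q.table()))
                .filter(r -> q.primaryKey() == null || r.primaryKey().equals(q.primaryKey()))
                .filter(r -> r.operateTime() >= q.from() && r.operateTime() <= q.to())
                .filter(r -> q.keyword() == null || r.content().contains(q.keyword()))
                .sorted(Comparator.comparingLong(LogRecord::operateTime).reversed())
                .toList();
        // Clamp the page window so an out-of-range page returns an empty list.
        int start = Math.min((q.page() - 1) * q.pageSize(), hits.size());
        int end = Math.min(start + q.pageSize(), hits.size());
        return new LogPage(hits.subList(start, end), hits.size());
    }
}
```

Returning the total hit count alongside the page lets the front end render page navigation without issuing a second query.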
As can be seen from the above description, an embodiment of the present invention provides a log real-time processing method in a big data scenario, which first collects a plurality of message bodies from a log storage server; then parses and filters them to obtain the data before and after the log update, the operation type, the database name of the operation and the table name corresponding to the operation; and finally searches the log according to these items and the user's search conditions. The main aim of the invention is to accurately process massive log data, on the order of billions of records, without intruding into the business system code. The processed data include the values before and after a change, the operator, the operation time and the corresponding database table, and a web page is provided that supports real-time, fast and accurate search of the operation history of related information such as commodities.
Based on the same inventive concept, an embodiment of the present application further provides a log real-time processing apparatus in a big data scenario, which can be used to implement the method described in the foregoing embodiment, as described in the following embodiments. Because the apparatus solves the problem on a principle similar to that of the method, its implementation can refer to the implementation of the method, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
An embodiment of the present invention provides a specific implementation of a log real-time processing apparatus in a big data scenario, which can implement the log real-time processing method in a big data scenario. Referring to fig. 8, the apparatus specifically includes the following:
a message body collecting module 10, configured to collect a plurality of message bodies from the log storage server;
a message body parsing module 20, configured to parse and filter the message bodies to obtain the data before and after the log update, the operation type, the database name of the operation and the table name corresponding to the operation;
and a log search module 30, configured to search the log according to the data before and after the log update, the operation type, the database name and table name of the operation, and the user's search conditions.
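The division into modules 10, 20 and 30 can be expressed as Java interfaces wired in the order of S1 to S3. This is a hypothetical sketch: the interface names, the generic types `M` (raw message body) and `L` (parsed log record), and the in-memory searcher standing in for HBase are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Module 10: collects raw message bodies.
interface MessageBodyCollector<M> {
    List<M> collect();
}

// Module 20: parses and filters message bodies into log records.
interface MessageBodyParser<M, L> {
    List<L> parseAndFilter(List<M> bodies);
}

// Module 30: stores log records and answers search conditions.
interface LogSearcher<L> {
    void store(List<L> logs);
    List<L> search(Predicate<L> condition);
}

// In-memory stand-in for the HBase-backed search module.
final class InMemoryLogSearcher<L> implements LogSearcher<L> {
    private final List<L> stored = new ArrayList<>();

    @Override
    public void store(List<L> logs) {
        stored.addAll(logs);
    }

    @Override
    public List<L> search(Predicate<L> condition) {
        return stored.stream().filter(condition).toList();
    }
}

// Wires module 10 -> module 20 -> module 30.
final class LogProcessingDevice<M, L> {
    private final MessageBodyCollector<M> collector;
    private final MessageBodyParser<M, L> parser;
    private final LogSearcher<L> searcher;

    LogProcessingDevice(MessageBodyCollector<M> collector,
                        MessageBodyParser<M, L> parser,
                        LogSearcher<L> searcher) {
        this.collector = collector;
        this.parser = parser;
        this.searcher = searcher;
    }

    // One round: collect (S1), parse and filter (S2), store for searching (S3).
    void runOnce() {
        searcher.store(parser.parseAndFilter(collector.collect()));
    }

    List<L> search(Predicate<L> condition) {
        return searcher.search(condition);
    }
}
```

Keeping each module behind an interface allows, for example, the in-memory searcher to be replaced by an HBase-backed one without changing the collection or parsing modules.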
In one embodiment, referring to fig. 9, the message body parsing module 20 includes:
a message body judging unit 201, configured to judge, according to the message body type, whether the plurality of message bodies belong to operations of the same transaction;
and a message body parsing unit 202, configured to, if so, parse and filter the message bodies belonging to the same transaction to obtain the data before and after the log update, the operation type, the database name of the operation and the table name corresponding to the operation.
In one embodiment, referring to fig. 10, the log search module 30 includes:
an object generating unit 301, configured to encapsulate data before and after the log is updated, the operation type, the database name of the operation, and the table name corresponding to the operation into an object;
an object storage unit 302, configured to store the object in an HBase database;
a log searching unit 303, configured to search the log in the HBase database according to the search condition;
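The text does not specify how objects are keyed when stored in HBase. One common approach, shown here purely as a hypothetical sketch, is a row key of the form `<table>|<primaryKey>|<Long.MAX_VALUE - operateTime>` (zero-padded to 19 digits). Because HBase orders rows by key bytes, all changes to one record become adjacent and the newest change sorts first, so a record's history can be read with a prefix range scan. The separator `|` is an assumption and must not occur in table names or primary keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class LogRowKey {
    private static final char SEP = '|'; // hypothetical separator

    // Row key = prefix + reversed timestamp, so newer changes sort first.
    static String rowKey(String table, String primaryKey, long operateTimeMillis) {
        long reversed = Long.MAX_VALUE - operateTimeMillis;
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return prefix(table, primaryKey) + String.format(Locale.ROOT, "%019d", reversed);
    }

    // Inclusive start row for scanning one record's history.
    static String prefix(String table, String primaryKey) {
        return table + SEP + primaryKey + SEP;
    }

    // Exclusive stop row: same prefix with the last separator incremented ('|' -> '}').
    static String stopRow(String table, String primaryKey) {
        return table + SEP + primaryKey + (char) (SEP + 1);
    }

    // Recovers the operation time from the last 19 characters of the key.
    static long operateTime(String rowKey) {
        long reversed = Long.parseLong(rowKey.substring(rowKey.length() - 19));
        return Long.MAX_VALUE - reversed;
    }

    // HBase row keys are byte arrays; UTF-8 preserves ASCII ordering.
    static byte[] toBytes(String rowKey) {
        return rowKey.getBytes(StandardCharsets.UTF_8);
    }
}
```

The usage check below uses a `TreeMap` to imitate HBase's sorted rows: a `subMap(prefix, stopRow)` range over record `item|7` returns its changes newest first and excludes other records.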
In one embodiment, referring to fig. 11, the message body collecting module 10 includes:
a message body acquisition unit 101, configured to acquire the plurality of message bodies by using Canal.
In one embodiment, referring to fig. 12, the log real-time processing apparatus in the big data scenario further includes:
a record table configuration unit 40, configured to configure, according to the current service, the database address of the log and the record table corresponding to the log.
As can be seen from the above description, an embodiment of the present invention provides a log real-time processing apparatus in a big data scenario, which first collects a plurality of message bodies from a log storage server; then parses and filters them to obtain the data before and after the log update, the operation type, the database name of the operation and the table name corresponding to the operation; and finally searches the log according to these items and the user's search conditions. The invention involves no code intrusion and does not affect online service systems, supports real-time processing of hundreds of millions of log records per day, and allows any system that uses a MySQL database to be connected to the log system quickly, reducing the cost of integration. In addition, the invention can record massive historical operation data and query historical operation logs quickly and accurately.
An embodiment of the present application further provides a specific implementation of an electronic device capable of implementing all steps of the log real-time processing method in a big data scenario in the foregoing embodiment. Referring to fig. 13, the electronic device specifically includes the following:
a processor 1201, a memory 1202, a communication interface 1203 and a bus 1204;
the processor 1201, the memory 1202 and the communication interface 1203 communicate with each other through the bus 1204; the communication interface 1203 is used for information transmission between related devices such as server-side devices and client-side devices;
the processor 1201 is configured to call the computer program in the memory 1202, and when executing the computer program the processor implements all steps of the log real-time processing method in a big data scenario in the above embodiments, for example the following steps:
step 100: collecting a plurality of message bodies from a log storage server;
step 200: analyzing and filtering the plurality of message bodies to obtain data before and after the log is updated, operation types, database names of the operations and table names corresponding to the operations;
step 300: searching the log according to the data before and after the log is updated, the operation type, the database name and table name of the operation, and the user's search conditions.
An embodiment of the present application further provides a computer-readable storage medium capable of implementing all steps of the log real-time processing method in a big data scenario in the foregoing embodiment. The computer-readable storage medium stores a computer program which, when executed by a processor, implements all steps of that method, for example the following steps:
step 100: collecting a plurality of message bodies from a log storage server;
step 200: analyzing and filtering the plurality of message bodies to obtain data before and after the log is updated, operation types, database names of the operations and table names corresponding to the operations;
step 300: searching the log according to the data before and after the log is updated, the operation type, the database name and table name of the operation, and the user's search conditions.
The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the hardware-plus-program embodiment is substantially similar to the method embodiment and is therefore described briefly; for relevant points, refer to the corresponding description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Although the present application provides method steps as described in the embodiments or flowcharts, more or fewer steps may be included based on conventional or non-inventive effort. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. When an actual apparatus or client product executes the steps, they may be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the methods shown in the embodiments or figures.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, in implementing the embodiments of the present description, the functions of each module may be implemented in one or more software and/or hardware, or a module implementing the same function may be implemented by a combination of multiple sub-modules or sub-units, and the like. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
The embodiments of this specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The described embodiments may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is substantially similar to the method embodiment and is therefore described briefly; for relevant points, refer to the corresponding description of the method embodiment. In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of this specification. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where there is no contradiction, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.
The above description is only an example of the embodiments of the present disclosure, and is not intended to limit the embodiments of the present disclosure. Various modifications and variations to the embodiments described herein will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the embodiments of the present specification should be included in the scope of the claims of the embodiments of the present specification.

Claims (10)

1. A log real-time processing method in a big data scene is characterized by comprising the following steps:
collecting a plurality of message bodies from a log storage server;
analyzing and filtering the plurality of message bodies to obtain data before and after the log is updated, operation types, database names of the operations and table names corresponding to the operations;
and searching the log according to the data before and after the log is updated, the operation type, the database name and the table name of the operation and the search condition of the user.
2. The log real-time processing method according to claim 1, wherein parsing and filtering the plurality of message bodies to obtain the data before and after the log update, the operation types, the database names of the operations and the table names corresponding to the operations comprises:
judging, according to the message body types, whether the plurality of message bodies belong to operations of the same transaction;
if so, parsing and filtering the plurality of message bodies belonging to the same transaction to obtain the data before and after the log is updated, the operation type, the database name of the operation and the table name corresponding to the operation.
3. The log real-time processing method according to claim 1, wherein searching the log according to the data before and after the log is updated, the operation type, the database name and table name of the operation, and the search condition of the user comprises:
packaging the data before and after the log is updated, the operation type, the database name of the operation and the table name corresponding to the operation into an object;
storing the object into an HBase database;
and searching the log in the HBase database according to the search condition.
4. The log real-time processing method of claim 1, wherein collecting a plurality of message bodies from the log storage server comprises:
collecting the plurality of message bodies by using the Canal middleware.
5. The log real-time processing method of claim 1, further comprising, before collecting the plurality of message bodies from the log storage server:
configuring, according to the current service, a database address of the log and a record table corresponding to the log.
6. A log real-time processing device under a big data scene is characterized by comprising:
the message body acquisition module is used for acquiring a plurality of message bodies from the log storage server;
the message body parsing module is used for parsing and filtering the message bodies to obtain the data before and after the log is updated, the operation type, the database name of the operation and the table name corresponding to the operation;
and the log search module is used for searching the log according to the data before and after the log is updated, the operation type, the database name and table name of the operation, and the search condition of the user.
7. The log real-time processing apparatus of claim 6, wherein the message body parsing module comprises:
the message body judging unit is used for judging, according to the message body types, whether the message bodies belong to operations of the same transaction;
and the message body parsing unit is used for, if so, parsing and filtering the message bodies belonging to the same transaction to obtain the data before and after the log is updated, the operation type, the database name of the operation and the table name corresponding to the operation.
8. The log real-time processing apparatus of claim 7, wherein the log search module comprises:
the object generating unit is used for packaging the data before and after the log is updated, the operation type, the database name of the operation and the table name corresponding to the operation into an object;
the object storage unit is used for storing the object into an HBase database;
the log searching unit is used for searching the log in the HBase database according to the searching condition;
the message body acquisition module comprises:
the message body acquisition unit is used for acquiring the plurality of message bodies by using Canal;
the log real-time processing device under the big data scene further comprises:
and the record table configuration unit is used for configuring, according to the current service, the database address of the log and the record table corresponding to the log.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the log real-time processing method in big data scenario as claimed in any one of claims 1 to 5 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements the steps of the log real-time processing method in big data scenario as claimed in any one of claims 1 to 5.
CN202110993417.2A 2021-08-27 2021-08-27 Log real-time processing method and device in big data scene Pending CN113672668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110993417.2A CN113672668A (en) 2021-08-27 2021-08-27 Log real-time processing method and device in big data scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110993417.2A CN113672668A (en) 2021-08-27 2021-08-27 Log real-time processing method and device in big data scene

Publications (1)

Publication Number Publication Date
CN113672668A true CN113672668A (en) 2021-11-19

Family

ID=78547066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110993417.2A Pending CN113672668A (en) 2021-08-27 2021-08-27 Log real-time processing method and device in big data scene

Country Status (1)

Country Link
CN (1) CN113672668A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114826944A (en) * 2022-04-20 2022-07-29 中科嘉速(北京)信息技术有限公司 Website operation analysis system and method based on ELK and canal technologies
CN114970479A (en) * 2022-07-29 2022-08-30 飞狐信息技术(天津)有限公司 Chart generation method and device

Similar Documents

Publication Publication Date Title
CN109918349B (en) Log processing method, log processing device, storage medium and electronic device
Tas et al. An approach to standalone provenance systems for big social provenance data
CN107103064B (en) Data statistical method and device
US9842134B2 (en) Data query interface system in an event historian
Kumar et al. 2scent: An efficient algorithm to enumerate all simple temporal cycles
CN113360554B (en) Method and equipment for extracting, converting and loading ETL (extract transform load) data
CN110675194A (en) Funnel analysis method, device, equipment and readable medium
CN113672668A (en) Log real-time processing method and device in big data scene
CN107016039B (en) Database writing method and database system
US20160170834A1 (en) Block data storage system in an event historian
CN112148578A (en) IT fault defect prediction method based on machine learning
CN113791586A (en) Novel industrial APP and identification registration analysis integration method
US9892275B2 (en) Data encryption in a multi-tenant cloud environment
CN113360581A (en) Data processing method, device and storage medium
US10826965B2 (en) Network monitoring to identify network issues
CN113760845A (en) Log processing method, system, device, client and storage medium
CN113672692A (en) Data processing method, data processing device, computer equipment and storage medium
Namiot et al. On data stream processing in IoT applications
Shakhovska et al. Big Data information technology and data space architecture
CN113220530B (en) Data quality monitoring method and platform
CN115809311A (en) Data processing method and device of knowledge graph and computer equipment
CN114490865A (en) Database synchronization method, device, equipment and computer storage medium
Chardonnens Big data analytics on high velocity streams
US10579601B2 (en) Data dictionary system in an event historian
US11100077B2 (en) Event table management using type-dependent portions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination