CN106776855A - Processing method for reading Kafka data based on Spark Streaming - Google Patents


Info

Publication number
CN106776855A
Authority
CN
China
Prior art keywords
kafka
data
sparkstreaming
read
failure
Legal status
Granted
Application number
CN201611069230.9A
Other languages
Chinese (zh)
Other versions
CN106776855B (en)
Inventor
程永新
谢涛
王仁铮
Current Assignee
Shanghai Qingwei Software Co Ltd
Original Assignee
Shanghai Qingwei Software Co Ltd
Priority date
2016-11-29
Filing date
2016-11-29
Publication date
2017-05-31
Application filed by Shanghai Qingwei Software Co Ltd
Priority to CN201611069230.9A
Publication of CN106776855A
Application granted
Publication of CN106776855B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a processing method for reading Kafka data based on Spark Streaming, comprising the following steps: S1) storing data in topics using Kafka; S2) using Spark Streaming to cut the real-time input data stream into blocks, with a time slice as the unit; S3) setting the SparkStreaming backfill scheduling times in advance according to the Kafka data failure records; S4) monitoring in real time the process by which SparkStreaming reads Kafka data; S5) re-reading the Kafka data through SparkStreaming. The invention sets the SparkStreaming backfill scheduling times according to the Kafka data failure records, monitors the reading process in real time, and re-reads the failed records to backfill them, so that zero data loss can be guaranteed more flexibly and conveniently.

Description

Processing method for reading Kafka data based on Spark Streaming
Technical field
The present invention relates to a Kafka data processing method, and more particularly to a processing method for reading Kafka data based on Spark Streaming.
Background art
Spark Streaming decomposes a streaming computation into a series of short batch jobs, with Spark as the batch engine. The input data of Spark Streaming is divided into segments according to the batch size (for example, 1 second), forming a discretized stream (DStream), and each segment is converted into an RDD (Resilient Distributed Dataset) in Spark. Transformation operations on a DStream in Spark Streaming thereby become Transformation operations on RDDs in Spark, which turn the RDDs into intermediate results kept in memory. Depending on business requirements, the streaming computation can accumulate these intermediate results or store them to external devices. Fig. 1 shows the overall flow of Spark Streaming.
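The batching idea can be illustrated with a toy model. The Python sketch below is not Spark code: it assumes records arrive as hypothetical `(timestamp, payload)` pairs, groups them into batches by `batch_interval` (playing the role of the batch size), and treats each batch list as a stand-in for one RDD, so a transformation on the whole stream is simply the same function applied to every batch.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[float, str]  # (arrival time in seconds, payload); hypothetical shape


def discretize(records: Iterable[Record], batch_interval: float) -> Dict[int, List[str]]:
    """Split a stream into micro-batches keyed by batch index (toy stand-in for a DStream)."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in records:
        batches[int(ts // batch_interval)].append(payload)
    return dict(batches)


def map_batches(batches: Dict[int, List[str]], fn: Callable[[str], str]) -> Dict[int, List[str]]:
    """A transformation on the stream is the same transformation applied to every batch."""
    return {idx: [fn(x) for x in batch] for idx, batch in batches.items()}


stream = [(0.1, "a"), (0.9, "b"), (1.2, "c"), (2.5, "d")]
batches = discretize(stream, batch_interval=1.0)
upper = map_batches(batches, str.upper)
```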
Kafka is a distributed publish-subscribe messaging system. It was originally developed by LinkedIn and later became part of the Apache project. Kafka is a distributed, partitionable, replicated, persistent log service, used mainly to process active streaming data, as shown in Fig. 2.
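Re-reading lost data later depends on Kafka keeping messages in per-partition logs addressed by offset. The toy class below (the name `PartitionedLog` and its methods are hypothetical, not Kafka's client API) models a topic as a set of append-only lists; it ignores brokers, replication and retention, and only shows that reading an offset range does not remove the data, so the same range can be read again.

```python
from typing import Dict, List


class PartitionedLog:
    """Toy model of a Kafka topic: each partition is an append-only list indexed by offset."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: Dict[int, List[str]] = {p: [] for p in range(num_partitions)}

    def append(self, partition: int, message: str) -> int:
        """Append a message and return the offset it was written at."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition: int, from_offset: int, until_offset: int) -> List[str]:
        """Return messages in [from_offset, until_offset); nothing is deleted by reading."""
        return self.partitions[partition][from_offset:until_offset]


topic = PartitionedLog(num_partitions=2)
for i, msg in enumerate(["m0", "m1", "m2", "m3"]):
    topic.append(i % 2, msg)  # round-robin placement, chosen only for this example

first_read = topic.read(0, 0, 2)
second_read = topic.read(0, 0, 2)  # the same range can be read again
```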
As is well known, the big data era places ever higher demands on the timeliness, stability and accuracy of data processing. A currently popular combined architecture connects SparkStreaming to Kafka, using SparkStreaming's in-memory iterative computation and Kafka's highly concurrent data distribution to achieve real-time data processing. However, when SparkStreaming consumes data from Kafka, data loss can still occur, as follows:
1. Two Executors receive input data from the receiver and cache it in their memory.
2. The receiver notifies the input source that the data has been received.
3. The Executors start processing the cached data according to the application code.
4. At this moment, the Driver suddenly fails.
5. By design, once the Driver fails, all the Executors it maintains are killed as well.
6. Since all the Executors are killed, the data cached in their memory is also lost. As a result, data that has already been acknowledged to the source but not yet processed is lost.
7. The cached data cannot be recovered, because it was buffered only in the Executors' memory, so the data is lost.
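The failure sequence above can be reduced to a small simulation. In this sketch (plain Python; the function and parameter names are hypothetical), the source tracks a committed position. With `ack_on_receive=True` the position advances as soon as the batch is buffered, as with the receiver in steps 1 and 2, and the buffer is then wiped as in steps 4 to 6. Comparing this with acknowledging only after processing shows which records disappear.

```python
from typing import List


def consume_with_crash(source: List[str], batch_size: int, ack_on_receive: bool) -> List[str]:
    """Simulate one driver crash followed by a restart; return the records that get processed."""
    committed = 0  # position the source believes has already been consumed
    processed: List[str] = []

    # Steps 1-2: a batch is buffered in executor memory; a receiver acknowledges it at once.
    buffer = source[committed:committed + batch_size]
    if ack_on_receive:
        committed += len(buffer)

    # Steps 4-6: the driver fails before the batch is processed, the executors are
    # killed, and their in-memory buffer is gone.
    buffer.clear()

    # After the restart, consumption resumes from the committed position.
    processed.extend(source[committed:])
    return processed


source = ["r1", "r2", "r3", "r4", "r5"]
lost_path = consume_with_crash(source, batch_size=2, ack_on_receive=True)
safe_path = consume_with_crash(source, batch_size=2, ack_on_receive=False)
```

With acknowledgement on receipt, `r1` and `r2` are never processed; deferring the acknowledgement until processing completes leaves them available to the restarted job.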
Therefore, a method that guarantees zero data loss is urgently needed to ensure the stability of processing Kafka data with SparkStreaming.
Summary of the invention
The technical problem to be solved by the invention is to provide a processing method for reading Kafka data based on Spark Streaming that effectively prevents data loss and re-consumes data from Kafka after failure recovery, so that, when the SparkStreaming program fails, zero data loss can be guaranteed more flexibly and conveniently.
The technical solution adopted by the invention to solve the above technical problem is a processing method for reading Kafka data based on Spark Streaming, comprising the following steps: S1) storing data in topics using Kafka, each topic containing a configurable number of partitions; S2) using Spark Streaming to cut the real-time input data stream into blocks, with a time slice as the unit, each block generating one Spark Job for processing; S3) setting the SparkStreaming backfill scheduling times in advance according to the Kafka data failure records; S4) monitoring in real time the processing by which SparkStreaming reads Kafka data; S5) re-reading, through SparkStreaming, the Kafka data lost due to failure, according to the Kafka data failure records and the scheduling times.
In the above processing method for reading Kafka data based on Spark Streaming, step S3) uses a relational database to create two database tables, namely a schedule table and a failure record table. The schedule table stores the schedule id, start time, end time, status and creation time; the failure record table stores the failure record id, offset, Kafka topic and Kafka node list. The schedule id in the schedule table and the failure record id in the failure record table are in a primary-key/foreign-key relationship.
In the above processing method for reading Kafka data based on Spark Streaming, step S4) includes: while SparkStreaming reads Kafka data, if the corresponding Kafka topic data is not empty, obtaining the offsets of the data read from Kafka and storing the data offsets, the Kafka topic and the Kafka node list in the failure record table of the relational database; if data processing fails with an exception, changing the status in the table to failure.
In the above processing method for reading Kafka data based on Spark Streaming, in step S4) SparkStreaming connects directly to the Kafka nodes in Direct mode and obtains the offsets of the data read from Kafka through the createDirectStream method, while marking the status in the schedule table as in progress. If an exception occurring while SparkStreaming reads and processes data from Kafka prevents the program from executing normally, the status in the schedule table is changed to failure.
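In Direct mode each batch is defined by a range of offsets per Kafka partition; the Spark Kafka integration exposes these as `OffsetRange` objects carrying a topic, partition, `fromOffset` and `untilOffset`. The Python sketch below only mimics those four fields and plans the next batch from hypothetical committed and latest offsets. Its optional per-batch cap is loosely analogous to `spark.streaming.kafka.maxRatePerPartition`, which Spark expresses as a per-second rate rather than a per-batch count. The resulting ranges correspond to the offsets that step S4) records.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class OffsetRange:
    """Mirrors the four fields of Spark's OffsetRange: one partition's slice of a batch."""
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive

    def count(self) -> int:
        return self.until_offset - self.from_offset


def next_offset_ranges(topic: str,
                       committed: Dict[int, int],
                       latest: Dict[int, int],
                       max_per_partition: Optional[int] = None) -> List[OffsetRange]:
    """For each partition, read from the committed offset up to the latest offset,
    optionally capped at max_per_partition messages per batch."""
    ranges = []
    for partition in sorted(latest):
        start = committed.get(partition, 0)
        end = latest[partition]
        if max_per_partition is not None:
            end = min(end, start + max_per_partition)
        ranges.append(OffsetRange(topic, partition, start, end))
    return ranges


plan = next_offset_ranges("orders", committed={0: 10, 1: 4}, latest={0: 25, 1: 6},
                          max_per_partition=8)
```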
In the above processing method for reading Kafka data based on Spark Streaming, step S5) includes: first scanning the schedule table with its status field as the query condition, sorted by the creation time field in descending order, to obtain the earliest schedule record; then obtaining the schedule id and using this field as the query condition on the failure record table to obtain all Kafka failure records; and then re-reading the Kafka data according to the Kafka topics and offsets.
In the above processing method for reading Kafka data based on Spark Streaming, in step S4) the schedule table and the failure record table in the relational database are first read and cached in memory, and a thread then periodically updates the cached data to perform real-time monitoring.
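Caching the tables and refreshing them on a timer thread could be sketched as below. The `TableCache` class and its `loader` callback are hypothetical. The loader stands in for the queries against the schedule and failure record tables and should open its own database connection, since Python's `sqlite3` connections are by default restricted to the thread that created them. `threading.Event.wait` serves as both the refresh timer and the stop signal.

```python
import threading
import time
from typing import Callable, Dict, List

Snapshot = Dict[str, List[tuple]]  # table name -> rows


class TableCache:
    """In-memory copy of the schedule and failure tables, refreshed by a background thread."""

    def __init__(self, loader: Callable[[], Snapshot], interval_s: float) -> None:
        self._loader = loader        # e.g. runs SELECTs on its own DB connection
        self._interval = interval_s
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._data = loader()        # initial read before monitoring starts
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait() returns False on timeout (time to refresh) and True once stop() is called.
        while not self._stop.wait(self._interval):
            fresh = self._loader()
            with self._lock:
                self._data = fresh

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def snapshot(self) -> Snapshot:
        with self._lock:
            return dict(self._data)


# Usage with a stand-in loader that returns a different row on every call.
calls = {"n": 0}


def fake_loader() -> Snapshot:
    calls["n"] += 1
    return {"control": [(calls["n"], "IN_PROGRESS")], "failure": []}


cache = TableCache(fake_loader, interval_s=0.01)
cache.start()
time.sleep(0.2)
cache.stop()
```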
Compared with the prior art, the invention has the following beneficial effects: the processing method for reading Kafka data based on Spark Streaming provided by the invention sets the SparkStreaming backfill scheduling times according to the Kafka data failure records, monitors the reading process in real time, and re-reads the failure records to backfill them. It can therefore effectively prevent data loss and re-consume data from Kafka after failure recovery, so that, when the SparkStreaming program fails, zero data loss can be guaranteed more flexibly and conveniently.
Brief description of the drawings
Fig. 1 is an architecture diagram of the Spark Streaming used by the invention;
Fig. 2 is a schematic diagram of Kafka processing streaming data, as used by the invention;
Fig. 3 is a model structure diagram of the schedule table and the failure record table of the invention;
Fig. 4 is a flow chart of monitoring the reading of Kafka data based on Spark Streaming according to the invention;
Fig. 5 is a flow chart of backfilling failure records according to the invention.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings and embodiments.
In the processing method for reading Kafka data based on Spark Streaming provided by the invention, a relational database is used to create two database tables, namely a schedule table (control) and a failure record table (failure). The schedule table stores scheduling information, including the schedule id, start time, end time, status and creation time. The failure record table stores the details of each failed data record, including the failure record id, offset, topic and Kafka node list. The schedule id in the schedule table and the id in the failure record table are in a primary-key/foreign-key relationship.
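A minimal relational schema for the two tables might look as follows, using SQLite from the Python standard library. The column names, the status values `IN_PROGRESS`, `FAILED` and `SUCCESS`, and the split of the offset into `kafka_partition`, `from_offset` and `until_offset` are assumptions made for illustration; the text lists only an offset, topic and node list. The primary-key/foreign-key relationship is modelled as a `schedule_id` column in the failure record table referencing the schedule table, which the later lookup of failure records by schedule id requires.

```python
import sqlite3

SCHEMA = """
CREATE TABLE control (                    -- schedule table
    id         INTEGER PRIMARY KEY,       -- schedule id
    start_time TEXT,
    end_time   TEXT,
    status     TEXT NOT NULL
               CHECK (status IN ('IN_PROGRESS', 'FAILED', 'SUCCESS')),
    created_at TEXT NOT NULL              -- ISO-8601 string, so text order = time order
);
CREATE TABLE failure (                    -- failure record table
    id              INTEGER PRIMARY KEY,  -- failure record id
    schedule_id     INTEGER NOT NULL REFERENCES control(id),
    topic           TEXT    NOT NULL,
    kafka_partition INTEGER NOT NULL,     -- assumed: needed to re-read from Kafka
    from_offset     INTEGER NOT NULL,     -- assumed: offset stored as a half-open range
    until_offset    INTEGER NOT NULL,
    kafka_nodes     TEXT    NOT NULL      -- Kafka node list, e.g. 'k1:9092,k2:9092'
);
"""


def create_tables(conn: sqlite3.Connection) -> None:
    """Create both tables; SQLite only enforces foreign keys when the pragma is on."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_tables(conn)
tables = sorted(r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"))
failure_columns = [r[1] for r in conn.execute("PRAGMA table_info(failure)")]
```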
While SparkStreaming reads and processes data from Kafka, it first uses the createDirectStream method to obtain the offsets of the data read from Kafka, stores the offset information in the failure record table of the relational database, and marks the status as in progress.
If an exception occurring while SparkStreaming reads and processes data from Kafka prevents the program from executing normally, the status is changed to failure according to the captured exception information and the corresponding data offsets; otherwise, it is changed to success.
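The record-then-update flow of the two paragraphs above can be sketched as follows. This is a hypothetical Python helper, not the patented implementation. It writes one schedule row marked `IN_PROGRESS` together with the batch's offset ranges in a single committed transaction, runs a caller-supplied `process` function standing in for the Spark job, and then sets the status to `SUCCESS` or `FAILED`. The table and column names are assumptions for a minimal schema.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Iterable, Tuple

# Hypothetical shape of one batch slice: (topic, partition, from_offset, until_offset)
Range = Tuple[str, int, int, int]


def now() -> str:
    return datetime.now(timezone.utc).isoformat()


def init_db(conn: sqlite3.Connection) -> None:
    """Minimal version of the schedule table and the failure record table."""
    conn.executescript("""
        CREATE TABLE control (id INTEGER PRIMARY KEY, start_time TEXT, end_time TEXT,
                              status TEXT NOT NULL, created_at TEXT NOT NULL);
        CREATE TABLE failure (id INTEGER PRIMARY KEY,
                              schedule_id INTEGER NOT NULL REFERENCES control(id),
                              topic TEXT, kafka_partition INTEGER, from_offset INTEGER,
                              until_offset INTEGER, kafka_nodes TEXT);
    """)


def run_monitored_batch(conn: sqlite3.Connection, ranges: Iterable[Range],
                        kafka_nodes: str, process: Callable[[], None]) -> int:
    """Record a batch's offsets before processing, then mark the schedule row SUCCESS
    or FAILED depending on whether `process` raised. Returns the schedule id."""
    started = now()
    with conn:  # committed before processing starts, so a crash cannot lose the offsets
        cur = conn.execute(
            "INSERT INTO control (start_time, status, created_at) VALUES (?, 'IN_PROGRESS', ?)",
            (started, started))
        schedule_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO failure (schedule_id, topic, kafka_partition, from_offset,"
            " until_offset, kafka_nodes) VALUES (?, ?, ?, ?, ?, ?)",
            [(schedule_id, t, p, f, u, kafka_nodes) for t, p, f, u in ranges])
    try:
        process()
        status = "SUCCESS"
    except Exception:  # a real job would also log the captured exception
        status = "FAILED"
    with conn:
        conn.execute("UPDATE control SET status = ?, end_time = ? WHERE id = ?",
                     (status, now(), schedule_id))
    return schedule_id


def broken_job() -> None:
    raise RuntimeError("simulated processing error")


conn = sqlite3.connect(":memory:")
init_db(conn)
ok_id = run_monitored_batch(conn, [("orders", 0, 10, 18)], "k1:9092", lambda: None)
bad_id = run_monitored_batch(conn, [("orders", 1, 4, 6)], "k1:9092", broken_job)
```

Committing the offsets before processing is what makes a later re-read possible. A hard crash of the driver would leave the row `IN_PROGRESS` rather than `FAILED`; this sketch does not handle that case.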
Together with the failure record table, the schedule table can be set manually to configure the backfilling of failures. When the SparkStreaming program is restarted, it scans the schedule table and the failure record table to obtain the backfill strategy and re-reads the data on the specified Kafka topics.
SparkStreaming can obtain Kafka data in two ways, Receiver and Direct. The Receiver approach connects to the Kafka queue through ZooKeeper, while the Direct approach fetches data directly from the Kafka nodes. The Receiver-based approach obtains data with a Receiver implemented on Kafka's high-level Consumer API. All data the Receiver fetches from Kafka is stored in the memory of the Spark Executors, and the jobs started by Spark Streaming then process that data. Under the default configuration, however, this approach may lose data when the underlying system fails. To make it highly reliable with zero data loss, the write-ahead log (Write Ahead Log, WAL) mechanism of Spark Streaming must be enabled; it synchronously writes the received Kafka data to a write-ahead log on a distributed file system (such as HDFS). The Receiver approach nevertheless has drawbacks: (1) the WAL reduces the receiver's throughput, because received data must be saved to a reliable distributed file system; (2) for some input sources, data is stored twice. For example, when data is read from Kafka, one copy is already kept in the Kafka brokers, yet another copy must also be kept in Spark Streaming.
The technical solution of the invention is premised on SparkStreaming obtaining Kafka data in Direct mode. Compared with the first approach to zero data loss, it brings significant benefits: (1) Kafka receivers are no longer needed, and the Executors consume data directly from Kafka with the Simple Consumer API; (2) the WAL mechanism is no longer needed, and data can still be re-consumed from Kafka after failure recovery; (3) exactly-once semantics are preserved, since duplicate data is no longer read from the WAL; (4) when the SparkStreaming program fails, zero data loss can be guaranteed more flexibly and conveniently.
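Replaying a batch from Kafka after a failure can repeat output that was already written, so exactly-once results also depend on how output is stored. One widely used pattern, described in the Spark Streaming + Kafka integration guide, is to save results and offsets in the same transaction. The Python/SQLite sketch below illustrates that pattern with hypothetical table and function names. It is an illustration added alongside the described method, not part of it, and it assumes each partition is consumed from offset 0.

```python
import sqlite3
from typing import List


def init_db(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE results (msg TEXT);
        CREATE TABLE offsets (topic TEXT, kafka_partition INTEGER, until_offset INTEGER,
                              PRIMARY KEY (topic, kafka_partition));
    """)


def commit_batch(conn: sqlite3.Connection, topic: str, partition: int,
                 from_offset: int, until_offset: int, messages: List[str]) -> bool:
    """Store a batch's results and its end offset in one transaction.
    Returns False (and writes nothing) if the batch was already committed."""
    with conn:  # commits both writes together, or rolls both back on an exception
        row = conn.execute(
            "SELECT until_offset FROM offsets WHERE topic = ? AND kafka_partition = ?",
            (topic, partition)).fetchone()
        stored = row[0] if row else 0  # assumption: consumption starts at offset 0
        if until_offset <= stored:
            return False  # replay of a batch whose output is already stored
        if from_offset != stored:
            raise ValueError(f"expected batch to start at {stored}, got {from_offset}")
        conn.executemany("INSERT INTO results (msg) VALUES (?)", [(m,) for m in messages])
        conn.execute("INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)",
                     (topic, partition, until_offset))
    return True


conn = sqlite3.connect(":memory:")
init_db(conn)
first = commit_batch(conn, "orders", 0, 0, 2, ["a", "b"])
replay = commit_batch(conn, "orders", 0, 0, 2, ["a", "b"])  # e.g. re-run after a restart
stored_rows = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```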
The Spark Streaming used by the invention is a real-time computing framework built on Spark. With its rich API and high-speed in-memory execution engine, users can combine streaming, batch and interactive query applications. As big data develops, requirements for big data processing keep rising; the original batch framework MapReduce is suited to offline computation but cannot meet the higher real-time requirements of business. It is therefore very important to ensure that Spark Streaming obtains Kafka data efficiently and stably. To address Kafka data loss when Spark Streaming fetches it, the invention provides a method for backfilling failed Kafka reads in Spark Streaming, which mainly involves three aspects: the design of the scheduling and monitoring model, the design of the backfill scheduling center, and the design of the monitoring center. The specific implementation is as follows:
1. Create the schedule table (control) and the failure record table (failure) in the relational database; the table structures are shown in Fig. 3.
2. Implement the monitoring center service in code. While SparkStreaming reads and processes data from Kafka, it first uses the createDirectStream method of SparkStreaming to obtain the offsets of the data read from Kafka, stores information such as the data offsets (offset) in the failure record table of the relational database, and marks the status as in progress. If an exception occurring while SparkStreaming reads and processes data from Kafka prevents the program from executing normally, an update is issued to change the status to failure according to the captured exception information and the corresponding data offsets; otherwise, the status is changed to success, as shown in Fig. 4.
3. The backfill scheduling center interface is the scheduling program. As shown in Fig. 5, it first scans the schedule table with the status field as the query condition, sorted by the creation time field in descending order, to obtain the earliest schedule record; it then obtains the schedule ID and uses this field as the query condition on the failure record table to obtain all Kafka failure records; finally, it re-reads and processes the Kafka data according to the topic and offset (offset).
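The backfill query can be sketched in Python with SQLite as follows. The function names, the status strings and the `read` callback are assumptions. The callback stands in for a consumer that fetches an offset range, a role that, for example, `KafkaUtils.createRDD` with stored offset ranges could play in a Spark job. The text sorts by creation time in descending order yet selects the earliest schedule record; the sketch orders ascending so that the first row returned is the earliest failed schedule.

```python
import sqlite3
from typing import Callable, List, Optional

# Hypothetical reader: (topic, partition, from_offset, until_offset) -> messages.
Reader = Callable[[str, int, int, int], List[str]]


def init_db(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE control (id INTEGER PRIMARY KEY, start_time TEXT, end_time TEXT,
                              status TEXT NOT NULL, created_at TEXT NOT NULL);
        CREATE TABLE failure (id INTEGER PRIMARY KEY,
                              schedule_id INTEGER NOT NULL REFERENCES control(id),
                              topic TEXT, kafka_partition INTEGER, from_offset INTEGER,
                              until_offset INTEGER, kafka_nodes TEXT);
    """)


def earliest_failed_schedule(conn: sqlite3.Connection) -> Optional[int]:
    """Query the schedule table by status and return the oldest failed schedule id."""
    row = conn.execute(
        "SELECT id FROM control WHERE status = 'FAILED' "
        "ORDER BY created_at ASC, id ASC LIMIT 1").fetchone()
    return row[0] if row else None


def backfill_once(conn: sqlite3.Connection, read: Reader,
                  process: Callable[[List[str]], None]) -> Optional[int]:
    """Re-read every offset range recorded for the earliest failed schedule, reprocess it,
    and mark the schedule SUCCESS. Returns the schedule id, or None if nothing failed."""
    schedule_id = earliest_failed_schedule(conn)
    if schedule_id is None:
        return None
    rows = conn.execute(
        "SELECT topic, kafka_partition, from_offset, until_offset FROM failure "
        "WHERE schedule_id = ? ORDER BY id", (schedule_id,)).fetchall()
    for topic, partition, start, end in rows:
        process(read(topic, partition, start, end))  # an exception leaves the row FAILED
    with conn:
        conn.execute("UPDATE control SET status = 'SUCCESS' WHERE id = ?", (schedule_id,))
    return schedule_id


# Usage with an in-memory stand-in for Kafka.
conn = sqlite3.connect(":memory:")
init_db(conn)
with conn:
    conn.executemany("INSERT INTO control (id, status, created_at) VALUES (?, ?, ?)",
                     [(1, "SUCCESS", "2024-01-01T10:00"), (2, "FAILED", "2024-01-01T10:05"),
                      (3, "FAILED", "2024-01-01T10:01")])
    conn.executemany(
        "INSERT INTO failure (schedule_id, topic, kafka_partition, from_offset, until_offset,"
        " kafka_nodes) VALUES (?, ?, ?, ?, ?, ?)",
        [(3, "orders", 0, 0, 2, "k1:9092"), (2, "orders", 1, 5, 6, "k1:9092")])
fake_kafka = {("orders", 0): ["a", "b", "c"], ("orders", 1): [f"x{i}" for i in range(10)]}
seen: List[str] = []
done = backfill_once(conn, lambda t, p, s, e: fake_kafka[(t, p)][s:e], seen.extend)
```

Calling `backfill_once` repeatedly drains failed schedules oldest first; a schedule whose reprocessing raises stays `FAILED` for the next pass.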
Although the invention has been disclosed above with preferred embodiments, they are not intended to limit the invention. Any person skilled in the art may make minor modifications and improvements without departing from the spirit and scope of the invention; the scope of protection of the invention shall therefore be defined by the claims.

Claims (6)

1. A processing method for reading Kafka data based on Spark Streaming, characterized by comprising the following steps:
S1) storing data in topics using Kafka, each topic containing a configurable number of partitions;
S2) using Spark Streaming to cut the real-time input data stream into blocks, with a time slice as the unit, each block generating one Spark Job for processing;
S3) setting the SparkStreaming backfill scheduling times in advance according to the Kafka data failure records;
S4) monitoring in real time the processing by which SparkStreaming reads Kafka data;
S5) re-reading, through SparkStreaming, the Kafka data lost due to failure, according to the Kafka data failure records and the scheduling times.
2. The processing method for reading Kafka data based on Spark Streaming according to claim 1, characterized in that step S3) uses a relational database to create two database tables, namely a schedule table and a failure record table; the schedule table stores the schedule id, start time, end time, status and creation time; the failure record table stores the failure record id, offset, Kafka topic and Kafka node list; and the schedule id in the schedule table and the failure record id in the failure record table are in a primary-key/foreign-key relationship.
3. The processing method for reading Kafka data based on Spark Streaming according to claim 2, characterized in that step S4) comprises: while SparkStreaming reads Kafka data, if the corresponding Kafka topic data is not empty, obtaining the offsets of the data read from Kafka and storing the data offsets, the Kafka topic and the Kafka node list in the failure record table of the relational database; and, if data processing fails with an exception, changing the status in the table to failure.
4. The processing method for reading Kafka data based on Spark Streaming according to claim 3, characterized in that, in step S4), SparkStreaming connects directly to the Kafka nodes in Direct mode and obtains the offsets of the data read from Kafka through the createDirectStream method, while marking the status in the schedule table as in progress; and, if an exception occurring while SparkStreaming reads and processes data from Kafka prevents the program from executing normally, the status in the schedule table is changed to failure.
5. The processing method for reading Kafka data based on Spark Streaming according to claim 4, characterized in that step S5) comprises: first scanning the schedule table with its status field as the query condition, sorted by the creation time field in descending order, to obtain the earliest schedule record; then obtaining the schedule id and using this field as the query condition on the failure record table to obtain all Kafka failure records; and then re-reading the Kafka data according to the Kafka topics and offsets.
6. The processing method for reading Kafka data based on Spark Streaming according to claim 3, characterized in that, in step S4), the schedule table and the failure record table in the relational database are first read and cached in memory, and a thread then periodically updates the cached data to perform real-time monitoring.
CN201611069230.9A 2016-11-29 2016-11-29 Processing method for reading Kafka data based on Spark Streaming Active CN106776855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611069230.9A CN106776855B (en) 2016-11-29 2016-11-29 Processing method for reading Kafka data based on Spark Streaming

Publications (2)

Publication Number Publication Date
CN106776855A (en) 2017-05-31
CN106776855B (en) 2020-03-13

Family

ID=58905124

Country Status (1)

Country Link
CN (1) CN106776855B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062251A (en) * 2018-01-09 2018-05-22 福建星瑞格软件有限公司 A kind of server resource recovery method and computer equipment
CN108228830A (en) * 2018-01-03 2018-06-29 广东工业大学 A kind of data processing system
CN108647329A (en) * 2018-05-11 2018-10-12 中国联合网络通信集团有限公司 Processing method, device and the computer readable storage medium of user behavior data
CN109634784A (en) * 2018-12-24 2019-04-16 康成投资(中国)有限公司 Spark application control method and control device
CN110648178A (en) * 2019-09-24 2020-01-03 四川长虹电器股份有限公司 Method for increasing kafka consumption capacity
CN110647570A (en) * 2019-09-20 2020-01-03 百度在线网络技术(北京)有限公司 Data processing method and device and electronic equipment
CN110912949A (en) * 2018-09-14 2020-03-24 北京京东尚科信息技术有限公司 Method and device for submitting sites
CN111061565A (en) * 2019-12-12 2020-04-24 湖南大学 Two-stage pipeline task scheduling method and system in Spark environment
CN111124650A (en) * 2019-12-26 2020-05-08 中国建设银行股份有限公司 Streaming data processing method and device
CN111163118A (en) * 2018-11-07 2020-05-15 株式会社日立制作所 Message transmission method and device in Kafka cluster
CN111241051A (en) * 2020-01-07 2020-06-05 深圳迅策科技有限公司 Batch data processing method and device, terminal equipment and storage medium
CN111328013A (en) * 2018-12-17 2020-06-23 ***通信集团山东有限公司 Mobile terminal positioning method and system
CN111526188A (en) * 2020-04-10 2020-08-11 北京计算机技术及应用研究所 System and method for ensuring zero data loss based on Spark Streaming in combination with Kafka
CN112615773A (en) * 2020-12-02 2021-04-06 海南车智易通信息技术有限公司 Message processing method and system
CN112800073A (en) * 2021-01-27 2021-05-14 浪潮云信息技术股份公司 Method for updating Delta Lake based on NiFi

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636352A (en) * 2013-11-08 2015-05-20 中国石油天然气股份有限公司 SCADA system historical data complement and query processing method based on quality stamp
US20160306817A1 (en) * 2015-04-14 2016-10-20 Et International, Inc. Systems and methods for key-value stores
CN106126721A (en) * 2016-06-30 2016-11-16 北京奇虎科技有限公司 The data processing method of a kind of real-time calculating platform and device
CN106156307A (en) * 2016-06-30 2016-11-23 北京奇虎科技有限公司 The data handling system of a kind of real-time calculating platform and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
鱼儿慢慢游: "spark streaming 对接 kafka记录" (Notes on connecting Spark Streaming to Kafka), 《HTTPS://WWW.CNBLOGS.COM/MISSMZT/P/6004868.HTML》 *

Also Published As

Publication number Publication date
CN106776855B (en) 2020-03-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant