CN106776855A - Processing method for reading Kafka data based on Spark Streaming - Google Patents
- Publication number
- CN106776855A
- Authority
- CN
- China
- Prior art keywords
- kafka
- data
- sparkstreaming
- read
- failure
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a processing method for reading Kafka data based on Spark Streaming, comprising the following steps: S1) storing data in topics using Kafka; S2) using Spark Streaming to cut the real-time input data stream into blocks, with a time slice as the unit; S3) setting Spark Streaming backfill scheduling times in advance according to the Kafka failure records; S4) monitoring in real time the process in which Spark Streaming reads Kafka data; S5) re-reading the Kafka data through Spark Streaming. The invention sets the Spark Streaming backfill scheduling times according to the Kafka failure records, monitors the reading process in real time, and re-reads the failed records to backfill them, which makes zero data loss easier and more flexible to guarantee.
Description
Technical field
The present invention relates to a Kafka data processing method, and more particularly to a processing method for reading Kafka data based on Spark Streaming.
Background art
Spark Streaming decomposes a streaming computation into a series of short batch jobs. The batch engine is Spark: the input data of Spark Streaming is divided into segments according to the batch size (for example, 1 second), forming a Discretized Stream (DStream), and each segment is converted into an RDD (Resilient Distributed Dataset) in Spark. Transformation operations on a DStream in Spark Streaming thus become Transformation operations on RDDs in Spark, and the resulting RDDs are kept in memory as intermediate results. Depending on business requirements, the streaming computation as a whole can accumulate these intermediate results or store them to an external device. Fig. 1 shows the overall flow of Spark Streaming.
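To make the batching model concrete, the following minimal Python sketch cuts a stream into fixed time slices, with each slice standing in for one RDD of a DStream. It does not use Spark; `split_into_batches` and `transform` are hypothetical helpers, and records are assumed to carry a timestamp in seconds.

```python
from collections import defaultdict


def split_into_batches(records, batch_interval):
    """Group (timestamp_seconds, value) records into consecutive time slices.

    Each key is the start time of a slice; each value list plays the role that
    one RDD plays inside a DStream.
    """
    batches = defaultdict(list)
    for ts, value in records:
        batch_start = (ts // batch_interval) * batch_interval
        batches[batch_start].append(value)
    return dict(sorted(batches.items()))


def transform(batches, fn):
    """Apply the same per-record function to every slice, as a DStream map does per RDD."""
    return {start: [fn(v) for v in values] for start, values in batches.items()}


records = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d")]
batches = split_into_batches(records, 1.0)  # {0.0: ['a', 'b'], 1.0: ['c'], 2.0: ['d']}
upper = transform(batches, str.upper)
```

The point of the sketch is only that a continuous stream becomes a sequence of small, independent batches, each processed by the same logic.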
Kafka is a distributed publish-subscribe messaging system. It was originally developed by LinkedIn and later became part of the Apache project. Kafka is a distributed, partitioned, replicated, persistent log service, mainly used to process active streaming data, as shown in Fig. 2.
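Offsets are what the rest of the method tracks, so a toy model of a partitioned, append-only topic may help. This is an illustrative Python sketch, not the Kafka API; the `Topic` class is hypothetical and `zlib.crc32` stands in for Kafka's own key hashing.

```python
import zlib


class Topic:
    """Toy, in-memory model of a Kafka topic (not the Kafka client API).

    A topic has a fixed number of append-only partitions; a record's offset is
    simply its position inside its partition.
    """

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Append a record and return (partition, offset)."""
        # crc32 is a stand-in for Kafka's partitioner: same key -> same partition.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1

    def read(self, partition, from_offset, until_offset):
        """Return records in [from_offset, until_offset); reading does not remove them."""
        return self.partitions[partition][from_offset:until_offset]
```

Because reading does not delete anything, a consumer that remembers (topic, partition, offset) can come back and read the same range again, which is the property the method described below depends on.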
It is well known that the big data era places ever higher demands on the real-time performance, stability and accuracy of data processing. A currently popular combined architecture connects Spark Streaming to Kafka, pairing Spark Streaming's strength in in-memory iterative computation with Kafka's high-concurrency data distribution to achieve real-time processing. However, when Spark Streaming is connected to Kafka, potential data-loss scenarios still cannot be avoided. The process is as follows:
1. Two Executors receive input data from the receiver and cache it in Executor memory.
2. The receiver notifies the input source that the data has been received.
3. The Executors start processing the cached data according to the application code.
4. At this point the Driver suddenly fails.
5. By design, once the Driver fails, all the Executors it maintains are killed as well.
6. Since all Executors are killed, the data cached in their memory is also lost. As a result, data that has already been acknowledged to the source but not yet processed is lost.
7. The cached data cannot be recovered, because it was held only in Executor memory, so the data is lost.
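The failure sequence above can be reproduced with a small simulation. The sketch assumes a single receiver and a hypothetical `run` function; the `ack_before_processing=False` variant is added only for contrast and is not part of the original description.

```python
def run(records, ack_before_processing, crash_after):
    """Simulate steps 1-7 for one receiver and return (acked, processed, lost).

    `crash_after` is how many records get processed before the driver dies.
    `lost` holds records the source was told were received but that were never
    processed; the source will not send them again.
    """
    acked, processed = [], []
    buffer = list(records)              # step 1: cached in executor memory
    if ack_before_processing:
        acked.extend(buffer)            # step 2: receiver acknowledges the source
    for r in buffer:
        if len(processed) == crash_after:
            break                       # steps 4-6: driver dies, executors and buffer are gone
        processed.append(r)             # step 3: process the cached data
        if not ack_before_processing:
            acked.append(r)             # contrast case: acknowledge only after processing
    lost = [r for r in acked if r not in processed]
    return acked, processed, lost
```

With acknowledgement before processing, a crash after two records leaves records 3 and 4 acknowledged but unprocessed; acknowledging only after processing leaves them unacknowledged, so they could still be delivered again.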
Therefore, a method that guarantees zero data loss is urgently needed to ensure the stability of Spark Streaming when processing data from Kafka.
Summary of the invention
The technical problem to be solved by the present invention is to provide a processing method for reading Kafka data based on Spark Streaming that effectively prevents data loss and consumes data from Kafka again after recovering from a failure, so that even when the Spark Streaming program fails, zero data loss can be guaranteed more flexibly and conveniently.
The technical solution adopted by the present invention to solve the above technical problem is a processing method for reading Kafka data based on Spark Streaming, comprising the following steps:
S1) storing data in topics using Kafka, each topic comprising a configurable number of partitions;
S2) using Spark Streaming to cut the real-time input data stream into blocks, with a time slice as the unit, each block generating one Spark job for processing;
S3) setting Spark Streaming backfill scheduling times in advance according to the Kafka failure records;
S4) monitoring in real time the process in which Spark Streaming reads and processes Kafka data;
S5) re-reading, through Spark Streaming, the Kafka data lost due to failures, according to the Kafka failure records and the scheduling times.
In the above processing method, step S3) uses a relational database to create two database tables, namely a scheduling table and a failure record table. The scheduling table stores the scheduling id, start time, end time, status and creation time; the failure record table stores the failure record id, offset, Kafka topic and Kafka node list. The scheduling id in the scheduling table and the failure record id in the failure record table are in a primary-key/foreign-key relationship.
In the above processing method, step S4) comprises: while Spark Streaming reads Kafka data, if the data of the corresponding Kafka topic is not empty, obtaining the offsets of the data read from Kafka and storing the data offsets, Kafka topic and Kafka node list in the failure record table of the relational database; if data processing fails, the status in the table is changed to failed.
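The recording step of S4) might look like the following sketch: store the offset ranges of a non-empty batch before processing, then mark them succeeded or failed. `process_batch`, `open_db` and the single-table schema are hypothetical; in a real job the offset ranges would come from the Kafka integration rather than being passed in by hand.

```python
import sqlite3


def open_db():
    """In-memory failure record table with hypothetical column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE failure (
        id INTEGER PRIMARY KEY, control_id INTEGER, topic TEXT,
        kafka_partition INTEGER, from_offset INTEGER, until_offset INTEGER,
        brokers TEXT, status TEXT)""")
    return conn


def process_batch(conn, control_id, topic, brokers, offset_ranges, records, handler):
    """Persist one batch's offset ranges, run `handler`, then record the outcome.

    offset_ranges: list of (partition, from_offset, until_offset).
    Empty ranges are skipped, mirroring "only if the topic data is not empty".
    """
    ids = []
    for partition, start, end in offset_ranges:
        if end <= start:
            continue
        cur = conn.execute(
            "INSERT INTO failure (control_id, topic, kafka_partition, from_offset,"
            " until_offset, brokers, status) VALUES (?, ?, ?, ?, ?, ?, 'IN_PROGRESS')",
            (control_id, topic, partition, start, end, brokers))
        ids.append(cur.lastrowid)
    conn.commit()  # offsets are durable before processing starts
    try:
        handler(records)
        status = "SUCCESS"
    except Exception:
        status = "FAILED"
    conn.executemany("UPDATE failure SET status = ? WHERE id = ?",
                     [(status, i) for i in ids])
    conn.commit()
    return status
```

Committing the offsets before calling the handler is the key design choice: if the process dies mid-batch, the range is already on record.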
In the above processing method, in step S4) Spark Streaming connects directly to the Kafka nodes in Direct mode and obtains the offsets of the data read from Kafka through the createDirectStream method, while the status in the scheduling table is marked as in progress. If, while Spark Streaming is reading and processing Kafka data, an exception prevents the program from executing normally, the status in the scheduling table is changed to failed.
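In the Direct approach each batch is defined by an offset range per partition; in Spark's Kafka integration these ranges are exposed through `HasOffsetRanges`/`OffsetRange` on the RDDs returned by `KafkaUtils.createDirectStream`. The Python sketch below only imitates that idea together with the scheduling-table status changes described above. `next_offset_ranges`, `run_batch`, and the dict standing in for the scheduling table are hypothetical.

```python
def next_offset_ranges(consumed, latest):
    """Return [(partition, from_offset, until_offset)] covering everything not yet consumed.

    consumed: {partition: next offset to read}; latest: {partition: current end offset}.
    """
    return [(p, consumed.get(p, 0), latest[p]) for p in sorted(latest)]


def run_batch(consumed, control, control_id, latest, job):
    """Run one batch and keep `control` (scheduling id -> status) up to date.

    Offsets advance only when the job succeeds, so a failed range can be read again.
    """
    ranges = next_offset_ranges(consumed, latest)
    control[control_id] = "IN_PROGRESS"
    try:
        job(ranges)
    except Exception:
        control[control_id] = "FAILED"
        return ranges
    control[control_id] = "SUCCESS"
    for p, _, until in ranges:
        consumed[p] = until
    return ranges
```

Because the ranges are computed from stored offsets rather than from data buffered by a receiver, no write-ahead log is needed to know what to read again.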
In the above processing method, step S5) comprises: first, using the status field of the scheduling table as the query condition, scanning the scheduling table sorted in descending order of the creation time field to obtain the earliest scheduling record; then obtaining its scheduling id and using that field as the query condition on the failure record table to obtain all Kafka failure records; and finally re-reading the Kafka data according to the Kafka topic and offsets.
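Step S5) is essentially two queries. This `sqlite3` sketch uses hypothetical column names and a small demo data set. Note that the text sorts by creation time in descending order yet asks for the earliest record; the sketch assumes ascending order (`ORDER BY create_time ASC LIMIT 1`) so that the oldest failed schedule is picked first.

```python
import sqlite3


def demo_db():
    """Hypothetical demo data: schedules 2 and 3 failed, schedule 2 is the oldest."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE control (id INTEGER PRIMARY KEY, status TEXT, create_time TEXT);
        CREATE TABLE failure (id INTEGER PRIMARY KEY, control_id INTEGER, topic TEXT,
                              kafka_partition INTEGER, from_offset INTEGER, until_offset INTEGER);
        INSERT INTO control VALUES (1, 'SUCCESS', '2020-01-01 00:00:00'),
                                   (2, 'FAILED',  '2020-01-01 00:00:05'),
                                   (3, 'FAILED',  '2020-01-01 00:00:10');
        INSERT INTO failure (control_id, topic, kafka_partition, from_offset, until_offset)
        VALUES (2, 'events', 0, 100, 150), (2, 'events', 1, 40, 60), (3, 'events', 0, 150, 180);
    """)
    return conn


def find_replay_work(conn, status="FAILED"):
    """Return (scheduling id, [(topic, partition, from, until), ...]) for the oldest match."""
    row = conn.execute(
        "SELECT id FROM control WHERE status = ? ORDER BY create_time ASC LIMIT 1",
        (status,)).fetchone()
    if row is None:
        return None, []
    control_id = row[0]
    failures = conn.execute(
        "SELECT topic, kafka_partition, from_offset, until_offset FROM failure"
        " WHERE control_id = ? ORDER BY id", (control_id,)).fetchall()
    return control_id, failures
```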
In the above processing method, in step S4) the scheduling table and the failure record table in the relational database are first read and cached in memory, and a timer thread then periodically updates the cached data to perform real-time monitoring.
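One way to cache a table in memory with periodic refresh is a daemon thread that reloads it at a fixed interval. `TableCache`, the database path and the interval are assumptions for illustration; only the `control` table's `id` and `status` columns are cached here.

```python
import sqlite3
import threading


class TableCache:
    """Keep an in-memory snapshot of the control table, refreshed periodically."""

    def __init__(self, db_path, interval_seconds):
        self.db_path = db_path
        self.interval = interval_seconds
        self._lock = threading.Lock()
        self._snapshot = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def refresh(self):
        # Open a fresh connection per refresh so the call is safe from any thread.
        conn = sqlite3.connect(self.db_path)
        try:
            rows = conn.execute("SELECT id, status FROM control").fetchall()
        finally:
            conn.close()
        with self._lock:
            self._snapshot = dict(rows)

    def _loop(self):
        # Event.wait returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self.interval):
            self.refresh()

    def start(self):
        self.refresh()
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def status(self, control_id):
        with self._lock:
            return self._snapshot.get(control_id)
```

Readers only ever touch the snapshot, so monitoring queries do not hit the database; the price is that statuses can be up to one interval stale.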
Compared with the prior art, the present invention has the following beneficial effects: the processing method provided by the invention sets the Spark Streaming backfill scheduling times according to the Kafka failure records, monitors the reading process in real time, and re-reads the failed records to backfill them. This effectively prevents data loss: after recovering from a failure, data is consumed from Kafka again, so that even when the Spark Streaming program fails, zero data loss can be guaranteed more flexibly and conveniently.
Brief description of the drawings
Fig. 1 is an architecture diagram of the Spark Streaming used by the present invention;
Fig. 2 is a schematic diagram of Kafka processing streaming data, as used by the present invention;
Fig. 3 is a model structure diagram of the scheduling table and the failure record table of the present invention;
Fig. 4 is a flow chart of monitoring the reading of Kafka data based on Spark Streaming according to the present invention;
Fig. 5 is a flow chart of backfilling failed records according to the present invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and embodiments.
In the processing method for reading Kafka data based on Spark Streaming provided by the present invention, a relational database is used to create two tables: a scheduling table (control) and a failure record table (failure). The scheduling table stores scheduling information, including the scheduling id, start time, end time, status and creation time. The failure record table stores the details of each failed data record, including the failure record id, offset, topic and Kafka node list. The scheduling id in the scheduling table and the id in the failure record table are in a primary-key/foreign-key relationship.
While Spark Streaming is reading and processing Kafka data, the offsets of the data read from Kafka are first obtained through the Spark Streaming createDirectStream method, and this offset information is stored in the failure record table of the relational database with the status set to in progress.
If an exception occurs during this process and prevents the program from executing normally, the status is changed to failed according to the captured exception information together with the corresponding data offset information; otherwise it is changed to succeeded.
Based on the failure record table, the scheduling table can be set manually to configure backfilling of the failures. When the Spark Streaming program is restarted, it scans the scheduling table and the failure record table to obtain the backfill strategy, and re-reads the data of the specified Kafka topic.
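On restart, each failure record carries enough information (topic, partition, offset range) to re-read exactly the data that was missed. The sketch below assumes a caller-supplied `read_range` function standing in for a bounded Kafka read; `replay` and the tuple layout of a failure record are hypothetical.

```python
def replay(failures, read_range, handler):
    """Re-read each failed range and reprocess it; return the ids that now succeeded.

    failures: iterable of (failure_id, topic, partition, from_offset, until_offset).
    read_range(topic, partition, start, end) -> list of records in [start, end).
    """
    recovered = []
    for fid, topic, partition, start, end in failures:
        try:
            handler(read_range(topic, partition, start, end))
        except Exception:
            continue  # leave this record FAILED for the next scheduled backfill
        recovered.append(fid)
    return recovered


# Usage with an in-memory stand-in for one Kafka partition.
log = {("events", 0): list(range(10))}


def read_range(topic, partition, start, end):
    return log[(topic, partition)][start:end]
```

The caller would then mark the returned ids as succeeded in the failure record table; records that fail again simply stay eligible for the next backfill run.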
Spark Streaming can obtain Kafka data in two ways: Receiver and Direct. The Receiver approach connects to the Kafka queue through ZooKeeper, while the Direct approach fetches data directly from the Kafka nodes. The Receiver-based approach uses a Receiver, implemented with Kafka's high-level Consumer API, to obtain data. The data the Receiver obtains from Kafka is stored in the memory of the Spark Executors, and the jobs started by Spark Streaming then process that data. Under the default configuration, however, this approach may lose data when the underlying layer fails. To achieve high reliability and zero data loss, the write-ahead log (Write Ahead Log, WAL) mechanism of Spark Streaming must be enabled. This mechanism synchronously writes the received Kafka data into a write-ahead log on a distributed file system (such as HDFS). The Receiver approach nevertheless has drawbacks:
1. The WAL reduces receiver throughput, because received data must first be saved to a reliable distributed file system.
2. For some input sources the same data is stored twice. For example, when reading from Kafka, one copy is kept in the Kafka brokers and another must also be kept in Spark Streaming.
The technical solution of the present invention is premised on Spark Streaming obtaining Kafka data in Direct mode. Compared with the first (Receiver-based) way of achieving zero data loss, it brings the following significant benefits:
1. Kafka receivers are no longer needed; the Executors consume data directly from Kafka using the Simple Consumer API.
2. The WAL mechanism is no longer needed, yet data can still be consumed from Kafka again after recovery from a failure.
3. Exactly-once semantics are preserved, since duplicate data is no longer read from the WAL.
4. Even when the Spark Streaming program fails, zero data loss can be guaranteed more flexibly and conveniently.
The Spark Streaming used by the present invention is a real-time computing framework built on Spark. Through its rich API and in-memory high-speed execution engine, users can combine streaming, batch and interactive query applications. As big data develops, requirements for processing it keep rising: the original batch framework MapReduce suits offline computation but cannot meet the needs of businesses with stricter real-time requirements. Ensuring that Spark Streaming obtains Kafka data efficiently and stably is therefore very important. To address Kafka data loss when Spark Streaming obtains Kafka data, the invention provides a method for backfilling failed Spark Streaming reads of Kafka, which mainly involves three aspects: the design of the scheduling and monitoring model, the design of the backfill scheduling center, and the design of the monitoring center. The specific implementation is as follows:
1. Create the scheduling table (control) and the failure record table (failure) in the relational database; the table structure is shown in Fig. 3.
2. Implement the monitoring center service in code. While Spark Streaming reads and processes Kafka data, the offsets of the data read from Kafka are first obtained through the Spark Streaming createDirectStream method, and information such as the data offset is stored in the failure record table of the relational database with the status set to in progress. If an exception occurs during this process and prevents the program from executing normally, an update is issued, based on the captured exception information and the corresponding data offset information, to change the status to failed; otherwise the status is changed to succeeded, as shown in Fig. 4.
3. The backfill scheduling center interface is a scheduling-center program. As shown in Fig. 5, it first uses the status field of the scheduling table as the query condition and scans the scheduling table, sorted in descending order of the creation time field, to obtain the earliest scheduling record; it then obtains the scheduling ID and uses this field as the query condition on the failure record table to obtain all Kafka failure records; finally it re-reads and processes the Kafka data according to the topic and offset.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Any person skilled in the art may make minor modifications and improvements without departing from the spirit and scope of the invention; the scope of protection of the invention shall therefore be defined by the claims.
Claims (6)
1. A processing method for reading Kafka data based on Spark Streaming, characterized by comprising the following steps:
S1) storing data in topics using Kafka, each topic comprising a configurable number of partitions;
S2) using Spark Streaming to cut the real-time input data stream into blocks, with a time slice as the unit, each block generating one Spark job for processing;
S3) setting Spark Streaming backfill scheduling times in advance according to the Kafka failure records;
S4) monitoring in real time the process in which Spark Streaming reads and processes Kafka data;
S5) re-reading, through Spark Streaming, the Kafka data lost due to failures, according to the Kafka failure records and the scheduling times.
2. The processing method for reading Kafka data based on Spark Streaming according to claim 1, characterized in that step S3) uses a relational database to create two database tables, namely a scheduling table and a failure record table; the scheduling table stores the scheduling id, start time, end time, status and creation time; the failure record table stores the failure record id, offset, Kafka topic and Kafka node list; and the scheduling id in the scheduling table and the failure record id in the failure record table are in a primary-key/foreign-key relationship.
3. The processing method for reading Kafka data based on Spark Streaming according to claim 2, characterized in that step S4) comprises: while Spark Streaming reads Kafka data, if the data of the corresponding Kafka topic is not empty, obtaining the offsets of the data read from Kafka and storing the data offsets, Kafka topic and Kafka node list in the failure record table of the relational database; and, if data processing fails, changing the status in the table to failed.
4. The processing method for reading Kafka data based on Spark Streaming according to claim 3, characterized in that in step S4) Spark Streaming connects directly to the Kafka nodes in Direct mode and obtains the offsets of the data read from Kafka through the createDirectStream method, while the status in the scheduling table is marked as in progress; and, if an exception occurs while Spark Streaming reads and processes Kafka data and prevents the program from executing normally, the status in the scheduling table is changed to failed.
5. The processing method for reading Kafka data based on Spark Streaming according to claim 4, characterized in that step S5) comprises: first, using the status field of the scheduling table as the query condition, scanning the scheduling table sorted in descending order of the creation time field to obtain the earliest scheduling record; then obtaining the scheduling id and using that field as the query condition on the failure record table to obtain all Kafka failure records; and then re-reading the Kafka data according to the Kafka topic and offsets.
6. The processing method for reading Kafka data based on Spark Streaming according to claim 3, characterized in that in step S4) the scheduling table and the failure record table in the relational database are first read and cached in memory, and a timer thread then periodically updates the cached data to perform real-time monitoring.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611069230.9A CN106776855B (en) | 2016-11-29 | 2016-11-29 | Processing method for reading Kafka data based on Spark Streaming |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106776855A true CN106776855A (en) | 2017-05-31 |
CN106776855B CN106776855B (en) | 2020-03-13 |
Family
ID=58905124
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611069230.9A Active CN106776855B (en) | 2016-11-29 | 2016-11-29 | Processing method for reading Kafka data based on Spark Streaming |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106776855B (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108062251A (en) * | 2018-01-09 | 2018-05-22 | 福建星瑞格软件有限公司 | A kind of server resource recovery method and computer equipment |
CN108228830A (en) * | 2018-01-03 | 2018-06-29 | 广东工业大学 | A kind of data processing system |
CN108647329A (en) * | 2018-05-11 | 2018-10-12 | 中国联合网络通信集团有限公司 | Processing method, device and the computer readable storage medium of user behavior data |
CN109634784A (en) * | 2018-12-24 | 2019-04-16 | 康成投资(中国)有限公司 | Spark application control method and control device |
CN110648178A (en) * | 2019-09-24 | 2020-01-03 | 四川长虹电器股份有限公司 | Method for increasing kafka consumption capacity |
CN110647570A (en) * | 2019-09-20 | 2020-01-03 | 百度在线网络技术(北京)有限公司 | Data processing method and device and electronic equipment |
CN110912949A (en) * | 2018-09-14 | 2020-03-24 | 北京京东尚科信息技术有限公司 | Method and device for submitting sites |
CN111061565A (en) * | 2019-12-12 | 2020-04-24 | 湖南大学 | Two-stage pipeline task scheduling method and system in Spark environment |
CN111124650A (en) * | 2019-12-26 | 2020-05-08 | 中国建设银行股份有限公司 | Streaming data processing method and device |
CN111163118A (en) * | 2018-11-07 | 2020-05-15 | 株式会社日立制作所 | Message transmission method and device in Kafka cluster |
CN111241051A (en) * | 2020-01-07 | 2020-06-05 | 深圳迅策科技有限公司 | Batch data processing method and device, terminal equipment and storage medium |
CN111328013A (en) * | 2018-12-17 | 2020-06-23 | ***通信集团山东有限公司 | Mobile terminal positioning method and system |
CN111526188A (en) * | 2020-04-10 | 2020-08-11 | 北京计算机技术及应用研究所 | System and method for ensuring zero data loss based on Spark Streaming in combination with Kafka |
CN112615773A (en) * | 2020-12-02 | 2021-04-06 | 海南车智易通信息技术有限公司 | Message processing method and system |
CN112800073A (en) * | 2021-01-27 | 2021-05-14 | 浪潮云信息技术股份公司 | Method for updating Delta Lake based on NiFi |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104636352A (en) * | 2013-11-08 | 2015-05-20 | 中国石油天然气股份有限公司 | SCADA system historical data complement and query processing method based on quality stamp |
US20160306817A1 (en) * | 2015-04-14 | 2016-10-20 | Et International, Inc. | Systems and methods for key-value stores |
CN106126721A (en) * | 2016-06-30 | 2016-11-16 | 北京奇虎科技有限公司 | The data processing method of a kind of real-time calculating platform and device |
CN106156307A (en) * | 2016-06-30 | 2016-11-23 | 北京奇虎科技有限公司 | The data handling system of a kind of real-time calculating platform and method |
Non-Patent Citations (1)
Title |
---|
鱼儿慢慢游: "spark streaming 对接 kafka记录" (Notes on connecting Spark Streaming to Kafka), HTTPS://WWW.CNBLOGS.COM/MISSMZT/P/6004868.HTML * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108228830A (en) * | 2018-01-03 | 2018-06-29 | 广东工业大学 | A kind of data processing system |
CN108062251A (en) * | 2018-01-09 | 2018-05-22 | 福建星瑞格软件有限公司 | A kind of server resource recovery method and computer equipment |
CN108647329A (en) * | 2018-05-11 | 2018-10-12 | 中国联合网络通信集团有限公司 | Processing method, device and the computer readable storage medium of user behavior data |
CN108647329B (en) * | 2018-05-11 | 2021-08-10 | 中国联合网络通信集团有限公司 | User behavior data processing method and device and computer readable storage medium |
CN110912949A (en) * | 2018-09-14 | 2020-03-24 | 北京京东尚科信息技术有限公司 | Method and device for submitting sites |
CN110912949B (en) * | 2018-09-14 | 2022-11-08 | 北京京东尚科信息技术有限公司 | Method and device for submitting sites |
CN111163118B (en) * | 2018-11-07 | 2023-04-07 | 株式会社日立制作所 | Message transmission method and device in Kafka cluster |
CN111163118A (en) * | 2018-11-07 | 2020-05-15 | 株式会社日立制作所 | Message transmission method and device in Kafka cluster |
CN111328013A (en) * | 2018-12-17 | 2020-06-23 | ***通信集团山东有限公司 | Mobile terminal positioning method and system |
CN109634784A (en) * | 2018-12-24 | 2019-04-16 | 康成投资(中国)有限公司 | Spark application control method and control device |
CN110647570A (en) * | 2019-09-20 | 2020-01-03 | 百度在线网络技术(北京)有限公司 | Data processing method and device and electronic equipment |
CN110647570B (en) * | 2019-09-20 | 2022-04-29 | 百度在线网络技术(北京)有限公司 | Data processing method and device and electronic equipment |
CN110648178A (en) * | 2019-09-24 | 2020-01-03 | 四川长虹电器股份有限公司 | Method for increasing kafka consumption capacity |
CN111061565A (en) * | 2019-12-12 | 2020-04-24 | 湖南大学 | Two-stage pipeline task scheduling method and system in Spark environment |
CN111061565B (en) * | 2019-12-12 | 2023-08-25 | 湖南大学 | Two-section pipeline task scheduling method and system in Spark environment |
CN111124650B (en) * | 2019-12-26 | 2023-10-24 | 中国建设银行股份有限公司 | Stream data processing method and device |
CN111124650A (en) * | 2019-12-26 | 2020-05-08 | 中国建设银行股份有限公司 | Streaming data processing method and device |
CN111241051B (en) * | 2020-01-07 | 2023-09-12 | 深圳迅策科技有限公司 | Batch data processing method and device, terminal equipment and storage medium |
CN111241051A (en) * | 2020-01-07 | 2020-06-05 | 深圳迅策科技有限公司 | Batch data processing method and device, terminal equipment and storage medium |
CN111526188A (en) * | 2020-04-10 | 2020-08-11 | 北京计算机技术及应用研究所 | System and method for ensuring zero data loss based on Spark Streaming in combination with Kafka |
CN111526188B (en) * | 2020-04-10 | 2022-11-22 | 北京计算机技术及应用研究所 | System and method for ensuring zero data loss based on Spark Streaming in combination with Kafka |
CN112615773B (en) * | 2020-12-02 | 2023-02-28 | 海南车智易通信息技术有限公司 | Message processing method and system |
CN112615773A (en) * | 2020-12-02 | 2021-04-06 | 海南车智易通信息技术有限公司 | Message processing method and system |
CN112800073A (en) * | 2021-01-27 | 2021-05-14 | 浪潮云信息技术股份公司 | Method for updating Delta Lake based on NiFi |
Also Published As
Publication number | Publication date |
---|---|
CN106776855B (en) | 2020-03-13 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106776855A (en) | The processing method of Kafka data is read based on Spark Streaming | |
CN105224445B (en) | Distributed tracking system | |
US6014673A (en) | Simultaneous use of database and durable store in work flow and process flow systems | |
DE69635570T2 (en) | Clocking for a fast-failing, functionally missing, fault-tolerant multiprocessor system | |
US6085200A (en) | System and method for arranging database restoration data for efficient data recovery in transaction processing systems | |
CN101807073B (en) | Historical data processing method and device of distributed control system | |
JP2667039B2 (en) | Data management system and data management method | |
US20050223275A1 (en) | Performance data access | |
CN107193539B (en) | Multithreading concurrent processing method and multithreading concurrent processing system | |
WO2017079048A1 (en) | Clustered fault tolerance systems and methods using load-based failover | |
CN107193909A (en) | Data processing method and system | |
KR101708170B1 (en) | System and method for tracing the activity of a data processing unit supporting speculative instruction execution and out-of-order data transfers | |
CN105224888B (en) | A kind of data of magnetic disk array protection system based on safe early warning technology | |
CN109885453B (en) | Big data platform monitoring system based on stream data processing | |
CN112416724A (en) | Alarm processing method, system, computer equipment and storage medium | |
CN106991656B (en) | A kind of mass remote sensing image distribution geometric correction system and method | |
US20190205221A1 (en) | Error handling for services requiring guaranteed ordering of asynchronous operations in a distributed environment | |
CN106970846A (en) | Payment system message is controlled and processing method, device | |
US10664192B2 (en) | In-memory service with plural buffer type assignment | |
CN110196759A (en) | Distributed transaction processing method and device, storage medium and electronic device | |
CN110262945A (en) | A kind of method of intelligent monitoring data warehouse scheduling system | |
CN106776251A (en) | A kind of monitoring data processing unit and method | |
CN110222039A (en) | Data storage and garbage data cleaning method, device, equipment and storage medium | |
CN110324211A (en) | A kind of data capture method and device | |
CN104734895A (en) | Service monitoring system and service monitoring method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |