CN106504169A

CN106504169A - Waterlogging data processing system based on stream processing, and processing method thereof

Info

Publication number: CN106504169A
Application number: CN201611026709.4A
Authority: CN
Inventors: 史鑫明
Original assignee: SUZHOU AEROSPACE SYSTEM ENGINEERING Co Ltd
Current assignee: SUZHOU AEROSPACE SYSTEM ENGINEERING Co Ltd
Priority date: 2016-11-22
Filing date: 2016-11-22
Publication date: 2017-03-15

Abstract

The invention discloses a waterlogging data processing system based on stream processing, comprising a waterlogging model computation module, a Flume module, a Kafka module, a Spark Streaming module, and an application system. The Spark Streaming stream-processing framework is used to improve reading and processing efficiency: computation results are submitted to the stream-processing framework at timestamp intervals, the Shp files are parsed within the framework, each node's result is compared with that node's result from the previous time step, and for each node only the triangular mesh cells whose water depth values have changed are output. This meets practical requirements and improves the efficiency of processing and display.

Description

Waterlogging data processing system based on stream processing, and processing method thereof

Technical field

The invention belongs to the field of big-data stream processing applications, and in particular relates to a system and a method for processing waterlogging data.

Background art

With the development of big data, the demands placed on big-data processing keep rising. The original batch-processing framework, MapReduce, is suitable for offline computation but cannot serve workloads with stricter real-time requirements, such as real-time recommendation and user behavior analysis.

Spark Streaming is a real-time computing framework built on Spark. Through its rich API and Spark's in-memory high-speed execution engine, users can combine streaming, batch processing, and interactive queries in one application. Spark itself is a distributed computing framework similar to MapReduce. Its core is the resilient distributed dataset (RDD), which offers a richer model than MapReduce and allows a dataset to be iterated over many times quickly in memory, supporting complex data mining and graph computation algorithms. Spark Streaming extends Spark's ability to process large-scale streaming data.

Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting massive volumes of log data. It supports customizing various data senders within the logging system in order to collect data. It also provides the ability to perform simple processing on the data and write it to various (customizable) data receivers.

Flume consists mainly of three components:

Source: collects the log data, wraps it into transactions and events, and pushes them into the channel.

Channel: mainly provides a queue that buffers the data supplied by the source.

Sink: takes the data out of the channel and stores it in the corresponding file system or database, or submits it to a remote server.
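
The division of labor among the three components can be illustrated with a toy in-memory model. This is only a sketch of the source/channel/sink pattern, not Flume's actual API: the names `Channel`, `run_source`, and `run_sink`, and the sample log lines, are hypothetical.

```python
from collections import deque


class Channel:
    """Bounded FIFO buffer that sits between a source and a sink (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._events = deque()

    def put(self, event):
        if len(self._events) >= self.capacity:
            raise OverflowError("channel full")
        self._events.append(event)

    def take(self):
        # Return the oldest buffered event, or None when the channel is empty.
        return self._events.popleft() if self._events else None


def run_source(lines, channel):
    """Source: wrap each raw log line as an event and push it into the channel."""
    for line in lines:
        channel.put({"body": line.rstrip("\n")})


def run_sink(channel, store):
    """Sink: drain the channel and hand each event body to the destination."""
    event = channel.take()
    while event is not None:
        store.append(event["body"])
        event = channel.take()


# Hypothetical log lines; a plain list stands in for the storage destination.
channel = Channel(capacity=100)
delivered = []
run_source(["node-1 depth=0.12\n", "node-2 depth=0.30\n"], channel)
run_sink(channel, delivered)
```

Because the channel is the only object the source and the sink share, either side can be swapped out without touching the other, which is the property the component split is meant to provide.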

The least intrusive way to adopt Flume in an existing program is to read directly the log files that the program already writes. This gives essentially seamless integration and requires no changes to the existing program.

Logically, Flume has a three-tier architecture: agent, collector, and storage.

①agent

The agent collects data. It is where data streams originate in Flume, and it forwards the data streams it produces to the collector.

②collector

The collector aggregates the data from multiple agents and loads it into storage.

③storage

Storage is the storage system, which may be an ordinary file, or HDFS, Hive, HBase, and so on.

At present, owing to the characteristics of geographic information, real-time waterlogging model prediction has not been able to use distributed computing to improve its own computational efficiency. For large-area waterlogging models, therefore, multiple nodes each compute a different region, and the result of each node is then processed. However, as the predicted area grows, the amount of data to be processed grows with it, and a single workstation, even one configured as a higher-end server, finds it increasingly hard to meet this demand.

Summary of the invention

To overcome the deficiencies of the prior art, the object of the invention is to provide a waterlogging data processing system based on stream processing that improves the efficiency and timeliness with which results are displayed.

To achieve the above technical object and technical effect, the invention adopts the following technical solution:

A waterlogging data processing system based on stream processing comprises a waterlogging model computation module, a Flume module, a Kafka module, a Spark Streaming module, and an application system. The waterlogging model computation module produces a large volume of waterlogging prediction result data, which is stored as Shp files in the Shp format (the Shp format was developed by ESRI; an ESRI Shp file set comprises a main file, an index file, and a dBASE table, and the main file has the suffix .shp). The Flume module collects the Shp files through its agents and aggregates them in the collector of the Flume module; the sink of the Flume module delivers the logs to the Kafka module, completing the data production flow. The Spark Streaming module tracks the offset up to which it has consumed this data. The Spark Streaming module contains a program that parses the Shp files; after parsing the Shp files, the program returns the results that changed in each pass and sends them back to the Kafka module. The application system then establishes communication with the Kafka module, listens on a specific message queue, obtains the changed results, and completes the display of the GIS information.
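
As an illustration of what parsing a Shp main file involves, the following sketch reads the fixed 100-byte header of a .shp file using only the Python standard library. It is not the parsing program of the invention, which the source does not disclose. The function name `parse_shp_header` and the synthetic sample header are assumptions made for the example; the byte layout follows the published ESRI shapefile description.

```python
import struct


def parse_shp_header(header: bytes) -> dict:
    """Decode the fixed 100-byte header of a .shp main file."""
    if len(header) < 100:
        raise ValueError("a .shp main-file header is 100 bytes long")
    # Bytes 0-3: file code, big-endian, always 9994.
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile main file")
    # Bytes 24-27: file length in 16-bit words, big-endian.
    (length_words,) = struct.unpack(">i", header[24:28])
    # Bytes 28-35: version and shape type, little-endian.
    version, shape_type = struct.unpack("<ii", header[28:36])
    # Bytes 36-67: bounding box Xmin, Ymin, Xmax, Ymax as little-endian doubles.
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {
        "file_length_bytes": length_words * 2,
        "version": version,
        "shape_type": shape_type,
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Synthetic header for demonstration: shape type 5 (Polygon), hypothetical bounding box.
sample = (
    struct.pack(">i", 9994)
    + bytes(20)                                      # five unused integers
    + struct.pack(">i", 50)                          # 50 words = 100 bytes
    + struct.pack("<ii", 1000, 5)                    # version 1000, Polygon
    + struct.pack("<4d", 120.0, 31.0, 121.0, 32.0)   # Xmin, Ymin, Xmax, Ymax
    + bytes(32)                                      # Z and M ranges, unused here
)
info = parse_shp_header(sample)
```

In a real deployment the records that follow the header, together with the dBASE attribute table, would also have to be decoded to recover the per-triangle water depth values.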

Another object of the invention is to provide a waterlogging data processing method based on stream processing, comprising the following steps:

1) The waterlogging model computation module computes different regions on separate nodes;

2) The Flume module collects the prediction results computed by these nodes;

3) The Spark Streaming module processes the collected results: the computation results are submitted to the stream-processing framework at timestamp intervals, and the Shp files are parsed within the framework;

4) Through the Kafka module, each node's result is compared with that node's result from the previous time step;

5) The application system outputs, for each node, the result of the comparison with the previous time step, that is, the triangular mesh cells whose water depth values differ.
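
The comparison in steps 4 and 5 can be sketched as a per-node difference between two consecutive time steps. The sketch assumes that each node's parsed result is a mapping from a triangle identifier to a water depth; this data layout, the function names, the tolerance, and the sample values are all hypothetical, since the source does not specify them.

```python
def changed_triangles(previous, current, tolerance=1e-6):
    """Return {triangle_id: depth} for triangles that are new or whose depth changed."""
    changed = {}
    for tri_id, depth in current.items():
        old = previous.get(tri_id)
        if old is None or abs(depth - old) > tolerance:
            changed[tri_id] = depth
    return changed


def diff_by_node(prev_by_node, curr_by_node):
    """Compare every node's current result with the same node's previous result."""
    return {
        node: changed_triangles(prev_by_node.get(node, {}), triangles)
        for node, triangles in curr_by_node.items()
    }


# Hypothetical water depths in meters at two consecutive timestamps.
t0 = {"node-1": {1: 0.10, 2: 0.00}, "node-2": {7: 0.25}}
t1 = {"node-1": {1: 0.10, 2: 0.05}, "node-2": {7: 0.40}}
delta = diff_by_node(t0, t1)
```

Here `delta` contains only triangle 2 for `node-1` and triangle 7 for `node-2`, so the application system would redraw two triangles instead of the whole mesh.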

The beneficial effects of the invention are:

Compared with the prior art, the system and method of the invention feed the waterlogging model's computation results into a stream-computing framework, which speeds up the display of waterlogging early warnings. Managers can therefore take precautionary measures sooner and reduce losses.

The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and practiced according to the content of the specification, preferred embodiments of the invention are described in detail below with reference to the accompanying drawing. Specific embodiments of the invention are given in detail by the following examples and the accompanying drawing.

Brief description of the drawings

The accompanying drawing described here provides a further understanding of the invention and forms part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawing:

Fig. 1 is a schematic diagram of the system framework of the invention.

Detailed description of the embodiments

The invention is described in detail below with reference to the accompanying drawing and in conjunction with an embodiment.

As shown in Fig. 1, a waterlogging data processing system based on stream processing comprises a waterlogging model computation module 1, a Flume module 2, a Kafka module 3, a Spark Streaming module 4, and an application system 5. The waterlogging model computation module 1 produces a large volume of waterlogging prediction result data, which is stored as Shp files in the Shp format. The Flume module 2 collects the Shp files through its agents and aggregates them in the collector of the Flume module 2; the sink of the Flume module 2 delivers the logs to the Kafka module 3, completing the data production flow. The Spark Streaming module 4 tracks the offset up to which it has consumed this data. The Spark Streaming module 4 contains a program that parses the Shp files; after parsing the Shp files, the program returns the results that changed in each pass and sends them back to the Kafka module 3. The application system 5 then establishes communication with the Kafka module 3, listens on a specific message queue, obtains the changed results, and completes the display of the GIS information.
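
The offset tracking mentioned above can be illustrated with a toy append-only log. The sketch deliberately avoids any real Kafka or Spark API: `TopicLog` and `OffsetTrackingConsumer` are hypothetical stand-ins, and the records are placeholder strings.

```python
class TopicLog:
    """Append-only list standing in for one partition of a message topic."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read_from(self, offset, max_records):
        return self._records[offset:offset + max_records]


class OffsetTrackingConsumer:
    """Consumer that remembers the offset of the next record it has not yet processed."""

    def __init__(self, log):
        self.log = log
        self.committed = 0  # offset of the next record to read

    def poll(self, max_records=10):
        # Reading does not advance the offset; only commit() does.
        return self.log.read_from(self.committed, max_records)

    def commit(self, count):
        self.committed += count


log = TopicLog()
for record in ["result@t0", "result@t1", "result@t2"]:
    log.append(record)

consumer = OffsetTrackingConsumer(log)
first = consumer.poll(max_records=2)
consumer.commit(len(first))
second = consumer.poll(max_records=2)
consumer.commit(len(second))
```

Committing only after a batch has been processed means that a consumer restarted after a failure resumes from the last committed offset and re-reads the unfinished batch instead of skipping it.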

The processing method of the waterlogging data processing system of this embodiment is as follows:

1) The waterlogging model computation module 1 computes different regions on separate nodes;

2) The Flume module 2 collects the prediction results computed by these nodes;

3) The Spark Streaming module 4 processes the collected results: the computation results are submitted to the stream-processing framework at timestamp intervals, and the Shp files are parsed within the framework;

4) Through the Kafka module 3, each node's result is compared with that node's result from the previous time step;

5) The application system 5 outputs, for each node, the result of the comparison with the previous time step, that is, the triangular mesh cells whose water depth values differ.
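
Submitting results "at timestamp intervals" in step 3 amounts to grouping records into fixed-length time windows before they are processed. A minimal sketch follows, assuming each record is a `(timestamp_in_seconds, payload)` pair; the function name, the five-second interval, and the payloads are hypothetical.

```python
from collections import defaultdict


def batch_by_interval(records, interval_seconds):
    """Group (timestamp, payload) pairs into windows keyed by window start time."""
    batches = defaultdict(list)
    for timestamp, payload in records:
        window_start = (timestamp // interval_seconds) * interval_seconds
        batches[window_start].append(payload)
    # Return the windows in chronological order.
    return dict(sorted(batches.items()))


# Hypothetical results arriving at 0 s, 4 s, 5 s, and 11 s, batched every 5 s.
records = [(0, "a"), (4, "b"), (5, "c"), (11, "d")]
windows = batch_by_interval(records, interval_seconds=5)
```

Each window can then be parsed and compared as one unit, which matches the micro-batch style of processing that a streaming framework applies to its input.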

The foregoing is only a preferred embodiment of the invention and is not intended to limit the invention. For those skilled in the art, the invention may be subject to various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims

1. A waterlogging data processing system based on stream processing, characterized in that it comprises a waterlogging model computation module (1), a Flume module (2), a Kafka module (3), a Spark Streaming module (4), and an application system (5);

the waterlogging model computation module (1) produces a large volume of waterlogging prediction result data, which is stored as Shp files in the Shp format; the Flume module (2) collects the Shp files through its agents and aggregates them in the collector of the Flume module (2); the sink of the Flume module (2) delivers the logs to the Kafka module (3), completing the data production flow; the Spark Streaming module (4) tracks the offset up to which it has consumed this data; the Spark Streaming module (4) contains a program that parses the Shp files, and after parsing the Shp files the program returns the results that changed in each pass and sends them back to the Kafka module (3); the application system (5) then establishes communication with the Kafka module (3), listens on a specific message queue, obtains the changed results, and completes the display of the GIS information.

2. A waterlogging data processing method based on stream processing, characterized in that it comprises the following steps:

1) The waterlogging model computation module (1) computes different regions on separate nodes;

2) The Flume module (2) collects the prediction results computed by these nodes;

3) The Spark Streaming module (4) processes the collected results: the computation results are submitted to the stream-processing framework at timestamp intervals, and the Shp files are parsed within the framework;

4) Through the Kafka module (3), each node's result is compared with that node's result from the previous time step;

5) The application system (5) outputs, for each node, the result of the comparison with the previous time step, that is, the triangular mesh cells whose water depth values differ.