CN112667683A - Stream computing system, electronic device and storage medium therefor - Google Patents


Info

Publication number: CN112667683A (granted as CN112667683B)
Application number: CN202011559972.6A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: stream, stream computing, task, data, data source
Legal status: Granted; active
Inventors: 蒋英明, 万书武, 张观成, 赵楚旋, 林琪琛, 刘微明, 覃芳, 曹晓能
Applicant and current assignee: Ping An Technology Shenzhen Co Ltd

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval; Database Structures and File System Structures Therefor (AREA)

Abstract

The application belongs to the technical field of data processing and provides a stream computing system comprising: a data collection layer for configuring a data source for a stream computing task and collecting, from the data source in real time, the stream data resources required for processing the task; a data bus layer for creating a task topic corresponding to the stream computing task and caching, according to the task topic, the stream data resources collected from the data source in real time; a resource management layer for scheduling and managing real-time stream data resources for the stream computing task according to the task topic; a compute engine layer for developing a corresponding stream computation mode for the stream computing task according to its configured data source, executing the task in that mode, and outputting a stream computation result; and a storage and interface layer for storing the stream computation result and providing an output interface for it. The system addresses the high development threshold and the high development, operation, and maintenance costs that existing stream computing systems face when building enterprise-grade productized applications.

Description

Stream computing system, electronic device and storage medium therefor
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a stream computing system, and an electronic device and a storage medium thereof.
Background
Real-time stream computing refers to computing over stream data as it arrives: massive data from different data sources is acquired in real time and analyzed and processed immediately to extract valuable information, enabling sensing, analysis, judgment, and decision-making while an event is happening, or even before it happens. Stream computing embodies the basic idea that the value of data decreases as time passes; therefore, to process stream data in a timely manner, a low-latency, scalable, and highly reliable stream computing system is required.
Existing stream computing engines mainly include the commercial-grade InfoSphere Streams and StreamBase and the open-source Twitter Storm, Spark Streaming, and Flink, with Spark Streaming and Flink being the most widely used. However, Spark Streaming and Flink are open-source stream computing frameworks: to build an enterprise-grade productized application, an existing stream computing system requires skilled developers to combine such a framework with the enterprise's product requirements and develop a real-time stream computing application with a single function, a specific application scenario, and a specific application mode. Such real-time stream computing applications have a high development threshold and high development, operation, and maintenance costs.
Disclosure of Invention
In view of this, embodiments of the present application provide a stream computing system, and an electronic device and a storage medium thereof, aiming to solve at least one of the problems of existing stream computing systems, such as high application development difficulty and high operation and maintenance costs.
A first aspect of an embodiment of the present application provides a stream computing system, including:
the data collection layer is used for configuring a data source of a stream computing task and collecting, from the data source in real time, the stream data resources required for processing the stream computing task;
the data bus layer is used for creating a task topic corresponding to the stream computing task and caching, according to the task topic, the stream data resources collected from the data source in real time;
the resource management layer is used for scheduling and managing real-time stream data resources for the stream computing task according to the task topic;
the compute engine layer is used for developing a corresponding stream computation mode for the stream computing task according to the data source configured for the task, executing the stream computing task in the stream computation mode, and outputting a stream computation result;
and the storage and interface layer is used for storing the stream computation result and providing an output interface for the stream computation result.
With reference to the first aspect, in a first possible implementation manner of the first aspect, a data source configuration unit is disposed in the data collection layer, and the data source configuration unit is configured to select, according to a user's stream computing task execution request, one data source from the pre-accessed configurable data sources to be configured as the data source of the stream computing task, where the pre-accessed configurable data sources include at least one of: a mysql data source, a postgresql data source, an oracle data source, an SQLserver data source, a log data source, a message bus MQ data source, an external kafka data source, and a restful API data source.
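As a concrete illustration of the selection step described above, the following Python sketch models a data source configuration unit; the class name, field names, and error handling are assumptions for illustration, not taken from the patent.

```python
# Hypothetical sketch of a data source configuration unit: it selects one
# pre-accessed data source for a stream computing task and rejects sources
# that were never accessed. All names here are illustrative.

SUPPORTED_SOURCES = {
    "mysql", "postgresql", "oracle", "sqlserver",
    "log", "mq", "kafka", "restful_api",
}

class DataSourceConfigUnit:
    """Selects one pre-accessed data source for a stream computing task."""

    def __init__(self, accessible_sources=frozenset(SUPPORTED_SOURCES)):
        self.accessible_sources = set(accessible_sources)

    def configure(self, task_id: str, requested_source: str) -> dict:
        """Bind the requested source to the task, per the user's execution request."""
        if requested_source not in self.accessible_sources:
            raise ValueError(f"data source {requested_source!r} is not pre-accessed")
        return {"task_id": task_id, "data_source": requested_source}

unit = DataSourceConfigUnit()
cfg = unit.configure("task-001", "mysql")  # the collection layer would then pull from mysql
```

The eight source identifiers mirror the list in the claim; in a real deployment each identifier would map to a concrete collector.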
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, a development mode configuration unit is disposed in the computation engine layer, and the development mode configuration unit is configured to configure a development mode for the stream computation task according to a data source configured by the stream computation task, where the development mode configurable by the development mode configuration unit includes one or more of an sql development mode, a jar development mode, and a canvas development mode.
With reference to the first aspect, in a third possible implementation manner of the first aspect, the resource management layer supports both a Kubernetes resource management manner and a YARN resource management manner when scheduling and managing real-time stream data resources, where, when the resource management layer identifies that the user requesting execution of a stream computing task is a traditional Hadoop user, it provides the traditional Hadoop user with the YARN resource management manner.
With reference to the first aspect, in a fourth possible implementation manner of the first aspect, the stream computing system further includes a service management module and an operation system module, wherein:
the service management module is used for providing, in the stream computing system, at least one of the following services: multi-tenant resource management, unified user permission management compatible with the Kubernetes ecology and the Hadoop ecology, health status monitoring management of stream computing tasks, operation index monitoring management of stream computing tasks, and data center monitoring management;
the operation system module is used for providing, in the stream computing system, at least one of the following mechanisms: a user guidance mechanism based on document and video materials, a user behavior auditing mechanism, an automatic system capacity expansion mechanism, a system anomaly early warning mechanism, and a system anomaly recovery mechanism.
With reference to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner of the first aspect, when providing unified user permission management compatible with the Kubernetes ecology and the Hadoop ecology in the stream computing system, the service management module is configured to detect whether a user has permission to operate the real-time data bus layer, limit the user's read and write traffic on data resources with the task topic as the object, and detect whether a stream computing task scheduled in the Kubernetes resource management manner has the preset Hadoop operation permissions.
With reference to the fourth possible implementation manner of the first aspect, in a sixth possible implementation manner of the first aspect, when providing the health status monitoring management service for stream computing tasks in the stream computing system, the service management module is further configured to monitor whether a stream computing task is alive and whether backpressure exists, where the stream computing task is judged not alive if its processing process is monitored to have been interrupted or to have failed, and the stream computing task is judged to be under backpressure if, during its stream data processing, the stream data receive rate is monitored to be greater than the stream data processing rate.
With reference to the first aspect, in a seventh possible implementation manner of the first aspect, the stream computing system further includes an operation system module for providing, in the stream computing system, at least one of the following mechanisms: a user guidance mechanism based on document and video materials, a user behavior auditing mechanism, an automatic system capacity expansion mechanism, a system anomaly early warning mechanism, and a system anomaly recovery mechanism.
A second aspect of the embodiments of the present application provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the functions of the system provided in the first aspect when executing the computer program.
A third aspect of embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the functions of the system provided by the first aspect.
The stream computing system, electronic device, and storage medium provided by the embodiments of the present application have the following beneficial effects:
Through the data collection layer, data bus layer, resource management layer, compute engine layer, and storage and interface layer arranged in the stream computing system, a one-to-one mapping is established at the data bus layer between a task topic and the data source configured for the stream computing task, realizing data-source-oriented resource isolation management of stream computing tasks. On the basis of this isolation, resource allocation management is performed, including configuring the traffic quota of the data source in the stream computing task according to the task topic, monitoring traffic, and automatically allocating resources within the traffic quota limit. At the compute engine layer, a development mode is selected according to the type of the data source to develop the stream computation mode and carry out stream computation processing. The system thereby adapts to the development of different stream computing applications, meets the needs of user groups with different skills, and lowers the development threshold of stream computing applications while also reducing development, operation, and maintenance costs.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a schematic diagram of the basic functional architecture of a stream computing system according to a first embodiment of the present application;
Fig. 2 is a schematic diagram of the technical architecture of a stream computing system according to another embodiment of the present application;
Fig. 3 is a schematic diagram of the architecture of a stream computing system according to a third embodiment of the present application;
Fig. 4 is a block diagram of an electronic device according to a fourth embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating a basic functional architecture of a stream computing system according to a first embodiment of the present application. The details are as follows: the stream computing system 100 in this embodiment may include a data collection layer 110, a data bus layer 120, a resource management layer 130, a compute engine layer 140, and a storage and interface layer 150. Wherein:
the data collection layer 110 is used for configuring a data source of a stream computing task and collecting stream data resources required for processing the stream computing task from the data source in real time. In this embodiment, the data acquisition layer 110 includes a plurality of configurable data sources that are accessed by the stream computing system 100 in advance, so as to provide the data acquisition layer 110 with automatic acquisition of the data sources. In this embodiment, the data acquisition layer may be provided with a data source configuration unit, and the data source configuration unit is configured to select a data source configured as the stream computation task from the pre-accessed configurable data sources according to a stream computation task execution request of a user. Specifically, the data sources previously accessed by the stream computing system 100 include, but are not limited to, eight types of mysql data sources, postgresql data sources, oracle data sources, SQLserver data sources, log data sources, message bus MQ data sources, external kafka data sources, and restful API data sources. The stream computing system triggers the data source configuration unit in the data acquisition layer 110 to select one data source configured as a stream computing task from the eight data sources according to the stream computing task execution request of the user by responding to the stream computing task execution request of the user, so that when the stream computing task is executed, the stream computing system acquires the stream data resources required by the stream computing task from the configured data source in real time.
The data bus layer 120 is used for creating a task topic corresponding to the stream computing task and caching, according to the task topic, the stream data resources collected from the data source in real time. In this embodiment, the stream computing system employs a cluster deployment based on Apache Kafka (a distributed publish-subscribe messaging system). By creating a task topic corresponding to the stream computing task on the data bus layer 120, i.e., creating a topic in Kafka, the data bus layer 120 can cache the stream data resources collected from the data source in real time under that topic. In this way, creating the task topic establishes a one-to-one mapping between the task topic and the data source configured for the stream computing task, realizing data-source-oriented resource isolation management of the stream computing task.
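The topic-per-task mapping described above can be sketched in Python as follows; the topic naming convention and the in-memory buffers are illustrative stand-ins for the Kafka cluster, not the patent's implementation.

```python
# Illustrative sketch of the data bus layer's one-to-one mapping between a
# task topic and the task's configured data source. In the real system the
# topic would be created in Kafka and the buffer would be a Kafka partition;
# here both are modeled in memory so the sketch is self-contained.

class DataBusLayer:
    def __init__(self):
        self.topic_to_source = {}  # task topic -> data source (one-to-one)
        self.buffers = {}          # task topic -> cached stream records

    def create_task_topic(self, task_id: str, data_source: str) -> str:
        topic = f"stream-task-{task_id}"  # hypothetical naming convention
        if topic in self.topic_to_source:
            raise ValueError(f"topic {topic} already exists")
        self.topic_to_source[topic] = data_source
        self.buffers[topic] = []
        return topic

    def cache(self, topic: str, record) -> None:
        # Records collected from the data source are buffered per topic,
        # which is what gives per-data-source resource isolation.
        self.buffers[topic].append(record)

bus = DataBusLayer()
topic = bus.create_task_topic("001", "mysql")
bus.cache(topic, {"id": 1})
```

Because each topic maps to exactly one configured source, quota and permission checks later in the document can key off the topic alone.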
The resource management layer 130 is used for scheduling and managing real-time stream data resources for the stream computing task according to the task topic. In this embodiment, the resource management layer 130 manages and schedules cluster resources in a unified manner: it allocates stream computing tasks to the working nodes of the stream computing system with a scheduling algorithm, and performs resource allocation management and scheduling for the stream computing tasks by monitoring the usage of stream data resources on each working node. The resource allocation management further includes, but is not limited to, configuring the traffic quota of the data source in the stream computing task according to the task topic, monitoring traffic, and performing automatic capacity expansion within the traffic quota limit.
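A minimal sketch of the per-topic traffic-quota accounting mentioned above, assuming a records-per-window quota unit and a simple admit/reject policy (both of which are assumptions, since the patent does not specify them):

```python
# Hedged sketch of per-topic traffic quota accounting. The quota unit
# (records per window) and the reject-on-overflow policy are assumptions;
# the patent only says quotas are configured per task topic and enforced.

class QuotaManager:
    def __init__(self):
        self.quotas = {}  # topic -> max records admitted per window
        self.usage = {}   # topic -> records admitted in the current window

    def set_quota(self, topic: str, max_records: int) -> None:
        self.quotas[topic] = max_records
        self.usage.setdefault(topic, 0)

    def admit(self, topic: str, n: int = 1) -> bool:
        """Admit n records if the topic stays within its quota."""
        if self.usage.get(topic, 0) + n > self.quotas.get(topic, 0):
            return False  # over quota: caller may throttle or trigger scaling
        self.usage[topic] = self.usage.get(topic, 0) + n
        return True

qm = QuotaManager()
qm.set_quota("stream-task-001", 3)
first = qm.admit("stream-task-001", 2)
second = qm.admit("stream-task-001", 1)
third = qm.admit("stream-task-001", 1)  # would exceed the quota
```

In the described system, a rejected admit would feed the traffic-monitoring and automatic capacity expansion logic rather than simply dropping data.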
The compute engine layer 140 is used for developing a corresponding stream computation mode for the stream computing task according to the data source configured for the task, executing the task in that mode, and outputting a stream computation result. In this embodiment, the compute engine layer 140 supports the basic frameworks of Apache Flink and Apache Spark Streaming. The eight data sources configured in the data collection layer 110 can be roughly classified into three types: the DB data type, the log file type, and the restapi type. Correspondingly, the compute engine layer 140 provides three stream computation development modes: an sql development mode (developing stream computing tasks in sql), a jar development mode (deploying stream computing tasks as original jars), and a canvas development mode (automatically constructing stream computing tasks by canvas dragging). A development mode is selected according to the type of the data source configured for the stream computing task to develop the stream computation mode, and the task is then executed in the correspondingly developed mode and the result is output. In this way, the stream computing system can select the development mode appropriate to the data source type, adapt to the development of different stream computing applications, meet the needs of user groups with different skills, lower the development threshold of stream computing applications, and at the same time reduce development, operation, and maintenance costs.
The storage and interface layer 150 is used for storing the stream computation result and providing an output interface for it. In this embodiment, the stream computing system is provided with a storage unit in the storage and interface layer 150; when the compute engine layer 140 executes the stream computing task and obtains a stream computation result, the result is stored in the storage unit. The stream computing system is further provided with a message subscription interface in the storage and interface layer 150 for delivering stream computation results to users. In this embodiment, the message subscription interface is configured as an SDK API for the java and python languages and supports third-party system integration. Through the SDK API, the background services of the stream computing system are encapsulated: a user can use the stream computing system directly through interface authentication without attending to the underlying background service logic, making operation simple and the barrier to use low.
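The interface-authentication usage pattern could look like the following sketch; the endpoint path, bearer-token scheme, and method names are hypothetical, and the HTTP transport is injected as a callable so the sketch stays self-contained.

```python
# Hypothetical sketch of an SDK client for the storage and interface layer.
# The URL layout and token scheme are assumptions; the real SDK API is not
# specified in the patent beyond "java and python" and interface auth.

class StreamResultClient:
    def __init__(self, token: str, fetch):
        self.token = token
        self._fetch = fetch  # stand-in for the HTTP transport

    def get_results(self, task_id: str):
        headers = {"Authorization": f"Bearer {self.token}"}  # interface authentication
        resp = self._fetch(f"/api/v1/tasks/{task_id}/results", headers)
        if resp["status"] != 200:
            raise RuntimeError("failed to fetch stream computation results")
        return resp["body"]

seen = {}
def fake_fetch(url, headers):
    # Records what the client sent, so the auth flow can be verified offline.
    seen["url"], seen["auth"] = url, headers["Authorization"]
    return {"status": 200, "body": [{"count": 42}]}

client = StreamResultClient("secret-token", fetch=fake_fetch)
results = client.get_results("001")
```

Injecting the transport also mirrors the design point in the text: callers never touch the background service logic, only the authenticated interface.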
Through the data collection layer, data bus layer, resource management layer, compute engine layer, and storage and interface layer, the stream computing system described in this embodiment establishes, at the data bus layer, a one-to-one mapping between the task topic and the data source configured for the stream computing task, performing data-source-oriented resource isolation management of the task. It implements resource allocation management such as configuring the traffic quota of the data source in the stream computing task according to the task topic, monitoring traffic, and performing automatic capacity expansion within the traffic quota limit. At the compute engine layer, a development mode is selected according to the type of the data source to develop the stream computation mode and carry out stream computation processing. The system thereby adapts to the development of different stream computing applications, meets the needs of user groups with different skills, and lowers the development threshold of stream computing applications while also reducing development, operation, and maintenance costs.
In this embodiment, the resource management layer 130 may further be compatible with two resource management manners: Kubernetes and YARN. The Kubernetes resource management manner is used when scheduling and managing real-time stream data resources; when the resource management layer identifies that the user requesting execution of a stream computing task is a traditional Hadoop user, the compatible YARN resource management manner is provided for that user. Kubernetes is a container cluster management system that realizes resource management through container deployment: containers are isolated from one another, each has its own file system, processes in different containers do not affect each other, and computing resources can be distinguished. Kubernetes provides automatic deployment, automatic scaling, maintenance, and other functions. YARN (Yet Another Resource Negotiator) is the Hadoop cluster resource management system, a general-purpose resource management system that provides unified resource management and scheduling for upper-layer applications. In this embodiment, the YARN resource management manner is offered mainly to remain compatible with the usage habits of traditional Hadoop users. Based on the task topic created in the data bus layer and its mapping to the data source configured for the stream computing task, and in cooperation with the containerized resource management of Kubernetes, data-source-oriented resource isolation management can be performed well for stream computing tasks, establishing platform-level resource isolation management on top of Kafka; this in turn makes it convenient to realize permission management configured by product project when building enterprise-grade productized applications.
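The dual-manner compatibility reduces to a small routing decision; a sketch follows, assuming a per-user legacy-Hadoop flag (an assumption, since the patent does not specify how such users are identified):

```python
# Sketch of the scheduler routing implied above; the `is_legacy_hadoop_user`
# flag is an assumed way of identifying traditional Hadoop users.

def select_resource_manager(user: dict) -> str:
    """Default to Kubernetes; fall back to YARN for legacy Hadoop users."""
    if user.get("is_legacy_hadoop_user"):
        return "yarn"        # preserves the habits of traditional Hadoop users
    return "kubernetes"      # containerized, per-task resource isolation

legacy_mode = select_resource_manager({"name": "alice", "is_legacy_hadoop_user": True})
default_mode = select_resource_manager({"name": "bob"})
```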
In this embodiment, permission management configured by product project may use system project membership as the basis of authentication to determine whether a user has permission to operate the real-time data bus; this is simpler than traditional permission management based on the Kafka client. In addition, this embodiment also provides unified management of the data permissions of Kubernetes and Hadoop.
In this embodiment of the application, the compute engine layer 140 may further include a development mode configuration unit configured to set a development mode for the stream computing task according to its configured data source, where the configurable development modes include the sql development mode, the jar development mode, and the canvas development mode. In this embodiment, classifying the configurable data sources in the system's data collection layer yields three data source types: the DB data type, the log file type, and the restapi type. The development mode configuration unit identifies the type of the data source configured for the stream computing task and configures the corresponding development mode accordingly. For example, for a data source of the DB data type, the configured development mode is the sql development mode, and the compute engine layer develops the corresponding stream computation mode for the task using sql; for a data source of the log file type, the configured development mode is the jar development mode, and the compute engine layer develops the corresponding stream computation mode using a jar; for a data source of the restapi type, the configured development mode is the canvas development mode, and the compute engine layer develops the corresponding stream computation mode on the canvas.
In this way, the compute engine layer develops a corresponding stream computation mode for the stream computing task according to the data source configured for the task.
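The type-to-mode selection above can be summarized in a small table-driven sketch. Note that the assignment of the MQ and kafka sources to the log file type is an assumption: the patent only says the eight sources are "roughly" classified into three types without listing the grouping.

```python
# Illustrative mapping from data source to type to development mode.
# The SOURCE_TYPE grouping for mq/kafka is an assumption; the three-way
# DEV_MODE mapping follows the example in the text (DB -> sql,
# log file -> jar, restapi -> canvas).

SOURCE_TYPE = {
    "mysql": "db", "postgresql": "db", "oracle": "db", "sqlserver": "db",
    "log": "log_file", "mq": "log_file", "kafka": "log_file",
    "restful_api": "restapi",
}

DEV_MODE = {"db": "sql", "log_file": "jar", "restapi": "canvas"}

def configure_dev_mode(data_source: str) -> str:
    """Pick the development mode from the configured data source's type."""
    return DEV_MODE[SOURCE_TYPE[data_source]]

mode_for_mysql = configure_dev_mode("mysql")
```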
In another embodiment of the present application, the stream computing system service adopts a front-end/back-end separation design in its technical architecture; please refer to fig. 2, which is a schematic diagram of the technical architecture of a stream computing system according to another embodiment of the present application. As shown in fig. 2, the stream computing system 200 employs a front-end web UI interface responsible for interacting directly with the user, and operation entries that form the background services of the stream computing system and act as the scheduler between the web UI interface and the background engines. The operation entries comprise GBD-RTC, GBD-BUS, and GBD-RSC. GBD-RTC is located in the data collection layer and is responsible for starting the collectors in the data collection layer to collect real-time data. The data collection layer may include the various configured collectors of the stream computing system 200, such as a mysql collector (mysql-collector), a postgresql collector (PG-collector), an oracle collector (oracle-collector), an SQLserver collector (SQLserver-collector), a log collector (log-collector), a message bus MQ collector (MQ-collector), an external kafka collector (kafka-collector), and a restful API collector (restful-collector). GBD-BUS is located in the data bus layer and is responsible for creating task topics there. GBD-RSC is located in the compute engine layer and is responsible for task distribution, supporting the basic frameworks of Flink and Spark Streaming.
In this embodiment, the stream computing system 200 constructs a Kubernetes ecology in the resource management layer, so that the system supports the Kubernetes resource management manner, and constructs a Hadoop ecology in the storage and interface layer, so that the system is also compatible with the YARN resource management manner. An SDK API is configured in the storage and interface layer to obtain the stream computation results computed by the stream computing system 200, in a manner that supports third-party system integration. In this embodiment, one or more data centers may be deployed in the stream computing system 200, each constructed with both a Kubernetes ecology and a Hadoop ecology, so that the stream computing system 200 supports the two resource management manners, Kubernetes and YARN.
In an embodiment of the present application, please refer to fig. 3, which is a schematic diagram of the architecture of a stream computing system according to a third embodiment of the present application. As shown in fig. 3, based on the technical architecture of the stream computing system service, the front-end web UI interface of the stream computing system 300 includes a service management module 310 and an operation system module 320. The service management module 310 is used for providing, in the stream computing system, at least one of the following services: multi-tenant resource management, unified user permission management compatible with the Kubernetes ecology and the Hadoop ecology, health status monitoring management of stream computing tasks, operation index monitoring management of stream computing tasks, and data center monitoring management. The operation system module 320 is used for providing, in the stream computing system, at least one of the following mechanisms: a user guidance mechanism based on document and video materials, a user behavior auditing mechanism, an automatic system capacity expansion mechanism, a system anomaly early warning mechanism, and a system anomaly recovery mechanism.
In this embodiment, when providing the multi-tenant resource management service, the service management module 310 isolates the data resources required by each tenant's stream computing tasks by constructing tenant isolation at the Kubernetes and Hadoop resource layers, managing data resources along the tenant dimension and thereby realizing multi-tenant resource management for the stream computing system.
In this embodiment, when providing the unified user permission management service compatible with the Kubernetes ecology and the Hadoop ecology, the service management module 310 takes system project membership as the authentication basis to detect whether a user has permission to operate the real-time data bus layer, limits the flow of a user's reads and writes of data resources with the task topic as the object, and presets the corresponding Hadoop operation permissions in the stream computing system; it further detects whether a stream computing task scheduled in the Kubernetes resource management mode has the preset Hadoop operation permission, thereby realizing unified user permission management compatible with the Kubernetes ecology and the Hadoop ecology.
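The two-step check above can be sketched as a single function: first verify project membership for bus-layer access, then verify the preset Hadoop permission for Kubernetes-scheduled tasks. The field names, permission string, and return shape are illustrative assumptions only.

```python
# Sketch of the unified permission check described in the embodiment.
def check_permissions(user, task):
    """Step 1: bus-layer access is granted by system project membership.
    Step 2: a Kubernetes-scheduled task additionally requires the preset
    Hadoop operation permission."""
    if user["name"] not in task["project_members"]:
        return False, "no permission to operate the real-time data bus layer"
    if task["scheduler"] == "kubernetes" and "hadoop_op" not in user["permissions"]:
        return False, "missing preset hadoop operation permission"
    return True, "ok"

user = {"name": "alice", "permissions": {"hadoop_op"}}
task = {"project_members": {"alice"}, "scheduler": "kubernetes"}
print(check_permissions(user, task))  # (True, 'ok')
```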
In this embodiment, when providing the health status monitoring management service for stream computing tasks, the service management module 310 monitors the liveness and the backpressure condition of each stream computing task. Liveness monitoring checks whether the processing of the stream computing task has been interrupted or has failed. Backpressure monitoring checks whether, while the stream computing task is executing, the rate at which the stream computing system receives stream data exceeds the rate at which it processes the data; if the receiving rate exceeds the processing rate, backpressure arises. When the monitoring detects that a task's processing has been interrupted or failed, or that its stream data processing is under backpressure, the health status of the stream computing task is displayed as abnormal. Furthermore, an alarm strategy via telephone, short message, or other chat tools may be provided to notify relevant personnel of the abnormal health status, so that they can perform recovery operations in time.
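The two health checks reduce to a simple decision over task metrics: a state check for liveness and a rate comparison for backpressure. The snapshot field names and status strings below are illustrative assumptions, not part of the claimed system.

```python
# Sketch of the health checks: liveness first, then backpressure.
def health_status(task_metrics):
    """A task is unhealthy if its processing was interrupted or failed,
    or if stream data arrives faster than it is processed (backpressure)."""
    if task_metrics["state"] in ("interrupted", "failed"):
        return "abnormal: not alive"
    if task_metrics["receive_rate"] > task_metrics["process_rate"]:
        return "abnormal: backpressure"
    return "healthy"

print(health_status({"state": "running", "receive_rate": 900, "process_rate": 1200}))
# healthy
print(health_status({"state": "running", "receive_rate": 1500, "process_rate": 1200}))
# abnormal: backpressure
```

An abnormal return value would then feed the alarm strategy (telephone, short message, or chat-tool notification) mentioned above.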
In this embodiment, when providing the operation index monitoring management service for stream computing tasks, the service management module 310 monitors operation index parameters such as the number of bytes flowing in per second, the number of records flowing in per second, the number of bytes flowing out per second, and the number of records flowing out per second during the execution of a stream computing task, and adjusts these parameters to achieve data source flow quotas and automatic capacity expansion.
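The four per-second indices can be derived from two counter snapshots taken one second apart, which is how throughput metrics are commonly computed; the snapshot field names here are illustrative assumptions.

```python
# Sketch: deriving the four per-second operation indices from two
# cumulative-counter snapshots taken one second apart.
def per_second_indices(prev, curr):
    return {
        "bytes_in_per_sec":    curr["bytes_in"] - prev["bytes_in"],
        "records_in_per_sec":  curr["records_in"] - prev["records_in"],
        "bytes_out_per_sec":   curr["bytes_out"] - prev["bytes_out"],
        "records_out_per_sec": curr["records_out"] - prev["records_out"],
    }

prev = {"bytes_in": 1000, "records_in": 10, "bytes_out": 800, "records_out": 8}
curr = {"bytes_in": 3000, "records_in": 30, "bytes_out": 2400, "records_out": 24}
print(per_second_indices(prev, curr)["records_in_per_sec"])  # 20
```

A flow-quota or auto-expansion policy would then compare these indices against configured thresholds.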
In this embodiment, the data link of a stream computing task is long and its latency requirement is strict, generally at the millisecond level; if processing crosses data centers, this latency requirement is difficult to meet. Therefore, in this embodiment, when providing the data center monitoring management service, the service management module 310 mainly monitors stream computing tasks to prevent their stream data processing from crossing data centers. When the stream computing system supports multiple data centers, data synchronization is achieved through a disaster recovery or active-active strategy. Under the active-active strategy, when one data center fails and can no longer execute a stream computing task, the stream computing system automatically switches services to ensure that the stream computing task continues to be processed normally.
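The routing rule implied above — keep each task inside one healthy data center, and switch only when that center fails — can be sketched as follows. Data center names, health flags, and the `home_center` field are illustrative assumptions.

```python
# Sketch of the active-active switch: a task is pinned to its home data
# center (no cross-center processing) and fails over only when that
# center becomes unhealthy.
def route_task(task, centers):
    primary = task["home_center"]
    if centers[primary]["healthy"]:
        return primary
    for name, info in centers.items():
        if info["healthy"]:
            return name   # automatic service switch to the standby center
    raise RuntimeError("no healthy data center available")

centers = {"dc1": {"healthy": False}, "dc2": {"healthy": True}}
print(route_task({"home_center": "dc1"}, centers))  # dc2
```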
Combining the above embodiments, from the perspective of an enterprise product, the stream computing system provided by the present application is visual and configurable, and reduces the trial-and-error cost of user data development across the whole lifecycle of development, testing, deployment, and operation and maintenance.
Referring to fig. 4, fig. 4 is a block diagram of an electronic device according to a fourth embodiment of the present disclosure. As shown in fig. 4, the electronic device 4 of this embodiment includes: a processor 41, a memory 42, and a computer program 43 stored in the memory 42 and executable on the processor 41. The processor 41, when executing the computer program 43, implements the functions of the stream computing system corresponding to each execution layer, unit, or module in the above embodiments.
Illustratively, the computer program 43 may be partitioned into one or more execution layers, which are stored in the memory 42 and executed by the processor 41 to implement the present application. The one or more execution layers may be a series of computer program instruction segments capable of performing specific functions, and these segments are used to describe the execution process of the computer program 43 in the electronic device 4. For example, the computer program 43 may be divided into:
the data acquisition layer is used for configuring a data source of a stream computing task and acquiring stream data resources required by processing the stream computing task from the data source in real time;
the data bus layer is used for creating a task topic corresponding to the stream computing task and caching stream data resources acquired from the data source in real time according to the task topic;
the resource management layer is used for scheduling and managing real-time stream data resources for the stream computing task according to the task topic;
the computing engine layer is used for developing a corresponding stream computing mode for the stream computing task according to the data source configured by the stream computing task, executing the stream computing task according to the stream computing mode and outputting a stream computing result;
and the storage and interface layer is used for storing the stream calculation result and providing an output interface for the stream calculation result.
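The five execution layers above form a linear pipeline from data acquisition to result storage, which can be sketched as plain functions. Every name below (collect, cache_by_topic, and so on) is an illustrative assumption; the computation itself (a sum) stands in for an arbitrary stream computing task.

```python
# Minimal sketch of the five execution layers as a linear pipeline.
def collect(source):
    """Data acquisition layer: pull stream records from a configured source."""
    return list(source)

def cache_by_topic(records, topic):
    """Data bus layer: cache records under the task topic."""
    return {topic: records}

def schedule(cached, topic):
    """Resource management layer: hand the topic's records to the task."""
    return cached[topic]

def compute(records):
    """Compute engine layer: execute the stream computing task (here, a sum)."""
    return sum(records)

def store_and_expose(result, store):
    """Storage and interface layer: persist the result and expose it."""
    store["result"] = result
    return store

store = {}
records = collect([1, 2, 3])
cached = cache_by_topic(records, "task-topic-1")
result = compute(schedule(cached, "task-topic-1"))
store_and_expose(result, store)
print(store["result"])  # 6
```

In the real system each stage is a distributed component rather than a function call, but the data flow between layers follows the same order.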
The electronic device may include, but is not limited to, a processor 41 and a memory 42. Those skilled in the art will appreciate that fig. 4 is merely an example of the electronic device 4 and does not constitute a limitation of the electronic device 4; it may include more or fewer components than shown, combine certain components, or use different components. For example, the electronic device may also include input-output devices, network access devices, buses, and the like.
The processor 41 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 42 may be an internal storage unit of the electronic device 4, such as a hard disk or a memory of the electronic device 4. The memory 42 may also be an external storage device of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 4. Further, the memory 42 may also include both an internal storage unit and an external storage device of the electronic device 4. The memory 42 is used for storing the computer program and other programs and data required by the electronic device. The memory 42 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the above-mentioned information interaction between the execution layers, the execution process, and the like are based on the same concept as that of the system embodiment of the present application, specific functions and technical effects thereof may be referred to specifically in the method embodiment section, and are not described herein again.
Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program may implement the functions in the above-described system embodiments. In this embodiment, the computer-readable storage medium may be nonvolatile or volatile.
Embodiments of the present application provide a computer program product which, when run on a mobile terminal, enables the mobile terminal to implement the functions in the above system embodiments. Wherein:
in one embodiment, the computer program product, when executed on a mobile terminal, causes the mobile terminal to perform the following functions:
configuring a data source of a stream computing task and acquiring stream data resources required by processing the stream computing task from the data source in real time;
creating a task topic corresponding to the stream computing task, and caching a stream data resource acquired from the data source in real time according to the task topic;
scheduling and managing real-time stream data resources for the stream computing task according to the task topic;
developing a corresponding stream computing mode for the stream computing task according to a data source configured by the stream computing task, executing the stream computing task according to the stream computing mode and outputting a stream computing result;
storing the stream computation result and providing an output interface for the stream computation result.
In one embodiment, the computer program product, when executed on a mobile terminal, causes the mobile terminal to perform the following functions:
selecting a data source configured as the stream computing task from pre-accessed configurable data sources according to a stream computing task execution request of a user, wherein the pre-accessed configurable data sources comprise at least one of the following: mysql data source, postgresql data source, oracle data source, SQLserver data source, log data source, message bus MQ data source, external kafka data source, and restful API data source.
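Data source selection amounts to a lookup in the registry of pre-accessed sources, rejecting anything not yet accessed. The registry contents and the shape of the execution request below are illustrative assumptions.

```python
# Sketch: selecting the configured data source from the pre-accessed
# registry according to a user's stream-computing-task execution request.
CONFIGURABLE_SOURCES = {
    "mysql", "postgresql", "oracle", "sqlserver",
    "log", "mq", "kafka", "restful_api",
}

def select_data_source(execution_request):
    """Return the data source named in the request if it is one of the
    pre-accessed configurable sources; otherwise reject the request."""
    name = execution_request["data_source"]
    if name not in CONFIGURABLE_SOURCES:
        raise ValueError(f"data source {name!r} is not pre-accessed")
    return name

print(select_data_source({"task": "t1", "data_source": "kafka"}))  # kafka
```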
In one embodiment, the computer program product, when executed on a mobile terminal, causes the mobile terminal to perform the following functions:
configuring a development mode for the stream computing task according to the data source configured for the stream computing task, wherein the configurable development modes comprise one or more of an sql development mode, a jar development mode, and a canvas development mode.
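Development-mode configuration can be sketched as validating a requested mode against the three supported options and attaching it to the task's data source. The patent only states that the mode is chosen "according to the data source configured", so the validation logic and default below are illustrative assumptions.

```python
# Sketch: configuring one of the three development modes for a task.
SUPPORTED_MODES = {"sql", "jar", "canvas"}

def configure_development_mode(data_source, requested_mode="sql"):
    if requested_mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported development mode: {requested_mode}")
    return {"data_source": data_source, "mode": requested_mode}

cfg = configure_development_mode("mysql", "canvas")
print(cfg["mode"])  # canvas
```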
In one embodiment, the computer program product, when executed on a mobile terminal, causes the mobile terminal to perform the following functions:
when scheduling and managing real-time stream data resources, using the Kubernetes resource management mode while remaining compatible with the YARN resource management mode, wherein when the resource management layer identifies that the user requesting execution of a stream computing task is a traditional Hadoop user, the resource management layer provides the traditional Hadoop user with the YARN resource management mode.
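The scheduling-mode choice reduces to a single branch on the user's classification. How a "traditional Hadoop user" is identified is not specified in the text, so the boolean flag below is an illustrative assumption.

```python
# Sketch: choosing the resource management mode per user.
def choose_resource_manager(user):
    """Traditional Hadoop users are given YARN; all other users are
    scheduled with Kubernetes."""
    return "yarn" if user.get("legacy_hadoop") else "kubernetes"

print(choose_resource_manager({"name": "alice", "legacy_hadoop": True}))  # yarn
print(choose_resource_manager({"name": "bob"}))  # kubernetes
```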
In one embodiment, the computer program product, when executed on a mobile terminal, causes the mobile terminal to perform the following functions:
providing at least one of the following services in the stream computing system: multi-tenant resource management, unified user permission management compatible with the Kubernetes ecology and the Hadoop ecology, health status monitoring management of stream computing tasks, operation index monitoring management of stream computing tasks, and data center monitoring management.
In one embodiment, the computer program product, when executed on a mobile terminal, causes the mobile terminal to perform the following functions:
detecting whether a user has permission to operate the real-time data bus layer, and whether a stream computing task scheduled in the Kubernetes resource management mode has the preset Hadoop operation permission.
In one embodiment, the computer program product, when executed on a mobile terminal, causes the mobile terminal to perform the following functions:
monitoring the liveness and backpressure of the stream computing task, wherein if the processing of the stream computing task has been interrupted or has failed, the stream computing task is determined not to be alive; and if the stream data receiving rate is greater than the stream data processing rate during the stream data processing of the stream computing task, the stream computing task is determined to be under backpressure.
In one embodiment, the computer program product, when executed on a mobile terminal, causes the mobile terminal to perform the following functions:
providing at least one of a user guidance mechanism based on documents and video data, a user behavior audit mechanism, a system automatic capacity expansion mechanism, a system anomaly early-warning mechanism, and a system anomaly recovery mechanism in the stream computing system.
It will be clear to those skilled in the art that, for convenience and simplicity of description, only the above division of execution layers is illustrated; in practical applications, the above functions may be distributed among different functional units and modules as needed, that is, the internal structure of the system may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from each other and are not used to limit the protection scope of the present application. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described again here.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the functions of the above embodiments may be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the functions of the above system embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in each jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A stream computing system, comprising:
the data acquisition layer is used for configuring a data source of a stream computing task and acquiring stream data resources required by processing the stream computing task from the data source in real time;
the data bus layer is used for creating a task topic corresponding to the stream computing task and caching stream data resources acquired from the data source in real time according to the task topic;
the resource management layer is used for scheduling and managing real-time stream data resources for the stream computing task according to the task topic;
the computing engine layer is used for developing a corresponding stream computing mode for the stream computing task according to the data source configured by the stream computing task, executing the stream computing task according to the stream computing mode and outputting a stream computing result;
and the storage and interface layer is used for storing the stream calculation result and providing an output interface for the stream calculation result.
2. The stream computing system according to claim 1, wherein a data source configuration unit is disposed in the data acquisition layer, and the data source configuration unit is configured to select a data source configured as the stream computing task from pre-accessed configurable data sources according to a stream computing task execution request of a user, where the pre-accessed configurable data source includes at least one of: mysql data source, postgresql data source, oracle data source, SQLserver data source, log data source, message bus MQ data source, external kafka data source, and restful API data source.
3. The stream computing system according to claim 2, wherein a development mode configuration unit is disposed in the computing engine layer, and the development mode configuration unit is configured to configure development modes for the stream computing task according to the data source configured by the stream computing task, wherein the development modes configurable by the development mode configuration unit include one or more of an sql development mode, a jar development mode, and a canvas development mode.
4. The stream computing system of claim 1, wherein the resource management layer uses the Kubernetes resource management mode and is compatible with the YARN resource management mode when scheduling and managing real-time stream data resources, and wherein when the resource management layer identifies that the user requesting to perform the stream computing task is a traditional Hadoop user, the resource management layer provides the traditional Hadoop user with the YARN resource management mode.
5. The stream computing system of claim 1, further comprising: a service management module for providing at least one of the following services in the stream computing system: multi-tenant resource management, unified user permission management compatible with the Kubernetes ecology and the Hadoop ecology, health status monitoring management of stream computing tasks, operation index monitoring management of stream computing tasks, and data center monitoring management.
6. The stream computing system of claim 5, wherein the service management module, when providing the unified user permission management service compatible with the Kubernetes ecology and the Hadoop ecology in the stream computing system, is further configured to detect whether a user has permission to operate the real-time data bus layer, and whether a stream computing task scheduled in the Kubernetes resource management mode has the preset Hadoop operation permission.
7. The stream computing system of claim 5, wherein the service management module, when providing the health status monitoring management service for stream computing tasks in the stream computing system, is further configured to monitor the liveness and backpressure of the stream computing task, wherein if it is monitored that the processing of the stream computing task has been interrupted or has failed, the stream computing task is determined not to be alive; and if it is monitored that the stream data receiving rate is greater than the stream data processing rate during the stream data processing of the stream computing task, the stream computing task is determined to be under backpressure.
8. The stream computing system of claim 1, further comprising: an operation system module for providing at least one of a user guidance mechanism based on documents and video data, a user behavior auditing mechanism, a system automatic capacity expansion mechanism, a system anomaly early-warning mechanism, and a system anomaly recovery mechanism in the stream computing system.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the functions of the system according to any one of claims 1 to 8 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the functions of a system according to any one of claims 1 to 8.
CN202011559972.6A 2020-12-25 2020-12-25 Stream computing system, electronic device thereof, and storage medium Active CN112667683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011559972.6A CN112667683B (en) 2020-12-25 2020-12-25 Stream computing system, electronic device thereof, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011559972.6A CN112667683B (en) 2020-12-25 2020-12-25 Stream computing system, electronic device thereof, and storage medium

Publications (2)

Publication Number Publication Date
CN112667683A true CN112667683A (en) 2021-04-16
CN112667683B CN112667683B (en) 2023-05-26

Family

ID=75408882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011559972.6A Active CN112667683B (en) 2020-12-25 2020-12-25 Stream computing system, electronic device thereof, and storage medium

Country Status (1)

Country Link
CN (1) CN112667683B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239081A (en) * 2021-05-21 2021-08-10 瀚云科技有限公司 Streaming data calculation method
CN113590443A (en) * 2021-07-29 2021-11-02 杭州玳数科技有限公司 Log acquisition and log monitoring method and device
CN115904722A (en) * 2022-12-14 2023-04-04 上海汇付支付有限公司 Big data real-time processing platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180074657A1 (en) * 2016-09-12 2018-03-15 International Business Machines Corporation Window management based on a set of computing resources in a stream computing environment
US20180165306A1 (en) * 2016-12-09 2018-06-14 International Business Machines Corporation Executing Queries Referencing Data Stored in a Unified Data Layer
CN110245158A (en) * 2019-06-10 2019-09-17 上海理想信息产业(集团)有限公司 A kind of multi-source heterogeneous generating date system and method based on Flink stream calculation technology
US10817334B1 (en) * 2017-03-14 2020-10-27 Twitter, Inc. Real-time analysis of data streaming objects for distributed stream processing


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sun Dawei: "Big Data Stream Computing: Key Technologies and *** Instances", Journal of Software *


Also Published As

Publication number Publication date
CN112667683B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
US11182098B2 (en) Optimization for real-time, parallel execution of models for extracting high-value information from data streams
CN112667683A (en) Stream computing system, electronic device and storage medium therefor
US9419917B2 (en) System and method of semantically modelling and monitoring applications and software architecture hosted by an IaaS provider
Calheiros et al. On the effectiveness of isolation‐based anomaly detection in cloud data centers
US12008027B2 (en) Optimization for real-time, parallel execution of models for extracting high-value information from data streams
US9852041B2 (en) Systems and methods for categorizing exceptions and logs
US10459832B2 (en) How to track operator behavior via metadata
EP3002924B1 (en) Stream-based object storage solution for real-time applications
CN110995497A (en) Method for unified operation and maintenance in cloud computing environment, terminal device and storage medium
CN113656245B (en) Data inspection method and device, storage medium and processor
CN110912757B (en) Service monitoring method and server
CN112328448A (en) Zookeeper-based monitoring method, monitoring device, equipment and storage medium
US10331484B2 (en) Distributed data platform resource allocator
CN112307046A (en) Data acquisition method and device, computer readable storage medium and electronic equipment
Zhang et al. Efficient online surveillance video processing based on spark framework
CN116708219A (en) DPI platform-based data acquisition method and device
US10346626B1 (en) Versioned access controls
EP2674876A1 (en) Streaming analytics processing node and network topology aware streaming analytics system
US10771586B1 (en) Custom access controls
EP3380906A1 (en) Optimization for real-time, parallel execution of models for extracting high-value information from data streams
KR20170131007A (en) Apparatus for monitoring communication based on data distribution service
Komarek et al. Metric based cloud infrastructure monitoring
Di Martino et al. A comparison of two different approaches to cloud monitoring
Hölttä Enhancing the Capacity of Data Center Manager
Ward Efficient monitoring of large scale infrastructure as a service clouds

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant