CN106657099A - Spark data analysis service release system

Info

Publication number: CN106657099A
Application number: CN201611248761.4A
Authority: CN
Inventors: 王莹; 张立军; 孙丙聪
Original assignee: Beijing Tianyuan Creative Technology Ltd
Current assignee: Beijing Tianyuan Creative Technology Ltd
Priority date: 2016-12-29
Filing date: 2016-12-29
Publication date: 2017-05-10
Anticipated expiration: 2036-12-29
Also published as: CN106657099B

Abstract

The invention provides a data analysis service release system comprising a Spark data analysis module, a service scheduling module and a service standard setting module, wherein the service standard setting module is used for setting a unified service release standard; the service scheduling module is used for receiving a service request and sending the service request to an idle service; and the Spark data analysis module is used for constructing a service container and analyzing the service request according to the service release standard. Because a unified service standard is set, a third-party client or a business system performs big data analysis by calling the data analysis service, so that the business system is effectively isolated from the big data analysis and the development cost of the business system is reduced. The services run on a Spark distributed computing system, which greatly improves the speed and efficiency of data analysis.

Description

A Spark data analysis service release system

Technical field

The present invention relates to the technical field of data analysis and mining, and more particularly to a Spark data analysis service release system.

Background technology

With the arrival of the information age, the volume of data is growing geometrically. To extract useful information from existing massive data, a wide variety of data analysis algorithms have emerged. In actual data analysis work, the most suitable algorithm cannot be determined immediately; different algorithms, or combinations of algorithms, must be tried repeatedly to obtain different calculation results. These results are then compared to find the best arrangement of algorithms and the best analysis result, and thus the most useful feedback from the data.

Data analysts must understand both the principles of an algorithm and its concrete code implementation. This places high demands on technical staff, and analyzing data with different algorithm combinations requires constantly adjusting the code, which is tedious. The Internet has entered the era of information data. As data grows rapidly, companies and research institutions pay increasing attention to mining useful information from existing data, and a variety of data mining architectures have appeared.

Traditional business systems seldom involve data mining. To keep pace with the development of big data, traditional software companies must spend considerable time and money building analysis and mining platforms.

Summary of the invention

The present invention provides a data analysis service release system that overcomes the above problems or at least partially solves them. It unifies the form of services, makes rational use of cluster resources, and, through a Spark distributed architecture design, builds a big data analysis service that is inexpensive to use.

According to one aspect of the present invention, a data analysis service release system is provided, comprising a Spark data analysis module, a service scheduling module and a service standard setting module. The service standard setting module is used for setting a unified service release standard; the service scheduling module is used for receiving a service request and sending the service request to an idle service; and the Spark data analysis module is used for constructing a service container and analyzing the service request according to the service release standard.

Preferably, using a B/S (browser/server) architecture, a user views service information and adjusts the service state through a browser, and sets the service execution form and the service scale.

Preferably, the service standard setting module formulates a unified service standard for different algorithms, specifically including service parameters, the combination form of service results, and the service call mode.

Preferably, the service scheduling module is further used for exposing the data analysis functions as HTTP interfaces of an open API.

Preferably, the Spark data analysis module comprises a Spark data analysis unit and a distributed cluster;

The Spark data analysis unit is used for analyzing and computing the dispatched service request by means of the Spark distributed computing system;

The distributed cluster is used for providing the Spark data analysis unit with a running environment for distributed computing.

Preferably, the distributed cluster comprises a Spark cluster and a Hadoop cluster.

Preferably, the Spark data analysis unit comprises a business subunit and a flow release subunit;

The business subunit is used for arbitrarily combining, according to the service release standard, the algorithms that implement the service request and drawing them as a flow chart;

The flow release subunit is used for combining the nodes of the flow chart to generate a task, publishing the task as a service, and analyzing the service request.

Preferably, the service scheduling module is used for sending the service request to an idle service according to a load balancing random algorithm, based on the cluster information data provided by the distributed cluster.

Preferably, the service scheduling module communicates with the services through sockets, and the communication content includes service request data, service result data, service status data, and service calculation process data.
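The socket communication described above can be sketched as follows. This is a minimal illustration only: the text names the four kinds of content exchanged but does not specify a wire format, so the length-prefixed JSON framing, the kind labels in `MESSAGE_KINDS`, and the function names `send_message` and `recv_message` are all assumptions made for this example.

```python
import json
import socket
import struct

# Hypothetical labels for the four kinds of content named in the text:
# service request, service result, service status, calculation process.
MESSAGE_KINDS = {"request", "result", "status", "progress"}


def send_message(sock, kind, payload):
    """Send one message as a 4-byte big-endian length followed by JSON."""
    if kind not in MESSAGE_KINDS:
        raise ValueError(f"unknown message kind: {kind}")
    body = json.dumps({"kind": kind, "payload": payload}).encode("utf-8")
    sock.sendall(struct.pack("!I", len(body)) + body)


def _recv_exact(sock, size):
    """Read exactly `size` bytes, since recv() may return fewer."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before the message was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Receive one framed message and return (kind, payload)."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    message = json.loads(_recv_exact(sock, length).decode("utf-8"))
    return message["kind"], message["payload"]
```

In this sketch the scheduling side would call `send_message(sock, "request", {...})` and the service side would answer with `"progress"`, `"status"` and `"result"` messages on the same connection. The length prefix is one common way to delimit messages on a stream socket; other framings would serve equally well.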

In the data analysis service release system provided by the present invention, a unified service standard is formulated, and a third-party client or a business system performs big data analysis by calling the data analysis service. The business system is thereby effectively isolated from the big data analysis, and the development cost of the business system is reduced. The services run on a Spark distributed computing system, which greatly improves the speed and efficiency of data analysis.

Description of the drawings

Fig. 1 is a structural block diagram of the data analysis service release system according to an embodiment of the present invention.

Detailed description of the embodiments

The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples illustrate the present invention but do not limit its scope.

Fig. 1 shows a data analysis service release system comprising a Spark data analysis module, a service scheduling module and a service standard setting module. The service standard setting module is used for setting a unified service release standard, specifically including a service production standard, a parameter passing standard and a result return standard. This standard ensures the uniformity of the services and makes them convenient to use. The service scheduling module is used for receiving a service request and sending it to an idle service, and is responsible for distributing data analysis tasks, balancing cluster resources, executing periodic tasks, and starting and stopping services. The Spark data analysis module is used for constructing a service container and analyzing the service request according to the service release standard. The services run on a Spark distributed computing system, which is one of the mainstream cloud computing frameworks. Adopting cloud computing greatly improves the speed and efficiency of data analysis. Running on Spark also allows algorithms to be combined in different orders to analyze and process data, so that the analysis flow can be varied.
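The idea of a unified release standard covering parameter passing and result return can be sketched as a thin wrapper that gives every algorithm the same calling convention. The class names `PublishedService` and `ServiceResult`, their fields, and the `"ok"`/`"error"` status values are hypothetical and chosen only for illustration; the text does not define a concrete schema.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ServiceResult:
    """Uniform result envelope returned by every published service."""
    service: str
    status: str          # "ok" or "error" (assumed values)
    data: Any = None
    message: str = ""


@dataclass
class PublishedService:
    """Wraps an arbitrary algorithm behind one calling convention."""
    name: str
    algorithm: Callable[..., Any]
    # Declared parameter names and their expected Python types.
    parameters: Dict[str, type] = field(default_factory=dict)

    def call(self, **params) -> ServiceResult:
        # Parameter passing standard: check presence and type up front.
        for key, expected in self.parameters.items():
            if key not in params:
                return ServiceResult(self.name, "error",
                                     message=f"missing parameter: {key}")
            if not isinstance(params[key], expected):
                return ServiceResult(self.name, "error",
                                     message=f"bad type for parameter: {key}")
        # Result return standard: success and failure share one shape.
        try:
            return ServiceResult(self.name, "ok",
                                 data=self.algorithm(**params))
        except Exception as exc:
            return ServiceResult(self.name, "error", message=str(exc))


def mean(values):
    return sum(values) / len(values)


mean_service = PublishedService("mean", mean, {"values": list})
```

A caller then needs to know only the service name and its declared parameters: `mean_service.call(values=[1, 2, 3])` returns a `ServiceResult` whose `data` is `2.0`, while a missing or invalid parameter yields the same envelope with `status` set to `"error"`.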

In this embodiment, using a B/S architecture, the user views service information through a browser, such as service parameters, the combination form of service return values, the service state, the flow chart, and the service call log. The user can also adjust the service state and set the service execution form, such as timed or periodic execution, and the service scale, such as the number of concurrent calls.

Preferably, the service standard setting module formulates a unified service standard for different algorithms, specifically including service parameters, the combination form of service results, and the service call mode. This standard ensures the uniformity of the services, lowers the difficulty of use, and improves the availability of the services and the code reusability of the business system.

Preferably, the Spark data analysis unit further comprises a business subunit and a flow release subunit;

The business subunit is used for arbitrarily combining, according to the service standard, the algorithms that implement the service request and drawing them as a flow chart. The flow chart contains algorithm instance nodes and the relations between them; the relations between algorithm instance nodes are determined by the connecting lines between the algorithms.

The flow release subunit is used for combining the nodes of the flow chart to generate a task and publishing the task as a service.
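Turning a flow chart of algorithm nodes into a runnable task can be sketched with a topological sort over the connecting lines. The function name `build_task` and the data layout are hypothetical, and the sketch assumes the simplest semantics, in which each node transforms the output of the previous step; the text does not say how branching or merging nodes exchange data. It uses the standard library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def build_task(nodes, edges):
    """Combine flow-chart nodes into one callable task.

    nodes: dict mapping a node name to a one-argument callable.
    edges: list of (upstream, downstream) pairs, i.e. the connecting lines.
    Returns (execution order, task callable).
    """
    # TopologicalSorter expects each node mapped to its predecessors.
    predecessors = {name: set() for name in nodes}
    for upstream, downstream in edges:
        predecessors[downstream].add(upstream)
    # Raises graphlib.CycleError if the flow chart contains a cycle.
    order = list(TopologicalSorter(predecessors).static_order())

    def task(data):
        # Assumed semantics: feed each node the previous node's output.
        for name in order:
            data = nodes[name](data)
        return data

    return order, task


# Hypothetical three-step flow: drop missing values, scale, then sum.
nodes = {
    "clean": lambda xs: [x for x in xs if x is not None],
    "scale": lambda xs: [x * 2 for x in xs],
    "total": sum,
}
edges = [("clean", "scale"), ("scale", "total")]
order, task = build_task(nodes, edges)
```

Rearranging the edges changes the execution order without touching the algorithm code, which is the kind of recombination the flow chart is meant to allow. In the system described here each node would presumably submit work to Spark rather than run a local function.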

When a service request arrives, the service scheduling module sends the service request to an idle service according to the load balancing random algorithm, based on the cluster resource data provided by the distributed cluster. The service scheduling module records the current state of each service and uses a random algorithm to call an idle back-end service at random. When the execution environments are identical, the number of times each service is called is, in probabilistic terms, substantially the same as the number of requests grows.
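The random dispatch described above, and the claim that call counts even out as requests accumulate, can be sketched as follows. The class `ServiceScheduler`, the state labels `"idle"` and `"busy"`, and the `simulate` helper are hypothetical names for this example; the simulation also assumes that every request finishes before the next one arrives, so all services are idle at each dispatch.

```python
import random
from collections import Counter


class ServiceScheduler:
    """Records the state of each service and dispatches to idle ones."""

    def __init__(self, service_ids, rng=None):
        self.state = {sid: "idle" for sid in service_ids}
        self.rng = rng if rng is not None else random.Random()

    def dispatch(self):
        """Pick an idle service uniformly at random, or None if all are busy."""
        idle = [sid for sid, state in self.state.items() if state == "idle"]
        if not idle:
            return None
        chosen = self.rng.choice(idle)
        self.state[chosen] = "busy"
        return chosen

    def release(self, service_id):
        """Mark a service idle again once it has finished its request."""
        self.state[service_id] = "idle"


def simulate(num_requests, service_ids, seed=0):
    """Count how often each service is chosen over many requests."""
    scheduler = ServiceScheduler(service_ids, random.Random(seed))
    calls = Counter()
    for _ in range(num_requests):
        chosen = scheduler.dispatch()
        calls[chosen] += 1
        scheduler.release(chosen)  # assumed: request completes immediately
    return calls
```

With k identical services and n requests, each service is expected to receive about n/k calls, and the relative deviation shrinks as n grows. Running `simulate(30000, ["svc-a", "svc-b", "svc-c"])` should therefore give three counts close to 10000 each.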

The invention provides a Spark data analysis service release system. By formulating a unified service release standard, it broadens the applicability of the services and reduces both the occurrence of errors and the complexity of using the services. A data analysis platform is built on the Spark data analysis framework to carry out the analysis calculations and analysis flows, and adopting cloud computing greatly improves the speed and efficiency of data analysis. The business system is effectively isolated from the big data analysis, which reduces the development cost of the business system, and the data analysis functions are exposed as HTTP interfaces of an open API for convenient third-party calls.

Finally, the above is merely a preferred embodiment of the present application and is not intended to limit the protection scope of the present invention. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A data analysis service release system, characterized by comprising a Spark data analysis module, a service scheduling module and a service standard setting module; the service standard setting module is used for setting a unified service release standard; the service scheduling module is used for receiving a service request and sending the service request to an idle service; the Spark data analysis module is used for constructing a service container and analyzing the service request according to the service release standard.

2. The data analysis service release system according to claim 1, characterized by further comprising a B/S architecture, wherein a user uses the B/S architecture to view service information and adjust the service state through a browser, and to set the service execution form and the service scale.

3. The data analysis service release system according to claim 1, characterized in that the service standard setting module formulates a unified service standard for different algorithms, specifically including service parameters, the combination form of service results, and the service call mode.

4. The data analysis service release system according to claim 1, characterized in that the service scheduling module is further used for exposing the data analysis functions as HTTP interfaces of an open API.

5. The data analysis service release system according to claim 2, characterized in that the Spark data analysis module comprises a Spark data analysis unit and a distributed cluster;

The Spark data analysis unit is used for analyzing and computing the dispatched service request by means of the Spark distributed computing system;

6. The data analysis service release system according to claim 5, characterized in that the distributed cluster comprises a Spark cluster and a Hadoop cluster.

7. The data analysis service release system according to claim 5, characterized in that the Spark data analysis unit further comprises a business subunit and a flow release subunit;

The business subunit is used for arbitrarily combining, according to the service release standard, the algorithms that implement the service request and drawing them as a flow chart;

The flow release subunit is used for combining the nodes of the flow chart to generate a task, publishing the task as a service, and analyzing the service request.

8. The data analysis service release system according to claim 5, characterized in that the service scheduling module is used for sending the service request to an idle service according to a load balancing random algorithm, based on the cluster information data provided by the distributed cluster.

9. The data analysis service release system according to claim 1, characterized in that the service scheduling module communicates with the services through sockets, and the communication content includes service request data, service result data, service status data, and service calculation process data.