CN106657099A - Spark data analysis service release system - Google Patents

Info

Publication number
CN106657099A
Authority
CN
China
Prior art keywords
service
data analysis
spark
data
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611248761.4A
Other languages
Chinese (zh)
Other versions
CN106657099B (en)
Inventor
王莹
张立军
孙丙聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tianyuan Creative Technology Ltd
Original Assignee
Beijing Tianyuan Creative Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tianyuan Creative Technology Ltd
Priority to CN201611248761.4A
Publication of CN106657099A
Application granted
Publication of CN106657099B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a data analysis service distribution system comprising a Spark data analysis module, a service scheduling module, and a service standard setting module. The service standard setting module is used for setting a unified service release standard; the service scheduling module is used for receiving a service request and sending the service request to an idle service; and the Spark data analysis module is used for constructing a service container and analyzing the service request according to the service release standard. Because a unified service standard is set, a third-party client or a business system performs big data analysis by calling the data analysis service, so that the business system and the big data analysis are effectively isolated and the development cost of the business system is reduced. The running environment of the service is a Spark distributed computing system, so that the speed and efficiency of data analysis are greatly improved.

Description

A Spark data analysis service release system
Technical Field
The present invention relates to the technical field of data analysis and mining, and more particularly to a Spark data analysis service release system.
Background
With the arrival of the information age, the volume of data has grown geometrically. To mine effective information from existing massive data, a wide variety of data analysis algorithms have emerged. In actual data analysis work, the most suitable algorithm cannot be determined immediately; different algorithms, or combinations of algorithms, must be tried repeatedly to obtain different calculation results. By comparing these results, the optimal arrangement of algorithms and the best analysis result are obtained, which yields the most effective data feedback.
Data analysts need to understand both the principles of an algorithm and its concrete code implementation. This places high demands on technical staff; moreover, analyzing data with different algorithm combinations requires constantly adjusting the code, which is cumbersome. The Internet has now entered the era of information data. With the rapid growth of data, companies and research institutions pay increasing attention to mining effective information from existing data, and a wide variety of data mining architectures have appeared.
Traditional business systems seldom involve data mining. To adapt to the development of big data, traditional software companies need to spend a great deal of time and money to create an analysis and mining platform.
Summary of the Invention
The present invention provides a data analysis service distribution system that overcomes the above problems or at least partially solves them. The service form is unified, cluster resources are used rationally, and a Spark distributed architecture is used to build a big data analysis service that is inexpensive to use.
According to one aspect of the present invention, there is provided a data analysis service distribution system comprising a Spark data analysis module, a service scheduling module, and a service standard setting module. The service standard setting module is used to set a unified service release standard; the service scheduling module is used to receive a service request and send it to an idle service; and the Spark data analysis module is used to build a service container and to analyze and process the service request according to the service release standard.
Preferably, a B/S (browser/server) architecture is used: through a browser, the user views service information, adjusts the service state, and sets the service execution form and service scale.
Preferably, the service standard setting module defines a unified service standard for different algorithms, specifically including service parameters, the combination form of service results, and the service calling mode.
Preferably, the service scheduling module is further used to expose the data analysis functions as an open-API HTTP interface.
Preferably, the Spark data analysis module comprises a Spark data analysis unit and a distributed cluster;
The Spark data analysis unit is used to analyze and compute the distributed service requests by means of the Spark distributed computing system;
The distributed cluster is used to provide the running environment for distributed computing for the Spark data analysis unit.
Preferably, the distributed cluster comprises a Spark cluster and a Hadoop cluster.
Preferably, the Spark data analysis unit comprises a business subunit and a flow release subunit;
The business subunit is used, according to the service release standard, to arbitrarily combine the algorithms that implement the service request and draw them as a flowchart;
The flow release subunit is used to combine the nodes of the flowchart, generate a task, make the task into a service, and analyze and process the service request.
Preferably, the service scheduling module is used to send the service request to an idle service according to a load-balancing random algorithm, using the cluster information data provided by the distributed cluster.
Preferably, the service scheduling module communicates with the services through sockets; the communicated content includes service request data, service result data, service state data, and service calculation process data.
In the data analysis service distribution system provided by the present invention, a unified service standard is set, and a third-party client or business system performs big data analysis by calling the data analysis service. This effectively isolates the business system from big data analysis and reduces the development cost of the business system. The running environment of the services is the Spark distributed computing system, which greatly improves the speed and efficiency of data analysis.
Brief Description of the Drawings
Fig. 1 is a structural block diagram of the data analysis service distribution system according to an embodiment of the present invention.
Detailed Description
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawing and embodiments. The following embodiments are intended to illustrate the present invention, not to limit its scope.
Fig. 1 shows a data analysis service distribution system comprising a Spark data analysis module, a service scheduling module, and a service standard setting module. The service standard setting module is used to set a unified service release standard, specifically including a service production standard, a parameter passing standard, and a result return standard; this standard ensures the consistency of services and makes them convenient for users. The service scheduling module is used to receive service requests and send them to idle services, to distribute data analysis tasks, to balance cluster resources, to execute tasks periodically, and to start and stop services. The Spark data analysis module is used to build a service container and to analyze and process service requests according to the service release standard. The running environment of the services is the Spark distributed computing system, one of the mainstream cloud computing frameworks; computing in this cloud fashion greatly improves the speed and efficiency of data analysis. Running on the Spark distributed computing system also allows algorithms to be combined in different orders to analyze and process data, so that the analysis flow can be varied.
In this embodiment, a B/S (browser/server) architecture is used. Through a browser, the user views service information, such as service parameters, the combination form of service return values, the service state, the flowchart, and the service call log; adjusts the service state; and sets the service execution form, such as timed or periodic execution, and the service scale, such as the number of concurrent requests.
Preferably, the service standard setting module defines a unified service standard for different algorithms, specifically including service parameters, the combination form of service results, and the service calling mode. This standard ensures the consistency of services, lowers the difficulty of use for users, and improves the availability of services and the code reusability of business systems.
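One way to picture such a unified service standard is as a single descriptor that every algorithm service fills in. The following is a minimal sketch under stated assumptions, not the patented implementation: the class name `ServiceStandard`, its field names, the `"sync"` calling mode, and the k-means example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ServiceStandard:
    """Hypothetical descriptor covering the three parts of the standard."""
    name: str
    # Service parameters: parameter name -> expected Python type.
    parameters: Dict[str, type]
    # Combination form of service results: ordered names of returned fields.
    result_fields: List[str]
    # Service calling mode (illustrative values such as "sync" or "async").
    call_mode: str = "sync"

    def validate_request(self, params: Dict[str, Any]) -> None:
        """Raise ValueError if the request does not follow the standard."""
        missing = set(self.parameters) - set(params)
        extra = set(params) - set(self.parameters)
        if missing or extra:
            raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
        for key, expected in self.parameters.items():
            if not isinstance(params[key], expected):
                raise ValueError(f"{key} must be {expected.__name__}")

    def build_result(self, values: Dict[str, Any]) -> Dict[str, Any]:
        """Return only the declared result fields, in the declared order."""
        return {name: values[name] for name in self.result_fields}


# Hypothetical example: a clustering algorithm published under the standard.
kmeans_standard = ServiceStandard(
    name="kmeans",
    parameters={"k": int, "input_path": str},
    result_fields=["centers", "cost"],
)
```

Because every algorithm is described in the same shape, a calling business system can validate requests and read results without knowing which algorithm runs behind the service.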
Preferably, the Spark data analysis module comprises a Spark data analysis unit and a distributed cluster;
The Spark data analysis unit is used to analyze and compute the distributed service requests by means of the Spark distributed computing system;
The distributed cluster is used to provide the running environment for distributed computing for the Spark data analysis unit.
Preferably, the distributed cluster comprises a Spark cluster and a Hadoop cluster.
Preferably, the Spark data analysis unit further comprises a business subunit and a flow release subunit;
The business subunit is used, according to the service standard, to arbitrarily combine the algorithms that implement the service request and draw them as a flowchart. The flowchart contains algorithm instance nodes and the relationships between them; the relationships between algorithm instance nodes are determined by the connecting lines between the algorithms.
The flow release subunit is used to combine the nodes of the flowchart, generate a task, and make the task into a service.
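A flowchart of algorithm instance nodes joined by connecting lines can be modelled as a directed acyclic graph, and combining the nodes into a task can be sketched as a topological ordering followed by chaining. The code below is only an illustration under that assumption: `build_task`, the node names, and the toy algorithms are hypothetical, and for simplicity each node receives the output of the node ordered before it, whereas a flowchart with branches would pass data along individual lines.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Algorithm = Callable[[list], list]


def build_task(nodes: Dict[str, Algorithm],
               edges: List[Tuple[str, str]]) -> Callable[[list], list]:
    """Combine flowchart nodes into one callable task.

    `edges` are the connecting lines, written as (upstream, downstream).
    """
    indegree = {name: 0 for name in nodes}
    downstream = {name: [] for name in nodes}
    for src, dst in edges:
        downstream[src].append(dst)
        indegree[dst] += 1

    # Kahn's algorithm: repeatedly take nodes with no unprocessed upstream.
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for nxt in downstream[name]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("flowchart contains a cycle")

    def task(data: list) -> list:
        # Run each algorithm in dependency order, feeding results forward.
        for name in order:
            data = nodes[name](data)
        return data

    task.order = order  # exposed so the execution order can be inspected
    return task


# Hypothetical three-step flow: clean, scale, then keep the two largest values.
flow_nodes = {
    "drop_missing": lambda rows: [r for r in rows if r is not None],
    "scale": lambda rows: [r / 10 for r in rows],
    "top2": lambda rows: sorted(rows, reverse=True)[:2],
}
flow_edges = [("drop_missing", "scale"), ("scale", "top2")]
task = build_task(flow_nodes, flow_edges)
result = task([30, None, 10, 20])
```

Reordering the lines in `flow_edges` changes the execution order without touching the algorithms themselves, which is the kind of varied analysis flow the description refers to.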
When a service request arrives, the service scheduling module uses the cluster resource data provided by the distributed cluster to send the request to an idle service according to a load-balancing random algorithm. The service scheduling module records the current state of each service and uses a random algorithm to call an idle back-end service at random. Because the execution environments are identical, probability implies that, as the number of requests increases, each service is called roughly the same number of times.
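The random selection among idle services, and the claim that call counts even out as requests accumulate, can be checked with a small simulation. This is a sketch only: the class `RandomScheduler`, the state labels `"idle"` and `"busy"`, and the service identifiers are hypothetical, and the cluster resource data mentioned above is not modelled.

```python
import random
from collections import Counter
from typing import Dict, Optional


class RandomScheduler:
    """Tracks service states and picks an idle service uniformly at random."""

    def __init__(self, service_ids, seed: Optional[int] = None):
        self.state: Dict[str, str] = {sid: "idle" for sid in service_ids}
        self._rng = random.Random(seed)

    def dispatch(self) -> str:
        """Choose one idle service at random and mark it busy."""
        idle = sorted(sid for sid, s in self.state.items() if s == "idle")
        if not idle:
            raise RuntimeError("no idle service available")
        chosen = self._rng.choice(idle)
        self.state[chosen] = "busy"
        return chosen

    def release(self, service_id: str) -> None:
        """Mark a service idle again once its request has finished."""
        self.state[service_id] = "idle"


scheduler = RandomScheduler(["svc-1", "svc-2", "svc-3", "svc-4"], seed=0)
calls = Counter()
for _ in range(20000):
    sid = scheduler.dispatch()
    calls[sid] += 1
    scheduler.release(sid)  # request finished, service is idle again
```

With four identical services and 20000 requests, each service is expected to receive about 5000 calls, with only small random deviation.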
Preferably, the service scheduling module communicates with the services through sockets; the communicated content includes service request data, service result data, service state data, and service calculation process data.
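The patent does not specify a wire format for this socket communication. As one possible sketch, the four kinds of content can be sent as length-prefixed JSON messages; the message type names, the framing, and the helper functions below are assumptions made for illustration.

```python
import json
import socket
import struct

# Hypothetical labels for the four kinds of content named in the text.
MESSAGE_TYPES = {"request", "result", "status", "progress"}


def send_message(sock: socket.socket, msg_type: str, payload: dict) -> None:
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg_type}")
    body = json.dumps({"type": msg_type, "payload": payload}).encode("utf-8")
    # 4-byte big-endian length prefix tells the receiver where a message ends.
    sock.sendall(struct.pack(">I", len(body)) + body)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> dict:
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


# In-process demonstration: one end plays the scheduler, the other a service.
scheduler_end, service_end = socket.socketpair()
send_message(scheduler_end, "request", {"service": "kmeans", "k": 3})
received = recv_message(service_end)
send_message(service_end, "progress", {"percent": 50})
send_message(service_end, "result", {"cost": 1.5})
progress = recv_message(scheduler_end)
result = recv_message(scheduler_end)
scheduler_end.close()
service_end.close()
```

The length prefix is needed because a stream socket has no message boundaries of its own; without it the scheduler could not tell a progress update from the start of the result that follows.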
The present invention provides a Spark data analysis service release system. Setting a unified service release standard broadens the applicability of the services and reduces both the occurrence of errors and the complexity of using the services. A data analysis platform built on the Spark data analysis framework carries out the analysis computation and the analysis flow; computing in this cloud fashion greatly improves the speed and efficiency of data analysis. The business system is effectively isolated from big data analysis, which reduces the development cost of the business system, and the data analysis functions are exposed as an open-API HTTP interface that is convenient for third parties to call.
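How a third party might call an analysis function through an open HTTP interface can be sketched with the Python standard library alone. The URL layout `/services/<name>`, the JSON request and response bodies, and the toy `mean` service are assumptions for illustration; the patent does not define the interface, and a real deployment would forward the request to a Spark-backed service rather than compute locally.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical registry: service name -> analysis function.
SERVICES = {
    "mean": lambda params: {"mean": sum(params["values"]) / len(params["values"])},
}


class AnalysisHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expected path: /services/<name>
        parts = self.path.strip("/").split("/")
        if len(parts) != 2 or parts[0] != "services" or parts[1] not in SERVICES:
            self._reply(404, {"error": "unknown service"})
            return
        length = int(self.headers.get("Content-Length", 0))
        params = json.loads(self.rfile.read(length) or b"{}")
        self._reply(200, SERVICES[parts[1]](params))

    def _reply(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):
        pass  # keep the demonstration quiet


def call_service(host: str, port: int, name: str, params: dict) -> dict:
    """Client side: what a third-party business system would do."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("POST", f"/services/{name}",
                     body=json.dumps(params).encode("utf-8"),
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), AnalysisHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
response = call_service("127.0.0.1", port, "mean", {"values": [1, 2, 3, 6]})
missing = call_service("127.0.0.1", port, "median", {"values": [1, 2, 3]})
server.shutdown()
server.server_close()
```

The caller only needs the service name and its parameters, so the business system stays isolated from how the analysis is computed.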
Finally, the above is only a preferred embodiment of the present application and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (9)

1. A data analysis service distribution system, characterized by comprising a Spark data analysis module, a service scheduling module, and a service standard setting module; the service standard setting module is used to set a unified service release standard; the service scheduling module is used to receive a service request and send the service request to an idle service; the Spark data analysis module is used to build a service container and to analyze and process the service request according to the service release standard.
2. The data analysis service distribution system according to claim 1, characterized by further comprising a B/S architecture, through which the user, using a browser, views service information, adjusts the service state, and sets the service execution form and service scale.
3. The data analysis service distribution system according to claim 1, characterized in that the service standard setting module defines a unified service standard for different algorithms, specifically including service parameters, the combination form of service results, and the service calling mode.
4. The data analysis service distribution system according to claim 1, characterized in that the service scheduling module is further used to expose the data analysis functions as an open-API HTTP interface.
5. The data analysis service distribution system according to claim 2, characterized in that the Spark data analysis module comprises a Spark data analysis unit and a distributed cluster;
The Spark data analysis unit is used to analyze and compute the distributed service requests by means of the Spark distributed computing system;
The distributed cluster is used to provide the running environment for distributed computing for the Spark data analysis unit.
6. The data analysis service distribution system according to claim 5, characterized in that the distributed cluster comprises a Spark cluster and a Hadoop cluster.
7. The data analysis service distribution system according to claim 5, characterized in that the Spark data analysis unit further comprises a business subunit and a flow release subunit;
The business subunit is used, according to the service release standard, to arbitrarily combine the algorithms that implement the service request and draw them as a flowchart;
The flow release subunit is used to combine the nodes of the flowchart, generate a task, make the task into a service, and analyze and process the service request.
8. The data analysis service distribution system according to claim 5, characterized in that the service scheduling module is used to send the service request to an idle service according to a load-balancing random algorithm, using the cluster information data provided by the distributed cluster.
9. The data analysis service distribution system according to claim 1, characterized in that the service scheduling module communicates with the services through sockets, and the communicated content includes service data, service result data, service state data, and service calculation process data.
CN201611248761.4A 2016-12-29 2016-12-29 Spark data analysis service publishing system Active CN106657099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611248761.4A CN106657099B (en) 2016-12-29 2016-12-29 Spark data analysis service publishing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611248761.4A CN106657099B (en) 2016-12-29 2016-12-29 Spark data analysis service publishing system

Publications (2)

Publication Number Publication Date
CN106657099A (en) 2017-05-10
CN106657099B (en) 2020-06-16

Family

ID=58836389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611248761.4A Active CN106657099B (en) 2016-12-29 2016-12-29 Spark data analysis service publishing system

Country Status (1)

Country Link
CN (1) CN106657099B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120173476A1 (en) * 2011-01-04 2012-07-05 Nasir Rizvi System and Method for Rule-Based Asymmetric Data Reporting
CN105608160A (en) * 2015-12-21 2016-05-25 浪潮软件股份有限公司 Distributed big data analysis method
CN105930460A (en) * 2016-04-21 2016-09-07 重庆邮电大学 Multi-algorithm-integrated big data analysis middleware platform

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427992A (en) * 2018-03-16 2018-08-21 济南飞象信息科技有限公司 A kind of machine learning training system and method based on edge cloud computing
CN109729086A (en) * 2018-12-28 2019-05-07 北京奇安信科技有限公司 Policy management method, system, equipment and medium
CN109729086B (en) * 2018-12-28 2021-02-23 奇安信科技集团股份有限公司 Policy management method, system, device, and medium
CN110288104A (en) * 2019-07-04 2019-09-27 北京百佑科技有限公司 O&M flow system, O&M workflow management method and device
CN111031123A (en) * 2019-12-10 2020-04-17 中盈优创资讯科技有限公司 Spark task submission method, system, client and server
CN112115202A (en) * 2020-09-18 2020-12-22 北京人大金仓信息技术股份有限公司 Task distribution method and device in cluster environment

Also Published As

Publication number Publication date
CN106657099B (en) 2020-06-16

Similar Documents

Publication Publication Date Title
CN106657099A (en) Spark data analysis service release system
CN106850788B (en) Integrated framework and integrated approach towards multi-source heterogeneous geographic information resources
CN105809356A (en) Information system resource management method based on application integrated cloud platform
CN109831478A (en) Rule-based and model distributed processing intelligent decision system and method in real time
CN108932588B (en) Hydropower station group optimal scheduling system with separated front end and rear end and method
CN105049218B (en) PhiCloud clouds charging method and system
CN106777227A (en) Multidimensional data convergence analysis system and method based on cloud platform
CN103023980B (en) A kind of method and system of cloud platform processes user service request
CN102300011A (en) Automated mechanism for populating and maintaining data structures in queueless contact center
CN106375480A (en) Electric energy data real-time acquisition system and method based on distributed system
CN103198099A (en) Cloud-based data mining application method facing telecommunication service
CN103544060A (en) WEBSERVICE based service dispatching system and method
CN109361737A (en) Agricultural supervisory system based on Internet of Things
CN103744880B (en) A kind of DNA data managing methods and system based on cloud computing
CN106408490A (en) Active work order processing method and active work order processing apparatus
CN110505301A (en) A kind of aeronautical manufacture workshop industry big data processing frame
CN103152428A (en) Method for carrying out service communication among nodes on cloud platform
CN106131186A (en) A kind of power information acquisition interface adjustment method based on Redis distributed caching
CN115858672A (en) Power terminal management method and device, electronic equipment and storage medium
CN109857965A (en) Products of Meteorological Services publisher server control system and method based on SOA
Xie et al. Research on Information Sharing System of Digital Library in Cloud Computing Environment
CN109359146A (en) A kind of automating ETL data processing tools and its application method
CN109150938A (en) Satellite application public service platform based on cloud service
CN114596046A (en) Integrated platform based on unified digital model of business center station and data center station
Gargees et al. Multi-stage distributed computing for big data: Evaluating connective topologies

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant