CN108600300A - Log data processing method and processing device


Info

Publication number: CN108600300A
Authority: CN (China)
Legal status: Granted, active
Application number: CN201810184438.8A
Other languages: Chinese (zh)
Other versions: CN108600300B (en)
Inventor: 李士超
Current assignee: Beijing Si Kong Science And Technology Co Ltd
Original assignee: Beijing Si Kong Science And Technology Co Ltd
Application filed by: Beijing Si Kong Science And Technology Co Ltd
Priority: CN201810184438.8A (patent CN108600300B)
Publications: CN108600300A (application), CN108600300B (grant)


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/02: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/06: Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H04L67/10: Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a log data processing method and processing device. The method includes: a data processing platform obtains log data of predetermined cached content, processes the obtained log data to obtain a processing message, and publishes the processing message to a Kafka cluster, where the Kafka cluster is used by a CDN control platform to subscribe to the processing message. The invention solves the technical problems in the related art of poor data transmission timeliness and heavy load during peak periods.

Description

Log data processing method and processing device
Technical field
The present invention relates to the field of communication technology, and in particular to a log data processing method and processing device.
Background art
For a particular domain name, page, or cached content, a provider of basic content distribution network (CDN, Content Distribution Network) services periodically generates log data recording accesses to the cached content, and bandwidth and traffic information must be extracted from that log data by analysis and calculation.
The CDN control platform is a comprehensive management system based on CDN technology that integrates management, control, traffic and bandwidth monitoring, and cost accounting. During operation, it needs to obtain log data from the CDN service provider and submit it to a data processing platform for log analysis and result statistics; the calculation results are then synchronized to the CDN control platform in some way. The related art mainly synchronizes data through File Transfer Protocol (FTP) files or RESTful web service interfaces. The FTP approach suffers from poor timeliness and relatively high security risk, while HTTP-based RESTful web service interfaces are limited in the amount of data carried per transfer; at peak times the response time increases, which risks bringing the server down.
No effective solution to the above problems has been proposed so far.
Summary of the invention
Embodiments of the present invention provide a log data processing method and processing device, to at least solve the technical problems in the related art of poor data transmission timeliness and heavy load during peak periods.
According to one aspect of the embodiments of the present invention, a log data processing method is provided, including: a data processing platform obtains log data of predetermined cached content; the data processing platform processes the obtained log data to obtain a processing message; the data processing platform publishes the obtained processing message to a Kafka cluster, where the Kafka cluster is used by a CDN control platform to subscribe to the processing message.
Optionally, the data processing platform publishing the obtained processing message to the Kafka cluster includes: the data processing platform assigns a partition identifier to the processing message; the data processing platform publishes the obtained processing message to the Kafka cluster according to the assigned partition identifier.
Optionally, the Kafka cluster is used by the CDN control platform to subscribe to the processing message through a Storm cluster.
According to one aspect of the embodiments of the present invention, another log data processing method is also provided, including: a content distribution network (CDN) control platform subscribes, through a Storm cluster, to a processing message published by a data processing platform, where the processing message is obtained by the data processing platform processing log data of predetermined cached content; the CDN control platform stores the processing result obtained by processing the subscribed processing message.
Optionally, before the CDN control platform subscribes through the Storm cluster to the processing message published by the data processing platform, the method further includes: the CDN control platform coordinates the Storm cluster through a ZooKeeper service.
Optionally, the CDN control platform subscribing through the Storm cluster to the processing message published by the data processing platform includes: the CDN control platform subscribes, through the Storm cluster, to the processing message that the data processing platform publishes through a Kafka cluster.
According to another aspect of the embodiments of the present invention, a log data processing device is also provided, applied to a data processing platform, including: an acquisition module, configured to obtain log data of predetermined cached content; a processing module, configured to process the obtained log data to obtain a processing message; a publishing module, configured to publish the obtained processing message to a Kafka cluster, where the Kafka cluster is used by a CDN control platform to subscribe to the processing message.
Optionally, the Kafka cluster is used by the CDN control platform to subscribe to the processing message through a Storm cluster.
According to another aspect of the embodiments of the present invention, another log data processing device is also provided, applied to a content distribution network (CDN) control platform, including: a subscription module, configured to subscribe, through a Storm cluster, to a processing message published by a data processing platform, where the processing message is obtained by the data processing platform processing log data of predetermined cached content; a storage module, configured to store the processing result obtained by processing the subscribed processing message.
Optionally, the subscription module is further configured to subscribe, through the Storm cluster, to the processing message that the data processing platform publishes through a Kafka cluster.
In the embodiments of the present invention, a Kafka cluster is combined with a Storm cluster: the data processing platform obtains log data of predetermined cached content, processes the obtained log data to obtain a processing message, and publishes the processing message to the Kafka cluster, and the Storm cluster subscribes to the processing message that the data processing platform published through the Kafka cluster. This achieves the purpose of providing a data transmission preprocessing service with good real-time performance, high availability, and strong timeliness, and thereby solves the technical problems in the related art of poor data transmission timeliness and heavy load during peak periods.
Description of the drawings
The drawings described here are provided for a further understanding of the present invention and form part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of a log data processing method according to Embodiment 1 of the present invention;
Fig. 2 is a schematic structural diagram of a log data processing device according to Embodiment 2 of the present invention;
Fig. 3 is a flow chart of a log data processing method according to Embodiment 3 of the present invention;
Fig. 4 is a schematic structural diagram of a log data processing device according to Embodiment 4 of the present invention;
Fig. 5 is a flow chart of a method, according to Embodiment 5 of the present invention, for improving the synchronization of bandwidth and traffic data to a CDN control platform based on a streaming real-time computing framework.
Detailed description
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
First, some of the nouns and terms that appear in the description of the embodiments of this application are explained as follows:
FTP: File Transfer Protocol, used for bidirectional transfer of files over the Internet. Different operating systems have different FTP applications, all of which follow the same protocol to transfer files.
Storm: a distributed real-time computing framework characterized by low latency, high performance, distribution, scalability, and fault tolerance; its advantages include guaranteeing that messages are not lost, strictly ordered message processing, and support for development in multiple languages. It can delegate a task to different types of components, each of which is responsible for one simple, specific task.
Kafka: a distributed publish-subscribe messaging system, mainly used to process active streaming data. Its advantages include high write speed, high reliability, high capacity, and persistence.
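The component model described for Storm, where a task is split across component types that each do one simple job, can be illustrated with plain Python generators. This is only a conceptual sketch and does not use Storm's API; the function names and the message fields (operator, bandwidth_bps) are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator


def spout(records: Iterable[dict]) -> Iterator[dict]:
    """Source component: emits one message at a time."""
    for record in records:
        yield record


def filter_bolt(stream: Iterable[dict]) -> Iterator[dict]:
    """Processing component: drops messages that carry no bandwidth value."""
    for msg in stream:
        if "bandwidth_bps" in msg:
            yield msg


def sum_bolt(stream: Iterable[dict]) -> Dict[str, int]:
    """Terminal component: sums bandwidth per operator."""
    totals: Dict[str, int] = {}
    for msg in stream:
        totals[msg["operator"]] = totals.get(msg["operator"], 0) + msg["bandwidth_bps"]
    return totals


# Hypothetical sample messages; the second one is incomplete and gets filtered out.
messages = [
    {"operator": "op-a", "bandwidth_bps": 100},
    {"operator": "op-b"},
    {"operator": "op-a", "bandwidth_bps": 50},
]
totals = sum_bolt(filter_bolt(spout(messages)))
```

Each function handles exactly one concern, so a component can be replaced or scaled without touching the others, which is the property the glossary entry attributes to Storm.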
Embodiment 1
The related art mainly synchronizes data through File Transfer Protocol (FTP) files or RESTful web service interfaces. The FTP approach suffers from poor timeliness and relatively high security risk, while HTTP-based RESTful web service interfaces are limited in the amount of data carried per transfer; at peak times the response time increases, which risks bringing the server down. As a result, data transmission between the CDN control platform and the data processing platform tends to be inefficient and unstable.
To solve the above technical problem, this application proposes a log data processing method. Fig. 1 is a flow chart of the log data processing method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: the data processing platform obtains log data of predetermined cached content.
In log data processing, the CDN service provider handles a relatively large processing volume. For a particular domain name, page, or cached content, the CDN service provider periodically generates log data recording accesses to the cached content (for example, in the form of a file, that is, a log data file), and bandwidth and traffic information must be extracted from the log data by analysis and calculation.
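The extraction of bandwidth and traffic information from access logs can be sketched as follows. The patent does not specify a log format, so the three-field line layout (Unix timestamp, domain, bytes sent), the 5-minute window, and the function name are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 300  # assumed 5-minute aggregation window


def bandwidth_by_window(lines: Iterable[str]) -> Dict[Tuple[str, int], float]:
    """Return average bandwidth in bits per second per (domain, window start)."""
    bytes_per_window: Dict[Tuple[str, int], int] = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        ts, domain, sent = parts
        window_start = int(ts) - int(ts) % WINDOW_SECONDS
        bytes_per_window[(domain, window_start)] += int(sent)
    # Convert accumulated bytes to bits per second over the window.
    return {key: total * 8 / WINDOW_SECONDS for key, total in bytes_per_window.items()}


# Hypothetical log lines: "<unix timestamp> <domain> <bytes sent>"
sample = [
    "1519999800 example.com 1500000",
    "1519999900 example.com 2250000",
    "1520000100 example.com 750000",
]
result = bandwidth_by_window(sample)
```

The first two lines fall into the same window and are summed before conversion; the third line starts a new window.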
Here, a CDN is a content distribution network built on top of the network. Relying on edge servers deployed in various regions and on the load balancing, content distribution, scheduling, and other functional modules of a central platform, it lets users obtain the content they need from a nearby location, which reduces network congestion and improves user access response speed and hit rate. Its main approach is the extensive use of cache servers, distributed across the regions or networks where user access is relatively concentrated. When a user accesses a website, global load balancing technology directs the user's access to the nearest cache server that is working normally, and the cache server responds to the user's request directly.
Step S104: the data processing platform processes the obtained log data to obtain a processing message.
In an optional solution, the data processing platform (for example, a big data computing platform, that is, a platform for big data computing scenarios) obtains the log data of the predetermined cached content and processes it.
For CDN-related services, the CDN control platform is a comprehensive management system based on CDN technology that integrates management, control, traffic and bandwidth monitoring, and cost accounting. During operation, it needs to obtain log data from the CDN service provider and submit it to the data processing platform for log analysis and result statistics, after which the data processing platform synchronizes the calculation results to the CDN control platform in some way. The CDN control platform can download the service provider's log data files by calling an application programming interface (API) and submit them to the data processing platform for log analysis and calculation.
In the related art, synchronizing files over FTP has poor timeliness, and HTTP interfaces carry only a small amount of data and come under heavy concurrent load at peak times. Consequently, when the CDN control platform exchanges data with the data processing platform, data transmission tends to be inefficient and unstable.
Step S106: the data processing platform publishes the obtained processing message to the Kafka cluster, where the Kafka cluster is used by the CDN control platform to subscribe to the processing message.
Through the above steps, the data processing platform publishes to the Kafka cluster the processing message obtained by processing the acquired log data of the predetermined cached content, and the CDN control platform then subscribes to that processing message through the Kafka cluster. Because Kafka offers high write speed, high reliability, high capacity, and persistence, the messages published by the data processing platform can be written quickly and subscribed to reliably. This effectively addresses the problems in the related art that FTP file synchronization has poor timeliness and that HTTP interfaces carry little data and come under heavy concurrent load at peak times.
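The decoupling that publish-subscribe provides in steps S102 to S106 can be illustrated with a toy in-memory broker. This is not Kafka's API; the class, the topic name, and the subscriber name are hypothetical. The point is that the publisher only appends to a topic log, while each subscriber reads at its own pace from its own offset, so a burst of published messages does not force the subscriber to handle them all at once.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class InMemoryBroker:
    """Toy stand-in for a message broker; it does not model Kafka's real API."""

    def __init__(self) -> None:
        self._logs: Dict[str, List[str]] = defaultdict(list)
        self._offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, message: str) -> None:
        # Publishing only appends; it never waits for a subscriber.
        self._logs[topic].append(message)

    def poll(self, topic: str, subscriber: str, max_messages: int = 10) -> List[str]:
        # Each subscriber keeps its own read position in the topic log.
        start = self._offsets[(topic, subscriber)]
        batch = self._logs[topic][start:start + max_messages]
        self._offsets[(topic, subscriber)] = start + len(batch)
        return batch


broker = InMemoryBroker()
for i in range(5):  # a burst of results from the data processing platform
    broker.publish("cdn-bandwidth", f"result-{i}")

first = broker.poll("cdn-bandwidth", "cdn-control-platform", max_messages=3)
second = broker.poll("cdn-bandwidth", "cdn-control-platform", max_messages=3)
```

The subscriber drains the burst in two batches of its own choosing, which is the behavior the text contrasts with a synchronous HTTP interface under peak load.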
As an optional embodiment, publishing the obtained processing message to the Kafka cluster in step S106 may include the following steps:
Step S1061: the data processing platform assigns a partition identifier to the processing message;
Step S1062: the data processing platform publishes the obtained processing message to the Kafka cluster according to the assigned partition identifier.
During operation, the CDN control platform needs to obtain log data from the CDN service provider and submit it to the data processing platform for log analysis and result statistics; the calculation results are then synchronized to the CDN control platform in some way. Further, a Kafka cluster, a distributed message subscription system, can be deployed on the data processing platform. The log analysis program on the big data processing service provider side processes the obtained log data to obtain the processing message; acting as the producer, it assembles the analysis result into a topic message containing the topic, operator, region information, timestamp, and bandwidth and traffic data, and publishes it to the Kafka cluster.
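A possible shape for the topic message described above is sketched below. The text names the fields (topic, operator, region information, timestamp, bandwidth and traffic data) but not their types or the wire format, so the class name, field names, units, and the choice of JSON are assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProcessingMessage:
    """Hypothetical layout of one analysis result published by the producer."""
    topic: str
    operator: str
    region: str
    timestamp: int        # Unix seconds at the start of the measured window
    bandwidth_bps: float  # average bandwidth over the window
    traffic_bytes: int    # total bytes served in the window

    def to_bytes(self) -> bytes:
        # JSON is an assumed encoding; the patent does not name one.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")


msg = ProcessingMessage(
    topic="cdn-bandwidth",
    operator="op-a",
    region="north",
    timestamp=1519999800,
    bandwidth_bps=100000.0,
    traffic_bytes=3750000,
)
payload = msg.to_bytes()
decoded = json.loads(payload)
```

Serializing to bytes keeps the message independent of any particular client library, and the consumer can rebuild the same structure from the decoded dictionary.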
Because more partitions in a Kafka cluster give higher throughput, multiple partitions can preferably be set up in the Kafka cluster according to system needs. In this optional embodiment, a partition identifier is assigned to the processed topic message (that is, the processing message above), and the processing message is published to the Kafka cluster according to the assigned partition identifier. This both improves running speed and achieves load balancing, reducing the server load during peak data transmission periods. Preferably, a specified partitioning key (Partitioning Key) can serve as the partition identifier.
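Key-based partition assignment can be sketched as a stable hash of the partitioning key taken modulo the partition count. The function name and the key layout (operator and region joined by "|") are assumptions, and CRC32 is used here only as a simple stand-in hash; Kafka's own client applies its own hash function, so this sketch will not reproduce the partition numbers a real cluster would assign.

```python
import zlib


def assign_partition(partition_key: str, num_partitions: int) -> int:
    """Map a partitioning key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(partition_key.encode("utf-8")) % num_partitions


# Hypothetical keys built from operator and region.
keys = ["op-a|north", "op-b|south", "op-a|north"]
partitions = [assign_partition(key, 4) for key in keys]
```

Messages with the same key always land in the same partition, which preserves per-key ordering, while different keys spread across partitions and share the load.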
As a preferred embodiment, the Kafka cluster in the embodiments of the present invention can be used by the CDN control platform to subscribe to the processing message through a Storm cluster.
Specifically, a Storm cluster, a real-time computing framework, can be deployed on the CDN control platform. As a consumer of the topic, the Storm cluster subscribes to the processing messages in the Kafka cluster through the KafkaSpout component, processes the data according to business rules, and saves the processing result to the CDN control platform's database (MySQL/MongoDB). The processing result can then be retrieved through the API of the CDN control platform, so that changes in the service provider's data can be displayed in real time on the front-end page.
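The consumer side, which applies a business rule and saves the result to a database, can be sketched as follows. An in-memory SQLite database stands in for the MySQL/MongoDB store named in the text, and the table layout, the message fields, and the rule (converting bits per second to Mbps) are hypothetical.

```python
import json
import sqlite3
from typing import Iterable


def store_results(conn: sqlite3.Connection, payloads: Iterable[bytes]) -> int:
    """Apply an assumed business rule to each message and persist the result."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bandwidth "
        "(operator TEXT, region TEXT, ts INTEGER, mbps REAL)"
    )
    stored = 0
    for payload in payloads:
        msg = json.loads(payload)
        mbps = msg["bandwidth_bps"] / 1_000_000  # hypothetical rule: bps to Mbps
        conn.execute(
            "INSERT INTO bandwidth VALUES (?, ?, ?, ?)",
            (msg["operator"], msg["region"], msg["timestamp"], mbps),
        )
        stored += 1
    conn.commit()
    return stored


conn = sqlite3.connect(":memory:")  # stand-in for the platform database
payloads = [
    json.dumps({"operator": "op-a", "region": "north",
                "timestamp": 1519999800, "bandwidth_bps": 2_000_000}).encode(),
    json.dumps({"operator": "op-b", "region": "south",
                "timestamp": 1520000100, "bandwidth_bps": 500_000}).encode(),
]
count = store_results(conn, payloads)
rows = conn.execute("SELECT operator, mbps FROM bandwidth ORDER BY ts").fetchall()
```

In a Storm topology this logic would sit in a processing component fed by the subscribing component; here a plain list of payloads plays that role.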
Storm is characterized by low latency, high performance, distribution, scalability, and fault tolerance, and its advantages include guaranteeing that messages are not lost, strictly ordered message processing, and support for development in multiple languages. Therefore, by combining a Kafka cluster with a Storm cluster, the log data of the predetermined cached content is obtained and processed to obtain a processing message, the processing message is published to the Kafka cluster, and the Storm cluster subscribes to the processing message that the data processing platform published through the Kafka cluster. This achieves the purpose of providing a data transmission preprocessing service with good real-time performance, high availability, and strong timeliness, and thereby solves the technical problems in the related art of poor data transmission timeliness and heavy load during peak periods.
It should be noted that, for the method embodiment provided above according to the embodiments of the present invention, the steps shown in the flow chart of the drawing can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flow chart, in some cases the steps shown or described can be executed in an order different from the one given here.
All or part of the technical solution of the embodiments of the present invention can be embodied in the form of a computer software product. The computer software product can be stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the method of the embodiments of the present invention. The storage medium includes media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Embodiment 2
According to the embodiments of this application, a log data processing device for implementing Embodiment 1 above is also provided. Fig. 2 is a schematic structural diagram of a log data processing device according to an embodiment of the present invention. As shown in Fig. 2, the device can be applied to a data processing platform and includes:
an acquisition module 22, configured to obtain log data of predetermined cached content;
a processing module 24, connected to the acquisition module 22 and configured to process the obtained log data to obtain a processing message;
a publishing module 26, connected to the processing module 24 and configured to publish the obtained processing message to a Kafka cluster, where the Kafka cluster is used by a CDN control platform to subscribe to the processing message.
It should be noted here that the acquisition module 22, the processing module 24, and the publishing module 26 correspond to steps S102 to S106 in Embodiment 1; the three modules share the examples and application scenarios of the corresponding steps, but are not limited to what is disclosed in Embodiment 1.
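The three connected modules can be sketched as small classes wired in sequence. The class and method names, the log line layout (path followed by bytes sent), and the injected callables are assumptions; in particular, the publishing module takes any send function, so a real Kafka producer call could be passed in without changing the other modules.

```python
from typing import Callable, List


class AcquisitionModule:
    """Obtains log data; the source callable is an assumed injection point."""

    def __init__(self, source: Callable[[], List[str]]) -> None:
        self._source = source

    def acquire(self) -> List[str]:
        return self._source()


class ProcessingModule:
    """Turns raw log lines into one processing message."""

    def process(self, log_lines: List[str]) -> dict:
        # Assumed line layout: "<path> <bytes sent>"
        total = sum(int(line.split()[-1]) for line in log_lines)
        return {"traffic_bytes": total, "requests": len(log_lines)}


class PublishingModule:
    """Hands the message to a sender, for example a broker client."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send

    def publish(self, message: dict) -> None:
        self._send(message)


published: List[dict] = []  # stands in for the Kafka cluster
acquisition = AcquisitionModule(lambda: ["/a.js 100", "/b.css 250"])
processing = ProcessingModule()
publishing = PublishingModule(published.append)
publishing.publish(processing.process(acquisition.acquire()))
```

Keeping the sender injectable mirrors the device description, where each module is connected to the previous one but owns a single responsibility.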
The CDN control platform is a comprehensive management system based on CDN technology that integrates management, control, traffic and bandwidth monitoring, and cost accounting. During operation, it needs to obtain log data from the CDN service provider and submit it to the data processing platform for log analysis and result statistics; the statistical results are then synchronized to the CDN control platform in some way. The CDN control platform can download the service provider's log data files through API calls and submit them to the data processing platform for log analysis and calculation. The volume of data transmitted during its operation is therefore large.
In the related art, synchronizing files over FTP has poor timeliness, and HTTP interfaces carry only a small amount of data and come under heavy concurrent load at peak times. Consequently, when the CDN control platform exchanges data with the data processing platform, data transmission tends to be inefficient and unstable.
As an optional technical solution, through the functions of the modules in the above log data processing device, a data processing platform that exchanges data with the CDN control platform can use the device to publish to the Kafka cluster the processing message obtained by processing the acquired log data of the predetermined cached content, and the CDN control platform then subscribes to that processing message through the Kafka cluster. Because Kafka offers high write speed, high reliability, high capacity, and persistence, the messages published by the data processing platform can be written quickly and subscribed to reliably.
As an optional embodiment, the publishing module 26 may include:
an allocation unit, configured to assign a partition identifier to the processing message;
a publishing unit, connected to the allocation unit and configured to publish the obtained processing message to the Kafka cluster according to the assigned partition identifier.
It should be noted here that the allocation unit and the publishing unit correspond to steps S1061 to S1062 in Embodiment 1; the two units share the examples and application scenarios of the corresponding steps, but are not limited to what is disclosed in Embodiment 1.
As a preferred embodiment, in the embodiments of the present invention the Kafka cluster can be used by the CDN control platform to subscribe to the processing message through a Storm cluster.
Storm is characterized by low latency, high performance, distribution, scalability, and fault tolerance, and its advantages include guaranteeing that messages are not lost, strictly ordered message processing, and support for development in multiple languages. Therefore, by combining a Kafka cluster with a Storm cluster, the log data of the predetermined cached content is obtained and processed to obtain a processing message, the processing message is published to the Kafka cluster, and the Storm cluster subscribes to the processing message that the data processing platform published through the Kafka cluster. This achieves the purpose of providing a data transmission preprocessing service with good real-time performance, high availability, and strong timeliness, and thereby solves the technical problems in the related art of poor data transmission timeliness and heavy load during peak periods.
Embodiment 3
The related art mainly synchronizes data through File Transfer Protocol (FTP) files or RESTful web service interfaces. The FTP approach suffers from poor timeliness and relatively high security risk, while HTTP-based RESTful web service interfaces are limited in the amount of data carried per transfer; at peak times the response time increases, which risks bringing the server down. As a result, data transmission between the CDN control platform and the data processing platform tends to be inefficient and unstable.
To solve the above technical problem, this application also proposes another log data processing method. Fig. 3 is a flow chart of the log data processing method according to an embodiment of the present invention. As shown in Fig. 3, the method includes the following steps:
Step S302: the CDN control platform subscribes, through a Storm cluster, to a processing message published by the data processing platform, where the processing message is obtained by the data processing platform processing log data of predetermined cached content;
Step S304: the CDN control platform stores the processing result obtained by processing the subscribed processing message.
In an optional embodiment, the CDN control platform exchanges a large amount of data with the data processing platform. During operation, it needs to obtain log data from the CDN service provider and submit it to the data processing platform for log analysis and result statistics; the statistical results are then synchronized to the CDN control platform in some way. In the related art, synchronizing files over FTP has poor timeliness, and HTTP interfaces carry only a small amount of data and come under heavy concurrent load at peak times, so data transmission between the CDN control platform and the data processing platform tends to be inefficient and unstable.
Given Storm's low latency, high performance, distribution, scalability, and fault tolerance, a Storm cluster, a real-time computing framework, can be deployed on the CDN control platform. The Storm cluster subscribes to the processing message published by the data processing platform, and the CDN control platform stores the processing result obtained by processing the subscribed message. Messages published by the data processing platform can thus be subscribed to with high write speed and high reliability.
As an optional embodiment, before the CDN control platform subscribes through the Storm cluster to the processing message published by the data processing platform in step S302, the log data processing may further include the following step:
Step S301: the CDN control platform coordinates the Storm cluster through a ZooKeeper service.
ZooKeeper, a reliable coordination system, is software that provides consistency services for distributed applications; its functions include configuration maintenance, distributed synchronization, and group services. Specifically, ZooKeeper aims to encapsulate complex and error-prone key services and to give users a system with easy-to-use interfaces, efficient performance, and stable functionality. In this optional embodiment, coordinating the Storm cluster through the ZooKeeper service makes the Storm cluster's subscription to processing messages efficient and reliable.
As a kind of optional embodiment, step S302, CDN control platform is subscribed to data processing by Storm clusters and is put down The processing message of platform publication, may include steps of:
Step S3021, CDN control platform subscribes to what data processing platform (DPP) was issued by Kafka clusters by Storm clusters Handle message.
Specifically, in CDN operation, a Kafka cluster (a distributed message queue system) can be deployed on the data processing platform. The log analysis program on the big data processing side processes the acquired log data to obtain processing messages. Acting as the producer, it assembles each analysis result into a Topic message containing the topic, operator, area information, timestamp and bandwidth traffic data, specifies a Partitioning Key as the partition identifier to achieve load balancing, and publishes the message to the Kafka cluster.
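The message assembly described above can be illustrated with a minimal Python sketch. The field names, the topic name `cdn-bandwidth`, the JSON encoding and the choice of the timestamp as key are assumptions made for illustration; the patent does not specify a wire format. The sketch only builds the key and value bytes and does not call any Kafka client.

```python
import json
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass(frozen=True)
class BandwidthRecord:
    """One analysed log result; all field names are illustrative assumptions."""
    topic: str          # topic the record belongs to
    operator: str       # network operator
    region: str         # area information
    timestamp: int      # Unix seconds of the statistics window
    bandwidth_bps: int  # bandwidth traffic figure for that window


def to_kafka_message(record: BandwidthRecord) -> Tuple[bytes, bytes]:
    """Return (partitioning key, value) as bytes, ready to hand to a producer."""
    # Assumption: the timestamp doubles as the Partitioning Key.
    key = str(record.timestamp).encode("utf-8")
    value = json.dumps(asdict(record), sort_keys=True).encode("utf-8")
    return key, value


record = BandwidthRecord("cdn-bandwidth", "operator-a", "beijing",
                         1520294400, 125_000_000)
key, value = to_kafka_message(record)
print(key, value)
```

Keeping the key separate from the value lets the broker-side partitioning depend only on the key, which is what allows the key to act as the partition identifier.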
The Storm cluster, as the consumer of the Topic messages, can then subscribe to the Topic messages in the Kafka cluster through the KafkaSpout component, process the data according to business rules, and save the results to the MySQL/MongoDB database of the CDN control platform. The results are read through the API of the CDN control platform, so changes in the service provider's data can be displayed on the front-end page in real time.
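As a rough sketch of the consumer side, the following Python code stores subscribed messages and answers a query of the kind an API call might issue. It is not the patent's implementation: an in-memory SQLite database stands in for MySQL/MongoDB, a plain list stands in for the messages delivered by KafkaSpout, and the table layout, function names and the "sum per operator and timestamp" business rule are all assumptions.

```python
import json
import sqlite3


def store_results(conn, raw_messages):
    """Parse each subscribed message and persist one row per message."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bandwidth "
        "(operator TEXT, region TEXT, ts INTEGER, bps INTEGER)"
    )
    rows = []
    for raw in raw_messages:
        m = json.loads(raw)
        rows.append((m["operator"], m["region"], m["timestamp"], m["bandwidth_bps"]))
    conn.executemany("INSERT INTO bandwidth VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def total_bandwidth(conn, operator, ts):
    """Hypothetical API query: total bandwidth of one operator at one timestamp."""
    cur = conn.execute(
        "SELECT COALESCE(SUM(bps), 0) FROM bandwidth WHERE operator = ? AND ts = ?",
        (operator, ts),
    )
    return cur.fetchone()[0]


conn = sqlite3.connect(":memory:")
batch = [
    json.dumps({"operator": "operator-a", "region": "beijing",
                "timestamp": 1520294400, "bandwidth_bps": 100}),
    json.dumps({"operator": "operator-a", "region": "shanghai",
                "timestamp": 1520294400, "bandwidth_bps": 250}),
]
store_results(conn, batch)
print(total_bandwidth(conn, "operator-a", 1520294400))
```

Storing one row per message and aggregating at query time keeps the write path simple; a real deployment might instead aggregate inside the topology before writing.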
Because Kafka offers high write speed, high reliability, high capacity and persistence, message publishing and subscription with the data processing platform can achieve high write speed and high reliability. Therefore, by combining the Kafka cluster with the Storm cluster, the processing messages that the data processing platform publishes through the Kafka cluster are subscribed to through the Storm cluster, and the processing results obtained from the subscribed messages are stored. This gives the data transmission between the CDN control platform and the data processing platform good real-time behavior, high availability and strong timeliness, and thus solves the technical problems in the related art of poor timeliness of data transmission during operation and high pressure at peak periods.
Embodiment 4
According to an embodiment of the present application, a log data processing device for implementing Embodiment 3 above is also provided. Fig. 4 is a structural schematic diagram of a log data processing device according to an embodiment of the present invention. As shown in Fig. 4, the device can be applied to a CDN control platform and includes:
A subscribing module 42, configured to subscribe through a Storm cluster to the processing messages published by the data processing platform, where the processing messages are obtained by the data processing platform processing the log data of predetermined cached content;
A storage module 44, connected to the subscribing module 42 and configured to store the processing results obtained by processing the subscribed processing messages.
It should be noted that the subscribing module 42 and the storage module 44 correspond to steps S302 to S304 in Embodiment 3. The two modules share the examples and application scenarios of the corresponding steps, but are not limited to the content disclosed in Embodiment 3.
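The relationship between the two modules can be pictured with a small Python sketch. The class and method names are hypothetical, the Storm cluster is replaced by an ordinary iterable, and the processing rule (converting bits per second to whole kilobits per second) is an arbitrary placeholder for the business rules.

```python
from typing import Callable, Dict, Iterable, Iterator, List


class SubscribingModule:
    """Stands in for module 42: yields the subscribed processing messages."""

    def __init__(self, source: Iterable[Dict]):
        self._source = source

    def messages(self) -> Iterator[Dict]:
        yield from self._source


class StorageModule:
    """Stands in for module 44: processes each message and keeps the result."""

    def __init__(self, process: Callable[[Dict], Dict]):
        self._process = process
        self.results: List[Dict] = []

    def consume(self, subscriber: SubscribingModule) -> None:
        for message in subscriber.messages():
            self.results.append(self._process(message))


subscriber = SubscribingModule([{"bandwidth_bps": 8000}, {"bandwidth_bps": 16000}])
storage = StorageModule(lambda m: {"kbps": m["bandwidth_bps"] // 1000})
storage.consume(subscriber)
print(storage.results)
```

The storage module depends only on the `messages()` iterator, which mirrors the statement that module 44 is connected to module 42 rather than to the message source directly.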
In an optional embodiment, a large amount of data is transferred between the CDN control platform and the data processing platform. During operation, the CDN control platform needs to obtain log data from the CDN service provider side and submit it to the data processing platform for log analysis and result statistics; the calculation results are then synchronized back to the CDN control platform in some way. In the related art, synchronizing files over FTP suffers from poor timeliness, and using an HTTP interface suffers from a small data volume per transfer and high concurrency pressure at peak times, so data transmission between the CDN control platform and the data processing platform tends to be inefficient and unstable.
Because Storm offers low latency, high performance, distribution, scalability and fault tolerance, a Storm cluster (a real-time computation framework) can be deployed on the CDN control platform. With the log data processing device of the embodiment of the present invention, the CDN control platform subscribes through the Storm cluster to the processing messages published by the data processing platform, processes the subscribed messages and stores the processing results, which achieves the technical effect of high write speed and high reliability when subscribing to messages published by the data processing platform.
As an optional embodiment, on the basis of the structure of the log data processing device shown in Fig. 4, the device may further include:
A coordinating module, connected to the subscribing module 42 and configured to coordinate the Storm cluster through the ZooKeeper service before the processing messages published by the data processing platform are subscribed to through the Storm cluster.
It should be noted that the coordinating module corresponds to step S301 in Embodiment 3. The module shares the examples and application scenarios of the corresponding step, but is not limited to the content disclosed in Embodiment 3.
As an optional embodiment, the subscribing module is further configured to subscribe, through the Storm cluster, to the processing messages that the data processing platform publishes through a Kafka cluster.
Because Kafka offers high write speed, high reliability, high capacity and persistence, message publishing and subscription with the data processing platform can achieve high write speed and high reliability. Therefore, the log data processing device of the embodiment of the present invention combines the Kafka cluster with the Storm cluster: it subscribes through the Storm cluster to the processing messages that the data processing platform publishes through the Kafka cluster, and stores the processing results obtained from the subscribed messages. This gives the data transmission between the CDN control platform and the data processing platform good real-time behavior, high availability and strong timeliness, and thus solves the technical problems in the related art of poor timeliness of data transmission and high pressure at peak periods.
Embodiment 5
According to an embodiment of the present application, a method for improving the synchronization of bandwidth traffic data to the CDN control platform based on a streaming real-time computation framework is also provided. Fig. 5 is a flowchart of this method according to an embodiment of the present invention. As shown in Fig. 5, the main design ideas of the method include:
(1) The CDN control platform downloads the log data files from the service provider side by calling an API, and submits them to the data processing platform for log analysis and calculation;
(2) A Kafka cluster (a distributed message queue system) is deployed on the data processing platform. The log analysis program on the big data processing side processes the acquired log data to obtain analysis processing messages. Acting as the producer, it assembles each analysis result into a Topic message containing the topic, operator, area information, timestamp and bandwidth traffic data, specifies a Partitioning Key as the partition identifier to achieve load balancing, and publishes the message to the Kafka cluster;
(3) A Storm cluster (a real-time computation framework) is deployed on the CDN control platform. As the consumer of the Topic, the Storm cluster subscribes to the message records in the Kafka cluster through the KafkaSpout component, processes the data according to business rules, and saves the results to the MySQL/MongoDB database of the CDN control platform. The results are read through the API of the CDN control platform, so changes in the service provider's data can be displayed on the front-end page in real time. In addition, the ZooKeeper service is installed on the CDN control platform as the coordinator of the Storm cluster.
In the above method, the producer, the consumer and the Kafka queue are each configured with multiple nodes, and the ZooKeeper service is configured with 1 to 3 nodes.
Specifically, for a particular piece of cached content (a domain name, a page, a multimedia file, etc.), the CDN service provider supplies log data in chronological order. By shortening the interval at which the service provider's log data is fetched and submitting it for big data analysis many times, the method reduces the time lag between the CDN service provider and the CDN control platform. Processing the analysis results of the data processing platform in real time then achieves the design goals of good real-time behavior and high availability.
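To make the idea of shorter fetch intervals concrete, the sketch below splits a time range into consecutive windows, each of which could drive one log download and one analysis submission. The function name, the half-open window convention and the use of Unix seconds are assumptions; the patent gives no concrete interval length.

```python
from typing import List, Tuple


def fetch_windows(start: int, end: int, interval: int) -> List[Tuple[int, int]]:
    """Split [start, end) into consecutive windows of at most `interval` seconds."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    windows = []
    cursor = start
    while cursor < end:
        upper = min(cursor + interval, end)  # the last window may be shorter
        windows.append((cursor, upper))
        cursor = upper
    return windows


# One hour of logs fetched in 5-minute slices instead of a single hourly batch.
for lower, upper in fetch_windows(0, 3600, 300):
    print(lower, upper)
```

With a smaller `interval`, each result reaches the control platform sooner after the traffic it describes, at the cost of more download and analysis requests.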
It should be noted that the Kafka framework supports three message delivery modes: at most once (messages may be lost but are never redelivered), at least once (messages are never lost but may be redelivered), and exactly once (every message is delivered once and only once). Given the characteristics of bandwidth traffic data, which is ordered mainly by time, the embodiment of the present invention preferably uses the exactly-once message model.
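The difference between these modes matters for bandwidth totals, because a redelivered message would be counted twice. The following sketch shows one common way to obtain exactly-once results on top of at-least-once delivery: an idempotent consumer that remembers which (partition, offset) pairs it has already applied. This is an illustrative technique, not necessarily the mechanism used in the patent, and the class and method names are hypothetical.

```python
class IdempotentSink:
    """Applies each (partition, offset) at most once, even if it is redelivered."""

    def __init__(self):
        self.seen = set()  # identifiers of messages already applied
        self.total = 0     # running bandwidth total

    def handle(self, partition: int, offset: int, bandwidth: int) -> bool:
        ident = (partition, offset)
        if ident in self.seen:
            return False   # duplicate delivery, ignored
        self.seen.add(ident)
        self.total += bandwidth
        return True


sink = IdempotentSink()
deliveries = [(0, 0, 100), (0, 1, 200), (0, 1, 200), (1, 0, 50)]  # offset 1 redelivered
for partition, offset, bandwidth in deliveries:
    sink.handle(partition, offset, bandwidth)
print(sink.total)
```

In a real system the set of seen identifiers would have to be stored durably together with the results, otherwise a restart would forget which messages were already counted.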
Meanwhile be based on bandwidth traffic data characteristics, this method use using based on timestamp in the way of as Message is saved into subregion by Partitioning Key, and realization is linearly writing reading.
Storm offers low latency, high performance, distribution, scalability and fault tolerance; it ensures that messages are not lost and that message processing is strictly ordered, and it supports development in multiple languages. Kafka offers high write speed, high reliability, high capacity and persistence. By combining these streaming computation frameworks, which have high availability and good real-time behavior, the embodiment of the present invention can provide a data transmission preprocessing service with good real-time behavior, high availability and strong timeliness, and thus solves the problems in CDN operation of poor timeliness when files are synchronized over FTP, and of small data volume and high peak concurrency pressure when an HTTP interface is used.
The above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in each embodiment of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A log data processing method, characterized by comprising:
a data processing platform obtaining log data of predetermined cached content;
the data processing platform processing the obtained log data to obtain a processing message;
the data processing platform publishing the obtained processing message to a Kafka cluster, wherein the Kafka cluster is used by a CDN control platform to subscribe to the processing message.
2. The method according to claim 1, characterized in that the data processing platform publishing the obtained processing message to the Kafka cluster comprises:
the data processing platform assigning a partition identifier to the processing message;
the data processing platform publishing the obtained processing message to the Kafka cluster according to the assigned partition identifier.
3. The method according to claim 1 or 2, characterized in that the Kafka cluster is used by the CDN control platform to subscribe to the processing message through a Storm cluster.
4. A log data processing method, characterized by comprising:
a content delivery network (CDN) control platform subscribing, through a Storm cluster, to a processing message published by a data processing platform, wherein the processing message is obtained by the data processing platform processing log data of predetermined cached content;
the CDN control platform storing a processing result obtained by processing the subscribed processing message.
5. The method according to claim 4, characterized in that, before the CDN control platform subscribes through the Storm cluster to the processing message published by the data processing platform, the method further comprises:
the CDN control platform coordinating the Storm cluster through a ZooKeeper service.
6. The method according to claim 4 or 5, characterized in that the CDN control platform subscribing through the Storm cluster to the processing message published by the data processing platform comprises:
the CDN control platform subscribing, through the Storm cluster, to the processing message that the data processing platform publishes through a Kafka cluster.
7. A log data processing device, characterized in that it is applied to a data processing platform and comprises:
an acquisition module, configured to obtain log data of predetermined cached content;
a processing module, configured to process the obtained log data to obtain a processing message;
a publishing module, configured to publish the obtained processing message to a Kafka cluster, wherein the Kafka cluster is used by a CDN control platform to subscribe to the processing message.
8. The device according to claim 7, characterized in that the Kafka cluster is used by the CDN control platform to subscribe to the processing message through a Storm cluster.
9. A log data processing device, characterized in that it is applied to a content delivery network (CDN) control platform and comprises:
a subscribing module, configured to subscribe, through a Storm cluster, to a processing message published by a data processing platform, wherein the processing message is obtained by the data processing platform processing log data of predetermined cached content;
a storage module, configured to store a processing result obtained by processing the subscribed processing message.
10. The device according to claim 9, characterized in that the subscribing module is further configured to subscribe, through the Storm cluster, to the processing message that the data processing platform publishes through a Kafka cluster.
CN201810184438.8A 2018-03-06 2018-03-06 Log data processing method and device Active CN108600300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810184438.8A CN108600300B (en) 2018-03-06 2018-03-06 Log data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810184438.8A CN108600300B (en) 2018-03-06 2018-03-06 Log data processing method and device

Publications (2)

Publication Number Publication Date
CN108600300A true CN108600300A (en) 2018-09-28
CN108600300B CN108600300B (en) 2021-11-12

Family

ID=63625739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810184438.8A Active CN108600300B (en) 2018-03-06 2018-03-06 Log data processing method and device

Country Status (1)

Country Link
CN (1) CN108600300B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109274540A (en) * 2018-11-16 2019-01-25 四川长虹电器股份有限公司 A kind of web access log processing method based on storm
CN109901992A (en) * 2019-01-18 2019-06-18 竞技世界(北京)网络技术有限公司 A kind of method of Remote Dynamic oracle listener process performing
CN109918349A (en) * 2019-02-25 2019-06-21 网易(杭州)网络有限公司 Log processing method, device, storage medium and electronic device
CN109951323A (en) * 2019-02-27 2019-06-28 网宿科技股份有限公司 A kind of log analysis method and system
CN110401724A (en) * 2019-08-22 2019-11-01 北京旷视科技有限公司 File management method, ftp server and storage medium
CN110719332A (en) * 2019-10-17 2020-01-21 北京旷视科技有限公司 Data transmission method, device, system, computer equipment and storage medium
CN111723156A (en) * 2020-06-29 2020-09-29 深圳壹账通智能科技有限公司 Data disaster tolerance method and system
CN111897997A (en) * 2020-06-15 2020-11-06 济南浪潮高新科技投资发展有限公司 Data processing method and system based on ROS operating system
CN113015203A (en) * 2021-03-22 2021-06-22 Oppo广东移动通信有限公司 Information acquisition method, device, terminal, system and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631026A (en) * 2015-12-30 2016-06-01 北京奇艺世纪科技有限公司 Security data analysis system
CN105681303A (en) * 2016-01-15 2016-06-15 中国科学院计算机网络信息中心 Big data driven network security situation monitoring and visualization method
CN105868075A (en) * 2016-03-31 2016-08-17 浪潮通信信息***有限公司 System and method for monitoring and analyzing great deal of logs in real time
US20160335287A1 (en) * 2015-05-14 2016-11-17 Alibaba Group Holding Limited Stream computing system and method
CN107332719A (en) * 2017-08-16 2017-11-07 北京云端智度科技有限公司 A kind of method that daily record is analyzed in real time in CDN system
CN107391606A (en) * 2017-06-30 2017-11-24 中国联合网络通信集团有限公司 Log processing method and device based on Storm

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160335287A1 (en) * 2015-05-14 2016-11-17 Alibaba Group Holding Limited Stream computing system and method
CN105631026A (en) * 2015-12-30 2016-06-01 北京奇艺世纪科技有限公司 Security data analysis system
CN105681303A (en) * 2016-01-15 2016-06-15 中国科学院计算机网络信息中心 Big data driven network security situation monitoring and visualization method
CN105868075A (en) * 2016-03-31 2016-08-17 浪潮通信信息***有限公司 System and method for monitoring and analyzing great deal of logs in real time
CN107391606A (en) * 2017-06-30 2017-11-24 中国联合网络通信集团有限公司 Log processing method and device based on Storm
CN107332719A (en) * 2017-08-16 2017-11-07 北京云端智度科技有限公司 A kind of method that daily record is analyzed in real time in CDN system

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109274540A (en) * 2018-11-16 2019-01-25 四川长虹电器股份有限公司 A kind of web access log processing method based on storm
CN109901992A (en) * 2019-01-18 2019-06-18 竞技世界(北京)网络技术有限公司 A kind of method of Remote Dynamic oracle listener process performing
CN109901992B (en) * 2019-01-18 2022-03-04 竞技世界(北京)网络技术有限公司 Method for remotely and dynamically monitoring program execution behavior
CN109918349A (en) * 2019-02-25 2019-06-21 网易(杭州)网络有限公司 Log processing method, device, storage medium and electronic device
CN109918349B (en) * 2019-02-25 2021-05-25 网易(杭州)网络有限公司 Log processing method, log processing device, storage medium and electronic device
CN109951323A (en) * 2019-02-27 2019-06-28 网宿科技股份有限公司 A kind of log analysis method and system
CN109951323B (en) * 2019-02-27 2022-11-08 网宿科技股份有限公司 Log analysis method and system
CN110401724A (en) * 2019-08-22 2019-11-01 北京旷视科技有限公司 File management method, ftp server and storage medium
CN110401724B (en) * 2019-08-22 2022-04-12 北京旷视科技有限公司 File management method, file transfer protocol server and storage medium
CN110719332B (en) * 2019-10-17 2022-07-26 北京旷视科技有限公司 Data transmission method, device, system, computer equipment and storage medium
CN110719332A (en) * 2019-10-17 2020-01-21 北京旷视科技有限公司 Data transmission method, device, system, computer equipment and storage medium
CN111897997A (en) * 2020-06-15 2020-11-06 济南浪潮高新科技投资发展有限公司 Data processing method and system based on ROS operating system
CN111723156A (en) * 2020-06-29 2020-09-29 深圳壹账通智能科技有限公司 Data disaster tolerance method and system
CN113015203A (en) * 2021-03-22 2021-06-22 Oppo广东移动通信有限公司 Information acquisition method, device, terminal, system and storage medium

Also Published As

Publication number Publication date
CN108600300B (en) 2021-11-12

Similar Documents

Publication Publication Date Title
CN108600300A (en) Daily record data processing method and processing device
Zhang et al. Proactive workload management in hybrid cloud computing
CN109618002B (en) Micro-service gateway optimization method, device and storage medium
CN106657379A (en) Implementation method and system for NGINX server load balancing
CN107688496A (en) Task distribution formula processing method, device, storage medium and server
CN103747274B (en) A kind of video data center setting up cache cluster and cache resources dispatching method thereof
CN108683720A (en) A kind of container cluster service configuration method and device
CN105338061A (en) Lightweight message oriented middleware realization method and system
CN104348798B (en) A kind of method, apparatus, dispatch server and system for distributing network
US20110172963A1 (en) Methods and Apparatus for Predicting the Performance of a Multi-Tier Computer Software System
CN106101264A (en) Content distributing network daily record method for pushing, device and system
CN104202386B (en) A kind of high concurrent amount distributed file system and its secondary load equalization methods
CN110198332A (en) Dispatching method, device and the storage medium of content delivery network node
CN107870763A (en) For creating the method and its device of the real-time sorting system of mass data
CN106559498A (en) Air control data collection platform and its collection method
Garcia-Carballeira et al. Enhancing the power of two choices load balancing algorithm using round robin policy
CN103248636B (en) The system and method downloaded offline
CN105656794B (en) Data distributing method, device and computer readable storage medium
CN113422808A (en) Internet of things platform HTTP information pushing method, system, device and medium
CN115866059A (en) Block chain link point scheduling method and device
CN105578212B (en) A kind of point-to-point Streaming Media method of real-time in big data under stream calculation platform
Zhou et al. An adaptive cloud downloading service
CN107277088B (en) High-concurrency service request processing system and method
CN105187518B (en) A kind of CDN content distribution method and system
EP2674876A1 (en) Streaming analytics processing node and network topology aware streaming analytics system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant