CN107341084A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN107341084A
Authority
CN
China
Prior art keywords
processing
server
result
data
stream data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710343310.7A
Other languages
Chinese (zh)
Other versions
CN107341084B (en)
Inventor
周光辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201710343310.7A
Publication of CN107341084A
Application granted
Publication of CN107341084B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183 Provision of network file services by network file servers, e.g. by using NFS, CIFS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/565 Conversion or adaptation of application format or content
    • H04L67/5651 Reducing the amount or size of exchanged application data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present application discloses a data processing method and device. In the method, after a first system located in a first computer room obtains stream data, it first performs Map processing on the stream data to obtain a first processing result, and then sends that result to a second system in a second computer room, so that the second system can obtain a second processing result from the first processing result it receives. Because the first processing result produced by the first system is much smaller in data volume than the stream data it obtained, the amount of data transmitted across computer rooms is greatly reduced, which shortens the time spent on cross-computer-room transmission and thereby improves data processing efficiency.

Description

Data processing method and device
Technical field
The present application relates to the field of computer technology, and in particular to a data processing method and device.
Background
With the continuous development of big data technology, people can use it to analyze and process massive amounts of data and obtain accurate analysis results, and then use those results for activities such as business analysis and market forecasting, providing guidance for subsequent production and daily life.
At present, when stream processing is performed with big data technology, the massive stream data obtained in multiple computer rooms is usually first gathered into one computer room, and that computer room then processes the data and outputs the result, as shown in Fig. 1.
Fig. 1 is a schematic diagram of stream processing of big data in the prior art.
Suppose operations personnel need to compute real-time statistics for a service by means of stream processing. Computer rooms 1 to 4 obtain the massive stream data of the service in real time, and this data must then be gathered into computer room 5 by network transmission. Each server in computer room 5 stores the gathered stream data in its own data queue and processes the data in its queue according to a preset data processing method to obtain a corresponding processing result. Each of these servers (that is, each server that obtained stream data) then sends its processing result to one server in computer room 5, which further processes the collected results to obtain and output the final processing result.
However, the volume of data that computer rooms 1 to 4 transmit to computer room 5 is very large, and because the data is transmitted across computer rooms, network latency is high. Therefore, in the prior art, gathering massive stream data into one computer room for processing means that cross-computer-room transmission takes a long time and is inefficient, which in turn makes stream processing of the data inefficient.
Summary of the invention
An embodiment of the present application provides a data processing method, to solve the prior-art problem that processing stream data across computer rooms is inefficient.
An embodiment of the present application provides a data processing method, including:
a first system obtains stream data and stores it, the first system being located in a first computer room;
Map processing is performed on the stored stream data to obtain a first processing result, the data volume of the first processing result being smaller than that of the stream data;
the first processing result is sent to a second system, so that the second system obtains a second processing result from the first processing result, the second system being located in a second computer room.
An embodiment of the present application provides a data processing system, to solve the prior-art problem that processing stream data across computer rooms is inefficient.
An embodiment of the present application provides a data processing system, including at least one service server, at least one storage server and at least one first processing server, the system being located in a first computer room;
the service server obtains stream data;
the storage server obtains the stream data from the service server and stores it;
the first processing server performs Map processing on the stream data stored by the storage server to obtain a first processing result.
An embodiment of the present application provides a data processing method, to solve the prior-art problem that processing stream data across computer rooms is inefficient.
An embodiment of the present application provides a data processing method, including:
a second system obtains the first processing result obtained by at least one first system, the second system being located in a second computer room;
Reduce processing is performed on the obtained first processing result to obtain a second processing result.
An embodiment of the present application provides a data processing system, to solve the prior-art problem that processing stream data across computer rooms is inefficient.
An embodiment of the present application provides a data processing system, including at least one acquisition server and at least one second processing server, the system being located in a second computer room;
the acquisition server obtains the first processing result obtained by at least one first system;
the second processing server performs Reduce processing on the first processing result obtained by the at least one acquisition server, to obtain a second processing result.
At least one of the above technical solutions adopted by the embodiments of the present application can achieve the following beneficial effects:
In the embodiments of the present application, after the first system in the first computer room obtains stream data, it first performs Map processing on the stream data to obtain a first processing result, and then sends that result to the second system in the second computer room, so that the second system can obtain a second processing result from the first processing result it receives. Because the first processing result produced by the first system is much smaller in data volume than the stream data it obtained, the amount of data transmitted across computer rooms is greatly reduced, which shortens the time spent on cross-computer-room transmission and thereby improves data processing efficiency.
Brief description of the drawings
The accompanying drawings described here are provided for a further understanding of the present application and form part of it. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of stream processing of big data in the prior art;
Fig. 2 is a schematic diagram of the data processing procedure provided by an embodiment of the present application;
Fig. 3 is a schematic architecture diagram of the first system provided by an embodiment of the present application;
Fig. 4 is a schematic architecture diagram of the second system provided by an embodiment of the present application;
Fig. 5 is a schematic diagram, provided by an embodiment of the present application, of multiple first systems sending their respective first processing results to one second system for processing.
Detailed description of the embodiments
In the embodiments of the present application, the overall data processing can be summarized as follows: stream data obtained in real time first undergoes Map processing, and the result of the Map processing then undergoes Reduce processing to yield the final processing result. Obtaining the stream data and performing Map processing on it can be done by a first system, while performing Reduce processing on the Map output to obtain the final processing result can be done by a second system. In other words, the data processing described in the embodiments can be split into two parts at the Map processing boundary. The first part (performing Map processing on the obtained stream data to obtain a first processing result) can be carried out by the first system, and the second part (obtaining the first processing result and performing Reduce processing on it to obtain a second processing result) can be carried out by the second system. The first system and the second system can be different systems and, accordingly, can be located in different computer rooms. In the embodiments of the present application, a computer room containing a first system is referred to as a first computer room, and a computer room containing a second system is referred to as a second computer room.
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort shall fall within the scope of protection of the present application.
Fig. 2 is a schematic diagram of the data processing procedure provided by an embodiment of the present application, which specifically includes the following steps:
S201: The first system obtains stream data and stores it.
In the embodiments of the present application, when operations personnel need to monitor, analyze or otherwise operate on at least one service in real time by means of stream processing, the stream data generated by the at least one service can be obtained through the first system. The first system can handle the processing of multiple services, and in practice there may be multiple first systems performing such processing, located in different computer rooms. For example, a service platform may correspond to multiple first systems, which may be located in first computer rooms in different regions, each responsible for processing the services of its own region.
Besides obtaining the stream data generated by at least one service, the first system can also process the obtained stream data to obtain a first processing result; the specific processing is described in detail in the subsequent steps.
In the embodiments of the present application, the first system can obtain the stream data from the user's service processing. Specifically, while carrying out service processing, a user can send service information to the first system in the first computer room, and the first system can use the service information sent by the user as the stream data.
Alternatively, after the first system obtains the service information sent by the user, it can perform the corresponding service processing according to that information to obtain a service log, and then use the service log as the stream data. Of course, the stream data mentioned here can also be data in other forms. In the embodiments of the present application, how the stream data is determined can depend on the specific real-time operation that operations personnel perform on the at least one service.
In the embodiments of the present application, the first system in the first computer room can consist of multiple servers, where servers of different categories have different responsibilities, as shown in Fig. 3.
Fig. 3 is a schematic architecture diagram of the first system provided by an embodiment of the present application.
In Fig. 3, the servers in the first system can be roughly divided into three categories. One category, referred to as service servers, is responsible for obtaining stream data. Another category, referred to as storage servers, stores the stream data obtained by the service servers. The third category, referred to as first processing servers, obtains the stream data stored by the storage servers and processes it to obtain the first processing result.
The storage server can store the stream data obtained by the service server in the form of a data queue, so that the first processing server can subsequently obtain the stream data from the storage server's data queue. Of course, the storage server can also use other caching schemes to store the stream data obtained by the service server, which are not enumerated here.
In the embodiments of the present application, the storage server can use a data queue not only because the stream data must be obtained and processed by the first processing servers in real time, but also because of how a queue behaves: whether the storage server actively sends the stream data in its data queue to a first processing server, or a first processing server actively fetches the stream data from the queue, once a piece of stream data has been obtained by a first processing server it is removed from the data queue. The other first processing servers in the first system then cannot obtain that stream data from the storage server's data queue. This prevents the first processing servers in the first system from repeatedly obtaining the same stream data from the same data queue, and ensures that the second system in the second computer room can subsequently obtain an accurate and valid second processing result.
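The queue behavior described above can be illustrated with a minimal sketch. It assumes an in-process `queue.Queue` standing in for the storage server's data queue and threads standing in for first processing servers; the names `drain` and `run_workers` are hypothetical and do not come from the application.

```python
import queue
import threading


def drain(data_queue, fetched, lock):
    """Act as one first processing server: take records until the queue is empty."""
    while True:
        try:
            # get_nowait() removes the record, so no other worker can take it.
            record = data_queue.get_nowait()
        except queue.Empty:
            return
        with lock:
            fetched.append(record)


def run_workers(records, worker_count=3):
    """Fill a data queue and let several workers consume it concurrently."""
    data_queue = queue.Queue()
    for record in records:
        data_queue.put(record)
    fetched = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=drain, args=(data_queue, fetched, lock))
        for _ in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return fetched
```

Because each record leaves the queue when it is taken, every record is handled by exactly one worker, which is the property the paragraph above relies on.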
Of course, if the storage server in the first system does not cache the stream data in the form of a data queue, it is still necessary to ensure that the first processing servers in the first system do not obtain the same stream data.
If the storage server actively sends stream data to the first processing servers, then for any piece of stream data it stores, once the storage server has successfully sent that piece to one first processing server, it can no longer send it to any other first processing server. In other words, when the storage server actively sends stream data, each piece of stream data can be sent to only one first processing server.
If instead the first processing servers actively fetch stream data from the storage server, the storage server can lock the stream data that a first processing server has obtained. When another first processing server in the first system tries to obtain that stream data from the storage server, it finds the stream data in the locked state, determines that it has already been obtained by another first processing server, and therefore does not obtain it.
One way for the storage server to lock stream data that a first processing server has obtained is as follows: when the storage server determines that a piece of its stored stream data has been obtained by some first processing server, it adds identification information to that stream data. When other first processing servers find that the stream data stored in the storage server carries the identification information, they can determine that it has already been obtained by another first processing server and will not obtain it again. Of course, there are many other ways for the storage server to lock obtained stream data, which are not enumerated here.
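The identification-based locking described above might look like the following sketch. It assumes the storage server keeps records in memory and uses the identifier of the fetching server as the identification information; the class `StorageServer` and the method `try_fetch` are hypothetical names, and the application does not prescribe this particular structure.

```python
import threading


class StorageServer:
    """Hypothetical storage server that marks records once they are fetched."""

    def __init__(self, records):
        self._records = dict(records)   # record id -> payload
        self._claimed_by = {}           # record id -> identification information
        self._guard = threading.Lock()  # makes check-and-mark a single step

    def try_fetch(self, record_id, server_id):
        """Return the payload on the first fetch, or None if already marked."""
        with self._guard:
            if record_id in self._claimed_by:
                return None  # locked: another processing server has it
            self._claimed_by[record_id] = server_id
            return self._records[record_id]
```

Checking the mark and setting it under one lock matters here: if the two steps were separate, two first processing servers could both see an unmarked record and both fetch it.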
It should be noted that, besides the service servers, storage servers and first processing servers described above, the first system can also include other software and hardware devices, such as gateways, routers and load balancers, which can be used for data communication between computer rooms, between servers within a computer room, and so on. A server in the first system can also have multiple functions. For example, the first system need not distinguish storage servers from first processing servers: storing the stream data and processing it to obtain the first processing result can both be done by one server.
S202: Map processing is performed on the stored stream data to obtain a first processing result, the data volume of the first processing result being smaller than that of the stream data.
For each first system, after the storage server in the first system stores the stream data obtained by the service server, the first processing server in the first system can obtain that stream data from the storage server.
For each storage server in the first system, when the storage server actively sends its stored stream data to the first processing servers, it can randomly select one (or more) first processing servers from those in the first system and send the stored stream data to the selected server(s). The storage server can also select one (or more) first processing servers by means of load balancing, and then actively send its stored stream data to the selected server(s).
Likewise, for each first processing server in the first system, when it actively fetches stream data from the storage servers in the first system, it can randomly select one (or more) storage servers and obtain stream data from them. It can also select one (or more) storage servers from those in the first system by means of load balancing, and then obtain stream data from the selected storage server(s).
In addition, a mapping relationship can be established in advance between the storage servers and the first processing servers in the first system. For each storage server, the mapping defines which first processing servers in the first system it should actively send its stored stream data to. Similarly, for each first processing server, the mapping defines which storage servers in the first system it can obtain stream data from.
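The three selection approaches above (random selection, load balancing, and a pre-established mapping) can be sketched as follows. Round-robin is used here as one simple form of load balancing, which is an assumption: the application does not name a specific policy. The server names and the functions `pick_random`, `make_round_robin` and `targets_for` are hypothetical.

```python
import itertools
import random


def pick_random(servers):
    """Random selection of one server from the candidates."""
    return random.choice(servers)


def make_round_robin(servers):
    """Return a picker that cycles through the servers in order."""
    cycle = itertools.cycle(servers)
    return lambda: next(cycle)


# Pre-established mapping: storage server -> first processing servers it sends to.
STATIC_MAPPING = {
    "storage-1": ["proc-1", "proc-2"],
    "storage-2": ["proc-3"],
}


def targets_for(storage_server, mapping=STATIC_MAPPING):
    """Look up which first processing servers a storage server should use."""
    return mapping[storage_server]
```

The same helpers apply in the other direction, where a first processing server chooses which storage server to fetch from.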
After a first processing server in the first system obtains stream data from one storage server (or from multiple storage servers), it can perform Map processing on the obtained stream data to obtain the corresponding first processing result.
For example, suppose the stream data obtained by the service servers in the first system is log information about users sending red packets. In practice, users can send red packets in four business scenarios, so the stream data obtained by the service servers contains log information about users sending red packets in these four scenarios.
After the service servers in the first system obtain this stream data in real time, the storage servers in the first system can store it in their own data queues. When a first processing server obtains stream data from one or more storage servers in the first system, it can perform Map processing on it: the field identifying the business scenario in each piece of stream data is used as the Key, and the number of occurrences of that piece of stream data is used as the Value, yielding key-value pairs (Key-Value). These key-value pairs are the first processing result that the first processing server obtains by performing Map processing on the stream data; the content of the first processing result can be as shown in the table below.
Key-value pair (Key-Value)
(business scenario 1, 1)
(business scenario 2, 1)
(business scenario 1, 1)
(business scenario 3, 1)
(business scenario 4, 1)
(business scenario 2, 1)
(business scenario 1, 1)
...
Table 1
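The Map processing that yields pairs like those in Table 1 can be sketched as below. The record layout is an assumption made for illustration: the field names `scenario`, `user`, `amount` and `device` are hypothetical, and only the scenario field is taken from the description above.

```python
def map_record(record):
    """Keep only the scenario field and emit one (Key, Value) pair."""
    return (record["scenario"], 1)


def map_batch(records):
    """Apply Map processing to a batch of log records."""
    return [map_record(record) for record in records]


# Hypothetical red packet log records; every field except the scenario is dropped.
logs = [
    {"scenario": "business scenario 1", "user": "u1", "amount": 5, "device": "ios"},
    {"scenario": "business scenario 2", "user": "u2", "amount": 8, "device": "web"},
    {"scenario": "business scenario 1", "user": "u3", "amount": 2, "device": "android"},
]
first_result = map_batch(logs)
```

Each output pair carries only the key and a count of 1, which is why the first processing result is smaller than the log records it was derived from.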
In the embodiments of the present application, the first processing result obtained by the first processing server through Map processing is far smaller in data volume than the stream data it obtained. This is because, during Map processing, the first processing server can remove unnecessary fields from the stream data and retain only the fields needed for the real-time operation described above (that is, the real-time processing of the obtained stream data), and then derive the first processing result from those fields. In practice, the fields needed for the real-time operation account for only a small part of the whole stream data, so the first processing result is significantly smaller in data volume than the obtained stream data.
Accordingly, when the first processing server subsequently transmits the first processing result across computer rooms to the second system in the second computer room, the relatively high latency of cross-computer-room transmission still applies. However, compared with the prior art, the first processing result sent to the second system is much smaller in data volume, so the time taken to transmit it is also greatly reduced. This greatly improves the efficiency of cross-computer-room data transmission and thus improves the efficiency of data processing to a certain extent.
S203: The first processing result is sent to the second system, so that the second system obtains a second processing result from the first processing result and outputs it, the second system being located in a second computer room.
The first system (located in the first computer room) can, through its first processing servers, send the first processing results obtained by those servers to the second system by cross-computer-room data transmission. The second system can also consist of multiple servers, where servers of different categories have different responsibilities, as shown in Fig. 4.
Fig. 4 is a schematic architecture diagram of the second system provided by an embodiment of the present application.
In Fig. 4, the servers in the second system in the second computer room can be roughly divided into two categories. One category, referred to as acquisition servers, obtains the first processing results sent by the first processing servers of the first system. The other category, referred to as second processing servers, performs Reduce processing on the first processing results obtained by the acquisition servers to obtain the second processing result.
In the embodiments of the present application, the second system can use its multiple acquisition servers to obtain the first processing results sent by the first systems (which can be located in different first computer rooms), and send the obtained first processing results directly to one second processing server for Reduce processing, to obtain and output the second processing result. Alternatively, the obtained first processing results can first undergo some processing, and the resulting intermediate processing results are then gathered in a second processing server for Reduce processing, which obtains and outputs the second processing result.
Continuing the example above, suppose one acquisition server in the second system (located in the second computer room) obtains the first processing results sent respectively by three first processing servers in the first system (located in the first computer room), as shown in the table below.
Table 2
The acquisition server can further process the multiple first processing results it has obtained to produce an intermediate processing result, as shown in the table below.
Table 3
Here, the mechanism by which the key-value pairs in the first processing results are divided along the Key dimension (that is, the field identifying the business scenario) to obtain the intermediate processing result can be referred to as the Shuffle mechanism. Of course, in the embodiments of the present application, the Shuffle mechanism can also be performed by the first processing servers in the first system. That is, the first system can use its first processing servers and a preset Shuffle mechanism to process the first processing result and send the processed result to the second system. This can be understood as follows: after a first processing server obtains the first processing result, it converts it into the form shown in Table 3 using the preset Shuffle mechanism, and then sends the result in that form to the second system.
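A minimal sketch of a Shuffle step that divides key-value pairs by Key is given below. The contents of Table 3 are not reproduced in this text, so the grouped shape used here (each Key mapped to the list of its Values) is an assumption, and the function name `shuffle` is hypothetical.

```python
from collections import defaultdict


def shuffle(first_results):
    """Group the Values of several first processing results by Key."""
    grouped = defaultdict(list)
    for result in first_results:
        for key, value in result:
            grouped[key].append(value)
    return dict(grouped)


# Three hypothetical first processing results, one per first processing server.
received = [
    [("business scenario 1", 1), ("business scenario 2", 1)],
    [("business scenario 1", 1)],
    [("business scenario 3", 1), ("business scenario 1", 1)],
]
intermediate = shuffle(received)
```

Because the function only depends on its input pairs, the same step could run on a first processing server, an acquisition server or a second processing server, in line with the alternatives the description allows.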
In the embodiments of the present application, the Shuffle mechanism can also be performed by the second processing servers in the second system. After the acquisition servers in the second system obtain the first processing results from the first system, they can send them directly to a second processing server, which processes them using the preset Shuffle mechanism to obtain intermediate processing results, performs Reduce processing on those intermediate results, and finally obtains the second processing result. Of course, after receiving at least one first processing result from the acquisition servers, the second processing server can also perform Reduce processing on the first processing results directly to obtain the second processing result.
After the second processing server obtains at least one first processing result (which can also be a first processing result that has already been processed), it can perform Reduce processing on it to obtain and output the corresponding second processing result.
Continuing the example above, after the second processing server obtains the first processing results that the acquisition servers in the second system obtained from the first system, it can perform Reduce processing on them and obtain the second processing result shown in the table below.
Table 4
In Table 4, the Value in each second processing result obtained by the second processing server represents the number of red packets sent by users in the corresponding business scenario. In this way, the second processing server can count, in real time, the number of red packets sent by users in each business scenario and output it.
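As a minimal sketch of the Reduce step, assuming each first system delivers its result already grouped by key (the post-Shuffle form), the per-scenario counts can be obtained by summing the values for each key. The function name `reduce_counts`, the scenario names, and the sample data are assumptions made for illustration; they are not the contents of Table 4.

```python
def reduce_counts(intermediate_results):
    """Sum the values for each key across all intermediate results."""
    totals = {}
    for intermediate in intermediate_results:
        for key, values in intermediate.items():
            totals[key] = totals.get(key, 0) + sum(values)
    return totals


# Hypothetical grouped results received from two first computer rooms.
from_room_1 = {"chat": [1, 1], "group": [1]}
from_room_2 = {"chat": [1], "transfer": [1, 1, 1]}

second_result = reduce_counts([from_room_1, from_room_2])
print(second_result)
```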
It should be noted that, in the embodiments of the present application, the second system (located in the second computer room) may also randomly assign one or more servers to obtain the first processing results sent by each first system (located in the respective first computer rooms), and may likewise randomly designate at least one server to perform Reduce processing on the obtained first processing results to obtain and output the second processing result. In addition, the second system may use load balancing to select several servers from the multiple servers it includes, and have the selected servers obtain the first processing results sent by each first system. Similarly, it may use load balancing to select at least one server from the multiple servers it includes, and have the selected server perform Reduce processing on the obtained first processing results to obtain and output the second processing result.
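The two selection strategies mentioned above, random assignment and load balancing, might look like the following sketch. The application does not specify a load-balancing policy, so "pick the least-loaded servers" is an assumption, as are the function names and the server names.

```python
import random


def pick_random(servers, count, rng=random):
    """Randomly designate `count` distinct servers."""
    return rng.sample(servers, count)


def pick_least_loaded(load_by_server, count):
    """Select the `count` servers with the lowest current load.

    Ties are broken by server name so the choice is deterministic.
    """
    ranked = sorted(load_by_server, key=lambda name: (load_by_server[name], name))
    return ranked[:count]


# Hypothetical servers in the second system and their current load.
load = {"server-a": 5, "server-b": 1, "server-c": 3}
acquisition_servers = pick_least_loaded(load, 2)
reduce_servers = pick_random(list(load), 1)
```

Either function can be used both for choosing acquisition servers and for choosing the servers that perform Reduce processing.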
As can be seen from the above, the first processing result that the first system obtains by processing the acquired stream data has a much smaller data volume than the stream data itself. This greatly reduces the amount of data transmitted across computer rooms, which shortens the time consumed by cross-computer-room transmission and in turn improves data processing efficiency.
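To make the data-volume argument concrete, the sketch below compares the serialized size of some raw stream records with the size of the key-value pairs a Map step might emit from them. The record fields (`user`, `scenario`, `amount`, `timestamp`), the JSON encoding, and the rule that Map keeps only the scenario field are all assumptions for illustration; the application does not describe the record layout.

```python
import json


def map_record(record):
    """Keep only the business-scenario field as the key, with a count of 1."""
    return (record["scenario"], 1)


# Hypothetical raw stream data: 1000 red-packet events with several fields.
raw_stream = [
    {
        "user": f"user-{i}",
        "scenario": "chat" if i % 2 else "group",
        "amount": 8.88,
        "timestamp": 1494900000 + i,
    }
    for i in range(1000)
]
first_result = [map_record(record) for record in raw_stream]

raw_bytes = len(json.dumps(raw_stream).encode("utf-8"))
mapped_bytes = len(json.dumps(first_result).encode("utf-8"))
print(f"raw stream: {raw_bytes} bytes, first processing result: {mapped_bytes} bytes")
```

Only the smaller payload would need to cross computer rooms in this setup.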
It should be noted that, in the embodiments of the present application, obtaining the first processing results sent by each first system and processing them to obtain the second processing result may also be done by a single server in the second system. In other words, each first system may send its first processing result to one designated server in the second system; the designated server collects the first processing results sent by each first system and, based on them, obtains and outputs the corresponding second processing result.
In the prior art, to prevent a crash or similar failure of the computer room that aggregates the stream data (such as computer room 5 in Fig. 1) from disrupting data processing, a standby computer room usually has to be established. The standby computer room must synchronize, in real time, the data acquired by the aggregating computer room and its processing results; only then can it provide effective and timely disaster recovery when the aggregating computer room goes down, so that data processing is not affected by the outage. However, establishing a standby computer room consumes substantial manpower and material resources, which greatly increases the operation and maintenance cost of data processing.
In the embodiments of the present application, by contrast, the servers in the second system (located in the second computer room) do not need to receive large volumes of stream data. They only need to obtain the first processing results, which have a smaller data volume, from each first system (located in the respective first computer rooms), and can then obtain the corresponding second processing result from them. Operation and maintenance personnel therefore do not need to establish a standby computer room for the second computer room in which the second system is located; they only need to set up at least one standby server in the second system. When a second processing server in the second system crashes or otherwise fails, the standby server can obtain the first processing results sent by each first system from the acquisition server in the second system and then, in place of the failed second processing server, perform Reduce processing on them to obtain and output the corresponding second processing result. The number of standby servers is not greater than the number of second processing servers.
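The standby-server behaviour described above can be sketched as a simple failover loop: try the second processing server first and fall back to a standby server if it is down. The class `ReduceServer`, the `healthy` flag used to simulate a crash, and the server names are hypothetical; how a failure is actually detected is not specified in the application.

```python
class ReduceServer:
    """A server that can perform Reduce processing on first processing results."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy  # simulated health state, an assumption

    def reduce(self, first_results):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        totals = {}
        for key, value in first_results:
            totals[key] = totals.get(key, 0) + value
        return totals


def reduce_with_failover(primary, standbys, first_results):
    """Run Reduce on the primary server, or on the first healthy standby."""
    for server in [primary, *standbys]:
        try:
            return server.name, server.reduce(first_results)
        except RuntimeError:
            continue  # this server failed; try the next one
    raise RuntimeError("no second processing server available")


# The acquisition server still holds the first processing results,
# so the standby can read the same input as the failed server.
collected = [("chat", 1), ("chat", 1), ("group", 1)]
primary = ReduceServer("second-processing-1", healthy=False)
standby = ReduceServer("standby-1")
used_server, second_result = reduce_with_failover(primary, [standby], collected)
```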
As can be seen from the above, providing at least one standby server for the second processing server in the second system costs far less than establishing a standby computer room as in the prior art, which reduces the cost of the data processing procedure.
Similarly, in the embodiments of the present application, at least one standby server may be provided in the second system for the acquisition server that obtains the first processing results, so that when the acquisition server in the second system fails, the standby server can take its place and obtain the first processing results sent by each first system.
Of course, the standby servers mentioned above (whether for the acquisition server or for the second processing server) may be located in the second system in the second computer room, or in systems in other computer rooms.
It should be noted that, in the embodiments of the present application, the first system need not include a storage server; the stream data may instead be stored by the service server in the first system. That is, the service server in the first system may obtain the stream data and store it itself, for example in a data queue that the service server maintains.
Accordingly, the first processing server in the first system may obtain the stored stream data from the service server and perform Map processing on it to obtain the first processing result.
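A minimal sketch of this variant, in which the service server keeps the stream data in its own queue and the first processing server drains that queue to perform Map processing, is shown below. The class `ServiceServer`, the function `map_from_queue`, the `scenario` field, and the in-memory `deque` are assumptions for illustration only.

```python
from collections import deque


class ServiceServer:
    """A service server that stores incoming stream data in its own queue."""

    def __init__(self):
        self.queue = deque()

    def receive(self, record):
        self.queue.append(record)


def map_from_queue(service_server):
    """Drain the queue in arrival order and emit one (scenario, 1) pair per record."""
    pairs = []
    while service_server.queue:
        record = service_server.queue.popleft()
        pairs.append((record["scenario"], 1))
    return pairs


service = ServiceServer()
service.receive({"user": "u1", "scenario": "chat"})
service.receive({"user": "u2", "scenario": "group"})
service.receive({"user": "u3", "scenario": "chat"})
first_result = map_from_queue(service)
```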
To further illustrate the data processing method provided by the embodiments of the present application, the whole data processing procedure is described in detail below using a practical scenario with multiple first systems.
Fig. 5 is a schematic diagram, provided by the embodiments of the present application, of multiple first systems aggregating their respective first processing results into one second system for processing.
For each first system, the service server in that first system (located in a first computer room) may store the stream data it obtains in the storage server included in the first system, where the storage server may store the stream data in the form of a data queue. The first processing server in the first computer room may obtain the stream data from the data queue of the storage server and perform Map processing on it to obtain the first processing result.
Each first system may obtain at least one first processing result through the at least one first processing server it includes, process the obtained first processing results using the preset Shuffle mechanism, and send the processed first processing results to the same second system (located in the second computer room). The second system may obtain the first processing results sent by each first system through the acquisition server it includes, and have its second processing server perform Reduce processing on the results obtained by the acquisition server, finally obtaining and outputting a second processing result.
In the above embodiments, each first system may be located in a computer room capable of actual business processing, while the second system may be located in a computer room dedicated to aggregating the first processing results sent by the first systems and deriving the second processing result. Of course, the computer room in which the second system is located may also be one selected from the first computer rooms in which the first systems are located. In other words, besides being able to process actual business, the first system in a first computer room may also be able to aggregate the first processing results produced by other first systems and combine them with its own first processing result to obtain the second processing result.
In the embodiments of the present application, after obtaining stream data, the first system in the first computer room may first perform Map processing on it to obtain the first processing result, and then send that result to the second system in the second computer room, so that the second system can obtain and output the second processing result based on the first processing results it receives. Because the first processing result that the first system obtains by processing the acquired stream data has a much smaller data volume than the stream data itself, the amount of data transmitted across computer rooms is greatly reduced, which shortens the time consumed by cross-computer-room transmission and in turn improves data processing efficiency.
In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors, and switches) or a software improvement (an improvement to a method flow). With the development of technology, however, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD), such as a field-programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is now mostly done with "logic compiler" software, which is similar to the software compilers used in program development. The source code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There are many HDLs, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can readily be obtained simply by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the devices included in it for implementing various functions may also be regarded as structures within the hardware component. Alternatively, the devices for implementing various functions may be regarded both as software modules implementing a method and as structures within a hardware component.
The systems, devices, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above device is described by dividing it into various units by function. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-persistent storage in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media that can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element preceded by "includes a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes that element.
Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The system embodiments in particular are described relatively briefly because they are substantially similar to the method embodiments; for related details, refer to the description of the method embodiments.
The above are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of its claims.

Claims (10)

1. A data processing method, comprising:
a first system obtaining stream data and storing it, the first system being located in a first computer room;
performing Map processing on the stored stream data to obtain a first processing result, the data volume of the first processing result being smaller than that of the stream data;
sending the first processing result to a second system, so that the second system obtains a second processing result according to the first processing result, the second system being located in a second computer room.
2. The method of claim 1, wherein the first system comprises: at least one service server, at least one storage server, and at least one first processing server;
the first system obtaining stream data and storing it specifically comprises:
the first system obtaining the stream data through the service server, and obtaining the stream data from the service server and storing it through the storage server;
performing Map processing on the stored stream data to obtain the first processing result specifically comprises:
performing, through the first processing server, Map processing on the stream data stored by the storage server to obtain the first processing result.
3. The method of claim 2, wherein sending the first processing result to the second system specifically comprises:
processing the first processing result through the first processing server and a preset Shuffle mechanism, and sending the processed first processing result to the second system.
4. A data processing method, comprising:
a second system obtaining a first processing result obtained by at least one first system, the second system being located in a second computer room; wherein the first processing result
performing Reduce processing on the obtained first processing result to obtain a second processing result.
5. The method of claim 4, wherein the second system comprises at least one acquisition server and at least one second processing server;
the second system obtaining the first processing result obtained by at least one first system specifically comprises:
obtaining, through the at least one acquisition server included in the second system, the first processing result obtained by the at least one first system;
performing Reduce processing on the obtained first processing result to obtain and output the second processing result specifically comprises:
performing, through the second processing server, Reduce processing on the first processing result obtained by the at least one acquisition server to obtain the second processing result.
6. The method of claim 5, wherein the second system further comprises at least one standby server, configured so that, when the at least one second processing server fails, the at least one standby server performs Reduce processing on the first processing result obtained by the at least one acquisition server to obtain the second processing result.
7. A data processing system, comprising: at least one service server, at least one storage server, and at least one first processing server, the system being located in a first computer room;
the service server obtains stream data;
the storage server obtains the stream data from the service server and stores it;
the first processing server performs Map processing on the stream data stored by the storage server to obtain a first processing result.
8. The system of claim 7, wherein the first processing server processes the first processing result using a preset Shuffle mechanism and sends the processed first processing result to a second system, the second system being located in a second computer room.
9. A data processing system, comprising: at least one acquisition server and at least one second processing server, the system being located in a second computer room;
the acquisition server obtains a first processing result obtained by at least one first system;
the second processing server performs Reduce processing on the first processing result obtained by the at least one acquisition server to obtain a second processing result.
10. The system of claim 9, further comprising: at least one standby server;
the standby server, upon determining that the at least one second processing server has failed, performs Reduce processing on the first processing result obtained by the at least one acquisition server to obtain the second processing result.
CN201710343310.7A 2017-05-16 2017-05-16 Data processing method and device Active CN107341084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710343310.7A CN107341084B (en) 2017-05-16 2017-05-16 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710343310.7A CN107341084B (en) 2017-05-16 2017-05-16 Data processing method and device

Publications (2)

Publication Number Publication Date
CN107341084A true CN107341084A (en) 2017-11-10
CN107341084B CN107341084B (en) 2021-07-06

Family

ID=60220240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710343310.7A Active CN107341084B (en) 2017-05-16 2017-05-16 Data processing method and device

Country Status (1)

Country Link
CN (1) CN107341084B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3392373B2 (en) * 1999-07-21 2003-03-31 末広 江良 Map creation management system and house diagram creation management system
CN103034540A (en) * 2012-11-16 2013-04-10 北京奇虎科技有限公司 Distributed information system, device and coordinating method thereof
CN103345514A (en) * 2013-07-09 2013-10-09 焦点科技股份有限公司 Streamed data processing method in big data environment
US20140222787A1 (en) * 2011-12-29 2014-08-07 Teradata Us, Inc. Techniques for accessing a parallel database system via external programs using vertical and/or horizontal partitioning
CN104572921A (en) * 2014-12-27 2015-04-29 北京奇虎科技有限公司 Cross-datacenter data synchronization method and device
CN104809231A (en) * 2015-05-11 2015-07-29 浪潮集团有限公司 Mass web data mining method based on Hadoop
CN105069703A (en) * 2015-08-10 2015-11-18 国家电网公司 Mass data management method of power grid
US20160085810A1 (en) * 2014-09-24 2016-03-24 Oracle International Corporation Scaling event processing using distributed flows and map-reduce operations
CN105578212A (en) * 2015-12-15 2016-05-11 南京邮电大学 Point-to-point streaming media real-time monitoring method under big data stream computing platform
CN105677752A (en) * 2015-12-30 2016-06-15 深圳先进技术研究院 Streaming computing and batch computing combined processing system and method
CN106294445A (en) * 2015-05-27 2017-01-04 华为技术有限公司 The method and device stored based on the data across machine room Hadoop cluster

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LINLIN DING, JUNCHANG XIN, GUOREN WANG,SHAN HUANG: "ComMapReduce: An Improvement of MapReduce with Lightweight Communication Mechanisms", 《DATA&KNOWLEDGE ENGINEERING》 *
段庆新: "大数据时代网络基础架构的思考", 《信息通信技术与政策》 *

Also Published As

Publication number Publication date
CN107341084B (en) 2021-07-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201019

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201019

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant