CN109324898B - Service processing method and system

Info

Publication number: CN109324898B
Application number: CN201810983481.0A
Authority: CN
Inventors: 董涛; 卜云涛
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd
Priority date: 2018-08-27
Filing date: 2018-08-27
Publication date: 2022-12-02
Anticipated expiration: 2038-08-27
Also published as: CN109324898A

Abstract

The invention discloses a service processing method and system. The service to be processed is classified to determine one or more product versions corresponding to it, and the data volume of each product version is then estimated. Because the data volume of each product version is exactly the data volume that the subsequent Reduce reduction tasks will handle, a corresponding number of Reduce reduction tasks can be applied for based on the data volume of each product version. This avoids the uneven resource allocation caused by applying for too many or too few Reduce reduction tasks and achieves a reasonable allocation of Reduce tasks to the service to be processed. Finally, the service to be processed is processed in a distributed manner based on the corresponding number of Reduce reduction tasks.

Description

Service processing method and system

Technical Field

The present application relates to the field of distributed technologies, and in particular, to a method and a system for processing a service.

Background

In the distributed system infrastructure, the core design is HDFS and MapReduce. HDFS provides storage for massive data, and MapReduce provides computation for massive data.

Although the distributed system infrastructure has a huge amount of computing resources, the tasks of a large number of map (mapping) stages are finally gathered into a small number of reduce (reduction) computing stages.

If the task volume of the map computation stage does not match the task volume of the reduce computation stage, resources are wasted. For example, if there are too few reduce tasks and too many map-stage tasks, the reduce stage will run for too long, and tasks may even fail because memory usage becomes too large. If there are too many reduce tasks, resources are wasted.

Therefore, how to reasonably allocate the MapReduce resource is a problem which needs to be solved urgently at present.

Disclosure of Invention

The invention provides a service processing method and a service processing system, which aim to solve or partially solve the technical problem of resource allocation in a MapReduce stage.

In order to solve the above technical problem, the present invention provides a service processing method, where the method includes:

classifying the service to be processed and determining one or more product versions corresponding to the service to be processed;

estimating the data volume of each product version of the service to be processed;

applying for a corresponding number of Reduce reduction tasks based on the data volume of each product version of the service to be processed;

and carrying out distributed processing on the service to be processed based on the Reduce reduction tasks of the corresponding quantity.

Preferably, the classification parameters include: log category, log service ID, and product version;

classifying the service to be processed so as to determine one or more product versions corresponding to the service to be processed, which specifically includes:

classifying the service to be processed according to log categories to obtain a first classification result in each log category;

classifying the first classification result in each log category according to the log service ID to obtain a second classification result in each log service ID;

and classifying the second classification result in each log service ID according to the product version to determine one or more types of product versions corresponding to the service to be processed.

Preferably, the applying for the Reduce reduction tasks of corresponding quantities based on the data volume of each product version of the service to be processed specifically includes:

determining the number of to-be-applied Reduce reduction tasks based on the data volume of each product version of the to-be-processed service;

and applying for the Reduce reduction tasks with corresponding quantity based on the quantity to be applied of the Reduce reduction tasks.

Preferably, the determining the number to be applied for the Reduce reduction task based on the data volume of each product version of the service to be processed specifically includes:

judging whether the data volume of each product version of the service to be processed is larger than a preset data volume threshold value or not;

if so, distributing a corresponding Reduce reduction task to the first product version which is larger than the preset data volume threshold;

if not, combining two or more types of second product versions smaller than the preset data volume threshold value into a third product version, and distributing corresponding Reduce reduction tasks to the third product version; the difference value between the data volume of the third product version and the preset data volume threshold value is within a preset range;

obtaining a mapping relation between each product version of the service to be processed and the Reduce reduction task based on the Reduce reduction task corresponding to the first product version and the Reduce reduction task corresponding to the third product version;

counting the number of Reduce reduction tasks corresponding to the first product version and the number of Reduce reduction tasks corresponding to the third product version, and determining the number of Reduce reduction tasks to be applied for.

Preferably, the preset data amount threshold is obtained by the following steps: determining the data volume which can be processed by a single Reduce reduction task according to the resource threshold of the Reduce reduction task, and determining the data volume which can be processed by the single Reduce reduction task as the preset data volume threshold.

Preferably, the performing distributed processing on the to-be-processed service based on the Reduce reduction tasks of the corresponding number specifically includes:

dividing the service to be processed into a plurality of subtasks, inputting the subtasks into a Map frame, and respectively performing mapping calculation processing to obtain intermediate data sets with the number corresponding to the plurality of subtasks;

classifying the intermediate data set to obtain one or more product versions of each intermediate data in the intermediate data set; wherein each product version in the intermediate data set corresponds one-to-one to a product version in the service to be processed;

and inputting each product version of the intermediate data set into the corresponding distributed Reduce reduction task for reduction calculation processing based on the mapping relation between each product version of the service to be processed and the Reduce reduction task.

Preferably, the classifying the intermediate data set specifically includes:

classifying each intermediate data in the intermediate data set according to log categories to obtain a third classification result in each log category;

classifying the third classification result in each log category according to the log service ID to obtain a fourth classification result in each log service ID;

and classifying the fourth classification result in each log service ID according to the product version to determine one or more types of product versions corresponding to each intermediate data in the intermediate data set.

Preferably, the data volume of each product version of the service to be processed is: the visit volume of each product version of the service to be processed.

In another aspect of the present invention, a service processing system is disclosed, the system comprising:

the first classification module is used for classifying the service to be processed and determining one or more product versions corresponding to the service to be processed;

the pre-estimation module is used for pre-estimating the data volume of each product version of the service to be processed;

the application module is used for applying for corresponding quantity of Reduce reduction tasks based on the data volume of each product version of the service to be processed;

and the processing module is used for carrying out distributed processing on the service to be processed based on the Reduce reduction tasks with the corresponding quantity.

Preferably, the first classification module specifically includes:

the first classification submodule is used for classifying the service to be processed according to the log categories to obtain a first classification result in each log category;

the second classification submodule is used for classifying the first classification result in each log category according to the log service ID to obtain a second classification result in each log service ID;

and the third classification submodule is used for classifying the second classification result in each log service ID according to the product version so as to determine one or more product versions corresponding to the service to be processed.

Preferably, the application module specifically includes:

the determining module is used for determining the number of the Reduce reduction tasks to be applied based on the data volume of each product version of the to-be-processed service;

and the application submodule is used for applying the Reduce reduction tasks of the corresponding quantity based on the quantity to be applied of the Reduce reduction tasks.

Preferably, the determining module specifically includes:

the judging module is used for judging whether the data volume of each product version of the service to be processed is larger than a preset data volume threshold value or not;

the first distribution module is used for, if so, distributing a corresponding Reduce reduction task to the first product version which is larger than the preset data volume threshold value;

the second distribution module is used for, if not, combining two or more types of second product versions smaller than the preset data volume threshold value into a third product version, and distributing corresponding Reduce reduction tasks to the third product version; the difference value between the data volume of the third product version and the preset data volume threshold value is within a preset range;

an obtaining module, configured to obtain a mapping relationship between each product version of the service to be processed and a Reduce reduction task based on the Reduce reduction task corresponding to the first product version and the Reduce reduction task corresponding to the third product version;

and the counting module is used for counting the quantity of the Reduce reduction tasks corresponding to the first product version and the quantity of the Reduce reduction tasks corresponding to the third product version, and determining the quantity to be applied of the Reduce reduction tasks.

Preferably, the processing module specifically includes:

the mapping module is used for dividing the service to be processed into a plurality of subtasks, inputting the subtasks into a Map frame, and respectively performing mapping calculation processing to obtain intermediate data sets with the number corresponding to the subtasks;

the second classification module is used for classifying the intermediate data set so as to obtain one or more product versions of each intermediate data in the intermediate data set; wherein, each product version in the intermediate data set corresponds to each product version in the service to be processed one by one;

and the reduction module is used for inputting each product version of the intermediate data set into the corresponding distributed Reduce task for reduction calculation processing based on the mapping relation between each product version of the service to be processed and the Reduce task.

Preferably, the second classification module specifically includes:

the fourth classification submodule is used for classifying each intermediate data in the intermediate data set according to log categories to obtain a third classification result in each log category;

the fifth classification submodule is used for classifying the third classification result in each log category according to the log service ID to obtain the fourth classification result in each log service ID;

and the sixth classification submodule is used for classifying the fourth classification result in each log service ID according to the product version so as to determine one or more types of product versions corresponding to each piece of intermediate data in the intermediate data set.

Preferably, the data volume of each product version of the service to be processed is the visit volume of each product version of the service to be processed.

The invention discloses a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.

The invention discloses a computer device, comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the method when executing the program.

Through one or more technical schemes of the invention, the invention has the following beneficial effects or advantages:

The invention discloses a service processing method and system. The service to be processed is classified to determine one or more product versions corresponding to it, and the data volume of each product version is then estimated. Because the data volume of each product version is exactly the data volume that the subsequent Reduce reduction tasks will handle, a corresponding number of Reduce reduction tasks can be applied for based on the data volume of each product version. This avoids the uneven resource allocation caused by applying for too many or too few Reduce reduction tasks and achieves a reasonable allocation of Reduce tasks to the service to be processed. Finally, the service to be processed is processed in a distributed manner based on the corresponding number of Reduce reduction tasks.

The above description is only an overview of the technical solutions of the present invention. It is provided so that the technical means of the present invention can be understood more clearly and implemented in accordance with the content of the description, and so that the above and other objects, features, and advantages of the present invention can be more clearly understood.

Drawings

Various additional advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 shows a flow diagram of a service processing method according to one embodiment of the invention;

FIG. 2 shows a multi-way tree diagram of the service to be processed after classification according to one embodiment of the invention;

FIG. 3 shows a schematic diagram of a service processing system according to one embodiment of the invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The embodiment of the invention provides a service processing method and a service processing system, which are used for solving the technical problem of resource waste in the prior art.

Referring to fig. 1, an embodiment of the present invention discloses a service processing method, including:

Step 11, classifying the service to be processed, and determining one or more product versions corresponding to the service to be processed.

The classification parameters of the present embodiment generally include: log category, log service ID, and product version. The log category is the level 1 category, the log service ID sits below the log category as the level 2 category, and the product version sits below the log service ID as the level 3 category. Therefore, when classifying the service to be processed, the service is first classified according to log category to obtain a first classification result in each log category. A service to be processed may contain several log categories, for example log category 1 and log category 2. Each log category may in turn include several log service IDs, for example log category 1 includes ID1 and ID2, so within each log category the first classification result is classified according to log service ID to obtain a second classification result in each log service ID. Further, each log service ID includes one or more product versions, for example ID1 includes product version 1 and product version 2, so within each log service ID the second classification result is classified according to product version to determine the one or more product versions corresponding to the service to be processed.

Therefore, the service to be processed is actually composed of one or more product versions, and after classification it is finally divided into one or more product versions. The classification is performed in the order of log category, log service ID, and product version for the following reason: logs of different categories are not necessarily handled by the same Reduce task (Reduce tasks are oriented to logs of different categories), whereas different product versions may be handled by the same Reduce task. The classification must therefore proceed in the order of log category, log service ID, and product version, and the levels cannot be merged or divided arbitrarily.
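The three-level classification described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the record layout (a dict with the hypothetical keys `category`, `service_id` and `version`) and the function names are not taken from the embodiment.

```python
from collections import defaultdict


def classify(records):
    """Group records by log category, then log service ID, then product version.

    Assumption: each record is a dict with the hypothetical keys
    'category', 'service_id' and 'version'.
    """
    # Level 1 -> level 2 -> level 3 -> list of records at that leaf.
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for rec in records:
        tree[rec["category"]][rec["service_id"]][rec["version"]].append(rec)
    return tree


def leaf_paths(tree):
    """List every leaf as a (category, service_id, version) path."""
    return [
        (category, service_id, version)
        for category, services in tree.items()
        for service_id, versions in services.items()
        for version in versions
    ]


# Hypothetical sample data, for illustration only.
records = [
    {"category": "net", "service_id": "ID1", "version": "product version 1"},
    {"category": "net", "service_id": "ID1", "version": "product version 2"},
    {"category": "net", "service_id": "ID2", "version": "product version 3"},
    {"category": "cpu", "service_id": "ID1", "version": "product version 1"},
]
tree = classify(records)
leaves = leaf_paths(tree)
```

Only the leaves carry data; the two upper levels exist to keep records of different log categories and log service IDs apart.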

The present embodiment includes 18 log categories, for example: activity, activity times, net, netReceive, netLinkTime, netErrcode, netTimes, memory, monitor, filinfo, func, io, processinfo, anr, block, cpu, fps, netMidSend, netMidReceive, and the like.

Each log category in turn has one or more log service IDs, such as mobile safe, clean_droid, ganme_union, etc.; the number of log service IDs keeps growing as more products are connected.

There are one or more product versions for each log service ID, and the version number increases as the products iterate for each product.

To better illustrate the classification result, Table 1 gives a specific example of classification according to log category, log service ID, and product version.

As can be seen from table 1, a service to be processed in this embodiment finally includes 4 product versions, which are: product version 1, product version 2, product version 3 and product version 4. Each product version has a respective number.

TABLE 1

As can be seen from Table 1, the "log category" and the "log service ID" mainly serve to classify: no data volume is calculated after classifying by "log category" or "log service ID", and the task volume is calculated only after classifying by product version. The log category and the log service ID are thus used only for distinguishing; the structure is similar to a multi-way tree in which only the leaf nodes (product versions) are effective.

Referring to FIG. 2, a schematic diagram of the multi-way tree formed after classifying the service to be processed is shown. It should be noted that the multi-way tree in FIG. 2 is only used to explain the classification of the service to be processed in this embodiment more intuitively and imposes no limitation; besides a multi-way tree, the classified result may be represented in other forms, such as a list or a set.

After one or more product versions of the service to be processed are determined, the following steps are performed.

Step 12, estimating the data volume of each product version of the service to be processed.

After one or more product versions of the service to be processed are determined, each product version has its own data volume. For example, in Table 1 product version 1 appears twice and the sum of its data volumes is 5M, product version 2 has 10M, and so on. The data volume of a product version is calculated per type of product version, not per occurrence.

And the data volume of each product version of the service to be processed is the visit volume of each product version of the service to be processed.
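As a minimal sketch of this per-type summation, the Python below adds up the volumes of all occurrences of the same product version. The split of product version 1 into 2M and 3M is a hypothetical choice made only so that the total matches the 5M mentioned above; the function name is likewise an assumption.

```python
from collections import defaultdict


def estimate_volume_per_version(entries):
    """Sum the data volume (in M) of every occurrence of each product version."""
    totals = defaultdict(float)
    for version, size_m in entries:
        totals[version] += size_m
    return dict(totals)


# Hypothetical occurrences: product version 1 appears twice (2M + 3M = 5M).
entries = [
    ("product version 1", 2.0),
    ("product version 1", 3.0),
    ("product version 2", 10.0),
]
volumes = estimate_volume_per_version(entries)
```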

Step 13, applying for a corresponding number of Reduce reduction tasks based on the data volume of each product version of the service to be processed.

In this embodiment, when applying for Reduce reduction tasks, the number of Reduce reduction tasks to be applied for is first determined based on the data volume of each product version of the service to be processed; the corresponding number of Reduce reduction tasks is then applied for based on that number.

Specifically, there is a correspondence between the data volume of each product version of the service to be processed and the number of Reduce reduction tasks to be applied for. This number can be determined from the data volume of each product version of the service to be processed in the following manner:

First, it is judged whether the data volume of each product version of the service to be processed is greater than a preset data volume threshold. The preset data volume threshold is calculated from the resource threshold of the Reduce reduction task: the data volume that a single Reduce reduction task can process is determined according to that resource threshold, and this data volume is taken as the preset data volume threshold. The data volume of each product version of the service to be processed is then compared, one by one, against the data volume that a single Reduce reduction task can process, to judge whether it is greater than the preset data volume threshold.

If so, a corresponding Reduce reduction task is distributed to the first product version, which is larger than the preset data volume threshold. Specifically, if the data volume of a certain product version of the service to be processed is greater than the preset data volume threshold, that product version is named a first product version; a first product version is thus a single product version whose data volume is greater than the preset data volume threshold.

If not, two or more types of second product versions smaller than the preset data volume threshold are combined into a third product version, and a corresponding Reduce reduction task is distributed to the third product version. Specifically, a second product version is a single product version whose data volume is smaller than the preset data volume threshold. Because its data volume is small, allocating a Reduce reduction task to it alone may waste the resources of that task, so two or more types of second product versions are combined to form a third product version. A third product version is thus a set of two or more types of second product versions, and its data volume is the sum of the data volumes of the second product versions it contains. When distributing a Reduce reduction task to the third product version, it is first judged whether the difference between the data volume of the third product version and the preset data volume threshold is within a preset range; if so, the corresponding Reduce reduction task is distributed to the third product version.
The difference between the data volume of the third product version and the preset data volume threshold is limited to the preset range before the Reduce reduction task is distributed so that the data volume of the third product version stays close to the preset data volume threshold and the Reduce reduction task can be applied for reasonably. This avoids a mismatch between the processing capacity of the applied Reduce reduction task and the third product version caused by too large a difference. If the data volume of the third product version is much larger than the preset data volume threshold, the processing time of the Reduce reduction task becomes very long; if it is much smaller, the resources of the Reduce reduction task are wasted. Therefore, the difference between the data volume of the third product version and the preset data volume threshold must be within a preset range, for example [-3M, 3M].
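The allocation rule above can be sketched in Python as follows. The embodiment does not state how second product versions are chosen for merging, so the largest-first greedy grouping here is an assumption, as are the function name, the treatment of a volume equal to the threshold as "not larger", and the handling of leftover versions that cannot reach the preset range (they simply receive a task of their own).

```python
def plan_reduce_tasks(volumes, threshold, tolerance):
    """Group product versions so that each group shares one Reduce task.

    volumes:   dict mapping product version -> estimated data volume (M)
    threshold: data volume a single Reduce task can process (M)
    tolerance: half-width of the preset range around the threshold (M)
    """
    groups = []
    small = []
    for version, size in volumes.items():
        if size > threshold:
            groups.append([version])        # "first product version"
        else:
            small.append((version, size))   # "second product version"

    # Assumption: merge small versions largest-first (greedy heuristic).
    small.sort(key=lambda item: item[1], reverse=True)
    current, current_size = [], 0.0
    for version, size in small:
        # Close the open group if adding this version would overshoot the range.
        if current and current_size + size > threshold + tolerance:
            groups.append(current)
            current, current_size = [], 0.0
        current.append(version)
        current_size += size
        # Close the group once its total has reached the preset range.
        if current_size >= threshold - tolerance:
            groups.append(current)          # "third product version"
            current, current_size = [], 0.0
    if current:
        groups.append(current)              # leftover below the preset range
    return groups


# Hypothetical volumes with a 10M threshold and a [-3M, 3M] range.
groups = plan_reduce_tasks(
    {"v1": 12.0, "v2": 11.0, "v3": 5.0, "v4": 4.0}, threshold=10.0, tolerance=3.0
)
```

With these hypothetical volumes, v1 and v2 each exceed the threshold and get their own task, while v3 and v4 together total 9M, which differs from the 10M threshold by 1M and so falls within the preset range.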

After the corresponding Reduce reduction tasks are distributed to the first product version and to the third product version, the mapping relation between each product version of the service to be processed and the Reduce reduction tasks is obtained based on the Reduce reduction task corresponding to the first product version and the Reduce reduction task corresponding to the third product version. Specifically, once a Reduce reduction task has been allocated to each type of product version (two or more types of product versions may correspond to the same Reduce reduction task), every product version corresponds to a Reduce reduction task, and the mapping relation between each product version of the service to be processed and each Reduce reduction task can be obtained.

Referring to Table 2, assume that there are 4 types of product versions in the service to be processed: product version 1, product version 2, product version 3, and product version 4. Product version 1 and product version 2 each correspond to their own Reduce reduction task, namely Reduce reduction task 1 and Reduce reduction task 2. Product version 3 and product version 4 correspond to the same Reduce reduction task, which is assumed to be numbered Reduce reduction task 3.

TABLE 2

Type	Product version(s)	Reduce reduction task
First product version	Product version 1	Reduce reduction task 1
First product version	Product version 2	Reduce reduction task 2
Third product version	Product version 3 and product version 4	Reduce reduction task 3

As an optional embodiment, after the corresponding Reduce reduction tasks are distributed to the first product version and to the third product version, the number of Reduce reduction tasks corresponding to the first product version and the number corresponding to the third product version are counted to determine the number of Reduce reduction tasks to be applied for. Referring to Table 2, counting the Reduce reduction tasks corresponding to each product version gives 3 Reduce reduction tasks to be applied for.
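The mapping relation and the count derived from Table 2 can be reproduced with a small Python sketch. The function name and the 1-based task numbering are assumptions chosen to mirror the table.

```python
def build_mapping(groups):
    """Map each product version to the number of its Reduce reduction task.

    groups: list of lists; each inner list holds the product versions
            that share one Reduce reduction task.
    """
    mapping = {}
    for task_id, group in enumerate(groups, start=1):
        for version in group:
            mapping[version] = task_id
    return mapping


# The grouping shown in Table 2.
groups = [
    ["product version 1"],
    ["product version 2"],
    ["product version 3", "product version 4"],
]
mapping = build_mapping(groups)
# Number of Reduce reduction tasks to be applied for.
num_reduce_tasks = len(set(mapping.values()))
```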

Step 14, performing distributed processing on the service to be processed based on the corresponding number of Reduce reduction tasks.

The steps of distributed processing generally include: classifying; map mapping processing; reduce reduction processing. Therefore, the distributed processing of the present embodiment is generally as follows:

The service to be processed is divided into a plurality of subtasks, which are input into the Map framework and subjected to mapping calculation respectively, to obtain a number of intermediate data sets corresponding to the plurality of subtasks. The service to be processed may be divided into subtasks randomly; for example, MapReduce may first divide the data into a plurality of first key/value pairs. These are then input into the Map framework to obtain second (new) key/value pairs, which are the intermediate data of this embodiment. After the mapping processing of the Map framework, an intermediate data set comprises one or more intermediate data: each mapping operation yields one intermediate data, and the mapping processing of each subtask yields one intermediate data set.
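The Map stage described above can be imitated in plain Python. This is a sketch rather than Hadoop code: the tab-separated log line layout, the round-robin split, and both function names are hypothetical.

```python
def split_into_subtasks(lines, num_subtasks):
    """Divide the input lines into subtasks (round-robin, as one possible split)."""
    return [lines[i::num_subtasks] for i in range(num_subtasks)]


def map_subtask(lines):
    """Turn each input line into a new key/value pair (one intermediate data).

    Assumption: a line looks like 'category<TAB>service_id<TAB>version<TAB>payload'.
    The key carries the three classification parameters.
    """
    pairs = []
    for line in lines:
        category, service_id, version, payload = line.split("\t", 3)
        pairs.append(((category, service_id, version), payload))
    return pairs


# Hypothetical log lines, for illustration only.
lines = [
    "net\tID1\tproduct version 1\tlatency=20",
    "net\tID1\tproduct version 2\tlatency=35",
    "cpu\tID2\tproduct version 1\tusage=0.7",
]
# One intermediate data set per subtask.
intermediate_sets = [map_subtask(chunk) for chunk in split_into_subtasks(lines, 2)]
```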

Further, the intermediate data sets are classified, and one or more product versions of each intermediate data in the intermediate data sets are obtained.

In the classification process, for each intermediate data, the intermediate data are classified according to the log category, the log service ID and the product version.

Specifically, each intermediate data in the intermediate data set is classified according to log category to obtain a third classification result in each log category. Each intermediate data includes one or more log categories, and each intermediate data is classified independently of the other intermediate data. For example, intermediate data A includes two log categories, log category A1 and log category A2. After intermediate data A is classified by log category, the third classification result, namely log category A1 and log category A2, is obtained. Intermediate data B includes three log categories: log category B1, log category B2, and log category B3. After intermediate data B is classified by log category, the third classification result, namely log category B1, log category B2, and log category B3, is obtained.

Since the intermediate data are processed independently of each other, the log categories are likewise processed independently of each other. The third classification result in each log category is then classified according to the log service ID to obtain a fourth classification result in each log service ID.

The fourth classification result in each log service ID is then classified according to the product version to determine one or more product versions corresponding to each intermediate data in the intermediate data set.

As can be seen from the above description, taking intermediate data A as an example, intermediate data A is classified according to log category to obtain a third classification result in each log category; the third classification result in each log category is classified according to the log service ID to obtain a fourth classification result in each log service ID; and the fourth classification result in each log service ID is classified according to the product version to determine one or more product versions corresponding to intermediate data A.

Every other intermediate data is processed in the same manner, independently of the rest.
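The three-level classification of a single intermediate data may be illustrated by the following Python sketch. The key layout (log category, log service ID, product version) and the helper names `classify` and `product_versions` are assumptions for illustration only; the embodiment does not prescribe a data structure.

```python
from collections import defaultdict


def classify(intermediate):
    """Group one intermediate data by log category, then log service ID, then product version."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for (category, service_id, version), value in intermediate:
        tree[category][service_id][version].append(value)
    return tree


def product_versions(tree):
    """List the (log category, log service ID, product version) leaves of the tree."""
    return sorted(
        (category, service_id, version)
        for category, by_service in tree.items()
        for service_id, by_version in by_service.items()
        for version in by_version
    )


# Intermediate data A with two log categories, A1 and A2 (values are made up).
intermediate_a = [
    (("A1", "svc-1", "v1"), "x"),
    (("A1", "svc-1", "v2"), "y"),
    (("A2", "svc-9", "v1"), "z"),
]
tree_a = classify(intermediate_a)
versions_a = product_versions(tree_a)
```

Because `classify` takes one intermediate data at a time, each intermediate data can be classified independently of the others, as described above.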

However, since the intermediate data is simply the data obtained by performing Map mapping processing on the service to be processed, and the same classification manner is used, the product versions obtained from the intermediate data set are the same as those obtained by directly classifying the service to be processed; that is, the product versions in the intermediate data set correspond one-to-one with the product versions in the service to be processed. In addition, the mapping relation between each product version and the Reduce reduction tasks is the same, so that each product version of the intermediate data set can be input into the corresponding allocated Reduce reduction task for reduction calculation processing based on the mapping relation between each product version of the service to be processed and the Reduce reduction tasks.
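By way of example only, routing each product version of the intermediate data set to its Reduce reduction task may be sketched as below. The dictionary `version_to_task`, the one-task-per-version simplification, and the toy counting reduction are assumptions of this sketch rather than features stated in the embodiment.

```python
def route_to_reduce_tasks(intermediate_data_set, version_to_task):
    """Send every (product version, value) pair to the Reduce task mapped to its version."""
    inputs = {}
    for intermediate in intermediate_data_set:
        for version, value in intermediate:
            task_id = version_to_task[version]
            inputs.setdefault(task_id, []).append((version, value))
    return inputs


def reduce_count(pairs):
    """Toy reduction: count the values seen for each product version."""
    counts = {}
    for version, _ in pairs:
        counts[version] = counts.get(version, 0) + 1
    return counts


# Hypothetical mapping relation built when the Reduce tasks were applied for.
version_to_task = {"v1": 0, "v2": 1, "v3": 1}
intermediate_data_set = [
    [("v1", "a"), ("v2", "b")],
    [("v1", "c"), ("v3", "d")],
]
task_inputs = route_to_reduce_tasks(intermediate_data_set, version_to_task)
results = {task_id: reduce_count(pairs) for task_id, pairs in task_inputs.items()}
```

The same mapping relation is used for the intermediate data set as for the service to be processed, which is possible because the product versions correspond one-to-one.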

In this embodiment, the data volume of each product version of the service to be processed is estimated in advance, and the number of Reduce reduction tasks to be applied for is determined from it. The Reduce reduction tasks are thus applied for in a reasonable number, which avoids the uneven resource distribution caused by applying for too many or too few Reduce reduction tasks and achieves the purpose of reasonably distributing the Reduce reduction tasks to process the service to be processed.

In addition, during distributed processing, the service to be processed is processed by the Reduce reduction tasks that have been applied for, so that the processing efficiency can be improved.

Based on the same inventive concept, referring to fig. 3, the present embodiment further discloses a service processing system, which includes:

the first classification module 31 is configured to classify services to be processed, so as to determine one or more product versions corresponding to the services to be processed;

the estimation module 32 is used for estimating the data volume of each product version of the service to be processed;

an application module 33, configured to apply for a corresponding number of Reduce reduction tasks based on the data volume of each product version of the service to be processed;

and the processing module 34 is configured to perform distributed processing on the to-be-processed service based on the Reduce reduction tasks of the corresponding number.

As an alternative embodiment, the classification parameter comprises: log type, log service ID, product version;

the first classification module 31 specifically includes:

the second classification submodule is used for classifying the first classification result in each log type according to the log service ID to obtain a second classification result in each log service ID;

As an optional embodiment, the application module 33 specifically includes:

the determining module is used for determining the number to be applied of the Reduce reduction task based on the data volume of each product version of the service to be processed;

and the application submodule is used for applying the Reduce reduction tasks with corresponding quantity based on the quantity to be applied of the Reduce reduction tasks.

As an optional embodiment, the determining module specifically includes:

and the counting module is used for counting the quantity of Reduce reduction tasks corresponding to the first product version and the quantity of Reduce reduction tasks corresponding to the third product version and determining the quantity to be applied of the Reduce reduction tasks.
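One possible reading of the determining logic is sketched below in Python. The use of `math.ceil` to size a product version above the threshold and the greedy first-fit merging of the smaller product versions are assumptions chosen for this sketch; the disclosure only requires that merged (third) product versions have a data volume close to the preset data volume threshold, and the sketch approximates this by filling each group up to the threshold.

```python
import math


def plan_reduce_tasks(volumes, threshold):
    """Return (plan, total): plan pairs each group of product versions with a task count."""
    plan = []
    small = []
    for version, volume in volumes.items():
        if volume > threshold:
            # "First product version": larger than the preset data volume threshold.
            plan.append(([version], math.ceil(volume / threshold)))
        else:
            # "Second product version": candidate for merging.
            small.append((version, volume))

    # Merge second product versions into third product versions (greedy first fit).
    small.sort(key=lambda item: item[1], reverse=True)
    groups = []  # each entry: [list of versions, combined volume]
    for version, volume in small:
        for group in groups:
            if group[1] + volume <= threshold:
                group[0].append(version)
                group[1] += volume
                break
        else:
            groups.append([[version], volume])
    plan.extend((versions, 1) for versions, _ in groups)

    # Counting step: the number of Reduce tasks to be applied for.
    total = sum(count for _, count in plan)
    return plan, total


# Hypothetical data volumes per product version, with a threshold of 100.
volumes = {"v1": 250, "v2": 60, "v3": 30, "v4": 40, "v5": 50}
plan, total = plan_reduce_tasks(volumes, threshold=100)
```

With these made-up volumes, the large version receives three tasks and the four small versions are merged into two groups, so five Reduce reduction tasks would be applied for instead of one per product version.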

As an alternative embodiment, the preset data amount threshold is obtained by: determining the data volume which can be processed by a single Reduce reduction task according to the resource threshold of the Reduce reduction task, and determining the data volume which can be processed by the single Reduce reduction task as the preset data volume threshold.
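As a hedged illustration, the preset data volume threshold could be derived from a resource limit as follows. Treating memory as the limiting resource, the average record size, and the `safety_factor` head-room are all assumptions of this sketch; the embodiment states only that the threshold is the data volume a single Reduce reduction task can process under its resource threshold.

```python
def preset_data_volume_threshold(resource_limit_bytes, bytes_per_record, safety_factor=0.5):
    """Number of records a single Reduce task is assumed able to process."""
    # Keep part of the resource free for the task's own overhead (assumed head-room).
    usable_bytes = int(resource_limit_bytes * safety_factor)
    return usable_bytes // bytes_per_record


# Hypothetical figures: 1,000,000 bytes available per task, 100 bytes per record.
threshold = preset_data_volume_threshold(1_000_000, 100)
```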

As an optional embodiment, the processing module 34 specifically includes:

and the reduction module is used for inputting each product version of the intermediate data set into the corresponding distributed Reduce tasks for reduction calculation processing based on the mapping relation between each product version of the service to be processed and the Reduce reduction tasks.

As an optional embodiment, the second classification module specifically includes:

and the sixth classification submodule is used for classifying the fourth classification result in each log service ID according to the product version so as to determine one or more types of product versions corresponding to each intermediate data in the intermediate data set.

As an optional embodiment, the data volume of each product version of the to-be-processed service is an access volume of each product version of the to-be-processed service.
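Where the data volume is taken to be the access volume, the estimation may be as simple as counting accesses per product version. The sketch below assumes a hypothetical access log of (product version, user) pairs; the log format and the function name `estimate_data_volume` are illustrative only.

```python
from collections import Counter


def estimate_data_volume(access_log):
    """Estimate the data volume of each product version as its number of accesses."""
    return Counter(version for version, _user in access_log)


# Hypothetical access records.
access_log = [("v1", "u1"), ("v2", "u2"), ("v1", "u3"), ("v1", "u1")]
volumes = estimate_data_volume(access_log)
```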

Based on the same inventive concept as in the previous embodiments, embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of any of the methods described above.

Based on the same inventive concept as in the previous embodiments, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the steps of any one of the methods when executing the program.

Through one or more embodiments of the invention, the invention has the following advantages or beneficial effects:

The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the devices in an embodiment may be adaptively changed and arranged in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components of a gateway, proxy server, system according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etcetera does not indicate any ordering. These words may be interpreted as names.

The invention discloses, A1, a service processing method, characterized in that the method comprises:

A2, the method as in A1, wherein the classification parameters include: log type, log service ID, product version;

A3, the method as in A1, wherein the applying for the Reduce reduction tasks of corresponding quantities based on the data volume of each product version of the service to be processed specifically includes:

A4, the method as in A3, wherein the determining the number of Reduce reduction tasks to be applied for based on the data volume of each product version of the service to be processed specifically includes:

if not, mutually combining two or more types of second product versions smaller than the preset data volume threshold value into a third product version, and distributing corresponding Reduce reduction tasks to the third product version; the difference value between the data volume of the third product version and the preset data volume threshold value is within a preset range;

counting the number of Reduce reduction tasks corresponding to the first product version and the number of Reduce reduction tasks corresponding to the third product version, and determining the number of Reduce reduction tasks to be applied for.

A5, the method as in A4, wherein the preset data volume threshold is obtained by: determining the data volume which can be processed by a single Reduce reduction task according to the resource threshold of the Reduce reduction task, and determining the data volume which can be processed by the single Reduce reduction task as the preset data volume threshold.

A6, the method as in A4, wherein the performing distributed processing on the service to be processed based on the Reduce reduction tasks of the corresponding number specifically includes:

classifying the intermediate data set to further obtain one or more product versions of each intermediate data in the intermediate data set; wherein, each product version in the intermediate data set corresponds to each product version in the service to be processed one by one;

A7, the method as in A6, wherein the classifying the intermediate data set specifically includes:

classifying the third classification result in each log class according to the log service ID to obtain a fourth classification result in each log service ID;

and classifying the fourth classification result in each log service ID according to the product version to determine one or more types of product versions corresponding to each piece of intermediate data in the intermediate data set.

A8, the method, characterized in that the data volume of each product version of the service to be processed is the access volume of each product version of the service to be processed.

B9, a service processing system, characterized in that the system includes:

the application module is used for applying for the Reduce reduction tasks with corresponding quantity based on the data volume of each product version of the service to be processed;

and the processing module is used for carrying out distributed processing on the to-be-processed service based on the Reduce reduction tasks with the corresponding quantity.

B10, the system according to B8, wherein the classification parameters include: log type, log service ID, product version;

the first classification module specifically includes:

B11, the system as described in B8, wherein the application module specifically includes:

B12, the system according to B11, wherein the determining module specifically includes:

B13. The system according to B12, wherein the preset data amount threshold is obtained by: determining the data volume which can be processed by a single Reduce reduction task according to the resource threshold of the Reduce reduction task, and determining the data volume which can be processed by the single Reduce reduction task as the preset data volume threshold.

B14, the system as claimed in B12, wherein the processing module specifically includes:

the mapping module is used for dividing the service to be processed into a plurality of subtasks and inputting the subtasks into a Map framework for mapping calculation processing respectively to obtain intermediate data sets with the number corresponding to the subtasks;

the second classification module is used for classifying the intermediate data set so as to obtain one or more product versions of each intermediate data in the intermediate data set, wherein the product versions in the intermediate data set correspond one-to-one with the product versions in the service to be processed;

B15, the system as set forth in B14, wherein the second classification module specifically includes:

the fourth classification submodule is used for classifying each piece of intermediate data in the intermediate data set according to the log categories to obtain a third classification result in each log category;

B16, the system according to B8, wherein the data volume of each product version of the service to be processed is the access volume of each product version of the service to be processed.

C17, a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, carries out the steps of the method according to any one of A1-A8.

C18, a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of A1-A8 when executing the program.

Claims

1. A method for processing a service, the method comprising:

performing distributed processing on the service to be processed based on the Reduce reduction tasks of the corresponding number;

wherein applying for the corresponding number of Reduce reduction tasks based on the data volume of each product version of the service to be processed specifically comprises:

determining the number of Reduce reduction tasks to be applied for based on the data volume of each product version of the service to be processed;

applying for the corresponding number of Reduce reduction tasks based on the number of Reduce reduction tasks to be applied for;

wherein determining the number of Reduce reduction tasks to be applied for based on the data volume of each product version of the service to be processed specifically comprises:

if yes, distributing a corresponding Reduce reduction task to the first product version which is larger than the preset data volume threshold;

2. The method of claim 1, wherein the classification parameters comprise: log type, log service ID, product version;

3. The method of claim 1, wherein the preset data volume threshold is obtained by: determining the data volume which can be processed by a single Reduce reduction task according to the resource threshold of the Reduce reduction task, and determining the data volume which can be processed by the single Reduce reduction task as the preset data volume threshold.

4. The method of claim 1, wherein the performing distributed processing on the to-be-processed service based on the corresponding number of Reduce reduction tasks specifically comprises:

5. The method of claim 4, wherein the classifying the intermediate data set specifically comprises:

classifying each intermediate data in the intermediate data set according to a log category to obtain a third classification result in each log category;

6. The method of claim 1, wherein the data volume of each product version of the service to be processed is the access volume of each product version of the service to be processed.

7. A service processing system, the system comprising:

the processing module is used for performing distributed processing on the service to be processed based on the corresponding number of Reduce reduction tasks;

the application module specifically comprises:

the application submodule is used for applying for the Reduce reduction tasks with corresponding quantity based on the quantity to be applied of the Reduce reduction tasks;

the determining module specifically includes:

8. The system of claim 7, wherein the classification parameters comprise: log type, log service ID, product version;

the first classification module specifically includes:

9. The system of claim 7, wherein the preset data volume threshold is obtained by: determining the data volume which can be processed by a single Reduce reduction task according to the resource threshold of the Reduce reduction task, and determining the data volume which can be processed by the single Reduce reduction task as the preset data volume threshold.

10. The system of claim 7, wherein the processing module specifically comprises:

11. The system according to claim 10, wherein the second classification module specifically comprises:

12. The system of claim 7, wherein the data volume of each product version of the service to be processed is the access volume of each product version of the service to be processed.

13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.

14. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1-6 are implemented when the program is executed by the processor.