Disclosure of Invention
The object of the present invention is to provide a system for fusing multi-channel data chain processing, so as to solve the problems identified in the background art.
In order to achieve the above object, the present invention provides a system for fusing multi-channel data chain processing, which includes a unified receiving unit, a responsibility chain processing unit, a data chain processing unit and a service chain storage unit;
the unified receiving unit is used for providing a unified chained processing entry and uniformly receiving data from the input end;
the responsibility chain processing unit is used for splitting the data received by the unified receiving unit into a plurality of data blocks according to the complexity of the service;
the data chain processing unit is used for sequentially receiving the plurality of data blocks of the responsibility chain processing unit and enabling the data blocks to sequentially enter a plurality of processing links for processing, and each processing link comprises a data checking link, a data cleaning link, a data processing link and a data fusion link;
the service chain storage unit is used for storing the data blocks processed by the data chain processing unit, delivering the processing result to the service chain, and continuing to execute the service chain downward until the service chain is finished, at which point the related processing information is returned to the requester.
In use, multi-branch data enters the unified service data processing entry of the unified receiving unit, is processed there and is output as a first service; the responsibility chain processing unit then applies responsibility chain processing to the first service, dividing it into a plurality of small data blocks in flow order; the data chain processing unit then performs data checking 1-N, data cleaning 1-N, data processing 1-N and data fusion 1-N on the data blocks in sequence and outputs the processed data blocks as a second service; finally, the second service is stored by the service chain storage unit.
As a further improvement of the technical solution, the unified receiving unit uses a gateway to coordinate and unify the request distribution routes, and processes the request through a plurality of filters, including a pre-filter, a route filter, a post-filter, and an error filter.
As a further improvement of the technical solution, the responsibility chain processing unit includes a complexity detection module and a data segmentation module;
the complexity detection module is used for calculating the memory occupied by the received data processing;
the data segmentation module is used for receiving the result of the complexity detection module and segmenting the data into a plurality of data blocks, wherein the memory occupied in processing each data block does not exceed the processing memory limit.
Assuming that the complexity detection module calculates the memory occupied in processing the received data as m, and that the processing memory limit is n, the data segmentation module divides the received data into k data blocks such that the memory occupied in processing each block, m/k, satisfies m/k ≤ n. Subsequent data processing is thereby made easier, overload caused by the memory required for processing the received data exceeding the processing memory limit is avoided, and the stability of the system is improved.
As a further improvement of the technical solution, a calculation formula of the complexity detection module is as follows:
m=n*m1
where m is the occupied memory for receiving data, n is the number of data elements, and m1 is the size of each data element.
As a further improvement of the technical solution, the data segmentation module adopts a clustering segmentation algorithm, and includes the following steps:
inputting received data, and classifying the received data by adopting K-means clustering;
determining an adaptability value function according to the clustering center, and terminating classification if the adaptability value of the classified data is greater than a specified threshold value;
and dividing the classified data to generate a plurality of data blocks.
As a further improvement of the technical solution, the K-means clustering calculation formula is:
wherein x is a new clustering center, j is a data element code, n is a data element number, and p is an adaptability value.
As a further improvement of the technical solution, the data chain processing unit includes a data checking module, a data cleaning module, a data processing module, and a data fusion module;
the data checking module is used for verifying the integrity of data in the data block in the data checking link;
the data cleaning module is used for rechecking the data in the data block in the data cleaning link, deleting repeated information, correcting existing errors and ensuring data consistency;
the data processing module is used for further processing the data after the data cleaning module in the data processing link and converting the data into a standard, clear and easily analyzed structure;
the data fusion module is used for receiving the data blocks processed by the data processing module in a data fusion link, synthesizing a plurality of fields of the data blocks into a new field and acquiring the processed data.
As a further improvement of the technical solution, the data chain processing unit adopts a chain processing algorithm, and includes the following steps:
the data blocks enter a plurality of processing links in sequence;
detecting whether the data block submitted by the requester has completed its processing task in the current processing link;
if not, continuing to process the data in the current processing link until a completion signal is issued, whereupon the current processing link delivers the data block to the next processing link;
and repeating the steps until the data block sequentially passes through a plurality of processing links to output data.
As a further improvement of the technical solution, the service chain storage unit adopts a chain storage structure.
Compared with the prior art, the invention has the beneficial effects that:
1. In the system for fusing multi-channel data chain processing, the responsibility chain processing unit splits the received data into a plurality of data blocks according to the complexity of the service, so that subsequent data processing is easier, overload caused by the memory required for processing the received data exceeding the processing memory limit is avoided, and the stability of the system is improved.
2. In the system for fusing multi-channel data chain processing, the data chain processing unit receives the data blocks in sequence and passes them through the processing links in order; if one processing link needs to be modified or maintained, only the development code corresponding to that link has to be changed, which is convenient, reduces the difficulty of modification and improves practicability.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Referring to fig. 1-3, the present embodiment provides a system for fusing multi-channel data chain processing, which includes a unified receiving unit 100, a responsibility chain processing unit 200, a data chain processing unit 300, and a service chain storage unit 400;
the unified receiving unit 100 is used for providing a unified chained processing entry and uniformly receiving data at an input end;
specifically, the unified receiving unit 100 uses a gateway to coordinate and unify request distribution routes, and processes the request through a plurality of filters, including a pre-filter, a route filter, a post-filter, and an error filter;
the pre-filter is used for handling general concerns, such as authentication, rate limiting, circuit breaking and degradation, and caching;
the route filter is used for protocol conversion and routing work;
the post-filter is used for receiving the response information returned by the route filter and recording response statistics and logs, which facilitates subsequent queries of the data receiving records;
the error filter is used for handling errors: when an exception occurs in any of the three filters above, the error filter filters out the erroneous data;
therefore, considering that data entering for processing in a scattered manner leads to gaps in the data and hinders the subsequent integration and linking of the data, and in order to process all data centrally, the unified receiving unit 100 uses the gateway to coordinate and unify the request distribution routes and establishes a unified chained processing entry that uniformly receives the data of the input end; the next operation is carried out only after all data to be processed have been received, which guarantees the integrity of the data and makes unified processing convenient.
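The filter sequence described above can be illustrated with a minimal sketch. It is not the gateway implementation of the embodiment; the class `RequestContext`, the route table and the token check are hypothetical placeholders chosen only to show the order in which the pre-filter, route filter, post-filter and error filter act.

```python
from dataclasses import dataclass, field


@dataclass
class RequestContext:
    """Hypothetical per-request state passed through the filters."""
    path: str
    token: str
    response: str = ""
    log: list = field(default_factory=list)


def pre_filter(ctx):
    # General concerns before routing; only authentication is shown here.
    if ctx.token != "valid-token":  # placeholder credential check (assumption)
        raise PermissionError("authentication failed")


def route_filter(ctx):
    # Routing: map the request path to a hypothetical backend service.
    routes = {"/orders": "order-service", "/users": "user-service"}
    ctx.response = "routed to " + routes[ctx.path]


def post_filter(ctx):
    # Response statistics and logging for later lookup.
    ctx.log.append(f"{ctx.path} -> {ctx.response}")


def error_filter(ctx, exc):
    # Runs only when one of the three filters above raises an exception.
    ctx.response = f"error: {exc}"
    ctx.log.append(ctx.response)


def handle(ctx):
    """Single unified entry: every request passes through the same filters."""
    try:
        pre_filter(ctx)
        route_filter(ctx)
        post_filter(ctx)
    except Exception as exc:
        error_filter(ctx, exc)
    return ctx.response
```

Calling `handle(RequestContext("/orders", "valid-token"))` routes the request, whereas a request carrying any other token is diverted to the error filter.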
The responsibility chain processing unit 200 is configured to split the data received by the unified receiving unit 100 into a plurality of data blocks according to the complexity of the service itself;
in order to improve the stability of the system and avoid overload operation, the responsibility chain processing unit 200 includes a complexity detection module and a data segmentation module;
the complexity detection module is used for calculating the memory occupied by the received data processing;
the data segmentation module is used for receiving the result of the complexity detection module and segmenting the data into a plurality of data blocks, wherein the memory occupied in processing each data block does not exceed the processing memory limit.
Assuming that the complexity detection module calculates the memory occupied in processing the received data as m, and that the processing memory limit is n, the data segmentation module divides the received data into k data blocks such that the memory occupied in processing each block, m/k, satisfies m/k ≤ n. Subsequent data processing is thereby made easier, overload caused by the memory required for processing the received data exceeding the processing memory limit is avoided, and the stability of the system is improved.
The calculation formula of the complexity detection module is as follows:
m=n*m1
wherein m is an occupied memory of received data, n is the number of data elements, and m1 is the size of each data element;
specifically, the received data is composed of n data elements, and the memory occupied by the received data can be obtained simply by detecting the size of each data element, so that the integrity of the system is ensured.
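As a worked illustration of the relations m = n*m1 and m/k ≤ limit, the sketch below computes the occupied memory and the smallest number of blocks k that keeps each block within the processing memory limit. The function names are hypothetical, and the parameter `limit` stands for the processing memory limit so that it is not confused with the element count n in the formula. Splitting into equally sized consecutive blocks is an assumption made here for simplicity; the embodiment itself divides the data by clustering.

```python
def occupied_memory(element_count, element_size):
    # m = n * m1: number of data elements times the size of each element.
    return element_count * element_size


def block_count(m, limit):
    # Smallest integer k such that m / k <= limit (integer ceiling division).
    return max(1, (m + limit - 1) // limit)


def split_blocks(elements, k):
    # Cut the element list into at most k consecutive blocks of near-equal size.
    size = max(1, -(-len(elements) // k))
    return [elements[i:i + size] for i in range(0, len(elements), size)]


# Example: 1000 elements of 64 bytes each with a 16384-byte processing limit.
m = occupied_memory(1000, 64)          # 64000 bytes
k = block_count(m, 16384)              # 4 blocks, each needing 16000 bytes
blocks = split_blocks(list(range(1000)), k)
```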
The data segmentation module adopts a clustering segmentation algorithm and comprises the following steps:
inputting received data, and classifying the received data by adopting K-means clustering;
determining an adaptability value function according to the clustering center, and terminating classification if the adaptability value of the classified data is greater than a specified threshold value;
dividing the classified data to generate a plurality of data blocks;
specifically, the received data are first classified, and an adaptability value function is then determined from the resulting clustering centers. Classification terminates once the adaptability value of the classified data exceeds the specified threshold, which indicates that the state structure of the classified data is compatible with the system. If the adaptability value of the data is below the specified threshold, the state structure of the data is incompatible with the system and the data cannot be run; otherwise the data can be run in the system, and the larger the adaptability value, the better the segmentation of the data. The classified data are then divided to generate a plurality of data blocks, which improves the precision of data segmentation and allows the subsequent operations to be carried out effectively.
The K-means clustering calculation formula is as follows:
wherein, x is a new clustering center, j is data element code, n is data element number, p is adaptability value, and data are divided clearly by forming a new clustering center;
specifically, the K-means clustering comprises the following steps: first, the data of n data elements are divided in advance into K groups, and K objects are randomly selected as initial clustering centers; the distance between each object and each clustering center is then calculated, and each object is assigned to its nearest clustering center, a clustering center together with the objects assigned to it representing a cluster; each time a sample is assigned, the clustering center of the cluster is recalculated from the objects currently in the cluster; this process is repeated until the adaptability value of the classified data is greater than the specified threshold, that is, until the state structure of the data is compatible with the system, at which point the classification is complete and the data can subsequently be divided accurately.
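The clustering steps above can be sketched as follows for one-dimensional numeric data. Two points are assumptions rather than features of the embodiment: the adaptability value function is not specified in the text, so the sketch uses a hypothetical one, 1/(1 + sum of squared distances to the assigned centers), which grows as clusters tighten; and the centers are recomputed once per full assignment pass rather than after every single sample. The `max_iter` and `seed` parameters are likewise illustrative.

```python
import random


def kmeans(points, k, threshold, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # K randomly chosen initial centers
    clusters = []
    for _ in range(max_iter):
        # Assign each point to its nearest clustering center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Recompute each center as the mean of the points assigned to it;
        # an empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        # Hypothetical adaptability value: higher when clusters are tighter.
        sse = sum((p - centers[i]) ** 2
                  for i, c in enumerate(clusters) for p in c)
        if 1.0 / (1.0 + sse) > threshold:
            break                        # classification terminates
    return centers, clusters


centers, clusters = kmeans([1.0, 1.2, 0.8, 10.0, 10.2, 9.8], k=2, threshold=0.8)
```

Each resulting cluster can then be emitted as one data block for the subsequent processing links.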
The data chain processing unit 300 is configured to sequentially receive a plurality of data blocks of the responsibility chain processing unit 200, and sequentially enter the data blocks into a plurality of processing links for processing, where the processing links include a data check link, a data cleaning link, a data processing link, and a data fusion link;
specifically, the data chain processing unit 300 includes a data checking module, a data cleaning module, a data processing module, and a data fusion module;
the data checking module is used for verifying the integrity of the data in a data block in the data checking link: a check value of the original data is calculated with a specified algorithm, the receiver recalculates a check value with the same algorithm, and if the recalculated value is identical to the check value supplied with the data, the data is complete;
the data cleaning module is used for rechecking the data in the data block in the data cleaning link, deleting repeated information, correcting existing errors and ensuring data consistency;
specifically, because the data are a subject-oriented collection extracted from a plurality of business systems and include historical data, it is unavoidable that some data are erroneous and some data conflict with each other; such erroneous or conflicting data are clearly unwanted. The data cleaning module therefore checks whether the data meet requirements according to the reasonable value range of each variable and the relationships between variables, and finds data that are out of the normal range, logically unreasonable or mutually contradictory. In addition, owing to errors in investigation, coding and entry, the data may contain invalid values and missing values; the data are therefore input into a data cleaning processor, cleaned by estimation, whole-case deletion, variable deletion, pairwise deletion and the like, and output in the desired format.
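A minimal sketch of the checking and cleaning links follows. The text does not name the check algorithm, so SHA-256 over a canonical JSON encoding is used here purely as an assumption; the record layout, the `age` field and its valid range are likewise hypothetical. Of the cleaning strategies listed above, only duplicate removal, range checking and whole-case deletion are shown.

```python
import hashlib
import json


def check_value(records):
    # Assumed "specified algorithm": SHA-256 of a canonical JSON encoding.
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_complete(records, supplied_value):
    # The receiver recomputes the check value and compares it with the
    # value supplied alongside the data.
    return check_value(records) == supplied_value


def clean(records, valid_age=(0, 120)):
    seen, cleaned = set(), []
    for rec in records:
        key = json.dumps(rec, sort_keys=True)
        if key in seen:
            continue                     # repeated information
        seen.add(key)
        age = rec.get("age")
        if age is None:
            continue                     # missing value: whole-case deletion
        if not valid_age[0] <= age <= valid_age[1]:
            continue                     # outside the reasonable value range
        cleaned.append(rec)
    return cleaned
```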
The data processing module is used for further processing the data after the data cleaning module in the data processing link and converting the data into a standard, clear and easily analyzed structure;
specifically, the data conversion includes:
structure conversion: in data analysis, the data (or sampled data) need to be structurally converted according to different service requirements, which mainly refers to conversion between a one-dimensional data table and a two-dimensional data table;
row-column conversion: data analysis reports often view the data from different dimensions, such as the time dimension or the regional dimension, which requires converting rows into columns and vice versa, also called transposition.
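Both conversions can be illustrated with a short sketch. The function names and the sample labels are hypothetical; the two-dimensional table is taken to be a cross table with row and column labels, and the one-dimensional table a list of (row label, column label, value) records.

```python
def transpose(table):
    # Row-column conversion: rows become columns and columns become rows.
    return [list(column) for column in zip(*table)]


def to_one_dimensional(row_labels, col_labels, values):
    # Structure conversion: flatten a two-dimensional cross table into
    # one record per (row label, column label, value).
    return [(r, c, values[i][j])
            for i, r in enumerate(row_labels)
            for j, c in enumerate(col_labels)]


sales = [[10, 20],
         [30, 40]]
by_quarter = transpose(sales)            # [[10, 30], [20, 40]]
long_form = to_one_dimensional(["north", "south"], ["Q1", "Q2"], sales)
```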
The data fusion module is used for receiving the data blocks processed by the data processing module in the data fusion link, synthesizing a plurality of fields of the data blocks into a new field and acquiring the processed data.
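Synthesizing several fields into a new field might look like the sketch below. The record layout, the field names and the choice of joining the values with a separator are assumptions for illustration; the text does not state how the fields are combined.

```python
def fuse_fields(record, source_fields, new_field, sep=" "):
    # Copy every field except the ones being fused, then add the new field.
    fused = {k: v for k, v in record.items() if k not in source_fields}
    fused[new_field] = sep.join(str(record[f]) for f in source_fields)
    return fused


def fuse_blocks(blocks, source_fields, new_field):
    # Gather the records of all processed blocks into one fused result.
    return [fuse_fields(rec, source_fields, new_field)
            for block in blocks for rec in block]


record = {"id": 7, "first_name": "Ada", "last_name": "Lovelace"}
fused = fuse_fields(record, ["first_name", "last_name"], "full_name")
```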
In summary, the data blocks are processed in sequence by the data checking module, the data cleaning module, the data processing module and the data fusion module, so that each data block is handled link by link, and when a service change occurs in one of the service flows, only the service code of the corresponding flow needs to be changed;
for example, when a group of data blocks is processed, the data blocks pass in turn through the data checking module, the data cleaning module, the data processing module and the data fusion module: the data blocks are verified in the data checking module; after verification they are passed to the data cleaning module, which deletes useless data; they are then passed to the data processing module for further processing; finally they are gathered in the data fusion module and combined to obtain the processed data. If the checking method or flow in the data checking link needs to be modified or maintained, only the corresponding development code in the data checking module has to be changed.
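The maintainability argument above can be sketched by modelling each link as an independent function and the chain as an ordered list of links. The link bodies are deliberately trivial placeholders and the names are hypothetical; the point shown is only that replacing one link leaves the others untouched.

```python
def check_link(block):
    # Data checking: reject a block that contains missing values.
    if any(value is None for value in block):
        raise ValueError("incomplete data block")
    return block


def clean_link(block):
    # Data cleaning: drop repeated values while keeping the original order.
    return list(dict.fromkeys(block))


def process_link(block):
    # Data processing: normalize every value to a standard form.
    return [str(value).strip().lower() for value in block]


def fuse_link(block):
    # Data fusion: combine the values into one result.
    return "|".join(block)


DEFAULT_LINKS = [check_link, clean_link, process_link, fuse_link]


def run_chain(block, links=DEFAULT_LINKS):
    # Each link receives the output of the previous link.
    for link in links:
        block = link(block)
    return block


def strict_check_link(block):
    # A revised checking method: also reject empty blocks.
    if not block:
        raise ValueError("empty data block")
    return check_link(block)


# Only the first entry changes; the other three links are reused as they are.
revised_links = [strict_check_link] + DEFAULT_LINKS[1:]
```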
The service chain storage unit 400 is configured to store the data blocks processed by the data chain processing unit 300, deliver the processing result to the service chain, and continue to execute the service chain downward until the service chain is finished, at which point the related processing information is returned to the requester.
The service chain storage unit 400 adopts a chain storage structure, the logic relationship between the data elements is embodied by using pointers, and by using the structure, the storage units of the data elements are not required to be continuous, namely two logically adjacent elements can be stored in physically non-adjacent storage units, and nodes of the nonlinear relationship can be represented in a linear addressing memory;
the chain type storage structure is mainly characterized in that:
the nodes comprise an information field for storing the self information of the data elements and a pointer field for representing the link information among the data elements, so that the storage density is lower than that of a sequential storage structure, and the utilization rate of a storage space is lower;
the logically adjacent data elements are not necessarily physically adjacent, and can be used for storing various logical structures such as linear tables, trees, graphs and the like;
the insertion and deletion operations are flexible, and the data elements do not need to be moved, and only the values of the pointer fields in the nodes need to be changed, so that the data can be effectively stored.
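The characteristics listed above correspond to a singly linked list, sketched below. The class names `Node` and `ChainStore` are hypothetical; the sketch only shows that each node holds an information field and a pointer field, and that insertion and deletion change pointer values without moving any stored element.

```python
class Node:
    def __init__(self, data):
        self.data = data   # information field: the data element itself
        self.next = None   # pointer field: link to the logically next element


class ChainStore:
    def __init__(self):
        self.head = None

    def append(self, data):
        node = Node(data)
        if self.head is None:
            self.head = node
            return
        cur = self.head
        while cur.next is not None:
            cur = cur.next
        cur.next = node

    def insert_after(self, target, data):
        cur = self.head
        while cur is not None and cur.data != target:
            cur = cur.next
        if cur is None:
            raise KeyError(target)
        node = Node(data)
        node.next = cur.next   # only two pointer fields change
        cur.next = node

    def delete(self, data):
        prev, cur = None, self.head
        while cur is not None and cur.data != data:
            prev, cur = cur, cur.next
        if cur is None:
            raise KeyError(data)
        if prev is None:
            self.head = cur.next
        else:
            prev.next = cur.next   # bypass the node; nothing is moved

    def to_list(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.data)
            cur = cur.next
        return out
```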
In use, as shown in fig. 2, multi-branch data enters the unified service data processing entry of the unified receiving unit 100, is processed there and is output as a service a; the responsibility chain processing unit 200 then applies responsibility chain processing to the service a, dividing it into a plurality of small data blocks in flow order; the data chain processing unit 300 then performs data checking 1-N, data cleaning 1-N, data processing 1-N and data fusion 1-N on the data blocks in sequence and outputs the processed data blocks as a service b; finally, the service b is stored by the service chain storage unit 400.
Example 2
Considering that, when data blocks are processed in sequence by the data checking module, the data cleaning module, the data processing module and the data fusion module, it cannot be guaranteed that another data block enters a processing link only after the current data block has completed the processing flow of that link, which affects the data processing result, this embodiment differs from embodiment 1 in that:
the data chain processing unit 300 adopts a chain processing algorithm, and comprises the following steps:
the data blocks enter a plurality of processing links in sequence;
detecting whether the data block submitted by the requester has completed its processing task in the current processing link;
if not, continuing to process the data in the current processing link until a completion signal is issued, whereupon the current processing link delivers the data block to the next processing link;
and repeating the steps until the data block sequentially passes through a plurality of processing links to output data.
Therefore, the data blocks are guaranteed to enter the processing links in sequence, and the next data block can enter a processing link only after the processing task of the current data block in that link is complete, which improves the orderliness of data processing.
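The chain processing algorithm can be illustrated with a small tick-based simulation. This is a sketch under stated assumptions, not the implementation of the embodiment: each link is assumed to need a fixed number of ticks per block (`link_durations`, hypothetical), a remaining count of zero plays the role of the completion signal, and each link holds at most one block at a time.

```python
from collections import deque


def run_links(blocks, link_durations):
    waiting = deque(blocks)
    # One slot per link: None when free, else [block, remaining_ticks].
    links = [None] * len(link_durations)
    finished = []
    while waiting or any(links):
        # Visit links from last to first so that a link freed in this tick
        # can accept the block handed over by the link before it.
        for i in reversed(range(len(links))):
            slot = links[i]
            if slot is None:
                continue
            if slot[1] > 0:
                slot[1] -= 1             # task not complete: keep processing
            if slot[1] == 0:             # completion signal issued
                if i == len(links) - 1:
                    finished.append(slot[0])
                    links[i] = None
                elif links[i + 1] is None:
                    links[i + 1] = [slot[0], link_durations[i + 1]]
                    links[i] = None
                # If the next link is busy, the block waits in its own link.
        # A new block enters the first link only when that link is free.
        if waiting and links[0] is None:
            links[0] = [waiting.popleft(), link_durations[0]]
    return finished
```

With four links of unequal duration, for instance `run_links(["A", "B", "C"], [1, 2, 1, 1])`, the blocks leave the chain in the order in which they entered, because no block can enter a link that is still occupied.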
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the above embodiments and the description merely set out preferred embodiments of the invention and are not intended to limit it. The scope of the invention is defined by the appended claims and equivalents thereof.