CN104793997B - Data processing apparatus and method - Google Patents

Data processing apparatus and method

Info

Publication number
CN104793997B
Authority
CN
China
Prior art keywords
data
row
row data
arrays
mark
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410023109.7A
Other languages
Chinese (zh)
Other versions
CN104793997A (en
Inventor
吴万里
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201410023109.7A
Publication of CN104793997A
Application granted
Publication of CN104793997B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data processing apparatus and method that simplify how a thread processes a large number of data files, increase the speed at which the thread processes files, reduce the consumption of computer resources, and increase the computation speed of the hash algorithm. The method includes: for any row of data in a file to be processed, determining a feature value of the row according to preset key information of the row, and locating, according to the feature value, an element in a preset array structure that is used to store the identifier of a row of data; and determining whether the located element is occupied; if so, determining that the row and the row corresponding to the identifier occupying the element are rows that satisfy a preset condition; otherwise, storing the identifier of the row in the element.

Description

Data processing apparatus and method
Technical field
The present invention relates to the field of communications, and in particular to a data processing apparatus and method.
Background
When processing massive data, it is usually necessary to perform duplicate checking on the data, that is, to find two or more related data segments in the massive data according to the key information of each data segment and to process the related segments accordingly.
Take a data audit service as an example. The process mainly involves a customer relationship management (CRM) system, a convergent billing system (CBS), and an audit system. The CRM and the CBS provide a large number of data files, and the audit system audits these files: according to the key information of each data segment, it finds two or more related data segments across all files, analyzes which information is identical and which differs between the related segments, and compiles the related segments into a report file. Finally, the audit system feeds the report file back to the CRM and the CBS, which then correct their data accordingly.
Because the volume of data that requires duplicate checking is large, a hash algorithm is generally used to save computer software and hardware resources. A hash algorithm converts an input of arbitrary length into a fixed-length output within a specified numerical range. This conversion is a compression mapping: the output usually occupies far less space than the input, different inputs may produce the same output, and the input cannot be uniquely determined from the output. Put simply, a hash algorithm is a function that compresses a message of arbitrary length into a message digest of a fixed length. When duplicate checking is performed on massive data, compressing each data segment into a fixed-length output with a hash algorithm simplifies the processing of the data.
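The compression mapping described above can be illustrated with a minimal sketch. The function name `fixed_length_hash`, the choice of SHA-256 as the underlying hash, and the 16-bit output range are assumptions made for illustration only; the text does not name a specific hash algorithm.

```python
import hashlib


def fixed_length_hash(data: bytes, bits: int = 16) -> int:
    """Map an input of any length to an integer in [0, 2**bits).

    SHA-256 and the default width are illustrative assumptions.
    """
    digest = hashlib.sha256(data).digest()
    # Keep only the low `bits` bits so every output has the same fixed size.
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


short_value = fixed_length_hash(b"TOM")
long_value = fixed_length_hash(b"15895868086 ABC TOM NanJing" * 1000)

# With a 4-bit output there are only 16 possible values, so 17 distinct
# inputs cannot all map to different outputs: some outputs must repeat.
four_bit_outputs = {fixed_length_hash(str(i).encode(), bits=4) for i in range(17)}
```

Both a 3-byte input and a 27,000-byte input land in the same 16-bit range, and the 4-bit case shows why distinct inputs can share an output.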
At present, even when the prior art is used to simplify the processing of massive data, a large amount of computer software and hardware resources is still consumed, such as computer memory and central processing unit (CPU) resources.
Summary of the invention
Embodiments of the present invention provide a data processing apparatus and method that simplify how a thread processes a large number of data files, increase the speed at which the thread processes files, reduce the consumption of computer resources, and increase the computation speed of the hash algorithm.
According to a first aspect, a data processing apparatus is provided, including:
a determination unit, configured to, for any row of data in a file to be processed, determine a feature value of the row according to preset key information of the row, and locate, according to the feature value, an element in a preset array structure that is used to store the identifier of a row of data; and
a processing unit, configured to determine whether the element located by the determination unit is occupied; if so, determine that the row and the row corresponding to the identifier occupying the element are rows that satisfy a preset condition; otherwise, store the identifier of the row in the element.
With reference to the first aspect, in a first possible implementation, the array structure is a two-dimensional array structure containing row elements and column elements, and the number of elements in the array structure is not less than the number of rows of data in all files to be processed.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the number of rows of data in all files to be processed is $2^{m+n}$, where m and n are natural numbers, and the array structure contains $2^{m} + 1$ row elements and $2^{n} + 1$ column elements.
With reference to the first or second possible implementation of the first aspect, in a third possible implementation of the first aspect, the determination unit is specifically configured to:
for any row of data in a file to be processed, obtain the preset key information x of the row, and compute the feature value h(x) of the row by applying a hash function to x;
locate, according to the formula $r(x) = h(x) \mathbin{\&} (2^{m+n} - 1) \gg n$, the row address r(x) of the element in the preset array structure that is used to store the identifier of the row; and
locate, according to the formula $c(x) = h(x) \mathbin{\&} (2^{m+n} - 1) \mathbin{\&} (2^{n} - 1)$, the column address c(x) of the element in the preset array structure that is used to store the identifier of the row;
where $2^{m+n}$ represents the number of rows of data in all files to be processed, and m and n are natural numbers.
With reference to the first aspect, in a fourth possible implementation, the array structure is a one-dimensional array structure containing 1 row element and $2^{n} + 1$ column elements, where n is a natural number.
With reference to the first aspect, in a fifth possible implementation, the identifier of a row of data includes the number of the file containing the row and the line number of the row within that file.
With reference to the first aspect, in a sixth possible implementation, the apparatus further includes:
a message generation unit, configured to add, to a message, related information about the rows that the processing unit has determined to satisfy the preset condition, where the related information includes the key information and the identifiers of those rows.
With reference to the first possible implementation of the first aspect, in a seventh possible implementation of the first aspect, the determination unit and the processing unit each use multiple threads in parallel to process the rows of data in multiple files, with each thread processing the rows of data in one file.
With reference to the seventh possible implementation of the first aspect, in an eighth possible implementation of the first aspect, when the determination unit and the processing unit each determine that a thread is accessing an element in the array structure, they forbid other threads from accessing the row of elements containing that element.
According to a second aspect, a data processing method is provided, including:
for any row of data in a file to be processed, determining a feature value of the row according to preset key information of the row, and locating, according to the feature value, an element in a preset array structure that is used to store the identifier of a row of data; and
determining whether the located element is occupied; if so, determining that the row and the row corresponding to the identifier occupying the element are rows that satisfy a preset condition; otherwise, storing the identifier of the row in the element.
With reference to the second aspect, in a first possible implementation, the array structure is a two-dimensional array structure containing row elements and column elements, and the number of elements in the array structure is not less than the number of rows of data in all files to be processed.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the number of rows of data in all files to be processed is $2^{m+n}$, where m and n are natural numbers, and the array structure contains $2^{m} + 1$ row elements and $2^{n} + 1$ column elements.
With reference to the first or second possible implementation of the second aspect, in a third possible implementation of the second aspect, determining the feature value of a row and locating the element used to store the identifier of a row of data includes:
for any row of data in a file to be processed, obtaining the preset key information x of the row, and computing the feature value h(x) of the row by applying a hash function to x;
locating, according to the formula $r(x) = h(x) \mathbin{\&} (2^{m+n} - 1) \gg n$, the row address r(x) of the element in the preset array structure that is used to store the identifier of the row; and
locating, according to the formula $c(x) = h(x) \mathbin{\&} (2^{m+n} - 1) \mathbin{\&} (2^{n} - 1)$, the column address c(x) of the element in the preset array structure that is used to store the identifier of the row;
where $2^{m+n}$ represents the number of rows of data in all files to be processed, and m and n are natural numbers.
With reference to the second aspect, in a fourth possible implementation, the array structure is a one-dimensional array structure containing 1 row element and $2^{n} + 1$ column elements, where n is a natural number.
With reference to the second aspect, in a fifth possible implementation, the identifier of a row of data includes the number of the file containing the row and the line number of the row within that file.
With reference to the second aspect, in a sixth possible implementation, the method further includes:
adding related information about the rows that satisfy the preset condition to a message, where the related information includes the key information and the identifiers of those rows.
With reference to the first possible implementation of the second aspect, in a seventh possible implementation of the second aspect, multiple threads process the rows of data in multiple files in parallel, with each thread processing the rows of data in one file.
With reference to the seventh possible implementation of the second aspect, in an eighth possible implementation of the second aspect, when a thread accesses an element in the array structure, other threads are forbidden from accessing the row of elements containing that element.
In the data processing method provided in the embodiments of the present invention, a thread determines the feature value of a row of data according to the preset key information of the row, locates, according to the feature value, the element in the preset array structure that is used to store the identifier of a row of data, and determines whether the located element is occupied in order to identify the rows that satisfy the preset condition. This simplifies how the thread processes a large number of data files, increases the speed at which the thread processes files, and reduces the consumption of computer resources.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of an audit system processing the rows of data in files using multiple threads according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a preset array structure according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the files processed by threads according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the rows of data in a file processed by a thread according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention provides a data processing apparatus and method that use a hash algorithm to simplify how a thread processes a large number of data files, increase the speed at which the thread processes files, reduce the consumption of computer resources, and increase the computation speed of the hash algorithm.
As shown in Fig. 1, an embodiment of the present invention provides a data processing apparatus, including:
a determination unit 11, configured to, for any row of data in a file to be processed, determine a feature value of the row according to preset key information of the row, and locate, according to the feature value, an element in a preset array structure that is used to store the identifier of a row of data; and
a processing unit 12, configured to determine whether the element located by the determination unit 11 is occupied; if so, determine that the row and the row corresponding to the identifier occupying the element are rows that satisfy a preset condition; otherwise, store the identifier of the row in the element.
Preferably, the array structure is a two-dimensional array structure containing row elements and column elements, and the number of elements in the array structure is not less than the number of rows of data in all files to be processed.
Preferably, the number of rows of data in all files to be processed is $2^{m+n}$, where m and n are natural numbers, and the array structure contains $2^{m} + 1$ row elements and $2^{n} + 1$ column elements.
Preferably, the determination unit 11 is specifically configured to:
for any row of data in a file to be processed, obtain the preset key information x of the row, and compute the feature value h(x) of the row by applying a hash function to x;
locate, according to the formula $r(x) = h(x) \mathbin{\&} (2^{m+n} - 1) \gg n$, the row address r(x) of the element in the preset array structure that is used to store the identifier of the row; and
locate, according to the formula $c(x) = h(x) \mathbin{\&} (2^{m+n} - 1) \mathbin{\&} (2^{n} - 1)$, the column address c(x) of the element in the preset array structure that is used to store the identifier of the row;
where $2^{m+n}$ represents the number of rows of data in all files to be processed, and m and n are natural numbers.
Preferably, the array structure is a one-dimensional array structure containing 1 row element and $2^{n} + 1$ column elements, where n is a natural number.
Preferably, the identifier of a row of data includes the number of the file containing the row and the line number of the row within that file.
Preferably, the apparatus further includes:
a message generation unit 13, configured to add, to a message, related information about the rows that the processing unit 12 has determined to satisfy the preset condition, where the related information includes the key information and the identifiers of those rows.
Preferably, the determination unit 11 and the processing unit 12 each use multiple threads in parallel to process the rows of data in multiple files, with each thread processing the rows of data in one file.
Preferably, when the determination unit 11 and the processing unit 12 determine that a thread is accessing an element in the array structure, they forbid other threads from accessing the row of elements containing that element.
Specifically, the determination unit 11, the processing unit 12, and the message generation unit 13 may be implemented by an entity such as a processor; the present invention does not limit the entity that implements these modules.
As shown in Fig. 2, an embodiment of the present invention provides a data processing apparatus, including:
a processor 21, configured to, for any row of data in a file to be processed, determine a feature value of the row according to preset key information of the row, locate, according to the feature value, an element in a preset array structure that is used to store the identifier of a row of data, and determine whether the located element is occupied; if so, determine that the row and the row corresponding to the identifier occupying the element are rows that satisfy a preset condition; otherwise, store the identifier of the row in the element; and
a memory 22, configured to store the preset key information of each row of data, and to store the preset array structure and its related information.
Preferably, the array structure is a two-dimensional array structure containing row elements and column elements, and the number of elements in the array structure is not less than the number of rows of data in all files to be processed.
Preferably, the number of rows of data in all files to be processed is $2^{m+n}$, where m and n are natural numbers, and the array structure contains $2^{m} + 1$ row elements and $2^{n} + 1$ column elements.
Preferably, when determining the feature value of a row and locating the element used to store the identifier of a row of data, the processor 21 is specifically configured to:
for any row of data in a file to be processed, obtain the preset key information x of the row, and compute the feature value h(x) of the row by applying a hash function to x;
locate, according to the formula $r(x) = h(x) \mathbin{\&} (2^{m+n} - 1) \gg n$, the row address r(x) of the element in the preset array structure that is used to store the identifier of the row; and
locate, according to the formula $c(x) = h(x) \mathbin{\&} (2^{m+n} - 1) \mathbin{\&} (2^{n} - 1)$, the column address c(x) of the element in the preset array structure that is used to store the identifier of the row;
where $2^{m+n}$ represents the number of rows of data in all files to be processed, and m and n are natural numbers.
Preferably, the array structure is a one-dimensional array structure containing 1 row element and $2^{n} + 1$ column elements, where n is a natural number.
Preferably, the identifier of a row of data includes the number of the file containing the row and the line number of the row within that file.
Preferably, the processor 21 is further configured to add related information about the rows determined to satisfy the preset condition to a message, where the related information includes the key information and the identifiers of those rows.
Preferably, the processor 21 uses multiple threads in parallel to process the rows of data in multiple files, with each thread processing the rows of data in one file.
Preferably, when the processor 21 determines that a thread is accessing an element in the array structure, it forbids other threads from accessing the row of elements containing that element.
As shown in Fig. 3, an embodiment of the present invention provides a data processing method, including:
S31: For any row of data in a file to be processed, determine a feature value of the row according to preset key information of the row, and locate, according to the feature value, an element in a preset array structure that is used to store the identifier of a row of data.
S32: Determine whether the located element is occupied; if so, determine that the row and the row corresponding to the identifier occupying the element are rows that satisfy a preset condition; otherwise, store the identifier of the row in the element.
The identifier of a row of data includes the number of the file containing the row and the line number of the row within that file.
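The occupancy check in S32 can be sketched as follows. This is only an illustration under stated assumptions: the array is modeled as a one-dimensional Python list whose empty slots hold `None`, the element index is assumed to have been located already (as in S31), and the names `Identifier` and `check_or_store` are hypothetical.

```python
from typing import Optional

# An identifier is (file number, line number within that file), as in the text.
Identifier = tuple[int, int]


def check_or_store(
    table: list[Optional[Identifier]],
    index: int,
    identifier: Identifier,
) -> Optional[tuple[Identifier, Identifier]]:
    """Return the matching pair if the element is occupied, else store and return None."""
    occupant = table[index]
    if occupant is not None:
        # Occupied: the earlier row and the current row satisfy the preset condition.
        return (occupant, identifier)
    # Free: remember which row landed here.
    table[index] = identifier
    return None


table: list[Optional[Identifier]] = [None] * 9   # 2**3 + 1 elements, i.e. n = 3
first = check_or_store(table, 6, (1, 1))          # file 1, line 1
second = check_or_store(table, 6, (3, 1))         # file 3, line 1 hits the same element
```

Note that the element keeps the first identifier after a match, so later rows with the same feature value are all paired with that first row.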
Preferably, before step S31 the method further includes: presetting the array structure.
Specifically, the preset array structure is a two-dimensional array structure containing row elements and column elements, where each element is used to store the identifier of a row of data, and the number of elements in the array structure is not less than the number of rows of data in all files to be processed. For example, when the number of rows of data in all files to be processed is $2^{m+n}$, the array structure contains $2^{m} + 1$ row elements and $2^{n} + 1$ column elements, where m and n are natural numbers. Alternatively, the preset array structure is a one-dimensional array structure containing 1 row element and $2^{n} + 1$ column elements, where n is a natural number.
Preferably, step S31 includes:
for any row of data in a file to be processed, obtaining the preset key information x of the row, and computing the feature value h(x) of the row by applying a hash function to x;
locating, according to the formula $r(x) = h(x) \mathbin{\&} (2^{m+n} - 1) \gg n$, the row address r(x) of the element in the preset array structure that is used to store the identifier of the row; and
locating, according to the formula $c(x) = h(x) \mathbin{\&} (2^{m+n} - 1) \mathbin{\&} (2^{n} - 1)$, the column address c(x) of the element in the preset array structure that is used to store the identifier of the row;
where $2^{m+n}$ represents the number of rows of data in all files to be processed, and m and n are natural numbers.
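The address computation can be sketched in a few lines. The formulas do not state how the mask and the shift are grouped; this sketch assumes the mask with $2^{m+n} - 1$ is applied first and the result is then shifted right by n bits, so that the row address comes from the upper m bits and the column address from the lower n bits. The function name `locate` is hypothetical.

```python
def locate(h: int, m: int, n: int) -> tuple[int, int]:
    """Split a feature value into (row address, column address).

    Assumed grouping: (h & (2**(m+n) - 1)) >> n for the row.
    """
    low_bits = h & ((1 << (m + n)) - 1)   # keep the low m+n bits of the feature value
    row = low_bits >> n                   # upper m of those bits select the row
    col = low_bits & ((1 << n) - 1)       # lower n of those bits select the column
    return row, col


# With m = 1 and n = 3 the feature value is reduced to 4 bits:
# one bit for the row and three bits for the column.
example_a = locate(0b1110, 1, 3)
example_b = locate(0b0110, 1, 3)
```

Under this grouping the row address is at most $2^{m} - 1$ and the column address at most $2^{n} - 1$, so both always fall inside an array with $2^{m} + 1$ rows and $2^{n} + 1$ columns.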
Preferably, after step S32 the method further includes:
adding related information about the rows determined in step S32 to satisfy the preset condition to a message, where the related information includes the key information and the identifiers of those rows.
In the present invention, when a large number of files need to be processed, multiple threads may process the rows of data in multiple files in parallel, with each thread processing the rows of data in one file; in this case the preset array structure is a two-dimensional array structure. When a thread accesses an element in the array structure, other threads are forbidden from accessing the row of elements containing that element. This avoids access conflicts between threads on the same row of elements in the array structure and increases the speed at which the threads process the rows of data in the files. When the number of files to be processed is small, a single thread may process the rows of data in multiple files; in this case the preset array structure is a one-dimensional array structure.
The data processing method provided in the embodiments of the present invention is described in detail below, taking the way an audit system processes data as an example.
As shown in Fig. 4, the audit system processes the rows of data in files using multiple threads as follows:
S41: Preset the array structure.
Specifically, the preset array structure is a two-dimensional array structure containing row elements and column elements, where each element is used to store the identifier of a row of data, and the number of elements in the array structure is not less than the number of rows of data in all files to be processed. For example, as shown in Fig. 5, when the number of rows of data in all files to be processed is $2^{m+n}$, the array structure contains $2^{m} + 1$ row elements and $2^{n} + 1$ column elements, where m and n are natural numbers. Alternatively, the preset array structure is a one-dimensional array structure containing 1 row element and $2^{n} + 1$ column elements, where n is a natural number.
Preferably, when a large number of files need to be processed, multiple threads may process the rows of data in multiple files in parallel, with each thread processing the rows of data in one file; in this case the preset array structure is a two-dimensional array structure. When the number of files to be processed is small, a single thread may process the rows of data in multiple files; in this case the preset array structure is a one-dimensional array structure.
The number of rows of data in all files to be processed may be determined as follows. The business system automatically analyzes the number of files to be processed and the size of each file, and determines the byte count of the largest file among them and the smallest byte count of any row of data in that file. The maximum number of rows of data in all files to be processed can then be estimated as: number of rows of data in all files = (byte count of the largest file / smallest byte count of a row in that file) × number of files to be processed.
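The estimate and the resulting array dimensions can be sketched as below. The function names, the rounding up of the division, and the rule for choosing n once m is fixed are illustrative assumptions; the text only gives the estimation formula and the array shape.

```python
import math


def estimate_total_rows(file_sizes: list[int], min_row_bytes: int) -> int:
    """Upper estimate: (largest file bytes / smallest row bytes) * number of files."""
    rows_in_largest = math.ceil(max(file_sizes) / min_row_bytes)
    return rows_in_largest * len(file_sizes)


def choose_exponents(total_rows: int, m: int) -> tuple[int, int]:
    """For a chosen m, pick the smallest n with 2**(m+n) >= total_rows (assumed rule)."""
    exponent = max(total_rows - 1, 0).bit_length()   # smallest e with 2**e >= total_rows
    return m, max(exponent - m, 0)


def allocate(m: int, n: int) -> list[list[None]]:
    """Preset a 2-D array with 2**m + 1 rows and 2**n + 1 columns of empty elements."""
    return [[None] * (2 ** n + 1) for _ in range(2 ** m + 1)]


total = estimate_total_rows([1000, 4000, 2500], min_row_bytes=40)
m, n = choose_exponents(total, m=1)
array = allocate(m, n)
```

Here the largest file holds at most 100 rows, so three files are estimated at 300 rows, which is covered by $2^{1+8} = 512$.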
S42: For any row of data in a file to be processed, determine a feature value of the row according to preset key information of the row, and locate, according to the feature value, an element in the preset array structure that is used to store the identifier of a row of data. The identifier of a row of data includes the number of the file containing the row and the line number of the row within that file. For example, as shown in Fig. 6, the files currently being processed are numbered from 1 to n, and each row of data in each file has its own line number.
Specifically, this step includes:
for any row of data in a file to be processed, obtaining the preset key information x of the row, and computing the feature value h(x) of the row by applying a hash function to x;
locating, according to the formula $r(x) = h(x) \mathbin{\&} (2^{m+n} - 1) \gg n$, the row address r(x) of the element in the preset array structure that is used to store the identifier of the row; and
locating, according to the formula $c(x) = h(x) \mathbin{\&} (2^{m+n} - 1) \mathbin{\&} (2^{n} - 1)$, the column address c(x) of the element in the preset array structure that is used to store the identifier of the row;
where $2^{m+n}$ represents the number of rows of data in all files to be processed, and m and n are natural numbers.
The preset key information of a row of data may be configured according to user or system requirements, or agreed in advance. For example, the row currently processed by the thread is the first row in the file shown in Fig. 7: 15895868086ABC TOM NanJing, where Ln=1 in Fig. 7 indicates that the currently processed row is the first row of the file. The preset key information x includes 15895868086 (the subscriber number) and TOM (the user name), and the feature value of the row computed by the hash function is h(x) = 726346. In this case, referring to Fig. 4, m = 1 and n = 3, so:
$r(x) = h(15895868086, \mathrm{TOM}) \mathbin{\&} (2^{1+3} - 1) \gg 3 = 0$, that is, row 0;
$c(x) = h(15895868086, \mathrm{TOM}) \mathbin{\&} (2^{1+3} - 1) \mathbin{\&} (2^{3} - 1) = 6$, that is, column 6.
The feature value of the row is thus mapped to the element in row 0, column 6 of the preset array structure shown in Fig. 5, that is, the element labeled 6 in Fig. 5.
S43: Determine whether the element located in step S42 is occupied.
If so, perform step S44; otherwise, perform step S45.
S44: Determine that the row and the row corresponding to the identifier occupying the element are rows that satisfy the preset condition, and add related information about these rows to a message, where the related information includes the key information and the identifiers of the rows that satisfy the preset condition.
If the element located in step S42 is occupied, the feature value of the row corresponding to the identifier occupying the element is identical to the feature value of the currently processed row, that is, their key information is identical. The key information and identifiers of the two rows then need to be added to the message, which may be an audit report or may raise an alarm for the two rows.
S45: Store the identifier of the row in the element.
S46: Determine whether the rows of data in all files have been processed.
If so, perform step S47; otherwise, perform step S48.
S47: Send the message.
S48: Locate a row of data in the files that has not yet been processed.
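Steps S41 to S48 can be put together in a single-threaded sketch. Several details here are assumptions for illustration and are not taken from the text: files are modeled as in-memory lists of whitespace-separated lines, the key information is taken to be the first and third fields, `zlib.crc32` stands in for the unspecified hash function, the mask is applied before the shift, and the "message" is a plain list that is returned instead of being sent.

```python
import zlib


def key_of(line: str) -> tuple[str, str]:
    """Hypothetical layout: field 0 is the subscriber number, field 2 the user name."""
    fields = line.split()
    return fields[0], fields[2]


def audit(files: dict[int, list[str]], m: int, n: int) -> list[dict]:
    # S41: preset a (2**m + 1) x (2**n + 1) array of empty elements.
    table = [[None] * (2 ** n + 1) for _ in range(2 ** m + 1)]
    message: list[dict] = []
    for file_no, lines in files.items():
        for line_no, line in enumerate(lines, start=1):
            # S42: feature value from the key information, then row/column address.
            key = key_of(line)
            h = zlib.crc32("|".join(key).encode())
            low_bits = h & ((1 << (m + n)) - 1)
            r, c = low_bits >> n, low_bits & ((1 << n) - 1)
            occupant = table[r][c]
            if occupant is not None:
                # S43/S44: element occupied, record both rows in the message.
                message.append({"key": key, "rows": [occupant, (file_no, line_no)]})
            else:
                # S45: element free, store the identifier of this row.
                table[r][c] = (file_no, line_no)
    # S46/S47: all rows processed; the caller would send this message.
    return message


files = {
    1: ["15895868086 ABC TOM NanJing"],
    2: ["13900000000 XYZ ANN BeiJing"],
    3: ["15895868086 DEF TOM ShangHai"],
}
report = audit(files, m=4, n=8)
```

As in the description, two rows are treated as matching whenever their feature values locate the same element, so rows with different key information can also be reported if their hash values coincide; a variant could compare the key information itself before reporting.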
If multiple threads process the rows of data in multiple files in parallel in this procedure, each thread may be assigned fixed files and gives priority to the rows of data in the files assigned to it; in this case the preset array structure is a two-dimensional array structure. When a thread accesses an element in the array structure, other threads are forbidden from accessing the row of elements containing that element, which avoids access conflicts between threads on the same row of elements and increases the speed at which the threads process the rows of data in the files. For example, suppose two threads process two files in parallel, where the first thread processes the first row of data in file 1 and the second thread processes the first row of data in file 2. The processing result is shown in Table 1:
Table 1
Here the first thread maps the feature value of the first row of data in file 1 to the element in row 0, column 6 of the preset array structure shown in Fig. 5, that is, the element labeled 6 in Fig. 5. The second thread maps the feature value of the first row of data in file 2 to the element in row 1, column 7 of the preset array structure shown in Fig. 5, that is, the element labeled 16 in Fig. 5. While the first thread accesses the element in row 0, column 6 of the array structure, it locks all elements of row 0 and forbids other threads from accessing them; only after the first thread has finished processing the first row of data in file 1 can other threads access the elements of row 0. Similarly, while the second thread accesses the element in row 1, column 7 of the array structure, it locks all elements of row 1.
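The row-level locking described above can be sketched with one lock per array row. The class name `RowLockedTable`, the use of `threading.Lock`, and the hard-coded addresses (row 0, column 6 and row 1, column 7, as in the example) are assumptions for illustration; the text does not specify a locking mechanism.

```python
import threading


class RowLockedTable:
    """2-D array in which accessing any element locks that element's whole row."""

    def __init__(self, m: int, n: int) -> None:
        rows, cols = 2 ** m + 1, 2 ** n + 1
        self.cells = [[None] * cols for _ in range(rows)]
        self.row_locks = [threading.Lock() for _ in range(rows)]

    def check_or_store(self, row: int, col: int, identifier: tuple[int, int]):
        """Return the occupying identifier, or store `identifier` and return None."""
        # Other threads touching the same row wait here; other rows stay available.
        with self.row_locks[row]:
            occupant = self.cells[row][col]
            if occupant is None:
                self.cells[row][col] = identifier
            return occupant


shared = RowLockedTable(m=1, n=3)
results = {}


def worker(name: str, row: int, col: int, identifier: tuple[int, int]) -> None:
    results[name] = shared.check_or_store(row, col, identifier)


# Thread 1 handles file 1, line 1; thread 2 handles file 2, line 1.
threads = [
    threading.Thread(target=worker, args=("first", 0, 6, (1, 1))),
    threading.Thread(target=worker, args=("second", 1, 7, (2, 1))),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A later row (file 3, line 1) that maps to row 0, column 6 finds it occupied.
third = shared.check_or_store(0, 6, (3, 1))
```

Because the two threads touch different rows, neither blocks the other; contention only arises when two threads locate elements in the same row.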
If a third thread, when processing the first row of data in file 3, also maps the feature value of that row to the element in row 0, column 6 of the preset array structure shown in Fig. 5, then, because that element is already occupied by the identifier of the first row of data in file 1 processed by the first thread, the first row of data in file 3 and the first row of data in file 1 have the same feature value, that is, their key information is identical. The key information and identifiers of the two rows then need to be added to the message, which may be an audit report or may raise an alarm for the two rows.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a device, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, equipment (device), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (18)

1. A data processing device, characterized in that the device comprises:
A determination unit, configured to, for any row of data in a file to be processed, determine the characteristic value of the row of data according to pre-set key information of the row of data, and locate, according to the characteristic value, the element in a pre-set array structure that is used to store the mark of a row of data;
A processing unit, configured to judge whether the element located by the determination unit is occupied; if so, to determine the row of data and the row of data corresponding to the mark occupying the element as rows of data that meet a preset condition; otherwise, to store the mark of the row of data into the element.
2. The device as claimed in claim 1, characterized in that the array structure is a two-dimensional array structure comprising row elements and column elements, and the number of elements in the array structure is not less than the number of rows of data in all files to be processed.
3. The device as claimed in claim 2, characterized in that the number of rows of data in all files to be processed is $2^{m+n}$, where m and n are natural numbers;
The array structure comprises $2^m + 1$ rows of elements and $2^n + 1$ columns of elements.
4. The device as claimed in claim 2 or claim 3, characterized in that the determination unit is specifically configured to:
For any row of data in a file to be processed, obtain the pre-set key information x of the row of data, and use a hash function to compute from x the characteristic value $h(x)$ of the row of data;
According to the formula $r(x) = h(x) \,\&\, (2^{m+n} - 1) \gg n$, locate the row address $r(x)$ of the element in the pre-set array structure that is used to store the mark of the row of data;
According to the formula $c(x) = h(x) \,\&\, (2^{m+n} - 1) \,\&\, (2^n - 1)$, locate the column address $c(x)$ of the element in the pre-set array structure that is used to store the mark of the row of data;
Wherein $2^{m+n}$ represents the number of rows of data in all files to be processed, and m and n are natural numbers.
5. The device as claimed in claim 1, characterized in that the array structure is a one-dimensional array structure comprising 1 row of elements and $2^n + 1$ columns of elements, where n is a natural number.
6. The device as claimed in claim 1, characterized in that the mark of a row of data comprises the number of the file in which the row of data is located and the line number of the row of data in that file.
7. The device as claimed in claim 1, characterized in that the device further comprises:
A message generation unit, configured to add to a message the relevant information of the rows of data that the processing unit has determined to meet the preset condition; wherein the relevant information comprises the key information and the marks of the rows of data that meet the preset condition.
8. The device as claimed in claim 2, characterized in that the determination unit and the processing unit each use multiple threads to process the rows of data in multiple files in parallel, wherein one thread processes the rows of data in one file.
9. The device as claimed in claim 8, characterized in that, when determining that a thread is accessing an element in the array structure, the determination unit and the processing unit forbid other threads from accessing the row of elements in which that element is located.
10. A data processing method, characterized in that the method comprises:
For any row of data in a file to be processed, determining the characteristic value of the row of data according to pre-set key information of the row of data, and locating, according to the characteristic value, the element in a pre-set array structure that is used to store the mark of a row of data;
Judging whether the located element is occupied; if so, determining the row of data and the row of data corresponding to the mark occupying the element as rows of data that meet a preset condition; otherwise, storing the mark of the row of data into the element.
11. The method as claimed in claim 10, characterized in that the array structure is a two-dimensional array structure comprising row elements and column elements, and the number of elements in the array structure is not less than the number of rows of data in all files to be processed.
12. The method as claimed in claim 11, characterized in that the number of rows of data in all files to be processed is $2^{m+n}$, where m and n are natural numbers;
The array structure comprises $2^m + 1$ rows of elements and $2^n + 1$ columns of elements.
13. The method as claimed in claim 11 or 12, characterized in that, for any row of data in a file to be processed, determining the characteristic value of the row of data according to pre-set key information of the row of data, and locating, according to the characteristic value, the element in the pre-set array structure that is used to store the mark of a row of data, comprises:
For any row of data in a file to be processed, obtaining the pre-set key information x of the row of data, and using a hash function to compute from x the characteristic value $h(x)$ of the row of data;
According to the formula $r(x) = h(x) \,\&\, (2^{m+n} - 1) \gg n$, locating the row address $r(x)$ of the element in the pre-set array structure that is used to store the mark of the row of data;
According to the formula $c(x) = h(x) \,\&\, (2^{m+n} - 1) \,\&\, (2^n - 1)$, locating the column address $c(x)$ of the element in the pre-set array structure that is used to store the mark of the row of data;
Wherein $2^{m+n}$ represents the number of rows of data in all files to be processed, and m and n are natural numbers.
14. The method as claimed in claim 10, characterized in that the array structure is a one-dimensional array structure comprising 1 row of elements and $2^n + 1$ columns of elements, where n is a natural number.
15. The method as claimed in claim 10, characterized in that the mark of a row of data comprises the number of the file in which the row of data is located and the line number of the row of data in that file.
16. The method as claimed in claim 10, characterized in that the method further comprises:
Adding the relevant information of the rows of data that meet the preset condition to a message; wherein the relevant information comprises the key information and the marks of the rows of data that meet the preset condition.
17. The method as claimed in claim 11, characterized in that multiple threads are used to process the rows of data in multiple files in parallel, wherein one thread processes the rows of data in one file.
18. The method as claimed in claim 17, characterized in that, when a thread accesses an element in the array structure, other threads are forbidden from accessing the row of elements in which that element is located.
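The row and column address formulas recited in claims 4 and 13 can be illustrated with the following sketch. It is not the claimed implementation: the claims do not name a hash function, so CRC-32 is used here purely as a stand-in; the values of `M` and `N` and the function names are hypothetical; and the sketch assumes that the mask $2^{m+n} - 1$ is applied before the right shift, since the claim text does not spell out the operator grouping.

```python
import zlib

M, N = 2, 3  # hypothetical exponents: 2**(M + N) = 32 rows of data across all files


def characteristic_value(key_information: str) -> int:
    """Stand-in for the unspecified hash function h(x); CRC-32 is an assumption."""
    return zlib.crc32(key_information.encode("utf-8"))


def locate(h: int, m: int = M, n: int = N):
    """Return (row address, column address) for characteristic value h."""
    low_bits = h & ((1 << (m + n)) - 1)  # h(x) & (2^(m+n) - 1)
    row = low_bits >> n                  # r(x): the upper m of those bits
    col = low_bits & ((1 << n) - 1)      # c(x): the lower n of those bits
    return row, col


row, col = locate(characteristic_value("example key information"))
```

Under this grouping the row address always falls in the range 0 to $2^m - 1$ and the column address in the range 0 to $2^n - 1$, so both fit inside an array of $2^m + 1$ rows and $2^n + 1$ columns.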
CN201410023109.7A 2014-01-17 2014-01-17 A kind of data processing equipment and method Active CN104793997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410023109.7A CN104793997B (en) 2014-01-17 2014-01-17 A kind of data processing equipment and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410023109.7A CN104793997B (en) 2014-01-17 2014-01-17 A kind of data processing equipment and method

Publications (2)

Publication Number Publication Date
CN104793997A CN104793997A (en) 2015-07-22
CN104793997B true CN104793997B (en) 2018-06-26

Family

ID=53558810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410023109.7A Active CN104793997B (en) 2014-01-17 2014-01-17 A kind of data processing equipment and method

Country Status (1)

Country Link
CN (1) CN104793997B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460453B (en) * 2017-02-21 2022-05-17 阿里巴巴集团控股有限公司 Data processing method, device and system for CTC training

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101159795A (en) * 2007-10-25 2008-04-09 中兴通讯股份有限公司 Calling list rearrangement method and device
CN102591855A (en) * 2012-01-13 2012-07-18 广州从兴电子开发有限公司 Data identification method and data identification system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120323786A1 (en) * 2011-06-16 2012-12-20 OneID Inc. Method and system for delayed authorization of online transactions

Also Published As

Publication number Publication date
CN104793997A (en) 2015-07-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant