CN104793997B - Data processing device and method - Google Patents
- Publication number
- CN104793997B (application number CN201410023109.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- row
- row data
- arrays
- mark
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a data processing device and method that simplify how a thread processes large numbers of data files, increase the speed at which a thread processes a file, reduce the consumption of computer resources, and increase the computation speed of the hash algorithm. The method of the present invention includes: for any data row in a file to be processed, determining a characteristic value of the data row from the preset key information of that data row, and using the characteristic value to locate, in a preset array structure, an element for storing a data-row identifier; determining whether the located element is occupied; if it is, determining that the data row and the data row corresponding to the identifier occupying the element are data rows that satisfy a preset condition; otherwise, storing the identifier of the data row in the element.
Description
Technical field
The present invention relates to the field of communications, and in particular to a data processing device and method.
Background
When mass data is processed, it usually has to be deduplicated: two or more related data segments are located within the mass data, and the related segments are then processed according to the key information of each segment.
Take a data auditing service as an example. The process mainly involves a customer relationship management system (CRM, Customer Relationship Management), a convergent billing system (CBS, Convergent Billing System) and an audit system. CRM and CBS supply large numbers of data files, and the audit system audits these files: based on the key information of each data segment, it finds two or more related data segments across all the files, analyzes which parts of the information are the same and which differ between the related segments, and compiles the related segments into a report file. Finally, the audit system feeds the report file back to CRM and CBS, which then correct the data.
Because the volume of data to be deduplicated is large, a hash algorithm is generally used to save computer software and hardware resources. A hash algorithm converts an input of arbitrary length into an output of fixed length within a specified numeric range. This conversion is a compression mapping: the space occupied by the output is usually much smaller than that occupied by the input, different inputs may produce the same output, and a unique input cannot be determined from an output. Put simply, a hash algorithm is a function that compresses a message of arbitrary length into a digest of some fixed length.
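To make the two properties above concrete, here is a small illustrative sketch in Python. It is not taken from the patent and uses the standard-library SHA-256 (`hashlib.sha256`) purely as an example hash function. Inputs of any length yield a 32-byte digest, and once digests are reduced into a small range (16 buckets here, an arbitrary choice), 17 distinct inputs must share at least one output, which is why an output cannot identify a unique input.

```python
import hashlib

def digest(message: bytes) -> bytes:
    """Return the fixed-length (32-byte) SHA-256 digest of a message of any length."""
    return hashlib.sha256(message).digest()

def reduce_to_range(message: bytes, buckets: int) -> int:
    """Compress a message into the numeric range [0, buckets) via its digest."""
    return int.from_bytes(digest(message), "big") % buckets

# Inputs of very different lengths give digests of the same length.
short_msg = b"TOM"
long_msg = b"15895868086 ABC TOM NanJing" * 1000
print(len(digest(short_msg)), len(digest(long_msg)))  # 32 32

# Pigeonhole: 17 distinct inputs mapped into 16 buckets must collide at least once.
BUCKETS = 16
first_seen = {}
collision = None
for i in range(BUCKETS + 1):
    msg = f"row-{i}".encode()
    bucket = reduce_to_range(msg, BUCKETS)
    if bucket in first_seen:
        collision = (first_seen[bucket], msg)
        break
    first_seen[bucket] = msg
print("colliding inputs:", collision)
```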
In deduplicating mass data, a hash algorithm compresses each data segment into a fixed-length output, which simplifies the processing of the mass data.
However, the existing techniques for simplifying mass-data processing still consume a large amount of computer software and hardware resources, such as memory and central processing unit (CPU, Central Processing Unit) resources.
Summary of the invention
Embodiments of the present invention provide a data processing device and method, so as to simplify how a thread processes large numbers of data files, increase the speed at which a thread processes a file, reduce the consumption of computer resources, and increase the computation speed of the hash algorithm.
According to a first aspect, a data processing device is provided, including:
a determination unit, configured to: for any data row in a file to be processed, determine a characteristic value of the data row from the preset key information of the data row, and use the characteristic value to locate, in a preset array structure, an element for storing a data-row identifier;
a processing unit, configured to determine whether the element located by the determination unit is occupied; if it is, determine that the data row and the data row corresponding to the identifier occupying the element are data rows that satisfy a preset condition; otherwise, store the identifier of the data row in the element.
With reference to the first aspect, in a first possible implementation, the array structure is a two-dimensional array structure with rows and columns, and the number of elements in the array structure is not less than the number of data rows in all files to be processed.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the number of data rows in all files to be processed is $2^{m+n}$, where m and n are natural numbers, and the array structure has $2^m + 1$ rows and $2^n + 1$ columns.
With reference to the first or second possible implementation of the first aspect, in a third possible implementation of the first aspect, the determination unit is specifically configured to:
for any data row in a file to be processed, obtain the preset key information x of the data row, and compute the characteristic value h(x) of the data row by applying a hash function to x;
locate the row address r(x) of the element in the preset array structure that stores the identifier of the data row, according to the formula $r(x) = \big(h(x) \,\&\, (2^{m+n} - 1)\big) \gg n$;
locate the column address c(x) of the element in the preset array structure that stores the identifier of the data row, according to the formula $c(x) = h(x) \,\&\, (2^{m+n} - 1) \,\&\, (2^n - 1)$;
where $2^{m+n}$ is the number of data rows in all files to be processed, and m and n are natural numbers.
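The two address formulas translate directly into integer bit operations. The following is a minimal Python sketch, assuming h(x) is a non-negative integer; the function names are hypothetical. The parentheses in `row_address` matter: in Python, as in C, `>>` binds more tightly than `&`, so the mask must be applied explicitly before the shift. Because $2^n - 1$ is a sub-mask of $2^{m+n} - 1$, c(x) reduces to the low n bits of h(x), and r(x) is bits n through m+n-1.

```python
def row_address(h: int, m: int, n: int) -> int:
    """r(x) = (h(x) & (2^(m+n) - 1)) >> n, i.e. bits n .. m+n-1 of h(x)."""
    return (h & ((1 << (m + n)) - 1)) >> n

def column_address(h: int, m: int, n: int) -> int:
    """c(x) = h(x) & (2^(m+n) - 1) & (2^n - 1), i.e. the low n bits of h(x)."""
    return h & ((1 << (m + n)) - 1) & ((1 << n) - 1)

# Every address stays below 2^m rows and 2^n columns, so a
# (2^m + 1) x (2^n + 1) array always has room for it.
m, n = 2, 3
for h in range(0, 5000, 7):
    r, c = row_address(h, m, n), column_address(h, m, n)
    assert 0 <= r < (1 << m) and 0 <= c < (1 << n)
    assert c == h & ((1 << n) - 1)  # the 2^(m+n) - 1 mask is redundant for c(x)
```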
With reference to the first aspect, in a fourth possible implementation, the array structure is a one-dimensional array structure with 1 row and $2^n + 1$ columns, where n is a natural number.
With reference to the first aspect, in a fifth possible implementation, the identifier of a data row comprises the number of the file containing the data row and the line number of the data row within that file.
With reference to the first aspect, in a sixth possible implementation, the device further includes:
a message generation unit, configured to add related information of the data rows that the processing unit determines to satisfy the preset condition to a message, where the related information includes the key information and identifiers of those data rows.
With reference to the first possible implementation of the first aspect, in a seventh possible implementation of the first aspect, the determination unit and the processing unit each use multiple threads in parallel to process the data rows in multiple files, with one thread processing the data rows of one file.
With reference to the seventh possible implementation of the first aspect, in an eighth possible implementation of the first aspect, when the determination unit and the processing unit determine that a thread is accessing an element of the array structure, they prevent other threads from accessing the row in which that element is located.
According to a second aspect, a data processing method is provided, including:
for any data row in a file to be processed, determining a characteristic value of the data row from the preset key information of the data row, and using the characteristic value to locate, in a preset array structure, an element for storing a data-row identifier;
determining whether the located element is occupied; if it is, determining that the data row and the data row corresponding to the identifier occupying the element are data rows that satisfy a preset condition; otherwise, storing the identifier of the data row in the element.
With reference to the second aspect, in a first possible implementation, the array structure is a two-dimensional array structure with rows and columns, and the number of elements in the array structure is not less than the number of data rows in all files to be processed.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the number of data rows in all files to be processed is $2^{m+n}$, where m and n are natural numbers, and the array structure has $2^m + 1$ rows and $2^n + 1$ columns.
With reference to the first or second possible implementation of the second aspect, in a third possible implementation of the second aspect, determining, for any data row in a file to be processed, the characteristic value of the data row from its preset key information and locating, according to the characteristic value, the element of the preset array structure for storing a data-row identifier includes:
for any data row in a file to be processed, obtaining the preset key information x of the data row, and computing the characteristic value h(x) of the data row by applying a hash function to x;
locating the row address r(x) of the element in the preset array structure that stores the identifier of the data row, according to the formula $r(x) = \big(h(x) \,\&\, (2^{m+n} - 1)\big) \gg n$;
locating the column address c(x) of the element in the preset array structure that stores the identifier of the data row, according to the formula $c(x) = h(x) \,\&\, (2^{m+n} - 1) \,\&\, (2^n - 1)$;
where $2^{m+n}$ is the number of data rows in all files to be processed, and m and n are natural numbers.
With reference to the second aspect, in a fourth possible implementation, the array structure is a one-dimensional array structure with 1 row and $2^n + 1$ columns, where n is a natural number.
With reference to the second aspect, in a fifth possible implementation, the identifier of a data row comprises the number of the file containing the data row and the line number of the data row within that file.
With reference to the second aspect, in a sixth possible implementation, the method further includes:
adding related information of the data rows that satisfy the preset condition to a message, where the related information includes the key information and identifiers of those data rows.
With reference to the first possible implementation of the second aspect, in a seventh possible implementation of the second aspect, multiple threads are used in parallel to process the data rows in multiple files, with one thread processing the data rows of one file.
With reference to the seventh possible implementation of the second aspect, in an eighth possible implementation of the second aspect, when a thread accesses an element of the array structure, other threads are prevented from accessing the row in which that element is located.
In the data processing method provided by the embodiments of the present invention, a thread determines the characteristic value of a data row from the preset key information of the data row, uses the characteristic value to locate the element in the preset array structure that stores data-row identifiers, and determines whether the located element is occupied in order to identify data rows that satisfy the preset condition. This simplifies how a thread processes large numbers of data files, increases the speed at which a thread processes files, and reduces the consumption of computer resources.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a data processing device according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of another data processing device according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of how an audit system processes the data rows in files using multiple threads, according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a preset array structure according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of files processed by threads according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the data rows in a file processed by a thread according to an embodiment of the present invention.
Detailed description of embodiments
The present invention provides a data processing device and method that use a hash algorithm to simplify how a thread processes large numbers of data files, increase the speed at which a thread processes files, reduce the consumption of computer resources, and increase the computation speed of the hash algorithm.
As shown in Fig. 1, an embodiment of the present invention provides a data processing device, including:
a determination unit 11, configured to: for any data row in a file to be processed, determine a characteristic value of the data row from the preset key information of the data row, and use the characteristic value to locate, in a preset array structure, an element for storing a data-row identifier;
a processing unit 12, configured to determine whether the element located by the determination unit 11 is occupied; if it is, determine that the data row and the data row corresponding to the identifier occupying the element are data rows that satisfy a preset condition; otherwise, store the identifier of the data row in the element.
Preferably, the array structure is a two-dimensional array structure with rows and columns, and the number of elements in the array structure is not less than the number of data rows in all files to be processed.
Preferably, the number of data rows in all files to be processed is $2^{m+n}$, where m and n are natural numbers, and the array structure has $2^m + 1$ rows and $2^n + 1$ columns.
Preferably, the determination unit 11 is specifically configured to:
for any data row in a file to be processed, obtain the preset key information x of the data row, and compute the characteristic value h(x) of the data row by applying a hash function to x;
locate the row address r(x) of the element in the preset array structure that stores the identifier of the data row, according to the formula $r(x) = \big(h(x) \,\&\, (2^{m+n} - 1)\big) \gg n$;
locate the column address c(x) of the element in the preset array structure that stores the identifier of the data row, according to the formula $c(x) = h(x) \,\&\, (2^{m+n} - 1) \,\&\, (2^n - 1)$;
where $2^{m+n}$ is the number of data rows in all files to be processed, and m and n are natural numbers.
Preferably, the array structure is a one-dimensional array structure with 1 row and $2^n + 1$ columns, where n is a natural number.
Preferably, the identifier of a data row comprises the number of the file containing the data row and the line number of the data row within that file.
Preferably, the device further includes:
a message generation unit 13, configured to add related information of the data rows that the processing unit 12 determines to satisfy the preset condition to a message, where the related information includes the key information and identifiers of those data rows.
Preferably, the determination unit 11 and the processing unit 12 each use multiple threads in parallel to process the data rows in multiple files, with one thread processing the data rows of one file.
Preferably, when the determination unit 11 and the processing unit 12 determine that a thread is accessing an element of the array structure, they prevent other threads from accessing the row in which that element is located.
Specifically, the determination unit 11, the processing unit 12 and the message generation unit 13 may be implemented by entities such as a processor; the present invention does not limit the entities that implement these modules.
As shown in Fig. 2, an embodiment of the present invention provides a data processing device, including:
a processor 21, configured to: for any data row in a file to be processed, determine a characteristic value of the data row from the preset key information of the data row; use the characteristic value to locate, in a preset array structure, an element for storing a data-row identifier; and determine whether the located element is occupied; if it is, determine that the data row and the data row corresponding to the identifier occupying the element are data rows that satisfy a preset condition; otherwise, store the identifier of the data row in the element;
a memory 22, configured to store the preset key information of each data row, the preset array structure, and related information.
Preferably, the array structure is a two-dimensional array structure with rows and columns, and the number of elements in the array structure is not less than the number of data rows in all files to be processed.
Preferably, the number of data rows in all files to be processed is $2^{m+n}$, where m and n are natural numbers, and the array structure has $2^m + 1$ rows and $2^n + 1$ columns.
Preferably, when determining, for any data row in a file to be processed, the characteristic value of the data row from its preset key information and locating, according to the characteristic value, the element of the preset array structure for storing a data-row identifier, the processor 21 is specifically configured to:
for any data row in a file to be processed, obtain the preset key information x of the data row, and compute the characteristic value h(x) of the data row by applying a hash function to x;
locate the row address r(x) of the element in the preset array structure that stores the identifier of the data row, according to the formula $r(x) = \big(h(x) \,\&\, (2^{m+n} - 1)\big) \gg n$;
locate the column address c(x) of the element in the preset array structure that stores the identifier of the data row, according to the formula $c(x) = h(x) \,\&\, (2^{m+n} - 1) \,\&\, (2^n - 1)$;
where $2^{m+n}$ is the number of data rows in all files to be processed, and m and n are natural numbers.
Preferably, the array structure is a one-dimensional array structure with 1 row and $2^n + 1$ columns, where n is a natural number.
Preferably, the identifier of a data row comprises the number of the file containing the data row and the line number of the data row within that file.
Preferably, the processor 21 is further configured to add related information of the data rows it determines to satisfy the preset condition to a message, where the related information includes the key information and identifiers of those data rows.
Preferably, the processor 21 uses multiple threads in parallel to process the data rows in multiple files, with one thread processing the data rows of one file.
Preferably, when the processor 21 determines that a thread is accessing an element of the array structure, it prevents other threads from accessing the row in which that element is located.
As shown in Fig. 3, an embodiment of the present invention provides a data processing method, including:
S31: For any data row in a file to be processed, determine a characteristic value of the data row from the preset key information of the data row, and use the characteristic value to locate, in a preset array structure, an element for storing a data-row identifier.
S32: Determine whether the located element is occupied; if it is, determine that the data row and the data row corresponding to the identifier occupying the element are data rows that satisfy a preset condition; otherwise, store the identifier of the data row in the element.
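Steps S31 and S32 can be sketched as follows, under stated assumptions: the identifier is a (file number, line number) pair, empty elements hold `None`, and `zlib.crc32` stands in for the hash function, which the patent does not name. `ArrayStructure` and `characteristic_value` are hypothetical names. Like the method, the sketch treats an occupied element as a match; two different keys that happen to map to the same element would also be reported, so a stricter implementation might additionally compare the key information.

```python
import zlib
from typing import List, Optional, Tuple

Identifier = Tuple[int, int]  # (file number, line number within the file)

class ArrayStructure:
    """Preset (2^m + 1) x (2^n + 1) array; each element is empty (None) or holds an identifier."""

    def __init__(self, m: int, n: int) -> None:
        self.m, self.n = m, n
        self.cells: List[List[Optional[Identifier]]] = [
            [None] * ((1 << n) + 1) for _ in range((1 << m) + 1)
        ]

    def locate(self, h: int) -> Tuple[int, int]:
        """S31: map the characteristic value h(x) to (row address, column address)."""
        mask = (1 << (self.m + self.n)) - 1
        return (h & mask) >> self.n, h & mask & ((1 << self.n) - 1)

    def check_or_store(self, h: int, ident: Identifier) -> Optional[Identifier]:
        """S32: return the occupying identifier if the element is taken, else store ident."""
        r, c = self.locate(h)
        occupant = self.cells[r][c]
        if occupant is None:
            self.cells[r][c] = ident
        return occupant

def characteristic_value(key: str) -> int:
    # Stand-in hash function; the patent does not specify which one is used.
    return zlib.crc32(key.encode("utf-8"))

table = ArrayStructure(m=1, n=3)
h = characteristic_value("15895868086 TOM")
print(table.check_or_store(h, (1, 1)))  # None: element was free, identifier stored
print(table.check_or_store(h, (3, 1)))  # (1, 1): both rows satisfy the preset condition
```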
The identifier of a data row comprises the number of the file containing the data row and the line number of the data row within that file.
Preferably, before step S31 the method further includes: presetting the array structure.
Specifically, the preset array structure is a two-dimensional array structure with rows and columns whose elements store data-row identifiers, and the number of elements in the array structure is not less than the number of data rows in all files to be processed. For example, when the number of data rows in all files to be processed is $2^{m+n}$, the array structure has $2^m + 1$ rows and $2^n + 1$ columns, where m and n are natural numbers. Alternatively, the preset array structure is a one-dimensional array structure with 1 row and $2^n + 1$ columns, where n is a natural number.
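One way to read the one-dimensional variant, offered as an interpretation rather than something the patent states, is as the same addressing with m = 0: the mask $2^{0+n} - 1$ keeps only the low n bits, so shifting right by n always yields row 0 and only a single row of $2^n + 1$ columns is ever used. A quick Python check of that derivation (`locate` is a hypothetical helper):

```python
from typing import Tuple

def locate(h: int, m: int, n: int) -> Tuple[int, int]:
    """Row and column address from the two-dimensional formulas."""
    mask = (1 << (m + n)) - 1
    return (h & mask) >> n, h & mask & ((1 << n) - 1)

# Assumed reading of the one-dimensional case: m = 0.
n = 4
for h in range(10_000):
    r, c = locate(h, 0, n)
    assert r == 0                   # always the single row
    assert 0 <= c < (1 << n) + 1    # fits within 2^n + 1 columns
```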
Preferably, step S31 includes:
for any data row in a file to be processed, obtaining the preset key information x of the data row, and computing the characteristic value h(x) of the data row by applying a hash function to x;
locating the row address r(x) of the element in the preset array structure that stores the identifier of the data row, according to the formula $r(x) = \big(h(x) \,\&\, (2^{m+n} - 1)\big) \gg n$;
locating the column address c(x) of the element in the preset array structure that stores the identifier of the data row, according to the formula $c(x) = h(x) \,\&\, (2^{m+n} - 1) \,\&\, (2^n - 1)$;
where $2^{m+n}$ is the number of data rows in all files to be processed, and m and n are natural numbers.
Preferably, after step S32 the method further includes:
adding related information of the data rows determined in step S32 to satisfy the preset condition to a message, where the related information includes the key information and identifiers of those data rows.
In the present invention, when a large number of files need to be processed, multiple threads may process the data rows in multiple files in parallel, with one thread processing the data rows of one file; in this case the preset array structure is a two-dimensional array structure. When a thread accesses an element of the array structure, other threads are prevented from accessing the row in which that element is located. This avoids access conflicts between threads on the same row of the array structure and increases the speed at which threads process the data rows in files. When only a few files need to be processed, a single thread may process the data rows in multiple files; in this case the preset array structure is a one-dimensional array structure.
The data processing method provided by the embodiments of the present invention is described in detail below, taking the data processing of an audit system as an example.
As shown in Fig. 4, an audit system processes the data rows in files using multiple threads as follows:
S41: Preset the array structure.
Specifically, the preset array structure is a two-dimensional array structure with rows and columns whose elements store data-row identifiers, and the number of elements in the array structure is not less than the number of data rows in all files to be processed. For example, as shown in Fig. 5, when the number of data rows in all files to be processed is $2^{m+n}$, the array structure has $2^m + 1$ rows and $2^n + 1$ columns, where m and n are natural numbers. Alternatively, the preset array structure is a one-dimensional array structure with 1 row and $2^n + 1$ columns, where n is a natural number.
Preferably, when a large number of files need to be processed, multiple threads may process the data rows in multiple files in parallel, with one thread processing the data rows of one file; in this case the preset array structure is a two-dimensional array structure. When only a few files need to be processed, a single thread may process the data rows in multiple files; in this case the preset array structure is a one-dimensional array structure.
The number of data rows in all files to be processed may be determined as follows: the operation system automatically determines the number of files to be processed and the size of each file; it then determines the size in bytes of the largest of these files and the smallest size in bytes of any data row in that file. The maximum number of data rows in all files to be processed can then be estimated as: number of data rows = (size of the largest file in bytes / smallest data-row size in that file in bytes) × number of files to be processed.
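The estimate above can be written as a small Python function. This is a sketch under assumptions: the file sizes and the smallest row size are passed in rather than measured, the division is rounded up, and the hypothetical helper `split_exponent` rounds the estimate up to the next power of two so that it can be expressed as $2^{m+n}$, which the description assumes but does not say how to obtain.

```python
from typing import List

def estimate_max_rows(file_sizes: List[int], min_row_bytes: int) -> int:
    """(bytes of the largest file / smallest row size in that file) x number of files."""
    largest = max(file_sizes)
    rows_per_file = -(-largest // min_row_bytes)  # integer ceiling division (rounding up is assumed)
    return rows_per_file * len(file_sizes)

def split_exponent(rows: int) -> int:
    """Hypothetical helper: smallest k with 2^k >= rows, to be split as k = m + n."""
    return max(0, (rows - 1).bit_length())

sizes = [4000, 2500, 3800]  # bytes per file (illustrative values)
estimate = estimate_max_rows(sizes, min_row_bytes=25)
print(estimate, split_exponent(estimate))  # 480 9
```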
S42: For any data row in a file to be processed, determine a characteristic value of the data row from the preset key information of the data row, and use the characteristic value to locate, in the preset array structure, an element for storing a data-row identifier. The identifier of a data row comprises the number of the file containing the data row and the line number of the data row within that file. For example, as shown in Fig. 6, the files currently being processed are numbered 1 to n, and each data row in each file has its own line number.
Specifically, this step includes:
for any data row in a file to be processed, obtaining the preset key information x of the data row, and computing the characteristic value h(x) of the data row by applying a hash function to x;
locating the row address r(x) of the element in the preset array structure that stores the identifier of the data row, according to the formula $r(x) = \big(h(x) \,\&\, (2^{m+n} - 1)\big) \gg n$;
locating the column address c(x) of the element in the preset array structure that stores the identifier of the data row, according to the formula $c(x) = h(x) \,\&\, (2^{m+n} - 1) \,\&\, (2^n - 1)$;
where $2^{m+n}$ is the number of data rows in all files to be processed, and m and n are natural numbers.
The preset key information of a data row may be configured according to user or system needs, or agreed in advance. For example, suppose the data row currently being processed by a thread is the first data row of the file shown in Fig. 7, namely 15895868086 ABC TOM NanJing; Ln=1 in Fig. 7 indicates that this is the first data row of the file. The preset key information x consists of 15895868086 (the subscriber number) and TOM (the user name), and the hash function yields the characteristic value h(x) = 726346 for this data row. Here m = 1 and n = 3 (see Fig. 5), so:
$r(x) = \big(h(15895868086, \mathrm{TOM}) \,\&\, (2^{1+3} - 1)\big) \gg 3 = 0$, i.e. row 0;
$c(x) = h(15895868086, \mathrm{TOM}) \,\&\, (2^{1+3} - 1) \,\&\, (2^3 - 1) = 6$, i.e. column 6.
The characteristic value of the data row is thus located at the element in row 0, column 6 of the preset array structure shown in Fig. 5, i.e. the element labeled 6 in Fig. 5. (Strictly, the low four bits of 726346 are 1010 in binary, for which the formulas give r(x) = 1 and c(x) = 2; the position row 0, column 6 corresponds to a characteristic value whose low four bits are 0110.)
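Evaluating the formulas directly is a useful check on the example. The Python sketch below assumes that the labels in Fig. 5 number the elements row by row from 0 across $2^n + 1 = 9$ columns (inferred from the labels 6 and 16 given in the text, not stated explicitly); `locate` and `fig5_label` are hypothetical names. With m = 1 and n = 3, the quoted h(x) = 726346 (binary ending 1010) lands at row 1, column 2, while a value ending in binary 0110, such as 726342, lands at row 0, column 6, the element labeled 6.

```python
from typing import Tuple

def locate(h: int, m: int, n: int) -> Tuple[int, int]:
    """(r(x), c(x)) from the row and column address formulas."""
    mask = (1 << (m + n)) - 1
    return (h & mask) >> n, h & mask & ((1 << n) - 1)

def fig5_label(r: int, c: int, n: int) -> int:
    """Assumed row-major numbering of Fig. 5 with 2^n + 1 columns per row."""
    return r * ((1 << n) + 1) + c

m, n = 1, 3
print(locate(726346, m, n))                      # (1, 2): low bits of 726346 are 1010
print(locate(726342, m, n))                      # (0, 6): low bits are 0110
print(fig5_label(0, 6, n), fig5_label(1, 7, n))  # 6 16
```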
S43: Determine whether the element located in step S42 is occupied. If it is, perform step S44; otherwise, perform step S45.
S44: Determine that the data row and the data row corresponding to the identifier occupying the element are data rows that satisfy the preset condition, and add related information of these data rows to a message, where the related information includes their key information and identifiers.
If the element located in step S42 is occupied, the data row whose identifier occupies the element has the same characteristic value as the data row currently being processed, that is, the same key information. The key information and identifiers of the two data rows therefore need to be added to the message, which can be used to check a report on the two data rows or to raise an alarm about them.
S45: Store the identifier of the data row in the element.
S46: Determine whether all data rows in all files have been processed. If so, perform step S47; otherwise, perform step S48.
S47: Send the message.
S48: Find the data rows in the files that have not yet been processed.
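Steps S41 to S48 can be condensed into a single-threaded Python sketch. Assumptions, none of which come from the patent: rows are whitespace-separated fields, the key information is selected by field index (`key_fields`), `zlib.crc32` stands in for the hash function, and each message entry is a (key, stored identifier, current identifier) tuple. The function name `audit` is hypothetical, and sending the message (S47) is left to the caller.

```python
import zlib
from typing import Callable, Dict, List, Optional, Tuple

Ident = Tuple[int, int]  # (file number, line number)

def crc_hash(key: str) -> int:
    # Stand-in hash function; the patent does not specify one.
    return zlib.crc32(key.encode("utf-8"))

def audit(files: Dict[int, List[str]], key_fields: Tuple[int, ...], m: int, n: int,
          hash_fn: Callable[[str], int] = crc_hash) -> List[Tuple[str, Ident, Ident]]:
    # S41: preset a (2^m + 1) x (2^n + 1) array with every element empty.
    table: List[List[Optional[Ident]]] = [[None] * ((1 << n) + 1) for _ in range((1 << m) + 1)]
    mask = (1 << (m + n)) - 1
    message: List[Tuple[str, Ident, Ident]] = []
    # S46/S48: keep taking unprocessed rows until every row of every file is done.
    for file_no in sorted(files):
        for line_no, line in enumerate(files[file_no], start=1):
            fields = line.split()
            key = " ".join(fields[i] for i in key_fields)      # preset key information x
            h = hash_fn(key)                                   # characteristic value h(x)
            r, c = (h & mask) >> n, h & mask & ((1 << n) - 1)  # S42: locate the element
            occupant = table[r][c]
            if occupant is not None:  # S43 -> S44: record both rows in the message
                message.append((key, occupant, (file_no, line_no)))
            else:                     # S43 -> S45: store this row's identifier
                table[r][c] = (file_no, line_no)
    return message  # S47: the caller would send this message

files = {
    1: ["15895868086 ABC TOM NanJing", "13900000000 XYZ ANN ShangHai"],
    2: ["15895868086 DEF TOM NanJing"],
}
print(audit(files, key_fields=(0, 2), m=1, n=3))
```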
If, in this process, multiple threads process the data rows of multiple files in parallel, each thread may be assigned fixed files and gives priority to the data rows in the files assigned to it; in this case the preset array structure is a two-dimensional array structure. When a thread accesses an element of the array structure, other threads are prevented from accessing the row in which that element is located, which avoids access conflicts between threads on the same row of the array structure and increases the speed at which threads process the data rows in files. For example, suppose two threads process two files in parallel: the first thread processes the first data row of file 1, and the second thread processes the first data row of file 2. The processing results are shown in Table 1:
Table 1
In this case, the first thread locates the characteristic value of the first data row of file 1 at the element in row 0, column 6 of the preset array structure shown in Fig. 5, i.e. the element labeled 6 in Fig. 5; the second thread locates the characteristic value of the first data row of file 2 at the element in row 1, column 7 of the same array structure, i.e. the element labeled 16 in Fig. 5. While the first thread is accessing the element in row 0, column 6, it locks all elements of row 0 so that no other thread can access them; only after the first thread finishes processing the first data row of file 1 can other threads access the elements of row 0. Similarly, while the second thread is accessing the element in row 1, column 7, it locks all elements of row 1.
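The row-locking scheme can be sketched in Python with one `threading.Lock` per array row, so that a thread holding row r blocks other threads only from that row. This is an illustrative sketch, not the patent's implementation: the names `RowLockedArray` and `audit_parallel` are hypothetical, `zlib.crc32` is a stand-in hash, the key information is selected by field index, and one thread is started per file as in the example above. Under CPython's global interpreter lock the threads do not hash truly in parallel, but the locking logic carries over to other languages.

```python
import threading
import zlib
from typing import Callable, Dict, List, Optional, Tuple

Ident = Tuple[int, int]  # (file number, line number)

def crc_hash(key: str) -> int:
    # Stand-in hash function; the patent does not specify one.
    return zlib.crc32(key.encode("utf-8"))

class RowLockedArray:
    """(2^m + 1) x (2^n + 1) array of identifiers guarded by one lock per row."""

    def __init__(self, m: int, n: int) -> None:
        self.m, self.n = m, n
        rows, cols = (1 << m) + 1, (1 << n) + 1
        self.cells: List[List[Optional[Ident]]] = [[None] * cols for _ in range(rows)]
        self.row_locks = [threading.Lock() for _ in range(rows)]

    def check_or_store(self, h: int, ident: Ident) -> Optional[Ident]:
        mask = (1 << (self.m + self.n)) - 1
        r, c = (h & mask) >> self.n, h & mask & ((1 << self.n) - 1)
        with self.row_locks[r]:  # other threads are kept out of row r until we finish
            occupant = self.cells[r][c]
            if occupant is None:
                self.cells[r][c] = ident  # element was free: store the identifier
            return occupant  # not None: both rows satisfy the preset condition

def audit_parallel(files: Dict[int, List[str]], key_fields: Tuple[int, ...], m: int, n: int,
                   hash_fn: Callable[[str], int] = crc_hash) -> List[Tuple[str, Ident, Ident]]:
    table = RowLockedArray(m, n)
    message: List[Tuple[str, Ident, Ident]] = []
    message_lock = threading.Lock()

    def worker(file_no: int, lines: List[str]) -> None:
        # One thread handles all data rows of one file.
        for line_no, line in enumerate(lines, start=1):
            fields = line.split()
            key = " ".join(fields[i] for i in key_fields)
            occupant = table.check_or_store(hash_fn(key), (file_no, line_no))
            if occupant is not None:
                with message_lock:
                    message.append((key, occupant, (file_no, line_no)))

    threads = [threading.Thread(target=worker, args=(f, rows)) for f, rows in files.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return message

files = {
    1: ["15895868086 ABC TOM NanJing"],
    2: ["13900000000 XYZ ANN ShangHai"],
    3: ["15895868086 DEF TOM NanJing"],
}
print(audit_parallel(files, key_fields=(0, 2), m=1, n=3))
```

Which of files 1 and 3 stores its identifier first depends on thread scheduling, so the reported pair may appear in either order. Locking a whole row rather than a single element, as the description does, needs only $2^m + 1$ locks at the cost of some extra contention between threads whose elements share a row.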
If a third thread, while processing the first data row of file 3, also locates the characteristic value of that data row at the element in row 0, column 6 of the preset array structure shown in Fig. 5, then, because that element is already occupied by the identifier of the first data row of file 1 processed by the first thread, the first data row of file 3 and the first data row of file 1 have the same characteristic value, that is, the same key information. The key information and identifiers of the two data rows therefore need to be added to the message, which can be used to check a report on the two data rows or to raise an alarm about them.
It should be understood by those skilled in the art that, the embodiment of the present invention can be provided as method, apparatus or computer program
Product.Therefore, the reality in terms of complete hardware embodiment, complete software embodiment or combination software and hardware can be used in the present invention
Apply the form of example.Moreover, the computer for wherein including computer usable program code in one or more can be used in the present invention
Usable storage medium(Including but not limited to magnetic disk storage and optical memory etc.)The shape of the computer program product of upper implementation
Formula.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (apparatus), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to encompass them.
Claims (18)
1. A data processing apparatus, characterized in that the apparatus comprises:
a determination unit, configured to: for any row of data in a file to be processed, determine a characteristic value of the row according to preset key information of the row, and locate, according to the characteristic value, an element in a preset array structure for storing identifiers of rows of data;
a processing unit, configured to determine whether the element located by the determination unit is occupied; if so, determine the row and the row corresponding to the identifier occupying the element as rows of data satisfying a preset condition; otherwise, store the identifier of the row in the element.
2. The apparatus according to claim 1, characterized in that the array structure is a two-dimensional array structure comprising row elements and column elements, and the number of elements in the array structure is not less than the number of rows of data in all files to be processed.
3. The apparatus according to claim 2, characterized in that the number of rows of data in all files to be processed is $2^{m+n}$, m and n being natural numbers;
the array structure comprises $2^m + 1$ row elements and $2^n + 1$ column elements.
4. The apparatus according to claim 2 or 3, characterized in that the determination unit is specifically configured to:
for any row of data in a file to be processed, obtain preset key information $x$ of the row, and compute the characteristic value $h(x)$ of the row from $x$ using a hash function;
locate, according to the formula $r(x) = \left(h(x) \mathbin{\&} (2^{m+n}-1)\right) \gg n$, the row address $r(x)$ of the element in the preset array structure for storing the identifier of the row;
locate, according to the formula $c(x) = h(x) \mathbin{\&} (2^{m+n}-1) \mathbin{\&} (2^n-1)$, the column address $c(x)$ of the element in the preset array structure for storing the identifier of the row;
where $2^{m+n}$ denotes the number of rows of data in all files to be processed, and m and n are natural numbers.
5. The apparatus according to claim 1, characterized in that the array structure is a one-dimensional array structure comprising 1 row element and $2^n + 1$ column elements, n being a natural number.
6. The apparatus according to claim 1, characterized in that the identifier of a row of data comprises the number of the file in which the row is located and the line number of the row in that file.
7. The apparatus according to claim 1, characterized in that the apparatus further comprises:
a message generation unit, configured to add relevant information of the rows of data satisfying the preset condition, as determined by the processing unit, to a message, wherein the relevant information comprises the key information and identifiers of the rows of data satisfying the preset condition.
8. The apparatus according to claim 2, characterized in that the determination unit and the processing unit use multiple threads to process the rows of data in multiple files in parallel, each thread processing the rows of data in one file.
9. The apparatus according to claim 8, characterized in that, when a thread is determined to be accessing an element in the array structure, the determination unit and the processing unit forbid other threads from accessing the elements of the row in which that element is located.
10. A data processing method, characterized in that the method comprises:
for any row of data in a file to be processed, determining a characteristic value of the row according to preset key information of the row, and locating, according to the characteristic value, an element in a preset array structure for storing identifiers of rows of data;
determining whether the located element is occupied; if so, determining the row and the row corresponding to the identifier occupying the element as rows of data satisfying a preset condition; otherwise, storing the identifier of the row in the element.
11. The method according to claim 10, characterized in that the array structure is a two-dimensional array structure comprising row elements and column elements, and the number of elements in the array structure is not less than the number of rows of data in all files to be processed.
12. The method according to claim 11, characterized in that the number of rows of data in all files to be processed is $2^{m+n}$, m and n being natural numbers;
the array structure comprises $2^m + 1$ row elements and $2^n + 1$ column elements.
13. The method according to claim 11 or 12, characterized in that determining, for any row of data in a file to be processed, the characteristic value of the row according to preset key information of the row, and locating, according to the characteristic value, the element in the preset array structure for storing identifiers of rows of data comprises:
for any row of data in a file to be processed, obtaining preset key information $x$ of the row, and computing the characteristic value $h(x)$ of the row from $x$ using a hash function;
locating, according to the formula $r(x) = \left(h(x) \mathbin{\&} (2^{m+n}-1)\right) \gg n$, the row address $r(x)$ of the element in the preset array structure for storing the identifier of the row;
locating, according to the formula $c(x) = h(x) \mathbin{\&} (2^{m+n}-1) \mathbin{\&} (2^n-1)$, the column address $c(x)$ of the element in the preset array structure for storing the identifier of the row;
where $2^{m+n}$ denotes the number of rows of data in all files to be processed, and m and n are natural numbers.
14. The method according to claim 10, characterized in that the array structure is a one-dimensional array structure comprising 1 row element and $2^n + 1$ column elements, n being a natural number.
15. The method according to claim 10, characterized in that the identifier of a row of data comprises the number of the file in which the row is located and the line number of the row in that file.
16. The method according to claim 10, characterized in that the method further comprises:
adding relevant information of the rows of data satisfying the preset condition to a message, wherein the relevant information comprises the key information and identifiers of the rows of data satisfying the preset condition.
17. The method according to claim 11, characterized in that multiple threads are used to process the rows of data in multiple files in parallel, each thread processing the rows of data in one file.
18. The method according to claim 17, characterized in that, when a thread is accessing an element in the array structure, other threads are forbidden from accessing the elements of the row in which that element is located.
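The address formulas of claims 4 and 13 can be checked with a small worked example. The Python sketch below is an illustration under stated assumptions: the function name `locate` is hypothetical, and the row formula is read as masking first and then shifting, i.e. $\left(h(x) \mathbin{\&} (2^{m+n}-1)\right) \gg n$. Written without parentheses in Python or C, `h & mask >> n` would instead parse as `h & (mask >> n)`, because shifts bind more tightly than bitwise AND. Under this reading, the row address is the upper m bits and the column address the lower n bits of the low m+n bits of the hash, so every hash value maps into a $2^m \times 2^n$ grid. The first mask in $c(x)$ is redundant, since $2^{m+n}-1$ already has its low n bits set.

```python
def locate(h: int, m: int, n: int) -> tuple:
    """Return (row, column) for hash value h, per the formulas of claims 4 and 13."""
    mask = (1 << (m + n)) - 1          # 2^(m+n) - 1
    r = (h & mask) >> n                # r(x) = (h(x) & (2^(m+n) - 1)) >> n
    c = h & mask & ((1 << n) - 1)      # c(x) = h(x) & (2^(m+n) - 1) & (2^n - 1)
    return r, c


# Worked example with m = 3, n = 4: 2^7 = 128 rows of data, addressed in an 8 x 16 grid.
h = 0x2B6D                             # low 7 bits: 0b1101101 = 109
row, col = locate(h, m=3, n=4)
print(row, col)                        # prints "6 13", since 109 = 6 * 16 + 13
```

Because `row * 2**n + col` always equals `h & (2**(m + n) - 1)`, the two formulas simply split the low m+n bits of the hash into a row part and a column part.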
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410023109.7A (CN104793997B) | 2014-01-17 | 2014-01-17 | Data processing apparatus and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104793997A | 2015-07-22 |
CN104793997B | 2018-06-26 |
Family ID: 53558810
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104793997B |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108460453B * | 2017-02-21 | 2022-05-17 | Alibaba Group Holding Limited | Data processing method, device and system for CTC training |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101159795A * | 2007-10-25 | 2008-04-09 | ZTE Corporation | Call detail record deduplication method and device |
CN102591855A * | 2012-01-13 | 2012-07-18 | Guangzhou Congxing Electronic Development Co., Ltd. | Data identification method and data identification system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120323786A1 * | 2011-06-16 | 2012-12-20 | OneID Inc. | Method and system for delayed authorization of online transactions |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
EXSB | Decision made by SIPO to initiate substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |