CN102073479A - Data size-based shuffle switch matrix compression method - Google Patents


Info

Publication number
CN102073479A
Authority
CN
China
Prior art keywords
switch matrix
current
request
shuffle
data granularity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010559590
Other languages
Chinese (zh)
Other versions
CN102073479B (en)
Inventor
陈书明
刘胜
万江华
陈海燕
陈跃跃
孙海燕
张凯
***
刘祥远
李振涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN 201010559590 priority Critical patent/CN102073479B/en
Publication of CN102073479A publication Critical patent/CN102073479A/en
Application granted granted Critical
Publication of CN102073479B publication Critical patent/CN102073479B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a data granularity-based shuffle switch matrix compression method, which comprises the following steps: step 1, when performing data granularity-based shuffle switch matrix compression for a shuffle request, first setting the data granularity of the current shuffle request to the minimum data granularity; step 2, judging whether the data granularity of the current shuffle request can be doubled; if it can be doubled, skipping to step 3, otherwise skipping to step 4; step 3, doubling the current data granularity, compressing the current shuffle switch matrix according to the doubled granularity, and skipping to step 2; step 4, further compressing the current switch matrix using the current-granularity indexing method; and step 5, judging whether the current shuffle request is the last shuffle request in the program; if not, processing the next shuffle request and skipping to step 1, otherwise ending the algorithm. The data granularity-based shuffle switch matrix compression method is simple in principle, easy and convenient to operate, and can effectively improve both the utilization of the shuffle pattern memory and the compression efficiency.

Description

Data granularity-based shuffle switch matrix compression method
Technical field
The present invention relates generally to the field of general-purpose processors and digital signal processors (Digital Signal Processor, DSP) that adopt single instruction stream, multiple data stream (Single Instruction stream Multiple Data streams, SIMD) technology, and in particular to a shuffle switch matrix compression method applied in a SIMD processor or DSP.
Background art
Parallel and vector techniques, with sub-word SIMD technology as their representative, offer great advantages for data-computation-intensive applications such as media processing and wireless communication. They have therefore been widely adopted and developed, and are particularly favored in processors, especially DSPs. The shuffle unit is mainly responsible for exchanging data among the processing-unit registers of the SIMD component, and is a critical component constraining the performance of the parallel processing units.
The hardware architecture of a shuffle unit generally takes one of three forms: a crossbar switch (Crossbar), a multistage network, or a perfect shuffle-exchange network. Among the three, the Crossbar is the most flexible, the multistage network comes second, and the perfect shuffle-exchange network is the least flexible; the perfect shuffle-exchange network has the lowest hardware cost and the best scalability, the multistage network comes second, and the Crossbar is the worst. Because embedded applications require a wide variety of shuffle patterns, the Crossbar is the most widely used, and the method discussed in the present invention likewise takes a Crossbar-based shuffle-unit hardware structure as its background.
In a shuffle unit whose main hardware structure is a Crossbar, each shuffle pattern corresponds one-to-one to a matrix of 0/1 values that controls whether each switching node in the Crossbar is open or closed. This 0/1 matrix is referred to as the switch matrix, and when (or before) the user executes a shuffle instruction, the Crossbar must be told, directly or indirectly, the switch matrix required by that shuffle. Because the data volume of the switch matrix itself is large, it needs to be compressed. On the one hand, this compression is essential: the data volume of the switch matrix is proportional to the square of the SIMD datapath width, so as the datapath widens, the switch matrix expands rapidly. On the other hand, the complexity of the decompression circuit must be considered, so a fairly simple compression technique should be adopted in order to trade off critical-path delay against compression efficiency.
Traditional shuffle units and shuffle instructions generally compress with the minimum-data-granularity indexing method. Fig. 1 is a schematic diagram of the compression flow of the traditional minimum-data-granularity indexing method: when switch matrix compression is performed for a shuffle request, its data granularity is not considered but simply set to the minimum data granularity; the switch matrix is then index-compressed according to the current granularity; it is then judged whether the current shuffle request is the last shuffle request in the program; if not, the switch matrix of the next shuffle request is compressed, otherwise processing ends.
Take the AltiVec instruction set designed by Motorola Inc. as an example. In the AltiVec instruction set, the vector width is 128 bits (16 bytes) and the minimum shuffle data granularity is 8 bits. The shuffle instruction VPERM in the AltiVec instruction set uses one vector register (128 bits) to express the shuffle pattern: the register is divided into 16 fields (8 bits each), each indicating from which byte of the source vector register the corresponding byte of the target vector register comes (i.e., each field represents one row of the switch matrix). The SHF instruction of the EVP processor from Philips is similar.
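The row-index idea behind such patterns can be illustrated with a short sketch (a generic byte permute under assumed names, not the exact AltiVec VPERM semantics):

```python
# Illustrative sketch of an index-based shuffle pattern: each field of the
# pattern names the source byte that supplies the corresponding destination
# byte, i.e. each field encodes one row of the underlying switch matrix.
# `byte_permute` is a hypothetical helper, not an instruction from the patent.

def byte_permute(src, pattern):
    """Return dst where dst[i] = src[pattern[i]] for a 16-byte vector."""
    assert len(src) == len(pattern) == 16
    return bytes(src[k] for k in pattern)

src = bytes(range(16))                          # bytes 0..15
pattern = list(range(8, 16)) + list(range(8))   # swap the two 8-byte halves
dst = byte_permute(src, pattern)                # bytes 8..15 then 0..7
```

Each 8-bit field of `pattern` plays the role of one compressed row of the 16x16 switch matrix described above.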
This minimum-data-granularity indexing compression method is widely used in present vector processors, but it yields the same compression efficiency for shuffle requests of different data granularities, so its overall compression efficiency is low.
Programs of different applications (or different segments of the same application) often have different data widths. In video algorithms, byte and half-word data dominate; in wireless communication algorithms, complex-number operations (whose data width is a word or double word) account for a large proportion. Consequently, the data granularities of the shuffle requests of different programs (or different segments of the same application) also differ considerably, which opens the possibility for the more efficient switch matrix compression method that the present invention pursues.
Summary of the invention
The technical problem to be solved by the present invention is: in view of the technical problems existing in the prior art, the present invention provides a data granularity-based shuffle switch matrix compression method that is simple in principle and easy to operate, and that can effectively improve the utilization of the shuffle pattern memory and improve the compression efficiency.
To solve the above technical problems, the present invention adopts the following technical solution:
A data granularity-based shuffle switch matrix compression method, characterized by the following steps:
Step 1: when performing data granularity-based shuffle switch matrix compression for a shuffle request, first set the data granularity of the current shuffle request to the minimum data granularity;
Step 2: judge whether the data granularity of the current shuffle request can be doubled; if it can be doubled, go to step 3; otherwise jump to step 4;
Step 3: double the current data granularity, compress the current shuffle switch matrix according to the doubled data granularity, and jump to step 2;
Step 4: according to this shuffle request, further compress the existing switch matrix using the current-granularity indexing method;
Step 5: judge whether the current shuffle request is the last shuffle request in the program; if not, process the next shuffle request and go to step 1; otherwise the algorithm ends.
As a further improvement of the present invention:
The flow of step 2 is:
2.1 Set the variable s, which flags whether the data granularity of the current shuffle request can be doubled, to 1, and set the variables i and j to 0;
2.2 Judge whether the elements of the current shuffle switch matrix x satisfy one of the following two conditions: (1) the four elements x_{i,j}, x_{i+1,j}, x_{i,j+1}, x_{i+1,j+1} are all 0; (2) the two elements x_{i,j}, x_{i+1,j+1} are 1 and the two elements x_{i+1,j}, x_{i,j+1} are 0; if neither is satisfied, set s to 0 and go to 2.5; otherwise go to 2.3;
2.3 Judge whether j equals the size n of the current switch matrix minus 2; if so, go to 2.4; otherwise increase j by 2 and go to 2.2;
2.4 Judge whether i equals the size n of the current switch matrix minus 2; if so, go to 2.5; otherwise increase i by 2, set j to 0, and go to 2.2;
2.5 If s = 1, the data granularity of the current shuffle request can be doubled; otherwise it cannot.
The flow of step 4 is:
4.1 Set the variable i to 0;
4.2 Judge whether the elements of row i of the switch matrix at the current data granularity are all 0; if so, assign 1 to the most significant bit SF_i[log₂n] of field i of the final shuffle pattern SF, assign 0 to the remaining bits SF_i[log₂n-1:0], and jump to 4.4; otherwise assign 0 to SF_i[log₂n], set the variable j to 0, and jump to 4.3;
4.3 Judge whether the element in row i, column j of the switch matrix at the current data granularity is 1; if so, assign j to SF_i[log₂n-1:0] and jump to 4.4; otherwise increase j by 1 and go to 4.3;
4.4 Increase i by 1, and judge whether i is less than the size n of the switch matrix at the current data granularity; if so, jump to 4.2 and continue compressing the next row of the switch matrix; otherwise compression ends, and the resulting SF_0, SF_1, ..., SF_{n-1} form the final shuffle pattern.
Compared with the prior art, the advantages of the present invention are:
1. The proposed shuffle matrix compression technique can effectively improve the utilization of the shuffle pattern memory, so that the memory storing shuffle patterns is fully utilized;
2. Applying the present invention, the total amount of shuffle pattern data required by an application program is smaller, so when the application executes, the amount of shuffle pattern data loaded also decreases correspondingly, reducing power consumption;
3. The decompression circuit is simple and has little impact on the critical-path delay. With the method proposed by the present invention, the decompression circuit only needs a small amount of additional signal-replication logic on top of the decoding circuit of the traditional method, so the hardware complexity is low.
4. In summary, the method proposed by the present invention effectively compresses the size of the switch matrix with little impact on the existing decompression circuit, and provides a new approach to the design of shuffle instructions and shuffle-unit interfaces for vector processors.
Description of drawings
Fig. 1 is a schematic diagram of the compression flow of the minimum-data-granularity indexing method;
Fig. 2 is a schematic flowchart of the data granularity-based shuffle switch matrix compression method of the present invention;
Fig. 3 is a schematic flowchart of the algorithm in the present invention that judges whether the data granularity of the current shuffle request can be doubled;
Fig. 4 is a schematic flowchart of the current-granularity indexing method in a specific implementation;
Fig. 5 is a schematic diagram of the compression flow when the switch matrix of a shuffle request with larger data granularity is compressed with the traditional method;
Fig. 6 is a schematic diagram of the compression flow when the switch matrix of a shuffle request with larger data granularity is compressed with the method proposed by the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
At the preprocessing stage, for each shuffle request, the present invention determines its final shuffle data granularity step by step and compresses the switch matrix according to the granularity determined at each step; finally, it compresses further with the current-granularity indexing method, thereby completing the final compression of the switch matrix.
Suppose the total width of the processor datapath is W bits, the minimum data granularity is G bits, the size of the Crossbar is N×N, and the width of each port is G bits (here W and G are integer powers of 2, and N = W/G). Then the shuffle pattern of each shuffle request is initially an N×N switch matrix; denote it X, where the value of each element is X_{i,j} (0 ≤ i, j ≤ N-1). Suppose the maximum data granularity supported by the processor is M bits; then T = log₂(M/G).
Let the switch matrix of the current shuffle request be x, the size of the current shuffle request's switch matrix be n×n, the elements of the current shuffle request's switch matrix be x_{i,j} (0 ≤ i, j ≤ n-1), and the data granularity of the current shuffle request be g.
As shown in Fig. 2, the flow of the data granularity-based shuffle switch matrix compression method of the present invention is:
Step 1: when performing data granularity-based switch matrix compression for a shuffle request, first set the data granularity of the current shuffle request to the minimum data granularity; that is, set x = X, g = G, n = N, x_{i,j} = X_{i,j} (0 ≤ i, j ≤ N-1), t = 0, and go to step 2;
Step 2: judge whether the data granularity of the current shuffle request can be doubled; if it can be doubled, go to step 3; otherwise jump to step 4;
Step 3: double the current data granularity, compress the current switch matrix according to the doubled data granularity, and jump to step 2. Let the compressed switch matrix be y; the compression method is: set y_{i,j} = x_{2i,2j} (0 ≤ i, j ≤ n/2-1), then set g = 2g, x = y, n = n/2, t = t+1; if t ≤ T, go to step 2, otherwise skip to step 4;
Step 4: according to this shuffle request, further compress the existing switch matrix using the current-granularity indexing method;
Step 5: judge whether the current shuffle request is the last shuffle request in the program; if not, process the next shuffle request and go to step 1; otherwise the algorithm ends.
Fig. 3 is a schematic flowchart of the algorithm in step 2 above that judges whether the data granularity of the current shuffle request can be doubled:
2.1 Set the variable s, which flags whether the data granularity of the current shuffle request can be doubled, to 1, and set the variables i and j to 0. That is, define the variable s to flag whether the data granularity of the current shuffle request is 2g (s = 1 means yes, s = 0 means no), and set s = 1, i = 0, j = 0;
2.2 Judge whether the elements of the current switch matrix x satisfy one of the following two conditions: (1) the four elements x_{i,j}, x_{i+1,j}, x_{i,j+1}, x_{i+1,j+1} are all 0; (2) the two elements x_{i,j}, x_{i+1,j+1} are 1 and the two elements x_{i+1,j}, x_{i,j+1} are 0. If neither is satisfied, set s to 0 and go to 2.5; otherwise go to 2.3;
2.3 Judge whether j equals the size n of the current switch matrix minus 2 (i.e., n-2); if so, go to 2.4; otherwise increase j by 2 (j = j+2) and go to 2.2;
2.4 Judge whether i equals the size n of the current switch matrix minus 2 (i.e., n-2); if so, go to 2.5; otherwise increase i by 2 (i = i+2), set j to 0, and go to 2.2;
2.5 If s = 1, the data granularity of the current shuffle request can be doubled (i.e., the data granularity of the current shuffle request is 2g); otherwise it cannot (i.e., the data granularity of the current shuffle request is g).
As shown in Fig. 4, the current-granularity indexing method is a sub-algorithm shared by the traditional minimum-data-granularity indexing method and the data granularity-based shuffle switch matrix compression method proposed by the present invention, namely step 4 above. Let the final shuffle pattern be SF; then SF has n fields, each of 1 + log₂n bits, and SF_i[j] (0 ≤ i ≤ n-1, 0 ≤ j ≤ log₂n) denotes bit j of field i of the shuffle pattern. The specific flow is:
4.1 Set the variable i to 0;
4.2 Judge whether the elements of row i of the switch matrix at the current data granularity are all 0; if so, assign 1 to the most significant bit SF_i[log₂n] of field i of the final shuffle pattern SF, assign 0 to the remaining bits SF_i[log₂n-1:0], and jump to 4.4; otherwise assign 0 to SF_i[log₂n], set the variable j to 0, and jump to 4.3;
4.3 Judge whether the element in row i, column j of the switch matrix at the current data granularity is 1; if so, assign j to SF_i[log₂n-1:0] and jump to 4.4; otherwise increase j by 1 and go to 4.3;
4.4 Increase i by 1, and judge whether i is less than the size n of the switch matrix at the current data granularity; if so, jump to 4.2 and continue compressing the next row of the switch matrix; otherwise compression ends, and the resulting SF_0, SF_1, ..., SF_{n-1} form the final shuffle pattern.
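Steps 4.1 through 4.4 can be sketched as follows (a minimal illustrative model, not the hardware implementation; the matrix is a list of 0/1 rows and each SF field is returned as a bit string of 1 + log₂n characters):

```python
import math

def granularity_index(x):
    """Steps 4.1-4.4: compress an n x n 0/1 switch matrix into n fields.

    Each field has 1 + log2(n) bits: the MSB SF_i[log2 n] is 1 for an
    all-zero row (remaining bits 0); otherwise the MSB is 0 and the low
    log2(n) bits hold the column of the 1 in row i.
    """
    n = len(x)
    w = int(math.log2(n))            # log2(n) index bits per field
    fields = []
    for i in range(n):               # 4.1 / 4.4: iterate over all rows
        row = x[i]
        if 1 not in row:             # 4.2: all-zero row -> MSB 1, index 0
            fields.append("1" + "0" * w)
        else:                        # 4.3: find the column j of the 1
            j = row.index(1)
            fields.append("0" + format(j, f"0{w}b"))
    return fields
```

For instance, a 4x4 matrix whose rows have their 1 in columns 1, 3, and 0, with the last row all zero, yields the fields "001", "011", "000", "100".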
As can be seen from the above, the method proposed by the present invention is an improvement on the traditional minimum-data-granularity indexing method: if the doubling steps (steps 2 and 3) above are not performed and the indexing step (step 4) is performed directly, the method is exactly the minimum-data-granularity indexing method. The main feature of the proposed method is that it distinguishes the data granularities of shuffle requests: an N×N switch matrix can be compressed to N/2^(x-1) × (1 + log₂(N/2^(x-1))) bits (where x-1 is the number of granularity doublings, i.e., 2^(x-1) is the multiple by which the data granularity of the current shuffle request exceeds the minimum data granularity), whereas the traditional minimum-data-granularity indexing method can only compress it to N × (1 + log₂N) bits. When the granularity of the shuffle request is not equal to the minimum data granularity, i.e., x > 1, the compression effect of the proposed method on the switch matrix is far superior to that of the traditional method.
For the above switch matrix compression method, the present invention proposes the following data granularity-based shuffle instructions: SHUF(B, H, W, DW) (Mode), VRi, VRj, where SHUFB, SHUFH, SHUFW, and SHUFDW indicate that the shuffle granularity is byte, half-word, word, and double word, respectively; Mode indicates the address in memory of the corresponding shuffle pattern (in a specific implementation, Mode may also be omitted, in which case the corresponding shuffle pattern is by default kept in a certain register, and the programmer must move the shuffle pattern into that register in advance); VRi and VRj indicate the register number holding the source operand of the shuffle operation and the register number of the destination operand, respectively.
When a shuffle instruction produced with the method of the present invention is executed and the shuffle pattern is decompressed into a switch matrix, only a small amount of signal-replication logic needs to be added on top of the decoding circuit of the traditional method, so the impact on the critical-path delay is very small.
Fig. 5 is a schematic diagram of the compression flow when the switch matrix of a shuffle request with larger data granularity is compressed with the traditional method. In the figure, each of the elements a, b, c, d, e, f, g, h, A, B, C, D, E, F, G, H is 8 bits wide; the size of the Crossbar in the shuffle unit is 16×16, the port width is 8 bits, and the minimum data granularity of the supported shuffle requests is also 8 bits. The vector abcdefghABCDEFGH is shuffled into the vector efghEFGHabcd0000; the original switch matrix corresponding to this shuffle pattern is a 16×16 0/1 matrix (shown in the lower left of the figure). The traditional minimum-data-granularity indexing method compresses the original switch matrix directly according to the steps shown in Fig. 4. Row 0 of the original switch matrix is 0000_1000_0000_0000; since the elements of this row are not all 0, the most significant bit of field 0 of the compressed shuffle pattern is 0, and since element 4 of this row is 1, the lowest four bits of field 0 of the compressed shuffle pattern are 4 (the compression of rows 1 through 15 of the original switch matrix proceeds likewise and is not repeated here). With the traditional method, the 256-element switch matrix in the example is compressed to 80 bits.
Fig. 6 is a schematic diagram of the compression flow when the switch matrix of a shuffle request with larger data granularity is compressed with the method proposed by the present invention. The shuffle request shown in the figure is clearly identical to the example in Fig. 5. According to the proposed method, the data granularity of the original shuffle request can be doubled to 2× the minimum data granularity, i.e., 16 bits, and the original switch matrix can be compressed into the switch matrix at 2× the minimum data granularity (its size becomes 1/4 of the original). Further, the switch matrix at 2× the minimum granularity can be compressed again at a data granularity of 32 bits, into the switch matrix at 4× the minimum granularity (its size again becomes 1/4, i.e., 1/16 of the original switch matrix). Finally, indexing at the 32-bit granularity (the method described for Fig. 4) compresses it to 001_011_001_100. With the method proposed by the present invention, the 256-element switch matrix in the example is thus finally compressed to 12 bits; relative to the traditional method, the compression ratio of the proposed method is improved by more than 6×.
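The sizes quoted for Fig. 5 and Fig. 6 follow directly from the field formula n × (1 + log₂n) bits; a quick arithmetic check (plain Python, no patent-specific names):

```python
import math

def pattern_bits(n):
    """Shuffle pattern size for an n x n switch matrix:
    n fields of 1 + log2(n) bits each."""
    return n * (1 + int(math.log2(n)))

N = 16                            # 16x16 Crossbar, 8-bit minimum granularity
traditional = pattern_bits(N)     # granularity ignored: 16 * (1 + 4) = 80 bits
proposed = pattern_bits(N // 4)   # two doublings (32-bit granularity): 4 * (1 + 2) = 12 bits
ratio = traditional / proposed    # 80 / 12, i.e. more than 6x
```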
The above are only preferred embodiments of the present invention; the scope of protection of the present invention is not limited to the above embodiments, and all technical solutions falling under the idea of the present invention belong to the scope of protection of the present invention. It should be pointed out that, for those skilled in the art, improvements and modifications made without departing from the principles of the present invention should also be regarded as falling within the scope of protection of the present invention.

Claims (3)

  1. A data granularity-based shuffle switch matrix compression method, characterized by the following steps:
    Step 1: when performing data granularity-based shuffle switch matrix compression for a shuffle request, first set the data granularity of the current shuffle request to the minimum data granularity;
    Step 2: judge whether the data granularity of the current shuffle request can be doubled; if it can be doubled, go to step 3; otherwise jump to step 4;
    Step 3: double the current data granularity, compress the current shuffle switch matrix according to the doubled data granularity, and jump to step 2;
    Step 4: according to this shuffle request, further compress the existing switch matrix using the current-granularity indexing method;
    Step 5: judge whether the current shuffle request is the last shuffle request in the program; if not, process the next shuffle request and go to step 1; otherwise the algorithm ends.
  2. The data granularity-based shuffle switch matrix compression method according to claim 1, characterized in that the flow of step 2 is:
    2.1 Set the variable s, which flags whether the data granularity of the current shuffle request can be doubled, to 1, and set the variables i and j to 0;
    2.2 Judge whether the elements of the current shuffle switch matrix x satisfy one of the following two conditions: (1) the four elements x_{i,j}, x_{i+1,j}, x_{i,j+1}, x_{i+1,j+1} are all 0; (2) the two elements x_{i,j}, x_{i+1,j+1} are 1 and the two elements x_{i+1,j}, x_{i,j+1} are 0; if neither is satisfied, set s to 0 and go to 2.5; otherwise go to 2.3;
    2.3 Judge whether j equals the size n of the current switch matrix minus 2; if so, go to 2.4; otherwise increase j by 2 and go to 2.2;
    2.4 Judge whether i equals the size n of the current switch matrix minus 2; if so, go to 2.5; otherwise increase i by 2, set j to 0, and go to 2.2;
    2.5 If s = 1, the data granularity of the current shuffle request can be doubled; otherwise it cannot.
  3. The data granularity-based shuffle switch matrix compression method according to claim 1, characterized in that the flow of step 4 is:
    4.1 Set the variable i to 0;
    4.2 Judge whether the elements of row i of the switch matrix at the current data granularity are all 0; if so, assign 1 to the most significant bit SF_i[log₂n] of field i of the final shuffle pattern SF, assign 0 to the remaining bits SF_i[log₂n-1:0], and jump to 4.4; otherwise assign 0 to SF_i[log₂n], set the variable j to 0, and jump to 4.3;
    4.3 Judge whether the element in row i, column j of the switch matrix at the current data granularity is 1; if so, assign j to SF_i[log₂n-1:0] and jump to 4.4; otherwise increase j by 1 and go to 4.3;
    4.4 Increase i by 1, and judge whether i is less than the size n of the switch matrix at the current data granularity; if so, jump to 4.2 and continue compressing the next row of the switch matrix; otherwise compression ends, and the resulting SF_0, SF_1, ..., SF_{n-1} form the final shuffle pattern.
CN 201010559590 2010-11-25 2010-11-25 Data size-based shuffle switch matrix compression method Active CN102073479B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010559590 CN102073479B (en) 2010-11-25 2010-11-25 Data size-based shuffle switch matrix compression method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010559590 CN102073479B (en) 2010-11-25 2010-11-25 Data size-based shuffle switch matrix compression method

Publications (2)

Publication Number Publication Date
CN102073479A true CN102073479A (en) 2011-05-25
CN102073479B CN102073479B (en) 2013-03-27

Family

ID=44032029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010559590 Active CN102073479B (en) 2010-11-25 2010-11-25 Data size-based shuffle switch matrix compression method

Country Status (1)

Country Link
CN (1) CN102073479B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104011673A (en) * 2011-12-30 2014-08-27 英特尔公司 Vector Frequency Compress Instruction

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1409520A (en) * 2001-09-26 2003-04-09 深圳市中兴通讯股份有限公司上海第二研究所 Coding method for improving end-to-end service quality
CN1581725A (en) * 2003-08-07 2005-02-16 三星电子株式会社 Method and apparatus for determining a shuffling pattern in a double space-time transmit diversity system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1409520A (en) * 2001-09-26 2003-04-09 深圳市中兴通讯股份有限公司上海第二研究所 Coding method for improving end-to-end service quality
CN1581725A (en) * 2003-08-07 2005-02-16 三星电子株式会社 Method and apparatus for determining a shuffling pattern in a double space-time transmit diversity system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104011673A (en) * 2011-12-30 2014-08-27 英特尔公司 Vector Frequency Compress Instruction
CN104011673B (en) * 2011-12-30 2016-12-07 英特尔公司 Vector frequency compression instruction

Also Published As

Publication number Publication date
CN102073479B (en) 2013-03-27

Similar Documents

Publication Publication Date Title
JP5680697B2 (en) Data processing system and method using scalar / vector instructions
CN102053816B (en) Data shuffling unit with switch matrix memory and shuffling method thereof
CN101986262B (en) Shuffle switch matrix compression method based on mode shift
EP2972792B1 (en) Vector indirect element vertical addressing mode with horizontal permute
CN107548488B (en) Central processing unit with DSP engine and enhanced context switching capability
US20130326190A1 (en) Coarse-grained reconfigurable processor and code decompression method thereof
CN102037443A (en) Multi-mode register file for use in branch prediction
CN102306141B (en) Method for describing configuration information of dynamic reconfigurable array
CN104756069A (en) An element selection unit and a method therein
WO2013059510A2 (en) Determining top n or bottom n data values and positions
CN102073479B (en) Data size-based shuffle switch matrix compression method
CN102662635A (en) Very long instruction word variable long instruction realization method and processor for realizing same
CN112307431B (en) VDSP, data processing method and communication equipment
CN101950279A (en) Method and bus system for balancing data information flow and decoder
US8069335B2 (en) Processing system and method for executing instructions
US20050278504A1 (en) System capable of dynamically arranging coprocessor number
US20080172546A1 (en) Digital signal processor
EP2542963B1 (en) System and method of processing hierarchical very long instruction packets
CN103809933A (en) Reconfigurable instruction encoding method, execution method and electronic device
SE1151231A1 (en) Digital signal processor and baseband communication device
Bonny et al. Instruction re-encoding facilitating dense embedded code
WO2013025641A1 (en) Bit splitting instruction
US20020169811A1 (en) Data processor architecture and instruction format for increased efficiency
CN102004672A (en) Reduction device capable of configuring auto-increment interval of reduction target
Abd El-Hacam Minimizing Power Consumption of Embedded system processors through Instruction-Set compression

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant