CN101477456A - Self-correlated arithmetic unit and processor - Google Patents


Info

Publication number
CN101477456A
Authority
CN
China
Prior art keywords
register
data element
shift register
source operand
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2009101050581A
Other languages
Chinese (zh)
Other versions
CN101477456B (en
Inventor
焦玉中
王新安
倪学文
刘雪娇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hengxing Strategy Investment Limited
Original Assignee
Peking University Shenzhen Graduate School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School filed Critical Peking University Shenzhen Graduate School
Priority to CN2009101050581A priority Critical patent/CN101477456B/en
Publication of CN101477456A publication Critical patent/CN101477456A/en
Application granted granted Critical
Publication of CN101477456B publication Critical patent/CN101477456B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses an autocorrelation operation unit and a processor. The autocorrelation operation unit comprises a dependency number processing unit, a multiplier, an adder, a second shift register and a register. By executing an autocorrelation operation instruction, the processor configures the function of the autocorrelation operation unit, reads the data elements of the source operand in order, performs the autocorrelation operation on them, and outputs the data elements of the destination operand in sequence. The autocorrelation operation instruction therefore needs to be read, decoded and executed only once, which reduces the complexity of the autocorrelation operation.

Description

An autocorrelation operation unit and processor
[Technical Field]
The present invention relates to the field of integrated circuit (IC) design, and in particular to a processor and an autocorrelation operation unit.
[Background Art]
In the communications field, various operations often need to be performed on a sequence of data elements, for example autocorrelation. Fig. 1 is a schematic diagram of the principle of the autocorrelation operation. Data stream 100 moves one position along the data flow direction in each sampling period; the arrow indicates the data flow direction. Data stream 100 comprises several data elements 101. Elements 102 and 103 are, respectively, the data element that has most recently entered window A 106 and the data element that is about to leave it. Elements 104 and 105 are, respectively, the data element that has most recently entered window B 107 and the data element that is about to leave it. The autocorrelation result is the sum of the products of the data elements in window A with the conjugates of the corresponding data elements in window B. When data elements 102 and 104 are the same element, the operation becomes a special case of autocorrelation, namely computing the average power of the data elements. When an existing processor handles autocorrelation, it must fetch two source operands each time, store the product in memory, and then add it to or subtract it from the previous autocorrelation result. The operation is therefore complicated and leaves room for improvement.
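As a point of reference, the windowed autocorrelation described above can be written directly from its definition. The sketch below is illustrative only: the function name, the parameters `window_len` and `distance`, and the choice to treat elements before the start of the stream as zero are assumptions, not part of the patent text.

```python
def windowed_autocorr(x, window_len, distance):
    """Direct (non-recursive) windowed autocorrelation.

    For each position n, window A holds x[n], ..., x[n - window_len + 1]
    and window B holds the elements `distance` positions earlier.
    Elements before the start of the stream are treated as zero (assumption).
    """
    def at(i):
        return x[i] if i >= 0 else 0j

    results = []
    for n in range(len(x)):
        total = 0j
        for k in range(window_len):
            # element of window A times the conjugate of its partner in window B
            total += at(n - k) * at(n - k - distance).conjugate()
        results.append(total)
    return results


stream = [1 + 1j, 2 + 0j, 0 + 1j, 3 - 1j]
print(windowed_autocorr(stream, window_len=2, distance=1))
```

With `distance=0` each element is multiplied by its own conjugate, so the result is the sum of powers over the window, which is the special case mentioned above. Each output here costs `window_len` multiplications and re-reads earlier elements.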
[Summary of the Invention]
The main technical problem to be solved by the present invention is to provide a processor and an autocorrelation operation unit that reduce the complexity of the autocorrelation operation.
To solve the above technical problem, the present invention provides an autocorrelation operation unit comprising a dependency number processing unit, a multiplier, an adder, a second shift register and a register. The input of the dependency number processing unit receives the data elements of the source operand. The inputs of the multiplier receive, respectively, the data elements of the source operand and the output of the dependency number processing unit. The output of the multiplier is connected to both the adder and the second shift register. The inputs of the adder are additionally connected to the output of the second shift register and to the output of the register. The output of the adder is connected to the input of the register and also outputs the data elements of the destination operand.
According to another aspect of the present invention, a processor is also provided, comprising an algorithm data control unit, a configuration register and at least one arithmetic logic unit. The at least one arithmetic logic unit includes at least the autocorrelation operation unit described above, which performs the autocorrelation operation. The algorithm data control unit is connected to the configuration register, and the configuration register is connected to the autocorrelation operation unit. The algorithm data control unit executes an autocorrelation operation instruction and sends first configuration information to the configuration register; the autocorrelation operation unit configures its own function according to the first configuration information.
The beneficial effects of the present invention are as follows. By executing an autocorrelation operation instruction, the processor configures the function of the autocorrelation operation unit and reads the source operand element by element, in the order of its data elements. For each newly read data element, the autocorrelation operation unit multiplies that element by the conjugate of a previously read data element of the source operand, giving a complex or real product. From this product it subtracts the product corresponding to an earlier-read data element of the source operand, and it adds the autocorrelation result corresponding to the data element immediately preceding the newly read one. The result is the autocorrelation result, i.e., the data element of the destination operand, that corresponds to the newly read data element. The data elements of the destination operand are output in sequence. The autocorrelation operation instruction therefore needs to be read, decoded and executed only once, which reduces the complexity of the autocorrelation operation, allows a smaller instruction memory, and lowers the power consumed by instruction decoding and configuration.
[Description of the Drawings]
Fig. 1 is a schematic diagram of the principle of the autocorrelation operation;
Fig. 2A is a structural diagram of one embodiment of the present invention;
Fig. 2B is a structural diagram of another embodiment of the present invention;
Fig. 3A is a structural diagram of one embodiment of the autocorrelation operation unit of the present invention;
Fig. 3B is a structural diagram of another embodiment of the autocorrelation operation unit of the present invention;
Fig. 3C is a structural diagram of a further embodiment of the autocorrelation operation unit of the present invention.
[Detailed Description of the Embodiments]
The present invention is described in further detail below through embodiments, with reference to the accompanying drawings.
The following description covers embodiments of a technique for performing autocorrelation or average-power operations in a processing device, computer or software program. Numerous specific details, such as processor type, microarchitecture and start-up mechanism, are set forth to provide a thorough understanding of the present invention. Those skilled in the art will nevertheless recognize that the invention can be practiced without these specific details. Although the following embodiments are described with reference to a digital signal processor, other embodiments are applicable to other types of integrated circuits and logic devices.
Tailoring data processing to the algorithmic characteristics of the signal processing task can undoubtedly improve the overall performance of a processor. In broadband communications, for example, the baseband processing part of a system generally has a pipelined or stream-processing structure. Each processing module receives data streams of homogeneous data elements and, after a number of repeated operations, generates new data streams of homogeneous data elements. For such highly regular data, a coarser-grained execution unit favors fast processing. For example, one execution unit in the processor can carry out one such repeated operation. Because the execution unit performs the same processing each time, and the location and addressing mode of the source operand are relatively fixed, the instruction does not need to be fetched and decoded again for each repetition. The execution unit is configured once, before or at the start of data stream processing, by a single instruction operation (which may consist of one instruction or several). The execution unit is preferably data-driven, so that it performs one operation for each data element it receives. The instruction can specify the number of data elements needed to complete one task, i.e., the number of data elements in the source operand. Once all data elements of the source operand have been processed and the results output, the execution unit stops immediately and waits for the next configuration and task.
Embodiments of the present invention include a unit for performing the autocorrelation operation. Fig. 1 is a schematic diagram of the principle of the autocorrelation operation. The autocorrelation result is the sum of the complex products of the data elements in window A with the conjugates of the corresponding data elements in window B. Because the result of the previous operation is always known, the current autocorrelation result is simply the previous result, plus the complex product of data element 102 (newly entering window A) with the conjugate of data element 104 (newly entering window B), minus the complex product of data element 103 (about to leave window A) with the conjugate of data element 105 (about to leave window B). Following the direction of flow, the data stream enters window A first. Before window B receives the first data element of the stream, the data elements in window B can be treated as zero, so the autocorrelation value is zero. Once window B receives the first data element, the autocorrelation value starts to carry meaning, and once the first data element of the stream becomes element 105, autocorrelation results start to be output. Accordingly: while data elements have entered window A but not window B, no autocorrelation needs to be performed; once the first data element enters window B, the autocorrelation operation begins, but the result is retained and not output; just before the first data element leaves window B, the autocorrelation operation is performed and the result is both retained and output. From then on, one autocorrelation result is output each time the data stream advances by one data element, until the last data element of the stream enters window A and the final autocorrelation result is output.
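The recursive update described above can be stated compactly. The symbols are introduced here for illustration and do not appear in the patent text: $x[n]$ is the $n$-th data element of the stream, $L$ is the window length, $D$ is the distance between the two windows, and $R[n]$ is the autocorrelation result after element $x[n]$ has entered window A. Elements with negative index are taken as zero.

```latex
R[n] = \sum_{k=0}^{L-1} x[n-k]\,\overline{x[n-k-D]}
\quad\Longrightarrow\quad
R[n] = R[n-1] + x[n]\,\overline{x[n-D]} - x[n-L]\,\overline{x[n-L-D]}
```

Under this indexing, $x[n]$ and $x[n-D]$ play the roles of elements 102 and 104 in Fig. 1, and $x[n-L]$ and $x[n-L-D]$ play the roles of elements 103 and 105. The first result computed over a fully populated window B is $R[D+L-1]$.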
The source operand of the autocorrelation operation is the continuous data stream described above, and it comprises several data elements. Embodiments of the present invention perform the autocorrelation operation on a source operand with a specified number of data elements.
Referring to Fig. 2A, in one embodiment the processor 200 that performs the autocorrelation operation comprises an algorithm data control unit (ADU) 203, a configuration register 205 and at least one arithmetic logic unit. The at least one arithmetic logic unit includes at least an autocorrelation operation unit 201 that performs the autocorrelation operation. The ADU 203 is connected to configuration register 205, and configuration register 205 is connected to autocorrelation operation unit 201. Autocorrelation operation unit 201 is also connected to the source operand input source and to the destination operand output destination. The ADU 203 executes a configuration instruction, which in this embodiment is the autocorrelation operation instruction, and sends first configuration information to configuration register 205; autocorrelation operation unit 201 configures its own function according to the first configuration information.
In this embodiment, the ADU 203 comprises a storage unit 204 for storing instructions or data and a decoding unit 214 for decoding instructions; in other embodiments, it may also comprise other units. The configuration instruction executed by the ADU comprises an opcode, configuration information and a configuration destination. The opcode is the code that specifies the operation the instruction performs; the configuration information is the object the instruction operates on; the configuration destination specifies the configuration register to which the configuration information is written. For the autocorrelation operation instruction, the configuration information comprises the number of data elements in the source operand, the window length of the autocorrelation operation, and the distance between the two windows; in other embodiments it may also comprise other information. The ADU 203 reads the autocorrelation operation instruction from storage unit 204, decoding unit 214 decodes it, and the first configuration information obtained by decoding is written to configuration register 205. Autocorrelation operation unit 201 then configures its function according to the first configuration information, which sets the specific parameters of the autocorrelation operation: the number of data elements in the source operand, the window length, and the distance between the two windows. The data elements of the source operand may come directly from a port of the digital signal processor, or from a register or data memory inside it; that is, the source operand input source may be a port, an internal register or a data memory of the processor. Likewise, the data elements of the destination operand may be stored to a port, an internal register or a data memory of the digital signal processor; that is, the destination operand output destination may also be a port, an internal register or a data memory of the processor.
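The three parts of the configuration instruction (opcode, configuration information, configuration destination) can be pictured as a small record. The following is only a sketch: the class names, field names, the `"AUTOCORR"` opcode string and the `"cfg0"` register name are hypothetical, and the patent does not specify any encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutocorrConfig:
    """Configuration information carried by the autocorrelation instruction."""
    num_elements: int      # number of data elements in the source operand
    window_length: int     # window length of the autocorrelation operation
    window_distance: int   # distance between the two windows


@dataclass(frozen=True)
class ConfigInstruction:
    opcode: str               # which operation the instruction performs
    config: AutocorrConfig    # object the instruction operates on
    dest_register: str        # configuration register to be written


def execute_config(instr, config_registers):
    """Model of the control unit writing decoded configuration information
    into the configuration register named by the instruction."""
    config_registers[instr.dest_register] = instr.config
    return config_registers


registers = execute_config(
    ConfigInstruction("AUTOCORR", AutocorrConfig(64, 16, 16), "cfg0"), {}
)
```

After this single write, the operation unit would take its parameters from `registers["cfg0"]` for the whole stream, which is why the instruction needs to be decoded only once.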
The processor executes on the source operand a single autocorrelation instruction with multiple data inputs and multiple data outputs. It configures the function of the autocorrelation operation unit, reads and processes the source operand in the order of its data elements, and outputs the data elements of the destination operand in sequence. The autocorrelation operation instruction therefore needs to be read, decoded and executed only once, which reduces the complexity of the autocorrelation operation.
Referring to Fig. 2B, in another embodiment the processor 200 differs from the embodiment above mainly in that it also comprises an interconnect logic unit 206, to which configuration register 205 is also connected. The ADU 203 also executes other configuration instructions and sends second configuration information to configuration register 205; interconnect logic unit 206 configures the input path of the source operand data and the output path of the destination operand data according to the second configuration information.
In this embodiment, processor 200 comprises: a plurality of arithmetic logic units (ALUs) 202 that perform digital signal processing; the algorithm data control unit (ADU) 203, which reads and decodes instructions to produce configuration information for functions and connections; the configuration register (Config) 205, which stores the configuration and control information obtained by instruction decoding; the interconnect logic unit 206, which configures the connections between ALUs and ports (Ports), among the ALUs, and among the ports; and the ports 208, which connect to bus 209 linking to units outside the processor. The functions an ALU can perform include, but are not limited to, arithmetic and logic operations such as addition, subtraction, multiplication, multiply-accumulate, AND, OR, XOR, arithmetic/logical left shift, arithmetic/logical right shift, comparison and transfer. Some ALUs, such as ALU 201 that performs the autocorrelation operation, can carry out more complex tasks: once configured, ALU 201 can process the autocorrelation of a data stream comprising a plurality of data elements. Interconnect logic unit 206 contains registers (Reg) 207 used to exchange data among the ALUs. Buses 211 and 212 are the control buses that configure, respectively, the ALU functions and the connections between ALUs and ports. Bus 210 is the data bus between the ALUs and interconnect logic unit 206. Bus 213 is the data bus between ports 208 and interconnect logic unit 206.
In this embodiment, the autocorrelation operation proceeds as follows. The ADU 203 reads and decodes the autocorrelation instruction in storage unit (MEM) 204 and writes the resulting function and connection configuration information to configuration register 205. The autocorrelation ALU 201 and interconnect logic unit 206 then configure their function and connections according to the information in configuration register 205. The function configuration sets the specific parameters of the autocorrelation operation, such as the number of data elements in the source operand, the window length, and the distance between the two windows. The connection configuration sets the locations of the source and destination operands. The source operand input source may be at least one of port 208, internal register 207 and the data memory of the processor. The destination operand output destination may likewise be at least one of port 208, internal register 207 and the data memory, and may be a different port or register from the source operand input source. Interconnect logic unit 206 selects the input and output paths according to the second configuration information. For example, once configuration is complete, autocorrelation ALU 201 can read data from port 208 or register (Reg) 207, begin continuous autocorrelation operations, and write the results to port 208 or register (Reg) 207.
In the above embodiments the processor is data-driven: it processes data whenever data is available, stores each output in the corresponding port or register, and then waits. After all data elements of the source operand have been processed and all data elements of the destination operand have been output, it stops processing and waits for the next configuration.
The autocorrelation operation unit that performs the autocorrelation operation in the above embodiments comprises a dependency number processing unit, a multiplier, an adder, a second shift register and a register. The input of the dependency number processing unit receives the data elements of the source operand. The inputs of the multiplier receive, respectively, the data elements of the source operand and the output of the dependency number processing unit. The output of the multiplier is connected to both the adder and the second shift register. The inputs of the adder are additionally connected to the output of the second shift register and to the output of the register. The output of the adder is connected to the input of the register and also outputs the data elements of the destination operand.
The dependency number processing unit determines the dependency number, i.e., the value by which the incoming data element of the source operand is multiplied. The incoming data element is the element that has just entered window A. When the data elements of the source operand are real, the dependency number is the data element that has just entered window B; when they are complex, it is the conjugate of the data element that has just entered window B.
The following description takes complex data elements in the source operand as an example.
Fig. 3A is a logic diagram of one embodiment of the autocorrelation operation unit described above, in which the data elements of the source operand are complex. The dependency number processing unit comprises a first shift register (the source operand data element shift register) 300 and a conjugate unit 306. The input of shift register 300 receives the data elements of the source operand, its output is connected to the input of conjugate unit 306, and the output of conjugate unit 306 is connected to the multiplier. In this embodiment the multiplier is a complex multiplier 307; the adder is a complex adder 308, which specifically comprises two complex adders and one complex subtractor; the second shift register is the complex product shift register 301; and the register is the autocorrelation result register 309.
Each incoming data element of the source operand is sent both to complex multiplier 307 and to the leftmost register position 303 of shift register 300. Shift register 300 stores the source operand or part of its data elements. Each time a new data element of the source operand is received, the data in shift register 300 moves one position in a fixed direction, one previously stored data element is discarded, and the new data element is written into the register position vacated by the shift. The last register position 302 of shift register 300 corresponds to data element 104 in Fig. 1, the element newly entering window B. The effective length of shift register 300 is therefore the distance between the elements at the entry positions of the two windows in Fig. 1, i.e., the distance between window A and window B. The data element about to be discarded by the shift (the one in register position 302) passes through conjugate unit 306, and the result is multiplied, as a complex number, by the newly received data element of the source operand.
The output of complex multiplier 307 is sent both to complex adder 308 and to the leftmost register position 304 of complex product shift register 301. Shift register 301 stores the complex products corresponding to part of the data elements of the source operand. Each time a new complex product is produced, the data in shift register 301 moves one position in a fixed direction, one previously stored product is discarded, and the new product is written into the register position vacated by the shift. The length of shift register 301 is the window length of the autocorrelation operation, i.e., the length of window A or window B, which are equal. Register position 305 is its last position; the value stored there corresponds to the product of data element 103 in Fig. 1 with the conjugate of data element 105.
Complex adder 308 receives three inputs: the previous autocorrelation value, the current output of the complex multiplier, and the value stored in register position 305. The value from register position 305 is subtracted by the subtractor within complex adder 308. The output of the complex adder is both stored in autocorrelation result register 309 and output as a data element of the destination operand. Register 309 stores the autocorrelation result corresponding to the data element immediately preceding the newly read data element of the source operand.
Each of the two shift registers automatically shifts by one register position in a specified direction every time it receives a data element. In this embodiment both shift registers shift right by one position per data element received; in other embodiments they may instead be defined to shift left.
The autocorrelation computation of this embodiment proceeds as follows. The newly read data element of the source operand is multiplied by the conjugate of a previously read data element of the source operand, giving a complex or real product. The product corresponding to an earlier-read data element of the source operand is subtracted from this product, and the autocorrelation result corresponding to the data element immediately preceding the newly read one is added. The result is the autocorrelation result of the destination operand corresponding to the newly read data element.
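The datapath of Fig. 3A can be mimicked in software to check the sequence of operations. The following is a behavioral sketch, not the patented circuit: the class and attribute names are hypothetical, the two shift registers are modeled with `collections.deque`, both registers are assumed to start at zero, and the `output_valid` flag models the rule that results are retained but not output until window B is full.

```python
from collections import deque


class AutocorrUnit:
    """Behavioral model: two shift registers, a multiplier,
    an adder/subtractor and a result register (names are illustrative)."""

    def __init__(self, window_len, distance):
        if window_len < 1 or distance < 0:
            raise ValueError("window_len must be >= 1 and distance >= 0")
        self.src = deque([0j] * distance)     # first shift register
        self.prod = deque([0j] * window_len)  # second (product) shift register
        self.result = 0j                      # autocorrelation result register
        self.warmup = window_len + distance - 1
        self.count = 0

    def step(self, x):
        """Consume one data element; return (result, output_valid)."""
        if self.src:
            partner = self.src.pop()      # element about to be discarded
            self.src.appendleft(x)        # new element enters at the front
        else:
            partner = x                   # zero distance: windows coincide
        product = x * partner.conjugate()
        oldest = self.prod.pop()          # product about to be discarded
        self.prod.appendleft(product)
        # previous result + new product - discarded product
        self.result = self.result + product - oldest
        valid = self.count >= self.warmup
        self.count += 1
        return self.result, valid


unit = AutocorrUnit(window_len=2, distance=1)
outputs = [unit.step(x) for x in [1 + 1j, 2 + 0j, 0 + 1j, 3 - 1j]]
```

Each call to `step` performs one multiplication, one addition and one subtraction regardless of the window length, which reflects the data-driven, one-operation-per-element behavior described in the text.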
Fig. 3B is a logic diagram of another embodiment of the autocorrelation operation unit described above. The data elements of the source operand are again complex. The difference from the embodiment of Fig. 3A is that the positions of conjugate unit 306 and source operand data element shift register 300 are exchanged. The first shift register is the source operand data element conjugate shift register 310: each incoming data element of the source operand is conjugated directly and then stored in shift register 310. The first and last register positions of shift register 310 are 311 and 312, respectively.
When the data elements of the source operand are real, the dependency number processing unit does not need to take the conjugate and therefore comprises only the first shift register, whose length is the same as that of the first shift register in Figs. 3A and 3B. The input of the first shift register receives the data elements of the source operand, and its output is connected to the multiplier, so that the data element shifted out is multiplied by the currently incoming data element. Correspondingly, the multiplier and the adder are a real multiplier and a real adder.
The effective lengths of the shift registers in the above embodiments are determined by the function configuration information obtained from instruction decoding. In other words, the autocorrelation operation unit contains two relatively long shift registers, and when an autocorrelation operation is performed, the effective lengths actually used for source operand data element shift register 300 (or source operand data element conjugate shift register 310) and complex product shift register 301 are each set according to the function configuration information. As described above, the effective length of shift register 300 (or 310) is the distance between the two windows of the autocorrelation operation, i.e., the distance between the elements at the entry positions of the two windows in Fig. 1, and the effective length of shift register 301 is the window length of the autocorrelation operation.
When the effective length of shift register 300 (or 310) is zero, the two windows coincide, and the operation changes from autocorrelation to power computation.
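One way to picture a physically long shift register with a configurable effective length is a fixed array whose output tap is selected by configuration. This is a sketch under assumptions: the class and method names are hypothetical, the register is cleared on reconfiguration, and the patent does not describe how the effective length is implemented in hardware.

```python
class ConfigurableShiftRegister:
    """Fixed-capacity shift register whose output tap is set by configuration."""

    def __init__(self, capacity):
        self.cells = [0] * capacity
        self.effective_length = capacity

    def configure(self, effective_length):
        if not 0 <= effective_length <= len(self.cells):
            raise ValueError("effective length exceeds physical capacity")
        self.effective_length = effective_length
        self.cells = [0] * len(self.cells)  # assumption: cleared when reconfigured

    def shift_in(self, value):
        """Return the value at the last effective position, then shift."""
        if self.effective_length == 0:
            return value                     # bypass: the two windows coincide
        out = self.cells[self.effective_length - 1]
        self.cells = [value] + self.cells[:-1]
        return out


reg = ConfigurableShiftRegister(capacity=8)
reg.configure(effective_length=2)
delayed = [reg.shift_in(v) for v in [1, 2, 3, 4]]
```

With an effective length of 2 the register behaves as a two-element delay; with an effective length of 0 the incoming element is paired with itself, which corresponds to the power computation case.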
Fig. 3C is a logic diagram of the arithmetic logic unit for the average power operation. When the effective length of the source operand data element shift register 300 or of the source operand data element conjugate shift register 310 is zero, the autocorrelation operation unit can be used to compute the average power of a given number of data elements. Suppose the average power of the data elements in window A is required. The power of each element is the square of its absolute value, so the algorithm squares the absolute value of each element in window A (obtained by multiplying the data element by its own conjugate) and then averages over the number of elements in the window. To compute the average power of a given number of data elements, an averaging logic unit 313 must be added to the foregoing embodiment. In addition, the conjugate unit 306 takes the conjugate of the most recently read data element of the source operand directly, and the complex multiplier 307 multiplies the most recently read data element by its conjugate, yielding a real value. The complex multiplication result shift register 301 stores the real products corresponding to some of the data elements of the source operand. Each time a new real product is produced, the data in the complex multiplication result shift register shifts by one position in a given direction, the real product corresponding to one previously stored data element is discarded, and the new real product is written into the register cell vacated by the shift.
The correlation value register 309 is replaced by a power value register 314, which stores the power sum corresponding to the data element read immediately before the most recently read data element of the source operand. The complex adder 309 is replaced by a real adder 315. The real adder 315 subtracts the real product that is about to be discarded from the complex multiplication result shift register from the newly produced real product, and adds the power sum stored in the power value register for the previous data element, giving the power sum corresponding to the most recently read data element of the source operand. The averaging logic unit 313 averages the power sum produced by the real adder, giving the average power of the corresponding part of the source operand. Because the output of the real adder 315 is the sum of the powers of several data elements, the function of the averaging logic unit 313 is to obtain the average power of the data elements within a window of a given length. The simplest implementation is a right shift.
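The running-sum update and the averaging step described above can be written compactly. This is only a sketch in notation introduced here, not taken from the patent: x_n is the n-th data element read, L is the window length, p_n is the real product, S_n is the power sum held in the power value register, and the right-shift form assumes that L is a power of two and that S_n is a non-negative integer.

```latex
\begin{aligned}
p_n &= x_n \, x_n^{*} = |x_n|^2 \\
S_n &= S_{n-1} + p_n - p_{n-L} \\
\bar{P}_n &= \frac{S_n}{L}, \qquad
\left\lfloor \frac{S_n}{2^k} \right\rfloor = S_n \gg k \quad \text{when } L = 2^k
\end{aligned}
```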
The operating process of this embodiment is as follows: the most recently read data element of the source operand is complex-multiplied by its own conjugate, giving a real product; the product corresponding to a previously read data element of the source operand is subtracted from this product, the power sum corresponding to the data element read immediately before the most recently read one is added, and the result is averaged, giving one average-power data element of the destination operand corresponding to the most recently read data element of the source operand.
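The operating process above can be illustrated in software. The following is a minimal sketch, not the patented circuit: the function name `sliding_average_power`, the representation of each data element as an integer `(re, im)` pair, the assumption that all registers start cleared to zero, and the restriction of the window length to a power of two (so that averaging is a right shift) are all choices made here for illustration.

```python
from collections import deque


def sliding_average_power(samples, window_len):
    """Return one average-power value per input sample.

    samples    : iterable of (re, im) integer pairs (hypothetical input format)
    window_len : window length; assumed to be a power of two so that the
                 averaging step can be a plain right shift
    """
    shift = window_len.bit_length() - 1
    if window_len < 1 or (1 << shift) != window_len:
        raise ValueError("window_len must be a power of two")

    # Models the multiplication result shift register; assumed cleared at start.
    products = deque([0] * window_len)
    # Models the power value register (power sum for the previous element).
    power_sum = 0
    out = []
    for re, im in samples:
        # x * conj(x) is real and equals re^2 + im^2.
        new_product = re * re + im * im
        # Shift by one position: the oldest product is discarded and the new
        # product takes the vacated cell.
        dropped = products.popleft()
        products.append(new_product)
        # Adder: new product minus discarded product plus previous power sum.
        power_sum = new_product - dropped + power_sum
        # Averaging logic: divide by the window length with a right shift.
        out.append(power_sum >> shift)
    return out
```

Because the registers are assumed to start at zero, the first `window_len - 1` outputs average over a partially filled window. Each step costs one multiplication pair, one subtraction, and one addition regardless of the window length, which is the point of keeping the running sum.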
The structure and principles of the present invention can readily be applied to other types of circuits or semiconductor devices that can benefit from higher pipeline throughput and improved performance. The principles of the invention apply to any processor or machine that performs data operations; however, the invention is not limited to processors or machines that perform 64-bit, 32-bit, or 16-bit data operations.
According to one aspect of the invention, a machine-readable medium storing instructions is provided. When executed by a machine, the instructions cause the machine to perform a method comprising the following steps: continuously reading the data elements of a source operand having a plurality of complex or real data values; determining the correlation results between a plurality of newly read data elements of the source operand and a plurality of previously or newly read data elements of the source operand; and storing the correlation results, that is, continuously storing a plurality of data elements of a destination operand having complex or real data values.
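The three steps of this method (read, correlate, store) can be sketched as a streaming loop. This is an illustrative model under stated assumptions, not the claimed implementation: the function name `streaming_autocorrelation` and its parameters `window_len` and `distance` are hypothetical, Python `complex` values stand in for the data elements, the conjugate is applied to the delayed element, and all registers are assumed to start at zero.

```python
from collections import deque


def streaming_autocorrelation(samples, window_len, distance):
    """Return one correlation result per input sample.

    window_len : number of products summed (length of the product register)
    distance   : delay between the two windows (length of the delay register);
                 0 means each element is correlated with itself
    """
    if window_len < 1 or distance < 0:
        raise ValueError("window_len must be >= 1 and distance >= 0")

    delay = deque([0j] * distance)        # delay line for source elements
    products = deque([0j] * window_len)   # most recent window_len products
    acc = 0j                              # running correlation value
    out = []
    for x in samples:
        # Step 1: read the next source element and fetch the element that
        # was read `distance` samples earlier.
        if distance > 0:
            delayed = delay.popleft()
            delay.append(x)
        else:
            delayed = x
        # Step 2: correlate. Multiply by the conjugate of the delayed
        # element, then update the running sum by adding the new product
        # and subtracting the one that falls out of the window.
        product = x * delayed.conjugate()
        dropped = products.popleft()
        products.append(product)
        acc = acc + product - dropped
        # Step 3: store the result as the next destination element.
        out.append(acc)
    return out
```

With `distance=0` the same loop yields the windowed power sum, which matches the special case described earlier in which the two windows coincide.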
The above is a further detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention shall not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the inventive concept, and all of these shall be regarded as falling within the scope of protection of the invention.

Claims (10)

1. An autocorrelation operation unit, characterized by comprising: a dependency number processing unit, a multiplier, an adder, a second shift register, and a register; the input of the dependency number processing unit receives the data elements of a source operand; the inputs of the multiplier receive, respectively, the data elements of the source operand and the output of the dependency number processing unit; the output of the multiplier is connected to the adder and to the second shift register; the inputs of the adder are also connected to the output of the second shift register and to the output of the register; and the output of the adder is connected to the input of the register and also outputs the data elements of a destination operand.
2. The autocorrelation operation unit according to claim 1, characterized in that: the dependency number processing unit comprises a first shift register and a conjugate unit; the input of the first shift register receives the data elements of the source operand and its output is connected to the input of the conjugate unit, the output of the conjugate unit being connected to the multiplier; or the input of the conjugate unit receives the data elements of the source operand and its output is connected to the input of the first shift register, the output of the first shift register being connected to the multiplier.
3. The autocorrelation operation unit according to claim 1, characterized in that: the dependency number processing unit comprises a first shift register; the input of the first shift register receives the data elements of the source operand, and its output is connected to the multiplier.
4. The autocorrelation operation unit according to claim 2 or 3, characterized in that: the length of the first shift register is the distance between the two windows of the autocorrelation operation, and the length of the second shift register is the window length of the autocorrelation operation.
5. The autocorrelation operation unit according to claim 4, characterized in that: each time the first shift register and the second shift register receive a data item, they automatically shift by one register position in a specified direction.
6. The autocorrelation operation unit according to claim 2 or 3, characterized in that: the length of the first shift register is 0; the autocorrelation operation unit further comprises an averaging logic unit, whose input is connected to the output of the adder and whose output outputs the data elements of the destination operand.
7. A processor comprising an algorithm data control component, characterized by further comprising: a configuration register and at least one arithmetic logic unit, the arithmetic logic unit comprising at least one autocorrelation operation unit according to any one of claims 1 to 6 for performing autocorrelation operations; the algorithm data control component is connected to the configuration register, and the configuration register is connected to the autocorrelation operation unit; the algorithm data control component executes an autocorrelation operation instruction and sends first configuration information to the configuration register, and the autocorrelation operation unit configures its own function according to the first configuration information.
8. The processor according to claim 7, characterized in that: the first configuration information comprises the number of data elements of the source operand, the window length of the autocorrelation operation, and the distance between the two windows.
9. The processor according to claim 7 or 8, characterized by further comprising an interconnect logic unit; the configuration register is also connected to the interconnect logic unit; the algorithm data control component also executes a configuration instruction and sends second configuration information to the configuration register; and the interconnect logic unit configures the input path of the source operand data and the output path of the destination operand data according to the second configuration information.
10. The processor according to claim 7, characterized in that: the algorithm data control component comprises a storage unit for storing instructions or data and a decoding unit for decoding instructions.
CN2009101050581A 2009-01-14 2009-01-14 Self-correlated arithmetic unit and processor Expired - Fee Related CN101477456B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101050581A CN101477456B (en) 2009-01-14 2009-01-14 Self-correlated arithmetic unit and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009101050581A CN101477456B (en) 2009-01-14 2009-01-14 Self-correlated arithmetic unit and processor

Publications (2)

Publication Number Publication Date
CN101477456A true CN101477456A (en) 2009-07-08
CN101477456B CN101477456B (en) 2011-06-08

Family

ID=40838178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101050581A Expired - Fee Related CN101477456B (en) 2009-01-14 2009-01-14 Self-correlated arithmetic unit and processor

Country Status (1)

Country Link
CN (1) CN101477456B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102103486A (en) * 2009-12-22 2011-06-22 英特尔公司 Add instructions to add three source operands
CN110535847A (en) * 2019-08-23 2019-12-03 北京无极芯动科技有限公司 The stacking processing method of network processing unit and network data
CN111124492A (en) * 2019-12-16 2020-05-08 海光信息技术有限公司 Instruction generation method and device, instruction execution method, processor and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1753395A (en) * 2004-09-24 2006-03-29 松下电器产业株式会社 Symbol timing method for multi-antenna wireless communication system
CN101123477A (en) * 2006-07-28 2008-02-13 三星电机株式会社 Systems, methods, and apparatuses for a long delay generation technique for spectrum-sensing of cognitive radios

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102103486A (en) * 2009-12-22 2011-06-22 英特尔公司 Add instructions to add three source operands
CN102103486B (en) * 2009-12-22 2016-03-30 英特尔公司 For the add instruction that three source operands are added
CN110535847A (en) * 2019-08-23 2019-12-03 北京无极芯动科技有限公司 The stacking processing method of network processing unit and network data
CN110535847B (en) * 2019-08-23 2021-08-31 极芯通讯技术(南京)有限公司 Network processor and stack processing method of network data
CN111124492A (en) * 2019-12-16 2020-05-08 海光信息技术有限公司 Instruction generation method and device, instruction execution method, processor and electronic equipment
WO2021120712A1 (en) * 2019-12-16 2021-06-24 成都海光微电子技术有限公司 Instruction generation method and apparatus, instruction execution method, processor, electronic device, and storage medium

Also Published As

Publication number Publication date
CN101477456B (en) 2011-06-08

Similar Documents

Publication Publication Date Title
EP3602278B1 (en) Systems, methods, and apparatuses for tile matrix multiplication and accumulation
US10445451B2 (en) Processors, methods, and systems for a configurable spatial accelerator with performance, correctness, and power reduction features
CN109240746B (en) Apparatus and method for performing matrix multiplication operation
EP3343388A1 (en) Processors, methods, and systems with a configurable spatial accelerator
CN104899182B (en) A kind of Matrix Multiplication accelerated method for supporting variable partitioned blocks
US20190095369A1 (en) Processors, methods, and systems for a memory fence in a configurable spatial accelerator
US20180189231A1 (en) Processors, methods, and systems with a configurable spatial accelerator
CN111310910A (en) Computing device and method
KR101918464B1 (en) A processor and a swizzle pattern providing apparatus based on a swizzled virtual register
CN111651203B (en) Device and method for executing vector four-rule operation
EP3719638A2 (en) Apparatuses, methods, and systems for transpose instructions of a matrix operations accelerator
CN112579159A (en) Apparatus, method and system for instructions for a matrix manipulation accelerator
CN111027690B (en) Combined processing device, chip and method for performing deterministic reasoning
EP3623940A2 (en) Systems and methods for performing horizontal tile operations
WO2013095629A1 (en) Apparatus and method for vector instructions for large integer arithmetic
CN102156836A (en) Elliptic curve cipher processor
CN102682232B (en) High-performance superscalar elliptic curve cryptographic processor chip
CN102360281A (en) Multifunctional fixed-point media access control (MAC) operation device for microprocessor
CN101477456B (en) Self-correlated arithmetic unit and processor
CN116710912A (en) Matrix multiplier and control method thereof
CN109144472B (en) Scalar multiplication of binary extended field elliptic curve and implementation circuit thereof
US20130311753A1 (en) Method and device (universal multifunction accelerator) for accelerating computations by parallel computations of middle stratum operations
Stepchenkov et al. Recurrent data-flow architecture: features and realization problems
CN102012802B (en) Vector processor-oriented data exchange method and device
US10445099B2 (en) Reconfigurable microprocessor hardware architecture

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: JI GANG

Free format text: FORMER OWNER: PEKING UNIVERSITY SHENZHEN GRADUATE SCHOOL

Effective date: 20120803

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518055 SHENZHEN, GUANGDONG PROVINCE TO: 519015 ZHUHAI, GUANGDONG PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20120803

Address after: 519015 Guangdong Province, Zhuhai city Xiangzhou District Jiuzhou Jiuzhou Avenue East Lane 12, Room 401

Patentee after: Ji Gang

Address before: 518055 Guangdong city in Shenzhen Province, Nanshan District City Xili Shenzhen University North Campus

Patentee before: Shenzhen Graduate School of Peking University

ASS Succession or assignment of patent right

Owner name: BEIJING ANCE HENGXING INVESTMENT CO., LTD.

Free format text: FORMER OWNER: JI GANG

Effective date: 20120924

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 519015 ZHUHAI, GUANGDONG PROVINCE TO: 100142 HAIDIAN, BEIJING

TR01 Transfer of patent right

Effective date of registration: 20120924

Address after: 100142 Beijing city Haidian District enjizhuang District F No. 46 room 338

Patentee after: Beijing Hengxing Strategy Investment Limited

Address before: 519015 Guangdong Province, Zhuhai city Xiangzhou District Jiuzhou Jiuzhou Avenue East Lane 12, Room 401

Patentee before: Ji Gang

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110608

Termination date: 20200114
