CN102713875B - Multi-stage multiplexing operation including combined selection and data alignment or data replication - Google Patents

Info

Publication number
CN102713875B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201180005892.9A
Other languages
Chinese (zh)
Other versions
CN102713875A (en)
Inventor
阿贾伊·阿南特·英格尔
林任从
拉胡尔·R·托莱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14: Handling requests for interconnection or transfer
    • G06F13/16: Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668: Details of memory controller
    • G06F13/1678: Details of memory controller using bus width

Abstract

A multi-stage multiplexing operation including combined selection and data alignment or data replication is disclosed. In a particular embodiment, a method includes performing a first stage of a multi-stage multiplexing operation. During the first stage, a first data source is selected from a first plurality of data sources. Also during the first stage, at least one of a first data alignment operation and a first data replication operation is performed on first data from the selected first data source.

Description

Multi-stage multiplexing operation including combined selection and data alignment or data replication
Technical field
The present invention relates generally to multi-stage multiplexing operations.
Background
Advances in technology have resulted in smaller and more powerful computing devices. For example, there currently exist a variety of portable personal computing devices, including wireless computing devices such as portable wireless telephones, personal digital assistants (PDAs), and paging devices, that are small, lightweight, and easily carried by users. More specifically, portable wireless telephones, such as cellular telephones and Internet Protocol (IP) telephones, can communicate voice and data packets over wireless networks. Further, many such wireless telephones include other types of devices incorporated therein. For example, a wireless telephone can also include a digital still camera, a digital video camera, a digital recorder, and an audio file player. Such wireless telephones can also process executable instructions, including software applications, such as a web browser application, that can be used to access the Internet. As such, these wireless telephones can include significant computing capabilities.
Electronic devices such as wireless telephones typically include data storage such as memory. At an electronic device, executing an instruction (e.g., a processor instruction) that operates on data resident in memory generally involves three separate phases. In the first phase, the desired data is selected and retrieved from the memory. In the second phase, the selected data is manipulated (e.g., aligned, sign or zero padded, sign or zero extended, or replicated). In the third phase, the manipulated data is operated on in accordance with the instruction. The first and second phases may take multiple processor cycles. Thus, in some cases, retrieving and preparing the data may take longer (i.e., more processor cycles) than the computation performed on that data to obtain the desired result of the instruction.
Summary of the invention
Systems, methods, and computer-readable storage media are disclosed that provide data load and store processes in which data selection can be performed concurrently with data manipulation. Data manipulation may include data replication (e.g., replicating one or more words, half-words, or bytes of data) or data alignment (e.g., shifting data left, shifting data right, sign or zero extending data, or sign or zero padding data). One or more stages of a multi-stage multiplexing operation used to retrieve data from memory can concurrently select between multiple data sources and modify (e.g., via data replication or data alignment) the data from the selected data source. Data retrieval and data manipulation can thus be integrated, which may improve the performance of memory-related operations.
In a particular embodiment, a method is disclosed that includes, during a first stage of a multi-stage multiplexing operation, selecting a first data source from a plurality of data sources. The method also includes, during the first stage, performing at least one of a first data alignment operation and a first data replication operation on first data retrieved from the selected first data source.
In another particular embodiment, an apparatus includes a memory that includes a plurality of data sources. The apparatus also includes a load aligner configured to selectively perform at least one of a data alignment operation and a data replication operation on a plurality of segments of a double word retrieved from one of the plurality of data sources. A multiplexing operation is also performed on the plurality of data sources. The data alignment operation or the data replication operation is performed concurrently with the multiplexing operation.
In another particular embodiment, an apparatus includes sign or zero extension logic configured to selectively sign or zero extend a plurality of segments of a double word retrieved from one of a plurality of data sources. The sign or zero extension is performed concurrently with a multiplexing operation.
In another particular embodiment, an apparatus includes a plurality of means for storing data. The apparatus also includes load aligner means for selectively performing at least one of a data alignment operation and a data replication operation on a plurality of segments of a double word retrieved from one of the plurality of means for storing data, in parallel with a multiplexing operation performed on the plurality of means for storing data. The apparatus also includes extension means for selectively sign extending or zero extending the plurality of segments of the double word in parallel with the multiplexing operation. The apparatus further includes padding means for selectively sign padding or zero padding the plurality of segments of the double word in parallel with the multiplexing operation.
In another particular embodiment, a computer-readable medium is disclosed. The computer-readable medium includes microprocessor instructions that, when executed by a microprocessor, cause the microprocessor to perform a first stage of a multi-stage multiplexing operation. Performing the first stage includes performing a first partial multiplexing operation that includes selecting a first data source from a first plurality of data sources. Performing the first stage also includes performing at least one of a first data alignment operation and a first data replication operation on first data received from the selected first data source. The first data alignment operation or the first data replication operation is performed on the first data at a word level.
One particular advantage provided by at least one of the disclosed embodiments is the ability to select and manipulate (e.g., via alignment or replication) data during a single stage of a multi-stage multiplexing operation. Another particular advantage provided by at least one of the disclosed embodiments is a reduction in the processor cycles used to retrieve, select, and prepare data to be operated on by an execution unit in accordance with a microprocessor instruction.
Other aspects, advantages, and features of the present invention will become apparent after review of the entire application, including the description of the drawings, the detailed description, and the claims.
Brief description of the drawings
Fig. 1 is a block diagram of a particular illustrative embodiment of a system that performs a multi-stage multiplexing (MUX) operation including one or more stages with combined data alignment or data replication;
Fig. 2 is a block diagram of a particular illustrative embodiment of the multi-stage MUX logic of Fig. 1;
Fig. 3 is a block diagram illustrating the operation of the logic of Fig. 2;
Fig. 4 is a circuit-level diagram of a particular illustrative embodiment of a circuit that includes the multi-stage MUX logic of Fig. 1 and the logic of Fig. 2;
Fig. 5 is a circuit-level diagram illustrating the operation of the logic of Fig. 2 as depicted in Fig. 3;
Fig. 6 is a diagram illustrating particular embodiments of data access patterns supported by the multi-stage MUX logic of Fig. 1, the logic of Fig. 2, and the circuit of Fig. 4;
Fig. 7 is a flow chart of a particular illustrative embodiment of a method of performing a multi-stage multiplexing operation including combined data selection and data alignment or data replication; and
Fig. 8 is a block diagram of a wireless device that includes a multi-stage multiplexing operation with combined data selection and data alignment or data replication.
Detailed description
Referring to Fig. 1, a particular illustrative embodiment of a system that performs a multi-stage multiplexing (MUX) operation including multiple stages with combined data alignment, data replication, and data selection is disclosed and generally designated 100. The system 100 includes a memory 110 and multi-stage multiplexing (MUX) logic 120 configured to retrieve and manipulate (e.g., via alignment and replication) data stored at the memory 110. The logic 120 is also configured to store the manipulated data to the memory 110 and to a register file 130.
In a particular embodiment, the memory 110 is accessible via load and store instructions 102. For example, the load and store instructions 102 may be microprocessor instructions of a microprocessor, and the system 100 may be integrated into an execution unit of the microprocessor. A load or store instruction 102 may include a memory address of the data to be retrieved, a memory offset of the data to be retrieved, a size of the data to be retrieved (e.g., a number of bytes or bits), a sign or zero extension bit, a sign or zero padding bit, a left or right shift bit, or any combination thereof. For example, a particular load instruction may request that data stored at a particular address of the memory 110 be retrieved and stored in a particular register of the register file 130. In a particular embodiment, the memory 110 is divided into memory banks. For example, as illustrated in Fig. 1, the memory 110 may be divided into four memory banks 111, 112, 113, and 114. In a particular embodiment, each of the memory banks 111-114 stores 64-bit double words. Each 64-bit double word may include two 32-bit words, each 32-bit word may include two 16-bit half-words, and each 16-bit half-word may include two 8-bit bytes.
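The nesting of double words, words, half-words, and bytes can be sketched in Python as follows. This is only an illustration: the helper name `split_segments` and the least-significant-first indexing are assumptions made for the sketch, not part of the disclosure.

```python
def split_segments(value, width, total_bits=64):
    """Split a total_bits-wide value into width-bit segments.

    Segments are returned least significant first (an assumed convention).
    """
    mask = (1 << width) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, width)]


double_word = 0xF0E1D2C3B4A59687               # sample 64-bit double word
words = split_segments(double_word, 32)        # two 32-bit words
half_words = split_segments(double_word, 16)   # four 16-bit half-words
byte_values = split_segments(double_word, 8)   # eight 8-bit bytes
```

With this indexing, the byte at index 6 of the sample value is 0xE1 and the byte at index 7 is 0xF0.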
The logic 120 is configured to perform a multi-stage MUX operation on data retrieved from the memory 110. The multi-stage MUX operation includes two or more stages. In a particular embodiment, the multi-stage MUX operation includes three stages, each of which includes a data selection operation and a data manipulation operation. The logic 120 is also configured to store data retrieved from the memory 110 back to the memory 110 and to the register file 130. Thus, in a particular embodiment, the logic 120 may be used to relocate data from a first address at the memory 110 to the same address or to a second address at the memory 110. Particular embodiments of the logic 120 are further described with reference to Fig. 2 and Fig. 4.
The register file 130 may include one or more registers configured to store data. For example, the register file 130 may be a 64-bit register file that includes a plurality of 64-bit registers, each configured to store a 64-bit data item. In a particular embodiment, the register file 130 is used to store data retrieved from the memory 110 before a microprocessor operation (e.g., addition, subtraction, logical AND, logical OR) is performed on the retrieved data. The register file 130 may also be used to store the result of the microprocessor operation.
In operation, one or more load or store instructions 102 may trigger a multi-stage MUX operation on data stored at the memory 110. For example, during a first stage of the multi-stage MUX operation, the logic 120 may select between a first double word (e.g., 64 bits) retrieved from the first memory bank 111 and a second double word retrieved from the second memory bank 112. The logic 120 may also select between a third double word retrieved from the third memory bank 113 and a fourth double word retrieved from the fourth memory bank 114. The logic 120 may further perform word-level (e.g., 32-bit) data manipulation operations (e.g., sign extension, zero padding, and alignment) on the selected data in parallel with the data selection. Thus, the first stage of the multi-stage MUX operation may produce a first selected double word from the first memory bank 111 or the second memory bank 112 and a second selected double word from the third memory bank 113 or the fourth memory bank 114, where both selected double words have undergone data manipulation.
During a second stage of the multi-stage MUX operation, the logic 120 may select between the first selected double word and the second selected double word produced during the first stage. The logic 120 may also perform half-word-level (e.g., 16-bit) data manipulation operations (e.g., sign extension, zero padding, and alignment) on the selected data. Thus, the second stage of the multi-stage MUX operation may produce a double word from one of the four memory banks 111-114 that has undergone selective data manipulation at the word level and the half-word level. Data manipulation may include data replication (e.g., replicating one or more words, half-words, or bytes of data) or data alignment (e.g., shifting data left, shifting data right, sign or zero extending data, or sign or zero padding data). It should be noted that, in some cases, data alignment may not change the original alignment of the data. For example, data retrieved from the memory 110 may already be aligned on the desired word boundary. Illustrative data manipulation operations are further described with reference to Fig. 6.
During a third stage of the multi-stage MUX operation, the logic 120 may perform further byte-level data manipulation (e.g., sign or zero extension, sign or zero padding, and alignment) on the double word produced by the second stage or on a selected portion of that double word. For example, the resulting final data may include at least one of: two replicated words, at least two replicated half-words, at least one sign-extended byte, a zero-padded byte, at least two replicated bytes, at least one realigned byte, or a combination thereof. After the third stage completes, the resulting final data may be stored back to the memory 110 or to the register file 130, as requested by the load or store instruction 102.
It will be appreciated that the system 100 of Fig. 1 enables combined data selection and data manipulation (e.g., via alignment or replication) during a single stage of a multi-stage MUX operation. The system 100 of Fig. 1 may therefore reduce the processor cycles used to retrieve, select, and prepare data to be operated on in accordance with a microprocessor instruction.
Referring to Fig. 2, a block diagram of a particular illustrative embodiment of the multi-stage MUX logic 120 of Fig. 1 is depicted and generally designated 200. The logic 200 includes a first partial MUX 221 with word-level alignment and replication, a second partial MUX 222 with word-level alignment and replication, a final MUX 223 with half-word-level alignment and replication, and data steering logic 224 configured to perform byte-level alignment, replication, and transformation. The logic 200 may receive 64-bit data from each of four memory banks 211, 212, 213, and 214 and may produce 64-bit data as output. In an illustrative embodiment, the memory banks 211-214 are the memory banks 111-114 of Fig. 1.
The first partial MUX 221 may receive first 64-bit data from the first memory bank 211 and second 64-bit data from the second memory bank 212. For example, the first 64-bit data may be represented (in order from most significant byte to least significant byte) as bytes B7-B0, and the second 64-bit data may be represented as bytes B15-B8. The first partial MUX 221 may select between the first 64-bit data and the second 64-bit data. The first partial MUX 221 may also perform word-level (i.e., 32-bit) alignment and replication on segments of the selected 64-bit data. The first partial MUX 221 may thus produce, as its output, a first selected 64-bit double word 231 that is a word-aligned/replicated representation of bytes B7-B0 from the first memory bank 211 or bytes B15-B8 from the second memory bank 212. In a particular embodiment, the first partial MUX 221 is integrated into a load aligner, sign or zero extension logic, sign or zero padding logic, or any combination thereof. In another particular embodiment, the first partial MUX 221 includes a load aligner, sign or zero extension logic, sign or zero padding logic, or any combination thereof.
The second partial MUX 222 may operate in parallel with the first partial MUX 221 (e.g., during the first stage). The second partial MUX 222 may receive third 64-bit data from the third memory bank 213 and fourth 64-bit data from the fourth memory bank 214. For example, the third 64-bit data may be represented as bytes B23-B16, and the fourth 64-bit data may be represented as bytes B31-B24. The second partial MUX 222 may select between the third 64-bit data and the fourth 64-bit data. The second partial MUX 222 may also perform word-level (i.e., 32-bit) alignment and replication on segments of the selected 64-bit data. The second partial MUX 222 may thus produce a second selected 64-bit double word 232 that is a word-aligned/replicated representation of bytes B23-B16 from the third memory bank 213 or bytes B31-B24 from the fourth memory bank 214. In a particular embodiment, the second partial MUX 222 is integrated into a load aligner, sign extension logic, zero padding logic, or any combination thereof. In another particular embodiment, the second partial MUX 222 includes a load aligner, sign extension logic, zero padding logic, or any combination thereof.
The final MUX 223 may receive the first selected 64-bit double word 231 and the second selected 64-bit double word 232 and may select between them. The final MUX 223 may also perform half-word-level (i.e., 16-bit) alignment and replication. The final MUX 223 may thus produce, as output, a 64-bit double word 233 that is a word- and half-word-aligned/replicated representation of one of bytes B7-B0, bytes B15-B8, bytes B23-B16, or bytes B31-B24.
The data steering logic 224 receives the 64-bit double word 233 from the final MUX 223 and may perform byte-level alignment, replication, and transformation on it to produce 64-bit final output data 234. The 8 bytes of the final output data 234 may be represented as bytes D7-D0.
A particular embodiment of the operation of the logic 200 is described with reference to Fig. 3. In an illustrative embodiment, the memory banks 311, 312, 313, and 314 are the memory banks 211, 212, 213, and 214 of Fig. 2, the partial MUXes 321 and 322 are the partial MUXes 221 and 222 of Fig. 2, the final MUX 323 is the final MUX 223 of Fig. 2, and the data steering logic 324 is the data steering logic 224 of Fig. 2.
In a particular example, a microprocessor instruction may perform a logical OR operation between a byte operand and a particular byte 302 "E1" of data stored at the first memory bank 311. As illustrated in Fig. 3, the particular byte 302 may be part of a first 64-bit double word 301 "F0E1D2C3B4A59687". In a particular embodiment, before the logical OR operation occurs, the particular byte 302 is retrieved and zero-padded into the least significant position. Zero-padding the operand before the logical OR lets the bits outside the byte of interest pass through the logical OR operation unchanged.
During a first stage of the multi-stage MUX operation, the first partial MUX 321 may receive the first 64-bit double word 301 from the first memory bank 311 and a second 64-bit double word from the second memory bank 312. Because the desired byte 302 is in the first word "F0E1D2C3" of the first 64-bit double word 301, the first partial MUX 321 selects the first 64-bit double word 301 and performs word-level replication of its first word. The first partial MUX 321 may produce a first selected 64-bit double word 331 "F0E1D2C3F0E1D2C3".
During a second stage of the multi-stage MUX operation, the final MUX 323 may receive the first selected 64-bit double word 331 from the first partial MUX 321 and a second selected 64-bit double word (not shown) from the second partial MUX 322. Because the desired byte 302 is in the third half-word "F0E1" of the first selected 64-bit double word 331, the final MUX 323 selects the first selected 64-bit double word 331 and performs half-word-level replication of that half-word. The final MUX 323 may produce a 64-bit double word 333 "F0E1F0E1F0E1F0E1" as output.
During a third stage of the multi-stage MUX operation, the data steering logic 324 may perform any other necessary data transformation on the 64-bit double word 333. For example, the data steering logic 324 may zero-pad the 64-bit double word 333 to produce final output data 334 "00000000000000E1". The final output data 334 may then be used in the logical OR operation.
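The three stages of this example can be traced with a small behavioral model in Python. The model is a sketch under stated assumptions: the function names, the least-significant-first byte indexing, and the contents of banks 1 to 3 are hypothetical, and the model describes only the data values, not the circuit.

```python
def replicate(value, width, index):
    """Copy the width-bit segment at `index` (least significant first) across 64 bits."""
    segment = (value >> (index * width)) & ((1 << width) - 1)
    result = 0
    for shift in range(0, 64, width):
        result |= segment << shift
    return result


def load_byte_zero_padded(banks, bank, byte):
    """Model a three-stage load of one byte and return each stage's output."""
    # Stage 1: each partial MUX selects one bank of its pair and replicates a word.
    first = replicate(banks[bank % 2], 32, byte // 4)        # banks 0 and 1
    second = replicate(banks[2 + bank % 2], 32, byte // 4)   # banks 2 and 3
    # Stage 2: the final MUX selects a partial result and replicates a half-word.
    selected = first if bank < 2 else second
    stage2 = replicate(selected, 16, (byte % 4) // 2)
    # Stage 3: byte-level selection; the upper seven bytes are zero-padded.
    stage3 = (stage2 >> (8 * (byte % 2))) & 0xFF
    return selected, stage2, stage3


# Only the first bank's contents come from the example; the rest are made up.
banks = [0xF0E1D2C3B4A59687, 0x0011223344556677,
         0x8899AABBCCDDEEFF, 0x0123456789ABCDEF]
stage1, stage2, stage3 = load_byte_zero_padded(banks, bank=0, byte=6)
```

Under these assumptions, the three returned values match the double words 331, 333, and 334 of the example: F0E1D2C3F0E1D2C3, F0E1F0E1F0E1F0E1, and 00000000000000E1.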
It will be appreciated that the logic illustrated in Fig. 2 and Fig. 3 integrates data retrieval/selection logic with data manipulation logic. Thus, the final output data 234 of Fig. 2 (334 in Fig. 3) may not need further manipulation before being operated on in accordance with a microprocessor instruction (e.g., a logical OR).
Referring to Fig. 4, a circuit-level diagram of a particular illustrative embodiment of the multi-stage MUX logic 120 of Fig. 1 and the multi-stage MUX logic 200 of Fig. 2 is depicted and generally designated 400.
In a particular embodiment, as illustrated in Fig. 4, multi-stage MUX logic with combined data selection and data alignment/replication may be implemented using AND/OR MUXes. For example, eight four-input AND/OR MUXes 401, 402, 403, 404, 405, 406, 407, and 408 may be used to implement the first partial MUX 221 of Fig. 2. Eight four-input AND/OR MUXes 411, 412, 413, 414, 415, 416, 417, and 418 may be used to implement the second partial MUX 222 of Fig. 2. It should be noted that input bytes from the various memory banks may be spread across the AND/OR MUXes 401-418. For example, AND/OR MUX 401 may receive bytes B0 and B4 from the first memory bank 211 of Fig. 2 and bytes B8 and B12 from the second memory bank 212 of Fig. 2. As another example, AND/OR MUX 418 may receive bytes B19 and B23 from the third memory bank 213 of Fig. 2 and bytes B27 and B31 from the fourth memory bank 214 of Fig. 2.
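The way input bytes are spread across the first-stage MUXes can be expressed as a small index calculation. The Python sketch below generalizes from the two examples given in the text (MUX 401 and MUX 418); the function name `first_stage_inputs`, the assumption that MUX 401 serves output byte lane 0 and MUX 418 serves lane 7, and the extension of the pattern to the other lanes are all assumptions.

```python
def first_stage_inputs(lane, pair_base):
    """Byte indices feeding the first-stage MUX for output byte lane 0..7.

    pair_base is the index of the lowest byte of the bank pair (0 or 16).
    """
    offset = lane % 4  # position of the lane within its 32-bit word
    return [pair_base + bank_start + word_start + offset
            for bank_start in (0, 8)     # first or second bank of the pair
            for word_start in (0, 4)]    # low or high word of that bank


lane0_first_pair = first_stage_inputs(0, 0)     # candidate inputs for MUX 401
lane7_second_pair = first_stage_inputs(7, 16)   # candidate inputs for MUX 418
```

Each lane sees the same byte position from both words of both banks, which is what lets a single four-input MUX combine bank selection with word-level alignment or replication.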
AND/OR MUXes may also be used to implement the final MUX 223 and the data steering logic 224 of Fig. 2. For example, eight four-input AND/OR MUXes 421, 422, 423, 424, 425, 426, 427, and 428 may be used to implement the final MUX 223 of Fig. 2. Each input of each of the AND/OR MUXes 421-428 may be the output of one of the AND/OR MUXes 401-418. Eight four-input AND/OR MUXes 431, 432, 433, 434, 435, 436, 437, and 438 may be used to implement the data steering logic 224 of Fig. 2. Each input of the AND/OR MUXes 431-438 may be the output of one of the AND/OR MUXes 421-428 or a sign or zero extended/sign or zero padded ("SZ") version of that output. Each of the AND/OR MUXes 431-438 may produce one byte of the final data D7-D0.
It should be noted that, for clarity, not all connections of the circuit 400 are depicted. For example, although the connection is not drawn in Fig. 4, the output "A" of AND/OR MUX 408 is input to AND/OR MUX 428 and AND/OR MUX 426. It should further be noted that each byte-level AND/OR MUX depicted in Fig. 4 may include eight bit-level AND/OR MUXes. That is, each byte-level AND/OR MUX depicted in Fig. 4 may represent eight bit-level AND/OR MUXes, with each input bit applied to a different bit-level four-input AND/OR MUX.
When AND/OR MUXes are used to implement the multi-stage MUX logic 120 of Fig. 1, zero extension and zero padding may be performed by disabling or deasserting (e.g., setting to zero) all selects of a particular AND/OR MUX, thereby producing a zero value as the output of that AND/OR MUX. Sign extension may be performed by designating one of the inputs of an 8-bit-wide four-input AND/OR MUX as the sign input and connecting the sign bit to all 8 bits of that input. Selecting those 8 bits (e.g., via a sign extender select) then spreads the sign bit across the output of the AND/OR MUX.
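The behavior described for a byte-wide AND/OR MUX can be modeled in a few lines of Python. This is a behavioral sketch only; the names `and_or_mux` and `sign_input` and the sample input values are hypothetical.

```python
def and_or_mux(inputs, selects):
    """Byte-wide AND/OR MUX: AND each input with its select, then OR the results."""
    output = 0
    for value, select in zip(inputs, selects):
        output |= (value & 0xFF) if select else 0
    return output


def sign_input(byte):
    """Build the 8-bit input whose bits are all tied to the sign bit of `byte`."""
    return 0xFF if byte & 0x80 else 0x00


data = [0x12, 0x34, 0xE1]
inputs = data + [sign_input(0xE1)]         # fourth input carries the sign bits
selected = and_or_mux(inputs, [0, 0, 1, 0])
zero = and_or_mux(inputs, [0, 0, 0, 0])    # no select asserted
sign = and_or_mux(inputs, [0, 0, 0, 1])    # sign extender select asserted
```

Because the output is an OR of gated inputs, deasserting every select yields zero without a dedicated zero input, which is the property the text relies on for zero extension and zero padding.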
It should be noted that the multi-stage multiplexing operation described herein, or a portion of it, may instead be implemented using other logic elements. For example, the first-stage and second-stage MUXes may be implemented using pass-gate four-to-one MUXes (instead of AND/OR MUXes). If zero padding and zero extension are not needed, the third-stage MUXes may be implemented using pass-gate four-to-one MUXes. If zero padding or zero extension is needed, the third-stage MUXes may instead be implemented using five-to-one MUXes (where one of the five inputs is set to zero). Sign extension may be performed by designating one of the inputs of an 8-bit-wide pass-gate four-to-one MUX as the sign input and connecting the sign bit to all 8 bits of that input. Selecting those 8 bits (e.g., via a sign extender select) spreads the sign bit across the output of the pass-gate MUX. It should be noted that, for a pass-gate MUX, one of the inputs is always selected (e.g., one-hot). Therefore, to produce zero as the output of a pass-gate MUX (e.g., for zero padding or zero extension), zero may be connected to one of the inputs of the pass-gate MUX. In another particular embodiment, only the third stage is implemented using four-to-one AND/OR MUXes, and the first two stages are implemented using pass-gate four-to-one MUXes.
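The difference from the AND/OR case can be illustrated with a behavioral Python sketch of a one-hot pass-gate MUX. The name `pass_gate_mux`, the choice to raise an error on a non-one-hot select, and the sample values are assumptions of the sketch.

```python
def pass_gate_mux(inputs, selects):
    """Pass-gate MUX model: exactly one select must be asserted (one-hot)."""
    asserted = [i for i, select in enumerate(selects) if select]
    if len(asserted) != 1:
        raise ValueError("pass-gate MUX requires a one-hot select")
    return inputs[asserted[0]]


data = [0x12, 0x34, 0x56, 0xE1]
five_inputs = data + [0x00]                  # fifth input tied to zero
value = pass_gate_mux(five_inputs, [0, 0, 0, 1, 0])
zero = pass_gate_mux(five_inputs, [0, 0, 0, 0, 1])
```

Since some input must always be selected, a zero output needs an input that is wired to zero, which is why the zero-capable variant is described as five-to-one rather than four-to-one.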
A particular embodiment of the operation of the circuit 400 of Fig. 4 is described with reference to Fig. 5. In an illustrative embodiment, AND/OR MUXes 511-518, 521-528, 531-538, and 541-548 are AND/OR MUXes 411-418, 421-428, 431-438, and 441-448 of Fig. 4, respectively.
For example, as described with reference to the example of Fig. 3, a microprocessor instruction may perform a logical OR operation between a byte operand and a particular byte (e.g., "E1") of a 64-bit double word (e.g., "F0E1D2C3B4A59687") stored in memory. The circuit 500 retrieves and selects the particular byte and zero-pads it.
If the bytes of the double word "F0E1D2C3B4A59687" are denoted B7-B0, the desired byte "E1" is at position B6. Thus, during the first stage of the multi-stage MUX operation, the word "F0E1D2C3" is selected and replicated by AND/OR MUXes 501-508 as illustrated in Fig. 5, producing a first result double word "F0E1D2C3F0E1D2C3".
During the second stage of the multi-stage MUX operation, the half-word "F0E1" in the first result double word "F0E1D2C3F0E1D2C3" is selected and replicated by AND/OR MUXes 521-528 as illustrated in Fig. 5. A second result double word "F0E1F0E1F0E1F0E1" may thus be produced.
During the third stage of the multi-stage MUX operation, the second result double word "F0E1F0E1F0E1F0E1" is zero-padded by AND/OR MUXes 531-538 as illustrated in Fig. 5, thereby producing a final output double word "00000000000000E1".
It will be appreciated that the use of AND/OR MUXes as illustrated in Figs. 4-5 may simplify logic at a microprocessor. For example, only one replicated gate structure may be placed in the 64-bit data path of the microprocessor for each of the three multiplexing stages in the data path.
Referring to Fig. 6, particular embodiments of data access patterns for an input double word 600 "F0E1D2C3B4A59687" are illustrated. As illustrated in Fig. 6, the input double word 600 may be divided into eight bytes B0-B7 610-617, four half-words H0-H3 620-623, or two words W0-W1 630-631.
As described with reference to Figs. 4-5, many different variants of the input double word 600 "F0E1D2C3B4A59687" may be produced during the multi-stage MUX process. For example, word-replicated variants may be produced. To illustrate, double word 641 is word-replicated with respect to word W0 630, and double word 642 is word-replicated with respect to word W1 631. Half-word-replicated variants of the input double word 600 may also be produced. To illustrate, double word 643 is half-word-replicated with respect to half-word H1 621, and double word 644 is half-word-replicated with respect to half-word H2 622. Double word 645 is produced by half-word replication followed by word replication.
Zero-padded and sign-extended variations of the input doubleword 600 may also be produced. For example, doubleword 646 depicts the input doubleword 600 zero-padded at the halfword level with respect to halfword H0 620, and doubleword 647 depicts the input doubleword 600 sign-extended at the halfword level with respect to halfword H0 620. As another example, doubleword 648 depicts the input doubleword 600 zero-padded at the byte level with respect to byte B0 610, and doubleword 649 depicts the input doubleword 600 sign-extended at the byte level with respect to byte B0 610.
Particular bytes of the input doubleword 600 may also be individually realigned. For example, doubleword 650 depicts zero padding and realignment of individual bytes of word W1 631, and doubleword 651 depicts sign extension and realignment of individual bytes of word W1 631. As another example, doubleword 652 depicts zero padding and realignment of individual bytes of halfword H0 620, and doubleword 653 depicts sign extension and realignment of individual bytes of halfword H0 620.
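The zero-padded and sign-extended variations can be sketched in software as below. The sketch assumes that index 0 denotes the least significant byte, halfword, or word, which is consistent with byte "E1" sitting at position B6 of "F0E1D2C3B4A59687". The function names are hypothetical, and the printed values are computed from these definitions rather than transcribed from Fig. 6.

```python
MASK64 = (1 << 64) - 1

def zero_pad(dword, index, bits):
    """Keep the `bits`-wide field at position `index`; clear all higher bits."""
    return (dword >> (bits * index)) & ((1 << bits) - 1)

def sign_extend(dword, index, bits):
    """Keep the `bits`-wide field at `index` and copy its top bit into all higher bits."""
    field = zero_pad(dword, index, bits)
    sign_bit = 1 << (bits - 1)
    # XOR-then-subtract converts the field to a signed value; masking wraps it to 64 bits.
    return ((field ^ sign_bit) - sign_bit) & MASK64

dword = 0xF0E1D2C3B4A59687
print(f"{zero_pad(dword, 0, 16):016X}")     # halfword H0, zero-padded
print(f"{sign_extend(dword, 0, 16):016X}")  # halfword H0, sign-extended
print(f"{zero_pad(dword, 0, 8):016X}")      # byte B0, zero-padded
print(f"{sign_extend(dword, 0, 8):016X}")   # byte B0, sign-extended
```

Because both H0 ("9687") and B0 ("87") have their top bit set under this numbering, the sign-extended results fill the upper lanes with ones, while the zero-padded results fill them with zeros.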
It will be appreciated that the data manipulations illustrated in Fig. 6 may be implemented by shifting data left, shifting data right, or a combination thereof. It will also be appreciated that the data manipulations illustrated in Fig. 6 may be performed at one or more of a doubleword level, a word level, a halfword level, or a byte level. Thus, many different data access patterns are supported by the multi-stage MUX operation disclosed herein.
Referring to Fig. 7, a flow diagram of a particular illustrative embodiment of a method 700 is depicted. The method 700 performs a multi-stage multiplexing operation that includes combined data selection and data alignment or data replication. In an illustrative embodiment, the method 700 may be performed by the system 100 of Fig. 1, the logic 200 of Fig. 2, or the circuit 400 of Fig. 4.
The method 700 includes performing a first stage of a multi-stage multiplexing operation, at 710. Performing the first stage includes selecting a first data source from a first plurality of data sources, at 711, and performing at least one of a first data alignment operation and a first data replication operation on first data received from the selected first data source, at 712. For example, in Fig. 2, the first partial MUX 221 may select the first memory bank 211 and may perform word-level alignment/replication on the data received from the first memory bank 211.
Performing the first stage also includes selecting a second data source from a second plurality of data sources, at 713, and performing at least one of a second data alignment operation and a second data replication operation on second data received from the selected second data source, at 714. For example, in Fig. 2, the second partial MUX 222 may select the third memory bank 213 and may perform word-level alignment/replication on the data received from the third memory bank 213.
The method 700 also includes, at 720, performing at least one of a third data alignment operation and a third data replication operation on third data during a second stage of the multi-stage multiplexing operation that follows the first stage. The third data is selected from one of the first data and the second data. For example, in Fig. 2, the final MUX 223 selects the first selected doubleword 231 received from the first partial MUX 221 and performs halfword-level alignment/replication on the first selected doubleword 231.
The method 700 further includes, at 730, during a third stage of the multi-stage multiplexing operation that follows the second stage, performing at least one of a data alignment operation, a data replication operation, a sign or zero extension operation, and a sign or zero padding operation on the third data to produce final data. For example, in Fig. 2, the data steering logic 224 may perform byte-level alignment, replication, and conversion to produce the final output data 234.
The method 700 includes storing the final data at a register file, at 740, or storing the final data at a memory, at 750. The memory may include the first plurality of data sources and the second plurality of data sources. For example, in Fig. 2, the final output data 234 may be stored at a register file or at a memory. In an illustrative embodiment, the final output data 234 is stored at a register file (such as the register file 130 described with reference to Fig. 1) or at a memory (such as the memory 110 described with reference to Fig. 1).
It will be appreciated that the method 700 of Fig. 7 may enable combined data selection and data manipulation (e.g., via alignment or replication) during the various stages of a multi-stage MUX operation. It will therefore be appreciated that the method 700 of Fig. 7 may reduce the number of processor cycles used to retrieve, select, and prepare data to be operated on in accordance with a microprocessor instruction.
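The flow of method 700 can be sketched as a software model in which each stage both selects and manipulates its data. This is an illustrative sketch only: the function names, the control dictionary and its keys, and the sample bank contents are all hypothetical, and the sketch covers just one of the many combinations the method allows (word replication, then halfword replication, then a sign-extended or zero-padded byte).

```python
def word_stage(banks, bank_sel, word_sel):
    """Steps 711-712 / 713-714: pick one data source and replicate one 32-bit word."""
    dword = banks[bank_sel]
    word = (dword >> (32 * word_sel)) & 0xFFFFFFFF
    return (word << 32) | word

def halfword_stage(first, second, pick_second, half_sel):
    """Step 720: choose one first-stage output and replicate one 16-bit halfword."""
    dword = second if pick_second else first
    half = (dword >> (16 * half_sel)) & 0xFFFF
    return half * 0x0001000100010001

def byte_stage(dword, byte_sel, signed):
    """Step 730: keep one byte, sign-extended or zero-padded to 64 bits."""
    byte = (dword >> (8 * byte_sel)) & 0xFF
    if signed and byte & 0x80:
        return byte | 0xFFFFFFFFFFFFFF00
    return byte

def load_byte(first_banks, second_banks, ctl):
    """Run all three stages; `ctl` holds the hypothetical per-stage select signals."""
    first = word_stage(first_banks, ctl["bank1"], ctl["word1"])
    second = word_stage(second_banks, ctl["bank2"], ctl["word2"])
    third = halfword_stage(first, second, ctl["pick_second"], ctl["half"])
    return byte_stage(third, ctl["byte"], ctl["signed"])

first_banks = [0xF0E1D2C3B4A59687, 0x0011223344556677]
second_banks = [0x8899AABBCCDDEEFF, 0x0123456789ABCDEF]
ctl = {"bank1": 0, "word1": 1, "bank2": 0, "word2": 0,
       "pick_second": True, "half": 1, "byte": 0, "signed": True}
final_data = load_byte(first_banks, second_banks, ctl)
```

In hardware the two first-stage selections would proceed in parallel; the sequential calls here only model the data flow, not the timing.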
Referring to Fig. 8, a block diagram of a particular illustrative embodiment of an electronic device is depicted and generally designated 800. The electronic device is operable to perform a multi-stage MUX operation that includes one or more stages with combined alignment/replication/selection. The device 800 includes a processor, such as a digital signal processor (DSP) 810, coupled to a memory 832. The digital signal processor 810 includes multi-stage MUX logic 864, which includes one or more stages with combined alignment/replication/selection. In an illustrative embodiment, the logic 864 includes one or more of the logic 120 of Fig. 1, the logic 200 of Fig. 2, and the circuit 400 of Fig. 4.
Fig. 8 also shows a display controller 826 that is coupled to the digital signal processor 810 and to a display 828. A coder/decoder (CODEC) 834 may also be coupled to the digital signal processor 810. A speaker 836 and a microphone 838 may be coupled to the CODEC 834.
Fig. 8 also indicates that a wireless controller 840 may be coupled to the digital signal processor 810 and to a wireless antenna 842. In a particular embodiment, the DSP 810, the display controller 826, the memory 832, the CODEC 834, and the wireless controller 840 are included in a system-in-package or system-on-chip device 822. In a particular embodiment, an input device 830 and a power supply 844 are coupled to the system-on-chip device 822. Moreover, in a particular embodiment, as illustrated in Fig. 8, the display 828, the input device 830, the speaker 836, the microphone 838, the wireless antenna 842, and the power supply 844 are external to the system-on-chip device 822. However, each of the display 828, the input device 830, the speaker 836, the microphone 838, the wireless antenna 842, and the power supply 844 may be coupled to a component of the system-on-chip device 822, such as an interface or a controller.
In a particular embodiment, the device 800 is a communications device (e.g., a wireless telephone), a music player, a video player, an entertainment unit, a navigation device, a personal digital assistant (PDA), or a computer. During operation, the DSP 810 may execute instructions that include retrieving data from the memory 832 and manipulating the data. The multi-stage MUX logic 864 may include one or more stages that simultaneously select and manipulate data (e.g., via sign extension, zero padding, and word/halfword/byte replication). Once the multi-stage MUX operation has completed, the data may be operated on in accordance with the instruction, and the result may be stored to the memory 832 or to a register file of the DSP.
Those skilled in the art will further appreciate that the various illustrative logical blocks, configurations, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. Various illustrative components, blocks, configurations, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or a user terminal.
The previous description of the disclosed embodiments is provided to enable a person skilled in the art to make or use the disclosed embodiments. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other embodiments without departing from the scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims.

Claims (21)

1. A method for performing a multi-stage multiplexing operation, characterized in that it comprises:
receiving a first plurality of data;
during a first stage of the multi-stage multiplexing operation:
selecting first data from the received first plurality of data; and
in parallel with selecting the first data, performing at least one of a first data alignment operation and a first data replication operation on the first data; wherein the first data comprises a first 64-bit doubleword, and wherein the at least one of the first data alignment operation and the first data replication operation is performed with respect to a 32-bit word of the first data.
2. The method according to claim 1, further comprising, during the first stage of the multi-stage multiplexing operation:
receiving a second plurality of data;
selecting second data from the received second plurality of data; and
in parallel with selecting the second data, performing at least one of a second data alignment operation and a second data replication operation on the second data.
3. The method according to claim 2, wherein the second data comprises a second 64-bit doubleword, and wherein the at least one of the second data alignment operation and the second data replication operation is performed with respect to a 32-bit word of the second data.
4. The method according to claim 2, further comprising, during a second stage of the multi-stage multiplexing operation that follows the first stage:
performing at least one of a third data alignment operation and a third data replication operation on third data,
wherein the third data is selected from an output of the first stage.
5. The method according to claim 4, wherein the third data comprises a 64-bit doubleword, and wherein the at least one of the third data alignment operation and the third data replication operation is performed with respect to a 16-bit halfword of the third data.
6. The method according to claim 4, further comprising, during a third stage of the multi-stage multiplexing operation that follows the second stage:
performing at least one of the following on the third data to produce final data: a data alignment operation, a data replication operation, a sign extension operation, a zero extension operation, a sign padding operation, and a zero padding operation.
7. The method according to claim 6, further comprising, during the third stage of the multi-stage multiplexing operation:
storing the final data in at least one of a register file and a memory.
8. The method according to claim 7, wherein the memory comprises one or more of a first plurality of data sources of the first plurality of data and one or more of a second plurality of data sources of the second plurality of data.
9. The method according to claim 6, wherein the final data comprises one or more of the following: a doubleword, two replicated words, at least two replicated halfwords, at least one sign-extended byte, at least one zero-extended byte, at least one sign-padded byte, at least one zero-padded byte, at least two replicated bytes, and at least one realigned byte.
10. The method according to claim 2, wherein at least one data source of the first plurality of data is a first memory bank, and at least one data source of the second plurality of data is a third memory bank.
11. An apparatus for performing a multi-stage multiplexing operation, characterized in that it comprises:
a memory comprising a plurality of data sources; and
a load aligner configured to receive a plurality of data from the plurality of data sources, and
simultaneously with performing a multiplexing operation on the received plurality of data, to selectively perform at least one of a data alignment operation and a data replication operation on a plurality of segments of a doubleword retrieved from one of the received plurality of data.
12. The apparatus according to claim 11, wherein the load aligner is further configured to perform the data alignment operation by shifting data left or right.
13. The apparatus according to claim 11, wherein each of the received plurality of data is multi-byte data, and wherein the load aligner is further configured to perform the at least one of the data alignment operation and the data replication operation on a byte-by-byte basis.
14. The apparatus according to claim 11, wherein selectively performing the at least one of the data alignment operation and the data replication operation simultaneously with performing the multiplexing operation comprises inputting each bit of the doubleword into a four-input AND-OR multiplexer.
15. An apparatus for performing a multi-stage multiplexing operation, characterized in that it comprises:
a memory comprising a plurality of data sources; and
sign extension logic configured to receive a plurality of data from the plurality of data sources, and
to selectively sign-extend a plurality of segments of a doubleword retrieved from one of the received plurality of data, wherein the sign extension is performed simultaneously with a multiplexing operation performed on the received plurality of data.
16. The apparatus according to claim 15, further comprising zero padding logic configured to selectively zero-pad the plurality of segments of the doubleword in parallel with the multiplexing operation.
17. The apparatus according to claim 16, wherein each of the received plurality of data is multi-byte, wherein the sign extension logic is further configured to sign-extend the plurality of segments of the doubleword on a byte-by-byte basis, and wherein the zero padding logic is further configured to zero-pad each of the plurality of segments of the doubleword on a byte-by-byte basis.
18. The apparatus according to claim 15, wherein selectively sign-extending the plurality of segments of the doubleword in parallel with the multiplexing operation comprises inputting each bit of the doubleword into a four-input AND-OR multiplexer.
19. The apparatus according to claim 15, wherein the doubleword comprises 64 bits.
20. An apparatus for performing a multi-stage multiplexing operation, characterized in that it comprises:
a plurality of means for storing data;
load aligner means for receiving a plurality of data from the plurality of means for storing data, and
in parallel with performing a multiplexing operation on the received plurality of data, selectively performing at least one of a data alignment operation and a data replication operation on a plurality of segments of a doubleword retrieved from one of the received plurality of data;
extension means for selectively sign-extending or zero-extending the plurality of segments of the doubleword in parallel with the multiplexing operation; and
padding means for selectively sign-padding or zero-padding the plurality of segments of the doubleword in parallel with the multiplexing operation.
21. The apparatus according to claim 20, further comprising a device selected from the group consisting of: a music player, a video player, an entertainment unit, a navigation device, a communications device, a personal digital assistant (PDA), and a computer, wherein the load aligner means, the sign extension means, and the zero padding means are integrated into the device.
CN201180005892.9A 2010-01-15 2011-01-14 Multi-stage multiplexing operation including combined selection and data alignment or data replication Active CN102713875B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12/688,091 2010-01-15
US12/688,091 US8356145B2 (en) 2010-01-15 2010-01-15 Multi-stage multiplexing operation including combined selection and data alignment or data replication
PCT/US2011/021342 WO2011088351A1 (en) 2010-01-15 2011-01-14 Multi-stage multiplexing operation including combined selection and data alignment or data replication

Publications (2)

Publication Number Publication Date
CN102713875A CN102713875A (en) 2012-10-03
CN102713875B true CN102713875B (en) 2016-01-20

Family

ID=43827766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201180005892.9A Active CN102713875B (en) 2010-01-15 2011-01-14 Multi-stage multiplexing operation including combined selection and data alignment or data replication

Country Status (6)

Country Link
US (1) US8356145B2 (en)
EP (1) EP2524315A1 (en)
JP (1) JP5584781B2 (en)
CN (1) CN102713875B (en)
TW (1) TW201140318A (en)
WO (1) WO2011088351A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8484433B2 (en) * 2010-11-19 2013-07-09 Netapp, Inc. Dynamic detection and reduction of unaligned I/O operations
US10423353B2 (en) 2016-11-11 2019-09-24 Micron Technology, Inc. Apparatuses and methods for memory alignment
US10372452B2 (en) * 2017-03-14 2019-08-06 Samsung Electronics Co., Ltd. Memory load to load fusing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0695998A2 (en) * 1994-08-02 1996-02-07 Motorola, Inc. Interbus buffer
US5882620A (en) * 1995-06-07 1999-03-16 International Carbitech Industries, Inc. Pyrometallurgical process for forming tungsten carbide
US5907865A (en) * 1995-08-28 1999-05-25 Motorola, Inc. Method and data processing system for dynamically accessing both big-endian and little-endian storage schemes
CN101256546A (en) * 2007-03-01 2008-09-03 黄新亚 32 bits micro-processor
CN101299185A (en) * 2003-08-18 2008-11-05 上海海尔集成电路有限公司 Microprocessor structural frame based on CISC structure and instruction realizing mode

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5394133A (en) * 1977-01-28 1978-08-17 Hitachi Ltd Data converter
US4583199A (en) * 1982-07-02 1986-04-15 Honeywell Information Systems Inc. Apparatus for aligning and packing a first operand into a second operand of a different character size
JP3181001B2 (en) * 1993-06-01 2001-07-03 インターナショナル・ビジネス・マシーンズ・コーポレ−ション Cache memory system and cache memory access method and system
US5627773A (en) * 1995-06-30 1997-05-06 Digital Equipment Corporation Floating point unit data path alignment
US5761469A (en) * 1995-08-15 1998-06-02 Sun Microsystems, Inc. Method and apparatus for optimizing signed and unsigned load processing in a pipelined processor
US5822620A (en) * 1997-08-11 1998-10-13 International Business Machines Corporation System for data alignment by using mask and alignment data just before use of request byte by functional unit
US7197625B1 (en) * 1997-10-09 2007-03-27 Mips Technologies, Inc. Alignment and ordering of vector elements for single instruction multiple data processing
US6539467B1 (en) * 1999-11-15 2003-03-25 Texas Instruments Incorporated Microprocessor with non-aligned memory access
US6622242B1 (en) * 2000-04-07 2003-09-16 Sun Microsystems, Inc. System and method for performing generalized operations in connection with bits units of a data word
US20030002474A1 (en) * 2001-03-21 2003-01-02 Thomas Alexander Multi-stream merge network for data width conversion and multiplexing
US6877019B2 (en) * 2002-01-08 2005-04-05 3Dsp Corporation Barrel shifter
US7877581B2 (en) * 2002-12-12 2011-01-25 Pmc-Sierra Us, Inc. Networked processor for a pipeline architecture
US20070088772A1 (en) * 2005-10-17 2007-04-19 Freescale Semiconductor, Inc. Fast rotator with embedded masking and method therefor
US8285766B2 (en) * 2007-05-23 2012-10-09 The Trustees Of Princeton University Microprocessor shifter circuits utilizing butterfly and inverse butterfly routing circuits, and control circuits therefor
CN101981987B (en) * 2008-01-30 2014-12-03 谷歌公司 Notification of mobile device events
US8291002B2 (en) * 2009-06-01 2012-10-16 Arm Limited Barrel shifter

Also Published As

Publication number Publication date
JP5584781B2 (en) 2014-09-03
US8356145B2 (en) 2013-01-15
WO2011088351A1 (en) 2011-07-21
TW201140318A (en) 2011-11-16
CN102713875A (en) 2012-10-03
EP2524315A1 (en) 2012-11-21
JP2013517576A (en) 2013-05-16
US20110179242A1 (en) 2011-07-21

Similar Documents

Publication Publication Date Title
CN102341794B (en) Configurable cache and method to configure same
CN104583938B (en) Data extraction system and method in vector processor
KR100325430B1 (en) Data processing apparatus and method for performing different word-length arithmetic operations
US7130952B2 (en) Data transmit method and data transmit apparatus
CN103827818B (en) FIFO loading instructions
CN103207773A (en) System And Method Of Processing Data Using Scalar/vector Instructions
WO2001089098A2 (en) A method and system for performing permutations with bit permutation instructions
CN102713875B Multi-stage multiplexing operation including combined selection and data alignment or data replication
KR101635116B1 (en) Selective coupling of an address line to an element bank of a vector register file
CN102160031A (en) system and method to execute a linear feedback-shift instruction
US6442729B1 (en) Convolution code generator and digital signal processor which includes the same
WO2012137428A1 (en) Data processing device and data processing method
KR101449732B1 (en) System and method of processing hierarchical very long instruction packets
CN106796505A (en) Instruct the method and processor for performing
US20090249032A1 (en) Information apparatus
JP2013517576A5 (en)
US20240053989A1 (en) Hardware-based message block padding for hash algorithms
US20240061961A1 (en) Hardware-based implementation of secure hash algorithms
JP3917357B2 (en) Non-linear conversion method, computer-readable recording medium storing program, and non-linear conversion device
JP2006155448A (en) Data processor and method for designing data processor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant