AU3134501A - Apparatus for stack capable of multiple push or pop operations in one clock cycle - Google Patents

Apparatus for stack capable of multiple push or pop operations in one clock cycle Download PDF

Info

Publication number
AU3134501A
AU3134501A AU31345/01A AU3134501A AU3134501A AU 3134501 A AU3134501 A AU 3134501A AU 31345/01 A AU31345/01 A AU 31345/01A AU 3134501 A AU3134501 A AU 3134501A AU 3134501 A AU3134501 A AU 3134501A
Authority
AU
Australia
Prior art keywords
stack
cells
stack apparatus
compositing
clock cycle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
AU31345/01A
Other versions
AU766816B2 (en
Inventor
Yu-Ling Chen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AUPQ6554A external-priority patent/AUPQ655400A0/en
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU31345/01A priority Critical patent/AU766816B2/en
Publication of AU3134501A publication Critical patent/AU3134501A/en
Application granted granted Critical
Publication of AU766816B2 publication Critical patent/AU766816B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Landscapes

  • Executing Machine-Instructions (AREA)

Description

S&FRef: 541412
AUSTRALIA
PATENTS ACT 1990 COMPLETE SPECIFICATION FOR A STANDARD PATENT
ORIGINAL
Name and Address of Applicant: Canon Kabushiki Kaisha 30-2, Shimomaruko 3-chome, Ohta-ku Tokyo 146 Japan Actual Inventor(s): Address for Service: Yu-Ling Chen Spruson Ferguson St Martins Tower,Level 31 Market Street Sydney NSW 2000 Apparatus for Stack Capable of Multiple Push or Pop Operations in One Clock Cycle Invention Title: ASSOCIATED PROVISIONAL APPLICATION DETAILS [33] Country [31] Applic. No(s) AU PQ6554 [32] Application Date 29 Mar 2000 The following statement is a full description of this invention, including the best method of performing it known to me/us:- 5815c -1- APPARATUS FOR STACK CAPABLE OF MULTIPLE PUSH OR POP OPERATIONS IN ONE CLOCK CYCLE Field of the Invention The present invention relates to optimisation of stack operations and has particular application to graphics compositing stacks.
Background In graphics rendering, graphical objects are often composited together, sometimes to obtain some special visual effect. For a binary compositing operation there are two operands, named source and destination. The source operand is either provided 10 with the compositing command, or obtained from a previously computed intermediate result, which is sometimes retained in a stack. The destination operand derives typically from the top of stack. Intermediate compositing results are then pushed back onto the compositing stack for further compositing. When the source operand is obtained from the stack, such processing requires two pops from, and one push to, the compositing stack.
15 Conventionally implemented stacks can only finish a single stack operation in each clock cycle. As the intermediate result is needed for the next compositing operation, the compositing pipeline has to stall until the intermediate result is computed. Such a situation slows down data throughput in image rendering. It is desirable to ameliorate delays in such operations.
Summary of the Invention It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
According to a first aspect of the invention, there is provided stack apparatus comprising: a first plurality of stack cells arranged for moving data content between said cells; 541412.doc -2at least one selector device coupled to a second plurality said cells to provide access to data content within each of said second plurality of said cells; and means for manipulating said at least one selector device to select at least one of said second plurality of said cells as an output of said stack apparatus.
Brief Description of the Drawings At least one preferred embodiment of the present invention will now be described with reference to the appended drawings and tables, in which: Fig. 1 is a flow chart depicting a conventional stack process; S. :Fig. 2 is a flowchart depicting a stack process according to the present 10 disclosure; :000 Figs. 3A to 3D illustrate stack operations performed by conventional arrangements and desired to be replicated by more efficient processes; Fig. 4 shows a multiple access stack structure formed using shift registers; ooo Fig. 5 is an arrangement similar to Fig. 4 but formed using memory; 1: 5 Table 1 shows the stack change and actions for Fig 4; and croo Table 2 shows the stack change and actions for Fig Detailed Description Binary raster operations require two operands source and destination. When performed in a stack, such operations are controlled by a 2-bit STACK_OP field. As seen in Fig. 3A, in a normal operation, the STACK_OP field is set to STACKSTD OP (00).
The source operand 301 comes along with the raster operation code 302 and the destination operand 303 is popped 304a from the top of the stack 304. When a raster operation 306 between the operands 301 and 303 is complete, the result 305 is pushed onto the stack 304. If the stack 304 is initially empty, opaque white is traditionally substituted for the destination operand 303.
541412.doc -3- If the destination operand needs to be kept for a later operation, the STACK OP field is set to STACK_NO_TOP_DEST as seen in Fig 3B. In such a case, the destination operand 307 is read 308 from the top level of the stack 304, but is not popped, and the result 309 is pushed onto the stack.
To perform a graphics operation on the top two levels of the stack, rather than using data that comes with the raster operation code, the STACKOP field is set to STACK_POP_SRC (10) as seen in Fig 3C. In this case, the source operand 310 is popped from the stack 311 first, and then the destination operand 312 is popped from the So-: stack 313. The result 313 is pushed onto the stack 311 after the operation is complete. If the stack 311 is empty, opaque white can again be substituted.
i If the STACK_OP field is set to STACK_KEEP_SRC (11) as shown in Fig. 3D, the source operand is popped 315 from the top of the stack, and then the destination •i operand is also popped 316. The source operand and the result are pushed (318 and 317 respectively) onto the stack after the operation 319 is complete, leaving the stack in the S 15 configuration shown at 319. If the stack is empty, again opaque white is substituted.
In the above operations, only the intermediate result is written back to the stack.
The final compositing result is output to the downstream device without writing back to the stack. In Figs. 3A to 3D, the pseudo code represented next to each STACK_OP setting shows the required steps. The last step in each pseudocode, push(result), is omitted if the compositing command is the last for each pixel.
Fig. 1 shows, in flow diagram form, the conventional implementation of the above stack operations. A command is received in step 101 and decoded at step 201. In each case of Figs. 3A to 3D, an operand is popped from the stack, this being shown at step 103. Step 104 determines from the decoded operation whether another operand is required from the stack. If so, a feedback check loop 105 requires multiple clock cycles 541412.doc to complete a compositing command using a conventionally implemented stack. If not (106), the composite step 107 is performed. The destination of the result or operands is determined in step 108 where the compositing result is output at 109 (eg. last composite in stack) or steps 110, 111 and 112 to reconfigure the stack for the next composite operation.
Fig. 2 shows a modified stack operation flow according to the present disclosure, which analyses the stack depth changes after the compositing command and updates the stack after compositing accordingly. In Fig. 2, a command is received at step 201 and decoded at step 202 as in the conventional approach. At step 203, the composite 10 operation to be performed is analysed and the required stack depth determined so that operands can be simultaneously extracted from the stack where required. Compositing is then performed at step 204. Step 205 then determines what is required to update the stack. For a final composite, no update is required and path 206 directly provides the output compositing result 209. Otherwise, path 207 causes step 208 to update the stack, simultaneously where required and also provide the output result 209.
A first implementation of the process of Fig. 2 uses shift registers and is shown in Fig. 4. In Fig. 4, a stack 400 is shown formed of n m-bit shift registers 401 having two top elements 402 and 404 readable at any time via two multiplexers 414 and 416. The registers 401 are also are arranged for shifting either left or right. The multiplexers 414 and 416 are each controlled by the STACKOP signal that can acquire one of two values:
STACKKEEPSRC
STACK POP SRC When STACK_OP is set to STACK_KEEP_SRC, the source operand 406 is obtained from a compositing command 410, compose_d, and the destination operand 408 is obtained from the top element 402.
541412.doc When STACK_OP is set to STACK_POP_SRC, the source operand 406 is obtained from the top element 402 and the destination operand is popped from the second element 404. The top of the stack 400 is indexed as 0. The stack 400 has a bottom pointer 418 which points to 0 when the stack 400 is empty and incremented when new data is pushed to stack 400. The bottom pointer 400 points to n when the stack 400 is full.
Table 1 included with the drawings of this specification, lists the stack depth change after the completion of a compositing command and the actions required for update. In normal operation, actions are to be carried out in the same clock cycle. The 10 registers 401 perform a right shift (as per Fig. 4) when the stack depth decreases and left shift when the stack depth increases. When the stack depth has no change, only the top element or the top second element is updated. In the case where the stack depth change is more than one, the update can be done in two clock cycles with one shift in each clock cycle. At the end of each compositing command, the top of stack is written with the compositing result, if it is not the last compositing command. On the last compositing command of each pixel, the result will be output to the downstream device instead of pushed back to the stack.
The stack 400 is reset with a WHITE signal 420 and maintained WHITE in the empty entries 401 by shifting WHITE data into the stack 400, (stack(n-1)<=WHITE as shown in Table 1) when the number of data in the stack decreases. If the stack overflows or underflows, exceptions are raised. In these cases, compositing can still continue. An example of stack overflow is when STACK_OP is set to STACK_NO_POP_DEST (as shown in Fig. 3B) while the stack is already full. In this case, the actions will become 1. stack(0)<=result 2. stack(bottom-l: 1)<=stack(bottom-2:0).
541412.doc -6- This means the element in the bottom of the stack will be discarded. When the stack is empty, opaque white data will be used. The stack should be reset to a default colour (opaque white in this example) and maintained at the default colour by shifting the default colour into the stack when the stack depth decreases.
A further implementation is performed using memory as shown in Fig. 5. In this embodiment, a stack 500 is formed of n words of m-bits of 3 port memory 520, having two read ports 501 and 502 and one write port 503. Ports 501 and 502 are for reading from the memory 520, while port 503 is for writing to the memory 520. Three pointers, 504, plusl 505 and minusl 506 represent the top element, a vacant location next to 10 the top element and the top second element of the stack, respectively, and access an address port 507 of the memory 520. The pointers 504-506 are incremeneted or decremented as the stack depth changes. Entry 0 and 1 in the memory 520 are reserved for the default colour, so the default colour can be output when the stack 500 is empty.
As a result, the 3 pointers (minusl, top and plusl) are reset to 0, 1 and 2 respectively.
The first data pushed into the stack 500 is therefore written to the entry indexed as 2.
Data is extracted from the stack 500 using a pair of multiplexers 508 and 504 via the read ports 501 and 502 in a manner similar to the stack 400 of Fig. 4, and for which corresponding operation applies.
Table 2 lists the stack depth change after the completion of compositing command and the actions required for update. Normal stack actions are carried out in the same clock cycle. On the case of updating two elements, the update can be performed in two clock cycles with one write in each clock cycle. At the end of each compositing command, the top of stack will be written with the compositing result, if it is not the last compositing command. On the last compositing command of each pixel, the result will be output to the downstream device instead of pushed back to the stack.
541412.doc Industrial Applicability It is apparent from the above that which is applicable to the hardware stack implementations where operations require the utilization of plural stack entries.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiment(s) being illustrative and not restrictive.
For example, whilst described with reference to computer graphics operations involving two operands, the stack implementations described can be used or modified for use in any environment where a stack is desired. Further, by increasing the number of multiplexers (414,416; 508,504) arrangements may be formed which can be provided for the popping from the stack of more than two stack values. Specifically, where two or S more multiplexers are in use, any combination of the multiplexers may be used to provide plural outputs. Such may be appropriate where complex operations are required involving the multiple use of one or more operands.
15 In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including" and not "consisting only of'. Variations of the word comprising, such as "comprise" and "comprises" have corresponding meanings.
541412.doc

Claims (3)

1. Stack apparatus comprising: a first plurality of stack cells arranged for moving data content between said cells; at least one selector device coupled to a second plurality of said cells to provide access to data content within each of said second plurality of said cells; and means for manipulating said at least one selector device to select at least one of said second plurality of said cells as an output of said stack apparatus.
9. 9oo 2. Stack apparatus according to claim 1, wherein said means for manipulating is operable to simultaneously provide as said output a third plurality of said second plurality of said cells. 15 3. Stack apparatus according to claim 1 or 2, wherein said at least one selector oooo device comprises at least first and second multiplexers each coupled to respective ones of said second plurality of said cells, wherein a control input of each of said multiplexers comprises said means for manipulating. 4. Stack apparatus according to claim 1, 2 or 3, further comprising means for setting each of said cells to a predetermined value. Stack apparatus according to claim 4, wherein said means for setting simultaneously sets each of said cells to said predetermined value.
541412.doc -9- 6. Stack apparatus according to any one of claims 1 to 5, wherein said cells comprise a plurality of shift registers. 7. Stack apparatus according to any one of claims 1 to 5, wherein said cells comprise locations within a memory device. 8. Stack apparatus according to any one of the preceding claims wherein said second plurality of cells is accessible during a single clock cycle. 6 6666.6 6 9 6 @6 6 66 9. Stack apparatus substantially as described herein with reference to Fig. 4 of the drawings. Stack apparatus substantially as described herein with reference to Fig. 5 of the drawings. 11. A computer graphics compositing system comprising stack apparatus according to any one of the preceding claims. DATED this twenty-seventh Day of March, 2001 Canon Kabushiki Kaisha Patent Attorneys for the Applicant/Nominated Person SPRUSON FERGUSON 541412.doc Explanatory Note: 31345/01 For graphics compositing stacks.
AU31345/01A 2000-03-29 2001-03-27 Apparatus for stack capable of multiple push or pop operations in one clock cycle Ceased AU766816B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU31345/01A AU766816B2 (en) 2000-03-29 2001-03-27 Apparatus for stack capable of multiple push or pop operations in one clock cycle

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
AUPQ6554A AUPQ655400A0 (en) 2000-03-29 2000-03-29 Apparatus for stack capable of multiple push or pop operations in on clock cycle
AUPQ6554 2000-03-29
AU31345/01A AU766816B2 (en) 2000-03-29 2001-03-27 Apparatus for stack capable of multiple push or pop operations in one clock cycle

Publications (2)

Publication Number Publication Date
AU3134501A true AU3134501A (en) 2001-10-04
AU766816B2 AU766816B2 (en) 2003-10-23

Family

ID=25621747

Family Applications (1)

Application Number Title Priority Date Filing Date
AU31345/01A Ceased AU766816B2 (en) 2000-03-29 2001-03-27 Apparatus for stack capable of multiple push or pop operations in one clock cycle

Country Status (1)

Country Link
AU (1) AU766816B2 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5142635A (en) * 1989-04-07 1992-08-25 Intel Corporation Method and circuitry for performing multiple stack operations in succession in a pipelined digital computer
US5930820A (en) * 1996-03-18 1999-07-27 Advanced Micro Devices, Inc. Data cache and method using a stack memory for storing stack data separate from cache line storage
US6349383B1 (en) * 1998-09-10 2002-02-19 Ip-First, L.L.C. System for combining adjacent push/pop stack program instructions into single double push/pop stack microinstuction for execution

Also Published As

Publication number Publication date
AU766816B2 (en) 2003-10-23

Similar Documents

Publication Publication Date Title
US20200143587A1 (en) Processing primitives which have unresolved fragments in a graphics processing system
US6092176A (en) System and method for assigning tags to control instruction processing in a superscalar processor
US5669012A (en) Data processor and control circuit for inserting/extracting data to/from an optional byte position of a register
US20030105945A1 (en) Methods and apparatus for a bit rake instruction
KR20170118946A (en) Data processing apparatus and method
CN105488842A (en) Depth forwarding in a graphics processing system
JPS6162980A (en) Picture memory peripheral lsi
US7370180B2 (en) Bit field extraction with sign or zero extend
EP2302510A1 (en) System for matrix multipy operation with wide operand architecture and method
US6542989B2 (en) Single instruction having op code and stack control field
US7512290B2 (en) Image processing apparatus with SIMD-type microprocessor to perform labeling
JPH06162228A (en) Data flow processor device
WO1998028680A1 (en) Method and apparatus for performing a masked byte addition operation
US6727904B2 (en) System and method for rendering graphical data
AU766816B2 (en) Apparatus for stack capable of multiple push or pop operations in one clock cycle
US20090154834A1 (en) Rendering system and data processing method for the same
US4980853A (en) Bit blitter with narrow shift register
US20020083307A1 (en) System and method for partial merges for sub-register data operations
US6983297B2 (en) Shifting an operand left or right while minimizing the number of multiplexor stages
AU704659B2 (en) Method of transposing data
KR20020067003A (en) System for processing graphic patterns
US6378057B1 (en) Data processing apparatus
KR20010072505A (en) Data-processing arrangement comprising a plurality of processing and memory circuits
JPH0749779A (en) Method and device for interleaving
JP4256263B2 (en) Texture rising method and apparatus

Legal Events

Date Code Title Description
DA3 Amendments made section 104

Free format text: THE NATURE OF THE AMENDMENT IS: SUBSTITUTE PATENT REQUEST REGARDING ASSOCIATED DETAILS

FGA Letters patent sealed or granted (standard patent)