CN108427574A - Microprocessor accelerated code optimizer - Google Patents

Info

Publication number
CN108427574A
Authority
CN
China
Prior art keywords
instruction
dependence
grouping
sequence
microprocessor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810449173.XA
Other languages
English (en)
Other versions
CN108427574B (zh)
Inventor
M. Abdallah
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Priority to CN201810449173.XA
Publication of CN108427574A
Application granted
Publication of CN108427574B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3808 Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3887 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/3017 Runtime instruction translation, e.g. macros
    • G06F9/30174 Runtime instruction translation, e.g. macros for non-native instruction set, e.g. Javabyte, legacy code
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838 Dependency mechanisms, e.g. register scoreboarding
    • G06F9/3853 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

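The present invention relates to a microprocessor accelerated code optimizer. A method for accelerating code optimization in a microprocessor is described. The method includes fetching an incoming macroinstruction sequence using an instruction fetch component and transferring the fetched macroinstructions to a decoding component for decoding into microinstructions. Optimization processing is performed by reordering the microinstruction sequence into an optimized microinstruction sequence comprising a plurality of dependent code groups. The optimized microinstruction sequence is output to a microprocessor pipeline for execution. A copy of the optimized microinstruction sequence is stored into a sequence cache for subsequent use upon a subsequent hit to the optimized microinstruction sequence.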

Description

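Microprocessor accelerated code optimizer
This patent application is a divisional of the invention patent application entitled “Microprocessor accelerated code optimizer”, which has international application number PCT/US2011/061957 and an international filing date of November 22, 2011, and which entered the Chinese national phase under application number 201180076248.0.
Cross-reference to related applications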
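This application is related to co-pending, commonly assigned US Patent Application No. 2010/0161948, entitled "APPARATUS AND METHOD FOR PROCESSING COMPLEX INSTRUCTION FORMATS IN A MULTITHREADED ARCHITECTURE SUPPORTING VARIOUS CONTEXT SWITCH MODES AND VIRTUALIZATION SCHEMES", filed by Mohammad A. Abdallah on January 5, 2010, which is incorporated herein in its entirety.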
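This application is related to co-pending, commonly assigned US Patent Application No. 2009/0113170, entitled "APPARATUS AND METHOD FOR PROCESSING AN INSTRUCTION MATRIX SPECIFYING PARALLEL AND DEPENDENT OPERATIONS", filed by Mohammad A. Abdallah on December 19, 2008, which is incorporated herein in its entirety.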
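This application is related to co-pending, commonly assigned US Patent Application No. 61/384,198, entitled "SINGLE CYCLE MULTI-BRANCH PREDICTION INCLUDING SHADOW CACHE FOR EARLY FAR BRANCH PREDICTION", filed by Mohammad A. Abdallah on September 17, 2010, which is incorporated herein in its entirety.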
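This application is related to co-pending, commonly assigned US Patent Application No. 61/467,944, entitled "EXECUTING INSTRUCTION SEQUENCE CODE BLOCKS BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES", filed by Mohammad A. Abdallah on March 25, 2011, which is incorporated herein in its entirety.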
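Technical field
The present invention relates generally to digital computer systems, and more particularly to a system and method for selecting instructions comprising an instruction sequence.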
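Background
Processors are required to handle multiple tasks that are either dependent on one another or totally independent. The internal state of such processors usually consists of registers that might hold different values at each particular instant of program execution. At each instant of program execution, the internal state image is called the architecture state of the processor.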
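When code execution is switched to run another function (e.g., another thread, process, or program), the state of the machine/processor has to be saved so that the new function can use the internal registers to build its new state. Once the new function terminates, its state can be discarded, and the state of the previous context is restored and execution resumes. Such a switch is called a context switch and usually takes tens or hundreds of cycles, especially on modern architectures that employ a large number of registers (e.g., 64, 128, 256) and/or out-of-order execution.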
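In thread-aware hardware architectures, it is normal for the hardware to support multiple context states for a limited number of hardware-supported threads. In this case, the hardware duplicates all architecture state elements for each supported thread. This eliminates the need for a context switch when executing a new thread. However, this still has multiple drawbacks, namely the area, power, and complexity of duplicating all architecture state elements (i.e., registers) for each additional thread supported in hardware. In addition, if the number of software threads exceeds the number of explicitly supported hardware threads, the context switch must still be performed.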
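This becomes common as parallelism is needed on a fine-granularity basis requiring a large number of threads. Hardware thread-aware architectures with duplicate context-state hardware storage do not help non-threaded software code and only reduce the number of context switches for software that is threaded. However, those threads are usually constructed for coarse-grain parallelism and result in heavy software overhead for initiating and synchronizing, leaving fine-grain parallelism, such as function calls and parallel execution of loops, without efficient threading initiation/auto-generation. Such overheads are accompanied by the difficulty of automatically parallelizing such code using state-of-the-art compiler or user parallelization techniques for software code that is not explicitly or easily parallelized/threaded.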
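Summary of the invention
In one embodiment, the present invention is implemented as a method for accelerating code optimization in a microprocessor. The method includes fetching an incoming macroinstruction sequence using an instruction fetch component and transferring the fetched macroinstructions to a decoding component for decoding into microinstructions. Optimization processing is performed by reordering the microinstruction sequence into an optimized microinstruction sequence comprising a plurality of dependent code groups. The optimized microinstruction sequence is then output to a microprocessor pipeline for execution. A copy of the optimized microinstruction sequence is stored in a sequence cache for subsequent use upon a subsequent hit to the optimized microinstruction sequence.
The foregoing is a summary and thus contains, by necessity, simplifications, generalizations, and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.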
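Brief description of the drawings
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements.
FIG. 1 shows an overview diagram of an allocation/issue stage of a microprocessor in accordance with one embodiment of the present invention.
FIG. 2 shows an overview diagram illustrating an optimization process in accordance with one embodiment of the present invention.
FIG. 3 shows a multistep optimization process in accordance with one embodiment of the present invention.
FIG. 4 shows a multistep optimization and instruction moving process in accordance with one embodiment of the present invention.
FIG. 5 shows a flowchart of the steps of an exemplary hardware optimization process in accordance with one embodiment of the present invention.
FIG. 6 shows a flowchart of the steps of an alternative exemplary hardware optimization process in accordance with one embodiment of the present invention.
FIG. 7 shows a diagram illustrating the operation of the CAM matching hardware and the priority encoding hardware of the allocation/issue stage in accordance with one embodiment of the present invention.
FIG. 8 shows a diagram illustrating optimized scheduling ahead of a branch in accordance with one embodiment of the present invention.
FIG. 9 shows a diagram illustrating optimized scheduling ahead of a store in accordance with one embodiment of the present invention.
FIG. 10 shows a diagram of an exemplary software optimization process in accordance with one embodiment of the present invention.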
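FIG. 11 shows a flow diagram of a SIMD software-based optimization process in accordance with one embodiment of the present invention.
FIG. 12 shows a flowchart of the operating steps of an exemplary SIMD software-based optimization process in accordance with one embodiment of the present invention.
FIG. 13 shows a software-based dependency broadcast process in accordance with one embodiment of the present invention.
FIG. 14 shows an exemplary flow diagram illustrating how the dependency grouping of instructions can be used to build variably bounded groups of dependent instructions in accordance with one embodiment of the present invention.
FIG. 15 shows a flow diagram depicting hierarchical scheduling of instructions in accordance with one embodiment of the present invention.
FIG. 16 shows a flow diagram depicting hierarchical scheduling of three-slot dependency groups of instructions in accordance with one embodiment of the present invention.
FIG. 17 shows a flow diagram depicting hierarchical moving window scheduling of three-slot dependency groups of instructions in accordance with one embodiment of the present invention.
FIG. 18 shows how variably sized dependency chains (e.g., variably bounded groups) of instructions are allocated to a plurality of computing engines in accordance with one embodiment of the present invention.
FIG. 19 shows a flow diagram depicting block allocation to the scheduling queues and the hierarchical moving window scheduling of three-slot dependency groups of instructions in accordance with one embodiment of the present invention.
FIG. 20 shows how dependent code blocks (e.g., dependency groups or dependency chains) are executed on the engines in accordance with one embodiment of the present invention.
FIG. 21 shows an overview diagram of a plurality of engines and their components, including a global front end fetch and scheduler and register files, global interconnects, and a fragmented memory subsystem for a multicore processor, in accordance with one embodiment of the present invention.
FIG. 22 shows a plurality of segments, a plurality of segmented common partition schedulers, and the interconnect and the ports into the segments in accordance with one embodiment of the present invention.
FIG. 23 shows a diagram of an exemplary microprocessor pipeline in accordance with one embodiment of the present invention.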
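Detailed description
Although the present invention has been described in connection with one embodiment, the invention is not intended to be limited to the specific forms set forth herein. On the contrary, it is intended to cover such alternatives, modifications, and equivalents as can be reasonably included within the scope of the invention as defined by the appended claims.
In the following detailed description, numerous specific details, such as specific method orders, structures, elements, and connections, have been set forth. It is to be understood, however, that these and other specific details need not be utilized to practice embodiments of the present invention. In other circumstances, well-known structures, elements, or connections have been omitted, or have not been described in particular detail, in order to avoid unnecessarily obscuring this description.
References within the specification to “one embodiment” or “an embodiment” are intended to indicate that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. The appearances of the phrase “in one embodiment” in various places within the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described that may be exhibited by some embodiments and not by others. Similarly, various requirements are described that may be requirements for some embodiments but not for other embodiments.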
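Some portions of the detailed description that follows are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer-executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals of a computer-readable storage medium and are capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “processing”, “accessing”, “writing”, “storing”, or “replicating” refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories and other computer-readable media into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission, or display devices.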
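In one embodiment, the present invention is implemented as a method for accelerating code optimization in a microprocessor. The method includes fetching an incoming macroinstruction sequence using an instruction fetch component and transferring the fetched macroinstructions to a decoding component for decoding into microinstructions. Optimization processing is performed by reordering the microinstruction sequence into an optimized microinstruction sequence comprising a plurality of dependent code groups. The optimized microinstruction sequence is output to a microprocessor pipeline for execution. A copy of the optimized microinstruction sequence is stored in a sequence cache for subsequent use upon a subsequent hit to the optimized microinstruction sequence.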
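FIG. 1 shows an overview diagram of an allocation/issue stage of a microprocessor 100 in accordance with one embodiment of the present invention. As illustrated in FIG. 1, the microprocessor 100 includes a fetch component 101, a native decode component 102, an instruction scheduling and optimizing component 110, and the remaining pipeline 105 of the microprocessor.
In the FIG. 1 embodiment, macroinstructions are fetched by the fetch component 101 and decoded into native microinstructions by the native decode component 102, which then provides the microinstructions to a microinstruction cache 121 and to the instruction scheduling and optimizer component 110. In one embodiment, the fetched macroinstructions comprise a sequence of instructions that is assembled by predicting certain branches.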
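The macroinstruction sequence is decoded into a resulting microinstruction sequence by the native decode component 102. This microinstruction sequence is then transmitted to the instruction scheduling and optimizing component 110 through a multiplexer 103. The instruction scheduling and optimizer component performs optimization processing by, for example, reordering certain instructions of the microinstruction sequence for more efficient execution. This results in an optimized microinstruction sequence that is then transferred to the remaining pipeline 105 (e.g., the allocation, dispatch, execution, and retirement stages) through a multiplexer 104. The optimized microinstruction sequence results in faster and more efficient execution of the instructions.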
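In one embodiment, the macroinstructions can be instructions from a high-level instruction set architecture, while the microinstructions are low-level machine instructions. In another embodiment, the macroinstructions can be guest instructions from a plurality of different instruction set architectures (e.g., CISC-like, x86, RISC-like, MIPS, SPARC, ARM, virtual-like, JAVA, and the like), while the microinstructions are low-level machine instructions or instructions of a different native instruction set architecture. Similarly, in one embodiment, the macroinstructions can be native instructions of an architecture, and the microinstructions can be native microinstructions of that same architecture that have been reordered and optimized, for example x86 macroinstructions and x86 microcoded microinstructions.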
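In one embodiment, to accelerate the execution performance of frequently encountered code (e.g., hot code), copies of frequently encountered microinstruction sequences are cached in the microinstruction cache 121, and copies of frequently encountered optimized microinstruction sequences are cached in the sequence cache 122. As code is fetched, decoded, optimized, and executed, certain optimized microinstruction sequences can be evicted or fetched, in accordance with the size of the sequence cache, through the depicted eviction and fill path 130. This eviction and fill path allows optimized microinstruction sequences to be transferred to and from the memory hierarchy of the microprocessor (e.g., L1 cache, L2 cache, a special cacheable memory range, or the like).
It should be noted that in one embodiment the microinstruction cache 121 can be omitted. In such an embodiment, the acceleration of hot code is provided by storing optimized microinstruction sequences in the sequence cache 122. For example, the space saved by omitting the microinstruction cache 121 can be used to implement a larger sequence cache 122.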
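FIG. 2 shows an overview diagram illustrating an optimization process in accordance with one embodiment of the present invention. The left-hand side of FIG. 2 shows an incoming microinstruction sequence as received from, for example, the native decode component 102 or the microinstruction cache 121. When these instructions are first received, they are not optimized.
One objective of the optimization process is to locate and identify instructions that depend upon one another and move them into their respective dependency groups so that they can execute more efficiently. In one embodiment, groups of dependent instructions can be dispatched together so that they execute more efficiently, since their respective sources and destinations are grouped together for locality. It should be noted that this optimization processing can be used in both an out-of-order processor and an in-order processor. For example, within an in-order processor, instructions are dispatched in order. However, they can be moved around so that dependent instructions are placed in their respective groups, as described above, so that the groups can then execute independently.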
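For example, the incoming instructions include loads, operations, and stores. Instruction 1, for instance, comprises an operation in which source registers (e.g., register 9 and register 5) are added and the result is stored in register 5. Hence, register 5 is a destination, and register 9 and register 5 are sources. In this manner, the sequence of 16 instructions includes destination registers and source registers, as shown.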
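The FIG. 2 embodiment implements the reordering of instructions to create dependency groups, where instructions belonging to a group depend upon one another. To accomplish this, an algorithm is executed that performs hazard checks with respect to the loads and stores of the 16 incoming instructions. For example, stores cannot move past earlier loads without dependency checks. Stores cannot pass earlier stores. Loads cannot pass earlier stores without dependency checks. Loads can pass loads. Instructions can pass prior path-predicted branches (e.g., dynamically constructed branches) by using a renaming technique. In the case of non-dynamically predicted branches, movements of instructions need to consider the scopes of the branches. Each of the above rules can be implemented by adding virtual dependencies (e.g., by artificially adding virtual sources or destinations to instructions to enforce the rules).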
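The load/store hazard rules above can be sketched as a small predicate. This is a minimal illustration, not the patented hardware: the `MemOp` type, the `may_pass` function, and the `dependency_checked` flag are hypothetical names introduced here, and the flag simply stands in for whatever dependency check the optimizer performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    kind: str   # "load" or "store"
    index: int  # position in the original instruction sequence


def may_pass(later: MemOp, earlier: MemOp, dependency_checked: bool = False) -> bool:
    """Return True if `later` may be moved ahead of `earlier` (hypothetical helper)."""
    if later.index <= earlier.index:
        raise ValueError("`later` must follow `earlier` in program order")
    if later.kind == "load" and earlier.kind == "load":
        return True   # loads can pass loads
    if later.kind == "store" and earlier.kind == "store":
        return False  # stores cannot pass earlier stores
    # A store moving past an earlier load, or a load moving past an
    # earlier store, is allowed only once a dependency check has been made.
    return dependency_checked
```

For example, `may_pass(MemOp("load", 3), MemOp("store", 1))` is rejected until the caller asserts that the dependency check has been done.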
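Still referring to FIG. 2, as described above, the objective of the optimization process is to locate dependent instructions and move them into a common dependency group. This process must be done in accordance with the hazard checking algorithm. The optimization algorithm looks for instruction dependencies. The instruction dependencies comprise true dependencies, output dependencies, and anti-dependencies.
The algorithm begins by looking for true dependencies. To identify true dependencies, each destination of the 16-instruction sequence is compared against the sources of subsequent instructions that occur later in the sequence. Subsequent instructions that are truly dependent on an earlier instruction are marked “_1” to signify their true dependence. This is shown in FIG. 2 by the instruction numbers that proceed from left to right over the 16-instruction sequence. For example, considering instruction number 4, the destination register R3 is compared against the sources of subsequent instructions, and each subsequent source that matches is marked “_1” to indicate that instruction's true dependence. In this case, instruction 6, instruction 7, instruction 11, and instruction 15 are marked “_1”.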
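The algorithm then looks for output dependencies. To identify output dependencies, each destination is compared against the destinations of other subsequent instructions. For each of the 16 instructions, each subsequent destination that matches is marked “1_” (e.g., sometimes referred to as a red destination).
The algorithm then looks for anti-dependencies. To identify anti-dependencies, for each of the 16 instructions, each destination is compared against the sources of earlier instructions to identify matches. If a match occurs, the instruction under consideration marks itself “1_” (e.g., sometimes referred to as a red instruction).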
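In this manner, the algorithm populates a dependency matrix of rows and columns for the sequence of 16 instructions. The dependency matrix comprises the marks that indicate the different types of dependencies for each of the 16 instructions. In one embodiment, the dependency matrix is populated in one cycle by using CAM matching hardware and the appropriate broadcasting logic. For example, destinations are broadcast downward through the remaining instructions to be compared with the sources of subsequent instructions (true dependence) and with the destinations of subsequent instructions (output dependence), while destinations can be broadcast upward through the previous instructions to be compared with the sources of prior instructions (anti-dependence).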
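As a rough software analogue of the matrix population described above, the sketch below compares every earlier instruction against every later one and records which of the three dependency types apply. The `(destination, sources)` tuple encoding, the function name, and the string labels are assumptions made for illustration; the text marks true dependencies “_1” and the two blocking kinds “1_”, and the hardware does the comparisons in parallel with CAMs rather than in nested loops.

```python
from typing import List, Tuple

# (destination register, source registers); an illustrative encoding only
Instr = Tuple[int, Tuple[int, ...]]


def build_dependency_matrix(seq: List[Instr]) -> List[List[str]]:
    """matrix[late][early] lists how instruction `late` depends on `early`."""
    n = len(seq)
    matrix = [["" for _ in range(n)] for _ in range(n)]
    for early in range(n):
        dst_e, src_e = seq[early]
        for late in range(early + 1, n):
            dst_l, src_l = seq[late]
            marks = []
            if dst_e in src_l:
                marks.append("true")    # later instruction reads an earlier result
            if dst_e == dst_l:
                marks.append("output")  # both write the same register
            if dst_l in src_e:
                marks.append("anti")    # later instruction overwrites an earlier source
            matrix[late][early] = "+".join(marks)
    return matrix


seq = [
    (5, (9, 5)),  # I0: R5 <- R9 + R5
    (3, (5, 1)),  # I1: R3 <- R5 + R1
    (5, (2, 4)),  # I2: R5 <- R2 + R4
]
matrix = build_dependency_matrix(seq)
```

Here `matrix[1][0]` records a true dependency, while `matrix[2][0]` records both an output and an anti-dependency, the two blocking kinds that renaming can remove.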
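The optimization algorithm uses the dependency matrix to choose which instructions to move together into common dependency groups. It is desired that instructions that are truly dependent upon one another be moved to the same group. Register renaming is used to eliminate anti-dependencies and so allow those anti-dependent instructions to be moved. The moving is done in accordance with the rules and hazard checks described above. For example, stores cannot move past earlier loads without dependency checks. Stores cannot pass earlier stores. Loads cannot pass earlier stores without dependency checks. Loads can pass loads. Instructions can pass prior path-predicted branches (e.g., dynamically constructed branches) by using a renaming technique. In the case of non-dynamically predicted branches, movements of instructions need to consider the scopes of the branches.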
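To illustrate how renaming removes the blocking dependencies while keeping true dependencies intact, here is a minimal sketch. It assumes a simple scheme in which every write receives a fresh register number starting at `first_free` (chosen above all architectural register numbers); the function name and the tuple encoding are hypothetical, and a real renamer would manage a finite physical register file.

```python
from typing import Dict, List, Tuple

# (destination register, source registers); an illustrative encoding only
Instr = Tuple[int, Tuple[int, ...]]


def rename_destinations(seq: List[Instr], first_free: int) -> List[Instr]:
    """Give every write a fresh register and redirect later reads to it."""
    current: Dict[int, int] = {}  # architectural register -> latest renamed register
    next_free = first_free
    renamed: List[Instr] = []
    for dst, srcs in seq:
        # Sources read the most recent renamed copy, preserving true dependencies.
        new_srcs = tuple(current.get(s, s) for s in srcs)
        current[dst] = next_free
        renamed.append((next_free, new_srcs))
        next_free += 1
    return renamed


original = [
    (5, (9, 5)),  # I0: R5 <- R9 + R5
    (3, (5, 1)),  # I1: R3 <- R5 + R1 (true dependency on I0)
    (5, (2, 4)),  # I2: R5 <- R2 + R4 (output/anti-dependency on I0 and I1)
]
renamed = rename_destinations(original, first_free=100)
```

After renaming, I2 writes a register that no earlier instruction reads or writes, so it is free to move, while I1 still reads the value produced by I0.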
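In one embodiment, a priority encoder can be implemented to determine which instructions get moved to be grouped with other instructions. The priority encoder functions in accordance with the information provided by the dependency matrix.
FIG. 3 and FIG. 4 show a multistep optimization process in accordance with one embodiment of the present invention. In one embodiment, the optimization process is iterative, in that after instructions are moved in a first pass by moving their dependency columns, the dependency matrix is repopulated and examined again for new opportunities to move instructions. In one embodiment, this dependency matrix population process is repeated three times. This is shown in FIG. 4, which shows instructions that have been moved and then examined again for opportunities to move other instructions. The sequence of numbers on the right-hand side of each of the 16 instructions shows the group that the instruction was in at the beginning of the process and the group that it was in at the end of the process, with the intervening group numbers in between. For example, FIG. 4 shows how instruction 6 was initially in group 4 but was moved into group 1.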
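In this manner, FIGS. 2 through 4 illustrate the operation of an optimization algorithm in accordance with one embodiment of the present invention. It should be noted that although FIGS. 2 through 4 illustrate an allocation/issue stage, this functionality can also be implemented in a local scheduler/dispatch stage.
FIG. 5 shows a flowchart of the steps of an exemplary hardware optimization process 500 in accordance with one embodiment of the present invention. As depicted in FIG. 5, the flowchart shows the operating steps of an optimization process as implemented in an allocation/issue stage of a microprocessor.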
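Process 500 begins in step 501, where an incoming macroinstruction sequence is fetched using an instruction fetch component (e.g., the fetch component 101 of FIG. 1). As described above, the fetched instructions comprise a sequence that is assembled by predicting certain instruction branches.
In step 502, the fetched macroinstructions are transferred to a decoding component for decoding into microinstructions. The macroinstruction sequence is decoded into a microinstruction sequence in accordance with the branch predictions. In one embodiment, the microinstruction sequence is then stored in a microinstruction cache.
In step 503, optimization processing is then performed on the microinstruction sequence by reordering the microinstructions comprising the sequence into dependency groups. The reordering is implemented by an instruction reordering component (e.g., the instruction scheduling and optimizer component 110). This process is described in FIGS. 2 through 4.
In step 504, the optimized microinstruction sequence is output to the microprocessor pipeline for execution. As described above, the optimized microinstruction sequence is forwarded to the rest of the machine for execution (e.g., the remaining pipeline 105).
Subsequently, in step 505, a copy of the optimized microinstruction sequence is stored in a sequence cache for subsequent use upon a subsequent hit to that sequence. In this manner, the sequence cache enables access to the optimized microinstruction sequences upon subsequent hits on those sequences, thereby accelerating hot code.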
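The role of the sequence cache in steps 503 through 505 can be pictured with a small software model. This is only a sketch under stated assumptions: sequences are keyed by a start address, eviction is oldest-first, and the `optimize` callable stands in for the reordering step. None of these details are specified by the text, and the class and method names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Sequence = Tuple[str, ...]


class SequenceCache:
    """Toy model: optimize on a miss, reuse the stored copy on a hit."""

    def __init__(self, optimize: Callable[[Sequence], Sequence], capacity: int = 4):
        self._optimize = optimize
        self._capacity = capacity
        self._entries: Dict[int, Sequence] = {}  # dicts keep insertion order
        self.hits = 0
        self.misses = 0

    def fetch(self, address: int, decoded: Sequence) -> Sequence:
        if address in self._entries:
            self.hits += 1                    # hot code: skip re-optimization
            return self._entries[address]
        self.misses += 1
        optimized = self._optimize(decoded)   # stand-in for step 503
        if len(self._entries) >= self._capacity:
            oldest = next(iter(self._entries))
            del self._entries[oldest]         # evict (assumed oldest-first policy)
        self._entries[address] = optimized    # step 505: keep a copy
        return optimized


# `sorted` is a placeholder for the real reordering into dependency groups.
cache = SequenceCache(lambda s: tuple(sorted(s)), capacity=2)
first = cache.fetch(0x40, ("c", "a", "b"))
second = cache.fetch(0x40, ("c", "a", "b"))
```

The second `fetch` of address `0x40` is served from the cache without invoking the optimizer again, which is the effect the text describes for hot code.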
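FIG. 6 shows a flowchart of the steps of an alternative exemplary hardware optimization process 600 in accordance with one embodiment of the present invention. As depicted in FIG. 6, the flowchart shows the operating steps of an optimization process as implemented in an allocation/issue stage of a microprocessor in accordance with an alternative embodiment.
Process 600 begins in step 601, where an incoming macroinstruction sequence is fetched using an instruction fetch component (e.g., the fetch component 101 of FIG. 1). As described above, the fetched instructions comprise a sequence that is assembled by predicting certain instruction branches.
In step 602, the fetched macroinstructions are transferred to a decoding component for decoding into microinstructions. The macroinstruction sequence is decoded into a microinstruction sequence in accordance with the branch predictions. In one embodiment, the microinstruction sequence is then stored in a microinstruction cache.
In step 603, the decoded microinstructions are stored as sequences in a microinstruction sequence cache. Sequences in the microinstruction cache are initially formed in accordance with basic block boundaries. These sequences are not optimized at this point.
In step 604, optimization processing is then performed on the microinstruction sequence by reordering the microinstructions comprising the sequence into dependency groups. The reordering is implemented by an instruction reordering component (e.g., the instruction scheduling and optimizer component 110). This process is described in FIGS. 2 through 4.
In step 605, the optimized microinstruction sequence is output to the microprocessor pipeline for execution. As described above, the optimized microinstruction sequence is forwarded to the rest of the machine for execution (e.g., the remaining pipeline 105).
Subsequently, in step 606, a copy of the optimized microinstruction sequence is stored in a sequence cache for subsequent use upon a subsequent hit to that sequence. In this manner, the sequence cache enables access to the optimized microinstruction sequences upon subsequent hits on those sequences, thereby accelerating hot code.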
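FIG. 7 shows a diagram illustrating the operation of the CAM matching hardware and the priority encoding hardware of the allocation/issue stage in accordance with one embodiment of the present invention. As depicted in FIG. 7, the destinations of instructions are broadcast into the CAM array from the left. Three exemplary instruction destinations are shown. The lighter shaded CAMs (e.g., green) are for true dependency matches and output dependency matches, and thus the destination is broadcast downward. The darker shaded CAMs (e.g., blue) are for anti-dependency matches, and thus the destination is broadcast upward. These matches populate a dependency matrix, as described above. Priority encoders are shown on the right, and they function by scanning the row of CAMs to find the first match, either a “_1” or a “1_”. As described above in the discussion of FIGS. 2 through 4, the process can be implemented to be iterative. For example, if a “_1” is blocked by a “1_”, then that destination can be renamed and moved.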
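The priority-encoder behavior can be sketched in software as a scan over one row of match marks. The row encoding (one string per earlier instruction, in program order), the right-to-left scan direction, and both function names are assumptions for illustration; the “_1” and “1_” marks are the ones used in the text.

```python
from typing import List, Optional, Tuple


def first_match(row: List[str]) -> Optional[Tuple[int, str]]:
    """Scan a row from right to left and return (column, mark) of the first match."""
    for col in range(len(row) - 1, -1, -1):
        if row[col] in ("_1", "1_"):
            return col, row[col]
    return None


def resolve(row: List[str]) -> str:
    """Describe what the optimizer would do for this row (illustrative only)."""
    hit = first_match(row)
    if hit is None:
        return "independent"
    col, mark = hit
    if mark == "1_":
        # A blocking (output or anti) match: rename the destination first.
        return f"rename destination, blocked by column {col}"
    return f"group with column {col}"
```

With the row `["", "_1", "", "1_"]`, the blocking mark in column 3 is found first, so the destination is renamed before the true dependency in column 1 can be used for grouping on a later pass.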
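FIG. 8 shows a diagram illustrating optimized scheduling of instructions ahead of a branch in accordance with one embodiment of the present invention. As illustrated in FIG. 8, a hardware optimized example is depicted alongside a traditional just-in-time compiler example. The left-hand side of FIG. 8 shows the original un-optimized code, which includes a branch biased untaken, “Branch C to L1”. The middle column of FIG. 8 shows a traditional just-in-time compiler optimization, where registers are renamed and instructions are moved ahead of the branch. In this example, the just-in-time compiler inserts compensation code to account for those occasions where the branch-biased decision is wrong (e.g., where the branch is actually taken as opposed to untaken). In contrast, the right column of FIG. 8 shows the hardware unrolled optimization. In this case, the registers are renamed and instructions are moved ahead of the branch. However, it should be noted that no compensation code is inserted. The hardware keeps track of whether the branch-biased decision is true or not. In the case of a wrongly predicted branch, the hardware automatically rolls back its state in order to execute the correct instruction sequence. The hardware optimizer solution is able to avoid the use of compensation code because, in those cases where the branch is mispredicted, the hardware jumps to the original code in memory and executes the correct sequence from there, while flushing the mispredicted instruction sequence.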
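FIG. 9 shows a diagram illustrating optimized scheduling of a load ahead of a store in accordance with one embodiment of the present invention. As illustrated in FIG. 9, a hardware optimized example is depicted alongside a traditional just-in-time compiler example. The left-hand side of FIG. 9 shows the original un-optimized code, which includes a store and the load “R3<-LD[R5]”. The middle column of FIG. 9 shows a traditional just-in-time compiler optimization, where registers are renamed and the load is moved ahead of the store. In this example, the just-in-time compiler inserts compensation code to account for those occasions where the address of the load instruction aliases the address of the store instruction (e.g., where moving the load ahead of the store is not appropriate). In contrast, the right column of FIG. 9 shows the hardware unrolled optimization. In this case, the registers are renamed and the load is also moved ahead of the store. However, it should be noted that no compensation code is inserted. In a case where moving the load ahead of the store is wrong, the hardware automatically rolls back its state in order to execute the correct instruction sequence. The hardware optimizer solution is able to avoid the use of compensation code because, in those cases where the address alias-check branch is mispredicted, the hardware jumps to the original code in memory and executes the correct sequence from there, while flushing the mispredicted instruction sequence. In this case, the sequence assumes no aliasing. It should be noted that in one embodiment, the functionality diagrammed in FIG. 9 can be implemented by the instruction scheduling and optimizer component 110 of FIG. 1. Similarly, in one embodiment, the functionality depicted in FIG. 9 can be implemented by the software optimizer 1000 described in FIG. 10 below.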
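Additionally, with respect to dynamically unrolled sequences, it should be noted that instructions can pass prior path-predicted branches (e.g., dynamically constructed branches) by using renaming. In the case of non-dynamically predicted branches, movements of instructions should consider the scopes of the branches. Loops can be unrolled to the extent desired, and optimizations can be applied across the whole sequence. For example, this can be implemented by renaming the destination registers of instructions that move across branches. One of the benefits of this feature is that no compensation code or extensive analysis of the scopes of the branches is needed. This feature thus greatly speeds up and simplifies the optimization process.
Additional information concerning branch prediction and the assembling of instruction sequences can be found in commonly assigned US Patent Application No. 61/384,198, entitled "SINGLE CYCLE MULTI-BRANCH PREDICTION INCLUDING SHADOW CACHE FOR EARLY FAR BRANCH PREDICTION", filed by Mohammad A. Abdallah on September 17, 2010, which is incorporated herein in its entirety.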
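FIG. 10 shows a diagram of an exemplary software optimization process in accordance with one embodiment of the present invention. In the FIG. 10 embodiment, the instruction scheduling and optimizer component (e.g., component 110 of FIG. 1) is replaced by a software-based optimizer 1000.
In the FIG. 10 embodiment, the software optimizer 1000 performs the optimization processing that was performed by the hardware-based instruction scheduling and optimizer component 110. The software optimizer maintains a copy of the optimized sequences in the memory hierarchy (e.g., L1, L2, system memory). This allows the software optimizer to maintain a much larger collection of optimized sequences than can be stored in the sequence cache.
It should be noted that the software optimizer 1000 can comprise code resident in the memory hierarchy as both the input to the optimizer and the output of the optimization process.
It should be noted that in one embodiment the microinstruction cache can be omitted. In such an embodiment, only the optimized microinstruction sequences are cached.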
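FIG. 11 shows a flow diagram of a SIMD software-based optimization process in accordance with one embodiment of the present invention. The top of FIG. 11 shows how the software-based optimizer examines each instruction of an input instruction sequence. FIG. 11 shows how a SIMD compare can be used to match one to many (e.g., a SIMD byte compare of a first source “Src1” against all second source bytes “Src2”). In one embodiment, Src1 contains the destination register of any instruction and Src2 contains one source from each other subsequent instruction. For every destination, matching is done against the sources of all subsequent instructions (true dependence check). This is a pairing match that indicates a desired group for the instruction. Matching is done between each destination and every subsequent instruction destination (output dependence check). This is a blocking match that can be resolved with renaming. Matching is done between each destination and every prior instruction source (anti-dependence check). This is a blocking match that can be resolved by renaming. The results are used to populate the rows and columns of the dependency matrix.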
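The one-to-many SIMD compare can be emulated lane by lane in plain Python to show the data flow. This is an emulation under stated assumptions rather than real SIMD code: the function names are hypothetical, and the convention of 0xFF for a matching lane and 0x00 otherwise mirrors common byte-compare instructions but is not specified by the text.

```python
from typing import List


def simd_compare_eq(src1: int, src2_lanes: List[int]) -> List[int]:
    """Emulate a byte-wise equality compare of a broadcast value against each lane."""
    return [0xFF if lane == src1 else 0x00 for lane in src2_lanes]


def true_dependency_mask(dest: int, later_sources: List[int]) -> List[int]:
    """Src1 holds one destination; Src2 holds one source from each later instruction."""
    return simd_compare_eq(dest, later_sources)


# Destination R3 compared against one source of each of four later instructions.
mask = true_dependency_mask(3, [5, 3, 7, 3])
```

Lanes 1 and 3 match, so the corresponding later instructions would be marked as truly dependent; the output and anti-dependence checks are the same compare applied to destinations and to prior sources.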
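FIG. 12 shows a flowchart of the operating steps of an exemplary SIMD software-based optimization process 1200 in accordance with one embodiment of the present invention. Process 1200 is described in the context of the flow diagram of FIG. 11.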
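In step 1201, an input instruction sequence is accessed by using a software-based optimizer instantiated in memory.
In step 1202, SIMD instructions are used to populate a dependency matrix with dependency information extracted from the input instruction sequence by using a sequence of SIMD compare instructions.
In step 1203, the rows of the matrix are scanned from right to left for the first match (e.g., a dependency mark).
In step 1204, each of the first matches is analyzed to determine the type of the match.
In step 1205, if the first marked match is a blocking dependency, renaming is done for this destination.
In step 1206, all first matches for each row of the matrix are identified, and the corresponding column for each match is moved to the given dependency group.
In step 1207, the scanning process is repeated several times to reorder the instructions comprising the input sequence and produce an optimized output sequence.
In step 1208, the optimized instruction sequence is output to the execution pipeline of the microprocessor for execution.
In step 1209, the optimized output sequence is stored in a sequence cache for subsequent consumption (e.g., to accelerate hot code).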
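It should be noted that the software optimization can also be done serially using SIMD instructions. For example, the optimization can be implemented by processing one instruction at a time, scanning the sources and destinations of instructions (e.g., from earlier instructions to subsequent instructions in a sequence). The software uses SIMD instructions to compare, in parallel, the current instruction's sources and destinations against the sources and destinations of prior instructions, in accordance with the optimization algorithm described above (e.g., to detect true dependencies, output dependencies, and anti-dependencies).
FIG. 13 shows a software-based dependency broadcast process in accordance with one embodiment of the present invention. The FIG. 13 embodiment shows a flow diagram of an exemplary software scheduling process that processes groups of instructions without the expense of a full parallel hardware implementation as described above. However, the FIG. 13 embodiment can still use SIMD to process smaller groups of instructions in parallel.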
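The software scheduling process of FIG. 13 proceeds as follows. First, the process initializes three registers. The process takes the instruction numbers and loads them into a first register. The process then takes the destination register numbers and loads them into a second register. The process then takes the values in the first register and broadcasts them to positions in a third, result register in accordance with the position numbers in the second register. The process then overwrites going from left to right in the second register, so that the leftmost value overwrites a value to its right in those instances where the broadcast goes to the same position in the result register. Positions in the third register that have not been written to are bypassed. This information is used to populate a dependency matrix.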
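The FIG. 13 embodiment also shows the manner in which an input instruction sequence can be processed as a plurality of groups. For example, a 16-instruction input sequence can be processed as a first group of 8 instructions and a second group of 8 instructions. With the first group, the instruction numbers are loaded into the first register, the instruction destination numbers are loaded into the second register, and the values in the first register are broadcast to positions in the third register (e.g., the result register) in accordance with the position numbers in the second register (e.g., a group broadcast). Positions in the third register that have not been written to are bypassed. The third register now becomes the base for processing the second group. For example, the result register from group 1 now becomes the result register for processing group 2.
With the second group, the instruction numbers are loaded into the first register, the instruction destination numbers are loaded into the second register, and the values in the first register are broadcast to positions in the third register (e.g., the result register) in accordance with the position numbers in the second register. Positions in the third register can overwrite the results that were written during the processing of the first group. Positions in the third register that have not been written to are bypassed. In this manner, the second group updates the base from the first group and thereby produces a new base for processing a third group, and so on.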
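Instructions in the second group can inherit the dependency information generated in the processing of the first group. It should be noted that the entire second group does not have to be processed to update dependencies in the result register. For example, the dependencies for instruction 12 can be generated from the processing of the first group followed by processing the instructions of the second group up to instruction 11. This updates the result register to the state up to instruction 12. In one embodiment, a mask can be used to prevent updates for the remaining instructions of the second group (e.g., instructions 12 through 16). To determine the dependencies for instruction 12, the result register is examined for R2 and R5. R5 will have been updated by instruction 1, and R2 will have been updated by instruction 11. It should be noted that if all of group 2 were processed, R2 would be updated by instruction 15.
Additionally, it should be noted that all the instructions of the second group (instructions 9 through 16) can be processed independently of one another. In such a case, the instructions of the second group depend only on the result register of the first group. Once the result register has been updated from the processing of the first group, the instructions of the second group can be processed in parallel. In this manner, groups of instructions can be processed in parallel, one after another. In one embodiment, each group is processed using a SIMD instruction (e.g., a SIMD broadcast instruction), thereby processing all instructions of that group in parallel.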
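The group-by-group broadcast into a result register can be sketched with a dictionary that maps each register number to the last instruction that wrote it. The destination lists below are made up so that they reproduce the example in the text (R5 last written by instruction 1, R2 by instruction 11, and by instruction 15 if the whole second group is processed); the function name and the `stop_before` parameter, which plays the role of the mask, are hypothetical.

```python
from typing import Dict, List, Optional


def broadcast_group(
    result: Dict[int, int],
    numbers: List[int],
    dests: List[int],
    stop_before: Optional[int] = None,
) -> Dict[int, int]:
    """Broadcast instruction numbers into the result register, keyed by destination."""
    updated = dict(result)  # the previous group's result is the base
    for num, dst in zip(numbers, dests):
        if stop_before is not None and num >= stop_before:
            break           # masked: later instructions do not update the result
        updated[dst] = num  # a later writer overwrites an earlier one
    return updated          # registers never written simply stay absent


# Hypothetical destinations for instructions 1-8 and 9-16.
group1 = [5, 7, 8, 3, 6, 4, 9, 10]
group2 = [11, 12, 2, 13, 14, 16, 2, 17]

base = broadcast_group({}, list(range(1, 9)), group1)
up_to_12 = broadcast_group(base, list(range(9, 17)), group2, stop_before=12)
full = broadcast_group(base, list(range(9, 17)), group2)
```

Looking up instruction 12's sources in `up_to_12` gives instruction 1 for R5 and instruction 11 for R2, while `full` reports instruction 15 for R2, matching the walk-through above.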
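FIG. 14 shows an exemplary flow diagram illustrating how the dependency grouping of instructions can be used to build variably bounded groups of dependent instructions in accordance with one embodiment of the present invention. In the descriptions of FIGS. 2 through 4, the group sizes are constrained, in those cases to three instructions per group. FIG. 14 shows how instructions can be reordered into variably sized groups, which can then be allocated to a plurality of computing engines. For example, FIG. 14 shows 4 engines. Since the groups can be variably sized depending on their characteristics, engine 1 can be allocated a larger group than, for example, engine 2. This can occur, for example, in a case where engine 2 has an instruction that is not particularly dependent upon the other instructions in that group.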
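FIG. 15 shows a flow diagram depicting hierarchical scheduling of instructions in accordance with one embodiment of the present invention. As described above, dependency grouping of instructions can be used to build variably bounded groups. FIG. 15 shows the feature wherein various levels of dependency exist within a dependency group. For example, instruction 1 does not depend on any other instruction within this instruction sequence, therefore making instruction 1 an L0 dependency level. However, instruction 4 depends on instruction 1, therefore making instruction 4 an L1 dependency level. In this manner, each of the instructions of an instruction sequence is assigned a dependency level, as shown.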
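Dependency levels of this kind can be computed as the longest chain of producers behind each instruction. The sketch below assumes the dependencies are given as a mapping from an instruction number to the instructions it depends on and that the graph is acyclic; the function name and the sample edges (beyond instruction 4 depending on instruction 1) are illustrative.

```python
from typing import Dict, List


def dependency_levels(deps: Dict[int, List[int]]) -> Dict[int, int]:
    """Level 0 for instructions with no producers, else 1 + the deepest producer."""
    levels: Dict[int, int] = {}

    def level_of(instr: int) -> int:
        if instr not in levels:
            producers = deps.get(instr, [])
            levels[instr] = (
                0 if not producers else 1 + max(level_of(p) for p in producers)
            )
        return levels[instr]

    for instr in deps:
        level_of(instr)
    return levels


# Instruction 4 depends on 1, and (illustratively) 7 depends on 4.
levels = dependency_levels({1: [], 2: [], 4: [1], 7: [4]})
```

Instruction 1 comes out at L0, instruction 4 at L1, and an instruction that depends on instruction 4 at L2, which is the labeling the scheduler queues rely on.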
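The dependency level of each instruction is used by a second-level hierarchical scheduler to dispatch instructions in such a manner as to ensure that resources are available for dependent instructions to execute. For example, in one embodiment, L0 instructions are loaded into instruction queues that are processed by the second-level schedulers 1-4. The L0 instructions are loaded such that they are at the front of each of the queues, the L1 instructions are loaded such that they follow in each of the queues, the L2 instructions follow them, and so on. This is shown by the dependency levels, from L0 to Ln, in FIG. 15. The hierarchical scheduling of the schedulers 1-4 advantageously utilizes locality in time and instruction-to-instruction dependency to make scheduling decisions in an optimal way.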
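In this manner, embodiments of the present invention imply dependency group slot allocation for the instructions of the instruction sequence. For example, to implement an out-of-order microarchitecture, the dispatching of the instructions of the instruction sequence is out of order. In one embodiment, instruction readiness is checked on each cycle. An instruction is ready if all instructions that it depends on have previously been dispatched. A scheduler structure functions by checking those dependencies. In one embodiment, the scheduler is a unified scheduler, and all dependency checking is performed in the unified scheduler structure. In another embodiment, the scheduler functionality is distributed across the dispatch queues of the execution units of a plurality of engines. Hence, in one embodiment the scheduler is unified, while in another embodiment the scheduler is distributed. With both solutions, each instruction source is checked against the destinations of the dispatched instructions every cycle.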
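The per-cycle readiness rule (an instruction is ready once everything it depends on has been dispatched) can be modeled with a short simulation. This is a sketch of a unified scheduler only; the `width` limit on instructions dispatched per cycle, the oldest-first tie-break, and the function name are assumptions not taken from the text.

```python
from typing import Dict, List, Set


def dispatch_cycles(deps: Dict[int, List[int]], width: int) -> List[List[int]]:
    """Return the instructions dispatched in each cycle under the readiness rule."""
    dispatched: Set[int] = set()
    pending = sorted(deps)
    cycles: List[List[int]] = []
    while pending:
        # Ready: every producer was dispatched in an earlier cycle.
        ready = [i for i in pending if all(p in dispatched for p in deps[i])]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        issue = ready[:width]  # oldest ready instructions first
        cycles.append(issue)
        dispatched.update(issue)
        pending = [i for i in pending if i not in dispatched]
    return cycles


deps = {1: [], 2: [], 3: [1], 4: [1, 2], 5: [4]}
schedule = dispatch_cycles(deps, width=2)
```

With a dispatch width of two, instructions 1 and 2 go first, 3 and 4 become ready in the next cycle, and 5 waits for 4, so dispatch order follows dependencies rather than program order.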
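Thus, FIG. 15 shows the hierarchical scheduling as performed by embodiments of the present invention. As described above, instructions are first grouped to form dependency chains (e.g., dependency groups). The formation of these dependency chains can be done statically or dynamically, by software or by hardware. Once these dependency chains have been formed, they can be distributed/dispatched to an engine. In this manner, dependency grouping allows for groups that are scheduled out of order but formed in order. Dependency grouping also distributes entire dependency groups onto a plurality of engines (e.g., cores or threads). Dependency grouping also facilitates hierarchical scheduling as described above, where dependent instructions are grouped in a first step and then scheduled in a second step.
It should be noted that the functionality illustrated in FIGS. 14 through 19 can function independently of any method by which instructions are grouped (e.g., whether the grouping functionality is implemented in hardware, software, etc.). Additionally, the dependency groups shown in FIGS. 14 through 19 can comprise a matrix of independent groups, where each group further comprises dependent instructions. It should also be noted that the schedulers can also be engines. In such an embodiment, each of the schedulers 1-4 can be incorporated within its respective engine (e.g., as shown in FIG. 22, where each segment includes a common partition scheduler).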
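FIG. 16 shows a flow diagram depicting hierarchical scheduling of three-slot dependency groups of instructions in accordance with one embodiment of the present invention. As described above, dependency grouping of instructions can be used to build variably bounded groups. In this embodiment, the dependency groups comprise three slots. FIG. 16 shows the various levels of dependency even within a three-slot dependency group. As described above, instruction 1 does not depend on any other instruction within this instruction sequence, therefore making instruction 1 an L0 dependency level. However, instruction 4 depends on instruction 1, therefore making instruction 4 an L1 dependency level. In this manner, each of the instructions of an instruction sequence is assigned a dependency level, as shown.
As described above, the dependency level of each instruction is used by a second-level hierarchical scheduler to dispatch instructions in such a manner as to ensure that resources are available for dependent instructions to execute. L0 instructions are loaded into instruction queues that are processed by the second-level schedulers 1-4. As shown by the dependency levels, from L0 to Ln, in FIG. 16, the L0 instructions are loaded such that they are at the front of each of the queues, the L1 instructions are loaded such that they follow in each of the queues, the L2 instructions follow them, and so on. It should be noted that group number four (e.g., the fourth group from the top) begins at L2 even though it is a separate group. This is because instruction 7 depends on instruction 4, which depends on instruction 1, thereby giving instruction 7 an L2 dependency.
In this manner, FIG. 16 shows how every three dependent instructions are scheduled together on a given one of the schedulers 1-4. The second-level groups are scheduled behind the first-level groups, and then the groups are rotated.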
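FIG. 17 shows a flow diagram depicting hierarchical moving window scheduling of three-slot dependency groups of instructions in accordance with one embodiment of the present invention. In this embodiment, the hierarchical scheduling for the three-slot dependency groups is implemented via a unified moving window scheduler. The moving window scheduler processes the instructions in the queues to dispatch instructions in such a manner as to ensure that resources are available for dependent instructions to execute. As described above, L0 instructions are loaded into instruction queues that are processed by the second-level schedulers 1-4. As shown by the dependency levels, from L0 to Ln, in FIG. 17, the L0 instructions are loaded such that they are at the front of each of the queues, the L1 instructions are loaded such that they follow in each of the queues, the L2 instructions follow them, and so on. The moving window illustrates how L0 instructions can be dispatched from each of the queues even though they may be in one queue and not another. In this manner, the moving window scheduler dispatches instructions as the queues flow from left to right, as illustrated in FIG. 17.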
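One way to picture the queue loading and the moving window is the sketch below. It assumes each dependency group is loaded into its own queue in level order and that, at window position k, the scheduler dispatches from the front of every queue all instructions whose level is at most k. The text does not define the window policy at this level of detail, so the policy, the `(instruction, level)` encoding, and the function names are illustrative assumptions.

```python
from collections import deque
from typing import Deque, List, Tuple

Entry = Tuple[int, int]  # (instruction number, dependency level)


def load_queues(groups: List[List[Entry]]) -> List[Deque[Entry]]:
    """Load each group into its own queue, lowest dependency level at the front."""
    return [deque(sorted(group, key=lambda entry: entry[1])) for group in groups]


def moving_window_dispatch(queues: List[Deque[Entry]]) -> List[List[int]]:
    """Advance a window over the levels, dispatching from the front of every queue."""
    order: List[List[int]] = []
    level = 0
    while any(queues):
        issued: List[int] = []
        for queue in queues:
            while queue and queue[0][1] <= level:
                issued.append(queue.popleft()[0])
        order.append(issued)
        level += 1
    return order


groups = [
    [(1, 0), (4, 1), (7, 2)],  # a chain 1 -> 4 -> 7
    [(2, 0), (5, 1)],
    [(8, 2), (9, 3)],          # a separate group that starts at L2
]
order = moving_window_dispatch(load_queues(groups))
```

The first window position dispatches the L0 instructions from whichever queues hold them, and the third queue contributes nothing until the window reaches L2.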
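FIG. 18 shows how variably sized dependency chains (e.g., variably bounded groups) of instructions are allocated to a plurality of computing engines in accordance with one embodiment of the present invention.
As depicted in FIG. 18, the processor includes an instruction scheduler component 10 and a plurality of engines 11-14. The instruction scheduler component generates code blocks and inheritance vectors to support the execution of dependent code blocks (e.g., variably bounded groups) on their respective engines. Each of the dependent code blocks can belong to the same logical core/thread or to different logical cores/threads. The instruction scheduler component processes the dependent code blocks to generate the respective inheritance vectors. These dependent code blocks and respective inheritance vectors are allocated to the particular engines 11-14, as shown. A global interconnect 30 supports the necessary communication across each of the engines 11-14. It should be noted that the functionality for the dependency grouping of instructions to build variably bounded groups of dependent instructions, as described above in the discussion of FIG. 14, is implemented by the instruction scheduler component 10 of the FIG. 18 embodiment.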
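FIG. 19 shows a flow diagram depicting block allocation to the scheduling queues and the hierarchical moving window scheduling of three-slot dependency groups of instructions in accordance with one embodiment of the present invention. As described above, the hierarchical scheduling for the three-slot dependency groups can be implemented via a unified moving window scheduler. FIG. 19 shows how dependency groups become blocks that are loaded into the scheduling queues. In the FIG. 19 embodiment, two independent groups can be loaded into each queue as half blocks. This is shown at the top of FIG. 19, where group 1 forms one half block and group 4 forms another half block that is loaded into the first scheduling queue.
As described above, the moving window scheduler processes the instructions in the queues to dispatch instructions in such a manner as to ensure that resources are available for dependent instructions to execute. The bottom of FIG. 19 shows how L0 instructions are loaded into instruction queues that are processed by the second-level schedulers.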
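FIG. 20 shows how dependent code blocks (e.g., dependency groups or dependency chains) are executed on the engines 11-14 in accordance with one embodiment of the present invention. As described above, the instruction scheduler component generates code blocks and inheritance vectors to support the execution of dependent code blocks (e.g., variably bounded groups, three-slot groups, etc.) on their respective engines. As described above in FIG. 19, FIG. 20 further shows how two independent groups can be loaded into each engine as code blocks. FIG. 20 shows how these code blocks are dispatched to the engines 11-14, where the dependent instructions execute on the stacked (e.g., serially connected) execution units of each engine. For example, in the first dependency group, or code block, at the top left of FIG. 20, the instructions are dispatched to the engine 11, where they are stacked on the execution units in the order of their dependency, such that L0 is stacked on top of L1, which is further stacked on L2. In so doing, the results of L0 flow to the execution unit of L1, which can then flow to the execution of L2.
In this manner, the dependency groups shown in FIG. 20 can comprise a matrix of independent groups, where each group further comprises dependent instructions. The benefit of the groups being independent is the ability to dispatch and execute them in parallel, and the attribute that the need for communication across the interconnect between the engines is minimized. Additionally, it should be noted that the execution units shown in the engines 11-14 can comprise a CPU or a GPU.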
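In accordance with embodiments of the present invention, it should be appreciated that instructions are abstracted into dependency groups, or blocks, or instruction matrices in accordance with their dependencies. Grouping instructions in accordance with their dependencies facilitates a more simplified scheduling process with a larger window of instructions (e.g., a larger input sequence of instructions). The grouping as described above removes the instruction variation and abstracts such variation uniformly, thereby allowing the implementation of simple, homogeneous, and uniform scheduling decision-making. The above-described grouping functionality increases the throughput of the scheduler without increasing its complexity. For example, in a scheduler for four engines, the scheduler can dispatch four groups where each group has three instructions. In so doing, the scheduler only handles four lanes of superscalar complexity while dispatching 12 instructions. Furthermore, each block can contain parallel independent groups, which further increases the number of dispatched instructions.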
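FIG. 21 shows an overview diagram of a plurality of engines and their components, including a global front end fetch and scheduler and register files, global interconnects, and a fragmented memory subsystem for a multicore processor, in accordance with one embodiment of the present invention. As depicted in FIG. 21, four memory fragments 101-104 are shown. The memory fragmentation hierarchy is the same across each cache hierarchy (e.g., L1 cache, L2 cache, and the load store buffer). Data can be exchanged between each of the L1 caches, each of the L2 caches, and each of the load store buffers through the memory global interconnect 110a.
The memory global interconnect comprises a routing matrix that allows a plurality of cores (e.g., the address calculation and execution units 121-124) to access data that may be stored at any point in the fragmented cache hierarchy (e.g., L1 cache, load store buffer, and L2 cache). FIG. 21 also depicts the manner in which each of the fragments 101-104 can be accessed by the address calculation and execution units 121-124 through the memory global interconnect 110a.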
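The execution global interconnect 110b similarly comprises a routing matrix that allows the plurality of cores (e.g., the address calculation and execution units 121-124) to access data that may be stored in any of the segmented register files. Thus, the cores have access to data stored in any of the fragments and to data stored in any of the segments through the memory global interconnect 110a or the execution global interconnect 110b.
FIG. 21 further shows a global front end fetch and scheduler, which has a view of the entire machine and which manages the utilization of the register file segments and the fragmented memory subsystem. Address generation comprises the basis for fragment definition. The global front end fetch and scheduler functions by allocating instruction sequences to each fragment.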
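FIG. 22 shows a plurality of segments, a plurality of segmented common partition schedulers, and the interconnect and the ports into the segments in accordance with one embodiment of the present invention. As depicted in FIG. 22, each segment is shown with a common partition scheduler. The common partition scheduler functions by scheduling instructions within its respective segment. These instructions are in turn received from the global front end fetch and scheduler. In this embodiment, the common partition scheduler is configured to function in cooperation with the global front end fetch and scheduler. The segments are also shown with 4 read/write ports that provide read/write access to the operand/result buffer, the threaded register file, and the common partition/scheduler.
In one embodiment, a non-centralized access process is implemented for using the interconnect, and the local interconnects employ a reservation adder and a threshold limiter to control access to each contested resource, in this case the ports into each segment. In such an embodiment, a core needs to reserve the necessary bus and reserve the necessary port.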
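FIG. 23 shows a diagram of an exemplary microprocessor pipeline 2300 in accordance with one embodiment of the present invention. The microprocessor pipeline 2300 includes a fetch module 2301 that implements the functionality of the process for identifying and extracting the instructions comprising an execution, as described above. In the FIG. 23 embodiment, the fetch module is followed by a decode module 2302, an allocation module 2303, a dispatch module 2304, an execution module 2305, and a retirement module 2306. It should be noted that the microprocessor pipeline 2300 is just one example of a pipeline that implements the functionality of the embodiments of the present invention described above. One skilled in the art would recognize that other microprocessor pipelines can be implemented that include the functionality of the decode module described above.
For purposes of explanation, the foregoing description refers to specific embodiments, which are not intended to be exhaustive or to limit the invention. Many modifications and variations are possible consistent with the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, and thereby to enable others skilled in the art to best utilize the invention and its various embodiments with various modifications as may be suited to their particular uses.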

Claims (14)

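1. A method in a microprocessor for hierarchically scheduling instructions, wherein the microprocessor comprises a plurality of instruction queues, the method comprising:
accessing an input instruction sequence;
grouping the input instruction sequence into a plurality of dependency groups;
assigning a dependency level to each instruction in the input instruction sequence;
loading instructions that are grouped into a same dependency group of the plurality of dependency groups into a same instruction queue of the plurality of instruction queues, in order of dependency level; and
dispatching instructions from the plurality of instruction queues.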
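2. The method of claim 1, wherein the plurality of dependency groups comprises variably sized dependency groups.
3. The method of claim 1, wherein the plurality of dependency groups comprises fixed size dependency groups.
4. The method of claim 1, wherein a moving window scheduler processes the instructions in the plurality of instruction queues to dispatch instructions from the instruction queues.
5. The method of claim 1, further comprising:
generating inheritance vectors to support execution of the plurality of dependency groups; and
allocating the plurality of dependency groups and the inheritance vectors to a plurality of engines.
6. The method of claim 1, wherein a first dependency group and a second dependency group of the plurality of dependency groups are loaded into an instruction queue of the plurality of instruction queues as a plurality of half blocks.
7. The method of claim 6, wherein the first dependency group and the second dependency group are dispatched to an engine, wherein instructions in a dependency group are stacked on a set of execution units of the engine in order of dependency level, to allow a result of executing a given instruction to flow to an execution unit used to execute an instruction that depends on the given instruction.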
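8. A microprocessor for hierarchically scheduling instructions, comprising:
a plurality of instruction queues; and
a scheduler component to: access an input instruction sequence, group the input instruction sequence into a plurality of dependency groups, assign a dependency level to each instruction in the input instruction sequence, load instructions that are grouped into a same dependency group of the plurality of dependency groups into a same instruction queue of the plurality of instruction queues in order of dependency level, and dispatch instructions from the plurality of instruction queues.
9. The microprocessor of claim 8, wherein the plurality of dependency groups comprises variably sized dependency groups.
10. The microprocessor of claim 8, wherein the plurality of dependency groups comprises fixed size dependency groups.
11. The microprocessor of claim 8, wherein a moving window scheduler processes the instructions in the plurality of instruction queues to dispatch instructions from the instruction queues.
12. The microprocessor of claim 8, further comprising:
a plurality of engines, wherein the scheduler component is configured to generate inheritance vectors to support execution of the plurality of dependency groups and to allocate the plurality of dependency groups and the inheritance vectors to the plurality of engines.
13. The microprocessor of claim 8, wherein a first dependency group and a second dependency group of the plurality of dependency groups are loaded into an instruction queue of the plurality of instruction queues as a plurality of half blocks.
14. The microprocessor of claim 13, wherein the first dependency group and the second dependency group are dispatched to an engine, wherein instructions in a dependency group are stacked on a set of execution units of the engine in order of dependency level, to allow a result of executing a given instruction to flow to an execution unit used to execute an instruction that depends on the given instruction.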
CN201810449173.XA 2011-11-22 2011-11-22 Microprocessor accelerated code optimizer Active CN108427574B (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810449173.XA CN108427574B (zh) 2011-11-22 2011-11-22 Microprocessor accelerated code optimizer

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
PCT/US2011/061957 WO2013077876A1 (en) 2011-11-22 2011-11-22 A microprocessor accelerated code optimizer
CN201810449173.XA CN108427574B (zh) 2011-11-22 2011-11-22 Microprocessor accelerated code optimizer
CN201180076248.0A CN104040491B (zh) 2011-11-22 2011-11-22 Microprocessor accelerated code optimizer

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201180076248.0A Division CN104040491B (zh) 2011-11-22 2011-11-22 Microprocessor accelerated code optimizer

Publications (2)

Publication Number Publication Date
CN108427574A (zh) 2018-08-21
CN108427574B (zh) 2022-06-07

Family

ID=48470172

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201180076248.0A Active CN104040491B (zh) 2011-11-22 2011-11-22 Microprocessor accelerated code optimizer
CN201810449173.XA Active CN108427574B (zh) 2011-11-22 2011-11-22 Microprocessor accelerated code optimizer

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201180076248.0A Active CN104040491B (zh) 2011-11-22 2011-11-22 Microprocessor accelerated code optimizer

Country Status (5)

Country Link
US (2) US20150039859A1 (zh)
EP (1) EP2783281B1 (zh)
KR (2) KR101832679B1 (zh)
CN (2) CN104040491B (zh)
WO (1) WO2013077876A1 (zh)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646009B 2006-04-12 2016-08-17 Soft Machines, Inc. Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
EP2523101B1 (en) 2006-11-14 2014-06-04 Soft Machines, Inc. Apparatus and method for processing complex instruction formats in a multi- threaded architecture supporting various context switch modes and virtualization schemes
WO2012037491A2 (en) 2010-09-17 2012-03-22 Soft Machines, Inc. Single cycle multi-branch prediction including shadow cache for early far branch prediction
EP2689326B1 (en) 2011-03-25 2022-11-16 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
KR101638225B1 2011-03-25 2016-07-08 Soft Machines, Inc. Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
EP2689330B1 (en) 2011-03-25 2022-12-21 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
KR101639853B1 2011-05-20 2016-07-14 Soft Machines, Inc. Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
US9442772B2 (en) 2011-05-20 2016-09-13 Soft Machines Inc. Global and local interconnect structure comprising routing matrix to support the execution of instruction sequences by a plurality of engines
CN104040491B 2011-11-22 2018-06-12 Intel Corporation Microprocessor accelerated code optimizer
US10191746B2 (en) 2011-11-22 2019-01-29 Intel Corporation Accelerated code optimizer for a multiengine microprocessor
US10270709B2 (en) 2015-06-26 2019-04-23 Microsoft Technology Licensing, Llc Allocating acceleration component functionality for supporting services
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
WO2014150971A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for dependency broadcasting through a block organized source view data structure
WO2014150806A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for populating register view data structure by using register template snapshots
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
WO2014151018A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for executing multithreaded instructions grouped onto blocks
WO2014150991A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for implementing a reduced size register view data structure in a microprocessor
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
US9569216B2 (en) 2013-03-15 2017-02-14 Soft Machines, Inc. Method for populating a source view data structure by using register template snapshots
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
EP2972836B1 (en) 2013-03-15 2022-11-09 Intel Corporation A method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
GB2514618B (en) * 2013-05-31 2020-11-11 Advanced Risc Mach Ltd Data processing systems
US9733909B2 (en) * 2014-07-25 2017-08-15 Intel Corporation System converter that implements a reordering process through JIT (just in time) optimization that ensures loads do not dispatch ahead of other loads that are to the same address
US10353680B2 (en) * 2014-07-25 2019-07-16 Intel Corporation System converter that implements a run ahead run time guest instruction conversion/decoding process and a prefetching process where guest code is pre-fetched from the target of guest branches in an instruction sequence
US9823939B2 (en) 2014-07-25 2017-11-21 Intel Corporation System for an instruction set agnostic runtime architecture
US11281481B2 (en) 2014-07-25 2022-03-22 Intel Corporation Using a plurality of conversion tables to implement an instruction set agnostic runtime architecture
US20160026486A1 (en) * 2014-07-25 2016-01-28 Soft Machines, Inc. An allocation and issue stage for reordering a microinstruction sequence into an optimized microinstruction sequence to implement an instruction set agnostic runtime architecture
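CN104699466B * 2015-03-26 2017-07-18 National University of Defense Technology of the Chinese People's Liberation Army A multi-heuristic instruction selection method for VLIW architectures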
US9983938B2 (en) * 2015-04-17 2018-05-29 Microsoft Technology Licensing, Llc Locally restoring functionality at acceleration components
US9792154B2 (en) 2015-04-17 2017-10-17 Microsoft Technology Licensing, Llc Data processing system having a hardware acceleration plane and a software plane
US10198294B2 (en) 2015-04-17 2019-02-05 Microsoft Licensing Technology, LLC Handling tenant requests in a system that uses hardware acceleration components
US10216555B2 (en) 2015-06-26 2019-02-26 Microsoft Technology Licensing, Llc Partially reconfiguring acceleration components
KR20180038793A * 2016-10-07 2018-04-17 Samsung Electronics Co., Ltd. Method and apparatus for processing image data
US10884751B2 (en) * 2018-07-13 2021-01-05 Advanced Micro Devices, Inc. Method and apparatus for virtualizing the micro-op cache
GB2577738B (en) * 2018-10-05 2021-02-24 Advanced Risc Mach Ltd An apparatus and method for providing decoded instructions

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030014613A1 (en) * 1998-08-31 2003-01-16 Naresh H. Soni Reservation stations to increase instruction level parallelism
US20080082469A1 (en) * 2006-09-20 2008-04-03 Chevron U.S.A. Inc. Method for forecasting the production of a petroleum reservoir utilizing genetic programming
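CN101201734A (zh) * 2006-12-13 2008-06-18 International Business Machines Corporation Method and apparatus for predecoding instructions for execution
CN101201733A (zh) * 2006-12-13 2008-06-18 International Business Machines Corporation Method and apparatus for predecoding instructions for execution
CN101217495A (zh) * 2008-01-11 2008-07-09 Beijing University of Posts and Telecommunications Traffic monitoring method and apparatus for T-MPLS network environments
CN102066419A (zh) * 2008-04-28 2011-05-18 Genentech, Inc. Humanized anti-factor D antibodies and uses thereof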

Family Cites Families (489)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US727487A (en) 1902-10-21 1903-05-05 Swan F Swanson Dumping-car.
US4075704A (en) 1976-07-02 1978-02-21 Floating Point Systems, Inc. Floating point data processor for high speed operation
US4228496A (en) 1976-09-07 1980-10-14 Tandem Computers Incorporated Multiprocessor system
US4245344A (en) 1979-04-02 1981-01-13 Rockwell International Corporation Processing system with dual buses
US4527237A (en) 1979-10-11 1985-07-02 Nanodata Computer Corporation Data processing system
US4414624A (en) 1980-11-19 1983-11-08 The United States Of America As Represented By The Secretary Of The Navy Multiple-microcomputer processing
US4524415A (en) 1982-12-07 1985-06-18 Motorola, Inc. Virtual machine data processor
US4597061B1 (en) 1983-01-03 1998-06-09 Texas Instruments Inc Memory system using pipeline circuitry for improved system
US4577273A (en) 1983-06-06 1986-03-18 Sperry Corporation Multiple microcomputer system for digital computers
US4682281A (en) 1983-08-30 1987-07-21 Amdahl Corporation Data storage unit employing translation lookaside buffer pointer
US4633434A (en) 1984-04-02 1986-12-30 Sperry Corporation High performance storage unit
US4600986A (en) 1984-04-02 1986-07-15 Sperry Corporation Pipelined split stack with high performance interleaved decode
JPS6140643A (ja) 1984-07-31 1986-02-26 Hitachi Ltd System resource allocation control method
US4835680A (en) 1985-03-15 1989-05-30 Xerox Corporation Adaptive processor array capable of learning variable associations useful in recognizing classes of inputs
JPS6289149A (ja) 1985-10-15 1987-04-23 Agency Of Ind Science & Technol Multi-port memory system
JPH0658650B2 (ja) 1986-03-14 1994-08-03 Hitachi, Ltd. Virtual computer system
US4920477A (en) 1987-04-20 1990-04-24 Multiflow Computer, Inc. Virtual address table look aside buffer miss recovery method and apparatus
US4943909A (en) 1987-07-08 1990-07-24 At&T Bell Laboratories Computational origami
US5339398A (en) 1989-07-31 1994-08-16 North American Philips Corporation Memory architecture and method of data organization optimized for hashing
US5471593A (en) 1989-12-11 1995-11-28 Branigin; Michael H. Computer processor with an efficient means of executing many instructions simultaneously
US5197130A (en) 1989-12-29 1993-03-23 Supercomputer Systems Limited Partnership Cluster architecture for a highly parallel scalar/vector multiprocessor system
US5317754A (en) 1990-10-23 1994-05-31 International Business Machines Corporation Method and apparatus for enabling an interpretive execution subset
US5317705A (en) 1990-10-24 1994-05-31 International Business Machines Corporation Apparatus and method for TLB purge reduction in a multi-level machine system
US6282583B1 (en) 1991-06-04 2001-08-28 Silicon Graphics, Inc. Method and apparatus for memory access in a matrix processor computer
US5539911A (en) 1991-07-08 1996-07-23 Seiko Epson Corporation High-performance, superscalar-based computer system with out-of-order instruction execution
JPH0820949B2 (ja) 1991-11-26 1996-03-04 Matsushita Electric Industrial Co., Ltd. Information processing device
JPH07502358A (ja) 1991-12-23 1995-03-09 Intel Corporation Interleaved cache for multiple accesses per microprocessor clock
KR100309566B1 (ko) 1992-04-29 2001-12-15 리패치 Method and apparatus for grouping multiple instructions, issuing grouped instructions simultaneously, and executing grouped instructions in a pipelined processor
KR950701437A (ko) 1992-05-01 1995-03-23 Yoshio Yamazaki System and method for retiring instructions in a superscalar microprocessor
DE69329260T2 (de) 1992-06-25 2001-02-22 Canon K.K., Tokyo Device for multiplying integers with many digits
JPH0637202A (ja) 1992-07-20 1994-02-10 Mitsubishi Electric Corp Package for microwave IC
JPH06110781A (ja) 1992-09-30 1994-04-22 Nec Corp Cache memory device
US5493660A (en) 1992-10-06 1996-02-20 Hewlett-Packard Company Software assisted hardware TLB miss handler
US5513335A (en) 1992-11-02 1996-04-30 Sgs-Thomson Microelectronics, Inc. Cache tag memory having first and second single-port arrays and a dual-port array
US5819088A (en) 1993-03-25 1998-10-06 Intel Corporation Method and apparatus for scheduling instructions for execution on a multi-issue architecture computer
JPH0784883A (ja) 1993-09-17 1995-03-31 Hitachi Ltd Method for purging the address translation buffer of a virtual computer system
US6948172B1 (en) 1993-09-21 2005-09-20 Microsoft Corporation Preemptive multi-tasking with cooperative groups of tasks
US5469376A (en) 1993-10-14 1995-11-21 Abdallah; Mohammad A. F. F. Digital circuit for the evaluation of mathematical expressions
US5517651A (en) * 1993-12-29 1996-05-14 Intel Corporation Method and apparatus for loading a segment register in a microprocessor capable of operating in multiple modes
US5956753A (en) 1993-12-30 1999-09-21 Intel Corporation Method and apparatus for handling speculative memory access operations
US5761476A (en) 1993-12-30 1998-06-02 Intel Corporation Non-clocked early read for back-to-back scheduling of instructions
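JP3048498B2 (ja) 1994-04-13 2000-06-05 Toshiba Corporation Semiconductor memory device
JPH07287668A (ja) 1994-04-19 1995-10-31 Hitachi Ltd Data processing device
CN1084005C (zh) 1994-06-27 2002-05-01 International Business Machines Corporation Method and apparatus for dynamically controlling address space allocation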
US5548742A (en) 1994-08-11 1996-08-20 Intel Corporation Method and apparatus for combining a direct-mapped cache and a multiple-way cache in a cache memory
US5813031A (en) 1994-09-21 1998-09-22 Industrial Technology Research Institute Caching tag for a large scale cache computer memory system
US5640534A (en) 1994-10-05 1997-06-17 International Business Machines Corporation Method and system for concurrent access in a data cache array utilizing multiple match line selection paths
US5835951A (en) 1994-10-18 1998-11-10 National Semiconductor Branch processing unit with target cache read prioritization protocol for handling multiple hits
JP3569014B2 (ja) 1994-11-25 2004-09-22 Fujitsu Limited Processor and processing method supporting multiple contexts
US5724565A (en) 1995-02-03 1998-03-03 International Business Machines Corporation Method and system for processing first and second sets of instructions by first and second types of processing systems
US5655115A (en) 1995-02-14 1997-08-05 Hal Computer Systems, Inc. Processor structure and method for watchpoint of plural simultaneous unresolved branch evaluation
US5675759A (en) 1995-03-03 1997-10-07 Shebanow; Michael C. Method and apparatus for register management using issue sequence prior physical register and register association validity information
US5751982A (en) 1995-03-31 1998-05-12 Apple Computer, Inc. Software emulation system with dynamic translation of emulated instructions for increased processing speed
US5634068A (en) 1995-03-31 1997-05-27 Sun Microsystems, Inc. Packet switched cache coherent multiprocessor system
US6209085B1 (en) 1995-05-05 2001-03-27 Intel Corporation Method and apparatus for performing process switching in multiprocessor computer systems
US6643765B1 (en) 1995-08-16 2003-11-04 Microunity Systems Engineering, Inc. Programmable processor with group floating point operations
US5710902A (en) * 1995-09-06 1998-01-20 Intel Corporation Instruction dependency chain identifier
US6341324B1 (en) 1995-10-06 2002-01-22 Lsi Logic Corporation Exception processing in superscalar microprocessor
US5864657A (en) 1995-11-29 1999-01-26 Texas Micro, Inc. Main memory system and checkpointing protocol for fault-tolerant computer system
US5983327A (en) 1995-12-01 1999-11-09 Nortel Networks Corporation Data path architecture and arbitration scheme for providing access to a shared system resource
US5793941A (en) 1995-12-04 1998-08-11 Advanced Micro Devices, Inc. On-chip primary cache testing circuit and test method
US5911057A (en) 1995-12-19 1999-06-08 Texas Instruments Incorporated Superscalar microprocessor having combined register and memory renaming circuits, systems, and methods
US5699537A (en) * 1995-12-22 1997-12-16 Intel Corporation Processor microarchitecture for efficient dynamic scheduling and execution of chains of dependent instructions
US6882177B1 (en) 1996-01-10 2005-04-19 Altera Corporation Tristate structures for programmable logic devices
US5752796A (en) 1996-01-24 1998-05-19 Muka; Richard S. Vacuum integrated SMIF system
US5754818A (en) 1996-03-22 1998-05-19 Sun Microsystems, Inc. Architecture and method for sharing TLB entries through process IDS
US5904892A (en) 1996-04-01 1999-05-18 Saint-Gobain/Norton Industrial Ceramics Corp. Tape cast silicon carbide dummy wafer
US5752260A (en) 1996-04-29 1998-05-12 International Business Machines Corporation High-speed, multiple-port, interleaved cache with arbitration of multiple access addresses
US5806085A (en) 1996-05-01 1998-09-08 Sun Microsystems, Inc. Method for non-volatile caching of network and CD-ROM file accesses using a cache directory, pointers, file name conversion, a local hard disk, and separate small database
US5829028A (en) 1996-05-06 1998-10-27 Advanced Micro Devices, Inc. Data cache configured to store data in a use-once manner
US6108769A (en) 1996-05-17 2000-08-22 Advanced Micro Devices, Inc. Dependency table for reducing dependency checking hardware
US5881277A (en) 1996-06-13 1999-03-09 Texas Instruments Incorporated Pipelined microprocessor with branch misprediction cache circuits, systems and methods
US5860146A (en) 1996-06-25 1999-01-12 Sun Microsystems, Inc. Auxiliary translation lookaside buffer for assisting in accessing data in remote address spaces
US5903760A (en) 1996-06-27 1999-05-11 Intel Corporation Method and apparatus for translating a conditional instruction compatible with a first instruction set architecture (ISA) into a conditional instruction compatible with a second ISA
US5974506A (en) 1996-06-28 1999-10-26 Digital Equipment Corporation Enabling mirror, nonmirror and partial mirror cache modes in a dual cache system
US6167490A (en) 1996-09-20 2000-12-26 University Of Washington Using global memory information to manage memory in a computer network
KR19980032776A (ko) 1996-10-16 1998-07-25 Tsutomu Kanai Data processor and data processing system
DE69727127T2 (de) 1996-11-04 2004-11-25 Koninklijke Philips Electronics N.V. Processing device for reading instructions from a memory
US6385715B1 (en) 1996-11-13 2002-05-07 Intel Corporation Multi-threading for a processor utilizing a replay queue
US5978906A (en) 1996-11-19 1999-11-02 Advanced Micro Devices, Inc. Branch selectors associated with byte ranges within an instruction cache for rapidly identifying branch predictions
US6253316B1 (en) 1996-11-19 2001-06-26 Advanced Micro Devices, Inc. Three state branch history using one bit in a branch prediction mechanism
US5903750A (en) 1996-11-20 1999-05-11 Institute For The Development Of Emerging Architectures, L.L.P. Dynamic branch prediction for branch instructions with multiple targets
US6212542B1 (en) 1996-12-16 2001-04-03 International Business Machines Corporation Method and system for executing a program within a multiscalar processor by processing linked thread descriptors
US6134634A (en) 1996-12-20 2000-10-17 Texas Instruments Incorporated Method and apparatus for preemptive cache write-back
US5918251A (en) 1996-12-23 1999-06-29 Intel Corporation Method and apparatus for preloading different default address translation attributes
US6016540A (en) 1997-01-08 2000-01-18 Intel Corporation Method and apparatus for scheduling instructions in waves
US6065105A (en) * 1997-01-08 2000-05-16 Intel Corporation Dependency matrix
US5802602A (en) 1997-01-17 1998-09-01 Intel Corporation Method and apparatus for performing reads of related data from a set-associative cache memory
US6088780A (en) 1997-03-31 2000-07-11 Institute For The Development Of Emerging Architecture, L.L.C. Page table walker that uses at least one of a default page size and a page size selected for a virtual address space to position a sliding field in a virtual address
US6314511B2 (en) 1997-04-03 2001-11-06 University Of Washington Mechanism for freeing registers on processors that perform dynamic out-of-order execution of instructions using renaming registers
US6075938A (en) 1997-06-10 2000-06-13 The Board Of Trustees Of The Leland Stanford Junior University Virtual machine monitors for scalable multiprocessors
US6073230A (en) 1997-06-11 2000-06-06 Advanced Micro Devices, Inc. Instruction fetch unit configured to provide sequential way prediction for sequential instruction fetches
JPH1124929A (ja) 1997-06-30 1999-01-29 Sony Corp Arithmetic processing device and method therefor
US6128728A (en) 1997-08-01 2000-10-03 Micron Technology, Inc. Virtual shadow registers and virtual register windows
US6170051B1 (en) 1997-08-01 2001-01-02 Micron Technology, Inc. Apparatus and method for program level parallelism in a VLIW processor
US6085315A (en) 1997-09-12 2000-07-04 Siemens Aktiengesellschaft Data processing device with loop pipeline
US6101577A (en) 1997-09-15 2000-08-08 Advanced Micro Devices, Inc. Pipelined instruction cache and branch prediction mechanism therefor
US5901294A (en) 1997-09-18 1999-05-04 International Business Machines Corporation Method and system for bus arbitration in a multiprocessor system utilizing simultaneous variable-width bus access
US6185660B1 (en) 1997-09-23 2001-02-06 Hewlett-Packard Company Pending access queue for providing data to a target register during an intermediate pipeline phase after a computer cache miss
US5905509A (en) 1997-09-30 1999-05-18 Compaq Computer Corp. Accelerated Graphics Port two level Gart cache having distributed first level caches
US6226732B1 (en) 1997-10-02 2001-05-01 Hitachi Micro Systems, Inc. Memory system architecture
US5922065A (en) 1997-10-13 1999-07-13 Institute For The Development Of Emerging Architectures, L.L.C. Processor utilizing a template field for encoding instruction sequences in a wide-word format
US6178482B1 (en) 1997-11-03 2001-01-23 Brecis Communications Virtual register sets
US6021484A (en) 1997-11-14 2000-02-01 Samsung Electronics Co., Ltd. Dual instruction set architecture
US6256728B1 (en) 1997-11-17 2001-07-03 Advanced Micro Devices, Inc. Processor configured to selectively cancel instructions from its pipeline responsive to a predicted-taken short forward branch instruction
US6260131B1 (en) 1997-11-18 2001-07-10 Intrinsity, Inc. Method and apparatus for TLB memory ordering
US6016533A (en) 1997-12-16 2000-01-18 Advanced Micro Devices, Inc. Way prediction logic for cache array
US6219776B1 (en) 1998-03-10 2001-04-17 Billions Of Operations Per Second Merged array controller and processing element
US6609189B1 (en) 1998-03-12 2003-08-19 Yale University Cycle segmented prefix circuits
JP3657424B2 (ja) 1998-03-20 2005-06-08 Matsushita Electric Industrial Co., Ltd. Center device and terminal device for broadcasting program information
US6216215B1 (en) 1998-04-02 2001-04-10 Intel Corporation Method and apparatus for senior loads
US6157998A (en) 1998-04-03 2000-12-05 Motorola Inc. Method for performing branch prediction and resolution of two or more branch instructions within two or more branch prediction buffers
US6205545B1 (en) 1998-04-30 2001-03-20 Hewlett-Packard Company Method and apparatus for using static branch predictions hints with dynamically translated code traces to improve performance
US6115809A (en) 1998-04-30 2000-09-05 Hewlett-Packard Company Compiling strong and weak branching behavior instruction blocks to separate caches for dynamic and static prediction
US6256727B1 (en) 1998-05-12 2001-07-03 International Business Machines Corporation Method and system for fetching noncontiguous instructions in a single clock cycle
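JPH11338710A (ja) 1998-05-28 1999-12-10 Toshiba Corp Compilation method and apparatus for a processor having multiple kinds of instruction sets, and recording medium on which the method is programmed and recorded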
US6272616B1 (en) 1998-06-17 2001-08-07 Agere Systems Guardian Corp. Method and apparatus for executing multiple instruction streams in a digital processor with multiple data paths
US6988183B1 (en) * 1998-06-26 2006-01-17 Derek Chi-Lan Wong Methods for increasing instruction-level parallelism in microprocessors and digital system
US6260138B1 (en) 1998-07-17 2001-07-10 Sun Microsystems, Inc. Method and apparatus for branch instruction processing in a processor
US6122656A (en) 1998-07-31 2000-09-19 Advanced Micro Devices, Inc. Processor configured to map logical register numbers to physical register numbers using virtual register numbers
US6272662B1 (en) 1998-08-04 2001-08-07 International Business Machines Corporation Distributed storage system using front-end and back-end locking
JP2000057054A (ja) 1998-08-12 2000-02-25 Fujitsu Ltd High-speed address translation system
US8631066B2 (en) 1998-09-10 2014-01-14 Vmware, Inc. Mechanism for providing virtual machines for use by multiple users
US6339822B1 (en) 1998-10-02 2002-01-15 Advanced Micro Devices, Inc. Using padded instructions in a block-oriented cache
US6332189B1 (en) 1998-10-16 2001-12-18 Intel Corporation Branch prediction architecture
GB9825102D0 (en) 1998-11-16 1999-01-13 Insignia Solutions Plc Computer system
JP3110404B2 (ja) 1998-11-18 2000-11-20 NEC Kofu, Ltd. Microprocessor device, method for accelerating its software instructions, and recording medium recording its control program
US6490673B1 (en) 1998-11-27 2002-12-03 Matsushita Electric Industrial Co., Ltd Processor, compiling apparatus, and compile program recorded on a recording medium
US6519682B2 (en) 1998-12-04 2003-02-11 Stmicroelectronics, Inc. Pipelined non-blocking level two cache system with inherent transaction collision-avoidance
US6477562B2 (en) 1998-12-16 2002-11-05 Clearwater Networks, Inc. Prioritized instruction scheduling for multi-streaming processors
US7020879B1 (en) 1998-12-16 2006-03-28 Mips Technologies, Inc. Interrupt and exception handling for multi-streaming digital processors
US6247097B1 (en) 1999-01-22 2001-06-12 International Business Machines Corporation Aligned instruction cache handling of instruction fetches across multiple predicted branch instructions
US6321298B1 (en) 1999-01-25 2001-11-20 International Business Machines Corporation Full cache coherency across multiple raid controllers
JP3842474B2 (ja) 1999-02-02 2006-11-08 Renesas Technology Corp. Data processing device
US6327650B1 (en) 1999-02-12 2001-12-04 VLSI Technology, Inc. Pipelined multiprocessing with upstream processor concurrently writing to local register and to register of downstream processor
US6732220B2 (en) 1999-02-17 2004-05-04 Elbrus International Method for emulating hardware features of a foreign architecture in a host operating system environment
US6668316B1 (en) 1999-02-17 2003-12-23 Elbrus International Limited Method and apparatus for conflict-free execution of integer and floating-point operations with a common register file
US6418530B2 (en) 1999-02-18 2002-07-09 Hewlett-Packard Company Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions
US6437789B1 (en) 1999-02-19 2002-08-20 Evans & Sutherland Computer Corporation Multi-level cache controller
US6850531B1 (en) 1999-02-23 2005-02-01 Alcatel Multi-service network switch
US6212613B1 (en) 1999-03-22 2001-04-03 Cisco Technology, Inc. Methods and apparatus for reusing addresses in a computer
US6529928B1 (en) 1999-03-23 2003-03-04 Silicon Graphics, Inc. Floating-point adder performing floating-point and integer operations
EP1050808B1 (en) 1999-05-03 2008-04-30 STMicroelectronics S.A. Computer instruction scheduling
US6449671B1 (en) 1999-06-09 2002-09-10 Ati International Srl Method and apparatus for busing data elements
US6473833B1 (en) 1999-07-30 2002-10-29 International Business Machines Corporation Integrated cache and directory structure for multi-level caches
US6643770B1 (en) 1999-09-16 2003-11-04 Intel Corporation Branch misprediction recovery using a side memory
US6772325B1 (en) * 1999-10-01 2004-08-03 Hitachi, Ltd. Processor architecture and operation for exploiting improved branch control instruction
US6704822B1 (en) 1999-10-01 2004-03-09 Sun Microsystems, Inc. Arbitration protocol for a shared data cache
US6457120B1 (en) 1999-11-01 2002-09-24 International Business Machines Corporation Processor and method including a cache having confirmation bits for improving address predictable branch instruction target predictions
US7441110B1 (en) 1999-12-10 2008-10-21 International Business Machines Corporation Prefetching using future branch path information derived from branch prediction
US7107434B2 (en) 1999-12-20 2006-09-12 Board Of Regents, The University Of Texas System, method and apparatus for allocating hardware resources using pseudorandom sequences
AU2597401A (en) 1999-12-22 2001-07-03 Ubicom, Inc. System and method for instruction level multithreading in an embedded processor using zero-time context switching
US6557095B1 (en) 1999-12-27 2003-04-29 Intel Corporation Scheduling operations using a dependency matrix
US6542984B1 (en) 2000-01-03 2003-04-01 Advanced Micro Devices, Inc. Scheduler capable of issuing and reissuing dependency chains
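CN1210649C (zh) 2000-01-03 2005-07-13 Advanced Micro Devices, Inc. Scheduler capable of issuing and reissuing dependency chains, processor including the scheduler, and scheduling method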
US6594755B1 (en) 2000-01-04 2003-07-15 National Semiconductor Corporation System and method for interleaved execution of multiple independent threads
US6728872B1 (en) 2000-02-04 2004-04-27 International Business Machines Corporation Method and apparatus for verifying that instructions are pipelined in correct architectural sequence
GB0002848D0 (en) 2000-02-08 2000-03-29 Siroyan Limited Communicating instruction results in processors and compiling methods for processors
GB2365661A (en) 2000-03-10 2002-02-20 British Telecomm Allocating switch requests within a packet switch
US6615340B1 (en) 2000-03-22 2003-09-02 Wilmot, Ii Richard Byron Extended operand management indicator structure and method
US6604187B1 (en) 2000-06-19 2003-08-05 Advanced Micro Devices, Inc. Providing global translations with address space numbers
US6557083B1 (en) 2000-06-30 2003-04-29 Intel Corporation Memory system for multiple data types
US6704860B1 (en) 2000-07-26 2004-03-09 International Business Machines Corporation Data processing system and method for fetching instruction blocks in response to a detected block sequence
US7206925B1 (en) 2000-08-18 2007-04-17 Sun Microsystems, Inc. Backing Register File for processors
US6728866B1 (en) * 2000-08-31 2004-04-27 International Business Machines Corporation Partitioned issue queue and allocation strategy
US6721874B1 (en) 2000-10-12 2004-04-13 International Business Machines Corporation Method and system for dynamically shared completion table supporting multiple threads in a processing system
US7757065B1 (en) 2000-11-09 2010-07-13 Intel Corporation Instruction segment recording scheme
JP2002185513A (ja) 2000-12-18 2002-06-28 Hitachi Ltd Packet communication network and packet transfer control method
US6907600B2 (en) 2000-12-27 2005-06-14 Intel Corporation Virtual translation lookaside buffer
US6877089B2 (en) 2000-12-27 2005-04-05 International Business Machines Corporation Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program
US6647466B2 (en) 2001-01-25 2003-11-11 Hewlett-Packard Development Company, L.P. Method and apparatus for adaptively bypassing one or more levels of a cache hierarchy
FR2820921A1 (fr) 2001-02-14 2002-08-16 Canon Kk Device and method for transmission in a switch
US6985951B2 (en) 2001-03-08 2006-01-10 International Business Machines Corporation Inter-partition message passing method, system and program product for managing workload in a partitioned processing environment
US6950927B1 (en) 2001-04-13 2005-09-27 The United States Of America As Represented By The Secretary Of The Navy System and method for instruction-level parallelism in a programmable multiple network processor environment
US7707397B2 (en) 2001-05-04 2010-04-27 Via Technologies, Inc. Variable group associativity branch target address cache delivering multiple target addresses per cache line
US7200740B2 (en) 2001-05-04 2007-04-03 Ip-First, Llc Apparatus and method for speculatively performing a return instruction in a microprocessor
US6658549B2 (en) 2001-05-22 2003-12-02 Hewlett-Packard Development Company, Lp. Method and system allowing a single entity to manage memory comprising compressed and uncompressed data
US6985591B2 (en) 2001-06-29 2006-01-10 Intel Corporation Method and apparatus for distributing keys for decrypting and re-encrypting publicly distributed media
US7203824B2 (en) 2001-07-03 2007-04-10 Ip-First, Llc Apparatus and method for handling BTAC branches that wrap across instruction cache lines
US7024545B1 (en) 2001-07-24 2006-04-04 Advanced Micro Devices, Inc. Hybrid branch prediction device with two levels of branch prediction cache
US6954846B2 (en) 2001-08-07 2005-10-11 Sun Microsystems, Inc. Microprocessor and method for giving each thread exclusive access to one register file in a multi-threading mode and for giving an active thread access to multiple register files in a single thread mode
US6718440B2 (en) 2001-09-28 2004-04-06 Intel Corporation Memory access latency hiding with hint buffer
US7150021B1 (en) 2001-10-12 2006-12-12 Palau Acquisition Corporation (Delaware) Method and system to allocate resources within an interconnect device according to a resource allocation table
US7117347B2 (en) 2001-10-23 2006-10-03 Ip-First, Llc Processor including fallback branch prediction mechanism for far jump and far call instructions
US7272832B2 (en) 2001-10-25 2007-09-18 Hewlett-Packard Development Company, L.P. Method of protecting user process data in a secure platform inaccessible to the operating system and other tasks on top of the secure platform
US6964043B2 (en) 2001-10-30 2005-11-08 Intel Corporation Method, apparatus, and system to optimize frequently executed code and to use compiler transformation and hardware support to handle infrequently executed code
GB2381886B (en) 2001-11-07 2004-06-23 Sun Microsystems Inc Computer system with virtual memory and paging mechanism
US7092869B2 (en) 2001-11-14 2006-08-15 Ronald Hilton Memory address prediction under emulation
US7363467B2 (en) * 2002-01-03 2008-04-22 Intel Corporation Dependence-chain processing using trace descriptors having dependency descriptors
US6640333B2 (en) 2002-01-10 2003-10-28 Lsi Logic Corporation Architecture for a sea of platforms
US7055021B2 (en) 2002-02-05 2006-05-30 Sun Microsystems, Inc. Out-of-order processor that reduces mis-speculation using a replay scoreboard
US7331040B2 (en) 2002-02-06 2008-02-12 Transitive Limited Condition code flag emulation for program code conversion
US20030154363A1 (en) 2002-02-11 2003-08-14 Soltis Donald C. Stacked register aliasing in data hazard detection to reduce circuit
US6839816B2 (en) 2002-02-26 2005-01-04 International Business Machines Corporation Shared cache line update mechanism
US6731292B2 (en) 2002-03-06 2004-05-04 Sun Microsystems, Inc. System and method for controlling a number of outstanding data transactions within an integrated circuit
JP3719509B2 (ja) 2002-04-01 2005-11-24 Sony Computer Entertainment Inc. Serial operation pipeline, arithmetic device, arithmetic logic operation circuit, and operation method using a serial operation pipeline
US7565509B2 (en) 2002-04-17 2009-07-21 Microsoft Corporation Using limits on address translation to control access to an addressable entity
US6920530B2 (en) 2002-04-23 2005-07-19 Sun Microsystems, Inc. Scheme for reordering instructions via an instruction caching mechanism
US7113488B2 (en) 2002-04-24 2006-09-26 International Business Machines Corporation Reconfigurable circular bus
US7281055B2 (en) 2002-05-28 2007-10-09 Newisys, Inc. Routing mechanisms in systems having multiple multi-processor clusters
US7117346B2 (en) 2002-05-31 2006-10-03 Freescale Semiconductor, Inc. Data processing system having multiple register contexts and method therefor
US6938151B2 (en) 2002-06-04 2005-08-30 International Business Machines Corporation Hybrid branch prediction using a global selection counter and a prediction method comparison table
US6735747B2 (en) 2002-06-10 2004-05-11 Lsi Logic Corporation Pre-silicon verification path coverage
US8024735B2 (en) 2002-06-14 2011-09-20 Intel Corporation Method and apparatus for ensuring fairness and forward progress when executing multiple threads of execution
JP3845043B2 (ja) 2002-06-28 2006-11-15 Fujitsu Limited Instruction fetch control device
JP3982353B2 (ja) 2002-07-12 2007-09-26 NEC Corporation Fault-tolerant computer device, resynchronization method therefor, and resynchronization program
US6944744B2 (en) 2002-08-27 2005-09-13 Advanced Micro Devices, Inc. Apparatus and method for independently schedulable functional units with issue lock mechanism in a processor
US7546422B2 (en) 2002-08-28 2009-06-09 Intel Corporation Method and apparatus for the synchronization of distributed caches
US6950925B1 (en) 2002-08-28 2005-09-27 Advanced Micro Devices, Inc. Scheduler for use in a microprocessor that supports data-speculative execution
TW200408242A (en) 2002-09-06 2004-05-16 Matsushita Electric Ind Co Ltd Home terminal apparatus and communication system
US6895491B2 (en) 2002-09-26 2005-05-17 Hewlett-Packard Development Company, L.P. Memory addressing for a virtual machine implementation on a computer processor supporting virtual hash-page-table searching
US7334086B2 (en) 2002-10-08 2008-02-19 Rmi Corporation Advanced processor with system on a chip interconnect technology
US6829698B2 (en) 2002-10-10 2004-12-07 International Business Machines Corporation Method, apparatus and system for acquiring a global promotion facility utilizing a data-less transaction
US7213248B2 (en) 2002-10-10 2007-05-01 International Business Machines Corporation High speed promotion mechanism suitable for lock acquisition in a multiprocessor data processing system
US7222218B2 (en) 2002-10-22 2007-05-22 Sun Microsystems, Inc. System and method for goal-based scheduling of blocks of code for concurrent execution
US20040103251A1 (en) 2002-11-26 2004-05-27 Mitchell Alsup Microprocessor including a first level cache and a second level cache having different cache line sizes
US7539879B2 (en) 2002-12-04 2009-05-26 Nxp B.V. Register file gating to reduce microprocessor power dissipation
US6981083B2 (en) 2002-12-05 2005-12-27 International Business Machines Corporation Processor virtualization mechanism via an enhanced restoration of hard architected states
US7073042B2 (en) 2002-12-12 2006-07-04 Intel Corporation Reclaiming existing fields in address translation data structures to extend control over memory accesses
US20040117594A1 (en) 2002-12-13 2004-06-17 Vanderspek Julius Memory management method
US20040122887A1 (en) 2002-12-20 2004-06-24 Macy William W. Efficient multiplication of small matrices using SIMD registers
US7191349B2 (en) 2002-12-26 2007-03-13 Intel Corporation Mechanism for processor power state aware distribution of lowest priority interrupt
US20040139441A1 (en) 2003-01-09 2004-07-15 Kabushiki Kaisha Toshiba Processor, arithmetic operation processing method, and priority determination method
US6925421B2 (en) 2003-01-09 2005-08-02 International Business Machines Corporation Method, system, and computer program product for estimating the number of consumers that place a load on an individual resource in a pool of physically distributed resources
US7178010B2 (en) 2003-01-16 2007-02-13 Ip-First, Llc Method and apparatus for correcting an internal call/return stack in a microprocessor that detects from multiple pipeline stages incorrect speculative update of the call/return stack
US7089374B2 (en) 2003-02-13 2006-08-08 Sun Microsystems, Inc. Selectively unmarking load-marked cache lines during transactional program execution
US7278030B1 (en) 2003-03-03 2007-10-02 Vmware, Inc. Virtualization system for computers having multiple protection mechanisms
US6912644B1 (en) 2003-03-06 2005-06-28 Intel Corporation Method and apparatus to steer memory access operations in a virtual memory system
US7111145B1 (en) 2003-03-25 2006-09-19 Vmware, Inc. TLB miss fault handler and method for accessing multiple page tables
US7143273B2 (en) 2003-03-31 2006-11-28 Intel Corporation Method and apparatus for dynamic branch prediction utilizing multiple stew algorithms for indexing a global history
CN1214666C (zh) 2003-04-07 2005-08-10 Huawei Technologies Co., Ltd. Method for limiting location information request traffic in location services
US7058764B2 (en) 2003-04-14 2006-06-06 Hewlett-Packard Development Company, L.P. Method of adaptive cache partitioning to increase host I/O performance
US7139855B2 (en) 2003-04-24 2006-11-21 International Business Machines Corporation High performance synchronization of resource allocation in a logically-partitioned system
US7469407B2 (en) 2003-04-24 2008-12-23 International Business Machines Corporation Method for resource balancing using dispatch flush in a simultaneous multithread processor
US7290261B2 (en) 2003-04-24 2007-10-30 International Business Machines Corporation Method and logical apparatus for rename register reallocation in a simultaneous multi-threaded (SMT) processor
EP1471421A1 (en) 2003-04-24 2004-10-27 STMicroelectronics Limited Speculative load instruction control
US7055003B2 (en) 2003-04-25 2006-05-30 International Business Machines Corporation Data cache scrub mechanism for large L2/L3 data cache structures
US7007108B2 (en) 2003-04-30 2006-02-28 Lsi Logic Corporation System method for use of hardware semaphores for resource release notification wherein messages comprises read-modify-write operation and address
US7743238B2 (en) 2003-05-09 2010-06-22 Arm Limited Accessing items of architectural state from a register cache in a data processing apparatus when performing branch prediction operations for an indirect branch instruction
US7861062B2 (en) 2003-06-25 2010-12-28 Koninklijke Philips Electronics N.V. Data processing device with instruction controlled clock speed
JP2005032018A (ja) 2003-07-04 2005-02-03 Semiconductor Energy Lab Co Ltd Microprocessor using a genetic algorithm
US7149872B2 (en) 2003-07-10 2006-12-12 Transmeta Corporation System and method for identifying TLB entries associated with a physical address of a specified range
US7089398B2 (en) 2003-07-31 2006-08-08 Silicon Graphics, Inc. Address translation using a page size tag
US8296771B2 (en) 2003-08-18 2012-10-23 Cray Inc. System and method for mapping between resource consumers and resource providers in a computing system
US7133950B2 (en) 2003-08-19 2006-11-07 Sun Microsystems, Inc. Request arbitration in multi-core processor
US9032404B2 (en) 2003-08-28 2015-05-12 Mips Technologies, Inc. Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor
US7594089B2 (en) 2003-08-28 2009-09-22 Mips Technologies, Inc. Smart memory based synchronization controller for a multi-threaded multiprocessor SoC
US7849297B2 (en) 2003-08-28 2010-12-07 Mips Technologies, Inc. Software emulation of directed exceptions in a multithreading processor
JP4740851B2 (ja) 2003-08-28 2011-08-03 Mips Technologies, Inc. Mechanism for dynamic configuration of virtual processor resources
US7111126B2 (en) 2003-09-24 2006-09-19 Arm Limited Apparatus and method for loading data values
JP4057989B2 (ja) 2003-09-26 2008-03-05 Kabushiki Kaisha Toshiba Scheduling method and information processing system
FR2860313B1 (fr) 2003-09-30 2005-11-04 Commissariat Energie Atomique Component with dynamically reconfigurable architecture
US7047322B1 (en) 2003-09-30 2006-05-16 Unisys Corporation System and method for performing conflict resolution and flow control in a multiprocessor system
US7373637B2 (en) 2003-09-30 2008-05-13 International Business Machines Corporation Method and apparatus for counting instruction and memory location ranges
TWI281121B (en) 2003-10-06 2007-05-11 Ip First Llc Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence
US8407433B2 (en) 2007-06-25 2013-03-26 Sonics, Inc. Interconnect implementing internal controls
US7395372B2 (en) 2003-11-14 2008-07-01 International Business Machines Corporation Method and system for providing cache set selection which is power optimized
US7243170B2 (en) 2003-11-24 2007-07-10 International Business Machines Corporation Method and circuit for reading and writing an instruction buffer
US20050120191A1 (en) 2003-12-02 2005-06-02 Intel Corporation (A Delaware Corporation) Checkpoint-based register reclamation
US20050132145A1 (en) 2003-12-15 2005-06-16 Finisar Corporation Contingent processor time division multiple access of memory in a multi-processor system to allow supplemental memory consumer access
US7310722B2 (en) 2003-12-18 2007-12-18 Nvidia Corporation Across-thread out of order instruction dispatch in a multithreaded graphics processor
US7293164B2 (en) 2004-01-14 2007-11-06 International Business Machines Corporation Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions
US20050204118A1 (en) 2004-02-27 2005-09-15 National Chiao Tung University Method for inter-cluster communication that employs register permutation
US7478374B2 (en) 2004-03-22 2009-01-13 Intel Corporation Debug system having assembler correcting register allocation errors
US20050216920A1 (en) 2004-03-24 2005-09-29 Vijay Tewari Use of a virtual machine to emulate a hardware device
WO2005093562A1 (ja) 2004-03-29 2005-10-06 Kyoto University Data processing device, data processing program, and recording medium storing the data processing program
US7383427B2 (en) 2004-04-22 2008-06-03 Sony Computer Entertainment Inc. Multi-scalar extension for SIMD instruction set processors
US20050251649A1 (en) 2004-04-23 2005-11-10 Sony Computer Entertainment Inc. Methods and apparatus for address map optimization on a multi-scalar extension
US7418582B1 (en) 2004-05-13 2008-08-26 Sun Microsystems, Inc. Versatile register file design for a multi-threaded processor utilizing different modes and register windows
US7478198B2 (en) 2004-05-24 2009-01-13 Intel Corporation Multithreaded clustered microarchitecture with dynamic back-end assignment
US7594234B1 (en) 2004-06-04 2009-09-22 Sun Microsystems, Inc. Adaptive spin-then-block mutual exclusion in multi-threaded processing
US7284092B2 (en) 2004-06-24 2007-10-16 International Business Machines Corporation Digital data processing apparatus having multi-level register file
US20050289530A1 (en) 2004-06-29 2005-12-29 Robison Arch D Scheduling of instructions in program compilation
EP1628235A1 (en) 2004-07-01 2006-02-22 Texas Instruments Incorporated Method and system of ensuring integrity of a secure mode entry sequence
US8044951B1 (en) 2004-07-02 2011-10-25 Nvidia Corporation Integer-based functionality in a graphics shading language
US7339592B2 (en) 2004-07-13 2008-03-04 Nvidia Corporation Simulating multiported memories using lower port count memories
US7398347B1 (en) 2004-07-14 2008-07-08 Altera Corporation Methods and apparatus for dynamic instruction controlled reconfigurable register file
EP1619593A1 (en) 2004-07-22 2006-01-25 Sap Ag Computer-Implemented method and system for performing a product availability check
JP4064380B2 (ja) 2004-07-29 2008-03-19 Fujitsu Limited Arithmetic processing device and control method thereof
US8443171B2 (en) 2004-07-30 2013-05-14 Hewlett-Packard Development Company, L.P. Run-time updating of prediction hint instructions
US7213106B1 (en) 2004-08-09 2007-05-01 Sun Microsystems, Inc. Conservative shadow cache support in a point-to-point connected multiprocessing node
US7318143B2 (en) 2004-10-20 2008-01-08 Arm Limited Reuseable configuration data
US20090150890A1 (en) * 2007-12-10 2009-06-11 Yourst Matt T Strand-based computing hardware and dynamically optimizing strandware for a high performance microprocessor system
US7707578B1 (en) 2004-12-16 2010-04-27 Vmware, Inc. Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system
US7257695B2 (en) 2004-12-28 2007-08-14 Intel Corporation Register file regions for a processing system
US7996644B2 (en) 2004-12-29 2011-08-09 Intel Corporation Fair sharing of a cache in a multi-core/multi-threaded processor by dynamically partitioning of the cache
US8719819B2 (en) 2005-06-30 2014-05-06 Intel Corporation Mechanism for instruction set based thread execution on a plurality of instruction sequencers
US7050922B1 (en) 2005-01-14 2006-05-23 Agilent Technologies, Inc. Method for optimizing test order, and machine-readable media storing sequences of instructions to perform same
US7681014B2 (en) 2005-02-04 2010-03-16 Mips Technologies, Inc. Multithreading instruction scheduler employing thread group priorities
US7657891B2 (en) 2005-02-04 2010-02-02 Mips Technologies, Inc. Multithreading microprocessor with optimized thread scheduler for increasing pipeline utilization efficiency
US20090031104A1 (en) 2005-02-07 2009-01-29 Martin Vorbach Low Latency Massive Parallel Data Processing Device
US7400548B2 (en) 2005-02-09 2008-07-15 International Business Machines Corporation Method for providing multiple reads/writes using a 2read/2write register file array
US7343476B2 (en) 2005-02-10 2008-03-11 International Business Machines Corporation Intelligent SMT thread hang detect taking into account shared resource contention/blocking
US7152155B2 (en) 2005-02-18 2006-12-19 Qualcomm Incorporated System and method of correcting a branch misprediction
US20060200655A1 (en) 2005-03-04 2006-09-07 Smith Rodney W Forward looking branch target address caching
US8195922B2 (en) 2005-03-18 2012-06-05 Marvell World Trade, Ltd. System for dynamically allocating processing time to multiple threads
US20060212853A1 (en) 2005-03-18 2006-09-21 Marvell World Trade Ltd. Real-time control apparatus having a multi-thread processor
GB2424727B (en) 2005-03-30 2007-08-01 Transitive Ltd Preparing instruction groups for a processor having a multiple issue ports
US8522253B1 (en) 2005-03-31 2013-08-27 Guillermo Rozas Hardware support for virtual machine and operating system context switching in translation lookaside buffers and virtually tagged caches
US20060230243A1 (en) 2005-04-06 2006-10-12 Robert Cochran Cascaded snapshots
US7313775B2 (en) 2005-04-06 2007-12-25 Lsi Corporation Integrated circuit with relocatable processor hardmac
US20060230409A1 (en) 2005-04-07 2006-10-12 Matteo Frigo Multithreaded processor architecture with implicit granularity adaptation
US8230423B2 (en) 2005-04-07 2012-07-24 International Business Machines Corporation Multithreaded processor architecture with operational latency hiding
US20060230253A1 (en) 2005-04-11 2006-10-12 Lucian Codrescu Unified non-partitioned register files for a digital signal processor operating in an interleaved multi-threaded environment
US20060236074A1 (en) 2005-04-14 2006-10-19 Arm Limited Indicating storage locations within caches
US7437543B2 (en) 2005-04-19 2008-10-14 International Business Machines Corporation Reducing the fetch time of target instructions of a predicted taken branch instruction
US7461237B2 (en) 2005-04-20 2008-12-02 Sun Microsystems, Inc. Method and apparatus for suppressing duplicative prefetches for branch target cache lines
US8713286B2 (en) 2005-04-26 2014-04-29 Qualcomm Incorporated Register files for a digital signal processor operating in an interleaved multi-threaded environment
GB2426084A (en) 2005-05-13 2006-11-15 Agilent Technologies Inc Updating data in a dual port memory
US7861055B2 (en) 2005-06-07 2010-12-28 Broadcom Corporation Method and system for on-chip configurable data ram for fast memory and pseudo associative caches
US8010969B2 (en) 2005-06-13 2011-08-30 Intel Corporation Mechanism for monitoring instruction set based thread execution on a plurality of instruction sequencers
GB2444455A (en) * 2005-08-29 2008-06-04 Searete Llc Scheduling mechanism of a hierarchical processor including multiple parallel clusters
CN101263465B (zh) 2005-09-14 2011-11-09 Koninklijke Philips Electronics N.V. Method and system for bus arbitration
US7350056B2 (en) 2005-09-27 2008-03-25 International Business Machines Corporation Method and apparatus for issuing instructions from an issue queue in an information handling system
US7676634B1 (en) 2005-09-28 2010-03-09 Sun Microsystems, Inc. Selective trace cache invalidation for self-modifying code via memory aging
US7231106B2 (en) 2005-09-30 2007-06-12 Lucent Technologies Inc. Apparatus for directing an optical signal from an input fiber to an output fiber within a high index host
US7627735B2 (en) 2005-10-21 2009-12-01 Intel Corporation Implementing vector memory operations
US7613131B2 (en) 2005-11-10 2009-11-03 Citrix Systems, Inc. Overlay network infrastructure
US7681019B1 (en) 2005-11-18 2010-03-16 Sun Microsystems, Inc. Executing functions determined via a collection of operations from translated instructions
US7861060B1 (en) 2005-12-15 2010-12-28 Nvidia Corporation Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior
US7634637B1 (en) 2005-12-16 2009-12-15 Nvidia Corporation Execution of parallel groups of threads with per-instruction serialization
US7770161B2 (en) 2005-12-28 2010-08-03 International Business Machines Corporation Post-register allocation profile directed instruction scheduling
US8423682B2 (en) 2005-12-30 2013-04-16 Intel Corporation Address space emulation
US20070186050A1 (en) 2006-02-03 2007-08-09 International Business Machines Corporation Self prefetching L2 cache mechanism for data lines
GB2435362B (en) 2006-02-20 2008-11-26 Cramer Systems Ltd Method of configuring devices in a telecommunications network
JP4332205B2 (ja) 2006-02-27 2009-09-16 Fujitsu Limited Cache control device and cache control method
US7543282B2 (en) 2006-03-24 2009-06-02 Sun Microsystems, Inc. Method and apparatus for selectively executing different executable code versions which are optimized in different ways
CN103646009B (zh) 2006-04-12 2016-08-17 Soft Machines, Inc. Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US7577820B1 (en) 2006-04-14 2009-08-18 Tilera Corporation Managing data in a parallel processing environment
US7610571B2 (en) 2006-04-14 2009-10-27 Cadence Design Systems, Inc. Method and system for simulating state retention of an RTL design
CN100485636C (zh) 2006-04-24 2009-05-06 Huawei Technologies Co., Ltd. Debugging method and apparatus for model-driven development of telecom-grade services
US7804076B2 (en) 2006-05-10 2010-09-28 Taiwan Semiconductor Manufacturing Co., Ltd Insulator for high current ion implanters
US8145882B1 (en) 2006-05-25 2012-03-27 Mips Technologies, Inc. Apparatus and method for processing template based user defined instructions
US20080126771A1 (en) 2006-07-25 2008-05-29 Lei Chen Branch Target Extension for an Instruction Cache
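CN100495324C (zh) 2006-07-27 2009-06-03 Institute of Computing Technology, Chinese Academy of Sciences Depth-first exception handling method in a complex instruction set architecture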
US7904704B2 (en) 2006-08-14 2011-03-08 Marvell World Trade Ltd. Instruction dispatching method and apparatus
US8046775B2 (en) 2006-08-14 2011-10-25 Marvell World Trade Ltd. Event-based bandwidth allocation mode switching method and apparatus
US7539842B2 (en) 2006-08-15 2009-05-26 International Business Machines Corporation Computer memory system for selecting memory buses according to physical memory organization information stored in virtual address translation tables
US7594060B2 (en) 2006-08-23 2009-09-22 Sun Microsystems, Inc. Data buffer allocation in a non-blocking data services platform using input/output switching fabric
US7752474B2 (en) 2006-09-22 2010-07-06 Apple Inc. L1 cache flush when processor is entering low power mode
US7716460B2 (en) 2006-09-29 2010-05-11 Qualcomm Incorporated Effective use of a BHT in processor having variable length instruction set execution modes
US7774549B2 (en) 2006-10-11 2010-08-10 Mips Technologies, Inc. Horizontally-shared cache victims in multiple core processors
TWI337495B (en) 2006-10-26 2011-02-11 Au Optronics Corp System and method for operation scheduling
US7680988B1 (en) 2006-10-30 2010-03-16 Nvidia Corporation Single interconnect providing read and write access to a memory shared by concurrent threads
US8108625B1 (en) 2006-10-30 2012-01-31 Nvidia Corporation Shared memory with parallel access and access conflict resolution mechanism
US7617384B1 (en) 2006-11-06 2009-11-10 Nvidia Corporation Structured programming control flow using a disable mask in a SIMD architecture
EP2523101B1 (en) * 2006-11-14 2014-06-04 Soft Machines, Inc. Apparatus and method for processing complex instruction formats in a multi- threaded architecture supporting various context switch modes and virtualization schemes
US7493475B2 (en) 2006-11-15 2009-02-17 Stmicroelectronics, Inc. Instruction vector-mode processing in multi-lane processor by multiplex switch replicating instruction in one lane to select others along with updated operand address
US7934179B2 (en) 2006-11-20 2011-04-26 Et International, Inc. Systems and methods for logic verification
US20080235500A1 (en) 2006-11-21 2008-09-25 Davis Gordon T Structure for instruction cache trace formation
JP2008130056A (ja) 2006-11-27 2008-06-05 Renesas Technology Corp Semiconductor circuit
US7783869B2 (en) 2006-12-19 2010-08-24 Arm Limited Accessing branch predictions ahead of instruction fetching
WO2008077088A2 (en) 2006-12-19 2008-06-26 The Board Of Governors For Higher Education, State Of Rhode Island And Providence Plantations System and method for branch misprediction prediction using complementary branch predictors
EP1940028B1 (en) 2006-12-29 2012-02-29 STMicroelectronics Srl Asynchronous interconnection system for 3D inter-chip communication
US8321849B2 (en) 2007-01-26 2012-11-27 Nvidia Corporation Virtual architecture and instruction set for parallel thread computing
TW200833002A (en) 2007-01-31 2008-08-01 Univ Nat Yunlin Sci & Tech Distributed switching circuit having fairness
US20080189501A1 (en) 2007-02-05 2008-08-07 Irish John D Methods and Apparatus for Issuing Commands on a Bus
US7685410B2 (en) 2007-02-13 2010-03-23 Global Foundries Inc. Redirect recovery cache that receives branch misprediction redirects and caches instructions to be dispatched in response to the redirects
US7647483B2 (en) 2007-02-20 2010-01-12 Sony Computer Entertainment Inc. Multi-threaded parallel processor methods and apparatus
JP4980751B2 (ja) 2007-03-02 2012-07-18 Fujitsu Semiconductor Limited Data processing device and memory read-active control method
US8452907B2 (en) 2007-03-27 2013-05-28 Arm Limited Data processing apparatus and method for arbitrating access to a shared resource
US20080250227A1 (en) 2007-04-04 2008-10-09 Linderman Michael D General Purpose Multiprocessor Programming Apparatus And Method
US7716183B2 (en) 2007-04-11 2010-05-11 Dot Hill Systems Corporation Snapshot preserved data cloning
US7941791B2 (en) 2007-04-13 2011-05-10 Perry Wang Programming environment for heterogeneous processor resource integration
US7769955B2 (en) 2007-04-27 2010-08-03 Arm Limited Multiple thread instruction fetch from different cache levels
US7711935B2 (en) 2007-04-30 2010-05-04 Netlogic Microsystems, Inc. Universal branch identifier for invalidation of speculative instructions
US8555039B2 (en) 2007-05-03 2013-10-08 Qualcomm Incorporated System and method for using a local condition code register for accelerating conditional instruction execution in a pipeline processor
US8219996B1 (en) 2007-05-09 2012-07-10 Hewlett-Packard Development Company, L.P. Computer processor with fairness monitor
CN101344840B (zh) 2007-07-10 2011-08-31 苏州简约纳电子有限公司 Microprocessor and method for executing instructions in a microprocessor
US7937568B2 (en) 2007-07-11 2011-05-03 International Business Machines Corporation Adaptive execution cycle control method for enhanced instruction throughput
US20090025004A1 (en) 2007-07-16 2009-01-22 Microsoft Corporation Scheduling by Growing and Shrinking Resource Allocation
US8108545B2 (en) 2007-08-27 2012-01-31 International Business Machines Corporation Packet coalescing in virtual channels of a data processing system in a multi-tiered full-graph interconnect architecture
US7711929B2 (en) 2007-08-30 2010-05-04 International Business Machines Corporation Method and system for tracking instruction dependency in an out-of-order processor
US8725991B2 (en) 2007-09-12 2014-05-13 Qualcomm Incorporated Register file system and method for pipelined processing
US8082420B2 (en) 2007-10-24 2011-12-20 International Business Machines Corporation Method and apparatus for executing instructions
US7856530B1 (en) 2007-10-31 2010-12-21 Network Appliance, Inc. System and method for implementing a dynamic cache for a data storage system
US7877559B2 (en) 2007-11-26 2011-01-25 Globalfoundries Inc. Mechanism to accelerate removal of store operations from a queue
US8245232B2 (en) 2007-11-27 2012-08-14 Microsoft Corporation Software-configurable and stall-time fair memory access scheduling mechanism for shared memory systems
US7809925B2 (en) 2007-12-07 2010-10-05 International Business Machines Corporation Processing unit incorporating vectorizable execution unit
US8145844B2 (en) 2007-12-13 2012-03-27 Arm Limited Memory controller with write data cache and read data cache
US7870371B2 (en) 2007-12-17 2011-01-11 Microsoft Corporation Target-frequency based indirect jump prediction for high-performance processors
US7831813B2 (en) 2007-12-17 2010-11-09 Globalfoundries Inc. Uses of known good code for implementing processor architectural modifications
US20090165007A1 (en) 2007-12-19 2009-06-25 Microsoft Corporation Task-level thread scheduling and resource allocation
US8782384B2 (en) 2007-12-20 2014-07-15 Advanced Micro Devices, Inc. Branch history with polymorphic indirect branch information
US7917699B2 (en) 2007-12-21 2011-03-29 Mips Technologies, Inc. Apparatus and method for controlling the exclusivity mode of a level-two cache
US9244855B2 (en) 2007-12-31 2016-01-26 Intel Corporation Method, system, and apparatus for page sizing extension
US8645965B2 (en) 2007-12-31 2014-02-04 Intel Corporation Supporting metered clients with manycore through time-limited partitioning
US7877582B2 (en) 2008-01-31 2011-01-25 International Business Machines Corporation Multi-addressable register file
WO2009101563A1 (en) 2008-02-11 2009-08-20 Nxp B.V. Multiprocessing implementing a plurality of virtual processors
US9021240B2 (en) 2008-02-22 2015-04-28 International Business Machines Corporation System and method for Controlling restarting of instruction fetching using speculative address computations
US7949972B2 (en) 2008-03-19 2011-05-24 International Business Machines Corporation Method, system and computer program product for exploiting orthogonal control vectors in timing driven synthesis
US7987343B2 (en) 2008-03-19 2011-07-26 International Business Machines Corporation Processor and method for synchronous load multiple fetching sequence and pipeline stage result tracking to facilitate early address generation interlock bypass
US9513905B2 (en) 2008-03-28 2016-12-06 Intel Corporation Vector instructions to enable efficient synchronization and parallel reduction operations
US8120608B2 (en) 2008-04-04 2012-02-21 Via Technologies, Inc. Constant buffering for a computational core of a programmable graphics processing unit
TWI364703B (en) 2008-05-26 2012-05-21 Faraday Tech Corp Processor and early execution method of data load thereof
US8131982B2 (en) 2008-06-13 2012-03-06 International Business Machines Corporation Branch prediction instructions having mask values involving unloading and loading branch history data
US8145880B1 (en) 2008-07-07 2012-03-27 Ovics Matrix processor data switch routing systems and methods
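CN102089752B (zh) 2008-07-10 2014-05-07 Rocketick Technologies Ltd. Efficient parallel computation of dependency problems
JP2010039536A (ja) 2008-07-31 2010-02-18 Panasonic Corp Program conversion device, program conversion method, and program conversion program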
US8316435B1 (en) 2008-08-14 2012-11-20 Juniper Networks, Inc. Routing device having integrated MPLS-aware firewall with virtual security system support
US8135942B2 (en) 2008-08-28 2012-03-13 International Business Machines Corporation System and method for double-issue instructions using a dependency matrix and a side issue queue
US7769984B2 (en) 2008-09-11 2010-08-03 International Business Machines Corporation Dual-issuance of microprocessor instructions using dual dependency matrices
US8225048B2 (en) 2008-10-01 2012-07-17 Hewlett-Packard Development Company, L.P. Systems and methods for resource access
US9244732B2 (en) 2009-08-28 2016-01-26 Vmware, Inc. Compensating threads for microarchitectural resource contentions by prioritizing scheduling and execution
US7941616B2 (en) 2008-10-21 2011-05-10 Microsoft Corporation System to reduce interference in concurrent programs
GB2464703A (en) 2008-10-22 2010-04-28 Advanced Risc Mach Ltd An array of interconnected processors executing a cycle-based program
US8423749B2 (en) 2008-10-22 2013-04-16 International Business Machines Corporation Sequential processing in network on chip nodes by threads generating message containing payload and pointer for nanokernel to access algorithm to be executed on payload in another node
CN102257785A (zh) 2008-10-30 2011-11-23 Nokia Corporation Method and apparatus for interleaving a data block
US8032678B2 (en) 2008-11-05 2011-10-04 Mediatek Inc. Shared resource arbitration
US7848129B1 (en) 2008-11-20 2010-12-07 Netlogic Microsystems, Inc. Dynamically partitioned CAM array
US8868838B1 (en) 2008-11-21 2014-10-21 Nvidia Corporation Multi-class data cache policies
US8171223B2 (en) 2008-12-03 2012-05-01 Intel Corporation Method and system to increase concurrency and control replication in a multi-core cache hierarchy
US8200949B1 (en) 2008-12-09 2012-06-12 Nvidia Corporation Policy based allocation of register file cache to threads in multi-threaded processor
US8312268B2 (en) 2008-12-12 2012-11-13 International Business Machines Corporation Virtual machine
US8099586B2 (en) 2008-12-30 2012-01-17 Oracle America, Inc. Branch misprediction recovery mechanism for microprocessors
US20100169578A1 (en) 2008-12-31 2010-07-01 Texas Instruments Incorporated Cache tag memory
US20100205603A1 (en) 2009-02-09 2010-08-12 Unisys Corporation Scheduling and dispatching tasks in an emulated operating system
JP5417879B2 (ja) 2009-02-17 2014-02-19 Fujitsu Semiconductor Limited Cache device
US8505013B2 (en) 2010-03-12 2013-08-06 Lsi Corporation Reducing data read latency in a network communications processor architecture
US8805788B2 (en) 2009-05-04 2014-08-12 Moka5, Inc. Transactional virtual disk with differential snapshots
US8332854B2 (en) 2009-05-19 2012-12-11 Microsoft Corporation Virtualized thread scheduling for hardware thread optimization based on hardware resource parameter summaries of instruction blocks in execution groups
US8533437B2 (en) 2009-06-01 2013-09-10 Via Technologies, Inc. Guaranteed prefetch instruction
GB2471067B (en) 2009-06-12 2011-11-30 Graeme Roy Smith Shared resource multi-thread array processor
US9122487B2 (en) 2009-06-23 2015-09-01 Oracle America, Inc. System and method for balancing instruction loads between multiple execution units using assignment history
US8386754B2 (en) 2009-06-24 2013-02-26 Arm Limited Renaming wide register source operand with plural short register source operands for select instructions to detect dependency fast with existing mechanism
CN101582025B (zh) 2009-06-25 2011-05-25 Zhejiang University Method for implementing a global register renaming table in a chip multiprocessor architecture
US8397049B2 (en) 2009-07-13 2013-03-12 Apple Inc. TLB prefetching
US8539486B2 (en) 2009-07-17 2013-09-17 International Business Machines Corporation Transactional block conflict resolution based on the determination of executing threads in parallel or in serial mode
JP5423217B2 (ja) 2009-08-04 2014-02-19 Fujitsu Limited Arithmetic processing device, information processing device, and control method of arithmetic processing device
US8127078B2 (en) 2009-10-02 2012-02-28 International Business Machines Corporation High performance unaligned cache access
US20110082983A1 (en) 2009-10-06 2011-04-07 Alcatel-Lucent Canada, Inc. Cpu instruction and data cache corruption prevention system
US8695002B2 (en) 2009-10-20 2014-04-08 Lantiq Deutschland Gmbh Multi-threaded processors and multi-processor systems comprising shared resources
US8364933B2 (en) 2009-12-18 2013-01-29 International Business Machines Corporation Software assisted translation lookaside buffer search mechanism
JP2011150397A (ja) 2010-01-19 2011-08-04 Panasonic Corp Bus arbitration device
KR101699910B1 (ko) 2010-03-04 2017-01-26 Samsung Electronics Co., Ltd. Reconfigurable processor and control method thereof
US20120005462A1 (en) 2010-07-01 2012-01-05 International Business Machines Corporation Hardware Assist for Optimizing Code During Processing
US8312258B2 (en) 2010-07-22 2012-11-13 Intel Corporation Providing platform independent memory logic
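CN101916180B (zh) 2010-08-11 2013-05-29 Institute of Computing Technology, Chinese Academy of Sciences Method and system for executing register-type instructions in a RISC processor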
US8751745B2 (en) 2010-08-11 2014-06-10 Advanced Micro Devices, Inc. Method for concurrent flush of L1 and L2 caches
US8756329B2 (en) 2010-09-15 2014-06-17 Oracle International Corporation System and method for parallel multiplexing between servers in a cluster
US9201801B2 (en) 2010-09-15 2015-12-01 International Business Machines Corporation Computing device with asynchronous auxiliary execution unit
WO2012037491A2 (en) 2010-09-17 2012-03-22 Soft Machines, Inc. Single cycle multi-branch prediction including shadow cache for early far branch prediction
US20120079212A1 (en) 2010-09-23 2012-03-29 International Business Machines Corporation Architecture for sharing caches among multiple processes
EP3306466B1 (en) 2010-10-12 2020-05-13 INTEL Corporation An instruction sequence buffer to store branches having reliably predictable instruction sequences
WO2012051262A2 (en) 2010-10-12 2012-04-19 Soft Machines, Inc. An instruction sequence buffer to enhance branch prediction efficiency
US8370553B2 (en) 2010-10-18 2013-02-05 International Business Machines Corporation Formal verification of random priority-based arbiters using property strengthening and underapproximations
US9047178B2 (en) 2010-12-13 2015-06-02 SanDisk Technologies, Inc. Auto-commit memory synchronization
US8677355B2 (en) 2010-12-17 2014-03-18 Microsoft Corporation Virtual machine branching and parallel execution
WO2012103245A2 (en) 2011-01-27 2012-08-02 Soft Machines Inc. Guest instruction block with near branching and far branching sequence construction to native instruction block
KR101638225B1 (ko) 2011-03-25 2016-07-08 Soft Machines, Inc. Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
EP2689326B1 (en) 2011-03-25 2022-11-16 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
EP2689330B1 (en) 2011-03-25 2022-12-21 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US20120254592A1 (en) 2011-04-01 2012-10-04 Jesus Corbal San Adrian Systems, apparatuses, and methods for expanding a memory source into a destination register and compressing a source register into a destination memory location
US9740494B2 (en) 2011-04-29 2017-08-22 Arizona Board Of Regents For And On Behalf Of Arizona State University Low complexity out-of-order issue logic using static circuits
US8843690B2 (en) 2011-07-11 2014-09-23 Avago Technologies General Ip (Singapore) Pte. Ltd. Memory conflicts learning capability
US8930432B2 (en) 2011-08-04 2015-01-06 International Business Machines Corporation Floating point execution unit with fixed point functionality
US20130046934A1 (en) 2011-08-15 2013-02-21 Robert Nychka System caching using heterogenous memories
US8839025B2 (en) 2011-09-30 2014-09-16 Oracle International Corporation Systems and methods for retiring and unretiring cache lines
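CN104040491B (zh) 2011-11-22 2018-06-12 Intel Corporation Microprocessor accelerated code optimizer
CN104040492B (zh) 2011-11-22 2017-02-15 Soft Machines, Inc. Microprocessor accelerated code optimizer and dependency reordering method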
US10191746B2 (en) 2011-11-22 2019-01-29 Intel Corporation Accelerated code optimizer for a multiengine microprocessor
US20130138888A1 (en) 2011-11-30 2013-05-30 Jama I. Barreh Storing a target address of a control transfer instruction in an instruction field
US8930674B2 (en) 2012-03-07 2015-01-06 Soft Machines, Inc. Systems and methods for accessing a unified translation lookaside buffer
KR20130119285A (ko) 2012-04-23 2013-10-31 Electronics and Telecommunications Research Institute Apparatus and method for resource allocation in a cluster computing environment
US9684601B2 (en) 2012-05-10 2017-06-20 Arm Limited Data processing apparatus having cache and translation lookaside buffer
US9996348B2 (en) 2012-06-14 2018-06-12 Apple Inc. Zero cycle load
US9940247B2 (en) 2012-06-26 2018-04-10 Advanced Micro Devices, Inc. Concurrent access to cache dirty bits
US9916253B2 (en) 2012-07-30 2018-03-13 Intel Corporation Method and apparatus for supporting a plurality of load accesses of a cache in a single cycle to maintain throughput
US9710399B2 (en) 2012-07-30 2017-07-18 Intel Corporation Systems and methods for flushing a cache with modified data
US9740612B2 (en) 2012-07-30 2017-08-22 Intel Corporation Systems and methods for maintaining the coherency of a store coalescing cache and a load cache
US9229873B2 (en) 2012-07-30 2016-01-05 Soft Machines, Inc. Systems and methods for supporting a plurality of load and store accesses of a cache
US9430410B2 (en) 2012-07-30 2016-08-30 Soft Machines, Inc. Systems and methods for supporting a plurality of load accesses of a cache in a single cycle
US9678882B2 (en) 2012-10-11 2017-06-13 Intel Corporation Systems and methods for non-blocking implementation of cache flush instructions
US10037228B2 (en) 2012-10-25 2018-07-31 Nvidia Corporation Efficient memory virtualization in multi-threaded processing units
US9195506B2 (en) 2012-12-21 2015-11-24 International Business Machines Corporation Processor provisioning by a middleware processing system for a plurality of logical processor partitions
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
EP2972836B1 (en) 2013-03-15 2022-11-09 Intel Corporation A method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
WO2014151018A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for executing multithreaded instructions grouped onto blocks
WO2014150991A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for implementing a reduced size register view data structure in a microprocessor
WO2014150971A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for dependency broadcasting through a block organized source view data structure
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
EP2972794A4 (en) 2013-03-15 2017-05-03 Soft Machines, Inc. A method for executing blocks of instructions using a microprocessor architecture having a register view, source view, instruction view, and a plurality of register templates
US9569216B2 (en) 2013-03-15 2017-02-14 Soft Machines, Inc. Method for populating a source view data structure by using register template snapshots
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
WO2014150806A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for populating register view data structure by using register template snapshots
US9632825B2 (en) 2013-03-15 2017-04-25 Intel Corporation Method and apparatus for efficient scheduling for asymmetrical execution units

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030014613A1 (en) * 1998-08-31 2003-01-16 Naresh H. Soni Reservation stations to increase instruction level parallelism
US20080082469A1 (en) * 2006-09-20 2008-04-03 Chevron U.S.A. Inc. Method for forecasting the production of a petroleum reservoir utilizing genetic programming
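CN101201734A (zh) * 2006-12-13 2008-06-18 International Business Machines Corporation Method and apparatus for predecoding instructions for execution
CN101201733A (zh) * 2006-12-13 2008-06-18 International Business Machines Corporation Method and apparatus for predecoding instructions for execution
CN101217495A (zh) * 2008-01-11 2008-07-09 Beijing University of Posts and Telecommunications Traffic monitoring method and apparatus for a T-MPLS network environment
CN102066419A (zh) * 2008-04-28 2011-05-18 Genentech, Inc. Humanized anti-factor D antibodies and uses thereof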

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
J. WEI: "Implementing low-power configurable processors - practical options and tradeoffs", 《PROCEEDINGS. 42ND DESIGN AUTOMATION CONFERENCE》 *
Z. YOUSSFI: "A New Technique to Exploit Instruction-Level Parallelism for Reducing Microprocessor Power Consumption", 《2006 IEEE INTERNATIONAL CONFERENCE ON ELECTRO/INFORMATION TECHNOLOGY》 *
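GE HAITONG: "Research and Development of a 32-bit High-Performance Embedded CPU and Platform", 《China Doctoral and Master's Theses Full-text Database (Doctoral), Information Science and Technology Series》 *
GE HAITONG: "Research and Development of a 32-bit High-Performance Embedded CPU and Platform", 《China Doctoral Dissertations Full-text Database, Information Science and Technology Series》 *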

Also Published As

Publication number Publication date
CN108427574B (zh) 2022-06-07
US20150039859A1 (en) 2015-02-05
EP2783281B1 (en) 2020-05-13
US10521239B2 (en) 2019-12-31
KR20170016012A (ko) 2017-02-10
EP2783281A1 (en) 2014-10-01
US20170024219A1 (en) 2017-01-26
KR101832679B1 (ko) 2018-02-26
WO2013077876A1 (en) 2013-05-30
KR20140093721A (ko) 2014-07-28
EP2783281A4 (en) 2016-07-13
KR101703400B1 (ko) 2017-02-06
CN104040491B (zh) 2018-06-12
CN104040491A (zh) 2014-09-10

Similar Documents

Publication Publication Date Title
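CN104040491B (zh) Microprocessor accelerated code optimizer
CN104040490B (zh) Accelerated code optimizer for a multi-engine microprocessor
CN104040492B (zh) Microprocessor accelerated code optimizer and dependency reordering method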
US8893079B2 (en) Methods for generating code for an architecture encoding an extended register specification
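CN107810477A (zh) Reuse of decoded instructions
CN108027767A (zh) Register read/write ordering
CN108027769A (zh) Initiating instruction block execution using a register access instruction
CN108027771A (zh) Block-based processor core composite register
CN110249302A (zh) Simultaneous execution of multiple programs on a processor core
JPH0922354A (ja) Method and apparatus for executing instruction sequences
CN108027735A (zh) Implicit program order
JP2828219B2 (ja) Method for providing object code compatibility, method for providing object code compatibility and compatibility with scalar and superscalar processors, method for executing tree instructions, and data processing system
TWI610224B (zh) Microprocessor accelerated code optimizer
TWI512613B (zh) Accelerated code optimizer for a multi-engine microprocessor
TWI506548B (zh) Microprocessor accelerated code optimizer and dependency reordering method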
Verians et al. A graph-oriented task manager for small multiprocessor systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant