CN105849670A

CN105849670A - Energy efficient multi-cluster system and its operations

Info

Publication number: CN105849670A
Application number: CN201580003343.6A
Authority: CN
Inventors: 陈家明; 周宏霖; 张雅婷; 邱士颜; 许嘉豪; 林有明; 黄万庆; 杨仁杰; 萧丕承
Original assignee: MediaTek Inc
Current assignee: MediaTek Inc
Priority date: 2014-11-17
Filing date: 2015-11-16
Publication date: 2016-08-10
Also published as: WO2016078556A1

Abstract

A multi-cluster system having processor cores of different energy efficiency characteristics is configured to operate with high efficiency such that performance and power requirements can be satisfied. The system includes multiple processor cores in a hierarchy of groups. The hierarchy of groups includes: multiple level-1 groups, each level-1 group including one or more of the processor cores having identical energy efficiency characteristics, and each level-1 group configured to be assigned tasks by a respective level-1 scheduler; one or more level-2 groups, each level-2 group including respective level-1 groups, the processor cores in different level-1 groups of the same level-2 group having different energy efficiency characteristics, and each level-2 group configured to be assigned tasks by a respective level-2 scheduler; and a level-3 group including the one or more level-2 groups and configured to be assigned tasks by a level-3 scheduler.

Description

Energy-efficient multi-cluster system and operation thereof

Cross-Reference to Related Applications

This application is a continuation-in-part of U.S. Patent Application No. 14/931,923, filed on November 4, 2015, and claims priority to U.S. Provisional Application No. 62/080,617, filed on November 17, 2014; U.S. Provisional Application No. 62/111,138, filed on February 3, 2015; U.S. Provisional Application No. 62/126,963, filed on March 2, 2015; and U.S. Provisional Application No. 62/148,325, filed on April 16, 2015.

Technical field

Embodiments of the invention relate to multi-cluster systems and, more particularly, to performance and power management in a multi-cluster system that includes processor cores with different energy efficiency characteristics.

Background

Dynamic frequency scaling is a technique that automatically adjusts the frequency at which a processor operates. Increasing the operating frequency of a processor can improve computing performance. However, because power consumption in an integrated circuit is calculated as P = C × V² × F, where P is power, C is the capacitance switched per clock cycle, V is voltage, and F is frequency, an increase in frequency also increases the power consumption of the processor. Some existing computing systems have a built-in management framework that manages the trade-off between performance and power consumption. For example, the framework determines whether to increase or decrease the operating frequency, and whether to activate or deactivate processor cores, in order to meet system performance demands or to save power.
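By way of illustration only, the relation P = C × V² × F can be evaluated numerically. The sketch below uses hypothetical capacitance, voltage, and frequency values that do not come from this disclosure; the function name dynamic_power is likewise an assumption made for the example.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Return switching power in watts using P = C * V^2 * F."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical operating points (illustrative values only).
low = dynamic_power(1e-9, 0.8, 1.0e9)   # 1 nF switched, 0.8 V, 1 GHz
high = dynamic_power(1e-9, 1.0, 2.0e9)  # 1 nF switched, 1.0 V, 2 GHz

# How much more power the higher operating point draws.
ratio = high / low
```

In this hypothetical case, doubling the frequency while raising the voltage from 0.8 V to 1.0 V increases power by a factor of about 3.1, which is why a frequency increase that also requires a voltage increase is costly.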

In a multi-cluster computing system that has processors of multiple types, the management framework needs to take into account the power consumption and performance of each type of processor. If only one type of processor is allowed to operate at a time, the processing capability of the other types of processors goes unused, and the demands of a high workload may not be met. In a system that allows multiple types of processors to operate at the same time, the management framework needs a strategy that optimizes efficiency and reduces power during operation.

Therefore, there is a need to improve power and performance management in a multi-cluster system that has processors of multiple types.

Summary of the invention

In one embodiment, a computing system is provided that includes a plurality of processor cores in a hierarchy of groups. The hierarchy of groups comprises: a plurality of level-1 groups, each level-1 group including one or more of the processor cores having identical energy efficiency characteristics, and each level-1 group configured to be assigned first tasks by a respective level-1 scheduler; one or more level-2 groups, each level-2 group including respective level-1 groups, the processor cores in different level-1 groups of the same level-2 group having different energy efficiency characteristics, and each level-2 group configured to be assigned second tasks by a respective level-2 scheduler; and a level-3 group including the one or more level-2 groups and configured to be assigned third tasks by a level-3 scheduler. It should be noted that the first tasks, the second tasks, and the third tasks may be the same or different tasks.

In another embodiment, a computing system is provided that includes a plurality of processor cores in a hierarchy of groups. The hierarchy of groups comprises: a plurality of level-1 groups, each level-1 group including one or more of the processor cores and having a symmetric multiprocessing architecture; one or more level-2 groups, each level-2 group including respective level-1 groups, the processor cores in different level-1 groups of the same level-2 group having different energy efficiency characteristics; and a level-3 group including the one or more level-2 groups and having a heterogeneous multiprocessing architecture.

In another embodiment, a computing system is provided that includes a plurality of processor cores in a hierarchy of groups. The hierarchy of groups comprises: one or more leaf-level groups, at least two of the processor cores in at least one of the leaf-level groups having different energy efficiency characteristics, and each of the one or more leaf-level groups configured to be assigned first tasks by a respective leaf-level scheduler; and a root-level group including the one or more leaf-level groups and configured to be assigned second tasks by a root-level scheduler. It should be noted that the first tasks and the second tasks may be the same or different tasks.

In another embodiment, a computing system is provided that includes a plurality of processor cores in a hierarchy of groups. The hierarchy of groups comprises: one or more leaf-level groups, at least two of the processor cores in at least one of the leaf-level groups having different energy efficiency characteristics; and a root-level group including the one or more leaf-level groups and having a heterogeneous multiprocessing architecture.

According to the embodiments described herein, a multi-cluster system having processor cores of different energy efficiency characteristics can operate with high efficiency such that performance and power requirements can be satisfied.

Brief description of the drawings

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like references indicate similar elements. It should be noted that references to "an" or "one" embodiment in this description are not necessarily to the same embodiment, and such references mean at least one. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is within the knowledge of one skilled in the relevant art to effect such feature, structure, or characteristic in connection with other embodiments, whether or not explicitly described.

Fig. 1 illustrates an example of a multi-cluster system according to one embodiment.

Fig. 2A is a diagram illustrating power consumption versus frequency according to one embodiment.

Fig. 2B is a diagram illustrating power consumption versus frequency according to another embodiment.

Fig. 3 illustrates an example of a multi-cluster system in a hierarchy of groups according to one embodiment.

Fig. 4 illustrates an example of a multi-cluster system in a hierarchy of groups according to another embodiment.

Fig. 5 illustrates an example of a multi-cluster system in a hierarchy of groups according to another embodiment.

Fig. 6A illustrates an example of a multi-cluster system in a two-level hierarchy according to one embodiment.

Fig. 6B illustrates an example of a multi-cluster system in a two-level hierarchy according to another embodiment.

Fig. 7 illustrates transitions among four operating scenarios of two clusters according to one embodiment of the invention.

Fig. 8 is a flow diagram illustrating a method for efficiently operating a multi-cluster system according to one embodiment of the invention.

Fig. 9 illustrates the multi-cluster system of Fig. 1 with additional details of an interrupt handling function according to one embodiment of the invention.

Fig. 10 illustrates a method for migrating interrupt requests and tasks from one cluster to another cluster according to one embodiment of the invention.

Fig. 11 illustrates a transition between two clusters according to one embodiment of the invention.

Fig. 12 illustrates a transition between two clusters according to another embodiment of the invention.

Fig. 13 illustrates a transition between two clusters according to a further embodiment of the invention.

Fig. 14 illustrates a transition between two clusters according to yet another embodiment of the invention.

Detailed description

In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been described in detail in order not to obscure the understanding of this description. It will be appreciated by one of ordinary skill in the art that the invention may be practiced without such specific details. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.

It should be noted that the term "multi-cluster system" as used herein refers to a "multi-core processor system" that is arranged and managed as multiple clusters. Depending on the actual design, the multi-core processor system may be a multi-core system or a multi-processor system. In other words, the proposed method may be applied to any multi-core system or multi-processor system that is arranged and managed as multiple clusters. For example, in a multi-core system, all of the processor cores may be disposed in one processor. As another example, in a multi-processor system, each processor core may be disposed in a separate processor. Hence, each cluster may be implemented as a group of one or more processors.

In addition, a "processor group" as used herein refers to a group of processor cores at a specific level, such as a level-1 group, a level-2 group, or a level-3 group. The "type" in "processor type" refers to common characteristics shared by a group of processor cores, where the common characteristics include, but are not limited to, energy efficiency characteristics and/or computing performance. For example, computing performance may be measured in million instructions per second (MIPS). The "energy efficiency," or equivalently the "power efficiency," of a processor core may be measured with respect to an available frequency or frequency range. Several metrics may be used for this measurement; one is MIPS/mW, which stands for MIPS per milliwatt, and another is MHz/mW, which stands for megahertz per milliwatt. Energy efficiency is the opposite of power consumption: a processor core that has high energy efficiency in a frequency range consumes low power in that frequency range. In addition, the term "substantially the same" as used herein means "identical" or "within a predetermined tolerance."
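As a non-limiting sketch, the efficiency metric and the "substantially the same" test defined above may be expressed as follows. The function names, the two core records, and the tolerance value are hypothetical and chosen only for illustration; the unit of power is left to the caller.

```python
def energy_efficiency(mips, power):
    """Return computing performance delivered per unit of power."""
    if power <= 0:
        raise ValueError("power must be positive")
    return mips / power


def substantially_same(a, b, tolerance):
    """True when two values are identical or within a predetermined tolerance."""
    return abs(a - b) <= tolerance


# Two hypothetical core types with nearly equal computing performance
# but different power draw at the same frequency.
core_a = {"mips": 2000.0, "power": 400.0}
core_b = {"mips": 1980.0, "power": 300.0}

same_performance = substantially_same(core_a["mips"], core_b["mips"], tolerance=50.0)
eff_a = energy_efficiency(core_a["mips"], core_a["power"])
eff_b = energy_efficiency(core_b["mips"], core_b["power"])
```

Under these assumed numbers the two core types count as having substantially the same computing performance, while core_b is the more energy efficient of the two.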

In addition, the term "deactivating a processor core" means that the processor core is fully powered off (i.e., receives no power) or enters a low-power state. A processor core may be powered off by hot-plugging (i.e., powered off or physically removed during operating system runtime) or by other mechanisms. "Deactivating a cluster" means that all of the processor cores in the cluster are fully powered off or enter a low-power state. "Activating a processor core" means that the processor core is powered on and enters a standby state or an active state in which it executes instructions. "Activating a cluster" means that one or more processor cores in the cluster enter a standby or active state. A processor core or cluster that has been activated is also referred to as an "active" processor core or cluster. Similarly, a processor core or cluster that has been deactivated is also referred to as an "inactive" processor core or cluster.

Fig. 1 illustrates an example of a multi-cluster system 100 according to one embodiment. In this example, the multi-cluster system 100 includes clusters Cluster(0), Cluster(1), ..., Cluster(M). In alternative embodiments, the multi-cluster system 100 may include any number of clusters that is at least two. Each cluster includes one or more processor cores, which may share the same L2 cache or additional levels of caches. Each cluster can also access a system memory 130 via a cache coherence interconnect 110.

In one embodiment, the multi-cluster system 100 uses a management module 120, which activates or deactivates processor cores or clusters to meet system design requirements so that energy efficiency is achieved. The multi-cluster system 100 also uses a task assignment module 140, which assigns and schedules tasks among the processor cores. The assignment can achieve load balance within each cluster and optimize the distribution of work across clusters. In one embodiment, the task assignment module 140 includes a set of schedulers, including but not limited to one or more of the following: a symmetric multiprocessing (SMP) scheduler, an asymmetric multiprocessing (AMP) scheduler, a heterogeneous multiprocessing (HMP) scheduler, an intra-cluster scheduler, and an In-Kernel Switcher (IKS) scheduler. The functions of these schedulers are described in detail below with reference to Figs. 3-14. The management module 120 and the task assignment module 140 may be implemented in hardware, software, or a combination of both. In one embodiment, the management module 120 and the task assignment module 140 are implemented in software, which may be stored in the system memory 130 or in another non-transitory computer-readable medium accessible to the multi-cluster system 100. The software may be executed by a central hardware unit, or by an activated cluster or processor core in the multi-cluster system 100.

The top half of Fig. 1 shows close-up views of clusters Cluster(0), Cluster(1), and Cluster(2) as examples. In this example, the processor cores within each cluster are identical. In other embodiments, however, the processor cores within a cluster may differ. In this example, Cluster(0) includes four processor cores (e.g., four LLPs), Cluster(1) includes four processor cores (e.g., four LPs), and Cluster(2) includes four processor cores (e.g., four BPs). It should be understood that each cluster may include any number of processor cores, and different clusters may have different numbers of processor cores. Processor cores in different clusters may be similar or dissimilar to varying degrees. Moreover, the processor cores in some pairs of clusters may have more similar, or less similar, energy efficiency characteristics than the processor cores in other pairs of clusters. In one embodiment, an LP and an LLP have the same or similar computing performance; that is, the difference between them in MIPS is negligible. However, when measured at the same temperature, the energy efficiency characteristics of an LP and an LLP differ from each other. A BP has computing performance and energy efficiency characteristics that differ from those of an LP and an LLP. For example, a BP may have higher computing performance and higher power consumption than an LP and an LLP. Other clusters in Fig. 1 may also have energy efficiency characteristics that differ from those of the LP, LLP, and BP. The energy efficiency characteristics of different processor cores are described below with reference to Figs. 2A and 2B.

Fig. 2A is a diagram 200 illustrating power consumption (which is the opposite of energy efficiency) versus frequency according to one embodiment. Diagram 200 includes three energy efficiency characteristic curves 210, 220, and 230, which represent the peak performance frequency ranges of the processor cores in Cluster(0), Cluster(1), and Cluster(2), respectively. Diagram 200 shows that Cluster(0) is the most energy efficient in the low frequency range, Cluster(1) is the most energy efficient in the middle frequency range, and Cluster(2) is the most energy efficient in the high frequency range. Diagram 200 also shows that curve 210 intersects curve 220, and that curve 220 does not intersect curve 230. The intersection region SP(0,1) is referred to as a sweet-spot frequency range, or simply a sweet spot or frequency spot. A sweet spot SP(i,j) represents the upper boundary region of the peak performance frequency range of Cluster(i) and the lower boundary region of the peak performance frequency range of Cluster(j). These boundary regions are not hard limits on the operating frequency; for example, Cluster(0) can also operate at frequencies above SP(0,1), and Cluster(1) can also operate at frequencies below SP(0,1). The boundary regions merely indicate whether a cluster is operating in a frequency range that is energy efficient for that cluster. Each sweet spot is associated with two clusters; for example, SP(0,1) is associated with Cluster(0) and Cluster(1). On one side of a given sweet spot, the energy efficiency of each processor core in one cluster is higher than that of each processor core in the other cluster; on the other side of the given sweet spot, the energy efficiency of each processor core in that cluster is lower than that of each processor core in the other cluster. For example, to the right of SP(0,1), the energy efficiency of each processor core in Cluster(1) is higher than that of each processor core in Cluster(0); to the left of SP(0,1), the energy efficiency of each processor core in Cluster(0) is higher than that of each processor core in Cluster(1). In some embodiments, a cluster may be associated with two or more sweet spots, some of which may be located at one end of its peak performance frequency range while the others are located at the other end of its peak performance frequency range.
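The crossing of two power-versus-frequency curves can be located from sampled data, as sketched below. This is only an illustrative assumption about how a sweet spot might be estimated: the function find_sweet_spot, the sample frequencies, and the power values are hypothetical and are not taken from the figures.

```python
def find_sweet_spot(freqs_mhz, power_low_cluster, power_high_cluster):
    """Return the pair of adjacent sample frequencies that bracket the point
    where the low-frequency cluster stops being the lower-power choice.

    All three lists are sampled at the same frequencies, in ascending order.
    Returns None when the curves do not cross in the sampled range.
    """
    for i in range(1, len(freqs_mhz)):
        before = power_low_cluster[i - 1] - power_high_cluster[i - 1]
        after = power_low_cluster[i] - power_high_cluster[i]
        # The low-frequency cluster draws less power before the crossing
        # and at least as much power after it.
        if before < 0 <= after:
            return (freqs_mhz[i - 1], freqs_mhz[i])
    return None


# Hypothetical samples: the first cluster is cheaper at low frequencies,
# the second cluster is cheaper at high frequencies.
freqs = [500, 1000, 1500, 2000]
cluster0_power = [100, 250, 600, 1200]
cluster1_power = [180, 300, 500, 800]
sweet_spot = find_sweet_spot(freqs, cluster0_power, cluster1_power)
```

With these assumed samples the crossing is bracketed by 1000 MHz and 1500 MHz, which would play the role of SP(0,1). Two curves that never cross, as with curves 220 and 230, yield None.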

In this example, with respect to the distance along the power consumption (vertical) axis, energy efficiency characteristic curve 230 is separated from curve 220 by at least a threshold distance (TH). This indicates that Cluster(2) consumes more power than Cluster(1). Curves 220 and 230 indicate that there is no sweet spot associated with Cluster(1) and Cluster(2). Therefore, in one embodiment, Cluster(2) may be placed in a processor group (e.g., a different level-2 group) that is different from the one containing Cluster(0) and Cluster(1).

Fig. 2B is another diagram 250 illustrating power consumption versus frequency according to one embodiment. Similar to diagram 200, diagram 250 includes three energy efficiency characteristic curves 260, 270, and 280, which represent the peak performance frequency ranges of the processor cores in Cluster(0), Cluster(1), and Cluster(2), respectively. However, diagram 250 differs from diagram 200 in that curves 260 and 270 do not intersect but are adjacent to each other (i.e., within a threshold distance). In diagram 250, SP(0,1) is the frequency range between the tail end of one curve and the head end of the next adjacent curve. Although curves 260, 270, and 280 do not intersect, the energy efficiency characteristics that they represent are the same as those described above for diagram 200.

In this example, with respect to the distance along the power consumption axis, energy efficiency characteristic curve 280 is separated from curve 270 by at least a threshold distance (TH). This indicates that Cluster(2) consumes more power than Cluster(1). Curves 270 and 280 indicate that there is no sweet spot associated with Cluster(1) and Cluster(2). Therefore, in one embodiment, Cluster(2) may be placed in a processor group (e.g., a different level-2 group) that is different from the one containing Cluster(0) and Cluster(1).

Although only three clusters are shown in Figs. 2A and 2B, it should be understood that the foregoing characteristics can be extended to any number of clusters. In addition, different clusters may exhibit different characteristics, as represented by their curves. Some adjacent curves may intersect each other, some adjacent curves may have overlapping regions, and some adjacent curves may have no overlapping region at all. These curves and sweet spots may be determined from test results and experiments.
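The threshold-based placement described for Figs. 2A and 2B can be sketched as a single pass over clusters ordered by frequency range. The function group_by_threshold, the gap values, and the threshold are hypothetical; the sketch assumes the distance between each pair of adjacent curves along the power consumption axis has already been measured.

```python
def group_by_threshold(cluster_names, gaps, threshold):
    """Split clusters, ordered by peak performance frequency range, into
    higher-level groups.

    gaps[k] is the distance along the power consumption axis between the
    curves of cluster_names[k] and cluster_names[k + 1]. A gap of at least
    `threshold` starts a new group.
    """
    groups = [[cluster_names[0]]]
    for name, gap in zip(cluster_names[1:], gaps):
        if gap >= threshold:
            groups.append([name])    # curves too far apart: new group
        else:
            groups[-1].append(name)  # curves intersect or are adjacent
    return groups


# Hypothetical gaps: the first two curves intersect (gap 0), and the third
# curve is far from the second.
placement = group_by_threshold(
    ["Cluster(0)", "Cluster(1)", "Cluster(2)"], gaps=[0.0, 300.0], threshold=200.0
)
```

Here Cluster(0) and Cluster(1) end up in one group and Cluster(2) in another, matching the placement described above for both diagrams.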

Embodiments of the invention provide systems and methods for managing power and performance in a multi-cluster system that includes a plurality of processor cores in a hierarchy of groups. At the bottom of the hierarchy are level-1 groups (also referred to as clusters). Above the level-1 groups in the hierarchy are one or more level-2 groups, and above the one or more level-2 groups is a level-3 group. As in a tree structure, the level-3 group is at the root level of the hierarchy, the one or more level-2 groups are at an intermediate level of the hierarchy, and the level-1 groups are at the leaf level of the hierarchy. In one embodiment, the hierarchy may include one intermediate level. In another embodiment, the hierarchy may include no intermediate level (i.e., no level-2 groups). In yet another embodiment, the hierarchy may include multiple intermediate levels, each of which includes one or more level-2 groups.

As described above, the processor cores in different clusters may have varying degrees of similarity in energy efficiency characteristics. Processor cores of the same type may have identical energy efficiency characteristics and may be placed in the same cluster. The processor cores of two clusters with higher similarity in energy efficiency characteristics may have different but intersecting, or close/adjacent, energy efficiency characteristic curves. The processor cores of two clusters with lower similarity in energy efficiency characteristics may have different and non-intersecting, or more distant, energy efficiency characteristic curves. As the embodiments described below show, clusters with higher similarity in energy efficiency characteristics may be placed in lower-level groups of the hierarchy and, conversely, clusters with lower similarity in energy efficiency characteristics may be placed in higher-level groups of the hierarchy.

For example, each lowest-level group, i.e., a level-1 group, may include clusters that have the highest similarity in energy efficiency characteristics (e.g., identical energy efficiency characteristics), which means that a level-1 group includes one or more processor cores of the same type. In some embodiments, one level-1 group may include one or more LP clusters, another level-1 group may include one or more LLP clusters, and/or another level-1 group may include one or more BP clusters.

In addition, each second-lowest-level group, i.e., a level-2 group, may include clusters whose similarity in energy efficiency characteristics is lower than that within each level-1 group (e.g., different but intersecting, close, or adjacent energy efficiency characteristic curves), which means that a level-2 group includes two or more types of processor cores with similar energy efficiency characteristics. In some embodiments, a level-2 group may include one or more first level-1 groups and one or more second level-1 groups, where each first level-1 group may include one or more LP clusters and each second level-1 group may include one or more LLP clusters. In addition, another level-2 group may include one or more first level-1 groups, each of which may include one or more BP clusters; alternatively or additionally, that other level-2 group may include one or more second level-1 groups, each of which may include one or more clusters whose energy efficiency characteristics are more similar to those of the BP clusters than to those of the LP and LLP clusters.

In addition, each third-lowest-level group, i.e., a level-3 group, may include clusters whose similarity in energy efficiency characteristics is lower than that within each level-1 group and each level-2 group (e.g., more different and non-intersecting, or more distant, energy efficiency characteristic curves), which means that a level-3 group includes two or more types of processor cores whose energy efficiency characteristics are more dissimilar than those within a level-2 group. In some embodiments, a level-3 group may include a first level-2 group and a second level-2 group. The first level-2 group may include one or more first level-1 groups and one or more second level-1 groups, where each first level-1 group includes one or more LP clusters and each second level-1 group may include one or more LLP clusters. The second level-2 group may include one or more first level-1 groups, each of which may include one or more BP clusters; alternatively or additionally, the second level-2 group may include one or more other level-1 groups, each of which may include another type of cluster whose energy efficiency characteristics are more similar to those of the BP clusters than to those of the LP and LLP clusters.

Fig. 3 illustrates an example of a multi-cluster system 30 in a hierarchy of groups according to one embodiment. The hierarchy may be formed from some of the clusters and processor cores in Fig. 1, and some of these clusters and processor cores may have the energy efficiency characteristic curves shown in Figs. 2A and 2B. In this example, the hierarchy includes four level-1 groups. Each level-1 group is a cluster that includes two identical processor cores with identical energy efficiency characteristics. The hierarchy also includes two level-2 groups, each of which includes two level-1 groups. The hierarchy further includes a level-3 group, which includes the two level-2 groups.

More specifically, in the example of Fig. 3, the system 30 includes level-1 groups G_1f, G_2f, G_3f, and G_4f (where the subscript "f" denotes "leaf"), level-2 groups G_1i and G_2i (where the subscript "i" denotes "intermediate"), and a level-3 group G_r (where the subscript "r" denotes "root"). Although Fig. 3 shows a specific number of processor cores and a specific number of groups at each level of the hierarchy, it should be understood that the system 30 may include any number of processor cores in each level-1 group, any number of level-1 groups in each level-2 group, and any number of level-2 groups in the level-3 group.
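The three-level arrangement of Fig. 3 can be modeled as a small tree, as in the sketch below. The Group class, the all_cores helper, and the core names core0 through core7 are hypothetical and serve only to make the structure concrete; only the group names come from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Group:
    """A node in the hierarchy of groups (level 1 = leaf, level 3 = root)."""
    name: str
    level: int
    children: List["Group"] = field(default_factory=list)  # lower-level groups
    cores: List[str] = field(default_factory=list)         # only for level-1 groups


def all_cores(group):
    """Collect every processor core reachable from a group, in order."""
    found = list(group.cores)
    for child in group.children:
        found.extend(all_cores(child))
    return found


# Four level-1 groups with two cores each (core names are placeholders).
leaves = [
    Group(f"G_{n}f", 1, cores=[f"core{2 * n - 2}", f"core{2 * n - 1}"])
    for n in range(1, 5)
]
g_1i = Group("G_1i", 2, children=leaves[0:2])
g_2i = Group("G_2i", 2, children=leaves[2:4])
g_r = Group("G_r", 3, children=[g_1i, g_2i])
```

A hierarchy without an intermediate level could be modeled the same way by attaching the level-1 groups directly to the root.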

In this embodiment, the processor cores in the same level-1 group have identical energy efficiency characteristics. The processor cores in different level-1 groups of the same level-2 group have different energy efficiency characteristics. Furthermore, processor cores in different level-1 groups of the same level-2 group (e.g., P11 and P22) have more similar energy efficiency characteristics than processor cores in different level-2 groups (e.g., P11 and P33). For example, the energy efficiency characteristic curves of G_1f, G_2f, and G_3f may be represented by curves 210, 220, and 230 in Fig. 2A, respectively, or by curves 260, 270, and 280 in Fig. 2B, respectively. Because curves 210 and 220 (or curves 260 and 270) are closer to each other than curves 220 and 230 (or curves 270 and 280) with respect to the distance along the power consumption axis, G_1f and G_2f may be placed in the same level-2 group, and G_3f may be placed in another level-2 group. That is, when the distance between two curves is greater than a threshold (TH in Figs. 2A and 2B), the two corresponding level-1 groups may be placed in different level-2 groups. Thus, any two processor cores in different level-1 groups of the same level-2 group have closer energy efficiency characteristic curves than any two processor cores in different level-1 groups of different level-2 groups. In some embodiments, the processor cores in different level-2 groups may have not only different energy efficiency characteristics but also different computing performance (e.g., when the difference between them in MIPS is greater than a threshold).

In one embodiment, the one or more processor cores in each level-1 group form an SMP architecture; that is, each level-1 group has an SMP architecture. The one or more level-1 groups in each level-2 group form an AMP architecture; that is, each level-2 group has an AMP architecture. The one or more level-2 groups in the level-3 group form an HMP architecture; that is, the level-3 group has an HMP architecture. An SMP architecture may include a pool of homogeneous processors that run independently. In a processor group with an SMP architecture, two or more identical processor cores may be connected to a shared system memory, may access the same I/O devices, and are controlled by a single operating system instance that treats these processor cores equally. Each processor core may have the same access latency to the shared memory space. In an AMP architecture, not all processor cores are treated equally. A processor group with an AMP architecture may include two or more different types of processor cores that have different energy efficiency characteristics and substantially the same computing performance. A processor group with an HMP architecture may include two or more different types of processor cores that have different computing performance, different energy efficiency characteristics, and different access latencies to the system memory. These different processor cores may share the same memory space, or may each be allocated a different portion of the memory space.
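The distinction among the three architectures can be summarized by a simple classification over the core types in a group. The following is a simplified sketch: the function classify_group and its tuple format are hypothetical, and it considers only computing performance and energy efficiency, leaving out memory access latency.

```python
def classify_group(core_types, mips_tolerance):
    """Classify a processor group from its cores' characteristics.

    core_types holds one (mips, efficiency) tuple per processor core.
    Identical cores give SMP; cores with substantially the same computing
    performance but different efficiency give AMP; anything else gives HMP.
    """
    if len(set(core_types)) == 1:
        return "SMP"
    mips_values = [mips for mips, _ in core_types]
    if max(mips_values) - min(mips_values) <= mips_tolerance:
        return "AMP"
    return "HMP"


# Hypothetical characteristics (MIPS, efficiency) for three kinds of groups.
level1 = classify_group([(2000, 5.0), (2000, 5.0)], mips_tolerance=50)
level2 = classify_group([(2000, 5.0), (1980, 6.6)], mips_tolerance=50)
level3 = classify_group([(2000, 5.0), (4000, 3.0)], mips_tolerance=50)
```

With these assumed values the three calls classify a level-1 group as SMP, a level-2 group as AMP, and a level-3 group as HMP, mirroring the roles the groups play in the embodiment above.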

In one embodiment, each level-1 group has an SMP architecture and uses a respective SMP scheduler to assign tasks to the processor cores in that level-1 group. The SMP scheduler schedules multiple tasks on the processor cores and also optimizes the assignment of tasks across the processor cores of the same level-1 group, so as to balance the load among the processor cores in that group. For example, SMP schedulers 11, 12, 13 and 14 assign tasks to the processor cores in level-1 groups G_1f, G_2f, G_3f and G_4f, respectively.

In addition, each level-2 group (G_1i and G_2i) has an AMP architecture, in which an AMP scheduler assigns tasks to the processor cores in the level-1 groups of that level-2 group. For example, a first AMP scheduler 21 assigns tasks to the processor cores in level-1 groups G_1f and G_2f, and a second AMP scheduler 22 assigns tasks to the processor cores in level-1 groups G_3f and G_4f. Each of AMP schedulers 21 and 22 schedules tasks onto different processor cores, taking into account at least the different energy efficiency characteristics of the processor cores in different level-1 groups of the same level-2 group. When required by the system workload, each of AMP schedulers 21 and 22 may assign tasks to the processor cores in all of its level-1 groups at the same time. Further details of AMP scheduling are provided in connection with Figs. 7-14. More details can also be found in U.S. patent application No. 14/931,923.

Furthermore, level-3 group G_r has an HMP architecture and uses HMP scheduler 31 to assign tasks to level-2 groups G_1i and G_2i. HMP scheduler 31 may optimize the assignment of tasks across the processor cores in the HMP architecture, taking into account at least the different computing performance of the processor cores in different level-2 groups of the same level-3 group. When required by the system workload, HMP scheduler 31 may assign tasks to the processor cores in both level-2 groups G_1i and G_2i at the same time.

In one embodiment, load balancing may first be performed in each of level-1 groups G_1f and G_2f. For example, in level-1 group G_1f, load balancing may be performed according to a first predetermined timing. SMP scheduler 11 may perform load balancing for a particular processor core in G_1f, so as to balance its load among the processor cores in the same level-1 group G_1f. Subsequently, load balancing may be performed, according to a second predetermined timing, in level-2 group G_1i to which G_1f belongs. AMP scheduler 21 may perform load balancing for the same particular processor core, so as to balance its load among the processor cores in the same level-2 group G_1i. Subsequently, load balancing may be performed, according to a third predetermined timing, in level-3 group G_r to which G_1f belongs. HMP scheduler 31 may perform load balancing for the same particular processor core, so as to balance its load among the processor cores in the same level-3 group G_r. For the other level-1 group G_2f, a similar operation flow may be performed by scheduler 12, then by scheduler 21 for level-2 group G_1i, and then by scheduler 31 for level-3 group G_r. Likewise, similar operation flows may be performed for the other level-1 groups G_3f and G_4f by their respective schedulers 13 and 14. Scheduler 22 may perform load balancing in level-2 group G_2i after the load balancing in each of level-1 groups G_3f and G_4f, and the load balancing performed by scheduler 31 in level-3 group G_r may in turn follow the load balancing in each of level-1 groups G_3f and G_4f.

In one embodiment, in each level-2 group, the energy efficiency curve of each level-1 group defines at least one predetermined frequency spot that intersects, or is adjacent to (i.e., within a threshold distance of), at least one other level-1 group in the same level-2 group. Additionally, in each level-2 group, the energy efficiency curve of each level-1 group is at least a threshold distance away from the energy efficiency curve of each level-1 group in a different level-2 group (e.g., curves 220 and 230 in Fig. 2A, or curves 270 and 280 in Fig. 2B). Any two processor cores in different level-1 groups of the same level-2 group have closer energy efficiency curves than any two processor cores in different level-1 groups of different level-2 groups.

In one embodiment, after each level-1 scheduler (11, 12, 13 and 14) performs load balancing among the processor cores in its corresponding level-1 group, each level-2 scheduler (21 and 22) performs load balancing among the processor cores in its corresponding level-2 group, and then level-3 scheduler 31 performs load balancing among the processor cores in the level-3 group.
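The bottom-up ordering of load balancing can be sketched as follows. This is only an illustration under stated assumptions: the dictionary `HIERARCHY` is a hypothetical encoding of the scheduler tree of Fig. 3, and `balance_order` is an invented helper that returns the schedulers level by level, lowest level first; it does not model the predetermined timings.

```python
# Hypothetical encoding of the Fig. 3 scheduler tree: parent -> children.
HIERARCHY = {
    "HMP 31": ["AMP 21", "AMP 22"],
    "AMP 21": ["SMP 11", "SMP 12"],
    "AMP 22": ["SMP 13", "SMP 14"],
}


def balance_order(root, hierarchy):
    """Return schedulers grouped by level, lowest level first."""
    levels = [[root]]
    while True:
        # Collect the children of every scheduler on the current level.
        below = [c for s in levels[-1] for c in hierarchy.get(s, [])]
        if not below:
            break
        levels.append(below)
    return levels[::-1]  # level-1 first, then level-2, then level-3


order = balance_order("HMP 31", HIERARCHY)
```

Here `order` lists the four SMP schedulers first, then the two AMP schedulers, then the HMP scheduler, matching the sequence in which load balancing is described.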

Fig. 4 is a schematic diagram of an example of a multi-cluster system 40 in a hierarchy of groups according to one embodiment. The hierarchy may be formed from some of the clusters and processor cores in Fig. 1, and some of these clusters and processor cores may have the energy efficiency curves shown in Figs. 2A and 2B. This example is a special case of Fig. 3 in which level-1 group G_1f includes, or is replaced by, a single processor core P11. The single processor core P11 does not belong to any level-1 group and is not scheduled by an SMP scheduler. This example also shows that each level-1 group may include a different number of processor cores: two in G_2f, three in G_3f and four in G_4f. Task scheduling at each level of the hierarchy is the same as in Fig. 3, except that in this case AMP scheduler 21 may assign tasks directly to processor core P11 and to the processor cores in level-1 group G_2f. It should also be noted that in this embodiment scheduler 21 assigns tasks to the processor cores in level-1 group G_2f, but the invention is not limited thereto. In other embodiments, scheduler 21 may assign tasks to level-1 group G_2f, and an auxiliary module may assign the tasks to the processor cores in G_2f. The auxiliary module may be integrated with scheduler 21 or separate from it.

Fig. 5 is a schematic diagram of another example of a multi-cluster system 50 in a hierarchy of groups according to one embodiment. The hierarchy may be formed from some of the clusters and processor cores in Fig. 1, and some of these clusters and processor cores may have the energy efficiency curves shown in Figs. 2A and 2B. In this example, level-2 group G_1i, level-1 group G_3f and processor core P44 are directly under level-3 group G_r. That is, level-1 group G_3f and processor core P44 do not belong to any level-2 group. This example also shows that HMP scheduler 31 may assign tasks directly to level-2 group G_1i, level-1 group G_3f and processor core P44.

Fig. 6A is a schematic diagram of an example of a multi-cluster system 60 in a two-level hierarchy according to one embodiment. The hierarchy may be formed from some of the clusters and processor cores in Fig. 1, and some of these clusters and processor cores may have the energy efficiency curves shown in Figs. 2A and 2B. In this example, the hierarchy of groups contains only two levels: a root level and a leaf level. To avoid confusion with the embodiments described in Figs. 3-5, the groups of the two levels are referred to as leaf-level groups and a root-level group. Root-level group G_r has an HMP architecture and uses root-level scheduler 41 to assign tasks to the leaf-level groups. Leaf-level group G₁ and leaf-level group G₂ are directly under root-level group G_r. The processor cores in the same leaf-level group may be identical or different. In this embodiment, leaf-level group G₂ includes identical processor cores and uses SMP scheduler 13 to assign tasks to them. Leaf-level group G₁ uses cross-cluster scheduler 42 to assign tasks to its processor cores; G₁ includes processor cores with different energy efficiency characteristics, and the differences between their energy efficiency curves (e.g., curves 210 and 220 in Fig. 2A, or curves 260 and 270 in Fig. 2B) are within a threshold. In leaf-level group G₁, interrupts may be handled by a fixed processor core. Additionally, at least one processor core of leaf-level group G₁ is online (i.e., active). In one embodiment, cross-cluster scheduler 42 is implemented as an AMP scheduler, but the invention is not limited thereto; it may be implemented as any scheduler that can handle the differences between processor cores in different clusters. Furthermore, in an alternative embodiment, leaf-level group G₁ includes a single cluster that contains processor cores of different types, or processor cores that differ but have adjacent energy efficiency characteristics. In such an embodiment, cross-cluster scheduler 42 is replaced by a scheduler that assigns tasks to the processor cores in the single cluster.

In the two-level embodiment shown in Fig. 6A, any two processor cores in the same leaf-level group (e.g., G₁ or G₂) have closer energy efficiency curves than any two processor cores in different leaf-level groups. Additionally, in at least one of the leaf-level groups (e.g., G₁), the energy efficiency curve of each processor core defines at least one predetermined frequency spot that intersects or is adjacent to at least one other processor core in the same leaf-level group. Furthermore, in each leaf-level group, the energy efficiency curve of each processor core is at least a threshold distance away from the energy efficiency curve of each processor core in a different leaf-level group.

In the embodiment of Fig. 6A, root-level scheduler 41 may be an HMP scheduler. In another embodiment, root-level scheduler 41 may be an IKS scheduler. An IKS scheduler assigns tasks to pairs of processor cores. For example, each pair may include processor cores P11 and P33, or P22 and P33; that is, two processor cores with different energy efficiency characteristics and different computing performance. When IKS scheduling is used, only one processor core of each pair is active.

Fig. 6B is a schematic diagram of an example of a multi-cluster system 65 in a two-level hierarchy according to another embodiment. The hierarchy may be formed from some of the clusters and processor cores in Fig. 1, and some of these clusters and processor cores may have the energy efficiency curves shown in Figs. 2A and 2B. In this example, root-level group G_r has an HMP architecture and uses root-level scheduler 41 to assign tasks to the leaf-level groups or to the processor cores therein. Leaf-level groups G_1f, G_2f, G_3f and G_4f are directly under root-level group G_r. Each leaf-level group includes identical processor cores and uses SMP schedulers 11, 12, 13 and 14, respectively, to assign tasks to the leaf-level group or the processor cores therein. Root-level scheduler 41 may be an HMP scheduler or an IKS scheduler.

In another alternative embodiment, the hierarchy of groups may include more than three levels; for example, the leaf level has an SMP architecture, multiple intermediate levels have an AMP architecture, and the root level has an HMP architecture. Similar to the three-level example in Fig. 3, the root level uses HMP scheduler 31 to assign tasks to the intermediate level directly below it (specifically, to the intermediate-level groups or the processor cores therein), and each intermediate level uses an AMP scheduler (e.g., AMP scheduler 21 or 22) to assign tasks to the level directly below it (specifically, to the groups of that level or the processor cores therein). The leaf level uses SMP schedulers (e.g., SMP schedulers 11, 12, 13 and 14) to assign tasks to the processor cores in each leaf-level group or level-1 group.

Additionally, in any of the above embodiments, a scheduler may assign heavy tasks (e.g., when the number of threads or tasks exceeds a threshold) or urgent tasks (e.g., when the allowable delay or latency is below a threshold) to a processor group, or the processor cores therein, having higher computing performance, and may assign light or non-urgent tasks (the opposite of heavy or urgent tasks) to another processor group having lower computing performance. Using the example in Fig. 3, assume that the computing performance of P33 (in G_3f) is higher than that of P11 and P12 (in G_1f and G_2f, respectively). When system 30 receives heavy or urgent tasks and level-1 group G_3f is not active, system 30 may activate one or more processor cores in G_3f according to the number of heavy and urgent tasks, or according to the total load of the heavy and urgent tasks. System 30 may use the multi-level schedulers described above to assign tasks to the active processor groups in system 30 or the processor cores therein. At the intermediate level of the hierarchy, low-to-medium performance requirements may be met by AMP scheduling, and at the root level of the hierarchy, high or immediate performance requirements may be met by HMP scheduling. Within each level-1 group, SMP scheduling balances the load among the processor cores.
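The heavy/urgent rule described above can be expressed as a small sketch. The function `pick_group`, its parameters and the threshold values are hypothetical and chosen only for illustration; the description does not specify concrete thresholds or units.

```python
def pick_group(num_threads, allowed_delay_ms, thread_threshold, delay_threshold_ms):
    """Choose a processor group class for a workload (illustrative only).

    A workload counts as heavy when its thread count exceeds the threshold,
    and as urgent when its allowable delay is below the threshold.
    """
    heavy = num_threads > thread_threshold
    urgent = allowed_delay_ms < delay_threshold_ms
    return "high-performance" if heavy or urgent else "low-performance"


# Assumed thresholds: more than 4 threads is heavy, under 10 ms is urgent.
heavy_case = pick_group(8, 100, thread_threshold=4, delay_threshold_ms=10)
urgent_case = pick_group(2, 5, thread_threshold=4, delay_threshold_ms=10)
light_case = pick_group(2, 100, thread_threshold=4, delay_threshold_ms=10)
```

In the Fig. 3 example, a "high-performance" result would correspond to a group such as G_3f, which may first need to be activated if it is not already active.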

Owing to the hierarchical architecture, processor cores can be scheduled, activated, and assigned tasks or requests more efficiently. For example, the different energy efficiency characteristics may be taken into account in the scheduling of a level-2 group. In one embodiment, using an AMP scheduler, interrupt requests can be flexibly arranged to be handled by different processor cores in different clusters. Correspondingly, it can further be determined whether to deactivate processor cores that are not needed. As a result, overall system performance or power consumption may also be improved, and the system as a whole can achieve energy efficiency.

The following description uses clusters Cluster(0) and Cluster(1) of Fig. 1 as examples to explain the aforementioned AMP architecture and AMP scheduling. In the following description, Cluster(0), Cluster(1) and the other clusters shown in Fig. 1 are all scheduled by an AMP scheduler.

Referring again to Fig. 1, multi-cluster system 100 includes a first cluster that is currently active (Cluster(0)), and the first cluster includes one or more first processor cores. When multi-cluster system 100 detects an event in which the current operating frequency of the first cluster (i.e., the active cluster) enters or crosses any of one or more predetermined frequency spots of the first cluster, multi-cluster system 100 performs the following steps: (1) identifying a second cluster (i.e., a target cluster), such as Cluster(1), that includes one or more second processor cores, where each first processor core in the first cluster and each second processor core in the second cluster have different energy efficiency characteristics; (2) activating at least one second processor core in the second cluster; (3) determining whether to migrate one or more interrupt requests from the first cluster to the second cluster; and (4) determining, according to performance and power requirements, whether to deactivate at least one first processor core in the first cluster. In one embodiment, the second cluster is identified as the cluster associated with the predetermined frequency spot that was entered or crossed. If the second cluster is already active before step (2), the target cluster is kept in the active state. If the second cluster is not active before step (2), the target cluster is switched to the active state (i.e., activated).

In one embodiment, the aforementioned event may indicate that the first cluster is not operating energy-efficiently. In this multi-cluster system, each cluster may be associated with one or more respective predetermined frequency spots. The event is detected when the current operating frequency of the first cluster enters or crosses (i.e., passes through) any of the one or more predetermined frequency spots of the first cluster. The respective predetermined frequency spots may be determined according to the respective energy efficiency characteristics of the processor cores in the clusters. Each of these frequency spots may be a boundary region of a predetermined frequency range within which the first cluster operates energy-efficiently. The frequency ranges and frequency spots may be predefined by the designer or manufacturer of the processor cores. Additionally, in some embodiments, the multi-cluster system includes a voltage regulator that controls the voltage supplied to the processor cores of the different processor types in the system. Compared with a system in which each cluster or each processor core has its own voltage regulator, a single voltage regulator for the whole system can save hardware cost. However, the invention is not limited to a single voltage regulator or to multiple voltage regulators.

When the above event is detected, interrupt requests may be migrated to another processor core in the second cluster. In some cases, when the event is detected, the first cluster may be deactivated after the interrupt requests have been migrated to the processor core in the second cluster. If the interrupt requests are migrated to the second cluster, the system may or may not keep the first cluster active, depending on the system workload. The system therefore does not need to keep a fixed processor core or a fixed cluster in the operating state at all times to handle interrupt requests, and can thus operate more efficiently. The decisions on whether to migrate the interrupt requests and whether to deactivate the first cluster may depend on several factors described in detail below.

In one embodiment, at least one of step (3), determining whether to migrate one or more interrupt requests from the first cluster to the second cluster, and step (4), determining according to performance and power requirements whether to deactivate the currently active cluster, depends on a comparison between the required number of active processor cores and the total number of active processor cores in the multi-cluster system. In other words, at least one of the determinations in steps (3) and (4) is made according to the required number of active processor cores and the total number of processor cores that are active in the multi-cluster system.

In one embodiment, one factor in determining the required number of active processor cores, or whether to deactivate an active cluster, is the number of threads or tasks that need to be processed. An indicator called hTLP represents the number of loaded threads or tasks, where h stands for load and TLP stands for "thread level parallelism" or "task level parallelism". The load may be a percentage or a ratio (e.g., 50%, 80% or 100%). hTLP indicates the required number of active processor cores for handling the system workload. In one embodiment, the required number of active processor cores may be obtained or calculated from the number of threads or tasks whose load is greater than a threshold (e.g., the number of threads or tasks that the system needs to process multiplied by the load). When the system workload increases while the number of active processor cores stays the same, the load increases. When the load exceeds a predetermined threshold, more processor cores or more clusters may be activated to keep the load below the threshold. In one embodiment, the number of active processor cores and active clusters needed in the system is determined based on the value of hTLP.
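One possible reading of the hTLP indicator is sketched below. It assumes that each thread or task carries a load between 0.0 and 1.0 and that hTLP counts those whose load exceeds a threshold; the functions `htlp` and `needs_more_cores` and the sample loads are hypothetical, and other ways of weighting the load are equally compatible with the description.

```python
def htlp(thread_loads, load_threshold):
    """Count threads or tasks whose load exceeds the threshold (assumed definition)."""
    return sum(1 for load in thread_loads if load > load_threshold)


def needs_more_cores(required_cores, active_cores):
    """True when the required number of active cores exceeds those active."""
    return required_cores > active_cores


# Assumed per-thread loads; four of the five exceed the 50% threshold.
required = htlp([0.9, 0.8, 0.1, 0.6, 0.95], load_threshold=0.5)
activate_more = needs_more_cores(required, active_cores=4)
```

With four cores active and `required` equal to four, no additional core or cluster would need to be activated in this sketch.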

In this example, all LLPs in Cluster(0) are active, and all LPs in Cluster(1) are deactivated. In addition, one of the LLP processor cores is the interrupt-handling processor (shown as a white square with diagonal lines). For ease of description, all other clusters Cluster(2), Cluster(3), ..., Cluster(M) are deactivated in this example. It should be understood, however, that any cluster may be active at any given time. As will be described later, when the operating frequency changes, another cluster in multi-cluster system 100 (referred to as the "second cluster" or "target cluster") may be activated, and one or more processor cores in the second cluster may take over the role of interrupt handling. Depending on whether multi-cluster system 100 has enough active processor cores to handle the current or upcoming workload, Cluster(0) may remain active or be deactivated.

In a system with more than two clusters, when interrupt requests are migrated from one cluster to another, the migration may be direct or indirect. For example, the current operating frequency may increase from the peak performance frequency range of Cluster(0) to the peak performance frequency range of Cluster(2), while the peak performance frequency range of Cluster(1) is closer to that of Cluster(0) than the peak performance frequency range of Cluster(2) is. The interrupt requests may be migrated directly from Cluster(0) to Cluster(2). Alternatively, the interrupt requests may first be migrated from Cluster(0) to Cluster(1), and then from Cluster(1) to Cluster(2).
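The difference between direct and indirect migration can be shown with a short sketch. It assumes that clusters are indexed in order of their peak performance frequency ranges and that an indirect migration visits every intermediate cluster; the helper `migration_path` is hypothetical.

```python
def migration_path(src, dst, direct=True):
    """Return the sequence of cluster indices the interrupt requests visit."""
    if src == dst:
        return [src]
    if direct:
        return [src, dst]
    # Indirect: step through each neighbouring cluster on the way.
    step = 1 if dst > src else -1
    return list(range(src, dst + step, step))


direct_path = migration_path(0, 2, direct=True)
indirect_path = migration_path(0, 2, direct=False)
```

Here `direct_path` corresponds to Cluster(0) to Cluster(2), and `indirect_path` to Cluster(0), then Cluster(1), then Cluster(2).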

The following description, in connection with Figs. 7-14, provides details of the AMP architecture and AMP scheduling. Fig. 7 is a schematic diagram of the transitions among four operating scenarios of two clusters (e.g., Cluster(0) and Cluster(1)) according to an embodiment of the invention. Cluster(0) and Cluster(1) have substantially the same computing performance and different energy efficiency characteristics; for example, their energy efficiency curves may be the curves shown in Fig. 2A or Fig. 2B (e.g., curves 210 and 220, or curves 260 and 270). Fig. 7 shows four operating scenarios: (S1) and (S3) are high-performance scenarios in which both clusters are active; (S2) is a low-performance scenario in which only Cluster(0) is active; and (S4) is an intermediate-performance scenario in which only Cluster(1) is active. Each scenario may transition to any other scenario. (S2) and (S4) both operate under the condition hTLP ≤ N, and (S1) and (S3) both operate under the condition hTLP > N, where N = 4 in this example (the total number of processor cores in one cluster).

Hereinafter, the "first cluster" denotes the cluster that is initially active; the first cluster is therefore also referred to as the "active cluster". The "second cluster" denotes a cluster different from the first cluster, and is also referred to as the "target cluster". In scenarios where both clusters are initially active (e.g., (S1) and (S3)), the cluster that initially handles all interrupt requests is referred to as the first cluster. A switch between any two scenarios is triggered when the current operating frequency enters or crosses a frequency spot of the first cluster.

In (S2), only Cluster(0) is active, and its interrupt-handling processor core (shown as a square with diagonal lines) may handle all interrupt requests. When the current operating frequency enters SP(0,1), or crosses SP(0,1) from the frequency range of Cluster(0) into the frequency range of Cluster(1), management module 120 activates Cluster(1) and determines whether to migrate the interrupt requests to Cluster(1) and whether to deactivate Cluster(0). These determinations are made according to which operating scenario the system is currently in and which operating scenario the system is about to enter. The system may transition from (S2) to (S4), in which all interrupt requests are migrated to a processor core of Cluster(1) (shown as the square with diagonal lines in scenario (S4)) and Cluster(0) is deactivated. Similarly, if the system is initially in (S4), it may transition from (S4) to (S2) when the current operating frequency enters SP(0,1), or crosses SP(0,1) from the frequency range of Cluster(1) into the frequency range of Cluster(0). In that transition, all interrupt requests are migrated to a processor core in Cluster(0), and Cluster(1) is deactivated.

In short, a transition in either direction between (S2) and (S4) occurs when the second cluster is initially deactivated before the transition and the required number of active processor cores (i.e., hTLP) is less than or equal to the total number of active processor cores in the first cluster. A transition in either direction between (S2) and (S4) means that the interrupt requests are migrated from the first cluster to the second cluster, and that after the transition the second cluster is active and the first cluster is deactivated.

In (S1) and (S3), Cluster(0) and Cluster(1) are both initially active. In (S1), Cluster(0) has the interrupt-handling processor core (shown as a square with diagonal lines), which may handle all interrupt requests. When the current operating frequency enters SP(0,1), or crosses SP(0,1) from the frequency range of Cluster(0) into the frequency range of Cluster(1), management module 120 determines whether to migrate the interrupt requests to Cluster(1) and whether to deactivate Cluster(0). These determinations may be made according to which operating scenario the system is currently in and which operating scenario the system is about to enter. The system may transition from (S1) to (S3), in which all interrupt requests are migrated to a processor core of Cluster(1) (shown as the square with diagonal lines in scenario (S3)). Similarly, the system may transition from (S3) to (S1) when the current operating frequency enters SP(0,1), or crosses SP(0,1) from the frequency range of Cluster(1) into the frequency range of Cluster(0). In the transition from (S3) to (S1), the interrupt requests are migrated to a processor core of Cluster(0).

In short, a transition in either direction between (S1) and (S3) occurs when the second cluster is initially active before the transition and the required number of active processor cores (i.e., hTLP) is greater than the total number of active processor cores in the first cluster. A transition in either direction between (S1) and (S3) means that the interrupt requests are migrated from the first cluster to the second cluster, and that both the first cluster and the second cluster remain active.

The system may also transition between the left side and the right side of Fig. 7. For example, the transition from (S2) to (S1) and the transition from (S4) to (S3) occur when the second cluster is initially deactivated before the transition and the required number of active processor cores (i.e., hTLP) increases to more than the total number of active processor cores in the first cluster. After either of these transitions, the second cluster is active and the interrupt requests are not migrated; that is, they are handled by the same cluster as before the transition.

Conversely, the transition from (S1) to (S2) and the transition from (S3) to (S4) occur when the second cluster is initially active before the transition and the required number of active processor cores (i.e., hTLP) decreases to less than or equal to the total number of active processor cores in the first cluster. After either of these transitions, the second cluster is deactivated and the interrupt requests are handled by the same cluster as before the transition.
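The four scenarios of Fig. 7 and the transitions among them can be summarized as a small state machine. The sketch below is a simplified model under stated assumptions: a scenario is fully described by which cluster holds the interrupt-handling processor and whether both clusters are active, a crossing of SP(0,1) always moves interrupt handling to the other cluster, and both clusters are active exactly when hTLP exceeds N. The tables and the function `next_scenario` are hypothetical names.

```python
# Cluster that handles interrupts in each scenario of Fig. 7.
IHP_CLUSTER = {"S1": 0, "S2": 0, "S3": 1, "S4": 1}
# (interrupt-handling cluster, both clusters active?) -> scenario.
SCENARIO = {(0, True): "S1", (0, False): "S2", (1, True): "S3", (1, False): "S4"}


def next_scenario(current, sp_crossed, htlp, n):
    """Return the scenario entered from `current` (simplified model)."""
    ihp = IHP_CLUSTER[current]
    if sp_crossed:
        ihp = 1 - ihp  # interrupt requests migrate to the other cluster
    both_active = htlp > n  # more cores required than one cluster provides
    return SCENARIO[(ihp, both_active)]


# N = 4 cores per cluster, as in the example.
s2_to_s4 = next_scenario("S2", sp_crossed=True, htlp=3, n=4)
s1_to_s3 = next_scenario("S1", sp_crossed=True, htlp=6, n=4)
s2_to_s1 = next_scenario("S2", sp_crossed=False, htlp=6, n=4)
s3_to_s4 = next_scenario("S3", sp_crossed=False, htlp=4, n=4)
```

The model also covers combined changes, such as (S2) to (S3) when the frequency spot is crossed while hTLP rises above N, consistent with the statement that each scenario may transition to any other.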

Although Fig. 7 and the subsequent figures show only one processor core handling interrupt requests at a time, in some cases interrupt requests may be handled by more than one processor core at the same time. Therefore, in some embodiments, when the second cluster is activated, the system identifies one or more target processor cores among the second processor cores in the second cluster, and migrates the interrupt requests from the first cluster to those one or more target processor cores in the second cluster.

Fig. 8 is a schematic flowchart of a method 400 for efficiently operating a level-2 group using AMP scheduling according to an embodiment of the invention. Method 400 is performed by multi-cluster system 100; for example, by management module 120 in Fig. 1. In this example, it is assumed that the level-2 group includes (M+1) clusters and, without loss of generality, that when sorted from low to high frequency by their respective peak performance frequency ranges, the (M+1) clusters follow the order Cluster(0) < Cluster(1) < Cluster(2) < ... < Cluster(M). It is also assumed in this example that when hTLP ≤ N (or, equivalently, when the required number of active processor cores is less than or equal to N), the system workload can be handled efficiently by a single cluster Cluster(m), where N is the number of processor cores in Cluster(m) and m is an integer index with 0 ≤ m ≤ M.

Initially, in block 410, Cluster(m) is active and has an interrupt-handling processor core (referred to herein as the IHP). Cluster(m) is associated with SP(m-1, m) and SP(m, m+1), which are the lower boundary and the upper boundary, respectively, of the predetermined frequency range of Cluster(m). For simplicity, the operating frequency is referred to herein as OPFreq. The expression "OPFreq ≤ SP(i, j)", or its equivalent, means that the operating frequency is within frequency spot SP(i, j) or below it. Correspondingly, "OPFreq > SP(i, j)", or its equivalent, means that the operating frequency has increased and crossed SP(i, j).

If multi-cluster system 100 detects that OPFreq ≤ SP(m-1, m) and hTLP ≤ N, the condition specified in block 420 is met, and the system proceeds to block 425 to find the SP(i-1, i) closest to OPFreq such that OPFreq ≤ SP(i-1, i), where 1 ≤ i ≤ m. Multi-cluster system 100 then activates Cluster(i-1), switches the IHP from Cluster(m) to Cluster(i-1), migrates the tasks to Cluster(i-1) and deactivates Cluster(m). If the condition specified in block 420 is not met, the system proceeds to block 430.

In block 430, if multi-cluster system 100 detects that OPFreq > SP(m, m+1) and hTLP ≤ N, the condition specified in block 430 is met, and the system proceeds to block 435 to find the SP(j, j+1) closest to OPFreq such that OPFreq > SP(j, j+1), where m ≤ j ≤ M. Multi-cluster system 100 then activates Cluster(j+1), switches the IHP from Cluster(m) to Cluster(j+1), migrates tasks to Cluster(j+1), and deactivates Cluster(m). If the condition specified in block 430 is not met, the system proceeds to block 440.

Using the example of Fig. 7, the transition from block 410 to block 420 corresponds to the transition from (S4) to (S2), and the transition from block 410 to block 430 corresponds to the transition from (S2) to (S4).
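
The target-cluster selection of blocks 420-435 can be illustrated with a minimal sketch. It is not the patented implementation; it assumes that each frequency point region SP(k, k+1) can be reduced to a single scalar threshold `sp[k]` in ascending order, and that j is restricted to values for which Cluster(j+1) exists. The function name `pick_target_cluster` and its parameters are hypothetical.

```python
from bisect import bisect_left

def pick_target_cluster(op_freq, h_tlp, m, sp, n):
    """Return the index of the cluster to switch to, or None to stay on Cluster(m).

    Assumption: sp[k] is a scalar stand-in for SP(k, k+1), sorted ascending,
    so there are M = len(sp) boundaries between the M+1 clusters.
    """
    big_m = len(sp)
    if h_tlp > n:
        return None  # hTLP > N is the multi-cluster case (blocks 440/450)
    # Block 420: OPFreq <= SP(m-1, m), the lower boundary of Cluster(m).
    if m >= 1 and op_freq <= sp[m - 1]:
        # Block 425: nearest SP(i-1, i) at or above OPFreq, 1 <= i <= m.
        k = bisect_left(sp, op_freq, 0, m)  # first k with sp[k] >= op_freq
        return k  # k == i-1, i.e. Cluster(i-1)
    # Block 430: OPFreq > SP(m, m+1), the upper boundary of Cluster(m).
    if m < big_m and op_freq > sp[m]:
        # Block 435: nearest SP(j, j+1) strictly below OPFreq, j >= m.
        j = bisect_left(sp, op_freq, m, big_m) - 1
        return j + 1  # Cluster(j+1)
    return None  # OPFreq is still inside the range of Cluster(m)

# Four clusters (M = 3) with assumed boundaries in GHz.
boundaries = [1.0, 1.5, 2.0]
print(pick_target_cluster(0.8, 2, 2, boundaries, 4))
print(pick_target_cluster(1.7, 2, 0, boundaries, 4))
```

In this sketch the caller would then activate the returned cluster, move the IHP and the tasks to it, and deactivate Cluster(m), in the order the text describes.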

In block 440, if multi-cluster system 100 detects that OPFreq ≤ SP(m-1, m) and hTLP > N, the condition specified in block 440 is met, and the system proceeds to block 445 to activate clusters in either of the following orders: (1) Cluster(m-1), Cluster(m-2), Cluster(m-3), etc., or (2) Cluster(m-1), Cluster(m+1), Cluster(m-2), Cluster(m+2), etc., until the activated processor cores in multi-cluster system 100 provide enough processing capacity to support the system workload; in other words, until the total number of activated processor cores is greater than or equal to the required number of activated processor cores. In this case, Cluster(m) remains active. The IHP may stay in the same cluster as before the transition to block 445 (for example, Cluster(m)); an example of this transition is the transition from (S4) to (S3) in Fig. 7. Alternatively, on the transition to block 445, a processor core in another activated cluster may take over the role of the IHP; an example of this transition is the transition from (S4) directly to (S1), or from (S4) to (S1) via (S3).

If the condition specified in block 440 is not met, multi-cluster system 100 proceeds to block 450. In block 450, if multi-cluster system 100 detects that OPFreq > SP(m, m+1) and hTLP > N, the condition specified in block 450 is met, and system 100 proceeds to block 455 to activate clusters in either of the following orders: (1) Cluster(m+1), Cluster(m+2), Cluster(m+3), etc., or (2) Cluster(m+1), Cluster(m-1), Cluster(m+2), Cluster(m-2), etc., until the activated processor cores in multi-cluster system 100 provide enough processing capacity to support the system workload; in other words, until the total number of activated processor cores is greater than or equal to the required number of activated processor cores. In this case, Cluster(m) remains active. The IHP may stay in the same cluster as before the transition to block 455 (for example, Cluster(m)); an example of this transition is the transition from (S2) to (S1) in Fig. 7. Alternatively, on the transition to block 455, a processor core in another activated cluster may take over the role of the IHP; an example of this transition is the transition from (S2) directly to (S3), or from (S2) to (S3) via (S4).

If the condition specified in block 450 is not met, multi-cluster system 100 may return to block 410. Method 400 may be repeated at fixed time intervals, when a new operating event is detected, or when a change in the operating frequency or the system workload is detected.

As shown in blocks 445 and 455, when condition 440 or 450 is met, the system may activate more than one cluster. Within each activated cluster, the system may decide to activate all of the processor cores in that cluster or fewer than all of them. In one embodiment, multi-cluster system 100 may decide whether to activate or deactivate one or more processor cores in one or more clusters other than the activated cluster and the target cluster. This decision may depend on the required number of activated processor cores.
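
The activation orders of blocks 445 and 455, together with the option of activating fewer than all cores in a cluster, can be sketched as follows. This is an illustration under stated assumptions rather than the method itself: the helper names `activation_order` and `plan_activation`, the `cores` list of per-cluster core counts, and the choice to trim only the last cluster to the number of cores still needed are all hypothetical.

```python
def activation_order(m, big_m, toward_lower, alternate):
    """List cluster indices in the order they would be activated around Cluster(m).

    toward_lower=True models block 445 (start at Cluster(m-1));
    toward_lower=False models block 455 (start at Cluster(m+1)).
    alternate=True models order (2), which swings to the opposite side each step.
    """
    first = -1 if toward_lower else 1
    order = []
    for distance in range(1, big_m + 1):
        steps = (first * distance, -first * distance) if alternate else (first * distance,)
        for step in steps:
            candidate = m + step
            if 0 <= candidate <= big_m:  # skip indices outside Cluster(0)..Cluster(M)
                order.append(candidate)
    return order

def plan_activation(m, cores, required, toward_lower, alternate=False):
    """Return (cluster, active_core_count) pairs until `required` cores are active."""
    plan = [(m, cores[m])]  # Cluster(m) remains active
    total = cores[m]
    for cluster in activation_order(m, len(cores) - 1, toward_lower, alternate):
        if total >= required:
            break
        use = min(cores[cluster], required - total)  # may be fewer than all cores
        plan.append((cluster, use))
        total += use
    return plan

# Four clusters of 4, 4, 2, and 2 cores; Cluster(1) is active and 9 cores are required.
print(plan_activation(1, [4, 4, 2, 2], 9, toward_lower=False, alternate=True))
```

If the required count exceeds the cores available in every cluster, the sketch simply returns the full set; how a real system reacts in that case is not stated in the text.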

Fig. 9 is a schematic diagram of the multi-cluster system 100 of Fig. 1 with additional detail of the interrupt-handling function, according to an embodiment of the invention. In this embodiment, multi-cluster system 100 includes a global interrupt controller (GIC) 510, which is coupled to each processor core in multi-cluster system 100 and to multiple devices 520 and 530. Devices 520 are on-chip, that is, on the same system-on-chip (SOC) 550 as the (M+1) clusters; devices 530 are off-chip. Examples of devices 520 include, but are not limited to, graphics processors, signal processors, and the like. Examples of devices 530 include, but are not limited to, system memory, input/output devices, and the like. Devices 520 and 530, as well as the processor cores in the clusters, can generate interrupt requests and send them to GIC 510. GIC 510 forwards the interrupt requests to the interrupt-handling processor core (IHP 570). In one embodiment, in response to a determination to migrate interrupt requests (for example, on the transition from (S2) to (S4) in Fig. 7, or from (S1) to (S3)), the interrupt-handling role of IHP 570 is transferred to one or more processor cores in another activated cluster.

Fig. 10 is a schematic diagram of a method of migrating interrupt requests and tasks from a processor core in a first cluster, Cluster(0) (denoted LLP 620), to a processor core in a second cluster, Cluster(1) (denoted LP 630), according to an embodiment of the invention. In block 611, management module 120 detects a condition for migrating interrupt requests from LLP 620 of Cluster(0) to LP 630 of Cluster(1), and then deactivating LLP 620; an example is the condition for a transition between (S2) and (S4) in Fig. 7. Upon detecting this event, management module 120 notifies LLP 620 in block 612 to stop receiving interrupts. After receiving the notification from management module 120 in block 621, LLP 620 stops receiving new interrupts and completes its current interrupt handling, if any. In block 622, LLP 620 notifies LP 630 to prepare for the interrupt request migration and waits for an acknowledgement (ACK) from LP 630. When LP 630 receives the notification from LLP 620 in block 631, LP 630 returns an ACK to LLP 620 in block 632. Then, in block 633, LP 630 waits for an ACK from LLP 620 before continuing its work.

After LLP 620 receives the ACK from LP 630, in block 623 LLP 620 transfers the interrupt-handling function to LP 630 by setting GIC 510 so that all future interrupt requests are forwarded to LP 630. In block 624, LLP 620 sets GIC 510 to enable the interrupt-handling function of LP 630. In block 625, LLP 620 notifies LP 630 to continue its work. In block 634, upon receiving the notification, LP 630 continues its work.

Following or immediately after the interrupt-handling migration, in block 641 task allocation module 140 starts migrating the tasks on LLP 620 to other activated CPUs. In block 642, task allocation module 140 stops assigning tasks to LLP 620, so that LLP 620 becomes idle. Then, in block 643, task allocation module 140 deactivates LLP 620. In an alternative embodiment, some of the operations in blocks 641-643 may be performed by management module 120.
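
The ordering of the Fig. 10 handshake can be made concrete with a small single-threaded simulation. It is only a sketch: the `Core` and `Gic` classes, their fields, and the function `migrate_interrupts` are hypothetical stand-ins, the notifications and ACKs are modelled as plain sequential steps rather than real inter-processor messages, and all tasks are moved to LP because it is the only other active core in this toy setup.

```python
from dataclasses import dataclass, field

@dataclass
class Core:
    name: str
    active: bool = True
    accepting_irqs: bool = False
    tasks: list = field(default_factory=list)

@dataclass
class Gic:
    target: str                                  # core that interrupt requests are forwarded to
    enabled: set = field(default_factory=set)    # cores with interrupt handling enabled

def migrate_interrupts(gic, llp, lp):
    """Replay the Fig. 10 sequence and return the block numbers in execution order."""
    trace = [612]                 # management module tells LLP to stop receiving interrupts
    llp.accepting_irqs = False    # 621: LLP stops taking new interrupts, finishes current one
    trace.append(621)
    trace.append(622)             # LLP notifies LP and waits for an ACK
    lp.active = True              # assumption: LP is already powered up by this point
    trace += [631, 632, 633]      # LP receives the notice, returns an ACK, then waits
    gic.target = lp.name          # 623: future interrupt requests are forwarded to LP
    trace.append(623)
    gic.enabled.add(lp.name)      # 624: enable interrupt handling on LP
    lp.accepting_irqs = True
    trace.append(624)
    trace += [625, 634]           # LLP tells LP to continue; LP resumes its work
    lp.tasks.extend(llp.tasks)    # 641: tasks migrate off LLP
    llp.tasks.clear()
    trace.append(641)
    trace.append(642)             # no new tasks are assigned to LLP, so it goes idle
    llp.active = False            # 643: LLP is deactivated
    trace.append(643)
    return trace

gic = Gic(target="LLP", enabled={"LLP"})
llp = Core("LLP", accepting_irqs=True, tasks=["t1", "t2"])
lp = Core("LP", active=False)
print(migrate_interrupts(gic, llp, lp))
```

The point of the ordering is that LLP stops accepting interrupts before the GIC is retargeted, and is deactivated only after both interrupts and tasks have moved.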

Fig. 11 is a schematic diagram of transitions between cluster Cluster(0) and cluster Cluster(1), according to an embodiment of the invention. As shown in the low-performance scenario (S5), three quarters of the processor cores in Cluster(0) are activated, and each activated processor core is partially loaded (indicated by a dashed line across the processor core). The workload is balanced among the three activated processor cores in Cluster(0). The transition from (S5) to the intermediate-performance scenario (S6) occurs under the same conditions as the transition from (S2) to (S4) in Fig. 7. In the transition from (S5) to (S6), Cluster(0) is deactivated and Cluster(1) is activated. In addition, interrupt requests are migrated from Cluster(0) to Cluster(1). After this transition, the workload is balanced among the three activated processor cores in Cluster(1). Similarly, the transition from (S6) to (S5) occurs under the same conditions as the transition from (S4) to (S2) in Fig. 7. In the transition from (S6) to (S5), Cluster(1) is deactivated and Cluster(0) is activated. In addition, interrupt requests are migrated from Cluster(1) to Cluster(0).

This example shows that a transition can occur even when the first cluster (that is, the currently activated cluster) has one or more deactivated processor cores. It also shows that the number of activated processor cores in the second cluster (that is, the target cluster) depends on the current system workload. Therefore, when the processing capacity of some processor cores in the second cluster is not currently needed, those processor cores can remain deactivated. Before and after the transition, the workload is balanced among the activated processor cores in each cluster.
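
As a rough illustration of how the number of activated cores in the target cluster could follow from the workload, the sketch below uses a deliberately simple capacity model that is not taken from the source: the workload and the per-core capacity are assumed to be expressed in the same arbitrary unit, and the function `activate_and_balance` is hypothetical.

```python
from math import ceil

def activate_and_balance(total_load, per_core_capacity, cores_in_target):
    """Return (activated_core_count, per-core load shares) for the target cluster.

    Assumption: a core can absorb up to per_core_capacity units of load, and the
    load is split evenly across the activated cores. At least one core stays on.
    """
    needed = min(cores_in_target, max(1, ceil(total_load / per_core_capacity)))
    share = total_load / needed
    return needed, [share] * needed

# A four-core target cluster where the load needs three cores; one stays deactivated.
print(activate_and_balance(3, 1.25, 4))
```

With these assumed numbers the result has three active cores carrying equal shares, which matches the three-of-four picture described for (S5) and (S6).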

Fig. 12 is a schematic diagram of transitions between cluster Cluster(0) and cluster Cluster(1), according to another embodiment of the invention. In this example, scenarios (S5) and (S6) are the same as the scenarios in Fig. 11. However, instead of a direct transition from (S5) to (S6), there is an intermediate scenario (S7), in which the interrupt-handling processor core is migrated before the other processor cores of Cluster(0). This example demonstrates that the activation of one cluster and the deactivation of another cluster can be performed one processor core at a time, two processor cores at a time, or any number of processor cores at a time.

In scenario (S7), both clusters have one or more activated processor cores. In one embodiment, the two clusters may operate at the same frequency, for example within the frequency point region SP(0, 1). Alternatively, the two clusters may each operate within their respective frequency ranges, subject to the constraint that the operating frequency of Cluster(0) is not higher than the frequency range of SP(0, 1) (that is, on one side of SP(0, 1)) and the operating frequency of Cluster(1) is not lower than the frequency range of SP(0, 1) (that is, on the other side of SP(0, 1)), so that both clusters operate energy-efficiently. In some embodiments, if the difference between the operating frequencies of two different clusters is within a tolerance, a single voltage regulator can supply the two different clusters with two different operating frequencies.

Fig. 13 is a schematic diagram of transitions between cluster Cluster(0) and cluster Cluster(1), according to a further embodiment of the invention. This transition is similar to the transition from (S2) to (S1) in Fig. 7, except that an activated cluster may include one or more deactivated processor cores. This example shows the transition from (S5) to (S8), which occurs when the current operating frequency is higher than SP(0, 1) and hTLP is greater than the total number of processor cores in Cluster(0). In this transition, the deactivated processor core in Cluster(0) and one or more processor cores in Cluster(1) are activated. Not all processor cores in Cluster(1) need be activated, as long as the activated processor cores provide enough processing capacity to handle the workload. In this example, the interrupt-handling processor core remains in Cluster(0). The workload is balanced among the activated processor cores in each cluster.

Fig. 14 is a schematic diagram of transitions between cluster Cluster(0) and cluster Cluster(1), according to yet another embodiment of the invention. This transition is similar to the transition from (S4) to (S3) in Fig. 7, except that an activated cluster may include one or more deactivated processor cores. This example shows the transition from (S6) to (S9), which occurs when the current operating frequency is lower than SP(0, 1) and hTLP is greater than the total number of processor cores in Cluster(1). In this transition, the deactivated processor core in Cluster(1) and one or more processor cores in Cluster(0) are activated. Not all processor cores in Cluster(0) need be activated, as long as the activated processor cores provide enough processing capacity to handle the workload. In this example, the interrupt-handling processor core remains in Cluster(1). The workload is balanced among the activated processor cores in each cluster.

In some embodiments, the transitions described above can be triggered by usage scenarios. For example, multi-cluster system 100 may activate and deactivate different processor cores and clusters according to certain predetermined usage scenarios. For example, as shown in Fig. 7, the transition from (S1) to (S2) may be triggered by turning the screen on, and the transition from (S2) to (S1) may be triggered by turning the screen off. These usage scenarios can trigger any of the transitions shown in Figs. 11-14.

This specification has described various functional units or blocks. Those of ordinary skill in the art will appreciate that these functional blocks are preferably implemented by circuits (either dedicated circuits, or general-purpose circuits that operate under the control of one or more processors and coded instructions), which typically comprise transistors configured to control the operation of the circuit in accordance with the functions and operations described herein. The specific structure or interconnection of the transistors may be determined by a compiler, such as a register transfer language (RTL) compiler. An RTL compiler operates on scripts that closely resemble assembly language code and compiles them into a form used for the layout or fabrication of the final circuit. RTL is well known for its role and use in facilitating the design of electronic and digital systems.

Although the invention has been disclosed above in terms of several embodiments, those of ordinary skill in the art will appreciate that the invention is not limited to the embodiments described, and that modifications and substitutions can be made without departing from the spirit and scope of the appended claims. Accordingly, this specification is to be regarded as illustrative rather than restrictive.

Claims

1. A computing system, characterized by comprising:

a plurality of processor cores in a hierarchy of groups, the hierarchy of groups comprising:

a plurality of level-1 groups, each level-1 group including one or more of the plurality of processor cores having identical energy efficiency characteristics, and each level-1 group being assigned first tasks by a respective level-1 scheduler;

one or more level-2 groups, each of the one or more level-2 groups including a respective plurality of level-1 groups, the processor cores in different level-1 groups of the same level-2 group having different energy efficiency characteristics, and each of the one or more level-2 groups being assigned second tasks by a respective level-2 scheduler; and

a level-3 group including the one or more level-2 groups and being assigned third tasks by a level-3 scheduler.

2. The computing system of claim 1, characterized in that the level-3 group further comprises one or more level-1 groups that do not belong to any level-2 group.

3. The computing system of claim 1, characterized in that any two processor cores in different level-1 groups of the same level-2 group have energy efficiency characteristic curves that are closer to each other than those of any two processor cores in different level-1 groups of different level-2 groups.

4. The computing system of claim 1, characterized in that, in each of the level-2 groups, the energy efficiency characteristic curve of each level-1 group defines at least one predetermined frequency point region, the predetermined frequency point region intersecting or being adjacent to at least one other level-1 group in the same level-2 group.

5. The computing system of claim 1, characterized in that, in each of the level-2 groups, the energy efficiency characteristic curve of each level-1 group is separated by at least a threshold distance from the energy efficiency characteristic curve of each level-1 group in a different level-2 group.

6. The computing system of claim 1, characterized in that each level-1 group has a symmetric multiprocessing architecture.

7. The computing system of claim 1, characterized in that the respective level-1 scheduler is a symmetric multiprocessing scheduler.

8. The computing system of claim 1, characterized in that the level-3 group has a heterogeneous multiprocessing architecture.

9. The computing system of claim 1, characterized in that the level-3 scheduler is a heterogeneous multiprocessing scheduler.

10. The computing system of claim 1, characterized in that the level-3 scheduler is an in-kernel switch scheduler.

11. The computing system of claim 1, characterized in that each of the level-2 groups has an asymmetric multiprocessing architecture.

12. The computing system of claim 1, characterized in that the level-2 scheduler is an in-kernel switch scheduler.

13. The computing system of claim 1, characterized in that the level-2 scheduler is an asymmetric multiprocessing scheduler.

14. The computing system of claim 13, characterized in that, in each of the level-2 groups, each level-1 group has one or more predetermined frequency point regions, each predetermined frequency point region being associated with that level-1 group and another level-1 group in the same level-2 group.

15. The computing system of claim 13, characterized in that the asymmetric multiprocessing scheduler of the level-2 group is configured to detect an event in which the current operating frequency of an activated level-1 group in the level-2 group enters or crosses any one of the one or more predetermined frequency point regions of the activated level-1 group, wherein the activated level-1 group includes one or more first processor cores; and

upon detecting the event, the asymmetric multiprocessing scheduler is configured to:

identify a target level-1 group in the same level-2 group, wherein each first processor core in the activated level-1 group and each second processor core in the target level-1 group have different energy efficiency characteristics;

activate at least one second processor core in the target level-1 group;

determine whether to migrate one or more interrupt requests from the activated level-1 group to the target level-1 group; and

determine, based on performance and power requirements, whether to deactivate at least one first processor core of the activated level-1 group.

16. The computing system of claim 1, characterized in that, after each level-1 scheduler performs load balancing among the processor cores in the corresponding level-1 group, each of the one or more level-2 schedulers performs load balancing among the processor cores in the corresponding level-2 group, and then the level-3 scheduler performs load balancing among the processor cores in the level-3 group.

17. A computing system, characterized by comprising:

a plurality of processor cores in a hierarchy of groups, the hierarchy of groups comprising:

a plurality of level-1 groups, each level-1 group including one or more of the plurality of processor cores and having a symmetric multiprocessing architecture;

one or more level-2 groups, each of the one or more level-2 groups including a respective plurality of level-1 groups, the processor cores in different level-1 groups of the same level-2 group having different energy efficiency characteristics; and

a level-3 group including the one or more level-2 groups and having a heterogeneous multiprocessing architecture.

18. The computing system of claim 17, characterized in that the one or more level-2 groups are at one of a plurality of intermediate levels of the hierarchy of groups, each intermediate level including a respective one or more level-2 groups.

19. The computing system of claim 17, characterized in that each of the level-1 groups is assigned tasks by a respective symmetric multiprocessing scheduler.

20. The computing system of claim 17, characterized in that each of the one or more level-2 groups is assigned tasks by a respective level-2 scheduler.

21. The computing system of claim 17, characterized in that the level-3 group is assigned tasks by a level-3 scheduler.

22. A computing system, characterized by comprising:

a plurality of processor cores in a hierarchy of groups, the hierarchy of groups comprising:

one or more leaf-level groups, at least two of the processor cores in the same leaf-level group, for at least one of the leaf-level groups, having different energy efficiency characteristics, and each of the one or more leaf-level groups being assigned first tasks by a respective leaf-level scheduler; and

a root-level group including the one or more leaf-level groups and being assigned second tasks by a root-level scheduler.

23. The computing system of claim 22, characterized in that any two processor cores in the same leaf-level group have energy efficiency curves that are closer to each other than those of any two processor cores in different leaf-level groups.

24. The computing system of claim 22, characterized in that, in the at least one leaf-level group, the energy efficiency characteristic curve of each processor core defines at least one predetermined frequency point region, the predetermined frequency point region intersecting or being adjacent to at least one other processor core in the same leaf-level group.

25. The computing system of claim 22, characterized in that, in each of the leaf-level groups, the energy efficiency characteristic curve of each processor core is separated by at least a threshold distance from the energy efficiency characteristic curve of each processor core in a different leaf-level group.

26. The computing system of claim 22, characterized in that the leaf-level scheduler is an asymmetric multiprocessing scheduler.

27. The computing system of claim 22, characterized in that the root-level group has a heterogeneous multiprocessing architecture.

28. The computing system of claim 22, characterized in that the root-level scheduler is a heterogeneous multiprocessing scheduler.

29. The computing system of claim 22, characterized in that the root-level scheduler is an in-kernel switch scheduler.

30. The computing system of claim 22, characterized in that, after each leaf-level scheduler performs load balancing among the processors in the corresponding leaf-level group, the root-level scheduler performs load balancing among the processor cores in the root-level group.

31. A computing system, characterized by comprising:

a plurality of processor cores in a hierarchy of groups, the hierarchy of groups comprising:

one or more leaf-level groups, at least two processor cores in the same leaf-level group, for at least one of the leaf-level groups, having different energy efficiency characteristics; and

a root-level group including the one or more leaf-level groups and having a heterogeneous multiprocessing architecture.

32. The computing system of claim 31, characterized in that at least one of the leaf-level groups is assigned tasks by an asymmetric multiprocessing scheduler.

33. The computing system of claim 31, characterized in that the root-level group is assigned tasks by a root-level scheduler.