WO2012026034A1 - Scheduler, multi-core processor system, and scheduling method - Google Patents
Scheduler, multi-core processor system, and scheduling method
- Publication number
- WO2012026034A1 (PCT/JP2010/064566)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- priority
- task
- data
- processor
- execution
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/544—Buffers; Shared memory; Pipes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0875—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0888—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/28—Using a specific disk cache architecture
- G06F2212/283—Plural cache memories
Definitions
- the present invention relates to a scheduler, a multi-core processor system, and a scheduling method for performing multitask processing in parallel on a plurality of cores.
- a hierarchical memory configuration consisting of, for example, a cache memory, a main memory, and a file system has been adopted as the memory area that stores the data used by a processor when executing processing.
- the hierarchical memory configuration is expected to increase system speed by improving the speed of access to data.
- a cache memory, which operates faster than the other memories, has only a limited capacity, so the data stored in it is replaced using an algorithm such as LRU (Least Recently Used) (see, for example, Patent Document 1 below).
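The LRU replacement mentioned above can be sketched in a few lines. The following is a minimal illustration, not taken from the patent; the class name, `access` method, and capacity are hypothetical:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU replacement: the least recently used entry is evicted
    when the limited capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # insertion order tracks recency

    def access(self, key, value):
        # A hit refreshes the entry's recency; a miss inserts it.
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        # Evict the least recently used entry on overflow.
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

cache = LRUCache(2)
cache.access("a", 1)
cache.access("b", 2)
cache.access("a", 1)   # "a" becomes most recently used
cache.access("c", 3)   # evicts "b", the least recently used
```

Real hardware caches implement (approximations of) this policy per cache set; the dictionary here only models the eviction order.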
- multi-core processor systems having a plurality of processors have been widely adopted. Since a multi-core processor system executes tasks in parallel on its processors, processing performance can be greatly improved (see, for example, Patent Document 1 below). On the other hand, when tasks are executed in parallel in a multi-core processor system and data in the cache memory of one processor is rewritten, processing is required to synchronize the corresponding data in the cache memories of the other processors.
- data synchronization methods include the snoop cache mechanism, a mechanism for maintaining cache coherency between processors.
- the snoop cache mechanism operates when data in a cache memory that one processor shares with another processor is rewritten. The rewrite is detected by a snoop controller attached to the cache memory of the other processor, and the snoop controller then reflects the rewritten new value in the cache memory of the other processor via the bus between the cache memories (see, for example, Patent Document 2 below).
- JP-A-6-175923; JP-A-10-240698; JP-A-11-212869
- FIG. 20 is an explanatory diagram showing an example of snoop operation in multi-core parallel processing.
- in the multi-core (for example, CPU #0 and CPU #1 shown in FIG. 20), parallel processing in which each CPU executes processing simultaneously is performed.
- when data in one cache memory (for example, one of the cache L1$0 and the cache L1$1) is rewritten, a synchronization process is performed by the snoop 120. Specifically, when the value of the variable a in the data arranged in the cache L1$0 is rewritten by CPU #0, the snoop 120 rewrites the data of the variable a in the cache L1$1 via the bus.
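The update-on-write behavior of the snoop just described can be modeled roughly as follows. The real snoop 120 is hardware, so this is only an illustration of the policy; the class name and dictionary-based caches are hypothetical:

```python
class SnoopWriteUpdate:
    """Update-on-write coherency: writing a variable in one cache
    immediately pushes the new value to every other cache holding it."""
    def __init__(self, caches):
        self.caches = caches  # e.g. {"L1$0": {...}, "L1$1": {...}}

    def write(self, cpu_cache, var, value):
        self.caches[cpu_cache][var] = value
        # Bus transaction: reflect the new value in the other caches
        # that hold a copy of the variable.
        for name, cache in self.caches.items():
            if name != cpu_cache and var in cache:
                cache[var] = value

caches = {"L1$0": {"a": 0}, "L1$1": {"a": 0}}
snoop = SnoopWriteUpdate(caches)
snoop.write("L1$0", "a", 42)  # CPU #0 rewrites variable a
```

Every write triggers a bus transaction here, which is exactly the overhead the patent's low-priority coherency mode later avoids.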
- FIG. 21 is an explanatory diagram showing an example of cache rewriting in multitask processing.
- a task switch that switches a task to be executed is performed according to the task execution status.
- the multi-core processor system 2000 performs multi-task processing for task # 0 to task # 2.
- after the data arranged in the cache L1$0 is rewritten, when CPU #0 returns to the process it was executing before the rewrite, CPU #0 must read the data used by task #0 from the memory 140 again. Moreover, even when the data placed in the target cache memory is rewritten because a task switch occurred, the rewritten data is often never used afterward. Such rewriting of data with no reusability causes performance degradation for the CPU that uses the cache memory.
- the disclosed technology aims to provide a scheduler, a multi-core processor system, and a scheduling method that improve the processing efficiency of the multi-core processor system by improving cache utilization efficiency even when parallel processing and multitask processing are executed.
- the disclosed technique determines whether the priority of the execution target process assigned to each processor of the multi-core processor, within the process group assigned to that processor, is equal to or higher than a threshold value.
- the shared data accessed at execution time by a high-priority execution target process, i.e., one determined to have a priority equal to or higher than the threshold value, is placed in the cache memory of the processor to which the process is assigned.
- according to the scheduler, the multi-core processor system, and the scheduling method, the processing efficiency of the multi-core processor system can be improved by improving cache utilization efficiency even when parallel processing and multitask processing are executed.
- FIG. 1 is an explanatory diagram showing an example of a scheduling process according to the present embodiment.
- a plurality of processes can be executed in parallel by a plurality of processors provided in the multi-core processor system 100. Therefore, in the multi-core processor system 100, it is possible to extract processing groups (for example, parallel tasks) that can be executed in parallel from applications and perform efficient parallel processing.
- in the multi-core processor system 100, a priority related to the execution order can be set for each process to be executed, and data can be selectively placed in the cache memory according to that priority.
- the priority is set based on how frequently the process, when executed, accesses data already stored in the cache memory, and on the deadline time.
- the setting contents of the priority of each task are stored in the task table 111. In FIG. 1 and subsequent figures, blocks representing high priority tasks are displayed larger than blocks representing low priority tasks.
- the scheduler 110 of the multi-core processor system 100 refers to the priority set for the processes to be executed in parallel, and places the data accessed when each process is executed (hereinafter, "shared data") in the optimal memory area. Further, when the same shared data is arranged in a plurality of cache memories, the scheduler 110 selects, according to the priority, which cache coherency method is used to synchronize the shared data.
- the scheduler 110 arranges the shared data of high-priority processes preferentially, starting from the memory area with the fastest access speed. For example, the shared data of tasks #0 and #1 and tasks #3 and #4, which are set to high priority and can run in parallel, is arranged in memory areas with high access speed, in order starting from the cache L1$. The shared data of task #2 and task #5, which are set to low priority, is then arranged in the remaining memory after the shared data of the high-priority processes has been placed.
- as in the multi-core processor system 100 on the left, the scheduler 110 arranges the shared data of the processes set to high priority in the cache L1$. After that, the scheduler 110 arranges the shared data of task #2 and task #3, which are set to low priority, in the remaining memory.
- for high-priority processes, the scheduler 110 performs cache coherency at the normal timing, that is, when a new value is written to a cache memory.
- for low-priority processes, the scheduler 110 does not immediately reflect a new value written by a CPU to one cache memory (for example, the cache L1$0); cache coherency is performed at the timing when a read of that value from the other cache memory (cache L1$1) occurs.
- since the multi-core processor system 100 according to the present embodiment preferentially arranges frequently used shared data in cache memory with a high access speed, processing speed can be improved.
- for the shared data of processes set to low priority, synchronization by cache coherency is postponed until an access request from a CPU occurs. This avoids operations that degrade processing performance, such as writing shared data with no reusability to the cache memory.
- the detailed configuration and processing procedure of the multi-core processor system 100 according to this embodiment will be described below.
- FIG. 2 is an explanatory diagram illustrating an example of a hierarchical memory configuration.
- the multi-core processor system 100 includes a plurality of types of memory areas. Since each memory area differs in access speed from the processor and in capacity, data is stored in the area suited to its use.
- for each processor (CPU #0, CPU #1) of the multi-core processor system 100, four types of memory areas are prepared: a cache L1$ (a cache memory installed in each processor), a cache L2$ (a cache memory installed in the snoop 120), a memory 140, and a file system 150.
- FIG. 3 is an explanatory diagram illustrating an example of multitask processing.
- Multitask processing in the multicore processor system 100 means processing in which a plurality of tasks are executed in parallel by a plurality of processors.
- task # 0 to task # 5 are prepared as tasks to be executed by the multi-core processor system 100. Then, under the control of the scheduler 110, the CPU # 0 and the CPU # 1 each execute the dispatched task.
- the scheduler 110 causes each task to be executed in parallel while appropriately switching a task to be executed from among a plurality of tasks by time slicing or the like.
- the snoop 120 is set to either normal cache coherency or cache coherency for low-priority parallel tasks, in accordance with an instruction from the scheduler 110.
- <Normal cache coherency (update at write)> FIGS. 4 to 7 are explanatory diagrams showing the normal cache coherency procedure.
- in the cache memories (the cache L1$0 and the cache L1$1) of CPU #0 and CPU #1, which execute parallel tasks, the latest data is stored based on the description 400 of the task to be executed.
- as shown in FIG., the value of the variable a in the cache L1$1, where old data is stored, is first purged based on the description 400.
- the value of the variable a in the cache L1 $ 0 is stored as the value of the variable a in the cache L1 $ 1 via the bus of the snoop 120.
- consistency between the cache L1 $ 0 and the cache L1 $ 1 is maintained by performing the processing illustrated in FIGS.
- FIG. 8 is an explanatory diagram showing a procedure of cache coherency in a low priority parallel task.
- FIG. 8 shows a coherency procedure when the multi-core processor system 100 executes a parallel task set to a low priority.
- CPU # 0 and CPU # 1 are executing parallel tasks, and the same data is arranged in the cache L1 $ 0 and the cache L1 $ 1 (step S801).
- in step S802, the variable a in the cache L1$1 is purged.
- up to step S803, the same procedure as normal cache coherency is performed: rewriting of the variable a stored in the cache memory is detected, and the old data is purged.
- thereafter, the snoop 120 stores the latest value of the variable a, held in the cache L1$0, into the cache L1$1 via the bus (step S804).
- the snoop 120 performs this transfer only when CPU #1 issues an access request to the variable a in the cache L1$1, where the latest rewrite has not yet been reflected, and coherence is taken at that point. Therefore, the redundant bus transactions of normal cache coherency can be avoided.
- in normal cache coherency, the operation starts when the variable a is updated.
- in the cache coherency for low-priority parallel tasks, the operation starts only when CPU #1 issues a read request for the variable a.
- the snoop 120 reads the value of the variable a in the cache L1 $ 0 where the latest variable a is arranged, and arranges the read value as the variable a in the cache L1 $ 1.
- in step S804 illustrated in FIG. 8, the data accessed by CPU #0 is arranged in the cache L1$0; however, depending on the task executed by CPU #0, data stored in another memory area may be the access target. For example, suppose that CPU #0 accesses data arranged in the cache L2$, the memory 140, or the file system 150. In such a case, the snoop 120 can read the target data from the relevant memory area and place it in the cache L1$.
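The deferred coherency of FIG. 8 can be contrasted with the update-on-write behavior by a small sketch. The names are hypothetical and invalidation state is simplified to deleting the stale copy; only the write/read timing of steps S802-S804 is modeled:

```python
class SnoopLazyUpdate:
    """Coherency for low-priority parallel tasks: a write purges stale
    copies (step S802), and the new value is transferred over the bus
    only when another CPU actually reads the variable (step S804)."""
    def __init__(self, caches):
        self.caches = caches
        self.bus_transfers = 0

    def write(self, cpu_cache, var, value):
        self.caches[cpu_cache][var] = value
        # Purge stale copies instead of updating them.
        for name, cache in self.caches.items():
            if name != cpu_cache and var in cache:
                del cache[var]

    def read(self, cpu_cache, var):
        cache = self.caches[cpu_cache]
        if var not in cache:
            # Only now fetch the latest value over the bus.
            for other in self.caches.values():
                if var in other:
                    cache[var] = other[var]
                    self.bus_transfers += 1
                    break
        return cache[var]

caches = {"L1$0": {"a": 0}, "L1$1": {"a": 0}}
snoop = SnoopLazyUpdate(caches)
snoop.write("L1$0", "a", 42)     # purges a from L1$1, no transfer yet
snoop.write("L1$0", "a", 43)     # repeated writes cost no bus traffic
value = snoop.read("L1$1", "a")  # the first read triggers the transfer
```

Two consecutive writes cost a single bus transfer here, whereas the update-on-write policy would have cost two; this is the saving claimed for shared data with low reusability.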
- FIG. 9 is a block diagram showing a functional configuration of the scheduler.
- the multi-core 901 includes n CPUs (Central Processing Units) and controls the entire multi-core processor system 100.
- the multi-core 901 is a processor or a group of processors on which a plurality of cores are mounted. If a plurality of cores are mounted, a single processor having a plurality of cores may be used, or a processor group in which single core processors are arranged in parallel may be used. In the present embodiment, in order to simplify the explanation, a processor group in which single-core processors are arranged in parallel will be described as an example.
- the scheduler 110 includes a determination unit 1001, a first arrangement unit 1002, a second arrangement unit 1003, a third arrangement unit 1004, a specifying unit 1005, an extraction unit 1006, and an allocation unit 1007. Specifically, the functions of the determination unit 1001 through the allocation unit 1007 are realized by causing a CPU in the multi-core 901 to execute a program stored in another memory 1008 of the multi-core processor system 100 (a memory other than the cache memory installed in each CPU).
- the determination unit 1001 has the function of determining whether the priority set for a process to be executed (hereinafter, "execution target process") is equal to or higher than a threshold value in the multi-core processor system 100. Specifically, the determination unit 1001 determines, for the execution target process assigned to and executed by each processor (CPU #0 to CPU #n) of the multi-core processor system 100 within the process group assigned to that processor, whether its priority is equal to or higher than the threshold value. The determination result is temporarily stored in a storage area such as the other memory 1008.
- the priority is set based on operation results obtained by simulating the execution target processes. For example, the deadlines of the execution target processes may be compared, and an execution target process with a shorter time remaining until its deadline may be given a higher priority.
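The deadline-based priority setting described here could be sketched as follows. The task tuples, the numeric ranking, and the threshold are all hypothetical; the patent leaves the exact mapping from deadlines to priority values to the implementation:

```python
def assign_priorities(tasks, now):
    """Rank tasks so that a shorter time to deadline yields a higher
    priority value (a larger number means higher priority)."""
    # Sort by remaining time, longest first, so the most urgent task
    # receives the largest priority number.
    ordered = sorted(tasks, key=lambda t: t["deadline"] - now, reverse=True)
    return {t["name"]: rank for rank, t in enumerate(ordered)}

tasks = [
    {"name": "task#0", "deadline": 105.0},
    {"name": "task#1", "deadline": 250.0},
    {"name": "task#2", "deadline": 120.0},
]
priorities = assign_priorities(tasks, now=100.0)

# A task counts as "high priority" if its value meets the threshold,
# mirroring the determination unit 1001 described below.
threshold = 1
high = [name for name, p in priorities.items() if p >= threshold]
```

With these numbers, task#0 (5 time units to deadline) ranks highest and task#1 (150 units) lowest.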
- once the shared data of an execution target process set to high priority has been placed in a memory with a high access speed (the cache L1$ or the cache L2$), the scheduler 110 keeps it locked until the process finishes. Therefore, a high-priority execution target process is executed with higher priority than the other execution target processes.
- alternatively, an execution target process that updates the shared data arranged in the cache memory a larger number of times may be given a higher priority. Since the scheduler 110 according to the present embodiment preferentially places shared data with high reusability in the cache memory (cache L1$) of each processor, the utilization efficiency of the cache memory can be kept high.
- the threshold value used as the determination criterion in the determination unit 1001 can be adjusted. If the priority set for an execution target process is equal to or higher than the threshold value, the determination unit 1001 treats it as a high-priority execution target process; if the set priority does not reach the threshold value, it treats it as a low-priority execution target process. An optimal threshold can therefore be set according to the application to be executed.
- an arbitrary unit such as a task, a process, or a thread can be selected as the unit of the execution target process. In the present embodiment, as an example, a task is described as a unit of execution target processing.
- the first arrangement unit 1002 has the function of placing data in the cache memory mounted on each CPU according to the determination result of the determination unit 1001. Specifically, the first arrangement unit 1002 places, in the cache memory of the target CPU, the shared data that is accessed during execution of a high-priority execution target process, i.e., an execution target process determined by the determination unit 1001 to have a priority equal to or higher than the threshold value.
- for example, when task A, a high-priority execution target process, is assigned to CPU #1, the shared data accessed by task A at execution time is placed in cache memory 1 by the first arrangement unit 1002. Likewise, the shared data accessed at execution time by task B, another high-priority execution target process, is placed in the cache memory of its CPU by the first arrangement unit 1002.
- the determination unit 1001 may determine that there is no high-priority execution target process among the execution target processes. In such a case, leaving the cache memory empty would lower its utilization efficiency. Therefore, the first arrangement unit 1002 places shared data in the cache memory mounted on each CPU even for processes other than high-priority execution target processes (for example, the low-priority execution target processes described later). Thereafter, when a high-priority execution target process appears, the first arrangement unit 1002 preferentially places the shared data of that high-priority process in the cache memory of the target CPU.
- once the first arrangement unit 1002 has placed the shared data of a high-priority execution target process in the cache memory of the target processor, it can prohibit (lock) overwriting of that shared data until execution of the high-priority execution target process ends. The first arrangement unit 1002 can thus prevent the shared data of a high-priority execution target process from being overwritten by non-reusable data.
- the second arrangement unit 1003 has the function of placing data in the other memory 1008, whose access speed is slower than that of the cache memory of each processor, according to the determination result of the determination unit 1001. Specifically, the second arrangement unit 1003 places, in the other memory 1008, the shared data accessed by a low-priority execution target process, i.e., an execution target process determined by the determination unit 1001 not to have a priority equal to or higher than the threshold value.
- the memory 1008 other than the cache memory comprises a plurality of types of memory arranged hierarchically according to access speed and capacity. The second arrangement unit 1003 therefore stores data, up to the capacity that can be accommodated, in order of decreasing access speed. For example, in the case of FIG. 9, data is arranged in the order cache L2$ → memory 140 → file system 150. Among the data, data with a high update frequency, as identified by a prior simulation, is preferentially arranged in memory with a high access speed.
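The hierarchical placement by the second arrangement unit 1003 might look like the following sketch. The tier capacities, data sizes, and update counts are hypothetical; the point is the two orderings the text describes (tiers by access speed, data by update frequency):

```python
def place_in_hierarchy(data_items, tiers):
    """Fill memory tiers in order of access speed, placing the most
    frequently updated data first (e.g. cache L2$ -> memory 140 ->
    file system 150)."""
    # Highest update frequency first, as identified by prior simulation.
    pending = sorted(data_items, key=lambda d: d["updates"], reverse=True)
    placement = {name: [] for name, _ in tiers}
    for name, capacity in tiers:  # tiers are ordered fastest first
        used = 0
        remaining = []
        for item in pending:
            if used + item["size"] <= capacity:
                placement[name].append(item["id"])
                used += item["size"]
            else:
                remaining.append(item)  # spills to a slower tier
        pending = remaining
    return placement

tiers = [("cache L2$", 4), ("memory 140", 16), ("file system 150", 10**6)]
data = [
    {"id": "a", "size": 2, "updates": 90},
    {"id": "b", "size": 3, "updates": 50},
    {"id": "c", "size": 2, "updates": 70},
]
layout = place_in_hierarchy(data, tiers)
```

Here items a and c (the most frequently updated) fill the small fast tier, and b spills into the memory 140 tier.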
- the third placement unit 1004 has the function of placing shared data for which access is requested from the multi-core 901 in the cache memory mounted on the requesting CPU. Specifically, when an access request for shared data placed in the memory 1008 occurs in any CPU of the multi-core 901 (for example, CPU #1), the third placement unit 1004 places that shared data in the cache memory 1 of CPU #1.
- the specifying unit 1005 has the function of identifying the capacity of the rewritable area in the cache memory of each CPU of the multi-core 901 when the determination unit 1001 determines whether the priority of an execution target process is equal to or higher than the threshold value. The rewritable area means an overwritable area.
- the area in which the shared data of an already executed process is arranged and the area in which the shared data of low-priority processes is arranged can be overwritten, and are therefore identified as rewritable areas.
- the identification result by the specifying unit 1005 is temporarily stored in a storage area such as the other memory 1008.
- the first arrangement unit 1002 can adjust its placement process according to the capacity of the rewritable area identified by the specifying unit 1005. For example, when the capacity of the rewritable area is smaller than the capacity of the shared data accessed during execution of the high-priority execution target process, the first arrangement unit 1002 cannot place all of the shared data in the cache memory. In that case, the first arrangement unit 1002 places the shared data in the cache memory, up to the capacity that fits, in order of decreasing update frequency. The second arrangement unit 1003 then places the shared data that could not be placed in the cache memory in the other memory 1008.
- conversely, the capacity of the rewritable area may be larger than the capacity of the shared data accessed at execution time by the high-priority execution target process. In that case, the first arrangement unit 1002 first places the shared data accessed at execution time by the high-priority execution target process in the cache memory as usual. Thereafter, the first arrangement unit 1002 places the shared data accessed at execution time by low-priority execution target processes in the free capacity of the cache memory, in order of decreasing update frequency.
- the extraction unit 1006 has the function of extracting processes that satisfy a specific condition from the execution target processes included in the application 1000. Specifically, the extraction unit 1006 extracts processes (for example, parallel tasks) whose data accessed at execution time is common. Whether the data accessed at execution time is common is determined by referring to the identifier of the shared data set in each execution target process (for example, the shared data ID described later with reference to FIG. 13). The extraction result by the extraction unit 1006 is temporarily stored in a storage area such as the memory 1008.
- the allocation unit 1007 has the function of assigning the execution target processes to the CPUs of the multi-core 901. If there is no instruction from the scheduler 110, the allocation unit 1007 assigns each execution target process to the optimal CPU based on the dependencies and execution order set in advance and on the current processing load of each CPU.
- the allocation unit 1007 assigns the processes extracted as having common shared data to the same CPU in the multi-core 901. Further, the allocation unit 1007 can assign processes with the same priority, among the processes extracted by the extraction unit 1006, to the same CPU (for example, CPU #1) in the multi-core 901.
- FIG. 10 is a flowchart showing a procedure of shared data arrangement processing.
- the flowchart of FIG. 10 represents a procedure for determining in which cache memory (cache L1 $ or cache L2 $) the shared data is to be placed.
- the shared data used when executing each task can be arranged in an appropriate cache memory corresponding to the contents of the cache coherency process.
- the scheduler 110 first determines whether the task to be executed is a high-priority task (step S1001). If it is determined in step S1001 that the task to be executed is a high-priority task (step S1001: Yes), the scheduler 110 determines whether the total shared data size of the task to be executed is smaller than the cache L1$ size (step S1002).
- if it is determined in step S1002 that the total shared data size is smaller than the cache L1$ size (step S1002: Yes), the scheduler 110 places all of the shared data in the cache L1$ (step S1003) and ends the series of processing. That is, in step S1003, if the task to be executed is a high-priority task and all of its shared data fits in the CPU's cache memory, the scheduler 110 places all of the shared data in the cache L1$, which has the fastest access speed.
- if it is determined in step S1002 that the total shared data size is not smaller than the cache L1$ size (step S1002: No), the scheduler 110 cannot place all of the shared data in the cache L1$. Therefore, the scheduler 110 arranges the shared data of the task to be executed in the caches L1$ and L2$ in order of update frequency (step S1004). That is, in step S1004, the scheduler 110 places the shared data in the cache L1$ starting from the data with the highest update frequency, and when the cache L1$ runs out of capacity, it places the remaining shared data in the cache L2$, again starting from the data with the highest update frequency.
- steps S1002 to S1004 described above represents a procedure for arranging shared data of high priority tasks.
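The frequency-ordered placement of steps S1003 and S1004 can be sketched as a greedy loop. This is a minimal illustration, not the patented implementation: the tuple format, the capacity parameters, and the spill-to-main-memory fallback are all assumptions.

```python
def place_shared_data(items, l1_capacity, l2_capacity):
    """Place shared data blocks into L1$/L2$ in descending update-frequency order.

    items: list of (name, size, update_freq) tuples.
    Returns a dict mapping each name to 'L1', 'L2', or 'memory'.
    """
    placement = {}
    l1_free, l2_free = l1_capacity, l2_capacity
    # The most frequently updated data goes to the fastest cache first (step S1004).
    for name, size, freq in sorted(items, key=lambda t: t[2], reverse=True):
        if size <= l1_free:
            placement[name] = 'L1'
            l1_free -= size
        elif size <= l2_free:
            placement[name] = 'L2'
            l2_free -= size
        else:
            placement[name] = 'memory'   # spill to main memory when both caches are full
    return placement
```

When everything fits (the step S1002: Yes branch), the same loop naturally places all data in L1$.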
- The shared data of tasks other than high-priority tasks (low-priority tasks) is placed so that its frequently updated data goes into the empty area of the cache L1$.
- If it is determined in step S1001 that the task to be executed is not a high-priority task (step S1001: No), the scheduler 110 first performs placement processing for the frequently updated portion of the shared data. The scheduler 110 determines whether the size of the frequently updated shared data of the task to be executed is smaller than the unlocked cache L1$ size (step S1005).
- Here, the unlocked cache L1$ size means the capacity of the cache L1$ excluding the locked areas in which the shared data of other execution target tasks is already placed.
- If it is determined in step S1005 that the total size of the frequently updated shared data is smaller than the unlocked cache L1$ size (step S1005: Yes), the scheduler 110 judges that all of the frequently updated shared data can be placed in the cache L1$. The scheduler 110 therefore places the frequently updated shared data in the cache L1$ (step S1006) and ends the series of processing.
- If it is determined that the total size of the frequently updated shared data is not smaller than the unlocked cache L1$ size (step S1005: No), the scheduler 110 cannot place all of the frequently updated shared data in the cache L1$. The scheduler 110 therefore places the frequently updated data among the shared data of the task to be executed in the caches L1$ and L2$ in order (step S1007). That is, as in step S1004, the scheduler 110 places the shared data in the cache L1$ starting with the most frequently updated data and, when the cache L1$ runs out of capacity, places the remaining shared data in the cache L2$ in the same order.
- In this way, the scheduler 110 can efficiently place the shared data of low-priority tasks in memory areas where the shared data of high-priority tasks is not placed. Moreover, even when the shared data of a low-priority task is placed in a memory area with high access speed (for example, the cache L1$), it is not locked, unlike the shared data of a high-priority task, so a situation in which it disturbs the processing of a high-priority task can be prevented.
- FIG. 11 is a flowchart showing a procedure of task table creation processing.
- the flowchart of FIG. 11 represents a procedure for performing a simulation of a task that constitutes an application to be executed by the multi-core processor system 100 and creating a task table 111 that represents the priority of the task based on the simulation result.
- the scheduler 110 can create a task table 111 necessary for appropriately arranging the shared data of each task.
- the scheduler 110 first analyzes each data size in each task to be executed (step S1101). Subsequently, the scheduler 110 performs deadline analysis of each task (step S1102). Furthermore, the scheduler 110 performs data dependency analysis between tasks (step S1103). Through steps S1101 to S1103 described above, the scheduler 110 can acquire data necessary for specifying the configuration of each task. The data acquired in steps S1101 to S1103 is stored in the task table 111, and is used for simulation for setting priorities described later.
- the scheduler 110 determines whether or not there is an unsimulated parallel task in each task (step S1104). If it is determined in step S1104 that an unsimulated parallel task exists (step S1104: Yes), the scheduler 110 executes simulation of any one of the unsimulated parallel tasks (step S1105).
- In step S1106, the scheduler 110 measures the update frequency of the data with dependency relationships (step S1106), and determines whether that update frequency is greater than a threshold value (step S1107).
- Step S1107 is processing for determining whether or not priority setting is necessary.
- If the update frequency of the dependent data is greater than the threshold (step S1107: Yes), the scheduler 110 sets the priority based on the deadline stored in the task table 111 (step S1108). If, on the other hand, the update frequency of the dependent data is not greater than the threshold (step S1107: No), the data, once stored in the cache, is updated so rarely that the scheduler 110 does not set a priority and proceeds to step S1109.
- In step S1109, the scheduler 110 marks the parallel task being processed as simulated (step S1109), returns to the processing of step S1104, and determines whether an unsimulated parallel task remains.
- The scheduler 110 repeats the simulation through the processing of steps S1105 to S1109 and sets the priority of each parallel task. If it is determined in step S1104 that no unsimulated parallel task remains (step S1104: No), the simulation of all parallel tasks is complete and the scheduler 110 ends the series of processing.
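The priority-setting loop of steps S1107 and S1108 can be sketched as follows. The numeric cutoffs and the deadline-to-priority mapping are assumptions for illustration; the patent leaves the threshold value and the exact mapping unspecified.

```python
def set_priorities(tasks, update_threshold=100, deadline_ms_cutoff=10.0):
    """Sketch of the priority-setting loop of FIG. 11 (steps S1104-S1109).

    tasks: list of dicts with 'update_freq' (measured in step S1106) and
    'deadline' in milliseconds (analyzed in step S1102).
    """
    for task in tasks:
        if task['update_freq'] > update_threshold:        # step S1107: Yes
            # Step S1108: derive the priority from the deadline constraint.
            task['priority'] = 'high' if task['deadline'] <= deadline_ms_cutoff else 'low'
        else:                                             # step S1107: No
            # Rarely updated data barely burdens coherency, so no priority is set.
            task['priority'] = None
    return tasks
```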
- the scheduler 110 can create the task table 111 by executing each process of FIG.
- the task table creation process described above is executed mainly by the scheduler 110, but may be executed in advance by another compiler or simulator as the execution subject.
- The analysis in steps S1101 to S1103 can be executed by a general compiler. The simulation in step S1105, which uses the analysis results of steps S1101 to S1103, can also be executed by a known simulator that estimates the execution time and the number of data updates of each task (see, for example, Japanese Patent Laid-Open No. 2000-276381).
- FIG. 12 is a data table showing an example of the data structure of the task table.
- FIG. 13 is a data table showing a setting example of the task table.
- a data table 1200 in FIG. 12 represents an example of the data structure of the task table 111 created by the task table creation process described in FIG.
- the task table 111 includes fields of the following information group representing task information and fields of the following information group representing shared data information, like the data table 1200 of FIG.
- Fields whose values are left blank in the data structure, such as the task name, task ID, and deadline, receive a different value for each task.
- In fields that take one of two values, such as the priority (high/low) and the coherence mode, one of the two values is entered.
- The coherence mode, the fork to other CPUs, and the cache level for placement are determined at task execution time. Specifically, the coherence mode and the fork to other CPUs are determined by the task execution processing described with reference to FIGS. 14 to 17.
- the cache level to be arranged is determined by the shared data arrangement process described with reference to FIG.
- FIG. 13 illustrates a data table 1200 in which specific numerical values of the task table 111 are set.
- FIGS. 14 to 17 are flowcharts showing the procedure of the task execution process.
- the flowcharts of FIGS. 14 to 17 show procedures when the scheduler 110 causes each processor to execute a parallel task to be executed.
- By executing each process of FIGS. 14 to 17, the parallel task to be executed is executed based on a coherence method that corresponds to the priority set in the task table 111 and to the priorities of the other parallel tasks being executed.
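The priority-dependent choice of coherence behaviour can be sketched as a single decision. The mode names and the all-same-priority test are illustrative assumptions; the patent describes the deferred mode only as one in which the snoop mechanism operates when another CPU actually accesses the data.

```python
def coherence_mode(task_priority, running_priorities):
    """Choose the cache-coherence behaviour for a parallel task (FIGS. 14-17).

    Returns 'normal' snooping when all running parallel tasks share one
    priority, and 'on-access' (snoop only when another CPU accesses the
    data) when priorities differ.
    """
    if all(p == task_priority for p in running_priorities):
        return 'normal'      # e.g. FIG. 18: WLAN and LTE tasks, both high priority
    return 'on-access'       # e.g. FIG. 19: LTE (high) alongside driver (low)
```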
- the scheduler 110 first determines whether or not a state transition has occurred in the task to be executed (step S1401).
- the state transition in step S1401 means “task generation”, “task end”, and “task switch”. Therefore, if it is determined in step S1401 that a state transition has occurred, the scheduler 110 further determines which of the above three types has been reached.
- step S1401 the scheduler 110 is in a standby state until a state transition occurs (step S1401: No loop). If it is determined in step S1401 that task generation has occurred in the state transition (step S1401: Yes task generation), the scheduler 110 determines whether the task to be executed is a parallel task (step S1402).
- If it is determined in step S1402 that the newly generated task is a parallel task (step S1402: Yes), the scheduler 110 determines whether the newly generated parallel task is a master thread (step S1403).
- the master thread is a thread that is preferentially executed.
- If it is determined in step S1403 that the newly generated parallel task is a master thread (step S1403: Yes), the scheduler 110 further determines whether the newly generated parallel task is a high-priority task (step S1404). In step S1404, whether the task is a high-priority task can be determined by referring to the task table 111.
- If it is determined in step S1404 that the newly generated parallel task is a high-priority task (step S1404: Yes), the scheduler 110 further determines whether any CPU is executing a high-priority task (step S1405).
- If it is determined in step S1405 that a high-priority task is being executed (step S1405: Yes), the scheduler 110 performs preparation processing for moving the task to be executed into execution. That is, the scheduler 110 migrates the parallel task being executed (moves its data) to the CPU with the smallest load among the CPUs executing its parallel threads, and prohibits forking of new threads (creation of thread copies) to other CPUs (step S1406).
- After step S1406, the scheduler 110 locks the cache area in which the shared data of the migrated task is placed (step S1407). The scheduler 110 then executes the migrated task sequentially (step S1408), prohibits thread forking to other CPUs in the newly generated parallel task, and assigns the task to the CPU with the smallest load (step S1409).
- Thereafter, the scheduler 110 locks the cache area in which the shared data of the newly generated parallel task is placed and starts execution of the task (step S1410).
- the scheduler 110 returns to the process of step S1401 and enters a standby state until a new state transition occurs.
- If it is determined in step S1403 that the newly generated parallel task is not a master thread (step S1403: No), the scheduler 110 determines whether thread forking is prohibited (step S1411).
- the thread that is the determination criterion is a thread that constitutes a newly generated task.
- If it is determined in step S1411 that forking of the newly generated task's threads is prohibited (step S1411: Yes), the scheduler 110 queues the newly generated task to the same CPU as the one executing the master thread (step S1412). The task queued by the processing of step S1412 is executed by the queuing-destination CPU after the currently executing task completes. When the processing of step S1412 ends, the scheduler 110 returns to the processing of step S1401 and waits until a new state transition occurs.
- If the scheduler 110 determines that the newly generated task is not a parallel task (step S1402: No), or determines that thread forking is not prohibited (step S1411: No), it queues the task to the CPU with the smallest load (step S1413). The task queued in step S1413 is the task whose generation was detected in step S1401.
- the scheduler 110 returns to the process of step S1401 and enters a standby state until a new state transition occurs.
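The "queue to the CPU with the smallest load" operation of steps S1412 and S1413 can be sketched as follows. Taking the load as the sum of queued costs is a simplifying assumption; the patent does not define the load metric.

```python
def queue_to_min_load_cpu(run_queues, task_cost):
    """Queue a task to the CPU with the smallest load (steps S1412/S1413).

    run_queues: dict mapping cpu_id -> list of pending task costs.
    Returns the chosen CPU id.
    """
    cpu = min(run_queues, key=lambda c: sum(run_queues[c]))
    run_queues[cpu].append(task_cost)   # executed after the current task completes
    return cpu
```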
- The flowchart of FIG. 15 shows the processing of the scheduler 110 when it is determined in step S1401 that a task end has occurred (step S1401: Yes, task end) and when it is determined that a task switch has occurred (step S1401: Yes, task switch).
- When it is determined in step S1401 that a task end has occurred (step S1401: Yes, task end), the scheduler 110 first releases the locked cache area in which the shared data of the parallel task is placed (step S1501).
- The scheduler 110 then determines whether there is a task waiting for execution (step S1502). If it is determined in step S1502 that a task is waiting for execution (step S1502: Yes), the scheduler 110 proceeds to step S1503 and performs processing for executing the waiting task. If it is determined in step S1502 that no task is waiting for execution (step S1502: No), the scheduler 110 returns to the processing of step S1401 in FIG. 14 and waits until the next state transition occurs.
- When it is determined in step S1401 that a task switch has occurred (step S1401: Yes, task switch), the scheduler 110 determines whether the task execution right is to be passed to a low-priority parallel task (step S1503). The determination of step S1503 is also performed when it is determined in step S1502 that a task is waiting for execution (step S1502: Yes).
- If it is determined in step S1503 that the task execution right is passed to a low-priority parallel task (step S1503: Yes), the scheduler 110 adopts the cache coherence method for executing low-priority parallel tasks. That is, the scheduler 110 sets the CPU's cache coherence method to a mode in which the snoop mechanism operates when another CPU accesses the data (step S1504).
- If it is determined that the execution right is not passed to a low-priority parallel task (step S1503: No), or when the processing of step S1504 ends, the scheduler 110 starts execution of the task to be executed (step S1505). Once the task has been started in step S1505, the scheduler 110 returns to the processing of step S1401 and waits until the next task state transition occurs.
- the flowchart of FIG. 16 represents the processing of the scheduler 110 when it is determined in step S1404 that the newly generated parallel task is not a high priority task (step S1404: No).
- If it is determined in step S1404 that the newly generated parallel task is not a high-priority task (step S1404: No), the scheduler 110 determines whether a high-priority task is being executed (step S1601). In step S1601, it is determined whether a high-priority task is currently being executed on the CPU that is to execute the newly generated task.
- If it is determined in step S1601 that a high-priority task is being executed (step S1601: Yes), the scheduler 110 adopts the cache coherence method for executing low-priority parallel tasks. That is, the scheduler 110 sets the cache coherence method of the parallel task being executed to a mode in which the snoop mechanism of the snoop 120 operates when another CPU accesses the data (step S1602).
- Thereafter, the scheduler 110 queues the task to be executed to the CPU with the smallest load (step S1603) and proceeds to the processing of step S1401.
- The task queued in step S1603 is executed after the currently executing task completes.
- Here, the CPU with the smallest load means the CPU with the smallest amount of queued work. Note that the scheduler 110, having returned to step S1401, waits until the next state transition occurs.
- If it is determined in step S1601 that no high-priority task is being executed (step S1601: No), the scheduler 110 adopts the cache coherence method for executing high-priority parallel tasks. That is, the scheduler 110 migrates the parallel task being executed to the CPU with the smallest load among the other CPUs executing the parallel threads of that task, and prohibits forking of new threads of the executing parallel task to other CPUs (step S1604).
- The scheduler 110 executes the task migrated in step S1604 sequentially (step S1605). Then, for the newly generated parallel task, the scheduler 110 prohibits forking of its threads to other CPUs and queues it to the CPU with the smallest load (step S1606).
- The task queued in step S1606 is executed after the currently executing task completes.
- When step S1606 ends, the scheduler 110 moves to the processing of step S1401 and waits until a new state transition occurs.
- The flowchart of FIG. 17 represents the processing of the scheduler 110 when it is determined in step S1405 that no high-priority task is being executed (step S1405: No).
- When it is determined in step S1405 that the target CPU is not executing a high-priority task (step S1405: No), the scheduler 110 first assigns the newly generated task to the CPU with the smallest load (step S1701).
- The scheduler 110 then determines whether the newly generated parallel task would fail to satisfy its deadline constraint under sequential execution (step S1702).
- In step S1702, the scheduler 110 makes this determination based on the deadline constraint set in the task table 111.
- If it is determined in step S1702 that the deadline constraint would not be satisfied (step S1702: Yes), the scheduler 110 further determines whether a low-priority parallel task is currently being executed (step S1703).
- If it is determined in step S1703 that a low-priority parallel task is being executed (step S1703: Yes), the scheduler 110 adopts the cache coherence method for executing low-priority parallel tasks. That is, the scheduler 110 sets the coherence method of the parallel task being executed to a mode in which the snoop mechanism operates when another CPU accesses the data (step S1704).
- When the processing of step S1704 is completed, the scheduler 110 locks the cache area in which the shared data of the newly generated parallel task is placed (step S1705). If it is determined in step S1703 that no low-priority parallel task is being executed (step S1703: No), the scheduler 110 adopts the normal coherence method and therefore proceeds to step S1705 without performing the processing of step S1704.
- When the processing of step S1705 ends, the scheduler 110 starts execution of the newly generated parallel task (step S1706), returns to the processing of step S1401, and waits until the next task state transition occurs.
- If it is determined in step S1702 that the deadline constraint is satisfied (step S1702: No), the scheduler 110 locks the cache area in which the shared data of the newly generated parallel task is placed (step S1707).
- In step S1708, the scheduler 110 starts sequential execution of the newly generated parallel task. Thereafter, the scheduler 110 returns to the processing of step S1401 and waits until the next task state transition occurs.
- As described above, the scheduler 110 determines which priority (high or low) is set for each task specified as a parallel task and, depending on whether the parallel tasks have the same priority, can schedule them to be executed by the optimal CPU. In addition, since the scheduler 110 sets the cache coherence method for shared data according to the priority of each task, a decrease in the usage efficiency of the cache memory (cache L1$) can be prevented.
- FIG. 18 is an explanatory diagram of an execution example of parallel tasks having the same priority.
- the smartphone 1801 communicates with another smartphone 1802 in accordance with a WLAN (Wireless LAN) standard. Furthermore, the smartphone 1801 performs communication based on the LTE (Long Term Evolution) standard with the server 1803.
- Tasks conforming to the WLAN standard (WLAN #0, 1) and tasks conforming to the LTE standard (LTE #0, 1) are both high-priority tasks because of their real-time constraints. The smartphone 1801 therefore executes WLAN #0, 1 and LTE #0, 1 as parallel tasks of the same priority. Since parallel tasks of the same priority are being executed, the snoop 120 of the smartphone 1801 adopts the snoop method that performs normal cache coherency.
- FIG. 19 is an explanatory diagram of an execution example of parallel tasks having different priorities.
- the smartphone 1801 communicates with the server 1803 in accordance with the LTE standard.
- In addition, the smartphone 1801 executes tasks (driver #0, 1) of a driver application that does not require communication.
- The driver application executed by the smartphone 1801 is a low-priority task because it has no real-time constraint. The smartphone 1801 therefore executes LTE #0, 1 as high-priority parallel tasks and driver #0, 1 as low-priority parallel tasks. Since parallel tasks of different priorities are being executed, the snoop 120 of the smartphone 1801 adopts the snoop method that performs the cache coherency used for low-priority parallel tasks.
- the shared data that is frequently used is preferentially arranged in the cache memory having a high access speed, so that the processing speed can be improved.
- the synchronization processing by cache coherency is postponed until an access request from the CPU is generated. That is, it is possible to avoid a process that causes a decrease in the processing performance of the multi-core processor system, such as writing shared data without reusability to the cache memory. Therefore, even when parallel processing and multitask processing are executed, it is possible to improve the use efficiency of the cache and improve the processing capability of the multicore processor system.
- When there is no high-priority task, the shared data of low-priority tasks may be placed in the cache memory of each CPU. Therefore, even when no high-priority task exists, the cache memory can be used efficiently.
- The shared data that is accessed when executing a high-priority task and placed in the cache memory may be locked until the high-priority task is completed. Locking the shared data prevents the high-priority task's shared data from being overwritten by the shared data of other tasks even if a task switch occurs, so the high-priority task can be executed efficiently.
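The locking behaviour described above can be modelled with a minimal sketch. The class name, the line representation, and the method names are invented for illustration; a real cache would lock hardware lines, not dictionary entries.

```python
class LockableCache:
    """Minimal model of a cache whose entries can be locked.

    Locked entries (shared data of a running high-priority task) refuse
    overwrites until unlock() is called at task completion.
    """
    def __init__(self):
        self.lines = {}    # key -> (value, locked flag)

    def put(self, key, value):
        if key in self.lines and self.lines[key][1]:
            return False            # overwrite of a locked entry is refused
        self.lines[key] = (value, False)
        return True

    def lock(self, key):
        value, _ = self.lines[key]
        self.lines[key] = (value, True)

    def unlock(self, key):
        value, _ = self.lines[key]
        self.lines[key] = (value, False)
```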
- When the shared data accessed by a high-priority task is larger than the capacity of the cache memory and cannot all be placed there, the shared data may be placed in memory areas other than the cache memory, in descending order of access speed.
- In that case, the shared data is placed starting with the memory of the highest access speed. Since the shared data of the high-priority task is thus preferentially placed in memory areas with high access speed, efficient processing can be expected.
- When the shared data accessed by the high-priority task is smaller than the capacity of the cache memory and the cache memory has room, the shared data of low-priority tasks may be placed in the surplus area.
- each task can be efficiently executed by preferentially arranging the shared data of each task in a memory area having a high access speed.
- parallel tasks may be extracted from tasks to be executed and assigned to the same processor. Further, parallel tasks having the same priority among the parallel tasks may be extracted and assigned to the same processor. By assigning parallel tasks having the same priority to the same processor, shared data once placed in the cache memory can be used efficiently.
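The extraction and assignment described above amount to grouping tasks that touch common shared data and pinning each group to one CPU. A minimal union-find sketch follows; the data model (task name mapped to a set of data items) and all names are assumptions.

```python
def assign_common_data_tasks(tasks):
    """Group tasks that access common shared data so each group can run on
    the same CPU. tasks: dict task_name -> set of shared data item names.
    Returns a dict mapping each task name to its group representative."""
    parent = {name: name for name in tasks}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    owner = {}                              # data item -> first task seen using it
    for name, data in tasks.items():
        for item in data:
            if item in owner:
                union(name, owner[item])    # shared item: merge the two groups
            else:
                owner[item] = name
    return {name: find(name) for name in tasks}
```

Tasks with the same representative would then be queued to the same CPU by the assignment unit.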
- the scheduling method described in the present embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation.
- the scheduler is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, and a DVD, and is executed by being read from the recording medium by the computer.
- the scheduler may be distributed via a network such as the Internet.
Description
FIG. 2 is an explanatory diagram showing an example of a hierarchical memory configuration. As illustrated in FIG. 2, the multi-core processor system 100 according to the present embodiment includes multiple types of memory areas. Since each memory area differs in access speed from the processors and in memory capacity, each stores data according to its use.
FIG. 3 is an explanatory diagram showing an example of multitask processing. Multitask processing in the multi-core processor system 100 according to the present embodiment means processing in which multiple tasks are executed in parallel by multiple processors.
Next, the cache coherency procedure executed by the snoop 120 of the multi-core processor system 100 according to the present embodiment will be described. As described with reference to FIG. 1, the snoop 120 is set, according to an instruction from the scheduler 110, to one of two coherence methods: normal cache coherency or cache coherency for low-priority parallel tasks.
FIGS. 4 to 7 are explanatory diagrams showing the procedure of normal cache coherency. In the multi-core processor system 100 illustrated in FIG. 4, the latest data is stored in the cache memories (cache L1$0 and cache L1$1) of CPU #0 and CPU #1, which execute parallel tasks, based on the description 400 of the task to be executed.
FIG. 8 is an explanatory diagram showing the cache coherency procedure for low-priority parallel tasks. FIG. 8 represents the coherency procedure when the multi-core processor system 100 executes a parallel task set to low priority.
FIG. 9 is a block diagram showing the functional configuration of the scheduler. In FIG. 9, the multi-core 901 includes n CPUs (Central Processing Units) and governs the overall control of the multi-core processor system 100. The multi-core 901 is a processor or processor group equipped with multiple cores. As long as multiple cores are provided, it may be a single processor equipped with multiple cores or a group of single-core processors arranged in parallel. In the present embodiment, to simplify the description, a group of single-core processors arranged in parallel is taken as an example.
FIG. 10 is a flowchart showing the procedure of the shared data placement processing. The flowchart of FIG. 10 represents the procedure for determining in which cache memory (cache L1$ or cache L2$) the shared data is to be placed. By executing each step of FIG. 10, the shared data used when executing each task can be placed in a cache memory appropriate to the content of the cache coherency processing.
FIG. 11 is a flowchart showing the procedure of the task table creation processing. The flowchart of FIG. 11 represents the procedure of simulating the tasks constituting an application to be executed by the multi-core processor system 100 and creating, based on the simulation results, the task table 111 representing the priority of each task. By executing each step of FIG. 11, the scheduler 110 can create the task table 111 necessary for appropriately placing the shared data of each task.
・Task name: (name of the task)
・Task ID: (identifier of the task)
・Deadline: (analysis result of step S1102)
・Priority: high/low (set in step S1108)
・Coherence mode: update on write / update on read
・Fork to other CPUs: permitted / prohibited
・Shared data name: (name of the data)
・Shared data ID: (ID of the data)
・Update count: (measurement result of step S1106)
・Cache level for placement: L1 (cache L1$) / L2 (cache L2$)
・Data size: (analysis result of step S1101)
FIGS. 14 to 17 are flowcharts showing the procedure of the task execution processing. The flowcharts of FIGS. 14 to 17 represent the procedure by which the scheduler 110 causes each processor to execute a parallel task to be executed. By executing each process of FIGS. 14 to 17, the parallel task to be executed is executed based on a coherence method that corresponds to the priority set in the task table 111 and to the priorities of the other parallel tasks being executed.
Next, an operation example in which the scheduling processing according to the present embodiment is applied to communication devices will be described. Specifically, parallel tasks executed by a portable communication device such as a smartphone and by a fixed communication device such as a server will be described.
FIG. 18 is an explanatory diagram showing an execution example of parallel tasks having the same priority. In FIG. 18, the smartphone 1801 communicates with another smartphone 1802 in conformity with the WLAN (Wireless LAN) standard. Furthermore, the smartphone 1801 also communicates with the server 1803 in conformity with the LTE (Long Term Evolution) standard.
FIG. 19 is an explanatory diagram showing an execution example of parallel tasks having different priorities. In FIG. 19, the smartphone 1801 communicates with the server 1803 in conformity with the LTE standard. In addition, the smartphone 1801 executes tasks (driver #0, 1) of a driver application that does not require communication.
110 Scheduler
120 Snoop
130 Memory controller
140 Memory
150 File system
1000 Application
1001 Determination unit
1002 First placement unit
1003 Second placement unit
1004 Third placement unit
1005 Specifying unit
1006 Extraction unit
1007 Assignment unit
Claims (10)
- A scheduler that causes a specific processor in a multi-core processor to execute: a determination step of determining whether the priority of an execution target process assigned to each processor, among a group of processes to be assigned to and executed by the processors of the multi-core processor, is equal to or higher than a threshold; a first placement step of placing data that is accessed at execution time by a high-priority execution target process, determined by the determination step to have a priority equal to or higher than the threshold, into the cache memory of each processor that executes the high-priority execution target process; a second placement step of placing data that is accessed at execution time by a low-priority execution target process, determined by the determination step not to have a priority equal to or higher than the threshold, into another memory area whose access speed is slower than that of the cache memory of each processor; and a third placement step of placing, when an access request for the data placed in the other memory area occurs in one processor of the multi-core processor, the data placed in the other memory area into the cache memory of the one processor.
- The scheduler according to claim 1, wherein, in the first placement step, when there is no high-priority execution target process determined by the determination step to have a priority equal to or higher than the threshold, the data accessed at execution time by the low-priority execution target processes is placed in the cache memory of each processor that executes the low-priority execution target processes.
- The scheduler according to claim 1, wherein the first placement step prohibits the data that is placed in the cache memory of each processor executing the high-priority execution target process and accessed by that process at execution time from being overwritten by other data until execution of the high-priority execution target process is completed.
- The scheduler according to claim 1, causing the specific processor to further execute a specifying step of specifying, once the determination step has determined whether the priority of the execution target process is equal to or higher than the threshold, the capacity of the rewritable area in the cache memory of each processor of the multi-core processor, wherein, in the first placement step, when the capacity of the rewritable area specified by the specifying step is smaller than the volume of the data accessed at execution time by the high-priority execution target process, as much of that data as can be placed is placed in the cache memory in descending order of update frequency, and, in the second placement step, the data that could not be placed in the cache memory by the first placement step is placed in the other memory area.
- The scheduler according to claim 4, wherein, in the first placement step, when the capacity of the rewritable area specified by the specifying step is larger than the volume of the data accessed at execution time by the high-priority execution target process, after placement of the data accessed by the high-priority execution target process is completed, as much of the data accessed at execution time by the low-priority execution target processes as can be placed is placed in the cache memory in descending order of update frequency.
- The scheduler according to claim 1, wherein, in the second placement step, when multiple types of memory with different access speeds are prepared as the other memory area, the data accessed at execution time by the low-priority execution target process is placed, as much as can be placed, in the memories of the other memory area in descending order of access speed.
- The scheduler according to any one of claims 1 to 6, causing the specific processor to further execute: an extraction step of extracting, from the execution target processes, processes that access common data at execution time; and an assignment step of assigning the processes extracted by the extraction step to the same processor in the multi-core processor.
- The scheduler according to claim 7, wherein the assignment step assigns, among the processes extracted by the extraction step, processes for which the same priority is set to the same processor in the multi-core processor.
- A multi-core processor system comprising: determination means for determining whether the priority of an execution target process assigned to each processor, among a group of processes to be assigned to and executed by the processors of a multi-core processor, is equal to or higher than a threshold; first placement means for placing data that is accessed at execution time by a high-priority execution target process, determined by the determination means to have a priority equal to or higher than the threshold, into the cache memory of each processor that executes the high-priority execution target process; second placement means for placing data that is accessed at execution time by a low-priority execution target process, determined by the determination means not to have a priority equal to or higher than the threshold, into another memory area whose access speed is slower than that of the cache memory of each processor; and third placement means for placing, when an access request for the data placed in the other memory area occurs in one processor of the multi-core processor, the data placed in the other memory area into the cache memory of the one processor.
- A scheduling method in which a specific processor in a multi-core processor executes: a determination step of determining whether the priority of an execution target process assigned to each processor, among a group of processes to be assigned to and executed by the processors of the multi-core processor, is equal to or higher than a threshold; a first placement step of placing data that is accessed at execution time by a high-priority execution target process, determined by the determination step to have a priority equal to or higher than the threshold, into the cache memory of each processor that executes the high-priority execution target process; a second placement step of placing data that is accessed at execution time by a low-priority execution target process, determined by the determination step not to have a priority equal to or higher than the threshold, into another memory area whose access speed is slower than that of the cache memory of each processor; and a third placement step of placing, when an access request for the data placed in the other memory area occurs in one processor of the multi-core processor, the data placed in the other memory area into the cache memory of the one processor.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2010/064566 WO2012026034A1 (ja) | 2010-08-27 | 2010-08-27 | スケジューラ、マルチコアプロセッサシステムおよびスケジューリング方法 |
JP2012530494A JP5516744B2 (ja) | 2010-08-27 | 2010-08-27 | スケジューラ、マルチコアプロセッサシステムおよびスケジューリング方法 |
CN201080068768.2A CN103080903B (zh) | 2010-08-27 | 2010-08-27 | 调度器、多核处理器系统以及调度方法 |
US13/749,606 US8996811B2 (en) | 2010-08-27 | 2013-01-24 | Scheduler, multi-core processor system, and scheduling method |
US14/601,978 US9430388B2 (en) | 2010-08-27 | 2015-01-21 | Scheduler, multi-core processor system, and scheduling method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2010/064566 WO2012026034A1 (ja) | 2010-08-27 | 2010-08-27 | スケジューラ、マルチコアプロセッサシステムおよびスケジューリング方法 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/749,606 Continuation US8996811B2 (en) | 2010-08-27 | 2013-01-24 | Scheduler, multi-core processor system, and scheduling method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012026034A1 true WO2012026034A1 (ja) | 2012-03-01 |
Family
ID=45723061
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/064566 WO2012026034A1 (ja) | 2010-08-27 | 2010-08-27 | スケジューラ、マルチコアプロセッサシステムおよびスケジューリング方法 |
Country Status (4)
Country | Link |
---|---|
US (2) | US8996811B2 (ja) |
JP (1) | JP5516744B2 (ja) |
CN (1) | CN103080903B (ja) |
WO (1) | WO2012026034A1 (ja) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103369524A (zh) * | 2013-07-30 | 2013-10-23 | 东莞宇龙通信科技有限公司 | 终端和数据处理方法 |
WO2015145595A1 (ja) * | 2014-03-26 | 2015-10-01 | 株式会社日立製作所 | 計算機システム及び計算機システム管理方法 |
JP2017138903A (ja) * | 2016-02-05 | 2017-08-10 | 株式会社日立製作所 | 情報処理システム、及び、情報処理方法 |
JP2020533659A (ja) * | 2018-08-28 | 2020-11-19 | カンブリコン テクノロジーズ コーポレイション リミティド | データ前処理方法、装置、コンピュータ機器及び記憶媒体 |
JP2022160691A (ja) * | 2015-12-28 | 2022-10-19 | アドバンスト・マイクロ・ディバイシズ・インコーポレイテッド | 複数の計算コア上のデータドリブンスケジューラ |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140244936A1 (en) * | 2013-02-25 | 2014-08-28 | Lsi Corporation | Maintaining cache coherency between storage controllers |
US9665410B2 (en) * | 2013-03-12 | 2017-05-30 | Google Inc. | Processing of application programming interface traffic |
GB2516435A (en) * | 2013-04-05 | 2015-01-28 | Continental Automotive Systems | Embedded memory management scheme for real-time applications |
CN105573920B (zh) * | 2014-10-09 | 2019-02-01 | 华为技术有限公司 | 存储空间管理方法和装置 |
US9930133B2 (en) * | 2014-10-23 | 2018-03-27 | Netapp, Inc. | System and method for managing application performance |
CN105740164B (zh) | 2014-12-10 | 2020-03-17 | Alibaba Group Holding Limited | Multi-core processor supporting cache coherency, and read/write method, apparatus, and device |
US10038744B1 (en) * | 2015-06-29 | 2018-07-31 | EMC IP Holding Company LLC | Intelligent core assignment |
US11204871B2 (en) * | 2015-06-30 | 2021-12-21 | Advanced Micro Devices, Inc. | System performance management using prioritized compute units |
US10127088B2 (en) | 2015-09-10 | 2018-11-13 | Oracle International Corporation | Adaptive techniques for improving performance of hardware transactions on multi-socket machines |
US10474600B2 (en) | 2017-09-14 | 2019-11-12 | Samsung Electronics Co., Ltd. | Heterogeneous accelerator for highly efficient learning systems |
FR3071334B1 (fr) * | 2017-09-19 | 2019-08-30 | Psa Automobiles Sa | Method for ensuring the stability of data of a multi-core processor of a motor vehicle |
CN109522101B (zh) * | 2017-09-20 | 2023-11-14 | Samsung Electronics Co., Ltd. | Method, system, and/or apparatus for scheduling tasks of multiple operating systems |
US10423510B2 (en) * | 2017-10-04 | 2019-09-24 | Arm Limited | Apparatus and method for predicting a redundancy period |
CN112905111A (zh) * | 2021-02-05 | 2021-06-04 | Samsung (China) Semiconductor Co., Ltd. | Data caching method and data caching apparatus |
CN114237274B (zh) * | 2021-09-28 | 2024-04-19 | Aerospace Times Feihong Technology Co., Ltd. | Method and system for rapid environmental-obstacle perception by an IMU-fused rotary-wing unmanned aerial vehicle |
CN114237509B (zh) * | 2021-12-17 | 2024-03-26 | Xi'an UNISOC Technology Co., Ltd. | Data access method and apparatus |
JP7102640B1 (ja) | 2022-02-28 | 2022-07-19 | Yamazaki Mazak Corporation | Additive manufacturing method, additive manufacturing system, and additive manufacturing program |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH02238556A (ja) * | 1989-03-13 | 1990-09-20 | Hitachi Ltd | Process scheduling method and multiprocessor system |
JPH07248967A (ja) * | 1994-03-11 | 1995-09-26 | Hitachi Ltd | Memory control method |
JP2009509274A (ja) * | 2005-09-21 | 2009-03-05 | Qualcomm Incorporated | Method and apparatus for managing cache partitioning |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS628242A (ja) * | 1985-07-04 | 1987-01-16 | Digital:Kk | Cache memory device |
JP3130591B2 (ja) * | 1991-09-11 | 2001-01-31 | Sharp Corp | Computer with cache memory |
JPH06175923A (ja) | 1992-12-04 | 1994-06-24 | Oki Electric Ind Co Ltd | Control method for a disk cache device |
JPH10240698A (ja) | 1997-02-24 | 1998-09-11 | Nec Corp | Load distribution method for high-load jobs |
US6223256B1 (en) * | 1997-07-22 | 2001-04-24 | Hewlett-Packard Company | Computer cache memory with classes and dynamic selection of replacement algorithms |
JPH11212869A (ja) | 1998-01-27 | 1999-08-06 | Sharp Corp | Cache memory control method and multiprocessor system using the same |
JP2000276381A (ja) | 1999-03-23 | 2000-10-06 | Toshiba Corp | Method for estimating task execution time |
US7191349B2 (en) * | 2002-12-26 | 2007-03-13 | Intel Corporation | Mechanism for processor power state aware distribution of lowest priority interrupt |
US7177985B1 (en) * | 2003-05-30 | 2007-02-13 | Mips Technologies, Inc. | Microprocessor with improved data stream prefetching |
JP4374221B2 (ja) | 2003-08-29 | 2009-12-02 | Panasonic Corp | Computer system and recording medium |
US20080177979A1 (en) * | 2006-03-01 | 2008-07-24 | Gheorghe Stefan | Hardware multi-core processor optimized for object oriented computing |
FR2927438B1 (fr) * | 2008-02-08 | 2010-03-05 | Commissariat Energie Atomique | Method for preloading, into a memory hierarchy, configurations of a reconfigurable heterogeneous information-processing system |
JP2011210201A (ja) * | 2010-03-30 | 2011-10-20 | Toshiba Corp | Information processing apparatus |
US8826049B2 (en) * | 2010-11-16 | 2014-09-02 | International Business Machines Corporation | Minimizing airflow using preferential memory allocation by prioritizing memory workload allocation to memory banks according to the locations of memory banks within the enclosure |
US9292446B2 (en) * | 2012-10-04 | 2016-03-22 | International Business Machines Corporation | Speculative prefetching of remote data |
2010
- 2010-08-27 JP JP2012530494A patent/JP5516744B2/ja not_active Expired - Fee Related
- 2010-08-27 CN CN201080068768.2A patent/CN103080903B/zh not_active Expired - Fee Related
- 2010-08-27 WO PCT/JP2010/064566 patent/WO2012026034A1/ja active Application Filing

2013
- 2013-01-24 US US13/749,606 patent/US8996811B2/en not_active Expired - Fee Related

2015
- 2015-01-21 US US14/601,978 patent/US9430388B2/en not_active Expired - Fee Related
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103369524A (zh) * | 2013-07-30 | 2013-10-23 | Dongguan Yulong Telecommunication Technology Co., Ltd. | Terminal and data processing method |
WO2015014017A1 (zh) * | 2013-07-30 | 2015-02-05 | Yulong Computer Telecommunication Scientific (Shenzhen) Co., Ltd. | Terminal, load balancing method, and load balancing apparatus |
WO2015145595A1 (ja) * | 2014-03-26 | 2015-10-01 | Hitachi, Ltd. | Computer system and computer system management method |
JPWO2015145595A1 (ja) * | 2014-03-26 | 2017-04-13 | Hitachi, Ltd. | Computer system and computer system management method |
JP2022160691A (ja) * | 2015-12-28 | 2022-10-19 | Advanced Micro Devices, Inc. | Data-driven scheduler on multiple compute cores |
JP2017138903A (ja) * | 2016-02-05 | 2017-08-10 | Hitachi, Ltd. | Information processing system and information processing method |
JP2020533659A (ja) * | 2018-08-28 | 2020-11-19 | Cambricon Technologies Corporation Limited | Data preprocessing method, apparatus, computer device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
JPWO2012026034A1 (ja) | 2013-10-28 |
JP5516744B2 (ja) | 2014-06-11 |
US8996811B2 (en) | 2015-03-31 |
CN103080903B (zh) | 2016-07-06 |
CN103080903A (zh) | 2013-05-01 |
US20130138886A1 (en) | 2013-05-30 |
US9430388B2 (en) | 2016-08-30 |
US20150134912A1 (en) | 2015-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5516744B2 (ja) | Scheduler, multi-core processor system, and scheduling method | |
JP6314355B2 (ja) | Memory management method and device | |
US9276987B1 (en) | Identifying nodes already storing indicated input data to perform distributed execution of an indicated program in a node cluster | |
US8499010B2 (en) | Garbage collection in a multiple virtual machine environment | |
US9378069B2 (en) | Lock spin wait operation for multi-threaded applications in a multi-core computing environment | |
CN102981929B (zh) | Disk image management method and system | |
US20130254776A1 (en) | Method to reduce queue synchronization of multiple work items in a system with high memory latency between processing nodes | |
JP5861706B2 (ja) | Scheduling method and system | |
US20110265093A1 (en) | Computer System and Program Product | |
US20170123975A1 (en) | Centralized distributed systems and methods for managing operations | |
US8954969B2 (en) | File system object node management | |
US20190286582A1 (en) | Method for processing client requests in a cluster system, a method and an apparatus for processing i/o according to the client requests | |
CN108733585B (zh) | Cache system and related method | |
US9934147B1 (en) | Content-aware storage tiering techniques within a job scheduling system | |
JP5776813B2 (ja) | Multi-core processor system, and control method and control program for a multi-core processor system | |
CN107528871B (zh) | Data analysis in storage systems | |
KR20140037749A (ko) | Execution control method and multiprocessor system | |
JP5158576B2 (ja) | Input/output control system, input/output control method, and input/output control program | |
AU2011229395B2 (en) | Dual mode reader writer lock | |
Chen et al. | Data prefetching and eviction mechanisms of in-memory storage systems based on scheduling for big data processing | |
Al-Bayati et al. | Partitioning and selection of data consistency mechanisms for multicore real-time systems | |
US20090320036A1 (en) | File System Object Node Management | |
CN115686855A (zh) | Access scheduling method for cached data, processor, electronic device, and storage medium | |
CN112685334A (zh) | Method, apparatus, and storage medium for caching data in blocks | |
JP6524733B2 (ja) | Parallel computing device, parallel computing system, and job control program | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201080068768.2 Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10856437 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2012530494 Country of ref document: JP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 10856437 Country of ref document: EP Kind code of ref document: A1 |