CN101777007A - Parallel function simulation system for on-chip multi-core processor and method thereof - Google Patents


Publication number: CN101777007A
Authority: CN (China)
Legal status: Granted
Application number: CN 201010103887
Other languages: Chinese (zh)
Other versions: CN101777007B (en)
Inventors: 吴俊敏, 尹巍, 隋秀峰, 赵小雨, 唐轶轩, 朱小东
Current assignee: Suzhou Institute for Advanced Study USTC
Original assignee: Suzhou Institute for Advanced Study USTC
Events:
  • Application filed by Suzhou Institute for Advanced Study USTC
  • Priority to CN 201010103887
  • Publication of CN101777007A
  • Application granted
  • Publication of CN101777007B
  • Expired - Fee Related (current legal status)

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a parallel functional simulation system for on-chip multi-core processors and a corresponding method. The system comprises a system input module and a system output module, and is characterized in that a simulation kernel module is arranged between the two. The simulation kernel module receives, from the system input module, information about the workload running on the target system; it dynamically creates multiple threads according to the type of workload to parallelize the simulation, and outputs the result through the system output module. The invention solves the problem, inherent in serial functional simulation, of performance degrading as the number of cores in the target system grows. The system achieves a higher speedup and relatively high overall performance.

Description

Parallel functional simulation system for on-chip multi-core processors and method thereof
Technical field
The invention belongs to the field of processor simulation for information-processing systems, and specifically relates to a parallel functional simulation system for on-chip multi-core processors and a method thereof.
Background art
Computer simulation uses software to model the behavior of a computer system. With simulation software, researchers can analyze the performance and behavior of a new architecture without building a prototype system, which greatly reduces the time and cost of research. Over the past decade, industry and academia have widely applied simulation in the research and development of computer hardware and software architectures. With the arrival of the multi-core era, simulation will become increasingly important in the design of multi-core processors.
At present, most multi-core simulators are serial simulators that run on a single host thread. As the number of cores in the target system grows, the performance of such simulators keeps getting worse. In the near future, Moore's Law is expected to shift from the number of transistors on a chip doubling every 18 months to the number of on-chip hardware threads doubling every 18 months.
As the number of on-chip cores grows, the amount of state and the code footprint involved in simulation also grow, which increases simulation time. This may also substantially increase L2 cache misses on the host, further increasing the number of cycles the simulation takes. Therefore, as target systems gain more cores, how to simulate multi-core target systems on multi-core processors becomes increasingly important.
Functional simulation, also called the simulation kernel, is an important tool in computer system simulation. It is usually designed as a highly self-contained library that provides interfaces to the other parts of the simulator. A simulation kernel typically provides functions to create and destroy software contexts, load programs, query the state of currently existing contexts, execute machine instructions, and handle speculative behavior. In a serial simulation kernel, only one thread runs on the host operating system. The kernel usually reads context descriptions from a context configuration file and assigns each trace to a context; the single main thread then executes the instructions of all these contexts. This design has an inherent drawback: as the number of cores in the target system grows, the overall performance of the simulator declines.
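The serial execution model described above can be sketched as follows, assuming a toy context that only counts down a fixed number of instructions. The names `struct sim_ctx`, `sim_step` and `serial_kernel_run` are hypothetical and are not Multi2sim's API. Because a single host thread does all the work, run time grows with the combined instruction count of every simulated core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified context. A real functional simulator
 * keeps registers, guest memory and a program counter here. */
struct sim_ctx {
    int id;
    long remaining;          /* instructions still to execute */
    long executed;           /* instructions executed so far */
    struct sim_ctx *next;    /* link in the context list */
};

/* Stand-in for "execute one machine instruction of this context". */
static void sim_step(struct sim_ctx *ctx)
{
    assert(ctx->remaining > 0);
    ctx->remaining--;
    ctx->executed++;
}

/* Serial kernel: one host thread walks the context list round-robin,
 * executing one instruction of every unfinished context per pass,
 * until all contexts are done. Returns the number of passes. */
long serial_kernel_run(struct sim_ctx *list)
{
    long passes = 0;
    bool any_running = true;

    while (any_running) {
        any_running = false;
        for (struct sim_ctx *c = list; c != NULL; c = c->next) {
            if (c->remaining > 0) {
                sim_step(c);
                any_running = true;
            }
        }
        if (any_running)
            passes++;
    }
    return passes;
}
```

For three contexts with 3, 5 and 2 instructions, `serial_kernel_run` executes all 10 instructions on one thread in 5 passes; adding cores only lengthens each pass.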
Summary of the invention
An object of the invention is to provide a parallel functional simulation system for on-chip multi-core processors that solves the problem, in prior-art serial functional simulation, of performance degrading as the number of cores in the target system increases.
To solve these problems in the prior art, the invention provides the following technical solution:
A parallel functional simulation system for an on-chip multi-core processor, comprising a system input module and a system output module, characterized in that the system further comprises a simulation kernel module; the simulation kernel module receives, from the system input module, information about the workload running on the target system, dynamically creates multiple threads according to the type of workload to parallelize the simulation of that workload, and outputs the result through the system output module.
Preferably, the simulation kernel module comprises a multiprogrammed-workload processing module and a multithreaded-workload processing module.
Preferably, the multiprogrammed-workload processing module allocates a context to each application according to the configuration file, organizes the contexts into a context linked list, and then creates a thread for each context.
Preferably, the multithreaded-workload processing module creates a main thread that runs at the top level of the target system and, after initialization, invokes system calls to dynamically create a number of sub-threads according to the context information, inserting them into the context linked list.
Preferably, the simulation system further comprises a shared-variable protection module; when multiple threads access a shared variable concurrently, this module uses OS-level mutex locks and barrier operations to serialize the concurrent accesses.
Preferably, the simulation system further comprises a thread-local storage module, which gives each thread its own copy of the global variables when the thread is created.
Preferably, the simulation system further comprises a data padding module, which pads data so that data accessed by different host processor cores is allocated in different cache lines.
Another object of the invention is to provide a parallel functional simulation method for an on-chip multi-core processor, characterized in that the method comprises the following steps:
(1) the simulation kernel module receives, from the system input module, information about the workload running on the target system, and loads context information from the workload's context configuration file;
(2) the simulation kernel module creates a corresponding number of threads according to the loaded context information;
(3) the threads dynamically created by the simulation kernel module run in parallel with the main thread, each executing the instructions of its corresponding context, and the results are output to complete the system simulation.
Preferably, in the method, when the threads dynamically created by the simulation kernel module synchronize with the main thread, synchronized access to shared variables is achieved through OS-level mutex locks and barrier operations.
Preferably, in the method, the threads created by the simulation kernel module privatize global variables through the thread-local storage module, and the data padding module uses padding to ensure that data on different host processor cores is allocated in different cache lines.
Through long-term research, the inventors developed a parallelized functional simulator on the basis of an existing serial simulator. Starting from the code of the serial functional simulator, the parallel functional simulator of the invention uses parallel programming techniques to parallelize the relevant code, which directly and effectively increases simulation speed.
The parallel functional simulator works as follows: the simulation kernel loads context information from the context configuration file; it creates a corresponding number of threads according to the loaded context information; and these threads run in parallel with the main thread, each executing the instructions of its context, until the simulation finishes.
However, simply parallelizing a serial simulator in a direct way does not yield a working parallel functional simulator. During the parallelization, the inventors encountered the following problems that had to be solved:
First, parallelizing the simulation kernel. While the simulator runs, the number of threads to create is usually determined by the type of workload running on the target system, so the kernel must be parallelized differently for the two workload types: multiprogrammed workloads and multithreaded workloads. Second, protecting shared variables and synchronizing. Parallelization requires synchronization between threads, and this synchronization degrades performance, so the protection of shared resources and inter-thread synchronization must be implemented with care. Third, privatizing global variables. Some state variables are global in the serial simulator, but in the parallel simulator they may belong to a single core and must be privatized. Fourth, false sharing. False sharing arises when the simulation kernel is parallelized, and it can severely hurt the performance of the parallel simulator.
After long-term research, the inventors arrived at the following solutions to these problems:
(1) Parallelizing the simulation kernel:
A serial functional simulation kernel usually configures the execution of a multiprogrammed workload by loading context information from a configuration file. Before execution, the kernel typically allocates a context to each application and organizes these contexts directly into a context linked list. Because there is little synchronization between the threads of a multiprogrammed workload, and each context usually corresponds to one thread, the corresponding host threads can be created once the context list has been built.
A multithreaded workload starts with only one context (usually called the main thread). At initialization this thread runs at the top level of the target system, but during execution it may invoke system calls to create many sub-threads, and these new contexts must be inserted into the context list dynamically. Therefore, not only must a corresponding host thread be created whenever a sub-thread is created, but the kernel must also be extended to support creating threads dynamically.
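A minimal sketch of the dynamic case, assuming a single mutex-protected context list: when a simulated context issues a clone-like system call, the kernel allocates a child context, binds a new host thread to it and links it into the list. `struct dyn_ctx`, `handle_clone`, `join_all_clones` and `simulate_ctx` are hypothetical names, not Multi2sim functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical context record with only the fields this sketch needs. */
struct dyn_ctx {
    int id;
    int parent_id;
    pthread_t host_thread;   /* host thread bound to this context */
    struct dyn_ctx *next;    /* link in the shared context list */
};

/* Shared context list. Several host threads may emulate a clone system
 * call at the same time, so every update goes through list_lock. */
static struct dyn_ctx *ctx_list_head = NULL;
static int next_ctx_id = 1;              /* id 0: the initial main context */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for the per-context simulation loop. */
static void *simulate_ctx(void *arg)
{
    struct dyn_ctx *ctx = arg;
    (void)ctx;                           /* a real kernel runs instructions here */
    return NULL;
}

/* Emulates a clone-like system call issued by context parent_id:
 * allocate the child, bind a new host thread to it and link it into
 * the shared list. Returns the child, or NULL on failure. */
struct dyn_ctx *handle_clone(int parent_id)
{
    assert(parent_id >= 0);
    struct dyn_ctx *child = calloc(1, sizeof *child);
    if (child == NULL)
        return NULL;
    child->parent_id = parent_id;

    pthread_mutex_lock(&list_lock);
    child->id = next_ctx_id++;
    if (pthread_create(&child->host_thread, NULL, simulate_ctx, child) != 0) {
        next_ctx_id--;                   /* id was not used */
        pthread_mutex_unlock(&list_lock);
        free(child);
        return NULL;
    }
    child->next = ctx_list_head;
    ctx_list_head = child;
    pthread_mutex_unlock(&list_lock);
    return child;
}

/* Joins every host thread created so far; call once no more clones can
 * happen. Returns the number of threads joined. */
int join_all_clones(void)
{
    int joined = 0;
    pthread_mutex_lock(&list_lock);
    for (struct dyn_ctx *c = ctx_list_head; c != NULL; c = c->next) {
        pthread_join(c->host_thread, NULL);
        joined++;
    }
    pthread_mutex_unlock(&list_lock);
    return joined;
}
```

The lock covers both id assignment and list insertion, so two contexts cloning at the same moment cannot corrupt the list or receive the same id.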
(2) Protecting shared variables and synchronizing:
When parallelizing a serial program, it is essential to protect the shared data accessed by concurrent operations; locks are commonly used to serialize such concurrent accesses. The parallel functional simulator contains many kinds of shared variables, such as hash tables, shared memory space and the context linked list. When several threads access a shared variable concurrently, OS-level mutex locks usually provide the necessary protection. However, the granularity and number of locks must be chosen carefully to avoid deadlock and to achieve good performance.
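To illustrate lock granularity, here is a hypothetical shared table (for example, guest page to access count) that uses one mutex per bucket instead of one lock for the whole table. The bucket layout and all names (`table_increment`, `table_get`, `run_table_demo`) are assumptions made for the sketch, not the simulator's actual hash table.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NBUCKETS 64
#define SLOTS 8                 /* entries per bucket in this toy table */

/* Each bucket has its own mutex, so threads that hit different buckets
 * do not serialize on one global lock. */
struct bucket {
    pthread_mutex_t lock;
    uint64_t keys[SLOTS];
    uint64_t counts[SLOTS];
    int used;
};

static struct bucket table[NBUCKETS];

void table_init(void)
{
    for (int i = 0; i < NBUCKETS; i++) {
        pthread_mutex_init(&table[i].lock, NULL);
        table[i].used = 0;
    }
}

/* Increments the counter for key; returns 0, or -1 if its bucket is full. */
int table_increment(uint64_t key)
{
    struct bucket *b = &table[key % NBUCKETS];
    int rc = -1;

    pthread_mutex_lock(&b->lock);       /* lock only this bucket */
    for (int i = 0; i < b->used && rc != 0; i++) {
        if (b->keys[i] == key) {
            b->counts[i]++;
            rc = 0;
        }
    }
    if (rc != 0 && b->used < SLOTS) {
        b->keys[b->used] = key;
        b->counts[b->used] = 1;
        b->used++;
        rc = 0;
    }
    assert(b->used <= SLOTS);
    pthread_mutex_unlock(&b->lock);
    return rc;
}

uint64_t table_get(uint64_t key)
{
    struct bucket *b = &table[key % NBUCKETS];
    uint64_t n = 0;

    pthread_mutex_lock(&b->lock);
    for (int i = 0; i < b->used; i++)
        if (b->keys[i] == key)
            n = b->counts[i];
    pthread_mutex_unlock(&b->lock);
    return n;
}

/* Demo worker: increments keys 0..99, iters times each. */
static void *table_worker(void *arg)
{
    int iters = *(const int *)arg;
    for (int it = 0; it < iters; it++)
        for (uint64_t k = 0; k < 100; k++)
            table_increment(k);
    return NULL;
}

/* Runs nthreads (at most 16) workers concurrently; returns the count for key 0. */
uint64_t run_table_demo(int nthreads, int iters)
{
    pthread_t t[16];
    assert(nthreads <= 16);
    table_init();
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, table_worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return table_get(0);
}
```

Each operation holds exactly one bucket lock and never acquires a second, so there is no lock ordering to get wrong; finer granularity here buys concurrency without adding deadlock risk.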
In addition, synchronization operations are frequently used between the contexts of different threads in a multithreaded program. In a serial simulator, performing a synchronization operation only requires moving the corresponding context onto the appropriate suspended list. In a parallel simulator, however, when a synchronization operation occurs, both the host OS thread and its corresponding context must be suspended. This can be implemented with OS-level locks and barrier operations.
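One way to make the host thread block together with its context is sketched below, assuming a guest lock primitive emulated with a pthread mutex plus a condition variable. The condition variable is this sketch's own choice; the text only names locks and barriers, and a guest barrier with a fixed participant count could similarly be mapped onto `pthread_barrier_t`. All names (`struct guest_mutex`, `guest_lock`, `guest_unlock`, `run_contended_demo`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical guest-level mutex as seen by the simulator. */
struct guest_mutex {
    pthread_mutex_t host_lock;   /* protects the fields below */
    pthread_cond_t  wakeup;      /* host threads of suspended contexts wait here */
    bool held;
    int owner;                   /* context id of the holder, -1 if free */
    int suspended;               /* contexts currently on the suspended list */
};

void guest_mutex_init(struct guest_mutex *m)
{
    pthread_mutex_init(&m->host_lock, NULL);
    pthread_cond_init(&m->wakeup, NULL);
    m->held = false;
    m->owner = -1;
    m->suspended = 0;
}

/* Emulates the guest's lock call for context ctx_id. */
void guest_lock(struct guest_mutex *m, int ctx_id)
{
    pthread_mutex_lock(&m->host_lock);
    while (m->held) {
        m->suspended++;                               /* context -> suspended list */
        pthread_cond_wait(&m->wakeup, &m->host_lock); /* host thread sleeps too */
        m->suspended--;                               /* context -> running list */
    }
    m->held = true;
    m->owner = ctx_id;
    pthread_mutex_unlock(&m->host_lock);
}

/* Emulates the guest's unlock call; resumes one suspended context. */
void guest_unlock(struct guest_mutex *m, int ctx_id)
{
    pthread_mutex_lock(&m->host_lock);
    assert(m->held && m->owner == ctx_id);
    m->held = false;
    m->owner = -1;
    pthread_cond_signal(&m->wakeup);
    pthread_mutex_unlock(&m->host_lock);
}

/* Demo: each simulated context increments a shared guest variable
 * under the guest mutex. */
struct demo_arg { struct guest_mutex *m; long *counter; int id; int iters; };

static void *demo_ctx(void *p)
{
    struct demo_arg *a = p;
    for (int i = 0; i < a->iters; i++) {
        guest_lock(a->m, a->id);
        (*a->counter)++;
        guest_unlock(a->m, a->id);
    }
    return NULL;
}

/* Runs nthreads (at most 16) contexts; returns the final counter value. */
long run_contended_demo(int nthreads, int iters)
{
    struct guest_mutex m;
    long counter = 0;
    pthread_t t[16];
    struct demo_arg args[16];

    assert(nthreads <= 16);
    guest_mutex_init(&m);
    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct demo_arg){ &m, &counter, i, iters };
        pthread_create(&t[i], NULL, demo_ctx, &args[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

The `suspended` counter stands in for the serial simulator's suspended list: the same bookkeeping still happens, but `pthread_cond_wait` additionally parks the host thread so it stops consuming a host core.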
(3) Privatizing global variables:
In a serial simulator, much of the simulator's state is shared as global variables. In a parallel simulator, however, these variables need multiple copies to reflect the states of different cores. Simply turning them into vectors would increase the overall complexity; during parallelization, this problem is usually solved with thread-local storage.
Thread-local storage is implemented as follows: in gcc, to indicate that every thread gets its own copy of a variable when the thread is created, the __thread keyword is placed before the declaration of a global or static variable.
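A minimal sketch of the effect of `__thread`, assuming a hypothetical per-core statistic `insn_count`: each thread updates its own copy without locking, while an ordinary global still needs a mutex. `__thread` is a GCC extension; C11's `_Thread_local` is the standard equivalent.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-core statistic. With gcc's __thread, every thread gets
 * its own instance, initialized to 0 when the thread starts. */
static __thread long insn_count = 0;

/* An ordinary global: a single copy shared by all threads. */
static long shared_total = 0;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

struct tls_arg {
    long n;      /* instructions this thread "executes" */
    long seen;   /* final value of this thread's private insn_count */
};

static void *count_insns(void *p)
{
    struct tls_arg *a = p;
    for (long i = 0; i < a->n; i++)
        insn_count++;                /* no lock: only this thread's copy */
    a->seen = insn_count;

    pthread_mutex_lock(&total_lock); /* the shared global still needs a lock */
    shared_total += insn_count;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Runs nthreads (at most 8) threads and returns the merged total. */
long run_tls_demo(struct tls_arg *args, int nthreads)
{
    pthread_t t[8];
    assert(nthreads <= 8);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, count_insns, &args[i]);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return shared_total;
}
```

With threads executing 10, 20 and 30 instructions, each sees exactly its own count, the merged total is 60, and the calling thread's `insn_count` is still 0.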
(4) Solving the false-sharing problem:
In serial multi-core simulators, many data structures take the form of arrays of structures in which each core owns one element; after parallelization, this layout can cause false sharing. To address this, padding can be used to ensure that data belonging to different cores is allocated in different cache lines.
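A minimal sketch of the padding technique, assuming a 64-byte host cache line (typical on x86-64 but not guaranteed) and a hypothetical per-core counter array. C11 `_Alignas` places each element on its own line; the `_Static_assert` checks the layout at compile time.

```c
#include <assert.h>
#include <pthread.h>

#define CACHE_LINE 64   /* assumed host cache-line size */
#define MAX_CORES 16

/* Unpadded layout: on LP64 hosts each element is 8 bytes, so up to eight
 * cores' counters share one 64-byte line and every increment by one core
 * invalidates that line in the other cores' caches. */
struct core_stats_unpadded {
    unsigned long insns;
};

/* Padded layout: each element starts on its own cache line and fills it. */
struct core_stats_padded {
    _Alignas(CACHE_LINE) unsigned long insns;
    char pad[CACHE_LINE - sizeof(unsigned long)];
};

_Static_assert(sizeof(struct core_stats_padded) == CACHE_LINE,
               "one element per cache line");

static struct core_stats_padded stats[MAX_CORES];

struct pad_arg {
    int core;
    unsigned long iters;
};

static void *core_worker(void *p)
{
    const struct pad_arg *a = p;
    for (unsigned long i = 0; i < a->iters; i++)
        stats[a->core].insns++;      /* writes only this core's cache line */
    return NULL;
}

/* Runs ncores threads, each bumping its own counter; returns the sum. */
unsigned long run_padded_demo(int ncores, unsigned long iters)
{
    pthread_t t[MAX_CORES];
    struct pad_arg args[MAX_CORES];
    unsigned long total = 0;

    assert(ncores <= MAX_CORES);
    for (int i = 0; i < ncores; i++) {
        stats[i].insns = 0;
        args[i] = (struct pad_arg){ i, iters };
        pthread_create(&t[i], NULL, core_worker, &args[i]);
    }
    for (int i = 0; i < ncores; i++)
        pthread_join(t[i], NULL);
    for (int i = 0; i < ncores; i++)
        total += stats[i].insns;
    return total;
}
```

Padding trades memory (56 wasted bytes per core here) for the absence of cache-line ping-pong, which is usually a good trade for small, frequently written per-core state.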
Compared with the prior art, the advantages of the invention are:
Simulation experiments confirm that, compared with a serial simulator, the parallel functional simulation system of the invention achieves a higher speedup and therefore higher overall performance.
Brief description of the drawings
The invention is further described below with reference to the drawings and embodiments:
Fig. 1 is a schematic diagram of the execution model of the parallel functional simulator;
Fig. 2 shows the speedups obtained by the parallel functional simulator when running multiprogrammed workloads;
Fig. 3 shows the average speedups obtained by the parallel functional simulator when running multiprogrammed workloads;
Fig. 4 shows the speedups obtained by the parallel functional simulator when running multithreaded workloads;
Fig. 5 shows the average speedups obtained by the parallel functional simulator when running multithreaded workloads.
Embodiment
The above solution is further described below with reference to specific embodiments. It should be understood that these embodiments illustrate the invention and do not limit its scope. The implementation conditions used in the embodiments may be further adjusted according to the conditions of a particular manufacturer; unspecified implementation conditions are generally those of routine experiments.
Embodiment: implementation and testing of the parallel functional simulation system
This embodiment implements the parallel functional simulator on the basis of the serial simulator Multi2sim-2.1. Throughout the implementation, the server used was a Dawning Tianyan EP850-GF minicomputer configured as follows: eight quad-core AMD Opteron 8346 HE CPUs at 1.8 GHz, 32 GB of DDR2 ECC memory, and 4 × 146 GB SAS hard disks. The server ran Linux Debian (x86-64).
Experiments show that when the serial simulator Multi2sim-2.1 simulates an 8-core target system, the simulation slowdown reaches 18.24; when it simulates a 16-core target system, the slowdown reaches as much as 165.24. As the number of target cores grows, the slowdown of a serial simulator increases sharply, so the simulator must be accelerated through parallelization to improve its overall performance.
In this embodiment, the serial simulation engine first loads a multiprogrammed or multithreaded workload and then creates multiple contexts; the way contexts are created depends on the workload type. For a multiprogrammed workload, a context is created for each application when it is loaded into guest memory. For a multithreaded workload, only one main context is created at load time; the remaining contexts are generated dynamically when the main context executes the corresponding thread-creation primitive. The simulation engine maintains a set of state lists (active, suspended, and so on) for all created contexts and executes the instructions of each active context in turn, possibly changing context states along the way; functional simulation ends when every context has finished executing its instructions. This process shows that multi-core functional simulation has natural symmetry and isolation. Moreover, functional simulation does not model microarchitectural details such as pipelines and caches. Both properties greatly simplify the design and implementation of a parallel functional simulator.
The basic idea of this embodiment is to parallelize Multi2sim-2.1 with the POSIX threads programming model, simulating each context in its own Pthread thread. When deciding the structure of the parallel functional simulator (mainly where Pthread threads are created), a divide-and-conquer strategy was adopted: separate thread-creation schemes were designed and implemented for the multiprogrammed and multithreaded simulation processes, and after debugging and optimization the two were merged into the final parallel functional simulator. For a multiprogrammed workload, contexts are created when each of its applications is loaded into guest memory, so once loading finishes, as many threads as there are contexts can be created at once and simulation can start in all of them together. For a multithreaded workload, sub-contexts are generated dynamically when the main context executes a context-creation primitive, so the threads bound to them can only be created at that moment; the thread-creation step was therefore added to the simulator at the system call that implements the context-creation primitive.
For multiprogrammed workloads, thread creation in the parallel functional simulator can be implemented with code similar to the following:
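#include <pthread.h>

#define MAX_CTX 64  /* assumed upper bound on the number of contexts */

/* ke, struct ctx_t, ctx_get_status(), ctx_running, ke_run(),
   psim_cycle and max_cycles are provided by Multi2sim-2.1. */
void *ke_execute(void *args);

int main(void)
{
    struct ctx_t *current_ctx;
    pthread_t pid[MAX_CTX];
    int nthreads = 0;

    /* Create one POSIX thread per context, except the last one */
    for (current_ctx = ke->contx_list; current_ctx->contx_next;
         current_ctx = current_ctx->contx_next)
    {
        pthread_create(&pid[nthreads++], NULL, ke_execute, (void *)current_ctx);
    }
    /* The main thread simulates the last context itself */
    ke_execute((void *)current_ctx);
    /* Wait for the worker threads to finish */
    for (int context_number = 0; context_number < nthreads; context_number++)
    {
        pthread_join(pid[context_number], NULL);
    }
    return 0;
}

void *ke_execute(void *args)
{
    struct ctx_t *ctx = (struct ctx_t *)args;
    while (psim_cycle < max_cycles) {
        if (!ctx_get_status(ctx, ctx_running))
            break;
        /* Run an instruction from a dedicated context */
        ke_run((void *)ctx);
        /* If psim_cycle is shared by all threads (not __thread),
           this increment must be made atomic */
        psim_cycle++;
    }
    return NULL;
}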
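For multithreaded workloads, thread creation can be implemented with code similar to the following:
#include <pthread.h>
#include <string.h>

/* Fragment of the system-call handler (other cases omitted).
   syscall_code is the number of the guest system call being emulated;
   isa_ctx, struct ctx_t, struct mem_t, ke_execute(), cores_num,
   pid_array and pid_array_index are provided by the simulator. */
void syscall_do(int syscall_code)
{
    switch (syscall_code) {
    case syscall_code_clone:
    {
        pid_array_index++;
        /* Once the main context has cloned all cores_num - 1 sub-contexts,
           bind one host thread to each of them */
        if (pid_array_index == cores_num - 1)
        {
            struct ctx_t *current_ctx = NULL;
            int slot = 0;
            for (current_ctx = isa_ctx->contx_prev; current_ctx;
                 current_ctx = current_ctx->contx_prev) {
                pthread_t sub_pid;
                /* Give the sub-context a copy of the parent's memory descriptor */
                memcpy(current_ctx->mem, isa_ctx->mem, sizeof(struct mem_t));
                pthread_create(&sub_pid, NULL, ke_execute, (void *)current_ctx);
                /* Store each thread id in its own slot so all can be joined later */
                pid_array[slot++] = sub_pid;
            }
        }
        break;
    }
    }
}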
Once thread creation is complete, the execution model of the parallel simulator is as shown in Fig. 1. The system then contains as many threads as there are contexts; the operating system schedules each thread onto a different host processor core, and the threads concurrently perform complete functional simulation of their respective contexts.
The following describes the mechanisms used in this embodiment for protecting shared variables and synchronizing, for thread-local variable storage, and for eliminating false sharing.
Protecting concurrent access to shared resources is a key factor in the correctness and performance of concurrent programs, and locks are a common technique for serializing access to shared resources. Multi2sim-2.1 contains many shared resources, such as the context state lists, hash tables and guest memory. In the parallel simulator, when a core's thread accesses these resources, OS-level mutex locks are used to protect them, and the granularity and number of locks are chosen carefully to maximize the speedup of the parallel simulator.
In addition, the contexts of a multithreaded workload often use many synchronization primitives (such as locks and barriers). When a serial simulator executes a synchronization primitive for a context, it only needs to insert that context into the appropriate context state list in the corresponding system call; for example, when a context cannot acquire a requested lock, it is inserted into the suspended list. In a parallel simulator, however, when a context is inserted into a state list, the OS thread it runs on must also be switched to the corresponding state. To this end, OS-level synchronization operations matching each primitive's function were added to the system-call functions that implement synchronization primitives in the simulator.
Like many serial programs, the serial multi-core simulator Multi2sim-2.1 relies heavily on global variables to control the simulator's execution state. For example, to execute the instructions of different contexts through a unified interface, Multi2sim-2.1 introduces a series of global variables such as isa_ctx, isa_regs and isa_mem. In the parallel simulator, most such variables need a separate copy per context. One way to provide this is to identify these variables precisely, extend them into arrays (vectors), and index them by context id. That approach is clearly laborious and complicates writing and debugging the parallel simulator code, so the thread-local storage (TLS) language construct was used instead to parallelize this class of variables.
In gcc, TLS is obtained by placing the __thread keyword before the declaration of a global or static local variable, which means each thread automatically gets its own copy of the variable when it is created. In this way, variables that describe a single core (one thread) can be explicitly separated from variables shared by multiple threads, which also improves the performance of the parallel simulator.
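The isa_ctx pattern can be sketched as follows. Only the name `isa_ctx` comes from the text; `struct demo_ctx`, `isa_add`, `run_ctx` and `run_two_contexts` are hypothetical, and Multi2sim's real isa_ctx has a different type. The point is that instruction handlers keep their argument-free interface while each host thread sees its own current context.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, trimmed-down context with a small register file. */
struct demo_ctx {
    int id;
    unsigned regs[8];
};

/* Serial style: handlers reach the running context through a global
 * pointer instead of a parameter. Declaring it __thread keeps that
 * interface but gives each host thread its own "current context". */
static __thread struct demo_ctx *isa_ctx;

/* Example handler: rd = rs + rt, using only isa_ctx. */
static void isa_add(int rd, int rs, int rt)
{
    assert(isa_ctx != NULL);
    isa_ctx->regs[rd] = isa_ctx->regs[rs] + isa_ctx->regs[rt];
}

static void *run_ctx(void *p)
{
    isa_ctx = p;                /* bind this host thread to its context */
    for (int i = 0; i < 1000; i++)
        isa_add(0, 0, 1);       /* r0 = r0 + r1 */
    return NULL;
}

/* Simulates two contexts concurrently, one host thread each. */
void run_two_contexts(struct demo_ctx *a, struct demo_ctx *b)
{
    pthread_t ta, tb;
    pthread_create(&ta, NULL, run_ctx, a);
    pthread_create(&tb, NULL, run_ctx, b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
}
```

Compared with turning isa_ctx into an array indexed by context id, no handler signature changes; the only new line is the binding at the top of the thread function.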
In serial multi-core simulators, many data structures are organized as arrays with one element per processor core. From a coding standpoint this is convenient and causes no problems, but in a parallel simulator, multiple threads referencing their own array elements can cause false sharing. To mitigate this, padding is used to ensure that data accessed by different host processor cores is placed in different cache lines.
Because Multi2sim-2.1 was designed and implemented entirely as a serial program, it contains a large number of variables of the kinds described above, and they strongly affect the performance of the parallel simulator during parallelization. Considerable time was therefore spent finding such variables, and the simulator eventually reached a satisfactory speed.
The performance of the parallel functional simulator was evaluated with multiprogrammed and multithreaded workloads. The multiprogrammed workloads are combinations of applications from SPEC CPU2006; Table 1 lists all combinations used in the tests. The multithreaded test programs are from SPLASH-2; the workloads used were FFT, LU (c), RADIX and LU (n).
Table 1 Multiprogrammed workload combinations
(Table 1 is reproduced only as images in the source publication.)
Figs. 2 and 3 show the speedups obtained when the simulator runs the multiprogrammed test workloads. With a 2-core target system, the maximum speedup reaches 1.914 and the minimum is 1.478; with 4 cores, the maximum is 3.814 and the minimum 3.366; with 8 cores, the maximum is 7.618 and the minimum 7.143; with 16 cores, the maximum is 15.827 and the minimum 15.321. Fig. 3 also shows that the average speedups for 2, 4, 8 and 16 cores are 1.748, 3.644, 7.372 and 15.628, respectively.
Figs. 4 and 5 show the speedups obtained when the simulator runs the multithreaded test workloads. With a 2-core target system, the maximum speedup is 1.819 and the minimum only 1.378; with 4 cores, the maximum is 3.131 and the minimum only 1.838; with 8 cores, the maximum is 4.852 and the minimum only 2.434; with 16 cores, the maximum is 6.372 and the minimum only 4.821. Fig. 5 also shows that the average speedups for 2, 4, 8 and 16 cores are 1.692, 2.760, 3.833 and 5.292, respectively.
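The reported averages can be normalized into parallel efficiency (speedup divided by the number of simulated cores). This assumes one host thread per target core, enough host cores to run them, and speedups measured against the serial Multi2sim-2.1 baseline; the arrays simply copy the averages reported above.

```c
#include <assert.h>
#include <stdio.h>

/* Average speedups reported above for 2, 4, 8 and 16 target cores. */
static const int    cores[4]           = {2, 4, 8, 16};
static const double multiprog_avg[4]   = {1.748, 3.644, 7.372, 15.628};
static const double multithread_avg[4] = {1.692, 2.760, 3.833, 5.292};

/* Parallel efficiency: speedup per host thread used. */
double efficiency(double speedup, int threads)
{
    assert(threads > 0);
    return speedup / threads;
}

void print_efficiency_table(void)
{
    printf("cores  multiprogrammed  multithreaded\n");
    for (int i = 0; i < 4; i++)
        printf("%5d  %15.3f  %13.3f\n", cores[i],
               efficiency(multiprog_avg[i], cores[i]),
               efficiency(multithread_avg[i], cores[i]));
}
```

Computed this way, the multiprogrammed averages stay at roughly 87% to 98% efficiency, while the multithreaded averages fall from about 85% at 2 cores to about 33% at 16 cores, which matches the communication-overhead explanation that follows.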
When running multithreaded workloads, the parallel functional simulator achieves lower speedups than when running multiprogrammed workloads, mainly because of inter-thread communication. In a multiprogrammed workload the threads hardly communicate, so near-ideal speedups are obtained. The threads of a multithreaded workload, however, communicate extensively; as the number of cores grows, this overhead becomes very large and lowers the overall performance of the parallel functional simulator.
The above embodiment merely illustrates the technical concept and features of the invention; its purpose is to enable those skilled in the art to understand and implement the invention, and it does not limit the scope of protection. All equivalent changes or modifications made according to the spirit of the invention shall fall within the scope of protection of the invention.

Claims (10)

1. the parallel function simulation system of a chip multi-core processor, comprise system's load module and system's output module, it is characterized in that described system also comprises the simulation kernel module, described simulation kernel module is accepted the workload information moved on the goal systems that system's load module provides, described simulation kernel module is carried out the parallelization processing of simulation work load according to the type dynamic creation multithreading of operating load, and passes through system's output module output.
2. the parallel function simulation system of chip multi-core processor according to claim 1 is characterized in that described simulation kernel module comprises multiprogramming dummy load processing module and multithread programs dummy load processing module.
3. the parallel function simulation system of chip multi-core processor according to claim 2, it is characterized in that described multiprogramming dummy load processing module is each application assigned context according to configuration file, and context is organized into creates each contextual thread behind the context chained list.
4. the parallel function simulation system of chip multi-core processor according to claim 2 is characterized in that described multithread programs dummy load processing module creates top layer that main thread runs on goal systems and carry out after the initialization contextual information invoke system call dynamic creation experimental process thread that inserts in the chained list based on context.
5. The parallel function simulation system for an on-chip multi-core processor according to claim 1, characterized in that the simulation system further comprises a shared-variable protection module; when multiple threads access a shared variable concurrently, the shared-variable protection module serializes the concurrent accesses using operating-system-level mutex locks and barrier operations.
6. The parallel function simulation system for an on-chip multi-core processor according to claim 1, characterized in that the simulation system further comprises a thread-local storage module, which provides each thread with its own copy of the global variables when the thread is created.
7. The parallel function simulation system for an on-chip multi-core processor according to claim 1, characterized in that the simulation system further comprises a data padding module, which pads data accessed by different host processor cores so that the data are placed in different cache lines.
8. A parallel function simulation method for an on-chip multi-core processor, characterized in that the method comprises the following steps:
(1) the simulation kernel module receives, from the system input module, information on the workload run on the target system, and loads context information from the workload's context configuration file;
(2) the simulation kernel module creates a corresponding number of threads according to the loaded context information;
(3) the threads dynamically created by the simulation kernel module run in parallel with the main thread, each executing the instructions in its corresponding context; the system output is then produced and the system simulation is completed.
9. The method according to claim 8, characterized in that, when the threads dynamically created by the simulation kernel module synchronize with the main thread, synchronized access to shared variables is achieved through operating-system-level mutex locks and barrier operations.
10. The method according to claim 8, characterized in that the threads created by the simulation kernel module privatize global variables through the thread-local storage module, and the data padding module uses padding to ensure that data on different host processor cores are allocated to different cache lines.
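Claims 3 and 8 describe the core mechanism only in outline. Contexts loaded from a configuration file are kept in a linked list, and the simulation kernel creates one host thread per context to execute that context's instructions in parallel. The C sketch below illustrates this pattern under two assumptions: the host provides POSIX threads, and a toy "instruction" simply adds a value to the context's state.

All names in the sketch (`sim_context`, `context_list_push`, `run_context`, `simulate_contexts`) are hypothetical and do not come from the patent. Here the main host thread only spawns and joins the workers, whereas the claimed method also lets the main thread run in parallel with them.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simulated context: one per target application (multi-program case). */
typedef struct sim_context {
    int id;                   /* context identifier, e.g. read from a config file */
    const int *instructions;  /* toy instruction stream: each entry is added to result */
    size_t n_instructions;
    long result;              /* toy architectural state after execution */
    pthread_t thread;         /* host thread that simulates this context */
    struct sim_context *next; /* next node in the context linked list */
} sim_context;

/* Prepend a new context to the linked list and return the new head. */
static sim_context *context_list_push(sim_context *head, int id,
                                      const int *instructions, size_t n) {
    assert(instructions != NULL || n == 0);
    sim_context *ctx = calloc(1, sizeof *ctx);
    if (ctx == NULL)
        abort(); /* sketch: no recovery from allocation failure */
    ctx->id = id;
    ctx->instructions = instructions;
    ctx->n_instructions = n;
    ctx->next = head;
    return ctx;
}

/* Thread body: "execute" every instruction of one context. */
static void *run_context(void *arg) {
    sim_context *ctx = arg;
    long acc = 0;
    for (size_t i = 0; i < ctx->n_instructions; i++)
        acc += ctx->instructions[i];
    ctx->result = acc;
    return NULL;
}

/* Create one host thread per context, then join them all.
   Returns the number of contexts simulated, or -1 if a thread could not be created. */
static int simulate_contexts(sim_context *head) {
    int created = 0, failed = 0;
    for (sim_context *c = head; c != NULL; c = c->next) {
        if (pthread_create(&c->thread, NULL, run_context, c) != 0) {
            failed = 1;
            break;
        }
        created++;
    }
    /* Join exactly the threads that were started, in list order. */
    sim_context *c = head;
    for (int i = 0; i < created; i++, c = c->next)
        pthread_join(c->thread, NULL);
    return failed ? -1 : created;
}

/* Release every node of the context list. */
static void context_list_free(sim_context *head) {
    while (head != NULL) {
        sim_context *next = head->next;
        free(head);
        head = next;
    }
}
```

For example, push two contexts whose instruction streams are {1, 2, 3} and {10, 20}, then call `simulate_contexts`. It starts two threads and leaves the contexts' `result` fields at 6 and 30 respectively.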
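Claims 5 and 9 state that concurrent accesses to shared variables are serialized with operating-system-level mutex locks and barrier operations. One plausible reading, sketched below with POSIX `pthread_mutex_t` and `pthread_barrier_t`, works as follows:

* each simulation thread updates a shared counter under a mutex;
* all threads meet at a barrier after every simulation quantum.

The quantum structure and the names `shared_state`, `sim_thread`, and `run_synchronized` are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared simulator state touched by every simulation thread. */
typedef struct {
    pthread_mutex_t lock;      /* serializes access to shared_counter */
    pthread_barrier_t barrier; /* all threads meet here after each quantum */
    long shared_counter;       /* shared variable, e.g. a global instruction count */
    int quanta;                /* number of simulation quanta to run */
    int insns_per_quantum;     /* toy instructions per thread per quantum */
} shared_state;

static void *sim_thread(void *arg) {
    shared_state *s = arg;
    for (int q = 0; q < s->quanta; q++) {
        for (int i = 0; i < s->insns_per_quantum; i++) {
            pthread_mutex_lock(&s->lock);
            s->shared_counter++; /* concurrent accesses are serialized here */
            pthread_mutex_unlock(&s->lock);
        }
        /* No thread starts quantum q + 1 until every thread has finished quantum q. */
        pthread_barrier_wait(&s->barrier);
    }
    return NULL;
}

/* Run n_threads synchronized simulation threads and return the final shared
   counter, or -1 if the mutex or barrier cannot be initialized. */
static long run_synchronized(int n_threads, int quanta, int insns_per_quantum) {
    enum { MAX_THREADS = 64 };
    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    pthread_t tids[MAX_THREADS];
    shared_state s = { .quanta = quanta, .insns_per_quantum = insns_per_quantum };
    if (pthread_mutex_init(&s.lock, NULL) != 0)
        return -1;
    if (pthread_barrier_init(&s.barrier, NULL, (unsigned)n_threads) != 0) {
        pthread_mutex_destroy(&s.lock);
        return -1;
    }
    for (int i = 0; i < n_threads; i++)
        if (pthread_create(&tids[i], NULL, sim_thread, &s) != 0)
            abort(); /* sketch: a partial thread group would deadlock at the barrier */
    for (int i = 0; i < n_threads; i++)
        pthread_join(tids[i], NULL);
    pthread_barrier_destroy(&s.barrier);
    pthread_mutex_destroy(&s.lock);
    return s.shared_counter;
}
```

Because every increment happens under the lock, `run_synchronized(4, 10, 100)` returns exactly 4000 regardless of how the threads interleave. Locking on every toy instruction keeps the sketch simple; a practical simulator would probably batch updates to reduce lock contention.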
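Claims 6, 7 and 10 add two measures against interference between host threads:

* privatizing global variables through thread-local storage;
* padding data so that data accessed by different host cores lands in different cache lines, which avoids false sharing.

The minimal C11 sketch below uses `_Thread_local` for the private copy, and `_Alignas` plus explicit padding for the per-core slots. The 64-byte `CACHE_LINE` value is an assumption about the host. The names `tls_insn_count`, `padded_stat`, `core_stats`, `worker`, and `run_private` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define CACHE_LINE 64 /* assumed host cache-line size in bytes */
#define MAX_CORES 16

/* Privatized "global": each thread gets its own instance, initialized to 0. */
static _Thread_local long tls_insn_count = 0;

/* Per-core statistic padded to exactly one cache line, so that slots written
   by different host threads never share a line (no false sharing). */
typedef struct {
    _Alignas(CACHE_LINE) long retired;
    char pad[CACHE_LINE - sizeof(long)];
} padded_stat;
_Static_assert(sizeof(padded_stat) == CACHE_LINE, "one slot per cache line");

static padded_stat core_stats[MAX_CORES]; /* slot i is written only by thread i */

typedef struct {
    int core;      /* which simulated core this thread models */
    long n;        /* toy instructions to simulate */
    long tls_seen; /* value of the thread's private counter at exit */
} worker_arg;

static void *worker(void *p) {
    worker_arg *a = p;
    for (long i = 0; i < a->n; i++) {
        tls_insn_count++;              /* private copy: no lock needed */
        core_stats[a->core].retired++; /* touches only this core's cache line */
    }
    a->tls_seen = tls_insn_count;
    return NULL;
}

/* Run one worker per simulated core; returns 0 on success, -1 if a thread
   could not be created (threads that did start are still joined). */
static int run_private(int n_cores, const long *insns, worker_arg *args) {
    assert(n_cores > 0 && n_cores <= MAX_CORES);
    pthread_t tids[MAX_CORES];
    int started = 0;
    for (; started < n_cores; started++) {
        core_stats[started].retired = 0;
        args[started] = (worker_arg){ .core = started, .n = insns[started] };
        if (pthread_create(&tids[started], NULL, worker, &args[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == n_cores ? 0 : -1;
}
```

Try instruction counts {1000, 2000, 3000, 4000}. Each worker finishes with its private counter equal to its own count, not the total, and the main thread's instance stays 0.

Note that `_Thread_local` initializes each thread's instance from the static initializer, not from the creating thread's current value. If the patent intends a copy of live values taken at thread creation, that would require an explicit copy in the thread's start routine.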
CN 201010103887 2010-01-28 2010-01-28 Parallel function simulation system for on-chip multi-core processor and method thereof Expired - Fee Related CN101777007B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010103887 CN101777007B (en) 2010-01-28 2010-01-28 Parallel function simulation system for on-chip multi-core processor and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010103887 CN101777007B (en) 2010-01-28 2010-01-28 Parallel function simulation system for on-chip multi-core processor and method thereof

Publications (2)

Publication Number Publication Date
CN101777007A true CN101777007A (en) 2010-07-14
CN101777007B CN101777007B (en) 2013-04-10

Family

ID=42513477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010103887 Expired - Fee Related CN101777007B (en) 2010-01-28 2010-01-28 Parallel function simulation system for on-chip multi-core processor and method thereof

Country Status (1)

Country Link
CN (1) CN101777007B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102279766A (en) * 2011-08-30 2011-12-14 华为技术有限公司 Method and system for concurrently simulating processors and scheduler
CN102331961A (en) * 2011-09-13 2012-01-25 华为技术有限公司 Method, system and dispatcher for simulating multiple processors in parallel
CN102467406A (en) * 2010-11-09 2012-05-23 无锡江南计算技术研究所 Simulation method and simulator in multi-processor structure
CN102591759A (en) * 2011-12-29 2012-07-18 中国科学技术大学苏州研究院 Clock precision parallel simulation system for on-chip multi-core processor
CN102880770A (en) * 2012-10-29 2013-01-16 无锡江南计算技术研究所 Central processing unit (CPU) access sequence simulation model based on macro-instruction queue
CN103049310A (en) * 2012-12-29 2013-04-17 中国科学院深圳先进技术研究院 Multi-core simulation parallel accelerating method based on sampling
CN103472734A (en) * 2013-09-18 2013-12-25 南车株洲电力机车研究所有限公司 Semi-physical simulation method and system of urban rail traction system
US9639636B1 (en) * 2012-04-02 2017-05-02 Google Inc. Algorithmically driven selection of parallelization technique for running model simulation
CN107980118A (en) * 2015-06-10 2018-05-01 无比视视觉技术有限公司 Multi-core processor device using multi-thread processing
CN109460677A (en) * 2018-11-12 2019-03-12 湖南中车时代通信信号有限公司 Data storage system using multitasking in an embedded environment
CN112463716A (en) * 2020-11-27 2021-03-09 中船重工(武汉)凌久电子有限责任公司 Global semaphore implementation method based on multi-core multi-processor parallel system
CN113360280A (en) * 2021-06-02 2021-09-07 西安中锐创联科技有限公司 Simulation curve display method based on multi-thread operation and dynamic global variable processing

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US6142682A (en) * 1997-06-13 2000-11-07 Telefonaktiebolaget Lm Ericsson Simulation of computer processor
US7392165B2 (en) * 2002-10-21 2008-06-24 Fisher-Rosemount Systems, Inc. Simulation system for multi-node process control systems
CN101256502B (en) * 2007-02-27 2011-02-09 国际商业机器公司 System and method for simulating multiprocessor system

Cited By (23)

Publication number Priority date Publication date Assignee Title
CN102467406A (en) * 2010-11-09 2012-05-23 无锡江南计算技术研究所 Simulation method and simulator in multi-processor structure
CN102467406B (en) * 2010-11-09 2014-04-16 无锡江南计算技术研究所 Simulation method and simulator in multi-processor structure
CN102279766B (en) * 2011-08-30 2014-05-07 华为技术有限公司 Method and system for concurrently simulating processors and scheduler
CN102279766A (en) * 2011-08-30 2011-12-14 华为技术有限公司 Method and system for concurrently simulating processors and scheduler
WO2013029513A1 (en) * 2011-08-30 2013-03-07 华为技术有限公司 Method and system, scheduler for parallel simulating processors
US9703905B2 (en) 2011-09-13 2017-07-11 Huawei Technologies Co., Ltd. Method and system for simulating multiple processors in parallel and scheduler
CN102331961B (en) * 2011-09-13 2014-02-19 华为技术有限公司 Method, system and dispatcher for simulating multiple processors in parallel
CN102331961A (en) * 2011-09-13 2012-01-25 华为技术有限公司 Method, system and dispatcher for simulating multiple processors in parallel
CN102591759A (en) * 2011-12-29 2012-07-18 中国科学技术大学苏州研究院 Clock precision parallel simulation system for on-chip multi-core processor
CN102591759B (en) * 2011-12-29 2014-08-13 中国科学技术大学苏州研究院 Clock precision parallel simulation system for on-chip multi-core processor
US9639636B1 (en) * 2012-04-02 2017-05-02 Google Inc. Algorithmically driven selection of parallelization technique for running model simulation
CN102880770A (en) * 2012-10-29 2013-01-16 无锡江南计算技术研究所 Central processing unit (CPU) access sequence simulation model based on macro-instruction queue
CN103049310B (en) * 2012-12-29 2016-12-28 中国科学院深圳先进技术研究院 Multi-core simulation parallel acceleration method based on sampling
CN103049310A (en) * 2012-12-29 2013-04-17 中国科学院深圳先进技术研究院 Multi-core simulation parallel accelerating method based on sampling
CN103472734A (en) * 2013-09-18 2013-12-25 南车株洲电力机车研究所有限公司 Semi-physical simulation method and system of urban rail traction system
CN107980118A (en) * 2015-06-10 2018-05-01 无比视视觉技术有限公司 Multi-core processor device using multi-thread processing
CN107980118B (en) * 2015-06-10 2021-09-21 无比视视觉技术有限公司 Multi-core processor device using multi-thread processing
US11294815B2 (en) 2015-06-10 2022-04-05 Mobileye Vision Technologies Ltd. Multiple multithreaded processors with shared data cache
CN109460677A (en) * 2018-11-12 2019-03-12 湖南中车时代通信信号有限公司 Data storage system using multitasking in an embedded environment
CN112463716A (en) * 2020-11-27 2021-03-09 中船重工(武汉)凌久电子有限责任公司 Global semaphore implementation method based on multi-core multi-processor parallel system
CN112463716B (en) * 2020-11-27 2024-02-13 中船重工(武汉)凌久电子有限责任公司 Global semaphore implementation method based on multi-core multi-processor parallel system
CN113360280A (en) * 2021-06-02 2021-09-07 西安中锐创联科技有限公司 Simulation curve display method based on multi-thread operation and dynamic global variable processing
CN113360280B (en) * 2021-06-02 2023-11-28 西安中锐创联科技有限公司 Simulation curve display method based on multithread operation and dynamic global variable processing

Also Published As

Publication number Publication date
CN101777007B (en) 2013-04-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130410

Termination date: 20180128