CN101777007A - Parallel function simulation system for on-chip multi-core processor and method thereof - Google Patents
- Publication number
- CN101777007A
- Authority
- CN
- China
- Prior art keywords
- module
- thread
- simulation
- load
- context
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a parallel functional simulation system for an on-chip multi-core processor and a method thereof. The system comprises a system input module and a system output module, and is characterized in that a simulation kernel module is arranged between the system input module and the system output module. The simulation kernel module receives, from the system input module, information about the workload that runs on the target system; it dynamically creates multiple threads according to the type of the workload to process the simulated workload in parallel, and outputs the result through the system output module. The invention solves the performance degradation that serial functional simulation suffers as the number of cores in the target system increases. The system of the invention achieves a high speedup and relatively high overall performance.
Description
Technical field
The invention belongs to the field of processor simulation for information processing systems, and specifically relates to a parallel functional simulation system for an on-chip multi-core processor and a method thereof.
Background technology
Computer simulation uses software to model the behavior of a computer system. With simulation software, researchers can analyze the performance and behavior of a new architecture without building a prototype system, which greatly reduces the time and cost of research. Over the past decade, industry and academia have applied simulation widely in the research and development of computer hardware and software architectures. With the arrival of the multi-core era, simulation will become increasingly important in the design of multi-core processors.
At present, most multi-core simulators are serial simulators that run only on a single main thread. As the number of cores in the target system grows, the performance of such a simulator becomes worse and worse. In the near future, Moore's Law will shift from a doubling of the number of on-chip transistors every 18 months to a doubling of the number of on-chip hardware threads every 18 months.
However, as the number of on-chip cores increases, the amount of state and the code space involved in simulation also grow, which lengthens simulation time. It may also cause a sharp rise in L2 cache misses, which increases the number of cycles the simulation takes. Consequently, as the core count of the target system grows, simulating a multi-core target system on a multi-core processor becomes more and more important.
Functional simulation, also called the simulation kernel, is an important tool in computer system simulation. It is usually designed as a highly self-contained library that provides interfaces to the other parts of the simulator. In general, the simulation kernel can create and destroy software contexts, load programs, simulate the contexts that currently exist, query context state, execute machine instructions, and handle predicted (speculative) behavior. In a serial simulation kernel, only one thread runs on the operating system kernel. Typically, the simulation kernel reads the contexts from a context configuration file and then assigns each trace to a context. The main thread then executes the instructions in these contexts. The serial simulator has an inherent defect, however: as the number of cores in the target system increases, the overall performance of the simulator declines.
Summary of the invention
The object of the invention is to provide a parallel functional simulation system for an on-chip multi-core processor that solves the performance degradation which, in prior-art serial functional simulation, results from an increase in the number of cores of the target system.
To solve these problems of the prior art, the invention provides the following technical scheme:
A parallel functional simulation system for an on-chip multi-core processor comprises a system input module and a system output module, and is characterized in that the system further comprises a simulation kernel module. The simulation kernel module receives, from the system input module, information about the workload that runs on the target system; it dynamically creates multiple threads according to the type of the workload to process the simulated workload in parallel, and outputs the result through the system output module.
Preferably, the simulation kernel module comprises a processing module for multiprogrammed simulation workloads and a processing module for multithreaded-program simulation workloads.
Preferably, the multiprogrammed workload processing module allocates a context to each application according to the configuration file, organizes the contexts into a context linked list, and then creates a thread for each context.
Preferably, the multithreaded workload processing module creates a main thread that runs at the top level of the target system; after initialization, the main thread invokes system calls to dynamically create a number of child threads according to the context information inserted into the context linked list.
Preferably, the simulation system further comprises a shared-variable protection module. When multiple threads access a shared variable concurrently, the shared-variable protection module uses operating-system-level mutex locks and barrier operations to serialize the concurrent accesses.
Preferably, the simulation system further comprises a thread-local storage module, which gives each thread its own copy of a global variable when the thread is created.
Preferably, the simulation system further comprises a data padding module, which uses padding to place data accessed by different host processor cores in different cache lines.
Another object of the invention is to provide a parallel functional simulation method for an on-chip multi-core processor, characterized in that the method comprises the following steps:
(1) the simulation kernel module receives, from the system input module, information about the workload that runs on the target system, and loads context information from the context configuration file of the workload;
(2) the simulation kernel module creates the corresponding number of threads according to the loaded context information;
(3) the threads dynamically created by the simulation kernel module run in parallel with the main thread, each executing the instructions of its corresponding context; the system output is then produced and the simulation completes.
Preferably, in the method, when the threads dynamically created by the simulation kernel module synchronize with the main thread, access to shared variables is synchronized by operating-system-level mutex locks and barrier operations.
Preferably, in the method, the threads created by the simulation kernel module privatize global variables through the thread-local storage module, and the data padding module uses padding to ensure that data belonging to different host processor cores is allocated in different cache lines.
Through long-term research, the inventors developed a parallelized functional simulator on the basis of an existing serial simulator. Starting from the code of the serial functional simulator, the parallel functional simulator of the invention uses parallel programming techniques to parallelize the relevant code. Parallelization speeds up simulation directly and effectively.
The parallelized functional simulator works as follows: the simulation kernel loads context information from the context configuration file; it creates the corresponding number of threads according to the loaded context information; the created threads run in parallel with the main thread and execute the instructions of their corresponding contexts until the simulation finishes.
However, naively parallelizing a serial simulator does not by itself yield a working parallel functional simulator. During implementation, the inventors ran into the following problems that had to be solved:
First, parallelization of the simulation kernel: while the simulator runs, the number of threads created is usually determined by the type of workload running on the target system. The simulation kernel is therefore parallelized differently depending on the type of simulated workload, covering multiprogrammed workloads and multithreaded workloads.
Second, protection of shared variables and synchronization: parallelization requires synchronization between threads, and these synchronization operations degrade performance. Shared resources must therefore be protected, and threads synchronized, with care.
Third, privatization of global variables: some state variables are global in the serial simulator, but in the parallel simulator they may belong to a single core, so these global variables must be privatized.
Fourth, false sharing: parallelizing the simulation kernel introduces false sharing, which can greatly affect the performance of the parallel simulator.
After long-term research, the inventors arrived at solutions to these problems, as follows:
(1) Parallelization of the simulation kernel:
In general, a serial functional simulation kernel configures the execution paths of a multiprogrammed workload by loading context information from a configuration file. Before execution, the kernel allocates a context to each application and then organizes these contexts directly into a context linked list. Because threads in a multiprogrammed workload rarely synchronize with one another, and each context usually corresponds to one thread, the corresponding threads can be created once the context linked list has been formed.
A multithreaded workload starts with only one context (usually called the main thread). At initialization this thread runs at the top level of the target system, but while running it can invoke system calls to create many child threads. The contexts of these child threads must be inserted into the context linked list dynamically. Therefore a corresponding host thread must be created whenever a child thread is created, and the kernel must be extended to support dynamic thread creation.
(2) Protection of shared variables and synchronization:
When parallelizing a serial program, protecting the shared data touched by concurrent operations is extremely important. Locks are commonly used to serialize these concurrent accesses. The parallel functional simulator has many kinds of shared variables, such as hash tables, the shared memory space, and the context linked list. When multiple threads access a shared variable concurrently, operating-system-level mutex locks provide the necessary protection. When using locks, however, attention must be paid to lock granularity and the number of locks in order to avoid deadlock and achieve good performance.
Moreover, synchronization operations are often used between the different contexts of a multithreaded program. In the serial simulator, performing a synchronization operation only requires placing the corresponding context on the appropriate suspended list. In the parallel simulator, however, when a synchronization operation occurs, both the operating system thread and its corresponding context must be suspended. This can be achieved with operating-system-level locks and barrier operations.
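The combination of a mutex lock and a barrier described above can be sketched with POSIX threads as follows. This is a minimal illustration rather than code from the simulator: the names shared_updates, protect_worker, and run_protected_updates, the worker count, and the use of a plain counter in place of a real shared structure (hash table, context linked list) are all assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 4            /* hypothetical number of host threads */
#define UPDATES_PER_WORKER 10000 /* hypothetical amount of work per thread */

static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t phase_barrier;
static long shared_updates;      /* stands in for a shared structure */

static void *protect_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < UPDATES_PER_WORKER; i++) {
        pthread_mutex_lock(&shared_lock);   /* serialize access to shared state */
        shared_updates++;
        pthread_mutex_unlock(&shared_lock);
    }
    /* No worker proceeds until every worker has finished its updates. */
    pthread_barrier_wait(&phase_barrier);
    return NULL;
}

/* Runs all workers and returns the final value of the shared counter. */
long run_protected_updates(void)
{
    pthread_t tid[NUM_WORKERS];
    int rc;

    shared_updates = 0;
    rc = pthread_barrier_init(&phase_barrier, NULL, NUM_WORKERS);
    assert(rc == 0);
    for (int i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_create(&tid[i], NULL, protect_worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&phase_barrier);
    return shared_updates;
}
```

With the lock in place, run_protected_updates() returns NUM_WORKERS * UPDATES_PER_WORKER. A single coarse lock keeps the sketch short; as the text notes, a real simulator has to choose lock granularity with more care.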
(3) Privatization of global variables:
In the serial simulator, much of the simulator state is shared in the form of global variables. In the parallel simulator, these variables need multiple copies to reflect the state of the different cores. Simply converting them into vectors would increase overall complexity. Thread-local storage is commonly used to solve this problem during parallelization.
Thread-local storage is implemented as follows: in gcc, to indicate that every thread gets its own copy of a variable when the thread is created, the __thread keyword is placed before the declaration of the global or static variable.
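A small sketch of the __thread keyword in use is given here for illustration. The variable current_ctx_id and the functions bind_context and tls_demo are hypothetical names chosen for the example; they are not taken from the simulator.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-thread simulator state; each thread gets its own copy. */
static __thread int current_ctx_id = -1;

static void *bind_context(void *arg)
{
    int *slot = arg;
    current_ctx_id = *slot;        /* writes only this thread's copy */
    *slot = current_ctx_id * 10;   /* report a value derived from the private copy */
    return NULL;
}

/* Returns the calling thread's copy after two other threads wrote theirs. */
int tls_demo(void)
{
    pthread_t tid[2];
    int slots[2] = {1, 2};
    int rc;

    current_ctx_id = 99;           /* the calling thread's own copy */
    for (int i = 0; i < 2; i++) {
        rc = pthread_create(&tid[i], NULL, bind_context, &slots[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);

    assert(slots[0] == 10 && slots[1] == 20);
    return current_ctx_id;         /* unchanged by the other threads */
}
```

Because every thread writes only its own copy, tls_demo() returns 99 even though both child threads assigned to current_ctx_id. No array indexed by context id is needed.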
(4) Solution to the false sharing problem:
In serial simulators of multi-core processors, many data structures take the form of an array of structures. If each core owns one element of such an array, parallelization may cause false sharing. To solve this problem, padding can be used to ensure that data belonging to different cores is allocated in different cache lines.
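The padding technique can be sketched as follows. The 64-byte cache line size is an assumption that should be checked against the host CPU, and struct core_counter, MAX_CORES, and the helper functions are hypothetical names used only for this illustration. The aligned attribute is a GCC extension, in keeping with the gcc-based setting described in the text.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64   /* assumed line size; check the host CPU */
#define MAX_CORES 16         /* hypothetical number of simulated cores */

/* One element per core, padded so that each element fills a whole cache line. */
struct core_counter {
    unsigned long inst_count;
    char pad[CACHE_LINE_SIZE - sizeof(unsigned long)];
};

/* Align the array so that element boundaries coincide with line boundaries. */
static struct core_counter counters[MAX_CORES]
    __attribute__((aligned(CACHE_LINE_SIZE)));

void count_instruction(int core)
{
    assert(core >= 0 && core < MAX_CORES);
    counters[core].inst_count++;   /* touches only this core's cache line */
}

unsigned long instructions_on(int core)
{
    assert(core >= 0 && core < MAX_CORES);
    return counters[core].inst_count;
}

/* Distance in bytes between neighbouring elements of the array. */
size_t counter_stride(void)
{
    return (size_t)((char *)&counters[1] - (char *)&counters[0]);
}
```

Without the pad member, eight 8-byte counters would share one 64-byte line, and every increment by one thread would invalidate that line in the caches of the other cores. The cost of the fix is memory: each counter now occupies a full line.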
Compared with prior-art schemes, the advantage of the invention is:
Simulation experiments confirm that, compared with the serial simulator, the parallel functional simulation system of the invention achieves a high speedup and therefore higher overall performance.
Description of drawings
The invention is further described below with reference to the drawings and embodiments:
Fig. 1 is a schematic diagram of the execution model of the parallel functional simulator;
Fig. 2 shows the speedups obtained when the parallel functional simulator runs multiprogrammed workloads;
Fig. 3 shows the average speedups obtained when the parallel functional simulator runs multiprogrammed workloads;
Fig. 4 shows the speedups obtained when the parallel functional simulator runs multithreaded workloads;
Fig. 5 shows the average speedups obtained when the parallel functional simulator runs multithreaded workloads.
Embodiment
The above scheme is further described below with reference to a specific embodiment. It should be understood that the embodiment illustrates the invention and does not limit its scope. The implementation conditions used in the embodiment may be further adjusted to the conditions of a particular manufacturer; implementation conditions that are not specified are generally those of routine experiments.
Embodiment: implementation and testing of the parallel functional simulation system
This embodiment implements the parallel functional simulator on the basis of the serial simulator Multi2sim-2.1. The server used throughout the implementation is a Dawning Tianyan EP850-GF minicomputer with the following configuration: eight quad-core AMD Opteron 8346 HE 1.8 GHz CPUs, 32 GB of DDR2 ECC memory, and 4 x 146 GB SAS hard disks. The server runs Debian Linux (x86-64).
Experiments show that when the serial simulator Multi2sim-2.1 simulates an 8-core target system, the simulation slowdown reaches 18.24; when it simulates a 16-core target system, the slowdown reaches as much as 165.24. As the number of target cores grows, the slowdown of the serial simulator increases sharply. The simulator must therefore be accelerated through parallelization to improve its overall performance.
In this embodiment, the serial simulation engine first loads a multiprogrammed or multithreaded workload and then creates multiple contexts. The way contexts are created depends on the workload type. For a multiprogrammed workload, a context is created as each application in the workload is loaded into guest memory. For a multithreaded workload, only a main context is created at load time; the remaining contexts are generated dynamically when the main context executes the corresponding thread-creation primitive. The simulation engine maintains a set of state linked lists (active, suspended, and so on) for all created contexts and executes the instructions of each active context in turn, possibly modifying context state along the way; functional simulation ends when all contexts have finished executing their instructions. This description shows that multi-core functional simulation has natural symmetry and isolation. Note also that functional simulation does not involve simulating micro-architectural structures such as pipelines and caches. Both properties greatly simplify the design and implementation of the parallel functional simulator.
The basic idea of this embodiment is to parallelize Multi2sim-2.1 with the POSIX multithreaded programming model, that is, to run the simulation of each context in its own Pthread thread. When determining the structure of the parallel functional simulator (chiefly, where the Pthread threads are created), a divide-and-conquer strategy was adopted: separate thread-creation schemes were designed and implemented for multiprogrammed and multithreaded workloads according to their different simulation processes, and after debugging and optimization the two were merged into the final parallel functional simulator. For a multiprogrammed workload, contexts are created as each application in the workload is loaded into guest memory, so once loading finishes, a number of threads equal to the number of contexts can be created at once and simulation started in unison. For a multithreaded workload, child contexts are generated dynamically when the main context executes the context-creation primitive, so the thread bound to each child context cannot be created until that moment; for this reason, thread creation was added to the system call in the simulator that implements the context-creation primitive.
For a multiprogrammed workload, thread creation in the parallel functional simulator can be implemented with code similar to the following:
int main(void)
{
    int i = 0;

    /* Create one POSIX thread per context, except the last one in the list. */
    for (current_ctx = ke->contx_list; current_ctx->contx_next;
         current_ctx = current_ctx->contx_next)
    {
        pthread_create(&pid[i++], NULL, ke_execute, (void *)current_ctx);
    }

    /* The main thread simulates the last context itself. */
    ke_execute((void *)current_ctx);

    /* Wait for the created threads to finish. */
    for (context_number = 0; context_number < ctxnum1; context_number++)
    {
        pthread_join(pid[context_number], NULL);
    }
    return 0;
}

void *ke_execute(void *args)
{
    struct ctx_t *ctx = (struct ctx_t *)args;

    while (psim_cycle < max_cycles) {
        if (!ctx_get_status(ctx, ctx_running))
            break;
        /* Run an instruction from a dedicated context */
        ke_run((void *)ctx);
        psim_cycle++;
    }
    return NULL;
}
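For a multithreaded workload, thread creation in the parallel functional simulator can be implemented with code similar to the following:
void syscall_do()
{
    /* The dispatch variable name syscall_code is assumed; the original listing shows only the case label. */
    switch (syscall_code) {
    case syscall_code_clone:
    {
        pid_array_index++;
        if (pid_array_index == cores_num - 1)
        {
            struct ctx_t *current_ctx = NULL;

            /* Walk back through the context list and start one thread per child context. */
            for (current_ctx = isa_ctx->contx_prev; current_ctx;
                 current_ctx = current_ctx->contx_prev) {
                pthread_t sub_pid;

                /* Copy the memory descriptor of the current context into the child context. */
                memcpy(current_ctx->mem, isa_ctx->mem, sizeof(struct mem_t));
                pthread_create(&sub_pid, NULL, ke_execute, (void *)current_ctx);
                pid_array[pid_array_index] = sub_pid;
            }
        }
        break;
    }
    }
}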
After thread creation completes, the execution model of the parallel simulator is as shown in Fig. 1. The system now contains a number of threads equal to the number of contexts; the operating system schedules each thread onto a different host processor core, and each thread concurrently carries out the complete functional simulation of its context.
The implementation of this embodiment specifically involves the protection and synchronization of shared variables, thread-local variable storage, and a mechanism for eliminating false sharing, as described in the following paragraphs.
Protecting concurrent access to shared resources is a key factor in the correctness and performance of a parallel program, and locking is one of the common techniques for serializing access to shared resources. Multi2sim-2.1 contains many shared resources, such as the context state linked lists, hash tables, and guest memory. In the parallel simulator, when core threads access these resources, operating-system-level mutex locks protect them, and the granularity and number of locks are chosen carefully to maximize the speedup of the parallel simulator.
In addition, the contexts of a multithreaded workload often use a large number of synchronization primitives (locks, barriers, and so on). When the serial simulator executes a synchronization primitive for a context, it only needs to insert that context into the appropriate context state linked list in the corresponding system call; for example, a context that cannot obtain the lock it requests is inserted into the suspended list. In the parallel simulator, however, when a context is inserted into the appropriate state list, the operating system thread that hosts it must also be switched to the matching state. To this end, an operating-system-level synchronization operation with equivalent function was added to each system call in the simulator that implements a synchronization primitive.
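One way to suspend the host thread together with its context is sketched here. It uses a POSIX condition variable, which is a substitution chosen for the illustration: the text only says that operating-system-level synchronization operations were added, and does not name the primitive. struct guest_lock, its functions, and the counters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical guest-level lock emulated by the simulator. */
struct guest_lock {
    pthread_mutex_t mu;     /* protects the held flag */
    pthread_cond_t freed;   /* signalled when the guest lock is released */
    bool held;
};

static struct guest_lock glock = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
};
static int in_critical;      /* contexts currently inside the guest critical section */
static int max_in_critical;  /* highest value in_critical ever reached */
static int completed;        /* critical sections executed so far */

void guest_lock_acquire(struct guest_lock *l)
{
    pthread_mutex_lock(&l->mu);
    while (l->held)                          /* the context is "suspended" ... */
        pthread_cond_wait(&l->freed, &l->mu); /* ... and its host thread sleeps too */
    l->held = true;
    pthread_mutex_unlock(&l->mu);
}

void guest_lock_release(struct guest_lock *l)
{
    pthread_mutex_lock(&l->mu);
    l->held = false;
    pthread_cond_signal(&l->freed);          /* wake one suspended context thread */
    pthread_mutex_unlock(&l->mu);
}

static void *context_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        guest_lock_acquire(&glock);
        in_critical++;
        if (in_critical > max_in_critical)
            max_in_critical = in_critical;
        in_critical--;
        completed++;
        guest_lock_release(&glock);
    }
    return NULL;
}

/* Returns the largest number of contexts ever inside the critical section. */
int run_guest_lock_demo(void)
{
    pthread_t tid[4];
    int rc;

    in_critical = max_in_critical = completed = 0;
    for (int i = 0; i < 4; i++) {
        rc = pthread_create(&tid[i], NULL, context_thread, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);
    assert(completed == 4000);
    return max_in_critical;
}
```

A thread that finds the guest lock taken blocks in pthread_cond_wait instead of spinning, which corresponds to moving the context to the suspended list while also putting its host thread to sleep. run_guest_lock_demo() returns 1 when mutual exclusion holds.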
Like many serial programs, the serial multi-core simulator Multi2sim-2.1 makes heavy use of global variables to control the execution state of the simulator. For example, to execute the instructions of different contexts through a unified interface function, Multi2sim-2.1 introduces a series of global variables such as isa_ctx, isa_regs, and isa_mem. In the parallel simulator, most such variables need a separate copy for each context. The straightforward approach is to identify these variables precisely, expand them into arrays (vectors), and have each context reference its own element by context id. That approach involves considerable work and complicates writing and debugging the parallel simulator code, so the thread-local storage (TLS) language construct is used instead to parallelize this class of variables.
In gcc, TLS is used by placing the __thread keyword before the declaration of a global variable or static local variable, which means that a copy of the variable is generated automatically when a thread is created. In this way, variables that represent the attributes of a single core thread can be separated by hand from variables shared by multiple threads, which improves the performance of the parallel simulator.
In the serial multi-core simulator, many data structures are organized as arrays, with one array element per processor core. This is convenient from a coding standpoint and causes no problems in a serial program, but in the parallel simulator, multiple threads referencing their own array elements can cause false sharing. To mitigate this problem, padding is used to ensure that data accessed by different host processor cores is placed in different cache lines.
Because Multi2sim-2.1 was designed and implemented entirely as a serial program, it contains a large number of variables of the kinds described above. These variables strongly affect the performance of the parallel simulator, so considerable time was spent locating them during parallelization, and the simulator finally reached a satisfactory speed.
Testing used multiprogrammed and multithreaded workloads to evaluate the performance of the parallel functional simulator. The multiprogrammed workloads are combinations of applications from SPEC2006; Table 1 lists all the multiprogrammed workload combinations used in testing. The multithreaded test programs come from Splash2; the workloads used in testing are FFT, LU (c), RADIX, and LU (n).
Table 1: Multiprogrammed workload combinations
Fig. 2 and Fig. 3 show the speedups obtained when the simulator runs the multiprogrammed test workloads. With a 2-core target system, the maximum speedup is 1.914 and the minimum is 1.478; with 4 cores, the maximum is 3.814 and the minimum is 3.366; with 8 cores, the maximum is 7.618 and the minimum is 7.143; with 16 cores, the maximum is 15.827 and the minimum is 15.321. Fig. 3 also shows that the average speedups for 2, 4, 8, and 16 cores are 1.748, 3.644, 7.372, and 15.628 respectively.
Fig. 4 and Fig. 5 show the speedups obtained when the simulator runs the multithreaded test workloads. With a 2-core target system, the maximum speedup is 1.819 and the minimum is only 1.378; with 4 cores, the maximum is 3.131 and the minimum is only 1.838; with 8 cores, the maximum is 4.852 and the minimum is only 2.434; with 16 cores, the maximum is 6.372 and the minimum is only 4.821. Fig. 5 also shows that the average speedups for 2, 4, 8, and 16 cores are 1.692, 2.760, 3.833, and 5.292 respectively.
When running multithreaded workloads, the parallel functional simulator achieves a lower speedup than when running multiprogrammed workloads, mainly because of communication between threads. In a multiprogrammed workload, threads hardly communicate, so a near-ideal speedup can be obtained. Among the threads of a multithreaded workload, however, there is a great deal of inter-thread communication. As the number of cores grows, this overhead becomes very large and reduces the overall performance of the parallel functional simulator.
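The gap between the two workload types is easier to see as parallel efficiency, that is, speedup divided by core count. The patent does not use this metric; it is a standard derived quantity added here for illustration, computed only from the average speedups reported for Fig. 3 and Fig. 5. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parallel efficiency: speedup divided by the number of cores used. */
double parallel_efficiency(double speedup, int cores)
{
    assert(cores > 0);
    return speedup / cores;
}

/* Prints the efficiency for each reported average speedup. */
void print_efficiency(void)
{
    static const int cores[] = {2, 4, 8, 16};
    /* Average speedups as reported for Fig. 3 and Fig. 5. */
    static const double multiprogrammed[] = {1.748, 3.644, 7.372, 15.628};
    static const double multithreaded[] = {1.692, 2.760, 3.833, 5.292};

    for (int i = 0; i < 4; i++)
        printf("%2d cores: multiprogrammed %.3f, multithreaded %.3f\n",
               cores[i],
               parallel_efficiency(multiprogrammed[i], cores[i]),
               parallel_efficiency(multithreaded[i], cores[i]));
}
```

On 16 cores the multiprogrammed workloads reach an efficiency of about 0.98 (15.628 / 16), while the multithreaded workloads reach about 0.33 (5.292 / 16), which matches the communication overhead described in the text.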
The above example only illustrates the technical concept and features of the invention. Its purpose is to enable those skilled in the art to understand and implement the invention; it does not limit the scope of protection of the invention. All equivalent changes or modifications made according to the spirit of the invention shall fall within the scope of protection of the invention.
Claims (10)
1. the parallel function simulation system of a chip multi-core processor, comprise system's load module and system's output module, it is characterized in that described system also comprises the simulation kernel module, described simulation kernel module is accepted the workload information moved on the goal systems that system's load module provides, described simulation kernel module is carried out the parallelization processing of simulation work load according to the type dynamic creation multithreading of operating load, and passes through system's output module output.
2. the parallel function simulation system of chip multi-core processor according to claim 1 is characterized in that described simulation kernel module comprises multiprogramming dummy load processing module and multithread programs dummy load processing module.
3. the parallel function simulation system of chip multi-core processor according to claim 2, it is characterized in that described multiprogramming dummy load processing module is each application assigned context according to configuration file, and context is organized into creates each contextual thread behind the context chained list.
4. the parallel function simulation system of chip multi-core processor according to claim 2 is characterized in that described multithread programs dummy load processing module creates top layer that main thread runs on goal systems and carry out after the initialization contextual information invoke system call dynamic creation experimental process thread that inserts in the chained list based on context.
5. the parallel function simulation system of chip multi-core processor according to claim 1; it is characterized in that described analogue system also comprises the shared variable protection module; when a plurality of threads carried out concurrent visit to shared variable simultaneously, described shared variable protection module used the mutual exclusion lock and the barrier operations of operating system grade to make concurrent access process serializing.
6. the parallel function simulation system of chip multi-core processor according to claim 1, it is characterized in that described analogue system also comprises the thread local memory module, described thread local memory module is provided at the copy that carries out global variable when creating thread in each thread.
7. the parallel function simulation system of chip multi-core processor according to claim 1, it is characterized in that described analogue system also comprises the data packing module, described data packing module is filled the data of distributing the visit of different host processor nuclear in different cache lines.
8. A parallel function simulation method for an on-chip multi-core processor, characterized in that the method comprises the following steps:
(1) the simulation kernel module receives the information, provided by the system input module, on the workload run on the target system, and loads context information from the context configuration file of the workload;
(2) the simulation kernel module creates a corresponding number of threads according to the loaded context information;
(3) the threads dynamically created by the simulation kernel module run in parallel with the main thread and execute the instructions in their corresponding contexts, after which the system output is produced and the system simulation is completed.
9. The method according to claim 8, characterized in that, when the threads dynamically created by the simulation kernel module synchronize with the main thread, synchronized access to shared variables is achieved through operating-system-level mutual exclusion locks and barrier operations.
10. The method according to claim 8, characterized in that the threads created by the simulation kernel module privatize global variables through the thread local memory module, and the data packing module uses padding to ensure that data on different host processor cores is allocated in different cache lines.
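The method of claims 8 and 9, together with the global variable privatization of claim 10, can be illustrated with a minimal sketch. It is not taken from the patent: the `Context` class, the `load_contexts` and `simulate` functions, and the toy "instructions" (integers that are simply summed) are hypothetical names chosen for illustration, and Python's `threading` primitives stand in for the operating-system-level mutual exclusion lock, barrier, and thread-local storage. Cache-line padding is not shown.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """Hypothetical stand-in for one simulated program context."""
    name: str
    instructions: list                      # toy "instructions": integers to sum
    next_ctx: Optional["Context"] = None    # link to the next context in the list


def load_contexts(config):
    """Step (1): build a context linked list from a configuration mapping."""
    head = None
    for name, instructions in reversed(list(config.items())):
        head = Context(name, list(instructions), head)
    return head


def simulate(config):
    """Steps (2) and (3): one thread per context, run in parallel with main."""
    contexts = []
    node = load_contexts(config)
    while node is not None:                 # walk the linked list
        contexts.append(node)
        node = node.next_ctx

    shared = {"retired": 0}                 # shared variable (instruction count)
    lock = threading.Lock()                 # mutual exclusion lock
    barrier = threading.Barrier(len(contexts) + 1)  # workers plus main thread
    private = threading.local()             # per-thread copy of former globals
    results = {}

    def worker(ctx):
        private.acc = 0                     # privatized state, one per thread
        for value in ctx.instructions:
            private.acc += value
            with lock:                      # serialize access to the shared counter
                shared["retired"] += 1
        with lock:
            results[ctx.name] = private.acc
        barrier.wait()                      # synchronize with the main thread

    threads = [threading.Thread(target=worker, args=(c,)) for c in contexts]
    for t in threads:
        t.start()
    barrier.wait()                          # main thread waits for all workers
    for t in threads:
        t.join()
    return results, shared["retired"]
```

Calling `simulate({"a": [1, 2, 3], "b": [10, 20]})` creates two worker threads, one per context. Each thread accumulates into its own private state, and only the shared counter and the result table are touched under the lock.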
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010103887 CN101777007B (en) | 2010-01-28 | 2010-01-28 | Parallel function simulation system for on-chip multi-core processor and method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010103887 CN101777007B (en) | 2010-01-28 | 2010-01-28 | Parallel function simulation system for on-chip multi-core processor and method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101777007A (en) | 2010-07-14 |
CN101777007B (en) | 2013-04-10 |
Family
ID=42513477
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010103887 Expired - Fee Related CN101777007B (en) | 2010-01-28 | 2010-01-28 | Parallel function simulation system for on-chip multi-core processor and method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101777007B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102279766A (en) * | 2011-08-30 | 2011-12-14 | 华为技术有限公司 | Method and system for concurrently simulating processors and scheduler |
CN102331961A (en) * | 2011-09-13 | 2012-01-25 | 华为技术有限公司 | Method, system and dispatcher for simulating multiple processors in parallel |
CN102467406A (en) * | 2010-11-09 | 2012-05-23 | 无锡江南计算技术研究所 | Simulation method and simulator in multi-processor structure |
CN102591759A (en) * | 2011-12-29 | 2012-07-18 | 中国科学技术大学苏州研究院 | Clock precision parallel simulation system for on-chip multi-core processor |
CN102880770A (en) * | 2012-10-29 | 2013-01-16 | 无锡江南计算技术研究所 | Central processing unit (CPU) access sequence simulation model based on macro-instruction queue |
CN103049310A (en) * | 2012-12-29 | 2013-04-17 | 中国科学院深圳先进技术研究院 | Multi-core simulation parallel accelerating method based on sampling |
CN103472734A (en) * | 2013-09-18 | 2013-12-25 | 南车株洲电力机车研究所有限公司 | Semi-physical simulation method and system of urban rail traction system |
US9639636B1 (en) * | 2012-04-02 | 2017-05-02 | Google Inc. | Algorithmically driven selection of parallelization technique for running model simulation |
CN107980118A (en) * | 2015-06-10 | 2018-05-01 | 无比视视觉技术有限公司 | Use the multi-nuclear processor equipment of multiple threads |
CN109460677A (en) * | 2018-11-12 | 2019-03-12 | 湖南中车时代通信信号有限公司 | The data-storage system of multi-tasking is used under a kind of embedded environment |
CN112463716A (en) * | 2020-11-27 | 2021-03-09 | 中船重工(武汉)凌久电子有限责任公司 | Global semaphore implementation method based on multi-core multi-processor parallel system |
CN113360280A (en) * | 2021-06-02 | 2021-09-07 | 西安中锐创联科技有限公司 | Simulation curve display method based on multi-thread operation and dynamic global variable processing |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6142682A (en) * | 1997-06-13 | 2000-11-07 | Telefonaktiebolaget Lm Ericsson | Simulation of computer processor |
US7392165B2 (en) * | 2002-10-21 | 2008-06-24 | Fisher-Rosemount Systems, Inc. | Simulation system for multi-node process control systems |
CN101256502B (en) * | 2007-02-27 | 2011-02-09 | 国际商业机器公司 | System and method for simulating multiprocessor system |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102467406A (en) * | 2010-11-09 | 2012-05-23 | 无锡江南计算技术研究所 | Simulation method and simulator in multi-processor structure |
CN102467406B (en) * | 2010-11-09 | 2014-04-16 | 无锡江南计算技术研究所 | Simulation method and simulator in multi-processor structure |
CN102279766B (en) * | 2011-08-30 | 2014-05-07 | 华为技术有限公司 | Method and system for concurrently simulating processors and scheduler |
CN102279766A (en) * | 2011-08-30 | 2011-12-14 | 华为技术有限公司 | Method and system for concurrently simulating processors and scheduler |
WO2013029513A1 (en) * | 2011-08-30 | 2013-03-07 | 华为技术有限公司 | Method and system, scheduler for parallel simulating processors |
US9703905B2 (en) | 2011-09-13 | 2017-07-11 | Huawei Technologies Co., Ltd. | Method and system for simulating multiple processors in parallel and scheduler |
CN102331961B (en) * | 2011-09-13 | 2014-02-19 | 华为技术有限公司 | Method, system and dispatcher for simulating multiple processors in parallel |
CN102331961A (en) * | 2011-09-13 | 2012-01-25 | 华为技术有限公司 | Method, system and dispatcher for simulating multiple processors in parallel |
CN102591759A (en) * | 2011-12-29 | 2012-07-18 | 中国科学技术大学苏州研究院 | Clock precision parallel simulation system for on-chip multi-core processor |
CN102591759B (en) * | 2011-12-29 | 2014-08-13 | 中国科学技术大学苏州研究院 | Clock precision parallel simulation system for on-chip multi-core processor |
US9639636B1 (en) * | 2012-04-02 | 2017-05-02 | Google Inc. | Algorithmically driven selection of parallelization technique for running model simulation |
CN102880770A (en) * | 2012-10-29 | 2013-01-16 | 无锡江南计算技术研究所 | Central processing unit (CPU) access sequence simulation model based on macro-instruction queue |
CN103049310B (en) * | 2012-12-29 | 2016-12-28 | 中国科学院深圳先进技术研究院 | A kind of multi-core simulation parallel acceleration method based on sampling |
CN103049310A (en) * | 2012-12-29 | 2013-04-17 | 中国科学院深圳先进技术研究院 | Multi-core simulation parallel accelerating method based on sampling |
CN103472734A (en) * | 2013-09-18 | 2013-12-25 | 南车株洲电力机车研究所有限公司 | Semi-physical simulation method and system of urban rail traction system |
CN107980118A (en) * | 2015-06-10 | 2018-05-01 | 无比视视觉技术有限公司 | Use the multi-nuclear processor equipment of multiple threads |
CN107980118B (en) * | 2015-06-10 | 2021-09-21 | 无比视视觉技术有限公司 | Multi-core processor device using multi-thread processing |
US11294815B2 (en) | 2015-06-10 | 2022-04-05 | Mobileye Vision Technologies Ltd. | Multiple multithreaded processors with shared data cache |
CN109460677A (en) * | 2018-11-12 | 2019-03-12 | 湖南中车时代通信信号有限公司 | The data-storage system of multi-tasking is used under a kind of embedded environment |
CN112463716A (en) * | 2020-11-27 | 2021-03-09 | 中船重工(武汉)凌久电子有限责任公司 | Global semaphore implementation method based on multi-core multi-processor parallel system |
CN112463716B (en) * | 2020-11-27 | 2024-02-13 | 中船重工(武汉)凌久电子有限责任公司 | Global semaphore implementation method based on multi-core multi-processor parallel system |
CN113360280A (en) * | 2021-06-02 | 2021-09-07 | 西安中锐创联科技有限公司 | Simulation curve display method based on multi-thread operation and dynamic global variable processing |
CN113360280B (en) * | 2021-06-02 | 2023-11-28 | 西安中锐创联科技有限公司 | Simulation curve display method based on multithread operation and dynamic global variable processing |
Also Published As
Publication number | Publication date |
---|---|
CN101777007B (en) | 2013-04-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101777007B (en) | Parallel function simulation system for on-chip multi-core processor and method thereof | |
Wu et al. | Enabling and exploiting flexible task assignment on GPU through SM-centric program transformations | |
Wang et al. | Dynamic thread block launch: A lightweight execution mechanism to support irregular applications on gpus | |
Chen et al. | FlinkCL: An OpenCL-based in-memory computing architecture on heterogeneous CPU-GPU clusters for big data | |
Cui et al. | Parrot: A practical runtime for deterministic, stable, and reliable threads | |
CN102576314B (en) | The mapping with the data parallel thread across multiple processors processes logic | |
Krieder et al. | Design and evaluation of the gemtc framework for gpu-enabled many-task computing | |
Bortolotti et al. | Virtualsoc: A full-system simulation environment for massively parallel heterogeneous system-on-chip | |
Wang et al. | SODA: Software defined FPGA based accelerators for big data | |
US10318261B2 (en) | Execution of complex recursive algorithms | |
Haji et al. | A State of Art Survey for OS Performance Improvement | |
Qian et al. | Accelerating RTL simulation with GPUs | |
Tian et al. | Concurrent execution of deferred OpenMP target tasks with hidden helper threads | |
Robson et al. | Runtime coordinated heterogeneous tasks in Charm++ | |
US10761821B1 (en) | Object oriented programming model for graphics processing units (GPUS) | |
CN104899369A (en) | Simulator multithreading operation method utilizing PERL script | |
Aoki et al. | Hybrid opencl: Connecting different opencl implementations over network | |
Liu et al. | Applying GPU and POSIX thread technologies in massive remote sensing image data processing | |
CN102117224B (en) | Multi-core processor-oriented operating system noise control method | |
US20090133022A1 (en) | Multiprocessing apparatus, system and method | |
Raghav et al. | Full system simulation of many-core heterogeneous SoCs using GPU and QEMU semihosting | |
Valero et al. | Towards a more efficient use of gpus | |
Tomiyama et al. | SMYLE OpenCL: A programming framework for embedded many-core SoCs | |
Häuser et al. | A test suite for high-performance parallel Java | |
Zou et al. | Supernodal sparse Cholesky factorization on graphics processing units |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20130410; Termination date: 20180128 |