WO2016041191A1 - Method, apparatus, storage device and computer system for reading and writing data - Google Patents


Info

Publication number
WO2016041191A1
WO2016041191A1 (PCT/CN2014/086925; CN2014086925W)
Authority
WO
WIPO (PCT)
Prior art keywords: data read, execution, write, storage device, execution thread
Prior art date
Application number
PCT/CN2014/086925
Other languages
English (en)
French (fr)
Inventor
杨辉联
卢磊
时代
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN201680001619.1A (patent CN106489132B)
Priority to PCT/CN2014/086925
Priority to EP14902078.6A (EP3188002A4)
Publication of WO2016041191A1
Priority to US15/462,057 (US10303474B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003 Arrangements for executing specific machine instructions
    • G06F 9/30076 Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F 9/3009 Thread control instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F 13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38 Information transfer, e.g. on bus
    • G06F 13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003 Arrangements for executing specific machine instructions
    • G06F 9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored program computers
    • G06F 15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F 9/3887 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores

Definitions

  • the present invention relates to the field of computers and, more particularly, to a method, apparatus, storage device and computer system for reading and writing data.
  • in a multi-core computer system, data read and written between a core and a storage device may need to be forwarded by other cores; that is, data read and write operations require cooperation between cores.
  • there is a large data transmission delay between cores, which seriously affects the completion time of data read and write operations, thereby affecting the completion time of the entire task (for example, signal processing).
  • Embodiments of the present invention provide a method, apparatus, and system for reading and writing data, which can shorten the completion time of data read and write operations in a multi-core computer system.
  • a method for reading and writing data is provided, comprising: determining, by a host device, N cores for executing a target process, where the N cores are in one-to-one correspondence with N execution threads included in the target process and N ≥ 2; grouping the N execution threads to determine M execution thread groups, and assigning an indication identifier to each execution thread group, where an indication identifier is used to identify one execution thread group, one execution thread belongs to only one execution thread group, one execution thread group includes at least one execution thread, and M ≥ 2; and sending M data read/write instructions to the storage device, where the M data read/write instructions are in one-to-one correspondence with the M execution thread groups and each data read/write instruction includes the indication identifier of the corresponding execution thread group, so that the storage device determines, according to the indication identifier included in each data read/write instruction, the execution thread group corresponding to that instruction and transmits each data read/write instruction to the corresponding execution thread group, and so that each execution thread performs data read and write operations on the storage device according to the data read/write instruction it obtains.
  • the data transfer between each execution thread and the storage device is based on a Direct Memory Access (DMA) protocol.
  • the sending of the M data read/write instructions to the storage device includes: sending a data read/write signal to the storage device, where the data read/write signal includes M signal components, the M signal components are in one-to-one correspondence with the M data read/write instructions, and each data read/write instruction is carried in the corresponding signal component.
  • the sending of the M data read/write instructions to the storage device includes: sending the M data read/write instructions to the storage device through the master thread included in the target process.
  • the grouping of the N execution threads includes: determining the expected completion time of the target process and the data transmission delay between the cores; and grouping the N execution threads according to the expected completion time and the data transmission delay.
  • a method for reading and writing data is provided, including: receiving, by the storage device, M data read/write instructions sent by the host device, where the M data read/write instructions are in one-to-one correspondence with M execution thread groups.
  • each data read/write instruction includes the indication identifier of the corresponding execution thread group, and an indication identifier is used to identify one execution thread group, where the M execution thread groups are determined by the host device by grouping the N execution threads included in the target process.
  • the N cores determined by the host device for executing the target process are in one-to-one correspondence with the N execution threads, N ≥ 2; one execution thread belongs to only one execution thread group, and one execution thread group includes at least one execution thread.
  • the data transfer between each execution thread and the storage device is based on a Direct Memory Access (DMA) protocol.
  • the receiving of the M data read/write instructions sent by the host device includes: receiving a data read/write signal sent by the host device, where the data read/write signal includes M signal components, the M signal components are in one-to-one correspondence with the M data read/write instructions, and each data read/write instruction is carried in the corresponding signal component.
  • the receiving of the M data read/write instructions sent by the host device includes: receiving the M data read/write instructions sent by the master thread included in the target process.
  • the M execution thread groups are specifically determined by the host device by grouping the N execution threads according to the expected completion time of the target process and the data transmission delay between the cores.
  • an apparatus for reading and writing data is provided, comprising: a determining unit, configured to determine N cores for executing a target process, where the N cores are in one-to-one correspondence with N execution threads included in the target process and N ≥ 2; a grouping unit, configured to group the N execution threads to determine M execution thread groups, where one execution thread belongs to only one execution thread group, one execution thread group includes at least one execution thread, and M ≥ 2; and a sending unit, configured to send M data read/write instructions to the storage device, where the M data read/write instructions are in one-to-one correspondence with the M execution thread groups, each data read/write instruction includes the indication identifier of the corresponding execution thread group, and an indication identifier is used to uniquely identify one execution thread group, so that the storage device transmits each data read/write instruction to the corresponding execution thread group according to the indication identifiers, and each execution thread performs data read and write operations on the storage device according to the data read/write instruction it obtains.
  • the data transfer between each execution thread and the storage device is based on a Direct Memory Access (DMA) protocol.
  • the sending unit is specifically configured to send a data read/write signal to the storage device, where the data read/write signal includes M signal components, the M signal components are in one-to-one correspondence with the M data read/write instructions, and each data read/write instruction is carried in the corresponding signal component.
  • the sending unit is configured to send, by using a master thread included in the target process, M data read/write instructions to the storage device.
  • the grouping unit is specifically configured to group the N execution threads according to the expected completion time of the target process and the data transmission delay between the cores.
  • the device for reading and writing data is a host device in a computer system.
  • a fourth aspect provides a storage device, including: a transmission interface for communication between the storage device and the host device; a storage space for storing data; and a controller, configured to receive, through the transmission interface, the M data read/write instructions sent by the host device, where the M data read/write instructions are in one-to-one correspondence with M execution thread groups, each data read/write instruction includes the indication identifier of the corresponding execution thread group, and an indication identifier is used to identify one execution thread group; the M execution thread groups are determined by the host device by grouping the N execution threads included in the target process, the N cores determined by the host device for executing the target process are in one-to-one correspondence with the N execution threads, N ≥ 2, one execution thread belongs to only one execution thread group, one execution thread group includes at least one execution thread, and M ≥ 2; the controller is further configured to determine, according to the indication identifiers, the data read/write instruction corresponding to each execution thread group, and to transmit each data read/write instruction to the corresponding execution thread group through the transmission interface.
  • the data transfer between each execution thread and the storage device is based on a Direct Memory Access (DMA) protocol.
  • the controller is specifically configured to receive, through the transmission interface, a data read/write signal sent by the host device, where the data read/write signal includes M signal components, the M signal components are in one-to-one correspondence with the M data read/write instructions, and each data read/write instruction is carried in the corresponding signal component.
  • the M execution thread groups are specifically determined by the host device by grouping the N execution threads according to the expected completion time of the target process and the data transmission delay between the cores.
  • a computer system is provided, comprising: a bus; a host device connected to the bus, configured to determine N cores for executing a target process, where the N cores are in one-to-one correspondence with N execution threads included in the target process and N ≥ 2, to group the N execution threads to determine M execution thread groups, and to assign an indication identifier to each execution thread group, where an indication identifier is used to uniquely identify one execution thread group, one execution thread belongs to only one execution thread group, one execution thread group includes at least one execution thread, and M ≥ 2, and to send M data read/write instructions to the storage device through the bus, where the M data read/write instructions are in one-to-one correspondence with the M execution thread groups and each data read/write instruction includes the indication identifier of the corresponding execution thread group; and a storage device connected to the bus, configured to receive the M data read/write instructions through the bus, determine, according to the indication identifiers, the data read/write instruction corresponding to each execution thread group, and transmit each data read/write instruction to the corresponding execution thread group, so that each execution thread performs data read and write operations on the storage device according to the data read/write instruction it obtains.
  • the M execution thread groups are determined by the host device by grouping the N execution threads according to the time required by the target process and the data transmission delay between the cores.
  • in the embodiments of the present invention, the host device determines N cores for executing a target process, groups the N execution threads corresponding to the N cores to determine M execution thread groups, and carries, in each data read/write instruction sent to the storage device, the indication identifier of the execution thread group corresponding to that instruction, so that the storage device can identify the thread group corresponding to each data read/write instruction according to the indication identifier and forward each instruction to its thread group, and each execution thread can perform data read and write operations according to the read/write instruction obtained from the storage device.
  • this reduces signaling and data transmission between cores during data read and write operations, thereby reducing the processing delays caused by such signaling and data transmission and shortening the completion time of data read and write operations in a multi-core computer system.
  • FIG. 1 is a schematic block diagram of an apparatus for reading and writing data according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a flow of reading and writing data according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an apparatus for reading and writing data according to another embodiment of the present invention.
  • FIG. 4 is a schematic flow chart of a method for reading and writing data according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of a method for reading and writing data according to another embodiment of the present invention.
  • FIG. 6 is a schematic block diagram of a system for reading and writing data according to an embodiment of the present invention.
  • the technical solution of the present invention can run on a hardware device that includes, for example, a CPU, a memory management unit (MMU), and a memory, and the operating system running on the hardware device can be any computer operating system that implements business processing by threads or processes (a process including multiple threads), such as a Linux system, a Unix system, and the like.
  • the device for reading and writing data of the present invention can be applied to a computer system, specifically for performing data read and write operations.
  • the data read and write operation may be a data read/write operation performed in a cache device (an example of the storage device) by each thread performing arithmetic processing under the control of a central processing unit (CPU), or it may be a data read/write operation performed by the CPU in a disk device (another example of the storage device); the present invention is not particularly limited.
  • in the following, the data read and write operation process performed in a cache device is taken as an example to describe in detail the method, apparatus, and system for reading and writing data according to the embodiments of the present invention.
  • a real-time operating system (RTOS), also known as an instant operating system, is an operating system in which the result of processing can, within a specified time, control a production process or respond quickly to the processing system, and in which all real-time tasks run in unison.
  • its most salient feature is its "real-time" property; that is to say, if there is a task that needs to be executed, the real-time operating system completes the task immediately (within a short time), without comparatively long delays.
  • FIG. 1 shows a schematic block diagram of a method 100 of reading and writing data in accordance with an embodiment of the present invention. As shown in FIG. 1, the method 100 includes:
  • the host device determines N cores for executing the target process, where the N cores are in one-to-one correspondence with the N execution threads included in the target process, and N ≥ 2;
  • the host device groups the N execution threads to determine M execution thread groups, where one execution thread group includes at least one execution thread and M ≥ 2;
  • each data read/write instruction includes the indication identifier of the corresponding execution thread group, so that the storage device determines, according to the indication identifier included in each data read/write instruction, the execution thread group corresponding to that instruction.
  • the host device may be, for example, a host device in a computer system having a plurality of CPUs (or cores), where the plurality of CPUs may cooperate to perform a target task; for example, each CPU may run a portion (one or more) of the threads in the process corresponding to the target task.
  • a plurality of CPUs are communicatively connected to each other, so that data sharing can be realized by means of handshaking or the like.
  • the computer system further includes a storage device for providing a storage function; when performing a target task, the host device can access the storage space in the storage device and perform read and write operations (or storage operations) on the signals or data generated in performing the target task.
  • the storage device may support various storage media.
  • the storage device may further include a storage interface expansion module, to which at least one solid state drive (SSD) and/or hybrid hard disk (HHD) can be connected, so that the capacity of the storage device can be expanded as needed.
  • the host device and the storage device can be connected by various computer interfaces capable of data transmission, for example, a Peripheral Component Interconnect Express (PCIe) interface, a Thunderbolt interface, an InfiniBand interface, a high-speed Universal Serial Bus (USB) interface, or a high-speed Ethernet interface.
  • the host device may determine, from all the CPUs included in the host device, N CPUs (i.e., cores) for executing the target task; hereinafter, for ease of understanding and explanation, the N CPUs for performing the target task are denoted CPU #1 to CPU #N.
  • the host device may determine the specific value of N according to the amount of computation required to execute the target task: if the amount of computation required by the target task is large, the value of N can be made larger in order to complete the task quickly; if the amount of computation required by the target task is small, so that even a small number of CPUs can complete the task quickly, the value of N can be made smaller.
  • CPU #1 to CPU #N may be used to execute the N threads (i.e., execution threads) of the target task, respectively. Hereinafter, the N threads are written as thread #1 to thread #N; that is, CPU #1 to CPU #N correspond one-to-one with thread #1 to thread #N, where the rule of "one-to-one correspondence" may be that one CPU is used to control the running of the thread with the same serial number.
  • the method for determining the N CPUs for performing the target task by the host device enumerated above, and the parameters used, are merely exemplary descriptions; the present invention is not particularly limited.
  • N may also be determined according to a preset value, so that the number of CPUs used is the same for all tasks performed; for example, the preset value can be the total number of CPUs included in the host device.
  • the present invention is directed to mitigating the influence on the completion time of data read/write operations caused by the data transmission delay between CPUs; therefore, when N ≥ 2, the technical effects of the present invention can be fully embodied. These technical effects will be described in detail later.
  • here, N is 29; that is, the host device determines 29 CPUs to execute a target process, and the target process includes 29 execution threads.
  • the host device may divide CPU #1 to CPU #N determined as described above into M CPU groups, or divide thread #1 to thread #N corresponding to CPU #1 to CPU #N determined as described above into M thread groups.
  • as a grouping basis, for example, the following rule can be cited: the N execution threads are grouped according to the expected completion time of the target process and the data transmission delay between the cores.
  • the host device can determine the data transmission delay between CPU #1 and CPU #N; for example, it can acquire information such as the models of CPU #1 to CPU #N and the manner in which they are connected to one another, and derive the data transmission delay between CPU #1 and CPU #N based on this information.
  • the method for determining the data transmission delay between CPU #1 and CPU #N by the host device enumerated above, and the parameters used, are merely exemplary descriptions; the present invention is not particularly limited. For example, the data transmission delay between CPU #1 and CPU #N may also be detected by testing or in other manners.
  • the host device may further determine the expected completion time of the target process. Here, the target process needs to be completed within a prescribed time; the expected completion time may be the time that elapses from the start of execution of the target process to the end of execution (for example, when the CPU determines that the task has been executed successfully and exits the process), and the expected completion time may be less than or equal to the prescribed time.
  • the host device may determine the expected completion time of the target process according to attribute information such as the type and processing priority of the target process. For example, if the type of the target process indicates that its service belongs to a real-time type of service (e.g., online games, video calls), it can be determined that the target process is relatively urgent and needs to be completed in a shorter time, so a shorter expected completion time (for example, below a preset threshold) can be determined; likewise, if the processing priority of the target process is marked as high, it can be determined that the target process is highly urgent and needs to be completed in a short time, so a shorter expected completion time (for example, below a preset threshold) can be determined.
  • thus, the host device can group CPU #1 to CPU #N based on the data transmission delays between CPU #1 to CPU #N determined as described above and the expected completion time of the target process, so that the completion time of the target process, including the sum of the data transmission delays between the CPUs in each group, is less than or equal to the expected completion time of the target process. For example, the host device can calculate, according to the processing capability of each CPU, the time needed to complete the target process if there were no data transmission between the CPUs (hereinafter referred to as the reference completion time of the target process), obtain the difference between the expected completion time and the reference completion time of the target process, and then perform the grouping so that the sum of the data transmission delays between the CPUs in each group is less than or equal to this difference (a minimal sketch of such a grouping is given below).
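  • as an illustration only, the delay-budget grouping described above can be sketched in Python; the function name, the greedy packing strategy, and the pairwise delay table are assumptions of this sketch, not details given by the patent:

```python
# A minimal sketch, assuming a pairwise delay table delay[a][b] between
# CPUs/threads. It greedily packs threads into groups so that the sum of
# pairwise transmission delays inside each group stays within the budget
# (expected completion time minus reference completion time).

def group_threads(threads, delay, expected_time, reference_time):
    budget = expected_time - reference_time  # slack available for inter-CPU transfers
    groups = [[]]
    for t in threads:
        group = groups[-1]
        current = sum(delay[a][b] for i, a in enumerate(group) for b in group[i + 1:])
        added = sum(delay[t][u] for u in group)  # extra delay if t joins this group
        if group and current + added > budget:
            groups.append([t])  # open a new group; a single thread has no internal delay
        else:
            group.append(t)
    return groups
```

  • any partition whose per-group delay sums stay within the budget satisfies the condition; the greedy packing above is only one convenient way to produce such a partition.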
  • alternatively, the host device may divide CPU #1 to CPU #N into M CPU groups based on a preset reference value K, such that at least M-1 of the M CPU groups each include a number of CPUs equal to the reference value K, and at most one CPU group includes fewer CPUs than the reference value K; in other words, the number of CPUs included in each CPU group is an integer greater than zero.
  • the preset reference value K may be changed appropriately according to parameters such as the load of the computer system; for example, if the load of the current computer system is large, a smaller value of K may be adopted.
  • here, K is 8; therefore, the 29 CPUs are divided into four CPU groups, that is, CPU group #1 to CPU group #4:
  • CPU group #1 includes 8 CPUs, that is, CPU #1 to CPU #8; CPU group #2 includes 8 CPUs, that is, CPU #9 to CPU #16; CPU group #3 includes 8 CPUs, that is, CPU #17 to CPU #24; CPU group #4 includes 5 CPUs, that is, CPU #25 to CPU #29.
  • correspondingly, thread group #1 includes 8 threads, that is, thread #1 to thread #8; thread group #2 includes 8 threads, that is, thread #9 to thread #16; thread group #3 includes 8 threads, that is, thread #17 to thread #24; thread group #4 includes 5 threads, that is, thread #25 to thread #29 (a minimal sketch of this partition follows below).
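  • for reference, the fixed-size partition based on the reference value K can be sketched as follows; the helper name chunk is illustrative, not taken from the patent text:

```python
# A minimal sketch of the reference-value partition: all groups have K
# members except possibly the last one, matching the 29 -> 8/8/8/5 example.

def chunk(items, k):
    return [items[i:i + k] for i in range(0, len(items), k)]

cpus = [f"CPU #{n}" for n in range(1, 30)]  # CPU #1 .. CPU #29
groups = chunk(cpus, 8)
print([len(g) for g in groups])             # [8, 8, 8, 5]
```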
  • the host device may send M data read/write instructions to the storage device. Specifically, the host device may acquire a data read/write instruction from each CPU and, according to the CPU groups or thread groups divided as described above, add an indication identifier to each data read/write instruction to indicate the CPU group from which each data read/write instruction comes, or the thread group corresponding to each data read/write instruction.
  • correspondingly, the storage device may receive the M data read/write instructions through its receiving unit, and determine, through its determining unit, the thread group corresponding to each data read/write instruction (or the CPU group corresponding to each data read/write instruction) according to the indication identifier carried by that instruction.
  • the storage device may transmit, through its sending unit, each data read/write instruction to the thread group indicated by the indication identifier carried by that instruction, so that each thread can obtain the read/write instruction from its corresponding CPU and, further, perform data read and write operations in the storage space of the storage device based on that data read/write instruction.
  • data transmission between each execution thread and the storage device is performed based on a Direct Memory Access (DMA) protocol.
  • Direct Memory Access (DMA) refers to a high-speed data transfer operation that allows data to be read and written directly between an external device and the memory, without passing through the CPU and without CPU intervention.
  • the entire data transfer operation can be performed under the control of a device called a "DMA controller".
  • the CPU can perform other tasks during the transfer process; that is, in the embodiments of the present invention, a DMA controller can also be configured in the computer system, and the DMA controller controls the data read and write operations of the threads in the storage device (a rough simulated sketch of such a transfer is given below).
  • for example, the DMA controller issues a DMA request to the CPU (belonging to the host device).
  • the implementation manner of the DMA transmission listed above is merely exemplary; the present invention is not particularly limited, and any method capable of realizing DMA transmission in the prior art may also be used.
  • the functions of the above DMA controller can be implemented by software or a program or the like.
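  • as a rough simulated illustration only (the patent does not prescribe a specific handshake, and the class and method names here are assumptions), the essence of a DMA-style transfer is that the CPU only starts the transfer and is notified on completion, while the copy itself proceeds without it:

```python
# Simulated DMA-style transfer: the "CPU" thread starts the transfer,
# remains free for other work, and resumes on the completion notification.

import threading

class DmaController:
    def start_transfer(self, src, dst, on_done):
        def copy():
            dst[:] = src   # data moves without further CPU involvement
            on_done()      # interrupt-style completion notification
        threading.Thread(target=copy).start()

done = threading.Event()
src, dst = list(range(1024)), [0] * 1024
DmaController().start_transfer(src, dst, done.set)
# ... the CPU is free to perform other tasks here ...
done.wait()                # continue once the transfer has completed
```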
  • the sending of the M data read/write instructions to the storage device includes: sending a data read/write signal to the storage device, where the data read/write signal includes M signal components, the M signal components are in one-to-one correspondence with the M data read/write instructions, and each data read/write instruction is carried in the corresponding signal component.
  • that is, the host device can send the above data read/write instructions to the storage device together in the same signal (or signal stream).
  • alternatively, each data read/write instruction may be carried in an independent signal and sent to the storage device separately; the bundled form described above is sketched below.
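  • as an illustration only, the bundled data read/write signal can be modeled as follows; the field names are assumptions of this sketch:

```python
# A minimal model of one data read/write signal carrying M signal
# components, each holding the instruction for one execution thread group.

from dataclasses import dataclass
from typing import List

@dataclass
class SignalComponent:
    group_id: int      # indication identifier of the corresponding thread group
    instruction: str   # the data read/write instruction carried by this component

@dataclass
class ReadWriteSignal:
    components: List[SignalComponent]  # M components, one per thread group

signal = ReadWriteSignal(components=[
    SignalComponent(group_id=1, instruction="read block 7"),
    SignalComponent(group_id=2, instruction="write block 3"),
])
```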
  • the sending of the M data read/write instructions to the storage device includes: sending the M data read/write instructions to the storage device through the master thread included in the target process.
  • specifically, in the embodiments of the present invention, a main control CPU and a main control thread corresponding to the main control CPU may be configured; that is, the main control CPU may determine each data read/write instruction of CPU #1 to CPU #N and send the above data read/write instructions to the storage device through the master thread.
  • the master thread belongs to the execution threads.
  • one CPU may be selected as the master CPU from the CPU #1 to CPU #N, and a thread corresponding to the master CPU is used as the master thread. That is, the master thread can be used to transfer the data read and write instructions to the storage device, and can also be used to access the storage space of the storage device for data read and write operations.
  • in this example, the target process is executed by 29 CPUs and includes 29 execution threads; CPU #0 is the master CPU and thread #0 is the master thread.
  • the 29 CPUs are divided into 4 CPU groups, that is, CPU group #1 to CPU group #4: CPU group #1 includes 8 CPUs, that is, CPU #1 to CPU #8; CPU group #2 includes 8 CPUs, that is, CPU #9 to CPU #16; CPU group #3 includes 8 CPUs, that is, CPU #17 to CPU #24; CPU group #4 includes 5 CPUs, that is, CPU #25 to CPU #29.
  • correspondingly, the 29 threads (i.e., execution threads) are divided into 4 thread groups, namely thread group #1 to thread group #4: thread group #1 includes 8 threads, that is, thread #1 to thread #8; thread group #2 includes 8 threads, that is, thread #9 to thread #16; thread group #3 includes 8 threads, that is, thread #17 to thread #24; thread group #4 includes 5 threads, that is, thread #25 to thread #29.
  • in step 1, thread #0 sends a request signal to the storage device, where the request signal carries the data read/write request of each CPU that needs to perform data read and write operations in the current cycle, and each data read/write request carries the indication identifier of the corresponding thread group.
  • the storage device may determine the thread group corresponding to each data read/write request according to the indication identifier carried in each request, split the request signal, and generate, taking the thread group as a unit, a response signal corresponding to each thread group, where the indication identifiers of the data read/write instructions carried in each response signal all point to the same thread group; the storage device may then send each response signal to the corresponding thread group according to the indication identifier (a minimal sketch of this splitting step follows below).
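  • as an illustration only, the splitting of the request signal into per-thread-group response signals can be sketched as follows; the (group_id, payload) request format is an assumption of this sketch:

```python
# A minimal sketch of the storage-device-side split: requests are routed
# by their indication identifier so that each response signal carries
# only the instructions of a single execution thread group.

from collections import defaultdict

def split_request_signal(requests):
    responses = defaultdict(list)
    for group_id, payload in requests:
        responses[group_id].append(payload)   # route by indication identifier
    return responses                          # one response signal per thread group

# split_request_signal([(1, "read blk 7"), (2, "write blk 3"), (1, "read blk 9")])
# -> {1: ["read blk 7", "read blk 9"], 2: ["write blk 3"]}
```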
  • each thread group can then, under the control of the corresponding CPU group, perform data read and write operations in the storage device based on the data read/write instructions from the storage device; the data read and write operation process itself can be similar to the prior art and is omitted here to avoid redundancy.
  • the main control thread can detect the completion of the data read and write operations of each thread group; the main control thread can end its control of a given thread group immediately after all the threads in that group complete their data read and write operations, or it can end the control of all thread groups only after all the threads in all thread groups complete their data read and write operations.
  • similarly, the storage device can detect the completion of the data read and write operations in its storage space; it can notify the main control thread immediately after all the threads in a given thread group complete their data read and write operations, so that the master thread ends the control of that thread group, or it can notify the master thread once all the threads in all thread groups have completed their data read and write operations, so that the master thread ends the control of all thread groups (both policies are sketched below).
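  • as an illustration only, the two completion policies can be sketched as follows; the event objects and the release callable are assumptions of this sketch:

```python
# Two completion policies for the master thread: release each thread
# group as soon as it finishes, or release all groups only after every
# group has finished its data read and write operations.

import threading

def release_per_group(group_done, release):
    def watch(gid, ev):
        ev.wait()          # this group reports completion
        release(gid)       # end control of this group immediately
    for gid, ev in group_done.items():
        threading.Thread(target=watch, args=(gid, ev)).start()

def release_all_at_once(group_done, release):
    for ev in group_done.values():
        ev.wait()          # wait until every group has completed
    for gid in group_done:
        release(gid)       # then end control of all groups together
```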
  • jitter: an important measure of a real-time operating system is the time it takes from receiving a task to completing the task, and the variation of this time is called jitter.
  • the primary goal in designing a real-time operating system is not high throughput, but ensuring that tasks are completed within a certain time.
  • in the embodiments of the present invention, the host device groups the N cores for executing the target process, groups the N execution threads corresponding to the N cores to determine M execution thread groups, and carries, in each data read/write instruction sent to the storage device, the indication identifier of the execution thread group corresponding to that instruction, so that the storage device can identify the thread group corresponding to each data read/write instruction according to the indication identifier and forward each instruction to its thread group, and each execution thread can perform data read and write operations in the storage device according to the data read/write instruction obtained from the storage device, thereby reducing signaling and data transmission between cores during data read and write operations.
  • FIG. 3 is a schematic flowchart of a method 200 for reading and writing data according to another embodiment of the present invention. As shown in FIG. 3, the method 200 includes:
  • the storage device receives M data read/write instructions sent by the host device, where the M data read/write instructions are in one-to-one correspondence with M execution thread groups, each data read/write instruction includes the indication identifier of the corresponding execution thread group, and an indication identifier is used to identify one execution thread group;
  • the M execution thread groups are determined by the host device by grouping the N execution threads included in the target process, the N cores determined by the host device for executing the target process are in one-to-one correspondence with the N execution threads, N ≥ 2, one execution thread belongs to only one execution thread group, one execution thread group includes at least one execution thread, and M ≥ 2;
  • the M execution thread groups are specifically determined by the host device by grouping the N execution threads according to the expected completion time of the target process and the data transmission delay between the cores.
  • data transmission between each execution thread and the storage device is performed based on a Direct Memory Access (DMA) protocol.
  • the receiving of the M data read/write instructions sent by the host device includes: receiving a data read/write signal sent by the host device, where the data read/write signal includes M signal components, the M signal components are in one-to-one correspondence with the M data read/write instructions, and each data read/write instruction is carried in the corresponding signal component.
  • the receiving of the M data read/write instructions sent by the host device includes: receiving the M data read/write instructions sent by the master thread included in the target process, where the master thread belongs to the execution threads.
  • the executor of the method 200 for reading and writing data according to the embodiment of the present invention may correspond to the foregoing storage device; the specific process is similar to the operation of the foregoing storage device and is not described here again.
  • in the embodiments of the present invention, the host device groups the N cores for executing the target process, and groups the N execution threads corresponding to the N cores to determine M execution thread groups;
  • the data read/write instruction sent to the storage device carries the indication identifier of the execution thread group corresponding to that instruction, so that the storage device can identify the thread group corresponding to each data read/write instruction according to the indication identifier, forward each data read/write instruction to the thread group corresponding to it, and enable each execution thread to perform data read and write operations on the storage device according to the data read/write instruction obtained from the storage device. This reduces signaling and data transmission between the cores during data read and write operations, thereby reducing the processing delay caused by such signaling and data transmission, shortening the completion time of data read and write operations in the multi-core computer system, and enabling a real-time operating system to scale to multiple CPUs.
  • above, the method of reading and writing data according to the embodiments of the present invention has been described in detail with reference to FIGS. 1 through 3; below, the apparatus for reading and writing data according to the embodiments of the present invention will be described in detail with reference to FIGS. 4 and 5.
  • FIG. 4 shows a schematic block diagram of an apparatus 300 for reading and writing data in accordance with an embodiment of the present invention.
  • the apparatus 300 includes:
  • a determining unit 310, configured to determine N cores for executing a target process, where the N cores are in one-to-one correspondence with N execution threads included in the target process, N ≥ 2;
  • a grouping unit 320, configured to group the N execution threads to determine M execution thread groups, and to assign an indication identifier to each execution thread group, where an indication identifier is used to identify one execution thread group, one execution thread belongs to only one execution thread group, one execution thread group includes at least one execution thread, and M ≥ 2;
  • a sending unit 330, configured to send M data read/write instructions to the storage device, where the M data read/write instructions are in one-to-one correspondence with the M execution thread groups and each data read/write instruction includes the indication identifier of the corresponding execution thread group, so that the storage device transmits each data read/write instruction to the corresponding execution thread group according to the indication identifiers, and each execution thread performs data read and write operations on the storage device according to the data read/write instruction obtained from the storage device.
  • the device for reading and writing data is a host device in a computer system.
  • the host device may be, for example, a host device in a computer system having a plurality of CPUs (or cores), where the plurality of CPUs may cooperate to perform a target task; for example, each CPU may run a portion (one or more) of the threads in the process corresponding to the target task.
  • a plurality of CPUs are communicatively connected to each other, so that data sharing can be realized by means of handshaking or the like.
  • the computer system further includes a storage device for providing a storage function; when performing a target task, the host device can access the storage space in the storage device and perform read and write operations (or storage operations) on the signals or data generated in performing the target task.
  • the storage device may support various storage media.
  • the storage device may further include a storage interface expansion module, to which at least one solid state drive (SSD) and/or hybrid hard disk (HHD) can be connected, so that the capacity of the storage device can be expanded as needed.
  • the host device and the storage device can be connected by various computer interfaces capable of data transmission, for example, a Peripheral Component Interconnect Express (PCIe) interface, a Thunderbolt interface, an InfiniBand interface, a high-speed Universal Serial Bus (USB) interface, or a high-speed Ethernet interface.
  • the determining unit 310 may determine, from all the CPUs included in the host device, N CPUs (i.e., cores) for executing the target task (i.e., the target process); below, for ease of understanding and explanation, the N CPUs for performing the target task are denoted CPU #1 to CPU #N.
  • the determining unit 310 may determine the specific value of N according to the amount of computation required to execute the target task: if the amount of computation required by the target task is large, the determining unit 310 can make the value of N larger in order to complete the task quickly; if the amount of computation required by the target task is small, so that even a small number of CPUs can complete the task quickly, the determining unit 310 can make the value of N smaller.
  • CPU #1 to CPU #N may be used to execute the N threads (i.e., execution threads) of the target task, respectively. Hereinafter, the N threads are written as thread #1 to thread #N; that is, CPU #1 to CPU #N correspond one-to-one with thread #1 to thread #N, where the rule of "one-to-one correspondence" may be that one CPU is used to control the running of the thread with the same serial number.
  • the method by which the determining unit 310 determines the N CPUs for executing the target task enumerated above, and the parameters used, are merely exemplary descriptions; the present invention is not particularly limited.
  • the determining unit 310 may also determine N according to a preset value, so that the number of CPUs used is the same for all tasks executed; for example, the preset value can be the total number of CPUs included in the host device.
  • the present invention is directed to mitigating the influence on the completion time of data read/write operations caused by the data transmission delay between CPUs; therefore, when N ≥ 2, the technical effects of the present invention can be fully embodied. These technical effects will be described in detail later.
  • here, N is 29; that is, the determining unit 310 determines 29 CPUs to execute a target process, and the target process includes 29 execution threads.
  • CPU #1 to CPU #N determined as described above are divided into M CPU groups, or thread #1 to thread #N corresponding to CPU #1 to CPU #N determined as described above are divided into M thread groups. As a grouping basis, for example, the following rules can be cited:
  • the grouping unit is specifically configured to group the N execution threads according to a desired completion time of the target process and a data transmission delay between the cores.
  • the grouping unit 320 can determine the data transmission delay between CPU #1 and CPU #N; for example, the grouping unit 320 can acquire information such as the models of CPU #1 to CPU #N and the manner in which they are connected to one another, and derive the data transmission delay between CPU #1 and CPU #N based on this information.
  • the method by which the grouping unit 320 determines the data transmission delay between CPU #1 and CPU #N enumerated above, and the parameters used, are merely exemplary; the present invention is not particularly limited. The grouping unit 320 can also detect the data transmission delay between CPU #1 and CPU #N by experiments or the like.
  • the grouping unit 320 may further determine the expected completion time of the target process. Here, the target process needs to be completed within a prescribed time; the expected completion time may be the time that elapses from the start of execution of the target process to the end of execution (for example, when the CPU determines that the task has been executed successfully and exits the process), and the expected completion time may be less than or equal to the prescribed time.
  • the grouping unit 320 may determine the expected completion time of the target process according to attribute information such as the type and processing priority of the target process. For example, if the type of the target process indicates that its service belongs to a real-time type of service (e.g., online games, video calls), it can be determined that the target process is relatively urgent and needs to be completed in a shorter time, so a shorter expected completion time (for example, below a preset threshold) can be determined; likewise, if the processing priority of the target process is marked as high, it can be determined that the target process is highly urgent and needs to be completed in a short time, so a shorter expected completion time (for example, below a preset threshold) can be determined.
  • thus, the grouping unit 320 can group CPU #1 to CPU #N based on the data transmission delays between CPU #1 to CPU #N determined as described above and the expected completion time of the target process, so that the completion time of the target process, including the sum of the data transmission delays between the CPUs in each CPU group, is less than or equal to the expected completion time of the target process. For example, the grouping unit 320 can estimate, according to the processing capability of each CPU, the time needed to complete the target process if there were no data transmission between the CPUs (referred to as the reference completion time of the target process), obtain the difference between the expected completion time and the reference completion time of the target process, and then perform the grouping based on this result, so that the sum of the data transmission delays between the CPUs in each group is less than or equal to this difference.
  • alternatively, the grouping unit 320 may divide CPU #1 to CPU #N into M CPU groups based on a preset reference value K, such that at least M-1 of the M CPU groups each include a number of CPUs equal to the reference value K, and at most one CPU group includes fewer CPUs than the reference value K; in other words, the number of CPUs included in each CPU group is an integer greater than zero.
  • the preset reference value K may be changed appropriately according to parameters such as the load of the computer system; for example, if the load of the current computer system is large, a smaller value of K may be adopted.
  • K is 8, and therefore, 29 CPUs are divided into four CPU groups, that is, CPU group #1 to CPU group #4.
  • CPU group #1 includes 8 CPUs, that is, CPU #1 to CPU #8;
  • CPU group #2 includes 8 CPUs, that is, CPU #9 to CPU #16;
  • CPU group #3 includes 8 CPUs, that is, CPU#17 to CPU#24;
  • CPU group #4 includes five CPUs, that is, CPU#25 to CPU#29.
  • thread group #1 includes 8 threads, that is, thread #1 to thread #8; thread group #2 includes 8 threads, that is, thread #9 to thread #16; thread group #3 includes 8 threads, that is, Thread #17 to Thread #24; Thread Group #4 includes 5 threads, that is, Thread #25 to Thread #29.
  • the sending unit 330 can acquire a data read/write instruction from each CPU and, according to the CPU groups or thread groups divided as described above, add an indication identifier to each data read/write instruction to indicate the CPU group from which each data read/write instruction comes, or the thread group corresponding to each data read/write instruction.
  • correspondingly, the storage device may receive the M data read/write instructions through its receiving unit, and determine, through its determining unit, the thread group corresponding to each data read/write instruction (or the CPU group corresponding to each data read/write instruction) according to the indication identifier carried by that instruction.
  • the storage device may transmit, by the sending unit, the data read/write instruction to the thread group indicated by the indication identifier carried by the data read/write instruction according to the indication identifier carried by the data read/write instruction, so that each thread can obtain The read/write command from the corresponding CPU, and further, the data read/write command can be performed in the storage space of the storage device based on the data read/write command.
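The record shapes below are invented for illustration; the patent specifies only that each instruction carries its group's indication identifier. The host side tags each instruction with its thread-group ID, and the storage side determines the target group from that tag alone.

```python
# Toy model of the indication identifier, with assumed record shapes.

from collections import defaultdict

def tag_instructions(instructions, cpu_to_group):
    """Host side: attach the corresponding thread-group identifier."""
    return [{"cpu": cpu, "op": op, "group_id": cpu_to_group[cpu]}
            for cpu, op in instructions]

def route_by_group(tagged):
    """Storage side: demultiplex instructions by their indication identifier."""
    per_group = defaultdict(list)
    for instr in tagged:
        per_group[instr["group_id"]].append(instr)
    return dict(per_group)

cpu_to_group = {1: 1, 2: 1, 9: 2}
tagged = tag_instructions([(1, "read"), (9, "write"), (2, "read")], cpu_to_group)
print(route_by_group(tagged))   # group 1 gets CPU#1/#2 ops, group 2 gets CPU#9
```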
Optionally, the data transmission between each execution thread and the storage device is performed based on the Direct Memory Access (DMA) protocol.
Specifically, Direct Memory Access (DMA) refers to a high-speed data transfer operation that allows data to be read and written directly between an external device and a memory, without passing through the CPU and without CPU intervention. For example, the entire data transfer operation can be performed under the control of a component called a "DMA controller"; apart from a small amount of processing at the start and end of the transfer, the CPU can carry out other work while the transfer is in progress. That is, in this embodiment of the present invention, the apparatus 300 for reading and writing data may further include a DMA controller, and the DMA controller controls the data read/write operations of the threads in the storage device.
The basic operations for implementing a DMA transfer are as follows: (1) the DMA controller issues a DMA request to the CPU; (2) the CPU responds to the DMA request, the system switches to the DMA operating mode, and bus control is handed over to the DMA controller; (3) the DMA controller sends the memory address and determines the length of the data block to be transferred; (4) the DMA transfer is executed; (5) the DMA operation ends, and bus control is returned to the CPU.
It should be understood that the DMA transfer implementation listed above is merely an exemplary description; the present invention is not particularly limited thereto, and any method in the prior art capable of implementing DMA transfer may also be used. For example, in this embodiment of the present invention, the functions of the above DMA controller may be implemented by software, a program, or the like (a toy walk-through of the five steps appears below).
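Real DMA is a hardware handshake; the classes and prints below only mirror the sequence of the five steps listed above and are purely an illustrative sketch.

```python
# Toy walk-through of the five DMA steps above; all names are invented.

class Bus:
    def __init__(self):
        self.owner = "CPU"

class CPU:
    def __init__(self, bus):
        self.bus = bus

    def grant_dma(self):
        print("(1) DMA request received; (2) CPU switches system to DMA mode")
        self.bus.owner = "DMA"                 # hand bus control to controller

class DMAController:
    def __init__(self, bus, cpu):
        self.bus, self.cpu = bus, cpu

    def transfer(self, address, length):
        self.cpu.grant_dma()                   # steps (1) and (2)
        assert self.bus.owner == "DMA"
        print(f"(3) controller sends address {hex(address)}, block length {length}")
        print("(4) DMA transfer executes with no CPU involvement")
        self.bus.owner = "CPU"                 # step (5): return bus control
        print("(5) DMA operation ends; bus control returned to the CPU")

bus = Bus()
DMAController(bus, CPU(bus)).transfer(address=0x1000, length=4096)
```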
Optionally, the sending unit 330 is specifically configured to send a data read/write signal to the storage device, where the data read/write signal includes M signal components, the M signal components are in one-to-one correspondence with the M data read/write instructions, and each data read/write instruction is carried in its corresponding signal component. In other words, the sending unit 330 may carry all of the above data read/write instructions in the same signal (or signal stream) and send them to the storage device together. It should be understood that this way of sending the data read/write instructions is merely an exemplary description, and the present invention is not limited thereto; the sending unit 330 may also carry each data read/write instruction in an independent signal and send it to the storage device (a sketch of the single-signal framing follows).
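The dict-based framing below is an assumption for illustration; the patent only requires that each of the M instructions ride in its own signal component of one composite signal.

```python
# Sketch of the single-signal option described above.

def build_rw_signal(instructions_by_group):
    """Host side: pack per-group instructions into one data read/write signal."""
    return {"type": "rw_signal",
            "components": [{"group_id": gid, "instruction": instr}
                           for gid, instr in instructions_by_group.items()]}

def split_rw_signal(signal):
    """Storage side: recover each group's instruction from its component."""
    return {c["group_id"]: c["instruction"] for c in signal["components"]}

signal = build_rw_signal({1: "read blocks 0-7", 2: "write blocks 8-15"})
print(split_rw_signal(signal))  # {1: 'read blocks 0-7', 2: 'write blocks 8-15'}
```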
Optionally, the sending unit is specifically configured to send the M data read/write instructions to the storage device through a master control thread included in the target process. Specifically, in this embodiment of the present invention, a master control CPU and a master control thread corresponding to that master control CPU may be configured; that is, the master control CPU may determine the data read/write instructions of CPU#1 to CPU#N and send them to the storage device through the master control thread. Optionally, the master control thread belongs to the execution threads; that is, one CPU may be selected from CPU#1 to CPU#N as the master control CPU, and the thread corresponding to that CPU serves as the master control thread. The master control thread can thus be used both to transfer the data read/write instructions to the storage device and to access the storage space of the storage device to perform data read/write operations.
The data read/write procedure of this embodiment of the present invention is described below in detail with reference to FIG. 2. In the embodiment shown in FIG. 2, the target process is executed by 29 CPUs and includes 29 execution threads. The 29 CPUs are divided into 4 CPU groups, namely CPU group #1 to CPU group #4: CPU group #1 includes 8 CPUs (CPU#1 to CPU#8), CPU group #2 includes 8 CPUs (CPU#9 to CPU#16), CPU group #3 includes 8 CPUs (CPU#17 to CPU#24), and CPU group #4 includes 5 CPUs (CPU#25 to CPU#29). Correspondingly, the 29 threads (that is, execution threads) are divided into 4 thread groups, namely thread group #1 to thread group #4: thread group #1 includes 8 threads (thread #1 to thread #8), thread group #2 includes 8 threads (thread #9 to thread #16), thread group #3 includes 8 threads (thread #17 to thread #24), and thread group #4 includes 5 threads (thread #25 to thread #29). Furthermore, in the embodiment shown in FIG. 2, CPU#0 is the master control CPU and thread #0 is the master control thread.
In step 1, thread #0 sends a request signal to the storage device, where the request signal carries the data read/write request of each CPU that needs to perform data read/write operations in the current cycle, and each data read/write request carries the indication identifier of its corresponding thread group. In step 2, the storage device may determine, according to the indication identifier carried in each data read/write request, the thread group corresponding to that request; the storage device then splits the request signal and generates, on a per-thread-group basis, a response signal corresponding to each thread group, where the indication identifiers of the data read/write instructions carried in any one response signal all point to the same thread group. In step 3, the storage device may send each response signal to the corresponding thread group according to the indication identifier. In step 4, each thread group may perform, under the control of its corresponding CPU group, data read/write operations in the storage device based on the data read/write instructions received from the storage device.
In addition, in this embodiment of the present invention, the master control thread may monitor the completion of the data read/write operations of each thread group. The master control thread may end its control of a given thread group immediately after all the threads in that group have completed their data read/write operations, or it may end its control of all thread groups together once all the threads in all thread groups have completed their data read/write operations. Alternatively, the storage device may monitor the completion of the data read/write operations within its storage space: the storage device may notify the master control thread immediately after all the threads in a given thread group have completed their data read/write operations, so that the master control thread ends its control of that group, or it may notify the master control thread once all the threads in all thread groups have completed their data read/write operations, so that the master control thread ends its control of all thread groups (an end-to-end toy model of steps 1 to 4 and this completion reporting follows).
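The following sketch models the whole cycle with Python threads standing in for CPU-controlled execution threads and queues standing in for the host-storage transport; the queue-based messaging and all names are our own assumptions, not the patent's mechanism.

```python
# End-to-end toy model of steps 1-4 plus per-thread completion reporting.

import threading
from queue import Queue

N_THREADS, K = 8, 4                              # 8 threads, groups of 4
group_of = {t: (t - 1) // K + 1 for t in range(1, N_THREADS + 1)}

request_q = Queue()                              # master thread -> storage
group_q = {g: Queue() for g in set(group_of.values())}
done_q = Queue()                                 # workers -> master thread

def storage_device():
    for req in request_q.get():                  # step 2: split request signal
        group_q[req["group_id"]].put(req)        # step 3: per-group response

def execution_thread(tid):
    req = group_q[group_of[tid]].get()           # step 4: take one instruction
    done_q.put((tid, req["op"], "done"))         # report completion upward

# Step 1: thread #0 (master) sends one request signal carrying every request,
# each tagged with the indication identifier of its thread group.
request_q.put([{"thread": t, "group_id": group_of[t], "op": "read"}
               for t in range(1, N_THREADS + 1)])

threading.Thread(target=storage_device).start()
for t in range(1, N_THREADS + 1):
    threading.Thread(target=execution_thread, args=(t,)).start()
for _ in range(N_THREADS):                       # master ends control as
    print("completed:", done_q.get())            # completions arrive
```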
An important measure of a real-time operating system is the time it takes from receiving a task to completing that task; the variation in this time is called jitter. The primary goal in designing a real-time operating system is not high throughput but ensuring that tasks are completed within a specific time.
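As a side note, the jitter notion above can be illustrated with a tiny calculation over hypothetical completion times; the max-min spread used here is our own choice of statistic.

```python
# One-line illustration of jitter as the spread of completion times.
completion_ms = [2.1, 2.3, 2.0, 2.2, 5.9]        # hypothetical measurements
print(f"jitter: {max(completion_ms) - min(completion_ms):.1f} ms")  # 3.9 ms
```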
However, current real-time operating systems offer limited support for multiple CPUs or, in other words, multiple cores. The reason is that a multi-core real-time operating system suffers from large inter-core data transmission delays: CPUs are typically connected in a non-mesh QuickPath Interconnect (QPI) manner, so when a real-time task must be executed cooperatively across multiple CPUs (for example, when the data one CPU reads from or writes to a cache device has to be forwarded by another CPU), a large delay arises, followed by a chain of delay reactions that prevents the entire system from running normally. As a result, tasks with a large computation load can still rely only on a limited number of cores, execution time grows, and the requirements of a real-time operating system cannot be met.

In contrast, according to the apparatus for reading and writing data of the present invention, the host device groups the N cores used to execute the target process and groups the N execution threads corresponding to the N cores to determine M execution thread groups, and each data read/write instruction sent to the storage device carries the indication identifier of its corresponding execution thread group. The storage device can therefore identify, according to the indication identifier, the thread group corresponding to each data read/write instruction and forward that instruction to the corresponding thread group, enabling each execution thread to perform data read/write operations in the storage device according to the data read/write instruction obtained from the storage device. This reduces the signaling and data transmission between cores during data read/write operations and hence the processing delay they cause, shortens the completion time of data read/write operations in a multi-core computer system, and enables a real-time operating system to be extended to multiple CPUs.
FIG. 5 shows a schematic block diagram of a storage device 400 according to an embodiment of the present invention. As shown in FIG. 5, the storage device 400 includes: a transmission interface 410, configured for communication between the storage device and a host device; a storage space 420, configured to store data; and a controller 430, configured to receive, through the transmission interface, M data read/write instructions sent by the host device, where the M data read/write instructions are in one-to-one correspondence with M execution thread groups, each data read/write instruction includes the indication identifier of its corresponding execution thread group, one indication identifier is used to identify one execution thread group, the M execution thread groups are determined by the host device by grouping the N execution threads included in a target process, the N cores determined by the host device for executing the target process are in one-to-one correspondence with the N execution threads, N ≥ 2, one execution thread belongs to only one execution thread group, one execution thread group includes at least one execution thread, and M ≥ 2. The controller is further configured to determine, according to the indication identifiers, the data read/write instruction corresponding to each execution thread group, and to transmit, through the transmission interface, each data read/write instruction to its corresponding execution thread group, so that each execution thread performs data read/write operations in the storage device 400 according to the data read/write instruction it obtains (a structural sketch of this device follows).
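Purely as an illustration of the three-part structure (transmission interface 410, storage space 420, controller 430), the class below models the device; the bytearray storage space, method names, and instruction tuples are assumptions, not the patent's API.

```python
# Structural sketch of storage device 400.

class StorageDevice400:
    def __init__(self, size=1024):
        self.storage_space = bytearray(size)            # storage space 420

    def receive(self, rw_signal):                       # interface 410
        return self._dispatch(rw_signal)

    def _dispatch(self, rw_signal):                     # controller 430
        per_group = {}
        for comp in rw_signal["components"]:
            per_group.setdefault(comp["group_id"], []).append(comp["instruction"])
        return per_group          # each group then executes against the space

    def read(self, offset, length):                     # used by threads
        return bytes(self.storage_space[offset:offset + length])

    def write(self, offset, data):
        self.storage_space[offset:offset + len(data)] = data

dev = StorageDevice400()
dev.write(0, b"hello")
print(dev.receive({"components": [{"group_id": 1, "instruction": ("read", 0, 5)}]}))
print(dev.read(0, 5))                                   # b'hello'
```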
Optionally, the M execution thread groups are specifically determined by the controller 430 by grouping the N execution threads according to the expected completion duration of the target process and the data transmission delays between the cores.
Optionally, the data transmission between each execution thread and the storage device is performed based on the Direct Memory Access (DMA) protocol.
Optionally, the controller 430 is specifically configured to receive, through the transmission interface, a data read/write signal sent by the host device, where the data read/write signal includes M signal components, the M signal components are in one-to-one correspondence with the M data read/write instructions, and each data read/write instruction is carried in its corresponding signal component.
Optionally, the M data read/write instructions are sent by the host device through a master control thread included in the target process. Optionally, the master control thread belongs to the execution threads.
It should be noted that, in this embodiment of the present invention, after the storage device 400 transmits the M data read/write instructions to the execution thread groups, the process of performing read/write operations in its storage space according to the data read/write instructions from each execution thread group may be similar to the data read/write process in the prior art; to avoid repetition, a detailed description is omitted here.
In this embodiment of the present invention, the controller may implement or perform the steps and logical block diagrams disclosed in the method embodiments of the present invention; the controller may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in the above storage space, for example, in a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register, or in another storage medium mature in the art. The controller reads the information in the storage space and completes the steps of the above methods in combination with its hardware.
The storage device 400 may correspond to the storage device in the above description, and the functions of the modules and units included in the storage device 400 are similar to those of the corresponding modules or units of that storage device; to avoid repetition, a detailed description is omitted here.
The storage device 400 may be a read-only memory and a random access memory, and it provides instructions and data to the host device. A part of the storage device 400 may also include a non-volatile random access memory. For example, the storage device 400 may also store information about the device type.
According to the storage device of the present invention, the host device groups the N cores used to execute the target process and groups the N execution threads corresponding to the N cores to determine M execution thread groups, and each data read/write instruction sent to the storage device carries the indication identifier of its corresponding execution thread group. The storage device can therefore identify, according to the indication identifier, the thread group corresponding to each data read/write instruction and forward that instruction to the corresponding thread group, enabling each execution thread to perform data read/write operations in the storage device according to the data read/write instructions obtained from the storage device. This reduces the signaling and data transmission between the cores during data read/write operations and hence the processing delay caused by that signaling and data transmission, shortens the completion time of data read/write operations in a multi-core computer system, and enables a real-time operating system to be extended to multiple CPUs.
FIG. 6 is a schematic structural diagram of a computer system 500 according to another embodiment of the present invention. As shown in FIG. 6, the computer system 500 includes: a bus 510; a host device 520 connected to the bus, configured to determine N cores for executing a target process, where the N cores are in one-to-one correspondence with N execution threads included in the target process and N ≥ 2, to group the N execution threads to determine M execution thread groups, and to assign an indication identifier to each execution thread group, where one indication identifier is used to identify one execution thread group, one execution thread belongs to only one execution thread group, one execution thread group includes at least one execution thread, and M ≥ 2, and to send M data read/write instructions to the storage device through the bus 510, where the M data read/write instructions are in one-to-one correspondence with the M execution thread groups and each data read/write instruction includes the indication identifier of its corresponding execution thread group; and a storage device 530 connected to the bus, configured to receive the M data read/write instructions through the bus 510, determine, according to the indication identifiers, the data read/write instruction corresponding to each execution thread group, and transmit each data read/write instruction to its corresponding execution thread group, so that each execution thread performs data read/write operations in the storage device according to the data read/write instruction it obtains.
Optionally, the M execution thread groups are determined by the host device by grouping the N execution threads according to the expected completion duration of the target process and the data transmission delays between the cores. Optionally, the data transmission between each execution thread and the storage device is performed based on the Direct Memory Access (DMA) protocol.
The host device 520 described above may correspond to the apparatus 300 for reading and writing data according to the embodiments of the present invention, and the storage device 530 described above may correspond to the apparatus 200 for reading and writing data according to the embodiments of the present invention; for brevity, their functions are not described again here.
According to the computer system of the present invention, the host device groups the N cores used to execute the target process and groups the N execution threads corresponding to the N cores to determine M execution thread groups, and each data read/write instruction sent to the storage device carries the indication identifier of its corresponding execution thread group. The storage device can therefore identify, according to the indication identifier, the thread group corresponding to each data read/write instruction and forward that instruction to the corresponding thread group, enabling each execution thread to perform data read/write operations in the storage device according to the data read/write instructions obtained from the storage device. This reduces the signaling and data transmission between the cores during data read/write operations and hence the processing delay caused by that signaling and data transmission, shortens the completion time of data read/write operations in a multi-core computer system, and enables a real-time operating system to be extended to multiple CPUs.
It should be understood that, in the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation processes of the embodiments of the present invention.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be implemented in electrical, mechanical, or other forms. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Furthermore, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
When the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.


Abstract

Embodiments of the present invention provide a data read/write method and apparatus, a storage device, and a computer system, capable of shortening the completion time of data read/write operations in a multi-core computer system. The method includes: a host device determines N cores for executing a target process, the N cores being in one-to-one correspondence with N execution threads included in the target process; groups the N execution threads to determine M execution thread groups, and assigns an indication identifier to each execution thread group; and sends M data read/write instructions to a storage device, each data read/write instruction including the indication identifier of its corresponding execution thread group, so that the storage device determines, according to the indication identifier included in each data read/write instruction, the execution thread group corresponding to that instruction and transmits each data read/write instruction to its corresponding execution thread group, whereby each execution thread performs data read/write operations in the storage device according to the data read/write instruction obtained from the storage device.

Description

读写数据的方法、装置、存储设备和计算机*** 技术领域
本发明涉及计算机领域,并且更具体地,涉及读写数据的方法、装置、存储设备和计算机***。
背景技术
在具有多个内核的计算机***中,一个内核与存储设备之间进行读写操作的数据,可能需要经由其他内核的转发,即,数据读写操作需要内核与内核之间的配合执行,而内核与内核之间存在较大的数据传输时延,该较大的数据传输时延,严重影响了数据读写操作的完成时间,进而影响了整个任务(例如,信号处理)的完成时间。
因此,希望提供一种技术,能够缩短多核计算机***中数据读写操作的完成时间。
发明内容
本发明实施例提供一种读写数据的方法、装置和***,能够缩短多核计算机***中数据读写操作的完成时间。
第一方面,提供了一种读写数据的方法,该方法包括:主机设备确定用于执行目标进程的N个内核,其中,该N个内核与该目标进程包括的N个执行线程一一对应,N≥2;对该N个执行线程进行分组,以确定M个执行线程组,并为各执行线程组分配指示标识,其中,一个指示标识用于标识一个执行线程组,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2;向存储设备发送M个数据读写指令,该M个数据读写指令与该M个执行线程组一一对应,各该数据读写指令包括所对应的执行线程组的指示标识,以便于该存储设备根据各该数据读写指令包括的指示标识,确定各该数据读写指令所对应的执行线程组,并将各该数据读写指令传输至所对应的执行线程组,以使各该执行线程根据从该存储设备获得的数据读写指令,在该存储设备中进行数据读写操作。
结合第一方面,在第一方面的第一种实现方式中,各该执行线程与该存储设备之间的数据传输是基于直接存储DMA协议进行的。
结合第一方面及其上述实现方式,在第一方面的第二种实现方式中,该向存储设备发送M个数据读写指令,包括:向存储设备发送数据读写信号,该数据读写信号包括M个信号分量,该M个信号分量与该M个数据读写指令一一对应,各数据读写指令承载于该对应的信号分量中。
结合第一方面及其上述实现方式,在第一方面的第三种实现方式中,该向存储设备发送M个数据读写指令,包括:通过该目标进程包括的主控线程,向存储设备发送M个数据读写指令。
结合第一方面及其上述实现方式,在第一方面的第四种实现方式中,该对该N个执行线程进行分组,包括:确定该目标进程的期望完成时长以及各该内核彼此之间的数据传输时延;根据该期望完成时长和该数据传输时延,对该N个执行线程进行分组。
第二方面,提供了一种读写数据的方法,该方法包括:存储设备接收主机设备发送的M个数据读写指令,该M个数据读写指令与M个执行线程组一一对应,各该数据读写指令包括所对应的执行线程组的指示标识,一个指示标识用于标识一个执行线程组,该M个执行线程组是该主机设备对目标进程包括的N个执行线程进行分组而确定的,该主机设备确定的用于执行该目标进程的N个内核与该N个执行线程一一对应,N≥2,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2;根据该指示标识,确定各执行线程组所对应的数据读写指令;将各数据读写指令传输至所对应的执行线程组,以使各该执行线程根据所获得的数据读写指令,在该存储设备中进行数据读写操作。
结合第二方面,在第二方面的第一种实现方式中,各该执行线程与该存储设备之间的数据传输是基于直接存储DMA协议进行的。
结合第二方面及其上述实现方式,在第二方面的第二种实现方式中,该接收主机设备发送的M个数据读写指令,包括:接收主机设备发送的数据读写信号,该数据读写信号包括M个信号分量,该M个信号分量与该M个数据读写指令一一对应,各数据读写指令承载于该对应的信号分量中。
结合第二方面及其上述实现方式,在第二方面的第三种实现方式中,该接收主机设备发送的M个数据读写指令,包括:接收该目标进程包括的主控线程发送的M个数据读写指令。
结合第二方面及其上述实现方式,在第二方面的第四种实现方式中,该 M个执行线程组具体是该主机设备根据该目标进程的期望完成时长以及各该内核彼此之间的数据传输时延,对该N个执行线程进行分组而确定的。
第三方面,提供了一种读写数据的装置,该装置包括:确定单元,用于确定用于执行目标进程的N个内核,其中,该N个内核与该目标进程包括的N个执行线程一一对应,N≥2;分组单元,用于对该N个执行线程进行分组,以确定M个执行线程组,其中,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2;发送单元,用于向存储设备发送M个数据读写指令,该M个数据读写指令与该M个执行线程组一一对应,各该数据读写指令包括所对应的执行线程组的指示标识,一个指示标识用于唯一地标识一个执行线程组,以便于该存储设备根据各该指示标识将各该数据读写指令传输至所对应的执行线程组,以使各该执行线程根据从该存储设备获得的数据读写指令,在该存储设备中进行数据读写操作。
结合第三方面,在第三方面的第一种实现方式中,各该执行线程与该存储设备之间的数据传输是基于直接存储DMA协议进行的。
结合第三方面及其上述实现方式,在第三方面的第二种实现方式中,该发送单元具体用于向存储设备发送数据读写信号,该数据读写信号包括M个信号分量,该M个信号分量与该M个数据读写指令一一对应,各数据读写指令承载于该对应的信号分量中。
结合第三方面及其上述实现方式,在第三方面的第三种实现方式中,该发送单元具体用于通过该目标进程包括的主控线程,向存储设备发送M个数据读写指令。
结合第三方面及其上述实现方式,在第三方面的第四种实现方式中,该分组单元具体用于根据该目标进程的期望完成时长以及各该内核彼此之间的数据传输时延,对该N个执行线程进行分组。
结合第三方面及其上述实现方式,在第三方面的第五种实现方式中,该读写数据的装置为计算机***中的主机设备。
第四方面,提供了一种存储设备,包括:传输接口,用于该存储设备与主机设备之间的通信;存储空间,用于存储数据;控制器,用于通过该传输接口接收该主机设备发送的M个数据读写指令,该M个数据读写指令与M个执行线程组一一对应,各该数据读写指令包括所对应的执行线程组的指示标识,一个指示标识用于标识一个执行线程组,该M个执行线程组是该主 机设备对目标进程包括的N个执行线程进行分组而确定的,该主机设备确定的用于执行该目标进程的N个内核与该N个执行线程一一对应,N≥2,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2,用于根据该指示标识,确定各执行线程组所对应的数据读写指令,通过该传输接口,用于将各数据读写指令传输至所对应的执行线程组,以使各该执行线程根据所获得的数据读写指令,在该存储空间中进行数据读写操作。
结合第四方面,在第四方面的第一种实现方式中,各该执行线程与该存储设备之间的数据传输是基于直接存储DMA协议进行的。
结合第四方面及其上述实现方式,在第四方面的第二种实现方式中,该接收单元具体用于接收主机设备发送的数据读写信号,该数据读写信号包括M个信号分量,该M个信号分量与该M个数据读写指令一一对应,各数据读写指令承载于该对应的信号分量中。
结合第四方面及其上述实现方式,在第四方面的第三种实现方式中,该控制器具体用于通过该传输接口接收该主机设备发送的数据读写信号,该数据读写信号包括M个信号分量,该M个信号分量与该M个数据读写指令一一对应,各数据读写指令承载于该对应的信号分量中。
结合第四方面及其上述实现方式,在第四方面的第四种实现方式中,该M个执行线程组具体是该主机设备根据该目标进程的期望完成时长以及各该内核彼此之间的数据传输时延,对该N个执行线程进行分组而确定的。
第五方面,提供了一种计算机***,包括:总线;与该总线相连的主机设备,用于确定用于执行目标进程的N个内核,其中,该N个内核与该目标进程包括的N个执行线程一一对应,N≥2,对该N个执行线程进行分组,以确定M个执行线程组,并为各执行线程组分配指示标识,其中,一个指示标识用于标识一个执行线程组,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2,通过该总线向存储设备发送M个数据读写指令,该M个数据读写指令与该M个执行线程组一一对应,各该数据读写指令包括所对应的执行线程组的指示标识,一个指示标识用于唯一地标识一个执行线程组;与该总线相连的存储设备,用于通过该总线接收该M个数据读写指令,并根据该指示标识,确定各执行线程组所对应的数据读写指令,将各数据读写指令传输至所对应的执行线程组,以使各该执 行线程根据所获得的数据读写指令,在该存储设备中进行数据读写操作。
结合第五方面,在第五方面的第一种实现方式中,该M个执行线程组是该主机设备根据该目标进程的期望完成时长以及各该内核彼此之间的数据传输时延,对该N个执行线程进行分组而确定的。
根据本发明的读写数据的方法、装置、存储设备和计算机***,主机设备对用于执行目标进程的N个内核进行分组,并对该N个内核所对应的N个执行线程进行分组以确定M个执行线程组,并在发送给存储设备的数据读写指令中携带与该数据读写指令所对应的执行线程组的指示标识,从而存储设备能够根据该指示标识,识别该数据读写指令所对应的线程组,进而存储设备能够将该数据读写指令转发给该数据读写指令所对应的线程组,能够使各执行线程根据从该存储设备获得的数据读写指令,在该存储设备中进行数据读写操作,从而能够减少在进行数据读写操作时内核之间的信令和数据传输,进而减少因该信令和数据传输而导致的处理时延,能够缩短多核计算机***中数据读写操作的完成时间。
附图说明
为了更清楚地说明本发明实施例的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本发明一实施例的读写数据的装置的示意性结构图。
图2是本发明一实施例的读写数据的流程的示意图。
图3是本发明另一实施例的读写数据的装置的示意性结构图。
图4是本发明一实施例的读写数据的方法的示意性流程图。
图5是本发明另一实施例的读写数据的方法的示意性流程图。
图6是本发明一实施例的读写数据的***的示意结构图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创 造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
本发明的技术方案,可以运行在包括例如,CPU、存储器管理单元(MMU,Memory Management Unit)、内存(也称为存储器)的硬件设备上,该硬件设备所运行的操作***可以是各种通过线程或进程(包括多个线程)实现业务处理的计算机操作***,例如,Linux***、Unux***等。
本发明的读写数据的装置可应用计算机***,具体用于进行数据读写操作,例如,该数据读写操作可以是中央处理器(CPU,Central Processing Unit)控制各线程进行运算处理时在缓存设备(存储设备的一例)中进行的数据读写操,或者该数据读写操作可以是CPU在磁盘设备(存储设备的另一例)中进行的数据读写操,本发明并未特别限定,以下,为了便于理解,以在缓存设备中进行的数据读写操过程为例,对本发明实施例的读写数据的方法、装置和***进行详细说明。
另外,作为上述计算机***,例如,可以列举实时操作***(RTOS,Real-time operating system),又称即时操作***,是指当外界事件或数据产生时,能够接受并以足够快的速度予以处理,其处理的结果又能在规定的时间之内来控制生产过程或对处理***做出快速响应,并控制所有实时任务协调一致运行的操作***。其与一般的操作***相比,最大的特色就是其“实时性”,也就是说,如果有一个任务需要执行,实时操作***会马上(在较短时间内)完成该任务,不会有较长的时延。
应理解,以上列举的实时操作***仅为计算机***的示例性说明,本发明并未特别限定,为了便于理解和说明,以下,以在实时操作***中的应用为例,对本发明实施例的读写数据的方法、装置和***进行详细说明。
图1示出了根据本发明一实施例的读写数据的方法100的示意性框图。如图1所示,该方法100包括:
S110,主机设备确定用于执行目标进程的N个内核,其中,该N个内核与该目标进程包括的N个执行线程一一对应,N≥2;
S120,对该N个执行线程进行分组,以确定M个执行线程组,并为各执行线程组分配指示标识,其中,一个指示标识用于标识一个执行线程组,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2;
S130,向存储设备发送M个数据读写指令,该M个数据读写指令与该 M个执行线程组一一对应,各该数据读写指令包括所对应的执行线程组的指示标识,以便于该存储设备根据各该数据读写指令包括的指示标识,确定各该数据读写指令所对应的执行线程组,并将各该数据读写指令传输至所对应的执行线程组,以使各该执行线程根据从该存储设备获得的数据读写指令,在该存储设备中进行数据读写操作
首先,作为该方法100的执行主体,可以列举计算机***中的主机设备,并且,该主机设备具有多个CPU(或者说,内核),其中,该多个CPU可以协同作业以完成目标任务,例如,每个CPU可以分别运行与该目标任务相对应的进程中的部分(一个或多个)线程。多个CPU彼此之间通信连接,从而可以通过信号交换等方式,实现数据共享。
此外,该计算机***还包括存储设备,该存储设备用于提供存储功能,主机设备在执行目标任务时可以访问该存储设备中的存储空间,进行针对在执行目标任务时产生的信号或数据等的读写操作(或者说,存储操作)。在本发明实施例中,存储设备可以支持各种存储介质,可选地,该存储设备还可以包括存储接口扩展模块,可以连接至少一个固态硬盘(SSD,Solid State Disk)和/或混合硬盘(HHD,Hybrid Hard Disk)从而可以根据需要扩大存储设备的容量。
在本发明实施例中,主机设备与存储设备之间可以通过能够实现数据传输各种计算机接口连接,例如,高速外设部件互连(PCIE,Peripheral Component Interconnect Express)接口、雷电(Thunderbolt)接口、无限带宽(Infiniband)接口、高速通用串行总线(USB,Universal Serial Bus)接口以及高速以太网接口等。
下面,分别对该方法100的各步骤进行详细说明。
在S110,当主机设备确定需要执行目标任务时,可以从主机设备所包括的所有CPU中,确定用于执行该目标任务(即,目标进程)的N个CPU(即,内核),以下,为了便于理解和说明,将用于执行该目标任务的N个CPU记做:CPU#1~CPU#N。
作为示例而非限定,主机设备可以根据执行该目标任务所需要的运算量,来确定上述“N”的具体数值,例如,如果该目标任务所需要的运算量较大,为了快速完成该任务,可以使上述“N”的数值较大;如果该目标任务所需要的运算量较小,仅需较少的CPU便能够快速完成该任务,则可以 使上述“N”的数值较小。
另外,该CPU#1~CPU#N可以分别用于执行上述目标任务的N个线程(即,执行线程),以下,为了便于理解和说明,将该N个线程记做:线程#1~线程#N,即,CPU#1~CPU#N与线程#1~线程#N一一对应,作为示例而非限定,上述“一一对应”的对应规则可以为,一个CPU用于控制序号相同的线程的运行。
应理解,以上列举的主机设备确定用于执行该目标任务的N个CPU的方法以及所使用的参数仅为示例性说明,本发明并未特别限定,例如,还可以根据预设的数值,默认为所执行的所有任务使用的CPU的数量均相同,例如,该预设的数值可以为主机设备所包括的所有CPU的总和。
需要说明的是,本发明致力于解决因CPU之间的数据传输时延而导致的数据读写操作的完成时间的影响,因此,当N≥2时,能够充分体现本发明的技术效果,随后对技术效果进行详细说明。
图2是本发明一实施例的读写数据的流程的示意图,在图2所示示例中,N为29,即,主机设备确定29个CPU来执行目标进程,该目标进程包括29个执行线程。
其后,在S120,主机设备可以将如上所述确定的CPU#1~CPU#N分为M个CPU组,或者说,用于将如上所述确定的CPU#1~CPU#N所对应的线程#1~线程#N分为M个线程组。作为分组依据,例如,可以列举以下规则:
可选地,该对该N个执行线程进行分组,包括:
确定该目标进程的期望完成时长以及各该内核彼此之间的数据传输时延;
根据该期望完成时长和该数据传输时延,对该N个执行线程进行分组。
具体地说,主机设备可以确定上述CPU#1~CPU#N彼此之间的数据传输时延,例如,可以获取CPU#1~CPU#N的型号、彼此之间连接方式等信息,从而可以根据上述信息,推算出CPU#1~CPU#N彼此之间的数据传输时延。
应理解,以上列举的主机设备确定CPU#1~CPU#N彼此之间的数据传输时延的方法以及所使用的参数仅为示例性说明,本发明并未特别限定,例如,还可以通过试验等方式,检测CPU#1~CPU#N彼此之间的数据传输时延。
并且,主机设备还可以确定该目标进程的期望完成时长,其中,该该目标进程需要在规定的时间内完成,该期望完成时长可以是该目标进程从开始 执行到结束执行(例如,可以包括CPU判定该任务执行成功并退出进程的时间)所经历的时间,并且,该期望完成时长可以小于或等于上述规定的时间。
作为示例而非限定,主机设备可以根据该目标进程的类型、处理优先级等属性信息,确定该目标进程的期望完成时长,例如,如果该目标进程的类型指示该目标进程的业务属于实时类型业务(例如,在线游戏,视频通话等),则可以确定该目标进程的紧急程度较高,且需要在较短时间内完成,从而可以确定目标进程的期望完成时长较短(例如,低于一个预设的门限值);再例如,如果该目标进程的处理优先级被标记为高时,则可以确定该目标进程的紧急程度较高,且需要在较短时间内完成,从而可以确定目标进程的期望完成时长较短(例如,低于一个预设的门限值)。
从而,主机设备可以基于如上所述确定的CPU#1~CPU#N彼此之间的数据传输时延以及目标进程的期望完成时长,对CPU#1~CPU#N进行分组,以使包括各CPU组内的CPU彼此之间的数据传输时延之和在内的目标进程的完成时长小于或等于目标进程的期望完成时长,例如:
主机设备可以根据各CPU的处理能力,推算在CPU彼此之间不发生数据传输的情况下完成该目标进程的时长,以下,简称该目标进程的参考完成时长,从而可以获得该目标进程的期望完成时长的该目标进程的参考完成时长的差值,可以基于上述结果进而对CPU#1~CPU#N进行分组,以使各组内的CPU彼此之间的数据传输时延之和小于或等于上述差值。
应理解,以上列举的分组依据仅为示例性说明,本发明并未限定于此,例如,主机设备还可以基于预设的基准值K,将CPU#1~CPU#N分为M个CPU组,在该M个CPU组中,至少M-1个CPU组所包括的CPU的数量等于该基准值K,或者说,至多1个CPU组所包括的CPU的数量小于该基准值K,无需赘言,各CPU组所包括的CPU的数量为大于零的整数。
并且,在本发明实施例中,该预设的基准值K可以根据计算机***的负载等参数适当变更,例如,如果当前计算机***的负载较大,则可以采用较小的K值。
在图2所示示例中,K为8,因此,29个CPU被分为4个CPU组,即,CPU组#1~CPU组#4。其中,CPU组#1包括8个CPU,即,CPU#1~CPU#8;CPU组#2包括8个CPU,即,CPU#9~CPU#16;CPU组#3包括8个CPU, 即,CPU#17~CPU#24;CPU组#4包括5个CPU,即,CPU#25~CPU#29。
同样,29个线程(即,执行线程)被分为4个线程组,即,线程组#1~线程组#4。其中,线程组#1包括8个线程,即,线程#1~线程#8;线程组#2包括8个线程,即,线程#9~线程#16;线程组#3包括8个线程,即,线程#17~线程#24;线程组#4包括5个线程,即,线程#25~线程#29。
其后,在S130,主机设备可以向存储设备发送M个数据读写指令,具体地说,主机设备可以从各CPU获取数据读写指令,并且,可以根据如上述划分的CPU组或线程组,为各数据读写指令添加指示标识,以指示各数据读写指令所来自的CPU组,或者说,各数据读写指令所对应的线程组。
从而,存储设备可以通过接收单元接收上述M个数据读写指令,并且,可以通过确定单元,根据各数据读写指令所携带的指示标识,确定各数据读写指令所对应的线程组,或者说,各数据读写指令所对应的CPU组。
其后,存储设备可以通过发送单元,根据数据读写指令所携带的指示标识,将数据读写指令传输至该数据读写指令所携带的指示标识所指示的线程组,从而,各线程能够获得来自所对应的CPU的读写指令,进而,能够根据该数据读写指令在存储设备的存储空间中进行数据读写操作。
可选地,各该执行线程与该存储设备之间的数据传输是基于直接存储DMA协议进行的。
具体地说,存储器直接访问(DMA,Direct Memory Access)是指一种高速的数据传输操作,允许在外部设备和存储器之间直接读写数据,既不通过CPU,也不需要CPU干预。例如,可以使整个数据传输操作在一个称为“DMA控制器”的控制下进行。CPU除了在数据传输开始和结束时做一点处理外,在传输过程中还可以进行其他的工作。即,在本发明实施例中,在该计算机***中还可以配置DMA控制器,并由DMA控制器控制各线程在存储设备中的数据读写操作。
实现DMA传送的基本操作如下:
(1)DMA控制器向CPU(属于主机设备)发出DMA请求:
(2)CPU响应DMA请求,***转变为DMA工作方式,并把总线控制权交给DMA控制器;
(3)由DMA控制器发送存储器地址,并决定传送数据块的长度;
(4)执行DMA传送;
(5)DMA操作结束,并把总线控制权交还CPU。
应理解,以上列举的DMA传输的实现方式仅为示例性说明,本发明并未特别限定,也可以使用现有技术中能够实现DMA传输的方法。例如,在本发明实施例中,可以通过软件或程序等实现上述DMA控制器的功能。
可选地,该向存储设备发送M个数据读写指令,包括:
向存储设备发送数据读写信号,该数据读写信号包括M个信号分量,该M个信号分量与该M个数据读写指令一一对应,各数据读写指令承载于该对应的信号分量中。
具体地说,主机设备可以将上述各数据读写指令承载于同一信号(或者说,信号流)中一并发送给存储设备。
应理解,以上列举的主机设备向存储设备发送数据读写指令的方法仅为示例性说明,本发明并未限定于此,也可以将各数据读写指令分别承载于独立的信号中,发送给存储设备。
可选地,该向存储设备发送M个数据读写指令,包括:
通过该目标进程包括的主控线程,向存储设备发送M个数据读写指令。
具体地说,在本发明实施例中,可以配置主控CPU以及与该主控CUP相对应的主控线程,即,主控CPU可以确定上述CPU#1~CPU#N的各数据读写指令,并通过该主控线程向存储设备发送上述数据读写指令。
可选地,该主控线程属于该执行线程。
具体地说,在本发明实施例中,可以从上述CPU#1~CPU#N中选择一个CPU作为主控CPU,与该主控CPU相对应的线程作为主控线程。即,该主控线程可以即用于向存储设备传输上述数据读写指令,也可以用于访问存储设备的存储空间,以进行数据读写操作。
下面结合图2,对本发明实施例的读写数据的流程进行详细说明。
在图2所示实施例中,由29个CPU来执行目标进程,该目标进程包括29个执行线程,并且,29个CPU被分为4个CPU组,即,CPU组#1~CPU组#4。其中,CPU组#1包括8个CPU,即,CPU#1~CPU#8;CPU组#2包括8个CPU,即,CPU#9~CPU#16;CPU组#3包括8个CPU,即,CPU#17~CPU#24;CPU组#4包括5个CPU,即,CPU#25~CPU#29。对应的,29个线程(即,执行线程)被分为4个线程组,即,线程组#1~线程组#4。其中,线程组#1包括8个线程,即,线程#1~线程#8;线程组#2包括8 个线程,即,线程#9~线程#16;线程组#3包括8个线程,即,线程#17~线程#24;线程组#4包括5个线程,即,线程#25~线程#29。
并且,在图2所示实施例中,CPU#0为主控CPU,线程#0为主控线程。
在步骤1,线程#0向存储器发送请求信号,该请求信号中携带有在本周期内需要进行数据读写操作的各CPU的数据读写请求,并且,各数据读写请求中分别携带有所对应的线程组的指示标识。
在步骤2,存储设备可以根据各数据读写请求中携带的指示标识,确定各数据读写请求所对应的线程组,并且,存储设备对该请求信号进行拆分,以线程组为单位,生成与各线程组相对应的响应信号,并且,各响应信号中承载的数据读写指令的指示标识均指向同一线程组。
在步骤3,存储器可以将根据指示标识,将各响应信号发送至所对应的线程组。
在步骤4,各线程组可以在所对应的CUP组的控制下,基于来自存储设备的数据读写指令,在存储设备中进行数据读写操作,其中,该数据读写操作的过程可以与现有技术相似,这里,为了避免赘述,省略其详细说明。
另外,在本发明实施例中,主控线程可以检测各线程组的数据读写操作完成情况,并且,主控线程可以在某一线程组内的所有线程均完成数据读写操作之后,立即结束对该线程组的控制,或者,主控线程也可以在全部线程组内的所有线程均完成数据读写操作之后,统一结束对所有线程组的控制。
或者,在本发明实施例中,存储设备可以检测其存储空间内的数据读写操作完成情况,并且,存储设备可以在某一线程组内的所有线程均完成数据读写操作之后,立即通知主控线程,以使主控线程结束对该线程组的控制,或者,存储设备也可以在全部线程组内的所有线程均完成数据读写操作之后,统一通知主控线程,以使主控线程结束对所有线程组的控制。
衡量一个实时操作***的重要指标,是它从接收一个任务,到完成该任务所需的时间,其时间的变化称为抖动。设计实时操作***的首要目标不是高的吞吐量,而是保证任务在特定时间内完成。
但是,目前的实时操作***对多CPU或者说,多核的支持有限。其原因是多核实时操作***存在较大的核间数据传输时延。具体地说,通常CPU之间采用非互联(mesh)的快速通道互联(QPI,QuickPath Interconnect)连接方式,当一个实时的任务需要跨多个CPU配合执行的时候,例如,一个 CPU在缓存设备中的所读写的数据需要经由另一个CPU的转发,就会产生较大的时延,随即产生一连串的时延反应,从而致使整个***无法正常运行。从而,当执行运算量较大的任务时,仍然仅能依靠数量有限的内核,导致执行时间增长,达不到对实时操作***的要求。
与此相对,根据本发明的读写数据的方法,主机设备对用于执行目标进程的N个内核进行分组,并对该N个内核所对应的N个执行线程进行分组以确定M个执行线程组,并在发送给存储设备的数据读写指令中携带与该数据读写指令所对应的执行线程组的指示标识,从而存储设备能够根据该指示标识,识别该数据读写指令所对应的线程组,进而存储设备能够将该数据读写指令转发给该数据读写指令所对应的线程组,能够使各执行线程根据从该存储设备获得的数据读写指令,在该存储设备中进行数据读写操作,从而能够减少在进行数据读写操作时内核之间的信令和数据传输,进而减少因该信令和数据传输而导致的处理时延,能够缩短多核计算机***中数据读写操作的完成时间,并且能够实现实时操作***对多CPU的扩展。
图3是本发明另一实施例的读写数据的方法200的示意性流程图,如图3所示,该方法200包括:
S210,存储设备接收主机设备发送的M个数据读写指令,该M个数据读写指令与M个执行线程组一一对应,各该数据读写指令包括所对应的执行线程组的指示标识,一个指示标识用于标识一个执行线程组,该M个执行线程组是该主机设备对目标进程包括的N个执行线程进行分组而确定的,该主机设备确定的用于执行该目标进程的N个内核与该N个执行线程一一对应,N≥2,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2;
S220,根据该指示标识,确定各执行线程组所对应的数据读写指令;
S230,将各数据读写指令传输至所对应的执行线程组,以使各该执行线程根据所获得的数据读写指令,在该存储设备中进行数据读写操作。
可选地,该M个执行线程组具体是该主机设备根据该目标进程的期望完成时长以及各该内核彼此之间的数据传输时延,对该N个执行线程进行分组而确定的。
可选地,各该执行线程与该存储设备之间的数据传输是基于直接存储DMA协议进行的。
可选地,该接收主机设备发送的M个数据读写指令,包括:
接收主机设备发送的数据读写信号,该数据读写信号包括M个信号分量,该M个信号分量与该M个数据读写指令一一对应,各数据读写指令承载于该对应的信号分量中。
可选地,该接收主机设备发送的M个数据读写指令,包括:
接收该主机设备通过该目标进程包括的主控线程发送的M个数据读写指令。
可选地,该主控线程属于该执行线程。
根据本发明实施例的读写数据的方法400的执行主体可对应上述存储器,其具体流程与上述存储器的动作相似,在此不再赘述。
根据本发明的读写数据的方法,主机设备对用于执行目标进程的N个内核进行分组,并对该N个内核所对应的N个执行线程进行分组以确定M个执行线程组,并在发送给存储设备的数据读写指令中携带与该数据读写指令所对应的执行线程组的指示标识,从而存储设备能够根据该指示标识,识别该数据读写指令所对应的线程组,进而存储设备能够将该数据读写指令转发给该数据读写指令所对应的线程组,能够使各执行线程根据从该存储设备获得的数据读写指令,在该存储设备中进行数据读写操作,从而能够减少在进行数据读写操作时内核之间的信令和数据传输,进而减少因该信令和数据传输而导致的处理时延,能够缩短多核计算机***中数据读写操作的完成时间,并且能够实现实时操作***对多CPU的扩展。
上文中,结合图1至图3,详细描述了根据本发明实施例的读写数据的方法,下面,将结合图4和图5,详细描述根据本发明实施例的读写数据的的装置。
图4示出了根据本发明一实施例的读写数据的装置300的示意性框图。如图4所示,该装置300包括:
确定单元310,用于确定用于执行目标进程的N个内核,其中,该N个内核与该目标进程包括的N个执行线程一一对应,N≥2;
分组单元320,用于对该N个执行线程进行分组,以确定M个执行线程组,并为各该执行线程组分配指示标识,其中,一个指示标识用于标识一个执行线程组,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2;
发送单元330,用于向存储设备发送M个数据读写指令,该M个数据读写指令与该M个执行线程组一一对应,各该数据读写指令包括所对应的执行线程组的指示标识,以便于该存储设备根据各该指示标识将各该数据读写指令传输至所对应的执行线程组,以使各该执行线程根据从该存储设备获得的数据读写指令,在该存储设备中进行数据读写操作。
并且,可选地,该读写数据的装置为计算机***中的主机设备。
具体地说,作为装置300,可以列举计算机***中的主机设备,并且,该主机设备具有多个CPU(或者说,内核),其中,该多个CPU可以协同作业以完成目标任务,例如,每个CPU可以分别运行与该目标任务相对应的进程中的部分(一个或多个)线程。多个CPU彼此之间通信连接,从而可以通过信号交换等方式,实现数据共享。
此外,该计算机***还包括存储设备,该存储设备用于提供存储功能,主机设备在执行目标任务时可以访问该存储设备中的存储空间,进行针对在执行目标任务时产生的信号或数据等的读写操作(或者说,存储操作)。在本发明实施例中,存储设备可以支持各种存储介质,可选地,该存储设备还可以包括存储接口扩展模块,可以连接至少一个固态硬盘(SSD,Solid State Disk)和/或混合硬盘(HHD,Hybrid Hard Disk)从而可以根据需要扩大存储设备的容量。
在本发明实施例中,主机设备与存储设备之间可以通过能够实现数据传输各种计算机接口连接,例如,高速外设部件互连(PCIE,Peripheral Component Interconnect Express)接口、雷电(Thunderbolt)接口、无限带宽(Infiniband)接口、高速通用串行总线(USB,Universal Serial Bus)接口以及高速以太网接口等。
下面,分别对主机设备中各模块的功能进行详细说明。
A.确定单元310
当主机设备确定需要执行目标任务时,该确定单元310可以从主机设备所包括的所有CPU中,确定用于执行该目标任务(即,目标进程)的N个CPU(即,内核),以下,为了便于理解和说明,将用于执行该目标任务的N个CPU记做:CPU#1~CPU#N。
作为示例而非限定,确定单元310可以根据执行该目标任务所需要的运算量,来确定上述“N”的具体数值,例如,如果该目标任务所需要的运算 量较大,为了快速完成该任务,确定单元310可以使上述“N”的数值较大;如果该目标任务所需要的运算量较小,仅需较少的CPU便能够快速完成该任务,则确定单元310可以使上述“N”的数值较小。
另外,该CPU#1~CPU#N可以分别用于执行上述目标任务的N个线程(即,执行线程),以下,为了便于理解和说明,将该N个线程记做:线程#1~线程#N,即,CPU#1~CPU#N与线程#1~线程#N一一对应,作为示例而非限定,上述“一一对应”的对应规则可以为,一个CPU用于控制序号相同的线程的运行。
应理解,以上列举的确定单元310确定用于执行该目标任务的N个CPU的方法以及所使用的参数仅为示例性说明,本发明并未特别限定,例如,确定单元310还可以根据预设的数值,默认为所执行的所有任务使用的CPU的数量均相同,例如,该预设的数值可以为主机设备所包括的所有CPU的总和。
需要说明的是,本发明致力于解决因CPU之间的数据传输时延而导致的数据读写操作的完成时间的影响,因此,当N≥2时,能够充分体现本发明的技术效果,随后对技术效果进行详细说明。
图2是本发明一实施例的读写数据的流程的示意图,在图2所示示例中,N为29,即,确定单元310确定29个CPU来执行目标进程,该目标进程包括29个执行线程。
B.分组单元320
用于将如上所述确定的CPU#1~CPU#N分为M个CPU组,或者说,用于将如上所述确定的CPU#1~CPU#N所对应的线程#1~线程#N分为M个线程组。作为分组依据,例如,可以列举以下规则:
可选地,该分组单元具体用于根据该目标进程的期望完成时长以及各该内核彼此之间的数据传输时延,对该N个执行线程进行分组。
具体地说,分组单元320可以确定上述CPU#1~CPU#N彼此之间的数据传输时延,例如,该分组单元320可以获取CPU#1~CPU#N的型号、彼此之间连接方式等信息,从而可以根据上述信息,推算出CPU#1~CPU#N彼此之间的数据传输时延。
应理解,以上列举的分组单元320确定CPU#1~CPU#N彼此之间的数据传输时延的方法以及所使用的参数仅为示例性说明,本发明并未特别限定, 例如,分组单元320还可以通过试验等方式,检测CPU#1~CPU#N彼此之间的数据传输时延。
并且,分组单元320还可以确定该目标进程的期望完成时长,其中,该该目标进程需要在规定的时间内完成,该期望完成时长可以是该目标进程从开始执行到结束执行(例如,可以包括CPU判定该任务执行成功并退出进程的时间)所经历的时间,并且,该期望完成时长可以小于或等于上述规定的时间。
作为示例而非限定,该分组单元320可以根据该目标进程的类型、处理优先级等属性信息,确定该目标进程的期望完成时长,例如,如果该目标进程的类型指示该目标进程的业务属于实时类型业务(例如,在线游戏,视频通话等),则可以确定该目标进程的紧急程度较高,且需要在较短时间内完成,从而可以确定目标进程的期望完成时长较短(例如,低于一个预设的门限值);再例如,如果该目标进程的处理优先级被标记为高时,则可以确定该目标进程的紧急程度较高,且需要在较短时间内完成,从而可以确定目标进程的期望完成时长较短(例如,低于一个预设的门限值)。
从而,分组单元320可以基于如上所述确定的CPU#1~CPU#N彼此之间的数据传输时延以及目标进程的期望完成时长,对CPU#1~CPU#N进行分组,以使包括各CPU组内的CPU彼此之间的数据传输时延之和在内的目标进程的完成时长小于或等于目标进程的期望完成时长,例如:
分组单元320可以根据各CPU的处理能力,推算在CPU彼此之间不发生数据传输的情况下完成该目标进程的时长,以下,简称该目标进程的参考完成时长,从而可以获得该目标进程的期望完成时长的该目标进程的参考完成时长的差值,可以基于上述结果进而对CPU#1~CPU#N进行分组,以使各组内的CPU彼此之间的数据传输时延之和小于或等于上述差值。
应理解,以上列举的分组依据仅为示例性说明,本发明并未限定于此,例如,分组单元320还可以基于预设的基准值K,将CPU#1~CPU#N分为M个CPU组,在该M个CPU组中,至少M-1个CPU组所包括的CPU的数量等于该基准值K,或者说,至多1个CPU组所包括的CPU的数量小于该基准值K,无需赘言,各CPU组所包括的CPU的数量为大于零的整数。
并且,在本发明实施例中,该预设的基准值K可以根据计算机***的负载等参数适当变更,例如,如果当前计算机***的负载较大,则可以采用较 小的K值。
在图2所示示例中,K为8,因此,29个CPU被分为4个CPU组,即,CPU组#1~CPU组#4。其中,CPU组#1包括8个CPU,即,CPU#1~CPU#8;CPU组#2包括8个CPU,即,CPU#9~CPU#16;CPU组#3包括8个CPU,即,CPU#17~CPU#24;CPU组#4包括5个CPU,即,CPU#25~CPU#29。
同样,29个线程(即,执行线程)被分为4个线程组,即,线程组#1~线程组#4。其中,线程组#1包括8个线程,即,线程#1~线程#8;线程组#2包括8个线程,即,线程#9~线程#16;线程组#3包括8个线程,即,线程#17~线程#24;线程组#4包括5个线程,即,线程#25~线程#29。
C.发送单元330
用于向存储设备发送M个数据读写指令,具体地说,发送单元330可以从各CPU获取数据读写指令,并且,可以根据如上述划分的CPU组或线程组,为各数据读写指令添加指示标识,以指示各数据读写指令所来自的CPU组,或者说,各数据读写指令所对应的线程组。
从而,存储设备可以通过接收单元接收上述M个数据读写指令,并且,可以通过确定单元,根据各数据读写指令所携带的指示标识,确定各数据读写指令所对应的线程组,或者说,各数据读写指令所对应的CPU组。
其后,存储设备可以通过发送单元,根据数据读写指令所携带的指示标识,将数据读写指令传输至该数据读写指令所携带的指示标识所指示的线程组,从而,各线程能够获得来自所对应的CPU的读写指令,进而,能够根据该数据读写指令在存储设备的存储空间中进行数据读写操作。
可选地,各该执行线程与该存储设备之间的数据传输是基于直接存储DMA协议进行的。
具体地说,存储器直接访问(DMA,Direct Memory Access)是指一种高速的数据传输操作,允许在外部设备和存储器之间直接读写数据,既不通过CPU,也不需要CPU干预。例如,可以使整个数据传输操作在一个称为“DMA控制器”的控制下进行。CPU除了在数据传输开始和结束时做一点处理外,在传输过程中还可以进行其他的工作。即,在本发明实施例中,该读写数据的装置300还可以具有DMA控制器,并由DMA控制器控制各线程在存储设备中的数据读写操作。
实现DMA传送的基本操作如下:
(1)DMA控制器向CPU发出DMA请求:
(2)CPU响应DMA请求,***转变为DMA工作方式,并把总线控制权交给DMA控制器;
(3)由DMA控制器发送存储器地址,并决定传送数据块的长度;
(4)执行DMA传送;
(5)DMA操作结束,并把总线控制权交还CPU。
应理解,以上列举的DMA传输的实现方式仅为示例性说明,本发明并未特别限定,也可以使用现有技术中能够实现DMA传输的方法。例如,在本发明实施例中,可以通过软件或程序等实现上述DMA控制器的功能。
可选地,该发送单元330具体用于向存储设备发送数据读写信号,该数据读写信号包括M个信号分量,该M个信号分量与该M个数据读写指令一一对应,各数据读写指令承载于该对应的信号分量中。
具体地说,发送单元330可以将上述各数据读写指令承载于同一信号(或者说,信号流)中一并发送给存储设备。
应理解,以上列举的发送单元330向存储设备发送数据读写指令的方法仅为示例性说明,本发明并未限定于此,发送单元330也可以将各数据读写指令分别承载于独立的信号中,发送给存储设备。
可选地,该发送单元具体用于通过该目标进程包括的主控线程,向存储设备发送M个数据读写指令。
具体地说,在本发明实施例中,可以配置主控CPU以及与该主控CUP相对应的主控线程,即,主控CPU可以确定上述CPU#1~CPU#N的各数据读写指令,并通过该主控线程向存储设备发送上述数据读写指令。
可选地,该主控线程属于该执行线程。
具体地说,在本发明实施例中,可以从上述CPU#1~CPU#N中选择一个CPU作为主控CPU,与该主控CPU相对应的线程作为主控线程。即,该主控线程可以即用于向存储设备传输上述数据读写指令,也可以用于访问存储设备的存储空间,以进行数据读写操作。
下面结合图2,对本发明实施例的读写数据的流程进行详细说明。
在图2所示实施例中,由29个CPU来执行目标进程,该目标进程包括29个执行线程,并且,29个CPU被分为4个CPU组,即,CPU组#1~CPU组#4。其中,CPU组#1包括8个CPU,即,CPU#1~CPU#8;CPU组#2包 括8个CPU,即,CPU#9~CPU#16;CPU组#3包括8个CPU,即,CPU#17~CPU#24;CPU组#4包括5个CPU,即,CPU#25~CPU#29。对应的,29个线程(即,执行线程)被分为4个线程组,即,线程组#1~线程组#4。其中,线程组#1包括8个线程,即,线程#1~线程#8;线程组#2包括8个线程,即,线程#9~线程#16;线程组#3包括8个线程,即,线程#17~线程#24;线程组#4包括5个线程,即,线程#25~线程#29。
并且,在图2所示实施例中,CPU#0为主控CPU,线程#0为主控线程。
在步骤1,线程#0向存储器发送请求信号,该请求信号中携带有在本周期内需要进行数据读写操作的各CPU的数据读写请求,并且,各数据读写请求中分别携带有所对应的线程组的指示标识。
在步骤2,存储设备可以根据各数据读写请求中携带的指示标识,确定各数据读写请求所对应的线程组,并且,存储设备对该请求信号进行拆分,以线程组为单位,生成与各线程组相对应的响应信号,并且,各响应信号中承载的数据读写指令的指示标识均指向同一线程组。
在步骤3,存储器可以将根据指示标识,将各响应信号发送至所对应的线程组。
在步骤4,各线程组可以在所对应的CUP组的控制下,基于来自存储设备的数据读写指令,在存储设备中进行数据读写操作。
另外,在本发明实施例中,主控线程可以检测各线程组的数据读写操作完成情况,并且,主控线程可以在某一线程组内的所有线程均完成数据读写操作之后,立即结束对该线程组的控制,或者,主控线程也可以在全部线程组内的所有线程均完成数据读写操作之后,统一结束对所有线程组的控制。
或者,在本发明实施例中,存储设备可以检测其存储空间内的数据读写操作完成情况,并且,存储设备可以在某一线程组内的所有线程均完成数据读写操作之后,立即通知主控线程,以使主控线程结束对该线程组的控制,或者,存储设备也可以在全部线程组内的所有线程均完成数据读写操作之后,统一通知主控线程,以使主控线程结束对所有线程组的控制。
衡量一个实时操作***的重要指标,是它从接收一个任务,到完成该任务所需的时间,其时间的变化称为抖动。设计实时操作***的首要目标不是高的吞吐量,而是保证任务在特定时间内完成。
但是,目前的实时操作***对多CPU或者说,多核的支持有限。其原 因是多核实时操作***存在较大的核间数据传输时延。具体地说,通常CPU之间采用非互联(mesh)的快速通道互联(QPI,QuickPath Interconnect)连接方式,当一个实时的任务需要跨多个CPU配合执行的时候,例如,一个CPU在缓存设备中的所读写的数据需要经由另一个CPU的转发,就会产生较大的时延,随即产生一连串的时延反应,从而致使整个***无法正常运行。从而,当执行运算量较大的任务时,仍然仅能依靠数量有限的内核,导致执行时间增长,达不到对实时操作***的要求。
与此相对,根据本发明的读写数据的装置,主机设备对用于执行目标进程的N个内核进行分组,并对该N个内核所对应的N个执行线程进行分组以确定M个执行线程组,并在发送给存储设备的数据读写指令中携带与该数据读写指令所对应的执行线程组的指示标识,从而存储设备能够根据该指示标识,识别该数据读写指令所对应的线程组,进而存储设备能够将该数据读写指令转发给该数据读写指令所对应的线程组,能够使各执行线程根据从该存储设备获得的数据读写指令,在该存储设备中进行数据读写操作,从而能够减少在进行数据读写操作时内核之间的信令和数据传输,进而减少因该信令和数据传输而导致的处理时延,能够缩短多核计算机***中数据读写操作的完成时间,并且能够实现实时操作***对多CPU的扩展。
图5示出了根据本发明一实施例的存储设备400的示意性框图。如图5所示,存储设备400包括:
传输接口410,用于该存储设备与主机设备之间的通信;
存储空间420,用于存储数据;
控制器430,用于通过该传输接口接收该主机设备发送的M个数据读写指令,该M个数据读写指令与M个执行线程组一一对应,各该数据读写指令包括所对应的执行线程组的指示标识,一个指示标识用于标识一个执行线程组,该M个执行线程组是该主机设备对目标进程包括的N个执行线程进行分组而确定的,该主机设备确定的用于执行该目标进程的N个内核与该N个执行线程一一对应,N≥2,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2,用于根据该指示标识,确定各执行线程组所对应的数据读写指令,通过该传输接口,用于将各数据读写指令传输至所对应的执行线程组,以使各该执行线程根据所获得的数据读写指令,在该存储设备400中进行数据读写操作。
可选地,该M个执行线程组具体是该控制器420根据该目标进程的期望完成时长以及各该内核彼此之间的数据传输时延,对该N个执行线程进行分组而确定的。
可选地,各该执行线程与该存储设备之间的数据传输是基于直接存储DMA协议进行的。
可选地,该控制器420具体用于通过该传输接口接收该主机设备发送的数据读写信号,该数据读写信号包括M个信号分量,该M个信号分量与该M个数据读写指令一一对应,各数据读写指令承载于该对应的信号分量中。
可选地,该M个数据读写指令是该主机设备通过该目标进程包括的主控线程发送的。
可选地,该主控线程属于该执行线程。
需要说明的是,在本发明实施例中,存储设备400将M个数据读写指令发送至各执行线程组之后,根据来自各执行线程组的数据读写指令在其存储空间内进行数据读写操作的过程可以与现有技术中数据读写过程相似,这里,为了避免赘述,省略其详细说明
在本发明实施例中,该控制器可以实现或者执行本发明方法实施例中的公开的各步骤及逻辑框图。控制器可以是任何常规的处理器等。结合本发明实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用解码处理器中的硬件及软件模块组合执行完成。软件模块可以位于上述存储空间中,例如,随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质。控制器读取上述存储空间中的信息,结合其硬件完成上述方法的步骤
存储设备400可以对应于以上说明中的存储设备,并且,该存储设备400包含的各模块和单元的作用与上述存储设备中对应模块或单元的作用相似,这里,为了避免赘述,省略其详细说明。
该存储设备400可以是只读存储器和随机存取存储器,并向主机设备提供指令和数据。存储设备400的一部分还可以包括非易失性随机存取存储器。例如,存储设备400还可以存储设备类型的信息。
根据本发明的存储设备,主机设备对用于执行目标进程的N个内核进行分组,并对该N个内核所对应的N个执行线程进行分组以确定M个执行线程组,并在发送给存储设备的数据读写指令中携带与该数据读写指令所对应 的执行线程组的指示标识,从而存储设备能够根据该指示标识,识别该数据读写指令所对应的线程组,进而存储设备能够将该数据读写指令转发给该数据读写指令所对应的线程组,能够使各执行线程根据从该存储设备获得的数据读写指令,在该存储设备中进行数据读写操作,从而能够减少在进行数据读写操作时内核之间的信令和数据传输,进而减少因该信令和数据传输而导致的处理时延,能够缩短多核计算机***中数据读写操作的完成时间,并且能够实现实时操作***对多CPU的扩展。
图6是本发明另一实施例的计算机***500的示意性流程图,如图6所示,该计算机***500包括:
总线510
与该总线相连的主机设备520,用于确定用于执行目标进程的N个内核,其中,该N个内核与该目标进程包括的N个执行线程一一对应,N≥2,对该N个执行线程进行分组,以确定M个执行线程组,并为各执行线程组分配指示标识,其中,一个指示标识用于标识一个执行线程组,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2,通过该总线510向存储设备发送M个数据读写指令,该M个数据读写指令与该M个执行线程组一一对应,各该数据读写指令包括所对应的执行线程组的指示标识,一个指示标识用于唯一地标识一个执行线程组;
与该总线相连的存储设备530,用于通过该总线510接收该M个数据读写指令,并根据该指示标识,确定各执行线程组所对应的数据读写指令,将各数据读写指令传输至所对应的执行线程组,以使各该执行线程根据所获得的数据读写指令,在该存储设备中进行数据读写操作。
可选地,该M个执行线程组是该主机设备根据该目标进程的期望完成时长以及各该内核彼此之间的数据传输时延,对该N个执行线程进行分组而确定的
可选地,各该执行线程与该存储设备之间的数据传输是基于直接存储DMA协议进行的。
上述主机设备510可对应于本发明实施例的读写数据的装置300。上述存储设备520可对应于本发明实施例的读写数据的装置200,为了简洁,其功能在此不再赘述。
根据本发明的读写数据的***,主机设备对用于执行目标进程的N个内 核进行分组,并对该N个内核所对应的N个执行线程进行分组以确定M个执行线程组,并在发送给存储设备的数据读写指令中携带与该数据读写指令所对应的执行线程组的指示标识,从而存储设备能够根据该指示标识,识别该数据读写指令所对应的线程组,进而存储设备能够将该数据读写指令转发给该数据读写指令所对应的线程组,能够使各执行线程根据从该存储设备获得的数据读写指令,在该存储设备中进行数据读写操作,从而能够减少在进行数据读写操作时内核之间的信令和数据传输,进而减少因该信令和数据传输而导致的处理时延,能够缩短多核计算机***中数据读写操作的完成时间,并且能够实现实时操作***对多CPU的扩展。
应理解,在本发明的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本发明实施例的实施过程构成任何限定。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的***、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的***、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个***,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以所述权利要求的保护范围为准。

Claims (28)

  1. 一种读写数据的方法,其特征在于,所述方法包括:
    主机设备确定用于执行目标进程的N个内核,其中,所述N个内核与所述目标进程包括的N个执行线程一一对应,N≥2;
    对所述N个执行线程进行分组,以确定M个执行线程组,并为各执行线程组分配指示标识,其中,一个指示标识用于标识一个执行线程组,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2;
    向存储设备发送M个数据读写指令,所述M个数据读写指令与所述M个执行线程组一一对应,各所述数据读写指令包括所对应的执行线程组的指示标识,以便于所述存储设备根据各所述数据读写指令包括的指示标识,确定各所述数据读写指令所对应的执行线程组,并将各所述数据读写指令传输至所对应的执行线程组,以使各所述执行线程根据从所述存储设备获得的数据读写指令,在所述存储设备中进行数据读写操作。
  2. 根据权利要求1所述的方法,其特征在于,所述对所述N个执行线程进行分组,包括:
    确定所述目标进程的期望完成时长以及各所述内核彼此之间的数据传输时延;
    根据所述期望完成时长和所述数据传输时延,对所述N个执行线程进行分组。
  3. 根据权利要求1或2所述的方法,其特征在于,各所述执行线程与所述存储设备之间的数据传输是基于直接存储DMA协议进行的。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述向存储设备发送M个数据读写指令,包括:
    向存储设备发送数据读写信号,所述数据读写信号包括M个信号分量,所述M个信号分量与所述M个数据读写指令一一对应,各数据读写指令承载于所述对应的信号分量中。
  5. 根据权利要求1至4中任一项所述的方法,其特征在于,所述向存储设备发送M个数据读写指令,包括:
    通过所述目标进程包括的主控线程,向存储设备发送M个数据读写指令。
  6. 根据权利要求5所述的方法,其特征在于,所述主控线程属于所述执行线程。
  7. 一种读写数据的方法,其特征在于,所述方法包括:
    存储设备接收主机设备发送的M个数据读写指令,所述M个数据读写指令与M个执行线程组一一对应,各所述数据读写指令包括所对应的执行线程组的指示标识,一个指示标识用于标识一个执行线程组,所述M个执行线程组是所述主机设备对目标进程包括的N个执行线程进行分组而确定的,所述主机设备确定的用于执行所述目标进程的N个内核与所述N个执行线程一一对应,N≥2,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2;
    根据所述指示标识,确定各执行线程组所对应的数据读写指令;
    将各数据读写指令传输至所对应的执行线程组,以使各所述执行线程根据所获得的数据读写指令,在所述存储设备中进行数据读写操作。
  8. 根据权利要求7所述的方法,其特征在于,所述M个执行线程组具体是所述主机设备根据所述目标进程的期望完成时长以及各所述内核彼此之间的数据传输时延,对所述N个执行线程进行分组而确定的。
  9. 根据权利要求7或8所述的方法,其特征在于,各所述执行线程与所述存储设备之间的数据传输是基于直接存储DMA协议进行的。
  10. 根据权利要求7至9所述的方法,其特征在于,所述接收主机设备发送的M个数据读写指令,包括:
    接收主机设备发送的数据读写信号,所述数据读写信号包括M个信号分量,所述M个信号分量与所述M个数据读写指令一一对应,各数据读写指令承载于所述对应的信号分量中。
  11. 根据权利要求7至10中任一项所述的方法,其特征在于,所述接收主机设备发送的M个数据读写指令,包括:
    接收所述主机设备通过所述目标进程包括的主控线程发送的M个数据读写指令。
  12. 根据权利要求11所述的方法,其特征在于,所述主控线程属于所述执行线程。
  13. 一种读写数据的装置,其特征在于,所述装置包括:
    确定单元,用于确定用于执行目标进程的N个内核,其中,所述N个 内核与所述目标进程包括的N个执行线程一一对应,N≥2;
    分组单元,用于对所述N个执行线程进行分组,以确定M个执行线程组,并为各所述执行线程组分配指示标识,其中,一个指示标识用于标识一个执行线程组,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2;
    发送单元,用于向存储设备发送M个数据读写指令,所述M个数据读写指令与所述M个执行线程组一一对应,各所述数据读写指令包括所对应的执行线程组的指示标识,以便于所述存储设备根据各所述指示标识将各所述数据读写指令传输至所对应的执行线程组,以使各所述执行线程根据从所述存储设备获得的数据读写指令,在所述存储设备中进行数据读写操作。
  14. 根据权利要求13所述的装置,其特征在于,所述分组单元具体用于根据所述目标进程的期望完成时长以及各所述内核彼此之间的数据传输时延,对所述N个执行线程进行分组。
  15. 根据权利要求13或14所述的装置,其特征在于,各所述执行线程与所述存储设备之间的数据传输是基于直接存储DMA协议进行的。
  16. 根据权利要求13至15中任一项所述的装置,其特征在于,所述发送单元具体用于向存储设备发送数据读写信号,所述数据读写信号包括M个信号分量,所述M个信号分量与所述M个数据读写指令一一对应,各数据读写指令承载于所述对应的信号分量中。
  17. 根据权利要求13至16中任一项所述的装置,其特征在于,所述发送单元具体用于通过所述目标进程包括的主控线程,向存储设备发送M个数据读写指令。
  18. 根据权利要求17所述的装置,其特征在于,所述主控线程属于所述执行线程。
  19. 根据权利要求13至16中任一项所述的装置,其特征在于,所述装置为计算机***中的主机设备。
  20. 一种存储设备,其特征在于,包括:
    传输接口,用于所述存储设备与主机设备之间的通信;
    存储空间,用于存储数据;
    控制器,用于通过所述传输接口接收所述主机设备发送的M个数据读写指令,所述M个数据读写指令与M个执行线程组一一对应,各所述数据 读写指令包括所对应的执行线程组的指示标识,一个指示标识用于标识一个执行线程组,所述M个执行线程组是所述主机设备对目标进程包括的N个执行线程进行分组而确定的,所述主机设备确定的用于执行所述目标进程的N个内核与所述N个执行线程一一对应,N≥2,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2,用于根据所述指示标识,确定各执行线程组所对应的数据读写指令,通过所述传输接口,用于将各数据读写指令传输至所对应的执行线程组,以使各所述执行线程根据所获得的数据读写指令,在所述存储设备中进行数据读写操作。
  21. 根据权利要求20所述的装置,其特征在于,所述M个执行线程组具体是所述控制器根据所述目标进程的期望完成时长以及各所述内核彼此之间的数据传输时延,对所述N个执行线程进行分组而确定的。
  22. 根据权利要求20或21所述的装置,其特征在于,各所述执行线程与所述存储设备之间的数据传输是基于直接存储DMA协议进行的。
  23. 根据权利要求20至22中任一项所述的装置,其特征在于,所述控制器具体用于通过所述传输接口接收所述主机设备发送的数据读写信号,所述数据读写信号包括M个信号分量,所述M个信号分量与所述M个数据读写指令一一对应,各数据读写指令承载于所述对应的信号分量中。
  24. 根据权利要求20至23中任一项所述的装置,其特征在于,所述M个数据读写指令是所述主机设备通过所述目标进程包括的主控线程发送的。
  25. 根据权利要求24所述的装置,其特征在于,所述主控线程属于所述执行线程。
  26. 一种计算机***,其特征在于,包括:
    总线;
    与所述总线相连的主机设备,用于确定用于执行目标进程的N个内核,其中,所述N个内核与所述目标进程包括的N个执行线程一一对应,N≥2,对所述N个执行线程进行分组,以确定M个执行线程组,并为各执行线程组分配指示标识,其中,一个指示标识用于标识一个执行线程组,一个执行线程仅属于一个执行线程组,一个执行线程组包括至少一个执行线程,M≥2,通过所述总线向存储设备发送M个数据读写指令,所述M个数据读写指令与所述M个执行线程组一一对应,各所述数据读写指令包括所对应的执行线程组的指示标识,一个指示标识用于唯一地标识一个执行线程组;
    与所述总线相连的存储设备,用于通过所述总线接收所述M个数据读写指令,并根据所述指示标识,确定各执行线程组所对应的数据读写指令,将各数据读写指令传输至所对应的执行线程组,以使各所述执行线程根据所获得的数据读写指令,在所述存储设备中进行数据读写操作。
  27. 根据权利要求26所述的***,其特征在于,所述M个执行线程组是所述主机设备根据所述目标进程的期望完成时长以及各所述内核彼此之间的数据传输时延,对所述N个执行线程进行分组而确定的。
  28. 根据权利要求26或27所述的***,其特征在于,各所述执行线程与所述存储设备之间的数据传输是基于直接存储DMA协议进行的。
PCT/CN2014/086925 2014-09-19 2014-09-19 读写数据的方法、装置、存储设备和计算机*** WO2016041191A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201680001619.1A CN106489132B (zh) 2014-09-19 2014-09-19 读写数据的方法、装置、存储设备和计算机***
PCT/CN2014/086925 WO2016041191A1 (zh) 2014-09-19 2014-09-19 读写数据的方法、装置、存储设备和计算机***
EP14902078.6A EP3188002A4 (en) 2014-09-19 2014-09-19 Method and apparatus for reading and writing data, storage device and computer system
US15/462,057 US10303474B2 (en) 2014-09-19 2017-03-17 Data read/write method and apparatus, storage device, and computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/086925 WO2016041191A1 (zh) 2014-09-19 2014-09-19 读写数据的方法、装置、存储设备和计算机***

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/462,057 Continuation US10303474B2 (en) 2014-09-19 2017-03-17 Data read/write method and apparatus, storage device, and computer system

Publications (1)

Publication Number Publication Date
WO2016041191A1 true WO2016041191A1 (zh) 2016-03-24

Family

ID=55532472

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/086925 WO2016041191A1 (zh) 2014-09-19 2014-09-19 读写数据的方法、装置、存储设备和计算机***

Country Status (4)

Country Link
US (1) US10303474B2 (zh)
EP (1) EP3188002A4 (zh)
CN (1) CN106489132B (zh)
WO (1) WO2016041191A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022126532A1 (zh) * 2020-12-17 2022-06-23 华为技术有限公司 数据处理方法及装置

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110018782B (zh) * 2018-01-09 2022-06-17 阿里巴巴集团控股有限公司 一种数据读/写方法及相关装置
CN110688231B (zh) * 2018-07-04 2023-02-28 阿里巴巴集团控股有限公司 读写请求统计信息的处理方法、装置和***
CN109582521B (zh) * 2018-12-10 2022-04-29 浪潮(北京)电子信息产业有限公司 测试存储***读写性能的方法、装置、设备及介质
CN110806942B (zh) * 2019-11-08 2024-05-07 广州华多网络科技有限公司 数据处理的方法和装置
CN111104097B (zh) * 2019-12-13 2023-06-30 上海众源网络有限公司 一种数据写入、读取方法及装置
CN111258740A (zh) * 2020-02-03 2020-06-09 北京无限光场科技有限公司 一种用于启动应用程序的方法、装置和电子设备
CN114025032B (zh) * 2022-01-06 2022-04-22 深圳市聚能优电科技有限公司 Ems与bms的传输协议方法、***、设备及存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101131652A (zh) * 2006-08-21 2008-02-27 英业达股份有限公司 多核多中央处理器的执行线程分配方法
WO2014021995A1 (en) * 2012-07-31 2014-02-06 Empire Technology Development, Llc Thread migration across cores of a multi-core processor
CN103838552A (zh) * 2014-03-18 2014-06-04 北京邮电大学 4g宽带通信***多核并行流水线信号的处理***和方法

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7861060B1 (en) * 2005-12-15 2010-12-28 Nvidia Corporation Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior
US7925860B1 (en) * 2006-05-11 2011-04-12 Nvidia Corporation Maximized memory throughput using cooperative thread arrays
CN100492339C (zh) 2007-01-22 2009-05-27 北京中星微电子有限公司 一种可实现芯片内多核间通信的芯片及通信方法
US20090255531A1 (en) * 2008-01-07 2009-10-15 Johnson Douglas E Portable system for assisting body movement
US8151008B2 (en) * 2008-07-02 2012-04-03 Cradle Ip, Llc Method and system for performing DMA in a multi-core system-on-chip using deadline-based scheduling
CN101751295B (zh) 2009-12-22 2012-08-29 浙江大学 多核架构下核间线程迁移的实现方法
CN102193779A (zh) * 2011-05-16 2011-09-21 武汉科技大学 一种面向MPSoC的多线程调度方法
KR20130093995A (ko) * 2012-02-15 2013-08-23 한국전자통신연구원 계층적 멀티코어 프로세서의 성능 최적화 방법 및 이를 수행하는 멀티코어 프로세서 시스템
CN103279445A (zh) * 2012-09-26 2013-09-04 上海中科高等研究院 运算任务的计算方法及超算***
US9842014B2 (en) * 2012-11-22 2017-12-12 Nxp Usa, Inc. Data processing device, method of execution error detection and integrated circuit

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101131652A (zh) * 2006-08-21 2008-02-27 英业达股份有限公司 多核多中央处理器的执行线程分配方法
WO2014021995A1 (en) * 2012-07-31 2014-02-06 Empire Technology Development, Llc Thread migration across cores of a multi-core processor
CN103838552A (zh) * 2014-03-18 2014-06-04 北京邮电大学 4g宽带通信***多核并行流水线信号的处理***和方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3188002A4 *


Also Published As

Publication number Publication date
CN106489132A (zh) 2017-03-08
US10303474B2 (en) 2019-05-28
EP3188002A1 (en) 2017-07-05
EP3188002A4 (en) 2018-01-24
US20170185401A1 (en) 2017-06-29
CN106489132B (zh) 2019-04-19

Similar Documents

Publication Publication Date Title
WO2016041191A1 (zh) 读写数据的方法、装置、存储设备和计算机***
CN100592271C (zh) 使用集成dma引擎进行高性能易失性磁盘驱动器存储器访问的装置和方法
US9946670B2 (en) Determining when to throttle interrupts to limit interrupt processing to an interrupt processing time period
US9600618B2 (en) Implementing system irritator accelerator FPGA unit (AFU) residing behind a coherent attached processors interface (CAPI) unit
US8990451B2 (en) Controller for direct access to a memory for the direct transfer of data between memories of several peripheral devices, method and computer program enabling the implementation of such a controller
US9684613B2 (en) Methods and systems for reducing spurious interrupts in a data storage system
US8928677B2 (en) Low latency concurrent computation
TWI636366B (zh) 資料冗餘的處理方法及其相關電腦系統
WO2017173618A1 (zh) 压缩数据的方法、装置和设备
US10990544B2 (en) PCIE root complex message interrupt generation method using endpoint
US20240143392A1 (en) Task scheduling method, chip, and electronic device
US10216634B2 (en) Cache directory processing method for multi-core processor system, and directory controller
CN115168256A (zh) 中断控制方法、中断控制器、电子设备、介质和芯片
US10162549B2 (en) Integrated circuit chip and method therefor
US11275632B2 (en) Broadcast command and response
US8756356B2 (en) Pipe arbitration using an arbitration circuit to select a control circuit among a plurality of control circuits and by updating state information with a data transfer of a predetermined size
CN102945214B (zh) 基于io延迟时间分布优化中断处理任务的方法
CN106598742B (zh) 一种ssd主控内部负载均衡***及方法
CN102929819B (zh) 用于处理计算机***中的存储设备的中断请求的方法
JP2014167818A (ja) データ転送装置およびデータ転送方法
CN110413562B (zh) 一种具有自适应功能的同步***和方法
CN112711442A (zh) 一种主机命令写入方法、设备、***及可读存储介质
CN106557429A (zh) 一种内存数据的迁移方法和节点控制器
CN114328345B (zh) 控制信息的处理方法、装置以及计算机可读存储介质
US11675718B2 (en) Enhanced low-priority arbitration

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 14902078; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
REEP Request for entry into the european phase (Ref document number: 2014902078; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 2014902078; Country of ref document: EP)