WO2020024207A1 - Method, Apparatus, and Storage *** for Processing Service Requests - Google Patents

Method, Apparatus, and Storage *** for Processing Service Requests

Info

Publication number
WO2020024207A1
WO2020024207A1 (PCT/CN2018/098277)
Authority
WO
WIPO (PCT)
Prior art keywords
processor cores
request
processor
core
cores
Prior art date
Application number
PCT/CN2018/098277
Other languages
English (en)
French (fr)
Inventor
卢玥
余思
龚骏辉
毛依平
陈贞
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2018/098277 (WO2020024207A1)
Priority to CN201880005605.6A (CN110178119B)
Publication of WO2020024207A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/505 Allocation of resources to service a request, the resource being a machine, considering the load
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system

Definitions

  • the present application relates to the field of information technology, and more particularly, to a method, an apparatus, and a processor for processing a service request.
  • the central processing unit (CPU) of the array controller is a key factor affecting system performance.
  • A method for processing a service request in a storage system is provided, where the storage system includes multiple processor cores. The method includes: receiving a request of a current stage of a service request, where the current-stage request is one of the multiple stage requests of the service request; determining a first set of processor cores to execute the current-stage request, where the first set of processor cores is a set of cores among the multiple processor cores; and sending the current-stage request to the processor core with the lightest load in the first set of processor cores.
  • Compared with sending a service request to the lightest-loaded processor core among all processor cores in the storage system, this method ensures load balancing between processor cores while determining a set of processor cores for each stage request of the service request and scheduling the current-stage request within the scope of that set. It thereby takes into account the correlation between the requests of the stages and the factors that affect the delay with which a processor core processes a request, reducing the delay of processing service requests.
  • Determining the first set of processor cores to execute the current-stage request includes: querying a core-binding relationship and determining the first set of processor cores to execute the current-stage request, where the core-binding relationship indicates the association between the current-stage request and the first set of processor cores.
  • The method further includes: re-determining, according to the first set of processor cores, the number of processor cores that execute the current-stage request; allocating, among the multiple processor cores and according to the re-determined number, a second set of processor cores that satisfies the number to the current-stage request; and generating a new core-binding relationship according to the second set of processor cores, where the new core-binding relationship indicates the association between the current-stage request and the second set of processor cores.
  • Re-determining, according to the first set of processor cores, the number of processor cores that execute the current-stage request includes: determining the total utilization of the processor cores in the first set of processor cores and the average utilization of the multiple processor cores; and re-determining, according to that total utilization and that average utilization, the number of processor cores that execute the current-stage request.
  • By periodically monitoring the utilization of the processor cores in the storage system, and reallocating processor cores to the requests of a stage according to changes in the utilization of the cores allocated to that stage, the processor cores allocated to each stage's requests are adjusted periodically, which mitigates load imbalance between processor cores.
  • Re-determining the number includes: re-determining, based on the following relationship, the number of processor cores executing the current-stage request according to the total utilization of the processor cores in the first set of processor cores and the average utilization of the multiple processor cores:
  • N = U_P / U_ave
  • where N is the re-determined number of processor cores executing the current-stage request, U_P is the total utilization of the processor cores in the first set of processor cores, and U_ave is the average utilization of the multiple processor cores.
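The relationship above can be sketched in Python as follows. Rounding N up to a whole number of cores and clamping it to at least one core are assumptions of this sketch, since the text only states the relationship N = U_P / U_ave; all names are illustrative.

```python
import math

def redetermine_core_count(set_utilizations, all_utilizations):
    """Re-determine how many cores a stage needs via N = U_P / U_ave.

    set_utilizations: per-core utilization (0.0-1.0) of the cores currently
    bound to the stage; all_utilizations: utilization of every core in the
    storage system.
    """
    u_p = sum(set_utilizations)  # total utilization of the bound set (U_P)
    u_ave = sum(all_utilizations) / len(all_utilizations)  # system average (U_ave)
    # Rounding up and the floor of one core are assumptions of this sketch.
    return max(1, math.ceil(u_p / u_ave))
```

For example, if the bound set has utilizations [0.5, 0.5, 0.5] (U_P = 1.5) and the system-wide average is 0.75, the stage is re-assigned 2 cores.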
  • Allocating, among the multiple processor cores, a second set of processor cores that satisfies the number to the current-stage request includes: generating multiple sets of allocation results, where each set of allocation results includes, for the request of each stage, a set of processor cores satisfying the corresponding re-determined number.
  • Multiple path lengths are determined for the multiple sets of allocation results, each set of allocation results corresponding to one path length, where the path length L satisfies:
  • L = Σ_{i=1}^{M-1} c_{i,i+1} × d_{i,i+1}
  • where c_{i,i+1} represents the communication volume generated by the interaction between the processor cores executing the requests of adjacent stages i and i+1, d_{i,i+1} represents the average topological distance between the processor cores executing the requests of those adjacent stages, and M is the number of stage requests of the service request. The current-stage request is then allocated a second set of processor cores that satisfies the number, according to the set of allocation results corresponding to the shortest of the multiple path lengths.
  • According to the determined number of processor cores allocated to the requests of each stage, multiple sets of processor core allocation results are generated and multiple path lengths are determined for them. By taking into account the topological distance between the processor cores allocated to the requests of each stage, and selecting the allocation result corresponding to the shortest of the multiple path lengths as the final allocation, load balancing between processor cores is ensured and the delay of processing service requests is reduced.
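A hedged sketch of selecting the allocation with the shortest path length L = Σ c_{i,i+1} × d_{i,i+1}. How the communication volumes and topological distances are obtained is assumed here (they are supplied as callables), and all names are illustrative.

```python
def path_length(allocation, comm_volume, topo_distance):
    """Path length L = sum over adjacent stages of c[i,i+1] * d[i,i+1].

    allocation: list of core sets, one per stage (stages 1..M).
    comm_volume(i): traffic between stage i and stage i+1 (assumed given).
    topo_distance(a, b): average topological distance between core sets a, b.
    """
    return sum(
        comm_volume(i) * topo_distance(allocation[i], allocation[i + 1])
        for i in range(len(allocation) - 1)
    )

def best_allocation(candidates, comm_volume, topo_distance):
    """Pick the candidate allocation with the shortest path length."""
    return min(candidates, key=lambda a: path_length(a, comm_volume, topo_distance))
```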
  • The first set of processor cores includes K processor cores, where K is an integer greater than or equal to 3, and sending the current-stage request to the lightest-loaded processor core in the first set of processor cores includes: determining, according to a sliding window length w and a sliding step d, a scheduling sub-region for the current-stage request among the K processor cores, where the scheduling sub-region includes w processor cores, w is an integer greater than or equal to 2 and less than K, and d is an integer greater than or equal to 1 and less than K; and sending the current-stage request to the lightest-loaded processor core among the w processor cores.
  • This narrows the search range for the lightest-loaded processor core, so that the lightest-loaded core within the scheduling sub-region executes the request of the corresponding stage, ensuring load balancing between processor cores and further reducing the delay of processing service requests.
  • Optionally, d and K are coprime (relatively prime to each other).
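A small illustration of why d and K being coprime matters: the window start position advances by d modulo K, and it visits every one of the K positions exactly when gcd(d, K) = 1. The Python sketch below (names are illustrative, not from the text) demonstrates this.

```python
from math import gcd

def window_starts(K, d):
    """Collect the distinct start positions of a window that slides by d over K cores."""
    seen, pos = set(), 0
    while pos % K not in seen:
        seen.add(pos % K)
        pos += d
    return seen

# When gcd(d, K) == 1 the start position cycles through all K cores, so no
# core is ever systematically excluded from a scheduling sub-region;
# otherwise only K / gcd(d, K) start positions are reachable.
assert gcd(3, 10) == 1 and window_starts(10, 3) == set(range(10))
assert gcd(4, 10) == 2 and len(window_starts(10, 4)) == 10 // 2
```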
  • A configuration method for processing a service request is provided, including: configuring a first set of processor cores for a request of a first stage of a service request, where the first set of processor cores is used to execute the first-stage request; and configuring a first rule that instructs sending the first-stage request to the lightest-loaded processor core in the first set of processor cores.
  • The configuration method for processing a service request of the present application ensures load balancing between processor cores when processing a service request, takes into account the correlation between the requests of each stage and the factors that affect the delay with which a processor core processes each stage's request, and thereby reduces the delay of processing service requests.
  • The method further includes: configuring a second set of processor cores for a request of a second stage of the service request, where the second set of processor cores is used to execute the second-stage request; and configuring a second rule that instructs sending the second-stage request to the lightest-loaded processor core in the second set of processor cores.
  • an apparatus for processing a service request is provided.
  • the apparatus is configured in a storage system, and the apparatus is configured to execute the method in any one of the possible implementation manners of the first aspect or the second aspect.
  • the apparatus may include a module for executing a method in any possible implementation manner of the first aspect or the second aspect.
  • A storage system includes multiple processor cores and a memory; the memory is configured to store computer instructions, and one or more of the multiple processor cores are configured to execute the computer instructions stored in the memory. When the computer instructions in the memory are executed, the one or more processor cores perform the method in any possible implementation of the first aspect or the second aspect.
  • A computer-readable storage medium stores computer instructions, and when the computer instructions are run on a computer, the computer is caused to execute the method in any possible implementation of the first aspect or the second aspect.
  • a computer program product including computer instructions is provided, and when the computer instructions are run on a computer, the computer is caused to execute the method in any possible implementation manner of the first aspect or the second aspect.
  • FIG. 1 is a schematic diagram of a storage array architecture according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a controller of a storage array according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a distributed block storage system according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural block diagram of a server of a distributed block storage system.
  • FIG. 5 is a schematic block diagram of a processor according to an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of a method for processing a service request in a storage system according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of scheduling a processor core based on a sliding window mechanism according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a topology distance between logical cores sharing different levels of memory or cache under a NUMA architecture according to an embodiment of the present invention.
  • FIG. 9 is a schematic flowchart of a configuration method for processing a service request according to an embodiment of the present invention.
  • FIG. 10 is a schematic block diagram of an apparatus for processing a service request according to an embodiment of the present invention.
  • FIG. 11 is a schematic block diagram of a storage system according to an embodiment of the present invention.
  • the storage system in the embodiment of the present invention may be a storage array (such as Huawei Oceanstor 18000 series, V3 series).
  • the storage array includes a storage controller 101 and a plurality of hard disks.
  • The hard disks include solid-state drives (SSD), mechanical hard disks such as hard disk drives (HDD), or hybrid hard disks.
  • the controller 101 includes a central processing unit (CPU) 201, a memory 202, and an interface 203.
  • the memory 202 stores computer instructions
  • the CPU 201 includes multiple processor cores (not shown in FIG. 2).
  • the CPU 201 executes computer instructions in the memory 202 to perform management and data access operations on the storage system.
  • A field programmable gate array (FPGA) or other hardware may also be used to perform all of the operations of the CPU 201 in the embodiment of the present invention, or the FPGA or other hardware and the CPU 201 may each perform part of the operations of the CPU 201 described in the embodiment of the present invention.
  • the CPU 201 and the memory 202 are referred to as a processor, or the FPGA and other hardware replacing the CPU 201 are referred to as a processor, or the combination of the FPGA and other hardware replacing the CPU 201 and the CPU 201 are collectively referred to as a processor.
  • the processor is in communication with the interface 203.
  • the interface 203 may be a network interface card (NIC), a host bus adaptor (HBA), or the like.
  • the CPU 201 is configured to process a service request, such as receiving a service request sent by a host or a client, and use the method for processing a service request provided by an embodiment of the present invention to process the service request.
  • The storage system in the embodiment of the present invention may also be a distributed file storage system (such as the Huawei 9000 series), a distributed block storage system, and so on. The following takes a Huawei distributed block storage system as an example.
  • A distributed block storage system includes multiple servers, such as server 1, server 2, server 3, server 4, server 5, and server 6, and the servers communicate with each other using InfiniBand, Ethernet, or the like.
  • the number of servers in the distributed block storage system can be increased according to actual needs, which is not limited in the embodiment of the present invention.
  • the server of the distributed block storage system includes a structure as shown in FIG. 4.
  • each server in the distributed block storage system includes a central processing unit (CPU) 401, a memory 402, an interface 403, a hard disk 1, a hard disk 2, and a hard disk 3.
  • the memory 402 stores computer instructions
  • the CPU 401 includes multiple processor cores (not shown in FIG. 4), and the CPU 401 executes computer instructions in the memory 402 to perform corresponding operations.
  • the interface 403 may be a hardware interface, such as a network interface card (NIC) or a host bus adapter (HBA), or a program interface module.
  • The hard disks include solid-state drives (SSD), mechanical hard disks such as hard disk drives (HDD), or hybrid hard disks.
  • A field programmable gate array (FPGA) or other hardware may also perform the corresponding operations in place of the CPU 401, or the FPGA or other hardware may perform the corresponding operations in conjunction with the CPU 401.
  • the CPU 401 and the memory 402 are referred to as a processor, or the FPGA and other hardware replacing the CPU 401 are referred to as a processor, or the combination of the FPGA and other hardware replacing the CPU 401 and the CPU 401 are collectively referred to as a processor.
  • the interface 403 may be a network interface card (NIC), a host bus adapter (HBA), or the like.
  • the CPU 401 is configured to process a service request, such as receiving a service request sent by a host or a client, and use the method for processing a service request provided by an embodiment of the present invention to process the service request.
  • In one approach, the load of a processor core is estimated from the number of pending service requests on each processor core in a storage system containing multiple processor cores, and the service request is ultimately sent to the processor core with the lightest load in the storage system (for example, the one with the fewest pending service requests).
  • an embodiment of the present invention proposes a method for processing a service request.
  • The pending service request can be divided into multiple stages of execution, a certain number of processor cores (for example, a processor core set) is allocated to the request of each stage, and each stage request is sent to the lightest-loaded processor core in the set of processor cores allocated for that stage's request, as opposed to sending the service request to the lightest-loaded core among all processor cores in the storage system.
  • Factors that affect delay, such as the access latency and access distance of each level of memory or cache accessed by the CPU (for example, by a processor core), the connection relationship between processors, and the bus type, are thereby taken into account for each stage's request.
  • the method for processing service requests in the embodiments of the present invention can ensure load balancing among processor cores, and schedule requests at the current stage within the scope of the processor core set.
  • the access request can be divided into two phases: a resource waiting phase and a resource using phase.
  • Requests in the resource-waiting phase generally require specific resources, such as disks, memory, and files. When a resource is occupied by a previous request and not yet released, the request in the resource-waiting phase is blocked until the resource becomes available.
  • A request in the resource-using phase is a request that actually performs data access.
  • the SCSI subsystem is a layered architecture, which is divided into three layers.
  • the top layer which is called the upper layer, represents the highest interface of the operating system kernel to access the SCSI protocol device and the driver of the main device type.
  • The middle layer, also known as the common or unified layer, contains public services shared by the upper and lower layers of the SCSI stack.
  • the lower layer represents the actual driver for the physical interface of the device that is suitable for the SCSI protocol.
  • SCSI-based access requests are also divided into three stages of requests.
  • The processor provided by the embodiment of the present invention (for example, the CPU 201 in FIG. 2 and the CPU 401 in FIG. 4) is first introduced.
  • The processor in the embodiment of the present invention includes multiple processor cores (for example, processor core 0 to processor core S, S ≥ 2); one of the multiple processor cores includes the load balancing module 501 and the core-binding relationship calculation module 502.
  • the other processor cores include a scheduling module 503.
  • The load balancing module 501 is used to calculate the number of processor cores to be bound to each stage request of a service request.
  • The core-binding relationship calculation module 502 is used to allocate, to each stage request of a service request, processor cores satisfying the corresponding number, and in turn to generate a core-binding relationship, which indicates the correspondence between a stage request of a service request and the set of processor cores that processes that stage request.
  • The scheduling module 503 is configured to save the core-binding relationship. On receiving a request of a certain stage of a service request, it queries the core-binding relationship, determines the set of processor cores used to execute that stage request, and sends the stage request to the lightest-loaded processor core in the set, which then executes the stage request.
  • At least one processor core is provided with a listening module 504.
  • The listening module 504 is configured to listen for service requests from a host or a client; when a service request arrives, it sends the service request to the scheduling module 503 in the processor core.
  • The processor in the embodiment of the present invention is described above with the load balancing module 501 and the core-binding relationship calculation module 502 deployed in processor core S only as an example; the embodiment of the present invention is not limited to this. The load balancing module 501 and the core-binding relationship calculation module 502 may be deployed in any one of processor core 0 to processor core S, and the two modules may be deployed in the same processor core or in different processor cores.
  • FIG. 6 shows a schematic flowchart of a method for processing a service request in a storage system, including steps 601 to 603.
  • The listening module 504 in the processor core listens for the service request from the host or the client; the current-stage request is one of the multiple stage requests of the service request.
  • The listening module 504 in processor core 1 sends the current-stage request to the scheduling module 503 in processor core 1.
  • For the received current-stage request, the scheduling module 503 in processor core 1 determines a set of processor cores (for example, a first set of processor cores) that executes the current-stage request.
  • Specifically, the scheduling module 503 may determine, according to the specific type of the current-stage request, a first set of processor cores that executes the current-stage request, where the first set of processor cores is a set of cores among the multiple processor cores in the storage system.
  • Determining the first set of processor cores that executes the current-stage request includes: querying the core-binding relationship and determining the first set of processor cores used to execute the current-stage request, where the core-binding relationship indicates the association between the current-stage request and the first set of processor cores.
  • Specifically, the scheduling module 503 in processor core 1 may query the core-binding relationship, which indicates the set of processor cores allocated to the request of each stage of the service request, each set including multiple processor cores; according to the core-binding relationship, the scheduling module 503 in processor core 1 determines the first set of processor cores that executes the current-stage request.
  • For example, the scheduling module 503 in processor core 1 queries the core-binding relationship and determines that the processor core set including processor core 1, processor core 2, processor core 4, processor core 7, and processor core 9 is associated with the current-stage request; it then determines that set as the first set of processor cores that executes the current-stage request.
  • The scheduling module 503 in processor core 1 sends the service request to the lightest-loaded processor core in the first set of processor cores, which executes the current-stage request.
  • For example, the scheduling module 503 in processor core 1 determines that the lightest-loaded core among processor cores 1, 2, 4, 7, and 9 in the first set is processor core 7; it therefore sends the service request to processor core 7, and processor core 7 executes the current-stage request.
  • The scheduling module 503 in processor core 7 then determines the processor core set for the next-stage request, sends that request to the lightest-loaded processor core in the set, and that core executes the next-stage request.
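The dispatch step described above (query the core-binding relationship, then send the stage request to the lightest-loaded core of the bound set) can be sketched as follows. Using a pending-request count as the load measure, and all names, are illustrative assumptions of this sketch.

```python
def dispatch(stage, core_binding, pending):
    """Send a stage request to the lightest-loaded core of its bound set.

    core_binding: stage -> set of core ids (the core-binding relationship).
    pending: core id -> number of queued requests (load proxy; an assumption).
    Returns the chosen core id.
    """
    bound_set = core_binding[stage]              # e.g. the first set of processor cores
    target = min(bound_set, key=lambda c: pending[c])  # lightest-loaded core
    pending[target] += 1                         # the request is now queued there
    return target
```

Mirroring the example in the text: with the bound set {1, 2, 4, 7, 9} and core 7 the lightest loaded, the request is dispatched to core 7.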
  • In the embodiment of the present invention, a certain number of processor cores is allocated to the request of each stage, and each stage request is sent to the lightest-loaded processor core in the set of processor cores allocated to that stage's request. Compared with sending a service request to the lightest-loaded processor core among the multiple processor cores in a storage system, the method of the embodiment of the present invention ensures load balancing among processor cores while determining a set of processor cores for each stage request and scheduling the current-stage request within the scope of that set. It thereby accounts for the correlation between the requests of the stages and for the factors that affect the delay with which the processor cores process the requests, reducing the delay of processing service requests.
  • Optionally, the first set of processor cores includes K processor cores, where K is an integer greater than or equal to 3, and sending the current-stage request to the lightest-loaded processor core in the first set includes: determining, according to the sliding window length w and the sliding step d, a scheduling sub-region for the current-stage request among the K processor cores, where the scheduling sub-region includes w processor cores, w is an integer greater than or equal to 2 and less than K, and d is an integer greater than or equal to 1 and less than K; and sending the current-stage request to the lightest-loaded processor core among the w processor cores.
  • Specifically, the scheduling module 503 may send the current-stage request to the lightest-loaded processor core in the first set of processor cores, and that core executes the current-stage request; alternatively, the processor core that executes the current-stage request may be determined based on a sliding window mechanism. That is, the scheduling module 503 may determine a scheduling sub-region for the current-stage request using the sliding window length w and the sliding step d, determine the lightest-loaded processor core among the cores included in the sub-region, and send the service request to that core.
  • For example, the scheduling sub-region determined by the scheduling module 503 for the current-stage request is shown in FIG. 7: the sub-region includes processor core 1, processor core 3, and processor core 4. The scheduling module 503 sends the current-stage request to the lightest-loaded core among processor cores 1, 3, and 4, and that core executes the current-stage request.
  • When the processor set including processor core 1, processor core 3, processor core 4, processor core 5, processor core 8, processor core 9, and processor core 10 is also used to process a request of a certain stage of another service request, the scheduling sub-region for that request is obtained by sliding the window backwards by two processor cores: the scheduling module 503 sends the request of that stage of the other service request to the lightest-loaded core among processor core 4, processor core 5, and processor core 8, and that core executes the request.
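The sliding-window scheduling described above can be sketched as follows, assuming the K bound cores are kept in a fixed order and the window wraps around (both assumptions of this sketch, as are the names):

```python
def schedule_with_window(cores, loads, w, d, offset):
    """Pick the lightest-loaded core inside a w-core window that slides by d.

    cores: ordered list of the K bound core ids; loads: core id -> load;
    offset: current window start index. Returns (chosen core, next offset).
    """
    K = len(cores)
    window = [cores[(offset + i) % K] for i in range(w)]  # the scheduling sub-region
    target = min(window, key=lambda c: loads[c])          # lightest-loaded in sub-region
    return target, (offset + d) % K                       # slide window for next request
```

With cores [1, 3, 4, 5, 8, 9, 10] (K = 7), w = 3, and d = 2, the first sub-region is cores 1, 3, 4 and the next one is cores 4, 5, 8, matching the FIG. 7 narrative.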
  • By narrowing the search range for the lightest-loaded processor core so that the lightest-loaded core within the scheduling sub-region executes the request of the corresponding stage, the method for processing service requests in the embodiments of the present invention ensures load balancing among processor cores, determines a set of processor cores for each stage request, and schedules the current-stage request within the scope of that set. Compared with directly selecting the lightest-loaded processor core in the entire storage system, it takes into account the correlation between the requests of the stages and the factors that affect the delay of processing requests by the processor cores, further reducing the delay of processing service requests.
  • The core-binding relationship may be pre-configured; the core-binding relationship calculation module 502 in the processor core then updates it, that is, generates a new core-binding relationship.
  • Optionally, the method further includes: re-determining, according to the first set of processor cores, the number of processor cores that execute the current-stage request; allocating, among the multiple processor cores and according to the re-determined number, a second set of processor cores that satisfies the number to the current-stage request; and generating a new core-binding relationship according to the second set of processor cores, where the new core-binding relationship indicates the association between the current-stage request and the second set of processor cores.
  • Specifically, the load balancing module 501 in processor core S periodically determines, for the requests of the multiple stages of the service request, the number of processor cores in the set used to execute each stage's request, and provides the determined numbers to the core-binding relationship calculation module 502. According to the re-determined number of processor cores for each stage's request, the core-binding relationship calculation module 502 reallocates to each stage's request processor cores satisfying the corresponding number, and periodically generates a new core-binding relationship according to that reallocation.
  • The following uses the load balancing module 501 re-determining the number of processor cores used to execute the current-stage request as an example to describe the method of re-determining the number of processor cores used to execute each stage's request.
  • Re-determining, according to the first processor core set, the number of processor cores that execute the request of the current stage includes: determining the total utilization of the processor cores in the first processor core set and the average utilization of the multiple processor cores; and re-determining the number of processor cores that execute the request of the current stage according to that total utilization and that average utilization.
  • The load balancing module 501 monitors the utilization of each processor core in the storage system in real time, where the utilization of a processor core is the ratio of its running time to the sum of its running time and idle time, and re-determines the number of processor cores in the set used to execute the request of the current stage according to changes in the utilization of the processor cores.
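As an illustrative sketch only (not part of the patent text), the utilization definition above can be written as follows; the function names are assumptions:

```python
def core_utilization(run_time, idle_time):
    """Utilization of one processor core: running time divided by
    the sum of running time and idle time."""
    return run_time / (run_time + idle_time)

def average_utilization(samples):
    """Average utilization over all cores; samples is a list of
    (run_time, idle_time) pairs, one per core."""
    utils = [core_utilization(r, i) for r, i in samples]
    return sum(utils) / len(utils)

# A core that ran 3 ms and idled 1 ms has utilization 0.75.
```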
  • The first processor core set bound to the request of the current stage is denoted P, and its utilization is denoted U_P. U_P equals the total utilization of the processor cores in the first processor core set in the current cycle, expressed as:

    U_P = Σ_{j∈P} U_j

    where U_j is the utilization, in the current cycle, of any processor core j in the first processor core set.
  • The average utilization of the multiple processor cores in the storage system in the current cycle is denoted U_ave, and the load balancing module 501 re-determines, according to U_P and U_ave, the number of processor cores in the set used to execute the request of the current stage.
  • Re-determining the number of processor cores that execute the request of the current stage according to the total utilization of the processor cores in the first processor core set and the average utilization of the multiple processor cores includes re-determining that number based on the following relationship:

    N = U_P / U_ave

    where N is the re-determined number of processor cores that execute the request of the current stage, U_P is the total utilization of the processor cores in the first processor core set, and U_ave is the average utilization of the multiple processor cores.
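The re-determination rule N = U_P / U_ave can be sketched as below; this is a minimal illustration, and the function name and the rounding and clamping choices are assumptions not stated in the source:

```python
def redetermine_core_count(set_utilizations, avg_utilization):
    """Re-determine the number N of cores for the current stage.

    set_utilizations: utilization of each core in the first processor
    core set; their sum is U_P.
    avg_utilization: average utilization U_ave of all cores.
    Implements N = U_P / U_ave, rounded and clamped to at least 1.
    """
    u_p = sum(set_utilizations)                  # total utilization U_P
    return max(1, round(u_p / avg_utilization))  # N = U_P / U_ave

# Eight cores at 0.3 utilization each with a system average of 0.4:
# U_P = 2.4, so N = 2.4 / 0.4 = 6 cores.
```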
  • After the load balancing module 501 re-determines, in the current cycle, the number N of processor cores used to execute the request of the current stage, it provides that number to the binding relationship calculation module 502, and at the beginning of the next cycle the binding relationship calculation module 502 reallocates, for the request of the current stage, a set of processor cores (for example, a second processor core set) that satisfies the number N.
  • For example, the number of processor cores used to execute the request of the current stage in the current cycle is 8, and the load balancing module 501 re-determines that number to be 6. The load balancing module 501 provides the re-determined number 6 to the binding relationship calculation module 502, which may, at the beginning of the next cycle, delete two processor cores from the eight processor cores recorded in the binding relationship for executing the request of the current stage, that is, generate a new core binding relationship.
  • Alternatively, after the load balancing module 501 provides the re-determined number 6 to the binding relationship calculation module 502, the binding relationship calculation module 502 does not delete two processor cores from the eight processor cores executing the request of the current stage, but instead reassigns six processor cores in the storage system for the request of the current stage and, at the beginning of the next cycle, replaces the eight processor cores originally allocated in the binding relationship with the six reassigned processor cores to generate a new core binding relationship.
  • By periodically monitoring the utilization of the processor cores in the storage system and reallocating processor cores for the request of a stage according to changes in the utilization of the cores allocated to it, the processor cores allocated to each stage can be adjusted periodically as utilization changes, thereby mitigating load imbalance among the processor cores.
  • The following takes the binding relationship calculation module 502 allocating, in the storage system, processor cores satisfying the determined number for the request of the current stage as an example, and describes in detail the method of allocating the corresponding number of processor cores for the request of each stage.
  • multiple processor cores usually share different levels of memory or cache.
  • The different levels of memory or cache can include the L1 cache, L2 cache, L3 cache, and local memory.
  • When processor cores share different levels of memory or cache, the topological distance between the processor cores also differs.
  • In addition, each processor core can access the local memory of a remote node (hereinafter referred to as "remote memory").
  • Each processor core can be abstracted into multiple logical cores; for example, each processor core is abstracted into two logical cores, logical core 0 and logical core 1, as shown in FIG. 8.
  • FIG. 8 is a schematic diagram of the topological distance between logical cores sharing different levels of memory or cache under the NUMA architecture. Under the NUMA architecture there are node 0 and node 1, and a logical core in node 0 can share the local memory in node 1 with a logical core in node 1; the local memory in node 1 is remote memory for node 0.
  • In node 0, the topological distance between two logical cores sharing an L1 cache is D1, the topological distance between two logical cores sharing an L2 cache is D2, the topological distance between two logical cores sharing an L3 cache is D3, and the topological distance between two logical cores sharing local memory is D4. When a logical core in node 0 and a logical core in node 1 share the local memory in node 1, the topological distance between the two logical cores is D5.
  • The latency ratio of accessing local memory to accessing remote memory is approximately 8:12, so the topological distance between logical cores that share remote memory across nodes can be calculated accordingly, for example as 64.
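To make the distances concrete, the following sketch models one possible sharing hierarchy; the group sizes and the numeric values D1 through D5 (chosen so that D4:D5 mirrors the 8:12 latency ratio mentioned above) are illustrative assumptions, not values given in the source:

```python
# Assumed distances for each sharing level (D1 < D2 < D3 < D4 < D5).
D1, D2, D3, D4, D5 = 1, 2, 4, 8, 12

def topo_distance(a, b):
    """Topological distance between two logical cores.

    A logical core is modeled as a (node, physical_core, thread)
    tuple. Assumed sharing model: the two threads of one physical
    core share an L1 cache; physical-core pairs {0,1}, {2,3}, ...
    share an L2 cache; groups of four physical cores share an L3
    cache; all cores of a node share the node's local memory; cores
    in different nodes share only remote memory.
    """
    node_a, phys_a, _ = a
    node_b, phys_b, _ = b
    if node_a != node_b:
        return D5                 # remote memory across nodes
    if phys_a == phys_b:
        return D1                 # same physical core: shared L1
    if phys_a // 2 == phys_b // 2:
        return D2                 # same L2 pair
    if phys_a // 4 == phys_b // 4:
        return D3                 # same L3 group
    return D4                     # same node, shared local memory only
```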
  • The following describes in detail a method by which the binding relationship calculation module 502 of the embodiment of the present invention allocates, for the request of each stage in the storage system, processor cores satisfying the corresponding number.
  • It is assumed that node 0 and node 1 in FIG. 8 are in a NUMA architecture and communicate with each other through hyper-threading.
  • Allocating, among the multiple processor cores, a second processor core set that satisfies the number for the request of the current stage includes: generating multiple groups of allocation results, each group including, for the request of each stage, an allocated set of processor cores satisfying the corresponding number; and determining multiple path lengths for the multiple groups of allocation results, each group of allocation results corresponding to one path length, where the path length L satisfies:

    L = Σ_{i=1}^{M-1} c_{i,i+1} · d_{i,i+1}

    where c_{i,i+1} is the communication volume generated by the interaction between the processor cores executing the requests of adjacent stages, d_{i,i+1} is the average topological distance between the processor cores executing the requests of those adjacent stages, and M is the number of stage requests of the service request; the communication volume can represent the number of interactions between processor cores.
  • According to the group of allocation results corresponding to the shortest of the multiple path lengths, the request of the current stage is allocated a set of processor cores satisfying the number.
  • For example, each processor core is abstracted into logical core 0 and logical core 1, so that 16 processor cores are abstracted into 32 logical cores.
  • The requests of the three stages are denoted M_0, M_1, and M_2, respectively.
  • The number of logical cores used to execute the request of each stage in the current cycle is determined by the foregoing method: the number of logical cores used to execute M_0 is 8, the number of logical cores used to execute M_1 is 8, and the number of logical cores used to execute M_2 is 16.
  • The binding relationship calculation module 502 generates multiple groups of allocation results according to the numbers of logical cores determined for M_0, M_1, and M_2; each group of allocation results includes, for the request of each stage, allocated logical cores satisfying the corresponding number.
  • Allocation result 1: logical cores 0 to 7 in node 0 are allocated to M_0, logical cores 8 to 15 in node 0 are allocated to M_1, and logical cores 0 to 15 in node 1 are allocated to M_2.
  • Allocation result 2: logical cores 0 to 3 in node 0 and logical cores 0 to 3 in node 1 are allocated to M_0; logical cores 4 to 7 in node 0 and logical cores 4 to 7 in node 1 are allocated to M_1; and logical cores 8 to 15 in node 0 and logical cores 8 to 15 in node 1 are allocated to M_2.
  • If allocation result 2 corresponds to the shortest path length, the binding relationship calculation module 502 allocates logical cores 0 to 3 in node 0 and logical cores 0 to 3 in node 1 to M_0, logical cores 4 to 7 in node 0 and logical cores 4 to 7 in node 1 to M_1, and logical cores 8 to 15 in node 0 and logical cores 8 to 15 in node 1 to M_2, and at the beginning of the next cycle replaces the processor cores originally allocated in the binding relationship for the request of each stage of the service request with the reallocated processor cores.
  • Multiple path lengths are determined for the multiple groups of allocation results.
  • When processor cores are allocated for the request of each stage, the topological distance between processor cores is taken into account, and the allocation result corresponding to the shortest of the multiple path lengths is determined as the final processor core allocation result, ensuring load balancing among the processor cores.
  • Determining a processor core set for the request of each stage of the service request and scheduling the request of the current stage within the scope of that set takes into account the correlation between the requests of the stages and the factors affecting the latency with which the processor cores process them, reducing the latency of processing service requests.
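The shortest-path-length selection described above can be sketched as follows; this is a minimal illustration, and the helper names and example values are assumptions:

```python
def path_length(allocation, traffic, topo_dist):
    """Path length L = sum over adjacent stages of c[i,i+1] * d[i,i+1].

    allocation: one core set per stage (M stages in total).
    traffic[i]: communication volume c_{i,i+1} between stages i and i+1.
    topo_dist(a, b): topological distance between two cores.
    d_{i,i+1} is taken as the average topological distance between the
    cores of the two adjacent stages.
    """
    total = 0.0
    for i in range(len(allocation) - 1):
        set_a, set_b = allocation[i], allocation[i + 1]
        pairs = [(a, b) for a in set_a for b in set_b]
        avg_d = sum(topo_dist(a, b) for a, b in pairs) / len(pairs)
        total += traffic[i] * avg_d
    return total

def pick_allocation(candidates, traffic, topo_dist):
    """Return the candidate allocation with the shortest path length."""
    return min(candidates, key=lambda a: path_length(a, traffic, topo_dist))
```

For instance, with topo_dist(a, b) = |a - b| and unit traffic between adjacent stages, an allocation that places adjacent stages on nearby cores yields a shorter path length and is chosen.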
  • FIG. 9 shows a schematic flowchart of a configuration method for processing a service request.
  • the processing of the service request is divided into multiple stages, and the multiple stages correspond to the multiple stage requests.
  • The multiple stage requests include a request of a first stage, and a processor core set (for example, a first processor core set) is configured for the request of the first stage; the request of the first stage is processed by the first processor core set.
  • a first rule may be configured, and the first rule may indicate that the lightest-loaded processor core in the first set of processor cores configured for the request of the first stage executes the request of the first stage.
  • the method further includes:
  • The service request further includes a request of a second stage, which may be the request of a stage subsequent to the first stage. A processor core set (for example, a second processor core set) is configured for the request of the second stage, and the request of the second stage is processed by the second processor core set.
  • a second rule may be configured, and the second rule may indicate that the lightest-loaded processor core in the second set of processor cores configured for the request of the second stage executes the request of the second stage.
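A hedged sketch of such a configuration (the stage names, core numbers, and rule label below are hypothetical, chosen only to illustrate the binding of each stage to a core set and a dispatch rule):

```python
# Hypothetical core binding configuration: each stage is bound to a
# processor core set and a dispatch rule.
binding = {
    "stage1": {"cores": [0, 1, 2, 3], "rule": "lightest_load"},
    "stage2": {"cores": [4, 5, 6, 7], "rule": "lightest_load"},
}

def dispatch(stage, loads):
    """Apply the configured rule: send the request of this stage to
    the lightest-loaded core of the set configured for the stage."""
    cores = binding[stage]["cores"]
    return min(cores, key=lambda c: loads[c])

# With per-core loads given, stage1 is dispatched within cores 0-3
# and stage2 within cores 4-7, never across the whole system.
```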
  • By allocating a certain number of processor cores (for example, a processor core set) to each stage of a service request and sending the request of each stage to the lightest-loaded processor core in the set allocated for that stage, rather than to the lightest-loaded core among all processor cores in the storage system, the configuration method of the embodiment of the present invention ensures load balancing among processor cores when processing a service request. Determining a processor core set for each stage and scheduling the request of the current stage within the scope of that set takes into account the correlation between the requests of the stages and the factors affecting the latency with which the processor cores process them, reducing the latency of processing service requests.
  • That the service request includes the request of the first stage and the request of the second stage is merely an example and does not limit the embodiment of the present invention; the service request may also include requests of other stages.
  • FIG. 10 is a schematic block diagram of an apparatus 800 for processing a service request according to an embodiment of the present invention.
  • the apparatus 800 is configured in a storage system and includes a transceiver module 801 and a processing module 802.
  • the transceiver module 801 is configured to receive a request in a current stage of a service request, where the request in the current stage is a request in one of a plurality of stages in the service request.
  • the processing module 802 is configured to determine a first set of processor cores that executes the request at the current stage, where the first set of processor cores is a subset of the plurality of processor cores.
  • the transceiver module 801 is further configured to send the request of the current stage to the processor core with the lightest load in the first processor core set.
  • The processing module 802 is further configured to query a core binding relationship to determine the first processor core set used to execute the request of the current stage, the core binding relationship indicating an association between the request of the current stage and the first processor core set.
  • The processing module 802 is further configured to: re-determine, according to the first processor core set, the number of processor cores that execute the request of the current stage; allocate, among the multiple processor cores and according to the re-determined number, a second processor core set satisfying that number for the request of the current stage; and generate a new core binding relationship according to the second processor core set, the new core binding relationship indicating an association between the request of the current stage and the second processor core set.
  • The processing module 802 is further configured to determine the total utilization of the processor cores in the first processor core set and the average utilization of the multiple processor cores, and to re-determine the number of processor cores executing the request of the current stage according to that total utilization and that average utilization.
  • The processing module 802 is further configured to re-determine, based on the following relationship, the number of processor cores executing the request of the current stage according to the total utilization of the processor cores in the first processor core set and the average utilization of the multiple processor cores:

    N = U_P / U_ave

    where N is the re-determined number of processor cores executing the request of the current stage, U_P is the total utilization of the processor cores in the first processor core set, and U_ave is the average utilization of the multiple processor cores.
  • The processing module 802 is further configured to: generate multiple groups of allocation results, each group including, for the request of each stage, a reallocated set of processor cores satisfying the corresponding number; determine multiple path lengths for the multiple groups of allocation results, each group corresponding to one path length, where the path length L satisfies:

    L = Σ_{i=1}^{M-1} c_{i,i+1} · d_{i,i+1}

    where c_{i,i+1} is the communication volume generated by the interaction between the processor cores executing the requests of adjacent stages, d_{i,i+1} is the average topological distance between those processor cores, and M is the number of stage requests of the service request; and, according to the group of allocation results corresponding to the shortest of the multiple path lengths, allocate a second processor core set satisfying the number for the request of the current stage.
  • The first processor core set includes K processor cores, where K is an integer greater than or equal to 3. The processing module 802 is further configured to determine, according to a sliding window length w and a sliding step d, a scheduling sub-region for the request of the current stage among the K processor cores, the scheduling sub-region including w processor cores, where w is an integer greater than or equal to 2 and less than K, and d is an integer greater than or equal to 1 and less than K.
  • the transceiver module 801 is further configured to send the request of the current stage to the lightest-loaded processor core among the w processor cores.
  • d and K are mutually prime (coprime).
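The sliding-window selection with a coprime step can be sketched as follows; the function name and the concrete K, w, and d values are illustrative assumptions:

```python
from math import gcd

def schedule(loads, start, w, d):
    """Pick a core inside the sliding window [start, start + w) (mod K)
    and advance the window start by the sliding step d.

    loads: per-core load of the K cores bound to the current stage.
    Returns (chosen core index, next window start).
    """
    k = len(loads)
    window = [(start + i) % k for i in range(w)]   # scheduling sub-region
    chosen = min(window, key=lambda c: loads[c])   # lightest-loaded core
    return chosen, (start + d) % k

# With gcd(K, d) == 1 the window start visits every core equally often,
# so equally loaded cores are selected with equal probability.
K, d = 5, 3
assert gcd(K, d) == 1
starts, s = set(), 0
for _ in range(K):
    starts.add(s)
    _, s = schedule([0.0] * K, s, w=2, d=d)
assert starts == {0, 1, 2, 3, 4}   # every core serves as a start point
```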
  • The apparatus 800 for processing a service request may correspond to executing the method 600 or the method 700 described in the embodiments of the present invention, and the foregoing and other operations and/or functions of the modules in the apparatus 800 respectively implement the corresponding procedures of those methods; for brevity, details are not described here again.
  • the specific implementation of the apparatus 800 for processing a service request in the embodiment of the present invention may be a processor, or a software module, or a combination of a processor and a software module, which is not limited in the embodiment of the present invention.
  • FIG. 11 is a schematic block diagram of a storage system 900 according to an embodiment of the present invention.
  • the storage system includes a processor 901 and a memory 902, and the processor 901 includes multiple processor cores.
  • One or more processor cores in the plurality of processor cores are used to execute computer instructions stored in the memory 902.
  • The one or more processor cores are used to perform the following operations: receiving a request of a current stage of a service request, the request of the current stage being a request of one of multiple stages of the service request; determining a first processor core set that executes the request of the current stage, the first processor core set being a subset of the multiple processor cores; and sending the request of the current stage to the lightest-loaded processor core in the first processor core set.
  • The one or more processor cores are further configured to query a core binding relationship to determine the first processor core set used to execute the request of the current stage, the core binding relationship indicating an association between the request of the current stage and the first processor core set.
  • The one or more processor cores are further configured to: re-determine, according to the first processor core set, the number of processor cores that execute the request of the current stage; allocate, among the multiple processor cores and according to the re-determined number, a second processor core set satisfying that number for the request of the current stage; and generate a new core binding relationship according to the second processor core set, the new core binding relationship indicating an association between the request of the current stage and the second processor core set.
  • The one or more processor cores are further configured to determine the total utilization of the processor cores in the first processor core set and the average utilization of the multiple processor cores, and to re-determine the number of processor cores executing the request of the current stage according to that total utilization and that average utilization.
  • The one or more processor cores are further configured to re-determine, based on the following relationship, the number of processor cores executing the request of the current stage according to the total utilization of the processor cores in the first processor core set and the average utilization of the multiple processor cores:

    N = U_P / U_ave

    where N is the re-determined number of processor cores executing the request of the current stage, U_P is the total utilization of the processor cores in the first processor core set, and U_ave is the average utilization of the multiple processor cores.
  • The one or more processor cores are further configured to: generate multiple groups of allocation results, each group including, for the request of each stage, a reallocated set of processor cores satisfying the corresponding number; determine multiple path lengths for the multiple groups of allocation results, each group corresponding to one path length, where the path length L satisfies:

    L = Σ_{i=1}^{M-1} c_{i,i+1} · d_{i,i+1}

    where c_{i,i+1} is the communication volume generated by the interaction between the processor cores executing the requests of adjacent stages, d_{i,i+1} is the average topological distance between those processor cores, and M is the number of stage requests of the service request; and, according to the group of allocation results corresponding to the shortest of the multiple path lengths, allocate a second processor core set satisfying the number for the request of the current stage.
  • The first processor core set includes K processor cores, where K is an integer greater than or equal to 3. The one or more processor cores are further configured to determine, according to a sliding window length w and a sliding step d, a scheduling sub-region for the request of the current stage among the K processor cores, the scheduling sub-region including w processor cores, where w is an integer greater than or equal to 2 and less than K, and d is an integer greater than or equal to 1 and less than K; and to send the request of the current stage to the lightest-loaded processor core among the w processor cores.
  • The d and the K are mutually prime (coprime).
  • Each module shown in FIG. 5 in the embodiments of the present invention may be hardware logic in a processor core, or computer instructions executed by a processor core, or a combination of hardware logic and computer instructions, which is not limited in the embodiments of the present invention.
  • Each module of the apparatus 800 for processing a service request may be implemented by a processor, by a processor and a memory together, or by a software module. Accordingly, each module shown in FIG. 5 may correspond to one or more modules shown in FIG. 8, and the modules shown in FIG. 8 include the corresponding functions of the modules shown in FIG. 5.
  • An embodiment of the present invention provides a computer-readable storage medium storing computer instructions. When the computer instructions are run on a computer, the computer is caused to execute the method for processing a service request or the configuration method for processing a service request in the embodiments of the present invention.
  • Embodiments of the present invention provide a computer program product containing computer instructions, and when the computer instructions are run on a computer, the computer is caused to execute the method for processing a service request or the method for configuring a service request in an embodiment of the present invention.
  • The processor mentioned in the embodiments of the present invention may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory mentioned in the embodiments of the present invention may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • The volatile memory may be a random access memory (RAM), used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM), synclink dynamic random access memory (SLDRAM), and direct Rambus random access memory (direct Rambus RAM).
  • It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) may be integrated into the processor.
  • The memory described herein is intended to include, but is not limited to, these and any other suitable types of memory.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • The device embodiments described above are merely illustrative. For example, the division into units is only a logical functional division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit.
  • When the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • Based on such an understanding, the technical solutions of the embodiments of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium.
  • The foregoing storage media include: USB flash drives, removable hard disks, read-only memories (ROM), random access memories (RAM), magnetic disks, optical discs, and other media that can store computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

This application provides a method for processing service requests in a storage system, the storage system including multiple processor cores. The method includes: receiving a request of a current stage of a service request, the request of the current stage being a request of one of multiple stages of the service request; determining a first processor core set that executes the request of the current stage, the first processor core set being a subset of the multiple processor cores; and sending the request of the current stage to the lightest-loaded processor core in the first processor core set. This method can ensure load balancing among processor cores and reduce the latency of processing service requests.

Description

Method, apparatus, and storage system for processing service requests. Technical Field
This application relates to the field of information technology and, more specifically, to a method, an apparatus, and a processor for processing service requests.
Background
In a storage system, the central processing unit (CPU) of the array controller is a key factor affecting system performance; generally, the more processor cores the CPU includes, the higher the performance of the storage system.
However, in a storage system whose array controller contains multiple processor cores, as the number of processor cores increases, load imbalance among the processor cores arises when they are scheduled to process service requests.
In the prior art, the load of a processor core is estimated according to the number of service requests pending on it, and the service request is finally sent to the processor core with the smallest load. Although this method can mitigate load imbalance among processor cores, the time complexity of processing service requests scales linearly with the number of processor cores, making the latency of processing service requests uncontrollable.
Summary of the Invention
According to a first aspect, a method for processing service requests in a storage system is provided, the storage system including multiple processor cores. The method includes: receiving a request of a current stage of a service request, the request of the current stage being a request of one of multiple stages of the service request; determining a first processor core set that executes the request of the current stage, the first processor core set being a subset of the multiple processor cores; and sending the request of the current stage to the lightest-loaded processor core in the first processor core set.
By dividing a pending service request into requests of multiple stages for execution, allocating a certain number of processor cores (for example, a processor core set) to the request of each stage, and sending the request of each stage to the lightest-loaded processor core in the set allocated for that stage, rather than sending the service request to the lightest-loaded core among all processor cores in the storage system, the method for processing service requests of this application ensures load balancing among processor cores. Determining a processor core set for the request of each stage of the service request and scheduling the request of the current stage within the scope of that set, rather than directly selecting the lightest-loaded processor core in the storage system, takes into account the correlation between the requests of the stages and the factors affecting the latency with which the processor cores process them, reducing the latency of processing service requests.
Optionally, determining the first processor core set that executes the request of the current stage includes: querying a core binding relationship to determine the first processor core set used to execute the request of the current stage, the core binding relationship indicating an association between the request of the current stage and the first processor core set.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: re-determining, according to the first processor core set, the number of processor cores that execute the request of the current stage; allocating, among the multiple processor cores and according to the re-determined number, a second processor core set satisfying that number for the request of the current stage; and generating a new core binding relationship according to the second processor core set, the new core binding relationship indicating an association between the request of the current stage and the second processor core set.
Optionally, re-determining, according to the first processor core set, the number of processor cores that execute the request of the current stage includes: determining the total utilization of the processor cores in the first processor core set and the average utilization of the multiple processor cores; and re-determining the number of processor cores executing the request of the current stage according to that total utilization and that average utilization.
By periodically monitoring the utilization of the processor cores in the storage system and reallocating processor cores for the request of a stage according to changes in the utilization of the cores allocated to it, the processor cores allocated to the request of each stage can be adjusted periodically as utilization changes, thereby mitigating load imbalance among the processor cores.
Optionally, re-determining the number of processor cores executing the request of the current stage according to the total utilization of the processor cores in the first processor core set and the average utilization of the multiple processor cores includes re-determining that number based on the following relationship:
N = U_P / U_ave
where N is the re-determined number of processor cores executing the request of the current stage, U_P is the total utilization of the processor cores in the first processor core set, and U_ave is the average utilization of the multiple processor cores.
With reference to the first aspect, in some implementations of the first aspect, allocating, among the multiple processor cores, a second processor core set satisfying the number for the request of the current stage includes: generating multiple groups of allocation results, each group including, for the request of each stage, a reallocated set of processor cores satisfying the corresponding number; and determining multiple path lengths for the multiple groups of allocation results, each group corresponding to one path length, where the path length L satisfies:
L = Σ_{i=1}^{M-1} c_{i,i+1} · d_{i,i+1}
where c_{i,i+1} is the communication volume generated by the interaction between the processor cores executing the requests of adjacent stages, d_{i,i+1} is the average topological distance between the processor cores executing the requests of those adjacent stages, and M is the number of stage requests of the service request; according to the group of allocation results corresponding to the shortest of the multiple path lengths, a second processor core set satisfying the number is allocated for the request of the current stage.
根据确定的为各个阶段的请求分配的处理器核的数量,生成多组处理器核的分配结果,针对该多组分配结果确定多个路径长度,通过为各个阶段的请求分配处理器核时考虑处理器核间的拓扑距离,将多个路径长度中的最短路径长度对应的分配结果确定为最终的处理器核分配结果,从而保证处理器核之间的负载均衡,降低处理业务请求的时延。
结合第一方面,在第一方面的某些实现方式中,所述第一处理器核集合中包括K个处理器核,K为大于或等于3的整数,所述向所述第一处理器核集合中负载最轻的处理器核发送所述当前阶段的请求,包括:根据滑动窗口长度w与滑动步长d,在所述K个处理器核中为所述当前阶段的请求确定调度子区域,所述调度子区域中包括w个处理器核,w为大于或等于2且小于K的整数,d为大于或等于1且小于K的整数;向所述w个处理器核中负载最轻的处理器核发送所述当前阶段的请求。
在确定执行任一阶段的请求的处理器核时,通过引入滑动窗口机制,缩小搜索负载最轻的处理器核的搜索范围,使调度子区域中负载最轻的处理器核执行相应阶段的请求,保证处理器核之间的负载均衡,进一步降低处理业务请求的时延。
结合第一方面,在第一方面的某些实现方式中,所述d与所述K互为质数。
引入滑动窗口机制后,当多个阶段的请求与同一处理器核集合之间存在绑定关系,且该处理器核集合中的每个处理器核的负载相同时,在依次处理该多个阶段的请求时,为了保证处理器核间的负载均衡,需要保证负载相同(即,待处理的请求队列的个数相同)的处理器核被选中用于执行请求的概率相同,即,需要保证每个处理器核作为滑动窗口内的搜索起始点的概率相同;当该处理器核集合中的处理器核的个数K与滑动步长d互为质数时,能够保证每个处理器核作为滑动窗口内的搜索起始点的概率相同。
第二方面,提供了一种处理业务请求的配置方法,包括:为业务请求的第一阶段的请求配置第一处理器核集合,所述第一处理器核集合用于执行所述第一阶段的请求;配置第一规则,所述第一规则指示向所述第一处理器核集合中负载最轻的处理器核发送所述第一阶段的请求。
通过为业务请求的每一阶段的请求分配一定数量的处理器核(例如,处理器核集合),并将每一阶段的请求均发送至为该阶段的请求分配的处理器核集合中的负载最轻的处理器核,相对于将业务请求发送至存储***中多个处理器核当中负载最轻的处理器核,本申请的处理业务请求的配置方法在处理业务请求时能够保证处理器核之间的负载均衡,并且考虑了各阶段的请求与处理器核处理各阶段的请求的时延之间的相关性,降低了处理业务请求的时延。
结合第二方面,在第二方面的某些实现方式中,所述方法还包括:为业务请求的第二阶段的请求配置第二处理器核集合,所述第二处理器核集合用于执行所述第二阶段的请求;配置第二规则,所述第二规则指示向所述第二处理器核集合中负载最轻的处理器核发送所述第二阶段的请求。
第三方面,提供一种处理业务请求的装置,所述装置配置于存储***中,所述装置用于执行上述第一方面或第二方面的任一可能的实现方式中的方法。具体地,所述装置可以包括用于执行第一方面或第二方面的任一可能的实现方式中的方法的模块。
第四方面,提供一种存储***,所述存储***包括多个处理器核与存储器;存储器,用于存储计算机指令;所述多个处理器核中的一个或多个处理器核用于执行所述存储器中存储的计算机指令,当所述存储器中的计算机指令被执行时,所述一个或多个处理器核用于执行上述第一方面或第二方面的任一可能的实现方式中的方法。
第五方面,提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机指令,当所述计算机指令在计算机上运行时,使得计算机执行第一方面或第二方面的任一可能的实现方式中的方法。
第六方面,提供一种包含计算机指令的计算机程序产品,当该计算机指令在计算机上运行时,使得计算机执行第一方面或第二方面的任一可能的实现方式中的方法。
附图说明
图1是本发明实施例的存储阵列架构示意图。
图2是本发明实施例的存储阵列的控制器的示意图。
图3是本发明实施例的分布式块存储***的示意图。
图4是分布式块存储***的服务器的示意性结构框图。
图5是本发明实施例的处理器的示意性框图。
图6是本发明实施例提供的存储***中处理业务请求的方法的示意性流程图。
图7是本发明实施例提供的基于滑动窗口机制调度处理器核的原理性示意图。
图8是本发明实施例的NUMA架构下共享不同层次的内存或cache的逻辑核之间的拓扑距离示意图。
图9为本发明实施例提供的处理业务请求的配置方法的示意性流程图。
图10为本发明实施例提供的处理业务请求的装置的示意性框图。
图11为本发明实施例提供的存储***的示意性框图。
具体实施方式
下面将结合附图,对本发明实施例中的技术方案进行描述。
首先对适用于本发明实施例的存储***进行介绍。
如图1所示,本发明实施例中的存储***,可以为存储阵列(如华为®的Oceanstor® 18000系列、V3系列)。存储阵列包括存储控制器101和多块硬盘,其中,硬盘包含固态硬盘(solid state disk,SSD)、机械硬盘或者混合硬盘等。机械硬盘如HDD(hard disk drive)。如图2所示,控制器101包含中央处理单元(central processing unit,CPU)201、存储器202和接口203,存储器202中存储计算机指令,CPU201包括多个处理器核(图2中未示出),CPU201执行存储器202中的计算机指令对存储***进行管理及数据访问操作。另外,为节省CPU201的计算资源,现场可编程门阵列(field programmable gate array,FPGA)或其他硬件也可以用于执行本发明实施例中CPU201的全部操作,或者,FPGA或其他硬件与CPU201分别用于执行本发明实施例中CPU201的操作。为方便描述,本发明实施例将CPU201与存储器202称为处理器,或将FPGA及其他替代CPU201的硬件称为处理器,或将FPGA及其他替代CPU201的硬件与CPU201的组合统称为处理器,处理器与接口203通信。接口203可以为网络接口卡(network interface card,NIC)、主机总线适配器(host bus adaptor,HBA)等。
如图1和图2所描述的存储阵列,CPU201用于处理业务请求,如接收主机或客户端发送的业务请求,使用本发明实施例提供的处理业务请求的方法处理该业务请求。
进一步的,本发明实施例的存储***还可以为分布式文件存储***(如华为® 9000系列)、分布式块存储***等。以分布式块存储***为例。示例性的如图3所示,分布式块存储***包括多台服务器,如服务器1、服务器2、服务器3、服务器4、服务器5和服务器6,服务器间通过无限带宽(infiniband)技术或以太网络等互相通信。在实际应用当中,分布式块存储***中服务器的数量可以根据实际需求增加,本发明实施例对此不作限定。
分布式块存储***的服务器中包含如图4所示的结构。如图4所示,分布式块存储***中的每台服务器包含中央处理单元(central processing unit,CPU)401、内存402、接口403、硬盘1、硬盘2和硬盘3,内存402中存储计算机指令,CPU401包括多个处理器核(图4中未示出),CPU401执行内存402中的计算机指令执行相应的操作。接口403可以为硬件接口,如网络接口卡(network interface card,NIC)或主机总线适配器(host bus adaptor,HBA)等,也可以为程序接口模块等。硬盘包含固态硬盘(solid state disk,SSD)、机械硬盘或者混合硬盘。机械硬盘如HDD(hard disk drive)。另外,为节省CPU401的计算资源,现场可编程门阵列(field programmable gate array,FPGA)或其他硬件也可以代替CPU401执行上述相应的操作,或者,FPGA或其他硬件与CPU401共同执行上述相应的操作。为方便描述,本发明实施例将CPU401与内存402称为处理器,或将FPGA及其他替代CPU401的硬件称为处理器,或将FPGA及其他替代CPU401的硬件与CPU401的组合统称为处理器。
如图3和图4所描述的分布式块存储***,CPU401用于处理业务请求,如接收主机或客户端发送的业务请求,使用本发明实施例提供的处理业务请求的方法处理该业务请求。
下面对处理业务请求的一般方法进行简单介绍:
在处理业务请求时,根据包含多个处理器核的存储***中的每个处理器核上待处理的业务请求的数量来估计处理器核的负载情况,最终将业务请求发送至存储***中负载最轻(例如,待处理的业务请求的数量最少)的处理器核。
这种方法虽然能够改善处理器核之间的负载不均衡的现象,但是处理业务请求的时间复杂度会随着处理器核数的增多而线性增长,导致处理业务请求的时延不可控。
针对上述问题,本发明实施例提出一种处理业务请求的方法,待处理的业务请求可以划分为多个阶段的请求执行,为每一阶段的请求分配一定数量的处理器核(例如,处理器核集合),并将每一阶段的请求均发送至为该阶段的请求分配的处理器核集合中的负载最轻的处理器核,而不是将业务请求发送至存储***中所有处理器核当中负载最轻的处理器核。本发明实施例中,基于CPU(如处理器核)访问各个层次的内存或cache的访问时延、访问距离、处理器之间的连接关系或总线类型等影响时延的因素,为每一个阶段的请求分配处理器核集合。本发明实施例的处理业务请求的方法能够保证处理器核之间的负载均衡,在处理器核集合范围内调度当前阶段的请求,相对于直接选择存储***中负载最轻的处理器核,考虑了各阶段的请求与处理器核处理各阶段的请求的时延之间的相关性,降低了处理业务请求的时延。示例性的,访问请求可以分为两个阶段:等待资源阶段和使用资源阶段。其中,等待资源阶段的请求一般需要请求特殊的资源,如磁盘、内存、文件等,当资源被上一个请求占用而没有被释放时,等待资源阶段的请求就会被阻塞,直到能够使用这个资源;使用资源阶段的请求是真正进行数据访问阶段的请求。再例如,以小型计算机***接口(small computer system interface,SCSI)子***为例,SCSI子***是一种分层的架构,共分为三层。顶部的一层,即上层,也称为较高层,代表的是操作***内核访问SCSI协议的设备和主要设备类型的驱动器的最高接口。接下来的是中间层,也称为公共层或统一层,这一层包含SCSI堆栈的较高层和较低层的一些公共服务。最后是较低层,代表的是适用于SCSI协议的设备的物理接口的实际驱动器。基于SCSI的访问请求也相应划分为3个阶段的请求。
在对本发明实施例提供的存储***中处理业务请求的方法进行介绍之前,首先对本发明实施例提供的处理器(例如,图2中的CPU201与图4中的CPU401)进行介绍。
如图5所示,本发明实施例中的处理器包括多个处理器核(例如,处理器核0~处理器核S,S≥2),多个处理器核中的一个处理器核中包括负载均衡模块501与绑核关系计算模块502,其他处理器核中包括调度模块503。其中,负载均衡模块501用于为业务请求的每一阶段的请求计算需要绑定的处理器核的数量;绑核关系计算模块502用于为业务请求的每一阶段的请求分配满足相应数量的处理器核,进而生成绑核关系,该绑核关系指示业务请求的一个阶段的请求与一个处理该阶段请求的处理器核集合的对应关系;调度模块503用于保存该绑核关系,在接收到某一阶段的业务请求时,查询该绑核关系,确定用于执行该阶段的请求的处理器核集合,并将该阶段的请求发送至该处理器核集合中负载最轻的处理器核,由该处理器核执行该阶段的请求。
此外,在部署有调度模块503的处理器核中,至少有一个处理器核中部署有监听模块504,该监听模块504用于监听来自主机或客户端的业务请求,在监听到来自主机或客户端的业务请求时,将该业务请求发送至处理器核中的调度模块503。
需要说明的是,上述仅以负载均衡模块501与绑核关系计算模块502部署在处理器核S中为例对本发明实施例中的处理器进行说明,但本发明实施例并不限定于此,负载均衡模块501与绑核关系计算模块502可以部署在处理器核0~处理器核S中的任意一个处理器核中,并且负载均衡模块501与绑核关系计算模块502可以部署在同一个处理器核中,也可以部署在不同的处理器核中。
下面对本发明实施例提供的存储***中处理业务请求的方法600进行详细说明。图6示出了存储***中处理业务请求的方法的示意性流程图,包括步骤601至603。
601,接收业务请求的当前阶段的请求,该当前阶段的请求为该业务请求的多个阶段的请求中的一个阶段的请求。需要说明的是,在本发明实施例中,业务请求的处理分为多个阶段进行,并为每一阶段分配了一个处理器核集合,由相应处理器核集合中负载最轻的处理器核处理业务请求的相应阶段的请求。业务请求的当前待处理的阶段的请求称为当前阶段的请求。
具体地,例如,当处理器核中的监听模块504(例如,处理器核1中的监听模块504)监听到来自主机或客户端的该业务请求时,当前阶段的业务请求是业务请求的多个阶段的请求中的第一个阶段的请求。
处理器核1中的监听模块504将该当前阶段的请求发送至处理器核1中的调度模块503。
602,确定执行该当前阶段的请求的第一处理器核集合,该第一处理器核集合为该多个处理器核的一个处理器核子集。
具体地,处理器核1中的调度模块503为接收到的当前阶段的请求确定执行该当前阶段的请求的处理器核集合(例如,第一处理器核集合)。
例如,调度模块503可以根据当前阶段的请求的具体类型,确定执行当前阶段的请求的第一处理器核集合,第一处理器核集合是存储***中的多个处理器核的一个处理器核子集。
还例如,确定执行该当前阶段的请求的第一处理器核集合,包括:查询绑核关系,确定用于执行该当前阶段的请求的该第一处理器核集合,该绑核关系用于指示该当前阶段的请求与该第一处理器核集合之间的关联关系。
具体地,处理器核1中的调度模块503可以查询绑核关系,该绑核关系中指示了为该业务请求的每一阶段的请求分配的处理器核集合,每个处理器核集合中包括多个处理器核,处理器核1中的调度模块503根据该绑核关系,确定执行当前阶段的请求的第一处理器核集合。
例如,处理器核1中的调度模块503查询该绑核关系,确定包含处理器核1、处理器核2、处理器核4、处理器核7与处理器核9的处理器核集合与当前阶段的请求之间存在关联关系,进而将该处理器核集合确定为执行当前阶段的请求的第一处理器核集合。
603,向该第一处理器核集合中负载最轻的处理器核发送该当前阶段的请求。
具体地,在确定了用于执行当前阶段的请求的第一处理器核集合后,处理器核1中的调度模块503将该业务请求发送至第一处理器核集合中的负载最轻的处理器核,由该处理器核执行当前阶段的请求。
例如,处理器核1中的调度模块503确定第一处理器核集合中的处理器核1、处理器核2、处理器核4、处理器核7与处理器核9中负载最轻的处理器核为处理器核7,则处理器核1中的调度模块503将业务请求发送至处理器核7,由处理器核7执行当前阶段的请求。
当处理器核7完成对该当前阶段的请求的执行后,该业务请求便进入下一执行阶段,该处理器核7中的调度模块503根据保存的绑核关系,确定用于执行业务请求的下一阶段的请求的处理器核集合,并将该下一阶段的请求发送至该处理器核集合中的负载最轻的处理器核,由该处理器核执行该下一阶段的请求。
依次重复上述操作,直至最终完成对该业务请求的处理。
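上述步骤601至603的调度流程可以用如下示意性草图表示(其中binding、loads与dispatch等名称、核编号及集合划分均为说明用的假设,并非本发明实施例的正式实现):

```python
# 绑核关系:阶段 -> 处理器核集合(编号与集合均为示例性假设)
binding = {
    0: [1, 2, 4, 7, 9],
    1: [0, 3, 5],
}
# 每个处理器核上待处理请求的数量,用作负载的估计
loads = {core: 0 for core in range(10)}

def dispatch(stage):
    """查询绑核关系,将该阶段的请求发给集合中负载最轻的核,返回核编号。"""
    cores = binding[stage]                       # 该阶段绑定的处理器核集合
    target = min(cores, key=lambda c: loads[c])  # 负载最轻的处理器核
    loads[target] += 1                           # 请求进入该核的待处理队列
    return target
```

当业务请求的当前阶段执行完毕后,下一阶段的请求再以同样方式在其绑定的处理器核集合内调度。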
通过将待处理的业务请求划分为多个阶段执行,为每一阶段的请求分配一定数量的处理器核(例如,处理器核集合),并将每一阶段的请求均发送至为该阶段的请求分配的处理器核集合中的负载最轻的处理器核,而不是将业务请求发送至存储***中多个处理器核当中负载最轻的处理器核,本发明实施例的处理业务请求的方法能够保证处理器核之间的负载均衡;同时,为业务请求每个阶段的请求确定处理器核集合,在处理器核集合范围内调度当前阶段的请求,相对于直接选择存储***中负载最轻的处理器核,考虑了各阶段的请求与处理器核处理各阶段的请求的时延之间的相关性,降低了处理业务请求的时延。
可选地,该第一处理器核集合中包括K个处理器核,K为大于或等于3的整数,该向该第一处理器核集合中负载最轻的处理器核发送该当前阶段的请求,包括:根据滑动窗口长度w与滑动步长d,在该K个处理器核中为该当前阶段的请求确定调度子区域,该调度子区域中包括w个处理器核,w为大于或等于2且小于K的整数,d为大于或等于1且小于K的整数;向该w个处理器核中负载最轻的处理器核发送该当前阶段的请求。
具体地,调度模块503在确定了用于执行当前阶段的请求的第一处理器核集合后,可以将当前阶段的请求发送至该第一处理器核集合中的负载最轻的处理器核,由该处理器核执行当前阶段的请求;或者,还可以基于滑动窗口机制确定执行该当前阶段的请求的处理器核。
调度模块503在确定用于执行当前阶段的请求的第一处理器核集合后,可以根据滑动窗口长度w与滑动步长d,在根据绑核关系确定的第一处理器核集合中为该当前阶段的请求确定调度子区域,从该调度子区域包括的处理器核中确定负载最轻的处理器核,将该业务请求发送至该调度子区域中负载最轻的处理器核。
例如,调度模块503根据绑核关系确定的用于执行当前阶段的请求的第一处理器核集合中的处理器核为处理器核1、处理器核3、处理器核4、处理器核5、处理器核8、处理器核9与处理器核10(即,K=7)。例如,w=3,d=2,则调度模块503为当前阶段的请求确定的调度子区域如图7所示,从图7中可以看出,调度子区域中包括的处理器核为处理器核1、处理器核3、处理器核4,则调度模块503将当前阶段的请求发送至处理器核1、处理器核3、处理器核4中负载最轻的处理器核,由该负载最轻的处理器核执行该当前阶段的请求。
当包含处理器核1、处理器核3、处理器核4、处理器核5、处理器核8、处理器核9与处理器核10的处理器核集合还用于处理该当前阶段的请求之后的其他业务请求的某一阶段的请求时,该其他业务请求的某一阶段的请求的调度子区域是将滑动窗口向后滑动两个处理器核后,由处理器核4、处理器核5、处理器核8形成的子区域,调度模块503将该其他业务请求的某一阶段的请求发送至处理器核4、处理器核5、处理器核8中负载最轻的处理器核,由该处理器核执行该其他业务请求的某一阶段的请求。
在确定执行任一阶段的请求的处理器核时,通过引入滑动窗口机制,缩小搜索负载最轻的处理器核的搜索范围,使调度子区域中负载最轻的处理器核执行相应阶段的请求,本发明实施例的处理业务请求的方法在保证处理器核之间的负载均衡的同时,考虑了各阶段的请求与处理器核处理各阶段的请求的时延之间的相关性,进一步降低了处理业务请求的时延。
引入滑动窗口机制后,当多个阶段的请求与同一处理器核集合之间存在绑定关系,且该处理器核集合中的每个处理器核的负载相同时,在依次处理该多个阶段的请求时,为了保证处理器核间的负载均衡,需要保证负载相同(即,待处理的请求队列的个数相同)的处理器核被选中用于执行请求的概率相同,即,需要保证每个处理器核作为滑动窗口内的搜索起始点的概率相同;当该处理器核集合中的处理器核的个数K与滑动步长d互为质数时,能够保证每个处理器核作为滑动窗口内的搜索起始点的概率相同。
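上述滑动窗口机制可以用如下草图复现:取K=7、w=3、d=2(K与d互质),核编号沿用图7示例;连续滑动K次后,每个处理器核恰好各作一次窗口的搜索起始点(代码仅为说明用的草图):

```python
K, w, d = 7, 3, 2                # 核数K=7与滑动步长d=2互质
cores = [1, 3, 4, 5, 8, 9, 10]   # 沿用图7示例中第一处理器核集合的核编号

start, starts, windows = 0, [], []
for _ in range(K):               # 连续调度K次
    windows.append([cores[(start + j) % K] for j in range(w)])
    starts.append(cores[start])  # 记录本次窗口的搜索起始核
    start = (start + d) % K      # 窗口按步长d向后滑动

# 首个调度子区域为核1、3、4,下一个为核4、5、8,与图7示例一致;
# K次滑动后每个核恰好各作一次窗口起点,起点概率均等
```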
需要说明的是,在该存储***刚开始运行时,该绑核关系可以是预先配置好的,后面由处理器核中的绑核关系计算模块502对该绑核关系进行更新,即生成新的绑核关系。
下面对本发明实施例提供的生成新的绑核关系的方法进行详细说明。
作为示例而非限定,该方法还包括:根据该第一处理器核集合,重新确定执行该当前阶段的请求的处理器核的数量;根据该重新确定的执行该当前阶段的请求的处理器核的数量,在该多个处理器核中为该当前阶段的请求分配满足该数量的第二处理器核集合;根据该第二处理器核集合,生成新的绑核关系,该新的绑核关系用于指示该当前阶段的请求与该第二处理器核集合之间的关联关系。
具体地,随着存储***的运行,处理器核S中的负载均衡模块501针对业务请求的多个阶段的请求,周期性地确定用于执行每一阶段的请求的处理器核集合中的处理器核的数量,并将该数量提供给绑核关系计算模块502;绑核关系计算模块502根据负载均衡模块501重新确定的数量,为每一阶段的请求重新分配满足相应数量的处理器核,并据此周期性地生成新的绑核关系。
以下以负载均衡模块501重新确定用于执行当前阶段的请求的处理器核的数量的方法为例,对重新确定用于执行每一阶段的请求的处理器核的数量的方法进行说明。
作为示例而非限定,该根据该第一处理器核集合,重新确定执行该当前阶段的请求的处理器核的数量,包括:确定该第一处理器核集合中的处理器核的利用率总和与该多个处理器核的平均利用率;根据该第一处理器核集合中的处理器核的利用率总和与该多个处理器核的平均利用率,重新确定执行该当前阶段的请求的处理器核的数量。
具体地,负载均衡模块501实时监控存储***中的每个处理器核的利用率,其中,处理器核的利用率为处理器核的运行时间与运行时间加空闲时间之和的比值,根据处理器核的利用率的变化情况,重新确定用于执行当前阶段的请求的处理器核集合中的处理器核的数量。
例如,在当前监控周期内,当前阶段的请求绑定的第一处理器核集合表示为P,第一处理器核集合的利用率用U_P表示,则第一处理器核集合的利用率U_P等于第一处理器核集合中的处理器核在当前周期内的利用率的总和,表示为:
U_P=Σ_{j∈P}U_j  (1)
其中,U_j表示第一处理器核集合中的任一处理器核在当前周期内的利用率。
将存储***中的多个处理器核在当前周期内的平均利用率表示为U_ave,则负载均衡模块501根据U_P与U_ave重新确定用于执行当前阶段的请求的处理器核集合中的处理器核的数量。
作为示例而非限定,该根据该第一处理器核集合中的处理器核的利用率总和与该多个处理器核的平均利用率,重新确定执行该当前阶段的请求的处理器核的数量,包括:根据该第一处理器核集合中的处理器核的利用率总和与该多个处理器核的平均利用率,基于以下关系式重新确定执行该当前阶段的请求的处理器核的数量:
N=U_P/U_ave  (2)
其中,N为重新确定的执行该当前阶段的请求的处理器核的数量,U_P为该第一处理器核集合中的处理器核的利用率总和,U_ave为该多个处理器核的平均利用率。
当负载均衡模块501在当前周期内重新确定出用于执行该当前阶段的请求的处理器核的数量N后,将确定的用于执行该当前阶段的请求的处理器核集合中的处理器核的数量提供给绑核关系计算模块502,由绑核关系计算模块502在下一周期的起始时刻为当前阶段的请求重新分配满足上述数量N的处理器核集合(例如,第二处理器核集合)。
例如,当前周期内用于执行当前阶段的请求的处理器核的数量为8,而当负载均衡模块501在当前周期对用于执行当前阶段的请求的处理器核的数量重新确定后,例如,负载均衡模块501在当前周期重新确定的用于执行当前阶段的请求的处理器核的数量为6,负载均衡模块501将为当前阶段的请求重新确定的处理器核的数量6提供给绑核关系计算模块502,则绑核关系计算模块502可以在下一周期的起始时刻将绑核关系中保存的用于执行当前阶段的请求的8个处理器核中删除两个处理器核,即生成新的绑核关系。
再例如,负载均衡模块501将为当前阶段的请求重新确定的处理器核的数量6提供给绑核关系计算模块502,此时绑核关系计算模块502不从绑核关系中保存的用于执行当前阶段的请求的8个处理器核中删除两个处理器核,而是在存储***中为当前阶段的请求重新分配6个处理器核,并在下一周期的起始时刻将绑核关系中原来为当前阶段的请求分配的8个处理器核替换为重新分配的该6个处理器核,即生成新的绑核关系。
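上述根据式(2)重新确定处理器核数量的计算,可以用如下草图表示(其中recompute_core_count为示例性名称,对N的取整方式是本示例的假设,原文并未限定具体取整规则):

```python
def recompute_core_count(set_utils, all_utils):
    # 式(2):N = U_P / U_ave(取整方式为本示例的假设)
    u_p = sum(set_utils)                     # 集合内处理器核利用率总和 U_P
    u_ave = sum(all_utils) / len(all_utils)  # 全部处理器核的平均利用率 U_ave
    return round(u_p / u_ave)

# 例:集合内8个核利用率均为0.3(U_P=2.4),全部核的平均利用率为0.4,
# 则重新确定的核数为6,对应上文从8个核调整为6个核的例子
```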
通过周期性地监控存储***中的处理器核的利用率,并根据为任一阶段的请求分配的处理器核的利用率的变化情况,为相应阶段的请求重新分配处理器核,从而能够根据处理器核的利用率的变化情况,周期性地调整为相应阶段的请求分配的处理器核,进而改善处理器核之间的负载不均衡的现象。
下面以绑核关系计算模块502在该存储***中为该当前阶段的请求分配满足该数量的处理器核的方法为例,对绑核关系计算模块502在该存储***中为该各个阶段的请求分配满足相应数量的处理器核的方法进行详细说明。
在存储***中,多个处理器核通常会共享不同层次的内存或缓存(cache),不同层次的内存或缓存可以包括L1 cache、L2 cache、L3 cache以及本地内存,当处理器核共享不同层次的内存或cache时,处理器核间的拓扑距离也是不同的。
在非统一内存访问架构(non uniform memory access architecture,NUMA)中,每个处理器核可以访问远端节点中的本地内存(以下简称为“远端内存”),当采用超线程通信时,每个处理器核可以被抽象为多个逻辑核。例如,每个处理器核被抽象为两个逻辑核,该两个逻辑核分别为逻辑核0与逻辑核1,如图8所示。
图8示出了NUMA架构下共享不同层次的内存或cache的逻辑核之间的拓扑距离示意图,可以看出,在NUMA架构下,存在节点0与节点1,节点0中的逻辑核可以与节点1中的逻辑核共享节点1中的本地内存,节点1中的本地内存对节点0而言是远端内存。
从图8中可以看出,节点0中共享L1 cache的两个逻辑核之间的拓扑距离为D1,共享L2 cache的两个逻辑核之间的拓扑距离为D2,共享L3 cache的两个逻辑核之间的拓扑距离为D3,共享本地内存的两个逻辑核之间的拓扑距离为D4,节点0中的逻辑核与节点1中的逻辑核共享节点1中的本地内存时,两个逻辑核之间的拓扑距离为D5。
根据Intel发布的各版本的CPU手册,可以获取到CPU访问各个层次的内存或cache的访问时延数据。以Xeon E5-2658v2型号的CPU为例,访问时延如表1所示。
表1
共享的内存或缓存 访问时延
L1 cache 1.3ns
L2 cache 3.7ns
L3 cache 12.8ns
本地内存 56.5ns
通过参考CPU访问不同层次的内存或cache的时延的比例关系,可以量化共享不同层次的内存或cache的两个逻辑核之间的拓扑距离。假设共享L1 cache的两个逻辑核之间的拓扑距离D1=1,则根据CPU访问各个层次的内存或cache的访问时延,可以得到共享不同层次的内存或cache的两个逻辑核之间的拓扑距离,如表2所示。
在NUMA架构中,访问本地内存和远端内存的访问时延比大约是8:12,因此,可以计算出节点之间共享远端内存的逻辑核之间的拓扑距离为64。
表2
共享的内存或缓存 两个逻辑核之间的拓扑距离
L1 cache 1
L2 cache 3
L3 cache 10
本地内存 43
远端内存 64
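表1的访问时延与表2的拓扑距离之间的比例换算,可以用如下草图复算(以D1=1为基准,远端内存按本地内存与远端内存约8:12的时延比估算;取整方式为本示例的假设,此处Python的round为就近取偶舍入):

```python
# 表1中Xeon E5-2658v2访问各层次内存/cache的访问时延(单位:ns)
latency = {"L1": 1.3, "L2": 3.7, "L3": 12.8, "local_mem": 56.5}

# 以共享L1 cache的拓扑距离D1=1为基准,按时延比例量化其余距离
dist = {k: round(v / latency["L1"]) for k, v in latency.items()}

# 远端内存:本地内存与远端内存的访问时延比约为8:12
dist["remote_mem"] = round(dist["local_mem"] * 12 / 8)

# 结果与表2一致:L1=1,L2=3,L3=10,本地内存=43,远端内存=64
```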
下面以存储***中的CPU满足图8所示的拓扑结构、为当前阶段的请求分配满足相应数量的逻辑核为例,对本发明实施例的绑核关系计算模块502在存储***中为各个阶段的请求分配满足相应数量的处理器核的方法进行详细说明。其中,图8中的节点0与节点1处于NUMA架构中,并且二者之间通过超线程通信。
作为示例而非限定,在多个处理器核中为该当前阶段的请求分配满足该数量的第二处理器核集合,包括:生成多组分配结果,每组分配结果中包括为每一个阶段的请求分配的满足相应数量的处理器核集合;针对该多组分配结果确定多个路径长度,每一组分配结果对应一个路径长度,该路径长度L满足:
L=Σ_{i=0}^{M-2} c_{i,i+1}×d_{i,i+1}  (3)
其中,c_{i,i+1}表示执行相邻阶段的请求的处理器核间交互产生的通信量,d_{i,i+1}表示执行该相邻阶段的请求的处理器核间的平均拓扑距离,M为该业务请求的多个阶段的请求的数量;其中,通信量可以表示处理器核间的交互次数。
根据该多个路径长度中的最短路径长度对应的一组分配结果,为该当前阶段的请求分配满足该数量的处理器核。
具体地,在图8所示的CPU拓扑结构中,当采用超线程通信时,每个处理器核被抽象为逻辑核0与逻辑核1,16个处理器核被抽象为32个逻辑核。
假设该业务请求需要分为3个阶段的请求进行处理,该3个阶段的请求分别记为M0、M1与M2,例如,通过上述的确定用于执行当前阶段的请求的处理器核的数量的方法,在当前周期内分别确定用于执行M0、M1与M2的逻辑核的数量。其中,确定用于执行M0的逻辑核的数量为8,确定用于执行M1的逻辑核的数量为8,确定用于执行M2的逻辑核的数量为16。
绑核关系计算模块502根据为M0、M1与M2确定的逻辑核的数量,生成多组分配结果,每组分配结果中包括为每一阶段的请求分配的满足相应数量的逻辑核。
例如,分配结果1为:将节点0中的逻辑核0~7分配给M0,将节点0的逻辑核8~15分配给M1,将节点1的逻辑核0~15分配给M2。
分配结果2为:将节点0中的逻辑核0~3与节点1中的逻辑核0~3分配给M0,将节点0中的逻辑核4~7与节点1中的逻辑核4~7分配给M1,将节点0中的逻辑核8~15与节点1中的逻辑核8~15分配给M2。
针对分配结果1,使用式(3)计算路径长度,其中,将执行M0与M1时逻辑核之间的平均拓扑距离记为d_{0,1},将执行M1与M2时逻辑核之间的平均拓扑距离记为d_{1,2},则d_{0,1}=D4,d_{1,2}=D5,将执行M0与M1时逻辑核之间交互产生的通信量记为c_{0,1},将执行M1与M2时逻辑核之间交互产生的通信量记为c_{1,2},则分配结果1对应的路径长度L1满足:
L1=c_{0,1}×D4+c_{1,2}×D5  (4)
由表2得知,D4=43,D5=64,则L1=c_{0,1}×43+c_{1,2}×64。
针对分配结果2,使用式(3)计算路径长度,其中,将执行M0与M1时逻辑核之间的平均拓扑距离记为d_{0,1},将执行M1与M2时逻辑核之间的平均拓扑距离记为d_{1,2},则d_{0,1}=D3×0.5+D5×0.5,d_{1,2}=D4×0.5+D5×0.5,将执行M0与M1时逻辑核之间交互产生的通信量记为c_{0,1},将执行M1与M2时逻辑核之间交互产生的通信量记为c_{1,2},则分配结果2对应的路径长度L2满足:
L2=c_{0,1}×(D3×0.5+D5×0.5)+c_{1,2}×(D4×0.5+D5×0.5)  (5)
由表2得知,D3=10,D4=43,D5=64,则L2=c_{0,1}×37+c_{1,2}×53.5。
可以看出,相对于分配结果1,分配结果2对应的路径长度较短,因此,绑核关系计算模块502将节点0中的逻辑核0~3与节点1中的逻辑核0~3分配给M0,将节点0中的逻辑核4~7与节点1中的逻辑核4~7分配给M1,将节点0中的逻辑核8~15与节点1中的逻辑核8~15分配给M2,并在下一周期的起始时刻将绑核关系中原来为该业务请求的各个阶段的请求分配的处理器核替换为重新分配的处理器核。
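上文分配结果1与分配结果2的路径长度比较,可以用如下草图复算(通信量c_{0,1}与c_{1,2}取1仅为示例性假设):

```python
D3, D4, D5 = 10, 43, 64   # 表2中的拓扑距离

def path_length(pairs):
    # 式(3):L = Σ c_{i,i+1} × d_{i,i+1},pairs为(通信量, 平均拓扑距离)列表
    return sum(c * d for c, d in pairs)

c01 = c12 = 1   # 相邻阶段间的通信量(示例性假设)

# 分配结果1:M0与M1共享本地内存(D4),M1与M2跨节点共享远端内存(D5)
L1 = path_length([(c01, D4), (c12, D5)])
# 分配结果2:平均拓扑距离分别为0.5*D3+0.5*D5与0.5*D4+0.5*D5
L2 = path_length([(c01, 0.5 * D3 + 0.5 * D5), (c12, 0.5 * D4 + 0.5 * D5)])

best = 1 if L1 < L2 else 2   # 选择路径长度较短的分配结果
# L1=107,L2=90.5,故选择分配结果2,与上文结论一致
```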
根据上面实施例生成的多组处理器核的分配结果,针对该多组分配结果确定多个路径长度,在为各个阶段的请求分配处理器核时考虑处理器核间的拓扑距离,并将多个路径长度中的最短路径长度对应的分配结果确定为最终的处理器核分配结果,从而在保证处理器核之间的负载均衡的同时,为业务请求每个阶段的请求确定处理器核集合,在处理器核集合范围内调度当前阶段的请求,相对于直接选择存储***中负载最轻的处理器核,考虑了各阶段的请求与处理器核处理各阶段的请求的时延之间的相关性,降低了处理业务请求的时延。
需要说明的是,上述列举的两种逻辑核的分配结果仅是为了说明问题所做的示例性说明,并不对本发明实施例构成任何限定,实际应用中可以随机生成多种分配结果,并按照该多组分配结果中与最短路径长度对应的分配结果为各个阶段的请求分配逻辑核。示例性的,本发明实施例中,还可以基于处理器之间的连接关系或总线类型等影响时延的因素,为每一个阶段的请求分配处理器核集合。本发明实施例对此不作限定。
下面对本发明实施例提供的处理业务请求的配置方法700进行详细说明。图9示出了处理业务请求的配置方法的示意性流程图。
701,为业务请求的第一阶段的请求配置第一处理器核集合,该第一处理器核集合用于执行该第一阶段的请求。
具体地,业务请求的处理分为多个阶段进行,多个阶段对应多个阶段的请求,例如,该多个阶段的请求包括第一阶段的请求,为该第一阶段的请求配置一个处理器核集合(例如,第一处理器核集合),通过该第一处理器核集合来处理第一阶段的请求。
702,配置第一规则,该第一规则指示向该第一处理器核集合中负载最轻的处理器核发送该第一阶段的请求。
具体地,可以配置第一规则,该第一规则可以指示为该第一阶段的请求配置的第一处理器核集合中的负载最轻的处理器核执行该第一阶段的请求。
可选地,该方法还包括:
703,为业务请求的第二阶段的请求配置第二处理器核集合,该第二处理器核集合用于执行该第二阶段的请求。
具体地,例如,该业务请求还包括第二阶段的请求,该第二阶段的请求可以是该第一阶段的请求之后的一个阶段的请求,为该第二阶段的请求配置一个处理器核集合(例如,第二处理器核集合),通过该第二处理器核集合来处理第二阶段的请求。
704,配置第二规则,该第二规则指示向该第二处理器核集合中负载最轻的处理器核发送该第二阶段的请求。
具体地,可以配置第二规则,该第二规则可以指示为该第二阶段的请求配置的第二处理器核集合中的负载最轻的处理器核执行该第二阶段的请求。
关于如何为第一阶段的请求与第二阶段的请求配置相应的处理器核集合,请参考方法600中的相关描述,为了简洁,此处不再赘述。
通过为业务请求的每一阶段的请求分配一定数量的处理器核(例如,处理器核集合),并将每一阶段的请求均发送至为该阶段的请求分配的处理器核集合中的负载最轻的处理器核,相对于将业务请求发送至存储***中多个处理器核当中负载最轻的处理器核,本发明实施例的处理业务请求的配置方法在处理业务请求时能够保证处理器核之间的负载均衡;同时,为业务请求每个阶段的请求确定处理器核集合,在处理器核集合范围内调度当前阶段的请求,相对于直接选择存储***中负载最轻的处理器核,考虑了各阶段的请求与处理器核处理各阶段的请求的时延之间的相关性,降低了处理业务请求的时延。
需要说明的是,上述仅以业务请求包括第一阶段的请求与第二阶段的请求为例进行说明,并不对本发明实施例构成特别限定,例如,该业务请求还可以包括其他阶段的请求。
进一步的,上述配置方法实施例中确定处理器核集合的方法可以参考前面本发明实施例相关部分的描述,在此不再赘述。
上文结合图6至图9,描述了本发明实施例提供的存储***中处理业务请求的方法与处理业务请求的配置方法,下面结合图10至图11描述本发明实施例提供的处理业务请求的装置与存储***。
图10为本发明实施例提供的处理业务请求的装置800的示意性框图,该装置800配置于存储***中,包括收发模块801与处理模块802。
收发模块801,用于接收业务请求的当前阶段的请求,该当前阶段的请求为该业务请求的多个阶段的请求中的一个阶段的请求。
处理模块802,用于确定执行该当前阶段的请求的第一处理器核集合,该第一处理器核集合为该多个处理器核的一个处理器核子集。
收发模块801,还用于向该第一处理器核集合中负载最轻的处理器核发送该当前阶段的请求。
可选地,处理模块802,还用于查询绑核关系,确定用于执行该当前阶段的请求的该第一处理器核集合,该绑核关系用于指示该当前阶段的请求与该第一处理器核集合之间的关联关系。
可选地,该处理模块802,还用于根据该第一处理器核集合,重新确定执行该当前阶段的请求的处理器核的数量;根据该重新确定的执行该当前阶段的请求的处理器核的数量,在该多个处理器核中为该当前阶段的请求分配满足该数量的第二处理器核集合;根据 该第二处理器核集合,生成新的绑核关系,该新的绑核关系用于指示该当前阶段的请求与该第二处理器核集合之间的关联关系。
可选地,该处理模块802,还用于确定该第一处理器核集合中的处理器核的利用率总和与该多个处理器核的平均利用率;根据该第一处理器核集合中的处理器核的利用率总和与该多个处理器核的平均利用率,重新确定执行该当前阶段的请求的处理器核的数量。
可选地,该处理模块802,还用于根据该第一处理器核集合中的处理器核的利用率总和与该多个处理器核的平均利用率,基于以下关系式重新确定执行该当前阶段的请求的处理器核的数量:
N=U_P/U_ave
其中,N为重新确定的执行该当前阶段的请求的处理器核的数量,U_P为该第一处理器核集合中的处理器核的利用率总和,U_ave为该多个处理器核的平均利用率。
可选地,该处理模块802,还用于生成多组分配结果,每组分配结果中包括为每一个阶段的请求重新分配的满足相应数量的处理器核集合;针对该多组分配结果确定多个路径长度,每一组分配结果对应一个路径长度,该路径长度L满足:
L=Σ_{i=0}^{M-2} c_{i,i+1}×d_{i,i+1}
其中,c_{i,i+1}表示执行相邻阶段的请求的处理器核间交互产生的通信量,d_{i,i+1}表示执行该相邻阶段的请求的处理器核间的平均拓扑距离,M为该业务请求的多个阶段的请求的数量;根据该多个路径长度中的最短路径长度对应的一组分配结果,为该当前阶段的请求分配满足该数量的第二处理器核集合。
可选地,该第一处理器核集合中包括K个处理器核,K为大于或等于3的整数,该处理模块802,还用于根据滑动窗口长度w与滑动步长d,在该K个处理器核中为该当前阶段的请求确定调度子区域,该调度子区域中包括w个处理器核,w为大于或等于2且小于K的整数,d为大于或等于1且小于K的整数;
该收发模块801,还用于向该w个处理器核中负载最轻的处理器核发送该当前阶段的请求。
可选地,d与K互为质数。
根据本发明实施例的处理业务请求的装置800可对应于执行本发明实施例中描述的方法600或方法700,并且装置800中的各个模块的上述和其它操作和/或功能分别为了实现图6中的方法600或图9中的方法700的相应流程,相应的,图5所示的各个模块可以对应到图10所示的一个或多个模块。为了简洁,在此不再赘述。
进一步的,本发明实施例的处理业务请求的装置800具体实现可以是处理器,或者软件模块,或者处理器与软件模块的组合等,本发明实施例对此不作限定。
图11为本发明实施例提供的存储***900的示意性框图,该存储***包括处理器901与存储器902,处理器901包括多个处理器核;
存储器902,用于存储计算机指令;
该多个处理器核中的一个或多个处理器核用于执行该存储器902中存储的计算机指令,当该存储器902中的计算机指令被执行时,该一个或多个处理器核用于执行下列操作:接收业务请求的当前阶段的请求,该当前阶段的请求为该业务请求的多个阶段的请求中的一个阶段的请求;确定执行该当前阶段的请求的第一处理器核集合,该第一处理器核集合为该多个处理器核的一个处理器核子集;向该第一处理器核集合中负载最轻的处理器核发送该当前阶段的请求。
可选地,该一个或多个处理器核,还用于查询绑核关系,确定用于执行该当前阶段的请求的该第一处理器核集合,该绑核关系用于指示该当前阶段的请求与该第一处理器核集合之间的关联关系。
可选地,该一个或多个处理器核,还用于根据该第一处理器核集合,重新确定执行该当前阶段的请求的处理器核的数量;根据该重新确定的执行该当前阶段的请求的处理器核的数量,在该多个处理器核中为该当前阶段的请求分配满足该数量的第二处理器核集合;根据该第二处理器核集合,生成新的绑核关系,该新的绑核关系用于指示该当前阶段的请求与该第二处理器核集合之间的关联关系。
可选地,该一个或多个处理器核,还用于确定该第一处理器核集合中的处理器核的利用率总和与该多个处理器核的平均利用率;根据该第一处理器核集合中的处理器核的利用率总和与该多个处理器核的平均利用率,重新确定执行该当前阶段的请求的处理器核的数量。
可选地,该一个或多个处理器核,还用于根据该第一处理器核集合中的处理器核的利用率总和与该多个处理器核的平均利用率,基于以下关系式重新确定执行该当前阶段的请求的处理器核的数量:
N=U_P/U_ave
其中,N为重新确定的执行该当前阶段的请求的处理器核的数量,U_P为该第一处理器核集合中的处理器核的利用率总和,U_ave为该多个处理器核的平均利用率。
可选地,该一个或多个处理器核,还用于生成多组分配结果,每组分配结果中包括为每一个阶段的请求重新分配的满足相应数量的处理器核集合;针对该多组分配结果确定多个路径长度,每一组分配结果对应一个路径长度,该路径长度L满足:
L=Σ_{i=0}^{M-2} c_{i,i+1}×d_{i,i+1}
其中,c_{i,i+1}表示执行相邻阶段的请求的处理器核间交互产生的通信量,d_{i,i+1}表示执行该相邻阶段的请求的处理器核间的平均拓扑距离,M为该业务请求的多个阶段的请求的数量;根据该多个路径长度中的最短路径长度对应的一组分配结果,为该当前阶段的请求分配满足该数量的第二处理器核集合。
可选地,该第一处理器核集合中包括K个处理器核,K为大于或等于3的整数,该一个或多个处理器核,还用于根据滑动窗口长度w与滑动步长d,在该K个处理器核中为该当前阶段的请求确定调度子区域,该调度子区域中包括w个处理器核,w为大于或等于2且小于K的整数,d为大于或等于1且小于K的整数;向该w个处理器核中负载最轻的处理器核发送该当前阶段的请求。
可选地,该d与该K互为质数。
本发明实施例图5所示的各模块可以为处理器核中的硬件逻辑,也可以是处理器核执行的计算机指令,或者硬件逻辑与计算机指令的组合等,本发明实施例对此不作限定。
根据本发明实施例的处理业务请求的装置800的各模块可以由处理器实现,也可以由处理器与存储器共同实现,也可以由软件模块实现。相应的,图5所示的各个模块可以对应到图10所示的一个或多个模块,图10所示的模块包含图5所示的模块的相应功能。
本发明实施例提供了一种计算机可读存储介质,该计算机可读存储介质中存储有计算机指令,当该计算机指令在计算机上运行时,使得计算机执行本发明实施例中的处理业务请求的方法或处理业务请求的配置方法。
本发明实施例提供了一种包含计算机指令的计算机程序产品,当该计算机指令在计算机上运行时,使得计算机执行本发明实施例中的处理业务请求的方法或处理业务请求的配置方法。
应理解,本发明实施例中提及的处理器可以是中央处理单元(central processing unit,CPU),还可以是其他通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
还应理解,本发明实施例中提及的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。
需要说明的是,当处理器为通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件时,存储器(存储模块)集成在处理器中。
应注意,本文描述的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明实施例的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的***、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本发明实施例所提供的几个实施例中,应该理解到,所揭露的***、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个***,或一些特征可以忽略,或不执行。另一点, 所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干计算机指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储计算机指令的介质。
以上所述,仅为本发明实施例的具体实施方式,但本发明实施例的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明实施例揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明实施例的保护范围之内。因此,本发明实施例的保护范围应以所述权利要求的保护范围为准。

Claims (24)

  1. 一种存储***中处理业务请求的方法,所述存储***包含多个处理器核,其特征在于,包括:
    接收业务请求的当前阶段的请求,所述当前阶段的请求为所述业务请求的多个阶段的请求中的一个阶段的请求;
    确定执行所述当前阶段的请求的第一处理器核集合,所述第一处理器核集合为所述多个处理器核的一个处理器核子集;
    向所述第一处理器核集合中负载最轻的处理器核发送所述当前阶段的请求。
  2. 根据权利要求1所述的方法,其特征在于,所述确定执行所述当前阶段的请求的第一处理器核集合,包括:
    查询绑核关系,确定用于执行所述当前阶段的请求的所述第一处理器核集合,所述绑核关系用于指示所述当前阶段的请求与所述第一处理器核集合之间的关联关系。
  3. 根据权利要求2所述的方法,其特征在于,所述方法还包括:
    根据所述第一处理器核集合,重新确定执行所述当前阶段的请求的处理器核的数量;
    根据所述重新确定的执行所述当前阶段的请求的处理器核的数量,在所述多个处理器核中为所述当前阶段的请求分配满足所述数量的第二处理器核集合;
    根据所述第二处理器核集合,生成新的绑核关系,所述新的绑核关系用于指示所述当前阶段的请求与所述第二处理器核集合之间的关联关系。
  4. 根据权利要求3所述的方法,其特征在于,所述根据所述第一处理器核集合,重新确定执行所述当前阶段的请求的处理器核的数量,包括:
    确定所述第一处理器核集合中的处理器核的利用率总和与所述多个处理器核的平均利用率;
    根据所述第一处理器核集合中的处理器核的利用率总和与所述多个处理器核的平均利用率,重新确定执行所述当前阶段的请求的处理器核的数量。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述第一处理器核集合中的处理器核的利用率总和与所述多个处理器核的平均利用率,重新确定执行所述当前阶段的请求的处理器核的数量,包括:
    根据所述第一处理器核集合中的处理器核的利用率总和与所述多个处理器核的平均利用率,基于以下关系式重新确定执行所述当前阶段的请求的处理器核的数量:
    N=U_P/U_ave
    其中,N为重新确定的执行所述当前阶段的请求的处理器核的数量,U_P为所述第一处理器核集合中的处理器核的利用率总和,U_ave为所述多个处理器核的平均利用率。
  6. 根据权利要求3至5中任一项所述的方法,其特征在于,所述在所述多个处理器核中为所述当前阶段的请求分配满足所述数量的第二处理器核集合,包括:
    生成多组分配结果,每组分配结果中包括为每一个阶段的请求重新分配的满足相应数量的处理器核集合;
    针对所述多组分配结果确定多个路径长度,每一组分配结果对应一个路径长度,所述路径长度L满足:
    L=Σ_{i=0}^{M-2} c_{i,i+1}×d_{i,i+1}
    其中,c_{i,i+1}表示执行相邻阶段的请求的处理器核间交互产生的通信量,d_{i,i+1}表示执行所述相邻阶段的请求的处理器核间的平均拓扑距离,M为所述业务请求的多个阶段的请求的数量;
    根据所述多个路径长度中的最短路径长度对应的一组分配结果,为所述当前阶段的请求分配满足所述数量的第二处理器核集合。
  7. 根据权利要求1至6中任一项所述的方法,其特征在于,所述第一处理器核集合中包括K个处理器核,K为大于或等于3的整数,所述向所述第一处理器核集合中负载最轻的处理器核发送所述当前阶段的请求,包括:
    根据滑动窗口长度w与滑动步长d,在所述K个处理器核中为所述当前阶段的请求确定调度子区域,所述调度子区域中包括w个处理器核,w为大于或等于2且小于K的整数,d为大于或等于1且小于K的整数;
    向所述w个处理器核中负载最轻的处理器核发送所述当前阶段的请求。
  8. 根据权利要求7所述的方法,其特征在于,所述d与所述K互为质数。
  9. 一种处理业务请求的装置,其特征在于,所述装置配置于存储***中,包括:
    收发模块,用于接收业务请求的当前阶段的请求,所述当前阶段的请求为所述业务请求的多个阶段的请求中的一个阶段的请求;
    处理模块,用于确定执行所述当前阶段的请求的第一处理器核集合,所述第一处理器核集合为所述多个处理器核的一个处理器核子集;
    所述收发模块,还用于向所述第一处理器核集合中负载最轻的处理器核发送所述当前阶段的请求。
  10. 根据权利要求9所述的装置,其特征在于,所述处理模块,还用于查询绑核关系,确定用于执行所述当前阶段的请求的所述第一处理器核集合,所述绑核关系用于指示所述当前阶段的请求与所述第一处理器核集合之间的关联关系。
  11. 根据权利要求10所述的装置,其特征在于,所述处理模块,还用于根据所述第一处理器核集合,重新确定执行所述当前阶段的请求的处理器核的数量;根据所述重新确定的执行所述当前阶段的请求的处理器核的数量,在所述多个处理器核中为所述当前阶段的请求分配满足所述数量的第二处理器核集合;根据所述第二处理器核集合,生成新的绑核关系,所述新的绑核关系用于指示所述当前阶段的请求与所述第二处理器核集合之间的关联关系。
  12. 根据权利要求11所述的装置,其特征在于,所述处理模块,还用于确定所述第一处理器核集合中的处理器核的利用率总和与所述多个处理器核的平均利用率;根据所述第一处理器核集合中的处理器核的利用率总和与所述多个处理器核的平均利用率,重新确定执行所述当前阶段的请求的处理器核的数量。
  13. 根据权利要求12所述的装置,其特征在于,所述处理模块,还用于根据所述第一处理器核集合中的处理器核的利用率总和与所述多个处理器核的平均利用率,基于以下关系式重新确定执行所述当前阶段的请求的处理器核的数量:
    N=U_P/U_ave
    其中,N为重新确定的执行所述当前阶段的请求的处理器核的数量,U_P为所述第一处理器核集合中的处理器核的利用率总和,U_ave为所述多个处理器核的平均利用率。
  14. 根据权利要求11至13中任一项所述的装置,其特征在于,所述处理模块,还用于
    生成多组分配结果,每组分配结果中包括为每一个阶段的请求重新分配的满足相应数量的处理器核集合;针对所述多组分配结果确定多个路径长度,每一组分配结果对应一个路径长度,所述路径长度L满足:
    L=Σ_{i=0}^{M-2} c_{i,i+1}×d_{i,i+1}
    其中,c_{i,i+1}表示执行相邻阶段的请求的处理器核间交互产生的通信量,d_{i,i+1}表示执行所述相邻阶段的请求的处理器核间的平均拓扑距离,M为所述业务请求的多个阶段的请求的数量;根据所述多个路径长度中的最短路径长度对应的一组分配结果,为所述当前阶段的请求分配满足所述数量的第二处理器核集合。
  15. 根据权利要求9至14中任一项所述的装置,其特征在于,所述第一处理器核集合中包括K个处理器核,K为大于或等于3的整数,所述处理模块,还用于根据滑动窗口长度w与滑动步长d,在所述K个处理器核中为所述当前阶段的请求确定调度子区域,所述调度子区域中包括w个处理器核,w为大于或等于2且小于K的整数,d为大于或等于1且小于K的整数;
    所述收发模块,还用于向所述w个处理器核中负载最轻的处理器核发送所述当前阶段的请求。
  16. 根据权利要求15所述的装置,其特征在于,所述d与所述K互为质数。
  17. 一种存储***,其特征在于,所述存储***包括多个处理器核与存储器;
    存储器,用于存储计算机指令;
    所述多个处理器核中的一个或多个处理器核用于执行所述存储器中存储的计算机指令,当所述存储器中的计算机指令被执行时,所述一个或多个处理器核用于:
    接收业务请求的当前阶段的请求,所述当前阶段的请求为所述业务请求的多个阶段的请求中的一个阶段的请求;确定执行所述当前阶段的请求的第一处理器核集合,所述第一处理器核集合为所述多个处理器核的一个处理器核子集;向所述第一处理器核集合中负载最轻的处理器核发送所述当前阶段的请求。
  18. 根据权利要求17所述的存储***,其特征在于,所述一个或多个处理器核,还用于:
    查询绑核关系,确定用于执行所述当前阶段的请求的所述第一处理器核集合,所述绑核关系用于指示所述当前阶段的请求与所述第一处理器核集合之间的关联关系。
  19. 根据权利要求18所述的存储***,其特征在于,所述一个或多个处理器核,还用于:
    根据所述第一处理器核集合,重新确定执行所述当前阶段的请求的处理器核的数量;根据所述重新确定的执行所述当前阶段的请求的处理器核的数量,在所述多个处理器核中为所述当前阶段的请求分配满足所述数量的第二处理器核集合;根据所述第二处理器核集合,生成新的绑核关系,所述新的绑核关系用于指示所述当前阶段的请求与所述第二处理器核集合之间的关联关系。
  20. 根据权利要求19所述的存储***,其特征在于,所述一个或多个处理器核,还用于:
    确定所述第一处理器核集合中的处理器核的利用率总和与所述多个处理器核的平均利用率;根据所述第一处理器核集合中的处理器核的利用率总和与所述多个处理器核的平均利用率,重新确定执行所述当前阶段的请求的处理器核的数量。
  21. 根据权利要求20所述的存储***,其特征在于,所述一个或多个处理器核,还用于:
    根据所述第一处理器核集合中的处理器核的利用率总和与所述多个处理器核的平均利用率,基于以下关系式重新确定执行所述当前阶段的请求的处理器核的数量:
    N=U_P/U_ave
    其中,N为重新确定的执行所述当前阶段的请求的处理器核的数量,U_P为所述第一处理器核集合中的处理器核的利用率总和,U_ave为所述多个处理器核的平均利用率。
  22. 根据权利要求19至21中任一项所述的存储***,其特征在于,所述一个或多个处理器核,还用于:
    生成多组分配结果,每组分配结果中包括为每一个阶段的请求重新分配的满足相应数量的处理器核集合;针对所述多组分配结果确定多个路径长度,每一组分配结果对应一个路径长度,所述路径长度L满足:
    L=Σ_{i=0}^{M-2} c_{i,i+1}×d_{i,i+1}
    其中,c_{i,i+1}表示执行相邻阶段的请求的处理器核间交互产生的通信量,d_{i,i+1}表示执行所述相邻阶段的请求的处理器核间的平均拓扑距离,M为所述业务请求的多个阶段的请求的数量;根据所述多个路径长度中的最短路径长度对应的一组分配结果,为所述当前阶段的请求分配满足所述数量的第二处理器核集合。
  23. 根据权利要求17至22中任一项所述的存储***,其特征在于,所述第一处理器核集合中包括K个处理器核,K为大于或等于3的整数,所述一个或多个处理器核,还用于:
    根据滑动窗口长度w与滑动步长d,在所述K个处理器核中为所述当前阶段的请求确定调度子区域,所述调度子区域中包括w个处理器核,w为大于或等于2且小于K的整数,d为大于或等于1且小于K的整数;向所述w个处理器核中负载最轻的处理器核发送所述当前阶段的请求。
  24. 根据权利要求23所述的存储***,其特征在于,所述d与所述K互为质数。
PCT/CN2018/098277 2018-08-02 2018-08-02 处理业务请求的方法、装置与存储*** WO2020024207A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/098277 WO2020024207A1 (zh) 2018-08-02 2018-08-02 处理业务请求的方法、装置与存储***
CN201880005605.6A CN110178119B (zh) 2018-08-02 2018-08-02 处理业务请求的方法、装置与存储***

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/098277 WO2020024207A1 (zh) 2018-08-02 2018-08-02 处理业务请求的方法、装置与存储***

Publications (1)

Publication Number Publication Date
WO2020024207A1 true WO2020024207A1 (zh) 2020-02-06

Family

ID=67689271

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/098277 WO2020024207A1 (zh) 2018-08-02 2018-08-02 处理业务请求的方法、装置与存储***

Country Status (2)

Country Link
CN (1) CN110178119B (zh)
WO (1) WO2020024207A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118069374A (zh) * 2024-04-18 2024-05-24 清华大学 数据中心智能训练仿真事务加速方法、装置、设备及介质

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231099B (zh) * 2020-10-14 2024-07-05 北京中科网威信息技术有限公司 一种处理器的内存访问方法及装置
CN114924866A (zh) * 2021-04-30 2022-08-19 华为技术有限公司 数据处理方法和相关设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090064167A1 (en) * 2007-08-28 2009-03-05 Arimilli Lakshminarayana B System and Method for Performing Setup Operations for Receiving Different Amounts of Data While Processors are Performing Message Passing Interface Tasks
CN102411510A (zh) * 2011-09-16 2012-04-11 华为技术有限公司 在多核处理器的虚拟机上映射业务数据流的方法和装置
CN102681902A (zh) * 2012-05-15 2012-09-19 浙江大学 一种基于多核***任务分配的负载均衡方法
CN102855218A (zh) * 2012-05-14 2013-01-02 中兴通讯股份有限公司 数据处理***、方法及装置
CN104391747A (zh) * 2014-11-18 2015-03-04 北京锐安科技有限公司 一种并行计算方法及装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8015392B2 (en) * 2004-09-29 2011-09-06 Intel Corporation Updating instructions to free core in multi-core processor with core sequence table indicating linking of thread sequences for processing queued packets
CN102306139A (zh) * 2011-08-23 2012-01-04 北京科技大学 用于ofdm无线通信***的异构多核数字信号处理器
CN103473120A (zh) * 2012-12-25 2013-12-25 北京航空航天大学 一种基于加速因子的多核实时***任务划分方法
US10467120B2 (en) * 2016-11-11 2019-11-05 Silexica GmbH Software optimization for multicore systems


Also Published As

Publication number Publication date
CN110178119B (zh) 2022-04-26
CN110178119A (zh) 2019-08-27

Similar Documents

Publication Publication Date Title
US10534542B2 (en) Dynamic core allocation for consistent performance in a non-preemptive scheduling environment
JP5514041B2 (ja) 識別子割当て方法及びプログラム
JP5510556B2 (ja) 仮想マシンのストレージスペースおよび物理ホストを管理するための方法およびシステム
US9866450B2 (en) Methods and apparatus related to management of unit-based virtual resources within a data center environment
EP3281359B1 (en) Application driven and adaptive unified resource management for data centers with multi-resource schedulable unit (mrsu)
US10394606B2 (en) Dynamic weight accumulation for fair allocation of resources in a scheduler hierarchy
WO2018120991A1 (zh) 一种资源调度方法及装置
WO2021008197A1 (zh) 资源分配方法、存储设备和存储***
US11496413B2 (en) Allocating cloud computing resources in a cloud computing environment based on user predictability
US20220156115A1 (en) Resource Allocation Method And Resource Borrowing Method
WO2016041446A1 (zh) 一种资源分配方法、装置及设备
JP2014522036A (ja) クラウド環境内で仮想リソースを割り当てるための方法および装置
WO2020024207A1 (zh) 处理业务请求的方法、装置与存储***
JP7506096B2 (ja) 計算資源の動的割り当て
WO2020224531A1 (zh) 存储***中令牌的分配方法和装置
WO2018032519A1 (zh) 一种资源分配方法、装置及numa***
CN112506650A (zh) 资源分配方法、***、计算机设备和存储介质
WO2024022142A1 (zh) 资源使用方法和装置
CN116483740B (zh) 内存数据的迁移方法、装置、存储介质及电子装置
JP2018190355A (ja) リソース管理方法
CN109298949B (zh) 一种分布式文件***的资源调度***
US20120151175A1 (en) Memory apparatus for collective volume memory and method for managing metadata thereof
US8918555B1 (en) Adaptive and prioritized replication scheduling in storage clusters
WO2017133421A1 (zh) 一种多租户资源共享的方法及装置
JP7127155B2 (ja) セルラ電気通信ネットワーク

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18928883

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18928883

Country of ref document: EP

Kind code of ref document: A1