WO2024046283A1 - Task scheduling method, model generation method, and electronic device - Google Patents

Task scheduling method, model generation method, and electronic device

Info

Publication number
WO2024046283A1
WO2024046283A1 · PCT/CN2023/115329 · CN2023115329W
Authority
WO
WIPO (PCT)
Prior art keywords
pmu
indicators
task
core
training set
Prior art date
Application number
PCT/CN2023/115329
Other languages
English (en)
French (fr)
Inventor
王耀光
李海程
成坚
周轶刚
谢星华
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2024046283A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3433Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5019Workload prediction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/508Monitor

Definitions

  • the present disclosure relates to the field of terminal technology, and more specifically, to task scheduling methods and devices, model generation methods and devices, electronic devices, computer-readable storage media, chips, and computer program products.
  • Terminals such as mobile phones, personal computers, and servers have commonly used multi-core processing architectures.
  • such processors combine cores with heterogeneous computing power, such as large cores and small cores.
  • large cores have strong computing power but high power consumption.
  • small cores have weaker computing power but low power consumption.
  • Example embodiments of the present disclosure provide a solution for scheduling tasks based on their operating characteristics.
  • a task scheduling method includes: obtaining multiple first performance monitoring unit (PMU) indicators when the task is running on the first core of the heterogeneous system; inputting the multiple first PMU indicators into a pre-generated load characteristic identification model to obtain the predicted operating characteristics of the task; and scheduling the task based on the predicted operating characteristics.
  • the load characteristic identification model can provide the predicted operating characteristics of the task, thereby providing a reliable reference for task scheduling, which in turn can make task scheduling more accurate and enable full utilization of hardware resources.
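The scheduling flow described above can be sketched as follows. This is an illustrative outline only, not the patented implementation: `StubModel`, `schedule_task`, and the 1.5x threshold are hypothetical names and values standing in for the pre-generated load characteristic identification model and the scheduler's migration policy.

```python
SPEEDUP_THRESHOLD = 1.5  # hypothetical policy: prefer the big core only if it
                         # is predicted to be at least 1.5x faster

class StubModel:
    """Stand-in for the pre-generated load characteristic identification model."""
    def predict(self, rows):
        # toy rule: predicted speedup grows with the first PMU indicator
        return [1.0 + row[0] for row in rows]

def schedule_task(model, pmu_indicators, threshold=SPEEDUP_THRESHOLD):
    """Pick a core based on the model's predicted operating characteristic."""
    predicted_speedup = model.predict([pmu_indicators])[0]
    return "big_core" if predicted_speedup >= threshold else "little_core"
```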
  • the method further includes: migrating the task to run on the second core of the heterogeneous system; and obtaining multiple second PMU indicators and a second task running indicator when the task runs on the second core.
  • the second task running indicator includes a second running time and/or a second application performance indicator.
  • the running indicators of the task on different cores can be obtained separately, which can be used to evaluate the model.
  • the method further includes: obtaining a first task running indicator when the task runs on the first core, where the first task running indicator includes a first running time and/or a first application performance indicator.
  • the first application performance indicator may include a first throughput and/or a first instructions-per-cycle (IPC) value.
  • the method further includes: determining the actual running characteristics of the task based on the first task running indicator and the second task running indicator.
  • the method further includes storing at least one of the following: the plurality of first PMU indicators, the plurality of second PMU indicators, the first task running indicator, the second task running indicator, the predicted operating characteristics, or the actual operating characteristics.
  • an online data set can be constructed during the actual execution of the task, which can be used for online updating of the model.
  • the method further includes: if the error between the predicted operating characteristics and the actual operating characteristics exceeds an error threshold, determining the number of tasks for which the error exceeds the error threshold; and if the number of tasks exceeds a number threshold, updating the load characteristic identification model based on the second PMU indicators and the actual operating characteristics.
  • the method further includes: determining whether the updated load feature identification model meets accuracy requirements; if it is determined that the accuracy requirements are not met, regenerating the load feature identification model.
  • in this way, the updated model can be evaluated to ensure its accuracy.
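The update trigger described above (a per-task error threshold plus a count threshold on how many tasks exceed it) can be sketched as follows; the function name and default thresholds are illustrative assumptions, not values from the patent.

```python
def should_update_model(records, error_threshold=0.2, count_threshold=10):
    """Decide whether to retrain the load characteristic identification model.

    records: list of (predicted, actual) operating-characteristic pairs
    collected online. Retraining is triggered only when the number of tasks
    whose prediction error exceeds error_threshold itself exceeds
    count_threshold.
    """
    n_bad = sum(1 for predicted, actual in records
                if abs(predicted - actual) > error_threshold)
    return n_bad > count_threshold
```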
  • the load feature identification model is generated through the following process: constructing an initial training set, the initial training set including a plurality of initial data items, each initial data item including a first number of PMU indicators collected when a task runs on the first core or the second core of the heterogeneous system and a target value indicating the operating characteristics of the task; constructing an updated training set based on the initial training set, the updated training set including multiple updated data items, each updated data item including a second number of PMU indicators and the target value, where the second number is smaller than the first number; and generating the load feature identification model based on the updated training set.
  • a load feature identification model can be generated through training based on the training set, and the model can provide more accurate task operating characteristics for task scheduling.
  • constructing the initial training set includes: obtaining the first number of PMU indicators; obtaining a first application performance indicator when the task runs on the first core; obtaining a second application performance indicator when the task runs on the second core; determining the target value based on the first application performance indicator and the second application performance indicator; and constructing the initial training set based on the first number of PMU indicators and the target value.
  • the target value can be used as a label in supervised learning, which can facilitate the model training process and improve the efficiency of model generation.
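One concrete way to form such a label, assuming the target value is the performance speedup between the two cores (as in the embodiments below), is to take the ratio of the application performance indicators measured on each core. The helper below is a hypothetical sketch, not the patented formula.

```python
def initial_data_item(pmu_indicators, perf_on_first_core, perf_on_second_core):
    """Build one initial training-set item: (PMU indicator vector, target value).

    The target value is taken as the speedup of the second core over the
    first, e.g. IPC_second / IPC_first when IPC is the performance indicator.
    """
    target = perf_on_second_core / perf_on_first_core
    return (list(pmu_indicators), target)
```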
  • constructing the updated training set based on the initial training set includes: dividing the first number of PMU indicators into a plurality of clusters; extracting the second number of PMU indicators from the plurality of clusters; and constructing the updated training set based on the second number of PMU indicators and the target value.
  • the number of PMU indicators in the updated training set can be reduced, which can reduce the scale of training and improve training efficiency.
  • the number of PMU indicators that need to be collected is smaller, which can reduce the overhead of core and/or processor monitoring and reduce energy consumption.
  • dividing the first number of PMU indicators into a plurality of clusters includes: determining a first correlation between each pair of the first number of PMU indicators; and clustering the first number of PMU indicators based on a correlation threshold to obtain the plurality of clusters, where the first correlation between any two PMU indicators in the same cluster is not lower than the correlation threshold, or where the mean of the first correlations between every two PMU indicators in the same cluster is not lower than the correlation threshold.
  • the first correlation includes at least one of the following: covariance, Euclidean distance, or Pearson correlation coefficient.
  • PMU indicators can be clustered by correlation, so that PMU indicators located in the same cluster have close performance, which can facilitate the extraction of PMU indicators.
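A minimal sketch of this clustering step, assuming Pearson correlation as the first correlation and a simple greedy grouping rule (the patent leaves the specific clustering algorithm open):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def cluster_pmu_indicators(samples, threshold=0.9):
    """Greedily group PMU indicators whose pairwise |Pearson r| meets the
    correlation threshold. samples maps indicator name -> observed values."""
    clusters = []
    for name, values in samples.items():
        for cluster in clusters:
            if all(abs(pearson(values, samples[other])) >= threshold
                   for other in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

The indicator names and the 0.9 default threshold are illustrative; covariance or Euclidean distance could replace `pearson` per the embodiment above.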
  • extracting the second number of PMU indicators from the plurality of clusters includes: determining a second correlation between each of the first number of PMU indicators and the target value; sorting the first number of PMU indicators based on the second correlation; and extracting the second number of PMU indicators from a second number of clusters among the plurality of clusters, in order of the second correlation from high to low.
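This extraction step can be sketched as picking, from each of the most relevant clusters, the member most correlated with the target value. Ranking clusters by their best member's correlation is one plausible reading of the embodiment; all names below are illustrative.

```python
def extract_representatives(clusters, target_correlation, second_number):
    """Pick one PMU indicator from each of `second_number` clusters,
    preferring indicators with the highest |correlation| to the target value.

    clusters: list of lists of indicator names.
    target_correlation: indicator name -> second correlation with the target.
    """
    ranked = sorted(clusters,
                    key=lambda c: max(abs(target_correlation[m]) for m in c),
                    reverse=True)
    return [max(cluster, key=lambda m: abs(target_correlation[m]))
            for cluster in ranked[:second_number]]
```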
  • the second number of PMU indicators includes a first PMU indicator extracted from a first cluster of the plurality of clusters, and the method further includes: replacing the first PMU indicator with a second PMU indicator in the first cluster based on user preferences or adjustment instructions from the user.
  • the input of the model can also be adjusted based on the user's needs, which can meet the user's personalized needs and increase the flexibility of the model.
  • the method further includes: receiving input information from a user, the input information indicating the second quantity and the relevance threshold.
  • generating the load feature identification model based on the updated training set includes: generating the load feature identification model based on the updated training set and a model configuration, wherein the model configuration includes a model type and/or hyperparameters of the load feature identification model.
  • the model type includes at least one of the following machine learning approaches: supervised learning or unsupervised learning, where supervised learning includes at least one of the following: a linear regression type or a neural network type, and unsupervised learning includes at least one of the following: a K-nearest-neighbors type or an expectation-maximization (EM) type.
  • the input model configuration can provide necessary information for the model to be trained, thereby facilitating the model training process.
  • the method further includes: testing the load feature identification model based on a test set, where the tested load feature identification model meets accuracy requirements. In this way, the generated load feature identification model can also be tested to ensure its accuracy.
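A simple accuracy gate over a held-out test set might look like the following; the mean-absolute-error criterion and the 0.1 threshold are illustrative assumptions, since the patent does not fix a specific accuracy metric.

```python
def meets_accuracy_requirement(model, test_set, max_mean_abs_error=0.1):
    """test_set: list of (pmu_vector, target_value) pairs held out from
    training. Returns True if the model's mean absolute error is acceptable."""
    errors = [abs(model.predict([x])[0] - y) for x, y in test_set]
    return sum(errors) / len(errors) <= max_mean_abs_error
```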
  • the operating characteristic includes a performance speedup between the first core and the second core.
  • a model generation method includes: constructing an initial training set, the initial training set including a plurality of initial data items, each initial data item including a first number of PMU indicators collected when the task runs on the first core or the second core of the heterogeneous system and a target value indicating the operating characteristics of the task; constructing an updated training set based on the initial training set, the updated training set including a plurality of updated data items, each updated data item including a second number of PMU indicators and the target value, where the second number is less than the first number; and generating a load feature identification model based on the updated training set.
  • constructing the initial training set includes: obtaining the first number of PMU indicators; obtaining a first application performance indicator when the task runs on the first core; obtaining a second application performance indicator when the task runs on the second core; determining the target value based on the first application performance indicator and the second application performance indicator; and constructing the initial training set based on the first number of PMU indicators and the target value.
  • constructing the updated training set based on the initial training set includes: dividing the first number of PMU indicators into a plurality of clusters; extracting the second number of PMU indicators from the plurality of clusters; and constructing the updated training set based on the second number of PMU indicators and the target value.
  • dividing the first number of PMU indicators into a plurality of clusters includes: determining a first correlation between each pair of the first number of PMU indicators; and clustering the first number of PMU indicators based on a correlation threshold to obtain the plurality of clusters, where the first correlation between any two PMU indicators in the same cluster is not lower than the correlation threshold, or where the mean of the first correlations between every two PMU indicators in the same cluster is not lower than the correlation threshold.
  • the first correlation includes at least one of the following: covariance, Euclidean distance, or Pearson correlation coefficient.
  • extracting the second number of PMU indicators from the plurality of clusters includes: determining a second correlation between each of the first number of PMU indicators and the target value; sorting the first number of PMU indicators based on the second correlation; and extracting the second number of PMU indicators from a second number of clusters among the plurality of clusters, in order of the second correlation from high to low.
  • the second number of PMU indicators includes a first PMU indicator extracted from a first cluster of the plurality of clusters, and the method further includes: replacing the first PMU indicator with a second PMU indicator in the first cluster based on user preferences or adjustment instructions from the user.
  • the method further includes: receiving input information from the user, the input information indicating the second quantity and the relevance threshold.
  • generating the load feature identification model based on the updated training set includes: generating the load feature identification model based on the updated training set and a model configuration, wherein the model configuration includes a model type and/or hyperparameters of the load feature identification model.
  • the model type includes one or more of the following machine learning approaches: supervised learning or unsupervised learning, where supervised learning includes at least one of the following: a linear regression type or a neural network type, and unsupervised learning includes at least one of the following: a K-nearest-neighbors type or an expectation-maximization (EM) type.
  • the method further includes: testing the load feature identification model based on the test set, where the tested load feature identification model meets accuracy requirements.
  • the operating characteristic includes a performance speedup between the first core and the second core.
  • an apparatus for task scheduling includes: an acquisition module configured to acquire a plurality of first PMU indicators when a task runs on a first core of a heterogeneous system; a prediction module configured to input the plurality of first PMU indicators into a pre-generated load characteristic identification model to obtain the predicted operating characteristics of the task; and a scheduling module configured to schedule the task based on the predicted operating characteristics.
  • the device further includes: a task migration module configured to migrate the task to run on the second core of the heterogeneous system; and a post-migration information acquisition module configured to acquire a plurality of second PMU indicators and a second task running indicator when the task runs on the second core.
  • the second task running indicator includes a second running time and/or a second application performance indicator.
  • the acquisition module is further configured to: acquire a first task running indicator when the task runs on the first core, where the first task running indicator includes a first running time and/or a first application performance indicator.
  • the first application performance indicator may include first throughput and/or first IPC.
  • the apparatus further includes an actual operating characteristic determining module configured to determine the actual operating characteristics of the task based on the first task operating indicator and the second task operating indicator.
  • the apparatus further includes a storage module configured to store at least one of the following: the plurality of first PMU indicators, the plurality of second PMU indicators, the first task running indicator, the second task running indicator, the predicted operating characteristics, or the actual operating characteristics.
  • the apparatus further includes a model update module configured to: if an error between the predicted operating characteristics and the actual operating characteristics exceeds an error threshold, determine the number of tasks for which the error exceeds the error threshold; and if the number of tasks exceeds a number threshold, update the load feature identification model based on the second PMU indicators and the actual operating characteristics.
  • the device further includes: an accuracy determination module configured to determine whether the updated load characteristic identification model meets accuracy requirements; and the model update module is configured to regenerate the load characteristic identification model if the updated model does not meet the accuracy requirements.
  • the apparatus further includes: an initial training set building module configured to build an initial training set, the initial training set including a plurality of initial data items, each initial data item including a first number of PMU indicators collected when a task runs on the first core or the second core of the heterogeneous system and a target value indicating the operating characteristics of the task; an updated training set building module configured to build an updated training set based on the initial training set, the updated training set including a plurality of updated data items, each updated data item including a second number of PMU indicators and the target value, where the second number is less than the first number; and a model generation module configured to generate a load feature identification model based on the updated training set.
  • the initial training set building module includes: a first acquisition sub-module configured to acquire the first number of PMU indicators; a second acquisition sub-module configured to acquire a first application performance indicator when the task runs on the first core; a third acquisition sub-module configured to acquire a second application performance indicator when the task runs on the second core; a target value determination sub-module configured to determine the target value based on the first application performance indicator and the second application performance indicator; and an initial training set construction sub-module configured to construct the initial training set based on the first number of PMU indicators and the target value.
  • the updated training set building module includes: a clustering sub-module configured to divide the first number of PMU indicators into multiple clusters; an extraction sub-module configured to extract the second number of PMU indicators from the multiple clusters; and an updated training set construction sub-module configured to build the updated training set based on the second number of PMU indicators and the target value.
  • the clustering sub-module includes: a first correlation determination unit configured to determine a first correlation between each pair of the first number of PMU indicators; and a clustering unit configured to cluster the first number of PMU indicators based on a correlation threshold to obtain multiple clusters, where the first correlation between any two PMU indicators in the same cluster is not lower than the correlation threshold, or where the mean of the first correlations between every two PMU indicators in the same cluster is not lower than the correlation threshold.
  • the first correlation includes at least one of the following: covariance, Euclidean distance, or Pearson correlation coefficient.
  • the extraction sub-module includes: a second correlation determination unit configured to determine a second correlation between each of the first number of PMU indicators and the target value; a sorting unit configured to sort the first number of PMU indicators based on the second correlation; and an extraction unit configured to extract the second number of PMU indicators from a second number of clusters among the plurality of clusters, in order of the second correlation from high to low.
  • the extraction sub-module further includes an adjustment unit configured to replace the first PMU indicator with a second PMU indicator in the first cluster based on user preferences or adjustment instructions from the user.
  • the apparatus further includes a receiving module configured to receive input information from the user, the input information indicating the second quantity and the relevance threshold.
  • the model generation module is configured to: generate the load feature identification model based on the updated training set and a model configuration, wherein the model configuration includes a model type and/or hyperparameters of the load feature identification model.
  • the model type includes one or more of the following machine learning approaches: supervised learning or unsupervised learning, where supervised learning includes at least one of the following: a linear regression type or a neural network type, and unsupervised learning includes at least one of the following: a K-nearest-neighbors type or an expectation-maximization (EM) type.
  • the device further includes a testing module configured to test the load feature identification model based on the test set, wherein the tested load feature identification model meets the accuracy requirement.
  • the operating characteristic includes a performance speedup between the first core and the second core.
  • an apparatus for model generation includes: an initial training set building module configured to build an initial training set, the initial training set including a plurality of initial data items, each initial data item including a first number of PMU indicators collected when a task runs on the first core or the second core of the heterogeneous system and a target value indicating the operating characteristics of the task; an updated training set building module configured to build an updated training set based on the initial training set, the updated training set including multiple updated data items, each updated data item including a second number of PMU indicators and the target value, where the second number is smaller than the first number; and a model generation module configured to generate a load feature identification model based on the updated training set.
  • the initial training set building module includes: a first acquisition sub-module configured to acquire the first number of PMU indicators; a second acquisition sub-module configured to acquire a first application performance indicator when the task runs on the first core; a third acquisition sub-module configured to acquire a second application performance indicator when the task runs on the second core; a target value determination sub-module configured to determine the target value based on the first application performance indicator and the second application performance indicator; and an initial training set construction sub-module configured to build the initial training set based on the first number of PMU indicators and the target value.
  • the updated training set building module includes: a clustering sub-module configured to divide the first number of PMU indicators into multiple clusters; an extraction sub-module configured to extract the second number of PMU indicators from the multiple clusters; and an updated training set construction sub-module configured to build the updated training set based on the second number of PMU indicators and the target value.
  • the clustering sub-module includes: a first correlation determination unit configured to determine a first correlation between each pair of the first number of PMU indicators; and a clustering unit configured to cluster the first number of PMU indicators based on a correlation threshold to obtain multiple clusters, where the first correlation between any two PMU indicators in the same cluster is not lower than the correlation threshold, or where the mean of the first correlations between every two PMU indicators in the same cluster is not lower than the correlation threshold.
  • the first correlation includes at least one of the following: covariance, Euclidean distance, or Pearson correlation coefficient.
  • the extraction sub-module includes: a second correlation determination unit configured to determine a second correlation between each of the first number of PMU indicators and the target value; a sorting unit configured to sort the first number of PMU indicators based on the second correlation; and an extraction unit configured to extract the second number of PMU indicators from a second number of clusters among the plurality of clusters, in order of the second correlation from high to low.
  • the extraction sub-module further includes an adjustment unit configured to replace the first PMU indicator with a second PMU indicator in the first cluster based on user preferences or adjustment instructions from the user.
  • the apparatus further includes a receiving module configured to receive input information from the user, the input information indicating the second quantity and the relevance threshold.
  • the model generation module is configured to: generate the load feature identification model based on the updated training set and a model configuration, wherein the model configuration includes a model type and/or hyperparameters of the load feature identification model.
  • the model type includes one or more of the following types of machine learning model: supervised learning or unsupervised learning, wherein the supervised learning includes at least one of the following: a linear regression type, or a neural network type, and the unsupervised learning includes at least one of the following: K-nearest neighbors, or a maximum expectation type.
  • the device further includes a testing module configured to test the load feature identification model based on the test set, wherein the tested load feature identification model meets the accuracy requirement.
  • the operating characteristic includes a performance speedup between the first core and the second core.
  • an electronic device including a multi-core processor and a memory.
  • the memory stores instructions executed by the multi-core processor.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: obtain a plurality of first PMU indicators when the task runs on the first core of the heterogeneous system; input the plurality of first PMU indicators into a pre-generated load feature identification model to obtain predicted operating characteristics of the task; and schedule the task based on the predicted operating characteristics.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: migrate the task to run on the second core of the heterogeneous system; and obtain a plurality of second PMU indicators and a second task running indicator when the task runs on the second core, the second task running indicator including a second running time and/or a second application performance indicator.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: obtain a first task running indicator when the task runs on the first core, the first task running indicator including a first running time and/or a first application performance indicator.
  • the first application performance indicator may include first throughput and/or first IPC.
  • the electronic device determines the actual running characteristics of the task based on the first task running indicator and the second task running indicator.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: store at least one of the following: a plurality of first PMU indicators, a plurality of second PMU indicators, a first task running indicator, a second task running indicator, predicted operating characteristics, or actual operating characteristics.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: if the error between the predicted operating characteristics and the actual operating characteristics exceeds an error threshold, determine the number of tasks for which the error exceeds the error threshold; and if the number of tasks exceeds a number threshold, update the load feature identification model based on the second PMU indicators and the actual operating characteristics.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: determine whether the updated load feature identification model meets accuracy requirements; and if it is determined that the accuracy requirements are not met, regenerate the load feature identification model.
  • the electronic device, when the instructions are executed by the multi-core processor, generates a load feature identification model through the following process: constructing an initial training set, the initial training set including a plurality of initial data items, each initial data item including a first number of PMU indicators and a target value when the task runs on the first core or the second core of the heterogeneous system.
  • the target value indicates the operating characteristics of the task; an updated training set is constructed based on the initial training set.
  • the updated training set includes a plurality of updated data items, each updated data item includes a second number of PMU indicators and target values, where the second number is smaller than the first number; and based on the updated training set, a load feature identification model is generated.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: obtain the first number of PMU indicators; obtain a first application performance indicator when the task runs on the first core; obtain a second application performance indicator when the task runs on the second core; determine the target value based on the first application performance indicator and the second application performance indicator; and construct the initial training set based on the first number of PMU indicators and the target value.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: divide the first number of PMU indicators into a plurality of clusters; extract a second number of PMU indicators from the plurality of clusters; and construct the updated training set based on the second number of PMU indicators and the target value.
  • the instructions, when executed by the multi-core processor, cause the electronic device to: determine a first correlation between pairs of the first number of PMU indicators; and cluster the first number of PMU indicators based on the correlation threshold to obtain multiple clusters, in which the first correlation between any two PMU indicators located in the same cluster is not lower than the correlation threshold, or in which the mean value of the first correlation between every two PMU indicators located in the same cluster is not lower than the correlation threshold.
  • the first correlation includes at least one of the following: covariance, Euclidean distance, or Pearson correlation coefficient.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: determine a second correlation between each PMU indicator in the first number of PMU indicators and the target value; sort the first number of PMU indicators based on the second correlation; and extract, in descending order of the second correlation, the second number of PMU indicators from a second number of clusters among the plurality of clusters.
  • the second number of PMU indicators includes a first PMU indicator extracted from a first cluster of the plurality of clusters, and the instructions, when executed by the multi-core processor, cause the electronic device to: replace the first PMU indicator with a second PMU indicator in the first cluster based on user preferences or an adjustment instruction from the user.
  • the instructions when executed by the multi-core processor, cause the electronic device to: receive input information from the user, the input information indicating the second quantity and the relevance threshold.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: generate a load feature identification model based on the updated training set and a model configuration, wherein the model configuration includes a model type of the load feature identification model and/or hyperparameters of the load feature identification model.
  • the model type includes one or more of the following types of machine learning model: supervised learning, or unsupervised learning, wherein the supervised learning includes at least one of the following: a linear regression type, or a neural network type, and the unsupervised learning includes at least one of the following: K-nearest neighbors, or a maximum expectation type.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: test the load feature identification model based on a test set, wherein the tested load feature identification model meets the accuracy requirement.
  • the operating characteristic includes a performance speedup between the first core and the second core.
  • an electronic device including a multi-core processor and a memory. Instructions executed by the multi-core processor are stored on the memory. When the instructions are executed by the multi-core processor, the electronic device is caused to: construct an initial training set.
  • the initial training set includes multiple initial data items. Each initial data item includes a first number of PMU indicators and a target value when the task runs on the first core or the second core of the heterogeneous system.
  • the target value indicates the operating characteristics of the task; an updated training set is constructed based on the initial training set, the updated training set including a plurality of updated data items, each updated data item including a second number of PMU indicators and the target value, where the second number is smaller than the first number; and based on the updated training set, a load feature identification model is generated.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: obtain the first number of PMU indicators; obtain the first application performance indicator when the task runs on the first core; obtain the second application performance indicator when the task runs on the second core; determine the target value based on the first application performance indicator and the second application performance indicator; and construct the initial training set based on the first number of PMU indicators and the target value.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: divide the first number of PMU indicators into multiple clusters; extract the second number of PMU indicators from the multiple clusters; and construct the updated training set based on the second number of PMU indicators and the target value.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: determine a first correlation between pairs of the first number of PMU indicators; and cluster the first number of PMU indicators based on the correlation threshold to obtain multiple clusters, where the first correlation between any two PMU indicators located in the same cluster is not lower than the correlation threshold, or where the mean value of the first correlation between every two PMU indicators located in the same cluster is not lower than the correlation threshold.
  • the first correlation includes at least one of the following: covariance, Euclidean distance, or Pearson correlation coefficient.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: determine a second correlation between each PMU indicator in the first number of PMU indicators and the target value; sort the first number of PMU indicators based on the second correlation; and extract, in descending order of the second correlation, the second number of PMU indicators from the second number of clusters among the plurality of clusters.
  • the second number of PMU indicators includes a first PMU indicator extracted from a first cluster of the plurality of clusters, and the instructions, when executed by the multi-core processor, cause the electronic device to: replace the first PMU indicator with a second PMU indicator in the first cluster based on user preferences or an adjustment instruction from the user.
  • the electronic device when the instructions are executed by the multi-core processor, the electronic device is caused to: receive input information from the user, the input information indicating the second quantity and the relevance threshold.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: generate a load feature identification model based on the updated training set and a model configuration, wherein the model configuration includes a model type of the load feature identification model and/or hyperparameters of the load feature identification model.
  • the model type includes one or more of the following types of machine learning model: supervised learning, or unsupervised learning, wherein the supervised learning includes at least one of the following: a linear regression type or a neural network type, and the unsupervised learning includes at least one of the following: K-nearest neighbors or a maximum expectation type.
  • the electronic device, when the instructions are executed by the multi-core processor, is caused to: test the load feature identification model based on the test set, wherein the tested load feature identification model meets the accuracy requirement.
  • the operating characteristic includes a performance speedup between the first core and the second core.
  • a computer-readable storage medium is provided.
  • a computer program is stored on the computer-readable storage medium.
  • the computer program, when executed by a processor, implements the operations of the method in the first aspect or the second aspect or any embodiment thereof.
  • a chip or chip system is provided in an eighth aspect, including a processing circuit configured to perform operations according to the method in the above first or second aspect or any embodiment thereof.
  • a computer program or computer program product is provided.
  • the computer program or computer program product is tangibly stored on a computer-readable medium and includes computer-executable instructions that, when executed, cause the device to implement the method according to the above-mentioned first or second aspect or any embodiment thereof.
  • Figure 1 shows a schematic block diagram of a heterogeneous system in which embodiments of the present disclosure can be applied
  • Figure 2 shows a schematic flowchart of a process of task scheduling according to some embodiments of the present disclosure
  • Figure 3 shows a schematic block diagram of a system in accordance with some embodiments of the present disclosure
  • Figure 4 illustrates a schematic flowchart of a process of generating a load feature identification model according to some embodiments of the present disclosure
  • Figure 5 shows another schematic flowchart of a process of generating a load feature identification model according to some embodiments of the present disclosure
  • Figure 6 shows a schematic flow diagram of a process of model calibration according to some embodiments of the present disclosure
  • Figure 7 shows a schematic block diagram of an apparatus for task scheduling according to some embodiments of the present disclosure
  • Figure 8 shows a schematic block diagram of an apparatus for model generation according to some embodiments of the present disclosure.
  • Figure 9 shows a schematic block diagram of an example device that may be used to implement embodiments of the present disclosure.
  • scheduling tasks to appropriate hardware resources can fully utilize the hardware resources.
  • tasks can be scheduled to hardware resources based on the running characteristics of the tasks, so more accurate running characteristics of the tasks are needed.
  • Embodiments of the present disclosure provide a task scheduling solution for heterogeneous systems, which can obtain more accurate running characteristics of tasks based on pre-generated models, and thus can be used for task scheduling. In this way, tasks can be scheduled to appropriate hardware resources, improving the utilization of hardware resources and improving task processing efficiency.
  • Figure 1 shows a schematic block diagram of a heterogeneous system 100 in which embodiments of the present disclosure can be applied.
  • the heterogeneous system 100 includes a first core 110 and a second core 120 .
  • the first core 110 and the second core 120 belong to different hardware resources.
  • the first core 110 and the second core 120 may have different computing power.
  • the computing power of the first core 110 is lower than that of the second core 120 .
  • the first core 110 may be a central processing unit (CPU), and the second core 120 may be a graphics processing unit (GPU).
  • the first core 110 may be a CPU
  • the second core 120 may be a field programmable gate array (Field Programmable Gate Array, FPGA).
  • the first core 110 may be a CPU
  • the second core 120 may be a tensor processing unit (Tensor Processing Unit, TPU). It can be understood that the examples of the first core 110 and the second core 120 here are only illustrative and should not be construed as limiting the embodiments of the present disclosure.
  • the heterogeneous system 100 may be various XPU heterogeneous computing platforms, or simply referred to as heterogeneous computing platforms or heterogeneous platforms.
  • the first core 110 may be a small core
  • the second core 120 may be a large core.
  • the first core 110 may be a large core
  • the second core 120 may be a small core. It should be noted that the terms "large core" and "small core" in the embodiments of the present disclosure are only illustrative. For example, the computing power of a large core is stronger than that of a small core.
  • a task 130 can be scheduled (or deployed) in the heterogeneous system 100 and run on a core.
  • the task 130 can run on the first core 110 .
  • the task 130 may also be migrated in the heterogeneous system 100. As shown by the dotted arrow 132, the task 130 may be migrated from the first core 110 to the second core 120.
  • the term "task" in this disclosure may be referred to as a load, a task load, or the like.
  • monitoring indicators can be selected based on expert experience, and strategies can be defined to characterize the running characteristics of tasks, thereby scheduling tasks on different hardware resources.
  • expert experience is limited. When faced with systems with different computing power and a large number of indicators, the monitoring indicators selected by expert experience may be inappropriate; in addition, the defined strategies cannot cope with changes in tasks.
  • the heterogeneous system 100 may have a multi-core processing architecture, and the term "multi-core processing architecture" in this disclosure may be referred to as a multi-core processor architecture, a heterogeneous multi-core processing architecture, a heterogeneous processor architecture, or the like.
  • multi-core in this disclosure may also include many cores, and “heterogeneous” may also include super-heterogeneous, etc., which is not limited by this disclosure.
  • the heterogeneous system 100 shown in FIG. 1 is only an example and should not be construed as a limitation of the embodiments of the present disclosure.
  • the heterogeneous system 100 may include a larger number of cores (eg, mid-cores, etc.), and for example, the heterogeneous system 100 may also include other modules such as an OS, as described below in conjunction with FIG. 3 .
  • FIG. 2 shows a schematic flowchart of a process 200 of task scheduling according to some embodiments of the present disclosure.
  • a plurality of first PMU indicators when the task runs on the first core of the heterogeneous system are obtained.
  • a plurality of first PMU indicators are input into the pre-generated load characteristic identification model to obtain predicted operating characteristics of the task.
  • tasks are scheduled based on the predicted operating characteristics.
  • a plurality of first PMU indicators and a first task running indicator may be obtained, where the first task running indicator includes, for example, a first running time and/or a first application performance indicator, and the first application performance indicator may include, for example, a first throughput or a first IPC.
  • a pre-generated load feature identification model can be obtained.
  • the input of the load feature identification model is multiple (e.g., K, where K is a positive integer) PMU indicators, and the output is the operating characteristics of the task. Therefore, at block 220, the acquired plurality of first PMU indicators can be input into the load feature identification model, thereby obtaining the output of the model, that is, the operating characteristics of the task as predicted by the model. For example, the operating characteristics may represent the performance speedup ratio between different cores.
  • for the generation process of the load feature identification model, reference may be made to the embodiments below in conjunction with FIGS. 4 to 5.
  • the task may be scheduled based on the predicted operating characteristics of the task, e.g., the task may be migrated from a first core to a second core.
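The flow of blocks 210 to 230 can be sketched as follows in Python. The decision rule (migrate when the predicted speedup is large enough) and the threshold value are illustrative assumptions, not fixed by the disclosure:

```python
def schedule_task(pmu_metrics, model, speedup_threshold=1.2):
    """Query the load feature identification model with the task's PMU
    metrics and decide where the task should run (sketch; the threshold
    of 1.2 is a hypothetical tuning value)."""
    # predicted big-core/small-core performance speedup ratio
    predicted_speedup = model(pmu_metrics)
    if predicted_speedup >= speedup_threshold:
        return "second_core"  # e.g., migrate to the large core
    return "first_core"       # stay on the small core
```

Here `model` stands for the pre-generated load feature identification model; any callable mapping K PMU metrics to a predicted speedup could be plugged in.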
  • the task can be migrated to run on the second core of the heterogeneous system, and a plurality of second PMU indicators and a second task running indicator when the task runs on the second core can be obtained, wherein the second task running indicator includes, for example, a second running time and/or a second application performance indicator.
  • the second application performance index may include, for example, the second throughput or the second IPC.
  • the actual running characteristics of the task may be further determined based on the first task running indicator and the second task running indicator.
  • the actual running characteristics of the task such as the actual performance acceleration ratio, may be determined based on the first application performance indicator and the second application performance indicator.
  • the obtained indicators, etc. may be stored; for example, one or more of the following may be stored: multiple first PMU indicators, multiple second PMU indicators, a first task running indicator, a second task running indicator, predicted operating characteristics, or actual operating characteristics.
  • actual operating characteristics may be compared with predicted operating characteristics to determine the accuracy of the load feature identification model. For example, if the error between the predicted operating characteristics and the actual operating characteristics exceeds the error threshold, the number of tasks for which the error exceeds the error threshold is determined; and if the number of tasks exceeds the number threshold, the load feature identification model is updated. For example, an updated load feature identification model can be obtained through retraining based on the second PMU indicators and the actual operating characteristics.
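The calibration trigger described above can be sketched as a small monitor in Python. The class and method names are illustrative, and the use of relative error is an assumption (the disclosure only speaks of an error threshold without fixing the error metric):

```python
class CalibrationMonitor:
    """Counts tasks whose prediction error exceeds `err_thresh`; once the
    count passes `count_thresh`, signals that the load feature
    identification model should be retrained."""

    def __init__(self, err_thresh, count_thresh):
        self.err_thresh = err_thresh
        self.count_thresh = count_thresh
        self.bad_tasks = 0  # number of tasks whose error exceeded the threshold

    def observe(self, predicted, actual):
        # relative error between predicted and actual operating characteristics
        if abs(predicted - actual) / abs(actual) > self.err_thresh:
            self.bad_tasks += 1
        # True once enough tasks are mispredicted => trigger retraining
        return self.bad_tasks > self.count_thresh
```

The two thresholds correspond to the error threshold and the number threshold in the text; retraining itself would then run on the stored second PMU indicators and actual operating characteristics.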
  • the load feature identification model is regenerated if the accuracy requirements are not met. For example, a load feature identification model that meets the accuracy requirements can be obtained through retraining. If it is determined that the accuracy requirements are met, the resulting load feature identification model can be output (or stored); for example, it can be used for the task scheduling shown in Figure 2.
  • Figure 3 illustrates a schematic block diagram of a system 300 in accordance with some embodiments of the present disclosure.
  • the system 300 includes hardware 310 and software modules, where the software modules include an operating system 320 and a software development kit (Software Development Kit, SDK) 330.
  • Hardware 310 may include multiple cores, such as large core 311 and small core 312 .
  • the operating system 320 may include a load characteristic identification tool, specifically including a model inference calculation module 321, a model evaluation module 322, a model retraining and online update module 323, a heterogeneous computing power scheduler 324, and a sample storage module 325.
  • the software development kit 330 may include a task feature modeling tool, specifically including a microarchitecture feature selection module 331, a model training module 332, and a feature and model analysis and evaluation module 333.
  • the large core 311 and the small core 312 represent different types of cores.
  • the number of cores of the same type may be multiple.
  • the hardware 310 may include multiple large cores and multiple small cores. It can be understood that in other examples, the system 300 may also use one or more cores of other types, such as mid-cores, and this disclosure is not limited thereto.
  • a task characteristic modeling tool may be used to build (or generate) a load characteristic identification model 334.
  • the load characteristic identification model 334 may also be referred to as a load characteristic identification model instance or by another name; this disclosure is not limited thereto.
  • the task characteristic modeling tool may generate the load characteristic identification model 334 through training based on the input information.
  • the input information may indicate one or more of the following: microarchitectural indicators 301, processor type 302, application indicators 303, feature selection configuration 304, and model configuration 305.
  • the input information may indicate information of multiple tasks, where the information of each task indicates a micro-architectural indicator 301, a processor type 302, and an application indicator 303.
  • the processor type 302 can indicate which core is running task 1, such as the large core 311 or the small core 312.
  • the microarchitectural indicators 301 may include multiple (e.g., D) PMU indicators when task 1 runs on a core (such as the large core 311).
  • Application metrics 303 may include application performance, such as throughput or IPC, when Task 1 runs on a core (eg, large core 311).
  • the application indicator 303 may include the power consumption of task 1 when running on the core.
  • the input information may indicate information about model training, wherein the information about model training indicates feature selection configuration 304 and model configuration 305 .
  • the feature selection configuration 304 may include a correlation threshold and a second quantity (denoted as K), where the second quantity represents the number of PMU indicators in the input of the load characteristic identification model, and the correlation threshold is used for clustering the PMU indicators when selecting that input.
  • feature selection configuration 304 may include user preferences to indicate specific microarchitectural metrics.
  • model configuration 305 may include a type of the model and/or hyperparameters of the model, where the type of the model may be one or more of the following types of machine learning model: supervised learning, or unsupervised learning, where supervised learning includes at least one of the following: a linear regression type, or a neural network type, and unsupervised learning includes at least one of the following: K-nearest neighbors, or a maximum expectation type.
  • model configuration 305 may include an accuracy threshold.
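As a concrete illustration, model configuration 305 might be represented as a small structure like the following; all field names and values here are hypothetical, chosen only to show the shape of the configuration:

```python
# Hypothetical representation of model configuration 305.
model_config = {
    "model_type": "linear_regression",   # or "neural_network",
                                         # "k_nearest_neighbors",
                                         # "maximum_expectation"
    "hyperparameters": {                 # example hyperparameters
        "learning_rate": 0.01,
        "epochs": 100,
    },
    "accuracy_threshold": 0.9,           # minimum acceptable model accuracy
}
```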
  • the task characteristic modeling tool can be implemented in the form of software code in the SDK provided by the chip manufacturer and/or the OS manufacturer for the user, and this disclosure is not limited to this.
  • Figure 4 illustrates a schematic flow diagram of a process 400 of generating a load signature identification model in accordance with some embodiments of the present disclosure.
  • an initial training set is constructed, the initial training set including a plurality of initial data items, each initial data item including a first number of PMU metrics and a target value when the task runs on the first core or the second core of the heterogeneous system.
  • the target value indicates the operating characteristics of the task.
  • an updated training set is constructed based on the initial training set, the updated training set including a plurality of updated data items, each updated data item including a second number of PMU indicators and target values, wherein the second number is less than the first number.
  • a load signature identification model is generated based on the updated training set.
  • the operating characteristics may include a performance speedup between the first core and the second core.
  • a first number of PMU indicators may be obtained; a first application performance indicator when the task runs on the first core and a second application performance indicator when the task runs on the second core may be obtained; the target value is determined based on the first application performance indicator and the second application performance indicator; and an initial training set is constructed based on the first number of PMU indicators and the target value.
  • a first number of PMU metrics may be divided into a plurality of clusters; a second number of PMU metrics may be extracted from the plurality of clusters; and based on the second number of PMU metrics and the target value, the updated training set is constructed.
  • the second number is smaller than the first number, which can reduce the size of the training set, thereby reducing the processing scale of subsequent model training and improving processing efficiency.
  • a first correlation between pairs of the first number of PMU indicators may be determined, and the first number of PMU indicators may be clustered based on the first correlation so as to be divided into a plurality of clusters, wherein the first correlation between any two PMU indicators in the same cluster is not lower than the correlation threshold, or the mean value of the first correlation between every two PMU indicators in the same cluster is not lower than the correlation threshold.
  • the first correlation may include covariance, Euclidean distance, Pearson correlation coefficient, etc.
  • a second correlation between each PMU indicator in the first number of PMU indicators and the target value may be determined; the first number of PMU indicators are sorted based on the second correlation; and in descending order of the second correlation, the second number of PMU indicators are extracted from a second number of clusters among the plurality of clusters.
  • the second number of PMU indicators includes a first PMU indicator extracted from a first cluster of the plurality of clusters, and the method may further include: replacing the first PMU indicator with a second PMU indicator in the first cluster based on user preferences or an adjustment instruction from the user.
  • user input may be received indicating the second quantity and relevance threshold.
  • a load signature identification model may be generated based on the updated training set and a model configuration, where the model configuration includes a model type of the load signature identification model and/or hyperparameters of the load signature identification model.
  • the model type includes one or more of the following types of machine learning model: supervised learning, or unsupervised learning, where supervised learning includes at least one of the following: a linear regression type, or a neural network type, etc., and unsupervised learning includes at least one of the following: K-nearest neighbors, a maximum expectation type, etc.
  • the model types listed here are only illustrative. In actual applications, they can also be other types, and this disclosure is not limited to this.
  • the load characteristic identification model can also be tested based on the test set.
  • FIG. 5 illustrates another schematic flow diagram of a process 500 of generating a load signature identification model in accordance with some embodiments of the present disclosure.
  • Process 500 may be a more specific implementation of process 400.
  • PMU metrics are also called microarchitectural metrics.
  • an initial training set is constructed through preprocessing.
  • an initial training set may be constructed based on input information.
  • the initial training set may be constructed based on microarchitectural indicators 301, processor type 302, and application indicators 303.
  • the target value can be obtained through different application indicators 303 corresponding to different processor types 302.
  • the target value can be an acceleration ratio, also called a performance acceleration ratio.
  • the input information indicates the information of N tasks, and the information of each task indicates D micro-architecture indicators, the core being run and the corresponding application indicators.
  • for task 1, the input information can indicate D micro-architecture indicators, application indicator 1 when running on the first core, and application indicator 2 when running on the second core.
  • the performance acceleration ratio of task 1 between different cores can be determined based on application indicator 1 and application indicator 2, such as the ratio of application indicator 1 to application indicator 2.
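The ratio described above can be illustrated with a minimal pure-Python sketch; the function name and the use of IPC as the application indicator are assumptions for illustration, not the disclosed implementation:

```python
def speedup(app_indicator_core1, app_indicator_core2):
    """Performance speedup of a task between two cores: the ratio of the
    same application indicator (e.g. IPC) measured on each core."""
    return app_indicator_core1 / app_indicator_core2

# e.g. IPC of 3.0 on the first core vs 2.0 on the second core
ratio = speedup(3.0, 2.0)  # -> 1.5
```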
  • application indicator 1 and application indicator 2 refer to the same attributes, such as throughput or IPC.
  • the initial training set may include N data items about N tasks, and each data item includes D micro-architecture indicators (also called D PMU indicators) of the corresponding tasks and target values (such as performance acceleration ratio).
  • the target value can be thought of as a label for supervised learning in machine learning.
  • the input information can come from the user.
  • the user can collect micro-architecture indicators 301, processor types 302, and application indicators 303 on a heterogeneous computing platform, and then provide the collected information, together with the feature selection configuration 304 and the model configuration 305, to the task characteristic modeling tool as shown in Figure 3, so that the task characteristic modeling tool generates the load characteristic identification model 334.
  • the correlation between each micro-architectural indicator and the target value is determined and the D micro-architectural indicators are ranked.
  • the correlation between the microarchitecture indicator and the target value is called the second correlation here.
  • the second correlation degree may be Euclidean distance, covariance, Pearson correlation coefficient, etc.
  • the D micro-architecture indicators may be sorted in the order of the second correlation degree from high to low.
  • for example, when the second correlation degree is covariance, the D micro-architecture indicators are sorted in descending order of the absolute value of the covariance.
  • the covariance may be computed as cov(X, Y) = E[(X − E[X]) · (Y − E[Y])], where cov( ) represents the covariance of the two variables in parentheses and E[ ] represents the expectation of the variable in square brackets. The larger the absolute value of the covariance, the higher the correlation between the two variables.
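The covariance-based ranking of indicators against the target value can be sketched in pure Python; the function names here are illustrative assumptions:

```python
def covariance(x, y):
    """cov(X, Y) = E[(X - E[X]) * (Y - E[Y])] over paired samples."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def rank_by_target_correlation(indicators, target):
    """Sort PMU indicator names by |cov(indicator, target)|, descending,
    i.e. from most to least correlated with the target value."""
    return sorted(indicators,
                  key=lambda name: abs(covariance(indicators[name], target)),
                  reverse=True)
```

For example, an indicator that rises in lockstep with the target value ranks above one that barely varies with it.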
  • Table 1 shows multiple micro-architecture indicators sorted from high to low according to the second correlation degree.
  • the correlation between two micro-architectural indicators is called the first correlation here.
  • the first correlation degree may be Euclidean distance, covariance, Pearson correlation coefficient, etc.
  • clustering may be performed based on the first correlation and the correlation threshold indicated by the feature selection configuration 304 to divide the D micro-architectural indicators into multiple clusters.
  • one or more micro-architectural indicators may be included in the same cluster, and the first correlation between any two micro-architectural indicators located in the same cluster is not lower than (or higher than) the correlation threshold.
  • multiple clusters may also be called multiple classes, multiple sets, multiple subsets, or other names, and this disclosure is not limited thereto.
  • clustering may also be called grouping, classification, partitioning, or other names, which is not limited by this disclosure.
  • the first correlation degree between two of the D micro-architectural indicators may include D ⁇ D values.
  • the first correlation degree between the first micro-architecture indicator and the second micro-architecture indicator is equal to the first correlation degree between the second micro-architecture indicator and the first micro-architecture indicator, and the first correlation degree between a certain micro-architecture indicator (such as the first micro-architecture indicator) and itself is equal to 1, so the correlations between pairs of the D micro-architecture indicators can be reduced to D × (D − 1)/2 values.
  • the first correlation between microarchitectural indicators and themselves is not considered in this disclosure.
  • D micro-architectural indicators can be divided into multiple clusters (for example, C). Each micro-architectural indicator belongs to and only belongs to one cluster.
  • the first correlation degree between any two micro-architecture indicators in each cluster is not lower than (or is higher than) the correlation degree threshold.
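One possible way to realize such threshold-based clustering is a greedy pass over the indicators, using the absolute Pearson correlation coefficient as the first correlation degree. This is an illustrative sketch, not the disclosed algorithm:

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cluster_indicators(indicators, threshold):
    """Greedy threshold clustering: an indicator joins an existing cluster only
    if its |correlation| with every member is not below the threshold;
    otherwise it starts a new cluster. Each indicator lands in exactly one cluster."""
    clusters = []
    for name, values in indicators.items():
        for cluster in clusters:
            if all(abs(pearson(values, indicators[m])) >= threshold for m in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```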
  • Table 2 shows a cluster including 3 microarchitectural indicators.
  • a second number (K) of candidate microarchitectural metrics are selected based on the ranking determined at 520 and the plurality of clusters determined at 530.
  • K is much smaller than D, for example, two or three orders of magnitude lower than D.
  • the selected K micro-architecture indicators belong to K different clusters; that is, when making the selection, at most one micro-architecture indicator is selected from each cluster.
  • the selection may be performed sequentially, in order of the second correlation degree from high to low. For example, the unselected micro-architecture indicator with the highest second correlation degree is taken from the ranking; if that indicator does not belong to the same cluster as any already-selected micro-architecture indicator, it is selected; otherwise it is skipped.
  • K candidate microarchitectural indicators may constitute a candidate set.
  • Such sequential selection may continue until the number of candidate microarchitectural indicators in the candidate set reaches the second number (K) indicated by the feature selection configuration 304 .
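The sequential selection described above can be sketched as follows (illustrative pure Python; the function and variable names are assumptions):

```python
def select_candidates(ranking, clusters, k):
    """Walk the correlation ranking from high to low, taking an indicator only
    if its cluster has not yet contributed a candidate, until K are chosen."""
    cluster_of = {name: i for i, cluster in enumerate(clusters) for name in cluster}
    used, selected = set(), []
    for name in ranking:
        c = cluster_of[name]
        if c not in used:        # at most one indicator per cluster
            used.add(c)
            selected.append(name)
            if len(selected) == k:
                break
    return selected
```

For instance, if P1 and P2 share a cluster, P2 is skipped even though it ranks second, and the next cluster's best-ranked indicator is taken instead.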
  • the above-mentioned operations at blocks 510 to 540 may be performed by the micro-architectural feature selection module 331 as shown in FIG. 3 .
  • the micro-architecture feature selection module 331 can automatically select, for example, K micro-architecture indicators from a large number (e.g., D) of micro-architecture indicators based on the feature selection configuration 304 input by the user.
  • the micro-architecture feature selection module 331 can determine the correlations among the input D micro-architecture indicators, perform clustering according to the correlation degree threshold in the feature selection configuration 304, and then automatically select the K best micro-architecture indicators according to the ranking of correlations between the micro-architecture indicators and the target value.
  • K microarchitectural metrics are obtained for training based on the K candidate microarchitectural metrics.
  • the K candidate microarchitectural metrics may be used as the K microarchitectural metrics for training.
  • the K candidate micro-architecture indicators can be adjusted based on user preferences, such as replacing one or several candidate micro-architecture indicators with other micro-architecture indicators, to obtain the K micro-architecture indicators for training.
  • if the K candidate micro-architecture indicators do not include a specific micro-architecture indicator preferred by the user, a first candidate micro-architecture indicator that belongs to the same cluster as the specific micro-architecture indicator may be determined among the K candidates, and the first candidate micro-architecture indicator may be replaced with the specific micro-architecture indicator. In this way, the K micro-architecture indicators used for training can be obtained after replacement.
  • the user can evaluate the K candidate micro-architecture indicators to determine whether they are representative, and may replace one or several candidate micro-architecture indicators considered unrepresentative with the same number of other, more representative micro-architecture indicators.
  • the replaced candidate microarchitectural indicator and the replaced microarchitectural indicator belong to the same cluster.
  • the above-mentioned operation at block 550 may be omitted; for example, the K candidate micro-architecture indicators obtained at 540 are used directly for the training operation at 560 below.
  • the operations at block 550 may be performed by the feature and model analysis evaluation module 333.
  • the feature and model analysis and evaluation module 333 includes a feature analysis and evaluation sub-module.
  • the feature analysis and evaluation sub-module may obtain K micro-architectural indicators for training based on user preferences indicated by the user in the feature selection configuration 304 .
  • the feature analysis and evaluation sub-module can support selection and/or confirmation by the user through a human-computer interaction graphical user interface (Graphical User Interface, GUI).
  • the feature analysis and evaluation sub-module can provide, through the GUI, the K clusters where the K candidate micro-architecture indicators are located, as well as the multiple micro-architecture indicators included in the K clusters and the corresponding second correlation degrees, so that the user can adjust the candidate micro-architecture indicators based on the second correlation degrees within each cluster. For example, the user could select a micro-architecture indicator that does not have the highest second correlation degree but may be more general.
  • model training is performed based on the K microarchitectural metrics.
  • training can be performed by the model training module 332.
  • a training set may be constructed based on the K micro-architectural indicators and the target values in the aforementioned initial training set, where the target values may be labels corresponding to the micro-architectural indicators in the training set.
  • model training can be performed based on the training set and model configuration 305.
  • the model configuration 305 may include a model type, for example, the model type is a linear regression type.
  • the model configuration 305 may include a model type and model hyperparameters.
  • the model type is a neural network type
  • the model hyperparameters include the structural parameters of the model, the number of iterations, the learning rate, etc., where the structural parameters of the model may include the number of intermediate layers of the network, the number of neurons in each layer, etc.
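A hypothetical model configuration 305 of the kind described above might look like the following; every field name and value here is an illustrative assumption, not the disclosed format:

```python
# Illustrative model configuration (hypothetical field names and values).
model_config = {
    "model_type": "neural_network",      # or "linear_regression"
    "hyperparameters": {
        "hidden_layers": [32, 16],       # structural parameters: intermediate layers and neurons per layer
        "iterations": 1000,              # number of training iterations
        "learning_rate": 0.01,
    },
    "accuracy_threshold": 0.95,          # used later when evaluating the trained model
}
```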
  • the load feature identification model 334 generated after training can be expressed as the following formula (2):
  • P1 to P7 represent K micro-architectural indicators, and the coefficients 2.5e07, 4.89e09, 2.35e, 4.83e08, 4.15e07, 3.48e06, 2.92e08 and 2.43 are determined through training.
  • the generated load feature identification model 334 can be used for model inference later.
  • the performance speedup ratio (speedup) of the task between two cores can be obtained based on the microarchitecture indicators P1 to P7 when the task is running.
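Inference with such a linear model can be sketched as below; the weights and intercept are placeholders for illustration, not the trained coefficients of formula (2):

```python
def predict_speedup(pmu, weights, intercept):
    """Predicted speedup = sum(w_i * P_i) + b over the K monitored
    PMU indicators (linear-regression inference)."""
    return sum(w * p for w, p in zip(weights, pmu)) + intercept

# Placeholder values: two PMU indicators, weights 0.5 and 0.25, intercept 1.0
pred = predict_speedup([1.0, 2.0], [0.5, 0.25], 1.0)  # -> 2.0
```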
  • the generated load signature identification model 334 may indicate that its inputs include K micro-architectural indicators and that its outputs include target values.
  • the generated load characteristic identification model 334 may also indicate its model parameters, such as including each weight parameter in equation (2).
  • the generated load feature identification model 334 may also indicate the training error of the model on the training set and the test error on the test set.
  • requirements may include error requirements or accuracy requirements. In this way, by evaluating the model, underfitting or overfitting can be avoided.
  • the load characteristic identification model can be tested based on the test set to obtain the accuracy of the model. If the accuracy is lower than the accuracy threshold indicated by the model configuration 305, the model does not meet the requirements, and the process can return to 540 or 560. If the accuracy is not lower than (e.g., higher than) the accuracy threshold indicated by the model configuration 305, the model meets the requirements, and the process can proceed to 580.
  • a part of the training set can be selected as the test set. Alternatively, a test set that is completely different from the training set, or that shares some data items with the training set, can also be constructed.
  • the test set can be called a validation set.
  • the training error and test error of the load characteristic identification model can be determined, and the difference between the training error and the test error can be determined. If the difference is higher than the error threshold, the model does not meet the requirements, and the process can return to 540 or 560. If the difference is not higher than the error threshold, the model meets the requirements, and the process can proceed to 580.
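This acceptance check can be sketched in a few lines (illustrative; the function name is an assumption):

```python
def model_meets_requirements(train_error, test_error, error_threshold):
    """Accept the model when the train/test error gap stays within the
    threshold; a large gap suggests overfitting (and both errors being
    high suggests underfitting)."""
    return abs(test_error - train_error) <= error_threshold
```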
  • the way of determining the difference may be, for example, taking the absolute value of the difference between the test error and the training error.
  • model training can be triggered again by adjusting the hyperparameters of the model (such as the number of iterations in a deep neural network model, etc.) until the generated model meets the requirements (such as accuracy requirements or error requirements).
  • the operations at block 570 may be performed by the feature and model analysis evaluation module 333 as shown in FIG. 3 .
  • the feature and model analysis and evaluation module 333 includes a model analysis and evaluation sub-module.
  • the model analysis and evaluation sub-module may be used to determine whether the accuracy of the load characteristic identification model 334 generated by training is sufficient.
  • the generated load signature identification model 334 is output.
  • the load characteristic identification model 334 may be provided to the model inference calculation module 321 .
  • the output load characteristic identification model 334 may indicate that its inputs include K micro-architectural indicators and that its outputs include target values.
  • the generated load characteristic identification model 334 may also indicate its model parameters, such as including each weight parameter in equation (2).
  • the generated load feature identification model 334 may also indicate the training error of the model on the training set and the test error on the test set.
  • process 500 is only an example and should not be construed as limiting the embodiments of the present disclosure; for example, the order of the operations at blocks 520 and 530 may be reversed, the operations at block 550 may be omitted, the value of the second number K may be pre-stored, etc.
  • embodiments of the present disclosure provide a model generation solution.
  • the best second number of micro-architecture indicators can be automatically selected from a large number of micro-architecture indicators, and the selected micro-architecture indicators can reflect the difference in task performance across heterogeneous computing power.
  • model training based on the second number of micro-architectural indicators can reduce the complexity of the system.
  • a load feature identification model 334 based on the second number of micro-architectural indicators can be output.
  • the load characteristic identification model 334 can be used to determine the running characteristics of a task, such as the performance speedup between different cores, based on the second number of micro-architecture indicators when the task runs on a certain core. Understandably, only the second number of micro-architecture indicators needs to be monitored rather than all micro-architecture indicators, which can reduce kernel monitoring overhead and energy consumption.
  • the model generation solution in the embodiments of the present disclosure is trained and generated based on the first number of micro-architecture indicators, where the micro-architecture indicators can come from measurements on any heterogeneous system hardware (such as a heterogeneous processor architecture including large cores and small cores), or from data collected for different scenarios (such as database or big-data scenarios) on the same computing hardware (core) of the heterogeneous system. Therefore, the model generation solution in the embodiments of the present disclosure has universal applicability.
  • Figure 6 shows a schematic flow diagram of a process 600 of model calibration in accordance with some embodiments of the present disclosure.
  • the load characteristic identification model can be implemented in the OS kernel.
  • the first core may be the large core 311 or the small core 312 .
  • Task running indicators can include task running time, throughput, IPC, etc.
  • the core before migration may be called the first core
  • the core after migration may be called the second core.
  • K micro-architectural indicators, task running indicators, and predicted performance speedup ratios are obtained when the task runs on the pre-migration core.
  • the core before migration may be the first core 311 and the core after migration may be the second core 312; or, the core before migration may be the second core 312 and the core after migration may be the first core 311.
  • K micro-architectural indicators when the task is running can be obtained, and the K micro-architectural indicators are input to the load feature identification model 334 to obtain the predicted performance acceleration ratio.
  • the heterogeneous computing power scheduler 324 can collect the K micro-architecture indicators when the task is running, and the model inference calculation module 321 obtains the predicted performance speedup by using the load characteristic identification model 334 based on the K micro-architecture indicators from the heterogeneous computing power scheduler 324.
  • the model inference calculation module 321 can provide the predicted performance speedup to the heterogeneous computing power scheduler 324, so that the heterogeneous computing power scheduler 324 can make task scheduling decisions based on the predicted performance speedup, for example, performing task migration.
  • when the heterogeneous computing power scheduler 324 decides to perform task migration, it can provide the K micro-architecture indicators, the task running indicators, and the predicted performance speedup of the task running on the pre-migration core to the model evaluation module 322.
  • K microarchitecture indicators and task running indicators can be obtained through detection.
  • the operations at blocks 610 and 620 may be performed by model evaluation module 322 .
  • the actual performance speedup is determined.
  • the actual performance acceleration ratio can be determined by the model evaluation module 322 based on the task running index of the task on the core before migration and the task running index of the task on the core after migration.
  • the model evaluation module 322 may store the following information in the sample storage module 325: the K micro-architecture indicators and task running indicators when the task runs on the pre-migration core; the K micro-architecture indicators and task running indicators when the task runs on the post-migration core; and the actual performance speedup.
  • the model evaluation module 322 may also store the predicted performance speedup determined when the task is run on the pre-migration core into the sample storage module 325 .
  • the model evaluation module 322 can also store the type of the model, such as a linear regression type or a neural network type.
  • the actual performance speedup is compared to the predicted performance speedup.
  • the comparison result may be obtained by comparison by the model evaluation module 322, where the comparison result may represent an error or the like, for example.
  • a Mean Absolute Error (MAE) between the actual performance speedup and the predicted performance speedup within a period of time may be determined.
  • the deviation between the actual performance speedup and the predicted performance speedup may be determined, expressed as a percentage.
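Both comparison results mentioned above can be computed in a few lines of pure Python (illustrative sketch; function names are assumptions):

```python
def mean_absolute_error(actual, predicted):
    """MAE between actual and predicted speedups over a period of time."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def deviation_percent(actual, predicted):
    """Deviation between one actual and one predicted speedup, as a percentage."""
    return abs(actual - predicted) / actual * 100
```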
  • the operations at block 650 may be performed by model evaluation module 322.
  • the error threshold may be the training error obtained when the model was trained. If the MAE between the actual performance speedup and the predicted performance speedup over a period of time exceeds this training error, it may be determined that the error threshold is exceeded, and the number of tasks exceeding the error threshold is increased by 1.
  • the error threshold may be a preset percentage value (for example, 5%). If the deviation between the actual performance speedup and the predicted performance speedup exceeds 5%, it may be determined that the error threshold is exceeded, and the number of tasks exceeding the error threshold is increased by 1.
  • the preset time period can be one day, one hour, or another longer or shorter duration; similarly, the times threshold can be 5 times, 20 times, or any greater or smaller number of times. In this way, the cumulative accuracy of the model within a preset time period can be determined.
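The threshold-counting logic described above might be sketched as follows; the class and method names are assumptions, not the disclosed implementation:

```python
class ModelAgingMonitor:
    """Counts tasks whose prediction error exceeds the error threshold; once
    the count passes the times threshold within the window, the model is
    considered aged and retraining should be triggered."""

    def __init__(self, error_threshold, times_threshold):
        self.error_threshold = error_threshold
        self.times_threshold = times_threshold
        self.exceed_count = 0

    def report(self, error):
        """Record one task's prediction error; return True if retraining is needed."""
        if error > self.error_threshold:
            self.exceed_count += 1
        return self.exceed_count > self.times_threshold

    def reset_window(self):
        """Called when the preset time period (e.g. one hour) elapses."""
        self.exceed_count = 0
```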
  • the process may proceed to block 601 and continue to perform the operation at block 610 for the next task.
  • the model evaluation module 322 can perform an online evaluation on the model to determine the accuracy of the model during runtime, thereby determining whether the model has aged, and further determining whether model retraining is required.
  • model retraining occurs.
  • the model retraining and online updating module 323 can retrain the model based on the sample set from the sample storage module 325 .
  • the retrained model can be obtained through retraining based on the K microarchitecture indicators and the actual performance acceleration ratio of the task running on the migrated core, combined with the type of the model (such as linear regression type).
  • model parameters such as weight coefficients can be calibrated and updated through retraining.
  • when the model evaluation module 322 determines that the model has aged, a sample set is extracted from the sample storage module 325 and provided to the model retraining and online update module 323, thereby triggering the model retraining and online update module 323 to retrain the model.
  • the operation at block 670 is similar to the aforementioned operation at block 570 in Figure 5 , and will not be described again to avoid duplication. For example, if it is determined at block 670 that the requirements are not met, then return to block 660. If it is determined at block 670 that the requirements are met, proceed to block 680.
  • an updated load signature identification model is obtained. For example, a retrained load feature identification model that meets the requirements can be obtained.
  • the updated load characteristic identification model may be provided to the model inference calculation module 321 for subsequent task scheduling.
  • embodiments of the present disclosure provide a feedback-based calibration mechanism for the online load characteristic identification model.
  • the microarchitecture indicators and operating indicators of tasks running on different cores of heterogeneous systems can be collectively referred to as feedback information.
  • the model can be retrained based on the feedback information before and after the task is transferred between different cores of the heterogeneous system, thereby obtaining an updated model. This can achieve online calibration of the model and improve the accuracy of the model.
  • the heterogeneous computing power scheduler 324 in the embodiment of the present disclosure can be understood as an enhanced OS kernel scheduler.
  • on the one hand, the model inference calculation module 321 can be called to determine the running characteristics of the task, such as predicting the performance speedup of the task between heterogeneous computing power; on the other hand, after the task migration, feedback information can also be sent to the model evaluation module 322, which works together with the model retraining and online update module 323 to implement dynamic model calibration with online feedback.
  • embodiments of the present disclosure can solve the problem of reduced accuracy of the model during deployment and use, and can ensure the accuracy of the model through online data collection, analysis and evaluation, and retraining.
  • the application metric 303 may include power consumption when the task is running on the core.
  • a power consumption model can be obtained through training, and the power consumption model can be used for comprehensive scheduling of power consumption.
  • the load characteristic identification model 334 and the power consumption model may have different model types, different model parameters, etc. The two can be independent of each other.
  • the load characteristic identification model can provide predicted operating characteristics of tasks, providing a reliable reference for task scheduling and thereby making task scheduling more accurate. Accordingly, full utilization of hardware resources can be achieved.
  • Figure 7 shows a schematic block diagram of an apparatus 700 for task scheduling according to some embodiments of the present disclosure.
  • Device 700 can be implemented by software, hardware, or a combination of both.
  • the apparatus 700 includes an acquisition module 710, a prediction module 720 and a scheduling module 730.
  • the obtaining module 710 may be configured to obtain a plurality of first PMU indicators when the task runs on the first core of the heterogeneous system.
  • the prediction module 720 is configured to input a plurality of first PMU indicators into the pre-generated load characteristic identification model to obtain the predicted operating characteristics of the task.
  • the scheduling module 730 is configured to schedule tasks based on predicted operating characteristics.
  • the apparatus 700 further includes: a task migration module configured to migrate the task to run on the second core of the heterogeneous system; and a post-migration information acquisition module configured to acquire a plurality of second PMU indicators and a second task running indicator when the task runs on the second core of the heterogeneous system.
  • the second task running indicators include a second running time and/or a second application performance indicator.
  • the obtaining module 710 is further configured to: obtain a first task running indicator when the task runs on the first core, where the first task running indicator includes a first running time and/or a first application performance indicator.
  • the first application performance indicator may include first throughput and/or first IPC.
  • the apparatus 700 further includes an actual operating characteristic determining module configured to determine the actual operating characteristics of the task based on the first task operating indicator and the second task operating indicator.
  • the apparatus 700 further includes a storage module configured to store at least one of the following: a plurality of first PMU indicators, a plurality of second PMU indicators, a first task running indicator, a second task running indicator, a prediction Operating characteristics, or actual operating characteristics.
  • the apparatus 700 further includes a model update module configured to determine the number of tasks in which the error exceeds the error threshold if the error between the predicted operating characteristics and the actual operating characteristics exceeds the error threshold; and if the number of tasks exceeds the number threshold, the load characteristic identification model is updated based on the second PMU indicator and actual operating characteristics.
  • the apparatus 700 further includes: an accuracy determination module configured to determine whether the updated load characteristic identification model meets accuracy requirements; and a model update module configured to regenerate the load characteristic identification model if the updated load characteristic identification model does not meet the accuracy requirements.
  • the apparatus 700 further includes: an initial training set building module configured to build an initial training set.
  • the initial training set includes a plurality of initial data items, and each initial data item includes a first number of PMU indicators when a task runs on the first core or the second core of the heterogeneous system, and a target value, where the target value indicates the operating characteristics of the task;
  • an updated training set construction module configured to construct an updated training set based on the initial training set, where the updated training set includes a plurality of updated data items, each updated data item includes a second number of PMU indicators and a target value, and the second number is less than the first number; and a model generation module configured to generate the load characteristic identification model based on the updated training set.
  • the initial training set construction module includes: a first acquisition sub-module configured to acquire the first number of PMU indicators; a second acquisition sub-module configured to acquire a first application performance indicator when the task runs on the first core; a third acquisition sub-module configured to acquire a second application performance indicator when the task runs on the second core; a target value determination sub-module configured to determine the target value based on the first application performance indicator and the second application performance indicator; and an initial training set construction sub-module configured to construct the initial training set based on the first number of PMU indicators and the target value.
  • the updating training set building module includes: a clustering sub-module configured to divide the first number of PMU indicators into multiple clusters; an extraction sub-module configured to extract the second number of PMU indicators from the multiple clusters; and an update training set construction sub-module configured to build the updated training set based on the second number of PMU indicators and the target values.
  • the clustering submodule includes: a first correlation determination unit configured to determine a first correlation between pairs of the first number of PMU indicators; and a clustering unit configured to cluster the first number of PMU indicators based on a correlation threshold to obtain multiple clusters, where the first correlation between any two PMU indicators located in the same cluster is not lower than the correlation threshold, or where the mean of the first correlations between every two PMU indicators located in the same cluster is not lower than the correlation threshold.
  • the first correlation includes at least one of the following: covariance, Euclidean distance, or Pearson correlation coefficient.
  • the extraction sub-module includes: a second correlation determining unit configured to determine a second correlation between each PMU indicator in the first number of PMU indicators and the target value; a sorting unit configured to sort the first number of PMU indicators based on the second correlation; and an extraction unit configured to extract, in descending order of the second correlation, the second number of PMU indicators from a second number of clusters among the plurality of clusters.
  • the extraction sub-module further includes an adjustment unit configured to replace the first PMU indicator with a second PMU indicator in the first cluster based on user preferences or adjustment instructions from the user.
  • the apparatus 700 further includes a receiving module configured to receive input information from the user, the input information indicating the second quantity and the relevance threshold.
  • the model generation module is configured to: generate a load feature identification model based on the updated training set and a model configuration, where the model configuration includes a model type of the load feature identification model and/or hyperparameters of the load feature identification model.
  • the model type includes one or more of the following machine learning categories: supervised learning or unsupervised learning, where supervised learning includes at least one of the following: a linear regression type or a neural network type, and unsupervised learning includes at least one of the following: a K-nearest-neighbors type or an expectation-maximization type.
  • the apparatus 700 further includes a testing module configured to test the load feature identification model based on the test set.
  • the operating characteristics include a performance speedup ratio between the first core and the second core.
  • the device 700 in Figure 7 can be used to implement the various processes described above in conjunction with Figures 2 to 6. For the sake of brevity, they will not be described again here.
  • Figure 8 shows a schematic block diagram of an apparatus 800 for task scheduling according to some embodiments of the present disclosure.
  • the device 800 can be implemented by software, hardware, or a combination of both.
  • the apparatus 800 includes an initial training set construction module 810 , an updated training set construction module 820 and a model generation module 830 .
  • the initial training set building module 810 may be configured to build an initial training set, the initial training set including a plurality of initial data items, each initial data item including a first number of PMU indicators and a target value obtained when the task runs on the first core or the second core of the heterogeneous system, the target value indicating the operating characteristics of the task.
  • the update training set building module 820 is configured to build an updated training set based on the initial training set, the updated training set including a plurality of update data items, each update data item including a second number of PMU indicators and target values, where the second number is less than the first number.
  • the model generation module 830 is configured to generate a load feature identification model based on the updated training set.
  • the initial training set building module 810 may include: a first acquisition sub-module configured to acquire a first number of PMU indicators; a second acquisition sub-module configured to acquire a first application performance indicator when the task runs on the first core; a third acquisition sub-module configured to acquire a second application performance indicator when the task runs on the second core; a target value determination sub-module configured to determine the target value based on the first application performance indicator and the second application performance indicator; and an initial training set construction sub-module configured to build the initial training set based on the first number of PMU indicators and the target value.
  • the updating training set construction module 820 may include: a clustering sub-module configured to divide the first number of PMU indicators into multiple clusters; an extraction sub-module configured to extract the second number of PMU indicators from the multiple clusters; and an update training set construction sub-module configured to build the updated training set based on the second number of PMU indicators and target values.
  • the clustering sub-module may include: a first correlation determining unit configured to determine a first correlation between pairs of the first number of PMU indicators; and a clustering unit configured to cluster the first number of PMU indicators based on a correlation threshold to obtain multiple clusters, where the first correlation between any two PMU indicators located in the same cluster is not lower than the correlation threshold, or where the mean of the first correlations between every two PMU indicators located in the same cluster is not lower than the correlation threshold.
  • the first correlation includes at least one of the following: covariance, Euclidean distance, or Pearson correlation coefficient.
  • the extraction sub-module may include: a second correlation determining unit configured to determine a second correlation between each PMU indicator in the first number of PMU indicators and the target value; a sorting unit configured to sort the first number of PMU indicators based on the second correlation; and an extraction unit configured to extract, in descending order of the second correlation, the second number of PMU indicators from a second number of clusters among the plurality of clusters.
  • the second correlation includes at least one of the following: covariance, Euclidean distance, or Pearson correlation coefficient.
  • the extraction sub-module may further include an adjustment unit configured to replace the first PMU indicator with a second PMU indicator in the first cluster based on user preferences or adjustment instructions from the user.
  • the apparatus 800 may further include a receiving module configured to receive input information from the user, the input information indicating the second quantity and the relevance threshold.
  • the model generation module 830 may be configured to generate a load feature identification model based on the updated training set and a model configuration, where the model configuration includes a model type of the load feature identification model and/or hyperparameters of the load feature identification model.
  • the model type may include one or more of the following machine learning categories: supervised learning or unsupervised learning, where supervised learning includes at least one of the following: a linear regression type or a neural network type, and unsupervised learning includes at least one of the following: a K-nearest-neighbors type or an expectation-maximization type.
  • the apparatus 800 may further include a testing module configured to test the load feature identification model based on the test set.
  • the operating characteristics include a performance speedup ratio between the first core and the second core.
  • the device 800 in Figure 8 can be used to implement the various processes described above in conjunction with Figures 4 to 5. For the sake of simplicity, they will not be described again here.
  • each functional unit in the disclosed embodiments may be integrated into one unit, may exist physically alone, or two or more units may be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • Figure 9 shows a schematic block diagram of an example device 900 that may be used to implement embodiments of the present disclosure.
  • Device 900 may be implemented as or included in heterogeneous system 100 of FIG. 1 .
  • device 900 includes a multi-core processor 901, a read-only memory (Read-Only Memory, ROM) 902, and a random access memory (Random Access Memory, RAM) 903.
  • the multi-core processor 901 can perform various appropriate actions and processes according to computer program instructions stored in the ROM 902 and/or RAM 903, or loaded from the storage unit 908 into the ROM 902 and/or RAM 903. Various programs and data required for the operation of device 900 may also be stored in the ROM 902 and/or RAM 903.
  • Multi-core processor 901 and ROM 902 and/or RAM 903 are connected to each other through bus 904.
  • An input/output (I/O) interface 905 is also connected to bus 904.
  • Multiple components in device 900 are connected to the I/O interface 905, including: an input unit 906, such as a keyboard or mouse; an output unit 907, such as various types of displays and speakers; a storage unit 908, such as a magnetic disk or optical disc; and a communication unit 909, such as a network card, modem, or wireless communication transceiver.
  • the communication unit 909 allows the device 900 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.
  • Multi-core processor 901 may include multiple cores, each of which may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a Digital Signal Processor (DSP), and any appropriate processor, controller, microcontroller, etc.; these may accordingly be called computing units. Multi-core processor 901 performs the various methods and processes described above. For example, in some embodiments, the various processes described above may be implemented as a computer software program that is tangibly embodied in a computer-readable medium, such as storage unit 908.
  • part or all of the computer program may be loaded and/or installed onto device 900 via ROM 902 and/or RAM 903 and/or communication unit 909.
  • When a computer program is loaded into ROM 902 and/or RAM 903 and executed by multi-core processor 901, one or more steps of the processes described above may be performed.
  • the multi-core processor 901 may be configured to perform the various processes described above in any other suitable manner (eg, via firmware).
  • the device 900 in FIG. 9 may be implemented as a computing device, or may be implemented as a chip or chip system in the computing device, which is not limited by the embodiments of the present disclosure.
  • Embodiments of the present disclosure also provide a chip, which may include an input interface, an output interface, and a processing circuit.
  • the input interface and the output interface can exchange signaling or data, and the processing circuit can generate and process signaling or data information.
  • Embodiments of the present disclosure also provide a chip system, including a processor, for supporting a computing device to implement the functions involved in any of the above embodiments.
  • the chip system may also include a memory for storing necessary program instructions and data.
  • when the processor runs the program instructions, the device in which the chip system is installed can implement any of the above-mentioned embodiments.
  • the chip system may be composed of one or more chips, or may include chips and other discrete devices.
  • Embodiments of the present disclosure also provide a processor for coupling with a memory, and the memory stores instructions. When the processor executes the instructions, the processor performs the methods and functions involved in any of the above embodiments.
  • Embodiments of the present disclosure also provide a computer program product containing instructions, which, when run on a computer, causes the computer to perform the methods and functions involved in any of the above embodiments.
  • Embodiments of the present disclosure also provide a computer-readable storage medium on which computer instructions are stored.
  • the processor executes the instructions, the processor is caused to perform the methods and functions involved in any of the above embodiments.
  • various embodiments of the present disclosure may be implemented in hardware or special-purpose circuitry, software, logic, or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software, which may be executed by a controller, microprocessor, or other computing device. Although various aspects of embodiments of the present disclosure are shown and described as block diagrams, flowcharts, or using some other graphical representation, it should be understood that the blocks, devices, systems, techniques, or methods described herein may be implemented as, by way of non-limiting example, hardware, software, firmware, special-purpose circuitry or logic, general-purpose hardware, controllers, other computing devices, or some combination thereof.
  • the present disclosure also provides at least one computer program product tangibly stored on a non-transitory computer-readable storage medium.
  • the computer program product includes computer-executable instructions, for example instructions included in program modules, which are executed in a device on a real or virtual processor of the target to perform the process/method as described above with reference to the accompanying drawings.
  • program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • the functionality of program modules may be combined or split between program modules as desired.
  • Machine-executable instructions for program modules can be executed locally or in a distributed device. In a distributed device, program modules can be located in both local and remote storage media.
  • Computer program code for implementing the methods of the present disclosure may be written in one or more programming languages. These computer program codes may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, so that when the program code is executed by the computer or other programmable data processing device, the functions/operations specified in the flowcharts and/or block diagrams are implemented.
  • the program code may execute entirely on the computer, partly on the computer, as a stand-alone software package, partly on the computer and partly on a remote computer or entirely on the remote computer or server.
  • computer program code or related data may be carried by any suitable carrier to enable a device, apparatus, or processor to perform the various processes and operations described above.
  • carriers include signals, computer-readable media, and the like.
  • signals may include electrical, optical, radio, acoustic, or other forms of propagated signals, such as carrier waves, infrared signals, and the like.
  • a computer-readable medium may be any tangible medium that contains or stores a program for or in connection with an instruction execution system, apparatus, or device.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • Computer-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination thereof. More detailed examples of computer-readable storage media include an electrical connection with one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), Erasable Programmable Read-Only Memory (EPROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.


Abstract

Embodiments of the present disclosure provide a task scheduling method, a model generation method, and an electronic device. The task scheduling method includes: obtaining a plurality of first PMU indicators while a task runs on a first core of a heterogeneous system; inputting the plurality of first PMU indicators into a pre-generated load feature identification model to obtain a predicted operating characteristic of the task; and scheduling the task based on the predicted operating characteristic. In this way, the load feature identification model can provide a predicted operating characteristic of the task, thereby offering a reliable reference for scheduling and making task scheduling more accurate. Accordingly, hardware resources can be fully utilized.

Description

Task scheduling method, model generation method, and electronic device
Technical Field
The present disclosure relates to the field of terminal technologies, and more specifically, to a task scheduling method and apparatus, a model generation method and apparatus, an electronic device, a computer-readable storage medium, a chip, and a computer program product.
Background
Terminals such as mobile phones and personal computers, as well as servers, commonly use multi-core processing architectures. In addition, for performance and power reasons, more and more hardware adopts processors with heterogeneous computing power, such as big cores and little cores. In general, a big core has strong computing power but high power consumption, while a little core has less computing power but low power consumption. When a task runs on a multi-core processing architecture with heterogeneous computing power, scheduling the task onto suitable hardware resources can fully realize the value of that heterogeneous computing power. However, on which factors task scheduling should accurately be based remains one of the problems to be solved.
Summary
Example embodiments of the present disclosure provide a solution for scheduling a task based on its operating characteristics.
In a first aspect, a task scheduling method is provided. The method includes: obtaining a plurality of first Performance Monitoring Unit (PMU) indicators while a task runs on a first core of a heterogeneous system; inputting the plurality of first PMU indicators into a pre-generated load feature identification model to obtain a predicted operating characteristic of the task; and scheduling the task based on the predicted operating characteristic.
In this way, the load feature identification model can provide a predicted operating characteristic of the task, thereby offering a reliable reference for scheduling and making task scheduling more accurate. Accordingly, hardware resources can be fully utilized.
In some embodiments of the first aspect, the method further includes: migrating the task to run on a second core of the heterogeneous system; and obtaining a plurality of second PMU indicators and a second task running indicator while the task runs on the second core, where the second task running indicator includes a second running time and/or a second application performance indicator.
In this way, running indicators of the task on different cores can be collected separately during actual execution, and can then be used to evaluate the model.
In some embodiments of the first aspect, the method further includes: obtaining a first task running indicator while the task runs on the first core, where the first task running indicator includes a first running time and/or a first application performance indicator. Optionally, the first application performance indicator may include a first throughput and/or a first instructions-per-cycle (Instructions per cycle, IPC) value.
In some embodiments of the first aspect, the method further includes: determining an actual operating characteristic of the task based on the first task running indicator and the second task running indicator.
In some embodiments of the first aspect, the method further includes storing at least one of the following: the plurality of first PMU indicators, the plurality of second PMU indicators, the first task running indicator, the second task running indicator, the predicted operating characteristic, or the actual operating characteristic.
In this way, an online dataset can be built from actual task execution, which can then be used to update the model online.
In some embodiments of the first aspect, the method further includes: if the error between the predicted operating characteristic and the actual operating characteristic exceeds an error threshold, determining the number of tasks for which the error exceeds the error threshold; and if that number of tasks exceeds a count threshold, updating the load feature identification model based on the second PMU indicators and the actual operating characteristics.
In this way, whether the model's accuracy has degraded can be evaluated online, and the model can be updated online in real time, ensuring the accuracy of the model in use and thus the accuracy of task scheduling.
In some embodiments of the first aspect, the method further includes: determining whether the updated load feature identification model meets accuracy requirements; and if it is determined that the accuracy requirements are not met, regenerating the load feature identification model.
In this way, the updated model can be evaluated for accuracy to ensure the model's correctness.
In some embodiments of the first aspect, the load feature identification model is generated through the following process: building an initial training set, where the initial training set includes a plurality of initial data items, and each initial data item includes a first number of PMU indicators and a target value obtained while a task runs on the first core or the second core of the heterogeneous system, the target value indicating an operating characteristic of the task; building an updated training set based on the initial training set, where the updated training set includes a plurality of updated data items, each including a second number of PMU indicators and the target value, the second number being smaller than the first number; and generating the load feature identification model based on the updated training set.
In this way, the load feature identification model can be generated by training on the training set, and the model can provide relatively accurate operating characteristics of tasks for use in task scheduling.
In some embodiments of the first aspect, building the initial training set includes: obtaining the first number of PMU indicators; obtaining a first application performance indicator while the task runs on the first core; obtaining a second application performance indicator while the task runs on the second core; determining the target value based on the first application performance indicator and the second application performance indicator; and building the initial training set based on the first number of PMU indicators and the target value.
In this way, the target value can serve as the label in supervised learning, which facilitates the model training process and improves the efficiency of model generation.
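The steps above can be sketched as follows. This is a minimal illustration assuming the target value is defined as the ratio of the application performance (e.g., IPC) measured on the second core to that on the first core; the field names and this particular speedup-ratio definition are illustrative assumptions, not a format prescribed by the disclosure:

```python
# Hypothetical sketch: build one initial training-set item per task.

def build_initial_item(pmu_indicators, perf_core1, perf_core2):
    """pmu_indicators: dict mapping PMU event name -> counter value
    (the "first number" of indicators); perf_core1/perf_core2:
    application performance (e.g., IPC) measured on each core."""
    # Target value: assumed performance speedup ratio between the two cores.
    target = perf_core2 / perf_core1
    return {"pmu": dict(pmu_indicators), "target": target}

initial_training_set = [
    build_initial_item({"cycles": 1.2e9, "cache_misses": 3.4e6, "branch_misses": 1.1e6},
                       perf_core1=0.8, perf_core2=1.6),
    build_initial_item({"cycles": 2.0e9, "cache_misses": 9.9e6, "branch_misses": 4.0e6},
                       perf_core1=1.0, perf_core2=1.2),
]
```

Each item thus pairs the first number of PMU indicators with a supervised-learning label, matching the training-set structure described above.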
In some embodiments of the first aspect, building the updated training set based on the initial training set includes: dividing the first number of PMU indicators into a plurality of clusters; extracting the second number of PMU indicators from the plurality of clusters; and building the updated training set based on the second number of PMU indicators and the target value.
In this way, the updated training set contains fewer PMU indicators, which reduces the scale of training and improves training efficiency. Moreover, during subsequent model inference, fewer PMU indicators need to be collected, which reduces kernel and/or processor monitoring overhead and lowers energy consumption.
In some embodiments of the first aspect, dividing the first number of PMU indicators into a plurality of clusters includes: determining a first correlation between each pair of the first number of PMU indicators; and clustering the first number of PMU indicators based on a correlation threshold to obtain the plurality of clusters, where the first correlation between any two PMU indicators in the same cluster is not lower than the correlation threshold, or where the mean of the first correlations between every two PMU indicators in the same cluster is not lower than the correlation threshold. In some embodiments of the first aspect, the first correlation includes at least one of the following: covariance, Euclidean distance, or the Pearson correlation coefficient.
In this way, the PMU indicators can be clustered by correlation so that indicators in the same cluster behave similarly, which facilitates the extraction of PMU indicators.
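A minimal sketch of this threshold-based clustering, assuming the Pearson correlation coefficient as the first correlation and a simple greedy grouping; the disclosure does not prescribe a specific clustering algorithm, so the greedy strategy and the sample values below are illustrative assumptions:

```python
# Greedy correlation clustering of PMU indicators (illustrative only).

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def cluster_indicators(columns, threshold):
    """columns: dict of PMU indicator name -> list of per-task values.
    An indicator joins a cluster only if its absolute correlation with
    every existing member is at least `threshold`."""
    clusters = []
    for name, values in columns.items():
        for cluster in clusters:
            if all(abs(pearson(values, columns[m])) >= threshold for m in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

cols = {
    "cycles":       [1.0, 2.0, 3.0, 4.0],
    "instructions": [2.0, 4.1, 5.9, 8.0],   # tracks cycles closely
    "cache_misses": [9.0, 1.0, 8.0, 2.0],   # weakly correlated with cycles
}
clusters = cluster_indicators(cols, threshold=0.9)
```

Here "cycles" and "instructions" land in one cluster while "cache_misses" forms its own, reflecting the rule that same-cluster indicators must all be correlated above the threshold.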
In some embodiments of the first aspect, extracting the second number of PMU indicators from the plurality of clusters includes: determining a second correlation between each of the first number of PMU indicators and the target value; sorting the first number of PMU indicators based on the second correlation; and extracting, in descending order of the second correlation, the second number of PMU indicators from a second number of clusters among the plurality of clusters.
In this way, a more representative set of the second number of PMU indicators can be extracted for the model training process, making the generated model more accurate.
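A sketch of the extraction step under the same assumptions: indicators are ranked by their absolute correlation with the target value, and the second number of indicators is drawn from distinct clusters in descending order of that correlation. The data values and tie-breaking behavior are illustrative:

```python
# Illustrative extraction of a reduced PMU indicator set.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def extract_indicators(columns, target, clusters, second_number):
    """Pick `second_number` indicators, at most one per cluster, in
    descending order of absolute correlation with the target value."""
    cluster_of = {name: i for i, cluster in enumerate(clusters) for name in cluster}
    ranked = sorted(columns, key=lambda n: abs(pearson(columns[n], target)),
                    reverse=True)
    picked, used = [], set()
    for name in ranked:
        if cluster_of[name] not in used:
            picked.append(name)
            used.add(cluster_of[name])
        if len(picked) == second_number:
            break
    return picked

cols = {
    "cycles":       [1.0, 2.0, 3.0, 4.0],
    "instructions": [2.0, 4.1, 5.9, 8.0],
    "cache_misses": [9.0, 1.0, 8.0, 2.0],
}
target = [1.1, 2.0, 3.2, 3.9]                       # speedup-ratio labels
clusters = [["cycles", "instructions"], ["cache_misses"]]
reduced = extract_indicators(cols, target, clusters, second_number=2)
```

Because "cycles" and "instructions" share a cluster, only the higher-ranked of the two is kept, and the second pick comes from the other cluster.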
In some embodiments of the first aspect, the second number of PMU indicators includes a first PMU indicator extracted from a first cluster of the plurality of clusters, and the method further includes: replacing the first PMU indicator with a second PMU indicator from the first cluster based on user preferences or an adjustment instruction from the user.
In this way, the model's inputs can also be adjusted according to the user's needs, satisfying personalized requirements and increasing the model's flexibility.
In some embodiments of the first aspect, the method further includes: receiving input information from the user, the input information indicating the second number and the correlation threshold.
In some embodiments of the first aspect, generating the load feature identification model based on the updated training set includes: generating the load feature identification model based on the updated training set and a model configuration, where the model configuration includes the model type and/or the hyperparameters of the load feature identification model. In some embodiments of the first aspect, the model type includes at least one of the following machine learning categories: supervised learning or unsupervised learning, where supervised learning includes at least one of a linear regression type or a neural network type, and unsupervised learning includes at least one of a k-nearest-neighbors type or an expectation-maximization type.
In this way, the input model configuration provides the necessary information for the model to be trained, facilitating the model training process.
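As an illustration of training from the updated training set, the sketch below fits a linear regression (one of the supervised model types named above) by gradient descent, mapping the reduced PMU indicators to the speedup-ratio target. The learning rate and epoch count stand in for the hyperparameters in the model configuration; all values are assumptions for demonstration:

```python
# Minimal linear-regression trainer (stochastic gradient descent),
# illustrative only; a production system might use any configured model type.

def train_linear_model(X, y, lr=0.01, epochs=5000):
    """X: list of feature rows (reduced PMU indicators), y: target values.
    Returns (weights, bias) of a linear model y ~ w.x + b."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for row, target in zip(X, y):
            pred = sum(wi * xi for wi, xi in zip(w, row)) + b
            err = pred - target
            # Gradient step for squared error.
            for i in range(n_features):
                w[i] -= lr * err * row[i]
            b -= lr * err
    return w, b

# Tiny example with one normalized PMU feature; target = 2*x + 1.
X = [[0.0], [0.5], [1.0]]
y = [1.0, 2.0, 3.0]
w, b = train_linear_model(X, y)
```

The recovered weight and bias approximate the generating function, showing how a configured model type plus hyperparameters yields a trained load feature identification model.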
In some embodiments of the first aspect, the method further includes: testing the load feature identification model on a test set, where the tested load feature identification model meets accuracy requirements. In this way, the generated load feature identification model can also be tested to ensure its accuracy.
In some embodiments of the first aspect, the operating characteristic includes a performance speedup ratio between the first core and the second core.
In a second aspect, a model generation method is provided. The method includes: building an initial training set, where the initial training set includes a plurality of initial data items, and each initial data item includes a first number of PMU indicators and a target value obtained while a task runs on a first core or a second core of a heterogeneous system, the target value indicating an operating characteristic of the task; building an updated training set based on the initial training set, where the updated training set includes a plurality of updated data items, each including a second number of PMU indicators and the target value, the second number being smaller than the first number; and generating a load feature identification model based on the updated training set.
In some embodiments of the second aspect, building the initial training set includes: obtaining the first number of PMU indicators; obtaining a first application performance indicator while the task runs on the first core; obtaining a second application performance indicator while the task runs on the second core; determining the target value based on the first application performance indicator and the second application performance indicator; and building the initial training set based on the first number of PMU indicators and the target value.
In some embodiments of the second aspect, building the updated training set based on the initial training set includes: dividing the first number of PMU indicators into a plurality of clusters; extracting the second number of PMU indicators from the plurality of clusters; and building the updated training set based on the second number of PMU indicators and the target value.
In some embodiments of the second aspect, dividing the first number of PMU indicators into a plurality of clusters includes: determining a first correlation between each pair of the first number of PMU indicators; and clustering the first number of PMU indicators based on a correlation threshold to obtain the plurality of clusters, where the first correlation between any two PMU indicators in the same cluster is not lower than the correlation threshold, or where the mean of the first correlations between every two PMU indicators in the same cluster is not lower than the correlation threshold.
In some embodiments of the second aspect, the first correlation includes at least one of the following: covariance, Euclidean distance, or the Pearson correlation coefficient.
In some embodiments of the second aspect, extracting the second number of PMU indicators from the plurality of clusters includes: determining a second correlation between each of the first number of PMU indicators and the target value; sorting the first number of PMU indicators based on the second correlation; and extracting, in descending order of the second correlation, the second number of PMU indicators from a second number of clusters among the plurality of clusters.
In some embodiments of the second aspect, the second number of PMU indicators includes a first PMU indicator extracted from a first cluster of the plurality of clusters, and the method further includes: replacing the first PMU indicator with a second PMU indicator from the first cluster based on user preferences or an adjustment instruction from the user.
In some embodiments of the second aspect, the method further includes: receiving input information from the user, the input information indicating the second number and the correlation threshold.
In some embodiments of the second aspect, generating the load feature identification model based on the updated training set includes: generating the load feature identification model based on the updated training set and a model configuration, where the model configuration includes the model type and/or the hyperparameters of the load feature identification model.
In some embodiments of the second aspect, the model type includes one or more of the following machine learning categories: supervised learning or unsupervised learning, where supervised learning includes at least one of a linear regression type or a neural network type, and unsupervised learning includes at least one of a k-nearest-neighbors type or an expectation-maximization type.
In some embodiments of the second aspect, the method further includes: testing the load feature identification model on a test set, where the tested load feature identification model meets accuracy requirements.
In some embodiments of the second aspect, the operating characteristic includes a performance speedup ratio between the first core and the second core.
In a third aspect, an apparatus for task scheduling is provided. The apparatus includes: an acquisition module configured to obtain a plurality of first PMU indicators while a task runs on a first core of a heterogeneous system; a prediction module configured to input the plurality of first PMU indicators into a pre-generated load feature identification model to obtain a predicted operating characteristic of the task; and a scheduling module configured to schedule the task based on the predicted operating characteristic.
In some embodiments of the third aspect, the apparatus further includes: a task migration module configured to migrate the task to run on a second core of the heterogeneous system; and a post-migration information acquisition module configured to obtain a plurality of second PMU indicators and a second task running indicator while the task runs on the second core, the second task running indicator including a second running time and/or a second application performance indicator.
In some embodiments of the third aspect, the acquisition module is further configured to obtain a first task running indicator while the task runs on the first core, the first task running indicator including a first running time and/or a first application performance indicator. Optionally, the first application performance indicator may include a first throughput and/or a first IPC.
In some embodiments of the third aspect, the apparatus further includes an actual operating characteristic determination module configured to determine an actual operating characteristic of the task based on the first task running indicator and the second task running indicator.
In some embodiments of the third aspect, the apparatus further includes a storage module configured to store at least one of the following: the plurality of first PMU indicators, the plurality of second PMU indicators, the first task running indicator, the second task running indicator, the predicted operating characteristic, or the actual operating characteristic.
In some embodiments of the third aspect, the apparatus further includes a model update module configured to: if the error between the predicted operating characteristic and the actual operating characteristic exceeds an error threshold, determine the number of tasks for which the error exceeds the error threshold; and if that number exceeds a count threshold, update the load feature identification model based on the second PMU indicators and the actual operating characteristics.
In some embodiments of the third aspect, the apparatus further includes: an accuracy determination module configured to determine whether the updated load feature identification model meets accuracy requirements; and a model update module configured to regenerate the load feature identification model if it is determined that the accuracy requirements are not met.
In some embodiments of the third aspect, the apparatus further includes: an initial training set construction module configured to build an initial training set, where the initial training set includes a plurality of initial data items, and each initial data item includes a first number of PMU indicators and a target value obtained while a task runs on the first core or the second core of the heterogeneous system, the target value indicating an operating characteristic of the task; an updated training set construction module configured to build an updated training set based on the initial training set, where the updated training set includes a plurality of updated data items, each including a second number of PMU indicators and the target value, the second number being smaller than the first number; and a model generation module configured to generate a load feature identification model based on the updated training set.
In some embodiments of the third aspect, the initial training set construction module includes: a first acquisition sub-module configured to obtain the first number of PMU indicators; a second acquisition sub-module configured to obtain a first application performance indicator while the task runs on the first core; a third acquisition sub-module configured to obtain a second application performance indicator while the task runs on the second core; a target value determination sub-module configured to determine the target value based on the first application performance indicator and the second application performance indicator; and an initial training set construction sub-module configured to build the initial training set based on the first number of PMU indicators and the target value.
In some embodiments of the third aspect, the updated training set construction module includes: a clustering sub-module configured to divide the first number of PMU indicators into a plurality of clusters; an extraction sub-module configured to extract the second number of PMU indicators from the plurality of clusters; and an updated training set construction sub-module configured to build the updated training set based on the second number of PMU indicators and the target value.
In some embodiments of the third aspect, the clustering sub-module includes: a first correlation determination unit configured to determine a first correlation between each pair of the first number of PMU indicators; and a clustering unit configured to cluster the first number of PMU indicators based on a correlation threshold to obtain the plurality of clusters, where the first correlation between any two PMU indicators in the same cluster is not lower than the correlation threshold, or where the mean of the first correlations between every two PMU indicators in the same cluster is not lower than the correlation threshold.
In some embodiments of the third aspect, the first correlation includes at least one of the following: covariance, Euclidean distance, or the Pearson correlation coefficient.
In some embodiments of the third aspect, the extraction sub-module includes: a second correlation determination unit configured to determine a second correlation between each of the first number of PMU indicators and the target value; a sorting unit configured to sort the first number of PMU indicators based on the second correlation; and an extraction unit configured to extract, in descending order of the second correlation, the second number of PMU indicators from a second number of clusters among the plurality of clusters.
In some embodiments of the third aspect, the extraction sub-module further includes an adjustment unit configured to replace the first PMU indicator with a second PMU indicator from the first cluster based on user preferences or an adjustment instruction from the user.
In some embodiments of the third aspect, the apparatus further includes a receiving module configured to receive input information from the user, the input information indicating the second number and the correlation threshold.
In some embodiments of the third aspect, the model generation module is configured to generate the load feature identification model based on the updated training set and a model configuration, where the model configuration includes the model type and/or the hyperparameters of the load feature identification model.
In some embodiments of the third aspect, the model type includes one or more of the following machine learning categories: supervised learning or unsupervised learning, where supervised learning includes at least one of a linear regression type or a neural network type, and unsupervised learning includes at least one of a k-nearest-neighbors type or an expectation-maximization type.
In some embodiments of the third aspect, the apparatus further includes a testing module configured to test the load feature identification model on a test set, where the tested load feature identification model meets accuracy requirements.
In some embodiments of the third aspect, the operating characteristic includes a performance speedup ratio between the first core and the second core.
In a fourth aspect, an apparatus for model generation is provided. The apparatus includes: an initial training set construction module configured to build an initial training set, where the initial training set includes a plurality of initial data items, and each initial data item includes a first number of PMU indicators and a target value obtained while a task runs on a first core or a second core of a heterogeneous system, the target value indicating an operating characteristic of the task; an updated training set construction module configured to build an updated training set based on the initial training set, where the updated training set includes a plurality of updated data items, each including a second number of PMU indicators and the target value, the second number being smaller than the first number; and a model generation module configured to generate a load feature identification model based on the updated training set.
In some embodiments of the fourth aspect, the initial training set construction module includes: a first acquisition sub-module configured to obtain the first number of PMU indicators; a second acquisition sub-module configured to obtain a first application performance indicator while the task runs on the first core; a third acquisition sub-module configured to obtain a second application performance indicator while the task runs on the second core; a target value determination sub-module configured to determine the target value based on the first application performance indicator and the second application performance indicator; and an initial training set construction sub-module configured to build the initial training set based on the first number of PMU indicators and the target value.
In some embodiments of the fourth aspect, the updated training set construction module includes: a clustering sub-module configured to divide the first number of PMU indicators into a plurality of clusters; an extraction sub-module configured to extract the second number of PMU indicators from the plurality of clusters; and an updated training set construction sub-module configured to build the updated training set based on the second number of PMU indicators and the target value.
In some embodiments of the fourth aspect, the clustering sub-module includes: a first correlation determination unit configured to determine a first correlation between each pair of the first number of PMU indicators; and a clustering unit configured to cluster the first number of PMU indicators based on a correlation threshold to obtain the plurality of clusters, where the first correlation between any two PMU indicators in the same cluster is not lower than the correlation threshold, or where the mean of the first correlations between every two PMU indicators in the same cluster is not lower than the correlation threshold.
In some embodiments of the fourth aspect, the first correlation includes at least one of the following: covariance, Euclidean distance, or the Pearson correlation coefficient.
In some embodiments of the fourth aspect, the extraction sub-module includes: a second correlation determination unit configured to determine a second correlation between each of the first number of PMU indicators and the target value; a sorting unit configured to sort the first number of PMU indicators based on the second correlation; and an extraction unit configured to extract, in descending order of the second correlation, the second number of PMU indicators from a second number of clusters among the plurality of clusters.
In some embodiments of the fourth aspect, the extraction sub-module further includes an adjustment unit configured to replace the first PMU indicator with a second PMU indicator from the first cluster based on user preferences or an adjustment instruction from the user.
In some embodiments of the fourth aspect, the apparatus further includes a receiving module configured to receive input information from the user, the input information indicating the second number and the correlation threshold.
In some embodiments of the fourth aspect, the model generation module is configured to generate the load feature identification model based on the updated training set and a model configuration, where the model configuration includes the model type and/or the hyperparameters of the load feature identification model.
In some embodiments of the fourth aspect, the model type includes one or more of the following machine learning categories: supervised learning or unsupervised learning, where supervised learning includes at least one of a linear regression type or a neural network type, and unsupervised learning includes at least one of a k-nearest-neighbors type or an expectation-maximization type.
In some embodiments of the fourth aspect, the apparatus further includes a testing module configured to test the load feature identification model on a test set, where the tested load feature identification model meets accuracy requirements.
In some embodiments of the fourth aspect, the operating characteristic includes a performance speedup ratio between the first core and the second core.
In a fifth aspect, an electronic device is provided, including a multi-core processor and a memory storing instructions to be executed by the multi-core processor. When executed by the multi-core processor, the instructions cause the electronic device to: obtain a plurality of first PMU indicators while a task runs on a first core of a heterogeneous system; input the plurality of first PMU indicators into a pre-generated load feature identification model to obtain a predicted operating characteristic of the task; and schedule the task based on the predicted operating characteristic.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: migrate the task to run on a second core of the heterogeneous system; and obtain a plurality of second PMU indicators and a second task running indicator while the task runs on the second core, the second task running indicator including a second running time and/or a second application performance indicator.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: obtain a first task running indicator while the task runs on the first core, the first task running indicator including a first running time and/or a first application performance indicator. Optionally, the first application performance indicator may include a first throughput and/or a first IPC.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to determine an actual operating characteristic of the task based on the first task running indicator and the second task running indicator.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to store at least one of the following: the plurality of first PMU indicators, the plurality of second PMU indicators, the first task running indicator, the second task running indicator, the predicted operating characteristic, or the actual operating characteristic.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: if the error between the predicted operating characteristic and the actual operating characteristic exceeds an error threshold, determine the number of tasks for which the error exceeds the error threshold; and if that number exceeds a count threshold, update the load feature identification model based on the second PMU indicators and the actual operating characteristics.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: determine whether the updated load feature identification model meets accuracy requirements; and if it is determined that the accuracy requirements are not met, regenerate the load feature identification model.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to generate the load feature identification model through the following process: building an initial training set, where the initial training set includes a plurality of initial data items, and each initial data item includes a first number of PMU indicators and a target value obtained while a task runs on the first core or the second core of the heterogeneous system, the target value indicating an operating characteristic of the task; building an updated training set based on the initial training set, where the updated training set includes a plurality of updated data items, each including a second number of PMU indicators and the target value, the second number being smaller than the first number; and generating the load feature identification model based on the updated training set.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: obtain the first number of PMU indicators; obtain a first application performance indicator while the task runs on the first core; obtain a second application performance indicator while the task runs on the second core; determine the target value based on the first application performance indicator and the second application performance indicator; and build the initial training set based on the first number of PMU indicators and the target value.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: divide the first number of PMU indicators into a plurality of clusters; extract the second number of PMU indicators from the plurality of clusters; and build the updated training set based on the second number of PMU indicators and the target value.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: determine a first correlation between each pair of the first number of PMU indicators; and cluster the first number of PMU indicators based on a correlation threshold to obtain the plurality of clusters, where the first correlation between any two PMU indicators in the same cluster is not lower than the correlation threshold, or where the mean of the first correlations between every two PMU indicators in the same cluster is not lower than the correlation threshold.
In some embodiments of the fifth aspect, the first correlation includes at least one of the following: covariance, Euclidean distance, or the Pearson correlation coefficient.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: determine a second correlation between each of the first number of PMU indicators and the target value; sort the first number of PMU indicators based on the second correlation; and extract, in descending order of the second correlation, the second number of PMU indicators from a second number of clusters among the plurality of clusters.
In some embodiments of the fifth aspect, the second number of PMU indicators includes a first PMU indicator extracted from a first cluster of the plurality of clusters, and when executed by the multi-core processor, the instructions cause the electronic device to: replace the first PMU indicator with a second PMU indicator from the first cluster based on user preferences or an adjustment instruction from the user.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: receive input information from the user, the input information indicating the second number and the correlation threshold.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: generate the load feature identification model based on the updated training set and a model configuration, where the model configuration includes the model type and/or the hyperparameters of the load feature identification model. In some embodiments of the fifth aspect, the model type includes one or more of the following machine learning categories: supervised learning or unsupervised learning, where supervised learning includes at least one of a linear regression type or a neural network type, and unsupervised learning includes at least one of a k-nearest-neighbors type or an expectation-maximization type.
In some embodiments of the fifth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: test the load feature identification model on a test set, where the tested load feature identification model meets accuracy requirements.
In some embodiments of the fifth aspect, the operating characteristic includes a performance speedup ratio between the first core and the second core.
In a sixth aspect, an electronic device is provided, including a multi-core processor and a memory storing instructions to be executed by the multi-core processor. When executed by the multi-core processor, the instructions cause the electronic device to: build an initial training set, where the initial training set includes a plurality of initial data items, and each initial data item includes a first number of PMU indicators and a target value obtained while a task runs on a first core or a second core of a heterogeneous system, the target value indicating an operating characteristic of the task; build an updated training set based on the initial training set, where the updated training set includes a plurality of updated data items, each including a second number of PMU indicators and the target value, the second number being smaller than the first number; and generate a load feature identification model based on the updated training set.
In some embodiments of the sixth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: obtain the first number of PMU indicators; obtain a first application performance indicator while the task runs on the first core; obtain a second application performance indicator while the task runs on the second core; determine the target value based on the first application performance indicator and the second application performance indicator; and build the initial training set based on the first number of PMU indicators and the target value.
In some embodiments of the sixth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: divide the first number of PMU indicators into a plurality of clusters; extract the second number of PMU indicators from the plurality of clusters; and build the updated training set based on the second number of PMU indicators and the target value.
In some embodiments of the sixth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: determine a first correlation between each pair of the first number of PMU indicators; and cluster the first number of PMU indicators based on a correlation threshold to obtain the plurality of clusters, where the first correlation between any two PMU indicators in the same cluster is not lower than the correlation threshold, or where the mean of the first correlations between every two PMU indicators in the same cluster is not lower than the correlation threshold.
In some embodiments of the sixth aspect, the first correlation includes at least one of the following: covariance, Euclidean distance, or the Pearson correlation coefficient.
In some embodiments of the sixth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: determine a second correlation between each of the first number of PMU indicators and the target value; sort the first number of PMU indicators based on the second correlation; and extract, in descending order of the second correlation, the second number of PMU indicators from a second number of clusters among the plurality of clusters.
In some embodiments of the sixth aspect, the second number of PMU indicators includes a first PMU indicator extracted from a first cluster of the plurality of clusters, and when executed by the multi-core processor, the instructions cause the electronic device to: replace the first PMU indicator with a second PMU indicator from the first cluster based on user preferences or an adjustment instruction from the user.
In some embodiments of the sixth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: receive input information from the user, the input information indicating the second number and the correlation threshold.
In some embodiments of the sixth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: generate the load feature identification model based on the updated training set and a model configuration, where the model configuration includes the model type and/or the hyperparameters of the load feature identification model.
In some embodiments of the sixth aspect, the model type includes one or more of the following machine learning categories: supervised learning or unsupervised learning, where supervised learning includes at least one of a linear regression type or a neural network type, and unsupervised learning includes at least one of a k-nearest-neighbors type or an expectation-maximization type.
In some embodiments of the sixth aspect, when executed by the multi-core processor, the instructions cause the electronic device to: test the load feature identification model on a test set, where the tested load feature identification model meets accuracy requirements.
In some embodiments of the sixth aspect, the operating characteristic includes a performance speedup ratio between the first core and the second core.
In a seventh aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the operations of the method according to the first aspect or the second aspect or any embodiment thereof.
In an eighth aspect, a chip or chip system is provided. The chip or chip system includes a processing circuit configured to perform the operations of the method according to the first aspect or the second aspect or any embodiment thereof.
In a ninth aspect, a computer program or computer program product is provided. The computer program or computer program product is tangibly stored on a computer-readable medium and includes computer-executable instructions which, when executed, cause a device to implement the operations of the method according to the first aspect or the second aspect or any embodiment thereof.
Brief Description of the Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, where:
Figure 1 shows a schematic block diagram of a heterogeneous system to which embodiments of the present disclosure can be applied;
Figure 2 shows a schematic flowchart of a task scheduling process according to some embodiments of the present disclosure;
Figure 3 shows a schematic block diagram of a system according to some embodiments of the present disclosure;
Figure 4 shows a schematic flowchart of a process of generating a load feature identification model according to some embodiments of the present disclosure;
Figure 5 shows another schematic flowchart of a process of generating a load feature identification model according to some embodiments of the present disclosure;
Figure 6 shows a schematic flowchart of a model calibration process according to some embodiments of the present disclosure;
Figure 7 shows a schematic block diagram of an apparatus for task scheduling according to some embodiments of the present disclosure;
Figure 8 shows a schematic block diagram of an apparatus for model generation according to some embodiments of the present disclosure; and
Figure 9 shows a schematic block diagram of an example device that can be used to implement embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
In the description of the embodiments of the present disclosure, the term "include" and its variants should be understood as open-ended inclusion, i.e., "including but not limited to". The term "based on" should be understood as "at least partially based on". The terms "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first", "second", and the like may refer to different or identical objects. Other explicit and implicit definitions may also be included below.
As mentioned above, scheduling a task onto suitable hardware resources (such as cores) can make full use of hardware resources. In general, a task can be scheduled onto hardware resources based on its operating characteristics; therefore, more accurate operating characteristics of the task are needed.
Embodiments of the present disclosure provide a task scheduling solution for heterogeneous systems that can obtain more accurate operating characteristics of a task based on a pre-generated model, which can then be used for task scheduling. In this way, tasks can be scheduled onto suitable hardware resources, improving hardware utilization and task processing efficiency.
Figure 1 shows a schematic block diagram of a heterogeneous system 100 to which embodiments of the present disclosure can be applied. As shown in Figure 1, the heterogeneous system 100 includes a first core 110 and a second core 120. The first core 110 and the second core 120 belong to different hardware resources and may have different computing power; for example, the computing power of the first core 110 may be lower than that of the second core 120.
For example, the first core 110 may be a Central Processing Unit (CPU) and the second core 120 a Graphics Processing Unit (GPU). As another example, the first core 110 may be a CPU and the second core 120 a Field Programmable Gate Array (FPGA). As yet another example, the first core 110 may be a CPU and the second core 120 a Tensor Processing Unit (TPU). It should be understood that these examples of the first core 110 and the second core 120 are merely illustrative and should not be construed as limiting the embodiments of the present disclosure. In some examples, the heterogeneous system 100 may be any of various XPU heterogeneous computing platforms, also referred to simply as a heterogeneous computing platform or heterogeneous platform. Optionally, the first core 110 may be a little core and the second core 120 a big core, or vice versa. Note that the terms "big core" and "little core" in embodiments of the present disclosure are merely illustrative; for example, a big core has stronger computing power than a little core.
Referring to Figure 1, a task 130 can be scheduled (or deployed) in the heterogeneous system 100 and run on a core; for example, the task 130 may run on the first core 110. Illustratively, the task 130 may also be migrated within the heterogeneous system 100; as shown by the dashed arrow 132, the task 130 may be migrated from the first core 110 to the second core 120. It should be understood that the term "task" in the present disclosure may also be referred to as a load, a task load, and so on.
为了反应任务运行时的状态,硬件、操作***(Operating System,OS)或基础设施服务软件等可以提供各种不同的检测指标。例如可以基于专家的经验来选择合适的监测指标,通过定义策略来刻画任务的运行特征,从而在不同的硬件资源上调度任务。然而,专家经验是有限的,当面对不同算力的***和大量的指标时,专家经验所选择的监测指标可能是不合适的;另外所定义的策略也无法应对任务的变化。
示例性地,异构系统100可以具有多核处理架构,在本公开中的术语“多核处理架构”可以被称为多核处理器架构、异构多核处理架构、异构处理器架构等。示例性地,本公开中的“多核”也可以包括众核,“异构”也可以包括超异构等,本公开对此不限定。
应注意的是,图1所示出的异构系统100仅是示例,而不应解释为本公开的实施例的限制。例如异构系统100可以包括更多数量的核(例如中核等),例如异构系统100还可以包括诸如OS等其它模块,如下结合图3所述。
图2示出了根据本公开的一些实施例的任务调度的过程200的示意流程图。在框210,获取任务运行在异构系统的第一核上时的多个第一性能监控单元(Performance Monitoring Unit,PMU)指标。在框220,将多个第一PMU指标输入到预先生成的负载特征识别模型,以得到任务的预测运行特征。在框230,基于预测运行特征,对任务进行调度。
示例性地,在框210处,当任务运行在第一核时,可以获取多个第一PMU指标以及第一任务运行指标,其中第一任务运行指标例如包括第一运行时间和/或第一应用性能指标,第一应用性能指标例如可以包括第一吞吐量或者第一IPC。
在一些实施例中,可以获取预先生成的负载特征识别模型,该负载特征识别模型的输入是多个(如K个,K为正整数)PMU指标,输出为任务的运行特征。从而,在框220处,可以将所获取的多个第一PMU指标输入到负载特征识别模型,从而得到该模型的输出,即该模型所预测的任务的运行特征。举例而言,运行特征可以表示在不同核之间的性能加速比。示例性地,关于负载特征识别模型的生成过程可以参照如下结合图4至图5的实施例。
在一些实施例中,在框230处,可以基于所预测的任务的运行特征,对任务进行调度,例如可以将任务从第一核迁移到第二核。
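作为示意,框210至框230的流程可以用如下极简的Python代码表达。其中假设负载特征识别模型为线性形式,权重weights、偏置bias以及迁移阈值threshold均为本示例的假设值,并非本公开限定的实现:

```python
# 极简示意:基于线性形式的负载特征识别模型,由任务在第一核上的
# 多个PMU指标预测性能加速比,并据此决定是否将任务迁移到第二核。
# 注意:weights、bias、threshold均为假设值,仅用于说明流程。

def predict_speedup(pmu_metrics, weights, bias):
    """模型推理:对K个PMU指标做线性加权求和,得到预测的性能加速比。"""
    return sum(w * x for w, x in zip(weights, pmu_metrics)) + bias

def schedule(pmu_metrics, weights, bias, threshold=1.5):
    """若预测加速比超过阈值,则建议将任务从第一核迁移到第二核。"""
    speedup = predict_speedup(pmu_metrics, weights, bias)
    action = "migrate_to_core2" if speedup > threshold else "stay_on_core1"
    return action, speedup
```

例如,预测加速比为2.6、阈值为1.5时,调度决策为迁移;实际部署中,阈值及决策逻辑应由异构算力调度器结合整体调度策略确定。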
附加地或可选地,可以将任务进行迁移以运行在异构系统的第二核上,并且可以获取任务运行在第二核上时的多个第二PMU指标和第二任务运行指标,其中第二任务运行指标例如包括第二运行时间和/或第二应用性能指标,第二应用性能指标例如可以包括第二吞吐量或者第二IPC。
在一些示例中,进一步地可以基于第一任务运行指标和第二任务运行指标,确定任务的实际运行特征。例如,可以基于第一应用性能指标和第二应用性能指标,确定任务的实际运行特征,例如实际的性能加速比。
在一些实施例中,可以将所得到的指标等进行存储,例如将如下中的一项或多项进行存储:多个第一PMU指标、多个第二PMU指标、第一任务运行指标、第二任务运行指标、预测运行特征、或实际运行特征。
附加地或可选地,可以将实际运行特征与预测运行特征进行比较,来确定负载特征识别模型的准确性。示例性地,如果预测运行特征与实际运行特征之间的误差超过误差阈值,则确定误差超过误差阈值的任务次数;以及如果任务次数超过次数阈值,则对负载特征识别模型进行更新。例如可以基于第二PMU指标和实际运行特征,通过再训练得到经更新的负载特征识别模型。
可选地,还可以进一步确定经更新的负载特征识别模型是否满足准确度要求。如果确定不满足准确度要求,则重新生成负载特征识别模型,例如可以通过重新训练得到满足准确度要求的负载特征识别模型。如果确定满足准确度要求,则可以输出(或存储)所得到的负载特征识别模型,例如可以用于如图2所示的任务调度。
下面将结合图3至图6进行较为详细的阐述。图3示出了根据本公开的一些实施例的系统300的示意框图。如图3所示,系统300包括硬件310和软件模块,其中软件模块包括操作系统320和软件开发套件(Software Development Kit,SDK)330。
硬件310可以包括多个核,例如大核311和小核312。操作系统320可以包括负载特征识别工具,具体包括模型推理计算模块321、模型评估模块322、模型再训练与在线更新模块323、异构算力调度器324和样本存储模块325。软件开发套件330可以包括任务特征建模工具,具体包括微架构特征选择模块331、模型训练模块332和特征和模型分析评估模块333。
可选地,大核311和小核312表示不同类型的核,在系统300中,同一类型的核的数量可以为多个,例如硬件310可以包括多个大核和多个小核。可理解,在其他示例中,系统300还可以包括其他类型的一个或多个核,例如中核,本公开对此不限定。
在本公开的一些实施例中,任务特征建模工具可以被用于构建(或生成)负载特征识别模型334,可选地,负载特征识别模型334也可以被称为负载特征识别模型实例或其他名称,本公开对此不限定。示例性地,任务特征建模工具可以基于输入信息通过训练生成负载特征识别模型334。如图3所示,输入信息可以指示如下一项或多项:微架构指标301、处理器类型302、应用指标303、特征选择配置304和模型配置305。
示例性地,输入信息可以指示多个任务的信息,其中每个任务的信息指示微架构指标301、处理器类型302和应用指标303。以任务1为例,处理器类型302可以指示运行该任务1的是哪个核,如大核311还是小核312。微架构指标301可以包括任务1运行在核上(如大核311)时的多个PMU指标,例如D个。应用指标303可以包括任务1运行在核上(如大核311)时的应用性能,例如吞吐量或IPC。可选地,应用指标303可以包括任务1运行在核上时的功耗。
示例性地,输入信息可以指示关于模型训练的信息,其中关于模型训练的信息指示特征选择配置304和模型配置305。例如,特征选择配置304可以包括相关度阈值和第二数量(表示为K),其中第二数量表示负载特征识别模型的输入中的PMU指标的数量,而相关度阈值用于确定负载特征识别模型的输入。例如,特征选择配置304可以包括用户偏好,以指示特定微架构指标。例如,模型配置305可以包括模型的类型、和/或模型的超参数,其中模型的类型可以为机器学习模型中的以下一项或多项:有监督学习、或无监督学习,其中有监督学习包括以下至少一项:线性回归类型、或神经网络类型,无监督学习包括以下至少一项:K近邻、最大期望类型。例如,模型配置305可以包括准确率阈值。
可选地,任务特征建模工具可以被实现为在芯片厂商和/或OS厂商为用户提供的SDK中的软件代码的形式,本公开对此不限定。
图4示出了根据本公开的一些实施例的生成负载特征识别模型的过程400的示意流程图。在框410,构建初始训练集,该初始训练集包括多个初始数据项,每个初始数据项包括任务运行在异构系统的第一核或第二核上时的第一数量的PMU指标和目标值,目标值指示任务的运行特征。在框420,基于初始训练集构建更新训练集,更新训练集包括多个更新数据项,每个更新数据项包括第二数量的PMU指标和目标值,其中第二数量小于第一数量。在框430,基于更新训练集,生成负载特征识别模型。
示例性地,运行特征可以包括在第一核与第二核之间的性能加速比。
在一些实施例中,在框410处,可以获取第一数量的PMU指标,获取任务运行在第一核上时的第一应用性能指标,获取任务运行在第二核上的第二应用性能指标;基于第一应用性能指标和第二应用性能指标,确定目标值;并基于第一数量的PMU指标和目标值来构建初始训练集。
在一些实施例中,在框420处,可以将第一数量的PMU指标划分为多个簇;从多个簇中提取第二数量的PMU指标;并且基于第二数量的PMU指标和目标值来确定更新训练集。第二数量小于第一数量,这样能够减小训练集的规模,进而减小后续的模型训练的处理规模,提高处理效率。
在一些示例中,可以确定第一数量的PMU指标中两两之间的第一相关度,并且可以基于第一相关度将第一数量的PMU指标进行聚类以划分为多个簇,其中位于同一个簇内的任两个PMU指标之间的第一相关度不低于相关度阈值,或者其中位于同一个簇中的每两个PMU指标之间的第一相关度的均值不低于相关度阈值。示例性地,第一相关度可以包括协方差、欧式距离、或皮尔逊相关系数等。
在一些示例中,可以确定第一数量的PMU指标中的每个PMU指标与目标值之间的第二相关度;基于第二相关度,将第一数量的PMU指标进行排序;按照第二相关度从高到低的顺序,从多个簇中第二数量的簇中提取第二数量的PMU指标。
附加地或可选地,第二数量的PMU指标包括从多个簇的第一簇中提取的第一PMU指标,还可以包括:基于用户偏好或者来自用户的调整指示,将第一PMU指标替换为第一簇中的第二PMU指标。
在一些实施例中,可以接收用户的输入信息,该输入信息指示第二数量和相关度阈值。
在一些实施例中,在框430处,可以基于更新训练集和模型配置,生成负载特征识别模型,其中模型配置包括负载特征识别模型的模型类型和/或负载特征识别模型的超参数。例如,模型类型包括机器学习模型中的以下一项或多项:有监督学习、或无监督学习,其中有监督学习包括以下至少一项:线性回归类型、或神经网络类型等,无监督学习包括以下至少一项:K近邻、最大期望类型等。可理解,此处列出的模型类型仅是示意,在实际应用中,也可以是其他的类型,本公开对此不限定。
附加地或可选地,还可以基于测试集,对负载特征识别模型进行测试。
图5示出了根据本公开的一些实施例的生成负载特征识别模型的过程500的另一示意流程图。过程500可以是过程400的一种更具体的实现方式。示例性地,PMU指标也被称为微架构指标。
在框510,通过预处理构建初始训练集。
在一些实施例中,可以基于输入信息来构建初始训练集。具体而言,可以基于微架构指标301、处理器类型302和应用指标303来构建初始训练集。示例性地,可以通过不同的处理器类型302所对应的不同的应用指标303,来得到目标值,例如该目标值可以为加速比,也称为性能加速比。
假设输入信息指示N个任务的信息,每个任务的信息指示D个微架构指标、被运行时的核以及相应的应用指标。以任务1为例,输入信息可以指示D个微架构指标、运行在第一核时的应用指标1、以及运行在第二核时的应用指标2。那么可以基于应用指标1和应用指标2确定该任务1在不同核之间的性能加速比,例如应用指标1与应用指标2的比值。其中应用指标1和应用指标2指代相同的属性,例如吞吐量或者IPC。
以此方式,初始训练集可以包括关于N个任务的N个数据项,每个数据项包括对应的任务的D个微架构指标(也称D个PMU指标)以及目标值(如性能加速比)。可选地,该目标值可以认为是在机器学习中有监督学习的标签。
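上述预处理可以示意如下(纯Python;字段名app_metric_core1、app_metric_core2等为本示例假设的命名,分别表示同一任务在两个核上的同一属性的应用指标):

```python
# 示意框510的预处理:以同一任务在两个核上的应用指标之比作为目标值
# (性能加速比),与该任务的D个微架构指标一起构成初始训练集的数据项。

def build_initial_training_set(tasks):
    """tasks中每项包含D个微架构指标以及任务在两个核上的应用指标。"""
    dataset = []
    for t in tasks:
        # 目标值:应用指标1与应用指标2的比值,即性能加速比
        speedup = t["app_metric_core1"] / t["app_metric_core2"]
        dataset.append({"pmu": t["pmu_metrics"], "target": speedup})
    return dataset
```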
在一些实施例中,输入信息可以来自于用户,例如用户可以在异构算力平台上采集得到微架构指标301、处理器类型302和应用指标303,然后用户可以将采集到的信息与特征选择配置304和模型配置305一起提供给如图3所示的任务特征建模工具,以便于任务特征建模工具生成负载特征识别模型334。
在框520,确定每个微架构指标与目标值之间的相关度,并将D个微架构指标进行排序。
为了方便描述,这里将微架构指标与目标值之间的相关度称为第二相关度。示例性地,第二相关度可以为欧式距离、协方差、皮尔逊相关系数等。示例性地,可以按照第二相关度从高到低的顺序,将D个微架构指标进行排序。可选地,如果第二相关度为协方差,则按照协方差的绝对值从大到小的顺序,将D个微架构指标进行排序。
举例而言,假设D个微架构指标中的其中一个微架构指标为“CPU执行单元忙的次数”(cpus.iq.fuBusy),目标值为性能加速比(speedup),那么可以通过下式(1)来得到“CPU执行单元忙的次数”与“性能加速比”两者之间的协方差作为两者的第二相关度:
cov(cpus.iq.fuBusy,speedup)=E[cpus.iq.fuBusy*speedup]–E[cpus.iq.fuBusy]*E[speedup](1)
在式(1)中,cov( )表示圆括号中两个变量的协方差,E[ ]表示方括号中的变量的期望。协方差绝对值越大,表示这两个变量的相关度越高。
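式(1)的协方差计算以及框520的排序可以示意如下(纯Python实现;样本数据为假设值):

```python
# 按式(1)计算两个变量的协方差:cov(X,Y)=E[X*Y]-E[X]*E[Y],
# 并按每个微架构指标与目标值(性能加速比)的协方差绝对值从大到小排序。

def covariance(xs, ys):
    n = len(xs)
    e_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return e_xy - (sum(xs) / n) * (sum(ys) / n)

def rank_by_target_correlation(metric_samples, target_samples):
    """metric_samples: {指标名: 该指标在各任务上的取值列表}。"""
    scores = {name: abs(covariance(values, target_samples))
              for name, values in metric_samples.items()}
    return sorted(scores, key=scores.get, reverse=True)
```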
作为一个示例,如下表1示出了按照第二相关度从高到低进行排序后的多个微架构指标。
表1

在框530,确定D个微架构指标中两两之间的相关度,并进行聚类。
为了方便描述,这里将两个微架构指标之间的相关度称为第一相关度。示例性地,第一相关度可以为欧式距离、协方差、皮尔逊相关系数等。
在一些实施例中,可以基于第一相关度以及特征选择配置304所指示的相关度阈值进行聚类,以将D个微架构指标分为多个簇。示例性地,同一个簇中可以包括一个或多个微架构指标,并且位于同一簇中的任意两个微架构指标之间的第一相关度都不低于(或高于)相关度阈值。可选地,多个簇也可以被称为多个类、多个集合、多个子集或其他称呼,本公开对此不限定。可选地,聚类也可以被称为分组、分簇、分类、划分或其他称呼,本公开对此不限定。
示例性地,D个微架构指标中两两之间的第一相关度可以包括D×D个值。示例性地,由于第一微架构指标与第二微架构指标之间的第一相关度等于第二微架构指标与第一微架构指标之间的第一相关度,并且某微架构指标(如第一微架构指标)与其自身的第一相关度等于1,因此D个微架构指标中两两之间的相关度可以包括D×(D-1)/2个值。示例性地,本公开中不考虑微架构指标与其自身的第一相关度。
在进行聚类时,可以在遍历所有第一相关度后确定是否属于同一个簇,或者可以采用并查集算法等,本公开对此不限定。以此方式,可以将D个微架构指标划分为多个簇(例如C个),每个微架构指标属于且仅属于一个簇,每个簇中任两个微架构指标之间的第一相关度都不低于或高于相关度阈值。
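框530提到的并查集聚类可以示意如下(这里以皮尔逊相关系数的绝对值作为第一相关度,阈值0.8为假设的默认值;并查集给出的是传递闭包意义上的分簇):

```python
import math

def pearson(xs, ys):
    """计算两个微架构指标取值序列之间的皮尔逊相关系数。"""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:  # 常量序列,视为不相关
        return 0.0
    return cov / (sx * sy)

def cluster_metrics(metric_samples, threshold=0.8):
    """并查集聚类:两个指标的|皮尔逊相关系数|不低于阈值则合并到同一簇。"""
    names = list(metric_samples)
    parent = {name: name for name in names}

    def find(x):  # 带路径减半的查找
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(metric_samples[a], metric_samples[b])) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())
```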
作为一个示例,假设相关度阈值为0.8,如下表2示出了包括3个微架构指标的簇。
表2
在框540,基于在框520中确定的排序和在框530中确定的多个簇,选择第二数量(K个)的候选微架构指标。示例性地,K远小于D,例如K比D低两个量级或低三个量级。作为一例,K=7,D=2000。示例性地,所选择的K个微架构指标属于K个不同的簇,也就是说,在进行选择时,每个簇最多选择其中一个微架构指标。
在一些实施例中,可以按照排序,也就是说按照第二相关度从高到低的顺序来选择。例如,从排序中确定未被选择的第二相关度最高的微架构指标,如果该微架构指标与已选择的所有微架构指标都不属于同一簇,则选择该第二相关度最高的微架构指标;否则不选择。
举例而言,结合前述的表1和表2,在选择cpus.iew.blockCycles之后,由于下一个微架构指标commit.commitSquashedInsts与cpus.iew.blockCycles属于同一个簇,因此不选择commit.commitSquashedInsts。
示例性地,K个候选微架构指标可以构成候选集合。可选地,可以按顺序选择微架构指标并放到候选集合中,如果某个微架构指标和候选集合已有的候选微架构指标属于同一个簇,则该微架构指标不放入候选集合里,反之则作为候选微架构指标。可以继续这样的按顺序选择,直到候选集合中的候选微架构指标的数量达到特征选择配置304所指示的第二数量(K)。
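框540的按序选择过程可以示意如下(输入为按第二相关度从高到低排好序的指标列表和框530得到的分簇结果):

```python
# 按第二相关度从高到低遍历排好序的微架构指标,
# 每个簇至多选取其中一个指标,直到选满k个候选微架构指标。

def select_candidates(ranked_metrics, clusters, k):
    cluster_of = {m: i for i, cluster in enumerate(clusters) for m in cluster}
    chosen, used_clusters = [], set()
    for metric in ranked_metrics:
        cid = cluster_of[metric]
        if cid not in used_clusters:
            chosen.append(metric)
            used_clusters.add(cid)
        if len(chosen) == k:
            break
    return chosen
```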
本公开的实施例中,上述的框510至框540处的操作可以由如图3所示的微架构特征选择模块331来执行。该微架构特征选择模块331能够从大量的(如D个)微架构指标中,根据用户输入的特征选择配置304,自动地选择例如K个微架构指标。具体而言,微架构特征选择模块331可以确定所输入的D个微架构指标两两之间的相关度后,根据特征选择配置304中的相关度阈值进行聚类,再根据微架构指标和目标值之间的相关度排序结果,自动选择K个最佳微架构指标。
在框550,基于K个候选微架构指标得到用于训练的K个微架构指标。在一些示例中,可以将K个候选微架构指标作为用于训练的K个微架构指标。在另一些示例中,可以基于用户偏好对K个候选微架构指标进行调整,例如将其中的一个或几个候选微架构指标替换为其他的微架构指标,以得到用于训练的K个微架构指标。
在一些实施例中,如果用户偏好指示了特定微架构指标,并且K个候选微架构指标不包括该特定微架构指标,那么可以确定K个候选微架构指标中与特定微架构指标属于同一簇的第一候选微架构指标,并将该第一候选微架构指标替换为该特定微架构指标。以此方式,经替换之后可以得到用于训练的K个微架构指标。
可选地,在一些实施例中,用户可以对K个候选微架构指标进行评估,确定这K个候选微架构指标是否具有代表性,并且用户可以将其中其认为不具有代表性的某一个或某几个候选微架构指标替换为其他相同数量的更具有代表性的微架构指标。示例性地,被替换的候选微架构指标和替换的微架构指标属于同一簇。
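框550中基于用户偏好的替换可以示意如下(仅处理单个偏好指标的情形,作为一个假设的简化):

```python
# 若用户偏好的特定微架构指标未被选中,则用其替换候选集合中
# 与之属于同一簇的那个候选微架构指标;否则保持候选集合不变。

def apply_user_preference(candidates, clusters, preferred):
    if preferred in candidates:
        return candidates
    for cluster in clusters:
        if preferred in cluster:
            return [preferred if c in cluster else c for c in candidates]
    return candidates  # 偏好指标不在任何簇中,原样返回
```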
本公开的实施例中,上述的框550处的操作可以被省略,例如在框540所得到的K个候选微架构指标被用于以下框560中的训练操作。
结合图3,框550处的操作可以由特征和模型分析评估模块333来执行。示例性地,特征和模型分析评估模块333包括特征分析评估子模块。在一个示例中,该特征分析评估子模块可以基于用户在特征选择配置304中指示的用户偏好得到用于训练的K个微架构指标。在另一个示例中,该特征分析评估子模块可以通过人机交互的图形用户界面(Graphical User Interface,GUI)由用户进行选择和/或确认。例如,特征分析评估子模块可以将K个候选微架构指标所在的K个簇以及该K个簇中所包括的多个微架构指标和对应的第二相关度通过GUI提供给用户,从而用户可以根据簇中的第二相关度,对候选微架构指标进行调整。例如,可以选择不具有最高的第二相关度但可能更具普适性的微架构指标。
在框560,基于K个微架构指标进行模型训练。结合图3,可以由模型训练模块332进行训练。
在一些实施例中,可以基于K个微架构指标以及在前述初始训练集中的目标值,构建训练集,其中,目标值可以为训练集中的微架构指标所对应的标签。示例性地,可以基于训练集以及模型配置305,来进行模型训练。
在一个示例中,模型配置305可以包括模型类型,例如模型类型为线性回归类型。在另一个示例中,模型配置305可以包括模型类型以及模型超参数,例如模型类型为神经网络类型,模型超参数包括模型的结构参数、迭代次数、学习率等,其中模型的结构参数可以包括网络中间层的数量、每层神经元的数量等。
作为一例,例如模型类型为线性回归类型,经训练后生成的负载特征识别模型334可以被表示为如下的式(2):
speedup=2.5e07*P1+4.89e09*P2-2.35e*P3+4.83e08*P4+4.15e07*P5-3.48e06*P6-2.92e08*P7+2.43                              (2)
在式(2)中,P1至P7表示K个微架构指标,系数2.5e07,4.89e09,2.35e,4.83e08,4.15e07,3.48e06,2.92e08和2.43是通过训练而确定的。
这样,生成的负载特征识别模型334,如式(2)所示,可以在后续被用于模型推理。例如可以基于任务运行时的微架构指标P1至P7,得到该任务在两个核之间的性能加速比(speedup)。
可选地,所生成的负载特征识别模型334可以指示其输入包括K个微架构指标,其输出包括目标值。可选地,所生成的负载特征识别模型334还可以指示其模型参数,如包括式(2)中的各个权重参数。可选地,所生成的负载特征识别模型334还可以指示该模型在训练集上的训练误差和在测试集上的测试误差。
在框570,评估模型是否满足要求。示例性地,要求可以包括误差要求或准确率要求。以此方式,通过对模型进行评估,能够避免出现欠拟合或者过拟合的情况。
在一些示例中,可以基于测试集对负载特征识别模型进行测试,得到该模型的准确率。如果该准确率低于模型配置305所指示的准确率阈值,则说明该模型不满足要求,那么可以返回框540或者返回框560。如果该准确率不低于(如高于)模型配置305所指示的准确率阈值,则说明该模型满足要求,那么可以前进到框580。可选地,可以从训练集中选取其中的一部分作为测试集。可选地,也可以构建完全不同于训练集或者与训练集的部分数据项相同的测试集。可选地,测试集可以被称为验证集。
在一些示例中,可以确定负载特征识别模型的训练误差和测试误差,并确定训练误差和测试误差之间的差值,如果该差值高于误差阈值,则说明该模型不满足要求,那么可以返回框540或者返回框560。如果该差值低于(如不高于)误差阈值,则说明该模型满足要求,那么可以前进到框580。可选地,确定差值的方式例如可以为|(测试误差-训练误差)/训练误差|,例如误差阈值可以为5%或其他值。可选地,确定差值的方式例如可以为|测试误差-训练误差|,例如误差阈值可以为1e-03。可理解,确定差值的方式不限于此,这里不再一一罗列。
可见,在模型不满足要求的情况下,可以再次触发对模型的训练。例如,可以通过调整模型的超参数(如深度神经网络模型中的迭代次数等)再次触发模型训练,直到生成的模型满足要求(如精确度要求或误差要求)。
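框570中基于训练误差与测试误差之差的判定可以示意如下(这里采用|(测试误差-训练误差)/训练误差|的形式,阈值5%为假设的默认值):

```python
# 评估模型是否满足要求:训练误差与测试误差的相对差值高于阈值时,
# 认为模型出现过拟合或欠拟合,需要重新触发训练。

def is_model_acceptable(train_error, test_error, rel_threshold=0.05):
    gap = abs((test_error - train_error) / train_error)
    return gap <= rel_threshold
```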
本公开的实施例中,框570处的操作可以由如图3所示的特征和模型分析评估模块333来执行。示例性地,特征和模型分析评估模块333包括模型分析评估子模块。在一个示例中,模型分析评估子模块可以用于确定训练所生成的负载特征识别模型334的精度是否足够。
在框580,输出生成的负载特征识别模型334。示例性地,参照图3,可以将负载特征识别模型334提供给模型推理计算模块321。
可选地,输出的负载特征识别模型334可以指示其输入包括K个微架构指标,其输出包括目标值。可选地,所生成的负载特征识别模型334还可以指示其模型参数,如包括式(2)中的各个权重参数。可选地,所生成的负载特征识别模型334还可以指示该模型在训练集上的训练误差和在测试集上的测试误差。
已经结合图5较为详细地描述了本公开的实施例中模型生成的过程,但是应理解,过程500仅是示例而不应解释为对本公开的实施例的限制,例如框520和框530处的操作可以互换顺序,例如框550处的操作可以被省略,例如第二数量K的值可以是预先存储的,等等。
以此方式,本公开的实施例提供了一种模型生成的方案。通过在SDK 330增加微架构特征选择模块331,能够从大量的微架构指标中自动选择出最佳的第二数量的微架构指标,被选择的这些微架构指标能够体现任务在异构算力上的差异性。并且基于第二数量的微架构指标进行模型训练能够降低系统的复杂度。进一步地,通过模型训练模块332以及特征和模型分析评估模块333,能够输出基于第二数量的微架构指标的负载特征识别模型334。可理解,该负载特征识别模型334能够被用于基于任务在某个核上运行时的第二数量的微架构指标,确定出任务的运行特征,例如在不同核之间的性能加速比。可理解,可以仅监控第二数量的微架构指标而无需对所有的微架构指标进行监控,这样能够降低内核监控的开销,降低能耗。
另外可理解,本公开实施例中的模型生成的方案基于第一数量的微架构指标训练生成,其中微架构指标可以来自任意异构系统(如包括大核和小核的异构处理器架构)硬件上测量的微架构指标,还可以来自异构系统的同一算力硬件(核)上的面向不同场景(如数据库或大数据场景)所采集的数据。因此,本公开实施例中的模型生成的方案具有普适性。
图6示出了根据本公开的一些实施例的模型校准的过程600的示意流程图。示例性地,负载特征识别模型可以在OS内核中实现。为了方便描述,假设任务初始被运行在第一核,例如第一核可以为大核311或者为小核312。
在框610,在任务发生迁移之后,监测任务运行在迁移后的核上时的K个微架构指标和任务运行指标。任务运行指标可以包括任务运行时间、吞吐量、IPC等。例如,迁移前的核可以被称为第一核,迁移后的核可以被称为第二核。
在框620,获取任务运行在迁移前的核上时的K个微架构指标、任务运行指标和预测的性能加速比。
在一些示例中,以图3为例,迁移前的核可以为大核311,迁移后的核可以为小核312;或者,迁移前的核可以为小核312,迁移后的核可以为大核311。
示例性地,任务运行(例如在迁移前的核上)时,可以获取任务运行时的K个微架构指标,将K个微架构指标输入到负载特征识别模型334,从而得到预测的性能加速比。结合图3,可以由异构算力调度器324采集任务运行时的K个微架构指标,由模型推理计算模块321通过使用负载特征识别模型334并基于来自异构算力调度器324的K个微架构指标来得到预测的性能加速比。在一些示例中,模型推理计算模块321可以将预测的性能加速比提供给异构算力调度器324,从而异构算力调度器324能够基于该预测的性能加速比进行任务调度的决策,例如执行任务迁移。在一些示例中,异构算力调度器324在决定执行任务迁移时,可以将任务运行在迁移前的核上的K个微架构指标、任务运行指标和预测的性能加速比提供给模型评估模块322。
示例性地,任务运行(在迁移前的核或在迁移后的核上)时,可以通过检测获取K个微架构指标以及任务运行指标。
示例性地,结合图3,可以由模型评估模块322执行在框610和框620处的操作。
在框630,确定实际的性能加速比。示例性地,结合图3,可以由模型评估模块322基于任务在迁移前的核上的任务运行指标以及任务在迁移后的核上的任务运行指标,来确定实际的性能加速比。
在一些示例中,模型评估模块322可以将以下信息存储到样本存储模块325:任务运行在迁移前的核上的K个微架构指标和任务运行指标;任务运行在迁移后的核上的K个微架构指标和任务运行指标;以及实际的性能加速比。可选地,模型评估模块322也可以将任务运行在迁移前的核上时所确定的预测的性能加速比存储到样本存储模块325中。可选地,模型评估模块322还可以存储模型的类型,如线性回归类型或神经网络类型。
在框640,将实际的性能加速比和预测的性能加速比进行比较。在一些示例中,可以由模型评估模块322通过比较得到比较结果,其中比较结果例如可以表示误差等。
可选地,在一些实施例中,可以确定在一段时间内的实际的性能加速比和预测的性能加速比之间的平均绝对误差(Mean Absolute Error,MAE)。可选地,在一些实施例中,可以确定实际的性能加速比和预测的性能加速比之间的偏差,表示为百分比的形式。
在框650,确定超过误差阈值的任务次数是否大于次数阈值。在一些示例中,可以由模型评估模块322执行框650处的操作。
在一些示例中,误差阈值可以是模型训练时的训练误差,如果在一段时间内的实际的性能加速比和预测的性能加速比之间的MAE超过模型训练时的训练误差,则可以确定超过误差阈值,则将超过误差阈值的任务次数增加1。在一些示例中,误差阈值可以是预先设定的百分比值(例如5%),如果实际的性能加速比和预测的性能加速比之间的偏差超过5%,则可以确定超过误差阈值,则将超过误差阈值的任务次数增加1。
在一些实施例中,可以确定在预设时间段内的、超过误差阈值的次数是否大于次数阈值,例如预设时间段可以为一天、一小时或其他更长或更短的时间,例如次数阈值可以为5次、20次或其他更多或更少的次数。如此,能够确定模型在预设时间段内的累积的准确率。
在另一些实施例中,作为替选,在框650处,可以确定在预设时间段内的、超过误差阈值的次数的比例是否超过比例阈值。本公开对此不限定。
示例性地,如果确定大于次数阈值,则前进到框660。如果确定不大于次数阈值,则可以前进到框601,针对下一个任务继续执行框610处的操作。
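框640至框650的在线评估可以示意如下:统计预测加速比与实际加速比的偏差超过误差阈值的任务次数,该次数大于次数阈值时判定模型老化、需要再训练(其中误差阈值5%与次数阈值5次均为假设值):

```python
# records中每项为(实际加速比, 预测加速比)。偏差超过error_threshold
# 记一次;超过误差阈值的任务次数大于count_threshold则触发模型再训练。

def needs_retraining(records, error_threshold=0.05, count_threshold=5):
    exceed = sum(
        1 for actual, predicted in records
        if abs(predicted - actual) / actual > error_threshold
    )
    return exceed > count_threshold
```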
以此方式,模型评估模块322可以对模型进行在线评估,以确定模型运行时的准确性,从而可以确定模型是否已经老化,进而能够确定是否需要进行模型再训练。
在框660,进行模型再训练。结合图3,可以由模型再训练与在线更新模块323基于来自样本存储模块325的样本集对模型进行再训练。举例而言,可以基于任务运行在迁移后的核上的K个微架构指标和实际的性能加速比,结合模型的类型(如线性回归类型),通过再训练得到再训练后的模型。例如,以上述式(2)所示的模型为例,通过再训练可以对权重系数等模型参数进行校准和更新。
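框660的再训练可以用最朴素的批量梯度下降拟合线性回归来示意(实际实现也可以采用最小二乘等其他求解方式;学习率与迭代次数均为假设的超参数):

```python
# 基于存储的样本(K个微架构指标及实际的性能加速比),
# 用均方误差损失的批量梯度下降重新拟合线性模型的权重与偏置。

def retrain_linear_model(samples, targets, lr=0.01, epochs=2000):
    k, n = len(samples[0]), len(samples)
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, y in zip(samples, targets):
            err = sum(w * x for w, x in zip(weights, row)) + bias - y
            for j in range(k):
                grad_w[j] += 2 * err * row[j] / n
            grad_b += 2 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias
```

例如,对满足speedup=2x+1的样本进行再训练,拟合出的权重与偏置应分别接近2和1。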
在一些示例中,如果模型评估模块322确定模型已经老化,则从样本存储模块325提取样本集提供给模型再训练与在线更新模块323,从而触发模型再训练与在线更新模块323对模型进行再训练。
在框670,确定再训练后的模型是否满足要求。结合图3,可以由模型评估模块322确定是否满足要求。在一些实施例中,框670处的操作类似于前述图5中框570处的操作,为避免重复,这里不再赘述。示例性地,如果在框670确定不满足要求,则返回框660。如果在框670确定满足要求,则前进到框680。
在框680,得到经更新的负载特征识别模型。示例性地,可以得到满足要求的再训练后的负载特征识别模型。在一些示例中,经更新的负载特征识别模型可以被提供给模型推理计算模块321,以用于后续的任务调度。
以此方式,本公开的实施例提供了一种基于反馈的在线负载特征识别模型的校准机制。示例性地,任务运行在异构系统的不同核上的微架构指标和运行指标等可以被统称为反馈信息。如此,能够基于任务在异构系统的不同核间的迁移前后的反馈信息,对模型进行再训练,从而得到经更新的模型,如此能够实现模型的在线校准,提升模型的准确性。
结合图3,本公开实施例中的异构算力调度器324可以被理解为是增强后的OS内核调度器,一方面在任务运行时可以调用模型推理计算模块321来确定任务的运行特征,如预测任务在异构算力之间的性能加速比;另一方面,在任务迁移之后,还可以将反馈信息发送给模型评估模块322,以便与模型再训练与在线更新模块323一起实现在线反馈的动态模型校准。
因此,本公开的实施例能够解决模型在部署使用过程中准确率降低的问题,通过在线采集数据,分析评估以及再训练,能够保证模型的准确性。
应注意的是,上述结合图3至图5对本公开的一些实施例进行了较为详细的阐述,但是应理解,这些仅是本公开的部分实施例,不应解释为对本公开的实施例的限制。例如,在一些示例中,可以通过在硬件310中的性能监控器/模型推理等来实现如上所述的模型推理计算模块321的功能,此处不再详述。
可选地,如前所述,应用指标303可以包括任务运行在核上时的功耗。类似地,可以通过训练得到功耗模型,并且该功耗模型可以被用于进行功耗的综合调度。可选地,负载特征识别模型334和功耗模型可以具有不同的模型类型、不同的模型参数等。两者可以是彼此独立的。
通过本公开的实施例,提供了负载特征识别模型生成的方案,该负载特征识别模型能够提供任务的预测运行特征,从而为任务的调度提供可靠的参考,进而能够使得对任务的调度更加准确。相应地,能够实现硬件资源的充分利用。
应理解,在本公开的实施例中,“第一”,“第二”,“第三”等表述只是为了表示多个对象可能是不同的,但是同时不排除两个对象之间是相同的。“第一”,“第二”,“第三”等表述不应当解释为对本公开实施例的任何限制。
还应理解,本公开的实施例中的方式、情况、类别以及实施例的划分仅是为了描述的方便,不应构成特别的限定,各种方式、类别、情况以及实施例中的特征在符合逻辑的情况下,可以相互结合。
还应理解,上述内容只是为了帮助本领域技术人员更好地理解本公开的实施例,而不是要限制本公开的实施例的范围。本领域技术人员根据上述内容,可以进行各种修改或变化或组合等。这样的修改、变化或组合后的方案也在本公开的实施例的范围内。
还应理解,上述内容的描述着重于强调各个实施例之间的不同之处,相同或相似之处可以互相参考或借鉴,为了简洁,这里不再赘述。
图7示出了根据本公开的一些实施例的用于任务调度的装置700的示意框图。装置700可以通过软件、硬件或者两者结合的方式实现。如图7所示,装置700包括获取模块710、预测模块720和调度模块730。
获取模块710可以被配置为获取任务运行在异构系统的第一核上时的多个第一PMU指标。预测模块720被配置为将多个第一PMU指标输入到预先生成的负载特征识别模型,以得到任务的预测运行特征。调度模块730被配置为基于预测运行特征,对任务进行调度。
在一些实施例中,该装置700还包括:任务迁移模块,被配置为将任务进行迁移以运行在异构系统的第二核上;迁移后信息获取模块,被配置为获取任务运行在第二核上时的多个第二PMU指标以及第二任务运行指标,第二任务运行指标包括第二运行时间和/或第二应用性能指标。
在一些实施例中,获取模块710还被配置为:获取任务运行在第一核上时的第一任务运行指标,第一任务运行指标包括第一运行时间和/或第一应用性能指标。可选地,第一应用性能指标可以包括第一吞吐量和/或第一IPC。
在一些实施例中,该装置700还包括实际运行特征确定模块,被配置为基于第一任务运行指标和第二任务运行指标,确定任务的实际运行特征。
在一些实施例中,该装置700还包括存储模块,被配置为存储以下至少一项:多个第一PMU指标、多个第二PMU指标、第一任务运行指标、第二任务运行指标、预测运行特征、或实际运行特征。
在一些实施例中,该装置700还包括模型更新模块,被配置为如果预测运行特征与实际运行特征之间的误差超过误差阈值,则确定误差超过误差阈值的任务次数;以及如果任务次数超过次数阈值,则基于第二PMU指标和实际运行特征,更新负载特征识别模型。
在一些实施例中,该装置700还包括:准确度确定模块,被配置为确定更新后的负载特征识别模型是否满足准确度要求;模型更新模块,被配置为如果确定不满足准确度要求,则重新生成负载特征识别模型。
在一些实施例中,该装置700还包括:初始训练集构建模块,被配置为构建初始训练集,初始训练集包括多个初始数据项,每个初始数据项包括任务运行在异构系统的第一核或第二核上时的第一数量的PMU指标和目标值,目标值指示任务的运行特征;更新训练集构建模块,被配置为基于初始训练集构建更新训练集,更新训练集包括多个更新数据项,每个更新数据项包括第二数量的PMU指标和目标值,其中第二数量小于第一数量;以及模型生成模块,被配置为基于更新训练集,生成负载特征识别模型。
在一些示例中,初始训练集构建模块包括:第一获取子模块被配置为获取第一数量的PMU指标;第二获取子模块被配置为获取任务运行在第一核上时的第一应用性能指标;第三获取子模块被配置为获取任务运行在第二核上时的第二应用性能指标;目标值确定子模块被配置为基于第一应用性能指标和第二应用性能指标,确定目标值;以及初始训练集构建子模块被配置为基于第一数量的PMU指标和目标值,构建初始训练集。
在一些示例中,更新训练集构建模块包括:分簇子模块被配置为将第一数量的PMU指标划分为多个簇;提取子模块被配置为从多个簇中提取第二数量的PMU指标;以及更新训练集构建子模块被配置为基于第二数量的PMU指标和目标值,构建更新训练集。
示例性地,分簇子模块包括:第一相关度确定单元被配置为确定第一数量的PMU指标中两两之间的第一相关度;分簇单元被配置为基于相关度阈值将第一数量的PMU指标进行聚类,以得到多个簇,其中位于同一个簇中的任两个PMU指标之间的第一相关度不低于相关度阈值,或者其中位于同一个簇中的每两个PMU指标之间的第一相关度的均值不低于相关度阈值。可选地,第一相关度包括如下至少一项:协方差、欧式距离、或皮尔逊相关系数。
示例性地,提取子模块包括:第二相关度确定单元被配置为确定第一数量的PMU指标中的每个PMU指标与目标值之间的第二相关度;排序单元被配置为基于第二相关度,将第一数量的PMU指标进行排序;提取单元被配置为按照第二相关度从高到低的顺序,从多个簇中第二数量的簇中提取第二数量的PMU指标。
示例性地,提取子模块还包括调整单元,被配置为基于用户偏好或者来自用户的调整指示,将第一PMU指标替换为第一簇中的第二PMU指标。
在一些示例中,该装置700还包括接收模块,被配置为接收用户的输入信息,输入信息指示第二数量和相关度阈值。
在一些示例中,模型生成模块被配置为:基于更新训练集和模型配置,生成负载特征识别模型,其中模型配置包括负载特征识别模型的模型类型和/或负载特征识别模型的超参数。可选地,模型类型包括机器学习模型中的以下一项或多项:有监督学习、或无监督学习,其中有监督学习包括以下至少一项:线性回归类型、或神经网络类型等,无监督学习包括以下至少一项:K近邻、最大期望类型等。
在一些示例中,该装置700还包括测试模块,被配置为基于测试集,对负载特征识别模型进行测试。
示例性地,运行特征包括第一核与第二核之间的性能加速比。
图7中的装置700能够用于实现上述结合图2至图6所述的各个过程,为了简洁,这里不再赘述。
图8示出了根据本公开的一些实施例的用于模型生成的装置800的示意框图。装置800可以通过软件、硬件或者两者结合的方式实现。如图8所示,装置800包括初始训练集构建模块810、更新训练集构建模块820和模型生成模块830。
初始训练集构建模块810可以被配置为构建初始训练集,初始训练集包括多个初始数据项,每个初始数据项包括任务运行在异构系统的第一核或第二核上时的第一数量的PMU指标和目标值,目标值指示任务的运行特征。更新训练集构建模块820被配置为基于初始训练集构建更新训练集,更新训练集包括多个更新数据项,每个更新数据项包括第二数量的PMU指标和目标值,其中第二数量小于第一数量。模型生成模块830被配置为基于更新训练集,生成负载特征识别模型。
在一些实施例中,初始训练集构建模块810可以包括:第一获取子模块被配置为获取第一数量的PMU指标;第二获取子模块被配置为获取任务运行在第一核上时的第一应用性能指标;第三获取子模块被配置为获取任务运行在第二核上时的第二应用性能指标;目标值确定子模块被配置为基于第一应用性能指标和第二应用性能指标,确定目标值;以及初始训练集构建子模块被配置为基于第一数量的PMU指标和目标值,构建初始训练集。
在一些实施例中,更新训练集构建模块820可以包括:分簇子模块被配置为将第一数量的PMU指标划分为多个簇;提取子模块被配置为从多个簇中提取第二数量的PMU指标;以及更新训练集构建子模块被配置为基于第二数量的PMU指标和目标值,构建更新训练集。
示例性地,分簇子模块可以包括:第一相关度确定单元被配置为确定第一数量的PMU指标中两两之间的第一相关度;分簇单元被配置为基于相关度阈值将第一数量的PMU指标进行聚类,以得到多个簇,其中位于同一个簇中的任两个PMU指标之间的第一相关度不低于相关度阈值,或者其中位于同一个簇中的每两个PMU指标之间的第一相关度的均值不低于相关度阈值。可选地,第一相关度包括如下至少一项:协方差、欧式距离、或皮尔逊相关系数。
示例性地,提取子模块可以包括:第二相关度确定单元被配置为确定第一数量的PMU指标中的每个PMU指标与目标值之间的第二相关度;排序单元被配置为基于第二相关度,将第一数量的PMU指标进行排序;提取单元被配置为按照第二相关度从高到低的顺序,从多个簇中第二数量的簇中提取第二数量的PMU指标。可选地,第二相关度包括如下至少一项:协方差、欧式距离、或皮尔逊相关系数。
在一些示例中,提取子模块还可以包括调整单元,被配置为基于用户偏好或者来自用户的调整指示,将第一PMU指标替换为第一簇中的第二PMU指标。
在一些实施例中,该装置800还可以包括接收模块,被配置为接收用户的输入信息,输入信息指示第二数量和相关度阈值。
在一些实施例中,模型生成模块830可以被配置为:基于更新训练集和模型配置,生成负载特征识别模型,其中模型配置包括负载特征识别模型的模型类型和/或负载特征识别模型的超参数。可选地,模型类型可以包括机器学习模型中的以下一项或多项:有监督学习或无监督学习,其中有监督学习包括以下至少一项:线性回归类型、或神经网络类型,其中无监督学习包括以下至少一项:K近邻、或最大期望类型。
在一些实施例中,该装置800还可以包括测试模块,被配置为基于测试集,对负载特征识别模型进行测试。
示例性地,运行特征包括第一核与第二核之间的性能加速比。
图8中的装置800能够用于实现上述结合图4至图5所述的各个过程,为了简洁,这里不再赘述。
本公开的实施例中对模块或单元的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时也可以有另外的划分方式。另外,在本公开的实施例中的各功能单元可以集成在一个单元中,也可以是单独物理存在,也可以两个或两个以上单元集成为一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
图9示出了可以用来实施本公开的实施例的示例设备900的示意性框图。设备900可以被实现为或者被包括在图1的异构系统100中。
如图所示,设备900包括多核处理器901、只读存储器(Read-Only Memory,ROM)902以及随机存取存储器(Random Access Memory,RAM)903。多核处理器901可以根据存储在ROM 902和/或RAM 903中的计算机程序指令或者从存储单元908加载到ROM 902和/或RAM 903中的计算机程序指令,来执行各种适当的动作和处理。在ROM 902和/或RAM 903中,还可存储设备900操作所需的各种程序和数据。多核处理器901和ROM 902和/或RAM 903通过总线904彼此相连。输入/输出(Input/Output,I/O)接口905也连接至总线904。
设备900中的多个部件连接至I/O接口905,包括:输入单元906,例如键盘、鼠标等;输出单元907,例如各种类型的显示器、扬声器等;存储单元908,例如磁盘、光盘等;以及通信单元909,例如网卡、调制解调器、无线通信收发机等。通信单元909允许设备900通过诸如因特网的计算机网络和/或各种电信网络与其他设备交换信息/数据。
多核处理器901可以包括多个核,每个核可以是各种具有处理和计算能力的通用和/或专用处理组件,其示例包括但不限于中央处理单元(Central Processing Unit,CPU)、图形处理单元(Graphics Processing Unit,GPU)、各种专用的人工智能(Artificial Intelligence,AI)计算芯片、各种运行机器学习模型算法的计算单元、数字信号处理器(Digital Signal Processor,DSP)、以及任何适当的处理器、控制器、微控制器等,相应地可以被称为计算单元。多核处理器901执行上文所描述的各个方法和处理。例如,在一些实施例中,上文所描述的各个过程可被实现为计算机软件程序,其被有形地包含于计算机可读介质,例如存储单元908。在一些实施例中,计算机程序的部分或者全部可以经由ROM 902和/或RAM 903和/或通信单元909而被载入和/或安装到设备900上。当计算机程序加载到ROM 902和/或RAM 903并由多核处理器901执行时,可以执行上文描述的过程的一个或多个步骤。备选地,在其他实施例中,多核处理器901可以通过其他任何适当的方式(例如,借助于固件)而被配置为执行上文所描述的各个过程。
示例性地,图9中的设备900可以被实现为计算设备,或者可以被实现为计算设备中的芯片或芯片系统,本公开的实施例对此不限定。
本公开的实施例还提供了一种芯片,该芯片可以包括输入接口、输出接口和处理电路。在本公开的实施例中,可以由输入接口和输出接口完成信令或数据的交互,由处理电路完成信令或数据信息的生成以及处理。
本公开的实施例还提供了一种芯片系统,包括处理器,用于支持计算设备以实现上述任一实施例中所涉及的功能。在一种可能的设计中,芯片系统还可以包括存储器,用于存储必要的程序指令和数据,当处理器运行该程序指令时,使得安装该芯片系统的设备实现上述任一实施例中所涉及的方法。示例性地,该芯片系统可以由一个或多个芯片构成,也可以包含芯片和其他分立器件。
本公开的实施例还提供了一种处理器,用于与存储器耦合,存储器存储有指令,当处理器运行所述指令时,使得处理器执行上述任一实施例中涉及的方法和功能。
本公开的实施例还提供了一种包含指令的计算机程序产品,其在计算机上运行时,使得计算机执行上述各实施例中任一实施例中涉及的方法和功能。
本公开的实施例还提供了一种计算机可读存储介质,其上存储有计算机指令,当处理器运行所述指令时,使得处理器执行上述任一实施例中涉及的方法和功能。
通常,本公开的各种实施例可以以硬件或专用电路、软件、逻辑或其任何组合来实现。一些方面可以用硬件实现,而其他方面可以用固件或软件实现,其可以由控制器、微处理器或其他计算设备执行。虽然本公开的实施例的各个方面被示出并描述为框图、流程图或使用一些其他图示表示,但是应当理解,本文描述的框、装置、系统、技术或方法可以实现为,如非限制性示例,硬件、软件、固件、专用电路或逻辑、通用硬件或控制器或其他计算设备,或其某种组合。
本公开还提供有形地存储在非暂时性计算机可读存储介质上的至少一个计算机程序产品。该计算机程序产品包括计算机可执行指令,例如包括在程序模块中的指令,其在目标的真实或虚拟处理器上的设备中执行,以执行如上参考附图的过程/方法。通常,程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、库、对象、类、组件、数据结构等。在各种实施例中,可以根据需要在程序模块之间组合或分割程序模块的功能。用于程序模块的机器可执行指令可以在本地或分布式设备内执行。在分布式设备中,程序模块可以位于本地和远程存储介质中。
用于实现本公开的方法的计算机程序代码可以用一种或多种编程语言编写。这些计算机程序代码可以提供给通用计算机、专用计算机或其他可编程的数据处理装置的处理器,使得程序代码在被计算机或其他可编程的数据处理装置执行的时候,引起在流程图和/或框图中规定的功能/操作被实施。程序代码可以完全在计算机上、部分在计算机上、作为独立的软件包、部分在计算机上且部分在远程计算机上或完全在远程计算机或服务器上执行。
在本公开的上下文中,计算机程序代码或者相关数据可以由任意适当载体承载,以使得设备、装置或者处理器能够执行上文描述的各种处理和操作。载体的示例包括信号、计算机可读介质、等等。信号的示例可以包括电、光、无线电、声音或其它形式的传播信号,诸如载波、红外信号等。
计算机可读介质可以是包含或存储用于或有关于指令执行系统、装置或设备的程序的任何有形介质。计算机可读介质可以是计算机可读信号介质或计算机可读存储介质。计算机可读介质可以包括但不限于电子的、磁的、光学的、电磁的、红外的或半导体系统、装置或设备,或其任意合适的组合。计算机可读存储介质的更详细示例包括带有一根或多根导线的电气连接、便携式计算机磁盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(Erasable Programmable Read-Only Memory,EPROM)、光存储设备、磁存储设备,或其任意合适的组合。
此外,尽管在附图中以特定顺序描述了本公开的方法的操作,但是这并非要求或者暗示必须按照该特定顺序来执行这些操作,或是必须执行全部所示的操作才能实现期望的结果。相反,流程图中描绘的步骤可以改变执行顺序。附加地或备选地,可以省略某些步骤,将多个步骤组合为一个步骤执行,和/或将一个步骤分解为多个步骤执行。还应当注意,根据本公开的两个或更多装置的特征和功能可以在一个装置中具体化。反之,上文描述的一个装置的特征和功能可以进一步划分为由多个装置来具体化。
以上已经描述了本公开的各实现,上述说明是示例性的,并非穷尽的,并且也不限于所公开的各实现。在不偏离所说明的各实现的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在很好地解释各实现的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其他普通技术人员能理解本文公开的各个实现方式。

Claims (35)

  1. 一种任务调度方法,包括:
    获取任务运行在异构系统的第一核上时的多个第一性能监控单元PMU指标;
    将所述多个第一PMU指标输入到预先生成的负载特征识别模型,以得到所述任务的预测运行特征;以及
    基于所述预测运行特征,对所述任务进行调度。
  2. 根据权利要求1所述的方法,还包括:
    将所述任务进行迁移以运行在所述异构系统的第二核上;
    获取所述任务运行在所述第二核上时的多个第二PMU指标以及第二任务运行指标,所述第二任务运行指标包括第二运行时间和/或第二应用性能指标。
  3. 根据权利要求2所述的方法,还包括:
    获取所述任务运行在所述第一核上时的第一任务运行指标,所述第一任务运行指标包括第一运行时间和/或第一应用性能指标。
  4. 根据权利要求3所述的方法,还包括:
    基于所述第一任务运行指标和所述第二任务运行指标,确定所述任务的实际运行特征。
  5. 根据权利要求4所述的方法,还包括存储以下至少一项:
    所述多个第一PMU指标、所述多个第二PMU指标、所述第一任务运行指标、所述第二任务运行指标、所述预测运行特征、或所述实际运行特征。
  6. 根据权利要求4或5所述的方法,还包括:
    如果所述预测运行特征与所述实际运行特征之间的误差超过误差阈值,则确定误差超过所述误差阈值的任务次数;以及
    如果所述任务次数超过次数阈值,则基于所述第二PMU指标和所述实际运行特征,更新所述负载特征识别模型。
  7. 根据权利要求6所述的方法,还包括:
    确定更新后的负载特征识别模型是否满足准确度要求;
    如果确定不满足所述准确度要求,则重新生成所述负载特征识别模型。
  8. 根据权利要求1至7中任一项所述的方法,其中所述负载特征识别模型是通过下述过程生成的:
    构建初始训练集,所述初始训练集包括多个初始数据项,每个初始数据项包括任务运行在所述异构系统的所述第一核或第二核上时的第一数量的PMU指标和目标值,所述目标值指示所述任务的运行特征;
    基于所述初始训练集构建更新训练集,所述更新训练集包括多个更新数据项,每个更新数据项包括第二数量的PMU指标和所述目标值,其中所述第二数量小于所述第一数量;以及
    基于所述更新训练集,生成所述负载特征识别模型。
  9. 根据权利要求8所述的方法,其中构建初始训练集包括:
    获取所述第一数量的PMU指标;
    获取所述任务运行在所述第一核上时的第一应用性能指标;
    获取所述任务运行在所述第二核上时的第二应用性能指标;
    基于所述第一应用性能指标和所述第二应用性能指标,确定所述目标值;以及
    基于所述第一数量的PMU指标和所述目标值,构建所述初始训练集。
  10. 根据权利要求8或9所述的方法,其中基于所述初始训练集构建更新训练集包括:
    将所述第一数量的PMU指标划分为多个簇;
    从所述多个簇中提取所述第二数量的PMU指标;以及
    基于所述第二数量的PMU指标和所述目标值,构建所述更新训练集。
  11. 根据权利要求10所述的方法,其中将所述第一数量的PMU指标划分为多个簇包括:
    确定所述第一数量的PMU指标中两两之间的第一相关度;
    基于相关度阈值将所述第一数量的PMU指标进行聚类,以得到所述多个簇,其中位于同一个簇中的任两个PMU指标之间的第一相关度不低于所述相关度阈值,或者其中位于同一个簇中的每两个PMU指标之间的第一相关度的均值不低于所述相关度阈值。
  12. 根据权利要求11所述的方法,其中所述第一相关度包括如下至少一项:协方差、欧式距离、或皮尔逊相关系数。
  13. 根据权利要求11或12所述的方法,其中从所述多个簇中提取所述第二数量的PMU指标包括:
    确定所述第一数量的PMU指标中的每个PMU指标与所述目标值之间的第二相关度;
    基于所述第二相关度,将所述第一数量的PMU指标进行排序;
    按照所述第二相关度从高到低的顺序,从所述多个簇中第二数量的簇中提取所述第二数量的PMU指标。
  14. 根据权利要求13所述的方法,所述第二数量的PMU指标包括从所述多个簇的第一簇中提取的第一PMU指标,所述方法还包括:
    基于用户偏好或者来自用户的调整指示,将所述第一PMU指标替换为所述第一簇中的第二PMU指标。
  15. 根据权利要求11至14中任一项所述的方法,还包括:
    接收用户的输入信息,所述输入信息指示所述第二数量和所述相关度阈值。
  16. 根据权利要求8至15中任一项所述的方法,其中基于所述更新训练集生成负载特征识别模型包括:
    基于所述更新训练集和模型配置,生成所述负载特征识别模型,其中所述模型配置包括所述负载特征识别模型的模型类型和/或所述负载特征识别模型的超参数。
  17. 根据权利要求16所述的方法,其中所述模型类型包括机器学习模型中以下至少一项:有监督学习、或无监督学习,
    其中所述有监督学习包括以下至少一项:线性回归类型、或神经网络类型,
    其中所述无监督学习包括以下至少一项:K近邻、或最大期望类型。
  18. 根据权利要求8至17中任一项所述的方法,还包括:
    基于测试集,对所述负载特征识别模型进行测试,其中所述经测试的负载特征识别模型满足准确度要求。
  19. 根据权利要求8至18中任一项所述的方法,其中所述运行特征包括所述第一核与所述第二核之间的性能加速比。
  20. 一种模型生成方法,包括:
    构建初始训练集,所述初始训练集包括多个初始数据项,每个初始数据项包括任务运行在异构系统的第一核或第二核上时的第一数量的性能监控单元PMU指标和目标值,所述目标值指示所述任务的运行特征;
    基于所述初始训练集构建更新训练集,所述更新训练集包括多个更新数据项,每个更新数据项包括第二数量的PMU指标和所述目标值,其中所述第二数量小于所述第一数量;以及
    基于所述更新训练集,生成负载特征识别模型。
  21. 根据权利要求20所述的方法,其中构建初始训练集包括:
    获取所述第一数量的PMU指标;
    获取所述任务运行在所述第一核上时的第一应用性能指标;
    获取所述任务运行在所述第二核上时的第二应用性能指标;
    基于所述第一应用性能指标和所述第二应用性能指标,确定所述目标值;以及
    基于所述第一数量的PMU指标和所述目标值,构建所述初始训练集。
  22. 根据权利要求20或21所述的方法,其中基于所述初始训练集构建更新训练集包括:
    将所述第一数量的PMU指标划分为多个簇;
    从所述多个簇中提取所述第二数量的PMU指标;以及
    基于所述第二数量的PMU指标和所述目标值,构建所述更新训练集。
  23. 根据权利要求22所述的方法,其中将所述第一数量的PMU指标划分为多个簇包括:
    确定所述第一数量的PMU指标中两两之间的第一相关度;
    基于相关度阈值将所述第一数量的PMU指标进行聚类,以得到所述多个簇,其中位于同一个簇中的任两个PMU指标之间的第一相关度不低于所述相关度阈值,或者其中位于同一个簇中的每两个PMU指标之间的第一相关度的均值不低于所述相关度阈值。
  24. 根据权利要求23所述的方法,其中所述第一相关度包括如下至少一项:协方差、欧式距离、或皮尔逊相关系数。
  25. 根据权利要求23或24所述的方法,其中从所述多个簇中提取所述第二数量的PMU指标包括:
    确定所述第一数量的PMU指标中的每个PMU指标与所述目标值之间的第二相关度;
    基于所述第二相关度,将所述第一数量的PMU指标进行排序;
    按照所述第二相关度从高到低的顺序,从所述多个簇中第二数量的簇中提取所述第二数量的PMU指标。
  26. 根据权利要求25所述的方法,所述第二数量的PMU指标包括从所述多个簇的第一簇中提取的第一PMU指标,所述方法还包括:
    基于用户偏好或者来自用户的调整指示,将所述第一PMU指标替换为所述第一簇中的第二PMU指标。
  27. 根据权利要求23至26中任一项所述的方法,还包括:
    接收用户的输入信息,所述输入信息指示所述第二数量和所述相关度阈值。
  28. 根据权利要求20至27中任一项所述的方法,其中基于所述更新训练集生成负载特征识别模型包括:
    基于所述更新训练集和模型配置,生成所述负载特征识别模型,其中所述模型配置包括所述负载特征识别模型的模型类型和/或所述负载特征识别模型的超参数。
  29. 根据权利要求28所述的方法,其中所述模型类型包括机器学习模型中以下至少一项:有监督学习、或无监督学习,
    其中所述有监督学习包括以下至少一项:线性回归类型、或神经网络类型,
    其中所述无监督学习包括以下至少一项:K近邻、或最大期望类型。
  30. 根据权利要求20至29中任一项所述的方法,还包括:
    基于测试集,对所述负载特征识别模型进行测试,其中所述经测试的负载特征识别模型满足准确度要求。
  31. 根据权利要求20至30中任一项所述的方法,其中所述运行特征包括所述第一核与所述第二核之间的性能加速比。
  32. 一种电子设备,包括多核处理器以及存储器,所述存储器上存储有由所述多核处理器执行的指令,当所述指令被所述多核处理器执行时使得所述电子设备实现根据权利要求1至31中任一项所述的方法。
  33. 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现根据权利要求1至31中任一项所述的方法。
  34. 一种芯片,包括处理电路,被配置为执行根据权利要求1至31中任一项所述的方法。
  35. 一种计算机程序产品,被有形地存储在计算机可读介质上并且包括计算机可执行指令,所述计算机可执行指令在被执行时使设备实现根据权利要求1至31中任一项所述的方法。
PCT/CN2023/115329 2022-08-30 2023-08-28 任务调度方法、模型生成方法、以及电子设备 WO2024046283A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211056476.8 2022-08-30
CN202211056476.8A CN117687745A (zh) 2022-08-30 2022-08-30 任务调度方法、模型生成方法、以及电子设备

Publications (1)

Publication Number Publication Date
WO2024046283A1 true WO2024046283A1 (zh) 2024-03-07

Family

ID=90100357

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/115329 WO2024046283A1 (zh) 2022-08-30 2023-08-28 任务调度方法、模型生成方法、以及电子设备

Country Status (2)

Country Link
CN (1) CN117687745A (zh)
WO (1) WO2024046283A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062495A (zh) * 2019-11-28 2020-04-24 深圳市华尊科技股份有限公司 机器学习方法及相关装置
CN112580815A (zh) * 2019-09-27 2021-03-30 西门子医疗有限公司 用于可扩展和去中心化增量机器学习的方法和系统
US20220091960A1 (en) * 2021-12-01 2022-03-24 Intel Corporation Automatic profiling of application workloads in a performance monitoring unit using hardware telemetry
CN114895773A (zh) * 2022-04-08 2022-08-12 中山大学 异构多核处理器的能耗优化方法、系统、装置及存储介质

Also Published As

Publication number Publication date
CN117687745A (zh) 2024-03-12

Similar Documents

Publication Publication Date Title
CN111309479B (zh) 一种任务并行处理的实现方法、装置、设备和介质
WO2022083624A1 (zh) 一种模型的获取方法及设备
Song et al. Towards pervasive and user satisfactory cnn across gpu microarchitectures
Islam et al. Predicting application failure in cloud: A machine learning approach
CN113361680B (zh) 一种神经网络架构搜索方法、装置、设备及介质
WO2022012407A1 (zh) 一种用于神经网络的训练方法以及相关设备
CN111401433A (zh) 用户信息获取方法、装置、电子设备及存储介质
CN113692594A (zh) 通过强化学***性改进
EP3912056A1 (en) Leveraging query executions to improve index recommendations
Cui et al. Cross-platform machine learning characterization for task allocation in IoT ecosystems
CN110377472B (zh) 定位芯片运行错误的方法及装置
US20230401092A1 (en) Runtime task scheduling using imitation learning for heterogeneous many-core systems
WO2023150912A1 (zh) 算子的调度运行时间比较方法、装置及存储介质
CN113553138A (zh) 一种云资源调度的方法及装置
Gupta et al. A supervised deep learning framework for proactive anomaly detection in cloud workloads
Khodaverdian et al. A shallow deep neural network for selection of migration candidate virtual machines to reduce energy consumption
CN114925938A (zh) 一种基于自适应svm模型的电能表运行状态预测方法、装置
CN110413406A 一种任务负载预测系统及方法
CN112463205B (zh) 基于ai和大数据的应用程序管理方法及人工智能服务器
CN113886454A (zh) 一种基于lstm-rbf的云资源预测方法
WO2024046283A1 (zh) 任务调度方法、模型生成方法、以及电子设备
WO2023224742A1 (en) Predicting runtime variation in big data analytics
US20230237371A1 (en) Systems and methods for providing predictions with supervised and unsupervised data in industrial systems
Ni et al. Online performance and power prediction for edge TPU via comprehensive characterization
CN113961765B (zh) 基于神经网络模型的搜索方法、装置、设备和介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23859313

Country of ref document: EP

Kind code of ref document: A1