CN112667594A - Heterogeneous computing platform based on hybrid cloud resources and model training method - Google Patents
Heterogeneous computing platform based on hybrid cloud resources and model training method
- Publication number
- CN112667594A (application number CN202110049064.0A)
- Authority
- CN
- China
- Prior art keywords
- resources
- model training
- layer
- training task
- resource
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Stored Programmes (AREA)
Abstract
The invention discloses a heterogeneous computing platform based on hybrid cloud resources and a model training method. The platform comprises a basic component layer, a computing framework layer, a resource management layer and an infrastructure layer. The method comprises the following steps: a user sets a model training task through the basic component layer and starts the task, wherein setting the model training task comprises selecting a model, a data set, a learning framework and/or computing resources; the computing framework layer provides the selected learning framework; and the resource management layer allocates resources to the model training task according to its settings and calls the computing resources, network resources and storage resources of the infrastructure layer to perform model training. By supporting multiple reinforcement learning architectures and ultra-large-scale distributed training, the heterogeneous computing platform makes the whole machine learning modeling process visual, and also addresses problems common to existing cloud management platforms, such as limited computing power, adaptation to only a single type of AI chip, and a fixed framework, so that the model training process is convenient, fast and efficient.
Description
Technical Field
The invention relates to the technical field of cloud computing, and in particular to a heterogeneous computing platform based on hybrid cloud resources and a model training method.
Background
At present, computing, storage and network resources are isolated in different virtualization platforms, so unified monitoring and management at the private cloud layer cannot be achieved. As cloud computing technology develops, administrators must switch frequently among different management interfaces and master the different management logics and virtualization models of each platform, so enterprises need to hire or train administrators familiar with specific virtualization platforms to manage each platform separately.
The hybrid cloud is a solution combining a private cloud and one or more public cloud services, and not only can provide a private and safe data storage and computing environment, but also can provide more flexible and lower-cost computing, storage and network resources.
At present, most hybrid cloud management systems manage multi-cloud systems through a Cloud Management Platform (CMP). However, cloud management platforms generally suffer from long processes and error-prone manual operations, so users cannot apply for and use resources in a uniform manner, and their self-service capability is limited.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides the following technical scheme.
One aspect of the present invention relates to a heterogeneous computing platform based on hybrid cloud resources, comprising:
a basic component layer, configured to provide an interface for user operations, wherein the user operations comprise setting a model training task;
a computing framework layer, configured to provide the learning framework used by the model training task;
a resource management layer, configured to allocate and schedule the hybrid cloud resources in an infrastructure layer to execute the model training task; and
the infrastructure layer, configured to provide the hybrid cloud resources, including heterogeneous computing resources, network resources and storage resources.
Further, the learning framework includes a deep learning framework and a reinforcement learning framework.
Further, the resource management layer comprises a resource management module, a Kubernetes module and a Docker module, and the resource management module schedules the heterogeneous computing resources, network resources and storage resources in the infrastructure layer through the Kubernetes module and the Docker module.
Further, the heterogeneous computing resources include distributed CPU, GPU and ASIC processor resources, the network resources include an RDMA network, and the storage resources include the distributed storage systems HDFS, Ceph and/or ClusterFS.
Further, the user operations further comprise uploading a data set and/or uploading an algorithm.
Further, the computing framework layer further comprises a big data engine for managing the uploaded data set.
Another aspect of the present invention relates to a model training method implemented by using the above heterogeneous computing platform based on hybrid cloud resources, including:
a user sets a model training task through the basic component layer and starts the task, wherein setting the model training task comprises selecting a model, a data set, a learning framework and/or computing resources;
the computing framework layer provides the selected learning framework; and
the resource management layer allocates and calls the computing resources, network resources and storage resources of the infrastructure layer according to the settings of the model training task to perform model training.
Preferably, the resource management layer allocating and calling the computing resources, network resources and storage resources of the infrastructure layer according to the settings of the model training task includes:
the resource management layer allocates computing power resources, network resources and storage resources to the model training task according to the settings of the model training task, and calls the Kubernetes module and the Docker module to establish a container for the model training task, wherein the container comprises images of the allocated computing power resources, network resources and storage resources.
Further, the resource management layer allocating computing power resources to the model training task according to the settings of the model training task includes:
acquiring the currently available computing power resources;
if the settings of the model training task include a selection of computing power resources, allocating the corresponding computing power resources based on the selection;
otherwise, identifying the type of the model training task, and determining the type and size of the required computing power resources according to that type; and
allocating the computing power resources from the currently available computing power resources according to the type and size of the required computing power resources.
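The allocation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the platform's actual implementation: the function name `allocate_compute`, the table `DEFAULTS_BY_TASK_TYPE` and its values are hypothetical, and the text does not say how a task type maps to a resource type and size.

```python
from typing import Dict, Optional

# Hypothetical defaults: task type -> (processor type, processor count).
# The source only says type and size are determined from the task type.
DEFAULTS_BY_TASK_TYPE = {
    "deep_learning": ("GPU", 4),
    "reinforcement_learning": ("CPU", 16),
}


def allocate_compute(
    available: Dict[str, int],
    task_type: str,
    selection: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Return the compute granted to a task and deduct it from `available`."""
    if selection:
        # The task settings include an explicit choice of computing resources.
        needed = dict(selection)
    else:
        # Otherwise derive the type and size from the identified task type.
        proc_type, count = DEFAULTS_BY_TASK_TYPE[task_type]
        needed = {proc_type: count}

    # Verify the whole request before deducting anything from the pool.
    for proc_type, count in needed.items():
        if available.get(proc_type, 0) < count:
            raise RuntimeError(f"not enough {proc_type}: need {count}")
    for proc_type, count in needed.items():
        available[proc_type] -= count
    return needed
```

A caller would pass the currently available pool, for example `allocate_compute({"GPU": 8, "CPU": 32}, "deep_learning")`, and receive the granted resources while the pool is reduced in place.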
Further, the resource management layer records the resource condition used by each model training task in real time, and dynamically adjusts the distributed computing resources, network resources and storage resources in the model training process.
The beneficial effects of the invention are as follows. The invention provides a heterogeneous computing platform based on hybrid cloud resources and a model training method, in which an administrator-centered operation and maintenance mode is converted into a decentralized self-service operation and maintenance mode, and a one-way supply mode of operation is converted into a transparent, autonomous mode of operation, thereby improving the efficiency of managing and using heterogeneous resources. The heterogeneous computing platform achieves unified management of multiple clusters and simultaneous use by many users at large scale. By supporting multiple reinforcement learning architectures and ultra-large-scale distributed training, it makes the whole machine learning modeling process visual, and it also addresses problems common to existing cloud platforms, such as limited computing power, adaptation to only a single type of AI chip, and a fixed framework, so that the model training process is convenient, fast and efficient.
Drawings
FIG. 1 is a schematic structural diagram of a hybrid cloud resource-based heterogeneous computing platform according to the present invention;
FIG. 2 is a schematic flow chart of a model training method implemented by using a heterogeneous computing platform based on hybrid cloud resources according to the present invention.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Example one
As shown in fig. 1, an embodiment of the present invention provides a heterogeneous computing platform based on hybrid cloud resources, including:
a basic component layer 11, configured to provide an interface for user operations, wherein the user operations comprise setting a model training task;
a computing framework layer 12, configured to provide the learning framework used by the model training task;
a resource management layer 13, configured to allocate and schedule the hybrid cloud resources in the infrastructure layer to execute the model training task; and
an infrastructure layer 14, configured to provide the hybrid cloud resources, including heterogeneous computing resources, network resources and storage resources.
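To make the division of responsibilities among the four layers concrete, the following Python sketch models each layer as a small class. All class, method and field names are hypothetical and chosen only for illustration; the sketch tracks processor counts and ignores network and storage allocation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InfrastructureLayer:
    """Layer 14: pools of hybrid cloud resources."""
    compute: Dict[str, int] = field(default_factory=dict)  # e.g. {"GPU": 8}
    network: List[str] = field(default_factory=list)       # e.g. ["RDMA"]
    storage: List[str] = field(default_factory=list)       # e.g. ["HDFS"]


@dataclass
class ResourceManagementLayer:
    """Layer 13: allocates infrastructure resources to training tasks."""
    infrastructure: InfrastructureLayer

    def allocate(self, processor_type: str, count: int) -> Dict[str, int]:
        available = self.infrastructure.compute.get(processor_type, 0)
        if count > available:
            raise ValueError(f"only {available} {processor_type} available")
        self.infrastructure.compute[processor_type] = available - count
        return {processor_type: count}


@dataclass
class ComputeFrameworkLayer:
    """Layer 12: learning frameworks preset in the platform."""
    frameworks: List[str]

    def provide(self, name: str) -> str:
        if name not in self.frameworks:
            raise KeyError(f"framework {name!r} is not preset")
        return name


@dataclass
class BasicComponentLayer:
    """Layer 11: user-facing entry point for setting and starting tasks."""
    frameworks: ComputeFrameworkLayer
    resources: ResourceManagementLayer

    def start_task(self, framework: str, processor_type: str, count: int) -> dict:
        return {
            "framework": self.frameworks.provide(framework),
            "compute": self.resources.allocate(processor_type, count),
        }
```

In this sketch the user only touches `BasicComponentLayer.start_task`; the framework lookup and the resource deduction happen in the lower layers, which mirrors the top-down call order described for the platform.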
The basic component layer 11 includes a data management module 111, an algorithm development module 112, and a model training module 113. The user uploads a data set through the data management module 111 and can delete, modify and export it. The user uploads an algorithm through the algorithm development module 112 and can modify and delete it. The user sets model training tasks through the model training module 113, including setting the algorithm, data set and/or learning framework used for model training. Optionally, the basic component layer 11 further includes a customization orchestration module 114 for customizing the resources used by model training, including the processor type, the number of processors, and the like.
The learning framework 121 provided by the computing framework layer 12 includes a deep learning framework and a reinforcement learning framework. The deep learning frameworks include mainstream international deep learning frameworks such as TensorFlow, MXNet, Caffe and PyTorch, and domestic frameworks such as OneFlow, MegEngine, PaddlePaddle and MindSpore. The reinforcement learning framework includes the multi-tenant reinforcement learning framework Ray. The learning frameworks are preset in the platform.
In use, a user can designate the learning framework used for model training through the model training module 113 of the basic component layer 11. When the model is trained, the platform calls the designated framework directly from the computing framework layer 12, which is convenient and fast, greatly simplifies the deployment process and improves operating efficiency.
The computing framework layer 12 also includes a big data engine 122 for operating on uploaded data sets, including storage, computation, mining and management. The big data engine comprises multiple data engines, such as Spark, Hadoop, Storm, Hive, Flink and Kafka, which enables full data interoperability and zero-configuration use and creates a unified, rich data ecosystem.
The resource management layer 13 includes a resource management module 131, a Kubernetes module 132, and a Docker module 133, where the resource management module 131 allocates and schedules the heterogeneous computing resources, network resources and storage resources in the infrastructure layer through the Kubernetes module 132 and the Docker module 133. Specifically, the resource management module 131 allocates computing power resources, network resources and storage resources of the infrastructure layer 14 to the model training task according to the settings of the model training task, and then calls the Kubernetes module 132 to establish a container for the model training task, where the container includes an image of the allocated computing power resources, network resources and storage resources. The container is stored in the Docker module 133, so that resources are scheduled in units of Docker containers.
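One way to read this step in standard Kubernetes terms is that each training task becomes a Pod whose container declares the allocated resources as limits. The Python sketch below builds such a Pod manifest as a plain dictionary. It is an illustration only: the helper name `build_training_pod`, the Pod naming scheme and the image reference are hypothetical, and the text does not specify how the Kubernetes module 132 expresses the allocation.

```python
import json


def build_training_pod(task_id: str, image: str, gpus: int, cpu: str, memory: str) -> dict:
    """Build a Kubernetes Pod manifest for one model training task."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"train-{task_id}", "labels": {"task": task_id}},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    "resources": {
                        "limits": {
                            "cpu": cpu,
                            "memory": memory,
                            # Resource name advertised by the NVIDIA device
                            # plugin; other accelerators use other names.
                            "nvidia.com/gpu": gpus,
                        }
                    },
                }
            ],
        },
    }


# Hypothetical image reference, used only to show the manifest's shape.
manifest = build_training_pod(
    "demo", "registry.example.com/pytorch-train:latest", 2, "8", "32Gi"
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl apply -f`, since Kubernetes accepts manifests in JSON as well as YAML.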
The infrastructure layer 14 includes a private cloud module 141 and a public cloud module 142. The private cloud module 141 provides private cloud resources, and the public cloud module 142 provides public cloud resources. The private cloud resources include heterogeneous computing resources, network resources and storage resources. The heterogeneous computing resources include various types of processors, such as distributed CPUs, GPUs and ASICs, and processor families from different manufacturers, such as Cambricon, Huawei Ascend and hectoritan, so as to satisfy the various computing requirements of users. The network resources include an RDMA network, which avoids the overhead of copying data from user space to system space and improves CPU utilization efficiency on the remote server. The storage resources include the distributed storage systems HDFS, Ceph and/or ClusterFS, so that users can more conveniently access shared files distributed across the network. The public cloud resources include Huawei Cloud, Alibaba Cloud (Aliyun), Kingsoft Cloud (Jinshan cloud) and the like.
The invention provides a heterogeneous computing platform based on hybrid cloud resources. By integrating heterogeneous computing resources, multiple computing frameworks and a big data engine, the platform achieves unified management of multiple clusters and simultaneous use by many users at large scale. It supports multiple reinforcement learning architectures and ultra-large-scale distributed training, makes the whole machine learning modeling process visual, and also addresses problems common to existing cloud platforms, such as limited computing power, adaptation to only a single type of AI chip, and a fixed framework, so that the model training process is convenient, fast and efficient.
Example two
As shown in FIG. 2, this embodiment provides a model training method implemented by using the heterogeneous computing platform based on hybrid cloud resources according to the first embodiment, including:
S101, a user sets a model training task through the basic component layer and starts the task, wherein setting the model training task comprises selecting the algorithm, data set, learning framework and/or computing resources used for training;
S102, the computing framework layer provides the selected learning framework;
S103, the resource management layer allocates and calls the computing resources, network resources and storage resources of the infrastructure layer according to the settings of the model training task to perform model training.
Specifically, the user sets a model training task through the model training module 113, including setting the algorithm, data set and/or learning framework used for model training. The algorithm may be uploaded by the user through the algorithm development module 112, and the data set may be uploaded by the user through the data management module 111; the algorithm and the data set may also be uploaded in advance by an administrator or by other users. When uploading an algorithm or data set, the administrator or user can choose whether to make it public; if it is public, all users of the platform can choose to use it. The user customizes the resource usage scheme for model training through the customization orchestration module 114, including selecting a public cloud or a private cloud and selecting computing power resources in the private cloud, such as the processor type, number of processors, and processor family. The user can therefore set training resources flexibly according to their own needs. For example, if the user wishes to increase the training speed, a greater number of processors may be selected; if the user has a requirement on the processor type, a GPU or a CPU can be selected; if the user wants to validate the processors of a particular vendor, a processor family of that vendor, such as Cambricon, may be selected. In this way, the platform provides users with a flexible and uniform way of using resources and meets their individual resource requirements.
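A customized resource usage scheme of the kind described here can be represented as a small record in which any field the user leaves unset is deferred to the platform. The Python sketch below is one hypothetical representation: the class name `ResourceScheme`, its field names and its validation rules are assumptions made for illustration, not part of the described platform.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_CLOUDS = {"private", "public"}


@dataclass(frozen=True)
class ResourceScheme:
    """A user's customized resource usage scheme; None means 'not customized'."""
    cloud: str = "private"
    processor_type: Optional[str] = None    # e.g. "GPU" or "CPU"
    processor_count: Optional[int] = None
    processor_family: Optional[str] = None  # vendor-specific processor family

    def __post_init__(self) -> None:
        if self.cloud not in ALLOWED_CLOUDS:
            raise ValueError(f"unknown cloud: {self.cloud!r}")
        if self.processor_count is not None and self.processor_count < 1:
            raise ValueError("processor_count must be at least 1")

    def uncustomized(self) -> List[str]:
        """Names of the fields the user left for the platform to decide."""
        return [name for name, value in vars(self).items() if value is None]
```

For example, `ResourceScheme(processor_type="GPU", processor_count=4)` fixes the processor type and count, and `uncustomized()` then reports that the processor family is still left for the platform to choose.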
After the setup is complete, the user starts the model training task. According to the user's settings, the platform calls the selected learning framework from the learning framework 121 of the computing framework layer 12, extracts the data set as training data, and extracts and executes the algorithm code.
Meanwhile, the resource management layer 13 allocates computing power resources, network resources and storage resources according to the customized resource usage scheme. Where no resources, or only some resources, have been customized, it allocates the uncustomized resources according to the model training settings and the current resource usage. For example, if the customized resource usage scheme specifies only the type and number of processors, the resource management layer 13 allocates idle network resources and storage resources according to the model training settings (e.g., the size of the data set used for model training). If the currently available resources are less than the customized resource usage scheme requires, resources are allocated to the model training task according to what is currently available and the task is recorded; when new idle resources appear, they are preferentially allocated to that task until the customized resource usage scheme is satisfied.
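The partial allocation and later top-up described above can be sketched as follows for a single processor type. This is a simplified illustration with hypothetical names (`Allocator`, `PendingTopUp`); serving recorded tasks in first-in, first-out order is an assumption, since the text only says that such tasks are served preferentially.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PendingTopUp:
    task_id: str
    shortfall: int  # processors still owed to the task


@dataclass
class Allocator:
    free: int  # idle processors of one type
    granted: Dict[str, int] = field(default_factory=dict)
    waiting: List[PendingTopUp] = field(default_factory=list)

    def request(self, task_id: str, wanted: int) -> int:
        """Grant up to `wanted` processors now and record any shortfall."""
        grant = min(wanted, self.free)
        self.free -= grant
        self.granted[task_id] = grant
        if grant < wanted:
            self.waiting.append(PendingTopUp(task_id, wanted - grant))
        return grant

    def release(self, count: int) -> None:
        """Add idle processors, serving recorded shortfalls first (FIFO)."""
        self.free += count
        while self.waiting and self.free > 0:
            entry = self.waiting[0]
            top_up = min(entry.shortfall, self.free)
            self.free -= top_up
            self.granted[entry.task_id] += top_up
            entry.shortfall -= top_up
            if entry.shortfall == 0:
                self.waiting.pop(0)
```

A task that asks for four processors when only two are idle starts with two, and each later call to `release` tops it up before any processors return to the idle pool.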
Then, the platform calls the Kubernetes module to establish a Docker container for the model training task. The container is stored in the Docker module, and the allocated computing power resources, network resources and storage resources are packaged into an image and placed in the established container. Thus, when multiple model training tasks execute in parallel, each task has its own Docker container, and the platform can call the Kubernetes module to manage the Docker containers uniformly, for example by recording the resources used by each model training task in real time and dynamically adjusting the images of computing power resources, network resources and storage resources contained in each container.
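The text does not state the policy used for dynamic adjustment. As one hypothetical policy, the Python sketch below shrinks each task's processor limit to its recorded usage plus a fixed headroom and reports how many processors that frees; the function name `rebalance` and the `headroom` parameter are assumptions made for illustration.

```python
from typing import Dict, Tuple


def rebalance(
    limits: Dict[str, int],
    usage: Dict[str, int],
    headroom: int = 1,
) -> Tuple[Dict[str, int], int]:
    """Shrink each task's limit to its usage plus headroom.

    Limits are never raised above their current value. Returns the new
    limits and the number of processors freed by the adjustment.
    """
    new_limits = {}
    for task_id, limit in limits.items():
        used = usage.get(task_id, 0)  # tasks with no record count as idle
        new_limits[task_id] = min(limit, used + headroom)
    freed = sum(limits.values()) - sum(new_limits.values())
    return new_limits, freed
```

With limits of 8 and 4 processors and recorded usage of 2 and 4, the first task's limit drops to 3 while the second keeps its 4, freeing 5 processors for other tasks.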
The heterogeneous computing platform based on hybrid cloud resources and the model training method provided by the embodiments of the invention are well suited to various scenarios in the field of artificial intelligence, such as machine translation, face recognition, AI medical treatment, brain-like computing, intelligent simulation and the like.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A heterogeneous computing platform based on hybrid cloud resources, comprising:
a basic component layer, configured to provide an interface for user operations, wherein the user operations comprise setting a model training task;
a computing framework layer, configured to provide the learning framework used by the model training task;
a resource management layer, configured to allocate and schedule the hybrid cloud resources in an infrastructure layer to execute the model training task; and
the infrastructure layer, configured to provide the hybrid cloud resources, including heterogeneous computing resources, network resources and storage resources.
2. The hybrid cloud resource-based heterogeneous computing platform of claim 1, wherein the learning framework comprises a deep learning framework and a reinforcement learning framework.
3. The hybrid cloud resource-based heterogeneous computing platform of claim 2, wherein the resource management layer comprises a resource management module, a Kubernetes module, and a Docker module, and wherein the resource management module schedules the heterogeneous computing resources, network resources and storage resources in the infrastructure layer via the Kubernetes module and the Docker module.
4. The hybrid cloud resource-based heterogeneous computing platform of claim 3, wherein the heterogeneous computing resources comprise distributed CPU, GPU and ASIC processor resources, the network resources comprise an RDMA network, and the storage resources comprise the distributed storage systems HDFS, Ceph, and/or ClusterFS.
5. The hybrid cloud resource-based heterogeneous computing platform of claim 4, wherein the user operations further comprise uploading a data set and/or uploading an algorithm.
6. The hybrid cloud resource-based heterogeneous computing platform of claim 5, wherein the computing framework layer further comprises a big data engine to manage the uploaded data set.
7. A model training method implemented by the hybrid cloud resource-based heterogeneous computing platform of claim 6, comprising:
a user sets a model training task through the basic component layer and starts the task, wherein setting the model training task comprises selecting a model, a data set, a learning framework and/or computing resources;
the computing framework layer provides the selected learning framework; and
the resource management layer allocates and calls the computing resources, network resources and storage resources of the infrastructure layer according to the settings of the model training task to perform model training.
8. The model training method of claim 7, wherein the resource management layer allocating and calling the computing resources, network resources and storage resources of the infrastructure layer for the model training task according to the settings of the model training task comprises:
the resource management layer allocates computing power resources, network resources and storage resources to the model training task according to the settings of the model training task, and calls the Kubernetes module and the Docker module to establish a container for the model training task, wherein the container comprises images of the allocated computing power resources, network resources and storage resources.
9. The model training method of claim 8, wherein the resource management layer allocating computing power resources to the model training task according to the settings of the model training task comprises:
acquiring the currently available computing power resources;
if the settings of the model training task include a selection of computing power resources, allocating the corresponding computing power resources based on the selection;
otherwise, identifying the type of the model training task, and determining the type and size of the required computing power resources according to that type; and
allocating the computing power resources from the currently available computing power resources according to the type and size of the required computing power resources.
10. The model training method of claim 9, wherein the resource management layer records the resource situation used by each model training task in real time and dynamically adjusts the allocated computational resources, network resources and storage resources during the model training process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110049064.0A CN112667594A (en) | 2021-01-14 | 2021-01-14 | Heterogeneous computing platform based on hybrid cloud resources and model training method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112667594A (en) | 2021-04-16 |
Family
ID=75415161
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110049064.0A (Pending), published as CN112667594A (en) | 2021-01-14 | 2021-01-14 | Heterogeneous computing platform based on hybrid cloud resources and model training method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112667594A (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107995043A (en) * | 2017-12-15 | 2018-05-04 | 南京南瑞信息通信科技有限公司 | Application disaster recovery and backup systems and calamity based on mixing cloud platform are for collocation method |
CN108881446A (en) * | 2018-06-22 | 2018-11-23 | 深源恒际科技有限公司 | A kind of artificial intelligence plateform system based on deep learning |
CN109670600A (en) * | 2018-12-14 | 2019-04-23 | 启元世界(北京)信息技术服务有限公司 | Decision-making technique and system based on cloud platform |
CN109933306A (en) * | 2019-02-11 | 2019-06-25 | 山东大学 | Mix Computational frame generation, data processing method, device and mixing Computational frame |
WO2019130009A1 (en) * | 2017-12-29 | 2019-07-04 | Agarik Sas | Orchestrated hybrid cloud platform for multi-cloud environment |
CN110347498A (en) * | 2019-06-10 | 2019-10-18 | 华南理工大学 | A kind of load dynamic migration method under container and virtual machine mixing cloud environment |
CN110490450A (en) * | 2019-08-15 | 2019-11-22 | 安诺优达生命科学研究院 | Biological information management system based on mixed cloud |
WO2020092446A2 (en) * | 2018-10-29 | 2020-05-07 | Strong Force TX Portfolio 2018, LLC | Methods and systems for improving machines and systems that automate execution of distributed ledger and other transactions in spot and forward markets for energy, compute, storage and other resources |
CN111209077A (en) * | 2019-12-26 | 2020-05-29 | 中科曙光国际信息产业有限公司 | Deep learning framework design method |
WO2020135806A1 (en) * | 2018-12-28 | 2020-07-02 | 华为技术有限公司 | Operation maintenance method and equipment applied to data center |
CN111427549A (en) * | 2020-03-30 | 2020-07-17 | 中国科学院计算机网络信息中心 | Artificial intelligence reinforcement learning service platform |
CN111612300A (en) * | 2020-04-16 | 2020-09-01 | 国网甘肃省电力公司信息通信公司 | Scene anomaly perception index calculation method and system based on deep hybrid cloud model |
CN111626338A (en) * | 2020-05-06 | 2020-09-04 | 中移雄安信息通信科技有限公司 | Cloud environment matching method, device, equipment and medium based on fusion classification model |
-
2021
- 2021-01-14 CN CN202110049064.0A patent/CN112667594A/en active Pending
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107995043A (en) * | 2017-12-15 | 2018-05-04 | 南京南瑞信息通信科技有限公司 | Application disaster recovery and backup systems and calamity based on mixing cloud platform are for collocation method |
WO2019130009A1 (en) * | 2017-12-29 | 2019-07-04 | Agarik Sas | Orchestrated hybrid cloud platform for multi-cloud environment |
CN108881446A (en) * | 2018-06-22 | 2018-11-23 | 深源恒际科技有限公司 | A kind of artificial intelligence plateform system based on deep learning |
WO2020092446A2 (en) * | 2018-10-29 | 2020-05-07 | Strong Force TX Portfolio 2018, LLC | Methods and systems for improving machines and systems that automate execution of distributed ledger and other transactions in spot and forward markets for energy, compute, storage and other resources |
CN109670600A (en) * | 2018-12-14 | 2019-04-23 | 启元世界(北京)信息技术服务有限公司 | Decision-making technique and system based on cloud platform |
WO2020135806A1 (en) * | 2018-12-28 | 2020-07-02 | 华为技术有限公司 | Operation maintenance method and equipment applied to data center |
CN109933306A (en) * | 2019-02-11 | 2019-06-25 | 山东大学 | Hybrid computing framework generation method, data processing method, device, and hybrid computing framework |
CN110347498A (en) * | 2019-06-10 | 2019-10-18 | 华南理工大学 | Dynamic load migration method in a hybrid cloud environment of containers and virtual machines |
CN110490450A (en) * | 2019-08-15 | 2019-11-22 | 安诺优达生命科学研究院 | Biological information management system based on hybrid cloud |
CN111209077A (en) * | 2019-12-26 | 2020-05-29 | 中科曙光国际信息产业有限公司 | Deep learning framework design method |
CN111427549A (en) * | 2020-03-30 | 2020-07-17 | 中国科学院计算机网络信息中心 | Artificial intelligence reinforcement learning service platform |
CN111612300A (en) * | 2020-04-16 | 2020-09-01 | 国网甘肃省电力公司信息通信公司 | Scene anomaly perception index calculation method and system based on deep hybrid cloud model |
CN111626338A (en) * | 2020-05-06 | 2020-09-04 | 中移雄安信息通信科技有限公司 | Cloud environment matching method, device, equipment and medium based on fusion classification model |
Non-Patent Citations (4)
Title |
---|
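朱连章 et al.: "Research on a pervasive cloud service migration method based on deep learning", Journal of Taiyuan University of Technology, no. 05, 15 September 2018 (2018-09-15), pages 736 - 744 * |
林健; 谢冬鸣; 余波: "Research on the adaptation problem of deep learning cloud services", Software Guide, no. 06, 15 June 2020 (2020-06-15), pages 1 - 8 * |
陈建辉: "Privacy protection model based on elliptic curve encryption in a hybrid cloud environment", Microelectronics & Computer, no. 08, 5 August 2017 (2017-08-05), pages 128 - 132 * |
陈星; 兰兴土; 李隘鹏; 郭文忠; 黄罡: "Hybrid cloud management method based on runtime models", Journal of Software, no. 07, pages 1881 - 1897 * |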
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113326116A (en) * | 2021-06-30 | 2021-08-31 | 北京九章云极科技有限公司 | Data processing method and system |
CN114661482A (en) * | 2022-05-25 | 2022-06-24 | 成都索贝数码科技股份有限公司 | GPU computing power management method, medium, equipment and system |
DE202022104275U1 (en) | 2022-07-28 | 2022-08-25 | Ahmed Alemran | System for intelligent resource management for distributed machine learning tasks |
CN115562877A (en) * | 2022-11-15 | 2023-01-03 | 北京阿丘科技有限公司 | Arrangement method, device and equipment of distributed computing power resources and storage medium |
CN115562877B (en) * | 2022-11-15 | 2023-03-24 | 北京阿丘科技有限公司 | Arranging method, device and equipment of distributed computing power resources and storage medium |
CN116521380A (en) * | 2023-07-05 | 2023-08-01 | 之江实验室 | Resource self-adaptive collaborative model training acceleration method, device and equipment |
CN117271424A (en) * | 2023-11-24 | 2023-12-22 | 北京中星微人工智能芯片技术有限公司 | Processing device and processing method based on multimode fusion computing framework |
CN117271424B (en) * | 2023-11-24 | 2024-02-06 | 北京中星微人工智能芯片技术有限公司 | Processing device and processing method based on multimode fusion computing framework |
CN117421108A (en) * | 2023-12-15 | 2024-01-19 | 企商在线(北京)数据技术股份有限公司 | Heterogeneous computing power platform design method, heterogeneous computing power platform and resource scheduling method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112667594A (en) | | Heterogeneous computing platform based on hybrid cloud resources and model training method |
CN103810023B (en) | | Intelligent deployment method and system for distributed applications in a cloud platform |
US10467725B2 (en) | | Managing access to a resource pool of graphics processing units under fine grain control |
CN105207798B (en) | | Service arrangement method and device in software defined network |
CN103870314B (en) | | Method and system for simultaneously operating different types of virtual machines by single node |
Sotomayor et al. | | Virtual infrastructure management in private and hybrid clouds |
CN107222531B (en) | | Container cloud resource scheduling method |
CN109672709B (en) | | Hybrid cloud service scheduling system and method |
CN104123182B (en) | | Client/server-based cross-data-center MapReduce task scheduling system and method |
CN102971724B (en) | | Method and apparatus related to management based on modular virtual resources in a data center environment |
CN104503832B (en) | | Virtual machine scheduling system and method balancing fairness and efficiency |
CN106325975A (en) | | Method for automatically deploying and managing big data clusters through Docker container |
CN108920153A (en) | | Dynamic Docker container scheduling method based on load estimation |
CN111045786B (en) | | Container creation system and method based on image layering technology in a cloud environment |
US20240111586A1 (en) | | Multi-policy intelligent scheduling method and apparatus oriented to heterogeneous computing power |
CN104112049B (en) | | P2P architecture-based cross-data-center MapReduce task scheduling system and method |
CN104021029B (en) | | Spatial information cloud computing system and implementing method thereof |
CN110069341A (en) | | Scheduling method for tasks with dependencies combined with on-demand function configuration in edge computing |
CN109144661A (en) | | Deep learning management method based on Docker |
CN116541134B (en) | | Method and device for deploying containers in multi-architecture cluster |
CN104331332A (en) | | Virtual resource preallocation algorithm based on SLA (Service Level Agreement) |
CN109992373A (en) | | Resource scheduling method, information management method and device, and task deployment system |
CN112433823A (en) | | Apparatus and method for dynamically virtualizing physical card |
EP2923320A1 (en) | | Transparently routing job submissions between disparate environments |
Kherbache et al. | | Scheduling live-migrations for fast, adaptable and energy-efficient relocation operations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||