WO2021088964A1 - Inference system, inference method, electronic device, and computer storage medium - Google Patents

Inference system, inference method, electronic device, and computer storage medium

Info

Publication number
WO2021088964A1
WO2021088964A1 (PCT/CN2020/127026, CN2020127026W)
Authority
WO
WIPO (PCT)
Prior art keywords
inference
reasoning
computing device
calculation model
information
Prior art date
Application number
PCT/CN2020/127026
Other languages
English (en)
Chinese (zh)
Inventor
林立翔
李鹏
游亮
龙欣
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of WO2021088964A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models
    • G06N 5/045 Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence

Definitions

  • Fig. 7 is a flowchart of an inference method according to the sixth embodiment of the present invention.
  • Fig. 9 is a schematic structural diagram of an electronic device according to the eighth embodiment of the present invention.
  • the corresponding model information can be obtained when the model is loaded into the deep learning framework.
  • the inference client first sends the information of the calculation model to the second terminal device, and the second terminal device receives the model information through the inference server. Assuming that the model information indicates that the calculation model to be used is calculation model A, and the resource pool of the second terminal device stores calculation models A, B, C, and D, the second terminal device will load calculation model A directly from the resource pool through the GPU.
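The resource-pool lookup described above can be sketched minimally as follows. This is a hypothetical illustration, not the patent's implementation: the class name, the use of a plain dictionary, and the string stand-ins for loaded models are all assumptions.

```python
# Hypothetical sketch: the server keeps preloaded calculation models in a
# resource pool keyed by name, and loads the requested model directly from
# the pool when the client's model information names one that is present.

class ResourcePool:
    def __init__(self, models):
        # models: mapping from model name to an already-prepared model object
        self._models = dict(models)

    def load(self, model_name):
        # Return the requested model if it is in the pool; otherwise signal
        # a miss so the caller can fetch or build the model another way.
        if model_name not in self._models:
            raise KeyError(f"model {model_name!r} not in resource pool")
        return self._models[model_name]

# Pool holding calculation models A, B, C, and D, as in the example above.
pool = ResourcePool({"A": "model-A", "B": "model-B", "C": "model-C", "D": "model-D"})
print(pool.load("A"))  # model A is loaded directly from the pool
```

A real inference server would hold framework model objects rather than strings, but the control flow (check the pool, load on hit) is the same.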
  • one or more types of inference acceleration resources are provided in the second computing device 204; when the inference acceleration resources include multiple types, different types of inference acceleration resources have different usage priorities; the inference server 2042 uses the inference acceleration resources according to preset load-balancing rules and the priorities of the various types of inference acceleration resources.
  • the number of a certain type of inference acceleration resource may be one or more, as set by a person skilled in the art according to requirements; the embodiment of the present invention does not limit this.
  • those skilled in the art can also set other appropriate load balancing rules according to actual needs, which is not limited in the embodiment of the present invention.
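One way to combine the usage priorities with a preset load-balancing rule, as the bullets above describe, is to prefer the highest-priority resource type and break ties by current load. The rule, the class, and the field names below are illustrative assumptions; the patent leaves the concrete rule to the implementer.

```python
from dataclasses import dataclass

@dataclass
class AccelResource:
    name: str
    priority: int          # lower number = higher usage priority
    active_tasks: int = 0  # current load on this resource

def pick_resource(resources):
    # Assumed load-balancing rule: choose the highest-priority type first,
    # then the least-loaded resource within that type.
    return min(resources, key=lambda r: (r.priority, r.active_tasks))

resources = [
    AccelResource("gpu-0", priority=0, active_tasks=2),
    AccelResource("gpu-1", priority=0, active_tasks=1),
    AccelResource("fpga-0", priority=1, active_tasks=0),
]
chosen = pick_resource(resources)
print(chosen.name)  # gpu-1: same priority as gpu-0, but less loaded
```

Other rules (round-robin within a type, weighted capacity) fit the same selection interface.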
  • the inference calculation can be seamlessly transferred to a remote target computing device that has inference acceleration resources, and the interaction between the current computing device and the target computing device is imperceptible to the user. This ensures that the business logic of applications involving inference, and the user's habits for the inference business, remain unchanged; inference is thus realized at low cost and the user experience is improved.
  • Step S506 Feed back the result of the inference processing to the source computing device.
  • the inference method of this embodiment can be implemented by the inference server of the second computing device in the foregoing embodiment, and the specific implementation of the foregoing process can also refer to the operation of the inference server in the foregoing embodiment, which will not be repeated here.
  • the inference acceleration resource includes one or more types; when the inference acceleration resource includes multiple types, different types of inference acceleration resources have different usage priorities; then,
  • loading, by the inference acceleration resource, the calculation model indicated by the model information includes: using the inference acceleration resource to load the calculation model indicated by the model information according to a preset load-balancing rule and the priorities of the multiple types of inference acceleration resources.
  • the electronic device may include a processor (processor) 702, a communication interface (Communications Interface) 704, a memory (memory) 706, and a communication bus 708.
  • the information of the processing function is API interface information of the processing function.
  • the inference processing is deployed across different computing devices: the target computing device is provided with inference acceleration resources, performs the main inference processing through the calculation model, and executes the inference method of this embodiment.
  • The current electronic device can be responsible for the data processing before and after the inference processing.
  • the current electronic device can first send the model information of the calculation model to the target computing device, and the target computing device uses the inference acceleration resource to load the corresponding calculation model; the current electronic device then sends the data to be inferred to the target computing device. After the target computing device receives the data to be inferred, it can perform inference processing through the loaded calculation model. In this way, decoupling of the computing resources used for inference is realized.
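The two-phase exchange above (send model information, then send data to be inferred) can be sketched as a single-process round trip. Everything here is a hypothetical illustration: the class names, the dictionary-shaped model information, and the lambda standing in for a loaded calculation model are assumptions, and a real deployment would run the two sides on separate machines over a network.

```python
# Minimal in-process sketch of the exchange described above:
# phase 1 - the client sends model information and the server loads the
#           indicated calculation model from its pool;
# phase 2 - the client sends the data to be inferred and the server runs
#           the loaded model and returns the result.

class InferenceServer:
    def __init__(self, resource_pool):
        self._pool = resource_pool  # name -> callable calculation model
        self._loaded = None

    def load_model(self, model_info):
        # Phase 1: load the calculation model indicated by model_info.
        self._loaded = self._pool[model_info["name"]]

    def infer(self, data):
        # Phase 2: perform inference processing with the loaded model.
        if self._loaded is None:
            raise RuntimeError("no calculation model loaded")
        return self._loaded(data)

class InferenceClient:
    def __init__(self, server):
        self._server = server

    def run(self, model_info, data):
        self._server.load_model(model_info)  # send model information first
        return self._server.infer(data)      # then send the data to infer

server = InferenceServer({"A": lambda xs: [x * 2 for x in xs]})
client = InferenceClient(server)
print(client.run({"name": "A"}, [1, 2, 3]))  # [2, 4, 6]
```

The point of the split is the decoupling the bullet describes: the client never touches the acceleration resource, only the model information and the data.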
  • the processor 802 is configured to execute the program 810, and specifically can execute the relevant steps in the inference method embodiment in the fifth or sixth embodiment.
  • each component/step described in the embodiments of the present invention can be split into more components/steps, or two or more components/steps (or partial operations thereof) can be combined into new components/steps, to achieve the purpose of the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer And Data Communications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Disclosed are an inference system, an inference method, an electronic device, and a computer storage medium. The inference system comprises a first computing device and a second computing device connected to each other, the first computing device being provided with an inference client, and the second computing device comprising an inference acceleration resource and an inference server. The inference client is used to acquire model information of a calculation model used for inference and data to be inferred, and to send the model information and the data separately to the inference server in the second computing device. The inference server is used to load and invoke, by means of the inference acceleration resource, the calculation model indicated by the model information, perform inference processing on the data by means of the calculation model, and return the result of the inference processing to the inference client.
PCT/CN2020/127026 2019-11-08 2020-11-06 Inference system, inference method, electronic device, and computer storage medium WO2021088964A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911089253.XA CN112784989B (zh) 2019-11-08 2019-11-08 Inference system, inference method, electronic device and computer storage medium
CN201911089253.X 2019-11-08

Publications (1)

Publication Number Publication Date
WO2021088964A1 (fr) 2021-05-14

Family

ID=75748575

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/127026 WO2021088964A1 (fr) Inference system, inference method, electronic device, and computer storage medium

Country Status (3)

Country Link
CN (1) CN112784989B (fr)
TW (1) TW202119255A (fr)
WO (1) WO2021088964A1 (fr)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344208B (zh) * 2021-06-25 2023-04-07 中国电信股份有限公司 Data inference method, apparatus and system
CN116127082A (zh) * 2021-11-12 2023-05-16 华为技术有限公司 Data collection method and system, and related apparatus
TWI832279B (zh) * 2022-06-07 2024-02-11 宏碁股份有限公司 Artificial intelligence model computation acceleration system and method
WO2024000605A1 (fr) * 2022-07-01 2024-01-04 北京小米移动软件有限公司 AI model inference method and apparatus
CN114997401B (zh) * 2022-08-03 2022-11-04 腾讯科技(深圳)有限公司 Adaptive inference acceleration method and apparatus, computer device, and storage medium
CN116402141B (zh) * 2023-06-09 2023-09-05 太初(无锡)电子科技有限公司 Model inference method and apparatus, electronic device, and storage medium
CN116723191B (zh) * 2023-08-07 2023-11-10 深圳鲲云信息科技有限公司 Method and system for performing dataflow-accelerated computing using an acceleration apparatus

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106383835A (zh) * 2016-08-29 2017-02-08 华东师范大学 Natural language knowledge mining system based on formal semantic reasoning and deep learning
CN109145168A (zh) * 2018-07-11 2019-01-04 广州极天信息技术股份有限公司 Expert service robot cloud platform
CN110199274A (zh) * 2016-12-02 2019-09-03 微软技术许可有限责任公司 System and method for automated query answer generation

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
KR101126524B1 (ko) * 2010-06-25 2012-03-22 국민대학교산학협력단 User-centric context-aware system, and context information conversion method and case-based reasoning method therefor
CN104020983A (zh) * 2014-06-16 2014-09-03 上海大学 OpenCL-based KNN-GPU acceleration method
CN105808568B (zh) * 2014-12-30 2020-02-14 华为技术有限公司 Context distributed reasoning method and apparatus
CN108171117B (zh) * 2017-12-05 2019-05-21 南京南瑞信息通信科技有限公司 Electric power artificial intelligence visual analysis system based on multi-core heterogeneous parallel computing
CN109902818B (zh) * 2019-01-15 2021-05-25 中国科学院信息工程研究所 Distributed acceleration method and system for deep learning training tasks


Also Published As

Publication number Publication date
TW202119255A (zh) 2021-05-16
CN112784989B (zh) 2024-05-03
CN112784989A (zh) 2021-05-11

Similar Documents

Publication Publication Date Title
WO2021088964A1 (fr) Inference system, inference method, electronic device, and computer storage medium
WO2021139177A1 (fr) Image augmentation method and apparatus, computer device, and storage medium
EP3343364A1 (fr) Accelerator virtualization method and apparatus, and centralized resource manager
US11790004B2 (en) Systems, methods, and apparatuses for providing assistant deep links to effectuate third-party dialog session transfers
CN110569127B (zh) Virtual resource transfer, sending and obtaining method and apparatus
US11182210B2 (en) Method for resource allocation and terminal device
CN111338808B (zh) Collaborative computing method and system
WO2023029961A1 (fr) Task execution method and system, electronic device, and computer storage medium
CN111200606A (zh) Deep learning model task processing method, system, server and storage medium
US20240152393A1 (en) Task execution method and apparatus
CN110738156A (zh) Face recognition system and method based on message middleware
CN111813529B (zh) Data processing method and apparatus, electronic device, and storage medium
CN115550354A (zh) Data processing method and apparatus, and computer-readable storage medium
WO2017185632A1 (fr) Data transmission method and electronic device
US9124702B2 (en) Strategy pairing
CN113126958B (zh) Information-flow-based decision scheduling customization method and system
CN114222028A (zh) Speech recognition method and apparatus, computer device, and storage medium
CN113033475A (zh) Target object tracking method, related apparatus, and computer program product
CN113746754B (zh) Data transmission method, apparatus, device, and storage medium
CN115460053B (zh) Service invocation method and apparatus, and edge computing system
WO2024087844A1 (fr) Graph neural network training method and system, and abnormal account identification method
WO2023206049A1 (fr) AI service execution methods and apparatuses, network elements, storage medium, and chip
WO2022120993A1 (fr) Resource allocation method and apparatus for online scenarios, and electronic device
CN115699167A (zh) Compensating for hardware disparities when determining whether to offload assistant-related processing tasks from certain client devices
Chatzopoulos et al. Fides: A hidden market approach for trusted mobile ambient computing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20884822

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20884822

Country of ref document: EP

Kind code of ref document: A1