WO2021088964A1 - Inference system, inference method, electronic device, and computer storage medium - Google Patents
- Publication number
- WO2021088964A1 (PCT/CN2020/127026)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- inference
- reasoning
- computing device
- calculation model
- information
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/045—Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence
Definitions
- Fig. 7 is a flowchart of an inference method according to the sixth embodiment of the present invention.
- Fig. 9 is a schematic structural diagram of an electronic device according to the eighth embodiment of the present invention.
- the corresponding model information can be obtained when the deep learning framework loads the model.
- the inference client first sends the information of the calculation model to the second terminal device, and the second terminal device receives the calculation model information through the inference server.
- assuming that the information of the calculation model indicates that the calculation model to be used is calculation model A, and the resource pool of the second terminal device stores calculation models A, B, C, and D, the second terminal device will directly load calculation model A from the resource pool through the GPU.
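The resource-pool lookup described above can be sketched as follows. This is a minimal illustration only; the names `ResourcePool` and `load` are hypothetical and not taken from the patent:

```python
# Minimal sketch of the resource-pool lookup described above.
# All names here (ResourcePool, load) are illustrative, not from the patent.

class ResourcePool:
    """Stores pre-registered calculation models keyed by model name."""

    def __init__(self, models):
        self._models = dict(models)  # e.g. {"A": <model A>, "B": <model B>, ...}

    def load(self, model_name):
        # If the requested model is already in the pool, load it directly
        # (in the patent's example, onto the GPU); otherwise report a miss.
        if model_name not in self._models:
            raise KeyError(f"model {model_name!r} not in resource pool")
        return self._models[model_name]

# The second terminal device's pool holds models A, B, C, and D;
# the model information names model A, so A is loaded directly.
pool = ResourcePool({"A": "model-A", "B": "model-B", "C": "model-C", "D": "model-D"})
loaded = pool.load("A")
```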
- one or more types of inference acceleration resources are provided in the second computing device 204; when the inference acceleration resources include multiple types, different types of inference acceleration resources have different usage priorities; the inference server 2042 uses the inference acceleration resources according to preset load balancing rules and the priorities of the various types of inference acceleration resources.
- the number of a certain type of inference acceleration resource may be one or more, as set by those skilled in the art according to requirements; the embodiments of the present invention do not limit this.
- those skilled in the art can also set other appropriate load balancing rules according to actual needs, which is not limited in the embodiment of the present invention.
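Priority-based selection with a load balancing rule can be sketched as below. The patent leaves the concrete rule to the implementer, so the "prefer higher priority, break ties by lowest current load" policy and all names here are assumptions for illustration:

```python
# Illustrative sketch: choose an inference acceleration resource by usage
# priority, with a simple least-loaded rule as the load balancing policy.
# The concrete policy is an assumption; the patent does not prescribe one.

def pick_resource(resources):
    """resources: list of dicts like {"name": "gpu-0", "priority": 0, "load": 0.3},
    where a lower priority value means the type is preferred.
    Prefer the highest-priority type; break ties by the lowest current load."""
    return min(resources, key=lambda r: (r["priority"], r["load"]))

resources = [
    {"name": "gpu-0",  "priority": 0, "load": 0.9},
    {"name": "gpu-1",  "priority": 0, "load": 0.2},
    {"name": "fpga-0", "priority": 1, "load": 0.0},
]
# gpu-1 wins: same (preferred) priority as gpu-0, but a lower current load.
chosen = pick_resource(resources)
```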
- the inference calculation can be seamlessly transferred to a remote target computing device with inference acceleration resources, and the interaction between the current computing device and the target computing device is imperceptible to the user. Therefore, the business logic of the application involving inference and the user's usage habits for the inference business remain unchanged; inference is realized at low cost and the user experience is improved.
- Step S506 Feed back the result of the inference processing to the source computing device.
- the inference method of this embodiment can be implemented by the inference server of the second computing device in the foregoing embodiment, and the specific implementation of the foregoing process can also refer to the operation of the inference server in the foregoing embodiment, which will not be repeated here.
- the inference acceleration resource includes one or more types; when the inference acceleration resource includes multiple types, different types of inference acceleration resources have different usage priorities; then,
- loading the calculation model indicated by the model information with the inference acceleration resource includes: using the inference acceleration resource to load the calculation model indicated by the model information according to a preset load balancing rule and the priorities of the multiple types of inference acceleration resources.
- the electronic device may include a processor 702, a communication interface 704, a memory 706, and a communication bus 708.
- the information of the processing function is API interface information of the processing function.
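Representing a processing function by its API interface information, as the line above describes, might look like the following sketch. The dict layout, the `preprocess` function, and `api_interface_info` are illustrative assumptions, not defined by the patent:

```python
# Sketch: describe a processing function by its API interface information
# (here, its name plus its parameter signature). The helper names and the
# dict layout are assumptions for illustration only.
import inspect

def preprocess(image_bytes: bytes, size: int = 224) -> list:
    """Hypothetical pre-processing step run before inference."""
    return [len(image_bytes), size]

def api_interface_info(fn):
    # Capture the function's API surface: its name and call signature.
    return {"name": fn.__name__, "signature": str(inspect.signature(fn))}

info = api_interface_info(preprocess)
```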
- the inference processing is deployed across different computing devices: the target computing device is provided with inference acceleration resources and can perform the main inference processing through the calculation model, executing the inference method of this embodiment.
- the current electronic device can be responsible for data processing before and after the inference processing.
- the current electronic device can first send the model information of the calculation model to the target computing device, and the target computing device uses the inference acceleration resource to load the corresponding calculation model; then, the current electronic device sends the data to be inferred to the target computing device. After the target computing device receives the data to be inferred, it can perform inference processing through the loaded calculation model. In this way, the decoupling of the computing resources used for inference is realized.
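The two-phase flow described above (send model information and load the model first, then send the data and get the result back) can be sketched end to end. All class and method names are illustrative, the in-process call stands in for the network connection, and the "inference" itself is a dummy computation:

```python
# End-to-end sketch of the flow described above: the client first sends the
# model information, the server loads the indicated model with its
# acceleration resource, then the client sends the data to be inferred and
# receives the result. Names are illustrative, not from the patent.

class InferenceServer:
    def __init__(self, resource_pool):
        self._pool = resource_pool   # models available on the target device
        self._model = None           # calculation model currently loaded

    def load_model(self, model_info):
        # Phase 1: load the calculation model indicated by the model information.
        self._model = self._pool[model_info["name"]]

    def infer(self, data):
        # Phase 2: run inference on the received data with the loaded model.
        assert self._model is not None, "model must be loaded first"
        return {"model": self._model, "result": [x * 2 for x in data]}  # dummy compute

class InferenceClient:
    def __init__(self, server):
        self._server = server        # stands in for the connection to the target device

    def run(self, model_info, data):
        self._server.load_model(model_info)   # send model information first
        return self._server.infer(data)       # then send the data, get the result back

server = InferenceServer({"A": "model-A"})
client = InferenceClient(server)
out = client.run({"name": "A"}, [1, 2, 3])
```

Because the client only exchanges model information and data with the server, the computing resources used for inference are decoupled from the device where the application runs, matching the decoupling the passage describes.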
- the processor 802 is configured to execute the program 810, and specifically can execute the relevant steps in the inference method embodiment in the fifth or sixth embodiment.
- each component/step described in the embodiments of the present invention can be split into more components/steps, or two or more components/steps or partial operations of components/steps can be combined into new components/steps, to achieve the purpose of the embodiments of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Computer And Data Communications (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention relates to an inference system, an inference method, an electronic device, and a computer storage medium. The inference system comprises a first computing device and a second computing device connected to each other, the first computing device being provided with an inference client, and the second computing device comprising an inference acceleration resource and an inference server. The inference client is used to acquire model information of a calculation model used for inference as well as the data to be inferred, and to send the model information and said data, respectively, to the inference server in the second computing device. The inference server is used to load, by means of the inference acceleration resource, the calculation model indicated by the model information, to perform inference processing on said data by means of the calculation model, and to return the result of the inference processing to the inference client.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911089253.XA CN112784989B (zh) | 2019-11-08 | 2019-11-08 | 推理***、推理方法、电子设备及计算机存储介质 |
CN201911089253.X | 2019-11-08 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021088964A1 (fr) | 2021-05-14 |
Family
ID=75748575
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/127026 WO2021088964A1 (fr) | 2019-11-08 | 2020-11-06 | Système d'inférence, procédé d'inférence, dispositif électronique et support de stockage informatique |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN112784989B (fr) |
TW (1) | TW202119255A (fr) |
WO (1) | WO2021088964A1 (fr) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113344208B (zh) * | 2021-06-25 | 2023-04-07 | 中国电信股份有限公司 | 数据推理方法、装置及*** |
CN116127082A (zh) * | 2021-11-12 | 2023-05-16 | 华为技术有限公司 | 一种数据采集方法、***以及相关装置 |
TWI832279B (zh) * | 2022-06-07 | 2024-02-11 | 宏碁股份有限公司 | 人工智慧模型運算加速系統及人工智慧模型運算加速方法 |
WO2024000605A1 (fr) * | 2022-07-01 | 2024-01-04 | 北京小米移动软件有限公司 | Procédé et appareil de raisonnement de modèle d'ia |
CN114997401B (zh) * | 2022-08-03 | 2022-11-04 | 腾讯科技(深圳)有限公司 | 自适应推理加速方法、装置、计算机设备和存储介质 |
CN116402141B (zh) * | 2023-06-09 | 2023-09-05 | 太初(无锡)电子科技有限公司 | 一种模型推理方法、装置、电子设备及存储介质 |
CN116723191B (zh) * | 2023-08-07 | 2023-11-10 | 深圳鲲云信息科技有限公司 | 利用加速装置执行数据流加速计算的方法和*** |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106383835A (zh) * | 2016-08-29 | 2017-02-08 | 华东师范大学 | 一种基于形式语义推理和深度学习的自然语言知识挖掘*** |
CN109145168A (zh) * | 2018-07-11 | 2019-01-04 | 广州极天信息技术股份有限公司 | 一种专家服务机器人云平台 |
CN110199274A (zh) * | 2016-12-02 | 2019-09-03 | 微软技术许可有限责任公司 | 用于自动化查询回答生成的***和方法 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101126524B1 (ko) * | 2010-06-25 | 2012-03-22 | 국민대학교산학협력단 | 사용자 중심의 상황 인지 시스템, 이에 적합한 상황 정보 변환 방법 및 사례 기반 추론 방법 |
CN104020983A (zh) * | 2014-06-16 | 2014-09-03 | 上海大学 | 一种基于OpenCL的KNN-GPU加速方法 |
CN105808568B (zh) * | 2014-12-30 | 2020-02-14 | 华为技术有限公司 | 一种上下文分布式推理方法和装置 |
CN108171117B (zh) * | 2017-12-05 | 2019-05-21 | 南京南瑞信息通信科技有限公司 | 基于多核异构并行计算的电力人工智能视觉分析*** |
CN109902818B (zh) * | 2019-01-15 | 2021-05-25 | 中国科学院信息工程研究所 | 一种面向深度学习训练任务的分布式加速方法及*** |
- 2019
- 2019-11-08 CN CN201911089253.XA patent/CN112784989B/zh active Active
- 2020
- 2020-08-19 TW TW109128235A patent/TW202119255A/zh unknown
- 2020-11-06 WO PCT/CN2020/127026 patent/WO2021088964A1/fr active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106383835A (zh) * | 2016-08-29 | 2017-02-08 | 华东师范大学 | 一种基于形式语义推理和深度学习的自然语言知识挖掘*** |
CN110199274A (zh) * | 2016-12-02 | 2019-09-03 | 微软技术许可有限责任公司 | 用于自动化查询回答生成的***和方法 |
CN109145168A (zh) * | 2018-07-11 | 2019-01-04 | 广州极天信息技术股份有限公司 | 一种专家服务机器人云平台 |
Also Published As
Publication number | Publication date |
---|---|
TW202119255A (zh) | 2021-05-16 |
CN112784989B (zh) | 2024-05-03 |
CN112784989A (zh) | 2021-05-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021088964A1 (fr) | Système d'inférence, procédé d'inférence, dispositif électronique et support de stockage informatique | |
WO2021139177A1 (fr) | Procédé et appareil d'augmentation d'image, dispositif informatique et support d'enregistrement | |
EP3343364A1 (fr) | Procédé et appareil de virtualisation d'accélérateur, et gestionnaire de ressources centralisé | |
US11790004B2 (en) | Systems, methods, and apparatuses for providing assistant deep links to effectuate third-party dialog session transfers | |
CN110569127B (zh) | 虚拟资源转移、发送、获取方法和装置 | |
US11182210B2 (en) | Method for resource allocation and terminal device | |
CN111338808B (zh) | 一种协同计算方法及*** | |
WO2023029961A1 (fr) | Procédé et système d'exécution de tâche, dispositif électronique et support de stockage informatique | |
CN111200606A (zh) | 深度学习模型任务处理方法、***、服务器及存储介质 | |
US20240152393A1 (en) | Task execution method and apparatus | |
CN110738156A (zh) | 一种基于消息中间件的人脸识别***及方法 | |
CN111813529B (zh) | 数据处理方法、装置、电子设备及存储介质 | |
CN115550354A (zh) | 一种数据处理方法、装置及计算机可读存储介质 | |
WO2017185632A1 (fr) | Procédé de transmission de données et dispositif électronique | |
US9124702B2 (en) | Strategy pairing | |
CN113126958B (zh) | 基于信息流的决策调度定制方法和*** | |
CN114222028A (zh) | 语音识别方法、装置、计算机设备和存储介质 | |
CN113033475A (zh) | 目标对象追踪方法、相关装置及计算机程序产品 | |
CN113746754B (zh) | 一种数据传输方法、装置、设备及存储介质 | |
CN115460053B (zh) | 服务调用方法、装置及边缘计算*** | |
WO2024087844A1 (fr) | Procédé et système d'apprentissage de réseau neuronal de graphe, et procédé d'identification de compte anormal | |
WO2023206049A1 (fr) | Procédés et appareils d'exécution de service d'ia, et éléments de réseau, support de stockage et puce | |
WO2022120993A1 (fr) | Procédé et appareil d'attribution de ressources pour scénario en ligne et dispositif électronique | |
CN115699167A (zh) | 当确定是否从某些客户端设备卸载助理相关处理任务时补偿硬件差异 | |
Chatzopoulos et al. | Fides: A hidden market approach for trusted mobile ambient computing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20884822 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 20884822 Country of ref document: EP Kind code of ref document: A1 |