CN114503084A - Parallel program scalability bottleneck detection method and computing device - Google Patents

Info

Publication number
CN114503084A
Authority
CN
China
Prior art keywords
program
vertex
performance data
performance
vertices
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202080035153.3A
Other languages
English (en)
Other versions
CN114503084B (zh)
Inventor
翟季冬
金煜阳
陈文光
郑纬民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-08-27
Filing date
2020-08-27
Publication date
2022-05-13
Application filed by Tsinghua University
Publication of CN114503084A
Application granted
Publication of CN114503084B
Status: Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F11/00 Error detection; Error correction; Monitoring
                    • G06F11/30 Monitoring
                        • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
                            • G06F11/3466 Performance evaluation by tracing or monitoring
                            • G06F11/3404 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for parallel or distributed programming
                            • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
                    • G06F11/36 Preventing errors by testing or debugging software
                • G06F8/00 Arrangements for software engineering
                    • G06F8/40 Transformation of program code
                        • G06F8/41 Compilation
                            • G06F8/43 Checking; Contextual analysis
                                • G06F8/433 Dependency analysis; Data or control flow analysis
                • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
                    • G06F2201/865 Monitoring of software
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
                • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

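A computer-implemented method for detecting scalability bottlenecks in a parallel program is provided, comprising: constructing a program structure graph from the program source code; collecting performance data by sampling while the parallel program runs, the performance data comprising hardware-counter performance data for each vertex of the program structure graph and inter-process communication dependence performance data for communication vertices; constructing a program performance graph by populating the program structure graph with the sampled performance data, the program performance graph recording the data and control dependences within each process and the communication dependences between processes; and detecting problematic vertices in the program performance graph and, starting from some or all of the problematic vertices, backtracking along the intra-process data/control dependence edges and the inter-process communication dependence edges to automatically locate the vertices where the scalability bottlenecks lie.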
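The abstract does not spell out how a vertex is judged "problematic" or when backtracking stops. The Python sketch below illustrates only the graph side of the pipeline, under loudly stated assumptions: the names `PerfVertex`, `build_perf_graph`, `is_problematic`, and `find_root_causes` are hypothetical; the static structure graph is simply replicated for every process, with communication dependences added as extra edges; a vertex is flagged when its time at a larger process count exceeds ideal strong scaling by a threshold factor; and backtracking reports a flagged vertex as a likely root cause once it has no flagged predecessor. None of these rules is taken from the patent.

```python
# Hypothetical sketch: names, the flagging rule and the stopping rule are
# assumptions for illustration, not the patented method.
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical vertex key: (process rank, vertex id in the program structure graph).
Key = Tuple[int, str]


@dataclass
class PerfVertex:
    """A vertex of the (hypothetical) program performance graph for one process."""
    # Measured time in seconds keyed by total process count, e.g. {4: 2.0, 16: 1.9}.
    time_by_nprocs: Dict[int, float] = field(default_factory=dict)
    # Predecessors via intra-process data/control edges and inter-process comm edges.
    preds: List[Key] = field(default_factory=list)


def build_perf_graph(
    structure: Dict[str, List[str]],
    nranks: int,
    comm_deps: Iterable[Tuple[Key, Key]],
    times: Dict[Key, Dict[int, float]],
) -> Dict[Key, PerfVertex]:
    """Replicate the static structure graph per rank, then add comm edges and timings."""
    graph: Dict[Key, PerfVertex] = {}
    for rank in range(nranks):
        for vid, intra_preds in structure.items():
            graph[(rank, vid)] = PerfVertex(
                time_by_nprocs=dict(times.get((rank, vid), {})),
                preds=[(rank, p) for p in intra_preds],
            )
    for src, dst in comm_deps:  # dst (e.g. a receive) depends on src (e.g. a send)
        graph[dst].preds.append(src)
    return graph


def is_problematic(v: PerfVertex, p_small: int, p_large: int, threshold: float) -> bool:
    """Assumed criterion: time at p_large exceeds ideal strong scaling by `threshold`x."""
    t_small = v.time_by_nprocs.get(p_small)
    t_large = v.time_by_nprocs.get(p_large)
    if not t_small or t_large is None:
        return False  # no usable measurement for this vertex
    ideal = t_small * p_small / p_large
    return t_large / ideal > threshold


def find_root_causes(
    graph: Dict[Key, PerfVertex], p_small: int, p_large: int, threshold: float = 1.5
) -> Set[Key]:
    """Backtrack from every problematic vertex along reverse dependence edges."""
    bad = {k for k, v in graph.items() if is_problematic(v, p_small, p_large, threshold)}
    roots: Set[Key] = set()
    for start in bad:
        seen = {start}
        queue = deque([start])
        while queue:
            k = queue.popleft()
            bad_preds = [p for p in graph[k].preds if p in bad]
            if not bad_preds:
                roots.add(k)  # nothing problematic upstream: report as a likely cause
            for p in bad_preds:
                if p not in seen:
                    seen.add(p)
                    queue.append(p)
    return roots
```

As an illustration, suppose rank 1's receive vertex slows down at 16 processes only because it waits on rank 0's send, which in turn depends on a compute vertex on rank 0 that does not speed up. All three vertices are flagged, but backtracking across the communication edge reports only `(0, "compute")`. Under this stopping rule, flagged vertices that form a dependence cycle (for example, a loop body) yield no root, so a practical tool would need an explicit policy for cycles.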

Description

PCT national-phase application; the description has been published.

Claims (17)

  1. PCT national-phase application; the claims have been published.

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/111588 WO2022041024A1 (zh) 2020-08-27 2020-08-27 Parallel program scalability bottleneck detection method and computing device

Publications (2)

Publication Number Publication Date
CN114503084A (zh) 2022-05-13
CN114503084B (zh) 2023-07-25

Family

ID=80352317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080035153.3A Active CN114503084B (zh) 2020-08-27 2020-08-27 Parallel program scalability bottleneck detection method and computing device

Country Status (3)

Country Link
US (1) US11768754B2 (zh)
CN (1) CN114503084B (zh)
WO (1) WO2022041024A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115249057A (zh) * 2021-04-26 2022-10-28 阿里巴巴新加坡控股有限公司 System for graph node sampling and computer-implemented method
US20240045662A1 (en) * 2022-08-02 2024-02-08 Nvidia Corporation Software code verification using call graphs for autonomous systems and applications

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6067415A (en) * 1995-12-26 2000-05-23 Kabushiki Kaisha Toshiba System for assisting a programmer find errors in concurrent programs
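CN101661409A (zh) * 2009-09-22 2010-03-03 清华大学 Method and system for extracting communication patterns of parallel programs
CN106294136A (zh) * 2016-07-29 2017-01-04 鄞州浙江清华长三角研究院创新中心 Online method and system for detecting performance variation during parallel program execution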
US20180039570A1 (en) * 2016-08-05 2018-02-08 International Business Machines Corporation Prioritizing resiliency tests of microservices
US20190121919A1 (en) * 2017-10-23 2019-04-25 Onespin Solutions Gmbh Method of Selecting a Prover
CN111367780A (zh) * 2020-03-30 2020-07-03 西安芯瞳半导体技术有限公司 GPU performance testing method and apparatus, and computer storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102790695B (zh) 2012-07-23 2015-03-25 华为技术有限公司 Server I/O subsystem performance bottleneck diagnosis system and method
US9691171B2 (en) 2012-08-03 2017-06-27 Dreamworks Animation Llc Visualization tool for parallel dependency graph evaluation
US10599551B2 (en) * 2016-08-12 2020-03-24 The University Of Chicago Automatically detecting distributed concurrency errors in cloud systems
US10216699B2 (en) * 2016-11-11 2019-02-26 1Qb Information Technologies Inc. Method and system for setting parameters of a discrete optimization problem embedded to an optimization solver and solving the embedded discrete optimization problem
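CN111124675B (zh) * 2019-12-11 2023-06-20 华中科技大学 Heterogeneous in-memory computing device for graph computing and operating method thereof
JP7359394B2 (ja) * 2020-03-06 2023-10-11 オムロン株式会社 Information processing apparatus and machine learning method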
US20200226124A1 (en) * 2020-03-27 2020-07-16 Intel Corporation Edge batch reordering for streaming graph analytics
CN114174765B (zh) * 2020-06-22 2023-02-21 格步计程车控股私人有限公司 Method and device for correcting errors in map data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
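刘晓平 et al.: "Performance detection and visualization of parallel programs" (并行程序性能检测及可视化), Chinese Journal of Scientific Instrument (仪器仪表学报), vol. 29, no. 09 *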

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
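CN115712580A (zh) * 2022-11-25 2023-02-24 格兰菲智能科技有限公司 Memory address allocation method and apparatus, computer device, and storage medium
CN115712580B (zh) * 2022-11-25 2024-01-30 格兰菲智能科技有限公司 Memory address allocation method and apparatus, computer device, and storage medium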

Also Published As

Publication number Publication date
WO2022041024A1 (zh) 2022-03-03
CN114503084B (zh) 2023-07-25
US20230244588A1 (en) 2023-08-03
US11768754B2 (en) 2023-09-26

Similar Documents

Publication Publication Date Title
CN114503084B (zh) Parallel program scalability bottleneck detection method and computing device
Vetter et al. Dynamic software testing of MPI applications with Umpire
US7386577B2 (en) Dynamic determination of transaction boundaries in workflow systems
Cai et al. MagicFuzzer: Scalable deadlock detection for large-scale applications
Tan et al. Visual, log-based causal tracing for performance debugging of mapreduce systems
US8578348B2 (en) System and method of cost oriented software profiling
Tallent et al. Scalable identification of load imbalance in parallel executions using call path profiles
Rosa et al. Predicting and mitigating jobs failures in big data clusters
CN108153587B (zh) Method for detecting causes of slow tasks on a big data platform
Zhai et al. Cypress: Combining static and dynamic analysis for top-down communication trace compression
Las-Casas et al. Weighted sampling of execution traces: Capturing more needles and less hay
Schulz Extracting critical path graphs from MPI applications
Bevan et al. Identification of Software Instabilities.
CN108647137A (zh) Job performance prediction method, apparatus, medium, device, and system
US20040181562A1 (en) System and method for determining deallocatable memory in a heap
US20090182994A1 (en) Two-level representative workload phase detection method, apparatus, and computer usable program code
CN105243023A (zh) Parallel runtime error detection method
Creţu-Ciocârlie et al. Hunting for problems with Artemis
Teoh et al. Perfdebug: Performance debugging of computation skew in dataflow systems
Butrovich et al. Tastes great! Less filling! High performance and accurate training data collection for self-driving database management systems
Isaacs et al. Ordering traces logically to identify lateness in message passing programs
CN110109811B (zh) Tracing method for GPU computing performance problems
WO2023143426A1 (zh) Performance analysis programming framework, method, and apparatus
Bahmani et al. Chameleon: Online clustering of mpi program traces
Goswami et al. Dynamic slicing of concurrent programs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant