US10467569B2 - Apparatus and method for scheduling distributed workflow tasks - Google Patents

Apparatus and method for scheduling distributed workflow tasks

Info

Publication number
US10467569B2
Authority
US
United States
Prior art keywords
data
work flow
server
cluster
execution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US14/506,500
Other languages
English (en)
Other versions
US20160098662A1 (en)
Inventor
Peter Voss
Kelly Nawrocke
Matthew McManus
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Datameer Inc
Original Assignee
Datameer Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Datameer Inc
Assigned to DATAMEER, INC. Assignment of assignors interest (see document for details). Assignors: MCMANUS, MATTHEW; NAWROCKE, KELLY; VOSS, PETER
Priority to US14/506,500 (US10467569B2)
Priority to CA2963088A (CA2963088C)
Priority to SG11201601137RA (SG11201601137RA)
Priority to PCT/US2015/051557 (WO2016053695A1)
Priority to CN201580001459.6A (CN105593818B)
Priority to EP15846678.9A (EP3201771A4)
Publication of US20160098662A1
Priority to HK16108789.0A (HK1221027A1)
Publication of US10467569B2
Application granted
Active (current legal status)
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06316 Sequencing of tasks or work

Definitions

  • The top-k statistic is used to determine the frequency of that value in the original data source.
  • If the constant value is not included in the top-k statistic (because its frequency is less than that of the least frequent value in the statistic), the number of records of the original data source has to be chosen as an upper bound.
  • If an advanced filter is applied (a function or nested functions), the system falls back to the number of input records as an upper bound.
  • This graph representation is decorated by the work flow scheduler 200 with the data profiles from the data profile store 304 to provide fine-grained access to optimization opportunities at both the work flow level and the operation level. Edges are marked with data size and shape information, and vertices contain operation-specific details; combined, these allow intelligent optimization decisions to be made.
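
The bounding rules in the definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the names `ColumnProfile`, `Edge`, and `filter_upper_bound` are hypothetical, the filter is assumed to be an equality comparison against a constant, and the "number of input records" is treated as equal to the record count of the original data source.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional


@dataclass
class ColumnProfile:
    """Hypothetical data profile for one column of the original data source."""
    total_records: int
    # Top-k statistic: the k most frequent values mapped to their record counts.
    top_k: Dict[Hashable, int] = field(default_factory=dict)


@dataclass
class Edge:
    """Hypothetical work flow graph edge, decorated with a size estimate."""
    source: str
    target: str
    estimated_records: Optional[int] = None


def filter_upper_bound(profile: ColumnProfile,
                       constant: Hashable = None,
                       advanced: bool = False) -> int:
    """Upper bound on the number of records that pass a filter."""
    if advanced:
        # Function or nested functions: fall back to the input record count.
        return profile.total_records
    if constant in profile.top_k:
        # The top-k statistic gives the frequency of the constant directly.
        return profile.top_k[constant]
    # Constant is not in the top-k statistic: use the record count of the
    # original data source as the upper bound, as the text states.
    return profile.total_records


profile = ColumnProfile(total_records=1000,
                        top_k={"US": 400, "DE": 250, "FR": 100})

# Decorate the edge leaving a filter vertex with the estimated data size.
edge = Edge(source="filter(country == 'DE')", target="group_by")
edge.estimated_records = filter_upper_bound(profile, constant="DE")
```

In this sketch the edge leaving the filter carries the estimate, which is the kind of size information a scheduler could consult when deciding how to execute the downstream operation.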

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)
US14/506,500 2014-10-03 2014-10-03 Apparatus and method for scheduling distributed workflow tasks Active 2036-12-12 US10467569B2 (en)

Priority Applications (7)

Application Number Publication Number Priority Date Filing Date Title
US14/506,500 US10467569B2 (en) 2014-10-03 2014-10-03 Apparatus and method for scheduling distributed workflow tasks
CN201580001459.6A CN105593818B (zh) 2014-10-03 2015-09-22 Apparatus and method for scheduling distributed workflow tasks
SG11201601137RA SG11201601137RA (en) 2014-10-03 2015-09-22 Apparatus and method for scheduling distributed workflow tasks
PCT/US2015/051557 WO2016053695A1 (en) 2014-10-03 2015-09-22 Apparatus and method for scheduling distributed workflow tasks
CA2963088A CA2963088C (en) 2014-10-03 2015-09-22 Apparatus and method for scheduling distributed workflow tasks
EP15846678.9A EP3201771A4 (de) 2014-10-03 2015-09-22 Apparatus and method for scheduling distributed workflow tasks
HK16108789.0A HK1221027A1 (zh) 2014-10-03 2016-07-21 Apparatus and method for scheduling distributed workflow tasks

Applications Claiming Priority (1)

Application Number Publication Number Priority Date Filing Date Title
US14/506,500 US10467569B2 (en) 2014-10-03 2014-10-03 Apparatus and method for scheduling distributed workflow tasks

Publications (2)

Publication Number Publication Date
US20160098662A1 (en) 2016-04-07
US10467569B2 (en) 2019-11-05

Family

ID=55631269

Family Applications (1)

Application Number Status (Expiration) Publication Number Priority Date Filing Date Title
US14/506,500 Active 2036-12-12 US10467569B2 (en) 2014-10-03 2014-10-03 Apparatus and method for scheduling distributed workflow tasks

Country Status (7)

Country Link
US (1) US10467569B2 (de)
EP (1) EP3201771A4 (de)
CN (1) CN105593818B (de)
CA (1) CA2963088C (de)
HK (1) HK1221027A1 (de)
SG (1) SG11201601137RA (de)
WO (1) WO2016053695A1 (de)

Families Citing this family (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9798775B2 (en) * 2015-01-16 2017-10-24 International Business Machines Corporation Database statistical histogram forecasting
US10095547B1 (en) * 2015-03-13 2018-10-09 Twitter, Inc. Stream processing at scale
US10318491B1 (en) 2015-03-31 2019-06-11 EMC IP Holding Company LLC Object metadata query with distributed processing systems
US11016946B1 (en) * 2015-03-31 2021-05-25 EMC IP Holding Company LLC Method and apparatus for processing object metadata
US10425350B1 (en) 2015-04-06 2019-09-24 EMC IP Holding Company LLC Distributed catalog service for data processing platform
US10496926B2 (en) 2015-04-06 2019-12-03 EMC IP Holding Company LLC Analytics platform for scalable distributed computations
US10776404B2 (en) * 2015-04-06 2020-09-15 EMC IP Holding Company LLC Scalable distributed computations utilizing multiple distinct computational frameworks
US10511659B1 (en) * 2015-04-06 2019-12-17 EMC IP Holding Company LLC Global benchmarking and statistical analysis at scale
US10528875B1 (en) 2015-04-06 2020-01-07 EMC IP Holding Company LLC Methods and apparatus implementing data model for disease monitoring, characterization and investigation
US10791063B1 (en) 2015-04-06 2020-09-29 EMC IP Holding Company LLC Scalable edge computing using devices with limited resources
US10860622B1 (en) 2015-04-06 2020-12-08 EMC IP Holding Company LLC Scalable recursive computation for pattern identification across distributed data processing nodes
US10331380B1 (en) 2015-04-06 2019-06-25 EMC IP Holding Company LLC Scalable distributed in-memory computation utilizing batch mode extensions
US10366111B1 (en) * 2015-04-06 2019-07-30 EMC IP Holding Company LLC Scalable distributed computations utilizing multiple distinct computational frameworks
US10706970B1 (en) 2015-04-06 2020-07-07 EMC IP Holding Company LLC Distributed data analytics
US10505863B1 (en) 2015-04-06 2019-12-10 EMC IP Holding Company LLC Multi-framework distributed computation
US10812341B1 (en) 2015-04-06 2020-10-20 EMC IP Holding Company LLC Scalable recursive computation across distributed data processing nodes
US10015106B1 (en) 2015-04-06 2018-07-03 EMC IP Holding Company LLC Multi-cluster distributed data processing platform
US10509684B2 (en) 2015-04-06 2019-12-17 EMC IP Holding Company LLC Blockchain integration for scalable distributed computations
US10348810B1 (en) * 2015-04-06 2019-07-09 EMC IP Holding Company LLC Scalable distributed computations utilizing multiple distinct clouds
US10515097B2 (en) * 2015-04-06 2019-12-24 EMC IP Holding Company LLC Analytics platform for scalable distributed computations
US10541936B1 (en) * 2015-04-06 2020-01-21 EMC IP Holding Company LLC Method and system for distributed analysis
US10541938B1 (en) 2015-04-06 2020-01-21 EMC IP Holding Company LLC Integration of distributed data processing platform with one or more distinct supporting platforms
US10404787B1 (en) * 2015-04-06 2019-09-03 EMC IP Holding Company LLC Scalable distributed data streaming computations across multiple data processing clusters
CN104834561B (zh) * 2015-04-29 2018-01-19 华为技术有限公司 Data processing method and apparatus
WO2017059012A1 (en) * 2015-09-29 2017-04-06 Skytree, Inc. Exporting a transformation chain including endpoint of model for prediction
US10033816B2 (en) * 2015-09-30 2018-07-24 Amazon Technologies, Inc. Workflow service using state transfer
US10013214B2 (en) * 2015-12-29 2018-07-03 International Business Machines Corporation Adaptive caching and dynamic delay scheduling for in-memory data analytics
US10656861B1 (en) 2015-12-29 2020-05-19 EMC IP Holding Company LLC Scalable distributed in-memory computation
US10949251B2 (en) * 2016-04-01 2021-03-16 Intel Corporation System and method to accelerate reduce operations in graphics processor
US10698954B2 (en) * 2016-06-30 2020-06-30 Facebook, Inc. Computation platform agnostic data classification workflows
US10592813B1 (en) * 2016-11-29 2020-03-17 EMC IP Holding Company LLC Methods and apparatus for data operation pre-processing with probabilistic estimation of operation value
US10374968B1 (en) * 2016-12-30 2019-08-06 EMC IP Holding Company LLC Data-driven automation mechanism for analytics workload distribution
US10554577B2 (en) 2017-03-14 2020-02-04 International Business Machines Corporation Adaptive resource scheduling for data stream processing
US10817334B1 (en) * 2017-03-14 2020-10-27 Twitter, Inc. Real-time analysis of data streaming objects for distributed stream processing
US10726007B2 (en) 2017-09-26 2020-07-28 Microsoft Technology Licensing, Llc Building heavy hitter summary for query optimization
US10671436B2 (en) 2018-05-02 2020-06-02 International Business Machines Corporation Lazy data loading for improving memory cache hit ratio in DAG-based computational system
US20190042308A1 (en) * 2018-08-31 2019-02-07 Intel Corporation Technologies for providing efficient scheduling of functions
KR102579058B1 (ko) * 2018-09-11 2023-09-14 후아웨이 테크놀러지 컴퍼니 리미티드 Heterogeneous scheduling for sequentially computing a DAG
US10901797B2 (en) 2018-11-06 2021-01-26 International Business Machines Corporation Resource allocation
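CN110287245B (zh) * 2019-05-15 2021-03-19 北方工业大学 Method and system for scheduling and executing distributed ETL tasks
CN110427252B (zh) * 2019-06-18 2024-03-26 平安银行股份有限公司 Task scheduling method, apparatus, and storage medium based on task dependencies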
US11269879B2 (en) * 2020-01-13 2022-03-08 Google Llc Optimal query scheduling according to data freshness requirements
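CN113495679B (zh) * 2020-04-01 2022-10-21 北京大学 Optimization method for big data storage, access, and processing based on non-volatile storage media
CN111475684B (zh) * 2020-06-29 2020-09-22 北京一流科技有限公司 Data processing network system and computation graph generation method thereof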
US20220012525A1 (en) * 2020-07-10 2022-01-13 International Business Machines Corporation Histogram generation
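KR102465932B1 (ko) * 2020-11-19 2022-11-11 주식회사 와이즈넛 Cross-model integrated data processing platform that automates per-task platform selection
CN112527387B (zh) * 2020-11-20 2024-03-01 杭州大搜车汽车服务有限公司 Application processing method and apparatus
CN112529438B (zh) * 2020-12-18 2023-06-09 平安银行股份有限公司 Workflow processing method, apparatus, computer device, and storage medium for a distributed scheduling system
CN113434279A (zh) * 2021-07-14 2021-09-24 上海浦东发展银行股份有限公司 Task execution method, apparatus, device, and storage medium
CN113379397B (zh) * 2021-07-16 2023-09-22 北京华博创科科技股份有限公司 Machine-learning-based intelligent management and scheduling system for a cloud workflow framework
CN114662932B (zh) * 2022-03-24 2024-07-19 重庆邮电大学 Node-hierarchical scheduling method for workflow-type timed tasks
CN118331712B (zh) * 2024-06-12 2024-08-09 北京科杰科技有限公司 Spark multi-task dependency scheduling method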

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070156771A1 (en) 2005-12-19 2007-07-05 Hurley Paul T Method, device and computer program product for determining a malicious workload pattern
US20100107142A1 (en) 2008-10-24 2010-04-29 Microsoft Corporation Scalability analysis for server systems
US20110276977A1 (en) 2010-05-07 2011-11-10 Microsoft Corporation Distributed workflow execution
US20110321051A1 (en) 2010-06-25 2011-12-29 Ebay Inc. Task scheduling based on dependencies and resources
US20120290862A1 (en) 2011-05-13 2012-11-15 International Business Machines Corporation Optimizing energy consumption utilized for workload processing in a networked computing environment
US20130166515A1 (en) * 2011-12-22 2013-06-27 David Kung Generating validation rules for a data report based on profiling the data report in a data processing tool
US20130290973A1 (en) 2011-11-21 2013-10-31 Emc Corporation Programming model for transparent parallelization of combinatorial optimization
US20130318277A1 (en) * 2012-05-22 2013-11-28 Xockets IP, LLC Processing structured and unstructured data using offload processors
US20130346988A1 (en) 2012-06-22 2013-12-26 Microsoft Corporation Parallel data computing optimization
EP2752779A2 (de) 2013-01-07 2014-07-09 Facebook, Inc. System and method for distributed database query engine
US20140229221A1 (en) 2013-02-11 2014-08-14 Amazon Technologies, Inc. Cost-minimizing task scheduler
US20150066646A1 (en) * 2013-08-27 2015-03-05 Yahoo! Inc. Spark satellite clusters to hadoop data stores
US20150378696A1 (en) * 2014-06-27 2015-12-31 International Business Machines Corporation Hybrid parallelization strategies for machine learning programs on top of mapreduce

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4165522B2 (ja) * 2005-04-27 2008-10-15 ブラザー工業株式会社 Image reading device
JP4822154B2 (ja) * 2005-07-29 2011-11-24 株式会社吉野工業所 Container with in-mold label and molding method thereof
WO2010001353A1 (en) * 2008-07-02 2010-01-07 Nxp B.V. A multiprocessor circuit using run-time task scheduling

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Extended European Search Report dated Mar. 9, 2018, for EP Application No. 15 846 678.9, filed on Sep. 22, 2015, 8 pages.
International Search Report and Written Opinion issued to International Patent Application No. PCT/US2015/051557, dated Dec. 17, 2015, 9 pgs.
Stoica, I. (2014). "Apache Spark and Hadoop: Working Together," located at https://databricks.com/blog/2014/01/21/spark-and-hadoop.html, 4 total pages.

Also Published As

Publication number Publication date
EP3201771A4 (de) 2018-04-11
CA2963088A1 (en) 2016-04-07
WO2016053695A1 (en) 2016-04-07
SG11201601137RA (en) 2016-05-30
HK1221027A1 (zh) 2017-05-19
US20160098662A1 (en) 2016-04-07
CN105593818B (zh) 2020-09-18
CA2963088C (en) 2021-10-26
CN105593818A (zh) 2016-05-18
EP3201771A1 (de) 2017-08-09

Similar Documents

Publication Publication Date Title
US10467569B2 (en) Apparatus and method for scheduling distributed workflow tasks
Hernández et al. Using machine learning to optimize parallelism in big data applications
Ousterhout et al. Monotasks: Architecting for performance clarity in data analytics frameworks
Rodrigo et al. Towards understanding HPC users and systems: a NERSC case study
Gounaris et al. Dynamic configuration of partitioning in spark applications
Zhang et al. Automated profiling and resource management of pig programs for meeting service level objectives
Grover et al. Extending map-reduce for efficient predicate-based sampling
Tang et al. Dynamic memory-aware scheduling in spark computing environment
US8458136B2 (en) Scheduling highly parallel jobs having global interdependencies
Kroß et al. Model-based performance evaluation of batch and stream applications for big data
Shao et al. Stage delay scheduling: Speeding up dag-style data analytics jobs with resource interleaving
Guo et al. Automatic task re-organization in MapReduce
Kllapi et al. Elastic processing of analytical query workloads on iaas clouds
Geng et al. A profile-based ai-assisted dynamic scheduling approach for heterogeneous architectures
Wang et al. A speculative parallel decompression algorithm on apache spark
Sejdiu et al. DistLODStats: Distributed computation of RDF dataset statistics
Boucheneb et al. Optimal reachability in cost time Petri nets
Lucas Filho et al. Investigating automatic parameter tuning for sql-on-hadoop systems
Malensek et al. Using distributed analytics to enable real-time exploration of discrete event simulations
Bhosale et al. Big data processing using hadoop: survey on scheduling
Cohen et al. High-performance statistical modeling
Liang et al. Scalable adaptive optimizations for stream-based workflows in multi-HPC-clusters and cloud infrastructures
Liu et al. Multivariate modeling and two-level scheduling of analytic queries
Yin et al. Performance modeling and optimization of MapReduce programs
Ghoshal et al. Characterizing scientific workflows on HPC systems using logs

Legal Events

Date Code Title Description
AS Assignment

Owner name: DATAMEER, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VOSS, PETER;NAWROCKE, KELLY;MCMANUS, MATTHEW;SIGNING DATES FROM 20141002 TO 20141003;REEL/FRAME:033885/0730

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 4