US10467569B2 - Apparatus and method for scheduling distributed workflow tasks - Google Patents
- Publication number
- US10467569B2 (application US14/506,500)
- Authority
- US
- United States
- Prior art keywords
- data
- work flow
- server
- cluster
- execution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06316—Sequencing of tasks or work
Definitions
- The top-k statistic is used to determine the frequency of that value in the original data source.
- If the constant value is not included in the top-k statistic (because its frequency is less than that of the least frequent value in the statistic), the number of records in the original data source has to be chosen as the upper bound.
- If an advanced filter is applied (a function or nested functions), the system falls back to the number of input records as an upper bound.
- This graph representation is decorated by the work flow scheduler 200 with the data profiles from the data profile store 304 to provide fine-grained access to optimization opportunities at both the work flow level and the operation level. Edges are marked with data size and shape information, and vertices contain operation-specific details; combined, these allow intelligent optimization decisions to be made.
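
The filter-size bounds and the graph decoration described in the definitions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `ColumnProfile`, `filter_upper_bound`, `decorate_edge`, the vertex names, and the sample values are all hypothetical, and an advanced filter is modeled simply as `constant=None`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ColumnProfile:
    """Hypothetical data profile for one column of the original data source."""
    record_count: int                                     # total records in the source
    top_k: Dict[str, int] = field(default_factory=dict)   # value -> frequency


def filter_upper_bound(profile: ColumnProfile,
                       input_records: int,
                       constant: Optional[str]) -> int:
    """Return an upper bound on the number of records that pass a filter.

    constant=None stands in for an advanced filter (function or nested
    functions), for which no frequency can be looked up.
    """
    if constant is None:
        # Advanced filter: fall back to the number of input records.
        return input_records
    if constant in profile.top_k:
        # Constant is tracked: use its frequency in the original source.
        return profile.top_k[constant]
    # Constant is not in the top-k statistic: use the source record count.
    return profile.record_count


# Assumed graph encoding: vertices hold operation details, edges hold size estimates.
Edge = Tuple[str, str]


def decorate_edge(edges: Dict[Edge, Dict[str, int]],
                  src: str, dst: str, max_records: int) -> None:
    """Attach a record-count upper bound to the edge src -> dst."""
    edges.setdefault((src, dst), {})["max_records"] = max_records


profile = ColumnProfile(record_count=1000, top_k={"US": 400, "DE": 250})
vertices = {
    "load": {"op": "source"},
    "keep_us": {"op": "filter", "column": "country", "constant": "US"},
    "sink": {"op": "store"},
}
edges: Dict[Edge, Dict[str, int]] = {}
decorate_edge(edges, "load", "keep_us", profile.record_count)
decorate_edge(edges, "keep_us", "sink",
              filter_upper_bound(profile, profile.record_count, "US"))
print(edges)
```

A scheduler could read the `max_records` annotation on each edge when deciding how to execute the downstream operation; how those estimates map to an execution choice is outside this sketch.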
Landscapes
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Operations Research (AREA)
- Game Theory and Decision Science (AREA)
- Development Economics (AREA)
- Marketing (AREA)
- Educational Administration (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Debugging And Monitoring (AREA)
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/506,500 US10467569B2 (en) | 2014-10-03 | 2014-10-03 | Apparatus and method for scheduling distributed workflow tasks |
CN201580001459.6A CN105593818B (zh) | 2014-10-03 | 2015-09-22 | Apparatus and method for scheduling distributed workflow tasks |
SG11201601137RA SG11201601137RA (en) | 2014-10-03 | 2015-09-22 | Apparatus and method for scheduling distributed workflow tasks |
PCT/US2015/051557 WO2016053695A1 (en) | 2014-10-03 | 2015-09-22 | Apparatus and method for scheduling distributed workflow tasks |
CA2963088A CA2963088C (en) | 2014-10-03 | 2015-09-22 | Apparatus and method for scheduling distributed workflow tasks |
EP15846678.9A EP3201771A4 (de) | 2014-10-03 | 2015-09-22 | Apparatus and method for scheduling distributed workflow tasks |
HK16108789.0A HK1221027A1 (zh) | 2014-10-03 | 2016-07-21 | Apparatus and method for scheduling distributed workflow tasks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/506,500 US10467569B2 (en) | 2014-10-03 | 2014-10-03 | Apparatus and method for scheduling distributed workflow tasks |
Publications (2)
Publication Number | Publication Date |
---|---|
US20160098662A1 (en) | 2016-04-07 |
US10467569B2 (en) | 2019-11-05 |
Family
ID=55631269
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/506,500 Active 2036-12-12 US10467569B2 (en) | 2014-10-03 | 2014-10-03 | Apparatus and method for scheduling distributed workflow tasks |
Country Status (7)
Country | Link |
---|---|
US (1) | US10467569B2 (de) |
EP (1) | EP3201771A4 (de) |
CN (1) | CN105593818B (de) |
CA (1) | CA2963088C (de) |
HK (1) | HK1221027A1 (de) |
SG (1) | SG11201601137RA (de) |
WO (1) | WO2016053695A1 (de) |
Families Citing this family (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9798775B2 (en) * | 2015-01-16 | 2017-10-24 | International Business Machines Corporation | Database statistical histogram forecasting |
US10095547B1 (en) * | 2015-03-13 | 2018-10-09 | Twitter, Inc. | Stream processing at scale |
US10318491B1 (en) | 2015-03-31 | 2019-06-11 | EMC IP Holding Company LLC | Object metadata query with distributed processing systems |
US11016946B1 (en) * | 2015-03-31 | 2021-05-25 | EMC IP Holding Company LLC | Method and apparatus for processing object metadata |
US10425350B1 (en) | 2015-04-06 | 2019-09-24 | EMC IP Holding Company LLC | Distributed catalog service for data processing platform |
US10496926B2 (en) | 2015-04-06 | 2019-12-03 | EMC IP Holding Company LLC | Analytics platform for scalable distributed computations |
US10776404B2 (en) * | 2015-04-06 | 2020-09-15 | EMC IP Holding Company LLC | Scalable distributed computations utilizing multiple distinct computational frameworks |
US10511659B1 (en) * | 2015-04-06 | 2019-12-17 | EMC IP Holding Company LLC | Global benchmarking and statistical analysis at scale |
US10528875B1 (en) | 2015-04-06 | 2020-01-07 | EMC IP Holding Company LLC | Methods and apparatus implementing data model for disease monitoring, characterization and investigation |
US10791063B1 (en) | 2015-04-06 | 2020-09-29 | EMC IP Holding Company LLC | Scalable edge computing using devices with limited resources |
US10860622B1 (en) | 2015-04-06 | 2020-12-08 | EMC IP Holding Company LLC | Scalable recursive computation for pattern identification across distributed data processing nodes |
US10331380B1 (en) | 2015-04-06 | 2019-06-25 | EMC IP Holding Company LLC | Scalable distributed in-memory computation utilizing batch mode extensions |
US10366111B1 (en) * | 2015-04-06 | 2019-07-30 | EMC IP Holding Company LLC | Scalable distributed computations utilizing multiple distinct computational frameworks |
US10706970B1 (en) | 2015-04-06 | 2020-07-07 | EMC IP Holding Company LLC | Distributed data analytics |
US10505863B1 (en) | 2015-04-06 | 2019-12-10 | EMC IP Holding Company LLC | Multi-framework distributed computation |
US10812341B1 (en) | 2015-04-06 | 2020-10-20 | EMC IP Holding Company LLC | Scalable recursive computation across distributed data processing nodes |
US10015106B1 (en) | 2015-04-06 | 2018-07-03 | EMC IP Holding Company LLC | Multi-cluster distributed data processing platform |
US10509684B2 (en) | 2015-04-06 | 2019-12-17 | EMC IP Holding Company LLC | Blockchain integration for scalable distributed computations |
US10348810B1 (en) * | 2015-04-06 | 2019-07-09 | EMC IP Holding Company LLC | Scalable distributed computations utilizing multiple distinct clouds |
US10515097B2 (en) * | 2015-04-06 | 2019-12-24 | EMC IP Holding Company LLC | Analytics platform for scalable distributed computations |
US10541936B1 (en) * | 2015-04-06 | 2020-01-21 | EMC IP Holding Company LLC | Method and system for distributed analysis |
US10541938B1 (en) | 2015-04-06 | 2020-01-21 | EMC IP Holding Company LLC | Integration of distributed data processing platform with one or more distinct supporting platforms |
US10404787B1 (en) * | 2015-04-06 | 2019-09-03 | EMC IP Holding Company LLC | Scalable distributed data streaming computations across multiple data processing clusters |
CN104834561B (zh) * | 2015-04-29 | 2018-01-19 | 华为技术有限公司 | Data processing method and apparatus |
WO2017059012A1 (en) * | 2015-09-29 | 2017-04-06 | Skytree, Inc. | Exporting a transformation chain including endpoint of model for prediction |
US10033816B2 (en) * | 2015-09-30 | 2018-07-24 | Amazon Technologies, Inc. | Workflow service using state transfer |
US10013214B2 (en) * | 2015-12-29 | 2018-07-03 | International Business Machines Corporation | Adaptive caching and dynamic delay scheduling for in-memory data analytics |
US10656861B1 (en) | 2015-12-29 | 2020-05-19 | EMC IP Holding Company LLC | Scalable distributed in-memory computation |
US10949251B2 (en) * | 2016-04-01 | 2021-03-16 | Intel Corporation | System and method to accelerate reduce operations in graphics processor |
US10698954B2 (en) * | 2016-06-30 | 2020-06-30 | Facebook, Inc. | Computation platform agnostic data classification workflows |
US10592813B1 (en) * | 2016-11-29 | 2020-03-17 | EMC IP Holding Company LLC | Methods and apparatus for data operation pre-processing with probabilistic estimation of operation value |
US10374968B1 (en) * | 2016-12-30 | 2019-08-06 | EMC IP Holding Company LLC | Data-driven automation mechanism for analytics workload distribution |
US10554577B2 (en) | 2017-03-14 | 2020-02-04 | International Business Machines Corporation | Adaptive resource scheduling for data stream processing |
US10817334B1 (en) * | 2017-03-14 | 2020-10-27 | Twitter, Inc. | Real-time analysis of data streaming objects for distributed stream processing |
US10726007B2 (en) | 2017-09-26 | 2020-07-28 | Microsoft Technology Licensing, Llc | Building heavy hitter summary for query optimization |
US10671436B2 (en) | 2018-05-02 | 2020-06-02 | International Business Machines Corporation | Lazy data loading for improving memory cache hit ratio in DAG-based computational system |
US20190042308A1 (en) * | 2018-08-31 | 2019-02-07 | Intel Corporation | Technologies for providing efficient scheduling of functions |
KR102579058B1 (ko) * | 2018-09-11 | 2023-09-14 | 후아웨이 테크놀러지 컴퍼니 리미티드 | Heterogeneous scheduling for sequentially computing a DAG |
US10901797B2 (en) | 2018-11-06 | 2021-01-26 | International Business Machines Corporation | Resource allocation |
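CN110287245B (zh) * | 2019-05-15 | 2021-03-19 | 北方工业大学 | Method and *** for distributed ETL task scheduling and execution |
CN110427252B (zh) * | 2019-06-18 | 2024-03-26 | 平安银行股份有限公司 | Task scheduling method, apparatus, and storage medium based on task dependencies |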
US11269879B2 (en) * | 2020-01-13 | 2022-03-08 | Google Llc | Optimal query scheduling according to data freshness requirements |
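CN113495679B (zh) * | 2020-04-01 | 2022-10-21 | 北京大学 | Optimization method for big data storage access and processing based on non-volatile storage media |
CN111475684B (zh) * | 2020-06-29 | 2020-09-22 | 北京一流科技有限公司 | Data processing network *** and computation graph generation method thereof |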
US20220012525A1 (en) * | 2020-07-10 | 2022-01-13 | International Business Machines Corporation | Histogram generation |
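KR102465932B1 (ko) * | 2020-11-19 | 2022-11-11 | 주식회사 와이즈넛 | Cross-model integrated data processing platform that automates per-task platform selection |
CN112527387B (zh) * | 2020-11-20 | 2024-03-01 | 杭州大搜车汽车服务有限公司 | Application processing method and apparatus |
CN112529438B (zh) * | 2020-12-18 | 2023-06-09 | 平安银行股份有限公司 | Workflow processing method, apparatus, computer device, and storage medium for a distributed scheduling *** |
CN113434279A (zh) * | 2021-07-14 | 2021-09-24 | 上海浦东发展银行股份有限公司 | Task execution method, apparatus, device, and storage medium |
CN113379397B (zh) * | 2021-07-16 | 2023-09-22 | 北京华博创科科技股份有限公司 | Machine-learning-based intelligent management and scheduling *** for a cloud workflow framework |
CN114662932B (zh) * | 2022-03-24 | 2024-07-19 | 重庆邮电大学 | Node-graded scheduling method for workflow-type timed tasks |
CN118331712B (zh) * | 2024-06-12 | 2024-08-09 | 北京科杰科技有限公司 | Spark multi-task dependency scheduling method |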
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070156771A1 (en) | 2005-12-19 | 2007-07-05 | Hurley Paul T | Method, device and computer program product for determining a malicious workload pattern |
US20100107142A1 (en) | 2008-10-24 | 2010-04-29 | Microsoft Corporation | Scalability analysis for server systems |
US20110276977A1 (en) | 2010-05-07 | 2011-11-10 | Microsoft Corporation | Distributed workflow execution |
US20110321051A1 (en) | 2010-06-25 | 2011-12-29 | Ebay Inc. | Task scheduling based on dependencies and resources |
US20120290862A1 (en) | 2011-05-13 | 2012-11-15 | International Business Machines Corporation | Optimizing energy consumption utilized for workload processing in a networked computing environment |
US20130166515A1 (en) * | 2011-12-22 | 2013-06-27 | David Kung | Generating validation rules for a data report based on profiling the data report in a data processing tool |
US20130290973A1 (en) | 2011-11-21 | 2013-10-31 | Emc Corporation | Programming model for transparent parallelization of combinatorial optimization |
US20130318277A1 (en) * | 2012-05-22 | 2013-11-28 | Xockets IP, LLC | Processing structured and unstructured data using offload processors |
US20130346988A1 (en) | 2012-06-22 | 2013-12-26 | Microsoft Corporation | Parallel data computing optimization |
EP2752779A2 (de) | 2013-01-07 | 2014-07-09 | Facebook, Inc. | System and method for distributed database query engine |
US20140229221A1 (en) | 2013-02-11 | 2014-08-14 | Amazon Technologies, Inc. | Cost-minimizing task scheduler |
US20150066646A1 (en) * | 2013-08-27 | 2015-03-05 | Yahoo! Inc. | Spark satellite clusters to hadoop data stores |
US20150378696A1 (en) * | 2014-06-27 | 2015-12-31 | International Business Machines Corporation | Hybrid parallelization strategies for machine learning programs on top of mapreduce |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4165522B2 (ja) * | 2005-04-27 | 2008-10-15 | ブラザー工業株式会社 | Image reading apparatus |
JP4822154B2 (ja) * | 2005-07-29 | 2011-11-24 | 株式会社吉野工業所 | Container with in-mold label and molding method thereof |
WO2010001353A1 (en) * | 2008-07-02 | 2010-01-07 | Nxp B.V. | A multiprocessor circuit using run-time task scheduling |

2014
- 2014-10-03: US application US14/506,500, patent US10467569B2 (en), status: Active
2015
- 2015-09-22: CN application CN201580001459.6A, patent CN105593818B (zh), status: Active
- 2015-09-22: CA application CA2963088A, patent CA2963088C (en), status: Active
- 2015-09-22: SG application SG11201601137RA, patent SG11201601137RA (en), status: unknown
- 2015-09-22: EP application EP15846678.9A, patent EP3201771A4 (de), status: not active (Withdrawn)
- 2015-09-22: WO application PCT/US2015/051557, patent WO2016053695A1 (en), status: Application Filing
2016
- 2016-07-21: HK application HK16108789.0A, patent HK1221027A1 (zh), status: unknown
Non-Patent Citations (3)
Title |
---|
Extended European Search Report dated Mar. 9, 2018, for EP Application No. 15 846 678.9, filed on Sep. 22, 2015, 8 pages. |
International Search Report and Written Opinion issued to International Patent Application No. PCT/US2015/051557, dated Dec. 17, 2015, 9 pgs. |
Stoica, I. (2014). "Apache Spark and Hadoop: Working Together," located at https://databricks.com/blog/2014/01/21/spark-and-hadoop.html, 4 total pages. |
Also Published As
Publication number | Publication date |
---|---|
EP3201771A4 (de) | 2018-04-11 |
CA2963088A1 (en) | 2016-04-07 |
WO2016053695A1 (en) | 2016-04-07 |
SG11201601137RA (en) | 2016-05-30 |
HK1221027A1 (zh) | 2017-05-19 |
US20160098662A1 (en) | 2016-04-07 |
CN105593818B (zh) | 2020-09-18 |
CA2963088C (en) | 2021-10-26 |
CN105593818A (zh) | 2016-05-18 |
EP3201771A1 (de) | 2017-08-09 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10467569B2 (en) | Apparatus and method for scheduling distributed workflow tasks | |
Hernández et al. | Using machine learning to optimize parallelism in big data applications | |
Ousterhout et al. | Monotasks: Architecting for performance clarity in data analytics frameworks | |
Rodrigo et al. | Towards understanding HPC users and systems: a NERSC case study | |
Gounaris et al. | Dynamic configuration of partitioning in spark applications | |
Zhang et al. | Automated profiling and resource management of pig programs for meeting service level objectives | |
Grover et al. | Extending map-reduce for efficient predicate-based sampling | |
Tang et al. | Dynamic memory-aware scheduling in spark computing environment | |
US8458136B2 (en) | Scheduling highly parallel jobs having global interdependencies | |
Kroß et al. | Model-based performance evaluation of batch and stream applications for big data | |
Shao et al. | Stage delay scheduling: Speeding up dag-style data analytics jobs with resource interleaving | |
Guo et al. | Automatic task re-organization in MapReduce | |
Kllapi et al. | Elastic processing of analytical query workloads on iaas clouds | |
Geng et al. | A profile-based ai-assisted dynamic scheduling approach for heterogeneous architectures | |
Wang et al. | A speculative parallel decompression algorithm on apache spark | |
Sejdiu et al. | DistLODStats: Distributed computation of RDF dataset statistics | |
Boucheneb et al. | Optimal reachability in cost time Petri nets | |
Lucas Filho et al. | Investigating automatic parameter tuning for sql-on-hadoop systems | |
Malensek et al. | Using distributed analytics to enable real-time exploration of discrete event simulations | |
Bhosale et al. | Big data processing using hadoop: survey on scheduling | |
Cohen et al. | High-performance statistical modeling | |
Liang et al. | Scalable adaptive optimizations for stream-based workflows in multi-HPC-clusters and cloud infrastructures | |
Liu et al. | Multivariate modeling and two-level scheduling of analytic queries | |
Yin et al. | Performance modeling and optimization of MapReduce programs | |
Ghoshal et al. | Characterizing scientific workflows on HPC systems using logs | |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: DATAMEER, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: VOSS, PETER; NAWROCKE, KELLY; MCMANUS, MATTHEW; SIGNING DATES FROM 20141002 TO 20141003; REEL/FRAME: 033885/0730 |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY. Year of fee payment: 4 |