Fu et al., 2024 - Google Patents

Improving data locality of tasks by executor allocation in Spark computing environment

Document ID: 14719705551490982756
Author: Fu Z; He M; Yi Y; Tang Z
Publication year: 2024
Publication venue: IEEE Transactions on Cloud Computing

Snippet

Data locality is crucial for distributed systems (e.g., Spark and Hadoop) that process Big Data. Most existing research optimizes data locality through task scheduling. However, as the execution container of Spark's tasks, the executor …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network-specific arrangements or communication protocols supporting networked applications
- H04L67/10—Network-specific arrangements or communication protocols supporting networked applications in which an application is distributed across nodes in the network

Similar Documents

Publication	Publication Date	Title
Fu et al.	2020	An optimal locality-aware task scheduling algorithm based on bipartite graph modelling for Spark applications
Pakize	2014	A comprehensive view of Hadoop MapReduce scheduling algorithms
Fu et al.	2024	Improving data locality of tasks by executor allocation in Spark computing environment
Gandomi et al.	2019	HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework
Maleki et al.	2019	MapReduce: an infrastructure review and research insights
Senthilkumar et al.	2016	A survey on job scheduling in big data
Javanmardi et al.	2021	A unit-based, cost-efficient scheduler for heterogeneous Hadoop systems
Li et al.	2014	MapReduce delay scheduling with deadline constraint
Maleki et al.	2020	TMaR: a two-stage MapReduce scheduler for heterogeneous environments
Ji et al.	2017	Adaptive workflow scheduling for diverse objectives in cloud environments
Hadjar et al.	2019	A new approach for scheduling tasks and/or jobs in big data cluster
Aarthee et al.	2023	Energy-aware heuristic scheduling using bin packing MapReduce scheduler for heterogeneous workloads performance in big data
Idris et al.	2015	Context‐aware scheduling in MapReduce: a compact review
Ghazali et al.	2022	CLQLMRS: improving cache locality in MapReduce job scheduling using Q-learning
Perwej	2018	The ambient scrutinize of scheduling algorithms in big data territory
SM et al.	2014	Priority based resource allocation and demand based pricing model in peer-to-peer clouds
Fu et al.	2022	Load balancing algorithms for Hadoop cluster in unbalanced environment
Lin et al.	2014	Impact of MapReduce policies on job completion reliability and job energy consumption
Li et al.	2018	Migration-based online CPSCN big data analysis in data centers
Fu et al.	2023	Optimizing data locality by executor allocation in Spark computing environment
Xu et al.	2022	Multi resource scheduling with task cloning in heterogeneous clusters
Pandey et al.	2018	An energy-efficient greedy MapReduce scheduler for heterogeneous Hadoop YARN cluster
Chen et al.	2017	A real-time scheduling strategy based on processing framework of Hadoop
He et al.	2025	Design and implementation of fully distributed heterogeneous resource management system
Dadamis et al.	2025	IgNITE: Scheduling pipeline-parallel DNN training jobs on heterogeneous infrastructures