Cache-Aware Transformer-Based Scheduling for LLM-Driven IoT Workflows in Multi-Clouds

Document ID: 17722529931946095430
Authors: Zhang J; Mashayekhy L
Publication year: 2025
Publication venue: 2025 IEEE Cloud Summit

Snippet

The integration of Large Language Models (LLMs) into Internet-of-Things (IoT) ecosystems has enabled users to issue high-level natural-language intents that are automatically translated into fine-grained, serverless workflows using protocols such as the Model Context …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/46—Multiprogramming arrangements
            - G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
              - G06F9/4806—Task transfer initiation or dispatching
                - G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
                  - G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
                    - G06F9/4887—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic
            - G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
              - G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
                - G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
                  - G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
                  - G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
              - G06F9/5061—Partitioning or combining of resources
              - G06F9/5083—Techniques for rebalancing the load in a distributed system
      - G06F11/00—Error detection; Error correction; Monitoring
        - G06F11/30—Monitoring
          - G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
            - G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
      - G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
      - G06F2209/00—Indexing scheme relating to G06F9/00
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
    - G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
      - G06Q10/00—Administration; Management
        - G06Q10/06—Resources, workflows, human or project management, e.g. organising, planning, scheduling or allocating time, human or machine resources; Enterprise planning; Organisational models
          - G06Q10/063—Operations research or analysis

Similar Documents

Publication	Publication Date	Title
CN110737529B (en)	2022-02-08	Short-time multi-variable-size data job cluster scheduling adaptive configuration method
Braun et al.	1998	A taxonomy for describing matching and scheduling heuristics for mixed-machine heterogeneous computing systems
Yu et al.	2021	FaaSRank: Learning to schedule functions in serverless platforms
Li et al.	2016	Performance modeling and predictive scheduling for distributed stream data processing
Ding et al.	2022	Kubernetes-oriented microservice placement with dynamic resource allocation
CN107038070B (en)	2021-04-16	A reliability-aware parallel task scheduling method in cloud environment
Cheng et al.	2017	Adaptive scheduling of parallel jobs in spark streaming
Prodan et al.	2008	Overhead analysis of scientific workflows in grid environments
Zhang et al.	2009	Combined fault tolerance and scheduling techniques for workflow applications on computational grids
Nguyen et al.	2017	MONAD: Self-adaptive micro-service infrastructure for heterogeneous scientific workflows
Razavi et al.	2022	FA2: Fast, accurate autoscaling for serving deep learning inference with SLA guarantees
Han et al.	2019	Workload-adaptive configuration tuning for hierarchical cloud schedulers
Ivashko et al.	2018	A survey of desktop grid scheduling
Garg et al.	2014	Fault tolerant task scheduling on computational grid using checkpointing under transient faults
Raman et al.	2021	Computation of workflow scheduling using backpropagation neural network in cloud computing: a virtual machine placement approach
Incerto et al.	2016	Symbolic performance adaptation
Mendoza et al.	2024	Model selection for latency-critical inference serving
Sahu et al.	2023	Multiobjective Prioritized Workflow Scheduling in Cloud Computing Using Cuckoo Search Algorithm
Wen et al.	2023	Fast DRL-based scheduler configuration tuning for reducing tail latency in edge-cloud jobs
Zhang et al.	2025	Cache-Aware Transformer-Based Scheduling for LLM-Driven IoT Workflows in Multi-Clouds
Jain et al.	2024	Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Load Balancing
Yue et al.	2024	Demeter: Fine-grained function orchestration for geo-distributed serverless analytics
Lu et al.	2024	SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing
Collins et al.	2001	Parallel and sequential job scheduling in heterogeneous clusters: A simulation study using software in the loop
Zafarzade et al.	2025	Capacity planning of a microservices-based image classification application using analytic modeling