Cache-Aware Transformer-Based Scheduling for LLM-Driven IoT Workflows in Multi-Clouds
Zhang et al., 2025
- Document ID
- 17722529931946095430
- Author
- Zhang J
- Mashayekhy L
- Publication year
- 2025
- Publication venue
- 2025 IEEE Cloud Summit
Snippet
The integration of Large Language Models (LLMs) into Internet-of-Things (IoT) ecosystems has enabled users to issue high-level natural-language intents that are automatically translated into fine-grained, serverless workflows using protocols such as the Model Context …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/4887—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management, e.g. organising, planning, scheduling or allocating time, human or machine resources; Enterprise planning; Organisational models
- G06Q10/063—Operations research or analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110737529B (en) | Short-time multi-variable-size data job cluster scheduling adaptive configuration method | |
Braun et al. | A taxonomy for describing matching and scheduling heuristics for mixed-machine heterogeneous computing systems | |
Yu et al. | Faasrank: Learning to schedule functions in serverless platforms | |
Li et al. | Performance modeling and predictive scheduling for distributed stream data processing | |
Ding et al. | Kubernetes-oriented microservice placement with dynamic resource allocation | |
CN107038070B (en) | A reliability-aware parallel task scheduling method in cloud environment | |
Cheng et al. | Adaptive scheduling of parallel jobs in spark streaming | |
Prodan et al. | Overhead analysis of scientific workflows in grid environments | |
Zhang et al. | Combined fault tolerance and scheduling techniques for workflow applications on computational grids | |
Nguyen et al. | Monad: Self-adaptive micro-service infrastructure for heterogeneous scientific workflows | |
Razavi et al. | FA2: Fast, accurate autoscaling for serving deep learning inference with SLA guarantees | |
Han et al. | Workload-adaptive configuration tuning for hierarchical cloud schedulers | |
Ivashko et al. | A survey of desktop grid scheduling | |
Garg et al. | Fault tolerant task scheduling on computational grid using checkpointing under transient faults | |
Raman et al. | Computation of workflow scheduling using backpropagation neural network in cloud computing: a virtual machine placement approach | |
Incerto et al. | Symbolic performance adaptation | |
Mendoza et al. | Model selection for latency-critical inference serving | |
Sahu et al. | Multiobjective Prioritized Workflow Scheduling in Cloud Computing Using Cuckoo Search Algorithm | |
Wen et al. | Fast DRL-based scheduler configuration tuning for reducing tail latency in edge-cloud jobs | |
Zhang et al. | Cache-Aware Transformer-Based Scheduling for LLM-Driven IoT Workflows in Multi-Clouds | |
Jain et al. | Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Load Balancing | |
Yue et al. | Demeter: Fine-grained function orchestration for geo-distributed serverless analytics | |
Lu et al. | SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing | |
Collins et al. | Parallel and sequential job scheduling in heterogeneous clusters: A simulation study using software in the loop | |
Zafarzade et al. | Capacity planning of a microservices-based image classification application using analytic modeling |