Disclosure of Invention
The invention mainly aims to provide a scheduling optimization method, a scheduling optimization system, a scheduling optimization terminal and a computer readable storage medium based on a job shop scheduling problem in an industrial chain, and aims to solve the problem that a scheduling optimization scheme generated in the prior art is difficult to adapt to specific problems and conditions, so that the accuracy of a finally generated scheduling scheme is low.
In order to achieve the above object, the present invention provides a scheduling optimization method based on job shop scheduling problem in an industrial chain, the scheduling optimization method based on job shop scheduling problem in the industrial chain includes the following steps:
acquiring industry chain information of a target industry chain, and preprocessing the industry chain information to obtain JSSP problems corresponding to the target industry chain;
generating a first extraction graph according to the JSSP problems, and extracting features of the first extraction graph according to a graph neural network which is trained in advance to obtain a first state corresponding to the first extraction graph;
inputting the first extraction graph and the first state into a pre-trained decision network, wherein the decision network performs action selection according to the first state to obtain a first selection result, and updates the first extraction graph according to the first selection result to obtain a second extraction graph;
And performing feature extraction on the second extracted graph according to the graph neural network to obtain a second state corresponding to the second extracted graph, inputting the second extracted graph and the second state into the decision network, performing action selection by the decision network according to the second state to obtain a second selection result, updating the second extracted graph according to the second selection result to obtain a third extracted graph, … … until all action selections are completed, and generating a scheduling optimization scheme aiming at the target industrial chain.
Optionally, in the scheduling optimization method based on job shop scheduling problem in the industrial chain, the acquiring industrial chain information of the target industrial chain, preprocessing the industrial chain information to obtain JSSP problems corresponding to the target industrial chain specifically includes:
Acquiring industry chain information corresponding to a target industry chain, wherein the industry chain information comprises tasks, resources and processing time;
Taking all tasks as jobs, taking all resources as machines, and acquiring process relationships and resource limitations among all jobs;
and generating JSSP problems corresponding to the target industrial chain according to all the jobs, all the resources, all the process relationships and all the resource limitations.
Optionally, in the scheduling optimization method based on job shop scheduling problem in the industry chain, the generating a first extraction graph according to the JSSP problem, and extracting features of the first extraction graph according to a graph neural network that is trained in advance, so as to obtain a first state corresponding to the first extraction graph specifically includes:
Taking the operation of all the jobs in the JSSP problems as different nodes, and distributing corresponding machines for each node according to all the resource limitations to obtain a disjunctive node corresponding to each node;
constructing a directed acyclic graph according to all the extraction nodes, the resource limitations and the process relationships, and taking the directed acyclic graph as a first extraction graph;
Inputting the first extraction graph into a pre-trained graph neural network, and extracting features of the first extraction graph by the graph neural network to obtain a first state of the first extraction graph.
Optionally, the scheduling optimization method based on job shop scheduling problem in the industrial chain, wherein the inputting the first extracted graph into a pre-trained graph neural network, the graph neural network performs feature extraction on the first extracted graph to obtain a first state of the first extracted graph, specifically includes:
The graph neural network respectively acquires adjacent disjunct node sets corresponding to each disjunct node, wherein each adjacent disjunct node set comprises all adjacent disjunct nodes of the corresponding disjunct node;
Respectively obtaining feature vectors of all adjacent disjunct nodes in each adjacent disjunct node set, and respectively carrying out aggregation operation on the feature vectors corresponding to each adjacent disjunct node set to obtain adjacent feature vectors corresponding to each adjacent disjunct node set;
Taking the adjacent feature vector of each adjacent disjunct node set as the disjunct feature vector of the disjunct node corresponding to each adjacent disjunct node set;
And carrying out pooling operation on all the extracted feature vectors to obtain a first global feature vector of the first extracted graph, and taking the first global feature vector as a first state of the first extracted graph.
Optionally, the scheduling optimization method based on job shop scheduling problem in an industrial chain, wherein the inputting the first extraction graph and the first state into a pre-trained decision network, the decision network performs action selection according to the first state to obtain a first selection result, and updates the first extraction graph according to the first selection result to obtain a second extraction graph, includes:
Inputting the first extraction graph and the first state into a pre-trained decision network;
The decision network performs action selection according to the first state, selects any extraction node connected with the starting node of the first extraction graph as a first target node, and takes the first target node as a first selection result;
And updating the extraction arc direction of the first extraction graph according to the first selection result, and generating a second extraction graph.
Optionally, in the scheduling optimization method based on job shop scheduling problem in the industrial chain, the feature extraction is performed on the second extraction graph according to the graph neural network to obtain a second state corresponding to the second extraction graph, the second extraction graph and the second state are input into the decision network, and the decision network performs action selection according to the second state to obtain a second selection result, which specifically includes:
The graph neural network respectively acquires adjacent feature vectors of adjacent disjunct node sets corresponding to each disjunct node in the second disjunct graph, respectively serves as disjunct feature vectors corresponding to each disjunct node in the second disjunct graph, and performs pooling operation according to all disjunct feature vectors of the second disjunct graph to obtain a second state of the second disjunct graph;
Inputting the second extraction graph and the second state into the decision network;
the decision network performs action selection according to the second state, and obtains disjunctive nodes connected with the starting node and disjunctive nodes connected with the first target node except the first target node according to the process relation to serve as a plurality of second target nodes to be selected;
And acquiring the processing time corresponding to each second target node to be selected, taking the second target node to be selected with the shortest processing time as a second target node, and taking the second target node as a second selection result.
Optionally, the method for optimizing scheduling based on job shop scheduling problem in an industrial chain, wherein updating the second extraction graph according to the second selection result to obtain a third extraction graph, … …, until all action selections are completed, generates a scheduling optimization scheme for the target industrial chain, specifically includes:
updating the extraction arc direction of the second extraction graph according to the second selection result to generate a third extraction graph;
Performing feature extraction on the third extraction graph according to the graph neural network to obtain a third state corresponding to the third extraction graph, and inputting the third extraction graph and the third state into the decision network;
the decision network performs action selection according to the third state to obtain a third selection result, and updates the extraction arc direction of the third extraction graph according to the third selection result to generate a fourth extraction graph;
performing feature extraction on the fourth extraction graph according to the graph neural network to obtain a fourth state corresponding to the fourth extraction graph, inputting the fourth extraction graph and the third state into the decision network, … … until action selection of all extraction nodes in the first extraction graph is completed, and obtaining a final extraction graph;
And obtaining all extraction arc directions in the final extraction graph, and generating a scheduling optimization scheme aiming at the target industrial chain according to the process relation and the extraction arc directions.
In addition, in order to achieve the above object, the present invention further provides a scheduling optimization system based on a job shop scheduling problem in an industrial chain, wherein the scheduling optimization system based on the job shop scheduling problem in the industrial chain includes:
the problem construction module is used for acquiring the industry chain information of the target industry chain, preprocessing the industry chain information and obtaining JSSP problems corresponding to the target industry chain;
The feature extraction module is used for generating a first extraction graph according to the JSSP problems, and carrying out feature extraction on the first extraction graph according to a graph neural network which is trained in advance to obtain a first state corresponding to the first extraction graph;
The extraction graph updating module is used for inputting the first extraction graph and the first state into a pre-trained decision network, performing action selection by the decision network according to the first state to obtain a first selection result, and updating the first extraction graph according to the first selection result to obtain a second extraction graph;
The scheme generating module is configured to perform feature extraction on the second extracted graph according to the graph neural network to obtain a second state corresponding to the second extracted graph, input the second extracted graph and the second state into the decision network, perform action selection according to the second state to obtain a second selection result, update the second extracted graph according to the second selection result to obtain a third extracted graph, … … until all action selections are completed, and generate a scheduling optimization scheme for the target industrial chain.
In addition, to achieve the above object, the present invention also provides a terminal, wherein the terminal includes: the system comprises a memory, a processor and a scheduling optimization program which is stored in the memory and can run on the processor and is based on the scheduling problem of the job shop in the industry chain, wherein the scheduling optimization program based on the scheduling problem of the job shop in the industry chain realizes the steps of the scheduling optimization method based on the scheduling problem of the job shop in the industry chain when being executed by the processor.
In addition, in order to achieve the above object, the present invention also provides a computer readable storage medium storing a scheduling optimization program based on a job shop scheduling problem in an industrial chain, which when executed by a processor, implements the steps of the scheduling optimization method based on a job shop scheduling problem in an industrial chain as described above.
In the invention, the industrial chain information of a target industrial chain is acquired, and the industrial chain information is preprocessed to obtain JSSP problems corresponding to the target industrial chain; generating a first extraction graph according to the JSSP problems, and extracting features of the first extraction graph according to a graph neural network which is trained in advance to obtain a first state corresponding to the first extraction graph; inputting the first extraction graph and the first state into a pre-trained decision network, wherein the decision network performs action selection according to the first state to obtain a first selection result, and updates the first extraction graph according to the first selection result to obtain a second extraction graph; and performing feature extraction on the second extracted graph according to the graph neural network to obtain a second state corresponding to the second extracted graph, inputting the second extracted graph and the second state into the decision network, performing action selection by the decision network according to the second state to obtain a second selection result, updating the second extracted graph according to the second selection result to obtain a third extracted graph, … … until all action selections are completed, and generating a scheduling optimization scheme aiming at the target industrial chain. The invention can solve JSSP problems in a short time and obtain satisfactory finishing time, and also ensures the accuracy of the generated scheduling optimization scheme.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clear and clear, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The method for optimizing the scheduling of the job shop scheduling problem in the industrial chain according to the preferred embodiment of the present invention, as shown in fig. 1 and 2, comprises the following steps:
step S10, acquiring industry chain information of a target industry chain, and preprocessing the industry chain information to obtain JSSP problems corresponding to the target industry chain.
Specifically, the pretreatment of the industrial chain information in the target industrial chain is needed to form a corresponding JSSP problem, such as a production plan of material distribution in the industrial chain, a sequence plan of auto parts assembly, a textile processing sequence, and steps of mixing, granulating, tabletting and the like in coordination in pharmaceutical production to meet the production target and the like, and a data structure conforming to the JSSP problem (namely, job+machine is extracted from the industrial chain information of the industrial chain to meet a plurality of jobs formed by operations constrained by multiple sequences, and each operation is limited by resource constraint of a Machine and the like, and the situation that different operations need to use the same Machine and cannot be used simultaneously occurs.
Further, the acquiring the industry chain information of the target industry chain, and preprocessing the industry chain information to obtain JSSP problems corresponding to the target industry chain specifically includes:
Acquiring industry chain information corresponding to a target industry chain, wherein the industry chain information comprises tasks, resources and processing time; taking all tasks as jobs, taking all resources as machines, and acquiring process relationships and resource limitations among all jobs; and generating JSSP problems corresponding to the target industrial chain according to all the jobs, all the resources, all the process relationships and all the resource limitations.
Specifically, when the scheduling problem in the target industry chain is converted into JSSP problem, firstly, the industry chain information corresponding to the target industry chain needs to be acquired, the industry chain information includes tasks, resources and processing time, then the tasks in the target industry chain are abstracted into jobs (Job) and resources are abstracted into machines (Machine), the process relationships among all the jobs are acquired (the relationships such as operation sequence are extracted according to expert experience and the like), the resource limitations of all the jobs are acquired (the limits that the machines are required by a plurality of operations but cannot be met at the same time), and then the representation adapted to JSSP problem is formed according to all the jobs, all the resources, all the process relationships and all the resource limitations.
In addition, it is also necessary to determine the objective to be optimized (the optimization objective of JSSP problem is to minimize the completion time, which is reflected in the mapped problem is to minimize the scheduling time or minimize the resource consumption, specifically, as shown in fig. 2, the scheduling resource element is extracted first and then the optimization objective is determined, the scheduling resource element is referred to as the resource element of the machine), and the optimization objective is converted into the objective function in JSSP (i.e. Reward function, the difference between the minimum completion time of t+1 step and the minimum completion time of t step in JSSP problem, t iteration is subtracted, and the Reward function can obtain a higher Reward value under the current JSSP problem is improved).
Note that JSSP constraint in fig. 2 means that the target industry chain can be mapped into the form of JSSP problem, and if not, the simplification process is performed; the simplification problem in fig. 2, i.e., the elimination of some of the interference conditions in the target industry chain, is similar to the denoising in the optimization problem, if the problem cannot be simplified, it is indicated that such an industry chain problem is not suitable for modeling as JSSP problem, and the related process flow is ended.
In the present invention, the data of the actual industry chain problem is instantiated as an instance of JSSP problem, including specific job, resource and time information. Finally, a proper JSSP solving algorithm, such as a heuristic rule, an accurate solving algorithm or a meta-heuristic algorithm, is selected to solve the JSSP problem after transformation, and the efficiency and the accuracy of problem solving are improved. The transformation can refer to the research and algorithm in JSSP fields, and provides a more systematic and effective solution to the scheduling problem in the industry chain.
And step S20, generating a first extraction graph according to the JSSP problems, and extracting features of the first extraction graph according to a pre-trained graph neural network to obtain a first state corresponding to the first extraction graph.
Specifically, in a preferred embodiment of the present invention, the JSSP problem is modeled as a first extraction graph, and then the first extraction graph is input into a pre-trained Graph Neural Network (GNN), in which all extraction nodes in the first extraction graph are embedded into the GNN using an embedding technique to form a record of global information, that is, a first state corresponding to the first extraction graph.
Further, the generating a first extraction graph according to the JSSP problem, extracting features of the first extraction graph according to a pre-trained graph neural network, and obtaining a first state corresponding to the first extraction graph specifically includes:
Taking the operation of all the jobs in the JSSP problems as different nodes, and distributing corresponding machines for each node according to all the resource limitations to obtain a disjunctive node corresponding to each node; constructing a directed acyclic graph according to all the extraction nodes, the resource limitations and the process relationships, and taking the directed acyclic graph as a first extraction graph; inputting the first extraction graph into a pre-trained graph neural network, and extracting features of the first extraction graph by the graph neural network to obtain a first state of the first extraction graph.
Specifically, different operations corresponding to all jobs in the JSSP problem are obtained first, all the operations are used as different nodes, corresponding machines are distributed to each node according to resource limitation, so that extracted nodes corresponding to each node are obtained, and a directed acyclic graph is built according to process relations and resource limitation among the extracted nodes to serve as a first extracted graph.
As an example, as shown in fig. 3, fig. 3 is a first extraction graph established for a first JSSP problem of 3*3 (JSSP problem of 3Job and 3Machine scale), where S and E represent virtual nodes that start and end, O 11、O12 and O 13 represent three operations corresponding to Job1 (Job 1), O 21、O22 and O 23 represent three operations corresponding to Job2 (Job 2), O 31、O32 and O 33 represent three operations corresponding to Job3 (Job 3), a circle represents Machine 1 (M1) when performing an operation, a square represents Machine 2 (M2) when performing an operation, a triangle represents Machine 3 (M3) when performing an operation, an arrow represents a process relationship (i.e., constraint conditions are determined by extracting constraint resource elements of JSSP problem, as shown in fig. 2 in particular), a dotted line connection represents different operations (dotted line represents an extraction arc) requiring the same Machine to be completed, and a solution process of the first extraction graph is a process for directional processing of the dotted line; after the processing is completed, a final extraction graph (as shown in fig. 4) is obtained, and all extraction arcs in the final extraction graph have definite directions.
And then inputting the first extracted graph into a pre-trained graph neural network, wherein the graph neural network performs feature extraction processing on the first extracted graph, so that a first state corresponding to the first extracted graph can be obtained.
Further, the inputting the first extracted graph into a pre-trained graph neural network, where the graph neural network performs feature extraction on the first extracted graph to obtain a first state of the first extracted graph, specifically includes:
the graph neural network respectively acquires adjacent disjunct node sets corresponding to each disjunct node, wherein each adjacent disjunct node set comprises all adjacent disjunct nodes of the corresponding disjunct node; respectively obtaining feature vectors of all adjacent disjunct nodes in each adjacent disjunct node set, and respectively carrying out aggregation operation on the feature vectors corresponding to each adjacent disjunct node set to obtain adjacent feature vectors corresponding to each adjacent disjunct node set; taking the adjacent feature vector of each adjacent disjunct node set as the disjunct feature vector of the disjunct node corresponding to each adjacent disjunct node set; and carrying out pooling operation on all the extracted feature vectors to obtain a first global feature vector of the first extracted graph, and taking the first global feature vector as a first state of the first extracted graph.
Specifically, a Graph Neural Network (GNN) is an efficient framework for learning graph representations, which employs a neighborhood aggregation scheme that computes representation vectors for nodes by recursively aggregating and transforming representation vectors between neighboring nodes, and which broadly follows a recursive neighborhood aggregation (or messaging) scheme in which each node aggregates feature vectors of its neighbors to compute its new feature vector. Through k aggregation iterations, wherein k represents the number of adjacent nodes of the node, the node is represented by transformed feature vectors, and the feature vectors capture structural information in k-hop neighbors of the node.
The Graph Isomorphic Network (GIN) is a kind of Graph Neural Network (GNN), and the graph isomorphic network can perfectly fit training data and has high empirical characterization capability, so that GIN is adopted as the graph neural network for performing feature extraction on the first extracted graph in the solution JSSP problem in the preferred embodiment of the invention.
The method comprises the steps that a graph neural network firstly obtains all adjacent disjunct nodes corresponding to each disjunct node, integrates the adjacent disjunct nodes to obtain adjacent disjunct node sets corresponding to each disjunct node, respectively obtains feature vectors corresponding to each adjacent node in each adjacent disjunct node set, and then carries out aggregation operation on all feature vectors of each adjacent disjunct node set, so that adjacent feature vectors corresponding to each adjacent disjunct node set can be obtained, at the moment, the adjacent feature vectors of each adjacent disjunct node set are actually disjunct feature vectors of the corresponding disjunct node, and a specific expression formula is as follows:
wherein/> Representing the kth aggregation operation on the disjunct nodes, h being a feature vector, disjunct node v being the operation represented by each disjunct node in the first disjunct graph, N (v) representing the adjacent disjunct nodes (all nodes with a distance of N, N being determined according to parameter settings) of disjunct node v, the optional Aggregate generally having Sum, avg, max, etc. according to actual conditions, and commonly using Combine tandem method,The k-th layer information indicating the field u, which is the information of the last extracted node v and the field information of the kth v node, is integrated with each other, and the field u is any node of N (v).
Therefore, it can be easily known that, under the condition that the iteration times are enough, the global relation information in the first extraction graph can be well represented finally, and finally, the record node information is updated by using the following formula: wherein/> The k-layer information (i.e., extracted feature vector) extracted for the current extraction node v, e (k) is a learnable parameter or fixed scalar in GIN, if it is a learnable parameter, it is learned by neural network, and each iteration is updated.
And then carrying out pooling operation on all the obtained extracted feature vectors, for example, adding, averaging and maximizing the extracted feature vectors of all the extracted nodes in the first extracted graph, wherein the specific formula is as follows: Wherein READOUT is a pooling operation, h G is a global feature vector corresponding to a disjunctive graph, a set formed by aggregation of READOUT operations synthesizes results of node features in the last iteration to obtain h G, and h G is used as a State representation in a subsequent MDP decision (markov decision) process, that is, a first State (i.e., an initial State, denoted as s 0) corresponding to the first disjunctive graph.
Step S30, inputting the first extraction graph and the first state into a pre-trained decision network, performing action selection by the decision network according to the first state to obtain a first selection result, and updating the first extraction graph according to the first selection result to obtain a second extraction graph.
Specifically, after the first extraction graph is constructed, the step of solving JSSP the problem continuous decision using |o| (i o| representing the set of operations) based on the PDR (Priority dispatching rule, priority scheduling rule) is performed (in practice, the process of action selection), in each of which a qualified set of operations (i.e., those legal operations for which previous operations have been scheduled) first need to be identified, then a specific PDR is applied to calculate the priority index for each eligible operation, and the operation with the highest priority is selected for scheduling (or allocation); in addition, an appropriate start time is required to be selected for the operation with the highest priority, so that the time is more compact and the efficiency is higher, obviously, the operation principle can be corresponding earlier, and once all operations are selected, a corresponding scheduling optimization scheme is generated.
Further, the inputting the first extraction graph and the first state into a pre-trained decision network, the decision network performs action selection according to the first state to obtain a first selection result, and updates the first extraction graph according to the first selection result to obtain a second extraction graph, which specifically includes:
Inputting the first extraction graph and the first state into a pre-trained decision network; the decision network performs action selection according to the first state, selects any extraction node connected with the starting node of the first extraction graph as a first target node, and takes the first target node as a first selection result; and updating the extraction arc direction of the first extraction graph according to the first selection result, and generating a second extraction graph.
Specifically, in the preferred embodiment of the present invention, the decision network learns an operation selection sequence to form a final optimal scheduling scheme by means of reinforcement learning DecayP O method. In this strategy learning process, the DecayP O method provides us with a larger solution space (P3O) than the conventional Proximal Policy Optimization (PPO) training method. And by utilizing the characteristic that the operation scheduled earlier in the process scheduling problem has larger influence, the dynamic attenuation is introduced and integrated into the loss function. Thus, when training the policy Agent, the policy Agent can learn a better scheduling scheme faster. Finally, as the global characteristic information of the related graph is extracted by the graph neural network, the scheduling model trained by the user can be well migrated to other problems with similar structures and small scale, and an effective method is provided for solving the scheduling problems in other industrial chains.
In the preferred embodiment of the present invention, the first extraction graph and the first state are input into the pre-trained decision network, and the decision network performs action selection on the first extraction graph, that is, determines what the first operation is, for example, as shown in fig. 3, since the processing time of which operation is performed by the three operations of the initial step, O 11、O21 and O 31, is similar, any extraction node (one of O 11、O21 and O 31) connected to the starting node may be selected as the first target node, here we select O 21 as the first target point, and set O 21 with a shadow, which indicates that the selection has been performed, and the above process of selecting O 21 is the decision step t=1 of the decision network; then updating the extraction arc direction of the first extraction graph according to the selection result, and generating a second extraction graph (shown in fig. 5) without changing the extraction arc direction because the selection has no influence on the extraction arc direction.
And S40, performing feature extraction on the second extracted graph according to the graph neural network to obtain a second state corresponding to the second extracted graph, inputting the second extracted graph and the second state into the decision network, performing action selection by the decision network according to the second state to obtain a second selection result, updating the second extracted graph according to the second selection result to obtain a third extracted graph, … … until all action selections are completed, and generating a scheduling optimization scheme aiming at the target industrial chain.
Specifically, after the second extraction graph is obtained, feature extraction is performed on the second extraction graph to obtain a second state, the second extraction graph and the second state are input into a decision network, the decision network performs corresponding action selection, the second extraction graph is updated according to a second selection result of the action selection to obtain a third extraction graph, feature extraction is performed on the third extraction graph … …, and the operation is repeated until all the action selections are completed, namely all the operations in the extraction graph are selected to be completed, so that a scheduling optimization scheme aiming at a target industrial chain can be produced.
Further, the feature extraction is performed on the second extracted graph according to the graph neural network to obtain a second state corresponding to the second extracted graph, the second extracted graph and the second state are input into the decision network, the decision network performs action selection according to the second state to obtain a second selection result, and the method specifically includes:
The graph neural network respectively acquires adjacent feature vectors of adjacent disjunct node sets corresponding to each disjunct node in the second disjunct graph, respectively serves as disjunct feature vectors corresponding to each disjunct node in the second disjunct graph, and performs pooling operation according to all disjunct feature vectors of the second disjunct graph to obtain a second state of the second disjunct graph; inputting the second extraction graph and the second state into the decision network; the decision network performs action selection according to the second state, and obtains disjunctive nodes connected with the starting node and disjunctive nodes connected with the first target node except the first target node according to the process relation to serve as a plurality of second target nodes to be selected; and acquiring the processing time corresponding to each second target node to be selected, taking the second target node to be selected with the shortest processing time as a second target node, and taking the second target node as a second selection result.
Specifically, after the second extraction graph is obtained, feature extraction processing is performed on the second extraction graph, and similar to the above steps, adjacent feature vectors of adjacent extraction node sets of each extraction node in the second extraction graph are obtained, the corresponding adjacent feature vectors are respectively used as extraction feature vectors of each extraction node, and then pooling operation is performed on all the extraction feature vectors, so that a second state of the second extraction graph is obtained.
And inputting the second state and the second extraction graph into the decision network, as shown in fig. 5, wherein the second extraction graph in fig. 5 has selected O 21 as a first selection result, that is, O 21 is a qualified operation, and acquiring an optional operation at this time, in this step, the optional operations are O 11、O31 and O 22, and nodes corresponding to the three operations can be used as second target nodes to be selected, and the node with the shortest processing time in the three second target nodes to be selected is selected as the second target node, so that the selection of the second extraction graph is completed, for example, the processing time of O 31 is the shortest, and then, O 31 is used as the second selection result.
Further, updating the second extraction graph according to the second selection result to obtain a third extraction graph … …, until all action selections are completed, and generating a scheduling optimization scheme for the target industrial chain, which specifically includes:
Updating the extraction arc direction of the second extraction graph according to the second selection result to generate a third extraction graph; performing feature extraction on the third extraction graph according to the graph neural network to obtain a third state corresponding to the third extraction graph, and inputting the third extraction graph and the third state into the decision network; the decision network performs action selection according to the third state to obtain a third selection result, and updates the extraction arc direction of the third extraction graph according to the third selection result to generate a fourth extraction graph; performing feature extraction on the fourth extraction graph according to the graph neural network to obtain a fourth state corresponding to the fourth extraction graph, inputting the fourth extraction graph and the third state into the decision network, … … until action selection of all extraction nodes in the first extraction graph is completed, and obtaining a final extraction graph; and obtaining all extraction arc directions in the final extraction graph, and generating a scheduling optimization scheme aiming at the target industrial chain according to the process relation and the extraction arc directions.
Specifically, at decision step t, the state s t is a disjunctive graph G (t) = (O, C ∈du (t), D (t)) representing the state of the current solution, where O is the set of operations (represented as operations in the disjunctive graph), C is the process constraint (represented by solid arrows),All (directed) extraction arcs (indicated by dashed lines with arrows) containing the specified direction up to t, whileIncluding arcs (indicated by non-arrowed dashed lines) that have not yet been assigned a direction. Initial state S 0 represents the first disjunctive graph of the original JSSP problem (as shown in fig. 3), while final state S T represents the complete solution (as shown in fig. 4), i.e.I.e. all extracted arcs have been directed
At decision step t, action a t∈At is a qualified operation, A t represents an optional action space at decision step t (which action space is determined according to existing JSSP scheduling constraints), and a t represents the specific operation selected at operation step t. Since at most one operation can be ready per job at t, the maximum size of the operation space is |j| (e.g., JSSP problem for the distance 3*3 of the present invention, the maximum size of the operation space is 3), depending on the number of jobs for which the JSSP problem is being solved. In the solving process, |a t | becomes smaller as more work is done.
Once the PDR determines the operation to dispatch, we first find the earliest feasible time period allocated on the desired machine. Then, the machine's arc direction is updated according to the current time relationship, and a new arc map is generated as a new state s t+1, for example, as shown in fig. 6, in the case of state s 6 (corresponding to the seventh arc map), that is, action a 6 (selecting the seventh selection result), when determining that all three operations of Job2 have been selected to be completed in the optional action space of a 6, after Mask operation is performed on Job2, the number of optional operations is changed from 3 to 2, that is { O 13,O32 }, so that at this time, action a 6 selects one from O 13 and O 32 as a seventh selection result, and finally, the processing time of O 13 is found to be shorter, so that O 13 is selected as the seventh selection result, and the arc direction of the seventh arc map (the left image in fig. 6) is updated according to O 13, so that the eighth arc map (the right image in fig. 6) is obtained.
Mask is a masking strategy that, when a job is completed, does not take into account the job repeatedly for convenience of subsequent computation, but marks the completed form in a matrix, facilitating skipping of the completed portion when a subsequent operation is selected, called Mask.
Repeating the steps until all the extraction nodes in the first extraction graph are selected, that is, all the operations in the first extraction graph are selected, obtaining a final extraction graph (as shown in fig. 4), obtaining all the extraction arc directions in the final extraction graph at this time, and generating a scheduling optimization scheme for a target industrial chain according to the process relation and the extraction arc directions, that is, the execution sequence of each operation finally, wherein the specific scheduling optimization scheme can be presented in a report form or a chart form, and is specifically set according to the requirement.
Further, since the MDP is targeted for gradual scheduling, the completion time is minimized. Thus, the bonus function R (s t,at) is designed as the quality difference between the partial solutions corresponding to the two states s t and s t+1, i.e., R (s t,at)=H(st)-H(st+1), where H (-) is defined as the lower bound of the finishing time C max, and the calculation formula is H (s t)=maxi,j{CLB(Oi,j,st) }, where C LB is the lower bound of the finishing time; for the final state s |o|, it is apparent that there is H (s |o|=Cmax because all operations have been scheduled, therefore, when the discount factor γ=1, the cumulative reward isAssuming that H (s 0) is a constant, maximizing the jackpot is consistent with minimizing the completion time, and the MDP decision process is continually optimized by the rewarding function.
Further, decayP O is a solution interval, and also considers the importance degree in the case of sequential action selection. JSSP the problem belongs to the sequential job scheduling problem, the PDR of each step is taken as an action selection, the formed action sequence is constrained by the sequence, and the starting time of the subsequent actions is strongly related to the preceding actions. Therefore, decayP O is very suitable, and similar Attention effects are well achieved while the retaining structure is more concise than that of Attention. The loss function is as follows: wherein r t (θ) is/> Representing the degree of change between the new strategy and the old strategy; /(I)Representing a dominance estimate; sigma is a sigmoid function; τ is a super parameter; beta is a regularization term; KL is KL divergence; frac is a dynamic decay function that dynamically adjusts the importance level according to the policy parameter order; feedback in a loss form enables policy learning to be more focused on learning of preamble operations.
Further, as shown in fig. 7, the present invention further provides a scheduling optimization system based on the job shop scheduling problem in the industrial chain based on the scheduling optimization method based on the job shop scheduling problem in the industrial chain, where the scheduling optimization system based on the job shop scheduling problem in the industrial chain includes:
The problem construction module 51 is configured to obtain industry chain information of a target industry chain, and preprocess the industry chain information to obtain JSSP problems corresponding to the target industry chain;
The feature extraction module 52 is configured to generate a first extraction graph according to the JSSP problem, and perform feature extraction on the first extraction graph according to a graph neural network that is trained in advance, so as to obtain a first state corresponding to the first extraction graph;
The extraction graph updating module 53 is configured to input the first extraction graph and the first state into a pre-trained decision network, perform action selection according to the first state, obtain a first selection result, and update the first extraction graph according to the first selection result to obtain a second extraction graph;
The scheme generating module 54 is configured to perform feature extraction on the second extracted graph according to the graph neural network, obtain a second state corresponding to the second extracted graph, input the second extracted graph and the second state into the decision network, perform action selection according to the second state, obtain a second selection result, update the second extracted graph according to the second selection result, obtain a third extracted graph, … …, and generate a scheduling optimization scheme for the target industrial chain until all action selections are completed.
Further, as shown in fig. 8, based on the above-mentioned scheduling optimization method and system based on job shop scheduling problem in the industry chain, the present invention further provides a terminal correspondingly, where the terminal includes a processor 10, a memory 20 and a display 30. Fig. 8 shows only some of the components of the terminal, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may be implemented instead.
The memory 20 may in some embodiments be an internal storage unit of the terminal, such as a hard disk or a memory of the terminal. The memory 20 may in other embodiments also be an external storage device of the terminal, such as a plug-in hard disk provided on the terminal, a smart memory card (SMART MEDIA CARD, SMC), a Secure Digital (SD) card, a flash memory card (FLASH CARD), etc. Further, the memory 20 may also include both an internal storage unit and an external storage device of the terminal. The memory 20 is used for storing application software installed in the terminal and various data, such as program codes of the installation terminal. The memory 20 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores a scheduling optimization program 40 based on job shop scheduling problem in the industry chain, and the scheduling optimization program 40 based on job shop scheduling problem in the industry chain can be executed by the processor 10, so as to implement the scheduling optimization method based on job shop scheduling problem in the industry chain in the present application.
The processor 10 may in some embodiments be a central processing unit (Central Processing Unit, CPU), microprocessor or other data processing chip for executing program code or processing data stored in the memory 20, such as performing the scheduling optimization method based on job shop scheduling problems in the industry chain.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like in some embodiments. The display 30 is used for displaying information at the terminal and for displaying a visual user interface. The components 10-30 of the terminal communicate with each other via a system bus.
In one embodiment, the following steps are implemented when the processor 10 executes the dispatch optimizer 40 in the memory 20 based on job shop scheduling problems in the industry chain:
acquiring industry chain information of a target industry chain, and preprocessing the industry chain information to obtain JSSP problems corresponding to the target industry chain;
generating a first extraction graph according to the JSSP problems, and extracting features of the first extraction graph according to a graph neural network which is trained in advance to obtain a first state corresponding to the first extraction graph;
inputting the first extraction graph and the first state into a pre-trained decision network, wherein the decision network performs action selection according to the first state to obtain a first selection result, and updates the first extraction graph according to the first selection result to obtain a second extraction graph;
And performing feature extraction on the second extracted graph according to the graph neural network to obtain a second state corresponding to the second extracted graph, inputting the second extracted graph and the second state into the decision network, performing action selection by the decision network according to the second state to obtain a second selection result, updating the second extracted graph according to the second selection result to obtain a third extracted graph, … … until all action selections are completed, and generating a scheduling optimization scheme aiming at the target industrial chain.
The method for obtaining the industrial chain information of the target industrial chain includes the steps of preprocessing the industrial chain information to obtain JSSP problems corresponding to the target industrial chain, and specifically includes:
Acquiring industry chain information corresponding to a target industry chain, wherein the industry chain information comprises tasks, resources and processing time;
Taking all tasks as jobs, taking all resources as machines, and acquiring process relationships and resource limitations among all jobs;
and generating JSSP problems corresponding to the target industrial chain according to all the jobs, all the resources, all the process relationships and all the resource limitations.
Generating a first extraction graph according to the JSSP problems, and extracting features of the first extraction graph according to a graph neural network which is trained in advance to obtain a first state corresponding to the first extraction graph, wherein the method specifically comprises the following steps of:
Taking the operation of all the jobs in the JSSP problems as different nodes, and distributing corresponding machines for each node according to all the resource limitations to obtain a disjunctive node corresponding to each node;
constructing a directed acyclic graph according to all the extraction nodes, the resource limitations and the process relationships, and taking the directed acyclic graph as a first extraction graph;
Inputting the first extraction graph into a pre-trained graph neural network, and extracting features of the first extraction graph by the graph neural network to obtain a first state of the first extraction graph.
The step of inputting the first extraction graph into a pre-trained graph neural network, wherein the graph neural network performs feature extraction on the first extraction graph to obtain a first state of the first extraction graph, and specifically comprises the following steps:
The graph neural network respectively acquires adjacent disjunct node sets corresponding to each disjunct node, wherein each adjacent disjunct node set comprises all adjacent disjunct nodes of the corresponding disjunct node;
Respectively obtaining feature vectors of all adjacent disjunct nodes in each adjacent disjunct node set, and respectively carrying out aggregation operation on the feature vectors corresponding to each adjacent disjunct node set to obtain adjacent feature vectors corresponding to each adjacent disjunct node set;
Taking the adjacent feature vector of each adjacent disjunct node set as the disjunct feature vector of the disjunct node corresponding to each adjacent disjunct node set;
And carrying out pooling operation on all the extracted feature vectors to obtain a first global feature vector of the first extracted graph, and taking the first global feature vector as a first state of the first extracted graph.
The step of inputting the first extraction graph and the first state into a pre-trained decision network, wherein the decision network performs action selection according to the first state to obtain a first selection result, and updates the first extraction graph according to the first selection result to obtain a second extraction graph, and the method specifically comprises the steps of:
Inputting the first extraction graph and the first state into a pre-trained decision network;
The decision network performs action selection according to the first state, selects any extraction node connected with the starting node of the first extraction graph as a first target node, and takes the first target node as a first selection result;
And updating the extraction arc direction of the first extraction graph according to the first selection result, and generating a second extraction graph.
The feature extraction is performed on the second extracted graph according to the graph neural network to obtain a second state corresponding to the second extracted graph, the second extracted graph and the second state are input into the decision network, the decision network performs action selection according to the second state to obtain a second selection result, and the method specifically comprises the following steps:
The graph neural network respectively acquires adjacent feature vectors of adjacent disjunct node sets corresponding to each disjunct node in the second disjunct graph, respectively serves as disjunct feature vectors corresponding to each disjunct node in the second disjunct graph, and performs pooling operation according to all disjunct feature vectors of the second disjunct graph to obtain a second state of the second disjunct graph;
Inputting the second extraction graph and the second state into the decision network;
the decision network performs action selection according to the second state, and obtains disjunctive nodes connected with the starting node and disjunctive nodes connected with the first target node except the first target node according to the process relation to serve as a plurality of second target nodes to be selected;
And acquiring the processing time corresponding to each second target node to be selected, taking the second target node to be selected with the shortest processing time as a second target node, and taking the second target node as a second selection result.
Updating the second extraction graph according to the second selection result to obtain a third extraction graph … … until all action selections are completed, and generating a scheduling optimization scheme for the target industrial chain, wherein the method specifically comprises the following steps of:
updating the extraction arc direction of the second extraction graph according to the second selection result to generate a third extraction graph;
Performing feature extraction on the third extraction graph according to the graph neural network to obtain a third state corresponding to the third extraction graph, and inputting the third extraction graph and the third state into the decision network;
the decision network performs action selection according to the third state to obtain a third selection result, and updates the extraction arc direction of the third extraction graph according to the third selection result to generate a fourth extraction graph;
performing feature extraction on the fourth extraction graph according to the graph neural network to obtain a fourth state corresponding to the fourth extraction graph, inputting the fourth extraction graph and the third state into the decision network, … … until action selection of all extraction nodes in the first extraction graph is completed, and obtaining a final extraction graph;
And obtaining all extraction arc directions in the final extraction graph, and generating a scheduling optimization scheme aiming at the target industrial chain according to the process relation and the extraction arc directions.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
Of course, those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by a computer program for instructing relevant hardware (e.g., processor, controller, etc.), the program may be stored on a computer readable storage medium, and the program may include the above described methods when executed. The computer readable storage medium may be a memory, a magnetic disk, an optical disk, etc.