
CN121138812A - Drilling auxiliary decision-making method, device and machine-readable storage medium - Google Patents

Drilling auxiliary decision-making method, device and machine-readable storage medium

Info

Publication number
CN121138812A
Authority
CN
China
Prior art keywords
decision
real
drilling
historical
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202511669106.5A
Other languages
Chinese (zh)
Inventor
祝兆鹏
李明伟
宋先知
李根生
张诚恺
周蒙蒙
朱林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum Beijing
Original Assignee
China University of Petroleum Beijing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum Beijing
Priority to CN202511669106.5A
Publication of CN121138812A
Legal status: Pending

Landscapes

  • Earth Drilling (AREA)

Abstract

The application relates to the technical field of oil and gas exploration, and discloses a drilling auxiliary decision-making method, a device and a machine-readable storage medium. In the method, first real-time operation data, comprising first real-time working parameters and first real-time drilling data, are processed by an initial decision model obtained through behavior-cloning training to obtain an initial decision. A driller determines an actual operation with reference to the initial decision and applies it, generating second real-time operation data comprising second real-time working parameters of the drilling equipment and second real-time drilling data. A first intention label representing the engineering target of the actual operation is determined from the second real-time operation data. The second real-time operation data and the first intention label are then input into a decision optimization model obtained through reinforcement learning, so that the drilling decision is finally obtained efficiently and accurately.

Description

Drilling auxiliary decision-making method, device and machine-readable storage medium
Technical Field
The application relates to the technical field of oil and gas exploration, in particular to a drilling auxiliary decision-making method, a device and a machine-readable storage medium.
Background
Drillers are the core operators in drilling operations and are generally responsible for specific job commands and equipment operations at the drilling site. Thus, driller decision quality is directly related to drilling efficiency, drilling quality, and operational safety.
In existing drilling operations, the working environment is complex and highly uncertain, while traditional decision support systems usually depend on static rules; as a result, their decision efficiency and accuracy are low and cannot meet the requirements of the operation site.
Disclosure of Invention
The embodiments of the application aim to provide a drilling auxiliary decision-making method to solve the problems in the prior art that decision-making is slow, decision accuracy is low, and decisions cannot be correctly matched to operational requirements.
To achieve the above object, a first aspect of the present application provides a drilling aid decision making method, comprising:
acquiring first real-time operation data, wherein the first real-time operation data comprises first real-time working parameters of drilling equipment and first real-time drilling data;
inputting the first real-time operation data into an initial decision model to obtain an initial decision, wherein the initial decision model is obtained based on behavior cloning;
acquiring second real-time operation data of the drilling equipment, wherein the second real-time operation data is obtained after a driller determines an actual operation based on the initial decision and applies the actual operation to the drilling equipment;
determining a first intention label according to the second real-time operation data, wherein the first intention label is used for representing an engineering target of the actual operation;
inputting the second real-time operation data and the first intention label into a decision optimization model to obtain an optimized decision, wherein the decision optimization model is obtained based on reinforcement learning.
In the embodiment of the application, the training process of the initial decision model comprises the following steps:
Acquiring historical operation data and corresponding historical decision information, wherein the historical operation data comprises historical working parameters of drilling equipment and historical drilling data;
determining a second intent tag based on the historical drilling data, wherein the second intent tag is used to characterize motivations for the historical decision information;
inputting the historical operation data, the corresponding historical decision information and the second intention label into a preset neural network to obtain the predictive decision information corresponding to the historical operation data;
calculating a loss value corresponding to the historical operation data based on the first loss function according to the historical decision information and the predictive decision information;
iteratively training the preset neural network with minimization of the loss value as the target, and ending training when a preset first training termination condition is met, so as to obtain the initial decision model.
In the embodiment of the application, the prediction decision information comprises whether a driller makes a decision, the type of the working parameter to be adjusted corresponding to the decision and the adjustment value of each working parameter type.
In the embodiment of the application, the first loss function is obtained by weighted summation of decision occurrence judgment loss, decision type loss and decision parameter adjustment value loss, wherein the decision occurrence judgment loss is a cross entropy loss function, the decision type loss is a multi-label cross entropy loss function, and the decision parameter adjustment value loss is a mean square error loss function.
In an embodiment of the application, the historical decision information comprises a plurality of historical decisions within a predetermined distance of movement of the drilling apparatus.
In the embodiment of the application, inputting the second real-time operation data and the first intention label into the reinforcement-learning-based decision optimization model to obtain an optimized decision comprises the following steps:
taking the second real-time operation data as a state space and the first intention label as a state enhancement of the state space, inputting them into the decision optimization model, and determining a target action in an action space, wherein the action space comprises change states of target working parameters;
determining an expected total reward through a reward function and a second loss function according to the action space and the first intention label;
determining the optimized decision with the goal of maximizing the expected total reward.
In the embodiment of the application, the reward function is obtained by weighted summation of a consistency reward between the initial decision and the actual operation and a consistency reward between the actual operation and the first intention label.
In an embodiment of the present application, the second loss function includes:
L_RL = E_{s,a}[ ( R + γ · max_{a'} Q(s', a') − Q(s, a) )² ]
where L_RL is the second loss function; Q(s, a) represents the expected total reward that can be achieved by taking action a in state s; E_{s,a} denotes the expectation of the squared difference computed over all states s and actions a; R is the actual reward obtained by taking the action in the current state; γ is the discount factor; and max_{a'} Q(s', a') is the maximum Q value the agent can obtain in the next state s'.
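As one way to make the temporal-difference objective above concrete, the following tabular sketch computes the mean squared TD error. The tabular Q representation, the state and action encoding, and the γ value are illustrative assumptions; the application does not disclose the network architecture behind Q.

```python
import numpy as np

def td_loss(q, states, actions, rewards, next_states, gamma=0.99):
    """Mean squared temporal-difference error over a batch of transitions,
    one common realization of the second loss function described above."""
    # Q value currently assigned to each taken (state, action) pair
    q_sa = q[states, actions]
    # Bootstrapped target: actual reward R plus discounted best next-state value
    target = rewards + gamma * q[next_states].max(axis=1)
    return np.mean((target - q_sa) ** 2)

# Toy tabular example: 3 states, 2 actions, all Q values initialized to zero
q_table = np.zeros((3, 2))
loss = td_loss(q_table,
               states=np.array([0, 1]), actions=np.array([1, 0]),
               rewards=np.array([1.0, 0.0]), next_states=np.array([1, 2]))
```

With a zero-initialized table the bootstrapped term vanishes, so the loss reduces to the mean squared reward of the batch.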
A second aspect of the application provides a drilling aid decision making apparatus comprising:
a memory configured to store instructions, and
a processor configured to invoke the instructions from the memory and, when executing the instructions, implement the drilling auxiliary decision-making method of any one of the above embodiments.
A third aspect of the application provides a machine-readable storage medium having stored thereon instructions for causing a machine to perform the drilling auxiliary decision-making method of any one of the above embodiments.
According to the technical scheme, the first real-time operation data, comprising the first real-time working parameters and the first real-time drilling data, are processed by the initial decision model obtained through behavior-cloning training to obtain an initial decision. The driller determines an actual operation according to the initial decision and applies it, generating second real-time operation data comprising the second real-time working parameters of the drilling equipment and the second real-time drilling data. A first intention label representing the engineering target of the actual operation is determined according to the second real-time operation data. The second real-time operation data and the first intention label are input into the decision optimization model determined based on reinforcement learning, and the optimized drilling decision is finally obtained efficiently and accurately.
Additional features and advantages of embodiments of the application will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain, without limitation, the embodiments of the application. In the drawings:
FIG. 1 schematically illustrates a flow diagram of a method of drilling assistance decision making in accordance with an embodiment of the application;
FIG. 2 schematically illustrates a training process diagram of an initial decision model according to an embodiment of the application;
fig. 3 schematically shows a schematic diagram of an apparatus for drilling aid decision making according to an embodiment of the application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the detailed description described herein is merely for illustrating and explaining the embodiments of the present application, and is not intended to limit the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that, in the technical scheme of the application, the acquisition, transmission, storage, use, and processing of data all conform to the relevant laws and regulations. In the embodiments of the present application, some software, components, models, etc. known in the industry may be mentioned; they should be regarded as exemplary only, serving to illustrate the feasibility of implementing the technical solution of the present application, and this does not imply that the applicant has used or must use those solutions.
It should be noted that, if directional indications (such as up, down, left, right, front, and rear) are referred to in the embodiments of the present application, the directional indications are merely used to explain the relative positional relationship, movement conditions, and the like between the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indications change correspondingly.
In addition, if there is a description of "first", "second", etc. in the embodiments of the present application, such description is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when the technical solutions are contradictory or cannot be realized, the combination should be considered not to exist and not within the scope of protection claimed in the present application.
Fig. 1 schematically shows a flow diagram of a method of drilling assistance decision making according to an embodiment of the application. As shown in fig. 1, an embodiment of the present application provides a method of drilling assistance decision making, which may include the following steps S110-S150.
Step S110, first real-time operation data are acquired, wherein the first real-time operation data comprise first real-time working parameters of drilling equipment and first real-time drilling data.
In the embodiment of the application, first real-time operation data, including first real-time working parameters of the drilling equipment and first real-time drilling data generated during the drilling operation, are collected by automated equipment and then preprocessed.
Specifically, in an alternative embodiment, the collected first real-time working parameters of the drilling equipment may include set weight on bit, set rotational speed, set displacement, drilling fluid parameters, drill bit wear, equipment operating conditions, etc., while the collected first real-time drilling data may include downhole weight on bit, downhole rotational speed, downhole displacement, formation information, well depth, geological conditions, etc. The preprocessing operations may include outlier/missing-value processing, noise reduction, normalization/standardization, etc., so as to convert the collected data into an input data set for the initial decision model.
By collecting these data, the real-time state of the driller's actual operation of the drilling equipment and of the downhole working environment can be reflected, providing a basis for the initial decision model to determine an initial decision.
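A minimal sketch of the preprocessing chain named above (outlier/missing-value handling, noise reduction, min-max normalization) follows. The physical limits, the 3-point smoothing window, and the use of set weight on bit as the example channel are illustrative assumptions, not values from the application:

```python
import numpy as np

def preprocess(series, lo, hi):
    """Clean one real-time channel: reject out-of-range outliers,
    interpolate missing samples, smooth, and scale to [0, 1]."""
    x = np.asarray(series, dtype=float)
    # Outlier handling: values outside the assumed physical limits become missing
    x = np.where((x < lo) | (x > hi), np.nan, x)
    # Missing-value handling: linear interpolation over the valid samples
    idx = np.arange(len(x))
    good = ~np.isnan(x)
    x = np.interp(idx, idx[good], x[good])
    # Noise reduction: 3-point moving average (edges reuse the boundary value)
    padded = np.pad(x, 1, mode="edge")
    x = np.convolve(padded, np.ones(3) / 3.0, mode="valid")
    # Min-max normalization against the physical limits
    return (x - lo) / (hi - lo)

# Hypothetical weight-on-bit samples with one gap and one sensor spike
wob = preprocess([80.0, np.nan, 82.0, 500.0, 81.0], lo=0.0, hi=300.0)
```

The same function would be applied channel by channel before assembling the model's input data set.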
Step S120, inputting the first real-time operation data into an initial decision model to obtain an initial decision, wherein the initial decision model is obtained based on behavior cloning.
In the embodiment of the application, the initial decision model, trained on historical data through behavior cloning, processes the first real-time operation data to obtain an initial decision for the driller's reference.
Specifically, in an alternative embodiment, the initial decision model may provide the driller with adjustment decisions including the type of parameters of weight on bit, rotational speed, displacement, etc. of the drilling equipment.
The initial decision determined by the initial decision model provides the driller with a basic decision reference and assistance, helping the driller better achieve the potential engineering target inferred by the initial decision model at this stage.
Step S130, acquiring second real-time operation data of the drilling equipment, wherein the second real-time operation data is obtained after the driller determines an actual operation based on the initial decision and applies it to the drilling equipment.
In the embodiment of the application, after knowing the initial decision recommended by the initial decision model, the driller can determine the actual operation applied to the drilling equipment according to the engineering experience and the actual engineering target of the driller based on the initial decision, so as to obtain second real-time operation data comprising second real-time working parameters and second real-time drilling data of the drilling equipment. The second real-time operating parameters of the drilling apparatus may include set weight on bit, set rotational speed, set displacement, drilling fluid parameters, bit wear, apparatus operating conditions, etc., and the second real-time drilling data may include downhole weight on bit, downhole rotational speed, downhole displacement, formation information, well depth, geological conditions, etc.
Specifically, in an alternative embodiment, the driller may choose to adopt or not adopt the initial decision, may adopt one or more of the parameter types recommended for adjustment in the initial decision, or may adopt a specific adjustment value for one or more of the recommended parameter types. The actual operation applied by the driller to the drilling equipment is accurately reflected in the second real-time operation data.
By analyzing the second real-time operation data, including the second real-time working parameters of the drilling equipment and the second real-time drilling data, the decision optimization model can more accurately learn the decisions drillers make in actual drilling operations under different downhole environments and different engineering targets, which in turn improves the processing speed and accuracy of the decision optimization model through reinforcement learning.
Step S140, determining a first intention label according to the second real-time operation data, wherein the first intention label is used for representing the engineering target of the actual operation.
In the embodiment of the application, the first intention label used for representing the engineering target of the actual operation is further determined through the second real-time operation data generated by the actual operation of the driller.
Specifically, in an alternative embodiment, the change records of weight on bit, rotational speed, and displacement in the second real-time operation data are examined by an algorithm to find mutation points of parameter change; a mutation vector is determined from the parameter change values, and the first intention label is then determined from the mutation vector. One possible approach is to detect the mutation points using the CUSUM (cumulative sum) algorithm: when the algorithm determines that a mutation exists in a parameter, the driller is considered to have made a decision.
Through the first intention label, the decision optimization model can more accurately acquire engineering targets corresponding to actual operations performed by the driller, and further provides assistance for determining an optimization decision.
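The mutation-point detection described above might be sketched as follows. This is a simplified two-sided CUSUM variant; the threshold and drift values are illustrative tuning assumptions rather than values disclosed in the application:

```python
def cusum_changepoints(values, threshold=5.0, drift=0.5):
    """Flag indices where a parameter series (e.g. set rotational speed)
    mutates abruptly; each flagged index is treated as a driller decision."""
    mean = values[0]          # running reference level
    s_pos = s_neg = 0.0
    changepoints = []
    for i, x in enumerate(values):
        # Accumulate deviations from the reference, minus a drift allowance
        # so slow trends are not flagged as mutations
        s_pos = max(0.0, s_pos + (x - mean) - drift)
        s_neg = max(0.0, s_neg + (mean - x) - drift)
        if s_pos > threshold or s_neg > threshold:
            changepoints.append(i)   # mutation point found
            mean = x                 # restart from the new level
            s_pos = s_neg = 0.0
    return changepoints

# A step change in set rotational speed at index 5
series = [60.0] * 5 + [90.0] * 5
cps = cusum_changepoints(series)
```

The signs of the parameter changes at the flagged indices would then form the mutation vector from which the intention label is derived.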
Step S150, inputting the second real-time operation data and the first intention label into a decision optimization model to obtain an optimized decision, wherein the decision optimization model is obtained based on reinforcement learning.
In the embodiment of the application, the decision optimization model, obtained based on reinforcement learning (RL) and built on the initial decision model, takes the second real-time operation data and the first intention label as inputs to determine an optimized decision.
By inputting the second real-time operation data reflecting the current actual drilling operation state and the first intention label reflecting the driller actual operation and the target into the decision optimization model, the determined optimization decision can more accord with the actual operation requirement than the initial decision.
According to the technical scheme, an initial decision is first obtained from the first real-time operation data, comprising the first real-time working parameters of the drilling equipment and the first real-time drilling data, by using the initial decision model obtained through behavior cloning. The driller then determines and applies an actual operation with reference to the initial decision, and a first intention label representing the engineering target of the actual operation is determined from the resulting second real-time operation data. Finally, based on the second real-time operation data and the first intention label, the reinforcement-learning decision optimization model efficiently determines an optimized decision that meets the actual operation requirements with adaptability and intelligence.
FIG. 2 schematically illustrates a training process diagram of an initial decision model according to an embodiment of the application. As shown in FIG. 2, an embodiment of the present application provides a training process for an initial decision model, which may include the following steps S210-S250.
Step S210, acquiring historical operation data and corresponding historical decision information, wherein the historical operation data comprises historical working parameters of drilling equipment and historical drilling data.
In the embodiment of the application, various data generated in historical actual operations can be collected, preprocessed, and converted into a training data set used to train a preset neural network through behavior cloning. The historical operation data comprise historical working-parameter records of the drilling equipment, such as set weight on bit, set rotational speed, set displacement, drilling fluid parameters, drill bit wear, and equipment running state, and the historical drilling data comprise records such as downhole weight on bit, downhole rotational speed, downhole displacement, formation information, well depth, and geological conditions. Meanwhile, manual parameter adjustments made by drillers in the historical operation data correspond to historical decisions, and their engineering targets are labeled, such as "avoiding equipment wear" or "improving the rate of penetration".
In an embodiment of the application, the historical decision information comprises a plurality of historical decisions within a predetermined distance of movement of the drilling apparatus.
Specifically, in the historical drilling data, due to formation heterogeneity, hydraulic system delay, automatic control compensation, and the like, successive changes of multiple parameters within a short period may represent a single historical decision of the same driller while the drilling process continues and the equipment keeps moving. Therefore, in the embodiment of the application, parameter changes occurring within a preset movement distance of the drilling equipment are treated as one historical decision of the driller; one possible value of this distance is 3 meters.
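The merging of successive parameter changes within the preset movement distance into one historical decision could look like this sketch; the event structure, field names, and example depths are hypothetical:

```python
def merge_decisions(events, window_m=3.0):
    """Merge parameter-change events occurring within a preset movement
    distance (3 m here, as one possible value) into single decisions."""
    events = sorted(events, key=lambda e: e["depth_m"])
    merged = []
    for e in events:
        if merged and e["depth_m"] - merged[-1]["depth_m"] <= window_m:
            # Same decision: successive changes caused by formation
            # heterogeneity, hydraulic delay, or control compensation
            merged[-1]["params"].update(e["params"])
        else:
            merged.append({"depth_m": e["depth_m"], "params": dict(e["params"])})
    return merged

# Hypothetical change events from a historical log
raw_events = [
    {"depth_m": 1500.0, "params": {"weight_on_bit": +10}},
    {"depth_m": 1501.2, "params": {"rotational_speed": -5}},
    {"depth_m": 1520.0, "params": {"displacement": +2}},
]
merged = merge_decisions(raw_events)
```

Here the first two changes fall within 3 m of each other and collapse into one labeled decision, while the third stands alone.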
Step S220, determining a second intention label according to the historical drilling data, wherein the second intention label is used for representing motivation of the historical decision information.
In the embodiment of the application, the change records of downhole weight on bit, downhole rotational speed, and downhole displacement in the historical drilling data are examined by an algorithm to find mutation points of parameter change; a mutation vector is determined from the parameter change values, and the second intention label is determined from the mutation vector. One possible approach is to detect the mutation points using the CUSUM algorithm: when the algorithm determines that a mutation exists in a parameter, the driller is considered to have made a decision.
Step S230, inputting the historical operation data, the corresponding historical decision information and the second intention label into a preset neural network to obtain the predictive decision information corresponding to the historical operation data.
In the embodiment of the application, within imitation learning, behavior cloning makes the preset neural network imitate the decision-making behavior of a driller in a supervised-learning manner. Specifically, by inputting historical operation data and the corresponding historical decision information and analyzing the driller's decisions (such as adjusting weight on bit, rotational speed, and displacement), the network can learn the driller's operation rules. The training targets may be to judge whether the driller makes a decision at the next moment, which decision the driller makes, and what the corresponding parameter change values are.
In one possible embodiment, the predictive decision information includes whether the driller makes a decision, the type of operating parameter that the decision corresponds to be adjusted, and the adjustment value for each operating parameter type.
The first step is a binary classification problem: the historical drilling data are processed to output a probability value between 0 and 1, indicating the probability that the driller makes a decision at the next step (which may be 1 meter further along the well). When the probability is greater than a set threshold, an operation is considered to occur; in one embodiment the threshold may be 0.8.
The second step is a multi-label classification problem that shares part of the feature extraction layer with the first step. Since multiple parameters of the drilling equipment may need to be adjusted simultaneously during actual drilling operations, whether each parameter is adjusted is judged through three independent, non-mutually-exclusive Sigmoid output layers.
The third step is a multi-objective regression problem that also shares the feature extraction layer. Once the decision parameter types have been determined, this step outputs the parameter change values of the drilling equipment. This step may include a gating mechanism: if the second step determines that a certain parameter is not adjusted, the third step forces the change value of that parameter to zero.
For the second and third steps, physical constraints based on drilling engineering experience are introduced to prevent the output parameter adjustments from creating safety problems. For example, when the formation data indicate a "salt bed", reducing the displacement is forcibly prohibited.
Therefore, by inputting the historical operation data, the corresponding historical decision information and the second intention label, the prediction decision information corresponding to the historical operation data can be obtained.
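The three-stage output structure described above (an occurrence probability, three non-mutually-exclusive Sigmoid type heads, and gated regression of adjustment values) can be sketched as a toy forward pass. The layer sizes, random weights, and the 0.5 gating threshold are illustrative assumptions, not details from the application:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ThreeHeadSketch:
    """Shared feature extractor feeding three heads: decision occurrence,
    parameter-type selection, and gated adjustment-value regression."""
    def __init__(self, n_features, hidden=16):
        self.w_shared = rng.normal(size=(n_features, hidden))
        self.w_occur = rng.normal(size=(hidden, 1))
        self.w_type = rng.normal(size=(hidden, 3))   # WOB, RPM, displacement
        self.w_value = rng.normal(size=(hidden, 3))

    def forward(self, x, type_threshold=0.5):
        h = np.tanh(x @ self.w_shared)               # shared feature extraction
        p_occur = sigmoid(h @ self.w_occur)[..., 0]  # step 1: occurrence prob.
        p_type = sigmoid(h @ self.w_type)            # step 2: non-exclusive labels
        values = h @ self.w_value                    # step 3: raw adjustments
        # Gating mechanism: zero the adjustment of any unselected parameter
        values = np.where(p_type > type_threshold, values, 0.0)
        return p_occur, p_type, values

model = ThreeHeadSketch(n_features=5)
p, t, v = model.forward(np.ones(5))
```

The hard physical constraints (e.g. the salt-bed displacement rule) would be applied as a further mask on `values` after this forward pass.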
Step S240, according to the historical decision information and the predictive decision information, calculating a loss value corresponding to the historical operation data based on the first loss function.
In the embodiment of the application, it can be understood that the predicted decision information during training does not necessarily coincide with the historical decision information. Therefore, after each training pass, the predicted decision information is compared with the historical decision information, and the accuracy of the model is evaluated through the first loss function.
In the embodiment of the application, the first loss function is obtained by weighted summation of decision occurrence judgment loss, decision type loss and decision parameter adjustment value loss, wherein the decision occurrence judgment loss is a cross entropy loss function (binary cross entropy), the decision type loss is a multi-label cross entropy loss function (multi-label cross entropy), and the decision parameter adjustment value loss is a mean square error loss function (MSE).
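The weighted first loss function can be sketched directly from its three named components. The component weights are illustrative, as the application does not disclose their values:

```python
import numpy as np

def first_loss(p_occur, y_occur, p_type, y_type, v_pred, v_true,
               w=(1.0, 1.0, 1.0), eps=1e-7):
    """Weighted sum of the three component losses: occurrence BCE,
    multi-label (per-label) cross entropy, and adjustment-value MSE."""
    p_occur = np.clip(p_occur, eps, 1 - eps)
    p_type = np.clip(p_type, eps, 1 - eps)
    # Decision-occurrence judgment loss: binary cross entropy
    l_occur = -np.mean(y_occur * np.log(p_occur)
                       + (1 - y_occur) * np.log(1 - p_occur))
    # Decision-type loss: multi-label cross entropy (independent labels)
    l_type = -np.mean(y_type * np.log(p_type)
                      + (1 - y_type) * np.log(1 - p_type))
    # Decision parameter adjustment-value loss: mean squared error
    l_value = np.mean((v_pred - v_true) ** 2)
    return w[0] * l_occur + w[1] * l_type + w[2] * l_value

# Perfect prediction on one sample: loss collapses to (near) zero
perfect = first_loss(np.array([1.0]), np.array([1.0]),
                     np.array([[1.0, 0.0, 1.0]]), np.array([[1.0, 0.0, 1.0]]),
                     np.zeros((1, 3)), np.zeros((1, 3)))
```

In practice the MSE term would be masked by the gating mechanism so that unadjusted parameters contribute no regression loss.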
Step S250, iteratively training the preset neural network with minimization of the loss value as the target, and ending training when a preset first training termination condition is met, so as to obtain the initial decision model.
In the embodiment of the application, with minimization of the loss value as the target, the training process is verified and tested, and the training parameters and network structure are adjusted in continuous iterations until the preset first training termination condition is met, so as to obtain the initial decision model. The first training termination condition may be that the loss value is smaller than a preset loss threshold, or that the number of training iterations reaches a preset threshold.
Through the above technical scheme, behavior-cloning training is performed on the preset neural network using the historical operation data and the corresponding historical decision information, imitating driller behavior through supervised learning, and the first loss function is then used for verification and evaluation, finally yielding an initial decision model with higher accuracy.
In an alternative embodiment, step S150 may include the following steps S151-S153.
Step S151, taking the second real-time operation data as a state space and the first intention label as a state enhancement of the state space, inputting them into the decision optimization model, and determining a target action in an action space, wherein the action space comprises the change states of the target working parameters.
Those skilled in the art will appreciate that in reinforcement learning, a state is a description of the agent's interaction with the environment, and an action is an operation the agent may take in a particular state. In the embodiment of the application, the second real-time operation data are first preprocessed (outlier/missing-value processing, noise reduction, normalization/standardization, etc.), then modeled and quantized as the state space input; meanwhile, the first intention label, representing the engineering target behind the actual operation, is introduced as a state enhancement of the state space and input to the decision optimization model.
The decision optimization model is built on the initial decision model and based on reinforcement learning; according to the input state space, it determines and outputs a target action in the action space, wherein the action space comprises the change states of the target working parameters.
Step S152, determining the expected total reward through the reward function and the second loss function according to the action space and the first intention label.
In one possible embodiment of the present application, the action space includes, but is not limited to: keeping unchanged, increasing, or decreasing the weight on bit of the drilling equipment; keeping unchanged, increasing, or decreasing the rotational speed of the drill bit; and keeping unchanged, increasing, or decreasing the displacement of the mud pump. Under this action-space definition and in combination with the first intention label, the expected total reward is calculated through the reward function and the second loss function.
In the embodiment of the application, a dual-track reward function design is innovatively provided: the reward function is obtained as a weighted sum of a consistency reward between the initial decision and the actual operation, and a consistency reward between the actual operation and the first intention label. The first track measures whether the actual operation the driller applies to the drilling equipment, after referring to the initial decision (obtained through the initial decision model from the first real-time operation data) and the actual engineering target, is consistent with that initial decision: the reward is 1 if consistent and 0 otherwise. The second track measures whether the driller's actual operation is consistent with the first intention label determined from the second real-time operation data: again 1 if consistent and 0 otherwise. Through this dual-track design, the reinforcement learning model is encouraged both to learn the driller's decision tendencies and to optimize decisions toward the final engineering goal more efficiently.
It may be appreciated that the weights of the two consistency rewards may be set according to actual requirements, which is not limited in the embodiment of the present application. Illustratively, the weight of the consistency reward between the initial decision and the actual operation may be 0.3, and that of the consistency reward between the actual operation and the first intention label may be 0.7.
Through this reward-function design, the model can balance the various parameters and objectives of the operation during training while improving adaptability, optimizing the overall efficiency and safety of the operation.
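The dual-track reward described above reduces to a weighted sum of two binary consistency checks. A minimal sketch, using the illustrative 0.3/0.7 weights from the description (the comparison of operations and intents by equality is a simplifying assumption):

```python
def dual_track_reward(initial_decision, actual_op, intent_of_op, intent_label,
                      w_mimic=0.3, w_intent=0.7):
    """Weighted sum of two binary consistency rewards:
    - mimic track: 1 if the driller's actual operation matches the
      initial model decision, else 0;
    - intent track: 1 if the actual operation's engineering intent
      matches the first intention label, else 0."""
    r_mimic = 1.0 if actual_op == initial_decision else 0.0
    r_intent = 1.0 if intent_of_op == intent_label else 0.0
    return w_mimic * r_mimic + w_intent * r_intent
```

With these weights, an operation consistent with both tracks scores 1.0, one consistent only with the intent label scores 0.7, and one matching only the initial decision scores 0.3, so the intent track dominates, as the description intends.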
In an embodiment of the present application, the second loss function includes:

L_RL = E_{s,a}[ (R + γ max_{a'} Q(s', a') − Q(s, a))² ]

where L_RL is the second loss function; Q(s, a) is the Q value of the agent taking action a in state s, i.e. the expected total reward obtainable by taking that action in that state; E_{s,a}[·] denotes the expectation of the Q-value difference over all states s and actions a; R is the actual reward obtained by taking the action in the current state; γ is the discount factor; and max_{a'} Q(s', a') is the maximum Q value the agent can obtain in the next state s'.
Through the second loss function, the model can continuously improve its prediction accuracy, so that the accuracy and adaptability of the optimized decision gradually improve.
Step S153: determining an optimized decision with the aim of maximizing the expected total reward.
In the embodiment of the application, the temporal difference (TD) error is calculated to measure the gap between the expected total reward the decision optimization model predicts for taking an action in a state and the reward actually obtained; the decision optimization model is updated accordingly, thereby determining the optimized decision.
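The TD-error computation and model update can be illustrated with a tabular Q-learning step (a simplified stand-in for the embodiment's learned model; the learning rate and discount factor are illustrative):

```python
import numpy as np

def td_loss_and_update(Q, s, a, r, s_next, gamma=0.99, lr=0.1):
    """One tabular Q-learning step: compute the TD error
    delta = R + gamma * max_a' Q(s', a') - Q(s, a),
    the squared-error sample of L_RL, and move Q(s, a) toward the target."""
    td_target = r + gamma * np.max(Q[s_next])   # R + gamma * max_a' Q(s', a')
    td_error = td_target - Q[s, a]              # the TD error
    loss = td_error ** 2                        # one sample of the second loss
    Q[s, a] += lr * td_error                    # update toward the TD target
    return loss, Q
```

Starting from a zero-initialized table, a transition with reward 1.0 produces a TD error of 1.0, a squared loss of 1.0, and shifts Q(s, a) by the learning rate toward the target; iterating this update is what drives the expected total reward toward its maximum in step S153.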
According to the above technical scheme, the second real-time operation data and the first intention label are input together into the reinforcement-learning-based decision optimization model, and through joint training with the reward function and the second loss function, a highly intelligent and highly adaptive decision optimization model is finally formed, providing the driller with decision support that is accurate, adaptive, and able to explain the purpose of each decision.
Fig. 3 schematically shows an apparatus for drilling auxiliary decision making according to an embodiment of the application. As shown in Fig. 3, an embodiment of the present application provides a controller 300, which may include:
a memory 310 configured to store instructions; and
a processor 320 configured to invoke the instructions from the memory 310 and, when executing the instructions, implement the above method for drilling auxiliary decision making.
Embodiments of the present application also provide a machine-readable storage medium having stored thereon instructions for causing a machine to perform the above-described method of drilling assistance decision making.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, Random Access Memory (RAM), and/or nonvolatile memory, such as Read Only Memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, Phase-change Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A drilling auxiliary decision-making method, characterized by comprising: acquiring first real-time operation data, wherein the first real-time operation data comprises first real-time working parameters of drilling equipment and first real-time drilling data; inputting the first real-time operation data into an initial decision model to obtain an initial decision, wherein the initial decision model is obtained based on behavior cloning; acquiring second real-time operation data of the drilling equipment, wherein the second real-time operation data is obtained after a driller determines an actual operation based on the initial decision and applies the actual operation to the drilling equipment; determining a first intention label according to the second real-time operation data, wherein the first intention label is used to characterize the engineering target of the actual operation; and inputting the second real-time operation data and the first intention label into a decision optimization model based on reinforcement learning to obtain an optimized decision.

2. The method according to claim 1, characterized in that the training process of the initial decision model comprises: acquiring historical operation data and corresponding historical decision information, wherein the historical operation data comprises historical working parameters of the drilling equipment and historical drilling data; determining a second intention label according to the historical drilling data, wherein the second intention label is used to characterize the motivation of the historical decision information; inputting the historical operation data, the corresponding historical decision information, and the second intention label into a preset neural network to obtain predicted decision information corresponding to the historical operation data; calculating, based on a first loss function, a loss value corresponding to the historical operation data according to the historical decision information and the predicted decision information; and iteratively training the preset neural network with the goal of minimizing the loss value, terminating the training when a preset first training termination condition is met to obtain the initial decision model.

3. The method according to claim 2, characterized in that the predicted decision information comprises whether the driller makes a decision, the types of working parameters to be adjusted corresponding to the decision, and the adjustment value of each working parameter type.

4. The method according to claim 2, characterized in that the first loss function is obtained by a weighted sum of a decision-occurrence judgment loss, a decision-type loss, and a decision-parameter-adjustment-value loss, wherein the decision-occurrence judgment loss is a cross-entropy loss function, the decision-type loss is a multi-label cross-entropy loss function, and the decision-parameter-adjustment-value loss is a mean squared error loss function.

5. The method according to claim 2, characterized in that the historical decision information comprises multiple historical decisions made within a preset distance moved by the drilling equipment.

6. The method according to claim 1, characterized in that inputting the second real-time operation data and the first intention label into the decision optimization model based on reinforcement learning to obtain the optimized decision comprises: taking the second real-time operation data as a state space and the first intention label as state enhancement of the state space, inputting both into the decision optimization model, and determining a target action from an action space, wherein the action space comprises the change states of target working parameters; determining an expected total reward through a reward function and a second loss function according to the action space and the first intention label; and determining the optimized decision with the goal of maximizing the expected total reward.

7. The method according to claim 6, characterized in that the reward function is obtained by a weighted sum of a consistency reward between the initial decision and the actual operation and a consistency reward between the actual operation and the first intention label.

8. The method according to claim 6, characterized in that the second loss function comprises:

L_RL = E_{s,a}[ (R + γ max_{a'} Q(s', a') − Q(s, a))² ]

where L_RL is the second loss function; Q represents the expected total reward obtainable by taking an action in a given state; E_{s,a}[·] denotes the expectation of the Q-value difference over all states s and actions a; R is the actual reward obtained by taking an action in the current state; γ is the discount factor; max_{a'} Q(s', a') is the maximum Q value the agent can obtain in the next state s'; and Q(s, a) is the Q value of the agent taking action a in state s.

9. A drilling auxiliary decision-making device, characterized by comprising: a memory configured to store instructions; and a processor configured to invoke the instructions from the memory and, when executing the instructions, implement the drilling auxiliary decision-making method according to any one of claims 1 to 8.

10. A machine-readable storage medium, characterized in that instructions are stored on the machine-readable storage medium, the instructions being used to cause a machine to execute the drilling auxiliary decision-making method according to any one of claims 1 to 8.
CN202511669106.5A 2025-11-14 2025-11-14 Drilling auxiliary decision-making method, device and machine-readable storage medium Pending CN121138812A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511669106.5A CN121138812A (en) 2025-11-14 2025-11-14 Drilling auxiliary decision-making method, device and machine-readable storage medium


Publications (1)

Publication Number Publication Date
CN121138812A true CN121138812A (en) 2025-12-16

Family

ID=97984914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511669106.5A Pending CN121138812A (en) 2025-11-14 2025-11-14 Drilling auxiliary decision-making method, device and machine-readable storage medium

Country Status (1)

Country Link
CN (1) CN121138812A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230116456A1 (en) * 2019-08-23 2023-04-13 Landmark Graphics Corporation Active reinforcement learning for drilling optimization and automation
CN116848313A (en) * 2021-02-02 2023-10-03 沙特阿拉伯石油公司 Methods and systems for autonomous flow control in hydraulic stimulation operations
CN119195731A (en) * 2024-10-25 2024-12-27 西南石油大学 A drilling parameter adaptive control method based on reinforcement learning
CN119937305A (en) * 2025-01-03 2025-05-06 西安石油大学 An automated control method for key drilling parameters based on improved DDPG algorithm
CN120235181A (en) * 2025-05-29 2025-07-01 北京中数睿智科技有限公司 An automated construction method for end-to-end intelligent agents based on graph structure semantic fusion
CN120487037A (en) * 2025-07-21 2025-08-15 中国石油大学(北京) Downhole tool face dynamic control method and system based on reinforcement learning
CN120597735A (en) * 2025-08-08 2025-09-05 吉林大学 A deep reinforcement learning-driven intelligent real-time optimization method for drilling parameters

Similar Documents

Publication Publication Date Title
US11308413B2 (en) Intelligent optimization of flow control devices
US11480039B2 (en) Distributed machine learning control of electric submersible pumps
CN104024572A (en) Method and system for predicting drill string stuck pipe event
US20210071509A1 (en) Deep intelligence for electric submersible pumping systems
NO20211401A1 (en) Automated concurrent path planning and drilling parameter optimization using robotics
CN114370264B (en) Mechanical penetration rate determination, drilling parameter optimization methods, devices and electronic equipment
CN113553356A (en) Drilling parameter prediction method and system
CN114519291A (en) Method for establishing working condition monitoring and control model and application method and device thereof
CN116644844A (en) Stratum pressure prediction method based on neural network time sequence
CN113627639A (en) Well testing productivity prediction method and system for carbonate fracture-cave reservoir
CN112926805A (en) Intelligent vertical well testing interpretation method and device based on deep reinforcement learning
CN116894213B (en) Method and device for identifying working condition of excavator
CN117313278A (en) An intelligent matching method and system for operation control parameters of large hydraulic pile hammers
CN120068619A (en) Targeting positioning grouting high-efficiency plugging and reinforcing method and system based on deep learning
Wang et al. Application of Recurrent Neural Network Long Short-Term Memory Model on Early Kick Detection
CN114398817A (en) Method and device for dynamically estimating production operation condition of natural gas shaft
KR102612959B1 (en) System and method for predicting shale gas production based on deep learning
CN121138812A (en) Drilling auxiliary decision-making method, device and machine-readable storage medium
CN119191099B (en) Control method, device, equipment and storage medium for running track
CN118793554B (en) A method and system for adjusting the main shaft axis of a generator set
CN120162661A (en) ROP prediction method based on LSTM and ensemble learning algorithm
US20250036837A1 (en) Machine and systems for identifying wells priority for corrosion log utilizing machine learning model
CN117035549A (en) A cost algorithm method for evaluating urban water supply pipeline network plans
CN118442041A (en) Optimization control method and system for deep well lifting system
CN119918160B (en) Automatic gesture correction method and system based on data driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination