
CN115865806B - Congestion control method, device, electronic device and storage medium

Info

Publication number: CN115865806B
Application number: CN202211487825.1A
Authority: CN
Inventors: 王玲; 吕磊; 程诚; 程博锋
Original assignee: New H3C Technologies Co Ltd
Current assignee: New H3C Technologies Co Ltd
Priority date: 2022-11-25
Filing date: 2022-11-25
Publication date: 2025-02-25
Anticipated expiration: 2042-11-25
Also published as: CN115865806A

Abstract

The embodiments of the present application provide a congestion control method, device, electronic device and storage medium, the method comprising: when the preset collection period is reached, obtaining the network status data of the current collection period as the first network status data; based on the first network status data, calculating the reward value of the current collection period as the first reward value; inputting the first network status data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy; adjusting the current congestion window according to the target adjustment strategy. For different network environments, there is no need to distinguish whether the congestion signal is caused by network congestion, and the reward value can be calculated based on the first network status data, thereby achieving the adjustment of the congestion window. Therefore, the congestion control method of the present application can be applied to complex network environments and can improve the effectiveness of congestion control.

Description

Congestion control method, congestion control device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer networks, and in particular, to a congestion control method, apparatus, electronic device, and storage medium.

Background

With the development of network technology, network environments are becoming increasingly complex, and many factors affect network transmission efficiency. To avoid network congestion and to ensure network stability and efficient data transmission, the data sending rate of the sending end can be adjusted by means of congestion control.

For example, in the related art, a heuristic-based congestion control method uses packet loss rate and delay as congestion signals and dynamically controls the sending rate or the congestion window to avoid network congestion. However, as network environments become more complex, this method cannot effectively determine whether a congestion signal is actually caused by network congestion. It is therefore difficult to adapt to current complex network environments, and the effectiveness of congestion control is low.

Disclosure of Invention

The embodiment of the application aims to provide a congestion control method, a congestion control device, electronic equipment and a storage medium, which are suitable for complex network environments and improve the effectiveness of congestion control. The specific technical scheme is as follows:

According to a first aspect of an embodiment of the present application, there is provided a congestion control method, the method including:

when a preset acquisition period is reached, acquiring network state data of the current acquisition period as first network state data;

the first network state data includes a sending rate and a receiving rate of the current acquisition period, wherein the sending rate of the current acquisition period is determined based on the sending rate acquired at each designated time in the current acquisition period, and the receiving rate of the current acquisition period is determined based on the receiving rate acquired at each designated time in the current acquisition period; the designated time includes a first time at which an ACK message is received; the data packet corresponding to a first time is the data packet acknowledged by the ACK message received at that first time; the receiving rate at each first time represents the rate at which data was received between the time of the last ACK message received before the data packet corresponding to the first time was sent and the first time; and the sending rate at each first time represents the rate at which data was sent between the sending time of the data packet acknowledged by that last-received ACK message and the sending time of the data packet corresponding to the first time;

calculating a reward value of the current acquisition period based on the first network state data as a first reward value;

Inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy, wherein the adjustment strategy prediction network model is obtained by training based on a reinforcement learning algorithm;

And adjusting the current congestion window according to the target adjustment strategy.

Optionally, the first reward value is positively correlated with the receiving rate of the current acquisition period and is negatively correlated with the rate difference value of the current acquisition period, and the rate difference value of the current acquisition period represents the difference value between the sending rate of the current acquisition period and the receiving rate of the current acquisition period.

Optionally, the first network status data further includes at least one of:

the minimum round trip time of the current acquisition period, which represents the minimum of the round trip times acquired at the designated times in the current acquisition period;

the average round trip time of the current acquisition period, which represents the average of the round trip times acquired at the designated times in the current acquisition period;

the average delay of the current acquisition period, which represents the average of the delays acquired at the designated times in the current acquisition period;

the average congestion window size of the current acquisition period, which represents the average of the congestion window sizes acquired at the designated times in the current acquisition period;

the average in-flight data size of the current acquisition period, which represents the average of the in-flight data sizes acquired at the designated times in the current acquisition period, where the in-flight data size acquired at a designated time represents the size of the data packets that have been sent by that designated time but for which no corresponding ACK message has been received;

the size of the data sent in the current acquisition period;

the size of the data packets acknowledged by the ACK messages received in the current acquisition period;

the size of the data packets lost in the current acquisition period;

the number of explicit congestion signals carried in the ACK messages received in the current acquisition period.
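As a rough illustration of how the per-period statistics listed above might be aggregated, the following Python sketch assumes a hypothetical `Sample` record captured at each designated time. The record fields and the `summarize_period` helper are illustrative names and are not taken from the application.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Sample:
    """Hypothetical record captured at one designated time."""
    rtt: float       # round trip time, in seconds
    delay: float     # delay, in seconds
    cwnd: int        # congestion window size, in bytes
    in_flight: int   # bytes sent but not yet acknowledged


def summarize_period(samples: List[Sample]) -> Dict[str, float]:
    """Aggregate the samples of one acquisition period into period statistics."""
    if not samples:
        raise ValueError("an acquisition period needs at least one sample")
    return {
        "min_rtt": min(s.rtt for s in samples),
        "avg_rtt": mean(s.rtt for s in samples),
        "avg_delay": mean(s.delay for s in samples),
        "avg_cwnd": mean(s.cwnd for s in samples),
        "avg_in_flight": mean(s.in_flight for s in samples),
    }
```

Counters such as the sent, acknowledged, and lost data sizes would simply be accumulated over the period and are omitted here.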

Optionally, the designated time further includes a second time when the packet loss event is detected;

the round trip time acquired at each second time is the round trip time acquired when the last ACK message before that second time was received;

the delay acquired at each second time is the delay acquired when the last ACK message before that second time was received;

the sending rate acquired at each second time is the sending rate acquired when the last ACK message before that second time was received;

the receiving rate acquired at each second time is the receiving rate acquired when the last ACK message before that second time was received.
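The reuse of the most recent ACK-time measurements at a packet loss event can be sketched as follows. This is a minimal illustration under assumptions: `Measurement` and `PeriodRecorder` are hypothetical names, and skipping a loss event that precedes any ACK is a choice made here, not something the application specifies.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Measurement:
    """Hypothetical measurement acquired at one designated time."""
    rtt: float
    delay: float
    send_rate: float
    recv_rate: float
    is_loss: bool = False


class PeriodRecorder:
    """Collects the measurements of one acquisition period."""

    def __init__(self) -> None:
        self.samples: List[Measurement] = []
        self._last_ack: Optional[Measurement] = None

    def on_ack(self, measurement: Measurement) -> None:
        """First time: an ACK message arrived with fresh measurements."""
        self._last_ack = measurement
        self.samples.append(measurement)

    def on_loss(self) -> None:
        """Second time: a loss was detected, so reuse the last ACK's values."""
        if self._last_ack is None:
            return  # assumption: nothing to reuse before the first ACK
        self.samples.append(replace(self._last_ack, is_loss=True))
```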

Optionally, the first reward value is negatively correlated with the average delay of the current acquisition period.

Optionally, the calculating, based on the first network state data, a reward value of the current acquisition period as a first reward value includes:

determining whether the average delay of the current acquisition period is less than a first threshold;

if so, determining the receiving rate of the current acquisition period as the first reward value of the current acquisition period;

if not, calculating the reward value of the current acquisition period based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average delay of the current acquisition period and the minimum round trip time of the current acquisition period, as the first reward value.

Optionally, the first threshold is positively correlated with a minimum round trip time of the current acquisition period.

Optionally, the first threshold is calculated based on a first formula;

the first formula is:

S=εMinRtt+ρ

where S represents the first threshold, MinRtt represents the minimum round trip time of the current acquisition period, ε represents a first preset parameter, and ρ represents a second preset parameter;

the calculating, based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average delay of the current acquisition period, and the minimum round trip time of the current acquisition period, the reward value of the current acquisition period as the first reward value includes:

calculating the reward value of the current acquisition period as the first reward value according to a second formula, based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average delay of the current acquisition period and the minimum round trip time of the current acquisition period, where the second formula is as follows:

where Reward represents the first reward value, AR represents the receiving rate of the current acquisition period, D represents the average delay of the current acquisition period, MinRtt represents the minimum round trip time of the current acquisition period, SR represents the sending rate of the current acquisition period, and δ represents a third preset parameter.
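The threshold test and the two reward branches can be sketched in Python as follows. The function names are hypothetical, and the second formula is deliberately left as a caller-supplied function (`second_formula`) rather than guessed at; only the first formula, S = ε·MinRtt + ρ, is implemented directly.

```python
from typing import Callable


def first_threshold(min_rtt: float, epsilon: float, rho: float) -> float:
    """First formula: S = epsilon * MinRtt + rho."""
    return epsilon * min_rtt + rho


def first_reward(
    send_rate: float,
    recv_rate: float,
    avg_delay: float,
    min_rtt: float,
    epsilon: float,
    rho: float,
    second_formula: Callable[[float, float, float, float], float],
) -> float:
    """Return the first reward value for one acquisition period.

    If the average delay is below the first threshold, the reward is the
    receiving rate; otherwise it is delegated to `second_formula`, which
    receives (send_rate, recv_rate, avg_delay, min_rtt).
    """
    if avg_delay < first_threshold(min_rtt, epsilon, rho):
        return recv_rate
    return second_formula(send_rate, recv_rate, avg_delay, min_rtt)
```

Because the threshold scales with MinRtt, a path with a longer base round trip time tolerates a proportionally larger average delay before the second branch applies.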

Optionally, the sending rate of the current collection period is the sending rate of the last appointed time in the current collection period, and the receiving rate of the current collection period is the receiving rate of the last appointed time in the current collection period.

Optionally, the target adjustment policy includes two or more first designated adjustment multiples greater than 1, two or more second designated adjustment multiples that are reciprocal to the two or more first designated adjustment multiples, and probabilities corresponding to each designated adjustment multiple;

The adjusting the current congestion window according to the target adjustment policy includes:

And adjusting the current congestion window according to the specified adjustment multiple with the maximum probability.
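A minimal sketch of selecting the designated adjustment multiple with the maximum probability is shown below. The helper names and the `min_cwnd` floor are assumptions added for illustration; the application only states that the multiples greater than 1 are paired with their reciprocals and that the most probable multiple is applied.

```python
from typing import List, Sequence


def build_multiples(first_multiples: Sequence[float]) -> List[float]:
    """Pair each multiple greater than 1 with its reciprocal."""
    if len(first_multiples) < 2 or any(m <= 1 for m in first_multiples):
        raise ValueError("need two or more multiples, each greater than 1")
    return list(first_multiples) + [1.0 / m for m in first_multiples]


def adjust_cwnd(
    cwnd: float,
    multiples: Sequence[float],
    probabilities: Sequence[float],
    min_cwnd: float = 1.0,  # assumption: keep the window from collapsing to zero
) -> float:
    """Scale the congestion window by the multiple with the highest probability."""
    if len(multiples) != len(probabilities):
        raise ValueError("each multiple needs exactly one probability")
    best = max(range(len(multiples)), key=lambda i: probabilities[i])
    return max(min_cwnd, cwnd * multiples[best])
```

With reciprocal pairs, an increase followed by the matching decrease returns the window to its previous size, which keeps the action set symmetric.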

Optionally, the method further comprises:

and when a preset adjustment period is reached, sending data packets at a rate lower than the current receiving rate for a first duration.

Optionally, before inputting the first network state data and the first reward value into a pre-trained adjustment policy prediction network model to obtain a target adjustment policy, the method further includes:

acquiring network state data of a preset number of historical periods before a current acquisition period as second network state data;

the inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy includes:

inputting the first network state data, the second network state data and the first reward value into the pre-trained adjustment strategy prediction network model to obtain the target adjustment strategy.
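One way to combine the current state with a preset number of historical states is a fixed-length buffer, sketched below. The `StateHistory` class, the flat input layout, and the zero padding used before enough history exists are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Sequence


class StateHistory:
    """Keeps the network state data of the last `history_len` periods."""

    def __init__(self, history_len: int, feature_dim: int) -> None:
        self._feature_dim = feature_dim
        self._history: Deque[List[float]] = deque(maxlen=history_len)
        # Assumption: pad with zeros until real history is available.
        for _ in range(history_len):
            self._history.append([0.0] * feature_dim)

    def model_input(self, current: Sequence[float], reward: float) -> List[float]:
        """Build [historical states..., current state, reward], then store current."""
        if len(current) != self._feature_dim:
            raise ValueError("unexpected number of state features")
        flat = [x for state in self._history for x in state]
        flat.extend(current)
        flat.append(reward)
        self._history.append(list(current))
        return flat
```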

Optionally, the training process of the adjustment strategy prediction network model includes the following steps:

acquiring network state data of a sample period as sample network state data;

the sample network state data includes a sending rate and a receiving rate of the sample period, wherein the sending rate of the sample period is determined based on the sending rate acquired at each designated time in the sample period, and the receiving rate of the sample period is determined based on the receiving rate acquired at each designated time in the sample period; the designated time includes a first time at which an ACK message is received; the data packet corresponding to a first time is the data packet acknowledged by the ACK message received at that first time; the receiving rate at each first time represents the rate at which data was received between the time of the last ACK message received before the data packet corresponding to the first time was sent and the first time; and the sending rate at each first time represents the rate at which data was sent between the sending time of the data packet acknowledged by that last-received ACK message and the sending time of the data packet corresponding to the first time;

calculating a second reward value of the sample period based on the sample network state data, wherein the second reward value is positively correlated with the receiving rate of the sample period and negatively correlated with the rate difference of the sample period;

inputting the sample network state data and the second reward value into an adjustment strategy prediction network model with initial parameters to obtain a sample adjustment strategy and a strategy score value;

adjusting the current congestion window according to the sample adjustment strategy;

and adjusting the model parameters of the adjustment strategy prediction network model with initial parameters based on the strategy score value and the second reward value until a convergence condition is reached.

According to a second aspect of the embodiments of the present application, there is provided a congestion control apparatus, the apparatus comprising:

the first network state acquisition module is used for acquiring network state data of a current acquisition period when a preset acquisition period is reached, and the network state data are used as first network state data;

a first reward value calculation module, configured to calculate, based on the first network state data, a reward value of a current acquisition period as a first reward value;

the target adjustment strategy acquisition module is used for inputting the first network state data and the first rewarding value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy, wherein the adjustment strategy prediction network model is obtained by training based on a reinforcement learning algorithm;

and the congestion window adjusting module is used for adjusting the current congestion window according to the target adjusting strategy.

Optionally, the first network status data further includes at least one of:

The size of the transmitted data in the current acquisition period;

The size of the lost data packet in the current acquisition period;

Optionally, the first reward value calculation module includes:

a first threshold judging sub-module, configured to determine whether the average delay of the current acquisition period is less than a first threshold, and if so, trigger the first reward value calculation sub-module, and if not, trigger the second reward value calculation sub-module;

the first reward value calculation sub-module, configured to determine the receiving rate of the current acquisition period as the first reward value of the current acquisition period;

and the second reward value calculation sub-module, configured to calculate the reward value of the current acquisition period based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average delay of the current acquisition period and the minimum round trip time of the current acquisition period, as the first reward value.

Optionally, the first threshold is calculated based on a first formula;

the first formula is:

S=εMinRtt+ρ

the second reward value calculation sub-module is configured to calculate, as the first reward value, the reward value of the current acquisition period according to a second formula based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average delay of the current acquisition period, and the minimum round trip time of the current acquisition period, where the second formula is:

the congestion window adjusting module is specifically configured to adjust a current congestion window according to a specified adjustment multiple with a maximum corresponding probability.

Optionally, the apparatus further includes:

And the sending rate adjusting module is used for sending the data packet according to the rate smaller than the current receiving rate in the first duration when the preset adjusting period is reached.

Optionally, the apparatus further includes:

The second network state acquisition module is used for acquiring network state data of a preset number of history periods before a current acquisition period as second network state data before inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy;

The target adjustment strategy obtaining module is configured to input the first network state data, the second network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy.

Optionally, the apparatus further includes:

the training module is used for acquiring network state data of a sample period and taking the network state data as sample network state data;

According to a third aspect of an embodiment of the present application, there is provided an electronic device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory perform communication with each other through the communication bus;

A memory for storing a computer program;

and a processor for implementing any of the above-described method steps when executing a program stored on the memory.

According to a fourth aspect of embodiments of the present application, there is provided a computer readable storage medium having a computer program stored therein, which when executed by a processor, implements any of the above-described method steps.

With the congestion control method provided by the embodiments of the present application, the sending rate and the receiving rate of the current acquisition period can be obtained in different network environments. Because the sending rate and the receiving rate effectively reflect the current network state, congestion control can be performed effectively based on them. There is no need to determine whether a congestion signal is actually caused by network congestion; the congestion window can be adjusted simply by obtaining the sending rate and the receiving rate of the current acquisition period. The congestion control method of the present application is therefore suitable for complex network environments and can improve the effectiveness of congestion control. In addition, because the reward value is related to the sending rate and the receiving rate of the current acquisition period, the reward mechanism drives the receiving rate and the sending rate to be as close to equal as possible. Adjusting the congestion window according to the target adjustment strategy obtained based on the reward value therefore achieves higher bandwidth utilization while avoiding the over-sending that occurs when the sending rate exceeds the receiving rate.

Of course, it is not necessary for any one product or method of practicing the application to achieve all of the advantages set forth above at the same time.

Drawings

In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the application, and other embodiments may be obtained according to these drawings to those skilled in the art.

Fig. 1 is a flowchart of a congestion control method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of calculating a sending rate and a receiving rate at a designated time according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a congestion control method according to an embodiment of the present application;

Fig. 4 is a training flowchart of an adjustment policy prediction network model in the congestion control method according to the embodiment of the present application;

Fig. 5 is a schematic diagram of generating an adjustment policy based on an adjustment policy prediction network model according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a congestion control apparatus according to an embodiment of the present application;

Fig. 7 is a schematic diagram of another structure of a congestion control apparatus according to an embodiment of the present application;

Fig. 8 is a schematic diagram of another structure of a congestion control apparatus according to an embodiment of the present application;

Fig. 9 is a schematic diagram of another structure of a congestion control apparatus according to an embodiment of the present application;

Fig. 10 is a block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. Based on the embodiments of the present application, all other embodiments obtained by the person skilled in the art based on the present application are included in the scope of protection of the present application.

In order to adapt to a complex network environment and improve the validity of congestion control, an embodiment of the present application provides a congestion control method, referring to fig. 1, fig. 1 is a flowchart of the congestion control method provided by the embodiment of the present application, where the method may include the following steps:

Step S101, when a preset acquisition period is reached, acquiring network state data of the current acquisition period as first network state data.

The first network state data includes a sending rate and a receiving rate of the current acquisition period. The sending rate of the current acquisition period is determined based on the sending rate acquired at each designated time in the current acquisition period, and the receiving rate of the current acquisition period is determined based on the receiving rate acquired at each designated time in the current acquisition period. The designated time includes a first time at which an ACK (Acknowledgement) message is received. The data packet corresponding to a first time is the data packet acknowledged by the ACK message received at that first time. The receiving rate at each first time represents the rate at which data was received between the time of the last ACK message received before the data packet corresponding to the first time was sent and the first time. The sending rate at each first time represents the rate at which data was sent between the sending time of the data packet acknowledged by that last-received ACK message and the sending time of the data packet corresponding to the first time.

Step S102, calculating a reward value of the current acquisition period as a first reward value based on the first network state data.

Step S103, inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy.

The adjustment strategy prediction network model is obtained by training based on a reinforcement learning algorithm.

Step S104, according to the target adjustment strategy, the current congestion window is adjusted.

With the congestion control method provided by the embodiments of the present application, the sending rate and the receiving rate of the current acquisition period can be obtained in different network environments. Because the sending rate and the receiving rate effectively reflect the current network state, congestion control can be performed effectively based on them. There is no need to determine whether a congestion signal is actually caused by network congestion; the congestion window can be adjusted simply by obtaining the sending rate and the receiving rate of the current acquisition period. The congestion control method of the present application is therefore suitable for complex network environments and can improve the effectiveness of congestion control. In addition, because the reward value is related to the sending rate and the receiving rate of the current acquisition period, the reward mechanism drives the receiving rate and the sending rate to be as close to equal as possible. Adjusting the congestion window according to the target adjustment strategy obtained based on the reward value therefore achieves higher bandwidth utilization while avoiding the over-sending that occurs when the sending rate exceeds the receiving rate.
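Steps S101 to S104 can be sketched as a single control step in Python. This is an illustration under assumptions: the three callables stand in for the data acquisition, reward calculation, and pre-trained model, and representing the target adjustment strategy as a mapping from adjustment multiple to probability is a choice made here for brevity.

```python
from typing import Callable, Dict, Sequence


def control_step(
    collect_state: Callable[[], Sequence[float]],
    compute_reward: Callable[[Sequence[float]], float],
    predict_policy: Callable[[Sequence[float], float], Dict[float, float]],
    cwnd: float,
) -> float:
    """Run one acquisition period and return the adjusted congestion window."""
    state = collect_state()                  # S101: first network state data
    reward = compute_reward(state)           # S102: first reward value
    policy = predict_policy(state, reward)   # S103: multiple -> probability
    multiple = max(policy, key=policy.get)   # most probable adjustment
    return cwnd * multiple                   # S104: adjust the window
```

A caller would invoke `control_step` once per acquisition period and send subsequent data packets under the returned window.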

For step S101, the duration of the preset acquisition period may be fixed. Alternatively, it may be determined based on the RTT (Round Trip Time) of the current data packet. For example, at the end of one acquisition period, the RTT of the data packet at the current time may be determined and used as the duration of the next acquisition period.

In the present application, an event occurring at a specified time may be referred to as a specified event. For example, the specified event includes an event that receives an ACK message. In each acquisition period, when a specified event occurs, current network state data including a current transmission rate and a current reception rate, that is, a transmission rate and a reception rate acquired at a current specified time, may be acquired. The network state data for an acquisition cycle is determined based on network state data acquired at each designated time within the acquisition cycle.

The transmission rate of the current acquisition period is determined based on the transmission rate acquired at each specified time in the current acquisition period. For example, the sending rate of the current collection period may be the sending rate collected at the last designated time in the current collection period, the sending rate collected at other designated times in the current collection period, or an average value of the sending rates collected at the designated times in the current collection period.

The receiving rate of the current acquisition period is determined based on the receiving rate acquired at each designated time in the current acquisition period. For example, the receiving rate of the current acquisition period may be the receiving rate acquired at the last designated time in the current acquisition period, the receiving rate acquired at another designated time in the current acquisition period, or the average of the receiving rates acquired at the designated times in the current acquisition period.
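The alternatives described for deriving a period-level rate from the per-time rates can be sketched as follows. The `period_rate` helper and its `mode` parameter are hypothetical names; only the "last designated time" and "average" options are shown.

```python
from statistics import mean
from typing import Sequence


def period_rate(rates: Sequence[float], mode: str = "last") -> float:
    """Derive the rate of an acquisition period from per-time rates.

    mode="last" uses the rate acquired at the last designated time;
    mode="mean" uses the average over all designated times.
    """
    if not rates:
        raise ValueError("no rates were acquired in this period")
    if mode == "last":
        return rates[-1]
    if mode == "mean":
        return mean(rates)
    raise ValueError(f"unknown mode: {mode}")
```

The same helper applies to both the sending rates and the receiving rates of a period.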

In network communication, a sending end sends a data packet to a receiving end, and after receiving the data packet, the receiving end sends an ACK message to the sending end to acknowledge it. On receiving the ACK message, the sending end can determine which data packet the ACK message acknowledges. The data packet corresponding to a first time is the data packet acknowledged by the ACK message received at that first time; in other words, the ACK message received at the first time acknowledges the data packet corresponding to the first time.

The sending end can record the sending time of the data packet when sending the data packet, and the sending end can record the corresponding receiving time when receiving the ACK message sent by the receiving end. The reception rate at the first time can represent a rate at which data is received in one of the history periods (which may be referred to as a first period) before the first time, and the transmission rate at the first time can represent a rate at which data is transmitted in another of the history periods (which may be referred to as a second period). The ACK message received in the first period corresponds to the data packet transmitted in the second period.

Referring to Fig. 2, Fig. 2 is a schematic diagram of calculating a sending rate and a receiving rate at a designated time according to an embodiment of the present application. The horizontal axis represents time, black rectangles represent sent data packets, and white rectangles represent received ACK messages. Data packet A is sent at time T1, ACK message A1 is received at time T2, data packet B is sent at time T3, and ACK message B1 is received at time T4. The data packet acknowledged by ACK message B1 is data packet B, the last ACK message received before data packet B was sent is ACK message A1, and the data packet acknowledged by ACK message A1 is data packet A.

When the first time is T4, it can be seen that the transmission rate at T4 is the ratio of the size of the data packet transmitted between the time T3 and the time T1 to the time interval between the time T3 and the time T1. The receiving rate at the time T4 is the ratio of the size of a data packet responded by receiving the ACK message between the time T4 and the time T2 to the time interval between the time T4 and the time T2.

The transmission rate at time T4 can be calculated based on formula (1):

$$\text{send\_rate} = \frac{\text{send}}{T3 - T1} \tag{1}$$

where send_rate represents the transmission rate at time T4, send represents the size of the data packets transmitted between time T1 and time T3, and T3 - T1 represents the time interval between time T1 and time T3.

The reception rate at time T4 can be calculated based on formula (2):

$$\text{acked\_rate} = \frac{\text{acked}}{T4 - T2} \tag{2}$$

where acked_rate represents the reception rate at time T4, acked represents the size of the data packets acknowledged by the ACK messages received between time T2 and time T4, and T4 - T2 represents the time interval between time T2 and time T4.
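The two ratios described for time T4 can be worked through with a short Python sketch. The function names and the sample byte counts and timestamps are made up for illustration.

```python
from typing import Tuple


def rate(num_bytes: float, t_start: float, t_end: float) -> float:
    """Bytes transferred per second over the interval [t_start, t_end]."""
    interval = t_end - t_start
    if interval <= 0:
        raise ValueError("t_end must be later than t_start")
    return num_bytes / interval


def rates_at_ack(
    sent_bytes: float, t1: float, t3: float,
    acked_bytes: float, t2: float, t4: float,
) -> Tuple[float, float]:
    """Return (send_rate, acked_rate) for the ACK received at time T4.

    send_rate covers data sent between T1 and T3; acked_rate covers data
    acknowledged between T2 and T4.
    """
    return rate(sent_bytes, t1, t3), rate(acked_bytes, t2, t4)
```

For example, with 3000 bytes sent between T1 = 0.0 s and T3 = 0.5 s, and 2000 bytes acknowledged between T2 = 0.25 s and T4 = 0.75 s, the sending rate is 6000 bytes per second and the receiving rate is 4000 bytes per second.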

For step S102, in one embodiment, the first reward value is positively correlated with the receiving rate of the current acquisition period and negatively correlated with the rate difference of the current acquisition period, where the rate difference represents the difference between the sending rate and the receiving rate of the current acquisition period. Because the first reward value is positively correlated with the receiving rate, a higher receiving rate yields a larger first reward value; to obtain a larger reward value, the target adjustment strategy produced by the adjustment strategy prediction network model therefore tends to increase the receiving rate. Because the first reward value is negatively correlated with the rate difference, a smaller difference between the sending rate and the receiving rate yields a larger first reward value; to obtain a larger reward value, the target adjustment strategy therefore tends to reduce the rate difference so that the receiving rate is as close to the sending rate as possible, achieving higher bandwidth utilization without over-sending.
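To make the two correlations concrete, the sketch below uses one simple reward shape that satisfies them. It is an assumed linear form chosen only for illustration, not the second formula of the application; the `penalty` weight is a hypothetical parameter.

```python
def illustrative_reward(send_rate: float, recv_rate: float, penalty: float) -> float:
    """Assumed reward shape: grows with recv_rate, shrinks with over-sending.

    The rate difference is send_rate - recv_rate; only a positive
    difference (sending faster than data is received) is penalized here.
    """
    rate_difference = max(0.0, send_rate - recv_rate)
    return recv_rate - penalty * rate_difference
```

With this shape, sending at 10 units while receiving at 8 with a penalty of 0.5 gives a reward of 7, whereas matching the sending rate to the receiving rate of 8 gives the full reward of 8, so the policy is pushed toward closing the gap.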

For step S103 and step S104, the first network state data includes data characterizing the current network state, such as the sending rate and the receiving rate of the current acquisition period. The adjustment strategy prediction network model can perceive the network state of the current acquisition period based on the first network state data and then combine it with the reward value to obtain a target adjustment strategy. The target adjustment strategy includes adjustment actions for adjusting the current congestion window and the probability corresponding to each adjustment action. For example, an adjustment action may increase the current congestion window, such as by increasing it by a preset multiple or by a fixed value. An adjustment action may also reduce the current congestion window, such as by reducing it by a preset multiple or by a fixed value. Alternatively, an adjustment action may keep the size of the congestion window unchanged. An adjustment action with a high probability can then be selected according to the probability corresponding to each adjustment action to adjust the current congestion window size, and data packets are sent according to the adjusted congestion window, thereby realizing congestion control.

In one embodiment, the sending rate of the current acquisition period is the sending rate at the last specified time in the current acquisition period, and the receiving rate of the current acquisition period is the receiving rate at the last specified time in the current acquisition period. The sending rate and the receiving rate acquired at the last specified time in the current acquisition period effectively reflect the current network state, so using them as the sending rate and the receiving rate of the current acquisition period when adjusting the congestion window improves the effectiveness of the adjustment.

In one embodiment, the first network status data may further comprise at least one of:

The minimum round trip time of the current acquisition period, which represents the minimum of the round trip times acquired at the specified times reached so far in the current acquisition period.

The average round trip time of the current acquisition period, which represents the average of the round trip times acquired at each specified time in the current acquisition period.

The average time delay of the current acquisition period, which represents the average of the time delays acquired at each specified time in the current acquisition period. The time delay acquired at a specified time may represent the difference between the round trip time acquired at that specified time and the minimum of the round trip times acquired up to that specified time.

The average congestion window size of the current acquisition period, which represents the average of the congestion window sizes acquired at each specified time in the current acquisition period.

The average in-flight data size of the current acquisition period, which represents the average of the in-flight data sizes acquired at each specified time in the current acquisition period, where the in-flight data size acquired at a specified time represents the size of the data packets that have been sent by that specified time and for which no corresponding ACK message has been received.

The size of the data sent in the current acquisition period.

The size of the data packets acknowledged by the ACK messages received in the current acquisition period.

The size of the lost data packet in the current acquisition period.

In one example, the minimum round trip time of each acquisition period may be recorded and updated during the acquisition period. During the acquisition period, the round trip time is acquired each time a specified time is reached. If the round trip time acquired at the specified time is smaller than the currently recorded minimum round trip time of the current acquisition period, the minimum round trip time of the current acquisition period is updated to the round trip time acquired at that specified time; otherwise, it is not updated. That is, the minimum round trip time of the current acquisition period represents the minimum of the round trip times acquired at the specified times reached so far in the current acquisition period.

The sending end can record the sending time of a data packet when sending it, and can record the receiving time when it receives the ACK message that the receiving end returns for that data packet. The round trip time may be calculated based on the receiving time of the ACK message and the sending time of the corresponding data packet.
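The bookkeeping described above (round trip time from send and ACK timestamps, a running minimum, and a time delay measured against that minimum) can be sketched as follows. This is an illustrative sketch only: the class name `PeriodRttStats`, its method names, and the use of millisecond timestamps are assumptions, not details from the application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PeriodRttStats:
    """RTT statistics for one acquisition period (hypothetical helper)."""
    min_rtt: Optional[float] = None
    rtts: List[float] = field(default_factory=list)
    delays: List[float] = field(default_factory=list)

    def on_ack(self, send_time_ms: float, ack_time_ms: float) -> None:
        # Round trip time = ACK receiving time minus packet sending time.
        rtt = ack_time_ms - send_time_ms
        # Update the recorded minimum only when the new sample is smaller.
        if self.min_rtt is None or rtt < self.min_rtt:
            self.min_rtt = rtt
        self.rtts.append(rtt)
        # Time delay = this RTT minus the minimum RTT acquired so far.
        self.delays.append(rtt - self.min_rtt)

    def average_rtt(self) -> float:
        return sum(self.rtts) / len(self.rtts)

    def average_delay(self) -> float:
        return sum(self.delays) / len(self.delays)


stats = PeriodRttStats()
for send_ms, ack_ms in [(0, 50), (100, 140), (200, 260)]:
    stats.on_ack(send_ms, ack_ms)
print(stats.min_rtt, stats.average_rtt(), stats.average_delay())
```

The averages assume at least one sample has been recorded in the period; an empty period would need separate handling.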

For each specified time in the current acquisition period, the size of the data packets sent so far and the size of the data packets acknowledged by the ACK messages received so far may be recorded at that specified time. The in-flight data size at the specified time may then be calculated based on the recorded data. For example, the difference between the size of the data packets sent so far and the size of the data packets acknowledged by the ACK messages received so far may be calculated as the in-flight data size.

For each specified time in the current acquisition period, the size of the data packets sent so far may be recorded at that specified time. The value recorded at the last specified time of the current acquisition period and the value recorded at the last specified time of the previous acquisition period can then be obtained, and the difference between the two values gives the size of the data sent in the acquisition period.

For each specified time in the current acquisition period, the size of the data packets acknowledged by the ACK messages received so far may be recorded at that specified time. The value recorded at the last specified time of the current acquisition period and the value recorded at the last specified time of the previous acquisition period can then be obtained, and the difference between the two values gives the size of the data packets acknowledged by the ACK messages received in the acquisition period.

Data packets may be lost because of network congestion, so the size of the lost data packets can also represent the network state in the acquisition period. For each specified time in the current acquisition period, the size of the data packets lost so far may be recorded at that specified time. The value recorded at the last specified time of the current acquisition period and the value recorded at the last specified time of the previous acquisition period can then be obtained, and the difference between the two values gives the size of the data packets lost in the acquisition period.
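The per-period quantities above can be read as differences between running totals recorded at the last specified time of consecutive acquisition periods. The sketch below illustrates that reading; the `Counters` type, the function names, and the byte values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counters:
    """Running totals (in bytes) recorded at one specified time."""
    sent: int
    acked: int
    lost: int


def in_flight(c: Counters) -> int:
    # In-flight data = data sent so far minus data acknowledged so far.
    return c.sent - c.acked


def period_totals(prev_end: Counters, curr_end: Counters) -> Counters:
    """Data sent, acknowledged and lost within the current acquisition period."""
    return Counters(
        sent=curr_end.sent - prev_end.sent,
        acked=curr_end.acked - prev_end.acked,
        lost=curr_end.lost - prev_end.lost,
    )


# Totals at the last specified time of the previous and current periods.
prev_end = Counters(sent=120_000, acked=100_000, lost=1_500)
curr_end = Counters(sent=200_000, acked=185_000, lost=3_000)
totals = period_totals(prev_end, curr_end)
print(totals, in_flight(curr_end))
```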

When congestion occurs in the network, a router that supports ECN (Explicit Congestion Notification) may set a flag in the data packet, i.e., an explicit congestion signal. After receiving a data packet marked with the explicit congestion signal, the receiving end sends a corresponding ACK message carrying the explicit congestion signal to the sending end. Therefore, the number of explicit congestion signals in the ACK messages received in one acquisition period can also represent the network state in that acquisition period.

The first network state data can include at least one item of data that describes the network state of the current acquisition period. Each item has a clear physical meaning, so the first network state data describes the network state more meaningfully, the network state of the current acquisition period can be perceived better, and the network congestion in the current acquisition period can be known clearly. This improves the adaptability of the congestion control method provided by the embodiment of the application to the network environment and yields a better control effect.

In one embodiment, the specified time further includes a second time at which the packet loss event was detected.

The round trip time acquired at each second time represents the round trip time acquired when the last ACK message before the second time was received.

The time delay acquired at each second time represents the time delay acquired when the last ACK message before the second time was received.

The sending rate acquired at each second time represents the sending rate acquired when the last ACK message before the second time was received.

When a packet loss event occurs, the sending end does not receive an ACK message, whereas the round trip time, the time delay, the sending rate and the receiving rate all need to be obtained based on an ACK message. Therefore, these network state data cannot be acquired when a packet loss event is detected. To describe the network state at the time of the packet loss event, the network state data that cannot be acquired for the packet loss event may be determined based on the data acquired when the last ACK message before the detection of the packet loss event was received.
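One way to picture this fallback is a sampler that remembers the most recent ACK-based sample and copies it when a packet loss event is detected. The sketch below is illustrative only: the `Sample` and `EventSampler` names are hypothetical, and skipping a loss that occurs before any ACK has been received is an assumption the text does not address.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Sample:
    kind: str          # "ack" (first time) or "loss" (second time)
    rtt: float
    delay: float
    send_rate: float
    acked_rate: float


class EventSampler:
    """Records one sample per specified time (hypothetical helper)."""

    def __init__(self) -> None:
        self.last_ack: Optional[Sample] = None
        self.samples: List[Sample] = []

    def on_ack(self, rtt: float, delay: float,
               send_rate: float, acked_rate: float) -> Sample:
        sample = Sample("ack", rtt, delay, send_rate, acked_rate)
        self.last_ack = sample
        self.samples.append(sample)
        return sample

    def on_loss(self) -> Optional[Sample]:
        # ACK-derived values cannot be measured on loss, so reuse the values
        # acquired when the last ACK message was received.
        if self.last_ack is None:
            return None  # Assumption: no earlier ACK means nothing to copy.
        sample = replace(self.last_ack, kind="loss")
        self.samples.append(sample)
        return sample


sampler = EventSampler()
sampler.on_ack(rtt=50.0, delay=10.0, send_rate=1000.0, acked_rate=900.0)
loss_sample = sampler.on_loss()
print(loss_sample)
```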

The minimum round trip time, the congestion window size, the in-flight data size and the size of the data sent are acquired at the second time in a manner similar to that used at the first time.

If the ACK message corresponding to a data packet is not received within a certain time, a packet loss event is considered to have occurred, and network congestion is likely when a packet loss event occurs. That is, the network state data at the time of the packet loss event effectively represents the state of the network, so congestion control can be performed based on it.

In one embodiment, the first reward value is negatively correlated with the average time delay of the current acquisition period. The first reward value decreases as the average time delay of the current acquisition period increases; that is, the average time delay is penalized through the first reward value, which discourages delay growth and helps avoid bufferbloat, so that a better congestion control effect is achieved.

In one implementation, the first reward value may be calculated according to formula (3):

reward=α×acked_rate-β×(send_rate-acked_rate)-γ×delay (3)

Where reward represents the first reward value, acked_rate is the receiving rate of the current acquisition period, send_rate is the sending rate of the current acquisition period, delay is the average time delay of the current acquisition period, and α, β and γ represent preset coefficients. For example, α and β may be 1 or 2, and γ may be 0.05 or 0.04.
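Formula (3) translates directly into code. The sketch below uses the example coefficients mentioned above as defaults; the function name and the sample inputs are illustrative, and the units of the rates and the delay are an assumption, since the text does not fix them.

```python
def reward_formula_3(acked_rate: float, send_rate: float, delay: float,
                     alpha: float = 1.0, beta: float = 1.0,
                     gamma: float = 0.05) -> float:
    """reward = alpha*acked_rate - beta*(send_rate - acked_rate) - gamma*delay."""
    return alpha * acked_rate - beta * (send_rate - acked_rate) - gamma * delay


# Same receiving rate, but the second sender overshoots more and queues more.
well_matched = reward_formula_3(acked_rate=90.0, send_rate=92.0, delay=5.0)
overshooting = reward_formula_3(acked_rate=90.0, send_rate=120.0, delay=40.0)
print(well_matched > overshooting)
```

With these inputs the closely matched sender receives the larger reward, which is the behavior the rate-difference and delay penalties are intended to produce.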

In one embodiment, calculating the reward value of the current acquisition period as the first reward value based on the first network state data (S102) includes:

Step one, judging whether the average time delay of the current acquisition period is smaller than a first threshold; if yes, executing step two, and if not, executing step three.

Step two, determining the reward value of the current acquisition period as the receiving rate of the current acquisition period.

Step three, calculating the reward value of the current acquisition period based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average time delay of the current acquisition period and the minimum round trip time of the current acquisition period.

If the average time delay of the current acquisition period is smaller than the first threshold, the reward value of the current acquisition period is determined as the receiving rate of the current acquisition period, so the larger the receiving rate, the larger the reward value. Through this reward mechanism, the receiving rate is made as close as possible to the sending rate, so that higher bandwidth utilization is obtained.

If the average time delay of the current acquisition period is not smaller than the first threshold, the reward value of the current acquisition period is calculated based on the sending rate, the receiving rate, the average time delay and the minimum round trip time of the current acquisition period. Because the reward value is related to the sending rate and the receiving rate of the current acquisition period, the reward mechanism again makes the receiving rate as close as possible to the sending rate, so that higher bandwidth utilization is obtained.

Based on the above processing, if the first threshold is set to a smaller value, the average time delay of the current acquisition period can be kept low, which reduces data queuing. If the first threshold is set to a larger value, a larger reward value can still be obtained while the average time delay stays within a wider range, so the amount of data sent can be increased, which improves the ability to compete for buffer space.

For example, the first threshold may be a fixed value set in advance.

Alternatively, the first threshold may be positively correlated with the minimum round trip time of the current acquisition period. Accordingly, when the minimum round trip time of the current acquisition period is larger, the first threshold is larger, and a larger average time delay can still be rewarded. The sending rate therefore increases, the in-flight data size grows, more data packets are kept in the queue, and the ability to compete for buffer space improves. When the minimum round trip time of the current acquisition period is smaller, the first threshold is smaller, and the average time delay of the current acquisition period can be kept low, which reduces data queuing.

For example, the first threshold is calculated based on formula (4):

S=ε×MinRtt+ρ (4)

Where S represents the first threshold, MinRtt represents the minimum round trip time of the current acquisition period, ε represents a first preset parameter, and ρ represents a second preset parameter.
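The threshold of formula (4) and the branch between step two and step three can be sketched as below. Because formula (5) is not reproduced in this text, step three is passed in as a caller-supplied function; the `stand_in_step_three` function and the values of ε and ρ are purely hypothetical placeholders and do not represent formula (5) or values from the application.

```python
from typing import Callable

# (send_rate, acked_rate, avg_delay, min_rtt) -> reward
StepThree = Callable[[float, float, float, float], float]


def first_threshold(min_rtt: float, epsilon: float, rho: float) -> float:
    """Formula (4): S = epsilon * MinRtt + rho."""
    return epsilon * min_rtt + rho


def period_reward(send_rate: float, acked_rate: float, avg_delay: float,
                  min_rtt: float, epsilon: float, rho: float,
                  step_three: StepThree) -> float:
    # Step one: compare the average time delay with the first threshold.
    if avg_delay < first_threshold(min_rtt, epsilon, rho):
        # Step two: the reward is the receiving rate itself.
        return acked_rate
    # Step three: delegate to the caller-supplied calculation.
    return step_three(send_rate, acked_rate, avg_delay, min_rtt)


def stand_in_step_three(send_rate: float, acked_rate: float,
                        avg_delay: float, min_rtt: float) -> float:
    # Hypothetical placeholder only; this is NOT formula (5).
    return acked_rate - (send_rate - acked_rate)


low_delay = period_reward(100.0, 90.0, 20.0, 40.0, 0.5, 10.0, stand_in_step_three)
high_delay = period_reward(100.0, 90.0, 35.0, 40.0, 0.5, 10.0, stand_in_step_three)
print(low_delay, high_delay)
```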

Step three includes:

Calculating the reward value of the current acquisition period as the first reward value according to formula (5), based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average time delay of the current acquisition period and the minimum round trip time of the current acquisition period:

When ε and ρ are larger, that is, when the first threshold is set larger, a larger average time delay can still be rewarded. The sending rate therefore increases, the in-flight data size grows, more data packets are kept in the queue, and the ability to compete for buffer space improves. When ε and ρ are smaller, that is, when the first threshold is set smaller, the average time delay of the current acquisition period can be kept low, which reduces data queuing.

In the prior art, the reward value for reinforcement-learning-based congestion control is calculated as a linear combination of the throughput, the average time delay and the size of the lost data packets over a period of time, as shown in formula (6):

reward=α×T+β×D+γ×L (6)

Where reward represents the reward value for the period of time, T represents the throughput in that period, D represents the average time delay in that period, L represents the size of the data packets lost in that period, and α, β and γ represent preset coefficients. However, with this calculation the reward value has only a simple linear relationship with the throughput, the average time delay and the size of the lost data packets, so adjusting the coefficients cannot adapt the calculation to all network environments, and the network environments to which it applies are limited.

In the reward value calculation provided by the embodiment of the application, the relationship between the reward value and the network state data is nonlinear, so the calculated reward value can be adapted to different network environments by adjusting the preset parameters.

In one embodiment, the target adjustment strategy includes two or more first specified adjustment multiples greater than 1, two or more second specified adjustment multiples that are the reciprocals of the first specified adjustment multiples, and the probability corresponding to each specified adjustment multiple. Accordingly, adjusting the current congestion window according to the target adjustment strategy (S104) includes:

Step S1041, adjusting the current congestion window according to the specified adjustment multiple with the maximum probability.

The specified adjustment multiples in the target adjustment strategy correspond to adjustment actions of different amplitudes for adjusting the size of the congestion window. For example, the first specified adjustment multiples may include 2.89, 1.25 and 1.05, which respectively represent adjusting the congestion window size to 2.89 times, 1.25 times and 1.05 times the current congestion window, and the second specified adjustment multiples may include 1/2.89, 1/1.25 and 1/1.05, which respectively represent adjusting the congestion window size to 1/2.89, 1/1.25 and 1/1.05 of the current congestion window. Each specified adjustment multiple has a corresponding probability, and the congestion window is adjusted according to the specified adjustment multiple with the largest probability.
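Step S1041 amounts to picking the multiple with the largest probability and scaling the window by it. A minimal sketch follows, using the example multiples from the text; the `adjust_cwnd` name, the ordering of the action list and the `min_cwnd` floor are assumptions added for illustration.

```python
from typing import Sequence

FIRST_MULTIPLES = (2.89, 1.25, 1.05)
# Second specified multiples are the reciprocals of the first ones.
ACTIONS = FIRST_MULTIPLES + tuple(1.0 / m for m in FIRST_MULTIPLES)


def adjust_cwnd(cwnd: float, probabilities: Sequence[float],
                min_cwnd: float = 1.0) -> float:
    """Scale cwnd by the specified adjustment multiple with the largest probability."""
    if len(probabilities) != len(ACTIONS):
        raise ValueError("one probability is required per adjustment multiple")
    best = max(range(len(ACTIONS)), key=lambda i: probabilities[i])
    # Assumption: the window is never allowed to fall below min_cwnd.
    return max(min_cwnd, cwnd * ACTIONS[best])


grown = adjust_cwnd(100.0, [0.1, 0.5, 0.1, 0.1, 0.1, 0.1])
shrunk = adjust_cwnd(100.0, [0.1, 0.1, 0.1, 0.5, 0.1, 0.1])
print(grown, round(shrunk, 2))
```

Because no multiple equals 1 and each multiple is distinct, every action yields a different window size for a given starting window.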

In addition, in the prior art, the adjustment actions include keeping the congestion window size unchanged, adjusting the congestion window size to 1/2 of the current congestion window, adjusting it to the current congestion window minus 10, adjusting it to the current congestion window plus 10, and adjusting it to 2 times the current congestion window. For two networks with different bandwidths, the determined adjustment action may be the same, yet the same adjustment action affects the two networks differently. Moreover, in that approach different adjustment actions may produce the same adjusted congestion window size. For example, if the congestion window size before adjustment is 10, adding 10 to the current congestion window gives 20, and doubling the current congestion window also gives 20. That is, even if different adjustment actions are determined based on different network states, the same adjustment result is obtained.

In the embodiment provided by the application, the current congestion window size is adjusted according to a specified adjustment multiple, which avoids the problem of different adjustment actions producing the same adjustment result. In addition, no specified adjustment multiple equals 1, so the congestion window size keeps changing, which drives changes in the data sending rate and makes it possible to perceive the influence of rate changes on the network state, thereby improving the effectiveness of congestion control. Selecting a larger first specified adjustment multiple (e.g., 2.89) makes the data sending rate rise rapidly; correspondingly, selecting the matching second specified adjustment multiple (e.g., 1/2.89) makes the data sending rate drop rapidly so that the link drains quickly. Adjusting the congestion window size with several specified adjustment multiples makes it possible to detect a changed network environment, so the congestion control method provided by the embodiment of the application can respond quickly to changes in the network environment.

In one embodiment, the method may further include transmitting the data packet at a rate less than the current reception rate for a first duration when the preset adjustment period is reached.

The preset adjustment period may also be referred to as the RTT probing phase. For example, when the preset adjustment period is reached, data packets are sent for the first duration at a rate equal to a preset multiple of the current receiving rate, where the preset multiple is less than 1. For example, the length of the preset adjustment period may be 10 s or 11 s, the first duration may be 190 ms or 200 ms, and the preset multiple may be 0.5 or 0.6. The preset adjustment period, the first duration and the preset multiple can be adjusted according to the actual application and are not specifically limited.

The current receiving rate may be a receiving rate acquired when the ACK message is last received before the current time.

Because network conditions may change, the current actual minimum round trip time may be much greater than the minimum round trip time recorded for the current acquisition period. Therefore, to make the recorded minimum round trip time more accurate, when the preset adjustment period is reached, data packets are sent at a rate less than the current receiving rate for the first duration. This reduces the in-flight data size, drains the link and lowers the actual minimum round trip time of the network, so that the recorded minimum round trip time is more accurate and the congestion control method provided by the embodiment of the application can adapt to links with dynamic delay.

In addition, when the first duration ends, data may continue to be sent according to the size of the congestion window before the adjustment.
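The periodic probing behavior can be sketched as a small scheduler that returns a reduced sending rate while a probing phase is active and returns None otherwise, meaning that sending is again governed by the congestion window. The class name, the method name and the convention of returning None are assumptions; the default values are taken from the examples in the text.

```python
from typing import Optional


class RttProbeScheduler:
    """Decides when to send below the receiving rate (hypothetical helper)."""

    def __init__(self, start: float, period: float = 10.0,
                 first_duration: float = 0.2, multiple: float = 0.5) -> None:
        if not 0.0 < multiple < 1.0:
            raise ValueError("the preset multiple must be between 0 and 1")
        self.period = period
        self.first_duration = first_duration
        self.multiple = multiple
        self.next_probe = start + period
        self.probe_end: Optional[float] = None

    def pacing_rate(self, now: float, current_acked_rate: float) -> Optional[float]:
        # Enter the probing phase when the preset adjustment period is reached.
        if self.probe_end is None and now >= self.next_probe:
            self.probe_end = now + self.first_duration
            self.next_probe = now + self.period
        if self.probe_end is not None:
            if now < self.probe_end:
                # Send at a preset multiple (< 1) of the current receiving rate.
                return self.multiple * current_acked_rate
            # First duration is over: fall back to the congestion window.
            self.probe_end = None
        return None


scheduler = RttProbeScheduler(start=0.0)
print(scheduler.pacing_rate(5.0, 1000.0))
print(scheduler.pacing_rate(10.0, 1000.0))
```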

In one embodiment, referring to Fig. 3, Fig. 3 is a schematic diagram of a congestion control method according to an embodiment of the present application. The sending end exchanges data with the receiving end and realizes asynchronous congestion control through an agent decision module and a data transmission module. The agent decision module is an RL Agent (Reinforcement Learning Agent). The data transmission module includes an Event State Sampler and a Round Trip State Sampler, which collect network state data when ACK messages are received and when packet loss events are detected. When the timing module determines that the preset acquisition period is reached, the round trip state sampler can integrate the network state data acquired by the event state sampler at each specified time in the current acquisition period to obtain the first network state data of the current acquisition period.

The data transmission module calculates a first reward value based on the first network state data and communicates the first network state data and the first reward value to the agent decision module. For example, the data transmission module may further include a calculation sub-module to which the round-trip state sampler may transmit first network state data, and in response, the calculation sub-module may calculate a first reward value based on the first network state data and communicate the first network state data and the first reward value to the agent decision module.

The agent decision module obtains a target adjustment strategy based on the first network state data and the first reward value, selects a specified adjustment multiple (i.e., an adjustment action) from the target adjustment strategy, and sends the adjustment action to the data transmission module. The data transmission module adjusts the size of the congestion window according to the adjustment action. For example, the agent decision module may process the first network state data and the first reward value with the adjustment strategy prediction network model shown in Fig. 5 to obtain the target adjustment strategy.

When the RTT probing phase is reached, that is, when the preset adjustment period is reached, the data transmission module sends data packets at a rate less than the current receiving rate for the first duration. For example, the data transmission module may further include a sending adjustment sub-module, which sends data packets at a rate less than the current receiving rate for the first duration when the preset adjustment period is reached.

Because the first reward value is calculated based on the sending rate, the receiving rate and the time delay, the reward calculation has a clear meaning and can effectively guide the agent decision module (i.e., the reinforcement learning agent) to converge and make good control decisions.

In one embodiment, before inputting the first network state data and the first reward value into the pre-trained adjustment strategy prediction network model to obtain the target adjustment strategy (S103), the method further includes obtaining the network state data of a preset number of history periods before the current acquisition period as second network state data.

Correspondingly, inputting the first network state data and the first reward value into the pre-trained adjustment strategy prediction network model to obtain the target adjustment strategy (S103) includes:

Step S1031, inputting the first network state data, the second network state data and the first reward value into the pre-trained adjustment strategy prediction network model to obtain the target adjustment strategy.

The network state data of the preset number of history periods can represent the network state at the history moment, and further, the network state data of the preset number of history periods and the network state data of the current acquisition period are combined, so that the change condition of the network state in a certain time period can be represented. Furthermore, based on the data, the adjustment strategy prediction network model can make more reasonable adjustment strategy selection, and the adjustment strategy which is more suitable for the current network state is determined, so that better congestion control effect can be obtained.
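Combining the history periods with the current period can be done with a fixed-length buffer whose contents are concatenated in front of the current state. The sketch below is illustrative: the `StateHistory` class, the flat-list layout and the zero padding used before enough history exists are assumptions not specified in the text.

```python
from collections import deque
from typing import Deque, List, Sequence


class StateHistory:
    """Keeps the network state data of the last few periods (hypothetical helper)."""

    def __init__(self, history_len: int, state_dim: int) -> None:
        self.history_len = history_len
        self.state_dim = state_dim
        self.history: Deque[List[float]] = deque(maxlen=history_len)

    def model_input(self, current: Sequence[float]) -> List[float]:
        if len(current) != self.state_dim:
            raise ValueError("unexpected state dimension")
        # Assumption: missing history periods are padded with zeros.
        missing = self.history_len - len(self.history)
        flat: List[float] = [0.0] * (missing * self.state_dim)
        for past in self.history:          # second network state data
            flat.extend(past)
        flat.extend(current)               # first network state data
        # The current period becomes history for the next call.
        self.history.append(list(current))
        return flat


history = StateHistory(history_len=2, state_dim=2)
print(history.model_input([1.0, 2.0]))
print(history.model_input([3.0, 4.0]))
print(history.model_input([5.0, 6.0]))
```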

Fig. 4 is a training flowchart of the adjustment strategy prediction network model in the congestion control method according to the embodiment of the present application. Referring to Fig. 4, the training process of the adjustment strategy prediction network model includes the following steps:

Step S401, acquiring network state data of a sample period as sample network state data.

The sample network state data includes the sending rate and the receiving rate of the sample period. The sending rate of the sample period is determined based on the sending rate at each specified time in the sample period, and the receiving rate of the sample period is determined based on the receiving rate at each specified time in the sample period. The specified time includes a first time at which an ACK message is received. The data packet corresponding to a first time is the data packet acknowledged by the ACK message received at that first time. The receiving rate at each first time represents the rate at which data is received between the time at which the last ACK message was received before the data packet corresponding to the first time was sent, and the first time. The sending rate at each first time represents the rate at which data is sent between the sending time of the data packet acknowledged by that last received ACK message and the sending time of the data packet corresponding to the first time.

Step S402, calculating a second reward value for the sample period based on the sample network state data.

The second reward value is positively correlated with the receiving rate of the sample period and negatively correlated with the rate difference, where the rate difference represents the difference between the sending rate of the sample period and the receiving rate of the sample period.

Step S403, inputting the sample network state data and the second reward value into the adjustment strategy prediction network model with initial parameters to obtain a sample adjustment strategy and a strategy grading value.

Step S404, adjusting the current congestion window according to the sample adjustment strategy.

Step S405, adjusting the model parameters of the adjustment strategy prediction network model with initial parameters based on the strategy grading value and the second reward value until a convergence condition is reached.

The adjustment strategy prediction network model can be implemented based on a deep reinforcement learning network model.
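The order of steps S401 to S405 can be summarized as a loop over sample periods. The skeleton below shows only that control flow: the `Environment` and `Model` interfaces, their method names and the convention that `update` returns True on convergence are all hypothetical, and the actual parameter update rule is left to the model because the text does not specify it.

```python
from typing import Callable, Protocol, Sequence, Tuple

State = Sequence[float]
Policy = Sequence[float]


class Environment(Protocol):
    def sample_period(self) -> State: ...
    def apply(self, policy: Policy) -> None: ...


class Model(Protocol):
    def predict(self, state: State, reward: float) -> Tuple[Policy, float]: ...
    def update(self, score: float, reward: float) -> bool: ...


def train(env: Environment, model: Model,
          reward_fn: Callable[[State], float], max_periods: int) -> int:
    """Run the training steps until convergence; return the periods used."""
    for period in range(1, max_periods + 1):
        state = env.sample_period()                    # S401: sample state
        reward = reward_fn(state)                      # S402: second reward
        policy, score = model.predict(state, reward)   # S403: strategy, grade
        env.apply(policy)                              # S404: adjust cwnd
        if model.update(score, reward):                # S405: update, check
            return period
    return max_periods
```

Any object providing these methods can be passed in, so a simulated link and a real sender could share the same loop.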

Referring to Fig. 5, Fig. 5 is a schematic diagram of generating an adjustment strategy based on the adjustment strategy prediction network model according to an embodiment of the present application. The network model contains two fully connected layers (fully connected layer 1 and fully connected layer 2), two activation layers (activation layer 1 and activation layer 2) and a target network layer. Each fully connected layer may contain 512 neuron nodes. The network state data is input into the network model to obtain the feature data output by activation layer 2. The feature data and the reward value are then input together into the target network layer to obtain a strategy and an evaluation value. The strategy includes adjustment actions for adjusting the congestion window and the probability corresponding to each adjustment action, and the evaluation value represents a score for the strategy. The target network layer may be an LSTM (Long Short-Term Memory) layer or a fully connected layer. In the training process, the congestion window is adjusted according to the probabilities corresponding to the adjustment actions in the obtained strategy, a loss value is calculated based on the strategy and the evaluation value, and the model parameters of the adjustment strategy prediction network model are adjusted based on the loss value. Correspondingly, in the deployment stage, the congestion window can be adjusted directly according to the strategy output by the target network layer.
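The data flow of Fig. 5 can be illustrated with a small forward pass written in plain Python. This is a sketch under several assumptions: the activation layers are taken to be ReLU, the target network layer is the fully connected variant, the action probabilities come from a softmax, and the weights are random and untrained. None of these choices is stated in the text, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def softmax(x: Sequence[float]) -> List[float]:
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


class PolicyNet:
    def __init__(self, state_dim: int, n_actions: int,
                 hidden: int = 512, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.fc1 = init_layer(state_dim, hidden, rng)
        self.fc2 = init_layer(hidden, hidden, rng)
        # Target network layer: takes features plus the reward value and
        # outputs one logit per action followed by one evaluation value.
        self.target = init_layer(hidden + 1, n_actions + 1, rng)

    def forward(self, state: Sequence[float],
                reward: float) -> Tuple[List[float], float]:
        features = relu(dense(state, self.fc1))      # FC 1 + activation 1
        features = relu(dense(features, self.fc2))   # FC 2 + activation 2
        out = dense(features + [reward], self.target)
        return softmax(out[:-1]), out[-1]


net = PolicyNet(state_dim=4, n_actions=6, hidden=16)
probabilities, evaluation = net.forward([1.0, 0.9, 0.02, 0.5], reward=0.8)
print(len(probabilities), round(sum(probabilities), 6))
```

The example uses a hidden size of 16 to keep the pure-Python arithmetic fast; the default of 512 matches the layer width mentioned in the text.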

Corresponding to the foregoing method embodiment, the embodiment of the present application further provides a congestion control apparatus. Referring to Fig. 6, Fig. 6 is a schematic structural diagram of the congestion control apparatus provided in the embodiment of the present application, and the apparatus may include:

The first network state acquisition module 601 is configured to acquire network state data of a current acquisition period as first network state data when a preset acquisition period is reached;

A first reward value calculation module 602, configured to calculate, based on the first network state data, a reward value of the current acquisition period as a first reward value;

The target adjustment policy obtaining module 603 is configured to input the first network state data and the first reward value into a pre-trained adjustment policy prediction network model to obtain a target adjustment policy, where the adjustment policy prediction network model is obtained by training based on a reinforcement learning algorithm;

And the congestion window adjusting module 604 is configured to adjust a current congestion window according to the target adjustment policy.

By applying the congestion control device provided by the embodiment of the present application, the sending rate and the receiving rate of the current acquisition period can be obtained for different network environments. Because the sending rate and the receiving rate effectively reflect the current network state, congestion control can be effectively realized based on them. There is no need to distinguish whether a congestion signal is caused by network congestion; the congestion window can be adjusted simply by acquiring the sending rate and the receiving rate of the current acquisition period. Therefore, the congestion control device of the present application can be applied to complex network environments and can improve the effectiveness of congestion control. In addition, since the reward value is related to the sending rate and the receiving rate of the current acquisition period, the reward mechanism drives the receiving rate and the sending rate to be as equal as possible. Adjusting the congestion window according to the target adjustment policy obtained based on the reward value therefore achieves a higher bandwidth utilization and avoids the excessive transmission that occurs when the sending rate is greater than the receiving rate.

In one embodiment, the first reward value is positively correlated with the receiving rate of the current acquisition period and negatively correlated with the rate difference of the current acquisition period, where the rate difference of the current acquisition period represents the difference between the sending rate of the current acquisition period and the receiving rate of the current acquisition period.

In one embodiment, the first network status data further comprises at least one of:

The average delay of the current acquisition period, which represents the average of the delays acquired at each specified time in the current acquisition period, where the delay acquired at a specified time represents the difference between the round trip time acquired at that specified time and the minimum of the round trip times acquired up to that specified time;

The size of the transmitted data in the current acquisition period;

The size of the lost data packet in the current acquisition period;
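As a hedged illustration of the average delay defined above, the sketch below takes the round trip times collected at successive specified times and subtracts from each one the smallest round trip time seen so far. The function name `average_delay` is hypothetical, and restricting the running minimum to the samples passed in is an assumption.

```python
def average_delay(rtt_samples):
    """Return (average delay, minimum RTT) for RTT samples given in seconds.

    rtt_samples holds the round trip times collected at successive
    specified times within one acquisition period.
    """
    if not rtt_samples:
        raise ValueError("at least one RTT sample is required")
    min_rtt = float("inf")
    delays = []
    for rtt in rtt_samples:
        # Minimum over the round trip times collected up to this time.
        min_rtt = min(min_rtt, rtt)
        delays.append(rtt - min_rtt)
    return sum(delays) / len(delays), min_rtt
```

With samples of 50 ms, 40 ms, 60 ms and 45 ms, the per-sample delays are 0, 0, 20 ms and 5 ms, so the average delay is 6.25 ms and the minimum round trip time is 40 ms.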

In one embodiment, the specified time further includes a second time at which the packet loss event was detected;

In one embodiment, the first reward value is negatively correlated with the average delay of the current acquisition period.

In one embodiment, referring to fig. 7, fig. 7 is another schematic structural diagram of a congestion control device according to an embodiment of the present application, and the first reward value calculation module 602 includes:

A first threshold judging submodule 6021, configured to judge whether the average delay of the current acquisition period is smaller than a first threshold; if yes, the first reward value calculation submodule 6022 is triggered, and if not, the second reward value calculation submodule 6023 is triggered;

A first reward value calculation submodule 6022, configured to determine the first reward value of the current acquisition period as the receiving rate of the current acquisition period;

A second reward value calculation submodule 6023, configured to calculate the reward value of the current acquisition period as the first reward value based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average delay of the current acquisition period and the minimum round trip time of the current acquisition period.

In one embodiment, the first threshold is positively correlated with the minimum round trip time of the current acquisition cycle.

In one embodiment, the first threshold is calculated based on equation (4) above;

The second reward value calculation submodule 6023 is specifically configured to calculate, as the first reward value, the reward value of the current acquisition period according to the above formula (5) based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average delay of the current acquisition period, and the minimum round trip time of the current acquisition period.
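The two-branch reward calculation can be sketched as follows. The threshold uses the form S = ε·MinRtt + ρ given in the claims. The expression in the second branch is NOT formula (5), which is not reproduced in this part of the text; it is a hypothetical stand-in chosen only to satisfy the stated correlations (increasing in the receiving rate, decreasing in the rate difference and in the average delay). The default parameter values are likewise assumptions.

```python
def first_threshold(min_rtt, epsilon, rho):
    """First threshold S = epsilon * MinRtt + rho (form stated in the claims)."""
    return epsilon * min_rtt + rho


def reward_value(send_rate, recv_rate, avg_delay, min_rtt,
                 epsilon=0.5, rho=0.0, delta=1.0):
    """Reward for one acquisition period.

    epsilon, rho and delta are preset parameters; the defaults here are
    illustrative assumptions, not values from the source.
    """
    if avg_delay < first_threshold(min_rtt, epsilon, rho):
        # Low delay: the reward is simply the receiving rate.
        return recv_rate
    # Hypothetical placeholder for formula (5): grows with the receiving
    # rate, shrinks with the average delay (relative to MinRtt) and with
    # the rate difference (sending rate minus receiving rate).
    rate_difference = send_rate - recv_rate
    return recv_rate / (1.0 + avg_delay / min_rtt) - delta * rate_difference
```

For instance, with rates in Mbit/s and times in seconds, `reward_value(10, 10, 0.001, 0.040)` stays below the threshold of 0.02 s and returns the receiving rate, 10, while `reward_value(12, 10, 0.040, 0.040)` takes the penalized branch and returns 3.0 under the placeholder expression.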

In one embodiment, the sending rate of the current acquisition period is the sending rate acquired at the last specified time in the current acquisition period, and the receiving rate of the current acquisition period is the receiving rate acquired at the last specified time in the current acquisition period.

In one embodiment, the target adjustment policy includes two or more first specified adjustment multiples of greater than 1, two or more second specified adjustment multiples that are reciprocal to the two or more first specified adjustment multiples, and a probability corresponding to each specified adjustment multiple;

The congestion window adjusting module 604 is specifically configured to adjust the current congestion window according to a specified adjustment multiple with the largest corresponding probability.
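A minimal sketch of this selection step is shown below. The concrete multiples (1.25 and 2.0) and the lower bound `min_cwnd` are illustrative assumptions; the source only requires two or more multiples greater than 1 together with their reciprocals.

```python
def build_multiples(increase_multiples):
    """Return the increase multiples followed by their reciprocals."""
    if len(increase_multiples) < 2 or any(m <= 1.0 for m in increase_multiples):
        raise ValueError("need two or more multiples, each greater than 1")
    return list(increase_multiples) + [1.0 / m for m in increase_multiples]


def adjust_cwnd(cwnd, multiples, probabilities, min_cwnd=2.0):
    """Scale cwnd by the multiple whose probability is the largest.

    min_cwnd is a hypothetical safety floor, not part of the source.
    """
    if len(multiples) != len(probabilities):
        raise ValueError("one probability is required per multiple")
    best = max(range(len(multiples)), key=lambda i: probabilities[i])
    return max(min_cwnd, cwnd * multiples[best])
```

With `multiples = build_multiples([1.25, 2.0])`, the candidate set is 1.25, 2.0, 0.8 and 0.5. If the model assigns the probabilities 0.1, 0.2, 0.6 and 0.1, a window of 100 packets is scaled by 0.8.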

In an embodiment, referring to fig. 8, fig. 8 is another schematic structural diagram of a congestion control apparatus provided in an embodiment of the present application, where the apparatus further includes:

The sending rate adjusting module 605 is configured to send the data packet at a rate less than the current receiving rate in the first duration when the preset adjustment period is reached.

In embodiments of the present application, the sending rate adjustment module 605 and other modules in the congestion control device may be processed asynchronously.
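One way to picture the periodic rate reduction is the sketch below: once every adjustment period, for the first duration of that period, the sender paces below the current receiving rate. The function name `pacing_rate`, the period and duration values, and the scaling factor `drain_factor` are all assumptions made for illustration.

```python
def pacing_rate(now, recv_rate, normal_rate,
                adjust_period=10.0, first_duration=0.2, drain_factor=0.8):
    """Return the rate to send at, given `now` in seconds since start.

    adjust_period, first_duration and drain_factor are hypothetical
    values; drain_factor must be below 1 so that the reduced rate is
    lower than the current receiving rate.
    """
    phase = now % adjust_period
    in_reduced_window = now >= adjust_period and phase < first_duration
    if in_reduced_window:
        # Send below the current receiving rate for the first duration.
        return min(normal_rate, drain_factor * recv_rate)
    return normal_rate
```

Because this decision depends only on the clock and the current receiving rate, it can run independently of the congestion window adjustment, which is consistent with the asynchronous processing mentioned above.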

In one embodiment, the apparatus further comprises:

A second network state acquisition module, configured to acquire, as second network state data, network state data of a preset number of history periods before the current acquisition period, before the first network state data and the first reward value are input into the pre-trained adjustment strategy prediction network model to obtain the target adjustment strategy;

The target adjustment policy obtaining module 603 is specifically configured to input the first network state data, the second network state data and the first reward value into the pre-trained adjustment strategy prediction network model to obtain the target adjustment strategy.
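Combining the current state with a fixed number of history periods can be sketched with a bounded queue. The class name `StateHistory`, the oldest-first ordering, and the zero padding used before enough history exists are assumptions.

```python
from collections import deque


class StateHistory:
    """Keeps the network state data of the last `history_len` periods."""

    def __init__(self, history_len, state_dim):
        # Zero padding until enough periods have been observed (assumption).
        self.buf = deque([[0.0] * state_dim for _ in range(history_len)],
                         maxlen=history_len)

    def model_input(self, current):
        """Return history (oldest first) followed by the current state."""
        flat = [v for state in self.buf for v in state] + list(current)
        # The current period becomes part of the history for the next call.
        self.buf.append(list(current))
        return flat
```

With `history_len=2` and two values per period, three consecutive calls with `[1, 2]`, `[3, 4]` and `[5, 6]` produce inputs ending in the current state, and the third call returns `[1, 2, 3, 4, 5, 6]`.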

In an embodiment, referring to fig. 9, fig. 9 is another schematic structural diagram of a congestion control apparatus provided in an embodiment of the present application, where the congestion control apparatus further includes:

A training module 606, configured to obtain network state data of a sample period as sample network state data;

The sample network state data comprises a sending rate and a receiving rate of the sample period. The sending rate of the sample period is determined based on the sending rate acquired at each specified time in the sample period, and the receiving rate of the sample period is determined based on the receiving rate acquired at each specified time in the sample period. The specified time includes a first time at which an ACK message is received. The data packet corresponding to a first time is the data packet acknowledged by the ACK message received at that first time. The receiving rate of each first time represents the rate at which data is received between the time at which the last ACK message was received before the data packet corresponding to the first time was sent, and the first time. The sending rate of each first time represents the rate at which data is sent between the sending time of the data packet acknowledged by that last received ACK message, and the sending time of the data packet corresponding to the first time;
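The per-ACK rate definitions above can be made concrete with a small bookkeeping sketch. Each sent packet stores a snapshot of the counters at its sending time, and the two rates are computed when its ACK arrives. The names `SentRecord` and `RateSampler`, the byte-based units, and returning `None` when no earlier ACK exists are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SentRecord:
    size: int
    send_time: float
    sent_total: int                        # bytes sent up to and including this packet
    acked_at_send: int                     # bytes acknowledged when this packet was sent
    prev_ack_time: Optional[float]         # arrival time of the last ACK before sending
    prev_acked_send_time: Optional[float]  # sending time of the packet that ACK acknowledged
    prev_acked_sent_total: int             # bytes sent up to and including that packet


class RateSampler:
    def __init__(self) -> None:
        self.sent_total = 0
        self.acked_total = 0
        self.last_ack_time: Optional[float] = None
        self.last_acked_send_time: Optional[float] = None
        self.last_acked_sent_total = 0
        self.records: Dict[int, SentRecord] = {}

    def on_send(self, seq: int, size: int, now: float) -> None:
        self.sent_total += size
        self.records[seq] = SentRecord(
            size, now, self.sent_total, self.acked_total,
            self.last_ack_time, self.last_acked_send_time,
            self.last_acked_sent_total)

    def on_ack(self, seq: int, now: float) -> Tuple[Optional[float], Optional[float]]:
        """Return (sending rate, receiving rate) in bytes per second."""
        rec = self.records.pop(seq)
        self.acked_total += rec.size
        send_rate = recv_rate = None
        if rec.prev_ack_time is not None and now > rec.prev_ack_time:
            # Data acknowledged between the last ACK before sending and now.
            recv_rate = (self.acked_total - rec.acked_at_send) / (now - rec.prev_ack_time)
        if rec.prev_acked_send_time is not None and rec.send_time > rec.prev_acked_send_time:
            # Data sent between that earlier packet's sending time and this packet's.
            send_rate = ((rec.sent_total - rec.prev_acked_sent_total)
                         / (rec.send_time - rec.prev_acked_send_time))
        self.last_ack_time = now
        self.last_acked_send_time = rec.send_time
        self.last_acked_sent_total = rec.sent_total
        return send_rate, recv_rate
```

As a worked case with 1000-byte packets: packets 1 and 2 are sent at 0.0 s and 0.1 s, the ACK for packet 1 arrives at 1.0 s, packet 3 is sent at 1.5 s, and the ACKs for packets 2 and 3 arrive at 2.0 s and 3.0 s. For packet 3, 2000 bytes were acknowledged between 1.0 s and 3.0 s, giving a receiving rate of 1000 bytes/s, and 2000 bytes were sent between 0.0 s and 1.5 s, giving a sending rate of about 1333 bytes/s.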

The embodiment of the present application also provides an electronic device, as shown in fig. 10, which comprises a processor 1001, a communication interface 1002, a memory 1003 and a communication bus 1004, wherein the processor 1001, the communication interface 1002 and the memory 1003 communicate with each other through the communication bus 1004;

A memory 1003 for storing a computer program;

The processor 1001 is configured to execute a program stored in the memory 1003, and implement the following steps:

when a first preset period is reached, acquiring network state data of a current acquisition period as first network state data;

The first network state data comprises a sending rate and a receiving rate of the current acquisition period. The sending rate of the current acquisition period is determined based on the sending rate acquired at each specified time in the current acquisition period, and the receiving rate of the current acquisition period is determined based on the receiving rate acquired at each specified time in the current acquisition period. The specified time includes a first time at which an ACK message is received. The data packet corresponding to a first time is the data packet acknowledged by the ACK message received at that first time. The receiving rate of each first time represents the rate at which data is received between the time at which the last ACK message was received before the data packet corresponding to the first time was sent, and the first time. The sending rate of each first time represents the rate at which data is sent between the sending time of the data packet acknowledged by that last received ACK message, and the sending time of the data packet corresponding to the first time;

inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy, wherein the adjustment strategy prediction network model is obtained by training based on a reinforcement learning algorithm;

The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.

The communication interface is used for communication between the electronic device and other devices.

The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment of the present application, there is also provided a computer readable storage medium having stored therein a computer program which when executed by a processor implements the steps of any of the congestion control methods described above.

In yet another embodiment of the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the congestion control methods of the above embodiments.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.

It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

In this specification, the embodiments are described in a related manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, since the device embodiments are substantially similar to the method embodiments, their description is relatively brief, and the relevant parts of the method embodiments may be referred to.

The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims

1. A congestion control method, characterized in that the method comprises:

When the preset collection period is reached, the network status data of the current collection period is obtained as the first network status data;

Wherein, the first network status data includes the sending rate and receiving rate of the current collection cycle; the sending rate of the current collection cycle is determined based on the sending rate collected at each specified time in the current collection cycle; the receiving rate of the current collection cycle is determined based on the receiving rate collected at each specified time in the current collection cycle; the specified time includes the first time when the ACK message is received; the receiving rate of each first time represents: the rate of receiving data between the time when the last ACK message is received before the data packet corresponding to the first time is sent and the first time; the data packet corresponding to the first time represents the data packet responded by the ACK message received at the first time; the sending rate of each first time represents: the rate of sending data between the sending time of the data packet responded by the last received ACK message and the sending time of the data packet corresponding to the first time;

Based on the first network status data, calculating a reward value of a current collection cycle as a first reward value;

Inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy; wherein the adjustment strategy prediction network model is trained based on a reinforcement learning algorithm;

According to the target adjustment strategy, the current congestion window is adjusted;

The first network status data also includes:

The minimum round-trip time of the current collection cycle means: in the current collection cycle, when each specified time is reached, the minimum value of the round-trip time of each specified time that has been collected;

The average latency of the current collection period is: the average value of the latency collected at each specified time in the current collection period;

The step of calculating the reward value of the current collection cycle based on the first network status data as the first reward value includes:

Determining whether the average delay of the current collection period is less than a first threshold;

If yes, then determining the first reward value of the current acquisition period as the receiving rate of the current acquisition period;

If not, the reward value of the current collection cycle is calculated as the first reward value based on the sending rate of the current collection cycle, the receiving rate of the current collection cycle, the average delay of the current collection cycle, and the minimum round-trip time of the current collection cycle.

2. The method according to claim 1, characterized in that the first reward value is positively correlated with the receiving rate of the current acquisition cycle, and negatively correlated with the rate difference of the current acquisition cycle; the rate difference of the current acquisition cycle represents the difference between the sending rate of the current acquisition cycle and the receiving rate of the current acquisition cycle.

3. The method according to claim 1, wherein the first network status data further comprises at least one of the following:

The average round-trip time of the current collection cycle means: the average round-trip time collected at each specified time in the current collection cycle;

The average congestion window size of the current collection period is: the average value of the congestion window sizes collected at each specified time in the current collection period;

The average in-flight data size of the current collection period indicates: the average value of the in-flight data size collected at each specified time in the current collection period; the in-flight data size collected at a specified time indicates: the size of the data packet that has been sent at the specified time and has not received the corresponding ACK message;

The size of data sent during the current collection cycle;

The size of the data packet in response to the ACK message received during the current collection cycle;

The size of the data packets lost in the current collection cycle;

The number of ACK messages received during the current collection period that indicate congestion signals.

4. The method according to claim 3, characterized in that the specified time also includes a second time when a packet loss event is detected;

The round trip time collected at each second moment represents: the round trip time collected when the last ACK message was received before the second moment;

The delay collected at each second moment represents: the delay collected when the last ACK message was received before the second moment;

The sending rate collected at each second moment represents: the sending rate collected when the last ACK message was received before the second moment;

The receiving rate collected at each second moment represents: the receiving rate collected when the last ACK message was received before the second moment.

5. The method according to claim 3, characterized in that the first reward value is negatively correlated with the average delay of the current acquisition cycle.

6. The method according to claim 1, wherein the first threshold is positively correlated with a minimum round trip time of a current acquisition cycle.

7. The method according to claim 6, characterized in that the first threshold is calculated based on a first formula;

The first formula is:

S = ε × MinRtt + ρ

Wherein, S represents the first threshold, MinRtt represents the minimum round trip time of the current acquisition cycle, ε represents the first preset parameter, and ρ represents the second preset parameter;

The reward value of the current collection cycle is calculated as the first reward value based on the sending rate of the current collection cycle, the receiving rate of the current collection cycle, the average delay of the current collection cycle, and the minimum round-trip time of the current collection cycle, including:

Based on the sending rate of the current collection cycle, the receiving rate of the current collection cycle, the average delay of the current collection cycle, and the minimum round-trip time of the current collection cycle, the reward value of the current collection cycle is calculated according to the second formula as the first reward value; wherein the second formula is:

Wherein, reward represents the first reward value, AR represents the receiving rate of the current collection cycle, D represents the average delay of the current collection cycle, MinRtt represents the minimum round-trip time of the current collection cycle, SR represents the sending rate of the current collection cycle, and δ represents the third preset parameter.

8. The method according to claim 1, characterized in that the target adjustment strategy includes two or more first specified adjustment multiples greater than 1, two or more second specified adjustment multiples that are reciprocal of the two or more first specified adjustment multiples, and a probability corresponding to each specified adjustment multiple;

The adjusting the current congestion window according to the target adjustment strategy includes:

The current congestion window is adjusted according to the specified adjustment multiple with the maximum corresponding probability.

9. The method according to claim 1, characterized in that the method further comprises:

When the preset adjustment period is reached, data packets are sent at a rate lower than the current receiving rate within a first time period.

10. The method according to claim 1, characterized in that the training process of the adjustment strategy prediction network model comprises the following steps:

Obtaining network status data of a sample period as sample network status data;

Wherein, the sample network status data includes: the sending rate and receiving rate of the sample period; the sending rate of the sample period is determined based on the sending rate collected at each specified time in the sample period; the receiving rate of the sample period is determined based on the receiving rate collected at each specified time in the sample period; the specified time includes the first time when the ACK message is received; the receiving rate of each first time represents: the rate of receiving data between the time when the last ACK message is received before the data packet corresponding to the first time is sent and the first time; the data packet corresponding to the first time represents the data packet responded by the ACK message received at the first time; the sending rate of each first time represents: the rate of sending data between the sending time of the data packet responded by the last received ACK message and the sending time of the data packet corresponding to the first time;

Based on the sample network status data, a second reward value of the sample period is calculated; wherein the second reward value is positively correlated with the receiving rate of the sample period, and negatively correlated with the rate difference of the sample period; the rate difference represents the difference between the sending rate of the sample period and the receiving rate of the sample period;

Inputting the sample network state data and the second reward value into the adjustment strategy prediction network model with initial parameters to obtain a sample adjustment strategy and a strategy score value;

Adjusting the current congestion window according to the sample adjustment strategy;

Based on the strategy score value and the second reward value, adjusting the model parameters of the adjustment strategy prediction network model with initial parameters until a convergence condition is reached.

11. A congestion control device, characterized in that the device comprises:

A first network status collection module, used for acquiring network status data of a current collection period as first network status data when a preset collection period is reached;

A first reward value calculation module, used to calculate the reward value of the current collection cycle based on the first network status data as a first reward value;

A target adjustment strategy acquisition module, used to input the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy; wherein the adjustment strategy prediction network model is obtained by training based on a reinforcement learning algorithm;

A congestion window adjustment module, used to adjust the current congestion window according to the target adjustment strategy;

The first network status data also includes:

The first reward value calculation module includes:

A first threshold judgment submodule, used to judge whether the average delay of the current acquisition cycle is less than a first threshold, and if so, trigger the first reward value calculation submodule, and if not, trigger the second reward value calculation submodule;

A first reward value calculation submodule, used to determine the first reward value of the current acquisition cycle as the receiving rate of the current acquisition cycle;

The second reward value calculation submodule is used to calculate the reward value of the current collection cycle as the first reward value based on the sending rate of the current collection cycle, the receiving rate of the current collection cycle, the average delay of the current collection cycle, and the minimum round-trip time of the current collection cycle.

12. The device according to claim 11, characterized in that the first reward value is positively correlated with the receiving rate of the current acquisition cycle, and negatively correlated with the rate difference of the current acquisition cycle; the rate difference of the current acquisition cycle represents the difference between the sending rate of the current acquisition cycle and the receiving rate of the current acquisition cycle.

13. The device according to claim 11, wherein the first network status data further comprises at least one of the following:

The size of data sent during the current collection cycle;

The size of the data packets lost in the current collection cycle;

14. The device according to claim 11, wherein the specified time also includes a second time when a packet loss event is detected;

15. The device according to claim 11, characterized in that the first reward value is negatively correlated with the average delay of the current acquisition cycle.

16. The device according to claim 11, wherein the first threshold is positively correlated with a minimum round trip time of a current acquisition cycle.

17. The device according to claim 16, wherein the first threshold is calculated based on a first formula;

The first formula is:

S = ε × MinRtt + ρ

The second reward value calculation submodule is used to calculate the reward value of the current collection cycle as the first reward value according to the second formula based on the sending rate of the current collection cycle, the receiving rate of the current collection cycle, the average delay of the current collection cycle, and the minimum round-trip time of the current collection cycle; wherein the second formula is:

18. The device according to claim 11, characterized in that the target adjustment strategy includes two or more first specified adjustment multiples greater than 1, two or more second specified adjustment multiples that are reciprocal of the two or more first specified adjustment multiples, and a probability corresponding to each specified adjustment multiple;

The congestion window adjustment module is specifically used to adjust the current congestion window according to the designated adjustment multiple with the maximum corresponding probability.

19. The device according to claim 11, characterized in that the device further comprises:

The sending rate adjustment module is used to send data packets at a rate lower than the current receiving rate within a first time period when a preset adjustment period is reached.

20. The device according to claim 11, characterized in that the device further comprises:

A training module, used for obtaining network status data of a sample period as sample network status data;

21. An electronic device, characterized in that it comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus;

Memory, used to store computer programs;

A processor, for implementing the method steps described in any one of claims 1-10 when executing a program stored in a memory.

22. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method steps described in any one of claims 1 to 10 are implemented.