HK1130972B - A method and system for improving time delay in the power management of active state

Info

Publication number: HK1130972B
Application number: HK09108966.4A
Authority: HK
Inventors: Steven B. Lindsay
Original assignee: Broadcom Corporation
Priority date: 2007-10-11
Filing date: 2009-09-29
Publication date: 2013-04-26

Description

Method and system for improving latency in active state power management

Technical Field

The present invention relates to computer systems, more particularly to a PCI-E interface in a computer system, and still more particularly to a method and system for improving PCI-E L1 ASPM latency during active state power management.

Background

PCI-E (Peripheral Component Interconnect Express) interfaces are used in servers, desktop computers, and mobile PCs. One important power-saving feature of PCI-E is Active State Power Management (ASPM). When L1 ASPM is enabled on a particular PCI-E link and the link has been inactive for a period of time (e.g., tens or hundreds of microseconds), the PCI-E link transitions to the L1 state, which consumes substantially less power than the fully functional, full-power L0 (on) state. In the L1 state, the PCI-E clock may be stopped and the PLL may be powered down to save power. However, before a device can begin DMA and transfer data over the PCI-E link, the link must return to the L0 state.

The transition from L1 to L0 is not instantaneous. This transition time period is referred to as the "L1 exit latency". The L1 exit latency begins at the point in time when the device decides that it needs to perform a PCI-E transaction (e.g., DMA) and begins the transition to L0. The L1 exit latency ends when the PCI-E link has completely transitioned to the L0 state. The exact L1 exit latency depends on the design of the devices on either side of the PCI-E link, but the latency will be greater than 20 microseconds if the PLL is not powered down and greater than 100 microseconds if the PLL is powered down.

Gigabit and Fast Ethernet controllers can use the PCI-E bus to connect to PCs because PCI-E is a common high-speed peripheral interface. Furthermore, it is highly desirable for Ethernet controllers with PCI-E interfaces to support L1 ASPM so that the PCI-E link can be placed into a low power state automatically while the interface is inactive. However, a long L1 exit latency negatively impacts network response and performance, because the L1 exit latency adds to the time one network station needs to process and respond to network packets sent by another network station. At Gigabit Ethernet speeds, even a 10 microsecond delay is highly undesirable in certain latency-sensitive applications or benchmarks.

The L1 exit latency of a device depends on the physical layer design of the device. A trade-off can be made between the performance, cost and complexity of the physical layer design. The L1 exit latency can therefore vary over a very wide range, from slightly less than 10 microseconds to several hundred microseconds. Even devices with a "lower" L1 exit latency have an exit latency from L1 of greater than 30 (and in some cases greater than 100) microseconds when the PCI-E reference clock and PLL have been powered down, because the clock must be restarted at the transition to L0 and the PLL must reacquire lock on the clock.

Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.

Disclosure of Invention

A method and/or system for improving effective PCI-E L1 ASPM exit latency by prospectively initiating a transition at an earlier point in time, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.

According to one aspect of the invention, there is provided a method of improving latency in active state power management, the method comprising:

entering a low power PCI-E state;

predicting transactions that will require a full power PCI-E state;

transitioning to a full-power PCI-E state based on the predicted transaction.

Preferably, predicting the transaction comprises scheduling an event, wherein the event starts upon expiration of a timer.

Preferably, the event comprises a status update.

Preferably, the event comprises an interrupt generation.

Preferably, the event comprises transmitting statistics to the host.

Preferably, the transition to the full power state occurs after a delay.

Preferably, the delay is time-based.

Preferably, predicting the transaction comprises:

receiving a data packet;

verifying an address associated with the data packet.

Preferably, the transition to the full power PCI-E state occurs after a delay.

Preferably, the delay is time-based.

Preferably, the delay is based on the amount of data received.

Preferably, the delay is based on the amount of data received relative to the length of the data packet.

Preferably, the method comprises checking the data packet for errors after transitioning to the full power PCI-E state, and then initiating a transition back to the low power (L1) state after expiration of an inactivity timer.

According to one aspect of the invention, there is also provided a method of reducing state transitions during active state power management, the method comprising:

predicting a DMA transaction;

resetting an inactivity timer based on the predicted DMA transaction.

According to another aspect of the invention, there is provided a system for improving latency in active state power management, the system comprising:

an interface having power management features, wherein the power management features include a low power PCI-E state and a full power PCI-E state;

a controller to instruct the interface to initiate a transition from a low-power PCI-E state to a full-power PCI-E state, wherein the controller predicts a demand for the full-power PCI-E state.

Preferably, the controller is an ethernet MAC.

Preferably, the controller is a WLAN controller.

Preferably, the controller determines that the data packet has passed address filtering before initiating the transition from the low power PCI-E state to the full power PCI-E state.

Preferably, the controller generates a delay between predicting a full power PCI-E state requirement and initiating the transition.

Preferably, the delay is time-based.

Preferably, the delay is based on the amount of data received.

Preferably, the delay is based on the amount of data received relative to the total amount of data in the data packet.

Preferably, the controller operates at one or more speeds.

Preferably, the time between predicting full power PCI-E state demand and initiating a transition is determined in accordance with an operating speed of the controller.

According to one aspect of the invention, there is provided a system for reducing state transitions during active state power management, the system comprising:

an interface having power management features;

a controller for resetting the inactivity timer when a DMA transaction is predicted.

Various advantages, aspects and novel features of the invention, as well as details of an illustrated embodiment thereof, will be more fully described with reference to the following description and drawings.

Drawings

FIG. 1 is a flowchart of a method for improving PCI-E L1 ASPM exit latency according to a first embodiment of the present invention;

FIG. 2 is a flowchart of a method for improving PCI-E L1 ASPM exit latency according to a second embodiment of the present invention;

FIG. 3 is a diagram illustrating a system for improving PCI-E L1 ASPM exit latency according to an embodiment of the present invention.

Detailed Description

The invention will be further explained with reference to the following figures and examples:

The present invention relates to improving PCI-E L1 Active State Power Management (ASPM) exit latency by proactively starting the L1 exit at an earlier point in time based on network stimuli. The improved latency may enable higher levels of performance and responsiveness while retaining the benefits of ASPM. The present invention may be embedded within a Network Interface Controller (NIC) having a PCI-E interface that supports ASPM. Although the following description is presented in conjunction with a specific embodiment of a PCI-E interface, many other embodiments may use these systems and methods. The present invention may also reduce latency in other processes that use a PCI-E interface.

According to various embodiments of the invention, a smart NIC may predict, based on network stimuli, the need to exit the L1 state earlier than the normal point (just before the NIC must initiate DMA). In other words, the present invention allows the NIC to initiate the transition from L1 to L0 before the device has a pending PCI-E transaction (e.g., a DMA read or write) ready to initiate. In accordance with the present invention, the NIC may prospectively initiate a transition from the low power L1 state to the full power L0 state. By anticipating and initiating this transition earlier, part of the L1 exit latency may be masked and the PCI-E link may return to the L0 state sooner. A faster return to the L0 state may improve the performance and responsiveness of a network controller that supports PCI-E active state power management.

Conventionally, the transition from L1 to L0 is started by the device immediately after it is requested to initiate a PCI-E transaction. If the NIC receives a packet, the packet is buffered and verified before a DMA request sends the packet to main memory. A gigabit (or faster) NIC may completely buffer a packet before requesting DMA. A slower NIC may buffer only a portion of the packet before requesting DMA.

To reduce latency, the transition from L1 to L0 may be initiated based on a prediction, before the device actually has a pending PCI-E transaction. The transition from L1 to L0 may begin as soon as the NIC is able to decide that a DMA request will very likely need to be made in the near future. This may provide a lead of more than 10 microseconds.

FIG. 1 is a flowchart of a method for improving PCI-E L1 ASPM exit latency according to a first embodiment of the present invention. In step 101, the NIC is activated to receive a data packet. In step 103, the NIC verifies that an inbound packet has passed address filtering as soon as a sufficient portion of the packet has been received. It can be presumed that the packet is error free and requires DMA to main memory.

In step 105, the NIC determines what state the PCI-E interface is in. If the PCI-E interface is in the L0 state, the PCI-E inactivity timer for the device is reset in step 107 and the NIC returns to step 101 to wait for a packet to be received. If the PCI-E interface is in the L1 state and the device is found to be in the PCI D3 state in step 109, the NIC returns to step 101 to wait for a packet. If the PCI-E interface is in the L1 state and the device is not found to be in the PCI D3 state in step 109, the NIC initiates a transition from L1 to L0 and resets the PCI-E inactivity timer in step 111. This happens before the entire packet has been received and may allow the NIC to start the transition from L1 to L0 at least 10 microseconds earlier than if it waited for the DMA engine to request a DMA transfer.

By prospectively requesting an earlier transition from L1 to L0, the NIC may transition the bus to the L0 state without actually making DMA requests. If the NIC finally determines that the packet is erroneous (e.g., has an FCS error), the NIC may discard the packet and not initiate a DMA request for the packet data.

The PCI-E specification allows the transition to L0 to occur even if the state transition does not immediately result in a PCI-E transaction. The downside of making an unnecessary transition from L1 to L0 is that the bus will consume slightly more power for a short period of time. In step 113, an inactivity timer is used to determine when the device should transition the link from L0 back to L1 because of inactivity. In step 115, the bus transitions back to the L1 state due to inactivity. If the number of unnecessary L1 to L0 transitions is kept to a minimum (e.g., less than 5%), the negative impact on power consumption is negligible and the improvement in L1 exit latency will be noticeable. The bit error rate on Gigabit Ethernet is less than 10^-10 (by specification) and may be below 10^-12 for most media types. Therefore, the packet error rate on Ethernet will be less than 1%.
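The step from a bit error rate to a packet error rate can be checked with a short calculation. The following is a minimal sketch, assuming independent bit errors and counting only the 1518 frame bytes; the function name is hypothetical and is not part of the disclosure.

```python
import math


def packet_error_probability(bit_error_rate: float, packet_bytes: int) -> float:
    """Probability that at least one bit in the packet is corrupted.

    Assumes independent bit errors (an illustrative simplification).
    Uses log1p/expm1 so that very small rates do not lose precision.
    """
    bits = packet_bytes * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


# Worst case allowed by the 10^-10 figure, for a maximum-size 1518-byte packet.
worst_case = packet_error_probability(1e-10, 1518)
# Typical media at 10^-12.
typical = packet_error_probability(1e-12, 1518)
```

Under these assumptions the worst-case value is on the order of one packet in a million, far below the 1% bound, so an early L1 exit triggered by a packet that later fails its FCS check should be rare.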

To prevent unnecessary L0 to L1 to L0 transitions, the NIC may use the same early indication to reset the NIC's PCI-E inactivity timer earlier than normal in step 107 and step 111. By resetting the timer early when an event occurs (e.g., when a packet is received but before the DMA for that packet is ready), unnecessary transitions to L1 may be avoided. In step 117, the PCI-E inactivity timer may be reset even though the PCI-E link is already in the L0 state processing network and PCI traffic. While this reset does not improve the L1 exit latency, it may reduce unnecessary L0 to L1 to L0 transitions by clearing the inactivity timer earlier than the point at which the DMA transaction is generated.
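The flow of FIG. 1 and the effect of the early timer reset can be illustrated with a small software model. This is only a sketch under stated assumptions: the class, the field names and the 100 microsecond inactivity timeout are hypothetical, the L1 exit is treated as instantaneous, and real hardware would implement this in logic rather than software.

```python
from dataclasses import dataclass


@dataclass
class LinkModel:
    link: str = "L1"             # "L0" (full power) or "L1" (low power)
    device: str = "D0"           # "D0" (operational) or "D3"
    idle_us: int = 0             # time since the inactivity timer was last reset
    idle_timeout_us: int = 100   # hypothetical inactivity threshold
    l1_entries: int = 0          # number of L0 -> L1 transitions observed

    def advance(self, elapsed_us: int) -> None:
        """Steps 113/115: return to L1 once the link has been idle long enough."""
        if self.link != "L0":
            return
        self.idle_us += elapsed_us
        if self.idle_us >= self.idle_timeout_us:
            self.link = "L1"
            self.idle_us = 0
            self.l1_entries += 1

    def early_indication(self) -> None:
        """Steps 105-111: a packet passed address filtering, so DMA is likely soon."""
        if self.link == "L0":
            self.idle_us = 0          # step 107: reset the timer early
        elif self.device != "D3":     # step 109: do nothing while in D3
            self.link = "L0"          # step 111: begin the L1 exit (latency not modelled)
            self.idle_us = 0


def l1_entries_during_burst(use_early_reset: bool) -> int:
    """Link idle for 90 us, a packet starts arriving, and its DMA follows 20 us later."""
    model = LinkModel(link="L0")
    model.advance(90)
    if use_early_reset:
        model.early_indication()
    model.advance(20)
    return model.l1_entries
```

In this model, calling `l1_entries_during_burst(False)` lets the timer expire shortly before the DMA would be issued, whereas `l1_entries_during_burst(True)` keeps the link in L0 throughout.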

FIG. 2 is a flowchart of a method for improving PCI-E L1 ASPM exit latency according to a second embodiment of the present invention.

In step 201, a DMA event timer is used to schedule a DMA event. Once the DMA event timer expires in step 205, a DMA event is initiated in step 207. The DMA event may be a "host coalescing" event that updates the driver running on the host with the current hardware state and generates an interrupt. The DMA event may also be the transfer of the current on-chip statistics counters to the host. In step 203, the transition from L1 to L0 may begin before the DMA event timer expires. By initiating the L1 to L0 transition after the DMA event timer is set but shortly before it expires, the device may be configured such that the L1 to L0 transition completes just before the timer expires, thereby completely masking the L1 exit latency.
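The timing relationship in the second embodiment reduces to simple arithmetic. The sketch below is illustrative only: the function names and the example values (a one-second timer and a 30 microsecond exit latency) are assumptions chosen for the example, not values mandated by the method.

```python
def early_exit_start_us(timer_set_us: float, timer_period_us: float,
                        l1_exit_latency_us: float) -> float:
    """Time at which to begin the L1 -> L0 transition (step 203).

    The transition cannot start before the timer is set, so the result is
    clamped to the time the timer was armed.
    """
    expiry_us = timer_set_us + timer_period_us
    return max(timer_set_us, expiry_us - l1_exit_latency_us)


def residual_latency_us(timer_set_us: float, timer_period_us: float,
                        l1_exit_latency_us: float) -> float:
    """Exit latency still visible to the DMA event after the early start."""
    expiry_us = timer_set_us + timer_period_us
    start_us = early_exit_start_us(timer_set_us, timer_period_us, l1_exit_latency_us)
    return max(0.0, start_us + l1_exit_latency_us - expiry_us)


# A timer armed at t = 0 that expires after one second, with a 30 us exit latency.
start = early_exit_start_us(0.0, 1_000_000.0, 30.0)
hidden = residual_latency_us(0.0, 1_000_000.0, 30.0)
```

When the timer period is shorter than the exit latency, only part of the latency can be hidden; `residual_latency_us` returns the remainder in that case.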

FIG. 3 is a diagram illustrating a system for improving PCI-E L1 ASPM exit latency according to an embodiment of the present invention. FIG. 3 shows a Gigabit Ethernet controller 300. However, the following description may also apply to other classes of network devices having a PCI-E bus, such as WLAN devices.

MAC 301 may support: 1) an Ethernet Media Access Control (MAC) function; 2) the IEEE 802.3 Ethernet protocol; 3) an interface to a physical layer (PHY); 4) packet classification; 5) error detection logic for inbound data packets; and/or 6) memory for temporary packet buffering. MAC 301 may also include: 1) logic for offloading checksum calculations; 2) accelerators for TCP/IP or IPsec traffic; and/or 3) other embedded processors. The DMA engine 303 is responsible for initiating DMA read and write requests to the PCI-E core 305. The PCI-E core 305 is responsible for generating the actual DMA requests on the PCI-E bus, supporting the PCI-E protocol, and providing PCI-E target support.

When a packet is received over Ethernet, the data in the packet passes through the modules in the chip. For received packets, data enters the device from the physical layer 309 through the network interface 307 and is processed by the MAC 301. For the MAC to ensure that the packet does not contain FCS (i.e., CRC) errors, the entire packet must be received. If there are no errors in the packet and the packet is intended for the system, DMA engine 303 forms a DMA request to PCI-E core 305 to transfer the packet to main memory across PCI-E bus 311.

Gigabit Ethernet NICs can support operation at 1 Gb/s as well as at lower speeds, e.g., 100 Mb/s and 10 Mb/s. Inbound network packets arrive more slowly when operating at lower network speeds. The time required to receive a maximum-size (1518-byte) Ethernet packet at 100 Mb/s is approximately 122 microseconds, which is greater than the L1 exit latency. When PLL power-down is disabled in the L1 state, the L1 exit latency will be greater than 10 microseconds but less than 64 microseconds. However, if the PLL is powered down, the L1 exit latency will be greater than 100 microseconds. Thus, when operating at 10 Mb/s or 100 Mb/s and receiving a packet in the L1 state, it may be desirable to initiate the L1 to L0 transition at a later point within the inbound packet. For example, assume that a device operating at 100 Mb/s receives a maximum-size 1518-byte Ethernet packet. The time required for the packet to arrive completely (about 120 microseconds) will exceed the L1 exit latency (i.e., with PLL power-down enabled). Therefore, it is preferable not to initiate the L1 to L0 transition immediately after the MAC header of the packet arrives, but at some later point within the packet (e.g., after 1000 bytes). This keeps the PCI-E interface from transitioning to L0 prematurely. Thus, a preferred implementation of this feature allows the NIC's software driver to configure a delay factor from the point in time at which the NIC has determined that an inbound packet has passed address filtering to the point in time at which the NIC should initiate the L1 to L0 transition. The delay factor may be time based (e.g., in microseconds) or based on the amount of data received for a particular packet. The software driver may use various criteria to determine the delay factor, such as the network speed, whether PLL power-down is enabled or disabled, and the expected L1 exit latency of the device on a particular system.
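The arithmetic behind the delay factor can be sketched as follows. This is a hedged illustration, not the driver's actual algorithm: the function names, the 14-byte minimum offset (the length of an untagged Ethernet header) and the 40 microsecond example latency are assumptions, and preamble and inter-frame gap are ignored.

```python
import math


def receive_time_us(packet_bytes: int, speed_mbps: int) -> float:
    """Time to receive the packet; 1 Mb/s equals one bit per microsecond."""
    return packet_bytes * 8 / speed_mbps


def trigger_offset_bytes(packet_bytes: int, speed_mbps: int,
                         l1_exit_latency_us: float,
                         min_offset_bytes: int = 14) -> int:
    """Byte offset within the packet at which to start the L1 -> L0 transition.

    Chosen so that the exit completes at about the time the last byte arrives.
    Never earlier than min_offset_bytes, a hypothetical floor standing in for
    the point at which address filtering has completed.
    """
    bytes_during_exit = math.ceil(l1_exit_latency_us * speed_mbps / 8)
    return max(min_offset_bytes, packet_bytes - bytes_during_exit)


full_packet_us = receive_time_us(1518, 100)        # maximum-size packet at 100 Mb/s
offset_100m = trigger_offset_bytes(1518, 100, 40)  # 40 us exit latency (illustrative)
offset_1g = trigger_offset_bytes(1518, 1000, 40)   # same latency at 1 Gb/s
```

At 1 Gb/s the whole packet arrives in less time than the example exit latency, so the offset collapses to the floor and the transition should start as early as possible, which matches the "no additional delay" setting for gigabit operation.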

To support the earlier L1 to L0 transition, a signal sent from the MAC 301 to the PCI-E core 305 instructs the PCI-E core to initiate an L1 to L0 transition. This signal may be edge triggered, and the MAC may generate a pulse when it wants to instruct the PCI-E core to enter L0. The use of this signal may be enabled or disabled by software for debugging and diagnostic purposes. This may be done through device-specific register bits (which may be configured through a device driver).

PCI-E core 305 contains logic to recognize the pulse on this signal. If the above-described feature has been enabled at the device level and the device is in the L1 ASPM state and the D0 device state, the PCI-E core 305 will initiate an L1 to L0 transition when it recognizes that the signal is asserted (i.e., when it detects a rising edge of the signal). Once the transition to L0 has been made, PCI-E core 305 will reset the PCI-E inactivity timer so that, when there is no activity on the bus for a period of time, the device will initiate a transition back to L1. If the device is in the D3 state, this signal will be ignored entirely by the PCI-E core. If the device is not in the L1 ASPM state but in the L0 state, the device will immediately reset its PCI-E inactivity timer when it detects a pulse on this signal. This may provide the benefit of eliminating a possibly unnecessary L0 to L1 to L0 transition if the inactivity timer is nearing expiration when the early indication signal is asserted.
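The core-side handling of the early-exit signal amounts to a rising-edge detector gated by an enable bit and by the device power state. The following software model is a sketch only; the class name, method name and returned action strings are hypothetical, and an actual implementation would be hardware logic.

```python
class PcieCoreModel:
    """Toy model of how the PCI-E core might react to the early-exit signal."""

    def __init__(self, feature_enabled: bool = True) -> None:
        self.feature_enabled = feature_enabled  # device-specific register bit
        self.link = "L1"                        # "L0" or "L1"
        self.device = "D0"                      # "D0" or "D3"
        self.inactivity_timer_us = 0
        self._previous_level = False            # last sampled signal level

    def sample(self, level: bool) -> str:
        """Sample the signal once and return the action taken, if any."""
        rising_edge = level and not self._previous_level
        self._previous_level = level
        if not rising_edge or not self.feature_enabled or self.device == "D3":
            return "none"                       # ignored entirely
        if self.link == "L1":
            self.link = "L0"                    # exit latency not modelled
            self.inactivity_timer_us = 0
            return "exit_l1"
        self.inactivity_timer_us = 0            # already in L0: just reset the timer
        return "reset_timer"
```

Because only the rising edge is acted on, holding the signal high does not retrigger the transition; the MAC would need to deassert and pulse again for the next packet.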

To support the earlier L1 to L0 transitions due to packet reception, MAC 301 may include logic that allows MAC 301 to pulse a signal to PCI-E core 305, telling PCI-E core 305 to start an L1 to L0 transition immediately after MAC 301 determines that an inbound packet has passed address filtering.

An "early L1 exit delay" register may be added that is configured by software to delay the pulse from MAC 301 to PCI-E core 305 for n microseconds, or until n bytes of the packet have been received. In a 1 Gb/s network environment, this register would normally be set to "0" (i.e., no additional delay). If the network speed is slower (e.g., 100 Mb/s), a delay value would be used so that the early L1 exit pulse is still generated before the DMA write for the packet is issued, thereby reducing the effective L1 exit latency, but not as early as the point at which the packet first passes address filtering, since that would start the L1 to L0 transition too early if PLL power-down is not enabled. The benefit of the "n bytes" approach on 10/100 half-duplex networks is that software can set the delay threshold beyond the collision window (e.g., the first 64 bytes of the packet).

The delay value may be relative to the length of the data packet. The length of the data packet can be determined by hardware inspection of the type/length field in the Ethernet header and/or the IP length field in the IP header of an unfragmented IP packet.
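As an illustration of how the packet length might be recovered from the first bytes of a frame, the sketch below reads the Ethernet type/length field and, for IPv4, the total-length field. It is written in software for clarity although the text describes hardware inspection; the function names are hypothetical, VLAN tags and IPv6 are deliberately ignored, and the fractional trigger is just one possible way to express a delay relative to packet length.

```python
import struct
from typing import Optional

ETH_HEADER_LEN = 14      # destination MAC, source MAC, type/length
FCS_LEN = 4
MIN_FRAME_LEN = 64       # minimum Ethernet frame, including FCS
ETHERTYPE_IPV4 = 0x0800


def expected_frame_length(first_bytes: bytes) -> Optional[int]:
    """Estimate the full frame length from its first bytes, or None if unknown."""
    if len(first_bytes) < ETH_HEADER_LEN:
        return None
    (type_len,) = struct.unpack_from("!H", first_bytes, 12)
    if type_len <= 1500:
        payload_len = type_len                  # IEEE 802.3 length field
    elif type_len == ETHERTYPE_IPV4 and len(first_bytes) >= ETH_HEADER_LEN + 8:
        total_len, _ident, flags_frag = struct.unpack_from(
            "!HHH", first_bytes, ETH_HEADER_LEN + 2)
        if flags_frag & 0x3FFF:                 # MF flag set or non-zero fragment offset
            return None                         # fragmented: length not used here
        payload_len = total_len                 # IPv4 total length
    else:
        return None
    return max(MIN_FRAME_LEN, ETH_HEADER_LEN + payload_len + FCS_LEN)


def relative_trigger_bytes(frame_len: int, fraction: float) -> int:
    """Byte count at which to pulse, as a fraction of the expected frame length."""
    return int(frame_len * fraction)
```

For a frame whose IPv4 header reports a 1500-byte datagram, `expected_frame_length` yields the 1518-byte maximum frame size, and a driver-configured fraction could then place the early L1 exit pulse partway through the packet.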

The NIC may have timers that cause a DMA to be generated when they expire. Examples of such timers include a timer for coalescing interrupt events, a timer for coalescing status module update events, and a timer for generating periodic DMAs of on-chip statistics to the host. It is possible to predict when a particular timer will expire. To support the earlier L1 to L0 transition due to the expected imminent expiration of a timer, the control logic within MAC 301 may be modified to generate an "early L1 exit pulse" when a programmed threshold before the timer expires is reached. For example, one or more registers may be added to enable a software driver to configure the device to assert the internal L1 exit signal n microseconds before a particular timer expires. For example, for a statistics module update timer that normally expires every second, if n equals 30, the device will initiate an L1 to L0 transition 30 microseconds before it is ready to issue a statistics module update DMA. If the device is in the L1 ASPM state, this will mask 30 microseconds of L1 exit latency. If the PCI-E link is in the L0 state, the PCI-E inactivity timer will be reset to prevent an unnecessary L0 to L1 transition from occurring before the timer expires.

If the device is in the L0 state and its PCI-E inactivity timer is nearing expiration, assertion of the early L1 to L0 transition signal will cause the PCI-E inactivity timer to be reset immediately, thereby preventing the imminent L0 to L1 transition. The inactivity timer may be reset again when the actual DMA occurs, after the early indication has predicted that the DMA will occur.

If the device is in the D3 state with WoL enabled and the device receives a packet, the L1 to L0 transition will not be initiated simply because the device received the packet. In the D3 state, the early L1 exit feature within the PCI-E core is disabled. If the device receives a WoL packet, it may assert the WAKE# signal, which wakes up the system and causes the PCI-E interface to reset.

The present invention can be realized in hardware, software, or a combination of hardware and software. The present invention can be realized in a centralized fashion in at least one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software could be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein. The method is implemented in a computer system using a processor and a memory unit.

The present invention can also be implemented as a computer program product, which comprises all the features enabling the implementation of the methods of the invention and which, when loaded in a computer system, is able to carry out these methods. A computer program in this document refers to any expression, in any programming language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different format.

While the invention has been described with reference to several embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from its scope. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims

1. A method for improving latency in an active state power management process, the method comprising:

entering a low power PCI-E state;

predicting transactions that will require a full power PCI-E state;

transitioning to a full-power PCI-E state based on the predicted transaction, wherein transitioning to a full-power PCI-E state occurs after a delay that is based on time or based on an amount of data received.

2. The method of claim 1, wherein predicting the transaction comprises scheduling an event, wherein the event begins when a timer expires.

3. The method of claim 2, wherein the event comprises a status update.

4. The method of claim 2, wherein the event comprises interrupt generation.

5. The method of claim 2, wherein the event comprises transmitting statistics to a host.

6. A system for improving latency during active state power management, the system comprising:

a controller to instruct an interface to initiate a transition from a low-power PCI-E state to a full-power PCI-E state, wherein the controller predicts a need for the full-power PCI-E state, and wherein the transition to the full-power PCI-E state occurs after a delay, the delay being based on time or based on an amount of data received.

7. The system of claim 6, wherein the controller is an Ethernet MAC.

8. The system of claim 6, wherein the controller is a WLAN controller.