US20040054841A1 - Method and apparatus for promoting memory read commands - Google Patents
Method and apparatus for promoting memory read commands Download PDFInfo
- Publication number
- US20040054841A1 US20040054841A1 US10/640,891 US64089103A US2004054841A1 US 20040054841 A1 US20040054841 A1 US 20040054841A1 US 64089103 A US64089103 A US 64089103A US 2004054841 A1 US2004054841 A1 US 2004054841A1
- Authority
- US
- United States
- Prior art keywords
- data
- signal
- read request
- response
- phases
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4204—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
- G06F13/4234—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being a memory bus
- G06F13/4243—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being a memory bus with synchronous protocol
Definitions
- This invention relates generally to communication between devices on different buses of a computer system, and, more particularly, to a method and apparatus for promoting memory read commands and advantageously prefetch data to reduce bus latency.
- Computer systems of the PC type typically employ an expansion bus to handle various data transfers and transactions related to I/O and disk access.
- the expansion bus is separate from the system bus or from the bus to which the processor is connected, but is coupled to the system bus by a bridge circuit.
- expansion bus architectures have been used in the art, including the ISA (Industry Standard Architecture) expansion bus, an 8-Mhz, 16-bit device and the EISA (Extension to ISA) bus, a 32-bit bus clocked at 8-Mhz.
- ISA Industry Standard Architecture
- 8-Mhz 8-Mhz
- 16-bit device 8-Mhz
- EISA Extension to ISA
- 32-bit bus 32-bit bus clocked at 8-Mhz.
- PCI Peripheral Component Interconnect
- Intel Corporation the PCI (Peripheral Component Interconnect) bus standard was proposed by Intel Corporation as a longer-term expansion bus standard specifically addressing burst transfers.
- the original PCI bus standard has been revised several times, with the current standard being Revision 2.1, available from the PCI Special Interest Group, located in Portland, Oreg.
- the PCI Specification, Rev. 2.1 is incorporated herein by reference in its entirety.
- the PCI bus provides for 32-bit or 64-bit transfers at 33 or 66 MHz. It can be populated with adapters requiring fast access to each other and/or with system memory, and that can be accessed by the host processor at speeds approaching that of the processor's native bus speed.
- a 64-bit, 66-MHz PCI bus has a theoretical maximum transfer rate of 528 MByte/sec. All read and write transfers over the bus may be burst transfers. The length of the burst may be negotiated between initiator and target devices, and may be any length.
- a CPU operates at a much faster clock rate and data access rate than most of the resources it accesses via a bus.
- this delay in reading data from a resource on the bus was handled by inserting wait states.
- the processor When a processor requested data that was not immediately available due to a slow memory or disk access, the processor merely marked time using wait states, doing no useful work, until the data finally became available.
- a processor such as the Pentium Pro (P6), offered by Intel Corporation, provides a pipelined bus that allows multiple transactions to be pending on the bus at one time, rather than requiring one transaction to be finished before starting another.
- the P6 bus allows split transactions, i.e., a request for data may be separated from the delivery of the data by other transactions on the bus.
- the P6 processor uses a technique referred to as “deferred transaction” to accomplish the split on the bus.
- a processor sends out a read request, for example, and the target sends back a “defer” response, meaning that the target will send the data onto the bus, on its own initiative, when the data becomes available.
- PCI bus specification does not provide for split transactions. There is no mechanism for issuing a “deferred transaction” signal, nor for generating the deferred data initiative. Accordingly, while a P6 processor can communicate with resources such as main memory that are on the processor bus itself using deferred transactions, this technique is not used when communicating with disk drives, network resources, compatibility devices, etc., on an expansion bus.
- the PCI bus specification provides a protocol for issuing delayed transactions. Delayed transactions use a retry protocol to implement efficient processing of the transactions. If an initiator initiates a request to a target and the target cannot provide the data quickly enough, a retry command is issued. The retry command directs the initiator to retry or “ask again” for the data at a later time.
- the target does not simply sit idly by, awaiting the renewed request. Instead, the target initially records certain information, such as the address and command type associated with the initiator's request, and begins to assemble the requested information in anticipation of a retry request from the initiator. When the request is retried, the information can be quickly provided without unnecessarily tying up the system's buses.
- a memory read (MR) command does not provide any immediate indication as to the length of the intended read. The read is terminated based on logic signals driven on the bus by the initiator.
- a memory read line (MRL) command indicates that the initiator intends to read at least one cache line (e.g., 32 bytes) of data.
- a memory read multiple command (MRM) indicates that the initiator is likely to read more than one cache line of data.
- the bridge prefetches data and stores it in a buffer in anticipation of the retried transaction. The amount of data prefetched depends on the amount the initiator is likely to require. Efficiency is highest when the amount of prefetched data most closely matches the amount of data required.
- Prefetching in response to MRL and MRM commands is relatively uncomplicated, because, by the very nature of the command, the bridge knows to prefetch at least one, and likely more than one, cache line.
- the amount of data required by an initiator of an MR command is not readily apparent. Initiators may issue MR commands even if they know they will require multiple data phases.
- the PCI specification recommends, but does not require, that initiators use an MRL or an MRM command only if the starting address lies on a cache line boundary. Accordingly, a device following this recommendation would issue one or more MR commands until a cache line boundary is encountered, and would then issue the appropriate MRL or MRM command.
- some devices due to their vintage or their simplicity, are not equipped to issue MRL or MRM commands, and use MR commands exclusively.
- FIGS. 1A through 1D provide timing diagrams of exemplary MR transactions on a PCI bus. For clarity, only those PCI control signals useful in illustrating the examples are shown.
- the PCI bus uses shared address/data (AD) lines and shared command/byte enable (C/BE#) lines.
- AD shared address/data
- C/BE# shared command/byte enable
- a turnaround cycle is required on all signals that may be driven by more than one agent.
- the initiator drives the address and the target drives the data.
- the turnaround cycle is used to avoid contention when one agent stops driving a signal and another agent begins driving the signal.
- a turnaround cycle is indicated on the timing diagrams as two arrows pointing at each others' tail.
- FIG. 1A illustrates an MR command in which the initiator requires multiple data phases to complete the transaction.
- the target and initiator reside on the same PCI bus, and the target is ready to supply the data when requested.
- the initiator asserts a FRAME# signal before the rising edge of a first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively.
- CLK3 a first clock cycle
- the initiator asserts the IRDY# signal to indicate that it is ready to receive data.
- the target also asserts the TRDY# signal at CLK3 (i.e., after the turnaround cycle) to signal that valid data is present on the AD lines.
- the initiator must deassert FRAME# before the last data phase. Because the FRAME# signal remains asserted at CLK3, the target knows that more data is required. Data transfer continues between the initiator and target during cycles CLK4 and CLK5. The initiator deasserts the FRAME# signal before CLK5 to indicate that Data3 is the last data phase. The initiator continues to assert the IRDY# signal until after the last data phase has been completed.
- FIG. 1B illustrates an MR command in which the initiator requires only one data phase to complete the transaction.
- the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively.
- CLK3 the initiator asserts the IRDY# signal to indicate that it is ready to receive data.
- the target asserts the TRDY# signal at CLK3 (i.e., after the turnaround cycle) to signal that valid data is present on the AD lines.
- the FRAME# signal is deasserted before CLK3.
- the target then knows that no more data is required.
- the initiator continues to assert the IRDY# signal during the transfer of the data at CLK3, and deasserts it thereafter.
- FIGS. 1A and 1B illustrated MR transaction between devices on the same PCI bus.
- FIGS. 1C and 1D illustrates an MR transaction where the target resides on a different PCI bus than the initiator, and is subordinate to a bridge device.
- the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively.
- CLK1 rising edge of the first clock cycle
- the bridge claims the transaction, and because no data is readily available forces a retry by asserting the STOP# signal during CLK2.
- the target deasserts the FRAME# signal before CLK3.
- the bridge then deasserts STOP# at CLK4.
- the bridge not knowing how much data the initiator requires, conservatively assumes the transaction is a single data phase transaction and retrieves the data.
- the initiator retries the request. Again, the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively.
- CLK1 rising edge of the first clock cycle
- the bridge now in possession of the data, allows the transaction to proceed.
- CLK3 the initiator asserts the IRDY# signal to indicate that it is ready to receive data.
- the bridge asserts the TRDY# signal at CLK3 to signal that valid data is present on the AD lines.
- the bridge also asserts the STOP# signal at CLK3 to indicate it cannot provide any further data. Even though the initiator desired more than one data phase to complete the transaction, as indicated by the FRAME# signal being asserted during the transfer of Data1, the transaction is terminated.
- the initiator is then forced to issue a new transaction, in accordance with FIG. 1C for the next data phase.
- the cycle of FIGS. 1C and 1D repeats until the initiator has received its requested data.
- the situation of FIGS. 1C and 1D illustrate an inefficiency introduced by the use of an MR command. It may take many such exchanges to complete the data transfer, thus increasing the number of tenancies (i.e., exchanges between an initiator and a target) on the bus. Also, the initiator, bridge, and target must compete for bus time with other devices on their respective buses, thus increasing the total number of cycles required to complete the transaction beyond those required just to complete the evolutions of FIGS. 1C and 1D.
- the present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.
- the device includes a data source, a bus interface, a data buffer, and control logic.
- the bus interface is coupled to a plurality of control lines of a bus and adapted to receive a read request targeting the data source.
- the control logic is adapted to determine if the read request requires multiple data phases to complete based on the control lines, and to retrieve at least two data phases of data from the data source and store them in the data buffer in response to the read request requiring multiple data phases to complete.
- the method includes receiving a read request on a bus.
- the bus includes a plurality of control lines. It is determined if the read request requires multiple data phases to complete based on the control lines. At least two data phases of data are retrieved from a data source in response to the read request requiring multiple data phases to complete. The at least two data phases of data are stored in a data buffer.
- FIGS. 1A through 1D illustrate timing diagrams of typical prior art bus commands
- FIG. 2 is a simplified block diagram of a computer system in accordance with the present invention.
- FIG. 3A is a diagram illustrating typical lines included in a processor bus of FIG. 2;
- FIG. 3B is a diagram illustrating typical lines included in a peripheral component interconnect bus of FIG. 2;
- FIG. 4 is a simplified block diagram of a bridge device of FIG. 2.
- FIGS. 5 through 7 are timing diagrams of bus transactions in accordance with the present invention.
- the computer system 100 includes multiple processors 102 in the illustrated example, although more or less may be employed.
- the processors 102 are connected to a processor bus 104 .
- the processor bus 104 operates based on the processor clock (not shown), so if the processors 102 are 166 MHz or 200 MHz devices (e.g., the clock speed of a Pentium Pro processor), for example, then the processor bus 104 is operated on some multiple of the base clock rate.
- a main memory 106 is coupled to the processor bus 104 through a memory controller 108 .
- the processors 102 each have a level-two cache 110 as a separate chip within the same package as the CPU chip itself, and the CPU chips have level-one data and instruction caches (not shown) included on-chip.
- Host bridges 112 , 114 are provided between the processor bus 104 and the PCI buses 116 , 118 , respectively. Two host bridges 112 and 114 are shown, although it is understood that many computer systems 100 would require only one, and other computer system 100 may use more than two. In one example, up to four of the host bridges 112 , 114 may be used. The reason for using more than one host bridge 112 , 114 is to increase the potential data throughput. One of the host bridges 112 is designated as a primary bridge, and the remaining bridges 114 (if any) are designated as secondary bridges.
- the primary host bridge 112 in the illustrated example, carries traffic for “legacy” devices, such as an EISA bridge 120 coupled to an EISA bus 122 , a keyboard/mouse controller 124 , a video controller 126 coupled to a monitor 128 , a flash ROM 130 , a NVRAM 132 , and a controller 134 for a floppy drive 136 and serial/parallel ports 138 .
- the secondary host bridge 114 does not usually accommodate any PC legacy items. Coupled to the PCI bus 118 by the host bridge 114 to the processor bus 104 are other resources such as a SCSI disk controller 140 for hard disk resources 142 , 144 , and a network adapter 146 for accessing a network 148 . A potentially large number of other stations (not shown) are coupled to the network 148 . Thus, transactions on the buses 104 , 116 , 118 may originate in or be directed to another station (not shown) or server (not shown) on the network 148 .
- the computer system 100 embodiment illustrated in FIG. 1 is that of a server, rather than a standalone computer system, but the features described herein may be used as well in a workstation or standalone desktop computer.
- Some components, such as the controllers 124 , 140 , 146 may be cards fitted into PCI bus slots (not shown) on the motherboard (not shown) of the computer system 100 . If additional slots (not shown) are needed, a PCI-to-PCI bridge 150 may be placed on the PCI bus 118 to access another PCI bus 152 .
- the additional PCI bus 152 does not provide additional bandwidth, but allows more adapter cards to be added.
- Various other server resources can be connected to the PCI buses 116 , 118 , 152 using commercially-available controller cards, such as CD-ROM drives, tape drives, modems, connections to ISDN lines for internet access, etc. (all not shown).
- Peer-to-peer transactions are allowed between a master and target device on the same PCI bus 116 , 118 , and are referred to as “standard” peer-to-peer transactions. Transactions between a master on one PCI bus 116 and a target device on another PCI bus 118 must traverse the processor bus 104 , and these are referred to as “traversing” transactions.
- the processor bus 104 contains a number of standard signal or data lines as defined in the specification for the particular processor 102 being used. In addition, certain special signals are included for the unique operation of the bridges 112 , 114 .
- the processor bus 104 contains thirty-three address lines 300 , sixty-four data lines 302 , and a number of control lines 304 . Most of the control lines 304 are not required to promote understanding of the present invention, and, as such, are not described in detail herein. Also, the address and data lines 300 , 302 have parity lines (not shown) associated with them that are also not described.
- the PCI buses 116 , 118 , 152 also contain a number of standard signal and data lines as defined in the PCI specification.
- the PCI buses 116 , 118 , 152 are of a multiplexed address/data type, and contain sixty-four AD lines 310 , eight command/byte-enable lines 312 , and a number of control lines (enumerated below).
- the particular control lines used in the illustration of the present invention are a frame line 314 (FRAME#), an initiator ready line 316 (IRDY#), a target ready line 318 (TRDY#), a stop line 320 (STOP#), and a clock line 322 (CLK).
- FIG. 4 a simplified block diagram showing the host bridge 112 in greater detail is provided.
- the host bridge 114 is of similar construction to that of the host bridge 112 depicted in FIG. 4.
- the host bridge 112 is hereinafter referred to as the bridge 112 .
- the bridge 112 includes a processor bus interface circuit 400 serving to acquire data and signals from the processor bus 104 and to drive the processor bus 104 with signals and data.
- a PCI bus interface circuit 402 serves to drive the PCI bus 116 and to acquire signals and data from the PCI bus 116 .
- the bridge 112 is divided into an upstream queue block 404 (US QBLK) and a downstream queue block 406 (DS QBLK).
- the term downstream refers to any transaction going from the processor bus 104 to the PCI bus 116
- the term upstream refers to any transaction going from the PCI bus 116 back toward the processor bus 104
- the bridge 112 interfaces on the upstream side with the processor bus 104 which operates at a bus speed related to the processor clock rate, which is, for example, 133 MHz, 166 MHz, or 200 MHz for Pentium Pro processors 102 .
- the bridge 112 interfaces with the PCI bus 116 operating at 33 or 66 MHz. These bus frequencies are provided for illustrative purposes. Application of the invention is not limited by the particular bus speeds selected.
- One function of the bridge 112 is to serve as a buffer between asynchronous buses 104 , 116 , and buses that differ in address/data presentation, i.e., the processor bus 104 has separate address and data lines 300 , 302 , whereas the PCI bus 116 uses multiplexed address and data lines 310 . To accomplish these translations, all bus transactions are buffered in FIFOs.
- An internal bus 408 conveys processor bus 104 write transactions or read data from the processor bus interface circuit 400 to a downstream delayed completion queue (DSDCQ) 410 and its associated RAM 412 , or to a downstream posted write queue (DSPWQ) 414 and its associated RAM 416 .
- DSDCQ downstream delayed completion queue
- DSPWQ downstream posted write queue
- Read requests going downstream are stored in a downstream delayed request queue (DSDRQ) 418 .
- An arbiter 420 monitors all pending downstream posted writes and read requests via valid bits on lines 422 in the downstream queues 410 , 414 , 418 and schedules which one will be allowed to execute next on the PCI bus 116 according to the read and write ordering rules set forth in the PCI bus specification.
- the arbiter 420 is coupled to the PCI bus interface circuit 402 for transferring commands thereto.
- the components of the upstream queue block 404 are similar to those of the downstream queue block 406 , i.e., the bridge 112 is essentially symmetrical for downstream and upstream transactions.
- a memory write transaction initiated by a device on the PCI bus 116 is posted to the PCI bus interface circuit 402 and the master device proceeds as if the write had been completed.
- a read requested by a device on the PCI bus 116 is not implemented at once by a target device on the processor bus 104 , so these reads are again treated as delayed transactions.
- An internal bus 424 conveys PCI bus write transactions or read data from the PCI bus interface circuit 402 to an upstream delayed completion queue (USDCQ) 426 and its associated RAM 428 , or to an upstream posted write queue (USPWQ) 430 and its associated RAM 432 .
- Read requests going upstream are stored in an upstream delayed request queue (USDRQ) 434 .
- An arbiter 436 monitors all pending upstream posted writes and read requests via valid bits on lines 438 in the upstream queues 426 , 430 , 434 and schedules which one will be allowed to execute next on the processor bus 104 according to the read and write ordering rules set forth in the PCI bus specification.
- the arbiter 436 is coupled to the processor bus interface circuit 400 for transferring commands thereto.
- Each buffer in a delayed request queue 418 , 434 stores a delayed request that is waiting for execution, and this delayed request consists of a command field, an address field, a write data field (not required if the request is a read request), and a valid bit.
- the USDRQ 434 holds requests originating from masters on the PCI bus 116 and directed to targets on the processor bus 104 or the PCI bus 118 .
- the USDRQ 434 and has eight buffers, corresponding one-to-one with eight buffers in the DSDCQ 410 .
- the DSDRQ 418 holds requests originating on the processor bus 104 and directed to targets on the PCI bus 116 .
- the DSDRQ 418 and has four buffers, corresponding one-to-one with four buffers in the USDCQ 426 .
- the DSDRQ 418 is loaded with a request from the processor bus interface circuit 400 and the USDCQ 426 .
- the USDRQ 434 is loaded from the PCI bus interface circuit 402 and the DSDCQ 410 . Requests are routed through the DCQ 410 , 426 logic to identify if a read request is a repeat of a previously encountered request.
- a read request from the processor bus 104 is latched into the processor bus interface circuit 400 and the transaction information is applied to the USDCQ 426 , where it is compared with all enqueued prior downstream read requests. If the current request is a duplicate, it is discarded if the data is not yet available to satisfy the request. If it is not a duplicate, the information is forwarded to the DSDRQ 418 . The same mechanism is used for upstream read requests. Information defining the request is latched into the PCI bus interface circuit 402 from the PCI bus 116 , forwarded to DSDCQ 410 , and, if not a duplicate of an enqueued request, forwarded to USDRQ 434 .
- the delayed completion queues 410 , 426 and their associated dual port RAMs 412 , 428 each store completion status and read data for delayed requests.
- a delayable request is sent from one of the interfaces 400 or 402 to the queue block 404 or 406 , the appropriate DCQ 410 , 426 is queried to see if a buffer for this same request has already been allocated.
- the address, commands, and byte enables are checked against the buffers in DCQ 410 or 426 . If no match is identified, a new buffer is allocated (if available), and the request is delayed (or deferred for the processor bus 104 ).
- the request is forwarded to the DRQ 418 or 434 in the opposite side.
- the request is then executed on the opposite bus 104 , 116 , under control of the appropriate arbiter 420 , 436 , and the completion status and data are forwarded back to the appropriate DCQ 410 , 426 .
- the buffer is not valid until ordering rules are satisfied. For example, a read cannot be completed until previous writes are completed.
- a delayable request “matches” a DCQ 410 , 426 buffer, and the requested data is valid, the request cycle is ready for immediate completion.
- the DSDCQ 410 stores status/read data for PCI-to-host delayed requests
- the USDCQ 426 stores status/read data for Host-to-PCI delayed or deferred requests.
- the bridge 112 includes bridge control circuitry 440 that prefetches data into the DSDCQ buffers 410 on behalf of the master, attempting to stream data with zero wait states after the delayed request completes.
- the DSDCQ 410 buffers are kept coherent with the processor bus 104 via snooping, which allows the buffers to be discarded as seldom as possible. Requests going the other direction may use prefetching, as described in greater detail below, however, since many PCI memory regions have “read side effects” (e.g., stacks and FIFOs), the bridge control circuitry 440 attempts to prefetch data into these buffers on behalf of the master only under controlled circumstances. In the illustrated embodiment, the USDCQ 426 buffers are flushed as soon as their associated deferred reply completes.
- the posted write queues 414 , 430 and their associated dual port RAM memories 416 , 432 commands and data associated with transactions. Only memory writes are posted, i.e., writes to I/O space are not posted. Because memory writes flow through dedicated queues within the bridge, they cannot blocked by delayed requests that precede them, as required by the PCI specification.
- Each of the four buffers in DSPWQ 414 stores 32 bytes (i.e., a cache line) of data plus commands for a host-to-PCI write.
- the four buffers in the DSPWQ 414 provide a total data storage of 128 bytes.
- the arbiters 420 and 436 control event ordering in the QBLKs 404 , 406 . These arbiters 420 , 436 make certain that any transaction in the DRQ 418 , 434 is not attempted until posted writes that preceded it are flushed, and that no datum in a DCQ 410 , 426 is marked valid until posted writes that arrived in the QBLK 404 , 406 ahead of it are flushed.
- the bridge control circuitry 440 is adapted to detect if an initiator intends to retrieve multiple phases of data with a burst MR command. There are numerous techniques for making such a determination, and several are described herein for illustrative purposes. As described above, it often takes multiple clock cycles before the behavior of an initiator can be determined. The techniques described below, although using different approaches, attempt to identify the intentions of an initiator with respect to the number of data phases desired and prefetch data, if possible, to reduce the inefficiencies described above. In response to determining that the initiator intends to complete multiple data phases, the bridge control circuitry 440 prefetches multiple data phases of data and stores them in the appropriate DCQ 410 , 420 associated with the transaction.
- FIG. 5 illustrates a timing diagram of a read transaction traversing the bridge 112 .
- the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively.
- CLK1 first clock cycle
- the bridge 112 claims the transaction, and because no data is readily available forces a retry by asserting the STOP# signal during CLK3.
- the bridge control circuitry 440 samples the FRAME# signal and the IRDY# signal to determine the intentions of the initiator with respect to the number of data phases requested. As described above in reference to FIG. 1B, an initiator requesting a single data phase must deassert the FRAME# signal before asserting the IRDY# signal to signify that the last data phase is being requested. In FIG. 5, coincident with the STOP# signal, the FRAME# signal and the IRDY# signal are both asserted, indicating that the initiator intends to request multiple data phases. Accordingly, the bridge control circuitry 440 prefetches more than just a single data phase of data in anticipation of the impending retry by the initiator. If the FRAME# signal was found to be deasserted when the STOP# signal was asserted, the bridge control circuitry 440 retrieves only one data phase of data. Approaches for determining the amount of data to prefetch are discussed in greater detail below.
- a second illustrative technique involves monitoring the behavior of the initiation for a predetermined number of clock cycles after the FRAME# signal is asserted to identify if the initiator commits to multiple data phases.
- the predetermined number of clock cycles is three.
- FIG. 6 is a timing diagram illustrating this technique.
- the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively.
- the bridge 112 claims the transaction, and monitors the behavior of the initiator to determine if the initiator commits to multiple data phases on or before the third clock cycle following the assertion of the FRAME# signal (i.e., CLK4). If the initiator does not commit prior to the third clock cycle, the bridge control circuitry 440 assumes a single data phase is required, and fetches only one data phase of data.
- the PCI specification does not impose a requirement on the initiator to assert the IRDY# signal within a certain number of clock cycles after asserting the FRAME# signal.
- the initiator does not assert the IRDY# signal until after CLK4, and thus, at the determination point, the bridge control circuitry 440 determines that the initiator has not committed to a multiple phase transfer and assumes that a single data phase is required. It is evident from the behavior of the initiator after CLK4 that the initiator intended to transfer during more than one data phase (i.e., the FRAME# signal and the IRDY# signal are both asserted at CLK5, but this intention is not detected by the bridge control circuitry 440 . Instead, the bridge control circuitry 440 asserts the STOP# signal at CLK5 in response to the lack of commitment on the part of the initiator prior to CLK4.
- the bridge control circuitry 440 would have detected the initiators multiple phase intention at CLK2, and would have asserted the STOP# signal at CLK3, without waiting the predetermined number of clock cycles.
- a third illustrative technique involves simply sampling the FRAME# signal when the initiator asserts the IRDY# signal. If the FRAME# signal is asserted coincident with the IRDY# signal, as evident at CLK5 of FIG. 7, the initiator has committed to a multiple data phase transfer. Accordingly, the bridge control circuitry 440 asserts the STOP# signal at CLK6, following the positive determination, and proceeds to prefetch multiple phases of data.
- This technique although the most accurate, has the potential to introduce the most latency, as there is no restriction imposed by the PCI specification on the time between the assertion of the FRAME# signal and the subsequent assertion of the IRDY# signal.
- the choice of how much data to prefetch in response to determining that the initiator intends to complete multiple data phases is application dependent.
- the bridge control circuitry 440 might prefetch up to the next cache line boundary, the next 512 byte boundary, or the next 4 kB boundary. Alternatively, the amount of data might depend on the available space in the bridge 112 .
- a device in the computer system 100 knowingly accessing a non-speculative region should be restricted to using only single data phase MR commands. In other words, multiple data phase read commands should be reserved only for accessing known speculative memory regions.
- the bridge includes a configuration register 442 for selectively enabling or disabling the MR promotion function of the bridge control circuitry 440 for any or all of the PCI slots (not shown) subordinate to the bridge 112 .
- the configuration register 442 stores a plurality of MR promotion bits, one for each subordinate device in its private configuration space.
- configuration software executing on the computer system 100 may choose to enable or disable the MR promotion function for each of the slots.
- the configuration software determines the type of device installed, and may compare this determination against a list of devices known to function well with MR promotion, or alternatively, to a list of devices known to have problems with MR promotion.
- the technique may be employed in any number of devices.
- the hard disk resource 142 , 144 may have a high latency as compared to the other devices accessing it.
- the hard disk resource 142 , 144 may implement a buffering technique at least partially similar to that used in the bridge 112 , wherein a retry is forced while the data is buffered.
- the hard disk resource 142 , 144 may advantageously use the MR promotion techniques described herein to reduce latencies and/or tenancies on its associated bus 118 .
- the network adapter 146 may advantageously implement MR promotion techniques.
- MR promotion may be used in peer-to-peer transaction, as well as traversing transactions.
- any device controlling data may implement MR promotion techniques in response to any received read transaction for which data is not immediately available.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Systems (AREA)
Abstract
A device for providing data includes a data source, a bus interface, a data buffer, and control logic. The bus interface is coupled to a plurality of control lines of a bus and adapted to receive a read request targeting the data source. The control logic is adapted to determine if the read request requires multiple data phases to complete based on the control lines, and to retrieve at least two data phases of data from the data source and store them in the data buffer in response to the read request requiring multiple data phases to complete. A method for retrieving data includes receiving a read request on a bus. The bus includes a plurality of control lines. It is determined if the read request requires multiple data phases to complete based on the control lines. At least two data phases of data are retrieved from a data source in response to the read request requiring multiple data phases to complete. The at least two data phases of data are stored in a data buffer.
Description
- 1. Field of the Invention
- This invention relates generally to communication between devices on different buses of a computer system, and, more particularly, to a method and apparatus for promoting memory read commands and advantageously prefetch data to reduce bus latency.
- 2. Description of the Related Art
- Computer systems of the PC type typically employ an expansion bus to handle various data transfers and transactions related to I/O and disk access. The expansion bus is separate from the system bus or from the bus to which the processor is connected, but is coupled to the system bus by a bridge circuit.
- A variety of expansion bus architectures have been used in the art, including the ISA (Industry Standard Architecture) expansion bus, an 8-Mhz, 16-bit device and the EISA (Extension to ISA) bus, a 32-bit bus clocked at 8-Mhz. As performance requirements increased, with faster processors and memory, and increased video bandwidth needs, high performance bus standard were developed. These standards included the Micro Channel architecture, a 10-Mhz, 32-bit bus; an enhanced Micro Channel, using a 64-bit data width and 64-bit data streaming; and the VESA (Video Electronics Standards Association) bus, a 33 MHz, 32-bit local bus specifically adapted for a 486 processor.
- More recently, the PCI (Peripheral Component Interconnect) bus standard was proposed by Intel Corporation as a longer-term expansion bus standard specifically addressing burst transfers. The original PCI bus standard has been revised several times, with the current standard being Revision 2.1, available from the PCI Special Interest Group, located in Portland, Oreg. The PCI Specification, Rev. 2.1, is incorporated herein by reference in its entirety. The PCI bus provides for 32-bit or 64-bit transfers at 33 or 66 MHz. It can be populated with adapters requiring fast access to each other and/or with system memory, and that can be accessed by the host processor at speeds approaching that of the processor's native bus speed. A 64-bit, 66-MHz PCI bus has a theoretical maximum transfer rate of 528 MByte/sec. All read and write transfers over the bus may be burst transfers. The length of the burst may be negotiated between initiator and target devices, and may be any length.
- A CPU operates at a much faster clock rate and data access rate than most of the resources it accesses via a bus. In earlier processors, such as those commonly available when the ISA bus and EISA bus were designed, this delay in reading data from a resource on the bus was handled by inserting wait states. When a processor requested data that was not immediately available due to a slow memory or disk access, the processor merely marked time using wait states, doing no useful work, until the data finally became available. To make use of this delay time, a processor such as the Pentium Pro (P6), offered by Intel Corporation, provides a pipelined bus that allows multiple transactions to be pending on the bus at one time, rather than requiring one transaction to be finished before starting another. Also, the P6 bus allows split transactions, i.e., a request for data may be separated from the delivery of the data by other transactions on the bus. The P6 processor uses a technique referred to as “deferred transaction” to accomplish the split on the bus. In a deferred transaction, a processor sends out a read request, for example, and the target sends back a “defer” response, meaning that the target will send the data onto the bus, on its own initiative, when the data becomes available.
- The PCI bus specification as set forth above does not provide for split transactions. There is no mechanism for issuing a “deferred transaction” signal, nor for generating the deferred data initiative. Accordingly, while a P6 processor can communicate with resources such as main memory that are on the processor bus itself using deferred transactions, this technique is not used when communicating with disk drives, network resources, compatibility devices, etc., on an expansion bus.
- The PCI bus specification, however, provides a protocol for issuing delayed transactions. Delayed transactions use a retry protocol to implement efficient processing of the transactions. If an initiator initiates a request to a target and the target cannot provide the data quickly enough, a retry command is issued. The retry command directs the initiator to retry or “ask again” for the data at a later time. In delayed transaction protocol, the target does not simply sit idly by, awaiting the renewed request. Instead, the target initially records certain information, such as the address and command type associated with the initiator's request, and begins to assemble the requested information in anticipation of a retry request from the initiator. When the request is retried, the information can be quickly provided without unnecessarily tying up the system's buses.
- Differentiated commands are used in accordance with the PCI specification to indicate, or at least hint at, the amount of data required by the initiator. A memory read (MR) command does not provide any immediate indication as to the length of the intended read. The read is terminated based on logic signals driven on the bus by the initiator. A memory read line (MRL) command, on the other hand, indicates that the initiator intends to read at least one cache line (e.g., 32 bytes) of data. A memory read multiple command (MRM) indicates that the initiator is likely to read more than one cache line of data. Based on the command received, the bridge prefetches data and stores it in a buffer in anticipation of the retried transaction. The amount of data prefetched depends on the amount the initiator is likely to require. Efficiency is highest when the amount of prefetched data most closely matches the amount of data required.
- Prefetching in response to MRL and MRM commands is relatively uncomplicated, because, by the very nature of the command, the bridge knows to prefetch at least one, and likely more than one, cache line. The amount of data required by an initiator of an MR command, on the other hand, is not readily apparent. Initiators may issue MR commands even if they know they will require multiple data phases. For example, the PCI specification recommends, but does not require, that initiators use an MRL or an MRM command only if the starting address lies on a cache line boundary. Accordingly, a device following this recommendation would issue one or more MR commands until a cache line boundary is encountered, and would then issue the appropriate MRL or MRM command. Also, some devices, due to their vintage or their simplicity, are not equipped to issue MRL or MRM commands, and use MR commands exclusively.
- To illustrate the difficulties of anticipating the amount of data required by the initiator of an MR command, FIGS. 1A through 1D provide timing diagrams of exemplary MR transactions on a PCI bus. For clarity, only those PCI control signals useful in illustrating the examples are shown. The PCI bus uses shared address/data (AD) lines and shared command/byte enable (C/BE#) lines. In accordance with the PCI specification, a turnaround cycle is required on all signals that may be driven by more than one agent. In the case of the AD lines, the initiator drives the address and the target drives the data. The turnaround cycle is used to avoid contention when one agent stops driving a signal and another agent begins driving the signal. A turnaround cycle is indicated on the timing diagrams as two arrows pointing at each others' tail.
- FIG. 1A illustrates an MR command in which the initiator requires multiple data phases to complete the transaction. In this illustration, the target and initiator reside on the same PCI bus, and the target is ready to supply the data when requested. The initiator asserts a FRAME# signal before the rising edge of a first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. During a third cycle, CLK3, the initiator asserts the IRDY# signal to indicate that it is ready to receive data. The target also asserts the TRDY# signal at CLK3 (i.e., after the turnaround cycle) to signal that valid data is present on the AD lines. In accordance with the PCI specification, the initiator must deassert FRAME# before the last data phase. Because the FRAME# signal remains asserted at CLK3, the target knows that more data is required. Data transfer continues between the initiator and target during cycles CLK4 and CLK5. The initiator deasserts the FRAME# signal before CLK5 to indicate that Data3 is the last data phase. The initiator continues to assert the IRDY# signal until after the last data phase has been completed.
- FIG. 1B illustrates an MR command in which the initiator requires only one data phase to complete the transaction. Again, the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. During the third cycle, CLK3, the initiator asserts the IRDY# signal to indicate that it is ready to receive data. The target asserts the TRDY# signal at CLK3 (i.e., after the turnaround cycle) to signal that valid data is present on the AD lines. Because the initiator must deassert frame before the last data phase, the FRAME# signal is deasserted before CLK3. The target then knows that no more data is required. The initiator continues to assert the IRDY# signal during the transfer of the data at CLK3, and deasserts it thereafter.
- From the examples of FIGS. 1A and 1B, it is clear that the determination of the amount of data required by the initiator may not be determined until well into the transaction. FIGS. 1A and 1B illustrated MR transaction between devices on the same PCI bus. FIGS. 1C and 1D illustrates an MR transaction where the target resides on a different PCI bus than the initiator, and is subordinate to a bridge device.
- As shown in FIG. 1C, the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. The bridge claims the transaction, and because no data is readily available forces a retry by asserting the STOP# signal during CLK2. In response to the STOP# signal, the target deasserts the FRAME# signal before CLK3. The bridge then deasserts STOP# at CLK4. The bridge, not knowing how much data the initiator requires, conservatively assumes the transaction is a single data phase transaction and retrieves the data.
- At some later time, as shown in FIG. 1D, the initiator retries the request. Again, the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. The bridge, now in possession of the data, allows the transaction to proceed. During the third cycle, CLK3, the initiator asserts the IRDY# signal to indicate that it is ready to receive data. The bridge asserts the TRDY# signal at CLK3 to signal that valid data is present on the AD lines. The bridge also asserts the STOP# signal at CLK3 to indicate it cannot provide any further data. Even though the initiator desired more than one data phase to complete the transaction, as indicated by the FRAME# signal being asserted during the transfer of Data1, the transaction is terminated.
- The initiator is then forced to issue a new transaction, in accordance with FIG. 1C for the next data phase. The cycle of FIGS. 1C and 1D repeats until the initiator has received its requested data. The situation of FIGS. 1C and 1D illustrate an inefficiency introduced by the use of an MR command. It may take many such exchanges to complete the data transfer, thus increasing the number of tenancies (i.e., exchanges between an initiator and a target) on the bus. Also, the initiator, bridge, and target must compete for bus time with other devices on their respective buses, thus increasing the total number of cycles required to complete the transaction beyond those required just to complete the evolutions of FIGS. 1C and 1D.
- Techniques have been developed in the art to attempt to increase the efficiency of MR transactions traversing bridges. One such technique involves storing an MR promotion bit for each of the devices subordinate to a bridge in the private configuration space of the bridge. If the bit is asserted, MR commands are automatically promoted, and multiple data phases of data are prefetched. The decision on whether to set the promotion bit depends on knowledge of the device being accessed. Certain devices have undesirable read “side effects.” For example, an address might refer to a first-in-first-out (FIFO) register. A read to a FIFO increments the pointer of the FIFO to the next slot. If the prefetching conducted in response to the assertion of the promotion bit hits the address of the FIFO, it would increment, and a subsequent read targeting the FIFO would retrieve the wrong data, possible causing undesirable operation or a deadlock condition. Memory regions with such undesirable side effects are referred to as non-speculative regions, and memory regions where prefetching is allowable is referred to as speculative memory regions.
- The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.
- One aspect of the present invention is seen in a device for providing data. The device includes a data source, a bus interface, a data buffer, and control logic. The bus interface is coupled to a plurality of control lines of a bus and adapted to receive a read request targeting the data source. The control logic is adapted to determine if the read request requires multiple data phases to complete based on the control lines, and to retrieve at least two data phases of data from the data source and store them in the data buffer in response to the read request requiring multiple data phases to complete.
- Another aspect of the present invention is seen in a method for retrieving data. The method includes receiving a read request on a bus. The bus includes a plurality of control lines. It is determined if the read request requires multiple data phases to complete based on the control lines. At least two data phases of data are retrieved from a data source in response to the read request requiring multiple data phases to complete. The at least two data phases of data are stored in a data buffer.
- The invention may be understood by reference to the following description taken in conjunction with the accompanying drawings, in which like reference numerals identify like elements, and in which:
- FIGS. 1A through 1D illustrate timing diagrams of typical prior art bus commands;
- FIG. 2 is a simplified block diagram of a computer system in accordance with the present invention;
- FIG. 3A is a diagram illustrating typical lines included in a processor bus of FIG. 2;
- FIG. 3B is a diagram illustrating typical lines included in a peripheral component interconnect bus of FIG. 2;
- FIG. 4 is a simplified block diagram of a bridge device of FIG. 2; and
- FIGS. 5 through 7 are timing diagrams of bus transactions in accordance with the present invention.
- While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
- Illustrative embodiments of the invention are described below. In the interest of clarity, not all features of an actual implementation are described in this specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which will vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.
- Referring to FIG. 2, a
computer system 100 in accordance with the present invention is shown. Thecomputer system 100 includesmultiple processors 102 in the illustrated example, although more or less may be employed. Theprocessors 102 are connected to aprocessor bus 104. Theprocessor bus 104 operates based on the processor clock (not shown), so if theprocessors 102 are 166 MHz or 200 MHz devices (e.g., the clock speed of a Pentium Pro processor), for example, then theprocessor bus 104 is operated on some multiple of the base clock rate. Amain memory 106 is coupled to theprocessor bus 104 through amemory controller 108. In the illustrated embodiment, theprocessors 102 each have a level-twocache 110 as a separate chip within the same package as the CPU chip itself, and the CPU chips have level-one data and instruction caches (not shown) included on-chip. - Host bridges112, 114 are provided between the
processor bus 104 and thePCI buses 116, 118, respectively. Twohost bridges many computer systems 100 would require only one, andother computer system 100 may use more than two. In one example, up to four of the host bridges 112, 114 may be used. The reason for using more than onehost bridge - The
primary host bridge 112, in the illustrated example, carries traffic for “legacy” devices, such as anEISA bridge 120 coupled to anEISA bus 122, a keyboard/mouse controller 124, avideo controller 126 coupled to amonitor 128, aflash ROM 130, aNVRAM 132, and acontroller 134 for afloppy drive 136 and serial/parallel ports 138. Thesecondary host bridge 114 does not usually accommodate any PC legacy items. Coupled to the PCI bus 118 by thehost bridge 114 to theprocessor bus 104 are other resources such as aSCSI disk controller 140 forhard disk resources network adapter 146 for accessing anetwork 148. A potentially large number of other stations (not shown) are coupled to thenetwork 148. Thus, transactions on thebuses network 148. - The
computer system 100 embodiment illustrated in FIG. 1 is that of a server, rather than a standalone computer system, but the features described herein may be used as well in a workstation or standalone desktop computer. Some components, such as thecontrollers computer system 100. If additional slots (not shown) are needed, a PCI-to-PCI bridge 150 may be placed on the PCI bus 118 to access another PCI bus 152. The additional PCI bus 152 does not provide additional bandwidth, but allows more adapter cards to be added. Various other server resources can be connected to thePCI buses 116, 118, 152 using commercially-available controller cards, such as CD-ROM drives, tape drives, modems, connections to ISDN lines for internet access, etc. (all not shown). - Traffic between devices on the
concurrent PCI buses 116, 118 and themain memory 106 must traverse theprocessor bus 104. Peer-to-peer transactions are allowed between a master and target device on thesame PCI bus 116, 118, and are referred to as “standard” peer-to-peer transactions. Transactions between a master on onePCI bus 116 and a target device on another PCI bus 118 must traverse theprocessor bus 104, and these are referred to as “traversing” transactions. - Referring briefly to FIG. 3A, the
processor bus 104 contains a number of standard signal or data lines as defined in the specification for theparticular processor 102 being used. In addition, certain special signals are included for the unique operation of thebridges processor bus 104 contains thirty-threeaddress lines 300, sixty-fourdata lines 302, and a number ofcontrol lines 304. Most of thecontrol lines 304 are not required to promote understanding of the present invention, and, as such, are not described in detail herein. Also, the address anddata lines - Referring now to FIG. 3B, the
PCI buses 116, 118, 152 also contain a number of standard signal and data lines as defined in the PCI specification. ThePCI buses 116, 118, 152 are of a multiplexed address/data type, and contain sixty-fourAD lines 310, eight command/byte-enablelines 312, and a number of control lines (enumerated below). The particular control lines used in the illustration of the present invention are a frame line 314 (FRAME#), an initiator ready line 316 (IRDY#), a target ready line 318 (TRDY#), a stop line 320 (STOP#), and a clock line 322 (CLK). - Turning now to FIG. 4, a simplified block diagram showing the
host bridge 112 in greater detail is provided. Thehost bridge 114 is of similar construction to that of thehost bridge 112 depicted in FIG. 4. For simplicity, thehost bridge 112 is hereinafter referred to as thebridge 112. Thebridge 112 includes a processorbus interface circuit 400 serving to acquire data and signals from theprocessor bus 104 and to drive theprocessor bus 104 with signals and data. A PCIbus interface circuit 402 serves to drive thePCI bus 116 and to acquire signals and data from thePCI bus 116. Internally, thebridge 112 is divided into an upstream queue block 404 (US QBLK) and a downstream queue block 406 (DS QBLK). The term downstream refers to any transaction going from theprocessor bus 104 to thePCI bus 116, and the term upstream refers to any transaction going from thePCI bus 116 back toward theprocessor bus 104. Thebridge 112 interfaces on the upstream side with theprocessor bus 104 which operates at a bus speed related to the processor clock rate, which is, for example, 133 MHz, 166 MHz, or 200 MHz forPentium Pro processors 102. On the downstream side, thebridge 112 interfaces with thePCI bus 116 operating at 33 or 66 MHz. These bus frequencies are provided for illustrative purposes. Application of the invention is not limited by the particular bus speeds selected. - One function of the
bridge 112 is to serve as a buffer betweenasynchronous buses processor bus 104 has separate address anddata lines PCI bus 116 uses multiplexed address anddata lines 310. To accomplish these translations, all bus transactions are buffered in FIFOs. - For transactions traversing the
bridge 112, all memory writes are posted writes and all reads are split transactions. A memory write transaction initiated by one of theprocessors 102 on theprocessor bus 104 is posted to the processorbus interface circuit 400, and theprocessor 102 continues with instruction execution as if the write had been completed. A read requested by one of theprocessors 102 is not immediately implemented, due to mismatch in the speed of operation of all of the data storage devices (except for caches) compared to the processor speed, so the reads are all treated as split transactions. Aninternal bus 408 conveysprocessor bus 104 write transactions or read data from the processorbus interface circuit 400 to a downstream delayed completion queue (DSDCQ) 410 and its associatedRAM 412, or to a downstream posted write queue (DSPWQ) 414 and its associatedRAM 416. Read requests going downstream are stored in a downstream delayed request queue (DSDRQ) 418. Anarbiter 420 monitors all pending downstream posted writes and read requests via valid bits onlines 422 in thedownstream queues PCI bus 116 according to the read and write ordering rules set forth in the PCI bus specification. Thearbiter 420 is coupled to the PCIbus interface circuit 402 for transferring commands thereto. - The components of the
upstream queue block 404 are similar to those of thedownstream queue block 406, i.e., thebridge 112 is essentially symmetrical for downstream and upstream transactions. A memory write transaction initiated by a device on thePCI bus 116 is posted to the PCIbus interface circuit 402 and the master device proceeds as if the write had been completed. A read requested by a device on thePCI bus 116 is not implemented at once by a target device on theprocessor bus 104, so these reads are again treated as delayed transactions. Aninternal bus 424 conveys PCI bus write transactions or read data from the PCIbus interface circuit 402 to an upstream delayed completion queue (USDCQ) 426 and its associatedRAM 428, or to an upstream posted write queue (USPWQ) 430 and its associatedRAM 432. Read requests going upstream are stored in an upstream delayed request queue (USDRQ) 434. Anarbiter 436 monitors all pending upstream posted writes and read requests via valid bits onlines 438 in theupstream queues processor bus 104 according to the read and write ordering rules set forth in the PCI bus specification. Thearbiter 436 is coupled to the processorbus interface circuit 400 for transferring commands thereto. - The structure and functions of the FIFO buffers or queues in the
bridge 112 is now described. Each buffer in a delayedrequest queue USDRQ 434 holds requests originating from masters on thePCI bus 116 and directed to targets on theprocessor bus 104 or the PCI bus 118. In the illustrated embodiment, theUSDRQ 434 and has eight buffers, corresponding one-to-one with eight buffers in theDSDCQ 410. TheDSDRQ 418 holds requests originating on theprocessor bus 104 and directed to targets on thePCI bus 116. In the illustrated embodiment, theDSDRQ 418 and has four buffers, corresponding one-to-one with four buffers in theUSDCQ 426. TheDSDRQ 418 is loaded with a request from the processorbus interface circuit 400 and theUSDCQ 426. Similarly, theUSDRQ 434 is loaded from the PCIbus interface circuit 402 and theDSDCQ 410. Requests are routed through theDCQ processor bus 104 is latched into the processorbus interface circuit 400 and the transaction information is applied to theUSDCQ 426, where it is compared with all enqueued prior downstream read requests. If the current request is a duplicate, it is discarded if the data is not yet available to satisfy the request. If it is not a duplicate, the information is forwarded to theDSDRQ 418. The same mechanism is used for upstream read requests. Information defining the request is latched into the PCIbus interface circuit 402 from thePCI bus 116, forwarded toDSDCQ 410, and, if not a duplicate of an enqueued request, forwarded toUSDRQ 434. - The delayed
completion queues dual port RAMs interfaces queue block appropriate DCQ DCQ DRQ opposite bus appropriate arbiter appropriate DCQ DCQ DCQ - The
DSDCQ 410 stores status/read data for PCI-to-host delayed requests, and theUSDCQ 426 stores status/read data for Host-to-PCI delayed or deferred requests. EachDSDCQ 410 buffer stores eight cache lines (256-bytes of data), and there are eight buffers (total data storage=2 kB). The four buffers in theUSDCQ 426, on the other hand, each store only 32 bytes (i.e., a cache line) of data (total data storage=128-Bytes). The upstream and downstream operation is slightly different in this regard. - The
bridge 112 includesbridge control circuitry 440 that prefetches data into the DSDCQ buffers 410 on behalf of the master, attempting to stream data with zero wait states after the delayed request completes. TheDSDCQ 410 buffers are kept coherent with theprocessor bus 104 via snooping, which allows the buffers to be discarded as seldom as possible. Requests going the other direction may use prefetching, as described in greater detail below, however, since many PCI memory regions have “read side effects” (e.g., stacks and FIFOs), thebridge control circuitry 440 attempts to prefetch data into these buffers on behalf of the master only under controlled circumstances. In the illustrated embodiment, theUSDCQ 426 buffers are flushed as soon as their associated deferred reply completes. - The posted
write queues port RAM memories DSPWQ 414 stores 32 bytes (i.e., a cache line) of data plus commands for a host-to-PCI write. The four buffers in theDSPWQ 414 provide a total data storage of 128 bytes. Each of the four buffers inUSPWQ 430 stores 256 bytes of data plus commands for a PCI-to-host write, i.e., eight cache lines (total data storage=1 kB). Burst memory writes that are longer than eight cache lines may cascade continuously from one buffer to the next in theUSPWQ 430. Often, an entire page (e.g., 4 kB) is written from thedisk 142 to themain memory 106 in a virtual memory system that is switching between tasks. For this reason, thebridge 112 has more capacity for bulk upstream memory writes than for downstream writes. - The
arbiters QBLKs arbiters DRQ DCQ QBLK - As described above, there is a risk associated with prefetching data in response to an upstream read command due to potential side effects. However, the conservative approach of never prefetching for upstream reads, as illustrated in FIGS. 1A through 1D, results in costly inefficiencies. The risk of prefetching is lessened if the anticipated behavior of the initiator can be predicted. For example, if an initiator issues an MR command, and it can be identified that the initiator is requesting more than one data phase of data, it is more likely that prefetching data will not cause an unintended side effect.
- The
bridge control circuitry 440, as described in reference to FIGS. 5, 6, and 7, is adapted to detect if an initiator intends to retrieve multiple phases of data with a burst MR command. There are numerous techniques for making such a determination, and several are described herein for illustrative purposes. As described above, it often takes multiple clock cycles before the behavior of an initiator can be determined. The techniques described below, although using different approaches, attempt to identify the intentions of an initiator with respect to the number of data phases desired and prefetch data, if possible, to reduce the inefficiencies described above. In response to determining that the initiator intends to complete multiple data phases, thebridge control circuitry 440 prefetches multiple data phases of data and stores them in theappropriate DCQ - A first illustrative technique involves evaluating the behavior of the initiator when the bridge issues a retry request (i.e., by asserting the STOP# signal). FIG. 5 illustrates a timing diagram of a read transaction traversing the
bridge 112. The initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. Thebridge 112 claims the transaction, and because no data is readily available forces a retry by asserting the STOP# signal during CLK3. When the STOP# signal is asserted, thebridge control circuitry 440 samples the FRAME# signal and the IRDY# signal to determine the intentions of the initiator with respect to the number of data phases requested. As described above in reference to FIG. 1B, an initiator requesting a single data phase must deassert the FRAME# signal before asserting the IRDY# signal to signify that the last data phase is being requested. In FIG. 5, coincident with the STOP# signal, the FRAME# signal and the IRDY# signal are both asserted, indicating that the initiator intends to request multiple data phases. Accordingly, thebridge control circuitry 440 prefetches more than just a single data phase of data in anticipation of the impending retry by the initiator. If the FRAME# signal was found to be deasserted when the STOP# signal was asserted, thebridge control circuitry 440 retrieves only one data phase of data. Approaches for determining the amount of data to prefetch are discussed in greater detail below. - A second illustrative technique involves monitoring the behavior of the initiation for a predetermined number of clock cycles after the FRAME# signal is asserted to identify if the initiator commits to multiple data phases. In the illustrated embodiment, the predetermined number of clock cycles is three. FIG. 6 is a timing diagram illustrating this technique. Again, the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. The
bridge 112 claims the transaction, and monitors the behavior of the initiator to determine if the initiator commits to multiple data phases on or before the third clock cycle following the assertion of the FRAME# signal (i.e., CLK4). If the initiator does not commit prior to the third clock cycle, thebridge control circuitry 440 assumes a single data phase is required, and fetches only one data phase of data. - The PCI specification does not impose a requirement on the initiator to assert the IRDY# signal within a certain number of clock cycles after asserting the FRAME# signal. In FIG. 6, the initiator does not assert the IRDY# signal until after CLK4, and thus, at the determination point, the
bridge control circuitry 440 determines that the initiator has not committed to a multiple phase transfer and assumes that a single data phase is required. It is evident from the behavior of the initiator after CLK4 that the initiator intended to transfer during more than one data phase (i.e., the FRAME# signal and the IRDY# signal are both asserted at CLK5, but this intention is not detected by thebridge control circuitry 440. Instead, thebridge control circuitry 440 asserts the STOP# signal at CLK5 in response to the lack of commitment on the part of the initiator prior to CLK4. - If the initiator had responded in the manner previously described in FIG. 5, the
bridge control circuitry 440 would have detected the initiators multiple phase intention at CLK2, and would have asserted the STOP# signal at CLK3, without waiting the predetermined number of clock cycles. - A tradeoff exists between the number of cycles selected for evaluation and the accuracy of the determination of the initiator's intention. A larger number of clock cycles yields more accurate prediction, but takes longer to complete.
- A third illustrative technique involves simply sampling the FRAME# signal when the initiator asserts the IRDY# signal. If the FRAME# signal is asserted coincident with the IRDY# signal, as evident at CLK5 of FIG. 7, the initiator has committed to a multiple data phase transfer. Accordingly, the
bridge control circuitry 440 asserts the STOP# signal at CLK6, following the positive determination, and proceeds to prefetch multiple phases of data. This technique, although the most accurate, has the potential to introduce the most latency, as there is no restriction imposed by the PCI specification on the time between the assertion of the FRAME# signal and the subsequent assertion of the IRDY# signal. - The choice of how much data to prefetch in response to determining that the initiator intends to complete multiple data phases is application dependent. The
bridge control circuitry 440 might prefetch up to the next cache line boundary, the next 512 byte boundary, or the next 4 kB boundary. Alternatively, the amount of data might depend on the available space in thebridge 112. - To further safeguard against unintentionally prefetching a region with read side effects, a device in the
computer system 100 knowingly accessing a non-speculative region should be restricted to using only single data phase MR commands. In other words, multiple data phase read commands should be reserved only for accessing known speculative memory regions. - The bridge includes a configuration register442 for selectively enabling or disabling the MR promotion function of the
bridge control circuitry 440 for any or all of the PCI slots (not shown) subordinate to thebridge 112. The configuration register 442 stores a plurality of MR promotion bits, one for each subordinate device in its private configuration space. During power-up, configuration software executing on thecomputer system 100 may choose to enable or disable the MR promotion function for each of the slots. The configuration software determines the type of device installed, and may compare this determination against a list of devices known to function well with MR promotion, or alternatively, to a list of devices known to have problems with MR promotion. - Although the preceding description focused on the application of the MR promotion techniques in a
bridge 112, it is contemplated that the technique may be employed in any number of devices. For example, thehard disk resource hard disk resource bridge 112, wherein a retry is forced while the data is buffered. Thehard disk resource network 148 and accessing data present somewhere on thecomputer system 100. Accordingly, thenetwork adapter 146 may advantageously implement MR promotion techniques. As such, MR promotion may be used in peer-to-peer transaction, as well as traversing transactions. Generally speaking, any device controlling data may implement MR promotion techniques in response to any received read transaction for which data is not immediately available. - The particular embodiments disclosed above are illustrative only, as the invention may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope and spirit of the invention. Accordingly, the protection sought herein is as set forth in the claims below.
Claims (63)
1. A bridge device for communicating between a first and a second bus, comprising:
a bus interface coupled to a plurality of control lines of the first bus and adapted to receive a read request targeting the second bus;
a data buffer; and
control logic adapted to determine if the read request requires multiple data phases to complete based on the control lines, and to retrieve at least two data phases of data from the second bus and store them in the data buffer in response to the read request requiring multiple data phases to complete.
2. The bridge device of claim 1 , wherein the control lines include a stop line, and the control logic is adapted to assert a stop signal on the stop line after determining that the read request requires multiple data phases.
3. The bridge device of claim 1 , wherein the control lines include a frame line and a initiator ready line, and the control logic is adapted to sample a frame signal on the frame line and an initiator ready signal on the initiator ready line to determine of the read request requires multiple data phases.
4. The bridge device of claim 3 , wherein the control logic is adapted to determine that the read request requires multiple data phases in response to the frame signal and the initiator ready signal being asserted concurrently.
5. The bridge device of claim 4 , wherein the control logic is adapted to determine that the read request requires multiple data phases in response to the frame signal and the initiator ready signal being asserted concurrently within a predetermined number of clock cycles.
6. The bridge device of claim 5 , wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently within the predetermined number of clock cycles.
7. The bridge device of claim 4 , wherein the control logic is adapted to sample the frame signal and the initiator ready signal in response to the initiator ready signal being asserted.
8. The bridge device of claim 7 , wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the initiator ready signal is asserted.
9. The bridge device of claim 3 , wherein the control lines include a stop line, and the control logic is adapted assert a stop signal on the stop line in response to data corresponding to the read request not being stored in the data buffer.
10. The bridge device of claim 9 , wherein the control logic is adapted to sample the frame signal and the initiator signal when asserting the stop signal.
11. The bridge device of claim 10 , wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the stop signal is asserted.
12. The bridge device of claim 1 , wherein the control logic is adapted to retrieve a plurality of data phases of data in response to the read request requiring multiple data phases until a cache line boundary is reached.
13. The bridge device of claim 1 , wherein the control logic is adapted to retrieve a plurality of data phases of data in response to the read request requiring multiple data phases until the data buffer is full.
14. The bridge device of claim 5 , wherein the predetermined number of cycles is between two and five.
15. The bridge device of claim 5 , wherein the predetermined number of cycles is at least two.
16. A device for providing data, comprising:
a data source;
a bus interface coupled to a plurality of control lines of a bus and adapted to receive a read request targeting the data source;
a data buffer; and
control logic adapted to determine if the read request requires multiple data phases to complete based on the control lines, and to retrieve at least two data phases of data from the data source and store them in the data buffer in response to the read request requiring multiple data phases to complete.
17. The device of claim 16 , wherein the control lines include a stop line, and the control logic is adapted to assert a stop signal on the stop line after determining that the read request requires multiple data phases.
18. The device of claim 16 , wherein the control lines include a frame line and a initiator ready line, and the control logic is adapted to sample a frame signal on the frame line and an initiator ready signal on the initiator ready line to determine of the read request requires multiple data phases.
19. The device of claim 18 , wherein the control logic is adapted to determine that the read request requires multiple data phases in response to the frame signal and the initiator ready signal being asserted concurrently.
20. The device of claim 19 , wherein the control logic is adapted to determine that the read request requires multiple data phases in response to the frame signal and the initiator ready signal being asserted concurrently within a predetermined number of clock cycles.
21. The device of claim 20 , wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently within the predetermined number of clock cycles.
22. The device of claim 19 , wherein the control logic is adapted to sample the frame signal and the initiator ready signal in response to the initiator ready signal being asserted.
23. The device of claim 22 , wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the initiator ready signal is asserted.
24. The device of claim 18 , wherein the control lines include a stop line, and the control logic is adapted assert a stop signal on the stop line in response to data corresponding to the read request not being stored in the data buffer.
25. The device of claim 24 , wherein the control logic is adapted to sample the frame signal and the initiator signal when asserting the stop signal.
26. The device of claim 25 , wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the stop signal is asserted.
27. The device of claim 16 , wherein the control logic is adapted to retrieve a plurality of data phases of data in response to the read request requiring multiple data phases until a cache line boundary is reached.
28. The device of claim 16 , wherein the control logic is adapted to retrieve a plurality of data phases of data in response to the read request requiring multiple data phases until the data buffer is full.
29. The device of claim 16 , wherein the data source comprises at least one of a second bus, a disk drive, and a network.
30. The device of claim 20 , wherein the predetermined number of cycles is between two and five.
31. The device of claim 20 , wherein the predetermined number of cycles is at least two.
32. A method for retrieving data, comprising:
receiving a read request on a bus, the bus including a plurality of control lines;
determining that the read request requires multiple data phases to complete based on the control lines;
retrieving at least two data phases of data from a data source in response to the read request requiring multiple data phases to complete; and
storing the at least two data phases of data in a data buffer.
33. The method of claim 32 , wherein the control lines include a stop line, and the method further includes asserting a stop signal on the stop line after determining that the read request requires multiple data phases.
34. The method of claim 32 , wherein the control lines include a frame line and a initiator ready line, and determining that the read request requires multiple data phases includes:
sampling a frame signal on the frame line; and
sampling an initiator ready signal on the initiator ready line.
35. The method of claim 34 , wherein determining that the read request requires multiple data phases includes determining that the frame signal and the initiator ready signal are asserted concurrently.
36. The method of claim 35 , wherein determining that the read request requires multiple data phases includes determining that the frame signal and the initiator ready signal are asserted concurrently within a predetermined number of clock cycles.
37. The method of claim 36 , further comprising retrieving only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently within the predetermined number of clock cycles.
38. The method of claim 35 , wherein determining that the read request requires multiple data phases includes sampling the frame signal and the initiator ready signal in response to the initiator ready signal being asserted.
39. The method of claim 38 , further comprising retrieving only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the initiator ready signal is asserted.
40. The method of claim 34 , wherein the control lines include a stop line, and the method further comprises asserting a stop signal on the stop line in response to data corresponding to the read request not being stored in the data buffer.
41. The method of claim 40 , wherein determining that the read request requires multiple data phases includes sampling the frame signal and the initiator signal when asserting the stop signal.
42. The method of claim 25 , further comprising retrieving only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the stop signal is asserted.
43. The method of claim 32 , wherein retrieving the at least two data phases of data includes retrieving a plurality of data phases of data until a cache line boundary is reached.
44. The method of claim 32 , wherein retrieving the at least two data phases of data includes retrieving a plurality of data phases of data until the data buffer is full.
45. The method of claim 32 , wherein retrieving the at least two data phases of data from the data source includes retrieving the at least two data phases of data from at least one of a second bus, a disk drive, and a network.
46. The method of claim 36 , wherein determining that the frame signal and the initiator ready signal are asserted concurrently within a predetermined number of clock cycles includes determining that the frame signal and the initiator ready signal are asserted concurrently within between two and five clock cycles.
47. The method of claim 36 , wherein determining that the frame signal and the initiator ready signal are asserted concurrently within a predetermined number of clock cycles includes determining that the frame signal and the initiator ready signal are asserted concurrently within at least two clock cycles.
48. A computer system, comprising:
a first bus having a plurality of control lines;
a second bus;
an initiating device coupled to the first bus and being adapted to initiate a read request targeting the target device;
a target device coupled to the second bus; and
a bridge device for communicating between the first and second buses, comprising:
a data buffer; and
control logic adapted to receive the read request, determine if the read request requires multiple data phases to complete based on the control lines, retrieve at least two data phases of data from the target device, and store the at least two data phases of data in the data buffer in response to the read request requiring multiple data phases to complete.
49. The computer system of claim 48 , wherein the control lines include a stop line, and the control logic is adapted to assert a stop signal on the stop line after determining that the read request requires multiple data phases.
50. The computer system of claim 48 , wherein the control lines include a frame line and a initiator ready line, the initiating device is adapted to assert a frame signal on the frame line and an initiator ready signal on the initiator ready line, and the control logic is adapted to sample the frame signal and the initiator ready signal to determine of the read request requires multiple data phases.
51. The computer system of claim 50 , wherein the control logic is adapted to determine that the read request requires multiple data phases in response to the frame signal and the initiator ready signal being asserted concurrently.
52. The computer system of claim 51 , wherein the control logic is adapted to determine that the read request requires multiple data phases in response to the frame signal and the initiator ready signal being asserted concurrently within a predetermined number of clock cycles.
53. The computer system of claim 52 , wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently within the predetermined number of clock cycles.
54. The computer system of claim 51 , wherein the control logic is adapted to sample the frame signal and the initiator ready signal in response to the initiator ready signal being asserted.
55. The computer system of claim 54 , wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the initiator ready signal is asserted.
56. The computer system of claim 50 , wherein the control lines include a stop line, and the control logic is adapted assert a stop signal on the stop line in response to data corresponding to the read request not being stored in the data buffer.
57. The computer system of claim 56 , wherein the control logic is adapted to sample the frame signal and the initiator signal when asserting the stop signal.
58. The computer system of claim 57 , wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the stop signal is asserted.
59. The computer system of claim 48 , wherein the control logic is adapted to retrieve a plurality of data phases of data in response to the read request requiring multiple data phases until a cache line boundary is reached.
60. The computer system of claim 48 , wherein the control logic is adapted to retrieve a plurality of data phases of data in response to the read request requiring multiple data phases until the data buffer is full.
61. The computer system of claim 52 , wherein the predetermined number of cycles is between two and five.
62. The computer system of claim 52 , wherein the predetermined number of cycles is at least two.
63. An apparatus, comprising:
means for receiving a read request on a bus, the bus including a plurality of control lines;
means for determining that the read request requires multiple data phases to complete based on the control lines;
means for retrieving at least two data phases of data from a data source in response to the read request requiring multiple data phases to complete; and
means for storing the at least two data phases of data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/640,891 US20040054841A1 (en) | 2000-04-06 | 2003-08-14 | Method and apparatus for promoting memory read commands |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/543,817 US6631437B1 (en) | 2000-04-06 | 2000-04-06 | Method and apparatus for promoting memory read commands |
US10/640,891 US20040054841A1 (en) | 2000-04-06 | 2003-08-14 | Method and apparatus for promoting memory read commands |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/543,817 Continuation US6631437B1 (en) | 2000-04-06 | 2000-04-06 | Method and apparatus for promoting memory read commands |
Publications (1)
Publication Number | Publication Date |
---|---|
US20040054841A1 true US20040054841A1 (en) | 2004-03-18 |
Family
ID=28675684
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/543,817 Expired - Fee Related US6631437B1 (en) | 2000-04-06 | 2000-04-06 | Method and apparatus for promoting memory read commands |
US10/640,891 Abandoned US20040054841A1 (en) | 2000-04-06 | 2003-08-14 | Method and apparatus for promoting memory read commands |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/543,817 Expired - Fee Related US6631437B1 (en) | 2000-04-06 | 2000-04-06 | Method and apparatus for promoting memory read commands |
Country Status (1)
Country | Link |
---|---|
US (2) | US6631437B1 (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7051162B2 (en) | 2003-04-07 | 2006-05-23 | Hewlett-Packard Development Company, L.P. | Methods and apparatus used to retrieve data from memory before such data is requested |
US7055005B2 (en) | 2003-04-07 | 2006-05-30 | Hewlett-Packard Development Company, L.P. | Methods and apparatus used to retrieve data from memory into a RAM controller before such data is requested |
US20110022746A1 (en) * | 2008-06-13 | 2011-01-27 | Phison Electronics Corp. | Method of dispatching and transmitting data streams, memory controller and memory storage apparatus |
US20140250095A1 (en) * | 2003-07-03 | 2014-09-04 | Ebay Inc. | Managing data transaction requests |
US20220138104A1 (en) * | 2019-03-15 | 2022-05-05 | Intel Corporation | Cache structure and utilization |
US11842423B2 (en) | 2019-03-15 | 2023-12-12 | Intel Corporation | Dot product operations on sparse matrix elements |
US11861761B2 (en) | 2019-11-15 | 2024-01-02 | Intel Corporation | Graphics processing unit processing and caching improvements |
US11934342B2 (en) | 2019-03-15 | 2024-03-19 | Intel Corporation | Assistance for hardware prefetch in cache access |
US12039331B2 (en) | 2017-04-28 | 2024-07-16 | Intel Corporation | Instructions and logic to perform floating point and integer operations for machine learning |
US12056059B2 (en) | 2019-03-15 | 2024-08-06 | Intel Corporation | Systems and methods for cache optimization |
US12175252B2 (en) | 2017-04-24 | 2024-12-24 | Intel Corporation | Concurrent multi-datatype execution within a processing resource |
US12361600B2 (en) | 2019-11-15 | 2025-07-15 | Intel Corporation | Systolic arithmetic on sparse data |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6631437B1 (en) * | 2000-04-06 | 2003-10-07 | Hewlett-Packard Development Company, L.P. | Method and apparatus for promoting memory read commands |
US6775732B2 (en) * | 2000-09-08 | 2004-08-10 | Texas Instruments Incorporated | Multiple transaction bus system |
US6973524B1 (en) * | 2000-12-14 | 2005-12-06 | Lsi Logic Corporation | Interface for bus independent core |
TW510992B (en) * | 2001-05-11 | 2002-11-21 | Realtek Semiconductor Corp | PCI device and method with shared expansion memory interface |
US6941408B2 (en) * | 2002-09-30 | 2005-09-06 | Lsi Logic Corporation | Bus interface system with two separate data transfer interfaces |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5802323A (en) * | 1996-06-14 | 1998-09-01 | Advanced Micro Devices, Inc. | Transparent burst access to data having a portion residing in cache and a portion residing in memory |
US5813036A (en) * | 1995-07-07 | 1998-09-22 | Opti Inc. | Predictive snooping of cache memory for master-initiated accesses |
US5835741A (en) * | 1996-12-31 | 1998-11-10 | Compaq Computer Corporation | Bus-to-bus bridge in computer system, with fast burst memory range |
US6092141A (en) * | 1996-09-26 | 2000-07-18 | Vlsi Technology, Inc. | Selective data read-ahead in bus-to-bus bridge architecture |
US6199131B1 (en) * | 1997-12-22 | 2001-03-06 | Compaq Computer Corporation | Computer system employing optimized delayed transaction arbitration technique |
US6301632B1 (en) * | 1999-03-26 | 2001-10-09 | Vlsi Technology, Inc. | Direct memory access system and method to bridge PCI bus protocols and hitachi SH4 protocols |
US6301630B1 (en) * | 1998-12-10 | 2001-10-09 | International Business Machines Corporation | Interrupt response in a multiple set buffer pool bus bridge |
US6314472B1 (en) * | 1998-12-01 | 2001-11-06 | Intel Corporation | Abort of DRAM read ahead when PCI read multiple has ended |
US6502157B1 (en) * | 1999-03-24 | 2002-12-31 | International Business Machines Corporation | Method and system for perfetching data in a bridge system |
US6581129B1 (en) * | 1999-10-07 | 2003-06-17 | International Business Machines Corporation | Intelligent PCI/PCI-X host bridge |
US6631437B1 (en) * | 2000-04-06 | 2003-10-07 | Hewlett-Packard Development Company, L.P. | Method and apparatus for promoting memory read commands |
-
2000
- 2000-04-06 US US09/543,817 patent/US6631437B1/en not_active Expired - Fee Related
-
2003
- 2003-08-14 US US10/640,891 patent/US20040054841A1/en not_active Abandoned
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5813036A (en) * | 1995-07-07 | 1998-09-22 | Opti Inc. | Predictive snooping of cache memory for master-initiated accesses |
US5802323A (en) * | 1996-06-14 | 1998-09-01 | Advanced Micro Devices, Inc. | Transparent burst access to data having a portion residing in cache and a portion residing in memory |
US6092141A (en) * | 1996-09-26 | 2000-07-18 | Vlsi Technology, Inc. | Selective data read-ahead in bus-to-bus bridge architecture |
US5835741A (en) * | 1996-12-31 | 1998-11-10 | Compaq Computer Corporation | Bus-to-bus bridge in computer system, with fast burst memory range |
US6199131B1 (en) * | 1997-12-22 | 2001-03-06 | Compaq Computer Corporation | Computer system employing optimized delayed transaction arbitration technique |
US6314472B1 (en) * | 1998-12-01 | 2001-11-06 | Intel Corporation | Abort of DRAM read ahead when PCI read multiple has ended |
US6301630B1 (en) * | 1998-12-10 | 2001-10-09 | International Business Machines Corporation | Interrupt response in a multiple set buffer pool bus bridge |
US6502157B1 (en) * | 1999-03-24 | 2002-12-31 | International Business Machines Corporation | Method and system for perfetching data in a bridge system |
US6301632B1 (en) * | 1999-03-26 | 2001-10-09 | Vlsi Technology, Inc. | Direct memory access system and method to bridge PCI bus protocols and hitachi SH4 protocols |
US6581129B1 (en) * | 1999-10-07 | 2003-06-17 | International Business Machines Corporation | Intelligent PCI/PCI-X host bridge |
US6631437B1 (en) * | 2000-04-06 | 2003-10-07 | Hewlett-Packard Development Company, L.P. | Method and apparatus for promoting memory read commands |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7055005B2 (en) | 2003-04-07 | 2006-05-30 | Hewlett-Packard Development Company, L.P. | Methods and apparatus used to retrieve data from memory into a RAM controller before such data is requested |
US7051162B2 (en) | 2003-04-07 | 2006-05-23 | Hewlett-Packard Development Company, L.P. | Methods and apparatus used to retrieve data from memory before such data is requested |
US20140250095A1 (en) * | 2003-07-03 | 2014-09-04 | Ebay Inc. | Managing data transaction requests |
US20110022746A1 (en) * | 2008-06-13 | 2011-01-27 | Phison Electronics Corp. | Method of dispatching and transmitting data streams, memory controller and memory storage apparatus |
US8812756B2 (en) * | 2008-06-13 | 2014-08-19 | Phison Electronics Corp. | Method of dispatching and transmitting data streams, memory controller and storage apparatus |
US12411695B2 (en) | 2017-04-24 | 2025-09-09 | Intel Corporation | Multicore processor with each core having independent floating point datapath and integer datapath |
US12175252B2 (en) | 2017-04-24 | 2024-12-24 | Intel Corporation | Concurrent multi-datatype execution within a processing resource |
US12039331B2 (en) | 2017-04-28 | 2024-07-16 | Intel Corporation | Instructions and logic to perform floating point and integer operations for machine learning |
US12217053B2 (en) | 2017-04-28 | 2025-02-04 | Intel Corporation | Instructions and logic to perform floating point and integer operations for machine learning |
US12141578B2 (en) | 2017-04-28 | 2024-11-12 | Intel Corporation | Instructions and logic to perform floating point and integer operations for machine learning |
US12066975B2 (en) * | 2019-03-15 | 2024-08-20 | Intel Corporation | Cache structure and utilization |
US12153541B2 (en) * | 2019-03-15 | 2024-11-26 | Intel Corporation | Cache structure and utilization |
US11954062B2 (en) | 2019-03-15 | 2024-04-09 | Intel Corporation | Dynamic memory reconfiguration |
US11995029B2 (en) | 2019-03-15 | 2024-05-28 | Intel Corporation | Multi-tile memory management for detecting cross tile access providing multi-tile inference scaling and providing page migration |
US12007935B2 (en) | 2019-03-15 | 2024-06-11 | Intel Corporation | Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format |
US12013808B2 (en) | 2019-03-15 | 2024-06-18 | Intel Corporation | Multi-tile architecture for graphics operations |
US11934342B2 (en) | 2019-03-15 | 2024-03-19 | Intel Corporation | Assistance for hardware prefetch in cache access |
US12056059B2 (en) | 2019-03-15 | 2024-08-06 | Intel Corporation | Systems and methods for cache optimization |
US11899614B2 (en) | 2019-03-15 | 2024-02-13 | Intel Corporation | Instruction based control of memory attributes |
US12079155B2 (en) | 2019-03-15 | 2024-09-03 | Intel Corporation | Graphics processor operation scheduling for deterministic latency |
US12093210B2 (en) | 2019-03-15 | 2024-09-17 | Intel Corporation | Compression techniques |
US12099461B2 (en) | 2019-03-15 | 2024-09-24 | Intel Corporation | Multi-tile memory management |
US12124383B2 (en) | 2019-03-15 | 2024-10-22 | Intel Corporation | Systems and methods for cache optimization |
US20220138104A1 (en) * | 2019-03-15 | 2022-05-05 | Intel Corporation | Cache structure and utilization |
US12141094B2 (en) | 2019-03-15 | 2024-11-12 | Intel Corporation | Systolic disaggregation within a matrix accelerator architecture |
US11954063B2 (en) | 2019-03-15 | 2024-04-09 | Intel Corporation | Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format |
US11842423B2 (en) | 2019-03-15 | 2023-12-12 | Intel Corporation | Dot product operations on sparse matrix elements |
US12182035B2 (en) | 2019-03-15 | 2024-12-31 | Intel Corporation | Systems and methods for cache optimization |
US12182062B1 (en) | 2019-03-15 | 2024-12-31 | Intel Corporation | Multi-tile memory management |
US12198222B2 (en) | 2019-03-15 | 2025-01-14 | Intel Corporation | Architecture for block sparse operations on a systolic array |
US12204487B2 (en) | 2019-03-15 | 2025-01-21 | Intel Corporation | Graphics processor data access and sharing |
US12210477B2 (en) | 2019-03-15 | 2025-01-28 | Intel Corporation | Systems and methods for improving cache efficiency and utilization |
US20220171710A1 (en) * | 2019-03-15 | 2022-06-02 | Intel Corporation | Cache structure and utilization |
US12242414B2 (en) | 2019-03-15 | 2025-03-04 | Intel Corporation | Data initialization techniques |
US12293431B2 (en) | 2019-03-15 | 2025-05-06 | Intel Corporation | Sparse optimizations for a matrix accelerator architecture |
US12321310B2 (en) | 2019-03-15 | 2025-06-03 | Intel Corporation | Implicit fence for write messages |
US12386779B2 (en) | 2019-03-15 | 2025-08-12 | Intel Corporation | Dynamic memory reconfiguration |
US12361600B2 (en) | 2019-11-15 | 2025-07-15 | Intel Corporation | Systolic arithmetic on sparse data |
US11861761B2 (en) | 2019-11-15 | 2024-01-02 | Intel Corporation | Graphics processing unit processing and caching improvements |
Also Published As
Publication number | Publication date |
---|---|
US6631437B1 (en) | 2003-10-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6148359A (en) | Bus-to-bus bridge in computer system, with fast burst memory range | |
US5870567A (en) | Delayed transaction protocol for computer system bus | |
US6098134A (en) | Lock protocol for PCI bus using an additional "superlock" signal on the system bus | |
US6085274A (en) | Computer system with bridges having posted memory write buffers | |
US6321286B1 (en) | Fault tolerant computer system | |
US6631437B1 (en) | Method and apparatus for promoting memory read commands | |
US6754737B2 (en) | Method and apparatus to allow dynamic variation of ordering enforcement between transactions in a strongly ordered computer interconnect | |
US6502157B1 (en) | Method and system for perfetching data in a bridge system | |
US5815677A (en) | Buffer reservation method for a bus bridge system | |
US5802324A (en) | Computer system with PCI repeater between primary bus and second bus | |
US5859988A (en) | Triple-port bus bridge | |
US6286074B1 (en) | Method and system for reading prefetched data across a bridge system | |
US6330630B1 (en) | Computer system having improved data transfer across a bus bridge | |
US5918072A (en) | System for controlling variable length PCI burst data using a dummy final data phase and adjusting the burst length during transaction | |
US7213094B2 (en) | Method and apparatus for managing buffers in PCI bridges | |
US6170030B1 (en) | Method and apparatus for restreaming data that has been queued in a bus bridging device | |
US5832243A (en) | Computer system implementing a stop clock acknowledge special cycle | |
US5918026A (en) | PCI to PCI bridge for transparently completing transactions between agents on opposite sides of the bridge | |
US7054987B1 (en) | Apparatus, system, and method for avoiding data writes that stall transactions in a bus interface | |
US6425023B1 (en) | Method and system for gathering and buffering sequential data for a transaction comprising multiple data access requests | |
US6961819B2 (en) | Method and apparatus for redirection of operations between interfaces | |
US6202112B1 (en) | Arbitration methods to avoid deadlock and livelock when performing transactions across a bridge | |
US20040064626A1 (en) | Method and apparatus for ordering interconnect transactions in a computer system | |
US20030131175A1 (en) | Method and apparatus for ensuring multi-threaded transaction ordering in a strongly ordered computer interconnect | |
US20030084223A1 (en) | Bus to system memory delayed read processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS Free format text: CHANGE OF NAME;ASSIGNOR:COMPAQ INFORMATION TECHNOLOGIES GROUP L.P.;REEL/FRAME:014177/0428 Effective date: 20021001 Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.,TEXAS Free format text: CHANGE OF NAME;ASSIGNOR:COMPAQ INFORMATION TECHNOLOGIES GROUP L.P.;REEL/FRAME:014177/0428 Effective date: 20021001 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |