US20090077290A1 - Controller for processing apparatus - Google Patents
- Publication number
- US20090077290A1 (application US12/212,114)
- Authority
- US
- United States
- Prior art keywords
- request
- module
- accordance
- dvs
- operable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/324—Power saving characterised by the action undertaken by lowering clock frequency
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30083—Power or thermal control instructions
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- This invention relates to a controller for controlling processor apparatus and particularly to a controller employing dynamic voltage scaling. It is particularly, but not exclusively, concerned with control of a CMOS based integrated circuit.
- CMOS complementary metal-oxide-semiconductor
- DVS Dynamic Voltage Scaling
- UK Patent Application GB2403823 describes a method for implementing the dynamic scaling of voltages on a set of resources while the resources continue to execute operations. This technique is especially applicable to software defined radio.
- the DVS scheme disclosed therein ramps up the supply voltage and clock frequency during the execution of an operation by a processing resource. By increasing the voltage-frequency during the execution of an operation, the resource will use less power if the operation uses fewer cycles than the worst-case execution cycle count.
- UK Patent Application GB2410344 describes implementation of an intra-operation DVS scheme to a reconfigurable application in a hard real-time heterogeneous System on a Chip (SoC) environment.
- SoC System on a Chip
- DVS is currently in use by companies such as ARM, Intel and Transmeta. This is demonstrated by the following two publications by ARM and a third by Transmeta:
- the schemes used by these device designers are based on uni-processor design with a common clock.
- the DVS schemes implemented by ARM, Intel and Transmeta in the papers identified above only apply to a single voltage-frequency domain. That is, only one domain is modified in voltage and frequency as a result of a decision by the DVS management entity.
- GALS globally asynchronous, locally synchronous
- DVS Dynamic speed/voltage scaling for GALS processors
- US Patent Application US 2006/161797 describes an asynchronous wrapper for use in a GALS architecture. It describes how an external signal is used to set the internal synchronous clock of a processing resource.
- an aspect of the invention provides a modification of the approach taken in GB2410344.
- an approach is disclosed which uses an adaptive DVS scheme, but which relies on a controllable clock directly modifying the execution time for a task on a module. If the number of cycles taken to complete the task is a function of a second module, then the benefits of the DVS scheme are diminished. Typically, the cycle count of a task on the first module might be dependent on a second module if the task needs the second module to perform a function.
- An aspect of the present invention provides a mechanism where the processing time for a slave module is linked to its master in such a way that the DVS scheme supported by the master can have the greatest benefit to the overall processing apparatus.
- information concerning the clock frequency, calculated by the master DVS manager is inherited (or reused) by sub-modules whenever the master requests a function from the sub-module.
- Another aspect of the invention provides a computer apparatus comprising a master processing module and at least one sub-module, dynamic voltage scaling means being associated with the master module and operable to calculate dynamically an operating frequency for the master module, and wherein said sub-module is operable to use said operating frequency when accessed by the master module.
- the sub-module ‘inherits’ the operating frequency of the master module.
- mapping means may be provided operable to map the master clock frequency to a generic speed request.
- This generic speed request can then be sent to the sub module in terms which it can interpret independently.
- a further aspect of the invention provides a computer processing apparatus comprising a plurality of processing modules, wherein at least one of said modules comprises dynamic voltage scaling means, and is operable to send to a further of said modules a functional request message for processing by said further module, wherein said functional request message is, in use, accompanied by a processing speed message.
- the further module may be responsive to receipt of a speed message by controlling its clock frequency and/or operating voltage.
- a further aspect of the invention provides a computer processing apparatus comprising a plurality of modules, wherein at least one module comprises dynamic voltage scaling means and is operable to interact with another module by supplying it with a speed request associated with a functional request. Responsive to receiving a speed request, the module in receipt thereof is operable to interpret the speed request by control of at least one processing parameter governing execution of the associated functional request.
- the processing parameter may be the expected time for execution of the functional request.
- a further aspect of the invention provides a computer processing apparatus comprising a plurality of modules, wherein at least one module comprises dynamic voltage scaling means and is operable to interact with another module by supplying it with a clock signal when it requests said other module to execute a function.
- the module may be operable to supply a supply voltage to said other module when requesting said other module to execute a function.
- a further aspect of the invention provides a computer processing apparatus comprising a master module and a slave module, the master module being operable to send a functional request to said slave module for execution by said slave module of a requested function, the master module comprising dynamic voltage scaling (DVS) means operable to establish a DVS control scheme for the master processing module, and DVS linking means operable to relate the DVS control scheme to said slave processing module.
- DVS dynamic voltage scaling
- a further aspect of the invention provides a method of controlling a computer processing apparatus comprising a master module and a slave module, comprising establishing a DVS control scheme for the master module, relating the DVS control scheme to said slave module, associating a DVS control request with a functional request wherein the DVS control request is in accordance with the slave module related DVS control scheme, and sending said functional request and said DVS control request from the master module to said slave module for execution by said slave module of a requested function in accordance with said DVS control request.
- aspects of the invention can be implemented, by way of example, in a ‘system on a chip’ (SoC) context, for instance for a mobile telephone, or for execution of a video CODEC, for Games Equipment, or in base stations or access points. That is, aspects of the invention can be applied to a situation wherein a multi-processor architecture is provided, wherein there is a requirement to manage and possibly to minimise power consumption.
- SoC system on a chip
- aspects of the invention can be implemented using software components, for execution by broadly generic computer hardware, such as a DSP or an FPGA.
- Such software components could be delivered by physical storage media, or by a signal.
- FIG. 1 is a schematic diagram of a computer processing apparatus in accordance with a first specific embodiment of the invention
- FIG. 2 is a schematic diagram of a master processor of the computer processing apparatus illustrated in FIG. 1 ;
- FIG. 3 is a schematic diagram of a slave processor of the computer processing apparatus illustrated in FIG. 1 ;
- FIG. 4 is a schematic diagram of a slave processor, in accordance with a second embodiment of the invention, for incorporation into the computer processing apparatus illustrated in FIG. 1 instead of the slave processor illustrated in FIG. 3 ;
- FIG. 5 is a schematic diagram of a slave processor, in accordance with a third embodiment of the invention, for incorporation into the computer processing apparatus illustrated in FIG. 1 instead of the slave processor illustrated in FIG. 3 ;
- FIG. 6 is a schematic diagram of a wireless modem implemented in accordance with the computer processing apparatus of the first specific embodiment illustrated in FIG. 1 .
- FIG. 1 illustrates a first specific embodiment of the invention, in which a computer processing apparatus 10 is illustrated.
- a master processor 100 and a slave processor 200 are provided, each of which is operable to access a bus 20 for transmission of messages between the two processing components 100 , 200 .
- the master can send a function request 22 to the slave, to cause the slave 200 to perform a function for which it is better suited than the master 100 .
- the reasons why the master makes a request to the slave 200 may depend on a number of factors, not just suitability for a particular task to be performed.
- a speed request 24 is sent alongside the function request 22 by the master 100 to the slave 200 .
- the master processing unit 100 is illustrated in further detail in FIG. 2 .
- the master processing unit 100 is compliant with the “globally asynchronous locally synchronous” (GALS) architecture, so comprises a processing element 110 operable in a synchronous domain, under the control of a DVS control unit 112 which supplies a clock and an associated supply voltage on the basis of a requested frequency.
- the frequency is determined in a wrapper unit 120 which is an interface between asynchronous and synchronous architectures.
- the wrapper unit 120 comprises a frequency register 122 which is programmed by a DVS manager 130 .
- In addition to outputting the frequency for use by the DVS control unit 112 , the register 122 passes the frequency to a functional block 140 .
- This block converts the register frequency value for the clock speed in the master processor unit 100 , into a generic speed request.
- This generic speed request is then output as signal 24 previously described.
- This signal 24 is output alongside a functional request signal 22 output by the processing element 110 .
- a functional request signal 22 is output when the master module makes a request for a service from a different clock domain.
- An example could be a memory transfer request, or a hardware accelerator operation, such as to channel decode a block of data.
- a speed request is sent for use by the slave module 200 receiving the functional request 22 .
- This speed request 24 is used by the slave module 200 to determine the mechanism of execution.
- the effect of the speed request is to alter the time for which the master processing unit 100 will wait for the slave processing unit 200 to complete its operation.
- the master processing unit 100 selects the value of the speed request based on the frequency voltage setting under which it is currently executing tasks. That is, if the master processing unit 100 is operating at a relatively high master clock frequency (as governed by the DVS control unit 112 ), the speed request will correspondingly be high. Conversely, if the master processing unit 100 currently executes at a relatively low speed, the speed request will consequently be adjusted to a lower level.
- the speed request can be a generic value, for interpretation by the slave processing unit 200 according to its type and structure.
- FIG. 3 illustrates in further detail the structure of the slave processing unit 200 of the first specific embodiment of the invention.
- the slave processing unit 200 comprises a processing element 210 , which is synchronous in nature and therefore governed by a DVS control unit 212 , supplying a supply voltage and a clock thereto.
- the DVS control unit 212 is governed by a frequency quantity, which is extracted from a wrapper unit 220 comprising a register 222 generating the frequency signal.
- the register 222 generates the frequency signal on the basis of a functional block 240 , in receipt of a speed request signal 24 . Consequently, a functional request 22 received by the processing element 210 can be processed according to DVS conditions governed by the speed request 24 .
- the functional block 240 is architecture specific, and is designed for the capabilities of the slave unit 200 .
- the block 240 converts the speed request into a form suitable for the slave processing unit 200 .
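The conversion performed by the functional block 240 can be sketched as follows. This is a minimal illustration only: the source does not define the encoding of the generic speed request, so the integer level range (0 to `levels - 1`), the function name `interpret_speed_request` and the list of locally supported frequencies are all assumptions made for the example.

```python
def interpret_speed_request(level, supported_hz, levels=8):
    """Pick a local clock frequency for a generic speed level.

    Assumption: the generic speed request is an integer 0..levels-1,
    where a higher value asks for faster execution. The slave maps it
    onto whichever operating points its own hardware supports.
    """
    if levels < 2:
        raise ValueError("at least two speed levels are needed")
    if not supported_hz:
        raise ValueError("slave must support at least one frequency")
    if not 0 <= level < levels:
        raise ValueError("speed level out of range")

    ordered = sorted(supported_hz)
    # Scale the generic level onto the index range of the local
    # operating points, rounding to the nearest index.
    index = (level * (len(ordered) - 1) + (levels - 1) // 2) // (levels - 1)
    return ordered[index]
```

Because the mapping is local to the slave, two slaves with different operating points could receive the same generic level and each select a frequency suited to its own hardware.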
- each processing unit may also have the capacity to modify its operating voltage or frequency to match the requested speed. This will allow for further saving in power consumption in the slave processing unit.
- the following table sets out a correspondence between the master clock frequency output by the DVS control unit 112 of the master unit 100 , with a generic speed request value, and with a priority value on the shared bus 20 .
- FIG. 4 illustrates a schematic diagram of a second specific embodiment of a slave unit 300 .
- the slave unit 300 comprises a processing element 310 operable to respond to a functional request 22 received on the bus.
- the processing element is governed in its ability to do this by means of a supply voltage VCC and a clock.
- the clock is generated by a clock generator 313 .
- the supply voltage is generated by a power supply unit 314 .
- the wrapper unit 320 is also modified from the wrapper unit 220 of the first embodiment.
- the wrapper unit now comprises a functional block 340 which is operable to interpret received speed requests 24 into configuration commands for the processing element 310 .
- the slave unit does not just adopt the DVS control of the master unit 100 , but instead interprets master unit speed requests 24 and provides local conditions in terms of configuration of the processing element 310 to enable tasks to be completed in an effective manner.
- the processor can allocate different time slots to the thread associated with the function request. This will enable priority tasks to be completed more quickly, or low priority tasks to be completed more slowly, without DVS at the slave.
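One way to picture the time-slot allocation of this second embodiment is a proportional share of a scheduling frame. The sketch below is an assumption for illustration: the source does not say how slots are assigned, and the frame size, the level range and the name `slots_for_request` are hypothetical.

```python
def slots_for_request(level, slots_per_frame=16, levels=8):
    """Number of time slots per frame given to the requesting thread.

    Assumption: the speed request is an integer 0..levels-1 and the
    slave processor schedules threads in frames of fixed length. The
    clock and supply voltage of the slave are left unchanged.
    """
    if not 0 <= level < levels:
        raise ValueError("speed level out of range")
    # Share grows linearly with the requested speed; at least one slot
    # is granted so that even the slowest request makes progress.
    return max(1, (slots_per_frame * (level + 1)) // levels)
```

With the default values a request at the lowest level receives 2 of 16 slots and a request at the highest level receives the whole frame.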
- A third embodiment of the slave unit 400 is illustrated in FIG. 5 .
- the slave 400 of this example comprises a wrapper 420 which now includes a functional block 422 which interprets speed requests into a control signal for a communication fabric controller 412 .
- the communication fabric controller 412 manages access to the shared communication fabric. It is thus a direct memory access (DMA) controller.
- the control signals are operable to cause the communication fabric controller 412 to modify its operating voltage and frequency to match the requested speed represented by the speed request 24 . This allows for further saving in power consumption in the slave module.
- DMA direct memory access
- the clock speed of a slave module is determined by the status of the FIFO used to transfer data into the sub-module; this means that if no data is supplied, the clock used to drive the associated processing logic is switched off.
- the approach identified above allows for finer and more precise control of the operating mode and/or clock frequency of slave modules employed by a master module.
- the FIFO technique of Krstic has a high latency associated with it.
- the technique described above in accordance with the specific embodiments of the invention explicitly states the speed at which a slave module should run when the data is supplied and so avoids the lag caused by the FIFO buffer.
- Simple GALS/DVS schemes which only allow static setting of clock frequency and voltage do not take advantage of the power savings made possible by the fact that the actual processing complexity is distributed, i.e. has a mean and a maximum value.
- a communications network can take advantage of this aspect of power saving opportunities.
- This approach can be used to reduce power consumption in any complicated CMOS based electronic system. Typically, it could be used in a large SoC with multiple processing elements. However, it could also be applied to multi-processor designs such as the CELL. These electronic systems could then be used for sophisticated applications such as the base band processing in a wireless phone or base station or in a games machine.
- Embodiments of the invention will supply performance benefits when an application has variable complexity and requires the operating voltage and clock frequency to track the workload of the platform.
- FIG. 6 depicts a wireless modem system 50 comprising a digital signal processor (DSP) 500 executing the signal processing stages of the modem as well as a DVS management controller, as separate tasks, and a hardware accelerator 600 for implementing a turbo decoder.
- DSP digital signal processor
- Both modules 500 , 600 have their own clock and voltage generator (DVS Controller 512 , 612 respectively), and processing elements ( 510 , 610 respectively).
- a wrapper 520 is provided in the DSP for associating information with an execution request and for unwrapping information received from another processing entity in the system 50 .
- a wrapper 620 is provided in the turbo decoder 600 for unwrapping information associated with an execution request received from the DSP 500 , and also for associating items of information with each other for return to the DSP 500 .
- a DVS management task 530 defined in a processing element 510 of the DSP 500 provides the function of a DVS manager.
- the DVS manager in the DSP determines the clock frequency for the DSP at any particular time to ensure deadlines are achieved and power consumption is minimised.
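The source does not give the algorithm used by the DVS manager. As a hedged sketch of one plausible policy, the function below picks the lowest supported frequency that still completes the remaining work before the deadline; the function name, the arguments and the fallback to the highest frequency are assumptions.

```python
def lowest_frequency_meeting_deadline(cycles_remaining, time_remaining_s,
                                      supported_hz):
    """Return the lowest supported frequency that meets the deadline.

    Assumption: the manager knows a (worst-case) count of remaining
    cycles and the time left. If no supported frequency is fast enough,
    the highest one is returned as a best effort.
    """
    if time_remaining_s <= 0:
        raise ValueError("deadline has already passed")
    if not supported_hz:
        raise ValueError("no supported frequencies given")

    required_hz = cycles_remaining / time_remaining_s
    for freq in sorted(supported_hz):
        if freq >= required_hz:
            return freq  # slowest clock that still meets the deadline
    return max(supported_hz)
```

Choosing the slowest adequate clock is what allows the supply voltage to be lowered, which is where the power saving arises.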
- a wireless modem task 550 is also defined in the DSP processing element 510 , to provide the signal processing functions referred to above in connection with the modem capability of the wireless modem system 50 .
- the wireless modem task 550 when requesting the turbo decoder 600 to execute, also includes a speed request with the functional request. This speed request is based on the speed currently set by the DVS manager 530 .
- the speed request is written into a register in the turbo decoder's DVS controller 612 at the same time as the control bits and parameters are written into their associated registers. In this way, the turbo decoder can be set a DVS profile suitable to its own hardware capabilities but also reflecting the overall system requirements as managed from the DSP 500 .
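The register write described here can be modelled in a few lines. The register layout is not given in the source, so the class `AcceleratorRegisters`, its field names, the example parameter `block_len` and the choice to write the control bits last are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AcceleratorRegisters:
    """Hypothetical register file of the turbo decoder."""
    control: int = 0
    params: dict = field(default_factory=dict)
    speed_request: int = 0  # held by the decoder's DVS controller


def issue_request(regs, control_bits, params, speed_level):
    """Write a functional request together with its speed request."""
    regs.params = dict(params)
    regs.speed_request = speed_level
    # Assumption: the control bits are written last and act as the
    # trigger, so the decoder never starts with a stale speed setting.
    regs.control = control_bits
    return regs
```

A call such as `issue_request(AcceleratorRegisters(), 0x1, {"block_len": 40}, 5)` would leave the decoder holding both the work description and the speed at which the master expects it to run.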
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Power Sources (AREA)
- Executing Machine-Instructions (AREA)
Abstract
A computer apparatus comprises a master module and a slave module such that the master module is able to send a functional request to the slave module for the execution by the slave module of a requested function. The master module comprises dynamic voltage scaling (DVS) means operable to establish a DVS control scheme for the master processing module, and DVS linking means operable to relate the DVS control scheme to the slave processing module.
Description
- This invention relates to a controller for controlling processor apparatus and particularly to a controller employing dynamic voltage scaling. It is particularly, but not exclusively, concerned with control of a CMOS based integrated circuit.
- It is well known that the maximum operating frequency of CMOS technology increases generally with supply voltage. Using this, power consumption of a CMOS device can be controlled by operating the device at the lowest clock frequency permitted for a particular operating requirement and taking the opportunity arising from this to limit supply voltage. Various techniques have been put forward in the art to take advantage of this, collectively known as Dynamic Voltage Scaling (DVS).
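The reasoning above can be made concrete with the standard first-order model of CMOS switching power. The model is not stated in the source and is given here only as a sketch; the symbols $\alpha$ (switching activity factor), $C$ (switched capacitance), $V_{DD}$ (supply voltage), $f$ (clock frequency) and $N$ (cycle count of a task) are assumptions introduced for the illustration.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f,
\qquad
E_{\mathrm{task}} \approx \alpha \, C \, V_{DD}^{2} \, N
```

Under this model the energy of a task with a fixed cycle count does not depend on $f$ directly. The saving comes from the lower $V_{DD}$ that a lower clock frequency permits, which enters quadratically.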
- UK Patent Application GB2403823 describes a method for implementing the dynamic scaling of voltages on a set of resources while the resources continue to execute operations. This technique is especially applicable to software defined radio. The DVS scheme disclosed therein ramps up the supply voltage and clock frequency during the execution of an operation by a processing resource. By increasing the voltage-frequency during the execution of an operation, the resource will use less power if the operation uses fewer cycles than the worst-case execution cycle count.
- UK Patent Application GB2410344 describes implementation of an intra-operation DVS scheme to a reconfigurable application in a hard real-time heterogeneous System on a Chip (SoC) environment.
- DVS is currently in use by companies such as ARM, Intel and Transmeta. This is demonstrated by the following two publications by ARM and a third by Transmeta:
- S. M. Martin, et al, “Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Low Power Microprocessors Under Dynamic Workloads”, http://www.arm.com/pdfs/dvsabb-ICCAD2002.pdf;
- P. Morris, P. Watson, “Automated Low-Power Implementation Methodology” ARM Developers Conference-Information Quarterly, Vol. 4, No. 3, 2005; and
- M. Fleischmann, “LongRun™ Power Management”, www.transmeta.com/pdfs/paper_mfleischmann—17jan01.pdf, 2001.
- The schemes used by these device designers are based on uni-processor design with a common clock. The DVS schemes implemented by ARM, Intel and Transmeta in the papers identified above only apply to a single voltage-frequency domain. That is, only one domain is modified in voltage and frequency as a result of a decision by the DVS management entity.
- A number of papers discuss combining globally asynchronous, locally synchronous (GALS) architectures with DVS.
- For instance, “Dynamic speed/voltage scaling for GALS processors”, (S. Chan, A. Eswaran, http://www.ece.cmu.edu/~schen1/ece743) discusses how DVS can be used to ensure certain stages in a processor operate more slowly than usual, when later stages take longer to complete tasks. By running more slowly and at a lower voltage, overall power consumption is reduced.
- “Power Efficiency of Voltage Scaling in Multiple Clock, Multiple Voltage Cores” (A. Iyer, D. Marculescu, Conference on Computer-Aided Design (ICCAD), November 2002) and “Power-Performance Evaluation of Globally Asynchronous, Locally Synchronous Processors” (A. Iyer and D. Marculescu, International Symposium on Computer Architecture (ISCA), May 2002) discuss the benefits of GALS when combined with DVS.
- “Request-Driven GALS Technique for Datapath Architectures” (M. Krstic, E Grass, Proc. of the 3rd ACiD-WG Workshop, Heraklion, Jan. 27-28, 2003, Greece, session 2 (2003)) describes how the clock frequency of a second module can be dynamically modified by monitoring the status of a FIFO feeding to it i.e. when the FIFO is empty the clock is stopped. This paper is based on a thesis by Krstic at the Brandenburgischen Technischen Universität, Cottbus.
- US Patent Application US 2006/161797 describes an asynchronous wrapper for use in a GALS architecture. It describes how an external signal is used to set the internal synchronous clock of a processing resource.
- In general terms, an aspect of the invention provides a modification of the approach taken in GB2410344. In that patent application, an approach is disclosed which uses an adaptive DVS scheme, but which relies on a controllable clock directly modifying the execution time for a task on a module. If the number of cycles taken to complete the task is a function of a second module, then the benefits of the DVS scheme are diminished. Typically, the cycle count of a task on the first module might be dependent on a second module if the task needs the second module to perform a function. Some examples of possible functions to be transferred to another processing resource are:
-
- Hardware accelerators (turbo decoder)
- Memory transfer (DMA)
- Slave processors
- An aspect of the present invention provides a mechanism where the processing time for a slave module is linked to its master in such a way that the DVS scheme supported by the master can have the greatest benefit to the overall processing apparatus. In this aspect of the invention, information concerning the clock frequency, calculated by the master DVS manager, is inherited (or reused) by sub-modules whenever the master requests a function from the sub-module.
- Another aspect of the invention provides a computer apparatus comprising a master processing module and at least one sub-module, dynamic voltage scaling means being associated with the master module and operable to calculate dynamically an operating frequency for the master module, and wherein said sub-module is operable to use said operating frequency when accessed by the master module.
- In such a case, it can be said that the sub-module ‘inherits’ the operating frequency of the master module.
- In an embodiment of the invention, mapping means may be provided operable to map the master clock frequency to a generic speed request. This generic speed request can then be sent to the sub module in terms which it can interpret independently. This enables the sub-module to interpret a received generic speed request to take account of local processing capabilities or conditions, to achieve a result desired by the master module. For instance, the sub-module may interpret the speed request according to its processing type.
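A minimal sketch of such mapping means is given below. The source does not specify the form of the generic speed request, so the integer level scale, the linear quantisation and the name `frequency_to_speed_request` are assumptions chosen for illustration.

```python
def frequency_to_speed_request(freq_hz, f_min_hz, f_max_hz, levels=8):
    """Map a master clock frequency onto a generic speed level.

    Assumption: the generic speed request is an integer 0..levels-1
    spread linearly over the master's frequency range, so that it
    carries no master-specific units.
    """
    if f_max_hz <= f_min_hz:
        raise ValueError("f_max_hz must exceed f_min_hz")
    if levels < 2:
        raise ValueError("at least two speed levels are needed")

    # Clamp to the master's range, then quantise to the nearest level.
    clamped = min(max(freq_hz, f_min_hz), f_max_hz)
    fraction = (clamped - f_min_hz) / (f_max_hz - f_min_hz)
    return int(fraction * (levels - 1) + 0.5)
```

Expressing the request as a unitless level is what lets a sub-module with a different clock range or processing type interpret it independently.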
- A further aspect of the invention provides a computer processing apparatus comprising a plurality of processing modules, wherein at least one of said modules comprises dynamic voltage scaling means, and is operable to send to a further of said modules a functional request message for processing by said further module, wherein said functional request message is, in use, accompanied by a processing speed message.
- In said further aspect, the further module may be responsive to receipt of a speed message by controlling its clock frequency and/or operating voltage.
- A further aspect of the invention provides a computer processing apparatus comprising a plurality of modules, wherein at least one module comprises dynamic voltage scaling means and is operable to interact with another module by supplying it with a speed request associated with a functional request. Responsive to receiving a speed request, the module in receipt thereof is operable to interpret the speed request by control of at least one processing parameter governing execution of the associated functional request. The processing parameter may be the expected time for execution of the functional request.
- A further aspect of the invention provides a computer processing apparatus comprising a plurality of modules, wherein at least one module comprises dynamic voltage scaling means and is operable to interact with another module by supplying it with a clock signal when it requests said other module to execute a function. In addition to the clock signal, the module may be operable to supply a supply voltage to said other module when requesting said other module to execute a function.
- A further aspect of the invention provides a computer processing apparatus comprising a master module and a slave module, the master module being operable to send a functional request to said slave module for execution by said slave module of a requested function, the master module comprising dynamic voltage scaling (DVS) means operable to establish a DVS control scheme for the master processing module, and DVS linking means operable to relate the DVS control scheme to said slave processing module.
- A further aspect of the invention provides a method of controlling a computer processing apparatus comprising a master module and a slave module, comprising establishing a DVS control scheme for the master module, relating the DVS control scheme to said slave module, associating a DVS control request with a functional request wherein the DVS control request is in accordance with the slave module related DVS control scheme, and sending said functional request and said DVS control request from the master module to said slave module for execution by said slave module of a requested function in accordance with said DVS control request.
- Aspects of the invention can be implemented, by way of example, in a ‘system on a chip’ (SoC) context, for instance for a mobile telephone, or for execution of a video CODEC, for Games Equipment, or in base stations or access points. That is, aspects of the invention can be applied to a situation wherein a multi-processor architecture is provided, wherein there is a requirement to manage and possibly to minimise power consumption.
- Aspects of the invention can be implemented using software components, for execution by broadly generic computer hardware, such as a DSP or an FPGA. Such software components could be delivered by physical storage media, or by a signal.
- Further possible aspects, features and advantages of the invention will become apparent from the following description of specific embodiments thereof, with reference to the accompanying drawings, in which:
- FIG. 1 is a schematic diagram of a computer processing apparatus in accordance with a first specific embodiment of the invention;
- FIG. 2 is a schematic diagram of a master processor of the computer processing apparatus illustrated in FIG. 1;
- FIG. 3 is a schematic diagram of a slave processor of the computer processing apparatus illustrated in FIG. 1;
- FIG. 4 is a schematic diagram of a slave processor, in accordance with a second embodiment of the invention, for incorporation into the computer processing apparatus illustrated in FIG. 1 instead of the slave processor illustrated in FIG. 3;
- FIG. 5 is a schematic diagram of a slave processor, in accordance with a third embodiment of the invention, for incorporation into the computer processing apparatus illustrated in FIG. 1 instead of the slave processor illustrated in FIG. 3; and
- FIG. 6 is a schematic diagram of a wireless modem implemented in accordance with the computer processing apparatus of the first specific embodiment illustrated in FIG. 1. -
FIG. 1 illustrates a first specific embodiment of the invention, in which a computer processing apparatus 10 is illustrated. It will be appreciated by the reader that the illustrated example is but representative, and more complex apparatus including a larger number of processing elements can be provided. In this case, a master processor 100 and a slave processor 200 are provided, each of which is operable to access a bus 20 for transmission of messages between the two processing components 100, 200. In conventional manner, the master can send a function request 22 to the slave, to cause the slave 200 to perform a function for which it is better suited than the master 100. It will be appreciated that the reasons why the master 100 makes a request to a slave 200 may depend on a number of factors, not just suitability for a particular task to be performed.
- In addition to this, and in accordance with this specific embodiment of the invention, a speed request 24 is sent alongside the function request 22 by the master 100 to the slave 200.
- The master processing unit 100 is illustrated in further detail in FIG. 2. The master processing unit 100 is compliant with the “globally asynchronous locally synchronous” (GALS) architecture, so comprises a processing element 110 operable in a synchronous domain, under the control of a DVS control unit 112 which supplies a clock and an associated supply voltage on the basis of a requested frequency. The frequency is determined in a wrapper unit 120 which is an interface between asynchronous and synchronous architectures. The wrapper unit 120 comprises a frequency register 122 which is programmed by a DVS manager 130.
- In addition to outputting the frequency for use by the DVS control unit 112, the register 122 passes the frequency to a functional block 140. This block converts the register frequency value for the clock speed in the master processor unit 100 into a generic speed request. This generic speed request is then output as the signal 24 previously described. This signal 24 is output alongside a functional request signal 22 output by the processing element 110. A functional request signal 22 is output when the master module makes a request for a service from a different clock domain. An example could be a memory transfer request, or a hardware accelerator operation, such as to channel decode a block of data.
- Similarly, a speed request is sent for use by the slave module 200 receiving the functional request 22. This speed request 24 is used by the slave module 200 to determine the mechanism of execution.
- The effect of the speed request is to alter the time for which the master processing unit 100 will wait for the slave processing unit 200 to complete its operation. The master processing unit 100 selects the value of the speed request based on the frequency/voltage setting under which it is currently executing tasks. That is, if the master processing unit 100 is operating at a relatively high master clock frequency (as governed by the DVS control unit 112), the speed request will correspondingly be high. Conversely, if the master processing unit 100 currently executes at a relatively low speed, the speed request will consequently be adjusted to a lower level.
- The speed request can be a generic value, for interpretation by the slave processing unit 200 according to its type and structure. -
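The master-side behaviour described above can be sketched in Python as follows. This is a minimal illustration only: the class names, the `channel_decode` operation name and the linear conversion (20 MHz per step starting at 50 MHz, clamped to 0..7) are assumptions made for the example and are not specified by the description at this point.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BusMessage:
    """A functional request (signal 22) paired with a speed request (signal 24)."""
    function: str       # hypothetical operation name
    speed_request: int  # generic, architecture-neutral speed value


class MasterWrapper:
    """Hypothetical model of wrapper unit 120 and its frequency register."""

    def __init__(self, to_generic_speed: Callable[[int], int]) -> None:
        self.frequency_mhz = 0                      # stands in for register 122
        self._to_generic_speed = to_generic_speed   # stands in for block 140

    def program_frequency(self, mhz: int) -> None:
        # In the description, the DVS manager programs this register.
        self.frequency_mhz = mhz

    def make_request(self, function: str) -> BusMessage:
        # The speed request always reflects the master's current clock setting.
        return BusMessage(function, self._to_generic_speed(self.frequency_mhz))


def assumed_linear_scale(mhz: int) -> int:
    """Assumed conversion: one step per 20 MHz above 50 MHz, clamped to 0..7."""
    return max(0, min(7, (mhz - 50) // 20))


wrapper = MasterWrapper(assumed_linear_scale)
wrapper.program_frequency(150)
fast = wrapper.make_request("channel_decode")
wrapper.program_frequency(50)
slow = wrapper.make_request("channel_decode")
```

Passing the conversion in as a function mirrors the idea that the speed request is generic: the master only needs some monotonic mapping from its own clock setting to a shared scale.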
FIG. 3 illustrates in further detail the structure of the slave processing unit 200 of the first specific embodiment of the invention. The slave processing unit 200 comprises a processing element 210, which is synchronous in nature and therefore governed by a DVS control unit 212, supplying a supply voltage and a clock thereto. The DVS control unit 212 is governed by a frequency quantity, which is extracted from a wrapper unit 220 comprising a register 222 generating the frequency signal. The register 222 generates the frequency signal on the basis of a functional block 240, in receipt of a speed request signal 24. Consequently, a functional request 22 received by the processing element 210 can be processed according to DVS conditions governed by the speed request 24.
- The functional block 240 is architecture specific, and is designed for the capabilities of the slave unit 200. The block 240 converts the speed request into a form suitable for the slave processing unit 200.
- This allows the slave processing unit 200 to interpret the speed request in accordance with its own capabilities. It will be recognised by the reader that different types of modules may interpret the speed request differently. In addition, each processing unit may also have the capacity to modify its operating voltage or frequency to match the requested speed. This will allow for further saving in power consumption in the slave processing unit.
- The following table sets out a correspondence between the master clock frequency output by the DVS control unit 112 of the master unit 100, a generic speed request value, and a priority value on the shared bus 20. -
| Master Clock Frequency | Generic Speed Request Value | Priority Value on Shared Bus (0 = lowest priority) |
|---|---|---|
| 50 MHz | 0 | 0 |
| 70 MHz | 1 | 2 |
| 90 MHz | 2 | 4 |
| 110 MHz | 3 | 6 |
| 130 MHz | 4 | 8 |
| 150 MHz | 5 | 10 |
| 170 MHz | 6 | 12 |
| 190 MHz | 7 | 14 |

-
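As a rough illustration of how the table might be used at both ends of the bus, the sketch below encodes its rows as a lookup on the master side and then shows one possible slave-side interpretation. The slave's list of available frequencies and the equal-band mapping are assumptions for the example; the description leaves the slave's interpretation to its own architecture.

```python
from typing import Tuple

# Rows of the table: master clock (MHz) -> (generic speed request, bus priority).
MASTER_TABLE = {
    50: (0, 0), 70: (1, 2), 90: (2, 4), 110: (3, 6),
    130: (4, 8), 150: (5, 10), 170: (6, 12), 190: (7, 14),
}


def speed_and_priority(master_mhz: int) -> Tuple[int, int]:
    """Look up the generic speed request and bus priority for a master clock."""
    return MASTER_TABLE[master_mhz]


# Hypothetical slave capabilities: it can only run at these four clocks.
SLAVE_FREQS_MHZ = (40, 80, 120, 160)


def slave_frequency(speed_request: int, max_speed: int = 7) -> int:
    """Assumed slave-side mapping: split the generic scale into equal bands,
    one band per available slave frequency."""
    if not 0 <= speed_request <= max_speed:
        raise ValueError("speed request out of range")
    index = speed_request * len(SLAVE_FREQS_MHZ) // (max_speed + 1)
    return SLAVE_FREQS_MHZ[index]


speed, priority = speed_and_priority(150)
chosen = slave_frequency(speed)
```

A slave with a different set of clocks, or one that maps the request to a supply voltage or a scheduling priority instead, would replace only `slave_frequency`; the master-side table is unchanged.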
FIG. 4 illustrates a schematic diagram of a second specific embodiment of a slave unit 300. Again, the slave unit 300 comprises a processing element 310 operable to respond to a functional request 22 received on the bus. The processing element is governed in its ability to do this by means of a supply voltage VCC and a clock. However, in this case, the clock is generated by a clock generator 313, and the supply voltage is generated by a power supply unit 314.
- The wrapper unit 320 is also modified from the wrapper unit 220 of the first embodiment. The wrapper unit now comprises a functional block 340 which is operable to interpret received speed requests 24 into configuration commands for the processing element 310. Thus, there is no direct DVS control on the slave unit of the second embodiment. The slave unit, however, does not just adopt the DVS control of the master unit 100, but instead interprets master unit speed requests 24 and provides local conditions, in terms of configuration of the processing element 310, to enable tasks to be completed in an effective manner.
- For example, if the processing element 310 is a multithreaded processor, the processor can allocate different time slots to the thread associated with the function request. This will enable priority tasks to be completed more quickly, or low priority tasks to be completed more slowly, without DVS at the slave.
- A third embodiment of the slave unit 400 is illustrated in FIG. 5. This example is particularly relevant where the processing apparatus 10 comprises a shared communication fabric. The slave 400 of this example comprises a wrapper 420 which now includes a functional block 422 which interprets speed requests into a control signal for a communication fabric controller 412. The communication fabric controller 412 manages access to the shared communication fabric. It is thus a direct memory access (DMA) controller. The control signals are operable to cause the communication fabric controller 412 to modify its operating voltage and frequency to match the requested speed represented by the speed request 24. This allows for further saving in power consumption in the slave module.
- In the thesis by Krstic, the clock speed of a slave module is determined by the status of the FIFO used to transfer data into the sub-module; this means that if no data is supplied, the clock used to drive the associated processing logic is switched off. By contrast, the approach identified above allows for finer and more precise control of the operating mode and/or clock frequency of slave modules employed by a master module.
- The FIFO technique of Krstic has a high latency associated with it. The technique described above in accordance with the specific embodiments of the invention explicitly states the speed at which a slave module should run when the data is supplied and so avoids the lag caused by the FIFO buffer.
- Simple GALS/DVS schemes which only allow static setting of clock frequency and voltage do not take advantage of the power savings that are possible because the actual processing complexity is distributed, i.e. has a mean and a maximum value. By allowing sub-modules to inherit clock information, a communications network can take advantage of this power saving opportunity.
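The size of this opportunity can be estimated with a back-of-envelope calculation. The Python sketch below is not taken from the description: it assumes the usual first-order CMOS dynamic power model P = C·V²·f, assumes supply voltage scales linearly with clock frequency, ignores leakage and idle power, and uses an invented workload. Under those assumptions, the energy for a block of work is proportional to the number of cycles times the square of the frequency fraction.

```python
def relative_dynamic_energy(cycles: float, freq_fraction: float) -> float:
    """Energy relative to one unit of work at full speed.

    Assumes P = C * V^2 * f and V proportional to f, so that
    energy = P * (cycles / f) is proportional to cycles * freq_fraction**2.
    """
    return cycles * freq_fraction ** 2


# Hypothetical workload: each block needs this fraction of the maximum
# cycles available before its deadline (mean well below the maximum).
loads = [0.4, 0.5, 1.0, 0.45]

# Static setting: always clocked for the worst case, then idle until the deadline.
static_energy = sum(relative_dynamic_energy(load, 1.0) for load in loads)

# Tracking setting: clock (and voltage) scaled to just meet each deadline.
tracked_energy = sum(relative_dynamic_energy(load, load) for load in loads)

saving = 1.0 - tracked_energy / static_energy
```

With these invented numbers the tracked schedule uses a little over half the dynamic energy of the static one; the actual figure depends entirely on the workload distribution and on how closely voltage can follow frequency in a given process.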
- This approach can be used to reduce power consumption in any complicated CMOS based electronic system. Typically, it could be used in a large SoC with multiple processing elements. However, it could also be applied to multi-processor designs such as the CELL. These electronic systems could then be used for sophisticated applications such as the base band processing in a wireless phone or base station or in a games machine.
- Embodiments of the invention will supply performance benefits when an application has variable complexity and requires the operating voltage and clock frequency to track the workload of the platform.
- As a practical example, FIG. 6 depicts a wireless modem system 50 comprising a digital signal processor (DSP) 500 executing the signal processing stages of the modem as well as a DVS management controller, as separate tasks, and a hardware accelerator 600 for implementing a turbo decoder. Both modules 500, 600 have their own clock and voltage generator (DVS Controller 512, 612 respectively), and processing elements (510, 610 respectively). A wrapper 520 is provided in the DSP for associating information with an execution request and for unwrapping information received from another processing entity in the system 50. Likewise, a wrapper 620 is provided in the turbo decoder 600 for unwrapping information associated with an execution request received from the DSP 500, and also for associating items of information with each other for return to the DSP 500.
- That is, this is a practical example of the first embodiment of the invention described above with reference to FIGS. 1 and 2. A DVS management task 530 defined in a processing element 510 of the DSP 500 provides the function of a DVS manager. The DVS manager in the DSP determines the clock frequency for the DSP at any particular time to ensure deadlines are achieved and power consumption is minimised.
- A wireless modem task 550 is also defined in the DSP processing element 510, to provide the signal processing functions referred to above in connection with the modem capability of the wireless modem system 50. The wireless modem task 550, when requesting the turbo decoder 600 to execute, also includes a speed request with the functional request. This speed request is based on the speed currently set by the DVS manager 530. The speed request is written into a register in the turbo decoder's DVS controller 612 at the same time as the control bits and parameters are written into their associated registers. In this way, the turbo decoder can be set a DVS profile suitable to its own hardware capabilities but also reflecting the overall system requirements as managed from the DSP 500.
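The register write sequence of this modem example might be modelled as in the following Python sketch. The register names, the start-bit layout, the block length and iteration count, and the choice to write the control word last are all assumptions for illustration; the description states only that the speed request is written together with the control bits and parameters.

```python
from dataclasses import dataclass


@dataclass
class TurboDecoderRegisters:
    """Hypothetical register map for the turbo decoder 600."""
    control: int = 0
    block_length: int = 0
    iterations: int = 0
    speed_request: int = 0  # held in the decoder's DVS controller 612


class DvsManager:
    """Stands in for DVS management task 530: holds the DSP's current speed."""

    def __init__(self, speed: int) -> None:
        self.current_speed = speed


START_BIT = 0x1  # assumed control-bit layout


def request_decode(regs: TurboDecoderRegisters, dvs: DvsManager,
                   block_length: int, iterations: int) -> None:
    """Model of wireless modem task 550 issuing one decode job."""
    regs.block_length = block_length
    regs.iterations = iterations
    # The speed request travels with the job, taken from the current DVS setting.
    regs.speed_request = dvs.current_speed
    # Assumed ordering: control word last, so parameters are in place at start.
    regs.control = START_BIT


regs = TurboDecoderRegisters()
dvs = DvsManager(speed=5)
request_decode(regs, dvs, block_length=6144, iterations=8)
```

Because the speed request is read from the DVS manager at the moment the job is issued, a later change to the DSP clock affects only subsequent decode requests.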
Claims (18)
1. A computer processing apparatus comprising a master module and a slave module, the master module being operable to send a functional request to said slave module for execution by said slave module of a requested function, the master module comprising dynamic voltage scaling (DVS) means operable to establish a DVS control scheme for the master processing module, and DVS linking means operable to relate the DVS control scheme to said slave processing module.
2. Apparatus in accordance with claim 1 wherein said linking means is operable to send a DVS control message to said slave module alongside a functional request from said master module.
3. Apparatus in accordance with claim 2 wherein said DVS means is operable to determine clock frequency information defining a clock frequency for said master processing module, and wherein said linking means is operable to transfer said clock frequency information to said slave module in said DVS control message in conjunction with said functional request.
4. Apparatus in accordance with claim 1 wherein said DVS means is operable to calculate dynamically an operating frequency for the master module, and wherein said linking means is operable to send a DVS control message alongside a functional request, said DVS control message indicating said operating frequency to said slave module.
5. Apparatus in accordance with claim 1 wherein the master module further comprises DVS control information mapping means operable to map information defining a DVS control scheme for use by said master module into a generic speed request, said linking means being operable to send a generic speed request with a functional request, and wherein said slave module comprises generic speed information receiving means operable to cause said slave module to operate in accordance with said generic speed request.
6. Apparatus in accordance with claim 5 wherein said generic speed information receiving means is operable to map said generic speed information request to one of a plurality of available operating frequencies.
7. Apparatus in accordance with claim 5 wherein said generic speed information receiving means is operable to map said generic speed information request to one of a plurality of available supply voltages.
8. Apparatus in accordance with claim 5 wherein said generic speed information receiving means is operable to map said generic speed information request to one of a plurality of available operating speeds.
9. Apparatus in accordance with claim 5 wherein said generic speed information receiving means is operable to map said generic speed information request to a priority for a functional request sent with said generic speed information request.
10. A method of controlling a computer processing apparatus comprising a master module and a slave module, comprising establishing a DVS control scheme for the master module, relating the DVS control scheme to said slave module, associating a DVS control request with a functional request wherein the DVS control request is in accordance with the slave module related DVS control scheme, and sending said functional request and said DVS control request from the master module to said slave module for execution by said slave module of a requested function in accordance with said DVS control request.
11. A method in accordance with claim 10 and including determining clock frequency information defining a clock frequency for said master module, and transferring said clock frequency information to said slave module in said DVS control request in conjunction with said functional request.
12. A method in accordance with claim 10 and including calculating dynamically an operating frequency for the master module, and sending a DVS control request alongside a functional request, said DVS control request indicating said operating frequency to said slave module.
13. A method in accordance with claim 10 and including mapping said information defining a DVS control scheme for use by said master module into a generic speed request, and sending said generic speed request with said functional request, receiving said generic speed request at said slave module such that said slave module is caused to operate in accordance with said generic speed request.
14. A method in accordance with claim 13 and including mapping, at said slave module, said generic speed information request to one of a plurality of available operating frequencies.
15. A method in accordance with claim 13 and including mapping, at said slave module, said generic speed information request to one of a plurality of available supply voltages.
16. A method in accordance with claim 13 and including mapping, at said slave module, said generic speed information request to one of a plurality of available operating speeds.
17. A method in accordance with claim 13 and including mapping, at said slave module, said generic speed information request to a priority for a functional request sent with said generic speed information request.
18. A computer program product comprising computer executable instructions which, when loaded on a computer, cause said computer to perform a method in accordance with any one of claims 10 to 17.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB0718100.1 | 2007-09-17 | | |
| GB0718100A GB2452778A (en) | 2007-09-17 | 2007-09-17 | Linking dynamic voltage scaling in master and slave modules |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20090077290A1 (en) | 2009-03-19 |
Family
ID=38659090
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US12/212,114 Abandoned US20090077290A1 (en) | 2007-09-17 | 2008-09-17 | Controller for processing apparatus |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20090077290A1 (en) |
| JP (1) | JP2009070389A (en) |
| GB (1) | GB2452778A (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110173617A1 (en) * | 2010-01-11 | 2011-07-14 | Qualcomm Incorporated | System and method of dynamically controlling a processor |
| US20130067130A1 (en) * | 2010-05-21 | 2013-03-14 | Nec Corporation | Bus control apparatus and bus control method |
| CN113032015A (en) * | 2019-12-24 | 2021-06-25 | 中国科学院沈阳自动化研究所 | Communication method for precision motion control |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8601296B2 (en) * | 2008-12-31 | 2013-12-03 | Intel Corporation | Downstream device service latency reporting for power management |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030088778A1 (en) * | 2001-10-10 | 2003-05-08 | Markus Lindqvist | Datacast distribution system |
| US20050090235A1 (en) * | 2003-10-27 | 2005-04-28 | Larri Vermola | Apparatus, system, method and computer program product for service selection and sorting |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6766460B1 (en) * | 2000-08-23 | 2004-07-20 | Koninklijke Philips Electronics N.V. | System and method for power management in a Java accelerator environment |
| JP2002351436A (en) * | 2001-05-25 | 2002-12-06 | Sony Corp | Display apparatus and transition and return method of display apparatus to low power consumption mode |
| JP2006163970A (en) * | 2004-12-09 | 2006-06-22 | Mitsubishi Electric Corp | Multiprocessor system, multiprocessor control method, and multiprocessor control program recording medium |
-
2007
- 2007-09-17 GB GB0718100A patent/GB2452778A/en not_active Withdrawn
-
2008
- 2008-09-17 JP JP2008238033A patent/JP2009070389A/en not_active Abandoned
- 2008-09-17 US US12/212,114 patent/US20090077290A1/en not_active Abandoned
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110173617A1 (en) * | 2010-01-11 | 2011-07-14 | Qualcomm Incorporated | System and method of dynamically controlling a processor |
| US8671413B2 (en) * | 2010-01-11 | 2014-03-11 | Qualcomm Incorporated | System and method of dynamic clock and voltage scaling for workload based power management of a wireless mobile device |
| US8996595B2 (en) | 2010-01-11 | 2015-03-31 | Qualcomm Incorporated | User activity response dynamic frequency scaling processor power management system and method |
| US20130067130A1 (en) * | 2010-05-21 | 2013-03-14 | Nec Corporation | Bus control apparatus and bus control method |
| CN113032015A (en) * | 2019-12-24 | 2021-06-25 | 中国科学院沈阳自动化研究所 | Communication method for precision motion control |
Also Published As
| Publication number | Publication date |
|---|---|
| GB2452778A (en) | 2009-03-18 |
| JP2009070389A (en) | 2009-04-02 |
| GB0718100D0 (en) | 2007-10-24 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: KABUSHIKI KAISHA TOSHIBA, JAPAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: DOLWIN, ANTHONY CRAIG; REEL/FRAME: 021821/0398. Effective date: 20081015 |
| | STCB | Information on status: application discontinuation | Free format text: EXPRESSLY ABANDONED -- DURING EXAMINATION |