GB2590389A - Method and apparatus for control of memory banks - Google Patents
- Publication number
- GB2590389A (application GB1918459.7A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- memory banks
- mode
- sequence
- memory
- accesses
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G06N3/0442 — Recurrent networks, e.g. Hopfield networks, characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06N3/09 — Supervised learning
- G06N20/00 — Machine learning
- G06F1/3225 — Monitoring of peripheral devices of memory devices
- G06F1/3234 — Power saving characterised by the action undertaken
- G06F1/3275 — Power saving in memory, e.g. RAM, cache
- G06F1/3287 — Power saving characterised by the action undertaken by switching off individual functional units in the computer system
- G06F3/0625 — Power saving in storage systems
- G06F3/0634 — Configuration or reconfiguration of storage systems by changing the state or mode of one or more devices
- G06F3/0653 — Monitoring storage devices or systems
- G06F3/0679 — Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
Embodiments relate to an apparatus comprising at least one processor, a plurality of memory banks and an address bus, wherein the at least one processor is configured for accessing an address in said plurality of memory banks based on specifying said address on the address bus, wherein at least one of said memory banks is operable in at least a first mode and a second mode, wherein the power consumption of the memory banks in the second mode is lower than in the first mode, the apparatus further comprising means configured for: - determining a past sequence of memory bank accesses based on a sequence of addresses on said address bus, - predicting a future sequence of memory bank accesses based on said past sequence, - controlling at least one of the memory banks in the first mode or the second mode based on the future sequence.
Description
Method and apparatus for control of memory banks
FIELD OF THE INVENTION
Embodiments of the present invention relate to the field of computing. In particular, embodiments of the present invention relate to the interworking of one or more processors and memory banks.
BACKGROUND
The power consumption of a computing apparatus impacts how long it can operate without a battery recharge and/or the operating cost. Techniques to limit the power consumption are therefore desirable.
SUMMARY
It is thus an object of embodiments of the present invention to propose a method and an apparatus which do not exhibit the inherent shortcomings of the prior art.
Accordingly, embodiments relate to an apparatus comprising at least one processor, a plurality of memory banks and an address bus, wherein the at least one processor is configured for accessing an address in said plurality of memory banks based on specifying said address on the address bus, wherein at least one of said memory banks is operable in at least a first mode and a second mode, wherein the power consumption of the memory banks in the second mode is lower than in the first mode, the apparatus further comprising means configured for: - determining a past sequence of memory bank accesses based on a sequence of addresses on said address bus, - predicting a future sequence of memory bank accesses based on said past sequence, - controlling at least one of the memory banks in the first mode or the second mode based on the future sequence.
In some embodiments, the means are configured for predicting a future sequence of memory bank accesses based on a recurrent neural network having an input sequence specifying at least the past sequence of memory bank accesses and providing an output sequence specifying at least the future sequence of memory bank accesses.
In some embodiments, the recurrent neural network comprises at least one LSTM core.
In some embodiments, the recurrent neural network comprises a plurality of LSTM cores associated with respective parts of an input vector of the input sequence, and a fully connected part.
In some embodiments, the means are configured for training said recurrent neural network based on a comparison between input sequences and output sequences.
In some embodiments, the means are further configured for determining, for respective memory banks, a time difference between two successive memory bank accesses, wherein the input sequence specifies said time differences.
In some embodiments, controlling at least one of the memory banks in the first mode or the second mode based on the future sequence comprises setting a memory bank in the first mode in response to determining that said memory bank will be accessed within a given time.
In some embodiments, controlling at least one of the memory banks in the first mode or the second mode based on the future sequence comprises setting a memory bank in the second mode in response to determining that said memory bank will not be accessed within a given time.
In some embodiments, the means are further configured for controlling the processor to halt execution in response to determining that an address on the address bus corresponds to a memory bank in the second operating mode.
In some embodiments, the apparatus comprises an integrated circuit including said at least one processor, the plurality of memory banks, the address bus and said means.
Said means may comprise a memory controller circuit.
Correlatively, some embodiments relate to a method executed by the apparatus.
The scope of protection sought for various embodiments of the invention is set out by the independent claims. The embodiments and features, if any, described in this specification that do not fall under the scope of the independent claims are to be interpreted as examples useful for understanding various embodiments of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other objects and features of the invention will become more apparent and the invention itself will be best understood by referring to the following description of embodiments taken in conjunction with the accompanying drawings wherein: Figure 1 is a block diagram of an apparatus for processing data according to example embodiments, Figure 2 is a block diagram of a system-on-a-chip according to example embodiments, Figure 3 is a flowchart of a method according to example embodiments, Figure 4 is a block diagram of a memory controller circuit according to example embodiments, Figure 5 is a flowchart of a method according to example embodiments, Figures 6 and 7 are block diagrams of neural networks according to example embodiments.
DESCRIPTION OF EMBODIMENTS
System-on-chip (SoC) implementations, ranging from modern CPUs/GPUs which target high performance computing to low-cost industrial Internet of Things (IIoT) devices, are constrained to operate at low power. The power consumption of such devices determines how long they can operate without battery recharge and/or the operating cost. In some cases, like in IIoT, they are required to operate for years, while in the case of a smartphone (with CPUs/GPUs) they are expected to operate for a couple of days. Most of these SoC designs have a processor complex that includes CPUs and multiple on-chip SRAM (Static Random Access Memory) banks (which can be instruction or L1/L2/L3 data memory).
Typically, in most sub-micron process technologies (90 nm and below), memory leakage and access power consumption dominate the overall power consumption of the SoC. Retaining data in the SRAM banks requires power to be constantly supplied (however rarely they are used), resulting in wasted power.
A memory bank may support low power modes, in addition to the active or normal operational mode, to reduce SRAM power consumption. Examples of low power modes include: 1. Sleep: turns off the peripheral control and I/O ports of the memory, reducing leakage power but retaining data. May be used when data is accessed in bursts or intermittently and data retention is required. This typically saves 30% of leakage power. However, it requires at least one clock cycle of latency between turning on from sleep mode and the time data is first accessed.
2. Shutdown: turns off the power supply to most of the components inside the memory bank, reducing leakage by 90%, but also losing all the data stored. May be used when data retention is not required. Waking up from shutdown mode may need a few additional clock cycles.
The low power options may be exercised as needed either by the compiler or by the SoC design implementation. Compiler-driven optimizations are typically guided by the application programmer, who has the expert knowledge of which banks and what data are to be retained or forgotten, through compiler directives. The application programmer can be further aided by knowledge of the hardware architecture to control the power islands on the SoC. However, as the system gets larger, this fine-grained control (even if provided in hardware) becomes harder for the application programmer to use, and is hence limited to five or fewer power-down modes.
A static analysis of the application code can be done to ascertain at what point in the application code a bank access would be needed, and replace that instruction with an equivalent instruction that would do the same access but with a pre-deactivate sleep operation. Similarly, once the memory bank access is no longer needed, one can introduce activate-sleep as part of the last access to the memory bank. This solution would suffice if the application code is largely static and the post-compile static analysis of assembly instructions is possible. In many cases, this is not possible, as the CPUs or other processing units are multi-threaded or multi-core, where several cores run in parallel and access a shared memory bank resource. This limits the usage of such an implementation, as the application programmer is not necessarily in full control/knowledge of what processes are running and on which subsystem, if the application code is dynamic and includes many data-dependent context switches.
Figure 1 is a block diagram of an apparatus 1 for processing data. The apparatus 1 includes at least a data processing component 2. The apparatus 1 may comprise other components not shown in Figure 1, including a battery, a user interface (display, touchscreen, keys, speakers...), interfaces for wired connections (USB, headphones, network, electrical power...), and wireless communication interfaces (Wi-Fi, Bluetooth, cellular network...). The apparatus 1 may be a user terminal, for example a smartphone, a personal computer, a wearable device (e.g. smartwatch, earpiece, headphones...). The apparatus 1 may be a network element, for example a router, gateway, (part of) a base station, server...
The apparatus 1 may be a battery-powered device. Limiting the power consumption of the components, in particular of the data processing component 2, is desirable.
The data processing component 2 may comprise an integrated circuit. For example, the data processing component is or includes a system-on-a-chip (SoC). A more detailed embodiment is described hereafter.
Figure 2 is a block diagram of a data processing component 2. The processing component 2 may be included in the apparatus 1. The processing component 2 may be an integrated circuit, for example a SoC. The processing component 2 may comprise a processing subsystem 20, a memory controller 22 and a pool 23 of memory banks 24 interconnected by one or more buses and/or signal paths.
The processing subsystem 20 may comprise one or more processors 21. A processor 21 may be referred to, for example, as a CPU, a GPU, a processing core... Figure 2 shows an example including M CPUs denoted CPU1, CPU2...CPUM.
The pool 23 comprises a plurality of memory banks 24. Figure 2 shows an example including N memory banks 24 denoted b1, b2...bN, with N > 1. A memory bank 24 is configured for storing data and/or instructions used by the processors 21. For example, a memory bank 24 may be a SRAM and may be part of a cache hierarchy. A memory bank 24 may be operable in a plurality of operating modes. For example, a memory bank 24 is operable in at least two of:
- A normal operating mode, wherein data is retained, and memory access requires few clock cycles, for example only one clock cycle,
- A sleep mode, wherein data is retained, and memory access requires more clock cycles than in the normal operating mode, for example two clock cycles,
- A power down mode, wherein data is not retained, and memory access requires more clock cycles than in the normal operating mode and/or the sleep mode.
The power consumption of a memory bank 24 may be higher in the normal operating mode than in the sleep mode, and higher in the sleep mode than in the power down mode.
The memory controller 22 controls at least in part the interaction between the processors 21 and the memory banks 24. In particular, the memory controller 22 may control one or more of the memory banks 24 in the normal operating mode, the sleep mode or the power down mode based on predicted memory bank accesses, as described in more detail hereafter. This helps limit the power consumption of the apparatus 1. The memory controller 22 may also be referred to as a memory interconnect. The processors 21, the memory controller 22 and the memory banks 24 may interact based on buses and/or signal paths.
In particular, the processing component 2 may comprise an address bus 25. A processor 21 may access an address in the plurality of memory banks based on specifying the address on the address bus 25. An address may specify which memory bank 24 is to be accessed, and a location within said memory bank 24. For example, an address is specified by a plurality of bits, with the most significant bits (MSB) indicating the memory bank 24 and the least significant bits indicating the location within the memory bank 24.
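As a minimal sketch of the MSB-based address decoding described above: the bank count and bank size below are illustrative assumptions, not values taken from the patent.

```python
# Hypothetical geometry: 16 banks of 64 KiB each, bank index in the MSBs.
NUM_BANKS = 16          # assumed number of memory banks (N)
BANK_SIZE = 64 * 1024   # assumed bank size in bytes

def bank_of(address: int) -> int:
    """Return the bank index held in the most significant address bits."""
    return (address // BANK_SIZE) % NUM_BANKS

def offset_of(address: int) -> int:
    """Return the location within the bank (least significant bits)."""
    return address % BANK_SIZE
```

For example, with these assumed sizes, address 0x10000 (65536) decodes to bank 1, offset 0.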
The memory controller 22 may provide memory enable signals 29 for controlling at least one of the memory banks 24 in one of the normal operating mode, the sleep mode or the shutdown mode based on predicted memory bank accesses. This is described in more detail hereafter with reference to Figure 3.
The processing subsystem 20 may provide a program counter signal 26 to the memory controller 22. The program counter signal 26 provides an indication of the progress of execution of a set of instructions.
The memory controller 22 may provide a halt signal 27 to the processors 21. The halt signal 27 indicates that the processors 21 should wait before requesting further memory access.
A processor 21 may provide a read/write signal 28 to the memory banks 24. The read/write signal 28 indicates whether a memory access requested by a processor 21 is for reading or writing in the memory banks 24.
The processing component 2 may also comprise a data bus (not shown) for communication of data and/or instructions between the processing subsystem 20 and the memory banks 24.
The processing subsystem 20, memory controller 22, pool 23 of memory banks 24, buses and signal paths may be implemented based on integrated circuit technology, for example FPGA, ASIC, SoC...
Figure 3 is a flowchart of operations executed by the apparatus 1, in particular by the memory controller 22, for controlling the memory banks 24.
The processing subsystem 20 executes a set of instructions, and this involves accessing the memory banks 24. A memory access comprises providing an address on the address bus 25.
At operation 31, the memory controller 22 determines a past sequence of memory bank accesses based on a sequence of addresses on the address bus 25. For example, for successive addresses on the address bus 25, the memory controller 22 parses the most significant bits and determines the memory bank 24 corresponding to the address. The determined memory bank 24 may be represented by a one-hot encoded vector of size N: the N elements (b1, b2... bN) of the vector correspond to the N memory banks, and all elements are 0 except the element corresponding to the determined memory bank 24, which is equal to 1. A sequence of L memory bank accesses may be specified by L vectors of size N, or an L×N matrix.
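The one-hot encoding of a past access sequence into an L×N matrix can be sketched as follows (a plain-Python illustration; the hardware would implement this with registers rather than lists):

```python
def one_hot(index: int, size: int) -> list[int]:
    """One-hot encode a bank index into a vector of length `size`."""
    v = [0] * size
    v[index] = 1
    return v

def access_matrix(bank_indices: list[int], n_banks: int) -> list[list[int]]:
    """Encode a past sequence of L bank accesses as an L x N matrix:
    one row per access, one column per memory bank."""
    return [one_hot(b, n_banks) for b in bank_indices]
```

A sequence of accesses to banks 0, 2, 1 with N = 4 yields three rows, each with a single 1.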
At operation 32, the memory controller 22 predicts a future sequence of memory bank accesses based on the past sequence of memory bank accesses. The future sequence may be of the same length L as the past sequence, or a different length. The format of the predicted sequence may match the format of the past sequence, for example L one-hot encoded vectors of size N. Various sequence prediction techniques may be used. For example, the prediction may involve a recurrent neural network, e.g. an LSTM neural network, which uses sequences of memory bank accesses to progressively get trained. In some embodiments, the memory controller 22 also determines time differences between memory bank accesses, and the time differences are taken into account for the prediction. Examples are described in greater detail with reference to Figures 4 to 7.
At operation 33, the memory controller 22 controls the memory banks 24 in one of the normal operation mode, the sleep mode and the power-down mode based on the predicted sequence. The memory controller 22 may control the memory banks 24 based on providing memory enable signals 29 to the memory banks 24. For example, if the predicted sequence indicates that a given memory bank 24 will be accessed during the next L accesses, the memory controller 22 sets this memory bank in the normal operation mode, and if the predicted sequence indicates that a given memory bank 24 will not be accessed during the next L accesses, the memory controller 22 sets this memory bank in the sleep mode.
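The mode-selection rule of operation 33 can be sketched as below. The mode names and the argmax decoding of the predicted one-hot rows are illustrative assumptions, and this simplification only chooses between normal and sleep modes:

```python
def control_banks(predicted: list[list[float]], n_banks: int) -> list[str]:
    """Keep banks predicted to be accessed within the next L steps in the
    normal mode; put all other banks to sleep (sketch of operation 33)."""
    # Decode each predicted row to the most likely bank (argmax).
    accessed = {max(range(n_banks), key=row.__getitem__) for row in predicted}
    return ["NORMAL" if b in accessed else "SLEEP" for b in range(n_banks)]
```

With a two-step prediction pointing at banks 0 and 1 out of four, banks 2 and 3 would be put to sleep.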
Operations 31, 32 and/or 33 may be executed at least partially in parallel. For example, the memory controller 22 may start to predict the future sequence of memory bank accesses before the past sequence has been fully obtained.
Figure 4 is a block diagram of the memory controller 22 according to some embodiments.
The memory controller 22 comprises a sequence determination module 41, a time differentiation module 42, a prediction module 43 and a control module 44.
The sequence determination module 41 parses the address specified on the address bus 25, for example the most significant bits, and determines the memory bank 24 corresponding to the address. As explained before, the determined memory bank 24 may be represented by a one-hot encoded vector of size N (b1, b2... bN). This operation is repeated for successive addresses.
The time differentiation module 42 determines the time difference between accesses, Δt. More specifically, for a given memory bank 24, the memory controller 22 determines Δt = t_access − t_access_prev, where t_access is the current time when the memory bank 24 is accessed and t_access_prev is the time when the memory bank 24 was last accessed. The time difference Δt may be expressed in clock cycles and determined, for example, based on a local clock cycle counter in the memory controller 22 or on the program counter signal 26 if available, and determined based on registers which store the value of t_access_prev for each memory bank 24.
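A sketch of the per-bank Δt bookkeeping just described, with the t_access_prev registers modelled as a list (the class and method names are illustrative):

```python
class DeltaTTracker:
    """Track, per memory bank, the clock-cycle difference between two
    successive accesses (sketch of the time differentiation module 42)."""

    def __init__(self, n_banks: int):
        # One register per bank holding t_access_prev; None = never accessed.
        self._last = [None] * n_banks

    def observe(self, bank: int, cycle: int):
        """Record an access at `cycle` and return delta_t = t_access -
        t_access_prev, or None on the first access to this bank."""
        prev = self._last[bank]
        self._last[bank] = cycle
        return None if prev is None else cycle - prev
```

For a bank accessed at cycles 5 and 12, the tracker reports Δt = 7 on the second access.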
In a typical use case, the memory accesses in the applications are done in a periodic manner. Thus, the number of unique Δt values is much smaller than the number of unique sequential memory access patterns. For example, depending on the application, Δt may take a number of discrete values between 0 and a maximum of, say, 100 or so. Hence, it is possible to predict these Δt values with a predictor with categorical output (i.e. as a one-hot vector). In some embodiments, bank access enables (b1... bN) and one-hot encoded Δt values are concatenated for L time steps and fed to the prediction module 43.
The input of the prediction module 43 is a sequence of L input vectors, wherein an input vector is the concatenation of a bank access enable vector (b1, b2... bN) and a one-hot encoded Δt value. The prediction module 43 applies a sequence prediction technique to determine an output sequence of L output vectors, wherein an output vector is the concatenation of a predicted bank access enable vector (b1, b2... bN) and a one-hot encoded Δt value.
The prediction module 43 may be implemented based on a machine learning model, for example a neural network. In some embodiments, the prediction module 43 comprises one or more LSTM units followed by a fully connected neural network. The prediction module 43 may provide categorical output, e.g. the predicted memory bank 24 and Δt value for successive time steps, both encoded as one-hot vectors.
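A minimal NumPy sketch of such a predictor's forward pass, assuming a single LSTM core with 10 hidden nodes feeding a fully connected softmax head. The sizes N=16 banks and 76 Δt categories come from the example embodiment later in the description; the random weights are placeholders, since the patent does not specify initialisation or training state:

```python
import numpy as np

N_BANKS, N_DT, HIDDEN = 16, 76, 10   # sizes from the example embodiment
IN_SIZE = N_BANKS + N_DT             # bank one-hot + delta-t one-hot

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (4 * HIDDEN, IN_SIZE))   # input weights [i, f, o, g]
U = rng.normal(0, 0.1, (4 * HIDDEN, HIDDEN))    # recurrent weights
b = np.zeros(4 * HIDDEN)                        # gate biases
W_fc = rng.normal(0, 0.1, (IN_SIZE, HIDDEN))    # fully connected head

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def lstm_step(x, h, c):
    """Standard LSTM cell update with the four gates stacked in z."""
    z = W @ x + U @ h + b
    i = sigmoid(z[:HIDDEN])
    f = sigmoid(z[HIDDEN:2 * HIDDEN])
    o = sigmoid(z[2 * HIDDEN:3 * HIDDEN])
    g = np.tanh(z[3 * HIDDEN:])
    c = f * c + i * g
    return o * np.tanh(c), c

def predict(sequence):
    """Map a sequence of input vectors to per-step categorical outputs:
    a distribution over banks and a distribution over delta-t values."""
    h, c, out = np.zeros(HIDDEN), np.zeros(HIDDEN), []
    for x in sequence:
        h, c = lstm_step(x, h, c)
        y = W_fc @ h
        out.append((softmax(y[:N_BANKS]), softmax(y[N_BANKS:])))
    return out
```

In practice the weights would be trained as described below; this sketch only shows the data flow from concatenated one-hot inputs to the two softmax groups.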
The memory controller 22 may train the prediction module 43 based on a comparison of input sequences and corresponding output sequences. For example, two loss functions (categorical cross entropy) are applied to each of the two groups of outputs ((b1, b2... bN) and Δt values) and optimized using stochastic gradient descent. The training may be online or offline. Online training involves regularly updating the machine learning model during operation, which allows it to adapt to a changing context, for example a new or updated application being executed.
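The categorical cross-entropy loss applied to each output group can be sketched as follows (a pure-Python illustration; a real implementation would use a framework's batched, differentiable version):

```python
import math

def categorical_cross_entropy(predicted: list[float],
                              target_one_hot: list[int]) -> float:
    """Cross entropy between a predicted distribution and a one-hot target.
    The epsilon clamp avoids log(0) for zero-probability predictions."""
    return -sum(t * math.log(max(p, 1e-12))
                for p, t in zip(predicted, target_one_hot))
```

During training, one such loss would be computed over the predicted bank vector and another over the predicted Δt vector, and their sum minimised by stochastic gradient descent.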
The control module 44 determines memory enable signals 29 based on the output sequence provided by the prediction module 43. For example, if the predicted sequence indicates that a given memory bank 24 will be accessed during the next L accesses, the memory controller 22 sets this memory bank in the normal operation mode, and if the predicted sequence indicates that a given memory bank 24 will not be accessed during the next L accesses, the memory controller 22 sets this memory bank in the sleep mode.
Figure 5 is a flowchart of a method according to embodiments, which may be executed with the memory controller 22 of Figure 4.
At operation 51, in an initial state, the processing subsystem 20 starts execution and the memory banks 24 are in the normal operating mode.
At operation 52, the memory controller 22 continuously reads the address bus, and the program counter signal 26 if available, and determines the bank accesses and Δt values to learn the dynamic context of the memory access pattern. For example, parameters of a neural network are updated based on a comparison between a past sequence of memory bank accesses and a corresponding predicted sequence of memory bank accesses.
At operation 53, the memory controller 22 controls the memory banks 24 in one of the normal, sleep and shutdown modes. For example, in some embodiments, the memory controller 22 determines the memory bank 24 to be accessed next based on the predicted sequence and sends a sleep or shutdown command to the other memory banks 24. In other embodiments, the memory controller 22 may set one or more memory banks 24 in the normal active mode and others in the sleep or shutdown mode, based on an overview of the predicted accesses specified by the predicted (b1, b2... bN) and Δt values.
Operations 52 and 53 are normally repeated. However, when the memory controller 22 determines that an address on the address bus 25 corresponds to a memory bank 24 in sleep or shutdown mode, meaning that the prediction was wrong, the memory controller 22 issues a halt command to the processing subsystem 20 at operation 54 and turns the memory bank 24 to the normal operation mode.
In that case, at operation 55, the processor execution is halted for a predetermined time, so that the memory bank 24 comes out of the sleep or shutdown mode and is ready to be accessed.
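The misprediction handling of operations 54 and 55 can be sketched as below; the mode names and wake-up latencies are assumed for illustration (the description only says sleep needs at least one cycle and shutdown a few more):

```python
# Assumed cycles the processor must halt while a bank wakes up.
WAKE_LATENCY = {"SLEEP": 1, "SHUTDOWN": 3}

def on_access(bank_modes: list[str], bank: int) -> int:
    """If the accessed bank is not active (a misprediction), wake it and
    return the number of halt cycles; otherwise return 0."""
    mode = bank_modes[bank]
    if mode == "NORMAL":
        return 0
    bank_modes[bank] = "NORMAL"     # operation 54: turn bank back on
    return WAKE_LATENCY[mode]       # operation 55: halt for the latency
```

After a mispredicted access, the bank is back in the normal mode and subsequent accesses proceed without a halt.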
In the apparatus 1, in particular in the processing component 2, the memory bank accesses are predicted based on past memory bank accesses. This allows controlling the memory banks 24 in a low power operating mode when possible. This reduces the power consumption of the processing component 2. Since the memory bank accesses are predicted based on past memory bank accesses, the applications executed by the processing subsystem 20 do not need to include predetermined instructions for controlling the memory banks 24 in one of the operating modes. Moreover, the prediction module 43 may be self-trained based on a comparison of input sequences and output sequences. Accordingly, the prediction module 43 may adapt itself to various applications executed by the processing component 2 and may update itself when the applications change.
In an example embodiment, the processing subsystem 20 executes a low-density parity-check (LDPC) algorithm for a channel coding application, the number of memory banks 24 is N=16, the time difference Δt is encoded as a 76-element one-hot vector and the size of the sequence is L=20. The memory access patterns during decoding of 12 different LDPC check matrix sizes were memorized by the prediction module 43 with an accuracy of 98%-100% in fewer than 40 epochs.
In other words, after executing a full assembly code 40 times, the prediction module 43 remembers and accurately predicts the next 20 banks to be accessed given the last 20 accesses, with an accuracy of at least 98%. This in turn results in at least 30% savings in power consumption, as the banks not used in the next 20 time steps can be in sleep mode.
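The savings figure can be made concrete with a back-of-the-envelope model. All numbers below (relative active and sleep power, the count of distinct banks touched per window) are assumptions for illustration, not figures from the patent.

```python
# Toy power model: compare keeping all 16 banks active against keeping
# only the banks predicted for the next 20-step window active.

P_ACTIVE, P_SLEEP = 1.0, 0.1   # assumed relative per-bank power figures
N_BANKS = 16

def power(active_banks):
    """Total power with the given number of banks active, rest asleep."""
    return active_banks * P_ACTIVE + (N_BANKS - active_banks) * P_SLEEP

baseline = power(N_BANKS)   # all banks in normal mode
# Suppose the predicted 20-step window touches only 6 distinct banks:
predicted = power(6)
saving = 1 - predicted / baseline
```

Under these assumed numbers the saving exceeds 50%, so the patent's "at least 30%" claim is plausible even with a less favourable sleep-mode power ratio or more banks in use per window.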
If the application code changes, the prediction module 43 can retrain and relearn the new access pattern in place.
Figure 6 is a block diagram of a neural network 60 which can be used to implement the prediction module 43.
The neural network 60 comprises a LSTM core 61, a fully connected part 62 and an output part 63.
The LSTM core 61 takes the sequence of input vectors as input and provides its output to the fully connected part 62. In some embodiments, the LSTM core 61 comprises 10 hidden nodes.
The fully connected part 62 comprises a plurality of fully connected layers, for example a first layer of 64 nodes and a second layer of 20 nodes. The fully connected part 62 provides its output to the output part 63.
The output part 63 comprises a layer with softmax activation. It is composed of two parts, one to predict the memory bank access and one to predict the Δt.
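A rough parameter count shows how small a network of this shape is. The exact wiring is an assumption: input vectors of size 16 + 76 = 92 (one-hot bank plus one-hot Δt, per the example embodiment), an LSTM with 10 hidden nodes, fully connected layers of 64 and 20 nodes, and two softmax heads of sizes 16 and 76.

```python
# Parameter count for a network shaped like network 60 (assumed wiring).

def lstm_params(inp, hidden):
    # 4 gates, each with input + recurrent weight matrices and a bias
    return 4 * ((inp + hidden) * hidden + hidden)

def dense_params(inp, out):
    return inp * out + out  # weights + biases

total = (lstm_params(92, 10)
         + dense_params(10, 64)
         + dense_params(64, 20)
         + dense_params(20, 16)    # softmax head: next memory bank
         + dense_params(20, 76))   # softmax head: next dt value
```

At roughly eight thousand parameters, such a predictor is small enough to run inside a memory controller, which matters for the scheme to save net power.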
Figure 7 is a block diagram of a neural network 70 which can be used to implement the prediction module 43.
The neural network 70 comprises a LSTM part 71, a fully connected part 72 and an output part 73, and may be seen as a variant of the neural network 60 which can be used for more complex and/or larger patterns.
In comparison with the neural network 60, the neural network 70 comprises a LSTM part 71 with an ensemble network composed of LSTM cores, each containing 10 LSTM hidden nodes. Each LSTM core now trains on part of the total address space.
The following feed-forward part 72 learns to switch between the predictions of each core.
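The switching step can be pictured as a gated mixture of the per-core outputs. In the sketch below, the gate values and core outputs are illustrative stand-ins for quantities the feed-forward stage would learn; the soft-mixing formulation is an assumption (a hard argmax switch would also fit the description).

```python
# Hedged sketch of the ensemble idea: several cores each cover part of
# the address space; a gate weights (switches between) their per-bank
# probability vectors.

def combine(core_outputs, gate):
    """Mix per-core probability vectors with gate weights summing to 1."""
    n = len(core_outputs[0])
    mixed = [0.0] * n
    for weight, probs in zip(gate, core_outputs):
        for i, p in enumerate(probs):
            mixed[i] += weight * p
    return mixed

# Two cores, four banks: core 0 is confident in bank 1, core 1 in bank 3.
core_outputs = [[0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
gate = [0.9, 0.1]  # the gate currently trusts core 0
mixed = combine(core_outputs, gate)
best = max(range(len(mixed)), key=mixed.__getitem__)
```

Because each core only has to model a slice of the address space, the ensemble can track larger or more irregular patterns than a single small LSTM.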
In the context of this description, a machine learning model is a function for outputting an output based on an input, which depends on trainable parameters. An example of a machine learning model is a neural network, with weights and biases as parameters. Training the machine learning model is the task of determining the parameters of the model based on training data.
It should be noted that although examples of methods have been described with a specific order of steps, this does not exclude other implementations. In particular, the described steps may be executed in another order, or partially or totally in parallel.
It is to be remarked that the functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared, for example in a cloud computing architecture. Moreover, explicit use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage. Other hardware, conventional and/or custom, may also be included. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
It should be further appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
While the principles of the invention have been described above in connection with specific embodiments, it is to be clearly understood that this description is made only by way of example and not as a limitation on the scope of the invention, as defined in the appended claims.
Claims (15)
- CLAIMS
1. Apparatus (1) comprising at least one processor (21), a plurality of memory banks (24) and an address bus (25), wherein the at least one processor is configured for accessing an address in said plurality of memory banks based on specifying said address on the address bus, wherein at least one of said memory banks is operable in at least a first mode and a second mode, wherein the power consumption of the memory banks in the second mode is lower than in the first mode, the apparatus further comprising means configured for:
- determining (31) a past sequence of memory bank accesses based on a sequence of addresses on said address bus,
- predicting (32) a future sequence of memory bank accesses based on said past sequence,
- controlling (33) at least one of the memory banks in the first mode or the second mode based on the future sequence.
- 2. Apparatus according to claim 1, wherein said means are configured for predicting a future sequence of memory bank accesses based on a recurrent neural network having an input sequence specifying at least the past sequence of memory bank accesses and providing an output sequence specifying at least the future sequence of memory bank accesses.
- 3. Apparatus according to claim 2, wherein the recurrent neural network comprises at least one LSTM core.
- 4. Apparatus according to claim 3, wherein the recurrent neural network comprises a plurality of LSTM cores associated with respective parts of an input vector of the input sequence, and a fully connected part.
- 5. Apparatus according to one of claims 2 to 4, wherein said means are configured for training said recurrent neural network based on a comparison between input sequences and output sequences.
- 6. Apparatus according to one of claims 2 to 5, wherein said means are further configured for determining, for respective memory banks, a time difference between two successive memory bank accesses, wherein the input sequence specifies said time differences.
- 7. Apparatus according to one of claims 1 to 6, wherein controlling at least one of the memory banks in the first mode or the second mode based on the future sequence comprises setting a memory bank in the first mode in response to determining that said memory bank will be accessed within a given time.
- 8. Apparatus according to one of claims 1 to 7, wherein controlling at least one of the memory banks in the first mode or the second mode based on the future sequence comprises setting a memory bank in the second mode in response to determining that said memory bank will not be accessed within a given time.
- 9. Apparatus according to one of claims 2 to 5, wherein said means are further configured for controlling the processor to halt execution in response to determining that an address on the address bus corresponds to a memory bank in the second operating mode.
- 10. Apparatus according to one of claims 1 to 9, comprising an integrated circuit (2) including said at least one processor, the plurality of memory banks, the address bus and said means.
- 11. A method executed by an apparatus comprising at least one processor, a plurality of memory banks and an address bus, wherein the at least one processor is configured for accessing an address in said plurality of memory banks based on specifying said address on the address bus, wherein at least one of said memory banks is operable in at least a first mode and a second mode, wherein the power consumption of the memory banks in the second mode is lower than in the first mode, the method comprising:
- determining a past sequence of memory bank accesses based on a sequence of addresses on said address bus,
- predicting a future sequence of memory bank accesses based on said past sequence,
- controlling at least one of the memory banks in the first mode or the second mode based on the future sequence.
- 12. Method according to claim 11, comprising predicting a future sequence of memory bank accesses based on a recurrent neural network having an input sequence specifying at least the past sequence of memory bank accesses and providing an output sequence specifying at least the future sequence of memory bank accesses.
- 13. Method according to claim 12, comprising determining, for respective memory banks, a time difference between two successive memory bank accesses, wherein the input sequence specifies said time differences.
- 14. Method according to one of claims 11 to 13, wherein controlling at least one of the memory banks in the first mode or the second mode based on the future sequence comprises setting a memory bank in the first mode in response to determining that said memory bank will be accessed within a given time.
- 15. Method according to one of claims 11 to 14, wherein controlling at least one of the memory banks in the first mode or the second mode based on the future sequence comprises setting a memory bank in the second mode in response to determining that said memory bank will not be accessed within a given time.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB1918459.7A GB2590389A (en) | 2019-12-16 | 2019-12-16 | Method and apparatus for control of memory banks |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB1918459.7A GB2590389A (en) | 2019-12-16 | 2019-12-16 | Method and apparatus for control of memory banks |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| GB201918459D0 GB201918459D0 (en) | 2020-01-29 |
| GB2590389A true GB2590389A (en) | 2021-06-30 |
Family
ID=69186736
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| GB1918459.7A Withdrawn GB2590389A (en) | 2019-12-16 | 2019-12-16 | Method and apparatus for control of memory banks |
Country Status (1)
| Country | Link |
|---|---|
| GB (1) | GB2590389A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20240111437A1 (en) * | 2022-09-30 | 2024-04-04 | Silicon Laboratories Inc. | Memory allocation based on lifespan |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180150123A1 (en) * | 2016-11-28 | 2018-05-31 | Qualcomm Incorporated | Wifi memory power minimization |
| US20190339892A1 (en) * | 2018-05-03 | 2019-11-07 | Mediatek Inc. | Memory management system and memory management method for dynamic memory management |
| US20190370632A1 (en) * | 2018-05-31 | 2019-12-05 | Google Llc | Computer system prediction machine learning models |
- 2019-12-16: GB application GB1918459.7A filed, published as GB2590389A (status: not active, withdrawn)
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180150123A1 (en) * | 2016-11-28 | 2018-05-31 | Qualcomm Incorporated | Wifi memory power minimization |
| US20190339892A1 (en) * | 2018-05-03 | 2019-11-07 | Mediatek Inc. | Memory management system and memory management method for dynamic memory management |
| US20190370632A1 (en) * | 2018-05-31 | 2019-12-05 | Google Llc | Computer system prediction machine learning models |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20240111437A1 (en) * | 2022-09-30 | 2024-04-04 | Silicon Laboratories Inc. | Memory allocation based on lifespan |
| US12346577B2 (en) * | 2022-09-30 | 2025-07-01 | Silicon Laboratories Inc. | Memory allocation based on lifespan |
Also Published As
| Publication number | Publication date |
|---|---|
| GB201918459D0 (en) | 2020-01-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20240420742A1 (en) | Parallel access to volatile memory by a processing device for machine learning | |
| EP3864582B1 (en) | Modifying machine learning models to improve locality | |
| US11921561B2 (en) | Neural network inference circuit employing dynamic memory sleep | |
| TWI537821B (en) | Providing per core voltage and frequency control | |
| CN110750351B (en) | Multi-core task scheduler, multi-core task scheduling method, multi-core task scheduling device and related products | |
| US20220076739A1 (en) | Memory context restore, reduction of boot time of a system on a chip by reducing double data rate memory training | |
| KR20190120698A (en) | System and method for optimizing performance of a solid-state drive using a deep neural network | |
| CN115516437A (en) | Host-Assisted Memory-Side Prefetcher | |
| CN111105023B (en) | Data stream reconstruction method and reconfigurable data stream processor | |
| US12265492B2 (en) | Circular buffer for input and output of tensor computations | |
| US11733763B2 (en) | Intelligent low power modes for deep learning accelerator and random access memory | |
| KR102425909B1 (en) | Neural network computing system and operating method thereof | |
| US20190286971A1 (en) | Reconfigurable prediction engine for general processor counting | |
| CN117223009A (en) | Performance scaling of data stream deep neural network hardware accelerators | |
| US12197362B2 (en) | Batch matrix multiplication operations in a machine learning accelerator | |
| CN112906877A (en) | Data layout conscious processing in memory architectures for executing neural network models | |
| CN112052943B (en) | Electronic device and method for performing operation of the electronic device | |
| CN116997910A (en) | tensor controller architecture | |
| EP4022446A1 (en) | Memory sharing | |
| GB2590389A (en) | Method and apparatus for control of memory banks | |
| CN108830379B (en) | Neural morphology processor based on parameter quantification sharing | |
| KR20230166836A (en) | A neural processing unit including a variable internal memory | |
| US20250272145A1 (en) | Systems and methods for heterogeneous large language model encoder and decoder processing | |
| US11704562B1 (en) | Architecture for virtual instructions | |
| CN111198714B (en) | Retraining method and related product |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |