
GB2590389A - Method and apparatus for control of memory banks - Google Patents


Info

Publication number
GB2590389A
Authority
GB
United Kingdom
Prior art keywords
memory banks
mode
sequence
memory
accesses
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1918459.7A
Other versions
GB201918459D0 (en)
Inventor
Purushotham Murugappa Velayuthan
Dev Gomony Manil
Catalsakal Koralp
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to GB1918459.7A priority Critical patent/GB2590389A/en
Publication of GB201918459D0 publication Critical patent/GB201918459D0/en
Publication of GB2590389A publication Critical patent/GB2590389A/en
Withdrawn legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206Monitoring of events, devices or parameters that trigger a change in power modality
    • G06F1/3215Monitoring of peripheral devices
    • G06F1/3225Monitoring of peripheral devices of memory devices
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/325Power saving in peripheral device
    • G06F1/3275Power saving in memory, e.g. RAM, cache
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/3287Power saving characterised by the action undertaken by switching off individual functional units in the computer system
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0625Power saving in storage systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • G06F3/0634Configuration or reconfiguration of storage systems by changing the state or mode of one or more devices
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653Monitoring storage devices or systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Power Sources (AREA)

Abstract

Embodiments relate to an apparatus comprising at least one processor, a plurality of memory banks and an address bus, wherein the at least one processor is configured for accessing an address in said plurality of memory banks based on specifying said address on the address bus, wherein at least one of said memory banks is operable in at least a first mode and a second mode, wherein the power consumption of the memory banks in the second mode is lower than in the first mode, the apparatus further comprising means configured for: - determining a past sequence of memory banks accesses based on a sequence of addresses on said address bus, - predicting a future sequence of memory banks accesses based on said past sequence, - controlling at least one of the memory banks in the first mode or the second mode based on the future sequence.

Description

Method and apparatus for control of memory banks
FIELD OF THE INVENTION
Embodiments of the present invention relate to the field of computing. In particular, embodiments of the present invention relate to the interworking of one or more processors and memory banks.
BACKGROUND
The power consumption of a computing apparatus impacts how long it can operate without a battery recharge and/or the operating cost. Techniques to limit the power consumption are therefore desirable.
SUMMARY
It is thus an object of embodiments of the present invention to propose a method and an apparatus, which do not show the inherent
shortcomings of the prior art.
Accordingly, embodiments relate to an apparatus comprising at least one processor, a plurality of memory banks and an address bus, wherein the at least one processor is configured for accessing an address in said plurality of memory banks based on specifying said address on the address bus, wherein at least one of said memory banks is operable in at least a first mode and a second mode, wherein the power consumption of the memory banks in the second mode is lower than in the first mode, the apparatus further comprising means configured for: - determining a past sequence of memory bank accesses based on a sequence of addresses on said address bus, - predicting a future sequence of memory bank accesses based on said past sequence, - controlling at least one of the memory banks in the first mode or the second mode based on the future sequence.
In some embodiments, the means are configured for predicting a future sequence of memory bank accesses based on a recurrent neural network having an input sequence specifying at least the past sequence of memory bank accesses and providing an output sequence specifying at least the future sequence of memory bank accesses.
In some embodiments, the recurrent neural network comprises at least one LSTM core.
In some embodiments, the recurrent neural network comprises a plurality of LSTM cores associated with respective parts of an input vector of the input sequence, and a fully connected part.
In some embodiments, the means are configured for training said recurrent neural network based on a comparison between input sequences and output sequences.
In some embodiments, the means are further configured for determining, for respective memory banks, a time difference between two successive memory bank accesses, wherein the input sequence specifies said time differences.
In some embodiments, controlling at least one of the memory banks in the first mode or the second mode based on the future sequence comprises setting a memory bank in the first mode in response to determining that said memory bank will be accessed within a given time.
In some embodiments, controlling at least one of the memory banks in the first mode or the second mode based on the future sequence comprises setting a memory bank in the second mode in response to determining that said memory bank will not be accessed within a given time.
In some embodiments, the means are further configured for controlling the processor to halt execution in response to determining that an address on the address bus corresponds to a memory bank in the second operating mode.
In some embodiments, the apparatus comprises an integrated circuit including said at least one processor, the plurality of memory banks, the address bus and said means.
Said means may comprise a memory controller circuit.
Correlatively, some embodiments relate to a method executed by the apparatus.
The scope of protection sought for various embodiments of the invention is set out by the independent claims. The embodiments and features, if any, described in this specification that do not fall under the scope of the independent claims are to be interpreted as examples useful for understanding various embodiments of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other objects and features of the invention will become more apparent and the invention itself will be best understood by referring to the following description of embodiments taken in conjunction with the accompanying drawings, wherein: Figure 1 is a block diagram of an apparatus for processing data according to example embodiments, Figure 2 is a block diagram of a system-on-a-chip according to example embodiments, Figure 3 is a flowchart of a method according to example embodiments, Figure 4 is a block diagram of a memory controller circuit according to example embodiments, Figure 5 is a flowchart of a method according to example embodiments, Figures 6 and 7 are block diagrams of neural networks according to example embodiments.
DESCRIPTION OF EMBODIMENTS
System-on-chip (SoC) implementations, ranging from modern CPUs/GPUs targeting high-performance computing to low-cost industrial Internet of Things (IIoT) devices, are constrained to operate at low power. The power consumption of such devices determines how long they can operate without a battery recharge and/or the operating cost. In some cases, as in IIoT, they are required to operate for years, while a smartphone (with CPUs/GPUs) is expected to operate for a couple of days. Most of these SoC designs have a processor complex that includes CPUs and multiple on-chip SRAM (Static Random Access Memory) banks (which can be instruction memory or L1/L2/L3 data memory).
Typically, in most sub-micron process technologies (90 nm and below), memory leakage and access power consumption dominate the overall power consumption of the SoC. Retaining data in the SRAM banks requires power to be constantly supplied (however rarely they are used), resulting in wasted power.
A memory bank may support low power modes, in addition to the active or normal operational mode, to reduce SRAM power consumption. Examples of low power modes include: 1. Sleep: turns off peripheral control and IO ports of the memory, reducing leakage power but retaining data. May be used when data is accessed in bursts or intermittently and data retention is required. This typically saves 30% of leakage power. However, it requires at least one clock cycle of latency between waking from sleep mode and the time data is first accessed.
2. Shutdown: turns off the power supply to most of the components inside the memory bank, reducing leakage by 90%, but also losing all the data stored. May be used when data retention is not required. Waking up from shutdown mode may need a few additional clock cycles.
The low power options may be exercised as needed either by the compiler or by the SoC design implementation. Compiler-driven optimizations are typically guided by the application programmer, who has expert knowledge of which banks and what data are to be retained or forgotten, expressed through compiler directives. The application programmer can be further aided by knowledge of the hardware architecture to control the power islands on the SoC. However, as the system gets larger, this fine-grained control (even if provided in hardware) becomes harder for the application programmer to use, and hence limits usage to five or fewer power-down modes.
A static analysis of the application code can be performed to ascertain at what point in the application code a bank access would be needed, and to replace that instruction with an equivalent instruction performing the same access together with a pre-deactivate sleep operation. Similarly, once the memory bank access is no longer needed, one can introduce an activate-sleep as part of the last access to the memory bank. This solution suffices if the application code is largely static and post-compile static analysis of assembly instructions is possible. In many cases, this is not possible, as CPUs or other processing units are multi-threaded or multi-core, where several cores run in parallel and access a shared memory bank resource. This limits the usage of such an implementation, as the application programmer does not necessarily have full control or knowledge of which processes are running on which subsystem when the application code is dynamic and includes many data-dependent context switches.
Figure 1 is a block diagram of an apparatus 1 for processing data. The apparatus 1 includes at least a data processing component 2. The apparatus 1 may comprise other components not shown on Figure 1, including a battery, a user interface (display, touchscreen, keys, speakers...), interfaces for wired connections (USB, headphones, network, electrical power...) and wireless communication interfaces (Wi-Fi, Bluetooth, cellular network...). The apparatus 1 may be a user terminal, for example a smartphone, a personal computer, or a wearable device (e.g. smartwatch, earpiece, headphones...). The apparatus 1 may be a network element, for example a router, a gateway, (part of) a base station, a server...
The apparatus 1 may be a battery-powered device. Limiting the power consumption of the components, in particular of the data processing component 2, is desirable.
The data processing component 2 may comprise an integrated circuit. For example, the data processing component is or includes a System-on-a-chip (SoC). A more detailed embodiment is described hereafter.
Figure 2 is a block diagram of a data processing component 2. The processing component 2 may be included in the apparatus 1. The processing component 2 may be an integrated circuit, for example a SoC. The processing component 2 may comprise a processing subsystem 20, a memory controller 22 and a pool 23 of memory banks 24 interconnected by one or more buses and/or signal paths.
The processing subsystem 20 may comprise one or more processors 21. A processor 21 may be referred to for example as a CPU, a GPU, a processing core... Figure 2 shows an example including M CPUs denoted CPU1, CPU2...CPUM.
The pool 23 comprises a plurality of memory banks 24. Figure 2 shows an example including N memory banks 24 denoted bi, b2...bN, with N > 1. A memory bank 24 is configured for storing data and/or instructions used by the processors 21. For example, a memory bank 24 may be a SRAM and may be part of a cache hierarchy. A memory bank 24 may be operable in a plurality of operating modes. For example, a memory bank 24 is operable in at least two of: A normal operating mode, wherein data is retained, and memory access requires few clock cycles, for example only one clock cycle, A sleep mode, wherein data is retained, and memory access requires more clock cycles than in the normal operating mode, for example two clock cycles,
A power down mode, wherein data is not retained, and memory access requires more clock cycles than in the normal operating mode and/or the sleep mode.
The power consumption of a memory bank 24 may be higher in the normal operating mode than in the sleep mode, and higher in the sleep mode than in the power down mode.
The memory controller 22 controls at least in part the interaction between the processors 21 and the memory banks 24. In particular, the memory controller 22 may control one or more of the memory banks 24 in the normal operating mode, the sleep mode or the power down mode based on predicted memory bank accesses, as described in more detail hereafter. This helps limit the power consumption of the apparatus 1. The memory controller 22 may also be referred to as a memory interconnect. The processors 21, the memory controller 22 and the memory banks 24 may interact based on buses and/or signal paths.
In particular, the processing component 2 may comprise an address bus 25. A processor 21 may access an address in the plurality of memory banks based on specifying the address on the address bus 25. An address may specify which memory bank 24 is to be accessed, and a location within said memory bank 24. For example, an address is specified by a plurality of bits, with the most significant bits (MSB) indicating the memory bank 24 and the least significant bits indicating the location within the memory bank 24.
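As a concrete illustration of this addressing scheme, the following Python sketch decodes a bank index from the MSBs and an in-bank location from the LSBs. The widths (N = 16 banks, a 16-bit address) and the helper names are assumptions for illustration only; the patent does not fix them.

```python
N_BANKS = 16     # assumed number of memory banks (N)
BANK_BITS = 4    # log2(N_BANKS): MSBs that select the bank
ADDR_WIDTH = 16  # assumed total address width in bits

def bank_of(address: int) -> int:
    """Return the bank index encoded in the most significant bits."""
    return (address >> (ADDR_WIDTH - BANK_BITS)) & (N_BANKS - 1)

def offset_of(address: int) -> int:
    """Return the location within the bank (the least significant bits)."""
    return address & ((1 << (ADDR_WIDTH - BANK_BITS)) - 1)
```

With these assumed widths, address 0xF123 targets bank 15 at offset 0x123.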
The memory controller 22 may provide memory enable signals 29 for controlling at least one of the memory banks 24 in one of the normal operating mode, the sleep mode or the shutdown mode based on predicted memory bank accesses. This is described in more detail hereafter with reference to Figure 3.
The processing subsystem 20 may provide a program counter signal 26 to the memory controller 22. The program counter signal 26 provides an indication of the progress of execution of a set of instructions.
The memory controller 22 may provide a halt signal 27 to the processors 21. The halt signal 27 indicates that the processors 21 should wait before requesting further memory access.
A processor 21 may provide a read/write signal 28 to the memory banks 24. The read/write signal 28 indicates whether a memory access requested by a processor 21 is for reading or writing in the memory banks 24.
The processing component 2 may also comprise a data bus (not shown) for communication of data and/or instructions between the processing subsystem 20 and the memory banks 24.
The processing subsystem 20, memory controller 22, pool 23 of memory banks 24, buses and signal paths may be implemented based on integrated circuit technology, for example FPGA, ASIC, SoC...
Figure 3 is a flowchart of operations executed by the apparatus 1, in particular by the memory controller 22, for controlling the memory banks 24.
The processing subsystem 20 executes a set of instructions, and this involves accessing the memory banks 24. A memory access comprises providing an address on the address bus 25.
At operation 31, the memory controller 22 determines a past sequence of memory bank accesses based on a sequence of addresses on the address bus 25. For example, for successive addresses on the address bus 25, the memory controller 22 parses the most significant bits and determines the memory bank 24 corresponding to the address. The determined memory bank 24 may be represented by a one-hot encoded vector of size N: the N elements (b1, b2... bN) of the vector correspond to the N memory banks, and all elements are 0 except the element corresponding to the determined memory bank 24, which is equal to 1. A sequence of L memory bank accesses may be specified by L vectors of size N or an L×N matrix.
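The one-hot encoding of operation 31 can be sketched as follows; `one_hot` and `encode_sequence` are hypothetical helper names, not from the patent.

```python
def one_hot(bank: int, n_banks: int) -> list:
    """One-hot vector (b1, b2... bN): all zeros except the accessed bank."""
    v = [0] * n_banks
    v[bank] = 1
    return v

def encode_sequence(banks: list, n_banks: int) -> list:
    """Represent a sequence of L bank accesses as an L x N matrix
    (one one-hot row per access)."""
    return [one_hot(b, n_banks) for b in banks]
```

For instance, with N = 4 banks, the access sequence [0, 2] becomes the 2×4 matrix [[1, 0, 0, 0], [0, 0, 1, 0]].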
At operation 32, the memory controller 22 predicts a future sequence of memory bank accesses based on the past sequence of memory bank accesses. The future sequence may be of the same length L as the past sequence, or of a different length. The format of the predicted sequence may match the format of the past sequence, for example L one-hot encoded vectors of size N. Various sequence prediction techniques may be used. For example, the prediction may involve a recurrent neural network, e.g. an LSTM neural network, which uses sequences of memory bank accesses to progressively get trained. In some embodiments, the memory controller 22 also determines time differences between memory bank accesses, and the time differences are taken into account for the prediction. Examples are described in greater detail with reference to Figures 4 to 7.
At operation 33, the memory controller 22 controls the memory banks 24 in one of the normal operation mode, the sleep mode and the power-down mode based on the predicted sequence. The memory controller 22 may control the memory banks 24 based on providing memory enable signals 29 to the memory banks 24. For example, if the predicted sequence indicates that a given memory bank 24 will be accessed during the next L accesses, the memory controller 22 sets this memory bank in the normal operation mode, and if the predicted sequence indicates that a given memory bank 24 will not be accessed during the next L accesses, the memory controller 22 sets this memory bank in the sleep mode.
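The control rule of operation 33 can be sketched as a simple policy. The mode names and the helper function are illustrative only; a hardware implementation would drive the memory enable signals 29 instead of returning strings.

```python
def bank_modes(predicted_banks: list, n_banks: int) -> list:
    """Choose the normal mode for banks predicted to be accessed within
    the next L accesses, and the sleep mode (data retained) otherwise."""
    upcoming = set(predicted_banks)
    return ["NORMAL" if b in upcoming else "SLEEP" for b in range(n_banks)]
```

For example, if the predicted sequence touches only banks 1 and 3 out of 4, banks 0 and 2 are put to sleep.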
Operations 31, 32 and/or 33 may be executed at least partially in parallel. For example, the memory controller 22 may start to predict the future sequence of memory bank accesses before the past sequence has been fully obtained.
Figure 4 is a block diagram of the memory controller 22 according to some embodiments.
The memory controller 22 comprises a sequence determination module 41, a time differentiation module 42, a prediction module 43 and a control module 44.
The sequence determination module 41 parses the address specified on the address bus 25, for example the most significant bits, and determines the memory bank 24 corresponding to the address. As explained before, the determined memory bank 24 may be represented by a one-hot encoded vector of size N (bi, b2... bN). This operation is repeated for successive addresses.
The time differentiation module 42 determines the time difference between accesses, Δt. More specifically, for a given memory bank 24, the memory controller 22 determines Δt = t_access − t_access_prev, where t_access is the current time when the memory bank 24 is accessed and t_access_prev is the time when the memory bank 24 was last accessed. The time difference Δt may be expressed in clock cycles and determined, for example, based on a local clock cycle counter in the memory controller 22 or on the program counter signal 26 if available, and based on registers which store the value of t_access_prev for each memory bank 24.
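The per-bank bookkeeping described above can be sketched as follows. The class name is assumed; the cycle count would come from the controller's local clock cycle counter or the program counter signal.

```python
class DeltaTracker:
    """Track, per memory bank, a register holding the last access time,
    and compute dt = t_access - t_access_prev on each new access."""

    def __init__(self, n_banks: int):
        self.last_access = [None] * n_banks  # t_access_prev per bank

    def record(self, bank: int, cycle: int):
        """Record an access at the given clock cycle; return dt, or None
        if this is the first recorded access to the bank."""
        prev = self.last_access[bank]
        self.last_access[bank] = cycle
        return None if prev is None else cycle - prev
```

For example, accesses to bank 0 at cycles 5 and 12 yield Δt = 7.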
In a typical use case, the memory accesses in the applications are done in a periodic manner. Thus, the number of unique Δt values is much smaller than the number of unique sequential memory access patterns. For example, depending on the application, Δt may take a number of discrete values between 0 and a maximum of, say, 100 or so. Hence, it is possible to predict these Δt values with a predictor with categorical output (i.e. as a one-hot vector). In some embodiments, bank access enables (b1, b2... bN) and one-hot encoded Δt values are concatenated for L time steps and fed to the prediction module 43.
The input of the prediction module 43 is a sequence of L input vectors, wherein an input vector is the concatenation of a bank access enable vector (b1, b2... bN) and a one-hot encoded Δt value. The prediction module 43 applies a sequence prediction technique to determine an output sequence of L output vectors, wherein an output vector is the concatenation of a predicted bank access enable vector (b1, b2... bN) and a one-hot encoded Δt value.
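The concatenation can be sketched as follows. The clamp of Δt to a maximum category is an assumption consistent with the bounded range mentioned above; the function name is illustrative.

```python
def input_vector(bank: int, dt: int, n_banks: int, dt_max: int) -> list:
    """Concatenate a one-hot bank access vector with a one-hot dt vector."""
    dt = min(dt, dt_max)              # clamp dt into the categorical range
    v = [0] * (n_banks + dt_max + 1)  # N bank slots + (dt_max + 1) dt slots
    v[bank] = 1                       # bank part: positions 0 .. N-1
    v[n_banks + dt] = 1               # dt part: positions N .. N + dt_max
    return v
```

With N = 4 banks and Δt categories 0..3, an access to bank 1 with Δt = 2 is encoded as [0, 1, 0, 0, 0, 0, 1, 0].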
The prediction module 43 may be implemented based on a machine learning model, for example a neural network. In some embodiments, the prediction module 43 comprises one or more LSTM units followed by a fully connected neural network. The prediction module 43 may provide categorical output, e.g. the predicted memory bank 24 and Δt value for successive time steps, both encoded as one-hot vectors.
The memory controller 22 may train the prediction module 43 based on a comparison of input sequences and corresponding output sequences. For example, two loss functions (categorical cross-entropy) are applied to each of the two groups of outputs ((b1, b2... bN) and Δt values) and optimized using stochastic gradient descent. The training may be online or offline. Online training involves regularly updating the machine learning model during operation, which allows it to adapt to a changing context, for example a new or updated application being executed.
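The two-headed loss described above can be sketched as follows; the probability values are made-up placeholders standing in for the softmax outputs of the two heads.

```python
import math

def categorical_cross_entropy(probs, target_idx):
    """Cross-entropy loss for a one-hot (categorical) target: -log p(target)."""
    return -math.log(probs[target_idx])

# Total loss = bank-prediction loss + dt-prediction loss, as described above.
bank_probs = [0.7, 0.1, 0.1, 0.1]  # placeholder softmax output over N=4 banks
dt_probs = [0.2, 0.6, 0.2]         # placeholder softmax output over 3 dt bins
loss = (categorical_cross_entropy(bank_probs, 0)
        + categorical_cross_entropy(dt_probs, 1))
```

Stochastic gradient descent would then update the model parameters to reduce this summed loss over observed sequences.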
The control module 44 determines memory enable signals 29 based on the output sequence provided by the prediction module 43. For example, if the predicted sequence indicates that a given memory bank 24 will be accessed during the next L accesses, the memory controller 22 sets this memory bank in the normal operation mode, and if the predicted sequence indicates that a given memory bank 24 will not be accessed during the next L accesses, the memory controller 22 sets this memory bank in the sleep mode.
Figure 5 is a flowchart of a method according to embodiments, which may be executed with the memory controller 22 of Figure 4.
At operation 51, in an initial state, the processing subsystem 20 starts execution and the memory banks 24 are in the normal operating mode.
At operation 52, the memory controller 22 continuously reads the address bus, the program counter signal 26 if available, and determines the bank accesses and Δt values to learn the dynamic context of the memory access pattern. For example, parameters of a neural network are updated based on a comparison between a past sequence of memory bank accesses and a corresponding predicted sequence of memory bank accesses.
At operation 53, the memory controller 22 controls the memory banks 24 in one of the normal, sleep and shutdown modes. For example, in some embodiments, the memory controller 22 determines the memory bank 24 to be accessed next based on the predicted sequence and sends a sleep or shutdown command to the other memory banks 24. In other embodiments, the memory controller 22 may set one or more memory banks 24 in the normal active mode and others in the sleep or shutdown mode, based on an overview of the predicted accesses specified by the predicted (b1, b2... bN) and Δt values.
Operations 52 and 53 are normally repeated. However, when the memory controller 22 determines that an address on the address bus 25 corresponds to a memory bank 24 in the sleep or shutdown mode, meaning that the prediction was wrong, the memory controller 22 issues a halt command to the processing subsystem 20 at operation 54 and switches the memory bank 24 to the normal operation mode.
In that case, at operation 55, the processor execution is halted for a predetermined time, so that the memory bank 24 comes out of the sleep or shutdown mode and is ready to be accessed.
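The misprediction path of operations 53 to 55 can be sketched as follows. This is a simplified Python model, assuming a hypothetical fixed wake-up latency; the patent only states that execution is halted for a predetermined time until the bank is ready.

```python
def access_bank(bank, modes, wake_latency_cycles=10):
    """Sketch of operations 53-55: if the addressed bank was predicted
    idle and put into a low power mode, halt the processor while the
    bank returns to the normal operation mode.

    modes: mutable list of per-bank modes ("normal", "sleep", "shutdown").
    Returns the number of cycles the processor is halted (0 on a hit)."""
    halted = 0
    if modes[bank] != "normal":       # misprediction detected (operation 54)
        modes[bank] = "normal"        # wake the bank
        halted = wake_latency_cycles  # operation 55: halt for a fixed time
    return halted
```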
In the apparatus 1, in particular in the processing component 2, the memory bank accesses are predicted based on past memory bank accesses. This allows controlling the memory banks 24 in a low power operating mode when possible, which reduces the power consumption of the processing component 2. Since the memory bank accesses are predicted based on past memory bank accesses, the applications executed by the processing subsystem 20 do not need to include predetermined instructions for controlling the memory banks 24 in one of the operating modes. Moreover, the prediction module 43 may be self-trained based on a comparison of input sequences and output sequences. Accordingly, the prediction module 43 may adapt itself to various applications executed by the processing component 2 and may update itself when the applications change.
In an example embodiment, the processing subsystem 20 executes a low-density parity-check algorithm for a channel coding application, the number of memory banks 24 is N=16, the time difference Δt is encoded as 76 one-hot vectors and the size of the sequence is L=20. The memory access patterns during decoding of 12 different LDPC check matrix sizes were memorized by the prediction module 43 with an accuracy of 98% to 100% in less than 40 epochs.
In other words, after executing a full assembly code 40 times, the prediction module 43 remembers and accurately predicts the next 20 banks to be accessed, given the last 20 accesses, with an accuracy of at least 98%. This in turn results in at least 30% savings in power consumption, as the banks not used in the next 20 time steps can be kept in sleep mode.
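A back-of-envelope check of a savings figure of this kind can be made with a simple per-bank power model. The numbers below are illustrative assumptions, not values from the patent: with 16 banks, half asleep at one tenth of the active power, the saving is 45%.

```python
def power_savings(n_banks, awake_banks, p_active, p_sleep):
    """Fractional power saving versus keeping all banks active, under an
    assumed per-bank power model (illustrative numbers only)."""
    baseline = n_banks * p_active
    predicted = awake_banks * p_active + (n_banks - awake_banks) * p_sleep
    return 1 - predicted / baseline

# 16 banks, 8 kept awake, sleep power = 10% of active power.
saving = power_savings(16, 8, 1.0, 0.1)  # → 0.45
```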
If the application code changes, the prediction module 43 can retrain and re-learn the new access pattern in place.
Figure 6 is a block diagram of a neural network 60 which can be used to implement the prediction module 43.
The neural network 60 comprises an LSTM core 61, a fully connected part 62 and an output part 63.
The LSTM core 61 takes the sequence of input vectors as input and provides its output to the fully connected part 62. In some embodiments, the LSTM core 61 comprises 10 hidden nodes.
The fully connected part 62 comprises a plurality of fully connected layers, for example a first layer of 64 nodes and a second layer of 20 nodes. The fully connected part 62 provides its output to the output part 63.
The output part 63 comprises a layer with softmax activation. It is composed of two parts, one to predict the memory bank access and one to predict the Δt.
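A forward pass through a topology like that of Figure 6 can be sketched in plain numpy. Layer sizes follow the text (10 LSTM hidden nodes, fully connected layers of 64 and 20 nodes, two softmax heads for N=16 banks and 76 Δt bins); the input encoding (one-hot bank id concatenated with one-hot Δt), the ReLU activations in the fully connected part and the randomly initialised weights standing in for trained parameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N_BANKS, DT_BINS, L_SEQ, HIDDEN = 16, 76, 20, 10
IN_DIM = N_BANKS + DT_BINS  # assumed input: one-hot bank id + one-hot dt

def lstm_forward(x_seq, Wx, Wh, b):
    """Single-layer LSTM, forward pass only, returning the final hidden state."""
    h, c = np.zeros(HIDDEN), np.zeros(HIDDEN)
    for x in x_seq:
        z = Wx @ x + Wh @ h + b
        i, f, g, o = np.split(z, 4)           # input, forget, cell, output gates
        i, f, o = (1 / (1 + np.exp(-v)) for v in (i, f, o))
        g = np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def dense(x, W, b, relu=False):
    y = W @ x + b
    return np.maximum(y, 0.0) if relu else y

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Randomly initialised weights stand in for trained parameters.
Wx = rng.normal(0, 0.1, (4 * HIDDEN, IN_DIM))
Wh = rng.normal(0, 0.1, (4 * HIDDEN, HIDDEN))
b = np.zeros(4 * HIDDEN)
W1, b1 = rng.normal(0, 0.1, (64, HIDDEN)), np.zeros(64)
W2, b2 = rng.normal(0, 0.1, (20, 64)), np.zeros(20)
Wb, bb = rng.normal(0, 0.1, (N_BANKS, 20)), np.zeros(N_BANKS)
Wd, bd = rng.normal(0, 0.1, (DT_BINS, 20)), np.zeros(DT_BINS)

# A random input sequence of L one-hot (bank, dt) pairs.
x_seq = [np.concatenate([np.eye(N_BANKS)[rng.integers(N_BANKS)],
                         np.eye(DT_BINS)[rng.integers(DT_BINS)]])
         for _ in range(L_SEQ)]

h = lstm_forward(x_seq, Wx, Wh, b)
hid = dense(dense(h, W1, b1, relu=True), W2, b2, relu=True)
bank_probs = softmax(dense(hid, Wb, bb))  # head 1: next memory bank access
dt_probs = softmax(dense(hid, Wd, bd))    # head 2: time difference dt
```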
Figure 7 is a block diagram of a neural network 70 which can be used to implement the prediction module 43.
The neural network 70 comprises an LSTM part 71, a fully connected part 72 and an output part 73, and may be seen as a variant of the neural network 60 which can be used for more complex and/or larger patterns.
In comparison with the neural network 60, the neural network 70 comprises an LSTM part 71 with an ensemble network composed of LSTM cores, each containing 10 LSTM hidden nodes. Each LSTM core trains on part of the total address space.
The following feed-forward part 72 learns to switch between the predictions of each core.
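One plausible reading of this switching is a soft gating combination: the feed-forward part scores each core and the ensemble output is the weighted sum of the per-core predictions. The exact switching mechanism is not detailed in the text, so the softmax gating below is an assumption for illustration.

```python
import numpy as np

def gated_ensemble(core_preds, gate_logits):
    """core_preds: (K, N) softmax outputs of K LSTM cores, each trained on
    part of the address space. gate_logits: (K,) scores from the
    feed-forward part. Returns the (N,) combined bank-access prediction."""
    g = np.exp(gate_logits - gate_logits.max())
    g = g / g.sum()            # soft selection weight per core
    return g @ core_preds      # weighted sum of core predictions

preds = np.array([[0.9, 0.1],  # core 0's prediction over 2 banks
                  [0.2, 0.8]]) # core 1's prediction
gate = np.array([2.0, -2.0])   # gate strongly favours core 0
combined = gated_ensemble(preds, gate)
```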
In the context of this description, a machine learning model is a function for outputting an output based on an input, which depends on trainable parameters. An example of a machine learning model is a neural network, with weights and biases as its parameters. Training the machine learning model is the task of determining the parameters of the model based on training data.
It should be noted that although examples of methods have been described with a specific order of steps, this does not exclude other implementations. In particular, the described steps may be executed in another order, partially or totally in parallel.
It is to be remarked that the functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared, for example in a cloud computing architecture. Moreover, explicit use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage. Other hardware, conventional and/or custom, may also be included. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
It should be further appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
While the principles of the invention have been described above in connection with specific embodiments, it is to be clearly understood that this description is made only by way of example and not as a limitation on the scope of the invention, as defined in the appended claims.

Claims (15)

  1. Apparatus (1) comprising at least one processor (21), a plurality of memory banks (24) and an address bus (25), wherein the at least one processor is configured for accessing an address in said plurality of memory banks based on specifying said address on the address bus, wherein at least one of said memory banks is operable in at least a first mode and a second mode, wherein the power consumption of the memory banks in the second mode is lower than in the first mode, the apparatus further comprising means configured for:
- determining (31) a past sequence of memory bank accesses based on a sequence of addresses on said address bus,
- predicting (32) a future sequence of memory bank accesses based on said past sequence,
- controlling (33) at least one of the memory banks in the first mode or the second mode based on the future sequence.
  2. Apparatus according to claim 1, wherein said means are configured for predicting a future sequence of memory bank accesses based on a recurrent neural network having an input sequence specifying at least the past sequence of memory bank accesses and providing an output sequence specifying at least the future sequence of memory bank accesses.
  3. Apparatus according to claim 2, wherein the recurrent neural network comprises at least one LSTM core.
  4. Apparatus according to claim 3, wherein the recurrent neural network comprises a plurality of LSTM cores associated with respective parts of an input vector of the input sequence, and a fully connected part.
  5. Apparatus according to one of claims 2 to 4, wherein said means are configured for training said recurrent neural network based on a comparison between input sequences and output sequences.
  6. Apparatus according to one of claims 2 to 5, wherein said means are further configured for determining, for respective memory banks, a time difference between two successive memory bank accesses, wherein the input sequence specifies said time differences.
  7. Apparatus according to one of claims 1 to 6, wherein controlling at least one of the memory banks in the first mode or the second mode based on the future sequence comprises setting a memory bank in the first mode in response to determining that said memory bank will be accessed within a given time.
  8. Apparatus according to one of claims 1 to 7, wherein controlling at least one of the memory banks in the first mode or the second mode based on the future sequence comprises setting a memory bank in the second mode in response to determining that said memory bank will not be accessed within a given time.
  9. Apparatus according to one of claims 2 to 5, wherein said means are further configured for controlling the processor to halt execution in response to determining that an address on the address bus corresponds to a memory bank in the second operating mode.
  10. Apparatus according to one of claims 1 to 9, comprising an integrated circuit (2) including said at least one processor, the plurality of memory banks, the address bus and said means.
  11. A method executed by an apparatus comprising at least one processor, a plurality of memory banks and an address bus, wherein the at least one processor is configured for accessing an address in said plurality of memory banks based on specifying said address on the address bus, wherein at least one of said memory banks is operable in at least a first mode and a second mode, wherein the power consumption of the memory banks in the second mode is lower than in the first mode, the method comprising:
- determining a past sequence of memory bank accesses based on a sequence of addresses on said address bus,
- predicting a future sequence of memory bank accesses based on said past sequence,
- controlling at least one of the memory banks in the first mode or the second mode based on the future sequence.
  12. Method according to claim 11, comprising predicting a future sequence of memory bank accesses based on a recurrent neural network having an input sequence specifying at least the past sequence of memory bank accesses and providing an output sequence specifying at least the future sequence of memory bank accesses.
  13. Method according to claim 12, comprising determining, for respective memory banks, a time difference between two successive memory bank accesses, wherein the input sequence specifies said time differences.
  14. Method according to one of claims 11 to 13, wherein controlling at least one of the memory banks in the first mode or the second mode based on the future sequence comprises setting a memory bank in the first mode in response to determining that said memory bank will be accessed within a given time.
  15. Method according to one of claims 11 to 14, wherein controlling at least one of the memory banks in the first mode or the second mode based on the future sequence comprises setting a memory bank in the second mode in response to determining that said memory bank will not be accessed within a given time.
GB1918459.7A 2019-12-16 2019-12-16 Method and apparatus for control of memory banks Withdrawn GB2590389A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1918459.7A GB2590389A (en) 2019-12-16 2019-12-16 Method and apparatus for control of memory banks


Publications (2)

Publication Number Publication Date
GB201918459D0 GB201918459D0 (en) 2020-01-29
GB2590389A true GB2590389A (en) 2021-06-30

Family

ID=69186736


Country Status (1)

Country Link
GB (1) GB2590389A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180150123A1 (en) * 2016-11-28 2018-05-31 Qualcomm Incorporated Wifi memory power minimization
US20190339892A1 (en) * 2018-05-03 2019-11-07 Mediatek Inc. Memory management system and memory management method for dynamic memory management
US20190370632A1 (en) * 2018-05-31 2019-12-05 Google Llc Computer system prediction machine learning models


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240111437A1 (en) * 2022-09-30 2024-04-04 Silicon Laboratories Inc. Memory allocation based on lifespan
US12346577B2 (en) * 2022-09-30 2025-07-01 Silicon Laboratories Inc. Memory allocation based on lifespan



Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)