
TW202526608A - Data reading method for chip, chip, computer equipment and computer-readable storage medium - Google Patents

Data reading method for chip, chip, computer equipment and computer-readable storage medium

Info

Publication number
TW202526608A
Authority
TW
Taiwan
Prior art keywords
read
data
request
prefetch
cache
Prior art date
Application number
TW113149884A
Other languages
Chinese (zh)
Inventor
The inventor has waived the right to be named
Original Assignee
大陸商摩爾線程智能科技(北京)股份有限公司
Priority date
Filing date
Publication date
Application filed by 大陸商摩爾線程智能科技(北京)股份有限公司
Publication of TW202526608A


Abstract

Embodiments of the disclosure provide a data reading method for a chip, a chip, a computer device, and a computer-readable storage medium. The method includes: acquiring a read data instruction, where the read data instruction instructs data reading according to read information and data prefetching according to prefetch information; acquiring a first read data request according to the read data instruction, where the first read data request contains the read information and the prefetch information; sending the first read data request to a cache module, so that the cache module, in response to the first read data request, reads and returns first target data according to the read information and prefetches and caches second target data according to the prefetch information; and receiving the first target data returned by the cache module. According to the embodiments of the present disclosure, the data request pressure between the request source and the cache module can be reduced, the request bandwidth can be compressed, processor performance can be improved, and the prefetch mode can be flexibly controlled task by task to better meet the data reading needs of each task.

Description

Data reading method for a chip, chip, computer device, and computer-readable storage medium

[Related Application]

The present invention is based on, and claims priority to, Chinese patent application No. 202311809260.9, filed on December 26, 2023 and entitled "Data reading method for chip, chip, computer device, and storage medium"; the entire contents of that Chinese patent application are incorporated herein by reference.

The present invention relates to, but is not limited to, the field of computer technology, and more particularly to a data reading method for a chip, a chip, a computer device, and a computer-readable storage medium.

A cache module (cache) is usually built into a processor so that the speed of reading and writing data can keep up with the computation speed of the processor's computing modules, thereby improving system performance. In the related art, most processors adopt data prefetching: future memory accesses are predicted, access requests for the corresponding memory blocks are issued before the explicit accesses occur, and the data in those memory blocks is fetched into the cache module in advance, thereby improving the cache hit rate. However, the data prefetching schemes adopted in the related art increase the data request pressure between the request source and the cache module and interfere with other normal memory access requests, and in some of these schemes it is also difficult to control prefetching flexibly.

In view of this, embodiments of the present invention provide at least a data reading method for a chip, a chip, a computer device, and a computer-readable storage medium.

The technical solutions of the embodiments of the present invention are implemented as follows. An embodiment of the present invention provides a data reading method for a chip, the method including: acquiring a read data instruction, where the read data instruction instructs data reading according to read information and data prefetching according to prefetch information; acquiring a first read data request according to the read data instruction, where the first read data request contains the read information and the prefetch information; sending the first read data request to a cache module, so that the cache module, in response to the first read data request, reads and returns first target data according to the read information and prefetches and caches second target data according to the prefetch information; and receiving the first target data returned by the cache module.

An embodiment of the present invention provides a data reading method for a chip, the method including: receiving a first read data request sent by a request source, where the first read data request requests data reading according to read information and data prefetching according to prefetch information; acquiring the read information and the prefetch information according to the first read data request; based on the read information, reading first target data from the storage space of a cache module and/or from memory, and returning the first target data to the request source; and based on the prefetch information, prefetching second target data from the memory and caching the second target data into at least one cache unit of the cache module.

An embodiment of the present invention provides a chip, the chip including: a computing module configured to: acquire a read data instruction, where the read data instruction instructs data reading according to read information and data prefetching according to prefetch information; acquire a first read data request according to the read data instruction, where the first read data request contains the read information and the prefetch information; send the first read data request to a cache module, so that the cache module, in response to the first read data request, reads and returns first target data according to the read information and prefetches and caches second target data according to the prefetch information; and receive the first target data returned by the cache module; and a cache module configured to: receive the first read data request sent by the computing module, where the first read data request requests data reading according to the read information and data prefetching according to the prefetch information; acquire the read information and the prefetch information according to the first read data request; based on the read information, read the first target data from the storage space of the cache module and/or from memory, and return the first target data to the computing module; and based on the prefetch information, prefetch the second target data from the memory and cache the second target data into at least one cache unit of the cache module.

An embodiment of the present invention provides a computer device, including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor, when executing the program, implements some or all of the steps of the above methods, or the processor includes the above chip.

An embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, some or all of the steps of the above methods are implemented.

In the embodiments of the present invention, a read data instruction is acquired, where the read data instruction instructs data reading according to read information and data prefetching according to prefetch information; a first read data request is acquired according to the read data instruction, where the first read data request contains the read information and the prefetch information; the first read data request is sent to a cache module, so that the cache module, in response to the first read data request, reads and returns first target data according to the read information and prefetches and caches second target data according to the prefetch information; and the first target data returned by the cache module is received. In this way, the request source of a read data request can send the prefetch information to the cache module along with the read data request, so that data reading and data prefetching are accomplished with a single read data request. On the one hand, this reduces the data request pressure between the request source and the cache module, compresses the request bandwidth, and improves processor performance; on the other hand, the prefetch information contained in the read data request can be used to control data prefetching, which makes it easy to flexibly control the prefetching mode task by task and thus better meet the data reading needs of each task.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the technical solutions of the present invention.

To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The described embodiments should not be regarded as limiting the present invention; all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.

In the following description, reference is made to "some embodiments", which describe a subset of all possible embodiments. It should be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with one another where no conflict arises. The terms "first/second/third" are used only to distinguish similar items and do not imply any particular ordering of those items; it should be understood that, where permitted, the specific order or sequence of "first/second/third" may be interchanged, so that the embodiments of the present invention described herein can be implemented in an order other than that illustrated or described herein.

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the art to which the present invention belongs. The terms used herein are intended only to describe the present invention and are not intended to limit it.

An embodiment of the present invention provides a data reading method that can be applied in a chip and can be executed, for example, by a cache module in the chip. The chip may be, for example, at least one of a central processing unit (CPU) chip, a graphics processing unit (GPU) chip, a data processing unit (DPU) chip, and the like. The chip can be used in any suitable computer device, where a computer device may be a server, a laptop, a tablet, a desktop computer, a smart TV, a set-top box, a mobile device (for example, a mobile phone, a portable video player, a personal digital assistant, a dedicated messaging device, or a portable gaming device), or another device with data processing capability.

FIG. 1 is a schematic flowchart of a data reading method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps S101 to S104. Step S101: receive a first read data request sent by a request source, where the first read data request requests data reading according to read information and data prefetching according to prefetch information.

Step S102: acquire the read information and the prefetch information according to the first read data request.

Here, the request source may include, but is not limited to, at least one of a computing module in the chip, an upstream cache module of the cache module that is currently to perform the data reading, and the like.

In some embodiments, the first read data request may be received by a particular cache module in the chip (referred to as the current cache module), and the request source may send at least one first read data request to the current cache module as needed. For example, if the request source is a computing module, the computing module may generate a first read data request after parsing a read data instruction in a task and send the first read data request to the cache module. As another example, if the request source is an upstream cache module of the current cache module, the upstream cache module may, when the cache unit corresponding to an upstream read data request it has received is missing, send at least one first read data request to the current cache module according to that upstream read data request.

In some embodiments, the first read data request may contain read information and prefetch information. After the first read data request sent by the request source is received, the first read data request may be parsed to obtain the read information and the prefetch information corresponding to the first read data request. The read information may include any suitable information that indicates how data is to be read; for example, the read information may include, but is not limited to, at least one of a first read request address, a first read data length, and the like. The first read request address is the starting address of the data requested to be read by the first read data request, and the first read data length is the length of the data requested to be read. The prefetch information may include any suitable information that indicates how data is to be prefetched; for example, the prefetch information may include, but is not limited to, at least one of whether to perform data prefetching, a prefetch data length, and the like.
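For illustration, the following is a minimal Python sketch of how such a request and its two groups of fields might be modeled; the class and field names (FirstReadRequest, read_addr, prefetch_len, and so on) are assumptions introduced here rather than names defined by this disclosure, and byte-granular fields are assumed for simplicity.

```python
from dataclasses import dataclass

@dataclass
class FirstReadRequest:
    # Read information: where the requested data starts and how long it is.
    read_addr: int          # first read request address
    read_len: int           # first read data length
    # Prefetch information: whether to prefetch and how much.
    prefetch_enable: bool = False
    prefetch_len: int = 0   # prefetch data length; 0 means no prefetching

def parse_first_read_request(req: FirstReadRequest):
    """Split a first read data request into its read information and prefetch information."""
    read_info = (req.read_addr, req.read_len)
    prefetch_info = (req.prefetch_enable, req.prefetch_len)
    return read_info, prefetch_info
```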

In implementation, the first read data request may be parsed in any suitable manner according to the actual situation, which is not limited in the embodiments of the present invention.

In some implementations, the first read data request may include a read request address signal, a read data length signal, and a prefetch control signal. Parsing the read request address signal yields the first read request address, parsing the read data length signal yields the first read data length, and parsing the prefetch control signal yields the prefetch information. In implementation, correspondences between various prefetch control signals and prefetch information may be set in advance, and the prefetch information corresponding to the prefetch control signal in the first read data request may be determined based on these correspondences. Those skilled in the art may set suitable correspondences between prefetch control signals and prefetch information according to actual needs, which is not limited in the embodiments of the present invention.

In some implementations, the prefetch information includes a prefetch switch state indicating whether data prefetching is to be performed, and the prefetch switch state may be indicated by the value of the prefetch control signal. For example, when the prefetch control signal is 0'b0, the prefetch switch state may be determined to be the off state, indicating that no data prefetching is performed; when the prefetch control signal is 0'b1, the prefetch switch state may be determined to be the on state, indicating that data prefetching is performed. When the prefetch switch state is the on state, the prefetch data length may be a default value or may be controlled by another signal, which is not limited here.

In some implementations, the prefetch information includes a prefetch data length, which may be indicated by the value of the prefetch control signal. For example, when the prefetch control signal is 0'b0, the prefetch data length may be determined to be 0, and when the prefetch control signal is 0'b1, the prefetch data length may be determined to be 1 cache unit. As another example, when the prefetch control signal is 0'b00, the prefetch data length may be determined to be 0; when it is 0'b01, 1 cache unit; when it is 0'b10, 2 cache units; and when it is 0'b11, 3 cache units.
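As a hedged sketch of the two example encodings above (a 1-bit and a 2-bit prefetch control signal), the following Python helpers map a control-signal value to a prefetch data length in cache units; the function names are illustrative, and the mappings are only the examples given in the text, not a normative encoding.

```python
def decode_prefetch_ctrl_1bit(ctrl: int) -> int:
    """1-bit prefetch control signal: 0'b0 -> no prefetch, 0'b1 -> one cache unit."""
    return 1 if ctrl == 0b1 else 0

def decode_prefetch_ctrl_2bit(ctrl: int) -> int:
    """2-bit prefetch control signal: 0'b00..0'b11 -> 0..3 cache units."""
    if not 0 <= ctrl <= 0b11:
        raise ValueError("prefetch control signal must be a 2-bit value")
    return ctrl  # in this example mapping, the length equals the signal value

# Example: a request carrying ctrl = 0'b10 asks for two cache units to be prefetched.
assert decode_prefetch_ctrl_2bit(0b10) == 2
```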

In some implementations, the first read data request may include a request type signal, a read request address signal, and a read data length signal. Parsing the read request address signal yields the first read request address, parsing the read data length signal yields the first read data length, and parsing the request type signal yields the prefetch information. In implementation, multiple request type signals may be defined in advance, and the request types indicated by different request type signals may correspond to different prefetch information, so that the corresponding prefetch information can be determined based on the request type signal in the first read data request. Those skilled in the art may define suitable request type signals, and the correspondence between the request type indicated by each request type signal and the prefetch information, according to actual needs, which is not limited in the embodiments of the present invention.

In some implementations, the prefetch information includes a prefetch switch state indicating whether data prefetching is to be performed. The request type may be indicated by the value of the request type signal, and the prefetch switch state may be determined according to the request type. For example, when the request type signal is 0'b101, the request type may be determined to be a read-only type, i.e., data is read but not prefetched, so the prefetch switch state is the off state indicating that no data prefetching is performed; when the request type signal is 0'b110, the request type may be determined to be a read-with-prefetch type, i.e., data is both read and prefetched, so the prefetch switch state is the on state indicating that data prefetching is performed.

In some implementations, the prefetch information includes a prefetch data length. The request type may be indicated by the value of the request type signal, and the prefetch data length may be determined according to the request type. For example, when the request type signal is 0'b101, the request type may be determined to be a read-only type, so the prefetch data length is 0; when the request type signal is 0'b110, the request type may be determined to be a read-and-prefetch-1-cache-unit type, so the prefetch data length is 1 cache unit; and when the request type signal is 0'b111, the request type may be determined to be a read-and-prefetch-4-cache-units type, so the prefetch data length is 4 cache units.
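The request-type variant can be sketched the same way; the constants below reproduce only the example encodings named in the text (0'b101, 0'b110, 0'b111), and any other encoding would simply change the lookup table.

```python
# Example request-type encodings taken from the text; real encodings are implementation-defined.
REQ_READ_ONLY       = 0b101  # read only, no prefetch
REQ_READ_PREFETCH_1 = 0b110  # read and prefetch 1 cache unit
REQ_READ_PREFETCH_4 = 0b111  # read and prefetch 4 cache units

PREFETCH_LEN_BY_TYPE = {
    REQ_READ_ONLY:       0,
    REQ_READ_PREFETCH_1: 1,
    REQ_READ_PREFETCH_4: 4,
}

def prefetch_len_from_request_type(req_type: int) -> int:
    """Map a request type signal to a prefetch data length in cache units."""
    return PREFETCH_LEN_BY_TYPE[req_type]

assert prefetch_len_from_request_type(REQ_READ_PREFETCH_4) == 4
```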

Step S103: based on the read information, read first target data from the storage space of the cache module and/or from memory, and return the first target data to the request source.

Here, cache unit hit detection may be performed in the storage space of the cache module based on the read information. If the cache unit corresponding to the read information hits, the first target data is obtained from that cache unit; if the cache unit corresponding to the read information is missing, the first target data is obtained from memory based on the read information.

In some embodiments, when the read information includes the first read request address and the first read data length, cache unit hit detection may be performed in the storage space of the cache module based on the first read request address and the first read data length. If the cache unit corresponding to the first read request address and the first read data length hits, the first target data is obtained from that cache unit; if the cache unit corresponding to the first read request address and the first read data length is missing, the first target data is obtained from memory based on the first read request address and the first read data length.
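A minimal behavioral sketch of this hit-or-miss read path is shown below, assuming a 64-byte cache unit, a dictionary keyed by aligned addresses as the cache storage space, and a flat byte array as the memory; all of these modeling choices are assumptions made only for illustration.

```python
CACHE_UNIT = 64                 # assumed cache unit (line) size in bytes
cache_store = {}                # aligned address -> bytes of one cached unit
MEMORY = bytes(1 << 20)         # zero-filled 1 MiB model of the backing memory

def handle_read(read_addr: int, read_len: int) -> bytes:
    """Serve a read: from the hit cache unit if present, otherwise from memory."""
    tag = (read_addr // CACHE_UNIT) * CACHE_UNIT
    if tag in cache_store and read_addr + read_len <= tag + CACHE_UNIT:
        # Cache unit hit: the first target data comes out of the cached unit.
        offset = read_addr - tag
        return cache_store[tag][offset:offset + read_len]
    # Cache unit miss: read the first target data from memory using address and length.
    return MEMORY[read_addr:read_addr + read_len]
```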

Step S104: based on the prefetch information, prefetch second target data from the memory, and cache the second target data into at least one cache unit of the cache module.

Here, a prefetch start address and a prefetch data length may be determined based on the prefetch information, so that the second target data can be prefetched from memory based on the prefetch start address and the prefetch data length and cached into at least one cache unit of the cache module. A cache unit may include, but is not limited to, a cache block, a cache sector, and the like.

In some implementations, a prefetch start address and a prefetch data length may be determined based on the prefetch information, and cache unit hit detection may be performed in the storage space of the cache module based on the prefetch start address and the prefetch data length. If the cache unit corresponding to the prefetch start address and the prefetch data length is missing, the second target data is prefetched from memory and cached into at least one cache unit of the cache module; if the cache unit corresponding to the prefetch start address and the prefetch data length hits, the second target data does not need to be prefetched from memory.

It can be understood that the first target data is the data requested to be read by the first read data request, and the second target data is the data requested to be prefetched by the first read data request. During the current request, the first target data is returned to the request source, and the second target data is cached in at least one cache unit of the cache module. If a subsequent read data request requests the second target data, the cached second target data can be read directly from the cache module. This improves the cache hit rate and therefore the efficiency of data reading.

In the embodiments of the present invention, a first read data request sent by a request source is received, where the first read data request requests data reading according to read information and data prefetching according to prefetch information; the read information and the prefetch information are acquired according to the first read data request; based on the read information, first target data is read from the storage space of the cache module and/or from memory, and the first target data is returned to the request source; and based on the prefetch information, second target data is prefetched from the memory and cached into at least one cache unit of the cache module. In this way, the request source can send the prefetch information to the cache module along with the read data request, so that data reading and data prefetching are accomplished with a single read data request. On the one hand, this reduces the data request pressure between the request source and the cache module, compresses the request bandwidth, and improves processor performance; on the other hand, the prefetch information obtained from the read data request can be used to control data prefetching, which makes it easy to flexibly control the prefetching mode task by task and thus better meet the data reading needs of each task.

In some embodiments, the above steps S101 to S104 may be executed by a cache module in the chip; the first target data includes at least one piece of first sub-target data, and the cache module includes a first cache sub-module and a second cache sub-module.

The above step S101 may include the following step S111. Step S111: the first cache sub-module receives a first read data request sent by a request source.

The above step S102 may include the following step S112. Step S112: the first cache sub-module acquires the read information and the prefetch information according to the first read data request, where the read information includes a first read request address and a first read data length.

The above step S103 may include the following steps S113 and S114. Step S113: the first cache sub-module determines at least one read data sub-request based on the first read request address and the first read data length, where the data request length corresponding to each read data sub-request is equal to the cache unit length of the first cache sub-module.

Here, the first read request address and the first read data length may correspond to one read data sub-request or to multiple read data sub-requests. The data request length corresponding to each read data sub-request is equal to the cache unit length of the first cache sub-module; in other words, the amount of data requested by each read data sub-request is aligned to the cache unit length of the first cache sub-module.

In some implementations, if the first read data length is less than the cache unit length of the first cache sub-module, indicating that the first read data request covers less than one cache unit of data, the first read request address and the first read data length may be aligned to the cache unit length to obtain a single read data sub-request whose data request length equals the cache unit length of the first cache sub-module.

In some implementations, if the first read data length is equal to the cache unit length of the first cache sub-module, indicating that the first read data request corresponds to exactly one cache unit of data, a single read data sub-request may be obtained directly from the first read request address and the first read data length.

In some implementations, if the first read data length is greater than the cache unit length of the first cache sub-module, indicating that the first read data request covers more than one cache unit of data, the first read request address and the first read data length may be split and aligned by the cache unit length to obtain multiple read data sub-requests, where the data request length corresponding to each read data sub-request equals the cache unit length of the first cache sub-module.
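The three cases above reduce to one alignment rule: cover the byte range [address, address + length) with whole cache units. A possible Python sketch, again assuming a 64-byte cache unit purely for illustration:

```python
CACHE_UNIT = 64  # assumed cache unit length of the first cache sub-module, in bytes

def split_into_subrequests(read_addr: int, read_len: int):
    """Split (first read request address, first read data length) into read data
    sub-requests, each exactly one cache unit long and cache-unit aligned."""
    first_unit = (read_addr // CACHE_UNIT) * CACHE_UNIT
    last_unit = ((read_addr + read_len - 1) // CACHE_UNIT) * CACHE_UNIT
    return [(addr, CACHE_UNIT) for addr in range(first_unit, last_unit + 1, CACHE_UNIT)]

# Shorter than one unit, exactly one unit, and longer than one unit all fall out of the same rule.
assert split_into_subrequests(8, 16)  == [(0, 64)]
assert split_into_subrequests(0, 64)  == [(0, 64)]
assert split_into_subrequests(0, 130) == [(0, 64), (64, 64), (128, 64)]
```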

Step S114: the first cache sub-module obtains the first sub-target data corresponding to each read data sub-request from the storage space of the first cache sub-module, the storage space of the second cache sub-module, and/or memory, and returns each piece of first sub-target data to the request source.

The above step S104 may include the following step S115. Step S115: the second cache sub-module prefetches second target data from the memory based on the prefetch information and caches the second target data into at least one cache unit of the second cache sub-module.

In the above embodiment, the cache module includes a first cache sub-module and a second cache sub-module. In response to receiving the first read data request sent by the request source, the first cache sub-module parses the first read data request to obtain the first read request address, the first read data length, and the prefetch information; the first cache sub-module determines at least one read data sub-request based on the first read request address and the first read data length, where the data request length corresponding to each read data sub-request is equal to the cache unit length of the first cache sub-module; the first cache sub-module obtains the first sub-target data corresponding to each read data sub-request from the storage space of the first cache sub-module, the storage space of the second cache sub-module, and/or memory, and returns each piece of first sub-target data to the request source; and the second cache sub-module prefetches the second target data from the memory based on the prefetch information and caches the second target data into at least one cache unit of the second cache sub-module. In this way, in a cache module containing multiple cache levels, data reading and prefetching can be accomplished simply and quickly with a single read data request from the request source.

In some embodiments, the operation in the above step S114 in which the first cache sub-module obtains the first sub-target data corresponding to each read data sub-request from the storage space of the first cache sub-module, the storage space of the second cache sub-module, and/or memory may include the following steps S121 and S122. Step S121: the first cache sub-module performs cache unit hit detection in the storage space of the first cache sub-module based on each read data sub-request, obtaining a second detection result corresponding to each read data sub-request.

Step S122: for each read data sub-request, if the second detection result corresponding to that read data sub-request indicates that the corresponding cache unit hits, the first cache sub-module reads the first sub-target data corresponding to that read data sub-request from the hit cache unit; and if the second detection result corresponding to that read data sub-request indicates that the corresponding cache unit is missing, the first cache sub-module sends a second read data request to the second cache sub-module based on the second read request address and the second read data length corresponding to that read data sub-request as well as the prefetch information, and receives the first sub-target data corresponding to that read data sub-request returned by the second cache sub-module.

Here, the second read request address corresponding to a read data sub-request is the starting address of the data requested by that read data sub-request, and the second read data length corresponding to a read data sub-request is the length of the data requested by that read data sub-request.

When the second detection result corresponding to a read data sub-request indicates that the corresponding cache unit is missing, a second read data request may be generated based on the second read request address and the second read data length corresponding to that read data sub-request as well as the prefetch information, and sent to the second cache sub-module. In implementation, those skilled in the art may generate the second read data request in any suitable manner based on the second read request address, the second read data length, and the prefetch information, according to the actual situation, which is not limited in the embodiments of the present invention.

In some implementations, a read request address signal, a read data length signal, and a request type signal may be determined based on the second read request address, the second read data length, and the prefetch information, respectively, and the second read data request may be generated from the read request address signal, the read data length signal, and the request type signal. For example, the second read request address may be encoded as the read request address signal of the second read data request, the second read data length may be encoded as the read data length signal of the second read data request, and the request type signal corresponding to the prefetch information may be determined based on the preset correspondence between the request type indicated by each request type signal and the prefetch information, and used as the request type signal of the second read data request.

In some implementations, a read request address signal, a read data length signal, and a prefetch control signal may be determined based on the second read request address, the second read data length, and the prefetch information, respectively, and the second read data request may be generated from the read request address signal, the read data length signal, and the prefetch control signal. For example, the prefetch control signal corresponding to the prefetch information may be determined based on the preset correspondence between prefetch control signals and prefetch information.
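As a sketch of this variant, the second read data request can be assembled as a plain record carrying the three signals; the dictionary keys and the 2-bit prefetch-control encoding below are illustrative assumptions rather than a defined interface.

```python
def build_second_read_request(sub_addr: int, sub_len: int, prefetch_units: int) -> dict:
    """Assemble a second read data request for a missing read data sub-request,
    carrying the prefetch information as a prefetch control signal."""
    if not 0 <= prefetch_units <= 0b11:
        raise ValueError("this sketch assumes a 2-bit prefetch control encoding")
    return {
        "read_addr_signal":     sub_addr,        # second read request address
        "read_len_signal":      sub_len,         # second read data length
        "prefetch_ctrl_signal": prefetch_units,  # e.g. 0..3 cache units to prefetch
    }

# A sub-request that missed in the first cache sub-module, asking the second
# cache sub-module to additionally prefetch two cache units.
second_request = build_second_read_request(0x1000, 64, 2)
```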

The operation in the above step S115 in which the second cache sub-module prefetches the second target data from the memory based on the prefetch information may include the following steps S123 to S125. Step S123: in response to receiving the second read data request sent by the first cache sub-module, the second cache sub-module parses the second read data request to obtain the second read request address, the second read data length, and the prefetch information.

Step S124: the second cache sub-module obtains the first sub-target data from the storage space of the second cache sub-module and/or memory based on the second read request address and the second read data length, and returns the first sub-target data to the first cache sub-module.

Step S125: the second cache sub-module prefetches the second target data from the memory based on the prefetch information and caches the second target data into at least one cache unit of the second cache sub-module.

Here, steps S123 to S125 correspond respectively to steps S101 to S104 in the foregoing embodiments, and their implementation may refer to the implementations of steps S101 to S104 described above.

In the above embodiment, for each read data sub-request, if the second detection result corresponding to that read data sub-request indicates that the corresponding cache unit hits, the first cache sub-module reads the first sub-target data corresponding to that read data sub-request from the hit cache unit; and if the second detection result corresponding to that read data sub-request indicates that the corresponding cache unit is missing, the first cache sub-module sends a second read data request to the second cache sub-module based on the second read request address and the second read data length corresponding to that read data sub-request as well as the prefetch information. In response to receiving the second read data request sent by the first cache sub-module, the second cache sub-module parses the second read data request to obtain the second read request address, the second read data length, and the prefetch information. In this way, when some of the read data sub-requests corresponding to the first read data request miss in the first cache sub-module, the prefetch information is carried to the second cache sub-module by the second read data requests corresponding to those sub-requests. In a cache module containing multiple cache levels, this reduces the data request pressure between the request source and the first cache sub-module, reduces the occupied request bandwidth, and improves processor performance.

In some implementations, when the cache units corresponding to all of the read data sub-requests hit, the prefetch information may be discarded, i.e., not sent to the second cache sub-module. In this case, since all of the read data sub-requests hit, the data request pressure between the first cache sub-module and the second cache sub-module is reduced, which reduces the likelihood that the request bandwidth between the first cache sub-module and the second cache sub-module becomes a memory-access bottleneck. Moreover, the request bandwidth between the request source and the first cache sub-module is usually no smaller than the request bandwidth between the first cache sub-module and the second cache sub-module, which likewise reduces the likelihood that the request bandwidth between the first cache sub-module and the second cache sub-module becomes a memory-access bottleneck. The request source can then prefetch the data corresponding to the discarded prefetch information by sending a separate prefetch request to the first cache sub-module, which reduces the processing resources the second cache sub-module spends on handling prefetch information carried by read requests. In addition, when prefetching is used in a loop-computation scenario, the data requested in successive loop iterations usually does not overlap, and the data corresponding to the prefetch information is usually missing when it is first requested; therefore, the prefetch information carried in the first read data request for that data, which is used to prefetch the next piece of data to be prefetched, is usually carried through to the second cache sub-module.
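The overall decision made by the first cache sub-module in this embodiment (serve hits locally, forward misses together with the prefetch information, and silently drop the prefetch information when everything hits) could be sketched as follows; the function signature and the callable used to reach the second cache sub-module are assumptions for illustration.

```python
def serve_read_subrequests(sub_requests, l1_store, prefetch_info, send_to_l2):
    """sub_requests: list of (address, length) pairs aligned to the cache unit length.
    l1_store:      dict mapping aligned addresses to cached data in the first cache sub-module.
    prefetch_info: opaque prefetch information taken from the first read data request.
    send_to_l2:    callable(addr, length, prefetch_info) issuing a second read data request."""
    results = []
    for addr, length in sub_requests:
        if addr in l1_store:
            # Hit: read the first sub-target data from the hit cache unit.
            results.append(l1_store[addr][:length])
        else:
            # Miss: forward a second read data request that carries the prefetch information.
            results.append(send_to_l2(addr, length, prefetch_info))
    # If every sub-request hit, no second read data request was issued, so the
    # prefetch information is effectively discarded here.
    return results
```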

An embodiment of the present invention provides a data reading method that can be applied in a chip and can be executed, for example, by a cache module in the chip. FIG. 2 is a schematic flowchart of a data reading method provided by an embodiment of the present invention. As shown in FIG. 2, the method includes the following steps S201 to S205. Step S201: receive a first read data request sent by a request source, where the first read data request requests data reading according to read information and data prefetching according to prefetch information.

Step S202: acquire the read information and the prefetch information according to the first read data request, where the prefetch information includes a prefetch data length.

Step S203: based on the read information, read first target data from the storage space of the cache module and/or from memory, and return the first target data to the request source.

Here, steps S201 to S203 may correspond respectively to steps S101 to S103 in the foregoing embodiments, and their implementation may refer to the implementations of steps S101 to S103 described above.

Step S204: when the prefetch data length is greater than 0, determine a prefetch start address based on the first read request address and the first read data length.

Here, the prefetch start address is the starting address from which data prefetching is performed.

In some implementations, the prefetch start address may be obtained by adding the first read data length to the first read request address.

In some implementations, the first read request address and the first read data length may first be aligned to the cache unit length of the cache module to obtain an aligned first read request address and an aligned first read data length, and the prefetch start address may be obtained by adding the aligned first read data length to the aligned first read request address.
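Under one plausible reading of this alignment step (round the address down and the length up to the cache unit length), the prefetch start address can be computed as in the sketch below; the 64-byte cache unit and the rounding convention are assumptions made only for this example.

```python
CACHE_UNIT = 64  # assumed cache unit length in bytes

def prefetch_start_address(read_addr: int, read_len: int, aligned: bool = True) -> int:
    """Prefetch start = first read request address + first read data length,
    optionally after aligning both to the cache unit length."""
    if aligned:
        aligned_addr = (read_addr // CACHE_UNIT) * CACHE_UNIT
        aligned_len = ((read_len + CACHE_UNIT - 1) // CACHE_UNIT) * CACHE_UNIT
        return aligned_addr + aligned_len
    return read_addr + read_len

# A 100-byte read at address 70: unaligned start is 170, aligned start is 64 + 128 = 192.
assert prefetch_start_address(70, 100, aligned=False) == 170
assert prefetch_start_address(70, 100, aligned=True) == 192
```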

Step S205: based on the prefetch start address and the prefetch data length, prefetch second target data from the memory, and cache the second target data into at least one cache unit of the cache module.

In the embodiments of the present invention, when the prefetch data length is greater than 0, the prefetch start address is determined based on the first read request address and the first read data length, the second target data is prefetched from memory based on the prefetch start address and the prefetch data length, and the second target data is cached into at least one cache unit of the cache module. In this way, whether data prefetching is performed can be controlled simply and conveniently by whether the prefetch data length is greater than 0, and when the prefetch data length is greater than 0, the prefetched second target data is controlled by the prefetch data length, so that the data reading needs of each task can be met flexibly.

In some embodiments, the read information includes a first read request address and a first read data length. The above step S202 may include the following steps S211 and S212. Step S211: parse the request type signal, the read request address signal, and the read data length signal in the first read data request to obtain a first request type, the first read request address, and the first read data length, respectively.

Here, the first read data request sent by the request source to the cache module may include a request type signal, a read request address signal, and a read data length signal, where the request type signal indicates the request type of the first read data request, the read request address signal indicates the starting address of the data requested to be read by the first read data request, and the read data length signal indicates the length of the data requested to be read. Therefore, by parsing the request type signal, the read request address signal, and the read data length signal in the first read data request, the first request type, the first read request address, and the first read data length of the first read data request can be obtained.

Step S212: determine the prefetch data length based on the first request type.

Here, multiple request types may be defined in advance, and different request types may correspond to different prefetch data lengths, so that the corresponding prefetch data length can be determined based on the first request type. In implementation, those skilled in the art may define the correspondence between each request type and the prefetch data length according to the actual situation, which is not limited in the embodiments of the present invention.

For example, when the request type signal is 0'b101, the first request type may be determined to be a read-only type, so the prefetch data length is 0; when the request type signal is 0'b110, the first request type may be determined to be a read-and-prefetch-1-cache-unit type, so the prefetch data length is 1 cache unit; and when the request type signal is 0'b111, the first request type may be determined to be a read-and-prefetch-4-cache-units type, so the prefetch data length is 4 cache units.

In the above embodiment, the request type signal, the read request address signal, and the read data length signal in the first read data request are parsed to obtain the first request type, the first read request address, and the first read data length of the first read data request, and the prefetch data length is determined based on the first request type. In this way, the prefetch data length can be determined simply and quickly from the request type.

In some embodiments, the read information includes a first read request address and a first read data length. The above step S202 may include the following step S221. Step S221: parse the read request address signal, the read data length signal, and the prefetch control signal in the first read data request to obtain the first read request address, the first read data length, and the prefetch data length, respectively.

Here, the first read data request may include a read request address signal, a read data length signal, and a prefetch control signal. Parsing the read request address signal yields the first read request address, parsing the read data length signal yields the first read data length, and parsing the prefetch control signal yields the prefetch data length. In implementation, correspondences between various prefetch control signals and prefetch data lengths may be set in advance, and the prefetch data length corresponding to the prefetch control signal in the first read data request may be determined based on these correspondences. Those skilled in the art may set suitable correspondences between prefetch control signals and prefetch data lengths according to the actual situation, which is not limited in the embodiments of the present invention. For example, when the prefetch control signal is 0'b0, the prefetch data length may be determined to be 0, and when the prefetch control signal is 0'b1, the prefetch data length may be determined to be 1 cache unit.

In the above embodiment, the read request address signal, the read data length signal, and the prefetch control signal in the first read data request are parsed to obtain the first read request address, the first read data length, and the prefetch data length of the first read data request. In this way, the prefetch data length can be determined simply and quickly from the prefetch control signal.

In some embodiments, the above step S205 may include the following steps S231 and S232. Step S231: based on the prefetch start address and the prefetch data length, determine at least one prefetch sub-request, where the data request length corresponding to each prefetch sub-request is equal to the cache unit length of the cache module.

Here, the prefetch start address and the prefetch data length may correspond to one prefetch sub-request or to multiple prefetch sub-requests. The data request length corresponding to each prefetch sub-request is equal to the cache unit length of the first cache sub-module; in other words, the amount of data requested by each prefetch sub-request is aligned to the cache unit length of the first cache sub-module.

In some implementations, if the prefetch data length is less than the cache unit length of the first cache sub-module, indicating that the first read data request asks for less than one cache unit of data to be prefetched, the prefetch start address and the prefetch data length may be aligned to the cache unit length to obtain a single prefetch sub-request whose data request length equals the cache unit length of the first cache sub-module.

在一些實施方式中,第一讀資料長度等於第一快取子模組的快取單元長度,表明第一讀資料請求所請求預取的資料量對應一個快取單元,可以直接基於預取起始位址和預取資料長度得到一個預取子請求。In some embodiments, the first read data length is equal to the cache unit length of the first cache submodule, indicating that the amount of data requested for prefetching by the first read data request corresponds to one cache unit. A prefetch sub-request can be directly obtained based on the prefetch start address and the prefetch data length.

在一些實施方式中,第一讀資料長度大於第一快取子模組的快取單元長度,表明第一讀資料請求所請求預取的資料量超過一個快取單元,可以將該預取起始位址和預取資料長度按快取單元長度拆分並對齊,得到多個預取子請求,且每一預取子請求對應的資料請求長度與第一快取子模組的快取單元長度相等。In some embodiments, the first read data length is greater than the cache unit length of the first cache sub-module, indicating that the amount of data requested for prefetching by the first read data request exceeds one cache unit. The prefetch start address and prefetch data length can be split and aligned according to the cache unit length to obtain multiple prefetch sub-requests, and the data request length corresponding to each prefetch sub-request is equal to the cache unit length of the first cache sub-module.

步驟S232,針對每一所述預取子請求,基於所述預取子請求,在所述快取模組的儲存空間中進行快取單元命中檢測,得到第一檢測結果,並在所述第一檢測結果表徵所述預取子請求對應的快取單元缺失的情況下,基於所述預取子請求,從所述記憶體中預取第二目標資料,並將所述第二目標資料快取至所述快取模組的至少一個快取單元中。In step S232, for each of the prefetch sub-requests, a cache unit hit detection is performed in the storage space of the cache module based on the prefetch sub-request to obtain a first detection result. If the first detection result indicates that the cache unit corresponding to the prefetch sub-request is missing, second target data is prefetched from the memory based on the prefetch sub-request and the second target data is cached into at least one cache unit of the cache module.

上述實施例中,可以分別針對每一預取子請求進行快取單元命中檢測,並在預取子請求對應的快取單元缺失的情況下,基於該預取子請求從記憶體中預取第二目標資料。這樣,以預取子請求為細微性進行資料預取,可以提高對資料預取進行控制的靈活度。In the above embodiment, cache unit hit detection can be performed for each prefetch sub-request. If the cache unit corresponding to the prefetch sub-request is missing, the second target data is prefetched from the memory based on the prefetch sub-request. In this way, data prefetching is performed at a granular level based on the prefetch sub-request, which can improve the flexibility of controlling data prefetching.

在一些實施例中,所述預取資料長度為快取單元長度的整數倍,所述快取單元包括快取塊或快取扇區。這樣,可以更好地適配相應快取單元細微性的快取模組。In some embodiments, the pre-fetched data length is an integer multiple of the cache unit length, and the cache unit includes a cache block or a cache sector. This can better adapt the cache module to the specific details of the cache unit.
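
To make step S231 concrete, the following C sketch splits a prefetch window, given by the prefetch start address and the prefetch data length, into sub-requests whose lengths all equal the cache unit length; the structure and function names are hypothetical and the sketch ignores error handling.

#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t addr;  /* cache-unit-aligned start address of the sub-request */
    uint32_t len;   /* always equal to the cache unit length               */
} prefetch_subreq_t;

/* Split [start, start + prefetch_len) into cache-unit-aligned sub-requests
 * and write them to out; returns the number of sub-requests produced. */
static size_t split_prefetch(uint64_t start, uint32_t prefetch_len,
                             uint32_t cache_unit, prefetch_subreq_t *out,
                             size_t max_out)
{
    if (prefetch_len == 0)
        return 0;                                     /* prefetch disabled */
    size_t n = 0;
    uint64_t aligned = start - (start % cache_unit);  /* align down        */
    uint64_t end = start + prefetch_len;
    for (uint64_t a = aligned; a < end && n < max_out; a += cache_unit) {
        out[n].addr = a;
        out[n].len  = cache_unit;
        n++;
    }
    return n;
}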

An embodiment of the present disclosure provides a data reading method that can be applied in a chip, for example executed by a cache module in the chip. FIG. 3 is a schematic flowchart of a data reading method provided by an embodiment of the present disclosure. As shown in FIG. 3, the method includes the following steps S301 to S306:

Step S301: receive multiple first read data requests sent by a request source, where each first read data request is used to request data reading according to the read information corresponding to that first read data request and data prefetching according to the prefetch information corresponding to that first read data request.

Step S302: according to each first read data request, obtain the read information corresponding to the first read data request and the prefetch information corresponding to the first read data request, where the read information includes a first read request address and a first read data length.

Here, steps S301 to S302 correspond to steps S101 to S102 in the foregoing embodiment, and may be implemented with reference to the implementations of steps S101 to S102 described above.

Step S303: for each first read data request, determine the cache unit corresponding to the first read data request based on the first read request address and the first read data length corresponding to the first read data request.

Here, the cache unit corresponding to a first read data request may be obtained by mapping based on the first read request address and the first read data length corresponding to that first read data request.

Step S304: when the cache units corresponding to the multiple first read data requests overlap, merge the first read request addresses and the first read data lengths corresponding to the multiple first read data requests to obtain a merged first read request address and a merged first read data length, and merge the prefetch information corresponding to the multiple first read data requests to obtain merged prefetch information.

Here, the first read request address and the first read data length corresponding to each first read data request together determine an address interval.

In some implementations, the union of the address intervals corresponding to the first read data requests may be determined as a target address interval, the start address of the target address interval may be determined as the merged first read request address, and the length of the target address interval may be determined as the merged first read data length.

In some implementations, the address intervals corresponding to the first read data requests may first be aligned according to the cache unit length of the cache module, the union of the aligned address intervals may be determined as the target address interval, the start address of the target address interval may then be determined as the merged first read request address, and the length of the target address interval may be determined as the merged first read data length.

Step S305: based on the merged first read request address and the merged first read data length, obtain the first target data from the storage space of the cache module and/or from the memory, and return the first target data to the request source.

Step S306: based on the merged prefetch information, prefetch the second target data from the memory, and cache the second target data into at least one cache unit of the cache module.

Here, steps S305 to S306 correspond to steps S103 to S104 in the foregoing embodiment, and may be implemented with reference to the implementations of steps S103 to S104 described above.

In the embodiments of the present disclosure, for multiple received first read data requests, when the cache units corresponding to the multiple first read data requests overlap, the first read request addresses and first read data lengths corresponding to the multiple first read data requests can be merged to obtain a merged first read request address and a merged first read data length, and the prefetch information corresponding to the multiple first read data requests can be merged to obtain merged prefetch information. The second target data can then be prefetched from the memory based on a single piece of merged prefetch information, which reduces the number of prefetches from the memory and, in turn, the number of data requests between the cache module and the memory, further improving processor performance.

In some embodiments, merging the prefetch information corresponding to the multiple first read data requests in step S304 to obtain the merged prefetch information includes the following step S311:

Step S311: determine the prefetch information corresponding to the first read data request whose first read request address is the latest among the multiple first read data requests as the merged prefetch information.

Here, because the cache units corresponding to the multiple first read data requests overlap, the data to be prefetched indicated by the prefetch information of one first read data request may duplicate the data requested by other first read data requests whose first read request addresses come after that request's first read request address. Therefore, the prefetch information corresponding to the first read data request with the latest first read request address among the multiple first read data requests can be determined as the merged prefetch information. This reduces unnecessary duplicate prefetching and further improves processor performance.
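
A rough C sketch of this merging rule is given below: the merged address range is the union of the individual address ranges (optionally pre-aligned to the cache unit length, as described above), and the merged prefetch information is taken from the request whose read address lies furthest forward. The structure and field names are assumptions for illustration, and the caller is assumed to have already checked that the requests' cache units overlap.

#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t addr;          /* first read request address              */
    uint32_t len;           /* first read data length                  */
    uint32_t prefetch_len;  /* prefetch information (length, in bytes) */
} read_req_t;

static read_req_t merge_read_reqs(const read_req_t *reqs, size_t n)
{
    uint64_t lo = reqs[0].addr;
    uint64_t hi = reqs[0].addr + reqs[0].len;
    size_t last = 0;                      /* request with the latest address */
    for (size_t i = 1; i < n; i++) {
        uint64_t b = reqs[i].addr;
        uint64_t e = b + reqs[i].len;
        if (b < lo) lo = b;
        if (e > hi) hi = e;
        if (b > reqs[last].addr) last = i;
    }
    read_req_t merged;
    merged.addr = lo;                               /* merged read request address */
    merged.len  = (uint32_t)(hi - lo);              /* merged read data length     */
    merged.prefetch_len = reqs[last].prefetch_len;  /* merged prefetch information */
    return merged;
}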

An embodiment of the present disclosure provides a data reading method that can be applied in a chip. FIG. 4 is a schematic flowchart of a data reading method provided by an embodiment of the present disclosure. As shown in FIG. 4, the method includes the following steps S401 to S404:

Step S401: obtain a read data instruction, where the read data instruction is used to instruct data reading according to read information and data prefetching according to prefetch information.

Here, the read data instruction may be an instruction used for data reading in the task currently executed by a computing module in the chip. In practice, those skilled in the art may set a suitable read data instruction according to the task actually executed.

In some implementations, the read data instruction may be generated by application software and sent to a computing module in the chip, to instruct that computing module to read data according to the read information and prefetch data according to the prefetch information in accordance with the read data instruction. The read information and the prefetch information may be determined by the application software according to the actual data reading requirements.

In some implementations, the read data instruction may also be generated by one computing module in the chip and sent to another computing module, to instruct the other computing module to read data according to the read information and prefetch data according to the prefetch information in accordance with the read data instruction. The read information and the prefetch information may be determined by the first computing module according to the actual data reading requirements.

Step S402: obtain a first read data request according to the read data instruction, where the first read data request includes the read information and the prefetch information.

Here, the read data instruction can be parsed to obtain the read information and the prefetch information corresponding to the read data instruction, and the first read data request can be generated from the read information and the prefetch information. In practice, any suitable parsing method may be used according to the actual situation, and the embodiments of the present disclosure are not limited in this respect.

In some implementations, the read data instruction may include a read control code and a prefetch control code; parsing the read control code yields the read information, and parsing the prefetch control code yields the prefetch information.

In some implementations, the read data instruction may include a read request address code, a read data length code and a prefetch control code; parsing the read request address code yields the first read request address, parsing the read data length code yields the first read data length, and parsing the prefetch control code yields the prefetch information. In practice, correspondences between multiple prefetch control codes and prefetch information may be set in advance, and the prefetch information corresponding to the prefetch control code in the read data instruction can be determined from these correspondences. Those skilled in the art may set suitable correspondences between prefetch control codes and prefetch information according to the actual situation, and the embodiments of the present disclosure are not limited in this respect.

In some implementations, the prefetch information includes a prefetch switch state indicating whether data prefetching is performed, and the prefetch switch state can be indicated by the prefetch control code. For example, when the prefetch control code is 0'b0, the prefetch switch state may be determined to be the off state, indicating that no data prefetching is performed; when the prefetch control code is 0'b1, the prefetch switch state may be determined to be the on state, indicating that data prefetching is performed. When the prefetch switch state is the on state, the prefetch data length may be a default value or may be controlled through other bit fields of the read data instruction, which is not limited here.

In some implementations, the prefetch information includes a prefetch data length, and the prefetch data length can be indicated by the prefetch control code. For example, when the prefetch control code is 0'b0, the prefetch data length may be determined to be 0, and when the prefetch control code is 0'b1, the prefetch data length may be determined to be one cache unit. As another example, when the prefetch control code is 0'b00, the prefetch data length may be determined to be 0; when it is 0'b01, one cache unit; when it is 0'b10, two cache units; and when it is 0'b11, three cache units.

In some implementations, the read data instruction may include an instruction type code, a read request address code and a read data length code; parsing the read request address code yields the first read request address, parsing the read data length code yields the first read data length, and parsing the instruction type code yields the prefetch information. In practice, multiple instruction type codes may be defined in advance, the instruction type code may indicate the request type, and the request types indicated by different instruction type codes may correspond to different prefetch information, so the corresponding prefetch information can be determined from the instruction type code in the read data instruction. Those skilled in the art may define suitable instruction type codes, and the correspondence between the request type indicated by each instruction type code and the prefetch information, according to the actual situation, and the embodiments of the present disclosure are not limited in this respect.

In some implementations, the prefetch information includes a prefetch switch state indicating whether data prefetching is performed; the instruction type code may indicate the request type, and the prefetch switch state is determined from the request type. For example, when the instruction type code is 0'b101, the request type may be determined to be a read-only type, that is, only data reading is performed without data prefetching, so the prefetch switch state is the off state indicating that no data prefetching is performed; when the instruction type code is 0'b110, the request type may be determined to be a read-with-prefetch type, that is, both data reading and data prefetching are performed, so the prefetch switch state is the on state indicating that data prefetching is performed.

In some implementations, the prefetch information includes a prefetch data length; the instruction type code may indicate the request type, and the prefetch data length is determined from the request type. For example, when the instruction type code is 0'b101, the request type may be determined to be a read-only type, so the prefetch data length is determined to be 0; when the instruction type code is 0'b110, the request type may be determined to be a read-and-prefetch-one-cache-unit type, so the prefetch data length is determined to be one cache unit; when the instruction type code is 0'b111, the request type may be determined to be a read-and-prefetch-four-cache-units type, so the prefetch data length is determined to be four cache units.
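
The following C sketch shows how such an instruction-type mapping might be decoded, using the example encodings just given (0'b101 read only, 0'b110 read and prefetch one cache unit, 0'b111 read and prefetch four cache units); the function name and the treatment of unknown codes are assumptions made only for this sketch.

#include <stdint.h>

/* Map an instruction type code to a prefetch length in cache units,
 * following the example encodings above. Unknown codes fall back to
 * "no prefetch" in this sketch. */
static uint32_t prefetch_units_from_type_code(uint8_t type_code)
{
    switch (type_code) {
    case 0x5: return 0;   /* 0'b101: read only                       */
    case 0x6: return 1;   /* 0'b110: read and prefetch 1 cache unit  */
    case 0x7: return 4;   /* 0'b111: read and prefetch 4 cache units */
    default:  return 0;
    }
}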

In some implementations, the read information may be carried in a read control signal of the first read data request, and the prefetch information may be carried in a request type signal of the first read data request. The read control signal and the request type signal may be determined respectively from the read information and the prefetch information, and the first read data request may be generated from the read control signal and the request type signal.

In some implementations, the read information may be carried in a read control signal of the first read data request, and the prefetch information may be carried in a prefetch control signal of the first read data request. The read control signal and the prefetch control signal may be determined respectively from the read information and the prefetch information, and the first read data request may be generated from the read control signal and the prefetch control signal.

Step S403: send the first read data request to the cache module, so that the cache module, in response to the first read data request, reads and returns the first target data according to the read information, and prefetches and caches the second target data according to the prefetch information.

Step S404: receive the first target data returned by the cache module.

In the embodiments of the present disclosure, a read data instruction is obtained, the read data instruction being used to instruct data reading according to read information and data prefetching according to prefetch information; a first read data request containing the read information and the prefetch information is obtained according to the read data instruction; the first read data request is sent to the cache module, so that the cache module, in response to the first read data request, reads and returns the first target data according to the read information and prefetches and caches the second target data according to the prefetch information; and the first target data returned by the cache module is received. In this way, the request source can send the prefetch information to the cache module along with the read data request, so data reading and prefetching can be accomplished with a single read data request. On the one hand, this reduces the data request pressure between the computing module and the cache module, compresses the request bandwidth and improves processor performance; on the other hand, data prefetching can be controlled through the prefetch information parsed from the read data instruction, achieving instruction-level prefetch control and thus better meeting the data reading requirements of each task.

In some embodiments, the read information includes a first read request address and a first read data length. The above step S401 may include the following step S411:

Step S411: perform instruction encoding according to the first read request address, the first read data length and the prefetch information to obtain the read data instruction.

Here, by encoding the first read request address, the first read data length and the prefetch information into the read data instruction, the read data instruction can be obtained simply and quickly.

In some implementations, the read data instruction includes an instruction type code, a read request address code and a read data length code. The above step S411 may include the following steps S421 to S423:

Step S421: determine, from a preset set of instruction type codes, the instruction type code corresponding to the prefetch information.

Step S422: encode the first read request address to obtain the read request address code.

Step S423: encode the first read data length to obtain the read data length code.

Here, the instruction type code set may include at least one preset instruction type code, and different instruction type codes may represent different request types. In practice, those skilled in the art may preset a suitable instruction type code set according to the actual application scenario, and the embodiments of the present disclosure are not limited in this respect.

In the above embodiment, encoding the prefetch information into the instruction type code reduces the bit fields occupied in the read data instruction, which simplifies the design of the instruction set.
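
As a purely illustrative sketch of steps S421 to S423, the C function below packs an instruction type code, a read request address and a read data length into a 64-bit instruction word; the field widths and positions are assumptions made only for this example and do not describe any particular instruction set.

#include <stdint.h>

/* Assumed layout for this sketch only: bits 63..61 hold the instruction type
 * code, bits 60..53 hold the read data length code, and bits 52..0 hold the
 * read request address code. */
static uint64_t encode_read_instr(uint8_t type_code, uint64_t read_addr,
                                  uint8_t read_len)
{
    uint64_t instr = 0;
    instr |= ((uint64_t)type_code & 0x7u) << 61;   /* instruction type code */
    instr |= ((uint64_t)read_len)         << 53;   /* read data length code */
    instr |= read_addr & ((1ULL << 53) - 1);       /* read request address  */
    return instr;
}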

In some embodiments, the above step S402 may include the following steps S431 to S432:

Step S431: parse the instruction type code, the read request address code and the read data length code to obtain the prefetch information, the first read request address and the first read data length.

Step S432: generate the first read data request according to the prefetch information, the first read request address and the first read data length.

Here, the read data instruction may include an instruction type code, a read request address code and a read data length code; parsing the read request address code yields the first read request address, parsing the read data length code yields the first read data length, and parsing the instruction type code yields the prefetch information. In some implementations, the prefetch information includes a prefetch data length. In practice, multiple instruction type codes may be defined in advance, and the request types indicated by different instruction type codes may correspond to different prefetch data lengths, so the corresponding prefetch data length can be determined from the instruction type code in the read data instruction.

In the above embodiment, the instruction type code, the read request address code and the read data length code in the read data instruction are parsed separately to obtain the prefetch information, the first read request address and the first read data length. In this way, the prefetch information can be determined simply and quickly from the request type, and the first read data request can be generated quickly from the prefetch information, the first read request address and the first read data length.

In some embodiments, the read data instruction includes a read request address code, a read data length code and a prefetch control code. The above step S411 may include the following steps S441 to S443:

Step S441: encode the first read request address to obtain the read request address code.

Step S442: encode the first read data length to obtain the read data length code.

Step S443: encode the prefetch information to obtain the prefetch control code.

It can be understood that the first read request address, the first read data length and the prefetch information may be encoded in any suitable way determined according to the actual situation, and the embodiments of the present disclosure are not limited in this respect.

In this way, by encoding the prefetch information separately into a prefetch control code, there is no need to extend the existing instruction type codes, which simplifies the encoding and decoding of the instruction type code.

In some implementations, the prefetch information may include a prefetch data length, and the prefetch control code can be obtained by encoding the prefetch data length.

In some embodiments, the above step S402 may include the following steps S451 to S452:

Step S451: parse the read request address code, the read data length code and the prefetch control code to obtain the first read request address, the first read data length and the prefetch information.

Step S452: generate the first read data request according to the prefetch information, the first read request address and the first read data length.

Here, the read data instruction may include a read request address code, a read data length code and a prefetch control code; parsing the read request address code yields the first read request address, parsing the read data length code yields the first read data length, and parsing the prefetch control code yields the prefetch information. In some implementations, the prefetch information includes a prefetch data length. In practice, correspondences between multiple prefetch control codes and prefetch data lengths may be set in advance, and the prefetch data length corresponding to the prefetch control code in the read data instruction can be determined from these correspondences. For example, when the prefetch control code is 0'b0, the prefetch data length may be determined to be 0, and when the prefetch control code is 0'b1, the prefetch data length may be determined to be one cache unit.

In the above embodiment, the read request address code, the read data length code and the prefetch control code in the read data instruction are parsed separately to obtain the first read request address, the first read data length and the prefetch data length. In this way, the prefetch control code can simply and quickly indicate the prefetch information, and the first read data request can be generated quickly from the prefetch information, the first read request address and the first read data length.

In some embodiments, the above step S432 and/or step S452 may include the following step S461 or step S462:

Step S461: determine a read request address signal, a read data length signal and a request type signal respectively according to the first read request address, the first read data length and the prefetch information, and generate the first read data request according to the read request address signal, the read data length signal and the request type signal.

In some implementations, the first read request address may be encoded as the read request address signal of the first read data request; the first read data length may be encoded as the read data length signal of the first read data request; and, based on a preset correspondence between the request types indicated by request type signals and prefetch information, the request type signal corresponding to the prefetch information is determined and used as the request type signal of the first read data request.

In some implementations, the prefetch information includes a prefetch data length; based on a preset correspondence between the request types indicated by request type signals and prefetch data lengths, the request type signal corresponding to the prefetch data length can be determined and used as the request type signal of the first read data request.

Step S462: determine a read request address signal, a read data length signal and a prefetch control signal respectively according to the first read request address, the first read data length and the prefetch information, and generate the first read data request according to the read request address signal, the read data length signal and the prefetch control signal.

In some implementations, the prefetch control signal corresponding to the prefetch information may be determined based on a preset correspondence between prefetch control signals and prefetch information.

In some implementations, the prefetch information includes a prefetch data length, and the prefetch control signal corresponding to the prefetch data length can be determined based on a preset correspondence between prefetch control signals and prefetch data lengths.
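
The sketch below illustrates the two alternatives in steps S461 and S462: the prefetch information is carried either in a request type signal or in a dedicated prefetch control signal of the first read data request. The signal bundle, field names and the example request-type values (taken from the 0'b101/0'b110 encodings above) are all assumptions made only for this sketch.

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t read_addr_signal;      /* read request address signal         */
    uint32_t read_len_signal;       /* read data length signal             */
    uint8_t  req_type_signal;       /* request type signal (step S461)     */
    uint8_t  prefetch_ctrl_signal;  /* prefetch control signal (step S462) */
} first_read_req_t;

static first_read_req_t build_first_read_req(uint64_t addr, uint32_t len,
                                             uint32_t prefetch_units,
                                             bool use_req_type_signal)
{
    first_read_req_t req = { .read_addr_signal = addr,
                             .read_len_signal  = len };
    if (use_req_type_signal)
        req.req_type_signal = prefetch_units ? 0x6 : 0x5;   /* example mapping  */
    else
        req.prefetch_ctrl_signal = (uint8_t)prefetch_units; /* length in units  */
    return req;
}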

In some embodiments, the above step S401 may include the following steps S471 to S473:

Step S471: obtain the amount of data to be read and the estimated computation time of the current task.

Here, the computation time of the current task can be estimated to obtain the estimated computation time. In practice, any suitable estimation method may be used, and the embodiments of the present disclosure are not limited in this respect.

In some implementations, the computational complexity of the current task may be determined, and the computation time of the current task may be estimated based on this computational complexity to obtain the estimated computation time.

The amount of data to be read for the current task refers to the amount of data that needs to be read in order to execute the current task. In some implementations, this amount may be carried in the task information of the current task. In other implementations, it may be obtained by analysing the data reading requirements of the current task.

Step S472: determine the prefetch information according to the amount of data to be read and the estimated computation time.

In some implementations, the time needed to read an amount of data equal to the amount of data to be read from the cache module is estimated to obtain an estimated read time, and the prefetch information is determined according to the relationship between the estimated read time and the estimated computation time.

Step S473: determine the read data instruction according to the prefetch information.

It can be understood that the amount of data to be read and the estimated computation time of the current task reflect, respectively, the time spent on memory access and the time spent on computation during execution of the current task, and that data prefetching can to some extent improve the efficiency of data reading and reduce the time it consumes. Therefore, based on the amount of data to be read and the estimated computation time, it is possible to judge whether the execution scenario of the current task calls for data reading accompanied by data prefetching. Appropriate prefetch information, and hence an appropriate read data instruction, can then be determined for the execution scenario of the current task, so as to better balance the computation time and the memory access time during task execution.

In some embodiments, the above step S472 may include the following steps S481 to S482:

Step S481: determine a first estimated read time corresponding to the amount of data to be read when the data prefetching function is enabled, and a second estimated read time corresponding to the amount of data to be read when the data prefetching function is disabled.

Step S482: determine the prefetch information according to the relationship between the estimated computation time and the first estimated read time and/or the second estimated read time.

Here, enabling the data prefetching function means reading data accompanied by data prefetching, that is, the prefetch information carried in the read data instruction indicates that data prefetching is performed. Disabling the data prefetching function means reading data without data prefetching, that is, the prefetch information carried in the read data instruction indicates that no data prefetching is performed.

It can be understood that the first estimated read time reflects the memory access time needed to execute the current task when the data prefetching function is enabled, and the second estimated read time reflects the memory access time needed when the data prefetching function is disabled. Therefore, from the relationship between the estimated computation time and the first estimated read time and/or the second estimated read time, it can be determined whether the execution scenario of the current task is better suited to enabling or disabling the data prefetching function, so that suitable prefetch information can be determined to balance the computation time and the memory access time during task execution.

In some embodiments, the above step S482 may include: when the estimated computation time is greater than the first estimated read time and less than the second estimated read time, determining the prefetch information to be first prefetch information indicating that the data prefetching function is enabled.

In this case, the estimated computation time being greater than the first estimated read time and less than the second estimated read time indicates that the memory access time needed to execute the current task with the data prefetching function enabled is less than the estimated computation time of the current task, while the memory access time needed with the data prefetching function disabled is greater than the estimated computation time. In this scenario, enabling the data prefetching function reduces the impact of data reading time on computational efficiency. Therefore, the prefetch information can be determined to be the first prefetch information indicating that the data prefetching function is enabled, so as to reduce the impact of data reading time on computational efficiency and better balance the computation time and memory access time during task execution.

In some embodiments, the above step S482 may include: when the estimated computation time is less than or equal to the first estimated read time, determining the prefetch information to be first prefetch information indicating that the data prefetching function is enabled, or determining the prefetch information to be second prefetch information indicating that the data prefetching function is disabled.

In this case, the estimated computation time being less than or equal to the first estimated read time indicates that the memory access time needed to execute the current task with the data prefetching function enabled is not less than the estimated computation time of the current task. Enabling the data prefetching function in this scenario can reduce the impact of data reading time on computational efficiency to some extent, but the relatively long data reading time will still affect the overall task execution time. Therefore, in this scenario the prefetch information may be determined to be the first prefetch information indicating that the data prefetching function is enabled, so as to reduce the impact of data reading time on computational efficiency and better balance the computation time and memory access time during task execution, or it may be determined to be the second prefetch information indicating that the data prefetching function is disabled, so as to simplify the data reading process.

In some embodiments, the above step S482 may include: when the estimated computation time is greater than or equal to the second estimated read time, determining the prefetch information to be second prefetch information indicating that the data prefetching function is disabled.

In this case, the estimated computation time being greater than or equal to the second estimated read time indicates that the memory access time needed to execute the current task with the data prefetching function disabled does not exceed the estimated computation time of the current task, so disabling the data prefetching function in this scenario will not cause the data reading time to affect computational efficiency. Therefore, in this scenario the prefetch information can be determined to be the second prefetch information indicating that the data prefetching function is disabled, so as to simplify the data reading process.
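
Taken together, the three cases in step S482 amount to the decision rule sketched below in C, where t_compute is the estimated computation time, t_read_pf the first estimated read time (prefetching enabled) and t_read_nopf the second estimated read time (prefetching disabled); the function returns whether prefetching should be enabled, and the choice made for the t_compute <= t_read_pf case is one of the two options the text allows.

#include <stdbool.h>

static bool choose_prefetch(double t_compute, double t_read_pf,
                            double t_read_nopf)
{
    if (t_compute >= t_read_nopf)
        return false;   /* reads are already hidden by computation          */
    if (t_compute > t_read_pf)
        return true;    /* prefetching lets reads be hidden by computation  */
    return true;        /* t_compute <= t_read_pf: either choice is allowed */
}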

In some implementations, the prefetch information includes a prefetch switch state; the first prefetch information may be the on state indicating that data prefetching is performed, and the second prefetch information may be the off state indicating that no data prefetching is performed.

In some implementations, the prefetch information includes a prefetch data length; the first prefetch information may be an integer greater than 0, and the second prefetch information may be 0.

It can be understood that when the prefetch information is the first prefetch information, the instruction type code or the prefetch control code in the read data instruction indicates that data prefetching is enabled; when the prefetch information is the second prefetch information, the instruction type code or the prefetch control code in the read data instruction indicates that data prefetching is disabled.

In some embodiments, the above step S481 may include the following steps S491 to S493:

Step S491: estimate the latency of sending a single request to the cache module, the latency of the cache module returning a single piece of data, the latency of the cache module sending a single request to the memory, and the latency of the memory returning a single piece of data to the cache module, to obtain a first estimated latency, a second estimated latency, a third estimated latency and a fourth estimated latency, respectively.

Here, any suitable method may be used to estimate the latency of sending requests to the cache module and of the cache module returning data, and the embodiments of the present disclosure are not limited in this respect.

In some implementations, the request source of the first read data request may be a computing module in the chip, and the first, second, third and fourth estimated latencies may be obtained by separately collecting statistics on the historical latency of the computing module sending a single request to the cache module, the historical latency of the cache module returning a single piece of data to the computing module, the historical latency of the cache module sending a single request to the memory, and the historical latency of the memory returning a single piece of data to the cache module.

In some implementations, the latency of the computing module sending a single request to the cache module may be estimated from the amount of data in such a request and the data bandwidth between the computing module and the cache module, yielding the first estimated latency; the latency of the cache module returning a single piece of data to the computing module may be estimated from the amount of data returned and the data bandwidth between the computing module and the cache module, yielding the second estimated latency; the latency of the cache module sending a single request to the memory may be estimated from the amount of data in such a request and the data bandwidth between the cache module and the memory, yielding the third estimated latency; and the latency of the memory returning a single piece of data to the cache module may be estimated from the amount of data returned and the data bandwidth between the cache module and the memory, yielding the fourth estimated latency.

Step S492: determine the sum of a first ratio between the amount of data to be read for the current task and a target bandwidth, the first estimated latency and the second estimated latency as the first estimated read time, where the target bandwidth is the data bandwidth between the request source of the first read data request and the cache module.

Step S493: determine the sum of the first ratio, the first estimated latency, the second estimated latency, the third estimated latency and the fourth estimated latency as the second estimated read time.

In the above embodiment, the first estimated read time is obtained as the sum of the first ratio between the amount of data to be read for the current task and the target bandwidth, the first estimated latency and the second estimated latency; the second estimated read time is obtained as the sum of the first ratio, the first estimated latency, the second estimated latency, the third estimated latency and the fourth estimated latency; and when the estimated computation time is greater than the first estimated read time and less than the second estimated read time, the prefetch information is determined to be the first prefetch information indicating that the data prefetching function is enabled. On the one hand, the first estimated read time is the sum of the first ratio, the first estimated latency of the computing module sending a single request to the cache module, and the second estimated latency of the cache module returning a single piece of data to the computing module, so it represents a lower bound on the estimated data reading time of the current task. On the other hand, the second estimated read time additionally includes the third estimated latency of the cache module sending a single request to the memory and the fourth estimated latency of the memory returning a single piece of data to the cache module, so it represents an upper bound on the estimated data reading time of the current task.

In this way, when the estimated computation time of the current task is greater than the first estimated read time and less than the second estimated read time, determining the prefetch information to be the first prefetch information indicating that the data prefetching function is enabled allows data reading and prefetching to be accomplished with a single data request. This reduces the data request pressure between the request source and the cache module, improves system performance and the cache hit rate, and in turn reduces the data reading time of the current task so that it is more fully hidden by the computation time; that is, the impact of data reading time on computational efficiency is reduced, and the computation time and memory access time during task execution are better balanced.
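
A minimal C sketch of the two estimates from steps S491 to S493, under the assumption that all quantities are expressed in consistent units (bytes, bytes per second and seconds):

/* data_bytes: amount of data the current task reads; bw: data bandwidth
 * between the request source and the cache module (the target bandwidth);
 * t1..t4: the four estimated latencies described in step S491. */
static void estimate_read_times(double data_bytes, double bw,
                                double t1, double t2, double t3, double t4,
                                double *t_read_pf, double *t_read_nopf)
{
    double transfer = data_bytes / bw;              /* the first ratio             */
    *t_read_pf   = transfer + t1 + t2;              /* lower bound: cache hits     */
    *t_read_nopf = transfer + t1 + t2 + t3 + t4;    /* upper bound: memory fetches */
}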

The following describes the application of the data reading method provided by the embodiments of the present disclosure in a practical scenario, taking as an example an instruction-controlled read request accompanied by data prefetching, implemented through combined software and hardware.

Optimizing processor memory access performance is a hot topic in computer architecture. To address the latency caused by the mismatch between the access rate of memory and the computation rate of the computing modules in a processor, a cache module can be designed into the processor so that the data read/write rate matches the computation rate of the computing module, improving system performance. FIG. 5A is a schematic diagram of an implementation architecture for data caching provided by an embodiment of the present disclosure. As shown in FIG. 5A, the architecture includes a computing module 511, a register 512, a cache module 513 and a memory 514. Data can be input and/or output between the computing module 511, the cache module 513 and the memory 514. The computing module 511 can read data from the cache module 513, the register 512 and/or the memory 514 for computation, and store the computation results in the cache module 513, the register 512 and/or the memory 514.

In the related art, processors mostly adopt data prefetching techniques: future memory accesses are predicted, access requests for the corresponding memory blocks are issued before the explicit accesses, and the data in those memory blocks is fetched into the cache module in advance, thereby improving the cache hit rate.

In some related art, data prefetching is implemented by software prefetching. In this approach, the processor supports prefetch instructions, and the application software running on the processor sends the address to be prefetched to the processor hardware in the form of a prefetch instruction, instructing the cache module to prefetch data based on that address. For example, in scenarios with loop computations, the application software can, depending on the hardware resources of the processor, allocate one, two or even four times the register space per loop iteration for storing prefetched data; before starting the loop computation, the read data requests for several iterations after the current one are issued in advance to prefetch the data, and the prefetched data is stored in the additionally allocated register space. Although this software prefetching approach only needs ordinary read data requests and requires no extra support from the hardware architecture, the on-chip register space that must be allocated is one to four times what would otherwise be used. Moreover, on parallel processors, increasing the number of registers consumed by a single task reduces task parallelism, a problem that is particularly pronounced in graphics memory architectures where on-chip register resources are limited. In addition, the downstream request path and data return path cannot accurately distinguish prefetch requests from the requests of the current loop; in a multi-processor parallel architecture, the prefetch requests of some processors may block the normal memory access requests of other processors, reducing the number of tasks that can run in parallel.

In some related art, automatic data prefetching is implemented by hardware prefetching. In this approach, the hardware functions of the cache module are extended: when a data request is sent to the cache module for a cache block address hit query, the cache module automatically determines the address to be prefetched according to a preset prefetch policy and prefetches data based on that address. For example, the cache module can automatically perform hit queries on the next one or several cache units (referred to as prefetch blocks) following the cache unit corresponding to the read request address; if a prefetch block is missing, a request is sent to the lower-level cache or to the memory. Although this approach does not require the request source to send separate read requests and prefetch requests, and simplifies user programming, prefetching cannot be controlled task by task, and in some scenarios the prefetched data evicts data that is still in use from the cache, resulting in performance that is even worse than without this scheme.

On this basis, an embodiment of the present disclosure provides a data reading method that combines hardware and software: the data used in subsequent loop iterations is prefetched and cached in the storage space of the cache module, so there is no extra register usage overhead and the number of parallel tasks that can be allocated is not affected. Moreover, because the read data request carries the prefetch data length to the cache module, where it is parsed and split, no extra pressure is placed on the request path, so the request bandwidth can be compressed; the prefetch data length can also be configured per task through the instruction, enabling flexible and convenient instruction-granularity prefetch control that better meets the prefetch requirements of each task. In addition, the data reading method is compatible with multiple cache modes, such as a multi-level cache mode and a sector mode.

In some embodiments, the data reading method provided by the embodiments of the present disclosure may be implemented with the following six parts in mind:

First, designing the instruction set.

Here, read data instructions that support prefetch control can be added as an extension of the instruction set. In the embodiments of the present disclosure, the extended read data instruction reuses the read request address and the read data length of an ordinary read instruction in the related art, and adds a prefetch data length modifier. In this way, the prefetch data length describes the length of the data to be prefetched, whose prefetch start address is the read request address plus the read data length, so no separate prefetch instruction and no additional prefetch start address field in the instruction set are needed. Using this extended read data instruction also incurs no extra overhead for computing or transmitting a prefetch start address.

In some implementations, the way the prefetch data length modifier is added to the read data instruction can be chosen from the following two options: 1) type code extension: if the current architecture has enough reserved instruction type codes, the instruction type can distinguish a read instruction (i.e. a read-only type) from a read-with-prefetch instruction; 2) adding a bit field: an additional bit field (corresponding to the prefetch control code) is added to indicate the prefetch data length, and the code 0 can be reserved to indicate that prefetching is disabled (i.e. no data prefetching is performed). These two ways of extending the prefetch data length are shown in Table 1 below.

表1 兩種擴展預取資料長度的方式示意表 Table 1: Two methods of extending the prefetch data length

| 編碼支援方式 Encoding support method | 擴展前的編碼示例 Encoding example before extension | 擴展預取資料長度的編碼示例 Encoding example with extended prefetch data length |
| --- | --- | --- |
| 類型編碼擴充 Type encoding extension | 0’b101:讀 Read;0’b110:保留 Reserved;0’b111:保留 Reserved | 0’b101:讀 Read;0’b110:讀且預取後一快取塊 Read and prefetch the next cache block;0’b111:讀且預取後二快取塊 Read and prefetch the next two cache blocks |
| 新增位域 New bit field | 無 None | 預取控制編碼 Prefetch control code:0’b0:關閉預取 Disable prefetch;0’b1:預取下一記憶體塊 Prefetch the next memory block |

在一些實施方式中,為對指令集做前向相容,預取控制編碼位域的默認值配置為表徵關閉預取的編碼。In some implementations, to be forward compatible with the instruction set, the default value of the prefetch control encoding bit field is configured to indicate that prefetch is disabled.
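As a concrete illustration of the "new bit field" option in Table 1, the following C++ sketch packs and unpacks a read instruction that keeps the usual read request address and read data length and adds a prefetch-control field whose zero value keeps prefetch disabled, matching the forward-compatible default. The field widths, bit positions, and names are illustrative assumptions only, not the encoding of any particular instruction set.

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical read instruction with an added prefetch-control field.
// Field widths and positions are assumptions chosen for illustration.
struct ReadInstr {
    uint8_t  opcode;        // e.g. 0b101 = read (unchanged from the base ISA)
    uint64_t read_addr;     // read request address
    uint32_t read_len;      // read data length, in bytes
    uint8_t  prefetch_ctl;  // 0 = prefetch off (default, forward compatible),
                            // n = prefetch n cache blocks after the read range
};

// Pack into a hypothetical 128-bit container (two 64-bit words).
inline void encode(const ReadInstr& in, uint64_t out[2]) {
    assert(in.prefetch_ctl < 16 && in.read_len < (1u << 24));
    out[0] = in.read_addr;
    out[1] = (uint64_t)in.opcode
           | ((uint64_t)in.read_len     << 8)
           | ((uint64_t)in.prefetch_ctl << 32);
}

inline ReadInstr decode(const uint64_t w[2]) {
    ReadInstr in;
    in.read_addr    = w[0];
    in.opcode       = (uint8_t)(w[1] & 0xFF);
    in.read_len     = (uint32_t)((w[1] >> 8) & 0xFFFFFF);
    in.prefetch_ctl = (uint8_t)((w[1] >> 32) & 0xF);
    return in;
}
```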

第二,對快取模組的介面訊號進行設計。Second, design the interface signals of the cache module.

這裡,快取模組的介面訊號的設計與指令集設計類似。在實施時,快取模組的介面訊號中可以保留原有的讀請求位址、讀資料長度,擴展預取資料長度的修飾。請求源僅需將預取資料長度伴隨讀資料請求發送給下游快取模組,即可將預取資料長度傳遞到快取模組,供快取模組解析拆分。The cache module interface signal design is similar to instruction set design. In implementation, the cache module interface signal can retain the existing read request address and read data length, while extending the prefetch data length modification. The request source simply sends the prefetch data length along with the read data request to the downstream cache module, which then passes the prefetch data length to the cache module for parsing and splitting.

快取模組的介面訊號中擴展的預取資料長度的修飾與指令集設計類似,支援兩類:1)類型訊號編碼擴充:若當前架構請求類型有充裕保留編碼,可藉由請求類型訊號直接區分僅讀請求或讀請求攜帶預取;2)新增預取控制訊號:可以藉由增加額外的預取控制訊號來指示預取資料長度,預取控制訊號可以保留0編碼用於指示關閉預取(即不進行資料預取)。The extended prefetch data length modification in the cache module interface signals is similar to the instruction set design, supporting two categories: 1) Type signal encoding extension: If the current architecture has sufficient reserved encoding for the request type, the request type signal can be used to directly distinguish between read-only requests and read requests with prefetch; 2) New prefetch control signal: The prefetch data length can be indicated by adding an additional prefetch control signal. The prefetch control signal can reserve the encoding of 0 to indicate that prefetch is disabled (i.e., no data prefetch is performed).

在一些實施方式中,介面訊號的設計還可以考慮控制資訊的壓縮以增加資訊熵。In some implementations, the design of interface signals may also consider compression of control information to increase information entropy.
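The "type signal encoding" option for the interface can be sketched in the same spirit: the request bundle keeps the original address and length signals, and the request type alone tells the cache module whether, and how far, to prefetch. The enum values below reuse the example encodings from Table 1; the struct and helper function are illustrative assumptions rather than the actual interface definition.

```cpp
#include <cstdint>

// Request bundle between request source and cache module, "type signal
// encoding" variant: the type carries the prefetch intent, so no separate
// prefetch request or extra signal is needed.
enum class ReqType : uint8_t {
    ReadOnly        = 0b101,  // plain read, no prefetch
    ReadPrefetchOne = 0b110,  // read and prefetch the next cache block
    ReadPrefetchTwo = 0b111,  // read and prefetch the next two cache blocks
};

struct CacheReq {
    ReqType  type;       // prefetch intent rides along with the read
    uint64_t read_addr;  // read request address (unchanged signal)
    uint32_t read_len;   // read data length (unchanged signal)
};

// The cache module derives the prefetch data length from the type signal.
inline uint32_t prefetch_len(const CacheReq& r, uint32_t cache_block_bytes) {
    switch (r.type) {
        case ReqType::ReadPrefetchOne: return 1 * cache_block_bytes;
        case ReqType::ReadPrefetchTwo: return 2 * cache_block_bytes;
        default:                       return 0;  // prefetch disabled
    }
}
```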

第三,單次讀資料請求的請求資料量的影響。Third, the impact of the amount of data requested by a single read data request.

在一些實施方式中,送入快取模組的讀資料請求會按:請求拆分、快取塊對齊、命中檢測的順序進行處理,最終向記憶體請求資料,並返回讀取的資料給上游的請求源、和/或將預取的資料快取至快取模組的快取塊中。In some implementations, read requests sent to the cache module are processed in the following order: request splitting, cache block alignment, and hit detection. The module ultimately requests data from memory, returns the read data to the upstream request source, and/or caches prefetched data into cache blocks of the cache module.

請求拆分:快取模組的入口處將對讀資料長度超過一個快取塊長度的讀資料請求進行按快取塊長度的拆分並對齊操作。對於讀請求位址和讀資料長度對應的位址區間,藉由拆分並對齊操作後,可以得到拆分後的多個讀取子請求。對於預取起始位址(讀請求位址加上讀資料長度後得到)與預取資料長度對應的位址區間,藉由拆分並對齊操作後,可以得到拆分後的多個預取子請求。對於各讀取子請求和各預取子請求,可以獨立進行快取塊命中檢查。Request Splitting: At the entry point of the cache module, read requests whose data length exceeds one cache block length are split and aligned according to the cache block length. The address range corresponding to the read request address and the read data length is split and aligned to obtain multiple read sub-requests. The address range corresponding to the prefetch start address (obtained by adding the read data length to the read request address) and the prefetch data length is split and aligned to obtain multiple prefetch sub-requests. Each read sub-request and prefetch sub-request can be independently checked for cache block hits.

快取塊對齊:快取塊為快取模組中基本的資料分配和管理單元。若請求源發送的單筆讀資料請求對應待讀取的資料量不滿一個快取塊,快取模組將對讀資料請求對應的讀請求位址和讀資料長度對應的位址區間按快取塊長度進行對齊。若請求源發送的單筆讀資料請求對應待預取的資料量不滿一個快取塊,快取模組將對預取起始位址與預取資料長度對應的位址區間按快取塊長度進行對齊。Cache block alignment: A cache block is the basic unit of data allocation and management in the cache module. If the amount of data to be read from a single read request sent by the request source does not fit within a cache block, the cache module will align the address range corresponding to the read request address and the read data length according to the cache block length. If the amount of data to be prefetched from a single read request sent by the request source does not fit within a cache block, the cache module will align the address range corresponding to the prefetch start address and the prefetch data length according to the cache block length.

快取塊命中檢測:拆分後的讀取子請求和預取子請求均會被送入快取塊命中檢查單元進行快取塊命中檢測。如果某個讀取子請求或預取子請求發生快取塊缺失,快取模組將為該缺失的讀取子請求或預取子請求分配快取空間,記錄和更新該讀取子請求或預取子請求,並將缺失的讀取子請求或預取子請求傳輸到下游快取模組或記憶體,並從下游快取模組或記憶體中獲取讀取的資料(對應第一目標資料)或預取的資料(對應第二目標資料),讀取的資料會被返回給請求源,預取的資料會被快取至快取模組的至少一個快取塊中。Cache block hit detection: The split read sub-request and prefetch sub-request are both sent to the cache block hit detection unit for cache block hit detection. If a cache block miss occurs in a read sub-request or prefetch sub-request, the cache module allocates cache space for the missed read sub-request or prefetch sub-request, records and updates the read sub-request or prefetch sub-request, transmits the missed read sub-request or prefetch sub-request to a downstream cache module or memory, and obtains the read data (corresponding to the first target data) or prefetched data (corresponding to the second target data) from the downstream cache module or memory. The read data is returned to the request source, and the prefetched data is cached in at least one cache block of the cache module.
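The splitting and alignment steps can be illustrated with a small helper that, under assumed block sizes and data structures, expands one read request carrying a prefetch data length into cache-block-aligned read sub-requests and prefetch sub-requests; each sub-request would then be hit-checked independently as described above. The SubReq layout and function names are ours, chosen only for the sketch.

```cpp
#include <cstdint>
#include <vector>

// One cache-block-aligned sub-request produced at the cache entry stage.
struct SubReq {
    uint64_t block_addr;   // aligned to the cache block size
    bool     is_prefetch;  // read sub-requests return data; prefetch ones only fill the cache
};

// Split an address range [start, start + len) into block-aligned sub-requests.
static std::vector<SubReq> split(uint64_t start, uint64_t len, bool is_prefetch,
                                 uint64_t block) {
    std::vector<SubReq> out;
    if (len == 0) return out;
    uint64_t first = start / block * block;              // align the first byte down
    uint64_t last  = (start + len - 1) / block * block;  // block holding the last byte
    for (uint64_t a = first; a <= last; a += block)
        out.push_back({a, is_prefetch});
    return out;
}

// One read request carrying a prefetch length yields both kinds of sub-requests;
// misses among them are forwarded to the next cache level or to memory.
static std::vector<SubReq> expand(uint64_t read_addr, uint64_t read_len,
                                  uint64_t prefetch_len, uint64_t block = 128) {
    auto subs = split(read_addr, read_len, /*is_prefetch=*/false, block);
    auto pre  = split(read_addr + read_len, prefetch_len, /*is_prefetch=*/true, block);
    subs.insert(subs.end(), pre.begin(), pre.end());
    return subs;
}
```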

第四,與快取模組支援扇區的模式相容。Fourth, it must be compatible with the cache module's sector-supported mode.

在一些實施方式中,在當前架構對快取支持更細細微性的控制的情況下,允許單次請求是快取塊的子集,比如一個快取塊為128字節,有四個快取扇區(Sector),允許請求源按照32字節(即一個快取扇區的長度)的倍數向快取模組發出讀資料請求,快取塊仍然按照快取塊長度(如128字節)分配。在這種情況下,讀資料請求中對應的預取資料長度可以為快取扇區長度的倍數。In some implementations, where the current architecture supports more granular control over cache, a single request is permitted to be a subset of a cache block. For example, a cache block is 128 bytes and has four cache sectors. This allows the request source to issue read requests to the cache module in multiples of 32 bytes (i.e., the length of a cache sector), while cache blocks are still allocated based on the cache block length (e.g., 128 bytes). In this case, the corresponding prefetched data length in the read request can be a multiple of the cache sector length.
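A minimal sketch of this sector-mode arithmetic, assuming the 128-byte block and 32-byte sector sizes used in the example above (the constants and helper names are ours, not part of the described architecture):

```cpp
#include <cstdint>

// Sector-mode sizes taken from the example: 128-byte blocks, 32-byte sectors.
constexpr uint64_t kBlockBytes  = 128;
constexpr uint64_t kSectorBytes = 32;

// The prefetch data length carried by a read request is expected to be a
// multiple of the sector size; round up if software passes something smaller.
inline uint64_t round_up_to_sector(uint64_t len) {
    return (len + kSectorBytes - 1) / kSectorBytes * kSectorBytes;
}

// A sector request still lands in (and may allocate) its enclosing block.
inline uint64_t enclosing_block(uint64_t sector_addr) {
    return sector_addr / kBlockBytes * kBlockBytes;
}

inline uint64_t sector_index_in_block(uint64_t sector_addr) {
    return (sector_addr % kBlockBytes) / kSectorBytes;  // 0..3 for 128B / 32B
}
```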

第五,與多級快取模式相容。Fifth, it is compatible with multi-level cache mode.

對於多級快取模式的場景,可以考慮請求合併、以及上游快取模組全命中或部分命中的情況。In a multi-level cache scenario, you can consider request merging and situations where the upstream cache module has a full or partial hit.

請求合併的情況:本發明實施例提供的資料讀取方法可以支援請求合併。例如,多級快取中的請求源或上游快取模組發送的每一筆讀資料請求都攜帶預取資料長度,可以由下游快取模組對對應的快取塊/扇區重合的多個讀資料請求進行合併,合併後的請求中僅攜帶一個預取資料長度。由於以讀請求位址加讀資料長度作為預取起始位址,合併後的讀資料請求對應的預取資料會落在以下位址區間內:自第一筆讀資料請求的讀請求位址起,至最靠後的一筆讀資料請求的讀請求位址加上其讀資料長度與預取資料長度後所得的位址止。Request merging: The data reading method provided by the embodiments of the present invention supports request merging. For example, in a multi-level cache, each read request sent by a request source or upstream cache module carries a prefetch data length, and the downstream cache module can merge multiple read requests whose cache blocks/sectors overlap, with the merged request carrying only a single prefetch data length. Because the prefetch start address is the read request address plus the read data length, the prefetch data for the merged read request falls within the address range from the read request address of the first read request up to the address obtained by adding the read data length and the prefetch data length of the last read request to its read request address.
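A simplified sketch of such merging, assuming the requests are already sorted by address and overlap block-wise (the struct layout and function are illustrative, not the actual downstream interface):

```cpp
#include <cstdint>
#include <vector>

// A read request as seen by the downstream cache module.
struct ReadReq {
    uint64_t addr;          // read request address
    uint64_t len;           // read data length
    uint64_t prefetch_len;  // prefetch data length carried with the request
};

// Merge requests assumed to be non-empty, sorted by address, and block-overlapping.
// Because the prefetch start is always addr + len, keeping the prefetch length of
// the last (highest-address) request preserves the intended prefetch window:
// it ends at last.addr + last.len + last.prefetch_len.
inline ReadReq merge(const std::vector<ReadReq>& reqs) {
    ReadReq m = reqs.front();
    const ReadReq& last = reqs.back();
    m.len          = (last.addr + last.len) - m.addr;  // cover first..last read range
    m.prefetch_len = last.prefetch_len;                // a single prefetch length survives
    return m;
}
```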

上游快取模組全命中或部分命中的情況:本發明實施例提供的資料讀取方法中,若請求源發送的讀資料請求對應的多個讀取子請求在上游快取模組中部分命中,則未命中的部分讀取子請求可以將預取資料長度攜帶至下游快取模組;若請求源發送的讀資料請求對應的多個讀取子請求在上游快取模組中全部命中,則對該多個讀取子請求在上游快取模組中分別快取的第一子目標資料進行合併,得到該讀資料請求對應的第一目標資料,將該第一目標資料返回給請求源,並將預取資料長度丟棄。Situation of full or partial hit in the upstream cache module: In the data reading method provided by the embodiment of the present invention, if multiple read sub-requests corresponding to the read data request sent by the request source partially hit in the upstream cache module, the read sub-requests that missed can carry the pre-fetched data length to the downstream cache module; if multiple read sub-requests corresponding to the read data request sent by the request source all hit in the upstream cache module, the first sub-target data cached in the upstream cache module for the multiple read sub-requests are merged to obtain the first target data corresponding to the read data request, and the first target data is returned to the request source, and the pre-fetched data length is discarded.

第六,場景遴選。Sixth, scene selection.

根據計算和訪存的特徵,通常將任務分為運算密集和訪存密集兩類。運算規模與計算並行能力(即吞吐率)的商,被作為計算需要消耗的時間,而計算需要的資料量與訪存頻寬的商被定義為訪存需要消耗的時間。例如,在任務的計算時間大於訪存時間的情況下,可以將該任務劃分為計算密集型任務;在任務的計算時間小於訪存時間的情況下,可以將該任務劃分為訪存密集型任務。Based on the characteristics of computation and memory access, tasks are typically categorized as either compute-intensive or memory-intensive. The quotient of computational scale and computational parallelism (i.e., throughput) is used as computation time, while the quotient of the amount of data required for computation and memory access bandwidth is defined as memory access time. For example, if a task's computation time is greater than its memory access time, it can be classified as compute-intensive; if its computation time is less than its memory access time, it can be classified as memory-intensive.
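This classification can be captured in a few lines; the structure and field names below are an illustrative sketch of the definitions just given, with units left to the caller:

```cpp
// Compute-bound vs. memory-bound classification from the definitions above.
struct TaskProfile {
    double ops;            // total operations of the task
    double throughput;     // compute parallel capability, ops per unit time
    double bytes_needed;   // data volume the computation consumes
    double mem_bandwidth;  // available memory-access bandwidth, bytes per unit time
};

enum class TaskKind { ComputeBound, MemoryBound };

inline TaskKind classify(const TaskProfile& t) {
    double compute_time = t.ops / t.throughput;             // time spent computing
    double memory_time  = t.bytes_needed / t.mem_bandwidth; // time spent fetching data
    return compute_time > memory_time ? TaskKind::ComputeBound
                                      : TaskKind::MemoryBound;
}
```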

圖5B為本發明實施例提供的一種資料讀取方法中各模組之間的交互示意圖。如圖5B所示,請求源521向快取模組522發送表徵讀請求攜帶預取的讀資料請求;快取模組522回應於接收到該讀資料請求,可以向記憶體523發送讀取請求和預取請求,並接收記憶體523返回的讀取資料和預取資料;快取模組522將接收到的讀取資料返回至請求源521,並將預取資料快取至該快取模組522的儲存空間中。其中,可以把從請求源521請求資料到讀資料返回的時間拆成四個階段:延遲0至3。延遲0表示請求源521向快取模組522發送單次請求的預估時延,延遲1表示快取模組522向記憶體523發送單次請求的預估時延,延遲2表示記憶體523向快取模組522返回單筆資料的預估時延,延遲3表示快取模組522向請求源521返回單筆資料的預估時延。Figure 5B is a schematic diagram illustrating the interaction between the various modules in a data reading method provided by an embodiment of the present invention. As shown in Figure 5B , a request source 521 sends a read request, representing a read request with prefetching, to a cache module 522. In response to receiving the read request, cache module 522 can send a read request and a prefetch request to memory 523, and receive the read data and prefetched data returned by memory 523. Cache module 522 returns the received read data to request source 521 and caches the prefetched data into cache module 522's storage space. The time from request source 521 requesting data to the return of the read data can be divided into four phases: delays 0 to 3. Delay 0 represents the estimated latency for request source 521 to send a single request to cache module 522. Delay 1 represents the estimated latency for cache module 522 to send a single request to memory 523. Delay 2 represents the estimated latency for memory 523 to return a single piece of data to cache module 522. Delay 3 represents the estimated latency for cache module 522 to return a single piece of data to request source 521.

在不進行預取的情況下,讀資料的時間=當前任務的請求資料量/目標頻寬+延遲0+延遲1+延遲2+延遲3;其中,目標頻寬為請求源521與快取模組522之間的資料頻寬。Without prefetching, the data reading time = the amount of data requested by the current task / target bandwidth + delay 0 + delay 1 + delay 2 + delay 3; where target bandwidth is the data bandwidth between the request source 521 and the cache module 522.

在進行預取的情況下,由於提前將需要的資料預取回了快取,再次請求該資料的時候會在快取模組命中,並直接返回,因而不會因快取缺失觸發向記憶體的請求延遲,從而不用考慮延遲1和延遲2。因此,在進行預取的情況下,讀資料的時間=當前任務的請求資料量/目標頻寬+延遲0+延遲3。When prefetching, the required data is pre-fetched into the cache. When the data is requested again, it will be found in the cache module and returned directly. Therefore, there is no request delay to memory due to cache misses, and Delay 1 and Delay 2 do not need to be considered. Therefore, when prefetching, the time to read data = the amount of data requested by the current task / the target bandwidth + Delay 0 + Delay 3.

在一些實施方式中,可以在當前任務的預估計算時長滿足如下條件的情況下,採用指令類型編碼或預取控制編碼指示開啟資料預取(即預取資料長度大於0)的讀資料指令,以減少資料讀取耗時對計算效率的影響,從而更好地平衡任務執行過程中的計算耗時和資料訪存耗時: 當前任務的請求資料量/目標頻寬+延遲0+延遲3 < 當前任務的預估計算時長 < 當前任務的請求資料量/目標頻寬+延遲0+延遲1+延遲2+延遲3。 In some implementations, if the estimated computation time of the current task meets the following conditions, an instruction type code or prefetch control code can be used to indicate that data prefetching (i.e., the prefetch data length is greater than 0) is enabled for a read instruction. This reduces the impact of data read time on computation efficiency, thereby better balancing computation time and data access time during task execution: Current task requested data amount/target bandwidth + Delay 0 + Delay 3 < Current task estimated computation time < Current task requested data amount/target bandwidth + Delay 0 + Delay 1 + Delay 2 + Delay 3.

而若當前任務的預估計算時長>當前任務的請求資料量/目標頻寬+延遲0+延遲1+延遲2+延遲3,可以採用指令類型編碼或預取控制編碼指示關閉資料預取(即預取資料長度等於0)的讀資料指令。If the estimated calculation duration of the current task exceeds the current task's requested data volume/target bandwidth + Delay 0 + Delay 1 + Delay 2 + Delay 3, a read instruction can be used to disable data prefetching (i.e., the prefetch data length is equal to 0) using either the instruction type code or the prefetch control code.
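Putting the two inequalities together, a per-task prefetch decision might look like the following sketch, where delay0 to delay3 follow the definitions given for Figure 5B; all inputs are caller-supplied estimates and the function names are ours.

```cpp
// Latency model from Figure 5B; all values are estimates.
struct LatencyModel {
    double delay0;  // request source -> cache module, single request
    double delay1;  // cache module -> memory, single request
    double delay2;  // memory -> cache module, single data return
    double delay3;  // cache module -> request source, single data return
};

// Read time with prefetch enabled: hits in the cache, so delay1/delay2 drop out.
inline double read_time_with_prefetch(double req_bytes, double bandwidth,
                                      const LatencyModel& m) {
    return req_bytes / bandwidth + m.delay0 + m.delay3;
}

// Read time without prefetch: misses go all the way to memory.
inline double read_time_without_prefetch(double req_bytes, double bandwidth,
                                         const LatencyModel& m) {
    return req_bytes / bandwidth + m.delay0 + m.delay1 + m.delay2 + m.delay3;
}

// Enable prefetch only when the estimated compute time lies between the two
// read-time estimates: long enough to hide the prefetch, short enough that
// memory latency still matters.
inline bool should_enable_prefetch(double est_compute_time, double req_bytes,
                                   double bandwidth, const LatencyModel& m) {
    double with_pf    = read_time_with_prefetch(req_bytes, bandwidth, m);
    double without_pf = read_time_without_prefetch(req_bytes, bandwidth, m);
    return est_compute_time > with_pf && est_compute_time < without_pf;
}
```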

以下將藉由一個每次循環請求128字節,循環連續取記憶體中的資料的場景作為例子,介紹本發明實施例提供的資料讀取方法的應用。The following describes the application of the data reading method provided by the embodiment of the present invention by taking a scenario of continuously accessing data from the memory in a loop with 128 bytes requested per loop as an example.

定義請求資料位址為128字節對齊,每次讀取請求可以請求128字節資料(即讀資料長度為128字節),預取資料長度為128字節。圖5C為本發明實施例提供的一種攜帶預取資料長度的讀資料請求的實現示意圖。如圖5C所示,假設下一次循環前,預取的資料已經返回快取中,則在不同循環中的處理過程如下: 首次循環為循環0,循環0中請求源請求起始位址的128字節(快取塊0)和預取的128字節(快取塊1); 循環1中請求源請求快取塊1,快取塊1可以在快取模組命中,以更短的延遲返回至請求源,並向下游快取模組或記憶體發送預取快取塊2的請求; 循環2中請求源請求快取塊2,快取塊2可以在快取模組命中,以更短的延遲返回至請求源,並向下游快取模組或記憶體發送預取快取塊3的請求。 The request data address is defined as 128-byte aligned. Each read request can request 128 bytes of data (i.e., the read data length is 128 bytes), and the prefetch data length is 128 bytes. Figure 5C is a schematic diagram of an implementation of a read data request with prefetch data length provided by an embodiment of the present invention. As shown in Figure 5C, assuming the prefetched data has already been returned to the cache before the next loop, the processing in different loops is as follows: The first loop is loop 0. In loop 0, the request source requests the 128-byte starting address (cache block 0) and the prefetched 128-byte data (cache block 1). In loop 1, the request source requests cache block 1. Cache block 1 can hit the cache module and is returned to the request source with a shorter latency. A request to prefetch cache block 2 is then sent to the downstream cache module or memory. In loop 2, the request source requests cache block 2. Cache block 2 can be hit in the cache module, returned to the request source with a shorter latency, and a request to prefetch cache block 3 is sent to the downstream cache module or memory.

循環3中請求源請求快取塊3,快取塊3可以在快取模組命中,以更短的延遲返回至請求源,循環結束。In loop 3, the request source requests cache block 3. Cache block 3 can be hit in the cache module and returned to the request source with a shorter delay, and the loop ends.

可見,在循環連續取記憶體中的資料的場景中,在流水線滿載的情況下,增加預取功能的支持後,每一次的讀資料請求所請求讀取的資料都會在快取中命中,以更短的延遲返回至請求源。As can be seen, in the scenario of continuously accessing data from memory in a loop, when the pipeline is fully loaded, after adding support for the prefetch function, the data requested by each read request will be hit in the cache and returned to the request source with shorter latency.
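The loop scenario can be mimicked with a small driver that issues one read-with-prefetch request per iteration. Here issue_read() is only a logging placeholder for the real read-with-prefetch instruction, and the 128-byte block size and four iterations follow the example above.

```cpp
#include <cstdint>
#include <cstdio>

constexpr uint64_t kBlock = 128;

// Placeholder for the real read-with-prefetch request; we only log the ranges.
static void issue_read(uint64_t addr, uint64_t read_len, uint64_t prefetch_len) {
    std::printf("read  [%llu, %llu)  prefetch [%llu, %llu)\n",
                (unsigned long long)addr,
                (unsigned long long)(addr + read_len),
                (unsigned long long)(addr + read_len),
                (unsigned long long)(addr + read_len + prefetch_len));
}

int main() {
    uint64_t base = 0;  // assume a 128-byte-aligned start address
    for (int loop = 0; loop < 4; ++loop) {
        uint64_t addr = base + (uint64_t)loop * kBlock;
        // Block `loop` is read; block `loop + 1` is prefetched, except in the
        // final iteration, which does not need to prefetch past the data.
        uint64_t pf = (loop == 3) ? 0 : kBlock;
        issue_read(addr, kBlock, pf);
    }
    return 0;
}
```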

本發明實施例提供一種晶片。圖6為本發明實施例提供的一種晶片的組成結構示意圖。如圖6所示,晶片600包括計算模組610和快取模組620,其中: 計算模組610,用於:獲取讀資料指令;所述讀資料指令用於指示按照讀取資訊進行資料讀取,並按照預取資訊進行資料預取;根據所述讀資料指令獲取第一讀資料請求;所述第一讀資料請求中包含所述讀取資訊和所述預取資訊;向快取模組發送所述第一讀資料請求,以使所述快取模組回應於所述第一讀資料請求,按照所述讀取資訊讀取並返回第一目標資料、以及按照所述預取資訊預取並快取第二目標資料;接收所述快取模組返回的所述第一目標資料; 快取模組620,用於:接收所述計算模組發送的第一讀資料請求,所述第一讀資料請求用於請求按照讀取資訊進行資料讀取,並按照預取資訊進行資料預取;根據所述第一讀資料請求,獲取所述讀取資訊和所述預取資訊;基於所述讀取資訊,從所述快取模組的儲存空間、和/或記憶體中讀取第一目標資料,並將所述第一目標資料返回至所述計算模組;基於所述預取資訊,從所述記憶體中預取第二目標資料,並將所述第二目標資料快取至所述快取模組的至少一個快取單元中。 FIG6 is a schematic diagram showing the structure of a chip according to an embodiment of the present invention. As shown in Figure 6, chip 600 includes a computing module 610 and a cache module 620, wherein: Computing module 610 is configured to: obtain a read instruction; the read instruction is configured to instruct data reading according to read information and data prefetching according to prefetch information; obtain a first read request based on the read instruction; the first read request includes the read information and the prefetch information; send the first read request to the cache module, so that the cache module responds to the first read request by reading and returning first target data according to the read information and prefetching and caching second target data according to the prefetch information; and receive the first target data returned by the cache module; The cache module 620 is configured to: receive a first read request sent by the computing module, the first read request being used to request data reading according to read information and data prefetching according to prefetch information; obtain the read information and prefetch information based on the first read request; read first target data from the cache module's storage space and/or memory based on the read information, and return the first target data to the computing module; and prefetch second target data from the memory based on the prefetch information, and cache the second target data into at least one cache unit of the cache module.

以上處理器實施例的描述,與上述方法實施例的描述是類似的,具有同方法實施例相似的有益效果。在一些實施例中,本發明實施例提供的處理器具有的功能或包含的模組可以用於執行上述方法實施例描述的方法,對於本發明處理器實施例中未披露的技術細節,請參照本發明方法實施例的描述而理解。The description of the processor embodiment described above is similar to the description of the method embodiment described above, and has similar beneficial effects as the method embodiment. In some embodiments, the functions or modules included in the processor provided by the embodiments of the present invention can be used to execute the methods described in the method embodiments described above. For technical details not disclosed in the processor embodiment of the present invention, please refer to the description of the method embodiment of the present invention for understanding.

本發明實施例提供一種電腦設備,包括記憶體和處理器,所述記憶體儲存有可在處理器上運行的電腦程式,所述處理器執行所述程式時實現上述方法中的部分或全部步驟,或者所述處理器包括上述實施例中所述的晶片。An embodiment of the present invention provides a computer device comprising a memory and a processor, wherein the memory stores a computer program that can be run on the processor, and when the processor executes the program, some or all of the steps in the above method are implemented, or the processor comprises the chip described in the above embodiment.

本發明實施例提供一種電腦可讀儲存媒體,其上儲存有電腦程式,該電腦程式被處理器執行時實現上述方法中的部分或全部步驟。所述電腦可讀儲存媒體可以是暫態性的,也可以是非暫態性的。An embodiment of the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements some or all of the steps in the above method. The computer-readable storage medium may be transient or non-transient.

本發明實施例提供一種電腦程式,包括電腦可讀代碼,在所述電腦可讀代碼在電腦設備中運行的情況下,所述電腦設備中的處理器執行用於實現上述方法中的部分或全部步驟。An embodiment of the present invention provides a computer program, comprising computer-readable codes. When the computer-readable codes are executed in a computer device, a processor in the computer device executes the program to implement part or all of the steps in the above method.

本發明實施例提供一種電腦程式產品,所述電腦程式產品包括儲存了電腦程式的非暫態性電腦可讀儲存媒體,所述電腦程式被電腦讀取並執行時,實現上述方法中的部分或全部步驟。該電腦程式產品可以具體藉由硬體、軟體或其結合的方式實現。在一些實施例中,所述電腦程式產品具體體現為電腦儲存媒體,在另一些實施例中,電腦程式產品具體體現為軟體產品,例如軟體發展包(Software Development Kit,SDK)等等。Embodiments of the present invention provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program. When the computer program is read and executed by a computer, some or all of the steps of the above-described method are implemented. The computer program product can be implemented in hardware, software, or a combination thereof. In some embodiments, the computer program product is embodied as a computer storage medium. In other embodiments, the computer program product is embodied as a software product, such as a software development kit (SDK).

這裡需要指出的是:上文對各個實施例的描述傾向於強調各個實施例之間的不同之處,其相同或相似之處可以互相參考。以上設備、儲存媒體、電腦程式及電腦程式產品實施例的描述,與上述方法實施例的描述是類似的,具有同方法實施例相似的有益效果。對於本發明設備、儲存媒體、電腦程式及電腦程式產品實施例中未披露的技術細節,請參照本發明方法實施例的描述而理解。It should be noted that the above descriptions of the various embodiments tend to emphasize the differences between them, and reference can be made to the similarities and similarities between them. The descriptions of the above embodiments of the apparatus, storage medium, computer program, and computer program product are similar to the descriptions of the above-mentioned method embodiments and have similar beneficial effects as the method embodiments. For technical details not disclosed in the apparatus, storage medium, computer program, and computer program product embodiments of the present invention, please refer to the descriptions of the method embodiments of the present invention for an understanding.

圖7為本發明實施例提供的一種電腦設備的硬體實體示意圖,如圖7所示,該電腦設備700的硬體實體包括:處理器701和記憶體702,其中,記憶體702儲存有可在處理器701上運行的電腦程式,處理器701執行程式時實現上述任一實施例的方法中的步驟。FIG7 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present invention. As shown in FIG7 , the hardware structure of the computer device 700 includes a processor 701 and a memory 702. The memory 702 stores a computer program that can be executed on the processor 701. When the processor 701 executes the program, the steps of the method according to any of the above embodiments are implemented.

記憶體702儲存有可在處理器上運行的電腦程式,記憶體702配置為儲存由處理器701可執行的指令和應用,還可以快取處理器701以及電腦設備700中各模組待處理或已經處理的資料(例如,圖像資料、音訊資料、語音通訊資料和視訊通訊資料),可以藉由快閃記憶體(FLASH)或隨機存取記憶體(Random Access Memory,RAM)實現。Memory 702 stores a computer program that can be run on the processor. Memory 702 is configured to store instructions and applications executable by processor 701, and can also cache data to be processed or already processed by processor 701 and the modules in computer device 700 (for example, image data, audio data, voice communication data, and video communication data). It can be implemented by flash memory (FLASH) or random access memory (RAM).

處理器701執行程式時實現上述任一項的資料讀取方法的步驟。處理器701通常控制電腦設備700的總體操作。When the processor 701 executes a program, it implements the steps of any of the above-mentioned data reading methods. The processor 701 generally controls the overall operation of the computer device 700.

本發明實施例提供一種電腦儲存媒體,電腦儲存媒體儲存有一個或者多個程式,該一個或者多個程式可被一個或者多個處理器執行,以實現如上任一實施例的資料讀取方法的步驟。An embodiment of the present invention provides a computer storage medium storing one or more programs. The one or more programs can be executed by one or more processors to implement the steps of the data reading method of any of the above embodiments.

這裡需要指出的是:以上儲存媒體和設備實施例的描述,與上述方法實施例的描述是類似的,具有同方法實施例相似的有益效果。對於本發明儲存媒體和設備實施例中未披露的技術細節,請參照本發明方法實施例的描述而理解。It should be noted that the description of the above storage medium and device embodiments is similar to the description of the above method embodiments and has similar beneficial effects as the method embodiments. For technical details not disclosed in the storage medium and device embodiments of the present invention, please refer to the description of the method embodiments of the present invention for understanding.

上述處理器可以為目標用途積體電路(Application Specific Integrated Circuit,ASIC)、數位訊號處理器(Digital Signal Processor,DSP)、數位訊號處理裝置(Digital Signal Processing Device,DSPD)、可程式化邏輯裝置(Programmable Logic Device,PLD)、現場可程式化閘陣列(Field Programmable Gate Array,FPGA)、中央處理器(Central Processing Unit,CPU)、控制器、微控制器、微處理器中的至少一種。可以理解地,實現上述處理器功能的電子器件還可以為其它,本發明實施例不作具體限定。The processor can be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field programmable gate array (FPGA), a central processing unit (CPU), a controller, a microcontroller, and a microprocessor. It is understood that the electronic device implementing the functions of the processor can also be other types, and the embodiments of the present invention are not specifically limited thereto.

上述電腦儲存媒體/記憶體可以是唯讀記憶體(Read Only Memory,ROM)、可程式化唯讀記憶體(Programmable Read-Only Memory,PROM)、可擦除可程式化唯讀記憶體(Erasable Programmable Read-Only Memory,EPROM)、電可擦除可程式化唯讀記憶體(Electrically Erasable Programmable Read-Only Memory,EEPROM)、磁性隨機存取記憶體(Ferromagnetic Random Access Memory,FRAM)、快閃記憶體(Flash Memory)、磁表面記憶體、光碟、或唯讀光碟(Compact Disc Read-Only Memory,CD-ROM)等記憶體;也可以是包括上述記憶體之一或任意組合的各種終端,如行動電話、電腦、平板設備、個人數位助理等。The computer storage medium/memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferromagnetic random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM), among other memories; it may also be any of various terminals that include one or any combination of the above memories, such as mobile phones, computers, tablet devices, personal digital assistants, etc.

應理解,說明書通篇中提到的「一個實施例」或「一實施例」意味著與實施例有關的特定特徵、結構或特性包括在本發明的至少一個實施例中。因此,在整個說明書各處出現的「在一個實施例中」或「在一實施例中」未必一定指相同的實施例。此外,這些特定的特徵、結構或特性可以任意適合的方式結合在一個或多個實施例中。應理解,在本發明的各種實施例中,上述各步驟/過程的序號的大小並不意味著執行順序的先後,各步驟/過程的執行順序應以其功能和內在邏輯確定,而不應對本發明實施例的實施過程構成任何限定。上述本發明實施例序號僅僅為了描述,不代表實施例的優劣。It should be understood that "one embodiment" or "an embodiment" mentioned throughout the specification means that the specific features, structures or characteristics related to the embodiment are included in at least one embodiment of the present invention. Therefore, "in one embodiment" or "in an embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. In addition, these specific features, structures or characteristics can be combined in one or more embodiments in any suitable manner. It should be understood that in the various embodiments of the present invention, the size of the serial numbers of the above-mentioned steps/processes does not mean the order of execution. The execution order of each step/process should be determined according to its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present invention. The above-mentioned serial numbers of the embodiments of the present invention are for description only and do not represent the advantages and disadvantages of the embodiments.

需要說明的是,在本文中,術語「包括」、「包含」或者其任何其他變體意在涵蓋非排他性的包含,從而使得包括一系列要素的過程、方法、物品或者裝置不僅包括那些要素,而且還包括沒有明確列出的其他要素,或者是還包括為這種過程、方法、物品或者裝置所固有的要素。在沒有更多限制的情況下,由語句「包括一個……」限定的要素,並不排除在包括該要素的過程、方法、物品或者裝置中還存在另外的相同要素。It should be noted that, in this document, the terms "comprise," "include," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such process, method, article, or apparatus. In the absence of further limitations, an element defined by the phrase "comprises a..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.

在本發明所提供的幾個實施例中,應該理解到,所揭露的設備和方法,可以藉由其它的方式實現。以上所描述的設備實施例僅僅是示意性的,例如,所述單元的劃分,僅僅為一種邏輯功能劃分,實際實現時可以有另外的劃分方式,如:多個單元或元件可以結合,或可以集成到另一個系統,或一些特徵可以忽略,或不執行。另外,所顯示或討論的各組成部分相互之間的耦合、或直接耦合、或通訊連接可以是藉由一些介面,設備或單元的間接耦合或通訊連接,可以是電性的、機械的或其它形式的。In the several embodiments provided by the present invention, it should be understood that the disclosed devices and methods can be implemented in other ways. The device embodiments described above are only schematic. For example, the division of the units is only a logical functional division. In actual implementation, there may be other division methods, such as: multiple units or components can be combined, or can be integrated into another system, or some features can be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed can be through some interfaces, and the indirect coupling or communication connection of the devices or units can be electrical, mechanical or other forms.

上述作為分離部件說明的單元可以是、或也可以不是物理上分開的,作為單元顯示的部件可以是、或也可以不是物理單元;既可以位於一個地方,也可以分佈到多個網路單元上;可以根據實際的需要選擇其中的部分或全部單元來實現本實施例方案的目的。The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the present embodiment.

另外,在本發明各實施例中的各功能單元可以全部集成在一個處理單元中,也可以是各單元分別單獨作為一個單元,也可以兩個或兩個以上單元集成在一個單元中;上述集成的單元既可以採用硬體的形式實現,也可以採用硬體加軟體功能單元的形式實現。本領域普通技術人員可以理解:實現上述方法實施例的全部或部分步驟可以藉由程式指令相關的硬體來完成,前述的程式可以儲存於電腦可讀取儲存媒體中,該程式在執行時,執行包括上述方法實施例的步驟;而前述的儲存媒體包括:移動存放裝置、唯讀記憶體(Read Only Memory,ROM)、磁碟或者光碟等各種可以儲存程式碼的媒體。In addition, the various functional units in the various embodiments of the present invention may all be integrated into a single processing unit, each unit may be a separate unit, or two or more units may be integrated into a single unit. The aforementioned integrated units may be implemented in the form of hardware or a combination of hardware and software functional units. A person skilled in the art will appreciate that all or part of the steps of the aforementioned method embodiments may be performed by hardware associated with program instructions. The aforementioned program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the aforementioned method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a mobile storage device, a read-only memory (ROM), a magnetic disk, or an optical disk.

或者,本發明上述集成的單元如果以軟體功能模組的形式實現並作為獨立的產品銷售或使用時,也可以儲存在一個電腦可讀取儲存媒體中。基於這樣的理解,本發明的技術方案本質上或者說對相關技術做出貢獻的部分可以以軟體產品的形式體現出來,該電腦軟體產品儲存在一個儲存媒體中,包括若干指令用以使得一台電腦設備(可以是個人電腦、伺服器、或者網路設備等)執行本發明各個實施例所述方法的全部或部分。而前述的儲存媒體包括:移動存放裝置、ROM、磁碟或者光碟等各種可以儲存程式碼的媒體。Alternatively, if the integrated unit of the present invention is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the portion that contributes to the relevant technology, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes a number of instructions for enabling a computer device (which can be a personal computer, server, or network device, etc.) to execute all or part of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as mobile storage devices, ROMs, magnetic disks, or optical disks.

以上所述,僅為本發明的實施方式,但本發明的保護範圍並不局限於此,任何熟悉本技術領域的技術人員在本發明揭露的技術範圍內,可輕易想到的變化或替換,都應涵蓋在本發明的保護範圍之內。The above is only an embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any changes or substitutions that can be easily thought of by any technical personnel familiar with this technical field within the technical scope disclosed in the present invention should be included in the scope of protection of the present invention.

S101、S102、S103、S104、S111、S112、S113、S114、S115、S121、S122、S123、S125、S201、S202、S203、S204、S205、S211、S212、S213、S214、S221、S231、S232、S301、S302、S303、S304、S305、S306、S311、S401、S402、S403、S404、S411、S421、S423、S431、S432、S441、S442、S443、S451、S452、S461、S462、S471、S472、S473、S481、S482、S491、S492、S493:步驟 511、610:計算模組 512:暫存器 513、522、620:快取模組 514、523、702:記憶體 521:請求源 600:晶片 700:電腦設備 701:處理器 S101, S102, S103, S104, S111, S112, S113, S114, S115, S121, S122, S123, S125, S201, S202, S203, S204, S205, S211, S212, S213, S214, S221, S231, S232, S301, S302, S303, S 304, S305, S306, S311, S401, S402, S403, S404, S411, S421, S423, S431, S432, S441, S442, S443, S451, S452, S461, S462, S471, S472, S473, S481, S482, S491, S492, S493: Steps 511, 610: Computing module 512: Register 513, 522, 620: Cache module 514, 523, 702: Memory 521: Request source 600: Chip 700: Computer device 701: Processor

此處的附圖被併入說明書中並構成本說明書的一部分,這些附圖示出了符合本發明的實施例,並與說明書一起用於說明本發明的技術方案。The drawings herein are incorporated into and constitute a part of the specification. These drawings illustrate embodiments consistent with the present invention and, together with the specification, are used to illustrate the technical solutions of the present invention.

圖1為本發明實施例提供的一種資料讀取方法的實現流程示意圖。FIG1 is a schematic diagram of an implementation process of a data reading method provided by an embodiment of the present invention.

圖2為本發明實施例提供的一種資料讀取方法的實現流程示意圖。FIG2 is a schematic diagram of an implementation process of a data reading method provided by an embodiment of the present invention.

圖3為本發明實施例提供的一種資料讀取方法的實現流程示意圖。FIG3 is a schematic diagram of an implementation process of a data reading method provided by an embodiment of the present invention.

圖4為本發明實施例提供的一種資料讀取方法的實現流程示意圖。FIG4 is a schematic diagram of an implementation process of a data reading method provided by an embodiment of the present invention.

圖5A為本發明實施例提供的一種資料快取的實現架構示意圖。FIG5A is a schematic diagram of an implementation architecture of a data cache provided by an embodiment of the present invention.

圖5B為本發明實施例提供的一種資料讀取方法中各模組之間的交互示意圖。FIG5B is a schematic diagram illustrating the interaction between modules in a data reading method provided by an embodiment of the present invention.

圖5C為本發明實施例提供的一種攜帶預取資料長度的讀資料請求的實現示意圖。FIG5C is a schematic diagram illustrating an implementation of a read data request carrying pre-fetched data length provided by an embodiment of the present invention.

圖6為本發明實施例提供的一種晶片的組成結構示意圖。FIG6 is a schematic diagram of the composition structure of a chip provided by an embodiment of the present invention.

圖7為本發明實施例提供的一種電腦設備的硬體實體示意圖。FIG7 is a schematic diagram of a hardware structure of a computer device provided in an embodiment of the present invention.

S401、S402、S403、S404:步驟 S401, S402, S403, S404: Steps

Claims (23)

一種用於晶片的資料讀取方法,所述資料讀取方法包括以下步驟: 獲取讀資料指令;所述讀資料指令用於指示按照讀取資訊進行資料讀取,並按照預取資訊進行資料預取; 根據所述讀資料指令獲取第一讀資料請求;所述第一讀資料請求中包含所述讀取資訊和所述預取資訊; 向快取模組發送所述第一讀資料請求,以使所述快取模組回應於所述第一讀資料請求,按照所述讀取資訊讀取並返回第一目標資料、以及按照所述預取資訊預取並快取第二目標資料; 接收所述快取模組返回的所述第一目標資料。 A data reading method for a chip comprises the following steps: Obtaining a read instruction; the read instruction instructing data reading according to read information and data prefetching according to prefetch information; Obtaining a first read request based on the read instruction; the first read request including the read information and the prefetch information; Sending the first read request to a cache module, causing the cache module to respond to the first read request by reading and returning first target data according to the read information and prefetching and caching second target data according to the prefetch information; Receiving the first target data returned by the cache module. 如請求項1所述之資料讀取方法,其中,所述讀取資訊包括第一讀請求位址和第一讀資料長度;所述獲取讀資料指令,包括以下步驟: 根據所述第一讀請求位址、所述第一讀資料長度和所述預取資訊進行指令編碼,得到所述讀資料指令。 The data reading method of claim 1, wherein the read information includes a first read request address and a first read data length; and obtaining the read data instruction comprises the following steps: Encoding the instruction based on the first read request address, the first read data length, and the prefetch information to obtain the read data instruction. 如請求項2所述之資料讀取方法,其中,所述讀資料指令中包括指令類型編碼、讀請求位址編碼和讀資料長度編碼; 所述根據所述第一讀請求位址、所述第一讀資料長度和所述預取資訊進行指令編碼,得到所述讀資料指令,包括以下步驟: 從預設的指令類型編碼集合中,確定與所述預取資訊對應的所述指令類型編碼; 對所述第一讀請求位址進行編碼,得到所述讀請求位址編碼; 對所述第一讀資料長度進行編碼,得到所述讀資料長度編碼。 The data reading method of claim 2, wherein the read data instruction includes an instruction type code, a read request address code, and a read data length code; Encoding the instruction based on the first read request address, the first read data length, and the prefetch information to obtain the read data instruction comprises the following steps: Determining the instruction type code corresponding to the prefetch information from a preset set of instruction type codes; Encoding the first read request address to obtain the read request address code; Encoding the first read data length to obtain the read data length code. 如請求項3所述之資料讀取方法,其中,所述根據所述讀資料指令獲取第一讀資料請求,包括以下步驟: 對所述指令類型編碼、所述讀請求位址編碼和所述讀資料長度編碼分別進行解析,得到所述預取資訊、所述第一讀請求位址和所述第一讀資料長度; 根據所述預取資訊、所述第一讀請求位址和所述第一讀資料長度,生成所述第一讀資料請求。 The data reading method of claim 3, wherein obtaining a first read data request based on the read data instruction comprises the following steps: Parsing the instruction type code, the read request address code, and the read data length code to obtain prefetch information, the first read request address, and the first read data length; Generating the first read data request based on the prefetch information, the first read request address, and the first read data length. 
如請求項2所述之資料讀取方法,其中,所述讀資料指令中包括讀請求位址編碼、讀資料長度編碼和預取控制編碼; 所述根據所述第一讀請求位址、所述第一讀資料長度和所述預取資訊進行指令編碼,得到所述讀資料指令,包括以下步驟: 對所述第一讀請求位址進行編碼,得到所述讀請求位址編碼; 對所述第一讀資料長度進行編碼,得到所述讀資料長度編碼; 對所述預取資訊進行編碼,得到所述預取控制編碼。 The data reading method of claim 2, wherein the read data instruction includes a read request address code, a read data length code, and a prefetch control code; Encoding the instruction based on the first read request address, the first read data length, and the prefetch information to obtain the read data instruction comprises the following steps: Encoding the first read request address to obtain the read request address code; Encoding the first read data length to obtain the read data length code; Encoding the prefetch information to obtain the prefetch control code. 如請求項5所述之資料讀取方法,其中,所述根據所述讀資料指令獲取第一讀資料請求,包括以下步驟: 對所述讀請求位址編碼、所述讀資料長度編碼和所述預取控制編碼分別進行解析,得到所述第一讀請求位址、所述第一讀資料長度和所述預取資訊; 根據所述預取資訊、所述第一讀請求位址和所述第一讀資料長度,生成所述第一讀資料請求。 The data reading method of claim 5, wherein obtaining a first read data request based on the read data instruction comprises the following steps: Parsing the read request address code, the read data length code, and the prefetch control code to obtain the first read request address, the first read data length, and the prefetch information; Generating the first read data request based on the prefetch information, the first read request address, and the first read data length. 如請求項4或6所述之資料讀取方法,其中,所述根據所述預取資訊、所述第一讀請求位址和所述第一讀資料長度,生成所述第一讀資料請求,包括以下步驟之一: 根據所述第一讀請求位址、所述第一讀資料長度和所述預取資訊,分別確定讀請求位址訊號、讀資料長度訊號和請求類型訊號,並根據所述讀請求位址訊號、所述讀資料長度訊號和所述請求類型訊號,生成所述第一讀資料請求; 根據所述第一讀請求位址、所述第一讀資料長度和所述預取資訊,分別確定所述讀請求位址訊號、所述讀資料長度訊號和預取控制訊號,並根據所述讀請求位址訊號、所述讀資料長度訊號和所述預取控制訊號,生成所述第一讀資料請求。 The data reading method of claim 4 or 6, wherein generating the first read data request based on the prefetch information, the first read request address, and the first read data length comprises one of the following steps: Determining a read request address signal, a read data length signal, and a request type signal based on the first read request address, the first read data length, and the prefetch information, respectively, and generating the first read data request based on the read request address signal, the read data length signal, and the request type signal; The read request address signal, the read data length signal, and the prefetch control signal are determined based on the first read request address, the first read data length, and the prefetch information, and the first read data request is generated based on the read request address signal, the read data length signal, and the prefetch control signal. 如請求項1至6中任一項所述之資料讀取方法,其中,所述獲取讀資料指令,包括以下步驟: 獲取當前任務的讀取資料量和預估計算時長; 根據所述讀取資料量和所述預估計算時長,確定所述預取資訊; 根據所述預取資訊,確定所述讀資料指令。 The data reading method of any one of claims 1 to 6, wherein obtaining the data reading instruction comprises the following steps: Obtaining the amount of data to be read and the estimated computation time of the current task; Determining the prefetch information based on the amount of data to be read and the estimated computation time; Determining the data reading instruction based on the prefetch information. 
如請求項8所述之資料讀取方法,其中,所述根據所述讀取資料量和所述預估計算時長,確定所述預取資訊,包括以下步驟: 確定在開啟資料預取功能的情況下與所述讀取資料量對應的第一預估讀取時長、以及在關閉所述資料預取功能的情況下與所述讀取資料量對應的第二預估讀取時長; 根據所述預估計算時長與所述第一預估讀取時長和/或所述第二預估讀取時長之間的大小關係,確定所述預取資訊。 The data reading method of claim 8, wherein determining the prefetch information based on the read data amount and the estimated calculation time comprises the following steps: Determining a first estimated reading time corresponding to the read data amount when the data prefetch function is enabled, and a second estimated reading time corresponding to the read data amount when the data prefetch function is disabled; Determining the prefetch information based on a relationship between the estimated calculation time and the first estimated reading time and/or the second estimated reading time. 如請求項9所述之資料讀取方法,其中,所述根據所述預估計算時長與所述第一預估讀取時長和/或所述第二預估讀取時長之間的大小關係,確定所述預取資訊,包括以下步驟: 在所述預估計算時長大於所述第一預估讀取時長且小於所述第二預估讀取時長的情況下,確定所述預取資訊為表徵開啟所述資料預取功能的第一預取資訊。 The data reading method of claim 9, wherein determining the prefetch information based on the relationship between the estimated calculation time and the first estimated reading time and/or the second estimated reading time comprises the following steps: If the estimated calculation time is greater than the first estimated reading time and less than the second estimated reading time, determining the prefetch information as first prefetch information indicating activation of the data prefetch function. 如請求項8所述之資料讀取方法,其中,所述確定在開啟資料預取功能的情況下與所述讀取資料量對應的第一預估讀取時長、以及在關閉所述資料預取功能的情況下與所述讀取資料量對應的第二預估讀取時長,包括以下步驟: 對向所述快取模組發送單次請求的時延、所述快取模組返回單筆資料的時延、所述快取模組向記憶體發送所述單次請求的時延、以及所述記憶體向所述快取模組返回所述單筆資料的時延進行估計,分別得到第一預估時延、第二預估時延、第三預估時延和第四預估時延; 將所述當前任務的讀取資料量與目標頻寬之間的第一比值、所述第一預估時延、以及所述第二預估時延之和,確定為所述第一預估讀取時長;所述目標頻寬為所述第一讀資料請求的請求源與所述快取模組之間的資料頻寬; 將所述第一比值、所述第一預估時延、所述第二預估時延、所述第三預估時延、以及所述第四預估時延之和,確定為所述第二預估讀取時長。 The data reading method of claim 8, wherein determining a first estimated reading duration corresponding to the amount of data to be read when a data prefetch function is enabled, and a second estimated reading duration corresponding to the amount of data to be read when the data prefetch function is disabled, comprises the following steps: Estimating a latency for sending a single request to the cache module, a latency for the cache module to return a single piece of data, a latency for the cache module to send the single request to a memory, and a latency for the memory to return the single piece of data to the cache module, to obtain a first estimated latency, a second estimated latency, a third estimated latency, and a fourth estimated latency, respectively; The first estimated read duration is determined as the sum of a first ratio between the read data volume of the current task and the target bandwidth, the first estimated latency, and the second estimated latency; the target bandwidth is the data bandwidth between the request source of the first read request and the cache module; The second estimated read duration is determined as the sum of the first ratio, the first estimated latency, the second estimated latency, the third estimated latency, and the fourth estimated latency. 
一種用於晶片的資料讀取方法,所述資料讀取方法包括以下步驟: 接收請求源發送的第一讀資料請求,所述第一讀資料請求用於請求按照讀取資訊進行資料讀取,並按照預取資訊進行資料預取; 根據所述第一讀資料請求,獲取所述讀取資訊和所述預取資訊; 基於所述讀取資訊,從快取模組的儲存空間、和/或記憶體中讀取第一目標資料,並將所述第一目標資料返回至所述請求源; 基於所述預取資訊,從所述記憶體中預取第二目標資料,並將所述第二目標資料快取至所述快取模組的至少一個快取單元中。 A data reading method for a chip comprises the following steps: Receiving a first read data request sent by a request source, the first read data request being used to request data reading according to read information and data prefetching according to prefetch information; According to the first read data request, obtaining the read information and prefetch information; Based on the read information, reading first target data from storage space and/or memory of a cache module and returning the first target data to the request source; Based on the prefetch information, prefetching second target data from the memory and caching the second target data into at least one cache unit of the cache module. 如請求項12所述的資料讀取方法,其中,所述預取資訊包括預取資料長度; 所述基於所述預取資訊,從所述記憶體中預取第二目標資料,並將所述第二目標資料快取至所述快取模組的至少一個快取單元中,包括以下步驟: 在所述預取資料長度大於0的情況下,基於第一讀請求位址和第一讀資料長度,確定預取起始位址; 基於所述預取起始位址和所述預取資料長度,從所述記憶體中預取所述第二目標資料,並將所述第二目標資料快取至所述快取模組的至少一個快取單元中。 The data reading method of claim 12, wherein the prefetch information includes a prefetch data length; Prefetching the second target data from the memory based on the prefetch information and caching the second target data into at least one cache unit of the cache module comprises the following steps: If the prefetch data length is greater than 0, determining a prefetch start address based on the first read request address and the first read data length; Prefetching the second target data from the memory based on the prefetch start address and the prefetch data length, and caching the second target data into at least one cache unit of the cache module. 如請求項13所述之資料讀取方法,其中,所述讀取資訊包括所述第一讀請求位址和所述第一讀資料長度; 所述根據所述第一讀資料請求,獲取所述讀取資訊和所述預取資訊,包括以下步驟之一: 對所述第一讀資料請求中的請求類型訊號、讀請求位址訊號和讀資料長度訊號分別進行解析,得到第一請求類型、所述第一讀請求位址和所述第一讀資料長度,並基於所述第一請求類型,確定所述預取資料長度; 對所述第一讀資料請求中的所述讀請求位址訊號、所述讀資料長度訊號和所述預取控制訊號分別進行解析,得到所述第一讀請求位址、所述第一讀資料長度和所述預取資料長度。 The data reading method of claim 13, wherein the read information includes the first read request address and the first read data length; Acquiring the read information and the prefetch information based on the first read data request comprises one of the following steps: Separately parsing the request type signal, the read request address signal, and the read data length signal in the first read data request to obtain the first request type, the first read request address, and the first read data length, and determining the prefetch data length based on the first request type; Separately parsing the read request address signal, the read data length signal, and the prefetch control signal in the first read data request to obtain the first read request address, the first read data length, and the prefetch data length. 
如請求項13所述之資料讀取方法,其中,所述基於所述預取起始位址和所述預取資料長度,從所述記憶體中預取所述第二目標資料,並將所述第二目標資料快取至所述快取模組的至少一個快取單元中,包括以下步驟: 基於所述預取起始位址和所述預取資料長度,確定至少一個預取子請求,每一所述預取子請求對應的資料請求長度與所述快取模組的快取單元長度相等; 針對每一所述預取子請求,基於所述預取子請求,在所述快取模組的所述儲存空間中進行快取單元命中檢測,得到第一檢測結果,並在所述第一檢測結果表徵所述預取子請求對應的快取單元缺失的情況下,基於所述預取子請求,從所述記憶體中預取所述第二目標資料,並將所述第二目標資料快取至所述快取模組的至少一個快取單元中。 The data reading method of claim 13, wherein prefetching the second target data from the memory based on the prefetch start address and the prefetch data length and caching the second target data into at least one cache unit of the cache module comprises the following steps: Determining at least one prefetch sub-request based on the prefetch start address and the prefetch data length, wherein the data request length corresponding to each prefetch sub-request is equal to the cache unit length of the cache module; For each prefetch sub-request, a cache unit hit test is performed in the storage space of the cache module based on the prefetch sub-request to obtain a first test result. If the first test result indicates that the cache unit corresponding to the prefetch sub-request is missed, the second target data is prefetched from the memory based on the prefetch sub-request and cached into at least one cache unit of the cache module. 如請求項13所述之資料讀取方法,其中,所述預取資料長度為快取單元長度的整數倍,所述快取單元包括快取塊或快取扇區。The data reading method as described in claim 13, wherein the length of the pre-fetched data is an integer multiple of the length of the cache unit, and the cache unit includes a cache block or a cache sector. 如請求項12所述之資料讀取方法,其中,所述第一讀資料請求的數量為多個;所述讀取資訊包括第一讀請求位址和第一讀資料長度; 所述基於所述讀取資訊,從快取模組的儲存空間、和/或記憶體中讀取第一目標資料,包括以下步驟: 針對每一所述第一讀資料請求,基於所述第一讀資料請求對應的第一讀請求位址和第一讀資料長度,確定所述第一讀資料請求對應的快取單元; 在所述多個第一讀資料請求分別對應的快取單元重合的情況下,將所述多個第一讀資料請求對應的第一讀請求位址和第一讀資料長度分別合併,得到合併後的第一讀請求位址和第一讀資料長度,並將所述多個第一讀資料請求對應的預取資訊合併,得到合併後的預取資訊; 基於合併後的所述第一讀請求位址和所述第一讀資料長度,從所述快取模組的儲存空間、和/或記憶體中獲取所述第一目標資料; 所述基於所述預取資訊,從所述記憶體中預取第二目標資料,並將所述第二目標資料快取至所述快取模組的至少一個快取單元中,包括以下步驟: 基於合併後的所述預取資訊,從所述記憶體中預取所述第二目標資料,並將所述第二目標資料快取至所述快取模組的至少一個快取單元中。 The data reading method of claim 12, wherein the number of the first read data requests is multiple; the read information includes a first read request address and a first read data length; Reading the first target data from the storage space and/or memory of the cache module based on the read information comprises the following steps: For each first read data request, determining the cache unit corresponding to the first read data request based on the first read request address and the first read data length corresponding to the first read data request; If the cache units corresponding to the multiple first read data requests overlap, merging the first read request addresses and first read data lengths corresponding to the multiple first read data requests to obtain a merged first read request address and first read data length, and merging the prefetch information corresponding to the multiple first read data requests to obtain a merged prefetch information; Based on the merged first read request address and first read data length, retrieving the first target data from the storage space and/or memory of the cache module; Prefetching the second target data from the memory based on the prefetch information and caching the second target data into at least one cache unit of the cache module comprises the following steps: Prefetching the second target data from the memory based on the merged prefetch information and caching the second target data into at least one cache unit of the cache module. 
如請求項17所述之資料讀取方法,其中,所述將所述多個第一讀資料請求對應的預取資訊合併,得到合併後的預取資訊,包括以下步驟: 將所述多個第一讀資料請求中第一讀請求位址最靠後的第一讀資料請求對應的預取資訊,確定為合併後的預取資訊。 The data reading method of claim 17, wherein merging the prefetch information corresponding to the multiple first read data requests to obtain the merged prefetch information comprises the following steps: Determining the prefetch information corresponding to the first read data request with the latest first read request address among the multiple first read data requests as the merged prefetch information. 如請求項12至18中任一項所述之資料讀取方法,其中,所述第一目標資料包括至少一個第一子目標資料,所述快取模組包括第一快取子模組和第二快取子模組; 所述接收請求源發送的第一讀資料請求,包括以下步驟: 所述第一快取子模組接收請求源發送的所述第一讀資料請求; 所述根據所述第一讀資料請求,獲取所述讀取資訊和所述預取資訊,包括以下步驟: 所述第一快取子模組根據所述第一讀資料請求,獲取所述讀取資訊和所述預取資訊;所述讀取資訊包括所述第一讀請求位址和所述第一讀資料長度; 所述基於所述讀取資訊,從快取模組的儲存空間、和/或記憶體中讀取第一目標資料,並將所述第一目標資料返回至所述請求源,包括以下步驟: 所述第一快取子模組基於所述第一讀請求位址和所述第一讀資料長度,確定至少一個讀資料子請求,每一所述讀資料子請求對應的資料請求長度與所述第一快取子模組的快取單元長度相等;所述第一快取子模組從所述第一快取子模組的儲存空間、所述第二快取子模組的儲存空間、和/或所述記憶體中獲取每一所述讀資料子請求分別對應的第一子目標資料,並將各所述第一子目標資料返回至所述請求源; 所述基於所述預取資訊,從所述記憶體中預取第二目標資料,並將所述第二目標資料快取至所述快取模組的至少一個快取單元中,包括以下步驟: 所述第二快取子模組基於所述預取資訊,從所述記憶體中預取所述第二目標資料,並將所述第二目標資料快取至所述第二快取子模組的至少一個快取單元中。 The data reading method of any one of claim items 12 to 18, wherein the first target data includes at least one first sub-target data, and the cache module includes a first cache sub-module and a second cache sub-module; Receiving a first read data request sent by a request source comprises the following steps: The first cache sub-module receives the first read data request sent by the request source; Acquiring the read information and the prefetch information based on the first read data request comprises the following steps: The first cache sub-module acquires the read information and the prefetch information based on the first read data request; the read information includes the first read request address and the first read data length; Reading first target data from the storage space and/or memory of the cache module based on the read information and returning the first target data to the request source comprises the following steps: The first cache submodule determines at least one read data sub-request based on the first read request address and the first read data length, wherein the data request length corresponding to each read data sub-request is equal to the cache unit length of the first cache submodule; the first cache submodule obtains first sub-target data corresponding to each read data sub-request from the storage space of the first cache submodule, the storage space of the second cache submodule, and/or the memory, and returns each first sub-target data to the request source; Prefetching the second target data from the memory based on the prefetch information and caching the second target data into at least one cache unit of the cache module comprises the following steps: The second cache submodule prefetches the second target data from the memory based on the prefetch information and caches the second target data into at least one cache unit of the second cache submodule. 
如請求項19所述之資料讀取方法,其中,所述第一快取子模組從所述第一快取子模組的儲存空間、所述第二快取子模組的儲存空間、和/或所述記憶體中獲取每一所述讀資料子請求分別對應的第一子目標資料,包括以下步驟: 所述第一快取子模組基於每一所述讀資料子請求,分別在所述第一快取子模組的所述儲存空間中進行快取單元命中檢測,得到每一所述讀資料子請求對應的第二檢測結果; 所述第一快取子模組針對每一所述讀資料子請求,在所述讀資料子請求對應的第二檢測結果表徵所述讀資料子請求對應的快取單元命中的情況下,從命中的快取單元中讀取所述讀資料子請求對應的第一子目標資料,並在所述讀資料子請求對應的第二檢測結果表徵所述讀資料子請求對應的快取單元缺失的情況下,基於所述讀資料子請求對應的第二讀請求位址和第二讀資料長度、以及所述預取資訊,向所述第二快取子模組發送第二讀資料請求,並接收所述第二快取子模組返回的所述讀資料子請求對應的第一子目標資料; 所述第二快取子模組基於所述預取資訊,從所述記憶體中預取第二目標資料,包括以下步驟: 所述第二快取子模組回應於接收到所述第一快取子模組發送的第二讀資料請求,對所述第二讀資料請求進行解析,得到第二讀請求位址、第二讀資料長度和所述預取資訊; 所述第二快取子模組基於所述第二讀請求位址和所述第二讀資料長度,從所述第二快取子模組的所述儲存空間、和/或所述記憶體中獲取第一子目標資料,並將所述第一子目標資料返回至所述第一快取子模組; 所述第二快取子模組基於所述預取資訊,從所述記憶體中預取所述第二目標資料,並將所述第二目標資料快取至所述第二快取子模組的至少一個快取單元中。 The data reading method of claim 19, wherein the first cache sub-module obtains first sub-target data corresponding to each read data sub-request from the storage space of the first cache sub-module, the storage space of the second cache sub-module, and/or the memory, comprising the following steps: Based on each read data sub-request, the first cache sub-module performs a cache unit hit check in the storage space of the first cache sub-module to obtain a second check result corresponding to each read data sub-request; For each read data sub-request, if the second detection result corresponding to the read data sub-request indicates that the cache unit corresponding to the read data sub-request is a hit, the first cache sub-module reads the first sub-target data corresponding to the read data sub-request from the hit cache unit. If the second detection result corresponding to the read data sub-request indicates that the cache unit corresponding to the read data sub-request is a miss, the first cache sub-module sends a second read data request to the second cache sub-module based on the second read request address and second read data length corresponding to the read data sub-request, as well as the prefetch information, and receives the first sub-target data corresponding to the read data sub-request returned by the second cache sub-module. The second cache submodule prefetches the second target data from the memory based on the prefetch information, comprising the following steps: In response to receiving a second read data request sent by the first cache submodule, the second cache submodule parses the second read data request to obtain a second read request address, a second read data length, and the prefetch information; Based on the second read request address and the second read data length, the second cache submodule obtains the first subtarget data from the storage space and/or the memory of the second cache submodule, and returns the first subtarget data to the first cache submodule; Based on the prefetch information, the second cache submodule prefetches the second target data from the memory, and caches the second target data into at least one cache unit of the second cache submodule. 
A chip, comprising:
a computing module configured to: acquire a data reading instruction, the data reading instruction being used to instruct data reading according to read information and data prefetching according to prefetch information; acquire a first read data request according to the data reading instruction, the first read data request including the read information and the prefetch information; send the first read data request to a cache module, so that the cache module, in response to the first read data request, reads and returns first target data according to the read information and prefetches and caches second target data according to the prefetch information; and receive the first target data returned by the cache module; and
the cache module, configured to: receive the first read data request sent by the computing module, the first read data request being used to request data reading according to the read information and data prefetching according to the prefetch information; acquire the read information and the prefetch information according to the first read data request; read the first target data from the storage space of the cache module and/or a memory based on the read information, and return the first target data to the computing module; and prefetch the second target data from the memory based on the prefetch information, and cache the second target data into at least one cache unit of the cache module.
A computer device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the steps of the data reading method of any one of claims 1 to 20, or the processor includes the chip of claim 21.
A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the data reading method of any one of claims 1 to 20.
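As a final sketch, the chip claim above can be read as a request source that folds both the read information and the prefetch information derived from a data reading instruction into a single first read data request. The encoding below is hypothetical (the claims do not fix field names, widths or interfaces); it only illustrates that one request suffices to trigger both the read and the prefetch, so no separate prefetch request travels between the computing module and the cache module.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical encoding of the data reading instruction: it names both what to
// read now and what to prefetch for later, as the claims above describe.
struct ReadDataInstruction {
    uint64_t read_addr;       // read information: where to read
    uint32_t read_len;        //                   how much to read
    uint64_t prefetch_addr;   // prefetch information: where to prefetch
    uint32_t prefetch_len;    //                       how much to prefetch
};

// The single request sent from the computing module to the cache module.
struct FirstReadDataRequest {
    uint64_t read_addr;
    uint32_t read_len;
    uint64_t prefetch_addr;
    uint32_t prefetch_len;
};

// Assumed cache-module interface: one call both returns the first target data
// and triggers prefetching of the second target data inside the cache module.
class CacheModule {
public:
    virtual ~CacheModule() = default;
    virtual std::vector<uint8_t> handle(const FirstReadDataRequest& req) = 0;
};

// Computing-module side: derive the request from the instruction, send it,
// and receive the first target data; the prefetch needs no extra request.
std::vector<uint8_t> issue_read(CacheModule& cache, const ReadDataInstruction& ins) {
    FirstReadDataRequest req{ins.read_addr, ins.read_len,
                             ins.prefetch_addr, ins.prefetch_len};
    return cache.handle(req);   // read data returned; prefetch handled by the cache
}
```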
TW113149884A 2023-12-26 2024-12-20 Data reading method for chip, chip, computer equipment and computer-readable storage medium TW202526608A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2023118092609 2023-12-26

Publications (1)

Publication Number Publication Date
TW202526608A true TW202526608A (en) 2025-07-01

Similar Documents

Publication Publication Date Title
KR102074329B1 (en) Storage device and data porcessing method thereof
WO2020199061A1 (en) Processing method and apparatus, and related device
US6981119B1 (en) System and method for storing performance-enhancing data in memory space freed by data compression
TWI651620B (en) Data processing system and method for processing multiple transactions
US20210089343A1 (en) Information processing apparatus and information processing method
US9639471B2 (en) Prefetching according to attributes of access requests
WO2025139618A1 (en) Data reading method for chip, and chip, computer device, storage medium and computer program product
JP6859361B2 (en) Performing memory bandwidth compression using multiple Last Level Cache (LLC) lines in a central processing unit (CPU) -based system
US6578065B1 (en) Multi-threaded processing system and method for scheduling the execution of threads based on data received from a cache memory
JPH10187533A (en) Cache system, processor and method of operating processor
TW200428210A (en) Memory management
CN112905237A (en) Instruction prefetching method, device, equipment and medium
JP5625809B2 (en) Arithmetic processing apparatus, information processing apparatus and control method
US7058767B2 (en) Adaptive memory access speculation
JP2024511768A (en) Method and apparatus for DRAM cache tag prefetcher
KR101689094B1 (en) System cache with sticky removal engine
US8661169B2 (en) Copying data to a cache using direct memory access
WO2025124522A1 (en) Instruction acquisition method, central processing unit, device, medium, and program product
JP7170093B2 (en) Improved read-ahead capabilities for storage devices
KR20100005539A (en) Cache memory system and prefetching method thereof
CN115269492A (en) Streaming data management method and device for reconfigurable processor multi-port cache
CN100552647C (en) Processing module with multi-level cache architecture
TW202526608A (en) Data reading method for chip, chip, computer equipment and computer-readable storage medium
CN111183414A (en) Cache method and system based on service level agreement
US10997077B2 (en) Increasing the lookahead amount for prefetching