
HK1211102B - Method for processing data and processor - Google Patents


Info

Publication number: HK1211102B
Authority: HK (Hong Kong)
Prior art keywords: data, processor, state, LLC, private cache
Application number: HK15111785.9A
Other languages: Chinese (zh)
Other versions: HK1211102A1 (en)
Inventors: 马凌 (Ma Ling), 姚四海 (Yao Sihai), 张磊 (Zhang Lei)
Original assignee: 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority claimed from CN201410117556.9A (patent CN104951240B)
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of HK1211102A1
Publication of HK1211102B


Description

Data processing method and processor
Technical Field
The present invention relates to the field of computers, and in particular, to a data processing method and a processor.
Background
In recent years, processor vendors have been constrained by power consumption and heat, so computers have moved toward multi-core architectures to keep performance growing. To make full use of a multi-core architecture, an application is divided into multiple threads that can each run independently on a single CPU core, so that the program executes in parallel and overall efficiency improves.
An example of a mainstream multi-core design is shown in FIG. 1a and FIG. 1b. FIG. 1a has 16 CPU cores in total, and the cores can access one another through routes (the thick lines in the figure). FIG. 1b shows the structure of each processor, where Ln denotes the first-level cache (L1), or L1 together with the second-level cache (L2), and the last-level cache (LLC) is the final level of cache. Ln is connected to both the LLC and the route, and the LLC directory is connected to the route. Data read from memory is distributed evenly across the LLCs of the processors.
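The description says only that data is distributed evenly across the LLCs and, later, that the directory is reached "according to the address mapping"; it does not give the mapping itself. The Python sketch below assumes the simplest possible one, modulo interleaving of cache-line numbers across the 16 cores of FIG. 1a, with a hypothetical 64-byte line. `home_llc_slice` is an illustrative name, not part of the patent.

```python
NUM_CORES = 16  # FIG. 1a shows 16 cores
LINE_SIZE = 64  # assumed cache-line size in bytes (hypothetical)


def home_llc_slice(addr: int, num_cores: int = NUM_CORES, line_size: int = LINE_SIZE) -> int:
    """Return the core whose LLC slice and directory hold `addr` (assumed interleaving)."""
    line_number = addr // line_size  # every byte of one cache line maps to the same home
    return line_number % num_cores   # consecutive lines rotate across the slices


if __name__ == "__main__":
    per_slice = [0] * NUM_CORES
    for addr in range(0, 1024 * LINE_SIZE, LINE_SIZE):
        per_slice[home_llc_slice(addr)] += 1
    print(per_slice)  # 1024 lines spread as 64 per slice
```

Modulo interleaving is only a stand-in; a real design may hash more address bits to spread traffic, and any deterministic function from line address to slice would play the same role in the method.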
With programs structured this way, data processing must stay synchronized and data must stay consistent, so a synchronization mechanism is needed to serialize access to regions shared between threads. Transactional memory was proposed to improve thread parallelism: when a conflict occurs, a thread can be rolled back to its state before the conflict, which preserves data integrity. Transactional memory has already entered CPU architectures, including IBM's Blue Gene and Intel's Haswell.
After more than 20 years of development, transactional memory can be implemented either in software or in hardware. Software transactional memory executes very inefficiently, whereas hardware transactional memory greatly improves practicality; this document mainly discusses hardware transactional memory.
Transactional memory assumes that threads on multiple cores rarely produce write-read, read-write, or write-write conflicts when they access shared data, so it lets multiple threads execute speculatively in parallel. It hides data updates until commit and rolls back when a conflict is detected, returning the program to its state before the conflict; this improves the performance and scalability of the system without compromising data integrity. However, although transactional memory raises the parallelism of a multi-core system, the probability of conflicts rises with that parallelism, and every rollback caused by a conflict seriously hurts program performance.
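The three conflict kinds can be made concrete with per-transaction read and write sets. The Python sketch below is an illustration only: it assumes each transaction records the cache-line addresses it read and wrote (the names `AccessSet` and `conflict_kinds` are hypothetical) and classifies the overlap between a transaction `a` and a concurrent transaction `b` from `a`'s point of view.

```python
from dataclasses import dataclass, field


@dataclass
class AccessSet:
    """Cache-line addresses touched by one transaction (hypothetical bookkeeping)."""
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def conflict_kinds(a: AccessSet, b: AccessSet) -> set:
    """Classify overlaps between transaction a and a concurrent transaction b."""
    kinds = set()
    if a.writes & b.reads:
        kinds.add("write-read")   # a wrote a line that b read
    if a.reads & b.writes:
        kinds.add("read-write")   # a read a line that b wrote
    if a.writes & b.writes:
        kinds.add("write-write")  # both wrote the same line
    return kinds


if __name__ == "__main__":
    t0 = AccessSet(reads={0x80}, writes={0x40})
    t1 = AccessSet(reads={0x40})
    t2 = AccessSet(reads={0xC0}, writes={0x100})
    print(conflict_kinds(t0, t1))  # {'write-read'}: one of them must roll back
    print(conflict_kinds(t0, t2))  # set(): disjoint, both may commit in parallel
```

The optimistic bet described above is that the empty result is the common case; every non-empty result forces a rollback, and that cost is what the method aims to reduce.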
Early invalidation can speed up the execution of critical sections and greatly reduce the conflicts that arise when shared data is updated. However, it cannot simply be applied to conventional transactional memory: early invalidation changes the global state of the data, which contradicts how transactional memory works (a transaction must hide its state while updating), so the two cannot easily be combined.
Disclosure of Invention
The technical problem addressed by this application is how to reduce rollbacks caused by data conflicts in a multi-core system and speed up the execution of critical sections.
In order to solve the above problem, the present application provides a data processing method, including:
the first processor starts transaction processing and reads the first data into a private cache;
the first processor writes the first data in the private cache and, after the write, starts committing the transactional memory;
if the last change to the first data before the first processor's write was made by a second processor, writing the first data in the private cache of the first processor into the last-level cache (LLC) and invalidating the first data in the private cache of the first processor;
the transactional memory completes the commit.
Further, the method further comprises:
when the first processor is about to write the first data in its private cache: if the last change to the first data before this write was made by a second processor, setting the state of the first data in the private cache to exclusive and modified; if that last change was made by the first processor, setting the state of the first data in the private cache to modified;
after the step of starting to commit the transactional memory and before the commit completes, the method further includes:
if the state of the first data in the private cache of the first processor is exclusive and modified, setting the state of the first data in the LLC directory to exclusive and modified.
Further, the step in which the first processor starts transaction processing and reads the first data into the private cache includes:
S11, the first processor starts transaction processing; if its private cache does not hold the needed first data, it accesses the LLC directory according to the address mapping;
S12, obtaining the state of the first data from the first state indication string of the first data's cache line in the LLC directory; if the state of the first data is modified, performing step S13;
S13, determining the processor holding the latest first data from the data identification bits in the first state indication string; if it is the second processor, fetching the latest first data from the private cache of the second processor into the LLC and changing the state of the first data in the LLC directory to shared; if it is the first processor, proceeding directly to the step in which the first processor writes the first data in its private cache;
S14, reading the first data from the LLC into the private cache of the first processor, and setting the data identification bit corresponding to the first processor in the first state indication string.
Further, step S12 further includes:
if the state of the first data is exclusive and modified, performing step S13';
the method further includes:
S13', changing the state of the first data in the LLC directory to shared and modified, and then performing step S14.
Further, after the step of starting to commit the transactional memory and before the commit completes, the method further includes:
querying each data identification bit of the state indication string of the first data in the LLC directory, and determining whether any data identification bit other than the one corresponding to the first processor is set; if so, invalidating the first data in the processor corresponding to the set bit and resetting that bit.
The application also provides a processor for use in a multi-core processing device, including:
a private cache and a commit unit;
a reading unit, configured to read the first data into the private cache when the processor starts transaction processing;
a write operation unit, configured to write the first data in the private cache and, after the write completes, instruct the commit unit to begin committing the transactional memory;
and an invalidation unit, configured to, after the commit unit starts committing the transactional memory and before the commit completes, write the first data in the private cache into the LLC and invalidate the first data in the private cache of the processor if the last change to the first data before the processor's write was made by another processor.
Further, the processor further includes:
a setting unit, configured to: when the write operation unit is about to write the first data, set the state of the first data in the private cache to exclusive and modified if the last change to the first data before the processor's write was made by another processor, or to modified if that last change was made by the processor itself; and, after the commit unit starts committing the transactional memory and before the commit completes, set the state of the first data in the LLC directory to exclusive and modified if the state of the first data in the private cache of the processor is exclusive and modified.
Further, when the processor starts transaction processing, the reading unit reads the first data into the private cache as follows:
when the processor starts transaction processing and the private cache does not hold the needed first data, the reading unit accesses the LLC directory according to the address mapping and obtains the state of the first data from the first state indication string of the first data's cache line in the LLC directory. If the state of the first data is modified, the reading unit determines the processor holding the latest first data from the data identification bits in the first state indication string. If it is another processor, the reading unit retrieves the latest first data from that processor's private cache into the LLC, changes the state of the first data in the LLC directory to shared, reads the first data from the LLC into the private cache of the processor, sets the data identification bit corresponding to the processor in the first state indication string, and instructs the write operation unit to write the first data in the private cache of the processor. If it is the processor itself, the reading unit directly instructs the write operation unit to write the first data in the private cache of the processor.
Further, the reading unit is further configured to change the state of the first data in the LLC directory to shared and modified when it learns, from the first state indication string of the first data's cache line in the LLC directory located according to the address mapping, that the state of the first data is exclusive and modified.
Further, the invalidation unit is further configured to query each data identification bit of the state indication string of the first data in the LLC directory and determine whether any data identification bit other than the one corresponding to the processor is set; if so, it invalidates the first data in the processor corresponding to the set bit and resets that bit.
At least one embodiment of this application uses a simple prediction mechanism to seamlessly combine early invalidation with existing hardware transactional memory. This improves prediction accuracy and the execution efficiency of transactional memory in critical sections, reduces the time lost to rollbacks caused by data conflicts while transactional memory runs, and thereby improves the performance and scalability of the multi-core system. Of course, a product implementing this application need not achieve all of the above advantages at the same time.
Drawings
FIG. 1a is a schematic diagram of multiple processors in a multi-core parallel system;
FIG. 1b is a schematic diagram of the structure of each processor;
FIG. 2 is a schematic diagram of the status indication string of data in the LLC directory;
FIG. 3 is a flowchart of the data processing method according to the first embodiment;
FIG. 4 is a schematic diagram of the initial status indication string of a data cache line in the LLC directory in the first example of the first embodiment;
FIG. 5 is a flowchart of data processing in the first example of the first embodiment;
FIG. 6 is a schematic diagram of the status indication string of the data cache line in the LLC directory at step 104 in the first example of the first embodiment;
FIG. 7 is a schematic diagram of the status indication string of the data cache line in the LLC directory at step 110 in the first example of the first embodiment;
FIG. 8 is a flowchart of data processing in the second example of the first embodiment;
FIG. 9 is a schematic diagram of the status indication string of the data cache line in the LLC directory at step 210 in the second example of the first embodiment;
FIG. 10 is a flowchart of data processing in the third example of the first embodiment;
FIG. 11 is a schematic diagram of the status indication string of the data cache line in the LLC directory at step 304 in the third example of the first embodiment;
FIG. 12 is a schematic diagram of the status indication string of the data cache line in the LLC directory at step 310 in the third example of the first embodiment.
Detailed Description
The technical solutions of the present application will be described in more detail below with reference to the accompanying drawings and embodiments.
It should be noted that, provided they do not conflict, the embodiments of this application and the features in them may be combined with one another, and such combinations fall within the scope of protection of this application. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
In a typical configuration, a multi-core system may include multiple processors (CPUs), one or more input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
First, the working principle of conventional hardware transactional memory is described.
Hardware transactional memory works as follows. Code first tells the CPU where a transaction region starts and ends, for example with Transaction_start and Transaction_end. The section between them is the transaction region. Once it has executed, the Transaction_end instruction asks the CPU to commit all modified data atomically (i.e., the commit cannot be interrupted or observed partway through). Every memory location read or written during the transaction is monitored to detect write-read, read-write, and write-write conflicts. So that the transaction region can roll back to its initial state when a data conflict occurs, the original data must be preserved before any memory write during the transaction. For example, the corresponding cache line may be copied into a private, invisible cache (such as the first-level cache) and the newly written data stored there. If a conflict occurs, the modified data in the private invisible cache is discarded; if the transaction completes successfully, the newly modified data replaces the original data.
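A toy Python model of this principle follows: writes inside a transaction go to a private buffer, reads see the buffer first, an abort discards the buffer, and the end of the transaction publishes the buffer in one step. The class and method names are hypothetical (they only echo `Transaction_start`/`Transaction_end` above); conflict detection is omitted, and the atomicity of `transaction_end` is assumed rather than modeled.

```python
class ToyHTM:
    """Single-threaded model of buffered (speculative) transactional writes."""

    def __init__(self, memory: dict):
        self.memory = memory  # globally visible data
        self.buffer = None    # private, invisible copy; None outside a transaction

    def transaction_start(self) -> None:
        self.buffer = {}

    def read(self, addr):
        # Inside a transaction, a line already written is served from the buffer.
        if self.buffer is not None and addr in self.buffer:
            return self.buffer[addr]
        return self.memory[addr]

    def write(self, addr, value) -> None:
        if self.buffer is None:
            self.memory[addr] = value  # non-transactional write
        else:
            self.buffer[addr] = value  # memory keeps the original value

    def abort(self) -> None:
        self.buffer = None  # roll back: drop every speculative write

    def transaction_end(self) -> None:
        self.memory.update(self.buffer)  # commit all buffered writes together
        self.buffer = None


if __name__ == "__main__":
    mem = {0x40: 1}
    htm = ToyHTM(mem)
    htm.transaction_start()
    htm.write(0x40, 5)
    print(htm.read(0x40), mem[0x40])  # 5 1  (update hidden from others)
    htm.abort()
    print(mem[0x40])                  # 1    (state before the transaction)
```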
There are two kinds of cache-coherence protocol: broadcast-based and directory-based. Broadcast suits architectures with few cores, whereas directories scale well and suit many-core architectures, so the embodiments of this application are built on a directory-based protocol.
Each data cache line has a status indication string in the LLC directory, as shown in FIG. 2. Tag is the index of the data cache line in the directory, usually the high-order bits of the data address. Status is the status flag, which is one of Exclusive, Modified, Shared, or Invalid. In the Shared state, data can be distributed to the private caches of one or more CPUs. The data identification bits P0 to Pn-1 correspond to the n processors CPU0 to CPUn-1: a bit is "1" when the corresponding CPU holds a copy of the data and "0" otherwise, so the values of P0 to Pn-1 indicate which CPUs' private caches hold the data in the Shared state. In the Modified state, only one CPU can hold the latest data, so exactly one data identification bit is "1" and all others are "0".
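The status indication string of FIG. 2 maps naturally onto a small record: a tag, a status flag, and an n-bit presence vector in which bit i is Pi. The Python sketch below is illustrative; the field names, the integer bit vector, and the `is_consistent` check (which encodes only the single-holder rule for the Modified state stated above) are assumptions, not the patent's hardware layout.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    EXCLUSIVE = "E"
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"


@dataclass
class DirectoryEntry:
    tag: int           # high-order address bits identifying the cache line
    status: Status
    presence: int = 0  # bit i is the data identification bit Pi for CPUi

    def set_bit(self, cpu: int) -> None:
        self.presence |= 1 << cpu

    def clear_bit(self, cpu: int) -> None:
        self.presence &= ~(1 << cpu)

    def holders(self) -> list:
        """CPUs whose identification bit is "1"."""
        return [i for i in range(self.presence.bit_length()) if (self.presence >> i) & 1]

    def is_consistent(self) -> bool:
        # In the Modified state exactly one CPU may hold the latest data.
        if self.status is Status.MODIFIED:
            return len(self.holders()) == 1
        return True


if __name__ == "__main__":
    entry = DirectoryEntry(tag=0x1F, status=Status.MODIFIED)
    entry.set_bit(1)                               # FIG. 4: only P1 is "1"
    print(entry.holders(), entry.is_consistent())  # [1] True
    entry.set_bit(0)                               # a second holder while still Modified
    print(entry.is_consistent())                   # False
```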
Any write operation must transition the data cache line from its current state (any of the four above) to the Modified state. During this transition, an invalidation request must be sent to every CPU that holds the original data: the copies in those CPUs' private caches are invalidated, the data identification bits of the cache line for those CPUs are set to 0, and the writer obtains the latest and only copy of the data. This process is required, for example, when reading modified data or when committing recently modified data in a transactional-memory critical section. The embodiments of this application aim to speed up this process, lengthen the time spent executing in parallel, and reduce the probability of conflicts.
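The transition to Modified can be sketched as follows, with the identification bits modeled as a Python set of CPU indices and each private cache as a dict. This is a simplification under stated assumptions (the function name and data layout are hypothetical, and fetching data for a writer that lacks a copy is omitted); it returns the number of invalidation requests so that the growth of remote traffic with the number of sharers is visible.

```python
def acquire_for_write(entry: dict, line: int, writer: int, private_caches: list) -> int:
    """Make `writer` the sole holder of `line`; return the invalidations sent."""
    sent = 0
    for cpu in sorted(entry["presence"] - {writer}):
        private_caches[cpu].pop(line, None)  # invalidate that CPU's stale copy
        sent += 1                            # one remote request per other holder
    entry["presence"] = {writer}             # only the writer's bit stays "1"
    entry["status"] = "Modified"
    return sent


if __name__ == "__main__":
    line = 0x40
    caches = [{line: 7} for _ in range(4)]   # four CPUs share the line
    entry = {"status": "Shared", "presence": {0, 1, 2, 3}}
    print(acquire_for_write(entry, line, 2, caches))  # 3
    print(entry, [line in c for c in caches])
```

With k sharers the writer pays k - 1 invalidations, which is the per-write cost the embodiments try to avoid paying repeatedly.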
In a first embodiment, a data processing method, shown in FIG. 3, includes:
S1, the first processor starts transaction processing and reads the first data into its private cache;
S2, the first processor writes the first data in the private cache and, after the write, starts committing the transactional memory;
S3, if the last change to the first data before the first processor's write was made by the second processor, writing the first data in the private cache of the first processor into the LLC and invalidating the first data in the private cache of the first processor;
S4, the transactional memory completes the commit.
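Step S3 is where the method departs from a conventional commit. A minimal Python sketch of that decision for a single written line follows; it assumes the last writer has already been identified (for example from the directory's identification bits, as the next paragraph explains), and the function name and dict-based caches are hypothetical.

```python
def commit_written_line(cpu: int, line: int, private_cache: dict, llc: dict, last_writer: int) -> str:
    """Steps S3-S4 for one line written inside the transaction (atomicity assumed)."""
    if last_writer != cpu:
        # S3: another processor changed the line last, so publish the new value
        # in the shared LLC and drop the copy from this CPU's private cache.
        llc[line] = private_cache.pop(line)
        return "LLC"
    # Same processor as last time: conventional commit, keep the private copy.
    return "private"


if __name__ == "__main__":
    private0, llc = {0x40: 42}, {0x40: 7}
    print(commit_written_line(0, 0x40, private0, llc, last_writer=1))  # LLC
    print(llc[0x40], 0x40 in private0)                                 # 42 False
```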
In this embodiment, the processor that last updated the first data before the first processor's write can be determined from the first state indication string of the first data's cache line in the LLC directory. If it is the first processor itself, processing follows the existing scheme. If it is another processor, then, as shown in FIG. 3, after the first processor performs the write, the first data is written back to the LLC and the first data in the first processor's private cache is invalidated.
In this embodiment, the modified data is stored in the LLC rather than in the processor that performed the write, and that processor's copy is invalidated when the transactional memory commits. Subsequent changes to the data therefore operate only on the copy in the LLC, which avoids transferring data between processors and improves system performance. With the method of this embodiment, the time needed to read modified data does not grow rapidly as the number of cores increases, which improves the scalability of the system.
The method of this embodiment may be implemented autonomously by each processor according to execution logic built into the processor in advance, or by adding related instructions to the program to be executed, so that the added instructions control the corresponding processor while the program runs.
In an implementation manner of this embodiment, the method may further include:
when the first processor is about to write the first data in its private cache: if the last change to the first data before this write was made by a second processor, setting the state of the first data in the private cache to exclusive and modified; if that last change was made by the first processor, setting the state of the first data in the private cache to modified;
after the step of starting to commit the transactional memory and before the commit completes, the method may further include:
if the state of the first data in the private cache of the first processor is exclusive and modified, setting the state of the first data in the LLC directory to exclusive and modified.
In this implementation, the case where the first data was last modified by the first processor itself is handled according to the existing scheme (state modified). In the case where the data was last modified by another processor, both the state of the first data in the first processor's private cache and its state in the LLC directory at commit are set to the special "exclusive and modified" state. This makes it clear in subsequent processing that the first data has been written back to the LLC and that the first processor's copy has been invalidated.
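The two state rules can be written as small functions. In the Python sketch below, the string constants stand in for the "modified" and "exclusive and modified" states; the constant and function names are hypothetical.

```python
MODIFIED = "Modified"
MOD_EXCL = "Modified&Exclusive"  # the "exclusive and modified" state


def private_state_on_write(cpu: int, last_writer: int) -> str:
    """State given to the written line in `cpu`'s private cache."""
    return MOD_EXCL if last_writer != cpu else MODIFIED


def directory_state_on_commit(private_state: str) -> str:
    """Status flag recorded in the LLC directory between commit start and completion."""
    return MOD_EXCL if private_state == MOD_EXCL else MODIFIED


if __name__ == "__main__":
    for cpu, last in [(0, 1), (0, 0)]:
        state = private_state_on_write(cpu, last)
        print(cpu, last, state, directory_state_on_commit(state))
```

Keeping "exclusive and modified" distinct from plain Modified is what lets a later reader know that the line already lives in the LLC rather than in the last writer's private cache.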
In an alternative of this embodiment, step S1 may specifically include:
S11, the first processor starts transaction processing; if its private cache does not hold the needed first data, it accesses the LLC directory according to the address mapping;
S12, obtaining the state of the first data from the first state indication string of the first data's cache line in the LLC directory; if the state of the first data is modified, performing step S13;
S13, determining the processor holding the latest first data from the data identification bits in the first state indication string; if it is the second processor, fetching the latest first data from the private cache of the second processor into the LLC and changing the state of the first data in the LLC directory to shared; if it is the first processor, proceeding directly to step S2;
S14, reading the first data from the LLC into the private cache of the first processor, and setting the data identification bit corresponding to the first processor in the first status indication string.
Step S13 in this alternative covers the first time the method of this embodiment is applied while processing transactional memory. If the latest first data still resides in the second processor, which modified it before the method of this embodiment came into use, the first data must first be retrieved from the second processor into the LLC when the first processor reads it. In this case, after the first processor writes the first data, the first data in the second processor must be invalidated.
In this alternative, step S14 and the step of changing the state of the first data to shared may be performed in either order or simultaneously.
In this alternative, step S12 may further include:
if the state of the first data is exclusive and modified, performing step S13';
the method may further include:
S13', changing the state of the first data in the LLC directory to shared and modified, and then performing step S14.
Steps S14 and S13' may be performed in either order or simultaneously.
Step S13' in this alternative applies when transactional memory is processed after the method of this embodiment has already been adopted. In summary, if the original state of the first data being read is modified, it is changed to shared; if the original state is exclusive and modified, it is changed to shared and modified. This distinguishes whether the first data may still exist in other processors, so that a subsequent step can decide whether the first data in other processors needs to be invalidated.
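The read path S11 to S14, including the S13' branch, is sketched below in Python. Assumptions: the directory maps a line address to a dict holding a status string and a set of CPU indices standing in for the identification bits; each private cache and the LLC are dicts; in the Shared state the data is simply read from the LLC. All names are hypothetical. The function also returns the last writer it inferred, which is what the write step needs in order to choose between the two states described earlier.

```python
SHARED, MODIFIED = "Shared", "Modified"
MOD_EXCL, SHARED_MOD = "Modified&Exclusive", "Shared&Modified"


def transactional_read(cpu, line, directory, llc, private_caches):
    """Steps S11-S14 and S13'; returns (value, last_writer or None)."""
    cache = private_caches[cpu]
    if line in cache:                       # private hit: no directory access
        return cache[line], None
    entry = directory[line]                 # S11: miss, consult the home directory
    last_writer = None
    if entry["status"] == MODIFIED:         # S12 -> S13
        (owner,) = entry["presence"]        # exactly one bit is set when Modified
        last_writer = owner
        if owner != cpu:                    # (owner == cpu would have been a hit)
            llc[line] = private_caches[owner][line]  # fetch latest copy into LLC
            entry["status"] = SHARED
    elif entry["status"] == MOD_EXCL:       # S12 -> S13'
        (last_writer,) = entry["presence"]  # last writer's bit; data is in the LLC
        entry["status"] = SHARED_MOD
    cache[line] = llc[line]                 # S14: fill the private cache from the LLC
    entry["presence"].add(cpu)              # S14: set this CPU's identification bit
    return cache[line], last_writer


if __name__ == "__main__":
    directory = {0x40: {"status": MODIFIED, "presence": {1}}}  # FIG. 4
    llc, caches = {}, [{}, {0x40: 99}]
    print(transactional_read(0, 0x40, directory, llc, caches))  # (99, 1)
    print(directory[0x40]["status"], sorted(directory[0x40]["presence"]))  # Shared [0, 1]
```

The second printed line corresponds to the FIG. 6 state: Shared, with P0 and P1 both set, while CPU1 still keeps its copy until commit.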
In this alternative, after the step of starting to commit the transactional memory and before the commit completes, the method may further include:
querying each data identification bit of the status indication string of the first data in the LLC directory and determining whether any set data identification bit (value "1") exists other than the one corresponding to the first processor; if so, invalidating the first data in the processor corresponding to that set bit and resetting the bit (to "0").
In this way, whether or not the method of this embodiment is being applied for the first time, after step S4 completes, the status flag in the first status indication string is exclusive and modified, and among the data identification bits only the bit corresponding to the first processor is "1", indicating that the first processor was the last to modify the first data.
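The commit-time bookkeeping can be sketched in Python under a hypothetical data layout: directory entries as dicts holding a status string and a set of CPU indices, and caches as dicts. The sketch assumes the whole function runs atomically, as the examples below require. Because a set bit may belong to a previous writer that has already invalidated its own copy, the invalidation is written as a no-op when the copy is absent; that handling is an assumption consistent with, but not spelled out in, the text.

```python
MODIFIED, MOD_EXCL = "Modified", "Modified&Exclusive"


def commit_line(cpu, line, private_state, directory, llc, private_caches):
    """Commit one written line for `cpu` (assumed to execute atomically)."""
    entry = directory[line]
    for other in sorted(entry["presence"] - {cpu}):
        private_caches[other].pop(line, None)      # invalidate; no-op if no copy
    entry["presence"] = {cpu}                      # reset every other bit to "0"
    if private_state == MOD_EXCL:
        llc[line] = private_caches[cpu].pop(line)  # write back, then self-invalidate
        entry["status"] = MOD_EXCL
    else:
        entry["status"] = MODIFIED                 # conventional: copy stays private


if __name__ == "__main__":
    directory = {0x40: {"status": "Shared", "presence": {0, 1}}}  # FIG. 6
    llc, caches = {0x40: 99}, [{0x40: 100}, {0x40: 99}]
    commit_line(0, 0x40, MOD_EXCL, directory, llc, caches)
    print(directory[0x40], llc[0x40], caches)  # FIG. 9 state, LLC holds 100, no private copies
```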
The following three examples compare the existing data processing procedure with that of this embodiment.
The first example processes data in the conventional way, as follows:
Assume the data has been modified: CPU1's private cache holds the latest data because the copies in other CPUs were invalidated, and the corresponding LLC directory entry shows that CPU1 holds the latest data and that its current state is Modified. The status indication string of the data in the LLC directory at the beginning is therefore as shown in FIG. 4: the state is Modified, only P1 (corresponding to CPU1) is "1", and all other data identification bits are "0".
The conventional read/write process of CPU0 within a transaction is shown in FIG. 5 and includes steps 101 to 111.
101. CPU0 starts its transaction.
102. CPU0 tries to read the data from its private cache.
103. Because CPU0's private cache does not hold the data, the LLC directory is accessed according to the address mapping.
104. Based on the directory contents, the LLC instructs CPU1 to write the latest data back to memory and, at the same time, retrieves the data from CPU1's private cache into the LLC. The status flag of the corresponding data cache line in the directory is set to Shared, the data identification bits P0 and P1 are set (their values become "1"), and the data is returned to CPU0. The status indication string of the data's cache line in the LLC directory is now as shown in FIG. 6.
105. The data is returned to CPU0's private cache with its cache line in the Shared state, completing the read.
106. The data is cached in CPU0's private cache, and write operations on it (e.g., changes to the shared data) are not propagated to the LLC until the transaction commits.
107. CPU0's transactional memory starts committing.
108. Because the data cache line is in the Shared state, a write must invalidate the copies in other CPUs' private caches, so the LLC directory is queried according to the address mapping.
109. The LLC directory shows that CPU1 holds the data, so the data in CPU1's private cache is invalidated and P1 is reset (its value becomes "0").
110. The status indication string of the data cache line in the LLC directory is set as shown in FIG. 7: the status flag is Modified and the line is exclusive to CPU0 (only P0 is "1"; all other data identification bits are "0").
111. CPU0 completes the transactional memory commit.
To avoid conflicts, steps 107 to 110 must be merged into one atomic operation at commit time.
As a result, the data cache line is now Modified in CPU0's private cache, and the above steps repeat (with CPU0 and CPU1 swapped) when CPU1 operates on the data again. The latency of operations that access other CPUs' private caches (retrieving data from them or invalidating their copies) grows rapidly as the number of CPU cores increases.
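Steps 101 to 111 can be replayed in a few lines of Python to count how often a transaction touches another CPU's private cache. The sketch is a simplification under stated assumptions: a single line named "x", dict-based caches and LLC, a directory dict holding a status string and a set of CPU indices, and an increment standing in for the transaction body. The write-back to memory in step 104 is omitted, and `conventional_transaction` is a hypothetical name.

```python
def conventional_transaction(writer, directory, llc, caches):
    """Steps 101-111 for one read-modify-write of line "x"; returns remote accesses."""
    remote = 0
    if "x" not in caches[writer]:                          # 102-103: private miss
        if directory["status"] == "Modified":
            (owner,) = directory["presence"]
            llc["x"] = caches[owner]["x"]                  # 104: pull latest copy from owner
            remote += 1
            directory["status"] = "Shared"
        directory["presence"].add(writer)
        caches[writer]["x"] = llc["x"]                     # 105: fill private cache
    caches[writer]["x"] += 1                               # 106: speculative private write
    for other in sorted(directory["presence"] - {writer}): # 108-109
        del caches[other]["x"]                             # invalidate the other copy
        remote += 1
    directory["presence"] = {writer}                       # 110 (FIG. 7)
    directory["status"] = "Modified"
    return remote


if __name__ == "__main__":
    directory = {"status": "Modified", "presence": {1}}  # FIG. 4
    llc, caches = {}, [{}, {"x": 1}]
    print([conventional_transaction(cpu, directory, llc, caches) for cpu in (0, 1, 0)])
    # [2, 2, 2]: every hand-off costs a retrieval plus an invalidation
```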
The second example is the first write when data is processed with the method of this embodiment, which seamlessly combines early invalidation with transactional memory to speed up transactional memory operation. Again assume that the data starts in the Modified state in CPU1's private cache, i.e., its initial status indication string in the LLC directory is also as shown in FIG. 4.
The data processing process in this example is shown in FIG. 8 and includes steps 201-211.
Steps 201-205 synchronize steps 101-105. similarly, the status indication string of the data cache line in the directory of the LLC in step 204 is shown in FIG. 6, that is: the status flag is set to Shared, and the data flag bits P0 and P1 are "1".
206. When the data is operated in the private cache of the CPU0, reading and writing of the data during Transaction are all carried out in the private cache of the CPU 0; if the CPU0 is simply doing a read operation, then the state of the data in the CPU 0's private cache is shared; however, if there is a write operation, because the CPU0 sees the last time the data was changed, that is, another CPU (e.g., CPU 1), the state of the data in the private cache of CPU0 is set to exclusive and changed, as shown in step 206 of fig. 8; otherwise (namely the last change is the CPU) is set to be Modified according to the normal operation.
207. CPU0 starts committing the Transaction Memory.
208. If a write operation was performed and the LLC state of the data held in CPU0's private cache is Shared, copies in the private caches of other CPUs may need to be invalidated, so the data identification bits of the state indication string in the LLC directory are queried according to the address mapping. If only a read operation was performed, only the data in CPU0's private cache needs to be invalidated.
209. Because P1 in the LLC directory is "1", CPU1 holds the data, so the data in CPU1's private cache is invalidated and P1 is reset. If the state of the data in CPU0's private cache is Modified & Exclusive, that data must be written back to the corresponding LLC and the copy in CPU0's private cache invalidated at the same time, as shown in step 209 of FIG. 8; if the state is only Modified, the data is retained in CPU0's private cache.
210. The state indication string, in the LLC directory, of the data written by CPU0 is set as shown in FIG. 9: if the state of the data in CPU0's private cache is Modified & Exclusive, the status flag is set to Modified & Exclusive; if it is Modified, the status flag is only changed to Modified. In both cases P0 stays "1" to indicate that CPU0 made the last change to the data.
211. The Transaction Memory completes the commit.
To avoid interference, steps 207-211 need to be merged into an atomic operation at commit time.
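The commit of the second example (steps 207-211) can be sketched in Python as follows. This is a software model under stated assumptions, not the hardware implementation: a `threading.Lock` stands in for the atomicity the text requires, a Python set stands in for the data flag bits, and `Line` and `commit` are hypothetical names. Resetting the committing CPU's own bit on a read-only commit is an assumption; the text only says its copy is invalidated.

```python
import threading
from dataclasses import dataclass, field

SHARED, MODIFIED, MOD_EXCL = "Shared", "Modified", "Modified & Exclusive"


@dataclass
class Line:
    """Hypothetical model of one cache line: LLC copy, directory entry, private copies."""
    llc_value: int
    status: str                # status flag in the LLC directory
    presence: set[int]         # CPUs whose data flag bit Pi is "1"
    private: dict[int, tuple[str, int]] = field(default_factory=dict)  # cpu -> (state, value)
    lock: threading.Lock = field(default_factory=threading.Lock)


def commit(line: Line, cpu: int) -> None:
    """Steps 207-211 for one line, executed as a single atomic section."""
    with line.lock:  # stands in for merging the steps into one atomic operation
        state, value = line.private[cpu]
        if state == SHARED:
            # Step 208, read-only Transaction: invalidate only this CPU's copy.
            del line.private[cpu]
            line.presence.discard(cpu)  # assumption: resetting the bit is not stated
            return
        # Steps 208-209: invalidate every other holder and reset its bit.
        for other in sorted(line.presence - {cpu}):
            line.private.pop(other, None)
            line.presence.discard(other)
        if state == MOD_EXCL:
            # Step 209: write back to the LLC and self-invalidate.
            line.llc_value = value
            del line.private[cpu]
        # Step 210: the status flag follows the private state; P(cpu) stays "1".
        line.status = state
        line.presence.add(cpu)
```

For the state after step 206 of FIG. 8 (CPU0 holds a Modified & Exclusive copy, CPU1 a Shared one), `commit(line, 0)` leaves no private copies, the new value in the LLC, and a Modified & Exclusive directory entry with only P0 set, matching FIG. 9.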
The state of the data in the LLC is now Modified & Exclusive, which means that the data has been changed, and the data flag bit shows that CPU0 made the last change. In this example, however, CPU0 no longer holds the data, so operations by other CPUs will not need to invalidate CPU0's copy in the future.
The third example is a write process performed after the second example when data is processed with the method of this embodiment; as shown in FIG. 10, it comprises steps 301-308.
301. CPU1 starts its Transaction.
302. CPU1 reads the data from its private cache.
303. Because CPU1's private cache does not hold the data, the LLC directory is accessed according to the address mapping. The state indication string of the data's cache line in the LLC directory shows that the data is Modified & Exclusive and was last changed by CPU0. Since CPU1 needs to read the data, the status flag of the data's cache line is changed to Modified & Shared, P1 is set, and P0 is reset, as shown in FIG. 11.
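Step 303 touches only the LLC directory: no other CPU's private cache is accessed, because the last writer already wrote the data back. The Python sketch below models just that directory update, assuming a Modified & Exclusive entry has exactly one data flag bit set (the last writer); `DirEntry` and `lookup_on_miss` are hypothetical names, and reading the data itself from the LLC is not modelled.

```python
from dataclasses import dataclass

MOD_EXCL, MOD_SHARED = "Modified & Exclusive", "Modified & Shared"


@dataclass
class DirEntry:
    """Hypothetical directory entry: status flag plus data flag bits (bit i == Pi)."""
    status: str
    presence: int


def lookup_on_miss(entry: DirEntry, cpu: int) -> int:
    """Step 303: `cpu` misses in its private cache on a Modified & Exclusive line.

    Returns the CPU that made the last change, as recorded by the single set bit.
    """
    if entry.status != MOD_EXCL:
        raise ValueError("this sketch only covers the Modified & Exclusive case")
    if entry.presence == 0 or entry.presence & (entry.presence - 1):
        raise ValueError("expected exactly one data flag bit to be set")
    last_writer = entry.presence.bit_length() - 1
    entry.status = MOD_SHARED
    entry.presence = 1 << cpu  # set P(cpu) and reset the last writer's bit
    return last_writer
```

The returned `last_writer` is what step 305 uses to choose between Modified & Exclusive and Modified if CPU1 then writes the data.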
304. The data is fetched into CPU1's private cache and read.
305. The data is operated on in CPU1's private cache: all reads and writes of the data during the Transaction are carried out in CPU1's private cache. If CPU1 only reads the data, the state of the data in CPU1's private cache is set to Shared. If CPU1 writes the data, it knows from the fetch from the LLC that the data was last modified by CPU0, and therefore changes the state of the data in its private cache to Modified & Exclusive; had it learned at the fetch that the data was last modified by CPU1 itself, it would change the state to Modified.
306. CPU1 starts committing the Transaction Memory.
307. If CPU1 only performed a read operation, only the data in CPU1's private cache needs to be invalidated. If a write operation was performed, then because the LLC state of the data held in CPU1's private cache is Modified & Shared, copies in other CPUs (those whose data identification bit is "1") may need to be invalidated; if all identification bits in the LLC state indication string other than P1 are "0", as they are here, no other CPU holds the data and nothing needs to be invalidated. If the state of the data in CPU1's private cache is Modified & Exclusive, the data must be written back to the LLC and the copy in CPU1's private cache invalidated, as shown in step 307 of FIG. 10; if the state is Modified, the data is simply kept in CPU1's private cache. In both cases the status flag in the state indication string of the cache line in the LLC directory is updated: to Modified & Exclusive in the first case (as shown in FIG. 12) and to Modified in the second. Both cases keep P1 at "1" to indicate that CPU1 made the last change to the data.
308. The Transaction Memory commit is complete.
To avoid interference, steps 306-308 need to be merged into an atomic operation at commit time.
From this point on, every subsequent change to the data only changes the copy in the LLC. This avoids CPU-to-CPU data transfers, greatly shortens processing time within the Transaction Memory, removes the processing time that would otherwise grow linearly with the number of cores, and improves the scalability of the whole system.
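To make the comparison between the first example and the proposed flow concrete, the following Python sketch counts operations that touch another CPU's private cache (a fetch from it or an invalidation of it) when two CPUs alternately run Transactions on one line. It is a simplified model under stated assumptions, not a performance measurement: latency, LLC traffic, and concurrency are ignored, every Transaction is assumed to read and then write the line, and `Line`, `transaction`, and `run` are hypothetical names. The initial state follows FIG. 4 (Modified in CPU1's private cache).

```python
from dataclasses import dataclass, field

MOD, SHR = "Modified", "Shared"
MOD_EXCL, MOD_SHR = "Modified & Exclusive", "Modified & Shared"


@dataclass
class Line:
    status: str
    presence: set[int]
    private: dict[int, str] = field(default_factory=dict)  # cpu -> local state
    remote_ops: int = 0  # fetches from / invalidations of another CPU's cache


def transaction(line: Line, cpu: int, proposed: bool) -> None:
    """One read-then-write Transaction by `cpu`, followed by its commit."""
    # Read phase.
    if cpu in line.private:
        last_writer = cpu  # in this model only the last committer keeps a copy
    elif line.status == MOD:
        (last_writer,) = line.presence
        line.remote_ops += 1          # fetch from the owner's private cache
        line.status = SHR
        line.presence.add(cpu)
    elif line.status == MOD_EXCL:
        (last_writer,) = line.presence
        line.status = MOD_SHR         # step 303: set P(cpu), reset the old bit
        line.presence = {cpu}
    else:
        raise ValueError("state not covered by this sketch")
    # Write phase (steps 206 / 305 apply only to the proposed flow).
    if proposed and last_writer != cpu:
        line.private[cpu] = MOD_EXCL
    else:
        line.private[cpu] = MOD
    # Commit phase: invalidate other holders, then update the directory.
    for other in line.presence - {cpu}:
        line.private.pop(other, None)
        line.remote_ops += 1
    if line.private[cpu] == MOD_EXCL:
        del line.private[cpu]         # written back to the LLC, self-invalidated
        line.status = MOD_EXCL
    else:
        line.status = MOD
    line.presence = {cpu}


def run(proposed: bool, writers: list[int]) -> int:
    # Initial state as in FIG. 4: Modified in CPU1's private cache.
    line = Line(status=MOD, presence={1}, private={1: MOD})
    for cpu in writers:
        transaction(line, cpu, proposed)
    return line.remote_ops


alternating = [0, 1] * 4
print(run(False, alternating), run(True, alternating))  # 16 2
```

Under these assumptions the first-example flow pays two remote operations on every handoff, while the proposed flow pays them only on the first handoff, after which the line moves between CPUs through the LLC.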
The second embodiment provides a processor for use in a multi-core processing device, comprising:
a private cache and a commit unit;
a reading unit, configured to read the first data into the private cache when the processor starts transaction processing;
a write operation unit, configured to perform a write operation on the first data in the private cache and, after the write operation is completed, instruct the commit unit to start committing the transactional memory;
and an invalidation unit, configured to, after the commit unit starts committing the transactional memory and before the commit is completed, write the first data in the private cache into the LLC and invalidate the first data in the private cache of the processor if the last change to the first data before the processor's write operation was made by another processor.
In an implementation manner of this embodiment, the processor may further include:
a setting unit, configured to: when the write operation unit is to write the first data, modify the state of the first data in the private cache to exclusive and changed if the last change to the first data before the processor's write was made by another processor, or to changed if that last change was made by this processor; and, after the commit unit starts committing the transactional memory and before the commit is completed, set the state of the first data in the LLC directory to exclusive and changed if the state of the first data in the private cache of the processor is exclusive and changed.
In an alternative of this embodiment, the reading unit may read the first data into the private cache when the processor starts transaction processing as follows:
when the processor starts transaction processing and the private cache does not hold the required first data, the reading unit accesses the directory of the LLC according to the address mapping and obtains the state of the first data from a first state indication string of the cache line of the first data in the directory of the LLC. If the state of the first data is changed, the reading unit determines, from the data identification bits in the first state indication string, which processor holds the latest first data. If it is another processor, the reading unit retrieves the latest first data from that processor's private cache into the LLC, modifies the state of the first data in the directory of the LLC to shared, reads the first data from the LLC into the private cache of this processor, sets the data identification bit corresponding to this processor in the first state indication string, and instructs the write operation unit to write the first data in the private cache of this processor. If it is this processor, the reading unit directly instructs the write operation unit to write the first data in the private cache of this processor.
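The changed-state branch of the reading unit can be sketched in Python as below. This is a hypothetical software model: `Line` and `read_into_private` are assumed names, the data identification bits are modelled as a Python set, and only the Modified case is modelled; for other states the sketch simply reads from the LLC and sets the reader's bit.

```python
from dataclasses import dataclass, field

MODIFIED, SHARED = "Modified", "Shared"


@dataclass
class Line:
    """Hypothetical model of one line: LLC copy, directory entry, private copies."""
    llc_value: int
    status: str                 # status flag in the LLC directory
    presence: set[int]          # CPUs whose data identification bit is "1"
    private: dict[int, int] = field(default_factory=dict)  # cpu -> cached value


def read_into_private(line: Line, cpu: int) -> int:
    """Reading-unit path when the directory status of the line is Modified."""
    if cpu in line.private:
        # This processor already holds the latest data: go straight to the write.
        return line.private[cpu]
    if line.status == MODIFIED:
        (owner,) = line.presence    # the single set bit names the latest holder
        # Retrieve the latest data from the owner's private cache into the LLC.
        line.llc_value = line.private[owner]
        line.status = SHARED
    # Read from the LLC into this processor's private cache and set its bit.
    line.private[cpu] = line.llc_value
    line.presence.add(cpu)
    return line.private[cpu]
```

Starting from the FIG. 4 state and reading on CPU0 yields the FIG. 6 state: Shared, with both P0 and P1 set.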
In this alternative, the reading unit may be further configured, when it accesses the directory of the LLC according to the address mapping and learns from the first state indication string of the cache line of the first data that the state of the first data is exclusive and changed, to modify the state of the first data in the directory of the LLC to shared and changed.
In this alternative, the invalidation unit may be further configured to query each data identification bit of the state indication string of the first data in the LLC directory and determine whether any data identification bit other than the one corresponding to this processor (P1 in the third example) is set ("1"); if so, it invalidates the first data in the processor corresponding to each set data identification bit and resets that bit (sets it to "0").
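The bit query performed by the invalidation unit amounts to scanning the data identification bits other than the processor's own. A minimal Python sketch, assuming the bits are held in an integer mask (bit i standing for Pi) and that `invalidate` is a hypothetical callback that invalidates the line in one processor's private cache:

```python
from typing import Callable


def invalidate_other_holders(presence: int, cpu: int, invalidate: Callable[[int], None]) -> int:
    """Invalidate every CPU whose data identification bit is set, except `cpu`.

    presence:   data identification bits of the state indication string
    invalidate: hypothetical callback invalidating the line in one CPU's cache
    Returns the updated bits, with at most `cpu`'s own bit still set.
    """
    others = presence & ~(1 << cpu)
    while others:
        low = others & -others            # isolate the lowest set bit
        invalidate(low.bit_length() - 1)  # bit index == CPU number
        others ^= low                     # reset that bit and continue
    return presence & (1 << cpu)
```

When no other bit is set, as in step 307 of the third example, the loop body never runs and no other private cache is touched.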
It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, and the program may be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present application is not limited to any specific form of hardware or software combination.
There are, of course, many other embodiments of the invention that can be devised without departing from the spirit and scope thereof, and it will be apparent to those skilled in the art that various changes and modifications can be made herein without departing from the spirit and scope of the invention.

Claims (10)

1. A method of data processing, comprising:
the first processor starts transaction processing and reads the first data into a private cache;
the first processor writes the first data in the private cache, and starts committing the transactional memory after the first data is written;
writing the first data in the private cache of the first processor into a Last Level Cache (LLC) if the last change of the first data before being written by the first processor is made by a second processor, and invalidating the first data in the private cache of the first processor;
the transactional memory completes committing.
2. The method of claim 1, further comprising:
when the first processor is to write the first data in a private cache, if the last change to the first data before being written by the first processor was made by a second processor, modifying the state of the first data in the private cache to exclusive and changed; if the last change to the first data before being written by the first processor was made by the first processor, modifying the state of the first data in the private cache to changed;
after the step of starting to commit the transactional memory and before the step of completing commit of the transactional memory, the method further comprises the following steps:
if the state of the first data in the private cache of the first processor is exclusive and changed, setting the state of the first data in the LLC directory to exclusive and changed.
3. The method of claim 2, wherein the step in which the first processor starts transaction processing and reads the first data into the private cache comprises:
S11, the first processor starts transaction processing, and if the private cache does not have the required first data, the directory of the LLC is accessed according to the address mapping;
S12, the state of the first data is acquired according to a first state indication string of the cache line of the first data in the directory of the LLC; if the state of the first data is changed, step S13 is performed;
S13, the processor holding the latest first data is determined according to the data identification bits in the first state indication string; if it is the second processor, the latest first data is fetched from the private cache of the second processor into the LLC, and the state of the first data in the directory of the LLC is modified to shared; if it is the first processor, the step in which the first processor writes the first data in the private cache is performed directly;
S14, the first data is read from the LLC into the private cache of the first processor, and the data identification bit corresponding to the first processor in the first state indication string is set.
4. The method according to claim 3, wherein the step S12 further comprises:
if the state of the first data is exclusive and changed, go to step S13';
the method further comprises the following steps:
S13', the state of the first data in the directory of the LLC is modified to shared and changed; step S14 is performed.
5. The method of claim 4, wherein after the step of starting committing the transactional memory and before the step of completing the committing of the transactional memory, further comprising:
querying each data identification bit of the state indication string of the first data in the LLC directory, and judging whether any data identification bit other than the one corresponding to the first processor is set; if so, invalidating the first data in the processor corresponding to each set data identification bit and resetting that data identification bit.
6. A processor for use in a multi-core processing device, comprising:
a private cache and a commit unit;
a reading unit, configured to read the first data into the private cache when the processor starts transaction processing;
a write operation unit, configured to perform a write operation on the first data in the private cache and, after the write operation is completed, instruct the commit unit to start committing the transactional memory;
characterized by further comprising:
an invalidation unit, configured to, after the commit unit starts committing the transactional memory and before the commit is completed, write the first data in the private cache into the LLC and invalidate the first data in the private cache of the processor if the last change to the first data before the processor's write operation was made by another processor.
7. The processor of claim 6, further comprising:
a setting unit, configured to: when the write operation unit is to write the first data, modify the state of the first data in the private cache to exclusive and changed if the last change to the first data before the processor's write was made by another processor, or to changed if that last change was made by this processor; and, after the commit unit starts committing the transactional memory and before the commit is completed, set the state of the first data in the LLC directory to exclusive and changed if the state of the first data in the private cache of the processor is exclusive and changed.
8. The processor of claim 7, wherein the reading unit reads the first data into the private cache when the processor starts transaction processing as follows:
when the processor starts transaction processing and the private cache does not hold the required first data, the reading unit accesses the directory of the LLC according to the address mapping and obtains the state of the first data from a first state indication string of the cache line of the first data in the directory of the LLC; if the state of the first data is changed, the reading unit determines, from the data identification bits in the first state indication string, which processor holds the latest first data; if it is another processor, the reading unit retrieves the latest first data from that processor's private cache into the LLC, modifies the state of the first data in the directory of the LLC to shared, reads the first data from the LLC into the private cache of this processor, sets the data identification bit corresponding to this processor in the first state indication string, and instructs the write operation unit to write the first data in the private cache of this processor; if it is this processor, the reading unit directly instructs the write operation unit to write the first data in the private cache of this processor.
9. The processor of claim 8, wherein:
the reading unit is further configured to modify the state of the first data in the directory of the LLC to shared and changed when, upon accessing the directory of the LLC according to the address mapping, it learns from the first state indication string of the cache line of the first data that the state of the first data is exclusive and changed.
10. The processor of claim 9, wherein:
the invalidation unit is further configured to query each data identification bit of the status indication string of the first data in the LLC directory, and determine whether a set data identification bit exists in addition to the data identification bit corresponding to the processor; and if so, invalidating the first data in the processor corresponding to the set data identification bit and resetting the data identification bit.
HK15111785.9A 2015-12-01 Method for processing data and processor HK1211102B (en)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201410117556.9A (CN104951240B) | 2014-03-26 | 2014-03-26 | A data processing method and processor

Publications (2)

Publication Number | Publication Date
HK1211102A1 | 2016-05-13
HK1211102B | 2019-08-09
