
HK1062342A - Method and apparatus for enabling access to global data by a plurality of codes in an integrated executable for a heterogeneous architecture - Google Patents

Method and apparatus for enabling access to global data by a plurality of codes in an integrated executable for a heterogeneous architecture

Info

Publication number
HK1062342A
Authority
HK
Hong Kong
Prior art keywords
memory
global
executable
data
additional
Prior art date
Application number
HK04105319.9A
Other languages
Chinese (zh)
Inventor
Michael K. Gschwind
Kathryn M. O'Brien
John K. O'Brien
Valentina Salapura
Original Assignee
International Business Machines Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corporation
Publication of HK1062342A

Description

Method and apparatus for enabling access to global data by a plurality of codes in an integrated executable for a heterogeneous architecture
Technical Field
The present invention relates generally to multiprocessing, and more particularly to linking global variables among multiple processors.
Background
In computer technology, parallel processing is very important. Parallel processing typically involves the use of multiple microprocessors connected to the same system to process a batch of data concurrently. Parallel processing systems generally fall into three main types, according to whether they use shared memory, distributed memory, or a combination of both. Typically, shared memory is memory that can be accessed by multiple processors in a single operation, such as a "load" or "read" command. Distributed memory is memory that is localized to a single processor. In other words, each processor may access its own associated memory in a single access operation, but may not access other processors' associated memories in a single operation. Finally, there is hybrid or "heterogeneous" parallel processing, in which there is both shared and distributed memory.
Typically, such a hybrid parallel processor system includes a Reduced Instruction Set Computer (RISC) Main Processor Unit (MPU), such as a PowerPC™ processor, and a special-purpose or "additional" processor unit (APU), e.g., a Synergistic™ APU (SPU). Generally, MPUs are used to execute general-purpose code, which includes complex control flows and coordinates the overall hybrid parallel processing functions. The MPU has access to all system memory. While in one embodiment only one MPU is used, in other embodiments multiple MPUs are used. The APUs are typically used to perform dataflow operations. That is, the APU computes highly repetitive multimedia, graphics, signal, or network processing workloads characterized by a high ratio of computation to control decisions. In conventional hybrid systems, the APUs have no direct access to system memory, and their own memory, i.e., local storage, is typically smaller than the shared memory.
In general, using hybrid systems, while providing high computational performance, presents significant challenges to the programming model. Such problems are associated with APUs. The APU cannot directly address system memory. Thus, code to be run on an APU must first be transferred to the local storage associated with the APU before the code can be executed on the APU. In addition, the APU and the MPU can also have different instruction sets.
In a processing system, such as a hybrid processing system, data needs to be transferred between different parts (e.g., subroutines or functions) of a program. If these subroutines are designed to execute on a processor with direct access to system memory, or are designed to execute entirely on a single processor within a heterogeneous computer system, then the conventional approach of resolving global data addresses by a binder or linker function may be used. As understood by those of ordinary skill in the art, global data is generally defined as data referenced by a plurality of subroutines.
However, conventional linking mechanisms fail to support the function of referencing global variables when global data communication is required between subroutines executing on separate APUs, where an APU has its own local storage, or between a combination of one or more APUs and one or more MPUs. In conventional heterogeneous multiprocessor systems, global data may reside in several locations that are not consistently accessible from subroutines executing on different processors in the system. However, an integrated executable program typically requires access to these global variables from multiple different processors in the system.
It is therefore desirable to be able to access global variables in a hybrid parallel processing system to overcome the limitations of conventional systems.
Disclosure of Invention
The present invention is used to transfer global information between execution environments in a parallel processor architecture. The parallel architecture includes at least one first execution environment, at least one second execution environment, and a memory flow controller. The parallel processor architecture also includes an information transfer function. The information transfer function is operable to instruct the memory flow controller to transfer global information between the at least one first execution environment and the at least one second execution environment.
Brief description of the drawings
For a more complete understanding of the present invention, and the various advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
FIG. 1 generally depicts a distributed processing system environment for transferring global variables from a first execution environment to a second execution environment;
FIG. 2 depicts a method of compiling and linking individual object modules into an integrated executable program, wherein the integrated executable program further includes global variables accessible from the portion to which the object modules are linked;
FIG. 3 generally depicts components of a global variable descriptor used to transfer a global variable from a first execution environment to a second execution environment;
FIG. 4A depicts a pseudo code function for transmitting global variables from the APU to the MPU;
FIG. 4B depicts a pseudo code function for transmitting global variables from the MPU to the APU;
FIG. 5 generally depicts an integrated executable program and communication between system memory and local memory.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be recognized by one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known elements have been illustrated in schematic or block diagram form in order not to obscure the present invention in unnecessary detail. Moreover, for the most part, details concerning network communications, electromagnetic signal techniques, and the like, have been omitted inasmuch as such details are not necessary to obtain a complete understanding of the present invention, and are considered to be within the understanding of persons of ordinary skill in the relevant art.
It should also be noted that, unless indicated otherwise, all functions described herein may be performed in either hardware or software, or some combination thereof. In a preferred embodiment, however, the functions are performed by a processor, such as a computer or an electronic data processor, in accordance with computer program code, software, and/or integrated circuits programmed to perform the functions, unless indicated otherwise.
Referring to FIG. 1, reference numeral 100 generally designates a heterogeneous parallel processing architecture that provides an environment for implementing access to global data by multiple codes in an integrated executable program. The architecture 100 includes a distributed computing environment 110 and a system memory 160, which are electronically connected by an interface 150. The environment 110 includes a plurality of APUs 120, each with its own local store 125. The environment 110 also includes an MPU 130, such as a RISC processor, and its level-one cache 135. In one embodiment, the MPU 130 is connected to the system memory 160 via a signal path 145. In one embodiment, the APU 120 comprises an SPU.
The environment 110 also includes a Memory Flow Controller (MFC) 140. In general, the MFC 140 performs data movement and synchronization functions between the MPU 130 and the APUs 120, and provides for data transfer between the main system memory 160 and the local storage 125. In FIG. 1, the MFC 140 is connected to the system memory 160 through the interface 150.
Generally, the MFC 140 enables movement of information between the system memory 160 and the local storage 125 of the APU 120 at the request of the MPU 130 or the APU 120. Because the APU 120 does not have direct access to the system memory 160, the MFC 140 transfers information between the system memory 160 and the local store 125 of the APU 120 as requested by a transfer function, such as a stub function, running on either the APU 120 or the MPU 130. In one embodiment, the MFC 140 includes a Direct Memory Access (DMA) device.
Architecture 100 is an environment in which an executable program runs, where the executable program references global information, such as global variables. The executable program has a plurality of subroutines that transfer information related to global variables to a first execution environment, such as the local store 125 of the APU 120, and at least one subroutine that operates in a second execution environment, such as the shared memory 160 coupled to the MPU 130. The APU 120 and the MPU 130 communicate with each other through the MFC 140. The subroutines on the APU 120 and the MPU 130 communicate information by accessing and using global information. In one embodiment, a single MPU 130 is used. In another embodiment, multiple MPUs 130 are used.
In one embodiment, a dedicated linker generates an integrated executable program containing one or more linked information transfer functions, such as stub functions, or transfer routines that move the values of global variables used in two or more object files between the local memory 125 and the system memory 160. Typically, each object file contains at least one reference to a global variable, and it may contain several. In one embodiment, the object file invokes a stub function through a subroutine call or a branch-and-link instruction. In another embodiment, the compiler inlines the transfer function at the point where the global variable is referenced.
Generally, an information transfer function, such as a stub function, instructs the MFC 140 to transfer information between two separate execution environments, such as the system memory 160 and the local storage 125. The information transfer function is used by the APU 120 and the MPU 130 to access and transfer global variables. The linker accepts the compiled information transfer functions and typically binds the various modules together with them so that the APU 120 and the MPU 130 can access global information. The call to an information transfer function may be inserted by either a compiler or a human programmer.
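The role of a stub function described above can be sketched in C. This is a minimal simulation under stated assumptions, not the implementation disclosed here: the two byte arrays stand in for the system memory 160 and the local store 125, `mfc_copy` stands in for a request to the MFC 140, and all names, sizes, and the `global_location` layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes; the description does not give any. */
enum { SYSTEM_MEMORY_SIZE = 512, LOCAL_STORE_SIZE = 128 };

static unsigned char system_memory[SYSTEM_MEMORY_SIZE]; /* stands in for system memory 160 */
static unsigned char local_store[LOCAL_STORE_SIZE];     /* stands in for local store 125 */

/* Where one global variable lives in each memory (assumed layout). */
typedef struct {
    size_t system_offset; /* offset in system memory */
    size_t local_offset;  /* offset in the local store */
    size_t length;        /* size in bytes */
} global_location;

/* Stand-in for an MFC request; a real controller would perform a DMA transfer. */
static void mfc_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Return 1 if both copies of the global lie inside their memories. */
static int location_is_valid(const global_location *g)
{
    return g->length <= SYSTEM_MEMORY_SIZE &&
           g->system_offset <= SYSTEM_MEMORY_SIZE - g->length &&
           g->length <= LOCAL_STORE_SIZE &&
           g->local_offset <= LOCAL_STORE_SIZE - g->length;
}

/* Stub: bring the global into the local store before APU code reads it. */
static void stub_fetch_global(const global_location *g)
{
    assert(g != NULL && location_is_valid(g));
    mfc_copy(&local_store[g->local_offset],
             &system_memory[g->system_offset], g->length);
}

/* Stub: write the APU's copy back to its location in system memory. */
static void stub_commit_global(const global_location *g)
{
    assert(g != NULL && location_is_valid(g));
    mfc_copy(&system_memory[g->system_offset],
             &local_store[g->local_offset], g->length);
}
```

In this sketch, code targeted at the APU would call `stub_fetch_global` before reading a global and `stub_commit_global` after modifying it; whether those calls are written by hand or inserted by a compiler is left open, as in the text.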
Turning now to FIG. 2, there are shown steps of a method 200 performed by a binder/compiler when creating an integrated executable that includes global variables. Generally, the data portion of an integrated executable program contains several types of program variables. These program variables are divided into text (code) classes and data classes. In addition, these program variables can be divided into local information and global information. Generally, global information is used to communicate values between different subroutines or objects of a program in architecture 100.
In step 205, a source module written in a programming language, such as C, C++, or assembly language, is compiled by a compiler to produce a plurality of individual object code modules adapted for execution in a plurality of individual execution environments of the architecture 100, such as the APU 120 or the MPU 130.
In general, each compiled code module targeting an APU120 or MPU130 can be the result of a combination of code or data originating from several different compiled object files. Also, multiple executable portions may include the same code or data. In general, step 205 may also consolidate all code or data into one contiguous region, which may also be loaded in one operation into the corresponding local store 125 of the selected APU 120. Examples of modules targeted to run on the selected APU120 are trigonometric functions, matrix operations, square root calculations, and the like. In addition, step 205 merges any files to be compiled into object modules targeted for execution on the MPU 130.
At step 210, the linker links the object code across the multiple object files resulting from step 205. The executable portions used by the APUs 120 include both global and local data and code. At this point, references to global variables between one APU code module and another have been resolved, while references between the APU code modules and the MPU code modules have not yet been resolved.
In addition, at step 210, an external symbol dictionary is generated and modified, typically to record the correspondence between a global variable as used by code targeting the APU 120 and the same global variable as used by code targeting the MPU 130. For example, if the external symbol dictionary has the variables "X" and "Y" for a module targeted for processing on the MPU 130, the external symbol dictionary also has corresponding entries for the module targeted for processing on the APU 120. A global variable assigned to the APU 120 typically also has an identifier that associates the APU 120 copy of the global variable with the MPU 130 copy. For example, the global variables "X" and "Y" are identified as "APU-X" and "APU-Y", thereby allowing the linker both to distinguish the associated variable names and to determine the global linking relationships between these variables.
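The naming relationship in this example can be sketched in C. The `APU-` prefix comes from the example above; the function names and the idea of deriving the alias by simple string prefixing are assumptions made for illustration, since the text does not say how the linker stores or matches the two names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prefix taken from the "APU-X" / "APU-Y" example in the text. */
#define APU_PREFIX "APU-"

/* Write the APU-side alias of a global's name into out.
 * Returns 0 on success, -1 if the buffer is too small. */
static int apu_alias(const char *name, char *out, size_t out_size)
{
    assert(name != NULL && out != NULL);
    int n = snprintf(out, out_size, "%s%s", APU_PREFIX, name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Return 1 if apu_name is the APU-side alias of mpu_name, else 0. */
static int same_global(const char *mpu_name, const char *apu_name)
{
    assert(mpu_name != NULL && apu_name != NULL);
    size_t prefix_len = strlen(APU_PREFIX);
    return strncmp(apu_name, APU_PREFIX, prefix_len) == 0 &&
           strcmp(apu_name + prefix_len, mpu_name) == 0;
}
```

A linker following this convention could keep the two names distinct in its dictionary while still recognizing that both refer to one global variable.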
In step 215, the linker assigns each compiled global data item and code item a corresponding location in the memory map of the local store 125 of the targeted APU 120 module. The memory location in system memory 160 is treated as the original location of a given global variable, which allows the variable to be referenced at its original location without being rewritten when it is stored in the APU 120.
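One way to picture the location assignment of step 215 is a simple sequential allocator over the local store's memory map. The text does not say how locations are chosen, so the strategy below (aligned, first-come sequential placement) and every name in it are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Memory map of one local store (hypothetical representation). */
typedef struct {
    size_t next_free; /* first unassigned offset */
    size_t capacity;  /* total size of the local store in bytes */
} local_store_map;

/* Assign an aligned location of the given size in the local store.
 * align must be a power of two.
 * Returns the assigned offset, or -1 if the item does not fit. */
static long assign_location(local_store_map *m, size_t size, size_t align)
{
    assert(m != NULL);
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Round the next free offset up to the requested alignment. */
    size_t start = (m->next_free + align - 1) & ~(align - 1);
    if (start > m->capacity || size > m->capacity - start) {
        return -1; /* the map is left unchanged on failure */
    }
    m->next_free = start + size;
    return (long)start;
}
```

The returned offset would play the role of the local memory address that the linker records for the item, alongside its original address in system memory.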
In step 220, the linker links the object modules targeted for execution on the APU 120 to the object modules targeted for execution on the MPU 130, thereby creating an integrated executable program. The executable portions targeted at and used by the APUs 120 and the MPU 130 carry well-defined global and local data and code. An external name is assigned to correctly link global variables from module to module. In general, an external name is a label that is visible to all subroutines bound in the integrated module and that helps enable the transfer of external variables. Finally, in step 230, the linked global memory access information of steps 215 and 220 is shared among the various modules, thereby generating a linked integrated executable.
Turning now to FIG. 3, a global information descriptor 300 is shown. The descriptor is used when transferring global information between a first execution environment and a second execution environment, such as between shared memory 160 and local memory 125. Generally, the global information descriptor 300 is used by a transfer function, such as a stub function, which transfers global variables from a first execution environment to a second execution environment, such as from the system memory 160 to the local memory 125. The global information descriptor 300 is constructed by the binder/linker.
In general, the global information descriptor 300 is used by the MFC 140 to transfer information between a memory associated with a first processor and a memory associated with a second processor. For example, using the global information descriptor 300, a global variable such as "x", or a text portion of an executable program, can be transferred from the system memory 160 to the local store 125 of a first APU 120, or from the local store 125 of the first APU 120 to the local store 125 of a second APU 120. In one embodiment, this data transfer is performed at the request of the stub/transfer function.
To effect the transfer of information between the original location of the global variable in the system memory 160 and any one of the additional local stores 125 corresponding to the targeted APU120, the binder/linker uses a number of external names 310. Generally, the external name 310 is visible to all subroutines bound in the integrated module, and this includes subroutines bound to separate execution environments.
In addition to the external name of a global variable, which includes an indication of the correct processor (e.g., the particular MPU 130 or APU 120), the binder/linker generates at least three other mapping labels wherever a global descriptor is used in the integrated executable. These are the address 320 of the global variable in system memory, the length 330 of the global variable, and the address 340 of the global variable in local memory. In another embodiment, multiple addresses 340 are used, one for each subroutine that is linked and targeted to the same APU 120, with separate descriptors for the same global information, one for each external name of the descriptor 300.
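The four descriptor fields named above map naturally onto a C structure. The sketch below is illustrative only: the field types and widths, the structure name, and the `fits_local_store` helper are assumptions, since the text identifies the fields by reference numeral but does not specify their representation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One global information descriptor (field widths are assumed). */
typedef struct {
    const char *external_name; /* external name 310 */
    uint64_t    system_addr;   /* address in system memory, 320 */
    size_t      length;        /* length of the global, 330 */
    uint32_t    local_addr;    /* address in local memory, 340 */
} global_descriptor;

/* Return 1 if the described global fits entirely inside a local
 * store of the given size, else 0. */
static int fits_local_store(const global_descriptor *d, size_t local_store_size)
{
    assert(d != NULL);
    return d->length <= local_store_size &&
           d->local_addr <= local_store_size - d->length;
}
```

A transfer function could consult such a descriptor to obtain the source address, the destination address, and the number of bytes to hand to the memory flow controller.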
In another embodiment, where the information descriptor 300 is to use multiple local memory addresses 340, an alternative approach is to use a level of indirection, i.e., an array of address variables in the system with one element per additional processor. Using an array of address variables requires a function that maintains the validity of these address variables.
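The indirection described above can be sketched as a small table with one address slot per additional processor, together with the bookkeeping needed to keep the slots valid. The number of APUs, the explicit validity flag, and all function names are hypothetical; the text states only that an array is used and that its validity must be maintained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_APUS = 4 }; /* hypothetical processor count */

/* One local-memory address per APU for a single global variable. */
typedef struct {
    uint32_t      local_addr[NUM_APUS];
    unsigned char valid[NUM_APUS]; /* 1 if local_addr[i] is current */
} local_addr_table;

/* Record where the global currently lives in the given APU's local store. */
static void set_local_addr(local_addr_table *t, size_t apu, uint32_t addr)
{
    assert(t != NULL && apu < NUM_APUS);
    t->local_addr[apu] = addr;
    t->valid[apu] = 1;
}

/* Mark the entry stale, e.g. after the local store is reloaded. */
static void invalidate_local_addr(local_addr_table *t, size_t apu)
{
    assert(t != NULL && apu < NUM_APUS);
    t->valid[apu] = 0;
}

/* Look up the address for an APU.
 * Returns 0 and stores the address in *out, or -1 if none is valid. */
static int get_local_addr(const local_addr_table *t, size_t apu, uint32_t *out)
{
    assert(t != NULL && out != NULL);
    if (apu >= NUM_APUS || !t->valid[apu]) {
        return -1;
    }
    *out = t->local_addr[apu];
    return 0;
}
```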
In yet another embodiment, data items, unlike text or code items, do not require the length 330 of the global variable, as these lengths may be obtained by other means by a computer programmer or compiler. However, when the data is a portion of bound code for execution on the APU 120, the length 330 is required in the global descriptor 300.
Generally, a programmer may use the values of fields 320, 330, and 340 to move global information between the first and second execution environments, either explicitly, by directly specifying the fields and the names of the required labels, or implicitly, by using compiler-provided built-in functions (i.e., intrinsics) that are expanded into the appropriate references.
Turning now to FIG. 4A, C programming language pseudocode is disclosed for transferring global information from the local store 125 of the APU 120 to the system memory 160. FIG. 4A generally discloses a plurality of global variables used in the architecture 100 by the "drawFigure()" and "displayLitFigure()" routines. The variables "figure", "illumination", "litFigure" and "worldViewTransform" are external variables as well as global variables. Those skilled in the art will appreciate that in FIG. 4A, because the variable declarations occur outside of any subroutine, the variables are global and are located in the final data portion. Looking briefly at FIG. 4B, the keyword "extern" indicates that these variables are global variables and are not found in the data portion of FIG. 4B. Referring again to FIG. 4A, the global variables in FIG. 4A are all associated with the system memory 160 of FIG. 1.
The first file, in FIG. 4A, is to be executed on the MPU 130, and the second file, in FIG. 4B, is to be executed on the APU 120. In general, the code disclosed in FIGS. 4A and 4B allows code to be transferred between the MPU 130 and the APU 120 via a transfer/stub function. In one embodiment, the code is written by a programmer, or is generated by a compiler or other tool from a high-level description written by a programmer.
In the architecture 100, there are multiple memory locations, such as those in local memory 125, that are not addressable from all processors. Thus, the binder for the architecture 100 adds a new function. For each external variable, at least two addresses are required. One memory address corresponds to the global variable in system memory 160 (generally corresponding to 320 of FIG. 3), and one corresponds to a potential address in local memory 125 (generally corresponding to 340 of FIG. 3). The binder allocates the memory space, so it knows both addresses. The memory addresses are stored in an external symbol dictionary entry in the object file. In one embodiment, the memory addresses are stored in an external symbol dictionary of the executable file.
In yet another embodiment, for global variables, this is all the information needed, as the length can be calculated by the programmer. However, in another embodiment, the binder typically provides the length 330 to another object file that is to reference it. Using length 330, the program of fig. 4B can itself be treated as a form of external/global data.
For ease of explanation of FIGS. 4A and 4B, three address names are given. The first two names correspond directly to the required addresses. The third name is the actual global variable name, whose interpretation depends on the context (i.e., its interpretation is equivalent to one of the first two names).
For example, three names correspond to the variable "figure" of FIG. 4A. These are _MPU_figure (the address of figure in system memory 160, corresponding to 320 of FIG. 3), _APU_figure (the address of figure in local store 125, corresponding to 340 of FIG. 3), and figure itself, which is equivalent to one of the first two names depending on the program in which it appears.
Similar logic may be extended to the global variable APU_TL, the address of the specific fill (charge) of the configuration in the local store 125. In general, a fill may be defined as the result of compiling a file, such as the second file, and binding it with the subroutines it calls. For a file with a fill, the binder also provides the length 330 to the other object files that reference it. For a fill, the internal name is not derived directly from any external name in the program, but is specified by the programmer during the linking process.
In one embodiment, the name APU_TL generally refers to a fill. In FIG. 4A, _MPU_APU_TL is the address of this fill in system memory 160, _APU_APU_TL is the address of this fill in local storage, and length_APU_TL is the length (size) of this fill. In the embodiment shown in FIGS. 4A and 4B, all three input variables are transferred from the MPU program to the APU program. _MPU_APU_TL generally corresponds to address 320, length_APU_TL generally corresponds to length 330, and _APU_APU_TL generally corresponds to address 340.
Turning now to FIG. 5, a generalized depiction of the communication between an integrated executable 505, which includes global data and the location of such global data in system memory 160, and local memory 125 is depicted in one embodiment 500.
In FIG. 5, the integrated executable program includes an External Symbol Dictionary (ESD) 510. Typically, the ESD 510 is created during linking, allowing individual modules to access global data and allowing the executable program residing in the system memory 160, i.e., MPU code and data 520, to transfer information between the modules. The ESD 510 includes the names of global information, as well as symbols to be shared among modules, such as "figure," "illumination," "litFigure," and the like.
The ESD 510 also includes, for each module, a length (len) and the address, MPU_addr, at which the image of the module is to be stored in system memory 160. The ESD 510 also includes the address, APU_addr, at which the image of each module is to be stored in local storage.
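A dictionary with these fields can be sketched as an array of entries searched by name. The field names `len`, `mpu_addr`, and `apu_addr` follow the labels used above; the entry layout, the linear search, and the `esd_lookup` name are assumptions for illustration, and any addresses used with it would be made-up values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One external symbol dictionary entry (layout is assumed). */
typedef struct {
    const char *name;     /* shared symbol name, e.g. "litFigure" */
    size_t      len;      /* length of the item */
    uint64_t    mpu_addr; /* MPU_addr: address in system memory */
    uint32_t    apu_addr; /* APU_addr: address in local storage */
} esd_entry;

/* Find the entry for a symbol by name.
 * Returns a pointer to the entry, or NULL if the name is absent. */
static const esd_entry *esd_lookup(const esd_entry *esd, size_t count,
                                   const char *name)
{
    assert(esd != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(esd[i].name, name) == 0) {
            return &esd[i];
        }
    }
    return NULL;
}
```

Because each entry carries both addresses, a single lookup gives a transfer function everything it needs to move the item in either direction.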
The integrated executable 505 includes global data 538. In general, global data 538 is data that may be accessed from multiple separate modules at runtime. In the illustrated embodiment, the global data 538 includes global data figure 540, global variable illumination 542, and global variable worldViewTransform 546. The contents of the global data 538 are referenced by a plurality of labels. One label is used by the MPU 130 and the other label is used by the APU 120. For example, the global variable "figure" of FIG. 4A is referenced by multiple names. The binder links two differently named variables together so that they reference the same global variable. Both the names and the link information are stored in the ESD 510.
Integrated executable 505 also includes a copy of the individual modules to be stored in system memory 160. In the illustrated embodiment, MPU code and data 520 are stored in MPU system memory 160. MPU code and data 520 includes a drawFigure symbol 523 and a displayLitFigure symbol 525. In MPU code and data 520, the drawFigure symbol 523 and the displayLitFigure symbol 525 use global data figure 540 and global data litFigure 544, respectively.
Integrated executable 505 also includes a copy of at least one module that may be copied to local storage 125. In the illustrated embodiment, APU code and data 530 is available for storage in the MPU system memory 160. APU code and data 530 includes a transformAndLightFigure symbol 532, a computeColor symbol 534, and an applyTransform symbol 536. In one embodiment, transformAndLightFigure 532 uses the global variable worldViewTransform 546.
System memory 160 is also illustrated in FIG. 5. System memory 160 contains an image 550 of the module to be transferred to local memory 125, as well as MPU code and data 560. The symbol APU_TL is stored in memory location 550 and includes information about and references to system memory module 530. Global data 538 is also stored in system memory 160. These global variables, figure 540, illumination 542, litFigure 544, and worldViewTransform 546, may all be modified by MPU code and data 560, and ultimately by code running in local memory 125.
FIG. 5 also illustrates MPU code and data 560 used in the system memory 160. This is a copy of the MPU code and data 520 from the integrated executable 505. MPU code and data 560 contains the symbols drawFigure 523 and displayLitFigure 525.
FIG. 5 also illustrates the contents of local memory 125. In the illustrated embodiment, at runtime, local memory 125 also stores a copy of global data 538. Local store 125 also holds executable APU code and data 570, a copy of image 550. The executable APU code and data 570 has access to global data figure 540, illumination 542, litFigure 544, and worldViewTransform 546.
It is understood that the present invention can take many forms and embodiments. Accordingly, various modifications may be made in the foregoing without departing from the spirit or scope of the invention. The functionality outlined herein allows for implementation with a variety of possible programming models. The disclosures made herein are not intended to be limited to any particular programming model, but are to be construed as providing the basic mechanism for implementing such programming models.
Having thus described the present invention by reference to certain of its preferred embodiments, it is to be understood that the embodiments disclosed are illustrative rather than limiting in nature and that a wide range of variations, modifications, changes, and substitutions are contemplated in the foregoing disclosure. In some instances, some features of the present invention may be employed without a corresponding use of the other features. Many such variations and modifications may be considered obvious and desirable by those skilled in the art in light of the above description of the preferred embodiments. It is appropriate, therefore, that the claims appended hereto be construed broadly and in a manner consistent with the scope of the invention.

Claims (21)

1. A parallel processor architecture for transferring global information from a first execution environment to a second execution environment, the architecture comprising:
at least one first execution environment;
at least one second execution environment;
a memory flow controller; and
an information transfer function, wherein the information transfer function is operable to instruct the memory flow controller to transfer global information between the at least one first execution environment and the at least one second execution environment.
2. A parallel processor architecture according to claim 1, characterized in that the first execution environment is a shared memory.
3. A parallel processor architecture according to claim 1, characterized in that the second execution environment is a local memory.
4. A parallel processor architecture according to claim 3, characterized in that the memory flow controller is further adapted to transfer global data between the plurality of local memories.
5. A parallel processor architecture according to claim 1, characterized in that the local memory is associated with an additional processor unit.
6. A parallel processor architecture according to claim 1, characterized in that the memory flow controller further comprises a direct memory access device.
7. An integrated executable system usable on at least one main processor and at least one additional processor, the integrated executable system comprising:
at least one executable module executable on at least one main processor;
at least one additional executable module operable to execute on at least one additional processor, wherein the main and additional executable program portions are linked as an integrated executable program; and
at least one global variable, wherein indicia of the global variable is available for transfer between the linked at least one additional executable program module and the at least one main executable program module.
8. A system according to claim 7, characterized in that at least one additional executable program module is loaded into a continuous memory of the local memory.
9. The system of claim 7, wherein the at least one additional executable program is operable to be loaded into the local memory in a load operation.
10. The system of claim 7, wherein the plurality of additional executable portions are linked prior to binding the plurality of additional executable portions to the main executable portion.
11. The system of claim 7, wherein the at least one additional executable module and the at least one main executable module share data via a transfer function.
12. The system of claim 7, wherein the at least one additional executable module is derived from a plurality of source files.
13. The system of claim 7, wherein the at least one primary processor executable module is compiled from a plurality of source files.
14. A method of linking a main executable module and an additional executable module to enable access to global information in separate execution areas, the method comprising:
allocating global data to memory locations in a shared system memory;
allocating the global data to memory locations in a local memory;
creating a tag representing an address of the global data in the shared system memory; and
creating a tag representing an address of the global data in the local memory.
15. The method of claim 14, further comprising creating a tag indicating a length of the global data.
16. The method of claim 14, further comprising creating an external name for the global variable.
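The linking steps of claims 14 to 16 can be illustrated with a minimal sketch. It is not the patented implementation: the names `GlobalTag`, `BumpAllocator`, and `link_global`, the base addresses, the region sizes, and the 16-byte alignment are all assumptions chosen for illustration. The sketch allocates one global in each memory and records the two address tags, the length tag, and the external name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalTag:
    """Hypothetical record holding the tags named in claims 14 to 16."""
    external_name: str   # external name of the global variable (claim 16)
    system_addr: int     # tag for the address in shared system memory
    local_addr: int      # tag for the address in local memory
    length: int          # tag for the length of the global data (claim 15)


class BumpAllocator:
    """Hands out consecutive aligned addresses inside one memory region."""

    def __init__(self, base: int, size: int):
        self.limit = base + size
        self.next_free = base

    def allocate(self, length: int, align: int = 16) -> int:
        # Round the next free address up to the assumed alignment.
        start = (self.next_free + align - 1) // align * align
        if start + length > self.limit:
            raise MemoryError("region exhausted")
        self.next_free = start + length
        return start


def link_global(name: str, length: int,
                system_mem: BumpAllocator,
                local_mem: BumpAllocator) -> GlobalTag:
    """Allocate the global in both memories and create its tags."""
    return GlobalTag(name,
                     system_mem.allocate(length),
                     local_mem.allocate(length),
                     length)


# Assumed layout: 1 MiB of shared system memory, 256 KiB of local memory.
system_mem = BumpAllocator(base=0x1000_0000, size=1 << 20)
local_mem = BumpAllocator(base=0x0, size=256 * 1024)

first = link_global("shared_counter", 40, system_mem, local_mem)
second = link_global("shared_flags", 8, system_mem, local_mem)
```

Keeping both addresses and the length in one record means either executable module can locate the same global in its own address space by looking up the external name.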
17. A method of transferring global data from an additional processor unit to a main processor unit, comprising:
transferring a global data tag from the additional processor unit to a memory flow controller;
transferring a global data length tag from the additional processor unit to the memory flow controller; and
transferring, from the additional processor unit to the memory flow controller, indicia of a system memory location at which the data is to be stored in a system memory.
18. The method of claim 17, wherein transferring the indicia of the system memory location from the additional processor unit to the memory flow controller further comprises transferring an array.
19. A method of transferring global data from an additional processor unit to a primary processor unit, comprising:
transmitting data from the primary processor unit to a memory flow controller;
transmitting a data length tag from the primary processor unit to the memory flow controller; and
transferring, from the primary processor unit to the memory flow controller, a local memory address at which the data array is to be stored in a local memory.
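The transfers in claims 17 to 19 hand three values to a memory flow controller: an address tag, a length tag, and a destination location. The following is a minimal simulation under stated assumptions, not the claimed hardware. The class `MemoryFlowController`, its `put` and `get` methods, and the use of byte arrays to stand in for the two memories are hypothetical, and the reading of `put` as the claim 17 direction and `get` as the claim 19 direction is also an assumption.

```python
class MemoryFlowController:
    """Hypothetical stand-in for a controller that copies data between
    a processor unit's local memory and the shared system memory."""

    def __init__(self, system_memory: bytearray, local_memory: bytearray):
        self.system_memory = system_memory
        self.local_memory = local_memory

    @staticmethod
    def _check(memory: bytearray, addr: int, length: int) -> None:
        # Reject transfers that would run past the end of a region.
        if addr < 0 or length < 0 or addr + length > len(memory):
            raise ValueError("transfer falls outside the memory region")

    def put(self, local_addr: int, length: int, system_addr: int) -> None:
        """Copy from local memory to system memory (assumed claim 17 direction)."""
        self._check(self.local_memory, local_addr, length)
        self._check(self.system_memory, system_addr, length)
        self.system_memory[system_addr:system_addr + length] = \
            self.local_memory[local_addr:local_addr + length]

    def get(self, system_addr: int, length: int, local_addr: int) -> None:
        """Copy from system memory to local memory (assumed claim 19 direction)."""
        self._check(self.system_memory, system_addr, length)
        self._check(self.local_memory, local_addr, length)
        self.local_memory[local_addr:local_addr + length] = \
            self.system_memory[system_addr:system_addr + length]


# Assumed sizes, chosen only to keep the demonstration small.
system_memory = bytearray(64)
local_memory = bytearray(32)
mfc = MemoryFlowController(system_memory, local_memory)

# The additional processor unit writes a global into its local memory,
# then asks the controller to store it in system memory.
local_memory[4:8] = b"\x01\x02\x03\x04"
mfc.put(local_addr=4, length=4, system_addr=16)

# The reverse transfer places the same data at another local address.
mfc.get(system_addr=16, length=4, local_addr=0)
```

The explicit bounds check matters in this simulation because slice assignment on a Python `bytearray` would otherwise resize the array instead of failing.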
20. A computer program product for linking a main executable program module and an additional executable program module to enable access to global information in separate execution areas, the computer program product having a medium with a computer program embodied thereon, the computer program comprising:
computer code for allocating global data to memory locations in a shared system memory;
computer code for allocating the global data to memory locations in a local memory;
computer code for creating a tag representing an address of the global data in the shared system memory; and
computer code for creating a tag representing an address of the global data in the local memory.
21. A processor for linking a main executable module and an additional executable module to enable access to global information in separate execution areas, the processor comprising a computer program comprising:
computer code for allocating global data to memory locations in a shared system memory;
computer code for allocating the global data to memory locations in a local memory;
computer code for creating a tag representing an address of the global data in the shared system memory; and
computer code for creating a tag representing an address of the global data in the local memory.
HK04105319.9A 2002-10-24 2004-07-21 Method and apparatus for enabling access to global data by a plurality of codes in an integrated executable for a heterogeneous architecture HK1062342A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/280,187 2002-10-24

Publications (1)

Publication Number Publication Date
HK1062342A (en) 2004-10-29
