
WO2019212182A1 - Apparatus and method for managing a shareable resource in a multi-core processor - Google Patents

Apparatus and method for managing a shareable resource in a multi-core processor

Info

Publication number
WO2019212182A1
WO2019212182A1 (PCT/KR2019/004886)
Authority
WO
WIPO (PCT)
Prior art keywords
core
processing core
resource
release
shareable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2019/004886
Other languages
French (fr)
Inventor
Mahantesh Mallikarjun KOTHIWALE
Manjunath JAYRAM
Tamilarasu S
Srinivasa Rao KOLA
Yunas Rashid
Andhavarapu KARTHIK
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Priority to EP19796454.7A priority Critical patent/EP3756092A4/en
Priority to CN201980029692.3A priority patent/CN115605846A/en
Publication of WO2019212182A1 publication Critical patent/WO2019212182A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources to service a request
    • G06F9/5011 Allocation of resources, the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, the resource being the memory
    • G06F9/5022 Mechanisms to release resources
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/526 Mutual exclusion algorithms
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes

Definitions

  • the present disclosure relates to computer systems and, more specifically, to parallelizing a data stream for distributed processing within a computer system.
  • programmable computing systems comprising a multi-core processor platform, such as asymmetric multi-processing (AMP) mode, symmetric multi-processing (SMP) mode, and bound multi-processing (BMP) mode, may need efficient management of concurrent access to shareable resources.
  • the operating system (OS) or real time operating system (RTOS) may provide an inter-processor communication (IPC) method.
  • the OS/RTOS may provide resource locking methods across cores, such as spinlocks, and resource locking methods across the threads within a core, such as semaphores and interrupt locks (int-locks), to avoid concurrent access to shared resources such as peripheral ports, memory, and so on.
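As an illustration of the inter-core primitive mentioned above, a spinlock can be sketched as a busy-wait on a test-and-set flag. The following is a minimal Python model, with a non-blocking `threading.Lock` standing in for the hardware test-and-set; `SpinLock` is an illustrative name, not an OS/RTOS API.

```python
import threading

class SpinLock:
    """Busy-wait (inter-core) lock sketch: acquire() spins until the
    flag is free, standing in for a hardware test-and-set loop."""
    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # spin until the non-blocking test-and-set succeeds
        while not self._flag.acquire(blocking=False):
            pass

    def release(self):
        self._flag.release()

lock = SpinLock()
lock.acquire()     # enter the cross-core critical section
lock.release()     # let the waiting cores proceed
```

Note the cost hinted at in the surrounding text: while one core holds the flag, every other core attempting `acquire()` burns cycles in the spin loop instead of doing useful work.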
  • dynamic resource management for the multi-core processor may require handling metadata, such as resource availability information, wherein the metadata itself can be a shared resource.
  • the conventional methods may handle the shareable resources through locking such as int-lock, semaphore and spinlock.
  • the conventional locking methods may affect the performance.
  • the frequent resource locking and wait for unlock can affect parallelism in multi-core system/processors.
  • the cores/threads may not completely utilize the dedicated memory, thereby wasting the available memory.
  • referring to FIGs. 1a to 3b, a conventional method for accessing the shareable resource in a single core processor and a multi-core processor, according to the prior art, will be described as follows.
  • FIG. 1a illustrates a block diagram of a conventional single core processor accessing a shareable resource.
  • the single processor core may have multiple threads to be executed. Further, a single data memory pool and a single metadata file are allocated for the single core processor to execute the threads. Also, memory allocation and de-allocation by the processor may need intra-core locks (i.e. interrupt locks) for protecting critical sections such as the metadata file.
  • the metadata file may be protected across the multiple threads.
  • FIGs. 1b and 1c are flowcharts depicting a conventional method to access the shareable resource by the single core processor.
  • the memory allocation/de-allocation may need to have both intra-core lock (i.e. interrupt locks) and inter-core lock (i.e. spin lock) for protecting the critical section such as metadata.
  • the processor may disable the interrupts and may not allow another thread to execute.
  • the processor may allow the waiting threads to resume and start again from the acquire int-lock.
  • the steps labeled as 'A' can be a critical section. In the critical section, the processor may allow one thread at a time and other threads may need to wait at acquire int-lock step.
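The serialized critical section labeled 'A' can be sketched as follows: one shared metadata file guarded by one lock, so every allocation and release from every thread queues up at the acquire step. This is a minimal illustrative model, with a `threading.Lock` standing in for the int-lock and a free list standing in for the metadata file; the names are assumptions, not the patent's terminology.

```python
import threading

class LockedPool:
    """Conventional scheme: one shared metadata file (here, a free
    list) guarded by one lock, so all threads serialize on alloc
    and release at the acquire step labeled 'A'."""
    def __init__(self, num_blocks):
        self._lock = threading.Lock()          # int-lock stand-in
        self._free = list(range(num_blocks))   # shared metadata

    def alloc(self):
        with self._lock:                       # one thread at a time
            return self._free.pop(0) if self._free else None

    def release(self, block):
        with self._lock:                       # same serialization on release
            self._free.append(block)

pool = LockedPool(num_blocks=4)
blk = pool.alloc()      # first allocation returns block 0
pool.release(blk)
```

Every other thread reaching `alloc()` or `release()` while the lock is held blocks at the `with self._lock:` line, which is exactly the waiting-at-acquire-int-lock behavior described above.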
  • FIG. 2a illustrates a block diagram of a conventional multi-core processor accessing a shareable resource.
  • each core of the multi-core processor may have multiple threads to be executed.
  • the multi-core processor may be allocated with a single data memory pool and a single metadata file to execute the multiple threads. Further, the memory allocation and de-allocation may need to have inter-core locks (i.e. spin locks) and intra-core locks (i.e. interrupt locks) for protecting the critical section such as metadata file.
  • the locks may affect parallelism in the multi-core processor.
  • FIG. 2b illustrates a block diagram of a conventional multi-core processor accessing a shareable resource in the dedicated metadata file for each core of the processor.
  • the metadata file may be allocated for each core of the processor. Further, each thread may apply intra-core locks (i.e. int-locks) while accessing the dedicated metadata file. However, the inter-core locks may be needed for protecting the metadata during de-allocation or release of the critical section such as metadata file.
  • FIG. 2c illustrates a block diagram of a conventional multi-core processor accessing a shareable resource based on cross core release.
  • the core-0 may allocate 1 memory block from the memory pool-0 associated with the core-0.
  • the information regarding the allocated memory block may be updated by the core-0 to the metadata-0 file associated with the core-0.
  • the core-0 may share the memory block data to core-1.
  • the inter-core locks may be needed for protecting the metadata during de-allocation or release of the critical section such as metadata file.
  • FIG. 2d illustrates a block diagram of a conventional multi-core processor accessing the same metadata block for allocating a memory block.
  • the core-1 may release the block-0, shared by the core-0, to the metadata-0 block.
  • the core-0 may try to allocate another block to another core, by accessing the metadata-0 file/block.
  • the metadata-0 can still be the common/shareable resource and a critical section. Accordingly, the inter-core lock (i.e. spin-lock) may not be avoided.
  • FIGs. 3a and 3b are flowcharts depicting a conventional method for allocating and releasing by the multi-core processor.
  • the conventional methods may need to protect the critical section metadata; thereby, the dedicated resource may not be advantageous for the multi-core processor to achieve parallelism.
  • the critical section labeled as 'B' is shown in FIGs. 3a and 3b.
  • the thread_1 in the core-0 may acquire the int-lock and may acquire the spinlock. Accordingly, the other threads in the core-0 may need to wait for the release of the int-lock.
  • the core-1, core-2, and core-3 may need to wait for the release of the spin lock.
  • the dynamic memory management solutions may use per-core/per-thread static resource (memory) allocation to handle the incoming data stream blocks and release the data stream blocks. Accordingly, the processor may use static allocation and release the resource (memory) from the same core/thread. However, the access to the common/shareable resource may have an access conflict if the resources allocated by a particular core are de-allocated/released by the other cores.
  • the conventional methods may not support allocation from one core/thread and release of the memory/buffer by another core/thread.
  • the conventional methods may not provide dynamic buffer access and release in the case of a multi-threaded/multi-core system.
  • an aspect of the present disclosure is to provide an apparatus and methods for managing a shareable resource associated with a multi-core processor in a computing environment.
  • Another aspect of the present disclosure is to provide apparatus and methods for resource management in a multi-core processor by having a per-core/per-thread dedicated resource pool and metadata.
  • Another aspect of the present disclosure is to provide apparatus and methods for cross core de-allocation of resources using a special release queue management with an exclusive set of sub-queues.
  • Another aspect of the present disclosure is to provide apparatus and methods for monitoring occupancy level of each memory pool and dynamically adjusting the allocation per-pool in a lockless manner.
  • the embodiments herein provide a method for managing a shareable resource in a multi-core processor.
  • the method includes accessing, by a target processing core, the shareable resource associated with a source processing core.
  • the source processing core and the target processing core reside in the multi-core processor.
  • the method includes generating, by the source processing core, a plurality of release sub queues corresponding to each of at least one target processing core in a release queue of the source processing core, based on the accessed sharable resource, to release a shareable resource assigned by the source processing core to the target processing core.
  • the method includes releasing, by the target processing core, at least one accessed shareable resource to the generated respective plurality of release sub queues in the release queue of the source processing core, based on analyzing a first information relating to the sharable resource.
  • the first information relating to the sharable resource is stored in a metadata file.
  • the method includes updating, by the source processing core, a second information in the stored metadata file corresponding to the source processing core, based on identifying the release of the shareable resource in the release queue.
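The method steps above can be sketched with per-core pools and one exclusive release sub-queue per target core, so that each sub-queue has a single producer (the releasing target core) and a single consumer (the owning source core) and no inter-core lock is needed. This is an illustrative model only; the class and field names are assumptions, not the patent's terminology.

```python
from collections import deque

NUM_CORES = 4

class SourceCore:
    """Owns a memory pool, a metadata free list, and one release
    sub-queue per target core."""
    def __init__(self, core_id, pool_size):
        self.core_id = core_id
        # metadata: touched only by the owning core
        self.free_list = deque(range(pool_size))
        # one exclusive sub-queue per releasing (target) core
        self.release_subq = [deque() for _ in range(NUM_CORES)]

    def alloc(self):
        # assign a block from the core's own pool; no inter-core lock
        return self.free_list.popleft() if self.free_list else None

    def push_release(self, block, target_core_id):
        # a target core releases a borrowed block by appending it to
        # the sub-queue reserved exclusively for it (single producer)
        self.release_subq[target_core_id].append(block)

    def reclaim(self):
        # the source core drains its own sub-queues and updates its
        # own metadata (single consumer), again without locking
        for q in self.release_subq:
            while q:
                self.free_list.append(q.popleft())

core0 = SourceCore(0, pool_size=2)
blk = core0.alloc()            # core-0 assigns block 0
core0.push_release(blk, 1)     # core-1 releases it cross-core
core0.reclaim()                # core-0 updates metadata-0 itself
```

Because each sub-queue is written by exactly one core and read by exactly one core, the metadata file itself is never touched by a foreign core, which is the point of the sub-queue arrangement.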
  • the embodiments herein provide an apparatus for managing a shareable resource in a multi-core processor.
  • the apparatus is configured to access, by the target processing core, the shareable resource associated with a source processing core.
  • the apparatus is configured to generate, by the source processing core, a plurality of release sub queues corresponding to each of at least one target processing core in a release queue of the source processing core, based on the accessed sharable resource, to release a shareable resource assigned by the source processing core to the target processing core.
  • the apparatus is configured to release, by the target processing core, at least one accessed shareable resource to the generated respective plurality of release sub queues in the release queue of the source processing core, based on analyzing a first information relating to the sharable resource, wherein the first information relating to the sharable resource is stored in a metadata file.
  • the apparatus is configured to update, by the source processing core, a second information in the stored metadata file corresponding to the source processing core, based on identifying the release of the shareable resource in the release queue.
  • FIG. 1a illustrates a block diagram of a conventional single core processor accessing a shareable resource
  • FIGs. 1b and 1c are flowcharts depicting a conventional method to access the shareable resource by the single core processor
  • FIG. 2a illustrates a block diagram of a conventional multi-core processor accessing a shareable resource
  • FIG. 2b illustrates a block diagram of a conventional multi-core processor accessing a shareable resource in the dedicated metadata file for each core of the processor
  • FIG. 2c illustrates a block diagram of a conventional multi-core processor accessing a shareable resource based on cross core release
  • FIG. 2d illustrates a block diagram of a conventional multi-core processor accessing the same metadata block for allocating a memory block
  • FIGs. 3a and 3b are flowcharts depicting a conventional method for allocating and releasing by the multi-core processor
  • FIG. 4 illustrates an apparatus for managing shareable resource in a multi-core processor in a computing environment , according to embodiments of the present disclosure
  • FIG. 5a illustrates a block diagram for managing shareable resource using single release queue for each core of the multi-core processor, according to embodiments of the present disclosure
  • FIG. 5b illustrates a block diagram for managing shareable resource using plurality of release sub queues for each core of the multi-core processor, according to embodiments of the present disclosure
  • FIG. 6a is a flowchart depicting a method for adding pointers in the release queues, according to embodiments of the present disclosure
  • FIG. 6b is a flowchart depicting a method for releasing pointers in the release queues, according to embodiments of the present disclosure
  • FIG. 7a is a flowchart depicting a method for allocation across cores using per-core and/or per-thread dedicated resource pool, according to embodiments of the present disclosure
  • FIG. 7b is a flowchart depicting a method for release across the cores using per-core and/or per-thread dedicated release queues, according to embodiments of the present disclosure
  • FIG. 8a is a flow chart depicting a method for dynamic pool adjustment, according to embodiments of the present disclosure.
  • FIG. 8b is a flowchart depicting steps for dynamic pool adjustment, according to embodiments of the present disclosure
  • FIG. 9a is a flowchart depicting a method for managing shareable resource associated in the multi-core processor, according to embodiments of the present disclosure
  • FIG. 9b is a flowchart depicting a method for determining, if the accessed at least one sharable resource is corresponding to the source processing core, according to embodiments of the present disclosure
  • FIG. 9c is a flowchart depicting a method for pushing the shareable resource of the source processing core to the release sub queue marked in the free list, during dynamic pool adjustment, according to embodiments of the present disclosure.
  • FIG. 9d is a flowchart depicting a method for updating by the source processing core, a metadata file corresponding to the source processing core, according to embodiments of the present disclosure.
  • the embodiments herein achieve an apparatus and methods for managing a shareable resource in a multi-core processor, by generating dedicated release sub queues.
  • FIG. 4 illustrates an apparatus 100 for managing shareable resource in a multi-core processor 102, according to embodiments of the present disclosure.
  • the apparatus 100 can be at least one of but not limited to, a server, a desktop computer, a hand-held device, a multiprocessor system, a microprocessor based programmable consumer electronics, a laptop, a network computer, a minicomputer, a mainframe computer, a modem, a vehicle infotainment system, a consumer electronics, and so on.
  • the apparatus 100 may include a multi-core processor 102, and a memory 104.
  • the memory 104 can be at least one of, but not limited to, a static memory, a dynamic memory, flash memory, a cache memory, a random access memory (RAM), and so on.
  • the processor 102 or multi-core processor 102 may include a plurality of cores, such as a source processing core 102a and at least one target processing core 102b.
  • the source processing core 102a can be at least one of a core 0, a core 1, a core 2, a core 3, and so on.
  • the target processing core can be at least two of the core 0, core 1, core 2, core 3 and so on.
  • the source processing core 102a may assign a dedicated memory block for each core of the multi-core processor 102.
  • the memory 104 may include a shareable resource such as at least one of, but not limited to, a metadata, a data stream, a packet, and so on.
  • the apparatus 100 may include release pointers stored in a static memory or a static array.
  • the release pointers may have one or more release queues dedicated to each core of the multi-core processor 102.
  • the release queues may further have dedicated release sub queues, such as entry queues, for each core of the multi-core processor 102.
  • the apparatus may include an input interface (not shown), and an output interface (not shown) that are connected by a bus (not shown), which may represent one or more system and/or peripheral busses.
  • the data source to the apparatus 100 and the multi-core processor 102 can be at least one of, but not limited to, packetized data from applications, databases, computer networks, scientific instrumentation, real-time video capture devices, and so on.
  • the apparatus 100 may also include volatile and/or non-volatile memory (not shown), removable and/or non-removable media, processor-readable instructions, data structures, program modules, other data, and so on.
  • the volatile and/or non-volatile memory includes at least one of, but not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, CD-ROMs, digital versatile discs (DVDs), magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the source processing core 102a and/or the at least one target processing cores 102b.
  • the flash memory or other form of fixed or removable storage medium in the apparatus 100 may be used to store desired programmable instructions and program data and may be accessed by the cores such as the source processing core 102a and the at least one target processing cores 102b. Further, an operating system (OS)/ real time operating system (RTOS) of the apparatus 100 may allow partitioning the physical address space of the memory 104 for managing the shareable resources. The memory 104 may permit multiple concurrent read/write operations.
  • the operating system (OS)/real time operating system (RTOS) may include at least one of sub modules such as, but not limited to a kernel processing module, a thread managing module, a process managing module, an input/output ("I/O") managing module, a memory managing module, and so on.
  • the process managing module may perform multitasking by initializing, scheduling, and switching processes for OS for accessing the cores of the multi-core processor 102.
  • the thread managing module may manage instantiation and execution of application threads, including receiving threads and sending threads of the multi-core processor 102. For example, thread managing module may allocate the threads for execution among cores of multi-core processor 102.
  • the memory managing module may control the allocation, use, and de-allocation of the physical address space provided by the memory 104.
  • the shareable resource may include at least one of, but not limited to, CPU (central processing unit) resources, logical processor resources, input/output resources, coprocessor resources, channel resources, network adapters, memory resources, audio, display, common peripherals, serial ports, parallel ports, and so on.
  • the memory manager module may typically allocate a stack and a heap for allocating blocks of memory. The allocated memory blocks may be referenced by the pointers.
  • the apparatus 100 may process incoming data received by the input interface and may parallelize the incoming data.
  • the input interface can be at least one of a network interface card (NIC), a programmable NIC, an analog-to-digital converter (not shown), and so on, coupled to the multi-core processor 102.
  • the apparatus 100 may have a buffer mapped to the memory 104, which may be used to store intermediate data.
  • the memory blocks 103a-103d as shown in FIG. 4, may vary in length.
  • the memory blocks 103a-103d may include Ethernet datagrams, internet protocol packets, asynchronous transfer mode (ATM) cells, data constituting a run of a scientific instrument, a video frame or video coding block, an image, a block of instrument data, and so on.
  • the threads may be sliced among the cores of multi-core processor 102 during execution. Each thread or memory block may include similar components operating similarly.
  • Each core of multi-core processor 102 may also have different numbers of buffers and send threads.
  • the memory 104 may also include metadata dedicated for each core of the multi-core processor 102.
  • the metadata may include reference to threads, reference to pointers, a reference to location in memory 104, a length of the memory block, and so on.
  • the apparatus 100 is configured to assign at least one shareable resource stored in a memory 104, to at least one target processing cores 102b, based on a determined type of task to be executed by the multi-core processor 102.
  • the at least one shareable resource is assigned by the source processing core 102a.
  • the apparatus 100 is configured to store information related to the assigned at least one shareable resource in a metadata file corresponding to the source processing core 102a.
  • the apparatus 100 is configured to access, by a target processing core 102b, the shareable resource associated with a source processing core 102a.
  • the apparatus 100 is configured to provide access to the assigned at least one shareable resource for the at least one target processing core 102b, based on the information stored in the metadata file. In an embodiment, the apparatus 100 is configured to determine if the accessed at least one sharable resource corresponds to the source processing core 102a, based on the stored metadata. In an embodiment, the accessed at least one shareable resource is determined by the at least one target processing core 102b based on accessing the at least one shareable resource.
  • the apparatus 100 is configured to generate a plurality of release sub queues corresponding to each of the at least one target processing core 102b in a release queue of the source processing core 102a, to release a shareable resource assigned by the source processing core 102a to the target processing core 102b.
  • the apparatus 100 is configured to release the at least one accessed shareable resource to the generated respective plurality of release sub queues in the release queue of the source processing core 102a, based on analyzing a first information relating to the shareable resource.
  • the first information relating to the sharable resource is stored in a metadata file.
  • the pointers may be stored in release sub queues.
  • the multi-core processor 102 may access the pointers through indirect addressing mode instruction sets.
  • the shareable resource is released by the target processing core 102b.
  • the apparatus 100 is configured to identify if the accessed shareable resource is released by the at least one target processing core 102b, based on analyzing the release queue corresponding to the source processing core 102a and the at least one target processing cores 102b.
  • the apparatus 100 is configured to update second information in the stored metadata file corresponding to the source processing core 102a, based on identifying the release of the shareable resource in the release queue.
  • the apparatus 100 is configured to determine the available space of each release sub queue in the release queue. In an embodiment, the apparatus 100 is configured to determine whether the available space of each release sub queue is above or below a pre-defined threshold value. In an embodiment, the apparatus 100 is configured to update information corresponding to a free list and a busy list of the analyzed available space of each release sub queue, in the metadata file corresponding to the respective source processing core 102a and the at least one target processing core 102b.
  • the apparatus 100 is configured to set a deficient flag corresponding to the source processing core 102a and the at least one target processing cores 102b, if the available space of each release sub queue is determined to be below the pre-defined threshold value, based on the updated metadata file.
  • the apparatus 100 is configured to push the shareable resource of the source processing core 102a to the release sub queue marked in the free list by dynamically adjusting the pool size of the release queue, if the available space of the release sub queue is determined to be below the pre-defined threshold value.
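The occupancy check and lockless rebalancing described above can be sketched as follows. The threshold value, function names, and data layout are illustrative assumptions; the hand-off reuses the release sub-queue path, so no lock is taken.

```python
from collections import deque

THRESHOLD = 2  # pre-defined threshold (illustrative value)

def update_flags(pools):
    """Set a deficient flag for each pool whose available space
    has fallen below the pre-defined threshold."""
    return {cid: len(free) < THRESHOLD for cid, free in pools.items()}

def rebalance(pools, subqueues):
    """A donor core with spare blocks pushes one toward a deficient
    core's release sub-queue; the deficient core drains it later
    like any other release, so the pool size adjusts locklessly."""
    flags = update_flags(pools)
    for needy, deficient in flags.items():
        if not deficient:
            continue
        for donor, free in pools.items():
            if donor != needy and len(free) > THRESHOLD:
                subqueues[needy].append(free.pop())   # lockless hand-off
                break

pools = {0: deque([10, 11, 12]), 1: deque([20])}      # core-1 is deficient
subqueues = {0: deque(), 1: deque()}
rebalance(pools, subqueues)
```

The deficient core later treats the pushed block exactly like a cross-core release, pulling it from its sub-queue into its own free list when it next reclaims.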
  • the apparatus 100 is configured to trigger the release of the shareable resource during the assigning of the shareable resource, if the release queue corresponding to the source processing core 102a has available space.
  • the apparatus 100 is configured to cause the release of the shareable resource during the release of the shareable resource to the release queue, if the release queue corresponding to the source processing core has available space.
  • the apparatus 100 is configured to parse the metadata file, to determine the at least one of a pool ID, the free list of the pool ID, the busy list of the pool ID, and the assigned shareable resource ID.
  • the pool ID can be a core ID/core number of the respective memory pool.
  • for example, the pool ID of the memory pool corresponding to the core 2 is 2.
  • the information relating to shareable resource comprises a pool ID, and the assigned shareable resource ID, a resource block ID, and an assigned core ID.
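The per-block information listed above could be modeled as a simple record. The field names below are illustrative assumptions, not the patent's format.

```python
from dataclasses import dataclass

@dataclass
class BlockInfo:
    """Illustrative record for the information relating to a
    shareable resource; field names are assumptions."""
    pool_id: int        # core ID / core number of the owning memory pool
    resource_id: int    # assigned shareable-resource ID
    block_id: int       # resource block ID within the pool
    core_id: int        # core the block is currently assigned to

# e.g. block 0 of pool 2, assigned to core 1
info = BlockInfo(pool_id=2, resource_id=7, block_id=0, core_id=1)
```

A releasing core only needs `pool_id` from such a record to decide which core's release queue the block must be returned to.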
  • the source processing core 102a and the at least one target processing cores 102b comprises at least one of a core 0, a core 1, a core 2, a core 3, and so on.
  • assigning the at least one shareable resource comprises allocating a memory block in the memory 104 to each of processing core of the source processing core 102a and the target processing core 102b for accessing the shareable resource in the memory block.
  • the metadata file is generated for each processing core of the source processing core 102a and the target processing core 102b and stored sequentially according to the order of each processing core.
  • the release queue corresponding to the source processing core 102a comprises at least one entry queue corresponding to the at least one target processing core 102b.
  • releasing the at least one shareable resource comprises updating the at least one entry queue corresponding to the at least one target processing core 102b.
  • the shareable resource is assigned by the source processing core 102a and released by the at least one target processing core 102b.
  • FIG. 4 illustrates functional components of the computer implemented system.
  • the component may be a hardware component, a software component, or a combination of hardware and software. Some of the components may be application level software, while other components may be operating system level components.
  • the connection of one component to another may be a close connection where two or more components are operating on a single hardware platform. In other cases, the connections may be made over network connections spanning long distances.
  • Each embodiment may use different hardware, software, and interconnection architectures to achieve the functions described.
  • the embodiments herein can comprise hardware and software elements.
  • the embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc.
  • the functions performed by various modules described herein may be implemented in other modules or combinations of other modules.
  • a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • FIG. 5a illustrates a block diagram for managing shareable resource using a single release queue for each core of the multi-core processor 102, according to embodiments of the present disclosure.
  • the core-1 may release the resource block 0 from the memory 104, which was shared by the core-0, to the core-0 release queue.
  • the core-0 may allocate/assign another resource block on accessing the metadata-0 without any conflict.
  • the processor is mapped to four release pointer queues for four cores.
  • the pointers may be added or generated in the release queue.
  • the release queues can be a shareable resource for the cores of the multi-core processor 102.
  • the release queues can be concurrently accessed by multiple cores.
  • the critical section, for instance the release queue, may need to be protected.
  • FIG. 5b illustrates a block diagram for managing a shareable resource using a plurality of release sub queues for each core of the multi-core processor 102, according to embodiments of the present disclosure.
  • a plurality of release sub queues for each of the at least one target processing core 102b in a release queue corresponding to the source processing core 102a is generated, to release the determined shareable resource corresponding to the source processing core 102a.
  • An exclusive release sub queue for each processing core is added to hold the release pointers.
  • the at least one accessed shareable resource is released to the respective plurality of release sub queues in the release queue corresponding to the source processing core 102a, based on the analyzed information in the metadata file. The shareable resource is released by the target processing core 102b.
  • the embodiments herein may provide a per-core exclusive release queue (ERQ), and the ERQ may be added in the case of a single-core processor, based on per-thread execution.
  • the shareable resource is added in the entry queue of the release pointers to avoid locks to the shareable resource.
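The per-core exclusive release sub queue can be sketched as a single-producer, single-consumer ring buffer: the releasing (target) core only ever advances the tail, and the owning (source) core only ever advances the head, so no lock is needed. The C sketch below is illustrative only; the names erq_t, erq_push, and erq_pop, the queue depth, and the omission of memory barriers (which a real multi-core build would need) are assumptions, not taken from the disclosure.

```c
#include <stdint.h>
#include <stddef.h>

#define ERQ_SLOTS 8  /* queue depth; must be a power of two for the mask below */

/* One exclusive release sub queue (ERQ): written by exactly one releasing
 * core, drained by exactly one owning core. A production version would add
 * memory barriers/atomics; they are omitted here for brevity. */
typedef struct {
    void *slot[ERQ_SLOTS];
    volatile uint32_t head;  /* advanced only by the owning (source) core    */
    volatile uint32_t tail;  /* advanced only by the releasing (target) core */
} erq_t;

/* Called by the target core to hand a block back to its owner. */
static int erq_push(erq_t *q, void *ptr)
{
    if (q->tail - q->head == ERQ_SLOTS)
        return -1;                          /* sub queue full */
    q->slot[q->tail & (ERQ_SLOTS - 1)] = ptr;
    q->tail++;                              /* publishes the entry */
    return 0;
}

/* Called by the owning core to reclaim released blocks into its free list. */
static void *erq_pop(erq_t *q)
{
    if (q->head == q->tail)
        return NULL;                        /* sub queue empty */
    void *ptr = q->slot[q->head & (ERQ_SLOTS - 1)];
    q->head++;
    return ptr;
}
```

Because each sub queue has exactly one writer per index variable, neither core ever waits at a lock when crossing the release boundary.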
  • FIG. 6a is a flowchart depicting a method for adding pointers in the release queues, according to embodiments of the present disclosure.
  • the pointers in the release queue may be added by the multi-core processor 102.
  • the target processing core 102b determines whether the release queue/release pointer belongs to the target processing core 102b. And at step 613, the target processing core 102b may release the pointers to the respective release queue of the source processing core 102a, if the release queue/release pointer does not belong to the target processing core 102b.
  • the release queue of the respective core may be updated with the shareable resource based on the information stored in the metadata file of the respective core.
  • FIG. 6b is a flowchart depicting a method for releasing pointers in the release queues, according to embodiments of the present disclosure.
  • the shareable resource assigned by the core 0 may be released to the release sub queue of the release queue/release pointer.
  • the target processing core 102b such as core 1, core 2, and core 3 determines whether the release queue/release pointer is empty and has available space.
  • the target processing core 102b may call the release operation to release the shareable resource and add to the release sub queue/entry queue in the release queue/release pointers.
  • FIG. 7a is a flowchart depicting a method for allocation across cores using per-core and/or per-thread dedicated resource pool, according to embodiments of the present disclosure.
  • the multi-core processor 102 may acquire an int-lock and get core ID from the stored metadata file of the respective processing core.
  • the multi-core processor 102 determines whether the release pointer corresponding to the target processing core 102b is not empty. If the release pointer corresponding to the target processing core 102b is not empty, at step 709, the free list of the release pointer may be determined by parsing the metadata of the respective processing core. Further, at step 711, the memory blocks may be analyzed to determine the allocated and free memory blocks.
  • the free and busy memory blocks may be updated in the list and stored in the metadata file of the respective processing core.
  • the acquired int-lock may be released after the execution of the task.
  • addresses of the memory blocks may be returned.
  • the acquired int-lock may be released and NULL may be returned.
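The allocation path described above (acquire the int-lock, consult the metadata file, update the free/busy lists, return an address or NULL) can be sketched as follows. This is a minimal model under stated assumptions: core_meta_t, drain_release_queue, and alloc_block are illustrative names, the int-lock stubs are no-ops, and the release sub queues are modelled as a plain pointer array supplied by the caller.

```c
#include <stddef.h>

#define POOL_BLOCKS 4  /* illustrative per-core pool size */

/* Stand-in for the per-core "metadata file": the free list and its count. */
typedef struct {
    void *free_list[POOL_BLOCKS];
    int   free_count;
} core_meta_t;

/* No-op stand-ins for the interrupt lock of the respective core. */
static void int_lock(void)   {}
static void int_unlock(void) {}

/* Reclaim pointers that other cores pushed to this core's release queue;
 * here the released pointers are simply passed in as an array. */
static void drain_release_queue(core_meta_t *m, void **released, int n)
{
    for (int i = 0; i < n && m->free_count < POOL_BLOCKS; i++)
        m->free_list[m->free_count++] = released[i];
}

/* Acquire the int-lock, take a block from the free list (marking it busy),
 * release the int-lock, and return the address, or NULL when exhausted. */
static void *alloc_block(core_meta_t *m)
{
    int_lock();
    void *blk = NULL;
    if (m->free_count > 0)
        blk = m->free_list[--m->free_count];
    int_unlock();
    return blk;
}
```

Note that only the intra-core int-lock appears; no inter-core spinlock is needed because the metadata is private to the allocating core.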
  • the dynamic memory may be allocated and de-allocated. Further, the thread or processing core may request memory to allocate the data corresponding to heap management.
  • the dynamic memory manager may allocate the memory, wherein the details of the amount of allocated memory may be stored in the metadata file of the respective processing core. Accordingly, for instance, the metadata may be a common/shareable resource.
  • each stage can be allowed to run in an individual core.
  • a packet buffer allocated by the MAC processing core will be released by another core, for instance, Application packet routing core.
  • the common resource is the metadata of the heap manager that may be needed to allocate memory each time a new packet arrives.
  • a multi-core SMP operating system may have per-core ready and wait queues for threads.
  • the ready and wait queues are accessed across cores.
  • the processor pushes the threads from a busy core to the ready queues of free cores.
  • the common resources can be at least one of, but not limited to, metadata of the heap manager that may be needed to allocate memory, and operating system (OS) metadata or task control block (i.e. accessed by the OS scheduler from different cores concurrently for load balancing).
  • OS: operating system
  • FIG. 7b is a flowchart depicting a method for release across the cores using per-core and/or per-thread dedicated release queues, according to embodiments of the present disclosure.
  • the multi-core processor 102 may acquire an int-lock and get core ID from the stored metadata file of the respective processing core.
  • the multi-core processor 102 determines whether the release pointer corresponding to the target processing core 102b is not empty. If the release pointer corresponding to the target processing core 102b is not empty, at step 729, the free list of the release pointers may be determined by parsing the metadata of the respective processing core. The pointer may be called by the target processing core 102b if the release pointers are empty. Then, at step 731, the target processing core 102b may determine if the pointer or the shareable resource belongs to the core ID.
  • the metadata may be parsed to determine the busy list and the free and busy list may be updated in the metadata. And, at step 737, the acquired int-lock may be released after the execution of the task.
  • release queue may be updated.
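The release decision of FIG. 7b, where the releasing core compares the block's owner (recorded in the metadata at allocation time) against its own core ID, can be sketched as below. block_t, core_state_t, and release_block are illustrative names; the release sub queues are modelled as simple per-owner arrays.

```c
#include <stddef.h>

#define NUM_CORES 4

typedef struct {
    int   owner_core;   /* core ID recorded in the metadata at allocation */
    void *payload;
} block_t;

typedef struct {
    block_t *local_free[16];
    int      local_free_count;
    block_t *pushed[NUM_CORES][16];  /* stand-in for the release sub queues */
    int      pushed_count[NUM_CORES];
} core_state_t;

/* Returns 1 for a local release (straight to the free list), or 0 when the
 * block was handed to the release sub queue of its owning core. */
static int release_block(core_state_t *me, int my_core_id, block_t *blk)
{
    if (blk->owner_core == my_core_id) {
        me->local_free[me->local_free_count++] = blk;
        return 1;
    }
    int owner = blk->owner_core;
    me->pushed[owner][me->pushed_count[owner]++] = blk;
    return 0;
}
```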
  • FIG. 8a is a flow chart depicting a method for dynamic pool adjustment, according to embodiments of the present disclosure.
  • the multi-core processor 102 may acquire an int-lock and get core ID from the stored metadata file of the respective processing core.
  • the multi-core processor 102 releases the pointer to the release pointer queue.
  • the multi-core processor 102 adjusts dynamic pool size.
  • the dynamic re-adjustment of the resource pool size dedicated to per-core/per-thread may be performed based on monitoring the occupancy level of each pool during the run time.
  • the resource pool size may be dynamically adjusted for each core in an efficient and lockless way. Adjusting the dynamic pool size per core may have the advantage of optimized usage of the overall resource pool size. Also, the resources may not be left unused for a longer duration.
  • the memory blocks may be analyzed to determine the allocated and free memory blocks. If the memory blocks are available, at step 813, the free and busy memory blocks may be updated in the list and stored in the metadata file of the respective processing core. At step 815, the acquired int-lock may be released after the execution of the task. After that, at step 817, addresses of the memory blocks may be returned.
  • the acquired int-lock may be released and NULL may be returned.
  • At steps 823 to 829, the same operations as those at steps 803 to 809 are performed.
  • At steps 831 to 839, the same operations as those at steps 731 to 738 are performed. Therefore, a detailed description thereof is omitted here.
  • FIG. 8b is a flowchart depicting steps for dynamic pool adjustment, according to embodiments of the present disclosure.
  • the multi-core processor 102 may determine the available space of each release sub queue in the release queue and analyze whether the available space of each release sub queue is above the threshold value or below the threshold value. Information corresponding to the analyzed available space of each release sub queue may be updated in the metadata file corresponding to the respective source processing core 102a and the at least one target processing core 102b. In other words, if the available space of each release sub queue is above the threshold value or below the threshold value, the multi-core processor 102 may set a deficient flag and a sufficient flag for adjusting the pool size, at step 875. Meanwhile, if the available space of each release sub queue is neither above the threshold value nor below the threshold value, the multi-core processor 102 may remove the deficient flag and the sufficient flag for the current core, at step 873.
  • the source processing core 102a and the at least one target processing core 102b may need to either allocate or release the resources (memory blocks); then the source processing core 102a and the at least one target processing core 102b may check the condition of the deficient or sufficient flag for each core, based on the lower and upper threshold values, as shown collectively in FIG. 8b.
  • the source processing core 102a and the at least one target processing core 102b may need to either allocate or release the resources; then the source processing core 102a and the at least one target processing core 102b may check if the current state is sufficient, at step 877.
  • the respective processing core may contribute to other deficient core(s), by initially changing the ownership of the memory block to the deficient core(s), at step 879. Further, the respective processing core may write the pointer into the entry block of the sufficient core, in the memory block of the exclusive sub release queue corresponding to the deficient core, at steps 881 and 883. As shown in FIG. 8a, a switch GUARD SPACE may need to be maintained to avoid frequent conflicts between any cores, corresponding to the deficient and sufficient states.
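The deficient/sufficient classification and the donation of a block from a sufficient core to a deficient one can be sketched as follows. The threshold values, the guard space between them, and all names (classify_pool, donate_block) are illustrative assumptions; the guard space keeps a core from flapping between states on every allocate/release.

```c
/* Illustrative thresholds; the guard space is the band between them. */
#define LOWER_THRESHOLD 2   /* fewer free blocks than this: deficient  */
#define UPPER_THRESHOLD 6   /* more free blocks than this: sufficient  */

typedef enum { POOL_NORMAL, POOL_DEFICIENT, POOL_SUFFICIENT } pool_state_t;

/* Set the deficient/sufficient flag below/above the thresholds, and clear
 * both inside the guard space (cf. steps 873 and 875). */
static pool_state_t classify_pool(int free_blocks)
{
    if (free_blocks < LOWER_THRESHOLD)
        return POOL_DEFICIENT;
    if (free_blocks > UPPER_THRESHOLD)
        return POOL_SUFFICIENT;
    return POOL_NORMAL;
}

/* A sufficient core donates one block to a deficient core by changing the
 * block's ownership and pushing it to that core's sub queue; modelled here
 * as two counters. */
static void donate_block(int *sufficient_free, int *deficient_free)
{
    if (classify_pool(*sufficient_free) == POOL_SUFFICIENT &&
        classify_pool(*deficient_free)  == POOL_DEFICIENT) {
        (*sufficient_free)--;
        (*deficient_free)++;
    }
}
```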
  • FIG. 9a is a flowchart depicting a method 900a for managing shareable resource in the multi-core processor 102, according to embodiments of the present disclosure.
  • the method 900a includes accessing, by a target processing core 102b, the shareable resource associated with a source processing core 102a.
  • the method 900a includes generating, by the source processing core 102a, a plurality of release sub queues corresponding to each of at least one target processing core 102b in a release queue of the source processing core 102a, based on the accessed sharable resource, to release a shareable resource assigned by the source processing core 102a to the target processing core 102b.
  • the method 900a includes releasing, by the target processing core 102b, at least one accessed shareable resource to the generated respective plurality of release sub queues in the release queue of the source processing core 102a, based on analyzing a first information relating to the shareable resource, wherein the first information relating to the shareable resource is stored in a metadata file.
  • the method 900a includes updating, by the source processing core 102a, a second information in the stored metadata file corresponding to the source processing core 102a, based on identifying the release of the shareable resource in the release queue.
  • method 900a may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 9a may be omitted.
  • FIG. 9b is a flowchart depicting a method 900b for determining, if the accessed at least one sharable resource is corresponding to the source processing core 102a, according to embodiments of the present disclosure.
  • the method 900b includes assigning, by the source processing core 102a, at least one shareable resource stored in a memory 104, to at least one target processing core 102b, based on a determined type of task to be executed by the multi-core processor 102.
  • the method 900b includes storing by the source processing core 102a, the first information related to the assigned at least one shareable resource, in a metadata file corresponding to the source processing core 102a.
  • the method 900b includes providing, by the source processing core 102a, an access to the assigned at least one shareable resource for the at least one target processing core 102b, based on the information stored in the metadata file.
  • the method 900b includes determining, by the target processing core 102b, if the accessed at least one sharable resource is corresponding to the source processing core 102a, based on the stored metadata file.
  • the accessed at least one shareable resource is determined by the at least one target processing core 102b based on accessing the at least one shareable resource.
  • method 900b may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 9b may be omitted.
  • FIG. 9c is a flowchart depicting a method 900c for pushing the shareable resource of the source processing core 102a to the release sub queue marked in the free list, during dynamic pool adjustment, according to embodiments of the present disclosure.
  • the method 900c includes determining, by the multi-core processor 102, the available space of each release sub queue in the release queue.
  • the method 900c includes determining, by the multi-core processor 102 if the available space of each release sub queue is above the pre-defined threshold value or below the pre-defined threshold value.
  • the method 900c includes updating, by the multi-core processor 102, information corresponding to a free list and a busy list of the analyzed available space of each release sub queue, in the metadata file corresponding to the respective source processing core 102a and the at least one target processing core 102b.
  • the method 900c includes setting, by the multi-core processor 102, a deficient flag corresponding to the source processing core 102a and the at least one target processing core 102b, if the available space of each release sub queue is below the pre-defined threshold, based on the updated metadata file.
  • the method 900c includes pushing, by the multi-core processor 102, the shareable resource of the source processing core 102a to the release sub queue marked in the free list, by dynamically adjusting the pool size of the release queue, if the available space of the release sub queue is below the threshold value.
  • method 900c may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 9c may be omitted.
  • FIG. 9d is a flowchart of method 900f for updating by the source processing core 102a, the metadata file corresponding to the source processing core, according to embodiments of the present disclosure.
  • the method 900f includes allocating, by the source processing core 102a, the shareable resource from a memory 104 to at least one target processing core 102b.
  • the method 900f includes updating by the source processing core 102a, an information relating to the sharable resource, in a metadata file corresponding to the source processing core 102a.
  • the method 900f includes accessing by the target processing core 102b, the allocated shareable resource from the memory 104.
  • the method 900f includes determining, by the target processing core 102b, the allocated shareable resource, by the source processing core 102a.
  • the method 900f includes releasing, by the target processing core 102b, the shareable resource, allocated by the source processing core 102a, wherein releasing the shareable resource comprises updating a release queue corresponding to the source processing core 102a.
  • the method 900f includes identifying, by the source processing core 102a, the release of the shareable resource by the target processing core 102b, based on checking the release queue corresponding to the source processing core 102a.
  • the method 900f includes updating, by the source processing core 102a, the metadata file corresponding to the source processing core 102a.
  • the release queue corresponding to the source processing core 102a comprises at least one entry queue corresponding to the at least one target processing core 102b.
  • updating the release queue corresponding to the source processing core 102a further comprises updating at least one entry queue corresponding to the at least one target processing core 102b.
  • method 900f may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 9d may be omitted.
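Method 900f can be walked through end to end with a toy model: core 0 allocates a block, core 1 releases it into core 0's exclusive sub queue without taking a lock, and core 0 reclaims it into its free list (updating its metadata) on its next allocation. All names and sizes below are illustrative assumptions.

```c
#include <stddef.h>

#define Q 4  /* illustrative sub queue depth */

/* Core 1 is the single producer, core 0 the single consumer: lock-free. */
typedef struct {
    void *q[Q];
    int   head, tail;
} release_q_t;

typedef struct {
    void *free_list[Q];
    int   free_count;        /* stand-in for core 0's metadata file        */
    release_q_t from_core1;  /* exclusive sub queue for releases by core 1 */
} core0_t;

/* Core 0 first identifies releases in its sub queue and updates its
 * metadata, then serves the allocation from the free list. */
static void *core0_alloc(core0_t *c)
{
    while (c->from_core1.head != c->from_core1.tail)
        c->free_list[c->free_count++] = c->from_core1.q[c->from_core1.head++ % Q];
    return c->free_count ? c->free_list[--c->free_count] : NULL;
}

/* Core 1 releases a block it did not allocate by updating the release
 * queue of the source core -- no lock taken. */
static void core1_release(core0_t *owner, void *blk)
{
    owner->from_core1.q[owner->from_core1.tail++ % Q] = blk;
}
```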
  • test code i.e. pseudo code
  • the procedure to test can be as follows:
  • the test result may comprise calculating the gain by Device Under Test (DUT) #2 with respect to DUT#1.
  • test results have yielded a high gain.
  • the embodiments herein may achieve multi-core parallelism and the release call is made from a different core.
  • a second test tries to observe gain at various frequencies of allocate and release being called.
  • test code i.e. pseudo code for the second test
  • test procedure for the second test can be as follows:
  • the performance of the system may gradually decrease with lesser allocate/release calls per second.
  • Embodiments herein can allow removing spinlocks, thereby enhancing parallelism/performance.
  • Embodiments herein may achieve overall faster access to shared resources (i.e. dynamic memory, peripheral buffer pool, etc.) by providing lockless access to resources shared across cores/threads.
  • Embodiments herein may perform operations such as allocate, de-allocate, adjust resource pool, and so on, in lockless way, to maximize parallelism.
  • Embodiments herein can be used in low latency and high bandwidth systems. Embodiments herein achieve faster execution for real-time multi-core applications. Embodiments herein may manage shared resources with optimal sizes, such as optimized lesser memory. Embodiments herein avoid locks such as spinlocks by having a per-core/per-thread dedicated resource pool and metadata. Embodiments herein may use release queue management, with an exclusive set of sub-queues. Embodiments herein can support cross core de-allocation of resources. Embodiments herein can monitor the occupancy level of each memory pool and dynamically adjust the allocation per-pool in a lockless way. Embodiments herein can determine dynamically when to re-adjust per-core/per-function dedicated memory.
  • the embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the elements.
  • the elements shown in FIG. 4 can be at least one of a hardware device, or a combination of hardware device and software module.


Abstract

Embodiments herein disclose an apparatus and methods for managing a shareable resource(s) in a multi-core processor. Embodiments herein relate to computer systems and, more specifically, to parallelizing a data stream for distributed processing within a computer system. The method includes providing lockless access to the shareable resource in multiple processing cores or a single processing core, by releasing the assigned shareable resource into a dedicated release sub queue of each processing core, to support cross-core de-allocation of the shareable resources. The method includes monitoring an occupancy level of each memory pool and dynamically adjusting the allocation per pool without locking the shareable resources.

Description

APPARATUS AND METHOD FOR MANAGING A SHAREABLE RESOURCE IN A MULTI-CORE PROCESSOR
The present disclosure relates to computer systems and, more specifically, to parallelizing a data stream for distributed processing within a computer system.
Currently, programmable computing systems comprising a multi-core processor platform, such as asymmetric multi-processing (AMP) mode, symmetric multi-processing (SMP) mode, and bound multi-processing (BMP) mode, may need efficient management of concurrent access to shareable resources. In general, the operating system (OS) or real-time operating system (RTOS) may provide inter-processor communication (IPC) methods. Further, the OS/RTOS may provide a resource locking method across cores, such as spinlock, and a resource locking method across the threads within a core, such as semaphore and int-lock, to avoid concurrent access to shared resources such as peripheral ports, memory, and so on. Also, dynamic resource management for the multi-core processor may require handling metadata, such as resource availability information, wherein the metadata can itself be a shared resource. The conventional methods may handle the shareable resources through locking, such as int-lock, semaphore, and spinlock. However, the conventional locking methods may affect performance. The frequent resource locking and waiting for unlock can affect parallelism in multi-core systems/processors. Also, the cores/threads may not completely utilize the dedicated memory, thereby wasting the available memory.
Accordingly, conventional methods for accessing the shareable resource in a single core processor and a multi-core processor, according to the prior art, will be described with reference to FIGs. 1a to 3b, as follows.
FIG. 1a illustrates a block diagram of a conventional single core processor accessing a shareable resource. The single processor core may have multiple threads to be executed. Further, a single data memory pool and a single metadata file are allocated for the single core processor to execute the threads. Also, memory allocation and de-allocation by the processor may need intra-core locks (i.e. interrupt locks) for protecting a critical section such as the metadata file. The metadata file may be protected across the multiple threads.
FIGs. 1b and 1c are flowcharts depicting a conventional method to access the shareable resource by the single core processor.
The memory allocation/de-allocation may need to have both intra-core lock (i.e. interrupt locks) and inter-core lock (i.e. spin lock) for protecting the critical section such as metadata. At an acquire int-lock step, the processor may disable the interrupts and may not allow another thread to execute. At a release int-lock step (shown in FIG. 1c), the processor may allow the waiting threads to resume and start again from the acquire int-lock. The steps labeled as 'A' can be a critical section. In the critical section, the processor may allow one thread at a time and other threads may need to wait at acquire int-lock step.
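For contrast with the lockless scheme of the present disclosure, the conventional critical section 'A' can be sketched as follows: every allocation disables interrupts around the metadata, so all other threads stall at the acquire step. The names (acquire_int_lock, metadata_t, conventional_alloc) and the interrupt-flag variable are illustrative stand-ins, not taken from the figures.

```c
#include <stddef.h>

static int interrupts_enabled = 1;  /* illustrative stand-in for the CPU flag */

static void acquire_int_lock(void) { interrupts_enabled = 0; }
static void release_int_lock(void) { interrupts_enabled = 1; }

/* Single shared metadata file: the critical section 'A'. */
typedef struct {
    void *blocks[8];
    int   free_count;
} metadata_t;

/* Every thread funnels through the int-lock to touch the metadata, so only
 * one thread proceeds at a time while the others wait at the acquire step. */
static void *conventional_alloc(metadata_t *md)
{
    acquire_int_lock();
    void *blk = md->free_count ? md->blocks[--md->free_count] : NULL;
    release_int_lock();
    return blk;
}
```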
FIG. 2a illustrates a block diagram of a conventional multi-core processor accessing a shareable resource.
The multi-core processor with each core may have multiple threads to be executed. The multi-core processor may be allocated with a single data memory pool and a single metadata file to execute the multiple threads. Further, the memory allocation and de-allocation may need to have inter-core locks (i.e. spin locks) and intra-core locks (i.e. interrupt locks) for protecting the critical section such as metadata file. The locks may affect parallelism in the multi-core processor.
FIG. 2b illustrates a block diagram of a conventional multi-core processor accessing a shareable resource in the dedicated metadata file for each core of the processor.
The metadata file may be allocated for each core of the processor. Further, each thread may apply intra-core locks (i.e. int-locks) while accessing the dedicated metadata file. However, the inter-core locks may be needed for protecting the metadata during de-allocation or release of the critical section such as metadata file.
FIG. 2c illustrates a block diagram of a conventional multi-core processor accessing a shareable resource based on cross core release.
In the cross core release scenario, the core-0 may allocate 1 memory block from the memory pool-0 associated with the core-0. The information regarding the allocated memory block may be updated by the core-0 to the metadata-0 file associated with the core-0. Further, the core-0 may share the memory block data to core-1. However, the inter-core locks may be needed for protecting the metadata during de-allocation or release of the critical section such as metadata file.
FIG. 2d illustrates a block diagram of a conventional multi-core processor accessing the same metadata block for allocating a memory block.
In the cross-core release scenario, the core-1 may release the block-0, shared by the core-0, to the metadata-0 block. At the same time, the core-0 may try to allocate another block to another core by accessing the metadata-0 file/block. However, the metadata-0 can still be the common/shareable resource and a critical section. Accordingly, the inter-core lock (i.e. spin-lock) may not be avoided.
FIGs. 3a and 3b are flowcharts depicting a conventional method for allocating and releasing by the multi-core processor. To access the busy and free lists while freeing a pointer belonging to a different core, the conventional methods may need to protect the critical section metadata, whereby the dedicated resource may not be advantageous for the multi-core to achieve parallelism. In the critical section labeled as 'B' (shown in FIGs. 3a and 3b), the thread_1 in the core-0 may acquire the int-lock and may acquire the spinlock. Accordingly, the other threads in the core-0 may need to wait for the release of the int-lock. Further, the core-1, core-2, and core-3 may need to wait for the release of the spin lock.
In conventional methods, the dynamic memory management solutions may use per-core/per-thread static resource (memory) allocation to handle the incoming data stream blocks and release the data stream blocks. Accordingly, the allocation by the processor may be static, and the resource (memory) may be released by the same core/thread. However, the access to the common/shareable resource may have an access conflict if the resources allocated by a particular core are de-allocated/released by the other cores.
However, the conventional methods may not support allocation of memory/buffer from one core/thread and release by another core/thread. The conventional methods may not provide dynamic buffer access and release in the case of a multi-threaded/multi-core system.
The present disclosure has been made to address at least the above problems and/or disadvantages and to provide at least the advantages described below. Accordingly, an aspect of the present disclosure is to provide an apparatus and methods for managing a shareable resource associated with a multi-core processor in a computing environment.
Another aspect of the present disclosure is to provide apparatus and methods for resource management in a multi-core processor by having a per-core/per-thread dedicated resource pool and metadata.
Another aspect of the present disclosure is to provide apparatus and methods for cross core de-allocation of resources using a special release queue management with an exclusive set of sub-queues.
Another aspect of the present disclosure is to provide apparatus and methods for monitoring occupancy level of each memory pool and dynamically adjusting the allocation per-pool in a lockless manner.
Accordingly, the embodiments herein provide a method for managing a shareable resource in a multi-core processor. The method includes accessing, by a target processing core, the shareable resource associated with a source processing core. The source processing core and the target processing core reside in the multi-core processor. The method includes generating, by the source processing core, a plurality of release sub queues corresponding to each of at least one target processing core in a release queue of the source processing core, based on the accessed sharable resource, to release a shareable resource assigned by the source processing core to the target processing core. The method includes releasing, by the target processing core, at least one accessed shareable resource to the generated respective plurality of release sub queues in the release queue of the source processing core, based on analyzing a first information relating to the sharable resource. The first information relating to the sharable resource is stored in a metadata file. The method includes updating, by the source processing core, a second information in the stored metadata file corresponding to the source processing core, based on identifying the release of the shareable resource in the release queue.
Accordingly, the embodiments herein provide an apparatus for managing a shareable resource in a multi-core processor. The apparatus is configured to access, by the target processing core, the shareable resource associated with a source processing core. The apparatus is configured to generate, by the source processing core, a plurality of release sub queues corresponding to each of at least one target processing core in a release queue of the source processing core, based on the accessed sharable resource, to release a shareable resource assigned by the source processing core to the target processing core. The apparatus is configured to release, by the target processing core, at least one accessed shareable resource to the generated respective plurality of release sub queues in the release queue of the source processing core, based on analyzing a first information relating to the sharable resource, wherein the first information relating to the sharable resource is stored in a metadata file. The apparatus is configured to update, by the source processing core, a second information in the stored metadata file corresponding to the source processing core, based on identifying the release of the shareable resource in the release queue.
These and other aspects of the example embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating example embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the example embodiments herein without departing from the spirit thereof, and the example embodiments herein include all such modifications.
The above and other aspects, features, and advantages of certain embodiments of the present disclosure will be more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:
FIG. 1a illustrates a block diagram of a conventional single core processor accessing a shareable resource;
FIGs. 1b and 1c are flowcharts depicting a conventional method to access the shareable resource by the single core processor;
FIG. 2a illustrates a block diagram of a conventional multi-core processor accessing a shareable resource;
FIG. 2b illustrates a block diagram of a conventional multi-core processor accessing a shareable resource in the dedicated metadata file for each core of the processor;
FIG. 2c illustrates a block diagram of a conventional multi-core processor accessing a shareable resource based on cross core release;
FIG. 2d illustrates a block diagram of a conventional multi-core processor accessing the same metadata block for allocating a memory block;
FIGs. 3a and 3b are flowcharts depicting a conventional method for allocating and releasing by the multi-core processor;
FIG. 4 illustrates an apparatus for managing a shareable resource in a multi-core processor in a computing environment, according to embodiments of the present disclosure;
FIG. 5a illustrates a block diagram for managing a shareable resource using a single release queue for each core of the multi-core processor, according to embodiments of the present disclosure;
FIG. 5b illustrates a block diagram for managing a shareable resource using a plurality of release sub queues for each core of the multi-core processor, according to embodiments of the present disclosure;
FIG. 6a is a flowchart depicting a method for adding pointers in the release queues, according to embodiments of the present disclosure;
FIG. 6b is a flowchart depicting a method for releasing pointers in the release queues, according to embodiments of the present disclosure;
FIG. 7a is a flowchart depicting a method for allocation across cores using per-core and/or per-thread dedicated resource pool, according to embodiments of the present disclosure;
FIG. 7b is a flowchart depicting a method for release across the cores using per-core and/or per-thread dedicated release queues, according to embodiments of the present disclosure;
FIG. 8a is a flow chart depicting a method for dynamic pool adjustment, according to embodiments of the present disclosure;
FIG. 8b is a flowchart depicting steps for dynamic pool adjustment, according to embodiments of the present disclosure;
FIG. 9a is a flowchart depicting a method for managing a shareable resource in the multi-core processor, according to embodiments of the present disclosure;
FIG. 9b is a flowchart depicting a method for determining whether the accessed at least one shareable resource corresponds to the source processing core, according to embodiments of the present disclosure;
FIG. 9c is a flowchart depicting a method for pushing the shareable resource of the source processing core to the release sub queue marked in the free list, during dynamic pool adjustment, according to embodiments of the present disclosure; and
FIG. 9d is a flowchart depicting a method for updating by the source processing core, a metadata file corresponding to the source processing core, according to embodiments of the present disclosure.
Throughout the drawings, like reference numerals will be understood to refer to like parts, components, and structures.
The example embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The description herein is intended merely to facilitate an understanding of ways in which the example embodiments herein can be practiced and to further enable those of skill in the art to practice the example embodiments herein. Accordingly, this disclosure should not be construed as limiting the scope of the example embodiments herein.
The embodiments herein achieve an apparatus and methods for managing a shareable resource in a multi-core processor, by generating dedicated release sub queues. Referring now to the drawings, and more particularly to FIGs. 4 through 9d, where similar reference characters denote corresponding features consistently throughout the figures, there are shown example embodiments.
FIG. 4 illustrates an apparatus 100 for managing shareable resource in a multi-core processor 102, according to embodiments of the present disclosure.
The apparatus 100 can be at least one of, but not limited to, a server, a desktop computer, a hand-held device, a multiprocessor system, microprocessor-based programmable consumer electronics, a laptop, a network computer, a minicomputer, a mainframe computer, a modem, a vehicle infotainment system, a consumer electronics device, and so on. The apparatus 100 may include a multi-core processor 102 and a memory 104. The memory 104 can be at least one of, but not limited to, a static memory, a dynamic memory, a flash memory, a cache memory, a random access memory (RAM), and so on.
The processor 102 or multi-core processor 102 may include a plurality of cores, such as a source processing core 102a and at least one target processing core 102b. The source processing core 102a can be at least one of a core 0, a core 1, a core 2, a core 3, and so on. The target processing core can be at least two of the core 0, core 1, core 2, core 3, and so on. The source processing core 102a may assign a dedicated memory block for each core of the multi-core processor 102. The memory 104 may include a shareable resource, such as at least one of, but not limited to, metadata, a data stream, a packet, and so on. Further, the apparatus 100 may include release pointers stored in a static memory or a static array. The release pointers may have one or more release queues dedicated to each core of the multi-core processor 102. The release queues may further have dedicated release sub queues, such as entry queues, for each core of the processor 102 or multi-core processor 102.
Further, the apparatus may include an input interface (not shown) and an output interface (not shown) that are connected by a bus (not shown), which may represent one or more system and/or peripheral busses. The data source to the apparatus 100 and the multi-core processor 102 can be at least one of, but not limited to, packetized data from applications, databases, computer networks, scientific instrumentation, real-time video capture devices, and so on. The apparatus 100 may also include volatile and/or non-volatile memory (not shown), removable and/or non-removable media, processor-readable instructions, data structures, program modules, other data, and so on. The volatile and/or non-volatile memory includes at least one of, but not limited to, random access memory (RAM), read-only memory (ROM), EEPROM, flash memory, CD-ROMs, digital versatile discs (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the source processing core 102a and/or the at least one target processing core 102b.
The flash memory or other form of fixed or removable storage medium in the apparatus 100 may be used to store desired programmable instructions and program data and may be accessed by the cores, such as the source processing core 102a and the at least one target processing core 102b. Further, an operating system (OS)/real time operating system (RTOS) of the apparatus 100 may allow partitioning the physical address space of the memory 104 for managing the shareable resources. The memory 104 may permit multiple concurrent read/write operations.
The operating system (OS)/real time operating system (RTOS) may include at least one of sub-modules such as, but not limited to, a kernel processing module, a thread managing module, a process managing module, an input/output ("I/O") managing module, a memory managing module, and so on. The process managing module may perform multitasking by initializing, scheduling, and switching processes of the OS for accessing the cores of the multi-core processor 102. The thread managing module may manage the instantiation and execution of application threads, including receiving threads and sending threads of the multi-core processor 102. For example, the thread managing module may allocate the threads for execution among the cores of the multi-core processor 102. The memory managing module may control the allocation, use, and de-allocation of the physical address space provided by the memory 104.
Advantageously, at least one aspect of the embodiments herein enables the dynamic redistribution of a shareable resource across logical partitions under the direction of a workload manager (not shown). The shareable resource may include at least one of, but not limited to, CPU (central processing unit) resources, logical processor resources, input/output resources, coprocessor resources, channel resources, network adapters, memory resources, an audio, a display, common peripherals, serial ports, parallel ports, and so on. In an example, during the execution of a task, the memory managing module may typically allocate a stack and a heap for allocating blocks of memory. The allocated memory blocks may be referenced by the pointers.
The apparatus 100 may process incoming data received by the input interface and may parallelize the incoming data. The input interface can be at least one of a network interface card (NIC), a programmable NIC, an analog-to-digital converter (not shown), and so on, coupled to the multi-core processor 102. The apparatus 100 may have a buffer mapped to the memory 104, which may be used to store intermediate data.
The memory blocks 103a-103d, as shown in FIG. 4, may vary in length. The memory blocks 103a-103d may include Ethernet datagrams, internet protocol packets, asynchronous transfer mode (ATM) cells, data constituting a run of a scientific instrument, a video frame or video coding block, an image, a block of instrument data, and so on. Further, the threads may be sliced among the cores of the multi-core processor 102 during execution. Each thread or memory block may include similar components operating similarly. Each core of the multi-core processor 102 may also have different numbers of buffers and send threads. The memory 104 may also include metadata dedicated to each core of the multi-core processor 102. In an example, the metadata may include a reference to threads, a reference to pointers, a reference to a location in the memory 104, a length of the memory block, and so on.
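As an illustration of such per-core metadata, a minimal layout might look as follows. The struct and field names (core_metadata_t, free_list, busy_list) and the bitmap representation are assumptions for this sketch, not the exact format used by the apparatus 100:

```c
/* Hypothetical per-core metadata layout; field names are illustrative. */
#include <assert.h>

#define BLOCKS_PER_POOL 32

typedef struct {
    int      pool_id;     /* core ID that owns this pool (e.g. 2 for core 2) */
    unsigned block_len;   /* length of each memory block in bytes            */
    unsigned free_list;   /* bitmap: bit i set => block i is free            */
    unsigned busy_list;   /* bitmap: bit i set => block i is allocated       */
} core_metadata_t;

/* Mark block `i` allocated in the owning core's metadata. */
void md_mark_busy(core_metadata_t *md, int i)
{
    md->free_list &= ~(1u << i);
    md->busy_list |=  (1u << i);
}

/* Mark block `i` free again once its release has been identified. */
void md_mark_free(core_metadata_t *md, int i)
{
    md->busy_list &= ~(1u << i);
    md->free_list |=  (1u << i);
}
```

With this layout, parsing the metadata file to obtain the pool ID, free list, and busy list reduces to reading one small record per core.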
In an embodiment, the apparatus 100 is configured to assign at least one shareable resource stored in a memory 104 to at least one target processing core 102b, based on a determined type of task to be executed by the multi-core processor 102. In an embodiment, the at least one shareable resource is assigned by the source processing core 102a. In an embodiment, the apparatus 100 is configured to store information related to the assigned at least one shareable resource in a metadata file corresponding to the source processing core 102a. In an embodiment, the apparatus 100 is configured to access, by a target processing core 102b, the shareable resource associated with a source processing core 102a. In an embodiment, the apparatus 100 is configured to provide access to the assigned at least one shareable resource for the at least one target processing core 102b, based on the information stored in the metadata file. In an embodiment, the apparatus 100 is configured to determine whether the accessed at least one shareable resource corresponds to the source processing core 102a, based on the stored metadata. In an embodiment, the accessed at least one shareable resource is determined by the at least one target processing core 102b based on accessing the at least one shareable resource. In an embodiment, the apparatus 100 is configured to generate a plurality of release sub queues corresponding to each of the at least one target processing core 102b in a release queue of the source processing core 102a, to release a shareable resource assigned by the source processing core 102a to the target processing core 102b. In an embodiment, the apparatus 100 is configured to release the at least one accessed shareable resource to the generated respective plurality of release sub queues in the release queue of the source processing core 102a, based on analyzing first information relating to the shareable resource.
In an embodiment, the first information relating to the shareable resource is stored in a metadata file. The pointers may be stored in release sub queues. The multi-core processor 102 may access the pointers through indirect addressing mode instruction sets. In an embodiment, the shareable resource is released by the target processing core 102b. In an embodiment, the apparatus 100 is configured to identify whether the accessed shareable resource is released by the at least one target processing core 102b, based on analyzing the release queue corresponding to the source processing core 102a and the at least one target processing core 102b. In an embodiment, the apparatus 100 is configured to update second information in the stored metadata file corresponding to the source processing core 102a, based on identifying the release of the shareable resource in the release queue.
In an embodiment, the apparatus 100 is configured to determine the available space of each release sub queue in the release queue. In an embodiment, the apparatus 100 is configured to determine whether the available space of each release sub queue is above a pre-defined threshold value or below the pre-defined threshold value. In an embodiment, the apparatus 100 is configured to update information corresponding to a free list and a busy list of the analyzed available space of each release sub queue, in the metadata file corresponding to the respective source processing core 102a and the at least one target processing core 102b. In an embodiment, the apparatus 100 is configured to set a deficient flag corresponding to the source processing core 102a and the at least one target processing core 102b, if the available space of each release sub queue is determined to be below the pre-defined threshold value, based on the updated metadata file. In an embodiment, the apparatus 100 is configured to push the shareable resource of the source processing core 102a to the release sub queue marked in the free list by dynamically adjusting the pool size of the release queue, if the available space of the release sub queue is determined to be below the pre-defined threshold value. In an embodiment, the apparatus 100 is configured to trigger the release of the shareable resource during the assigning of the shareable resource, if the release queue corresponding to the source processing core 102a has available space. In an embodiment, the apparatus 100 is configured to cause the release of the shareable resource during the release of the shareable resource to the release queue, if the release queue corresponding to the source processing core has available space. In an embodiment, the apparatus 100 is configured to parse the metadata file, to determine at least one of a pool ID, the free list of the pool ID, the busy list of the pool ID, and the assigned shareable resource ID.
The pool ID can be a core ID/core number of the respective memory pool. For example, the pool ID of the memory pool corresponding to core 2 is 2. In an embodiment, the information relating to shareable resource comprises a pool ID, and the assigned shareable resource ID, a resource block ID, and an assigned core ID.
In an embodiment, the source processing core 102a and the at least one target processing core 102b comprise at least one of a core 0, a core 1, a core 2, a core 3, and so on. In an embodiment, assigning the at least one shareable resource comprises allocating a memory block in the memory 104 to each processing core of the source processing core 102a and the target processing core 102b for accessing the shareable resource in the memory block. In an embodiment, the metadata file is generated for each processing core of the source processing core 102a and the target processing core 102b and stored sequentially according to the order of each processing core. In an embodiment, the release queue corresponding to the source processing core 102a comprises at least one entry queue corresponding to the at least one target processing core 102b. In an embodiment, releasing the at least one shareable resource comprises updating the at least one entry queue corresponding to the at least one target processing core 102b. In an embodiment, the shareable resource is assigned by the source processing core 102a and released by the at least one target processing core 102b.
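The per-core release queue with one exclusive entry queue (release sub queue) per releasing core can be sketched as a set of single-producer/single-consumer ring buffers, so that no two cores ever write the same queue. All names (release_q, rq_push, rq_drain, SUBQ_DEPTH) and the fixed four-core layout are illustrative assumptions, not the exact implementation of the apparatus 100:

```c
/* Sketch of per-core release queues with one exclusive sub-queue
 * ("entry queue") per releasing core, as SPSC ring buffers. */
#include <assert.h>
#include <stddef.h>

#define NUM_CORES  4
#define SUBQ_DEPTH 8          /* entries per sub-queue */

typedef struct {
    void    *slot[SUBQ_DEPTH];
    unsigned head;            /* advanced only by the owner (consumer) core */
    unsigned tail;            /* advanced only by the releasing (producer) core */
} sub_queue_t;

/* release_q[owner][releaser]: core `releaser` frees blocks owned by `owner` */
sub_queue_t release_q[NUM_CORES][NUM_CORES];

/* Called on the target core: hand a block back to its owner, locklessly. */
int rq_push(int owner, int releaser, void *block)
{
    sub_queue_t *q = &release_q[owner][releaser];
    unsigned next = (q->tail + 1) % SUBQ_DEPTH;
    if (next == q->head)
        return -1;            /* sub-queue has no available space */
    q->slot[q->tail] = block;
    q->tail = next;
    return 0;
}

/* Called on the owner core: drain all its sub-queues back into the pool. */
int rq_drain(int owner, void **out, int max)
{
    int n = 0;
    for (int releaser = 0; releaser < NUM_CORES && n < max; releaser++) {
        sub_queue_t *q = &release_q[owner][releaser];
        while (q->head != q->tail && n < max) {
            out[n++] = q->slot[q->head];
            q->head = (q->head + 1) % SUBQ_DEPTH;
        }
    }
    return n;
}
```

Because each (owner, releaser) pair has its own queue, a cross-core free never contends with another core's free or with the owner's allocation path.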
FIG. 4 illustrates functional components of the computer-implemented system. In some cases, a component may be a hardware component, a software component, or a combination of hardware and software. Some of the components may be application level software, while other components may be operating system level components. In some cases, the connection of one component to another may be a close connection where two or more components are operating on a single hardware platform. In other cases, the connections may be made over network connections spanning long distances. Each embodiment may use different hardware, software, and interconnection architectures to achieve the functions described.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
FIG. 5a illustrates a block diagram for managing shareable resource using a single release queue for each core of the multi-core processor 102, according to embodiments of the present disclosure.
Consider a cross core release scenario, where the core-1 may release the resource block 0 from the memory 104, which may be shared by the core-0, to the core-0 release queue. At the same time, the core-0 may allocate/assign another resource block on accessing the metadata-0 without any conflict. In this scenario, the processor is mapped to four release pointer queues for four cores. The pointers may be added or generated in the release queue. For instance, the release queues can be a shareable resource for the cores of the multi-core processor 102. The release queues can be concurrently accessed by multiple cores. The critical section, for instance the release queue, may be protected.
FIG. 5b illustrates a block diagram for managing a shareable resource using a plurality of release sub queues for each core of the multi-core processor 102, according to embodiments of the present disclosure.
In an embodiment, a plurality of release sub queues for each of the at least one target processing core 102b in a release queue corresponding to the source processing core 102a is generated, to release the determined shareable resource corresponding to the source processing core 102a. An exclusive release sub queue for each processing core is added in the pointers. In an embodiment, the at least one accessed shareable resource is released to the respective plurality of release sub queues in the release queue corresponding to the source processing core 102a, based on the analyzed information in the metadata file. The shareable resource is released by the target processing core 102b.
The embodiments herein may provide a per-core exclusive release queue (ERQ), and the ERQ may be added in the case of a single-core processor based on per-thread execution. The shareable resource is added in the entry queue of the release pointers to avoid locks on the shareable resource.
FIG. 6a is a flowchart depicting a method for adding pointers in the release queues, according to embodiments of the present disclosure.
The pointers in the release queue may be added by the multi-core processor 102. At step 611, the target processing core 102b determines whether the release queue/release pointer belongs to the target processing core 102b. At step 613, the target processing core 102b may release the pointers to the respective release queue of the source processing core 102a, if the release queue/release pointer does not belong to the target processing core 102b. The release queue of the respective core may be updated with the shareable resource based on the information stored in the metadata file of the respective core.
FIG. 6b is a flowchart depicting a method for releasing pointers in the release queues, according to embodiments of the present disclosure.
In an example, the shareable resource assigned by the core 0 may be released to the release sub queue of the release queue/release pointer. At step 621, the target processing core 102b, such as core 1, core 2, or core 3, determines whether the release queue/release pointer has available space. At step 623, if the release queue has available space, then the target processing core 102b may call the release operation to release the shareable resource and add it to the release sub queue/entry queue in the release queue/release pointers.
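The decision at steps 611/613 and 621/623 can be sketched as follows, under the assumption of a hypothetical mem_release helper and a simplified per-(owner, releaser) entry-queue array: a same-core free returns the block to the pool directly, while a cross-core free only appends the pointer to the owner's entry queue:

```c
/* Illustrative release path: same-core frees are direct; cross-core
 * frees are deferred through the owner's entry queue. All names are
 * assumptions for this sketch. */
#include <assert.h>
#include <stddef.h>

#define NUM_CORES   4
#define QUEUE_DEPTH 8

typedef struct {
    int owner_core;   /* core that allocated/assigned this block */
    int in_use;
} mem_block_t;

/* entry_q[owner][releaser] holds pointers awaiting the owner's drain pass */
mem_block_t *entry_q[NUM_CORES][NUM_CORES][QUEUE_DEPTH];
int entry_count[NUM_CORES][NUM_CORES];

/* Returns 1 if freed locally, 0 if deferred to the owner's release queue,
 * -1 if the entry queue had no available space (caller retries later). */
int mem_release(int current_core, mem_block_t *blk)
{
    if (blk->owner_core == current_core) {
        blk->in_use = 0;                     /* same-core free: direct */
        return 1;
    }
    int *n = &entry_count[blk->owner_core][current_core];
    if (*n >= QUEUE_DEPTH)
        return -1;                           /* no available space */
    entry_q[blk->owner_core][current_core][(*n)++] = blk;
    return 0;                                /* cross-core free: deferred */
}
```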
FIG. 7a is a flowchart depicting a method for allocation across cores using per-core and/or per-thread dedicated resource pool, according to embodiments of the present disclosure.
In an example, to allocate/assign the memory block by the source processing core 102a, at steps 703 and 705, the multi-core processor 102 may acquire an int-lock and get the core ID from the stored metadata file of the respective processing core. At step 707, the multi-core processor 102 determines whether the release pointer corresponding to the target processing core 102b is not empty. If the release pointer corresponding to the target processing core 102b is not empty, at step 709, the free list of the release pointer may be determined by parsing the metadata of the respective processing core. Further, at step 711, the memory blocks may be analyzed to determine the allocated and free memory blocks. If memory blocks are available, at step 713, the free and busy memory blocks may be updated in the list and stored in the metadata file of the respective processing core. At step 715, the acquired int-lock may be released after the execution of the task. After that, at step 717, the addresses of the memory blocks may be returned.
Meanwhile, if the memory blocks are not available, at steps 719 and 721, the acquired int-lock may be released and NULL may be returned.
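Under the stated assumption of a per-core pool with free/busy lists, the allocation path of FIG. 7a might be sketched as below; the int-lock is omitted in this single-threaded sketch and all names (core_pool_t, pool_alloc, pool_free) are hypothetical:

```c
/* Minimal sketch of the allocation path (steps 703-721): take the
 * lowest free block from the core's pool, or return NULL when none
 * is available. */
#include <assert.h>
#include <stddef.h>

#define BLOCKS    8
#define BLOCK_LEN 64

typedef struct {
    unsigned char data[BLOCKS][BLOCK_LEN];
    unsigned      free_list;          /* bit i set => block i free */
} core_pool_t;

void *pool_alloc(core_pool_t *p)
{
    /* int-lock acquire/release (steps 703/715) omitted for brevity */
    for (int i = 0; i < BLOCKS; i++) {
        if (p->free_list & (1u << i)) {
            p->free_list &= ~(1u << i);   /* move block to the busy list */
            return p->data[i];            /* step 717: return the address */
        }
    }
    return NULL;                          /* step 721: pool exhausted */
}

void pool_free(core_pool_t *p, void *blk)
{
    int i = (int)(((unsigned char *)blk - p->data[0]) / BLOCK_LEN);
    p->free_list |= 1u << i;              /* back on the free list */
}
```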
In an example, consider a heap management scenario, in which dynamic memory may be allocated and de-allocated. Further, the thread or processing core may request the heap manager to allocate memory for its data. The dynamic memory manager may allocate the memory, wherein the details of the amount of allocated memory may be stored in the metadata file of the respective processing core memory. Accordingly, for instance, the metadata may be a common/shareable resource.
In an example, consider pipelined work (i.e. cross-core free of packet buffers in a modem). The packet processing can be performed in four stages, such as:
Stage 1: MAC processing
Stage 2: RLC processing
Stage 3: PDCP processing
Stage 4: Application packet routing
In a functionally decomposed parallel execution design, such as a four-core processor, each stage can be allowed to run on an individual core. A packet buffer allocated by the MAC processing core will be released by another core, for instance, the application packet routing core. In the functionally decomposed parallel execution, the common resource is the metadata of the heap manager that may be needed to allocate memory each time a new packet arrives.
In another example, consider load balancing in symmetric multi-processing (SMP) systems. Accordingly, a multi-core SMP operating system may have per-core ready and wait queues for threads. In the course of dynamic load balancing performed by a scheduler, the ready and wait queues are accessed across cores. In a Linux SMP multi-core scheduler, the processor pushes threads from a busy core to the ready queues of free cores. In the load balancing SMP systems, the common resources can be at least one of, but not limited to, the metadata of the heap manager that may be needed to allocate memory, and operating system (OS) metadata or a task control block (i.e. accessed by the OS scheduler from different cores concurrently for load balancing).
FIG. 7b is a flowchart depicting a method for release across the cores using per-core and/or per-thread dedicated release queues, according to embodiments of the present disclosure.
In an example, to release the allocated memory block by the source processing core 102a, at steps 723 and 725, the multi-core processor 102 may acquire an int-lock and get the core ID from the stored metadata file of the respective processing core. At step 727, the multi-core processor 102 determines whether the release pointer corresponding to the target processing core 102b is not empty. If the release pointer corresponding to the target processing core 102b is not empty, at step 729, the free list of the release pointers may be determined by parsing the metadata of the respective processing core. The pointer may be called by the target processing core 102b if the release pointers are empty. Then, at step 731, the target processing core 102b may determine if the pointer or the shareable resource belongs to the core ID. If the pointer or the shareable resource belongs to the core ID, at steps 733 and 735, the metadata may be parsed to determine the busy list, and the free and busy lists may be updated in the metadata. At step 737, the acquired int-lock may be released after the execution of the task.
Meanwhile, if the pointer or the shareable resource does not belong to the core ID, at step 739, the release queue may be updated.
FIG. 8a is a flow chart depicting a method for dynamic pool adjustment, according to embodiments of the present disclosure.
Referring to (a) shown in FIG. 8a, to allocate/assign the memory block by the source processing core 102a, at steps 803 and 805, the multi-core processor 102 may acquire an int-lock and get the core ID from the stored metadata file of the respective processing core. At step 807, the multi-core processor 102 releases the pointers in the release pointer queue.
And at step 809, the multi-core processor 102 adjusts the dynamic pool size. In an example, the dynamic re-adjustment of the resource pool size dedicated per-core/per-thread may be performed based on monitoring the occupancy level of each pool during run time. Further, due to the exclusive sub release queues, the resource pool size may be dynamically adjusted for each core in an efficient and lockless way. Adjusting the dynamic pool size per core has the advantage of optimizing the usage of the overall resource pool size. Also, the resources may not be left unused for a longer duration.
At step 811, the memory blocks may be analyzed to determine the allocated and free memory blocks. If memory blocks are available, at step 813, the free and busy memory blocks may be updated in the list and stored in the metadata file of the respective processing core. At step 815, the acquired int-lock may be released after the execution of the task. After that, at step 817, the addresses of the memory blocks may be returned.
Meanwhile, if the memory blocks are not available, at steps 819 and 821, the acquired int-lock may be released and NULL may be returned.
Referring to (b) shown in FIG. 8a, at steps 823 to 829, the same operations as those at steps 803 to 809 are performed. And at steps 831 to 839, the same operations as those at steps 731 to 739 are performed. Therefore, detailed description thereof will be omitted here.
FIG. 8b is a flowchart depicting steps for dynamic pool adjustment, according to embodiments of the present disclosure.
In an example, at step 871, the multi-core processor 102 may determine the available space of each release sub queue in the release queue and analyze whether the available space of each release sub queue is above the threshold value or below the threshold value. Information corresponding to the analyzed available space of each release sub queue may be updated in the metadata file corresponding to the respective source processing core 102a and the at least one target processing core 102b. In other words, if the available space of each release sub queue is above the threshold value or below the threshold value, the multi-core processor 102 may set a deficient flag or a sufficient flag for adjusting the pool size, at step 875. Meanwhile, if the available space of each release sub queue is neither above the threshold value nor below the threshold value, the multi-core processor 102 may remove the deficient flag and sufficient flag for the current core, at step 873.
Further, if the source processing core 102a and the at least one target processing core 102b need to either allocate or release the resources (memory blocks), then the source processing core 102a and the at least one target processing core 102b may check the condition of the deficient or sufficient flag for each core, based on the lower threshold value and the upper threshold value, as shown collectively in FIG. 8b. In run-time, when the source processing core 102a and the at least one target processing core 102b need to either allocate or release the resources, the source processing core 102a and the at least one target processing core 102b may check if the current state is sufficient, at step 877. If the current state is sufficient, then the respective processing core may contribute to other deficient core(s), by initially changing the ownership of the memory block to the deficient core(s), at step 879. Further, the respective processing core may write the pointer into the entry block of the sufficient core, in the memory block of the exclusive sub release queue corresponding to the deficient core, at steps 881 and 883. As shown in FIG. 8a, the GUARD SPACE between the threshold values may need to be maintained to avoid frequent conflicts between any cores switching between the deficient and sufficient states.
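The threshold check and block donation described above can be sketched as follows; the threshold values, the width of the guard space, and the names (pool_state, pool_rebalance) are illustrative assumptions, not the exact scheme of the embodiments:

```c
/* Sketch of the dynamic pool adjustment: each core's free-block count
 * is compared against lower/upper thresholds with a guard space in
 * between, and a sufficient core donates a block to a deficient one. */
#include <assert.h>

#define NUM_CORES       4
#define LOWER_THRESHOLD 2   /* free blocks below this => deficient  */
#define UPPER_THRESHOLD 6   /* free blocks above this => sufficient */
                            /* the gap in between is the guard space */

typedef enum { STATE_NEUTRAL, STATE_DEFICIENT, STATE_SUFFICIENT } pool_state_t;

int free_blocks[NUM_CORES];

/* Steps 871-875: derive the deficient/sufficient flag for one core. */
pool_state_t pool_state(int core)
{
    if (free_blocks[core] < LOWER_THRESHOLD) return STATE_DEFICIENT;
    if (free_blocks[core] > UPPER_THRESHOLD) return STATE_SUFFICIENT;
    return STATE_NEUTRAL;   /* inside the guard space: no flag is set */
}

/* Steps 877-883: a sufficient core contributes one block to a deficient
 * core by transferring ownership; returns 1 when a transfer happened. */
int pool_rebalance(int donor, int receiver)
{
    if (pool_state(donor) != STATE_SUFFICIENT ||
        pool_state(receiver) != STATE_DEFICIENT)
        return 0;
    free_blocks[donor]--;    /* ownership moves to the deficient core */
    free_blocks[receiver]++;
    return 1;
}
```

The guard space between the two thresholds keeps a core from oscillating between the deficient and sufficient states on every allocation or release.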
FIG. 9a is a flowchart depicting a method 900a for managing shareable resource in the multi-core processor 102, according to embodiments of the present disclosure.
At step 902, the method 900a includes accessing, by a target processing core 102b, the shareable resource associated with a source processing core 102a. At step 904, the method 900a includes generating, by the source processing core 102a, a plurality of release sub queues corresponding to each of at least one target processing core 102b in a release queue of the source processing core 102a, based on the accessed shareable resource, to release a shareable resource assigned by the source processing core 102a to the target processing core 102b. At step 906, the method 900a includes releasing, by the target processing core 102b, at least one accessed shareable resource to the generated respective plurality of release sub queues in the release queue of the source processing core 102a, based on analyzing first information relating to the shareable resource, wherein the first information relating to the shareable resource is stored in a metadata file. At step 908, the method 900a includes updating, by the source processing core 102a, second information in the stored metadata file corresponding to the source processing core 102a, based on identifying the release of the shareable resource in the release queue.
The various actions in method 900a may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 9a may be omitted.
FIG. 9b is a flowchart depicting a method 900b for determining whether the accessed at least one shareable resource corresponds to the source processing core 102a, according to embodiments of the present disclosure.
At step 912, the method 900b includes assigning, by the source processing core 102a, at least one shareable resource stored in a memory 104 to at least one target processing core 102b, based on a determined type of task to be executed by the multi-core processor 102. At step 914, the method 900b includes storing, by the source processing core 102a, the first information related to the assigned at least one shareable resource in a metadata file corresponding to the source processing core 102a. At step 916, the method 900b includes providing, by the source processing core 102a, access to the assigned at least one shareable resource for the at least one target processing core 102b, based on the information stored in the metadata file. At step 918, the method 900b includes determining, by the target processing core 102b, whether the accessed at least one shareable resource corresponds to the source processing core 102a, based on the stored metadata file. In an embodiment, the accessed at least one shareable resource is determined by the at least one target processing core 102b based on accessing the at least one shareable resource.
The various actions in method 900b may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 9b may be omitted.
FIG. 9c is a flowchart depicting a method 900c for pushing the shareable resource of the source processing core 102a to the release sub queue marked in the free list, during dynamic pool adjustment, according to embodiments of the present disclosure.
At step 922, the method 900c includes determining, by the multi-core processor 102, the available space of each release sub queue in the release queue. At step 924, the method 900c includes determining, by the multi-core processor 102, whether the available space of each release sub queue is above or below the pre-defined threshold value. At step 926, the method 900c includes updating, by the multi-core processor 102, information corresponding to a free list and a busy list of the analyzed available space of each release sub queue, in the metadata file corresponding to the respective source processing core 102a and the at least one target processing core 102b. At step 928, the method 900c includes setting, by the multi-core processor 102, a deficient flag corresponding to the source processing core 102a and the at least one target processing core 102b, if the available space of each release sub queue is below the pre-defined threshold value, based on the updated metadata file. At step 930, the method 900c includes pushing, by the multi-core processor 102, the shareable resource of the source processing core 102a to the release sub queue marked in the free list, by dynamically adjusting the pool size of the release queue, if the available space of the release sub queue is below the pre-defined threshold value.
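Steps 922-930 amount to a threshold scan over the release sub queues. A minimal sketch, assuming a fixed illustrative threshold and leaving out the actual resource movement of step 930 (the function and variable names are not from the disclosure):

```python
THRESHOLD = 2  # illustrative pre-defined threshold (free slots per sub-queue)

def adjust_pools(subqueue_free_space):
    """Classify each release sub-queue as free or busy and flag deficient
    queues. `subqueue_free_space` maps a sub-queue ID to its available space."""
    free_list, busy_list, deficient = [], [], []
    for qid, space in subqueue_free_space.items():
        if space > THRESHOLD:
            free_list.append(qid)       # step 926: enough room remains
        else:
            busy_list.append(qid)       # step 926: running low
            deficient.append(qid)       # step 928: set the deficient flag
    # Step 930: a deficient core would then push spare resources toward
    # sub-queues on the free list, dynamically growing their pool share.
    return free_list, busy_list, deficient
```

Because each core scans and updates only its own record, the classification itself needs no lock.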
The various actions in method 900c may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 9c may be omitted.
FIG. 9d is a flowchart depicting a method 900f for updating, by the source processing core 102a, the metadata file corresponding to the source processing core 102a, according to embodiments of the present disclosure.
At step 932, the method 900f includes allocating, by the source processing core 102a, the shareable resource from a memory 104 to at least one target processing core 102b. At step 934, the method 900f includes updating, by the source processing core 102a, information relating to the shareable resource, in a metadata file corresponding to the source processing core 102a. At step 936, the method 900f includes accessing, by the target processing core 102b, the allocated shareable resource from the memory 104. At step 938, the method 900f includes determining, by the target processing core 102b, that the shareable resource was allocated by the source processing core 102a. At step 940, the method 900f includes releasing, by the target processing core 102b, the shareable resource allocated by the source processing core 102a, wherein releasing the shareable resource comprises updating a release queue corresponding to the source processing core 102a. At step 942, the method 900f includes identifying, by the source processing core 102a, the release of the shareable resource by the target processing core 102b, based on checking the release queue corresponding to the source processing core 102a. At step 944, the method 900f includes updating, by the source processing core 102a, the metadata file corresponding to the source processing core 102a. In an embodiment, the release queue corresponding to the source processing core 102a comprises at least one entry queue corresponding to the at least one target processing core 102b. In an embodiment, updating the release queue corresponding to the source processing core 102a further comprises updating at least one entry queue corresponding to the at least one target processing core 102b.
The various actions in method 900f may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 9d may be omitted.
In an example, according to the embodiments herein, the maximum possible gain may be obtained. Example test code (i.e., pseudocode) based on an experimental test is as follows:
Set Num_Iterations = 1000000
Start Timer
Repeat for Num_Iterations:
    mem = Allocate(1000)
    Release(mem)
Stop Timer and calculate running time
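A directly runnable rendering of the pseudocode above, sketched in Python rather than the original test environment. The allocator here is a plain stand-in, not the per-core pool of the disclosure, and the iteration count is reduced for a quick run:

```python
import time

def allocate(size):
    # Stand-in for Allocate(1000); a real DUT would draw from the
    # per-core resource pool described in the embodiments.
    return bytearray(size)

def release(mem):
    # Stand-in for Release(mem); a real pool allocator would return
    # the block to its free list. Here the reference is simply dropped.
    del mem

NUM_ITERATIONS = 100_000  # reduced from 1,000,000 for a quick run

start = time.perf_counter()          # Start Timer
for _ in range(NUM_ITERATIONS):
    mem = allocate(1000)
    release(mem)
elapsed = time.perf_counter() - start  # Stop Timer
print(f"{NUM_ITERATIONS} allocate/release pairs in {elapsed:.3f}s")
```

Running one such task per core on each device under test gives the per-core running times compared below.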
In another example, the test procedure can be as follows:
Create a task per core that runs the above test code.
Run the tasks in DUT#1 (with spinlock) and DUT#2 (proposed solution).
Measure running time of the tasks in each core.
The test result may comprise calculating the gain achieved by Device Under Test (DUT) #2 with respect to DUT #1.
[Table of test results: image PCTKR2019004886-appb-I000001]
Accordingly, the test results yielded a high gain. The embodiments herein may achieve multi-core parallelism, with the release call made from a different core. A second test observes the gain at various frequencies of allocate and release calls.
In an example, the test code (i.e. pseudo code) for the second test can be as follows:
Set Num_Iterations = 1000000
Set Num_wait = <Variable>
Start Timer
Repeat for Num_Iterations:
    mem = Allocate(1000)
    wait for Num_wait instructions
    Release(mem)
Stop Timer and calculate running time
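The second test's pseudocode, again sketched as runnable Python with a stand-in allocator; a busy-wait loop approximates "wait for Num_wait instructions":

```python
import time

def run_trial(num_wait, iterations=50_000):
    """Illustrative version of Test #2: insert `num_wait` units of busy
    work between allocate and release to vary the request frequency."""
    start = time.perf_counter()
    for _ in range(iterations):
        mem = bytearray(1000)        # Allocate(1000)
        for _ in range(num_wait):    # wait for Num_wait instructions
            pass
        del mem                      # Release(mem)
    return time.perf_counter() - start

# Longer waits mean fewer allocate/release calls per second, which is
# the regime where the reported gain tapers off.
t_fast = run_trial(num_wait=0)
t_slow = run_trial(num_wait=200)
```

Sweeping `num_wait` traces out the gain-versus-request-frequency curve discussed below.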
The test procedure for the second test can be as follows:
Create a task per core that runs the above test code.
Run the tasks in DUT#1 (with spinlock) and DUT#2 (proposed solution).
Measure running time of the tasks in each core and calculate as in Test #1.
Based on analyzing the gain (%) versus the frequency of requests per 1M instructions, the performance gain of the system may gradually decrease with fewer allocate/release calls per second.
Embodiments herein can allow removing spinlocks, thereby enhancing parallelism and performance. Embodiments herein may achieve overall faster access to shared resources (i.e., dynamic memory, peripheral buffer pools, etc.) by providing lockless access to resources shared across cores/threads. Embodiments herein may perform operations such as allocate, de-allocate, and adjust resource pool in a lockless way, to maximize parallelism.
Embodiments herein can be used in low-latency and high-bandwidth systems. Embodiments herein achieve faster execution for real-time multi-core applications. Embodiments herein may manage shared resources with optimal sizes, such as reduced memory usage. Embodiments herein avoid locks (spinlocks) by having a per-core/per-thread dedicated resource pool and metadata. Embodiments herein may use release queue management, with an exclusive set of sub-queues. Embodiments herein can support cross-core de-allocation of resources. Embodiments herein can monitor the occupancy level of each memory pool and dynamically adjust the allocation per pool in a lockless way. Embodiments herein can determine dynamically when to re-adjust per-core/per-function dedicated memory.
The embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the elements. The elements shown in FIG. 4 can be at least one of a hardware device, or a combination of hardware device and software module.
The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the embodiments as described herein.

Claims (15)

  1. A method for managing a shareable resource in a multi-core processor (102) comprising:
    accessing a shareable resource associated with a source processing core (102a), wherein the source processing core (102a) and at least one target processing core (102b) reside in the multi-core processor (102);
    generating a plurality of release sub queues corresponding to each of at least one target processing core (102b) in a release queue of the source processing core (102a), based on the accessed sharable resource;
    releasing at least one accessed shareable resource to the generated respective plurality of release sub queues in the release queue of the source processing core (102a), based on analyzing first information relating to the shareable resource, wherein the first information relating to the shareable resource is stored in a metadata file; and
    updating a second information in the metadata file, based on identifying the release of the shareable resource in the release queue.
  2. The method (900a) as claimed in claim 1, wherein the method (900b) further comprises:
    assigning at least one shareable resource stored in a memory (104), to the at least one target processing core (102b), based on a determined type of task to be executed by the multi-core processor (102);
    storing the first information related to the assigned at least one shareable resource in the metadata file corresponding to the source processing core (102a);
    providing an access to the assigned at least one shareable resource for the at least one target processing core (102b), based on the information stored in the metadata file; and
    determining whether the accessed at least one shareable resource corresponds to the source processing core (102a), based on the stored metadata file, wherein the accessed at least one shareable resource is determined by the at least one target processing core (102b) based on accessing the at least one shareable resource.
  3. The method (900a) as claimed in claim 1, wherein the method (900c) further comprises:
    determining the available space of each release sub queue in the release queue;
    determining whether the available space of each release sub queue is above or below a pre-defined threshold value;
    updating information corresponding to a free list and a busy list of the analyzed available space of each release sub queue, in the metadata file corresponding to the respective source processing core (102a) and the at least one target processing core (102b);
    setting a deficient flag corresponding to the source processing core (102a) and the at least one target processing core (102b), if the available space of each release sub queue is determined to be below the pre-defined threshold value, based on the updated metadata file; and
    pushing the shareable resource of the source processing core (102a) to the release sub queue marked in the free list, by dynamically adjusting the pool size of the release queue, if the available space of the release sub queue is determined to be below the pre-defined threshold value.
  4. The method as claimed in claim 1, wherein the method (900d) further comprises:
    triggering the release of the shareable resource during the assigning of the shareable resource, if the release queue corresponding to the source processing core (102a) has available space; and
    causing the release of the shareable resource during the release of the shareable resource to the release queue, if the release queue corresponding to the source processing core (102a) has available space.
  5. The method as claimed in claim 1, wherein the method (900e) further comprises:
    parsing the metadata file, to determine at least one of a pool ID, the free list of the pool ID, the busy list of the pool ID, and the assigned shareable resource ID.
  6. The method as claimed in claim 1, wherein the shareable resource comprises at least one of a memory (104) resource, a common peripherals resource, a serial port resource, a parallel port resource, a display resource, an audio resource, a multi-core processor (102) resource, a central processing unit resource, a logical processor resource, an input/output resource, a channel resource, a coprocessor resource, a network adapter resource; and wherein the source processing core (102a) and the at least one target processing core (102b) of the multi-core processor (102) comprises at least one of a core 0, a core 1, a core 2 and a core 3.
  7. The method as claimed in claim 1, wherein assigning the at least one shareable resource comprises allocating a memory block in the memory (104) to each processing core of the source processing core (102a) and the at least one target processing core (102b), for accessing the shareable resource in the memory block.
  8. The method as claimed in claim 1, wherein the metadata file is generated for each processing core of the source processing core (102a) and the at least one target processing core (102b) and stored sequentially according to an order of each processing core; and
    wherein the release queue corresponding to the source processing core (102a) comprises at least one entry queue corresponding to the at least one target processing core (102b).
  9. The method as claimed in claim 1, wherein releasing the at least one shareable resource comprises updating the at least one entry queue corresponding to the at least one target processing core (102b).
  10. The method as claimed in claim 1, wherein the shareable resource is assigned by the source processing core (102a) and released by the at least one target processing core (102b).
  11. The method as claimed in claim 1, wherein the information relating to the shareable resource comprises a pool ID, the assigned shareable resource ID, a resource block ID, and an assigned core ID.
  12. A method for managing a shareable resource in a multi-core processor (102) comprising:
    allocating a shareable resource from a memory (104) to at least one target processing core (102b), wherein a source processing core (102a) and the target processing core (102b) reside in the multi-core processor (102);
    updating information relating to the shareable resource, in a metadata file corresponding to the source processing core (102a);
    accessing the allocated shareable resource from the memory (104);
    releasing the allocated shareable resource, wherein releasing the allocated shareable resource comprises updating a release queue corresponding to the source processing core (102a);
    identifying the release of the shareable resource by the target processing core (102b), based on checking the release queue corresponding to the source processing core (102a); and
    updating, by the source processing core (102a), the metadata file corresponding to the source processing core (102a).
  13. The method as claimed in claim 12, wherein the release queue corresponding to the source processing core (102a) comprises at least one entry queue corresponding to the at least one target processing core (102b).
  14. The method as claimed in claim 12, wherein updating the release queue corresponding to the source processing core (102a) further comprises updating at least one entry queue corresponding to the at least one target processing core (102b).
  15. An apparatus (100) for managing a shareable resource in a multi-core processor (102), wherein the apparatus (100) comprises: at least one processor configured to perform the method described in any one of claims 1-14.
PCT/KR2019/004886 2018-05-04 2019-04-23 Apparatus and method for managing a shareable resource in a multi-core processor Ceased WO2019212182A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19796454.7A EP3756092A4 (en) 2018-05-04 2019-04-23 APPARATUS AND METHOD FOR MANAGING A SHARED RESOURCE IN A MULTI-CORE PROCESSOR
CN201980029692.3A CN115605846A (en) 2018-05-04 2019-04-23 Apparatus and method for managing shareable resources in a multi-core processor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201841017027 2018-05-04
IN201841017027 2018-11-26

Publications (1)

Publication Number Publication Date
WO2019212182A1 true WO2019212182A1 (en) 2019-11-07

Family

ID=68387103

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/004886 Ceased WO2019212182A1 (en) 2018-05-04 2019-04-23 Apparatus and method for managing a shareable resource in a multi-core processor

Country Status (3)

Country Link
EP (1) EP3756092A4 (en)
CN (1) CN115605846A (en)
WO (1) WO2019212182A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782419A (en) * 2020-06-23 2020-10-16 北京青云科技股份有限公司 A cache update method, device, device and storage medium
CN112286662A (en) * 2020-10-30 2021-01-29 康键信息技术(深圳)有限公司 Task pushing method, device and equipment based on shared resources and storage medium
CN112947677A (en) * 2021-02-05 2021-06-11 北京深之度科技有限公司 Timer reading method, computing device and readable storage medium
CN113190496A (en) * 2021-04-23 2021-07-30 深圳市汇顶科技股份有限公司 Kernel communication method, device, chip, electronic equipment and storage medium
CN113672398A (en) * 2021-10-25 2021-11-19 北京金睛云华科技有限公司 Memory optimization method and device of full-flow backtracking analysis system
CN114117140A (en) * 2021-11-29 2022-03-01 湖北天融信网络安全技术有限公司 A method, system and device for processing shared data
CN118069071A (en) * 2024-04-19 2024-05-24 苏州元脑智能科技有限公司 Resource access control method, device, computer equipment and storage medium
TWI887620B (en) * 2023-02-24 2025-06-21 大陸商達發科技(蘇州)有限公司 Method and optical network unit router for memory access control

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
GB2624257B (en) * 2022-11-08 2024-11-06 Cirrus Logic Int Semiconductor Ltd Systems and methods for access protection of system peripherals
CN117407123B (en) * 2023-12-12 2024-04-05 麒麟软件有限公司 USB equipment virtual sharing system based on multi-system isolation

Citations (5)

Publication number Priority date Publication date Assignee Title
US6823472B1 (en) * 2000-05-11 2004-11-23 Lsi Logic Corporation Shared resource manager for multiprocessor computer system
US20060070072A1 (en) * 2004-09-29 2006-03-30 Sony Corporation Information processing apparatus, memory area management method, and computer program
US20090182879A1 (en) * 2008-01-16 2009-07-16 Siemens Aktiengesellschaft Method for the central control of resources in expandable medical platforms
US20110265093A1 (en) * 2010-04-27 2011-10-27 Hitachi, Ltd. Computer System and Program Product
US20120198471A1 (en) * 2005-08-30 2012-08-02 Alexey Kukanov Fair scalable reader-writer mutual exclusion

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US8793464B2 (en) * 2011-11-07 2014-07-29 Sap Ag Memory management in multi-threaded multi-processor computing system
US9652289B2 (en) * 2012-04-27 2017-05-16 Microsoft Technology Licensing, Llc Systems and methods for S-list partitioning
EP2909722A1 (en) * 2012-10-19 2015-08-26 Argyle Data, Inc. Multi-threaded, lockless data parallelization
US9639403B2 (en) * 2013-03-15 2017-05-02 Genband Us Llc Receive-side scaling in a computer system using sub-queues assigned to processing cores
CN106569892B (en) * 2015-10-08 2020-06-30 阿里巴巴集团控股有限公司 Resource scheduling method and equipment


Cited By (12)

Publication number Priority date Publication date Assignee Title
CN111782419A (en) * 2020-06-23 2020-10-16 北京青云科技股份有限公司 A cache update method, device, device and storage medium
CN111782419B (en) * 2020-06-23 2023-11-14 北京青云科技股份有限公司 Cache updating method, device, equipment and storage medium
CN112286662A (en) * 2020-10-30 2021-01-29 康键信息技术(深圳)有限公司 Task pushing method, device and equipment based on shared resources and storage medium
CN112286662B (en) * 2020-10-30 2023-02-10 康键信息技术(深圳)有限公司 Task pushing method, device and equipment based on shared resources and storage medium
CN112947677A (en) * 2021-02-05 2021-06-11 北京深之度科技有限公司 Timer reading method, computing device and readable storage medium
CN112947677B (en) * 2021-02-05 2024-02-27 北京深之度科技有限公司 Timer reading method, computing device and readable storage medium
CN113190496A (en) * 2021-04-23 2021-07-30 深圳市汇顶科技股份有限公司 Kernel communication method, device, chip, electronic equipment and storage medium
CN113190496B (en) * 2021-04-23 2023-12-26 深圳市汇顶科技股份有限公司 Kernel communication method, device, chip, electronic equipment and storage medium
CN113672398A (en) * 2021-10-25 2021-11-19 北京金睛云华科技有限公司 Memory optimization method and device of full-flow backtracking analysis system
CN114117140A (en) * 2021-11-29 2022-03-01 湖北天融信网络安全技术有限公司 A method, system and device for processing shared data
TWI887620B (en) * 2023-02-24 2025-06-21 大陸商達發科技(蘇州)有限公司 Method and optical network unit router for memory access control
CN118069071A (en) * 2024-04-19 2024-05-24 苏州元脑智能科技有限公司 Resource access control method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN115605846A (en) 2023-01-13
EP3756092A4 (en) 2021-04-14
EP3756092A1 (en) 2020-12-30

Similar Documents

Publication Publication Date Title
WO2019212182A1 (en) Apparatus and method for managing a shareable resource in a multi-core processor
US7373640B1 (en) Technique for dynamically restricting thread concurrency without rewriting thread code
US10891158B2 (en) Task scheduling method and apparatus
Zhang et al. {G-NET}: Effective {GPU} Sharing in {NFV} Systems
WO2014025145A1 (en) Method and apparatus for processing message between processors
JP3887353B2 (en) Apparatus and method for integrating workload manager with system task scheduler
US7934220B2 (en) Method and system for optimizing file table usage
CN1278235C (en) System for yielding resources to a processor
Fried et al. Making kernel bypass practical for the cloud with junction
US8572626B2 (en) Symmetric multi-processor system
US20090006521A1 (en) Adaptive receive side scaling
WO2017030252A1 (en) Security check method for container image and device therefor
Steenkiste Analyzing communication latency using the nectar communication processor
JP2004326755A (en) Management of dispatching function in virtual computer environment
US9535756B2 (en) Latency-hiding context management for concurrent distributed tasks in a distributed system
US11687364B2 (en) Methods and apparatus for cache-aware task scheduling in a symmetric multi-processing (SMP) environment
US12124867B2 (en) Network function placement in vGPU-enabled environments
US8316220B2 (en) Operating processors over a network
WO2022124720A1 (en) Method for detecting error of operating system kernel memory in real time
WO2013066124A1 (en) Method and apparatus for allocating interruptions
KR101150661B1 (en) Information processing device and memory area management method
EP4250106A1 (en) Efficient queue access for user-space packet processing
US11582133B2 (en) Apparatus and method for distributed processing of identical packet in high-speed network security equipment
WO2020080882A1 (en) Method for handling kernel service request for interrupt routines in multi-core environment and electronic device thereof
Deri et al. Exploiting commodity multi-core systems for network traffic analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19796454

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019796454

Country of ref document: EP

Effective date: 20200921

NENP Non-entry into the national phase

Ref country code: DE