HK1249946B - Modulating processor core operations

Info

Publication number: HK1249946B
Application number: HK18109369.4A
Authority: HK
Inventors: Andre Barroso Luiz
Original assignee: Google Llc
Priority date: 2015-07-13
Filing date: 2016-06-28
Publication date: 2021-06-25

Description

Regulating processor core operations

Technical Field

This specification relates to techniques for modifying processor performance.

Background

Modern computer processors typically include multiple independent processor cores. Current computing systems are optimized to efficiently handle events that take several milliseconds (through multiprogramming mechanisms supported by the operating system, such as process context switching) or tens of nanoseconds (through hardware processor features such as prefetching, out-of-order execution, and prediction).

Efficiently supporting events that take several microseconds remains a challenge, particularly when low-latency response times are required. Such microsecond-granularity events are becoming more common with high-performance networking fabrics, new non-volatile storage technologies such as flash and phase-change memory, and data exchange with computational accelerators such as graphics processing units (GPUs). Microsecond-scale events are too short to justify the overhead of context switches and operating system interrupts, yet too long to be easily handled by the hardware architecture features of today's microprocessors.

Dedicating a processor core to handling specific low-latency operations (sometimes referred to as spinning) is one possible way to achieve low latency for microsecond-granularity operations. However, dedicating a processor core to specific I/O operations can take a significant amount of computing capacity away from a multi-core processor.

Summary

In general, one innovative aspect of the subject matter described in this specification can be embodied in a method that includes actions implemented in a multi-core processor having n cores, including: selecting k cores of the n cores of the multi-core processor to perform dedicated low-latency operations for the n-core processor, where k is less than n, m cores are not selected, and each core of the multi-core processor has a rated core capacity. The method operates the selected k cores below the rated core capacity, such that the k cores are collectively underutilized by an underutilized capacity, and operates one or more of the m cores at a capacity exceeding the rated core capacity, such that the m cores operate at a collective capacity exceeding the collective rated core capacity of the m cores.

Other embodiments of this aspect include corresponding systems, apparatus, and computer programs, encoded on computer storage devices, configured to perform the actions of the method.

Particular embodiments of the subject matter described in this specification can be implemented to realize one or more of the following advantages. The systems and methods enable a specific number of independent processor cores of a multi-core processor to be dedicated to performing only input/output (I/O) operations, but at a reduced operating capacity, which in turn provides additional operating capacity for the remaining cores of the multi-core processor. Dedicating processor cores to low-latency operations can achieve consistently low latency for the multi-core processor while reducing the negative effects of dedicating cores to low-latency operations at full capacity. By dedicating processor cores to low-latency operations at a reduced capacity, the energy made available by the underutilized dedicated processor cores can be used to improve the performance of the remaining cores in the multi-core processor.

The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

Brief Description of the Drawings

FIG. 1 is a block diagram of an environment in which a specific number of processor cores can be dedicated to low-latency operations, where each dedicated processor core is underutilized.

FIG. 2 is a flow diagram of an example process for improving the performance of general-purpose processor cores by using excess energy from underutilized processor cores.

Like reference numbers and designations in the various drawings indicate like elements.

Detailed Description

The systems and methods described below relate to dedicating individual processor cores within a multi-core CPU (e.g., a processor) to achieving low latency while improving the overall performance of the general-purpose processor cores. In some implementations, a first group of processor cores of the multi-core processor is dedicated to performing low-latency operations. For example, low-latency operations may include input/output (I/O) operations, accessing data in memory (e.g., solid-state drives, flash memory devices, etc.), communicating with other processor cores, non-volatile storage technologies, supercomputing-style network fabrics for exchanging very fast messages (e.g., 1 μs to 100 μs), and other low-latency operations. A second group of processor cores in the multi-core processor consists of cores that are not restricted to performing only low-latency operations; for example, the second group can perform the wide variety of other operations required of the processor. Typically, for an n-core processor, k cores are selected for dedicated low-latency operations and the remaining cores are selected to perform the remaining operations. However, the second group does not necessarily include all of the remaining cores; for example, for an n-core processor where n - k = m, the second group can have a cardinality of m or less than m.

Typically, a multi-core processor that dedicates processor cores to low-latency operations underutilizes those dedicated cores, leaving an amount of available energy unused. The multi-core processor can reduce the amount of power (e.g., voltage/frequency) used by the low-latency processor cores. The excess power capacity created by reducing the power consumption of the low-latency processor cores can be added to the overall power capacity available to the remaining general-purpose processor cores. Using this additional power capacity improves the performance of the remaining general-purpose processor cores, which recovers some of the otherwise stranded computing capacity.

These and other features are described in more detail below.

FIG. 1 is a block diagram of an environment 100 in which a specific number of processor cores can be dedicated to low-latency operations, where each dedicated processor core is underutilized. As shown, the environment 100 includes a multi-core processor 102 connected to a bus 108. The bus 108 communicatively couples the processor 102 to a set of external resources, including flash memory 110, random access memory (RAM) 112, and a graphics processing unit (GPU) 114. The multi-core processor 102 includes processor cores 104-1 through 104-n (n cores) operable to execute instructions. The multi-core processor 102 also has a first group of k processor cores 106 that includes processor core 104-1 and processor core 104-2. Although the value of k is 2 in the example of FIG. 1, other values of k can be used.

The multi-core processor 102 can be a single central processing unit (CPU) chip that includes multiple processor cores, memory resources, and other components. The multi-core processor 102 can also include firmware or microcode instructions stored in on-chip storage and operable to implement the techniques described herein. In some implementations, the instructions necessary to implement the techniques described herein can be implemented in silicon. The multi-core processor 102 can also include multiple processors, each including its own processor cores, memory resources, and/or other components.

The multi-core processor 102 is connected to a data bus 108. In some implementations, the data bus 108 can be a high-speed data transfer mechanism for transferring data between components within a computing device chassis. The data bus 108 can be any type of bus capable of performing such data transfers.

Multiple resources, including the flash memory 110, the RAM 112, and the GPU 114, are connected to the data bus 108. In some implementations, additional resources can be connected to the data bus 108, including, but not limited to, direct memory access (DMA) controllers, graphics cards, network cards, RAID controllers, and/or other resources.

As shown, the multi-core processor 102 includes processor cores 104-1 through 104-n, where n is any suitable number of processor cores for carrying out the ordinary functions of the multi-core processor 102. The processor cores 104-1 through 104-n can be separate components connected to other components of the multi-core processor 102 by data buses or bridges.

Each processor core 104 in the multi-core processor 102 has a rated core capacity. The rated core capacity can be a measure of maximum performance capacity (e.g., the highest voltage/frequency ratio at which the processor core can achieve and maintain proper operation without being damaged), which describes how much power each processor core 104 can consume without being damaged. Capacity can be measured in various ways, such as by power consumption, maximum frequency, maximum frequency at a given voltage, maximum current, maximum operating temperature, or any other measurement indicative of core power consumption.

In some implementations, a processor core 104 can include other integrated components, for example, dedicated cache memory such as an L1 cache, hardware context storage, firmware storage, microcode storage, and/or other integrated components.

Threads can be assigned to the processor cores 104-1 through 104-n. In some implementations, a thread can be a set of instructions to be executed on a processor core 104. For example, a thread can be a software application executing on the multi-core processor 102. A thread can also be one of many threads within a single software application executing on the multi-core processor 102.

The k processor cores 106 can be a specific number of processor cores dedicated exclusively to low-latency operations. When dedicated exclusively to I/O operations, the k processor cores 106 are not available to the operating system scheduler for general-purpose processing tasks and execute only I/O-related instructions (e.g., retrieving data from or sending data to the flash memory 110, the RAM 112, the GPU 114, etc.). In some implementations, the low-latency operations are microsecond low-latency operations, where the time scale is a measure of the speed of each operation. In other implementations, the low-latency operations that can be performed by the dedicated cores can include both nanosecond and microsecond operations. As shown in FIG. 1, the k processor cores 106 include two processor cores 106-1 and 106-2. In some implementations, the k processor cores 106 can include any suitable number of processor cores, provided that the number is less than the total number of processor cores so that the processor 102 operates properly, i.e., k < n.

The k processor cores 106 reduce the latency of the processor 102 by reducing the response time of the multi-core processor 102 for low-latency operations. The response time is reduced by running the low-latency operations on the k cores, which eliminates the need to suspend other threads on the remaining m cores so that those cores can execute the low-latency operations.

The k processor cores 106 are not utilized to full capacity, because high utilization would cause processing queues to build up temporarily, which would increase the latency of the processor 102. The remaining processor cores 104-k through 104-n (e.g., the m cores) are dedicated to general-purpose processing tasks. Typically, the number m of remaining processor cores 104-k through 104-n equals the total number of processor cores minus the k processor cores 106. That is, in an n-core processor, n = m + k.
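As an illustrative sketch only, and not part of the described embodiments, the partition of n cores into k dedicated cores and m general-purpose cores could be modeled as follows. The function name `partition_cores`, the 1-based core indices, and the policy of taking the first k cores are assumptions made for this example.

```python
def partition_cores(n: int, k: int) -> tuple[list[int], list[int]]:
    """Split cores 1..n into k dedicated cores and m = n - k general cores.

    Hypothetical policy: the first k core indices form the dedicated group.
    """
    if not 0 < k < n:
        # k must be smaller than n so some cores remain for general tasks.
        raise ValueError("require 0 < k < n")
    core_ids = list(range(1, n + 1))
    dedicated = core_ids[:k]   # reserved for low-latency (I/O) operations
    general = core_ids[k:]     # left available to the OS scheduler
    return dedicated, general
```

For n = 8 and k = 2, the sketch returns cores 1 and 2 as the dedicated group and cores 3 through 8 as the general-purpose group, so that n = m + k holds with m = 6.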

To reduce wear on the processor cores, the processor 102 can periodically rotate the selection of the k cores (e.g., the dedicated low-latency processor cores) in the multi-core processor. Rotating the k cores ensures that, over a given period, each core is underutilized for the same length of time as the other cores, which promotes even wear across the processor cores.
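One possible rotation policy is sketched below for illustration. The text does not specify how the selection is rotated, so the sliding-window scheme, the notion of an "epoch", and the function names are assumptions.

```python
def rotate_dedicated(n: int, k: int, epoch: int) -> list[int]:
    """Return the 0-based indices of the k dedicated cores for an epoch.

    Hypothetical policy: a window of k consecutive core indices that
    advances by k positions each epoch and wraps around modulo n.
    """
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    start = (epoch * k) % n
    return [(start + i) % n for i in range(k)]


def dedicated_epochs_per_core(n: int, k: int, epochs: int) -> list[int]:
    """Count how many epochs each core spent in the dedicated group."""
    counts = [0] * n
    for epoch in range(epochs):
        for core in rotate_dedicated(n, k, epoch):
            counts[core] += 1
    return counts
```

Under this policy, after n epochs every core has spent exactly k epochs in the dedicated (underutilized) group, which is one way to obtain the even wear described above.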

As described above, each of the k processor cores 106 used as a dedicated low-latency processor core is underutilized. Each core in the multi-core processor has a rated core capacity. Typically, the rated core capacity is the same for every core.

Each of the k processor cores consumes less power than its full rated core capacity. The excess power capacity left over by the k processor cores 106 is used by the m cores to improve their performance. Our solution relies on the fact that modern multi-core CPUs are built such that not all cores can run at their maximum performance simultaneously (see the earlier dark silicon references), because doing so would require more energy than the CPU package can dissipate as heat while staying within its operating temperature range. A CPU can therefore run a subset of its cores at their maximum performance (the highest frequency/voltage setting) only when the rest of the CPU is less active, generating less heat and possibly running at a lower frequency/voltage setting.

For a given thermal design power (TDP) constraint, the underutilization of the k cores can take into account the amount of silicon that can be powered at the nominal operating voltage. The resulting reduction in general-purpose computing capacity is proportionally less than k/n, which reduces the negative effect of dedicating cores to low-latency operations in order to achieve relatively low latency for microsecond-scale I/O operations.
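The claim that the loss is less than k/n can be checked with a small calculation. The sketch below is an illustration under simplifying assumptions that are not stated in the text: capacity is treated as a single additive quantity, the processor's total budget is n times the rated core capacity E, and all of the capacity freed by the k cores is handed to the m cores. Under those assumptions the fraction of the budget unavailable to general-purpose work is k·z/n, where z is the fraction of rated capacity at which each dedicated core runs.

```python
from fractions import Fraction


def general_purpose_loss(n: int, k: int, z) -> Fraction:
    """Fraction of the n*E budget not available to the m general cores.

    z is the fraction of rated core capacity used by each dedicated core
    (hypothetical parameter name); u = 1 - z is the freed fraction.
    """
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    z = Fraction(z)
    if not 0 <= z <= 1:
        raise ValueError("z must be between 0 and 1")
    u = 1 - z
    m_budget = (n - k) + u * k   # general-purpose budget, in units of E
    return 1 - m_budget / n      # simplifies to k * z / n
```

With n = 8, k = 2, and z = 1/2, the loss is 1/8 rather than the 2/8 that would result from dedicating two cores at full capacity.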

FIG. 2 is a flow diagram of an example process for using the excess energy that results from underutilized processor cores to improve the performance of general-purpose processor cores.

The process can be implemented in a multi-core processor having n cores. The multi-core processor 102 has an energy rating/capacity of n*E, where n is the number of cores and E is the rated core capacity of each core.

The process selects k cores of the n cores of the multi-core processor 102 to perform dedicated input/output operations for the n-core processor 102, where k is less than n, m cores are not selected, and each core of the multi-core processor has a rated core capacity (202). The rated core capacity is a technical limit (e.g., thermal design power, clock speed, power consumption, etc.) within which a processor core 104 can operate without being damaged. For example, the rated core capacity can be a rated core operating frequency, a rated core power consumption, a rated core operating temperature, or a ratio of the voltage used by the processor core to its clock speed (i.e., frequency).

In some implementations, the rated core capacity equals the maximum performance level of the processor core 104 (e.g., the highest voltage/frequency ratio at which the processor core can achieve and maintain proper operation without being damaged). Note also that a core can operate above its rated capacity for some period of time without being damaged.

In addition, the n-core processor can have a processor rated capacity. The processor rated capacity can equal the sum of the rated core capacities, e.g., n*E. For some processors, however, the processor rated capacity can be less than the sum of the rated core capacities. This is due to the dark silicon effect, which refers to the amount of silicon that cannot be powered at the nominal operating voltage for a given thermal design power (TDP) constraint.

The process operates the selected k cores below the rated core capacity, such that the k cores are collectively underutilized by an underutilized capacity (204). As described above, the k cores operate at a reduced capacity z, and the underutilized capacity (u) can be calculated according to equation (1):

u = (100% capacity - z capacity)    (1)

The underutilized capacity provides additional power availability to the multi-core processor 102. For the reasons set forth above, not all of the underutilized capacity may be usable by the processor.

The process operates one or more of the m cores at a capacity exceeding the rated core capacity, such that the m cores operate at a collective capacity exceeding the collective rated core capacity of the m cores (206). In some implementations, the processor 102 can select fewer than all of the remaining cores to operate above the rated core capacity, while the remaining cores still collectively operate at a capacity greater than the sum of the rated core capacities of all of the remaining processor cores. One or more of the m cores can operate at, or up to, the energy rating/capacity described by equation (2):

m-core power usage = [(n - k) + (u) * k] * E    (2)
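For illustration, equations (1) and (2) can be evaluated directly. In the sketch below, z and u are expressed as fractions between 0 and 1 rather than percentages, and the function and parameter names are assumptions introduced for the example.

```python
def underutilized_fraction(z: float) -> float:
    """Equation (1): u = 100% - z, with z and u as fractions of rated capacity."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("z must be between 0 and 1")
    return 1.0 - z


def m_core_power_budget(n: int, k: int, z: float, rated_core_capacity: float) -> float:
    """Equation (2): [(n - k) + u * k] * E, the collective budget of the m cores."""
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    u = underutilized_fraction(z)
    return ((n - k) + u * k) * rated_core_capacity
```

As a worked example with assumed values n = 8, k = 2, z = 0.5, and E = 10 (in arbitrary power units), u = 0.5 and the m = 6 general-purpose cores share a budget of (6 + 0.5 * 2) * 10 = 70, compared with a collective rated capacity of 60.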

In some implementations, operating one or more of the m cores at a capacity exceeding the rated core capacity includes operating one or more of the m cores at a capacity that temporarily exceeds the sum of the m rated core capacities and the underutilized capacity. One or more processor cores can temporarily exceed the rated core capacity, individually or collectively.

In other implementations, operating one or more of the m cores at a capacity exceeding the rated core capacity includes operating one or more of the m cores at a capacity that does not exceed the sum of the m rated core capacities and the underutilized capacity. The m cores can be operated above their rated capacity by no more than the underutilized capacity of the k cores, so that the total can be less than or equal to the processor rated capacity. For example, operating one or more of the m cores at a capacity exceeding the rated core capacity includes operating one or more of the m cores at a capacity such that the sum of the capacities at which the m cores are operating plus the sum of the capacities at which the k cores are operating does not exceed the processor rated capacity.
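A minimal sketch of this constraint follows. It assumes, for illustration only, that the k cores all run at the same fraction z of the rated core capacity and that the remaining headroom is divided evenly among the m cores; the text does not prescribe either choice, and all names are hypothetical.

```python
def plan_core_capacities(n: int, k: int, z: float,
                         rated_core: float, rated_processor: float) -> list[float]:
    """Return per-core operating capacities: k dedicated cores, then m cores.

    The total never exceeds rated_processor, which itself may be at most
    n * rated_core (and can be lower because of dark silicon limits).
    """
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    if not 0.0 <= z <= 1.0:
        raise ValueError("z must be between 0 and 1")
    if rated_processor > n * rated_core:
        raise ValueError("processor rating cannot exceed n * rated_core")
    m = n - k
    k_each = z * rated_core                  # reduced capacity of each dedicated core
    headroom = rated_processor - k * k_each  # what is left for the m cores
    m_each = headroom / m                    # even split (an assumption)
    return [k_each] * k + [m_each] * m
```

With n = 8, k = 2, z = 0.5, a rated core capacity of 10, and a processor rated capacity of 80, each dedicated core runs at 5 and each general-purpose core at about 11.67, above its rated 10. If the processor rated capacity is only 70, the same call yields exactly 10 per general-purpose core, which illustrates why not all of the underutilized capacity is necessarily usable.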

Similar to the rated core capacity, the processor rated capacity can be described as the technical limit within which the processor can operate without being damaged while still operating at an optimal performance level (e.g., a voltage/frequency ratio that yields fast performance without consuming too much power). The processor rated capacity can be defined as less than the sum of the rated core capacities of the n cores.

In situations in which the systems discussed here collect personal information about users, or may make use of personal information, the users may be provided with an opportunity to control whether applications or features collect user information (e.g., information about a user's social network, social actions or activities, profession, preferences, or current location), or to control whether and/or how to receive content that may be more relevant to the user. In addition, certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be treated so that no personally identifiable information can be determined for the user, or a user's geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of the user cannot be determined. Thus, the user may have control over how information about the user is collected and used by a content server.

Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a computer storage medium for execution by, or to control the operation of, data processing apparatus.

A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially generated propagated signal. The computer storage medium can also be, or be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices).

The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.

The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including, by way of example, a programmable processor, a computer, a system on a chip, or multiple ones or combinations of the foregoing. The apparatus can include special-purpose logic circuitry, e.g., an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit). In addition to hardware, the apparatus can also include code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing, and grid computing infrastructures.

A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages and declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special-purpose logic circuitry, e.g., an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general-purpose and special-purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few. Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, by way of example, semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.

To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device, such as a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback, such as visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on the user's device in response to requests received from the web browser.

Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component, such as a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN") and a wide area network ("WAN"), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).

The computing system can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server sends data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to, and receiving user input from, a user interacting with the client device). Data generated at the client device (e.g., a result of the user interaction) can be received from the client device at the server.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any features or of what may be claimed, but rather as descriptions of features specific to particular embodiments. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

Claims

1. A method implemented in a multi-core processor having n cores, the method comprising:

selecting k cores from the n cores of the multi-core processor to perform dedicated low-latency operations for the multi-core processor, wherein:

k is less than n;

m cores are not selected; and

each core of the multi-core processor has a rated core capacity;

operating the selected k cores below the rated core capacity, such that the k cores are collectively underutilized by an underutilized capacity; and

operating one or more of the m cores at a capacity exceeding the rated core capacity, such that the m cores are operated at a collective capacity exceeding the collective rated core capacity of the m cores, wherein the selection of the k cores in the multi-core processor is rotated periodically, such that a different group of k cores is selected each time.

2. The method of claim 1, wherein operating one or more of the m cores at a capacity exceeding the rated core capacity comprises: operating one or more of the m cores at a capacity not exceeding the sum of the rated core capacity of the m cores and the underutilized capacity.

3. The method of claim 1, wherein operating one or more of the m cores at a capacity exceeding the rated core capacity comprises: operating one or more of the m cores at a capacity temporarily exceeding the sum of the rated core capacity and the underutilized capacity.

4. The method according to claim 1, wherein:

the multi-core processor has a rated processor capacity; and

operating one or more of the m cores at a capacity exceeding the rated core capacity comprises operating one or more of the m cores at a capacity such that the sum of the capacities at which the m cores are operated and the sum of the capacities at which the k cores are operated together do not exceed the rated processor capacity.

5. The method according to claim 4, wherein the rated processor capacity is less than the sum of the rated core capacities of the n cores.

6. The method according to claim 1, wherein the rated core capacity is the rated core operating frequency.

7. The method according to claim 1, wherein the rated core capacity is the rated core power consumption.

8. The method according to claim 1, wherein the rated core capacity is the rated core operating temperature.

9. The method according to claim 1, wherein k+m equals n.

10. The method of claim 1, wherein the dedicated low-latency operations include at least one of a memory access operation, an I/O operation, and an inter-processor communication.

11. The method of claim 10, wherein the I/O operation is a microsecond I/O operation.

12. A non-transitory storage medium in data communication with a multi-core processor having n cores and storing instructions that cause the multi-core processor to perform operations, the operations comprising:

selecting k cores from the n cores of the multi-core processor to perform dedicated low-latency operations for the multi-core processor, wherein k is less than n, m cores are not selected, and each core of the multi-core processor has a rated core capacity;

operating the selected k cores below the rated core capacity, such that the k cores are collectively underutilized by an underutilized capacity, wherein the selection of the k cores in the multi-core processor is rotated periodically; and

operating one or more of the m cores at a capacity exceeding the rated core capacity, such that the m cores are operated at a collective capacity exceeding the collective rated core capacity of the m cores.

13. The non-transitory storage medium of claim 12, wherein operating one or more of the m cores at a capacity exceeding the rated core capacity comprises: operating one or more of the m cores at a capacity not exceeding the sum of the rated core capacity of the m cores and the underutilized capacity.

14. The non-transitory storage medium of claim 12, wherein operating one or more of the m cores at a capacity exceeding the rated core capacity comprises: operating one or more of the m cores at a capacity temporarily exceeding the sum of the rated core capacity and the underutilized capacity.

15. The non-transitory storage medium according to claim 12, wherein:

the multi-core processor has a rated processor capacity; and

operating one or more of the m cores at a capacity exceeding the rated core capacity comprises operating one or more of the m cores at a capacity such that the sum of the capacities at which the m cores are operated and the sum of the capacities at which the k cores are operated together do not exceed the rated processor capacity.

16. The non-transitory storage medium according to claim 15, wherein the rated processor capacity is less than the sum of the rated core capacities of the n cores.

17. The non-transitory storage medium according to claim 12, wherein the rated core capacity is the rated core operating frequency.

18. The non-transitory storage medium according to claim 12, wherein the rated core capacity is the rated core power consumption.

19. The non-transitory storage medium according to claim 12, wherein the rated core capacity is the rated core operating temperature.

20. The non-transitory storage medium according to claim 12, wherein k+m equals n.

21. The non-transitory storage medium of claim 12, wherein the dedicated low-latency operations include at least one of a memory access operation, an I/O operation, and an inter-processor communication.

22. The non-transitory storage medium according to claim 21, wherein the I/O operation is a microsecond I/O operation.
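
For illustration only, and forming no part of the claims, the steps recited in claims 1, 2 and 4 might be sketched as follows. The sketch assumes capacity is a single additive quantity per core (for example, watts), that the k cores are rotated as a sliding window of core indices, and that the underutilized capacity is shared equally among the m cores. The names `plan_epoch`, `CorePlan`, `dedicated_level` and `epoch` are hypothetical and do not appear in the specification; a real system would apply such a plan through platform-specific frequency or power controls that are not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CorePlan:
    dedicated: List[int]   # indices of the k cores reserved for low-latency work
    capacity: List[float]  # planned operating capacity of each of the n cores


def plan_epoch(n: int, k: int, epoch: int, rated_core: float,
               dedicated_level: float,
               rated_processor: Optional[float] = None) -> CorePlan:
    """Plan one scheduling epoch (hypothetical helper, not from the patent)."""
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    if not 0 <= dedicated_level < rated_core:
        raise ValueError("dedicated cores must run below the rated core capacity")

    # Rotate the selection: shift a window of k core indices by k each epoch,
    # so consecutive epochs always select a different group (since 0 < k < n).
    start = (epoch * k) % n
    dedicated = [(start + i) % n for i in range(k)]
    m = n - k  # here every unselected core is one of the m cores (k + m = n)

    # Capacity left unused by the k dedicated cores (the underutilized capacity).
    underutilized = k * (rated_core - dedicated_level)

    # Assumed policy: share the underutilized capacity equally among the m
    # cores, so their collective capacity does not exceed the sum of their
    # rated capacity and the underutilized capacity (compare claim 2).
    boosted = rated_core + underutilized / m

    if rated_processor is not None:
        # Keep the total across all n cores within the rated processor
        # capacity (compare claim 4).
        headroom = rated_processor - k * dedicated_level
        boosted = min(boosted, headroom / m)

    if boosted <= rated_core:
        raise ValueError("no headroom to run the m cores above the rated core capacity")

    capacity = [dedicated_level if c in dedicated else boosted for c in range(n)]
    return CorePlan(dedicated=dedicated, capacity=capacity)
```

Under these assumptions, calling `plan_epoch(8, 2, 0, 3.0, 1.0)` reserves cores 0 and 1, and the same call with `epoch=1` reserves cores 2 and 3, so the reduced-capacity role moves around the processor over time. Equal sharing is only one possible policy; the claims require only that one or more of the m cores run above the rated core capacity.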