US20160217085A1 - Processor and system for processing stream data at high speed - Google Patents

Info

Publication number
US20160217085A1
Authority
US
United States
Prior art keywords
data
output
processor
functional unit
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/090,842
Inventor
Kwon Taek Kwon
Seok Yoon Jung
Shi Hwa Lee
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd
Priority to US15/090,842
Publication of US20160217085A1
Current legal status: Abandoned

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00: Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/10: Program control for peripheral devices
    • G06F13/102: Program control for peripheral devices where the programme performs an interfacing function, e.g. device driver
    • G06F13/14: Handling requests for interconnection or transfer
    • G06F13/20: Handling requests for interconnection or transfer for access to input/output bus
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/80: Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007: Single instruction multiple data [SIMD] multiprocessors
    • G06F15/8023: Two dimensional arrays, e.g. mesh, torus

Definitions

  • The output channel 232 may perform relaying between the output MUX 231 and the functional unit 220.
  • The functional unit 220 may transmit an output request signal for the result data to the output channel 232 at regular intervals.
  • The output request signal may include a data value of the result data. Additionally, the output request signal may further include a field number of a data field of the result data.
  • The output channel 232 may receive the output request signal from the functional unit 220.
  • The output channel 232 may transfer the received output request signal to the output MUX 231.
  • The output MUX 231 may enqueue the data value of the result data in an output queue corresponding to the output request signal.
  • The external data consumer module 202 may transmit, to the output MUX 231, a signal requesting an output of the result data.
  • The output MUX 231 may receive the signal from the external data consumer module 202.
  • The output MUX 231 may dequeue one piece of data from each of the plurality of output queues, and may transfer the dequeued data to the external data consumer module 202.
  • An operation of the output channel 232 and an operation of the output MUX 231 will be further described with reference to FIG. 4.
  • FIG. 3 illustrates a diagram of a structure of an input MUX 310, according to example embodiments.
  • The input MUX 310 may include a plurality of input queues 311, 312, 313, and 314.
  • When an external data producer module 301 is ready to output stream data, and space exists in the input queues 311, 312, 313, and 314 of the input MUX 310, the external data producer module 301 may transmit the stream data to the input MUX 310.
  • The input MUX 310 may store the stream data received from the external data producer module 301 in the input queues 311, 312, 313, and 314, for each data field.
  • The input MUX 310 may include a decoder 315 that, based on a decoding logic, distributes the received stream data among the input queues 311, 312, 313, and 314 according to data field numbers and stores it there. Additionally, the input MUX 310 may further include a control register file 316 to control an operation of a module.
  • A functional unit 330 may require input data at regular intervals, and may transmit, to an input channel 320, an input request signal for a single data field.
  • The input request signal may include a field number of the single data field that is required by the functional unit 330.
  • The input channel 320 may transmit the input request signal to the input MUX 310.
  • The input MUX 310 may receive the input request signal, and may dequeue data from an input queue corresponding to the field number included in the received input request signal. Additionally, the input MUX 310 may transmit the dequeued data to the input channel 320. Subsequently, the input channel 320 may transfer the dequeued data to the functional unit 330, so that the functional unit 330 receives the required input data.
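The input-side flow of FIG. 3 can be sketched in Python as a set of bounded per-field FIFOs. This is a minimal behavioral model under stated assumptions, not the patented hardware: the class and method names (`InputMux`, `push`, `has_space`, `request`), the queue depth, and the assumption that each incoming word arrives tagged with its field number are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Deque, List, Optional


class InputMux:
    """Hypothetical behavioral model of an input MUX with one FIFO per data field."""

    def __init__(self, num_fields: int = 4, depth: int = 4) -> None:
        # One bounded queue per data field (cf. input queues 311-314);
        # the depth is an assumed parameter, not taken from the patent.
        self.depth = depth
        self.queues: List[Deque[int]] = [deque() for _ in range(num_fields)]

    def has_space(self, field: int) -> bool:
        # The producer may transmit only while the target queue has room.
        return len(self.queues[field]) < self.depth

    def push(self, field: int, value: int) -> bool:
        # Decoder role: route an incoming word to the queue for its field number.
        if not self.has_space(field):
            return False  # producer must hold the word and retry later
        self.queues[field].append(value)
        return True

    def request(self, field: int) -> Optional[int]:
        # Input request signal: dequeue the oldest word of the requested field.
        queue = self.queues[field]
        return queue.popleft() if queue else None


if __name__ == "__main__":
    mux = InputMux(num_fields=2, depth=2)
    words = [(0, 10), (1, 20), (0, 11), (0, 12)]  # (field number, value)
    print([mux.push(f, v) for f, v in words])  # [True, True, True, False]
    print(mux.request(0))  # 10
    print(mux.request(1))  # 20
    print(mux.request(1))  # None
```

Returning `False` from `push` models the producer waiting until space exists in the queue; returning `None` from `request` models a request for a field whose queue is currently empty, which real hardware would presumably handle by stalling the functional unit.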
  • FIG. 4 illustrates a diagram of a structure of an output MUX 410, according to example embodiments.
  • The output MUX 410 may include a plurality of output queues 411, 412, 413, and 414, one for each data field.
  • A functional unit 430 may transmit, to an output channel 420 at regular intervals, an output request signal for result data regarding a result of an operation performed by the functional unit 430.
  • The output request signal may include a data value of the result data and a data field number of the result data.
  • The output channel 420 may transfer the output request signal to the output MUX 410, and the output MUX 410 may receive the output request signal.
  • The output MUX 410 may store the result data in an output queue corresponding to the data field number included in the output request signal.
  • The output MUX 410 may include a decoder 415 to distribute the received result data among the output queues 411, 412, 413, and 414, based on data field numbers.
  • The decoder 415 may store the result data in the output queue corresponding to the data field number, based on a decoding logic.
  • The output MUX 410 may further include a control register file 416 to control an operation of a module.
  • The external data consumer module 401 may transmit, to the output MUX 410, a signal requesting an output of the stored result data.
  • In response, the output MUX 410 may dequeue data from each of the output queues 411, 412, 413, and 414, and may transmit the dequeued data to the external data consumer module 401.
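The output side of FIG. 4 can be modeled the same way. In this hypothetical Python sketch, `on_output_request` plays the role of the output request signal (a data value plus a field number) and `on_consumer_request` plays the role of the external data consumer's request. The policy of answering the consumer only when every field queue holds a value is an assumption; the description does not say what happens when a queue is empty.

```python
from collections import deque
from typing import Deque, List, Optional


class OutputMux:
    """Hypothetical behavioral model of an output MUX with one FIFO per data field."""

    def __init__(self, num_fields: int = 4) -> None:
        # One queue per data field (cf. output queues 411-414).
        self.queues: List[Deque[int]] = [deque() for _ in range(num_fields)]

    def on_output_request(self, value: int, field: int) -> None:
        # Output request signal from a functional unit: data value + field number.
        # Decoder role: enqueue the value in the queue for that field.
        self.queues[field].append(value)

    def on_consumer_request(self) -> Optional[List[int]]:
        # Consumer request: dequeue one entry from every output queue.
        # Assumed policy: answer only once every field has a value ready.
        if any(not q for q in self.queues):
            return None
        return [q.popleft() for q in self.queues]


if __name__ == "__main__":
    mux = OutputMux(num_fields=3)
    mux.on_output_request(7, field=0)
    mux.on_output_request(8, field=2)
    print(mux.on_consumer_request())  # None (field 1 not produced yet)
    mux.on_output_request(9, field=1)
    print(mux.on_consumer_request())  # [7, 9, 8]
```

Because the functional unit addresses a queue by field number, results for different fields may arrive in any order, while the consumer still receives them as one record ordered by field.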
  • The above-described example embodiments may be recorded in non-transitory computer-readable media including program instructions to implement various operations embodied by a computer.
  • The media may also include, alone or in combination with the program instructions, data files, data structures, and the like.
  • The program instructions recorded on the media may be those specially designed and constructed for the purposes of the example embodiments, or they may be of the kind well-known and available to those having skill in the computer software arts.
  • Examples of non-transitory computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks and DVDs; magneto-optical media such as optical discs; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like.
  • Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
  • Examples of the magnetic recording apparatus include a hard disk device (HDD), a flexible disk (FD), and a magnetic tape (MT).
  • Examples of optical disks include a DVD (Digital Versatile Disc), a DVD-RAM, a CD-ROM (Compact Disc-Read Only Memory), and a CD-R (Recordable)/RW.
  • The described hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described example embodiments, or vice versa.
  • At least one processor may be included to execute at least one of the above-described units and methods.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Bus Control (AREA)

Abstract

A processor for processing stream data at a high speed is provided. The processor may include a functional unit to perform an operation on the stream data, an input interface module to perform relaying between the functional unit and an external data producer module that is used to input the stream data to the processor, and an output interface module to perform relaying between the functional unit and an external data consumer module that is used to receive an input of result data regarding a result of the operation performed by the functional unit.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This Application is a Continuation of U.S. patent application Ser. No. 13/599,465 filed on Aug. 30, 2012, which claims the benefit under 35 U.S.C. §119(a) of Korean Patent Application No. 10-2011-0094030, filed on Sep. 19, 2011, in the Korean Intellectual Property Office, the entire disclosures of which are incorporated herein by reference for all purposes.
  • BACKGROUND
  • 1. Field
  • Example embodiments of the following description relate to a processor, and more particularly, to a processor for processing stream data at a high speed.
  • 2. Description of the Related Art
  • Batch-based processing is mainly used by processors that process large amounts of data, such as an SRP. In batch-based processing, a predetermined amount of the input data and/or output data required for an operation is collected in an L1 memory, and the collected data is then processed.
  • First, a designated amount of input data received from an external source may be collected in an input data buffer of the L1 memory. While an operation is performed on the collected input data, output data may be collected in an output data buffer of the L1 memory. Subsequently, the collected output data may be transmitted externally. The above operations may be performed simultaneously or sequentially. Such batch-based processing inevitably requires a high-cost L1 memory with a large input/output (I/O) bandwidth and a large storage capacity.
  • Hereinafter, conventional batch-based processing will be further described with reference to FIG. 1.
  • FIG. 1 illustrates a diagram of a structure of a conventional processor.
  • Referring to FIG. 1, the conventional processor may include a memory 110 and a functional unit 120. The functional unit 120 may perform an operation, and the memory 110 may store I/O data of the operation.
  • To achieve high performance in the conventional processor, a high-speed operation of the functional unit 120 may be required, and a high-speed memory 110 (for example, an L1 memory) may also be required. The functional unit 120 may directly access the memory 110 to store the I/O data in the memory 110. The memory 110 may include, for example, a cache memory or a scratch pad memory (SPM).
  • When a processor is used to process a large amount of data, for example, for multimedia or scientific computation, the functional unit 120 needs to handle the required amount of computation while, at the same time, the memory 110 needs to provide sufficient data bandwidth.
  • In the conventional processor, each of an input buffer 111 and an output buffer 112 may use double buffering with buffers A and B, so that an operation of the functional unit 120, input data loading by an external data producer 101, and output data fetching by an external data consumer 102 may be performed simultaneously.
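The A/B double buffering described above can be illustrated with a small sequential simulation. It is only a sketch under stated assumptions: the function name `double_buffered_run`, the list-based buffers, and the per-element `compute` callback are hypothetical, and the loop models the alternating roles of the two buffers rather than the true concurrency of hardware. Only the input side is modeled; the output buffer 112 would alternate in the same way.

```python
from typing import Callable, List


def double_buffered_run(batches: List[List[int]],
                        compute: Callable[[int], int]) -> List[List[int]]:
    """Simulate A/B (ping-pong) input buffering, one step per batch period.

    In each step the producer loads the next batch into one buffer while the
    functional unit computes on the buffer filled in the previous step; the
    two buffers then swap roles. Hardware overlaps these actions in time; this
    sequential loop only models which buffer plays which role.
    """
    buffers: List[List[int]] = [[], []]  # index 0 = buffer A, index 1 = buffer B
    outputs: List[List[int]] = []
    load_idx = 0  # buffer currently being filled by the producer
    for step in range(len(batches) + 1):
        compute_idx = 1 - load_idx  # the other buffer feeds the functional unit
        if step > 0:
            outputs.append([compute(x) for x in buffers[compute_idx]])
        if step < len(batches):
            buffers[load_idx] = list(batches[step])
        load_idx = compute_idx  # swap A and B for the next step
    return outputs


if __name__ == "__main__":
    print(double_buffered_run([[1, 2], [3]], lambda x: x * 10))  # [[10, 20], [30]]
```

Each buffer must hold a whole batch, which is why the memory 110 in this scheme has to be both large and fast enough to serve loading, computing, and fetching at once.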
  • The memory 110 of the conventional processor needs to simultaneously satisfy the following I/O bandwidth requirements:
  • 1. Write to the input buffer
    2. Read from the input buffer
    3. Random access to the L1 memory
    4. Write to the output buffer
    5. Read from the output buffer
  • As the conventional processor is required to process larger amounts of data with higher performance, the I/O bandwidth requirements increase. However, since a large-capacity, multi-port, wide-I/O memory incurs a high H/W area cost and a heavy design burden, either performance or cost must be sacrificed. Batch-based processing therefore depends on a memory that supports high-speed operation with a high I/O bandwidth.
  • Additionally, when the functional unit 120 uses an H/W or S/W pipeline to consecutively process a series of operations, it is most efficient, once maximum throughput is reached, to process as much data as possible at a time (for example, an entire batch). When the data is instead processed over several separate runs, bubbles may occur in the pipeline, reducing efficiency.
  • Since increasing the batch size to improve efficiency requires a large-capacity memory, the H/W area cost may increase in proportion to the memory capacity.
  • Accordingly, there is a desire for a stream I/O interface architecture that may process a large amount of data more efficiently by overcoming the limitations of conventional L1-memory-based batch processing.
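The bubble penalty can be made concrete with a simplified cycle count. Assuming (hypothetically) a pipeline of `depth` stages that accepts one item per cycle and drains fully between runs, each run pays `depth - 1` fill cycles, so processing N items in k runs takes k(depth - 1) + N cycles. The sketch below is a back-of-the-envelope model, not a measurement of any particular processor.

```python
def pipeline_cycles(num_items: int, depth: int, num_runs: int) -> int:
    """Cycles to stream num_items through a depth-stage pipeline in num_runs runs.

    Simplified model (assumption): one item enters per cycle and the pipeline
    drains completely between runs, so each run pays depth - 1 fill cycles
    (bubbles) before its first result appears.
    """
    if num_items <= 0:
        return 0
    return num_runs * (depth - 1) + num_items


def utilization(num_items: int, depth: int, num_runs: int) -> float:
    # Fraction of cycles that produce a useful result.
    cycles = pipeline_cycles(num_items, depth, num_runs)
    return num_items / cycles if cycles else 0.0


if __name__ == "__main__":
    print(pipeline_cycles(1024, 8, 1), round(utilization(1024, 8, 1), 3))    # 1031 0.993
    print(pipeline_cycles(1024, 8, 16), round(utilization(1024, 8, 16), 3))  # 1136 0.901
```

Under this model, splitting the same 1024 items into 16 runs costs 105 extra bubble cycles, which illustrates why the batch-based approach favors large batches and therefore large L1 memories.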
  • SUMMARY
  • The foregoing and/or other aspects are achieved by providing a processor for processing stream data at a high speed, including a functional unit to perform an operation on the stream data, an input interface module to perform relaying between the functional unit and an external data producer module that is used to input the stream data to the processor, and an output interface module to perform relaying between the functional unit and an external data consumer module that is used to receive an input of result data regarding a result of the operation performed by the functional unit.
  • The input interface module may include an input multiplexer (MUX) to receive the stream data from the external data producer module, and to store the received stream data, and an input channel connected between the input MUX and the functional unit, to transfer the stream data from the input MUX to the functional unit.
  • The input MUX may include a plurality of input queues, and may store the stream data in the plurality of input queues for each data field.
  • The input channel may receive an input request signal for a single data field from the functional unit, and may transfer the input request signal to the input MUX. The input MUX may dequeue data from an input queue corresponding to the single data field, and may transfer the dequeued data to the input channel. The input channel may transfer the dequeued data to the functional unit.
  • The input request signal may include a field number of the single data field.
  • The output interface module may include an output MUX to receive the result data from the functional unit, and to store the received result data, and an output channel connected between the output MUX and the functional unit, to transfer the result data from the functional unit to the output MUX.
  • The output MUX may include a plurality of output queues, and may store the result data in the plurality of output queues for each data field.
  • The output channel may receive an output request signal for the result data from the functional unit, and may transfer the output request signal to the output MUX. The output request signal may include a data value of the result data. The output MUX may enqueue the data value in an output queue corresponding to the output request signal.
  • The output MUX may receive, from the external data consumer module, a signal requesting output of the result data, may dequeue a plurality of pieces of data, respectively, from the plurality of output queues, and may transfer the plurality of pieces of dequeued data to the external data consumer module.
  • The output request signal may further include a field number of a data field of the result data.
  • The foregoing and/or other aspects are achieved by providing a system, including an external data producer module to output stream data to a processor; an input interface module, within the processor, to receive, store, and transfer the stream data; a functional unit, within the processor, to perform an operation on the stream data; an output interface module, within the processor, to receive, store, and transfer a result of the operation performed by the functional unit; and an external data consumer module to receive the result data.
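To make the system composition concrete, the following sketch wires a producer, per-field input queues, a functional unit, an output queue, and a consumer into one streaming loop. Everything specific here is hypothetical: the two-field record format, the `a + b` operation performed by the functional unit, the queue depth, and the function name `run_stream_system`. What it illustrates is that data moves through small queues record by record instead of being staged as a full batch in an L1 memory.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Record = Tuple[int, int]  # hypothetical stream record with two data fields


def run_stream_system(records: Iterable[Record], depth: int = 2) -> List[int]:
    """Stream records: producer -> input queues -> FU -> output queue -> consumer."""
    pending: Deque[Record] = deque(records)  # data still held by the producer
    in_q: List[Deque[int]] = [deque(), deque()]  # one input queue per data field
    out_q: Deque[int] = deque()  # output queue for the single result field
    consumed: List[int] = []
    while pending or any(in_q) or out_q:
        # Producer: transmit the next record only if both field queues have room.
        if pending and all(len(q) < depth for q in in_q):
            a, b = pending.popleft()
            in_q[0].append(a)
            in_q[1].append(b)
        # Functional unit: request field 0 and field 1, then emit their sum.
        if all(in_q):
            out_q.append(in_q[0].popleft() + in_q[1].popleft())
        # Consumer: take one result per step.
        if out_q:
            consumed.append(out_q.popleft())
    return consumed


if __name__ == "__main__":
    print(run_stream_system([(1, 2), (3, 4), (5, 6)]))  # [3, 7, 11]
```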
  • Additional aspects, features, and/or advantages of example embodiments will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the disclosure.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • These and/or other aspects and advantages will become apparent and more readily appreciated from the following description of the example embodiments, taken in conjunction with the accompanying drawings of which:
  • FIG. 1 illustrates a diagram of a structure of a conventional processor;
  • FIG. 2 illustrates a diagram of a structure of a processor, according to example embodiments;
  • FIG. 3 illustrates a diagram of a structure of an input multiplexer (MUX), according to example embodiments; and
  • FIG. 4 illustrates a diagram of a structure of an output MUX, according to example embodiments.
  • DETAILED DESCRIPTION
  • Reference will now be made in detail to example embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. Example embodiments are described below to explain the present disclosure by referring to the figures.
  • FIG. 2 illustrates a diagram of a structure of a processor, according to example embodiments.
  • The processor of FIG. 2 may process stream data at a high speed, and may include an input interface module 210, a functional unit 220, and an output interface module 230.
  • The functional unit 220 may perform an operation on received data. In another embodiment, the processor of FIG. 2 may include a plurality of functional units 220.
  • The processor of FIG. 2 may receive stream data from an external data producer module 201. The input interface module 210 may perform relaying between the processor and the external data producer module 201. More specifically, the input interface module 210 may receive stream data from the external data producer module 201, may store the received stream data, and may transfer the stored stream data to the functional unit 220.
  • Hereinafter, a configuration of the input interface module 210 will be further described.
  • The input interface module 210 may include an input multiplexer (MUX) 211 and an input channel 212. The input MUX 211 may receive the stream data from the external data producer module 201. Additionally, the input MUX 211 may include a plurality of input queues, and may store the received stream data in the plurality of input queues for each data field.
  • The input channel 212 may be connected between the input MUX 211 and the functional unit 220, and may transfer the stream data from the input MUX 211 to the functional unit 220. Depending on example embodiments, a plurality of input channels 212 may be provided, and each of the input channels 212 may correspond to each of the functional units 220, or to a group of the functional units 220.
  • Since the input MUX 211 and the functional unit 220 are connected via the input channel 212, the input channel 212 may perform relaying between the input MUX 211 and the functional unit 220. For example, the functional unit 220 may require input data at regular intervals, and may transmit, to the input channel 212, an input request signal for a single data field. The input request signal may include a field number of the single data field. The input channel 212 may receive the input request signal from the functional unit 220.
  • Additionally, the input channel 212 may transfer the received input request signal to the input MUX 211. In response to the input request signal, the input MUX 211 may dequeue data from an input queue corresponding to the single data field associated with the input request signal, and may transfer the dequeued data to the input channel 212. Subsequently, the input channel 212 may transfer the dequeued data to the functional unit 220. In other words, the functional unit 220 may receive stream data from the input MUX 211 via the input channel 212, and may process the received stream data. An operation of the input channel 212, and an operation of the input MUX 211 will be further described with reference to FIG. 3.
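  • The request/response path through the input channel 212 can be sketched as follows, assuming a software model in which the input request signal is a small record carrying only the field number. `InputRequest`, `InputMux.handle_request`, and `InputChannel.request` are illustrative names, and the sketch ignores the timing and handshaking a hardware implementation would need.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class InputRequest:
    """Hypothetical input request signal carrying the field number of a single data field."""
    field_number: int


class InputMux:
    """Per-field input queues (hypothetical model of input MUX 211)."""

    def __init__(self, num_fields):
        self.queues = [deque() for _ in range(num_fields)]

    def handle_request(self, request):
        # Dequeue from the input queue selected by the field number in the request.
        return self.queues[request.field_number].popleft()


class InputChannel:
    """Relays requests from a functional unit to the MUX and dequeued data back again."""

    def __init__(self, mux):
        self.mux = mux

    def request(self, request):
        # Forward the request signal, then hand the dequeued data to the functional unit.
        return self.mux.handle_request(request)


mux = InputMux(num_fields=2)
mux.queues[0].extend([7, 8])
mux.queues[1].extend([70, 80])
channel = InputChannel(mux)

# The functional unit asks for field 1, then field 0.
print(channel.request(InputRequest(field_number=1)))  # 70
print(channel.request(InputRequest(field_number=0)))  # 7
```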
  • Through the above-described operations, the functional unit 220 may perform an operation on the stream data, and may output result data, obtained by performing the operation, to the external data consumer module 202. The output interface module 230 may relay data between the functional unit 220 and the external data consumer module 202. Specifically, the output interface module 230 may receive the result data from the functional unit 220, may store the received result data, and may output the stored result data to the external data consumer module 202.
  • Hereinafter, a configuration of the output interface module 230 will be further described.
  • The output interface module 230 may include an output MUX 231 and an output channel 232. The output MUX 231 may receive the result data from the functional unit 220. Additionally, the output MUX 231 may include a plurality of output queues, one for each data field, and may store the received result data in these output queues according to data field.
  • The output channel 232 may be connected between the output MUX 231 and the functional unit 220, and may transfer the result data from the functional unit 220 to the output MUX 231. Depending on example embodiments, a plurality of output channels 232 may be provided, and each of the output channels 232 may correspond to each of the functional units 220, or to a group of the functional units 220.
  • Since the output MUX 231 and the functional unit 220 are connected via the output channel 232, the output channel 232 may relay data between the output MUX 231 and the functional unit 220. For example, the functional unit 220 may transmit an output request signal for the result data to the output channel 232 at regular intervals. The output request signal may include a data value of the result data. Additionally, the output request signal may further include a field number of a data field of the result data. The output channel 232 may receive the output request signal from the functional unit 220.
  • Additionally, the output channel 232 may transfer the received output request signal to the output MUX 231. In response to the output request signal, the output MUX 231 may enqueue the data value of the result data in an output queue corresponding to the output request signal.
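  • A matching sketch for the output path, under the same software-model assumptions: the output request signal is modeled as a record holding the data value and the field number. The description treats the field number as optional; this sketch always includes it so that the target output queue is unambiguous. `OutputRequest`, `OutputMux`, and `OutputChannel` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputRequest:
    """Hypothetical output request signal: a result value plus its data field number."""
    value: object
    field_number: int


class OutputMux:
    """Per-field output queues (hypothetical model of output MUX 231)."""

    def __init__(self, num_fields):
        self.queues = [deque() for _ in range(num_fields)]

    def handle_request(self, request):
        # Enqueue the data value in the output queue selected by the field number.
        self.queues[request.field_number].append(request.value)


class OutputChannel:
    """Relays output request signals from a functional unit to the output MUX."""

    def __init__(self, mux):
        self.mux = mux

    def send(self, request):
        self.mux.handle_request(request)


out_mux = OutputMux(num_fields=2)
out_channel = OutputChannel(out_mux)
out_channel.send(OutputRequest(value=3.5, field_number=0))
out_channel.send(OutputRequest(value=-1.0, field_number=1))
out_channel.send(OutputRequest(value=4.5, field_number=0))
print([list(q) for q in out_mux.queues])  # [[3.5, 4.5], [-1.0]]
```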
  • For example, when the result data is required, the external data consumer module 202 may transmit, to the output MUX 231, a signal requesting output of the result data, and the output MUX 231 may receive the signal. In response to the signal, the output MUX 231 may dequeue one piece of data from each of the plurality of output queues, and may transfer the dequeued pieces of data to the external data consumer module 202. An operation of the output channel 232 and an operation of the output MUX 231 will be further described with reference to FIG. 4.
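  • Putting the FIG. 2 data path together, the following sequential sketch moves records from a producer through per-field input queues, a functional unit, and per-field output queues to a consumer that receives one piece of data from each output queue per request. The `operation` callback standing in for the functional unit 220 and the example `sum_and_difference` operation are assumptions for illustration; a real processor would run these stages concurrently rather than one after another.

```python
from collections import deque


def run_pipeline(records, num_fields, operation):
    """Move records: producer -> input queues -> functional unit -> output queues -> consumer.

    `operation` is a hypothetical stand-in for the functional unit: it takes one
    tuple of input field values and returns one tuple of result field values.
    """
    if num_fields < 1:
        raise ValueError("num_fields must be at least 1")
    input_queues = [deque() for _ in range(num_fields)]
    for record in records:  # input MUX: store each field value in its own queue
        for field_number, value in enumerate(record):
            input_queues[field_number].append(value)

    output_queues = None
    while all(input_queues):
        # Functional unit: one input request per field, then one output request per result field.
        inputs = tuple(input_queues[f].popleft() for f in range(num_fields))
        results = operation(inputs)
        if output_queues is None:
            output_queues = [deque() for _ in results]
        for field_number, value in enumerate(results):
            output_queues[field_number].append(value)

    consumed = []
    while output_queues and all(output_queues):
        # Consumer request: dequeue one piece of data from every output queue.
        consumed.append(tuple(q.popleft() for q in output_queues))
    return consumed


def sum_and_difference(xy):
    x, y = xy
    return (x + y, x - y)


print(run_pipeline([(5, 3), (10, 4)], num_fields=2, operation=sum_and_difference))
# [(8, 2), (14, 6)]
```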
  • FIG. 3 illustrates a diagram of a structure of an input MUX 310, according to example embodiments.
  • Referring to FIG. 3, the input MUX 310 may include a plurality of input queues 311, 312, 313, and 314.
  • When an external data producer module 301 is ready to output stream data, and when space exists in the input queues 311, 312, 313, and 314 of the input MUX 310, the external data producer module 301 may transmit the stream data to the input MUX 310. The input MUX 310 may store the stream data received from the external data producer module 301 in the input queues 311, 312, 313, and 314 according to data field. Depending on example embodiments, the input MUX 310 may include a decoder 315 that, based on decoding logic, distributes the received stream data among the input queues 311, 312, 313, and 314 according to data field numbers and stores it there. Additionally, the input MUX 310 may further include a control register file 316 to control an operation of a module.
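  • The producer-side condition (transmit only while space exists) and the decoder 315 can be modeled with bounded queues. In this hypothetical sketch the queue depth, the `BoundedInputMux` and `produce` names, and the policy of accepting a record only when every field queue has room are all assumptions; the control register file 316 is not modeled.

```python
from collections import deque


class BoundedInputMux:
    """Hypothetical input MUX with fixed-depth per-field queues and a field-number decoder.

    The depth and the "accept whole records only" policy are assumptions of this sketch.
    """

    def __init__(self, num_fields, depth):
        self.depth = depth
        self.queues = [deque() for _ in range(num_fields)]

    def has_space(self):
        # The producer may transmit only while every field queue has room.
        return all(len(q) < self.depth for q in self.queues)

    def decode(self, field_number, value):
        # Decoder: route one (field number, value) pair to its input queue.
        self.queues[field_number].append(value)


def produce(mux, records):
    """Transmit records until the MUX reports it is full; return how many were sent."""
    sent = 0
    for record in records:
        if not mux.has_space():
            break  # producer stalls; remaining records stay with the producer
        for field_number, value in enumerate(record):
            mux.decode(field_number, value)
        sent += 1
    return sent


mux = BoundedInputMux(num_fields=2, depth=2)
print(produce(mux, [(1, 10), (2, 20), (3, 30)]))  # 2
print([list(q) for q in mux.queues])  # [[1, 2], [10, 20]]
```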
  • A functional unit 330 may require input data at regular intervals, and may transmit, to an input channel 320, an input request signal for a single data field. The input request signal may include the field number of the single data field required by the functional unit 330.
  • The input channel 320 may transmit the input request signal to the input MUX 310. The input MUX 310 may receive the input request signal, and may dequeue data from an input queue corresponding to the field number that is included in the received input request signal. Additionally, the input MUX 310 may transmit the dequeued data to the input channel 320. Subsequently, the input channel 320 may transfer the dequeued data to the functional unit 330, so that the functional unit 330 may receive the required input data.
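  • The "regular intervals" at which the functional unit 330 issues input requests can be illustrated with a cycle-stepped loop. The sketch assumes one request every `interval` cycles, a fixed round-robin `field_schedule` of field numbers, and that the functional unit stalls and retries the same field at the next interval when its queue is empty; none of these parameters are specified in the description, and `run_functional_unit` is a hypothetical name.

```python
from collections import deque


def run_functional_unit(input_queues, field_schedule, interval, cycles):
    """Simulate a functional unit that issues one input request every `interval` cycles.

    field_schedule lists the field numbers requested in turn (repeated round-robin).
    Returns (cycle, field_number, value) for each request that was served.
    """
    served = []
    request_index = 0
    for cycle in range(0, cycles, interval):
        field = field_schedule[request_index % len(field_schedule)]
        queue = input_queues[field]
        if not queue:
            continue  # stall: retry the same field at the next interval (assumed policy)
        served.append((cycle, field, queue.popleft()))
        request_index += 1
    return served


queues = [deque([1, 2]), deque([10, 20])]
print(run_functional_unit(queues, field_schedule=[0, 1], interval=3, cycles=12))
# [(0, 0, 1), (3, 1, 10), (6, 0, 2), (9, 1, 20)]
```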
  • FIG. 4 illustrates a diagram of a structure of an output MUX 410, according to example embodiments.
  • Referring to FIG. 4, the output MUX 410 may include a plurality of output queues 411, 412, 413, and 414. Specifically, the output MUX 410 may include one of the output queues 411, 412, 413, and 414 for each data field.
  • A functional unit 430 may transmit, to an output channel 420 at regular intervals, an output request signal for result data regarding a result of an operation performed by the functional unit 430. The output request signal may include a data value of the result data, and a data field number of the result data. The output channel 420 may transfer the output request signal to the output MUX 410, and the output MUX 410 may receive the output request signal. The output MUX 410 may store the result data in an output queue corresponding to the data field number included in the output request signal. Depending on example embodiments, the output MUX 410 may include a decoder 415 to distribute the received result data in the output queues 411, 412, 413, and 414, based on data field numbers. The decoder 415 may store the result data in an output queue, corresponding to the data field number, based on a decoding logic. Additionally, the output MUX 410 may further include a control register file 416 to control an operation of a module.
  • For example, when an external data consumer module 401 is ready to fetch the result data, and when the result data is stored in the output queues 411, 412, 413, and 414 included in the output MUX 410, the external data consumer module 401 may transmit to the output MUX 410 a signal to request an output of the stored result data. When the signal is received from the external data consumer module 401, the output MUX 410 may dequeue data from each of the output queues 411, 412, 413, and 414, and may transmit the dequeued data to the external data consumer module 401.
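  • The two conditions under which the external data consumer module 401 fetches results (the consumer is ready, and result data is stored in the output queues) can be sketched as a guard in front of the per-queue dequeue. This assumes, as an interpretation, that "stored" means every output queue holds at least one value, so that a complete record can be returned; `try_fetch` and `consumer_ready` are hypothetical names.

```python
from collections import deque


def try_fetch(output_queues, consumer_ready):
    """Return one result record if the consumer is ready and every output queue holds data.

    Returns None (and dequeues nothing) when either condition is not met.
    """
    if not consumer_ready or not all(output_queues):
        return None
    # Output MUX: dequeue one value from each output queue, in field order.
    return tuple(q.popleft() for q in output_queues)


queues = [deque([1.5]), deque([2.5, 3.5])]
print(try_fetch(queues, consumer_ready=False))  # None
print(try_fetch(queues, consumer_ready=True))   # (1.5, 2.5)
print(try_fetch(queues, consumer_ready=True))   # None, because field 0 is now empty
print(list(queues[1]))                          # [3.5]
```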
  • The above-described example embodiments may be recorded in non-transitory computer-readable media including program instructions to implement various operations embodied by a computer. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. The program instructions recorded on the media may be those specially designed and constructed for the purposes of the example embodiments, or they may be of the kind well-known and available to those having skill in the computer software arts. Examples of non-transitory computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks and DVDs; magneto-optical media such as optical discs; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. Examples of the magnetic recording apparatus include a hard disk device (HDD), a flexible disk (FD), and a magnetic tape (MT). Examples of the optical disk include a DVD (Digital Versatile Disc), a DVD-RAM, a CD-ROM (Compact Disc-Read Only Memory), and a CD-R (Recordable)/RW. The described hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described example embodiments, or vice versa.
  • Further, according to an aspect of the embodiments, any combinations of the described features, functions and/or operations can be provided.
  • Moreover, at least one processor may be included to execute at least one of the above-described units and methods.
  • Although example embodiments have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these example embodiments without departing from the principles and spirit of the disclosure, the scope of which is defined in the claims and their equivalents.

Claims (20)

1. A processor for processing stream data at a high speed, the processor comprising:
a functional unit configured to perform an operation on the stream data;
an input interface module configured to relay data between the functional unit and an external data producer module, and
an output interface module configured to relay data between the functional unit and an external data consumer module, the external data consumer module configured to receive result data from the operation performed by the functional unit.
2. The processor of claim 1, wherein the input interface module is configured to store the received stream data.
3. The processor of claim 2, wherein the input interface module comprises a plurality of input queues, and
wherein the input interface module is configured to store the stream data in the plurality of input queues for each data field.
4. The processor of claim 3, wherein the input interface module receives an input request signal for a single data field from the functional unit,
wherein the input interface module dequeues data from an input queue corresponding to the single data field, and transfers the dequeued data to the functional unit.
5. The processor of claim 4, wherein the input request signal comprises a field number of the single data field.
6. The processor of claim 1, wherein the output interface module is configured to receive the result data from the functional unit, and to store the received result data.
7. The processor of claim 6, wherein the output interface module comprises a plurality of output queues, and
wherein the output interface module stores the result data in the plurality of output queues for each data field.
8. The processor of claim 7, wherein the output interface module receives an output request signal for the result data from the functional unit,
wherein the output request signal comprises a data value of the result data, and
wherein the output interface module enqueues the data value in an output queue corresponding to the output request signal.
9. The processor of claim 8, wherein the output interface module receives the output request signal from the external data consumer module, dequeues a plurality of pieces of data, respectively, from the plurality of output queues, and transfers the plurality of pieces of dequeued data to the external data consumer module.
10. The processor of claim 8, wherein the output request signal further comprises a field number of a data field of the result data.
11. The processor of claim 1, wherein the functional unit includes a plurality of functional units.
12. A processor for processing stream data at a high speed, the processor comprising:
a functional unit configured to perform an operation on the stream data; and
an input interface module configured to relay data between the functional unit and an external data producer module.
13. The processor of claim 12, further comprising:
an output interface module configured to relay data between the functional unit and an external data consumer module, the external data consumer module configured to receive result data from the operation performed by the functional unit.
14. The processor of claim 12, wherein the input interface module comprises a plurality of input queues, and
wherein the input interface module is configured to store the stream data in the plurality of input queues for each data field.
15. The processor of claim 14, wherein the input interface module receives an input request signal for a single data field from the functional unit,
wherein the input interface module dequeues data from an input queue corresponding to the single data field, and transfers the dequeued data to the functional unit.
16. A processor for processing stream data at a high speed, the processor comprising:
a functional unit configured to perform an operation on the stream data; and
an output interface module configured to relay data between the functional unit and an external data consumer module, the external data consumer module configured to receive result data from the operation performed by the functional unit.
17. The processor of claim 16, further comprising:
an input interface module configured to relay data between the functional unit and an external data producer module.
18. The processor of claim 16, wherein the output interface module comprises a plurality of output queues, and
wherein the output interface module stores the result data in the plurality of output queues for each data field.
19. The processor of claim 18, wherein the output interface module receives an output request signal for the result data from the functional unit,
wherein the output request signal comprises a data value of the result data, and
wherein the output interface module enqueues the data value in an output queue corresponding to the output request signal.
20. The processor of claim 19, wherein the output interface module receives the output request signal from the external data consumer module, dequeues a plurality of pieces of data, respectively, from the plurality of output queues, and transfers the plurality of pieces of dequeued data to the external data consumer module.
US15/090,842 2011-09-19 2016-04-05 Processor and system for processing stream data at high speed Abandoned US20160217085A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/090,842 US20160217085A1 (en) 2011-09-19 2016-04-05 Processor and system for processing stream data at high speed

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR1020110094030A KR101863605B1 (en) 2011-09-19 2011-09-19 Processor for Processing Stream Data
KR10-2011-0094030 2011-09-19
US13/599,465 US9323717B2 (en) 2011-09-19 2012-08-30 Processor and system for processing stream data at high speed
US15/090,842 US20160217085A1 (en) 2011-09-19 2016-04-05 Processor and system for processing stream data at high speed

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/599,465 Continuation US9323717B2 (en) 2011-09-19 2012-08-30 Processor and system for processing stream data at high speed

Publications (1)

Publication Number Publication Date
US20160217085A1 (en) 2016-07-28

Family ID: 47881732

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/599,465 Active 2034-12-15 US9323717B2 (en) 2011-09-19 2012-08-30 Processor and system for processing stream data at high speed
US15/090,842 Abandoned US20160217085A1 (en) 2011-09-19 2016-04-05 Processor and system for processing stream data at high speed

Country Status (2)

Country Link
US (2) US9323717B2 (en)
KR (1) KR101863605B1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102101834B1 (en) 2013-10-08 2020-04-17 삼성전자 주식회사 Image processing apparatus and method
CN110580238B (en) * 2018-06-11 2024-11-01 Arm有限公司 Data processing system, method of operating a data processing system, and medium

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0533373A3 (en) 1991-09-18 1993-04-21 Ncr International Inc. Computer system having cache memory
US5710907A (en) 1995-12-22 1998-01-20 Sun Microsystems, Inc. Hybrid NUMA COMA caching system and methods for selecting between the caching modes
US6002883A (en) 1996-07-18 1999-12-14 International Business Machines Corporation System with intersystem information links for intersystem traffic having I/O traffic being transmitted to and from processor bus via processor means
KR19980010786U (en) 1996-08-12 1998-05-15 양재신 Hood Panel Automatic Opening Device
US6253292B1 (en) 1997-08-22 2001-06-26 Seong Tae Jhang Distributed shared memory multiprocessor system based on a unidirectional ring bus using a snooping scheme
KR100258026B1 (en) 1997-10-04 2000-06-01 구자홍 Method and apparatus for transaction controlling of multi-processor system
JP3657428B2 (en) 1998-04-27 2005-06-08 株式会社日立製作所 Storage controller
US6704799B1 (en) 1998-12-29 2004-03-09 Honeywell Inc. Time-efficient inter-process communication in a harmonic rate system
KR100344065B1 (en) 2000-02-15 2002-07-24 전주식 Shared memory multiprocessor system based on multi-level cache
JP2002014827A (en) 2000-05-19 2002-01-18 Micro-Epsilon Messtechnik Gmbh & Co Kg Method for controlling and visualizing process
JP3426223B2 (en) 2000-09-27 2003-07-14 株式会社ソニー・コンピュータエンタテインメント Multiprocessor system, data processing system, data processing method, and computer program
US6907490B2 (en) 2000-12-13 2005-06-14 Intel Corporation Method and an apparatus for a re-configurable processor
ATE386982T1 (en) 2001-05-11 2008-03-15 Koninkl Philips Electronics Nv DEVICE FOR PARALLEL DATA PROCESSING AND CAMERA SYSTEM HAVING SUCH A DEVICE
US7194587B2 (en) 2003-04-24 2007-03-20 International Business Machines Corp. Localized cache block flush instruction
KR100590764B1 (en) 2003-12-11 2006-06-15 한국전자통신연구원 Large Data Processing Using Scheduler in Multiprocessor System
US7366845B2 (en) * 2004-06-29 2008-04-29 Intel Corporation Pushing of clean data to one or more processors in a system having a coherency protocol
CA2636833C (en) * 2007-06-21 2011-08-30 Maged E. Beshai Large-scale packet switch
KR100936601B1 (en) 2008-05-15 2010-01-13 재단법인서울대학교산학협력재단 Multiprocessor system
KR20100035394A (en) * 2008-09-26 2010-04-05 삼성전자주식회사 Memory managing apparatus and method in parallel processing
US8479216B2 (en) * 2009-08-18 2013-07-02 International Business Machines Corporation Method for decentralized load distribution in an event-driven system using localized migration between physically connected nodes and load exchange protocol preventing simultaneous migration of plurality of tasks to or from a same node
KR101533820B1 (en) 2009-09-25 2015-07-09 삼성전자 주식회사 Method and memory manager for managing memory

Also Published As

Publication number Publication date
KR20130030515A (en) 2013-03-27
US9323717B2 (en) 2016-04-26
US20130073756A1 (en) 2013-03-21
KR101863605B1 (en) 2018-07-06

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION