WO2007011203A1 - Scalable control interface for large-scale signal processing systems. - Google Patents
- Publication number
- WO2007011203A1 · PCT/NL2006/000256
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- control
- data
- signal processing
- node
- nodes
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
Definitions
- the example of a node processor 20 of fig. 2 includes an input buffer and an output buffer. A switch is present between the input buffer and the output buffer.
- the node processor of fig. 2 further includes a feedback loop including a queue and order unit followed by an execute unit.
- the node processor 20 may be connected to nodes higher in the control network and nodes lower in the control network.
- the packets may be received asynchronously.
- the received packets are stored in the input buffer (non-deterministic merge) to handle possible traffic congestion. There are then two options: 1) if the destination node of the packet does not match the identifier of the receiving node, the packet is switched to the output buffer.
- 2) otherwise, the packet is switched into the feedback loop (Queue and order unit, Execute unit).
- the execution unit is implemented as a Finite State Machine (FSM) with states corresponding to commands. Each state executes a pre-determined sequence of instructions that can use data provided in the control data information in the packets.
- a leaf-node processor in the control network may include an input buffer and an output buffer.
- the input buffer is connected to a queue and order unit.
- the queue and order unit is connected to a monitoring unit and a macro-command output.
- the queue and order unit can convert (after ordering as in the other nodes) the commands obtained from all packets with identical current timestamp into a single macro-command.
- a macro-command is sent to the process dedicated to the leaf- node.
- the monitoring unit collects data from the process it controls and generates packets to be sent upwards in the network.
- the monitoring unit is connected with its output to the output buffer.
- the output buffer is connected to other nodes in the network.
- the monitoring unit may be implemented as an extended finite state machine (FSM) that processes incoming monitoring data and generates packets.
- the states of the extended FSM depend on the macro-commands received from the queue.
- the processing in the FSM may be delayed for a data processing period with respect to the received macro-commands in order to take into account that the data received from the PN process is only released at the end of its period.
- a controller controls a data processor.
- the controller 302 has an input/output port that serves as the single entry point for the control network connected to the controller 302. Via the entry point the macro-commands generated by the leaf-node are received and data is transmitted to the leaf node 301.
- the data processor 40 may include an IP 400, an input and an output multiplexer 402, a private memory 403,404 and FIFOs 401.
- the private memory has a segment 404 that is exclusively dealing with monitoring data.
- the respective components of the data processor are connected to the controller 302 for receiving control signals and outputting monitoring data to the controller 302.
- Figs. 5 and 6 schematically illustrate a programmable logic device.
- the PLD may for instance be a System on a Chip implemented on an FPGA.
- the FPGA includes a CPU which can receive control data from other systems connected to the FPGA.
- the CPU is programmed such that it performs the functions of a node processor, e.g. as shown in fig. 2, and, together with the buffers, a leaf node processor, e.g. as shown in fig. 3.
- the CPU and the buffer receive a notion of time, e.g. a synchronisation pulse.
- the leaf node processor is connected via the buffers with a controller, implemented as concurrent finite state machines, which controls the signal processing IP core.
- the data processing is performed by an IP core which receives and outputs data via respective FIFO buffers.
- the example includes a PLL (phase locked loop) which generates a clock signal suitable for the CPU and a clock signal suitable for the IP core and the controller.
- Fig. 6 illustrates the rescaling of the system.
- the data processing part is provided with two additional IP cores. Each of the IP cores is controlled by a respective controller (FSMs1-FSMs3).
- Each IP core and its corresponding controller receives a dedicated clock signal (IP1 clock-IP3 clock), thus allowing the IP cores to run at their maximum processing speed.
- CPU2 operates, together with the buffer connected to FSMs3, as a leaf node processor for IP3.
- CPU1 operates, together with the respective buffers, as leaf nodes for IP1 and IP2.
- CPU1 is suitably programmed to operate as a node processor for the three leaf nodes. Both CPU1 and CPU2 receive the synchronisation pulse and operate at their respective CPU clock frequencies.
- the control network tests the behaviour of the data processing network locally.
- the control network is connected to two, parallel data processing lines.
- the control network is divided onto three hierarchical levels.
- a root occupies the first level. This root is a master for a node (NODE) and two leaf-nodes (LEAF TEST - LEAF FIR) that belong to the second and third levels respectively.
- a first leaf node (LEAF TEST) interfaces and synchronizes the control network with a data generator, thus making it possible to update and monitor the amplitude of the generated data.
- Another leaf-node (LEAF FIR) interfaces and synchronizes the control network with a FIR filter, which permits re-configuring the FIR filter coefficients and monitoring the output.
- each of the data processing lines includes an input (IN), a test process (TEST) and a FIR filter (FIR). The time-varying FIR filter (FIR) and the test process (TEST) receive control information from a control network starting from a root node (ROOT).
- the test generator feeds data into the input port of the FIR filter. The behaviour of both components is verified locally and across the network by modifying the FIR filter coefficients and the amplitude of the test data at a fixed point in time.
- Dedicated leaf-nodes (LEAF TEST;LEAF FIR) for both the test and the FIR process monitor the output of the processes after executing the control command and pass this information higher up in the control hierarchy to a node that diagnoses the status of the system.
- This test has been scheduled as part of a health management routine (a self-test procedure). Scheduling in an asynchronous network is possible because the same notion of time is distributed to all the elements in the control network with a synchronization pulse.
- the timing of the self-test procedure is schematically shown in fig. 8, and is as follows:
- the test is scheduled in terms of synchronization periods. If T0 is the reference period at which the test command is issued, TN is the Nth period.
- the procedure is further detailed according to the schedule in fig. 7.
- the root sends an asynchronous control packet in the control network, which orders the node to start a self-test procedure at T1.
- the node therefore executes a pre-defined test procedure from T1 to T7 by generating control packets towards the control leaf-nodes.
- the node monitors the behaviour of these leaf-nodes and finally returns the state of the control network and interface with the data processing to the root after the test procedure.
- the procedure is a combination of two other subtests, which are performed by the leaf-nodes under the supervision of the node. These two subtests run concurrently since the asynchronous control packets generated by the node towards the two leaf-nodes are interleaved in time.
- the first subtest executes a set of commands to verify the behaviour of the interface between the control network and the data generator.
- the amplitude of the data which was initially set to 0, is set to 1 at T2.
- the data issued from the generator is monitored during T3.
- the leaf-node dedicated to the control of the generator receives the monitored data at T4.
- this leaf-node sends a packet to the node, indicating the result of the first subtest.
- the second subtest verifies the behaviour of the interface to the FIR filter. It starts with the monitoring of the output of the FIR filter during T4 in order to ensure that the filter received the correct data from the generator.
- the control network enables the synchronicity of parallel distributed processing that is required for real-time processing. This requirement carries over from real-time systems to data-driven asynchronous processing systems, where a similar requirement is to minimize memory usage by ensuring that there are minimal differences in latency between parallel data pipelines.
- the invention is not limited to implementation in the disclosed examples of devices, but can likewise be applied in other devices.
- the invention is not limited to physical devices or units implemented in non-programmable hardware but can also be applied in programmable devices or units able to perform the desired device functions by operating in accordance with suitable program code.
- the devices may be physically distributed over a number of apparatuses, while logically regarded as a single device.
- a node processor may be implemented as a plurality of separate processors arranged to perform in combination the functions of the node.
- devices logically regarded as separate devices may be integrated in a single physical device.
- the node processors can be implemented in a single processor able to perform the functions of the respective nodes, or the entire system can be implemented on a single chip, as a so-called 'system on a chip' or SoC.
- a number of systems on a chip may be connected to each other via the nodes of the control network in order to form a larger system for distributed processing of data.
- each SoC may be referred to as a platform.
- the invention may also be implemented in a computer program for running on a computer system, at least including code portions for performing steps of a method according to the invention when run on a programmable apparatus, such as a computer system, or enabling a programmable apparatus to perform functions of a device or system according to the invention.
- a computer program may be provided on a data carrier, such as a CD-ROM or diskette, stored with data loadable in a memory of a computer system, the data representing the computer program.
- the data carrier may further be a data connection, such as a telephone cable or a wireless connection transmitting signals representing a computer program according to the invention.
Abstract
A system for distributed processing of signals, including: at least one signal processing unit; at least one control unit arranged to generate a set of control commands and present the control commands at an output; at least two interfacing nodes, each of said interfacing nodes including: a data input connected to one control unit, for receiving control data from the control unit; a command data generator for generating command instructions in response to the received control data, said command instructions being readable by the signal processing device connected to the node; a clock input for receiving a synchronisation clock signal; a scheduling unit for scheduling a release of the command instructions based on the synchronisation clock signal; and a control output for outputting generated command instructions according to the schedule, which control output is connected via a point-to-point connection to one of said data processing devices.
Description
Title: Scalable control interface for large-scale signal processing systems.
FIELD AND BACKGROUND OF THE INVENTION
The invention relates to a system for processing of signals. The invention further relates to a signal processing operation. The invention also relates to a method for controlling and re-configuring a (large-scale) system for processing of signals. The invention further relates to an assembly of at least one control unit and an interface to a signal processing device.
It is known to process data or signals using a system or network that includes a plurality of processing components. Such a network may for example include one or more separate hardware components, such as a Central Processing Unit (CPU), a Field Programmable Gate Array (FPGA) or a Digital Signal Processor (DSP), and/or one or more software components running on a programmable device, such as a software processor. In the network, each of the components processes a part of the data or performs a certain processing task with the data and thereafter passes data to another component. A scalable and speed-optimized mapping of signal processing applications onto such a network requires a scalable and optimized configuration of the components as well as optimal and flexible control of the components. However, a disadvantage of the prior art systems is that either the performance of the components is reduced by the control or the system is not flexible, e.g. the system is not scalable and/or not portable across different platforms.
For example, dedicated signal processing devices (also referred to in the art as "Intellectual Property" or IP) can process data streams at very high speed rates in parallel. However, the development of the interfaces between the controller of the processing devices and the signal processing devices is time consuming. Developing or reconfiguring an arrangement of dedicated signal processing devices therefore requires the development of suitable interfaces, each adapted to the specific, dedicated signal processing device. This makes the system inflexible.
General purpose processors (or embedded CPUs) can execute software applications that are relatively flexible and rapid to develop. However, in case a general purpose processor is used to control a configuration of dedicated signal processing components, the processor runs at a low speed rate compared to the maximum speed rate of the signal processing components. Furthermore, the general purpose processor executes only sequential instructions and cannot fulfil hard real-time requirements, whereas the signal processing components typically need to be controlled with hard real-time precision.
SUMMARY OF THE INVENTION
It is one goal of the invention to provide a system which can be flexible without reducing the operating speed of the signal processing components. Therefore, the invention provides a system according to claim 1. The invention further provides a signal processing operation according to claim 12. The invention also provides a method of modifying according to claim 13 and an assembly according to claim 14.
Specific embodiments of the invention are set forth in the dependent claims.
BRIEF DESCRIPTION OF THE DRAWINGS
Further details, aspects and embodiments of the invention will be described, by way of example only, with reference to the drawings.
Fig. 1 schematically shows a block diagram of an example of an embodiment of a system according to the invention.
Fig. 2 schematically shows a block diagram of an example of an embodiment of an internal node suitable for the example of fig. 1.
Fig. 3 schematically shows a block diagram of an example of an embodiment of a part of an interfacing node suitable for the example of fig. 1.
Fig. 4 schematically shows a block diagram of an example of an embodiment of another part of the example of fig. 3 connected to a dedicated data processing component.
Fig. 5 schematically shows a circuit diagram of an example of an embodiment of an interfacing node connected to a data processing unit.
Fig. 6 schematically shows a circuit diagram of the example of fig. 5 with added data processing units.
Fig. 7 schematically shows a block diagram of a network used in a simulation.
Fig. 8 schematically shows a timing diagram for the network of fig. 7.
DETAILED DESCRIPTION
Applications that are distributed on large systems are often data flow-processing applications, as in telecommunications, radar, sensor networks for physics (e.g. particle detectors in particle accelerators) or distributed radio telescopes. Examples of such distributed systems are distributed radio telescopes and large image processing systems. Such distributed systems may comprise a hierarchical data processing network, a hierarchical control network, an appropriate interface between them, and a superimposed synchronization network. The nodes of the data processing network execute applications that are themselves modelled as process networks. The nodes of the control network process and forward control information (to (re)configure, test and monitor the system) in different operation modes and are themselves modelled as extended finite state machine networks. Also, the signal processing applications dictate the specification of the system architecture and are dominant over the control applications. Typically, the system includes several sub-(sub-)systems down to the level of the components on platforms. Moreover the sub-(sub)-systems have a single entry point for the control. This imposes a hierarchy in the control network.
The example of fig. 1 includes a hierarchical control network that is interfaced with a data processing network. Tasks that are executed in the data processing network have periodic execution cycles that fall well within periods of a synchronization pulse train that is distributed to all nodes in the control network. The control network includes distributed nodes and leaf-nodes. This network is instantiated at design time. Nodes may for example be mapped onto software processors/CPUs (preferably with RISC architecture) and leaf-nodes may for example be mapped onto software processors and/or timed-FSM plug-ins.
The control network can execute control commands in its nodes and transport the control/monitoring information in the system. Each control application has a corresponding set of procedures that schedule the execution of control commands in the network. The control commands are issued in a top-down manner from a root to the FSM actors (i.e. the nodes in the control network) distributed in the system. These control commands can correspond to a (re-)configuration, test, reset, monitoring or any other action on the system. In the network, the nodes are connected across adjacent levels through communication channels (see Figure 1). They all receive a global notion of time (synchronization pulse). A suitable period between successive synchronisation pulses may for example be the smallest common multiple of all periods in all dedicated data processing devices. Finally, the control network is interfaced with processes in the PN network, via its nodes in the lowest level (leaf-nodes), through point to point communication channels.
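As a minimal illustration of this choice of period, the following C++ sketch computes the smallest common multiple of the periodic execution cycles of a few data-flow IPs, which is the shortest admissible synchronization pulse period. The IP periods used here are invented for the example and are not taken from the text.

```cpp
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    // Hypothetical periodic execution cycles of three data-flow IPs,
    // expressed in cycles of a common reference clock (illustrative values).
    std::vector<std::uint64_t> ip_periods = {240, 1024, 1536};

    // The synchronization pulse period is a common multiple of all IP periods;
    // the least common multiple is the smallest valid choice.
    std::uint64_t sync_period = 1;
    for (std::uint64_t p : ip_periods) {
        sync_period = std::lcm(sync_period, p);
    }

    std::cout << "synchronization period = " << sync_period
              << " reference cycles\n";
    return 0;
}
```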
Communication between nodes may be master-slave based. Each node has a unique identifier, which can be masked to address groups of nodes. A node may send a control packet to the node acting as its slave.
The control packets may include two parts: 1) a header, and 2) control data information. The header may for example include:
• An ID, which identifies the destination node(s).
• A command, which requests a specific control action.
• A timestamp, which indicates the time at which the command needs to be executed (with respect to the global synchronization pulse).
• A priority, which indicates an execution priority order given a timestamp.
• A size, which indicates the length of the control data information.
Control data information contains parameters that are required by a node and are related to the command in the received packet.
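A minimal sketch of such a packet is given below. The field names, widths and the masked-match rule are assumptions made for illustration; the text does not prescribe a concrete encoding.

```cpp
#include <cstdint>
#include <vector>

// Illustrative layout of the control packet described above.
struct ControlPacket {
    std::uint16_t destination_id;            // ID: identifies the destination node(s)
    std::uint16_t command;                   // requests a specific control action
    std::uint32_t timestamp;                 // sync period in which the command must execute
    std::uint8_t  priority;                  // execution order among commands with the same timestamp
    std::uint16_t size;                      // length of the control data information
    std::vector<std::uint8_t> control_data;  // parameters used by the node when executing the command
};

// One possible group-addressing rule: a node accepts the packet when the
// destination bits selected by the mask match its own identifier.
inline bool addressed_to(const ControlPacket& packet,
                         std::uint16_t node_id,
                         std::uint16_t group_mask) {
    return (packet.destination_id & group_mask) == (node_id & group_mask);
}
```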
The example has a control interface mechanism which facilitates the wrapping and re-use of high-throughput data-flow IPs and permits design-scaling and re-use in a network of multiple re-configurable platforms. As explained below, the example of fig. 1 includes a control network which is based on a (hierarchical) network of synchronized node processors. The node processors can for example be implemented as Central Processing Units or other software processors connected to one or more memories which provide suitable instructions to the processors.
In order to prevent resource sharing in the control and monitoring paths, point-to-point control interfaces may be provided in the network to bridge central control to the data-flow processes. Furthermore, the clock domains of the (high-level, soft real-time) control and the (low-level, hard real-time) data processing may be decoupled, so as to allow for a maximal processing frequency in the data processing layer. By decoupling the clocks of the leaf-nodes and the clocks of the IPs and by preventing resource sharing, each individual high-throughput data-flow IP is able to run at its maximum clock frequency.
The design of the system can easily be scaled (increase in the number of dataflow IPs and associated dedicated leaf-nodes, and/or increase in the number of re-configurable platforms they are mapped onto) without altering the existing designs, therefore facilitating design-reuse as well. This is achieved through a hierarchical control network with physically separated point-to-point control paths. Furthermore, a synchronization pulse is distributed to all nodes and leaf-nodes in the hierarchical control network. The synchronization pulse preferably has a period which is a common multiple of the periodic execution cycles of all data-flow IPs. This pulse train permits sharing a global notion of time in the hierarchical control network. Thus, time-stamps may be generated that relate to this global notion of time so as to execute control commands at specific times in the control network and in the signal processing network.
The root of the hierarchical control network initiates procedures by sending, preferably asynchronously, control packets towards the nodes. Asynchronous control packets encapsulate a timestamp and control information including the command. Timestamps correspond to a global time at which the commands must be executed. To be executed, control packets must reach the nodes that will execute the commands at the latest during the time interval that precedes the writing of the command into the register. This advance provides enough time for the sequential processing (queue, order) of the control packets by the software processors, given the fact that only (up to) a few control packets will be sent to a node for a specific time interval. The approach allows for unordered asynchronous communication of control commands from the control tree. By ordering the possibly out-of-order packets in nodes or in leaf-nodes, the command packets can be transmitted asynchronously to the timed-FSM. Several asynchronous packets can be sent to different nodes, with a same timestamp. This will result in the parallel time-accurate execution of the commands by these nodes and in the parallel control of the concerned data-flow IPs.
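The queue-and-order step can be pictured with the short C++ sketch below (illustrative only; it uses a reduced version of the hypothetical packet layout sketched earlier). Packets buffered during the preceding interval are sorted by timestamp and then by priority, so the execute unit can process them in order regardless of their arrival order.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Reduced, hypothetical packet view: only the fields needed for ordering.
struct ControlPacket {
    std::uint32_t timestamp;  // sync period in which the command must execute
    std::uint8_t  priority;   // order among commands sharing that period
    std::uint16_t command;
};

// Queue-and-order step of a node: sort the packets received (possibly out of
// order) during the preceding interval by (timestamp, priority).
void queue_and_order(std::vector<ControlPacket>& pending) {
    std::stable_sort(pending.begin(), pending.end(),
                     [](const ControlPacket& a, const ControlPacket& b) {
                         if (a.timestamp != b.timestamp) return a.timestamp < b.timestamp;
                         return a.priority < b.priority;
                     });
}

int main() {
    std::vector<ControlPacket> pending = {
        {5, 1, 0x21}, {4, 0, 0x10}, {5, 0, 0x42}};  // out-of-order arrival
    queue_and_order(pending);
    for (const auto& p : pending) {
        std::cout << "T" << p.timestamp << " prio " << int(p.priority)
                  << " cmd 0x" << std::hex << p.command << std::dec << '\n';
    }
    return 0;
}
```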
This invention allows for rapid adaptation to modifications in required operational modes. The platform and IP specific code is separated from the platform independent code. This is achieved by disseminating and progressively refining operational modes and corresponding execution of control commands in finite state machines, and through a generic re-usable plug-in mechanism for the IP specific reconfiguration associated with state transitions. Software processors may implement the state machines required to handle the operational modes. The interfaces between the data-flow IPs and the software processors may be implemented as timed-FSMs, which separate the domain of free-running software processors from the maximal clock of the data-flow IPs. The software processors together with the timed-FSMs (leaf-nodes) handle the transformation from asynchronously communicated time-stamped commands to time-accurate state-transitions implementing accurately timed reconfiguration of IPs.
The method allows for rapid upgrading of the completion code that controls the data-flow IPs. The individual IP controllers can be plug-ins to a generated timed state-machine implemented in VHDL (Very High Speed Integrated Circuit Hardware Description Language) or another hardware description language (HDL). The plug-ins can be specified at a high level by a table representing the control sequences and then be translated into an HDL definition (state-machine) by a suitable program. It should be noted that the translation of a control sequence into HDL is generally known in the art of designing logic devices and for the sake of brevity is not described in further detail. The HDL definition may then be read by a compiler program able to read instructions in the specific type of HDL. The HDL code is fed into a logic compiler, and the output is uploaded into a programmable logic device, such as an FPGA (field programmable gate array). Thus, dedicated timed-FSM plug-ins are automatically generated (preferably in portable VHDL or ROM-based) from high-level specifications of time-accurate control sequences of data-flow IPs.
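The kind of table such a specification could contain is sketched below in C++. This is a sketch only: the schema and names are assumptions, and an actual tool would translate a table like this into an HDL state machine rather than interpret it at run time.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical high-level specification of a plug-in: for each command, the
// list of (counter value, control word) pairs forming the time-accurate
// control sequence to apply to the data-flow IP.
struct SequenceStep {
    std::uint32_t counter_value;  // IP clock cycle, counted from the sync pulse
    std::uint32_t control_word;   // value to drive on the IP's control inputs
};

struct CommandSequence {
    std::uint16_t command;
    std::vector<SequenceStep> steps;
};

// Lookup shown only to illustrate the semantics of the table: given the active
// command and the local counter value, return the control word to drive, if
// the table defines one for that cycle.
std::optional<std::uint32_t> control_word_at(const std::vector<CommandSequence>& table,
                                             std::uint16_t command,
                                             std::uint32_t counter_value) {
    for (const auto& seq : table) {
        if (seq.command != command) continue;
        for (const auto& step : seq.steps) {
            if (step.counter_value == counter_value) return step.control_word;
        }
    }
    return std::nullopt;
}
```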
Control sequences in plug-ins are activated by a state transition corresponding to a unique command issued from a software processor during a period that precedes the activation of the command. The command is written in a register and held until its activation at the beginning of the next period; slack-time in the software processors is therefore tolerated by decoupling the data-flow IP clock domain from the software processor clock domain. Commands are activated (read from the register) on the occurrence of an external synchronization pulse. The execution of commands is therefore time-accurate. Each individual high-throughput data-flow IP is controlled from a high level without any loss of data across multiple re-configurable platforms via the hierarchical network. This is achieved by isolating the control paths from the data-flow paths and by interfacing them via the periodic synchronization pulse whose period relates to all periods of all data-flow IPs. The periodic synchronization pulse is distributed to all nodes in the entire said hierarchical control network and across all re-configurable platforms. A global notion of time is associated to the periodic and distributed synchronization pulse. The global notion of time is the same for all nodes and leaf-nodes; however, it does not determine the clock of the software processors. Each timed-FSM plug-in incorporates a local counter which is incremented at the clock speed of the data-flow IP controlled by this plug-in. This counter restarts counting from zero when executing a new control command on the occurrence of a synchronization pulse. Each control sequence in a plug-in is specified as a function of the counter value.
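A behavioural sketch of this activation scheme, written in C++ purely for illustration (the class interface and all names are assumptions, not part of the text), models the command register, the activation on the synchronization pulse and the local counter running in the IP clock domain.

```cpp
#include <cstdint>
#include <optional>

// Behavioural model of a timed-FSM plug-in: a command is written into a
// register during the preceding period, activated on the next synchronization
// pulse, and the local counter (running at the IP clock) restarts from zero.
class TimedFsmPlugin {
public:
    // Called from the leaf-node side (software processor clock domain),
    // some time before the period in which the command must take effect.
    void write_command(std::uint16_t command) { command_register_ = command; }

    // Called on the synchronization pulse: the pending command becomes the
    // active command and the local counter restarts from zero.
    void sync_pulse() {
        if (command_register_) {
            active_command_ = *command_register_;
            command_register_.reset();
        }
        counter_ = 0;
    }

    // Called on every IP clock tick; the control word driven towards the IP
    // is purely a function of (active command, counter value).
    std::uint32_t clock_tick() {
        std::uint32_t control_word = sequence(active_command_, counter_);
        ++counter_;
        return control_word;
    }

private:
    // Placeholder for the generated, IP-specific sequence logic.
    static std::uint32_t sequence(std::uint16_t command, std::uint32_t counter) {
        return (static_cast<std::uint32_t>(command) << 16) | (counter & 0xFFFF);
    }

    std::optional<std::uint16_t> command_register_;
    std::uint16_t active_command_ = 0;
    std::uint32_t counter_ = 0;
};
```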
The control nodes and root node only need to be aware of their parents and siblings in order to disseminate control messages appropriately and translate summarized monitoring information upward. Inserting layers of control nodes in the hierarchy at compile time permits adapting to the partitioning and distribution of the dominant data processing tasks onto a network of physical platforms. The root node may be completely unaware of the number of tasks and their distribution at the lowest level. As a result the commands are kept to a minimal size, progressively refined throughout the hierarchy, and the number of commands passed at each layer is kept constant even if the number of tasks scales up. In this respect, for example, each node may only be aware of its directly connected neighbours and be arranged to perform instructions from its directly adjacent parent node and to generate control commands readable by its directly adjacent child node.
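One way to picture this local view is the following C++ sketch, in which the structure, the broadcast-to-children rule and all names are assumptions made for illustration: each node holds references only to its direct children, executes packets addressed to itself and passes everything else one level down, so the root never needs global knowledge of the leaves.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

struct Packet {
    std::uint16_t destination_id;
    std::uint16_t command;
};

struct ControlNode {
    std::uint16_t id;
    std::vector<ControlNode*> children;          // direct children only
    std::function<void(const Packet&)> execute;  // local execute unit

    void receive(const Packet& packet) {
        if (packet.destination_id == id) {
            if (execute) execute(packet);        // command is for this node
            return;
        }
        for (ControlNode* child : children) {    // otherwise pass it one level down;
            child->receive(packet);              // each child applies the same rule
        }
    }
};
```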
Modifying parameters of a data-flow IP or scaling a data-flow IP often requires adapting the associated completion logic. This invention allows doing so without interfering with the rest of the design, therefore facilitating design-scaling. This is done by taking a specification of the control sequence (including timing constraints) that is determined by the IP or some hardware component. From the specification, the logic or code which generates (for all possible control or reconfiguration actions on the IP) the proper control sequence, is generated automatically. A timed FSM-plug-in is used to generate the proper control sequence by implementing, for each high-level state transition, the logic that can be activated with the time-stamped message to generate the required control sequences for the IP; logic is generated for each specific control sequence. The synchronization pulse still has to be connected to this updated plug-in. With this SoC-level interconnection mechanism, the data-flow remains undisturbed and all IPs keep running at their maximum speed.
Moving a data-flow IP from one re-configurable platform to another, or sending high-throughput data between IPs that are mapped onto different platforms, may require the introduction of additional IPs for high-throughput data-transmission between platforms, such as serialization or concatenation of data packets. This invention permits re-using the leaf-nodes of the moved or interconnected data-flow IPs without modifying them. The only constraint is to insert leaf-nodes to control the data-transmission IPs in the design. This is preferably done by creating dedicated FSMs to control these data-transmission IPs and by inserting control sequences into these timed-FSMs. Then these timed-FSMs are preferably connected to a software processor so as to complete a leaf-node. The synchronization pulse also has to be connected to the new plug-ins/leaf-nodes. With this cross-platform interconnection mechanism, the data-flow remains undisturbed and all IPs keep running at their maximum speed.
Instantiation of the on-chip control infrastructure is done at design-time. The required C/C++ code to run on CPUs in the control hierarchy, as well as the transistor logic including the plug-in for generating the control signals for the IP, can be generated automatically from a specification of control actions and a specification of the control and data processing network at design-time. After synthesis of the SoC the links are dedicated point-to-point links, hence they require no arbitration mechanism and no extra latency for regulating access to a shared resource like a bus or shared memory. The SoC may be defined by the system description and the Register transfer level description (RTL), also called register transfer logic. It should be noted that in general RTL is a description of a digital electronic circuit in terms of data flow between registers, which store information between clock cycles in a digital circuit. The RTL description specifies what and where this information is stored and how it is passed through the circuit during its operation.
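A design-time specification of the kind referred to here could be as simple as the following C++ sketch. The schema, names and values are invented for the example; only the idea that node software and plug-ins are generated from such a description comes from the text.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical design-time description of the control network: which leaf-node
// controls which data-flow IP, and which CPU/platform hosts it.
struct LeafSpec {
    std::string name;           // leaf-node instance
    std::string controlled_ip;  // data-flow IP reached through a point-to-point link
    std::string host;           // CPU or platform running the leaf-node software
};

struct ControlNetworkSpec {
    std::string root;
    std::vector<LeafSpec> leaves;
};

int main() {
    ControlNetworkSpec spec{
        "root_cpu",
        {{"leaf_fir", "fir_ip", "cpu1"}, {"leaf_fft", "fft_ip", "cpu1"}}};

    // A generator would walk this description and emit the node software and
    // the timed-FSM plug-ins; here we only list the point-to-point links.
    for (const auto& leaf : spec.leaves) {
        std::cout << spec.root << " -> " << leaf.name << " -> "
                  << leaf.controlled_ip << " (hosted on " << leaf.host << ")\n";
    }
    return 0;
}
```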
The control interface can be adapted when increasing the number of IPs in the data-path by replicating the timed FSM and integrating the IP-specific sequence generators on state transition in the FSM skeleton. Multiple timed-FSMs can be connected to softcores implementing the platform-independent FSM.
The number of state-transitions which can be handled by the FSM on the CPU and the timed-FSM implemented in RTL is limited only by either the clock of the CPU or by the IP timing constraints for reconfiguration.
Each individual plug-in can run on the same clock as its dedicated data-flow IP (this clock is preferably external or generated by an embedded PLL in a re-configurable platform), as depicted in Figures 5 and 6.
As already stated, in a system according to the invention the signal processing and the control parts of the data processing application are separated. Furthermore, the control tasks are synchronised with the data processing tasks. The interface permits reconfiguring and monitoring the data processing application without altering its behaviour. It facilitates the scaling of the system by allowing a systematic wrapping of components. Moreover it simplifies the insertion of new components by supporting the specification of their control in extended Finite State Machines in hierarchical control layers without modifying the rest of the interface. The example of fig. 1 includes a network 1 of nodes. The network 1 includes a control network 10 and a data processing network 11. The data processing network 11 includes two signal processing components 40,41 connected in series. The signal processing components may also be referred to as IP cores 40,41. In this respect, the term 'IP' or 'IP core' is used in the art to refer to a component which contains a block of logic or data (which often is protected by intellectual property and provided by a third party as a black box, hence the name IP), such as a suitably programmed part of a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). The IP 40 may for example be a Finite Impulse Response (FIR) filter component, a Fast Fourier Transformer (FFT) component or any other suitable type of circuit.
As shown in fig. 1, an IP core 40 which is positioned, in a data flow direction, upstream receives data to be processed and performs a certain processing function on the received data, e.g. a filtering process. The IP core 40 outputs the processed data to an IP core 41 positioned downstream of the upstream IP core. The downstream IP core 41 performs another processing function, e.g. Fourier transforming of the filtered data, and outputs the further processed data further downstream. The network of fig. 1 includes two IP cores and corresponding interface nodes. However, depending on the specific implementation, the network may comprise any suitable number of data processors (e.g. IP cores) and any suitable number of interfacing nodes.
The control network 10 includes a node processor 20 and interface-node processors 30-31. The control network 10 is arranged to control the operation of the data processing network 11. The control network may for example have the topology of a lattice or tree network. The control network may be a hierarchical network. In fig. 1, the control network is a hierarchical tree network of which the node processor 20 forms the root node of the control network. The interface node processors 30,31 form the leaf nodes of the hierarchical tree network. However, it is also possible that between the root processor of the control network and the interface node processor 30 further nodes are present, which may be internal nodes. Each of the interface-node processors 30-31 is connected by means of a point-to-point connection to one of the dedicated signal processing components 40,41 in the data processing network 11. The operation of the interface node processor 30 is controlled by the node processor 20. To that end, the node processor 20 can send interface node control commands to the interface node processor. Based on the interface node control commands, the interface node processor 30 generates control signals which are sent to the data processor 40. The control signals control the settings of the data processor 40. As shown in figs 2 and 3 in more detail, the interface node processor includes a leaf-node processor 301 which forms an end node of the control network 10 and a controller which in this example is implemented as a time accurate plug-in 302, as described above. The leaf node processor 301 receives control commands via an input and generates from the control commands macro-commands, and optionally control data, which are sent to the controller 302. Based on the macro-command, the controller 302 generates control signals suitable for the specific data processing component 40.
Thus, in case the data processing component 40 is modified, the control network 10 can be adapted in a simple manner by adjusting or replacing the controller 302. Furthermore, since each of the data processing components 40,41 has a single corresponding interfacing node 30,31, each of the data processing components can operate at its own frequency (preferably its maximum frequency). Furthermore, the nodes 20,30-31 in the control network 10 are separated from the data processing
component 40, and accordingly the nodes 20,30-31 can operate at a different clock frequency than the data processing component.
As explained below, the controller 302 can advantageously be implemented as a Hardware Description Language (HDL) compiler connected to a memory in which an HDL definition of the data processing unit is stored. In case the data processing unit 40 is modified, the HDL definition can be adjusted, and accordingly the network 1 is very flexible.
The example of a node processor 20 of fig. 2 includes an input buffer and an output buffer. A switch is present between the input buffer and the output buffer. The node processor of fig. 2 further includes a feedback loop including a queue and order unit followed by an execute unit. The node processor 20 may be connected to nodes higher in the control network and nodes lower in the control network.
The packets may be received asynchronously. The received packets are stored in the input buffer (non-deterministic merge) to handle possible traffic congestion. There are then two options. 1) If the destination node of the packet does not match the identifier of the receiving node, the packet is switched to the output buffer.
2) In case the destination of the packet matches the identifier of the receiving node, the packet is switched into the feedback loop (queue and order unit, execute unit). On the occurrence of the next synchronization pulse in the control network, packets are ordered in the queue with respect to the timestamps and priorities. The ordered packets are then transmitted to the execute unit as long as the timestamps match the globally shared time as given by the synchronization pulse. The execution unit is implemented as a Finite State Machine (FSM) with states corresponding to commands. Each state executes a pre-determined sequence of instructions that can use data provided in the control data information in the packets. Thus, in response to receiving a command, the execution unit performs a predetermined set of instructions, in a predetermined order.
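A minimal C++ sketch of this routing and ordering behaviour is given below; the packet fields, class names and container choices are illustrative assumptions rather than the actual on-chip packet format or implementation.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical control packet layout; field names are illustrative only.
struct ControlPacket {
    uint32_t destination;                 // identifier of the target node
    uint64_t timestamp;                   // synchronisation period in which to execute
    uint8_t  priority;                    // tie-breaker for equal timestamps
    uint16_t command;                     // selects a state of the execute-unit FSM
    std::vector<uint32_t> control_data;   // optional data used by the command
};

// Order packets by timestamp first, then by priority (lower value = earlier).
struct PacketOrder {
    bool operator()(const ControlPacket& a, const ControlPacket& b) const {
        if (a.timestamp != b.timestamp) return a.timestamp > b.timestamp;
        return a.priority > b.priority;
    }
};

class NodeProcessor {
public:
    explicit NodeProcessor(uint32_t id) : id_(id) {}

    // Called for every packet taken from the input buffer.
    void receive(const ControlPacket& p) {
        if (p.destination != id_) {
            output_buffer_.push_back(p);   // switch to the output buffer
        } else {
            queue_.push(p);                // switch into the feedback loop
        }
    }

    // Called on every synchronization pulse with the globally shared time.
    void on_sync_pulse(uint64_t global_time) {
        while (!queue_.empty() && queue_.top().timestamp == global_time) {
            execute(queue_.top());         // execute unit: one FSM state per command
            queue_.pop();
        }
    }

private:
    void execute(const ControlPacket& p) {
        // Each command selects a predetermined sequence of instructions,
        // parameterised by the control data carried in the packet.
        (void)p;
    }

    uint32_t id_;
    std::priority_queue<ControlPacket, std::vector<ControlPacket>, PacketOrder> queue_;
    std::vector<ControlPacket> output_buffer_;
};
```

In an actual node, the body of execute() would correspond to the predetermined instruction sequences of the FSM states, and the output buffer would feed the links towards the other nodes in the control network.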
As shown in Figure 3, a leaf-node processor in the control network may include an input buffer and an output buffer. In fig. 3, the input buffer is connected to a queue and order unit. The queue and order unit is connected to a monitoring unit and a macro-command output. The queue and order unit can convert (after ordering as in the other nodes) the commands obtained from all packets with identical current timestamp into a single macro-command. A macro-command is sent to the process
dedicated to the leaf-node. The monitoring unit collects data from the process it controls and generates packets to be sent upwards in the network. The monitoring unit is connected with its output to the output buffer. The output buffer is connected to other nodes in the network. The monitoring unit may be implemented as an extended finite state machine
(FSM) that processes incoming monitoring data and generates packets. The states of the extended FSM depend on the macro-commands received from the queue. The processing in the FSM may be delayed for a data processing period with respect to the received macro-commands in order to take into account that the data received from the PN process is only released at the end of its period.
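The following self-contained C++ sketch illustrates, under the same kind of assumptions as the node-processor sketch above, how a leaf-node might fold all commands carrying the current timestamp into one macro-command for its dedicated controller; the types and the function shown are hypothetical.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Illustrative only: commands already ordered by the queue and order unit.
struct QueuedCommand {
    uint64_t timestamp;                   // synchronisation period of the command
    uint16_t command;                     // command identifier from the packet
    std::vector<uint32_t> control_data;   // optional data carried with the command
};

struct MacroCommand {
    uint64_t timestamp;
    std::vector<uint16_t> commands;       // merged command identifiers
    std::vector<uint32_t> control_data;   // concatenated control data
};

// 'ordered' is assumed to be sorted by timestamp and priority, as in the other nodes.
MacroCommand merge_current(std::deque<QueuedCommand>& ordered, uint64_t now) {
    MacroCommand mc{now, {}, {}};
    while (!ordered.empty() && ordered.front().timestamp == now) {
        mc.commands.push_back(ordered.front().command);
        mc.control_data.insert(mc.control_data.end(),
                               ordered.front().control_data.begin(),
                               ordered.front().control_data.end());
        ordered.pop_front();
    }
    return mc;  // sent via the single entry point to the time-accurate plug-in
}
```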
As shown in fig. 4, a controller controls a data processor. To control the process performed by the data processor, the controller 302 has an input/output port that serves as the single entry point for the control network connected to the controller 302. Via the entry point the macro-commands generated by the leaf-node are received and data is transmitted to the leaf node 301.
The data processor 40 may include an IP 400, an input and an output multiplexer 402, a private memory 403,404 and FIFOs 401. In fig. 4, the private memory has a segment 404 that is exclusively dealing with monitoring data. The respective components of the data processor are connected to the controller 302 for receiving control signals and outputting monitoring data to the controller 302.
Figs. 5 and 6 schematically illustrate a programmable logic device. In this example, the PLD may for instance be a System on a Chip implemented on an FPGA. The FPGA includes a CPU which can receive control data from other systems connected to the FPGA. The CPU is programmed such that it performs the functions of a node processor, e.g. as shown in fig. 2, and, together with the buffers, a leaf node processor, e.g. as shown in fig. 3. As shown, the CPU and the buffer receive a notion of time, e.g. a synchronisation pulse.
As further shown in fig. 5, the leaf node processor is connected via the buffers with a controller, implemented as concurrent finite state machines, which controls the signal processing IP core. The data processing is performed by an IP core which receives and outputs data via respective FIFO buffers. The example includes a PLL (phase locked loop) which generates a clock signal suitable for the CPU and a clock signal suitable for the IP core and the controller.
Fig. 6 illustrates the rescaling of the system. The data processing part is provided with two additional IP cores. Each of the IP cores is controlled by a respective controller (FSMs1-FSMs3). Each IP core and its corresponding controller receives a dedicated clock signal (IP1 clock-IP3 clock), thus allowing the IP cores to run at their maximum processing speed. Furthermore, an additional CPU (CPU2) is provided. CPU2 operates, together with the buffer connected to FSMs3, as a leaf node processor for IP3. CPU1 operates, together with the respective buffers, as leaf nodes for IP1 and IP2. In addition, CPU1 is suitably programmed to operate as a node processor for the three leaf nodes. Both CPU1 and CPU2 receive the synchronisation pulse and operate at their respective CPU clock frequencies.
In the example, the control network tests the behaviour of the data processing network locally. In this example, the control network is connected to two parallel data processing lines. The control network is divided into three hierarchical levels. A root (ROOT) occupies the first level. This root is a master for a node (NODE) and two leaf-nodes (LEAF TEST, LEAF FIR) that belong to the second and third levels respectively. A first leaf-node (LEAF TEST) interfaces and synchronizes the control network with a data generator, thus making it possible to update and monitor the amplitude of the generated data. Another leaf-node (LEAF FIR) interfaces and synchronizes the control network with a FIR filter, which permits re-configuring the FIR filter coefficients and monitoring the output.
In Fig. 7, each of the data processing lines includes an input (IN), a test process (TEST) and a FIR filter (FIR). The time-varying FIR filter (FIR) and the test process (TEST) receive control information from a control network starting from a root node (ROOT). The test generator feeds data into the input port of the FIR filter. The behaviour of both components is verified locally and across the network by modifying the FIR filter coefficients and the amplitude of the test data at a fixed point in time. Dedicated leaf-nodes (LEAF TEST; LEAF FIR) for both the test and the FIR process monitor the output of the processes after executing the control command and pass this information higher up in the control hierarchy to a node that diagnoses the status of the system. This test has been scheduled as part of a health management routine (a self-test procedure). Scheduling in an asynchronous network is possible because the same notion of time is distributed to all the elements in the control network with a synchronization pulse.
The timing of the self-test procedure is schematically shown in fig. 8, and is as follows:
1) send test commands;
2) check local behaviour;
3) check global behaviour;
4) return diagnostic to root.
The test is scheduled in terms of synchronization periods. If T0 is the reference period at which the test command is issued, TN is the Nth period. The procedure is further detailed according to the schedule in fig. 7. During T0, the root sends an asynchronous control packet in the control network, which orders the node to start a self-test procedure at T1. The node therefore executes a pre-defined test procedure from T1 to T7 by generating control packets towards the control leaf-nodes. The node monitors the behaviour of these leaf-nodes and finally returns the state of the control network and interface with the data processing to the root after the test procedure. The procedure is a combination of two other subtests, which are performed by the leaf-nodes under the supervision of the node. These two subtests run concurrently since the asynchronous control packets generated by the node towards the two leaf-nodes are interleaved in time.
The first subtest executes a set of commands to verify the behaviour of the interface between the control network and the data generator. The amplitude of the data, which was initially set to 0, is set to 1 at T2. Then the data issued from the generator is monitored during T3. The leaf-node dedicated to the control of the generator receives the monitored data at T4. Finally, this leaf-node sends a packet to the node, indicating the result of the first subtest. The second subtest verifies the behaviour of the interface to the FIR filter. It starts with the monitoring of the output of the FIR filter during T4 in order to ensure that the filter received the correct data from the generator. Then the initial set of FIR coefficients (C=[1;1;1;1;1]) is replaced during T5 by a new set (C=[2;2;2;2;2]). The output of the FIR filter is monitored during T6 (to check if the coefficients have been correctly updated) and returned to the leaf-node at T7. Finally, this leaf-node indicates the result of this subtest to the node.
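For readability, the schedule of the two subtests can be restated as data; in the C++ sketch below the actor names and action strings are merely descriptive assumptions summarising the steps above, not generated code.

```cpp
// Illustrative summary of the self-test schedule; one entry per scheduled step.
struct ScheduledAction {
    int         period;   // synchronization period T0..T7
    const char* actor;    // element of the control network acting in that period
    const char* action;   // description of the step, as an assumption
};

const ScheduledAction kSelfTest[] = {
    {0, "ROOT",      "send asynchronous control packet: start self-test at T1"},
    {1, "NODE",      "start predefined test procedure, issue packets to leaf-nodes"},
    {2, "LEAF TEST", "set test-data amplitude from 0 to 1"},
    {3, "LEAF TEST", "monitor the data issued by the generator"},
    {4, "LEAF TEST", "receive monitored data, report subtest 1 to NODE"},
    {4, "LEAF FIR",  "monitor FIR output to check the data received from the generator"},
    {5, "LEAF FIR",  "replace coefficients C=[1;1;1;1;1] by C=[2;2;2;2;2]"},
    {6, "LEAF FIR",  "monitor FIR output to check the updated coefficients"},
    {7, "LEAF FIR",  "receive monitored output, report subtest 2 to NODE"},
};
```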
As shown in fig. 7, the control network enables the synchronicity of parallel distributed processing which is required for real-time processing. This requirement translates from real-time systems to data-driven asynchronous processing systems, where a similar requirement is to minimize memory usage by ensuring that there are minimal differences in latency between parallel data pipelines.
The invention is not limited to implementation in the disclosed examples of devices, but can likewise be applied in other devices. In particular, the invention is not limited to physical devices or units implemented in non-programmable hardware but can also be applied in programmable devices or units able to perform the desired device functions by operating in accordance with suitable program code. Furthermore, the devices may be physically distributed over a number of apparatuses, while logically regarded as a single device. For example, a node processor may be implemented as a plurality of separate processors arranged to perform in combination the functions of the node. Also, devices logically regarded as separate devices may be integrated in a single physical device. For example, the node processors can be implemented in a single processor able to perform the functions of the respective nodes, or the entire system can be implemented on a single chip, as a so-called 'system on a chip' or SoC. In the latter case, a number of systems on a chip may be connected to each other via the nodes of the control network in order to form a larger system for distributed processing of data. In this respect, each SoC may be referred to as a platform.
The invention may also be implemented in a computer program for running on a computer system, at least including code portions for performing steps of a method according to the invention when run on a programmable apparatus, such as a computer system, or enabling a programmable apparatus to perform functions of a device or system according to the invention. Such a computer program may be provided on a data carrier, such as a CD-ROM or diskette, stored with data loadable in a memory of a computer system, the data representing the computer program. The data carrier may further be a data connection, such as a telephone cable or a wireless connection transmitting signals representing a computer program according to the invention.
In the foregoing specification, the invention has been described with reference to specific examples of embodiments of the invention. However, various modifications and changes may be made therein. For example, while the examples describe transmitting control data, from the root node, to the dedicated data processing devices, it is also possible to apply the described mechanisms in a substantially reversed order, for monitoring the dedicated data processing devices. Furthermore,
the invention may also be applied to systems and methods for designing or implementing a distributed processing system.
Claims
1. A system for processing of signals, including: at least one signal processing unit; at least one control unit arranged to generate a set of control commands and presenting the control commands at an output; at least two interfacing nodes, each of said interfacing nodes including:
■ a data input connected to one control unit, for receiving control data from the control unit;
■ a command data generator for generating command instructions in response to the received control data, said command instructions being readable by a signal processing unit connected to the node;
■ a clock input for receiving a synchronisation clock signal;
■ a scheduling unit for scheduling a release of the command instructions based on the synchronisation clock signal; and
■ a control output for outputting generated command instructions according to the schedule, which control output is connected via a point-to-point connection to one of said data processing devices.
2. A system according to claim 1, wherein the signal processing unit has a first clock input for receiving a first clock of a first frequency, and the control unit has a second clock input for receiving a second clock signal of a second frequency, said second frequency being lower than said first frequency.
3. A system according to any one of the preceding claims, wherein the interfacing node includes a memory in which a specification of a finite state machine corresponding to the signal processing unit can be stored, and an interpreting unit for interpreting the specification and controlling a programmable device to function in accordance with the specification.
4. A system according to any one of the preceding claims, wherein at least a part of the signal processing unit can be modified, for example by reprogramming or replacing with another component.
5. A system according to any one of the preceding claims, wherein the control unit includes a root node connected to at least one of the leaf nodes, for example via at least one internal node.
6. A system according to any one of the preceding claims, wherein the nodes form a hierarchical tree network.
7. A system according to any one of the preceding claims, including at least one platform, such as a field programmable gate array or system on a chip, on which at least one of said interface nodes and signal processing unit are implemented.
8. A system according to claim 7, wherein on at least one of said platforms a plurality of signal processing units and interfacing nodes is implemented.
9. A system according to any one of the preceding claims, wherein said nodes include an input for receiving data, an output for outputting data and a queuing unit for ordering received data based on a time-indication in the data and/or a priority indication in the data.
10. A system according to any one of the preceding claims, wherein each of the signal processing units receives control signals from an interface node only.
11.
12. A signal processing operation, including: performing signal processing by at least two signal processing units; controlling the signal processing, said controlling including: generating a separate set of command instructions for each signal processing unit, said command instructions being readable by the respective signal processing unit;
■ scheduling an order of release of the command instructions;
■ outputting the set of command instructions a predetermined period after receiving a synchronisation clock signal.
13. A method for modifying a system according to any one of claims 1-11, including: adapting a signal processing unit on said system, and reprogramming the interface node connected to the adapted signal processing unit.
14. An assembly of at least one control unit and an interface to a signal processing device, said control unit being arranged to generate a set of control commands and presenting the control commands at an output; the interface including at least two interfacing nodes, each of said interfacing nodes including:
■ a data input connected to one control unit, for receiving control data from the control unit;
■ a command data generator for generating command instructions in response to the received control data, said command instructions being readable by the signal processing device connected to the node;
■ a clock input for receiving a synchronisation clock signal;
■ a scheduling unit for scheduling a release of the command instructions based on the synchronisation clock signal; and
■ a control output for outputting generated command instructions according to the schedule, which control output is connected via a point-to-point connection to one of said data processing devices.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05076750 | 2005-07-22 | ||
EP05076750.8 | 2005-07-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2007011203A1 true WO2007011203A1 (en) | 2007-01-25 |
Family
ID=37036913
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/NL2006/000256 WO2007011203A1 (en) | 2005-07-22 | 2006-05-19 | Scalable control interface for large-scale signal processing systems. |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2007011203A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011162858A1 (en) * | 2010-06-23 | 2011-12-29 | Tabula, Inc. | Rescaling |
US8115510B2 (en) | 2005-07-15 | 2012-02-14 | Tabula, Inc. | Configuration network for an IC |
US8143915B2 (en) | 2007-06-27 | 2012-03-27 | Tabula, Inc. | IC with deskewing circuits |
US8295428B2 (en) | 2008-08-04 | 2012-10-23 | Tabula, Inc. | Trigger circuits and event counters for an IC |
US8650514B2 (en) | 2010-06-23 | 2014-02-11 | Tabula, Inc. | Rescaling |
US8990651B2 (en) | 2007-09-19 | 2015-03-24 | Tabula, Inc. | Integrated circuit (IC) with primary and secondary networks and device containing such an IC |
CN114281751A (en) * | 2020-09-28 | 2022-04-05 | 上海商汤智能科技有限公司 | Chip system |
CN116300557A (en) * | 2022-12-05 | 2023-06-23 | 上海励驰半导体有限公司 | Multisystem interaction control method and device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5729757A (en) * | 1985-05-20 | 1998-03-17 | Shekels; Howard D. | Super-computer system architectures using status memory to alter program |
US5892962A (en) * | 1996-11-12 | 1999-04-06 | Lucent Technologies Inc. | FPGA-based processor |
US6096091A (en) * | 1998-02-24 | 2000-08-01 | Advanced Micro Devices, Inc. | Dynamically reconfigurable logic networks interconnected by fall-through FIFOs for flexible pipeline processing in a system-on-a-chip |
US20020085007A1 (en) * | 2000-06-29 | 2002-07-04 | Sun Microsystems, Inc. | Graphics system configured to parallel-process graphics data using multiple pipelines |
EP1229444A1 (en) * | 2000-11-29 | 2002-08-07 | Texas Instruments Incorporated | Media accelerator |
US20040010667A1 (en) * | 2002-07-11 | 2004-01-15 | International Business Machines Corporation | Apparatus and method for load balancing of fixed priority threads in a multiple run queue environment |
WO2004042560A2 (en) * | 2002-10-31 | 2004-05-21 | Lockheed Martin Corporation | Pipeline coprocessor |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5729757A (en) * | 1985-05-20 | 1998-03-17 | Shekels; Howard D. | Super-computer system architectures using status memory to alter program |
US5892962A (en) * | 1996-11-12 | 1999-04-06 | Lucent Technologies Inc. | FPGA-based processor |
US6096091A (en) * | 1998-02-24 | 2000-08-01 | Advanced Micro Devices, Inc. | Dynamically reconfigurable logic networks interconnected by fall-through FIFOs for flexible pipeline processing in a system-on-a-chip |
US20020085007A1 (en) * | 2000-06-29 | 2002-07-04 | Sun Microsystems, Inc. | Graphics system configured to parallel-process graphics data using multiple pipelines |
EP1229444A1 (en) * | 2000-11-29 | 2002-08-07 | Texas Instruments Incorporated | Media accelerator |
US20040010667A1 (en) * | 2002-07-11 | 2004-01-15 | International Business Machines Corporation | Apparatus and method for load balancing of fixed priority threads in a multiple run queue environment |
WO2004042560A2 (en) * | 2002-10-31 | 2004-05-21 | Lockheed Martin Corporation | Pipeline coprocessor |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8760194B2 (en) | 2005-07-15 | 2014-06-24 | Tabula, Inc. | Runtime loading of configuration data in a configurable IC |
US8115510B2 (en) | 2005-07-15 | 2012-02-14 | Tabula, Inc. | Configuration network for an IC |
US8143915B2 (en) | 2007-06-27 | 2012-03-27 | Tabula, Inc. | IC with deskewing circuits |
US8990651B2 (en) | 2007-09-19 | 2015-03-24 | Tabula, Inc. | Integrated circuit (IC) with primary and secondary networks and device containing such an IC |
US8295428B2 (en) | 2008-08-04 | 2012-10-23 | Tabula, Inc. | Trigger circuits and event counters for an IC |
US8755484B2 (en) | 2008-08-04 | 2014-06-17 | Tabula, Inc. | Trigger circuits and event counters for an IC |
US8650514B2 (en) | 2010-06-23 | 2014-02-11 | Tabula, Inc. | Rescaling |
US8788987B2 (en) | 2010-06-23 | 2014-07-22 | Tabula, Inc. | Rescaling |
WO2011162858A1 (en) * | 2010-06-23 | 2011-12-29 | Tabula, Inc. | Rescaling |
US9257986B2 (en) | 2010-06-23 | 2016-02-09 | Altera Corporation | Rescaling |
CN114281751A (en) * | 2020-09-28 | 2022-04-05 | 上海商汤智能科技有限公司 | Chip system |
CN114281751B (en) * | 2020-09-28 | 2024-01-02 | 上海商汤智能科技有限公司 | Chip system |
CN116300557A (en) * | 2022-12-05 | 2023-06-23 | 上海励驰半导体有限公司 | Multisystem interaction control method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2007011203A1 (en) | Scalable control interface for large-scale signal processing systems. | |
US11093674B2 (en) | Generating clock signals for a cycle accurate, cycle reproducible FPGA based hardware accelerator | |
Hansson et al. | Aelite: A flit-synchronous network on chip with composable and predictable services | |
Kopetz et al. | The time-triggered architecture | |
Vanmeerbeeck et al. | Hardware/software partitioning of embedded system in OCAPI-xl | |
Hansson et al. | Trade-offs in the configuration of a network on chip for multiple use-cases | |
Kopetz et al. | Temporal composability [real-time embedded systems] | |
US10110679B2 (en) | Timed functions for distributed decentralized real time systems | |
US12066971B2 (en) | Direct network access by a memory mapped peripheral device for scheduled data transfer on the network | |
US20180217954A1 (en) | Asynchronous Start for Timed Functions | |
Sanchez-Garrido et al. | Digital electrical substation communications based on deterministic time-sensitive networking over Ethernet | |
Rezabek et al. | Engine: Flexible research infrastructure for reliable and scalable time sensitive networks | |
Pilato et al. | A runtime adaptive controller for supporting hardware components with variable latency | |
Kogel et al. | Integrated system-level modeling of network-on-chip enabled multi-processor platforms | |
Le et al. | Timed-automata based schedulability analysis for distributed firm real-time systems: a case study | |
Lee et al. | Dealing with AADL End-to-end Flow Latency with UML MARTE | |
US20140047262A1 (en) | Multiple clock domain tracing | |
JP2002524790A (en) | Synchronous polyphase clock distribution system | |
US7676685B2 (en) | Method for improving the data transfer in semi synchronous clock domains integrated circuits at any possible m/n clock ratio | |
Vermeulen et al. | Debugging distributed-shared-memory communication at multiple granularities in networks on chip | |
Basanta-Val et al. | A synchronous scheduling service for distributed real-time Java | |
Fennibay et al. | Introducing hardware-in-loop concept to the hardware/software co-design of real-time embedded systems | |
Kovácsházy et al. | Prototype implementation and performance of time-based distributed scheduling on linux for real-time cyber-physical systems | |
Lemaitre et al. | Behavioral specification of control interface for signal processing applications | |
Chandhoke et al. | A model-based methodology of programming cyber-physical systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 06747551; Country of ref document: EP; Kind code of ref document: A1 |