HK1120315A - Hardware-based messaging appliance - Google Patents
Description
Reference to a previously filed application
This application claims priority from the following applications, which are incorporated herein by reference: U.S. provisional application No. 60/641,988, entitled "Event Router Systems and Method," filed on January 6, 2005; and U.S. provisional application No. 60/688,983, entitled "Hybrid Feed Handlers And Latency Measurement," filed on June 8, 2005.
This application relates to the following application, which is incorporated herein by reference: U.S. patent application No. _________ (attorney docket No. 50003-.
Technical Field
The present invention relates to data messaging middleware architectures, and more particularly, to hardware-based messaging appliances in messaging systems having a publish and subscribe (hereinafter "publish/subscribe") middleware architecture.
Background
The increasing levels of performance required by data messaging infrastructures have driven the development of networking infrastructures and protocols. Basically, data distribution involves various data sources and destinations, as well as various types of interconnect architectures and communication modes between those sources and destinations. Examples of existing data messaging architectures include hub-and-spoke, peer-to-peer, and store-and-forward.
With the hub-and-spoke configuration, all communications pass through the hub, which often creates a performance bottleneck when throughput is high. Such messaging systems therefore introduce latency. One way to circumvent this bottleneck is to deploy more servers and distribute the network load among them; however, this approach presents scalability and operational issues. Systems with peer-to-peer configurations, as compared to hub-and-spoke systems, place unnecessary stress on applications to process and filter data, and are only as fast as their slowest consumers or nodes. Systems with store-and-forward configurations store data before forwarding it to the next node in the path in order to provide persistence. The storage operation is typically implemented by indexing and writing messages to a storage disk, which can create a performance bottleneck. Furthermore, as message volume increases, the indexing and writing tasks may become quite slow, possibly introducing additional latency.
Existing data messaging architectures share some deficiencies. A common deficiency is that data messaging in existing architectures relies on software residing at the application layer. This means that the messaging infrastructure is subject to OS (operating system) queuing and network I/O (input/output), which can create performance bottlenecks. Furthermore, routing in conventional systems is implemented in software. Another common deficiency is that existing architectures use data transfer protocols statically rather than dynamically, even though other protocols may be more appropriate in some situations. Some examples of common protocols include routable multicast, broadcast, or unicast. In fact, the Application Programming Interfaces (APIs) in existing architectures are not designed to switch between transport protocols in real time.
In addition, network configuration decisions are typically made at deployment time and are typically defined to optimize a set of network and messaging conditions under certain assumptions. The limitations associated with static (fixed) configurations preclude real-time dynamic network reconfiguration. In other words, existing architectures are configured for a particular transport protocol that does not always suit all network data transport load conditions and, as a result, cannot handle changing or increasing load demands in real time.
Furthermore, existing messaging architectures use routable multicast to transport data across a network when data messaging is destined for a particular recipient or group of recipients. However, in a system set up for multicast, there is a limit to the number of multicast groups that can practically be used to distribute the data, and as a result the messaging system ends up sending data to destinations that have not subscribed to it (i.e., clients that are not subscribers of that particular data). This increases the data processing load and the drop rate at the client due to data filtering. Thus, a client that for any reason becomes overloaded and cannot keep up with the data flow eventually drops incoming data and later requires retransmissions. Retransmissions impact the overall system because all clients receive the duplicate transmissions and all clients reprocess the incoming data. Therefore, retransmissions may cause a multicast storm and may eventually crash the entire system.
When the system is set up for unicast messaging as one method of reducing drop rate, the messaging system may experience bandwidth saturation due to data replication. For example, if more than one customer subscribes to a given topic of interest, the messaging system must deliver the data to each subscriber, and in fact, the system sends a different copy of the data to each subscriber. Although this solves the problem of clients filtering out non-subscribed data, unicast transmission is not scalable and therefore basically not suitable for situations where a large group of clients subscribe to specific data or where consumption patterns are extremely overlapping.
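The bandwidth cost of unicast fan-out described above can be made concrete with a small worked example. The sketch below compares the publisher-side load for unicast (one copy per subscriber) against multicast (one copy on shared links); the rates and sizes are illustrative, not taken from the patent:

```python
def unicast_bandwidth(msg_rate_per_sec: int, msg_size_bytes: int, subscribers: int) -> int:
    """Bytes/sec the network must carry when each subscriber receives
    its own copy of every message (unicast fan-out)."""
    return msg_rate_per_sec * msg_size_bytes * subscribers

def multicast_bandwidth(msg_rate_per_sec: int, msg_size_bytes: int) -> int:
    """With multicast, one copy traverses the shared links regardless of
    how many subscribers have joined the group."""
    return msg_rate_per_sec * msg_size_bytes

# Hypothetical feed: 10,000 msgs/s of 200-byte quotes to 50 subscribers.
uni = unicast_bandwidth(10_000, 200, 50)    # grows linearly with subscribers
multi = multicast_bandwidth(10_000, 200)    # constant in subscriber count
```

The linear growth of `uni` with subscriber count is exactly why the text calls unicast "basically not suitable" when consumption patterns overlap heavily.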
In addition, in the path between the publisher and subscriber, messages travel in hops between applications, where each hop introduces application and Operating System (OS) latency. Thus, the total end-to-end latency increases as the number of hops increases. In addition, in routing messages from publishers to subscribers, the throughput of messages along a path is limited by the slowest node in the path, and there is no way in existing systems to implement end-to-end messaging flow control to overcome this limitation.
Another common deficiency of existing architectures is that their protocol conversions are slow and numerous. This is a consequence of integration practice in the field of Enterprise Application Integration (EAI), where more and more new technologies are integrated with legacy systems.
Accordingly, there is a need in a number of fields to improve data messaging system performance. Examples where performance may need to be improved are speed, resource allocation, latency, etc.
Disclosure of Invention
The present invention is based in part on the foregoing observations and the insight that these deficiencies can be addressed, with better results, using different approaches, including hardware-based solutions. These observations have led to the development of end-to-end message publish/subscribe middleware architectures, particularly hardware-based messaging appliances (MAs), for high-volume, low-latency messaging. Thus, a data distribution system having an end-to-end message publish/subscribe middleware architecture in accordance with the principles of the present invention can advantageously route large numbers of messages with very low latency by, among other things, reducing intermediate hops with neighbor-based routing and network disintermediation, introducing efficient local-to-external and external-to-local protocol conversion, monitoring system performance (including latency) in real time, deploying topic-based and channel-based messaging, and dynamically and intelligently optimizing the system interconnect configuration and messaging protocols. In addition, such systems may utilize data caching to provide guaranteed-delivery quality of service.
In connection with resource allocation, the data distribution system according to the invention brings the advantage of dynamically allocating available resources in real time. In this regard, the present invention contemplates a system having a real-time, dynamic, learning approach for resource allocation, as opposed to the traditional static configuration approach. Examples where resource allocation can be optimized in real-time include network resources (utilization of bandwidth, protocols, paths/routes) and client system resources (utilization of CPU, memory and disk space).
In conjunction with monitoring system topology and performance, the data distribution system according to the present invention advantageously distinguishes between message-level and frame-level latency measurements. In some cases, the correlation between these measurements provides a competitive commercial advantage. In other words, the nature and extent of the latency may indicate the best data and data sources, which in turn may be used in a business process and provide a competitive advantage.
Thus, for the purposes of the present invention as embodied and broadly described herein, an exemplary system having a publish/subscribe middleware architecture comprises: one or more messaging appliances configured to receive and route messages; a medium; and a setup and management device, linked via the medium, that is configured to exchange management messages with each messaging appliance. In such systems, the messaging appliances perform the routing of messages by dynamically selecting a message transport protocol and a message routing path.
Further in accordance with the purpose of the present invention, a messaging appliance (MA) is configured as either an edge MA or a core MA, where each MA has a high-speed interconnect bus through which its various hardware modules are linked; in addition, an edge MA has a protocol translation engine (PTE). In each MA, the hardware modules are essentially divided into three planes: a control plane, a data plane, and a service plane.
These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, appended claims, and accompanying drawings as will be described hereinafter.
Drawings
The accompanying drawings incorporated in and forming a part of the specification illustrate various aspects of the present invention and, together with the description, serve to explain the principles of the invention. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to the same or like elements.
FIG. 1 illustrates an end-to-end middleware architecture in accordance with the principles of the present invention.
Fig. 1a is a diagram illustrating an overlay network (overlay network).
FIG. 2 is a diagram illustrating an enterprise infrastructure implemented using an end-to-end middleware architecture in accordance with the principles of the present invention.
Fig. 2a is a diagram illustrating a physical deployment of an enterprise infrastructure with messaging appliances (MAs) that achieve network backbone disintermediation.
Fig. 3 illustrates the architecture of a channel-based messaging system.
Fig. 4 illustrates one possible topic-based message format.
Fig. 5 illustrates a topic-based message routing and routing table.
Fig. 6a-d are diagrams illustrating aspects of a hardware-based messaging appliance.
Figure 6e illustrates functional aspects of a hardware-based messaging appliance.
Fig. 7 illustrates adaptive message flow control.
Detailed Description
The description herein provides many details of the end-to-end middleware architecture of a message publish/subscribe system and, in particular, details of a hardware-based messaging appliance (MA) according to various embodiments of the present invention. However, before the details of these various embodiments are presented, the terms used in this description are briefly reviewed below. Note that this review is for clarity only and gives the reader an understanding of how these terms might be used; it does not limit these terms to the contexts in which they are used, nor does it accordingly limit the scope of the claims.
The term "middleware" is used as a general term in the computer industry for any programming that mediates between two separate, usually pre-existing, programs. The purpose of adding middleware is to offload some of the complexity associated with information exchange from the applications; this is achieved by defining communication interfaces between all participants (publishers and subscribers) in the network, and so on. Generally, middleware programs provide messaging services so that different applications can communicate. With the middleware software layer, information exchange between applications can be performed seamlessly. Linking different applications together systematically, often through the use of middleware, is known as Enterprise Application Integration (EAI). In this document, however, "middleware" is used more broadly in the context of messaging between a source and a destination and of the devices deployed to enable such messaging; thus, the middleware architecture described below covers the networking and computer hardware and software components that enable efficient data messaging. Further, the terms "messaging system" and "middleware system" may be used in the context of a publish/subscribe system in which a messaging server manages message routing between publishers and subscribers. Indeed, the publish/subscribe paradigm in messaging middleware is extensible and is thus a powerful model.
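The publish/subscribe paradigm described above can be sketched with a toy in-process broker; the class, topic, and payload names below are illustrative and not taken from the patent:

```python
from collections import defaultdict

class MiniBroker:
    """Toy publish/subscribe broker: the middleware layer decouples
    publishers from subscribers via named topics."""

    def __init__(self):
        # topic -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Deliver to every registered subscriber; return the delivery count.
        callbacks = self._subscribers.get(topic, [])
        for cb in callbacks:
            cb(topic, payload)
        return len(callbacks)

broker = MiniBroker()
received = []
broker.subscribe("quotes.nyse.ibm", lambda t, p: received.append((t, p)))
delivered = broker.publish("quotes.nyse.ibm", {"bid": 101.2})
```

The publisher never names its subscribers, which is the decoupling property the middleware layer provides.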
The term "client" may be used in the context of a client-server application or the like. In one example, a client is a system or application that registers with a middleware system using an Application Programming Interface (API) to subscribe to information and receive data delivered by the middleware system. The API inside the middleware architecture boundary is a client; and the external client is any publish/subscribe system (or external data destination) that does not use the API, and the message is to be converted by a protocol (to be described later) in order to communicate with it.
The term "external data source" may be used in the context of data distribution and message publish/subscribe systems. In one example, an external data source is considered a system or application, located within or external to an enterprise private network, that publishes messages using one of the commonly used protocols or its own messaging protocol. One example of an external data source is a market data exchange that publishes stock market offers, which are distributed to traders via a middleware system. Another example of an external data source is transactional data. Note that in a typical implementation of the present invention, which will be described in more detail later, the middleware architecture employs its unique native protocol into which data from external data sources is translated once it enters the middleware system domain, thereby avoiding the multi-protocol transformations typical of conventional systems.
The term "external data destination" is also used in the context of data distribution and message publish/subscribe systems. For example, an external data destination is a system or application located within or outside an enterprise private network that subscribes to information routed via a local/global network. One example of an external data destination may be the aforementioned market data exchange that processes a transaction order issued by a trader. Another example of an external data destination is transactional data. Note that in the aforementioned middleware architecture, messages destined for an external data destination are translated from a native protocol to an external protocol associated with the external data destination.
From the description herein, it will be appreciated that the present invention may be implemented in a variety of ways using messaging appliances in a variety of configurations implemented within a middleware architecture as hardware-based solutions. Thus, the description starts with an example of the end-to-end middleware architecture shown in FIG. 1.
This exemplary architecture combines a number of beneficial features, including: common messaging concepts; APIs; fault tolerance; setup and management (P&M); quality-of-service levels (conflated, best-effort, guaranteed-while-connected, guaranteed-while-disconnected, etc.); persistent caching with guaranteed-delivery QoS; management of namespaces and security services; a publish/subscribe ecosystem (core, ingress, and egress components); transport-transparent messaging; neighbor-based messaging (a hybrid model between hub-and-spoke, peer-to-peer, and store-and-forward that can propagate subscriptions to all neighbors using a subscription-based routing protocol); late schema binding; partial publication (publishing only changed information, as opposed to all data); and dynamic allocation of network and system resources when necessary. As will be described later, the publish/subscribe middleware system advantageously incorporates a fault-tolerant design of the middleware architecture. In each publish/subscribe ecosystem, there are one or more (typically two or more) messaging appliances (MAs), where each MA is configured to act as an edge (egress/ingress) MA or a core MA. Note that the core MA portion of the publish/subscribe ecosystem uses the aforementioned native messaging protocol (native to the middleware system), while the ingress and egress portions, the edge MAs, translate into and from the native protocol, respectively.
In addition to publish/subscribe system components, the diagram of FIG. 1 also shows the logical connections and communications between them. As can be seen from the figure, the middleware architecture shown is that of a distributed system. In a system having this architecture, logical communication between two distinct physical components is established using a message stream and an associated message protocol. The message stream contains one of two types of messages: management and data messages. Management messages are used to manage and control the different physical components, manage subscriptions to data, and so on. Data messages are used to transfer data between a source and a destination, and in typical publish/subscribe messaging, there are multiple senders and multiple recipients of data messages.
With the structural configuration and logical communication shown, the distributed messaging system with the publish/subscribe middleware architecture is designed to perform a variety of logical functions. One logical function is message protocol translation, which is advantageously performed at the edge messaging appliance (MA) components. This is because communication within the boundaries of the publish/subscribe middleware system is performed using the native message protocol, independently of the underlying transport logic. This is why this architecture is called a transport-transparent, channel-based messaging architecture.
The second logical function is to route messages from publishers to subscribers. Note that these messages are routed throughout the publish/subscribe network. Thus, the routing function is performed by each MA through which messages propagate, i.e., from the edge MAs 106a-b (or APIs) to the core MAs 108a-c, from one core MA to another, and finally to an edge MA (e.g., 106b) or the APIs 110a-b. The APIs 110a-b communicate with the applications 112₁₋ₙ via an interprocess communication bus (sockets, shared memory, etc.).
A third logical function is to store messages for different types of guaranteed-delivery quality of service, including, for example, guaranteed-while-connected and guaranteed-while-disconnected. This is achieved by adding a store-and-forward function. A fourth function is to deliver these messages to the subscribers. (As shown, the APIs 110a-b deliver messages to the subscribing applications 112₁₋ₙ.)
In this publish/subscribe middleware architecture, system configuration functions as well as other management and system performance monitoring functions are managed by the P&M system. Configuration involves the physical and logical configuration of the publish/subscribe middleware system network and components. Monitoring and reporting involves monitoring the health of all network and system components and automatically reporting the results, either on demand or via logging. The P&M system performs its configuration, monitoring, and reporting functions using management messages. In addition, the P&M system allows a system administrator to define a message namespace associated with each message routed throughout the publish/subscribe system. Thus, the publish/subscribe network may be physically and/or logically divided into namespace-based sub-networks.
The P&M system manages a publish/subscribe middleware system having one or more MAs. These MAs are deployed either as edge MAs or core MAs, depending on their role in the system. Edge MAs are similar in most respects to core MAs, except that they include protocol translation engines that translate messages from external protocols to the native protocol and vice versa. Thus, in general, the boundaries of the publish/subscribe middleware architecture in a messaging system (i.e., the end-to-end publish/subscribe middleware system boundaries) are characterized by edges where the edge MAs 106a-b and the APIs 110a-b reside; within these boundaries are the core MAs 108a-c.
Note that the system architecture is not limited to a particular restricted geographic area, and in fact, the system architecture is designed to transcend regional or national boundaries, even across continents. In this case, an edge MA in one network may communicate with an edge MA in another network that is geographically distant via existing networking infrastructure.
In a typical system, the core MAs 108a-c route messages published inside the publish/subscribe middleware system to the edge MAs or APIs (e.g., APIs 110a-b). In particular, the routing graph in the core MAs is designed for maximum throughput, low latency, and efficient routing. Furthermore, routing between core MAs can be changed dynamically in real time. For a given messaging path through multiple nodes (core MAs), real-time changes in routing are based on one or more metrics, including network utilization, total end-to-end latency, traffic volume, network and/or message delay, loss, and jitter.
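One simple way to realize metric-driven path selection as just described is to score each candidate path by a weighted combination of its measured metrics and pick the lowest cost. The sketch below is illustrative only; the metric names, weights, and path labels are assumptions, not the patent's algorithm:

```python
def path_score(path_metrics, weights):
    """Weighted cost of a candidate route; lower is better."""
    return sum(weights[m] * path_metrics[m] for m in weights)

def select_path(paths, weights):
    """Pick the candidate path with the lowest weighted cost."""
    return min(paths, key=lambda name: path_score(paths[name], weights))

# Hypothetical measurements for two routes between core MAs.
paths = {
    "via_core_A": {"latency_ms": 2.0, "loss_pct": 0.1, "utilization": 0.6},
    "via_core_B": {"latency_ms": 3.5, "loss_pct": 0.0, "utilization": 0.2},
}
weights = {"latency_ms": 1.0, "loss_pct": 10.0, "utilization": 2.0}
best = select_path(paths, weights)
```

Re-evaluating `select_path` as the metrics are refreshed in real time yields the dynamic re-routing behavior the text describes.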
Alternatively, instead of dynamically selecting the best performing path from two or more different paths, the MA may perform multi-path routing based on message replication and thereby send the same message through all paths. All MAs located at the aggregation point of different paths will drop the duplicated messages, forwarding only the first arriving message. This routing method has the advantage that the messaging infrastructure is optimized for low latency; but a disadvantage of this routing is that the infrastructure requires more network bandwidth to carry the replicated traffic.
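The duplicate-dropping behavior at a path-aggregation point can be sketched as follows; identifying duplicates by a per-message id is an assumption for illustration (the patent does not specify the mechanism):

```python
class DuplicateDropper:
    """At a path-aggregation point, forward only the first copy of each
    message (identified here by a message id) and drop later duplicates
    arriving via slower paths."""

    def __init__(self):
        self._seen = set()

    def accept(self, message_id):
        if message_id in self._seen:
            return False          # duplicate from a slower path: drop
        self._seen.add(message_id)
        return True               # first arrival: forward downstream

dropper = DuplicateDropper()
arrivals = ["m1", "m2", "m1", "m3", "m2"]   # same ids arriving via two paths
forwarded = [m for m in arrivals if dropper.accept(m)]
```

This trades the extra bandwidth of replicated traffic for the lowest latency of whichever path delivers first, matching the trade-off stated above.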
The edge MA has the ability to convert any external message protocol of an incoming message to the native message protocol of the middleware system, and to convert the native message protocol to the external protocol of an outbound message. That is, as messages enter the publish/subscribe network domain (ingress), the external protocol is translated into the native (e.g., Tervela™) message protocol; and when messages leave the publish/subscribe network domain (egress), the native protocol is translated into the external protocol. The edge MA also operates to deliver published messages to subscribed external data destinations.
In addition, both the edge and core MAs 106a-b and 108a-c are capable of storing messages before forwarding them. One way in which this functionality may be implemented is by using cache engines (CEs) 118a-b. One or more CEs may be connected to the same MA. In theory, the APIs are not considered to have such store-and-forward capabilities, although in practice an API 110a-b may store messages before delivering them to an application, and may store messages received from an application before delivering them to a core MA, an edge MA, or another API.
When an MA (edge or core) has an active connection to a CE, it forwards all or a subset of the routed messages to the CE, which writes them to a storage area to achieve persistence. These messages are then available for retransmission upon request for a predetermined period of time. Examples of features implemented this way are data replay, partial publishing, and various quality-of-service levels. Partial publication is effective in reducing network and client load because it requires sending only updated information, not all information.
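Partial publication as described above amounts to diffing the current record against the last published image and sending only the changed fields. A minimal sketch (the field names are illustrative):

```python
def partial_update(previous, current):
    """Return only the fields whose values changed since the last
    published image; unchanged fields are omitted from the message."""
    return {k: v for k, v in current.items() if previous.get(k) != v}

# Hypothetical market-data record before and after a tick.
last = {"bid": 101.2, "ask": 101.4, "volume": 5000}
now  = {"bid": 101.3, "ask": 101.4, "volume": 5100}
delta = partial_update(last, now)   # 'ask' is unchanged, so it is not sent
```

A subscriber that missed earlier deltas would need a full image (e.g., replayed from a cache engine) before applying partial updates, which is one reason partial publishing and caching pair naturally.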
To illustrate how a routing graph may implement routing, several examples of publish/subscribe routing paths are shown in fig. 1. In this illustration, the middleware architecture of the publish/subscribe network provides five or more communication paths between publishers and subscribers.
The first communication path links an external data source to an external data destination. Published messages received from the external data sources 114₁₋ₙ are translated into the native (e.g., Tervela™) message protocol and then routed by the edge MA 106a. One route by which native protocol messages may be routed from the edge MA 106a is to the external data destination 116ₙ. This path is referred to as communication path 1a. In this case, the native protocol messages are converted into external protocol messages suitable for the external data destination. Another route by which native protocol messages may be routed from the edge MA 106a is internally, through the core MA 108b. This path is referred to as communication path 1b. Along this path, the core MA 108b routes the native messages back to the edge MA 106a. However, before the edge MA 106a routes the native protocol messages to the external data destination 116₁, it converts them into the external message protocol suitable for the external data destination 116₁. It can be seen that this communication path does not require an API to route messages from publishers to subscribers. Thus, if the publish/subscribe system is used only for external source-to-destination communication, the system need not include an API.
Another communication path, referred to as communication path 2, links the external data source 114ₙ to an application using the API 110b. Published messages received from the external data source are translated into the native message protocol at the edge MA 106a and then routed by the edge MA to the core MA 108a. From the first core MA 108a, the messages are routed through another core MA 108c to the API 110b. From the API, the messages are delivered to the subscribing application (e.g., 112₂). Because the communication path is bidirectional, in another example, messages may follow the reverse path from the subscribing applications 112₁₋ₙ to the external data destination 116ₙ. In each instance, the core MAs receive and route native protocol messages, while the edge MAs receive and route external or native protocol messages (the edge MA translates such external message protocols to/from the native message protocol). Each edge MA can route an ingress message to both the native protocol channel and the external protocol channel simultaneously, regardless of whether the ingress message arrives as a native protocol message or as an external protocol message. As a result, each edge MA can route ingress messages to both external and internal clients simultaneously, with internal clients consuming native protocol messages and external clients consuming external protocol messages. This capability enables the messaging infrastructure to integrate seamlessly and smoothly with legacy applications and systems.
Another communication path, referred to as communication path 3, links two applications, both of which utilize APIs 110 a-b. At least one of these applications publishes a message or subscribes to a message. Delivery of published messages to or from a subscribing application is accomplished using an API located at the edge of the publish/subscribe network. When an application subscribes to a message, one of the core or edge MAs routes the message to the API, which then notifies the subscribing application when data is ready to be delivered to them. Messages published from an application are sent via the API to the core MA 108c to which the API is "registered".
Note that by "registering" (logging in) to an MA, the API becomes logically connected to that MA. The API initiates a connection to the MA by sending a registration ("login" request) message to the MA. After registration, the API may subscribe to a particular topic of interest by sending its subscription message to the MA. Topics are used in publish/subscribe messaging to define shared access domains and targets for messages; thus, subscribing to one or more topics allows receiving and transmitting messages annotated with those topics. The P&M system sends periodic authorization updates to the MAs in the network, and each MA updates its own table accordingly. Thus, if an API is found to be authorized to subscribe to a particular topic (the MA verifies the authorization of the API using the routing authorization table), the MA activates a logical connection to the API. Then, if the API is properly registered with the core MA 108c, the core MA 108c routes the data to the second API 110b as shown. In other examples, the core MA 108b may route messages through one or more additional core MAs (not shown) that route the messages to the API 110b, which then delivers them to the subscribing applications 112₁₋ₙ.
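The registration-then-entitled-subscription flow above can be sketched as follows. This is a simplified model under stated assumptions: the authorization table is represented as a topic-to-API-id map pushed by P&M, and the class and method names are illustrative:

```python
class MessagingAppliance:
    """Sketch of API registration and entitlement-checked subscription:
    the MA consults its routing-authorization table before activating
    a logical connection for a topic."""

    def __init__(self, authorizations):
        # topic -> set of API ids entitled to subscribe (pushed by P&M)
        self._auth = authorizations
        self._registered = set()
        self._subscriptions = {}

    def register(self, api_id):
        # "Login" request: the API becomes logically connected to this MA.
        self._registered.add(api_id)

    def subscribe(self, api_id, topic):
        if api_id not in self._registered:
            return False          # must register ("login") first
        if api_id not in self._auth.get(topic, set()):
            return False          # not in the routing authorization table
        self._subscriptions.setdefault(topic, set()).add(api_id)
        return True               # logical connection activated

ma = MessagingAppliance({"quotes.nyse": {"api-1"}})
ma.register("api-1")
ok = ma.subscribe("api-1", "quotes.nyse")        # entitled
denied = ma.subscribe("api-1", "orders.nyse")    # no entitlement
```

In the real system the authorization table is refreshed by periodic P&M updates rather than fixed at construction time.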
It can be seen that the communication path 3 does not require the presence of the edge MA, since it does not involve any external data message protocol. In one embodiment, which gives an example of the communication paths herein, an enterprise system is configured with a news server that publishes to employees the latest news on a variety of topics. To receive news, employees subscribe to topics of interest to them via a news browser application that utilizes an API.
Note that the middleware architecture allows subscription to one or more topics. Further, by permitting wildcards in topic notation, this architecture allows a group of related topics to be subscribed to with a single subscription request.
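Wildcard subscription can be sketched with a small matcher. The dot-delimited topic syntax and the single-level `*` wildcard below are assumptions for illustration; the patent only states that wildcards are allowed:

```python
def topic_matches(pattern, topic):
    """Match a dot-delimited topic against a pattern where '*' stands
    for exactly one topic level (illustrative wildcard semantics)."""
    p_parts, t_parts = pattern.split("."), topic.split(".")
    if len(p_parts) != len(t_parts):
        return False
    return all(p == "*" or p == t for p, t in zip(p_parts, t_parts))

topics = ["quotes.nyse.ibm", "quotes.nasdaq.msft", "news.nyse.ibm"]
# One subscription request covers every venue publishing IBM quotes.
matches = [t for t in topics if topic_matches("quotes.*.ibm", t)]
```

A single `quotes.*.ibm` subscription thus replaces one request per venue, which is the economy the text describes.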
Yet another communication path, referred to as communication path 4, is one of the multiple paths associated with the P & M systems 102 and 104, each of which links the P & M to one of the MAs in the publish/subscribe network middleware architecture. The messages that go back and forth between the P & M system and each MA are management messages that are used to configure and monitor the MA. In one system configuration, the P & M system communicates directly with the MA. In another system configuration, the P & M system communicates with some MAs through other MAs. In yet another configuration, the P & M system may communicate directly or indirectly with the MA.
In a typical implementation, the middleware architecture may be deployed on a network having switches, routers, and other networking devices, and it employs channel-based messaging capable of communicating over any type of physical medium. One exemplary implementation of such architecture-agnostic, channel-based messaging is an IP-based network. In this environment, all communication between all publish/subscribe physical components is performed over UDP (User Datagram Protocol), and transport reliability is achieved by the message transport layer. Figure 1a illustrates an overlay network in accordance with the present principles.
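Layering reliability above UDP, as described, typically means the message transport layer numbers messages and detects gaps so that missing messages can be retransmitted. The receiver-side sketch below uses a NAK-style gap-tracking scheme; the details are illustrative assumptions, not the patent's protocol:

```python
class ReliableReceiver:
    """Sketch of transport reliability layered above UDP: track sequence
    numbers and record gaps so missing messages can be requested for
    retransmission (NAK-style; details are illustrative)."""

    def __init__(self):
        self._next_seq = 1
        self.delivered = []   # (seq, payload) in arrival order
        self.missing = set()  # sequence numbers awaiting retransmission

    def on_datagram(self, seq, payload):
        if seq > self._next_seq:
            # Datagrams were lost or reordered: note the gap.
            self.missing.update(range(self._next_seq, seq))
        self.missing.discard(seq)       # a retransmission fills its gap
        self.delivered.append((seq, payload))
        self._next_seq = max(self._next_seq, seq + 1)

rx = ReliableReceiver()
for seq, data in [(1, "a"), (2, "b"), (4, "d")]:   # seq 3 lost in transit
    rx.on_datagram(seq, data)
```

After the gap is detected, `rx.missing` is exactly the set the transport layer would NAK back to the sender.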
As shown, overlay communications 1, 2, and 3 may occur between the three core MAs 208a-c via switches 214a-c, routers 216, and subnets 218 a-c. In other words, these communication paths may be built on top of an underlying network comprising networking infrastructure such as subnets, switches and routers, and as noted above, such an architecture may span a large geographic area (different countries or even different continents).
It should be apparent that the foregoing and other end-to-end middleware architectures in accordance with the principles of the present invention may be implemented in a variety of enterprise infrastructures in a variety of business environments. One such implementation is illustrated in fig. 2.
In the enterprise infrastructure, a market data distribution plant 312 is built on top of the publish/subscribe network for communicating data from various market data exchanges 320 1-n to traders (applications not shown). Such an overlay solution relies on the underlying network to provide, for example, interconnection between MAs and between the MAs and the P&M systems. Delivery of data to the APIs 310 1-n is based on application subscriptions. Using this infrastructure, orders placed by traders via applications (not shown) utilizing the APIs 310 1-n travel back through the publish/subscribe network (via the core MAs 308a-b and the edge MA 306b) to the market data exchanges 320 1-n.
One example of an underlying physical deployment is shown in fig. 2a. As shown, the MAs are directly connected to each other and plugged directly into the networks and subnets to which the clients and publishers of messaging traffic are physically connected. In this case, the interconnects are preferably direct connections, i.e., direct links between the MAs and between them and the P&M system. This enables disintermediation of the network backbone and physical separation of messaging traffic from other enterprise application traffic. Effectively, the MAs may be used to remove reliance on traditional routed networks for messaging traffic.
In this example of a physical deployment, an external data source or destination, such as a market data exchange, is directly connected to an edge MA, e.g., edge MA 1. Applications that consume or publish messaging traffic, such as market trading applications, are connected directly to subnets 1-12. These applications have at least two routes for subscribing, publishing, or communicating with other applications: they may utilize either the enterprise backbone network, comprising multiple layers of redundant routers and switches that carry all enterprise application traffic, including but not limited to messaging traffic; or the messaging backbone, comprising edge and core MAs directly interconnected via an integrated switch. Use of the separate messaging backbone has the advantage that messaging traffic is isolated from other enterprise application traffic, allowing the performance of the messaging traffic to be controlled independently. In one implementation, applications residing in subnet 6 are logically or physically connected to core MA 3, subscribing to or publishing native-protocol message traffic using the Tervela API. In another implementation, applications residing in subnet 7 are logically or physically connected to edge MA 1, subscribing to or publishing messaging traffic in an external protocol, where the MA performs protocol translation with an integrated protocol translation engine module.
Logically, the physical components of the publish/subscribe network are built on top of a messaging layer similar to layers 1 to 4 of the Open Systems Interconnection (OSI) reference model. Layers 1 to 4 of the OSI model are the physical layer, the data link layer, the network layer and the transport layer, respectively.
Thus, in one embodiment of the invention, a publish/subscribe network may be deployed directly into an underlying network/fabric by, for example, inserting one or more messaging line cards in all or a subset of the network switches and routers. In another embodiment of the invention, the publish/subscribe network may be efficiently deployed as a mesh overlay network (where all physical components are connected to each other). For example, a full mesh network of 4 MAs is one in which each MA is connected to each of its 3 peer MAs. In a typical implementation, the publish/subscribe network is a mesh network of the following components: one or more external data sources and/or destinations, one or more setup and management (P & M) systems, one or more Messaging Appliances (MAs), one or more optional Caching Engines (CEs), and one or more optional Application Programming Interfaces (APIs).
As will be described in greater detail below, reliability, availability, and consistency are generally necessary in enterprise operations. To this end, a publish/subscribe middleware system may be designed for fault tolerance, with several of its components deployed as fault-tolerant systems. For example, MAs may be deployed as fault-tolerant MA pairs, where the first MA is referred to as the primary MA and the second MA is referred to as the secondary or fault-tolerant MA (FT MA). Also, for store-and-forward operations, a caching engine (CE) may be connected to a primary or secondary core/edge MA. When a primary or secondary MA has an active connection to a CE, it forwards all or a subset of the routed messages to that CE, which writes them to a storage area to achieve persistence. These messages are then available for retransmission upon request within a predetermined time period.
As previously described, communication within the boundaries of each publish/subscribe middleware system uses the native message protocol, independent of the underlying transport logic. This is why the architecture is called a transport-transparent channel-based messaging architecture.
Fig. 3 illustrates the channel-based messaging architecture 320 in more detail. In general, each communication path between a messaging source and destination is defined as a messaging channel. Each channel 326 1-n is established over a physical medium using an interface 328 1-n between a channel source and a channel destination. Each such channel is established for a specific messaging protocol, such as the native (e.g., Tervela™) messaging protocol or an external protocol. Only edge MAs (those MAs that manage the ingress and egress of the publish/subscribe network) utilize external channel message protocols. Based on the channel message protocol, the channel management layer 324 determines whether incoming and outgoing messages require protocol translation. At each edge MA, if the channel message protocol of an incoming message differs from the native protocol, the channel management layer 324 performs protocol translation by sending the message through a Protocol Translation Engine (PTE) 332 before passing it to the native message layer 330. Likewise, at each edge MA, if the native message protocol of an outgoing message differs from the channel message protocol (external message protocol), the channel management layer 324 sends the message through the Protocol Translation Engine (PTE) 332 before routing it to the transport channel 326 1-n, thereby performing protocol translation. Thus, the channel manages the interfaces 328 1-n with the physical media, the particular network and transport logic associated with each physical medium, and the message components or fragments.
In other words, the channel manages the OSI transport layer 322. Optimization of channel resources is performed on a per-channel basis (e.g., message density optimization for the physical medium based on consumption patterns, including bandwidth, message size distribution, channel destination resources, and channel health statistics). Further, because the communication channel is fabric-agnostic, no specific type of fabric is required; virtually any fabric medium will work, such as ATM, Infiniband, or Ethernet.
Incidentally, message segmentation or reassembly may be required when, for example, a single message is divided into multiple frames or multiple messages are packed into a single frame. Message segmentation or reassembly is performed before the message is delivered to the channel management layer.
Fig. 3 further illustrates a number of possible channel implementations in a network with a middleware architecture. In one implementation 340, the communication is performed via a network-based channel using multicast over an ethernet-switched network that serves as the physical medium for such communication. In this implementation, the source sends a message from its IP address via its UDP port to a group of destinations (defined as an IP multicast address) with their associated UDP ports. In a variation 342 of this implementation, the communication between the source and destination is implemented using UDP unicast over an ethernet switched network. The source sends a message from its IP address via its UDP port to a selected destination having a UDP port at its corresponding IP address.
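The UDP multicast and unicast channel variants just described can be sketched in a few lines of Python. This is a hypothetical illustration only; the multicast group address, port, and helper names are assumptions, not part of the described system:

```python
import socket
import struct

MCAST_GROUP = "239.1.2.3"   # hypothetical IP multicast address for a channel
MCAST_PORT = 5000

def open_multicast_send_channel(ttl: int = 1) -> socket.socket:
    """Open a UDP socket for publishing messages to a multicast group of destinations."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Keep datagrams on the local segment unless deliberately routed further.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("b", ttl))
    return sock

def open_unicast_send_channel() -> socket.socket:
    """Open a UDP socket for sending messages to a single selected destination."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_message(sock: socket.socket, payload: bytes, addr) -> int:
    """Send one framed message over the channel; returns the number of bytes sent."""
    return sock.sendto(payload, addr)
```

In the multicast variant, `addr` would be `(MCAST_GROUP, MCAST_PORT)`; in the unicast variant, the destination's own IP address and UDP port.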
In another implementation 344, the channel is established over an Infiniband interconnect using a native Infiniband transport protocol, where the Infiniband fabric is the physical medium. In this implementation, the channel is node-based, and the communication between the source and destination is node-based using their respective node addresses. In yet another implementation 346, the channel is memory based, such as RDMA (remote direct memory access), and is referred to herein as a Direct Connection (DC). With this type of channel, messages are sent directly from the source machine to the memory of the destination machine, bypassing CPU processing to handle messages from the NIC to the application memory space, and possibly avoiding the network overhead of encapsulating messages into network packets.
As for the native protocol, one approach utilizes the aforementioned native Tervela™ message protocol. Conceptually, the Tervela™ message protocol is similar to an IP-based protocol. Each message contains a message header and a message payload. The message header contains a plurality of fields, one of which carries topic information. As described above, topics are used by consumers to subscribe to a shared information domain.
Fig. 4 illustrates one possible topic-based message format. As shown, the message includes a header 370 and a body 372 or 374 that includes the payload. Two types of messages, namely data and management messages, are shown; the two types have different message bodies and payload types. The header includes fields for source and destination namespace identifications, source and destination session identifications, a topic sequence number, and one or more timestamps; additionally, it includes a topic notation field, which is preferably of variable length. A topic can be defined as a tag-based string, e.g., NYSE.RTF.IBM 376, which is the topic string for messages carrying IBM stock real-time quotes.
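A hedged sketch of such a header layout follows. The field widths and helper names are assumptions for illustration; the actual Tervela™ wire format is not specified here:

```python
import struct
import time

# Fixed-length portion: source/destination namespace IDs, source/destination
# session IDs, topic sequence number, timestamp, and topic-string length.
# Field sizes are illustrative assumptions, not the real wire format.
_FIXED = struct.Struct("!HHIIQdH")

def pack_message(src_ns, dst_ns, src_sess, dst_sess, seqno,
                 topic, payload, timestamp=None):
    """Serialize a header plus variable-length topic notation and payload."""
    topic_bytes = topic.encode("utf-8")
    header = _FIXED.pack(src_ns, dst_ns, src_sess, dst_sess, seqno,
                         time.time() if timestamp is None else timestamp,
                         len(topic_bytes))
    return header + topic_bytes + payload

def unpack_message(frame):
    """Parse a frame produced by pack_message back into its fields."""
    src_ns, dst_ns, src_sess, dst_sess, seqno, ts, tlen = \
        _FIXED.unpack_from(frame, 0)
    off = _FIXED.size
    return {"src_ns": src_ns, "dst_ns": dst_ns,
            "src_sess": src_sess, "dst_sess": dst_sess,
            "seqno": seqno, "timestamp": ts,
            "topic": frame[off:off + tlen].decode("utf-8"),
            "payload": frame[off + tlen:]}
```

The variable-length topic field is carried after the fixed header, prefixed by its length, which is one common way to encode a "preferably variable length" field.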
In some implementations, topic information in a message may be encoded or mapped to a key, which may be one or more integer values. Each topic would then be mapped to a unique key and a mapping database between topics and keys would be maintained by the P & M system and updated to all MAs over the wire. As a result, the MA is able to return the associated unique key for the topic field of the message when the API subscribes to or publishes a topic.
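A minimal sketch of such a topic-to-key mapping follows. The class and method names are hypothetical; in the described system, the mapping database is maintained by the P&M system and distributed to all MAs over the wire:

```python
class TopicKeyMap:
    """Bidirectional mapping between topic strings and compact integer keys."""

    def __init__(self):
        self._topic_to_key = {}
        self._key_to_topic = {}
        self._next_key = 1

    def key_for(self, topic: str) -> int:
        """Return the unique key for a topic, assigning one on first sight."""
        if topic not in self._topic_to_key:
            key = self._next_key
            self._next_key += 1
            self._topic_to_key[topic] = key
            self._key_to_topic[key] = topic
        return self._topic_to_key[topic]

    def topic_for(self, key: int) -> str:
        """Reverse lookup: recover the topic string for a key."""
        return self._key_to_topic[key]
```

Routing on fixed-width integer keys instead of variable-length strings is what makes this encoding attractive for lookup speed.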
Preferably, the subscription format will follow the same format as the message topic. However, the subscription format also supports wildcards matching any topic substring and regular expression patterns matching the topic substring. The mapping of wildcards to actual topics can be P & M system dependent or handled by the MA according to the complexity of the wildcard or pattern matching request.
For example, pattern matching may follow rules such as the following:
Example #1: a string with the wildcard t1.*.t3.t4 will match t1.t2a.t3.t4 and t1.t2b.t3.t4, but not t1.t2.t3.t4.t5
Example #2: a string with the wildcards t1.*.t3.t4.* will not match t1.t2a.t3.t4 or t1.t2b.t3.t4, but will match t1.t2.t3.t4.t5
Example #3: a string with the wildcards t1.*.t3.t4.[*] (the fifth element is optional) will match t1.t2a.t3.t4, t1.t2b.t3.t4, and t1.t2.t3.t4.t5, but not t1.t2.t3.t4.t5.t6
Example #4: a string with the wildcard t1.t2*.t3.t4 will match t1.t2a.t3.t4 and t1.t2b.t3.t4, but not t1.t5a.t3.t4
Example #5: a string with the wildcard t1.*.t3.t4.> (any number of trailing elements) will match t1.t2a.t3.t4, t1.t2b.t3.t4, t1.t2.t3.t4.t5, and t1.t2.t3.t4.t5.t6
Fig. 5 illustrates topic-based message routing. As shown, a topic may be defined as a tag-based string, e.g., T1.T2.T3.T4, where T1, T2, T3, and T4 are variable-length strings. As can be seen, incoming messages with a particular topic notation 400 are selectively routed to communication channels 404, with routing determinations made based on a routing table 402. The mapping of topic subscriptions to channels defines the routes and is used to deliver messages throughout the publish/subscribe network. The superset of all these routes, or mappings between subscriptions and channels, defines the routing table, which is also referred to as the subscription table. The subscription table for routing with string-based topics can be constructed in a number of ways, but is preferably configured to optimize its size and routing lookup speed. In one implementation, the subscription table may be defined as a dynamic hash map structure, while in another implementation it may be arranged in a tree structure, as shown in fig. 5.
The tree includes nodes (e.g., T1, ..., T10) connected by edges, where each substring of a topic subscription corresponds to a node in the tree. The channels mapped to a given subscription are stored on the subscription's leaf node, each leaf node indicating the list of channels from which the topic subscription came (i.e., through which subscription requests were received). The list indicates which channels should receive a copy of any message whose topic notation matches the subscription. As shown, the message routing lookup takes a message topic as input and then parses the tree with each substring of that topic to locate the different channels associated with the incoming message topic. For example, T1.T2.T3.T4.T5 is directed to channels 1, 2, and 3; T1.T2.T3 is directed to channel 4; T1.T6.T7.T*.T9 is directed to channels 4 and 5; T1.T6.T7.T8.T9 is directed to channel 1; and T1.T6.T7.T*.T10 is directed to channel 5.
Although the routing table structure is chosen to optimize routing-table lookups, lookup performance also depends on the search algorithm used to find the one or more topic subscriptions that match an incoming message topic. Therefore, the routing table structure should be able to accommodate such an algorithm, and vice versa. One way to reduce the size of the routing table is to allow the routing algorithm to selectively propagate subscriptions throughout the publish/subscribe network. For example, if a subscription appears to be a subset of another subscription that has already been propagated (e.g., a portion of the entire string), there is no need to propagate the subset subscription, because the MA already has information for a superset of that subscription.
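A simplified sketch of such a tree-structured subscription table follows. The class and method names are hypothetical, and the "[*]"/">" subscription forms and subset suppression are omitted for brevity; only exact substrings and the single-element "*" wildcard node are modeled:

```python
class SubscriptionTable:
    """Tree-structured routing (subscription) table mapping topics to channels."""

    def __init__(self):
        self._root = {"children": {}, "channels": []}

    def subscribe(self, subscription: str, channel: str) -> None:
        """Walk/build one node per subscription substring; store the channel at the leaf."""
        node = self._root
        for part in subscription.split("."):
            node = node["children"].setdefault(
                part, {"children": {}, "channels": []})
        if channel not in node["channels"]:
            node["channels"].append(channel)

    def lookup(self, topic: str):
        """Return every channel whose subscription matches the incoming topic."""
        matched = []

        def walk(node, parts):
            if not parts:
                matched.extend(node["channels"])
                return
            head, rest = parts[0], parts[1:]
            for key in (head, "*"):          # exact child, then wildcard child
                child = node["children"].get(key)
                if child is not None:
                    walk(child, rest)

        walk(self._root, topic.split("."))
        return list(dict.fromkeys(matched))  # de-duplicate, preserve order
```

The lookup parses the tree with each substring of the topic, exactly as described for fig. 5.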
Based on the foregoing, the preferred message routing protocol is a topic-based routing protocol in which authorizations are indicated in a mapping between subscribers and corresponding topics. An authorization is specified for each subscriber or group/category of subscribers, indicating which messages a subscriber has the right to consume or which messages a publisher can generate (publish). These authorizations are defined in the P&M system, transmitted to all MAs in the publish/subscribe network, and then used by the MAs to create and update their routing tables.
Each MA updates its routing table by keeping track of who has subscribed (requested a subscription) to which messages. However, before adding a route to its routing table, the MA must check the subscription against the publish/subscribe network's authorizations. The MA verifies that the subscribing entity, whether a neighboring MA, P&M system, CE, or API, is authorized to subscribe. If the subscription is valid, a route is created and added to the routing table. Further, because some authorizations may be known in advance, the system may be deployed with predefined authorizations, and these authorizations may be loaded at boot time. For example, some specific management messages, such as configuration updates, may always be forwarded throughout the network, and the corresponding authorizations are thus automatically loaded at startup.
Given the above description of a messaging system having a publish/subscribe middleware architecture, it can be appreciated that Messaging Appliances (MAs) play an important role in such a system. Thus, the details of a hardware-based Messaging Appliance (MA) configured in accordance with the principles of the present invention will now be described. In one implementation of the invention, the MA is a standalone device. In another implementation, the MA is an embedded component (e.g., a line card) within a network physical component such as a router or switch. Figs. 6a, 6b, 6c, and 6d are block diagrams illustrating hardware-based MAs in various degrees of detail. Fig. 6e shows the MA from a functional perspective.
Generally, the architecture of the MA is established based on a high-speed interconnect bus to which various hardware modules are connected. Fig. 6a and 6b show the basic architecture of the edge and core MAs 106 and 108, respectively, with a high speed interconnect bus 508 interconnecting the various hardware modules 502, 504, and 506. The edge MA (106 in fig. 6 a) is shown configured with a Protocol Translation Engine (PTE) module 510, while the core MA (108 in fig. 6b) is shown unconfigured with a PTE module. As further shown, in one embodiment, the high speed interconnect bus is constructed as a PCI/PCI-X bus tree in which the hardware modules are PCI/PCI-X peripheral devices. PCI (peripheral component interconnect) is generally known as an interconnect system for high-speed operation of computers. PCI-X (extended peripheral component interconnect) is a computer bus technology (the "data pipe" between computer components) used for higher speed computer operations. In alternative embodiments, the high-speed interconnect bus is constructed as an Infiniband or direct memory connection medium. In yet another embodiment, the hardware module is a blade (blade) connected via a switched fabric backplane (e.g., advanced telecommunications computing architecture, ATCA).
The various hardware modules of each MA may be divided into essentially three groups: a control plane module group 502, a data plane module group 504, and a service plane module group 506. The control plane module group handles MA management functions, including configuration and monitoring. Examples of MA management functions include configuring network management services, configuring the hardware modules connected to the high-speed interconnect bus, and monitoring these hardware modules. The data plane module group handles data message routing and message forwarding functions. This module group handles the messages transmitted by the publish/subscribe middleware system as well as management messages, although management messages may also be delivered to the control plane module group. The service plane module group handles other local services that can be used seamlessly by the control and data plane modules. In one embodiment, one such local service is a time synchronization service for latency measurement, implemented using a GPS card or any externally synchronized device capable of periodically receiving millisecond-grained signals. These three module groups are described further below in connection with figs. 6a, 6b, and 6c.
The control plane module group 502 includes a management module 512. Generally, the management module incorporates one or more CPUs running an operating system (OS), such as Linux, Solaris, Windows, or any other OS. Alternatively, the management module incorporates one or more CPUs in a blade (server) installed in a high-speed interconnect backplane. In yet another configuration, the management module incorporates one or more CPUs running in a high-performance rack-mount host server.
In addition, the management module 512 includes one or more logical configuration paths. The first configuration path is established through a serial interface or network connection using a command line interface (CLI), through which a system administrator may enter configuration commands. A logical configuration path through the CLI is typically used to provide the MA with initial configuration information allowing it to establish connectivity with the P&M system. This initial configuration provides, for example but not limited to, the following information: the local management IP address, the default gateway, and the IP address of the P&M system to which the MA is connected. All or a subset of this configuration may be used to initialize various hardware components in the MA as part of the boot process.
The second configuration path is established using management messages routed through the publish/subscribe middleware system. Once a MA has connectivity to one or more P & M systems, it will register with at least one P & M system and retrieve its configuration. The configuration is sent to the MA via a management message delivered locally at the management module 512.
The MA configuration information retrieved from the P & M system contains parameters, addresses, etc. Examples of MA configuration information may include Syslog configuration parameters, Network Time Protocol (NTP) configuration parameters, Domain Name Server (DNS) information, remote access policies via SSH/Telnet and/or HTTP/HTTPs, authentication methods (Radius/Tacacs), publish/subscribe authorization, routing information indicating connectivity to neighboring MAs or APIs, and so forth.
The entire MA configuration may be cached at the management module into one or a combination of memory resources associated with the management module. The MA configuration may be cached, for example, in a memory space at the management module, in a volatile storage area (e.g., a RAM disk for booting a file system), in a non-volatile storage area (e.g., a memory flash card or hard drive), or in any combination thereof. If it is still present after reboot, this cached configuration may be loaded by the MA at startup.
In one implementation, the cached configuration also contains a configuration identifier (ID) provided by the P&M system. This configuration ID may be used for comparison, where the MA configuration ID cached locally on the MA is compared to the current MA configuration ID on the P&M system. If the configuration IDs on both the MA and the P&M system are the same, the MA may bypass the configuration transfer phase and adopt the locally cached configuration. Additionally, in the event that the P&M system is unreachable, the MA may revert to the last known configuration rather than starting up without any configuration.
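The configuration-ID decision logic can be sketched as follows. The function name and the two callables are assumptions standing in for management-message exchanges with the P&M system:

```python
def load_configuration(cached, fetch_current_id, fetch_config):
    """Decide which MA configuration to use at startup.

    `cached` is a (config_id, config) tuple from local storage, or None.
    `fetch_current_id` and `fetch_config` stand in for P&M exchanges and
    may raise ConnectionError when the P&M system is unreachable.
    """
    try:
        current_id = fetch_current_id()
    except ConnectionError:
        # P&M unreachable: revert to the last known configuration if any.
        if cached is not None:
            return cached[1]
        raise
    if cached is not None and cached[0] == current_id:
        # IDs match: bypass the configuration transfer phase entirely.
        return cached[1]
    return fetch_config()
```

The comparison saves a full configuration transfer whenever the cached copy is already current.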
Once the MA is up and running, the control plane module group (management module 512) monitors the health of, and any status change flags (status change events) associated with, the various logical components within the MA's hardware modules. For example, state change events may indicate API registration or MA registration, or they may be subscribe/unsubscribe events. These and other state change events are generated and may be stored locally at the MA for a certain time. The MA reports these events to the system monitoring tools.
The MA may be monitored remotely using Simple Network Management Protocol (SNMP) or by a P & M real-time monitoring and/or historical trend UI (user interface) module that tracks raw statistics flowing from the MA to the P & M. Such raw statistics may be batched according to a time period to reduce the amount of monitoring traffic being generated. Alternatively, such raw statistics may be aggregated and processed (e.g., by calculation) by time period.
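The batching of raw statistics by time period might be sketched as follows. This is illustrative only (function name and sample format are assumptions); a real MA could also aggregate rates, minima, and maxima per period:

```python
from collections import defaultdict

def batch_statistics(samples, period):
    """Aggregate raw (timestamp, value) counter samples into per-period buckets.

    Values whose timestamps fall in the same `period`-second window are
    summed, reducing the volume of monitoring traffic sent to the P&M.
    """
    buckets = defaultdict(int)
    for ts, value in samples:
        buckets[ts - ts % period] += value   # bucket start time as the key
    return dict(sorted(buckets.items()))
```

Each bucket then travels as a single management message instead of one message per raw sample.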
The control plane module group of the MA is also responsible for loading new or previous firmware versions onto specific hardware modules. In one example, the firmware image is made available to the MA during scheduled maintenance windows. In these maintenance windows, the new firmware image is first downloaded from the P&M system to the MA. Upon receiving and validating the firmware image, the MA uploads the image to the target hardware module. After the update is complete, the hardware module may have to be rebooted for the upgrade to take effect. There are a number of ways to validate software images, one involving embedded signatures. For example, the MA checks whether the image was provided by the system provider or an authorized licensee or group thereof (e.g., Tervela or any licensee of Tervela™ technology).
Preferably, system management message traffic is routed through a dedicated physical interface. This approach allows different virtual LANs (VLANs) to be created for management and data message traffic. It may be accomplished by dedicating the switch port connected to a particular physical interface, and thus the interface itself, to a VLAN carrying all system management message traffic. All or a subset of the remaining physical interfaces are then dedicated to VLANs for data messages. By differentiating and segregating the different types of traffic within the infrastructure, the performance of each type of message traffic can be managed independently.
Another function of the control plane module group is monitoring the status of the subscription table and statistics on the messaging channels between the MA and the APIs. Based on this information, a Protocol Optimization Service (POS) in the MA decides whether to switch, for example, from a unicast channel to a multicast channel, or vice versa. Similarly, when a slower consumer is found, the POS may decide to move that slower consumer from the multicast channel to a unicast channel to preserve the operational integrity of the multicast channel.
The aforementioned set of data plane modules (504, in figs. 6a and 6b) includes one or more physical interface cards (PICs; 514, in figs. 6a-c), such as Fast Ethernet, Gigabit Ethernet, 10 Gigabit Ethernet, gigabit-speed memory interconnect, and the like. These data plane PICs are logically controlled by one or more Message Processor Units (MPUs). The MPU is implemented as a network processor unit 516, a MIPS-based network processing card, a custom ASIC, or an embedded solution on any platform.
The PIC 514 processes frames containing one or more messages. Frames enter the MA through an ingress PIC, which contains one or more chipsets to control media-specific processing. In one configuration, the PIC is also responsible for OSI layer four termination, which corresponds to channel-transport-specific termination, e.g., TCP or UDP termination. As a result, the data forwarded from the PIC to the MPU may contain only the message stream from the incoming frames. In another configuration, the PIC sends the network packets to a channel engine 520 running on the MPU. The channel engine performs OSI layer three and layer four processing before handing over the messages contained in the network packets.
In yet another configuration, PIC 514 is a memory interconnect interface that forwards messages to channel engine 520 using a channel-specific transport protocol. Additionally, in this case, the channel engine would have a channel-specific processing adapter to parse and extract messages from the incoming data.
In another configuration, the PIC may have a dedicated chipset and on-board memory for performing fast forwarding of message frames, as opposed to passing these frames to the MPU to be routed by the message routing engine 518. To implement this fast-forwarding approach, the global routing (subscription) table is distributed, in whole or preferably in part, from the MPU to a forwarding cache in the PIC. With such a routing table in its forwarding cache, the ingress PIC can examine incoming message frames to identify one or more topics, or any subset of topics among them, and forward the frames directly to the egress PIC based on those topics. Note that if the subscription table distributed to the PIC's forwarding cache represents only a subset of the global subscription table, faster routing lookups, and as a result faster message forwarding, can be obtained.
The aforementioned MPU 516, with its channel engine 520, is responsible for managing the communication interface between the PIC and the message routing engine 518. The MPU also maintains the subscription table and, with its message routing engine 518, matches incoming messages to subscriptions and channels. These functions may be implemented in a number of ways: in one, they are configured to run on different microengines or microchips; in another, they run on separate CPU cores, each core employing a standard or custom network stack. In yet another implementation, these functions run in a multi-core CPU on top of a real-time OS.
The preferred MPU also has an embedded media switch fabric 522. Because the messaging channel is fabric-agnostic, the MPU can interface with any type of physical medium 524. Messages forwarded from the PIC, and optionally messages forwarded from the media switch fabric, are received by the channel engine 520 and then forwarded to the message routing engine 518.
The channel engine 520 manages the message delivery channel queues. Fig. 6d shows message queuing with interim message cache 524 and message forwarding with channel engine 520.
At the receiving side, messages are removed from the channel queues. In some instances, messaging channels may have a special priority, which is useful when more than one channel has pending messages. For example, message retransmission requests should be forwarded first; thus, it should be understood that separate channels are created for retransmission requests. Delaying retransmission requests may result in more retransmission requests; this is typically the case when broadcast/multicast storms occur.
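The retransmission-first channel servicing just described can be sketched with a small priority queue. The class name and the two priority levels are assumptions for illustration:

```python
import heapq

class ChannelScheduler:
    """Serve pending channels by priority; retransmission channels come first."""

    RETRANSMIT, NORMAL = 0, 1   # lower value is served earlier

    def __init__(self):
        self._pending = []
        self._seq = 0           # FIFO tie-break among equal priorities

    def mark_pending(self, channel, priority=NORMAL):
        """Flag a channel as having queued messages awaiting service."""
        heapq.heappush(self._pending, (priority, self._seq, channel))
        self._seq += 1

    def next_channel(self):
        """Pop the highest-priority pending channel, or None if idle."""
        if not self._pending:
            return None
        return heapq.heappop(self._pending)[2]
```

With this ordering, a retransmission channel marked after two data channels is still served first, which is the behavior the text motivates.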
For the edge MA 106, a protocol switch 526 in the channel engine 520 checks whether a message requires protocol translation. If translation is necessary, the message is sent to the protocol translation engine 510. Once the message has been converted by the protocol translation engine to the native protocol (e.g., Tervela™ protocol) format, it is forwarded to the caching component 528. The caching component places the message in an interim message cache 524, where the message is temporarily available for retransmission. After its time period has elapsed, the message is removed or overwritten by another message. In one configuration, the interim message cache is implemented as a simple memory ring buffer shared with the message routing engine 518. Preferably, the interim message cache lookup is optimized to speed up the retransmission process by, for example, maintaining a mapping of message sequence numbers to the actual messages in the buffer. The message routing engine 518 fetches a message from the interim message cache 524, performs a subscription lookup, and then returns a list of channels that should each forward a copy of the message.
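A ring-buffer interim cache with a sequence-number index might be sketched as follows. The names are hypothetical, and time-based expiry is omitted; here old entries are simply overwritten as the ring wraps:

```python
class InterimMessageCache:
    """Fixed-size ring buffer of recent messages, indexed by sequence number.

    The seqno-to-slot map speeds up retransmission lookups; a slot being
    overwritten evicts the stale sequence number from the index.
    """

    def __init__(self, capacity: int):
        self._ring = [None] * capacity   # each slot holds (seqno, message)
        self._index = {}                 # sequence number -> ring slot
        self._head = 0

    def put(self, seqno, message):
        old = self._ring[self._head]
        if old is not None:
            self._index.pop(old[0], None)    # slot is being overwritten
        self._ring[self._head] = (seqno, message)
        self._index[seqno] = self._head
        self._head = (self._head + 1) % len(self._ring)

    def get(self, seqno):
        """Fetch a cached message for retransmission, or None if aged out."""
        slot = self._index.get(seqno)
        return None if slot is None else self._ring[slot][1]
```

The dictionary lookup replaces a linear scan of the ring, which is the optimization the text describes.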
Some management messages may have to be delivered locally to the management module 512 over the shared bus 508 (figs. 6a, 6b and 6c). Locally delivered messages may also be forwarded throughout the publish/subscribe middleware system. In one implementation, message routing engine 518 pushes a copy of the message into the queue of each channel. In another implementation, message routing engine 518 simply queues a reference to the message, or a pointer to it, while the message itself remains in the interim message cache. This approach has the advantage of optimizing memory utilization on the MPU, since more than one queue may reference the same message. In addition, message routing engine 518 appends a reference (e.g., a pointer) to the message in subscription message queue 532, where the subscription queues of subscriptions S1 and S2 point to the message in interim message cache 524.
Each queue then maintains a list of references to all subscriptions associated with it. This approach has the advantage of enabling subscription-level message processing rather than just channel-level message processing. In effect, these subscription queues provide a way to index messages on a per-subscription as well as a per-channel basis, which shortens the lookup time when a message needs to be processed for a given subscription. For example, in one embodiment, real-time merge logic is applied on a per-subscription basis. This also allows the MPU to perform incremental calculations, such as a Volume Weighted Average Price (VWAP) calculation for stock market quote messages.
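The incremental VWAP calculation mentioned above can be illustrated with a short sketch. The update rule (cumulative price times volume, divided by cumulative volume) is the standard VWAP definition; the class name is illustrative:

```python
class VWAP:
    """Incremental volume-weighted average price, updated per quote
    message without re-scanning history."""
    def __init__(self):
        self.pv = 0.0     # cumulative price * volume
        self.vol = 0.0    # cumulative volume

    def update(self, price, volume):
        self.pv += price * volume
        self.vol += volume
        return self.pv / self.vol

v = VWAP()
v.update(10.0, 100)                 # first trade: VWAP = 10.0
assert v.update(12.0, 300) == 11.5  # (10*100 + 12*300) / 400
```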
At the sender, the message routing engine 518 marks or flags the channels with pending queued messages. This lets the channel scheduler 530 know which channel or channels require attention or have a special priority. Channel priority may be leveraged to provide quality of service (QoS) functionality. For example, the QoS function may be implemented based on a message header field alone, or based on a combination of a message header field and the message topic. At this point, the message routing engine 518 moves on to the next message in the interim message cache ring buffer.
Channel scheduler 530 forwards pending messages across all channels having queued messages, using a channel-specific communication policy. The policy determines which transport protocol to use: unicast, multicast, or otherwise. The communication policy may be negotiated when the channel is created, or it may be updated in real time based on resource utilization patterns, such as network bandwidth utilization, message and packet delay, jitter, loss, and the like. The channel-specific communication policy may be further based on packet flow control parameters negotiated with one or more channel destinations (e.g., neighboring MAs or APIs). For example, rather than sending all messages, one out of every N messages may be dropped. Thus, one aspect associated with this policy is message flow control.
Fig. 7 shows a real-time Message Flow Control (MFC) algorithm. According to the algorithm, the size of the channel queue operates as a threshold parameter. Messages delivered over a particular channel accumulate in its channel queue at the receiving device; if the channel cannot keep up with the incoming message flow, the queue grows and its size may approach a high threshold that it cannot safely exceed. As the channel nears this maximum capacity, the receiving messaging appliance activates the MFC before the channel queue overflows. The MFC is turned off when the queue shrinks and its size falls below a low threshold. The difference between the high and low thresholds is set large enough to produce hysteresis, in which the MFC is turned on at a higher queue-size value than that at which it is turned off. This threshold difference avoids the frequent on-off oscillations in message flow control that could otherwise occur when the queue size fluctuates around the high threshold. Thus, to avoid queue overrun at the message delivery recipient, the rate of incoming messages is constrained by the real-time dynamic MFC, which keeps the rate below the maximum channel capacity.
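The hysteresis behavior of the MFC can be sketched as a small state machine. The threshold values and names below are illustrative assumptions, not taken from fig. 7:

```python
class FlowControl:
    """Hysteresis-based message flow control: turn ON when the queue
    reaches the high threshold, turn OFF only once it shrinks below the
    low threshold. The gap between thresholds prevents on/off
    oscillation around a single threshold."""
    def __init__(self, high, low):
        assert low < high
        self.high, self.low = high, low
        self.active = False

    def observe(self, queue_size):
        if not self.active and queue_size >= self.high:
            self.active = True
        elif self.active and queue_size < self.low:
            self.active = False
        return self.active

mfc = FlowControl(high=80, low=40)
assert mfc.observe(79) is False
assert mfc.observe(80) is True    # crossed high threshold -> MFC on
assert mfc.observe(60) is True    # between thresholds: stays on
assert mfc.observe(39) is False   # fell below low threshold -> MFC off
```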
As an alternative to the hysteresis-based MFC algorithm, in which messages are discarded when the channel queue approaches its capacity, the real-time dynamic MFC can operate to conflate the data or apply some merging algorithm to the channel queue. However, because such an operation may require additional message transformation, it may be relegated to the slower forwarding path rather than the faster forwarding path, thereby confining the negative impact of message transformation on message delivery throughput. This additional message transformation is performed by a processor similar to the protocol translation engine. Examples of such processors include NPUs (network processing units), semantic processors, separate microengines on the MPU, and so forth.
For greater efficiency, real-time merging or subscription-level message processing may be distributed between the sender and the recipient. For example, in a situation where only one subscriber requests subscription-level message processing, it makes sense to push the processing downstream to the receiver side rather than perform it on the sender side. However, if more than one data client requests the same subscription-level message processing, it makes sense to perform it once, upstream on the sender side. The goal of distributing the workload between the sender and receiver sides of the channel is to make optimal use of the available combined processing resources.
The transmission channel itself handles transmission-specific processing, which on the receiving side is most likely performed on the MPU or on a PIC with a system-on-chip. When the channel packs multiple messages into a single frame, it can keep message latency below the maximum acceptable latency while relieving pressure on the recipient by freeing up some of its processing resources. It is often more efficient to receive a small number of large frames than to process many smaller frames. This is especially true for APIs, which may run on a conventional OS using general-purpose computer hardware components (including CPUs, memory, and NICs). In general, the NIC generates an OS interrupt for each received frame, which reduces the application-level processing time the API can use to deliver messages to the subscribing application.
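Message-to-frame packing can be illustrated as follows. This sketch batches by message count only; as noted in the comment, a real channel would also flush a partial frame once the oldest queued message nears the maximum acceptable latency. The function name and parameters are assumptions:

```python
def pack_frames(messages, max_per_frame):
    """Pack messages into frames so the receiver handles fewer, larger
    frames (and thus fewer NIC interrupts). A real channel would also
    flush a partial frame when the oldest queued message approaches the
    maximum acceptable latency; that timer is omitted here for brevity."""
    frames = []
    for i in range(0, len(messages), max_per_frame):
        frames.append(messages[i:i + max_per_frame])
    return frames

frames = pack_frames(["m1", "m2", "m3", "m4", "m5"], max_per_frame=2)
assert frames == [["m1", "m2"], ["m3", "m4"], ["m5"]]
```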
As described above, only the edge MA has a Protocol Translation Engine (PTE). In an edge MA, the data plane modules can forward incoming messages to the PTE (510 in figs. 6a, 6c and 6d). Such forwarding decisions are made at MPU 516 by a protocol switch 526 running as part of channel engine 520. When the incoming or outgoing message protocol differs from the native message protocol, the message is forwarded to the PTE.
The PTE may be implemented in a variety of ways using hardware and software in any combination, including, for example, a semantic processor, an FPGA, an NPU, or an embedded software module executing on a real-time or embedded OS running on a network-oriented system-on-chip or a MIPS-based processor. As shown in the example of fig. 6c, the PTE has pipelined, task-oriented microengines, including message parsing, message rule lookup, message rule application, and message format engines. The architectural constraint in building such a hardware module is to keep message transformation latency low while allowing multiple complex syntax transformations between protocols. Another constraint is to make firmware updates of the protocol translation syntax very flexible and independent of the underlying hardware.
First in the pipeline, the message parsing engine 540 dequeues a message from the PTE ingress queue 548, then parses, identifies, and tags the message. The message parsing engine then forwards the results to message rule lookup engine 542. The message rule lookup engine performs a rule lookup based on the message content, retrieving the matching rule that needs to be applied. The message content and matching rule are then passed to the message rule application engine 544. The rule application engine transforms the tags of the message according to the matching rule, and the resulting tagged message is then forwarded to the message format engine 546. The message format engine reconstructs the message body and header according to the native or external message protocol and then places the message in the PTE egress queue 550. The processed (translated) message is carried back over the shared bus 508 to the channel engine 520.
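The four pipeline stages can be modeled as a chain of functions. The stage names follow the text, while the toy key-value input format and the single translation rule are purely illustrative assumptions (tags 55 and 44 are borrowed from FIX-style quote messages for flavor only):

```python
# Hedged sketch of the four pipelined PTE stages as plain functions.

def parse(raw):
    """Message parsing engine: split a toy key=value message into fields."""
    return dict(f.split("=") for f in raw.split(";"))

# Assumed, illustrative rule table: one rule per external protocol name.
RULES = {"FIX": lambda m: {"topic": m["55"], "price": m["44"]}}

def lookup_rule(fields):
    """Message rule lookup engine: pick the rule matching the content."""
    return RULES[fields.pop("proto")]

def apply_rule(fields, rule):
    """Message rule application engine: transform the message tags."""
    return rule(fields)

def format_msg(tagged):
    """Message format engine: rebuild the message in the target format."""
    return ";".join(f"{k}={v}" for k, v in sorted(tagged.items()))

def translate(raw):
    fields = parse(raw)
    rule = lookup_rule(fields)
    return format_msg(apply_rule(fields, rule))

assert translate("proto=FIX;55=IBM;44=92.5") == "price=92.5;topic=IBM"
```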
As shown in figs. 6a and 6b, the various hardware modules of each MA may be divided essentially into three groups, in which the above-described control plane module group 502 and data plane module group 504 interface with, and utilize services provided by, the service plane module group 506. In other words, the service plane module group provides services to both the control plane and data plane module groups. An example of a service module is an external time source, such as a GPS (Global Positioning System) card. Such a service module may be used by any other hardware module to obtain an accurate timestamp. For example, each frame or message routed through the data plane may be timestamped as it enters and/or exits the MA. This embedded timestamp information can later be used to perform latency measurements.
External latency calculation, for example, involves correlating the timestamp embedded in the data stream with the timestamp measured when the frame entered the MA. By tracking this external latency over time, the MA is able to establish a latency trend, detect any drift in the external latency, and embed this information back into the data stream. The latency drift can then be exploited by downstream nodes in the messaging path, or by subscribing applications, to make business-level decisions and gain competitive advantage.
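One simple way to establish a latency trend from these timestamps is sketched below, under the assumption that drift is estimated as the least-squares slope of per-message external latency; the patent does not specify an estimator, so this choice and all names are illustrative:

```python
def external_latency(embedded_ts, ingress_ts):
    """Latency a message accumulated before reaching this MA: the
    difference between its MA ingress time and its embedded timestamp."""
    return ingress_ts - embedded_ts

def drift(latencies):
    """Naive drift estimate: least-squares slope of latency over
    message index (one possible way to establish a latency trend)."""
    n = len(latencies)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(latencies) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, latencies))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Latencies grow by 1 time unit per message: [5, 6, 7, 8] -> slope 1.0
lat = [external_latency(t, t + 5 + i) for i, t in enumerate(range(4))]
assert drift(lat) == 1.0
```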
To track latency and other messaging system statistics, the MA has one or more storage devices. These storage devices hold temporary data such as statistics obtained from the different hardware components, networking and messaging traffic profiles, and the like. In one implementation, the one or more storage devices include a flash memory device that holds initialization data for MA startup (boot or reboot). For this purpose, such a non-volatile memory device contains a kernel and a root RAM disk, which are necessary for the boot operation of the management module; preferably, it also holds the default, startup, and running configurations.
Such non-volatile memory may also hold encryption keys, digital signatures, and certificates used to manage the secure transmission of messages. In one example, the SSL (secure sockets layer) protocol uses a public and private key (asymmetric) encryption system, which also includes the use of digital certificates. Similarly, PKI (public key infrastructure) enables users of public networks such as the internet to achieve secure and private data exchange through the use of public and private cryptographic key pairs obtained and shared through trusted authorities.
The hardware modules may also be described in terms of the functionality they provide, as shown in fig. 6e. The functional aspects of the messaging appliance are a network management stack 602, physical interface management 606, system management services 614, timestamp services 624, a messaging layer 608, and, in the edge messaging appliance, a protocol translation engine 618. These functional aspects map onto the hardware modules as described below.
For example, the network management stack (602) runs on the management module (512). The TCP/UDP/IP stack (604) is part of an operating system running on the CPU of the management module. NTP, SNMP, Syslog, HTTP/HTTPs web servers, Telnet/SSH CLI services are all standard network services running on top of the OS.
A system management service (614) also runs on the management module (512). These system management services manage the interface between the network management stack and the messaging components, including configuring and monitoring the system.
The timestamp service (624) may be distributed to multiple hardware components. Any hardware component that requires accurate time stamping (including the management module) includes a time stamping service that interfaces with service plane hardware module time resources.
Buses 616a and 616b are logical buses connecting logical/functional modules as opposed to hardware or software buses connecting hardware and software modules.
The TVA message layer (610) is distributed between the management module and a message routing engine (518) running on the message processing unit (516). Management messages are delivered locally to a management message engine running on the management module (512). The message routing engine (620) runs on a routing-engine microengine on the message processing unit (516). The messaging layer (612) runs primarily on the channel engine microengine (520). In some cases, a portion of the channel transmission logic is implemented on some of the transmission-aware PICs 514a-d. In one embodiment of the invention, such a transmission-aware PIC may be a TCP offload engine interface that performs TCP termination; in that case, part of the channel transmission logic is performed on the PIC rather than on the channel engine. Message receive (Rx) and transmit (Tx) processing is distributed between the channel engine and the message routing engine, since these are two microengines communicating with each other. The protocol translation engine (618) corresponds to the PTE (510) of the optional edge MA.
In general, the present invention provides a new method for messaging and, more particularly, a new publish/subscribe middleware system having a hardware-based messaging appliance that plays an important role in improving the efficiency of the messaging system. Although the present invention has been described in considerable detail with reference to certain preferred versions thereof, other versions are possible. Therefore, the spirit and scope of the appended claims should not be limited to the description of the preferred versions contained herein.
Claims (49)
1. A hardware-based messaging appliance in a publish/subscribe middleware system, comprising:
an interconnect bus; and
hardware modules interconnected via the interconnection bus, the hardware modules being divided into a plurality of groups, a first group being a control plane module group for handling message passing device management functions, a second group being a data plane module group for handling message routing functions alone or in addition to message transformation functions, and a third group being a service plane module group for handling service functions utilized by the first and second groups of the hardware modules.
2. A hardware-based messaging appliance as in claim 1, wherein the messaging appliance management functions include configuration and monitoring functions.
3. A hardware-based messaging appliance as in claim 2, wherein the configuration function comprises configuring the publish/subscribe middleware system.
4. A hardware-based messaging appliance as in claim 1, wherein the message routing function includes message forwarding and routing performed by dynamically selecting a message transmission protocol and a message routing path.
5. A hardware-based messaging appliance as in claim 1, wherein the service functions include a time source and a synchronization function.
6. A hardware-based messaging appliance as in claim 1, wherein the set of control plane modules includes a management module and one or more logical configuration paths.
7. A hardware-based messaging appliance as in claim 6, wherein the management module incorporates one or more Central Processing Units (CPUs) in a computer, a blade server, or a host server.
8. A hardware-based messaging appliance as in claim 7, wherein the CPU in the management module executes program code under any operating system including Linux, Solaris, Unix, and Windows.
9. A hardware-based messaging appliance as in claim 6, wherein each logical configuration path is one of a plurality of paths, wherein a first path is established via a Command Line Interface (CLI) over a serial interface or a network connection, and a second path is established with management messages routed through the publish/subscribe middleware system.
10. A hardware-based messaging appliance as in claim 9, wherein the logical configuration path is used for configuration information, and wherein the management message contains such configuration information including one or more of: syslog configuration parameters, Network Time Protocol (NTP) configuration parameters, Domain Name Server (DNS) information, remote access policies, authentication methods, publish/subscribe authorization, and message routing information.
11. A hardware-based messaging appliance as in claim 10, wherein the message routing function is neighbor-based and the message routing information indicates connectivity to each neighboring messaging appliance or application programming interface.
12. A hardware-based messaging appliance as in claim 10, further comprising a memory in which the configuration information is stored for later retrieval during reboot if the configuration information is persistent.
13. A hardware-based messaging appliance as in claim 12, wherein the stored configuration information has associated therewith a configuration identification used to determine whether the configuration information is current or needs to be replaced by updated configuration information.
14. A hardware-based messaging appliance as in claim 1, wherein the messaging appliance management functions further include a health monitoring function and a state change event monitoring function, both of which become active while start-up or reboot is ongoing or after it has completed.
15. A hardware-based messaging appliance as in claim 14, wherein the state change event monitoring function detects events including API (application programming interface) registration, messaging appliance registration, and subscription and unsubscribe events.
16. A hardware-based messaging appliance as in claim 1, wherein the messaging appliance management functions further comprise the function of uploading a firmware image to the hardware module.
17. A hardware-based messaging appliance as in claim 16, wherein the function of uploading the firmware image comprises verifying the firmware image.
18. A hardware-based messaging appliance as in claim 9, further comprising physical interfaces, one or more of which are dedicated to handling management message traffic associated with the messaging appliance management function and the remaining physical interfaces are available for data message traffic such that management message traffic is not intermixed with data message traffic and physical interfaces for data message traffic are not overloaded.
19. A hardware-based messaging appliance as in claim 1, further comprising a messaging channel, wherein the messaging appliance management functions further comprise functions to monitor subscription tables and statistics associated with the messaging channel.
20. A hardware-based messaging appliance as in claim 19, wherein the statistical data is monitored to determine whether to switch from one channel to another, and in the case of a slower client being found, whether to move the slower client to a channel optimized for the client.
21. A hardware-based messaging appliance as in claim 1, wherein the set of data plane modules includes one or more Physical Interface Cards (PICs) and a Message Processing Unit (MPU) for controlling the PICs.
22. A hardware-based messaging appliance as in claim 21, further comprising a serial port providing access to the management module to allow for a Command Line Interface (CLI).
23. A hardware-based messaging appliance as in claim 21, wherein the PIC processes frames having one or more messages.
24. A hardware-based messaging appliance as in claim 21, further comprising a global routing table, a copy of a portion or all of which is sent to a forwarding memory associated with each PIC.
25. A hardware-based messaging appliance as in claim 24, wherein the message routing function involves a topic-based routing table lookup in the forwarding memory.
26. A hardware-based messaging appliance as in claim 25, wherein the topic-based routing table lookup identifies one or more paths for messages between two PICs or between one PIC and itself.
27. A hardware-based messaging appliance as in claim 1, wherein the set of service plane modules includes an external time source accessible by any of the hardware modules to obtain a timestamp.
28. A hardware-based messaging appliance as in claim 27, wherein the timestamp is embedded in the message and later used to evaluate latency.
29. A hardware-based messaging appliance as in claim 28, further comprising a non-volatile memory for accumulating over time a message traffic profile characterized by statistics including the latency, the accumulated message traffic profile establishing a trend indicative of latency drift if the latency drift materializes.
30. A hardware-based messaging appliance as in claim 29, further comprising non-volatile memory for holding encryption keys and certificates for security.
31. A hardware-based messaging appliance as in claim 1, configured as an edge messaging appliance or a core messaging appliance, wherein the edge messaging appliance has a Protocol Translation Engine (PTE) for translating between external and native message protocols.
32. A hardware-based messaging appliance in a publish/subscribe middleware system, comprising:
an interconnect bus;
a management module having a management service and a management message engine interfaced with each other, the management module configured to handle configuration and monitoring functions;
a message processing unit having a message routing engine and a media exchange fabric, and a channel engine interfacing therebetween, the message processing unit configured to process a message routing function;
one or more Physical Interface Cards (PICs) for processing messages received or routed by the hardware messaging devices and destined for or leaving the management module and the message processing unit;
a service module comprising a time source, wherein the management module, the message processing unit, the one or more PICs, and the service module are interconnected via the interconnection bus.
33. A hardware-based messaging appliance as in claim 32, further comprising a non-volatile memory for holding configuration information and a temporary message storage area maintained in the memory of the message processing unit.
34. A hardware-based messaging appliance as in claim 32, further comprising, for each of the PICs, a memory having a storage area for holding any portion of a global system routing table.
35. A hardware-based messaging appliance as in claim 32, wherein external connectivity is fabric-agnostic, and thus the PIC and media exchange fabrics may be of any fabric type.
36. A hardware-based messaging appliance as in claim 32, further comprising a serial port for a command line interface.
37. A hardware-based messaging appliance as in claim 32, further comprising a Protocol Translation Engine (PTE) for translating between external and native message protocols.
38. A hardware-based messaging appliance as in claim 37, configured as an edge messaging appliance or a core messaging appliance, wherein the edge messaging appliance comprises the PTE.
39. A hardware-based messaging appliance as in claim 37, wherein the PTE includes a pipelining engine including message parsing, message rule lookup, message rule application, and message format engines, and message ingress and egress queues, and wherein the PTE is connected to the interconnect bus.
40. A hardware-based messaging appliance as in claim 32, wherein the message routing function is performed by dynamically selecting a message transmission protocol and a message routing path.
41. A hardware-based messaging appliance as in claim 32, wherein the channel engine includes a plurality of transport channels and a channel management module for processing incoming and outgoing messages.
42. A hardware-based messaging appliance as in claim 41, wherein the channel management module includes a message buffering module for temporarily buffering received messages, a channel scheduler for prioritizing transmission channels, and a protocol switch for determining protocol translation requirements.
43. A hardware-based messaging appliance as in claim 41, wherein each of the plurality of transmission channels has message ingress and egress queues, the size of the queues being used as a criterion to activate message flow control.
44. A hardware-based messaging appliance as in claim 43, wherein the channel capacity is considered a high threshold and a lower value is considered a low threshold, the message flow control being enabled when the size of the queue approaches the high threshold and disabled when the queue size shrinks below the low threshold.
45. A system having a publish/subscribe middleware architecture, comprising:
one or more messaging devices configured for receiving and routing messages, each messaging device having an interconnection bus and hardware modules interconnected via the interconnection bus, the hardware modules being divided into a plurality of groups, a first group being a group of control plane modules for handling messaging device management functions, a second group being a group of data plane modules for handling message routing functions, and a third group being a group of service plane modules for handling service functions utilized by the first and second groups of the hardware modules;
an interconnection medium; and
a setup and management device linked via the interconnection medium, configured to exchange management messages with each messaging device,
wherein each messaging appliance is further configured to perform routing of messages by dynamically selecting a message transmission protocol and a message routing path.
46. A system as recited in claim 45, wherein the messaging appliances include one or more edge messaging appliances and core messaging appliances.
47. The system of claim 46, wherein each edge messaging appliance includes a protocol translation engine for translating incoming messages from an external protocol to a native protocol, and for translating routed messages from the native protocol to the external protocol.
48. A hardware-based messaging appliance as in claim 1, operable as an embedded component in a switching or routing device.
49. The system of claim 45, wherein one or more of the messaging appliances are interconnected to provide network non-mediation.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US60/641,988 | 2005-01-06 | ||
| US60/688,983 | 2005-06-08 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1120315A true HK1120315A (en) | 2009-03-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CA2595254C (en) | Hardware-based messaging appliance | |
| US8321578B2 (en) | Systems and methods for network virtualization | |
| US20110185082A1 (en) | Systems and methods for network virtualization | |
| CN101133380A (en) | Message transmission device based on hardware | |
| CN114401221A (en) | SDWAN overlay routing service | |
| US12058040B2 (en) | Auto-grouping and routing platform | |
| HK1120315A (en) | Hardware-based messaging appliance | |
| EP4109862B1 (en) | Data transmission method, system, device, and storage medium | |
| Tian et al. | Traffic Flow Analysis | |
| HK1118111A (en) | End-to-end publish/subscribe middleware architecture | |
| HK1118112A (en) | Provisioning and management in a message publish/subscribe system | |
| HK1125198A (en) | Intelligent messaging application programming interface | |
| HK1118110A (en) | A caching engine in a messaging system |