
HK1115926B - Synchronous message queues


Info

Publication number: HK1115926B
Authority: HK (Hong Kong)
Prior art keywords: batch, write request, write, volatile memory, thread
Application number: HK08106244.3A
Other languages: Chinese (zh)
Other versions: HK1115926A1 (en)
Inventors: Gary Hayato Ogasawara, Jonah Schwartz, David Stone
Original Assignee: Cloudian Holdings Inc.
Priority claimed from US10/812,990 (US7249229B2)


Description

Synchronous message queues
Technical Field
[001] Embodiments of the present invention relate generally to communications. More particularly, the embodiments relate to a system and method for efficiently storing messages in a message relay server.
Background
[002] As the internet and wireless communications become more prevalent, network designers and service providers face a number of performance concerns. One particular concern relates to message routing. A message router receives messages from a sender host and forwards the messages to one or more destination hosts. This process of receiving and sending messages, sometimes referred to as a transaction, is an important component of any networking architecture. It is not uncommon for a high performance message router to be required to complete hundreds of transactions per second.
[003] Conventional messaging systems use a store-and-forward model for message routing. In this approach, when a router receives a message, the message is stored in a non-volatile memory ("NVM") so that the content can be saved when no power is provided to the memory. Examples of NVMs include, but are not limited to, electrically erasable programmable read-only memory ("EEPROM") and magnetic disks. Storing the message to the NVM enables recovery of the message in the event of a system crash or power failure. However, the techniques of reading from and writing to NVM are relatively slow and create a performance bottleneck in the network.
Disclosure of Invention
A method for storing messages according to the invention comprises: receiving a write request; adding the write request to a batch of substantially contiguous disk writes; determining to write the batch of substantially contiguous disk writes to non-volatile memory; writing the batch of substantially contiguous disk writes to the non-volatile memory; sending an acknowledgment of writing the batch of substantially contiguous disk writes; receiving an acknowledgment of the write acknowledgment; and clearing the batch of substantially contiguous disk writes, wherein, based on system throughput, one of the following criteria is selected for the determination to write the batch of disk writes:
● determining that a predetermined period of time has elapsed since the previous write of the batch of disk writes to non-volatile memory;
● determining that a predetermined size of the batch of disk writes has been exceeded;
● determining that the batch of disk writes contains a plurality of contiguous write requests;
● determining that the size of the batch of disk writes exceeds a predetermined maximum number of bytes;
● determining that the number of write requests stored in the batch of disk writes exceeds a predetermined maximum number;
● determining that a predetermined maximum time to collect the batch of disk writes has been exceeded;
● determining that a predetermined maximum time since the last write request was received has been exceeded;
● determining that the write request exceeds a maximum allowed page offset in the batch of disk writes; or
● determining that the write request exceeds a maximum allowed byte offset in the batch of disk writes.
An apparatus for storing messages according to the present invention, the apparatus comprising: means for receiving a write request; means for adding the write request to a batch of substantially contiguous disk writes; means for determining that it is possible to write the batch of substantially contiguous disk writes to non-volatile memory; means for writing the batch of substantially contiguous disk writes to the non-volatile memory; means for sending an acknowledgment of writing the batch of substantially contiguous disk writes; means for receiving an acknowledgement of the write acknowledgement; and means for clearing the batch of substantially contiguous disk writes, wherein one of the following criteria is selected for the determination to write the batch of disk writes based on system throughput:
● determining that a predetermined period of time has elapsed since the previous write of the batch of disk writes to non-volatile memory;
● determining that a predetermined size of the batch of disk writes has been exceeded;
● determining that the batch of disk writes contains a plurality of contiguous write requests;
● determining that the size of the batch of disk writes exceeds a predetermined maximum number of bytes;
● determining that the number of write requests stored in the batch of disk writes exceeds a predetermined maximum number;
● determining that a predetermined maximum time to collect the batch of disk writes has been exceeded;
● determining that a predetermined maximum time since the last write request was received has been exceeded;
● determining that the write request exceeds a maximum allowed page offset in the batch of disk writes; or
● determining that the write request exceeds a maximum allowed byte offset in the batch of disk writes.
A method for storing messages according to the invention comprises: obtaining, by a first thread or process, control of a shared mutex lock; sending a first write request from the first thread or process to a synchronization process; queuing, by the synchronization process, the first write request; sending an acknowledgement of receipt of the first write request to the first thread or process; releasing, by the first thread or process, control of the shared mutex lock upon receipt of the acknowledgment of receipt of the first write request; obtaining, by a second thread or process, control of the shared mutex lock; sending a second write request from the second thread or process to the synchronization process; queuing, by the synchronization process, the second write request with the first write request if the first and second write requests are to substantially contiguous memory locations; sending an acknowledgement of receipt of the second write request to the second thread or process; releasing, by the second thread or process, control of the shared mutex lock upon receipt of the acknowledgment of receipt of the second write request; writing the queued first and second write requests to the non-volatile memory; sending an acknowledgement of the write to the non-volatile memory to each of the first thread or process and the second thread or process; sending an acknowledgement from each of the first thread or process and the second thread or process of receipt of the write acknowledgement; and restarting the method if an acknowledgement of receipt of the write acknowledgement is received from each of the first thread or process and the second thread or process.
Preferably, writing the queued first and second write requests to the non-volatile memory comprises: determining whether a predetermined condition has occurred; the queued first and second write requests are written to the non-volatile memory if the predetermined condition has occurred.
Preferably, the predetermined condition includes: the predetermined period of time is exceeded.
Preferably, the predetermined condition includes: the size of the queued write requests exceeds a predetermined maximum number of bytes.
Preferably, the predetermined condition includes: the number of queued write requests exceeds a predetermined maximum number.
Preferably, the predetermined condition includes: the predetermined maximum time to collect queued write requests has been exceeded.
Preferably, the predetermined condition includes: the predetermined maximum time since the last receipt of a write request has been exceeded.
Preferably, the predetermined condition includes: the new write request exceeds the maximum allowed page offset from the queued write request.
Preferably, the predetermined condition includes: the new write request exceeds the maximum allowed byte offset from the queued write request.
An apparatus for storing messages according to the present invention, the apparatus comprising: means for obtaining control of a shared mutex lock by a first thread or process; means for sending a first write request from the first thread or process to a synchronization process; means for queuing, by the synchronization process, the first write request; means for sending an acknowledgement of receipt of the first write request to the first thread or process; means for releasing control of the shared mutex lock by the first thread or process upon receiving an acknowledgement of receipt of the first write request; means for obtaining control of the shared mutex lock by a second thread or process; means for sending a second write request from the second thread or process to the synchronization process; means for queuing, by the synchronization process, the second write request with the first write request if the first and second write requests are to substantially contiguous memory locations; means for sending an acknowledgement of receipt of the second write request to the second thread or process; means for releasing control of the shared mutex lock by the second thread or process upon receiving an acknowledgement of receipt of the second write request; means for writing the queued first and second write requests to the non-volatile memory; means for sending an acknowledgement of the write to the non-volatile memory to each of the first thread or process and the second thread or process; means for sending an acknowledgement of receipt of the write acknowledgement from each of the first thread or process and the second thread or process; and means for restarting the method if an acknowledgement of receipt of the write acknowledgement is received from each of the first thread or process and the second thread or process.
Preferably, the means for queuing the first write request comprises: the first write request is stored in a batch of files.
Preferably, the means for queuing the second write request comprises: the second write request is stored in the batch of files.
Preferably, the first and second write requests that are queued are written upon the occurrence of a predetermined condition.
Preferably, wherein the predetermined condition comprises: the predetermined period of time is exceeded.
Preferably, wherein the predetermined condition comprises: the size of the queued write requests exceeds a predetermined maximum number of bytes.
Preferably, wherein the predetermined condition comprises: the number of queued write requests exceeds a predetermined maximum number.
Preferably, wherein the predetermined condition comprises: the predetermined maximum time to collect queued write requests has been exceeded.
Preferably, wherein the predetermined condition comprises: the predetermined maximum time since the last receipt of a write request has been exceeded.
Preferably, wherein the predetermined condition comprises: the new write request exceeds the maximum allowed page offset from the queued write request.
Preferably, wherein the predetermined condition comprises: the new write request exceeds the maximum allowed byte offset from the queued write request.
An apparatus for storing messages according to the present invention comprises: a gateway; a queue data file coupled to the gateway; an index file coupled to the gateway; and a device for storing messages as described in the first aspect of the invention.
A method for storing messages according to the invention comprises: receiving a write request; determining whether to add the write request to a batch of outstanding requests; if it is determined that the write request is to be added to the batch, adding the write request to the batch; determining whether to write the batch to non-volatile memory; if it is determined that the batch is to be written to non-volatile memory, determining whether the batch is memory mapped to non-volatile memory; writing the batch to non-volatile memory if it is determined that the batch is memory mapped to non-volatile memory; sending an acknowledgment of writing the batch to the process associated with each write request in the batch; determining whether all acknowledgments of writing the batch have been received; if all acknowledgments of writing the batch have been received, clearing the batch; determining whether any write requests were skipped or stored in temporary memory; and, if it is determined that there are no skipped or stored write requests, looping back to receive the next write request.
Preferably, the method further comprises the following steps: if it is determined that outstanding write requests are in the temporary storage, the outstanding write requests are added to a new batch of outstanding requests.
Preferably, the method further comprises the following steps: storing the write request in the temporary memory if it is determined that the write request is to be stored in the temporary memory; if it is determined that the write request is to be skipped, a loop is made back to receive the next write request.
Preferably, the method further comprises the following steps: if it is determined that the batch is not to be written to non-volatile storage, then a loop is made back to receive the next write request.
Preferably, the method further comprises the following steps: if it is determined that the batch is not memory mapped to the non-volatile memory, the batch is memory mapped to the non-volatile memory before writing the batch to the non-volatile memory.
Preferably, the method further comprises the following steps: if it is determined that the saved write request is in temporary memory, then it is determined whether to write the saved write request to non-volatile memory.
An apparatus for storing messages according to the present invention, the apparatus comprising: first means for receiving a write request; second means for determining whether to add the write request to a batch of outstanding requests; third means for adding the write request to the batch if it is determined that the write request is to be added to the batch; fourth means for determining whether to write the batch to the non-volatile memory; fifth means for determining whether the batch is memory mapped to non-volatile memory if it is determined that the batch is to be written to non-volatile memory; sixth means for writing the batch to the non-volatile memory if it is determined that the batch is memory mapped to the non-volatile memory; seventh means for sending an acknowledgment of writing the batch to the process associated with each write request in the batch; eighth means for determining whether all acknowledgments of writing the batch have been received; ninth means for clearing the batch if all acknowledgments of writing the batch have been received; and tenth means for determining whether any write requests were skipped or stored in the temporary storage and looping back to receive the next write request if no write requests are determined to be skipped or stored.
Preferably, wherein the apparatus further comprises: twelfth means for adding the outstanding write requests to a new batch of outstanding requests if it is determined that the outstanding write requests are in the temporary storage.
Preferably, wherein the apparatus further comprises: thirteenth means for storing the write request in the temporary storage if it is determined that the write request is to be stored in the temporary storage; fourteenth means for looping back to receive a next write request if it is determined that the write request is to be skipped.
Preferably, wherein the apparatus further comprises: fifteenth means for looping back to receive a next write request if it is determined that the batch is not to be written to non-volatile storage.
Preferably, wherein the apparatus further comprises: sixteenth means for mapping the batch to the non-volatile memory prior to writing the batch to the non-volatile memory if it is determined that the batch is not memory mapped to the non-volatile memory.
Preferably, wherein the apparatus further comprises: seventeenth means for determining whether to write the saved write request to the nonvolatile memory if it is determined that the saved write request is in the temporary memory.
Drawings
[004] Fig. 1 is a block diagram of a communication system in accordance with one embodiment of the present invention.
[005] Fig. 2 is a block diagram of the gateway server of fig. 1 according to an embodiment of the present invention.
[006] FIG. 3 is a flow diagram of a method of restoring a queue index file, according to one embodiment of the invention.
[007] Fig. 4 is a flow diagram of an example of a method of processing streaming data according to one embodiment of the invention.
[008] Fig. 5 is a flow chart of an example of a method of processing streaming data according to an alternative embodiment of the present invention.
[009] FIG. 6 is a process flow diagram of a process of determining whether a write request should trigger a batch of writes, according to one embodiment of the invention.
Detailed Description
[010] Embodiments of the present invention may combine several techniques to minimize the performance bottleneck that results from writing data to non-volatile memory. These techniques may include writing data contiguously and/or substantially contiguously to disks in order to take advantage of the contiguous layout of most modern hard disk systems. The techniques may also include performing write requests in batches to minimize the total number of write requests to the disk system, and using a separate synchronization helper process to have synchronous disk writes performed asynchronously. These techniques may also include minimizing the data to be synchronously written to disk and minimizing the cost of removing data from the queue.
[011] According to one embodiment of the invention, one or more files of sufficient size may be created in non-volatile memory (NVM) and memory mapped to volatile memory. Messages and/or data to be queued may be copied into a memory mapped region of one of the files, which may be referred to as a queue data file. The index data structure may be maintained in volatile memory associated with the queue data file, and entries in the index data structure may be used to record locations in the queue data file where messages/data have been written. Generally, messages/data may be written to sequential regions in a queue data file in order to reduce write latency. The processing of the messages/data may involve manipulating the index data structure and accessing the messages/data in the memory-mapped queue data file area, which essentially leaves the queue data file physically untouched after the messages/data are initially written to the queue data file. If for some reason a message cannot be processed immediately, the message may be ignored and processed later or moved to a secondary queue. To help minimize data loss and improve recovery efficiency, the index data structure may be periodically written to non-volatile memory (NVM).
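The following is a minimal sketch of the approach described above: a pre-allocated queue data file is memory mapped and a message is copied into the mapped region. It assumes POSIX mmap; the file size, file name handling, and helper names are illustrative assumptions, not taken from the patent.

/* Sketch only: map a pre-allocated queue data file and copy a message into
 * the mapped region.  The physical write to disk is deferred until the
 * region is later synchronized (msync'd) in a batch. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define QUEUE_FILE_SIZE (64 * 1024 * 1024)    /* assumed pre-allocated size */

static char *map_queue_file(const char *path)
{
    int fd = open(path, O_RDWR);              /* file created at install time */
    if (fd < 0)
        return NULL;
    char *base = mmap(NULL, QUEUE_FILE_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);                                /* the mapping survives close   */
    return base == MAP_FAILED ? NULL : base;
}

/* Copy a message into the mapped region at a given offset. */
static void enqueue_message(char *base, size_t offset,
                            const char *msg, size_t len)
{
    memcpy(base + offset, msg, len);
}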
[012] Similarly, in the current embodiment of the present invention, to prevent data loss and to ensure sequential disk writes, queue data file writes may be synchronously written to disk. Because writing data synchronously to disk is incompatible with asynchronous input/output ("I/O") environments and generally significantly reduces write performance, a synchronization helper process may be used to batch and sort queue data file write requests. The synchronization helper process may batch and sort the queue data file write requests such that the memory mapped regions in the queue data file are synchronized (i.e., physically written) to disk in an efficient and reliable manner.
[013] Fig. 1 is a block diagram of a communication system in accordance with one embodiment of the present invention. In fig. 1, the system 10 may have a terminal 14 coupled to a gateway server 12a and a terminal 18 coupled to a gateway server 12 b. According to the present embodiment, terminal 14 may be a sender host, such as a mail client terminal, and terminal 18 may be a destination host, such as a mail server. For example, the terminals 14, 18 may include computers, workstations, Personal Digital Assistants (PDAs), landline and wireless telephones. The sender host 14 may communicate with a gateway server 12a (e.g., a mail gateway), the destination host 18 may communicate with a gateway server 12b (e.g., a mail gateway), and the servers 12a, 12b may communicate with each other via the network 16. Network 16 may include the Internet, a Local Area Network (LAN), and/or a Wide Area Network (WAN). It should be appreciated that gateway servers 12a and 12b may be servers, routers, switches, and the like. It should also be appreciated that system 10 may include other components, devices, and/or systems that are omitted.
[014] Although certain examples will be described herein with reference to the routing of messages to destination hosts, embodiments of the invention are not so limited. Indeed, the principles described herein may be readily applied to any type of incoming data without departing from the spirit and scope of embodiments of the present invention. For example, images, sound files, and other types of data may also benefit from the principles described herein. Still, there are many aspects of messaging that are well suited for embodiments of the present invention. It should also be noted that a message may be destined for multiple recipients (not shown) and/or destination hosts, with each destination host serving recipients connected to a particular receiver host.
[015] It should be noted that traditional internet protocol ("IP") routing of data packets is designed to tolerate a certain amount of data loss, while routing messages, such as email messages, require a higher level of reliability. As a result, message routers or servers are generally less tolerant of data loss and have traditionally used the store-and-forward model discussed above. Although certain embodiments will be described with reference to one or more of the above-described protocols, it should be noted that embodiments of the invention are not so limited. Indeed, any protocol may be used in which current storage technologies may be implemented.
[016] Fig. 2 is a block diagram of the gateway server of fig. 1, according to one embodiment of the present invention. In fig. 2, the gateway server 12a may include a multimedia messaging gateway ("MMG") 20, which may be coupled to the terminal 14 and the network 16. MMG20 may also be coupled to disk queue 22, and more specifically to queue data files 24 and index files 26 in disk queue 22. The MMG20 may be further coupled to a synchronization component 28, which in turn may be coupled to a queue data file 24. The synchronization component 28 may be implemented as a process that batches and sorts queue data file 24 write requests so that memory mapped regions in the queue data file 24 may be efficiently and reliably synchronized (i.e., physically synchronously written) to disk files for which the queue data file 24 is memory mapped.
[017] In FIG. 2, MMG 20 may store all outgoing and incoming messages (e.g., email messages) in disk queue 22 before they are delivered. For example, in the present embodiment, writes to disk queue 22 (which may include queue data file 24 and index file 26) are sequential; each write may be synchronized to disk queue 22 before it is confirmed; and MMG 20 may restore disk queue 22 at startup to resume queue activity. Each message in queue data file 24 may be memory mapped and shared by all active processes on MMG 20. Each queue data file 24 may have an associated index file 26 from which queue processing state may be recovered if a system crash occurs. The index file 26 is interchangeably referred to hereinafter as the queue index file 26. In addition, a limited number of messages may be queued, and new messages may be rejected if the queue data file 24 is full. The number of index entries may be limited by primary memory, disk size, and the size of the queue file. Index file 26 may also be memory mapped by MMG 20 and shared by all other MMG 20 processes.
[018] According to this embodiment, each message in the queue data file 24 may have an entry in the index file 26. The index file 26 may be created at system installation time at the maximum allowable size on a dedicated disk partition. The index file 26 may be memory mapped by MMG 20, and synchronization with the disk file may occur periodically (e.g., every 10 seconds). Periodic synchronization of index file 26 to non-volatile memory (NVM) may be performed asynchronously, and MMG 20 does not typically wait to confirm that the write has completed successfully. Periodic synchronization may result in duplicate messages being delivered, but will not lose any messages if the system crashes before synchronization is complete. The index file 26 may be comprised of a "queue_index" structure, which in turn may include an array of "index_entry" structures. There may also be a mutex variable that may act as a shared mutex lock and may be shared by all running MMG 20 processes that access the index. The mutex variable may serve to prevent the queue index from being changed by more than one MMG 20 process at the same time. According to the present embodiment, the mutex may be acquired prior to any read or write of header data, and prior to locking an index entry. The index file 26 may help provide efficient access to the queue data. In particular, the index file 26 may maintain pointers to the locations of queue messages. When a message is deleted, the index file 26 may be updated to delete that entry, thereby negating the need for physical modification of the queue data file 24. As a result, the "cost," that is, the time and processing resources required to remove data from disk queue 22, may be minimized. In addition, the index file 26 may maintain a limited amount of state information, e.g., retry data that may be frequently used by the MMG. Retry data for a message may be retrieved and checked from the index file 26 faster and more efficiently than from the queue data file 24.
[019] According to an embodiment of the invention, the data structure of the index file 26 may be defined as follows:
struct queue_index {
    int head;                 // index of the head of the in-use linked list
    int tail;                 // index of the tail of the in-use linked list
                              // (the last entry written)
    int free;                 // head of the free list
    int num_entries;
    time_t last_sync;
    struct index_entry index_entry[];
};

struct index_entry {
    int offset;               // offset into the queue file
    int len;                  // length of the data in the queue file
    int lock;                 // pid of the process that has locked the entry,
                              // or 0 if unlocked
    byte type;                // type of index entry, e.g. SMS, SMTP
    int status;
    int checksum;
    int retry_count;
    time_t last_retry;        // date of the last retry, in unix time
    int prev;                 // index of the previous entry in the linked list
    int next;                 // index of the next entry in the linked list
};
It should be noted that the data structure of the index file 26 described above is only one possible representation and should in no way be construed as limiting the possible alternative embodiments that are contemplated. For example, alternative embodiments may include fewer fields, additional fields, or a combination of the two.
[020] To ensure that writing an index_entry to disk is a single disk operation, a single index_entry should not span more than one disk block. One possible solution to this problem is to add padding to the index_entry structure to ensure that its size evenly divides the disk block size of the operating system. For example, if the disk block size is 4096 bytes, the index_entry data structure may be padded to a size that divides 4096 bytes evenly, e.g., 32 bytes. If, for example, each entry in index file 26 takes up 32 bytes, then a 32-megabyte index file would be required to support one million index entries.
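A sketch of this padding idea follows. The entry size, the use of a union for padding, and the C11 compile-time check are assumptions for illustration; "char" stands in for the patent's "byte" type, and the field list mirrors the structure shown above.

#include <time.h>

#define DISK_BLOCK_SIZE 4096
#define ENTRY_SIZE      64        /* assumed padded size; 4096 / 64 = 64 per block */

/* Mirrors the index_entry fields listed above (names from the patent text). */
struct index_entry {
    int    offset;
    int    len;
    int    lock;
    char   type;
    int    status;
    int    checksum;
    int    retry_count;
    time_t last_retry;
    int    prev;
    int    next;
};

/* Padding the entry out to ENTRY_SIZE guarantees that a single entry never
 * straddles a disk block boundary, so writing one entry touches one block. */
union padded_index_entry {
    struct index_entry entry;
    char               pad[ENTRY_SIZE];
};

_Static_assert(DISK_BLOCK_SIZE % sizeof(union padded_index_entry) == 0,
               "entry size must evenly divide the disk block size");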
[021] After a system crash, the index file 26 and queue data file 24 may be read by the MMG20 to restore the queue processing state that existed prior to the system crash.
[022] Each MMG 20 queue may use one or more dedicated data files and a dedicated index file. Each data file may be created on a single disk partition, typically with no other disk files in that partition. As with the queue index file 26, the queue data files 24 may be created at system installation at their respective maximum allowable sizes. This helps ensure that queue data file 24 is made up of contiguous disk blocks and makes contiguous disk writes to queue data file 24 possible. Multiple queue files may be placed on separate physical disks to improve parallelism. The queue data file and queue index file may also be placed on separate physical disks to improve parallelism. During initialization of MMG 20, queue data file 24 may be memory mapped by MMG 20 to store messages. After each message is added to the queue data file 24, the disk and memory files may be synchronized to prevent data loss.
[023] A single queue data file may be limited in size to, for example, 2 GB or more; 4 GB is the maximum file size for which the offset can be represented using a 32-bit unsigned integer. Queue data files larger than 4 GB are possible, but require that the queue entry location (the queue data file offset) be represented using a long integer (e.g., 64 bits). If the average incoming message size is 4k octets, then a 2 GB queue data file 24 can store a maximum of about 500k (k = 1024) messages. In addition to the actual message content, header information may also be written to the queue data file 24. Such header information may include the message number, creation date, and size of the message being written.
[024] According to one embodiment of the invention, each message may be written to queue data file 24 in a certain order along with encapsulation information. Because the queue recovery process may not always be able to rely on the index file 26 being complete, the queue data file 24 should typically include enough information to reconstruct the index entry for each message. Thus, the queue data file 24 entry format may include, for example:
Signature (char[]) - unique signature indicating the start of a new message
Date (int) - date the message was added to the queue, in seconds since the epoch
Message type (byte) - type of message (SMS, SMTP, etc.)
Data length (int) - length of the message data
Message data (char[]) - the message data
Checksum (int) - checksum of the message data
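Purely as an illustration, the sketch below shows one way the entry layout above might be laid down in the memory-mapped queue data file. The 4-byte signature value is the one given in paragraph [025]; the helper name and the exact byte packing are assumptions, not the patented format.

#include <string.h>
#include <time.h>

static const unsigned char ENTRY_SIGNATURE[4] = { 252, 88, 132, 133 };

/* Write the entry header and message data into the memory-mapped queue data
 * file at 'dst'; returns the number of bytes written. */
static size_t write_queue_entry(char *dst, unsigned char msg_type,
                                const char *data, int data_len, int checksum)
{
    char *p = dst;
    int   date = (int)time(NULL);                       /* seconds since epoch */

    memcpy(p, ENTRY_SIGNATURE, 4);          p += 4;              /* Signature  */
    memcpy(p, &date, sizeof date);          p += sizeof date;    /* Date       */
    *p++ = (char)msg_type;                                       /* Type       */
    memcpy(p, &data_len, sizeof data_len);  p += sizeof data_len;/* Length     */
    memcpy(p, data, (size_t)data_len);      p += data_len;       /* Data       */
    memcpy(p, &checksum, sizeof checksum);  p += sizeof checksum;/* Checksum   */
    return (size_t)(p - dst);
}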
[025] According to one embodiment of the invention, a queue index file recovery process may be included because the queue index file 26 is typically written to disk asynchronously. To facilitate the recovery process, the queue index file 26 (shown in the data structure described above) may include a field "last_sync" of type time_t that specifies the last time the index file was written to disk. In general, the last_sync data may only be modified immediately before the file is synchronized to disk. Each queue entry in the queue data file 24 may begin at an offset that is typically at a predetermined boundary, such as a 256-byte boundary. The queue data file entry may be prefixed with a fixed 4-byte signature as the first 4 bytes of the entry, e.g., "\252\88\132\133". To ensure that it reflects the correct index information for the queue data file 24, the queue index file 26 may be validated using a validation process. For example, according to one embodiment of the present invention, one possible validation process may be implemented as described in the following pseudo-code:
scan queue index file "inuse" linked list;
    if loop discovered, break loop;
    if any entry is locked, unlock;
    if any entry is marked as free, move to "free" list;
scan queue index file "free" list;
    if loop discovered, break loop;
    if the entry is marked as "inuse" and the entry points to valid data, move the entry to the "inuse" list;
for any entries not on the "inuse" list or the "free" list:
    if locked, unlock the entry;
    if not marked free, and the entry points to valid data, move the entry to the "inuse" list;
    if not marked free, but the entry points to invalid data, move the entry to the "free" list;
    if free, move the entry to the "free" list.
[026] FIG. 3 is a flow diagram of a method for restoring information to a queue index file after a system crash, according to one embodiment of the invention. In FIG. 3, the records in the queue data file 24 may be scanned (310) for possible entries of the queue index file 26, for example, at predetermined boundary intervals, such as 256-byte intervals. Whether an entry needs to be restored into queue index file 26 may be determined (320). Determining (320) whether the entry needs to be restored may include determining whether the entry is valid. The entry may be determined to be valid if both of the following conditions are true: 1) the first 4 bytes match the fixed 4-byte signature; and 2) the entry checksum is correct. Determining (320) whether an entry needs to be recovered may also include determining whether a valid entry is missing from the queue index file 26. A valid entry may be determined to be missing if either of the following conditions is true: 1) the date field of the entry contains a date after the date contained in the last_sync field of queue index file 26, or 2) the entry is not in queue index file 26. Thus, an entry may be added (330) to the queue index file 26 if the entry is determined (320) to be both valid and missing. It may then be determined (340) whether there are more records in the queue data file 24. If it is determined (340) that there are more records in the queue data file 24, the method may cause the scan to jump forward (350) to the next record in the queue data file 24. The distance to be skipped (350) is either equal to the predetermined boundary interval or, alternatively, equal to the length field of the entry rounded to the nearest boundary interval. In general, the distance to be skipped (350) may be the predetermined boundary interval regardless of whether an entry is determined (320) to exist at the record. However, if a record is determined (320) to exist, the distance to be skipped (350) may be determined by the length field of the entry, optionally rounded to the nearest boundary interval, which may provide some performance improvement over skipping (350) by the predetermined boundary interval. The method may loop back to continue scanning (310) the queue data file 24, as described above. Otherwise, the method may terminate if no more records are determined (340) to be in the queue data file 24.
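A minimal sketch of the FIG. 3 scan is shown below. The boundary constant comes from the text; the helper functions are only declared, not defined, and their names, as well as the rounding of the skip distance, are assumptions rather than the patented implementation.

#include <stddef.h>
#include <time.h>

#define BOUNDARY 256                       /* predetermined boundary interval */

/* Assumed helpers (not defined here): */
int    entry_is_valid(const char *p);      /* checks 4-byte signature + checksum */
time_t entry_date(const char *p);
int    entry_len(const char *p);
int    entry_in_index(size_t offset);
void   index_add(size_t offset, int len);

void recover_index(const char *qfile, size_t qsize, time_t last_sync)
{
    size_t off = 0;
    while (off + BOUNDARY <= qsize) {                        /* (310)/(340) */
        size_t skip = BOUNDARY;
        if (entry_is_valid(qfile + off)) {                   /* (320) valid?   */
            if (entry_date(qfile + off) > last_sync ||
                !entry_in_index(off))                        /* (320) missing? */
                index_add(off, entry_len(qfile + off));      /* (330)          */
            /* Skip by the entry length rounded up to the nearest boundary. */
            size_t len = (size_t)entry_len(qfile + off);
            skip = ((len + BOUNDARY - 1) / BOUNDARY) * BOUNDARY;
        }
        off += skip;                                         /* (350) */
    }
}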
[027] The message data may have a specific format according to the message type. For example, according to one embodiment of the invention, the SMTP message data may have the following format:
Sender length (int) - length of the sender data
Sender data (char[]) - the sender data
Recipient length (int) - length of the recipient data
Recipient data (char[]) - the recipient data; note that more than one recipient may be included in this field
Message data length (int) - length of the message data
Message data (char[]) - the message data
Similarly, according to one embodiment of the invention, the SMS message may have the following format:
Sender MSISDN (int) - MSISDN of the sender
Num recipients (byte) - number of recipients
Recipients (int[]) - array of recipient MSISDN numbers
Message data length (int) - length of the message data
Message data (char[]) - the message data
In addition, delivery instructions, such as, for example, a mail server hostname or an SMSC location, may be added to the message format to prevent a user from having to be authorized twice (e.g., via a lightweight directory access protocol ("LDAP") query). MMG 20 enqueues these tasks. Returning to FIG. 2, in accordance with one embodiment of the present invention, disk queue 22 may include queue data file 24 (although other embodiments may be implemented using more than one queue data file) and index file 26. While in the present embodiment queue data file 24 and index file 26 may be stored on physical disk, portions of queue data file 24 may also be memory mapped to primary storage to improve processing efficiency.
[028] Adding a message to the queue. A task may add a message to the queue data file 24. The addition may be considered successful upon receiving an acknowledgement that the message has been written to disk. In one embodiment of the invention, a message may be added when an entry in the queue index file 26 has been successfully allocated and when there is sufficient space in the queue data file 24 to accommodate the message. Otherwise, the message will be rejected. If the message is added, the contents of the message may be copied to the reserved space in the queue data file 24, and disk synchronization may be performed after the copy operation on the queue data file 24 has completed.
[029] Detecting storage availability. Generally, two cases are considered when performing queue storage availability detection. In both, it is important to know the current offsets at which the queue data file 24 is being read (R) and written (W), the size of the space to be reserved (S), and the maximum size (Q) of the queue data file 24.
[030] In this embodiment, R may correspond to the offset of the first queue entry and W may correspond to the offset of the last queue entry plus the length of that entry. Thus, according to one embodiment of the invention, R and W may be computed from the queue index data structure as:
struct queue_index q_index;
[...]
R = q_index.index_entry[q_index.head].offset;
W = q_index.index_entry[q_index.tail].offset +
    q_index.index_entry[q_index.tail].len;
It should be understood that all of this arithmetic may be performed modulo the size of the queue data file 24, although the details are omitted here for readability.
[031] In the first case, according to the present embodiment, the current read offset R may be smaller than the current write offset W. The available space for storage may be denoted as s = Q - W, and if S is less than s, the allocation is successful. Otherwise, W may be reset, or wrapped around, to the beginning of queue data file 24, and the space availability check may be performed again as in the second case described below.
[032] In the second case, according to the present embodiment, the current write offset W may be smaller than the current read offset R, and the available space for storage may be denoted as s = R - W. If there is not enough space (s is less than S), then the queue entry at offset R may be skipped.
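A sketch of these two availability cases follows. Q, R, W, and S are the quantities defined above; the enum, function name, and simplified wrap-around handling are assumptions for illustration only.

#include <stddef.h>

typedef enum { ALLOC_OK, ALLOC_WRAP, ALLOC_SKIP_HEAD } alloc_result;

static alloc_result check_space(size_t Q, size_t R, size_t W, size_t S)
{
    if (R < W) {                    /* case 1: space from W to the end of file */
        size_t s = Q - W;
        return (S < s) ? ALLOC_OK : ALLOC_WRAP;       /* otherwise wrap W to 0 */
    } else {                        /* case 2: space from W up to R            */
        size_t s = R - W;
        return (S < s) ? ALLOC_OK : ALLOC_SKIP_HEAD;  /* else skip entry at R  */
    }
}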
[033] Although it has been described above how to check for free storage space in the queue data file 24, there is no room for new message data if the read and write locations are too close together. This occurs when the queue data file 24 is full, in which case there is no choice but to reject incoming messages. Fortunately, this possibility is very small. More likely, the space between the read and write locations is small because the message at the read location (q_index.head) has not yet been processed. In this case, the old message may be skipped by moving it from the head to the tail of the list. This may be achieved, for example, by the following code segment:
MMG_mutex_acquire(index_lock);
int tmp = q_index.head;
q_index.head = q_index.index_entry[q_index.head].next;
q_index.index_entry[q_index.head].prev = -1;  // NONE
q_index.index_entry[q_index.tail].next = tmp;
q_index.index_entry[tmp].prev = q_index.tail;
q_index.index_entry[tmp].next = -1;           // NONE
q_index.tail = tmp;
MMG_mutex_release(index_lock);
This operation has the effect of moving the write pointer (W) and the read pointer (R).
[034] Index entry allocation. According to one embodiment of the invention, index entries may be stored in an array. When the index is initially instantiated, a linked list of free index entries may be established. q_index.free may point to the head of this free list. Each entry in the free list may use its index_entry.next field to point to the next free entry. The list may end with a -1. When a new index entry is needed, the process may first acquire index_lock, and then pop the first entry off the free list q_index.free:
int new_entry = q_index.free;
if (new_entry == -1) return ERROR_NO_MORE_INDEX_ENTRIES;
q_index.free = q_index.index_entry[q_index.free].next;
[035] Queue processing. According to one embodiment of the invention, a queue-processing process or thread typically fetches messages sequentially from the message queue based on the index. Each message may be locked for processing and then unlocked if it cannot be successfully processed. If the retry limit of a message is exceeded, the message may be deleted from the queue by moving its corresponding index entry from the in_use list of the index file to the free list. The index entry may similarly be moved from the in_use list to the free list after the message has been processed, whether successfully or not.
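As an illustrative counterpart to the allocation fragment above, the following sketch shows how a finished entry might be unlinked from the in-use list and returned to the free list. It reuses the q_index and index_lock names from the structures shown earlier; entry_to_free and the unlink logic are assumptions, not the patented code.

MMG_mutex_acquire(index_lock);
int e = entry_to_free;                        // index of the finished entry
int p = q_index.index_entry[e].prev;
int n = q_index.index_entry[e].next;
if (p != -1) q_index.index_entry[p].next = n; else q_index.head = n;
if (n != -1) q_index.index_entry[n].prev = p; else q_index.tail = p;
q_index.index_entry[e].next = q_index.free;   // push onto the free list
q_index.free = e;
MMG_mutex_release(index_lock);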
[036] According to embodiments of the invention, an in-memory index may be used to group all messages directed to the same host or domain. This allows many messages to be sent to a single host at a time.
[037] Queue file synchronization. According to one embodiment of the invention, queue file synchronization may be implemented using a synchronization command (e.g., msync()) to ensure that data is successfully saved to disk. However, because the synchronization call msync() blocks all other threads from operating and blocks all access to the disk queue, a helper process (e.g., synchronization component 28, as shown in FIG. 2) may be used to handle synchronization. MMG 20 may communicate with synchronization component 28 using a first-in, first-out ("FIFO") queue.
[038] FIG. 4 is a flow diagram of an example of a method of processing streaming data according to one embodiment of the invention. FIG. 4 shows a two-stage, batched approach for synchronization component 28 that produces good parallelism. The method may use four FIFO queues:
requestfifo 401 - MMG 20 sends sync requests to synchronization component 28
receiptfifo 402 - synchronization component 28 acknowledges receipt of a request
confirmfifo 403 - synchronization component 28 confirms the completion of all received requests
confconffifo 404 - MMG 20 acknowledges each confirmation received from synchronization component 28
[039] In FIG. 4, synchronization component 28 may listen on requestfifo 401 for msync data commands. A command may be a 4-byte integer offset followed by a 4-byte integer length. A command may also be a 4-byte integer offset followed by an 8-byte long integer length. The offset may represent a location in the memory-mapped queue data file 24.
[040] In FIG. 4, an MMG thread, such as MMG thread 1, may acquire (410) the mutex and send (412) a synchronization ("sync") request to synchronization component 28 via requestfifo 401. MMG thread 1 may wait (414) for a receipt indicating that the sync request was received; synchronization component 28 may queue (416) the received sync request and send (418) an acknowledgement to MMG thread 1 via receiptfifo 402 that the sync request has been received. Upon receiving the acknowledgement that the request was received, MMG thread 1 may release (420) the mutex and wait (421) for an acknowledgement from confirmfifo 403 that the request was processed. More than one MMG 20 thread may wait on confirmfifo 403 at a time.
[041] In FIG. 4, a second MMG thread, MMG thread 2, may acquire (422) the mutex after it is released (420) by MMG thread 1. MMG thread 2 may send (424) another sync request to synchronization component 28 via requestfifo 401 and wait (426) for a receipt indicating that the sync request was received; synchronization component 28 may queue (428) the received sync request and send (430) an acknowledgement back to MMG thread 2 that the sync request was received. MMG thread 2 may then release (432) the mutex and wait (433) for an acknowledgment of the msync.
[042] In FIG. 4, synchronization component 28 may batch requests as they are received. For example, synchronization component 28 may continue to batch sync requests until it has received a set of contiguous or substantially contiguous sync requests, or until some predetermined event or timeout occurs. At this point, synchronization component 28 may stop accepting new sync requests on requestfifo 401 and execute (434) all received sync requests. If the appropriate portion of the mapped file has already been memory mapped, then a simple msync may be invoked (434). If not, the appropriate portion may first be memory mapped and msync may then be invoked (434). Synchronization component 28 may send (436) N sync acknowledgments to the affected threads (e.g., MMG thread 1 and MMG thread 2) and wait (438) for acknowledgments of the N sync acknowledgments that were sent, where N is the number of MMG threads waiting for an acknowledgement. Although not shown, if a received sync request does not qualify as substantially contiguous with the current batch of write requests, the received sync request may, depending on configuration parameters, be delayed until the start of the next batch, in which the delayed sync request may be processed.
[043] There may be a race condition in which not every MMG thread has received its signal from confirmfifo 403 before additional MMG threads begin listening on confirmfifo 403. Thus, confconffifo 404 may be used so that each MMG thread is required to acknowledge receipt of each acknowledgment. When every acknowledgement has been acknowledged, synchronization component 28 may once again begin receiving and acknowledging receipt of further sync requests. For example, MMG thread 1 may send (440) an acknowledgement of the sent (436) msync acknowledgement back to synchronization component 28, and MMG thread 2 may send (442) an acknowledgement of the sent (436) msync acknowledgement back to synchronization component 28. Synchronization component 28 may then return (444) to receiving and queuing (416) msync requests from active threads.
[044] A timeout may be used in synchronization component 28 to handle situations where MMG 20 stops running before it can acknowledge receipt of an acknowledgement via confconffifo 404. Similarly, MMG 20 may have a timeout so that it can detect when synchronization component 28 stops running and when it restarts.
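The sketch below summarizes one MMG thread's side of the FIG. 4 exchange. The FIFO and mutex helpers are only declared here and are assumptions, as are their names; the command layout (offset followed by length) follows paragraph [039].

struct sync_cmd { unsigned int offset; unsigned int length; };

/* Assumed helpers, declared only so the sketch is self-contained. */
void mutex_acquire(void);
void mutex_release(void);
void fifo_write(int fifo, const void *buf, unsigned long len);
void fifo_read_ack(int fifo);                     /* blocks until one token is read */
extern int requestfifo, receiptfifo, confirmfifo, confconffifo;

void request_sync(unsigned int offset, unsigned int length)
{
    struct sync_cmd cmd = { offset, length };

    mutex_acquire();                               /* (410)/(422) acquire the mutex  */
    fifo_write(requestfifo, &cmd, sizeof cmd);     /* (412)/(424) send sync request  */
    fifo_read_ack(receiptfifo);                    /* (414)/(426) wait for receipt   */
    mutex_release();                               /* (420)/(432) release the mutex  */

    fifo_read_ack(confirmfifo);                    /* (421)/(433) wait for msync done */
    fifo_write(confconffifo, "", 1);               /* (440)/(442) acknowledge the ack */
}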
[045] At low throughput, using a separate synchronization component may be slower than a simple synchronous request/response system, but the approach has a higher maximum throughput. It results in fewer msync system calls, as each call synchronizes more data. The longer synchronization component 28 waits for sync requests to form a contiguous block, the higher the throughput that can be achieved, but this means that low-throughput systems are less efficient and experience higher latencies. Thus, a number of configuration options may be provided for synchronization component 28 to fine-tune its performance. For example, the following configuration options may be set in the synchronization component: maximum batch size specified in bytes; maximum batch size specified as a number of write requests; maximum batch collection time; maximum allowed time without new write requests; and the maximum offset (in pages or bytes) beyond which a new write request is no longer considered substantially contiguous.
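As a sketch only, the configuration options listed above might drive the flush decision roughly as follows. The structure and field names are assumptions, not taken from the patent.

#include <time.h>

struct sync_config {
    unsigned long max_batch_bytes;      /* maximum batch size in bytes        */
    unsigned int  max_batch_requests;   /* maximum number of write requests   */
    double        max_collect_seconds;  /* maximum batch collection time      */
    double        max_idle_seconds;     /* max time without a new request     */
};

struct batch_state {
    unsigned long bytes;
    unsigned int  requests;
    time_t        started;
    time_t        last_request;
};

static int should_flush(const struct sync_config *cfg,
                        const struct batch_state *b, time_t now)
{
    return b->bytes    >= cfg->max_batch_bytes
        || b->requests >= cfg->max_batch_requests
        || difftime(now, b->started)      >= cfg->max_collect_seconds
        || difftime(now, b->last_request) >= cfg->max_idle_seconds;
}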
[046] Memory mapping operations. Multiple regions may be memory mapped into queue data file 24 for writing messages to, and reading messages from, queue data file 24. A predetermined size in octets may be used when mapping a region of interest, and care should be taken when selecting the size so that the system does not have to perform remapping too frequently. In general, the size of the region to be mapped should be a multiple of the disk page size. It is important to note that each thread or process may have its own mapping of queue data file 24.
[047] For example, according to one embodiment of the invention, one approach is to have one mapped region for reading and another mapped region for writing. This means that the "read" region can be remapped while writing to the "write" region. It is also important to check the boundaries of each mapped region before doing anything with it. The synchronization component 28 may use its own memory mappings when performing disk synchronization tasks.
[048] According to another embodiment of the invention, another approach is to maintain a pool of memory-mapped regions of configurable size that may be locked for exclusive use by any individual thread or process, optionally remapped, read, or written by that thread or process, and then unlocked when the thread or process is finished with the region. In this embodiment, when multiple threads request a mapped region and none is available, a wait queue of threads is maintained until a mapped region in the pool becomes free.
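One way such a pool might look is sketched below, assuming POSIX threads; the pool size, structure, and locking scheme are illustrative assumptions, not the patented design.

#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE 4                         /* configurable pool size */

struct mapped_region { char *base; size_t len; int in_use; };

static struct mapped_region pool[POOL_SIZE];
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pool_free = PTHREAD_COND_INITIALIZER;

/* Lock a region for exclusive use, waiting if none is currently free. */
static struct mapped_region *acquire_region(void)
{
    pthread_mutex_lock(&pool_lock);
    for (;;) {
        for (int i = 0; i < POOL_SIZE; i++) {
            if (!pool[i].in_use) {
                pool[i].in_use = 1;
                pthread_mutex_unlock(&pool_lock);
                return &pool[i];
            }
        }
        pthread_cond_wait(&pool_free, &pool_lock);  /* wait for a free region */
    }
}

/* Unlock a region when the thread or process is finished with it. */
static void release_region(struct mapped_region *r)
{
    pthread_mutex_lock(&pool_lock);
    r->in_use = 0;
    pthread_cond_signal(&pool_free);
    pthread_mutex_unlock(&pool_lock);
}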
[049] Fig. 5 is a flow chart of an example of a method of processing streaming data according to an alternative embodiment of the present invention. In FIG. 5, a write request may be received (505) from a running process, and the method may determine (510) whether to add the write request to a batch of substantially contiguous disk writes. If it is determined (510) to add a write request to the batch of substantially contiguous disk writes, then the write request may be added (515) to the batch of substantially contiguous disk writes. The method may determine (520) whether to write the batch of substantially contiguous disk writes to non-volatile memory, and if it is determined to write the batch of substantially contiguous disk writes to non-volatile memory, may determine (525) whether the batch of substantially contiguous disk writes is memory mapped.
[050] In FIG. 5, if it is determined (525) that the batch of substantially contiguous disk writes is memory mapped, then the batch of substantially contiguous disk writes may be written (530) to non-volatile memory. An acknowledgment of writing the batch of substantially contiguous disk writes may be sent (535) to the process or processes from which the write requests were originally received. The method may wait to receive (540) an acknowledgement of the sent (535) acknowledgement from each of the one or more processes to which the write acknowledgement was sent (535). If an acknowledgement of the sent (535) acknowledgement is received (540) from each of the one or more processes, the batch of substantially contiguous disk writes may be purged (545), and a determination (550) may be made as to whether any write requests were skipped or saved in temporary memory. If it is determined (550) that write requests are saved in temporary memory, the method may loop to determine (510) whether to add one of the saved write requests to a batch of substantially contiguous disk writes and continue as described above. If it is determined (550) that no write requests are saved in temporary memory, the method may loop to receive (505) a new write request and continue as described above.
[051] In FIG. 5, if it is determined (510) not to add the received write request to the batch of substantially contiguous disk writes, then it may be determined (555) whether to skip or save the write request to temporary storage. If it is determined (555) to skip the write request, the method may loop to receive (505) a new write request and continue as described above. If it is determined (555) to save the write request to temporary storage, the method may save (560) the write request to temporary storage and loop to receive (505) a new write request and continue as described above.
[052] In FIG. 5, if it is determined (525) that the batch of substantially contiguous disk writes is not memory mapped to non-volatile memory, then the method may memory map (565) the batch to non-volatile memory. The method may then write (530) the batch of substantially contiguous disk writes to non-volatile memory and continue as described above.
[053] FIG. 6 is a process flow diagram of a process of determining whether a write request should trigger a batch of writes, according to one embodiment of the invention. In FIG. 6, a write request may be received (610) from a running process, and the method may add (620) the write request to a batch of substantially contiguous disk writes. The method may determine (630) whether to write the batch of substantially contiguous disk writes to non-volatile memory, and if it is determined (630) to write the batch, the batch of substantially contiguous disk writes may be written (640) to non-volatile memory. An acknowledgement of writing the batch of substantially contiguous disk writes may be sent (650) to the process from which the write request was originally received. The method may receive (660) an acknowledgement of the sent (650) acknowledgement from each process to which a write acknowledgement was sent (650). The batch of substantially contiguous disk writes may then be purged (670), and the method may loop to receiving (610) a new write request and continue as described above.
[054] In FIG. 6, if it is determined (630) that the batch of substantially contiguous disk writes is not to be written, the method may loop to receiving (610) a new write request and continue as described above.
[055] Those skilled in the art will appreciate from the foregoing description that the broad techniques of the embodiments of the present invention can be implemented in a variety of forms. Therefore, while the embodiments of this invention have been described in connection with particular examples thereof, the true scope of the embodiments of the invention should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, the specification and the following claims.

Claims (55)

1. A method for storing messages, comprising:
receiving a write request;
adding the write request to a batch of substantially contiguous disk writes;
determining to write the batch of substantially contiguous disk writes to non-volatile memory;
writing the batch of substantially contiguous disk writes to the non-volatile memory;
sending an acknowledgment of writing the batch of substantially contiguous disk writes;
receiving an acknowledgment to the write acknowledgment; and
the batch of substantially contiguous disk writes is purged,
wherein, based on the system throughput, one of the following criteria is selected for the determination to write the batch of disk writes:
● determining that a predetermined period of time has elapsed since the previous write of the batch of disk writes to non-volatile memory;
● determining that a predetermined size of the batch of disk writes has been exceeded;
● determining that the batch of disk writes contains a plurality of contiguous write requests;
● determining that the size of the batch of disk writes exceeds a predetermined maximum number of bytes;
● determining that the number of write requests stored in the batch of disk writes exceeds a predetermined maximum number;
● determining that a predetermined maximum time to collect the batch of disk writes has been exceeded;
● determining that a predetermined maximum time since the last write request was received has been exceeded;
● determining that the write request exceeds a maximum allowed page offset in the batch of disk writes; or
● determining that the write request exceeds the maximum allowed byte offset in the batch of disk writes.
2. The method of claim 1, wherein receiving the write request comprises:
a write request is received for a memory location that is contiguous with a memory location in the batch of substantially contiguous disk writes.
3. The method of claim 1, wherein receiving the write request comprises:
a write request is received from a processing thread.
4. The method of claim 3, wherein receiving a write request from a processing thread comprises:
a write request is received from a processing thread located at a multimedia message gateway.
5. The method of claim 1, wherein writing the batch of substantially contiguous disk writes to the non-volatile memory comprises:
the batch of substantially contiguous disk writes is asynchronously written to the non-volatile memory.
6. The method of claim 1, wherein writing the batch of substantially contiguous disk writes to the non-volatile memory comprises:
determining whether the batch of substantially contiguous disk writes is memory mapped;
if the batch of substantially contiguous disk writes is not memory mapped, memory mapping the batch of substantially contiguous disk writes; and
the batch of substantially contiguous disk writes is written to the non-volatile memory.
7. The method of claim 1, further comprising:
determining whether there are outstanding unprocessed write requests in the temporary storage; and
if there are outstanding unprocessed write requests in the temporary storage, then the outstanding unprocessed write requests are added to a new batch of substantially contiguous disk writes.
8. An apparatus for storing messages, the apparatus comprising:
means for receiving a write request;
means for adding write requests to a batch of substantially contiguous disk writes;
means for determining to write the batch of substantially contiguous disk writes to non-volatile memory;
means for writing the batch of substantially contiguous disk writes to the non-volatile memory;
means for sending an acknowledgment of writing the batch of substantially contiguous disk writes;
means for receiving an acknowledgement of the write acknowledgement; and
means for erasing the batch of substantially contiguous disc writes,
wherein, based on the system throughput, one of the following criteria is selected for the determination to write the batch of disk writes:
● determining that a predetermined period of time has elapsed since a previous batch of disk writes was written to non-volatile memory;
● determining that a predetermined size of the batch of disk writes has been exceeded;
● determining that the batch of disk writes contains a plurality of contiguous write requests;
● determining that the size of the batch of disk writes exceeds a predetermined maximum number of bytes;
● determining that the number of write requests stored in the batch of disk writes exceeds a predetermined maximum number;
● determining that a predetermined maximum time to collect the batch of disk writes has been exceeded;
● determining that a predetermined maximum time since the last write request was received has been exceeded;
● determining that the write request exceeds a maximum allowed page offset in the batch of disk writes; or
● determining that the write request exceeds the maximum allowed byte offset in the batch of disk writes.
9. The apparatus of claim 8, wherein the means for receiving a write request further comprises:
a write request is received for a memory location that is contiguous with a memory location in the batch of substantially contiguous disk writes.
10. The apparatus of claim 8, wherein the means for receiving a write request further comprises:
a write request is received from a processing thread.
11. The apparatus of claim 10, wherein the means for receiving a write request further comprises:
a write request is received from a processing thread located at a multimedia message gateway.
12. The apparatus of claim 8, wherein the means for writing the batch of substantially contiguous disk writes to the non-volatile memory further comprises:
the batch of substantially contiguous disk writes is asynchronously written to the non-volatile memory.
13. The apparatus of claim 8, wherein the means for writing the batch of substantially contiguous disk writes to the non-volatile memory further comprises:
the batch of substantially contiguous disk writes is asynchronously written to the non-volatile memory.
14. The apparatus of claim 8, wherein the means for writing the batch of substantially contiguous disk writes to the non-volatile memory further comprises:
determining whether the batch of substantially contiguous disk writes is memory mapped;
if the batch of substantially contiguous disk writes is not memory mapped, memory mapping the batch of substantially contiguous disk writes; and
the batch of substantially contiguous disk writes is written to the non-volatile memory.
15. The apparatus of claim 8, further comprising:
means for determining whether there are outstanding write requests in the temporary storage; and
means for adding an outstanding unprocessed write request to a new batch of substantially contiguous disk writes if the outstanding unprocessed write request is in the temporary storage.
16. A method for storing messages, comprising:
obtaining, by a first thread or process, control of a shared mutex lock;
sending a first write request from the first thread or process to a synchronization process;
queuing the first write request by the synchronization process;
sending an acknowledgement of receipt of the first write request to the first thread or process;
releasing, by the first thread or process, control of the shared mutex lock upon receipt of the acknowledgment of receipt of the first write request;
obtaining, by a second thread or process, control of the shared mutex lock;
sending a second write request from the second thread or process to the synchronization process;
queuing, by the synchronization process, the second write request and the first write request if the first and second write requests are to substantially contiguous memory locations;
sending an acknowledgement of receipt of the second write request to the second thread or process;
releasing, by the second thread or process, control of the shared mutex lock upon receipt of the acknowledgment of receipt of the second write request;
writing the queued first and second write requests to the non-volatile memory;
sending an acknowledgement of the write to the non-volatile memory to each of the first thread or process and the second thread or process;
sending an acknowledgement from each of the first thread or process and the second thread or process of receipt of the write acknowledgement; and
the method is restarted if an acknowledgement of receipt of the write acknowledgement is received from each of the first thread or process and the second thread or process.
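A minimal sketch of the mutex-protected hand-off described in this claim, assuming one worker thread per write request and a single synchronization thread; the names write_queue, submit_lock, and sync_loop, and the use of an Event to model the write confirmation, are illustrative assumptions.

```python
import os
import queue
import threading

write_queue = queue.Queue()      # requests handed from worker threads to the sync thread
submit_lock = threading.Lock()   # plays the role of the shared mutex lock

def submit(payload):
    """Worker-thread side: hold the mutex only until the request has been handed off."""
    written = threading.Event()
    with submit_lock:                        # obtain control of the shared mutex
        write_queue.put((payload, written))  # send the write request; put() returning
                                             # stands in for the receipt acknowledgement
    # The mutex is released here, before the data has reached disk.
    written.wait()                           # block until the batched write is confirmed

def sync_loop(path, max_batch=32):
    """Synchronization-thread side: drain queued requests and write them as one batch."""
    with open(path, "ab") as f:
        while True:
            batch = [write_queue.get()]              # wait for at least one request
            while len(batch) < max_batch:
                try:
                    batch.append(write_queue.get_nowait())
                except queue.Empty:
                    break
            for payload, _ in batch:
                f.write(payload)
            f.flush()
            os.fsync(f.fileno())                     # batch is now in non-volatile memory
            for _, written in batch:
                written.set()                        # confirm the write to each waiting thread
```

The point of the pattern is that the shared mutex is held only long enough to enqueue a request, so many worker threads can make progress while a single synchronization thread performs one contiguous disk write per batch.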
17. The method of claim 16, wherein writing the queued first and second write requests to the non-volatile memory comprises:
determining whether a predetermined condition has occurred; and
the queued first and second write requests are written to the non-volatile memory if the predetermined condition has occurred.
18. The method of claim 17, wherein the predetermined condition comprises:
the predetermined period of time is exceeded.
19. The method of claim 17, wherein the predetermined condition comprises:
the size of the queued write requests exceeds a predetermined maximum number of bytes.
20. The method of claim 17, wherein the predetermined condition comprises:
the number of queued write requests exceeds a predetermined maximum number.
21. The method of claim 17, wherein the predetermined condition comprises:
the predetermined maximum time to collect queued write requests has been exceeded.
22. The method of claim 17, wherein the predetermined condition comprises:
the predetermined maximum time since the last receipt of a write request has been exceeded.
23. The method of claim 17, wherein the predetermined condition comprises:
the new write request exceeds the maximum allowed page offset from the queued write request.
24. The method of claim 17, wherein the predetermined condition comprises:
the new write request exceeds the maximum allowed byte offset from the queued write request.
25. An apparatus for storing messages, the apparatus comprising:
means for obtaining control of a shared mutex lock by a first thread or process;
means for sending a first write request from the first thread or process to a synchronization process;
means for queuing the first write request by the synchronization process;
means for sending an acknowledgement of receipt of the first write request to the first thread or process;
means for releasing control of the shared mutex lock by the first thread or process upon receiving an acknowledgement of receipt of the first write request;
means for obtaining control of the shared mutex lock by a second thread or process;
means for sending a second write request from the second thread or process to the synchronization process;
means for queuing, by the synchronization process, the second write request and the first write request if the first and second write requests are to substantially contiguous memory locations;
means for sending an acknowledgement of receipt of the second write request to the second thread or process;
means for releasing control of the shared mutex lock by the second thread or process upon receiving an acknowledgement of receipt of the second write request;
means for writing the queued first and second write requests to the non-volatile memory;
means for sending an acknowledgement of the write to the non-volatile memory to each of the first thread or process and the second thread or process;
means for sending an acknowledgement of receipt of the write acknowledgement from each of the first thread or process and the second thread or process; and
means for restarting the method if an acknowledgement of receipt of the write acknowledgement is received from each of the first thread or process and the second thread or process.
26. The apparatus of claim 25, wherein the means for queuing the first write request comprises:
the first write request is stored in a batch of files.
27. The apparatus of claim 26, wherein the means for queuing the second write request comprises:
the second write request is stored in the batch of files.
28. The apparatus of claim 25, wherein the queued first and second write requests are written upon occurrence of a predetermined condition.
29. The apparatus of claim 28, wherein the predetermined condition comprises:
the predetermined period of time is exceeded.
30. The apparatus of claim 28, wherein the predetermined condition comprises:
the size of the queued write requests exceeds a predetermined maximum number of bytes.
31. The apparatus of claim 28, wherein the predetermined condition comprises:
the number of queued write requests exceeds a predetermined maximum number.
32. The apparatus of claim 28, wherein the predetermined condition comprises:
the predetermined maximum time to collect queued write requests has been exceeded.
33. The apparatus of claim 28, wherein the predetermined condition comprises:
the predetermined maximum time since the last receipt of a write request has been exceeded.
34. The apparatus of claim 28, wherein the predetermined condition comprises:
the new write request exceeds the maximum allowed page offset from the queued write request.
35. The apparatus of claim 28, wherein the predetermined condition comprises:
the new write request exceeds the maximum allowed byte offset from the queued write request.
36. An apparatus for storing messages, comprising:
a gateway;
a queue data file coupled to the gateway;
an index file coupled to the gateway; and
the apparatus of claim 8.
37. The apparatus of claim 36, wherein the gateway comprises:
a multimedia message gateway.
38. The apparatus of claim 36, wherein the queue data file and the index file are included in a disk queue.
39. The apparatus of claim 36, wherein the queue data file comprises:
a record containing information sufficient to reconstruct the associated entry for the record in the index file.
40. The apparatus of claim 36, wherein the index file comprises:
an entry associated with a record in the queue data file, the entry comprising a data structure having an index component and an entry information component.
41. The apparatus of claim 36, wherein the gateway is to read the queue data file and the index file to recover a queue processing state that existed prior to a system crash of the apparatus.
42. The apparatus of claim 36, wherein the queue data files are located on a single disk partition.
43. The apparatus of claim 36, wherein the queue data file comprises a plurality of contiguous disk blocks.
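Claims 36 to 43 describe a queue data file whose records carry enough information to rebuild the associated index file after a crash. The following sketch shows one hypothetical recovery scan; the fixed record layout, the field names, and the "pending" state value are assumptions and are not specified by the claims.

```python
import struct

# Assumed on-disk layouts; the claims only require that each record contain
# enough information to reconstruct its index entry, not these exact fields.
RECORD_HEADER = struct.Struct("<IQI")   # record id, sequence number, payload length
INDEX_ENTRY   = struct.Struct("<IQB")   # record id, data-file offset, entry state

def rebuild_index(data_bytes):
    """Scan the queue data file and reconstruct its index entries after a crash."""
    entries = []
    offset = 0
    while offset + RECORD_HEADER.size <= len(data_bytes):
        rec_id, _seq, length = RECORD_HEADER.unpack_from(data_bytes, offset)
        entries.append(INDEX_ENTRY.pack(rec_id, offset, 0))   # state 0: assumed "pending"
        offset += RECORD_HEADER.size + length
    return b"".join(entries)
```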
44. A method for storing messages, comprising:
receiving a write request;
determining whether to add the write request to a batch of outstanding requests;
if it is determined that the write request is to be added to the batch, adding the write request to the batch;
determining whether to write the batch to non-volatile memory;
if it is determined that the batch is to be written to non-volatile storage, determining whether the batch is memory mapped to non-volatile storage;
writing the batch to non-volatile storage if it is determined that the batch is memory mapped to non-volatile storage;
sending an acknowledgment of writing the batch to the process associated with each write request in the batch;
determining whether all acknowledgments of writing the batch have been received;
if all the acknowledgments of writing the batch have been received, clearing the batch;
determining whether any write requests are skipped or stored in temporary memory; and
if it is determined that there are no skipped or stored write requests, it loops back to receive the next write request.
45. The method of claim 44, further comprising:
if it is determined that outstanding write requests are in the temporary storage, the outstanding write requests are added to a new batch of outstanding requests.
46. The method of claim 44, further comprising:
storing the write request in the temporary memory if it is determined that the write request is to be stored in the temporary memory; and
if it is determined that the write request is to be skipped, a loop is made back to receive the next write request.
47. The method of claim 44, further comprising:
if it is determined that the batch is not to be written to non-volatile storage, then a loop is made back to receive the next write request.
48. The method of claim 44, further comprising:
if it is determined that the batch is not memory mapped to the non-volatile memory, the batch is memory mapped to the non-volatile memory before writing the batch to the non-volatile memory.
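The memory-mapping step of claims 44 and 48 might look roughly like the following; ensure_mapped and write_batch are illustrative helper names, and growing the file with ftruncate before mapping is an assumption about how the mapping would be established.

```python
import mmap
import os

def ensure_mapped(fd, needed_length, current_map=None):
    """Return a memory mapping of the queue file, creating one if none exists yet."""
    if current_map is not None and len(current_map) >= needed_length:
        return current_map                    # the batch region is already memory mapped
    if current_map is not None:
        current_map.close()
    os.ftruncate(fd, needed_length)           # grow the file before mapping it
    return mmap.mmap(fd, needed_length)       # read/write, shared mapping by default

def write_batch(mapping, offset, payloads):
    """Copy each batched payload into the mapped region and flush it to disk."""
    for payload in payloads:
        mapping[offset:offset + len(payload)] = payload
        offset += len(payload)
    mapping.flush()                           # push the dirty pages to non-volatile memory
    return offset
```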
49. The method of claim 44, further comprising:
if it is determined that the saved write request is in temporary memory, then it is determined whether to write the saved write request to non-volatile memory.
50. An apparatus for storing messages, the apparatus comprising:
a first device to receive a write request;
second means for determining whether to add the write request to a batch of outstanding requests;
third means for adding the write request to the batch if it is determined that the write request is to be added to the batch;
fourth means for determining whether to write the batch to the non-volatile memory;
fifth means for determining if the batch is memory mapped to non-volatile memory if it is determined that the batch is to be written to non-volatile memory;
sixth means for writing the batch to the non-volatile memory if it is determined that the batch is memory mapped to the non-volatile memory;
seventh means for sending an acknowledgment of writing the batch to the process associated with each write request in the batch;
eighth means for determining whether all acknowledgments of writing the batch have been received;
ninth means for clearing the batch if all acknowledgments of writing the batch have been received;
tenth means for determining whether any write request is skipped or stored in the temporary storage; and
eleventh means for looping back to receive a next write request if it is determined that there are no skipped or stored write requests.
51. The apparatus of claim 50, wherein the apparatus further comprises:
twelfth means for adding the outstanding write requests to a new batch of outstanding requests if it is determined that the outstanding write requests are in the temporary storage.
52. The apparatus of claim 50, wherein the apparatus further comprises:
thirteenth means for storing the write request in the temporary storage if it is determined that the write request is to be stored in the temporary storage; and
fourteenth means for looping back to receive a next write request if it is determined that the write request is to be skipped.
53. The apparatus of claim 50, wherein the apparatus further comprises:
fifteenth means for looping back to receive a next write request if it is determined that the batch is not to be written to non-volatile storage.
54. The apparatus of claim 50, wherein the apparatus further comprises:
sixteenth means for mapping the batch to the non-volatile memory prior to writing the batch to the non-volatile memory if it is determined that the batch is not memory mapped to the non-volatile memory.
55. The apparatus of claim 50, wherein the apparatus further comprises:
seventeenth means for determining whether to write the saved write request to the nonvolatile memory if it is determined that the saved write request is in the temporary memory.