The evolution of pipe buffers
One change which has already been merged is the addition of a set of operations for pipe buffers:
struct pipe_buf_operations {
int can_merge;
void *(*map)(struct file *, struct pipe_inode_info *,
struct pipe_buffer *);
void (*unmap)(struct pipe_inode_info *, struct pipe_buffer *);
void (*release)(struct pipe_inode_info *, struct pipe_buffer *);
};
The can_merge flag addresses one of the issues raised last week: coalescing of writes into existing pages in the buffer. If can_merge is non-zero, coalescing will be performed. Otherwise, each write to a pipe buffer will result in the creation of a new circular buffer entry, and, by default, the allocation of a new page.
The map() and unmap() methods are charged with controlling the visibility of pipe buffer pages in the kernel's virtual address space. The default map() operations for buffers implementing Unix pipes is quite simple:
static void *anon_pipe_buf_map(struct file *file,
struct pipe_inode_info *info,
struct pipe_buffer *buf)
{
return kmap(buf->page);
}
Since the mapping operation has been abstracted out, there are now fewer assumptions regarding how data is really stored within a pipe buffer. This opens the door to different pipe implementations, such as pipes which implement a direct window into device memory.
The release() method should clean things up when the pipe buffer is no longer needed.
Linus has also created an initial implementation of a splice() system call, though this work is clearly not ready for merging at this point. This system call looks like:
long sys_splice(int fdin, int fdout, size_t len, unsigned long flags);
fdin and fdout are two file descriptors; a call to sys_splice() will result in len bytes being copied from fdin to fdout, one of which is expected to be a pipe. The flags argument is not currently used by the sample implementation.
To make sys_splice() work, Linus added two new methods to the ever-expanding file_operations structure:
ssize_t (*splice_write)(struct inode *in_pipe, struct file *out,
size_t len, unsigned long flags);
ssize_t (*splice_read)(struct file *in, struct inode *out_pipe,
size_t len, unsigned long flags);
The patch includes a generic splice_read() implementation suitable for filesystem-backed file descriptors. It simply populates the page cache with some pages from the file, then loads those pages into the pipe buffer represented by out_pipe. Like ordinary read() and write() methods, the splice variants can transfer fewer bytes than requested. Linus's version will stop at the maximum capacity of a pipe buffer - 16 pages, currently.
As Linus acknowledges, there are a number of shortcomings to the current
implementation - it is incomplete, the interfaces are ugly, and it will
oops the system if anything goes wrong. It is, however, an indication of
where he expects this work will lead. Stay tuned.
| Index entries for this article | |
|---|---|
| Kernel | Circular buffers |
| Kernel | Pipes |
| Kernel | splice() |