Linux Kernel Internals(4)--Linux Page Cache

王朝system · Author unknown · 2006-11-24

--------------------------------------------------------------------------------

4. Linux Page Cache

In this chapter we describe the Linux 2.4 pagecache. The pagecache is - as the name suggests - a cache of physical pages. In the UNIX world the concept of a pagecache became popular with the introduction of SVR4 UNIX, where it replaced the buffercache for data IO operations.

While the SVR4 pagecache is used only as a filesystem data cache, and thus uses the struct vnode and an offset into the file as hash parameters, the Linux pagecache is designed to be more generic and therefore uses a struct address_space (explained below) as its first parameter. Because the Linux pagecache is tightly coupled to the notion of address spaces, you will need at least a basic understanding of address_spaces to understand how the pagecache works. An address_space is a kind of software MMU that maps all pages of one object (e.g. an inode) onto some other medium (typically physical disk blocks). The struct address_space is defined in include/linux/fs.h as:

--------------------------------------------------------------------------------

struct address_space {
    struct list_head pages;
    unsigned long nrpages;
    struct address_space_operations * a_ops;
    void * host;
    struct vm_area_struct * i_mmap;
    struct vm_area_struct * i_mmap_shared;
    spinlock_t i_shared_lock;
};

--------------------------------------------------------------------------------

To understand how address_spaces work, we only need to look at a few of these fields: pages is a doubly linked list of all pages that belong to this address_space, nrpages is the number of pages in pages, a_ops defines the methods of this address_space, and host is an opaque pointer to the object this address_space belongs to. The usage of pages and nrpages is obvious, so we will take a closer look at the address_space_operations structure, defined in the same header:

--------------------------------------------------------------------------------

struct address_space_operations {
    int (*writepage)(struct page *);
    int (*readpage)(struct file *, struct page *);
    int (*sync_page)(struct page *);
    int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
    int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
    int (*bmap)(struct address_space *, long);
};

--------------------------------------------------------------------------------

For a basic understanding of the principle behind address_spaces (and the pagecache) we need to look at ->writepage and ->readpage, but in practice we also need to look at ->prepare_write and ->commit_write.

You can probably guess what the address_space_operations methods do by virtue of their names alone; nevertheless, they do require some explanation. Their use in the course of filesystem data IO, by far the most common path through the pagecache, provides a good way of understanding them. Unlike most other UNIX-like operating systems, Linux has generic file operations (a subset of the SYSVish vnode operations) for data IO through the pagecache. This means that on read/write/mmap the data does not interact directly with the filesystem, but is read from and written to the pagecache whenever possible. The pagecache has to get data from the actual low-level filesystem when the user wants to read from a page that is not yet in memory, and has to write data to disk when memory gets low.
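The split between generic code and filesystem-specific methods can be sketched in user space. This is only an illustration under stated assumptions: every toy_* and ram_* name is hypothetical, the method signatures are simplified (the real ->readpage takes a struct file, the real ->writepage only a struct page), and a flat byte array stands in for the disk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16            /* tiny "page" so the example stays readable */

struct toy_address_space;

struct toy_page {
    unsigned long index;            /* page number within the object */
    char data[TOY_PAGE_SIZE];
};

/* Cut-down, hypothetical analogue of address_space_operations. */
struct toy_aops {
    int (*readpage)(struct toy_address_space *, struct toy_page *);
    int (*writepage)(struct toy_address_space *, struct toy_page *);
};

struct toy_address_space {
    const struct toy_aops *a_ops;   /* methods supplied by the "filesystem" */
    void *host;                     /* opaque owner; here a byte array acting as disk */
    size_t host_size;               /* size of that array in bytes */
};

/* "Filesystem" side: copy one page from the backing array. */
static int ram_readpage(struct toy_address_space *m, struct toy_page *p)
{
    if (p->index >= m->host_size / TOY_PAGE_SIZE)
        return -1;                  /* page lies beyond the backing store */
    memcpy(p->data, (char *)m->host + p->index * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
    return 0;
}

/* "Filesystem" side: copy one page back to the backing array. */
static int ram_writepage(struct toy_address_space *m, struct toy_page *p)
{
    if (p->index >= m->host_size / TOY_PAGE_SIZE)
        return -1;
    memcpy((char *)m->host + p->index * TOY_PAGE_SIZE, p->data, TOY_PAGE_SIZE);
    return 0;
}

const struct toy_aops ram_aops = { ram_readpage, ram_writepage };

/* Generic side: knows nothing about the backing store, only about a_ops. */
int toy_fill_page(struct toy_address_space *m, struct toy_page *p,
                  unsigned long index)
{
    assert(m && m->a_ops && m->a_ops->readpage);
    p->index = index;
    return m->a_ops->readpage(m, p);
}

int toy_flush_page(struct toy_address_space *m, struct toy_page *p)
{
    assert(m && m->a_ops && m->a_ops->writepage);
    return m->a_ops->writepage(m, p);
}
```

A different backing store would only need to supply another toy_aops table; toy_fill_page and toy_flush_page would stay unchanged, which is the point of routing IO through a_ops.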

In the read path the generic methods will first try to find a page that matches the wanted inode/index tuple.

hash = page_hash(inode->i_mapping, index);

Then we test whether the page actually exists.

hash = page_hash(inode->i_mapping, index);
page = __find_page_nolock(inode->i_mapping, index, *hash);

When it does not exist, we allocate a new free page and add it to the pagecache hash.

page = page_cache_alloc();
__add_to_page_cache(page, mapping, index, hash);

After the page is hashed, we use the ->readpage address_space operation to actually fill the page with data (file is an open instance of inode).

error = mapping->a_ops->readpage(file, page);

Finally we can copy the data to userspace.
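The read path above can be modelled in user space as a rough sketch. All toy_* names are hypothetical, the hash function is arbitrary rather than the kernel's page_hash(), locking is omitted, and toy_readpage() merely fabricates page contents from the index instead of reading a disk.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 8
#define TOY_HASH_SIZE 16

struct toy_page {
    const void *mapping;            /* which object the page belongs to */
    unsigned long index;            /* page number within that object */
    struct toy_page *next_hash;     /* hash chain */
    char data[TOY_PAGE_SIZE];
};

static struct toy_page *toy_hash_table[TOY_HASH_SIZE];
int toy_readpage_calls;             /* counts simulated disk reads */

/* Pick a hash bucket for a (mapping, index) tuple; the mixing is arbitrary. */
static struct toy_page **toy_page_hash(const void *mapping, unsigned long index)
{
    uintptr_t h = (uintptr_t)mapping / sizeof(void *) + index;
    return &toy_hash_table[h % TOY_HASH_SIZE];
}

/* Walk one hash chain looking for an exact (mapping, index) match. */
static struct toy_page *toy_find_page(const void *mapping, unsigned long index,
                                      struct toy_page *head)
{
    for (; head; head = head->next_hash)
        if (head->mapping == mapping && head->index == index)
            return head;
    return NULL;
}

/* Stand-in for ->readpage: fills the page with a letter derived from index. */
static int toy_readpage(struct toy_page *page)
{
    toy_readpage_calls++;
    memset(page->data, 'A' + (int)(page->index % 26), TOY_PAGE_SIZE);
    return 0;
}

/* Copy up to len bytes of page `index` into buf; returns bytes copied or -1. */
long toy_file_read(const void *mapping, unsigned long index,
                   char *buf, size_t len)
{
    struct toy_page **hash = toy_page_hash(mapping, index);
    struct toy_page *page = toy_find_page(mapping, index, *hash);

    assert(buf != NULL);
    if (!page) {                    /* cache miss: allocate, hash, then fill */
        page = calloc(1, sizeof(*page));
        if (!page)
            return -1;
        page->mapping = mapping;
        page->index = index;
        page->next_hash = *hash;
        *hash = page;
        if (toy_readpage(page) != 0)
            return -1;              /* toy only: the page is left hashed */
    }
    if (len > TOY_PAGE_SIZE)
        len = TOY_PAGE_SIZE;
    memcpy(buf, page->data, len);   /* the "copy to userspace" step */
    return (long)len;
}
```

Reading the same (mapping, index) pair twice increments toy_readpage_calls only once, because the second call is served from the hash table; the same index under a different mapping is a separate page.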

For writing to the filesystem two paths exist: one for writable mappings (mmap) and one for the write(2) family of syscalls. The mmap case is very simple, so it will be discussed first. When a user modifies mappings, the VM subsystem marks the page dirty.

SetPageDirty(page);

The bdflush kernel thread, which tries to free pages either as a background activity or because memory is getting low, will try to call ->writepage on pages that are explicitly marked dirty. The ->writepage method then has to write the page's contents back to disk and free the page.
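The dirty-then-flush cycle can be sketched as follows. This is a simplified user-space model, not kernel code: the toy_* names and the TOY_PG_DIRTY flag are hypothetical, toy_writepage() only counts calls instead of doing IO, and freeing the page afterwards is left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PG_DIRTY 0x1u           /* hypothetical flag bit */

struct toy_page {
    unsigned int flags;
    int writes;                     /* how often the page was written back */
};

/* Analogue of SetPageDirty(): called when a mapping is modified. */
void toy_set_page_dirty(struct toy_page *p)
{
    p->flags |= TOY_PG_DIRTY;
}

/* Stand-in for ->writepage: pretend to write the page to disk. */
static int toy_writepage(struct toy_page *p)
{
    p->writes++;
    return 0;
}

/* One pass of a background flusher; returns the number of pages written. */
size_t toy_flush_dirty(struct toy_page *pages, size_t n)
{
    size_t written = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!(pages[i].flags & TOY_PG_DIRTY))
            continue;               /* clean pages need no IO */
        if (toy_writepage(&pages[i]) == 0) {
            pages[i].flags &= ~TOY_PG_DIRTY;
            written++;
        }
    }
    return written;
}
```

Only pages marked dirty reach toy_writepage(), and a second pass with no new modifications writes nothing.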

The second write path is _much_ more complicated. For each page the user writes to, we basically do the following (for the full code see mm/filemap.c:generic_file_write()):

page = __grab_cache_page(mapping, index, &cached_page);
mapping->a_ops->prepare_write(file, page, offset, offset+bytes);
copy_from_user(kaddr+offset, buf, bytes);
mapping->a_ops->commit_write(file, page, offset, offset+bytes);

So first we try to find the hashed page or allocate a new one, then we call the ->prepare_write address_space method, copy the user buffer to kernel memory, and finally call the ->commit_write method. As you have probably noticed, ->prepare_write and ->commit_write are fundamentally different from ->readpage and ->writepage, because they are called not only when physical IO is actually wanted but every time the user modifies the file. There are two (or more?) ways to handle this. The first uses the Linux buffercache to delay the physical IO by filling the page->buffers pointer with buffer_heads, which are used in try_to_free_buffers (fs/buffer.c) to request IO once memory gets low; this approach is very widespread in the current kernel. The other way just sets the page dirty and relies on ->writepage to do all the work. Because struct page lacks a validity bitmap, this does not work with filesystems that have a granularity smaller than PAGE_SIZE.
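The prepare/copy/commit sequence can be illustrated with a small user-space sketch of the second approach, where commit simply marks the page dirty. Everything here is an assumption made for illustration: the toy_* names are hypothetical, the signatures drop the struct file argument, and a fixed array plays the role of the on-disk page. The sketch assumes the page carries a single uptodate flag rather than a per-block validity bitmap, so a partial write into a page that is not yet valid first pulls in the whole page.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 8

struct toy_page {
    int uptodate;                   /* whole-page validity, no per-block bitmap */
    int dirty;
    char data[TOY_PAGE_SIZE];
};

/* Pretend on-disk contents of the one page this sketch deals with. */
static const char toy_disk[TOY_PAGE_SIZE] = {'o','l','d','d','a','t','a','!'};

/* Make sure the bytes outside [from, to) are valid before the copy. */
static int toy_prepare_write(struct toy_page *p, unsigned from, unsigned to)
{
    if (!p->uptodate && (from != 0 || to != TOY_PAGE_SIZE))
        memcpy(p->data, toy_disk, TOY_PAGE_SIZE);   /* simulated read */
    return 0;
}

/* Record that the page now holds newer data than the disk. */
static int toy_commit_write(struct toy_page *p, unsigned from, unsigned to)
{
    (void)from;
    (void)to;
    p->uptodate = 1;
    p->dirty = 1;                   /* physical IO is deferred to writeback */
    return 0;
}

/* Per-page body of a write(2)-style call: prepare, copy, commit. */
int toy_file_write(struct toy_page *p, unsigned offset,
                   const char *buf, unsigned bytes)
{
    assert(p != NULL && buf != NULL);
    if (offset > TOY_PAGE_SIZE || bytes > TOY_PAGE_SIZE - offset)
        return -1;                  /* write would cross the page boundary */
    if (toy_prepare_write(p, offset, offset + bytes) != 0)
        return -1;
    memcpy(p->data + offset, buf, bytes);           /* copy_from_user stand-in */
    return toy_commit_write(p, offset, offset + bytes);
}
```

Writing three bytes at offset 0 of a fresh page leaves the remaining five bytes holding the old disk contents, which is why the prepare step has to run before the copy.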

--------------------------------------------------------------------------------
